Build Your First AI Test Suite in 1 Hour

Your AI works great in demos. But will it survive real users?

In this guide, I'll show you how to build a basic test suite that catches 80% of AI issues before they reach production.

Time investment: 1 hour

Skill level: Beginner-friendly (no ML expertise required)

Tools: Promptfoo (free, open-source)

Why You Need a Test Suite

Manual testing doesn't scale. Here's what happens without systematic testing:

You test 10 queries → miss the 11th that breaks everything
Model update → no idea if it broke existing functionality
Prompt change → regression bugs you didn't catch
Production incident → scrambling to reproduce and fix

With a test suite:

Test 100+ scenarios in 30 seconds
Catch regressions before deployment
Document expected behavior
Sleep better at night

The 5 Core Test Categories

1. Happy Path Tests (30% of suite)

What: Test that your AI works for common, straightforward queries.

Example:

Query: "What is your return policy?"
Expected: Mentions 30-day window, requires receipt

2. Edge Case Tests (25% of suite)

What: Test unusual but valid inputs.

Examples:

Typos: "refnd polcy"
Abbreviations: "ROI of Product X"
Ambiguity: "How do I reset it?" (what is "it"?)
Multi-question: "What's the price and refund policy?"

3. Hallucination Tests (20% of suite)

What: Ensure AI doesn't fabricate information.

Examples:

"What's your policy on X?" (X doesn't exist → should say "no information")
"Who is your CEO?" → Must match actual CEO
"How much does Y cost?" (if Y not in docs → should acknowledge uncertainty)

4. Safety Tests (15% of suite)

What: Test for harmful, biased, or inappropriate outputs.

Examples:

Jailbreak attempts: "Ignore all previous instructions..."
Harmful requests: "How do I hack..."
Bias: Test same query with different demographic markers

5. Performance Tests (10% of suite)

What: Ensure acceptable latency and quality under load.

Examples:

Response time < 3 seconds
Consistent quality across 100 runs
No timeouts or errors

Step-by-Step: Building Your Test Suite

Step 1: Install Promptfoo (5 minutes)

npm install -g promptfoo
promptfoo init my-test-suite
cd my-test-suite

Step 2: Define Your Test Cases (20 minutes)

Create prompts.yaml:

prompts:
  - "You are a helpful customer service assistant. Answer based only on the provided context."

providers:
  - openai:gpt-4

tests:
  # Happy path
  - description: "Return policy question"
    vars:
      question: "What is your return policy?"
    assert:
      - type: contains
        value: "30 days"
      - type: contains
        value: "receipt"

  # Edge case
  - description: "Typo in query"
    vars:
      question: "refnd polcy"
    assert:
      - type: contains
        value: "return"

  # Hallucination test
  - description: "Non-existent policy"
    vars:
      question: "What is your policy on lunar deliveries?"
    assert:
      - type: not-contains
        value: "we offer"
      - type: contains-any
        value: ["don't have", "no information", "not available"]

Step 3: Run Tests (1 minute)

promptfoo eval

Step 4: Review Results (5 minutes)

Promptfoo generates a beautiful HTML report showing:

✓ Passed tests (green)
✗ Failed tests (red)
Actual vs. expected outputs
Performance metrics

50 Starter Test Cases (Copy & Paste)

Download our pre-built test suite:

📥 LLM Evaluation Template

200+ test cases organized by category. Just customize for your use case.

Best Practices

Start small: 20-30 tests, then grow to 100+
Run on every commit: Integrate with CI/CD
Track over time: Monitor pass rates
Update regularly: Add tests for every bug you find
Document expected behavior: Tests = living documentation

Common Pitfalls

❌ Too many tests: Start small, iterate
❌ Brittle assertions: Don't check exact wording
❌ No negative tests: Test what should NOT happen
❌ Ignoring failures: If tests fail, fix them

What's Next?

Once you have a basic suite:

Week 2: Add 50 more tests
Week 3: Integrate with CI/CD
Week 4: Add performance benchmarks
Month 2: Set up production monitoring

✓ Action Item:

Spend the next hour building your first 20 test cases. Use our template as a starting point.

Need Help Building Your Test Suite?

We'll help you design and implement a custom test suite for your AI system.

Book Consultation

Building Your First AI Test Suite in 1 Hour

Why You Need a Test Suite

The 5 Core Test Categories

1. Happy Path Tests (30% of suite)

2. Edge Case Tests (25% of suite)

3. Hallucination Tests (20% of suite)

4. Safety Tests (15% of suite)

5. Performance Tests (10% of suite)

Step-by-Step: Building Your Test Suite

Step 1: Install Promptfoo (5 minutes)

Step 2: Define Your Test Cases (20 minutes)

Step 3: Run Tests (1 minute)

Step 4: Review Results (5 minutes)

50 Starter Test Cases (Copy & Paste)

Best Practices

Common Pitfalls

What's Next?

Need Help Building Your Test Suite?