The Slowness
Our AI email system was taking 1.2 seconds to write each email.
The AI itself took 800ms. That's expected; it's doing complex work.
But we were adding another 400ms just to check if the email was okay.
That's like baking a cake in 8 minutes and then spending 4 minutes deciding if it looks good enough.
What Is "Validation"?
First, let's understand what we were doing.
Validation means checking that something is correct before using it.
AI GENERATES:
{
"subject": "Special offer just for you!",
"body": "Hi John, check out our new product...",
"price": "$29.99",
"product_id": "SKU-12345"
}
VALIDATION CHECKS:
✓ Is the subject under 60 characters?
✓ Does the product ID actually exist?
✓ Is the price correct?
✓ No inappropriate content?
✓ Personalization looks right?
Each check takes time. We had five checks, each taking between 5ms and 150ms.
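The cheapest checks in that list are purely structural. Here is a minimal sketch of the schema and subject-length checks, assuming the AI returns the JSON shape shown above. REQUIRED_FIELDS, MAX_SUBJECT_LEN, and check_structure are illustrative names, not our production code.

```python
import json

# Hypothetical limits, taken from the checks listed above
REQUIRED_FIELDS = ("subject", "body", "price", "product_id")
MAX_SUBJECT_LEN = 60


def check_structure(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the email passed."""
    try:
        email = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(email, dict):
        return ["top-level JSON value is not an object"]

    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in email]
    subject = email.get("subject", "")
    if not isinstance(subject, str):
        problems.append("subject is not a string")
    elif len(subject) > MAX_SUBJECT_LEN:
        problems.append(f"subject is {len(subject)} chars (max {MAX_SUBJECT_LEN})")
    return problems
```

Checks like this are pure CPU work on a small string, which is why they cost about a millisecond while the database and moderation checks cost 50 to 150ms.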
The Problem - One at a Time
Here's how our validation worked:
AI RESPONSE ARRIVES (800ms)
│
▼
Check 1: Schema validation → 5ms
│
▼
Check 2: Product exists? → 100ms (database lookup)
│
▼
Check 3: Price correct? → 50ms (another database lookup)
│
▼
Check 4: Content safety? → 150ms (API call to moderation)
│
▼
Check 5: Personalization OK? → 50ms (template checking)
│
▼
DONE!
TOTAL VALIDATION TIME: 5 + 100 + 50 + 150 + 50 = 355ms
Each check waited for the previous one to finish.
Like standing in five separate lines at the DMV.
Solution 1 - Run Checks in Parallel
The realization: Most checks don't depend on each other.
SEQUENTIAL (One at a time):
─────────────────────────────────────────────────────►
│ Check 1 │ Check 2 │ Check 3 │ Check 4 │ Check 5 │
5ms 100ms 50ms 150ms 50ms
Total: 355ms
PARALLEL (All at once):
─────────────────────────────────────────►
│ Check 1 │ 5ms
│ Check 2 │────────────│ 100ms
│ Check 3 │──────│ 50ms
│ Check 4 │───────────────────│ 150ms
│ Check 5 │──────│ 50ms
Total: 150ms (longest check wins)
Think of it like a kitchen:
SEQUENTIAL COOKING:
Chef: "I'll make salad first" (5 min)
Chef: "Now I'll cook the pasta" (20 min)
Chef: "Now I'll grill the chicken" (15 min)
Chef: "Now I'll prepare dessert" (10 min)
Total: 50 minutes
PARALLEL COOKING:
Chef 1: "I'll make salad" (5 min)
Chef 2: "I'll cook pasta" (20 min) ← Longest task
Chef 3: "I'll grill chicken" (15 min)
Chef 4: "I'll prepare dessert" (10 min)
Total: 20 minutes (everyone works at once)
Result: 355ms → 150ms. Running the checks simultaneously cut validation time by about 58%.
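To see the difference concretely, here is a small simulation using Python's asyncio, where asyncio.sleep stands in for each check's latency from the diagram above. The check names and timings are illustrative; real checks would query the database, call the moderation API, and so on.

```python
import asyncio
import time

# Simulated latencies (seconds) from the diagram above
LATENCIES = {
    "schema": 0.005,
    "product_exists": 0.100,
    "price_correct": 0.050,
    "content_safety": 0.150,
    "personalization": 0.050,
}


async def run_check(name: str) -> tuple[str, bool]:
    # Stand-in for real I/O (database query, moderation API call, ...)
    await asyncio.sleep(LATENCIES[name])
    return name, True


async def validate_sequential() -> list[tuple[str, bool]]:
    # Each check waits for the previous one: total ≈ sum of latencies
    return [await run_check(name) for name in LATENCIES]


async def validate_parallel() -> list[tuple[str, bool]]:
    # All checks start at once: total ≈ the slowest check
    return list(await asyncio.gather(*(run_check(name) for name in LATENCIES)))


async def timed(coro_fn):
    # Run one validation strategy and measure its wall-clock time
    start = time.perf_counter()
    results = await coro_fn()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (validate_sequential, validate_parallel):
        _, elapsed = asyncio.run(timed(fn))
        print(f"{fn.__name__}: {elapsed * 1000:.0f}ms")
```

On an idle machine the sequential version should take a little over 355ms and the parallel one a little over 150ms. asyncio.gather also returns results in the order the checks were passed in, so nothing downstream has to change.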
Solution 2 - Not Everything Needs to Block
Here's a key insight:
Some checks MUST pass before we continue. Others can happen in the background.
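A minimal sketch of that split using Python's asyncio, assuming the checks run in the same process as the request handler. The fast checks run inline and can reject the email; the slow ones are scheduled with asyncio.create_task and can only flag a problem after the email has been returned. fast_checks, slow_checks, the flagged list, the "{{" template-artifact test, and the SKU-UNKNOWN product are all hypothetical stand-ins; in production the background half would more likely go to a job queue.

```python
import asyncio

flagged: list[str] = []                    # hypothetical review queue
background_tasks: set[asyncio.Task] = set()


def fast_checks(email: dict) -> list[str]:
    # Blocking: cheap, critical checks (subject length, template artifacts, ...)
    problems = []
    if len(email.get("subject", "")) > 60:
        problems.append("subject too long")
    if "{{" in email.get("body", ""):
        problems.append("template artifact in body")
    return problems


async def slow_checks(email: dict) -> None:
    # Background: slow, rarely failing checks (DB lookups, moderation, ...)
    await asyncio.sleep(0.1)  # stand-in for real I/O
    if email.get("product_id") == "SKU-UNKNOWN":
        flagged.append(f"unknown product: {email['product_id']}")


async def handle(email: dict) -> dict:
    problems = fast_checks(email)
    if problems:
        raise ValueError(f"rejected: {problems}")
    # Schedule the slow checks without awaiting them; keep a reference so
    # the task is not garbage-collected before it finishes
    task = asyncio.create_task(slow_checks(email))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return email  # the caller gets the email now


async def demo() -> tuple[list[str], list[str]]:
    email = {"subject": "Hi John", "body": "Check this out", "product_id": "SKU-UNKNOWN"}
    await handle(email)                # returns right after the fast checks
    before = list(flagged)             # background check has not run yet
    await asyncio.gather(*background_tasks)
    return before, list(flagged)
```

The reference kept in background_tasks matters: the asyncio documentation notes that the event loop keeps only weak references to tasks, so an unreferenced background task can disappear before it finishes.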
BLOCKING CHECKS (Must pass now):
─────────────────────────────────
• Is the JSON valid? (If not, we can't use it at all)
• Is the subject too long? (Will break email clients)
• Are there template errors? (Would look unprofessional)
These are FAST (1-5ms each) and CRITICAL.
BACKGROUND CHECKS (Can pass later):
──────────────────────────────────
• Does product exist? (Rare failure, can fix)
• Content moderation (Rare issues, can suppress)
• Detailed quality scoring (Nice to have)
These are SLOW (50-150ms each) but NOT urgent.
The new flow:
AI RESPONSE ARRIVES
│
├──► BLOCKING CHECKS (5ms total)
│ ✓ Schema valid?
│ ✓ Subject length?
│ ✓ No template errors?
│
├──► RETURN RESPONSE TO USER (fast!)
│
└──► BACKGROUND CHECKS (happen after)
• Product validation
• Content moderation
• Quality scoring
→ If any fail, flag for review
Think of it like airport security:
MUST CHECK NOW (Blocking):
• Do you have a boarding pass?
• Is your ID valid?
→ Can't fly without these.
CAN CHECK LATER (Background):
• Does your luggage have prohibited items?
→ We'll catch it, but you can board while we check.
Solution 3 - Remember What You Already Checked
Many validations repeat. Why check the same thing twice?
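The fix is to remember answers for a while. Here is a minimal in-process sketch, assuming a single Python process and the one-hour TTL used in the example below. SimpleTTLCache and make_product_checker are hypothetical names; a shared cache such as Redis would play the same role across several servers.

```python
import time
from collections.abc import Callable, Hashable


class SimpleTTLCache:
    """Remember computed values for `ttl` seconds (illustrative, not thread-safe)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: dict[Hashable, tuple[float, object]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: no database call
        value = compute()                # miss or expired: do the slow lookup
        self._store[key] = (now + self.ttl, value)
        return value


def make_product_checker(lookup: Callable[[str], bool], ttl: float = 3600):
    # Wrap a slow lookup (e.g. a database query) with the cache
    cache = SimpleTTLCache(ttl)

    def product_exists(product_id: str) -> bool:
        return cache.get_or_compute(product_id, lambda: lookup(product_id))

    return product_exists
```

One design choice worth copying: "does not exist" answers are cached too, so a bad SKU that keeps appearing doesn't hit the database on every email.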
EMAIL 1: "Check if product SKU-123 exists"
→ Database lookup → YES, exists (100ms)
EMAIL 2: "Check if product SKU-123 exists"
→ Database lookup → YES, exists (100ms)
EMAIL 3: "Check if product SKU-123 exists"
→ Database lookup → YES, exists (100ms)
TOTAL: 300ms for the same answer three times!
With caching:
EMAIL 1: "Check if product SKU-123 exists"
→ Database lookup → YES, exists (100ms)
→ Save result for 1 hour
EMAIL 2: "Check if product SKU-123 exists"
→ Check cache → YES, exists (1ms)
EMAIL 3: "Check if product SKU-123 exists"
→ Check cache → YES, exists (1ms)
TOTAL: 102ms
Think of it like a phone contact list:
WITHOUT CACHING:
"What's Mom's phone number?"
→ Look in paper address book (30 seconds)
"What's Mom's phone number?"
→ Look in paper address book (30 seconds)
WITH CACHING:
"What's Mom's phone number?"
→ Look in paper address book (30 seconds)
→ Save to phone contacts
"What's Mom's phone number?"
→ Check phone contacts (1 second)
Solution 4 - Start Checking Before It's Done
Here's a clever trick: Start validating while the AI is still writing.
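One way to do this, sketched under assumptions: the model streams plain-text chunks in a simple "Subject: ...", newline, "Body: ..." layout (a hypothetical format, not any particular provider's streaming API). The validator checks the subject length as soon as the subject line is complete, and scans for forbidden phrases as text arrives, re-reading a short tail of the previous text so a word split across two chunks is still caught. FORBIDDEN and MAX_SUBJECT_LEN are illustrative values.

```python
FORBIDDEN = ("guaranteed", "free money")  # hypothetical phrase list
MAX_SUBJECT_LEN = 60


class StreamingValidator:
    def __init__(self) -> None:
        self.buffer = ""
        self.subject_checked = False
        self.problems: list[str] = []
        self._scanned = 0  # length of the buffer already scanned

    def feed(self, chunk: str) -> list[str]:
        """Validate as much as possible of the text received so far."""
        self.buffer += chunk
        # 1. The subject can be checked as soon as its line is complete
        if not self.subject_checked and "\n" in self.buffer:
            subject = self.buffer.split("\n", 1)[0].removeprefix("Subject:").strip()
            if len(subject) > MAX_SUBJECT_LEN:
                self.problems.append("subject too long")
            self.subject_checked = True
        # 2. Scan only the new text, overlapping by (longest phrase - 1) chars
        overlap = max(len(p) for p in FORBIDDEN) - 1
        window = self.buffer[max(0, self._scanned - overlap):].lower()
        for phrase in FORBIDDEN:
            msg = f"forbidden phrase: {phrase}"
            if phrase in window and msg not in self.problems:
                self.problems.append(msg)
        self._scanned = len(self.buffer)
        return self.problems

    def finish(self) -> list[str]:
        # Handle anything still unchecked, e.g. a subject with no trailing newline
        if not self.subject_checked:
            self.feed("\n")
        return self.problems
```

Because each call to feed only scans the new text plus a small overlap, the work done after the final chunk arrives is tiny.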
NORMAL APPROACH:
AI: "Generating... generating... generating... DONE!"
─────────────────────────────────────────────────►
│
▼
Start validation
STREAMING APPROACH:
AI: "Subject: Special offer just for you..."
↓
Can we validate subject NOW? Yes!
↓ (Continue generating)
AI: "Body: Hi John, check out..."
↓
Can we check for forbidden words? Yes!
↓ (Continue generating)
AI: "...DONE!"
↓
Almost everything already validated!
Think of it like proofreading a letter as someone writes it:
WITHOUT STREAMING:
Writer: *writes entire letter*
Writer: "Done! Can you proofread?"
Proofreader: "Sure, give me 5 minutes"
WITH STREAMING:
Writer: "Dear..."
Proofreader: "Looks good so far"
Writer: "...Sir/Madam..."
Proofreader: "Good, keep going"
Writer: "...I am writing to..."
Proofreader: "Wait, you misspelled something!"
Writer: *fixes immediately*
When the letter is done, proofreading is almost done too!
Solution 5 - Load Everything Into Memory
Database lookups are slow. Memory lookups are fast.
SLOW (Database lookup every time):
"Does product SKU-123 exist?"
→ Send query to database
→ Database searches millions of rows
→ Database returns answer
→ 100ms
FAST (Memory lookup):
"Does product SKU-123 exist?"
→ Check set in memory
→ {"SKU-001", "SKU-002", ..., "SKU-123", ...}
→ Yes, it's in the set
→ 0.001ms
The tradeoff:
- Uses more memory
- Need to refresh periodically (data might change)
- Worth it for data that changes rarely
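A minimal sketch of the in-memory approach and its refresh tradeoff. The load_all_skus callable stands in for whatever query loads the product catalog (hypothetical), and the five-minute refresh interval is an arbitrary example. The refresh here is lazy: the first lookup after the interval expires reloads the set.

```python
import time
from collections.abc import Callable, Iterable


class InMemorySkuSet:
    def __init__(self, load_all_skus: Callable[[], Iterable[str]],
                 refresh_every: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_all_skus
        self._refresh_every = refresh_every
        self._clock = clock
        self._skus: frozenset[str] = frozenset(load_all_skus())  # one slow query up front
        self._loaded_at = clock()

    def exists(self, sku: str) -> bool:
        # Reload once the data is older than the refresh interval
        if self._clock() - self._loaded_at >= self._refresh_every:
            self._skus = frozenset(self._load())
            self._loaded_at = self._clock()
        # Set membership: O(1) on average, no network round trip
        return sku in self._skus
```

Between refreshes the set can be stale: a product added to the database is reported missing until the next reload, which is exactly why this only suits data that changes rarely.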
Think of it like a cheat sheet:
WITHOUT CHEAT SHEET:
"What's the formula for area of a circle?"
→ Open textbook
→ Find chapter on circles
→ Find the formula
→ 2 minutes
WITH CHEAT SHEET:
"What's the formula for area of a circle?"
→ Look at cheat sheet on desk
→ πr²
→ 2 seconds
Putting It All Together
Here's how we validate now:
AI RESPONSE ARRIVES
│
▼
┌───────────────────────────────────────────────┐
│ STEP 1: Fast Blocking Checks (5ms total) │
│ ──────────────────────────────────────── │
│ • Is JSON valid? │
│ • Subject under 60 chars? │
│ • No template artifacts? │
│ • Product ID in memory cache? │
│ │
│ If ANY fail → Reject immediately │
└───────────────────────────────────────────────┘
│
│ All passed
▼
┌───────────────────────────────────────────────┐
│ STEP 2: Return Response (0ms) │
│ ───────────────────────────── │
│ User gets the email content NOW │
└───────────────────────────────────────────────┘
│
│ Meanwhile, in the background...
▼
┌───────────────────────────────────────────────┐
│ STEP 3: Background Checks (parallel) │
│ ──────────────────────────────────── │
│ • Content safety moderation │
│ • Detailed quality scoring │
│ • Personalization audit │
│ │
│ If ANY fail → Flag for review, maybe stop │
└───────────────────────────────────────────────┘
The Results
BEFORE (Naive approach):
─────────────────────────────────
AI generation: 800ms
Validation: 400ms (sequential, all blocking)
TOTAL: 1200ms
AFTER (Optimized approach):
─────────────────────────────────
AI generation: 800ms
Blocking validation: 15ms (parallel, cached, in-memory)
Background: 0ms (happens after response)
TOTAL: 815ms
Validation time: 400ms → 15ms (96% reduction!)
We went from adding 50% overhead to adding 2% overhead.
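A quick check of the arithmetic behind those percentages, using only the figures above (overhead is measured relative to the 800ms of AI generation):

```latex
\begin{aligned}
\text{validation reduction} &= \frac{400 - 15}{400} = \frac{385}{400} \approx 96\% \\
\text{overhead before} &= \frac{400}{800} = 50\% \\
\text{overhead after} &= \frac{15}{800} \approx 1.9\% \approx 2\%
\end{aligned}
```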
When NOT to Use These Tricks
These optimizations work when:
- Background failures are rare (< 1%)
- You can fix problems after the fact
- Speed matters more than perfect accuracy
They DON'T work when:
- Every output must be verified before use
- Failures are common
- A bad output causes serious harm
GOOD USE CASES:
• Marketing emails (can suppress bad ones)
• Product recommendations (can show generic fallback)
• Content suggestions (user can ignore)
BAD USE CASES:
• Financial transactions (must verify before executing)
• Medical advice (can't risk bad output)
• Legal documents (must be 100% accurate)
Key Lessons
Lesson 1: Sequential Is the Enemy of Speed
If checks don't depend on each other, run them at the same time. This alone cut our validation from 355ms to 150ms.
Lesson 2: Not Everything Is Urgent
Some checks can happen after you've already responded. Move slow, non-critical checks to the background.
Lesson 3: Cache Everything You Can
If you've checked something once and it won't change soon, remember the answer.
Lesson 4: Memory Is Faster Than Databases
If your validation data fits in memory and doesn't change often, load it once and keep it there.
Quick Reference
Parallel validation (using Python's asyncio.gather):
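import asyncio
async def run_checks(content):
    # check_1..check_3 stand for your own async checks; they run concurrently
    return await asyncio.gather(
        check_1(content),
        check_2(content),
        check_3(content),
    )
Tiered validation (block vs background):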
# Must pass now (fast checks)
validate_schema(content)        # 1ms
validate_format(content)        # 1ms
# Can pass later (slow checks), e.g. queued with a Celery task's .delay()
background_task.delay(content)  # 0ms now, runs later
Caching (remember answers):
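# cache and database are placeholders for your cache client and data layer
def product_exists(product_id):
    cache_key = f"product_exists:{product_id}"
    cached = cache.get(cache_key)
    if cached is not None:  # a cached False is still a hit
        return cached  # 1ms
    result = database.lookup(product_id)  # 100ms
    cache.set(cache_key, result, ttl=3600)  # remember for 1 hour
    return result
Summary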
THE PROBLEM:
Validation added 400ms to every AI response
Total time: 800ms (AI) + 400ms (validation) = 1200ms
WHY IT HAPPENED:
- Checks ran one at a time (sequential)
- Every check blocked the response
- Same checks repeated without caching
- Database lookups instead of memory lookups
THE FIX:
1. Run checks in parallel (355ms → 150ms)
2. Move non-critical checks to background (150ms → 5ms)
3. Cache repeated lookups (5ms → 2ms)
4. Use memory instead of database (2ms → <1ms)
THE RESULT:
400ms validation → 15ms validation
96% reduction in validation overhead
Don't skip validation. Make it faster.
