Which LLM For Which Task (And Why I Didn't Self-Host)
Lessons from building a platform that needed to pick the right model for each task. Spoiler: 'just use GPT-4' isn't a strategy.
Notes
Writing helps me think. These are notes on problems I've solved, experiments I've run, and things I wish I knew earlier.
What I learned building request control for a multi-service LLM platform. Real patterns, real mistakes.
SEO isn't just about rankings anymore. It's about visibility in AI Overviews, building trust signals, and understanding how machines interpret your content. Here's why every digital professional needs SEO now.
The March 2024 core update changed everything. Here's what I learned about surviving Google's algorithm changes, why AI content strategies failed, and the lessons that shaped modern SEO.
How I learned matrices, eigenvalues, and SVD by connecting them to neural networks, PCA, and LoRA fine-tuning. The practical understanding that made AI systems click.
How I learned to see vectors, dot products, and norms as practical tools for AI systems. The visual explanations that finally made embeddings and similarity click.
The mathematical foundations behind modern AI systems and why understanding them matters for building better applications. Part 1 of my learning journey.
A deep dive into debugging Celery worker crashes in production. How I fixed memory fragmentation and database connection leaks in Django/Python using max-tasks-per-child.
A practical guide to preventing retry storms in distributed systems. Learn exponential backoff, circuit breakers, and jitter strategies that protect your services from cascading failures.
A practical guide to preventing LLM hallucinations in production systems. Learn how to validate AI-generated content using Pydantic schemas, fallback chains, and output validation before it reaches your customers.
A practical guide to reducing LLM response validation latency. Learn parallel validation, tiered checking, caching strategies, and streaming validation to cut validation time from 400ms to 20ms.
A practical guide to sizing message queues and implementing backpressure. Learn how to prevent queue overflow, handle traffic spikes, and build systems that degrade gracefully under load.
A practical guide to understanding PostgreSQL full-text search limits. Learn when to stick with PostgreSQL and when to migrate to Elasticsearch, with real performance numbers from a 2M record vehicle search.
How we reduced data processing latency from 15 minutes to 30 seconds by switching from cron jobs to event-driven architecture with AWS Lambda and SQS. A practical guide to real-time data pipelines.
Auto-fill an Excel template with shapes? I tried 4 approaches. Only the dumbest one worked.
I built 5 dashboards. Leadership used 1. Here's what made the winner different and how I fixed the others.
A script that runs isn't the same as a script that works. How I made nightly jobs self-healing with retries, checkpoints, and loud failures.
I was becoming a human SQL interface for 50+ engineers. So I built a query parser that let them self-serve with typo suggestions.
Watching an analyst spend 4 hours on VLOOKUPs every Monday, I built a Python automation that found 4x more anomalies in 45 seconds.
How I reduced an auction listing page from 147 queries to 3 using select_related, prefetch_related, and annotations.
When polling caused duplicate bids in our live auction system, we learned the hard way why WebSockets matter for real-time features.
Building a daily ETL pipeline that downloads, deduplicates, transforms, and indexes millions of vehicle listings before users wake up.
Configuration over code: how I avoided 30 if-statements and built a maintainable search API with Elasticsearch and Django REST Framework.
Elasticsearch's default sharding spread our data randomly. Custom routing by vehicle make made searches 5x faster. Here's how.
When Pandas crashed processing 40 million rows, I discovered Miller CLI. Here's how streaming beats loading for massive CSV deduplication.