The Grafana Alert Nobody Expected
Tuesday morning. Coffee in hand. Then the Slack notification:
"API response time alert: /api/auctions/ averaging 4.2 seconds"
4.2 seconds. For a list of 50 auctions.
I added some debug logging:
```python
from django.db import connection

# At the end of the view
print(f"Queries executed: {len(connection.queries)}")
```

Output:

```
Queries executed: 147
```

147 queries to show 50 auctions.
That's not a database query. That's a denial-of-service attack against your own server.
Understanding N+1: The Silent Performance Killer
N+1 queries happen when you fetch a list of things (1 query), then fetch related data for each thing (N queries).
```
N+1 QUERY PATTERN:

Query 1:  Get all auctions (50 results)
Query 2:  Get car for auction 1
Query 3:  Get car for auction 2
Query 4:  Get car for auction 3
...
Query 51: Get car for auction 50
Query 52: Get latest bid for auction 1
Query 53: Check if bids exist for auction 1
Query 54: Get latest bid for auction 2
Query 55: Check if bids exist for auction 2
...
Total: 1 + 50 + (50 × 2) = 151 queries
```

Django's ORM makes this easy to do by accident. The lazy loading that makes development fast makes production slow.
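You can confirm the pattern without guesswork: Django's CaptureQueriesContext counts every query a block runs. A minimal sketch, assuming the Auction model described below:

```python
from django.db import connection
from django.test.utils import CaptureQueriesContext

from auction.models import Auction

with CaptureQueriesContext(connection) as ctx:
    for auction in Auction.objects.all():  # 1 query
        _ = auction.car.make               # +1 query per auction
        _ = auction.get_bid                # +2 queries per auction

print(f"Queries executed: {len(ctx)}")     # grows linearly with auction count
```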
Where We Found N+1 Queries
Problem 1: The Auction Model Property
```python
# auction/models.py
class Auction(models.Model):
    car = models.ForeignKey(Car, on_delete=models.CASCADE)
    min_desired_price = models.DecimalField(max_digits=10, decimal_places=2)
    current_price = models.DecimalField(max_digits=10, decimal_places=2, null=True)

    @property
    def get_bid(self):
        from auction.models import Bid
        # TWO QUERIES PER AUCTION!
        if Bid.objects.filter(auction=self).exists():  # Query 1: EXISTS
            latest_bid = Bid.objects.filter(auction=self).latest('bid_time')  # Query 2: SELECT
            return int(latest_bid.bid_amount)
        else:
            return int(self.min_desired_price)
```

This property gets called during serialization. 50 auctions × 2 queries = 100 extra queries.
Problem 2: The Serializer Did It Too
```python
# auction/ser.py
class AuctionSerializer(serializers.ModelSerializer):
    latest_bid = serializers.SerializerMethodField()

    def get_latest_bid(self, obj):
        # SAME PATTERN, SAME PROBLEM
        if Bid.objects.filter(auction=obj).exists():
            latest_bid = Bid.objects.filter(auction=obj).latest('bid_time')
            return latest_bid.bid_amount
        else:
            return obj.min_desired_price
```

If both the model property AND the serializer method get called, that's 4 queries per auction.
Problem 3: The View Loop
```python
# auction/views.py
def filter_auctions(request):
    nearby_auctions = []
    for auction in Auction.objects.all():  # 1 query
        car = auction.car  # N queries (1 per auction)
        car_lat, car_lng = get_geocode(car.zip_code)
        distance = haversine(target_lng, target_lat, car_lng, car_lat)
        if distance <= 100:
            nearby_auctions.append(auction)
    return nearby_auctions
```

Looping through a queryset and accessing auction.car triggers a query for each auction.
The Fix: select_related and prefetch_related
Django provides two tools to kill N+1 queries:
- select_related: For ForeignKey/OneToOne (single object). Uses a SQL JOIN; one query gets everything.
- prefetch_related: For ManyToMany/reverse ForeignKey (multiple objects). Uses a separate query plus in-Python joining; two queries total, regardless of row count.

Fix 1: Add select_related in Views
```python
# BEFORE: N+1 queries
auctions = Auction.objects.filter(approved_status='APPROVED')
for auction in auctions:
    print(auction.car.make)  # Query per auction!

# AFTER: 1 query with JOIN
auctions = Auction.objects.filter(approved_status='APPROVED').select_related('car', 'seller')
for auction in auctions:
    print(auction.car.make)  # Already loaded!
```

One query:

```sql
SELECT auction.*, car.*, seller.*
FROM auction
LEFT JOIN car ON auction.car_id = car.id
LEFT JOIN user ON auction.seller_id = user.id
WHERE auction.approved_status = 'APPROVED'
```

Fix 2: Prefetch Related Bids
```python
# BEFORE: 2 queries per auction for bids
auctions = Auction.objects.all()

# AFTER: 2 queries TOTAL
from django.db.models import Prefetch

auctions = Auction.objects.all().prefetch_related(
    Prefetch(
        'bid_set',
        queryset=Bid.objects.order_by('-bid_time'),
        to_attr='ordered_bids'
    )
)

for auction in auctions:
    latest_bid = auction.ordered_bids[0] if auction.ordered_bids else None
```

Query 1 gets all auctions. Query 2 gets all bids for those auctions (one query, all bids).
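With the prefetch in place, the serializer method from Problem 2 can read the in-memory list instead of hitting the database. A sketch of how it might look:

```python
class AuctionSerializer(serializers.ModelSerializer):
    latest_bid = serializers.SerializerMethodField()

    def get_latest_bid(self, obj):
        # ordered_bids was attached by Prefetch(to_attr='ordered_bids') above,
        # newest first; reading it triggers no query.
        ordered_bids = getattr(obj, 'ordered_bids', [])
        if ordered_bids:
            return ordered_bids[0].bid_amount
        return obj.min_desired_price
```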
Fix 3: Annotate Instead of Property
Instead of a property that queries, use annotation:
```python
# BEFORE: Property with N+1
class Auction(models.Model):
    @property
    def get_bid(self):
        if Bid.objects.filter(auction=self).exists():
            return Bid.objects.filter(auction=self).latest('bid_time').bid_amount
        return self.min_desired_price

# AFTER: Annotation in queryset
from django.db.models import Max
from django.db.models.functions import Coalesce  # Coalesce lives in functions, not models

auctions = Auction.objects.annotate(
    latest_bid_amount=Coalesce(
        Max('bid__bid_amount'),
        'min_desired_price'
    )
)

for auction in auctions:
    print(auction.latest_bid_amount)  # No extra query!
```

The database calculates latest_bid_amount in the original query. Zero extra queries.
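For the curious, the annotation compiles to roughly this SQL (simplified; Django's actual aliases and grouping differ slightly):

```sql
SELECT auction.*,
       COALESCE(MAX(bid.bid_amount), auction.min_desired_price) AS latest_bid_amount
FROM auction
LEFT JOIN bid ON bid.auction_id = auction.id
GROUP BY auction.id
```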
Fix 4: Annotate Message Counts
```python
# BEFORE: N+1 on messages_count property
chat_rooms = ChatRoom.objects.all()
for room in chat_rooms:
    print(room.messages_count())  # Query per room!

# AFTER: Annotation
from django.db.models import Count

chat_rooms = ChatRoom.objects.annotate(
    message_count=Count('message')
)
for room in chat_rooms:
    print(room.message_count)  # Already calculated!
```

Custom Middleware We Built
Beyond ORM fixes, we built middleware to monitor and optimize requests.
Middleware 1: TimingMiddleware (Request Profiling)
```python
# user_stats/middleware.py
import time

from django.utils.deprecation import MiddlewareMixin

class TimingMiddleware(MiddlewareMixin):
    """Measures request processing time"""

    def process_request(self, request):
        request._timing_start = time.time()

    def process_response(self, request, response):
        if hasattr(request, '_timing_start'):
            elapsed = time.time() - request._timing_start
            response['X-Elapsed-Time'] = f"{elapsed:.4f}"
        return response
```

Every response now has an X-Elapsed-Time header. We piped these to Grafana.
SAMPLE HEADERS:

```
HTTP/1.1 200 OK
Content-Type: application/json
X-Elapsed-Time: 0.0234
```

When that number spikes, we investigate.
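A quick sanity check that the header is being set; a sketch using Django's test client against the auctions endpoint from earlier:

```python
from django.test import TestCase

class TimingMiddlewareTest(TestCase):
    def test_elapsed_time_header_present(self):
        response = self.client.get('/api/auctions/')
        # The middleware should stamp every response
        self.assertIn('X-Elapsed-Time', response)
        self.assertGreaterEqual(float(response['X-Elapsed-Time']), 0.0)
```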
Caching Strategies That Actually Helped
Strategy 1: Redis Cache Configuration
```python
# ecar/settings.py
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": REDIS_URL,
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            "CONNECTION_POOL_KWARGS": {
                "max_connections": 100  # Connection pooling
            },
        },
    }
}
```

Connection pooling is critical. Without it, each cache operation opens a new connection. With it, connections are reused.
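A quick smoke test that the Redis backend is wired up (a sketch; run it in a Django shell):

```python
from django.core.cache import cache

cache.set('smoke_test', 'ok', timeout=30)
assert cache.get('smoke_test') == 'ok'  # round-trips through Redis
```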
Strategy 2: Geocoding Cache (7-Day TTL)
```python
# front_web/cars/query.py
import requests

from django.core.cache import cache

def get_geocode(zipcode):
    cache_key = f'geocode_{zipcode}'
    cached_geocode = cache.get(cache_key)
    if cached_geocode is not None:
        return cached_geocode  # Cache hit!

    # Cache miss - call Google Maps API (API_KEY defined elsewhere in settings)
    response = requests.get(
        f'https://maps.googleapis.com/maps/api/geocode/json?address={zipcode}&key={API_KEY}'
    )
    data = response.json()
    lat = data['results'][0]['geometry']['location']['lat']
    lng = data['results'][0]['geometry']['location']['lng']

    # Cache for 7 days (zipcodes don't move)
    cache.set(cache_key, (lat, lng), 60 * 60 * 24 * 7)
    return lat, lng
```

Zipcodes don't move. Cache them for a week.
Before caching: 1,000 searches/day × $0.005/request = $5/day in API costs. After caching: ~50 unique zipcodes × $0.005 = $0.25/day.
Fixing Race Conditions with F() and select_for_update()
Beyond N+1, we had race conditions in auction updates.
The Problem: Counter Increment Race
```python
# BROKEN: Race condition on counter
def close_auction(auction_id):
    auction = Auction.objects.get(id=auction_id)
    auction.car.auction_count += 1  # Read-modify-write
    auction.car.save()
```

Two requests arrive simultaneously:

- Both read auction_count = 5
- Both calculate 5 + 1 = 6
- Both save auction_count = 6. Should be 7!
The Fix: F() Expression
```python
# FIXED: Atomic increment
from django.db.models import F

def close_auction(auction_id):
    auction = Auction.objects.get(id=auction_id)
    auction.car.auction_count = F('auction_count') + 1
    auction.car.save()
    auction.car.refresh_from_db()  # Get the updated value
```

F('auction_count') + 1 translates to:

```sql
UPDATE car SET auction_count = auction_count + 1 WHERE id = 123
```

The database handles the increment atomically. No race condition.
The Complete Fix: Transaction + Row Lock
```python
# auction/task.py
from celery import shared_task

from django.db import transaction
from django.db.models import F

@shared_task
def close_auction(auction_id):
    try:
        with transaction.atomic():
            # Lock the row - other transactions wait
            auction = Auction.objects.select_for_update().get(id=auction_id)
            auction.status = 'CLOSED'
            auction.save()

            # Atomic increment
            auction.car.auction_count = F('auction_count') + 1
            auction.car.save()

            # Refresh to get actual value
            auction.car.refresh_from_db()
        return f"Auction {auction_id} closed."
    except Auction.DoesNotExist:
        return f"Auction {auction_id} not found"
```

select_for_update() + transaction.atomic() = bulletproof.
The Results
After all optimizations:
| Metric | Before | After |
|---|---|---|
| Queries per auction list | 147 | 3 |
| Response time | 4.2 seconds | 180ms |
| Database CPU | 85% | 15% |
| Google Maps API cost | $150/month | $12/month |
| Cache hit rate | 0% | 94% |
The auction list page went from 4.2 seconds to 180ms. Users stopped complaining.
How to Find N+1 Queries in Your Code
Tool 1: Django Debug Toolbar
```python
# settings.py (dev only!)
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
```

Shows query count and time on every page. If you see 100+ queries, you have N+1.
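Two more pieces Debug Toolbar needs before the panel appears: INTERNAL_IPS and a URL entry:

```python
# settings.py (dev only!)
INTERNAL_IPS = ['127.0.0.1']

# urls.py
from django.urls import include, path

urlpatterns += [
    path('__debug__/', include('debug_toolbar.urls')),
]
```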
Tool 2: Query Logging
```python
# settings.py
LOGGING = {
    'version': 1,  # required by dictConfig
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',
            'handlers': ['console'],
        },
    },
}
```

Logs every SQL query. Watch for patterns like:
```sql
SELECT * FROM car WHERE id = 1
SELECT * FROM car WHERE id = 2
SELECT * FROM car WHERE id = 3
...
```

That's N+1.
Tool 3: assertNumQueries in Tests
```python
def test_auction_list_queries(self):
    # Create test data
    create_auctions(50)

    # Should be 3 queries max, not 150
    with self.assertNumQueries(3):
        response = self.client.get('/api/auctions/')
```

Catch N+1 in CI before it hits production.
Key Lessons
Lesson 1: Properties Are Dangerous
Model properties that query the database are N+1 factories. Use annotations instead.
```python
# BAD: Property with query
@property
def bid_count(self):
    return self.bid_set.count()

# GOOD: Annotation in queryset
auctions = Auction.objects.annotate(bid_count=Count('bid'))
```

Lesson 2: select_related for ForeignKey
Anytime you access obj.related_object, use select_related:

```python
Auction.objects.select_related('car', 'seller').all()
```

Lesson 3: prefetch_related for Reverse Relations
Anytime you access obj.related_set.all(), use prefetch_related:

```python
Auction.objects.prefetch_related('bid_set').all()
```

Lesson 4: Cache External API Calls
If you're calling an external API with the same inputs, cache it:

```python
result = cache.get(key)
if result is None:
    result = expensive_api_call()
    cache.set(key, result, ttl)
```

Lesson 5: Offload to Background Tasks
Analytics, logging, emails: anything that doesn't affect the response can wait:

```python
track_visit.delay(user_id, page)  # Celery handles it
return Response(data)             # User doesn't wait
```

Quick Reference
select_related (ForeignKey/OneToOne):

```python
Auction.objects.select_related('car', 'seller')
```

prefetch_related (ManyToMany/reverse FK):

```python
Auction.objects.prefetch_related('bid_set')
```

Annotation instead of property:

```python
Auction.objects.annotate(bid_count=Count('bid'))
```

Atomic increment:

```python
obj.counter = F('counter') + 1
obj.save()
```

Row locking:

```python
with transaction.atomic():
    obj = Model.objects.select_for_update().get(id=id)
```

Cache pattern:

```python
result = cache.get(key)
if result is None:
    result = expensive_call()
    cache.set(key, result, ttl)
```

That's how we went from 147 queries to 3, and why our database finally stopped screaming.
Related Reading
- Why Did Two Users Bid the Same Amount? - Race conditions in the auction system
- Processing 200GB Daily - The ETL pipeline feeding this database
- Elasticsearch Routing by Make - When PostgreSQL wasn't enough
