The Feature Request from Hell
"Users should be able to filter by make, model, year, price, mileage, body style, transmission, drivetrain, fuel type, color, features, location, radius, dealer rating, single owner, clean title, certified, and also sort by price, mileage, distance, or newest."
I counted. 30+ optional parameters.
Any combination of them. All optional.
SOME VALID SEARCHES:
/search?make=toyota
/search?make=toyota&model=camry&min_year=2018
/search?min_price=10000&max_price=30000&body_style=SUV
/search?zipcode=90210&radius=50&fuel_type=electric
/search?make=honda&transmission=automatic&drivetrain=awd&single_owner=true&sort_by=low_price
/search (no params - return everything)
I had nightmares about the if statements.
The Naive Approach (Don't Do This)
My first instinct was to build the query piece by piece:
def build_search_query(params):
    query = {"bool": {"must": [], "filter": []}}
    if params.get('make'):
        query['bool']['filter'].append(
            {"term": {"make": params['make']}}
        )
    if params.get('model'):
        query['bool']['filter'].append(
            {"term": {"model": params['model']}}
        )
    if params.get('min_year'):
        query['bool']['filter'].append(
            {"range": {"year": {"gte": params['min_year']}}}
        )
    if params.get('max_year'):
        query['bool']['filter'].append(
            {"range": {"year": {"lte": params['max_year']}}}
        )
    # ... 26 more if statements
    return query
This technically works. But:
- 150+ lines of repetitive if statements
- Easy to make typos
- Hard to maintain
- Every new filter = more spaghetti
The Pattern That Saved Me
I realized most filters fall into a few categories:
FILTER TYPES:
1. EXACT MATCH (term)
make=toyota → {"term": {"make": "toyota"}}
2. RANGE (gte/lte)
min_price=10000 → {"range": {"price": {"gte": 10000}}}
max_price=50000 → {"range": {"price": {"lte": 50000}}}
3. MULTIPLE VALUES (terms)
transmission=automatic,manual → {"terms": {"transmission": ["automatic", "manual"]}}
4. BOOLEAN FLAG
single_owner=true → {"term": {"single_owner": true}}
5. GEO DISTANCE
zipcode=90210&radius=50 → geo_distance query
Instead of 30 if statements, I created a configuration that describes each filter:
FILTER_CONFIG = {
    # Exact match filters
    "make": {"type": "term", "field": "build.make"},
    "model": {"type": "term", "field": "build.model"},
    "body_style": {"type": "term", "field": "build.body_type"},
    "drivetrain": {"type": "term", "field": "build.drivetrain"},
    "fuel_type": {"type": "term", "field": "build.fuel_type"},
    # Range filters (min/max pairs)
    "year": {"type": "range", "field": "build.year"},
    "price": {"type": "range", "field": "price"},
    "miles": {"type": "range", "field": "miles"},
    # Multi-value filters
    "transmission": {"type": "terms", "field": "build.transmission"},
    "exterior": {"type": "terms", "field": "exterior_color"},
    "interior": {"type": "terms", "field": "interior_color"},
    "feature": {"type": "terms", "field": "features"},
    # Boolean filters
    "single_owner": {"type": "bool", "field": "single_owner"},
    "clean_record": {"type": "bool", "field": "clean_title"},
    "certified": {"type": "bool", "field": "is_certified"},
}
Then ONE function to process them all:
def build_filters(params, config):
    filters = []
    for param_name, conf in config.items():
        field = conf["field"]
        filter_type = conf["type"]
        if filter_type == "range":
            # Range filters arrive as min_X / max_X, never as a bare X,
            # so they must be handled before the "value is None" check
            min_val = params.get(f"min_{param_name}")
            max_val = params.get(f"max_{param_name}")
            range_query = {}
            if min_val is not None:
                range_query["gte"] = min_val
            if max_val is not None:
                range_query["lte"] = max_val
            if range_query:
                filters.append({"range": {field: range_query}})
            continue
        value = params.get(param_name)
        if value is None:
            continue
        if filter_type == "term":
            filters.append({"term": {field: value}})
        elif filter_type == "terms":
            # Handle comma-separated values
            values = value.split(",") if isinstance(value, str) else value
            filters.append({"terms": {field: values}})
        elif filter_type == "bool":
            # Handle "true"/"false" strings
            bool_val = str(value).lower() == "true"
            filters.append({"term": {field: bool_val}})
    return filters
Now adding a new filter is ONE line in the config. No new code.
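One side effect of moving the logic into a config: a typo in the config (say, "trem" instead of "term") no longer raises anything, because the handler skips types it doesn't recognize. A small startup check can catch that. This is a minimal sketch, not part of the original code; validate_filter_config, KNOWN_FILTER_TYPES, and sample_config are hypothetical names used for illustration.

```python
# Filter types the generic handler knows how to build
KNOWN_FILTER_TYPES = {"term", "terms", "range", "bool"}


def validate_filter_config(config):
    """Return a list of human-readable problems found in a filter config."""
    problems = []
    for param_name, conf in config.items():
        filter_type = conf.get("type")
        if filter_type not in KNOWN_FILTER_TYPES:
            problems.append(f"{param_name}: unknown type {filter_type!r}")
        if not conf.get("field"):
            problems.append(f"{param_name}: missing 'field'")
    return problems


# Hypothetical config with two deliberate mistakes
sample_config = {
    "make": {"type": "term", "field": "build.make"},
    "model": {"type": "trem", "field": "build.model"},  # typo in type
    "price": {"type": "range"},                         # no field
}

for problem in validate_filter_config(sample_config):
    print(problem)
```

Running a check like this once at import time (or in a unit test) keeps the "one line per filter" promise honest: a bad line fails loudly instead of silently producing no filter.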
Handling Geo Search
Location search was trickier. Users send a zipcode, we need to search by coordinates.
USER SENDS: zipcode=90210&radius=50
WE NEED: lat=34.0901, lon=-118.4065, distance=50 miles
Step 1: Cache zipcode → coordinates
# We pre-loaded all US zipcodes into a lookup table
ZIPCODE_COORDS = {
    "90210": {"lat": 34.0901, "lon": -118.4065},
    "10001": {"lat": 40.7484, "lon": -73.9967},
    # ... 40,000+ zipcodes
}
Step 2: Build geo query
def build_geo_filter(zipcode, radius_miles):
    coords = ZIPCODE_COORDS.get(zipcode)
    if not coords:
        return None
    return {
        "geo_distance": {
            "distance": f"{radius_miles}mi",
            "location": {
                "lat": coords["lat"],
                "lon": coords["lon"]
            }
        }
    }
Step 3: Add distance to results
Users also want to see "25 miles away" on each listing. Elasticsearch can calculate this:
def add_distance_script(query, zipcode):
    coords = ZIPCODE_COORDS.get(zipcode)
    if not coords:
        return
    query["script_fields"] = {
        "distance": {
            "script": {
                # arcDistance returns meters; multiply to get miles
                "source": "doc['location'].arcDistance(params.lat, params.lon) * 0.000621371",
                "params": {"lat": coords["lat"], "lon": coords["lon"]}
            }
        }
    }
The 0.000621371 converts meters to miles.
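For anyone wondering where that constant comes from, or wanting to sanity-check the distances Elasticsearch returns, here is a rough offline sketch. It assumes a spherical Earth with a mean radius of 6,371 km, so results can differ slightly from what Elasticsearch computes; haversine_miles is a hypothetical helper, not part of the search code.

```python
import math

METERS_PER_MILE = 1609.344        # definition of the international mile
EARTH_RADIUS_METERS = 6_371_000   # mean radius; spherical-Earth assumption


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    meters = 2 * EARTH_RADIUS_METERS * math.asin(math.sqrt(a))
    return meters / METERS_PER_MILE


# The multiplier in the script is just 1 / 1609.344
print(round(1 / METERS_PER_MILE, 9))  # 0.000621371

# The two zipcodes from the lookup table: 90210 and 10001
beverly_hills = (34.0901, -118.4065)
manhattan = (40.7484, -73.9967)
print(round(haversine_miles(*beverly_hills, *manhattan)))
```

A check like this is handy in tests: if a listing's reported distance is wildly different from the haversine estimate, the lat/lon are probably swapped somewhere.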
The Sorting Problem
Seven sort options, six static and one geo-based:
SORT_OPTIONS = {
    "newest_listed": [{"listed_date": "desc"}],
    "oldest_listed": [{"listed_date": "asc"}],
    "low_price": [{"price": "asc"}],
    "high_price": [{"price": "desc"}],
    "lowest_miles": [{"miles": "asc"}],
    "highest_miles": [{"miles": "desc"}],
    "nearest_dist": [{"_geo_distance": {...}}],  # Special case
}
The geo sort is special because it needs the user's coordinates:
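def get_sort(sort_by, zipcode=None):
    if sort_by == "nearest_dist":
        coords = ZIPCODE_COORDS.get(zipcode) if zipcode else None
        if coords:
            return [{
                "_geo_distance": {
                    "location": coords,
                    "order": "asc",
                    "unit": "mi"
                }
            }]
        # No usable location: fall back to the default rather than
        # returning the placeholder entry from SORT_OPTIONS
        return SORT_OPTIONS["newest_listed"]
    return SORT_OPTIONS.get(sort_by, SORT_OPTIONS["newest_listed"])
Faceted Counts (Aggregations)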
Users see filter counts like "Automatic (2,345)" next to each option.
TRANSMISSION
☐ Automatic (2,345)
☐ Manual (234)
☐ CVT (567)
These need to update dynamically based on OTHER active filters.
If the user filters by "Toyota", the counts should show:
- Automatic Toyotas: 1,890
- Manual Toyotas: 156
This is done with Elasticsearch aggregations:
def build_aggregations():
    return {
        "transmissions": {
            "filters": {
                "filters": {
                    "Automatic": {"term": {"build.transmission": "Automatic"}},
                    "Manual": {"term": {"build.transmission": "Manual"}},
                    "CVT": {"term": {"build.transmission": "CVT"}}
                }
            },
            "aggs": {
                "count": {
                    "cardinality": {
                        "field": "vin",
                        "precision_threshold": 100
                    }
                }
            }
        },
        # Same pattern for body_style, fuel_type, etc.
    }
The cardinality aggregation counts unique VINs, not duplicate listings.
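Since every facet follows the same pattern, the config idea from the filters carries over. The sketch below generates the aggregation request from a small table and flattens the response back into counts. FACET_CONFIG, build_facet, and read_facet_counts are hypothetical names, and the fuel type labels are made up; the response shape assumed here is the one Elasticsearch returns for a named filters aggregation (a buckets object keyed by filter name).

```python
def build_facet(field, labels, unique_field="vin", precision_threshold=100):
    """One `filters` aggregation with a unique-count sub-aggregation."""
    return {
        "filters": {
            "filters": {label: {"term": {field: label}} for label in labels}
        },
        "aggs": {
            "count": {
                "cardinality": {
                    "field": unique_field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


# Hypothetical facet table: aggregation name -> (field, option labels)
FACET_CONFIG = {
    "transmissions": ("build.transmission", ["Automatic", "Manual", "CVT"]),
    "fuel_types": ("build.fuel_type", ["Gasoline", "Electric"]),
}


def build_aggregations(facet_config=FACET_CONFIG):
    return {
        name: build_facet(field, labels)
        for name, (field, labels) in facet_config.items()
    }


def read_facet_counts(aggregations):
    """Flatten the response into {facet: {label: unique_count}}."""
    return {
        name: {
            label: bucket["count"]["value"]
            for label, bucket in agg["buckets"].items()
        }
        for name, agg in aggregations.items()
    }


# Hand-written response fragment, for illustration only
sample_response = {
    "transmissions": {
        "buckets": {
            "Automatic": {"doc_count": 2010, "count": {"value": 1890}},
            "Manual": {"doc_count": 160, "count": {"value": 156}},
        }
    }
}
print(read_facet_counts(sample_response))
```

Note that doc_count and the cardinality value can differ: doc_count is the number of listings, while count is the number of distinct VINs among them.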
The Final Query Builder
Putting it all together:
def build_search_query(params):
    # 1. Start with base structure
    size = params.get("size", 40)
    page = params.get("page", 1)
    query = {
        "query": {
            "bool": {
                "must": [],
                "filter": []
            }
        },
        "size": size,
        # Offset must use the requested page size, not a hardcoded 40
        "from": (page - 1) * size
    }
    # 2. Build filters from config
    filters = build_filters(params, FILTER_CONFIG)
    query["query"]["bool"]["filter"].extend(filters)
    # 3. Add geo filter if zipcode provided
    if params.get("zipcode"):
        geo_filter = build_geo_filter(
            params["zipcode"],
            params.get("radius", 50)
        )
        if geo_filter:
            query["query"]["bool"]["filter"].append(geo_filter)
    # 4. Add sorting
    query["sort"] = get_sort(
        params.get("sort_by", "newest_listed"),
        params.get("zipcode")
    )
    # 5. Add distance calculation if location search
    if params.get("zipcode"):
        add_distance_script(query, params["zipcode"])
    # 6. Add aggregations for facet counts
    query["aggs"] = build_aggregations()
    return query
Clean. Maintainable. New filters = add to config, done.
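One detail worth guarding in the pagination: the offset is (page - 1) times the page size, and Elasticsearch rejects requests where from + size exceeds index.max_result_window, which defaults to 10,000. A minimal sketch of that guard follows; page_window and the MAX_PAGE_SIZE cap of 100 are assumptions for illustration, not values from the original code.

```python
MAX_RESULT_WINDOW = 10_000   # Elasticsearch default for index.max_result_window
DEFAULT_PAGE_SIZE = 40
MAX_PAGE_SIZE = 100          # hypothetical cap, pick what suits your UI


def page_window(page=1, size=DEFAULT_PAGE_SIZE):
    """Translate a 1-based page number into Elasticsearch from/size."""
    size = max(1, min(int(size), MAX_PAGE_SIZE))
    page = max(1, int(page))
    offset = (page - 1) * size
    if offset + size > MAX_RESULT_WINDOW:
        raise ValueError(
            f"page {page} is out of range: from + size must not exceed "
            f"{MAX_RESULT_WINDOW}"
        )
    return {"from": offset, "size": size}


print(page_window(3, 40))  # {'from': 80, 'size': 40}
```

Raising a ValueError here lets the API return a clean 400 instead of surfacing an Elasticsearch error for page 9,999.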
The API Endpoint
Django REST Framework made the endpoint simple:
from rest_framework.response import Response
from rest_framework.views import APIView

class VehicleSearchView(APIView):
    def post(self, request):
        # 1. Validate params
        serializer = SearchParamsSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        params = serializer.validated_data
        # 2. Build query
        query = build_search_query(params)
        # 3. Execute search (with routing if make specified)
        routing = get_routing(params.get("make"))
        response = es.search(
            index=INDEX_NAME,
            body=query,
            routing=routing
        )
        # 4. Format response
        return Response({
            "total": response["hits"]["total"]["value"],
            "results": [format_hit(h) for h in response["hits"]["hits"]],
            "facets": format_aggregations(response["aggregations"])
        })
Performance Optimizations
1. Filter Before Query
filter clauses skip relevance scoring and can be cached, so they are faster than must clauses:
# SLOWER - scores every document
{"bool": {"must": [{"term": {"make": "toyota"}}]}}
# FASTER - doesn't score, uses cache
{"bool": {"filter": [{"term": {"make": "toyota"}}]}}
We put everything in filter unless we need relevance scoring (like text search).
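To make the split concrete, here is a small sketch of a bool query that keeps structured filters in filter and only uses must when there is free text to score. build_bool_query is a hypothetical helper, and the "description" field is an assumed name for whatever text field the index actually has.

```python
def build_bool_query(structured_filters, text=None, text_field="description"):
    """Structured clauses go in `filter`; free text goes in `must`."""
    bool_query = {"filter": list(structured_filters)}
    if text:
        # Only the text clause contributes to the relevance score
        bool_query["must"] = [{"match": {text_field: text}}]
    return {"bool": bool_query}


filters = [
    {"term": {"build.make": "toyota"}},
    {"range": {"price": {"lte": 30000}}},
]

# Dropdown-only search: nothing is scored
print(build_bool_query(filters))

# Same filters plus a keyword box: results are ranked by the text match
print(build_bool_query(filters, text="panoramic sunroof"))
```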
2. Routing for Single-Make Searches
If searching for one make, only hit that shard:
if params.get("make") and "," not in params["make"]:
    routing = get_routing_key(params["make"])
    # Search only hits one shard instead of all 64
3. Aggregation Sampling
For facet counts, approximate is fine:
"cardinality": {
    "field": "vin",
    "precision_threshold": 100  # Faster, ~5% error at high counts
}
precision_threshold=100 is accurate up to ~100 unique values, then approximates. Users don't care if it's "12,345" vs "12,412".
Key Lessons
Lesson 1: Configuration Over Code
30 if statements = unmaintainable. A config dictionary + generic handler = clean.
Lesson 2: Validate Early
Reject bad input before it hits the database. Detailed error messages save debugging time.
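As a rough illustration of what "validate early" can look like, here is a framework-free sketch of the kind of cross-field checks a serializer might perform. It assumes numeric params have already been converted to numbers; validate_search_params and its error messages are hypothetical, not the actual SearchParamsSerializer.

```python
import re

ALLOWED_SORTS = {
    "newest_listed", "oldest_listed", "low_price", "high_price",
    "lowest_miles", "highest_miles", "nearest_dist",
}
RANGE_PARAMS = ("year", "price", "miles")
ZIPCODE_RE = re.compile(r"[0-9]{5}")


def validate_search_params(params):
    """Return {param: message} for every problem found; empty dict if OK."""
    errors = {}
    for name in RANGE_PARAMS:
        low = params.get(f"min_{name}")
        high = params.get(f"max_{name}")
        if low is not None and high is not None and low > high:
            errors[f"min_{name}"] = (
                f"min_{name} ({low}) is greater than max_{name} ({high})"
            )
    sort_by = params.get("sort_by")
    if sort_by is not None and sort_by not in ALLOWED_SORTS:
        errors["sort_by"] = f"unknown sort option {sort_by!r}"
    zipcode = params.get("zipcode")
    if zipcode is not None and not ZIPCODE_RE.fullmatch(str(zipcode)):
        errors["zipcode"] = "zipcode must be exactly 5 digits"
    if sort_by == "nearest_dist" and zipcode is None:
        errors["sort_by"] = "nearest_dist requires a zipcode"
    return errors


print(validate_search_params({"min_price": 30000, "max_price": 10000}))
```

Collecting all errors into one dict, instead of failing on the first, means the client can fix everything in one round trip.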
Lesson 3: Know Your Query Clauses
filter = fast, cached, no scoring
must = slower, scores documents
Use filter for structured data (dropdowns, checkboxes).
Use must for text search where relevance matters.
Lesson 4: Measure What Matters
We tracked:
- P95 response time by number of active filters
- Most common filter combinations
- Zero-result searches (users filtering too aggressively)
This data drove optimization priorities.
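For the first metric, a sketch of how P95 by filter count could be computed from raw timing samples using only the standard library. The sample data is made up, and p95_by_filter_count is a hypothetical helper; in practice these numbers would more likely come from a metrics system.

```python
from collections import defaultdict
from statistics import quantiles


def p95_by_filter_count(samples):
    """samples: iterable of (active_filter_count, response_ms) pairs."""
    grouped = defaultdict(list)
    for filter_count, response_ms in samples:
        grouped[filter_count].append(response_ms)

    result = {}
    for filter_count, times in sorted(grouped.items()):
        if len(times) < 2:
            # quantiles() needs at least two data points
            result[filter_count] = times[0]
        else:
            # n=20 yields 19 cut points; the last one is the 95th percentile
            result[filter_count] = quantiles(times, n=20, method="inclusive")[18]
    return result


# Made-up timings: (number of active filters, response time in ms)
samples = [(1, 40), (1, 55), (1, 48), (3, 90), (3, 120), (3, 310), (6, 400)]
print(p95_by_filter_count(samples))
```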
Quick Reference
Filter types cheatsheet:
# Exact match
{"term": {"field": "value"}}
# Multiple values (OR)
{"terms": {"field": ["val1", "val2"]}}
# Range
{"range": {"field": {"gte": 10, "lte": 100}}}
# Geo distance
{"geo_distance": {"distance": "50mi", "location": {"lat": 34, "lon": -118}}}
# Boolean
{"term": {"field": True}}
Aggregation pattern:
"aggs": {
    "my_facet": {
        "terms": {"field": "my_field", "size": 50}
    }
}
That's how you build a search API that doesn't collapse under its own complexity. Config-driven, validated, and fast.
Related Reading
- Why Did Toyota Searches Take 3x Longer? - How we made single-make searches 5x faster
- Why Did Our Search Get Slower? - Why we needed Elasticsearch in the first place
- Deduplicating 3 Million Records - Cleaning the data before indexing
