How Do You Build a Search API with 30+ Optional Filters?

The Feature Request from Hell

"Users should be able to filter by make, model, year, price, mileage, body style, transmission, drivetrain, fuel type, color, features, location, radius, dealer rating, single owner, clean title, certified, and also sort by price, mileage, distance, or newest."

I counted. 30+ optional parameters.

Any combination of them. All optional.

SOME VALID SEARCHES:
 
/search?make=toyota
/search?make=toyota&model=camry&min_year=2018
/search?min_price=10000&max_price=30000&body_style=SUV
/search?zipcode=90210&radius=50&fuel_type=electric
/search?make=honda&transmission=automatic&drivetrain=awd&single_owner=true&sort_by=low_price
/search (no params - return everything)

I had nightmares about the if statements.

The Naive Approach (Don't Do This)

My first instinct was to build the query piece by piece:

def build_search_query(params):
    query = {"bool": {"must": [], "filter": []}}
 
    if params.get('make'):
        query['bool']['filter'].append(
            {"term": {"make": params['make']}}
        )
 
    if params.get('model'):
        query['bool']['filter'].append(
            {"term": {"model": params['model']}}
        )
 
    if params.get('min_year'):
        query['bool']['filter'].append(
            {"range": {"year": {"gte": params['min_year']}}}
        )
 
    if params.get('max_year'):
        query['bool']['filter'].append(
            {"range": {"year": {"lte": params['max_year']}}}
        )
 
    # ... 26 more if statements
 
    return query

This technically works. But:

150+ lines of repetitive if statements
Easy to make typos
Hard to maintain
Every new filter = more spaghetti

The Pattern That Saved Me

I realized most filters fall into a few categories:

FILTER TYPES:
 
1. EXACT MATCH (term)
   make=toyota        →  {"term": {"make": "toyota"}}
 
2. RANGE (gte/lte)
   min_price=10000    →  {"range": {"price": {"gte": 10000}}}
   max_price=50000    →  {"range": {"price": {"lte": 50000}}}
 
3. MULTIPLE VALUES (terms)
   transmission=automatic,manual  →  {"terms": {"transmission": ["automatic", "manual"]}}
 
4. BOOLEAN FLAG
   single_owner=true  →  {"term": {"single_owner": true}}
 
5. GEO DISTANCE
   zipcode=90210&radius=50  →  geo_distance query

Instead of 30 if statements, I created a configuration that describes each filter:

FILTER_CONFIG = {
    # Exact match filters
    "make": {"type": "term", "field": "build.make"},
    "model": {"type": "term", "field": "build.model"},
    "body_style": {"type": "term", "field": "build.body_type"},
    "drivetrain": {"type": "term", "field": "build.drivetrain"},
    "fuel_type": {"type": "term", "field": "build.fuel_type"},
 
    # Range filters (min/max pairs)
    "year": {"type": "range", "field": "build.year"},
    "price": {"type": "range", "field": "price"},
    "miles": {"type": "range", "field": "miles"},
 
    # Multi-value filters
    "transmission": {"type": "terms", "field": "build.transmission"},
    "exterior": {"type": "terms", "field": "exterior_color"},
    "interior": {"type": "terms", "field": "interior_color"},
    "feature": {"type": "terms", "field": "features"},
 
    # Boolean filters
    "single_owner": {"type": "bool", "field": "single_owner"},
    "clean_record": {"type": "bool", "field": "clean_title"},
    "certified": {"type": "bool", "field": "is_certified"},
}

Then ONE function to process them all:

def build_filters(params, config):
    filters = []

    for param_name, conf in config.items():
        field = conf["field"]
        filter_type = conf["type"]

        if filter_type == "range":
            # Range filters never arrive as a bare "price" param, only as
            # min_price / max_price, so look those up before anything else
            min_val = params.get(f"min_{param_name}")
            max_val = params.get(f"max_{param_name}")

            range_query = {}
            if min_val is not None:
                range_query["gte"] = min_val
            if max_val is not None:
                range_query["lte"] = max_val

            if range_query:
                filters.append({"range": {field: range_query}})
            continue

        value = params.get(param_name)
        if value is None:
            continue

        if filter_type == "term":
            filters.append({"term": {field: value}})

        elif filter_type == "terms":
            # Handle comma-separated values
            values = value.split(",") if isinstance(value, str) else value
            filters.append({"terms": {field: values}})

        elif filter_type == "bool":
            # Handle "true"/"false" strings
            bool_val = str(value).lower() == "true"
            filters.append({"term": {field: bool_val}})

    return filters

Now adding a new filter is ONE line in the config. No new code.
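To make that concrete, here is a trimmed, self-contained sketch of the same idea run against a sample request. DEMO_CONFIG, demo_build_filters and the sample params are made up for illustration; the logic is a condensed version of the builder above, not the production code.

```python
# Trimmed config for illustration; field names follow FILTER_CONFIG.
DEMO_CONFIG = {
    "make": {"type": "term", "field": "build.make"},
    "price": {"type": "range", "field": "price"},
    "transmission": {"type": "terms", "field": "build.transmission"},
    "certified": {"type": "bool", "field": "is_certified"},
}


def demo_build_filters(params, config):
    """Condensed config-driven filter builder (illustrative only)."""
    filters = []
    for name, conf in config.items():
        field, kind = conf["field"], conf["type"]
        if kind == "range":
            # Range params arrive as min_<name> / max_<name>
            bounds = {}
            if params.get(f"min_{name}") is not None:
                bounds["gte"] = params[f"min_{name}"]
            if params.get(f"max_{name}") is not None:
                bounds["lte"] = params[f"max_{name}"]
            if bounds:
                filters.append({"range": {field: bounds}})
            continue
        value = params.get(name)
        if value is None:
            continue
        if kind == "term":
            filters.append({"term": {field: value}})
        elif kind == "terms":
            values = value.split(",") if isinstance(value, str) else list(value)
            filters.append({"terms": {field: values}})
        elif kind == "bool":
            filters.append({"term": {field: str(value).lower() == "true"}})
    return filters


sample_params = {
    "make": "toyota",
    "min_price": 10000,
    "transmission": "automatic,manual",
    "certified": "true",
}
demo_filters = demo_build_filters(sample_params, DEMO_CONFIG)
# demo_filters holds one clause per active filter, in config order
```

Supporting another exact-match filter in this sketch means adding one more entry to DEMO_CONFIG; the function itself stays untouched.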

Handling Geo Search

Location search was trickier. Users send a zipcode, but Elasticsearch needs coordinates to search by distance.

USER SENDS:    zipcode=90210&radius=50
WE NEED:       lat=34.0901, lon=-118.4065, distance=50 miles

Step 1: Cache zipcode → coordinates

# We pre-loaded all US zipcodes into a lookup table
ZIPCODE_COORDS = {
    "90210": {"lat": 34.0901, "lon": -118.4065},
    "10001": {"lat": 40.7484, "lon": -73.9967},
    # ... 40,000+ zipcodes
}

Step 2: Build geo query

def build_geo_filter(zipcode, radius_miles):
    coords = ZIPCODE_COORDS.get(zipcode)
    if not coords:
        return None
 
    return {
        "geo_distance": {
            "distance": f"{radius_miles}mi",
            "location": {
                "lat": coords["lat"],
                "lon": coords["lon"]
            }
        }
    }

Step 3: Add distance to results

Users also want to see "25 miles away" on each listing. Elasticsearch can calculate this:

def add_distance_script(query, zipcode):
    coords = ZIPCODE_COORDS.get(zipcode)
    if not coords:
        return
 
    query["script_fields"] = {
        "distance": {
            "script": {
                "source": """
                    doc['location'].arcDistance(params.lat, params.lon) * 0.000621371
                """,
                "params": {"lat": coords["lat"], "lon": coords["lon"]}
            }
        }
    }

The 0.000621371 converts meters to miles.
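For anyone wondering where that constant comes from, it is the reciprocal of the number of meters in a mile. A quick sanity check in plain Python (the helper name meters_to_miles is mine, not part of the search code):

```python
METERS_PER_MILE = 1609.344  # international mile, defined as exactly 1609.344 m


def meters_to_miles(meters):
    """Convert a distance in meters to statute miles."""
    return meters / METERS_PER_MILE


# The multiplier in the script is the reciprocal, rounded to 9 decimals
factor = round(1 / METERS_PER_MILE, 9)

# A 50-mile radius expressed in meters converts back to 50 miles
fifty_miles = meters_to_miles(80467.2)
```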

The Sorting Problem

Seven sort options:

SORT_OPTIONS = {
    "newest_listed": [{"listed_date": "desc"}],
    "oldest_listed": [{"listed_date": "asc"}],
    "low_price": [{"price": "asc"}],
    "high_price": [{"price": "desc"}],
    "lowest_miles": [{"miles": "asc"}],
    "highest_miles": [{"miles": "desc"}],
    "nearest_dist": [{"_geo_distance": {...}}],  # Special case
}

The geo sort is special because it needs the user's coordinates:

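def get_sort(sort_by, zipcode=None):
    if sort_by == "nearest_dist":
        coords = ZIPCODE_COORDS.get(zipcode) if zipcode else None
        if coords:
            return [{
                "_geo_distance": {
                    "location": coords,
                    "order": "asc",
                    "unit": "mi"
                }
            }]
        # No usable location: fall back to the default rather than
        # returning the placeholder entry from SORT_OPTIONS
        return SORT_OPTIONS["newest_listed"]

    return SORT_OPTIONS.get(sort_by, SORT_OPTIONS["newest_listed"])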

Faceted Counts (Aggregations)

Users see filter counts like "Automatic (2,345)" next to each option.

TRANSMISSION
☐ Automatic (2,345)
☐ Manual (234)
☐ CVT (567)

These need to update dynamically based on OTHER active filters.

If the user filters by "Toyota", the counts should show:

Automatic Toyotas: 1,890
Manual Toyotas: 156

This is done with Elasticsearch aggregations:

def build_aggregations():
    return {
        "transmissions": {
            "filters": {
                "filters": {
                    "Automatic": {"term": {"build.transmission": "Automatic"}},
                    "Manual": {"term": {"build.transmission": "Manual"}},
                    "CVT": {"term": {"build.transmission": "CVT"}}
                }
            },
            "aggs": {
                "count": {
                    "cardinality": {
                        "field": "vin",
                        "precision_threshold": 100
                    }
                }
            }
        },
        # Same pattern for body_style, fuel_type, etc.
    }

The cardinality aggregation counts unique VINs, not duplicate listings.
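The response side is worth a look too. The endpoint later in this post calls format_aggregations without showing it, so the version below is only a guess at its shape: it assumes the named-bucket layout Elasticsearch returns for a filters aggregation, with the cardinality sub-aggregation under "count". The sample numbers are invented.

```python
def format_aggregations(aggregations):
    """Flatten a filters-aggregation response into {facet: {label: count}}."""
    facets = {}
    for facet_name, facet in aggregations.items():
        buckets = facet.get("buckets", {})
        facets[facet_name] = {
            # Prefer the unique-VIN count; fall back to the raw doc_count
            label: bucket.get("count", {}).get("value", bucket["doc_count"])
            for label, bucket in buckets.items()
        }
    return facets


# Hand-written sample in the shape returned for named filters
sample_aggs = {
    "transmissions": {
        "buckets": {
            "Automatic": {"doc_count": 1950, "count": {"value": 1890}},
            "Manual": {"doc_count": 160, "count": {"value": 156}},
        }
    }
}
facets = format_aggregations(sample_aggs)
```

In this sample the raw doc_count is higher than the VIN count, which is exactly the duplicate-listing gap the cardinality sub-aggregation is there to hide.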

The Final Query Builder

Putting it all together:

def build_search_query(params):
    # 1. Start with base structure
    size = params.get("size", 40)
    query = {
        "query": {
            "bool": {
                "must": [],
                "filter": []
            }
        },
        "size": size,
        # Offset by the requested page size, not a hard-coded 40
        "from": (params.get("page", 1) - 1) * size
    }
 
    # 2. Build filters from config
    filters = build_filters(params, FILTER_CONFIG)
    query["query"]["bool"]["filter"].extend(filters)
 
    # 3. Add geo filter if zipcode provided
    if params.get("zipcode"):
        geo_filter = build_geo_filter(
            params["zipcode"],
            params.get("radius", 50)
        )
        if geo_filter:
            query["query"]["bool"]["filter"].append(geo_filter)
 
    # 4. Add sorting
    query["sort"] = get_sort(
        params.get("sort_by", "newest_listed"),
        params.get("zipcode")
    )
 
    # 5. Add distance calculation if location search
    if params.get("zipcode"):
        add_distance_script(query, params["zipcode"])
 
    # 6. Add aggregations for facet counts
    query["aggs"] = build_aggregations()
 
    return query

Clean. Maintainable. New filters = add to config, done.

The API Endpoint

Django REST Framework made the endpoint simple:

from rest_framework.response import Response
from rest_framework.views import APIView


class VehicleSearchView(APIView):
    def post(self, request):
        # 1. Validate params
        serializer = SearchParamsSerializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        params = serializer.validated_data
 
        # 2. Build query
        query = build_search_query(params)
 
        # 3. Execute search (with routing if make specified)
        routing = get_routing(params.get("make"))
        response = es.search(
            index=INDEX_NAME,
            body=query,
            routing=routing
        )
 
        # 4. Format response
        return Response({
            "total": response["hits"]["total"]["value"],
            "results": [format_hit(h) for h in response["hits"]["hits"]],
            "facets": format_aggregations(response["aggregations"])
        })
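The format_hit helper is not shown above. A minimal sketch of what it might look like, assuming each hit carries the document under "_source" and the computed distance under "fields" (script field values come back as lists); the output key names id and distance_miles are my own choice:

```python
def format_hit(hit):
    """Merge the stored document with the computed distance, if present."""
    result = dict(hit.get("_source", {}))
    result["id"] = hit.get("_id")
    # Script fields are returned as single-element lists under "fields"
    distance = hit.get("fields", {}).get("distance")
    if distance:
        result["distance_miles"] = round(distance[0], 1)
    return result


# Hand-written sample hit for illustration
sample_hit = {
    "_id": "abc123",
    "_source": {"price": 21500, "miles": 30210},
    "fields": {"distance": [24.9712]},
}
formatted = format_hit(sample_hit)
```

Hits from searches without a zipcode have no "fields" entry, so the distance key is simply left out.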

Performance Optimizations

1. Filter Before Query

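Clauses in filter context skip relevance scoring, and Elasticsearch can cache frequently used ones, so they are usually faster than the same clauses under must: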

# SLOWER - scores every document
{"bool": {"must": [{"term": {"make": "toyota"}}]}}
 
# FASTER - doesn't score, uses cache
{"bool": {"filter": [{"term": {"make": "toyota"}}]}}

We put everything in filter unless we need relevance scoring (like text search).
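When a search does include free text, the two clause types can sit side by side. A small sketch, where build_text_search and the "description" field are hypothetical names used only for illustration:

```python
def build_text_search(text, filters):
    """Score on free text in must; keep structured filters unscored."""
    bool_query = {"must": [], "filter": list(filters)}
    if text:
        # "description" stands in for whatever analyzed text field exists
        bool_query["must"].append({"match": {"description": text}})
    return {"query": {"bool": bool_query}}


text_query = build_text_search(
    "leather seats",
    [{"term": {"build.make": "toyota"}}],
)
```

Only the match clause contributes to the score; the make filter just narrows the candidate set.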

2. Routing for Single-Make Searches

If documents are indexed with the make as their routing value, a single-make search only needs to hit that make's shard:

if params.get("make") and "," not in params["make"]:
    routing = get_routing(params["make"])
    # Search only hits one shard instead of all 64
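The routing helper itself is not shown in this post. One plausible shape, assuming the routing value is just the normalized make and that documents were indexed with the same value (both are assumptions on my part):

```python
def get_routing(make):
    """Return a routing key for single-make searches, else None."""
    if not make or "," in make:
        # No make, or several makes: let the search fan out to all shards
        return None
    # Normalize so "Toyota" and " toyota" route to the same shard
    return make.strip().lower()
```

The important part is that the search-time key matches the index-time key exactly; otherwise the search lands on the wrong shard and quietly returns fewer results.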

3. Approximate Aggregation Counts

For facet counts, approximate is fine:

"cardinality": {
    "field": "vin",
    "precision_threshold": 100  # Faster, ~5% error at high counts
}

With precision_threshold=100, counts are close to exact up to roughly 100 unique values and approximate beyond that. Users don't care if it's "12,345" vs "12,412".

Key Lessons

Lesson 1: Configuration Over Code

30 if statements = unmaintainable. A config dictionary + generic handler = clean.

Lesson 2: Validate Early

Reject bad input before it hits the database. Detailed error messages save debugging time.
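The SearchParamsSerializer used in the endpoint is not shown here, so the following is a plain-Python stand-in for the same idea rather than the real serializer. The function name, error class and the subset of parameters checked are all illustrative.

```python
ALLOWED_SORTS = {
    "newest_listed", "oldest_listed", "low_price", "high_price",
    "lowest_miles", "highest_miles", "nearest_dist",
}
INT_PARAMS = ("min_price", "max_price", "min_year", "max_year")


class SearchValidationError(ValueError):
    """Raised with a dict of {param: message} as its first argument."""


def validate_search_params(raw):
    """Coerce and check a few search params before any query is built."""
    errors, clean = {}, {}
    for name in INT_PARAMS:
        if name in raw:
            try:
                clean[name] = int(raw[name])
            except (TypeError, ValueError):
                errors[name] = f"must be an integer, got {raw[name]!r}"
    if "min_price" in clean and "max_price" in clean:
        if clean["min_price"] > clean["max_price"]:
            errors["price"] = "min_price cannot exceed max_price"
    sort_by = raw.get("sort_by", "newest_listed")
    if sort_by in ALLOWED_SORTS:
        clean["sort_by"] = sort_by
    else:
        errors["sort_by"] = f"unknown sort option {sort_by!r}"
    if errors:
        raise SearchValidationError(errors)
    return clean
```

Collecting every problem into one dict, instead of failing on the first, is what makes the error response useful to whoever is debugging the client.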

Lesson 3: Know Your Query Clauses

filter = fast, cached, no scoring
must = slower, scores documents

Use filter for structured data (dropdowns, checkboxes). Use must for text search where relevance matters.

Lesson 4: Measure What Matters

We tracked:

P95 response time by number of active filters
Most common filter combinations
Zero-result searches (users filtering too aggressively)

This data drove optimization priorities.
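How those numbers get collected depends on your monitoring stack, which this post doesn't cover. As a rough in-memory sketch of the three measurements (the SearchMetrics class and its method names are hypothetical, and a real setup would more likely ship these to a metrics system):

```python
from collections import Counter, defaultdict


class SearchMetrics:
    """Toy in-memory tracker for the three measurements listed above."""

    def __init__(self):
        self.latencies_ms = defaultdict(list)  # active filter count -> latencies
        self.combos = Counter()                # sorted filter names -> hits
        self.total = 0
        self.zero_results = 0

    def record(self, params, took_ms, total_hits):
        active = tuple(sorted(k for k, v in params.items() if v is not None))
        self.latencies_ms[len(active)].append(took_ms)
        self.combos[active] += 1
        self.total += 1
        if total_hits == 0:
            self.zero_results += 1

    def p95(self, filter_count):
        """Nearest-rank 95th percentile for searches with N active filters."""
        samples = sorted(self.latencies_ms.get(filter_count, []))
        if not samples:
            return None
        rank = (95 * len(samples) + 99) // 100  # ceil(0.95 * n) in integers
        return samples[rank - 1]

    def zero_result_rate(self):
        return self.zero_results / self.total if self.total else 0.0
```

Calling record once per request with the validated params, the elapsed time and the hit total is enough to answer all three questions.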

Quick Reference

Filter types cheatsheet:

# Exact match
{"term": {"field": "value"}}
 
# Multiple values (OR)
{"terms": {"field": ["val1", "val2"]}}
 
# Range
{"range": {"field": {"gte": 10, "lte": 100}}}
 
# Geo distance
{"geo_distance": {"distance": "50mi", "location": {"lat": 34, "lon": -118}}}
 
# Boolean
{"term": {"field": True}}

Aggregation pattern:

"aggs": {
    "my_facet": {
        "terms": {"field": "my_field", "size": 50}
    }
}

That's how you build a search API that doesn't collapse under its own complexity. Config-driven, validated, and fast.
