
Why Did Toyota Searches Take 3x Longer Than Lamborghini?

#Elasticsearch #Performance #Architecture #Search #Database

The Complaint That Confused Me

A week after launch, our analytics showed something weird:

Average search response time by make:
 
Lamborghini:    45ms
Ferrari:        52ms
Rivian:         61ms
...
Honda:          380ms
Toyota:         420ms
Ford:           395ms

Searching for a Toyota took roughly 9x as long as searching for a Lamborghini.

Same code. Same index. Same Elasticsearch cluster. Why?


How Elasticsearch Stores Data

Before I explain the problem, let me explain how Elasticsearch works.

When you create an index, Elasticsearch splits it into shards. Think of shards like filing cabinets:

ELASTICSEARCH INDEX: "cars"
 
┌─────────────────────────────────────────────────────┐
│                                                     │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐            │
│   │ Shard 0 │  │ Shard 1 │  │ Shard 2 │            │
│   │         │  │         │  │         │            │
│   │  Docs   │  │  Docs   │  │  Docs   │            │
│   └─────────┘  └─────────┘  └─────────┘            │
│                                                     │
└─────────────────────────────────────────────────────┘

By default, when you add a document, Elasticsearch decides which shard to put it in using a hash of the document ID:

shard = hash(document_id) % number_of_shards

Sounds fair, right? Random distribution. Each shard gets roughly the same amount of data.
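
Here's a toy sketch of that formula (using Python's hashlib as a stand-in for Elasticsearch's murmur3 hash; the IDs and shard count are made up):

import hashlib

NUM_SHARDS = 5

def shard_for(doc_id: str) -> int:
    # Stand-in for Elasticsearch's murmur3 hash of the routing value
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Ten Toyota listings scatter across (nearly) every shard
toyota_ids = [f"toyota-{i}" for i in range(10)]
print(sorted({shard_for(doc_id) for doc_id in toyota_ids}))
# e.g. [0, 1, 2, 3, 4] -- every shard ends up holding some Toyotas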


The Problem: Our Data Wasn't Random

Here's what was actually in our database:

VEHICLES BY MAKE:
 
Toyota:       312,000 listings
Honda:        287,000 listings
Ford:         298,000 listings
Chevrolet:    245,000 listings
...
Lamborghini:  1,200 listings
Ferrari:      890 listings
Bugatti:      47 listings

Toyota had 312,000 listings. Lamborghini had 1,200.

But with default sharding, both Toyota and Lamborghini documents were spread evenly across all shards.

Why is this a problem?


When Someone Searches for Toyota

Let's say we have 5 shards (we actually had more, but five keeps the example simple):

SEARCH: "Toyota Camry"
 
┌─────────────────────────────────────────────────────┐
│                                                     │
│   ┌─────────┐  ┌─────────┐  ┌─────────┐  ...      │
│   │ Shard 0 │  │ Shard 1 │  │ Shard 2 │            │
│   │         │  │         │  │         │            │
│   │ Toyota: │  │ Toyota: │  │ Toyota: │            │
│   │ 62,000  │  │ 63,000  │  │ 62,000  │            │
│   │ Honda:  │  │ Honda:  │  │ Honda:  │            │
│   │ 57,000  │  │ 58,000  │  │ 57,000  │            │
│   │ Ford:   │  │ Ford:   │  │ Ford:   │            │
│   │ 59,000  │  │ 60,000  │  │ 60,000  │            │
│   │ ...etc  │  │ ...etc  │  │ ...etc  │            │
│   └─────────┘  └─────────┘  └─────────┘            │
│        ↑            ↑            ↑                  │
│        └────────────┼────────────┘                  │
│                     │                               │
│           QUERY HITS ALL SHARDS                    │
│         Each shard searches ALL its docs            │
│         to find Toyotas                             │
│                                                     │
└─────────────────────────────────────────────────────┘

Elasticsearch has to:

  1. Send the query to ALL 5 shards
  2. Each shard searches through ALL its documents (including Hondas, Fords, etc.)
  3. Each shard returns its Toyota matches
  4. Coordinator combines results from all 5 shards

With 2 million documents spread across 5 shards, that's 400,000 documents per shard being searched even though we only want Toyotas.
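
A crude back-of-the-envelope comparison (a sketch using the numbers above; shards actually use inverted indexes, so "documents in play" is a rough cost model, not a literal scan):

TOTAL_DOCS = 2_000_000
NUM_SHARDS = 5
TOYOTA_DOCS = 312_000

# Default routing: all 5 shards participate, each holding a slice of everything
docs_in_play_default = NUM_SHARDS * (TOTAL_DOCS // NUM_SHARDS)  # 2,000,000

# Routing by make: one shard participates, holding only Toyotas
docs_in_play_routed = TOYOTA_DOCS                               # 312,000

print(docs_in_play_default // docs_in_play_routed)              # ~6x less work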


The Fix: Route by Make

What if all Toyotas lived in the same shard?

BEFORE (default routing):
 
Shard 0: Toyota, Honda, Ford, BMW, Lamborghini, Tesla, ...
Shard 1: Toyota, Honda, Ford, BMW, Lamborghini, Tesla, ...
Shard 2: Toyota, Honda, Ford, BMW, Lamborghini, Tesla, ...
 
Every search hits every shard.
 
 
AFTER (routing by make):
 
Shard 0: Lamborghini (all 1,200)
Shard 1: Ferrari (all 890)
Shard 2: Tesla (all 45,000)
Shard 3: Honda (all 287,000)
Shard 4: Toyota (all 312,000)
...
 
Search for Toyota? Only hit Shard 4.

Elasticsearch supports this with custom routing:

# When indexing
es.index(
    index="cars",
    id=vehicle_id,
    body=document,
    routing=make  # "toyota", "honda", etc.
)
 
# When searching
es.search(
    index="cars",
    body=query,
    routing="toyota"  # Only search the Toyota shard!
)
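
A quick way to sanity-check this is the search shards API, which reports which shards a request would hit (a sketch; assumes a local cluster and the "cars" index from above):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Without routing, the query fans out to every shard
print(len(es.search_shards(index="cars")["shards"]))                    # e.g. 5

# With routing, only the shard(s) the routing value hashes to
print(len(es.search_shards(index="cars", routing="toyota")["shards"]))  # 1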

Our Implementation

We created a mapping of make → shard number:

# cars/constants.py
 
MAKES_DETAILED = {
    "lamborghini": 1,
    "fisker": 2,
    "honda": 3,
    "rivian": 4,
    "mclaren": 5,
    "lotus": 6,
    "rolls-royce": 7,
    "bentley": 8,
    "aston-martin": 9,
    "tesla": 10,
    "toyota": 11,
    "lexus": 12,
    "ford": 13,
    "lincoln": 14,
    # ... up to 64 makes
    "other": 64,  # Catch-all for rare makes
}

When indexing:

def get_routing_key(make):
    """Get shard routing key for a vehicle make"""
    make_lower = make.lower().strip()
    return MAKES_DETAILED.get(make_lower, MAKES_DETAILED["other"])
 
def index_vehicle(vehicle):
    routing = get_routing_key(vehicle['make'])
 
    es.index(
        index="cars",
        id=vehicle['id'],
        body=transform_vehicle(vehicle),
        routing=routing
    )

When searching:

def search_vehicles(make=None, **filters):
    search_params = build_query(make, **filters)
 
    # If searching for specific make, route to that shard only
    if make:
        routing = get_routing_key(make)
        response = es.search(
            index="cars",
            body=search_params,
            routing=routing
        )
    else:
        # No make filter = search all shards
        response = es.search(
            index="cars",
            body=search_params
        )
 
    return response

The Results

After implementing routing by make:

BEFORE (all searches hit all shards):
 
Toyota search:      420ms
Honda search:       380ms
Lamborghini search: 45ms
No-make search:     650ms
 
 
AFTER (make searches hit one shard):
 
Toyota search:      85ms   (-80%)
Honda search:       78ms   (-79%)
Lamborghini search: 22ms   (-51%)
No-make search:     680ms  (slightly worse, expected)

Searches filtered to a popular make got about 5x faster, and even the already-fast Lamborghini searches were cut in half.

Searches without a make filter got slightly slower (because shards are now unevenly sized). But 80%+ of our searches included a make, so this was a huge net win.


Multi-Make Searches

Some users search for multiple makes: "Show me Toyota OR Honda."

def search_vehicles(makes=None, **filters):
    search_params = build_query(makes, **filters)

    if makes:
        # Route to one shard per requested make
        routings = [get_routing_key(m) for m in makes]
        routing_param = ",".join(map(str, routings))

        response = es.search(
            index="cars",
            body=search_params,
            routing=routing_param  # e.g. "3,11" for Honda and Toyota
        )
    else:
        # No make filter = search all shards
        response = es.search(index="cars", body=search_params)

    return response

Elasticsearch accepts comma-separated routing values, so "Toyota OR Honda" only hits the Honda and Toyota shards, not all 64.


Gotchas We Hit

1. Can't Change Routing After Indexing

Once a document is indexed with routing="toyota", you can't find it with routing="honda". Sounds obvious, but we had a bug where make was null for some records, so they were routed to the "other" shard. Searches for their actual makes were routed elsewhere and found nothing.

Fix: Validate make before indexing, default to "other" explicitly.
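
A minimal sketch of that validation (normalize_make is an illustrative name; MAKES_DETAILED is the mapping from earlier):

def normalize_make(raw_make):
    """Coerce missing or unknown makes to the explicit 'other' bucket."""
    if not raw_make or not raw_make.strip():
        return "other"
    make = raw_make.lower().strip()
    return make if make in MAKES_DETAILED else "other"

# Indexing now always has a valid routing key -- no silent nulls
routing = MAKES_DETAILED[normalize_make(vehicle.get("make"))]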

2. Routing Changes Require Reindex

We started with 32 shards, then realized we needed 64, one per routing bucket. Changing the shard count, like changing the routing scheme itself, means reindexing everything.

Fix: Plan shard count carefully upfront. Overestimate slightly.
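
Since the count is locked in at creation time, it pays to set it explicitly when creating the index (a sketch; the replica count here is illustrative):

es.indices.create(
    index="cars",
    body={
        "settings": {
            "number_of_shards": 64,   # one per routing bucket in MAKES_DETAILED
            "number_of_replicas": 1,  # illustrative; tune for your cluster
        }
    },
)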

3. Must Pass Routing on Every Operation

Forgot to pass routing when deleting? Elasticsearch won't find the document.

# WRONG - won't find the doc if it was indexed with routing
es.delete(index="cars", id=vehicle_id)
 
# RIGHT - must use same routing as when indexed
es.delete(index="cars", id=vehicle_id, routing=make)
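
The same rule applies to every single-document operation, not just delete (a sketch; the price update is illustrative):

# get and update also need the original routing value
es.get(index="cars", id=vehicle_id, routing=make)

es.update(
    index="cars",
    id=vehicle_id,
    body={"doc": {"price": 24999}},  # illustrative partial update
    routing=make,  # must match the routing used at index time
)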

When to Use Custom Routing

Use custom routing when:

  • Queries almost always filter by a specific field (make, tenant_id, user_id)
  • That field has high cardinality (many distinct values)
  • You want to avoid scatter-gather across all shards

Don't use custom routing when:

  • Queries rarely filter by a consistent field
  • Field has low cardinality (would create huge shards)
  • You need flexible querying across all dimensions equally

Key Lessons

Lesson 1: Default Sharding Assumes Random Access

Elasticsearch's default routing optimizes for "any document could be requested." If your access patterns are predictable (always filter by make, tenant, etc.), custom routing is faster.

Lesson 2: Know Your Query Patterns

We analyzed our logs: 83% of searches included a make. That made routing by make a clear win.

If only 20% had included make, it wouldn't have been worth it.

Lesson 3: Routing Is a Tradeoff

Single-field queries get faster. Cross-field queries might get slower (or require hitting multiple shards). Design for your dominant use case.


Quick Reference

Index with routing:

es.index(
    index="cars",
    id=doc_id,
    body=document,
    routing="toyota"
)

Search with routing:

es.search(
    index="cars",
    body=query,
    routing="toyota"  # Single shard
)
 
es.search(
    index="cars",
    body=query,
    routing="toyota,honda"  # Multiple shards
)

Delete with routing:

es.delete(
    index="cars",
    id=doc_id,
    routing="toyota"  # MUST match index routing
)

That's how we made Toyota searches nearly as fast as Lamborghini searches: by putting them in different filing cabinets.


Aamir Shahzad

Author

Software Engineer with 7+ years of experience building scalable data systems. Specializing in Django, Python, and applied AI.