Geospatial Indexing (H3 / Geohash)
Turn 'find things near me' into fast cell lookups instead of expensive distance scans.
Geospatial indexing turns location search into indexed cell lookups instead of full-table distance scans. Products such as ride hailing, delivery, maps, local search, fleet tracking, and marketplace discovery all need the same primitive: given a latitude/longitude and a radius, find nearby entities quickly.
The problem: haversine scans do not scale
The mathematically correct distance between two latitude/longitude points is often computed with the haversine formula. The naive query is: scan every driver, restaurant, or point of interest; compute distance from the user; sort; return the closest results. That is fine for 1,000 rows and terrible for 100 million.
function nearby(userLat, userLng, radiusMeters):
matches = []
for place in allPlaces:
d = haversine(userLat, userLng, place.lat, place.lng)
if d <= radiusMeters:
matches.append((place, d))
return sortByDistance(matches).take(50)
// Correct, but O(number of places) per request.The failure mode is predictable: every "near me" request burns CPU across the entire dataset, cannot use a normal B-tree index well, and gets slower as your city coverage grows. You need a way to narrow the candidate set before doing exact distance math.
| Approach | Candidate count | Failure mode |
|---|---|---|
| Full scan plus haversine | Every row in the table | CPU grows with total dataset size |
| Bounding box | Rows in a lat/lng rectangle | Better, but awkward near poles and date line |
| Grid cell index | Rows in target and neighbor cells | Requires precision choice and boundary filtering |
Grid systems: Geohash, H3, S2, and quadtrees
A geospatial index divides the world into named cells. Each point is encoded into the cell containing it, and nearby search becomes a lookup over the current cell plus neighboring cells. Different systems choose different cell shapes and hierarchy rules.
| System | Cell shape | Common strengths |
|---|---|---|
| Geohash | Rectangles encoded as base32 strings | Simple prefix queries; easy to store in text indexes |
| H3 | Mostly hexagons at multiple resolutions | Uniform neighbor behavior; popular for analytics and dispatch |
| S2 | Cells projected from a sphere onto cube faces | Strong spherical geometry; used in maps infrastructure |
| Quadtree | Recursive square subdivision | Simple hierarchy; useful for tiles and spatial partitioning |
All of them share the same design move: convert two continuous numbers into a discrete key. Once location becomes a key, you can index it, shard by it, cache it, and query adjacent keys.
Encoding lat/lng into cells and prefixes
Geohash interleaves longitude and latitude bits, then encodes them as a base32 string. Nearby locations usually share a prefix. Short prefixes describe large areas; longer prefixes describe smaller areas. That makes geohash friendly to prefix and range lookups.
// Example geohashes around Bengaluru.
restaurant_a = "tdr1v9x8"
restaurant_b = "tdr1v9z2"
restaurant_c = "tdr1uabc"
restaurant_far = "tdr4p123"
// Query a neighborhood-sized prefix.
prefix = "tdr1v9"
SELECT *
FROM places
WHERE geohash >= "tdr1v9"
AND geohash < "tdr1va";
// Then compute exact haversine distance on the returned candidates.H3 and S2 use numeric cell IDs instead of geohash strings, but the mental model is similar: encode a point to a cell at a chosen precision, look up rows in that cell and neighbors, then filter by exact distance.
Why exact filtering still matters
Cells are approximations. A prefix or cell query returns candidates in a region, not a perfect circle. Always run an exact distance check after the index lookup so users do not see restaurants 9 km away in a 5 km search.
Querying a radius with neighbor cells
A correct radius query must include cells around the query point, because the closest object might sit just across a cell boundary. H3 calls this a k-ring: all cells within k steps of the center cell. Geohash implementations compute adjacent prefixes that cover a bounding box.
function nearby(lat, lng, radiusMeters):
resolution = chooseResolution(radiusMeters)
center = h3.latLngToCell(lat, lng, resolution)
cells = h3.gridDisk(center, kFor(radiusMeters, resolution))
candidates = []
for cell in cells:
candidates += index.lookup(cell)
return candidates
.map(p => (p, haversine(lat, lng, p.lat, p.lng)))
.filter((p, d) => d <= radiusMeters)
.sortBy(d)
.take(50)- Candidate generation: use cells to cheaply find a small superset of possible matches.
- Exact filtering: compute haversine or a more precise distance only on that candidate set.
- Ranking: sort by distance, availability, rating, delivery time, surge, or product-specific features.
Precision levels and operational trade-offs
Precision controls cell size. Coarse cells cover large areas and return too many candidates. Fine cells cover small areas and require scanning many cells for a large radius. The best precision depends on query radius, entity density, and update frequency.
| Choice | Benefit | Cost |
|---|---|---|
| Coarse cells | Few cells per query; simple routing | Many false positives in dense cities |
| Fine cells | Small candidate sets near the user | More neighbor cells to cover a radius |
| Multiple precisions | Pick resolution by query radius | More indexes or duplicated cell memberships |
- Dense city vs rural area: the same radius can return 50 restaurants downtown or two restaurants in a rural area. Adaptive search may expand rings until enough candidates appear.
- Moving objects: drivers and couriers update location frequently. Keep the geospatial index write path cheap and expire old locations quickly.
- Hot cells: airports, stadiums, and city centers can become hot partitions. Split busy cells, add secondary sharding, or cache popular lookups.
- Earth geometry: latitude/longitude rectangles behave oddly near poles and the international date line. Libraries such as H3 and S2 help hide those edge cases.
Real-world examples
Ride-hailing systems such as Uber and Lyft use cell indexes to match riders to nearby drivers and to aggregate supply by region. Maps and local search use them for points of interest. Delivery systems use them to decide whether an address is inside a service area and to estimate nearby courier availability.
| Use case | Indexed entity | Query pattern |
|---|---|---|
| Ride hailing | Driver current location | Find available drivers in nearby cells, then rank by ETA |
| Food delivery | Restaurants and couriers | Find restaurants in service radius and couriers near pickup |
| Maps search | Points of interest | Prefix or cell lookup around viewport, then rank results |
| Fraud and analytics | Events by location | Aggregate counts by H3 resolution over time windows |
- Naive haversine scans are correct but scale with the full dataset, so they fail for large nearby-search systems.
- Geospatial indexes encode latitude/longitude into cells using systems such as Geohash, H3, S2, or quadtrees.
- A radius query looks up the center cell plus neighbors, then filters candidates with exact distance math.
- Precision is a trade-off: coarse cells return too many candidates, fine cells require more cell lookups.
- Real systems still handle boundaries, hot cells, moving objects, date-line edge cases, and product-specific ranking.
Mark it complete to track your progress through the workbook.