Hash, Bitmap, GIN/GiST: Specialty Indexes

What This Concept Is

B+-trees and LSM-trees are generalists. When the workload has a specific shape -- strict equality, low-cardinality filters, multi-valued columns, or geometric data -- a specialty index can be an order of magnitude faster. The four most useful ones to know:

Hash index: a hash table mapping key -> RID. Serves only equality predicates. No ordering, no range scans. O(1) expected lookup. PostgreSQL and SQL Server support them; MySQL InnoDB has an adaptive internal variant.
Bitmap index: for each distinct value in a low-cardinality column, store a bitmap with one bit per row. AND/OR/NOT of several predicates becomes bitwise operations on the bitmaps. Enormous for analytics on columns like status, country, is_active.
GIN (Generalized Inverted Index): for columns whose values contain multiple searchable elements (arrays, jsonb, tsvector for full-text). Stores element -> list_of_row_ids. Dominates for "does this value contain X" predicates.
GiST (Generalized Search Tree): an extensible tree framework supporting multi-dimensional and user-defined types. The workhorse for geospatial (geometry, geography), range types, and nearest-neighbor queries via the <-> operator.

These are not replacements for B+-trees. They are the right shape when the predicate or data type does not match B+-tree's ordered-key access pattern.

Why It Matters Here

Every PostgreSQL user has encountered EXPLAIN output showing Bitmap Index Scan or GIN index. Knowing when each applies is the difference between a 50 ms query and a 10 s one.

The general rule: if a B+-tree produces thousands of random heap fetches for one predicate, check whether a different index structure would avoid that.

Concrete Example -- Bitmap Index

Table orders(status TEXT, country TEXT, ...) with 10^8 rows. status has 4 values and country has 200. Query: WHERE status = 'PENDING' AND country = 'DE'.

Under a B+-tree on status, the index matches 25% of rows -- 25 million RIDs. Following each to the heap is catastrophic.

Under bitmaps:

bitmap(status='PENDING')   : 10^8 bits, ~25% set
bitmap(country='DE')       : 10^8 bits, ~1%  set
AND                         : 10^8 bits, ~0.25% set  -> ~250,000 rows

The AND is a straight scan over two bit arrays, highly vectorizable. Then the engine does a bitmap heap scan: iterate set bits, fetch heap rows in page order so I/O becomes sequential. A query that was millions of random reads becomes a few bitmap operations plus ordered heap reads.

Concrete Example -- GIN for jsonb

CREATE INDEX ON docs USING GIN (data jsonb_path_ops);
SELECT * FROM docs WHERE data @> '{"tags": ["urgent"]}';

A B+-tree on data would be useless here -- a JSON blob's comparison order does not match "contains this tag." The GIN index instead indexes each (path, value) pair inside the JSON and can point directly at the rows that contain "tags": ["urgent"].

Concrete Example -- GiST for Geospatial

CREATE INDEX ON places USING GIST (location);
SELECT id FROM places ORDER BY location <-> POINT(-74.0, 40.7) LIMIT 10;

Nearest-neighbor queries over a B+-tree are not well-defined; distance in 2D is not a total order. GiST stores a hierarchy of bounding boxes and prunes the search to the boxes that could contain the nearest k points.

When Each Fits

Workload	Best structure	Why
Equality only, high cardinality, no ranges	Hash index	O(1) lookup, minimal space overhead
Low-cardinality filters, analytical queries	Bitmap	Bitwise AND/OR across many predicates
Full-text, arrays, `jsonb` containment	GIN	Inverted list of embedded tokens
Geospatial, ranges, nearest-neighbor	GiST / SP-GiST / BRIN	Multi-dimensional pruning
Ordered range scans + point lookups	B+-tree	The generalist default
Write-heavy KV, possibly append	LSM-tree	Sequential writes, batched compaction

Common Confusion / Misconception

"Bitmap indexes are only for columns with two values." The technique works up to a few thousand distinct values; the bitmap count grows with cardinality, but compression (run-length encoded bitmaps like Roaring) keeps storage sub-linear. They fail when cardinality is comparable to the row count -- the bitmaps become as expensive as the rows themselves.

"Hash indexes are always faster than B+-trees for equality." Only until you need a range, an ordered iteration, or durability guarantees the engine does not offer for hash indexes. PostgreSQL's hash indexes were not crash-safe until version 10 -- one reason B+-tree remained the default.

How To Use It

When a predicate is not best served by a B+-tree:

Identify the predicate shape: equality, containment, spatial, nearest-neighbor, low-cardinality conjunction.
Match it to the structure above.
Check whether your engine supports the index kind you want and whether EXPLAIN actually picks it.
Confirm with a cost comparison: sometimes the optimizer needs hints or statistics refresh to choose correctly.

Check Yourself

Why does a bitmap index dominate a B+-tree for low-cardinality conjunctions but collapse as cardinality grows?
Why can a hash index answer equality faster than a B+-tree yet still be a bad default?
Why are GIN and GiST usually described as "frameworks" rather than single algorithms?

Mini Drill or Application

For each workload, pick hash / bitmap / GIN / GiST / B+-tree / LSM and justify in one sentence:

Append-heavy IoT time-series store with 10^8 events/day, mostly range scans by time.
Analytical filters over country, segment, plan_tier on a 5 * 10^8 row table.
Full-text search over a jsonb column with a few dozen path predicates.
Web-app login by email, nothing else on the column.
Geofencing: find all points within 500 m of a moving vehicle.
Primary key lookup on a customer table.

What This Concept Is​

Why It Matters Here​

Concrete Example -- Bitmap Index​

Concrete Example -- GIN for jsonb​

Concrete Example -- GiST for Geospatial​

When Each Fits​

Common Confusion / Misconception​

How To Use It​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​