Spec: Filtered ANN Contracts
Status: partially implemented Risk tier: CAUTION Primary goal: define safe filtered vector-search contracts without weakening the current sorted_hnsw ordered-scan guarantee.
Current completion state:
- Done: F3 route-first partition/routing-aware helper via
sorted_hnsw_partition_search(...), including explicit selected-leaf validation, leaf-local index validation, local ANN candidate pools, and global exact rerank. - Done: routed-search underfill diagnostics via
sorted_hnsw_partition_search_status(...). - Done: explicit selected-leaf exact fallback for routed-search underfill via
exact_fallback := true. - Proposed: F2 overfetch/exact filtered rerank with explicit underfill metadata and fallback reporting for arbitrary non-shard filters.
- Non-goal for now: transparent arbitrary
WHERE ... ORDER BY embedding <=> qplanner support.
Problem
The planner-integrated sorted_hnsw index path currently targets a narrow shape:
SELECT ...
FROM items
ORDER BY embedding <=> $query
LIMIT k;
It deliberately avoids extra base-table quals and parameterized scans. A normal executor filter applied after a bounded ANN scan can under-return results: the index might return ef_search approximate nearest candidates, the executor then filters most of them away, and the final result contains fewer than k rows even though more qualifying rows exist in the table.
This is a correctness contract issue, not just a planner-cost issue.
Non-Goals
- Do not make the existing
sorted_hnswordered Index AM silently accept arbitraryWHEREclauses. - Do not claim pgvector-compatible filtered ANN semantics until the final top-k contract and underfill behavior are explicitly tested.
- Do not implement global cross-partition HNSW in this spec.
- Do not conflate fact-shaped GraphRAG routing with generic row filtering.
Current Stable Contract
sorted_hnsw is stable for pure ordered KNN over a base relation:
ORDER BY embedding <=> query LIMIT k;- no unbounded KNN;
- no
LIMIT > sorted_hnsw.ef_search; - no extra base-table quals that would make executor filtering under-return candidates;
- exact rerank happens inside the index scan before rows are returned.
Filtered retrieval should currently use one of:
- materialize/filter first, then query the filtered relation;
- GraphRAG helper/wrapper APIs for fact-shaped relation filters;
- routed GraphRAG for tenant/segment/knowledge-base shard routing.
Proposed Filtered ANN Modes
F1. Pre-filter then ANN
Use when the filter is selective and can be materialized cheaply.
Contract:
- build or materialize the filtered candidate set first;
- run ANN or exact search over that candidate set;
- final top-k is relative to the filtered set.
Example shape:
CREATE TEMP TABLE candidate_items AS
SELECT *
FROM items
WHERE tenant_id = 42
AND status = 'active';
CREATE INDEX candidate_items_embedding_idx
ON candidate_items USING sorted_hnsw (embedding);
SELECT *
FROM candidate_items
ORDER BY embedding <=> $query
LIMIT 10;
This is operationally heavier, but the contract is simple: the ANN index sees only eligible rows.
F2. ANN overfetch then exact filtered rerank
Use when the filter is not too selective and overfetch can be bounded.
Contract:
- ANN produces a candidate pool larger than
top_k; - executor or helper applies the filter to the pool;
- exact distance is recomputed on surviving candidates;
- if fewer than
top_krows survive, the result must exposeunderfilled=trueor fall back to a larger search/exact path.
Required metadata:
ann_k: ANN candidate budget;top_k: final result budget;matched_after_filter: number of candidates surviving the filter;underfilled: whether final rows< top_k;- optional fallback marker:
none | underfilled_no_fallback | exact_filtered.
This should be a helper/function contract before it is a transparent planner path, because callers need to see when the filtered result is approximate or underfilled.
F3. Partition/routing-aware filtered ANN
Use when filters map to physical shards, partitions, tenants, segment groups, or knowledge-base routes.
Contract:
- route first, using exact metadata (
tenant_id, segment registry, partition bounds, or GraphRAG route profile); - run local ANN on selected shards/leaves;
- exact-rerank a global candidate pool;
- final top-k is selected by global exact distance, not by concatenating local top-k lists.
Required global merge invariant:
global_top_k = exact_top_k(union(local_candidate_pools))
Local top-k concatenation is insufficient because a dense shard can contribute more than k_per_shard globally relevant rows.
Implemented first-pass helper:
SELECT *
FROM sorted_hnsw_partition_search(
'items_parent'::regclass,
'embedding',
'[0.1,0.2,0.3,...]',
top_k := 10,
local_k := 32,
leaf_relids := ARRAY['items_tenant_42'::regclass]);
This helper keeps the planner guard intact. It does not make arbitrary WHERE ... ORDER BY embedding <=> query LIMIT k transparent, because executor filters can still underfill bounded ANN scans. Instead, callers route first to whole eligible leaves, run local sorted_hnsw scans, and receive a globally reranked result over the union of local candidate pools.
Implemented diagnostic helper:
SELECT *
FROM sorted_hnsw_partition_search_status(...);
This reports returned_rows, underfilled, and fallback for the same routed search contract. By default it is diagnostic and does not silently fall back to exact search. If the caller explicitly sets exact_fallback := true, an underfilled ANN candidate pool is replaced with an exact rerank over the same selected leaves and fallback reports exact_filtered.
Safety checks:
local_k >= top_k;local_k <= sorted_hnsw.ef_search;- unsupported leaves fail closed by default;
- selected sorted_heap leaves must have a valid leaf-local
sorted_hnswindex on the vector column, otherwise the helper fails instead of exact-sorting; - explicit selected leaves must belong to the requested parent;
- selected leaf routing is explicit via
leaf_relids.
Acceptance Tests
F1. Pre-filter materialization correctness
Create a table with two tenants and embeddings where the nearest global row is outside the target tenant.
Expected:
- filtered materialization returns only target-tenant rows;
- final top-k matches exact search over the target tenant.
F2. Overfetch underfill detection
Create a table where the first ann_k ANN candidates mostly fail the filter.
Expected:
- helper does not silently return fewer than
top_kas if complete; - result metadata reports
underfilled=true, or the helper escalates to a documented fallback.
F3. Cross-shard global merge
Create two shards where one shard contains all true top-k rows.
Expected:
- route selects both shards;
- local candidate pools are unioned;
- exact global rerank returns all true top-k rows from the dense shard;
- test fails if implementation concatenates fixed local top-k slices.
Current status:
- Covered for partitioned
svecandhsvecleaves in thesorted_hnswregression. The helper routes selected leaves, unions local pools, globally orders by exact distance, rejectslocal_k > sorted_hnsw.ef_search, and rejects selected leaves without a validsorted_hnswindex or without a parent-child relationship to the requested parent. - The same regression covers
sorted_hnsw_partition_search_status(...)for a complete selected-leaf result, an underfilled selected-leaf result, and the explicitexact_fallback := truemarker path. scripts/bench_partitioned_sorted_hnsw.shprovides an operator benchmark for selected-leaf routing versus parent filtered exact and all-leaf fanout.
F4. Planner safety guard
Run a base-table WHERE clause with ORDER BY embedding <=> query LIMIT k.
Expected:
- planner does not select the transparent
sorted_hnswpath when the qual could under-return candidates; - if a future planner path supports this shape, EXPLAIN must expose the filtered-ANN mode and underfill/fallback behavior.
Current status:
- Covered by
sorted_hnswregression. The guard asserts that the planner-integratedsorted_hnswindex path is not selected for unbounded KNN,LIMIT > sorted_hnsw.ef_search, or a base-table filtered KNN query.
Implementation Direction
Preferred order:
- Keep the current planner guard intact.
- Treat F3 partition/route-aware search as the first implemented helper-level contract: explicit leaf routing, local ANN, global exact rerank, fail-closed leaf/index validation.
- Add helper-level F2 with explicit result metadata only if users need filtered retrieval that does not map cleanly to whole leaves.
- Reuse routed GraphRAG metadata for F3 where filters are actually routing predicates.
- Benchmark whether the current PL/pgSQL F3 helper should be moved to C before adding more filtered-ANN API surface.
- Only after helper contracts are tested, consider a transparent planner path for a narrow class of safe filters.
Quadrumvirate Notes
Cassandra:
- Likely failure mode: making filtered ANN look like a normal index scan while silently under-returning after executor filtering.
- Likely failure mode: overfetch constants that pass one demo but lack underfill detection.
Daedalus:
- Reframe from “support WHERE filters in HNSW” to “choose a filtered retrieval contract based on whether the filter is a materialization, overfetch, or routing problem”.
Maieutic:
- Assumption: users want pgvector-like syntax. Counterpoint: they need correctness and observability more than syntax if the result can underfill.
- Assumption: per-shard top-k is enough. Falsifier: one shard contains all true top-k rows.
Adversary:
- Any helper must expose underfill/fallback state.
- Any cross-shard path must do global exact rerank.
- Any planner-integrated path must explain why executor filtering cannot invalidate the final top-k contract.