Vector Search
pg_sorted_heap includes two built-in vector types, a planner-integrated sorted_hnsw Index AM, and legacy/manual ANN paths (svec_ann_scan, svec_hnsw_scan). The default vector story is now CREATE INDEX ... USING sorted_hnsw; the older IVF-PQ and sidecar HNSW APIs remain available when you want explicit control over storage or rerank behavior.
Release guidance:
- Stable: sorted_hnsw on svec and hsvec
- Legacy/manual: svec_ann_scan, svec_ann_search, svec_hnsw_scan
Vector types
| Type | Precision | Bytes/dim | Max dimensions | Use case |
|---|---|---|---|---|
| svec | float32 | 4 | 16,000 | Full precision, training codebooks |
| hsvec | float16 | 2 | 32,000 | Storage-optimized, large embeddings |
Both types support the <=> cosine distance operator (returns 1 − cosine similarity, range [0, 2]). Distance is accumulated in float8 (64-bit) for precision.
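As a reference for what the operator computes, here is a pure-Python sketch (illustrative only; the extension implements this in C):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, range [0, 2]. Accumulation happens in
    # Python's 64-bit floats, mirroring the float8 accumulation above.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal -> 1.0
```

Identical vectors give a distance of ~0, opposite vectors ~2, matching the [0, 2] range stated above.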
hsvec casts to svec implicitly — all PQ/IVF functions accept both types without code changes. svec casts to hsvec via explicit or assignment cast (lossy, float32 → float16).
Dimension envelope: pgvector’s dense vector type is limited to 2,000 dimensions, while halfvec extends that to 4,000. svec supports up to 16K dims and hsvec up to 32K dims, so pg_sorted_heap still has a larger native ANN/storage envelope for very high-dimensional embeddings.
-- svec: float32, up to 16,000 dimensions
CREATE TABLE items (
id text PRIMARY KEY,
embedding pg_sorted_heap.svec(768)
);
-- hsvec: float16, up to 32,000 dimensions
CREATE TABLE items_compact (
id text PRIMARY KEY,
embedding pg_sorted_heap.hsvec(768)
);
-- Insert with bracket notation (same for both types)
INSERT INTO items VALUES ('doc1', '[0.1, 0.2, 0.3, ...]');
-- Cosine distance operator works on both types
SELECT a.id, a.embedding <=> b.embedding AS distance
FROM items a, items b
WHERE a.id = 'doc1' AND b.id = 'doc2';
-- hsvec casts to svec implicitly for PQ/IVF functions
SELECT svec_cosine_distance('[1,0,0]'::hsvec, '[0,1,0]'::hsvec);
Current default: sorted_hnsw
For new deployments, prefer the Index AM. It supports both svec and hsvec source columns; use hsvec when you want the heap/TOAST footprint to stay close to pgvector halfvec, with float32 used only as internal scratch during build/search/rerank.
CREATE TABLE items (
id bigserial PRIMARY KEY,
embedding pg_sorted_heap.svec(384),
body text
);
CREATE INDEX items_embedding_idx
ON items USING sorted_hnsw (embedding)
WITH (m = 16, ef_construction = 200);
SET sorted_hnsw.shared_cache = on;
SET sorted_hnsw.ef_search = 96;
SELECT id, body
FROM items
ORDER BY embedding <=> '[0.1,0.2,0.3,...]'::pg_sorted_heap.svec
LIMIT 10;
On constrained builders, the current low-memory build knob is:
SET sorted_hnsw.build_sq8 = on;
That makes CREATE INDEX ... USING sorted_hnsw build the graph from SQ8-compressed build vectors instead of a full float32 build slab. The tradeoff is one extra heap scan during build and a possible graph-quality loss on some corpora, but on the current local 1M x 64D multidepth GraphRAG point it preserved quality and slightly improved build time.
Compact-storage variant:
CREATE TABLE items_compact (
id bigserial PRIMARY KEY,
embedding pg_sorted_heap.hsvec(384),
body text
);
CREATE INDEX items_compact_embedding_idx
ON items_compact USING sorted_hnsw (embedding hsvec_cosine_ops)
WITH (m = 16, ef_construction = 200);
This path is planner-integrated and exact-reranks internally. There is no sidecar prefix argument and no manual rerank_topk in the index-scan path.
Current ordered-scan contract:
- Automatic sorted_hnsw planning applies only to base-relation ORDER BY embedding <=> query LIMIT k queries.
- The planner does not use the current Phase 1 path when there is no LIMIT, when LIMIT > sorted_hnsw.ef_search, or when extra base-table quals would make the index under-return candidates.
- sorted_hnsw.shared_cache is most useful when shared_preload_libraries = 'pg_sorted_heap'; otherwise scans fall back to backend-local cache builds.
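The ordered-scan contract can be condensed into a small predicate. This is an illustrative sketch with made-up names, not extension code; the real decision happens inside the planner integration:

```python
def uses_ordered_index_scan(has_limit: bool, limit_k: int,
                            ef_search: int, has_extra_quals: bool) -> bool:
    # Sketch of the ordered-scan contract: base-relation
    # ORDER BY embedding <=> query LIMIT k only.
    if not has_limit:
        return False             # no LIMIT -> planner skips the path
    if limit_k > ef_search:      # LIMIT must fit within ef_search candidates
        return False
    if has_extra_quals:          # extra quals could under-return candidates
        return False
    return True

print(uses_ordered_index_scan(True, 10, 96, False))   # True
print(uses_ordered_index_scan(True, 200, 96, False))  # False: LIMIT > ef_search
```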
For filtered retrieval or expansion workflows, materialize/filter first or use the GraphRAG helper API instead of expecting the ordered index scan to serve as a general filtered ANN primitive.
Legacy/manual IVF-PQ quick start
1. Create table
The table uses partition_id (IVF cluster assignment) as the leading PK column. This makes sorted_heap cluster rows by IVF partition physically — the zone map then skips irrelevant partitions at the I/O level.
CREATE TABLE vectors (
id text,
partition_id int2 GENERATED ALWAYS AS (
pg_sorted_heap.svec_ivf_assign(embedding, 1)) STORED,
embedding pg_sorted_heap.svec(768),
pq_code bytea GENERATED ALWAYS AS (
pg_sorted_heap.svec_pq_encode_residual(
embedding,
pg_sorted_heap.svec_ivf_assign(embedding, 1),
2, 1)) STORED,
PRIMARY KEY (partition_id, id)
) USING sorted_heap;
The pq_code column stores M-byte Product Quantization codes. Both columns are generated automatically — you only INSERT id and embedding.
2. Train codebooks
Permissions: Training creates internal metadata tables in the extension schema. The calling role needs CREATE privilege on that schema (or must be the extension owner or a superuser). For non-superuser roles: GRANT CREATE ON SCHEMA <ext_schema> TO <role>;
-- Train IVF centroids (nlist partitions) + PQ codebook (M subvectors)
SELECT * FROM pg_sorted_heap.svec_ann_train(
'SELECT embedding FROM vectors',
nlist := 64, -- number of IVF partitions
m := 192 -- PQ subvectors (768/192 = 4-dim each)
);
-- Returns: ivf_cb_id=1, pq_cb_id=1
-- For higher recall, train residual PQ (trains on vec - centroid residuals)
SELECT pg_sorted_heap.svec_pq_train_residual(
'SELECT embedding FROM vectors',
m := 192, ivf_cb_id := 1);
-- Returns: pq_cb_id=2
After training, compact the table so rows re-cluster by their new partition_id:
SELECT pg_sorted_heap.sorted_heap_compact('vectors');
3. Search
-- PQ-only (fastest): ~8 ms, R@1 79% cross-query / 100% self-query
SELECT * FROM pg_sorted_heap.svec_ann_scan(
'vectors', query_vec,
nprobe := 3, lim := 10,
cb_id := 2, ivf_cb_id := 1);
-- With reranking (higher recall): ~22 ms, R@1 97%
SELECT * FROM pg_sorted_heap.svec_ann_scan(
'vectors', query_vec,
nprobe := 10, lim := 10, rerank_topk := 200,
cb_id := 2, ivf_cb_id := 1);
How IVF-PQ works
query vector
│
├─ IVF probe: find nearest nprobe centroids
│ → partition_id IN (3, 17, 42, ...)
│
├─ PQ ADC: for each candidate row, sum M precomputed distances
│ → O(M) per row using M-byte PQ code (no TOAST decompression)
│
├─ Top-K: max-heap selects best candidates
│
└─ Optional rerank: exact cosine on top candidates → return top-K
Physical clustering by (partition_id, id) means the IVF probe translates directly to a small set of physical block ranges — sorted_heap’s zone map skips all other partitions at the I/O level.
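The ADC step can be sketched in a few lines of Python. This is an illustrative data-flow sketch, not the extension's C implementation: the toy codebooks below have 2 centroids per subquantizer instead of 256, and squared L2 stands in for the cosine-derived tables:

```python
def adc_distance_table(query, codebooks):
    # table[m][c] = distance from query subvector m to centroid c of
    # subquantizer m (squared L2 here for simplicity).
    m = len(codebooks)
    dsub = len(query) // m
    table = []
    for i, cb in enumerate(codebooks):
        q = query[i * dsub:(i + 1) * dsub]
        table.append([sum((a - b) ** 2 for a, b in zip(q, c)) for c in cb])
    return table

def adc_lookup(table, pq_code):
    # O(M) per row: one table lookup per PQ code byte, summed.
    # The full vector is never touched (no TOAST decompression).
    return sum(table[m][c] for m, c in enumerate(pq_code))

# Toy example: 2 subquantizers, 2 centroids each (real codebooks use 256).
codebooks = [[[0.0], [1.0]], [[0.0], [1.0]]]
table = adc_distance_table([1.0, 0.0], codebooks)
print(adc_lookup(table, bytes([1, 0])))  # exact code match -> 0.0
```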
Residual PQ
Standard PQ encodes vectors directly. Residual PQ encodes the residual (vector − IVF centroid) instead. This removes inter-centroid variance so PQ focuses on fine intra-cluster distinctions, improving recall at no storage cost.
The trade-off: residual PQ requires computing a separate distance table per probed centroid (vs one global table for standard PQ), roughly doubling the PQ-only latency. With reranking, the difference is negligible.
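A minimal sketch of what "encoding the residual" means (illustrative; the extension does this inside svec_pq_encode_residual):

```python
def residual(vec, centroid):
    # Residual PQ quantizes this difference instead of the raw vector,
    # so the codebook only has to model intra-cluster variation.
    return [v - c for v, c in zip(vec, centroid)]

def reconstruct(centroid, decoded_residual):
    # Approximate vector = assigned IVF centroid + decoded residual.
    return [c + r for c, r in zip(centroid, decoded_residual)]

r = residual([1.0, 2.0], [0.5, 0.5])
print(r)                            # [0.5, 1.5]
print(reconstruct([0.5, 0.5], r))   # [1.0, 2.0]
```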
Use residual PQ by passing ivf_cb_id to svec_ann_scan:
-- cb_id=2 is the residual PQ codebook, ivf_cb_id=1 is the IVF codebook
SELECT * FROM pg_sorted_heap.svec_ann_scan(
'vectors', query_vec,
nprobe := 3, lim := 10,
cb_id := 2, ivf_cb_id := 1);
Tuning guide
nprobe and rerank_topk
| Use case | nprobe | rerank | Latency | R@1 | Recall@10 |
|---|---|---|---|---|---|
| Lowest latency | 1 | 0 | 5.5 ms | 54% | 48% |
| Self-query RAG | 3 | 0 | 8 ms | 100%* | 71% |
| Balanced | 5 | 96 | 12 ms | 89–93% | 86–93% |
| High quality | 10 | 200 | 22 ms | 97–99% | 94–99% |
* Self-query R@1 = 100% because the query vector is in the dataset. Cross-query R@1 at nprobe=3 is 79%. The recall ranges reflect different dataset sizes (10K–103K vectors).
Guidelines:
- Start with nprobe=3, no rerank for RAG workloads (searching your own corpus)
- Add rerank=96 if you need cross-query accuracy (query not in corpus)
- Increase nprobe for higher recall at the cost of latency
- nprobe × (rows/nlist) = approximate number of PQ codes scanned per query
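The last guideline as code (a hypothetical helper name, just the arithmetic stated above):

```python
def approx_pq_codes_scanned(nprobe, n_rows, nlist):
    # Each probed partition holds about n_rows / nlist rows.
    return nprobe * n_rows // nlist

# e.g. 103K rows, nlist=256, nprobe=5:
print(approx_pq_codes_scanned(5, 103_000, 256))  # -> 2011
```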
nlist and M
| Parameter | Effect |
|---|---|
| nlist (IVF partitions) | More partitions = more precise routing but smaller clusters. 64–256 typical. |
| M (PQ subvectors) | More subvectors = higher PQ fidelity. dim/M = subvector dimension (4–16 typical). |
Rule of thumb: nlist ≈ sqrt(N) where N is dataset size. M = dim/4 gives 4-dimensional subvectors — a good balance of fidelity and code size.
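The rule of thumb as a helper (illustrative, not part of the extension API):

```python
import math

def suggest_ivf_pq_params(n_rows, dim):
    # nlist ~ sqrt(N); M = dim / 4 gives 4-dimensional subvectors.
    nlist = round(math.sqrt(n_rows))
    m = dim // 4
    return nlist, m

print(suggest_ivf_pq_params(103_000, 768))  # -> (321, 192)
```

Note that dim=768 yields m=192, matching the m := 192 used in the quick start above.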
Benchmarks
The tables below mix the current stable Index AM path with the older legacy/manual ANN paths. Use the sorted_hnsw rows as the release default; use the IVF-PQ and sidecar rows only when you explicitly want those manual trade-offs.
103K vectors, 2880-dim (Gutenberg corpus)
Residual PQ (M=720, dsub=4), 256 IVF partitions. 1 Gi k8s pod, PostgreSQL 18. 100 cross-queries (self-match excluded):
| Config | R@1 | Recall@10 | Avg latency |
|---|---|---|---|
| nprobe=1, PQ-only | 54% | 48% | 5.5 ms |
| nprobe=3, PQ-only | 79% | 71% | 8 ms |
| nprobe=3, rerank=96 | 82% | 74% | 10 ms |
| nprobe=5, rerank=96 | 89% | 86% | 12 ms |
| nprobe=10, rerank=200 | 97% | 94% | 22 ms |
10K vectors, 2880-dim (float32 precision test)
Same corpus, pure svec (float32), nlist=64, M=720 residual PQ:
| Config | R@1 | Recall@10 |
|---|---|---|
| nprobe=1, PQ-only | 56% | 56% |
| nprobe=3, PQ-only | 72% | 82% |
| nprobe=5, rerank=96 | 93% | 93% |
| nprobe=10, rerank=200 | 99% | 99% |
float32 vs float16 precision
Tested the same 10K Gutenberg vectors stored as float32 (svec) vs float16-degraded (svec → hsvec → svec roundtrip). Both trained independently with identical parameters. No measurable recall difference — float16 precision loss (~1e-7) is 1000× smaller than typical distance gaps between neighbors (~1e-4). Precision is not the recall bottleneck; PQ quantization and IVF routing are. This confirms hsvec is a safe storage choice for ANN.
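You can reproduce the degradation path locally with struct's IEEE 754 half-precision format. This sketches the svec → hsvec → svec roundtrip only; the absolute error depends on component magnitude:

```python
import struct

def float16_roundtrip(x):
    # Degrade a value through IEEE 754 binary16, as in the
    # svec -> hsvec -> svec roundtrip described above.
    return struct.unpack('<e', struct.pack('<e', x))[0]

for v in (1.0, 0.1, 0.001):
    rt = float16_roundtrip(v)
    print(f"{v} -> {rt} (abs err {abs(rt - v):.2e})")
```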
Comparison with other vector search engines
Current repo-owned harnesses:
python3 scripts/bench_gutenberg_local_dump.py --dump /tmp/cogniformerus_backup/cogniformerus_backup.dump --port 65473
REMOTE_PYTHON=/path/to/python SH_EF=32 EXTRA_ARGS='--sh-ef-construction 200' ./scripts/bench_gutenberg_aws.sh <aws-host> /path/to/repo /path/to/dump 65485
scripts/bench_sorted_hnsw_vs_pgvector.sh /tmp 65485 10000 20 384 10 vector 64 96
python3 scripts/bench_ann_real_dataset.py --dataset nytimes-256 --sample-size 10000 --queries 20 --k 10 --pgv-ef 64 --sh-ef 96 --zvec-ef 64 --qdrant-ef 64
python3 scripts/bench_qdrant_synthetic.py --rows 10000 --queries 20 --dim 384 --k 10 --ef 64
python3 scripts/bench_zvec_synthetic.py --rows 10000 --queries 20 --dim 384 --k 10 --ef 64
AWS restored Gutenberg dump (~104K x 2880D, top-10, exact heap GT on the restored svec table). Host: AWS ARM64, 4 vCPU, 8 GiB RAM. In the current rerun the stored bench_hnsw_gt table matched the recomputed exact GT on 100% of the 50 benchmark queries after restore, so the fresh exact heap GT and the historical GT table agree. This rerun uses sorted_hnsw ef_construction=200 and ef_search=32, and the benchmark harness reconnects after build before timing ordered scans.
| Method | p50 latency | Recall@10 | Notes |
|---|---|---|---|
| Exact heap (svec) | 458.762 ms | 100.0% | brute-force GT on restored corpus |
| sorted_hnsw (svec) | 1.287 ms | 100.0% | ef_construction=200, ef_search=32, index 404 MB, total 1902 MB |
| sorted_hnsw (hsvec) | 1.404 ms | 100.0% | ef_construction=200, ef_search=32, index 404 MB, total 1032 MB |
| pgvector HNSW (halfvec) | 2.031 ms | 99.8% | ef_search=64, index 804 MB, total 1615 MB |
| zvec HNSW | 50.499 ms | 100.0% | in-process collection, ef=64, ~1.12 GiB on disk |
| Qdrant HNSW | 6.028 ms | 99.2% | local Docker on same AWS host, hnsw_ef=64, 103,260 points |
The precision-matched PostgreSQL row on Gutenberg is sorted_hnsw (hsvec) vs pgvector halfvec: 1.404 ms @ 100.0% versus 2.031 ms @ 99.8%, with total footprint 1032 MB versus 1615 MB. The raw fastest PostgreSQL row on this corpus is still sorted_hnsw (svec) at 1.287 ms, but that uses float32 source storage. The sorted_hnsw index stays 404 MB in both cases because it stores SQ8 graph state; the size win from hsvec appears in the source table and TOAST footprint (1902 MB -> 1032 MB), not in the index.
Synthetic 10K x 384D cosine corpus, top-10, warm query loop. PostgreSQL methods were rerun across 3 fresh builds and the table below reports median p50 / median recall. Qdrant uses 3 warm measurement passes on one local Docker collection.
| Method | p50 latency | Recall@10 | Notes |
|---|---|---|---|
| Exact heap (svec) | 2.03 ms | 100% | Brute-force ground truth |
| sorted_hnsw | 0.158 ms | 100% | shared_cache=on, ef_search=96, index ~5.4 MB |
| pgvector HNSW (vector) | 0.446 ms | 90% median (90-95 range) | ef_search=64, same M=16, ef_construction=64, index ~2.0 MB |
| zvec HNSW | 0.611 ms | 100% | local in-process collection, ef=64 |
| Qdrant HNSW | 1.94 ms | 100% | local Docker, hnsw_ef=64 |
Real-dataset sample (nytimes-256-angular, sampled 10K x 256D, top-10). The table below reports medians across 3 full harness runs. Ground truth is exact heap search inside PostgreSQL on the sampled corpus.
| Method | p50 latency | Recall@10 | Notes |
|---|---|---|---|
| Exact heap (svec) | 1.557 ms | 100% | ground truth |
| sorted_hnsw | 0.327 ms | 85.0% median (83.5-85.5 range) | shared_cache=on, ef_search=96, index ~4.1 MB |
| pgvector HNSW (vector) | 0.751 ms | 79.0% median (78.5-79.0 range) | ef_search=64, same M=16, ef_construction=64, index ~13 MB |
| zvec HNSW | 0.403 ms | 99.5% | local in-process collection, ef=64, ~14.1 MB on disk |
| Qdrant HNSW | 1.704 ms | 99.5% | local Docker, hnsw_ef=64 |
This dataset is far harsher than the deterministic synthetic corpus. Use the synthetic table for controlled regression tracking and the nytimes-256 sample when you want a better read on fixed-parameter recall.
See Sidecar HNSW search below for the legacy/manual svec_hnsw_scan path and its cache-mode tradeoffs.
Self-query vs cross-query
Self-query: the query vector exists in the dataset. This is the common RAG case — you embedded documents, now you search them. R@1 is 100% because the query is trivially its own closest neighbor.
Cross-query: the query vector is NOT in the dataset (e.g., a user’s question embedded at search time). R@1 depends on nprobe and PQ fidelity.
Reproducible benchmark
-- Pick 100 random queries, compute ground truth and ANN recall
WITH queries AS (
SELECT id AS qid, embedding AS qvec
FROM your_table ORDER BY random() LIMIT 100
),
ground_truth AS (
SELECT q.qid,
array_agg(t.id ORDER BY t.embedding <=> q.qvec) AS gt
FROM queries q
CROSS JOIN LATERAL (
SELECT id, embedding FROM your_table
WHERE id != q.qid
ORDER BY embedding <=> q.qvec LIMIT 10
) t GROUP BY q.qid
),
ann_results AS (
SELECT q.qid,
(array_agg(a.id ORDER BY a.distance))[2:11] AS ann
FROM queries q
CROSS JOIN LATERAL pg_sorted_heap.svec_ann_scan(
'your_table', q.qvec, nprobe := 3, lim := 11,
cb_id := 2, ivf_cb_id := 1) a
GROUP BY q.qid
)
SELECT
round((avg(CASE WHEN gt.gt[1] = ar.ann[1]
THEN 1.0 ELSE 0.0 END) * 100)::numeric, 1) AS "R@1",
round((avg((SELECT count(*)::numeric
FROM unnest(ar.ann) x
WHERE x = ANY(gt.gt)) / 10.0) * 100)::numeric, 1) AS "Recall@10"
FROM ground_truth gt
JOIN ann_results ar ON gt.qid = ar.qid;
Note: lim := 11 and [2:11] skip the self-match (position 1) for cross-query evaluation. For self-query benchmarks, use lim := 10 without slicing.
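The same metrics in plain Python, useful for cross-checking the SQL above from a scripted harness (illustrative helpers):

```python
def recall_at_k(ground_truth, ann, k=10):
    # Fraction of the exact top-k also present in the ANN top-k.
    gt = set(ground_truth[:k])
    return sum(1 for x in ann[:k] if x in gt) / k

def r_at_1(ground_truth, ann):
    # Did the ANN scan return the exact nearest neighbor first?
    return 1.0 if ann and ann[0] == ground_truth[0] else 0.0

gt  = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
ann = ['a', 'b', 'x', 'd', 'y', 'f', 'z', 'h', 'w', 'j']
print(recall_at_k(gt, ann))  # 0.6 (6 of the exact top-10 returned)
print(r_at_1(gt, ann))       # 1.0 (exact nearest neighbor ranked first)
```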
For the local synthetic bench_nomic setup used during graph/IVF tuning, use scripts/bench_nomic_local_ann.py to reproduce exact ground truth, svec_graph_scan, and svec_ann_scan latency/recall curves from one command. The reproducible Make targets are:
make build-graph-bench-nomic \
VECTOR_BENCH_DSN='host=/tmp port=65432 dbname=bench_nomic'
make bench-nomic-ann \
VECTOR_BENCH_DSN='host=/tmp port=65432 dbname=bench_nomic'
Local graph/ANN tooling expects the Python packages listed in scripts/requirements-vector-tools.txt. CI installs that file directly; for local runs, scripts/find_vector_python.sh resolves a Python that can import the same dependency set.
To rebuild the graph sidecar used by svec_graph_scan, use scripts/build_graph.py. The committed workflow for the local bench_nomic setup is:
"$(./scripts/find_vector_python.sh)" scripts/build_graph.py \
--dsn 'host=/tmp port=65432 dbname=bench_nomic' \
--table bench_nomic_8k \
--graph-table bench_nomic_graph \
--entry-table bench_nomic_graph_entries \
--bootstrap \
--sketch-dim 384 \
--M 32 \
--M-max 64 \
--n-adjacent 4 \
--no-prune \
--seed 42
Then benchmark the rebuilt graph against exact and IVF baselines:
"$(./scripts/find_vector_python.sh)" scripts/bench_nomic_local_ann.py \
--dsn 'host=/tmp port=65432 dbname=bench_nomic' \
--graph-table bench_nomic_graph \
--entry-table bench_nomic_graph_entries \
--query-limit 20 \
--graph-efs 128,256,512,1024 \
--ivf-nprobes 40 \
--warmup 1
Builder notes:
- --bootstrap reads directly from the main table and derives src_tid from ctid.
- Rebuild mode rejoins on src_tid/ctid, not id, so it stays correct for (partition_id, id) primary keys where id is not globally unique.
- M, M-max, and n-adjacent change graph topology; re-run the harness after each build rather than carrying over numbers from an older graph.
pg_dump / pg_restore limitation: HNSW sidecar tables store src_tid (physical heap tuple pointer), which changes after pg_restore because COPY rewrites all heap pages with new TIDs. After restore, the sidecar's src_tid values point to wrong or nonexistent heap tuples, causing recall to silently drop (observed: 88% after restore, recovering to 99.8% after rebuilding on the same data). Always rebuild the HNSW sidecar after pg_dump/pg_restore. This limitation does not affect sorted_hnsw, which uses PostgreSQL's normal index infrastructure instead of sidecar src_tid joins.
Sidecar HNSW search (legacy: svec_hnsw_scan)
svec_hnsw_scan performs hierarchical HNSW search using compact sidecar tables. The latency/recall tables in this section describe that legacy/manual path, not the current sorted_hnsw Index AM baseline. The L0 column type controls the recall/memory tradeoff:
| L0 column | Cache mode | Cache size (103K) | Recall@10 (ef=96) | p50 |
|---|---|---|---|---|
| hsvec(384) | float16 sketch | ~75 MB | 97% | 0.7ms |
| svec(D) | SQ8 quantized (default) | ~D × N bytes | 98.4% | 1.3ms |
| svec(D) + sq8=off | float32 full | ~D×4 × N bytes | 99.6% | 1.5ms |
The cache auto-detects the L0 column type (svec vs hsvec) at build time. For svec columns, SQ8 scalar quantization (uint8 per dimension) is applied automatically for 4x memory savings, controlled by the sorted_heap.hnsw_cache_sq8 GUC (default on).
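A minimal sketch of SQ8 scalar quantization (illustrative; the cache's actual layout, scale handling, and rounding may differ):

```python
def sq8_quantize(vec):
    # Map each float32 dimension to one uint8 over [lo, hi]: 4x smaller.
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255.0 or 1.0  # avoid div-by-zero on constant vectors
    return bytes(round((v - lo) / scale) for v in vec), lo, scale

def sq8_dequantize(codes, lo, scale):
    # Approximate reconstruction used for graph navigation.
    return [lo + c * scale for c in codes]

codes, lo, scale = sq8_quantize([0.0, 0.25, 1.0])
print(list(codes))                       # [0, 64, 255]
print(sq8_dequantize(codes, lo, scale))  # close to the original values
```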
Table requirements
{prefix}_meta — entry_nid int4, max_level int2
{prefix}_l0 — nid int4 PK, sketch hsvec(N)|svec(D), neighbors int4[],
src_id text, src_tid tid
{prefix}_l1..lN — nid int4 PK, sketch hsvec(N), neighbors int4[]
Upper levels always use hsvec sketches. Only L0 supports svec for hybrid mode.
Building the graph
# Sketch-only L0 (fastest, smallest cache, ~97% recall)
python scripts/build_hnsw_graph.py \
--dsn 'host=... dbname=...' \
--source-table my_graph_table \
--prefix my_hnsw \
--M 16 --ef-construction 200
# Hybrid L0: full vectors in L0, sketches in upper levels (~99%+ recall)
python scripts/build_hnsw_graph.py \
--dsn 'host=... dbname=...' \
--source-table my_graph_table \
--prefix my_hnsw \
--M 16 --ef-construction 200 \
--full-vectors --main-table my_sorted_heap_table
# Truncated L0: first 768 dims only (for MRL/Matryoshka embeddings)
python scripts/build_hnsw_graph.py \
--dsn 'host=... dbname=...' \
--source-table my_graph_table \
--prefix my_hnsw \
--M 16 --ef-construction 200 \
--full-vectors --main-table my_sorted_heap_table --l0-dim 768
Calling the function
SELECT * FROM svec_hnsw_scan(
tbl := 'my_table'::regclass,
query := '[0.1, 0.2, ...]'::svec,
prefix := 'my_table_hnsw',
ef_search := 96, -- beam width for L0 traversal
lim := 10, -- results to return
rerank_topk := 48, -- candidates to exact-rerank (see below)
rerank1_topk := 0 -- dense r1 pre-filter (0 = disabled)
);
Enable the session-local cache for best latency (built once per session):
SET sorted_heap.hnsw_cache_l0 = on;
-- SQ8 quantization (default on, 4x memory saving for svec L0):
SET sorted_heap.hnsw_cache_sq8 = on; -- default
-- To disable SQ8 for maximum recall without rerank:
SET sorted_heap.hnsw_cache_sq8 = off;
rerank_topk semantics
rerank_topk controls how many L0 candidates are passed to exact svec cosine rerank. Exact rerank always runs when the L0 table has a src_tid column (which build_hnsw_graph.py always adds).
| rerank_topk value | Candidates reranked | Effect |
|---|---|---|
| 0 (default) | all ef_search | No truncation. Highest recall, ef_search TOAST reads. |
| 0 < rk < ef_search | rk | Truncates before rerank. Fewer TOAST reads, lower recall. |
| rk >= ef_search | all ef_search | No effect (same as 0). |
rerank_topk=0 does NOT skip exact rerank. It means “rerank all candidates”. To return results by sketch distance only (skipping TOAST reads entirely), the L0 table must omit the src_tid column — this is not the default build.
Recommended operating points (103K × 2880-dim, k8s 2 Gi pod)
hsvec(384) sketch L0:
| Goal | ef_search | rerank_topk | p50 latency | Recall@10 |
|---|---|---|---|---|
| Balanced | 96 | 48 | 1.02ms | 96.8% |
svec(D) hybrid L0 (SQ8 cache, default):
| Goal | ef_search | lim | rerank_topk | p50 latency | Recall@10 |
|---|---|---|---|---|---|
| Fastest top-1 | 32 | 1 | 1 | 0.51ms | — |
| Fast top-5 | 64 | 5 | 5 | 0.87ms | 98.8% |
| Fast top-10 | 96 | 10 | 10 | 1.25ms | 98.6% |
| Balanced top-10 | 96 | 10 | 20 | 1.35ms | 99.8% |
| Safe top-10 | 96 | 10 | 48 | 1.64ms | 99.8% |
| Rerank-all | 96 | 10 | 0 | 6.94ms | 99.8% |
Tuning rerank_topk for lowest latency: set rerank_topk = max(lim, 20) for 99.8% recall with minimal TOAST reads. Each TOAST read fetches one full svec(D) row (~11.5 KB for 2880-dim), so fewer reads = lower latency. The SQ8 cache navigates accurately enough that reranking just 20 candidates already achieves 99.8% recall — no need for 48 or more.
SQ8 quantizes float32 → uint8 per dimension in the session-local cache (4x memory savings). The streaming two-pass build avoids allocating a float32 intermediate buffer, so peak memory is just the SQ8 cache itself (283 MB for 103K × 2880-dim). This runs comfortably on 2 Gi pods. Set sorted_heap.hnsw_cache_sq8 = off only when memory is abundant and you need zero-rerank operation.
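The 283 MB figure follows directly from one byte per dimension per row (hypothetical helper name, just the arithmetic):

```python
def sq8_cache_mib(n_rows, dim):
    # SQ8 stores one uint8 per dimension: N x D bytes total.
    return n_rows * dim / 2**20

print(round(sq8_cache_mib(103_000, 2880)))  # -> 283, matching the text
```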
Measured with shared_buffers=512MB (2 Gi pod), warm cache, 50 queries. Requires sorted_heap.hnsw_cache_l0 = on. Cold first-call latency is 2–3x higher due to TOAST page faults and cache build.
Dense r1 pre-filter (rerank1_topk)
An optional intermediate stage using a {prefix}_r1 (nid int4 PK, rerank_vec hsvec(768)) sidecar. Set rerank1_topk > 0 to enable. The r1 stage scores all ef_search candidates via hsvec(768) cosine, keeps the closest Max(rerank1_topk, lim), then passes those to exact svec rerank.
On a warm TOAST pool, r1 provides marginal benefit. At ef=64, r1=24 saves ~0.3 ms but costs ~0.12 recall (9.74→9.62). At ef≥96 the r1 btree overhead exceeds the TOAST savings. r1 is most useful in cold-TOAST scenarios (first query of a session, or very large datasets where TOAST pages don’t fit in shared_buffers).
If {prefix}_r1 does not exist, the stage is silently skipped.
API reference
Training
| Function | Description |
|---|---|
| svec_ann_train(query, nlist, m) | Train IVF + PQ codebooks in one call |
| svec_ivf_train(query, nlist) | Train IVF centroids only |
| svec_pq_train(query, m) | Train raw PQ codebook |
| svec_pq_train_residual(query, m, ivf_cb_id) | Train residual PQ codebook |
Encoding
| Function | Description |
|---|---|
| svec_ivf_assign(vec, cb_id) | Assign vector to nearest IVF centroid → int2 |
| svec_pq_encode(vec, cb_id) | Encode vector as PQ code → bytea |
| svec_pq_encode_residual(vec, centroid_id, pq_cb_id, ivf_cb_id) | Encode residual as PQ code → bytea |
Search
| Function | Description |
|---|---|
| svec_hnsw_scan(tbl, query, prefix, ef_search, lim, rerank_topk, rerank1_topk) | Hierarchical HNSW via sidecar tables (sub-ms with cache) |
| svec_graph_scan(tbl, query, graph_tbl, entries_tbl, ef_search, lim, rerank_topk) | Flat NSW graph search |
| svec_ann_scan(tbl, query, nprobe, lim, rerank_topk, cb_id, ivf_cb_id, pq_column) | C-level IVF-PQ scan |
| svec_ann_search(tbl, query, nprobe, lim, rerank_topk, cb_id) | SQL-level IVF-PQ search |
| svec_ivf_probe(vec, nprobe, cb_id) | Return nearest nprobe centroid IDs |
Low-level distance
| Function | Description |
|---|---|
| svec_pq_distance_table(vec, cb_id) | Precompute M×256 distance table → bytea |
| svec_pq_distance_table_residual(vec, centroid_id, pq_cb_id, ivf_cb_id) | Distance table for residual PQ |
| svec_pq_adc_lookup(dist_table, pq_code) | ADC distance from precomputed table |
| svec_pq_adc(vec, pq_code, cb_id) | ADC distance (builds table internally) |
| svec_cosine_distance(a, b) | Exact cosine distance (also available as <=>) |