Quick Start
Requirements
- PostgreSQL 17 or 18
- Standard PGXS build toolchain (pg_config in PATH)
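A quick pre-flight check can confirm both requirements before building. The sketch below is not part of the extension: the helper names `pg_major` and `pg_major_supported` are hypothetical, and it assumes `pg_config --version` prints a string that starts with `PostgreSQL` followed by the major version (for example `PostgreSQL 17.2`).

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a
# `pg_config --version` string such as "PostgreSQL 17.2".
pg_major() {
    printf '%s\n' "$1" | sed -n 's/^PostgreSQL \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed only for the majors listed under Requirements.
pg_major_supported() {
    case "$(pg_major "$1")" in
        17|18) return 0 ;;
        *)     return 1 ;;
    esac
}

# Check whichever pg_config comes first in PATH.
if command -v pg_config >/dev/null 2>&1; then
    ver=$(pg_config --version)
    if pg_major_supported "$ver"; then
        echo "OK: $ver"
    else
        echo "Unsupported: $ver (need PostgreSQL 17 or 18)" >&2
    fi
else
    echo "pg_config not found in PATH" >&2
fi
```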
Build and install
git clone https://github.com/skuznetsov/pg_sorted_heap.git
cd pg_sorted_heap
make && make install
To build against a specific PostgreSQL version, pass that installation's pg_config as PG_CONFIG:
make PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
make install PG_CONFIG=/usr/lib/postgresql/17/bin/pg_config
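When several PostgreSQL versions are installed side by side, the same two commands can be wrapped in a loop. This is a minimal sketch rather than a script shipped with the project: `build_for` is a hypothetical helper, the Debian-style paths are an assumption carried over from the example above, and the `make clean` step relies on the standard PGXS `clean` target so that objects built for one major version are not reused for another.

```shell
#!/bin/sh
# Hypothetical helper: clean, build, and install against one pg_config.
build_for() {
    pg_config_path=$1
    make clean PG_CONFIG="$pg_config_path" &&
    make PG_CONFIG="$pg_config_path" &&
    make install PG_CONFIG="$pg_config_path"
}

# Assumed Debian-style install paths; adjust to your system.
for v in 17 18; do
    cfg="/usr/lib/postgresql/$v/bin/pg_config"
    if [ -x "$cfg" ]; then
        build_for "$cfg" || echo "build failed for PostgreSQL $v" >&2
    else
        echo "skipping PostgreSQL $v: $cfg not found" >&2
    fi
done
```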
Create a sorted_heap table
CREATE EXTENSION pg_sorted_heap;
CREATE TABLE events (
id int PRIMARY KEY,
ts timestamptz,
payload text
) USING sorted_heap;
Load data
The COPY path (multi_insert) automatically sorts each batch by PK:
INSERT INTO events
SELECT i, now() - (i || ' seconds')::interval, repeat('x', 80)
FROM generate_series(1, 100000) i;
Compact
Compaction rewrites all data in globally sorted PK order and builds the zone map:
SELECT sorted_heap_compact('events'::regclass);
For non-blocking compaction on a live system:
CALL sorted_heap_compact_online('events'::regclass);
Verify scan pruning
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE id BETWEEN 500 AND 600;
Output:
Custom Scan (SortedHeapScan) on events
Filter: ((id >= 500) AND (id <= 600))
Zone Map: 2 of 1946 blocks (pruned 1944)
Buffers: shared hit=2
The zone map pruned 1,944 of 1,946 blocks – only 2 blocks were read.
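To track pruning over time, the ratio can be pulled out of the plan text. The sketch below assumes the `Zone Map: N of M blocks (pruned P)` line format shown in the output above; `zone_map_pruned_pct` is a hypothetical helper, not something the extension provides.

```shell
#!/bin/sh
# Hypothetical helper: read EXPLAIN text on stdin and print the percentage
# of blocks pruned, taken from the "Zone Map: N of M blocks (pruned P)" line.
zone_map_pruned_pct() {
    awk '
        /Zone Map:/ {
            total  = $5            # M: total blocks
            pruned = $8            # "P)": pruned blocks plus closing paren
            sub(/\)/, "", pruned)
            if (total + 0 > 0) printf "%.1f\n", 100 * pruned / total
        }'
}

# Example (requires a running server with the events table):
#   psql -X -At -c "EXPLAIN (ANALYZE, BUFFERS)
#                   SELECT * FROM events WHERE id BETWEEN 500 AND 600" \
#     | zone_map_pruned_pct
```

For the plan shown above this prints 99.9, since 1944 of 1946 blocks were skipped.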
Run tests
make installcheck # regression tests
make test-crash-recovery # crash recovery (4 scenarios)
make test-concurrent # concurrent DML + online ops
make test-toast # TOAST integrity + concurrent guard
make test-alter-table # ALTER TABLE DDL (36 checks)
make test-dump-restore # pg_dump/restore lifecycle (10 checks)
make test-graph-builder # graph sidecar bootstrap + rebuild smoke
make test-pg-upgrade # pg_upgrade 17->18 (13 checks)
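To run every target in one pass and see which ones failed, a small wrapper is enough. This is a sketch under the assumption that each target above exits non-zero on failure; `run_all_tests` is a hypothetical name, and the target list is copied from the commands above.

```shell
#!/bin/sh
# Hypothetical runner: invoke each test target listed above and collect
# the names of those that fail, instead of stopping at the first failure.
run_all_tests() {
    failed=""
    for target in installcheck test-crash-recovery test-concurrent \
                  test-toast test-alter-table test-dump-restore \
                  test-graph-builder test-pg-upgrade; do
        if ! make "$target"; then
            failed="$failed $target"
        fi
    done
    if [ -n "$failed" ]; then
        echo "FAILED:$failed"
        return 1
    fi
    echo "all targets passed"
}
```

Call `run_all_tests` from the repository root; its exit status is non-zero if any target failed, which makes it usable as a CI step.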
Stable GraphRAG quick start
The stable 0.13 GraphRAG surface is intentionally narrow: fact-shaped retrieval over a sorted_heap table clustered by (entity_id, relation_id, target_id).
CREATE EXTENSION pg_sorted_heap;
CREATE TABLE facts (
entity_id int4,
relation_id int2,
target_id int4,
embedding svec(384),
payload text,
PRIMARY KEY (entity_id, relation_id, target_id)
) USING sorted_heap;
CREATE INDEX facts_embedding_idx
ON facts USING sorted_hnsw (embedding)
WITH (m = 24, ef_construction = 200);
SET sorted_hnsw.ef_search = 128;
One-hop retrieval:
SELECT *
FROM sorted_heap_graph_rag(
'facts'::regclass,
'[0.1,0.2,0.3,...]'::svec,
relation_path := ARRAY[1],
ann_k := 64,
top_k := 10,
score_mode := 'endpoint'
);
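The query vector in these examples is elided, so the statements cannot be pasted as-is. For a smoke test, a placeholder literal with one component per dimension of the column (384 for `svec(384)` here) can be generated. The sketch below assumes only the bracketed, comma-separated text form shown above; `svec_literal` is a hypothetical helper, and a constant vector is useful for checking that the call runs, not for judging retrieval quality.

```shell
#!/bin/sh
# Hypothetical helper: print a bracketed, comma-separated vector literal
# with N components, all set to the same value (default 0.1).
svec_literal() {
    awk -v n="$1" -v val="${2:-0.1}" 'BEGIN {
        printf "["
        for (i = 1; i <= n; i++) printf "%s%s", (i > 1 ? "," : ""), val
        print "]"
    }'
}

# Example: a 384-component placeholder for the facts.embedding column.
qvec=$(svec_literal 384)
```

One way to use it is to pass the value with psql's `-v qvec="$qvec"` option and reference it as `:'qvec'::svec` in a script read from a file or standard input (psql does not interpolate variables in `-c` commands).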
Two-hop path-aware retrieval:
SELECT *
FROM sorted_heap_graph_rag(
'facts'::regclass,
'[0.1,0.2,0.3,...]'::svec,
relation_path := ARRAY[1, 2],
ann_k := 64,
top_k := 10,
score_mode := 'path'
);
If your fact table uses different column names, register the mapping once:
SELECT sorted_heap_graph_register(
'facts_alias'::regclass,
entity_column := 'src_id',
relation_column := 'edge_type',
target_column := 'dst_id',
embedding_column := 'vec',
payload_column := 'body'
);
For stage-level tuning, reset the backend-local stats, run the query in the same session, and then inspect the stats from that last call:
SELECT sorted_heap_graph_rag_reset_stats();
SELECT *
FROM sorted_heap_graph_rag(
'facts'::regclass,
'[0.1,0.2,0.3,...]'::svec,
relation_path := ARRAY[1, 2],
ann_k := 64,
top_k := 10,
score_mode := 'path'
);
SELECT * FROM sorted_heap_graph_rag_stats();