GraphRAG 0.13 plan
This document narrows the release target for GraphRAG.
The goal is not to ship a general-purpose graph database API in 0.13. The goal is to ship a stable fact-shaped GraphRAG contract on top of the already-verified sorted_heap + sorted_hnsw path.
Current status: all listed release gates below are now covered for the narrow fact-shaped 0.13 surface. The remaining work is release bundling and documentation clarity, not new release-critical hardening.
Stable target
The stable surface for 0.13 is:
SELECT *
FROM sorted_heap_graph_rag(
    'facts'::regclass,
    '[0.1,0.2,0.3,...]'::svec,
    relation_path := ARRAY[1, 2],
    ann_k         := 64,
    top_k         := 10,
    score_mode    := 'path'
);
Contract:
- fact rows clustered by (entity_id, relation_id, target_id), or by an equivalent registered alias mapping
- ANN seed retrieval on entity_id
- relation_path is a non-empty per-hop relation sequence
- score_mode = 'endpoint' | 'path'
- limit_rows = 0 means unlimited helper work; positive values cap expansion/rerank work and do not replace the final top_k contract
- exact rerank on the expanded candidate set
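To make the per-hop reading of relation_path concrete, here is a minimal plain-PostgreSQL sketch of a two-hop expansion for relation_path := ARRAY[1, 2]. It is a model of the contract, not the extension's implementation: the facts_demo table, its sample rows, and the hard-coded seed are hypothetical, and it assumes that each hop starts from the target_id values reached by the previous hop. The ANN and rerank stages are left out.

```sql
-- Hypothetical demo table using the canonical key columns only.
CREATE TEMP TABLE facts_demo (
    entity_id   bigint  NOT NULL,
    relation_id integer NOT NULL,
    target_id   bigint  NOT NULL,
    PRIMARY KEY (entity_id, relation_id, target_id)
);

INSERT INTO facts_demo VALUES
    (1,  1, 10),
    (1,  2, 11),
    (10, 2, 100),
    (10, 3, 101),
    (2,  1, 20),
    (20, 2, 200);

-- Pretend the ANN stage selected entity 1 as the only seed.
WITH seeds(entity_id) AS (
    VALUES (1::bigint)
),
hop1 AS (
    -- Hop 1: facts of the seed entities, filtered by relation_path[1] = 1.
    SELECT f.entity_id, f.target_id
    FROM facts_demo f
    JOIN seeds s ON s.entity_id = f.entity_id
    WHERE f.relation_id = 1
),
hop2 AS (
    -- Hop 2: continue from hop-1 targets, filtered by relation_path[2] = 2.
    -- Assumption: hop N+1 joins on the target_id reached by hop N.
    SELECT h1.entity_id AS seed_id,
           h1.target_id AS via_id,
           f.target_id  AS endpoint_id
    FROM hop1 h1
    JOIN facts_demo f ON f.entity_id = h1.target_id
    WHERE f.relation_id = 2
)
SELECT seed_id, via_id, endpoint_id
FROM hop2;
```

With this sample data the query returns the single path 1 -> 10 -> 100. The rows for entity 2 are never visited because entity 2 is not a seed, and (10, 3, 101) is dropped by the second-hop relation filter.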
Semantics:
- relation_path := ARRAY[1]
  - one-hop expansion
  - exact rerank on the endpoint fact
  - score_mode := 'path' is intentionally equivalent to 'endpoint'
- relation_path := ARRAY[1, 2], score_mode := 'endpoint'
  - two-hop expansion
  - exact rerank on the second-hop endpoint only
- relation_path := ARRAY[1, 2], score_mode := 'path'
  - two-hop expansion
  - path-aware rerank using hop-1 and hop-2 evidence together
- relation_path := ARRAY[1, 2, 3, ...]
  - explicit multi-hop expansion
  - each array element is the relation filter for that hop
  - score_mode := 'endpoint' ranks only the final hop
  - score_mode := 'path' accumulates evidence across the whole path
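The combinations above map onto calls like the following. This is a sketch that reuses only the parameters named in this document. The query vector stays elided exactly as in the stable-surface example, so the statements need a real vector before they run. Passing limit_rows in named notation like the other knobs, and the value 1000, are assumptions.

```sql
-- One hop: score_mode 'endpoint' and 'path' are documented as equivalent here.
SELECT *
FROM sorted_heap_graph_rag(
    'facts'::regclass,
    '[0.1,0.2,0.3,...]'::svec,      -- elided query vector
    relation_path := ARRAY[1],
    ann_k         := 64,
    top_k         := 10,
    score_mode    := 'endpoint'
);

-- Two hops, exact rerank on the second-hop endpoint only.
SELECT *
FROM sorted_heap_graph_rag(
    'facts'::regclass,
    '[0.1,0.2,0.3,...]'::svec,
    relation_path := ARRAY[1, 2],
    ann_k         := 64,
    top_k         := 10,
    score_mode    := 'endpoint'
);

-- Two hops, path-aware rerank, with a cap on expansion/rerank work.
-- Assumption: limit_rows is passed in named notation; 1000 is illustrative.
SELECT *
FROM sorted_heap_graph_rag(
    'facts'::regclass,
    '[0.1,0.2,0.3,...]'::svec,
    relation_path := ARRAY[1, 2],
    ann_k         := 64,
    top_k         := 10,
    score_mode    := 'path',
    limit_rows    := 1000           -- caps helper work; top_k still bounds the result
);
```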
What stays beta
These remain beta even after the new syntax lands:
- unified routed GraphRAG dispatcher (recommended app entry point for multi-shard workloads):
  - sorted_heap_graph_route(...): single query entry point
  - sorted_heap_graph_route_plan(...): routing introspection
  - routing setup helpers (exact-key, range, policy, profile, default)
- lower-level helper zoo:
  - sorted_heap_expand_ids(...)
  - sorted_heap_expand_rerank(...)
  - sorted_heap_expand_twohop_rerank(...)
  - sorted_heap_expand_twohop_path_rerank(...)
  - sorted_heap_expand_multihop_rerank(...)
  - sorted_heap_expand_multihop_path_rerank(...)
  - sorted_heap_graph_rag_scan(...)
  - sorted_heap_graph_rag_twohop_scan(...)
  - sorted_heap_graph_rag_twohop_path_scan(...)
  - sorted_heap_graph_rag_multihop_scan(...)
  - sorted_heap_graph_rag_multihop_path_scan(...)
- code-corpus contracts that currently live in benchmark/harness logic:
  - prompt-focused snippet selection
  - prompt-symbol rescue
  - compact lexical rescue
- external-corpus rescue paths that are quality-correct but still much slower than the primary in-repo frontier
Why this syntax
The existing beta surface works, but it is a function zoo.
sorted_heap_graph_rag(...) is the stable-facing layer because it:
- collapses the public fact-graph contract to one entry point
- keeps the fast path on top of already-verified helper/wrapper internals
- fixes the semantic mismatch of the older one-hop wrapper for fact graphs by seeding one-hop expansion from ANN-selected entity_id values instead of target_id
- gives PostgreSQL users a query shape that is closer to the current sorted_hnsw experience: one primary entry point, with a few meaningful knobs
Release gates for 0.13
All of the following are now covered for the narrow fact-shaped contract:
- Surface freeze
  - sorted_heap_graph_rag(...) is the documented primary entry point for fact-shaped GraphRAG
  - older wrappers remain available but are documented as lower-level building blocks
- Lifecycle hardening
  - dump/restore coverage, including shared/default segment_labels persistence in the segmented/routed control plane
  - crash recovery coverage
  - extension upgrade coverage
  - concurrent DML/compact interaction checks on GraphRAG-shaped tables
- Observability
  - implemented via:
    - sorted_heap_graph_rag_stats()
    - sorted_heap_graph_rag_reset_stats()
  - current stats include:
    - seed count
    - expanded row count
    - reranked row count
    - returned row count
    - per-stage timing for ANN, expansion, rerank
  - current scope is backend-local last-call observability, which is enough for release tuning and debugging but not a full tracing system
- Larger real-corpus verification
  - current progress:
    - the smaller in-repo cogniformerus slice is already repeated-build stable at 100.0% / 100.0%
    - a larger in-repo transfer gate on the full ~/Projects/Crystal/cogniformerus repository now also passes repeated-build verification once the final result budget is raised from top_k=4 to top_k=8
    - a first mixed-language gate on ~/Projects/C/pycdc now also runs under the same harness family via JSON fixtures + configurable source extensions + quoted include-edge extraction
    - on that corpus, the fast generic point is repeated-build stable but only partial (90.0% / 60.0%), while the code-aware helper-backed compact include rescue is repeated-build stable at 100.0% / 100.0%
    - a first archive-side gate on ~/SrcArchives/apple/ninja/src is now also repeated-build stable:
      - generic prompt_summary_snippet_py closes at 100.0% / 100.0% with top_k=12
      - code-aware prompt_summary_snippet_py remains partial there
  - the scoped 0.13 larger real-corpus gate is now covered across:
    - ~/Projects/Crystal
    - ~/Projects/C
    - ~/SrcArchives
- Non-canonical schema story
  - non-canonical fact schemas are now supported via:
    - sorted_heap_graph_register(...)
    - sorted_heap_graph_config(...)
    - sorted_heap_graph_unregister(...)
  - remaining work is hardening and documentation, not naming flexibility
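For the observability gate, a tuning session might look like the sketch below. It uses only the two stats functions named above and the documented call shape. The return shape of sorted_heap_graph_rag_stats() is not specified here, so the query simply selects everything it returns. The query vector stays elided, as in the stable-surface example.

```sql
-- Clear the backend-local counters before the measured call.
SELECT sorted_heap_graph_rag_reset_stats();

-- The call being measured.
SELECT *
FROM sorted_heap_graph_rag(
    'facts'::regclass,
    '[0.1,0.2,0.3,...]'::svec,      -- elided query vector
    relation_path := ARRAY[1, 2],
    ann_k         := 64,
    top_k         := 10,
    score_mode    := 'path'
);

-- Inspect the last call: seed, expanded, reranked and returned row counts,
-- plus per-stage timing for ANN, expansion and rerank.
SELECT * FROM sorted_heap_graph_rag_stats();
```

Because the stats are backend-local and describe the last call, all three statements have to run in the same session. A pooled connection that switches backends between statements would read another backend's counters.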
Implementation phases
Phase 1: unified syntax
Done in this branch:
- introduce sorted_heap_graph_rag(...)
- document the stable contract separately from the experimental code-corpus contracts
Phase 2: hardening
Implemented in this branch:
- dump/restore coverage for registered GraphRAG alias mappings
- extension upgrade coverage for 0.12.0 -> 0.13.0
- persistence coverage for sorted_heap_graph_registry across pg_dump/restore
- persistence coverage for the segmented/routed GraphRAG control plane across pg_dump/restore:
  - shared shard metadata
  - shared segment_labels
  - range routing
  - exact-key routing
  - route policies
  - route profiles
  - route defaults
  - effective default segment_labels
- crash recovery coverage for registered GraphRAG alias schemas
- concurrent DML / online compact / online merge coverage for registered GraphRAG alias schemas
Still needed:
- no additional release-critical hardening is currently required for the narrow 0.13 fact-graph stable target
- remaining work is release packaging and keeping the stable/beta/reference split explicit in the public docs
Phase 3: schema registration
Implemented in this branch:
- register graph metadata for non-canonical column names
- keep sorted_heap_graph_rag(...) syntax stable while relaxing the schema naming constraint
- regression coverage now includes an alias schema: src_id / edge_type / dst_id / vec / body
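One way to picture what an alias mapping has to express is the column correspondence below, written as a plain-PostgreSQL view. This is only a conceptual sketch: the registry is not claimed to be implemented as a view, and the arguments of sorted_heap_graph_register(...) are not shown in this document. The edges_demo table, the real[] stand-in for svec, and the positional correspondence of src_id, edge_type and dst_id to the canonical key columns are all assumptions.

```sql
-- Hypothetical alias-shaped table. real[] stands in for the extension's
-- svec type so that the sketch runs on stock PostgreSQL.
CREATE TEMP TABLE edges_demo (
    src_id    bigint  NOT NULL,
    edge_type integer NOT NULL,
    dst_id    bigint  NOT NULL,
    vec       real[],
    body      text
);

-- Assumed role of each alias column in the canonical fact shape.
CREATE TEMP VIEW edges_as_facts AS
SELECT src_id    AS entity_id,     -- ANN seeds are retrieved on this key
       edge_type AS relation_id,   -- matched against relation_path per hop
       dst_id    AS target_id,     -- endpoint reached by the hop
       vec,                        -- assumed: vector used for ANN and rerank
       body                        -- assumed: payload returned to the caller
FROM edges_demo;
```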
Phase 4: code-corpus productization
Future work, not required for 0.13 fact-graph stable:
- move snippet/symbol/lexical rescue logic from benchmark harnesses into a coherent user-facing API, or
- keep it explicitly documented as reference logic instead of product surface