pg_ripple — Roadmap

From 0.1.0 (foundation) to 1.0.0 (production-ready triple store)

Authority rule: plans/implementation_plan.md is the authoritative description of the eventual target architecture. This roadmap is the delivery sequence for that architecture. If a milestone summary here conflicts with the implementation plan, the implementation plan wins and the roadmap should be updated to match it.

How to read this roadmap

Each release below has two layers:

The plain-language summary (in the coloured box) explains what the release delivers and why it matters — no programming knowledge required.
The technical deliverables list the specific items developers will build. Feel free to skip these if you're reading for the big picture.

Effort estimates are given as person-weeks — e.g. "6–8 pw" means the release would take roughly 6–8 weeks for a single full-time developer, or 3–4 weeks for a pair working together. The total estimated effort from v0.1.0 to v1.0.0 is 275–376 person-weeks (~63–86 months for one developer; ~32–43 months for a pair).
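The calendar-time figures above follow from simple arithmetic. The sketch below is a minimal check, assuming an average of about 4.35 weeks per calendar month and an even split of work between developers. Both are assumptions for illustration; the roadmap does not state the conversion factor it used.

```python
# Assumed conversion factor: 365.25 days / 7 days per week / 12 months,
# which is about 4.35 weeks per calendar month.
WEEKS_PER_MONTH = 365.25 / 7 / 12


def calendar_months(person_weeks: float, developers: int = 1) -> float:
    """Convert an effort estimate in person-weeks into elapsed calendar months.

    Assumes the work divides evenly between developers, as the roadmap's
    "pair" figures do.
    """
    return person_weeks / developers / WEEKS_PER_MONTH


if __name__ == "__main__":
    for developers in (1, 2):
        low = calendar_months(275, developers)
        high = calendar_months(376, developers)
        print(f"{developers} developer(s): about {low:.1f} to {high:.1f} months")
```

Under these assumptions the results land within about a month of the ranges quoted above; the small differences come from rounding and the choice of conversion factor.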

"optional at runtime" items: some deliverables are annotated (optional at runtime — X must be installed). This means the feature depends on an external extension (e.g. pg_trickle) that may not be installed in every deployment. The feature is required by this roadmap and must be implemented; the Rust code gates on a runtime availability check and degrades gracefully (returns 0 / false / empty, emits a WARNING, never raises an ERROR) when the dependency is absent. These items are not optional from a delivery standpoint.

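The graceful-degradation rule for "optional at runtime" items can be sketched as follows. This is an illustration in Python, not the extension's Rust code: the `requires_extension` decorator, the `refresh_stream_tables` function, and the `installed` set standing in for a runtime availability check are all hypothetical names.

```python
import functools
import warnings


def requires_extension(name, fallback):
    """Gate a feature on an optional dependency (hypothetical helper).

    If the dependency is absent, emit a warning and return `fallback`
    (0 / False / an empty value) instead of raising an error.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(installed, *args, **kwargs):
            if name not in installed:
                warnings.warn(
                    f"{name} is not installed; {func.__name__}() returns {fallback!r}",
                    RuntimeWarning,
                    stacklevel=2,
                )
                return fallback
            return func(installed, *args, **kwargs)
        return wrapper
    return decorator


@requires_extension("pg_trickle", fallback=0)
def refresh_stream_tables(installed, table_names):
    """Pretend to refresh stream tables; returns how many were refreshed."""
    return len(table_names)
```

Calling `refresh_stream_tables(set(), ["a", "b"])` warns and returns `0`, while `refresh_stream_tables({"pg_trickle"}, ["a", "b"])` runs the feature and returns `2`. The fallback should be an immutable value, since the same object is returned on every degraded call.
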
Overview at a glance

Version	Name	What it delivers (one sentence)	Effort
0.1.0	Foundation	Install the extension; store and retrieve facts (vertical-partitioning (VP) storage from day one)	6–8 pw
0.2.0	Bulk Loading & Named Graphs	Bulk data import, named graphs, rare-predicate consolidation, N-Triples export	6–8 pw
0.3.0	SPARQL Basic	Ask questions in the standard RDF query language (incl. GRAPH patterns)	6–8 pw
0.4.0	RDF-star / Statement IDs	Make statements about statements; LPG-ready storage	8–10 pw
0.5.0	SPARQL Advanced (Query)	Property paths, aggregates, UNION/MINUS, subqueries, BIND/VALUES	6–8 pw
0.5.1	SPARQL Advanced (Storage & Write)	Inline encoding, CONSTRUCT/DESCRIBE, INSERT/DELETE DATA, FTS	6–8 pw
0.6.0	HTAP Architecture	Heavy reads and writes at the same time; shared-memory cache	8–10 pw
0.7.0	SHACL Core + Deduplication	Define data quality rules; reject bad data on insert; on-demand and merge-time triple deduplication	5–7 pw
0.8.0	SHACL Advanced	Complex data quality rules with background checking	4–6 pw
0.9.0	Serialization	Import and export data in all standard RDF file formats	3–4 pw
0.10.0	Datalog Reasoning	Automatically derive new facts from rules and logic	10–12 pw
0.11.0	SPARQL & Datalog Views	Live, always-up-to-date dashboards from SPARQL and Datalog queries	5–7 pw
0.12.0	SPARQL Update (Advanced)	Pattern-based updates and graph management commands	3–4 pw
0.13.0	Performance	Speed tuning, benchmarks, production-grade throughput	6–8 pw
0.14.0	Admin & Security	Operations tooling, access control, docs, packaging	4–6 pw
0.15.0	SPARQL Protocol	Standard HTTP API, graph-aware loaders and deletes as SQL functions	3–4 pw
0.16.0	SPARQL Federation	Query remote SPARQL endpoints alongside local data	4–6 pw
0.17.0	JSON-LD Framing	Frame-driven CONSTRUCT queries producing nested JSON-LD	3–4 pw
0.18.0	SPARQL CONSTRUCT & ASK Views	Materialize CONSTRUCT and ASK queries as live, incrementally-updated stream tables	2–3 pw
0.19.0	Federation Performance	Connection pooling, result caching, query rewriting, and batching for remote SPARQL endpoints	3–5 pw
0.20.0	W3C Conformance & Stability	W3C SPARQL 1.1 and SHACL Core test suite compliance, crash recovery and memory safety hardening, security audit initiation	5–7 pw
0.21.0	SPARQL Built-in Functions & Query Correctness	Implement all ~40 missing SPARQL 1.1 built-in functions, fix the FILTER silent-drop hazard, and close critical query-semantics bugs	6–8 pw
0.22.0	Storage Correctness & Security Hardening	Fix HTAP merge race conditions, dictionary cache rollback, shmem cache thrashing, rare-predicate promotion race, and HTTP service security gaps	6–8 pw
0.23.0	SHACL Core Completion & SPARQL Diagnostics	Complete the SHACL constraint set, add SPARQL query introspection, and fix Datalog/JSON-LD correctness issues	6–8 pw
0.24.0	Semi-naive Datalog & Performance Hardening	Implement semi-naive evaluation for Datalog rules, complete the OWL RL rule set, batch-decode large result sets, and bound property-path depth	6–8 pw
0.25.0	GeoSPARQL & Architectural Polish	Add GeoSPARQL 1.1 geometry primitives, stabilise the internal catalog against OID drift, and close remaining medium- and low-priority issues	6–8 pw
0.26.0	GraphRAG Integration	First-class integration with Microsoft GraphRAG: BYOG Parquet export, Datalog-enriched entity graphs, SHACL quality enforcement, and a Python CLI bridge	4–6 pw
0.27.0	Vector + SPARQL Hybrid: Foundation	Core pgvector integration — embedding table, HNSW index, `pg:similar()` SPARQL function, bulk embedding, and hybrid retrieval modes	5–7 pw
0.28.0	Advanced Hybrid Search & RAG Pipeline	Production-grade RRF fusion, incremental embedding worker, graph-contextualized embeddings, and end-to-end RAG retrieval	5–8 pw
0.29.0	Datalog Optimization: Magic Sets & Cost-Based Compilation	Goal-directed inference via magic sets, cost-based body atom reordering, subsumption checking, anti-join negation, filter pushdown, delta table indexing	5–7 pw
0.30.0	Datalog Aggregation & Compiled Rule Plans	Aggregation in rule bodies (Datalog^agg), SQL plan caching across inference runs, SPARQL on-demand query speedup	5–7 pw
0.31.0	Entity Resolution & Demand Transformation	`owl:sameAs` entity canonicalization, demand transformation for goal-directed rule rewriting, SPARQL query planner integration	5–7 pw
0.32.0	Well-Founded Semantics & Tabling	Three-valued semantics for cyclic ontologies, subsumptive result caching for Datalog and SPARQL repeated sub-queries	5–7 pw
0.33.0	Documentation Site & Content Overhaul	Complete docs site rebuild — CI harness, eight feature-deep-dive chapters, operations guide, reference section, and content governance	8–12 pw
0.34.0	Bounded-Depth Termination & Incremental Retraction (DRed)	Early fixpoint termination for bounded hierarchies (20–50% faster SPARQL property paths); Delete-Rederive for write-correct materialized predicates	5–7 pw
0.35.0	Parallel Stratum Evaluation & Incremental Rule Updates	Background-worker parallelism for independent rules (2–5× faster materialization); add/remove rules without full recompute	5–7 pw
0.36.0	Worst-Case Optimal Joins & Lattice-Based Datalog	Leapfrog Triejoin for cyclic SPARQL patterns (10×–100× speedup); Datalog^L monotone lattice aggregation	6–9 pw
0.37.0	Storage Concurrency Hardening & Error Safety	Fix HTAP merge race, rare-predicate promotion race, dictionary cache rollback; eliminate all hard panics; add GUC validators	9–11 pw
0.38.0	Architecture Refactoring & Query Completeness	Split god-module, PredicateCatalog trait, batch encoding, SCBD, SPARQL Update completeness, SHACL hints in planner	9–11 pw
0.39.0	Datalog HTTP API	REST API exposing all 27 Datalog SQL functions in `pg_ripple_http`: rule management, inference, goal queries, constraints, admin	3–5 pw
0.40.0	Streaming Results, Explain & Observability	Server-side SPARQL cursors, `explain_sparql()`, `explain_datalog()`, OpenTelemetry tracing, resource governors	9–11 pw
0.41.0	Full W3C SPARQL 1.1 Test Suite	Complete W3C SPARQL 1.1 Query + Update + Graph Patterns + Aggregates test suite harness with parallelized execution; 3,000+ tests in < 2 min CI	5–7 pw
0.42.0	Parallel Merge, Cost-Based Federation & Live CDC	Multi-worker HTAP merge, FedX-style federation planner, parallel SERVICE, live RDF change subscriptions	10–12 pw
0.43.0	WatDiv + Jena Conformance Suite	Apache Jena edge-case tests (~1,000) and WatDiv scale-correctness benchmark (10M+ triples, star/chain/snowflake/complex patterns); 90% harness reuse from v0.41.0	5–7 pw
0.44.0	LUBM Conformance Suite	Lehigh University Benchmark — OWL RL inference correctness across 14 canonical queries on 1K–8M triple datasets; includes Datalog API validation sub-suite for rule compilation, iteration tracking, inferred triples, goal queries, and performance baseline	3–5 pw
0.45.0	SHACL Completion, Datalog Robustness & Crash Recovery	Close remaining SHACL Core gaps (`sh:equals`/`sh:disjoint`, decoded violation IRIs, async load test), harden parallel Datalog strata rollback, add missing crash-recovery scenarios, and standardise migration documentation	4–6 pw
0.46.0	Property-Based Testing, Fuzz Hardening & OWL 2 RL Conformance	`proptest` for SPARQL and dictionary invariants, fuzz the federation result decoder, W3C OWL 2 RL test suite in CI, TopN push-down, BSBM regression gate, sequence pre-allocation for Datalog workers, rustdoc coverage enforcement, and HTTP certificate pinning	5–7 pw
0.47.0	SHACL Truthfulness, Dead-Code Activation & Architecture Refactor	Fix parsed-but-not-checked SHACL constraints, wire `preallocate_sid_ranges()`, finish the `sparql/translate/` module split, add 5 fuzz targets, 4 crash-recovery scenarios, cache hit-rate SRFs, GUC validators, and security hygiene	8–10 pw
0.48.0	SHACL Core Completeness, OWL 2 RL Closure & SPARQL Completeness	Complete all 35 SHACL Core constraints and complex `sh:path` expressions, close the OWL 2 RL rule set, add SPARQL Update MOVE/COPY/ADD, fix SPARQL-star variable patterns, WatDiv baselines, and operational hardening	6–8 pw
0.49.0	AI & LLM Integration	`sparql_from_nl()` NL-to-SPARQL via configurable LLM endpoint; `suggest_sameas()` and `apply_sameas_candidates()` for embedding-based entity alignment	4–6 pw
0.50.0	Developer Experience & GraphRAG Polish	VS Code extension with SPARQL/SHACL/Datalog support and query runner; `explain_sparql(analyze:=true)` debugger; `rag_context()` RAG pipeline	5–7 pw
1.0.0	Production Release	Standards conformance, stress testing, security audit	6–8 pw
		Total estimated effort	275–376 pw

v0.1.0 — Foundation

Theme: Core data model, dictionary encoding, and basic triple CRUD.

In plain language: This is the "hello world" release. After installing pg_ripple into a PostgreSQL database, a user can store facts (called triples — think "subject → relationship → object", e.g. "Alice → knows → Bob") and retrieve them by pattern. No query language yet — just the basic building blocks. Internally, every piece of text (names, URLs, values) is converted to a compact number for fast storage and comparison. This release also sets up automated testing so that every future change is verified.
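The two ideas in this summary, dictionary encoding and retrieval by pattern, can be illustrated with a small in-memory model. This is a sketch only: the class and method names are hypothetical, and plain Python dictionaries stand in for the PostgreSQL tables the extension actually uses. Triples are grouped per predicate, loosely mirroring the VP storage named in the overview.

```python
from collections import defaultdict


class Dictionary:
    """Maps each distinct term (name, URL, value) to a compact integer ID."""

    def __init__(self):
        self._ids = {}
        self._terms = []

    def encode(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self._ids:
            self._ids[term] = len(self._terms)
            self._terms.append(term)
        return self._ids[term]

    def lookup(self, term):
        # Like encode(), but never creates a new ID.
        return self._ids.get(term)

    def decode(self, term_id):
        return self._terms[term_id]


class TripleStore:
    def __init__(self):
        self.dictionary = Dictionary()
        self._vp = defaultdict(set)  # predicate ID -> {(subject ID, object ID)}

    def insert(self, s, p, o):
        enc = self.dictionary.encode
        self._vp[enc(p)].add((enc(s), enc(o)))

    def find(self, s=None, p=None, o=None):
        """Return matching triples; None acts as a wildcard."""
        ids = {}
        for name, term in (("s", s), ("p", p), ("o", o)):
            if term is not None:
                term_id = self.dictionary.lookup(term)
                if term_id is None:
                    return []  # a term never stored cannot match anything
                ids[name] = term_id
        predicates = [ids["p"]] if "p" in ids else list(self._vp)
        dec = self.dictionary.decode
        results = []
        for pid in predicates:
            for sid, oid in self._vp.get(pid, ()):
                # Comparisons happen on integers, not on the original text.
                if ids.get("s", sid) == sid and ids.get("o", oid) == oid:
                    results.append((dec(sid), dec(pid), dec(oid)))
        return sorted(results)


if __name__ == "__main__":
    store = TripleStore()
    store.insert("Alice", "knows", "Bob")
    store.insert("Bob", "likes", "Alice")
    print(store.find(s="Alice"))
```

All matching is done on integer IDs; text is only touched when encoding input and decoding results, which is the point of dictionary encoding.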

Effort estimate: 6–8 person-weeks


JSON-LD framing feature support (v0.17.0)

Feature	Supported	Notes
`@type` matching	✓	Single IRI or array of IRIs
`@id` matching	✓	Single IRI or array of IRIs
Property wildcard `{}`	✓	Matches any value for a property
Absent-property pattern `[]`	✓	Matches nodes lacking the property
`@reverse` properties	✓	Flipped triple pattern in CONSTRUCT
`@embed`: `@once` / `@always` / `@never`	✓	Full embedding control
`@explicit` inclusion flag	✓	Omit unlisted properties from output
`@omitDefault` flag	✓	Omit null-valued absent properties
`@default` values	✓	Substitute defaults for absent properties
`@requireAll` flag	✓	Turns OPTIONAL joins into INNER joins
`@context` compaction	✓	Prefix substitution from frame `@context`
Named graph `@graph` scoping	✓	Maps to `g` column filter on VP tables
`@omitGraph` flag	✓	Single root node omits `@graph` wrapper
Value pattern matching (`@value` / `@language` / `@type` in value objects)	✗	Deferred; requires full-graph scan to implement correctly
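To make the `@type` and `@requireAll` rows concrete, the sketch below turns a very small frame into SPARQL CONSTRUCT text. It is a simplified illustration and not pg_ripple's actual translation: the function name is hypothetical, only `@type`, `@requireAll`, and wildcard `{}` properties are handled, and the example IRIs are placeholders.

```python
def frame_to_construct(frame):
    """Build a SPARQL CONSTRUCT query string from a minimal JSON-LD frame.

    Hypothetical sketch: supports `@type` (one IRI or a list), wildcard
    properties, and `@requireAll`. Every other framing keyword is ignored.
    """
    types = frame.get("@type", [])
    if isinstance(types, str):
        types = [types]  # a single IRI is treated as a one-element list
    require_all = frame.get("@requireAll", False)
    properties = [key for key in frame if not key.startswith("@")]

    template, where = [], []
    if types:
        template.append("?s a ?type .")
        where.append("?s a ?type .")
        where.append("VALUES ?type { " + " ".join(f"<{t}>" for t in types) + " }")
    for i, prop in enumerate(properties):
        pattern = f"?s <{prop}> ?v{i} ."
        template.append(pattern)
        # Without @requireAll a property may be absent, so it is OPTIONAL;
        # with @requireAll it becomes a mandatory pattern (an inner join).
        where.append(pattern if require_all else f"OPTIONAL {{ {pattern} }}")
    return ("CONSTRUCT { " + " ".join(template) + " } "
            "WHERE { " + " ".join(where) + " }")


if __name__ == "__main__":
    frame = {
        "@type": "http://example.org/Person",
        "http://example.org/name": {},
    }
    print(frame_to_construct(frame))
    print(frame_to_construct({**frame, "@requireAll": True}))
```

The only difference between the two printed queries is whether the property pattern is wrapped in `OPTIONAL`, which is the behaviour the `@requireAll` row describes.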

GUC parameters for TopN push-down and Datalog sequence pre-allocation (v0.46.0)

GUC	Type	Default	Description
`pg_ripple.topn_pushdown`	bool	`on`	Push `LIMIT N` into the SQL plan for `ORDER BY + LIMIT` queries
`pg_ripple.datalog_sequence_batch`	integer	`10000`	SID range reserved per parallel Datalog worker per batch
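The idea behind `pg_ripple.datalog_sequence_batch`, reserving a whole block of statement IDs (SIDs) per worker instead of fetching them one at a time, can be sketched as a range allocator. The class below is hypothetical and uses a Python lock in place of a database sequence; the starting value of 1 is an assumption.

```python
import threading


class SidRangeAllocator:
    """Hands out non-overlapping SID ranges, one batch per request."""

    def __init__(self, batch_size=10_000, start=1):
        self._batch_size = batch_size
        self._next = start
        self._lock = threading.Lock()

    def reserve(self):
        # The shared counter is touched once per batch, not once per SID,
        # so parallel workers rarely contend with each other.
        with self._lock:
            first = self._next
            self._next += self._batch_size
        return range(first, first + self._batch_size)


if __name__ == "__main__":
    allocator = SidRangeAllocator()
    worker_a = allocator.reserve()
    worker_b = allocator.reserve()
    print(worker_a[0], worker_a[-1], worker_b[0], worker_b[-1])
```

A larger batch means fewer trips to the shared counter at the cost of larger gaps in the ID space when a worker does not use its whole range.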

GUC parameters for LLM integration (v0.49.0)

GUC	Type	Default	Description
`pg_ripple.llm_endpoint`	string	`''`	LLM API base URL (empty = NL→SPARQL disabled)
`pg_ripple.llm_model`	string	`gpt-4o`	LLM model identifier
`pg_ripple.llm_api_key_env`	string	`PG_RIPPLE_LLM_API_KEY`	Name of the environment variable holding the LLM API key
`pg_ripple.llm_include_shapes`	bool	`on`	Include SHACL shapes as LLM context when generating SPARQL
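Two details of this table are worth illustrating: an empty endpoint disables the feature, and the API key is read from the environment variable that the GUC names, so the key itself is never stored in a setting. The sketch below models that lookup. The `resolve_llm_config` function is hypothetical, and a plain dictionary stands in for PostgreSQL's settings.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "pg_ripple.llm_endpoint": "",
    "pg_ripple.llm_model": "gpt-4o",
    "pg_ripple.llm_api_key_env": "PG_RIPPLE_LLM_API_KEY",
    "pg_ripple.llm_include_shapes": True,
}


def resolve_llm_config(settings=None, environ=None):
    """Return the effective LLM configuration, or None when disabled."""
    merged = {**DEFAULTS, **(settings or {})}
    environ = os.environ if environ is None else environ

    endpoint = merged["pg_ripple.llm_endpoint"]
    if not endpoint:
        return None  # empty endpoint: NL-to-SPARQL is disabled

    return {
        "endpoint": endpoint,
        "model": merged["pg_ripple.llm_model"],
        # Indirection: the setting holds the variable's name, not the key.
        "api_key": environ.get(merged["pg_ripple.llm_api_key_env"]),
        "include_shapes": merged["pg_ripple.llm_include_shapes"],
    }
```

For example, `resolve_llm_config()` returns `None` with the defaults, while passing a non-empty `pg_ripple.llm_endpoint` yields a configuration whose `api_key` is whatever the named environment variable contains (or `None` if it is unset).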

Error codes for LLM integration (v0.49.0)

Code	Severity	Message
PT700	ERROR	LLM endpoint unreachable or returned HTTP error
PT701	ERROR	LLM response did not contain a valid SPARQL query
PT702	ERROR	LLM-generated SPARQL query failed to parse
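One way to read the three codes is as successive checkpoints: transport, then query extraction, then parsing. The sketch below encodes that reading. The order of the checks, the keyword heuristic for "contains a query", and the injected `parse` callable are all assumptions made for illustration; the table above does not specify them.

```python
import re
from enum import Enum


class LlmErrorCode(Enum):
    PT700 = "LLM endpoint unreachable or returned HTTP error"
    PT701 = "LLM response did not contain a valid SPARQL query"
    PT702 = "LLM-generated SPARQL query failed to parse"


# Crude heuristic (assumption): a response "contains a query" if it mentions
# one of the four SPARQL query forms.
QUERY_FORM = re.compile(r"\b(SELECT|ASK|CONSTRUCT|DESCRIBE)\b", re.IGNORECASE)


def classify_llm_result(http_status, body, parse):
    """Return the matching error code, or None if the response is usable.

    `http_status` is None when the endpoint could not be reached.
    `parse` is any callable that raises ValueError on invalid SPARQL
    (a stand-in for a real SPARQL parser).
    """
    if http_status is None or not 200 <= http_status < 300:
        return LlmErrorCode.PT700
    if not QUERY_FORM.search(body):
        return LlmErrorCode.PT701
    try:
        parse(body)
    except ValueError:
        return LlmErrorCode.PT702
    return None
```

Because the parser is passed in, the classification logic can be exercised with a trivial stub, for instance one that raises `ValueError` whenever the text lacks a `WHERE` clause.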