FAQ
General
Why VP tables instead of one big triple table?
A single (s, p, o, g) table with 100M triples requires a B-tree index that touches all four columns for any useful predicate-specific query. Each query must scan rows for all predicates regardless of the filter.
Vertical Partitioning (one table per predicate) means a query for <ex:knows> triples only scans the vp_{knows_id} table — typically a fraction of the total data. The two B-tree indexes on (s, o) and (o, s) are small and cache-friendly. SPARQL star-patterns (same subject, multiple predicates) become simple multi-way joins between small tables.
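As an illustration, a two-predicate star pattern compiles to a join between two small per-predicate tables. The table and column names below (vp_42, vp_57, integer s/o columns) are illustrative only, not the extension's actual generated schema:

```sql
-- SPARQL: SELECT ?name WHERE { ?p <ex:knows> ?q ; <ex:name> ?name }
-- Hypothetical compiled form: one small table per predicate,
-- joined on the shared subject id.
SELECT n.o AS name_id
FROM vp_42 k          -- VP table for <ex:knows>
JOIN vp_57 n          -- VP table for <ex:name>
  ON n.s = k.s;       -- same subject: the star pattern
```

Because each side of the join carries only one predicate's rows, the (s, o) indexes stay small and the planner can pick an index-nested-loop or merge join cheaply.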
Why PostgreSQL 18?
pg_ripple uses the CYCLE clause in WITH RECURSIVE CTEs for hash-based cycle detection in property path queries. The CYCLE clause was introduced in PostgreSQL 14, but the hash-based variant (as opposed to array-based) first became performant in PG 17/18. PG 18 is also the first version for which pgrx 0.17 offers stable support.
Is pg_ripple compatible with LPG tools?
Not yet. A Cypher/GQL compatibility layer is on the post-1.0 roadmap. The VP storage structure is architecturally aligned with LPG — each VP table corresponds to one edge type — so the mapping will be natural.
What RDF formats does pg_ripple support?
Import (loading):
- N-Triples and N-Triples-star (load_ntriples)
- N-Quads (load_nquads)
- Turtle and Turtle-star (load_turtle)
- TriG (load_trig)
- RDF/XML (load_rdfxml, v0.9.0)
Export:
- N-Triples (export_ntriples)
- N-Quads (export_nquads)
- Turtle (export_turtle, v0.9.0) — including Turtle-star for RDF-star data
- JSON-LD expanded form (export_jsonld, v0.9.0)
- Streaming Turtle or JSON-LD for large graphs (export_turtle_stream, export_jsonld_stream, v0.9.0)
SPARQL CONSTRUCT and DESCRIBE results can be serialized directly to Turtle or JSON-LD via sparql_construct_turtle, sparql_construct_jsonld, sparql_describe_turtle, and sparql_describe_jsonld (v0.9.0).
Can I use pg_ripple with JSON-LD for REST APIs?
Yes. Use export_jsonld() or sparql_construct_jsonld() to produce JSON-LD responses:
```sql
-- Full graph as JSON-LD
SELECT pg_ripple.export_jsonld('https://myapp.example.org/graph/users');

-- SPARQL-driven selection as JSON-LD
SELECT pg_ripple.sparql_construct_jsonld('
    CONSTRUCT { ?s ?p ?o }
    WHERE { ?s a <https://schema.org/Person> ; ?p ?o }
');
```
The output is JSON-LD in expanded form — each subject is one array entry with IRI keys and typed value arrays.
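For instance, a graph holding one person node serializes in expanded form (per JSON-LD 1.1; the node and values below are illustrative) roughly as:

```json
[
  {
    "@id": "https://example.org/Alice",
    "@type": ["https://schema.org/Person"],
    "https://schema.org/name": [{ "@value": "Alice" }],
    "https://schema.org/age": [
      { "@value": "30", "@type": "http://www.w3.org/2001/XMLSchema#integer" }
    ]
  }
]
```

Expanded form is verbose but unambiguous — no context resolution is needed on the consumer side, which makes it a safe default for machine-to-machine APIs.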
SPARQL
What SPARQL 1.1 features are supported?
As of v0.19.0, the full SPARQL 1.1 specification is implemented:
Query forms: SELECT, ASK, CONSTRUCT, DESCRIBE
Graph patterns: BGP, OPTIONAL (LeftJoin), UNION, MINUS, FILTER, BIND, VALUES, Named graphs via GRAPH
Property paths: +, *, ?, / (sequence), | (alternative), ^ (inverse)
Aggregates: GROUP BY, HAVING, COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT
Modifiers: DISTINCT, ORDER BY, LIMIT, OFFSET, subqueries
Update: INSERT DATA, DELETE DATA, DELETE/INSERT WHERE, LOAD, CLEAR, DROP, CREATE, COPY, MOVE, ADD
Federation: SERVICE <url> { … } with SSRF allowlist, SERVICE SILENT, connection pooling, result caching, adaptive timeouts, batch SERVICE detection
Does pg_ripple support SPARQL 1.1 property paths?
Yes, as of v0.5.0. All standard path operators are supported: +, *, ?, / (sequence), | (alternative), ^ (inverse). Negated property sets !(p1|p2) are partially supported via vp_rare.
Property path queries compile to WITH RECURSIVE CTEs with PostgreSQL 18's CYCLE clause for hash-based cycle detection.
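The generated SQL looks roughly like the following sketch. The CYCLE syntax is standard PostgreSQL 14+; the table and column names (vp_knows with s/o columns) are illustrative stand-ins, not pg_ripple's actual generated SQL:

```sql
-- Hypothetical compiled form of the path query  ?x <ex:knows>+ ?y
WITH RECURSIVE walk(s, o) AS (
    SELECT s, o FROM vp_knows            -- one hop
  UNION ALL
    SELECT w.s, v.o                      -- extend the walk by one edge
    FROM walk w
    JOIN vp_knows v ON v.s = w.o
) CYCLE o SET is_cycle USING path        -- stop when a node repeats
SELECT DISTINCT s, o
FROM walk
WHERE NOT is_cycle;
```

The CYCLE clause tracks visited values for you, so the recursion terminates even on cyclic graphs without hand-written visited-set bookkeeping.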
What is the maximum traversal depth for property paths?
Controlled by the pg_ripple.max_path_depth GUC (default: 100). Set it lower to prevent runaway queries on dense graphs:
```sql
SET pg_ripple.max_path_depth = 10;
```
Why does my FILTER not match a number?
SPARQL FILTER comparisons on numeric literals (FILTER(?age >= 18)) require the literal to be typed with an XSD numeric type:
"18"^^<http://www.w3.org/2001/XMLSchema#integer>
Plain string literals like "18" are compared as strings. Use typed literals when inserting numeric data, or cast in the FILTER expression.
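For example, using the insert_triple and sparql functions shown elsewhere in this FAQ (the IRIs are illustrative):

```sql
-- Store the age as a typed literal, not a plain string
SELECT pg_ripple.insert_triple(
    '<https://example.org/Alice>',
    '<https://schema.org/age>',
    '"18"^^<http://www.w3.org/2001/XMLSchema#integer>'
);

-- The numeric FILTER now matches
SELECT * FROM pg_ripple.sparql('
    SELECT ?s WHERE {
        ?s <https://schema.org/age> ?age .
        FILTER(?age >= 18)
    }
');
```

Had the object been inserted as the plain literal "18", the FILTER comparison would fall back to string semantics and the row would not match.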
Data modeling
What's the difference between a named graph and a blank node?
A named graph is a set of triples identified by an IRI. It is used for partitioning data by source, time, or topic. You can query across all named graphs, query within a specific graph, or count triples per graph.
A blank node is a resource without a global IRI identity — it has identity only within a document load scope. Blank nodes are used for anonymous resources (e.g. intermediate nodes in a structure) that don't need a stable identifier.
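A small TriG snippet showing both concepts side by side (the graph and node IRIs are illustrative):

```trig
# A named graph, identified by an IRI, partitioning triples by year
<https://example.org/graph/2024> {
    # _:addr is a blank node: an anonymous intermediate resource
    <https://example.org/Alice> <https://schema.org/address> _:addr .
    _:addr <https://schema.org/addressLocality> "Berlin" .
}
```

The graph IRI can be queried and counted directly; the blank node _:addr only has meaning within this load — re-loading the file produces a fresh internal identifier.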
What is an RDF-star quoted triple?
A quoted triple << s p o >> is a triple that can appear in subject or object position in another triple. It enables statements about triples — useful for provenance (<< alice knows bob >> :assertedBy :carol), temporal annotations, and confidence scores.
pg_ripple stores quoted triples as dictionary entries of kind = 5. See RDF-star for details.
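In Turtle-star (loadable via load_turtle), the provenance example reads as follows; the prefix and IRIs are illustrative:

```turtle
@prefix : <https://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The quoted triple is a term, not an asserted statement by itself
<< :alice :knows :bob >> :assertedBy :carol ;
                         :confidence "0.9"^^xsd:decimal .
```

Note that quoting a triple does not assert it: whether "alice knows bob" is itself in the graph is independent of the annotations attached to the quoted term.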
Performance
How fast is bulk load?
On a modern server with an NVMe SSD, load_ntriples() processes approximately 50,000–150,000 triples per second (single connection, default settings). Performance depends on predicate diversity (more unique predicates → more VP tables created), hardware, and PostgreSQL configuration.
When should I use SPARQL vs find_triples?
find_triples() only matches a single (s, p, o, g) pattern — it is equivalent to a SPARQL BGP with exactly one triple pattern. Use it for single-pattern lookups.
Use sparql() for anything more complex: multi-pattern joins, OPTIONAL, FILTER, aggregates, property paths, or when you want the ergonomics of SPARQL's variable-binding model.
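A side-by-side sketch. The exact find_triples argument convention is not documented in this FAQ, so the positional (s, p, o) call below is an assumption — check the function reference:

```sql
-- Single-pattern lookup: who does Alice know?
-- (argument order is illustrative)
SELECT * FROM pg_ripple.find_triples(
    '<https://example.org/Alice>',
    '<https://schema.org/knows>',
    NULL
);

-- Multi-pattern join: switch to sparql()
SELECT * FROM pg_ripple.sparql('
    SELECT ?friend ?name WHERE {
        <https://example.org/Alice> <https://schema.org/knows> ?friend .
        ?friend <https://schema.org/name> ?name .
    }
');
```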
HTAP & Operations (v0.6.0)
Does pg_ripple require shared_preload_libraries?
For full HTAP functionality (background merge worker, latch-poke hook, shared-memory statistics) you must add pg_ripple to shared_preload_libraries:
```
shared_preload_libraries = 'pg_ripple'
```
Without this, the extension still works for reads and writes — but all writes stay in delta tables and are never automatically merged into main. Queries on predicates with large deltas will be slower than expected.
See the Pre-Deployment Checklist for the complete setup sequence.
What is the difference between compact() and the merge worker?
| | compact() | Merge worker |
|---|---|---|
| Trigger | Manual SQL call | Automatic (latch poke or timer) |
| Blocks caller | Yes | No — runs in background |
| When to use | Maintenance windows, tests | Production continuous operation |
Both produce the same result: delta rows are moved into main, tombstones are cleared, and a fresh BRIN index is built.
How do I know if the merge worker is keeping up?
```sql
-- Check unmerged row count
SELECT pg_ripple.stats() -> 'unmerged_delta_rows';

-- Watch it over time: in psql, \watch re-runs the last query every 5 seconds
SELECT now(), (pg_ripple.stats() -> 'unmerged_delta_rows')::int AS lag;
\watch 5
```
A healthy deployment shows unmerged_delta_rows rising during writes and falling after merges. If it only rises, the worker is behind — lower merge_threshold or increase server I/O capacity.
Can I subscribe to triple changes in real time?
Yes. CDC (Change Data Capture) is available in v0.6.0 via PostgreSQL NOTIFY:
```sql
-- Subscribe to a specific predicate
SELECT pg_ripple.subscribe('<https://schema.org/name>', 'name_changes');

-- In another session
LISTEN name_changes;

-- Notifications arrive when triples are inserted or deleted
SELECT pg_ripple.insert_triple(
    '<https://example.org/Alice>',
    '<https://schema.org/name>',
    '"Alice"'
);
```
Subscriptions are stored in _pg_ripple.cdc_subscriptions and persist across reconnects (but must be re-registered after a server restart). See the Administration reference for details.
Why does my query not see recently inserted triples?
If you inserted triples and immediately queried with SPARQL, the results should include those triples — delta tables are always queried alongside main tables.
If triples are missing, check:
- The triple was committed (not inside an uncommitted transaction)
- The correct graph is being queried (default graph vs named graph)
- The correct predicate IRI spelling was used
What is the HTTP endpoint URL?
The pg_ripple_http companion service listens on http://localhost:7878/sparql by default. Configure the port with PG_RIPPLE_HTTP_PORT. The URL accepts both GET and POST SPARQL requests per the W3C SPARQL 1.1 Protocol.
How do I connect SPARQL tools to pg_ripple?
Start pg_ripple_http alongside your PostgreSQL instance. Point any SPARQL client (YASGUI, Protege, SPARQLWrapper, Jena) to http://localhost:7878/sparql. The endpoint supports standard content negotiation (Accept: application/sparql-results+json, text/turtle, etc.).
Can I run pg_ripple_http inside Docker?
Yes. The Docker image bundles both PostgreSQL and pg_ripple_http. Use docker compose up with the provided docker-compose.yml to start both services. The SPARQL endpoint is exposed on port 7878 by default.
JSON-LD Framing (v0.17.0)
What is JSON-LD Framing and how is it different from plain JSON-LD export?
Plain JSON-LD export (export_jsonld) serializes every triple in the graph as a flat list of node objects. JSON-LD Framing lets you specify the desired output shape — which types to select, which properties to include, and how to nest related nodes — using a frame document. The result is a nested, structured JSON-LD document suitable for serving directly from a REST API.
The key difference in performance: framing reads only the VP tables touched by the frame. A frame targeting 3 predicates on a graph with 10,000 predicates reads 3 VP tables, not 10,000.
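For example, this frame — standard W3C JSON-LD 1.1 framing syntax; the vocabulary is illustrative — selects Person nodes, keeps their names, and embeds each acquaintance once, so only the name and knows VP tables are read:

```json
{
  "@context": { "schema": "https://schema.org/" },
  "@type": "schema:Person",
  "schema:name": {},
  "schema:knows": { "@embed": "@once" }
}
```

The empty object `{}` is a wildcard ("include this property whatever its value"), and `"@embed": "@once"` nests each matched node at its first point of use instead of repeating it.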
Which W3C framing features are supported?
pg_ripple v0.17.0 supports: @type matching, @id matching, property wildcards {}, absent-property patterns [], @reverse, @embed (@once/@always/@never), @explicit, @omitDefault, @default, @requireAll, @context compaction, named graph @graph scoping, and @omitGraph.
Value pattern matching (@value/@language/@type inside value objects) is deferred to a future release.
What is value pattern matching and why is it deferred?
Value pattern matching would allow frames like {"ex:name": {"@language": "en"}} to select only English-language name literals. Implementing this correctly requires a full-graph scan to find matching literals — it cannot be done efficiently with the VP table join model. It is deferred until a targeted literal index is available.
What is the difference between framing views and SPARQL views?
SPARQL views (create_sparql_view) store raw SPARQL SELECT results as integer ID columns in a stream table. Framing views (create_framing_view) run the full embedding and compaction pipeline over CONSTRUCT results, so each row in the stream table contains a ready-to-serve nested JSON-LD document rather than raw projection values.
Use SPARQL views when you need low-level access to result bindings; use framing views when you want ready-to-serve nested JSON-LD for an API.
Vector Federation (v0.28.0)
How does vector federation work?
After registering an external endpoint with pg_ripple.register_vector_endpoint(url, api_type), pg_ripple can route similarity queries to Weaviate, Qdrant, Pinecone, or a remote pgvector instance. The results are merged with local triple store data using Reciprocal Rank Fusion inside hybrid_search().
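A sketch of the intended flow. register_vector_endpoint is named in this FAQ, but the hybrid_search arguments below (query text and a result limit) are an assumption, not a confirmed signature:

```sql
-- Register a trusted in-infrastructure endpoint
SELECT pg_ripple.register_vector_endpoint(
    'https://qdrant.internal:6333',
    'qdrant'
);

-- Hypothetical hybrid query: remote vector hits are fused with
-- local triple matches via Reciprocal Rank Fusion
SELECT * FROM pg_ripple.hybrid_search('graph databases', 10);
```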
How do I prevent SSRF attacks when using vector federation?
pg_ripple does not restrict which URLs can be registered. You should use network policies (e.g., Kubernetes NetworkPolicy, AWS security groups) to restrict which external hosts your PostgreSQL server can reach. Only register endpoints that belong to trusted vector services in your infrastructure.
Why does my federated query time out?
The default timeout is 5000 ms. Increase it for the current session with:

```sql
SET pg_ripple.vector_federation_timeout_ms = 30000;
```

Or configure it globally:

```sql
ALTER SYSTEM SET pg_ripple.vector_federation_timeout_ms = 30000;
SELECT pg_reload_conf();
```
How do I configure a remote endpoint's API key?
pg_ripple does not store API keys for external vector services. Pass the API key in the endpoint URL if the service supports it, or configure it via environment variables in your application layer before calling the endpoint.