When to Use pg_ripple

pg_ripple is a PostgreSQL extension that turns your database into a knowledge graph store. This page helps you decide whether it fits your architecture.

Decision flowchart

Ask yourself these questions in order:

Do you already run PostgreSQL? If yes, pg_ripple integrates with zero additional infrastructure for the data store. If you run a different database, evaluate the migration cost.
Do you need to model complex relationships? If your data is primarily tabular with few joins, standard SQL may be simpler. If you have deeply nested, many-to-many, or hierarchical relationships, a graph model helps.
Do you need a standard query language? SPARQL is a W3C standard with broad tool support. If you prefer a property-graph query language (Cypher/GQL), consider Neo4j or Amazon Neptune.
Do you need reasoning or validation? pg_ripple includes SHACL validation and Datalog reasoning. Standalone triple stores like Virtuoso or Blazegraph may not.
Do you need graph context for LLM prompts? pg_ripple combines SPARQL graph traversal with pgvector similarity search in a single query — something pure vector databases cannot do.

Comparison matrix

Criterion	pg_ripple	Plain SQL	Virtuoso / Blazegraph	Neo4j	Pure vector DB
Deployment	PostgreSQL extension	Any RDBMS	Standalone JVM	Standalone	Standalone
Query language	SPARQL 1.1	SQL	SPARQL 1.1	Cypher	Proprietary
Data model	RDF (triples)	Relational	RDF (triples)	Property graph	Vectors + metadata
Schema validation	SHACL	CHECK / triggers	Varies	Constraints	None
Reasoning	Datalog (RDFS, OWL RL)	Manual SQL	RDFS / OWL (varies)	None built-in	None
Vector search	pgvector integration	pgvector	Not built-in	Limited	Native
Hybrid graph+vector	Yes (single query)	Manual joins	No	No	No
HTTP API	pg_ripple_http	Build your own	Built-in	Built-in	Built-in
Transactions	Full PostgreSQL ACID	Full ACID	Varies	ACID	Varies
Backup/restore	pg_dump/pg_restore	Standard	Custom tools	Custom tools	Custom tools
Operational complexity	Low (PostgreSQL)	Low	Medium–High	Medium	Medium

When pg_ripple is a good fit

You already operate PostgreSQL and want to avoid managing a separate graph database
Your data has rich, interconnected relationships (ontologies, catalogs, supply chains)
You need SPARQL 1.1 compliance for interoperability with W3C-standard tools
You need to validate data quality against formal rules (SHACL)
You need to derive new facts from existing data (Datalog reasoning, OWL RL, RDFS)
You want to combine graph traversal with vector similarity for RAG pipelines
You need full ACID transactions on graph data

When pg_ripple is not the best fit

Graph datasets exceeding ~1 billion triples: pg_ripple has been tested to 100M triples. For very large datasets, consider distributed solutions.
Property graph with Cypher/GQL: if your team already uses Cypher and Neo4j, migrating to SPARQL has a learning curve. pg_ripple speaks SPARQL, not Cypher.
Pure vector search workload: if you only need approximate nearest neighbor search without graph traversal, pgvector alone is simpler.
Real-time streaming graphs: pg_ripple processes data in transactions, not continuous streams. For streaming graph analytics, consider Apache Flink with a graph library.
No PostgreSQL in your stack: if you run MySQL, MongoDB, or a managed NoSQL service and have no plans to adopt PostgreSQL, introducing it solely for pg_ripple adds operational overhead.

AI/LLM comparison: when does graph context outperform flat vector retrieval?

Graph-augmented retrieval helps when:

The query requires multi-hop reasoning — "find papers by co-authors of Alice's co-authors" cannot be answered by vector similarity alone
Entity deduplication matters — owl:sameAs canonicalization ensures the same entity is not embedded multiple times with different IRIs
Structured output is needed — JSON-LD framing produces token-efficient, structured context that flat top-k results cannot provide
Provenance matters — graph traversal can trace why a fact is relevant, not just that it is similar

Pure vector search (Qdrant, Weaviate, pgvector-only) is sufficient when:

The query is a simple "find similar documents" without relationship constraints
Your corpus is unstructured text without entity-level structure
Latency requirements are sub-millisecond at millions of vectors

Next steps

Installation — get pg_ripple running
Hello World — load and query data in five minutes