GUC Reference

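pg_ripple configuration parameters can be set with ALTER SYSTEM SET, with SET (session level), or in postgresql.conf, subject to each parameter's context. After ALTER SYSTEM, reload the configuration with SELECT pg_reload_conf(); parameters with postmaster context take effect only after a server restart.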
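
A minimal sketch of the ALTER SYSTEM workflow described above. It uses pg_ripple.federation_timeout purely as an illustration; the value 10000 is arbitrary, and the sketch assumes this parameter can be changed with a reload.

```sql
-- Persist a new value in postgresql.auto.conf (requires superuser).
ALTER SYSTEM SET pg_ripple.federation_timeout = 10000;

-- Ask the server to re-read its configuration files.
SELECT pg_reload_conf();

-- Check the value visible to the current session.
SHOW pg_ripple.federation_timeout;
```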


General Parameters

pg_ripple.max_path_depth

Type: Integer
Default: 10
Range: 1–100

Maximum recursion depth for SPARQL property paths (*, +). Increase for deeply nested graphs; lower for tighter resource bounds.
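
As a rough illustration, the depth can be raised for a single session and restored afterwards. The value 25 is an arbitrary choice within the documented range, and the sketch assumes the parameter is settable per session.

```sql
-- Allow deeper property-path traversal for this session only.
SET pg_ripple.max_path_depth = 25;

-- ... run the property-path query here ...

-- Return to the configured default.
RESET pg_ripple.max_path_depth;
```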


pg_ripple.property_path_max_depth (deprecated)

Type: Integer
Default: 64
Range: 1–100 000
Status: Deprecated since v0.38.0; use max_path_depth instead

Legacy alias for max_path_depth. Setting this GUC still works but emits a deprecation notice. It will be removed in a future major release.


pg_ripple.federation_timeout

Type: Integer (milliseconds)
Default: 5000

Timeout for outbound SPARQL federation requests.


pg_ripple.export_batch_size

Type: Integer
Default: 1000

Number of rows written per batch in Parquet export operations.


Embedding / Vector Parameters (v0.27.0+)

These GUCs control the pgvector integration introduced in v0.27.0. All embedding functions degrade gracefully when pgvector is absent.


pg_ripple.pgvector_enabled

Type: Boolean
Default: on

Master switch for all vector embedding paths. Set to off to temporarily disable embedding storage, similarity search, and SPARQL pg:similar() without uninstalling pgvector.

-- Disable at session level for a bulk load
SET pg_ripple.pgvector_enabled = off;

pg_ripple.embedding_api_url

Type: String
Default: (none)

Base URL for the OpenAI-compatible embeddings API. The extension appends /embeddings to this URL when making requests.

ALTER SYSTEM SET pg_ripple.embedding_api_url = 'https://api.openai.com/v1';
-- For Ollama (local):
ALTER SYSTEM SET pg_ripple.embedding_api_url = 'http://localhost:11434/v1';

pg_ripple.embedding_api_key

Type: String
Default: (none)

Bearer token sent as Authorization: Bearer <key> in embedding API requests. For local models that don't require authentication, set to any non-empty string (e.g., 'local').

Security: Avoid storing API keys in postgresql.conf. Use ALTER SYSTEM and restrict pg_hba.conf access, or inject the key via a session-level SET in application code.


pg_ripple.embedding_model

Type: String
Default: (none)

Model name passed in the "model" field of embedding API requests.

ALTER SYSTEM SET pg_ripple.embedding_model = 'text-embedding-3-small';
-- or for Ollama:
ALTER SYSTEM SET pg_ripple.embedding_model = 'nomic-embed-text';

pg_ripple.embedding_dimensions

Type: Integer
Default: 1536
Range: 1–65535

Expected output dimensions from the embedding model. Must match the model's output length. Common values:

Model | Dimensions
text-embedding-3-small | 1536
text-embedding-3-large | 3072
text-embedding-ada-002 | 1536
nomic-embed-text (Ollama) | 768
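
Putting the embedding parameters together, a configuration for a local Ollama model might look like the sketch below. All values come from the examples and table in this section; the assumption is that these parameters take effect after a reload.

```sql
-- Point pg_ripple at a local OpenAI-compatible endpoint.
ALTER SYSTEM SET pg_ripple.embedding_api_url = 'http://localhost:11434/v1';
-- Local models need no real key, but the value must be non-empty.
ALTER SYSTEM SET pg_ripple.embedding_api_key = 'local';
ALTER SYSTEM SET pg_ripple.embedding_model = 'nomic-embed-text';
-- Must match the model's output length (768 for nomic-embed-text).
ALTER SYSTEM SET pg_ripple.embedding_dimensions = 768;

SELECT pg_reload_conf();
```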

pg_ripple.embedding_index_type

Type: String
Default: (none; HNSW when pgvector is present)
Values: hnsw, ivfflat

Index type for the _pg_ripple.embeddings table. HNSW is the default and recommended for most workloads. IVFFlat uses less memory but requires lists parameter tuning.


pg_ripple.embedding_precision

Type: String
Default: (none; full float4 precision)
Values: (unset), half, binary

Storage precision for embedding vectors. Reduces disk/memory usage at the cost of accuracy:

Value | pgvector type | Notes
(unset) | vector(N) | Full 32-bit float; highest accuracy
half | halfvec(N) | 16-bit float; ~50% storage reduction
binary | bit(N) | 1-bit quantised; ~97% storage reduction, lower accuracy

Note: Changing precision after data is stored requires re-running the migration or manually altering the column type and re-embedding.
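
For example, half precision could be selected before any embeddings are stored. This is a sketch that assumes a fresh installation and that the parameter takes effect after a reload.

```sql
-- Choose 16-bit storage (halfvec) before the first embedding is written.
ALTER SYSTEM SET pg_ripple.embedding_precision = 'half';
SELECT pg_reload_conf();

-- Confirm the active setting.
SHOW pg_ripple.embedding_precision;
```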


v0.37.0: Tombstone GC & Error Safety

pg_ripple.tombstone_gc_enabled

Type: Boolean
Default: on
Context: sighup (shared; requires a server reload signal, not settable per session)

When on, pg_ripple automatically issues VACUUM ANALYZE on a predicate's tombstone table after each merge cycle if the residual tombstone count exceeds tombstone_gc_threshold × main_row_count. Set to off to disable automatic tombstone cleanup (useful when managing VACUUM manually).

pg_ripple.tombstone_gc_threshold

Type: String (decimal)
Default: 0.05 (5%)
Range: 0.0–1.0
Context: sighup

Tombstone-to-main-row ratio that triggers automatic VACUUM after a merge cycle. When the remaining tombstone count divided by the new main table row count exceeds this value, a VACUUM ANALYZE is scheduled on the tombstone table.

Lower values (e.g. 0.01) trigger VACUUM more aggressively; higher values (e.g. 0.20) allow more tombstone bloat before cleanup.
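
To make the ratio concrete, the sketch below works through a hypothetical main table of 2,000,000 rows and then raises the threshold to 10%. The row count and the new threshold are illustrative assumptions.

```sql
-- With the default threshold 0.05 and 2,000,000 main rows, cleanup is
-- triggered once more than 0.05 * 2000000 = 100000 tombstones remain.
SELECT 0.05 * 2000000 AS tombstone_trigger_count;

-- Tolerate more bloat before cleanup. The context is sighup, so use
-- ALTER SYSTEM plus a reload rather than a session-level SET.
ALTER SYSTEM SET pg_ripple.tombstone_gc_threshold = '0.10';
SELECT pg_reload_conf();
```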


v0.37.0: GUC Validator Rules

The following string-enum GUCs now reject invalid values at SET time with an error. Previously, invalid values were silently ignored until the execution path checked them.

GUC | Valid values
pg_ripple.inference_mode | off, on_demand, materialized
pg_ripple.enforce_constraints | off, warn, error
pg_ripple.rule_graph_scope | default, all
pg_ripple.shacl_mode | off, sync, async
pg_ripple.describe_strategy | cbd, scbd, simple

pg_ripple.rls_bypass scope change (v0.37.0): This GUC is now registered at PGC_POSTMASTER scope when pg_ripple is loaded via shared_preload_libraries. This prevents a session from bypassing graph-level RLS with SET LOCAL pg_ripple.rls_bypass = on.
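
A short sketch of the validator behaviour for one of the GUCs listed above, assuming pg_ripple.shacl_mode is settable per session. The value 'strict' is a deliberately invalid example; the exact error text is not shown because it is not documented here.

```sql
-- Accepted: 'sync' is one of the listed values for shacl_mode.
SET pg_ripple.shacl_mode = 'sync';

-- Rejected at SET time since v0.37.0: 'strict' is not a listed value,
-- so this statement is expected to raise an error.
SET pg_ripple.shacl_mode = 'strict';
```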


v0.42.0: Parallel Merge Workers

pg_ripple.merge_workers

Type: Integer
Default: 1
Range: 1–16
Context: postmaster (startup-only; set in postgresql.conf)

Number of background merge worker processes. Each worker owns a disjoint round-robin slice of VP predicates. Workers use pg_advisory_lock to prevent conflicts; idle workers steal work from overloaded peers. Increasing this value helps workloads with many distinct predicates (> 50).
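
Because the context is postmaster, a change only takes effect after a server restart. A minimal sketch, with 4 workers as an arbitrary example value:

```sql
-- Written to postgresql.auto.conf; pg_reload_conf() is not sufficient
-- for postmaster-context parameters.
ALTER SYSTEM SET pg_ripple.merge_workers = 4;

-- After restarting the server, confirm the value.
SHOW pg_ripple.merge_workers;
```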


v0.42.0: Cost-Based Federation Planner

pg_ripple.federation_planner_enabled

Type: Boolean
Default: on
Context: userset

When on, pg_ripple uses VoID statistics collected from remote SPARQL endpoints to sort the SERVICE execution order by ascending estimated cost. When off, SERVICE clauses are executed in document order.

pg_ripple.federation_stats_ttl_secs

Type: Integer
Default: 3600 (1 hour)
Range: 0–86400
Context: userset

Seconds until cached VoID statistics for a remote endpoint are considered stale. Setting 0 disables caching (re-fetches on every query).

pg_ripple.federation_parallel_max

Type: Integer
Default: 4
Range: 1–64
Context: userset

Maximum number of remote SERVICE clauses that pg_ripple will execute concurrently within a single query. Set to 1 to disable parallel SERVICE execution.

pg_ripple.federation_parallel_timeout

Type: Integer
Default: 60 (seconds)
Range: 1–3600
Context: userset

Per-endpoint timeout when executing parallel SERVICE clauses. Endpoints that do not respond within this limit return an empty result set (with a WARNING). Does not affect sequential SERVICE execution.
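
Both parallel-execution settings have userset context, so they can be tuned per session. The values below are illustrative only.

```sql
-- Allow up to 8 SERVICE clauses to run concurrently in one query.
SET pg_ripple.federation_parallel_max = 8;

-- Give each endpoint 15 seconds before it is treated as empty (with a WARNING).
SET pg_ripple.federation_parallel_timeout = 15;
```

Setting federation_parallel_max back to 1 restores sequential execution, in which case the parallel timeout no longer applies.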

pg_ripple.federation_inline_max_rows

Type: Integer
Default: 10000
Range: 1–1000000
Context: userset

Maximum number of rows in the VALUES binding table passed to a remote SERVICE clause. When the result set from the local graph exceeds this limit, pg_ripple automatically spools the bindings into a temporary table (PT620 INFO logged) and issues multiple smaller requests to the remote endpoint in batches. Set to a lower value if remote endpoints enforce query complexity limits.
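
For a remote endpoint with strict query complexity limits, the batch ceiling could be lowered for the session. The value 500 is an arbitrary example.

```sql
-- Send at most 500 bindings per VALUES table; larger local result sets
-- are spooled and sent in multiple smaller requests.
SET pg_ripple.federation_inline_max_rows = 500;
```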

pg_ripple.federation_allow_private

Type: Boolean
Default: off
Context: superuser

Security-critical GUC — only superusers can set this.

When off (the default), register_endpoint() rejects endpoints whose hostname resolves to a loopback address (127.0.0.0/8), a link-local address (169.254.0.0/16), any RFC-1918 private range (10/8, 172.16/12, 192.168/16), or an IPv6 equivalent. This prevents server-side request forgery (SSRF) via malicious SPARQL SERVICE calls.

Set to on only in controlled environments where the remote endpoint is a trusted internal service (e.g., a local Fuseki instance in a Docker network).
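
One cautious pattern is to enable the setting only for the session that registers the trusted endpoint, then reset it. This is a sketch; the register_endpoint() call itself is left out because its arguments are not documented in this section.

```sql
-- Must be run by a superuser.
SET pg_ripple.federation_allow_private = on;

-- ... call register_endpoint() for the trusted internal service here ...

-- Restore the safe default for the rest of the session.
RESET pg_ripple.federation_allow_private;
```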


v0.42.0: owl:sameAs Safety

pg_ripple.sameas_max_cluster_size

Type: Integer
Default: 100000
Range: 0–2147483647
Context: userset

Maximum number of entities in a single owl:sameAs equivalence cluster before canonicalization is skipped with a PT550 WARNING. A single cluster larger than this limit is usually a data quality problem (e.g., a mistakenly asserted owl:sameAs owl:Thing). Set to 0 to disable the check (no limit).


v0.46.0: TopN Push-down & Datalog Sequence Batch

pg_ripple.topn_pushdown

Type: Boolean
Default: on
Context: userset

When on (default), SPARQL SELECT queries that contain both ORDER BY and LIMIT N (with no OFFSET > 0 and no DISTINCT) emit the SQL as … ORDER BY … LIMIT N rather than fetching all rows and discarding after decoding.

Set to off to disable the optimisation globally — for example, during debugging when you suspect that TopN push-down is producing incorrect results.

The sparql_explain() output includes a "topn_applied": true/false key that indicates whether push-down was applied to a specific query.
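
When debugging a suspected push-down problem, the optimisation can be switched off for one session and the query results compared. A minimal sketch:

```sql
-- Disable TopN push-down for this session only.
SET pg_ripple.topn_pushdown = off;

-- ... re-run the suspect ORDER BY ... LIMIT query and compare results ...

-- Re-enable the default behaviour.
RESET pg_ripple.topn_pushdown;
```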

pg_ripple.datalog_sequence_batch

Type: Integer
Default: 10000
Range: 100–1000000
Context: userset

SID (statement-ID) range reserved per parallel Datalog worker per batch. Before launching N parallel strata workers, the coordinator atomically advances the global _pg_ripple.statement_id_seq sequence by N * datalog_sequence_batch, then assigns each worker an exclusive sub-range. Workers insert triples with pre-computed SIDs without touching the shared sequence, eliminating contention.

Increase this value if parallel inference workers frequently conflict on the sequence. Decrease it to reduce unused SID gaps when inference produces fewer triples than expected per batch.
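
The reservation arithmetic can be checked directly. The sketch below assumes a hypothetical run with 4 parallel strata workers; the worker count and the reduced batch size are illustrative.

```sql
-- The coordinator reserves N * datalog_sequence_batch SIDs up front.
-- With 4 workers and the default batch of 10000, that is 4 * 10000 = 40000.
SELECT 4 * current_setting('pg_ripple.datalog_sequence_batch')::bigint
       AS sids_reserved;

-- Halve the batch for this session to reduce unused SID gaps.
SET pg_ripple.datalog_sequence_batch = 5000;
```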


v0.47.0: Validated String GUCs

All six string-valued GUCs below now reject invalid values at SET time (previously invalid values were accepted and silently ignored at runtime).

pg_ripple.federation_on_error

Type: String
Default: warning
Valid values: warning, error, empty
Context: userset

Controls behaviour when a SERVICE call fails completely. warning emits a PT610 WARNING and returns an empty binding set for that endpoint. error raises an ERROR and aborts the query. empty silently returns zero rows for that endpoint.

pg_ripple.federation_on_partial

Type: String
Default: empty
Valid values: empty, use
Context: userset

Controls behaviour when a SERVICE response stream is interrupted mid-transfer (e.g., the remote endpoint drops the connection). empty discards partial results and returns zero rows. use keeps the rows received before the error.
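
As an illustration, a session that prefers failing loudly over silently incomplete federated results might combine the two settings as follows. This is one possible policy, not a recommendation from the source.

```sql
-- Abort the query if a SERVICE call fails completely.
SET pg_ripple.federation_on_error = 'error';

-- Discard rows from a response that was interrupted mid-transfer.
SET pg_ripple.federation_on_partial = 'empty';
```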

pg_ripple.sparql_overflow_action

Type: String
Default: warn
Valid values: warn, error
Context: userset

Action taken when a SPARQL SELECT result set exceeds the sparql_max_row… limit (when that limit is > 0). warn truncates the result set and emits a PT601 WARNING. error raises an ERROR.

pg_ripple.tracing_exporter

Type: String
Valid values: …, otlp

otlp sends spans via the OTLP gRPC protocol to the endpoint specified by the OTEL_EXPORTER_OTLP_ENDPOINT environment variable.

pg_ripple.embedding_index_type

Type: String
Default: hnsw
Valid values: hnsw, ivfflat

Changing this setting after embeddings have been indexed requires REINDEX TABLE _pg_ripple.embeddings.

pg_ripple.embedding_precision

Type: String
Default: single
Valid values: single, half, binary
Context: userset

Storage precision for embedding vectors. single uses pgvector's full-precision vector type; the other accepted values are half and binary.