Configuration and Tuning

pg_ripple exposes its configuration through PostgreSQL GUC (Grand Unified Configuration) parameters. All parameters use the pg_ripple. prefix and can be set in postgresql.conf, via ALTER SYSTEM, or per-session with SET.

Restart requirements

Parameters marked Postmaster require a PostgreSQL restart. Parameters marked SIGHUP can be reloaded with SELECT pg_reload_conf(). All others can be changed per-session with SET.
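As a sketch, a Userset parameter such as pg_ripple.plan_cache_size can be changed per-session, while a SIGHUP parameter such as pg_ripple.merge_threshold is persisted and then picked up with a reload (values here are illustrative):

```sql
-- Per-session (Userset): affects only the current backend.
SET pg_ripple.plan_cache_size = 512;

-- Cluster-wide (SIGHUP): persist, then reload without a restart.
ALTER SYSTEM SET pg_ripple.merge_threshold = 50000;
SELECT pg_reload_conf();

-- Inspect the effective value.
SHOW pg_ripple.merge_threshold;
```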


Storage Parameters

Control how triples are stored in VP tables and the rare-predicate consolidation table.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| vp_promotion_threshold | int | 1000 | 10 – 10,000,000 | Userset | Minimum triples before a predicate gets a dedicated VP table. Below this, triples go to vp_rare. |
| named_graph_optimized | bool | off | | Userset | Adds a (g, s, o) index per VP table. Speeds up GRAPH queries but increases write overhead. |
| default_graph | text | '' | Any IRI | Userset | IRI used as the default graph when g is not specified on insert. |
| dedup_on_merge | bool | off | | Userset | When on, the merge worker deduplicates (s, o, g) rows, keeping the lowest SID. |
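For example, a loading session might route inserts into a project-specific graph for the duration of that session (a sketch; the IRI is illustrative):

```sql
-- Route inserts without an explicit g into this graph for the session.
SET pg_ripple.default_graph = 'http://example.org/graphs/staging';

-- GRAPH-heavy workloads can opt into the extra (g, s, o) index;
-- note the added write overhead on every VP table.
SET pg_ripple.named_graph_optimized = on;
```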

HTAP / Merge Worker Parameters

Control the delta/main split and background merge behavior. These take effect only when pg_ripple is loaded via shared_preload_libraries.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| merge_threshold | int | 10000 | 1 – 2,147,483,647 | SIGHUP | Delta row count that triggers a merge for a predicate. Lower = fresher reads, more I/O. |
| merge_interval_secs | int | 60 | 1 – 3600 | SIGHUP | Maximum seconds between merge worker poll cycles. |
| merge_retention_seconds | int | 60 | 0 – 86,400 | SIGHUP | Seconds to keep the old main table after a merge before dropping it. |
| latch_trigger_threshold | int | 10000 | 1 – 2,147,483,647 | SIGHUP | Rows written in a batch before poking the merge worker latch immediately. |
| merge_watchdog_timeout | int | 300 | 10 – 86,400 | SIGHUP | Seconds of merge worker inactivity before logging a WARNING. |
| worker_database | text | 'postgres' | | SIGHUP | Database the background merge worker connects to. |
| auto_analyze | bool | on | | SIGHUP | Run ANALYZE on VP main tables after each merge cycle. |
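Because these are all SIGHUP parameters, edits to postgresql.conf take effect on reload rather than restart. The current values and contexts can be checked through the standard pg_settings view (a sketch):

```sql
-- List all pg_ripple GUCs with their current value and context.
SELECT name, setting, context
FROM pg_settings
WHERE name LIKE 'pg_ripple.%'
ORDER BY name;

-- Apply postgresql.conf changes to the merge worker without a restart.
SELECT pg_reload_conf();
```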

Query Engine Parameters

Tune SPARQL-to-SQL translation and execution.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| plan_cache_size | int | 256 | 0 – 65,536 | Userset | Cached SPARQL→SQL translations per backend. 0 disables caching. |
| max_path_depth | int | 100 | 0 – 10,000 | Userset | Maximum recursion depth for property path queries (+, *). 0 = unlimited. |
| property_path_max_depth | int | 64 | 1 – 100,000 | Userset | Alternative property path depth limit (v0.24.0). |
| describe_strategy | text | 'cbd' | 'cbd', 'scbd', 'simple' | Userset | DESCRIBE algorithm: Concise Bounded Description, Symmetric CBD, or simple one-hop. |
| bgp_reorder | bool | on | | Userset | Reorder BGP triple patterns by estimated selectivity before SQL generation. |
| parallel_query_min_joins | int | 3 | 1 – 100 | Userset | Minimum VP-table joins before enabling parallel query workers. |
| sparql_strict | bool | on | | Userset | When on, unsupported FILTER functions raise an error; when off, they are silently dropped. |
| export_batch_size | int | 10000 | 100 – 1,000,000 | Userset | Triples per cursor batch during streaming export. |
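Since all of these are Userset, they can be tuned for a single session and reverted afterwards. For instance, a bulk-export session might widen the cursor batches and cap path recursion (a sketch; values are illustrative):

```sql
-- Larger batches reduce round trips during streaming export.
SET pg_ripple.export_batch_size = 100000;

-- Keep property-path recursion bounded for this session.
SET pg_ripple.max_path_depth = 50;

-- Return to the configured default afterwards.
RESET pg_ripple.export_batch_size;
RESET pg_ripple.max_path_depth;
```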

Inference / Datalog Parameters

Control the Datalog reasoning engine, magic sets, and rule caching.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| inference_mode | text | 'off' | 'off', 'on_demand', 'materialized' | Userset | Datalog reasoning mode. 'materialized' requires pg_trickle. |
| enforce_constraints | text | 'off' | 'off', 'warn', 'error' | Userset | Behavior when Datalog constraint rules detect violations. |
| rule_graph_scope | text | 'default' | 'default', 'all' | Userset | Whether unscoped rule atoms operate on the default graph only or all graphs. |
| magic_sets | bool | on | | Userset | Use magic sets for goal-directed inference in infer_goal(). |
| datalog_cost_reorder | bool | on | | Userset | Sort rule body atoms by ascending VP-table cardinality before SQL compilation. |
| datalog_antijoin_threshold | int | 1000 | 0 – 10,000,000 | Userset | Minimum VP rows for NOT atoms to use LEFT JOIN anti-join form. |
| delta_index_threshold | int | 500 | 0 – 10,000,000 | Userset | Minimum semi-naive delta rows before creating a B-tree index. |
| demand_transform | bool | on | | Userset | Auto-apply demand transformation when multiple goal patterns are specified. |
| sameas_reasoning | bool | on | | Userset | Apply owl:sameAs canonicalization pre-pass during inference. |
| rule_plan_cache | bool | on | | Userset | Cache compiled SQL for each rule set. Invalidated by drop_rules() and load_rules(). |
| rule_plan_cache_size | int | 64 | 1 – 4,096 | Userset | Maximum rule sets in the plan cache. |
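A session that needs reasoning can switch it on locally; SET LOCAL confines the change to a single transaction (a sketch):

```sql
BEGIN;
-- Enable goal-directed reasoning for this transaction only.
SET LOCAL pg_ripple.inference_mode = 'on_demand';
SET LOCAL pg_ripple.magic_sets = on;
-- ... run inference queries here ...
COMMIT;  -- SET LOCAL values revert automatically at transaction end
```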

Well-Founded Semantics / Tabling Parameters

Control well-founded semantics (WFS) evaluation and the tabling cache (introduced in v0.32.0).

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| wfs_max_iterations | int | 100 | 1 – 10,000 | Userset | Safety cap on alternating fixpoint rounds per WFS pass. Emits PT520 WARNING if not converged. |
| tabling | bool | on | | Userset | Cache infer_wfs() and SPARQL results in _pg_ripple.tabling_cache. |
| tabling_ttl | int | 300 | 0 – 86,400 | Userset | TTL in seconds for tabling cache entries. 0 disables TTL-based expiry. |

SHACL Validation Parameters

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| shacl_mode | text | 'off' | 'off', 'sync', 'async' | Userset | 'sync' rejects violations inline; 'async' queues for background validation. |

Federation Parameters

Control remote SPARQL endpoint calls via the SERVICE keyword.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| federation_timeout | int | 30 | 1 – 3,600 | Userset | Per-SERVICE call wall-clock timeout in seconds. |
| federation_max_results | int | 10000 | 1 – 1,000,000 | Userset | Maximum rows accepted from a single remote call. |
| federation_on_error | text | 'warning' | 'warning', 'error', 'empty' | Userset | Behavior on SERVICE call failure. |
| federation_pool_size | int | 4 | 1 – 32 | Userset | Idle HTTP connections per endpoint host. |
| federation_cache_ttl | int | 0 | 0 – 86,400 | Userset | Remote result cache TTL in seconds. 0 disables caching. |
| federation_on_partial | text | 'empty' | 'empty', 'use' | Userset | Behavior on mid-stream SERVICE failure. |
| federation_adaptive_timeout | bool | off | | Userset | Derive per-endpoint timeout from P95 latency. |
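Before issuing SERVICE-heavy queries, a session might tighten failure behavior and enable result caching (a sketch; values are illustrative):

```sql
-- Fail fast and loudly on remote endpoint errors.
SET pg_ripple.federation_on_error = 'error';
SET pg_ripple.federation_timeout = 10;

-- Cache identical remote calls for five minutes.
SET pg_ripple.federation_cache_ttl = 300;
```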

Shared Memory Parameters (Startup Only)

These must be set in postgresql.conf before PostgreSQL starts. They cannot be changed at runtime.

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| dictionary_cache_size | int | 4096 | 0 – 1,000,000 | Postmaster | Shared-memory encode cache capacity in entries. |
| cache_budget | int | 64 | 0 – 65,536 | Postmaster | Shared-memory budget cap in MB. Bulk loads throttle at 90% utilization. |

Startup GUCs require restart

Changes to dictionary_cache_size and cache_budget require a full PostgreSQL restart. Plan your cache sizing before deploying to production.
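These can still be persisted with ALTER SYSTEM, but the new values only take effect after a restart (a sketch; values are illustrative):

```sql
ALTER SYSTEM SET pg_ripple.dictionary_cache_size = 131072;
ALTER SYSTEM SET pg_ripple.cache_budget = 128;
-- SELECT pg_reload_conf() is NOT sufficient for Postmaster GUCs;
-- restart the server instead, e.g.:
--   pg_ctl restart -D $PGDATA
```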


Security Parameters

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| rls_bypass | bool | off | | Suset | Superuser override to bypass graph-level Row-Level Security. |

Vector / Embedding Parameters

| Parameter | Type | Default | Range | Context | Description |
| --- | --- | --- | --- | --- | --- |
| embedding_model | text | '' | | Userset | Model name tag stored in _pg_ripple.embeddings. |
| embedding_dimensions | int | 1536 | 1 – 16,000 | Userset | Vector dimension count. Must match model output. |
| embedding_api_url | text | '' | | Userset | Base URL for OpenAI-compatible embedding API. |
| embedding_api_key | text | '' | | Suset | API key (superuser-only, masked in pg_settings). |
| pgvector_enabled | bool | on | | Userset | Disable pgvector code paths without uninstalling. |
| embedding_index_type | text | 'hnsw' | 'hnsw', 'ivfflat' | Userset | Index type on embeddings table. |
| embedding_precision | text | 'single' | 'single', 'half', 'binary' | Userset | Storage precision for embedding vectors. |
| auto_embed | bool | off | | Userset | Auto-embed new entities via background worker. |
| embedding_batch_size | int | 100 | 1 – 10,000 | Userset | Entities dequeued per background worker batch. |
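A minimal sketch of wiring up an OpenAI-compatible embedding service; the URL, key, and model name are placeholders, not values shipped with pg_ripple:

```sql
ALTER SYSTEM SET pg_ripple.embedding_api_url = 'https://api.example.com/v1';
ALTER SYSTEM SET pg_ripple.embedding_api_key = 'sk-placeholder';  -- Suset; masked in pg_settings
ALTER SYSTEM SET pg_ripple.embedding_model = 'text-embedding-3-small';
ALTER SYSTEM SET pg_ripple.embedding_dimensions = 1536;  -- must match the model's output width
SELECT pg_reload_conf();
```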

Quick-Start Configurations

Small Dataset (< 1M triples)

Suitable for development, prototyping, or small knowledge graphs:

# postgresql.conf
shared_preload_libraries = 'pg_ripple'

# Dictionary cache — small footprint
pg_ripple.dictionary_cache_size = 8192
pg_ripple.cache_budget = 16

# Merge worker — merge early for fresh reads
pg_ripple.merge_threshold = 5000
pg_ripple.merge_interval_secs = 30

# Query engine
pg_ripple.plan_cache_size = 64
pg_ripple.max_path_depth = 50

Medium Dataset (1M – 100M triples)

Production workloads with moderate query complexity:

# postgresql.conf
shared_preload_libraries = 'pg_ripple'

# Dictionary cache — larger cache for better hit rates
pg_ripple.dictionary_cache_size = 131072
pg_ripple.cache_budget = 128

# Merge worker — balance freshness and I/O
pg_ripple.merge_threshold = 50000
pg_ripple.merge_interval_secs = 60
pg_ripple.latch_trigger_threshold = 20000
pg_ripple.auto_analyze = on

# Query engine — larger plan cache for diverse queries
pg_ripple.plan_cache_size = 512
pg_ripple.max_path_depth = 100
pg_ripple.bgp_reorder = on

# Inference (if used)
pg_ripple.inference_mode = 'on_demand'
pg_ripple.magic_sets = on

Large Dataset (> 100M triples)

High-throughput production with heavy query loads:

# postgresql.conf
shared_preload_libraries = 'pg_ripple'

# Dictionary cache — maximize cache coverage
pg_ripple.dictionary_cache_size = 500000
pg_ripple.cache_budget = 512

# Merge worker — batch larger merges, reduce churn
pg_ripple.merge_threshold = 200000
pg_ripple.merge_interval_secs = 120
pg_ripple.latch_trigger_threshold = 100000
pg_ripple.merge_retention_seconds = 120
pg_ripple.auto_analyze = on

# Query engine — large plan cache, parallel queries
pg_ripple.plan_cache_size = 2048
pg_ripple.max_path_depth = 200
pg_ripple.bgp_reorder = on
pg_ripple.parallel_query_min_joins = 2

# Named graph optimization (if heavy GRAPH usage)
pg_ripple.named_graph_optimized = on

# Inference
pg_ripple.inference_mode = 'on_demand'
pg_ripple.magic_sets = on
pg_ripple.rule_plan_cache = on
pg_ripple.rule_plan_cache_size = 256

# Tabling cache for repeated inference patterns
pg_ripple.tabling = on
pg_ripple.tabling_ttl = 600

# Federation (if used)
pg_ripple.federation_timeout = 60
pg_ripple.federation_pool_size = 8
pg_ripple.federation_cache_ttl = 300

PostgreSQL tuning

Don't forget to tune PostgreSQL itself alongside pg_ripple. Key PostgreSQL parameters for triple store workloads:

  • shared_buffers = 25% of RAM
  • effective_cache_size = 75% of RAM
  • work_mem = 64MB–256MB (for complex joins)
  • maintenance_work_mem = 512MB–1GB (for merge ANALYZE)
  • random_page_cost = 1.1 (if using SSDs)
  • max_parallel_workers_per_gather = 4
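As a worked example, on a dedicated 64 GB server those guidelines translate to roughly the following (illustrative values; adjust for co-located services):

```
# postgresql.conf — illustrative sizing for a dedicated 64 GB server
shared_buffers = 16GB                # 25% of RAM
effective_cache_size = 48GB          # 75% of RAM
work_mem = 128MB                     # per sort/hash node in complex joins
maintenance_work_mem = 1GB           # merge-time ANALYZE
random_page_cost = 1.1               # SSD storage
max_parallel_workers_per_gather = 4
```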