Configuration and Tuning
pg_ripple exposes its configuration through PostgreSQL GUC (Grand Unified Configuration) parameters. All parameters use the pg_ripple. prefix and can be set in postgresql.conf, via ALTER SYSTEM, or per-session with SET.
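Parameters marked Postmaster require a PostgreSQL restart. Parameters marked SIGHUP take effect on reload: change them in postgresql.conf or with ALTER SYSTEM, then run SELECT pg_reload_conf(). Parameters marked Userset can be changed per-session with SET by any role. Parameters marked Suset can also be changed with SET, but only by a superuser.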
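
To check which context applies to each parameter on a running server, the standard pg_settings view can be queried. This is a minimal sketch that assumes the extension is loaded so its parameters are registered. pg_settings reports contexts in lowercase: postmaster, sighup, superuser (Suset), and user (Userset).

```sql
-- List every pg_ripple parameter with its current value and change context.
-- "source" shows where the current value came from (default, configuration file, session, ...).
SELECT name, setting, context, source
FROM pg_settings
WHERE name LIKE 'pg_ripple.%'
ORDER BY context, name;
```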
Storage Parameters
Control how triples are stored in VP tables and the rare-predicate consolidation table.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| vp_promotion_threshold | int | 1000 | 10 – 10,000,000 | Userset | Minimum triples before a predicate gets a dedicated VP table. Below this, triples go to vp_rare. |
| named_graph_optimized | bool | off | — | Userset | Adds a (g, s, o) index per VP table. Speeds up GRAPH queries but increases write overhead. |
| default_graph | text | '' | Any IRI | Userset | IRI used as the default graph when g is not specified on insert. |
| dedup_on_merge | bool | off | — | Userset | When on, the merge worker deduplicates (s, o, g) rows, keeping the lowest SID. |
HTAP / Merge Worker Parameters
Control the delta/main split and background merge behavior. These take effect only when pg_ripple is loaded via shared_preload_libraries.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| merge_threshold | int | 10000 | 1 – 2,147,483,647 | SIGHUP | Delta row count that triggers a merge for a predicate. Lower = fresher reads, more I/O. |
| merge_interval_secs | int | 60 | 1 – 3600 | SIGHUP | Maximum seconds between merge worker poll cycles. |
| merge_retention_seconds | int | 60 | 0 – 86,400 | SIGHUP | Seconds to keep the old main table after a merge before dropping it. |
| latch_trigger_threshold | int | 10000 | 1 – 2,147,483,647 | SIGHUP | Rows written in a batch before poking the merge worker latch immediately. |
| merge_watchdog_timeout | int | 300 | 10 – 86,400 | SIGHUP | Seconds of merge worker inactivity before logging a WARNING. |
| worker_database | text | 'postgres' | — | SIGHUP | Database the background merge worker connects to. |
| auto_analyze | bool | on | — | SIGHUP | Run ANALYZE on VP main tables after each merge cycle. |
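
Because every merge worker parameter has SIGHUP context, a plain SET does not work for them. The sketch below shows one way to change them from a superuser session with ALTER SYSTEM followed by a reload. The values are illustrative, not recommendations.

```sql
-- A session-level SET is rejected for SIGHUP parameters; PostgreSQL raises
-- an error along the lines of: parameter "..." cannot be changed now
-- SET pg_ripple.merge_threshold = 50000;

-- ALTER SYSTEM writes to postgresql.auto.conf; it cannot run inside a transaction block.
ALTER SYSTEM SET pg_ripple.merge_threshold = 50000;
ALTER SYSTEM SET pg_ripple.merge_interval_secs = 30;

-- Ask all server processes to re-read their configuration.
SELECT pg_reload_conf();

-- Confirm the value the current session now sees.
SHOW pg_ripple.merge_threshold;
```

To revert to the built-in default, ALTER SYSTEM RESET followed by another reload removes the override.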
Query Engine Parameters
Tune SPARQL-to-SQL translation and execution.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| plan_cache_size | int | 256 | 0 – 65,536 | Userset | Cached SPARQL→SQL translations per backend. 0 disables caching. |
| max_path_depth | int | 100 | 0 – 10,000 | Userset | Maximum recursion depth for property path queries (+, *). 0 = unlimited. |
| property_path_max_depth | int | 64 | 1 – 100,000 | Userset | Alternative property path depth limit (v0.24.0). |
| describe_strategy | text | 'cbd' | 'cbd', 'scbd', 'simple' | Userset | DESCRIBE algorithm: Concise Bounded Description, Symmetric CBD, or simple one-hop. |
| bgp_reorder | bool | on | — | Userset | Reorder BGP triple patterns by estimated selectivity before SQL generation. |
| parallel_query_min_joins | int | 3 | 1 – 100 | Userset | Minimum VP-table joins before enabling parallel query workers. |
| sparql_strict | bool | on | — | Userset | When on, unsupported FILTER functions raise an error; when off, they are silently dropped. |
| export_batch_size | int | 10000 | 100 – 1,000,000 | Userset | Triples per cursor batch during streaming export. |
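
Query engine parameters are all Userset, so they can be adjusted for a single session and then restored. The following is a small sketch of that pattern. The chosen values are illustrative, and the SPARQL call itself is left as a comment because its invocation is not covered in this section.

```sql
-- Tighten the property path limit and keep strict FILTER handling for this session only.
SET pg_ripple.max_path_depth = 20;
SET pg_ripple.sparql_strict = on;

-- Inspect the effective value.
SHOW pg_ripple.max_path_depth;

-- ... run the SPARQL query under investigation here ...

-- Return to the configured defaults for the rest of the session.
RESET pg_ripple.max_path_depth;
RESET pg_ripple.sparql_strict;
```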
Inference / Datalog Parameters
Control the Datalog reasoning engine, magic sets, and rule caching.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| inference_mode | text | 'off' | 'off', 'on_demand', 'materialized' | Userset | Datalog reasoning mode. 'materialized' requires pg_trickle. |
| enforce_constraints | text | 'off' | 'off', 'warn', 'error' | Userset | Behavior when Datalog constraint rules detect violations. |
| rule_graph_scope | text | 'default' | 'default', 'all' | Userset | Whether unscoped rule atoms operate on the default graph only or all graphs. |
| magic_sets | bool | on | — | Userset | Use magic sets for goal-directed inference in infer_goal(). |
| datalog_cost_reorder | bool | on | — | Userset | Sort rule body atoms by ascending VP-table cardinality before SQL compilation. |
| datalog_antijoin_threshold | int | 1000 | 0 – 10,000,000 | Userset | Minimum VP rows for NOT atoms to use LEFT JOIN anti-join form. |
| delta_index_threshold | int | 500 | 0 – 10,000,000 | Userset | Minimum semi-naive delta rows before creating a B-tree index. |
| demand_transform | bool | on | — | Userset | Auto-apply demand transformation when multiple goal patterns are specified. |
| sameas_reasoning | bool | on | — | Userset | Apply owl:sameAs canonicalization pre-pass during inference. |
| rule_plan_cache | bool | on | — | Userset | Cache compiled SQL for each rule set. Invalidated by drop_rules() and load_rules(). |
| rule_plan_cache_size | int | 64 | 1 – 4,096 | Userset | Maximum rule sets in the plan cache. |
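
When inference settings should apply to every session of one database rather than the whole cluster, PostgreSQL's per-database defaults are one option. The sketch below assumes a hypothetical database named kg. New sessions pick up the values; sessions that are already connected do not.

```sql
-- "kg" is a placeholder database name used only for illustration.
ALTER DATABASE kg SET pg_ripple.inference_mode = 'on_demand';
ALTER DATABASE kg SET pg_ripple.enforce_constraints = 'warn';

-- Review the stored per-database overrides in the system catalog.
SELECT d.datname, s.setconfig
FROM pg_db_role_setting AS s
JOIN pg_database AS d ON d.oid = s.setdatabase
WHERE d.datname = 'kg';
```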
Well-Founded Semantics / Tabling Parameters
Control WFS evaluation and tabling cache (v0.32.0).
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| wfs_max_iterations | int | 100 | 1 – 10,000 | Userset | Safety cap on alternating fixpoint rounds per WFS pass. Emits PT520 WARNING if not converged. |
| tabling | bool | on | — | Userset | Cache infer_wfs() and SPARQL results in _pg_ripple.tabling_cache. |
| tabling_ttl | int | 300 | 0 – 86,400 | Userset | TTL in seconds for tabling cache entries. 0 disables TTL-based expiry. |
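
The tabling parameters can also be read and written through PostgreSQL's generic set_config() and current_setting() functions, which is convenient from application code. The sketch below makes no assumption about the columns of _pg_ripple.tabling_cache and only counts its rows. The TTL value is illustrative.

```sql
-- Third argument false = the change lasts for the session, not just the current transaction.
SELECT set_config('pg_ripple.tabling_ttl', '600', false);

-- Read both settings back as text.
SELECT current_setting('pg_ripple.tabling')     AS tabling,
       current_setting('pg_ripple.tabling_ttl') AS ttl_seconds;

-- Rough view of how many entries the cache currently holds.
SELECT count(*) AS cached_entries
FROM _pg_ripple.tabling_cache;
```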
SHACL Validation Parameters
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| shacl_mode | text | 'off' | 'off', 'sync', 'async' | Userset | 'sync' rejects violations inline; 'async' queues for background validation. |
Federation Parameters
Control remote SPARQL endpoint calls via the SERVICE keyword.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| federation_timeout | int | 30 | 1 – 3,600 | Userset | Per-SERVICE call wall-clock timeout in seconds. |
| federation_max_results | int | 10000 | 1 – 1,000,000 | Userset | Maximum rows accepted from a single remote call. |
| federation_on_error | text | 'warning' | 'warning', 'error', 'empty' | Userset | Behavior on SERVICE call failure. |
| federation_pool_size | int | 4 | 1 – 32 | Userset | Idle HTTP connections per endpoint host. |
| federation_cache_ttl | int | 0 | 0 – 86,400 | Userset | Remote result cache TTL in seconds. 0 disables caching. |
| federation_on_partial | text | 'empty' | 'empty', 'use' | Userset | Behavior on mid-stream SERVICE failure. |
| federation_adaptive_timeout | bool | off | — | Userset | Derive per-endpoint timeout from P95 latency. |
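
Different clients often need different federation limits. One possible approach is to attach defaults to a role, so that interactive users fail fast while other roles keep the cluster-wide settings. The role name below is hypothetical and the values are illustrative.

```sql
-- "dashboard_reader" is a placeholder role name used only for illustration.
ALTER ROLE dashboard_reader SET pg_ripple.federation_timeout = 10;
ALTER ROLE dashboard_reader SET pg_ripple.federation_max_results = 1000;
ALTER ROLE dashboard_reader SET pg_ripple.federation_on_error = 'error';

-- The defaults apply the next time the role connects.
-- Because these parameters are Userset, a session can still override them with SET.
```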
Shared Memory Parameters (Startup Only)
These must be set in postgresql.conf before PostgreSQL starts. They cannot be changed at runtime.
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| dictionary_cache_size | int | 4096 | 0 – 1,000,000 | Postmaster | Shared-memory encode cache capacity in entries. |
| cache_budget | int | 64 | 0 – 65,536 | Postmaster | Shared-memory budget cap in MB. Bulk loads throttle at 90% utilization. |
Changes to dictionary_cache_size and cache_budget require a full PostgreSQL restart. Plan your cache sizing before deploying to production.
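
As a sketch of how a pending change to these two parameters can be staged and then verified, the statements below use ALTER SYSTEM together with the pending_restart column of pg_settings. The values are illustrative. The new sizes only take effect after the server restarts.

```sql
-- Stage the new values; they are written to postgresql.auto.conf.
ALTER SYSTEM SET pg_ripple.dictionary_cache_size = 131072;
ALTER SYSTEM SET pg_ripple.cache_budget = 128;

-- A reload does not apply Postmaster parameters, but it lets the server notice the change.
SELECT pg_reload_conf();

-- pending_restart = true means the configured value differs from the running one.
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('pg_ripple.dictionary_cache_size', 'pg_ripple.cache_budget');
```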
Security Parameters
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| rls_bypass | bool | off | — | Suset | Superuser override to bypass graph-level Row-Level Security. |
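
Since rls_bypass disables a security control, it is reasonable to limit how long it stays on. A minimal sketch, run as a superuser, scopes the override to a single transaction with SET LOCAL so it cannot leak into later statements on the same connection.

```sql
BEGIN;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL pg_ripple.rls_bypass = on;

-- ... run the maintenance or audit queries that need to see all graphs ...

COMMIT;

-- Back outside the transaction, the session sees the configured value again.
SHOW pg_ripple.rls_bypass;
```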
Vector / Embedding Parameters
| Parameter | Type | Default | Range | Context | Description |
|---|---|---|---|---|---|
| embedding_model | text | '' | — | Userset | Model name tag stored in _pg_ripple.embeddings. |
| embedding_dimensions | int | 1536 | 1 – 16,000 | Userset | Vector dimension count. Must match model output. |
| embedding_api_url | text | '' | — | Userset | Base URL for OpenAI-compatible embedding API. |
| embedding_api_key | text | '' | — | Suset | API key (superuser-only, masked in pg_settings). |
| pgvector_enabled | bool | on | — | Userset | Disable pgvector code paths without uninstalling. |
| embedding_index_type | text | 'hnsw' | 'hnsw', 'ivfflat' | Userset | Index type on embeddings table. |
| embedding_precision | text | 'single' | 'single', 'half', 'binary' | Userset | Storage precision for embedding vectors. |
| auto_embed | bool | off | — | Userset | Auto-embed new entities via background worker. |
| embedding_batch_size | int | 100 | 1 – 10,000 | Userset | Entities dequeued per background worker batch. |
Quick-Start Configurations
Small Dataset (< 1M triples)
Suitable for development, prototyping, or small knowledge graphs:
# postgresql.conf
shared_preload_libraries = 'pg_ripple'
# Dictionary cache — small footprint
pg_ripple.dictionary_cache_size = 8192
pg_ripple.cache_budget = 16
# Merge worker — merge early for fresh reads
pg_ripple.merge_threshold = 5000
pg_ripple.merge_interval_secs = 30
# Query engine
pg_ripple.plan_cache_size = 64
pg_ripple.max_path_depth = 50
Medium Dataset (1M – 100M triples)
Production workloads with moderate query complexity:
# postgresql.conf
shared_preload_libraries = 'pg_ripple'
# Dictionary cache — larger cache for better hit rates
pg_ripple.dictionary_cache_size = 131072
pg_ripple.cache_budget = 128
# Merge worker — balance freshness and I/O
pg_ripple.merge_threshold = 50000
pg_ripple.merge_interval_secs = 60
pg_ripple.latch_trigger_threshold = 20000
pg_ripple.auto_analyze = on
# Query engine — larger plan cache for diverse queries
pg_ripple.plan_cache_size = 512
pg_ripple.max_path_depth = 100
pg_ripple.bgp_reorder = on
# Inference (if used)
pg_ripple.inference_mode = 'on_demand'
pg_ripple.magic_sets = on
Large Dataset (> 100M triples)
High-throughput production with heavy query loads:
# postgresql.conf
shared_preload_libraries = 'pg_ripple'
# Dictionary cache — maximize cache coverage
pg_ripple.dictionary_cache_size = 500000
pg_ripple.cache_budget = 512
# Merge worker — batch larger merges, reduce churn
pg_ripple.merge_threshold = 200000
pg_ripple.merge_interval_secs = 120
pg_ripple.latch_trigger_threshold = 100000
pg_ripple.merge_retention_seconds = 120
pg_ripple.auto_analyze = on
# Query engine — large plan cache, parallel queries
pg_ripple.plan_cache_size = 2048
pg_ripple.max_path_depth = 200
pg_ripple.bgp_reorder = on
pg_ripple.parallel_query_min_joins = 2
# Named graph optimization (if heavy GRAPH usage)
pg_ripple.named_graph_optimized = on
# Inference
pg_ripple.inference_mode = 'on_demand'
pg_ripple.magic_sets = on
pg_ripple.rule_plan_cache = on
pg_ripple.rule_plan_cache_size = 256
# Tabling cache for repeated inference patterns
pg_ripple.tabling = on
pg_ripple.tabling_ttl = 600
# Federation (if used)
pg_ripple.federation_timeout = 60
pg_ripple.federation_pool_size = 8
pg_ripple.federation_cache_ttl = 300
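
After applying any of the profiles above and restarting or reloading, it can be useful to confirm which settings actually differ from the built-in defaults. The query below is one way to do that with the standard pg_settings view, where boot_val holds the compiled-in default.

```sql
-- Show only the pg_ripple parameters whose effective value is not the default.
SELECT name,
       boot_val AS default_value,
       setting  AS current_value,
       source
FROM pg_settings
WHERE name LIKE 'pg_ripple.%'
  AND setting IS DISTINCT FROM boot_val
ORDER BY name;
```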
Don't forget to tune PostgreSQL itself alongside pg_ripple. Key PostgreSQL parameters for triple store workloads:
- shared_buffers = 25% of RAM
- effective_cache_size = 75% of RAM
- work_mem = 64MB–256MB (for complex joins)
- maintenance_work_mem = 512MB–1GB (for merge ANALYZE)
- random_page_cost = 1.1 (if using SSDs)
- max_parallel_workers_per_gather = 4
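
As a worked example of these guidelines, consider a hypothetical dedicated server with 64 GB of RAM and SSD storage: 25% of 64 GB is 16 GB and 75% is 48 GB. The sketch below applies those numbers with ALTER SYSTEM. The hardware and the specific work_mem and maintenance_work_mem picks are assumptions for illustration only.

```sql
-- Sizing for an assumed 64 GB, SSD-backed server.
ALTER SYSTEM SET shared_buffers = '16GB';          -- 25% of RAM; requires a restart
ALTER SYSTEM SET effective_cache_size = '48GB';    -- 75% of RAM; planner hint only
ALTER SYSTEM SET work_mem = '64MB';                -- low end of the 64MB-256MB range
ALTER SYSTEM SET maintenance_work_mem = '1GB';     -- upper end of the 512MB-1GB range
ALTER SYSTEM SET random_page_cost = 1.1;           -- SSD storage
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

-- Applies everything except shared_buffers, which waits for the next restart.
SELECT pg_reload_conf();
```

Note that work_mem is allocated per sort or hash operation, not per connection, so the higher end of its range is best reserved for servers with few concurrent complex queries.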