Parallel Merge Worker Pool

Added in v0.42.0

Overview

pg_ripple uses a Vertical Partitioning (VP) architecture where each unique predicate gets its own storage table. The merge worker pool keeps the read-optimised _main partitions in sync with the write-optimised _delta tables.

By default, a single background worker merges all predicates sequentially. For workloads with many distinct predicates, such as rich ontologies with 50+ property types, a pool of parallel workers can merge several predicates at once, which can significantly improve write throughput.

Configuration

pg_ripple.merge_workers (startup only)

Controls the number of parallel merge worker processes. It must be set in postgresql.conf before the server starts; it cannot be changed at session level with SET.

# postgresql.conf
shared_preload_libraries = 'pg_ripple'
pg_ripple.merge_workers = 4

  • Default: 1 (single worker, original behaviour)
  • Range: 1 to 16
  • Type: integer, PGC_POSTMASTER (startup-only)

pg_ripple.merge_threshold

Minimum number of rows in a VP delta table before a merge is triggered. Raising it reduces merge frequency but makes each merge more expensive.

SET pg_ripple.merge_threshold = 50000;  -- default: 10000
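To make the trade-off concrete, the sketch below counts how many threshold-triggered merges a steady stream of inserts into one predicate would cause. It assumes, as a simplification, that a merge fires exactly when the delta table reaches merge_threshold rows and drains it completely; the function name merges_for_inserts is hypothetical and not part of pg_ripple.

```python
def merges_for_inserts(rows_inserted: int, threshold: int) -> tuple[int, int]:
    """Simplified model: return (merges triggered, rows left waiting in the delta).

    Assumes a merge fires each time the delta reaches `threshold` rows and
    empties it; rows below the threshold stay in the delta table.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return divmod(rows_inserted, threshold)


# One million inserts into a single predicate:
print(merges_for_inserts(1_000_000, 10_000))  # (100, 0): many small merges
print(merges_for_inserts(1_000_000, 50_000))  # (20, 0): fewer, larger merges
```

Under this model, raising the threshold five-fold cuts the number of merges five-fold while each merge moves five times as many rows.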

pg_ripple.merge_interval_secs

Maximum seconds between merge worker polling cycles.

SET pg_ripple.merge_interval_secs = 30;  -- default: 60
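The word "maximum" suggests the interval is an upper bound on how long a worker sleeps rather than a fixed period. The sketch below models that reading with Python's threading.Event: each cycle waits at most interval_secs but returns early if the worker is signalled. It is an analogy under that assumption, not pg_ripple's code; merge_loop, run_cycle and the wake event are hypothetical names.

```python
import threading
from typing import Callable


def merge_loop(stop: threading.Event, wake: threading.Event,
               interval_secs: float, run_cycle: Callable[[], None]) -> int:
    """Run merge cycles until `stop` is set and return how many cycles ran.

    Between cycles the worker sleeps for at most `interval_secs`; setting
    `wake` ends the sleep early so the next cycle starts immediately.
    """
    cycles = 0
    while not stop.is_set():
        run_cycle()              # one pass over this worker's predicates
        cycles += 1
        wake.wait(timeout=interval_secs)  # bounded sleep, interruptible
        wake.clear()             # consume the wake-up signal, if any
    return cycles
```

With interval_secs = 30, a delta table that crosses the threshold just after a cycle finishes waits at most about 30 seconds for the next cycle, or less if the worker is woken early.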

How It Works

With merge_workers = N, pg_ripple spawns N background worker processes. Each worker owns a disjoint round-robin subset of VP predicates:

  • Worker 0 handles predicates where pred_id % N == 0
  • Worker 1 handles predicates where pred_id % N == 1
  • … and so on
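The round-robin rule can be sketched in a few lines. This is an illustrative model of the pred_id % N assignment described above; assign_slices and owner are hypothetical helper names, and the predicate IDs are made up.

```python
def owner(pred_id: int, n_workers: int) -> int:
    """Index of the worker whose round-robin slice contains pred_id."""
    return pred_id % n_workers


def assign_slices(pred_ids: list[int], n_workers: int) -> dict[int, list[int]]:
    """Map each worker index to the sorted predicate IDs it owns."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    slices: dict[int, list[int]] = {w: [] for w in range(n_workers)}
    for pid in sorted(pred_ids):
        slices[owner(pid, n_workers)].append(pid)
    return slices


print(assign_slices([10, 11, 12, 13, 14, 15, 16], 4))
# {0: [12, 16], 1: [13], 2: [10, 14], 3: [11, 15]}
```

Every predicate lands in exactly one slice, so the slices are disjoint. Because the modulus depends on N, changing merge_workers also changes which worker owns each predicate.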

Advisory locking prevents races: before merging a predicate, a worker calls pg_try_advisory_lock(pred_id). If another worker already holds the lock, the call returns false immediately and the worker skips that predicate instead of waiting.
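The try-lock-or-skip pattern is sketched below with an in-process stand-in for PostgreSQL's pg_try_advisory_lock and pg_advisory_unlock. The AdvisoryLocks class and merge_if_free function are hypothetical; the stand-in keeps only the "succeed or return False at once" behaviour and, unlike real session-level advisory locks, is not re-entrant.

```python
import threading
from typing import Callable, Hashable


class AdvisoryLocks:
    """Toy lock table: try_lock never blocks, mirroring pg_try_advisory_lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()        # protects the table itself
        self._held: dict[Hashable, str] = {}  # lock key -> holding worker

    def try_lock(self, key: Hashable, holder: str) -> bool:
        with self._guard:
            if key in self._held:
                return False                  # someone else has it: give up now
            self._held[key] = holder
            return True

    def unlock(self, key: Hashable, holder: str) -> bool:
        with self._guard:
            if self._held.get(key) != holder:
                return False                  # not ours to release
            del self._held[key]
            return True


def merge_if_free(locks: AdvisoryLocks, pred_id: int, worker: str,
                  merge: Callable[[int], None]) -> bool:
    """Merge pred_id only if its lock is free; otherwise skip it."""
    if not locks.try_lock(pred_id, worker):
        return False
    try:
        merge(pred_id)
        return True
    finally:
        locks.unlock(pred_id, worker)         # always release, even on error
```

Releasing the lock in a finally clause matters: a worker that failed mid-merge and kept the lock would make every other worker skip that predicate.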

Work-stealing: after processing its assigned predicates, an idle worker checks whether any "foreign" predicate (one outside its round-robin slice) has a delta table above the merge threshold and is not locked. If so, it takes over that merge. This keeps one busy worker, for example one occupied by an unusually large predicate, from delaying the remaining predicates in its slice and with them the whole merge cycle.
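The selection step can be modelled as follows. This sketch only decides what one worker would merge in a cycle, own slice first and stolen work second, from a snapshot of delta sizes and currently held locks; plan_cycle is a hypothetical name, and treating "above the threshold" as "at or above" is an assumption.

```python
def plan_cycle(worker: int, n_workers: int, delta_rows: dict[int, int],
               threshold: int, locked: set[int]) -> tuple[list[int], list[int]]:
    """Return (own predicates to merge, foreign predicates to steal).

    delta_rows maps pred_id -> rows currently in its delta table;
    locked holds pred_ids that other workers are merging right now.
    """
    def due(pid: int) -> bool:
        return delta_rows[pid] >= threshold and pid not in locked

    own = [p for p in sorted(delta_rows) if p % n_workers == worker and due(p)]
    stolen = [p for p in sorted(delta_rows) if p % n_workers != worker and due(p)]
    return own, stolen


# Two workers; worker 1 is busy merging the large predicate 1.
deltas = {0: 50_000, 1: 20_000, 2: 500, 3: 15_000, 4: 12_000, 5: 0}
print(plan_cycle(0, 2, deltas, 10_000, locked={1}))  # ([0, 4], [3])
```

Here worker 0 merges its own predicates 0 and 4, then steals predicate 3, which would otherwise wait until worker 1 finishes predicate 1. In the running system the advisory lock is still taken per predicate at merge time, so a snapshot that goes stale only causes a skip, not a double merge.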

Monitoring

Use pg_ripple.diagnostic_report() to check merge worker activity:

-- '\_' matches a literal underscore; a bare '_' in LIKE matches any single character
SELECT key, value FROM pg_ripple.diagnostic_report()
WHERE key LIKE 'merge\_%';

Or query the background worker state:

SELECT pid, application_name, state
FROM pg_stat_activity
WHERE application_name LIKE 'pg_ripple merge%';

Choosing the Right Worker Count

Predicate count    Recommended workers
< 20               1 (default)
20–100             2–4
100–500            4–8
> 500              8–16
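The table can be encoded as a small lookup, for example in a provisioning script. The function name recommend_workers is hypothetical, and the table's shared boundary at 100 is resolved toward the higher row as an assumption; 500 stays in the 100–500 row because the last row is strictly "> 500".

```python
def recommend_workers(predicate_count: int) -> tuple[int, int]:
    """Return the (min, max) merge_workers range suggested by the table."""
    if predicate_count < 0:
        raise ValueError("predicate count cannot be negative")
    if predicate_count < 20:
        return (1, 1)      # default single worker
    if predicate_count < 100:
        return (2, 4)
    if predicate_count <= 500:
        return (4, 8)      # 100 assigned here by assumption
    return (8, 16)         # upper end matches the GUC's maximum of 16
```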

For most workloads, the bottleneck is not the worker count but the merge threshold and interval. Tune those before adding workers.

Restart Requirement

Because merge_workers is a PGC_POSTMASTER GUC, changes take effect only after a PostgreSQL restart:

# After updating postgresql.conf:
pg_ctl restart -D $PGDATA