pg_trickle
pg_trickle is a PostgreSQL 18 extension that turns ordinary SQL views into
self-maintaining stream tables — no external processes, no sidecars, no
bespoke refresh pipelines. Just CREATE EXTENSION pg_trickle and your views
stay fresh.
-- Declare a stream table — a view that maintains itself
SELECT pgtrickle.create_stream_table(
name => 'active_orders',
query => 'SELECT * FROM orders WHERE status = ''active''',
schedule => '30s'
);
-- Insert a row — the stream table updates automatically on the next refresh
INSERT INTO orders (id, status) VALUES (42, 'active');
SELECT count(*) FROM active_orders; -- 1
The problem with materialized views
PostgreSQL's materialized views are powerful but frustrating.
REFRESH MATERIALIZED VIEW re-runs the entire query from scratch, even if
only one row changed in a million-row table. Your choices are: burn CPU on
full recomputation, or accept stale data. Most teams end up building bespoke
refresh pipelines just to keep summary tables current.
What pg_trickle does differently
pg_trickle captures changes to your source tables and — on each refresh cycle — derives a delta query that processes only the changed rows and merges the result into the materialized table. One insert into a million-row source table? pg_trickle touches exactly one row's worth of computation.
The approach is grounded in the DBSP differential dataflow framework (Budiu et al., 2022). Delta queries are derived automatically from your SQL's operator tree: joins produce the classic bilinear expansion, aggregates maintain auxiliary counters, and linear operators like filters pass deltas through unchanged.
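For joins, the bilinear expansion mentioned above can be written out concretely. A minimal sketch in plain SQL, assuming hypothetical `delta_r`/`delta_s` change sets and `r_old`/`s_old` pre-change snapshots (all identifiers here are invented for illustration, not pg_trickle internals):

```sql
-- Sketch only: the delta of an inner join R ⋈ S expands into three terms,
--   Δ(R ⋈ S) = (ΔR ⋈ S₀) ∪ (R₀ ⋈ ΔS) ∪ (ΔR ⋈ ΔS)
-- where R₀/S₀ are the pre-change snapshots.
SELECT * FROM delta_r dr JOIN s_old   s  ON dr.k = s.k    -- ΔR ⋈ S₀
UNION ALL
SELECT * FROM r_old   r  JOIN delta_s ds ON r.k  = ds.k   -- R₀ ⋈ ΔS
UNION ALL
SELECT * FROM delta_r dr JOIN delta_s ds ON dr.k = ds.k;  -- ΔR ⋈ ΔS
```

Because each term joins a (small) change set against at most one full relation, the work is proportional to the size of the deltas, not the base tables.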
Key capabilities
| Feature | Description |
|---|---|
| Incremental refresh | Only changed rows are recomputed — never a full table scan |
| Cascading DAG | Stream tables that depend on stream tables propagate deltas downstream automatically |
| Demand-driven scheduling | Set a freshness interval on the views your app queries; upstream layers inherit the tightest schedule automatically |
| Hybrid CDC | Starts with lightweight row-level triggers; seamlessly transitions to WAL-based logical replication once available |
| Broad SQL support | JOINs, GROUP BY, DISTINCT, UNION/INTERSECT/EXCEPT, subqueries, CTEs (including WITH RECURSIVE), window functions, LATERAL, and more |
| Built-in observability | Monitoring views, refresh history, NOTIFY-based alerting |
| CloudNativePG-ready | Ships as an Image Volume extension image for Kubernetes deployments |
Demand-driven scheduling
With the default CALCULATED schedule mode, you only set an explicit refresh
interval on the stream tables your application actually queries. The system
propagates that cadence upward through the dependency graph: each upstream
stream table inherits the tightest schedule among its downstream dependents.
You declare freshness requirements where they matter — at the consumer — and
the entire pipeline adjusts without manual coordination.
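In practice this means only the consumer-facing table gets an interval. A sketch using `create_stream_table` (table names and queries are invented for illustration):

```sql
-- Hypothetical two-layer pipeline: only the table the app queries gets an
-- explicit interval; the upstream layer runs in CALCULATED mode.
SELECT pgtrickle.create_stream_table(
  name     => 'order_totals',                          -- upstream layer
  query    => 'SELECT customer_id, SUM(amount) AS total
               FROM orders GROUP BY customer_id',
  schedule => 'calculated'   -- inherit cadence from downstream dependents
);
SELECT pgtrickle.create_stream_table(
  name     => 'top_customers',                         -- what the app queries
  query    => 'SELECT * FROM order_totals WHERE total > 1000',
  schedule => '5s'           -- order_totals now inherits this 5s cadence
);
```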
Hybrid change capture
pg_trickle bootstraps with lightweight row-level triggers — no configuration
needed, works out of the box. Once the first refresh succeeds and
wal_level = logical is available, the system automatically transitions to
WAL-based logical replication for lower write-side overhead. The transition
is seamless: trigger → transitioning → WAL-only. If anything goes wrong, it
falls back to triggers.
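To see where a deployment stands, you can check the server prerequisite and the monitoring view (`pgt_status()` is the same call used later in this documentation; exact columns may vary by version):

```sql
-- Prerequisite for the WAL transition:
SHOW wal_level;                      -- 'logical' enables WAL-based capture
-- Health of each stream table (monitoring view shipped with pg_trickle):
SELECT name, status, staleness
FROM   pgtrickle.pgt_status();
```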
Explore this documentation
- Getting Started — build a three-layer DAG from scratch in minutes
- SQL Reference — every function and option
- Architecture — how the engine works internally
- Configuration — GUC variables and tuning
Tutorials
- Fuse Circuit Breaker — protect stream tables from bulk-change storms
- Tiered Scheduling — configure multi-tier refresh cadences
- Migrating from Materialized Views — step-by-step migration guide
- Circular Dependencies — handle SCCs in your DAG
- Monitoring & Alerting — set up observability for stream tables
- ETL Bulk Load Patterns — safely load large batches without overwhelming CDC
Integrations
- CloudNativePG — deploy pg_trickle on Kubernetes
- Prometheus & Grafana — metrics and dashboards
- PgBouncer — connection pooling configuration
Source & releases
Written in Rust using pgrx. Targets PostgreSQL 18. Apache 2.0 licensed.
- Repository: github.com/grove/pg-trickle
- Install instructions: Installation
- Changelog: Changelog
- Roadmap: Roadmap
Getting Started with pg_trickle
What is pg_trickle?
pg_trickle adds stream tables to PostgreSQL — tables that are defined by a SQL query and kept automatically up to date as the underlying data changes. Think of them as materialized views that refresh themselves, but smarter: instead of re-running the entire query on every refresh, pg_trickle uses Incremental View Maintenance (IVM) to process only the rows that changed.
Traditional materialized views force a choice: either re-run the full query (expensive) or accept stale data. pg_trickle eliminates this trade-off. When you insert a single row into a million-row table, pg_trickle computes the effect of that one row on the query result — it doesn't touch the other 999,999.
How data flows
The key concept is that data flows downstream automatically — from your base tables through any chain of stream tables, without you writing a single line of orchestration code:
You write to base tables
│
▼
┌─────────────┐ triggers (or WAL) ┌─────────────────────┐
│ Base Tables │ ─────────────────────▶ │ Change Buffers │
│ (you write) │ │ (pgtrickle_changes.*) │
└─────────────┘ └──────────┬──────────┘
│
delta query (ΔQ) on refresh
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Stream Table A ◀── depends on base tables │
└──────────────────────────┬───────────────────────────────────┘
│ change captured, buffer written
▼
┌──────────────────────────────────────────────────────────────┐
│ Stream Table B ◀── depends on Stream Table A │
└──────────────────────────────────────────────────────────────┘
One write to a base table can ripple through an entire DAG of stream tables — each layer refreshed in the correct topological order, each doing only the work proportional to what actually changed.
- You write to your base tables normally — `INSERT`, `UPDATE`, `DELETE`
- Lightweight `AFTER` row-level triggers capture each change into a buffer, atomically in the same transaction. No polling, no logical replication slots required by default.
- On each refresh cycle, pg_trickle derives a delta query (ΔQ) that reads only the buffered changes since the last refresh frontier
- The delta is merged into the stream table — only the affected rows are written
- If other stream tables depend on this one, they are scheduled next (topological order)
- Optionally: once `wal_level = logical` is available and the first refresh succeeds, pg_trickle automatically transitions from triggers to WAL-based CDC (near-zero write-path overhead, compared to ~2–15 μs per row for triggers). The transition is seamless and transparent.
This tutorial walks through a concrete org-chart example so you can see this flow end to end, including a chain of stream tables that propagates changes automatically.
Prerequisites
- PostgreSQL 18.x with pg_trickle installed (see INSTALL.md)
- `shared_preload_libraries = 'pg_trickle'` in `postgresql.conf`
- `max_worker_processes` raised to at least 32 (see INSTALL.md); the PostgreSQL default of 8 is often exhausted if you have several databases, causing stream tables to silently stop refreshing
- `psql` or any SQL client
Deploying to production? See the Pre-Deployment Checklist for a complete list of requirements, pooler compatibility, and recommended GUC values.
Playground: The fastest way to experiment is the playground — a Docker Compose environment with sample tables and stream tables pre-loaded.
`cd playground && docker compose up -d` and you're running.
Quick start with Docker: Pull the pre-built GHCR image — PostgreSQL 18.3 + pg_trickle ready to run, no configuration needed:
docker run --rm -e POSTGRES_PASSWORD=secret -p 5432:5432 ghcr.io/grove/pg_trickle:latest

All GUC defaults (`wal_level`, `shared_preload_libraries`, scheduler settings) are pre-configured. See INSTALL.md for tag details and volume mounting.
Connect to the database you want to use and enable the extension:
CREATE EXTENSION pg_trickle;
No additional configuration is needed. pg_trickle automatically discovers all databases on the server and starts a scheduler for each one where the extension is installed.
Chapter 1: Hello World — Your First Stream Table
Before diving into multi-table joins and recursive CTEs, start with the simplest possible stream table: a single-source aggregate with no joins.
1.1 Setup
Create one table and enable the extension:
CREATE EXTENSION IF NOT EXISTS pg_trickle;
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
in_stock BOOLEAN NOT NULL DEFAULT true
);
INSERT INTO products (category, price) VALUES
('Electronics', 299.99),
('Electronics', 49.99),
('Books', 14.99),
('Books', 24.99),
('Books', 9.99);
1.2 Create the stream table
SELECT pgtrickle.create_stream_table(
name => 'category_summary',
query => $$
SELECT
category,
COUNT(*) AS product_count,
ROUND(AVG(price), 2) AS avg_price,
MIN(price) AS min_price,
MAX(price) AS max_price,
COUNT(*) FILTER (WHERE in_stock) AS in_stock_count
FROM products
GROUP BY category
$$,
schedule => '1s'
);
Query it immediately — it was populated by the initial full refresh:
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary ORDER BY category;
category | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
Books | 3 | 16.66 | 9.99 | 24.99 | 3
Electronics | 2 | 174.99 | 49.99 | 299.99 | 2
(2 rows)
1.3 Watch an INSERT update one group
INSERT INTO products (category, price) VALUES ('Books', 39.99);
Within ~1 second (or call SELECT pgtrickle.refresh_stream_table('category_summary') to force it):
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Books';
category | product_count | avg_price | min_price | max_price | in_stock_count
----------+---------------+-----------+-----------+-----------+----------------
Books | 4 | 22.49 | 9.99 | 39.99 | 4
(1 row)
The Electronics row was not touched at all — pg_trickle read exactly one
row from the change buffer and adjusted only the Books group.
1.4 Watch an UPDATE propagate
UPDATE products SET price = 19.99 WHERE price = 299.99;
After the next refresh:
SELECT category, product_count, avg_price, min_price, max_price, in_stock_count
FROM category_summary WHERE category = 'Electronics';
category | product_count | avg_price | min_price | max_price | in_stock_count
-------------+---------------+-----------+-----------+-----------+----------------
Electronics | 2 | 34.99 | 19.99 | 49.99 | 2
(1 row)
For AVG, pg_trickle maintains running sum and count columns internally, so
re-aggregating a group is O(1) regardless of group size.
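One way to picture this: the group's running sum absorbs the delta directly. A conceptual sketch, assuming a hypothetical internal table `category_summary_internal` with `price_sum`/`price_count` columns (pg_trickle's real storage layout is internal and will differ):

```sql
-- Illustrative arithmetic only. Electronics before the UPDATE:
--   price_sum = 349.98, price_count = 2
-- The change buffer says: old price 299.99 out, new price 19.99 in.
-- Both SET expressions read the pre-update row, so the adjusted sum is used:
UPDATE category_summary_internal
SET    price_sum = price_sum - 299.99 + 19.99,                    -- 69.98
       avg_price = ROUND((price_sum - 299.99 + 19.99) / price_count, 2)
WHERE  category = 'Electronics';                         -- 69.98 / 2 = 34.99
```

The adjustment is a single-row update whose cost does not depend on how many products the group contains.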
1.5 What you just saw
- A single function call created the storage table, installed CDC triggers, ran the initial full refresh, and registered a 1-second schedule.
- Every subsequent DML on `products` was captured by an `AFTER` trigger — no polling, no logical replication.
- Each refresh touched only the rows and groups that changed.
- The stream table is a real PostgreSQL table — you can `SELECT`, index, and join against `category_summary` like any other table.
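For example (hypothetical follow-on queries, not part of the tutorial's required steps):

```sql
-- Standard DDL works, because the stream table is ordinary heap storage:
CREATE INDEX ON category_summary (category);

-- Join it like any other table: products priced above their category average
SELECT p.id, p.category, p.price, cs.avg_price
FROM   products p
JOIN   category_summary cs USING (category)
WHERE  p.price > cs.avg_price;
```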
Clean up:
SELECT pgtrickle.drop_stream_table('category_summary');
DROP TABLE products;
Chapter 2: Joins, Aggregates & Chains
What you'll build
An employee org-chart system with three stream tables:

- `department_tree` — a recursive CTE that flattens a department hierarchy into paths like `Company > Engineering > Backend`
- `department_stats` — a join + aggregation over `department_tree` (a stream table!) that computes headcount and salary budget, with the full path included
- `department_report` — a further aggregation that rolls up stats to top-level departments
The chain departments → department_tree → department_stats → department_report demonstrates automatic downstream propagation: modify a department name in the base table and all three stream tables update automatically, in the right order, without any manual orchestration.
By the end you will have:
- Seen how stream tables are created, queried, and refreshed
- Watched a single `UPDATE` in a base table cascade through three layers of stream tables automatically
- Understood the four refresh modes and IVM strategies
Prefer dbt? A runnable dbt companion project mirrors every step below. Clone the repo and run:
`./examples/dbt_getting_started/scripts/run_example.sh` — see examples/dbt_getting_started/ for full details.
2.1 Create the Base Tables
These are ordinary PostgreSQL tables — pg_trickle doesn't require any special column types, annotations, or schema conventions.
Tables without a primary key work, but pg_trickle will emit a WARNING at stream table creation time: change detection falls back to a content-based hash across all columns, which is slower for wide tables and cannot distinguish between identical duplicate rows. Adding a primary key gives the best performance and most reliable change detection. A primary key is also required for automatic transition to WAL-based CDC (cdc_mode = 'auto'); without one the source table stays on trigger-based CDC.
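For example, a keyless legacy table can be upgraded before creating stream tables over it (the table name here is hypothetical):

```sql
-- Add a surrogate key so change detection can track rows by identity
-- instead of hashing every column (also enables cdc_mode = 'auto'):
ALTER TABLE legacy_events ADD COLUMN id BIGINT GENERATED ALWAYS AS IDENTITY;
ALTER TABLE legacy_events ADD PRIMARY KEY (id);
```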
-- Department hierarchy (self-referencing tree)
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
parent_id INT REFERENCES departments(id)
);
-- Employees belong to a department
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
department_id INT NOT NULL REFERENCES departments(id),
salary NUMERIC(10,2) NOT NULL
);
Now insert some data — a three-level department tree and a handful of employees:
-- Top-level
INSERT INTO departments (id, name, parent_id) VALUES
(1, 'Company', NULL);
-- Second level
INSERT INTO departments (id, name, parent_id) VALUES
(2, 'Engineering', 1),
(3, 'Sales', 1),
(4, 'Operations', 1);
-- Third level (under Engineering)
INSERT INTO departments (id, name, parent_id) VALUES
(5, 'Backend', 2),
(6, 'Frontend', 2),
(7, 'Platform', 2);
-- Employees
INSERT INTO employees (name, department_id, salary) VALUES
('Alice', 5, 120000), -- Backend
('Bob', 5, 115000), -- Backend
('Charlie', 6, 110000), -- Frontend
('Diana', 7, 130000), -- Platform
('Eve', 3, 95000), -- Sales
('Frank', 3, 90000), -- Sales
('Grace', 4, 100000); -- Operations
At this point these are plain tables with no triggers, no change tracking, nothing special. The department tree looks like this:
Company (1)
├── Engineering (2)
│ ├── Backend (5) — Alice, Bob
│ ├── Frontend (6) — Charlie
│ └── Platform (7) — Diana
├── Sales (3) — Eve, Frank
└── Operations (4) — Grace
2.2 Create the First Stream Table — Recursive Hierarchy
Our first stream table flattens the department tree. For every department, it computes the full path from the root and the depth level. This uses WITH RECURSIVE — a SQL construct that can't be differentiated with simple algebraic rules (the recursion depends on itself), but pg_trickle handles it using incremental strategies (semi-naive evaluation for inserts, Delete-and-Rederive for mixed changes) that we'll explain later.
SELECT pgtrickle.create_stream_table(
name => 'department_tree',
query => $$
WITH RECURSIVE tree AS (
-- Base case: root departments (no parent)
SELECT id, name, parent_id, name AS path, 0 AS depth
FROM departments
WHERE parent_id IS NULL
UNION ALL
-- Recursive step: children join back to the tree
SELECT d.id, d.name, d.parent_id,
tree.path || ' > ' || d.name AS path,
tree.depth + 1
FROM departments d
JOIN tree ON d.parent_id = tree.id
)
SELECT id, name, parent_id, path, depth FROM tree
$$,
schedule => 'calculated'  -- CALCULATED: inherit the schedule from downstream dependents
);
Note on short schedules: A 1-second schedule (such as the one used in Chapter 1) is safe for development and production thanks to `auto_backoff` (on by default since v0.10.0). If a refresh takes more than 95% of the schedule window, the scheduler automatically stretches the effective interval (up to 8× the configured schedule) to prevent CPU runaway, then resets to 1× as soon as a refresh completes on time. You will see a `WARNING` message when backoff activates.

v0.2.0+: `create_stream_table` also accepts `diamond_consistency` (`'none'` or `'atomic'`) and `diamond_schedule_policy` (`'fastest'` or `'slowest'`) for diamond-shaped dependency graphs. Schedules can be cron expressions (e.g., `'*/5 * * * *'`, `'@hourly'`). Set `pooler_compatibility_mode => true` if you're connecting through PgBouncer or another transaction-mode connection pooler. See SQL_REFERENCE.md for the full parameter list.
What just happened?
That single function call did a lot of work atomically (all in one transaction):
- Parsed the defining query into an operator tree — identifying the recursive CTE, the scan on `departments`, the join, the union
- Created a storage table called `department_tree` in the `public` schema — a real PostgreSQL heap table with columns matching the SELECT output, plus an internal column `__pgt_row_id` (a hash used to track individual rows)
- Installed CDC triggers on the `departments` table — lightweight `AFTER INSERT OR UPDATE OR DELETE` row-level triggers that will capture every future change
- Created a change buffer table in the `pgtrickle_changes` schema — this is where the triggers write captured changes
- Ran an initial full refresh — executed the recursive query against the current data and populated the storage table
- Registered the stream table in pg_trickle's catalog
TRUNCATE caveat: Row-level triggers do not fire on `TRUNCATE`. If you `TRUNCATE` a base table, the change is not captured incrementally — the stream table will become stale. Use `DELETE FROM table` instead, or call `pgtrickle.refresh_stream_table('department_tree')` after a TRUNCATE. If the stream table uses DIFFERENTIAL mode, temporarily switch to FULL for a full recompute: `pgtrickle.alter_stream_table('department_tree', refresh_mode => 'FULL')`, refresh, then switch back.

Query it immediately — it's already populated:
SELECT id, name, parent_id, path, depth FROM department_tree ORDER BY path;
Expected output:
id | name | parent_id | path | depth
----+-------------+-----------+----------------------------------+-------
1 | Company | | Company | 0
2 | Engineering | 1 | Company > Engineering | 1
5 | Backend | 2 | Company > Engineering > Backend | 2
6 | Frontend | 2 | Company > Engineering > Frontend | 2
7 | Platform | 2 | Company > Engineering > Platform | 2
4 | Operations | 1 | Company > Operations | 1
3 | Sales | 1 | Company > Sales | 1
(7 rows)
This is a real PostgreSQL table — you can create indexes on it, join it in other queries, reference it in views, or even use it as a source for other stream tables. pg_trickle keeps it in sync automatically.
Key insight: The recursive query that computes paths and depths would normally need to be re-run manually (or via `REFRESH MATERIALIZED VIEW`). With pg_trickle, it stays fresh — any change to the `departments` table is automatically reflected within the configured schedule bound.
2.3 Chain Stream Tables — Build the Downstream Layers
Now create department_stats. The twist: instead of joining directly against departments, it joins against department_tree — the stream table we just created. This creates a chain: changes to departments update department_tree, whose changes then trigger department_stats to update.
This demonstrates how pg_trickle builds a DAG — a directed acyclic graph of stream tables — and automatically schedules refreshes in topological order.
SELECT pgtrickle.create_stream_table(
name => 'department_stats',
query => $$
SELECT
t.id AS department_id,
t.name AS department_name,
t.path AS full_path,
t.depth,
COUNT(e.id) AS headcount,
COALESCE(SUM(e.salary), 0) AS total_salary,
COALESCE(AVG(e.salary), 0) AS avg_salary
FROM department_tree t
LEFT JOIN employees e ON e.department_id = t.id
GROUP BY t.id, t.name, t.path, t.depth
$$,
schedule => 'calculated' -- CALCULATED: inherit schedule from downstream; see explanation below
);
What just happened — and why this one is different?
Like before, pg_trickle parsed the query, created a storage table, and set up CDC. But department_stats depends on department_tree, not a base table — so no new triggers were installed. Instead, pg_trickle registered department_tree as an upstream dependency in the DAG.
The schedule is 'calculated' (CALCULATED mode), which means: "don't give this table its own schedule — inherit the tightest schedule of any downstream table that queries it". Internally this stores NULL in the catalog, but you must pass the string 'calculated' — passing SQL NULL is an error. Since no other stream table has been created yet, it will be refreshed on demand or when a downstream dependent triggers it.
The query has no recursive CTE, so pg_trickle uses algebraic differentiation:
- Decomposed the query into operators: `Scan(department_tree)` → `LEFT JOIN` → `Scan(employees)` → `Aggregate(GROUP BY + COUNT/SUM/AVG)` → `Project`
- Derived a differentiation rule for each:
  - `Δ(Scan)` = read only change buffer rows (not the full table)
  - `Δ(LEFT JOIN)` = join change rows from one side against the full other side
  - `Δ(Aggregate)` = for COUNT/SUM/AVG, add or subtract per group — no rescan needed
- Composed these into a single delta query (ΔQ) that never touches unchanged rows
When one employee is inserted, the refresh reads one change buffer row, joins to find the department, and adjusts only that group's count and sum.
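The composed ΔQ might look roughly like the following hand-written approximation. The buffer table name, the `action`/`sign` columns, and the handling of UPDATEs (which contribute an old row with sign −1 and a new row with +1) are simplified; pg_trickle's generated SQL is internal and will differ:

```sql
-- Illustrative shape of a derived ΔQ, not pg_trickle's actual output:
WITH changed_emps AS (                -- Δ(Scan): read buffered changes only
  SELECT department_id, salary,
         CASE action WHEN 'D' THEN -1 ELSE 1 END AS sign
  FROM   pgtrickle_changes.changes_employees
)
SELECT t.id                    AS department_id,
       SUM(c.sign)             AS headcount_delta,     -- Δ(COUNT)
       SUM(c.sign * c.salary)  AS salary_delta         -- Δ(SUM)
FROM   changed_emps c
JOIN   department_tree t ON t.id = c.department_id     -- Δ side ⋈ full side
GROUP  BY t.id;                                        -- per-group adjustment
```

The per-group deltas are then merged into the stored aggregates, so unchanged groups are never read or written.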
Query it:
SELECT department_name, full_path, headcount, total_salary
FROM department_stats
ORDER BY full_path;
Expected output:
department_name | full_path | headcount | total_salary
-----------------+----------------------------------+-----------+--------------
Company | Company | 0 | 0
Engineering | Company > Engineering | 0 | 0
Backend | Company > Engineering > Backend | 2 | 235000.00
Frontend | Company > Engineering > Frontend | 1 | 110000.00
Platform | Company > Engineering > Platform | 1 | 130000.00
Operations | Company > Operations | 1 | 100000.00
Sales | Company > Sales | 2 | 185000.00
(7 rows)
Notice that the full_path column comes from department_tree — this data already went through one layer of incremental maintenance before landing here.
Add a third layer: department_report
Now add a rollup that aggregates department_stats by top-level group (depth = 1):
SELECT pgtrickle.create_stream_table(
name => 'department_report',
query => $$
SELECT
split_part(full_path, ' > ', 2) AS division,
SUM(headcount) AS total_headcount,
SUM(total_salary) AS total_payroll
FROM department_stats
WHERE depth >= 1
GROUP BY 1
$$,
schedule => '1s' -- this is the only explicit schedule; CALCULATED tables above inherit it
);
The DAG is now:
departments (base) employees (base)
│ │
▼ │
department_tree ──────────┤
(DIFF, CALCULATED) │
│ ▼
└──────▶ department_stats
(DIFF, CALCULATED)
│
▼
department_report
(DIFF, 1s) ◀── only explicit schedule
department_report drives the whole pipeline. Because it has a 1-second schedule, pg_trickle automatically propagates that cadence upstream: department_stats and department_tree will also be refreshed within 1 second of a base table change, in topological order, with no manual configuration.
Query the report:
SELECT division, total_headcount, total_payroll FROM department_report ORDER BY division;
division | total_headcount | total_payroll
-------------+-----------------+---------------
Engineering | 4 | 475000.00
Operations | 1 | 100000.00
Sales | 2 | 185000.00
(3 rows)
2.4 Watch a Change Cascade Through All Three Layers
This is the heart of pg_trickle. We'll make four changes to the base tables and watch changes propagate automatically through the three-layer DAG — each layer doing only the minimum work.
The data flow pipeline (three layers)
Your SQL statement
│
▼
CDC trigger fires (same transaction)
Change buffer receives one row
│
▼
Background scheduler fires (within ~1 second)
│
├──▶ [Layer 1] Refresh department_tree
│ delta query reads change buffer
│ MERGE touches only affected rows in department_tree
│ department_tree's own change buffer is updated
│
├──▶ [Layer 2] Refresh department_stats
│ delta query reads department_tree's change buffer
│ MERGE touches only affected department groups
│
└──▶ [Layer 3] Refresh department_report
delta query reads department_stats' change buffer
MERGE touches only affected division rows
All change buffers cleaned up ✓
All three layers run in a single scheduled pass, in topological order.
2.4a: INSERT ripples through all three layers
INSERT INTO employees (name, department_id, salary) VALUES
('Heidi', 6, 105000); -- New Frontend engineer
What happened immediately (in your transaction): The AFTER INSERT trigger on employees fired and wrote one row to pgtrickle_changes.changes_<employees_oid>. The row contains the new values, action type I, and the LSN at the time of insert. Your transaction committed normally — no blocking.
The stream tables don't know about Heidi yet. The change is in the buffer, waiting for the next refresh.
The background scheduler handles this automatically. With a 1-second schedule, `department_stats` and `department_report` refresh within about a second.

To confirm a refresh has happened, check `data_timestamp` in the monitoring view:

SELECT name, data_timestamp, staleness FROM pgtrickle.pgt_status();

To force an immediate synchronous refresh, wait a moment first (so the scheduler can finish its current tick), then call in topological order. Note that `refresh_stream_table` only refreshes the named table — it does not cascade upstream:

SELECT pg_sleep(2); -- let the scheduler finish any in-progress tick
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across the three layers:
| Layer | What ran | Rows touched |
|---|---|---|
| `department_tree` | No change — `employees` is not a source for this stream table | 0 |
| `department_stats` | Delta query: read 1 buffer row, joined to Frontend, COUNT+1, SUM+105000 | 1 (Frontend group only) |
| `department_report` | Delta query: read 1 change from `department_stats`, headcount += 1, payroll += 105000 | 1 (Engineering row only) |
Check the result:
SELECT department_name, headcount, total_salary FROM department_stats
WHERE department_name = 'Frontend';
department_name | headcount | total_salary
-----------------+-----------+--------------
Frontend | 2 | 215000.00
The 6 other groups in department_stats were not touched at all.
Contrast with a standard materialized view: `REFRESH MATERIALIZED VIEW` would re-scan all 8 employees, re-join with all 7 departments, re-aggregate, and update all 7 rows. With pg_trickle, the work was proportional to the 1 changed row — across all three layers.
2.4b: A department change cascades through the whole DAG
Now change the departments table — the root of the entire chain:
INSERT INTO departments (id, name, parent_id) VALUES
(8, 'DevOps', 2); -- New team under Engineering
What happened: The CDC trigger on departments fired. The change buffer for departments has one new row. None of the stream tables know about it yet.
The scheduler handles this automatically — all three tables will refresh within a second, in the correct dependency order (upstream first). To force it synchronously, wait a moment first, then refresh each table in topological order (`refresh_stream_table` does not cascade upstream):

SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across all three layers:
| Layer | What ran | Rows touched |
|---|---|---|
| `department_tree` | Semi-naive evaluation: base case finds the new dept, recursive term computes its path. Result: 1 new row | 1 inserted |
| `department_stats` | Delta query reads the new row from `department_tree`'s change buffer; DevOps has 0 employees so the delta is minimal | 1 inserted (headcount=0) |
| `department_report` | Delta on the Engineering row: headcount stays the same (DevOps has 0 employees) | 0 effective changes |
How the recursive CTE refresh works — unlike department_stats, recursive CTEs can't be algebraically differentiated (the recursion references itself). pg_trickle uses incremental fixpoint strategies:
- INSERT → semi-naive evaluation: differentiate the base case, propagate the delta through the recursive term, stopping when no new rows are produced. Only new rows inserted.
- DELETE or UPDATE → Delete-and-Rederive (DRed): remove rows derived from deleted facts, re-derive rows that may have alternative derivation paths, handle cascades cleanly.
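To make semi-naive evaluation concrete for the insert case, here is a hand-written approximation for a newly inserted, non-root department. The buffer table name and columns are invented, and root inserts plus the DRed path for deletes are omitted — this is a sketch of the strategy, not the engine's actual SQL:

```sql
-- Seed the recursion with the *changed* rows only, attach them under the
-- already-materialized tree, then expand until no new rows are produced:
WITH RECURSIVE delta AS (
  SELECT d.id, d.name, d.parent_id,
         t.path || ' > ' || d.name AS path, t.depth + 1 AS depth
  FROM   pgtrickle_changes.changes_departments d  -- Δ rows, not the full table
  JOIN   department_tree t ON t.id = d.parent_id  -- parents already in the ST
  UNION ALL
  SELECT d.id, d.name, d.parent_id,
         delta.path || ' > ' || d.name, delta.depth + 1
  FROM   departments d
  JOIN   delta ON d.parent_id = delta.id          -- descendants of new rows
)
SELECT * FROM delta;   -- the rows to INSERT into department_tree
```

The fixpoint touches only the new subtree; the existing 7 rows of `department_tree` are never recomputed.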
SELECT id, name, depth, path FROM department_tree WHERE name = 'DevOps';
id | name | depth | path
----+--------+-------+--------------------------------
8 | DevOps | 2 | Company > Engineering > DevOps
(1 row)
The recursive CTE automatically expanded to include the new department at the correct depth and path. One inserted row in departments produced one new row in the stream table.
2.4c: UPDATE — A single rename that cascades everywhere
Rename "Engineering" to "R&D":
UPDATE departments SET name = 'R&D' WHERE id = 2;
What happened in the change buffer: The CDC trigger captured the old row (name='Engineering') and the new row (name='R&D'). Both old and new values are stored so the delta can compute what to remove and what to add.
Wait a moment for the scheduler to propagate the rename through all layers. To force it synchronously, wait then refresh each table in topological order (refresh_stream_table does not cascade upstream):
SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_tree');
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
What happened across all three layers:
| Layer | Work done | Result |
|---|---|---|
| `department_tree` | DRed strategy: delete rows derived with the old name, re-derive with the new name. 5 rows updated (Engineering + 4 sub-teams) | Paths now say `Company > R&D > …` |
| `department_stats` | Delta reads 5 changed rows from `department_tree`'s buffer; updates the `full_path` column for those 5 departments | 5 rows updated |
| `department_report` | Division name changed: "Engineering" row replaced by "R&D" row | 1 DELETE + 1 INSERT |
Query to verify the cascade:
SELECT name, path FROM department_tree WHERE path LIKE '%R&D%' ORDER BY depth, name;
name | path
----------+--------------------------
R&D | Company > R&D
Backend | Company > R&D > Backend
DevOps | Company > R&D > DevOps
Frontend | Company > R&D > Frontend
Platform | Company > R&D > Platform
(5 rows)
One UPDATE to a department name flowed through all three layers automatically — updating 5 + 5 + 2 rows across the chain.
2.4d: DELETE — Remove an employee
DELETE FROM employees WHERE name = 'Bob';
What happened: The AFTER DELETE trigger on employees fired, writing a change buffer row with action type D and Bob's old values (department_id=5, salary=115000). The delta query will use these old values to compute the correct aggregate adjustment — it knows to subtract 115000 from Backend's salary sum and decrement the count.
Important — refresh before querying: The background scheduler refreshes all three tables within ~1 second, in topological order. To see the result immediately, wait a moment then explicitly refresh in upstream-first order:
SELECT pg_sleep(2);
SELECT pgtrickle.refresh_stream_table('department_stats');
SELECT pgtrickle.refresh_stream_table('department_report');
Why call `department_stats` first? `department_stats` sources from both `employees` and `department_tree`. Refreshing in topological order ensures each layer processes its upstream changes before computing its own deltas. Even when `department_tree` has unprocessed changes from 2.4c and a new employee change arrives simultaneously, pg_trickle's differential engine handles both correctly — using the pre-change left snapshot (L₀) to avoid double-counting.
Then verify the result:
SELECT department_name, headcount, total_salary, avg_salary
FROM department_stats WHERE department_name = 'Backend';
department_name | headcount | total_salary | avg_salary
-----------------+-----------+--------------+---------------------
Backend | 1 | 120000.00 | 120000.000000000000
(1 row)
Headcount dropped from 2 → 1 and the salary aggregates updated. Again, only the Backend group was recomputed — the other 6 department rows were untouched.
Chapter 3: Scheduling & Backpressure
Automatic Scheduling — Let the DAG Drive Itself
pg_trickle runs a background scheduler that automatically refreshes stale tables in topological order. In the Step 4 examples above, the scheduler handled every change within about a second. You can also call refresh_stream_table() directly when needed (e.g. in scripts or tests), but in normal operation the scheduler takes care of everything.
How schedules propagate
We gave department_report a '1s' schedule and the two upstream tables a NULL schedule (CALCULATED mode). This is the recommended pattern:
department_tree (CALCULATED → inherits 1s from downstream)
│
department_stats (CALCULATED → inherits 1s from downstream)
│
department_report (1s — the only explicit schedule)
CALCULATED mode (pass schedule => 'calculated') means: compute the tightest schedule across all downstream dependents. You declare freshness requirements at the tables your application queries — the system figures out how often each upstream table needs to refresh.
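The propagation rule is easy to model. Below is a conceptual Python sketch (names and structure are illustrative, not pg_trickle's internals): a CALCULATED table inherits the minimum effective interval among its downstream dependents, computed transitively.

```python
def effective_schedule(table, explicit, dependents):
    """Return the effective refresh interval (seconds) for `table`.

    explicit   -- dict: table -> interval in seconds, or None for CALCULATED
    dependents -- dict: table -> list of downstream tables that read from it
    """
    if explicit.get(table) is not None:
        return explicit[table]
    # CALCULATED: inherit the tightest (smallest) schedule downstream.
    downstream = [effective_schedule(d, explicit, dependents)
                  for d in dependents.get(table, [])]
    return min(downstream) if downstream else None

explicit = {"department_tree": None, "department_stats": None,
            "department_report": 1}
dependents = {"department_tree": ["department_stats"],
              "department_stats": ["department_report"]}

print(effective_schedule("department_tree", explicit, dependents))  # 1
```

Because the rule takes a minimum, adding a second dependent with a looser schedule never slows an upstream table down; only a tighter downstream requirement changes it.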
What the scheduler does every second
- Queries the catalog for stream tables past their freshness bound
- Sorts them topologically (upstream first) — `department_tree` refreshes before `department_stats`, which refreshes before `department_report`
- Runs each refresh (respecting `pg_trickle.max_concurrent_refreshes`)
- Updates the last-refresh frontier
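The ordering logic of a scheduler tick can be sketched conceptually in Python using the standard library's `graphlib`; this models only the upstream-first ordering, not the real worker code:

```python
from graphlib import TopologicalSorter

def refresh_order(stale, deps):
    """Order stale stream tables upstream-first.

    stale -- set of table names past their freshness bound
    deps  -- dict: table -> set of upstream tables it reads from
    """
    # Restrict each dependency set to stale tables: fresh upstreams
    # don't need to refresh first.
    ts = TopologicalSorter({t: deps.get(t, set()) & stale for t in stale})
    return list(ts.static_order())

stale = {"department_report", "department_stats", "department_tree"}
deps = {"department_stats": {"department_tree"},
        "department_report": {"department_stats"}}
print(refresh_order(stale, deps))  # department_tree first, department_report last
```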
Monitoring
-- Current status of all stream tables
SELECT name, status, refresh_mode, schedule, data_timestamp, staleness
FROM pgtrickle.pgt_status();
name | status | refresh_mode | schedule | data_timestamp | staleness
-----------------------------+--------+---------------+----------+-----------------------------+-----------------
public.department_tree | ACTIVE | DIFFERENTIAL | | 2026-02-26 10:30:00.123+01 | 00:00:00.877
public.department_stats | ACTIVE | DIFFERENTIAL | | 2026-02-26 10:30:00.456+01 | 00:00:00.544
public.department_report | ACTIVE | DIFFERENTIAL | 1s | 2026-02-26 10:30:00.789+01 | 00:00:00.211
-- Detailed performance stats
SELECT pgt_name, total_refreshes, avg_duration_ms, successful_refreshes
FROM pgtrickle.pg_stat_stream_tables;
-- Health check: quick triage of common issues
SELECT check_name, severity, detail FROM pgtrickle.health_check();
-- Visualize the dependency DAG
SELECT * FROM pgtrickle.dependency_tree();
-- Recent refresh timeline across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(10);
-- Check CDC change buffer sizes (spotting buffer build-up)
SELECT * FROM pgtrickle.change_buffer_sizes();
See SQL_REFERENCE.md for the full list of monitoring functions including list_sources(), trigger_inventory(), and diamond_groups().
Chapter 4: Monitoring In Depth
This chapter expands on the monitoring overview in Chapter 3. For the five most important day-to-day introspection queries, see the Monitoring Quick Reference at the end of this guide.
Optional: WAL-based CDC
By default pg_trickle uses triggers. If wal_level = logical is configured, set:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'auto';
SELECT pg_reload_conf();
pg_trickle will automatically transition each stream table from trigger-based to WAL-based capture after the first successful refresh — reducing per-write overhead from ~2–15 μs (triggers) to near-zero (WAL-based capture adds no synchronous overhead to your DML). The transition is transparent; your queries and the refresh schedule are unaffected.
Optional: Parallel Refresh (v0.4.0+)
By default the scheduler refreshes stream tables sequentially in topological order within a single background worker. This is correct and efficient for most workloads.
For deployments with many independent stream tables, enable parallel refresh:
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4; -- cluster-wide cap
SELECT pg_reload_conf();
Independent stream tables at the same DAG level will then refresh concurrently in separate dynamic background workers. Refresh pairs with IMMEDIATE-trigger connections and atomic consistency groups still run in a single worker for correctness.
Before enabling, ensure max_worker_processes has enough room:
max_worker_processes >= 1 (launcher)
+ number of databases with stream tables
+ max_dynamic_refresh_workers (default 4)
+ autovacuum and other extension workers
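The budget above is just a sum; a hypothetical helper makes the arithmetic explicit (the `other_workers` estimate is an assumption, not a documented value):

```python
def min_worker_processes(n_databases, dynamic_workers=4, other_workers=3):
    """Lower bound for max_worker_processes per the sizing rule above.

    other_workers is a placeholder estimate for autovacuum and
    other extensions' background workers.
    """
    launcher = 1  # the pg_trickle launcher worker
    return launcher + n_databases + dynamic_workers + other_workers

print(min_worker_processes(n_databases=2))  # 1 + 2 + 4 + 3 = 10
```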
Monitor parallel refresh:
SELECT * FROM pgtrickle.worker_pool_status(); -- live worker budget
SELECT * FROM pgtrickle.parallel_job_status(60); -- recent jobs
See CONFIGURATION.md — Parallel Refresh for the complete tuning reference.
Optional: PgBouncer / Connection Pooler Compatibility (v0.10.0+)
If you're connecting through PgBouncer or another connection pooler in transaction mode (the default on Supabase, Railway, Neon, and most managed PostgreSQL platforms), set pooler_compatibility_mode when creating or altering a stream table:
SELECT pgtrickle.create_stream_table(
name => 'live_headcount',
query => 'SELECT department_id, COUNT(*) FROM employees GROUP BY 1',
schedule => '1s',
pooler_compatibility_mode => true
);
This disables prepared statements and NOTIFY emissions for that table — the two features that break in transaction-pool mode. Leave it off (the default) if you connect directly to PostgreSQL.
Optional: Change Buffer Compaction (v0.10.0+)
For high-churn tables, pg_trickle automatically compacts the pending change buffer before each refresh cycle when it exceeds pg_trickle.compact_threshold (default 100,000 rows). INSERT→DELETE pairs that cancel each other out are eliminated, and multiple changes to the same row are collapsed to a single net change, reducing delta scan overhead by 50–90%.
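The net effect of compaction can be illustrated with a small Python sketch; this is a conceptual model of the rules described above, not the extension's actual code:

```python
def compact(changes):
    """Collapse a change buffer to at most one net change per primary key.

    changes -- ordered list of (pk, action, row), action in {'I', 'U', 'D'}.
    """
    first, latest = {}, {}
    for pk, action, row in changes:
        first.setdefault(pk, action)  # remember the earliest action per row
        latest[pk] = (action, row)    # and the latest image
    net = {}
    for pk, (action, row) in latest.items():
        if first[pk] == "I" and action == "D":
            continue                  # INSERT ... DELETE cancels out entirely
        if first[pk] == "I":
            net[pk] = ("I", row)      # row is new overall: net INSERT of latest image
        elif action == "D":
            net[pk] = ("D", None)     # row existed before and is gone: net DELETE
        else:
            net[pk] = ("U", row)      # row existed and still exists: net UPDATE
    return net

buf = [(1, "I", {"x": 1}), (1, "U", {"x": 2}),   # two changes -> one net INSERT
       (2, "I", {"x": 9}), (2, "D", None),       # cancels out, key dropped
       (3, "U", {"x": 5}), (3, "U", {"x": 6})]   # collapses to one net UPDATE
print(compact(buf))
```

Six buffered changes compact to two net changes, which is exactly the 50–90% reduction in delta scan work the paragraph describes.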
Chapter 5: Advanced Topics
Refresh Modes and IVM Strategies
You've now seen how pg_trickle performs incremental view maintenance (IVM). Understanding the four refresh modes and when each strategy applies helps you write efficient stream table queries.
The Four Refresh Modes
| Mode | When it refreshes | Use case |
|---|---|---|
| AUTO (default) | On a schedule (background) | Most use cases — uses DIFFERENTIAL when possible, falls back to FULL automatically |
| DIFFERENTIAL | On a schedule (background) | Like AUTO but errors if the query can't be differentiated |
| FULL | On a schedule (background) | Forces full recompute every cycle |
| IMMEDIATE | Synchronously, in the same transaction as the DML | Real-time dashboards, audit tables — the stream table is always up-to-date |
When you omit refresh_mode, the default is 'AUTO' — it uses differential (delta-only) maintenance when the query supports it, and automatically falls back to full recomputation when it doesn't. You only need to specify a mode explicitly for advanced cases.
IMMEDIATE mode (new in v0.2.0) maintains stream tables synchronously within the same transaction as the base table DML. It uses statement-level AFTER triggers with transition tables — no change buffers, no scheduler. The stream table is always consistent with the current transaction.
-- Create a stream table that updates in real-time
SELECT pgtrickle.create_stream_table(
name => 'live_headcount',
query => $$
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
$$,
refresh_mode => 'IMMEDIATE'
);
-- After any INSERT/UPDATE/DELETE on employees,
-- live_headcount is already up-to-date — no refresh needed!
IMMEDIATE mode supports joins, aggregates, window functions, LATERAL subqueries, and cascading IMMEDIATE stream tables. Recursive CTEs are not supported in IMMEDIATE mode (use DIFFERENTIAL instead).
You can switch between modes at any time:
-- Switch from DIFFERENTIAL to IMMEDIATE
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'IMMEDIATE');
-- Switch back to DIFFERENTIAL with a schedule
SELECT pgtrickle.alter_stream_table('department_stats', refresh_mode => 'DIFFERENTIAL', schedule => '1s');
Algebraic Differentiation (used by department_stats)
For queries composed of scans, filters, joins, and algebraic aggregates (COUNT, SUM, AVG), pg_trickle can derive the IVM delta mathematically. The rules come from the DBSP differential dataflow framework:
| Operator | Delta Rule | Cost |
|---|---|---|
| Scan | Read only change buffer rows (not the full table) | O(changes) |
| Filter (WHERE) | Apply predicate to change rows | O(changes) |
| Join | Join change rows from one side against the full other side | O(changes × lookup) |
| Aggregate (COUNT/SUM/AVG) | Add or subtract deltas per group — no rescan | O(affected groups) |
| Project | Pass through | O(changes) |
The total cost is proportional to the number of changes, not the table size. For a million-row table with 10 changes, the delta query touches ~10 rows.
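To make the aggregate rule concrete, here is a Python sketch of applying deltas to per-group COUNT/SUM state, mirroring the Backend example from the DELETE step earlier (illustrative only; pg_trickle does this in SQL via MERGE):

```python
def apply_delta(groups, delta):
    """Apply change-buffer deltas to per-group COUNT/SUM aggregates.

    groups -- dict: key -> [count, sum]  (AVG is derived as sum/count)
    delta  -- list of (key, sign, value); sign +1 for inserted rows,
              -1 for deleted rows (an UPDATE contributes one of each).
    Only the groups present in the delta are touched.
    """
    for key, sign, value in delta:
        cnt, total = groups.get(key, [0, 0])
        cnt, total = cnt + sign, total + sign * value
        if cnt == 0:
            groups.pop(key, None)  # last member removed: group vanishes
        else:
            groups[key] = [cnt, total]
    return groups

stats = {"Backend": [2, 235000], "DevOps": [3, 300000]}
# Bob (Backend, 115000) deleted: subtract his old values from his group only
apply_delta(stats, [("Backend", -1, 115000)])
print(stats)  # {'Backend': [1, 120000], 'DevOps': [3, 300000]}
```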
Incremental Strategies for Recursive CTEs (used by department_tree)
For recursive CTEs, pg_trickle can't derive an algebraic delta because the recursion references itself. Instead it uses two complementary strategies, chosen automatically based on what changed:
Semi-naive evaluation (for INSERT-only changes):
- Differentiate the base case — find the new seed rows
- Propagate the delta through the recursive term, iterating until no new rows are produced
- The result is only the new rows created by the change — not the whole tree
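Semi-naive evaluation is easiest to see on a transitive-closure example. The Python sketch below (conceptual, on a toy org-chart graph) joins only the *delta* against known facts on each iteration, so work is proportional to the new rows:

```python
def seminaive_insert(closure, new_edges):
    """Semi-naive delta evaluation for a transitive-closure recursion.

    closure   -- existing set of (ancestor, descendant) facts, already closed
    new_edges -- newly inserted base edges (the delta seed)
    Returns only the facts created by the insertion; each iteration joins
    the previous delta against known facts instead of rescanning everything.
    """
    known = set(closure)
    delta = set(new_edges) - known
    derived = set()
    while delta:
        derived |= delta
        known |= delta
        # join the delta with everything known so far, on either side
        step = {(a, d) for (a, b) in delta for (b2, d) in known if b == b2}
        step |= {(a, d) for (a, b) in known for (b2, d) in delta if b == b2}
        delta = step - known
    return derived

closed = {("company", "eng"), ("eng", "backend"), ("company", "backend")}
print(seminaive_insert(closed, {("backend", "alice")}))
# only alice's three new ancestry facts are derived
```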
Delete-and-Rederive (DRed) (for DELETE or UPDATE):
- Remove all rows derived from the old fact
- Re-derive rows that had the old fact as one of their derivation paths (they may still be reachable via other paths)
- Insert the newly derived rows under the new fact
Both strategies are more efficient than full recomputation — they work on the affected portion of the result set, not the entire recursive query. The MERGE only modifies rows that actually changed.
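Delete-and-Rederive can likewise be sketched on a transitive-closure toy example. The over-delete step marks every fact whose path could pass through the deleted edge; the re-derive step restores facts still provable via alternative paths (conceptual Python, not the extension's implementation):

```python
def dred_delete(full_closure, base_edges, removed):
    """Delete-and-Rederive (DRed) for a transitive-closure view.

    full_closure -- materialized (ancestor, descendant) facts
    base_edges   -- base edges, still including `removed`
    removed      -- the (parent, child) base edge being deleted
    """
    u, v = removed
    def reaches(x, y):
        return x == y or (x, y) in full_closure
    # 1. Over-delete: any fact whose path may pass through the deleted edge.
    suspect = {(a, d) for (a, d) in full_closure if reaches(a, u) and reaches(v, d)}
    # 2. Re-derive: restore suspects still provable without the deleted edge.
    base = set(base_edges) - {removed}
    good = (full_closure - suspect) | (suspect & base)
    changed = True
    while changed:
        changed = False
        for a, d in suspect - good:
            # still derivable in one step from surviving facts?
            if any(b == a and (c, d) in good for (b, c) in good):
                good.add((a, d))
                changed = True
    return good

base = {("company", "eng"), ("eng", "backend"), ("company", "backend")}
closure = base  # already transitively closed for this tiny graph
print(dred_delete(closure, base, ("eng", "backend")))
# backend stays reachable via its direct edge from company
```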
When to use which strategy?
You don't choose — pg_trickle detects the strategy automatically based on the query structure:
| Query Pattern | Strategy | Performance |
|---|---|---|
| Scan + Filter + Join + algebraic Aggregate (COUNT/SUM/AVG) | Algebraic | Excellent — O(changes) |
| CORR, COVAR_POP/SAMP, REGR_* (12 functions) | Algebraic (Welford running totals) | O(changes) — running totals updated per changed row, no group rescan (v0.10.0+) |
| Non-recursive CTEs | Algebraic (inlined) | CTE body is differentiated inline |
| MIN / MAX aggregates | Semi-algebraic | Uses LEAST/GREATEST merge; per-group rescan only when an extremum is deleted |
| STRING_AGG, ARRAY_AGG, ordered-set aggregates | Group-rescan | Affected groups fully re-aggregated from source |
| GROUPING SETS / CUBE / ROLLUP | Algebraic (rewritten) | Auto-expanded to UNION ALL of GROUP BY queries; CUBE capped at 64 branches |
| Recursive CTEs (WITH RECURSIVE) — INSERT | Semi-naive evaluation | O(new rows derived from the change) |
| Recursive CTEs (WITH RECURSIVE) — DELETE/UPDATE | Delete-and-Rederive | Re-derives rows with alternative paths; O(affected subgraph) (v0.10.0+) |
| LATERAL subqueries | Correlated re-evaluation | Only outer rows correlated with changed inner data are re-evaluated — O(correlated rows) (v0.10.0+) |
| Window functions | Partition recompute | Only affected partitions recomputed |
| ORDER BY … LIMIT N (TopK) | Scoped recomputation | Re-evaluates top-N via MERGE; stores exactly N rows |
| IMMEDIATE mode queries | In-transaction delta | Same algebraic strategies, applied synchronously via transition tables |
FUSE Circuit Breaker (v0.11.0+)
The fuse is a circuit breaker that stops a stream table from processing an unexpectedly large batch of changes — for example from a runaway script or mass-delete migration — without operator review.
-- Arm a fuse: blow when pending changes exceed 50,000 rows
SELECT pgtrickle.alter_stream_table(
'category_summary',
fuse => 'on',
fuse_ceiling => 50000
);
-- Check fuse status across all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
-- After investigating and deciding to apply the batch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Or skip the oversized batch entirely and resume from current state:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
reset_fuse supports three actions:
- `'apply'` — process all pending changes and resume normal scheduling.
- `'reinitialize'` — drop and repopulate the stream table from scratch.
- `'skip_changes'` — discard pending changes and resume from the current frontier.
A pgtrickle_alert NOTIFY is emitted when the fuse blows, making it easy to
hook into alerting pipelines or LISTEN from application code.
Partitioned Stream Tables (v0.11.0+)
For large stream tables, declare a partition key at creation time so MERGE operations are scoped to only the relevant partitions:
SELECT pgtrickle.create_stream_table(
name => 'sales_by_month',
query => $$
SELECT
DATE_TRUNC('month', sale_date) AS month,
product_id,
SUM(amount) AS total_sales
FROM sales
GROUP BY 1, 2
$$,
schedule => '1m',
partition_by => 'month' -- partition key must be in the SELECT output
);
pg_trickle creates the storage table as PARTITION BY RANGE (month) with a
catch-all partition, then on each refresh:
- Inspects the delta to find the `MIN` and `MAX` of the partition key.
- Injects `AND st.month BETWEEN min AND max` into the MERGE ON clause.
- PostgreSQL prunes all partitions outside the range — giving ~100× I/O reduction for a 0.1% change rate on a 10M-row table.
See SQL_REFERENCE.md for full partitioning options.
Multi-Tenant Worker Quotas (v0.11.0+)
In deployments with multiple databases, one busy database can starve others
if all dynamic refresh workers are claimed. The per_database_worker_quota
GUC prevents this:
-- Limit one performance-critical database to 4 workers (with burst to 6)
ALTER DATABASE analytics SET pg_trickle.per_database_worker_quota = 4;
-- Allow a reporting database only 2 base workers
ALTER DATABASE reporting SET pg_trickle.per_database_worker_quota = 2;
-- Apply changes
SELECT pg_reload_conf();
When the cluster has spare capacity (active workers < 80% of
max_dynamic_refresh_workers), a database may temporarily burst to 150% of
its quota. Burst is reclaimed within 1 scheduler cycle once load rises.
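The burst rule reduces to a couple of comparisons; a conceptual sketch (the 80% and 150% figures come from the text above, the function itself is hypothetical):

```python
def allowed_workers(quota, active_total, max_workers):
    """Effective worker cap for one database under per-database quotas.

    When cluster-wide active workers are below 80% of
    max_dynamic_refresh_workers, a database may burst to 150% of its
    quota (rounded down); otherwise the base quota applies.
    """
    spare_capacity = active_total < 0.8 * max_workers
    return int(quota * 1.5) if spare_capacity else quota

print(allowed_workers(quota=4, active_total=2, max_workers=8))  # 6 (burst)
print(allowed_workers(quota=4, active_total=7, max_workers=8))  # 4 (no burst)
```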
Within each dispatch tick, IMMEDIATE-trigger closures are always dispatched
first, followed by atomic groups, singletons, and cyclic SCCs.
See CONFIGURATION.md for full quota tuning options.
Clean Up
When you're done experimenting, drop the stream tables. Drop dependents before their sources:
SELECT pgtrickle.drop_stream_table('department_report');
SELECT pgtrickle.drop_stream_table('department_stats');
SELECT pgtrickle.drop_stream_table('department_tree');
DROP TABLE employees;
DROP TABLE departments;
drop_stream_table atomically removes in a single transaction:
- The storage table (e.g., `public.department_stats`)
- CDC triggers on source tables (removed only if no other stream table references the same source)
- Change buffer tables in `pgtrickle_changes`
- Catalog entries in `pgtrickle.pgt_stream_tables`
Monitoring Quick Reference
pg_trickle ships several built-in monitoring functions and a ready-made Prometheus/Grafana stack. Here are the five most useful functions for day-to-day operations.
Stream Table Status
-- Overview of all stream tables: status, staleness, last refresh time, errors
SELECT name, status, staleness, last_refresh_at, last_error
FROM pgtrickle.pgt_status();
Health Check
-- Run all built-in health checks; returns severity (OK/WARNING/CRITICAL) per check
SELECT check_name, severity, detail FROM pgtrickle.health_check();
Change Buffer Sizes
-- Show CDC buffer row counts per source table — useful for spotting backlogs
SELECT * FROM pgtrickle.change_buffer_sizes();
Dependency Tree
-- Visualize the DAG: which stream tables depend on what
SELECT * FROM pgtrickle.dependency_tree();
Fuse Status
-- Check circuit breaker state for all stream tables (v0.11.0+)
SELECT * FROM pgtrickle.fuse_status();
Prometheus & Grafana
For production monitoring, pg_trickle ships a ready-made observability stack
in the monitoring/ directory:
cd monitoring && docker compose up
This starts PostgreSQL + postgres_exporter + Prometheus + Grafana with
pre-configured dashboards and alerting rules. Grafana is available at
http://localhost:3000 (admin/admin). See monitoring/README.md
for the full list of exported metrics and alert conditions.
Key Prometheus metrics:
| Metric | Description |
|---|---|
| `pgtrickle_refresh_total` | Cumulative refresh count per table |
| `pgtrickle_refresh_duration_seconds` | Last refresh duration per table |
| `pgtrickle_staleness_seconds` | Seconds since last successful refresh |
| `pgtrickle_consecutive_errors` | Current error streak per table |
| `pgtrickle_cdc_buffer_rows` | Pending change buffer rows per source table |
Pre-configured alerts: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration.
Summary: What You Learned
| Concept | What you saw |
|---|---|
| Stream tables | Tables defined by a SQL query that stay automatically up to date |
| CDC triggers | Lightweight change capture in the same transaction — no logical replication or polling required |
| DAG scheduling | Stream tables can depend on other stream tables; refreshes run in topological order, schedules propagate upstream via CALCULATED mode |
| Algebraic IVM | Delta queries that process only changed rows — O(changes) regardless of table size |
| Semi-naive / DRed | Incremental strategies for WITH RECURSIVE — INSERT uses semi-naive, DELETE/UPDATE uses Delete-and-Rederive (v0.10.0+) |
| IMMEDIATE mode | Synchronous in-transaction IVM — stream tables updated within the same transaction as your DML, always consistent |
| TopK | ORDER BY … LIMIT N queries store exactly N rows, refreshed via scoped recomputation |
| Diamond consistency | Atomic refresh groups for diamond-shaped dependency graphs via diamond_consistency = 'atomic' |
| Downstream propagation | A single base table write cascades through an entire chain of stream tables, automatically, in the right order |
| Trigger-based CDC | Lightweight row-level triggers by default (no WAL configuration needed); optional transition to WAL-based capture via pg_trickle.cdc_mode = 'auto' |
| Parallel refresh | Independent stream tables refresh concurrently in dynamic background workers via pg_trickle.parallel_refresh_mode = 'on' (v0.4.0+, default off) |
| auto_backoff | Scheduler automatically stretches effective interval when refresh cost exceeds 95% of the schedule window, capped at 8× (on by default, v0.10.0+) |
| PgBouncer compatibility | Set pooler_compatibility_mode => true per stream table to work behind transaction-mode connection poolers (v0.10.0+) |
| Monitoring | pgt_status(), health_check(), dependency_tree(), pg_stat_stream_tables, and more for freshness, timing, and error history |
The key takeaway: you write to base tables — pg_trickle does the rest. Data flows downstream automatically, each layer doing the minimum work proportional to what changed, in dependency order.
Troubleshooting
Stream table is stale / not refreshing
Check the status view first:
SELECT name, status, last_error, last_refresh_at, staleness FROM pgtrickle.pgt_status();
A status of ERROR means the last refresh failed. last_error contains the message. Fix the underlying issue (e.g., a dropped column referenced in the query) then call:
SELECT pgtrickle.refresh_stream_table('your_table');
For a broader health check:
SELECT check_name, severity, detail FROM pgtrickle.health_check();
Change buffer growing large
If a stream table has status = 'PAUSED' or refreshes are falling behind:
SELECT * FROM pgtrickle.change_buffer_sizes(); -- find large buffers
Large buffers are normal under heavy load — auto_backoff slows the schedule to avoid CPU runaway and will self-correct once throughput stabilises. If a buffer stays large indefinitely, check last_error in pgt_status() for a blocked refresh.
CDC triggers missing after restore / point-in-time recovery
PITR restores the heap table but not the triggers if the extension was installed after the base backup. Verify:
SELECT * FROM pgtrickle.trigger_inventory(); -- expected vs installed triggers
Any missing trigger can be reinstalled with:
SELECT pgtrickle.repair_stream_table('your_table');
Deployment Best Practices
Once you've built your stream tables interactively, you'll want to deploy them reliably — via SQL migration scripts, dbt, or GitOps pipelines.
Kubernetes Deployment (CloudNativePG)
pg_trickle integrates natively with CloudNativePG
using Image Volume Extensions (Kubernetes 1.33+). The extension is packaged
as a scratch-based OCI image containing only the .so, .control, and .sql
files — no custom PostgreSQL image required.
Prerequisites
- Kubernetes 1.33+ with the `ImageVolume` feature gate enabled
- CloudNativePG operator 1.28+
- pg_trickle extension image pushed to your cluster registry
Quick Start
1. Deploy the Cluster with the extension mounted as an Image Volume:
# cnpg/cluster-example.yaml (abridged)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-demo
spec:
instances: 3
imageName: ghcr.io/cloudnative-pg/postgresql:18
postgresql:
shared_preload_libraries:
- pg_trickle
extensions:
- name: pg-trickle
image:
reference: ghcr.io/<owner>/pg_trickle-ext:<version>
parameters:
max_worker_processes: "8"
2. Create the extension declaratively with a CNPG Database resource:
# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: pg-trickle-app
spec:
name: app
owner: app
cluster:
name: pg-trickle-demo
extensions:
- name: pg_trickle
3. Apply both resources:
kubectl apply -f cnpg/cluster-example.yaml
kubectl apply -f cnpg/database-example.yaml
Full example manifests are in the cnpg/ directory.
Health Monitoring
CNPG manages PostgreSQL liveness/readiness probes via its instance manager. For pg_trickle-specific health, use the built-in health check function:
-- Run against the primary or any replica:
SELECT * FROM pgtrickle.health_check();
This returns rows for scheduler status, error/suspended tables, stale tables, CDC buffer growth, WAL slot lag, and worker pool utilization. Integrate it into your monitoring stack:
- Prometheus: Use the CNPG monitoring integration to expose `pgtrickle.health_check()` results as custom metrics
- Kubernetes CronJob: Schedule periodic health checks and alert via your existing alerting pipeline
- pgtrickle-tui: The TUI tool has a dedicated Health view that polls `health_check()` continuously
Probe Configuration
The example manifests include probe settings tuned for pg_trickle workloads:
probes:
startup:
periodSeconds: 10
failureThreshold: 60 # 10 min for shared_preload_libraries init
liveness:
periodSeconds: 10
failureThreshold: 6 # 60s before restart
readiness:
type: streaming
maximumLag: 64Mi # replicas must be streaming before serving reads
Why readiness: streaming? Stream tables are readable on replicas, but
a lagging replica serves stale stream table data. The maximumLag setting
ensures replicas are caught up before receiving traffic.
Failover Behavior
When the primary pod fails and CNPG promotes a replica:
- Scheduler: The new primary starts the pg_trickle scheduler background worker automatically (registered via `shared_preload_libraries`)
- Stream tables: All stream table definitions are stored in the `pgtrickle.pgt_stream_tables` catalog table, which is replicated to all replicas. The promoted replica has the complete catalog.
- CDC triggers: Trigger definitions are replicated as part of the WAL stream. The new primary's triggers fire normally on new writes.
- Change buffers: Uncommitted change buffer rows from in-flight transactions on the old primary are lost (standard PostgreSQL behavior). The next refresh cycle detects the gap and performs a FULL refresh to resynchronize.
- Refresh frontiers: Each stream table's last-refresh frontier is stored in the catalog. If the frontier is ahead of the available change buffer data (due to WAL replay lag), the scheduler falls back to FULL refresh once and then resumes DIFFERENTIAL.
No manual intervention is required after failover.
Idempotent SQL Migrations
Use create_or_replace_stream_table() in your migration scripts. It's safe to
run on every deploy:
-- migrations/V003__stream_tables.sql
-- Creates if absent, updates if definition changed, no-op if identical.
SELECT pgtrickle.create_or_replace_stream_table(
name => 'employee_salaries',
query => 'SELECT e.id, e.name, d.name AS department, e.salary
FROM employees e JOIN departments d ON e.department_id = d.id',
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
SELECT pgtrickle.create_or_replace_stream_table(
name => 'department_stats',
query => 'SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employee_salaries GROUP BY department',
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
If someone changes the query in a later migration, create_or_replace detects
the difference and migrates the storage table in place — no need to drop and
recreate.
dbt Integration
With the dbt-pgtrickle
package, stream tables are just dbt models with materialized='stream_table':
-- models/department_stats.sql
{{ config(
materialized='stream_table',
schedule='30s',
refresh_mode='DIFFERENTIAL'
) }}
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM {{ ref('employee_salaries') }}
GROUP BY department
Every dbt run calls create_or_replace_stream_table() under the hood,
so deployments are always idempotent.
Day 2 Operations
Added in v0.20.0 (UX-4).
Once your stream tables are running in production, pg_trickle can monitor itself using its own stream tables — a technique called dog-feeding.
Enabling Dog-Feeding
-- Create all five monitoring stream tables (idempotent, safe to repeat).
SELECT pgtrickle.setup_dog_feeding();
-- Check what was created.
SELECT * FROM pgtrickle.dog_feeding_status();
This creates five stream tables in the pgtrickle schema:
| Stream Table | Purpose |
|---|---|
| `df_efficiency_rolling` | Rolling-window refresh statistics (replaces manual `refresh_efficiency()` calls) |
| `df_anomaly_signals` | Detects duration spikes, error bursts, mode oscillation |
| `df_threshold_advice` | Recommends threshold adjustments based on multi-cycle analysis |
| `df_cdc_buffer_trends` | Tracks CDC buffer growth rates per source table |
| `df_scheduling_interference` | Detects concurrent refresh overlap patterns |
Checking Recommendations
After at least 10–20 refresh cycles have accumulated:
-- Which stream tables have poorly calibrated thresholds?
SELECT pgt_name, current_threshold, recommended_threshold, confidence, reason
FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM')
AND abs(recommended_threshold - current_threshold) > 0.05;
-- Are any stream tables experiencing anomalies?
SELECT pgt_name, duration_anomaly, recent_failures
FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL OR recent_failures >= 2;
Automatic Threshold Tuning
To let pg_trickle automatically apply threshold recommendations:
SET pg_trickle.dog_feeding_auto_apply = 'threshold_only';
This applies changes only when confidence is HIGH and the recommended threshold
differs by more than 5%. Changes are rate-limited to once per 10 minutes per
stream table and logged with initiated_by = 'DOG_FEED'.
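The apply guard combines three conditions: confidence, magnitude, and a rate limit. A conceptual Python sketch (function and parameter names are illustrative, not the extension's API):

```python
import time

def should_auto_apply(confidence, current, recommended, last_applied_at, now=None):
    """Guard conditions for dog-feeding auto-apply, as described above.

    Applies only HIGH-confidence recommendations whose threshold differs
    by more than 0.05, at most once per 10 minutes per stream table.
    """
    now = time.time() if now is None else now
    return (confidence == "HIGH"
            and abs(recommended - current) > 0.05
            and now - last_applied_at >= 600)

print(should_auto_apply("HIGH", 0.30, 0.40, last_applied_at=0, now=601))    # True
print(should_auto_apply("MEDIUM", 0.30, 0.40, last_applied_at=0, now=601))  # False
```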
Visualizing the DAG
-- See the full refresh graph (Mermaid format, paste into any Mermaid renderer).
SELECT pgtrickle.explain_dag();
Dog-feeding STs appear in green, user STs in blue, suspended in red.
Disabling Dog-Feeding
SELECT pgtrickle.teardown_dog_feeding();
This drops all monitoring stream tables. User stream tables are never affected. The control plane continues operating identically without dog-feeding.
What's Next?
- TUI.md — Terminal UI & CLI tool for managing and monitoring stream tables from outside SQL
- SQL_REFERENCE.md — Full API reference for all functions, views, and configuration
- ARCHITECTURE.md — Deep dive into the system architecture and data flow
- DVM_OPERATORS.md — How each SQL operator is differentiated for incremental maintenance
- CONFIGURATION.md — GUC variables for tuning schedule, concurrency, and cleanup behavior
- Flyway & Liquibase Integration — Migration patterns for Flyway and Liquibase
- ORM Integration — SQLAlchemy and Django ORM patterns for stream tables
- What Happens on INSERT — Detailed trace of a single INSERT through the entire pipeline
- What Happens on UPDATE — How UPDATEs are split into D+I, group key changes, and net-effect computation
- What Happens on DELETE — Reference counting, group deletion, and INSERT+DELETE cancellation
- What Happens on TRUNCATE — Why TRUNCATE bypasses triggers and how to recover
- dbt Getting Started example — Everything above, expressed as dbt models and seeds with a one-command Docker runner
Playground
The quickest way to explore pg_trickle is the playground — a pre-configured Docker environment with sample data and stream tables ready to query. No installation, no configuration. One command and you're running.
Quick Start
git clone https://github.com/grove/pg-trickle.git
cd pg-trickle/playground
docker compose up -d
Then connect:
psql postgresql://postgres:playground@localhost:5432/playground
PostgreSQL 18+ note: The Docker image stores data in a versioned subdirectory (`/var/lib/postgresql/18/main`). The compose file mounts `/var/lib/postgresql` (not `.../data`) — this is intentional.
What's Pre-Loaded
The seed script creates three base tables and five stream tables that cover the most common pg_trickle patterns.
Base Tables
| Table | Description |
|---|---|
| `products` | Product catalog with categories and prices |
| `orders` | Order line items with quantities and timestamps |
| `customers` | Customer profiles with regions |
Stream Tables
| Stream Table | Query | Pattern demonstrated |
|---|---|---|
| `sales_by_region` | SUM(total) grouped by region | Basic aggregate, DIFFERENTIAL mode |
| `top_products` | SUM(quantity) ranked by category | Window function (RANK()) |
| `customer_lifetime_value` | Revenue + order count per customer | Multi-table join + aggregates |
| `daily_revenue` | Revenue per day | Time-series aggregation |
| `active_products` | Products with orders | EXISTS subquery |
Exercises
1. Watch an INSERT propagate
-- Current state
SELECT * FROM sales_by_region ORDER BY region;
-- Insert a new order
INSERT INTO orders (customer_id, product_id, quantity, order_date)
VALUES (1, 1, 10, CURRENT_DATE);
-- After ~1 s the stream table refreshes
SELECT * FROM sales_by_region ORDER BY region;
2. Inspect pg_trickle internals
-- Overall health
SELECT * FROM pgtrickle.health_check();
-- Status of all stream tables
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.pgt_status()
ORDER BY name;
-- Recent refresh activity
SELECT start_time, stream_table, action, status, duration_ms
FROM pgtrickle.refresh_timeline(10);
-- Delta SQL for a stream table
SELECT pgtrickle.explain_st('sales_by_region');
-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes();
3. Update and Delete
-- Update a product price
UPDATE products SET price = 99.99 WHERE name = 'Widget';
-- customer_lifetime_value re-calculates
SELECT * FROM customer_lifetime_value ORDER BY total_revenue DESC LIMIT 5;
-- Delete a customer's orders
DELETE FROM orders WHERE customer_id = 3;
-- Stream tables reflect the removal
SELECT * FROM sales_by_region ORDER BY region;
4. Create your own stream table
SELECT pgtrickle.create_stream_table(
name => 'my_experiment',
query => $$
SELECT p.category,
COUNT(DISTINCT o.customer_id) AS unique_buyers,
SUM(o.quantity) AS total_units
FROM orders o
JOIN products p ON p.id = o.product_id
GROUP BY p.category
HAVING SUM(o.quantity) > 5
$$,
schedule => '2s'
);
SELECT * FROM my_experiment;
Tear Down
docker compose down -v
The -v flag removes the data volume. Omit it if you want to keep your changes.
Next Steps
- Getting Started Guide — full tutorial with an org-chart example
- SQL Reference — all functions and parameters
- Best-Practice Patterns — production-ready patterns
Best-Practice Patterns for pg_trickle
This guide covers common data modeling patterns and recommended configurations for pg_trickle stream tables. Each pattern includes worked SQL examples, anti-patterns to avoid, and refresh mode recommendations.
Version: v0.14.0+. Some features require recent versions — check SQL_REFERENCE.md for per-feature availability.
Table of Contents
- Pattern 1: Bronze / Silver / Gold Materialization
- Pattern 2: Event Sourcing with Stream Tables
- Pattern 3: Slowly Changing Dimensions (SCD)
- Pattern 4: High-Fan-Out Topology
- Pattern 5: Real-Time Dashboards
- Pattern 6: Tiered Refresh Strategy
- General Guidelines
Pattern 1: Bronze / Silver / Gold Materialization
A multi-layer approach where raw data flows through progressively refined stream tables, similar to a medallion architecture.
Architecture
[raw_events] ← Bronze: raw ingest table (regular table)
↓
[events_cleaned] ← Silver: filtered, deduplicated, typed
↓
[events_aggregated] ← Gold: business-level aggregates
SQL Example
-- Bronze: regular PostgreSQL table (source of truth)
CREATE TABLE raw_events (
event_id BIGSERIAL PRIMARY KEY,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Silver: cleaned and deduplicated events
SELECT pgtrickle.create_stream_table(
'events_cleaned',
$$SELECT DISTINCT ON (event_id)
event_id,
user_id,
event_type,
(payload->>'amount')::numeric AS amount,
received_at
FROM raw_events
WHERE event_type IN ('purchase', 'refund', 'subscription')$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
-- Gold: per-user purchase summary
SELECT pgtrickle.create_stream_table(
'user_purchase_summary',
$$SELECT user_id,
COUNT(*) AS total_purchases,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM events_cleaned
WHERE event_type = 'purchase'
GROUP BY user_id$$,
schedule => 'calculated',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Layer | Refresh Mode | Schedule | Tier |
|---|---|---|---|
| Silver | DIFFERENTIAL | 5s – 30s | hot |
| Gold | DIFFERENTIAL | calculated | hot |
Anti-Patterns
- Don't use FULL refresh for Silver. With frequent small inserts, DIFFERENTIAL is 10–100x faster.
- Don't skip the Silver layer. Joining raw tables directly in Gold queries produces wider joins and slower deltas.
- Don't use IMMEDIATE mode for Gold. Aggregate maintenance on every DML row is expensive — batched DIFFERENTIAL is more efficient.
Pattern 2: Event Sourcing with Stream Tables
Use stream tables as projections of an append-only event log. The source table is the event store; stream tables materialize different read models.
SQL Example
-- Event store (append-only source)
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
aggregate_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Projection 1: Current state per aggregate
SELECT pgtrickle.create_stream_table(
'aggregate_state',
$$SELECT DISTINCT ON (aggregate_id)
aggregate_id,
event_type AS last_event,
payload AS current_state,
created_at AS last_updated
FROM events
ORDER BY aggregate_id, created_at DESC$$,
schedule => '2s',
refresh_mode => 'DIFFERENTIAL'
);
-- Projection 2: Event counts by type per hour
SELECT pgtrickle.create_stream_table(
'hourly_event_counts',
$$SELECT date_trunc('hour', created_at) AS hour,
event_type,
COUNT(*) AS event_count
FROM events
GROUP BY 1, 2$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Projection | Refresh Mode | Why |
|---|---|---|
| Current state | DIFFERENTIAL | Small delta per cycle; DISTINCT ON supported |
| Hourly counts | DIFFERENTIAL | Algebraic aggregate (COUNT), efficient delta |
| String aggregations | AUTO | GROUP_RESCAN aggs may benefit from FULL |
Anti-Patterns
- Don't DELETE from the event store. pg_trickle tracks changes via triggers; mixing append and delete on the source creates unnecessary delta complexity. Archive old events to a separate table.
- Don't use `append_only => true` with UPDATE/DELETE patterns. The `append_only` flag skips DELETE tracking in the change buffer — only use it when the source truly never updates or deletes.
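The "archive, don't delete" advice above can be sketched as a periodic bulk move. Caveat: the bulk DELETE still lands in the change buffer, so run it during quiet periods (or follow it with a manual FULL refresh). The `events_archive` table here is a hypothetical name:

```sql
-- Hypothetical archive table mirroring the event store's structure.
CREATE TABLE IF NOT EXISTS events_archive (LIKE events INCLUDING ALL);

-- Move events older than 90 days in one statement. The DELETE is still
-- tracked by pg_trickle, so schedule this during low-traffic windows.
WITH moved AS (
    DELETE FROM events
    WHERE created_at < now() - interval '90 days'
    RETURNING *
)
INSERT INTO events_archive SELECT * FROM moved;
```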
Pattern 3: Slowly Changing Dimensions (SCD)
SCD Type 1: Overwrite
The stream table always reflects the current state. Source updates overwrite previous values.
-- Source: customer dimension table (updated in place)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT,
tier TEXT DEFAULT 'standard',
updated_at TIMESTAMPTZ DEFAULT now()
);
-- SCD-1: current customer state enriched with order stats
SELECT pgtrickle.create_stream_table(
'customer_360',
$$SELECT c.customer_id,
c.name,
c.email,
c.tier,
COUNT(o.id) AS total_orders,
COALESCE(SUM(o.amount), 0) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.email, c.tier$$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
SCD Type 2: History Tracking
For SCD-2, maintain a history table with valid-from/valid-to ranges. The stream table provides the current snapshot.
-- Source: customer history with validity ranges
CREATE TABLE customer_history (
customer_id INT NOT NULL,
name TEXT NOT NULL,
tier TEXT NOT NULL,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ, -- NULL = current
PRIMARY KEY (customer_id, valid_from)
);
-- Current active records only
SELECT pgtrickle.create_stream_table(
'customers_current',
$$SELECT customer_id, name, tier, valid_from
FROM customer_history
WHERE valid_to IS NULL$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
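The SCD-2 write path itself is plain DML against the history table above; a minimal sketch (the customer values are illustrative):

```sql
-- Record a tier change for customer 42 as an SCD-2 transition.
-- Both statements run in one transaction, so customers_current never
-- observes zero or two "current" rows for the customer after a refresh.
BEGIN;

-- Close the currently valid row.
UPDATE customer_history
SET valid_to = now()
WHERE customer_id = 42 AND valid_to IS NULL;

-- Open the new current row.
INSERT INTO customer_history (customer_id, name, tier, valid_from)
VALUES (42, 'Ada Lovelace', 'premium', now());

COMMIT;
```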
Anti-Patterns
- Don't use FULL refresh for SCD-1 with large dimension tables. Customer tables with millions of rows but few changes per cycle are ideal for DIFFERENTIAL.
- Don't forget to index `valid_to IS NULL` for SCD-2 sources. Without it, the delta scan touches all historical rows.
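For the `customer_history` table above, such a partial index might look like:

```sql
-- Partial index: only current rows (valid_to IS NULL) are indexed,
-- so the delta scan skips closed history rows entirely.
CREATE INDEX idx_customer_history_current
    ON customer_history (customer_id)
    WHERE valid_to IS NULL;
```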
Pattern 4: High-Fan-Out Topology
When a single source table feeds many downstream stream tables.
Architecture
[orders]
↙ ↓ ↓ ↘
[daily_totals] [by_region] [by_product] [top_customers]
SQL Example
-- Single source feeding multiple views
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
region TEXT NOT NULL,
product_id INT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
-- Fan-out: 4 stream tables on 1 source
SELECT pgtrickle.create_stream_table('daily_totals',
'SELECT order_date, SUM(amount) AS daily_total, COUNT(*) AS order_count
FROM orders GROUP BY order_date',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_region',
'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY region',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_product',
'SELECT product_id, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY product_id',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('top_customers',
'SELECT customer_id, SUM(amount) AS lifetime_value, COUNT(*) AS order_count
FROM orders GROUP BY customer_id',
schedule => '10s', refresh_mode => 'DIFFERENTIAL');
Recommended Configuration
- All fan-out targets share the same source change buffer — CDC overhead is paid once regardless of how many stream tables read from `orders`.
- Use `schedule => 'calculated'` on downstream STs when they chain from other stream tables.
- Consider raising `pg_trickle.max_workers` if fan-out exceeds 8 (default: 4 workers).
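A chained downstream ST using the calculated schedule might look like this (`weekly_totals` is an illustrative name, reading from the `daily_totals` ST defined above):

```sql
-- Sketch: a second-level ST chained off daily_totals. With
-- schedule => 'calculated' it inherits its cadence from the scheduler's
-- dependency analysis instead of polling on its own timer.
SELECT pgtrickle.create_stream_table('weekly_totals',
    'SELECT date_trunc(''week'', order_date) AS week,
            SUM(daily_total) AS weekly_total
     FROM daily_totals GROUP BY 1',
    schedule => 'calculated', refresh_mode => 'DIFFERENTIAL');
```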
Anti-Patterns
- Don't use IMMEDIATE mode on high-fan-out sources. Each DML row triggers N refreshes (one per downstream ST). Use DIFFERENTIAL with a batched schedule instead.
- Don't set different schedules on STs that should be consistent. If `daily_totals` and `by_region` must agree, give them the same schedule or use `diamond_consistency => 'atomic'`.
Pattern 5: Real-Time Dashboards
For dashboards that need sub-second refresh latency.
SQL Example
-- Live order monitor (sub-second freshness)
SELECT pgtrickle.create_stream_table(
'order_monitor',
$$SELECT
date_trunc('minute', order_date) AS minute,
region,
COUNT(*) AS orders,
SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE
GROUP BY 1, 2$$,
schedule => '1s',
refresh_mode => 'DIFFERENTIAL'
);
-- For truly real-time needs, use IMMEDIATE mode (triggers on each DML)
SELECT pgtrickle.create_stream_table(
'live_counter',
$$SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
FROM orders GROUP BY region$$,
schedule => 'IMMEDIATE',
refresh_mode => 'DIFFERENTIAL'
);
When to Use IMMEDIATE vs Scheduled DIFFERENTIAL
| Scenario | Mode | Why |
|---|---|---|
| Dashboard polls every 1s | 1s | Batched delta amortizes overhead |
| GraphQL subscription, < 100ms | IMMEDIATE | Triggers fire synchronously per DML |
| Aggregate with GROUP_RESCAN | 5s+ | Avoid per-row full rescans |
| High write throughput (>1K/s) | 2s–5s | IMMEDIATE adds latency to each INSERT |
Anti-Patterns
- Don't use IMMEDIATE for complex joins. Each INSERT/UPDATE/DELETE fires the full DVM delta SQL synchronously — multi-table joins in IMMEDIATE mode add significant latency to writes.
- Don't forget `pooler_compatibility_mode` with PgBouncer. Transaction pooling drops temp tables between transactions; enable this flag to avoid stale PREPARE statements.
Pattern 6: Tiered Refresh Strategy
Assign refresh importance tiers to control scheduling priority.
-- Hot: real-time operational dashboard
SELECT pgtrickle.create_stream_table('live_metrics', ...);
SELECT pgtrickle.alter_stream_table('live_metrics', tier => 'hot');
-- Warm: hourly business reports (2x interval multiplier)
SELECT pgtrickle.create_stream_table('hourly_report', ...,
schedule => '1m');
SELECT pgtrickle.alter_stream_table('hourly_report', tier => 'warm');
-- Cold: daily analytics (10x interval multiplier)
SELECT pgtrickle.create_stream_table('daily_analytics', ...,
schedule => '5m');
SELECT pgtrickle.alter_stream_table('daily_analytics', tier => 'cold');
-- Frozen: archive/audit (skip refresh entirely)
SELECT pgtrickle.alter_stream_table('audit_log_summary', tier => 'frozen');
Tier Multipliers
| Tier | Schedule Multiplier | Use Case |
|---|---|---|
| hot | 1x | Operational dashboards, alerts |
| warm | 2x | Hourly reports, batch pipelines |
| cold | 10x | Daily analytics, low-priority STs |
| frozen | skip | Paused/archived, manual refresh |
General Guidelines
Choosing a Refresh Mode
| Scenario | Recommended Mode |
|---|---|
| Source has < 5% change ratio per cycle | DIFFERENTIAL |
| Source changes > 50% per cycle | FULL |
| Query is a simple filter/projection | DIFFERENTIAL |
| Query has GROUP_RESCAN aggregates (MIN, MAX) | AUTO |
| Query joins 4+ tables | DIFFERENTIAL |
| Target table < 1000 rows | FULL |
| Need per-row latency guarantee | IMMEDIATE |
Use pgtrickle.recommend_refresh_mode() (v0.14.0+) for automated
analysis:
SELECT pgt_name, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
Monitoring Checklist
-- Check refresh efficiency across all stream tables
SELECT pgt_name, refresh_mode, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
-- Find stream tables that might benefit from mode change
SELECT pgt_name, current_mode, recommended_mode, reason
FROM pgtrickle.recommend_refresh_mode()
WHERE recommended_mode != 'KEEP';
-- Check for error states
SELECT pgt_name, status, last_error_message
FROM pgtrickle.stream_tables_info
WHERE status IN ('ERROR', 'SUSPENDED');
-- Export definitions for backup
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
Common Mistakes
- Using FULL refresh by default. Start with DIFFERENTIAL — it's correct for 80%+ of workloads. Switch to FULL only when `recommend_refresh_mode()` suggests it.
- Over-scheduling. A 1-second schedule on a table with 1-hour change cycles wastes CPU. Match the schedule to the actual data arrival rate.
- Ignoring `append_only`. If the source table is truly append-only (no UPDATEs, no DELETEs), set `append_only => true` to halve change buffer writes.
- Not using the `calculated` schedule for chained STs. When ST-B reads from ST-A, use `schedule => 'calculated'` on ST-B to avoid unnecessary refreshes. The scheduler automatically propagates ST-A changes downstream.
- Mixing IMMEDIATE and complex joins. IMMEDIATE mode fires delta SQL on every DML — an 8-table join in IMMEDIATE mode adds 50–200ms to each INSERT. Use scheduled DIFFERENTIAL for complex queries.
Pre-Deployment Checklist
Complete this checklist before deploying pg_trickle to a new environment. Each item links to the relevant documentation for details.
Version: v0.14.0+. Earlier versions may have different requirements.
1. PostgreSQL Version
- PostgreSQL 18.x is required (pg_trickle is compiled against PG 18)
- Extension binary matches your exact PostgreSQL major version
SELECT version(); -- Must show PostgreSQL 18.x
2. shared_preload_libraries
pg_trickle must be loaded at server startup via shared_preload_libraries.
Without this, GUC variables and the background scheduler are not available.
# postgresql.conf
shared_preload_libraries = 'pg_trickle'
- `shared_preload_libraries` includes `pg_trickle`
- PostgreSQL has been restarted after changing this setting (a reload is not sufficient)
SHOW shared_preload_libraries; -- Must include pg_trickle
Managed PostgreSQL: Some providers (Supabase, Neon) do not support custom
shared_preload_libraries. Check your provider's extension compatibility list. AWS RDS and Google Cloud SQL support custom shared libraries via parameter groups.
3. WAL Configuration (Optional but Recommended)
pg_trickle works without wal_level = logical — it uses trigger-based
CDC by default. However, WAL-based CDC provides lower overhead on
write-heavy workloads.
# postgresql.conf (optional — for WAL-based CDC)
wal_level = logical
max_replication_slots = 10 # At least 1 per tracked source table
- Decide: trigger-based CDC (default) or WAL-based CDC
- If WAL: `wal_level = logical` is set and the server has been restarted
- If WAL: `max_replication_slots` is sufficient for your source table count
Note: CDC mode is configurable per stream table. The default `cdc_mode = 'auto'` starts with triggers and transitions to WAL automatically when `wal_level = logical` is detected. See CONFIGURATION.md for details.
4. Extension Installation
CREATE EXTENSION pg_trickle;
-- Verify installation
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
- Extension created successfully
- Version matches expected release
5. Background Scheduler
The scheduler runs as a background worker and manages automatic refresh. Verify it's running:
SELECT pid, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
- Scheduler process is visible in `pg_stat_activity`
- `pg_trickle.enabled = true` (default; set to `false` to disable)
6. Connection Pooler Compatibility
PgBouncer (Transaction Mode)
PgBouncer in transaction pooling mode drops session state between transactions. pg_trickle needs special handling:
- Enable `pooler_compatibility_mode` on affected stream tables:
SELECT pgtrickle.alter_stream_table('my_st',
pooler_compatibility_mode => true);
- Or set globally via GUC:
pg_trickle.pooler_compatibility_mode = true
PgBouncer (Session Mode)
Session mode preserves session state — no special configuration needed.
Supavisor / Other Poolers
Some poolers (Supavisor, pgcat) have their own compatibility
characteristics. Test with pgtrickle.validate_query() before deploying.
7. Recommended GUC Starting Values
These are sensible defaults for most workloads. Adjust based on monitoring data.
# Core settings (usually fine as defaults)
pg_trickle.enabled = true # Enable scheduler
pg_trickle.schedule_interval = '5s' # Global default refresh interval
pg_trickle.max_workers = 4 # Parallel refresh workers
# Performance tuning
pg_trickle.planner_aggressive = true # Enable MERGE planner hints
pg_trickle.tiered_scheduling = true # Tier-aware scheduling
# CDC mode
pg_trickle.cdc_mode = 'auto' # auto | trigger | wal
# Safety
pg_trickle.unlogged_buffers = false # true = faster but not crash-safe
pg_trickle.fuse_default_ceiling = 10000 # Auto-fuse change threshold
- Review GUC values for your workload
- See CONFIGURATION.md for the full reference
8. Resource Planning
Memory
- Each background worker uses a separate PostgreSQL backend
- `work_mem` applies to each worker's delta SQL execution
- Monitor RSS growth via `pg_stat_activity` or OS-level tools
Storage
- Change buffer tables (`pgtrickle_changes.changes_*`) grow between refreshes
- Buffer size depends on DML rate × refresh interval
- Monitor via `pgtrickle.shared_buffer_stats()`
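To put numbers on that growth, PostgreSQL's standard size functions work on change buffer tables like on any other relation; a quick sketch, assuming the `pgtrickle_changes` schema named above:

```sql
-- Approximate on-disk size of each change buffer table, largest first.
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pgtrickle_changes'
ORDER BY pg_total_relation_size(c.oid) DESC;
```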
Connections
- The scheduler uses `pg_trickle.max_workers` backend connections
- Ensure `max_connections` has headroom for workers + application
- `max_connections` is at least application connections + `pg_trickle.max_workers` + 5
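A quick headroom check (reading the pg_trickle GUC assumes the extension has been preloaded):

```sql
-- Compare the connection ceiling against the worker count.
SELECT current_setting('max_connections')::int         AS max_connections,
       current_setting('pg_trickle.max_workers')::int  AS pgt_workers;
```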
9. Monitoring Setup
Essential Queries
-- Stream table health overview
SELECT pgt_name, status, staleness, refresh_mode
FROM pgtrickle.stream_tables_info
ORDER BY staleness DESC NULLS LAST;
-- Refresh efficiency
SELECT pgt_name, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency();
-- Error states
SELECT pgt_name, status, last_error_message, last_error_at
FROM pgtrickle.pgt_stream_tables
WHERE status IN ('ERROR', 'SUSPENDED');
Grafana / Prometheus
See the monitoring/ directory for ready-to-use Grafana dashboards and Prometheus configuration.
- Monitoring configured for stream table health
- Alerting on ERROR/SUSPENDED status
10. Backup & Restore
pg_trickle stream tables are standard PostgreSQL tables and are included
in pg_dump / pg_restore. See BACKUP_AND_RESTORE.md
for details.
- Backup strategy accounts for both source tables and stream tables
- Restore procedure tested (stream tables may need re-initialization)
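One way to script that re-initialization, using `pgtrickle.refresh_stream_table` and the `pgt_stream_tables` catalog referenced elsewhere in these docs — verify the column name against your installed version:

```sql
-- Sketch: after a restore, force a refresh of every stream table
-- so their contents are rebuilt from the restored source tables.
SELECT pgtrickle.refresh_stream_table(pgt_name)
FROM pgtrickle.pgt_stream_tables;
```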
Quick Validation Script
Run this after deployment to verify everything is working:
-- 1. Extension loaded
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- 2. Scheduler running
SELECT COUNT(*) > 0 AS scheduler_alive
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
-- 3. Create a test stream table
CREATE TABLE _deploy_test_src (id INT PRIMARY KEY, val INT);
INSERT INTO _deploy_test_src VALUES (1, 100), (2, 200);
SELECT pgtrickle.create_stream_table(
'_deploy_test_st',
'SELECT id, val FROM _deploy_test_src',
refresh_mode => 'FULL'
);
SELECT pgtrickle.refresh_stream_table('_deploy_test_st');
-- 4. Verify data
SELECT * FROM _deploy_test_st ORDER BY id;
-- Expected: (1, 100), (2, 200)
-- 5. Cleanup
SELECT pgtrickle.drop_stream_table('_deploy_test_st');
DROP TABLE _deploy_test_src;
Connection Pooler Compatibility
Added in v0.19.0 (UX-4 / STAB-1).
pg_trickle uses prepared statements and NOTIFY internally. These features
require special handling when a connection pooler sits between the application
and PostgreSQL.
PgBouncer Transaction Mode
In PgBouncer transaction pooling mode, each transaction may land on a different server-side connection. Prepared statements and LISTEN/NOTIFY do not survive across transactions.
Recommended configuration:
# postgresql.conf
pg_trickle.connection_pooler_mode = 'transaction'
This cluster-wide GUC:
- Disables prepared-statement reuse for all stream tables.
- Suppresses `NOTIFY pg_trickle_refresh` emissions (listeners on other connections will not receive them anyway in transaction mode).
Alternatively, enable pooler compatibility per stream table:
SELECT pgtrickle.alter_stream_table('my_stream_table',
pooler_compatibility_mode => true);
PgBouncer Session Mode
Session pooling is fully compatible — no special configuration needed.
pgcat / Supavisor
These poolers generally support prepared statements and NOTIFY. Set
pg_trickle.connection_pooler_mode = 'off' (the default).
Kubernetes / CNPG
See Scaling — CNPG for connection pooler configuration in Kubernetes environments.
Related Documentation
- Getting Started — First stream table in 5 minutes
- Configuration Reference — All GUC variables
- SQL Reference — Complete function reference
- Best-Practice Patterns — Common data modeling patterns
- Architecture — How pg_trickle works internally
- Backup & Restore — Backup considerations
SQL Reference
Complete reference for all SQL functions, views, and catalog tables provided by pgtrickle.
Table of Contents
- Functions
- Expression Support
- Conditional Expressions
- Comparison Operators
- Boolean Tests
- SQL Value Functions
- Array and Row Expressions
- Subquery Expressions
- Auto-Rewrite Pipeline
- HAVING Clause
- Tables Without Primary Keys (Keyless Tables)
- Volatile Function Detection
- COLLATE Expressions
- IS JSON Predicate (PostgreSQL 16+)
- SQL/JSON Constructors (PostgreSQL 16+)
- JSON_TABLE (PostgreSQL 17+)
- Unsupported Expression Types
- Restrictions & Interoperability
- Referencing Other Stream Tables
- Views as Sources in Defining Queries
- Partitioned Tables as Sources
- Foreign Tables as Sources
- IMMEDIATE Mode Query Restrictions
- Logical Replication Targets
- Views on Stream Tables
- Materialized Views on Stream Tables
- Logical Replication of Stream Tables
- Known Delta Computation Limitations
- What Is NOT Allowed
- Row-Level Security (RLS)
- Views
- Catalog Tables
- Delta SQL Profiling (v0.13.0)
- dbt Integration (v0.13.0)
Functions
Core Lifecycle
Create, modify, and manage the lifecycle of stream tables.
pgtrickle.create_stream_table
Create a new stream table.
pgtrickle.create_stream_table(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | text | — | Name of the stream table. May be schema-qualified (myschema.my_st). Defaults to the public schema. |
| query | text | — | The defining SQL query. Must be a valid SELECT statement using supported operators. |
| schedule | text | 'calculated' | Refresh schedule as a Prometheus/GNU-style duration string (e.g., '30s', '5m', '1h', '1h30m', '1d') or a cron expression (e.g., '*/5 * * * *', '@hourly'). Use 'calculated' for CALCULATED mode (inherits schedule from downstream dependents). |
| refresh_mode | text | 'AUTO' | 'AUTO' (adaptive — uses DIFFERENTIAL when possible, falls back to FULL if the query is not differentiable), 'FULL' (truncate and reload), 'DIFFERENTIAL' (apply delta only — errors if the query is not differentiable), or 'IMMEDIATE' (synchronous in-transaction maintenance via statement-level triggers). |
| initialize | bool | true | If true, populates the table immediately via a full refresh. If false, creates the table empty. |
| diamond_consistency | text | NULL (defaults to 'atomic') | Diamond dependency consistency mode: 'atomic' (SAVEPOINT-based atomic group refresh) or 'none' (independent refresh). |
| diamond_schedule_policy | text | NULL (defaults to 'fastest') | Schedule policy for atomic diamond groups: 'fastest' (fire when any member is due) or 'slowest' (fire when all are due). Set on the convergence node. |
| cdc_mode | text | NULL (use pg_trickle.cdc_mode) | Optional per-stream-table CDC override: 'auto', 'trigger', or 'wal'. This affects all deferred TABLE sources of the stream table. |
| append_only | bool | false | When true, differential refreshes use a fast INSERT path instead of MERGE. Skips DELETE/UPDATE/IS DISTINCT FROM checks. If a DELETE or UPDATE is later detected in the change buffer, the flag is automatically reverted to false. Not compatible with FULL, IMMEDIATE, or keyless sources. |
| pooler_compatibility_mode | bool | false | When true, the refresh engine uses inline SQL instead of PREPARE/EXECUTE and suppresses all NOTIFY emissions for this stream table. Enable this when the stream table is accessed through a transaction-mode connection pooler (e.g. PgBouncer). |
When refresh_mode => 'IMMEDIATE', the cluster-wide pg_trickle.cdc_mode
setting is ignored. IMMEDIATE mode always uses statement-level IVM triggers
instead of CDC triggers or WAL replication slots. If you explicitly pass
cdc_mode => 'wal' together with refresh_mode => 'IMMEDIATE', pg_trickle
rejects the call because WAL CDC is asynchronous and incompatible with
in-transaction maintenance.
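A minimal illustration of the rejected combination (the exact error text may differ):

```sql
-- Rejected: WAL CDC is asynchronous, IMMEDIATE is in-transaction.
SELECT pgtrickle.create_stream_table(
    name         => 'bad_combo',
    query        => 'SELECT id, amount FROM orders',
    refresh_mode => 'IMMEDIATE',
    cdc_mode     => 'wal'   -- ERROR: incompatible with IMMEDIATE mode
);
```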
Duration format:
| Unit | Suffix | Example |
|---|---|---|
| Seconds | s | '30s' |
| Minutes | m | '5m' |
| Hours | h | '2h' |
| Days | d | '1d' |
| Weeks | w | '1w' |
| Compound | — | '1h30m', '2m30s' |
Cron expression format:
schedule also accepts standard cron expressions for time-based scheduling. The scheduler refreshes the stream table when the cron schedule fires, rather than checking staleness.
| Format | Fields | Example | Description |
|---|---|---|---|
| 5-field | min hour dom mon dow | '*/5 * * * *' | Every 5 minutes |
| 6-field | sec min hour dom mon dow | '0 */5 * * * *' | Every 5 minutes at :00 seconds |
| Alias | — | '@hourly' | Every hour |
| Alias | — | '@daily' | Every day at midnight |
| Alias | — | '@weekly' | Every Sunday at midnight |
| Alias | — | '@monthly' | First of every month |
| Weekday range | — | '0 6 * * 1-5' | 6 AM on weekdays |
Note: Cron-scheduled stream tables do not participate in CALCULATED schedule resolution. The `stale` column in monitoring views returns `NULL` for cron-scheduled tables.
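When monitoring by staleness, cron-scheduled tables can simply be filtered out; a sketch (column names follow the monitoring examples in this reference):

```sql
-- Staleness-based monitoring: skip cron-scheduled tables, whose
-- staleness is reported as NULL.
SELECT name, staleness
FROM pgtrickle.pgt_status()
WHERE staleness IS NOT NULL
ORDER BY staleness DESC;
```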
Example:
-- Duration-based: refresh when data is staler than 2 minutes (refresh_mode defaults to 'AUTO')
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m'
);
-- Cron-based: refresh every hour
SELECT pgtrickle.create_stream_table(
name => 'hourly_summary',
query => 'SELECT date_trunc(''hour'', ts), COUNT(*) FROM events GROUP BY 1',
schedule => '@hourly',
refresh_mode => 'FULL'
);
-- Cron-based: refresh at 6 AM on weekdays
SELECT pgtrickle.create_stream_table(
name => 'daily_report',
query => 'SELECT region, SUM(revenue) AS total FROM sales GROUP BY region',
schedule => '0 6 * * 1-5',
refresh_mode => 'FULL'
);
-- Immediate mode: maintained synchronously within the same transaction
-- No schedule needed — updates happen automatically when base table changes
SELECT pgtrickle.create_stream_table(
name => 'live_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
refresh_mode => 'IMMEDIATE'
);
-- Force WAL CDC for this stream table even if the global GUC is 'trigger'
SELECT pgtrickle.create_stream_table(
name => 'wal_orders',
query => 'SELECT id, amount FROM orders',
schedule => '1s',
refresh_mode => 'DIFFERENTIAL',
cdc_mode => 'wal'
);
Aggregate Examples:
All supported aggregate functions work in AUTO mode (and all other modes).
Examples below omit refresh_mode — the default 'AUTO' selects DIFFERENTIAL automatically.
Explicit modes are shown only when the mode itself is being demonstrated.
-- Algebraic aggregates (fully differential — no rescan needed)
SELECT pgtrickle.create_stream_table(
name => 'sales_summary',
query => 'SELECT region, COUNT(*) AS cnt, SUM(amount) AS total, AVG(amount) AS avg_amount
FROM orders GROUP BY region',
schedule => '1m'
);
-- Semi-algebraic aggregates (MIN/MAX)
SELECT pgtrickle.create_stream_table(
name => 'salary_ranges',
query => 'SELECT department, MIN(salary) AS min_sal, MAX(salary) AS max_sal
FROM employees GROUP BY department',
schedule => '2m'
);
-- Group-rescan aggregates (BOOL_AND/OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG,
-- BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG,
-- STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP,
-- MODE, PERCENTILE_CONT, PERCENTILE_DISC,
-- CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY,
-- REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE,
-- REGR_SXX, REGR_SXY, REGR_SYY, ANY_VALUE)
SELECT pgtrickle.create_stream_table(
name => 'team_members',
query => 'SELECT department,
STRING_AGG(name, '', '' ORDER BY name) AS members,
ARRAY_AGG(employee_id) AS member_ids,
BOOL_AND(active) AS all_active,
JSON_AGG(name) AS members_json
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Bitwise aggregates
SELECT pgtrickle.create_stream_table(
name => 'permission_summary',
query => 'SELECT department,
BIT_OR(permissions) AS combined_perms,
BIT_AND(permissions) AS common_perms,
BIT_XOR(flags) AS xor_flags
FROM employees
GROUP BY department',
schedule => '1m'
);
-- JSON object aggregates
SELECT pgtrickle.create_stream_table(
name => 'config_map',
query => 'SELECT department,
JSON_OBJECT_AGG(setting_name, setting_value) AS settings,
JSONB_OBJECT_AGG(key, value) AS metadata
FROM config
GROUP BY department',
schedule => '1m'
);
-- Statistical aggregates
SELECT pgtrickle.create_stream_table(
name => 'salary_stats',
query => 'SELECT department,
STDDEV_POP(salary) AS sd_pop,
STDDEV_SAMP(salary) AS sd_samp,
VAR_POP(salary) AS var_pop,
VAR_SAMP(salary) AS var_samp
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
SELECT pgtrickle.create_stream_table(
name => 'salary_percentiles',
query => 'SELECT department,
MODE() WITHIN GROUP (ORDER BY grade) AS most_common_grade,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) AS median_salary,
PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY salary) AS p90_salary
FROM employees
GROUP BY department',
schedule => '1m'
);
-- Regression / correlation aggregates (CORR, COVAR_*, REGR_*)
SELECT pgtrickle.create_stream_table(
name => 'regression_stats',
query => 'SELECT department,
CORR(salary, experience) AS sal_exp_corr,
COVAR_POP(salary, experience) AS covar_pop,
COVAR_SAMP(salary, experience) AS covar_samp,
REGR_SLOPE(salary, experience) AS slope,
REGR_INTERCEPT(salary, experience) AS intercept,
REGR_R2(salary, experience) AS r_squared,
REGR_COUNT(salary, experience) AS regr_n
FROM employees
GROUP BY department',
schedule => '1m'
);
-- ANY_VALUE aggregate (PostgreSQL 16+)
SELECT pgtrickle.create_stream_table(
name => 'dept_sample',
query => 'SELECT department, ANY_VALUE(office_location) AS sample_office
FROM employees GROUP BY department',
schedule => '1m'
);
-- FILTER clause on aggregates
SELECT pgtrickle.create_stream_table(
name => 'order_metrics',
query => 'SELECT region,
COUNT(*) AS total,
COUNT(*) FILTER (WHERE status = ''active'') AS active_count,
SUM(amount) FILTER (WHERE status = ''shipped'') AS shipped_total
FROM orders
GROUP BY region',
schedule => '1m'
);
-- PgBouncer compatibility (transaction-mode pooler)
SELECT pgtrickle.create_stream_table(
name => 'pooled_orders',
query => 'SELECT id, amount FROM orders',
schedule => '5m',
pooler_compatibility_mode => true
);
CTE Examples:
Non-recursive CTEs are fully supported in both FULL and DIFFERENTIAL modes:
-- Simple CTE
SELECT pgtrickle.create_stream_table(
name => 'active_order_totals',
query => 'WITH active_users AS (
SELECT id, name FROM users WHERE active = true
)
SELECT a.id, a.name, SUM(o.amount) AS total
FROM active_users a
JOIN orders o ON o.user_id = a.id
GROUP BY a.id, a.name',
schedule => '1m'
);
-- Chained CTEs (CTE referencing another CTE)
SELECT pgtrickle.create_stream_table(
name => 'top_regions',
query => 'WITH regional AS (
SELECT region, SUM(amount) AS total FROM orders GROUP BY region
),
ranked AS (
SELECT region, total FROM regional WHERE total > 1000
)
SELECT * FROM ranked',
schedule => '2m'
);
-- Multi-reference CTE (referenced twice in FROM — shared delta optimization)
SELECT pgtrickle.create_stream_table(
name => 'self_compare',
query => 'WITH totals AS (
SELECT user_id, SUM(amount) AS total FROM orders GROUP BY user_id
)
SELECT t1.user_id, t1.total, t2.total AS next_total
FROM totals t1
JOIN totals t2 ON t1.user_id = t2.user_id + 1',
schedule => '1m'
);
-- Append-only stream table (INSERT-only fast path)
SELECT pgtrickle.create_stream_table(
name => 'event_log_st',
query => 'SELECT id, event_type, payload, created_at FROM events',
schedule => '30s',
append_only => true
);
Recursive CTEs work with FULL, DIFFERENTIAL, and IMMEDIATE modes:
-- Recursive CTE (hierarchy traversal)
SELECT pgtrickle.create_stream_table(
name => 'category_tree',
query => 'WITH RECURSIVE cat_tree AS (
SELECT id, name, parent_id, 0 AS depth
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.name, c.parent_id, ct.depth + 1
FROM categories c
JOIN cat_tree ct ON c.parent_id = ct.id
)
SELECT * FROM cat_tree',
schedule => '5m',
refresh_mode => 'FULL' -- FULL mode: standard re-execution
);
-- Recursive CTE with DIFFERENTIAL mode (incremental semi-naive / DRed)
SELECT pgtrickle.create_stream_table(
name => 'org_chart',
query => 'WITH RECURSIVE reports AS (
SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL' -- Uses semi-naive, DRed, or recomputation (auto-selected)
);
-- Recursive CTE with IMMEDIATE mode (same-transaction maintenance)
SELECT pgtrickle.create_stream_table(
name => 'org_chart_live',
query => 'WITH RECURSIVE reports AS (
SELECT id, name, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id
FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports',
refresh_mode => 'IMMEDIATE' -- Uses transition tables with semi-naive / DRed maintenance
);
Non-monotone recursive terms: If the recursive term contains operators like
EXCEPT, aggregate functions, window functions, DISTINCT, INTERSECT (set semantics), or anti-joins, the system automatically falls back to recomputation to guarantee correctness. The semi-naive and DRed strategies require monotone recursive terms (JOIN, UNION ALL, filter/project only).
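For instance, a recursive term containing DISTINCT is non-monotone, so the stream table is still accepted but each refresh uses the recomputation strategy — a sketch, assuming a hypothetical edges(src, dst) table:

```sql
-- Hypothetical example: DISTINCT in the recursive term is non-monotone,
-- so DIFFERENTIAL mode falls back to recomputation on each refresh.
SELECT pgtrickle.create_stream_table(
    name => 'reachable_nodes',
    query => 'WITH RECURSIVE reach AS (
        SELECT src, dst FROM edges WHERE src = 1
        UNION ALL
        SELECT DISTINCT r.src, e.dst
        FROM edges e JOIN reach r ON e.src = r.dst
    )
    SELECT * FROM reach',
    schedule => '2m',
    refresh_mode => 'DIFFERENTIAL'
);
```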
Set Operation Examples:
INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL, UNION, and UNION ALL are supported:
-- INTERSECT: customers who placed orders in BOTH regions
SELECT pgtrickle.create_stream_table(
name => 'bi_region_customers',
query => 'SELECT customer_id FROM orders_east
INTERSECT
SELECT customer_id FROM orders_west',
schedule => '2m'
);
-- INTERSECT ALL: preserves duplicates (bag semantics)
SELECT pgtrickle.create_stream_table(
name => 'common_items',
query => 'SELECT item_name FROM warehouse_a
INTERSECT ALL
SELECT item_name FROM warehouse_b',
schedule => '1m'
);
-- EXCEPT: orders not yet shipped
SELECT pgtrickle.create_stream_table(
name => 'unshipped_orders',
query => 'SELECT order_id FROM orders
EXCEPT
SELECT order_id FROM shipments',
schedule => '1m'
);
-- EXCEPT ALL: preserves duplicate counts (bag subtraction)
SELECT pgtrickle.create_stream_table(
name => 'excess_inventory',
query => 'SELECT sku FROM stock_received
EXCEPT ALL
SELECT sku FROM stock_shipped',
schedule => '5m'
);
-- UNION: deduplicated merge of two sources
SELECT pgtrickle.create_stream_table(
name => 'all_contacts',
query => 'SELECT email FROM customers
UNION
SELECT email FROM newsletter_subscribers',
schedule => '5m'
);
LATERAL Set-Returning Function Examples:
Set-returning functions (SRFs) in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Common SRFs include jsonb_array_elements, jsonb_each, jsonb_each_text, and unnest:
-- Flatten JSONB arrays into rows
SELECT pgtrickle.create_stream_table(
name => 'flat_children',
query => 'SELECT p.id, child.value AS val
FROM parent_data p,
jsonb_array_elements(p.data->''children'') AS child',
schedule => '1m'
);
-- Expand JSONB key-value pairs (multi-column SRF)
SELECT pgtrickle.create_stream_table(
name => 'flat_properties',
query => 'SELECT d.id, kv.key, kv.value
FROM documents d,
jsonb_each(d.metadata) AS kv',
schedule => '2m'
);
-- Unnest arrays
SELECT pgtrickle.create_stream_table(
name => 'flat_tags',
query => 'SELECT t.id, tag.tag
FROM tagged_items t,
unnest(t.tags) AS tag(tag)',
schedule => '1m'
);
-- SRF with WHERE filter
SELECT pgtrickle.create_stream_table(
name => 'high_value_items',
query => 'SELECT p.id, (e.value)::int AS amount
FROM products p,
jsonb_array_elements(p.prices) AS e
WHERE (e.value)::int > 100',
schedule => '5m'
);
-- SRF combined with aggregation
SELECT pgtrickle.create_stream_table(
name => 'element_counts',
query => 'SELECT a.id, count(*) AS cnt
FROM arrays a,
jsonb_array_elements(a.data) AS e
GROUP BY a.id',
schedule => '1m',
refresh_mode => 'FULL'
);
LATERAL Subquery Examples:
LATERAL subqueries in the FROM clause are supported in both FULL and DIFFERENTIAL modes. Use them for top-N per group, correlated aggregation, and conditional expansion:
-- Top-N per group: latest item per order
SELECT pgtrickle.create_stream_table(
name => 'latest_items',
query => 'SELECT o.id, o.customer, latest.amount
FROM orders o,
LATERAL (
SELECT li.amount
FROM line_items li
WHERE li.order_id = o.id
ORDER BY li.created_at DESC
LIMIT 1
) AS latest',
schedule => '1m'
);
-- Correlated aggregate
SELECT pgtrickle.create_stream_table(
name => 'dept_summaries',
query => 'SELECT d.id, d.name, stats.total, stats.cnt
FROM departments d,
LATERAL (
SELECT SUM(e.salary) AS total, COUNT(*) AS cnt
FROM employees e
WHERE e.dept_id = d.id
) AS stats',
schedule => '1m'
);
-- LEFT JOIN LATERAL: preserve outer rows with NULLs when subquery returns no rows
SELECT pgtrickle.create_stream_table(
name => 'dept_stats_all',
query => 'SELECT d.id, d.name, stats.total
FROM departments d
LEFT JOIN LATERAL (
SELECT SUM(e.salary) AS total
FROM employees e
WHERE e.dept_id = d.id
) AS stats ON true',
schedule => '1m'
);
WHERE Subquery Examples:
Subqueries in the WHERE clause are automatically transformed into semi-join, anti-join, or scalar subquery operators in the DVM operator tree:
-- EXISTS subquery: customers who have placed orders
SELECT pgtrickle.create_stream_table(
name => 'active_customers',
query => 'SELECT c.id, c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
schedule => '1m'
);
-- NOT EXISTS: customers with no orders
SELECT pgtrickle.create_stream_table(
name => 'inactive_customers',
query => 'SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
schedule => '1m'
);
-- IN subquery: products that have been ordered
SELECT pgtrickle.create_stream_table(
name => 'ordered_products',
query => 'SELECT p.id, p.name
FROM products p
WHERE p.id IN (SELECT product_id FROM order_items)',
schedule => '1m'
);
-- NOT IN subquery: products never ordered
SELECT pgtrickle.create_stream_table(
name => 'unordered_products',
query => 'SELECT p.id, p.name
FROM products p
WHERE p.id NOT IN (SELECT product_id FROM order_items)',
schedule => '1m'
);
-- Scalar subquery in SELECT list
SELECT pgtrickle.create_stream_table(
name => 'products_with_max_price',
query => 'SELECT p.id, p.name, (SELECT max(price) FROM products) AS max_price
FROM products p',
schedule => '1m'
);
Notes:
- The defining query is parsed into an operator tree and validated for DVM support.
- Views as sources — views referenced in the defining query are automatically inlined as subqueries (auto-rewrite pass #0). CDC triggers are created on the underlying base tables. Nested views (view → view → table) are fully expanded. The user's original query is preserved in original_query for reinit and introspection. Materialized views are rejected in DIFFERENTIAL mode (use FULL mode or the underlying query directly). Foreign tables are likewise rejected in DIFFERENTIAL mode.
- CDC triggers and change buffer tables are created automatically for each source table.
- TRUNCATE on source tables — when a source table is TRUNCATEd, a CDC trigger writes a marker row (action='T') into the change buffer. On the next refresh cycle, pg_trickle detects the marker and automatically falls back to a FULL refresh. For single-source stream tables with no subsequent DML after the TRUNCATE, an optimized fast path deletes all ST rows directly without re-running the full defining query.
- The ST is registered in the dependency DAG; cycles are rejected.
- Non-recursive CTEs are inlined as subqueries during parsing (Tier 1). Multi-reference CTEs share delta computation (Tier 2).
- Recursive CTEs in DIFFERENTIAL mode use three strategies, auto-selected per refresh: semi-naive evaluation for INSERT-only changes, DRed (Delete-and-Rederive) for mixed DELETE/UPDATE changes, and recomputation fallback when CTE columns do not match ST storage columns. Non-monotone recursive terms (containing EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET) automatically fall back to recomputation to ensure correctness.
- DRed for recursive CTEs (DIFFERENTIAL mode; implemented in v0.10.0, P2-1) — mixed DELETE/UPDATE changes use the DRed (Delete-and-Rederive) algorithm: (1) semi-naive INSERT propagation; (2) over-deletion cascade from ST storage; (3) rederivation from current source tables; (4) combination into net deletions. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. When CTE output columns differ from ST storage columns, recomputation is used instead.
- LATERAL SRFs in DIFFERENTIAL mode use row-scoped recomputation: when a source row changes, only the SRF expansions for that row are re-evaluated.
- LATERAL subqueries in DIFFERENTIAL mode also use row-scoped recomputation: when an outer row changes, the correlated subquery is re-executed only for that row.
- WHERE subqueries (EXISTS, IN, scalar) are parsed into dedicated semi-join, anti-join, and scalar subquery operators with specialized delta computation. ALL (subquery) is the only subquery form that is currently rejected.
- ORDER BY is accepted but silently discarded — row order in the storage table is undefined (consistent with PostgreSQL's CREATE MATERIALIZED VIEW behavior). Apply ORDER BY when querying the stream table.
- TopK (ORDER BY + LIMIT) — when a top-level ORDER BY … LIMIT N is present (with a constant integer limit, optionally with OFFSET M), the query is recognized as a "TopK" pattern and accepted. TopK stream tables store exactly N rows (starting from position M+1 if OFFSET is specified) and are refreshed via a scoped-recomputation MERGE strategy. The DVM delta pipeline is bypassed; instead, each refresh re-evaluates the full ORDER BY + LIMIT [+ OFFSET] query and merges the result into the storage table. The catalog records topk_limit, topk_order_by, and optionally topk_offset for the stream table. TopK is not supported with set operations (UNION/INTERSECT/EXCEPT) or with GROUP BY ROLLUP/CUBE/GROUPING SETS.
- LIMIT / OFFSET without ORDER BY is rejected — stream tables materialize the full result set. Apply LIMIT when querying the stream table.
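The TopK pattern described above is declared like any other stream table — a sketch, assuming an orders table:

```sql
-- TopK stream table: stores exactly 10 rows, refreshed via scoped MERGE.
SELECT pgtrickle.create_stream_table(
    name => 'top_orders',
    query => 'SELECT id, customer_id, amount
              FROM orders
              ORDER BY amount DESC
              LIMIT 10',
    schedule => '1m'
);
```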
pgtrickle.create_stream_table_if_not_exists
Create a stream table if it does not already exist. If a stream table with the
given name already exists, this is a silent no-op (an INFO message is logged).
The existing definition is never modified.
pgtrickle.create_stream_table_if_not_exists(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters: Same as create_stream_table.
Example:
-- Safe to re-run in migrations:
SELECT pgtrickle.create_stream_table_if_not_exists(
'order_totals',
'SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id',
'1m',
'DIFFERENTIAL'
);
Notes:
- Useful for deployment / migration scripts that should be safe to re-run.
- If the stream table already exists, the provided query, schedule, and other parameters are ignored — the existing definition is preserved.
pgtrickle.create_or_replace_stream_table
Create a stream table if it does not exist, or replace the existing one if the definition changed. This is the declarative, idempotent API for deployment workflows (dbt, SQL migrations, GitOps).
pgtrickle.create_or_replace_stream_table(
name text,
query text,
schedule text DEFAULT 'calculated',
refresh_mode text DEFAULT 'AUTO',
initialize bool DEFAULT true,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT false,
pooler_compatibility_mode bool DEFAULT false
) → void
Parameters: Same as create_stream_table.
Behavior:
| Current state | Action taken |
|---|---|
| Stream table does not exist | Create — identical to create_stream_table(...) |
| Stream table exists, query and all config identical | No-op — logs INFO, returns immediately |
| Stream table exists, query identical but config differs | Alter config — delegates to alter_stream_table(...) for schedule, refresh_mode, diamond settings, cdc_mode, append_only, pooler_compatibility_mode |
| Stream table exists, query differs | Replace query — in-place ALTER QUERY migration plus any config changes; a full refresh is applied |
The initialize parameter is honoured on create only. On replace, the stream table is always repopulated via a full refresh.
Query comparison uses the post-rewrite (normalized) form of the SQL. Cosmetic differences such as whitespace, casing, and extra parentheses are ignored.
Example:
-- Idempotent deployment — safe to run on every deploy:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
-- If the query changed since last deploy, the stream table is
-- migrated in place (no data gap). If nothing changed, it's a no-op.
Notes:
- Mirrors PostgreSQL's CREATE OR REPLACE convention (CREATE OR REPLACE VIEW, CREATE OR REPLACE FUNCTION).
- Never drops the stream table — even for incompatible schema changes, the ALTER QUERY path rebuilds storage in place while preserving the catalog entry (pgt_id).
- For migration scripts that must not modify an existing definition, use create_stream_table_if_not_exists instead.
pgtrickle.bulk_create
Create multiple stream tables in a single transaction.
pgtrickle.bulk_create(
definitions jsonb -- Array of stream table definitions
) → jsonb -- Array of result objects
Each element in the definitions array must be a JSON object with at least name and query keys. All other keys match the parameters of create_stream_table (snake_case):
| Key | Type | Default | Description |
|---|---|---|---|
name | string | (required) | Stream table name (optionally schema-qualified). |
query | string | (required) | Defining SQL query. |
schedule | string | 'calculated' | Refresh schedule. |
refresh_mode | string | 'AUTO' | 'AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'. |
initialize | boolean | true | Whether to populate immediately. |
diamond_consistency | string | NULL | 'atomic' or 'none'. |
diamond_schedule_policy | string | NULL | 'fastest' or 'slowest'. |
cdc_mode | string | NULL | 'auto', 'trigger', or 'wal'. |
append_only | boolean | false | Enable append-only fast path. |
pooler_compatibility_mode | boolean | false | PgBouncer compatibility. |
partition_by | string | NULL | Partition key. |
max_differential_joins | integer | NULL | Max join scan limit. |
max_delta_fraction | number | NULL | Max delta fraction (0.0–1.0). |
Returns a JSONB array of result objects:
[
{"name": "st1", "status": "created", "pgt_id": 42},
{"name": "st2", "status": "created", "pgt_id": 43}
]
On any error, the entire transaction is rolled back (standard PostgreSQL transactional semantics). The error message includes the index and name of the failing definition.
Example:
SELECT pgtrickle.bulk_create('[
{"name": "order_totals", "query": "SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id", "schedule": "30s"},
{"name": "product_stats", "query": "SELECT product_id, COUNT(*) AS cnt FROM order_items GROUP BY product_id", "schedule": "1m"}
]'::jsonb);
pgtrickle.alter_stream_table
Alter properties of an existing stream table.
pgtrickle.alter_stream_table(
name text,
query text DEFAULT NULL,
schedule text DEFAULT NULL,
refresh_mode text DEFAULT NULL,
status text DEFAULT NULL,
diamond_consistency text DEFAULT NULL,
diamond_schedule_policy text DEFAULT NULL,
cdc_mode text DEFAULT NULL,
append_only bool DEFAULT NULL,
pooler_compatibility_mode bool DEFAULT NULL,
tier text DEFAULT NULL
) → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
name | text | — | Name of the stream table (schema-qualified or unqualified). |
query | text | NULL | New defining query. Pass NULL to leave unchanged. When set, the function validates the new query, migrates the storage table schema if needed, updates catalog entries and dependencies, and runs a full refresh. Schema changes are classified as same (no DDL), compatible (ALTER TABLE ADD/DROP COLUMN), or incompatible (full storage rebuild with OID change). |
schedule | text | NULL | New schedule as a duration string (e.g., '5m'). Pass NULL to leave unchanged. Pass 'calculated' to switch to CALCULATED mode. |
refresh_mode | text | NULL | New refresh mode ('AUTO', 'FULL', 'DIFFERENTIAL', or 'IMMEDIATE'). Pass NULL to leave unchanged. Switching to/from 'IMMEDIATE' migrates trigger infrastructure (IVM triggers ↔ CDC triggers), clears or restores the schedule, and runs a full refresh. |
status | text | NULL | New status ('ACTIVE', 'SUSPENDED'). Pass NULL to leave unchanged. Resuming resets consecutive errors to 0. |
diamond_consistency | text | NULL | New diamond consistency mode ('none' or 'atomic'). Pass NULL to leave unchanged. |
diamond_schedule_policy | text | NULL | New schedule policy for atomic diamond groups ('fastest' or 'slowest'). Pass NULL to leave unchanged. |
cdc_mode | text | NULL | New requested CDC mode override ('auto', 'trigger', or 'wal'). Pass NULL to leave unchanged. |
append_only | bool | NULL | Enable or disable the append-only INSERT fast path. Pass NULL to leave unchanged. When true, rejected for FULL, IMMEDIATE, or keyless source stream tables. |
pooler_compatibility_mode | bool | NULL | Enable or disable pooler-safe mode. When true, prepared statements are bypassed and NOTIFY emissions are suppressed. Pass NULL to leave unchanged. |
tier | text | NULL | Refresh tier for tiered scheduling ('hot', 'warm', 'cold', or 'frozen'). Only effective when pg_trickle.tiered_scheduling GUC is enabled. Hot (1×), Warm (2×), Cold (10×), Frozen (skip). Pass NULL to leave unchanged. |
If you switch a stream table to refresh_mode => 'IMMEDIATE' while the
cluster-wide pg_trickle.cdc_mode GUC is set to 'wal', pg_trickle logs an
INFO and proceeds with IVM triggers. WAL CDC does not apply to IMMEDIATE mode.
If the stream table has an explicit cdc_mode => 'wal' override, switching to
IMMEDIATE is rejected until you change the requested CDC mode back to
'auto' or 'trigger'.
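A sketch of that workaround, assuming a stream table named order_totals that was previously pinned with cdc_mode => 'wal':

```sql
-- Clear the explicit WAL override first, then switch modes.
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'trigger');
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
```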
Examples:
-- Change the defining query (same output schema — fast path)
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders WHERE status = ''active'' GROUP BY customer_id');
-- Change query and add a column (compatible schema migration)
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS cnt FROM orders GROUP BY customer_id');
-- Change query and mode simultaneously
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
refresh_mode => 'FULL');
-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');
-- Switch to full refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
-- Switch to immediate (transactional) mode — installs IVM triggers, clears schedule
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
-- Switch from immediate back to differential — re-creates CDC triggers, restores schedule
SELECT pgtrickle.alter_stream_table('order_totals',
refresh_mode => 'DIFFERENTIAL', schedule => '5m');
-- Pin a deferred stream table to trigger CDC even when the global GUC is 'auto'
SELECT pgtrickle.alter_stream_table('order_totals', cdc_mode => 'trigger');
-- Enable append-only INSERT fast path
SELECT pgtrickle.alter_stream_table('event_log_st', append_only => true);
-- Enable pooler compatibility mode (for PgBouncer transaction mode)
SELECT pgtrickle.alter_stream_table('order_totals', pooler_compatibility_mode => true);
-- Set refresh tier (requires pg_trickle.tiered_scheduling = on)
SELECT pgtrickle.alter_stream_table('order_totals', tier => 'warm');
SELECT pgtrickle.alter_stream_table('archive_stats', tier => 'frozen');
-- Suspend a stream table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');
-- Resume a suspended stream table
SELECT pgtrickle.resume_stream_table('order_totals');
-- Or via alter_stream_table
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');
Notes:
- When query is provided, the function runs the full query rewrite pipeline (view inlining, DISTINCT ON, GROUPING SETS, etc.) and validates the new query before applying changes.
- The entire ALTER QUERY operation runs within a single transaction. If any step fails, the stream table is left unchanged.
- For same-schema and compatible-schema changes, the storage table OID is preserved — views, policies, and publications referencing the stream table remain valid.
- For incompatible schema changes (e.g., changing a column from integer to text), the storage table is rebuilt and the OID changes. A WARNING is emitted.
- The stream table is temporarily suspended during query migration to prevent concurrent scheduler refreshes.
pgtrickle.drop_stream_table
Drop a stream table, removing the storage table and all catalog entries.
pgtrickle.drop_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to drop. |
Example:
SELECT pgtrickle.drop_stream_table('order_totals');
Notes:
- Drops the underlying storage table with CASCADE.
- Removes all catalog entries (metadata, dependencies, refresh history).
- Cleans up CDC triggers and change buffer tables for source tables that are no longer tracked by any ST.
pgtrickle.resume_stream_table
Resume a suspended stream table, clearing its consecutive error count and re-enabling automated and manual refreshes.
pgtrickle.resume_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to resume (schema-qualified or unqualified). |
Example:
-- Resume a stream table that was auto-suspended due to repeated errors
SELECT pgtrickle.resume_stream_table('order_totals');
Notes:
- Errors if the ST is not in SUSPENDED state.
- Resets consecutive_errors to 0 and sets status = 'ACTIVE'.
- Emits a resumed event on the pg_trickle_alert NOTIFY channel.
- After resuming, the scheduler will include the ST in its next cycle.
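To observe that event, a monitoring client can listen on the alert channel — a minimal sketch:

```sql
-- In a monitoring session:
LISTEN pg_trickle_alert;

-- In another session:
SELECT pgtrickle.resume_stream_table('order_totals');

-- The listening session receives an asynchronous notification
-- carrying the 'resumed' event for order_totals.
```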
pgtrickle.refresh_stream_table
Manually trigger a synchronous refresh of a stream table.
pgtrickle.refresh_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to refresh. |
Example:
SELECT pgtrickle.refresh_stream_table('order_totals');
Notes:
- Blocked if the ST is SUSPENDED — use pgtrickle.resume_stream_table(name) first.
- Uses an advisory lock to prevent concurrent refreshes of the same ST.
- For DIFFERENTIAL mode, generates and applies a delta query. For FULL mode, truncates and reloads.
- Records the refresh in pgtrickle.pgt_refresh_history with initiated_by = 'MANUAL'.
pgtrickle.repair_stream_table
Repair a stream table by reinstalling any missing CDC triggers, validating catalog entries, and reconciling change buffer state.
pgtrickle.repair_stream_table(name text) → void
Parameters:
| Parameter | Type | Description |
|---|---|---|
name | text | Name of the stream table to repair. |
Example:
-- Reinstall missing CDC triggers after a point-in-time recovery
SELECT pgtrickle.repair_stream_table('order_totals');
Notes:
- Inspects all source tables in the stream table's dependency graph and reinstalls any missing or disabled CDC triggers.
- Validates that the stream table's catalog entry, storage table, and change buffer tables are consistent.
- Useful after pg_basebackup or PITR restores where triggers may not have been captured in the backup.
- Use pgtrickle.trigger_inventory() first to identify which triggers are missing.
- Safe to call on a healthy stream table — it is a no-op if everything is intact.
Status & Monitoring
Query the state of stream tables, view refresh statistics, and diagnose problems.
pgtrickle.pgt_status
Get the status of all stream tables.
pgtrickle.pgt_status() → SETOF record(
name text,
status text,
refresh_mode text,
is_populated bool,
consecutive_errors int,
schedule text,
data_timestamp timestamptz,
staleness interval
)
Example:
SELECT * FROM pgtrickle.pgt_status();
| name | status | refresh_mode | is_populated | consecutive_errors | schedule | data_timestamp | staleness |
|---|---|---|---|---|---|---|---|
| public.order_totals | ACTIVE | DIFFERENTIAL | true | 0 | 5m | 2026-02-21 12:00:00+00 | 00:02:30 |
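A typical monitoring query filters this output to tables that have fallen behind — a sketch; the 10-minute threshold is arbitrary:

```sql
SELECT name, schedule, staleness
FROM pgtrickle.pgt_status()
WHERE status = 'ACTIVE'
  AND staleness > interval '10 minutes'
ORDER BY staleness DESC;
```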
pgtrickle.health_check
Run a set of health checks against the pg_trickle installation and return one row per check.
pgtrickle.health_check() → SETOF record(
check_name text, -- identifier for the check
severity text, -- 'OK', 'WARN', or 'ERROR'
detail text -- human-readable explanation
)
Filter to problems only:
SELECT check_name, severity, detail
FROM pgtrickle.health_check()
WHERE severity != 'OK';
Checks: scheduler_running, error_tables, stale_tables, needs_reinit,
consecutive_errors, buffer_growth (> 10 000 pending rows), slot_lag
(retained WAL above pg_trickle.slot_lag_warning_threshold_mb, default 100 MB),
worker_pool (all worker tokens in use — parallel mode only), job_queue
(> 10 jobs queued — parallel mode only).
pgtrickle.health_summary
Single-row summary of the entire pg_trickle deployment's health. Designed for monitoring dashboards that want one endpoint to poll instead of joining multiple views.
pgtrickle.health_summary() → SETOF record(
total_stream_tables int,
active_count int,
error_count int,
suspended_count int,
stale_count int,
reinit_pending int,
max_staleness_seconds float8, -- NULL if no stream tables
scheduler_status text, -- 'ACTIVE', 'STOPPED', or 'NOT_LOADED'
cache_hit_rate float8 -- NULL if no cache lookups yet
)
Example:
SELECT * FROM pgtrickle.health_summary();
| total_stream_tables | active_count | error_count | suspended_count | stale_count | reinit_pending | max_staleness_seconds | scheduler_status | cache_hit_rate |
|---|---|---|---|---|---|---|---|---|
| 12 | 11 | 0 | 1 | 0 | 0 | 45.2 | ACTIVE | 0.94 |
Tip: Use this in a Grafana single-stat panel or a Prometheus exporter to surface fleet-level health at a glance.
pgtrickle.refresh_timeline
Return recent refresh records across all stream tables in a single chronological view.
pgtrickle.refresh_timeline(
max_rows int DEFAULT 50
) → SETOF record(
start_time timestamptz,
stream_table text,
action text,
status text,
rows_inserted bigint,
rows_deleted bigint,
duration_ms float8,
error_message text
)
Example:
-- Most recent 20 events across all stream tables:
SELECT start_time, stream_table, action, status, round(duration_ms::numeric,1) AS ms
FROM pgtrickle.refresh_timeline(20);
-- Just failures in the last 100 events:
SELECT * FROM pgtrickle.refresh_timeline(100) WHERE status = 'ERROR';
pgtrickle.st_refresh_stats
Return per-ST refresh statistics aggregated from the refresh history.
pgtrickle.st_refresh_stats() → SETOF record(
pgt_name text,
pgt_schema text,
status text,
refresh_mode text,
is_populated bool,
total_refreshes bigint,
successful_refreshes bigint,
failed_refreshes bigint,
total_rows_inserted bigint,
total_rows_deleted bigint,
avg_duration_ms float8,
last_refresh_action text,
last_refresh_status text,
last_refresh_at timestamptz,
staleness_secs float8,
stale bool
)
Example:
SELECT pgt_name, status, total_refreshes, avg_duration_ms, stale
FROM pgtrickle.st_refresh_stats();
pgtrickle.get_refresh_history
Return refresh history for a specific stream table.
pgtrickle.get_refresh_history(
name text,
max_rows int DEFAULT 20
) → SETOF record(
refresh_id bigint,
data_timestamp timestamptz,
start_time timestamptz,
end_time timestamptz,
action text,
status text,
rows_inserted bigint,
rows_deleted bigint,
duration_ms float8,
error_message text
)
Example:
SELECT action, status, rows_inserted, duration_ms
FROM pgtrickle.get_refresh_history('order_totals', 5);
pgtrickle.get_staleness
Get the current staleness in seconds for a specific stream table.
pgtrickle.get_staleness(name text) → float8
Returns NULL if the ST has never been refreshed.
Example:
SELECT pgtrickle.get_staleness('order_totals');
-- Returns: 12.345 (seconds since last refresh)
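This enables a simple freshness gate before querying — a sketch, assuming the application tolerates up to 60 seconds of staleness (a never-refreshed ST is treated as infinitely stale):

```sql
-- Refresh synchronously only when the data is older than the tolerance.
DO $$
BEGIN
    IF coalesce(pgtrickle.get_staleness('order_totals'),
                'Infinity'::float8) > 60 THEN
        PERFORM pgtrickle.refresh_stream_table('order_totals');
    END IF;
END $$;
```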
pgtrickle.explain_refresh_mode
Added in v0.11.0
Explain the configured vs. effective refresh mode for a stream table, including the reason for any downgrade (e.g., AUTO choosing FULL).
pgtrickle.explain_refresh_mode(name text) → TABLE(
configured_mode text,
effective_mode text,
downgrade_reason text
)
Columns:
| Column | Type | Description |
|---|---|---|
configured_mode | text | The refresh mode set on the stream table (e.g., DIFFERENTIAL, AUTO, FULL, IMMEDIATE) |
effective_mode | text | The mode actually used on the most recent refresh. NULL for IMMEDIATE mode (handled by triggers) |
downgrade_reason | text | Human-readable explanation when effective_mode differs from configured_mode, or informational note for IMMEDIATE / APPEND_ONLY |
Example:
SELECT * FROM pgtrickle.explain_refresh_mode('public.orders_summary');
| configured_mode | effective_mode | downgrade_reason |
|---|---|---|
| AUTO | FULL | The most recent refresh used FULL mode. Possible causes: defining query contains a CTE or unsupported operator, adaptive change-ratio threshold was exceeded, or aggregate saturation occurred. Check pgtrickle.pgt_refresh_history for details. |
pgtrickle.cache_stats
Return template cache statistics from shared memory.
Reports L1 (thread-local) hits, L2 (catalog table) hits, full misses (DVM re-parse), evictions (generation flushes), and the current L1 cache size for this backend.
pgtrickle.cache_stats() → SETOF record(
l1_hits bigint,
l2_hits bigint,
misses bigint,
evictions bigint,
l1_size integer
)
| Column | Description |
|---|---|
l1_hits | Number of delta template cache hits in the thread-local (L1) cache. ~0 ns lookup. |
l2_hits | Number of delta template cache hits in the catalog table (L2) cache. ~1 ms SPI lookup. |
misses | Number of full cache misses requiring DVM re-parse (~45 ms). |
evictions | Number of entries evicted from L1 due to DDL-triggered generation flushes. |
l1_size | Current number of entries in this backend's L1 cache. |
Example:
SELECT * FROM pgtrickle.cache_stats();
| l1_hits | l2_hits | misses | evictions | l1_size |
|---|---|---|---|---|
| 142 | 3 | 5 | 10 | 8 |
Note: Counters are cluster-wide (shared memory) except `l1_size`, which is per-backend. Requires `shared_preload_libraries = 'pg_trickle'`; returns zeros when the library is loaded dynamically.
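As a rough health check, a hit rate can be derived from these counters. This is a sketch using only the columns documented above; `NULLIF` guards against division by zero on a fresh cluster:

```sql
-- Approximate delta-template cache hit rate
SELECT l1_hits, l2_hits, misses,
       round(100.0 * (l1_hits + l2_hits)
             / NULLIF(l1_hits + l2_hits + misses, 0), 1) AS hit_rate_pct
FROM pgtrickle.cache_stats();
```

A consistently low hit rate with high evictions usually points at frequent DDL on tracked sources.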
CDC Diagnostics
Inspect CDC pipeline health, replication slots, change buffers, and trigger coverage.
pgtrickle.slot_health
Check replication slot health for all tracked CDC slots.
pgtrickle.slot_health() → SETOF record(
slot_name text,
source_relid bigint,
active bool,
retained_wal_bytes bigint,
wal_status text
)
Example:
SELECT * FROM pgtrickle.slot_health();
| slot_name | source_relid | active | retained_wal_bytes | wal_status |
|---|---|---|---|---|
| pg_trickle_slot_16384 | 16384 | false | 1048576 | reserved |
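To surface only slots that look at-risk, one possible filter is shown below. The 64 MB cutoff is illustrative, not a built-in threshold; `wal_status` values follow PostgreSQL's `pg_replication_slots` convention:

```sql
-- Inactive slots retaining significant WAL, or slots whose WAL is at risk
SELECT slot_name, retained_wal_bytes, wal_status
FROM pgtrickle.slot_health()
WHERE (NOT active AND retained_wal_bytes > 64 * 1024 * 1024)
   OR wal_status IN ('unreserved', 'lost');
```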
pgtrickle.check_cdc_health
Check CDC health for all tracked source tables. Returns per-source health status including the current CDC mode, replication slot details, estimated lag, and any alerts.
The `alert` column uses the critical threshold configured by `pg_trickle.slot_lag_critical_threshold_mb` (default 1024 MB).
pgtrickle.check_cdc_health() → SETOF record(
source_relid bigint,
source_table text,
cdc_mode text,
slot_name text,
lag_bytes bigint,
confirmed_lsn text,
alert text
)
Columns:
| Column | Type | Description |
|---|---|---|
| source_relid | bigint | OID of the tracked source table |
| source_table | text | Resolved name of the source table (e.g., public.orders) |
| cdc_mode | text | Current CDC mode: TRIGGER, TRANSITIONING, or WAL |
| slot_name | text | Replication slot name (NULL for TRIGGER mode) |
| lag_bytes | bigint | Replication slot lag in bytes (NULL for TRIGGER mode) |
| confirmed_lsn | text | Last confirmed WAL position (NULL for TRIGGER mode) |
| alert | text | Alert message if unhealthy (e.g., slot_lag_exceeds_threshold, replication_slot_missing) |
Example:
SELECT * FROM pgtrickle.check_cdc_health();
| source_relid | source_table | cdc_mode | slot_name | lag_bytes | confirmed_lsn | alert |
|---|---|---|---|---|---|---|
| 16384 | public.orders | TRIGGER | | | | |
| 16390 | public.events | WAL | pg_trickle_slot_16390 | 524288 | 0/1A8B000 | |
pgtrickle.change_buffer_sizes
Show pending change counts and estimated on-disk sizes for all CDC-tracked source tables.
Returns one row per (stream_table, source_table) pair.
pgtrickle.change_buffer_sizes() → SETOF record(
stream_table text, -- qualified stream table name
source_table text, -- qualified source table name
source_oid bigint,
cdc_mode text, -- 'trigger', 'wal', or 'transitioning'
pending_rows bigint, -- rows in buffer not yet consumed
buffer_bytes bigint -- estimated buffer table size in bytes
)
Example:
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Useful for spotting a source table whose CDC buffer is growing unexpectedly (which may indicate a stalled differential refresh or a high-write source that has outpaced the schedule).
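A simple alerting query can flag buffers above a chosen size. The 100,000-row threshold here is arbitrary and should be tuned to your workload:

```sql
-- CDC buffers with a suspicious backlog
SELECT stream_table, source_table, pending_rows,
       pg_size_pretty(buffer_bytes) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
WHERE pending_rows > 100000
ORDER BY pending_rows DESC;
```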
pgtrickle.worker_pool_status
Snapshot of the parallel refresh worker pool. Returns a single row.
pgtrickle.worker_pool_status() → SETOF record(
active_workers int, -- workers currently executing refresh jobs
max_workers int, -- cluster-wide worker budget (GUC)
per_db_cap int, -- per-database dispatch cap (GUC)
parallel_mode text -- current parallel_refresh_mode value
)
Example:
SELECT * FROM pgtrickle.worker_pool_status();
Returns 0 active workers when parallel_refresh_mode = 'off'.
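A quick utilization check can be built from these columns (a sketch; `NULLIF` avoids division by zero when the budget is zero):

```sql
-- Fraction of the cluster-wide worker budget currently in use
SELECT active_workers, max_workers, parallel_mode,
       round(100.0 * active_workers / NULLIF(max_workers, 0), 1) AS utilization_pct
FROM pgtrickle.worker_pool_status();
```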
pgtrickle.parallel_job_status
Active and recently completed scheduler jobs from the pgt_scheduler_jobs
table. Shows jobs that are currently queued or running, plus jobs that
finished within the last max_age_seconds (default 300).
pgtrickle.parallel_job_status(
max_age_seconds int DEFAULT 300
) → SETOF record(
job_id bigint,
unit_key text, -- stable unit identifier (s:42, a:1,2, etc.)
unit_kind text, -- 'singleton', 'atomic_group', 'immediate_closure'
status text, -- 'QUEUED', 'RUNNING', 'SUCCEEDED', etc.
member_count int,
attempt_no int,
scheduler_pid int,
worker_pid int, -- NULL if not yet claimed
enqueued_at timestamptz,
started_at timestamptz, -- NULL if still queued
finished_at timestamptz, -- NULL if not finished
duration_ms float8 -- NULL if not finished
)
Example — show running and recently failed jobs:
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60)
WHERE status NOT IN ('SUCCEEDED');
pgtrickle.trigger_inventory
List all CDC triggers that pg_trickle should have installed, and verify each one exists and is enabled in pg_catalog.
pgtrickle.trigger_inventory() → SETOF record(
source_table text, -- qualified source table name
source_oid bigint,
trigger_name text, -- expected trigger name
trigger_type text, -- 'DML' or 'TRUNCATE'
present bool, -- trigger exists in pg_catalog
enabled bool -- trigger is not disabled
)
A `present = false` row means change capture is broken for that source.
Example:
-- Show only missing or disabled triggers:
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
pgtrickle.fuse_status
Return the circuit-breaker (fuse) state for every stream table that has a fuse configured.
pgtrickle.fuse_status() → SETOF record(
name text, -- stream table name
fuse_mode text, -- 'off', 'on', or 'auto'
fuse_state text, -- 'armed' or 'blown'
fuse_ceiling bigint, -- change-count threshold
fuse_sensitivity int, -- consecutive over-ceiling cycles before blow
blown_at timestamptz, -- when the fuse last blew (NULL if armed)
blow_reason text -- reason the fuse blew (NULL if armed)
)
Example:
-- Check all fuse-enabled stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
-- Find blown fuses
SELECT name, blow_reason, blown_at
FROM pgtrickle.fuse_status()
WHERE fuse_state = 'blown';
Notes:
- Returns one row per stream table where `fuse_mode != 'off'`.
- A blown fuse suspends differential refreshes until cleared with `pgtrickle.reset_fuse()`.
- A `pgtrickle_alert` NOTIFY with event `fuse_blown` is emitted when the fuse trips.
- See Configuration — fuse_default_ceiling for global defaults.
pgtrickle.reset_fuse
Clear a blown circuit-breaker fuse and resume scheduling for the stream table.
pgtrickle.reset_fuse(name text, action text DEFAULT 'apply') → void
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | text | — | Name of the stream table whose fuse to reset. |
| action | text | 'apply' | How to handle the pending changes that caused the fuse to blow. |
Actions:
| Action | Behavior |
|---|---|
| 'apply' | Process all pending changes normally and resume scheduling. |
| 'reinitialize' | Drop and repopulate the stream table from scratch (full refresh from the defining query). |
| 'skip_changes' | Discard the pending changes that triggered the fuse and resume from the current frontier. |
Example:
-- After investigating a bulk load, apply the changes:
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Or skip the oversized batch entirely:
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
-- Or rebuild from scratch:
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');
Notes:
- Errors if the stream table's fuse is not in `'blown'` state.
- After reset, the fuse returns to `'armed'` state and the scheduler resumes normal operation.
- Use `pgtrickle.fuse_status()` to inspect the fuse state before resetting.
- The `'skip_changes'` action advances the frontier past the pending changes without applying them — use only when you are certain the changes should be discarded.
Dependency & Inspection
Visualize dependencies, understand query plans, and audit source table relationships.
pgtrickle.dependency_tree
Render all stream table dependencies as an indented ASCII tree.
pgtrickle.dependency_tree() → SETOF record(
tree_line text, -- indented visual line (├──, └──, │ characters)
node text, -- qualified name (schema.table)
node_type text, -- 'stream_table' or 'source_table'
depth int,
status text, -- NULL for source_table nodes
refresh_mode text -- NULL for source_table nodes
)
Roots (stream tables with no stream-table parents) appear at depth 0. Each
dependent is indented beneath its parent. Plain source tables are rendered as
leaf nodes tagged [src].
Example:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
tree_line status refresh_mode
----------------------------------------+---------+--------------
report_summary ACTIVE DIFFERENTIAL
├── orders_by_region ACTIVE DIFFERENTIAL
│ ├── public.orders [src]
│ └── public.customers [src]
└── revenue_totals ACTIVE DIFFERENTIAL
└── public.orders [src]
pgtrickle.diamond_groups
List all detected diamond dependency groups and their members.
When stream tables form diamond-shaped dependency graphs (multiple paths converge at a single fan-in node), the scheduler groups them for coordinated refresh. This function exposes those groups for monitoring and debugging.
pgtrickle.diamond_groups() → SETOF record(
group_id int4,
member_name text,
member_schema text,
is_convergence bool,
epoch int8,
schedule_policy text
)
Return columns:
| Column | Type | Description |
|---|---|---|
| group_id | int4 | Numeric identifier for the consistency group (1-based). |
| member_name | text | Name of the stream table in this group. |
| member_schema | text | Schema of the stream table. |
| is_convergence | bool | true if this member is a convergence (fan-in) node where multiple paths meet. |
| epoch | int8 | Group epoch counter — advances on each successful atomic refresh of the group. |
| schedule_policy | text | Effective schedule policy for this group ('fastest' or 'slowest'). Computed from convergence node settings with strictest-wins. |
Example:
SELECT * FROM pgtrickle.diamond_groups();
| group_id | member_name | member_schema | is_convergence | epoch | schedule_policy |
|---|---|---|---|---|---|
| 1 | st_b | public | false | 0 | fastest |
| 1 | st_c | public | false | 0 | fastest |
| 1 | st_d | public | true | 0 | fastest |
Notes:
- Singleton stream tables (not part of any diamond) are omitted.
- The DAG is rebuilt on each call from the catalog — results reflect the current dependency graph.
- Groups are only relevant when `diamond_consistency = 'atomic'` is set on the convergence node or globally via the `pg_trickle.diamond_consistency` GUC.
pgtrickle.pgt_scc_status
List all cyclic strongly connected components (SCCs) and their convergence status.
When stream tables form circular dependencies (with pg_trickle.allow_circular = true), they are grouped into SCCs and iterated to a fixed point. This function exposes those groups for monitoring and debugging.
pgtrickle.pgt_scc_status() → SETOF record(
scc_id int4,
member_count int4,
members text[],
last_iterations int4,
last_converged_at timestamptz
)
Return columns:
| Column | Type | Description |
|---|---|---|
| scc_id | int4 | SCC group identifier (1-based). |
| member_count | int4 | Number of stream tables in this SCC. |
| members | text[] | Array of schema.name for each member. |
| last_iterations | int4 | Number of fixpoint iterations in the last convergence (NULL if never iterated). |
| last_converged_at | timestamptz | Timestamp of the most recent refresh among SCC members (NULL if never refreshed). |
Example:
SELECT * FROM pgtrickle.pgt_scc_status();
| scc_id | member_count | members | last_iterations | last_converged_at |
|---|---|---|---|---|
| 1 | 2 | {public.reach_a,public.reach_b} | 3 | 2026-03-15 12:00:00+00 |
Notes:
- Only cyclic SCCs (with `scc_id IS NOT NULL`) are returned; acyclic stream tables are omitted.
- `last_iterations` reflects the maximum `last_fixpoint_iterations` across SCC members.
- Results are queried from the catalog on each call.
pgtrickle.explain_st
Explain the DVM plan for a stream table's defining query.
pgtrickle.explain_st(name text) → SETOF record(
property text,
value text
)
Example:
SELECT * FROM pgtrickle.explain_st('order_totals');
| property | value |
|---|---|
| pgt_name | public.order_totals |
| defining_query | SELECT region, SUM(amount) ... |
| refresh_mode | DIFFERENTIAL |
| status | active |
| is_populated | true |
| dvm_supported | true |
| operator_tree | Aggregate → Scan(orders) |
| output_columns | region, total |
| source_oids | 16384 |
| delta_query | WITH ... SELECT ... |
| frontier | {"orders": "0/15A3B80"} |
| amplification_stats | {"samples":10,"min":1.0,...} |
| refresh_timing_stats | {"samples":10,"min_ms":12.3,...} |
| source_partitions | [{"source":"public.orders",...}] |
| dependency_graph_dot | digraph dependency_subgraph { ... } |
| spill_info | {"temp_blks_read":0,"temp_blks_written":1234,...} |
Output Fields
| Property | Description |
|---|---|
| pgt_name | Fully-qualified stream table name |
| defining_query | The SQL query that defines the stream table |
| refresh_mode | DIFFERENTIAL, FULL, or IMMEDIATE |
| status | Current status (active, suspended, etc.) |
| is_populated | Whether the stream table has been initially populated |
| dvm_supported | Whether the defining query supports differential view maintenance |
| operator_tree | Debug representation of the DVM operator tree |
| output_columns | Comma-separated list of output column names |
| source_oids | Comma-separated list of source table OIDs |
| aggregate_strategies | Per-aggregate maintenance strategies (JSON, if aggregates present) |
| delta_query | The generated delta SQL used for DIFFERENTIAL refresh |
| frontier | Current LSN/watermark frontier (JSON) |
| amplification_stats | Delta amplification ratio statistics over the last 20 refreshes (JSON) |
| refresh_timing_stats | Refresh duration statistics over the last 20 completed refreshes (JSON). Fields: samples, min_ms, max_ms, avg_ms, latest_ms, latest_action |
| source_partitions | Partition info for partitioned source tables (JSON array). Fields per entry: source, partition_key, partitions |
| dependency_graph_dot | Dependency sub-graph in DOT format. Shows immediate upstream sources (ellipses for base tables, boxes for stream tables) and downstream dependents. Paste into a Graphviz renderer to visualize. |
| spill_info | Temp file spill metrics from pg_stat_statements (JSON). Fields: temp_blks_read, temp_blks_written, threshold, exceeds_threshold. Only present when pg_trickle.spill_threshold_blocks > 0. |
Note: Properties are only included when data is available. For example, `source_partitions` only appears when at least one source table is partitioned, and `refresh_timing_stats` only appears after at least one completed refresh.
pgtrickle.list_sources
List the source tables that a stream table depends on.
pgtrickle.list_sources(name text) → SETOF record(
source_table text, -- qualified source table name
source_oid bigint,
source_type text, -- 'table', 'stream_table', etc.
cdc_mode text, -- 'trigger', 'wal', or 'transitioning'
columns_used text -- column-level dependency info (if available)
)
Example:
SELECT * FROM pgtrickle.list_sources('order_totals');
Returns the tables tracked by CDC for the given stream table, along with how they are being tracked. Useful when diagnosing why a stream table is not refreshing or to audit which source tables are being trigger-tracked.
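Combined with `pgtrickle.trigger_inventory()`, this can narrow down a broken capture path. A sketch, assuming both functions report the same qualified `source_table` names:

```sql
-- Sources of 'order_totals' that are trigger-tracked but whose
-- CDC trigger is missing or disabled
SELECT s.source_table, s.cdc_mode, t.trigger_name, t.present, t.enabled
FROM pgtrickle.list_sources('order_totals') s
JOIN pgtrickle.trigger_inventory() t USING (source_table)
WHERE s.cdc_mode = 'trigger'
  AND (NOT t.present OR NOT t.enabled);
```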
Utilities
Utility functions for CDC management and row identity hashing.
pgtrickle.rebuild_cdc_triggers
Rebuild all CDC triggers (function body + trigger DDL) for every source table tracked by pg_trickle. This recreates trigger functions and re-attaches the trigger to each source table.
pgtrickle.rebuild_cdc_triggers() → text
Returns 'done' on success. Emits a WARNING per table on error and
continues processing remaining sources.
When to use:
- After changing `pg_trickle.cdc_trigger_mode` from `row` to `statement` (or vice versa).
- After `ALTER EXTENSION pg_trickle UPDATE` when the CDC trigger function body has changed.
- After restoring from a backup where triggers may have been lost.
Example:
-- Switch to statement-level triggers and rebuild
SET pg_trickle.cdc_trigger_mode = 'statement';
SELECT pgtrickle.rebuild_cdc_triggers();
Notes:
- Called automatically during the `ALTER EXTENSION pg_trickle UPDATE` (0.3.0 → 0.4.0) migration.
- Safe to call at any time — existing triggers are dropped and recreated.
- On error for a specific table, a `WARNING` is logged and processing continues with remaining sources.
pgtrickle.pg_trickle_hash
Compute a 64-bit xxHash row ID from a text value.
pgtrickle.pg_trickle_hash(input text) → bigint
Marked IMMUTABLE, PARALLEL SAFE.
Example:
SELECT pgtrickle.pg_trickle_hash('some_key');
-- Returns: 1234567890123456789
pgtrickle.pg_trickle_hash_multi
Compute a row ID by hashing multiple text values (composite keys).
pgtrickle.pg_trickle_hash_multi(inputs text[]) → bigint
Marked IMMUTABLE, PARALLEL SAFE. Uses \x1E (record separator) between values and \x00NULL\x00 for NULL entries.
Example:
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['key1', 'key2']);
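Because the values are joined with an explicit record separator and NULLs get a distinct encoding, different splits of the same text and NULL-vs-'NULL' inputs should hash differently. A quick sanity check (expected results follow from the encoding described above):

```sql
-- Different splits of the same concatenated text hash differently
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['ab', 'c'])
     = pgtrickle.pg_trickle_hash_multi(ARRAY['a', 'bc']) AS same_hash;

-- A NULL entry is encoded distinctly from the literal string 'NULL'
SELECT pgtrickle.pg_trickle_hash_multi(ARRAY['a', NULL])
     = pgtrickle.pg_trickle_hash_multi(ARRAY['a', 'NULL']) AS same_hash;
```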
Operator Support Matrix — Summary
pg_trickle supports 60+ SQL constructs across three refresh modes. The table below summarises broad categories. For the complete per-operator matrix (including notes on caveats, auxiliary columns and strategies), see DVM_OPERATORS.md.
| Category | FULL | DIFFERENTIAL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Basic SELECT / WHERE / DISTINCT | ✅ | ✅ | ✅ | |
| Joins (INNER, LEFT, RIGHT, FULL, CROSS, LATERAL) | ✅ | ✅ | ✅ | Hybrid delta strategy |
| Subqueries (EXISTS, IN, NOT EXISTS, NOT IN, scalar) | ✅ | ✅ | ✅ | |
| Set operations (UNION ALL, INTERSECT, EXCEPT) | ✅ | ✅ | ✅ | |
| Algebraic aggregates (COUNT, SUM, AVG, STDDEV, …) | ✅ | ✅ | ✅ | Fully invertible delta |
| Semi-algebraic aggregates (MIN, MAX) | ✅ | ✅ | ✅ | Group rescan on ambiguous delete |
| Group-rescan aggregates (STRING_AGG, ARRAY_AGG, …) | ✅ | ⚠️ | ⚠️ | Warning emitted at creation time |
| Window functions (ROW_NUMBER, RANK, LAG, LEAD, …) | ✅ | ✅ | ✅ | Partition-scoped recompute |
| CTEs (non-recursive and WITH RECURSIVE) | ✅ | ✅ | ✅ | Semi-naive / DRed strategies |
| TopK (ORDER BY … LIMIT) | ✅ | ✅ | ✅ | Scoped recomputation |
| LATERAL / set-returning functions / JSON_TABLE | ✅ | ✅ | ✅ | Row-scoped re-execution |
| ST-to-ST dependencies | ✅ | ✅ | ✅ | Differential via change buffers |
| VOLATILE functions | ✅ | ❌ | ❌ | Rejected at creation time |
Legend: ✅ fully supported — ⚠️ supported with caveats — ❌ not supported
For details on each operator's delta strategy, auxiliary columns, and known limitations, see the full Operator Support Matrix.
Expression Support
pg_trickle's DVM parser supports a wide range of SQL expressions in defining queries. All expressions work in both FULL and DIFFERENTIAL modes.
Conditional Expressions
| Expression | Example | Notes |
|---|---|---|
| CASE WHEN … THEN … ELSE … END | CASE WHEN amount > 100 THEN 'high' ELSE 'low' END | Searched CASE |
| CASE <expr> WHEN … THEN … END | CASE status WHEN 1 THEN 'active' WHEN 2 THEN 'inactive' END | Simple CASE |
| COALESCE(a, b, …) | COALESCE(phone, email, 'unknown') | Returns first non-NULL argument |
| NULLIF(a, b) | NULLIF(divisor, 0) | Returns NULL if a = b |
| GREATEST(a, b, …) | GREATEST(score1, score2, score3) | Returns the largest value |
| LEAST(a, b, …) | LEAST(price, max_price) | Returns the smallest value |
Comparison Operators
| Expression | Example | Notes |
|---|---|---|
| IN (list) | category IN ('A', 'B', 'C') | Also supports NOT IN |
| BETWEEN a AND b | price BETWEEN 10 AND 100 | Also supports NOT BETWEEN |
| IS DISTINCT FROM | a IS DISTINCT FROM b | NULL-safe inequality |
| IS NOT DISTINCT FROM | a IS NOT DISTINCT FROM b | NULL-safe equality |
| SIMILAR TO | name SIMILAR TO '%pattern%' | SQL regex matching |
| op ANY(array) | id = ANY(ARRAY[1,2,3]) | Array comparison |
| op ALL(array) | score > ALL(ARRAY[50,60]) | Array comparison |
Boolean Tests
| Expression | Example |
|---|---|
| IS TRUE | active IS TRUE |
| IS NOT TRUE | flag IS NOT TRUE |
| IS FALSE | completed IS FALSE |
| IS NOT FALSE | valid IS NOT FALSE |
| IS UNKNOWN | result IS UNKNOWN |
| IS NOT UNKNOWN | flag IS NOT UNKNOWN |
SQL Value Functions
| Function | Description |
|---|---|
| CURRENT_DATE | Current date |
| CURRENT_TIME | Current time with time zone |
| CURRENT_TIMESTAMP | Current date and time with time zone |
| LOCALTIME | Current time without time zone |
| LOCALTIMESTAMP | Current date and time without time zone |
| CURRENT_ROLE | Current role name |
| CURRENT_USER | Current user name |
| SESSION_USER | Session user name |
| CURRENT_CATALOG | Current database name |
| CURRENT_SCHEMA | Current schema name |
Array and Row Expressions
| Expression | Example | Notes |
|---|---|---|
| ARRAY[…] | ARRAY[1, 2, 3] | Array constructor |
| ROW(…) | ROW(a, b, c) | Row constructor |
| Array subscript | arr[1] | Array element access |
| Field access | (rec).field | Composite type field access |
| Star indirection | (data).* | Expand all fields |
Subquery Expressions
Subqueries are supported in the WHERE clause and SELECT list. They are parsed into dedicated DVM operators with specialized delta computation for incremental maintenance.
| Expression | Example | DVM Operator |
|---|---|---|
| EXISTS (subquery) | WHERE EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Semi-Join |
| NOT EXISTS (subquery) | WHERE NOT EXISTS (SELECT 1 FROM orders WHERE orders.cid = c.id) | Anti-Join |
| IN (subquery) | WHERE id IN (SELECT product_id FROM order_items) | Semi-Join (rewritten as equality) |
| NOT IN (subquery) | WHERE id NOT IN (SELECT product_id FROM order_items) | Anti-Join |
| ALL (subquery) | WHERE price > ALL (SELECT price FROM competitors) | Anti-Join (NULL-safe) |
| Scalar subquery (SELECT) | SELECT (SELECT max(price) FROM products) AS max_p | Scalar Subquery |
Notes:
- `EXISTS` and `IN (subquery)` in the `WHERE` clause are transformed into semi-join operators. `NOT EXISTS` and `NOT IN (subquery)` become anti-join operators.
- Multi-column `IN (subquery)` is not supported (e.g., `WHERE (a, b) IN (SELECT x, y FROM ...)`). Rewrite as `WHERE EXISTS (SELECT 1 FROM ... WHERE a = x AND b = y)` for equivalent semantics.
- Multiple subqueries in the same `WHERE` clause are supported when combined with `AND`. Subqueries combined with `OR` are also supported — they are automatically rewritten into a `UNION` of separate filtered queries.
- Scalar subqueries in the `SELECT` list are supported as long as they return exactly one row and one column.
- `ALL (subquery)` is supported — see the worked example below.
ALL (subquery) — Worked Example
ALL (subquery) tests whether a comparison holds against every row returned
by the subquery. pg_trickle rewrites it to a NULL-safe anti-join so it can be
maintained incrementally.
Comparison operators supported: >, >=, <, <=, =, <>
Example — products cheaper than all competitors:
-- Source tables
CREATE TABLE products (
id INT PRIMARY KEY,
name TEXT,
price NUMERIC
);
CREATE TABLE competitor_prices (
id INT PRIMARY KEY,
product_id INT,
price NUMERIC
);
-- Sample data
INSERT INTO products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 24.99), (3, 'Gizmo', 14.99);
INSERT INTO competitor_prices VALUES (1, 1, 12.99), (2, 1, 11.50), (3, 2, 19.99), (4, 3, 14.99);
-- Stream table: find products priced below ALL competitor prices
SELECT pgtrickle.create_stream_table(
name => 'cheapest_products',
query => $$
SELECT p.id, p.name, p.price
FROM products p
WHERE p.price < ALL (
SELECT cp.price
FROM competitor_prices cp
WHERE cp.product_id = p.id
)
$$,
schedule => '1m'
);
Result: Widget (9.99 < all of [12.99, 11.50]) is included. Gadget (24.99 ≮ 19.99) is excluded. Gizmo (14.99 ≮ 14.99) is excluded.
How pg_trickle handles it internally:
- `WHERE price < ALL (SELECT ...)` is parsed into an anti-join with a NULL-safe condition.
- The condition `NOT (x op col)` is wrapped as `(col IS NULL OR NOT (x op col))` to correctly handle NULL values in the subquery — if any subquery row is NULL, the ALL comparison fails (standard SQL semantics).
- The anti-join uses the same incremental delta computation as `NOT EXISTS`, so changes to either `products` or `competitor_prices` are propagated efficiently.
Other common patterns:
-- Employees whose salary meets or exceeds all department maximums
WHERE salary >= ALL (SELECT max_salary FROM department_caps)
-- Orders with ratings better than all thresholds
WHERE rating > ALL (SELECT min_rating FROM quality_thresholds)
Auto-Rewrite Pipeline
pg_trickle transparently rewrites certain SQL constructs before parsing. These rewrites are applied automatically and require no user action:
| Order | Trigger | Rewrite |
|---|---|---|
| #0 | View references in FROM | Inline view body as subquery |
| #1 | DISTINCT ON (expr) | Convert to ROW_NUMBER() OVER (PARTITION BY expr ORDER BY ...) = 1 subquery |
| #2 | GROUPING SETS / CUBE / ROLLUP | Decompose into UNION ALL of separate GROUP BY queries |
| #3 | Scalar subquery in WHERE | Convert to CROSS JOIN with inline view |
| #4 | Correlated scalar subquery in SELECT | Convert to LEFT JOIN with grouped inline view |
| #5 | EXISTS/IN inside OR | Split into UNION of separate filtered queries |
| #6 | Multiple PARTITION BY clauses | Split into joined subqueries, one per distinct partitioning |
| #7 | Window functions inside expressions | Lift to inner subquery with synthetic __pgt_wf_N columns (see below) |
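As an illustration of rewrite #1, here is a DISTINCT ON query and the rough shape pg_trickle derives from it. The exact generated SQL may differ; this is a sketch of the documented transformation:

```sql
-- What you write: latest order per customer
SELECT DISTINCT ON (customer_id) customer_id, order_date, total
FROM orders
ORDER BY customer_id, order_date DESC;

-- Roughly what the rewrite produces
SELECT customer_id, order_date, total
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY order_date DESC) AS rn
  FROM orders
) sub
WHERE rn = 1;
```

The rewritten form uses only operators the DVM engine already maintains incrementally (window functions with partition-scoped recompute).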
Window Functions in Expressions (Auto-Rewrite)
Window functions nested inside expressions (e.g., CASE WHEN ROW_NUMBER() ...,
ABS(RANK() OVER (...) - 5)) are automatically rewritten. pg_trickle lifts
each window function call into a synthetic column in an inner subquery, then
applies the original expression in the outer SELECT.
This rewrite is transparent — you write your query naturally and pg_trickle handles it:
Your query:
SELECT
id,
name,
CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
THEN 'top earner'
ELSE 'other'
END AS rank_label
FROM employees
What pg_trickle generates internally:
SELECT
"__pgt_wf_inner".id,
"__pgt_wf_inner".name,
CASE WHEN "__pgt_wf_inner"."__pgt_wf_1" = 1
THEN 'top earner'
ELSE 'other'
END AS "rank_label"
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS "__pgt_wf_1"
FROM employees
) "__pgt_wf_inner"
The inner subquery produces the window function result as a plain column
(__pgt_wf_1), which the DVM engine can maintain incrementally using its
existing window function support. The outer expression is then a simple
column reference.
More examples:
-- Arithmetic with window functions
SELECT id, ABS(RANK() OVER (ORDER BY score) - 5) AS adjusted_rank
FROM players
-- COALESCE with window function
SELECT id, COALESCE(LAG(value) OVER (ORDER BY ts), 0) AS prev_value
FROM sensor_readings
-- Multiple window functions in expressions
SELECT id,
ROW_NUMBER() OVER (ORDER BY created_at) * 100 AS seq,
SUM(amount) OVER (ORDER BY created_at) / COUNT(*) OVER (ORDER BY created_at) AS running_avg
FROM transactions
All of these are handled automatically — each distinct window function call
is extracted to its own __pgt_wf_N synthetic column.
HAVING Clause
HAVING is fully supported. The filter predicate is applied on top of the aggregate delta computation — groups that pass the HAVING condition are included in the stream table.
SELECT pgtrickle.create_stream_table(
name => 'big_departments',
query => 'SELECT department, COUNT(*) AS cnt FROM employees GROUP BY department HAVING COUNT(*) > 10',
schedule => '1m'
);
Tables Without Primary Keys (Keyless Tables)
Tables without a primary key can be used as sources. pg_trickle generates a content-based row identity
by hashing all column values using pg_trickle_hash_multi(). This allows DIFFERENTIAL mode to work,
though at the cost of being unable to distinguish truly duplicate rows (rows with identical values in all columns).
-- No primary key — pg_trickle uses content hashing for row identity
CREATE TABLE events (ts TIMESTAMPTZ, payload JSONB);
SELECT pgtrickle.create_stream_table(
name => 'event_summary',
query => 'SELECT payload->>''type'' AS event_type, COUNT(*) FROM events GROUP BY 1',
schedule => '1m'
);
Known Limitation — Duplicate Rows in Keyless Tables (G7.1)
When a keyless table contains exact duplicate rows (identical values in every column), content-based hashing produces the same `__pgt_row_id` for each copy. Consequences:
- INSERT of a duplicate row may appear as a no-op (the hash already exists in the stream table).
- DELETE of one copy may delete all copies (the MERGE matches on `__pgt_row_id`, hitting every duplicate).
- Aggregate counts over keyless tables with duplicates may drift from the true query result.
Recommendation: Add a `PRIMARY KEY` or at least a `UNIQUE` constraint to source tables used in DIFFERENTIAL mode. This eliminates the ambiguity entirely. If duplicates are expected and correctness matters, use `FULL` refresh mode, which always recomputes from scratch.
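If the table has no natural key, a surrogate identity column is usually the cheapest fix. This is standard PostgreSQL DDL, not a pg_trickle feature; note it rewrites the table to backfill the new column:

```sql
-- Give the keyless events table a synthetic primary key
ALTER TABLE events
  ADD COLUMN event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```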
Volatile Function Detection
pg_trickle checks all functions and operators in the defining query against pg_proc.provolatile:
- VOLATILE functions (e.g., `random()`, `clock_timestamp()`, `gen_random_uuid()`) are rejected in DIFFERENTIAL and IMMEDIATE modes because they produce different results on each evaluation, breaking delta correctness.
- VOLATILE operators — custom operators backed by volatile functions are also detected. The check resolves the operator's implementation function via `pg_operator.oprcode` and checks its volatility in `pg_proc`.
- STABLE functions (e.g., `now()`, `current_timestamp`, `current_setting()`) produce a warning in DIFFERENTIAL and IMMEDIATE modes — they are consistent within a single refresh but may differ between refreshes.
- IMMUTABLE functions are always safe and produce no warnings.
FULL mode accepts all volatility classes since it re-evaluates the entire query each time.
Volatile Function Policy (VOL-1)
The pg_trickle.volatile_function_policy GUC controls how volatile functions are handled:
| Value | Behavior |
|---|---|
| reject (default) | ERROR — volatile functions are rejected at creation time. |
| warn | WARNING emitted but creation proceeds. Delta correctness is not guaranteed. |
| allow | Silent — no warning or error. Use when you understand the implications. |
-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';
-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';
-- Restore default (reject volatile functions)
SET pg_trickle.volatile_function_policy = 'reject';
COLLATE Expressions
COLLATE clauses on expressions are supported:
SELECT pgtrickle.create_stream_table(
name => 'sorted_names',
query => 'SELECT name COLLATE "C" AS c_name FROM users',
schedule => '1m'
);
IS JSON Predicate (PostgreSQL 16+)
The IS JSON predicate validates whether a value is valid JSON. All variants are supported:
-- Filter rows with valid JSON
SELECT pgtrickle.create_stream_table(
name => 'valid_json_events',
query => 'SELECT id, payload FROM events WHERE payload::text IS JSON',
schedule => '1m'
);
-- Type-specific checks
SELECT pgtrickle.create_stream_table(
name => 'json_objects_only',
query => 'SELECT id, data IS JSON OBJECT AS is_obj,
data IS JSON ARRAY AS is_arr,
data IS JSON SCALAR AS is_scalar
FROM json_data',
schedule => '1m',
refresh_mode => 'FULL'
);
Supported variants: IS JSON, IS JSON OBJECT, IS JSON ARRAY, IS JSON SCALAR, IS NOT JSON (all forms), WITH UNIQUE KEYS.
SQL/JSON Constructors (PostgreSQL 16+)
SQL-standard JSON constructor functions are supported in both FULL and DIFFERENTIAL modes:
-- JSON_OBJECT: construct a JSON object from key-value pairs
SELECT pgtrickle.create_stream_table(
name => 'user_json',
query => 'SELECT id, JSON_OBJECT(''name'' : name, ''age'' : age) AS data FROM users',
schedule => '1m'
);
-- JSON_ARRAY: construct a JSON array from values
SELECT pgtrickle.create_stream_table(
name => 'value_arrays',
query => 'SELECT id, JSON_ARRAY(a, b, c) AS arr FROM measurements',
schedule => '1m',
refresh_mode => 'FULL'
);
-- JSON(): parse a text value as JSON
-- JSON_SCALAR(): wrap a scalar value as JSON
-- JSON_SERIALIZE(): serialize a JSON value to text
Note: `JSON_ARRAYAGG()` and `JSON_OBJECTAGG()` are SQL-standard aggregate functions fully recognized by the DVM engine. In DIFFERENTIAL mode, they use the group-rescan strategy (affected groups are re-aggregated from source data). The full deparsed SQL is preserved to handle the special `key : value`, `ABSENT ON NULL`, `ORDER BY`, and `RETURNING` clause syntax.
JSON_TABLE (PostgreSQL 17+)
JSON_TABLE() generates a relational table from JSON data. It is supported in the FROM clause in both FULL and DIFFERENTIAL modes. Internally, it is modeled as a LateralFunction.
-- Extract structured data from a JSON column
SELECT pgtrickle.create_stream_table(
name => 'user_phones',
query => $$SELECT u.id, j.phone_type, j.phone_number
FROM users u,
JSON_TABLE(u.contact_info, '$.phones[*]'
COLUMNS (
phone_type TEXT PATH '$.type',
phone_number TEXT PATH '$.number'
)
) AS j$$,
schedule => '1m'
);
Supported column types:
- Regular columns — name TYPE PATH '$.path' (with optional ON ERROR / ON EMPTY behaviors)
- EXISTS columns — name TYPE EXISTS PATH '$.path'
- Formatted columns — name TYPE FORMAT JSON PATH '$.path'
- Nested columns — NESTED PATH '$.path' COLUMNS (...)
The PASSING clause is also supported for passing named variables to path expressions.
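For example, a sketch of PASSING in use (the jsonpath filter and source table are illustrative):

```sql
-- Filter phones by type using a PASSING variable ($t) in the path:
SELECT pgtrickle.create_stream_table(
    name => 'mobile_phones',
    query => $$SELECT u.id, j.phone_number
               FROM users u,
               JSON_TABLE(u.contact_info, '$.phones[*] ? (@.type == $t)'
                   PASSING 'mobile' AS t
                   COLUMNS (phone_number TEXT PATH '$.number')
               ) AS j$$,
    schedule => '1m'
);
```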
Unsupported Expression Types
The following are rejected with clear error messages rather than producing broken SQL:
| Expression | Error Behavior | Suggested Rewrite |
|---|---|---|
TABLESAMPLE | Rejected — stream tables materialize the complete result set | Use WHERE random() < 0.1 if sampling is needed |
FOR UPDATE / FOR SHARE | Rejected — stream tables do not support row-level locking | Remove the locking clause |
| Unknown node types | Rejected with type information | — |
Note: Window functions inside expressions (e.g.,
CASE WHEN ROW_NUMBER() OVER (...) ...) were unsupported in earlier versions but are now automatically rewritten — see Auto-Rewrite Pipeline § Window Functions in Expressions.
Restrictions & Interoperability
Stream tables are standard PostgreSQL heap tables stored in the pgtrickle schema with an additional __pgt_row_id BIGINT PRIMARY KEY column managed by the refresh engine. This section describes what you can and cannot do with them.
Referencing Other Stream Tables
Stream tables can reference other stream tables in their defining query. This creates a dependency edge in the internal DAG, and the scheduler refreshes upstream tables before downstream ones. By default, cycles are detected and rejected at creation time.
When pg_trickle.allow_circular = true, circular dependencies are allowed for stream tables that use DIFFERENTIAL refresh mode and have monotone defining queries (no aggregates, EXCEPT, window functions, or NOT EXISTS/NOT IN). Cycle members are assigned an scc_id and the scheduler iterates them to a fixed point. Non-monotone operators are rejected because they prevent convergence.
-- ST1 reads from a base table
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
-- ST2 reads from ST1
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m'
);
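If you do enable circular dependencies, cycle membership and convergence can be observed through the monitoring view documented later in this reference:

```sql
-- Inspect SCC membership and fixpoint behavior for cyclic stream tables:
SELECT pgt_name, scc_id, last_fixpoint_iterations
FROM pgtrickle.pg_stat_stream_tables
WHERE scc_id IS NOT NULL;
```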
Views as Sources in Defining Queries
PostgreSQL views can be used as source tables in a stream table's defining query. Views are automatically inlined — replaced with their underlying SELECT definition as subqueries — so CDC triggers land on the actual base tables.
CREATE VIEW active_orders AS
SELECT * FROM orders WHERE status = 'active';
-- This works (views are auto-inlined):
SELECT pgtrickle.create_stream_table(
name => 'order_summary',
query => 'SELECT customer_id, COUNT(*) FROM active_orders GROUP BY customer_id',
schedule => '1m'
);
-- Internally, 'active_orders' is replaced with:
-- (SELECT ... FROM orders WHERE status = 'active') AS active_orders
Nested views (view → view → table) are fully expanded via a fixpoint loop. Column-renaming views (CREATE VIEW v(a, b) AS ...) work correctly — pg_get_viewdef() produces the proper column aliases.
When a view is inlined, the user's original SQL is stored in the original_query catalog column for reinit and introspection. The defining_query column contains the expanded (post-inlining) form.
DDL hooks: CREATE OR REPLACE VIEW on a view that was inlined into a stream table marks that ST for reinit. DROP VIEW sets affected STs to ERROR status.
Materialized views are rejected in DIFFERENTIAL mode — their stale-snapshot semantics prevent CDC triggers from tracking changes. Use the underlying query directly, or switch to FULL mode. In FULL mode, materialized views are allowed (no CDC needed).
Foreign tables are rejected in DIFFERENTIAL mode — row-level triggers cannot be created on foreign tables. Use FULL mode instead.
Partitioned Tables as Sources
Partitioned tables are fully supported as source tables in both FULL and DIFFERENTIAL modes. CDC triggers are installed on the partitioned parent table, and PostgreSQL 13+ ensures the trigger fires for all DML routed to child partitions. The change buffer uses the parent table's OID (pgtrickle_changes.changes_<parent_oid>).
CREATE TABLE orders (
id INT, region TEXT, amount NUMERIC
) PARTITION BY LIST (region);
CREATE TABLE orders_us PARTITION OF orders FOR VALUES IN ('US');
CREATE TABLE orders_eu PARTITION OF orders FOR VALUES IN ('EU');
-- Works — inserts into any partition are captured:
SELECT pgtrickle.create_stream_table(
name => 'order_summary',
query => 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule => '1m'
);
ATTACH PARTITION detection: When a new partition is attached to a tracked
source table via ALTER TABLE parent ATTACH PARTITION child ..., pg_trickle's
DDL event trigger detects the change in partition structure and automatically
marks affected stream tables for reinitialization. This ensures pre-existing rows
in the newly attached partition are included on the next refresh. DETACH
PARTITION is also detected and triggers reinitialization.
WAL mode: When using WAL-based CDC (cdc_mode = 'wal'), publications for
partitioned source tables are created with publish_via_partition_root = true.
This ensures changes from child partitions are published under the parent
table's identity, matching trigger-mode CDC behavior.
Note: pg_trickle targets PostgreSQL 18. On PostgreSQL 12 or earlier (not supported), parent triggers do not fire for partition-routed rows, which would cause silent data loss.
Foreign Tables as Sources
Foreign tables (via postgres_fdw or other FDWs) can be used as stream table
sources with these constraints:
| CDC Method | Supported? | Why |
|---|---|---|
| Trigger-based | ❌ No | Foreign tables don't support row-level triggers |
| WAL-based | ❌ No | Foreign tables don't generate local WAL entries |
| FULL refresh | ✅ Yes | Re-executes the remote query each cycle |
| Polling-based | ✅ Yes | When pg_trickle.foreign_table_polling = on |
-- Foreign table source — FULL refresh only
SELECT pgtrickle.create_stream_table(
name => 'remote_summary',
query => 'SELECT region, SUM(amount) FROM remote_orders GROUP BY region',
schedule => '5m',
refresh_mode => 'FULL'
);
When pg_trickle detects a foreign table source, it emits an INFO message explaining the constraints. If you request DIFFERENTIAL mode without polling enabled, creation succeeds but each refresh falls back to FULL.
Polling-based CDC creates a local snapshot table and computes EXCEPT ALL
differences on each refresh. Enable with:
SET pg_trickle.foreign_table_polling = on;
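Conceptually, each polling refresh computes a diff with roughly this shape (the snapshot table name is illustrative — the internal naming is not part of the public API):

```sql
-- Rows added on the remote side since the last poll:
SELECT * FROM remote_orders
EXCEPT ALL
SELECT * FROM local_snapshot_remote_orders;

-- Rows removed on the remote side since the last poll:
SELECT * FROM local_snapshot_remote_orders
EXCEPT ALL
SELECT * FROM remote_orders;
```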
For a complete step-by-step setup guide, see the Foreign Table Sources tutorial.
IMMEDIATE Mode Query Restrictions
The 'IMMEDIATE' refresh mode supports all SQL constructs available in 'DIFFERENTIAL' and 'FULL' modes. Queries are validated at stream table creation and when switching to IMMEDIATE mode via alter_stream_table.
Supported in IMMEDIATE mode:
- Simple SELECT ... FROM table scans, filters, projections
- JOIN (INNER, LEFT, FULL OUTER)
- GROUP BY with standard aggregates (COUNT, SUM, AVG, MIN, MAX, etc.)
- DISTINCT
- Non-recursive WITH (CTEs)
- UNION ALL, INTERSECT, EXCEPT
- EXISTS / IN subqueries (SemiJoin, AntiJoin)
- Subqueries in FROM
- Window functions (ROW_NUMBER, RANK, DENSE_RANK, etc.)
- LATERAL subqueries
- LATERAL set-returning functions (unnest(), jsonb_array_elements(), etc.)
- Scalar subqueries in SELECT
- Cascading IMMEDIATE stream tables (ST depending on another IMMEDIATE ST)
- Recursive CTEs (WITH RECURSIVE) — uses semi-naive evaluation (INSERT-only) or Delete-and-Rederive (DELETE/UPDATE); bounded by pg_trickle.ivm_recursive_max_depth (default 100) to guard against infinite loops from cyclic data
Not yet supported in IMMEDIATE mode:
None — all constructs that work in 'DIFFERENTIAL' mode are now also available in
'IMMEDIATE' mode.
Notes on WITH RECURSIVE in IMMEDIATE mode:
- A __pgt_depth counter is injected into the generated semi-naive SQL. Propagation stops when the counter reaches ivm_recursive_max_depth (default 100). Raise this GUC for deeper hierarchies or set it to 0 to disable the guard.
- A WARNING is emitted at stream table creation time reminding operators to monitor for stack depth limit exceeded errors on very deep hierarchies.
- Non-linear recursion (multiple self-references) is rejected — PostgreSQL itself enforces this restriction.
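A sketch of a recursive hierarchy in IMMEDIATE mode (the org table is illustrative):

```sql
-- Allow deeper hierarchies than the default guard of 100 levels:
SET pg_trickle.ivm_recursive_max_depth = 500;

SELECT pgtrickle.create_stream_table(
    name => 'org_tree',
    query => $$WITH RECURSIVE tree AS (
                   SELECT id, parent_id, name FROM org WHERE parent_id IS NULL
                   UNION ALL
                   SELECT o.id, o.parent_id, o.name
                   FROM org o JOIN tree t ON o.parent_id = t.id
               )
               SELECT id, parent_id, name FROM tree$$,
    schedule => '1m',
    refresh_mode => 'IMMEDIATE'
);
```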
Attempting to create a stream table with an unsupported construct produces a clear error message.
Logical Replication Targets
Tables that receive data via logical replication require special consideration. Changes arriving via replication do not fire normal row-level triggers, which means CDC triggers will miss those changes.
pg_trickle emits a WARNING at stream table creation time if any source table is detected as a logical replication target (via pg_subscription_rel).
Workarounds:
- Use cdc_mode = 'wal' for WAL-based CDC that captures all changes regardless of origin.
- Use FULL refresh mode, which recomputes entirely from the current table state.
- Set a frequent refresh schedule with FULL mode to limit staleness.
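For example, the FULL-mode workaround looks like this (table names are illustrative):

```sql
-- 'replicated_events' receives rows via a subscription, so trigger-based
-- CDC would miss them; FULL mode recomputes from current state every 30s.
SELECT pgtrickle.create_stream_table(
    name => 'replicated_summary',
    query => 'SELECT region, COUNT(*) AS n FROM replicated_events GROUP BY region',
    schedule => '30s',
    refresh_mode => 'FULL'
);
```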
Views on Stream Tables
PostgreSQL views can reference stream tables. The view reflects the data as of the most recent refresh.
CREATE VIEW top_customers AS
SELECT customer_id, total
FROM pgtrickle.order_totals
WHERE total > 500
ORDER BY total DESC;
Materialized Views on Stream Tables
Materialized views can reference stream tables, though this is typically redundant (both are physical snapshots of a query). The materialized view requires its own REFRESH MATERIALIZED VIEW — it does not auto-refresh when the stream table refreshes.
Logical Replication of Stream Tables
Stream tables can be published for logical replication like any ordinary table:
-- On publisher
CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;
-- On subscriber
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=... dbname=...'
PUBLICATION my_pub;
Caveats:
- The __pgt_row_id column is replicated (it is the primary key), which is an internal implementation detail.
- The subscriber receives materialized data, not the defining query. Refreshes on the publisher propagate as normal DML via logical replication.
- Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries.
- The internal change buffer tables (pgtrickle_changes.changes_<oid>) and catalog tables are not published by default; subscribers only receive the final output.
Known Delta Computation Limitations
The following edge cases produce incorrect delta results in DIFFERENTIAL mode under specific data mutation patterns. They have no effect on FULL mode.
JOIN Key Column Change + Simultaneous Right-Side Delete — Fixed (EC-01)
Resolved in v0.14.0. This limitation no longer exists — the delta query now uses a pre-change right snapshot (R₀) for DELETE deltas, ensuring stale rows are correctly removed even when the join partner is simultaneously deleted.
The fix splits Part 1 of the JOIN delta into two arms:
- Part 1a (inserts):
ΔL_inserts ⋈ R₁— uses current right state - Part 1b (deletes):
ΔL_deletes ⋈ R₀— uses pre-change right state
R₀ is reconstructed as R_current EXCEPT ALL ΔR_inserts UNION ALL ΔR_deletes (or via
NOT EXISTS anti-join for simple Scan nodes). This ensures the DELETE half always
finds the old join partner, even if that partner was deleted in the same cycle.
The fix applies to INNER JOIN, LEFT JOIN, and FULL OUTER JOIN delta operators. See DVM_OPERATORS.md for implementation details.
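In SQL terms, the R₀ reconstruction has roughly this shape (relation names are illustrative of the generated delta query, not literal internals):

```sql
-- Pre-change right snapshot: undo this cycle's right-side changes
-- by subtracting the inserts and adding back the deletes.
(SELECT * FROM r_current
 EXCEPT ALL
 SELECT * FROM delta_r_inserts)
UNION ALL
SELECT * FROM delta_r_deletes;
```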
CUBE/ROLLUP Expansion Limit
CUBE(a, b, c, ..., n) on N columns generates $2^N$ grouping-set branches (a UNION ALL of $2^N$ queries).
pg_trickle rejects CUBE/ROLLUP that would produce more than 64 branches to prevent runaway
memory usage during query generation. Use explicit GROUPING SETS(...) instead:
-- Rejected: CUBE(a, b, c, d, e, f, g) would generate 128 branches
-- Use instead:
SELECT pgtrickle.create_stream_table(
name => 'multi_dim',
query => 'SELECT a, b, c, SUM(v) FROM t
GROUP BY GROUPING SETS ((a, b, c), (a, b), (a), ())',
schedule => '5m'
);
What Is NOT Allowed
| Operation | Restriction | Reason |
|---|---|---|
Direct DML (INSERT, UPDATE, DELETE) | ❌ Not supported | Stream table contents are managed exclusively by the refresh engine. |
Direct DDL (ALTER TABLE) | ❌ Not supported | Use pgtrickle.alter_stream_table() to change the defining query or schedule. |
| Foreign keys referencing or from a stream table | ❌ Not supported | The refresh engine performs bulk MERGE operations that do not respect FK ordering. |
| User-defined triggers on stream tables | ✅ Supported (DIFFERENTIAL) | In DIFFERENTIAL mode, the refresh engine decomposes changes into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW. Row-level triggers are suppressed during FULL refresh. Controlled by pg_trickle.user_triggers GUC (default: auto). |
TRUNCATE on a stream table | ❌ Not supported | Use pgtrickle.refresh_stream_table() to reset data. |
Tip: The __pgt_row_id column is visible but should be ignored by consuming queries — it is an implementation detail used for delta MERGE operations.
Internal __pgt_* Auxiliary Columns
Stream tables may contain additional hidden columns whose names begin with __pgt_. These are managed exclusively by the refresh engine — they are not part of the user-visible schema and should never be read or written by application queries.
__pgt_row_id — Row identity (always present)
Every stream table has a BIGINT PRIMARY KEY column named __pgt_row_id. It is a content hash of all output columns (xxHash3-128 with Fibonacci-mixing of multiple column hashes), updated by the refresh engine on every MERGE. It is used as the MERGE join key to detect inserts/updates/deletes.
__pgt_count — Group multiplicity (aggregates & DISTINCT)
Added when the defining query contains GROUP BY, DISTINCT, UNION ALL ... GROUP BY, or any aggregate expression that requires tracking how many source rows contribute to each output row.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 | GROUP BY, DISTINCT, COUNT(*), SUM(...), AVG(...), STDDEV(...), VAR(...), UNION deduplication |
__pgt_count_l / __pgt_count_r — Dual multiplicity (INTERSECT / EXCEPT)
Added when the defining query contains INTERSECT or EXCEPT. Stores independently the left-branch and right-branch row counts for Z-set delta algebra.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 each | INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL |
__pgt_aux_sum_<alias> / __pgt_aux_count_<alias> — Running totals for AVG
Pairs of auxiliary columns added for each AVG(expr) in the query. Instead of recomputing the average from scratch on each delta, the refresh engine maintains a running sum and count and derives the average algebraically.
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 (sum), BIGINT NOT NULL DEFAULT 0 (count) | Any AVG(expr) in GROUP BY query |
Named __pgt_aux_sum_<output_alias> and __pgt_aux_count_<output_alias>, where <output_alias> is the column alias for the AVG expression in the SELECT list.
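Conceptually, the maintenance step is a simple algebraic update. This is a simplified sketch — the real engine performs it inside a generated MERGE, and the table, group key, and delta names below are illustrative:

```sql
-- Fold a per-group delta (delta_sum, delta_count) into the accumulators,
-- then rederive the average; NULLIF guards against an emptied group.
UPDATE pgtrickle.sales_avg AS st SET
    __pgt_aux_sum_avg_amount   = st.__pgt_aux_sum_avg_amount   + d.delta_sum,
    __pgt_aux_count_avg_amount = st.__pgt_aux_count_avg_amount + d.delta_count,
    avg_amount = (st.__pgt_aux_sum_avg_amount + d.delta_sum)
                 / NULLIF(st.__pgt_aux_count_avg_amount + d.delta_count, 0)
FROM delta AS d
WHERE st.region = d.region;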
__pgt_aux_sum2_<alias> — Sum-of-squares for STDDEV / VARIANCE
Added alongside the sum/count pair when the query contains STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, or VAR_SAMP. Enables O(1) algebraic computation of variance from the Welford identity.
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 | STDDEV(...), STDDEV_POP(...), STDDEV_SAMP(...), VARIANCE(...), VAR_POP(...), VAR_SAMP(...) |
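The algebraic step amounts to the textbook identity for sample variance (column names follow the documented scheme but are illustrative):

```sql
-- VAR_SAMP = (sum(x^2) - sum(x)^2 / n) / (n - 1), derived from the
-- maintained accumulators without rescanning the group:
SELECT (__pgt_aux_sum2_v
        - (__pgt_aux_sum_v * __pgt_aux_sum_v) / NULLIF(__pgt_aux_count_v, 0))
       / NULLIF(__pgt_aux_count_v - 1, 0) AS var_samp
FROM pgtrickle.metrics_stats;
```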
__pgt_aux_sumx_* / __pgt_aux_sumy_* / __pgt_aux_sumxy_* / __pgt_aux_sumx2_* / __pgt_aux_sumy2_* — Cross-product accumulators for regression aggregates
Five auxiliary columns per aggregate, used for O(1) algebraic maintenance of the twelve PostgreSQL regression and correlation aggregates.
| Type | Triggers |
|---|---|
NUMERIC NOT NULL DEFAULT 0 (five columns per aggregate) | CORR(Y,X), COVAR_POP(Y,X), COVAR_SAMP(Y,X), REGR_AVGX(Y,X), REGR_AVGY(Y,X), REGR_COUNT(Y,X), REGR_INTERCEPT(Y,X), REGR_R2(Y,X), REGR_SLOPE(Y,X), REGR_SXX(Y,X), REGR_SXY(Y,X), REGR_SYY(Y,X) |
The five columns are named with base prefix __pgt_aux_<kind>_<output_alias> where <kind> is sumx, sumy, sumxy, sumx2, or sumy2. The shared group count is stored in the companion __pgt_aux_count_<output_alias> column.
__pgt_aux_nonnull_<alias> — Non-NULL count for SUM + FULL OUTER JOIN
Added when the query contains SUM(expr) inside a FULL OUTER JOIN aggregate. When matched rows transition to unmatched (null-padded), standard algebraic SUM would produce 0 instead of NULL. This counter tracks how many non-NULL argument values exist in each group; when it reaches zero the SUM is definitively NULL without a full rescan.
| Type | Triggers |
|---|---|
BIGINT NOT NULL DEFAULT 0 | SUM(expr) in a query with FULL OUTER JOIN at the top level |
__pgt_wf_<N> — Window function lift-out (query rewrite)
Added at query-rewrite time (before storage table creation) when the defining query contains window functions embedded inside larger expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...). The engine lifts the window function to a synthetic inner-subquery column so the outer SELECT can reference it by alias.
| Type | Triggers |
|---|---|
| Inherits the window-function return type | Window function inside expression — e.g. RANK(), ROW_NUMBER(), DENSE_RANK(), LAG(), LEAD(), etc. |
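The rewrite is roughly this source-to-source transformation (a sketch with illustrative table and column names):

```sql
-- Before: window function embedded inside an expression.
SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts DESC) = 1
            THEN 'latest' ELSE 'older' END AS tag
FROM t;

-- After lift-out: the window function becomes a synthetic subquery column.
SELECT CASE WHEN __pgt_wf_1 = 1 THEN 'latest' ELSE 'older' END AS tag
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY k ORDER BY ts DESC) AS __pgt_wf_1
      FROM t) AS s;
```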
__pgt_depth — Recursion depth counter (recursive CTE)
Present only inside the DVM-generated SQL for recursive CTE queries. Used to limit unbounded recursion in semi-naive evaluation. Not added as a permanent column to the storage table.
Rule of thumb: Unless you see an ALTER TABLE query mentioning one of these columns, they are transparent to consuming queries. Never SELECT __pgt_* columns in application code — their names, types, and presence may change across minor versions.
Row-Level Security (RLS)
Stream tables follow the same RLS model as PostgreSQL's built-in
MATERIALIZED VIEW: the refresh always materializes the full, unfiltered
result set. Access control is applied at read time via RLS policies on the
stream table itself.
How It Works
| Area | Behavior |
|---|---|
| RLS on source tables | Ignored during refresh. The scheduler runs as superuser; manual refresh_stream_table() and IMMEDIATE-mode triggers bypass RLS via SET LOCAL row_security = off / SECURITY DEFINER. The stream table always contains all rows. |
| RLS on the stream table | Works naturally. Enable RLS and create policies on the stream table to filter reads per role — exactly as you would on any regular table. |
| RLS policy changes on source tables | CREATE POLICY, ALTER POLICY, and DROP POLICY on a source table are detected by pg_trickle's DDL event trigger and mark the stream table for reinitialisation. |
| ENABLE/DISABLE RLS on source tables | ALTER TABLE … ENABLE ROW LEVEL SECURITY and DISABLE ROW LEVEL SECURITY on a source table mark the stream table for reinitialisation. |
| Change buffer tables | RLS is explicitly disabled on all change buffer tables (pgtrickle_changes.changes_*) so CDC trigger inserts always succeed regardless of schema-level RLS settings. |
| IMMEDIATE mode | IVM trigger functions are SECURITY DEFINER with a locked search_path, so the delta query always sees all rows. The DML issued by the calling user is still filtered by that user's RLS policies on the source table — only the stream table maintenance runs with elevated privileges. |
Recommended Pattern: RLS on the Stream Table
-- 1. Create a stream table (materializes all rows)
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT tenant_id, SUM(amount) AS total FROM orders GROUP BY tenant_id'
);
-- 2. Enable RLS on the stream table
ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;
-- 3. Create per-tenant policies
CREATE POLICY tenant_isolation ON pgtrickle.order_totals
USING (tenant_id = current_setting('app.tenant_id')::INT);
-- 4. Each role sees only its own rows
SET app.tenant_id = '42';
SELECT * FROM pgtrickle.order_totals; -- only tenant 42's rows
Note: This is identical to how you would apply RLS to a regular
MATERIALIZED VIEW. One stream table serves all tenants; per-tenant filtering happens at query time with zero storage duplication.
Views
pgtrickle.stream_tables_info
Status overview with computed staleness information.
SELECT * FROM pgtrickle.stream_tables_info;
Columns include all pgtrickle.pgt_stream_tables columns plus:
| Column | Type | Description |
|---|---|---|
staleness | interval | now() - last_refresh_at |
stale | bool | true when the scheduler itself is behind (last_refresh_at age exceeds schedule); false when the scheduler is healthy even if source tables have had no writes |
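A typical use is a staleness check, for example:

```sql
-- Stream tables the scheduler has fallen behind on, worst first:
SELECT pgt_name, staleness
FROM pgtrickle.stream_tables_info
WHERE stale
ORDER BY staleness DESC;
```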
pgtrickle.pg_stat_stream_tables
Comprehensive monitoring view combining catalog metadata with aggregate refresh statistics.
SELECT * FROM pgtrickle.pg_stat_stream_tables;
Key columns:
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | Stream table ID |
pgt_schema / pgt_name | text | Schema and name |
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR |
refresh_mode | text | FULL or DIFFERENTIAL |
data_timestamp | timestamptz | Timestamp of last refresh |
staleness | interval | now() - last_refresh_at |
stale | bool | true when the scheduler is behind its schedule; false when the scheduler is healthy (quiet source tables do not count as stale) |
total_refreshes | bigint | Total refresh count |
successful_refreshes | bigint | Successful refresh count |
failed_refreshes | bigint | Failed refresh count |
avg_duration_ms | float8 | Average refresh duration |
consecutive_errors | int | Current error streak |
cdc_modes | text[] | Distinct CDC modes across TABLE-type sources (e.g. {wal}, {trigger,wal}, {transitioning,wal}) |
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle) |
last_fixpoint_iterations | int | Number of fixpoint iterations in the last SCC convergence (NULL if not cyclic) |
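For example, to surface error-prone stream tables:

```sql
-- Stream tables currently failing or in an error streak:
SELECT pgt_schema, pgt_name, status,
       consecutive_errors, failed_refreshes, avg_duration_ms
FROM pgtrickle.pg_stat_stream_tables
WHERE status = 'ERROR' OR consecutive_errors > 0
ORDER BY consecutive_errors DESC;
```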
pgtrickle.quick_health
Single-row health summary for dashboards and alerting. Returns the overall health status of the pg_trickle extension at a glance.
SELECT * FROM pgtrickle.quick_health;
| Column | Type | Description |
|---|---|---|
total_stream_tables | bigint | Total number of stream tables |
error_tables | bigint | Stream tables with status = 'ERROR' or consecutive_errors > 0 |
stale_tables | bigint | Stream tables whose data is older than their schedule interval |
scheduler_running | boolean | Whether a pg_trickle scheduler backend is detected in pg_stat_activity |
status | text | Overall status: EMPTY, OK, WARNING, or CRITICAL |
Status values:
- EMPTY — No stream tables exist.
- OK — All stream tables are healthy and up-to-date.
- WARNING — Some tables have errors or are stale.
- CRITICAL — At least one stream table is SUSPENDED.
pgtrickle.pgt_cdc_status
Convenience view for inspecting the CDC mode and WAL slot state of every TABLE-type source for all stream tables. Useful for monitoring in-progress TRIGGER→WAL transitions.
SELECT * FROM pgtrickle.pgt_cdc_status;
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Schema of the stream table |
pgt_name | text | Name of the stream table |
source_relid | oid | OID of the source table |
source_name | text | Name of the source table |
source_schema | text | Schema of the source table |
cdc_mode | text | Current CDC mode: trigger, transitioning, or wal |
slot_name | text | Replication slot name (NULL for trigger mode) |
decoder_confirmed_lsn | pg_lsn | Last WAL position decoded (NULL for trigger mode) |
transition_started_at | timestamptz | When the trigger→WAL transition began (NULL if not transitioning) |
Subscribe to the pgtrickle_cdc_transition NOTIFY channel to receive real-time
events when a source moves between CDC modes (payload is a JSON object with
source_oid, from, and to fields).
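For example (the payload values shown are illustrative):

```sql
LISTEN pgtrickle_cdc_transition;
-- Example notification payload:
--   {"source_oid": 16420, "from": "trigger", "to": "wal"}
```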
Catalog Tables
pgtrickle.pgt_stream_tables
Core metadata for each stream table.
| Column | Type | Description |
|---|---|---|
pgt_id | bigserial | Primary key |
pgt_relid | oid | OID of the storage table |
pgt_name | text | Table name |
pgt_schema | text | Schema name |
defining_query | text | The SQL query that defines the ST |
original_query | text | The user-supplied query before normalization |
schedule | text | Refresh schedule (duration or cron expression) |
refresh_mode | text | FULL, DIFFERENTIAL, or IMMEDIATE |
status | text | INITIALIZING, ACTIVE, SUSPENDED, ERROR |
is_populated | bool | Whether the table has been populated |
data_timestamp | timestamptz | Timestamp of the data in the ST |
frontier | jsonb | Per-source LSN positions (version tracking) |
last_refresh_at | timestamptz | When last refreshed |
consecutive_errors | int | Current error streak count |
needs_reinit | bool | Whether upstream DDL requires reinitialization |
auto_threshold | double precision | Per-ST adaptive fallback threshold (overrides GUC) |
last_full_ms | double precision | Last FULL refresh duration in milliseconds |
functions_used | text[] | Function names used in the defining query (for DDL tracking) |
topk_limit | int | LIMIT value for TopK stream tables (NULL if not TopK) |
topk_order_by | text | ORDER BY clause SQL for TopK stream tables |
topk_offset | int | OFFSET value for paged TopK queries (NULL if not paged) |
diamond_consistency | text | Diamond consistency mode: none or atomic |
diamond_schedule_policy | text | Diamond schedule policy: fastest or slowest |
has_keyless_source | bool | Whether any source table lacks a PRIMARY KEY (EC-06) |
function_hashes | text | MD5 hashes of referenced function bodies for change detection (EC-16) |
scc_id | int | SCC group identifier for circular dependencies (NULL if not in a cycle) |
last_fixpoint_iterations | int | Number of iterations in the last SCC fixpoint convergence (NULL if never iterated) |
created_at | timestamptz | Creation timestamp |
updated_at | timestamptz | Last modification timestamp |
pgtrickle.pgt_dependencies
DAG edges — records which source tables each ST depends on, including CDC mode metadata.
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | FK to pgt_stream_tables |
source_relid | oid | OID of the source table |
source_type | text | TABLE, STREAM_TABLE, VIEW, MATVIEW, or FOREIGN_TABLE |
columns_used | text[] | Which columns are referenced |
column_snapshot | jsonb | Snapshot of source column metadata at creation time |
schema_fingerprint | text | SHA-256 fingerprint of column snapshot for fast equality checks |
cdc_mode | text | Current CDC mode: TRIGGER, TRANSITIONING, or WAL |
slot_name | text | Replication slot name (WAL/TRANSITIONING modes) |
decoder_confirmed_lsn | pg_lsn | WAL decoder's last confirmed position |
transition_started_at | timestamptz | When the trigger→WAL transition started |
pgtrickle.pgt_refresh_history
Audit log of all refresh operations.
| Column | Type | Description |
|---|---|---|
refresh_id | bigserial | Primary key |
pgt_id | bigint | FK to pgt_stream_tables |
data_timestamp | timestamptz | Data timestamp of the refresh |
start_time | timestamptz | When the refresh started |
end_time | timestamptz | When it completed |
action | text | NO_DATA, FULL, DIFFERENTIAL, REINITIALIZE, SKIP |
rows_inserted | bigint | Rows inserted |
rows_deleted | bigint | Rows deleted |
delta_row_count | bigint | Number of delta rows processed from change buffers |
merge_strategy_used | text | Which merge strategy was used (e.g. MERGE, DELETE+INSERT) |
was_full_fallback | bool | Whether the refresh fell back to FULL from DIFFERENTIAL |
error_message | text | Error message if failed |
status | text | RUNNING, COMPLETED, FAILED, SKIPPED |
initiated_by | text | What triggered: SCHEDULER, MANUAL, or INITIAL |
freshness_deadline | timestamptz | SLA deadline (duration schedules only; NULL for cron) |
fixpoint_iteration | int | Iteration of the fixed-point loop (NULL for non-cyclic refreshes) |
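A typical audit query:

```sql
-- Most recent failed refreshes with their error messages:
SELECT pgt_id, start_time, action, was_full_fallback, error_message
FROM pgtrickle.pgt_refresh_history
WHERE status = 'FAILED'
ORDER BY start_time DESC
LIMIT 20;
```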
pgtrickle.pgt_change_tracking
CDC slot tracking per source table.
| Column | Type | Description |
|---|---|---|
source_relid | oid | OID of the tracked source table |
slot_name | text | Logical replication slot name |
last_consumed_lsn | pg_lsn | Last consumed WAL position |
tracked_by_pgt_ids | bigint[] | Array of ST IDs depending on this source |
pgtrickle.pgt_source_gates
Bootstrap source gate registry. One row per source table that has ever been
gated. Only sources with gated = true are actively blocking scheduler
refreshes.
| Column | Type | Description |
|---|---|---|
source_relid | oid | OID of the gated source table (PK) |
gated | boolean | true while the source is gated; false after ungate_source() |
gated_at | timestamptz | When the gate was most recently set |
ungated_at | timestamptz | When the gate was cleared (NULL if still active) |
gated_by | text | Actor that set the gate (e.g. 'gate_source') |
pgtrickle.pgt_refresh_groups
User-declared Cross-Source Snapshot Consistency groups (v0.9.0). A refresh
group guarantees that all member stream tables are refreshed against a snapshot
taken at the same point in time, preventing partial-update visibility (e.g.
orders and order_lines both reflecting the same transaction boundary).
| Column | Type | Description |
|---|---|---|
group_id | serial | Primary key |
group_name | text | Unique human-readable group name |
member_oids | oid[] | OIDs of the stream table storage relations that participate in this group |
isolation | text | Snapshot isolation level for the group: 'read_committed' (default) or 'repeatable_read' |
created_at | timestamptz | When the group was created |
Management API
-- Create a refresh group
SELECT pgtrickle.create_refresh_group(
'orders_snapshot',
ARRAY['public.orders_summary', 'public.order_lines_summary'],
'repeatable_read' -- or 'read_committed' (default)
);
-- List all groups:
SELECT * FROM pgtrickle.refresh_groups();
-- Remove a group:
SELECT pgtrickle.drop_refresh_group('orders_snapshot');
Validation rules:
- At least 2 member stream tables are required.
- All members must exist in pgt_stream_tables.
- No member can appear in more than one refresh group.
- Valid isolation levels: 'read_committed' (default), 'repeatable_read'.
Bootstrap Source Gating (v0.5.0)
These functions let operators pause and resume scheduler-driven refreshes for individual source tables — useful during large bulk loads or ETL windows.
pgtrickle.gate_source(source TEXT)
Mark a source table as gated. The scheduler will skip any stream table that
reads from this source until ungate_source() is called.
SELECT pgtrickle.gate_source('my_schema.big_source');
Manual refresh_stream_table() calls are not affected by gates.
pgtrickle.ungate_source(source TEXT)
Clear a gate set by gate_source(). After this call the scheduler resumes
normal refresh scheduling for dependent stream tables.
SELECT pgtrickle.ungate_source('my_schema.big_source');
pgtrickle.source_gates()
Table function returning the current gate status for all registered sources.
SELECT * FROM pgtrickle.source_gates();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by
| Column | Type | Description |
|---|---|---|
source_table | text | Relation name |
schema_name | text | Schema name |
gated | boolean | Whether the source is currently gated |
gated_at | timestamptz | When the gate was set |
ungated_at | timestamptz | When the gate was cleared (NULL if active) |
gated_by | text | Which function set the gate |
Typical workflow
-- 1. Gate the source before a bulk load.
SELECT pgtrickle.gate_source('orders');
-- 2. Load historical data (scheduler sits idle for orders-based STs).
COPY orders FROM '/data/historical_orders.csv';
-- 3. Ungate — the next scheduler tick refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');
pgtrickle.bootstrap_gate_status() (v0.6.0)
Rich introspection of bootstrap gate lifecycle. Returns the same columns as
source_gates() plus computed fields for debugging.
SELECT * FROM pgtrickle.bootstrap_gate_status();
-- source_table | schema_name | gated | gated_at | ungated_at | gated_by | gate_duration | affected_stream_tables
| Column | Type | Description |
|---|---|---|
source_table | text | Relation name |
schema_name | text | Schema name |
gated | boolean | Whether the source is currently gated |
gated_at | timestamptz | When the gate was set (updated on re-gate) |
ungated_at | timestamptz | When the gate was cleared (NULL if active) |
gated_by | text | Which function set the gate |
gate_duration | interval | How long the gate has been active (gated: now() - gated_at; ungated: ungated_at - gated_at) |
affected_stream_tables | text | Comma-separated list of stream tables whose scheduler refreshes are blocked by this gate |
Rows are sorted with currently-gated sources first, then alphabetically.
ETL Coordination Cookbook (v0.6.0)
Step-by-step recipes for common bulk-load patterns using source gating.
Recipe 1 — Single Source Bulk Load
Gate one source table during a large data import. The scheduler pauses refreshes for all stream tables that depend on this source.
-- 1. Gate the source before loading.
SELECT pgtrickle.gate_source('orders');
-- 2. Load the data. The scheduler sits idle for orders-dependent STs.
COPY orders FROM '/data/orders_2026.csv' WITH (FORMAT csv, HEADER);
-- 3. Ungate. On the next tick the scheduler refreshes everything cleanly.
SELECT pgtrickle.ungate_source('orders');
Recipe 2 — Coordinated Multi-Source Load
When multiple sources feed into a shared downstream stream table, gate them all before loading so no intermediate refreshes occur.
-- 1. Gate all sources that will be loaded.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');
-- 2. Load each source (can be parallel, any order).
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY order_lines FROM '/data/lines.csv' WITH (FORMAT csv, HEADER);
-- 3. Ungate all sources. The scheduler refreshes downstream STs once.
SELECT pgtrickle.ungate_source('orders');
SELECT pgtrickle.ungate_source('order_lines');
Recipe 3 — Gate + Deferred Initialization
Combine gating with initialize => false to prevent incomplete initial
population when sources are loaded asynchronously.
-- 1. Gate sources before creating any stream tables.
SELECT pgtrickle.gate_source('orders');
SELECT pgtrickle.gate_source('order_lines');
-- 2. Create stream tables without initial population.
SELECT pgtrickle.create_stream_table(
'order_summary',
'SELECT region, SUM(amount) FROM orders GROUP BY region',
'1m', initialize => false
);
SELECT pgtrickle.create_stream_table(
'order_report',
'SELECT s.region, s.total, l.line_count
FROM order_summary s
JOIN (SELECT region, COUNT(*) AS line_count FROM order_lines GROUP BY region) l
USING (region)',
'1m', initialize => false
);
-- 3. Run ETL processes (can be in separate transactions).
BEGIN;
COPY orders FROM 's3://warehouse/orders.parquet';
SELECT pgtrickle.ungate_source('orders');
COMMIT;
BEGIN;
COPY order_lines FROM 's3://warehouse/lines.parquet';
SELECT pgtrickle.ungate_source('order_lines');
COMMIT;
-- 4. Once all sources are ungated, the scheduler initializes and refreshes
-- all stream tables in dependency order.
Recipe 4 — Nightly Batch Pattern
For scheduled ETL that runs overnight, gate sources before the batch starts and ungate after the batch completes.
-- Nightly ETL script:
-- Gate all sources that will be refreshed.
SELECT pgtrickle.gate_source('sales');
SELECT pgtrickle.gate_source('inventory');
-- Truncate and reload (or use COPY, INSERT...SELECT, etc.).
TRUNCATE sales;
COPY sales FROM '/data/nightly/sales.csv' WITH (FORMAT csv, HEADER);
TRUNCATE inventory;
COPY inventory FROM '/data/nightly/inventory.csv' WITH (FORMAT csv, HEADER);
-- All data loaded — ungate and let the scheduler handle the rest.
SELECT pgtrickle.ungate_source('sales');
SELECT pgtrickle.ungate_source('inventory');
-- Verify: check the gate status to confirm everything is ungated.
SELECT * FROM pgtrickle.bootstrap_gate_status();
Recipe 5 — Monitoring During a Gated Load
Use bootstrap_gate_status() to monitor progress when streams appear stalled.
-- Check which sources are currently gated and how long they've been paused.
SELECT source_table, gate_duration, affected_stream_tables
FROM pgtrickle.bootstrap_gate_status()
WHERE gated = true;
-- If a gate has been active too long (e.g. ETL failed), ungate manually.
SELECT pgtrickle.ungate_source('stale_source');
Watermark Gating (v0.7.0)
Watermark gating is a scheduling control for ETL pipelines where multiple source tables are populated by separate jobs that finish at different times. Each ETL job declares "I'm done up to timestamp X", and the scheduler waits until all sources in a group are caught up within a configurable tolerance before refreshing downstream stream tables.
Catalog Tables
pgtrickle.pgt_watermarks
Per-source watermark state. One row per source table that has had a watermark advanced.
| Column | Type | Description |
|---|---|---|
source_relid | oid | Source table OID (primary key) |
watermark | timestamptz | Current watermark value |
updated_at | timestamptz | When the watermark was last advanced |
advanced_by | text | User/role that advanced the watermark |
wal_lsn_at_advance | text | WAL LSN at the time of advancement |
pgtrickle.pgt_watermark_groups
Watermark group definitions. Each group declares that a set of sources must be temporally aligned.
| Column | Type | Description |
|---|---|---|
group_id | serial | Auto-generated group ID (primary key) |
group_name | text | Unique group name |
source_relids | oid[] | Array of source table OIDs in the group |
tolerance_secs | float8 | Maximum allowed lag in seconds (default 0) |
created_at | timestamptz | When the group was created |
pgtrickle.pgt_template_cache
Added in v0.16.0. Cross-backend delta SQL template cache (UNLOGGED). Stores compiled delta query templates so new backends skip the ~45 ms DVM parse+differentiate step. Managed automatically — no user interaction required.
| Column | Type | Description |
|---|---|---|
pgt_id | bigint | Stream table ID (PK, FK → pgt_stream_tables) |
query_hash | bigint | Hash of the defining query (staleness detection) |
delta_sql | text | Delta SQL template with LSN placeholder tokens |
columns | text[] | Output column names |
source_oids | integer[] | Source table OIDs |
is_dedup | boolean | Whether the delta is deduplicated per row ID |
key_changed | boolean | Whether __pgt_key_changed column is present |
all_algebraic | boolean | Whether all aggregates are algebraically invertible |
cached_at | timestamptz | When the entry was last populated |
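Although the cache is managed automatically, it can be inspected directly, for example to confirm that templates are being reused across backends (a read-only sketch using the columns above):

```sql
-- One row per stream table with a compiled delta template.
SELECT pgt_id, is_dedup, all_algebraic, cached_at
FROM pgtrickle.pgt_template_cache;
```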
Functions
pgtrickle.advance_watermark(source TEXT, watermark TIMESTAMPTZ)
Signal that a source table's data is complete through the given timestamp.
- Monotonic: rejects watermarks that go backward (raises error).
- Idempotent: advancing to the same value is a silent no-op.
- Transactional: the watermark is part of the caller's transaction.
SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:05:00+00');
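The monotonic and idempotent properties can be illustrated directly (a sketch; the exact error text is not specified here):

```sql
SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:05:00+00');
-- Same value again: silent no-op (idempotent).
SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:05:00+00');
-- Earlier value: raises an error (monotonic).
SELECT pgtrickle.advance_watermark('orders', '2026-03-01 12:00:00+00');
```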
pgtrickle.create_watermark_group(group_name TEXT, sources TEXT[], tolerance_secs FLOAT8 DEFAULT 0)
Create a watermark group. Requires at least 2 sources.
tolerance_secs: maximum allowed lag between the most-advanced and least-advanced watermarks. Default 0 means strict alignment.
SELECT pgtrickle.create_watermark_group(
'order_pipeline',
ARRAY['orders', 'order_lines'],
0 -- strict alignment (default)
);
pgtrickle.drop_watermark_group(group_name TEXT)
Remove a watermark group by name.
SELECT pgtrickle.drop_watermark_group('order_pipeline');
pgtrickle.watermarks()
Return the current watermark state for all registered sources.
SELECT * FROM pgtrickle.watermarks();
| Column | Type | Description |
|---|---|---|
source_table | text | Source table name |
schema_name | text | Schema name |
watermark | timestamptz | Current watermark value |
updated_at | timestamptz | Last advancement time |
advanced_by | text | User that advanced it |
wal_lsn | text | WAL LSN at advancement |
pgtrickle.watermark_groups()
Return all watermark group definitions.
SELECT * FROM pgtrickle.watermark_groups();
pgtrickle.watermark_status()
Return live alignment status for each watermark group.
SELECT * FROM pgtrickle.watermark_status();
| Column | Type | Description |
|---|---|---|
group_name | text | Group name |
min_watermark | timestamptz | Least-advanced watermark |
max_watermark | timestamptz | Most-advanced watermark |
lag_secs | float8 | Lag in seconds between max and min |
aligned | boolean | Whether lag is within tolerance |
sources_with_watermark | int4 | Number of sources that have a watermark |
sources_total | int4 | Total sources in the group |
Recipes
Recipe 6 — Nightly ETL with Watermarks
-- Create a watermark group for the order pipeline.
SELECT pgtrickle.create_watermark_group(
'order_pipeline',
ARRAY['orders', 'order_lines']
);
-- Nightly ETL job 1: Load orders
BEGIN;
COPY orders FROM '/data/orders_20260301.csv';
SELECT pgtrickle.advance_watermark('orders', '2026-03-01');
COMMIT;
-- Nightly ETL job 2: Load order lines (may run later)
BEGIN;
COPY order_lines FROM '/data/lines_20260301.csv';
SELECT pgtrickle.advance_watermark('order_lines', '2026-03-01');
COMMIT;
-- order_report refreshes on the next tick after both watermarks align.
Recipe 7 — Micro-Batch Tolerance
-- Allow up to 30 seconds of skew between trades and quotes.
SELECT pgtrickle.create_watermark_group(
'realtime_pipeline',
ARRAY['trades', 'quotes'],
30 -- 30-second tolerance
);
-- External process advances watermarks every few seconds.
SELECT pgtrickle.advance_watermark('trades', '2026-03-01 12:00:05+00');
SELECT pgtrickle.advance_watermark('quotes', '2026-03-01 12:00:02+00');
-- Lag is 3s, within 30s tolerance → stream tables refresh normally.
Recipe 8 — Monitoring Watermark Alignment
-- Check which groups are currently misaligned.
SELECT group_name, lag_secs, aligned
FROM pgtrickle.watermark_status()
WHERE NOT aligned;
-- Check individual source watermarks.
SELECT source_table, watermark, updated_at
FROM pgtrickle.watermarks()
ORDER BY watermark;
Stuck Watermark Detection (WM-7, v0.15.0)
When pg_trickle.watermark_holdback_timeout is set to a positive value
(seconds), the scheduler periodically checks all watermark sources. If any
source in a watermark group has not been advanced within the timeout,
downstream stream tables in that group are paused (refresh is skipped)
and a pgtrickle_alert NOTIFY is emitted.
This protects against silent data staleness when an ETL pipeline breaks and stops advancing watermarks. Without this guard, stream tables would continue refreshing with stale external data.
Behavior:
- Stuck detection: Every ~60 seconds, the scheduler checks updated_at for all watermark sources. If now() - updated_at > watermark_holdback_timeout, the source is stuck.
- Pause: Any stream table whose source set overlaps a group containing a stuck source is skipped. A SKIP record with "stuck" in the reason is logged to pgt_refresh_history.
- Alert: A pgtrickle_alert NOTIFY with event watermark_stuck is emitted (once per newly-stuck source, not repeated every check cycle).
- Auto-resume: When the stuck watermark is advanced via advance_watermark(), the next scheduler check detects the advancement, lifts the pause, and emits a watermark_resumed event.
Recipe 9 — Stuck Watermark Protection
-- Enable stuck-watermark detection with a 10-minute timeout.
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();
-- Listen for alerts in a monitoring process.
LISTEN pgtrickle_alert;
-- When the ETL pipeline breaks and stops calling advance_watermark(),
-- the scheduler will start skipping downstream STs after 10 minutes.
-- You'll receive a NOTIFY payload like:
-- {"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}
-- When the ETL pipeline recovers and advances the watermark:
SELECT pgtrickle.advance_watermark('orders', '2026-03-02 00:00:00+00');
-- The scheduler automatically resumes, and you'll receive:
-- {"event":"watermark_resumed","source_oid":16385}
Developer Diagnostics (v0.12.0)
Four SQL-callable introspection functions that surface internal DVM state without side-effects. All functions are read-only — they never modify catalog tables or trigger refreshes.
pgtrickle.explain_query_rewrite(query TEXT)
Walk a query through the full DVM rewrite pipeline and report each pass.
Returns one row per rewrite pass. When a pass changes the query, changed = true
and sql_after contains the SQL after the transformation. Two synthetic rows
are appended: topk_detection (detects ORDER BY … LIMIT) and dvm_patterns
(lists detected DVM constructs such as aggregation strategy, join types, and
volatility).
SELECT pass_name, changed, sql_after
FROM pgtrickle.explain_query_rewrite(
'SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id'
);
Return columns:
| Column | Type | Description |
|---|---|---|
pass_name | text | Rewrite pass name (e.g. view_inlining, distinct_on, grouping_sets) |
changed | bool | Whether this pass modified the query |
sql_after | text | SQL text after this pass (NULL if unchanged) |
Rewrite passes (in order):
| Pass | Description |
|---|---|
view_inlining | Expand view references to their defining SQL |
nested_window_lift | Lift window functions out of expressions (e.g. CASE WHEN ROW_NUMBER() OVER (...) ...) |
distinct_on | Rewrite DISTINCT ON to a ROW_NUMBER() window |
grouping_sets | Expand GROUPING SETS / CUBE / ROLLUP to UNION ALL of GROUP BY |
scalar_subquery_in_where | Rewrite scalar subqueries in WHERE to CROSS JOIN |
correlated_scalar_in_select | Rewrite correlated scalar subqueries in SELECT to LEFT JOIN |
sublinks_in_or_demorgan | Apply De Morgan normalization and expand SubLinks inside OR |
rows_from | Rewrite ROWS FROM() multi-function expressions |
topk_detection | Detect ORDER BY … LIMIT n TopK pattern |
dvm_patterns | Detected DVM constructs: join types, aggregate strategies, volatility |
pgtrickle.diagnose_errors(name TEXT)
Return the last 5 FAILED refresh events for a stream table, with each error classified by type and supplied with a remediation hint.
SELECT event_time, error_type, error_message, remediation
FROM pgtrickle.diagnose_errors('my_stream_table');
Return columns:
| Column | Type | Description |
|---|---|---|
event_time | timestamptz | When the failed refresh started |
error_type | text | Classification: user, schema, correctness, performance, infrastructure |
error_message | text | Raw error text from pgt_refresh_history |
remediation | text | Suggested next step |
Error types:
| Type | Trigger patterns | Typical action |
|---|---|---|
user | query parse error, unsupported operator, type mismatch | Check query; run validate_query() |
schema | upstream table schema changed, upstream table dropped | Reinitialize; check pgt_dependencies |
correctness | phantom, EXCEPT ALL, row count mismatch | Switch to refresh_mode='FULL'; report bug |
performance | lock timeout, deadlock, serialization failure, spill | Tune lock_timeout; enable buffer_partitioning |
infrastructure | permission denied, SPI error, replication slot | Check role grants; verify slot config |
pgtrickle.list_auxiliary_columns(name TEXT)
List all __pgt_* internal columns on a stream table's storage relation,
with an explanation of each column's role.
These columns are normally hidden from SELECT * output. This function
surfaces them for debugging and operator visibility.
SELECT column_name, data_type, purpose
FROM pgtrickle.list_auxiliary_columns('my_stream_table');
Return columns:
| Column | Type | Description |
|---|---|---|
column_name | text | Internal column name (e.g. __pgt_row_id) |
data_type | text | PostgreSQL type (e.g. bigint, text) |
purpose | text | Human-readable description of the column's role |
Common auxiliary columns:
| Column | Purpose |
|---|---|
__pgt_row_id | Row identity hash — MERGE join key for delta application |
__pgt_count | Multiplicity counter for DISTINCT / aggregation / UNION dedup |
__pgt_count_l | Left-side multiplicity for INTERSECT / EXCEPT |
__pgt_count_r | Right-side multiplicity for INTERSECT / EXCEPT |
__pgt_aux_sum_<col> | Running SUM for algebraic AVG maintenance |
__pgt_aux_count_<col> | Running COUNT for algebraic AVG maintenance |
__pgt_aux_sum2_<col> | Sum-of-squares for STDDEV / VAR maintenance |
__pgt_aux_sum{x,y,xy,x2,y2}_<col> | Five-column set for CORR / COVAR / REGR_* |
__pgt_aux_nonnull_<col> | Non-null count for SUM-above-FULL-JOIN maintenance |
pgtrickle.validate_query(query TEXT)
Parse and validate a query through the DVM pipeline without creating a stream table. Returns detected SQL constructs, warnings, and the resolved refresh mode.
SELECT check_name, result, severity
FROM pgtrickle.validate_query(
'SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id'
);
Return columns:
| Column | Type | Description |
|---|---|---|
check_name | text | Name of the check or detected construct |
result | text | Resolved value or construct description |
severity | text | INFO, WARNING, or ERROR |
The first row always has check_name = 'resolved_refresh_mode' with the mode
that would be assigned under refresh_mode = 'AUTO': DIFFERENTIAL, FULL,
or TOPK.
Common check names:
| Check | Description |
|---|---|
resolved_refresh_mode | DIFFERENTIAL, FULL, or TOPK |
topk_pattern | Detected LIMIT + ORDER BY values |
unsupported_construct | Feature not supported for DIFFERENTIAL mode (→ WARNING) |
matview_or_foreign_table | Query references matview/foreign table (→ WARNING, FULL) |
ivm_support_check | DVM parse result (→ WARNING if DIFFERENTIAL not possible) |
aggregate | Aggregate with strategy: ALGEBRAIC_INVERTIBLE, ALGEBRAIC_VIA_AUX, SEMI_ALGEBRAIC, or GROUP_RESCAN |
join | Detected join type: INNER, LEFT_OUTER, FULL_OUTER, SEMI, ANTI |
set_op | Set operation: DISTINCT, UNION_ALL, INTERSECT, EXCEPT, EXCEPT_ALL |
window_function | Query contains window functions |
scalar_subquery | Query contains scalar subqueries |
lateral | Query contains LATERAL functions or subqueries |
recursive_cte | Query uses WITH RECURSIVE |
volatility | Worst-case volatility of functions used: immutable, stable, volatile |
needs_pgt_count | Multiplicity counter column will be added |
needs_dual_count | Left/right multiplicity counters will be added |
parse_warning | Advisory warning from the DVM parse phase |
Example output for a GROUP_RESCAN query:
SELECT check_name, result, severity
FROM pgtrickle.validate_query(
'SELECT grp, STRING_AGG(tag, '','') FROM events GROUP BY grp'
);
| check_name | result | severity |
|---|---|---|
resolved_refresh_mode | DIFFERENTIAL | INFO |
aggregate | STRING_AGG(GROUP_RESCAN) | WARNING |
needs_pgt_count | true — multiplicity counter column required | INFO |
volatility | immutable | INFO |
Note on GROUP_RESCAN:
STRING_AGG, ARRAY_AGG, BOOL_AND, and other non-algebraic aggregates use a group-rescan strategy: any change in a group triggers full re-aggregation of that group from the source data. This is still DIFFERENTIAL (only changed groups are rescanned), but it has a higher per-group cost than algebraic strategies. If this is performance-sensitive, consider pre-aggregating with a simpler aggregate and post-processing.
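The pre-aggregation advice above can be sketched as a two-layer pattern (hypothetical tables): the cheap algebraic COUNT is maintained incrementally, and the expensive STRING_AGG runs only at query time over the compact result.

```sql
-- Maintain the cheap algebraic layer incrementally.
SELECT pgtrickle.create_stream_table(
    'tag_counts',
    'SELECT grp, tag, COUNT(*) AS n FROM events GROUP BY grp, tag',
    '1m'
);

-- Post-process at query time over the much smaller stream table.
SELECT grp, STRING_AGG(tag, ',' ORDER BY tag) AS tags
FROM tag_counts
GROUP BY grp;
```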
Delta SQL Profiling (v0.13.0)
pgtrickle.explain_delta(st_name text, format text DEFAULT 'text')
Generate the delta SQL query plan for a stream table without executing a refresh.
explain_delta produces the differential delta SQL that would be used on the
next DIFFERENTIAL refresh, then runs EXPLAIN (ANALYZE false, FORMAT <format>)
on it and returns the plan lines. This function is useful for:
- Identifying slow joins or missing indexes in auto-generated delta SQL.
- Comparing plan complexity between different query forms.
- Monitoring how the size of change buffers affects plan shape.
The delta SQL is generated against a hypothetical "scan all changes" window
(LSN 0/0 → FF/FFFFFFFF) so the plan shows the full join/filter structure
even when the change buffer is currently empty.
Parameters:
| Name | Type | Description |
|---|---|---|
st_name | text | Qualified stream table name (e.g. 'public.orders_summary'). |
format | text | Plan format: 'text' (default), 'json', 'xml', or 'yaml'. |
Returns: SETOF text — one row per plan line (text format) or one row containing the full JSON/XML/YAML plan.
Example:
-- Show the text plan for the delta query
SELECT line FROM pgtrickle.explain_delta('public.orders_summary');
-- Get the JSON plan for programmatic analysis
SELECT line FROM pgtrickle.explain_delta('public.orders_summary', 'json');
Environment variable (PGS_PROFILE_DELTA=1): When the environment variable
PGS_PROFILE_DELTA=1 is set in the PostgreSQL server process, every
DIFFERENTIAL refresh automatically captures EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
for the resolved delta SQL and writes the plan to
/tmp/delta_plans/<schema>_<table>.json. This is intended for E2E test
diagnostics and local profiling sessions.
pgtrickle.dedup_stats()
Show MERGE deduplication profiling counters accumulated since server start.
When the delta cannot be guaranteed to contain at most one row per
__pgt_row_id (e.g. for aggregate queries or keyless sources), the MERGE
must group and aggregate the delta before merging. This is tracked as
dedup needed. A consistently high ratio indicates that pre-MERGE compaction
in the change buffer would reduce refresh latency.
Returns: one row with:
| Column | Type | Description |
|---|---|---|
total_diff_refreshes | bigint | Total DIFFERENTIAL refreshes executed since server start that processed at least one change. Resets on server restart. |
dedup_needed | bigint | Number of those refreshes where the delta required weight aggregation / deduplication in the MERGE USING clause. |
dedup_ratio_pct | float8 | dedup_needed / total_diff_refreshes × 100. 0 when total_diff_refreshes = 0. |
Example:
SELECT * FROM pgtrickle.dedup_stats();
-- total_diff_refreshes | dedup_needed | dedup_ratio_pct
-- ----------------------+--------------+-----------------
-- 1234 | 87 | 7.05
A dedup_ratio_pct ≥ 10 is the threshold recommended for investigating a
two-pass MERGE strategy. See plans/performance/REPORT_OVERALL_STATUS.md §14
for background.
pgtrickle.shared_buffer_stats()
Added in v0.13.0
D-4 observability function. Returns one row per shared change buffer (one per tracked source table), showing how many stream tables share the buffer, which columns are tracked, the safe cleanup frontier, and the current buffer size.
Return columns:
| Column | Type | Description |
|---|---|---|
source_oid | bigint | PostgreSQL OID of the source table |
source_table | text | Fully qualified source table name |
consumer_count | integer | Number of stream tables sharing this buffer |
consumers | text | Comma-separated list of consumer stream table names |
columns_tracked | integer | Number of new_* columns in the buffer (column superset) |
safe_frontier_lsn | text | MIN(frontier LSN) across all consumers — rows at or below this are safe to clean up |
buffer_rows | bigint | Current number of rows in the change buffer |
is_partitioned | boolean | Whether the buffer uses LSN-range partitioning |
Example:
SELECT * FROM pgtrickle.shared_buffer_stats();
-- source_oid | source_table | consumer_count | consumers | columns_tracked | safe_frontier_lsn | buffer_rows | is_partitioned
-- -----------+--------------------+----------------+------------------------------------+-----------------+-------------------+-------------+----------------
-- 16456 | public.orders | 3 | public.orders_by_region, public... | 5 | 0/1A2B3C4D | 142 | f
UNLOGGED Change Buffers (v0.14.0)
pgtrickle.convert_buffers_to_unlogged()
Converts all existing logged change buffer tables to UNLOGGED. This
eliminates WAL writes for trigger-inserted CDC rows, reducing WAL
amplification by ~30%.
Returns: bigint — the number of buffer tables converted.
SELECT pgtrickle.convert_buffers_to_unlogged();
-- convert_buffers_to_unlogged
-- ----------------------------
-- 5
Warning: Each conversion acquires an ACCESS EXCLUSIVE lock on the buffer table. Run this function during a low-traffic maintenance window to minimize lock contention.
After conversion: Buffer contents will be lost on crash recovery. The scheduler automatically detects this and enqueues a FULL refresh for affected stream tables. See pg_trickle.unlogged_buffers for the full trade-off discussion.
Refresh Mode Diagnostics (v0.14.0)
pgtrickle.recommend_refresh_mode(st_name TEXT DEFAULT NULL)
Analyze stream table workload characteristics and recommend the optimal
refresh mode (FULL vs DIFFERENTIAL). When st_name is NULL, returns one
row per stream table. When provided, returns a single row for the named
stream table.
The function evaluates seven weighted signals (current and historical-average change ratio, empirical timing, query complexity, target size, index coverage, and latency variance) and computes a composite score. Scores above +0.15 recommend DIFFERENTIAL; below −0.15 recommend FULL; in between, the function recommends KEEP (the current mode is near-optimal).
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
st_name | text | NULL | Optional stream table name. NULL = all stream tables. |
Return columns:
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Stream table schema |
pgt_name | text | Stream table name |
current_mode | text | Currently configured refresh mode |
effective_mode | text | Mode actually used in the last refresh |
recommended_mode | text | DIFFERENTIAL, FULL, or KEEP |
confidence | text | high, medium, or low |
reason | text | Human-readable explanation of the recommendation |
signals | jsonb | Detailed signal breakdown with scores and weights |
Example:
-- Check all stream tables
SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
-- Check a specific stream table
SELECT recommended_mode, confidence, reason, signals
FROM pgtrickle.recommend_refresh_mode('public.orders_summary');
Signal weights:
| Signal | Base Weight | Description |
|---|---|---|
change_ratio_current | 0.25 | Current pending changes / source rows |
change_ratio_avg | 0.30 | Historical average change ratio |
empirical_timing | 0.35 | Observed DIFF vs FULL speed ratio |
query_complexity | 0.10 | JOIN/aggregate/window count |
target_size | 0.10 | Target relation + index size |
index_coverage | 0.05 | Whether __pgt_row_id index exists |
latency_variance | 0.05 | DIFF latency p95/p50 ratio |
pgtrickle.refresh_efficiency()
Per-table refresh efficiency metrics. Returns operational statistics for every stream table — useful for monitoring dashboards and Grafana alerts.
Return columns:
| Column | Type | Description |
|---|---|---|
pgt_schema | text | Stream table schema |
pgt_name | text | Stream table name |
refresh_mode | text | Current refresh mode |
total_refreshes | bigint | Total completed refresh count |
diff_count | bigint | DIFFERENTIAL refresh count |
full_count | bigint | FULL refresh count |
avg_diff_ms | float8 | Average DIFFERENTIAL duration (ms) |
avg_full_ms | float8 | Average FULL duration (ms) |
avg_change_ratio | float8 | Average change ratio from history |
diff_speedup | text | Speedup factor (e.g. 12.3x) of FULL / DIFF timing |
last_refresh_at | text | Timestamp of last data refresh |
Example:
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
Export API (v0.14.0)
pgtrickle.export_definition(st_name TEXT)
Export a stream table's configuration as reproducible DDL. Returns a SQL
script containing DROP STREAM TABLE IF EXISTS followed by
SELECT pgtrickle.create_stream_table(...) with all configured options,
plus any ALTER STREAM TABLE calls for post-creation settings (tier,
fuse mode, etc.).
Parameters:
| Name | Type | Description |
|---|---|---|
st_name | text | Fully qualified or search-path-resolved stream table name. |
Returns: text — SQL script that recreates the stream table.
Example:
-- Export a single definition
SELECT pgtrickle.export_definition('public.orders_summary');
-- Export all definitions
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
dbt Integration (v0.13.0)
The dbt-pgtrickle package exposes two new config(...) keys added in
v0.13.0: partition_by and the fuse circuit-breaker options. Use them directly
in any stream_table materialization model.
For full dbt documentation see dbt-pgtrickle/README.md.
partition_by config
Partition the stream table's underlying storage table using PostgreSQL
PARTITION BY RANGE. Only applied at creation time — changing it after the
stream table exists has no effect (use --full-refresh to recreate).
-- models/marts/events_by_day.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL',
partition_by='event_day'
) }}
SELECT
event_day,
user_id,
COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id
pg_trickle creates a PARTITION BY RANGE (event_day) storage table with an
automatic default catch-all partition. Add named partitions via standard DDL:
CREATE TABLE analytics.events_by_day_2026
PARTITION OF analytics.events_by_day
FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
The partition_by value is stored in pgtrickle.pgt_stream_tables.st_partition_key
and visible via pgtrickle.stream_tables_info.
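To confirm the stored key after a dbt run, a quick catalog check (assumes the events_by_day model above):

```sql
SELECT pgt_name, st_partition_key
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'events_by_day';
```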
fuse config
The fuse circuit breaker suspends differential refreshes when the incoming
change volume exceeds a threshold, preventing runaway refresh cycles during
bulk ingestion. Fuse parameters are applied via alter_stream_table() on
every dbt run; they are a no-op if the values have not changed.
-- models/marts/order_totals.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
| Config key | Type | Default | Description |
|---|---|---|---|
fuse | 'off'|'on'|'auto' | null (no-op) | Fuse mode. 'auto' activates only when FULL refresh would be cheaper than DIFFERENTIAL. |
fuse_ceiling | integer | null | Change-count threshold (number of changed rows) that triggers the fuse. null uses the global pg_trickle.fuse_default_ceiling GUC. |
fuse_sensitivity | integer | null | Number of consecutive over-ceiling observations required before the fuse blows. null means 1 (blow immediately). |
Monitor fuse state via pgtrickle.dedup_stats() or check
pgtrickle.pgt_stream_tables.fuse_state directly:
SELECT pgt_name, fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity
FROM pgtrickle.pgt_stream_tables
WHERE fuse_mode != 'off';
Dog Feeding — Self-Monitoring (v0.20.0)
pg_trickle can monitor itself using its own stream tables. Five dog-feeding stream tables maintain reactive analytics over the internal catalog, replacing repeated full-scan diagnostic queries with continuously-maintained incremental views.
Quick Start
-- Create all five dog-feeding stream tables (idempotent).
SELECT pgtrickle.setup_dog_feeding();
-- Check status.
SELECT * FROM pgtrickle.dog_feeding_status();
-- View threshold recommendations (after 10+ refresh cycles).
SELECT * FROM pgtrickle.df_threshold_advice
WHERE confidence IN ('HIGH', 'MEDIUM');
-- View anomalies.
SELECT * FROM pgtrickle.df_anomaly_signals
WHERE duration_anomaly IS NOT NULL;
-- Enable auto-apply (optional).
SET pg_trickle.dog_feeding_auto_apply = 'threshold_only';
-- Clean up.
SELECT pgtrickle.teardown_dog_feeding();
pgtrickle.setup_dog_feeding()
Creates all five dog-feeding stream tables. Idempotent — safe to call multiple
times. Emits a warm-up warning if pgt_refresh_history has fewer than 50 rows.
Stream tables created:
| Name | Schedule | Mode | Purpose |
|---|---|---|---|
| pgtrickle.df_efficiency_rolling | 48s | AUTO | Rolling-window refresh statistics |
| pgtrickle.df_anomaly_signals | 48s | AUTO | Duration spikes, error bursts, mode oscillation |
| pgtrickle.df_threshold_advice | 96s | AUTO | Multi-cycle threshold recommendations |
| pgtrickle.df_cdc_buffer_trends | 48s | AUTO | CDC buffer growth rates per source |
| pgtrickle.df_scheduling_interference | 96s | FULL | Concurrent refresh overlap detection |
pgtrickle.teardown_dog_feeding()
Drops all dog-feeding stream tables. Safe with partial setups — missing tables are silently skipped. User stream tables are never affected.
pgtrickle.dog_feeding_status()
Returns the status of all five expected dog-feeding stream tables:
| Column | Type | Description |
|---|---|---|
| st_name | text | Stream table name |
| exists | bool | Whether the ST exists |
| status | text | Current status (ACTIVE, SUSPENDED, etc.) |
| refresh_mode | text | Effective refresh mode |
| last_refresh_at | text | Last successful refresh timestamp |
| total_refreshes | bigint | Total completed refreshes |
pgtrickle.scheduler_overhead()
Returns scheduler efficiency metrics for the last hour:
| Column | Type | Description |
|---|---|---|
| total_refreshes_1h | bigint | Total refreshes in the last hour |
| df_refreshes_1h | bigint | Dog-feeding refreshes in the last hour |
| df_refresh_fraction | float | Fraction of refreshes that are dog-feeding |
| avg_refresh_ms | float | Average refresh duration (ms) |
| avg_df_refresh_ms | float | Average DF refresh duration (ms) |
| total_refresh_time_s | float | Total time spent refreshing (seconds) |
| df_refresh_time_s | float | Time spent on DF refreshes (seconds) |
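For example, to confirm that self-monitoring stays a small share of scheduler work, the documented columns can be combined directly (a usage sketch, not a prescribed query):

```sql
-- What share of the last hour's refresh activity is self-monitoring?
SELECT df_refresh_fraction,
       df_refresh_time_s / NULLIF(total_refresh_time_s, 0) AS df_time_share,
       avg_df_refresh_ms - avg_refresh_ms AS df_latency_delta_ms
FROM pgtrickle.scheduler_overhead();
```

A df_time_share well under the df_refresh_fraction indicates the dog-feeding refreshes are cheaper than average user refreshes.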
pgtrickle.explain_dag(format)
Returns the full refresh DAG as a Mermaid markdown (default) or Graphviz DOT string. Node colours: user STs = blue, dog-feeding STs = green, suspended = red, fused = orange.
-- Mermaid format (default).
SELECT pgtrickle.explain_dag();
-- Graphviz DOT format.
SELECT pgtrickle.explain_dag('dot');
Auto-Apply Policy
The pg_trickle.dog_feeding_auto_apply GUC controls whether analytics can
automatically adjust stream table configuration:
| Value | Behaviour |
|---|---|
| off (default) | Advisory only — no automatic changes |
| threshold_only | Apply threshold recommendations when confidence is HIGH and delta > 5% |
| full | Also apply scheduling hints from interference analysis |
Auto-apply is rate-limited to at most one threshold change per stream table
per 10 minutes. Changes are logged to pgt_refresh_history with
initiated_by = 'DOG_FEED'.
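Because every auto-applied change is recorded with the documented initiated_by marker, the history table doubles as an audit log (a usage sketch over the documented catalog):

```sql
-- List refresh-history rows recorded by the auto-apply policy
SELECT *
FROM pgtrickle.pgt_refresh_history
WHERE initiated_by = 'DOG_FEED';
```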
Confidence Levels and Sparse History
df_threshold_advice assigns a confidence level to each recommendation:
| Confidence | Criteria | What to expect |
|---|---|---|
| HIGH | ≥ 20 total refreshes, ≥ 5 DIFFERENTIAL, ≥ 2 FULL | Reliable recommendation — auto-apply will act on this |
| MEDIUM | ≥ 10 total refreshes | Directionally useful, but may lack enough FULL/DIFF mix |
| LOW | < 10 total refreshes | Insufficient data — recommendation equals the current threshold |
When you see LOW confidence: This is normal during the first minutes after
setup_dog_feeding(). The stream tables need time to accumulate refresh
history. In typical deployments with a 1-minute schedule, expect:
- LOW for the first ~10 minutes
- MEDIUM after ~10 minutes
- HIGH after ~20 minutes (requires at least 2 FULL refreshes — these happen naturally when the auto-threshold triggers a mode switch)
If a stream table uses FULL mode exclusively, the advice will remain
at MEDIUM because no DIFFERENTIAL observations exist for comparison.
The sla_headroom_pct column shows how much faster DIFFERENTIAL is compared
to FULL as a percentage. A value of 70% means "DIFF is 70% faster than FULL".
This column is NULL when either FULL or DIFF observations are missing.
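Since sla_headroom_pct is NULL until both FULL and DIFFERENTIAL observations exist, filter it out when reading the advice (a sketch over the documented columns):

```sql
-- DIFFERENTIAL-vs-FULL headroom for recommendations with usable data
SELECT sla_headroom_pct, confidence
FROM pgtrickle.df_threshold_advice
WHERE sla_headroom_pct IS NOT NULL
  AND confidence IN ('HIGH', 'MEDIUM');
```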
Public API Stability Contract
Added in v0.19.0 (DB-6).
Stable (will not break without a major version bump)
| Surface | Guarantee |
|---|---|
| All functions in the pgtrickle schema documented in this reference | Signature and return type preserved across minor releases. New optional parameters may be added with defaults that preserve existing behaviour. |
| Catalog tables pgtrickle.pgt_stream_tables, pgtrickle.pgt_dependencies, pgtrickle.pgt_refresh_history | Existing columns are not renamed or removed. New columns may be added. |
| NOTIFY channels pg_trickle_refresh, pgtrickle_alert, pgtrickle_wake | Channel names and JSON payload structure preserved. New keys may be added to JSON payloads. |
| GUC names listed in docs/CONFIGURATION.md | Names preserved; default values may change between minor releases (documented in CHANGELOG). |
Unstable (may change in any release)
| Surface | Notes |
|---|---|
| Functions prefixed with _ (e.g. _signal_launcher_rescan) | Internal use only. |
| Catalog tables not listed above (e.g. pgt_scheduler_jobs, pgt_source_gates, pgt_watermarks) | Schema may change. |
| The pgtrickle_changes schema and its changes_* tables | CDC implementation detail; format may change. |
| SQL generated by the DVM engine (MERGE, delta CTEs) | Internal query structure is not an API. |
| The pgtrickle.pgt_schema_version table | Migration infrastructure; rows and schema may change. |
Versioning Policy
- Patch releases (0.x.Y): Bug fixes only. No breaking changes.
- Minor releases (0.X.0): New features. Stable API preserved; unstable surfaces may change. Breaking changes to stable API only with a deprecation cycle (WARNING for one release, removal in the next).
- Major release (1.0.0): Stable API locked. Breaking changes require a major version bump.
Configuration
Complete reference for all pg_trickle GUC (Grand Unified Configuration) variables.
Table of Contents
- Overview
- GUC Variables
- Essential
- WAL CDC
- Refresh Performance
- pg_trickle.differential_max_change_ratio
- pg_trickle.refresh_strategy
- pg_trickle.cost_model_safety_margin
- pg_trickle.max_delta_estimate_rows
- pg_trickle.planner_aggressive
- pg_trickle.merge_join_strategy
- pg_trickle.merge_strategy
- pg_trickle.merge_strategy_threshold
- pg_trickle.merge_planner_hints (deprecated)
- pg_trickle.merge_work_mem_mb
- pg_trickle.merge_seqscan_threshold
- pg_trickle.auto_backoff
- pg_trickle.tiered_scheduling
- pg_trickle.cleanup_use_truncate
- pg_trickle.use_prepared_statements
- pg_trickle.user_triggers
- Guardrails & Limits
- pg_trickle.block_source_ddl
- pg_trickle.buffer_alert_threshold
- pg_trickle.compact_threshold
- pg_trickle.max_buffer_rows
- pg_trickle.auto_index
- pg_trickle.aggregate_fast_path
- pg_trickle.template_cache
- pg_trickle.buffer_partitioning
- pg_trickle.max_grouping_set_branches
- pg_trickle.max_parse_depth
- pg_trickle.ivm_topk_max_limit
- pg_trickle.ivm_recursive_max_depth
- Parallel Refresh
- Advanced / Internal
- Guardrails & Diagnostics
- Connection Pooler
- History & Retention
- Circular Dependencies
- GUC Interaction Matrix
- Tuning Profiles
- Complete postgresql.conf Example
- Runtime Configuration
- Further Reading
Overview
pg_trickle exposes over forty configuration variables in the pg_trickle namespace. All can be set in postgresql.conf or at runtime via SET / ALTER SYSTEM.
Required postgresql.conf settings:
shared_preload_libraries = 'pg_trickle'
The extension must be loaded via shared_preload_libraries because it registers GUC variables and a background worker at startup.
Note:
wal_level = logical and max_replication_slots are recommended but not required. The default CDC mode (auto) uses lightweight row-level triggers initially and transparently transitions to WAL-based capture if wal_level = logical is available. If wal_level is not logical, pg_trickle stays on triggers permanently — no degradation, no errors. Set pg_trickle.cdc_mode = 'trigger' to disable WAL transitions entirely (see pg_trickle.cdc_mode).
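To check whether the transparent trigger → WAL transition can ever engage on your server, inspect the relevant settings (plain PostgreSQL, no pg_trickle required):

```sql
-- WAL-based CDC requires wal_level = logical and a free replication slot
SELECT current_setting('wal_level') AS wal_level,
       current_setting('max_replication_slots') AS max_replication_slots;
```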
GUC Variables
Essential
The settings most users configure at install time.
pg_trickle.enabled
Enable or disable the pg_trickle extension.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET (superuser) |
| Restart Required | No |
When set to false, the background scheduler stops processing refreshes. Existing stream tables remain in the catalog but are not refreshed. Manual pgtrickle.refresh_stream_table() calls still work.
-- Disable automatic refreshes
SET pg_trickle.enabled = false;
-- Re-enable
SET pg_trickle.enabled = true;
pg_trickle.cdc_mode
CDC (Change Data Capture) mechanism selection.
| Value | Description |
|---|---|
| 'auto' | (default) Use triggers at creation; transition to WAL-based CDC if wal_level = logical. Falls back to triggers automatically on error. |
| 'trigger' | Always use row-level triggers for change capture |
| 'wal' | Require WAL-based CDC (fails if wal_level != logical) |
Default: 'auto'
pg_trickle.cdc_mode only affects deferred refresh modes ('AUTO', 'FULL',
and 'DIFFERENTIAL'). refresh_mode = 'IMMEDIATE' bypasses CDC entirely and
always uses statement-level IVM triggers. If the GUC is set to 'wal' when a
stream table is created or altered to IMMEDIATE, pg_trickle logs an INFO and
continues with IVM triggers instead of creating CDC triggers or WAL slots.
Per-stream-table overrides take precedence over the GUC when you pass
cdc_mode => 'auto' | 'trigger' | 'wal' to
pgtrickle.create_stream_table(...) or pgtrickle.alter_stream_table(...).
The override is stored in pgtrickle.pgt_stream_tables.requested_cdc_mode.
For shared source tables, pg_trickle resolves the effective source-level CDC
mechanism conservatively: any dependent stream table that requests 'trigger'
keeps the source on trigger CDC; otherwise 'wal' wins over 'auto'.
-- Enable automatic trigger → WAL transition (default)
SET pg_trickle.cdc_mode = 'auto';
-- Force trigger-only CDC (disable WAL transitions)
SET pg_trickle.cdc_mode = 'trigger';
-- Require WAL-based CDC (error if wal_level != logical)
SET pg_trickle.cdc_mode = 'wal';
pg_trickle.scheduler_interval_ms
How often the background scheduler checks for stream tables that need refreshing.
| Property | Value |
|---|---|
| Type | int |
| Default | 1000 (1 second) |
| Range | 100 – 60000 (100ms to 60s) |
| Context | SUSET |
| Restart Required | No |
Tuning Guidance:
- Low-latency workloads (sub-second schedules): Set to 100–500.
- Standard workloads (minute-scale schedules): The default 1000 is appropriate.
- Low-overhead workloads (many STs with long schedules): Increase to 5000–10000 to reduce scheduler overhead.
The scheduler interval does not determine refresh frequency — it determines how often the scheduler checks whether any ST's staleness exceeds its schedule (or whether a cron expression has fired). The actual refresh frequency is governed by schedule (duration or cron) and canonical period alignment.
SET pg_trickle.scheduler_interval_ms = 500;
pg_trickle.event_driven_wake
Enable event-driven scheduler wake via LISTEN/NOTIFY. When enabled, CDC triggers emit pg_notify('pgtrickle_wake', '') after writing to the change buffer, and the scheduler LISTENs on that channel, waking immediately instead of waiting for the full scheduler_interval_ms poll. This reduces median end-to-end latency from ~500 ms to ~15 ms for low-volume workloads.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
Tuning Guidance:
- Low-latency workloads: Leave enabled (default) for the best latency.
- Extreme write throughput (>100K DML/s): Consider disabling if the per-statement NOTIFY overhead is measurable. The NOTIFY is coalesced by PostgreSQL (one notification per transaction), so the actual overhead is negligible for most workloads.
-- Disable event-driven wake (fall back to poll-only)
SET pg_trickle.event_driven_wake = off;
pg_trickle.wake_debounce_ms
After the scheduler receives the first pgtrickle_wake notification, it waits this many milliseconds to coalesce rapidly arriving notifications before starting a refresh tick. Lower values reduce latency; higher values reduce wake overhead during bulk DML.
| Property | Value |
|---|---|
| Type | int |
| Default | 10 (10 milliseconds) |
| Range | 1 – 5000 |
| Context | SUSET |
| Restart Required | No |
Tuning Guidance:
- Single-statement, latency-sensitive workloads: Use 1–5 ms.
- Bulk DML workloads: Use 50–200 ms to coalesce more notifications per tick.
- The default (10 ms) balances sub-20 ms latency with reasonable coalescing.
SET pg_trickle.wake_debounce_ms = 50;
pg_trickle.min_schedule_seconds
Minimum allowed schedule value (in seconds) when creating or altering a stream table with a duration-based schedule. This limit does not apply to cron expressions.
| Property | Value |
|---|---|
| Type | int |
| Default | 1 (1 second) |
| Range | 1 – 86400 (1 second to 24 hours) |
| Context | SUSET |
| Restart Required | No |
This acts as a safety guardrail to prevent users from setting impractically small schedules that would cause excessive refresh overhead.
Tuning Guidance:
- Development/testing: The default 1 allows sub-second testing.
- Production: Raise to 60 or higher to prevent excessive WAL consumption and CPU usage.
-- Restrict to 10-second minimum schedules
SET pg_trickle.min_schedule_seconds = 10;
pg_trickle.default_schedule_seconds
Default effective schedule (in seconds) for isolated CALCULATED stream tables that have no downstream dependents.
| Property | Value |
|---|---|
| Type | int |
| Default | 1 (1 second) |
| Range | 1 – 86400 (1 second to 24 hours) |
| Context | SUSET |
| Restart Required | No |
When a CALCULATED stream table (scheduled with 'calculated') has no downstream dependents to derive a schedule from, this value is used as its effective refresh interval. This is distinct from min_schedule_seconds, which is the validation floor for duration-based schedules.
Tuning Guidance:
- Development/testing: The default 1 allows rapid iteration.
- Production standalone CALCULATED tables: Raise to match your desired update cadence (e.g., 60 for once-per-minute).
-- Set default for isolated CALCULATED tables to 30 seconds
SET pg_trickle.default_schedule_seconds = 30;
pg_trickle.max_consecutive_errors
Maximum consecutive refresh failures before a stream table is moved to ERROR status.
| Property | Value |
|---|---|
| Type | int |
| Default | 3 |
| Range | 1 – 100 |
| Context | SUSET |
| Restart Required | No |
When a ST's consecutive_errors reaches this threshold:
- The ST status changes to ERROR.
- Automatic refreshes stop for this ST.
- Manual intervention is required: SELECT pgtrickle.alter_stream_table('...', status => 'ACTIVE').
Tuning Guidance:
- Strict (production): 3 — fail fast to surface issues.
- Lenient (development): 10–20 — tolerate transient errors.
SET pg_trickle.max_consecutive_errors = 5;
WAL CDC
Settings specific to WAL-based CDC. Only relevant when pg_trickle.cdc_mode = 'auto' or 'wal'.
pg_trickle.wal_transition_timeout
Note: This setting is only relevant when pg_trickle.cdc_mode = 'auto' or 'wal'. See ARCHITECTURE.md for the full CDC transition lifecycle.
Maximum time (seconds) to wait for the WAL decoder to catch up during the transition from trigger-based to WAL-based CDC. If the decoder has not caught up within this timeout, the system falls back to triggers.
Default: 300 (5 minutes)
Range: 10 – 3600
SET pg_trickle.wal_transition_timeout = 300;
pg_trickle.slot_lag_warning_threshold_mb
Warning threshold for retained WAL on pg_trickle replication slots.
| Property | Value |
|---|---|
| Type | int |
| Default | 100 (MB) |
| Range | 1 – 1048576 |
| Context | SUSET |
| Restart Required | No |
When retained WAL for a pg_trickle replication slot exceeds this threshold:
- The scheduler emits a slot_lag_warning event on the pgtrickle_alert NOTIFY channel.
- pgtrickle.health_check() reports WARN for the slot_lag check.
Raise this on high-throughput systems that intentionally tolerate larger WAL retention. Lower it if you want earlier warning before slots risk invalidation.
SET pg_trickle.slot_lag_warning_threshold_mb = 256;
pg_trickle.slot_lag_critical_threshold_mb
Critical threshold for retained WAL on pg_trickle replication slots.
| Property | Value |
|---|---|
| Type | int |
| Default | 1024 (MB) |
| Range | 1 – 1048576 |
| Context | SUSET |
| Restart Required | No |
When retained WAL for a pg_trickle replication slot exceeds this threshold,
pgtrickle.check_cdc_health() returns a per-source
slot_lag_exceeds_threshold alert.
This threshold is intentionally higher than the warning threshold so operators can separate early warning from source-level unhealthy state.
SET pg_trickle.slot_lag_critical_threshold_mb = 2048;
Refresh Performance
Fine-grained tuning for the differential refresh engine.
pg_trickle.differential_max_change_ratio
Maximum change-to-table ratio before DIFFERENTIAL refresh falls back to FULL refresh.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.15 (15%) |
| Range | 0.0 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When the number of pending change buffer rows exceeds this fraction of the source table's estimated row count, the refresh engine switches from DIFFERENTIAL (which uses JSONB parsing and window functions) to FULL refresh. At high change rates FULL refresh is cheaper because it avoids the per-row JSONB overhead.
Special Values:
- 0.0: Disable adaptive fallback — always use DIFFERENTIAL.
- 1.0: Always fall back to FULL (effectively forces FULL mode).
Tuning Guidance:
- OLTP with low change rates (< 5%): The default 0.15 is appropriate.
- Batch-load workloads (bulk inserts): Lower to 0.05–0.10 so large batches trigger FULL refresh sooner.
- Latency-sensitive (want deterministic refresh time): Set to 0.0 to always use DIFFERENTIAL.
-- Lower threshold for batch-heavy workloads
SET pg_trickle.differential_max_change_ratio = 0.10;
-- Disable adaptive fallback
SET pg_trickle.differential_max_change_ratio = 0.0;
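Before tuning the ratio, it can help to see roughly where a source table sits today by comparing the pending buffer count to the planner's row estimate. This is only a diagnostic sketch: changes_orders stands in for the internal change buffer of a hypothetical orders table, and the pgtrickle_changes schema is documented above as an unstable surface that may change between releases.

```sql
-- Approximate change-to-table ratio for a hypothetical 'orders' source
SELECT (SELECT count(*) FROM pgtrickle_changes.changes_orders)::float8
       / NULLIF((SELECT reltuples FROM pg_class
                 WHERE relname = 'orders'), 0) AS change_ratio;
```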
pg_trickle.refresh_strategy
Cluster-wide refresh strategy override.
| Property | Value |
|---|---|
| Type | string |
| Default | 'auto' |
| Values | 'auto', 'differential', 'full' |
| Context | SUSET |
| Restart Required | No |
Controls the FULL vs. DIFFERENTIAL decision for all stream tables whose refresh_mode is DIFFERENTIAL:
- 'auto' (default): Use the adaptive cost-based heuristic that considers differential_max_change_ratio, per-ST auto_threshold, refresh history, and spill detection to pick the optimal strategy per refresh cycle.
- 'differential': Always use DIFFERENTIAL refresh — skip the adaptive ratio check entirely. The BUF-LIMIT safety check (max_buffer_rows) still applies.
- 'full': Always use FULL refresh regardless of change volume. Useful for debugging or when you know DIFFERENTIAL is consistently slower for your workload.
Important: Per-ST refresh_mode in the catalog takes precedence. Stream tables explicitly configured as refresh_mode = 'FULL' always use FULL regardless of this GUC.
Tuning Guidance:
- Most workloads: Leave at 'auto' — the adaptive heuristic learns from refresh history.
- Known-low-churn workloads: Set to 'differential' to eliminate the per-source capped-count query overhead.
- Debugging delta issues: Temporarily set to 'full' to compare behavior.
-- Force DIFFERENTIAL for all stream tables (skip ratio check)
SET pg_trickle.refresh_strategy = 'differential';
-- Force FULL for all stream tables (debugging)
SET pg_trickle.refresh_strategy = 'full';
-- Reset to adaptive heuristic
SET pg_trickle.refresh_strategy = 'auto';
pg_trickle.cost_model_safety_margin
Added in v0.17.0. Safety margin for the predictive cost model that decides FULL vs. DIFFERENTIAL.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.8 |
| Range | 0.1 – 2.0 |
| Context | SUSET |
| Restart Required | No |
When refresh_strategy = 'auto', the cost model estimates DIFFERENTIAL and FULL costs from recent refresh history. DIFFERENTIAL is chosen when:
estimated_diff_cost < estimated_full_cost × safety_margin
A value below 1.0 biases toward DIFFERENTIAL (which has lower lock contention and is generally preferred). A value above 1.0 biases toward FULL.
The cost model also classifies each stream table's query complexity (scan, filter, aggregate, join, or join+aggregate) and uses per-class coefficients learned from historical data.
Tuning Guidance:
- 0.8 (default): Prefer DIFFERENTIAL unless it's nearly as expensive as FULL.
- 0.5: Strongly prefer DIFFERENTIAL — only fall back when it's clearly more expensive.
- 1.0: Neutral — pick whichever is estimated to be cheaper.
- 1.2: Slightly prefer FULL — useful when FULL is very fast and DIFFERENTIAL lock contention is a concern.
-- Strongly prefer DIFFERENTIAL
SET pg_trickle.cost_model_safety_margin = 0.5;
-- Neutral (pick the estimated cheapest)
SET pg_trickle.cost_model_safety_margin = 1.0;
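A quick worked example of the decision rule: with a FULL estimate of 200 ms and the default margin of 0.8, DIFFERENTIAL is chosen only while its estimate stays under 160 ms.

```sql
-- estimated_diff_cost < estimated_full_cost × safety_margin
SELECT 150.0 < 200.0 * 0.8 AS choose_differential;  -- true (150 < 160)
```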
pg_trickle.max_delta_estimate_rows
Added in v0.15.0. Maximum estimated delta output cardinality before falling back to FULL refresh.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled) |
| Range | 0 – 10,000,000 |
| Context | SUSET |
| Restart Required | No |
Before executing the MERGE, the refresh executor extracts the delta subquery and runs a capped SELECT count(*) FROM (delta LIMIT N+1). If the count reaches the configured limit, the refresh emits a NOTICE and falls back to FULL refresh to prevent OOM or excessive temp-file spills from unexpectedly large delta output.
This is complementary to differential_max_change_ratio which checks input change buffer size as a ratio of source table size. max_delta_estimate_rows checks output cardinality — catching cases where a small number of input changes produce a large delta output after JOINs.
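The capped count means the check never scans more than N+1 delta rows, so its own cost is bounded even when the delta is huge. The pattern is illustrated below with generate_series standing in for the delta subquery:

```sql
-- With a limit of 100000, count at most 100001 rows and then stop
SELECT count(*) >= 100001 AS over_limit
FROM (SELECT 1 FROM generate_series(1, 1000000) LIMIT 100001) capped;
```

Here over_limit is true, so a real refresh would emit the NOTICE and fall back to FULL.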
Special Values:
- 0 (default): Disable the estimation check entirely.
Tuning Guidance:
- Servers with 8–16 GB RAM: Start with 100000 and adjust based on observed refresh behavior.
- Large-memory servers (32+ GB): 500000 or higher.
- Complex multi-join queries: Lower to 50000 since join fan-out can amplify small changes.
-- Enable delta output estimation with 100K row limit
SET pg_trickle.max_delta_estimate_rows = 100000;
-- Disable estimation (default)
SET pg_trickle.max_delta_estimate_rows = 0;
pg_trickle.cleanup_use_truncate
Use TRUNCATE instead of per-row DELETE for change buffer cleanup when the entire buffer is consumed by a refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
After a differential refresh consumes all rows from the change buffer, the engine must clean up the buffer table. TRUNCATE is O(1) regardless of row count, versus DELETE which must update indexes row-by-row. This saves 3–5 ms per refresh at 10%+ change rates.
Trade-off: TRUNCATE acquires an AccessExclusiveLock on the change buffer table. If concurrent DML on the source table is actively inserting into the same change buffer via triggers, this lock can cause brief contention.
Tuning Guidance:
- Most workloads: Leave at true — the performance benefit outweighs the brief lock.
- High-concurrency OLTP with continuous writes during refresh: Set to false if you observe lock-wait timeouts on the change buffer.
- PgBouncer / connection poolers: The AccessExclusiveLock acquired by TRUNCATE is held only on the change buffer table (not the source table), but in transaction-pooling mode with frequent refreshes, even brief exclusive locks can cause connection queuing. If you observe elevated pg_stat_activity wait events on change buffer tables, switch to false.
-- Use per-row DELETE for change buffer cleanup
SET pg_trickle.cleanup_use_truncate = false;
pg_trickle.planner_aggressive
Added in v0.14.0. Consolidated switch for all MERGE planner hints. Replaces the deprecated merge_planner_hints GUC.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:
- Delta ≥ 100 rows: SET LOCAL enable_nestloop = off — forces hash joins instead of nested-loop joins.
- Delta ≥ 10,000 rows: additionally SET LOCAL work_mem = '<N>MB' (see pg_trickle.merge_work_mem_mb).
Tuning Guidance:
- Most workloads: Leave at true — the hints improve tail latency without affecting small deltas.
- Custom plan overrides: Set to false if you manage planner settings yourself or if the hints conflict with your pg_hint_plan configuration.
- Memory-constrained environments: When enabled, large deltas (≥ 10,000 rows) raise work_mem to 64 MB (configurable via merge_work_mem_mb). If your server has limited RAM and runs many concurrent refreshes, this can cause unexpected memory pressure or temp-file spills. Monitor temp_blks_written in pg_stat_statements and consider lowering merge_work_mem_mb or disabling this GUC if spills are frequent.
-- Disable all planner hints
SET pg_trickle.planner_aggressive = false;
pg_trickle.merge_join_strategy
Added in v0.15.0. Manual override for the join strategy used during MERGE execution.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | auto, hash_join, nested_loop, merge_join |
| Context | SUSET |
| Restart Required | No |
Controls which join strategy the refresh executor hints to PostgreSQL via SET LOCAL during differential refresh. Requires planner_aggressive to be enabled.
| Value | Behaviour |
|---|---|
| auto (default) | Delta-size heuristics choose: nested-loop for tiny deltas, hash-join for larger ones |
| hash_join | Always disable nested-loop joins and raise work_mem — best for medium-to-large deltas |
| nested_loop | Always disable hash-join and merge-join — best for very small deltas against indexed tables |
| merge_join | Always disable hash-join and nested-loop — useful if data is pre-sorted |
Tuning Guidance:
- Most workloads: Leave at auto — the built-in heuristic performs well.
- Consistently large deltas (1K+ rows): Setting to hash_join avoids heuristic overhead.
- Troubleshooting: If refresh is slow, try different strategies and compare with explain_st().
-- Force hash joins for all MERGE operations
SET pg_trickle.merge_join_strategy = 'hash_join';
-- Revert to automatic heuristics
SET pg_trickle.merge_join_strategy = 'auto';
pg_trickle.merge_strategy
Added in v0.16.0. Controls how differential refresh applies deltas to stream tables.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | auto, merge |
| Context | SUSET |
| Restart Required | No |
| Value | Behaviour |
|---|---|
| auto (default) | Use DELETE+INSERT when delta_rows / target_rows is below merge_strategy_threshold; MERGE otherwise |
| merge | Always use the PostgreSQL MERGE statement |
Breaking change (v0.19.0): The delete_insert value was removed in v0.19.0 (CORR-1) because it was semantically unsafe for aggregate and DISTINCT queries. Setting it now logs a WARNING and falls back to auto.
The DELETE+INSERT strategy avoids the MERGE join cost by executing two targeted statements:
a DELETE for removed rows (matched by __pgt_row_id), then an INSERT for new rows.
This is significantly cheaper for sub-1% deltas against large tables because it avoids
scanning the entire target for the MERGE join.
Tuning Guidance:
- Most workloads: Leave at auto — the heuristic picks DELETE+INSERT for small deltas automatically.
- Correctness concerns: The merge setting preserves the pre-v0.16.0 behaviour.
-- Force MERGE for all differential refreshes
SET pg_trickle.merge_strategy = 'merge';
-- Revert to automatic heuristics
SET pg_trickle.merge_strategy = 'auto';
pg_trickle.merge_strategy_threshold
Added in v0.16.0. Delta ratio threshold for the auto merge strategy.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.01 (1%) |
| Range | 0.001 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When merge_strategy is auto, DELETE+INSERT is used instead of
MERGE when delta_rows / target_rows is below this threshold. The target row count is estimated
from pg_class.reltuples.
Tuning Guidance:
- Default (0.01): DELETE+INSERT for deltas under 1% of the target table size.
- Higher values (0.05–0.10): More aggressive use of DELETE+INSERT; useful for wide tables where MERGE join overhead is high.
- Lower values (0.001): Only use DELETE+INSERT for very tiny deltas.
-- Use DELETE+INSERT for deltas under 5% of target size
SET pg_trickle.merge_strategy_threshold = 0.05;
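The target row count comes from the planner statistic rather than an exact count, so the decision itself is cheap. You can read the same estimate yourself (the table name below is illustrative):

```sql
-- Planner row estimate used as the denominator of the delta ratio
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'order_totals';  -- hypothetical stream table
```

Note that reltuples is only as fresh as the last ANALYZE or autovacuum pass on the table.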
pg_trickle.merge_planner_hints
Deprecated in v0.14.0. Use pg_trickle.planner_aggressive instead. This GUC is still accepted for backward compatibility but is ignored at runtime.
Inject SET LOCAL planner hints before MERGE execution during differential refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor estimates the delta size and applies optimizer hints within the transaction:
- Delta ≥ 100 rows: SET LOCAL enable_nestloop = off — forces hash joins instead of nested-loop joins.
- Delta ≥ 10,000 rows: additionally SET LOCAL work_mem = '<N>MB' (see pg_trickle.merge_work_mem_mb).
This reduces P95 latency spikes caused by PostgreSQL choosing nested-loop plans for medium/large delta sizes.
Tuning Guidance:
- Most workloads: Leave at true — the hints improve tail latency without affecting small deltas.
- Custom plan overrides: Set to false if you manage planner settings yourself or if the hints conflict with your pg_hint_plan configuration.
-- Disable planner hints
SET pg_trickle.merge_planner_hints = false;
pg_trickle.merge_work_mem_mb
work_mem value (in MB) applied via SET LOCAL when the delta exceeds 10,000 rows and planner hints are enabled.
| Property | Value |
|---|---|
| Type | int |
| Default | 64 (64 MB) |
| Range | 8 – 4096 (8 MB to 4 GB) |
| Context | SUSET |
| Restart Required | No |
A higher value lets PostgreSQL use larger in-memory hash tables for the MERGE join, avoiding disk-spilling sort/merge strategies on large deltas. This setting is only applied when planner_aggressive = true and the delta exceeds 10,000 rows.
Tuning Guidance:
- Servers with ample RAM (32+ GB): Increase to 128–256 for faster large-delta refreshes.
- Memory-constrained: Lower to 16–32 or disable planner hints entirely.
- Very large deltas (100K+ rows): Consider 256–512 if refresh latency matters.
SET pg_trickle.merge_work_mem_mb = 128;
pg_trickle.delta_work_mem_cap_mb
Maximum work_mem (in MB) that planner hints are allowed to set during delta MERGE execution. When the deep-join or large-delta path would set work_mem above this cap, the refresh falls back to FULL instead of risking OOM.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled — no cap) |
| Range | 0 – 8192 (0 to 8 GB) |
| Context | SUSET |
| Restart Required | No |
Set to 0 to disable the cap entirely (default). When enabled, the cap is checked before any SET LOCAL work_mem in apply_planner_hints(). If the configured or computed work_mem exceeds the cap, the refresh emits a NOTICE and falls back to FULL refresh.
Tuning Guidance:
- Production servers with tight memory budgets: Set to 256–512 to prevent runaway hash joins.
- Servers with ample RAM (64+ GB): Leave at 0 (disabled) or set high (2048+).
- If you see SCAL-3 fallback notices: Either raise the cap or investigate why delta sizes are unexpectedly large.
SET pg_trickle.delta_work_mem_cap_mb = 512;
pg_trickle.merge_seqscan_threshold
Delta-to-ST row ratio below which sequential scans are disabled for the MERGE transaction. Requires planner hints to be enabled.
| Property | Value |
|---|---|
| Type | real |
| Default | 0.001 |
| Range | 0.0 – 1.0 |
| Context | SUSET |
| Restart Required | No |
When the estimated delta row count divided by the stream table's reltuples falls below this threshold, the refresh executor issues SET LOCAL enable_seqscan = off, coercing PostgreSQL into using the __pgt_row_id B-tree index instead of a full sequential scan.
Set to 0.0 to disable the feature entirely.
Tuning Guidance:
- Default (0.001): Suitable for most workloads. A 10M-row ST with fewer than 10K delta rows triggers the hint.
- High-throughput / small STs: Increase to 0.01 if your STs are small and you want more aggressive index usage.
- Disable: Set to 0.0 if index-only scans are not beneficial for your access pattern.
SET pg_trickle.merge_seqscan_threshold = 0.01;
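Because the hint compares the estimated delta size against PostgreSQL's reltuples estimate for the stream table, you can sanity-check whether a given delta would trip the threshold yourself. A sketch, where 'my_st' and the 5,000-row delta are placeholder values:

```sql
-- Would a 5,000-row delta disable seqscans for stream table my_st
-- at the default threshold of 0.001?
SELECT 5000 / greatest(reltuples, 1) < 0.001 AS hint_would_fire
FROM pg_class
WHERE relname = 'my_st';
```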
pg_trickle.auto_backoff
Automatically back off the refresh schedule when a stream table is consistently falling behind.
| Property | Value |
|---|---|
| Type | bool |
| Default | on |
| Context | SUSET |
| Restart Required | No |
When enabled (the default), the scheduler tracks a per-stream-table backoff factor. If a
refresh cycle takes more than 95% of the scheduled interval, the backoff factor doubles
(capped at 8×), effectively stretching the schedule to avoid runaway refresh storms.
The factor resets to 1× on the first on-time completion, and a WARNING is emitted whenever
the factor changes so you always know why a stream table is refreshing more slowly than expected.
With the 95% trigger threshold, a refresh that consumes nearly its whole interval (e.g. a 950 ms
refresh on a 1-second schedule) engages backoff, while a 900 ms refresh on the same schedule
does not. The EC-11 operator alert (the scheduler_falling_behind NOTIFY) continues to fire at
the lower 80% threshold, giving you advance warning before the scheduler is actually stuck.
This is a safety net for overloaded systems — it prevents a single slow stream table from monopolizing the background worker when operators are not available to intervene.
Tuning Guidance:
- Leave on (the default) for both production and development environments.
- Disable only if you are deliberately running stream tables at the limit of their schedule budget and want the scheduler to keep trying at full speed regardless.
-- Disable if you want no backoff (not recommended for production)
SET pg_trickle.auto_backoff = off;
pg_trickle.tiered_scheduling
Enable tiered refresh scheduling (Hot/Warm/Cold/Frozen) for stream tables.
| Property | Value |
|---|---|
| Type | bool |
| Default | on |
| Context | SUSET |
| Restart Required | No |
When enabled, the scheduler applies a per-stream-table refresh tier multiplier
to duration-based schedules. Each stream table has a refresh_tier column
(default 'hot') that controls how often it is refreshed relative to its
configured schedule:
| Tier | Multiplier | Effect |
|---|---|---|
| hot | 1× | Refresh at the configured schedule (default) |
| warm | 2× | Refresh at 2× the configured interval |
| cold | 10× | Refresh at 10× the configured interval |
| frozen | skip | Never refreshed until manually promoted |
Cron-based schedules are not affected by the tier multiplier.
Set the tier via:
SELECT pgtrickle.alter_stream_table('my_table', tier => 'warm');
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');
Design note: Tiers are user-assigned only. Automatic classification from
pg_stat_user_tables was rejected because pg_trickle's own MERGE scans
pollute the read counters, making auto-classification unreliable.
Tier Thresholds Reference
The following table summarizes the effective refresh behavior for each tier.
All multipliers apply to duration-based schedules only — cron-based
schedules are always honored as-is. New stream tables default to hot.
| Tier | Multiplier | Effective Schedule (1 s base) | Use Case |
|---|---|---|---|
| hot | 1× | 1 s | Real-time dashboards, alerting tables, SLA-bound queries |
| warm | 2× | 2 s | Important but not latency-critical tables; reduces CPU by 50% |
| cold | 10× | 10 s | Reporting tables queried infrequently; saves significant CPU |
| frozen | skip | never (until promoted) | Archival tables, tables under maintenance, or seasonal reports |
When to use each tier:
- Hot — default for all new stream tables. Appropriate when downstream consumers expect near-real-time freshness.
- Warm — set for tables where a few seconds of staleness is acceptable. Halves the refresh CPU cost compared to Hot.
- Cold — set for tables queried only by batch jobs or low-frequency dashboards. 10× reduction in refresh overhead.
- Frozen — set when a table should not be refreshed at all (e.g., during a maintenance window or when the upstream source is being migrated). Promote back to Hot/Warm/Cold when ready.
-- Promote a frozen table back to warm
SELECT pgtrickle.alter_stream_table('seasonal_report', tier => 'warm');
-- Freeze a table during maintenance
SELECT pgtrickle.alter_stream_table('my_table', tier => 'frozen');
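Tiers compose with the schedule chosen at creation time. A sketch, reusing the orders table from the quick-start example:

```sql
-- Base schedule 30 s; demoting to cold stretches the effective
-- refresh interval to 300 s (30 s × 10) without changing the schedule.
SELECT pgtrickle.create_stream_table(
    name     => 'orders_by_status',
    query    => 'SELECT status, count(*) FROM orders GROUP BY status',
    schedule => '30s'
);
SELECT pgtrickle.alter_stream_table('orders_by_status', tier => 'cold');
```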
Changed in v0.12.0: The default for pg_trickle.tiered_scheduling changed from 'off' to 'on'. Set pg_trickle.tiered_scheduling = off in postgresql.conf to restore the pre-v0.12.0 behavior (all STs refresh at full speed regardless of tier assignment).
Diamond Schedule Policy (per-stream-table)
Controls how the scheduler fires diamond consistency groups — sets of stream tables that share upstream sources through a diamond-shaped DAG topology.
| Property | Value |
|---|---|
| Column | diamond_schedule_policy in pgt_stream_tables |
| Values | 'fastest' (default), 'slowest' |
| Set via | create_stream_table(..., diamond_schedule_policy => 'slowest') |
| Alter via | alter_stream_table('name', diamond_schedule_policy => 'slowest') |
Only meaningful when diamond_consistency = 'atomic' is also set.
fastest (default): The atomic group fires when any member is due.
This maximizes freshness but can cause CPU multiplication. In an asymmetric
diamond where stream table B refreshes every 1 s and stream table C every 5 s,
both feeding D with diamond_consistency = 'atomic': C refreshes 5× more
often than its schedule because B triggers the group every second. For N
members with schedules S₁ < S₂ < … < Sₙ, the total refresh count is
N × (cycle_time / S₁), meaning slower members do up to Sₙ/S₁ times more work
than their schedule implies.
slowest: The atomic group fires only when all members are due.
This minimizes CPU cost at the expense of freshness — faster members are held
back until the slowest member's schedule fires.
Tuning Guidance:
- Use 'fastest' when freshness of the diamond tip matters and the cost of extra refreshes is acceptable.
- Use 'slowest' when CPU budget is tight or members have very different schedules (e.g., 1 s vs 60 s) and the multiplication would be excessive.
-- Create with slowest policy to avoid CPU multiplication
SELECT pgtrickle.create_stream_table(
'my_diamond_tip',
'SELECT ... FROM a JOIN b ...',
diamond_consistency => 'atomic',
diamond_schedule_policy => 'slowest'
);
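To make the trade-off concrete, consider a hypothetical two-member group with schedules of 1 s and 60 s feeding an atomic tip:

```sql
-- 'fastest': the group fires every 1 s, so the 60 s member refreshes
-- 60× more often than its schedule implies.
-- 'slowest': the group fires every 60 s; the 1 s member is held back.
-- Existing tips can be switched without recreating them:
SELECT pgtrickle.alter_stream_table('my_diamond_tip',
    diamond_schedule_policy => 'slowest');
```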
pg_trickle.use_prepared_statements
Use SQL PREPARE / EXECUTE for MERGE statements during differential refresh.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the refresh executor issues PREPARE __pgt_merge_{id} on the first cache-hit cycle, then uses EXECUTE on subsequent cycles. After approximately 5 executions, PostgreSQL switches from a custom plan to a generic plan, saving 1–2 ms of parse/plan overhead per refresh.
Tuning Guidance:
- Most workloads: Leave at true; the cumulative parse/plan savings are significant for frequently refreshed stream tables.
- Highly skewed data: Set to false if prepared-statement parameter sniffing produces poor plans (e.g., highly skewed LSN distributions causing bad join estimates).
-- Disable prepared statements
SET pg_trickle.use_prepared_statements = false;
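Before disabling prepared statements outright, note that stock PostgreSQL offers a middle ground: the standard plan_cache_mode server setting (not part of pg_trickle) keeps the parse-time savings while preventing the switch to a generic plan:

```sql
-- Keep pg_trickle's prepared statements but stop PostgreSQL from
-- switching to a generic plan after ~5 executions:
SET plan_cache_mode = force_custom_plan;
```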
pg_trickle.user_triggers
Control how user-defined triggers on stream tables are handled during refresh.
| Property | Value |
|---|---|
| Type | text |
| Default | 'auto' |
| Values | 'auto', 'off' ('on' accepted as deprecated alias for 'auto') |
| Context | SUSET |
| Restart Required | No |
When a stream table has user-defined row-level triggers, the refresh engine can decompose the MERGE into explicit DELETE + UPDATE + INSERT statements so triggers fire with correct TG_OP, OLD, and NEW values.
Values:
- 'auto' (default): Automatically detect user triggers on the stream table. If present, use the explicit DML path; otherwise use MERGE.
- 'off': Always use MERGE. User triggers are suppressed during refresh. This is the escape hatch if explicit DML causes issues.
- 'on': Deprecated compatibility alias for 'auto'. Existing configs continue to work, but new configs should use 'auto'.
Notes:
- Row-level triggers do not fire during FULL refresh regardless of this setting. FULL refresh uses DISABLE TRIGGER USER / ENABLE TRIGGER USER to suppress them.
- The explicit DML path adds ~25–60% overhead compared to MERGE for affected stream tables.
- Stream tables without user triggers have zero overhead under 'auto' (only a fast pg_trigger check).
-- Auto-detect (default)
SET pg_trickle.user_triggers = 'auto';
-- Suppress triggers, use MERGE
SET pg_trickle.user_triggers = 'off';
-- Backward-compatible legacy setting (treated the same as 'auto')
SET pg_trickle.user_triggers = 'on';
Guardrails & Limits
Safety controls and hard limits.
pg_trickle.block_source_ddl
When enabled, column-affecting DDL (e.g., ALTER TABLE ... DROP COLUMN,
ALTER TABLE ... ALTER COLUMN ... TYPE) on source tables tracked by stream
tables is blocked with an ERROR instead of silently marking stream tables
for reinitialization.
This is useful in production environments where you want to prevent accidental schema changes that would trigger expensive full recomputation of downstream stream tables.
Default: false
Context: Superuser
-- Block column-affecting DDL on tracked source tables
SET pg_trickle.block_source_ddl = true;
-- Allow DDL (stream tables will be marked for reinit instead)
SET pg_trickle.block_source_ddl = false;
Note: Only column-affecting changes are blocked. Benign DDL (adding indexes, comments, constraints) is always allowed regardless of this setting.
pg_trickle.buffer_alert_threshold
When any source table's change buffer exceeds this number of rows, a
BufferGrowthWarning alert is emitted. Raise for high-throughput workloads,
lower for small tables.
Default: 1000000 (1 million rows)
Range: 1000 – 100000000
SET pg_trickle.buffer_alert_threshold = 500000;
pg_trickle.compact_threshold
When a source table's pending change buffer exceeds this many rows,
compaction is triggered before the next refresh cycle. Compaction eliminates
net-zero INSERT+DELETE pairs (rows inserted then deleted within the same
refresh window) and collapses multi-change groups to first+last rows per
pk_hash, reducing delta scan overhead by 50–90% for high-churn tables.
Set to 0 to disable compaction.
Default: 100000 (100K rows)
Range: 0 – 100000000
-- Trigger compaction at 50K pending rows
SET pg_trickle.compact_threshold = 50000;
-- Disable compaction
SET pg_trickle.compact_threshold = 0;
pg_trickle.max_buffer_rows
Added in v0.16.0. Hard limit on change buffer rows per source table. When a source table's change buffer exceeds this limit at refresh time, pg_trickle forces a FULL refresh and truncates the buffer, preventing unbounded disk growth when differential refresh fails repeatedly.
| Property | Value |
|---|---|
| Type | integer |
| Default | 1000000 (1 million rows) |
| Range | 0 – 100000000 |
| Context | SUSET |
| Restart Required | No |
Set to 0 to disable the limit (not recommended for production).
Tuning Guidance:
- Most workloads: Leave at 1000000. This accommodates high-throughput tables while preventing runaway growth.
- High-throughput event tables: Raise to 5000000–10000000 if your source tables regularly accumulate large change buffers between refresh cycles.
- Small databases / tight disk budgets: Lower to 100000–500000 to limit change buffer disk usage.
-- Set buffer limit to 5 million rows
SET pg_trickle.max_buffer_rows = 5000000;
-- Disable the limit (not recommended)
SET pg_trickle.max_buffer_rows = 0;
pg_trickle.auto_index
Added in v0.16.0. Controls whether create_stream_table() automatically
creates performance indexes on stream tables.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, the following indexes are created automatically:
- GROUP BY composite index — for aggregate queries in DIFFERENTIAL mode, a composite index on the GROUP BY columns is created to speed up group lookups during MERGE.
- DISTINCT composite index — for DISTINCT queries with ≤ 8 output columns, a composite index on all output columns is created.
- Covering __pgt_row_id index — for stream tables with ≤ 8 output columns, the __pgt_row_id index includes all user columns via INCLUDE, enabling index-only scans during MERGE (20–50% faster for small deltas against large targets).
The __pgt_row_id index itself is always created regardless of this setting
(it is required for correctness).
Tuning Guidance:
- Most workloads: Leave at true.
- Custom index strategies: Set to false if you prefer to manage indexes manually or if the auto-created indexes conflict with your workload patterns.
-- Disable automatic index creation
SET pg_trickle.auto_index = false;
pg_trickle.aggregate_fast_path
Added in v0.16.0. Controls whether stream tables with all-algebraic aggregates use the explicit DML fast-path instead of MERGE.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, stream tables whose aggregates are all algebraically invertible (COUNT, SUM, AVG, STDDEV, VAR, CORR, REGR_*, etc.) use the explicit DML path (DELETE + UPDATE + INSERT via a materialized temp table) instead of the generic MERGE statement. This avoids the MERGE hash-join cost, which dominates for aggregate queries with high group cardinality.
Not eligible:
- Queries with SEMI_ALGEBRAIC aggregates (MIN, MAX) — these may require group-rescan on extremum deletion
- Queries with GROUP_RESCAN aggregates (STRING_AGG, ARRAY_AGG, JSON_AGG, etc.)
- Queries with user-defined triggers on the stream table (already use explicit DML via the user-trigger path)
The explain_st() output shows the aggregate_path field:
- explicit_dml — fast-path is active
- merge — using the default MERGE path
- merge (fast-path disabled) — eligible but the GUC is off
-- Disable aggregate fast-path
SET pg_trickle.aggregate_fast_path = false;
-- Check the current aggregate path for a stream table
SELECT * FROM pgtrickle.explain_st('my_agg_st');
pg_trickle.template_cache
Added in v0.16.0. Controls the cross-backend delta template cache backed by an UNLOGGED catalog table.
| Property | Value |
|---|---|
| Type | bool |
| Default | true |
| Context | SUSET |
| Restart Required | No |
When enabled, delta SQL templates generated by the DVM engine are persisted in
pgtrickle.pgt_template_cache so that new backends skip the ~45 ms
parse+differentiate step on their first refresh of each stream table (down to
~1 ms SPI lookup).
Templates are automatically invalidated when:
- A stream table's defining query changes (ALTER STREAM TABLE ... SET QUERY)
- A stream table is dropped
- A stream table is reinitialized
The explain_st() output includes template_cache (enabled/disabled) and
template_cache_stats with L2 hit and full miss counters.
-- Disable the template cache for debugging
SET pg_trickle.template_cache = false;
-- Check template cache stats
SELECT * FROM pgtrickle.explain_st('my_st')
WHERE property IN ('template_cache', 'template_cache_stats');
pg_trickle.buffer_partitioning
Controls whether change buffer tables use PARTITION BY RANGE (lsn) for
O(1) cleanup via partition detach instead of O(n) DELETE.
| Value | Behaviour |
|---|---|
| 'off' | (Default) Unpartitioned heap tables. Cleanup uses DELETE or TRUNCATE. Lowest DDL overhead per cycle. |
| 'on' | Always create partitioned change buffers. Old partitions are detached and dropped after consumption — O(1) cleanup regardless of buffer size. Best for high-throughput sources where buffers routinely exceed compact_threshold. |
| 'auto' | Start with unpartitioned buffers. If a buffer accumulates more rows than compact_threshold within a single refresh cycle, automatically promote it to RANGE(lsn) partitioned mode. Once promoted, the buffer stays partitioned. Combines low overhead for quiet sources with O(1) cleanup for hot ones. |
Default: 'off'
Context: SUSET (superuser session-level)
-- Always partition change buffers
SET pg_trickle.buffer_partitioning = 'on';
-- Auto-promote based on throughput
SET pg_trickle.buffer_partitioning = 'auto';
-- Disable partitioning (default)
SET pg_trickle.buffer_partitioning = 'off';
Interaction with compact_threshold: In 'auto' mode, the compact_threshold value serves double duty: it triggers both compaction and the auto-promotion decision. Lowering compact_threshold makes auto-promotion more sensitive.
pg_trickle.max_grouping_set_branches
Maximum allowed grouping set branches in CUBE/ROLLUP queries.
CUBE(n) produces $2^n$ branches — without a limit, large cubes cause
memory exhaustion during parsing. Users who genuinely need more than
64 branches can raise this GUC.
Default: 64
Range: 1 – 65536
-- Allow up to 128 grouping set branches
SET pg_trickle.max_grouping_set_branches = 128;
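For example, a CUBE over seven columns expands to 2^7 = 128 branches and would be rejected at the default ceiling of 64, so the GUC must be raised first. A sketch with a hypothetical sales table:

```sql
SET pg_trickle.max_grouping_set_branches = 128;
SELECT pgtrickle.create_stream_table(
    name     => 'sales_cube',
    query    => 'SELECT region, country, city, store, product, category,
                 brand, sum(amount) AS total
                 FROM sales
                 GROUP BY CUBE (region, country, city, store, product,
                                category, brand)',
    schedule => '60s'
);
```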
pg_trickle.volatile_function_policy
Controls how volatile functions in defining queries are handled for DIFFERENTIAL and IMMEDIATE modes.
| Value | Behaviour |
|---|---|
| reject | (Default) Volatile functions cause an ERROR at stream table creation time. |
| warn | Volatile functions emit a WARNING but creation proceeds. Delta correctness is not guaranteed. |
| allow | Volatile functions are permitted silently. Use only when you understand that delta computation may produce incorrect results. |
Default: reject
Context: SUSET (superuser session-level)
-- Allow volatile functions with a warning
SET pg_trickle.volatile_function_policy = 'warn';
-- Allow volatile functions silently
SET pg_trickle.volatile_function_policy = 'allow';
Note: Volatile functions (e.g., random(), clock_timestamp()) produce different values on each evaluation. In DIFFERENTIAL/IMMEDIATE modes, the delta computation assumes deterministic functions, so volatile functions may cause stale or incorrect rows. FULL mode is unaffected since it recomputes from scratch every time.
pg_trickle.unlogged_buffers
Create new change buffer tables as UNLOGGED to reduce WAL amplification
from CDC trigger inserts.
| Value | Behaviour |
|---|---|
| false | (Default) Change buffers are WAL-logged. Crash-safe — no data loss on crash recovery. |
| true | New change buffers are created as UNLOGGED. Eliminates WAL writes for trigger-inserted rows, reducing WAL amplification by ~30%. Trade-off: buffers are truncated on crash recovery; affected stream tables automatically receive a FULL refresh on the next scheduler cycle. |
Default: false
Context: SUSET (superuser session-level)
-- Enable UNLOGGED buffers for new stream tables
SET pg_trickle.unlogged_buffers = true;
Crash recovery: After a PostgreSQL crash or standby restart, UNLOGGED buffer tables are automatically truncated by PostgreSQL. The pg_trickle scheduler detects this condition and enqueues a FULL refresh for each affected stream table on the next tick. During the window between crash recovery and FULL refresh completion, stream table data may be stale.
Standby replicas: UNLOGGED tables are not replicated to standbys. Stream tables on read replicas will be stale after any standby restart until the next FULL refresh completes on the primary.
Converting existing buffers: This GUC only affects newly created change buffer tables. To convert existing logged buffers, use:
SELECT pgtrickle.convert_buffers_to_unlogged();
This function acquires an ACCESS EXCLUSIVE lock on each buffer table. Run it during a low-traffic maintenance window.
pg_trickle.max_parse_depth
Maximum recursion depth for the query parser's tree visitors (G13-SD).
Prevents stack-overflow crashes on pathological queries with deeply nested
subqueries, CTEs, or set operations. When the limit is exceeded, the
parser returns a QueryTooComplex error instead of crashing.
Default: 64
Range: 1 – 10000
-- Raise the limit for deeply nested queries
SET pg_trickle.max_parse_depth = 128;
pg_trickle.ivm_topk_max_limit
Maximum LIMIT value for TopK stream tables in IMMEDIATE mode.
TopK queries exceeding this threshold are rejected because the inline
micro-refresh (recomputing top-K rows on every DML statement) adds
latency proportional to LIMIT. Set to 0 to disable TopK in
IMMEDIATE mode entirely.
Default: 1000
Range: 0 – 1000000
-- Allow TopK up to LIMIT 5000 in IMMEDIATE mode
SET pg_trickle.ivm_topk_max_limit = 5000;
pg_trickle.ivm_recursive_max_depth
Maximum recursion depth for WITH RECURSIVE queries in IMMEDIATE mode.
The semi-naive evaluation injects a __pgt_depth counter column into the
recursive SQL; iteration stops when the counter reaches this limit. Protects
against infinite recursion in pathological graphs.
Default: 100
Range: 1 – 10000
-- Allow deeper recursion for large hierarchies
SET pg_trickle.ivm_recursive_max_depth = 500;
Parallel Refresh
These settings control whether and how the scheduler dispatches refresh work to multiple dynamic background workers instead of processing stream tables sequentially. See PLAN_PARALLELISM.md for the design.
Note: Parallel refresh is new in v0.4.0 and defaults to 'off'. Enable it via pg_trickle.parallel_refresh_mode after validating your workload.
pg_trickle.parallel_refresh_mode
Controls whether the scheduler dispatches refresh work to dynamic background workers.
| Property | Value |
|---|---|
| Type | text |
| Default | 'off' |
| Values | 'off', 'dry_run', 'on' |
| Context | SUSET |
| Restart Required | No |
- 'off' (default): Sequential execution. All stream tables are refreshed one at a time in topological order by the single scheduler background worker. This is the proven, stable default.
- 'dry_run': The scheduler computes execution units and logs dispatch decisions (unit keys, ready-queue contents, budget) but still executes refreshes inline. Useful for previewing parallel behaviour without actually spawning workers.
- 'on': True parallel refresh. The coordinator builds an execution-unit DAG, dispatches ready units to dynamic background workers, and respects both the per-database cap (max_concurrent_refreshes) and the cluster-wide cap (max_dynamic_refresh_workers).
-- Preview parallel dispatch decisions without changing runtime behaviour
SET pg_trickle.parallel_refresh_mode = 'dry_run';
-- Enable parallel refresh
SET pg_trickle.parallel_refresh_mode = 'on';
pg_trickle.max_dynamic_refresh_workers
Cluster-wide cap on concurrently active pg_trickle dynamic refresh workers.
| Property | Value |
|---|---|
| Type | int |
| Default | 4 |
| Range | 0 – 64 |
| Context | SUSET |
| Restart Required | No |
This is distinct from pg_trickle.max_concurrent_refreshes (per-database
cap). When multiple databases each have their own scheduler, this GUC
prevents them from overcommitting the shared PostgreSQL
max_worker_processes budget.
Worker-budget planning: Each dynamic refresh worker consumes one
max_worker_processes slot. In addition, pg_trickle uses one slot for
the launcher and one per-database scheduler. Ensure:
max_worker_processes >= pg_trickle launchers (1)
+ pg_trickle schedulers (1 per database)
+ max_dynamic_refresh_workers
+ autovacuum workers
+ parallel query workers
+ other extensions
A typical small deployment (1–2 databases, 4 parallel workers) needs at
least max_worker_processes = 16. The E2E test Docker image uses 128.
-- Allow up to 8 concurrent refresh workers cluster-wide
SET pg_trickle.max_dynamic_refresh_workers = 8;
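Putting the budget formula into numbers (a hypothetical sizing; adjust the slot counts for your own autovacuum and parallel-query settings):

```sql
-- 2 pg_trickle databases + 8 dynamic workers:
--   1 launcher + 2 schedulers + 8 dynamic workers = 11 slots,
--   plus headroom for autovacuum and parallel query workers.
ALTER SYSTEM SET max_worker_processes = 24;   -- requires a server restart
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 8;
```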
pg_trickle.max_concurrent_refreshes
Per-database dispatch cap for parallel refresh workers.
| Property | Value |
|---|---|
| Type | int |
| Default | 4 |
| Range | 1 – 32 |
| Context | SUSET |
| Restart Required | No |
When parallel_refresh_mode = 'on', this limits how many execution units
a single database coordinator may have in-flight at the same time. In
sequential mode (parallel_refresh_mode = 'off'), this setting has no
effect.
The effective concurrent refreshes for a database is:
min(max_concurrent_refreshes, max_dynamic_refresh_workers - workers_used_by_other_dbs)
-- Allow up to 8 concurrent refreshes in this database
SET pg_trickle.max_concurrent_refreshes = 8;
pg_trickle.per_database_worker_quota
Per-database dynamic refresh worker quota for multi-tenant cluster isolation.
| Property | Value |
|---|---|
| Type | int |
| Default | 0 (disabled) |
| Range | 0 – 64 |
| Context | SUSET |
| Restart Required | No |
When greater than 0, each per-database scheduler limits itself to this many
concurrently active dynamic refresh workers drawn from the shared
max_dynamic_refresh_workers pool. This prevents a single busy database
from starving others in multi-tenant clusters.
Burst capacity: when the cluster is lightly loaded (active workers
< 80% of max_dynamic_refresh_workers), a database may temporarily
exceed its quota by up to 50% to absorb sudden change backlogs. The burst
is reclaimed automatically within 1 scheduler cycle once global load rises
back above the 80% threshold.
Priority dispatch: within each dispatch tick, IMMEDIATE-trigger closures are dispatched before all other unit kinds, ensuring transactional consistency requirements are always met first, even under quota pressure.
-- Limit the analytics DB to 4 base workers (bursts to 6 when cluster is idle)
ALTER DATABASE analytics SET pg_trickle.per_database_worker_quota = 4;
-- Give the reporting DB only 2 base workers
ALTER DATABASE reporting SET pg_trickle.per_database_worker_quota = 2;
SELECT pg_reload_conf();
When per_database_worker_quota = 0 (the default), this feature is
disabled and all databases share the max_dynamic_refresh_workers pool
on a first-come-first-served basis, bounded per coordinator by
max_concurrent_refreshes.
Note: Set this GUC per-database with ALTER DATABASE rather than globally with ALTER SYSTEM, so different databases can have different quotas.
Advanced / Internal
pg_trickle.change_buffer_schema
Schema name for change-buffer tables created by the trigger-based CDC pipeline.
Default: 'pgtrickle_changes'
Change buffer tables are named <schema>.changes_<oid> where <oid> is
the source table's OID. Placing them in a dedicated schema keeps them out
of the public namespace.
SET pg_trickle.change_buffer_schema = 'my_change_buffers';
pg_trickle.foreign_table_polling
Enable polling-based change detection for foreign table sources. When
enabled, the scheduler periodically re-executes the foreign table query
and computes deltas via snapshot comparison (EXCEPT ALL). Foreign tables
cannot use trigger or WAL-based CDC, so this is the only mechanism for
incremental maintenance.
Default: false
SET pg_trickle.foreign_table_polling = true;
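The snapshot-comparison mechanism can be pictured as two EXCEPT ALL queries. An illustrative sketch, not pg_trickle's internal SQL; remote_orders and its _prev shadow table are hypothetical names:

```sql
-- Rows added since the previous poll:
SELECT * FROM remote_orders          -- current foreign-table contents
EXCEPT ALL
SELECT * FROM remote_orders_prev;    -- shadow copy from the previous poll

-- Rows removed since the previous poll:
SELECT * FROM remote_orders_prev
EXCEPT ALL
SELECT * FROM remote_orders;
```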
pg_trickle.matview_polling
Enable polling-based CDC for materialized views. When enabled, materialized
views referenced in defining queries are supported via snapshot-comparison
(the same mechanism as foreign table polling). A local shadow table stores
the previous state; EXCEPT ALL computes the delta on each refresh cycle.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
SET pg_trickle.matview_polling = true;
pg_trickle.cdc_trigger_mode
Controls the CDC trigger granularity: statement (default) or row.
statement uses statement-level AFTER triggers with transition tables
(NEW TABLE / OLD TABLE). A single invocation per DML statement processes
all affected rows in one bulk INSERT ... SELECT, giving 50–80% less
write-side overhead for bulk UPDATE/DELETE. Single-row DML is unaffected.
row uses the legacy per-row trigger approach (pg_trickle < 0.4.0 behavior).
Changing this setting takes effect for newly installed CDC triggers. Call
pgtrickle.rebuild_cdc_triggers() to migrate existing stream tables.
| Property | Value |
|---|---|
| Type | string |
| Default | 'statement' |
| Valid values | statement, row |
| Context | SUSET (superuser) |
| Restart required | No |
-- Switch to statement-level triggers (default, recommended)
SET pg_trickle.cdc_trigger_mode = 'statement';
-- After changing, rebuild existing triggers:
SELECT pgtrickle.rebuild_cdc_triggers();
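For reference, a statement-level trigger with transition tables has this general shape. This is an illustrative sketch using hypothetical names (capture_inserts, change_buffer), not the trigger pg_trickle actually installs:

```sql
CREATE FUNCTION capture_inserts() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  -- One bulk INSERT ... SELECT per DML statement, however many rows
  INSERT INTO change_buffer (pk, op)
  SELECT id, 'I' FROM new_rows;
  RETURN NULL;
END $$;

CREATE TRIGGER orders_cdc_ins
  AFTER INSERT ON orders
  REFERENCING NEW TABLE AS new_rows
  FOR EACH STATEMENT EXECUTE FUNCTION capture_inserts();
```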
pg_trickle.tick_watermark_enabled
Cap CDC consumption to the WAL LSN at scheduler tick start. When enabled
(default), each scheduler tick captures pg_current_wal_lsn() at its start
and prevents any refresh from consuming WAL changes beyond that LSN. This
bounds cross-source staleness without requiring user configuration.
Disable only if you need stream tables to always advance to the latest available LSN.
| Property | Value |
|---|---|
| Type | boolean |
| Default | true |
| Context | SUSET (superuser) |
| Restart required | No |
-- Disable tick watermark bounding
SET pg_trickle.tick_watermark_enabled = false;
pg_trickle.watermark_holdback_timeout
Maximum seconds a user-provided watermark may remain un-advanced before
being considered stuck. When a watermark group contains a source whose
watermark has not been advanced within this timeout, downstream stream
tables in that group are paused (refresh is skipped) and a
pgtrickle_alert NOTIFY with watermark_stuck event is emitted.
When the stuck watermark is advanced again (via advance_watermark()), the
pause is automatically lifted and a watermark_resumed event is emitted.
Set to 0 to disable stuck-watermark detection (default). Useful values depend on your ETL
pipeline cadence: for a pipeline that loads every 5 minutes, a timeout of 600 (10 min)
gives a safety margin.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Min | 0 |
| Max | 86400 (24 hours) |
| Context | SUSET (superuser) |
| Restart required | No |
-- Set stuck-watermark timeout to 10 minutes
ALTER SYSTEM SET pg_trickle.watermark_holdback_timeout = 600;
SELECT pg_reload_conf();
NOTIFY payloads:
{"event":"watermark_stuck","group":"order_pipeline","source_oid":16385,"age_secs":620}
{"event":"watermark_resumed","source_oid":16385}
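Operators can subscribe to these events on the pgtrickle_alert channel from any session:

```sql
LISTEN pgtrickle_alert;
-- Notifications arrive asynchronously; in psql they are printed
-- after the next command completes.
```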
pg_trickle.spill_threshold_blocks
Temp blocks written threshold for spill detection. After each differential
MERGE, pg_trickle queries pg_stat_statements for the temp_blks_written
metric. If the value exceeds this threshold, the refresh is considered a
spill.
After spill_consecutive_limit consecutive spills, the scheduler forces a
FULL refresh for that stream table to prevent repeated expensive
differential merges.
Requires the pg_stat_statements extension to be installed. Set to 0 to
disable spill detection (default).
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Min | 0 |
| Max | 100000000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Enable spill detection: flag > 1000 temp blocks as a spill
ALTER SYSTEM SET pg_trickle.spill_threshold_blocks = 1000;
SELECT pg_reload_conf();
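Spill detection depends on pg_stat_statements being available; the usual setup steps (standard PostgreSQL, independent of pg_trickle) are:

```sql
-- Add pg_stat_statements to shared_preload_libraries in postgresql.conf
-- (keeping any existing entries), restart the server, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```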
pg_trickle.spill_consecutive_limit
Number of consecutive spilling differential refreshes before the scheduler automatically forces a FULL refresh. Resets after any non-spilling refresh.
Only effective when spill_threshold_blocks > 0.
| Property | Value |
|---|---|
| Type | integer |
| Default | 3 |
| Min | 1 |
| Max | 100 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Force FULL after 5 consecutive spills (default: 3)
ALTER SYSTEM SET pg_trickle.spill_consecutive_limit = 5;
SELECT pg_reload_conf();
pg_trickle.log_merge_sql
Log the generated MERGE SQL template on every refresh cycle. When enabled,
the MERGE SQL template built during differential refresh is emitted to the
PostgreSQL server log at LOG level.
Intended for debugging MERGE query generation only. Do not enable in production — the output is verbose and includes the full SQL for every refresh.
| Property | Value |
|---|---|
| Type | boolean |
| Default | false |
| Context | SUSET (superuser) |
| Restart required | No |
SET pg_trickle.log_merge_sql = true;
Guardrails & Diagnostics
These GUCs control safety thresholds and diagnostic warnings.
pg_trickle.fuse_default_ceiling
Global default change-count ceiling for the fuse circuit breaker. When a
stream table has fuse_mode = 'on' or 'auto' and no per-ST fuse_ceiling,
this value is used. If pending changes exceed this count, the fuse blows
and the stream table is suspended (status = SUSPENDED).
Set to 0 to disable the global default (per-ST ceilings still apply).
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 2,000,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Set global fuse ceiling to 1 million rows
SET pg_trickle.fuse_default_ceiling = 1000000;
pg_trickle.delta_amplification_threshold
Delta amplification detection threshold (output/input ratio). When a
DIFFERENTIAL refresh produces more than this multiple of the input delta
rows, a WARNING is emitted so operators can identify pathological join
fan-out or many-to-many amplification.
Set to 0.0 to disable.
| Property | Value |
|---|---|
| Type | float |
| Default | 0.0 (disabled) |
| Range | 0.0 - 100,000.0 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Warn when delta output is 10x the input
SET pg_trickle.delta_amplification_threshold = 10.0;
pg_trickle.algebraic_drift_reset_cycles
Differential cycles between automatic full recomputes for algebraic
aggregates. After this many differential refresh cycles, stream tables
with algebraic aggregates (AVG, STDDEV, VAR) are automatically
reinitialized to reset accumulated floating-point drift in auxiliary
columns.
Set to 0 to disable automatic resets.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 100,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Reset algebraic aggregates every 10,000 cycles
SET pg_trickle.algebraic_drift_reset_cycles = 10000;
pg_trickle.agg_diff_cardinality_threshold
Estimated GROUP BY cardinality threshold for aggregate warnings.
At create_stream_table time, if the defining query uses aggregates
(e.g. SUM, COUNT, AVG) in DIFFERENTIAL mode and the estimated
group cardinality is below this threshold, a WARNING is emitted suggesting
FULL or AUTO mode.
Set to 0 to disable the warning.
| Property | Value |
|---|---|
| Type | integer |
| Default | 0 (disabled) |
| Range | 0 - 100,000,000 |
| Context | SUSET (superuser) |
| Restart required | No |
-- Warn when GROUP BY cardinality is below 100
SET pg_trickle.agg_diff_cardinality_threshold = 100;
Connection Pooler
v0.19.0+ (STAB-1).
pg_trickle.connection_pooler_mode
Cluster-wide connection pooler compatibility override.
| Property | Value |
|---|---|
| Type | string |
| Default | 'off' |
| Valid values | 'off', 'transaction', 'session' |
| Context | SUSET |
| Value | Behaviour |
|---|---|
| off (default) | The per-ST pooler_compatibility_mode governs behaviour |
| transaction | Globally disables prepared-statement reuse and suppresses NOTIFY emissions (PgBouncer transaction-pool compatibility) |
| session | Explicit opt-in to session mode (same as off today; reserved for future use) |
See Connection Pooler Compatibility for deployment guidance.
-- Enable transaction-mode pooler compatibility globally
SET pg_trickle.connection_pooler_mode = 'transaction';
History & Retention
v0.19.0+ (DB-5).
pg_trickle.history_retention_days
Number of days to retain rows in pgtrickle.pgt_refresh_history.
| Property | Value |
|---|---|
| Type | integer |
| Default | 90 |
| Min | 0 (disabled) |
| Max | 36500 (~100 years) |
| Context | SUSET |
The scheduler runs a daily background cleanup that deletes rows older than
this many days. Set to 0 to disable automatic cleanup (history grows
unbounded — monitor disk usage).
-- Keep 30 days of refresh history
SET pg_trickle.history_retention_days = 30;
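If automatic cleanup is disabled, history can be pruned manually. A sketch, assuming the start_time column used by the monitoring queries later in these docs:

```sql
-- Manual pruning when history_retention_days = 0:
-- keep ~90 days of refresh history; adjust the interval to taste.
DELETE FROM pgtrickle.pgt_refresh_history
WHERE start_time < now() - interval '90 days';
```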
Circular Dependencies
v0.7.0+ — Circular dependency support is available for safe monotone cycles in DIFFERENTIAL mode. These settings control whether cycles are allowed at all, and how many fixpoint iterations the scheduler attempts before surfacing a non-convergence error.
pg_trickle.allow_circular
Master switch for circular (cyclic) stream table dependencies. When false
(default), creating a stream table that would introduce a cycle in the
dependency graph is rejected with a CycleDetected error. When true,
monotone cycles — those containing only safe operators (joins, filters,
projections, UNION ALL, INTERSECT, EXISTS) — are allowed.
Non-monotone operators (Aggregate, EXCEPT, Window functions, NOT EXISTS) always block cycle creation regardless of this setting, because they cannot guarantee convergence to a fixed point.
Default: false
SET pg_trickle.allow_circular = true;
pg_trickle.max_fixpoint_iterations
Maximum number of iterations per strongly connected component (SCC) before the scheduler declares non-convergence and marks all SCC members as ERROR. Prevents runaway loops in circular dependency chains.
For most practical use cases (transitive closure, graph reachability), convergence happens in 2–5 iterations. The default of 100 provides ample headroom.
Default: 100
Range: 1 – 10000
SET pg_trickle.max_fixpoint_iterations = 50;
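As an illustrative sketch only (the edges table and column names are hypothetical, and the exact query shape a given release accepts may differ), a self-referencing reachability stream table built from the monotone operators listed above might look like:

```sql
SET pg_trickle.allow_circular = true;

-- Hypothetical edges(src, dst) base table. The stream table references
-- itself, forming a one-node cycle of monotone operators (join +
-- UNION ALL) that the fixpoint scheduler can iterate to convergence.
SELECT pgtrickle.create_stream_table(
    name     => 'reach',
    query    => 'SELECT src, dst FROM edges
                 UNION ALL
                 SELECT r.src, e.dst
                 FROM reach r JOIN edges e ON e.src = r.dst',
    schedule => '30s'
);
```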
pg_trickle.dog_feeding_auto_apply
Added in v0.20.0 (DF-G1).
Controls whether the dog-feeding analytics stream tables can automatically adjust stream table configuration.
| Value | Behaviour |
|---|---|
| off (default) | Advisory only — no automatic changes. Dog-feeding stream tables produce analytics that operators and dashboards can read, but nothing is applied automatically. |
| threshold_only | After each 10-minute auto-apply cycle, reads df_threshold_advice. If a recommendation has HIGH confidence and the recommended threshold differs from the current threshold by more than 5%, applies ALTER STREAM TABLE ... SET auto_threshold = <recommended>. Changes are logged with initiated_by = 'DOG_FEED'. |
| full | Same as threshold_only, plus applies scheduling hints from df_scheduling_interference (future enhancement). |
Default: off
-- Enable threshold auto-apply.
SET pg_trickle.dog_feeding_auto_apply = 'threshold_only';
-- Check current setting.
SHOW pg_trickle.dog_feeding_auto_apply;
Prerequisites: Dog-feeding stream tables must be created first via
SELECT pgtrickle.setup_dog_feeding(). If the stream tables do not exist,
the auto-apply worker is a no-op.
Rate limiting: At most one threshold change per stream table per 10 minutes.
Audit trail: All auto-apply changes are recorded in pgt_refresh_history
with initiated_by = 'DOG_FEED' and a SKIP action describing the old and new
threshold values.
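To review what the auto-apply worker has changed, the audit trail can be queried directly. A sketch — the start_time column matches the monitoring queries elsewhere in these docs, but the full column set is not guaranteed:

```sql
-- Recent auto-applied threshold changes
SELECT *
FROM pgtrickle.pgt_refresh_history
WHERE initiated_by = 'DOG_FEED'
ORDER BY start_time DESC
LIMIT 20;
```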
GUC Interaction Matrix
Some GUC variables interact with or depend on each other. The table below documents these cross-dependencies to help avoid misconfiguration.
| GUC A | GUC B | Interaction |
|---|---|---|
event_driven_wake | scheduler_interval_ms | When event_driven_wake = true, the scheduler wakes on NOTIFY and scheduler_interval_ms serves only as the poll-based fallback interval. Lowering scheduler_interval_ms below 100 ms with event-driven wake enabled adds little value and wastes CPU. |
event_driven_wake | wake_debounce_ms | wake_debounce_ms only takes effect when event_driven_wake = true. It coalesces rapid-fire notifications during bulk DML. Set higher (50–100 ms) for write-heavy workloads, lower (5–10 ms) for latency-sensitive workloads. |
auto_backoff | min_schedule_seconds | auto_backoff stretches the effective interval up to 8× the configured schedule, but never below min_schedule_seconds. If min_schedule_seconds is high, backoff has limited room to operate. |
auto_backoff | default_schedule_seconds | The backoff multiplier is applied to default_schedule_seconds (or the per-ST override); raising this value gives backoff a wider range. |
parallel_refresh_mode | max_concurrent_refreshes | parallel_refresh_mode = 'on' dispatches independent STs to parallel workers, up to max_concurrent_refreshes per database. Setting max_concurrent_refreshes = 1 effectively disables parallelism even when the mode is 'on'. |
parallel_refresh_mode | max_dynamic_refresh_workers | max_dynamic_refresh_workers is a cluster-wide cap across all databases. If you have 4 databases each wanting 4 concurrent refreshes, set this to ≥16 (or accept queuing). |
max_dynamic_refresh_workers | per_database_worker_quota | When per_database_worker_quota > 0, each database claims at most that many workers from the shared max_dynamic_refresh_workers pool. Set per_database_worker_quota to max_dynamic_refresh_workers / n_databases for equal sharing. Burst to 150% is allowed when the cluster is < 80% loaded. |
differential_max_change_ratio | fuse_default_ceiling | Both guard against large change batches but at different levels: differential_max_change_ratio triggers a FULL refresh fallback (proportional to table size), while fuse_default_ceiling halts refresh entirely (absolute row count). The fuse fires first if the change count exceeds it, regardless of the ratio. |
block_source_ddl | DDL operations | When true, DDL on source tables (ALTER TABLE, DROP COLUMN) is blocked by an event trigger. Disable temporarily with SET pg_trickle.block_source_ddl = false before schema migrations, then re-enable. |
cdc_mode | cdc_trigger_mode | cdc_trigger_mode ('statement' / 'row') only applies when CDC is trigger-based. When cdc_mode = 'wal' (or after auto-transition to WAL), cdc_trigger_mode is irrelevant. |
cdc_mode | wal_transition_timeout | wal_transition_timeout only applies when cdc_mode = 'auto'. It controls how many seconds to wait for the first WAL-based refresh to succeed before falling back to triggers. |
cleanup_use_truncate | compact_threshold | cleanup_use_truncate = true uses TRUNCATE to clear consumed change buffers (fastest, acquires AccessExclusiveLock briefly). compact_threshold controls when fully-consumed buffers are compacted via DELETE — only relevant when TRUNCATE is disabled. |
buffer_partitioning | compact_threshold | In 'auto' mode, compact_threshold serves as the promotion trigger: if a buffer exceeds this many rows in a single refresh cycle, it is promoted to RANGE(lsn) partitioned mode. Lowering compact_threshold makes auto-promotion more sensitive. |
allow_circular | max_fixpoint_iterations | max_fixpoint_iterations is only evaluated when allow_circular = true. It caps the number of convergence iterations for circular dependency chains. |
ivm_topk_max_limit | TopK queries | Queries with LIMIT > ivm_topk_max_limit fall back to FULL refresh instead of the optimized TopK path. Raise this if you have legitimate large TopK queries. |
ivm_recursive_max_depth | Recursive CTEs | Recursive expansion beyond ivm_recursive_max_depth iterations is terminated with a warning and falls back to FULL refresh. Set to 0 to disable the guard (not recommended). |
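When tuning interacting GUCs, it helps to review every pg_trickle setting in one place before changing anything. This uses only the standard pg_settings catalog:

```sql
-- Current value, built-in default, and required privilege level
-- for all pg_trickle GUCs.
SELECT name, setting, boot_val, context
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
ORDER BY name;
```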
Tuning Profiles
Three named profiles for common deployment patterns. Copy the relevant
settings into your postgresql.conf and adjust to taste.
Low-Latency Profile
Goal: Minimize end-to-end latency from base table write to stream table update. Best for dashboards, real-time analytics, and operational monitoring.
# Event-driven wake — sub-50ms median latency
pg_trickle.event_driven_wake = true
pg_trickle.wake_debounce_ms = 5 # aggressive: 5ms coalesce
# Fast scheduling
pg_trickle.scheduler_interval_ms = 200 # poll fallback (rarely used)
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1
# Parallel refresh for independent STs
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 4
# Lean merge
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 128 # more memory = fewer disk sorts
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
# Guardrails
pg_trickle.auto_backoff = true # prevent CPU runaway
pg_trickle.fuse_default_ceiling = 0 # disabled — latency over safety
pg_trickle.block_source_ddl = true
High-Throughput Profile
Goal: Maximize rows-per-second processed across many stream tables under heavy write load. Accepts slightly higher latency in exchange for better batching and resource efficiency.
# Batched wake — coalesce writes into larger deltas
pg_trickle.event_driven_wake = true
pg_trickle.wake_debounce_ms = 50 # 50ms coalesce window
# Relaxed scheduling
pg_trickle.scheduler_interval_ms = 2000 # 2-second poll fallback
pg_trickle.min_schedule_seconds = 2
pg_trickle.default_schedule_seconds = 5
# Heavy parallelism
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.max_dynamic_refresh_workers = 8
# Aggressive performance
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 256 # large work_mem for big deltas
pg_trickle.merge_seqscan_threshold = 0.01 # allow seq scans for >1% changes
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.auto_backoff = true
pg_trickle.buffer_partitioning = 'auto' # O(1) cleanup for hot buffers
# Safety for bulk workloads
pg_trickle.fuse_default_ceiling = 500000 # pause on >500K changes
pg_trickle.differential_max_change_ratio = 0.25 # FULL fallback at 25%
pg_trickle.block_source_ddl = true
Resource-Constrained Profile
Goal: Minimize CPU and memory footprint for small instances, shared hosting, or development environments. Accepts higher latency and slower throughput.
# Poll-based only — no NOTIFY overhead
pg_trickle.event_driven_wake = false
pg_trickle.scheduler_interval_ms = 5000 # 5-second poll
# Conservative scheduling
pg_trickle.min_schedule_seconds = 5
pg_trickle.default_schedule_seconds = 10
# Minimal parallelism
pg_trickle.parallel_refresh_mode = 'off' # single-threaded refresh
pg_trickle.max_concurrent_refreshes = 1
pg_trickle.max_dynamic_refresh_workers = 1
# Conservative memory
pg_trickle.merge_work_mem_mb = 32
pg_trickle.merge_planner_hints = true
pg_trickle.cleanup_use_truncate = true
# Tight guardrails
pg_trickle.auto_backoff = true
pg_trickle.fuse_default_ceiling = 100000
pg_trickle.differential_max_change_ratio = 0.10
pg_trickle.block_source_ddl = true
pg_trickle.buffer_alert_threshold = 500000
Complete postgresql.conf Example
# Required
shared_preload_libraries = 'pg_trickle'
# Essential
pg_trickle.enabled = true
pg_trickle.cdc_mode = 'auto'
pg_trickle.scheduler_interval_ms = 1000
pg_trickle.min_schedule_seconds = 1
pg_trickle.default_schedule_seconds = 1
pg_trickle.max_consecutive_errors = 3
# WAL CDC
pg_trickle.wal_transition_timeout = 300
pg_trickle.slot_lag_warning_threshold_mb = 100
pg_trickle.slot_lag_critical_threshold_mb = 1024
# Refresh performance
pg_trickle.differential_max_change_ratio = 0.15
pg_trickle.merge_planner_hints = true
pg_trickle.merge_work_mem_mb = 64
pg_trickle.cleanup_use_truncate = true
pg_trickle.use_prepared_statements = true
pg_trickle.user_triggers = 'auto'
# Guardrails & limits
pg_trickle.block_source_ddl = false
pg_trickle.buffer_alert_threshold = 1000000
pg_trickle.compact_threshold = 100000
pg_trickle.buffer_partitioning = 'off'
pg_trickle.max_grouping_set_branches = 64
pg_trickle.max_parse_depth = 64
pg_trickle.ivm_topk_max_limit = 1000
pg_trickle.ivm_recursive_max_depth = 100
# Circular dependencies (v0.7.0+)
pg_trickle.allow_circular = false # master switch
pg_trickle.max_fixpoint_iterations = 100 # convergence limit
# Parallel refresh (v0.4.0+, default off)
pg_trickle.parallel_refresh_mode = 'off' # 'off' | 'dry_run' | 'on'
pg_trickle.max_dynamic_refresh_workers = 4 # cluster-wide worker cap
pg_trickle.max_concurrent_refreshes = 4 # per-database dispatch cap
# Advanced / internal
pg_trickle.change_buffer_schema = 'pgtrickle_changes'
pg_trickle.foreign_table_polling = false
Runtime Configuration
All GUC variables can be changed at runtime by a superuser:
-- View current settings
SHOW pg_trickle.enabled;
SHOW pg_trickle.parallel_refresh_mode;
-- Enable parallel refresh for current session
SET pg_trickle.parallel_refresh_mode = 'on';
-- Change persistently (requires reload)
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 500;
SELECT pg_reload_conf();
Further Reading
- INSTALL.md — Installation and initial configuration
- ARCHITECTURE.md — System architecture overview
- SQL_REFERENCE.md — Complete function reference
Scaling Guide
This document provides guidance for scaling pg_trickle to hundreds of stream tables and beyond. It covers worker pool sizing, scheduler tuning, and diagnostic queries for identifying bottlenecks.
Architecture Overview
pg_trickle uses a two-tier background worker model:
- Launcher — one per server. Scans pg_database every 10 seconds, spawns per-database schedulers, and auto-restarts crashed workers.
- Per-database scheduler — one per database. Wakes every scheduler_interval_ms (default: 1 s), reads DAG changes from shared memory, consumes CDC buffers, and dispatches refreshes.
When parallel_refresh_mode = 'on', the scheduler dispatches refresh work to a
pool of dynamic background workers instead of running refreshes inline.
Worker Pool Sizing
| Deployment Size | Stream Tables | Recommended max_dynamic_refresh_workers | Notes |
|---|---|---|---|
| Small | 1–20 | 2–4 | Default (4) is usually sufficient |
| Medium | 20–100 | 4–8 | Monitor worker saturation |
| Large | 100–200 | 8–16 | Enable tiered scheduling |
| Very Large | 200+ | 16–32 | Tune per-database quotas |
Budget Formula
Worker slots are drawn from max_worker_processes, which is shared with
autovacuum, parallel queries, and other extensions:
max_worker_processes >= launchers(1)
+ schedulers(N_databases)
+ max_dynamic_refresh_workers
+ autovacuum_max_workers
+ max_parallel_workers
+ other_extensions
Example for 200 STs across 2 databases with 16 workers:
# postgresql.conf
max_worker_processes = 40
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.parallel_refresh_mode = 'on'
Tiered Scheduling
For deployments with 50+ stream tables, enable tiered scheduling to reduce scheduler overhead:
pg_trickle.tiered_scheduling = on -- default since v0.12.0
The scheduler classifies stream tables into tiers based on change frequency:
| Tier | Schedule Multiplier | Behavior |
|---|---|---|
| Hot | 1× (base interval) | Tables with frequent changes |
| Warm | 2× | Tables with moderate changes |
| Cold | 10× | Tables with rare changes |
| Frozen | skip | Tables with no recent changes |
This reduces the CPU cost of the scheduling loop itself, which can become a bottleneck at 200+ STs when every table is polled every cycle.
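As a worked example (the base interval here is assumed, not prescribed): with a 30-second base schedule, the multipliers above yield effective refresh intervals of roughly:

```
Hot:    30 s   (1× base)
Warm:   60 s   (2×)
Cold:   300 s  (10×)
Frozen: refresh skipped
```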
Dispatch Priority
When multiple stream tables are ready simultaneously, the scheduler dispatches in priority order:
- IMMEDIATE closures — time-critical refresh requests
- Atomic groups / Repeatable-read groups / Fused chains — multi-ST units
- Singletons — individual stream tables
- Cyclic SCCs — strongly-connected components
Within each priority band, the tier sort applies (Hot > Warm > Cold).
Per-Database Quotas and Burst
When per_database_worker_quota > 0, each database gets a guaranteed slice
of the worker pool:
- Normal load (cluster < 80% capacity): database can burst to 150% of its quota using idle capacity from other databases.
- High load (cluster ≥ 80% capacity): strict quota enforcement.
This prevents a single high-traffic database from starving others.
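For example, to share 16 workers equally across 4 databases using the equal-sharing guidance above:

```
# 16 cluster-wide workers / 4 databases = quota of 4 each.
# Each database may burst to 6 workers (150% of quota) while the
# cluster is under 80% load; at or above 80%, the quota of 4 is strict.
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.per_database_worker_quota = 4
```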
Monitoring
Worker Pool Status
SELECT * FROM pgtrickle.worker_pool_status();
-- Returns: active_workers, max_workers, per_db_cap, parallel_mode
Active Job Details
SELECT * FROM pgtrickle.parallel_job_status(300);
-- Returns recent jobs (last 300s): status, duration, worker PID, etc.
Health Summary
SELECT * FROM pgtrickle.health_summary();
-- Returns: total/active/error/suspended/stale counts, scheduler status, cache hit rate
Buffer Backlog Check
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY row_count DESC
LIMIT 20;
Identifying Bottlenecks
Is the scheduler loop the bottleneck?
-- If queue_depth is consistently > 10 while workers are not saturated,
-- the scheduler loop is the bottleneck. Reduce scheduler_interval_ms.
SELECT active_workers, max_workers,
       (SELECT COUNT(*) FROM pgtrickle.parallel_job_status(5)
         WHERE status = 'QUEUED') AS queue_depth
FROM pgtrickle.worker_pool_status();
Are workers saturated?
-- If active_workers == max_workers consistently, increase the pool.
SELECT active_workers >= max_workers AS saturated
FROM pgtrickle.worker_pool_status();
Which STs take the longest?
SELECT st.pgt_schema, st.pgt_name,
AVG(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS avg_sec,
MAX(EXTRACT(EPOCH FROM (h.end_time - h.start_time))) AS max_sec,
COUNT(*) AS refreshes
FROM pgtrickle.pgt_refresh_history h
JOIN pgtrickle.pgt_stream_tables st ON st.pgt_id = h.pgt_id
WHERE h.start_time > now() - interval '1 hour'
AND h.status = 'COMPLETED'
GROUP BY st.pgt_schema, st.pgt_name
ORDER BY avg_sec DESC
LIMIT 20;
Tuning Profiles
Low-Latency (< 50 ms P99)
pg_trickle.scheduler_interval_ms = 200
pg_trickle.event_driven_wake = on
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 8
pg_trickle.tiered_scheduling = on
High-Throughput (200+ STs)
pg_trickle.scheduler_interval_ms = 500
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 16
pg_trickle.max_concurrent_refreshes = 8
pg_trickle.per_database_worker_quota = 8
pg_trickle.tiered_scheduling = on
pg_trickle.merge_work_mem_mb = 128
Resource-Constrained (4 CPU / 8 GB RAM)
pg_trickle.scheduler_interval_ms = 2000
pg_trickle.parallel_refresh_mode = 'on'
pg_trickle.max_dynamic_refresh_workers = 2
pg_trickle.max_concurrent_refreshes = 2
pg_trickle.tiered_scheduling = on
pg_trickle.delta_work_mem_cap_mb = 256
pg_trickle.merge_work_mem_mb = 32
Profiling Methodology
To profile worker utilization at scale, run a test with 200+ stream tables
and max_workers set to 4, 8, and 16 in turn. Collect the following metrics
at 1-second intervals:
-- Worker pool utilization over time
SELECT now() AS ts,
(SELECT active_workers FROM pgtrickle.worker_pool_status()) AS active,
(SELECT max_workers FROM pgtrickle.worker_pool_status()) AS pool_size,
(SELECT COUNT(*) FROM pgtrickle.parallel_job_status(5)
WHERE status = 'QUEUED') AS queue_depth;
Plot active / pool_size (utilization) and queue_depth over time.
If utilization is consistently > 90% with non-zero queue depth, the pool
is undersized. If utilization is < 50%, the pool is oversized and consuming
max_worker_processes slots unnecessarily.
Known Scaling Limits
| Resource | Practical Limit | Bottleneck |
|---|---|---|
| Stream tables per DB | ~500 | Scheduler loop CPU |
| Worker pool size | 64 | GUC max |
| Change buffer rows | max_buffer_rows (default 1M) | Disk I/O |
| Template cache size | 128 entries (L1) | Evictions increase at >128 STs |
| DAG depth | ~20 levels | Topological sort + cascade latency |
Read Replicas & Hot Standby
Added in v0.19.0 (SCAL-1 / STAB-2).
pg_trickle is a primary-only extension. Stream tables are maintained by the background scheduler through DML (INSERT, DELETE, MERGE), which is only possible on the primary server.
Behaviour on Replicas
When the pg_trickle shared library is loaded on a read replica (physical standby or streaming replica):
- The launcher worker detects pg_is_in_recovery() = true and enters a sleep loop, checking every 30 seconds for promotion.
- Upon promotion (e.g. pg_promote()), the launcher resumes normal operation and spawns per-database schedulers.
- Manual refresh calls (pgtrickle.refresh_stream_table()) on a replica are rejected with a clear error message.
Recommended Setup
- Include pg_trickle in shared_preload_libraries on both primary and replicas. This ensures immediate availability after failover without a restart.
- Stream tables are read-queryable on replicas via physical replication — the storage tables are regular PostgreSQL tables that replicate normally.
- Monitor the replication lag to estimate stream table staleness on replicas.
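One way to do that from the primary is the standard pg_stat_replication view (no pg_trickle-specific assumptions):

```sql
-- Byte and time lag per standby — an upper bound on how stale
-- replica-side stream table reads can be.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;
```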
CNPG & Kubernetes Operations
Added in v0.19.0 (SCAL-3).
CloudNativePG (CNPG) is the recommended Kubernetes operator for running pg_trickle. The extension is packaged as a custom container image that extends the official PostgreSQL image.
Container Image
Build the pg_trickle image using the provided Dockerfiles:
# GHCR image (multi-stage build)
docker build -f Dockerfile.ghcr -t pg-trickle:latest .
# Or use the CNPG-specific Dockerfile
docker build -f cnpg/Dockerfile.ext -t pg-trickle-cnpg:latest .
CNPG Cluster Configuration
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-cluster
spec:
instances: 3
imageName: your-registry/pg-trickle:0.19.0
postgresql:
shared_preload_libraries:
- pg_trickle
parameters:
pg_trickle.enabled: "true"
pg_trickle.scheduler_interval_ms: "1000"
pg_trickle.max_concurrent_refreshes: "4"
# STAB-1: If using PgBouncer sidecar in transaction mode:
# pg_trickle.connection_pooler_mode: "transaction"
Operational Notes
- Failover: pg_trickle detects promotion automatically (see Read Replicas above). After CNPG promotes a replica, the launcher starts within 30 seconds.
- Scaling replicas: Stream table data replicates to all replicas via physical replication. No pg_trickle-specific configuration needed on replicas.
- Backup: Use CNPG's built-in Barman backup. pg_trickle's catalog tables are included automatically. See Backup & Restore.
- Monitoring: The Prometheus endpoint (pgtrickle.health_summary()) is compatible with CNPG's monitoring sidecar. See the Grafana dashboards in monitoring/grafana/.
Installation Guide
Prerequisites
| Requirement | Version |
|---|---|
| PostgreSQL | 18.x |
Building from source additionally requires Rust 1.85+ (edition 2024) and pgrx 0.17.x. Pre-built release artifacts only need a running PostgreSQL 18.x instance.
Installing from a Pre-built Release
1. Download the release archive
Download the archive for your platform from the GitHub Releases page:
| Platform | Archive |
|---|---|
| Linux x86_64 | pg_trickle-<ver>-pg18-linux-amd64.tar.gz |
| macOS Apple Silicon | pg_trickle-<ver>-pg18-macos-arm64.tar.gz |
| Windows x64 | pg_trickle-<ver>-pg18-windows-amd64.zip |
Optionally verify the checksum against SHA256SUMS.txt from the same release:
sha256sum -c SHA256SUMS.txt
2. Extract and install
Linux / macOS:
tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
Windows (PowerShell):
Expand-Archive pg_trickle-<ver>-pg18-windows-amd64.zip -DestinationPath .
cd pg_trickle-<ver>-pg18-windows-amd64
Copy-Item lib\*.dll "$(pg_config --pkglibdir)\"
Copy-Item extension\* "$(pg_config --sharedir)\extension\"
3. Using with CloudNativePG (Kubernetes)
pg_trickle is distributed as an OCI extension image for use with CloudNativePG Image Volume Extensions.
Requirements: Kubernetes 1.33+, CNPG 1.28+, PostgreSQL 18.
# Pull the extension image
docker pull ghcr.io/grove/pg_trickle-ext:<ver>
See cnpg/cluster-example.yaml and cnpg/database-example.yaml for complete Cluster and Database deployment examples.
4. GHCR Docker image (recommended for local dev)
pg_trickle is published as a ready-to-run Docker image on the GitHub Container
Registry. PostgreSQL 18.3 and pg_trickle are pre-installed and all sensible GUC
defaults (wal_level, shared_preload_libraries, memory, scheduler settings)
are baked in — no configuration file editing needed.
docker pull ghcr.io/grove/pg_trickle:latest
docker run --rm \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
ghcr.io/grove/pg_trickle:latest
CREATE EXTENSION pg_trickle; runs automatically on the default postgres
database at first startup.
Available tags:
| Tag | Meaning |
|---|---|
| latest | Most recent release |
| pg18 | Floating alias for the latest PostgreSQL 18 build |
| <version>-pg18.3 | Immutable tag, e.g. 0.13.0-pg18.3 |
Override any GUC at runtime without rebuilding:
docker run --rm \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
ghcr.io/grove/pg_trickle:latest \
-c shared_buffers=2GB -c work_mem=64MB -c effective_cache_size=6GB
For persistent data, mount a volume:
docker run -d \
--name pg_trickle \
-e POSTGRES_PASSWORD=secret \
-p 5432:5432 \
-v pg_trickle_data:/var/lib/postgresql/data \
ghcr.io/grove/pg_trickle:latest
Alternative — manual mount from a release archive:
If you prefer to use the stock postgres:18.3 image rather than the pre-built
image, extract the extension files from a release archive and mount them:
tar xzf pg_trickle-<ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<ver>-pg18-linux-amd64
docker run --rm \
-v $PWD/lib/pg_trickle.so:/usr/lib/postgresql/18/lib/pg_trickle.so:ro \
-v $PWD/extension/:/tmp/ext/:ro \
-e POSTGRES_PASSWORD=postgres \
postgres:18.3 \
sh -c 'cp /tmp/ext/* /usr/share/postgresql/18/extension/ && \
exec postgres -c shared_preload_libraries=pg_trickle'
Installing from PGXN
pg_trickle is published on the PostgreSQL Extension Network (PGXN). Installing via PGXN compiles the extension from source, so the Rust toolchain and pgrx are required.
1. Install prerequisites
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# pgrx build tool
cargo install --locked cargo-pgrx --version 0.17.0
cargo pgrx init --pg18 "$(pg_config --bindir)/pg_config"
2. Install the pgxn client
pip install pgxnclient
3. Install pg_trickle
pgxn install pg_trickle
To install a specific version:
pgxn install pg_trickle=0.10.0
Note: After installation, follow the PostgreSQL Configuration and Extension Installation steps below.
Building from Source
1. Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
2. Install pgrx
cargo install --locked cargo-pgrx --version 0.17.0
cargo pgrx init --pg18 $(pg_config --bindir)/pg_config
3. Build the Extension
# Development build (faster compilation)
cargo pgrx install --pg-config $(pg_config --bindir)/pg_config
# Release build (optimized, for production)
cargo pgrx install --release --pg-config $(pg_config --bindir)/pg_config
# Package for deployment (creates installable artifacts)
cargo pgrx package --pg-config $(pg_config --bindir)/pg_config
PostgreSQL Configuration
Add the following to postgresql.conf before starting PostgreSQL:
# Required — loads the extension shared library at server start
shared_preload_libraries = 'pg_trickle'
# Must accommodate the pg_trickle launcher + one scheduler per database
# with pg_trickle installed + optional parallel refresh workers.
#
# WARNING: when this limit is reached, the launcher silently skips
# databases it cannot spawn a scheduler for and retries every 5 minutes.
# Those databases stop refreshing without any visible error.
# Check PostgreSQL logs for:
# WARNING: pg_trickle launcher: could not spawn scheduler for database '...'
#
# Formula:
# 1 (launcher) + N (one scheduler per DB) + max_dynamic_refresh_workers
# + autovacuum_max_workers + parallel query workers + other extensions
#
# 32 is a safe starting point for most clusters:
max_worker_processes = 32
Note: wal_level = logical and max_replication_slots are not required. The extension uses lightweight row-level triggers for CDC, not logical replication.
Restart PostgreSQL after modifying these settings:
pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql
Extension Installation
Connect to the target database and run:
CREATE EXTENSION pg_trickle;
This creates:
- The pgtrickle schema with catalog tables and SQL functions
- The pgtrickle_changes schema for change buffer tables
- Event triggers for DDL tracking
- The pgtrickle.pg_stat_stream_tables monitoring view
Verification
After installation, verify everything is working:
-- Check the extension version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- Or get a full status overview (includes version, scheduler state, stream table count)
SELECT * FROM pgtrickle.pgt_status();
Quick functional test
CREATE TABLE test_source (id INT PRIMARY KEY, val TEXT);
INSERT INTO test_source VALUES (1, 'hello');
SELECT pgtrickle.create_stream_table(
'test_st',
'SELECT id, val FROM test_source',
'1m',
'FULL'
);
SELECT * FROM test_st;
-- Should return: 1 | hello
-- Clean up
SELECT pgtrickle.drop_stream_table('test_st');
DROP TABLE test_source;
Upgrading
To upgrade pg_trickle to a newer version without losing data:
For comprehensive upgrade instructions, version-specific notes, troubleshooting, and rollback procedures, see docs/UPGRADING.md.
1. Install the new extension files
Follow the same steps as Installing from a Pre-built Release to overwrite the shared library and SQL files with the new version. You do not need to drop the extension from your databases first.
Linux / macOS:
tar xzf pg_trickle-<new-ver>-pg18-linux-amd64.tar.gz
cd pg_trickle-<new-ver>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
2. Restart PostgreSQL (when required)
If the shared library ABI has changed, restart PostgreSQL before proceeding so the
new .so/.dll is loaded. The release notes for each version will call this out
explicitly when a restart is required.
pg_ctl restart -D /path/to/data
# or
systemctl restart postgresql
3. Apply the schema migration in each database
Connect to every database where pg_trickle is installed and run:
-- Upgrade to the latest bundled version
ALTER EXTENSION pg_trickle UPDATE;
-- Or upgrade to a specific version
ALTER EXTENSION pg_trickle UPDATE TO '<new-version>';
PostgreSQL uses the versioned SQL migration scripts bundled with the release
(e.g. pg_trickle--0.2.3--0.3.0.sql, pg_trickle--0.3.0--0.4.0.sql) to
apply catalog and SQL-surface changes.
PostgreSQL automatically chains these scripts when you run ALTER EXTENSION pg_trickle UPDATE. The command is a no-op when no migration script is needed
for a given release.
You can confirm the active version afterwards:
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
Coming soon: A future release will include a helper function (
pgtrickle.upgrade()) that automates steps 2–3 across all databases in the cluster and validates catalog integrity after the migration. Until then, the manual steps above are the supported upgrade path.
Uninstallation
-- Drop all stream tables first
SELECT pgtrickle.drop_stream_table(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
-- Drop the extension
DROP EXTENSION pg_trickle CASCADE;
Remove pg_trickle from shared_preload_libraries in postgresql.conf and restart PostgreSQL.
Troubleshooting
Unit tests crash on macOS 26+ (symbol not found in flat namespace)
macOS 26 (Tahoe) changed dyld to eagerly resolve all flat-namespace symbols
at binary load time. pgrx extensions reference PostgreSQL server-internal
symbols (e.g. CacheMemoryContext, SPI_connect) via the
-Wl,-undefined,dynamic_lookup linker flag. These symbols are normally
provided by the postgres executable when the extension is loaded as a shared
library — but for cargo test --lib there is no postgres process, so the test
binary aborts immediately:
dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'
This affects local development only — integration tests, E2E tests, and the extension itself running inside PostgreSQL are unaffected.
The fix is built into the just test-unit recipe. It automatically:
- Compiles a tiny C stub library (`scripts/pg_stub.c` → `target/libpg_stub.dylib`) that provides NULL/no-op definitions for the ~28 PostgreSQL symbols.
- Compiles the test binary with `--no-run`.
- Runs the binary with `DYLD_INSERT_LIBRARIES` pointing to the stub.
The stub is only built on macOS 26+. On Linux or older macOS, just test-unit
runs cargo test --lib directly with no changes.
Note: The stub symbols are never called — unit tests exercise pure Rust logic only. If a test accidentally calls a PostgreSQL function it will crash with a NULL dereference (the desired fail-fast behavior).
If you run unit tests without just (e.g. directly via cargo test --lib),
you can use the wrapper script instead:
./scripts/run_unit_tests.sh pg18
# With test name filter:
./scripts/run_unit_tests.sh pg18 -- test_parse_basic
Extension fails to load
Ensure shared_preload_libraries = 'pg_trickle' is set and PostgreSQL has been restarted (not just reloaded). The extension requires shared memory initialization at startup.
Background worker not starting
Check that max_worker_processes is high enough. In sequential mode (default) pg_trickle needs one slot per database with stream tables. With parallel refresh enabled (pg_trickle.parallel_refresh_mode = 'on') it additionally needs max_dynamic_refresh_workers slots (default 4) shared across all databases.
See the worker-budget formula in CONFIGURATION.md for sizing guidance.
Check logs for details
The extension logs at various levels. Enable debug logging for more detail:
SET client_min_messages TO debug1;
Next Steps
- Getting Started — Create your first stream table in 5 minutes
- Pre-Deployment Checklist — Complete checklist for production deployments
- Best-Practice Patterns — Common data modeling patterns
- Configuration Reference — All GUC variables and tuning
Upgrading pg_trickle
This guide covers upgrading pg_trickle from one version to another.
Quick Upgrade (Recommended)
-- 1. Check current version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- 2. Replace the binary files (.so/.dylib, .control, .sql)
-- See the installation method below for your platform.
-- 3. Restart PostgreSQL (required for shared library changes)
-- sudo systemctl restart postgresql
-- 4. Run the upgrade in each database that has pg_trickle installed
ALTER EXTENSION pg_trickle UPDATE;
-- 5. Verify the upgrade
SELECT pgtrickle.version();
SELECT * FROM pgtrickle.health_check();
Step-by-Step Instructions
1. Check Current Version
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- Returns your current installed version, e.g. '0.9.0'
2. Install New Binary Files
Replace the extension files in your PostgreSQL installation directory. The method depends on how you originally installed pg_trickle.
From release tarball:
# Replace <new-version> with the target release, for example 0.2.3
curl -LO https://github.com/getretake/pg_trickle/releases/download/v<new-version>/pg_trickle-<new-version>-pg18-linux-amd64.tar.gz
tar xzf pg_trickle-<new-version>-pg18-linux-amd64.tar.gz
# Copy files to PostgreSQL directories
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/lib/* $(pg_config --pkglibdir)/
sudo cp pg_trickle-<new-version>-pg18-linux-amd64/extension/* $(pg_config --sharedir)/extension/
From source (cargo-pgrx):
cargo pgrx install --release
3. Restart PostgreSQL
The shared library (.so / .dylib) is loaded at server start via
shared_preload_libraries. A restart is required for the new binary to
take effect.
sudo systemctl restart postgresql
# or on macOS with Homebrew:
brew services restart postgresql@18
4. Run ALTER EXTENSION UPDATE
Connect to each database where pg_trickle is installed and run:
ALTER EXTENSION pg_trickle UPDATE;
This executes the upgrade migration scripts in order (for example,
pg_trickle--0.5.0--0.6.0.sql → pg_trickle--0.6.0--0.7.0.sql).
PostgreSQL automatically determines the full upgrade chain from your current
version to the new default_version.
5. Verify the Upgrade
-- Check version
SELECT pgtrickle.version();
-- Run health check
SELECT * FROM pgtrickle.health_check();
-- Verify stream tables are intact
SELECT * FROM pgtrickle.stream_tables_info;
-- Test a refresh
SELECT pgtrickle.refresh_stream_table('your_stream_table');
Version-Specific Notes
0.1.3 → 0.2.0
New functions added:
- `pgtrickle.list_sources(name)` — list source tables for a stream table
- `pgtrickle.change_buffer_sizes()` — inspect CDC change buffer sizes
- `pgtrickle.health_check()` — diagnostic health checks
- `pgtrickle.dependency_tree()` — visualize the dependency DAG
- `pgtrickle.trigger_inventory()` — audit CDC triggers
- `pgtrickle.refresh_timeline(max_rows)` — refresh history
- `pgtrickle.diamond_groups()` — diamond dependency group info
- `pgtrickle.version()` — extension version string
- `pgtrickle.pgt_ivm_apply_delta(...)` — internal IVM delta application
- `pgtrickle.pgt_ivm_handle_truncate(...)` — internal TRUNCATE handler
- `pgtrickle._signal_launcher_rescan()` — internal launcher signal
No schema changes to pgtrickle.pgt_stream_tables or
pgtrickle.pgt_dependencies catalog tables.
No breaking changes. All v0.1.3 functions and views continue to work as before.
0.2.0 → 0.2.1
Three new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| topk_offset | INT | NULL | Pre-provisioned for paged TopK OFFSET (activated in v0.2.2) |
| has_keyless_source | BOOLEAN NOT NULL | FALSE | EC-06: keyless source flag; switches apply strategy from MERGE to counted DELETE |
| function_hashes | TEXT | NULL | EC-16: stores MD5 hashes of referenced function bodies for change detection |
The migration script (pg_trickle--0.2.0--0.2.1.sql) adds these columns
via ALTER TABLE … ADD COLUMN IF NOT EXISTS.
No breaking changes. All v0.2.0 functions, views, and event triggers continue to work as before.
What's also new:
- Upgrade migration safety infrastructure (scripts, CI, E2E tests)
- GitHub Pages book expansion (6 new documentation pages)
- User-facing upgrade guide (this document)
0.2.1 → 0.2.2
No catalog table DDL changes. The topk_offset column needed for paged
TopK was already added in v0.2.1.
Two SQL function updates are applied by `pg_trickle--0.2.1--0.2.2.sql`:

- `pgtrickle.create_stream_table(...)`
  - default `schedule` changes from `'1m'` to `'calculated'`
  - default `refresh_mode` changes from `'DIFFERENTIAL'` to `'AUTO'`
- `pgtrickle.alter_stream_table(...)`
  - adds the optional `query` parameter used by ALTER QUERY support
Because PostgreSQL stores argument defaults and function signatures in
pg_proc, the migration script must DROP FUNCTION and recreate both
signatures during ALTER EXTENSION ... UPDATE.
Behavioral notes:
- Existing stream tables keep their current catalog values. The migration only changes the defaults used by future `create_stream_table(...)` calls.
- Existing applications can opt a table into the new defaults explicitly via `pgtrickle.alter_stream_table(...)` after the upgrade.
- After installing the new binary and restarting PostgreSQL, the scheduler now warns if the shared library version and SQL-installed extension version do not match. This helps detect stale `.so`/`.dylib` files after partial upgrades.
0.2.2 → 0.2.3
One new catalog column is added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| requested_cdc_mode | TEXT | NULL | Optional per-stream-table CDC override ('auto', 'trigger', 'wal') |
The upgrade script also recreates two SQL functions:
- `pgtrickle.create_stream_table(...)` — adds the optional `cdc_mode` parameter
- `pgtrickle.alter_stream_table(...)` — adds the optional `cdc_mode` parameter
Monitoring view updates:
- `pgtrickle.pg_stat_stream_tables` gains the `cdc_modes` column
- `pgtrickle.pgt_cdc_status` is added for per-source CDC visibility
Because PostgreSQL stores function signatures and defaults in pg_proc, the
upgrade script drops and recreates both lifecycle functions during
ALTER EXTENSION ... UPDATE.
0.6.0 → 0.7.0
One new catalog column is added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| last_fixpoint_iterations | INT | NULL | Records how many rounds the last circular-dependency fixpoint run required |
Two new catalog tables are added:
| Table | Purpose |
|---|---|
| pgtrickle.pgt_watermarks | Stores per-source watermark progress reported by external loaders |
| pgtrickle.pgt_watermark_groups | Stores groups of sources that must stay temporally aligned before refresh |
The upgrade script also updates and adds SQL functions:
- Recreates `pgtrickle.pgt_status()` so the result includes `scc_id`
- Adds `pgtrickle.pgt_scc_status()` for circular-dependency monitoring
- Adds `pgtrickle.advance_watermark(source, watermark)`
- Adds `pgtrickle.create_watermark_group(name, sources[], tolerance_secs)`
- Adds `pgtrickle.drop_watermark_group(name)`
- Adds `pgtrickle.watermarks()`
- Adds `pgtrickle.watermark_groups()`
- Adds `pgtrickle.watermark_status()`
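A sketch of how the watermark functions might be combined, following the signatures listed above. The group and table names are placeholders, and the exact argument types are an assumption:

```sql
-- Hypothetical: keep two fact tables temporally aligned before dependent refreshes
SELECT pgtrickle.create_watermark_group(
  'nightly_facts',                             -- group name (placeholder)
  ARRAY['public.orders', 'public.shipments'],  -- member sources
  300                                          -- tolerance in seconds
);

-- An external loader reports progress after each committed batch
SELECT pgtrickle.advance_watermark('public.orders', now());

-- Inspect alignment before the scheduler releases dependent refreshes
SELECT * FROM pgtrickle.watermark_status();
```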
Behavioral notes:
- Circular stream table dependencies can now run to convergence when `pg_trickle.allow_circular = true` and every member of the cycle is safe for monotone DIFFERENTIAL refresh.
- The scheduler can now hold back refreshes until related source tables are aligned within a configured watermark tolerance.
- Existing non-circular stream tables continue to work as before. The new catalog objects are additive.
0.7.0 → 0.8.0
No catalog schema changes. The upgrade migration script contains no DDL.
New operational features:
- `pg_dump`/`pg_restore` support: stream tables are now safely exported and re-connected after restore without manual intervention.
- Connection pooler opt-in was introduced at the per-stream level (superseded by the more comprehensive `pooler_compatibility_mode` added in v0.10.0).
No breaking changes. All v0.7.0 functions, views, and event triggers continue to work as before.
0.8.0 → 0.9.0
No catalog schema DDL changes to pgtrickle.pgt_stream_tables or the
dependency catalog.
New API function added:
- `pgtrickle.restore_stream_tables()` — re-installs CDC triggers and re-registers stream tables after a `pg_restore` from a `pg_dump`.
Hidden auxiliary columns for AVG / STDDEV / VAR aggregates. Stream tables
using these aggregates will automatically receive hidden __pgt_aux_*
columns on the next refresh after upgrading. No manual action is needed —
pg_trickle detects missing auxiliary columns and performs a single full
reinitialise to add them.
Behavioral notes:
- COUNT, SUM, and AVG now update in constant time (O(changed rows)) instead of rescanning the whole group.
- STDDEV and VAR variants likewise update in O(changed rows) via hidden sum-of-squares auxiliary columns.
- MIN/MAX still requires a group rescan only when the deleted value is the current extreme.
- Refresh groups (`create_refresh_group`, `drop_refresh_group`, `refresh_groups()`) are available starting from this version.
0.9.0 → 0.10.0
Two new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| pooler_compatibility_mode | BOOLEAN NOT NULL | FALSE | Disables prepared statements and NOTIFY for this stream table — required when accessed through PgBouncer in transaction-pool mode |
| refresh_tier | TEXT NOT NULL | 'hot' | Tiered scheduling tier: hot, warm, cold, or frozen |
One new catalog table is added:
| Table | Purpose |
|---|---|
| pgtrickle.pgt_refresh_groups | Stores refresh groups for snapshot-consistent multi-table refresh |
The upgrade script also updates and adds SQL functions:
- `pgtrickle.create_stream_table(...)` gains the `pooler_compatibility_mode` parameter
- `pgtrickle.create_stream_table_if_not_exists(...)` likewise
- `pgtrickle.create_or_replace_stream_table(...)` likewise
- `pgtrickle.alter_stream_table(...)` likewise
- Adds `pgtrickle.create_refresh_group(name, members, isolation)`
- Adds `pgtrickle.drop_refresh_group(name)`
- Adds `pgtrickle.refresh_groups()` — lists all declared groups
Behavioral notes:
- `pooler_compatibility_mode` defaults to `false`. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.
- `pg_trickle.auto_backoff` now defaults to `on` (was `off`). The backoff threshold is raised from 80 % → 95 % and the maximum slowdown is capped at 8× (was 64×). If you relied on the old opt-in behaviour, set `pg_trickle.auto_backoff = off` explicitly.
- `diamond_consistency` now defaults to `'atomic'` for new stream tables (was `'none'`). Existing stream tables keep their current setting.
- The scheduler now uses row-level locking for concurrency control instead of session-level advisory locks, making pg_trickle compatible with PgBouncer transaction-pool and similar connection poolers.
- Statistical aggregates (`CORR`, `COVAR_*`, `REGR_*`) now update incrementally using Welford-style accumulation, no longer requiring a group rescan.
- Materialized view sources can now be used in DIFFERENTIAL mode when `pg_trickle.matview_polling = on` is set.
- Recursive CTE stream tables with DELETE/UPDATE now use the Delete-and-Rederive algorithm (O(delta) instead of O(n)).
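As an illustrative sketch of opting an existing stream table into pooler-safe mode, assuming `alter_stream_table` accepts the parameter by name as the other named-argument calls in these docs do (the table name is a placeholder):

```sql
-- Hypothetical: disable prepared statements / NOTIFY for a table served
-- through PgBouncer in transaction-pool mode
SELECT pgtrickle.alter_stream_table(
  'active_orders',
  pooler_compatibility_mode => true
);
```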
0.10.0 → 0.11.0
New catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| effective_refresh_mode | TEXT | NULL | Actual refresh mode used in the last cycle (FULL / DIFFERENTIAL / APPEND_ONLY / TOP_K / NO_DATA); populated by the scheduler after each completed refresh |
| fuse_mode | TEXT NOT NULL | 'off' | Circuit-breaker mode: off, on, or auto |
| fuse_state | TEXT NOT NULL | 'armed' | Circuit-breaker state: armed, blown, or disabled |
| fuse_ceiling | BIGINT | NULL | Maximum change-row count that can pass through in one refresh before the fuse blows; NULL = unlimited |
| fuse_sensitivity | INT | NULL | Sensitivity multiplier for auto-fuse detection |
| blown_at | TIMESTAMPTZ | NULL | Timestamp when the fuse last triggered |
| blow_reason | TEXT | NULL | Human-readable reason the fuse blew |
| st_partition_key | TEXT | NULL | Partition key column for declaratively partitioned stream tables; NULL = not partitioned |
Updated function signatures — existing calls continue to work because new parameters all have defaults:
- `pgtrickle.create_stream_table(...)` gains `partition_by TEXT DEFAULT NULL`
- `pgtrickle.create_stream_table_if_not_exists(...)` likewise
- `pgtrickle.create_or_replace_stream_table(...)` likewise
- `pgtrickle.alter_stream_table(...)` gains `fuse TEXT DEFAULT NULL`, `fuse_ceiling BIGINT DEFAULT NULL`, `fuse_sensitivity INT DEFAULT NULL`
New functions:
- `pgtrickle.reset_fuse(name TEXT, action TEXT DEFAULT 'apply')` — clear a blown fuse and resume scheduling
- `pgtrickle.fuse_status()` — returns circuit-breaker state for every stream table
- `pgtrickle.explain_refresh_mode(name TEXT)` — shows configured mode, effective mode, and the reason for any downgrade
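A sketch of a recovery workflow using these functions, following the signatures listed above; `'orders_agg'` is a placeholder stream table name:

```sql
-- Hypothetical fuse-recovery workflow
SELECT * FROM pgtrickle.fuse_status();                       -- find blown fuses
SELECT * FROM pgtrickle.explain_refresh_mode('orders_agg');  -- why was the mode downgraded?
SELECT pgtrickle.reset_fuse('orders_agg');                   -- re-arm and resume scheduling
```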
Behavioral notes:
- Event-driven wake (`pg_trickle.event_driven_wake`) is `on` by default — the background worker now wakes within ~15 ms of a source-table write instead of waiting up to 500 ms.
- Stream-table-to-stream-table chains now refresh incrementally — downstream tables receive a small insert/delete delta rather than cascading full refreshes.
- `pg_trickle.tiered_scheduling` now defaults to `on`.
- Declaratively partitioned stream tables are supported via `partition_by` — the refresh MERGE is automatically restricted to only the changed partitions.
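A sketch of creating a partitioned stream table, assuming the `partition_by TEXT DEFAULT NULL` parameter introduced in this release; table and column names are placeholders:

```sql
-- Hypothetical: a daily rollup partitioned on its date column, so refreshes
-- only MERGE into the partitions that actually changed
SELECT pgtrickle.create_stream_table(
  name         => 'daily_sales',
  query        => 'SELECT sale_date, sum(amount) AS total
                   FROM sales GROUP BY sale_date',
  schedule     => '5m',
  partition_by => 'sale_date'
);
```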
0.11.0 → 0.12.0
No schema changes. This release adds four new diagnostic SQL functions only:
| Function | Returns | Purpose |
|---|---|---|
| pgtrickle.explain_query_rewrite(query TEXT) | TABLE(pass_name TEXT, changed BOOL, sql_after TEXT) | Walk a query through every DVM rewrite pass to see how pg_trickle transforms it |
| pgtrickle.diagnose_errors(name TEXT) | TABLE(event_time TIMESTAMPTZ, error_type TEXT, error_message TEXT, remediation TEXT) | Last 5 FAILED refresh events with error classification and suggested fixes |
| pgtrickle.list_auxiliary_columns(name TEXT) | TABLE(column_name TEXT, data_type TEXT, purpose TEXT) | List all hidden __pgt_* auxiliary columns on a stream table's storage relation |
| pgtrickle.validate_query(query TEXT) | TABLE(valid BOOL, mode TEXT, reason TEXT) | Parse and validate a query for stream-table compatibility without creating one |
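An illustrative dry-run using these diagnostics, following the signatures in the table above; the query text and table names are placeholders:

```sql
-- Check compatibility before creating anything
SELECT * FROM pgtrickle.validate_query(
  'SELECT customer_id, sum(total) AS revenue FROM orders GROUP BY customer_id'
);

-- Inspect how each DVM rewrite pass would transform the same query
SELECT * FROM pgtrickle.explain_query_rewrite(
  'SELECT customer_id, sum(total) AS revenue FROM orders GROUP BY customer_id'
);
```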
Behavioral notes:
- The incremental engine now handles multi-table join deletes correctly — phantom rows after simultaneous deletes from multiple join sides no longer occur.
- Stream-table-to-stream-table row identity is now computed consistently between the change buffer and the downstream table, eliminating stale duplicate rows after upstream UPDATEs.
- `pg_trickle.tiered_scheduling` defaults to `on` (same as 0.11.0 runtime behaviour; this release makes it the explicit default).
0.12.0 → 0.13.0
Ten new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| effective_refresh_mode | TEXT | NULL | Computed refresh mode after AUTO resolution |
| fuse_mode | TEXT NOT NULL | 'off' | Fuse configuration: off, auto, or manual |
| fuse_state | TEXT NOT NULL | 'armed' | Current fuse state: armed or blown |
| fuse_ceiling | BIGINT | NULL | Maximum change count before fuse blows |
| fuse_sensitivity | INT | NULL | Consecutive cycles above ceiling before triggering |
| blown_at | TIMESTAMPTZ | NULL | Timestamp when the fuse last blew |
| blow_reason | TEXT | NULL | Reason the fuse blew |
| st_partition_key | TEXT | NULL | Partition key specification (RANGE, LIST, or HASH) |
| max_differential_joins | INT | NULL | Maximum join count for differential mode (auto-fallback to FULL when exceeded) |
| max_delta_fraction | DOUBLE PRECISION | NULL | Maximum delta-to-table ratio for differential mode (auto-fallback to FULL when exceeded) |
All columns use ADD COLUMN IF NOT EXISTS for idempotent upgrades.
Nine new SQL functions (plus one replacement with new signature):
| Function | Purpose |
|---|---|
| pgtrickle.explain_delta(name, format) | Delta SQL query plan inspection |
| pgtrickle.dedup_stats() | MERGE deduplication frequency counters |
| pgtrickle.shared_buffer_stats() | Per-source-buffer observability |
| pgtrickle.explain_refresh_mode(name) | Refresh mode decision explanation |
| pgtrickle.reset_fuse(name) | Reset a blown fuse |
| pgtrickle.fuse_status() | Fuse state across all stream tables |
| pgtrickle.explain_query_rewrite(query) | DVM rewrite pass inspection |
| pgtrickle.diagnose_errors(name) | Error classification and remediation |
| pgtrickle.list_auxiliary_columns(name) | Hidden __pgt_* column listing |
| pgtrickle.validate_query(query) | Query compatibility validation |
| pgtrickle.alter_stream_table(...) | (replaced) — new partition_by parameter |
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| pg_trickle.per_database_worker_quota | 0 (auto) | Per-database parallel worker limit |
Behavioral notes:
- Shared change buffers: Multiple stream tables reading from the same source now automatically share a single change buffer. No migration action required — existing per-source buffers continue to work.
- Columnar change tracking: Wide-table UPDATEs that touch only value columns (not GROUP BY / JOIN / WHERE columns) now generate significantly less delta volume. This is fully automatic.
- Auto buffer partitioning: Set `pg_trickle.buffer_partitioning = 'auto'` to let high-throughput buffers self-promote to partitioned mode for O(1) cleanup.
- dbt macros: If you use dbt-pgtrickle, update your macros to the matching v0.13.0 version. New config options: `partition_by`, `fuse`, `fuse_ceiling`, `fuse_sensitivity`.
No breaking changes. All v0.12.0 functions, views, and event triggers continue to work as before.
0.13.0 → 0.14.0
Two new catalog columns added to pgtrickle.pgt_stream_tables:
| Column | Type | Default | Purpose |
|---|---|---|---|
| last_error_message | TEXT | NULL | Error message from the last permanent refresh failure |
| last_error_at | TIMESTAMPTZ | NULL | Timestamp of the last permanent refresh failure |
Updated function signature (return type gained new columns):
- `pgtrickle.st_refresh_stats()` — gains `consecutive_errors`, `schedule`, `refresh_tier`, and `last_error_message` columns. The upgrade script drops and recreates the function. No behavior change for existing callers that ignore unknown columns.
New SQL functions (available immediately after ALTER EXTENSION ... UPDATE):
| Function | Purpose |
|---|---|
| pgtrickle.recommend_refresh_mode(name) | Workload-based refresh mode recommendation with confidence level |
| pgtrickle.refresh_efficiency(name) | Per-table FULL vs. DIFFERENTIAL performance metrics |
| pgtrickle.export_definition(name) | Export stream table as reproducible DROP+CREATE+ALTER DDL |
| pgtrickle.convert_buffers_to_unlogged() | Convert logged change buffers to UNLOGGED |
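An illustrative sketch combining the new functions, following the signatures in the table above; `'active_orders'` is a placeholder name:

```sql
-- Hypothetical tuning-and-backup pass for one stream table
SELECT * FROM pgtrickle.recommend_refresh_mode('active_orders');  -- suggested mode
SELECT * FROM pgtrickle.refresh_efficiency('active_orders');      -- FULL vs. DIFFERENTIAL cost
SELECT pgtrickle.export_definition('active_orders');              -- reproducible DDL
```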
New GUC variables:
| GUC | Default | Purpose |
|---|---|---|
| pg_trickle.planner_aggressive | true | Consolidated switch replacing merge_planner_hints + merge_work_mem_mb |
| pg_trickle.unlogged_buffers | false | Create new change buffers as UNLOGGED (reduces WAL by ~30%) |
| pg_trickle.agg_diff_cardinality_threshold | 1000 | Warn at creation time when GROUP BY cardinality is below this |
Deprecated GUCs (still accepted but ignored at runtime):
- `pg_trickle.merge_planner_hints` → use `pg_trickle.planner_aggressive`
- `pg_trickle.merge_work_mem_mb` → use `pg_trickle.planner_aggressive`
Behavioral notes:
- Error-state circuit breaker: A single permanent refresh failure (e.g. a function that doesn't exist for the column type) now immediately sets the stream table status to `ERROR` with a message stored in `last_error_message`. The scheduler skips `ERROR` tables. Use `pgtrickle.resume_stream_table(name)` followed by `pgtrickle.alter_stream_table(name, query => ...)` to recover.
- Tiered scheduling NOTICE: Demoting a stream table from `hot` to `cold` or `frozen` now emits a NOTICE so operators are aware the effective refresh interval has changed (10× for cold, suspended for frozen).
- SECURITY DEFINER triggers: All CDC trigger functions now run with `SECURITY DEFINER` and an explicit `SET search_path`, hardening against privilege-escalation attacks. This is applied automatically on upgrade — no manual action needed.
- TUI binary: A `pgtrickle` command-line tool is now included in the package. See TUI.md for usage.
No breaking changes. All v0.13.0 functions, views, and event triggers continue to work as before.
Supported Upgrade Paths
The following migration hops are available. PostgreSQL chains them
automatically when you run ALTER EXTENSION pg_trickle UPDATE.
| From | To | Script |
|---|---|---|
| 0.1.3 | 0.2.0 | pg_trickle--0.1.3--0.2.0.sql |
| 0.2.0 | 0.2.1 | pg_trickle--0.2.0--0.2.1.sql |
| 0.2.1 | 0.2.2 | pg_trickle--0.2.1--0.2.2.sql |
| 0.2.2 | 0.2.3 | pg_trickle--0.2.2--0.2.3.sql |
| 0.2.3 | 0.3.0 | pg_trickle--0.2.3--0.3.0.sql |
| 0.3.0 | 0.4.0 | pg_trickle--0.3.0--0.4.0.sql |
| 0.4.0 | 0.5.0 | pg_trickle--0.4.0--0.5.0.sql |
| 0.5.0 | 0.6.0 | pg_trickle--0.5.0--0.6.0.sql |
| 0.6.0 | 0.7.0 | pg_trickle--0.6.0--0.7.0.sql |
| 0.7.0 | 0.8.0 | pg_trickle--0.7.0--0.8.0.sql |
| 0.8.0 | 0.9.0 | pg_trickle--0.8.0--0.9.0.sql |
| 0.9.0 | 0.10.0 | pg_trickle--0.9.0--0.10.0.sql |
| 0.10.0 | 0.11.0 | pg_trickle--0.10.0--0.11.0.sql |
| 0.11.0 | 0.12.0 | pg_trickle--0.11.0--0.12.0.sql |
| 0.12.0 | 0.13.0 | pg_trickle--0.12.0--0.13.0.sql |
| 0.13.0 | 0.14.0 | pg_trickle--0.13.0--0.14.0.sql |
That means any installation currently on 0.1.3 through 0.13.0 can upgrade to 0.14.0 with a single `ALTER EXTENSION pg_trickle UPDATE` once the new binaries are installed and PostgreSQL has been restarted.
Rollback / Downgrade
PostgreSQL does not support automatic extension downgrades. To roll back:
1. Export stream table definitions (if you want to recreate them later):

   cargo run --bin pg_trickle_dump -- --output backup.sql

   Or, if the binary is already installed in your PATH:

   pg_trickle_dump --output backup.sql

   Use `--dsn '<connection string>'` or standard `PG*` / `DATABASE_URL` environment variables when the default local connection parameters are not sufficient.

2. Drop the extension (destroys all stream tables):

   DROP EXTENSION pg_trickle CASCADE;

3. Install the old version and restart PostgreSQL.

4. Recreate the extension at the old version:

   CREATE EXTENSION pg_trickle VERSION '0.1.3';

5. Recreate stream tables from your backup.
Troubleshooting
"function pgtrickle.xxx does not exist" after upgrade
This means the upgrade script is missing a function. Workaround:
-- Check what version PostgreSQL thinks is installed
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
-- If the version looks correct but functions are missing,
-- the upgrade script may be incomplete. Try a clean reinstall:
DROP EXTENSION pg_trickle CASCADE;
CREATE EXTENSION pg_trickle CASCADE;
-- Warning: this destroys all stream tables!
Report this as a bug — upgrade scripts should never silently drop functions.
"could not access file pg_trickle" after restart
The new shared library file was not installed correctly. Verify:
ls -la $(pg_config --pkglibdir)/pg_trickle*
ALTER EXTENSION UPDATE says "already at version X"
PostgreSQL found no pending migration because the `.control` file's `default_version` still matches your currently installed version. This usually means the new `.control` and `.sql` files were not copied into place. Check:
cat $(pg_config --sharedir)/extension/pg_trickle.control
Multi-Database Environments
ALTER EXTENSION UPDATE must be run in each database where pg_trickle
is installed. A common pattern:
for db in $(psql -t -c "SELECT datname FROM pg_database WHERE datname NOT IN ('template0', 'template1')"); do
psql -d "$db" -c "ALTER EXTENSION pg_trickle UPDATE;" 2>/dev/null || true
done
CloudNativePG (CNPG)
For CNPG deployments, see cnpg/README.md for upgrade instructions specific to the Kubernetes operator.
Backup and Restore
Like any standard PostgreSQL extension, pg_trickle supports logical backups via pg_dump and physical backups (via tools like pgBackRest or pg_basebackup).
Because pg_trickle maintains internal state automatically (such as Change Data Capture buffers and DDL event triggers), specific workflows should be followed to ensure a smooth recovery.
Physical Backups (pgBackRest / pg_basebackup)
Physical backups copy the underlying data blocks. These are the most robust backups.
No special steps are needed during restore. When the database comes online, pg_trickle's catalogs, CDC buffers, and internal dependencies exist precisely as they did at the moment the snapshot was taken.
Note for WAL-Mode Users: Physical backups do not export replication slot data by default. If your CDC pipeline was in wal mode, logical slots might not survive the recreation. The pg_trickle scheduler handles missing slots gracefully by temporarily re-enabling table triggers.
Logical Backups (pg_dump / pg_restore)
Logical backups dump your database schema as generic cross-compatible SQL (CREATE TABLE, INSERT, CREATE INDEX).
pg_trickle integrates with pg_dump natively. When restoring these backups (which typically involves sequentially recreating schemas, inserting data into those tables, and lastly applying indexes and triggers), you must follow a precise restore sequence so the extension can rewrite its own internal triggers correctly without conflicting with the plain PostgreSQL commands in the dump.
The Recommended Multi-Stage pg_restore Strategy
The most reliable approach is to use the --section arguments of pg_restore. By breaking the restore into sections, we guarantee that by the time the schema, data, and constraints are created, all required settings and catalog state are already present in the database, and our custom hook DdlEventKind::ExtensionChange intercepts the command and automatically calls pgtrickle.restore_stream_tables() internally.
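A sketch of the three-stage restore using pg_restore's standard `--section` flags; the dump file and database names are placeholders:

```shell
# 1. Schema only (tables, extension, views): pg_trickle's catalog comes back first
pg_restore --section=pre-data  -d restored_db backup.dump

# 2. Bulk data for source tables and stream table storage
pg_restore --section=data      -d restored_db backup.dump

# 3. Indexes, constraints, and triggers: the extension's DDL hook fires here
pg_restore --section=post-data -d restored_db backup.dump
```

The `--section` values `pre-data`, `data`, and `post-data` are standard pg_restore behavior; only the interaction with pg_trickle's event hook is specific to this extension.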
pg_trickle TUI — User Guide
pgtrickle is a terminal tool for managing and monitoring pg_trickle stream
tables. It works in two modes:
- Interactive dashboard — run `pgtrickle` with no arguments to launch a live-updating TUI that shows all your stream tables, their health, dependencies, and configuration.
- One-shot CLI — run `pgtrickle <command>` to perform a single operation and exit. Output goes to stdout in table, JSON, or CSV format. Designed for scripts, CI pipelines, and automation.
Building
The TUI is a standalone Rust binary in the pgtrickle-tui workspace member.
It does not require the PostgreSQL extension to compile — only a Rust
toolchain.
# Build (debug)
cargo build -p pgtrickle-tui
# Build (release, optimized)
cargo build --release -p pgtrickle-tui
# The binary is at:
# target/debug/pgtrickle (debug)
# target/release/pgtrickle (release)
To install it on your PATH:
cargo install --path pgtrickle-tui
Verify:
pgtrickle --version
pgtrickle --help
Requirements
- Rust 2024 edition (1.85+)
- A running PostgreSQL 18 server with the `pg_trickle` extension installed
- Network access to the database (no local socket required)
Connecting to a Database
pgtrickle resolves connection parameters in this order (first match wins):
| Priority | Method | Example |
|---|---|---|
| 1 | --url flag | pgtrickle --url postgres://user:pass@host:5432/mydb list |
| 2 | PGTRICKLE_URL env var | export PGTRICKLE_URL=postgres://... |
| 3 | Individual flags | --host, --port, --dbname, --user, --password |
| 4 | Standard libpq env vars | PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD |
| 5 | Defaults | localhost:5432/postgres as user postgres |
Connection flags work with every subcommand and with the interactive dashboard:
# URL-style connection
pgtrickle --url postgres://admin:secret@db.example.com:5432/analytics
# Environment variables (most common in production)
export PGHOST=db.example.com
export PGPORT=5432
export PGDATABASE=analytics
export PGUSER=admin
export PGPASSWORD=secret
pgtrickle list
# Explicit flags
pgtrickle --host db.example.com --dbname analytics --user admin list
Interactive Dashboard
Run pgtrickle with no subcommand:
pgtrickle
This opens a full-screen terminal UI that auto-refreshes every 2 seconds. The screen has three areas:
- Header — application name, current view, connection status (● connected / ✗ disconnected), and time since last poll.
- Body — the active view (see below).
- Footer — keyboard shortcuts for switching views and a filter indicator.
Press q or Ctrl+C to exit.
Views
There are 14 views. Switch between them by pressing the key shown:
| Key | View | What it shows |
|---|---|---|
1 | Dashboard | All stream tables in a sortable list with status, mode, staleness, and last refresh time. A status ribbon at the top summarizes active/error/stale counts. |
2 | Detail | Deep dive into the selected stream table: properties (schema, status, mode, schedule, tier, refresh mode explanation), source tables, refresh history, CDC health, and diagnosed errors for error-state tables. |
3 | Dependencies | The stream table dependency graph rendered as an ASCII tree. Edges are color-coded by status (green = active, red = error). |
4 | Refresh Log | A scrollable timeline of recent refreshes across all tables — timestamp, mode (DIFF/FULL), table name, status, duration, and rows affected. |
5 | Diagnostics | Output of recommend_refresh_mode() — shows each table's current mode vs. recommended mode with confidence level and reasoning. |
6 | CDC Health | Change buffer sizes and byte counts per source table, plus the CDC mode (trigger/WAL). Large buffers are highlighted as warnings. |
7 | Configuration | All pg_trickle.* GUC parameters: current value, unit, category, and description. |
8 | Health Checks | Results of health_check() — each check displays a name, severity (OK/WARN/CRITICAL), and detail message. Critical items are shown in red. |
9 | Alerts | Real-time alert feed from LISTEN pg_trickle_alert. Shows timestamp, severity icon, and message for each event. |
w | Workers | Background scheduler worker pool: each worker's state (running/idle), the table it's refreshing, and duration. Below that, the pending job queue with priority and wait time. |
f | Fuse | Circuit breaker status for each stream table: fuse state (ARMED/TRIPPED/BLOWN), consecutive error count, and last error message. |
m | Watermarks | Watermark group alignment: group name, member count, min/max watermarks, and whether the group is gated. Two tabs: Groups and Gates. |
d | Delta Inspector | Fetches and displays the auto-generated delta SQL for the selected stream table (two tabs: Delta SQL and Auxiliary Columns). Press e to show the table's CREATE DDL. |
i | Issues | All detected DAG issues (cycles, orphans, missing sources) sorted by severity and blast radius. |
Keyboard Shortcuts
Navigation — works in all views:
| Key | Action |
|---|---|
j or ↓ | Move selection down |
k or ↑ | Move selection up |
Page Down / Page Up | Scroll 20 rows |
Home | Jump to first row |
End | Jump to last row |
Enter | Drill into detail (Dashboard → Detail view; Delta Inspector → reload delta SQL) |
Esc | Go back to Dashboard / close overlay / clear filter |
Tab | Switch sub-tabs (Delta Inspector: SQL ↔ Auxiliary Columns; Watermarks: Groups ↔ Gates) |
Write actions (view-specific):
| Key | View | Action |
|---|---|---|
r | Dashboard, Detail | Refresh selected stream table |
R | Dashboard | Refresh all active tables (with confirmation) |
p | Dashboard, Detail | Pause selected (with confirmation) |
P | Dashboard, Detail | Resume selected |
e | Detail, Delta Inspector | Show CREATE DDL overlay for selected table |
A | Fuse | Re-arm fuse for selected (with confirmation) |
g | Watermarks (Gates tab) | Gate / ungate selected source (confirmation for gate) |
Global actions:
| Key | Action |
|---|---|
/ | Open filter — type to search, Enter to apply, Esc to cancel |
: | Open command palette |
s / S | Cycle sort field / reverse sort direction (Dashboard) |
t | Toggle light/dark theme |
Ctrl+R | Force an immediate poll |
Ctrl+E | Export current view to JSON file (/tmp/pgtrickle_export_*.json) |
? | Toggle help overlay |
q or Ctrl+C | Quit |
View switching:
Press 1–9, w, f, m, d, or i to jump directly to any view.
The active view and selected table are shown in both the header bar and the
footer nav bar.
Command Palette
Press : to open the command palette. Tab-completion works on stream table
names. Available commands:
| Command | Description |
|---|---|
refresh <name> | Refresh a stream table (or refresh all) |
pause <name> | Pause a stream table |
resume <name> | Resume a paused stream table |
repair <name> | Re-install CDC triggers |
export <name> | Show CREATE DDL overlay |
explain <name> | Fetch and display delta SQL for a stream table |
validate <SQL> | Validate a SQL query against the extension |
fuse reset <name> | Reset the circuit breaker fuse |
quit | Exit the TUI |
LISTEN/NOTIFY
The TUI opens a second, dedicated database connection that runs
LISTEN pg_trickle_alert. Alerts (refresh failures, auto-suspension events,
etc.) appear in the Alerts view (9) in real time, without waiting for
the next poll cycle.
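The same channel can be watched from any client. For instance, in psql (the channel name comes from the extension; the payload format is not shown here):

```sql
-- Subscribe to pg_trickle alerts from psql
LISTEN pg_trickle_alert;
-- psql prints each notification as it arrives on the connection
```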
CLI Subcommands
Every subcommand runs non-interactively: it connects, executes one query, prints the result, and exits. This makes them suitable for shell scripts, cron jobs, CI pipelines, and monitoring probes.
Output Formats
All subcommands that produce tabular output accept --format / -f:
| Format | Flag | Description |
|---|---|---|
| Table | --format table (default) | Human-readable aligned columns |
| JSON | --format json | Array of objects on stdout |
| CSV | --format csv | Comma-separated values |
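JSON output composes well with standard tooling. A jq-free sketch using python3, where the sample payload is hypothetical but shaped like pgtrickle list --format json output:

```shell
# Hypothetical sample of `pgtrickle list --format json` output
sample='[{"name":"order_totals","status":"ACTIVE"},{"name":"daily_stats","status":"ERROR"}]'

# Print the names of tables that are not ACTIVE, using only python3
not_active=$(printf '%s' "$sample" | python3 -c '
import json, sys
for t in json.load(sys.stdin):
    if t["status"] != "ACTIVE":
        print(t["name"])
')
echo "$not_active"
```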
Command Reference
pgtrickle list
List all stream tables with status, mode, schedule, tier, and refresh stats.
pgtrickle list
pgtrickle list --format json
pgtrickle status <name>
Show detailed status for a single stream table.
pgtrickle status order_totals
pgtrickle status order_totals --format json
pgtrickle refresh <name>
Trigger a manual refresh of one stream table, or all of them.
pgtrickle refresh order_totals
pgtrickle refresh --all
pgtrickle create <name> <query>
Create a new stream table with the given defining query.
pgtrickle create my_totals "SELECT region, SUM(amount) FROM orders GROUP BY region"
pgtrickle create my_totals "SELECT ..." --schedule 5m --mode differential
pgtrickle create my_totals "SELECT ..." --no-initialize
| Flag | Description |
|---|---|
--schedule | Refresh schedule (e.g. 5m, @hourly) |
--mode | Refresh mode: auto, differential, full, immediate |
--no-initialize | Skip the initial refresh after creation |
pgtrickle drop <name>
Drop a stream table.
pgtrickle drop my_totals
pgtrickle alter <name>
Change a stream table's settings.
pgtrickle alter order_totals --mode full
pgtrickle alter order_totals --schedule 10m
pgtrickle alter order_totals --tier cold
pgtrickle alter order_totals --status paused
pgtrickle alter order_totals --query "SELECT ..."
| Flag | Description |
|---|---|
--mode | New refresh mode |
--schedule | New refresh schedule |
--tier | New scheduling tier (hot, warm, cold, frozen) |
--status | New status (active, paused, suspended) |
--query | New defining query (ALTER QUERY) |
pgtrickle export <name>
Print the DDL (SQL definition) for a stream table.
pgtrickle export order_totals
pgtrickle diag [name]
Show refresh mode diagnostics and recommendations. Without a name, shows all tables. With a name, shows diagnostics for that table only.
pgtrickle diag
pgtrickle diag order_totals
pgtrickle diag --format json
pgtrickle cdc
Show CDC change buffer sizes and health.
pgtrickle cdc
pgtrickle cdc --format json
pgtrickle graph
Print the stream table dependency graph as an ASCII tree.
pgtrickle graph
pgtrickle graph --format json
pgtrickle config
Show all pg_trickle.* GUC parameters, or set one.
pgtrickle config
pgtrickle config --set pg_trickle.unlogged_buffers=true
pgtrickle config --format json
The --set flag runs ALTER SYSTEM SET followed by pg_reload_conf().
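The equivalent, done by hand in psql, would look like this (using the GUC from the example above):

```sql
ALTER SYSTEM SET pg_trickle.unlogged_buffers = true;
SELECT pg_reload_conf();
```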
pgtrickle health
Run system health checks. Returns exit code 1 if any check is CRITICAL.
pgtrickle health
pgtrickle health --format json
# Use in CI/monitoring:
pgtrickle health || echo "Health check failed"
pgtrickle workers
Show the background worker pool status and pending job queue.
pgtrickle workers
pgtrickle workers --format json
pgtrickle fuse
Show fuse (circuit breaker) status for all stream tables.
pgtrickle fuse
pgtrickle fuse --format json
pgtrickle watermarks
Show watermark groups and source gating status.
pgtrickle watermarks
pgtrickle watermarks --format json
pgtrickle explain <name>
Inspect the generated delta SQL, DVM operator tree, or deduplication stats for a stream table. By default shows the delta SQL.
pgtrickle explain order_totals # Delta SQL
pgtrickle explain order_totals --analyze # EXPLAIN ANALYZE on the delta
pgtrickle explain order_totals --operators # DVM operator tree
pgtrickle explain order_totals --dedup # Dedup stats per source
pgtrickle explain order_totals --format json
| Flag | Description |
|---|---|
--analyze | Run EXPLAIN ANALYZE on the delta query |
--operators | Show the DVM operator tree instead of raw SQL |
--dedup | Show change buffer deduplication statistics |
pgtrickle watch
Non-interactive continuous output mode. Polls the database and prints a status table at regular intervals. Useful for CI logs, monitoring, and terminals without TUI support.
pgtrickle watch # Default: every 2 seconds
pgtrickle watch -n 10 # Every 10 seconds
pgtrickle watch --compact # One line per table
pgtrickle watch --no-color # No ANSI color codes
pgtrickle watch --append # Append mode (don't clear screen)
# Log to a file
pgtrickle watch --compact --no-color --append >> /var/log/pgtrickle.log
| Flag | Short | Description |
|---|---|---|
--interval | -n | Poll interval in seconds (default: 2) |
| --compact | | One-line-per-table output |
| --no-color | | Disable ANSI color codes |
| --append | | Append to stdout instead of clearing the screen |
pgtrickle completions <shell>
Generate shell completion scripts. Install them once and get tab-completion for all subcommands and flags.
# Bash
pgtrickle completions bash > /etc/bash_completion.d/pgtrickle
# or for the current user:
pgtrickle completions bash > ~/.local/share/bash-completion/completions/pgtrickle
# Zsh
pgtrickle completions zsh > ~/.zfunc/_pgtrickle
# Fish
pgtrickle completions fish > ~/.config/fish/completions/pgtrickle.fish
# PowerShell
pgtrickle completions powershell > pgtrickle.ps1
Examples
Quick health check in CI
#!/bin/bash
set -e
export PGHOST=db.example.com PGDATABASE=analytics PGUSER=monitor
pgtrickle health || { echo "pg_trickle health check failed"; exit 1; }
pgtrickle list --format json | jq '.[] | select(.status != "ACTIVE")'
Monitor stream tables in a tmux pane
pgtrickle watch -n 5
Export all definitions for version control
for name in $(pgtrickle list --format json | jq -r '.[].name'); do
pgtrickle export "$name" > "sql/stream_tables/${name}.sql"
done
Debug a slow differential refresh
pgtrickle explain order_totals --analyze
pgtrickle explain order_totals --operators
pgtrickle explain order_totals --dedup
How It Works
The TUI connects to PostgreSQL using tokio-postgres (async, no TLS by
default) and queries pg_trickle's built-in SQL API functions:
| View | SQL function(s) |
|---|---|
| Dashboard | pgtrickle.st_refresh_stats() |
| Detail | pgtrickle.explain_refresh_mode(), pgtrickle.list_sources(), pgtrickle.get_refresh_history(), pgtrickle.diagnose_errors() |
| Dependencies | pgtrickle.dependency_tree() |
| Refresh Log | pgtrickle.refresh_timeline() |
| Diagnostics | pgtrickle.recommend_refresh_mode() |
| CDC Health | pgtrickle.change_buffer_sizes(), pgtrickle.check_cdc_health() |
| Configuration | pg_settings WHERE name LIKE 'pg_trickle.%' |
| Health Checks | pgtrickle.health_check() |
| Alerts | LISTEN pg_trickle_alert (real-time) |
| Workers | pgtrickle.worker_pool_status(), pgtrickle.parallel_job_status() |
| Fuse | pgtrickle.fuse_status() |
| Watermarks | pgtrickle.watermark_groups(), pgtrickle.source_gate_status() |
| Delta Inspector | pgtrickle.explain_delta(), pgtrickle.list_auxiliary_columns(), pgtrickle.pgt_stream_tables (DDL) |
| Issues | pgtrickle.dag_issues() |
In interactive mode, a background task polls all of these every 2 seconds
and pushes state updates to the rendering loop. A second connection runs
LISTEN pg_trickle_alert for real-time notifications.
The TUI is purely a client — it reads from pg_trickle's monitoring API and
sends commands (refresh, create, drop, alter) through the same SQL functions
you would call from psql. It does not require any special privileges beyond
what the pg_trickle SQL API requires.
Planned: cache_stats() and health_summary() Integration
Status: Not yet surfaced in the TUI (v0.18.0 gap).
The following SQL functions are available but not yet integrated into the TUI:
- pgtrickle.cache_stats() — template cache hit rate, L1 hits, evictions, delta cache entries. Useful for monitoring cache effectiveness.
- pgtrickle.health_summary() — single-row deployment overview with total/active/error/stale stream table counts, P99 refresh latency, scheduler status, and cache hit rate.
Lightest integration path: Add cache hit rate to the Dashboard status
ribbon (currently shows scheduler status from quick_health). The Health
Checks view (8) could display health_summary() fields alongside the
existing health_check() results. Both functions are already available via
raw SQL (psql, Grafana, or the Prometheus exporter).
Tech Stack
| Component | Crate | Purpose |
|---|---|---|
| Terminal rendering | ratatui 0.29 + crossterm 0.28 | Full-screen TUI with color, layout, widgets |
| Async runtime | tokio 1.x | Background polling, LISTEN/NOTIFY, signals |
| PostgreSQL | tokio-postgres 0.7 | Async database queries |
| CLI parsing | clap 4.x | Subcommands, flags, env var integration |
| Table output | comfy-table 7.x | Aligned text tables for CLI mode |
| Serialization | serde + serde_json | JSON and CSV output formats |
| Shell completions | clap_complete 4.x | bash/zsh/fish/PowerShell completions |
Contributing to pg_trickle
Thank you for your interest in contributing! pg_trickle is an Apache 2.0-licensed open-source project and welcomes contributions of all kinds.
Before You Start
- Check the open issues and discussions to avoid duplicating work.
- For non-trivial changes, open an issue first to discuss the approach.
- Read AGENTS.md — it is the authoritative guide for all coding conventions, error handling rules, module layout, and test requirements.
- Read docs/ARCHITECTURE.md to understand the system.
- Read ROADMAP.md to see what work is planned.
Ways to Contribute
| Type | Where to start |
|---|---|
| Bug report | Open an issue |
| Feature request | Open an issue or start a discussion |
| Documentation fix | Open a PR directly — no issue needed for typos/clarity |
| Code fix or feature | Open an issue first, then a PR |
| Performance improvement | Include benchmark numbers (see just bench) |
Development Setup
# Install pgrx
cargo install cargo-pgrx --version "=0.17.0"
cargo pgrx init --pg18 /usr/lib/postgresql/18/bin/pg_config
# Build
cargo build
# Format + lint (required before every PR)
just fmt
just lint
# Run tests
just test-unit # fast, no DB
just test-integration # Testcontainers
just test-light-e2e # PR-equivalent Light E2E tier (stock postgres)
just test-e2e # full E2E (builds Docker image)
just test-pgbouncer # PgBouncer transaction-pool compatibility tests
Full setup instructions are in INSTALL.md.
Devcontainer / Containerized Development
If you are developing in a devcontainer, use the default non-root vscode user
and run the normal commands from the workspace root:
just fmt
just lint
just test-unit
just test-unit uses scripts/run_unit_tests.sh, which now selects a writable
and cache-friendly target directory in this order:
- target/ (preferred)
- .cargo-target/ (project-local fallback)
- $HOME/.cache/pg_trickle-target
- ${TMPDIR:-/tmp}/pg_trickle-target (last resort)
This avoids permission failures on bind mounts and preserves incremental builds when source or test files change.
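The selection logic can be sketched in portable shell (an illustration of the fallback order, not the actual contents of scripts/run_unit_tests.sh):

```shell
# Pick the first candidate directory that can be created and written to.
pick_target_dir() {
  for dir in "$@"; do
    if mkdir -p "$dir" 2>/dev/null && [ -w "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Candidates in the documented preference order
CARGO_TARGET_DIR=$(pick_target_dir \
  target \
  .cargo-target \
  "$HOME/.cache/pg_trickle-target" \
  "${TMPDIR:-/tmp}/pg_trickle-target")
echo "target dir: $CARGO_TARGET_DIR"
```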
If you see permission errors in containerized runs, verify you are not forcing a different container user/UID than expected by your workspace mount.
Run E2E tests in devcontainer
E2E tests use Testcontainers and require Docker access from inside the
devcontainer (provided by the Docker-in-Docker feature in
.devcontainer/devcontainer.json).
Run from the workspace root inside the devcontainer:
just build-e2e-image
just test-e2e
Notes:
- The E2E harness starts containers via testcontainers (tests/e2e/mod.rs).
- The default E2E image is pg_trickle_e2e:latest (built by tests/build_e2e_image.sh).
- A plain docker run of the dev image is not equivalent to a full VS Code devcontainer session with features/lifecycle hooks enabled.
Making a Pull Request
- Fork the repository and create a branch: git checkout -b fix/my-fix
- Make your changes following the conventions in AGENTS.md
- Run just fmt && just lint — both must pass with zero warnings
- Add or update tests — see AGENTS.md § Testing
- Open a PR against main
The PR template will walk you through the checklist.
CI Coverage on PRs
PR CI runs a three-tier gate:
- Unit tests (Linux only)
- Integration tests
- Light E2E — curated PR-friendly end-to-end coverage split across three
shards and executed against stock
postgres:18.3
Full E2E, TPC-H tests, benchmarks, dbt, CNPG smoke, and the extra macOS / Windows unit jobs stay off the PR critical path and run on push-to-main, schedule, or manual dispatch. This keeps typical PR feedback closer to the single-digit-minute range while preserving broader scheduled coverage.
To trigger the full CI matrix on your PR branch (recommended for DVM engine, refresh, or CDC changes):
gh workflow run ci.yml --ref <your-branch>
To run all tests locally before pushing:
just test-all # unit + integration + e2e
# PR-equivalent fast path:
just test-unit
just test-integration
just test-light-e2e
# TPC-H correctness tests (requires e2e Docker image):
cargo test --test e2e_tpch_tests -- --ignored --test-threads=1 --nocapture
See AGENTS.md § Testing for the full CI coverage matrix.
Coding Conventions (summary)
- No unwrap() or panic!() in non-test code
- All unsafe blocks require a // SAFETY: comment
- Errors go through PgTrickleError in src/error.rs
- New SQL functions use #[pg_extern(schema = "pgtrickle")]
- Tests use Testcontainers — never a local PostgreSQL instance
Full details are in AGENTS.md.
Commit Messages
Use Conventional Commits:
fix: correct pgoutput action parsing for tables named INSERT_LOG
feat: add CUBE explosion guard (max 64 UNION ALL branches)
docs: document JOIN key change limitation in SQL_REFERENCE
test: add E2E test for keyless table duplicate-row behaviour
License
By contributing you agree that your contributions will be licensed under the Apache License 2.0.
dbt-pgtrickle
A dbt package that integrates
pg_trickle stream tables into your dbt
project via a custom stream_table materialization.
No custom Python adapter required — works with the standard dbt-postgres
adapter. Just Jinja SQL macros that call pg_trickle's SQL API.
Prerequisites
| Requirement | Minimum Version |
|---|---|
| dbt Core | ≥ 1.9 |
| dbt-postgres adapter | Matching dbt Core version |
| PostgreSQL | 18.x |
| pg_trickle extension | ≥ 0.1.0 (CREATE EXTENSION pg_trickle;) |
Installation
From Git (recommended until dbt Hub listing is live)
Add to your packages.yml:
packages:
- git: "https://github.com/grove/pg-trickle.git"
revision: v0.15.0
subdirectory: "dbt-pgtrickle"
From dbt Hub (once published)
After the package is listed on dbt Hub, you can install by package name:
packages:
- package: grove/dbt_pgtrickle
version: [">=0.15.0", "<1.0.0"]
Note: dbt Hub listing requires a separate GitHub repository for the package. See docs/integrations/dbt-hub-submission.md for the submission checklist and steps.
Then run:
dbt deps
Quick Start
Create a model with materialized='stream_table':
-- models/marts/order_totals.sql
{{
config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL'
)
}}
SELECT
customer_id,
SUM(amount) AS total_amount,
COUNT(*) AS order_count
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
dbt run --select order_totals # Creates the stream table
dbt test --select order_totals # Tests work normally (it's a real table)
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
materialized | string | — | Must be 'stream_table' |
schedule | string/null | '1m' | Refresh schedule (e.g., '5m', '1h', cron). null for pg_trickle's CALCULATED schedule. |
refresh_mode | string | 'DIFFERENTIAL' | 'FULL', 'DIFFERENTIAL', 'AUTO', or 'IMMEDIATE' |
initialize | bool | true | Populate on creation |
status | string/null | null | 'ACTIVE' or 'PAUSED'. When set, applies on subsequent runs via alter_stream_table(). |
stream_table_name | string | model name | Override stream table name |
stream_table_schema | string | target schema | Override schema |
cdc_mode | string/null | null | CDC mode override: 'auto', 'trigger', or 'wal'. null uses the GUC default. |
partition_by | string/null | null | Column name for RANGE partitioning of the storage table (v0.13.0+). Cannot be changed after creation. |
fuse | string/null | null | Fuse circuit-breaker mode: 'off', 'on', or 'auto' (v0.13.0+). Applied via alter_stream_table() on every run; no-op if unchanged. |
fuse_ceiling | int/null | null | Change-count threshold that triggers the fuse (v0.13.0+). null uses the global GUC default. |
fuse_sensitivity | int/null | null | Number of consecutive over-ceiling observations before the fuse blows (v0.13.0+). null means 1 (immediate). |
partition_by — RANGE partitioning
Partition the stream table's storage table by a column value. pg_trickle creates a PARTITION BY RANGE (<col>) storage table with a default catch-all partition. Add your own date/integer range partitions via standard PostgreSQL DDL after dbt run.
-- models/marts/events_by_day.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL',
partition_by='event_day'
) }}
SELECT
event_day,
user_id,
COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
GROUP BY event_day, user_id
Note: partition_by is applied only at creation time. Changing it after the stream table exists has no effect. Use dbt run --full-refresh to recreate with a new partition key.
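Adding a range partition afterwards uses ordinary PostgreSQL DDL. A sketch, assuming the stream table landed in an analytics schema (the table and schema names are assumptions; if rows for that range already sit in the default partition, PostgreSQL rejects the new partition until they are moved):

```sql
-- Attach a one-day partition for 2025-01-01 (names are illustrative)
CREATE TABLE analytics.events_by_day_20250101
  PARTITION OF analytics.events_by_day
  FOR VALUES FROM ('2025-01-01') TO ('2025-01-02');
```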
fuse — Circuit breaker
The fuse circuit breaker suspends refreshes when the change volume exceeds a threshold, protecting against runaway refresh cycles during bulk ingestion.
-- models/marts/order_totals.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('raw', 'orders') }}
GROUP BY customer_id
| fuse value | Behaviour |
|---|---|
'off' | Fuse disabled (default) |
'on' | Fuse always active; blows when ceiling is exceeded |
'auto' | Fuse activates only when the delta is large enough to make FULL refresh cheaper than DIFFERENTIAL |
Fuse parameters are applied on every dbt run via alter_stream_table(); the macro only calls the SQL function when the values have actually changed from the catalog state.
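To see what ended up being applied, you can query the fuse state directly; fuse_status() is the same function the TUI's Fuse view reads (its exact column set is not documented here):

```sql
SELECT * FROM pgtrickle.fuse_status();
```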
Project-level defaults
# dbt_project.yml
models:
my_project:
marts:
+materialized: stream_table
+schedule: '5m'
+refresh_mode: DIFFERENTIAL
Operations
pgtrickle_refresh — Manual refresh
dbt run-operation pgtrickle_refresh --args '{"model_name": "order_totals"}'
refresh_all_stream_tables — Refresh all in dependency order
Refreshes all dbt-managed stream tables in topological (dependency) order.
Upstream tables are refreshed before downstream ones. Designed for CI pipelines:
run after dbt run and before dbt test to ensure all data is current.
# Refresh all dbt-managed stream tables
dbt run-operation refresh_all_stream_tables
# Refresh only stream tables in a specific schema
dbt run-operation refresh_all_stream_tables --args '{"schema": "analytics"}'
drop_all_stream_tables — Drop dbt-managed stream tables
Drops only stream tables defined as dbt models (safe in shared environments):
dbt run-operation drop_all_stream_tables
drop_all_stream_tables_force — Drop ALL stream tables
Drops everything from the pg_trickle catalog, including non-dbt stream tables:
dbt run-operation drop_all_stream_tables_force
pgtrickle_check_cdc_health — CDC pipeline health
dbt run-operation pgtrickle_check_cdc_health
Raises an error (non-zero exit) if any CDC source is unhealthy.
Freshness Monitoring
Native dbt source freshness is not supported (the last_refresh_at column lives in
the catalog, not on the stream table). Use the pgtrickle_check_freshness run-operation
instead:
# Check all active stream tables (defaults: warn=600s, error=1800s)
dbt run-operation pgtrickle_check_freshness
# Custom thresholds
dbt run-operation pgtrickle_check_freshness \
--args '{model_name: order_totals, warn_seconds: 300, error_seconds: 900}'
Exits non-zero when any stream table exceeds the error threshold — safe for CI.
Useful dbt Commands
# List all stream table models
dbt ls --select config.materialized:stream_table
# Full refresh (drop + recreate)
dbt run --select order_totals --full-refresh
# Build models + tests in DAG order
dbt build --select order_totals
Note: dbt build runs stream table models early in the DAG. If downstream models
depend on a stream table with initialize: false, the table may not be populated yet.
Testing
Stream tables are standard PostgreSQL heap tables — all dbt tests work normally:
models:
- name: order_totals
columns:
- name: customer_id
tests:
- not_null
- unique
Stream Table Health Test
Use the built-in stream_table_healthy generic test to fail your dbt test suite
when a stream table is stale, erroring, or paused:
models:
- name: order_totals
tests:
- dbt_pgtrickle.stream_table_healthy:
warn_seconds: 300 # fail if stale for more than 5 minutes
The test queries pgtrickle.pg_stat_stream_tables and returns rows for any
unhealthy condition. An empty result set means the stream table is healthy.
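The generated test SQL is conceptually similar to the following sketch. The view name comes from this guide; the column names used in the predicate are illustrative assumptions, not the view's documented schema:

```sql
-- Sketch: one row per unhealthy condition; empty result means healthy
SELECT *
FROM pgtrickle.pg_stat_stream_tables
WHERE table_name = 'order_totals'
  AND (
    status <> 'ACTIVE'
    OR extract(epoch FROM now() - last_refresh_at) > 300  -- warn_seconds
  );
```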
Stream Table Status Macro
For more programmatic control, use the pgtrickle_stream_table_status() macro
directly in custom tests or run-operations:
{%- set st = dbt_pgtrickle.pgtrickle_stream_table_status('order_totals', warn_seconds=300) -%}
{# st.status is one of: 'healthy', 'stale', 'erroring', 'paused', 'not_found' #}
{# st.staleness_seconds, st.consecutive_errors, st.total_refreshes, etc. #}
__pgt_row_id Column
pg_trickle adds an internal __pgt_row_id column to stream tables for row identity
tracking. This column:
- Appears in SELECT * and dbt docs generate
- Does not affect dbt test unless you check column counts
- Can be documented to reduce confusion:
columns:
- name: __pgt_row_id
description: "Internal pg_trickle row identity hash. Ignore this column."
Limitations
| Limitation | Workaround |
|---|---|
| No in-place query alteration | Materialization auto-drops and recreates when query changes |
| __pgt_row_id visible | Document it; exclude in downstream SELECT |
| No native dbt source freshness | Use pgtrickle_check_freshness run-operation |
| No dbt snapshot support | Snapshot the stream table as a regular table |
| Query change detection is whitespace-sensitive | dbt compiles deterministically; unnecessary recreations are safe |
| PostgreSQL 18 required | Extension requirement |
| Shared version tags with pg_trickle extension | Pin to specific git revision |
Contributing
See AGENTS.md for development guidelines and the implementation plan for design rationale.
Running tests locally
The quickest way (requires Docker and dbt installed):
# Full run — builds Docker image, starts container, runs tests, cleans up
just test-dbt
# Fast run — reuses existing Docker image (run after first build)
just test-dbt-fast
Or use the script directly with options:
cd dbt-pgtrickle/integration_tests/scripts
# Default: builds image, runs tests with dbt 1.9, cleans up
./run_dbt_tests.sh
# Skip image rebuild (faster iteration)
./run_dbt_tests.sh --skip-build
# Keep the container running after tests (for debugging)
./run_dbt_tests.sh --skip-build --keep-container
# Use a custom port (avoids conflicts with local PostgreSQL)
PGPORT=25432 ./run_dbt_tests.sh
Manual testing against an existing pg_trickle instance
If you already have PostgreSQL 18 + pg_trickle running locally:
export PGHOST=localhost PGPORT=5432 PGUSER=postgres PGPASSWORD=postgres PGDATABASE=postgres
cd dbt-pgtrickle/integration_tests
dbt deps
dbt seed
dbt run
./scripts/wait_for_populated.sh order_totals 30
dbt test
dbt run-operation drop_all_stream_tables
License
Apache 2.0 — see LICENSE.
CloudNativePG / Kubernetes
pg_trickle is designed to work with CloudNativePG (CNPG) — the Kubernetes operator for PostgreSQL. The extension is loaded via Image Volume Extensions, meaning no custom PostgreSQL image is needed.
Prerequisites
- Kubernetes 1.33+ with the ImageVolume feature gate enabled
- CloudNativePG operator 1.28+
- The pg_trickle-ext OCI image available in your cluster registry
Architecture
┌─────────────────────────────────────┐
│ CNPG Cluster (3 pods) │
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ Primary │ │ pg_trickle-ext │ │
│ │ PG 18 │◄─┤ (ImageVolume) │ │
│ │ │ │ .so + .sql only │ │
│ └──────────┘ └──────────────────┘ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Replica 1│ │ Replica 2│ │
│ │ (standby)│ │ (standby)│ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────┘
- The scheduler runs on the primary pod only. Replica pods detect recovery mode (pg_is_in_recovery() = true) and sleep.
- Stream tables are replicated to standbys via physical streaming replication like any other heap table.
- Pod restarts are safe — the scheduler resumes from the stored frontier with no data loss.
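You can verify which pod is actively scheduling by checking the recovery state on each instance:

```sql
-- false on the primary (scheduler active), true on standbys (scheduler sleeps)
SELECT pg_is_in_recovery();
```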
Deploying pg_trickle on CNPG
1. Build the extension image
The cnpg/Dockerfile.ext builds a scratch-based OCI image containing
only the shared library, control file, and SQL migrations:
# From the dist/ directory with pre-built artifacts:
docker build -t ghcr.io/<owner>/pg_trickle-ext:0.13.0 -f cnpg/Dockerfile.ext dist/
docker push ghcr.io/<owner>/pg_trickle-ext:0.13.0
2. Deploy the Cluster
Apply the Cluster manifest with pg_trickle configured as an Image Volume extension:
# cnpg/cluster-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: pg-trickle-demo
spec:
instances: 3
imageName: ghcr.io/cloudnative-pg/postgresql:18
postgresql:
shared_preload_libraries:
- pg_trickle
extensions:
- name: pg-trickle
image:
reference: ghcr.io/<owner>/pg_trickle-ext:0.13.0
parameters:
max_worker_processes: "8"
bootstrap:
initdb:
database: app
owner: app
storage:
size: 10Gi
storageClass: standard
kubectl apply -f cnpg/cluster-example.yaml
3. Enable the extension
Use the CNPG Database resource for declarative extension management:
# cnpg/database-example.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
name: app
spec:
cluster:
name: pg-trickle-demo
name: app
owner: app
extensions:
- name: pg_trickle
kubectl apply -f cnpg/database-example.yaml
4. Verify
kubectl exec -it pg-trickle-demo-1 -- psql -U postgres -d app -c \
"SELECT pgtrickle.version();"
Key Considerations
Worker processes
Each database with pg_trickle needs one background worker slot. Set
max_worker_processes in the Cluster manifest to accommodate the launcher
(1) + one scheduler per database + any parallel refresh workers:
parameters:
max_worker_processes: "16"
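For instance, a cluster hosting three pg_trickle databases with up to four parallel refresh workers needs at least 1 + 3 + 4 = 8 slots; the database and worker counts here are illustrative, not a tuned recommendation:

```yaml
postgresql:
  parameters:
    # 1 launcher + 3 per-database schedulers + 4 parallel refresh workers = 8,
    # with headroom for parallel query workers and other extensions
    max_worker_processes: "16"
```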
Persistent volumes
Catalog tables (pgtrickle.pgt_stream_tables) and change buffers
(pgtrickle_changes.*) are stored in regular PostgreSQL tablespaces.
Persistent volume claims preserve them across pod rescheduling.
Backups
pg_trickle state (catalog, change buffers, stream table data) is included in CNPG's Barman object-store backups automatically. After a restore, the scheduler detects frontier inconsistencies and performs a full refresh on the first cycle. See Backup and Restore for details.
Failover
When the primary pod fails and a replica is promoted, the new primary's scheduler starts automatically. Since stream tables were replicated via streaming replication, they are already up-to-date (minus replication lag). The scheduler resumes refreshing from the stored frontier.
Resource limits
For production deployments, set resource requests and limits in the Cluster manifest to prevent the scheduler from starving other workloads:
resources:
requests:
memory: 512Mi
cpu: 500m
limits:
memory: 2Gi
cpu: 2000m
Example manifests
The repository includes ready-to-use manifests in the cnpg/ directory:
| File | Purpose |
|---|---|
| cnpg/Dockerfile.ext | Build the scratch-based extension image |
| cnpg/Dockerfile.ext-build | Multi-stage build for CI/CD pipelines |
| cnpg/cluster-example.yaml | Complete Cluster manifest with pg_trickle |
| cnpg/database-example.yaml | Database resource with declarative extension management |
Further reading
- CloudNativePG Image Volume Extensions
- CloudNativePG Declarative Database Management
- Backup and Restore
- Configuration Reference
Prometheus & Grafana Monitoring
pg_trickle ships with a complete observability stack based on
postgres_exporter, Prometheus, and Grafana. The monitoring/
directory in the repository contains everything you need.
Quick Start
cd monitoring/
docker compose up -d
Open Grafana at http://localhost:3000 (default: admin / admin).
The pg_trickle Overview dashboard is pre-provisioned.
Architecture
PostgreSQL + pg_trickle
│
│ custom SQL queries
▼
postgres_exporter (:9187)
│
│ /metrics (Prometheus format)
▼
Prometheus (:9090)
│
│ data source
▼
Grafana (:3000)
postgres_exporter runs custom SQL queries defined in
prometheus/pg_trickle_queries.yml against the pg_trickle monitoring views
(pgtrickle.stream_tables_info, pgtrickle.pg_stat_stream_tables, etc.)
and exposes them as Prometheus metrics.
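If you want to consume the exporter's output programmatically, the exposition format is easy to parse. A minimal sketch, assuming the standard Prometheus text format; the sample text below is illustrative, not captured exporter output.

```python
# Pull apart Prometheus exposition-format text (as served by
# postgres_exporter on :9187/metrics) and keep only pg_trickle_ series.
def pg_trickle_metrics(text: str) -> dict[str, float]:
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        series, _, value = line.rpartition(" ")
        if series.startswith("pg_trickle_"):
            out[series] = float(value)
    return out

sample = """\
# HELP pg_trickle_staleness_seconds Seconds since last successful refresh
pg_trickle_staleness_seconds{stream_table="order_totals"} 4.2
pg_trickle_scheduler_running 1
pg_up 1
"""
print(pg_trickle_metrics(sample))
```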
Connecting to an Existing Database
If you already have PostgreSQL + pg_trickle running, configure the exporter to point at your instance:
export PG_HOST=your-pg-host
export PG_PORT=5432
export PG_USER=postgres
export PG_PASSWORD=yourpassword
export PG_DATABASE=yourdb
docker compose up -d
Or edit the DATA_SOURCE_NAME in docker-compose.yml directly.
Metrics Exposed
All metrics are prefixed pg_trickle_.
| Metric | Type | Description |
|---|---|---|
| pg_trickle_stream_tables_total | gauge | Total stream tables by status |
| pg_trickle_stale_tables_total | gauge | Tables with data older than schedule |
| pg_trickle_consecutive_errors | gauge | Per-table consecutive error count |
| pg_trickle_refresh_duration_ms | gauge | Average refresh duration (ms) |
| pg_trickle_total_refreshes | counter | Total refresh count per table |
| pg_trickle_failed_refreshes | counter | Failed refresh count per table |
| pg_trickle_rows_inserted_total | counter | Rows inserted per table |
| pg_trickle_rows_deleted_total | counter | Rows deleted per table |
| pg_trickle_staleness_seconds | gauge | Seconds since last successful refresh |
| pg_trickle_cdc_pending_rows | gauge | Pending rows in CDC change buffer |
| pg_trickle_cdc_buffer_bytes | gauge | CDC change buffer size in bytes |
| pg_trickle_scheduler_running | gauge | 1 if scheduler background worker is alive |
| pg_trickle_health_status | gauge | Overall health: 0=OK, 1=WARNING, 2=CRITICAL |
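The health encoding in the last row suggests a simple worst-of aggregation. A hedged sketch, assuming per-table severities as input; the input shape is illustrative, not a pg_trickle API.

```python
# Fold per-table severities into the pg_trickle_health_status encoding
# above (0=OK, 1=WARNING, 2=CRITICAL) by taking the worst severity seen.
LEVELS = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

def overall_health(severities: dict[str, str]) -> int:
    return max((LEVELS[s] for s in severities.values()), default=0)

print(overall_health({"order_totals": "OK", "daily_revenue": "WARNING"}))  # 1
```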
Pre-configured Alerts
Alerting rules are defined in prometheus/alerts.yml:
| Alert | Condition | Severity |
|---|---|---|
| PgTrickleTableStale | Staleness > 5 min past schedule | warning |
| PgTrickleConsecutiveErrors | ≥ 3 consecutive refresh failures | warning |
| PgTrickleTableSuspended | Any table in SUSPENDED status | critical |
| PgTrickleCdcBufferLarge | CDC buffer > 1 GB | warning |
| PgTrickleSchedulerDown | Scheduler not running for > 2 min | critical |
| PgTrickleHighRefreshDuration | Avg refresh > 30 s | warning |
NOTIFY-Based Alerting
In addition to Prometheus alerts, pg_trickle emits real-time PostgreSQL
NOTIFY events on the pg_trickle_alert channel:
LISTEN pg_trickle_alert;
Events include stale_data, auto_suspended, reinitialize_needed,
buffer_growth_warning, fuse_blown, refresh_completed, and
refresh_failed. Each notification carries a JSON payload with the stream
table name and relevant details.
You can bridge NOTIFY events to external alerting systems (PagerDuty, Slack,
etc.) using tools like pgnotify or a
simple LISTEN loop in your application.
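The dispatch half of such a bridge can be sketched without a database connection. This assumes only what the docs state: the payload is JSON carrying the event name and stream table name; every other key and the handler wiring are illustrative.

```python
import json

# Route a pg_trickle_alert NOTIFY payload (JSON, per the docs above) to an
# alert handler keyed by event name, falling back to a default handler.
def route_alert(payload: str, handlers: dict) -> str:
    event = json.loads(payload)
    handler = handlers.get(event.get("event"), handlers["default"])
    return handler(event)

handlers = {
    "refresh_failed": lambda e: f"page on-call: refresh failed for {e['table']}",
    "default": lambda e: f"log: {e.get('event')} on {e.get('table')}",
}
print(route_alert('{"event": "refresh_failed", "table": "order_totals"}',
                  handlers))  # page on-call: refresh failed for order_totals
```

In a real bridge, the payload string would come from the LISTEN loop's notification object rather than a literal.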
Grafana Dashboard
The pre-provisioned pg_trickle Overview dashboard
(grafana/dashboards/pg_trickle_overview.json) includes panels for:
- Stream table status distribution (active / suspended / error)
- Refresh rate and duration over time
- Staleness heatmap
- CDC buffer sizes
- Consecutive error counts
- Scheduler uptime
Built-in SQL Monitoring Views
pg_trickle also provides built-in monitoring accessible without Prometheus:
-- Quick health overview (returns warnings and errors)
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
-- Stream table status and staleness
SELECT name, status, refresh_mode, staleness
FROM pgtrickle.stream_tables_info;
-- Detailed refresh statistics
SELECT * FROM pgtrickle.pg_stat_stream_tables;
-- CDC health per source table
SELECT * FROM pgtrickle.check_cdc_health();
-- Change buffer sizes
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
See the SQL Reference for the complete list of monitoring functions.
Files Reference
| File | Purpose |
|---|---|
| monitoring/docker-compose.yml | Demo stack: PG + exporter + Prometheus + Grafana |
| monitoring/prometheus/prometheus.yml | Prometheus scrape configuration |
| monitoring/prometheus/pg_trickle_queries.yml | Custom SQL queries for postgres_exporter |
| monitoring/prometheus/alerts.yml | Alerting rules |
| monitoring/grafana/provisioning/ | Auto-provisioned data source + dashboard |
| monitoring/grafana/dashboards/pg_trickle_overview.json | Overview dashboard |
Requirements
- Docker 24+ with Compose v2
- pg_trickle 0.10.0+ installed in the target database
- PostgreSQL user with SELECT on the pgtrickle.* schema
PgBouncer & Connection Poolers
pg_trickle's background scheduler uses session-level PostgreSQL features. This page explains how to configure pg_trickle alongside connection poolers like PgBouncer, Supavisor (Supabase), and PgCat.
Compatibility Matrix
| Pooling Mode | Compatible? | Notes |
|---|---|---|
| Session mode (pool_mode = session) | ✅ Fully | All features work. |
| Direct connection (no pooler for scheduler) | ✅ Fully | Application queries can still go through a pooler. |
| Transaction mode (pool_mode = transaction) | ❌ Not supported | Advisory locks, prepared statements, and LISTEN/NOTIFY are session-scoped. |
| Statement mode (pool_mode = statement) | ❌ Not supported | Same session-scoped limitations. |
Why Transaction Mode Breaks
The pg_trickle scheduler relies on three session-level features:
| Feature | Problem in Transaction Mode |
|---|---|
| pg_advisory_lock() | Session lock released when connection returns to pool — concurrent refreshes become possible |
| PREPARE / EXECUTE | Prepared statements vanish on connection hop — "prepared statement does not exist" errors |
| LISTEN / NOTIFY | Listener loses notifications when assigned a different backend connection |
Recommended Setup
Route the pg_trickle background worker through a direct connection while keeping application traffic on the pooler:
┌─────────────────┐ ┌──────────────┐
│ Application │────▶│ PgBouncer │──┐
│ (transaction │ │ (txn mode) │ │
│ mode OK) │ └──────────────┘ │
└─────────────────┘ │
▼
┌─────────────────┐ ┌─────────────┐
│ pg_trickle │───────────────▶│ PostgreSQL │
│ scheduler │ direct conn │ │
│ (session mode) │ └─────────────┘
└─────────────────┘
The scheduler connects directly to PostgreSQL as a background worker — it does not go through the pooler at all. No special configuration is needed for this; the scheduler always uses an internal SPI connection.
The pooler only matters for application queries that read from stream
tables or call pg_trickle functions (e.g., refresh_stream_table()).
Platform-Specific Notes
Supabase
Supabase uses Supavisor in transaction mode by default. pg_trickle's
scheduler works because it runs as a background worker (bypasses the
pooler). Application queries against stream tables work normally through
the pooler since they are regular SELECT statements.
If you call pgtrickle.refresh_stream_table() from application code,
use the direct connection string (port 5432) rather than the pooled
connection (port 6543).
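One way to enforce that split in application code is to keep two DSNs and route by operation. A hedged sketch; the host, credentials, and the exact set of management functions are placeholders following the port convention described above.

```python
# Keep two DSNs: pooled for ordinary reads, direct for pg_trickle
# management calls, since advisory locks and prepared statements are
# session-scoped and must not hop between backends.
POOLED_DSN = "postgresql://app:secret@db.example.com:6543/postgres"  # pooler
DIRECT_DSN = "postgresql://app:secret@db.example.com:5432/postgres"  # direct

MANAGEMENT_CALLS = {
    "create_stream_table", "alter_stream_table",
    "drop_stream_table", "refresh_stream_table",
}

def dsn_for(operation: str) -> str:
    return DIRECT_DSN if operation in MANAGEMENT_CALLS else POOLED_DSN

print(dsn_for("refresh_stream_table"))  # the direct (port 5432) DSN
```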
Neon
Neon uses a custom proxy that supports both session and transaction modes. Use the session-mode connection string for any pg_trickle management calls. The scheduler runs as a background worker and is unaffected by the proxy.
AWS RDS Proxy
RDS Proxy only supports transaction-mode pooling. The pg_trickle scheduler runs as a background worker inside the RDS instance and is unaffected. Application queries reading stream tables work normally through the proxy.
Manual refresh_stream_table() calls through the proxy may fail due to
advisory lock issues. Use a direct connection for management operations.
Pooler Compatibility Mode
pg_trickle includes a pooler_compatibility_mode setting (v0.10.0+) that
adjusts internal behavior for environments where the scheduler's SPI
connection may be affected by pooler-like middleware:
-- Usually not needed — the scheduler bypasses external poolers
SHOW pg_trickle.pooler_compatibility_mode;
This GUC is primarily for edge cases in managed PostgreSQL services. For standard deployments, the default setting works correctly.
Further Reading
Flyway & Liquibase Migration Frameworks
pg_trickle stream tables are managed through SQL function calls, not standard
DDL (CREATE TABLE / ALTER TABLE). This page documents patterns for
integrating pg_trickle with Flyway and Liquibase migration frameworks.
Key Principle
Stream tables are created and managed via pgtrickle.create_stream_table(),
pgtrickle.alter_stream_table(), and pgtrickle.drop_stream_table(). These
are regular SQL function calls that can be embedded in any migration script.
CDC triggers are automatically installed on source tables during stream table creation — no manual trigger management is needed.
Flyway
Creating Stream Tables in Migrations
Place stream table definitions in versioned migration files alongside your regular schema changes:
-- V3__create_order_stream_tables.sql
-- 1. Create the source tables first (standard DDL)
CREATE TABLE IF NOT EXISTS orders (
id SERIAL PRIMARY KEY,
region TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()
);
-- 2. Create stream tables via pg_trickle API
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Altering Stream Tables
Use pgtrickle.alter_stream_table() in a new migration:
-- V5__update_order_totals_schedule.sql
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '10s'
);
Altering the Defining Query
Use alter_query to change the SQL without dropping and recreating:
-- V7__add_avg_to_order_totals.sql
SELECT pgtrickle.alter_stream_table(
'order_totals',
alter_query => $$SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total,
AVG(amount) AS avg_amount
FROM orders GROUP BY region$$
);
Dropping Stream Tables
-- V9__remove_legacy_stream_tables.sql
SELECT pgtrickle.drop_stream_table('legacy_report');
Bulk Creation
For environments with many stream tables, use bulk_create to create
them atomically:
-- V4__create_all_stream_tables.sql
SELECT pgtrickle.bulk_create('[
{
"name": "order_totals",
"query": "SELECT region, COUNT(*) AS cnt, SUM(amount) AS total FROM orders GROUP BY region",
"schedule": "5s",
"refresh_mode": "DIFFERENTIAL"
},
{
"name": "daily_revenue",
"query": "SELECT date_trunc(''day'', created_at) AS day, SUM(amount) AS revenue FROM orders GROUP BY 1",
"schedule": "30s",
"refresh_mode": "DIFFERENTIAL"
}
]'::jsonb);
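Hand-writing that JSON inside a SQL string literal forces awkward quote escaping (note the `''day''` above). If you generate migrations, building the payload from plain dicts is less error-prone. A hedged sketch; the required-key validation is an assumption, mirroring the fields used in the example.

```python
import json

# Build the jsonb argument for pgtrickle.bulk_create() from Python dicts,
# sidestepping hand-escaped quotes in migration files.
def bulk_create_payload(tables: list) -> str:
    for t in tables:
        missing = {"name", "query", "schedule"} - t.keys()
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
    return json.dumps(tables)

payload = bulk_create_payload([{
    "name": "order_totals",
    "query": "SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region",
    "schedule": "5s",
    "refresh_mode": "DIFFERENTIAL",
}])
print('"schedule": "5s"' in payload)  # True
```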
Ordering: Source Tables Before Stream Tables
Flyway executes migrations in version order. Ensure source tables are created in an earlier migration than their dependent stream tables:
V1__create_schema.sql -- CREATE TABLE orders, products, ...
V2__create_indexes.sql -- CREATE INDEX ...
V3__create_stream_tables.sql -- SELECT pgtrickle.create_stream_table(...)
Repeatable Migrations
If you want stream table definitions to be re-applied on every Flyway run (for development environments), use repeatable migrations:
-- R__stream_tables.sql
-- Drop and recreate all stream tables
SELECT pgtrickle.drop_stream_table('order_totals')
WHERE EXISTS (
SELECT 1 FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'order_totals'
);
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Or use create_or_replace_stream_table for idempotent definitions:
-- R__stream_tables.sql (idempotent)
SELECT pgtrickle.create_or_replace_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
Handling ALTER TABLE on Source Tables
When a Flyway migration alters a source table (e.g., adding a column), pg_trickle's DDL event trigger detects the change and suspends affected stream tables. After the schema change, stream tables resume automatically on the next refresh cycle.
If the source table change invalidates the stream table's defining query (e.g., removing a referenced column), you must update or drop the stream table in the same or a subsequent migration.
Liquibase
Creating Stream Tables in Changesets
Use Liquibase's <sql> tag to call pg_trickle functions:
<!-- changelog-3.0.xml -->
<changeSet id="create-order-stream-tables" author="dev">
<sql>
SELECT pgtrickle.create_stream_table(
'order_totals',
$pgt$SELECT region, COUNT(*) AS order_count, SUM(amount) AS total
FROM orders GROUP BY region$pgt$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
</sql>
<rollback>
<sql>SELECT pgtrickle.drop_stream_table('order_totals');</sql>
</rollback>
</changeSet>
Rollback Support
Always include <rollback> blocks that drop the stream table:
<changeSet id="add-daily-revenue-st" author="dev">
<sql>
SELECT pgtrickle.create_stream_table(
'daily_revenue',
$pgt$SELECT date_trunc('day', created_at) AS day,
SUM(amount) AS revenue
FROM orders GROUP BY 1$pgt$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
</sql>
<rollback>
<sql>SELECT pgtrickle.drop_stream_table('daily_revenue');</sql>
</rollback>
</changeSet>
Altering Stream Tables
<changeSet id="update-order-totals-schedule" author="dev">
<sql>
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '10s'
);
</sql>
<rollback>
<sql>
SELECT pgtrickle.alter_stream_table(
'order_totals',
schedule => '5s'
);
</sql>
</rollback>
</changeSet>
Preconditions
Use Liquibase preconditions to check whether pg_trickle is available:
<changeSet id="create-stream-tables" author="dev">
<preConditions onFail="MARK_RAN">
<sqlCheck expectedResult="1">
SELECT COUNT(*) FROM pg_extension WHERE extname = 'pg_trickle'
</sqlCheck>
</preConditions>
<sql>
SELECT pgtrickle.create_stream_table(...);
</sql>
</changeSet>
Common Patterns
Environment-Specific Schedules
Use different schedules for development vs. production:
-- Use a function to parameterize schedules
SELECT pgtrickle.create_stream_table(
'order_totals',
$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$,
schedule => CASE
WHEN current_setting('pg_trickle.enabled', true) = 'on'
THEN '5s'
ELSE '1m'
END,
refresh_mode => 'DIFFERENTIAL'
);
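If your migrations are generated rather than static, the same branching can live in the generator instead of in SQL. A hedged sketch; the environment names and intervals are assumptions, and the SQL template simply mirrors the call shape shown earlier.

```python
# Pick a per-environment refresh schedule at migration-generation time.
SCHEDULES = {"dev": "1m", "staging": "30s", "prod": "5s"}  # illustrative

def schedule_for(env: str, default: str = "1m") -> str:
    return SCHEDULES.get(env, default)

# The migration template then interpolates the chosen value:
sql = (
    "SELECT pgtrickle.create_stream_table("
    "'order_totals', "
    "$$SELECT region, COUNT(*) AS cnt FROM orders GROUP BY region$$, "
    f"schedule => '{schedule_for('prod')}', "
    "refresh_mode => 'DIFFERENTIAL');"
)
print("'5s'" in sql)  # True
```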
CI/Test Environments
In CI, set pg_trickle.enabled = off in postgresql.conf to prevent the
background scheduler from running during schema migrations. Stream tables
will still be created correctly — they just won't auto-refresh until the
scheduler is enabled.
Extension Dependency
Ensure CREATE EXTENSION pg_trickle runs before any stream table migration.
In Flyway, use an early versioned migration:
-- V0__extensions.sql
CREATE EXTENSION IF NOT EXISTS pg_trickle;
In Liquibase:
<changeSet id="install-extensions" author="dev" runOnChange="true">
<sql>CREATE EXTENSION IF NOT EXISTS pg_trickle;</sql>
</changeSet>
Further Reading
- SQL Reference — Complete function reference
- Configuration — GUC variables for schedule tuning
- Getting Started — First stream table walkthrough
ORM Integration Guides
pg_trickle stream tables are read-only materialized views that refresh automatically. This page documents how to use stream tables from popular Python ORMs — SQLAlchemy and Django ORM.
Key Principles
- Stream tables are read-only. All writes go to the source tables; pg_trickle refreshes stream tables in the background.
- Model stream tables as views, not regular tables. ORMs should never attempt INSERT, UPDATE, or DELETE on a stream table.
- Internal columns are hidden. The __pgt_row_id column used for incremental maintenance is excluded from SELECT * queries.
SQLAlchemy
Read-Only Model Definition
Map a stream table as a read-only model using __table_args__:
from sqlalchemy import Column, Integer, Numeric, String, BigInteger
from sqlalchemy.orm import DeclarativeBase
class Base(DeclarativeBase):
pass
class OrderTotals(Base):
"""Read-only model backed by pg_trickle stream table."""
__tablename__ = "order_totals"
    # Map the stream table's row ID as primary key for ORM identity.
    # Use a non-dunder attribute name: a leading "__" in a class body is
    # name-mangled by Python, so the column name is given explicitly.
    pgt_row_id = Column("__pgt_row_id", BigInteger, primary_key=True)
region = Column(String, nullable=False)
order_count = Column(BigInteger, nullable=False)
total = Column(Numeric(10, 2), nullable=False)
__table_args__ = {
"info": {"readonly": True}, # Convention marker
}
Querying
Query stream tables like any other SQLAlchemy model:
from sqlalchemy import select
# All regions
stmt = select(OrderTotals).order_by(OrderTotals.total.desc())
results = session.execute(stmt).scalars().all()
# Filtered
stmt = (
select(OrderTotals)
.where(OrderTotals.order_count > 10)
.where(OrderTotals.region == "East")
)
row = session.execute(stmt).scalar_one_or_none()
Preventing Accidental Writes
Use SQLAlchemy events to block write operations:
from sqlalchemy import event
from sqlalchemy.orm import Session

READONLY_TABLES = {"order_totals", "daily_revenue", "customer_stats"}

# Listen on the Session class so the guard applies to every session
@event.listens_for(Session, "before_flush")
def block_stream_table_writes(session, flush_context, instances):
for obj in session.new | session.dirty | session.deleted:
table_name = obj.__class__.__tablename__
if table_name in READONLY_TABLES:
raise RuntimeError(
f"Cannot write to stream table '{table_name}'. "
f"Write to the source table instead."
)
Reflecting Stream Tables
If you prefer reflection over explicit models:
from sqlalchemy import MetaData, Table, create_engine
engine = create_engine("postgresql://...")
metadata = MetaData()
# Reflect the stream table (treated as a regular table by PostgreSQL)
order_totals = Table("order_totals", metadata, autoload_with=engine)
# Query
with engine.connect() as conn:
result = conn.execute(order_totals.select().limit(10))
for row in result:
print(row)
Checking Freshness
Query the stream table's metadata to check when it was last refreshed:
from sqlalchemy import text
def get_staleness(session, st_name: str) -> dict:
"""Return freshness info for a stream table."""
result = session.execute(
text("SELECT * FROM pgtrickle.get_staleness(:name)"),
{"name": st_name},
).mappings().one()
return dict(result)
# Usage
staleness = get_staleness(session, "order_totals")
print(f"Last refresh: {staleness['data_timestamp']}")
print(f"Stale for: {staleness['staleness_seconds']}s")
Async SQLAlchemy (2.0+)
Works identically with async_session:
from sqlalchemy.ext.asyncio import AsyncSession
async def get_top_regions(session: AsyncSession, limit: int = 10):
stmt = (
select(OrderTotals)
.order_by(OrderTotals.total.desc())
.limit(limit)
)
result = await session.execute(stmt)
return result.scalars().all()
Django ORM
Read-Only Model Definition
Use managed = False so Django never creates, alters, or drops the table:
# models.py
from django.db import models
class OrderTotals(models.Model):
"""Read-only model backed by pg_trickle stream table."""
region = models.CharField(max_length=255)
order_count = models.BigIntegerField()
total = models.DecimalField(max_digits=10, decimal_places=2)
class Meta:
managed = False # Django will not create/alter this table
db_table = "order_totals"
def save(self, *args, **kwargs):
raise NotImplementedError("Stream tables are read-only")
def delete(self, *args, **kwargs):
raise NotImplementedError("Stream tables are read-only")
Querying
Standard Django QuerySet operations work:
# All regions sorted by total
OrderTotals.objects.all().order_by("-total")
# Filtered
OrderTotals.objects.filter(
order_count__gt=10,
region="East"
).first()
# Aggregation (on the stream table itself)
from django.db.models import Sum, Avg
OrderTotals.objects.aggregate(
total_revenue=Sum("total"),
avg_orders=Avg("order_count"),
)
Django Migrations
Since managed = False, Django migrations won't touch stream tables.
Create stream tables in a custom migration using RunSQL:
# migrations/0003_create_stream_tables.py
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("myapp", "0002_create_orders_table"),
]
operations = [
migrations.RunSQL(
sql="""
SELECT pgtrickle.create_stream_table(
'order_totals',
$pgt$SELECT region,
COUNT(*) AS order_count,
SUM(amount) AS total
FROM orders GROUP BY region$pgt$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
""",
reverse_sql="""
SELECT pgtrickle.drop_stream_table('order_totals');
""",
),
]
Read-Only Mixin
Create a reusable mixin for all stream table models:
class StreamTableMixin(models.Model):
"""Base class for pg_trickle stream table models."""
class Meta:
abstract = True
managed = False
def save(self, *args, **kwargs):
raise NotImplementedError(
f"{self.__class__.__name__} is a read-only stream table. "
f"Write to the source table instead."
)
def delete(self, *args, **kwargs):
raise NotImplementedError(
f"{self.__class__.__name__} is a read-only stream table."
)
# Usage
class OrderTotals(StreamTableMixin):
region = models.CharField(max_length=255)
order_count = models.BigIntegerField()
total = models.DecimalField(max_digits=10, decimal_places=2)
class Meta(StreamTableMixin.Meta):
db_table = "order_totals"
class DailyRevenue(StreamTableMixin):
day = models.DateField()
revenue = models.DecimalField(max_digits=12, decimal_places=2)
class Meta(StreamTableMixin.Meta):
db_table = "daily_revenue"
Checking Freshness
Use raw SQL to query pg_trickle diagnostics:
from django.db import connection
def get_staleness(st_name: str) -> dict:
"""Return freshness info for a stream table."""
with connection.cursor() as cursor:
cursor.execute(
"SELECT * FROM pgtrickle.get_staleness(%s)", [st_name]
)
columns = [col.name for col in cursor.description]
row = cursor.fetchone()
return dict(zip(columns, row)) if row else {}
Django REST Framework
Stream table models work with DRF serializers and viewsets:
from rest_framework import serializers, viewsets
class OrderTotalsSerializer(serializers.ModelSerializer):
class Meta:
model = OrderTotals
fields = ["region", "order_count", "total"]
class OrderTotalsViewSet(viewsets.ReadOnlyModelViewSet):
"""Read-only API endpoint for order totals stream table."""
queryset = OrderTotals.objects.all()
serializer_class = OrderTotalsSerializer
Common Patterns
Write to Source, Read from Stream
The fundamental pattern: all writes go to source tables (normal ORM models), reads come from stream tables (read-only models).
# Write to source table (normal ORM)
order = Order(region="East", amount=Decimal("99.99"))
session.add(order)
session.commit()
# Read from stream table (auto-refreshed by pg_trickle)
totals = session.execute(
select(OrderTotals).where(OrderTotals.region == "East")
).scalar_one()
print(f"East: {totals.order_count} orders, ${totals.total}")
Handling Eventual Consistency
Stream tables refresh on a schedule (e.g., every 5 seconds). After writing to a source table, the stream table may be briefly stale. Options:
- Accept staleness — suitable for dashboards and reports.
- Force refresh — call pgtrickle.refresh_stream_table() after critical writes.
- Use IMMEDIATE mode — the stream table refreshes within the same transaction.
# Option 2: Force refresh after a critical write
session.execute(text(
"SELECT pgtrickle.refresh_stream_table('order_totals')"
))
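A middle ground between options 1 and 2 is to poll for freshness after a write. A hedged sketch built on the `get_staleness()` helper pattern shown earlier; `probe` is a stand-in for a real database call, and the timing values are illustrative.

```python
import time

# Poll a staleness probe until the stream table's data_timestamp passes
# the time of our write, or give up after a timeout.
def wait_until_fresh(probe, written_at: float, timeout_s: float = 10.0) -> bool:
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe()["data_timestamp"] >= written_at:
            return True
        time.sleep(0.01)  # real code would poll on the refresh schedule
    return False

samples = iter([{"data_timestamp": 1.0}, {"data_timestamp": 5.0}])
print(wait_until_fresh(lambda: next(samples), written_at=4.0))  # True
```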
Further Reading
- SQL Reference — Complete function reference
- Configuration — Schedule tuning and refresh modes
- Getting Started — First stream table walkthrough
- dbt Integration — Using pg_trickle with dbt
Architecture
This document describes the internal architecture of pg_trickle — a PostgreSQL 18 extension that implements stream tables with differential view maintenance. For a high-level description of what pg_trickle does and why, read ESSENCE.md. For release milestones and future plans, see ROADMAP.md.
High-Level Overview
┌─────────────────────────────────────────────────────────────────┐
│ PostgreSQL 18 Backend │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Source │ │ Source │ │ Storage │ │ Storage │ │
│ │ Table A │ │ Table B │ │ Table X │ │ Table Y │ │
│ └────┬─────┘ └────┬─────┘ └────▲─────┘ └────▲────────┘ │
│ │ │ │ │ │
│ ═════╪══════════════╪══════════════╪══════════════╪════════ │
│ │ │ │ │ │
│ ┌────▼──────────────▼────┐ ┌────┴──────────────┴────┐ │
│ │ Hybrid CDC Layer │ │ Delta Application │ │
│ │ Triggers ──or── WAL │ │ (INSERT/DELETE diffs) │ │
│ └────────────┬───────────┘ └────────────▲───────────┘ │
│ │ │ │
│ ┌────────────▼───────────┐ ┌────────────┴───────────┐ │
│ │ Change Buffer │ │ DVM Engine │ │
│ │ (pgtrickle_changes.*) │ │ (Operator Tree) │ │
│ └────────────┬───────────┘ └────────────▲───────────┘ │
│ │ │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌─────────────────────────▼─────────────────────────────┐ │
│ │ Refresh Engine │ │
│ │ ┌──────────┐ ┌──────────┐ ┌─────────────────────┐ │ │
│ │ │ Frontier │ │ DAG │ │ Scheduler │ │ │
│ │ │ Tracker │ │ Resolver │ │ (canonical schedule)│ │ │
│ │ └──────────┘ └──────────┘ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Catalog (pgtrickle.*) │ │
│ │ pgt_stream_tables │ pgt_dependencies │ pgt_refresh_history│ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Monitoring Layer │ │
│ │ st_refresh_stats │ slot_health │ check_cdc_health │ │
│ │ explain_st │ views │ NOTIFY alerting │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Component Details
1. SQL API Layer (src/api.rs)
The public entry point for users. All operations are exposed as #[pg_extern] functions in the pgtrickle schema:
- create_stream_table — Applies a chain of auto-rewrite passes (view inlining → DISTINCT ON → GROUPING SETS → scalar subquery in WHERE → correlated scalar subquery in SELECT → SubLinks in OR → multi-PARTITION BY windows), parses the defining query, builds an operator tree, creates the storage table, registers CDC slots, populates the catalog, and optionally performs an initial full refresh.
- alter_stream_table — Modifies schedule, refresh mode, status (ACTIVE/SUSPENDED), or defining query. Query changes trigger schema migration, dependency updates, and a full refresh within a single transaction.
- drop_stream_table — Removes the storage table, catalog entries, and cleans up CDC slots.
- refresh_stream_table — Triggers a manual refresh (same path as automatic scheduling).
- pgt_status — Returns a summary of all registered stream tables.
2. Catalog (src/catalog.rs)
The catalog manages persistent metadata stored in PostgreSQL tables within the pgtrickle schema:
| Table | Purpose |
|---|---|
| pgtrickle.pgt_stream_tables | Core metadata: name, query, schedule, status, frontier, etc. |
| pgtrickle.pgt_dependencies | DAG edges from ST to source tables |
| pgtrickle.pgt_refresh_history | Audit log of every refresh operation |
| pgtrickle.pgt_change_tracking | Per-source CDC slot metadata |
Schema creation is handled by extension_sql!() macros that run at CREATE EXTENSION time.
Entity-Relationship Diagram
erDiagram
pgt_stream_tables {
bigserial pgt_id PK
oid pgt_relid UK "OID of materialized storage table"
text pgt_name
text pgt_schema
text defining_query
text original_query "User's original SQL (pre-inlining)"
text schedule "Duration or cron expression"
text refresh_mode "FULL | DIFFERENTIAL | IMMEDIATE"
text status "INITIALIZING | ACTIVE | SUSPENDED | ERROR"
boolean is_populated
timestamptz data_timestamp "Freshness watermark"
jsonb frontier "DBSP-style version frontier"
timestamptz last_refresh_at
int consecutive_errors
boolean needs_reinit
float8 auto_threshold
float8 last_full_ms
timestamptz created_at
timestamptz updated_at
}
pgt_dependencies {
bigint pgt_id PK,FK "References pgt_stream_tables.pgt_id"
oid source_relid PK "OID of source table"
text source_type "TABLE | STREAM_TABLE | VIEW"
text_arr columns_used "Column-level lineage"
text cdc_mode "TRIGGER | TRANSITIONING | WAL"
text slot_name "Replication slot (WAL mode)"
pg_lsn decoder_confirmed_lsn "WAL decoder progress"
timestamptz transition_started_at "Trigger→WAL transition start"
}
pgt_refresh_history {
bigserial refresh_id PK
bigint pgt_id FK "References pgt_stream_tables.pgt_id"
timestamptz data_timestamp
timestamptz start_time
timestamptz end_time
text action "NO_DATA | FULL | DIFFERENTIAL | REINITIALIZE | SKIP"
bigint rows_inserted
bigint rows_deleted
text error_message
text status "RUNNING | COMPLETED | FAILED | SKIPPED"
text initiated_by "SCHEDULER | MANUAL | INITIAL"
timestamptz freshness_deadline
}
pgt_change_tracking {
oid source_relid PK "OID of tracked source table"
text slot_name "Trigger function name"
pg_lsn last_consumed_lsn
bigint_arr tracked_by_pgt_ids "ST IDs sharing this source"
}
pgt_stream_tables ||--o{ pgt_dependencies : "has sources"
pgt_stream_tables ||--o{ pgt_refresh_history : "has refresh history"
pgt_stream_tables }o--o{ pgt_change_tracking : "tracks via pgt_ids array"
Note: Change buffer tables (pgtrickle_changes.changes_<oid>) are created dynamically per source table OID and live in the separate pgtrickle_changes schema.
3. CDC / Change Data Capture (src/cdc.rs, src/wal_decoder.rs)
pg_trickle uses a hybrid CDC architecture that starts with triggers and optionally transitions to WAL-based (logical replication) capture for lower write-side overhead.
Trigger Mode (default)
- Trigger Management — Creates AFTER INSERT OR UPDATE OR DELETE row-level triggers (pg_trickle_cdc_<oid>) on each tracked source table. Each trigger fires a PL/pgSQL function (pg_trickle_cdc_fn_<oid>()) that writes changes to the buffer table.
- Change Buffering — Captured changes are written to per-source change buffer tables in the pgtrickle_changes schema. Each row captures the LSN (pg_current_wal_lsn()), transaction ID, action type (I/U/D), and the new/old row data as typed columns (new_<col> TYPE, old_<col> TYPE) — native PostgreSQL types, not JSONB.
- Cleanup — Consumed changes are deleted after each successful refresh via delete_consumed_changes(), bounded by the upper LSN to prevent unbounded scans.
- Lifecycle — Triggers and trigger functions are automatically created when a source table is first tracked, and dropped when the last stream table referencing a source is removed.
The trigger approach was chosen as the default for transaction safety (triggers can be created in the same transaction as DDL), simplicity (no slot management, no wal_level = logical requirement), and immediate visibility (changes are visible in buffer tables as soon as the source transaction commits).
WAL Mode (optional, automatic transition)
When pg_trickle.cdc_mode is set to 'auto' or 'wal' and wal_level = logical is available, the system transitions from trigger-based to WAL-based CDC after the first successful refresh:
- WAL Availability Detection — At stream table creation, checks whether `wal_level = logical` is configured. If so, the source dependency is marked for WAL transition.
- WAL Decoder Background Worker — A dedicated background worker (`src/wal_decoder.rs`) polls logical replication slots and writes decoded changes into the same change buffer tables used by triggers, ensuring a uniform format for the DVM engine.
- Transition Orchestration — The transition is a three-step process: (a) create a replication slot, (b) wait for the decoder to catch up to the trigger's last confirmed LSN, (c) drop the trigger and switch the dependency to WAL mode. If the decoder doesn't catch up within `pg_trickle.wal_transition_timeout` (default 300 s), the system falls back to triggers.
- CDC Mode Tracking — Each source dependency in `pgt_dependencies` carries a `cdc_mode` column (TRIGGER / TRANSITIONING / WAL) and WAL-specific metadata (`slot_name`, `decoder_confirmed_lsn`, `transition_started_at`).
See ADR-001 and ADR-002 in plans/adrs/PLAN_ADRS.md for the original design rationale and plans/sql/PLAN_HYBRID_CDC.md for the full implementation plan.
Immediate Mode / Transactional IVM (src/ivm.rs)
When refresh_mode = 'IMMEDIATE', pg_trickle uses statement-level AFTER triggers with transition tables instead of row-level CDC triggers. The stream table is maintained synchronously within the same transaction as the base table DML.
- BEFORE Triggers — Statement-level BEFORE triggers on each base table acquire an advisory lock on the stream table to prevent concurrent conflicting updates.
- AFTER Triggers — Statement-level AFTER triggers with `REFERENCING NEW TABLE AS ... OLD TABLE AS ...` copy the transition table data to temp tables, then call the Rust `pgt_ivm_apply_delta()` function.
- Delta Computation — The DVM engine's `Scan` operator reads from the temp tables (via `DeltaSource::TransitionTable`) instead of change buffer tables. No LSN filtering or net-effect computation is needed — each trigger invocation represents a single atomic statement.
- Delta Application — The computed delta is applied via explicit DML (DELETE + INSERT ON CONFLICT) to the stream table.
- TRUNCATE — A separate AFTER TRUNCATE trigger calls `pgt_ivm_handle_truncate()`, which truncates the stream table and re-populates it from the defining query.
No change buffer tables, no scheduler involvement, and no WAL infrastructure is needed for IMMEDIATE mode. See plans/sql/PLAN_TRANSACTIONAL_IVM.md for the design plan.
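The statement-level trigger shape used here can be sketched as follows. The trigger and function names are placeholders; pg_trickle generates the real objects per base table and DML verb.

```sql
-- Illustrative IMMEDIATE-mode trigger: a statement-level AFTER trigger
-- with transition tables (names are hypothetical)
CREATE TRIGGER pgt_ivm_upd
AFTER UPDATE ON orders
REFERENCING OLD TABLE AS pgt_old NEW TABLE AS pgt_new
FOR EACH STATEMENT
EXECUTE FUNCTION pgt_ivm_statement_fn();
```

Transition tables (`pgt_old` / `pgt_new`) give the trigger function the full set of rows touched by the statement in one invocation, which is what makes synchronous, per-statement delta computation practical.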
ST-to-ST Change Capture (v0.11.0+)
When a stream table's defining query references another stream table (rather than a base table), neither triggers nor WAL capture apply — the upstream source is itself maintained by pg_trickle. A dedicated ST change buffer mechanism enables downstream stream tables to refresh differentially even when their source is another stream table.
Base Table ──trigger/WAL──▶ changes_<oid> (base-table buffer)
Stream Table A ──refresh──▶ changes_pgt_<pgt_id> (ST buffer for A's consumers)
Stream Table B reads from changes_pgt_<pgt_id> (B depends on A)
Buffer schema. ST change buffers are named pgtrickle_changes.changes_pgt_<pgt_id> (using the internal pgt_id rather than the OID). Unlike base-table buffers, they store only new_* columns — no old_* columns — because ST deltas are expressed as INSERT/DELETE pairs, not UPDATE rows.
Delta capture — DIFFERENTIAL path. When an upstream stream table refreshes in DIFFERENTIAL mode and has downstream consumers, the refresh engine captures the computed delta (the INSERT and DELETE rows applied to the upstream ST) into the ST change buffer via explicit DML. Downstream stream tables then read from this buffer exactly as they would read from a base-table change buffer.
Delta capture — FULL path. When an upstream stream table refreshes in FULL mode (e.g., due to a mode downgrade or full => true), the engine takes a pre-refresh snapshot, executes the full refresh, then computes an EXCEPT ALL diff between the old and new contents. The resulting INSERT/DELETE pairs are written to the ST change buffer. This prevents FULL refreshes from cascading through the entire dependency chain — downstream STs always receive a minimal delta regardless of how the upstream was refreshed.
Frontier tracking. ST source positions are tracked in the same frontier JSONB structure as base-table sources, using pgt_<upstream_pgt_id> as the key (e.g., {"pgt_42": 157}) rather than the OID-based keys used for base tables. The scheduler's has_stream_table_source_changes() function compares the downstream's last-consumed frontier position against the upstream buffer's current maximum LSN to decide whether a refresh is needed.
Lifecycle. ST change buffers are created automatically when a stream table gains its first downstream consumer (create_st_change_buffer_table()), and dropped when the last downstream consumer is removed (drop_st_change_buffer_table()). On upgrade from pre-v0.11.0, existing ST-to-ST dependencies have their buffers auto-created on the first scheduler tick. Consumed rows are cleaned up by cleanup_st_change_buffers_by_frontier() after each successful downstream refresh.
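The FULL-path diff described above boils down to a pair of `EXCEPT ALL` queries. As a sketch, with a hypothetical pre-refresh snapshot `old_snapshot` of stream table `stream_table_a`:

```sql
-- Rows recorded as DELETEs: present before the FULL refresh, absent after
SELECT * FROM old_snapshot  EXCEPT ALL SELECT * FROM stream_table_a;

-- Rows recorded as INSERTs: absent before the FULL refresh, present after
SELECT * FROM stream_table_a EXCEPT ALL SELECT * FROM old_snapshot;
```

`EXCEPT ALL` (rather than `EXCEPT`) preserves duplicate multiplicities, so the resulting INSERT/DELETE pairs form an exact delta even for bag semantics.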
4. DVM Engine (src/dvm/)
The Differential View Maintenance engine is the core of the system. It transforms the defining SQL query into an executable operator tree that can compute deltas efficiently.
Auto-Rewrite Pipeline (src/dvm/parser.rs)
Before the defining query is parsed into an operator tree, it passes through a chain of auto-rewrite passes that normalize SQL constructs the DVM parser doesn't handle directly:
| Pass | Function | Purpose |
|---|---|---|
| #0 | rewrite_views_inline() | Replace view references with (view_definition) AS alias subqueries |
| #1 | rewrite_distinct_on() | Convert DISTINCT ON to ROW_NUMBER() OVER (…) = 1 window subquery |
| #2 | rewrite_grouping_sets() | Decompose GROUPING SETS / CUBE / ROLLUP into UNION ALL of GROUP BY |
| #3 | rewrite_scalar_subquery_in_where() | Convert WHERE col > (SELECT …) to CROSS JOIN |
| #4 | rewrite_sublinks_in_or() | Split WHERE a OR EXISTS (…) into UNION branches |
| #5 | rewrite_multi_partition_windows() | Split multiple PARTITION BY clauses into joined subqueries |
The view inlining pass (#0) runs first so that view definitions containing DISTINCT ON, GROUPING SETS, etc. are further rewritten by downstream passes. Nested views are expanded via a fixpoint loop (max depth 10).
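As an example of pass #1, a `DISTINCT ON` query can be rewritten into an equivalent `ROW_NUMBER()` window subquery. The query and alias names below are illustrative, not the exact SQL the rewriter emits:

```sql
-- Before:
--   SELECT DISTINCT ON (customer_id) customer_id, total
--   FROM orders ORDER BY customer_id, created_at DESC;

-- After (semantically equivalent window form):
SELECT customer_id, total
FROM (
    SELECT customer_id, total,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
) q
WHERE rn = 1;
```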
Query Parser (src/dvm/parser.rs)
Parses the defining query using PostgreSQL's internal parser (via pgrx raw_parser) and extracts:
- WITH clause — CTE definitions (non-recursive: inline expansion or shared delta; recursive: detected for mode gating)
- Target list — output columns
- FROM clause — source tables, joins, subqueries, and CTE references
- WHERE clause — filters
- GROUP BY / aggregate functions
- DISTINCT / UNION ALL / INTERSECT / EXCEPT
The parser produces an OpTree — a tree of operator nodes. CTE handling follows a tiered approach:
- Tier 1 (Inline Expansion) — Non-recursive CTEs referenced once are expanded into `Subquery` nodes, equivalent to subqueries in FROM.
- Tier 2 (Shared Delta) — Non-recursive CTEs referenced multiple times produce `CteScan` nodes that share a single delta computation via a CTE registry and delta cache.
- Tier 3a/3b/3c (Recursive) — Recursive CTEs (`WITH RECURSIVE`) are detected via `query_has_recursive_cte()`. In FULL mode, the query executes as-is. In DIFFERENTIAL mode, the strategy is auto-selected: semi-naive evaluation for INSERT-only changes, Delete-and-Rederive (DRed) for mixed changes, or recomputation fallback when CTE columns don't match ST storage or when the recursive term contains non-monotone operators (EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, INTERSECT SET). In IMMEDIATE mode, the same semi-naive / DRed machinery runs against statement transition tables and is bounded by `pg_trickle.ivm_recursive_max_depth` to guard against unbounded recursion.
Operators (src/dvm/operators/)
Each operator knows how to generate a delta query — given a set of changes to its inputs, it produces the corresponding changes to its output:
| Operator | Delta Strategy |
|---|---|
| Scan | Direct passthrough of CDC changes |
| Filter | Apply WHERE predicate to deltas |
| Project | Apply column projection to deltas |
| Join | Join deltas against the other side's current state |
| OuterJoin | LEFT/RIGHT outer join with NULL padding |
| FullJoin | FULL OUTER JOIN with 8-part delta (both sides may produce NULLs) |
| Aggregate | Recompute group values where affected keys changed |
| Distinct | COUNT-based duplicate tracking |
| UnionAll | Merge deltas from both branches |
| Intersect | Dual-count multiplicity with LEAST boundary crossing |
| Except | Dual-count multiplicity with GREATEST(0, L-R) boundary crossing |
| Subquery | Transparent delegation + optional column renaming (CTEs, subselects) |
| CteScan | Shared delta lookup from CTE cache (multi-reference CTEs) |
| RecursiveCte | Semi-naive / DRed / recomputation for WITH RECURSIVE |
| Window | Partition-based recomputation for window functions |
| LateralFunction | Row-scoped recomputation for SRFs in FROM (jsonb_array_elements, unnest, etc.) |
| LateralSubquery | Row-scoped recomputation for correlated subqueries in LATERAL FROM |
| SemiJoin | EXISTS / IN subquery delta via semi-join |
| AntiJoin | NOT EXISTS / NOT IN subquery delta via anti-join |
| ScalarSubquery | Correlated scalar subquery in SELECT list |
See DVM_OPERATORS.md for detailed descriptions.
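To make the Join strategy concrete: the bilinear expansion means a join delta can be computed from the deltas and snapshots of its inputs, never by re-joining the full tables. One valid factoring, sketched with hypothetical tables `delta_a` / `delta_b` (the captured changes) and `b_old` / `a_new` (pre- and post-change snapshots):

```sql
-- One factoring of the bilinear join delta:
--   delta(A JOIN B) = (dA JOIN B_old)  UNION ALL  (A_new JOIN dB)
SELECT * FROM delta_a JOIN b_old   USING (k)
UNION ALL
SELECT * FROM a_new   JOIN delta_b USING (k);
```

Expanding `A_new = A_old + dA` shows this equals the three-term form `dA⋈B + A⋈dB + dA⋈dB`; the cost is proportional to the delta sizes, not the table sizes.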
Diff Engine (src/dvm/diff.rs)
Generates the final diff SQL that:
- Computes the delta from the operator tree
- Produces `('+', row)` for inserts and `('-', row)` for deletes
- Applies the diff via `DELETE` matching old rows and `INSERT` for new rows
5. DAG / Dependency Graph (src/dag.rs)
Stream tables can depend on other stream tables (cascading), forming a Directed Acyclic Graph:
- Cycle detection — Detects circular dependencies at creation time using Kahn's algorithm (BFS topological sort). When `pg_trickle.allow_circular = true`, monotone cycles (queries using only safe operators — joins, filters, UNION ALL, etc.) are allowed; non-monotone cycles (aggregates, EXCEPT, window functions, anti-joins) are rejected. SCC IDs are automatically assigned to cycle members and recomputed on drop/alter.
- SCC decomposition — Tarjan's algorithm decomposes the graph into strongly connected components. Singleton SCCs are acyclic; multi-node SCCs contain cycles that are handled by fixed-point iteration in the scheduler.
- Monotonicity analysis — A static check (`check_monotonicity()` in `src/dvm/parser.rs`) determines whether a query's operators are safe for cyclic fixed-point iteration. Non-monotone operators (Aggregate, EXCEPT, Window, NOT EXISTS) block cycle creation.
- Topological ordering — Determines refresh order: upstream STs must be refreshed before downstream STs.
- Condensation order — `condensation_order()` returns SCCs in topological order, grouping cyclic STs for fixed-point iteration. The scheduler's `iterate_to_fixpoint()` processes multi-node SCCs by refreshing all members repeatedly until convergence (zero net changes) or `max_fixpoint_iterations` is exceeded.
- Cascade operations — When a source table changes, all transitive dependents are identified for refresh.
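Transitive-dependent discovery is a natural fit for a recursive CTE over the dependency catalog. The column names below (`pgt_id`, `source_oid`, `source_pgt_id`) are illustrative; consult `pgtrickle.pgt_dependencies` for the actual layout.

```sql
-- Sketch: all stream tables transitively downstream of public.orders
WITH RECURSIVE dependents AS (
    SELECT d.pgt_id
    FROM   pgtrickle.pgt_dependencies d
    WHERE  d.source_oid = 'public.orders'::regclass
  UNION
    SELECT d.pgt_id
    FROM   pgtrickle.pgt_dependencies d
    JOIN   dependents dep ON d.source_pgt_id = dep.pgt_id
)
SELECT pgt_id FROM dependents;
```

`UNION` (not `UNION ALL`) deduplicates, which also terminates the recursion if the same ST is reachable along multiple paths.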
6. Version / Frontier Tracking (src/version.rs)
Implements a per-source frontier (JSONB map of source_oid → LSN) to track exactly how far each stream table has consumed changes:
- Read frontier — Before refresh, read the frontier to know where to start consuming changes.
- Advance frontier — After a successful refresh, the frontier is updated to the latest consumed LSN.
- Consistent snapshots — The frontier ensures that each refresh processes a contiguous, non-overlapping window of changes.
Delayed View Semantics (DVS) Guarantee
The contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. The scheduler refreshes STs in topological order so that when ST B references upstream ST A, A has already been refreshed to the target data_timestamp before B runs its delta query against A's contents. The frontier lifecycle is:
- Created — on first full refresh; records the LSN of each source at that moment.
- Advanced — on each differential refresh; the old frontier becomes the lower bound and the new frontier (with fresh LSNs) the upper bound. The DVM engine reads changes in `[old, new]`.
- Reset — on reinitialize; a fresh frontier is created from scratch.
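A frontier is just a JSONB map and can be inspected directly. The column and table names here are illustrative; see the catalog section above for the real schema.

```sql
-- Sketch: unpack a stream table's per-source frontier into rows of
-- (source key, consumed position), e.g. ("16394", "0/1A2B3C4")
SELECT jsonb_each_text(frontier)
FROM   pgtrickle.pgt_stream_tables
WHERE  st_name = 'active_orders';
```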
7. Refresh Engine (src/refresh.rs)
Orchestrates the complete refresh cycle:
┌──────────────┐
│ Check State │ → Is ST active? Has it been populated?
└──────┬───────┘
│
┌─────▼──────┐
│ Drain CDC │ → Read WAL changes into change buffer tables
└─────┬──────┘
│
┌─────▼──────────────┐
│ Determine Action │ → FULL, DIFFERENTIAL, NO_DATA, REINITIALIZE, or SKIP?
│ │ (adaptive: if change ratio > pg_trickle.differential_max_change_ratio,
│ │ downgrade DIFFERENTIAL → FULL automatically)
└─────┬──────────────┘
│
┌─────▼──────┐
│ Execute │ → Full: TRUNCATE + INSERT ... SELECT
│ │ Differential: Generate & apply delta SQL
└─────┬──────┘
│
┌─────▼──────────────┐
│ Record History │ → Write to pgtrickle.pgt_refresh_history
└─────┬──────────────┘
│
┌─────▼──────────────┐
│ Advance Frontier │ → Update JSONB frontier in catalog
└─────┬──────────────┘
│
┌─────▼──────────────┐
│ Reset Error Count │ → On success, reset consecutive_errors to 0
└──────────────────────┘
8. Background Worker & Scheduling (src/scheduler.rs)
Registration & Lifecycle
pg_trickle registers one PostgreSQL background worker — the scheduler — during _PG_init() (extension load). Because it is registered at startup, pg_trickle must appear in shared_preload_libraries, which requires a server restart.
┌──────────────────────────────────────────────────────────────────┐
│ PostgreSQL postmaster │
│ │
│ shared_preload_libraries = 'pg_trickle' │
│ │ │
│ ▼ │
│ _PG_init() │
│ ├─ Register GUCs (pg_trickle.enabled, scheduler_interval_ms …) │
│ ├─ Register shared memory (PgTrickleSharedState, atomics) │
│ └─ BackgroundWorkerBuilder::new("pg_trickle scheduler") │
│ .set_start_time(RecoveryFinished) │
│ .set_restart_time(5s) ← auto-restart on crash │
│ .load() │
│ │
│ After recovery finishes: │
│ │ │
│ ▼ │
│ pg_trickle_scheduler_main() ← background worker starts │
│ ├─ Attach SIGHUP + SIGTERM handlers │
│ ├─ Connect to SPI (database = "postgres") │
│ ├─ Crash recovery: mark stale RUNNING records as FAILED │
│ └─ Enter main loop ─────────────────────────┐ │
│ │ │ │
│ ▼ │ │
│ wait_latch(scheduler_interval_ms) │ │
│ │ │ │
│ ┌───▼───────────────────────────────┐ │ │
│ │ SIGTERM? → log + break │ │ │
│ │ pg_trickle.enabled = false? → skip │ │ │
│ │ Otherwise → scheduler tick │ │ │
│ └───┬───────────────────────────────┘ │ │
│ │ │ │
│ └──────────── loop ────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Key lifecycle properties:
| Property | Behaviour |
|---|---|
| Start condition | After PostgreSQL recovery finishes (RecoveryFinished) |
| Auto-restart | 5-second delay after an unexpected crash |
| Graceful shutdown | Handles SIGTERM — breaks the main loop and exits cleanly |
| Config reload | Handles SIGHUP — re-reads GUC values on the next latch wake |
| Crash recovery | On startup, any pgt_refresh_history rows stuck in RUNNING status are marked FAILED (the transaction that wrote them was rolled back by PostgreSQL, but the status row may have been committed in a prior transaction) |
| Database | Connects to the postgres database via SPI |
| Standby / replica | On standby servers (pg_is_in_recovery() = true), the worker enters a sleep loop and does not attempt refreshes. Stream tables are still readable on standbys — they are regular heap tables replicated via physical streaming replication. After promotion the scheduler resumes automatically. See the FAQ § Replication for details on logical replication and subscriber limitations. |
Scheduler Tick
Each tick of the main loop performs the following steps inside a single transaction:
- DAG rebuild — Compare the shared-memory `DAG_REBUILD_SIGNAL` counter against the local copy. If it advanced (a `CREATE`, `ALTER`, or `DROP` stream table occurred), rebuild the in-memory dependency graph (`StDag`) from the catalog.
- Topological traversal — Walk stream tables in dependency order (upstream before downstream). This ensures that when ST B references ST A, A is refreshed first.
- Per-ST evaluation — For each active ST:
  - Skip if in retry backoff (exponential, per-ST).
  - Skip if the schedule/cron says it is not yet due.
  - Skip if a row-level lock on the catalog entry indicates a concurrent refresh.
  - Check upstream change buffers for pending rows.
- Execute refresh — Acquire a row-level lock on the catalog entry → record `RUNNING` in history → run `FULL` / `DIFFERENTIAL` / `REINITIALIZE` → store the new frontier → release the lock → record completion.
- WAL transitions — Advance any trigger→WAL CDC mode transitions (`src/wal_decoder.rs`).
- Slot health — Check replication slot health and emit `NOTIFY` alerts.
- Prune retry state — Remove backoff entries for STs that no longer exist.
Sequential Processing (Default)
By default (`parallel_refresh_mode = 'off'`), the scheduler processes stream tables sequentially within a single background worker: all STs are refreshed one at a time in topological order.
`pg_trickle.max_concurrent_refreshes` (default 4) only prevents a manual `pgtrickle.refresh_stream_table()` call from overlapping with the scheduler on the same ST — it does not spawn additional workers.
The PostgreSQL GUC max_worker_processes (default 8) sets the server-wide budget for all background workers (autovacuum, parallel query, logical replication, extensions). In sequential mode pg_trickle consumes one slot from that budget.
Parallel Refresh (parallel_refresh_mode = 'on')
When enabled, the scheduler builds an execution-unit DAG from the stream-table dependency graph and dispatches independent units to dynamic background workers:
- Execution units — Each independent stream table becomes a singleton unit. Atomic consistency groups and IMMEDIATE-trigger closures are collapsed into composite units that run in a single worker for correctness.
- Ready queue — Units whose upstream dependencies have all completed enter the ready queue. The coordinator dispatches them subject to a per-database cap (`max_concurrent_refreshes`) and a cluster-wide cap (`max_dynamic_refresh_workers`).
- Dynamic workers — Each dispatched unit spawns a short-lived background worker via `BackgroundWorkerBuilder::load_dynamic()`. Workers claim a job from the `pgtrickle.pgt_scheduler_jobs` catalog table, execute the refresh, and exit.
The parallel path respects the same topological ordering as the
sequential path — downstream units only become ready after all upstream
units succeed. The worker-budget caps ensure pg_trickle does not exhaust
max_worker_processes.
See PLAN_PARALLELISM.md for the full design and CONFIGURATION.md for tuning guidance.
Retry & Error Handling
Each ST maintains an in-memory RetryState (reset on scheduler restart):
- Retryable errors (SPI failures, lock contention, slot issues) trigger exponential backoff.
- Permanent errors (schema mismatch, user errors) skip backoff but increment `consecutive_errors`.
- When `consecutive_errors` reaches `pg_trickle.max_consecutive_errors` (default 3), the ST is auto-suspended and a `NOTIFY` alert is emitted.
- Schema errors additionally set `needs_reinit`, triggering a `REINITIALIZE` on the next successful cycle.
Scheduling Policy
Automatic refresh scheduling uses canonical periods (48·2ⁿ seconds, n = 0, 1, 2, …) snapped to the user's schedule:
- Picks the smallest canonical period ≤ `schedule`.
- For a DOWNSTREAM schedule (NULL schedule), the ST refreshes only when explicitly triggered or when a downstream ST needs it.
- Advisory locks prevent concurrent refreshes of the same ST.
- The scheduler is driven by the background worker polling at the `pg_trickle.scheduler_interval_ms` GUC interval.
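For reference, the canonical period grid 48·2ⁿ can be enumerated directly in SQL (illustrative only; the extension computes this internally):

```sql
-- First few canonical periods: 48, 96, 192, 384, 768, 1536 seconds
SELECT n, 48 * 2 ^ n AS period_seconds
FROM generate_series(0, 5) AS n;
```

Doubling periods keep the schedule space small, so STs with similar schedules tend to land on the same canonical period and refresh together.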
Shared Memory (src/shmem.rs)
The scheduler background worker and user sessions share a PgTrickleSharedState structure protected by a PgLwLock. Key fields:
| Field | Type | Purpose |
|---|---|---|
dag_version | u64 | Incremented when the ST catalog changes; used by the scheduler to detect when the DAG needs rebuilding. |
scheduler_pid | i32 | PID of the scheduler background worker (0 if not running). |
scheduler_running | bool | Whether the scheduler is active. |
last_scheduler_wake | i64 | Unix timestamp of the last scheduler wake cycle (for monitoring). |
A separate PgAtomic<AtomicU64> named DAG_REBUILD_SIGNAL is incremented by API functions (create, alter, drop) after catalog mutations. The scheduler compares its local copy against the atomic counter to detect when to rebuild its in-memory DAG without holding a lock.
A second PgAtomic<AtomicU64> named CACHE_GENERATION tracks DDL events that may invalidate cached delta or MERGE templates across backends. When DDL hooks fire (view change, ALTER TABLE, function change) or API functions mutate the catalog, CACHE_GENERATION is bumped. Each backend maintains a thread-local generation counter; on the next refresh, if the shared generation has advanced, the backend flushes its delta template cache, MERGE template cache, and explicitly DEALLOCATEs tracked __pgt_merge_* prepared statements before rebuilding local state.
9. DDL Tracking (src/hooks.rs)
Event triggers monitor DDL changes to source tables and functions:
- `_on_ddl_end` — Fires on `ALTER TABLE` to detect column adds/drops/type changes. If a source table used by a ST is altered, the ST's `needs_reinit` flag is set. Also detects `CREATE OR REPLACE FUNCTION` / `ALTER FUNCTION` — if the function appears in a ST's `functions_used` catalog column, the ST is marked for reinit.
- `_on_sql_drop` — Fires on `DROP TABLE` to set `needs_reinit` for affected STs. Also detects `DROP FUNCTION` and marks affected STs for reinit.
- Function name extraction — `object_identity` strings (e.g., `public.my_func(integer, text)`) are parsed to extract the bare function name, which is matched against the `functions_used TEXT[]` column in `pgt_stream_tables`.
Reinitialization is deferred until the next refresh cycle, which then performs a REINITIALIZE action (drop and recreate the storage table from the updated query).
10. Error Handling (src/error.rs)
Centralized error types using thiserror:
- `PgTrickleError` variants cover catalog access, SQL execution, CDC, DVM, DAG, and config errors.
- Each refresh failure increments `consecutive_errors`.
- When `consecutive_errors` reaches `pg_trickle.max_consecutive_errors` (default 3), the ST is moved to `ERROR` status and suspended from automatic refresh.
- Manual intervention (`ALTER ... status => 'ACTIVE'`) resets the counter.
11. Monitoring (src/monitor.rs)
Provides observability functions:
- st_refresh_stats — Aggregate statistics (total/successful/failed refreshes, avg duration, staleness status).
- get_refresh_history — Per-ST audit trail.
- get_staleness — Current staleness in seconds.
- slot_health — Checks replication slot state and WAL retention.
- check_cdc_health — Per-source CDC health status including mode, slot lag, confirmed LSN, and alerts.
- explain_st — Describes the DVM plan for a given ST.
- diamond_groups — Lists detected diamond dependency groups, their members, convergence points, and epoch counters.
- Views — `pgtrickle.stream_tables_info` (computed staleness) and `pgtrickle.pg_stat_stream_tables` (combined stats).
NOTIFY Alerting
Operational events are broadcast via PostgreSQL NOTIFY on the pg_trickle_alert channel. Clients can subscribe with LISTEN pg_trickle_alert; and receive JSON-formatted events:
| Event | Condition |
|---|---|
stale | data staleness exceeds 2× schedule |
auto_suspended | ST suspended after pg_trickle.max_consecutive_errors failures |
reinitialize_needed | Upstream DDL change detected |
slot_lag_warning | Replication slot WAL retention exceeded pg_trickle.slot_lag_warning_threshold_mb |
cdc_transition_complete | Source transitioned from trigger to WAL-based CDC |
cdc_transition_failed | Trigger→WAL transition failed (fell back to triggers) |
refresh_completed | Refresh completed successfully |
refresh_failed | Refresh failed with an error |
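For example, from psql (the payload shown in the comment is illustrative; the exact JSON fields may differ):

```sql
LISTEN pg_trickle_alert;
-- Asynchronous notification received on a subsequent event, e.g.:
--   {"event": "auto_suspended", "stream_table": "active_orders", ...}
```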
12. Row ID Hashing (src/hash.rs)
Provides deterministic 64-bit row identifiers using xxHash (xxh64) with a fixed seed. Two SQL functions are exposed:
- `pgtrickle.pg_trickle_hash(text)` — Hash a single text value; used for simple single-column row IDs.
- `pgtrickle.pg_trickle_hash_multi(text[])` — Hash multiple values (separated by a record-separator byte `\x1E`) for composite keys (join row IDs, GROUP BY keys).
Row IDs are written into every stream table's storage as an internal __pgt_row_id BIGINT column and are used by the delta application phase to match DELETE candidates precisely.
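As a sketch of how a composite row ID might be derived (the function names come from the list above; this particular usage is illustrative, not the exact SQL the engine generates):

```sql
-- Hypothetical composite row ID over a two-column GROUP BY key
SELECT pgtrickle.pg_trickle_hash_multi(
           ARRAY[customer_id::text, order_date::text]
       ) AS __pgt_row_id
FROM orders;
```

Because the hash is deterministic and seeded identically everywhere, the same logical row always maps to the same `__pgt_row_id`, which is what lets the DELETE side of a delta find its target rows.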
13. Diamond Dependency Consistency (src/dag.rs)
When stream tables form diamond-shaped dependency graphs, a convergence (fan-in) node may read from multiple upstream STs that share a common ancestor:
A (source table)
/ \
B C (intermediate STs)
\ /
D (convergence / fan-in ST)
If B refreshes successfully but C fails, D would read a fresh version of B's data alongside stale data from C — a split-version inconsistency.
Detection
StDag::detect_diamonds() walks all fan-in nodes (STs with multiple upstream ST dependencies) and computes transitive ancestor sets per branch. If two or more branches share ancestors, a diamond is detected. Overlapping diamonds are merged.
Consistency Groups
StDag::compute_consistency_groups() converts detected diamonds into consistency groups — topologically ordered sets of STs that must be refreshed atomically. Each group contains:
- Members — All intermediate STs plus the convergence node, in refresh order.
- Convergence points — The fan-in nodes where multiple paths meet.
- Epoch counter — Advances on each successful atomic refresh.
STs not involved in any diamond are placed in singleton groups (no overhead).
Scheduler Wiring
When diamond_consistency = 'atomic' (per-ST or via the pg_trickle.diamond_consistency GUC):
- The scheduler wraps each multi-member group in a `SAVEPOINT pgt_consistency_group`.
- Each member is refreshed in topological order within the savepoint.
- If all succeed — `RELEASE SAVEPOINT` and advance the group epoch.
- If any member fails — `ROLLBACK TO SAVEPOINT` undoes all members' changes. The failure is logged and the group retries on the next scheduler tick.
With diamond_consistency = 'none', members refresh independently in topological order — matching pre-feature behavior.
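The atomic-group pattern reduces to standard savepoint SQL (issued internally by the scheduler; shown here only as a sketch):

```sql
SAVEPOINT pgt_consistency_group;
-- refresh B ... refresh C ... refresh D  (topological order)
RELEASE SAVEPOINT pgt_consistency_group;         -- all members succeeded
-- or, on any member's failure:
-- ROLLBACK TO SAVEPOINT pgt_consistency_group;  -- undo the whole group
```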
Schedule Policy
The diamond_schedule_policy setting (per-convergence-node or via the pg_trickle.diamond_schedule_policy GUC) controls when an atomic group fires:
| Policy | Trigger condition | Trade-off |
|---|---|---|
'fastest' (default) | Any member is due | Higher freshness, more refreshes |
'slowest' | All members are due | Lower resource cost, staler data |
The policy is set on the convergence (fan-in) node. When multiple convergence nodes exist in the same group (nested diamonds), the strictest policy wins (slowest > fastest). The GUC serves as a cluster-wide fallback for nodes without an explicit per-node setting.
Monitoring
The pgtrickle.diamond_groups() SQL function exposes detected groups for operational visibility. See SQL_REFERENCE.md for details.
14. Configuration (src/config.rs)
Runtime behavior is controlled by a growing set of GUC (Grand Unified Configuration) variables. See CONFIGURATION.md for the complete, current list.
| GUC | Default | Purpose |
|---|---|---|
pg_trickle.enabled | true | Master on/off switch for the scheduler |
pg_trickle.scheduler_interval_ms | 1000 | Scheduler background worker wake interval (ms) |
pg_trickle.min_schedule_seconds | 60 | Minimum allowed schedule |
pg_trickle.max_consecutive_errors | 3 | Errors before auto-suspending a ST |
pg_trickle.change_buffer_schema | pgtrickle_changes | Schema for change buffer tables |
pg_trickle.max_concurrent_refreshes | 4 | Maximum parallel refresh workers |
pg_trickle.differential_max_change_ratio | 0.15 | Change-to-table-size ratio above which DIFFERENTIAL falls back to FULL |
pg_trickle.cleanup_use_truncate | true | Use TRUNCATE instead of DELETE for change buffer cleanup when the entire buffer is consumed |
pg_trickle.user_triggers | 'auto' | User-defined trigger handling: auto / off (on accepted as deprecated alias for auto) |
pg_trickle.block_source_ddl | false | Block column-affecting DDL on tracked source tables instead of reinit |
pg_trickle.cdc_mode | 'auto' | CDC mechanism: auto / trigger / wal |
pg_trickle.wal_transition_timeout | 300 | Max seconds to wait for WAL decoder catch-up during transition |
pg_trickle.slot_lag_warning_threshold_mb | 100 | Warning threshold for WAL slot retention used by slot_lag_warning and health_check() |
pg_trickle.slot_lag_critical_threshold_mb | 1024 | Critical threshold for WAL slot retention used by check_cdc_health() alerts |
pg_trickle.diamond_consistency | 'atomic' | Diamond dependency consistency mode: atomic or none |
pg_trickle.diamond_schedule_policy | 'fastest' | Schedule policy for atomic diamond groups: fastest or slowest |
pg_trickle.merge_planner_hints | true | Inject SET LOCAL planner hints (disable nestloop, raise work_mem) before MERGE |
pg_trickle.merge_work_mem_mb | 64 | work_mem (MB) applied when delta exceeds 10 000 rows and planner hints enabled |
pg_trickle.use_prepared_statements | true | Use SQL PREPARE/EXECUTE for cached MERGE templates |
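Like any extension GUC, these can be set cluster-wide and reloaded without a restart (only `shared_preload_libraries` itself requires one). A minimal example, assuming superuser access:

```sql
ALTER SYSTEM SET pg_trickle.scheduler_interval_ms = 500;
SELECT pg_reload_conf();   -- signal backends to re-read configuration
SHOW pg_trickle.scheduler_interval_ms;
```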
Data Flow: End-to-End Refresh
Source Table INSERT/UPDATE/DELETE
│
▼
Hybrid CDC Layer:
┌─────────────────────────────────────────────┐
│ TRIGGER mode: Row-Level AFTER Trigger │
│ pg_trickle_cdc_fn_<oid>() → buffer table │
│ │
│ WAL mode: Logical Replication Slot │
│ wal_decoder bgworker → same buffer table │
│ │
│ ST-to-ST: Refresh engine captures delta │
│ → changes_pgt_<pgt_id> buffer table │
└─────────────────────────────────────────────┘
│
▼
Change Buffer Table
Base tables: pgtrickle_changes.changes_<oid>
ST sources: pgtrickle_changes.changes_pgt_<pgt_id>
Columns: change_id, lsn, action (I/U/D), pk_hash, new_<col>, old_<col> (typed)
│
▼
DVM Engine: generate delta SQL from operator tree
- Scan operator reads from changes_<oid> or changes_pgt_<id>
- Filter/Project/Join transform the deltas
- Aggregate recomputes affected groups
│
▼
Diff Engine: produce (+/-) diff rows
│
▼
Delta Application:
DELETE FROM storage WHERE __pgt_row_id IN (removed)
INSERT INTO storage SELECT ... FROM (added)
│
▼
Frontier Update: advance per-source LSN
│
▼
History Record: log to pgtrickle.pgt_refresh_history
Module Map
src/
├── lib.rs # Extension entry, module declarations, _PG_init
├── bin/
│   └── pgrx_embed.rs # pgrx SQL entity embedding (generated)
├── api.rs # SQL API functions (create/alter/drop/refresh/status)
├── catalog.rs # Catalog CRUD operations
├── cdc.rs # Change data capture (triggers + WAL transition)
├── config.rs # GUC variable registration
├── dag.rs # Dependency graph (cycle detection, SCC decomposition, topo sort)
├── error.rs # Centralized error types
├── hash.rs # xxHash row ID generation (pg_trickle_hash / pg_trickle_hash_multi)
├── hooks.rs # DDL event trigger handlers (_on_ddl_end, _on_sql_drop)
├── ivm.rs # Transactional IVM (IMMEDIATE mode: statement-level triggers)
├── shmem.rs # Shared memory state (PgTrickleSharedState, DAG_REBUILD_SIGNAL, CACHE_GENERATION)
├── dvm/
│ ├── mod.rs # DVM module root + recursive CTE orchestration
│ ├── parser.rs # Query → OpTree converter (CTE extraction, subquery, window support)
│ ├── diff.rs # Delta SQL generation (CTE delta cache)
│ ├── row_id.rs # Row ID generation
│ └── operators/
│ ├── mod.rs # Operator trait + registry
│ ├── scan.rs # Table scan (CDC passthrough)
│ ├── filter.rs # WHERE clause filtering
│ ├── project.rs # Column projection
│ ├── join.rs # Inner join
│ ├── join_common.rs # Shared join utilities (snapshot subqueries, column disambiguation)
│ ├── outer_join.rs # LEFT/RIGHT outer join
│ ├── full_join.rs # FULL OUTER JOIN (8-part delta)
│ ├── aggregate.rs # GROUP BY + aggregate functions (39 AggFunc variants)
│ ├── distinct.rs # DISTINCT deduplication
│ ├── union_all.rs # UNION ALL merging
│ ├── intersect.rs # INTERSECT / INTERSECT ALL (dual-count LEAST)
│ ├── except.rs # EXCEPT / EXCEPT ALL (dual-count GREATEST)
│ ├── subquery.rs # Subquery / inlined CTE delegation
│ ├── cte_scan.rs # Shared CTE delta (multi-reference)
│ ├── recursive_cte.rs # Recursive CTE (semi-naive + DRed + recomputation)
│ ├── window.rs # Window function (partition recomputation)
│ ├── lateral_function.rs # LATERAL SRF (row-scoped recomputation)
│ ├── lateral_subquery.rs # LATERAL correlated subquery
│ ├── semi_join.rs # EXISTS / IN subquery (semi-join delta)
│ ├── anti_join.rs # NOT EXISTS / NOT IN subquery (anti-join delta)
│ └── scalar_subquery.rs # Correlated scalar subquery in SELECT
├── monitor.rs # Monitoring & observability functions
├── refresh.rs # Refresh orchestration
├── scheduler.rs # Automatic scheduling with canonical periods
├── version.rs # Frontier / LSN tracking
└── wal_decoder.rs # WAL-based CDC (logical replication slot polling, transitions)
Extension Control File (pg_trickle.control)
The pg_trickle.control file in the repository root is required by PostgreSQL's
extension infrastructure. It declares the extension's description, default
version, shared-library path, and privilege requirements. PostgreSQL reads this
file when CREATE EXTENSION pg_trickle; is executed.
During packaging (cargo pgrx package), pgrx replaces the @CARGO_VERSION@
placeholder with the version from Cargo.toml and copies the file into the
target's share/extension/ directory alongside the SQL migration scripts.
DVM Operators
This document describes the Differential View Maintenance (DVM) operators implemented by pg_trickle. Each operator transforms a stream of row-level changes (deltas) propagated from source tables through the operator tree.
Prior Art
- Budiu, M. et al. (2023). "DBSP: Automatic Incremental View Maintenance for Rich Query Languages." VLDB 2023. (comparison)
- Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press.
- Koch, C. et al. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." VLDB Journal.
- PostgreSQL 9.4+ — materialized views with REFRESH MATERIALIZED VIEW CONCURRENTLY.
Overview
When a stream table is created, the defining SQL query is parsed into a tree of DVM operators. During a differential refresh, changes flow bottom-up through this tree:
Aggregate
│
Project
│
Filter
│
┌───────┴───────┐
Join │
┌─┴─┐ │
Scan(A) Scan(B) Scan(C)
Each operator implements a differentiation rule: given the delta (Δ) to its input(s), it produces the corresponding delta to its output. This is conceptually similar to automatic differentiation in calculus.
The general contract:
- Input: a set of ('+', row) and ('-', row) tuples (inserts and deletes)
- Output: a set of ('+', row) and ('-', row) tuples
Updates are modeled as a delete of the old row followed by an insert of the new row.
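As an illustrative model only (plain Python, not the SQL the extension generates), the contract can be sketched as a list of signed rows, with an update expanding into a delete/insert pair:

```python
# Hypothetical model of the operator contract: a delta is a list of
# ('+', row) / ('-', row) tuples.

def update_to_delta(old_row, new_row):
    """An UPDATE is modeled as a delete of the old row plus an insert of the new row."""
    return [('-', old_row), ('+', new_row)]

delta = update_to_delta(('order-1', 'pending'), ('order-1', 'shipped'))
# delta == [('-', ('order-1', 'pending')), ('+', ('order-1', 'shipped'))]
```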
DIFFERENTIAL and IMMEDIATE maintenance require deterministic expressions. VOLATILE functions and custom operators such as random() or clock_timestamp() are rejected during stream table creation because re-evaluation would corrupt delta semantics. STABLE functions such as now() and current_timestamp are allowed with a warning; FULL mode accepts all volatility classes because it recomputes the full result on each refresh.
Operator Support Matrix
The following table shows which SQL constructs are supported under each refresh mode.
| SQL Construct | FULL | DIFFERENTIAL | IMMEDIATE | Notes |
|---|---|---|---|---|
| Basic | ||||
Simple SELECT / projection | ✅ | ✅ | ✅ | |
WHERE filter | ✅ | ✅ | ✅ | |
| Column expressions / aliases | ✅ | ✅ | ✅ | |
DISTINCT | ✅ | ✅ | ✅ | Uses __pgt_dup_count reference counting |
DISTINCT ON | ✅ | ✅ | ✅ | |
| Joins | ||||
INNER JOIN | ✅ | ✅ | ✅ | Hybrid delta strategy |
LEFT OUTER JOIN | ✅ | ✅ | ✅ | NULL-padding transitions tracked |
RIGHT OUTER JOIN | ✅ | ✅ | ✅ | |
FULL OUTER JOIN | ✅ | ✅ | ✅ | 8-part UNION ALL delta |
CROSS JOIN | ✅ | ✅ | ✅ | |
LATERAL JOIN | ✅ | ✅ | ✅ | Row-scoped re-execution |
| Multi-table join (≤2 right scans) | ✅ | ✅ | ✅ | Full phantom-row-after-DELETE fix |
| Multi-table join (≥3 right scans) | ✅ | ⚠️ | ⚠️ | Falls back to post-change snapshot for right subtree (EC-01 boundary, fix planned for v0.12.0) |
| Subqueries | ||||
EXISTS / IN (semi-join) | ✅ | ✅ | ✅ | Delta-key pre-filter on left side |
NOT EXISTS / NOT IN (anti-join) | ✅ | ✅ | ✅ | Inverted semantics; two-part delta |
| Scalar subquery (SELECT-list) | ✅ | ✅ | ✅ | Pre/post snapshot EXCEPT ALL diff |
Correlated LATERAL subquery | ✅ | ✅ | ✅ | |
| Set Operations | ||||
UNION ALL | ✅ | ✅ | ✅ | Dual-branch merge |
INTERSECT / INTERSECT ALL | ✅ | ✅ | ✅ | Dual-count tracking |
EXCEPT / EXCEPT ALL | ✅ | ✅ | ✅ | |
| Aggregates | ||||
COUNT, SUM, AVG | ✅ | ✅ | ✅ | Algebraic — fully invertible delta |
MIN, MAX | ✅ | ✅ | ✅ | Semi-algebraic — group rescan on ambiguous delete |
COUNT(DISTINCT), SUM(DISTINCT) | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
BOOL_AND, BOOL_OR, BIT_AND, BIT_OR | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
EVERY | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
STRING_AGG, ARRAY_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy — warning emitted at creation time in DIFFERENTIAL mode |
STDDEV, VARIANCE, STDDEV_POP, VAR_POP | ✅ | ✅ | ✅ | Algebraic via auxiliary M2/sum/count columns |
COVAR_SAMP, COVAR_POP, CORR | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
REGR_* (all 9 regression functions) | ✅ | ✅ | ✅ | Algebraic via auxiliary columns |
PERCENTILE_CONT, PERCENTILE_DISC | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
MODE | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
XMLAGG, JSON_AGG, JSONB_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
JSON_OBJECT_AGG, JSONB_OBJECT_AGG | ✅ | ⚠️ | ⚠️ | Group-rescan strategy |
GROUP BY / HAVING | ✅ | ✅ | ✅ | |
GROUP BY ROLLUP / CUBE / GROUPING SETS | ✅ | ✅ | ✅ | Branch count capped by max_grouping_set_branches (default 64) |
| Window Functions | ||||
ROW_NUMBER, RANK, DENSE_RANK | ✅ | ✅ | ✅ | Partition-scoped recompute |
LAG, LEAD, FIRST_VALUE, LAST_VALUE | ✅ | ✅ | ✅ | Partition-scoped recompute |
NTILE, CUME_DIST, PERCENT_RANK | ✅ | ✅ | ✅ | Partition-scoped recompute |
Window frame clauses (ROWS, RANGE, GROUPS) | ✅ | ✅ | ✅ | |
| CTEs | ||||
Non-recursive WITH | ✅ | ✅ | ✅ | Inlined or delta-cached (multi-ref) |
WITH RECURSIVE (INSERT-only workload) | ✅ | ✅ | ✅ | Semi-naive evaluation |
WITH RECURSIVE (mixed INSERT/DELETE/UPDATE) | ✅ | ✅ | ✅ | Delete-and-Rederive (DRed) strategy |
| TopK | ||||
ORDER BY … LIMIT N | ✅ | ✅ | ✅ | Scoped recomputation; metadata validated each refresh |
ORDER BY … LIMIT N OFFSET M | ✅ | ✅ | ✅ | |
| Lateral / SRF | ||||
LATERAL with set-returning function | ✅ | ✅ | ✅ | Row-scoped re-execution |
JSON_TABLE | ✅ | ✅ | ✅ | Via lateral function operator |
generate_series() | ✅ | ✅ | ✅ | |
unnest() | ✅ | ✅ | ✅ | |
| ST-to-ST Dependencies | ||||
| Stream table reading from another stream table | ✅ | ✅ | ✅ | Differential via changes_pgt_ buffers (v0.11.0); FULL upstream produces I/D diff so downstream stays differential |
| Multi-level ST chains | ✅ | ✅ | ✅ | Topological order; per-level delta propagation |
| Function Volatility | ||||
IMMUTABLE functions | ✅ | ✅ | ✅ | |
STABLE functions (now(), current_timestamp) | ✅ | ⚠️ | ⚠️ | Allowed with warning — value may differ between initial load and delta evaluation |
VOLATILE functions (random(), clock_timestamp()) | ✅ | ❌ | ❌ | Rejected at creation time — re-evaluation corrupts delta semantics |
Legend: ✅ = fully supported — ⚠️ = supported with caveats (see Notes column) — ❌ = not supported (blocked at creation time)
Operators
Scan
Module: src/dvm/operators/scan.rs
The leaf operator. Reads CDC changes from a source table's change buffer.
Delta Rule:
$$\Delta(\text{Scan}(R)) = \Delta R$$
The scan operator is a direct passthrough — inserts in the source become inserts in the output, deletes become deletes.
SQL Generation:
SELECT op, row_data FROM pgtrickle_changes.changes_<oid>
WHERE xid >= <last_consumed_xid>
Notes:
- Each source table has a dedicated change buffer table created by the CDC module.
- Row data is stored as JSONB with column names as keys.
- The __pgt_row_id column (xxHash of the primary key) is included for deduplication.
Filter
Module: src/dvm/operators/filter.rs
Applies a WHERE clause predicate to the delta stream.
Delta Rule:
$$\Delta(\sigma_p(R)) = \sigma_p(\Delta R)$$
Filtering is applied to the deltas in the same way as to the base data — only rows satisfying the predicate pass through.
SQL Generation:
SELECT * FROM (<input_delta>) AS d
WHERE <predicate>
Example:
If the defining query is:
SELECT * FROM orders WHERE status = 'shipped'
And a new row (id=5, status='pending') is inserted, it does not appear in the delta output. If (id=3, status='shipped') is inserted, it passes through.
Edge Cases:
- For updates that change the predicate column (e.g., status from 'pending' to 'shipped'), the CDC produces a delete of the old row and an insert of the new row. The filter passes the insert (it matches the predicate) and blocks the delete (the old row does not match), correctly resulting in a net insert.
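Because the filter is linear, this edge case falls out of applying the predicate to each signed row independently. A minimal Python model (illustrative names; the extension emits SQL):

```python
def filter_delta(delta, predicate):
    # σ_p is linear: apply the predicate to each signed row independently.
    return [(sign, row) for sign, row in delta if predicate(row)]

# An UPDATE of status 'pending' -> 'shipped' arrives as a delete/insert pair:
delta = [('-', {'id': 3, 'status': 'pending'}),
         ('+', {'id': 3, 'status': 'shipped'})]
out = filter_delta(delta, lambda r: r['status'] == 'shipped')
# Only the insert survives, so the view gains the row — a net insert.
```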
Project
Module: src/dvm/operators/project.rs
Applies column projection from the target list.
Delta Rule:
$$\Delta(\pi_L(R)) = \pi_L(\Delta R)$$
Projects the same columns from the delta that the query projects from the base data.
SQL Generation:
SELECT <target_columns> FROM (<input_delta>) AS d
Notes:
- Projection is applied after filtering for efficiency.
- Computed expressions in the target list (e.g., price * quantity AS total) are evaluated on the delta rows.
Join (Inner)
Module: src/dvm/operators/join.rs
Implements inner join between two inputs.
Delta Rule:
For $R \bowtie S$:
$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R' \bowtie \Delta S)$$
Where $R' = R \cup \Delta R$ (the new state of R after applying deltas).
In practice, when only one side has changes (common case), the delta join simplifies to joining the changed rows against the current state of the other side.
SQL Generation:
-- Changes to left side joined with current right side
SELECT '+' AS op, l.*, r.*
FROM (<left_delta> WHERE op = '+') AS l
JOIN <right_table> AS r ON <join_condition>
UNION ALL
-- Current left side joined with changes to right side
SELECT '+' AS op, l.*, r.*
FROM <left_table> AS l
JOIN (<right_delta> WHERE op = '+') AS r ON <join_condition>
(And corresponding DELETE queries for op = '-'.)
Notes:
- The join uses the current state of the non-changed side, not the change buffer.
- For equi-joins, this is efficient — the join key narrows the scan.
- Non-equi joins (theta joins) may require broader scans.
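The bilinear expansion can be checked against a from-scratch recomputation. The following insert-only Python sketch (hypothetical helper names; signed deltas with '-' rows are handled analogously as retractions) shows the delta producing exactly the rows the full recompute gains:

```python
def equi_join(left, right):
    # rows are (key, payload) pairs; join on the key
    return [(l, r) for l in left for r in right if l[0] == r[0]]

def join_delta(R, S, dR, dS):
    """Δ(R ⋈ S) = (ΔR ⋈ S) ∪ (R' ⋈ ΔS), where R' = R ∪ ΔR."""
    R_new = R + dR
    return equi_join(dR, S) + equi_join(R_new, dS)

R, S = [(1, 'a')], [(1, 'x')]
dR, dS = [(2, 'b')], [(2, 'y')]
delta = join_delta(R, S, dR, dS)
# delta contains only the new pair ((2, 'b'), (2, 'y')) — the same rows a
# full recomputation of the join would add.
```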
Outer Join
Module: src/dvm/operators/outer_join.rs (LEFT JOIN), src/dvm/operators/full_join.rs (FULL JOIN)
Implements LEFT, RIGHT, and FULL OUTER JOIN.
RIGHT JOIN Handling:
RIGHT JOIN is automatically converted to a LEFT JOIN with swapped left/right operands during query parsing. This normalization happens transparently — the user can write RIGHT JOIN and the parser rewrites it to an equivalent LEFT JOIN before the operator tree is constructed.
Delta Rule:
Similar to inner join, but additionally handles NULL-padded rows:
$$\Delta(R \text{ LEFT JOIN } S) = (\Delta R \bowtie_L S) \cup (R' \bowtie_L \Delta S)$$
With special handling for:
- Rows in ΔR that have no match in S → emit ('+', row, NULLs)
- Rows in ΔS that create a first match for an R row → emit ('-', row, NULLs) and ('+', row, s_data)
- Rows in ΔS that remove the last match for an R row → emit ('-', row, s_data) and ('+', row, NULLs)
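The first-match transition can be sketched as follows (an illustrative Python model with invented names, not the generated SQL; it covers only the case of a single row inserted into S):

```python
def left_join_s_insert_delta(R, S_old, new_s, match):
    """Delta for R LEFT JOIN S when one row `new_s` is inserted into S."""
    out = []
    for r in R:
        if match(r, new_s):
            if not any(match(r, s) for s in S_old):
                out.append(('-', (r, None)))   # retract the NULL-padded row
            out.append(('+', (r, new_s)))      # emit the newly joined row
    return out

# 'k1' previously had no match, so its NULL-padded row is retracted and
# replaced by the joined row; 'k2' is untouched.
out = left_join_s_insert_delta(['k1', 'k2'], [], 'k1', lambda r, s: r == s)
```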
SQL Generation (LEFT JOIN):
Uses anti-join detection (via NOT EXISTS) to correctly handle the NULL padding transitions.
FULL OUTER JOIN Delta Rule:
FULL OUTER JOIN extends the LEFT JOIN delta with symmetric right-side handling. The delta is computed as an 8-part UNION ALL:
- Parts 1–5: Same as LEFT JOIN delta (inserted/deleted rows from both sides, with NULL-padding transitions)
- Parts 6–7: Symmetric anti-join transitions for the right side (rows in ΔL that remove/create the last/first match for an S row)
- Part 8: Right-side insertions that have no match in the left side → emit ('+', NULLs, s_data)
Each part uses pre-computed delta flags (__has_ins_*, __has_del_*) to efficiently detect first-match/last-match transitions without redundant subqueries.
Nested Join Support:
Module: src/dvm/operators/join_common.rs
All join operators (inner, left, full) support nested children — i.e., a join whose left or right operand is itself another join. The join_common module provides shared helpers:
- build_snapshot_sql() — returns the table reference for simple (Scan) operands, or a parenthesized subquery with disambiguated columns for nested join operands
- rewrite_join_condition() — rewrites column references in ON conditions to use the correct alias prefixes for nested children (e.g., o.cust_id → dl.o__cust_id)
This enables queries with 3 or more joined tables, e.g.:
SELECT o.id, c.name, p.title
FROM orders o
JOIN customers c ON o.cust_id = c.id
JOIN products p ON o.prod_id = p.id
Limitations:
- FULL OUTER JOIN delta computation can be expensive due to dual-side NULL tracking (8 UNION ALL parts).
- Performance degrades with high-cardinality join keys.
- NATURAL JOIN is supported — common columns are resolved automatically and synthesized into an explicit equi-join condition.
- EC-01 pre-change snapshot boundary (SF-5): The phantom-row-after-DELETE fix (EC-01) uses EXCEPT ALL to reconstruct the pre-change state of a join side. This is limited to join subtrees with ≤ 2 scan nodes to avoid PostgreSQL temporary file exhaustion on wide join chains. For queries with ≥ 3 base tables on one side of a join (e.g. TPC-H Q7/Q8/Q9), a simultaneous DELETE on both join sides may leave a phantom row in the stream table until the next full refresh. See use_pre_change_snapshot() in join_common.rs for the full rationale.
Aggregate
Module: src/dvm/operators/aggregate.rs
Handles GROUP BY with aggregate functions (COUNT, SUM, AVG, MIN, MAX, BOOL_AND, BOOL_OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, JSON_ARRAYAGG, JSON_OBJECTAGG) and the FILTER (WHERE …) and WITHIN GROUP (ORDER BY …) clauses.
Delta Rule:
$$\Delta(\gamma_{G, \text{agg}}(R)) = \gamma_{G, \text{agg}}(R' \text{ WHERE } G \in \text{affected_keys}) - \gamma_{G, \text{agg}}(R \text{ WHERE } G \in \text{affected_keys})$$
Where:
- $G$ = grouping columns
- affected_keys = the set of group key values that appear in ΔR
- $R'$ = $R \cup \Delta R$ (the new state)
Strategy:
- Identify affected groups — Collect all group key values that appear in the delta (either inserted or deleted rows).
- Recompute old values — Query the storage table for current aggregate values of affected groups.
- Recompute new values — Query the updated source for new aggregate values of affected groups.
- Diff — For each affected group:
  - If old exists and new differs → emit ('-', old) and ('+', new)
  - If old exists and new is gone → emit ('-', old) (group eliminated)
  - If no old and new exists → emit ('+', new) (new group appeared)
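The affected-groups strategy can be modeled in a few lines of Python (illustrative only, using SUM; function names are invented — the extension does this with generated SQL against the storage table and source):

```python
from collections import defaultdict

def sum_by_group(rows, groups):
    # rows are (group_key, value) pairs; aggregate only the affected groups
    totals = defaultdict(int)
    for g, v in rows:
        if g in groups:
            totals[g] += v
    return dict(totals)

def aggregate_delta(old_rows, new_rows, delta):
    # 1. Affected group keys are exactly those appearing in the delta.
    affected = {g for _sign, (g, _v) in delta}
    old = sum_by_group(old_rows, affected)   # current aggregate values
    new = sum_by_group(new_rows, affected)   # post-change aggregate values
    out = []
    for g in affected:
        if g in old and g in new and old[g] != new[g]:
            out += [('-', (g, old[g])), ('+', (g, new[g]))]
        elif g in old and g not in new:
            out.append(('-', (g, old[g])))    # group eliminated
        elif g not in old and g in new:
            out.append(('+', (g, new[g])))    # new group appeared
    return out
```

Groups untouched by the delta never appear in `affected` and cost nothing, which is the point of the strategy.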
Supported Aggregate Functions:
| Function | DVM Strategy | Notes |
|---|---|---|
COUNT(*) | Algebraic | Fully differential |
COUNT(expr) | Algebraic | Fully differential |
SUM(expr) | Algebraic | Fully differential |
AVG(expr) | Algebraic | Decomposed to SUM/COUNT internally |
MIN(expr) | Semi-algebraic | Uses LEAST merge; falls back to per-group rescan when min row is deleted |
MAX(expr) | Semi-algebraic | Uses GREATEST merge; falls back to per-group rescan when max row is deleted |
BOOL_AND(expr) | Group-rescan | Affected groups are re-aggregated from source data |
BOOL_OR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
STRING_AGG(expr, sep) | Group-rescan | Affected groups are re-aggregated from source data |
ARRAY_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
JSON_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
JSONB_AGG(expr) | Group-rescan | Affected groups are re-aggregated from source data |
BIT_AND(expr) | Group-rescan | Affected groups are re-aggregated from source data |
BIT_OR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
BIT_XOR(expr) | Group-rescan | Affected groups are re-aggregated from source data |
JSON_OBJECT_AGG(key, value) | Group-rescan | Affected groups are re-aggregated from source data |
JSONB_OBJECT_AGG(key, value) | Group-rescan | Affected groups are re-aggregated from source data |
STDDEV_POP(expr) / STDDEV(expr) | Group-rescan | Affected groups are re-aggregated from source data |
STDDEV_SAMP(expr) | Group-rescan | Affected groups are re-aggregated from source data |
VAR_POP(expr) | Group-rescan | Affected groups are re-aggregated from source data |
VAR_SAMP(expr) / VARIANCE(expr) | Group-rescan | Affected groups are re-aggregated from source data |
MODE() WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
PERCENTILE_CONT(frac) WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
PERCENTILE_DISC(frac) WITHIN GROUP (ORDER BY expr) | Group-rescan | Ordered-set aggregate; affected groups re-aggregated |
CORR(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
COVAR_POP(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
COVAR_SAMP(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_AVGX(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_AVGY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_COUNT(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_INTERCEPT(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_R2(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_SLOPE(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_SXX(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_SXY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
REGR_SYY(Y, X) | Group-rescan | Regression aggregate; affected groups re-aggregated |
ANY_VALUE(expr) | Group-rescan | PostgreSQL 16+; affected groups re-aggregated |
JSON_ARRAYAGG(expr ...) | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
JSON_OBJECTAGG(key: value ...) | Group-rescan | SQL-standard JSON aggregation (PostgreSQL 16+); full deparsed SQL preserved |
User-defined aggregates (CREATE AGGREGATE) | Group-rescan | Any custom aggregate is supported via group-rescan; full aggregate call SQL preserved verbatim |
FILTER Clause:
All aggregate functions support the FILTER (WHERE …) clause:
SELECT COUNT(*) FILTER (WHERE status = 'active') AS active_count FROM orders GROUP BY region
The filter predicate is applied within the delta computation — only rows matching the filter contribute to the aggregate delta. Filtered aggregates are excluded from the P5 direct-bypass optimization.
SQL Generation:
The aggregate operator uses a 3-CTE pipeline:
- Merge CTE — Joins affected group keys against old (storage) and new (source) aggregate values, producing __pgt_meta_action ('I' for new-only groups, 'D' for disappeared groups, 'U' for changed groups).
- LATERAL VALUES expansion — A single-pass LATERAL (VALUES ...) clause expands each merge row into insert and delete actions, avoiding a 4-branch UNION ALL:
FROM merge_cte m,
LATERAL (VALUES
('I', m.new_count, m.new_total),
('D', m.old_count, m.old_total)
) v(action, count_val, val_total)
WHERE (m.__pgt_meta_action = 'I' AND v.action = 'I')
OR (m.__pgt_meta_action = 'D' AND v.action = 'D')
OR (m.__pgt_meta_action = 'U')
- Final projection — Emits ('+', row) and ('-', row) tuples for the refresh engine.
MIN/MAX Merge Strategy:
MIN and MAX use a semi-algebraic strategy with two cases:
- Non-extremum deletion — When the deleted row is NOT the current minimum (or maximum), the merge uses LEAST(old_value, new_inserts) for MIN or GREATEST(old_value, new_inserts) for MAX. This is fully algebraic and requires no rescan.
- Extremum deletion — When the row holding the current minimum (or maximum) IS deleted, the new value cannot be computed from the delta alone. The merge expression returns NULL as a sentinel, which triggers the change-detection guard (IS DISTINCT FROM) to emit the group for re-aggregation. The MERGE layer treats this as a DELETE + INSERT pair, recomputing the group from source data. This is still more efficient than a full table refresh since only affected groups are rescanned.
Distinct
Module: src/dvm/operators/distinct.rs
Implements SELECT DISTINCT using reference counting.
Delta Rule:
$$\Delta(\delta(R)) = \{\, r \in \Delta R : \text{count}(r, R) = 0 \land \text{count}(r, R') > 0 \,\} - \{\, r \in \Delta R : \text{count}(r, R) > 0 \land \text{count}(r, R') = 0 \,\}$$
In other words:
- A row enters the output when its count transitions from 0 to ≥1
- A row leaves the output when its count transitions from ≥1 to 0
Strategy:
Maintains a hidden __pgt_dup_count column in the storage table to track how many times each distinct row appears in the pre-distinct input.
- On insert: increment the count. If the count was 0, emit ('+', row).
- On delete: decrement the count. If the count becomes 0, emit ('-', row).
Notes:
- The duplicate count is not visible in user queries against the storage table (projected away by the view layer).
- Duplicate counting uses __pgt_row_id (xxHash) for efficient lookups.
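The reference-counting rule maps directly onto a counter keyed by row (an illustrative Python model; the extension keeps the count in the hidden __pgt_dup_count column instead):

```python
from collections import Counter

def distinct_delta(counts, delta):
    """counts: Counter mapping row -> current multiplicity."""
    out = []
    for sign, row in delta:
        if sign == '+':
            counts[row] += 1
            if counts[row] == 1:          # 0 -> 1 transition: row enters output
                out.append(('+', row))
        else:
            counts[row] -= 1
            if counts[row] == 0:          # 1 -> 0 transition: row leaves output
                out.append(('-', row))
    return out

counts = Counter({'x': 2})
out = distinct_delta(counts, [('-', 'x'), ('-', 'x'), ('+', 'y')])
# 'x' drops from 2 to 0 and leaves; 'y' appears for the first time.
```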
Union All
Module: src/dvm/operators/union_all.rs
Merges deltas from two branches.
Delta Rule:
$$\Delta(R \cup_{\text{all}} S) = \Delta R \cup_{\text{all}} \Delta S$$
Simply concatenates the delta streams from both branches.
SQL Generation:
SELECT * FROM (<left_delta>)
UNION ALL
SELECT * FROM (<right_delta>)
Notes:
- Column count and types must match between branches.
- Each branch is independently processed through its own operator sub-tree.
- This is the simplest operator since UNION ALL preserves all duplicates.
Intersect
Module: src/dvm/operators/intersect.rs
Implements INTERSECT and INTERSECT ALL using dual-count per-branch multiplicity tracking.
Delta Rule:
$$\Delta(R \cap S): \text{emit rows where } \min(\text{count}_L, \text{count}_R) \text{ crosses the 0 boundary}$$
- INTERSECT (set): a row is present when both branches contain it.
- INTERSECT ALL (bag): a row appears $\min(\text{count}_L, \text{count}_R)$ times.
SQL Generation (3-CTE chain):
- Delta CTE — tags rows from left/right child deltas with a branch indicator ('L'/'R') and computes per-row net_count.
- Merge CTE — joins with the storage table to compute old and new per-branch counts (__pgt_count_l, __pgt_count_r).
- Final CTE — detects boundary crossings using LEAST(old_count_l, old_count_r) vs LEAST(new_count_l, new_count_r).
Notes:
- Storage table requires hidden columns __pgt_count_l and __pgt_count_r for multiplicity tracking.
- Both set and bag variants share the same 3-CTE structure and the same LEAST-based effective count; they differ only in how multiplicity changes above the 0 boundary are emitted.
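The boundary-crossing check for the set variant reduces to comparing old and new LEAST values (an illustrative Python model with invented names; counters stand in for the hidden per-branch count columns):

```python
from collections import Counter

def intersect_delta(old_l, old_r, new_l, new_r):
    """Emit rows whose effective count min(count_l, count_r) crosses 0."""
    out = []
    for row in set(old_l) | set(old_r) | set(new_l) | set(new_r):
        old_c = min(old_l[row], old_r[row])   # LEAST(old_count_l, old_count_r)
        new_c = min(new_l[row], new_r[row])   # LEAST(new_count_l, new_count_r)
        if old_c == 0 and new_c > 0:
            out.append(('+', row))            # row enters the intersection
        elif old_c > 0 and new_c == 0:
            out.append(('-', row))            # row leaves the intersection
    return out

# 'a' gains its first match on the right branch, so it enters the result.
out = intersect_delta(Counter({'a': 1}), Counter(),
                      Counter({'a': 1}), Counter({'a': 2}))
```

The Except operator swaps the effective-count function to GREATEST(0, count_l - count_r) but is otherwise identical.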
Except
Module: src/dvm/operators/except.rs
Implements EXCEPT and EXCEPT ALL using dual-count per-branch multiplicity tracking.
Delta Rule:
$$\Delta(R - S): \text{emit rows where } \max(0, \text{count}_L - \text{count}_R) \text{ crosses the 0 boundary}$$
- EXCEPT (set): a row is present when it exists in the left but not the right branch.
- EXCEPT ALL (bag): a row appears $\max(0, \text{count}_L - \text{count}_R)$ times.
SQL Generation (3-CTE chain):
- Delta CTE — same as Intersect: tags rows from both child deltas with branch indicator.
- Merge CTE — joins with storage table for old/new per-branch counts.
- Final CTE — detects boundary crossings using GREATEST(0, old_count_l - old_count_r) vs GREATEST(0, new_count_l - new_count_r).
Notes:
- EXCEPT is not commutative — left branch is the positive input, right is subtracted.
- Storage table requires hidden columns __pgt_count_l and __pgt_count_r.
- Same 3-CTE structure as Intersect with a different effective-count function.
Subquery
Module: src/dvm/operators/subquery.rs
Handles both inlined CTEs and explicit subqueries in the FROM clause ((SELECT ...) AS alias).
Delta Rule:
$$\Delta(\rho_{\text{alias}}(Q)) = \rho_{\text{alias}}(\Delta Q)$$
A subquery wrapper is transparent for differentiation — it delegates to its child's delta and optionally renames output columns to match the subquery's column aliases.
SQL Generation:
-- If column aliases differ from child output columns:
SELECT __pgt_row_id, __pgt_action, child_col1 AS alias_col1, child_col2 AS alias_col2
FROM (<child_delta>)
If the child columns already match the aliases, the subquery is a pure passthrough — no additional CTE is emitted.
Notes:
- This operator enables both CTE support (Tier 1) and standalone subqueries in FROM.
- Column aliases on subqueries (FROM (...) AS x(a, b)) are handled by emitting a thin renaming CTE.
- The subquery body is fully differentiated as a normal operator sub-tree.
CTE Scan (Shared Delta)
Module: src/dvm/operators/cte_scan.rs
Handles multi-reference CTEs by computing the CTE body's delta once and reusing it across all references (Tier 2).
Delta Rule:
$$\Delta(\text{CteScan}(\text{id}, Q)) = \text{cache}[\text{id}] \quad \text{(computed once, reused)}$$
When a CTE is referenced multiple times in a query, each reference produces a CteScan node with the same cte_id. The diff engine differentiates the CTE body once and caches the result. Subsequent CteScan nodes for the same CTE reuse the cached delta.
SQL Generation:
-- First reference: differentiates the CTE body and stores result in cache
-- Subsequent references: point to the same system CTE name
SELECT __pgt_row_id, __pgt_action, <columns>
FROM __pgt_cte_<cte_name>_delta -- shared across all references
If column aliases are present, a thin renaming CTE is added on top of the cached delta.
Notes:
- Without CteScan (Tier 1), multi-reference CTEs are inlined: each reference duplicates the full operator sub-tree. CteScan (Tier 2) eliminates this duplication.
- The CTE body is pre-differentiated in dependency order (earlier CTEs before later ones that reference them).
- Column alias support follows the same pattern as the Subquery operator.
Recursive CTEs
Recursive CTEs (WITH RECURSIVE) are supported in FULL, DIFFERENTIAL, and IMMEDIATE modes, with different execution paths depending on the refresh mode:
FULL Mode
Recursive CTEs work out-of-the-box with refresh_mode = 'FULL'. The defining query is executed as-is via INSERT INTO ... SELECT ..., and PostgreSQL handles the iterative evaluation internally.
DIFFERENTIAL Mode (Three-Strategy Incremental Maintenance)
Recursive CTEs with refresh_mode = 'DIFFERENTIAL' use an automatic three-strategy approach, selected based on column compatibility and change type:
Strategy 1: Semi-Naive Evaluation (INSERT-only changes)
When only INSERT changes are present in the change buffer, pg_trickle uses semi-naive evaluation — the standard technique for incremental fixpoint computation. The base case is differentiated normally through the DVM operator tree, then the resulting delta is propagated through the recursive term using a nested WITH RECURSIVE:
WITH RECURSIVE
__pgt_base_delta AS (
-- Normal DVM differentiation of the base case (INSERT rows only)
<differentiated base case>
),
__pgt_rec_delta AS (
-- Seed: base case delta rows
SELECT cols FROM __pgt_base_delta WHERE __pgt_action = 'I'
UNION ALL
-- Seed: new base rows joining existing ST storage
SELECT cols FROM <recursive term with self_ref = ST_storage, base = change_buffer>
UNION ALL
-- Propagation: recursive term applied to growing delta
SELECT cols FROM <recursive term with self_ref = __pgt_rec_delta, base = full>
)
SELECT pgtrickle.pg_trickle_hash(...) AS __pgt_row_id, 'I' AS __pgt_action, cols
FROM __pgt_rec_delta
The cost is proportional to the number of new rows produced by the change, not the full result set.
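Semi-naive evaluation is easiest to see on transitive closure. The following is an illustrative Python model, not the generated SQL: the closure plays the role of the ST storage, and only rows reachable through the new edges are derived.

```python
def transitive_closure(edges):
    # naive fixpoint; used here only to check the incremental result
    closure = set(edges)
    while True:
        new = {(a, c) for (a, b) in closure
                      for (b2, c) in closure if b == b2} - closure
        if not new:
            return closure
        closure |= new

def semi_naive_insert(closure, new_edges):
    """Propagate newly inserted edges through an existing closure; cost is
    proportional to the rows the insert adds, not the full result set."""
    known = set(closure)
    delta = set(new_edges) - known
    added = set()
    while delta:
        added |= delta
        known |= delta
        # bilinear semi-naive step: join the delta against everything known
        frontier = ({(a, c) for (a, b) in delta for (b2, c) in known if b == b2} |
                    {(a, c) for (a, b) in known for (b2, c) in delta if b == b2})
        delta = frontier - known
    return added
```

Inserting edge (2, 3) into a graph whose closure is {(1, 2)} derives only {(2, 3), (1, 3)} — exactly the rows a from-scratch recomputation would add.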
Strategy 2: Delete-and-Rederive / DRed (mixed INSERT/DELETE/UPDATE changes)
When the change buffer contains DELETE or UPDATE changes, simple propagation is insufficient — a deleted base row may have transitively derived many recursive rows, some of which may still be derivable from alternative paths. DRed handles this in four phases:
- Insert propagation — semi-naive evaluation for the INSERT portion (same as Strategy 1)
- Over-deletion cascade — propagate base-case deletions through the recursive term against ST storage to find all transitively-derived rows that might be invalidated
- Rederivation — re-execute the recursive CTE from the remaining (non-deleted) base rows to restore any over-deleted rows that have alternative derivations
- Combine — final delta = inserts + (over-deletions − rederived rows)
This avoids full recomputation while correctly handling deletions with alternative derivation paths.
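The four phases can be sketched on transitive closure (an illustrative Python model with invented names; the real implementation runs these phases as SQL against ST storage, and the rederivation phase is shown here as a simple fixpoint for brevity):

```python
def dred_delete(edges, closure, deleted_edges):
    """Delete-and-Rederive sketch. `closure` is the materialized result;
    returns the net deletions (Phase 4)."""
    remaining = edges - deleted_edges
    # Phase 2: over-deletion cascade — mark every pair that has some
    # derivation involving an already-suspect pair.
    suspect = set(deleted_edges) & closure
    changed = True
    while changed:
        changed = False
        for (a, c) in list(closure - suspect):
            if (a, c) in remaining:
                continue                      # surviving base fact
            for (x, b) in closure:
                if x == a and (b, c) in closure and \
                        ((a, b) in suspect or (b, c) in suspect):
                    suspect.add((a, c))
                    changed = True
                    break
    # Phase 3: rederivation — re-run the recursive step from surviving
    # facts; suspect pairs still derivable are restored.
    survivors = (closure - suspect) | remaining
    while True:
        new = {(a, c) for (a, b) in survivors
                      for (b2, c) in survivors if b == b2} - survivors
        if not new:
            break
        survivors |= new
    # Phase 4: combine — net deletions = over-deletions - rederivations
    return suspect - survivors
```

In a diamond graph 1→2→4 and 1→3→4, deleting edge (2, 4) over-deletes (1, 4), but the rederivation phase restores it via the surviving path through 3, so only (2, 4) is actually removed.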
IMMEDIATE Mode
Recursive CTEs with refresh_mode = 'IMMEDIATE' use the same semi-naive and
Delete-and-Rederive machinery as DIFFERENTIAL mode, but the base changes come
from PostgreSQL statement transition tables instead of the background change
buffer. This keeps the stream table transactionally up to date within the same
statement. To guard against cyclic data or unexpectedly deep recursion, the
semi-naive SQL injects a depth counter capped by
pg_trickle.ivm_recursive_max_depth (default 100; set to 0 to disable the
guard).
Strategy 3: Recomputation Fallback
When the CTE defines more columns than the outer SELECT projects (column mismatch), the incremental strategies cannot be used because the ST storage table lacks columns needed for recursive self-joins. In this case, the full defining query is re-executed and anti-joined against current storage:
WITH __pgt_recomp_new AS (
SELECT pgtrickle.pg_trickle_hash(row_to_json(sub)::text) AS __pgt_row_id, col1, col2, ...
FROM (<defining_query>) sub
),
__pgt_recomp_ins AS (
SELECT n.__pgt_row_id, 'I'::text AS __pgt_action, n.col1, n.col2, ...
FROM __pgt_recomp_new n
LEFT JOIN <storage_table> s ON s.__pgt_row_id = n.__pgt_row_id
WHERE s.__pgt_row_id IS NULL
),
__pgt_recomp_del AS (
SELECT s.__pgt_row_id, 'D'::text AS __pgt_action, s.col1, s.col2, ...
FROM <storage_table> s
LEFT JOIN __pgt_recomp_new n ON n.__pgt_row_id = s.__pgt_row_id
WHERE n.__pgt_row_id IS NULL
)
SELECT * FROM __pgt_recomp_ins
UNION ALL
SELECT * FROM __pgt_recomp_del
The cost is proportional to the full result set size.
Strategy Selection
| CTE columns match ST? | Change type | refresh_mode / DeltaSource | Strategy |
|---|---|---|---|
| Match | INSERT-only | DIFFERENTIAL (ChangeBuffer) | Semi-naive (Strategy 1) |
| Match | Mixed (INSERT+DELETE/UPDATE) | DIFFERENTIAL (ChangeBuffer) | DRed (Strategy 2) |
| Match | INSERT-only | IMMEDIATE (TransitionTable) | Semi-naive (Strategy 1) |
| Match | Mixed (INSERT+DELETE/UPDATE) | IMMEDIATE (TransitionTable) | DRed (Strategy 2) |
| Mismatch | Any | Any | Recomputation (Strategy 3) |
DRed in DIFFERENTIAL mode (P2-1, implemented in v0.10.0)
DRed is now active in both DIFFERENTIAL and IMMEDIATE modes when CTE output columns match ST storage columns. Phase 1 propagates inserts via semi-naive evaluation; Phase 2 cascades deletions through ST storage; Phase 3 rederives over-deleted rows that have alternative derivation paths; Phase 4 combines the results. DRed correctly handles derived-column changes such as path rebuilds under a renamed ancestor node. Column-mismatch cases still use recomputation fallback.
Notes:
- Non-linear recursion (multiple self-references in the recursive term) is rejected — PostgreSQL restricts the recursive term to reference the CTE at most once.
- The `__pgt_row_id` column (xxHash of the JSON-serialized row) is used for row identity.
- For write-heavy workloads on very large recursive result sets with frequent mixed changes, `refresh_mode = 'FULL'` may still be more efficient than DRed.
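As a concrete illustration, a recursive hierarchy view is declared like any other stream table; a sketch with a hypothetical `employees(id, name, manager_id)` table:

```sql
-- Hypothetical org chart: inserts/deletes on `employees` flow through the
-- semi-naive or DRed delta rather than re-running the whole recursion.
SELECT pgtrickle.create_stream_table(
  name => 'org_paths',
  query => $$
    WITH RECURSIVE paths AS (
      SELECT id, name, name::text AS path
      FROM employees WHERE manager_id IS NULL
      UNION ALL
      SELECT e.id, e.name, p.path || ' > ' || e.name
      FROM employees e JOIN paths p ON e.manager_id = p.id
    )
    SELECT * FROM paths
  $$,
  schedule => '30s'
);
```

Renaming an ancestor here changes a derived column (`path`) for a whole subtree, which is exactly the case DRed's delete/rederive phases handle.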
Window Functions
Module: src/dvm/operators/window.rs
Handles window functions (ROW_NUMBER, RANK, DENSE_RANK, SUM() OVER, etc.) using partition-based recomputation.
Delta Rule:
When any row in a partition changes (insert, update, or delete), the entire partition's window function output is recomputed:
$$\Delta(\omega_{f, P}(R)) = \omega_{f, P}(R'|_{\text{affected partitions}}) - \omega_{f, P}(R|_{\text{affected partitions}})$$
Where $P$ is the PARTITION BY key and $f$ is the window function.
Strategy:
- Identify affected partition keys from the child delta.
- Delete old window function results for affected partitions from storage.
- Build the current input for affected partitions by excluding changed rows via NOT EXISTS on pass-through columns.
- Recompute the window function on the current input for affected partitions.
- Compute unique row IDs via `row_to_json` + `row_number` (handles tied values in ranking functions).
- Emit the recomputed rows as inserts.
SQL Generation:
-- CTE 1: Affected partition keys from delta
WITH affected_partitions AS (
SELECT DISTINCT <partition_cols> FROM (<child_delta>)
),
-- CTE 2: Current input (surviving rows not in delta) for affected partitions
current_input AS (
SELECT * FROM <child_snapshot>
WHERE (<partition_cols>) IN (SELECT * FROM affected_partitions)
AND NOT EXISTS (
SELECT 1 FROM (<child_delta>) d
WHERE d.<col1> IS NOT DISTINCT FROM <child_alias>.<col1>
AND d.<col2> IS NOT DISTINCT FROM <child_alias>.<col2> ...
)
),
-- CTE 3: Recompute window function with unique row IDs
recomputed AS (
SELECT *, pgtrickle.pg_trickle_hash(
row_to_json(w)::text || '/' || row_number() OVER ()::text
) AS __pgt_row_id
FROM (
SELECT *, <window_func> OVER (PARTITION BY <partition_cols> ORDER BY <order_cols>) AS <alias>
FROM current_input
) w
)
-- Delete old results + insert recomputed results
SELECT 'D' AS __pgt_action, ... -- old rows from affected partitions
UNION ALL
SELECT 'I' AS __pgt_action, ... -- recomputed rows
Notes:
- The cost is proportional to the size of affected partitions, not the full table. For workloads where changes spread across few partitions, this is efficient.
- When multiple window functions use different PARTITION BY clauses, the parser accepts all of them. If they share the same partition key it is used directly; otherwise the operator falls back to un-partitioned (full) recomputation.
- Without PARTITION BY, the entire table is treated as a single partition — any change triggers a full recomputation.
- Window functions wrapping aggregates (e.g., `RANK() OVER (ORDER BY SUM(x))`) are supported: the window diff rewrites ORDER BY / PARTITION BY expressions to reference aggregate output aliases via `build_agg_alias_map`.
- Row IDs are computed from the full row content (`row_to_json`) plus a positional disambiguator (`row_number`) to avoid hash collisions with tied ranking values (`DENSE_RANK`, `RANK`).
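For example, a per-category ranking view (hypothetical `products` schema) maintains itself with partition-scoped recomputation: changing one product's price recomputes only that category's partition.

```sql
SELECT pgtrickle.create_stream_table(
  name => 'product_ranks',
  query => 'SELECT id, category, price,
                   RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
            FROM products',
  schedule => '1m'
);
```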
Known Limitation: O(partition_size) Recomputation Cost
Any single-row change within a window partition triggers recomputation of the entire partition. For queries with large partitions (e.g., `PARTITION BY region` where a region has 500K rows), a single INSERT into that partition causes all 500K rows to be recomputed and diffed. This is inherent to the partition-based delta strategy — window functions cannot be incrementally maintained at sub-partition granularity, because a single row insertion can shift the rank, row number, or running aggregate of every other row in the same partition.
Mitigation strategies:
- Use more granular `PARTITION BY` keys to keep partition sizes small.
- For queries without `PARTITION BY`, consider restructuring as a `GROUP BY` aggregate if the window function is equivalent (e.g., `SUM(x) OVER ()` → `SUM(x)` as a scalar subquery).
- Accept the cost for low-change-frequency partitions; the recomputation is still cheaper than a full table refresh since only affected partitions are touched.
- If partition sizes routinely exceed 100K rows and changes are frequent, consider the FULL refresh mode, which bypasses the per-partition delta entirely.
Window Frame Clauses:
Window frame specifications are fully supported:
- Modes: `ROWS`, `RANGE`, `GROUPS`
- Bounds: `UNBOUNDED PRECEDING`, `N PRECEDING`, `CURRENT ROW`, `N FOLLOWING`, `UNBOUNDED FOLLOWING`
- Between syntax: `BETWEEN <start> AND <end>`
- Exclusion: `EXCLUDE CURRENT ROW`, `EXCLUDE GROUP`, `EXCLUDE TIES`, `EXCLUDE NO OTHERS`
Example: `SUM(val) OVER (ORDER BY ts ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)`
Named WINDOW Clauses:
Named window definitions are resolved from the query-level WINDOW clause:
SELECT id, SUM(val) OVER w, AVG(val) OVER w
FROM data
WINDOW w AS (PARTITION BY category ORDER BY ts)
The parser resolves OVER w by looking up the window definition from the WINDOW clause and merging partition, order, and frame specifications.
Lateral Function (Set-Returning Functions in FROM)
Module: src/dvm/operators/lateral_function.rs
Handles set-returning functions (SRFs) used in the FROM clause with implicit LATERAL semantics: jsonb_array_elements, jsonb_each, jsonb_each_text, unnest, etc.
Delta Rule:
When a source row changes (insert, update, or delete), the SRF expansion is re-evaluated only for that source row:
$$\Delta(R \ltimes f(R.\text{col})) = (R' \ltimes f(R'.\text{col}))|_{\text{changed rows}} - (R \ltimes f(R.\text{col}))|_{\text{changed rows}}$$
Where $R$ is the source table, $f$ is the SRF, and changed rows are identified via the child delta.
Strategy (Row-Scoped Recomputation):
- Propagate the child delta to identify changed source rows.
- Find all existing ST rows derived from changed source rows (via column matching).
- Delete old SRF expansions for those source rows.
- Re-expand the SRF for inserted/updated source rows.
- Emit deletes + inserts as the final delta.
SQL Generation (4-CTE chain):
-- CTE 1: Changed source rows from child delta
WITH lat_changed AS (
SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
FROM <child_delta>
),
-- CTE 2: Old ST rows for changed source rows (to be deleted)
lat_old AS (
SELECT st."__pgt_row_id", st.<all_output_cols>
FROM <st_table> st
WHERE EXISTS (
SELECT 1 FROM lat_changed cs
WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
...
)
),
-- CTE 3: Re-expand SRF for inserted/updated source rows
lat_expand AS (
SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
cs.<child_cols>, <srf_alias>.<srf_cols>
FROM lat_changed cs,
LATERAL <srf_function>(cs.<arg>) AS <srf_alias>
WHERE cs."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_final AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_old
UNION ALL
SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_expand
)
Row Identity:
Content-based: hash(child_columns || srf_result_columns). This is stable as long as the same source row produces the same expanded values.
Supported SRFs:
| Function | Output Columns | Notes |
|---|---|---|
| `jsonb_array_elements(jsonb)` | `value` (jsonb) | Expands JSONB array to rows |
| `jsonb_array_elements_text(jsonb)` | `value` (text) | Text variant |
| `jsonb_each(jsonb)` | `key` (text), `value` (jsonb) | Expands JSONB object to key-value pairs |
| `jsonb_each_text(jsonb)` | `key` (text), `value` (text) | Text variant |
| `unnest(anyarray)` | Element type | Unnests PostgreSQL arrays |
| Custom SRFs | User-provided column aliases | `AS alias(col1, col2)` |
Notes:
- The cost is proportional to the number of changed source rows × average SRF expansion size, not the full table.
- `WITH ORDINALITY` is supported — adds a `bigint` ordinality column to the output. `ROWS FROM()` with multiple functions is not supported (rejected at parse time).
- Column aliases (e.g., `AS child(value)`) are used to determine output column names; for known SRFs without aliases, the function's default column names are used.
- JSON_TABLE (PostgreSQL 17+) — `JSON_TABLE(expr, path COLUMNS (...))` is modeled as a `LateralFunction` and uses the same row-scoped recomputation strategy. Supported column types: regular, EXISTS, formatted, and nested columns with `ON ERROR` / `ON EMPTY` behaviors and `PASSING` clauses.
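A typical use case (hypothetical `orders` table with a JSONB `items` array) flattens the array into rows; when one source row changes, only that row's elements are re-expanded:

```sql
SELECT pgtrickle.create_stream_table(
  name => 'order_items',
  query => 'SELECT o.id AS order_id, it.value AS item
            FROM orders o, jsonb_array_elements(o.items) AS it(value)',
  schedule => '30s'
);
```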
Lateral Subquery (Correlated Subqueries in FROM)
Module: src/dvm/operators/lateral_subquery.rs
Handles correlated subqueries used in the FROM clause with explicit or implicit LATERAL semantics: FROM t, LATERAL (SELECT ... WHERE ref = t.col) AS alias or FROM t LEFT JOIN LATERAL (...) AS alias ON true.
Delta Rule:
When an outer row changes, the correlated subquery is re-executed only for that row:
$$\Delta(R \ltimes Q(R)) = (R' \ltimes Q(R'))|_{\text{changed rows}} - (R \ltimes Q(R))|_{\text{changed rows}}$$
Where $R$ is the outer table, $Q(R)$ is the correlated subquery, and changed rows are identified via the child delta.
Strategy (Row-Scoped Recomputation):
- Propagate the child delta to identify changed outer rows.
- Find all existing ST rows derived from changed outer rows (via column matching with `IS NOT DISTINCT FROM`).
- Delete old subquery expansions for those outer rows.
- Re-execute the subquery for inserted/updated outer rows using the original outer alias.
- Emit deletes + inserts as the final delta.
SQL Generation (4-CTE chain):
-- CTE 1: Changed outer rows from child delta
WITH lat_sq_changed AS (
SELECT DISTINCT "__pgt_row_id", "__pgt_action", <child_cols>
FROM <child_delta>
),
-- CTE 2: Old ST rows for changed outer rows (to be deleted)
lat_sq_old AS (
SELECT st."__pgt_row_id", st.<all_output_cols>
FROM <st_table> st
WHERE EXISTS (
SELECT 1 FROM lat_sq_changed cs
WHERE st.<col1> IS NOT DISTINCT FROM cs.<col1>
AND st.<col2> IS NOT DISTINCT FROM cs.<col2>
...
)
),
-- CTE 3: Re-execute subquery for inserted/updated outer rows
lat_sq_expand AS (
SELECT pg_trickle_hash(<all_cols>::text) AS "__pgt_row_id",
<outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
FROM lat_sq_changed AS <outer_alias>, -- Original outer alias!
LATERAL (<subquery_sql>) AS <sub_alias>
WHERE <outer_alias>."__pgt_action" = 'I'
),
-- CTE 4: Final delta
lat_sq_final AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols> FROM lat_sq_old
UNION ALL
SELECT "__pgt_row_id", 'I' AS "__pgt_action", <cols> FROM lat_sq_expand
)
LEFT JOIN LATERAL Handling:
For queries using LEFT JOIN LATERAL (...) ON true, the expand CTE uses LEFT JOIN LATERAL instead of comma syntax and wraps subquery columns in COALESCE for hash stability:
lat_sq_expand AS (
SELECT pg_trickle_hash(<outer_cols>::text || '/' || COALESCE(<sub_cols>::text, '')) AS "__pgt_row_id",
<outer_alias>.<child_cols>, <sub_alias>.<sub_cols>
FROM lat_sq_changed AS <outer_alias>
LEFT JOIN LATERAL (<subquery_sql>) AS <sub_alias> ON true
WHERE <outer_alias>."__pgt_action" = 'I'
)
Row Identity:
Content-based: hash(outer_columns || '/' || subquery_result_columns). For LEFT JOIN with NULL results, COALESCE ensures a stable hash.
Supported Patterns:
| Pattern | Syntax | Notes |
|---|---|---|
| Top-N per group | LATERAL (SELECT ... ORDER BY ... LIMIT N) | Most common use case |
| Correlated aggregate | LATERAL (SELECT SUM(x) FROM t WHERE t.fk = p.pk) | Returns single row per outer row |
| Existence with data | LEFT JOIN LATERAL (...) ON true | Preserves outer rows with NULLs |
| Multi-column lookup | LATERAL (SELECT a, b FROM t WHERE t.fk = p.pk LIMIT 1) | Multiple derived values |
| GROUP BY inside subquery | LATERAL (SELECT type, COUNT(*) FROM t WHERE t.fk = p.pk GROUP BY type) | Multiple rows per outer row |
Key Design Decision: Outer Alias Rewriting
The subquery body contains column references to the outer table (e.g., WHERE li.order_id = o.id). In the expansion CTE, the changed-sources CTE is aliased with the original outer table alias (e.g., lat_sq_changed AS o) so that the subquery's column references resolve naturally without rewriting.
Notes:
- The cost is proportional to the number of changed outer rows × average subquery result size, not the full table.
- The subquery is stored as raw SQL (like `LateralFunction`) because it cannot be independently differentiated — it depends on outer row context.
- Source table OIDs referenced by the subquery body are extracted at parse time for CDC trigger setup.
- ORDER BY + LIMIT inside the subquery are valid (they apply per-outer-row, not to the stream table).
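The Top-N-per-group pattern from the table above looks like this as a stream table (hypothetical `customers` / `orders` schema):

```sql
SELECT pgtrickle.create_stream_table(
  name => 'latest_orders_per_customer',
  query => 'SELECT c.id AS customer_id, o.id AS order_id, o.placed_at
            FROM customers c,
                 LATERAL (SELECT id, placed_at FROM orders
                          WHERE orders.customer_id = c.id
                          ORDER BY placed_at DESC LIMIT 3) AS o',
  schedule => '1m'
);
```

The `ORDER BY ... LIMIT 3` applies per outer row, so each customer contributes at most three rows to the result.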
Semi-Join (EXISTS / IN Subquery)
Module: src/dvm/operators/semi_join.rs
Handles WHERE EXISTS (SELECT ... FROM ...) and WHERE col IN (SELECT ...) patterns. The parser transforms these into a SemiJoin operator with a left (outer) child, a right (inner) child, and a join condition.
Delta Rule:
$$\Delta(L \ltimes R) = \Delta L|_{R} + L|_{\Delta R \text{ causes existence change}}$$
- Part 1: Outer rows that changed and still satisfy the semi-join condition.
- Part 2: Existing outer rows whose semi-join result flipped due to inner changes (a matching inner row was inserted or deleted).
Strategy (Two-Part Delta):
- Part 1 (outer delta): Filter `delta_left` to rows that have at least one match in the current right-hand snapshot.
- Part 2 (inner delta): For each row in the left snapshot, check whether the existence of matching right-hand rows changed between the old and current state. Emit `'I'` if a match appeared, `'D'` if all matches disappeared.
The "old" right-hand state is reconstructed from the current state by reversing the delta: `R_old = (R_current EXCEPT ALL delta_right(action='I')) UNION ALL delta_right(action='D')`.
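A sketch of that reversal in SQL, using the same angle-bracket placeholders as the generated-SQL fragments elsewhere in this document:

```sql
-- R_old: the right-hand state as it was before the current delta was applied.
-- Subtract the rows the delta inserted, then add back the rows it deleted.
WITH r_old AS (
  (SELECT <right_cols> FROM <right_snapshot>
   EXCEPT ALL
   SELECT <right_cols> FROM <delta_right> WHERE "__pgt_action" = 'I')
  UNION ALL
  SELECT <right_cols> FROM <delta_right> WHERE "__pgt_action" = 'D'
)
SELECT * FROM r_old
```

`EXCEPT ALL` / `UNION ALL` keep multiset semantics, so duplicate rows retain the correct counts.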
Row Identity:
- Part 1: Uses `__pgt_row_id` from the left delta.
- Part 2: Content-based hash via `pg_trickle_hash_multi` on left-side columns.
Supported Patterns:
| Pattern | SQL | Notes |
|---|---|---|
| `EXISTS` | `WHERE EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk)` | Direct semi-join |
| `IN (subquery)` | `WHERE id IN (SELECT fk FROM t)` | Rewritten to EXISTS with equality |
| Multiple conditions | `WHERE EXISTS (... AND ...)` | Additional predicates in subquery WHERE |
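Declared as a stream table (hypothetical `customers` / `orders` schema), the EXISTS pattern looks like:

```sql
SELECT pgtrickle.create_stream_table(
  name => 'customers_with_orders',
  query => 'SELECT * FROM customers c
            WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)',
  schedule => '30s'
);
```

Deleting a customer's last order flips the existence test and emits a `'D'` for that customer on the next refresh.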
Anti-Join (NOT EXISTS / NOT IN Subquery)
Module: src/dvm/operators/anti_join.rs
Handles WHERE NOT EXISTS (SELECT ... FROM ...) and WHERE col NOT IN (SELECT ...) patterns. The inverse of the semi-join operator.
Delta Rule:
$$\Delta(L \triangleright R) = \Delta L|_{\neg R} + L|_{\Delta R \text{ causes existence change}}$$
- Part 1: Outer rows that changed and have no match in the right-hand snapshot.
- Part 2: Existing outer rows whose anti-join result flipped due to inner changes.
Strategy (Two-Part Delta):
- Part 1 (outer delta): Filter `delta_left` to rows with `NOT EXISTS` in the current right snapshot.
- Part 2 (inner delta): For each row in the left snapshot, detect existence changes. Emit `'D'` if a match appeared (row no longer qualifies), `'I'` if all matches disappeared (row now qualifies).
Note the inverted semantics compared to semi-join: a new match means deletion, losing all matches means insertion.
Row Identity: Same as semi-join.
Supported Patterns:
| Pattern | SQL | Notes |
|---|---|---|
| `NOT EXISTS` | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.fk = s.pk)` | Direct anti-join |
| `NOT IN (subquery)` | `WHERE id NOT IN (SELECT fk FROM t)` | Rewritten to NOT EXISTS with equality |
Scalar Subquery (Correlated SELECT Subquery)
Module: src/dvm/operators/scalar_subquery.rs
Handles scalar subqueries appearing in the SELECT list, e.g., SELECT a, (SELECT max(x) FROM t) AS mx FROM s. The subquery must return exactly one row and one column.
Delta Rule:
$$\Delta(L \times q) = \Delta L \times q' + L \times (q' - q)$$
Where $q$ is the scalar subquery value and $q'$ is the updated value.
Strategy (Two-Part Delta):
- Part 1 (outer delta): Propagate the child delta, appending the current scalar subquery value to each row.
- Part 2 (scalar value change): When the scalar subquery's result changes, emit deletes for all existing outer rows (with the old scalar value) and re-inserts for all outer rows (with the new value). The old scalar value is reconstructed by reversing the inner delta.
SQL Generation (3 or 4 CTEs):
-- Part 1: child delta + current scalar value
WITH sq_outer AS (
SELECT *, (<scalar_subquery>) AS "<alias>"
FROM <child_delta>
),
-- Part 2a: DELETE all outer rows when scalar changed
sq_del AS (
SELECT "__pgt_row_id", 'D' AS "__pgt_action", <cols>
FROM <st_table>
WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
),
-- Part 2b: INSERT all outer rows with new scalar value
sq_ins AS (
SELECT pg_trickle_hash_multi(...) AS "__pgt_row_id",
'I' AS "__pgt_action", <cols>, (<scalar_current>) AS "<alias>"
FROM <source_snapshot>
WHERE (<scalar_old>) IS DISTINCT FROM (<scalar_current>)
)
-- Final: UNION ALL of all parts
SELECT * FROM sq_outer
UNION ALL SELECT * FROM sq_del
UNION ALL SELECT * FROM sq_ins
Row Identity:
- Part 1: `__pgt_row_id` from the child delta.
- Part 2: Content-based hash via `pg_trickle_hash_multi` on all output columns.
Notes:
- The scalar subquery is stored as raw SQL (deparsed from the parse tree).
- The old scalar value is approximated using the same `EXCEPT ALL / UNION ALL` reversal technique as semi/anti-join.
- If the scalar subquery references a table that changes, all outer rows must be re-evaluated — the delta can be large.
- Source OIDs used by the scalar subquery are captured at parse time for CDC trigger registration.
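For example (hypothetical `products` schema), a view comparing each row to a table-wide scalar; per the note above, any change to `products` that moves the average re-emits every row:

```sql
SELECT pgtrickle.create_stream_table(
  name => 'products_vs_avg',
  query => 'SELECT p.id, p.price,
                   (SELECT AVG(price) FROM products) AS avg_price
            FROM products p',
  schedule => '1m'
);
```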
Operator Tree Construction
The DVM engine builds the operator tree by analyzing the parsed query:
- WITH clause → CTE definitions extracted into a name→body map (non-recursive) or CTE registry (multi-reference)
- FROM clause → `Scan` nodes for physical tables; `Subquery` nodes for inlined CTEs and subqueries in FROM; `CteScan` nodes for multi-reference CTEs; `LateralFunction` nodes for SRFs and JSON_TABLE in FROM; `LateralSubquery` nodes for correlated subqueries in FROM
- JOIN → `Join` or `OuterJoin` wrapping two sub-trees
- LATERAL SRFs → `LateralFunction` wrapping the left-hand FROM item as its child
- LATERAL subqueries → `LateralSubquery` wrapping the left-hand FROM item as its child (comma syntax or JOIN LATERAL)
- WHERE subqueries → `SemiJoin` for `EXISTS` / `IN (subquery)`, `AntiJoin` for `NOT EXISTS` / `NOT IN (subquery)`, extracted from the WHERE clause
- Scalar subqueries → `ScalarSubquery` for `(SELECT ...)` in the SELECT list, wrapping the child tree
- WHERE → `Filter` wrapping the scan/join tree (remaining non-subquery predicates)
- SELECT list → `Project` for column selection and expressions
- GROUP BY → `Aggregate` wrapping the filtered/projected tree
- DISTINCT → `Distinct` on top
- UNION ALL → `UnionAll` combining two complete sub-trees
- INTERSECT / EXCEPT → `Intersect` or `Except` combining two sub-trees with dual-count tracking
- Window functions → `Window` wrapping the sub-tree with PARTITION BY / ORDER BY metadata
- ORDER BY → silently discarded (storage row order is undefined)
- LIMIT / OFFSET → `ORDER BY + LIMIT [+ OFFSET]` is accepted as TopK (scoped recomputation); standalone `LIMIT` or `OFFSET` without `ORDER BY` is rejected
For recursive CTEs (WITH RECURSIVE), the query is parsed into an OpTree with RecursiveCte operator nodes. In DIFFERENTIAL mode, the strategy (semi-naive, DRed, or recomputation) is selected automatically based on column compatibility and change type — see the Recursive CTEs section above for details.
The tree is then traversed bottom-up during delta generation: each operator's generate_delta_sql() method composes its SQL fragment around the output of its child operator(s).
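As a worked example, a small query touching several of the clauses above (hypothetical schema) decomposes as sketched in the comments:

```sql
-- Defining query:
SELECT region, SUM(amount) AS total
FROM orders
WHERE status = 'active'
GROUP BY region;

-- Resulting operator tree (bottom-up):
--   Scan(orders)
--     -> Filter(status = 'active')
--       -> Aggregate(GROUP BY region; SUM(amount))
--
-- Delta generation then composes SQL in the same order: the Scan emits the
-- change-buffer delta, Filter passes matching rows through (linear operator),
-- and Aggregate folds the delta into its maintained group counters.
```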
Further Reading
- ARCHITECTURE.md — System-wide component overview
- SQL_REFERENCE.md — Complete function reference
- CONFIGURATION.md — GUC tuning guide
pg_trickle — Benchmark Guide
This document explains how the database-level refresh benchmarks work and how to interpret their output.
Overview
The benchmark suite in tests/e2e_bench_tests.rs measures wall-clock refresh time for FULL vs DIFFERENTIAL mode across a matrix of table sizes, change rates, and query complexities. Each benchmark spawns an isolated PostgreSQL 18.x container via Testcontainers, ensuring reproducible and interference-free measurements.
The core question the benchmarks answer:
How much faster is a DIFFERENTIAL refresh compared to a FULL refresh, given a specific workload?
Prerequisites
Build the E2E test Docker image before running any benchmarks:
./tests/build_e2e_image.sh
Docker must be running on the host.
Running Benchmarks
All benchmark tests are tagged #[ignore] so they are skipped during normal CI. The --nocapture flag is required to see the printed output tables.
Quick Spot Checks (~5–10 seconds each)
# Simple scan, 10K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_scan_10k_1pct
# Aggregate query, 100K rows, 1% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_aggregate_100k_1pct
# Join + aggregate, 100K rows, 10% change rate
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_join_agg_100k_10pct
Zero-Change Latency (~5 seconds)
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_no_data_refresh_latency
Full Matrix (~15–30 minutes)
Runs all 30 combinations and prints a consolidated summary:
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture bench_full_matrix
Run All Benchmarks in Parallel
cargo test --test e2e_bench_tests --features pg18 -- --ignored --nocapture
Note: each test starts its own container, so parallel execution requires sufficient Docker resources.
Benchmark Dimensions
Table Sizes
| Size | Rows | Purpose |
|---|---|---|
| Small | 10,000 | Fast iteration; measures per-row overhead |
| Medium | 100,000 | More realistic; reveals scaling characteristics |
Change Rates
| Rate | Description |
|---|---|
| 1% | Low churn — the sweet spot for incremental refresh |
| 10% | Moderate churn — tests delta query scalability |
| 50% | High churn — stress test; approaches full-refresh cost |
Query Complexities
| Scenario | Defining Query | Operators Tested |
|---|---|---|
| scan | SELECT id, region, category, amount, score FROM src | Table scan only |
| filter | SELECT id, region, amount FROM src WHERE amount > 5000 | Scan + filter (WHERE) |
| aggregate | SELECT region, SUM(amount), COUNT(*) FROM src GROUP BY region | Scan + group-by aggregate |
| join | SELECT s.id, s.region, s.amount, d.region_name FROM src s JOIN dim d ON ... | Scan + inner join |
| join_agg | SELECT d.region_name, SUM(s.amount), COUNT(*) FROM src s JOIN dim d ON ... GROUP BY ... | Scan + join + aggregate |
DML Mix per Cycle
Each change cycle applies a realistic mix of operations:
| Operation | Fraction | Example at 10K rows, 10% rate |
|---|---|---|
| UPDATE | 70% | 700 rows have amount incremented |
| DELETE | 15% | 150 rows removed |
| INSERT | 15% | 150 new rows added |
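A sketch of one change cycle's DML at 10K rows and a 10% rate, using the `src` columns from the scenario queries above (the actual generator lives in the Rust test harness; exact values and value distributions here are illustrative):

```sql
-- 70% updates: increment amount on 700 random rows
UPDATE src SET amount = amount + 1
 WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 700);

-- 15% deletes: remove 150 random rows
DELETE FROM src
 WHERE id IN (SELECT id FROM src ORDER BY random() LIMIT 150);

-- 15% inserts: add 150 new rows
INSERT INTO src (region, category, amount, score)
SELECT 'r' || (i % 5), 'c' || (i % 10), (random() * 10000)::int, random()
FROM generate_series(1, 150) AS i;
```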
What Each Benchmark Does
1. Start a fresh PostgreSQL 18.x container
2. Install the pg_trickle extension
3. Create and populate the source table (10K or 100K rows)
4. Create dimension table if needed (for join scenarios)
5. ANALYZE for stable query plans
── FULL mode ──
6. Create a Stream Table in FULL refresh mode
7. For each of 3 cycles:
a. Apply random DML (updates + deletes + inserts)
b. ANALYZE
c. Time the FULL refresh (TRUNCATE + re-execute entire query)
d. Record refresh_ms and ST row count
8. Drop the FULL-mode ST
── DIFFERENTIAL mode ──
9. Reset source table to same starting state
10. Create a Stream Table in DIFFERENTIAL refresh mode
11. For each of 3 cycles:
a. Apply random DML (same parameters)
b. ANALYZE
c. Time the DIFFERENTIAL refresh (delta query + MERGE)
d. Record refresh_ms and ST row count
12. Print results table and summary
Both modes start from the same data to ensure a fair comparison. The 3-cycle design captures warm-up effects (cycle 1 may be slower due to plan caching).
Reading the Output
Detail Table
╔══════════════════════════════════════════════════════════════════════════════════════╗
║ pg_trickle Refresh Benchmark Results ║
╠════════════╤══════════╤════════╤═════════════╤═══════╤════════════╤═════════════════╣
║ Scenario │ Rows │ Chg % │ Mode │ Cycle │ Refresh ms │ ST Rows ║
╠════════════╪══════════╪════════╪═════════════╪═══════╪════════════╪═════════════════╣
║ aggregate │ 10000 │ 1% │ FULL │ 1 │ 22.1 │ 5 ║
║ aggregate │ 10000 │ 1% │ FULL │ 2 │ 4.8 │ 5 ║
║ aggregate │ 10000 │ 1% │ FULL │ 3 │ 5.3 │ 5 ║
║ aggregate │ 10000 │ 1% │ DIFFERENTIAL │ 1 │ 8.4 │ 5 ║
║ aggregate │ 10000 │ 1% │ DIFFERENTIAL │ 2 │ 4.4 │ 5 ║
║ aggregate │ 10000 │ 1% │ DIFFERENTIAL │ 3 │ 4.6 │ 5 ║
╚════════════╧══════════╧════════╧═════════════╧═══════╧════════════╧═════════════════╝
| Column | Meaning |
|---|---|
| Scenario | Query complexity level (scan, filter, aggregate, join, join_agg) |
| Rows | Number of rows in the base table |
| Chg % | Percentage of rows changed per cycle |
| Mode | FULL (truncate + recompute) or DIFFERENTIAL (delta + merge) |
| Cycle | Which of the 3 measurement rounds (cycle 1 often includes warm-up) |
| Refresh ms | Wall-clock time for the refresh operation |
| ST Rows | Row count in the Stream Table after refresh (sanity check) |
Summary Table
┌─────────────────────────────────────────────────────────────────────────┐
│ Summary (avg ms per cycle) │
├────────────┬──────────┬────────┬─────────────────┬──────────────────────┤
│ Scenario │ Rows │ Chg % │ FULL avg ms │ DIFFERENTIAL avg ms │
├────────────┼──────────┼────────┼─────────────────┼──────────────────────┤
│ aggregate │ 10000 │ 1% │ 10.7 │ 5.8 ( 1.8x) │
└────────────┴──────────┴────────┴─────────────────┴──────────────────────┘
The Speedup value in parentheses is FULL avg / DIFFERENTIAL avg — how many times faster the incremental refresh is compared to a full refresh.
Interpreting the Speedup
What to Expect
| Change Rate | Table Size | Expected Speedup | Explanation |
|---|---|---|---|
| 1% | 10K | 1.5–5x | Small table; overhead is similar, delta is tiny |
| 1% | 100K | 5–50x | Larger table amplifies full-refresh cost |
| 10% | 100K | 2–10x | Moderate delta; still significantly faster |
| 50% | any | 1–2x | Delta is nearly as large as full table |
Rules of Thumb
| Speedup | Interpretation |
|---|---|
| > 10x | Strong win for DIFFERENTIAL — typical at low change rates on larger tables |
| 5–10x | Clear advantage for DIFFERENTIAL |
| 2–5x | Moderate advantage — DIFFERENTIAL is the right choice |
| 1–2x | Marginal gain — either mode is acceptable |
| ~1x | Break-even — change rate is too high for incremental to help |
| < 1x | DIFFERENTIAL is slower — would indicate overhead exceeds savings (investigate) |
Key Patterns to Look For
- Scaling with table size: For the same change rate, speedup should increase with table size. FULL must re-process all rows; DIFFERENTIAL processes only the delta.
- Degradation with change rate: As change rate rises from 1% → 50%, speedup should decrease. At 50%, DIFFERENTIAL processes half the table, which approaches FULL cost.
- Query complexity amplifies speedup: Aggregate and join queries benefit more from DIFFERENTIAL because they avoid expensive re-computation. A join_agg at 1% changes should show higher speedup than a simple scan at the same parameters.
- Cycle 1 warm-up: The first cycle in each mode may be slower due to PostgreSQL plan cache population. Use cycles 2–3 for the steadiest numbers.
- ST Rows consistency: The ST row count should be similar between FULL and DIFFERENTIAL for the same scenario (accounting for random DML). Large discrepancies indicate a correctness issue.
Zero-Change Latency
The bench_no_data_refresh_latency test measures the overhead of a refresh when no data has changed — the NO_DATA code path.
┌──────────────────────────────────────────────┐
│ NO_DATA Refresh Latency (10 iterations) │
├──────────────────────────────────────────────┤
│ Avg: 3.21 ms │
│ Max: 5.10 ms │
│ Target: < 10 ms │
│ Status: ✅ PASS │
└──────────────────────────────────────────────┘
| Metric | Meaning |
|---|---|
| Avg | Average wall-clock time across 10 no-op refreshes |
| Max | Worst-case single iteration |
| Target | The PLAN.md goal: < 10 ms per no-op refresh |
| Status | PASS if avg < 10 ms, SLOW otherwise |
A passing result confirms the scheduler's per-cycle overhead is negligible. Values > 10 ms in containerized environments may be acceptable due to Docker overhead; bare-metal PostgreSQL should comfortably meet the target.
Available Tests
Individual Tests (10K rows)
| Test Name | Scenario | Change Rate |
|---|---|---|
| `bench_scan_10k_1pct` | scan | 1% |
| `bench_scan_10k_10pct` | scan | 10% |
| `bench_scan_10k_50pct` | scan | 50% |
| `bench_filter_10k_1pct` | filter | 1% |
| `bench_aggregate_10k_1pct` | aggregate | 1% |
| `bench_join_10k_1pct` | join | 1% |
| `bench_join_agg_10k_1pct` | join_agg | 1% |
Individual Tests (100K rows)
| Test Name | Scenario | Change Rate |
|---|---|---|
| `bench_scan_100k_1pct` | scan | 1% |
| `bench_scan_100k_10pct` | scan | 10% |
| `bench_scan_100k_50pct` | scan | 50% |
| `bench_aggregate_100k_1pct` | aggregate | 1% |
| `bench_aggregate_100k_10pct` | aggregate | 10% |
| `bench_join_agg_100k_1pct` | join_agg | 1% |
| `bench_join_agg_100k_10pct` | join_agg | 10% |
Special Tests
| Test Name | Description |
|---|---|
| `bench_full_matrix` | All 30 combinations (5 queries × 2 sizes × 3 rates) |
| `bench_no_data_refresh_latency` | Zero-change overhead (10 iterations) |
Nexmark Streaming Benchmark
The Nexmark benchmark validates correctness against a sustained high-frequency DML workload modelling an online auction system. It is adapted from the Nexmark benchmark specification used by streaming systems like Flink, Feldera, and Materialize.
Data Model
| Table | Description | Default Size |
|---|---|---|
| `person` | Registered users (sellers/bidders) | 100 rows |
| `auction` | Items listed for sale | 500 rows |
| `bid` | Bids placed on auctions | 2,000 rows |
Queries
| Query | Features | Description |
|---|---|---|
| Q0 | Passthrough | Identity projection of all bids |
| Q1 | Projection + arithmetic | Currency conversion |
| Q2 | Filter | Bids on specific auctions |
| Q3 | JOIN + filter | Local item suggestion (person-auction join) |
| Q4 | JOIN + GROUP BY + AVG | Average selling price by category |
| Q5 | GROUP BY + COUNT | Hot items (bid count per auction) |
| Q6 | JOIN + GROUP BY + AVG | Average bid price per seller |
| Q7 | Aggregate (MAX) | Highest bid price |
| Q8 | JOIN | Person-auction join (new users monitoring) |
| Q9 | JOIN + DISTINCT ON | Winning bid per auction with bidder info |
Running Nexmark Tests
# Default scale (100 persons, 500 auctions, 2000 bids, 3 cycles)
cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture
# Larger scale
NEXMARK_PERSONS=1000 NEXMARK_AUCTIONS=5000 NEXMARK_BIDS=50000 NEXMARK_CYCLES=5 \
cargo test --test e2e_nexmark_tests -- --ignored --test-threads=1 --nocapture
What Each Cycle Does
Each refresh cycle applies three mutation functions (RF1-RF3) then refreshes all stream tables and asserts multiset equality:
- RF1 (INSERT): New persons, auctions, and bids
- RF2 (DELETE): Remove oldest bids, orphaned auctions, orphaned persons
- RF3 (UPDATE): Price changes, reserve adjustments, city moves
- Refresh + Assert: Differential refresh → EXCEPT ALL correctness check
Correctness Validation
The test uses the same DBSP invariant as TPC-H: after every differential
refresh, the stream table must be multiset-equal to re-executing the
defining query from scratch (symmetric EXCEPT ALL). Additionally, negative
__pgt_count values (over-retraction bugs) are detected.
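The symmetric-difference check can be sketched as follows, with placeholders for the stream table and its defining query:

```sql
-- Multiset equality: both directions of EXCEPT ALL must be empty.
SELECT count(*) = 0 AS ok FROM (
  (SELECT <output_cols> FROM <stream_table>
   EXCEPT ALL
   <defining_query>)
  UNION ALL
  (<defining_query>
   EXCEPT ALL
   SELECT <output_cols> FROM <stream_table>)
) diff;
```

Because `EXCEPT ALL` respects duplicate counts, this also catches rows present the wrong number of times, not just missing or extra distinct rows.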
DAG Topology Benchmarks
The DAG topology benchmark suite in tests/e2e_dag_bench_tests.rs measures end-to-end propagation latency and throughput through multi-level DAG topologies. While the single-ST benchmarks above measure per-operator refresh speed, these benchmarks measure how efficiently changes propagate through chains, fan-outs, diamonds, and mixed topologies with 5–100+ stream tables.
The core questions these benchmarks answer:
How long does it take for a source-table INSERT to propagate through an entire DAG to the leaf stream tables?
How does PARALLEL refresh mode compare to CALCULATED mode across different topology shapes?
Running DAG Benchmarks
# Full suite (rebuilds Docker image)
just test-dag-bench
# Skip Docker image rebuild
just test-dag-bench-fast
# Individual topology tests
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_latency_linear_5 --test-threads=1 --nocapture
cargo test --test e2e_dag_bench_tests --features pg18 -- --ignored bench_throughput_diamond --test-threads=1 --nocapture
Topology Patterns
| Topology | Shape | Description |
|---|---|---|
| Linear Chain | src → st_1 → st_2 → ... → st_N | Sequential pipeline; L1 aggregate, L2+ alternating project/filter |
| Wide DAG | src → [W parallel chains × D deep] | W independent chains of depth D from a shared source; tests parallel refresh mode |
| Fan-Out Tree | src → root → [b children] → [b² grandchildren] → ... | Exponential fan-out; each parent spawns b children with filter/project variants |
| Diamond | src → [fan-out aggregates] → JOIN → [extension] | Fan-out to independent aggregates (SUM/COUNT/MAX/MIN/AVG) then converge via JOIN |
| Mixed | Two sources, 4 layers, ~15 STs | Realistic e-commerce scenario with chains, fan-out, cross-source joins, and alerts |
Measurement Modes
Latency benchmarks (auto-refresh): The scheduler is enabled with a 200 ms interval. The test INSERTs into the source table and polls pgt_refresh_history until the leaf stream table has a new COMPLETED entry. This measures the full propagation latency including scheduler overhead.
Throughput benchmarks (manual refresh): The scheduler is disabled. The test applies mixed DML (70% UPDATE, 15% DELETE, 15% INSERT) then manually refreshes all STs in topological order. This isolates pure refresh cost from scheduler overhead.
Theoretical Comparison
Each latency benchmark computes the theoretical prediction from PLAN_DAG_PERFORMANCE.md and reports the delta:
| Mode | Formula |
|---|---|
| CALCULATED | L = I_s + N × T_r |
| PARALLEL(C) | L = Σ ⌈W_l / C⌉ × max(I_p, T_r) per level |
Where T_r is the measured average per-ST refresh time, I_s = 200 ms (scheduler interval), and C is the concurrency limit.
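The formulas can be checked numerically with a short sketch. The per-level widths and `I_p` (the parallel poll interval, assumed here to equal the 200 ms scheduler interval) are assumptions, not measured values:

```python
import math

def theory_calculated(n_sts, t_r_ms, i_s_ms=200.0):
    # L = I_s + N * T_r: one scheduler wake-up, then N sequential refreshes
    return i_s_ms + n_sts * t_r_ms

def theory_parallel(level_widths, c, t_r_ms, i_p_ms=200.0):
    # L = sum over levels of ceil(W_l / C) * max(I_p, T_r):
    # each level needs ceil(W_l / C) waves of up to C concurrent refreshes
    return sum(math.ceil(w / c) * max(i_p_ms, t_r_ms) for w in level_widths)

# With T_r = 50 ms these reproduce the theory_ms values in the sample output:
assert theory_calculated(10, 50.0) == 700.0              # linear chain, 10 STs
assert theory_parallel([20, 20, 20], 8, 50.0) == 1800.0  # wide DAG, 3 levels x 20
```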
Reading the Output
Per-Cycle Machine-Parseable Lines (stderr)
[DAG_BENCH] topology=linear_chain mode=CALCULATED sts=10 depth=10 width=1 cycle=1 actual_ms=820.3 theory_ms=700.0 overhead_pct=17.2 per_hop_ms=82.0
ASCII Summary Table (stdout)
╔══════════════════════════════════════════════════════════════════════════════════════════════════════╗
║ pg_trickle DAG Topology Benchmark Results ║
╠═══════════════╤═══════════════╤══════╤═══════╤═══════╤════════════╤════════════╤═══════════════════╣
║ Topology │ Mode │ STs │ Depth │ Width │ Actual ms │ Theory ms │ Overhead ║
╠═══════════════╪═══════════════╪══════╪═══════╪═══════╪════════════╪════════════╪═══════════════════╣
║ linear_chain │ CALCULATED │ 10 │ 10 │ 1 │ 820.3 │ 700.0 │ +17.2% ║
║ wide_dag │ PARALLEL_C8 │ 60 │ 3 │ 20 │ 2430.1 │ 1800.0 │ +35.0% ║
╚═══════════════╧═══════════════╧══════╧═══════╧═══════╧════════════╧════════════╧═══════════════════╝
Per-Level Breakdown
Per-Level Breakdown (linear_chain D=10, CALCULATED):
Level 1: avg 52.3ms [st_lc_1]
Level 2: avg 48.7ms [st_lc_2]
...
Level 10: avg 51.2ms [st_lc_10]
Total: 513.5ms (scheduler overhead: 306.8ms)
JSON Export
Results are written to target/dag_bench_results/<timestamp>.json (overridable via PGS_DAG_BENCH_JSON_DIR env var) for cross-run comparison.
Available DAG Benchmark Tests
Latency Tests (Auto-Refresh)
| Test Name | Topology | Mode | STs |
|---|---|---|---|
| `bench_latency_linear_5_calc` | Linear, D=5 | CALCULATED | 5 |
| `bench_latency_linear_10_calc` | Linear, D=10 | CALCULATED | 10 |
| `bench_latency_linear_20_calc` | Linear, D=20 | CALCULATED | 20 |
| `bench_latency_linear_10_par4` | Linear, D=10 | PARALLEL(4) | 10 |
| `bench_latency_wide_3x20_calc` | Wide, D=3 W=20 | CALCULATED | 60 |
| `bench_latency_wide_3x20_par4` | Wide, D=3 W=20 | PARALLEL(4) | 60 |
| `bench_latency_wide_3x20_par8` | Wide, D=3 W=20 | PARALLEL(8) | 60 |
| `bench_latency_wide_5x20_calc` | Wide, D=5 W=20 | CALCULATED | 100 |
| `bench_latency_wide_5x20_par8` | Wide, D=5 W=20 | PARALLEL(8) | 100 |
| `bench_latency_fanout_b2d5_calc` | Fan-out, b=2 d=5 | CALCULATED | 31 |
| `bench_latency_fanout_b2d5_par8` | Fan-out, b=2 d=5 | PARALLEL(8) | 31 |
| `bench_latency_diamond_4_calc` | Diamond, fan=4 | CALCULATED | 5 |
| `bench_latency_mixed_calc` | Mixed, ~15 STs | CALCULATED | ~15 |
| `bench_latency_mixed_par8` | Mixed, ~15 STs | PARALLEL(8) | ~15 |
Throughput Tests (Manual Refresh)
| Test Name | Topology | STs | Delta Sizes |
|---|---|---|---|
| `bench_throughput_linear_5` | Linear, D=5 | 5 | 10, 100, 1000 |
| `bench_throughput_linear_10` | Linear, D=10 | 10 | 10, 100, 1000 |
| `bench_throughput_linear_20` | Linear, D=20 | 20 | 10, 100, 1000 |
| `bench_throughput_wide_3x20` | Wide, D=3 W=20 | 60 | 10, 100, 1000 |
| `bench_throughput_fanout_b2d5` | Fan-out, b=2 d=5 | 31 | 10, 100, 1000 |
| `bench_throughput_diamond_4` | Diamond, fan=4 | 5 | 10, 100, 1000 |
| `bench_throughput_mixed` | Mixed, ~15 STs | ~15 | 10, 100, 1000 |
What to Look For
- Linear chain: CALCULATED faster than PARALLEL. For width=1 DAGs, PARALLEL adds poll overhead without any parallelism benefit, so CALCULATED should win.
- Wide DAG: PARALLEL(C=8) speedup over CALCULATED. For width ≥ 20, PARALLEL should show a measurable improvement — it refreshes up to C STs concurrently per level instead of sequentially.
- Overhead < 100%. The gap between theoretical and actual latency should stay below 100% across all topologies — the formulas should be in the right ballpark.
- DIFFERENTIAL action in per-ST breakdown. ST-on-ST hops should show `DIFFERENTIAL` rather than `FULL`, confirming differential propagation is working.
- Throughput scaling with delta size. Smaller deltas (10 rows) should yield lower per-cycle wall-clock time than larger deltas (1000 rows).
In-Process Micro-Benchmarks (Criterion.rs)
In addition to the E2E database benchmarks, the project includes two Criterion.rs benchmark suites that measure pure Rust computation time without database overhead. These are useful for tracking performance regressions in the internal query-building and IVM differentiation logic.
Benchmark Suites
refresh_bench — Utility Functions
benches/refresh_bench.rs benchmarks the low-level helper functions used during refresh operations:
| Benchmark Group | What It Measures |
|---|---|
| quote_ident | PostgreSQL identifier quoting speed |
| col_list | Column list SQL generation |
| prefixed_col_list | Prefixed column list generation (e.g., NEW.col) |
| expr_to_sql | AST expression → SQL string conversion |
| output_columns | Output column extraction from parsed queries |
| source_oids | Source table OID resolution |
| lsn_gt | LSN comparison expression generation |
| frontier_json | Frontier state JSON serialization |
| canonical_period | Interval parsing and canonicalization |
| dag_operations | DAG topological sort and cycle detection |
| xxh64 | xxHash-64 hashing throughput |
diff_operators — IVM Operator Differentiation
benches/diff_operators.rs benchmarks the delta SQL generation for every IVM operator. Each benchmark creates a realistic operator tree and measures differentiate() throughput:
| Benchmark Group | What It Measures |
|---|---|
| diff_scan | Table scan differentiation (3, 10, 20 columns) |
| diff_filter | Filter (WHERE) differentiation |
| diff_project | Projection (SELECT subset) differentiation |
| diff_aggregate | GROUP BY aggregate differentiation (simple + complex) |
| diff_inner_join | Inner join differentiation |
| diff_left_join | Left outer join differentiation |
| diff_distinct | DISTINCT differentiation |
| diff_union_all | UNION ALL differentiation (2, 5, 10 children) |
| diff_window | Window function differentiation |
| diff_join_aggregate | Composite join + aggregate pipeline |
| differentiate_full | Full differentiate() call for scan-only and filter+scan trees |
Running Micro-Benchmarks
# Run all Criterion benchmarks
just bench
# Run only refresh utility benchmarks
cargo bench --bench refresh_bench --features pg18
# Run only IVM diff operator benchmarks
just bench-diff
# or equivalently:
cargo bench --bench diff_operators --features pg18
# Output in Bencher-compatible format (for CI integration)
just bench-bencher
Output and Reports
Criterion produces statistical analysis for each benchmark including:
- Mean and standard deviation of execution time
- Throughput (iterations/sec)
- Comparison with previous run — reports improvements/regressions with confidence intervals
HTML reports are generated in target/criterion/ with interactive charts showing distributions and regression history. Open target/criterion/report/index.html to browse all results.
Sample output:
diff_scan/3_columns time: [11.834 µs 12.074 µs 12.329 µs]
diff_scan/10_columns time: [16.203 µs 16.525 µs 16.869 µs]
diff_aggregate/simple time: [21.447 µs 21.862 µs 22.301 µs]
diff_inner_join time: [25.919 µs 26.421 µs 26.952 µs]
Continuous Benchmarking with Bencher
Bencher provides continuous benchmark tracking in CI, detecting performance regressions on pull requests before they merge.
How It Works
The .github/workflows/benchmarks.yml workflow:
- On `main` pushes — runs both Criterion suites and uploads results to Bencher as the baseline. This establishes the expected performance for each benchmark.
- On pull requests — runs the same benchmarks and compares against the `main` baseline using a Student's t-test with a 99% upper confidence boundary. If any benchmark regresses beyond the threshold, the PR check fails.
Setup
To enable Bencher for your fork or deployment:
1. Create a Bencher account at bencher.dev and create a project.
2. Add the API token as a GitHub Actions secret:
   - Go to Settings → Secrets and variables → Actions
   - Add `BENCHER_API_TOKEN` with your Bencher API token
3. Update the project slug in `.github/workflows/benchmarks.yml` if your Bencher project name differs from `pg-trickle`.
The workflow gracefully degrades — if BENCHER_API_TOKEN is not set, benchmarks still run and upload artifacts but skip Bencher tracking.
Local Bencher-Format Output
To see what Bencher would receive from CI:
just bench-bencher
This runs both suites with --output-format bencher, producing JSON output compatible with bencher run.
Dashboard
Once configured, the Bencher dashboard shows:
- Historical trends for every benchmark across commits
- Statistical thresholds with configurable alerting
- PR annotations highlighting which benchmarks regressed and by how much
Troubleshooting
| Issue | Resolution |
|---|---|
| `docker: command not found` | Install Docker Desktop and ensure it is running |
| Container startup timeout | Increase Docker memory allocation (≥ 4 GB recommended) |
| `image not found` | Run `./tests/build_e2e_image.sh` to build the test image |
| Highly variable timings | Close other workloads; use `--test-threads=1` to avoid container contention |
| SLOW status on latency test | Expected in Docker; bare-metal should pass < 10 ms |
CDC Write-Side Overhead Benchmarks
The CDC write-overhead benchmark suite in tests/e2e_cdc_write_overhead_tests.rs measures the DML throughput cost of pg_trickle's CDC triggers on source tables. This quantifies the "write amplification factor" — how much slower DML becomes when a stream table is attached.
The core question this benchmark answers:
How much write throughput do you sacrifice by attaching a stream table to a source table?
Running CDC Write Overhead Benchmarks
# Full suite (all 5 scenarios)
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_write_overhead_full
# Individual scenarios
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_single_row_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_insert
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_update
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_bulk_delete
cargo test --test e2e_cdc_write_overhead_tests --features pg18 -- --ignored --nocapture bench_cdc_concurrent_writers
Scenarios
| Scenario | Description | Rows per Cycle |
|---|---|---|
| Single-row INSERT | One INSERT statement per row, 1,000 rows total | 1,000 |
| Bulk INSERT | Single INSERT ... SELECT generate_series(...) | 10,000 |
| Bulk UPDATE | Single UPDATE ... WHERE id <= N | 10,000 |
| Bulk DELETE | Single DELETE ... WHERE id <= N | 10,000 |
| Concurrent writers | 4 parallel sessions each inserting 5,000 rows | 20,000 total |
Reading the Output
╔═══════════════════════════════════════════════════════════════════════════════════╗
║ pg_trickle CDC Write-Side Overhead Benchmark ║
╠═══════════════════════╤═══════════════╤═══════════════╤═════════════════════════╣
║ Scenario │ Baseline (ms) │ With CDC (ms) │ Write Amplification ║
╠═══════════════════════╪═══════════════╪═══════════════╪═════════════════════════╣
║ single-row INSERT │ 450.2 │ 890.5 │ 1.98× ║
║ bulk INSERT (10K) │ 35.1 │ 72.3 │ 2.06× ║
║ bulk UPDATE (10K) │ 48.7 │ 105.2 │ 2.16× ║
║ bulk DELETE (10K) │ 22.4 │ 51.8 │ 2.31× ║
║ concurrent (4×5K) │ 65.3 │ 142.1 │ 2.18× ║
╚═══════════════════════╧═══════════════╧═══════════════╧═════════════════════════╝
| Column | Meaning |
|---|---|
| Scenario | DML pattern being measured |
| Baseline | Average wall-clock time with no stream table (no CDC trigger) |
| With CDC | Average wall-clock time with an active stream table (CDC trigger fires) |
| Write Amplification | With CDC / Baseline — how many times slower the write path becomes |
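As a sanity check, the amplification factor is simply the two averages divided (numbers taken from the sample table above):

```python
def write_amplification(baseline_ms, cdc_ms):
    # With CDC / Baseline: how many times slower the write path becomes
    return cdc_ms / baseline_ms

assert round(write_amplification(450.2, 890.5), 2) == 1.98  # single-row INSERT
assert round(write_amplification(22.4, 51.8), 2) == 2.31    # bulk DELETE (10K)
```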
Machine-Readable Output
[CDC_BENCH] scenario=single-row_INSERT baseline_avg_ms=450.2 cdc_avg_ms=890.5 write_amplification=1.98
Interpreting Write Amplification
| Write Amplification | Interpretation |
|---|---|
| 1.0–1.5× | Minimal overhead — triggers add negligible cost. Typical for bulk DML with statement-level triggers. |
| 1.5–2.5× | Expected range for statement-level CDC triggers. Each DML statement incurs one additional INSERT into the change buffer. |
| 2.5–4.0× | Moderate overhead — acceptable for most workloads. Common with row-level triggers or single-row DML. |
| 4.0–10× | High overhead — consider pg_trickle.cdc_trigger_mode = 'statement' if using row-level triggers, or reduce DML frequency. |
| > 10× | Investigate — may indicate lock contention on the change buffer or pathological trigger interaction. |
Key Patterns to Look For
- Statement-level triggers vs row-level: Statement-level triggers (default since v0.11.0) should show significantly lower overhead for bulk DML compared to row-level triggers.
- Bulk DML advantage: Bulk INSERT/UPDATE/DELETE should show lower write amplification than single-row INSERT because the trigger fires once per statement, not once per row.
- Concurrent writer safety: The concurrent scenario should complete without deadlocks or errors, and the write amplification should be similar to the serial bulk INSERT case.
- DELETE overhead: DELETE triggers tend to be slightly more expensive than INSERT triggers because the trigger must capture the `OLD` row values.
CI Benchmark Workflows
All benchmark jobs run only on a weekly schedule and via workflow_dispatch — never on PR or push — so long-running tests don't block the merge gate.
e2e-benchmarks.yml — E2E Benchmark Tracking
Produces the numbers in README.md and this document. Each job posts a summary table to the GitHub Actions run page and uploads artifacts at 90-day retention. Manual dispatch accepts a job input (refresh | latency | cdc | tpch | all) to re-run a single job.
| Job | Test(s) | README Section | Timeout | just command |
|---|---|---|---|---|
| `bench-refresh` | `bench_full_matrix` | Differential vs Full Refresh | 60 min | `just test-bench-e2e-fast` |
| `bench-latency` | `bench_no_data_refresh_latency` | Zero-Change Latency | 20 min | `just test-bench-e2e-fast` |
| `bench-cdc` | `bench_cdc_trigger_overhead` | Write-Path Overhead | 30 min | `just test-bench-e2e-fast` |
| `bench-tpch` | `test_tpch_performance_comparison` | TPC-H per-query table | 30 min | `just bench-tpch-fast` |
ci.yml — Benchmark Jobs
Criterion micro-benchmarks and DAG topology benchmarks. Run on the daily schedule and workflow_dispatch.
| Job | Test Suite | What It Measures | Timeout | just command |
|---|---|---|---|---|
| `benchmarks` | `benches/refresh_bench.rs`, `benches/diff_operators.rs` | In-process Rust: query building, delta SQL generation (sub-µs) | 20 min | `just bench` |
| `dag-bench-calc` | `e2e_dag_bench_tests` (excl. `par*`) | DAG propagation latency + throughput, CALCULATED mode | 30 min | `just test-dag-bench-fast` |
| `dag-bench-parallel` | `e2e_dag_bench_tests` (`par*`) | DAG propagation with 4–8 parallel workers | 120 min | `just test-dag-bench-fast` |
benchmarks.yml — Bencher Integration (opt-in)
Disabled by default (no scheduled trigger). Re-enable by restoring push/pull_request triggers and adding a BENCHER_API_TOKEN secret. When active, it annotates PRs with regressions detected via Student’s t-test at a 99% upper confidence boundary.
| Job | Test Suite | What It Measures | Tracking |
|---|---|---|---|
| `benchmark` | `benches/refresh_bench.rs`, `benches/diff_operators.rs` | Same as ci.yml `benchmarks` job | Bencher (regression alert on PR) |
Artifact Retention Summary
| Workflow | Artifact | Retention |
|---|---|---|
| `e2e-benchmarks.yml` | `bench-{refresh,latency,cdc,tpch}-results` (stdout + JSON) | 90 days |
| `ci.yml` `benchmarks` | `benchmark-results` (Criterion HTML + JSON) | 7 days |
| `benchmarks.yml` | `criterion-results` (Criterion HTML + JSON) | 7 days |
What Happens When You INSERT a Row?
This tutorial traces the complete lifecycle of a single INSERT statement on a base table that is referenced by a stream table — from the moment the row is written to the moment the stream table reflects the change.
Setup: A Real-World Example
Suppose you run an e-commerce platform. You have an orders table and a stream table that maintains a running total per customer:
-- Base table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
-- Stream table: always-fresh customer totals
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m' -- refresh when data is staler than 1 minute
-- refresh_mode defaults to 'AUTO' (differential with full-refresh fallback)
);
After creation, customer_totals is a real PostgreSQL table:
SELECT * FROM customer_totals;
-- (empty — no orders yet)
Phase 1: The INSERT
A new order arrives:
INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);
What happens inside PostgreSQL
When create_stream_table() was called, pg_trickle installed an AFTER INSERT OR UPDATE OR DELETE trigger on the orders table. This trigger fires automatically — the user's INSERT statement triggers it transparently.
The trigger function (pgtrickle_changes.pg_trickle_cdc_fn_<oid>()) executes inside the same transaction as the INSERT and writes a single row into the change buffer table:
pgtrickle_changes.changes_16384 (where 16384 = orders table OID)
┌───────────┬─────────────┬────────┬─────────┬──────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ pk_hash │ new_id │ new_cust │ new_amount │
├───────────┼─────────────┼────────┼─────────┼──────────┼──────────┼────────────┤
│ 1 │ 0/1A3F2B80 │ I │ -837291 │ 1 │ alice │ 49.99 │
└───────────┴─────────────┴────────┴─────────┴──────────┴──────────┴────────────┘
Key details:
- `lsn`: The current WAL Log Sequence Number (`pg_current_wal_lsn()`), used to bound which changes belong to which refresh cycle.
- `action`: `'I'` for INSERT, `'U'` for UPDATE, `'D'` for DELETE.
- `pk_hash`: A pre-computed hash of the primary key (`orders.id`), used later for efficient row matching.
- `new_*` columns: The actual column values from `NEW`, stored as native PostgreSQL types (not JSONB). There are no `old_*` values for INSERTs.
Beyond this single INSERT into the buffer table, the trigger adds no overhead to the user's transaction commit: no JSONB serialization, no logical replication slot, and no external process.
Phase 2: The Scheduler Wakes Up
A background worker called the scheduler runs inside PostgreSQL (registered via shared_preload_libraries). It wakes up every pg_trickle.scheduler_interval_ms milliseconds (default: 1000ms) and performs a tick:
- Rebuild the DAG (if any stream tables were created/dropped since last tick) — a dependency graph of all stream tables and their source tables.
- Topological sort — determine the refresh order so that stream tables depending on other stream tables are refreshed after their dependencies.
- For each stream table, check: has its staleness exceeded its schedule?
For customer_totals with a '1m' schedule, the scheduler compares:
now()minusdata_timestamp(the freshness watermark from the last refresh)- Against the schedule: 60 seconds
If more than 60 seconds have elapsed and the stream table isn't already being refreshed, the scheduler begins a refresh.
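The tick's ordering and staleness logic can be sketched as follows. Names like `daily_rollup` are hypothetical, and the real scheduler works on the catalog's dependency graph rather than Python dicts:

```python
from collections import deque

def topo_order(deps):
    """Kahn's algorithm: deps maps each stream table to its upstream STs.
    Returns a refresh order where every ST follows its dependencies."""
    indeg = {n: len(d) for n, d in deps.items()}
    dependents = {n: [] for n in deps}
    for n, d in deps.items():
        for up in d:
            dependents[up].append(n)
    queue = deque(n for n, k in indeg.items() if k == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in dependents[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order

def needs_refresh(now_s, data_timestamp_s, schedule_s):
    # Refresh only when staleness exceeds the schedule
    return now_s - data_timestamp_s > schedule_s

# daily_rollup depends on customer_totals, so it refreshes second
assert topo_order({"daily_rollup": {"customer_totals"},
                   "customer_totals": set()}) == ["customer_totals", "daily_rollup"]
assert needs_refresh(now_s=120.0, data_timestamp_s=50.0, schedule_s=60.0)
assert not needs_refresh(now_s=100.0, data_timestamp_s=50.0, schedule_s=60.0)
```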
Phase 3: Frontier Advancement
Before executing the refresh, the scheduler creates a new frontier — a snapshot of how far to read changes from each source table:
Previous frontier: { orders(16384): lsn = 0/1A3F2A00 }
New frontier: { orders(16384): lsn = 0/1A3F2C00 }
The frontier is a DBSP-inspired version vector. Each source table has its own LSN cursor. The refresh will process all changes in the buffer table where lsn > previous_frontier_lsn AND lsn <= new_frontier_lsn.
This means:
- Changes committed before the previous refresh are already reflected.
- Changes committed after the new frontier will be picked up in the next cycle.
- The INSERT we made (`lsn = 0/1A3F2B80`) falls within this window.
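The window arithmetic can be sketched by treating each `pg_lsn` as a 64-bit integer built from its hi/lo hex halves:

```python
def parse_lsn(lsn):
    """Parse a textual pg_lsn ('hi/lo' in hex) into a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

prev_frontier = parse_lsn("0/1A3F2A00")
new_frontier = parse_lsn("0/1A3F2C00")
insert_lsn = parse_lsn("0/1A3F2B80")

# Half-open window: lsn > previous frontier AND lsn <= new frontier
assert prev_frontier < insert_lsn <= new_frontier
```

The half-open bounds guarantee every change lands in exactly one refresh cycle — nothing is processed twice, nothing is skipped.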
Phase 4: Change Detection — Is There Anything to Do?
Before running the full delta query, the scheduler runs a short-circuit check: does the change buffer actually have any rows in the LSN window?
SELECT count(*)::bigint FROM (
SELECT 1 FROM pgtrickle_changes.changes_16384
WHERE lsn > '0/1A3F2A00'::pg_lsn
AND lsn <= '0/1A3F2C00'::pg_lsn
LIMIT <threshold>
) __pgt_capped
This query also checks the adaptive threshold: if the number of changes exceeds a percentage of the source table size (default: 10%), the scheduler falls back to a FULL refresh instead of DIFFERENTIAL, because applying thousands of individual deltas would be slower than a bulk reload.
For our single INSERT, the count is 1 — well below the threshold. The scheduler proceeds with a DIFFERENTIAL refresh.
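The decision amounts to the logic below (the function name and the `SKIP` outcome are illustrative, not the extension's API; the 10% default mirrors the documented threshold):

```python
def choose_refresh_action(change_count, source_row_count, threshold_pct=10.0):
    """Pick the refresh strategy for one cycle."""
    if change_count == 0:
        return "SKIP"          # nothing in the LSN window: no work this cycle
    if change_count > source_row_count * threshold_pct / 100.0:
        return "FULL"          # bulk reload beats applying huge deltas row by row
    return "DIFFERENTIAL"      # small delta: incremental maintenance wins

assert choose_refresh_action(0, 1_000_000) == "SKIP"
assert choose_refresh_action(1, 1_000_000) == "DIFFERENTIAL"
assert choose_refresh_action(200_000, 1_000_000) == "FULL"
```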
Phase 5: Delta Query Generation (DVM Engine)
This is where the Differential View Maintenance (DVM) engine does its work. The defining query:
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
is parsed into an operator tree:
Aggregate(GROUP BY customer, SUM(amount), COUNT(*))
└── Scan(orders)
The DVM engine differentiates each operator — converting it from "compute the full result" to "compute only what changed":
Step 1: Differentiate the Scan
The Scan(orders) operator becomes a read from the change buffer:
-- Reads only changes in the LSN window, splitting UPDATEs into DELETE+INSERT
WITH __pgt_raw AS (
SELECT c.pk_hash, c.action,
c."new_customer", c."old_customer",
c."new_amount", c."old_amount"
FROM pgtrickle_changes.changes_16384 c
WHERE c.lsn > '0/1A3F2A00'::pg_lsn
AND c.lsn <= '0/1A3F2C00'::pg_lsn
)
-- INSERT rows: take new_* values
SELECT pk_hash AS __pgt_row_id, 'I' AS __pgt_action,
"new_customer" AS customer, "new_amount" AS amount
FROM __pgt_raw WHERE action IN ('I', 'U')
UNION ALL
-- DELETE rows: take old_* values
SELECT pk_hash AS __pgt_row_id, 'D' AS __pgt_action,
"old_customer" AS customer, "old_amount" AS amount
FROM __pgt_raw WHERE action IN ('D', 'U')
For our single INSERT, this produces:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | I | alice | 49.99
Step 2: Differentiate the Aggregate
The Aggregate differentiation is the heart of incremental maintenance. Instead of re-computing SUM(amount) over the entire orders table, it computes:
-- Delta for SUM: add new values, subtract deleted values
SELECT customer,
SUM(CASE WHEN __pgt_action = 'I' THEN amount
WHEN __pgt_action = 'D' THEN -amount END) AS total,
SUM(CASE WHEN __pgt_action = 'I' THEN 1
WHEN __pgt_action = 'D' THEN -1 END) AS order_count,
pgtrickle.pg_trickle_hash(customer::text) AS __pgt_row_id,
'I' AS __pgt_action
FROM <scan_delta>
GROUP BY customer
For our INSERT of ('alice', 49.99), this yields:
customer | total | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice | +49.99 | +1 | 7283194 | I
The stream table uses reference counting: it tracks __pgt_count (how many source rows contribute to each group). When __pgt_count reaches 0, the group row is deleted.
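The reference-counting behavior can be sketched like this — `refs` plays the role of `__pgt_count`; the dict layout is illustrative, not the storage format:

```python
def apply_deltas(groups, deltas):
    """groups: group key -> {'total': running sum, 'refs': contributing rows}.
    A group's row disappears when its refcount drops to zero."""
    for key, d_total, d_refs in deltas:
        g = groups.setdefault(key, {"total": 0.0, "refs": 0})
        g["total"] += d_total
        g["refs"] += d_refs
        if g["refs"] == 0:
            del groups[key]    # last contributing source row is gone
    return groups

groups = apply_deltas({}, [("alice", 49.99, +1)])   # INSERT creates the group
assert groups["alice"]["refs"] == 1
groups = apply_deltas(groups, [("alice", -49.99, -1)])  # DELETE retracts it
assert "alice" not in groups
```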
Phase 6: MERGE Into the Stream Table
The delta is applied to the customer_totals storage table using a single SQL MERGE statement:
MERGE INTO public.customer_totals AS st
USING (<delta_query>) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE
WHEN MATCHED AND d.__pgt_action = 'I' THEN
UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count
WHEN NOT MATCHED AND d.__pgt_action = 'I' THEN
INSERT (__pgt_row_id, customer, total, order_count)
VALUES (d.__pgt_row_id, d.customer, d.total, d.order_count)
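The three MERGE arms behave like this dictionary-based sketch (plain Python standing in for SQL; `st` is keyed by `__pgt_row_id`):

```python
def merge_apply(st, delta_rows):
    """Mimic the MERGE arms above. Each delta row is (row_id, action, row)."""
    for row_id, action, row in delta_rows:
        matched = row_id in st
        if matched and action == "D":
            del st[row_id]            # WHEN MATCHED AND 'D' THEN DELETE
        elif matched and action == "I":
            st[row_id].update(row)    # WHEN MATCHED AND 'I' THEN UPDATE
        elif not matched and action == "I":
            st[row_id] = dict(row)    # WHEN NOT MATCHED AND 'I' THEN INSERT
    return st

st = merge_apply({}, [(7283194, "I",
                       {"customer": "alice", "total": 49.99, "order_count": 1})])
assert st[7283194]["total"] == 49.99  # alice's group created via NOT MATCHED
```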
Since alice didn't exist before, this is a NOT MATCHED → INSERT. The stream table now contains:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
alice | 49.99 | 1
Phase 7: Cleanup and Bookkeeping
After the MERGE succeeds:
- Consumed changes are deleted from the buffer table:
  `DELETE FROM pgtrickle_changes.changes_16384 WHERE lsn > '0/1A3F2A00'::pg_lsn AND lsn <= '0/1A3F2C00'::pg_lsn`
- The frontier is saved to the catalog as JSONB, so the next refresh knows where to start.
- The refresh is recorded in `pgtrickle.pgt_refresh_history`:
  refresh_id | pgt_id | action       | rows_inserted | rows_deleted | delta_row_count | status    | initiated_by
  1          | 1      | DIFFERENTIAL | 1             | 0            | 1               | COMPLETED | SCHEDULER
  The `delta_row_count` column (new in v0.2.0) records the total number of change buffer rows consumed during this refresh cycle.
- The data timestamp on the stream table is advanced, resetting the staleness clock.
- The MERGE template is cached in thread-local storage. The next refresh for this stream table skips SQL parsing, operator tree construction, and differentiation — it only substitutes LSN values into the cached template. This saves ~45ms per refresh cycle.
What About UPDATE and DELETE?
UPDATE
UPDATE orders SET amount = 59.99 WHERE id = 1;
The trigger writes a single row with action = 'U', capturing both OLD and NEW values:
action | new_amount | old_amount | new_customer | old_customer
-------|------------|------------|--------------|-------------
U | 59.99 | 49.99 | alice | alice
The scan differentiation splits this into:
- DELETE old: `(alice, 49.99)` with action `'D'`
- INSERT new: `(alice, 59.99)` with action `'I'`
The aggregate differentiation computes: +59.99 - 49.99 = +10.00 for alice's total. The MERGE updates the existing row.
DELETE
DELETE FROM orders WHERE id = 1;
The trigger writes action = 'D' with the OLD values. The aggregate differentiation computes -49.99 for the total and -1 for the count. If the __pgt_count reaches 0 (no more orders for alice), the MERGE deletes alice's row from the stream table entirely.
Performance: Why This Is Fast
| Step | What it avoids |
|---|---|
| Trigger-based CDC | No logical replication slot, no WAL parsing, no external process |
| Typed columns | No JSONB serialization in the trigger, no jsonb_populate_record in the delta query |
| Pre-computed pk_hash | No per-row hash computation during the delta query |
| LSN-bounded reads | Index scan on the change buffer, not a full table scan |
| Algebraic differentiation | Processes only changed rows — O(changes) not O(table size) |
| MERGE statement | Single SQL round-trip for all inserts, updates, and deletes |
| Cached templates | After the first refresh, delta SQL generation is skipped entirely |
| Adaptive fallback | Automatically switches to FULL refresh when changes exceed a threshold |
For a table with 10 million rows and 100 changed rows, a DIFFERENTIAL refresh processes only those 100 rows. A FULL refresh would need to scan all 10 million.
What About IMMEDIATE Mode?
Everything described above applies to the default AUTO mode — changes accumulate in a buffer and are applied on a schedule using differential (delta-only) maintenance. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, which takes a fundamentally different path.
With IMMEDIATE mode, there are no change buffers, no scheduler, and no waiting:
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_live',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
refresh_mode => 'IMMEDIATE'
);
How IMMEDIATE Mode Differs for INSERT
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING NEW TABLE |
| What's captured | One buffer row per INSERT | A transition table containing all inserted rows |
| When delta runs | Next scheduler tick (up to schedule bound) | Immediately, in the same transaction |
| Delta source | Change buffer table (pgtrickle_changes.*) | Temp table copied from transition table |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run INSERT INTO orders ...:
- A BEFORE INSERT statement-level trigger acquires an advisory lock on the stream table
- The AFTER INSERT trigger captures the transition table (`NEW TABLE AS __pgt_newtable`) into a temp table
- The DVM engine generates the same delta query, but reads from the temp table instead of the change buffer
- The delta is applied to the stream table via INSERT/DELETE DML (not MERGE)
- The stream table is immediately up-to-date — within the same transaction
BEGIN;
INSERT INTO orders (customer, amount) VALUES ('alice', 49.99);
-- customer_totals_live already shows alice with total=49.99 here!
SELECT * FROM customer_totals_live;
COMMIT;
The delta SQL template is cached per (pgt_id, source_oid, has_new, has_old) combination, so subsequent trigger invocations skip query parsing entirely.
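The caching behaves like a dictionary keyed by that tuple — a sketch of the idea, not the extension's internals:

```python
_template_cache = {}

def get_delta_template(pgt_id, source_oid, has_new, has_old, build):
    """Build the delta SQL once per key; later calls reuse the cached text."""
    key = (pgt_id, source_oid, has_new, has_old)
    if key not in _template_cache:
        _template_cache[key] = build()   # expensive parse + differentiate
    return _template_cache[key]

calls = []
build = lambda: calls.append(1) or "SELECT ... FROM __pgt_newtable"
first = get_delta_template(1, 16384, True, False, build)
second = get_delta_template(1, 16384, True, False, build)
assert first == second and len(calls) == 1   # built exactly once
```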
Next in This Series
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You UPDATE a Row?
This tutorial traces what happens when an UPDATE statement hits a base table that is referenced by a stream table. It covers the trigger capture, the scan-level decomposition into DELETE + INSERT, and how each DVM operator propagates the change — including cases where the group key changes, where JOINs are involved, and where multiple UPDATEs happen within a single refresh window.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle. This tutorial focuses on how UPDATE differs.
Setup
Same e-commerce example:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 49.99),
('alice', 30.00),
('bob', 75.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|-------|------------
alice | 79.99 | 2
bob | 75.00 | 1
Case 1: Simple Value UPDATE (Same Group Key)
UPDATE orders SET amount = 59.99 WHERE id = 1;
Alice's first order changes from 49.99 to 59.99. The customer (group key) stays the same.
Phase 1: Trigger Capture
The AFTER UPDATE trigger fires and writes one row to the change buffer with both OLD and NEW values:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ new_cust │ new_amt │ old_cust │ old_amt │ pk_hash │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 4 │ 0/1A3F3000 │ U │ alice │ 59.99 │ alice │ 49.99 │ -837291 │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘
Key difference from INSERT: the trigger writes both new_* and old_* columns. The pk_hash is computed from NEW.id.
Phase 2–4: Scheduler, Frontier, Change Detection
Identical to the INSERT flow. The scheduler detects one change row in the LSN window.
Phase 5: Scan Differentiation — The U → D+I Split
This is where UPDATE handling diverges fundamentally. The scan delta operator decomposes the UPDATE into two events:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | D | alice | 49.99 ← old values (DELETE)
-837291 | I | alice | 59.99 ← new values (INSERT)
Why split into D+I? This is a core IVM principle. Downstream operators (aggregates, joins, filters) don't have special "update" logic — they only understand insertions and deletions. By decomposing the UPDATE:
- The DELETE event subtracts the old values from running aggregates
- The INSERT event adds the new values
This algebraic approach handles arbitrary operator trees without operator-specific update logic.
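The update-as-D+I algebra can be sketched in a few lines. This is an illustrative Python model of the principle, not pg_trickle's internal code; the dict of per-group sums stands in for the stream table's aggregate state:

```python
# Illustrative model (not pg_trickle internals): an UPDATE decomposed into
# DELETE(old values) + INSERT(new values), applied to per-group aggregates.

groups = {"alice": {"total": 79.99, "count": 2},
          "bob":   {"total": 75.00, "count": 1}}

def apply_event(groups, action, customer, amount):
    g = groups.setdefault(customer, {"total": 0.0, "count": 0})
    sign = 1 if action == "I" else -1      # 'I' adds, 'D' subtracts
    g["total"] = round(g["total"] + sign * amount, 2)
    g["count"] += sign

# UPDATE orders SET amount = 59.99 WHERE id = 1  ->  D(49.99) + I(59.99)
apply_event(groups, "D", "alice", 49.99)   # subtract the old value
apply_event(groups, "I", "alice", 59.99)   # add the new value

print(groups["alice"])   # {'total': 89.99, 'count': 2}
```

The same two calls handle a group-key change with no extra logic: the D lands in the old group and the I in the new one.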
Phase 5 (continued): Aggregate Differentiation
The aggregate operator processes both events against the alice group:
-- DELETE event: subtract old values
alice: total += CASE WHEN action='D' THEN -49.99 END → -49.99
alice: count += CASE WHEN action='D' THEN -1 END → -1
-- INSERT event: add new values
alice: total += CASE WHEN action='I' THEN +59.99 END → +59.99
alice: count += CASE WHEN action='I' THEN +1 END → +1
Net effect on alice's group:
total delta: -49.99 + 59.99 = +10.00
count delta: -1 + 1 = 0
The aggregate emits this as an INSERT (because the group still exists and its value changed):
customer | total | order_count | __pgt_row_id | __pgt_action
---------|--------|-------------|--------------|-------------
alice | +10.00 | 0 | 7283194 | I
Phase 6: MERGE
The MERGE matches alice's existing row and updates it. Note that the MERGE itself doesn't add the delta to the stored row; the aggregate delta query has already computed the new absolute value by combining the stored state with the delta:
-- MERGE WHEN MATCHED AND action = 'I' THEN UPDATE
COALESCE(existing.total, 0) + delta.total → 79.99 + 10.00 = 89.99
COALESCE(existing.__pgt_count, 0) + delta.__pgt_count → 2 + 0 = 2
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
alice | 89.99 | 2 ← was 79.99
bob | 75.00 | 1
Case 2: Group Key Change (Customer Reassignment)
UPDATE orders SET customer = 'bob' WHERE id = 2;
Alice's second order (amount=30.00) is reassigned to Bob. The group key itself changes.
Trigger Capture
change_id | lsn | action | new_cust | new_amt | old_cust | old_amt | pk_hash
5 | 0/1A3F3100 | U | bob | 30.00 | alice | 30.00 | 4521038
The old and new customer values differ.
Scan Delta: D+I Split
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038 | D | alice | 30.00 ← removes from alice's group
4521038 | I | bob | 30.00 ← adds to bob's group
Aggregate Delta
The aggregate groups by customer, so the DELETE and INSERT land in different groups:
Group "alice":
total delta: -30.00
count delta: -1
Group "bob":
total delta: +30.00
count delta: +1
After MERGE
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
alice | 59.99 | 1 ← lost one order (-30.00)
bob | 105.00 | 2 ← gained one order (+30.00)
This is why the D+I decomposition is essential. Without it, you'd need special "move between groups" logic. With it, the standard aggregate differentiation handles group key changes naturally.
Case 3: UPDATE That Deletes a Group
-- Alice only has one order left. Reassign it to bob.
UPDATE orders SET customer = 'bob' WHERE id = 1;
Aggregate Delta
Group "alice":
total delta: -59.99
count delta: -1
new __pgt_count: 1 - 1 = 0 → group vanishes!
Group "bob":
total delta: +59.99
count delta: +1
When __pgt_count reaches 0, the aggregate emits a DELETE for alice's group:
customer | total | __pgt_row_id | __pgt_action
---------|-------|--------------|-------------
alice | — | 7283194 | D ← group removed
bob | ... | 9182734 | I ← group updated
The MERGE deletes alice's row entirely:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
bob | 164.99 | 3
Case 4: Multiple UPDATEs on the Same Row (Within One Refresh Window)
What if a row is updated multiple times before the next refresh?
UPDATE orders SET amount = 10.00 WHERE id = 3; -- bob: 75 → 10
UPDATE orders SET amount = 20.00 WHERE id = 3; -- bob: 10 → 20
UPDATE orders SET amount = 30.00 WHERE id = 3; -- bob: 20 → 30
The change buffer now has 3 rows for pk_hash of order #3:
change_id | action | old_amt | new_amt
6 | U | 75.00 | 10.00
7 | U | 10.00 | 20.00
8 | U | 20.00 | 30.00
Net-Effect Computation
The scan delta uses a split fast-path design. Since order #3 has multiple changes (cnt > 1), it takes the multi-change path with window functions:
FIRST_VALUE(action) OVER (PARTITION BY pk_hash ORDER BY change_id) → 'U'
LAST_VALUE(action) OVER (...) → 'U'
Both first and last actions are 'U', so:
- DELETE: emits using old values from the earliest change (change_id=6): old_amt = 75.00
- INSERT: emits using new values from the latest change (change_id=8): new_amt = 30.00
Net delta:
__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3 | D | 75.00 ← original value before all changes
pk_hash_3 | I | 30.00 ← final value after all changes
The aggregate sees -75.00 + 30.00 = -45.00. This is correct regardless of the intermediate values. The intermediate rows (10.00, 20.00) are never seen.
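Under the assumption that changes arrive ordered by change_id, the net-effect rule can be modeled as a small function over one PK's change list (the names here are illustrative, not pg_trickle's internals):

```python
# Net effect of all changes to one primary key within a refresh window.
# Each change is (action, old_value, new_value), ordered by change_id.

def net_effect(changes):
    events = []
    first_action = changes[0][0]
    last_action = changes[-1][0]
    if first_action != "I":                    # row existed before the window
        events.append(("D", changes[0][1]))    # old values of earliest change
    if last_action != "D":                     # row exists after the window
        events.append(("I", changes[-1][2]))   # new values of latest change
    return events

# Three UPDATEs on order #3: 75 -> 10 -> 20 -> 30
print(net_effect([("U", 75.00, 10.00),
                  ("U", 10.00, 20.00),
                  ("U", 20.00, 30.00)]))       # [('D', 75.0), ('I', 30.0)]
```

The same two conditions cover the mixed histories: an I-first history emits no DELETE, a D-last history emits no INSERT, and I followed by D emits nothing at all.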
Case 5: INSERT + UPDATE in Same Window
INSERT INTO orders (customer, amount) VALUES ('charlie', 100.00);
UPDATE orders SET amount = 200.00 WHERE customer = 'charlie';
Both happen before the next refresh. The buffer has:
change_id | action | old_amt | new_amt
9 | I | NULL | 100.00
10 | U | 100.00 | 200.00
Net-effect analysis:
- first_action = 'I' (row didn't exist before this window)
- last_action = 'U' (row exists after)
Result:
- No DELETE emitted (first_action = 'I' means the row was born in this window)
- INSERT with final values:
(charlie, 200.00)
The aggregate sees a pure insertion of (charlie, 200.00) — the intermediate value of 100.00 never appears.
Case 6: UPDATE + DELETE in Same Window
UPDATE orders SET amount = 999.99 WHERE id = 3;
DELETE FROM orders WHERE id = 3;
Net-effect:
- first_action = 'U' (row existed before)
- last_action = 'D' (row no longer exists)
Result:
- DELETE with original old values from the first change
- No INSERT (last_action = 'D')
The aggregate correctly sees only a removal.
Case 7: UPDATE with JOINs
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Now update a customer's tier:
UPDATE customers SET tier = 'premium' WHERE name = 'alice';
How the JOIN Delta Works
The join differentiation follows the formula:
$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$
Since only the customers table changed:
- $\Delta L$ = changes to orders (empty)
- $\Delta R$ = changes to customers (alice's tier: standard → premium)
So:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty (no order changes)
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = all of alice's orders joined with her tier change
- Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (no order changes)
Part 2 produces the delta: for each of alice's orders, DELETE the old row (with tier='standard') and INSERT a new row (with tier='premium').
The stream table is updated to reflect the new tier across all of alice's order rows.
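The bilinear expansion can be checked numerically on signed multisets. A minimal Python sketch, assuming L and R denote the post-update table states (the assumption that makes the minus sign on the third term correct):

```python
from collections import Counter

# Signed multisets as plain dicts {row: count}; deltas carry negative counts.
def madd(*ms):                                  # multiset sum
    out = Counter()
    for m in ms:
        for row, c in m.items():
            out[row] += c
    return {row: c for row, c in out.items() if c != 0}

def neg(m):
    return {row: -c for row, c in m.items()}

def join(a, b):                                 # join on the first field
    out = Counter()
    for (ka, va), ca in a.items():
        for (kb, vb), cb in b.items():
            if ka == kb:
                out[(ka, va, vb)] += ca * cb
    return {row: c for row, c in out.items() if c != 0}

# orders(customer_id, amount) and customers(id, tier) before the UPDATE
L_old = {(1, 50.00): 1, (1, 30.00): 1}
R_old = {(1, "standard"): 1}

# UPDATE customers SET tier = 'premium' WHERE id = 1  ->  D + I on customers
dL, dR = {}, {(1, "standard"): -1, (1, "premium"): 1}
L_new, R_new = madd(L_old, dL), madd(R_old, dR)

# delta(L join R) = (dL join R) + (L join dR) - (dL join dR)
delta = madd(join(dL, R_new), join(L_new, dR), neg(join(dL, dR)))

# Must equal the brute-force difference between the full joins
assert delta == madd(join(L_new, R_new), neg(join(L_old, R_old)))
```

Here delta contains a -1 (DELETE) for each of alice's old tier='standard' rows and a +1 (INSERT) for each new tier='premium' row, exactly the Part 2 result described above.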
Performance Summary
| Scenario | Buffer rows | Delta rows emitted | Work |
|---|---|---|---|
| Simple value change | 1 | 2 (D+I) | O(1) per group |
| Group key change | 1 | 2 (D+I, different groups) | O(1) per affected group |
| Group deletion (key change) | 1 | 2 (D vanished group, I gaining group) | O(1) |
| N updates same row | N | 2 (D first-old + I last-new) | O(N) scan, O(1) aggregate |
| INSERT+UPDATE same window | 2 | 1 (I only) | O(1) |
| UPDATE+DELETE same window | 2 | 1 (D only) | O(1) |
In all cases, the work is proportional to the number of changed rows, not the total table size. A single UPDATE on a billion-row table produces the same delta cost as on a 10-row table.
What About IMMEDIATE Mode?
Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your UPDATE.
How IMMEDIATE Mode Differs for UPDATE
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING OLD TABLE, NEW TABLE |
| What's captured | One buffer row with old_* and new_* | Two transition tables: __pgt_oldtable and __pgt_newtable |
| When delta runs | Next scheduler tick | Immediately, in the same transaction |
| D+I decomposition | In the scan delta CTE | Same algebra, but reading from transition temp tables |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run UPDATE orders SET amount = 59.99 WHERE id = 1:
- A BEFORE UPDATE trigger acquires an advisory lock on the stream table
- The AFTER UPDATE trigger captures both OLD TABLE AS __pgt_oldtable and NEW TABLE AS __pgt_newtable into temp tables
- The DVM engine generates the same D+I decomposition, reading old values from the old-table and new values from the new-table
- The delta is applied to the stream table immediately
- Any query within the same transaction sees the updated stream table
BEGIN;
UPDATE orders SET amount = 59.99 WHERE id = 1;
-- customer_totals already reflects the new amount here!
SELECT * FROM customer_totals WHERE customer = 'alice';
COMMIT;
The same D+I split, aggregate differentiation, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You DELETE a Row?
This tutorial traces what happens when a DELETE statement hits a base table that is referenced by a stream table. It covers the trigger capture, how the scan delta emits a single DELETE event, and how each DVM operator propagates the removal — including group deletion, partial group reduction, JOINs, cascading deletes within a single refresh window, and the important edge case where a DELETE cancels a prior INSERT.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle (trigger → scheduler → frontier → change detection → DVM delta → MERGE → cleanup). This tutorial focuses on how DELETE differs.
Setup
Same e-commerce example used throughout the series:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 50.00),
('alice', 30.00),
('bob', 75.00),
('bob', 25.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|--------|------------
alice | 80.00 | 2
bob | 100.00 | 2
Case 1: Delete One Row (Group Survives)
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
Alice still has one remaining order (id=1, amount=50.00). The group shrinks but doesn't vanish.
Phase 1: Trigger Capture
The AFTER DELETE trigger fires and writes one row to the change buffer with only OLD values:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn │ action │ new_cust │ new_amt │ old_cust │ old_amt │ pk_hash │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 5 │ 0/1A3F3000 │ D │ NULL │ NULL │ alice │ 30.00 │ 4521038 │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘
Key difference from INSERT and UPDATE:
- new_* columns are all NULL — the row no longer exists, so there are no NEW values
- old_* columns contain the deleted row's data — this is what gets subtracted
- pk_hash is computed from OLD.id (the deleted row's primary key)
Phase 2–4: Scheduler, Frontier, Change Detection
Identical to the INSERT flow. The scheduler detects one change row in the LSN window.
Phase 5: Scan Differentiation — Pure DELETE
Unlike UPDATE (which splits into D+I), a DELETE produces a single event:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
4521038 | D | alice | 30.00
The scan delta applies the net-effect filtering rule:
- first_action = 'D' → row existed before the refresh window
- last_action = 'D' → row does not exist after
Result: emit a DELETE using old values. No INSERT is emitted (because last_action = 'D').
This is the simplest path through the scan delta — one change, one PK, one DELETE event.
Phase 5 (continued): Aggregate Differentiation
The aggregate operator processes the DELETE event against the alice group:
-- DELETE event: subtract old values from alice's group
__ins_count = 0 -- no inserts
__del_count = 1 -- one deletion
__ins_total = 0 -- no amount added
__del_total = 30.00 -- 30.00 removed
The merge CTE joins this delta with the existing stream table state:
new_count = old_count + ins_count - del_count = 2 + 0 - 1 = 1 (still > 0)
Since new_count > 0 and the group already existed (old_count = 2), the action is classified as 'U' (update). The aggregate emits the group with its new values:
customer | total | order_count | __pgt_row_id | __pgt_action
---------|-------|-------------|--------------|-------------
alice | 50.00 | 1 | 7283194 | I
Note: the 'U' meta-action is emitted as __pgt_action = 'I' because the MERGE treats it as an update-via-INSERT (see aggregate final CTE: CASE WHEN __pgt_meta_action = 'D' THEN 'D' ELSE 'I' END).
Phase 6: MERGE
The MERGE statement matches alice's existing row and updates it:
MERGE INTO customer_totals AS st
USING (...delta...) AS d
ON st.__pgt_row_id = d.__pgt_row_id
WHEN MATCHED AND d.__pgt_action = 'I' THEN
UPDATE SET customer = d.customer, total = d.total, order_count = d.order_count, ...
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
alice | 50.00 | 1 ← was 80.00 / 2
bob | 100.00 | 2
Phase 7: Cleanup
The change buffer rows in the consumed LSN window are deleted:
DELETE FROM pgtrickle_changes.changes_16384
WHERE lsn > '0/1A3F2FFF'::pg_lsn AND lsn <= '0/1A3F3000'::pg_lsn;
Case 2: Delete Last Row in Group (Group Vanishes)
-- Alice has one order left (id=1, amount=50.00). Delete it.
DELETE FROM orders WHERE id = 1;
Trigger Capture
change_id | lsn | action | old_cust | old_amt | pk_hash
6 | 0/1A3F3100 | D | alice | 50.00 | -837291
Scan Delta
Single DELETE event:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
-837291 | D | alice | 50.00
Aggregate Delta
Group "alice":
ins_count = 0
del_count = 1
new_count = old_count + 0 - 1 = 1 - 1 = 0 → group vanishes!
When new_count drops to 0 (or below), the aggregate classifies this as action 'D' (delete). The reference count has reached zero — no rows contribute to this group anymore.
The aggregate emits a DELETE for alice's group:
customer | __pgt_row_id | __pgt_action
---------|--------------|-------------
alice | 7283194 | D
MERGE
The MERGE matches alice's existing row and deletes it:
WHEN MATCHED AND d.__pgt_action = 'D' THEN DELETE
Result:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
bob | 100.00 | 2
Alice's row is completely removed from the stream table. This is the correct behavior — with zero contributing rows, the group should not exist.
Case 3: Delete Multiple Rows (Same Group, Same Window)
-- Delete both of bob's orders before the next refresh
DELETE FROM orders WHERE id = 3; -- bob, 75.00
DELETE FROM orders WHERE id = 4; -- bob, 25.00
The change buffer has two rows with different pk_hash values (different PKs):
change_id | action | old_cust | old_amt | pk_hash
7 | D | bob | 75.00 | pk_hash_3
8 | D | bob | 25.00 | pk_hash_4
Scan Delta
Each PK has exactly one change, so both take the single-change fast path:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_3 | D | bob | 75.00
pk_hash_4 | D | bob | 25.00
Two DELETE events, both targeting bob's group.
Aggregate Delta
The aggregate sums both deletions:
Group "bob":
ins_count = 0
del_count = 2
del_total = 75.00 + 25.00 = 100.00
new_count = 2 + 0 - 2 = 0 → group vanishes!
The aggregate emits a DELETE for bob's group.
MERGE
Bob's row is deleted from the stream table. With both alice and bob gone (from Cases 1+2+3), the stream table is now empty.
Case 4: INSERT + DELETE in Same Window (Cancellation)
What if a row is inserted and then deleted before the next refresh?
INSERT INTO orders (customer, amount) VALUES ('charlie', 200.00);
DELETE FROM orders WHERE customer = 'charlie';
The change buffer has:
change_id | action | new_cust | new_amt | old_cust | old_amt | pk_hash
9 | I | charlie | 200.00 | NULL | NULL | pk_hash_new
10 | D | NULL | NULL | charlie | 200.00 | pk_hash_new
Net-Effect Computation
Both changes share the same pk_hash. The pk_stats CTE finds cnt = 2, so this goes through the multi-change path:
first_action = FIRST_VALUE(action) OVER (...) → 'I'
last_action = LAST_VALUE(action) OVER (...) → 'D'
The scan delta applies the net-effect filtering:
- DELETE branch: requires first_action != 'I' → FAILS (first_action = 'I')
- INSERT branch: requires last_action != 'D' → FAILS (last_action = 'D')
Result: zero events emitted. The INSERT and DELETE completely cancel each other out.
The aggregate never sees charlie. The stream table is unchanged. This is correct — the row was born and died within the same refresh window, so it should have no visible effect.
Case 5: UPDATE + DELETE in Same Window
UPDATE orders SET amount = 999.99 WHERE id = 3; -- bob: 75 → 999.99
DELETE FROM orders WHERE id = 3;
The change buffer:
change_id | action | old_amt | new_amt
11 | U | 75.00 | 999.99
12 | D | 999.99 | NULL
Net-Effect Computation
Same pk_hash, cnt = 2:
first_action = 'U' (row existed before this window)
last_action = 'D' (row no longer exists)
Filtering:
- DELETE branch: first_action != 'I' → OK. Emit DELETE with old values from the earliest change: old_amt = 75.00
- INSERT branch: last_action != 'D' → FAILS. No INSERT emitted.
Net delta:
__pgt_row_id | __pgt_action | amount
-------------|--------------|-------
pk_hash_3 | D | 75.00
The intermediate value of 999.99 never appears. The aggregate sees only the removal of the original value (75.00), which is correct — that's the value that was previously accounted for in the stream table.
Case 6: DELETE with JOINs
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Seed data:
INSERT INTO customers VALUES (1, 'alice', 'premium'), (2, 'bob', 'standard');
INSERT INTO orders VALUES (1, 1, 50.00), (2, 1, 30.00), (3, 2, 75.00);
After refresh, the stream table has:
name | tier | amount
------|----------|-------
alice | premium | 50.00
alice | premium | 30.00
bob | standard | 75.00
Now delete an order:
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
How the JOIN Delta Works
The join differentiation formula:
$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$
Since only the orders table changed:
- $\Delta L$ = changes to orders (one DELETE: order #2)
- $\Delta R$ = changes to customers (empty)
So:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = the deleted order joined with its customer
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = empty (no customer changes)
- Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (customers unchanged)
Part 1 produces:
name | tier | amount | __pgt_action
------|---------|--------|-------------
alice | premium | 30.00 | D
The deleted order is joined with alice's customer record to produce a DELETE delta row with the complete joined values.
MERGE
The MERGE matches the row (alice, premium, 30.00) and deletes it:
SELECT * FROM order_details;
name | tier | amount
-------|----------|-------
alice | premium | 50.00 ← alice's remaining order
bob | standard | 75.00
What About Deleting From the Dimension Table?
DELETE FROM customers WHERE id = 2; -- remove bob entirely
Now $\Delta R$ has a DELETE for bob, while $\Delta L$ is empty:
- Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty
- Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = bob's order(s) joined with deleted customer record
Part 2 produces DELETE events for every order that referenced bob:
name | tier | amount | __pgt_action
-----|----------|--------|-------------
bob | standard | 75.00 | D
After MERGE, bob's rows vanish from the stream table.
Note: This assumes referential integrity — if orders still references customer #2, a foreign key constraint would prevent the DELETE in practice. But from the IVM perspective, the join delta correctly handles the removal regardless.
Case 7: Bulk DELETE
DELETE FROM orders WHERE amount < 50.00;
This deletes multiple rows across potentially multiple groups. The trigger fires once per row (it's a FOR EACH ROW trigger), writing one change buffer entry per deleted row:
change_id | action | old_cust | old_amt | pk_hash
13 | D | alice | 30.00 | pk_hash_2
14 | D | bob | 25.00 | pk_hash_4
Scan Delta
Each deleted PK is independent (different pk_hash values), so each takes the single-change fast path. Two DELETE events:
__pgt_row_id | __pgt_action | customer | amount
-------------|--------------|----------|-------
pk_hash_2 | D | alice | 30.00
pk_hash_4 | D | bob | 25.00
Aggregate Delta
The aggregate groups these by customer:
Group "alice":
del_count = 1, del_total = 30.00
new_count = 2 - 1 = 1 (survives)
Group "bob":
del_count = 1, del_total = 25.00
new_count = 2 - 1 = 1 (survives)
Both groups survive (count > 0), so the aggregate emits UPDATE (as 'I') events with new values:
customer | total | order_count
---------|-------|------------
alice | 50.00 | 1
bob | 75.00 | 1
The MERGE updates both rows. All work is proportional to the number of deleted rows (2), not the total table size.
Case 8: TRUNCATE (Automatic Full Refresh)
TRUNCATE orders;
TRUNCATE does not fire row-level triggers. However, as of v0.2.0, pg_trickle installs a statement-level AFTER TRUNCATE trigger that writes a 'T' marker to the change buffer. On the next refresh cycle, the scheduler detects this marker and automatically performs a full refresh — truncating the stream table and recomputing from the defining query.
No manual intervention is required. For details on how TRUNCATE is handled across all three refresh modes (DIFFERENTIAL, IMMEDIATE, FULL), see What Happens When You TRUNCATE a Table?.
How DELETE Differs From INSERT and UPDATE — A Summary
| Aspect | INSERT | UPDATE | DELETE |
|---|---|---|---|
| Trigger writes | new_* columns only | Both new_* and old_* | old_* columns only |
| new_* columns | Row values | New values | NULL |
| old_* columns | NULL | Old values | Row values |
| pk_hash source | NEW.pk | NEW.pk | OLD.pk |
| Scan delta output | 1 INSERT event | 2 events (D+I split) | 1 DELETE event |
| Aggregate effect | Adds to group count/sum | Subtracts old, adds new | Subtracts from group |
| Can delete a group? | No (only creates/grows) | Yes (if group key changes) | Yes (if count reaches 0) |
| MERGE action | INSERT new row | UPDATE existing row | DELETE matched row |
The Reference Counting Principle
The core insight behind incremental DELETE handling is reference counting. Every aggregate group in the stream table maintains an internal counter (__pgt_count) that tracks how many source rows contribute to the group:
Stream table internal state:
customer | total | order_count | __pgt_count (hidden)
---------|-------|-------------|---------------------
alice | 80.00 | 2 | 2
bob | 100.00| 2 | 2
- INSERT → __pgt_count += 1
- DELETE → __pgt_count -= 1
- UPDATE → __pgt_count += 0 (D cancels I for same-group updates)
When __pgt_count reaches 0:
- The group has zero contributing rows
- The aggregate emits a DELETE event
- The MERGE removes the row from the stream table
This is mathematically rigorous — the stream table always reflects the correct result of the defining query over the current base table contents, incrementally maintained through algebraic delta operations.
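The reference-counting rule can be modeled directly. An illustrative Python sketch (apply_delta and its field names are hypothetical; pg_trickle keeps __pgt_count as a hidden column and performs this work inside the MERGE):

```python
# Illustrative model of reference counting: a delta that drives a group's
# count to zero produces a 'D' for the whole group; otherwise an 'I'.

def apply_delta(state, group, ins_count, del_count, ins_total, del_total):
    old = state.get(group, {"total": 0.0, "count": 0})
    new_count = old["count"] + ins_count - del_count
    if new_count <= 0:
        state.pop(group, None)                 # zero contributors: remove row
        return (group, "D")
    state[group] = {"total": round(old["total"] + ins_total - del_total, 2),
                    "count": new_count}
    return (group, "I")                        # create or update via INSERT

state = {"alice": {"total": 80.00, "count": 2},
         "bob":   {"total": 100.00, "count": 2}}

print(apply_delta(state, "alice", 0, 1, 0.0, 30.00))   # ('alice', 'I') survives
print(apply_delta(state, "alice", 0, 1, 0.0, 50.00))   # ('alice', 'D') vanishes
print(sorted(state))                                   # ['bob']
```

This mirrors Cases 1 and 2 above: the first DELETE shrinks alice's group, the second drives its count to zero and removes the row.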
Performance Summary
| Scenario | Buffer rows | Delta rows emitted | Work |
|---|---|---|---|
| Single row DELETE (group survives) | 1 | 1 (D) | O(1) per group |
| Single row DELETE (group vanishes) | 1 | 1 (D) | O(1) |
| N deletes same group | N | N (D) → 1 group delta | O(N) scan, O(1) per group |
| INSERT+DELETE same window | 2 | 0 (cancels) | O(1) |
| UPDATE+DELETE same window | 2 | 1 (D original) | O(1) |
| Bulk DELETE across M groups | N | N (D) → M group deltas | O(N) scan, O(M) aggregate |
| JOIN table DELETE | 1 | K (one per matched join row) | O(K) join |
In all cases, the work is proportional to the number of changed rows, not the total table size. Deleting 3 rows from a billion-row table produces the same delta cost as from a 10-row table.
What About IMMEDIATE Mode?
Everything above describes DIFFERENTIAL mode — changes accumulate in a buffer and are applied on a schedule. As of v0.2.0, pg_trickle also supports IMMEDIATE mode, where the stream table is updated synchronously within the same transaction as your DELETE.
How IMMEDIATE Mode Differs for DELETE
| Phase | DIFFERENTIAL | IMMEDIATE |
|---|---|---|
| Trigger type | Row-level AFTER trigger | Statement-level AFTER trigger with REFERENCING OLD TABLE |
| What's captured | One buffer row with old_* columns per deleted row | A transition table containing all deleted rows |
| When delta runs | Next scheduler tick | Immediately, in the same transaction |
| Delta source | Change buffer rows with action='D' | Temp table copied from transition table |
| Concurrency | No locking between writers | Advisory lock per stream table |
When you run DELETE FROM orders WHERE id = 2:
- A BEFORE DELETE trigger acquires an advisory lock on the stream table
- The AFTER DELETE trigger captures OLD TABLE AS __pgt_oldtable into a temp table
- The DVM engine generates the same aggregate delta, reading deleted values from the old-table
- The delta is applied to the stream table immediately — groups are decremented, and groups reaching count=0 are removed
- Any query within the same transaction sees the updated stream table
BEGIN;
DELETE FROM orders WHERE id = 2; -- alice's 30.00 order
-- customer_totals already reflects the deletion here!
SELECT * FROM customer_totals WHERE customer = 'alice';
-- Shows: alice | 50.00 | 1
COMMIT;
The same reference counting, group deletion, and net-effect logic applies — the only difference is the data source (transition tables vs change buffer) and timing (synchronous vs scheduled).
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You TRUNCATE a Table? — Why TRUNCATE bypasses triggers and how to recover
What Happens When You TRUNCATE a Table?
This tutorial explains what happens when a TRUNCATE statement hits a base table that is referenced by a stream table. Unlike INSERT, UPDATE, and DELETE — which are fully tracked by the CDC trigger — TRUNCATE is a special case that bypasses row-level triggers entirely. Understanding this gap is essential for operating pg_trickle correctly.
Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the 7-phase lifecycle. This tutorial explains why TRUNCATE breaks that lifecycle and how to recover.
Setup
Same e-commerce example used throughout the series:
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
SELECT pgtrickle.create_stream_table(
name => 'customer_totals',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m'
);
-- Seed some data
INSERT INTO orders (customer, amount) VALUES
('alice', 50.00),
('alice', 30.00),
('bob', 75.00),
('bob', 25.00);
After the first refresh, the stream table contains:
customer | total | order_count
---------|--------|------------
alice | 80.00 | 2
bob | 100.00 | 2
Case 1: TRUNCATE the Base Table (DIFFERENTIAL Mode)
TRUNCATE orders;
All four rows are removed instantly.
What Happens at the Trigger Level: TRUNCATE Marker
Updated in v0.2.0: pg_trickle now installs a statement-level AFTER TRUNCATE trigger on tracked source tables. This trigger writes a single marker row to the change buffer with action = 'T'.
Unlike the per-row DML triggers, the TRUNCATE trigger cannot capture individual row data (PostgreSQL's TRUNCATE does not provide OLD records). Instead, it writes a sentinel:
pgtrickle_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┐
│ change_id │ lsn │ action │ new_* │ old_* │
├───────────┼─────────────┼────────┼──────────┼──────────┤
│ 5 │ 0/1A3F4000 │ T │ NULL │ NULL │
└───────────┴─────────────┴────────┴──────────┴──────────┘
The 'T' action marker tells the refresh engine: "a TRUNCATE happened — a full refresh is required."
What Happens at the Scheduler: Automatic Full Refresh
On the next refresh cycle, the scheduler:
- Checks the change buffer for rows in the LSN window
- Finds the action = 'T' marker row
- Falls back to a FULL refresh — regardless of the stream table's configured refresh_mode
- TRUNCATEs the stream table
- Re-executes the defining query against the current base table state
- Inserts all results
Since the orders table is now empty, the defining query returns zero rows:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|-------|------------
(0 rows) ← correct: orders is empty
No manual intervention required. The TRUNCATE marker ensures the stream table is automatically brought back into consistency on the next refresh cycle.
Note: In versions before v0.2.0, TRUNCATE was not captured at all — the change buffer stayed empty and the stream table became silently stale. If you're running an older version, you still need to call pgtrickle.refresh_stream_table() manually after a TRUNCATE.
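The scheduler's decision reduces to a simple rule over the buffered actions. An illustrative sketch (the real engine does this check in SQL over the LSN window):

```python
# A 'T' marker anywhere in the refresh window forces a full refresh and the
# row-level events are ignored; otherwise the normal differential path runs.

def choose_refresh(buffered_actions):
    return "FULL" if "T" in buffered_actions else "DIFFERENTIAL"

print(choose_refresh(["I", "I", "U"]))       # DIFFERENTIAL
print(choose_refresh(["T"]))                 # FULL
print(choose_refresh(["T", "I", "I"]))       # FULL (TRUNCATE-then-load ETL)
```

The third call shows why the rule is safe for mixed windows: recomputing from the current base table state subsumes any INSERTs that followed the TRUNCATE.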
Case 2: Manual Refresh (Explicit Recovery)
Although TRUNCATE is now automatically handled on the next refresh cycle, you can force an immediate recovery without waiting:
SELECT pgtrickle.refresh_stream_table('customer_totals');
This executes a full refresh regardless of the stream table's configured refresh mode:
- TRUNCATE the stream table itself (clearing the stale data)
- Re-execute the defining query
- INSERT the results into the stream table
- Update the frontier so future differential refreshes start from the current LSN
This is useful when you can't wait for the next scheduled refresh cycle and need the stream table consistent immediately.
Case 3: TRUNCATE Then INSERT (Common ETL Pattern)
A common data loading pattern is:
BEGIN;
TRUNCATE orders;
INSERT INTO orders (customer, amount) VALUES
('charlie', 100.00),
('charlie', 200.00),
('dave', 150.00);
COMMIT;
What the Change Buffer Sees
- TRUNCATE: 1 marker event (action = 'T') — captured by the statement-level trigger
- INSERT charlie 100.00: 1 event (captured)
- INSERT charlie 200.00: 1 event (captured)
- INSERT dave 150.00: 1 event (captured)
The change buffer has 4 rows — the TRUNCATE marker plus 3 INSERT events.
What the Scheduler Does
The scheduler sees the action = 'T' marker and triggers a full refresh, ignoring the individual INSERT events. The full refresh re-executes the defining query against the current state of orders, which now contains only charlie and dave:
-- After the next scheduled refresh:
SELECT * FROM customer_totals;
customer | total | order_count
----------|--------|------------
charlie | 300.00 | 2 ← correct
dave | 150.00 | 1 ← correct
The old data (alice, bob) is gone because the full refresh recomputed from scratch. This is correct — the TRUNCATE marker ensures consistency regardless of what other changes occurred in the same window.
Case 4: TRUNCATE a Dimension Table in a JOIN
Consider a stream table that joins two tables:
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
tier TEXT NOT NULL DEFAULT 'standard'
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(id),
amount NUMERIC(10,2)
);
SELECT pgtrickle.create_stream_table(
name => 'order_details',
query => $$
SELECT c.name, c.tier, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
$$,
schedule => '1m'
);
Now truncate the dimension table:
TRUNCATE customers CASCADE;
The CASCADE also truncates orders (due to the foreign key). Both tables have TRUNCATE triggers installed, so both write a 'T' marker to their respective change buffers.
On the next refresh cycle, the scheduler detects the TRUNCATE markers and performs a full refresh. The stream table is recomputed from the now-empty base tables:
-- After the next scheduled refresh:
SELECT * FROM order_details;
-- (0 rows) — correct
Case 5: FULL Mode Stream Tables Are Immune
If the stream table uses FULL refresh mode instead of DIFFERENTIAL:
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_full',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
schedule => '1m',
refresh_mode => 'FULL'
);
A FULL-mode stream table doesn't use the change buffer at all. Every refresh cycle:
- TRUNCATEs the stream table
- Re-executes the defining query
- Inserts all results
So after a TRUNCATE of the base table, the next scheduled refresh automatically picks up the correct state — no manual intervention needed. The trade-off is that every refresh recomputes from scratch, which is more expensive for large result sets.
Why PostgreSQL Doesn't Fire Row Triggers on TRUNCATE
Understanding the PostgreSQL internals helps explain why per-row capture is impossible:
| Operation | Mechanism | Row triggers fired? | Statement triggers fired? |
|---|---|---|---|
| DELETE FROM t | Scans and removes rows one by one | Yes — AFTER DELETE per row | Yes |
| TRUNCATE t | Removes all heap files and reinitializes the table storage | No — no per-row processing | Yes — AFTER TRUNCATE |
| DELETE FROM t WHERE true | Same as DELETE FROM t (full scan) | Yes — AFTER DELETE per row | Yes |
TRUNCATE is fundamentally different from DELETE. It's an O(1) operation that replaces the table's storage files, while DELETE is O(N) — scanning every row and recording each removal in WAL.
pg_trickle uses a statement-level AFTER TRUNCATE trigger to detect the event and write a 'T' marker to the change buffer. This marker does not contain per-row data (PostgreSQL's TRUNCATE trigger doesn't provide OLD records), but it's sufficient to signal that a full refresh is needed.
Alternative: DELETE FROM Instead of TRUNCATE
For DIFFERENTIAL mode, TRUNCATE is now handled automatically (via the 'T' marker and full refresh fallback). However, using DELETE FROM instead of TRUNCATE has its own advantages:
-- Instead of: TRUNCATE orders;
DELETE FROM orders;
This fires the row-level DELETE trigger for every row. The change buffer captures all removals, and the next differential refresh correctly decrements all reference counts through the standard algebraic delta path — avoiding the need for a full refresh fallback.
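The per-row delta path for an aggregate like customer_totals can be sketched with a simple counter model. This is an illustrative Python analogue (pg_trickle does this in SQL via its derived delta query); the group state shape is an assumption:

```python
# Sketch of differential maintenance for SUM/COUNT GROUP BY under row-level
# deltas: weight +1 for an inserted row, -1 for a deleted row.
from collections import defaultdict

state = defaultdict(lambda: [0.0, 0])   # customer -> [total, order_count]

def apply_delta(customer, amount, weight):
    total, cnt = state[customer]
    state[customer] = [total + weight * amount, cnt + weight]
    if state[customer][1] == 0:         # count reaches zero: group disappears
        del state[customer]

for cust, amt in [("alice", 50.0), ("alice", 30.0), ("bob", 100.0)]:
    apply_delta(cust, amt, +1)

# DELETE FROM orders fires one row trigger per row; each delete is a
# weight -1 delta, so every group drains to zero and is removed:
for cust, amt in [("alice", 50.0), ("alice", 30.0), ("bob", 100.0)]:
    apply_delta(cust, amt, -1)

print(dict(state))  # {}
```

The end state matches a full refresh over the now-empty table, but it was reached purely through per-row deltas, with no fallback.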
| Approach | Speed | Stream table consistent? | Refresh type |
|---|---|---|---|
| TRUNCATE orders | O(1) — instant | Yes — automatic full refresh on next cycle | FULL (fallback) |
| DELETE FROM orders | O(N) — scans all rows | Yes — per-row triggers fire | DIFFERENTIAL |
| TRUNCATE + manual refresh | O(1) + O(query) | Yes — immediately | FULL (manual) |
For tables with millions of rows, DELETE FROM can be slow and generate significant WAL. TRUNCATE is generally the better choice — the automatic full refresh fallback makes it safe to use.
Best Practices
1. TRUNCATE Is Safe to Use
As of v0.2.0, TRUNCATE on tracked source tables is automatically detected and triggers a full refresh on the next scheduler cycle. No manual intervention is required for standard operation.
2. Use Manual Refresh for Immediate Consistency
If you need the stream table to be consistent immediately (not on the next cycle), call refresh explicitly:
TRUNCATE orders;
SELECT pgtrickle.refresh_stream_table('customer_totals');
3. Consider IMMEDIATE Mode for Real-Time Needs
For stream tables that need to reflect TRUNCATE instantly (within the same transaction), use IMMEDIATE mode. The TRUNCATE trigger automatically performs a full refresh synchronously.
4. Consider FULL Mode for ETL-Heavy Tables
If a table is routinely truncated and reloaded, FULL refresh mode may be simpler than DIFFERENTIAL — it naturally handles TRUNCATE because it recomputes from scratch every cycle.
5. Use trigger_inventory() to Verify Triggers
You can verify that both the DML trigger and the TRUNCATE trigger are installed and enabled:
SELECT * FROM pgtrickle.trigger_inventory();
This shows one row per (source table, trigger type) confirming both pg_trickle_cdc_<oid> (DML) and pg_trickle_cdc_truncate_<oid> (TRUNCATE) triggers are present.
How TRUNCATE Compares to Other Operations
| Aspect | INSERT | UPDATE | DELETE | TRUNCATE |
|---|---|---|---|---|
| Row trigger fires? | Yes (per row) | Yes (per row) | Yes (per row) | No |
| Statement trigger fires? | Yes | Yes | Yes | Yes (writes 'T' marker) |
| Change buffer | 1 row per INSERT | 1 row per UPDATE | 1 row per DELETE | 1 marker row (action='T') |
| Stream table updated? | Yes (next refresh) | Yes (next refresh) | Yes (next refresh) | Yes (full refresh on next cycle) |
| Recovery | Automatic (differential) | Automatic (differential) | Automatic (differential) | Automatic (full refresh fallback) |
| FULL mode affected? | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) | N/A (recomputes) |
| IMMEDIATE mode? | Synchronous delta | Synchronous delta | Synchronous delta | Synchronous full refresh |
| Speed | O(1) per row | O(1) per row | O(1) per row | O(1) + O(query) for refresh |
What About IMMEDIATE Mode?
In IMMEDIATE mode, TRUNCATE is handled synchronously within the same transaction:
- The BEFORE TRUNCATE trigger acquires an advisory lock on the stream table
- The AFTER TRUNCATE trigger calls pgt_ivm_handle_truncate(pgt_id)
- This function TRUNCATEs the stream table and re-populates it by re-executing the defining query
- The stream table is immediately consistent — within the same transaction
SELECT pgtrickle.create_stream_table(
name => 'customer_totals_live',
query => $$
SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer
$$,
refresh_mode => 'IMMEDIATE'
);
BEGIN;
TRUNCATE orders;
-- customer_totals_live is already empty here!
SELECT * FROM customer_totals_live; -- (0 rows)
COMMIT;
No waiting for a scheduler cycle, no stale data — TRUNCATE is fully handled in real-time.
Summary
As of v0.2.0, TRUNCATE is fully tracked by pg_trickle across all three refresh modes. While it cannot be captured as per-row DELETE events (PostgreSQL's TRUNCATE doesn't process individual rows), pg_trickle uses a statement-level trigger to detect the event and respond appropriately.
The key takeaways:
- TRUNCATE is automatically handled — a statement-level AFTER TRUNCATE trigger writes a 'T' marker to the change buffer
- DIFFERENTIAL mode: automatic full refresh — the scheduler detects the marker and falls back to a full refresh on the next cycle
- IMMEDIATE mode: synchronous full refresh — the stream table is rebuilt within the same transaction
- FULL mode: naturally immune — every refresh recomputes from scratch regardless
- Manual refresh for instant consistency — call pgtrickle.refresh_stream_table() if you can't wait for the next cycle
- DELETE FROM remains an alternative — fires per-row triggers, enabling incremental delta processing instead of full refresh fallback
Next in This Series
- What Happens When You INSERT a Row? — The full 7-phase lifecycle (start here if you haven't already)
- What Happens When You UPDATE a Row? — D+I split, group key changes, net-effect for multiple UPDATEs
- What Happens When You DELETE a Row? — Reference counting, group deletion, INSERT+DELETE cancellation
Row-Level Security (RLS) on Stream Tables
This tutorial shows how to apply PostgreSQL Row-Level Security to stream tables so that different database roles see only the rows they are permitted to access.
Background
Stream tables materialize the full result set of their defining query,
regardless of any RLS policies on the source tables. This matches the behavior
of PostgreSQL's built-in MATERIALIZED VIEW — the cache contains everything,
and access control is enforced at read time.
The recommended pattern is:
- Source tables: may or may not have RLS. Stream tables always see all rows.
- Stream table: enable RLS on the stream table and create per-role policies so each role sees only its permitted rows.
Setup: Multi-Tenant Orders
-- Source table: all tenant orders
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
tenant_id INT NOT NULL,
product TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL
);
INSERT INTO orders (tenant_id, product, amount) VALUES
(1, 'Widget A', 19.99),
(1, 'Widget B', 9.50),
(2, 'Gadget X', 49.00),
(2, 'Gadget Y', 25.00),
(3, 'Doohickey', 5.00);
-- Stream table: per-tenant spend summary
SELECT pgtrickle.create_stream_table(
name => 'tenant_spend',
query => $$
SELECT tenant_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spend
FROM orders
GROUP BY tenant_id
$$,
schedule => '1m'
);
After the first refresh, tenant_spend contains all three tenants:
SELECT * FROM pgtrickle.tenant_spend ORDER BY tenant_id;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 1 | 2 | 29.49
-- 2 | 2 | 74.00
-- 3 | 1 | 5.00
Step 1: Enable RLS on the Stream Table
ALTER TABLE pgtrickle.tenant_spend ENABLE ROW LEVEL SECURITY;
Once RLS is enabled, non-superuser roles see zero rows unless a policy grants access. The superuser (table owner) bypasses RLS by default.
Step 2: Create Per-Tenant Roles
CREATE ROLE tenant_1 LOGIN;
CREATE ROLE tenant_2 LOGIN;
GRANT USAGE ON SCHEMA pgtrickle TO tenant_1, tenant_2;
GRANT SELECT ON pgtrickle.tenant_spend TO tenant_1, tenant_2;
Step 3: Create RLS Policies
-- Tenant 1 sees only tenant_id = 1
CREATE POLICY tenant_1_policy ON pgtrickle.tenant_spend
FOR SELECT
TO tenant_1
USING (tenant_id = 1);
-- Tenant 2 sees only tenant_id = 2
CREATE POLICY tenant_2_policy ON pgtrickle.tenant_spend
FOR SELECT
TO tenant_2
USING (tenant_id = 2);
Step 4: Verify Filtering
Connect as each tenant role and query:
-- As tenant_1:
SET ROLE tenant_1;
SELECT * FROM pgtrickle.tenant_spend;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 1 | 2 | 29.49
RESET ROLE;
-- As tenant_2:
SET ROLE tenant_2;
SELECT * FROM pgtrickle.tenant_spend;
-- tenant_id | order_count | total_spend
-- -----------+-------------+-------------
-- 2 | 2 | 74.00
RESET ROLE;
Each tenant sees only their own data. The underlying stream table still contains all rows — the filtering happens at query time via RLS.
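The semantics can be pictured as read-time predicate filtering over an unchanged stored set. A minimal sketch, with hypothetical policy predicates mirroring the two policies above:

```python
# Sketch of RLS read-time filtering: the materialized rows never change,
# only the subset visible to each role does.

rows = [
    {"tenant_id": 1, "order_count": 2, "total_spend": 29.49},
    {"tenant_id": 2, "order_count": 2, "total_spend": 74.00},
    {"tenant_id": 3, "order_count": 1, "total_spend": 5.00},
]

policies = {                              # role -> USING predicate
    "tenant_1": lambda r: r["tenant_id"] == 1,
    "tenant_2": lambda r: r["tenant_id"] == 2,
}

def visible(role):
    # Default deny: a role with no matching policy sees zero rows.
    predicate = policies.get(role, lambda r: False)
    return [r for r in rows if predicate(r)]

print(len(visible("tenant_1")))  # 1
print(len(visible("tenant_3")))  # 0
```

The "default deny" branch is why testing with a non-superuser role matters: forgetting a policy silently yields an empty result, not an error.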
How Refresh Works with RLS
Both scheduled and manual refreshes run with superuser-equivalent privileges, so RLS on source tables is always bypassed during refresh. This ensures:
- The stream table always contains the complete result set.
- A refresh_stream_table() call produces the same result regardless of who calls it.
- IMMEDIATE mode (IVM triggers) also bypasses RLS via SECURITY DEFINER trigger functions.
Policy Change Detection
pg_trickle automatically detects RLS-related DDL on source tables:
| DDL on source table | Effect |
|---|---|
| CREATE POLICY / ALTER POLICY / DROP POLICY | Stream table marked for reinit |
| ALTER TABLE ... ENABLE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... DISABLE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... FORCE ROW LEVEL SECURITY | Stream table marked for reinit |
| ALTER TABLE ... NO FORCE ROW LEVEL SECURITY | Stream table marked for reinit |
Since refresh always bypasses RLS, a policy change on a source table does not actually alter what the stream table materializes; these reinits are a conservative safeguard that rebuilds the data so it is guaranteed consistent after the source table's security posture changes.
Tips
- One stream table, many roles: A single stream table can serve all tenants. Each role's RLS policy filters at read time — no per-tenant duplication needed.
- Write policies: Stream tables are maintained by pg_trickle. Restrict writes to the pg_trickle system by only creating FOR SELECT policies.
- Default deny: Once RLS is enabled, roles without a matching policy see zero rows. Always test with a non-superuser role.
- FORCE ROW LEVEL SECURITY: By default, table owners bypass RLS. Use ALTER TABLE ... FORCE ROW LEVEL SECURITY if the owner should also be subject to policies.
Partitioned Tables as Sources
This tutorial shows how pg_trickle works with PostgreSQL's declarative table partitioning. It covers RANGE, LIST, and HASH partitioned source tables, explains what happens when you add or remove partitions, and documents known caveats.
Background
PostgreSQL lets you split large tables into smaller "partitions" — for
example one partition per month for an orders table. This is a common
technique for managing very large datasets. pg_trickle handles partitioned
source tables transparently:
- CDC triggers fire on all partitions. PostgreSQL 13+ automatically clones row-level triggers from the parent to every child partition. All DML (INSERT, UPDATE, DELETE) on any partition is captured in a single change buffer keyed by the parent table's OID.
- ATTACH PARTITION is detected automatically. When you add a new partition with pre-existing data, pg_trickle's DDL event trigger detects the change and marks affected stream tables for reinitialization. No manual intervention required.
- WAL-based CDC works correctly. When using WAL mode, publications are created with publish_via_partition_root = true so all partition changes appear under the parent table's identity.
Example: Monthly Sales Partitions (RANGE)
-- Create a RANGE-partitioned source table
CREATE TABLE sales (
id SERIAL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC NOT NULL,
PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (sale_date);
-- Create partitions for each half of the year
CREATE TABLE sales_h1_2025 PARTITION OF sales
FOR VALUES FROM ('2025-01-01') TO ('2025-07-01');
CREATE TABLE sales_h2_2025 PARTITION OF sales
FOR VALUES FROM ('2025-07-01') TO ('2026-01-01');
-- Insert data across partitions
INSERT INTO sales (sale_date, region, amount) VALUES
('2025-02-15', 'US', 100.00),
('2025-05-20', 'EU', 250.00),
('2025-08-10', 'US', 175.00),
('2025-11-30', 'EU', 300.00);
-- Create a stream table over the partitioned source
SELECT pgtrickle.create_stream_table(
name => 'regional_sales',
query => $$
SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
FROM sales
GROUP BY region
$$,
schedule => '1 minute',
refresh_mode => 'DIFFERENTIAL'
);
-- Refresh to populate
SELECT pgtrickle.refresh_stream_table('regional_sales');
-- Verify — aggregates span all partitions:
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+--------+-----
-- EU | 550.00 | 2
-- US | 275.00 | 2
Adding New Partitions
When you add a new partition, any new rows inserted through the parent are automatically captured by CDC triggers. The trigger on the parent is cloned to the new partition by PostgreSQL.
-- Add a new partition for 2026
CREATE TABLE sales_h1_2026 PARTITION OF sales
FOR VALUES FROM ('2026-01-01') TO ('2026-07-01');
-- Inserts into the new partition are captured normally
INSERT INTO sales (sale_date, region, amount)
VALUES ('2026-03-15', 'US', 400.00);
-- Next refresh picks up the new row
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+--------+-----
-- EU | 550.00 | 2
-- US | 675.00 | 3
ATTACH PARTITION with Pre-Existing Data
The most important edge case: attaching a table that already contains rows. These rows were never seen by CDC triggers, so the stream table would be stale. pg_trickle detects this automatically.
-- Create a standalone table with existing data
CREATE TABLE sales_h2_2026 (
id SERIAL,
sale_date DATE NOT NULL,
region TEXT NOT NULL,
amount NUMERIC NOT NULL,
PRIMARY KEY (id, sale_date)
);
INSERT INTO sales_h2_2026 (sale_date, region, amount) VALUES
('2026-08-01', 'EU', 500.00),
('2026-09-15', 'US', 200.00);
-- Attach it to the partitioned table
ALTER TABLE sales ATTACH PARTITION sales_h2_2026
FOR VALUES FROM ('2026-07-01') TO ('2027-01-01');
-- pg_trickle detects the partition change and marks the stream table
-- for reinitialize. Check:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
-- pgt_name | needs_reinit
-- -----------------+--------------
-- regional_sales | t
-- The next refresh reinitializes — re-reading all data from scratch:
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- region | total | cnt
-- --------+---------+-----
-- EU | 1050.00 | 3
-- US | 875.00 | 4
DETACH PARTITION
When you detach a partition, the detached table's data is no longer visible through the parent. pg_trickle detects this too and marks stream tables for reinitialize.
-- Archive the old partition
ALTER TABLE sales DETACH PARTITION sales_h1_2025;
-- Stream table is marked for reinit:
SELECT pgt_name, needs_reinit
FROM pgtrickle.pgt_stream_tables
WHERE pgt_name = 'regional_sales';
-- pgt_name | needs_reinit
-- -----------------+--------------
-- regional_sales | t
-- After refresh, the detached partition's rows are gone:
SELECT pgtrickle.refresh_stream_table('regional_sales');
SELECT * FROM regional_sales ORDER BY region;
-- (only rows from remaining partitions)
LIST Partitioning
LIST partitioning splits rows by discrete values. It works identically:
CREATE TABLE events (
id SERIAL,
region TEXT NOT NULL,
payload TEXT,
PRIMARY KEY (id, region)
) PARTITION BY LIST (region);
CREATE TABLE events_us PARTITION OF events FOR VALUES IN ('US');
CREATE TABLE events_eu PARTITION OF events FOR VALUES IN ('EU');
CREATE TABLE events_ap PARTITION OF events FOR VALUES IN ('AP');
SELECT pgtrickle.create_stream_table(
name => 'event_counts',
query => 'SELECT region, count(*) AS cnt FROM events GROUP BY region',
schedule => '1 minute'
);
HASH Partitioning
HASH partitioning distributes rows across a fixed number of partitions. Useful for spreading write load evenly:
CREATE TABLE metrics (
id SERIAL PRIMARY KEY,
sensor_id INT NOT NULL,
value DOUBLE PRECISION
) PARTITION BY HASH (id);
CREATE TABLE metrics_0 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE metrics_1 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE metrics_2 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE metrics_3 PARTITION OF metrics
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
SELECT pgtrickle.create_stream_table(
name => 'sensor_avg',
query => $$
SELECT sensor_id, AVG(value) AS avg_val, COUNT(*) AS cnt
FROM metrics GROUP BY sensor_id
$$,
schedule => '1 minute'
);
Foreign Tables
Tables from other databases (via postgres_fdw) can be used as sources,
but with restrictions:
- No trigger-based CDC — foreign tables don't support row-level triggers.
- No WAL-based CDC — foreign tables don't generate local WAL.
- FULL refresh works — SELECT * executes a remote query each time.
- Polling-based CDC works — when pg_trickle.foreign_table_polling is enabled, pg_trickle creates a local snapshot table and detects changes via EXCEPT ALL comparison.
When you use a foreign table as a source, pg_trickle emits an info message explaining the limitations:
CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'remote-host', dbname 'analytics');
CREATE USER MAPPING FOR CURRENT_USER
SERVER remote_db OPTIONS (user 'reader');
CREATE FOREIGN TABLE remote_orders (
id INT,
amount NUMERIC
) SERVER remote_db OPTIONS (table_name 'orders');
-- Only FULL refresh is available:
SELECT pgtrickle.create_stream_table(
name => 'remote_totals',
query => 'SELECT SUM(amount) AS total FROM remote_orders',
schedule => '5 minutes',
refresh_mode => 'FULL'
);
-- INFO: pg_trickle: source table remote_orders is a foreign table.
-- Foreign tables cannot use trigger-based or WAL-based CDC —
-- only FULL refresh mode or polling-based change detection is supported.
Known Caveats
| Caveat | Description |
|---|---|
| PostgreSQL 13+ required | Parent-table triggers only propagate to child partitions on PG 13+. pg_trickle targets PostgreSQL 18, so this is always satisfied. |
| Partition key in PRIMARY KEY | PostgreSQL requires the partition key to be part of any unique constraint. This means your PRIMARY KEY must include the partition column. |
| ATTACH with data = reinitialize | Attaching a partition with pre-existing rows triggers a full reinitialize on the next refresh. For very large tables, this may be slow. Consider gating the source with pgtrickle.gate_source() during bulk partition operations. |
| Sub-partitioning | Multi-level partitioning (partitions of partitions) works in principle because triggers propagate through the entire hierarchy, but it is not extensively tested. |
| pg_partman compatibility | pg_partman dynamically creates and drops partitions. Since pg_trickle detects ATTACH/DETACH via DDL event triggers, it should work, but this combination is not yet tested. |
| Partitioned storage tables | Using a partitioned table as the stream table's storage is not supported. This is tracked for a future release. |
| DETACH PARTITION CONCURRENTLY | DETACH PARTITION ... CONCURRENTLY is a two-phase operation. The DDL event trigger fires after the first phase; the partition is not fully detached until the second phase commits. The stream table may briefly reflect the old partition count. |
Foreign Table Sources
This tutorial shows how to use a postgres_fdw foreign table as a source for
a stream table. Foreign tables let you aggregate data from remote PostgreSQL
databases into a local stream table that refreshes automatically.
Background
PostgreSQL's Foreign Data Wrapper
(postgres_fdw) lets you define tables that transparently query a remote
database. pg_trickle can use these foreign tables as stream table sources,
but with different change-detection semantics than regular tables.
Key difference: Foreign tables cannot use trigger-based or WAL-based CDC. Changes are detected either by re-scanning the entire remote table (FULL refresh) or by comparing a local snapshot to the remote data (polling-based CDC).
Step 1 — Set Up the Foreign Server
-- Enable the foreign data wrapper extension
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
-- Create a connection to the remote database
CREATE SERVER warehouse_db
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'warehouse.example.com', dbname 'analytics', port '5432');
-- Map the current user to a remote user
CREATE USER MAPPING FOR CURRENT_USER
SERVER warehouse_db
OPTIONS (user 'readonly_user', password 'secret');
Step 2 — Define the Foreign Table
CREATE FOREIGN TABLE remote_orders (
id INT,
customer_id INT,
amount NUMERIC(12,2),
region TEXT,
created_at TIMESTAMP
) SERVER warehouse_db
OPTIONS (schema_name 'public', table_name 'orders');
Alternatively, import an entire remote schema:
IMPORT FOREIGN SCHEMA public
LIMIT TO (orders, customers)
FROM SERVER warehouse_db
INTO public;
Step 3 — Create a Stream Table with FULL Refresh
The simplest approach uses FULL refresh mode — pg_trickle re-executes the query against the remote table on every refresh cycle:
SELECT pgtrickle.create_stream_table(
name => 'orders_by_region',
query => $$
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM remote_orders
GROUP BY region
$$,
schedule => '5m',
refresh_mode => 'FULL'
);
pg_trickle will emit an informational message:
INFO: pg_trickle: source table remote_orders is a foreign table.
Foreign tables cannot use trigger-based or WAL-based CDC —
only FULL refresh mode or polling-based change detection is supported.
How FULL refresh works with foreign tables:
- Every 5 minutes, pg_trickle executes the defining query.
- The query is sent to the remote database via postgres_fdw.
- The complete result set replaces the stream table contents.
- This is equivalent to a MATERIALIZED VIEW refresh, but automated.
Step 4 — Polling-Based CDC (Optional)
If the remote table is large and changes are small, FULL refresh becomes expensive because it transfers the entire result set every cycle. Polling-based CDC provides a more efficient alternative:
-- Enable polling globally (or per-session)
SET pg_trickle.foreign_table_polling = on;
-- Now create with DIFFERENTIAL mode — pg_trickle will use polling
SELECT pgtrickle.create_stream_table(
name => 'orders_by_region_polling',
query => $$
SELECT
region,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_order_value
FROM remote_orders
GROUP BY region
$$,
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
How polling works:
- On the first refresh, pg_trickle creates a local snapshot table that mirrors the remote table's data.
- On subsequent refreshes, it fetches the current remote data and computes an EXCEPT ALL difference against the snapshot.
- Only the changed rows are written to the change buffer and processed through the incremental delta pipeline.
- The snapshot table is updated to reflect the new remote state.
- When the stream table is dropped, the snapshot table is cleaned up.
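The diffing step is a multiset difference, analogous to SQL's EXCEPT ALL. A minimal sketch of that comparison (row shapes are illustrative):

```python
# Sketch of polling-based change detection: compare the local snapshot
# to the current remote rows as multisets (duplicates count).
from collections import Counter

snapshot = Counter([("US", 100.0), ("EU", 250.0), ("EU", 250.0)])
remote   = Counter([("US", 100.0), ("EU", 250.0), ("AP", 75.0)])

deletes = snapshot - remote   # rows gone from the remote since last poll
inserts = remote - snapshot   # rows that appeared since last poll

print(dict(deletes))  # {('EU', 250.0): 1}
print(dict(inserts))  # {('AP', 75.0): 1}

# These deltas become D and I events in the change buffer; the snapshot is
# then replaced by the current remote state, ready for the next poll.
```

Using multisets rather than sets is essential: if the remote table holds two identical rows and one is deleted, a set-based diff would miss it.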
Trade-offs:
| Aspect | FULL Refresh | Polling CDC |
|---|---|---|
| Network transfer | Full result set every cycle | Full remote scan, but only diffs applied |
| Local storage | Stream table only | Stream table + snapshot table |
| Best for | Small remote tables | Large remote tables with small change rates |
| GUC required | No | pg_trickle.foreign_table_polling = on |
Step 5 — Verify and Monitor
-- Check stream table status
SELECT * FROM pgtrickle.pgt_status('orders_by_region');
-- Check CDC health (will show foreign table constraints)
SELECT * FROM pgtrickle.check_cdc_health();
-- View refresh history
SELECT * FROM pgtrickle.get_refresh_history('orders_by_region', 5);
-- Monitor staleness
SELECT * FROM pgtrickle.get_staleness('orders_by_region');
Worked Example — Remote Inventory Dashboard
This example aggregates inventory data from a remote warehouse database into a local dashboard table:
-- Remote table definition
CREATE FOREIGN TABLE remote_inventory (
sku TEXT,
warehouse TEXT,
quantity INT,
updated_at TIMESTAMP
) SERVER warehouse_db
OPTIONS (schema_name 'inventory', table_name 'stock_levels');
-- Dashboard: inventory summary by warehouse
SELECT pgtrickle.create_stream_table(
name => 'inventory_dashboard',
query => $$
SELECT
warehouse,
COUNT(DISTINCT sku) AS unique_products,
SUM(quantity) AS total_units,
MIN(updated_at) AS oldest_update,
MAX(updated_at) AS newest_update
FROM remote_inventory
GROUP BY warehouse
$$,
schedule => '10m',
refresh_mode => 'FULL'
);
After the first refresh:
SELECT * FROM inventory_dashboard;
warehouse | unique_products | total_units | oldest_update | newest_update
-----------+-----------------+-------------+---------------------+---------------------
east | 142 | 23500 | 2026-03-14 08:00:00 | 2026-03-14 09:15:00
west | 98 | 15200 | 2026-03-14 07:30:00 | 2026-03-14 09:10:00
central | 215 | 41000 | 2026-03-14 06:00:00 | 2026-03-14 09:20:00
Constraints and Caveats
| Constraint | Details |
|---|---|
| No trigger CDC | Foreign tables don't support PostgreSQL row-level triggers. |
| No WAL CDC | Foreign tables don't generate local WAL entries. |
| Network latency | Each refresh cycle queries the remote database. Schedule accordingly. |
| Remote availability | If the remote database is down, the refresh will fail (logged in pgt_refresh_history). The stream table retains its last successful data. |
| Authentication | CREATE USER MAPPING credentials must remain valid. Use .pgpass or environment variables in production. |
| Snapshot storage | Polling CDC creates a snapshot table sized proportionally to the remote table. Monitor disk usage. |
FAQ
Q: Why does my foreign table stream table only work in FULL mode?
Foreign tables cannot install row-level triggers (the mechanism pg_trickle uses
for trigger-based CDC) and don't generate local WAL records (used by WAL-based
CDC). FULL refresh works because it simply re-executes the remote query.
Enable pg_trickle.foreign_table_polling if you need differential-style
change detection.
Q: Can I mix foreign and local tables in the same defining query?
Yes. If your query joins a foreign table with a local table, pg_trickle uses trigger/WAL CDC for the local table and FULL-rescan or polling for the foreign table. The refresh mode must be FULL unless polling is enabled for the foreign table sources.
Q: What happens if the remote database is temporarily unavailable?
The refresh attempt fails, is logged in pgt_refresh_history with status
FAILED, and the consecutive_errors counter increments. The stream table
retains its last successful data. When the remote database recovers, the next
scheduled refresh succeeds and the error counter resets.
Tutorial: Migrating from Materialized Views
This guide shows how to incrementally migrate existing PostgreSQL
MATERIALIZED VIEW + manual REFRESH workflows to pg_trickle stream
tables.
Why Migrate?
| | Materialized View | Stream Table |
|---|---|---|
| Refresh | Manual (REFRESH MATERIALIZED VIEW) | Automatic (scheduler) or manual |
| Incremental refresh | Not supported | Built-in differential mode |
| Blocking reads | REFRESH without CONCURRENTLY blocks readers | Never blocks readers |
| Dependency ordering | Manual | Automatic (DAG-aware topological refresh) |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| Scheduling | External (cron, pg_cron) | Native (duration, cron, CALCULATED) |
Step-by-Step Migration
1. Identify materialized views to migrate
-- List all materialized views with their defining queries
SELECT schemaname, matviewname, definition
FROM pg_matviews
ORDER BY schemaname, matviewname;
2. Create the stream table
Take the materialized view's defining query and pass it to
create_stream_table():
Before (materialized view):
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id;
-- Refreshed via cron or pg_cron:
-- */5 * * * * psql -c "REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals"
After (stream table):
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer_id',
schedule => '5m'
);
3. Update application queries
Stream tables live in the pgtrickle schema by default. Update your
application queries to reference the new location:
-- Before:
SELECT * FROM public.order_totals WHERE total > 1000;
-- After:
SELECT * FROM pgtrickle.order_totals WHERE total > 1000;
Or create a view in the original schema for backward compatibility:
CREATE VIEW public.order_totals AS
SELECT customer_id, total, order_count
FROM pgtrickle.order_totals;
4. Recreate indexes
Stream tables are regular heap tables — you can add indexes just like any other table. Recreate the indexes your queries depend on:
-- Before (on materialized view):
CREATE UNIQUE INDEX ON order_totals (customer_id);
-- After (on stream table):
CREATE INDEX ON pgtrickle.order_totals (customer_id);
Note: The `__pgt_row_id` column is the primary key on stream tables. You cannot add another primary key, but you can add regular or `UNIQUE` indexes on your business columns.
5. Remove the old materialized view
Once you've verified the stream table is working correctly:
DROP MATERIALIZED VIEW IF EXISTS public.order_totals;
6. Remove external refresh jobs
Delete any cron jobs, pg_cron entries, or application-level refresh triggers that were maintaining the old materialized view.
Migrating Concurrent Refresh Patterns
If you use REFRESH MATERIALIZED VIEW CONCURRENTLY (which requires a
unique index), the stream table equivalent is simpler — differential
refresh never blocks readers and doesn't require a unique index:
Before:
CREATE MATERIALIZED VIEW active_users AS
SELECT user_id, MAX(login_at) AS last_login
FROM logins
WHERE login_at > NOW() - INTERVAL '30 days'
GROUP BY user_id;
CREATE UNIQUE INDEX ON active_users (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY active_users;
After:
SELECT pgtrickle.create_stream_table(
name => 'active_users',
query => 'SELECT user_id, MAX(login_at) AS last_login
FROM logins
WHERE login_at > NOW() - INTERVAL ''30 days''
GROUP BY user_id',
schedule => '1m'
);
-- No unique index needed. No manual refresh needed.
Migrating Cascading Materialized Views
If you have materialized views that depend on other materialized views, the migration is straightforward — pg_trickle handles dependency ordering automatically:
Before:
CREATE MATERIALIZED VIEW order_totals AS
SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;
CREATE MATERIALIZED VIEW big_customers AS
SELECT customer_id, total FROM order_totals WHERE total > 1000;
-- Must refresh in order:
REFRESH MATERIALIZED VIEW order_totals;
REFRESH MATERIALIZED VIEW big_customers;
After:
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m'
);
-- Dependency ordering is automatic. No manual refresh needed.
Idempotent Deployment
For CI/CD pipelines, use create_or_replace_stream_table() so your
migration scripts are safe to re-run:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
Choosing the Right Refresh Mode
| Scenario | Mode |
|---|---|
| Most migrations | DIFFERENTIAL — only processes changes |
| Volatile functions (NOW(), RANDOM()) in the query | FULL — the query result changes even without source DML |
| Need real-time consistency within a transaction | IMMEDIATE |
| Unsure | AUTO (default) — pg_trickle picks the best mode per cycle |
Migration Checklist
- Identify all materialized views and their refresh schedules
- Create equivalent stream tables with matching queries
- Recreate any required indexes on the stream tables
- Update application queries to reference the `pgtrickle` schema
- Verify data correctness (compare stream table vs. materialized view)
- Remove external refresh jobs (cron, pg_cron)
- Drop the old materialized views
- Set up monitoring (Prometheus/Grafana or built-in views)
Further Reading
- Getting Started
- SQL Reference — create_stream_table()
- SQL Reference — create_or_replace_stream_table()
- FAQ — Materialized View vs Stream Table
Tutorial: Fuse Circuit Breaker
The fuse circuit breaker (v0.11.0+) suspends differential refreshes when the incoming change volume exceeds a threshold. This protects your database from runaway refresh cycles during bulk data loads, accidental mass-deletes, or migration scripts.
When to Use It
- Bulk ETL loads — loading millions of rows that would overwhelm a differential refresh
- Data migration scripts — large schema or data changes that temporarily spike the change buffer
- Protection against accidents — an errant `DELETE FROM orders` shouldn't silently cascade through all downstream stream tables
How It Works
Normal operation Fuse blows After reset
───────────────── ───────────────── ─────────────────
Source DML ──▶ CDC ──▶ Refresh Source DML ──▶ CDC ──▶ BLOCKED Source DML ──▶ CDC ──▶ Refresh
│ (resumed)
▼
NOTIFY alert
(fuse_blown)
- Each refresh cycle, the scheduler counts pending changes in the buffer.
- If the count exceeds `fuse_ceiling` for `fuse_sensitivity` consecutive cycles, the fuse blows.
- The stream table enters a paused state — no refreshes occur.
- A `fuse_blown` alert is emitted via `NOTIFY pg_trickle_alert`.
- An operator investigates and calls `reset_fuse()` to resume.
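The blow condition above — ceiling exceeded for a run of consecutive cycles — can be sketched as a small state machine. This is an illustrative model, not pg_trickle's internal implementation; the class and method names are hypothetical.

```python
class Fuse:
    """Illustrative sketch of the fuse blow logic: the fuse blows only after
    `sensitivity` consecutive scheduler cycles exceed `ceiling`."""

    def __init__(self, ceiling: int, sensitivity: int):
        self.ceiling = ceiling          # max pending changes per cycle
        self.sensitivity = sensitivity  # consecutive over-ceiling cycles required
        self.over_count = 0
        self.state = "armed"

    def observe_cycle(self, pending_changes: int) -> str:
        """Called once per scheduler cycle; returns the resulting fuse state."""
        if self.state == "blown":
            return self.state           # a blown fuse stays blown until reset
        if pending_changes > self.ceiling:
            self.over_count += 1
            if self.over_count >= self.sensitivity:
                self.state = "blown"
        else:
            self.over_count = 0         # the streak resets on any normal cycle
        return self.state

    def reset(self):
        """Mirrors reset_fuse(): re-arm and clear the streak."""
        self.over_count = 0
        self.state = "armed"
```

Note that a single over-ceiling cycle followed by a quiet one does not blow the fuse — the streak counter resets, which is what makes `fuse_sensitivity` a guard against transient spikes.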
Step-by-Step Example
1. Create a stream table with fuse protection
SELECT pgtrickle.create_stream_table(
name => 'category_summary',
query => 'SELECT category, COUNT(*) AS cnt, SUM(price) AS total
FROM products GROUP BY category',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- Arm the fuse: blow when pending changes exceed 50,000 rows
SELECT pgtrickle.alter_stream_table(
'category_summary',
fuse => 'on',
fuse_ceiling => 50000,
fuse_sensitivity => 3 -- require 3 consecutive over-ceiling cycles
);
2. Observe normal operation
-- Insert a small batch — well under the ceiling
INSERT INTO products (name, category, price)
SELECT 'Product ' || i, 'Electronics', 9.99
FROM generate_series(1, 100) i;
-- After the next refresh cycle, the stream table is updated normally
SELECT * FROM pgtrickle.category_summary;
3. Trigger a bulk load
-- Simulate a large ETL load — 100,000 rows
INSERT INTO products (name, category, price)
SELECT 'Bulk ' || i, 'Imported', 4.99
FROM generate_series(1, 100000) i;
After fuse_sensitivity scheduler cycles (3 in our example), the fuse
blows. The stream table stops refreshing.
4. Inspect the fuse state
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at, blow_reason
FROM pgtrickle.fuse_status();
name | fuse_mode | fuse_state | fuse_ceiling | blown_at | blow_reason
-------------------+-----------+------------+--------------+----------------------------+---------------------------
category_summary | on | blown | 50000 | 2026-03-31 14:22:01.123+00 | change_count_exceeded
5. Decide how to recover
You have three options:
-- Option A: Apply the changes (process the bulk load normally)
SELECT pgtrickle.reset_fuse('category_summary', action => 'apply');
-- Option B: Skip the changes (discard the batch, resume from current state)
SELECT pgtrickle.reset_fuse('category_summary', action => 'skip_changes');
-- Option C: Reinitialize (full rebuild from the defining query)
SELECT pgtrickle.reset_fuse('category_summary', action => 'reinitialize');
After resetting, the fuse returns to 'armed' state and the scheduler
resumes.
Fuse Modes
| Mode | Behavior |
|---|---|
| `'off'` | No fuse protection (default) |
| `'on'` | Always armed — blows when changes exceed `fuse_ceiling` |
| `'auto'` | Blows only when a FULL refresh would be cheaper than DIFFERENTIAL |
'auto' mode is recommended for most use cases — it protects against
bulk loads while allowing large-but-efficient differential refreshes to
proceed.
Using with dbt
In dbt models, configure the fuse via the stream_table materialization:
-- models/marts/category_summary.sql
{{ config(
materialized='stream_table',
schedule='5m',
refresh_mode='DIFFERENTIAL',
fuse='auto',
fuse_ceiling=50000,
fuse_sensitivity=3
) }}
SELECT category, COUNT(*) AS cnt, SUM(price) AS total
FROM {{ source('raw', 'products') }}
GROUP BY category
Global Defaults
Set a cluster-wide default ceiling via the pg_trickle.fuse_default_ceiling
GUC. Stream tables with fuse_ceiling = NULL inherit this value:
ALTER SYSTEM SET pg_trickle.fuse_default_ceiling = 100000;
SELECT pg_reload_conf();
Monitoring
- `pgtrickle.fuse_status()` — inspect fuse state for all stream tables
- `LISTEN pg_trickle_alert` — receive real-time `fuse_blown` notifications
- `pgtrickle.dedup_stats()` — includes fuse-related counters
- `pgtrickle.pgt_stream_tables.fuse_state` — direct catalog query
Further Reading
- SQL Reference — fuse_status()
- SQL Reference — reset_fuse()
- Configuration — fuse_default_ceiling
- Tutorial: ETL & Bulk Load Patterns
Tutorial: Tiered Scheduling
Tiered scheduling (v0.12.0+) lets you assign refresh priorities to stream tables using four tiers: Hot, Warm, Cold, and Frozen. This reduces CPU and I/O overhead by refreshing less-critical tables less frequently.
When to Use It
- You have many stream tables (50+) and want to reduce scheduler load
- Some tables power real-time dashboards (need hot refresh) while others serve weekly reports (can be cold)
- You want to freeze tables during maintenance windows without dropping them
Tier Overview
| Tier | Multiplier | Effect |
|---|---|---|
| `hot` | 1× | Refresh at the configured schedule (default) |
| `warm` | 2× | Refresh at 2× the configured interval |
| `cold` | 10× | Refresh at 10× the configured interval |
| `frozen` | skip | Never refreshed until manually promoted |
For a stream table with schedule => '1m':
| Tier | Effective Interval |
|---|---|
| hot | 1 minute |
| warm | 2 minutes |
| cold | 10 minutes |
| frozen | never |
Note: Cron-based schedules are not affected by the tier multiplier. They always fire at the configured cron time.
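The multiplier arithmetic is simple enough to sketch directly. This is an illustrative model of the table above (duration schedules only — per the note, cron schedules ignore the multiplier); the function name is hypothetical.

```python
# Tier multipliers as described above; 'frozen' has no finite interval.
TIER_MULTIPLIER = {"hot": 1, "warm": 2, "cold": 10}

def effective_interval_seconds(schedule_seconds: int, tier: str):
    """Return the effective refresh interval for a duration-based schedule,
    or None for a frozen table (never refreshed until promoted)."""
    if tier == "frozen":
        return None
    return schedule_seconds * TIER_MULTIPLIER[tier]
```

For example, the `customer_lifetime_value` table later in this tutorial has a 5-minute schedule, so demoting it to `cold` yields a 50-minute effective interval.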
Step-by-Step Example
1. Enable tiered scheduling
Tiered scheduling is enabled by default since v0.12.0. Verify:
SHOW pg_trickle.tiered_scheduling;
-- Should return: on
2. Create stream tables with different priorities
-- Real-time dashboard — stays hot (default)
SELECT pgtrickle.create_stream_table(
name => 'live_order_count',
query => 'SELECT COUNT(*) AS total FROM orders WHERE status = ''active''',
schedule => '30s'
);
-- Important but not latency-critical
SELECT pgtrickle.create_stream_table(
name => 'daily_revenue',
query => 'SELECT DATE_TRUNC(''day'', created_at) AS day, SUM(amount) AS revenue
FROM orders GROUP BY 1',
schedule => '1m'
);
-- Weekly report — rarely queried
SELECT pgtrickle.create_stream_table(
name => 'customer_lifetime_value',
query => 'SELECT customer_id, SUM(amount) AS lifetime_value
FROM orders GROUP BY customer_id',
schedule => '5m'
);
3. Assign tiers
-- live_order_count stays at 'hot' (default) — refreshes every 30s
-- daily_revenue: 2× multiplier → effective interval = 2 minutes
SELECT pgtrickle.alter_stream_table('daily_revenue', tier => 'warm');
-- customer_lifetime_value: 10× multiplier → effective interval = 50 minutes
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'cold');
4. Verify effective schedules
SELECT pgt_name, schedule, refresh_tier,
CASE refresh_tier
WHEN 'hot' THEN schedule
WHEN 'warm' THEN schedule || ' ×2'
WHEN 'cold' THEN schedule || ' ×10'
WHEN 'frozen' THEN 'never'
END AS effective
FROM pgtrickle.pgt_stream_tables
ORDER BY refresh_tier;
5. Freeze a table during maintenance
-- Freeze before a schema migration
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'frozen');
-- ... perform migration ...
-- Promote back when ready
SELECT pgtrickle.alter_stream_table('customer_lifetime_value', tier => 'warm');
Choosing the Right Tier
| Use Case | Recommended Tier |
|---|---|
| Real-time dashboards, alerting tables | hot |
| Operational reports queried hourly | warm |
| Weekly/monthly analytics, batch consumers | cold |
| Tables under maintenance, seasonal reports | frozen |
Rules of thumb:
- Start with everything at hot (the default). Move tables to warm or cold as you identify which ones can tolerate more staleness.
- Warm halves the refresh CPU cost compared to hot.
- Cold reduces refresh overhead by 90%.
- Use frozen sparingly — changes accumulate in the buffer and will be processed when you promote the table back.
Monitoring Tiers
-- Check which tables are in which tier
SELECT pgt_name, refresh_tier, status, staleness
FROM pgtrickle.stream_tables_info
ORDER BY refresh_tier, staleness DESC;
-- Find frozen tables (these are NOT being refreshed)
SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';
Troubleshooting
All tables are frozen and nothing is refreshing:
If every stream table is set to frozen, the scheduler has nothing to do.
Promote at least one table back to hot or warm.
Staleness exceeds expectations for cold tables:
Remember that cold applies a 10× multiplier. A 5-minute schedule becomes
a 50-minute effective interval. If this is too stale, use warm instead.
Further Reading
Tutorial: Tuning Refresh Mode
This tutorial walks you through using pg_trickle's built-in diagnostics to determine whether your stream tables are running in the most efficient refresh mode (FULL vs DIFFERENTIAL), and how to act on the recommendations.
Prerequisites
- pg_trickle v0.14.0 or later
- At least one stream table with several completed refresh cycles (the diagnostics become more accurate with more history)
Step 1: Check Current Refresh Efficiency
Start by reviewing how your stream tables are performing with their current refresh mode:
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency();
Example output:
| pgt_name | refresh_mode | diff_count | full_count | avg_diff_ms | avg_full_ms | diff_speedup |
|---|---|---|---|---|---|---|
| order_totals | DIFFERENTIAL | 142 | 3 | 12.4 | 850.2 | 68.6x |
| user_stats | FULL | 0 | 145 | — | 320.1 | — |
| daily_metrics | DIFFERENTIAL | 98 | 47 | 425.8 | 410.3 | 1.0x |
Key observations:
- order_totals: DIFFERENTIAL is 68× faster — this is a great fit.
- user_stats: Running in FULL mode with no DIFFERENTIAL history — worth checking if DIFFERENTIAL would be faster.
- daily_metrics: DIFFERENTIAL and FULL take about the same time (1.0× speedup). FULL might actually be simpler and more predictable here.
Step 2: Get Recommendations
Use recommend_refresh_mode() to get weighted-signal recommendations:
SELECT pgt_name, current_mode, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
Example output:
| pgt_name | current_mode | recommended_mode | confidence | reason |
|---|---|---|---|---|
| order_totals | DIFFERENTIAL | KEEP | high | DIFFERENTIAL is 68.6× faster than FULL with low latency variance |
| user_stats | FULL | DIFFERENTIAL | medium | Query is simple (no complex joins), change ratio is low (2.1%), target table is large |
| daily_metrics | DIFFERENTIAL | FULL | medium | DIFFERENTIAL shows no speedup over FULL (1.0×); high latency variance (p95/p50 = 4.2) suggests unstable performance |
For a single table with full signal details:
SELECT recommended_mode, confidence, reason,
jsonb_pretty(signals) AS signal_details
FROM pgtrickle.recommend_refresh_mode('daily_metrics');
Step 3: Understand the Signals
The signals JSONB column contains the detailed breakdown of all seven
weighted signals that contributed to the recommendation:
{
"composite_score": -0.22,
"signals": [
{ "name": "change_ratio_avg", "score": -0.1, "weight": 0.30 },
{ "name": "empirical_timing", "score": -0.3, "weight": 0.35 },
{ "name": "change_ratio_current", "score": -0.2, "weight": 0.25 },
{ "name": "query_complexity", "score": 0.0, "weight": 0.10 },
{ "name": "target_size", "score": 0.1, "weight": 0.10 },
{ "name": "index_coverage", "score": 0.0, "weight": 0.05 },
{ "name": "latency_variance", "score": -0.4, "weight": 0.05 }
]
}
Positive scores favour DIFFERENTIAL; negative scores favour FULL. A composite score above +0.15 recommends DIFFERENTIAL; below −0.15 recommends FULL; in between, the current mode is near-optimal (KEEP).
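The threshold logic can be sketched as follows. This is an illustrative model of the composite-score bands described above — the actual scoring pipeline may normalize or round differently (the example JSON's `composite_score` is not a plain weighted sum of the shown signals), so treat the function as a sketch, not the extension's exact arithmetic.

```python
def recommend(signals):
    """Combine per-signal scores by weight and map the composite onto the
    recommendation bands: > +0.15 → DIFFERENTIAL, < −0.15 → FULL, else KEEP."""
    composite = sum(s["score"] * s["weight"] for s in signals)
    if composite > 0.15:
        return "DIFFERENTIAL", composite
    if composite < -0.15:
        return "FULL", composite
    return "KEEP", composite
```

A composite of −0.22, as in the `daily_metrics` example, falls below the −0.15 band and therefore recommends FULL.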
Confidence levels:
| Level | Meaning |
|---|---|
| `high` | 10+ completed refresh cycles; strong signal agreement |
| `medium` | 5–10 cycles or mixed signals |
| `low` | Fewer than 5 cycles; recommendation is speculative |
Step 4: Apply the Recommendation
If you decide to follow a recommendation, use ALTER STREAM TABLE:
-- Switch daily_metrics from DIFFERENTIAL to FULL
SELECT pgtrickle.alter_stream_table('daily_metrics',
refresh_mode => 'FULL'
);
Or switch a table to DIFFERENTIAL:
-- Switch user_stats to DIFFERENTIAL mode
SELECT pgtrickle.alter_stream_table('user_stats',
refresh_mode => 'DIFFERENTIAL'
);
The change takes effect on the next refresh cycle. No data is lost during the transition.
Step 5: Monitor After the Change
After switching modes, wait for several refresh cycles and re-check:
-- Wait a few minutes, then re-check efficiency
SELECT pgt_name, refresh_mode, diff_count, full_count,
avg_diff_ms, avg_full_ms, diff_speedup
FROM pgtrickle.refresh_efficiency()
WHERE pgt_name = 'daily_metrics';
Run the recommendation function again to verify the change was beneficial:
SELECT recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode('daily_metrics');
If the recommendation now says KEEP, the new mode is working well.
Common Scenarios
High-cardinality aggregates
Stream tables with SUM/COUNT/AVG over high-cardinality GROUP BY keys
(1000+ groups) are almost always better in DIFFERENTIAL mode. pg_trickle
warns about low-cardinality groups at creation time (DIAG-2).
Small tables with frequent full rewrites
If the source table is small (< 10,000 rows) and changes affect > 30% of rows per cycle, FULL refresh is often faster because it avoids the overhead of change tracking and delta application.
Complex multi-join queries
Queries with 4+ JOINs may have high DIFFERENTIAL overhead due to the
delta propagation rules. If diff_speedup is below 2×, consider FULL mode.
Tables with volatile functions
Stream tables using volatile functions (e.g., now(), random()) must use
FULL mode. pg_trickle rejects volatile functions in DIFFERENTIAL mode at
creation time.
Using the TUI
The pgtrickle TUI provides a visual diagnostics panel. Press 5 or d
in the interactive dashboard to open the diagnostics view, which shows
recommendations with confidence levels for all stream tables at a glance.
From the CLI:
# Show recommendations for all tables
pgtrickle diag
# Show recommendations in JSON format (for automation)
pgtrickle diag --format json
See Also
- SQL Reference: recommend_refresh_mode() — Full function documentation
- SQL Reference: refresh_efficiency() — Efficiency metrics documentation
- Configuration: agg_diff_cardinality_threshold — Cardinality warning threshold
- DVM Operators — Full operator support matrix
Tutorial: Circular Dependencies
pg_trickle supports circular (cyclic) stream table dependencies (v0.7.0+) for queries that use only monotone operators. The scheduler groups circular dependencies into Strongly Connected Components (SCCs) and iterates them to a fixed point.
When to Use It
- Transitive closure — computing all reachable nodes in a graph
- Graph reachability — finding all paths between nodes
- Iterative convergence — mutual dependencies that stabilize after a few iterations
Prerequisites
Circular dependencies are disabled by default. Enable them:
SET pg_trickle.allow_circular = true;
Monotone Operator Requirement
Only monotone operators are allowed in circular dependency chains. Monotone operators guarantee convergence — the result set grows (or stays the same) with each iteration until a fixed point is reached.
| Allowed (Monotone) | Blocked (Non-Monotone) |
|---|---|
| Joins (INNER, LEFT, RIGHT, FULL) | Aggregates (SUM, COUNT, etc.) |
| Filters (WHERE) | EXCEPT |
| Projections (SELECT) | Window functions |
| UNION ALL | NOT EXISTS / NOT IN |
| INTERSECT | |
| EXISTS | |
Creating a circular dependency with non-monotone operators is rejected
with a clear error message, regardless of the allow_circular setting.
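The validation described above amounts to scanning the cycle's operator tree for any non-monotone operator. A minimal sketch, using hypothetical operator labels (pg_trickle's internal representation will differ):

```python
# Operator sets mirroring the allowed/blocked table above (labels are illustrative).
MONOTONE = {"INNER_JOIN", "LEFT_JOIN", "RIGHT_JOIN", "FULL_JOIN",
            "FILTER", "PROJECT", "UNION_ALL", "INTERSECT", "EXISTS"}
NON_MONOTONE = {"AGGREGATE", "EXCEPT", "WINDOW", "NOT_EXISTS", "NOT_IN"}

def validate_cycle(operators):
    """Return (ok, offender): a single non-monotone operator anywhere in the
    cycle rejects it, since it would break fixed-point convergence."""
    for op in operators:
        if op in NON_MONOTONE:
            return False, op
    return True, None
```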
Step-by-Step Example: Transitive Closure
Suppose you have a graph of relationships:
CREATE TABLE edges (src INT, dst INT);
INSERT INTO edges VALUES
(1, 2), (2, 3), (3, 4), (4, 5),
(1, 3), (2, 5);
1. Create the base reachability table
-- Direct edges: all nodes directly connected
SELECT pgtrickle.create_stream_table(
name => 'reachable_direct',
query => 'SELECT src, dst FROM edges',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
2. Create the transitive closure with a self-reference
-- Transitive closure: if A→B and B→C, then A→C
-- This creates a circular dependency (reachable depends on itself via the join)
SELECT pgtrickle.create_stream_table(
name => 'reachable',
query => 'SELECT DISTINCT r1.src, r2.dst
FROM pgtrickle.reachable_direct r1
JOIN pgtrickle.reachable_direct r2 ON r1.dst = r2.src
UNION ALL
SELECT src, dst FROM edges',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Note: This example uses the `reachable_direct` table for the join rather than self-referencing `reachable` directly. For a true self-referencing cycle, pg_trickle detects the SCC and iterates.
3. Observe the fixed-point iteration
When the scheduler processes an SCC, it iterates until no new rows are produced (the fixed point):
-- Check SCC status
SELECT * FROM pgtrickle.pgt_scc_status();
Output:
scc_id | members | iteration | converged
--------+----------------------------------+-----------+-----------
1 | {reachable_direct,reachable} | 3 | true
4. Add new edges and watch convergence
INSERT INTO edges VALUES (5, 1); -- creates a cycle in the graph
On the next refresh cycle, the scheduler re-iterates the SCC until the transitive closure stabilizes with the new edge.
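The fixed-point iteration can be illustrated in miniature with a semi-naïve transitive closure over the example `edges` data: each round joins only the newly derived pairs against the base edges, stopping when a round produces nothing new. This is a sketch of the general technique, not pg_trickle's scheduler code.

```python
def transitive_closure(edges, max_iterations=100):
    """Semi-naive fixpoint: join the delta (newly derived pairs) with the base
    edges each round; the fixed point is reached when the delta is empty.
    `max_iterations` mirrors the role of pg_trickle.max_fixpoint_iterations."""
    closure = set(edges)
    delta = set(edges)
    iterations = 0
    while delta and iterations < max_iterations:
        iterations += 1
        # (a, b) in delta and (b, c) in edges derive (a, c)
        derived = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
        delta = derived - closure   # keep only genuinely new pairs
        closure |= delta
    return closure, iterations
```

On the six example edges this converges in two rounds to ten reachable pairs; adding the cycle-creating edge `(5, 1)` would simply trigger more rounds until the closure stabilizes again.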
Monitoring SCCs
-- View all SCCs and their convergence status
SELECT * FROM pgtrickle.pgt_scc_status();
-- Check which stream tables belong to which SCC
SELECT pgt_name, scc_id
FROM pgtrickle.pgt_stream_tables
WHERE scc_id IS NOT NULL;
Controlling Iteration Limits
The pg_trickle.max_fixpoint_iterations GUC limits how many iterations
the scheduler attempts before declaring non-convergence:
-- Default: 100 (generous headroom)
SHOW pg_trickle.max_fixpoint_iterations;
-- Lower it for fast-converging workloads
SET pg_trickle.max_fixpoint_iterations = 20;
If convergence is not reached within the limit, all SCC members are marked
as ERROR. This prevents runaway infinite loops.
Limitations
- Non-monotone operators are always rejected — aggregates, EXCEPT, window functions, and NOT EXISTS/NOT IN cannot appear in circular chains because they prevent convergence.
- Performance scales with iteration count — each iteration runs a full differential refresh cycle for all SCC members. Keep cycles small.
- All SCC members must use DIFFERENTIAL mode — FULL and IMMEDIATE modes are not supported for circular dependencies.
Further Reading
- Configuration — pg_trickle.allow_circular
- Configuration — pg_trickle.max_fixpoint_iterations
- SQL Reference — pgt_scc_status()
- FAQ — Cycle Detection
Tutorial: Monitoring & Alerting
This guide consolidates all pg_trickle monitoring capabilities into a single reference: built-in SQL views, NOTIFY-based alerts, and the Prometheus/Grafana observability stack.
Quick Health Check
The fastest way to verify pg_trickle is healthy:
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
If this returns no rows, everything is working. Any WARN or ERROR rows
tell you where to investigate.
Built-in Monitoring Views
Stream table status
-- Overview: name, status, mode, staleness
SELECT name, status, refresh_mode, staleness, stale
FROM pgtrickle.stream_tables_info;
-- Detailed stats: refresh counts, duration, error streaks
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;
-- Live status with error counts
SELECT * FROM pgtrickle.pgt_status();
Refresh history
-- Last 10 refreshes for a specific stream table
SELECT start_time, action, status, duration_ms, rows_inserted, rows_deleted, error_message
FROM pgtrickle.get_refresh_history('order_totals', 10);
-- Global refresh timeline (last 20 events across all stream tables)
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);
-- Aggregate refresh statistics
SELECT * FROM pgtrickle.st_refresh_stats();
CDC pipeline health
-- Per-source CDC mode, WAL lag, and alerts
SELECT * FROM pgtrickle.check_cdc_health();
-- Change buffer sizes (pending changes not yet consumed)
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Verify all CDC triggers are installed and enabled
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Dependencies
-- ASCII tree view of the entire dependency graph
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
-- Diamond consistency groups
SELECT * FROM pgtrickle.diamond_groups();
Fuse circuit breaker
-- Check fuse state for all stream tables
SELECT name, fuse_mode, fuse_state, fuse_ceiling, blown_at
FROM pgtrickle.fuse_status();
Parallel workers
-- Worker pool status (when parallel_refresh_mode = 'on')
SELECT * FROM pgtrickle.worker_pool_status();
-- Recent parallel job history
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(60);
NOTIFY-Based Alerting
pg_trickle emits real-time events via PostgreSQL's NOTIFY system:
LISTEN pg_trickle_alert;
Event Types
| Event | Trigger | Severity |
|---|---|---|
| `stale_data` | Scheduler is behind — view is genuinely out of date | Warning |
| `no_upstream_changes` | Scheduler is healthy but source tables have had no writes — view is correct | Info |
| `auto_suspended` | Stream table suspended after max consecutive errors | Critical |
| `resumed` | Stream table resumed after suspension | Info |
| `reinitialize_needed` | Upstream DDL change detected | Warning |
| `buffer_growth_warning` | Change buffer growing unexpectedly | Warning |
| `slot_lag_warning` | WAL replication slot retaining excessive data | Warning |
| `fuse_blown` | Circuit breaker tripped | Warning |
| `refresh_completed` | Refresh completed successfully | Info |
| `refresh_failed` | Refresh failed | Error |
| `diamond_partial_failure` | One member of an atomic diamond group failed | Warning |
| `scheduler_falling_behind` | Refresh duration approaching the schedule interval | Warning |
| `spill_threshold_exceeded` | Delta MERGE spilled to temp files for consecutive refreshes, forcing FULL | Warning |
Notification Payload
Each notification carries a JSON payload:
{
"event": "auto_suspended",
"stream_table": "order_totals",
"consecutive_errors": 3,
"last_error": "column \"deleted_column\" does not exist",
"timestamp": "2026-03-31T14:22:01.123Z"
}
Bridging to External Systems
To forward NOTIFY events to external alerting systems (PagerDuty, Slack, OpsGenie), use a listener process:
# Example: Python listener using psycopg
import psycopg
import json
conn = psycopg.connect("postgresql://user:pass@host/db", autocommit=True)
conn.execute("LISTEN pg_trickle_alert")
for notify in conn.notifies():
payload = json.loads(notify.payload)
event = payload["event"]
# no_upstream_changes is informational — source tables are quiet but healthy.
# Only page on actionable events.
if event in ("auto_suspended", "fuse_blown", "refresh_failed"):
send_to_pagerduty(payload)
elif event == "stale_data": # scheduler itself is falling behind
send_to_pagerduty(payload)
Prometheus & Grafana Stack
For production deployments, use the pre-built observability stack in the
monitoring/ directory:
cd monitoring/
docker compose up -d
This gives you:
- Prometheus scraping pg_trickle metrics via postgres_exporter
- Grafana with a pre-provisioned dashboard
- Alerting rules for staleness, errors, CDC lag, and scheduler health
See Prometheus & Grafana Integration for full setup details.
Diagnostic Workflow
When something is wrong, follow this systematic workflow:
Step 1 — Global health
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
Step 2 — Status and staleness
SELECT name, status, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;
Step 3 — Recent refresh activity
SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(20);
Step 4 — Error details for a specific stream table
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Step 5 — CDC pipeline
SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Step 6 — Trigger verification
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Common Alert Responses
| Alert | Likely Cause | Action |
|---|---|---|
| `stale_data` | Scheduler behind, long refresh, or lock contention | Check `pgt_status()` and `refresh_timeline()` |
| `auto_suspended` | Repeated refresh failures | Fix root cause, then `resume_stream_table()` |
| `fuse_blown` | Bulk load exceeded fuse ceiling | Investigate, then `reset_fuse()` |
| `buffer_growth_warning` | Scheduler not consuming buffers fast enough | Check scheduler status and refresh errors |
| `reinitialize_needed` | Source table DDL changed | Verify schema compatibility; scheduler handles automatically |
Further Reading
- Prometheus & Grafana Integration
- SQL Reference — Monitoring Functions
- Configuration Reference
- FAQ — Troubleshooting
Tutorial: ETL & Bulk Load Patterns
pg_trickle provides source gating (v0.5.0+) and watermark gating (v0.7.0+) to coordinate stream table refreshes with ETL pipelines and bulk data loads. This tutorial covers common patterns for pausing refreshes during loads and resuming them safely afterward.
The Problem
When you bulk-load data into a source table (e.g., a nightly ETL job), the change buffer fills rapidly. Without coordination:
- A differential refresh mid-load sees a partial batch, producing incomplete results
- The adaptive fallback may trigger repeated FULL refreshes during the load
- The fuse circuit breaker may blow, requiring manual intervention
Source gating solves this by telling pg_trickle to skip refreshes for gated sources until the load completes.
Recipe 1 — Single Source Bulk Load
The simplest pattern: gate the source, load data, ungate.
-- 1. Gate the source table — all dependent stream tables pause
SELECT pgtrickle.gate_source('public.orders');
-- 2. Perform the bulk load
COPY orders FROM '/data/orders_20260331.csv' WITH (FORMAT csv, HEADER);
-- or: INSERT INTO orders SELECT ... FROM staging_orders;
-- 3. Ungate — stream tables resume and process the full batch
SELECT pgtrickle.ungate_source('public.orders');
While gated, the scheduler skips all stream tables that depend on the gated source. Changes still accumulate in the CDC buffer and are processed in a single batch after ungating.
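The skip-but-accumulate behavior can be sketched as follows. This is an illustrative model of the semantics just described — class and method names are hypothetical, not pg_trickle's API.

```python
class GatedScheduler:
    """Sketch: a refresh is skipped while ANY of the stream table's sources is
    gated; CDC changes keep accumulating and drain in one batch after ungating."""

    def __init__(self):
        self.gated = set()   # gated source table names
        self.buffers = {}    # stream table name -> pending change count

    def gate(self, source):
        self.gated.add(source)

    def ungate(self, source):
        self.gated.discard(source)

    def record_change(self, stream_table, n=1):
        # CDC capture continues regardless of gating
        self.buffers[stream_table] = self.buffers.get(stream_table, 0) + n

    def try_refresh(self, stream_table, sources):
        if any(s in self.gated for s in sources):
            return "skipped"                     # buffered changes are retained
        pending = self.buffers.pop(stream_table, 0)
        return f"refreshed {pending} changes"    # the whole batch drains at once
```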
Recipe 2 — Coordinated Multi-Source Load
When your ETL loads multiple tables that feed into the same stream table:
-- Gate all sources involved in the load
SELECT pgtrickle.gate_source('public.orders');
SELECT pgtrickle.gate_source('public.customers');
SELECT pgtrickle.gate_source('public.products');
-- Load all tables
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv, HEADER);
COPY customers FROM '/data/customers.csv' WITH (FORMAT csv, HEADER);
COPY products FROM '/data/products.csv' WITH (FORMAT csv, HEADER);
-- Ungate all at once — stream tables see a consistent snapshot
SELECT pgtrickle.ungate_source('public.orders');
SELECT pgtrickle.ungate_source('public.customers');
SELECT pgtrickle.ungate_source('public.products');
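If the gate calls run as separate statements, there is a brief window in which only some sources are gated and a refresh could fire against an inconsistent set. One way to avoid this, assuming `gate_source()` is an ordinary transactional function (an assumption worth verifying), is to issue all gates in a single transaction so they become visible atomically:

```sql
-- Gate all sources atomically so no refresh sees a partially gated set
BEGIN;
SELECT pgtrickle.gate_source('public.orders');
SELECT pgtrickle.gate_source('public.customers');
SELECT pgtrickle.gate_source('public.products');
COMMIT;
```

The same pattern applies to the ungate calls at the end of the load.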
Recipe 3 — Gate + Deferred Stream Table Creation
For initial deployments where data must be loaded before stream tables are created:
-- 1. Gate the source before any stream tables exist
SELECT pgtrickle.gate_source('public.orders');
-- 2. Load the initial data
COPY orders FROM '/data/historical_orders.csv' WITH (FORMAT csv, HEADER);
-- 3. Create stream tables — they won't refresh yet (source is gated)
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m'
);
-- 4. Ungate — the first refresh processes all data cleanly
SELECT pgtrickle.ungate_source('public.orders');
Recipe 4 — Nightly Batch Pattern
A common production pattern using a scheduled batch job:
-- Run nightly at 02:00 UTC
-- Step 1: Gate all ETL sources
DO $$
DECLARE
src TEXT;
BEGIN
FOR src IN SELECT DISTINCT source_table
FROM pgtrickle.list_sources('daily_report')
LOOP
PERFORM pgtrickle.gate_source(src);
END LOOP;
END;
$$;
-- Step 2: Run the ETL pipeline
CALL etl.load_daily_data();
-- Step 3: Ungate all sources
DO $$
DECLARE
gated RECORD;
BEGIN
FOR gated IN SELECT source_name FROM pgtrickle.source_gates()
WHERE is_gated = true
LOOP
PERFORM pgtrickle.ungate_source(gated.source_name);
END LOOP;
END;
$$;
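One gap in the pattern above: if `etl.load_daily_data()` raises, step 3 never runs and the sources stay gated until someone notices. A defensive sketch, assuming the procedure does not manage its own transactions, is to trap the failure and downgrade it to a warning so the script continues to the ungate step:

```sql
-- Run the ETL step with a guarantee that the ungate step still executes.
DO $$
BEGIN
  CALL etl.load_daily_data();
EXCEPTION WHEN OTHERS THEN
  -- Re-raising here would abort the script before the ungate step,
  -- so surface the failure as a warning instead and alert out-of-band.
  RAISE WARNING 'ETL load failed: %', SQLERRM;
END;
$$;

-- Always ungate, whether or not the load succeeded
DO $$
DECLARE
  gated RECORD;
BEGIN
  FOR gated IN SELECT source_name FROM pgtrickle.source_gates()
               WHERE is_gated
  LOOP
    PERFORM pgtrickle.ungate_source(gated.source_name);
  END LOOP;
END;
$$;
```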
Monitoring During a Gated Load
While sources are gated, verify the gate status:
-- Check which sources are currently gated
SELECT * FROM pgtrickle.source_gates();
-- Bootstrap gate status (v0.6.0+)
SELECT * FROM pgtrickle.bootstrap_gate_status();
Combining with the Fuse Circuit Breaker
For extra safety, combine gating with the fuse circuit breaker:
-- Arm the fuse as a safety net
SELECT pgtrickle.alter_stream_table('order_totals',
fuse => 'on',
fuse_ceiling => 500000
);
-- Gate for controlled loads
SELECT pgtrickle.gate_source('public.orders');
-- ... load data ...
SELECT pgtrickle.ungate_source('public.orders');
-- The fuse catches any unexpected bulk changes outside the gated window
Watermark Gating (v0.7.0+)
Watermark gating extends source gating with LSN-based coordination for more precise control:
-- Set a watermark — refreshes only consume changes up to this LSN
SELECT pgtrickle.set_watermark('public.orders', pg_current_wal_lsn());
-- Load new data (changes accumulate beyond the watermark)
COPY orders FROM '/data/new_orders.csv' WITH (FORMAT csv, HEADER);
-- Advance the watermark to include the new data
SELECT pgtrickle.advance_watermark('public.orders', pg_current_wal_lsn());
-- Or clear the watermark entirely
SELECT pgtrickle.clear_watermark('public.orders');
See the SQL Reference — Watermark Gating for the complete API.
Further Reading
- SQL Reference — Bootstrap Source Gating
- SQL Reference — Watermark Gating
- SQL Reference — ETL Coordination Cookbook
- Tutorial: Fuse Circuit Breaker
Tutorial: Migrating from pg_ivm to pg_trickle
This guide walks through migrating existing pg_ivm IMMVs (Incrementally
Maintained Materialized Views) to pg_trickle stream tables. It covers API
mapping, behavioral differences, and a step-by-step migration checklist.
See also: plans/ecosystem/GAP_PG_IVM_COMPARISON.md for the full feature comparison and gap analysis between the two extensions.
Why Migrate?
| | pg_ivm (IMMV) | pg_trickle (Stream Table) |
|---|---|---|
| Maintenance model | Immediate only (in-transaction) | Deferred (scheduler) and Immediate |
| Aggregate functions | 5 (COUNT, SUM, AVG, MIN, MAX) | 60+ (all built-in + user-defined) |
| Window functions | Not supported | Full support |
| CTEs (recursive) | Not supported | Semi-naive, DRed, recomputation |
| Subqueries | Very limited | Full (EXISTS, NOT EXISTS, IN, LATERAL, scalar) |
| Set operations | Not supported | UNION, INTERSECT, EXCEPT (bag + set) |
| HAVING clause | Not supported | Supported |
| GROUPING SETS / CUBE / ROLLUP | Not supported | Auto-rewritten to UNION ALL |
| DISTINCT ON | Not supported | Auto-rewritten to ROW_NUMBER |
| Views as sources | Not supported | Auto-inlined |
| Cascading views | Not supported | DAG-aware topological scheduling |
| Background scheduling | None (manual only) | Native cron, duration, CALCULATED |
| Monitoring | 1 catalog table | 15+ diagnostic functions |
| Concurrency | ExclusiveLock during maintenance | Advisory locks, non-blocking reads |
| Parallel refresh | Not supported | Worker pool with caps |
Concept Mapping
| pg_ivm Concept | pg_trickle Equivalent | Notes |
|---|---|---|
| IMMV (Incrementally Maintained Materialized View) | Stream table | Same idea — a query result kept incrementally up to date |
| pgivm.create_immv(name, query) | pgtrickle.create_stream_table(name, query) | pg_trickle adds optional schedule and refresh_mode parameters |
| pgivm.refresh_immv(name, true) | pgtrickle.refresh_stream_table(name) | Manual refresh |
| pgivm.refresh_immv(name, false) | No direct equivalent | pg_trickle has pgtrickle.alter_stream_table(name, enabled => false) to suspend |
| pgivm.pg_ivm_immv catalog | pgtrickle.pgt_stream_tables | Plus pgt_status(), refresh_timeline(), etc. |
| DROP TABLE immv_name | pgtrickle.drop_stream_table(name) | Stream tables must be dropped via the API |
| ALTER TABLE immv RENAME TO ... | pgtrickle.alter_stream_table(old, name => new) | Rename via API |
| In-transaction maintenance (AFTER row triggers) | refresh_mode => 'IMMEDIATE' | Same model — triggers fire in the writing transaction |
| (not available) | refresh_mode => 'DIFFERENTIAL' | Deferred incremental refresh via change buffers |
| (not available) | refresh_mode => 'AUTO' | Picks DIFFERENTIAL or FULL automatically |
| Auto-created indexes on GROUP BY / PK | Manual CREATE INDEX | pg_trickle auto-creates the primary key but not secondary indexes |
Step-by-Step Migration
1. Inventory existing IMMVs
List all pg_ivm IMMVs in your database:
-- pg_ivm catalog
SELECT immvrelid::regclass AS immv_name,
pgivm.get_immv_def(immvrelid) AS defining_query
FROM pgivm.pg_ivm_immv
ORDER BY immvrelid::regclass::text;
Record each IMMV's name, defining query, and any indexes you have created on it.
2. Check query compatibility
pg_trickle supports a superset of pg_ivm's SQL dialect, so any query that works with pg_ivm will work with pg_trickle. However, there are a few things to verify:
- Data types: pg_ivm requires btree operator classes for all columns (excluding json, xml, point, etc.). pg_trickle has no such restriction.
- Outer joins: If your IMMV uses outer joins, pg_trickle removes pg_ivm's restrictions (single equijoin, no aggregates, no CASE). Your query may work unchanged, or you may be able to simplify workarounds you added for pg_ivm.
3. Choose a refresh mode
For each IMMV, decide which pg_trickle refresh mode to use:
| pg_ivm behavior | pg_trickle refresh mode | When to choose |
|---|---|---|
| Zero staleness required | IMMEDIATE | Same in-transaction behavior as pg_ivm |
| Some staleness acceptable | DIFFERENTIAL with schedule | Lower write latency, batched refresh |
| Let pg_trickle decide | AUTO (default) | Recommended for most cases |
4. Create stream tables
For each IMMV, create the corresponding stream table:
pg_ivm (before):
SELECT pgivm.create_immv(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id'
);
pg_trickle — IMMEDIATE mode (same behavior as pg_ivm):
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
NULL, -- no schedule needed for IMMEDIATE
'IMMEDIATE'
);
pg_trickle — deferred mode (lower write latency):
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
'30s' -- refresh every 30 seconds; mode defaults to AUTO
);
5. Recreate indexes
pg_ivm auto-creates indexes on GROUP BY, DISTINCT, and primary key columns.
pg_trickle auto-creates the primary key (pgt_row_id) but not secondary indexes.
Recreate any indexes that your read queries depend on:
-- Example: index on the GROUP BY column for lookup queries
CREATE INDEX ON pgtrickle.order_totals (customer_id);
6. Update application queries
pg_ivm IMMVs live in the schema where they were created (usually public).
pg_trickle stream tables default to the pgtrickle schema.
-- Before (pg_ivm):
SELECT * FROM public.order_totals WHERE customer_id = 42;
-- After (pg_trickle):
SELECT * FROM pgtrickle.order_totals WHERE customer_id = 42;
To avoid changing application code, create a compatibility view:
CREATE VIEW public.order_totals AS
SELECT * FROM pgtrickle.order_totals;
7. Verify correctness
After creating the stream table and running a refresh, compare results. The examples below assume the old IMMV was renamed to order_totals_immv (for example, to free its name for the compatibility view from step 6):
-- Compare row counts
SELECT 'immv' AS source, COUNT(*) FROM public.order_totals_immv
UNION ALL
SELECT 'stream_table', COUNT(*) FROM pgtrickle.order_totals;
-- Full diff (should return zero rows)
(SELECT * FROM public.order_totals_immv EXCEPT SELECT * FROM pgtrickle.order_totals)
UNION ALL
(SELECT * FROM pgtrickle.order_totals EXCEPT SELECT * FROM public.order_totals_immv);
8. Drop the old IMMV
Once you have verified the stream table is correct and applications are updated:
DROP TABLE public.order_totals_immv;
9. (Optional) Remove pg_ivm
After all IMMVs are migrated:
DROP EXTENSION pg_ivm CASCADE;
Remove pg_ivm from shared_preload_libraries if it was listed there and
restart PostgreSQL.
Behavioral Differences to Be Aware Of
Locking
- pg_ivm: Holds ExclusiveLock on the IMMV during maintenance. In REPEATABLE READ / SERIALIZABLE, concurrent writes to the same IMMV's base tables may raise serialization errors.
- pg_trickle (IMMEDIATE): Uses advisory locks. Concurrent reads of the stream table are never blocked.
- pg_trickle (deferred): Base table writes only insert into change buffers (~2–50 μs). No lock contention with refresh.
TRUNCATE
- pg_ivm: Synchronously truncates or fully refreshes the IMMV.
- pg_trickle (IMMEDIATE): Performs a full refresh within the same transaction.
- pg_trickle (deferred): Clears the change buffer and queues a full refresh on the next cycle.
Logical Replication
- pg_ivm: Not compatible with logical replication — subscriber nodes do not have triggers that fire for replicated changes.
- pg_trickle (deferred): Supports WAL-based CDC (pg_trickle.cdc_mode = 'wal'), which reads from the WAL directly. Trigger-based CDC also works with logical replication if triggers are created on the subscriber.
Schema Changes
- pg_ivm: No automatic DDL tracking. If a base table column is altered, the IMMV may break silently.
- pg_trickle: Event triggers detect DDL changes on source tables and automatically reinitialize affected stream tables.
Upgrading Queries That pg_ivm Couldn't Handle
pg_ivm's SQL restrictions often force users to create workarounds. With pg_trickle, many of these workarounds can be simplified:
HAVING clauses
-- pg_ivm workaround: filter in application or wrap in a view
SELECT pgivm.create_immv('big_customers',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id'
);
-- Then: SELECT * FROM big_customers WHERE total > 1000;
-- pg_trickle: use HAVING directly
SELECT pgtrickle.create_stream_table('big_customers',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id
HAVING SUM(amount) > 1000'
);
NOT EXISTS / anti-joins
-- pg_ivm: not supported — manual workaround required
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('orphan_orders',
'SELECT o.* FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id)'
);
Window functions
-- pg_ivm: not supported
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('ranked_products',
'SELECT product_id, category, revenue,
RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
FROM product_revenue'
);
UNION ALL pipelines
-- pg_ivm: not supported — requires separate IMMVs + application-side UNION
-- pg_trickle: works directly
SELECT pgtrickle.create_stream_table('all_events',
'SELECT id, ts, ''order'' AS type FROM order_events
UNION ALL
SELECT id, ts, ''return'' AS type FROM return_events'
);
Monitoring After Migration
pg_trickle provides extensive monitoring that pg_ivm does not offer:
-- Overall health
SELECT * FROM pgtrickle.health_check();
-- Status of all stream tables (includes staleness, last refresh, error count)
SELECT * FROM pgtrickle.pgt_status();
-- Recent refresh history across all stream tables
SELECT * FROM pgtrickle.refresh_timeline(20);
-- CDC pipeline health
SELECT * FROM pgtrickle.change_buffer_sizes();
-- Diagnose errors for a specific stream table
SELECT * FROM pgtrickle.diagnose_errors('order_totals');
See SQL Reference for the complete list of monitoring functions.
Frequently Asked Questions
This FAQ covers everything from core concepts and getting started, through SQL support details, to operational topics like deployment, monitoring, and troubleshooting. Use the table of contents below to jump to a specific topic.
New User FAQ — Top 15 Questions
New to pg_trickle? Start here. Each answer is a short summary with a link to the full explanation further down.
1. What is pg_trickle?
A PostgreSQL 18 extension that adds stream tables — materialized views that refresh themselves incrementally, processing only changed rows instead of re-running the entire query. Full answer →
2. How is this different from a materialized view?
Stream tables refresh automatically on a schedule, support incremental
(differential) refresh, track changes via CDC triggers, and propagate updates
through dependency chains — none of which REFRESH MATERIALIZED VIEW provides.
Full answer →
3. How do I install pg_trickle?
Install from the Docker image, PGXN, or build from source. Add
shared_preload_libraries = 'pg_trickle' to postgresql.conf, then
CREATE EXTENSION pg_trickle; in each database. Full answer →
4. How do I create my first stream table?
One function call: SELECT pgtrickle.create_stream_table(name => 'my_st', query => 'SELECT ...', schedule => '5s');
See the Getting Started guide for a walkthrough.
Full answer →
5. What is the difference between FULL and DIFFERENTIAL refresh?
FULL re-runs the entire defining query. DIFFERENTIAL reads only the changed rows from the change buffer and computes the delta — orders of magnitude faster for small changes on large tables. AUTO mode picks the best strategy per cycle. Full answer →
6. Which refresh mode should I use?
Use AUTO (the default) — it selects DIFFERENTIAL when possible and falls back to FULL when needed. Use IMMEDIATE for same-transaction consistency. Use FULL only when the defining query uses volatile functions or is not IVM-eligible. Full answer →
7. What SQL features are supported?
Joins (INNER, LEFT, RIGHT, FULL OUTER, CROSS, LATERAL), aggregates (60+ functions including SUM, COUNT, AVG, array_agg, jsonb_agg), CTEs (including recursive), window functions, UNION/INTERSECT/EXCEPT, subqueries, CASE, COALESCE, DISTINCT, GROUP BY with ROLLUP/CUBE/GROUPING SETS, and more. Full answer →
8. How fresh is my stream table data?
As fresh as the refresh schedule allows. With a 1s schedule, data is typically
< 2 seconds stale. With IMMEDIATE mode, data is updated within the same
transaction as the source write. Full answer →
9. Can I chain stream tables (ST reads from another ST)?
Yes — stream tables can reference other stream tables. pg_trickle builds a dependency DAG and refreshes them in topological order automatically. Full answer →
10. How does change data capture work?
Lightweight row-level AFTER triggers capture every INSERT, UPDATE, and DELETE
into per-table change buffers. If wal_level = logical is available,
pg_trickle can automatically transition to WAL-based CDC for near-zero
write-path overhead. Full answer →
11. Do I need wal_level = logical?
No. pg_trickle works with the default wal_level = replica using trigger-based
CDC. WAL-based CDC is optional and provides lower write-path overhead.
Full answer →
12. Can I use pg_trickle with PgBouncer / connection poolers?
Yes. pg_trickle's background workers use direct connections, not pooled ones. Your application can use any pooler for reads and writes — the scheduler operates independently. Full answer →
13. How do I monitor stream table health?
Built-in views (pgtrickle.pgt_status, pgtrickle.pgt_refresh_history),
Prometheus metrics endpoint, Grafana dashboard, NOTIFY-based alerts, and
a TUI tool. Full answer →
14. What happens if a refresh fails?
The stream table is marked SUSPENDED after exceeding the fuse threshold (default
5 consecutive failures). Data in the change buffer is preserved. Use
pgtrickle.reset_fuse('my_st') to resume after fixing the issue.
Full answer →
15. Can I use pg_trickle with dbt?
Yes — the dbt-pgtrickle package provides a stream_table materialization.
dbt run creates/alters stream tables, dbt source freshness checks staleness.
Full answer →
Table of Contents
Getting started
- General — What pg_trickle is, how IVM works, key concepts
- Installation & Setup — Installing, configuring, uninstalling
- Creating & Managing Stream Tables — Create, alter, drop, schedules
Consistency & refresh modes
- Data Freshness & Consistency — Staleness, read-your-writes, DVS
- IMMEDIATE Mode (Transactional IVM) — Same-transaction refresh
SQL features
- SQL Support — Supported and unsupported SQL constructs
- Aggregates & Group-By — Incremental aggregates, HAVING, auxiliary columns
- Joins — Multi-table delta computation, FULL OUTER JOIN
- CTEs & Recursive Queries — Semi-naive, DRed, recomputation strategies
- Window Functions & LATERAL — Partition-based recomputation, SRFs
- TopK (ORDER BY … LIMIT) — Bounded result sets
- Tables Without Primary Keys — Content-based row identity
Internals & architecture
- Change Data Capture (CDC) — Triggers, WAL transition, why auto is the default, change buffers
- Diamond Dependencies & DAG Scheduling — Topological ordering, atomic groups
- Schema Changes & DDL Events — Reinitialize, event triggers
Operations
- Performance & Tuning — Scheduler tuning, min schedule risks, disk space, adaptive fallback
- Interoperability — Views, replication, connection poolers, triggers, pgvector
- dbt Integration — Materialization, commands, freshness checks
- Row-Level Security (RLS) — Source vs stream table policies, SECURITY DEFINER triggers
- Deployment & Operations — Workers, upgrades, replicas, Kubernetes
- Monitoring & Alerting — Views, NOTIFY alerts, failure handling
- Configuration Reference — All GUC parameters
Troubleshooting & reference
- Troubleshooting — Common problems and debugging
- Why Are These SQL Features Not Supported? — Technical explanations for each limitation
- Why Are These Stream Table Operations Restricted? — Why direct DML, ALTER TABLE, and TRUNCATE are disallowed
General
These questions cover fundamental concepts — what pg_trickle is, how incremental view maintenance works, and the key building blocks (frontiers, row IDs, the auto-rewrite pipeline) that power the extension.
What is pg_trickle?
pg_trickle is a PostgreSQL 18 extension that implements stream tables — declarative, automatically-refreshing materialized views with Differential View Maintenance (DVM). You define a SQL query and a refresh schedule; the extension handles change capture, delta computation, and incremental refresh automatically.
It is inspired by the DBSP differential dataflow framework. See DBSP_COMPARISON.md for a detailed comparison.
What is incremental view maintenance (IVM) and why does it matter?
Incremental View Maintenance means updating a materialized view by processing only the changes (deltas) to the source data, rather than re-executing the entire defining query from scratch.
Consider a stream table defined as SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id over a 10-million-row orders table. When you insert 5 new rows:
- Without IVM (FULL refresh): Re-scans all 10 million rows and recomputes every group. Cost: O(total rows).
- With IVM (DIFFERENTIAL refresh): Reads only the 5 new rows from the change buffer, identifies the affected groups, and updates just those groups. Cost: O(changed rows × affected groups).
pg_trickle's DVM engine implements IVM using differentiation rules for each SQL operator (Scan, Filter, Join, Aggregate, etc.), generating a delta query that computes the exact changes to the stream table from the exact changes to the source.
What is the difference between a stream table and a regular materialized view, in practice?
| Feature | Materialized Views | Stream Tables |
|---|---|---|
| Refresh | Manual (REFRESH MATERIALIZED VIEW) | Automatic (scheduler) or manual |
| Incremental refresh | Not supported natively | Built-in differential mode |
| Change detection | None — always full recompute | CDC triggers track row-level changes |
| Dependency ordering | None | DAG-aware topological refresh |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| Schedule | None | Duration strings (5m) or cron (*/5 * * * *) |
| Transactional IVM | No | Yes (IMMEDIATE mode) |
In practice, stream tables are regular PostgreSQL heap tables under the hood — you can query them, create indexes on them, join them with other tables, and reference them from views. The key difference is that pg_trickle manages their contents automatically.
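Since stream tables are ordinary heap tables, the usual tooling applies to them directly. A short illustration, reusing the order_totals example from elsewhere in this FAQ (the customers table and its columns are assumed for the join):

```sql
-- Index a stream table for fast point lookups
CREATE INDEX ON pgtrickle.order_totals (customer_id);

-- Join it like any other table
SELECT c.name, t.total
FROM customers c
JOIN pgtrickle.order_totals t ON t.customer_id = c.id
WHERE t.total > 1000;
```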
What happens behind the scenes when I INSERT a row into a table tracked by a stream table?
The full data flow for a DIFFERENTIAL-mode stream table:
1. Your INSERT completes normally. The row is written to the source table.
2. A CDC trigger fires (row-level AFTER INSERT). It writes a change record (action=I, the new row data as JSONB, the current WAL LSN) into the source's change buffer table (pgtrickle_changes.changes_<oid>). This happens within your transaction — if you roll back, the change record is also rolled back.
3. You commit. Both the source row and the change record become visible.
4. The scheduler wakes up (every pg_trickle.scheduler_interval_ms, default 1 second). It checks whether the stream table's schedule says a refresh is due.
5. If due, the refresh engine runs. It reads the change buffer for rows with LSN > the stream table's current frontier, generates a delta query from the DVM operator tree, and applies the result via MERGE.
6. Frontier advances. The stream table's frontier is updated to the new LSN, and the consumed change buffer rows are cleaned up.
For IMMEDIATE-mode stream tables, steps 2–6 are replaced: a statement-level AFTER trigger computes and applies the delta within your transaction, so the stream table is updated before your transaction commits.
What does "differential" mean in the context of pg_trickle?
"Differential" refers to the mathematical approach of computing differences (deltas) rather than absolute values. Given a query Q and a set of changes ΔR to source table R, the DVM engine computes ΔQ(R, ΔR) — the change to the query result caused by the change to the source. This delta is then applied (merged) into the stream table.
Each SQL operator has its own differentiation rule. For example:
- Filter: ΔFilter(R, ΔR) = Filter(ΔR) — just apply the filter to the changes.
- Join: ΔJoin(R, S, ΔR) = Join(ΔR, S) — join the changes against the other side's current state.
- Aggregate: Recompute only the groups whose keys appear in the changes.
See DVM_OPERATORS.md for the complete set of differentiation rules.
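To make the join rule concrete, here is roughly what a delta query for a batch of order changes could look like. This is a hand-written sketch, not the engine's literal output, and the buffer table name is illustrative (real buffers are named changes_<oid>):

```sql
-- Sketch of a join delta: join the changed orders (Δorders) against the
-- current state of customers, carrying a weight column (+1 for inserts,
-- -1 for deletes) so the MERGE step can apply the change.
SELECT d.customer_id, c.name, d.amount,
       CASE d.action WHEN 'I' THEN +1 ELSE -1 END AS weight
FROM pgtrickle_changes.changes_orders d   -- illustrative buffer name
JOIN customers c ON c.id = d.customer_id;
```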
What is a frontier, and why does pg_trickle track LSNs?
A frontier is a per-source map of {source_oid → LSN} that records exactly how far each stream table has consumed changes from each of its source tables. It is stored as JSONB in the pgtrickle.pgt_stream_tables catalog.
Why LSNs? PostgreSQL's Write-Ahead Log Sequence Number (LSN) provides a globally ordered, monotonically increasing position in the change stream. By recording the LSN at which each source was last consumed, the frontier ensures:
- No missed changes. The next refresh reads changes with LSN > frontier, ensuring contiguous, non-overlapping windows.
- No duplicate processing. Changes at or below the frontier are never re-read.
- Consistent snapshots. When a stream table depends on multiple source tables, the frontier tracks each source independently, enabling consistent multi-source delta computation.
Lifecycle: Created on first full refresh → Advanced on each differential refresh → Reset on reinitialize.
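The frontier lives in the catalog, so you can check how far each stream table has consumed its sources. A sketch (the frontier column name is inferred from the description above; verify it against your installed version):

```sql
-- Show each stream table's per-source frontier next to the current WAL position
SELECT pgt_name,
       frontier,                      -- JSONB: {source_oid: LSN}
       pg_current_wal_lsn() AS current_lsn
FROM pgtrickle.pgt_stream_tables;
```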
What is the __pgt_row_id column and why does it appear in my stream tables?
Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column. It stores a 64-bit xxHash of the row's group-by key (for aggregate queries) or all output columns (for non-aggregate queries). The refresh engine uses it to match incoming deltas against existing rows during the MERGE operation.
You should ignore this column in your queries. It is an implementation detail. If it bothers you, exclude it explicitly:
SELECT customer_id, total FROM order_totals; -- omit __pgt_row_id
What is the auto-rewrite pipeline and how does it affect my queries?
Before parsing a defining query into the DVM operator tree, pg_trickle runs six automatic rewrite passes:
| # | Pass | What it does |
|---|---|---|
| 0 | View inlining | Replaces view references with (view_definition) AS alias subqueries (fixpoint, max depth 10) |
| 1 | DISTINCT ON | Converts to ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) = 1 subquery |
| 2 | GROUPING SETS / CUBE / ROLLUP | Decomposes into UNION ALL of separate GROUP BY queries |
| 3 | Scalar subquery in WHERE | Rewrites WHERE col > (SELECT …) to CROSS JOIN |
| 4 | Correlated scalar subquery in SELECT | Rewrites to LEFT JOIN with grouped inline view |
| 5 | SubLinks in OR | Splits WHERE a OR EXISTS (…) into UNION branches |
The rewrites are transparent — your original query is preserved in the catalog (original_query column) while the rewritten version is stored in defining_query. The DVM engine only sees standard SQL operators after rewriting.
See ARCHITECTURE.md for details on each pass.
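As an example of pass 1, here is a DISTINCT ON query next to the shape it is rewritten into. Both queries are hand-written to illustrate the transformation, not the engine's literal output:

```sql
-- Original defining query
SELECT DISTINCT ON (customer_id) customer_id, ordered_at, amount
FROM orders
ORDER BY customer_id, ordered_at DESC;

-- Rewritten shape: ROW_NUMBER() = 1 in a subquery
SELECT customer_id, ordered_at, amount
FROM (
  SELECT customer_id, ordered_at, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY ordered_at DESC) AS rn
  FROM orders
) s
WHERE rn = 1;
```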
How does pg_trickle compare to DBSP (the academic framework)?
pg_trickle is inspired by DBSP but is not a direct implementation. Key differences:
- DBSP is a general-purpose differential dataflow framework with a Rust runtime (Feldera). It models computation as circuits over Z-sets (multisets with integer weights).
- pg_trickle implements the same mathematical principles (delta queries, frontier tracking) but embedded inside PostgreSQL as an extension. It generates SQL delta queries rather than running a separate computation engine.
- Trade-off: pg_trickle leverages PostgreSQL's optimizer, indexes, and storage engine but is limited to what can be expressed as SQL queries. DBSP can implement arbitrary dataflow computations.
See DBSP_COMPARISON.md for a detailed comparison.
How does pg_trickle compare to pg_ivm?
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Refresh timing | Immediate (same transaction) only | Immediate, Deferred (scheduled), or Manual |
| Incremental strategy | Transition tables + query rewriting | DVM operator tree + delta SQL generation |
| Supported SQL | Inner joins, simple outer joins, COUNT/SUM/AVG/MIN/MAX, EXISTS, DISTINCT | All of the above + window functions, recursive CTEs, LATERAL, UNION/INTERSECT/EXCEPT, 60+ aggregates, TopK, GROUPING SETS |
| Cascading (view-on-view) | No | Yes (DAG-aware topological refresh) |
| Scheduling | None (always immediate) | Duration, cron, CALCULATED, or NULL |
| Monitoring | None | Built-in views, stats, NOTIFY alerts |
| PostgreSQL version | 14–17 | 18 only (until v0.4.0) |
pg_trickle's IMMEDIATE mode is designed as a migration path for pg_ivm users — it uses the same statement-level trigger approach with transition tables.
What PostgreSQL versions are supported?
PostgreSQL 18.x exclusively. pg_trickle uses PostgreSQL 18 features such as enhanced MERGE syntax with NOT MATCHED BY SOURCE and improved event trigger payloads. These features are not available in earlier versions.
Backward compatibility with PostgreSQL 16–17 is planned for a future release (tracked in the roadmap).
Does pg_trickle require wal_level = logical?
No. By default, pg_trickle uses lightweight row-level triggers for change data capture instead of logical replication. This means you do not need to set wal_level = logical, configure max_replication_slots, or create publications.
If you later enable the hybrid CDC mode (pg_trickle.cdc_mode = 'auto'), WAL-based capture becomes an option — but this is opt-in and not required for normal operation.
Is pg_trickle production-ready?
pg_trickle is under active development and approaching production readiness. It has a comprehensive test suite with 700+ unit tests and 290+ end-to-end tests covering correctness, failure recovery, and concurrency scenarios.
That said, as with any new extension, you should evaluate it against your specific workloads before deploying to production. Start with non-critical dashboards or reporting tables, monitor refresh performance and data correctness, and gradually expand usage as confidence grows.
Installation & Setup
How do I install pg_trickle?
1. Add pg_trickle to shared_preload_libraries in postgresql.conf: shared_preload_libraries = 'pg_trickle'
2. Restart PostgreSQL.
3. Run CREATE EXTENSION pg_trickle;
See INSTALL.md for platform-specific instructions and pre-built release artifacts.
What are the minimum configuration requirements?
The only mandatory setting is adding pg_trickle to shared_preload_libraries in postgresql.conf (this requires a PostgreSQL restart):
shared_preload_libraries = 'pg_trickle'
All other GUC parameters have sensible defaults and can be tuned later. However, max_worker_processes often needs to be raised from its default of 8 to leave headroom for pg_trickle's scheduler and refresh workers.
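A minimal postgresql.conf fragment for a typical setup (the worker value is an illustrative starting point, not a recommendation):

```
# Required: load the extension at server start
shared_preload_libraries = 'pg_trickle'

# pg_trickle's scheduler and refresh workers count against this limit;
# the default of 8 is often too low once other extensions are loaded.
max_worker_processes = 16
```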
Can I install pg_trickle on a managed PostgreSQL service (RDS, Cloud SQL, etc.)?
It depends on whether the service allows custom extensions and shared_preload_libraries modifications. Many managed services restrict these. However, pg_trickle has one advantage over replication-based extensions: it does not require wal_level = logical, which avoids one of the most common restrictions on managed PostgreSQL services.
Check your provider's documentation for custom extension support. Services that support custom extensions (e.g., some tiers of Azure Flexible Server, Supabase, Neon) are more likely to work.
How do I uninstall pg_trickle?
1. Drop all stream tables first (or they will be cascade-dropped): SELECT pgtrickle.drop_stream_table(pgt_name) FROM pgtrickle.pgt_stream_tables;
2. Drop the extension: DROP EXTENSION pg_trickle CASCADE;
3. Remove pg_trickle from shared_preload_libraries and restart PostgreSQL.
Creating & Managing Stream Tables
Do I need to choose a refresh mode?
No. The default mode ('AUTO') is adaptive: it uses differential (delta-only)
maintenance when efficient, and automatically falls back to full
recomputation when the change volume is high or the query cannot be
differentiated. This works well for the vast majority of queries.
You only need to specify a mode explicitly when:
- You want FULL mode to force recomputation every time (rare).
- You want IMMEDIATE mode for sub-second, in-transaction updates (adds overhead to every write on source tables).
- You want strict DIFFERENTIAL mode and prefer an error over silent fallback when the query isn't differentiable.
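When you do want to pin a mode, it is passed at creation time. Based on the call signatures shown elsewhere in this guide, an explicit-mode creation might look like this (the refresh_mode parameter name is taken from the concept-mapping table above):

```sql
-- Force strict differential maintenance: raise an error instead of
-- silently falling back to FULL when the query can't be differentiated.
SELECT pgtrickle.create_stream_table(
  name  => 'order_totals',
  query => 'SELECT customer_id, SUM(amount) AS total
            FROM orders GROUP BY customer_id',
  schedule => '30s',
  refresh_mode => 'DIFFERENTIAL'
);
```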
How do I create a stream table?
-- Minimal: just name and query. Refreshes on a calculated schedule
-- using adaptive differential maintenance.
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id'
);
-- With custom schedule:
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id',
schedule => '5m'
);
What is the difference between FULL and DIFFERENTIAL refresh mode?
- FULL — Truncates the stream table and re-runs the entire defining query every refresh cycle. Simple but expensive for large result sets.
- DIFFERENTIAL — Computes only the delta (changes since the last refresh) using the DVM engine and applies it via a MERGE statement. Much faster when only a small fraction of source data changes between refreshes. When the change ratio exceeds pg_trickle.differential_max_change_ratio (default 15%), DIFFERENTIAL automatically falls back to FULL for that cycle.
- IMMEDIATE — Maintains the stream table synchronously within the same transaction as the base table DML. Uses statement-level triggers with transition tables — no change buffers, no scheduler. The stream table is always up-to-date.
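The adaptive DIFFERENTIAL-to-FULL fallback amounts to a ratio check per refresh cycle. A minimal Python sketch of that decision (illustrative only — the real accounting happens inside the extension):

```python
def choose_refresh_strategy(changed_rows: int, source_rows: int,
                            max_change_ratio: float = 0.15) -> str:
    """Fall back to FULL when the fraction of changed source rows exceeds
    the configured ratio (pg_trickle.differential_max_change_ratio,
    default 15%). An empty source is trivially cheap to recompute."""
    if source_rows == 0:
        return "FULL"
    ratio = changed_rows / source_rows
    return "FULL" if ratio > max_change_ratio else "DIFFERENTIAL"

print(choose_refresh_strategy(1_000, 1_000_000))    # DIFFERENTIAL (0.1% changed)
print(choose_refresh_strategy(200_000, 1_000_000))  # FULL (20% changed)
```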
Why does FULL mode exist if DIFFERENTIAL can fall back to it automatically?
DIFFERENTIAL mode with adaptive fallback covers most user needs — it uses incremental deltas when changes are small and automatically switches to a full recompute when the change ratio is high. However, explicit FULL mode still has its place:
- No CDC overhead. FULL mode installs CDC triggers on source tables (for DAG tracking), but the refresh itself ignores the change buffers entirely. If your workload has very high write throughput and you know you'll always do a full recompute, FULL mode avoids the per-row trigger overhead of writing change records that will never be consumed incrementally.
- Simpler debugging. When investigating data correctness issues, FULL mode is a clean baseline — it re-runs the defining query with no delta computation, no frontier tracking, and no MERGE logic. If FULL produces correct results but DIFFERENTIAL doesn't, the bug is in the delta pipeline.
- Predictable performance. DIFFERENTIAL refresh time varies with the number of changes, which can be unpredictable. FULL refresh time is proportional to the total result set size, which is stable. For SLA-sensitive workloads where you'd rather have consistent 500ms refreshes than variable 5ms–500ms refreshes, FULL provides that predictability.
- Unsupported-but-planned constructs. Some queries may parse correctly in DIFFERENTIAL mode but produce suboptimal deltas. Using FULL mode explicitly is a safe fallback while the DVM engine matures.
For most users, DIFFERENTIAL is the right default. Use FULL when you have a specific reason.
When should I use FULL vs. DIFFERENTIAL vs. IMMEDIATE?
Use DIFFERENTIAL (default) when:
- Source tables are large and changes between refreshes are small
- The defining query uses supported operators (most common SQL is supported)
- Some staleness (seconds to minutes) is acceptable
Use FULL when:
- The defining query uses unsupported aggregates (CORR, COVAR_*, REGR_*)
- Source tables are small and a full recompute is cheap
- You see frequent adaptive fallbacks to FULL (check refresh history)
Use IMMEDIATE when:
- The stream table must always reflect the latest committed data
- You need transactional consistency (reads within the same transaction see updated data)
- Write-side overhead per DML statement is acceptable
- The defining query is relatively simple (no TopK, no materialized view sources)
What are the advantages and disadvantages of IMMEDIATE vs. deferred (FULL/DIFFERENTIAL) refresh modes?
IMMEDIATE mode
| Trade-off | Detail |
|---|---|
| ✅ Read-your-writes consistency | The stream table is updated within the same transaction as the base table DML — always current from the writer's perspective. |
| ✅ No lag | No background worker, no schedule interval. The view is never stale. |
| ✅ No change buffers | pgtrickle_changes.* tables are not used, reducing write overhead on source tables. |
| ✅ pg_ivm compatibility | Drop-in migration path for existing pg_ivm / IMMV users. |
| ❌ Write amplification | Every DML statement on a base table also executes IVM trigger logic, adding latency to the original transaction. |
| ❌ Serialized concurrent writes | An ExclusiveLock is taken on the stream table during maintenance, serializing writers. |
| ❌ Limited SQL support | Window functions, recursive CTEs, LATERAL joins, scalar subqueries, and TopK (ORDER BY … LIMIT) are not supported — use DIFFERENTIAL instead. |
| ❌ Cascading limitations | Cascading IMMEDIATE stream tables work but may require manual refresh for deep chains. |
| ❌ No throttling | The refresh cannot be delayed or rate-limited. |
Deferred mode (FULL / DIFFERENTIAL)
| Trade-off | Detail |
|---|---|
| ✅ Decoupled write path | Base table writes are fast; view maintenance runs later via the scheduler or manual refresh. |
| ✅ Broadest SQL support | Window functions, recursive CTEs, LATERAL, UNION, user-defined aggregates, TopK, cascading stream tables, and more. |
| ✅ Adaptive cost control | DIFFERENTIAL automatically falls back to FULL when the change ratio exceeds pg_trickle.differential_max_change_ratio. |
| ✅ Concurrency-friendly | Writers never block on view maintenance. |
| ❌ Staleness | The stream table lags by up to one schedule interval (e.g. 1m). |
| ❌ No read-your-writes | A writer querying the stream table immediately after a write may see the pre-change data. |
| ❌ Infrastructure overhead | Requires change buffer tables, a background worker, and frontier tracking. |
Rule of thumb: use IMMEDIATE when the query is simple and freshness within the transaction matters. Use DIFFERENTIAL (or FULL) for complex queries, high concurrency, or when you want to decouple write latency from view maintenance.
What happens if I have an IMMEDIATE stream table between two DIFFERENTIAL stream tables in a dependency chain?
Consider the chain: source → ST_A (DIFFERENTIAL) → ST_B (IMMEDIATE) → ST_C (DIFFERENTIAL). This is a valid but unusual configuration with important behavioral consequences:
- ST_A refreshes on its schedule (e.g., every 1 minute) via the background scheduler.
- ST_B is IMMEDIATE, so it has no CDC triggers on ST_A — it uses statement-level IVM triggers. But ST_A is updated by the scheduler (not by user DML), and the scheduler's MERGE operation does fire statement-level triggers on ST_A's dependents. So ST_B updates within the scheduler's transaction when ST_A refreshes.
- ST_C is DIFFERENTIAL and depends on ST_B. Since ST_B is a stream table, ST_C's CDC triggers fire when ST_B is modified. The scheduler refreshes ST_C on its own schedule.
The practical concern: write latency stacking. When the scheduler refreshes ST_A, ST_B's IVM triggers fire synchronously within that same transaction, adding IVM overhead to ST_A's refresh. If ST_B's delta computation is expensive, it slows down the entire scheduler cycle.
Recommendation: Avoid mixing IMMEDIATE into the middle of a deferred chain. Either make the entire chain IMMEDIATE (for small, simple queries) or keep it entirely DIFFERENTIAL. If you need read-your-writes for one specific step, consider making that the terminal (leaf) stream table in the chain.
What schedule formats are supported?
Duration strings:
| Unit | Suffix | Example |
|---|---|---|
| Seconds | s | 30s |
| Minutes | m | 5m |
| Hours | h | 2h |
| Days | d | 1d |
| Weeks | w | 1w |
| Compound | — | 1h30m |
Cron expressions:
| Format | Example | Description |
|---|---|---|
| 5-field | */5 * * * * | Every 5 minutes |
| Aliases | @hourly, @daily | Built-in shortcuts |
CALCULATED mode: Pass NULL as the schedule to inherit the schedule from downstream dependents.
How do cron schedules handle timezones? What does @daily really mean?
pg_trickle evaluates cron expressions in UTC. The underlying croner crate computes the next occurrence from a UTC timestamp, and the scheduler compares this against chrono::Utc::now(). There is no per-stream-table timezone setting.
This means:
- @daily (equivalent to 0 0 * * *) fires at midnight UTC, not midnight in your local timezone.
- @hourly (equivalent to 0 * * * *) fires at the top of each UTC hour.
- 0 9 * * 1-5 fires at 09:00 UTC on weekdays — if your server is in America/New_York, that's 04:00 or 05:00 local time depending on DST.
If you need a schedule aligned to a local timezone, convert the desired local time to UTC and write the cron expression accordingly. For example, to refresh at 08:00 Europe/Oslo (UTC+1 in winter, UTC+2 in summer), use 0 6 * * * in summer and 0 7 * * * in winter — or accept the 1-hour seasonal shift and pick one.
Tip: For most analytics workloads, UTC-based schedules are preferable because they don't shift with daylight saving transitions.
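As a sanity check for the local-to-UTC conversion described above, here is a small Python sketch using the standard zoneinfo module. This is illustrative tooling for computing the cron fields, not part of pg_trickle:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def local_daily_to_utc_cron(hour: int, minute: int, tz: str,
                            on_date: datetime) -> str:
    """Convert a daily local-time schedule to a 5-field UTC cron expression,
    valid for the UTC offset in effect on `on_date` (DST shifts it)."""
    local = datetime.combine(on_date.date(), time(hour, minute),
                             tzinfo=ZoneInfo(tz))
    utc = local.astimezone(ZoneInfo("UTC"))
    return f"{utc.minute} {utc.hour} * * *"

# 08:00 Europe/Oslo in winter (UTC+1):
print(local_daily_to_utc_cron(8, 0, "Europe/Oslo", datetime(2024, 1, 15)))  # 0 7 * * *
# 08:00 Europe/Oslo in summer (UTC+2):
print(local_daily_to_utc_cron(8, 0, "Europe/Oslo", datetime(2024, 7, 15)))  # 0 6 * * *
```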
What is the minimum allowed schedule?
The pg_trickle.min_schedule_seconds GUC (default: 60 seconds) sets the shortest allowed refresh schedule. Any create_stream_table or alter_stream_table call with a schedule shorter than this floor is rejected with a clear error message.
This guard exists to prevent accidentally creating stream tables that refresh too frequently, which could overload the scheduler or the source tables. During development and testing, you can lower it:
ALTER SYSTEM SET pg_trickle.min_schedule_seconds = 1;
SELECT pg_reload_conf();
What happens if all stream tables in the DAG have a CALCULATED schedule?
When every stream table uses a CALCULATED schedule (schedule => 'calculated'), there
are no explicit schedules for the resolution algorithm to derive from. The
CALCULATED logic works by propagating MIN(effective_schedule) from downstream
dependents upward through the DAG. If no node has an explicit duration:
- Leaf nodes (no downstream dependents) have no schedules to take the minimum of, so they fall back to the pg_trickle.min_schedule_seconds GUC (default: 60 seconds).
- Upstream nodes then resolve to MIN(fallback) = fallback.
- The result: every stream table in the DAG gets the fallback schedule (60 s by default).
This is safe but usually not what you want — the whole DAG refreshes at the same generic interval. Best practice is to set an explicit schedule on at least the leaf (most-downstream) stream tables so that upstream CALCULATED schedules resolve to something meaningful:
-- Leaf ST with an explicit schedule
SELECT pgtrickle.create_stream_table(
name => 'daily_summary',
query => 'SELECT region, SUM(total) FROM pgtrickle.order_totals GROUP BY region',
schedule => '10m'
);
-- Upstream ST inherits that 10 m schedule via CALCULATED
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => 'calculated'
);
You can inspect the resolved effective schedules with:
SELECT pgt_name, schedule, effective_schedule
FROM pgtrickle.pgt_stream_tables;
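The MIN-propagation described above can be sketched in a few lines of Python. This is a conceptual model of the resolution algorithm, not pg_trickle's implementation; the dictionary shapes are assumptions for illustration:

```python
def resolve_schedules(explicit, dependents, fallback=60):
    """Resolve effective schedules (seconds) for a stream-table DAG.
    explicit:   {name: seconds or None}  (None = CALCULATED)
    dependents: {name: [downstream tables that read from this one]}
    Assumes an acyclic graph (cycles are rejected at creation time)."""
    memo = {}

    def effective(name):
        if name in memo:
            return memo[name]
        if explicit.get(name) is not None:
            memo[name] = explicit[name]          # explicit schedule wins
        else:
            downstream = [effective(d) for d in dependents.get(name, [])]
            # CALCULATED: tightest downstream schedule, else the GUC floor
            memo[name] = min(downstream) if downstream else fallback
        return memo[name]

    return {name: effective(name) for name in explicit}

# order_totals (CALCULATED) feeds daily_summary (600 s = 10m):
sched = resolve_schedules(
    explicit={"order_totals": None, "daily_summary": 600},
    dependents={"order_totals": ["daily_summary"]},
)
print(sched)  # {'order_totals': 600, 'daily_summary': 600}
```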
Can a stream table reference another stream table?
Yes. Stream tables can depend on other stream tables. The scheduler automatically refreshes them in topological order (upstream first). Circular dependencies are detected and rejected at creation time.
-- ST1: aggregates orders
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- ST2: filters ST1
SELECT pgtrickle.create_stream_table(
name => 'big_customers',
query => 'SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
How do I change a stream table's schedule or mode?
-- Change schedule
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');
-- Switch refresh mode
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
-- Suspend
SELECT pgtrickle.alter_stream_table('order_totals', status => 'SUSPENDED');
-- Resume
SELECT pgtrickle.alter_stream_table('order_totals', status => 'ACTIVE');
Can I change the defining query of a stream table?
Yes — use the query parameter of alter_stream_table():
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total, COUNT(*) AS order_count
FROM orders GROUP BY customer_id');
The ALTER QUERY operation validates the new query, migrates the storage table schema if needed, updates catalog entries and source dependencies, and runs a full refresh — all within a single transaction. Concurrent readers see either the old data or the new data, never an empty table.
Schema migration behavior:
| Schema change | Behavior |
|---|---|
| Same columns | Fast path — no storage DDL, just catalog update + full refresh |
| Columns added or removed | Compatible migration via ALTER TABLE ADD/DROP COLUMN — storage table OID preserved |
| Column type incompatible | Full rebuild — storage table dropped and recreated (OID changes, WARNING emitted) |
You can also change the query and other parameters simultaneously:
SELECT pgtrickle.alter_stream_table('order_totals',
query => 'SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id',
refresh_mode => 'FULL');
How do I deploy stream tables idempotently?
Use create_or_replace_stream_table() — one function call that does the right
thing automatically:
-- Safe to run on every deploy — creates, updates, or no-ops as needed:
SELECT pgtrickle.create_or_replace_stream_table(
name => 'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
What happens on each deploy:
| Situation | Action |
|---|---|
| First deploy (stream table doesn't exist) | Creates it, populates data |
| Nothing changed since last deploy | No-op — logs INFO, returns instantly |
| You changed the schedule or mode | Updates config in place (no data loss) |
| You changed the query | Migrates storage schema + runs a full refresh |
This mirrors PostgreSQL's CREATE OR REPLACE VIEW / CREATE OR REPLACE FUNCTION pattern.
When to use which function:
| Function | Use case |
|---|---|
| create_or_replace_stream_table() | Recommended for most deployments. Declarative, idempotent — handles all cases automatically. |
| create_stream_table_if_not_exists() | Safe re-run, but never modifies an existing definition. Good for one-time seed migrations. |
| create_stream_table() | Strict mode — errors if the stream table already exists. Use when you want an explicit failure on duplicates. |
How do I trigger a manual refresh?
Call refresh_stream_table() to immediately refresh a stream table without waiting for the next scheduled cycle:
SELECT pgtrickle.refresh_stream_table('order_totals');
This runs a synchronous refresh in your current session and returns when complete. It works even when the background scheduler is disabled (pg_trickle.enabled = false), making it useful for testing, debugging, or one-off data refreshes.
To force a full refresh regardless of the stream table's configured mode, temporarily change the refresh mode:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'FULL');
SELECT pgtrickle.refresh_stream_table('order_totals');
-- Switch back to the original mode when done:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');
Data Freshness & Consistency
Understanding when and how stream tables become current is the #1 conceptual hurdle for users coming from synchronous materialized views. This section explains staleness guarantees, read-your-writes behavior, and Delayed View Semantics (DVS).
How stale can a stream table be?
For deferred modes (FULL / DIFFERENTIAL): A stream table can be at most one schedule interval behind the source data, plus the time it takes to execute the refresh itself. For example, with schedule => '1m', the maximum staleness is approximately 1 minute + refresh duration.
In practice, staleness is often less than the schedule interval because the scheduler continuously checks for due refreshes at pg_trickle.scheduler_interval_ms (default: 1 second).
For IMMEDIATE mode: The stream table is always current within the transaction that modified the source data. There is zero staleness.
Check current staleness:
SELECT pgtrickle.get_staleness('order_totals'); -- returns seconds, NULL if never refreshed
-- Or check all stream tables:
SELECT pgt_name, staleness, stale FROM pgtrickle.stream_tables_info;
Can I read my own writes immediately after an INSERT?
It depends on the refresh mode:
- IMMEDIATE mode: Yes. The stream table is updated within the same transaction as your INSERT. You can query it immediately and see the updated data.
- DIFFERENTIAL / FULL mode: No. The stream table is updated by the background scheduler in a separate transaction. Your INSERT is captured by the CDC trigger, but the stream table won't reflect it until the next scheduled refresh (or a manual refresh_stream_table() call).
If read-your-writes consistency is a requirement, use refresh_mode => 'IMMEDIATE'.
What consistency guarantees does pg_trickle provide?
pg_trickle provides Delayed View Semantics (DVS): the contents of every stream table are logically equivalent to evaluating its defining query at some past point in time — the data_timestamp. This means:
- The data is always internally consistent — it corresponds to a valid snapshot of the source data.
- The data may be stale — it reflects the source state at data_timestamp, not necessarily the current state.
- For cascading stream tables, the scheduler refreshes in topological order so that when ST B references upstream ST A, A has already been refreshed before B runs its delta query against A's contents.
For IMMEDIATE mode, the guarantee is stronger: the stream table always reflects the state of the source data as of the current transaction.
What are "Delayed View Semantics" (DVS)?
DVS is the formal consistency guarantee: a stream table's contents are equivalent to evaluating its defining query at a specific past time (the data_timestamp). This is analogous to how a materialized view captured at a point in time is always internally consistent, even if the source data has since changed.
The data_timestamp is recorded in the catalog and advanced after each successful refresh:
SELECT pgt_name, data_timestamp FROM pgtrickle.pgt_stream_tables;
What happens if the scheduler is behind — does data get lost?
No. Change data is never lost, even if the scheduler falls behind. Changes accumulate in the change buffer tables (pgtrickle_changes.changes_<oid>) until consumed by a refresh. The frontier ensures that each refresh picks up exactly where the last one left off.
However, a growing change buffer increases:
- Disk usage (change buffer tables grow)
- Refresh time (more changes to process per cycle)
- Risk of adaptive fallback to FULL (if the change ratio exceeds pg_trickle.differential_max_change_ratio)
The monitoring system emits a buffer_growth_warning NOTIFY alert if buffers grow unexpectedly.
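The "pick up exactly where the last refresh left off" guarantee is the frontier pattern. A conceptual Python sketch (class and field names are illustrative, not pg_trickle's internals):

```python
class ChangeBuffer:
    """Append-only change log, as written by CDC triggers."""
    def __init__(self):
        self.rows = []      # [(seq, change), ...] in commit order
        self.next_seq = 1

    def capture(self, change):
        self.rows.append((self.next_seq, change))
        self.next_seq += 1

class Refresher:
    """Consumes changes strictly after the frontier, then advances it,
    so no change is ever lost or applied twice."""
    def __init__(self, buffer):
        self.buffer = buffer
        self.frontier = 0   # highest seq already applied

    def refresh(self):
        delta = [c for seq, c in self.buffer.rows if seq > self.frontier]
        if self.buffer.rows:
            self.frontier = self.buffer.rows[-1][0]
        return delta

buf = ChangeBuffer()
r = Refresher(buf)
buf.capture("INSERT id=1")
buf.capture("INSERT id=2")
print(r.refresh())  # both changes consumed
buf.capture("DELETE id=1")
print(r.refresh())  # only the new change — even if the scheduler fell behind
```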
How does pg_trickle ensure deltas are applied in the right order across cascading stream tables?
The scheduler uses topological ordering from the dependency DAG. When ST B depends on ST A:
- ST A is refreshed first — its data is brought up to date and its frontier advances.
- ST A's refresh writes are captured by CDC triggers (since ST A is a source for ST B).
- ST B is refreshed next — its delta query reads ST A's current (just-refreshed) data and the change buffer.
This ensures that downstream stream tables always see consistent upstream data. Circular dependencies are rejected at creation time.
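The refresh order above is a standard topological sort of the dependency DAG. A sketch using Kahn's algorithm (illustrative — pg_trickle's scheduler implements its own ordering):

```python
from collections import defaultdict, deque

def refresh_order(depends_on):
    """Return a refresh order where every stream table comes after all of
    its upstream dependencies. depends_on: {table: [upstream tables]}."""
    indegree = {t: len(ups) for t, ups in depends_on.items()}
    downstream = defaultdict(list)
    for t, ups in depends_on.items():
        for u in ups:
            downstream[u].append(t)
            indegree.setdefault(u, 0)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in downstream[t]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) != len(indegree):
        raise ValueError("circular dependency")  # rejected at creation time
    return order

# source -> order_totals -> big_customers
print(refresh_order({"order_totals": [], "big_customers": ["order_totals"]}))
# ['order_totals', 'big_customers']
```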
IMMEDIATE Mode (Transactional IVM)
IMMEDIATE mode maintains the stream table synchronously — within the same transaction as the source DML. This section covers when to use it, what SQL it supports, locking behavior, and how to switch between modes.
When should I use IMMEDIATE mode instead of DIFFERENTIAL?
Use IMMEDIATE when:
- Your application requires read-your-writes consistency — e.g., a user inserts an order and immediately queries a dashboard that must include that order.
- The defining query is relatively simple (single-table aggregation, joins, filters).
- The source table write rate is moderate (IMMEDIATE adds latency to every DML statement).
Stick with DIFFERENTIAL when:
- Staleness of a few seconds to minutes is acceptable.
- The defining query uses unsupported IMMEDIATE constructs (materialized-view sources, foreign-table sources).
- Write-side performance is critical (high-throughput OLTP).
- You need to decouple write latency from view maintenance.
What SQL features are NOT supported in IMMEDIATE mode?
IMMEDIATE mode supports all constructs that DIFFERENTIAL supports, with two source-type exceptions:
| Feature | Status | Notes |
|---|---|---|
| WITH RECURSIVE | ✅ Supported (IM1) | Semi-naive evaluation inside the trigger. A depth counter guards against infinite loops (pg_trickle.ivm_recursive_max_depth, default 100). A warning is emitted at create time for very deep hierarchies. |
| TopK (ORDER BY … LIMIT N [OFFSET M]) | ✅ Supported (IM2) | Micro-refresh: recomputes the top-N rows on every DML statement. Gated by pg_trickle.ivm_topk_max_limit to prevent unbounded scans. |
| Materialized views as sources | ❌ Rejected | Stale snapshots prevent trigger-based capture — use the underlying query instead. |
| Foreign tables as sources | ❌ Rejected | No triggers on foreign tables — use FULL mode instead. |
Attempting to create or switch to IMMEDIATE mode with an unsupported construct produces a clear error message.
What happens when I TRUNCATE a source table in IMMEDIATE mode?
A statement-level AFTER TRUNCATE trigger fires and truncates the stream table, then re-populates it by executing a full refresh from the defining query — all within the same transaction. The stream table remains consistent.
Can I have cascading IMMEDIATE stream tables (ST A → ST B)?
Yes. When ST A is IMMEDIATE and ST B depends on ST A and is also IMMEDIATE, changes propagate through the chain within the same transaction. The IVM triggers on the base table update ST A, and since that write is visible within the transaction, ST B's triggers fire and update ST B.
What locking does IMMEDIATE mode use?
IMMEDIATE mode acquires statement-level locks on the stream table during delta application:
- Simple queries (single-table scan/filter without aggregates or DISTINCT): RowExclusiveLock — allows concurrent readers, blocks other writers.
- Complex queries (joins, aggregates, DISTINCT, window functions): ExclusiveLock — blocks both readers and writers to ensure delta consistency.
This means concurrent writes to the same base table are serialized through the stream table lock. For high-concurrency write workloads, DIFFERENTIAL mode avoids this bottleneck.
How do I switch an existing DIFFERENTIAL stream table to IMMEDIATE?
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'IMMEDIATE');
This:
- Validates the defining query against IMMEDIATE mode restrictions.
- Removes the row-level CDC triggers from source tables.
- Installs statement-level IVM triggers (BEFORE + AFTER with transition tables).
- Clears the schedule (IMMEDIATE mode has no schedule).
- Performs a full refresh to establish a consistent baseline.
To switch back:
SELECT pgtrickle.alter_stream_table('order_totals', refresh_mode => 'DIFFERENTIAL');
This reverses the process: removes IVM triggers, installs CDC triggers, restores the schedule (default 1m), and performs a full refresh.
What happens to IMMEDIATE mode during a manual refresh_stream_table() call?
For IMMEDIATE mode stream tables, refresh_stream_table() performs a FULL refresh — truncates and re-populates from the defining query. This is useful for recovering from edge cases or forcing a clean baseline. It is equivalent to pg_ivm's refresh_immv(name, true).
How much write-side overhead does IMMEDIATE mode add?
Each DML statement on a base table tracked by an IMMEDIATE stream table incurs:
- BEFORE trigger: Advisory lock acquisition + pre-state setup (~0.1–0.5 ms).
- AFTER trigger: Transition table copy to temp tables + delta SQL generation + delta application (~1–50 ms depending on query complexity and delta size).
For a simple single-table aggregate, expect 2–10 ms overhead per statement. For multi-table joins or window functions, overhead is higher. The overhead scales with the number of IMMEDIATE stream tables that depend on the same source table.
SQL Support
pg_trickle supports a broad range of SQL in defining queries. This section
covers what’s supported, what’s rejected (with rewrites), and how specific
constructs like aggregates and ORDER BY are handled. The subsections that
follow dive deeper into aggregates, joins, CTEs, window functions, and TopK.
What SQL features are supported in defining queries?
Most common SQL is supported in both FULL and DIFFERENTIAL modes:
- Table scans, projections, WHERE / HAVING filters
- INNER, LEFT, RIGHT, FULL OUTER JOIN (including multi-table joins)
- GROUP BY with 25+ aggregate functions (COUNT, SUM, AVG, MIN, MAX, BOOL_AND/OR, STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BIT_AND/OR/XOR, STDDEV, VARIANCE, MODE, PERCENTILE_CONT/DISC, and more)
- FILTER (WHERE ...) on aggregates
- DISTINCT
- Set operations: UNION ALL, UNION, INTERSECT, INTERSECT ALL, EXCEPT, EXCEPT ALL
- Subqueries: EXISTS, NOT EXISTS, IN (subquery), NOT IN (subquery), scalar subqueries
- Non-recursive and recursive CTEs
- Window functions (ROW_NUMBER, RANK, SUM OVER, etc.)
- LATERAL joins with set-returning functions and correlated subqueries
- CASE, COALESCE, NULLIF, GREATEST, LEAST, BETWEEN, IS DISTINCT FROM
See DVM Operators for the complete list.
What SQL features are NOT supported?
The following are rejected with clear error messages and suggested rewrites:
| Feature | Reason | Suggested Rewrite |
|---|---|---|
| TABLESAMPLE | Stream tables materialize the full result set | Use WHERE random() < fraction in consuming query |
| Window functions in expressions | Cannot be differentially maintained | Move window function to a separate column |
| LIMIT / OFFSET (without ORDER BY) | Stream tables materialize the full result set; ORDER BY … LIMIT N [OFFSET M] is supported as TopK | Apply when querying the stream table, or add ORDER BY + LIMIT to use the TopK pattern |
| FOR UPDATE / FOR SHARE | Row-level locking not applicable | Remove the locking clause |
| RANGE_AGG / RANGE_INTERSECT_AGG | No incremental delta decomposition exists for range aggregates | Use FULL mode, or compute range unions in the consuming query |
Each rejected feature is explained in detail in the Why Are These SQL Features Not Supported? section below.
What happens to ORDER BY in defining queries?
ORDER BY in the defining query is accepted but silently discarded. This is consistent with how PostgreSQL handles CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering only affects the initial INSERT, not the stored data.
Stream tables are heap tables with no guaranteed row order. Apply ORDER BY when querying the stream table instead:
-- Don't rely on ORDER BY in the defining query:
-- 'SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY total DESC'
-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC;
Exception: When ORDER BY is paired with LIMIT N (with or without OFFSET M), pg_trickle recognizes the TopK pattern and preserves the ordering, limit, and offset.
Which aggregates support DIFFERENTIAL mode?
Algebraic (O(changes), fully incremental): COUNT, SUM, AVG
Semi-algebraic (incremental with occasional group rescan): MIN, MAX
Group-rescan (affected groups re-aggregated from source): STRING_AGG, ARRAY_AGG, JSON_AGG, JSONB_AGG, BOOL_AND, BOOL_OR, BIT_AND, BIT_OR, BIT_XOR, JSON_OBJECT_AGG, JSONB_OBJECT_AGG, STDDEV, STDDEV_POP, STDDEV_SAMP, VARIANCE, VAR_POP, VAR_SAMP, MODE, PERCENTILE_CONT, PERCENTILE_DISC, CORR, COVAR_POP, COVAR_SAMP, REGR_AVGX, REGR_AVGY, REGR_COUNT, REGR_INTERCEPT, REGR_R2, REGR_SLOPE, REGR_SXX, REGR_SXY, REGR_SYY
37 aggregate function variants are supported in total.
Aggregates & Group-By
Aggregate handling is one of the most complex parts of incremental view maintenance. This section explains how pg_trickle categorizes aggregates by their incremental cost, how hidden auxiliary columns work, and what happens when groups are created or destroyed.
Which aggregates are fully incremental (O(1) per change) vs. group-rescan?
pg_trickle categorizes aggregates into three tiers:
| Tier | Cost per change | Aggregates | Mechanism |
|---|---|---|---|
| Algebraic | O(1) | COUNT, SUM, AVG | Hidden auxiliary columns (__pgt_count, __pgt_sum_x) track running totals. Delta updates these columns arithmetically. |
| Semi-algebraic | O(1) normally, O(group) on extremum deletion | MIN, MAX | Maintained via LEAST/GREATEST. If the current MIN/MAX is deleted, the group is rescanned to find the new extremum. |
| Group-rescan | O(group size) per affected group | All others (35 functions) | Affected groups are re-aggregated from source data. A NULL sentinel marks stale groups for rescan. |
For most workloads, the algebraic tier (COUNT/SUM/AVG) covers the majority of aggregations and is the fastest.
Why do some aggregates have hidden auxiliary columns?
For algebraic aggregates (COUNT, SUM, AVG), the DVM engine adds hidden __pgt_count and __pgt_sum_x columns to the stream table's storage. These store running totals that can be updated with O(1) arithmetic per change instead of rescanning the entire group.
For example, a stream table defined as SELECT dept, AVG(salary) FROM employees GROUP BY dept internally stores:
- dept — the group-by key
- avg — the user-visible average (computed as __pgt_sum_x / __pgt_count)
- __pgt_count — running count of rows in the group
- __pgt_sum_x — running sum of salary values
- __pgt_row_id — row identity hash
When a new employee is inserted, the refresh updates __pgt_count += 1, __pgt_sum_x += new_salary, and recomputes avg. No rescan of the source table is needed.
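The per-group arithmetic can be sketched in a few lines of Python. This models the algebraic maintenance described above (using the __pgt_count / __pgt_sum_x naming from the docs); it is a conceptual sketch, not the extension's code:

```python
class AvgGroup:
    """One group's hidden counters for an incremental AVG."""
    def __init__(self):
        self.count = 0     # __pgt_count
        self.sum_x = 0.0   # __pgt_sum_x

    def apply(self, value, multiplicity):
        """multiplicity is +1 for an insert, -1 for a delete; an UPDATE is
        a delete of the old value plus an insert of the new one."""
        self.count += multiplicity
        self.sum_x += multiplicity * value

    @property
    def avg(self):
        # The user-visible column, derived from the hidden counters.
        return self.sum_x / self.count if self.count else None

g = AvgGroup()
g.apply(1000, +1)
g.apply(2000, +1)
print(g.avg)       # 1500.0
g.apply(1000, -1)  # O(1) deletion — no rescan of the source group
print(g.avg)       # 2000.0
```

When the count reaches zero, the group row itself is deleted, matching the "empty group" behavior described below.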
How does HAVING work with incremental refresh?
HAVING is fully supported in DIFFERENTIAL mode. The DVM engine tracks threshold transitions — groups entering or exiting the HAVING condition:
- Group crosses threshold upward: A previously excluded group (e.g., HAVING COUNT(*) > 5) gains enough members → the group is inserted into the stream table.
- Group crosses threshold downward: A group that was included drops below the threshold → the group is deleted from the stream table.
- Group stays above threshold: Normal delta update (adjust aggregate values).
This means the stream table always reflects only the groups that satisfy the HAVING clause, even as group membership changes.
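The transition logic reduces to comparing a group's old and new state against the predicate. A minimal sketch for HAVING COUNT(*) > threshold (illustrative — the DVM engine emits this as part of the MERGE delta):

```python
def having_delta(old_count, new_count, threshold=5):
    """Classify a group's transition for HAVING COUNT(*) > threshold and
    return the action applied to that group's row in the stream table."""
    was_in = old_count > threshold
    now_in = new_count > threshold
    if not was_in and now_in:
        return "INSERT"   # crossed the threshold upward
    if was_in and not now_in:
        return "DELETE"   # crossed the threshold downward
    if was_in and now_in:
        return "UPDATE"   # stays visible, aggregates adjusted
    return "NOOP"         # stays excluded

print(having_delta(5, 6))  # INSERT
print(having_delta(6, 5))  # DELETE
print(having_delta(6, 7))  # UPDATE
```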
What happens to a group when all its rows are deleted?
When the last row of a group is deleted from the source table, the DVM engine detects that __pgt_count drops to zero and deletes the group row from the stream table. The hidden auxiliary columns are cleaned up along with it.
If a new row for the same group-by key is later inserted, a fresh group row is created from scratch.
Why are CORR, COVAR_*, and REGR_* limited to FULL mode?
Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone), regression aggregates:
- Lack algebraic delta rules. There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.
- Would degrade to group-rescan anyway. Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.
These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.
Joins
Join delta computation can produce surprising results when both sides change simultaneously. This section covers the standard IVM join rule, FULL OUTER JOIN support, and known edge cases.
How does a DIFFERENTIAL refresh handle a join when both sides changed?
When both tables in a join have changes since the last refresh, the DVM engine computes the join delta using the standard IVM join rule:
$$\Delta(R \bowtie S) = (\Delta R \bowtie S) \cup (R \bowtie \Delta S) \cup (\Delta R \bowtie \Delta S)$$
In practice, this means:
- Join the changes from the left against the current state of the right.
- Join the current state of the left against the changes from the right.
- Join the changes from both sides (handles simultaneous changes to matching keys).
All three parts are combined into a single CTE-based delta query that PostgreSQL executes in one pass.
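In skeleton form, the generated delta query resembles the following. This is a simplified sketch: the real query also carries row ids, +/- multiplicities, and frontier predicates, and the table and column names here are illustrative.
```sql
-- Δ(R ⋈ S) for tables r(k, a) and s(k, b), with per-cycle change sets
-- r_changes and s_changes (illustrative names):
WITH delta_r AS (SELECT k, a FROM r_changes),  -- ΔR since the last refresh
     delta_s AS (SELECT k, b FROM s_changes)   -- ΔS since the last refresh
SELECT dr.k, dr.a, s.b                         -- ΔR ⋈ S
FROM delta_r dr JOIN s ON s.k = dr.k
UNION ALL
SELECT r.k, r.a, ds.b                          -- R ⋈ ΔS
FROM r JOIN delta_s ds ON ds.k = r.k
UNION ALL
SELECT dr.k, dr.a, ds.b                        -- ΔR ⋈ ΔS
FROM delta_r dr JOIN delta_s ds ON ds.k = dr.k;
```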
Does pg_trickle support FULL OUTER JOIN incrementally?
Yes. FULL OUTER JOIN is supported in DIFFERENTIAL mode with an 8-part delta computation. This handles all four cases: matched rows on both sides, left-only rows, right-only rows, and rows that transition between matched and unmatched states as data changes.
The 8 parts cover: new left matches, removed left matches, new right matches, removed right matches, newly matched from left-only, newly matched from right-only, newly unmatched to left-only, and newly unmatched to right-only.
What happens when a join key is updated and the joined row is simultaneously deleted?
This is a known edge case. When a join key column is updated in the same refresh cycle as the joined-side row is deleted, the delta may miss the required DELETE, potentially leaving a stale row in the stream table.
Mitigations:
- The adaptive FULL fallback (triggered when the change ratio exceeds pg_trickle.differential_max_change_ratio) catches most high-change-rate scenarios where this is likely.
- You can stagger changes across refresh cycles.
- Use FULL mode for tables where this pattern is common.
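For the last mitigation, assuming alter_stream_table accepts a refresh_mode argument (as create_stream_table does — check the function reference for the exact signature):
```sql
-- Opt this stream table out of delta maintenance entirely;
-- FULL mode re-runs the defining query each cycle.
SELECT pgtrickle.alter_stream_table('order_details',
    refresh_mode => 'FULL');
```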
How does NATURAL JOIN work?
NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables and synthesizes explicit equi-join conditions. The internal __pgt_row_id column is excluded from common column resolution, so NATURAL JOINs between stream tables also work correctly.
CTEs & Recursive Queries
Recursive CTE support is a key differentiator for pg_trickle. This section explains the three maintenance strategies (semi-naive, DRed, recomputation) and when each is used.
Do recursive CTEs work in DIFFERENTIAL mode?
Yes. pg_trickle supports WITH RECURSIVE in DIFFERENTIAL mode with three auto-selected strategies:
| Strategy | When used | How it works |
|---|---|---|
| Semi-naive evaluation | INSERT-only changes to the base case | Iteratively evaluates new derivations from the inserted rows without touching existing rows. Fastest path. |
| Delete-and-Rederive (DRed) | Mixed changes (INSERT + DELETE/UPDATE) | Deletes potentially affected derived rows, then rederives them from scratch to determine the true delta. |
| Recomputation fallback | Column mismatch or non-monotone recursive terms | Falls back to full recomputation of the recursive CTE. Used when the recursive term contains EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET operators. |
The strategy is selected automatically based on the type of changes and the recursive term's structure.
What are the three strategies for recursive CTE maintenance?
See the table above. In brief:
- Semi-naive is the fast path for append-only workloads (e.g., adding nodes to a tree). It's O(new derivations) — much cheaper than a full re-evaluation.
- DRed handles deletions and updates correctly by first removing potentially invalidated rows and then rederiving them. More expensive than semi-naive, but still incremental.
- Recomputation is the safe fallback that re-executes the entire recursive CTE. Used when the recursive term's structure is too complex for incremental processing.
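For example, a transitive-closure stream table over an edge list (names illustrative):
```sql
-- Maintains reachability pairs incrementally.
SELECT pgtrickle.create_stream_table(
    name  => 'reachable',
    query => 'WITH RECURSIVE r(src, dst) AS (
                  SELECT src, dst FROM edges
                  UNION
                  SELECT r.src, e.dst FROM r JOIN edges e ON e.src = r.dst
              )
              SELECT src, dst FROM r',
    schedule     => '1m',
    refresh_mode => 'DIFFERENTIAL'
);
-- INSERTs into edges take the semi-naive path; DELETEs/UPDATEs use DRed.
```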
What triggers a fallback from semi-naive to recomputation?
A recomputation fallback is triggered when:
- The recursive term contains non-monotone operators — EXCEPT, Aggregate, Window, DISTINCT, AntiJoin, or INTERSECT SET. These operators can "un-derive" rows when inputs change, which semi-naive evaluation cannot handle.
- Column mismatch — the CTE's output columns don't match the stream table's storage schema (e.g., after a schema change).
- Mixed DML with non-monotone terms — DELETE or UPDATE changes combined with non-monotone recursive terms always trigger recomputation.
Check which strategy was used in the refresh history:
SELECT action, rows_inserted, rows_deleted
FROM pgtrickle.get_refresh_history('my_recursive_st', 5);
What happens when a CTE is referenced multiple times in the same query?
When a non-recursive CTE is referenced more than once, pg_trickle uses shared delta computation — the CTE's delta is computed once and cached, then reused by each reference. This is tracked via CteScan operator nodes that look up the shared delta from an internal CTE registry.
For single-reference CTEs, pg_trickle simply inlines them as subqueries (no overhead).
Window Functions & LATERAL
Window functions are maintained via partition-based recomputation rather than row-level deltas. This section covers what’s supported, the expression restriction, and LATERAL constructs.
How are window functions maintained incrementally?
pg_trickle uses partition-based recomputation for window functions. When source data changes, the DVM engine:
- Identifies which partitions are affected by the changes (based on the PARTITION BY key).
- Recomputes the window function for only the affected partitions.
- Replaces the old partition results with the new ones in the stream table.
This is more efficient than a full recomputation when changes affect a small number of partitions.
Why can't I use a window function inside a CASE or COALESCE expression?
Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns but cannot be embedded in expressions (e.g., CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ...).
This restriction exists because the DVM engine handles window functions by recomputing entire partitions. When a window function is buried inside an expression, the engine cannot isolate the window computation from the surrounding expression.
Rewrite: Move the window function to a separate column in one stream table, then reference it in a second stream table:
-- ST1: compute the window function
SELECT id, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
-- ST2: use it in an expression (references ST1)
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM st1
What LATERAL constructs are supported?
pg_trickle supports three kinds of LATERAL constructs:
| Construct | Example | Delta strategy |
|---|---|---|
| Set-returning functions | LATERAL jsonb_array_elements(data) | Row-scoped recomputation — only affected parent rows are re-expanded |
| Correlated subqueries | LATERAL (SELECT ... WHERE t.id = s.id) | Row-scoped recomputation |
| JSON_TABLE (PG 17+) | JSON_TABLE(data, '$.items[*]' ...) | Modeled as LateralFunction |
Additional supported SRFs: jsonb_each, jsonb_each_text, unnest, generate_series, and others.
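For example, flattening a JSONB array column into one row per element (table and column names are illustrative):
```sql
-- Each order''s items array is expanded into individual rows.
SELECT pgtrickle.create_stream_table(
    name  => 'order_items',
    query => 'SELECT o.id AS order_id, item->>''sku'' AS sku
              FROM orders o,
                   LATERAL jsonb_array_elements(o.items) AS item',
    schedule => '30s'
);
-- Updating a single order re-expands only that parent row.
```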
What happens when a row moves between window partitions during a refresh?
When a row's PARTITION BY key changes (e.g., an employee moves departments), the DVM engine recomputes both the old partition (to remove the row) and the new partition (to add it). Both partitions are re-evaluated from the source data, ensuring window function results are correct.
TopK (ORDER BY … LIMIT)
TopK queries (ORDER BY ... LIMIT N, optionally with OFFSET M) are handled via a
specialized MERGE-based strategy that re-executes the bounded query each cycle.
This section explains how it works and its limitations.
How does ORDER BY … LIMIT N work in a stream table?
When a defining query has a top-level ORDER BY … LIMIT N (with a constant integer N), pg_trickle recognizes it as a TopK pattern. An optional OFFSET M (constant integer) selects a "page" within the ranked result. The stream table stores exactly N rows and is refreshed via a MERGE-based scoped-recomputation strategy:
- On each refresh, the full query (with ORDER BY + LIMIT, and OFFSET if present) is re-executed against the source tables.
- The result is merged into the stream table using MERGE with NOT MATCHED BY SOURCE for deletes.
- The catalog records topk_limit, topk_order_by, and optionally topk_offset for the stream table.
TopK bypasses the DVM delta pipeline — it always re-executes the bounded query. This is efficient because the result set is bounded by N.
SELECT pgtrickle.create_stream_table(
name => 'top_customers',
query => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
-- With OFFSET — "page 2" of the leaderboard (rows 101–200):
SELECT pgtrickle.create_stream_table(
name => 'next_customers',
query => 'SELECT customer_id, total FROM order_totals ORDER BY total DESC LIMIT 100 OFFSET 100',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Does OFFSET work with TopK?
Yes. ORDER BY … LIMIT N OFFSET M is fully supported. The stream table stores exactly N rows starting from position M+1 in the ranked result. This is useful for:
- Paginated dashboards: Each page is a separate stream table with a different OFFSET.
- Excluding outliers: OFFSET 5 LIMIT 50 skips the top 5 and shows the next 50.
- Windowed leaderboards: OFFSET 10 LIMIT 10 shows the "second tier."
Caveat: When source data changes, the "page" can shift — a row on page 3 may move to page 2 or 4. The stream table always reflects the current state of the page at the time of the last refresh.
OFFSET 0 is treated as no offset.
What happens when a row below the top-N cutoff rises above it?
On the next refresh, the full ORDER BY … LIMIT N query is re-executed. The newly qualifying row appears in the result, and the row that fell out of the top-N is removed. The MERGE operation handles this by:
- INSERT the newly qualifying row
- DELETE the row that fell below the cutoff
- UPDATE any rows whose values changed but remained in the top-N
Since TopK always re-executes the bounded query, it correctly detects all ranking changes.
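The merge has roughly the following shape. This is an illustrative sketch — the real statement is generated internally and also tracks hidden row-id columns:
```sql
-- Re-execute the bounded query and reconcile the stored top-N.
MERGE INTO top_customers t
USING (SELECT customer_id, total
       FROM order_totals
       ORDER BY total DESC LIMIT 100) AS src
ON t.customer_id = src.customer_id
WHEN MATCHED AND t.total IS DISTINCT FROM src.total THEN
    UPDATE SET total = src.total                    -- value changed, still top-N
WHEN NOT MATCHED THEN
    INSERT (customer_id, total)
    VALUES (src.customer_id, src.total)             -- newly qualifying row
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;                                         -- fell below the cutoff
```
NOT MATCHED BY SOURCE requires PostgreSQL 17 or later, which pg_trickle's PostgreSQL 18 target satisfies.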
Can I use TopK with aggregates or joins?
Yes. The defining query can contain any SQL that pg_trickle supports, plus ORDER BY … LIMIT N:
-- TopK over an aggregate
SELECT dept, SUM(salary) AS total_salary
FROM employees GROUP BY dept
ORDER BY total_salary DESC LIMIT 10
-- TopK over a join
SELECT e.name, d.name AS dept, e.salary
FROM employees e JOIN departments d ON e.dept_id = d.id
ORDER BY e.salary DESC LIMIT 20
The only restriction is that TopK cannot be combined with set operations (UNION/INTERSECT/EXCEPT) or GROUPING SETS/CUBE/ROLLUP.
Tables Without Primary Keys
While primary keys are not required, their absence changes how pg_trickle identifies rows. This section explains the content-based hashing fallback and its limitations with duplicate rows.
Do source tables need a primary key?
No, but it is strongly recommended. When a source table has a primary key, pg_trickle uses it to generate a deterministic __pgt_row_id for each row — this is the most reliable way to track row identity across refreshes.
Without a primary key, pg_trickle falls back to content-based hashing — an xxHash of all column values. This works correctly for tables where every row is unique, but has known issues with exact duplicate rows. See What are the risks of using tables without primary keys? for details.
What are the risks of using tables without primary keys?
Content-based row identity has known limitations with exact duplicate rows (rows where every column value is identical):
- INSERT as no-op: If a row identical to an existing one is inserted, both have the same __pgt_row_id hash, so the MERGE treats it as a no-op (the row already exists).
- DELETE removes all copies: Deleting one of N identical rows generates a DELETE delta, but the MERGE removes all rows with that __pgt_row_id.
- Aggregate drift: Over time, these mismatches can cause aggregate values to drift from the true result.
Recommendation: Add a primary key or unique constraint to source tables, or use FULL mode for tables with frequent exact-duplicate rows.
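A surrogate key is sufficient (table and column names are illustrative):
```sql
-- Give a keyless table a generated primary key so pg_trickle can use
-- key-based row identity instead of content hashing.
ALTER TABLE events
    ADD COLUMN event_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;
```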
How does content-based row identity work for duplicate rows?
For tables without a primary key, __pgt_row_id is computed as pg_trickle_hash_multi(ARRAY[col1::text, col2::text, ...]) — an xxHash of all column values. Rows with identical content produce identical hashes.
The hash uses \x1E (record separator) between values and \x00NULL\x00 for NULL values, minimizing collision risk for rows with different content. However, truly identical rows (same values in every column) will always hash to the same value — this is inherent to content-based identity.
Change Data Capture (CDC)
This section explains how pg_trickle captures changes to your source tables, the trade-offs between trigger-based and WAL-based CDC, and operational topics like backup/restore and buffer inspection.
How does pg_trickle capture changes to source tables?
pg_trickle installs AFTER INSERT/UPDATE/DELETE row-level PL/pgSQL triggers on each source table referenced by a stream table. Whenever a row in the source table is modified, the trigger writes a change record into a per-source buffer table in the pgtrickle_changes schema.
Each change record contains:
- Action — I (insert), U (update), D (delete), or T (truncate marker)
- Row data — old and/or new row values serialized as JSONB
- LSN — the current WAL log sequence number, used for frontier tracking
- Transaction ID — links the change to its originating transaction
The trigger fires within your transaction, so if you roll back, the change record is also rolled back. This guarantees that only committed changes appear in the buffer.
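The installed trigger is roughly equivalent to the following simplified sketch. The actual function, buffer table name (changes_16384 here), and serialization details are internal to pg_trickle:
```sql
-- Simplified analogue of the CDC capture trigger (not the real implementation).
CREATE OR REPLACE FUNCTION capture_orders_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO pgtrickle_changes.changes_16384 (action, new_data, lsn, txid)
        VALUES ('I', to_jsonb(NEW), pg_current_wal_lsn(), txid_current());
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO pgtrickle_changes.changes_16384 (action, old_data, new_data, lsn, txid)
        VALUES ('U', to_jsonb(OLD), to_jsonb(NEW), pg_current_wal_lsn(), txid_current());
    ELSE  -- DELETE
        INSERT INTO pgtrickle_changes.changes_16384 (action, old_data, lsn, txid)
        VALUES ('D', to_jsonb(OLD), pg_current_wal_lsn(), txid_current());
    END IF;
    RETURN NULL;  -- AFTER row trigger; the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER pgt_capture
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW EXECUTE FUNCTION capture_orders_change();
```
Because the INSERT into the buffer runs inside the same transaction as the source DML, a rollback discards both together.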
What is the overhead of CDC triggers?
The per-row overhead is approximately 20–55 μs, which covers the PL/pgSQL function dispatch, row_to_json() serialization, and the buffer table INSERT.
At typical write rates (fewer than 1,000 writes per second per source table), this adds less than 5% additional DML latency. For most OLTP workloads, the overhead is negligible — a single network round-trip to the database is usually 10–100× more expensive.
If you have very high-throughput source tables (>10K writes/sec), consider enabling the hybrid CDC mode (pg_trickle.cdc_mode = 'auto') which can automatically transition to WAL-based capture for lower per-row overhead (~5–15 μs).
What happens when I TRUNCATE a source table?
TRUNCATE is captured via a statement-level AFTER TRUNCATE trigger that writes a T marker row to the change buffer. When the differential refresh engine detects this marker, it automatically falls back to a full refresh for that cycle, ensuring the stream table stays consistent. Both FULL and DIFFERENTIAL mode stream tables handle TRUNCATE correctly.
Are CDC triggers automatically cleaned up?
Yes. pg_trickle tracks which source tables are referenced by which stream tables in the pgt_dependencies catalog. When the last stream table referencing a particular source table is dropped, pg_trickle automatically:
- Removes the CDC triggers from the source table.
- Drops the associated change buffer table (pgtrickle_changes.changes_<oid>).
You do not need to manually clean up triggers or buffer tables.
What happens if a source table is dropped or altered?
pg_trickle has DDL event triggers that listen for ALTER TABLE and DROP TABLE on source tables. When a change is detected, pg_trickle responds automatically:
- All stream tables that depend on the altered source are marked with needs_reinit = true in the catalog.
- On the next scheduler cycle, each affected stream table is reinitialized — the existing storage table is dropped, recreated from the current defining query schema, and re-populated with a full refresh.
- A reinitialize_needed NOTIFY alert is sent so your monitoring can detect the event.
If the DDL change breaks the defining query (e.g., a column referenced in the query was dropped), the reinitialization will fail and the stream table will enter ERROR status. In that case, you need to drop and recreate the stream table with an updated query.
How do I check if a source table has switched from trigger-based CDC to WAL-based CDC?
When you enable hybrid CDC (pg_trickle.cdc_mode = 'auto'), pg_trickle starts capturing changes with triggers and can automatically transition to WAL-based logical replication once conditions are met. There are several ways to check the current CDC mode for each source table:
1. Query the dependency catalog directly:
SELECT d.source_relid, c.relname AS source_table, d.cdc_mode,
d.slot_name, d.decoder_confirmed_lsn, d.transition_started_at
FROM pgtrickle.pgt_dependencies d
JOIN pg_class c ON c.oid = d.source_relid;
The cdc_mode column shows one of three values:
- TRIGGER — changes are captured via row-level triggers (the default)
- TRANSITIONING — the system is in the process of switching from triggers to WAL
- WAL — changes are captured via logical replication
2. Use the built-in health check function:
SELECT source_table, cdc_mode, slot_name, lag_bytes, alert
FROM pgtrickle.check_cdc_health();
This returns a row per source table with the current mode, replication slot lag (for WAL-mode sources), and any alert conditions such as slot_lag_exceeds_threshold or replication_slot_missing.
3. Listen for real-time transition notifications:
LISTEN pg_trickle_cdc_transition;
pg_trickle sends a NOTIFY with a JSON payload whenever a transition starts, completes, or is rolled back. Example payload:
{
"event": "transition_complete",
"source_table": "public.orders",
"old_mode": "TRANSITIONING",
"new_mode": "WAL",
"slot_name": "pg_trickle_slot_16384"
}
This lets you integrate CDC mode changes into your monitoring stack without polling.
4. Check the global GUC setting:
SHOW pg_trickle.cdc_mode;
This shows the desired global behavior (trigger, auto, or wal), not the per-table actual state. The per-table state lives in pgt_dependencies.cdc_mode as described above.
See CONFIGURATION.md for details on the pg_trickle.cdc_mode, pg_trickle.wal_transition_timeout, pg_trickle.slot_lag_warning_threshold_mb, and pg_trickle.slot_lag_critical_threshold_mb GUCs.
Is it safe to add triggers to a stream table while the source table is switching CDC modes?
Yes, this is completely safe. CDC mode transitions and user-defined triggers operate on different tables and do not interfere with each other:
- CDC transitions affect how changes are captured from source tables (e.g., orders). The transition switches the capture mechanism from row-level triggers on the source table to WAL-based logical replication.
- User-defined triggers live on stream tables (e.g., order_totals) and control how the refresh engine applies changes to the materialized output.
Because these are independent concerns, you can freely add, modify, or remove triggers on a stream table at any point — including during an active CDC transition on its source tables.
How it works in practice:
- The refresh engine checks for user-defined triggers on the stream table at the start of each refresh cycle (via a fast pg_trigger lookup, <0.1 ms).
- If user triggers are detected, the engine uses explicit DELETE/UPDATE/INSERT statements instead of MERGE, so your triggers fire with correct TG_OP, OLD, and NEW values.
- The change data consumed by the refresh engine has the same format regardless of whether it came from CDC triggers or WAL decoding — so the trigger detection and the CDC mode are fully decoupled.
A trigger added between two refresh cycles will simply be picked up on the next cycle. The only (theoretical) edge case is adding a trigger in the tiny window during a single refresh transaction, between the trigger-detection check and the MERGE execution — but since both happen within the same transaction, this is virtually impossible in practice.
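For example, an audit trigger on the stream table itself (log_total_change is a placeholder for your own trigger function):
```sql
-- Audits changes the refresh engine applies to the stream table.
-- log_total_change() is a hypothetical user-defined trigger function.
CREATE TRIGGER audit_totals
    AFTER INSERT OR UPDATE OR DELETE ON order_totals
    FOR EACH ROW EXECUTE FUNCTION log_total_change();
```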
Why does pg_trickle use triggers instead of logical replication for initial CDC?
pg_trickle always bootstraps CDC with row-level AFTER triggers because they provide single-transaction atomicity — the change record is written in the same transaction as the source DML, so:
- No commit-order ambiguity. The change buffer always reflects committed data; rolled-back transactions never produce partial change records.
- No replication slot management at creation time. Logical replication requires creating and monitoring replication slots, which can bloat WAL if the subscriber falls behind. Trigger-based bootstrap avoids this complexity.
- Works on all hosting providers. Some managed PostgreSQL services restrict wal_level = logical or limit the number of replication slots. Trigger bootstrap works everywhere, with no configuration changes.
- Simpler initial deployment. No need for wal_level = logical, no publication/subscription setup, and no extra connections for WAL senders.
With pg_trickle.cdc_mode = 'auto' (the default since v0.3.0), pg_trickle uses triggers initially and then transparently transitions to WAL-based CDC if wal_level = logical is available. If WAL is not available, triggers are kept permanently — no degradation, no errors. Set pg_trickle.cdc_mode = 'trigger' if you want to disable WAL transitions entirely. See ADR-001 and ADR-002 in the architecture documentation for the full rationale.
Why is auto the default pg_trickle.cdc_mode?
As of v0.3.0, auto is the default CDC mode. This was changed from trigger based on the following considerations:
1. Safe no-op on standard installs.
PostgreSQL ships with wal_level = replica by default. In this configuration, auto simply stays on trigger-based CDC permanently — it does not create replication slots, publications, or any WAL infrastructure. There is no error, warning, or user-visible difference from the old trigger default. auto only activates the WAL transition path when wal_level = logical is explicitly configured by the operator.
2. Automatic fallback hardening. The WAL transition and steady-state polling now include robust automatic fallback:
- Consecutive poll errors (5 failures) trigger automatic revert to triggers.
- check_decoder_health() validates slot existence, WAL lag, and wal_level on every tick.
- The TRANSITIONING phase has a progressive timeout with informative warnings.
- Post-restart health checks (check_cdc_transition_health()) automatically clean up stale transitions.
3. Zero overhead for trigger-only deployments.
When wal_level != logical, the auto scheduler branch takes a fast-path exit after a single GUC check and pg_replication_slots query. The overhead compared to trigger mode is negligible (<1 ms per scheduler tick).
4. Progressive optimisation without config changes.
When an operator later enables wal_level = logical (e.g., for other replication needs), pg_trickle automatically benefits from lower per-row CDC overhead (~5–15 μs vs ~20–55 μs) without any configuration change. This aligns with the principle of least surprise.
When to use trigger instead: Set pg_trickle.cdc_mode = 'trigger' if you want fully deterministic trigger-only behaviour, need to minimize any replication slot management, or are on a restricted managed PostgreSQL that caps replication slots. This reverts to the pre-v0.3.0 default.
Caveats to be aware of in auto mode:
- Keyless tables (no PRIMARY KEY) stay on triggers permanently — WAL mode requires a PK for pk_hash computation.
- Replication slots prevent WAL recycling: if the decoder falls behind, WAL accumulates. pg_trickle now warns at pg_trickle.slot_lag_warning_threshold_mb (default 100 MB) and marks per-source CDC health unhealthy at pg_trickle.slot_lag_critical_threshold_mb (default 1024 MB).
- The TRANSITIONING phase runs both trigger and WAL decoder simultaneously; LSN-based deduplication handles correctness. If anything goes wrong, the system rolls back to triggers.
How does the trigger-to-WAL automatic transition work?
When pg_trickle.cdc_mode = 'auto', pg_trickle monitors each source table's write rate. When the rate exceeds an internal threshold, the transition proceeds in three phases:
- Slot creation. A logical replication slot is created for the source table's OID (e.g., pg_trickle_slot_16384).
- Dual capture. For a brief period, both triggers and WAL decoding capture changes. The system uses LSN comparison to deduplicate, ensuring no changes are lost or double-counted.
- Trigger removal. Once the WAL decoder has confirmed it is caught up (its confirmed LSN ≥ the frontier LSN), the row-level triggers are dropped and the source transitions fully to WAL mode.
The transition is tracked in pgt_dependencies.cdc_mode (values: TRIGGER → TRANSITIONING → WAL). If the transition times out (pg_trickle.wal_transition_timeout, default 5 minutes), it is rolled back and triggers are kept.
What happens to CDC if I restore a database backup?
After restoring a backup (pg_dump, pg_basebackup, or PITR), the CDC state depends on the backup type:
| Backup type | Triggers | Change buffers | Frontier | Action needed |
|---|---|---|---|---|
| pg_dump (logical) | Preserved (in DDL) | Buffer rows included | Catalog restored | Usually none — next refresh detects stale frontier and does a full refresh |
| pg_basebackup (physical) | Preserved | Buffer rows preserved (committed at backup time) | Catalog restored | Replication slots may be invalid — WAL-mode sources may need manual transition back to TRIGGER mode |
| PITR (point-in-time) | Preserved | Only committed buffer rows at the recovery target | Catalog restored | Similar to pg_basebackup; frontier may point ahead of actual buffer content → first refresh does a full refresh to reconcile |
In all cases, the pg_trickle scheduler automatically detects frontier inconsistencies and falls back to a full refresh for the first cycle after restore. No manual intervention is required for trigger-mode sources.
For WAL-mode sources, replication slots created after the backup point will not exist in the restored state. Set pg_trickle.cdc_mode = 'trigger' temporarily, or let the auto transition recreate slots.
For full guidelines on disaster recovery strategies, see our dedicated Backup and Restore chapter.
Do CDC triggers fire for rows inserted via logical replication (subscribers)?
Yes. PostgreSQL fires row-level triggers on the subscriber side for rows applied via logical replication. This means if you have a subscriber database with pg_trickle installed, the CDC triggers will capture replicated changes into the local change buffers.
Implication: You can run stream tables on a subscriber database that tracks replicated tables — the change capture works transparently. However, be careful about:
- Double-counting. If the same table is tracked by pg_trickle on both the publisher and subscriber, changes are captured twice (once on each side). This is fine if the stream tables are independent, but confusing if you expect them to be identical.
- Replication lag. The stream table on the subscriber will be delayed by both the replication lag and the pg_trickle refresh schedule.
Can I inspect the change buffer tables directly?
Yes. Change buffers are ordinary tables in the pgtrickle_changes schema, named changes_<source_oid>:
-- List all change buffer tables
SELECT tablename FROM pg_tables WHERE schemaname = 'pgtrickle_changes';
-- Inspect recent changes for a source table (find OID first)
SELECT c.oid FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'orders' AND n.nspname = 'public';
-- Then query the buffer
SELECT action, lsn, txid, old_data, new_data
FROM pgtrickle_changes.changes_16384
ORDER BY lsn DESC LIMIT 10;
The action column contains: I (insert), U (update), D (delete), or T (truncate).
Warning: Do not modify buffer tables directly. The refresh engine manages buffer cleanup (truncation) after each successful refresh. Manual changes will corrupt the frontier tracking.
How does pg_trickle prevent its own refresh writes from re-triggering CDC?
When the refresh engine writes to a stream table (via MERGE or explicit DML), it does not trigger CDC capture on that stream table, even if the stream table is itself a source for a downstream stream table. This is because:
- CDC triggers are only installed on source tables, not on stream tables. The refresh engine writes directly to the stream table's storage without going through any change-capture mechanism.
- Downstream change propagation uses a different path. When stream table A is a source for stream table B, changes to A are detected at B's refresh time by re-reading A's data (not via triggers on A). The topological ordering ensures A is refreshed before B.
This design prevents infinite loops (A triggers B triggers A) and avoids the overhead of capturing changes to materialized output that will be recomputed anyway.
Diamond Dependencies & DAG Scheduling
When multiple stream tables form a diamond-shaped dependency graph, careful coordination is needed to avoid inconsistent snapshots. This section covers atomic consistency, schedule policies, and topological ordering.
What is a diamond dependency and why does it matter?
A diamond dependency occurs when two (or more) intermediate stream tables both depend on the same source, and a downstream stream table depends on both of them:
        Source: orders
         /          \
   ST: totals   ST: counts
         \          /
     ST: combined_report
Without coordination, combined_report might be refreshed after totals is updated but before counts is updated (or vice versa), producing a temporarily inconsistent snapshot — totals reflects the latest data but counts is stale.
What does diamond_consistency = 'atomic' do?
When diamond_consistency = 'atomic' is set on the downstream stream table (e.g., combined_report), pg_trickle ensures that all upstream stream tables in the diamond are refreshed within the same scheduler cycle before the downstream table is refreshed. This guarantees a consistent point-in-time snapshot.
If any upstream refresh in the atomic group fails, the downstream refresh is skipped for that cycle to avoid inconsistency. The failed upstream will be retried on the next cycle.
SELECT pgtrickle.alter_stream_table('combined_report',
diamond_consistency => 'atomic');
What is the difference between 'fastest' and 'slowest' schedule policy?
When a stream table has multiple upstream dependencies with different schedules, pg_trickle needs a policy for when to refresh the downstream table:
| Policy | Behavior | Best for |
|---|---|---|
| fastest | Refresh downstream whenever any upstream refreshes | Low-latency dashboards where partial freshness is acceptable |
| slowest | Refresh downstream only after all upstreams have refreshed | Reports requiring all-or-nothing consistency |
The default is fastest. Use slowest with diamond_consistency = 'atomic' for the strongest consistency guarantees.
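Assuming the policy can be set per stream table via alter_stream_table — the schedule_policy parameter name below is hypothetical; consult the function reference for the exact signature:
```sql
-- Strongest consistency: wait for all upstreams, then refresh atomically.
SELECT pgtrickle.alter_stream_table('combined_report',
    schedule_policy     => 'slowest',   -- hypothetical parameter name
    diamond_consistency => 'atomic');
```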
What happens when an atomic diamond group partially fails?
When diamond_consistency = 'atomic' is set and one upstream stream table in the diamond fails to refresh:
- The downstream refresh is skipped for that cycle (it reads stale-but-consistent data from the previous successful cycle).
- The downstream refresh is skipped for that cycle (it reads stale-but-consistent data from the previous successful cycle).
- The failed upstream follows the normal retry logic (exponential backoff, up to max_consecutive_errors).
- Other non-failing upstreams in the diamond are still refreshed normally — their data is fresh, but the downstream won't consume it until all upstreams succeed.
- A NOTIFY pg_trickle_alert with event diamond_partial_failure is sent so your monitoring can detect the situation.
How does pg_trickle determine topological refresh order?
The scheduler builds a directed acyclic graph (DAG) of stream table dependencies at startup and after any create_stream_table / drop_stream_table call. The algorithm:
- Edge discovery. For each stream table, the defining query's source tables are extracted. If a source table is itself a stream table, a dependency edge is added.
- Cycle detection. The DAG is checked for cycles. If a cycle is detected, the offending create_stream_table call is rejected with a clear error message listing the cycle path.
- Topological sort. A topological sort (Kahn's algorithm) produces the refresh order — leaf nodes (no stream table dependencies) are refreshed first, then their dependents, and so on.
- Level assignment. Each stream table is assigned a "level" (0 for leaves, max(parent levels) + 1 for dependents). Stream tables at the same level are refreshed concurrently when pg_trickle.parallel_refresh_mode = 'on'.
The topological order is recalculated whenever the DAG changes. You can inspect it with:
SELECT pgt_name, depends_on, topo_level
FROM pgtrickle.stream_tables_info
ORDER BY topo_level, pgt_name;
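The sort and level assignment described above can be sketched in Python (an illustrative model of Kahn's algorithm with level tracking, not the extension's actual implementation):

```python
from collections import defaultdict, deque

def topo_levels(edges: dict[str, list[str]]) -> dict[str, int]:
    """Assign each stream table a refresh level: 0 for leaves (no
    stream-table dependencies), max(parent levels) + 1 otherwise.
    `edges` maps each stream table to the stream tables it depends on.
    Raises ValueError when a dependency cycle is detected."""
    nodes = set(edges) | {d for deps in edges.values() for d in deps}
    indegree = {n: 0 for n in nodes}   # unresolved dependencies per node
    dependents = defaultdict(list)     # reverse edges: dep -> dependents
    for st, deps in edges.items():
        indegree[st] = len(deps)
        for d in deps:
            dependents[d].append(st)
    queue = deque(n for n in nodes if indegree[n] == 0)  # leaves first
    level = {n: 0 for n in queue}
    resolved = 0
    while queue:
        n = queue.popleft()
        resolved += 1
        for dep in dependents[n]:
            level[dep] = max(level.get(dep, 0), level[n] + 1)
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    if resolved != len(nodes):   # some nodes never reached indegree 0
        raise ValueError("dependency cycle detected")
    return level

# daily_totals depends on order_totals, which depends on no stream table:
print(topo_levels({"order_totals": [], "daily_totals": ["order_totals"]}))
# {'order_totals': 0, 'daily_totals': 1}
```

Nodes sharing a level have no dependency between them, which is exactly the property that makes same-level parallel refresh safe.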
Schema Changes & DDL Events
pg_trickle detects source table schema changes via PostgreSQL’s DDL event trigger system and reacts automatically. This section explains what happens for various DDL operations and how to handle them.
What happens when I add a column to a source table?
Adding a column to a source table is safe and non-disruptive if the stream table's defining query does not use SELECT *:
- Named columns: If the defining query explicitly lists columns (e.g., `SELECT id, name, amount FROM orders`), the new column is simply not captured by CDC and has no effect on the stream table.
- `SELECT *`: If the defining query uses `SELECT *`, pg_trickle detects the schema mismatch at the next refresh and marks the stream table with `needs_reinit = true`. The next scheduler cycle performs a full reinitialization — drops the storage table, recreates it with the new column set, and does a full refresh.
CDC triggers capture the full row as JSONB regardless of which columns the stream table uses, so no trigger changes are needed.
What happens when I drop a column used in a stream table's query?
Dropping a column that is referenced in a stream table's defining query will cause the next refresh to fail because the column no longer exists in the source table. pg_trickle handles this via:
- The DDL event trigger detects the `ALTER TABLE ... DROP COLUMN` and marks all affected stream tables with `needs_reinit = true`.
- On the next refresh cycle, the scheduler attempts reinitialization — but the defining query will fail with a PostgreSQL error (e.g., `column "amount" does not exist`).
- The stream table moves to ERROR status after `max_consecutive_errors` failures.
- A `reinitialize_needed` NOTIFY alert is sent.
Resolution: Drop and recreate the stream table with an updated defining query:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT id, name FROM orders', -- updated query without dropped column
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
What happens when I CREATE OR REPLACE a view used by a stream table?
PostgreSQL event triggers fire on CREATE OR REPLACE VIEW, so pg_trickle detects the change and marks dependent stream tables with needs_reinit = true. On the next refresh:
- If the new view definition is compatible (same output columns, same types), reinitialization succeeds transparently — the stream table is repopulated with the new query logic.
- If the new view definition changes the output schema (different columns or types), the delta query will fail and the stream table enters ERROR status.
Tip: To avoid disruption, use pgtrickle.alter_stream_table() to pause the stream table before replacing the view, then resume after verifying compatibility.
What happens when I alter or drop a function used in a stream table's query?
If a stream table's defining query calls a user-defined function (e.g., SELECT my_func(amount) FROM orders) and that function is altered or dropped:
- `ALTER FUNCTION` (changing the body): pg_trickle does not detect this automatically — PostgreSQL does not fire DDL event triggers for function body changes. The stream table continues refreshing with the new function behavior. If this is intentional, no action is needed. If you want a full rebase to the new logic, temporarily switch to FULL mode and refresh:
SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'FULL');
SELECT pgtrickle.refresh_stream_table('my_st');
SELECT pgtrickle.alter_stream_table('my_st', refresh_mode => 'DIFFERENTIAL');
- `DROP FUNCTION`: The next refresh fails because the function no longer exists. The stream table enters ERROR status. Recreate the function or drop and recreate the stream table.
What is reinitialize and when does it trigger?
Reinitialize is pg_trickle's mechanism for handling structural changes to source tables. When a stream table is marked with needs_reinit = true, the next scheduler cycle performs:
- Drop the existing storage table (the physical heap table backing the stream table).
- Recreate the storage table from the defining query's current output schema.
- Full refresh — run the defining query against current source data and populate the new storage table.
- Reset the frontier to the current LSN.
- Clear the `needs_reinit` flag.
Reinitialize triggers automatically when DDL event triggers detect `ALTER TABLE`, `DROP TABLE`, or `CREATE OR REPLACE VIEW` on source tables or intermediate views. Whenever the flag is set, a `needs_reinit` NOTIFY alert is sent.
You can also trigger reinitialization manually:
UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = true WHERE pgt_name = 'my_st';
Can I block DDL on tracked source tables?
pg_trickle does not block DDL on source tables by default — it only reacts to DDL changes via event triggers (see `pg_trickle.block_source_ddl` below for an opt-in guard). If you want to prevent accidental schema changes on critical source tables, use PostgreSQL's built-in mechanisms:
-- Revoke ALTER/DROP from application roles
REVOKE ALL ON TABLE orders FROM app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE orders TO app_user;
-- Only the table owner (or superuser) can now ALTER/DROP
Alternatively, create a custom event trigger that raises an exception when DDL targets tracked source tables:
CREATE OR REPLACE FUNCTION prevent_source_ddl() RETURNS event_trigger AS $$
BEGIN
IF EXISTS (
SELECT 1 FROM pg_event_trigger_ddl_commands() cmd
JOIN pgtrickle.pgt_dependencies d ON d.source_relid = cmd.objid
) THEN
RAISE EXCEPTION 'Cannot ALTER/DROP a table tracked by pg_trickle';
END IF;
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER guard_source_ddl ON ddl_command_end
EXECUTE FUNCTION prevent_source_ddl();
What happens if I run DDL on a source table during an active refresh?
PostgreSQL's locking mechanism prevents most conflicts. The refresh transaction acquires a ShareLock on source tables before reading them. Since ALTER TABLE (including ADD COLUMN, DROP COLUMN, ALTER TYPE) requires an AccessExclusiveLock, the DDL statement blocks until the refresh transaction completes.
In practice:
- During a refresh: The ALTER TABLE waits for the refresh to finish, then proceeds. pg_trickle's DDL event trigger then detects the change and marks the stream table for reinitialization.
- Between refreshes: DDL proceeds immediately. The next refresh picks up the reinitialization flag.
There is a tiny theoretical window between lock acquisition and the first read where DDL could sneak in, but this is prevented by PostgreSQL's MVCC — the refresh's snapshot was taken before the DDL committed, so it reads the old schema regardless.
If `pg_trickle.block_source_ddl = true`: column-affecting DDL on tracked source tables is rejected entirely with an ERROR, regardless of whether a refresh is running.
Do stream tables work with logical replication?
Stream tables are replicated to standbys via physical (streaming) replication like any other heap table. However, they are not automatically maintained by pg_trickle on the subscriber:
| Aspect | Primary | Physical standby | Logical subscriber |
|---|---|---|---|
| Scheduler runs | Yes | No (read-only) | No (no pg_trickle catalog) |
| Stream tables readable | Yes | Yes (replicated) | Only if published |
| Refreshes occur | Yes | No (standby is read-only) | No |
| Change buffers | Managed by pg_trickle | Replicated but not consumed | Not available |
Key limitations:
- Change buffer tables (`pgtrickle_changes.*`) are not published through logical replication — they are internal transient data.
- The pg_trickle catalog (`pgtrickle.pgt_stream_tables`) is not replicated through logical replication.
- On a physical standby, stream tables receive updates through streaming replication with the usual replication lag.
Recommended pattern: Run pg_trickle on the primary only. Read stream tables from any physical standby.
Performance & Tuning
This section covers scheduler tuning, the adaptive FULL fallback, disk space management, and guidance on when to use DIFFERENTIAL vs. FULL mode.
How do I tune the scheduler interval?
The pg_trickle.scheduler_interval_ms GUC controls how often the scheduler checks for stale stream tables (default: 1000 ms).
| Workload | Recommended Value |
|---|---|
| Low-latency (near real-time) | 100–500 |
| Standard | 1000 (default) |
| Low-overhead (many STs, long schedules) | 5000–10000 |
Is there any risk in setting min_schedule_seconds very low?
Yes. pg_trickle.min_schedule_seconds (default: 60) is a safety guardrail, not an arbitrary limit. Setting it very low — especially in production — can cause several problems:
WAL amplification. Every differential refresh writes a MERGE to the WAL. At 1-second intervals across many stream tables, WAL generation rises sharply, increasing replication lag and storage costs.
Lock contention. Each refresh acquires locks on the change buffer table. With cleanup_use_truncate = true (the default), this is an AccessExclusiveLock. Sub-second schedules can starve concurrent INSERT/UPDATE/DELETE statements on the source tables.
Cascading refresh load. If a refresh takes longer than the schedule interval (e.g., an 800 ms refresh on a 1-second schedule), the next refresh fires almost immediately upon completion. With chained or diamond-shaped ST graphs, the entire topological chain must complete within the interval to avoid falling behind.
Autovacuum pressure. Rapid MERGE operations produce dead tuples in the stream table faster than autovacuum can clean them up, bloating the table and degrading query performance over time.
Adaptive fallback triggering. At high change rates, pg_trickle.differential_max_change_ratio may trigger a FULL refresh instead of DIFFERENTIAL. A FULL refresh at 1-second intervals is very expensive and defeats the purpose of differential maintenance.
Practical guidance:
| Environment | Recommended minimum |
|---|---|
| Development / testing | 1 s — fine for fast iteration |
| Lightly loaded production | 10–30 s |
| Standard production | 60 s (default) |
| High-throughput OLTP | 120+ s — let change buffers accumulate for efficient batch merging |
If you need near-real-time results, consider IMMEDIATE mode (refresh_mode => 'DIFFERENTIAL' with same-transaction refresh) instead of a very short schedule — it avoids the scheduler overhead entirely and updates the stream table within your transaction.
What is the adaptive fallback to FULL?
When the number of pending changes exceeds pg_trickle.differential_max_change_ratio (default: 15%) of the source table size, DIFFERENTIAL mode automatically falls back to FULL for that refresh cycle. This prevents pathological delta queries on bulk changes.
- Set to `1.0` to effectively always use DIFFERENTIAL (fallback occurs only if pending changes exceed the table size)
- Set to `0.0` to always fall back to FULL (any pending change exceeds the threshold)
- The default `0.15` (15%) is a good balance
How many concurrent refreshes can run?
By default (parallel_refresh_mode = 'off') refreshes are processed sequentially within the scheduler's single background worker. This is safe and efficient for most deployments.
Starting in v0.4.0, true parallel refresh is available via:
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
ALTER SYSTEM SET pg_trickle.max_dynamic_refresh_workers = 4; -- cluster-wide cap
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 4; -- per-database cap
SELECT pg_reload_conf();
When enabled, independent stream tables at the same DAG level are refreshed concurrently in separate dynamic background workers. Each worker uses one `max_worker_processes` slot — see the worker-budget formula before enabling.
Monitor parallel refresh with:
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status(60);
For most deployments with fewer than 100 stream tables, sequential processing is still efficient (each differential refresh typically takes 5–50 ms).
How do I check if my stream tables are keeping up?
-- Quick overview
SELECT pgt_name, status, staleness, stale
FROM pgtrickle.stream_tables_info;
-- Detailed statistics
SELECT pgt_name, total_refreshes, avg_duration_ms, consecutive_errors, stale
FROM pgtrickle.pg_stat_stream_tables;
-- Recent refresh history for a specific ST
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);
What is __pgt_row_id?
Every stream table has a __pgt_row_id BIGINT PRIMARY KEY column that stores a 64-bit xxHash of the row's identity key. The refresh engine uses it to match incoming deltas against existing rows during MERGE operations.
For a detailed explanation of how this column is computed and why it exists, see What is the __pgt_row_id column and why does it appear in my stream tables? in the General section.
You should ignore this column in your queries. It is an implementation detail.
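As a mental model, the row id is a stable 64-bit hash of the identity-key column values. The sketch below illustrates the idea in Python; it uses `hashlib.blake2b` as a stand-in hash, since the extension itself uses xxHash:

```python
import hashlib

def pgt_row_id(*key_values) -> int:
    """Illustrative model of a __pgt_row_id: derive a stable 64-bit
    integer from a row's identity-key columns. NULLs are encoded as
    empty strings and columns are joined with a separator so that
    ('a', 'b') and ('ab',) hash differently. blake2b stands in for
    the xxHash used by the real extension."""
    key = "\x1f".join("" if v is None else str(v) for v in key_values)
    digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
    # Interpret the 8-byte digest as a signed 64-bit value (fits BIGINT).
    return int.from_bytes(digest, "big", signed=True)

# The same identity key always maps to the same row id, which is what
# lets MERGE match an incoming delta against the stored row:
assert pgt_row_id(42, "active") == pgt_row_id(42, "active")
assert pgt_row_id(42, "active") != pgt_row_id(42, "inactive")
```

The important property is determinism: a delta row recomputes the same id as the stored row it updates or deletes, so the MERGE can match on the primary key alone.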
How much disk space do change buffer tables consume?
Each change buffer table stores one row per source-table change (INSERT, UPDATE, DELETE, or TRUNCATE marker). The row size depends on the source table's column count and data types:
| Component | Approximate size |
|---|---|
| `action` column (char) | 1 byte |
| `old_data` / `new_data` (JSONB) | 1–10 KB per row (depends on source columns) |
| `lsn` (pg_lsn) | 8 bytes |
| `txid` (xid8) | 8 bytes |
| Index (on `lsn`) | ~40 bytes per row |
Rule of thumb: Buffer tables consume roughly 2–3× the raw row size of the source change, because both OLD and NEW values are stored as JSONB.
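The rule of thumb can be turned into a back-of-the-envelope estimator. The per-row constants below come from the table above; the 2 KB average JSONB row image is an assumption you should tune to your schema:

```python
def estimate_buffer_bytes(n_changes: int, avg_jsonb_bytes: int = 2048) -> int:
    """Rough size of a change buffer table. Per buffered change:
    1 byte action + OLD and NEW row images as JSONB + 8-byte lsn +
    8-byte txid + ~40 bytes of index entry. avg_jsonb_bytes is the
    typical size of ONE serialized row image (assumed, schema-dependent)."""
    per_row = 1 + 2 * avg_jsonb_bytes + 8 + 8 + 40
    return n_changes * per_row

# 100k buffered changes with ~2 KB row images is roughly 415 MB:
print(estimate_buffer_bytes(100_000) / 1e6)  # 415.3 (MB)
```

Because both OLD and NEW images are stored, the estimate grows with twice the row-image size, matching the 2–3x rule of thumb above.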
Buffer tables are cleaned up (truncated or deleted) after each successful refresh. If you suspect buffer bloat, check:
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relnamespace = (SELECT oid FROM pg_namespace WHERE nspname = 'pgtrickle_changes')
ORDER BY pg_total_relation_size(oid) DESC;
What determines whether DIFFERENTIAL or FULL is faster for a given workload?
The breakeven point depends on the change ratio — the number of changed rows relative to the total source table size:
| Change ratio | Recommended mode | Why |
|---|---|---|
| < 5% | DIFFERENTIAL | Delta query touches few rows; much cheaper than re-reading everything |
| 5–15% | DIFFERENTIAL (usually) | Still faster, but approaching the crossover |
| 15–50% | FULL | The delta query scans a large fraction of the source anyway; FULL avoids the overhead of delta computation |
| > 50% | FULL | Bulk load scenario — TRUNCATE + INSERT is simpler and faster |
Additional factors:
- Query complexity: Queries with many joins or window functions have more expensive delta computation. The crossover shifts lower.
- Source table size: For small tables (<10K rows), FULL is nearly always faster because the overhead is negligible.
- Index presence: DIFFERENTIAL uses indexes to look up changed rows. Missing indexes on join keys or GROUP BY columns can make delta queries slow.
The adaptive fallback (pg_trickle.differential_max_change_ratio, default 0.15) automates this decision per-cycle.
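The crossover logic in the table can be expressed as a small helper. This is a sketch using the thresholds quoted above (15% crossover, 10K-row small-table cutoff), not extension internals:

```python
def recommended_mode(changed_rows: int, total_rows: int,
                     crossover: float = 0.15) -> str:
    """Pick DIFFERENTIAL vs FULL from the change ratio, mirroring the
    guidance table: below the crossover, delta computation wins; above
    it, a full recompute is cheaper. Small tables always favor FULL
    because the fixed overhead of delta machinery dominates."""
    if total_rows < 10_000:
        return "FULL"
    ratio = changed_rows / total_rows
    return "DIFFERENTIAL" if ratio <= crossover else "FULL"

print(recommended_mode(500, 1_000_000))      # DIFFERENTIAL (0.05% changed)
print(recommended_mode(400_000, 1_000_000))  # FULL (40% changed)
```

In practice you rarely call anything like this yourself; the adaptive fallback makes the equivalent decision automatically each cycle.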
What are the planner hints and when should I disable them?
Before executing a delta query, pg_trickle sets several session-level planner parameters to guide PostgreSQL toward efficient delta plans:
SET LOCAL enable_seqscan = off; -- Prefer index scans for small deltas
SET LOCAL enable_nestloop = on; -- Nested loops are good for small delta × large table joins
SET LOCAL enable_mergejoin = off; -- Merge joins are worse for skewed delta sizes
These hints are active only during the refresh transaction and are reset afterward.
When to disable hints: If you notice that a particular stream table's refresh is slow (check avg_duration_ms in pg_stat_stream_tables), the planner hints may be suboptimal for that specific query. You can disable them by setting:
SET pg_trickle.planner_hints = off;
This allows PostgreSQL's planner to choose its own strategy. Test both settings and compare avg_duration_ms.
How do prepared statements help refresh performance?
The refresh engine uses PostgreSQL prepared statements (PREPARE / EXECUTE) for the delta and MERGE queries. On the first refresh, the statement is prepared; subsequent refreshes reuse the cached plan. Benefits:
- Reduced planning overhead. For complex delta queries with many joins and CTEs, planning can take 5–50 ms. Prepared statements skip this on subsequent refreshes.
- Stable plans. The planner uses generic plans after the 5th execution (PostgreSQL default), avoiding plan instability from statistic fluctuations.
Prepared statements are stored per-session and are invalidated when:
- The stream table is reinitialized (schema change)
- The shared cache generation advances after DDL or stream-table metadata changes
- The PostgreSQL connection is recycled
- The session ends
How does the adaptive FULL fallback threshold work in practice?
The pg_trickle.differential_max_change_ratio GUC (default: 0.15) is evaluated per source table, per refresh cycle:
- Before each differential refresh, the engine counts pending changes in the buffer table: `pending_changes = COUNT(*) FROM pgtrickle_changes.changes_<oid>`.
- It estimates the source table size from `pg_class.reltuples`.
- If `pending_changes / reltuples > differential_max_change_ratio`, the engine falls back to FULL for that cycle.
Edge cases:
- If the source table has `reltuples = 0` (freshly created, no ANALYZE yet), the engine always uses FULL until statistics are available.
- For multi-source stream tables (joins), each source is evaluated independently. If any source exceeds the threshold, the entire refresh falls back to FULL.
- The threshold applies to the current cycle only — the next cycle re-evaluates.
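Putting the steps and edge cases together, the per-cycle decision can be modeled roughly like this (a Python sketch of the documented behavior, not the C implementation):

```python
def falls_back_to_full(sources: dict[str, tuple[int, float]],
                       max_change_ratio: float = 0.15) -> bool:
    """sources maps each source table name to (pending_changes, reltuples).
    Returns True if this cycle should use FULL instead of DIFFERENTIAL,
    mirroring the documented rules: reltuples == 0 (no statistics yet)
    forces FULL, and any single source over the threshold forces FULL
    for the whole refresh."""
    for pending, reltuples in sources.values():
        if reltuples == 0:
            return True
        if pending / reltuples > max_change_ratio:
            return True
    return False

# One source over the 15% threshold forces FULL for the whole refresh:
print(falls_back_to_full({"orders": (200, 1000), "customers": (1, 1000)}))  # True
print(falls_back_to_full({"orders": (50, 1000)}))                           # False
```

Note the decision is memoryless: a FULL fallback this cycle says nothing about the next cycle, which re-evaluates from fresh counts.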
How many stream tables can a single PostgreSQL instance handle?
There is no hard limit. Practical limits depend on:
| Factor | Guideline |
|---|---|
| Scheduler overhead | Each cycle iterates all STs; at 1000 STs with 1ms overhead per check, the cycle takes ~1s |
| Background connections | 1 per database (the scheduler) + 1 per manual refresh call |
| Change buffer bloat | Each source table gets its own buffer table — many sources = many tables in pgtrickle_changes |
| Catalog size | pgt_stream_tables and pgt_dependencies grow linearly |
| Refresh throughput | Sequential processing means total cycle time = sum of individual refresh times |
Tested benchmarks: Up to 500 stream tables on a single instance with <2s total cycle time for DIFFERENTIAL refreshes averaging 3ms each.
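For sequential mode, the capacity guidance reduces to simple arithmetic: every stream table pays the per-check overhead each cycle, plus the cost of the refreshes actually performed. A sketch, assuming the 1 ms per-check figure quoted in the table:

```python
def sequential_cycle_seconds(refresh_ms: list[float],
                             per_check_overhead_ms: float = 1.0) -> float:
    """Total scheduler cycle time with sequential processing: each
    stream table costs the per-check overhead, plus the sum of the
    individual refresh durations run this cycle."""
    return (len(refresh_ms) * per_check_overhead_ms + sum(refresh_ms)) / 1000.0

# 500 stream tables averaging 3 ms each, as in the benchmark above:
print(sequential_cycle_seconds([3.0] * 500))  # 2.0 (seconds)
```

This is why the benchmark lands at about 2 s: 500 ms of check overhead plus 1.5 s of refresh work. Parallel mode shortens the second term by overlapping same-level refreshes.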
What is the TRUNCATE vs DELETE cleanup trade-off for change buffers?
After each successful refresh, the engine cleans up processed change records from the buffer table. The pg_trickle.cleanup_use_truncate GUC (default: true) controls the method:
| Method | Pros | Cons |
|---|---|---|
| `TRUNCATE` (default) | Instant — O(1) regardless of row count. Reclaims disk space immediately. | Takes an ACCESS EXCLUSIVE lock on the buffer table, briefly blocking concurrent INSERTs from CDC triggers (~0.1 ms typical). |
| `DELETE` | Row-level locks only — no blocking of concurrent CDC writes. | O(N) — proportional to the number of processed rows. Dead tuples require VACUUM to reclaim space. |
When to switch to DELETE: If your source table has extremely high write throughput (>10K writes/sec) and you observe brief stalls in DML latency during refresh cleanup, switch to DELETE:
ALTER SYSTEM SET pg_trickle.cleanup_use_truncate = false;
SELECT pg_reload_conf();
For most workloads, TRUNCATE is the better choice because buffer tables are typically emptied completely after each refresh.
Interoperability
Stream tables are standard PostgreSQL heap tables, which means they work with most PostgreSQL features. This section clarifies what’s compatible (views, replication, triggers) and what’s not (direct DML, foreign keys).
Can PostgreSQL views reference stream tables?
Yes. Since stream tables are standard PostgreSQL heap tables, you can create views on top of them just like any other table. The view will return whatever data is currently in the stream table, reflecting the most recent refresh:
CREATE VIEW high_value_customers AS
SELECT customer_id, total FROM pgtrickle.order_totals WHERE total > 1000;
This is a common pattern for adding per-user filters or formatting on top of a shared stream table.
Can materialized views reference stream tables?
Yes, though this is usually redundant — both materialized views and stream tables are physical snapshots of query results. The key difference is that the materialized view requires its own manual REFRESH MATERIALIZED VIEW call; it does not auto-refresh when the underlying stream table refreshes.
A more idiomatic approach is to create a second stream table that references the first one. This way, pg_trickle handles the dependency ordering and refresh scheduling for both automatically.
Can I replicate stream tables with logical replication?
Yes. Stream tables can be published like any ordinary table:
CREATE PUBLICATION my_pub FOR TABLE pgtrickle.order_totals;
Important caveats:
- The `__pgt_row_id` column is replicated (it is the primary key)
- Subscribers receive materialized data, not the defining query
- Do not install pg_trickle on the subscriber and attempt to refresh the replicated table — it will have no CDC triggers or catalog entries
- Internal change buffer tables are not published by default
Can I INSERT, UPDATE, or DELETE rows in a stream table directly?
No. Stream table contents are managed exclusively by the refresh engine, and direct DML will corrupt the internal state (row IDs, frontier tracking, and change buffer consistency). See Why can't I INSERT, UPDATE, or DELETE rows in a stream table? for a detailed explanation of what goes wrong.
If you need to post-process stream table data, create a view or a second stream table that references the first one.
Can I add foreign keys to or from stream tables?
No. Foreign key constraints are incompatible with how the refresh engine operates. The engine uses bulk MERGE operations that apply inserts and deletes atomically, without guaranteeing the row-by-row ordering that foreign key checks require. Full refreshes also use TRUNCATE + INSERT, which bypasses cascade logic entirely.
See Why can't I add foreign keys? for details. If you need referential integrity, enforce it in your application or in a view that joins the stream tables.
Can I add my own triggers to stream tables?
Yes, for DIFFERENTIAL mode stream tables. When user-defined row-level triggers are detected, the refresh engine automatically switches from MERGE to explicit DELETE + UPDATE + INSERT statements. This ensures triggers fire with the correct TG_OP, OLD, and NEW values. Legacy configs that still set pg_trickle.user_triggers = 'on' are treated the same as auto.
Limitations:
- Row-level triggers do not fire during FULL refresh (they are automatically suppressed via `DISABLE TRIGGER USER`). Use `REFRESH MODE DIFFERENTIAL` for stream tables with triggers.
- The `IS DISTINCT FROM` guard prevents no-op `UPDATE` triggers when the aggregate result is unchanged.
- `BEFORE` triggers that modify `NEW` will affect the stored value — the next refresh may "correct" it back, causing oscillation.
See the pg_trickle.user_triggers GUC in CONFIGURATION.md for control options.
Can I ALTER TABLE a stream table directly?
No. Direct ALTER TABLE would change the physical table without updating pg_trickle's catalog, causing column mismatches and __pgt_row_id invalidation on the next refresh. See Why can't I ALTER TABLE a stream table directly? for details.
Instead, use the pg_trickle API:
-- Change schedule, mode, or status:
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '10m');
-- To change the defining query or column structure, drop and recreate:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => '...',
schedule => '5m',
refresh_mode => 'DIFFERENTIAL'
);
Does pg_trickle work with PgBouncer or other connection poolers?
It depends on the pooling mode. pg_trickle's background scheduler uses session-level features that are incompatible with transaction-mode connection pooling:
| Feature | Issue with Transaction-Mode Pooling |
|---|---|
| `pg_advisory_lock()` | Session-level lock released when the connection returns to the pool — concurrent refreshes possible |
| `PREPARE` / `EXECUTE` | Prepared statements are session-scoped — "does not exist" errors on different connections |
| `LISTEN` / `NOTIFY` | Notifications lost when listeners change connections |
Recommended configurations:
- Session-mode pooling (`pool_mode = session`): Fully compatible. The scheduler holds a dedicated connection.
- Direct connection (no pooler for the scheduler): Fully compatible. Application queries can still go through a pooler.
- Transaction-mode pooling (`pool_mode = transaction`): Not supported. The scheduler requires a persistent session.
Tip: If your infrastructure requires transaction-mode pooling (e.g., AWS RDS Proxy, Supabase), route the pg_trickle background worker through a direct connection while keeping application traffic on the pooler. Most connection poolers support per-database or per-user routing rules.
Does pg_trickle work with pgvector?
Partially — it depends on the refresh mode and what the defining query does.
What works:
- Source tables with `vector` columns. CDC triggers are generated using PostgreSQL's `format_type()`, which returns the full type name (e.g. `vector(1536)`). Change buffer tables mirror the source schema correctly, so inserts, updates, and deletes on pgvector tables are captured and replayed without issue.
- Passing vector columns through in DIFFERENTIAL mode. Stream tables that select, filter (on non-vector columns), or join sources that happen to contain `vector` columns work correctly — the vector data is treated as an opaque value and copied through unchanged.
- FULL mode with any pgvector expression. Because FULL mode re-executes the entire defining query, all pgvector operators (`<->`, `<=>`, `<#>`) and functions (`cosine_distance`, `l2_normalize`, etc.) work exactly as they do in a regular query.
What does not work:
- DIFFERENTIAL mode with pgvector distance operators in the query. The DVM engine needs a differentiation rule for every SQL operator it encounters. Custom operators like `<->` (L2 distance) or `<=>` (cosine distance) are not in the built-in rule set. The engine will fall back automatically to FULL mode if such operators appear in the delta query path. Set `refresh_mode => 'FULL'` explicitly to make this intent clear.
- Incremental aggregation over vector columns. There is no meaningful incremental form for aggregates over `vector` values (e.g. averaging embeddings). Use FULL mode for any aggregate that involves vector arithmetic.
Recommended pattern for a nearest-neighbour cache or semantic search result set:
CREATE EXTENSION IF NOT EXISTS vector;
SELECT pgtrickle.create_stream_table(
name => 'top_similar_docs',
query => $$
SELECT d.id, d.title, d.embedding,
d.embedding <=> '[0.1, 0.2, 0.3]'::vector AS distance
FROM documents d
ORDER BY distance
LIMIT 100
$$,
schedule => '5m',
refresh_mode => 'FULL'
);
For use-cases that only carry vector columns through without computing on them, DIFFERENTIAL mode works fine:
-- Vectors are not used in the delta computation — DIFFERENTIAL is safe here
SELECT pgtrickle.create_stream_table(
name => 'active_doc_embeddings',
query => $$
SELECT id, embedding
FROM documents
WHERE status = 'published'
$$,
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
dbt Integration
The dbt-pgtrickle package provides a stream_table materialization that
lets you manage stream tables through dbt’s standard workflow. This section
covers setup, commands, freshness checks, and query change handling.
How do I use pg_trickle with dbt?
Install the dbt-pgtrickle package (a pure Jinja SQL macro package — no Python dependencies):
# packages.yml
packages:
- package: pg_trickle/dbt_pgtrickle
version: ">=0.2.0"
Then define a stream table model using the stream_table materialization:
-- models/order_totals.sql
{{ config(
materialized='stream_table',
schedule='1m',
refresh_mode='DIFFERENTIAL'
) }}
SELECT customer_id, SUM(amount) AS total
FROM {{ source('public', 'orders') }}
GROUP BY customer_id
The stream_table materialization calls pgtrickle.create_stream_table() on the first run and pgtrickle.alter_stream_table() on subsequent runs (if the schedule or mode changes).
What dbt commands work with stream tables?
| Command | Behavior |
|---|---|
| `dbt run` | Creates stream tables that don't exist; updates schedule/mode if changed; does not alter the defining query of existing STs |
| `dbt run --full-refresh` | Drops and recreates all stream tables from scratch (new defining query, fresh data) |
| `dbt test` | Works normally — tests query the stream table as a regular table |
| `dbt source freshness` | Works if you configure a freshness block on the stream table source |
| `dbt docs generate` | Documents stream tables like any other model |
How does dbt run --full-refresh work with stream tables?
When --full-refresh is passed, the stream_table materialization:
- Calls `pgtrickle.drop_stream_table('model_name')` to remove the existing stream table, CDC triggers, and change buffers.
- Calls `pgtrickle.create_stream_table(...)` with the current defining query from the model file.
- The new stream table starts in INITIALIZING status and performs its first full refresh.
This is the correct way to update a stream table's defining query in dbt. Without --full-refresh, dbt will not detect query changes (it only compares schedule and mode).
How do I check stream table freshness in dbt?
Use dbt's built-in source freshness feature by adding a freshness block to your source definition:
# models/sources.yml
sources:
- name: pgtrickle
schema: pgtrickle
tables:
- name: order_totals
loaded_at_field: "last_refreshed_at" # from stream_tables_info
freshness:
warn_after: {count: 5, period: minute}
error_after: {count: 15, period: minute}
Then run dbt source freshness to check.
Alternatively, query the pg_trickle monitoring views directly in a dbt test:
-- tests/check_freshness.sql
SELECT pgt_name FROM pgtrickle.stream_tables_info WHERE stale = true
What happens when the defining query changes in dbt?
If you modify the SQL in a stream table model file and run dbt run without --full-refresh:
- The `stream_table` materialization detects that the stream table already exists.
- It compares the schedule and refresh mode — if either changed, it calls `alter_stream_table()` to update them.
- It does not compare the defining query text. The existing defining query remains in effect.
To apply a new defining query, you must run dbt run --full-refresh. This drops and recreates the stream table with the new query.
Recommendation: After changing a model's SQL, always run dbt run --full-refresh -s model_name to apply the change.
Can I use dbt snapshot with stream tables?
Yes, with caveats. dbt snapshots work by tracking changes to a source table over time using updated_at or check strategies. You can snapshot a stream table like any other table.
However, keep in mind:
- Stream tables are refreshed periodically, not on every write. The snapshot will only capture changes at refresh boundaries, not at the granularity of individual source-table writes.
- The `__pgt_row_id` column will appear in the snapshot. You may want to exclude it with `check_cols` or a `select` in the snapshot configuration.
- FULL refresh mode replaces all rows each cycle, which will appear as "updates" to the snapshot strategy even if the data hasn't changed. Use DIFFERENTIAL mode for stream tables that are snapshotted.
What dbt versions are supported?
dbt-pgtrickle is a pure Jinja SQL macro package that works with:
- dbt-core 1.7+ (the `stream_table` materialization uses standard Jinja patterns)
- the dbt-postgres adapter (required for the PostgreSQL connection)
There are no Python dependencies beyond dbt-core and dbt-postgres. The package is tested against dbt 1.7.x and 1.8.x in CI.
Row-Level Security (RLS)
Does RLS on source tables affect stream table content?
No. Stream tables always materialize the full, unfiltered result set,
regardless of any RLS policies on source tables. This matches the behavior of
PostgreSQL's built-in REFRESH MATERIALIZED VIEW.
The scheduled refresh runs as a superuser background worker. Manual calls to
refresh_stream_table() and IMMEDIATE-mode IVM triggers also bypass RLS
internally (SET LOCAL row_security = off / SECURITY DEFINER trigger
functions), ensuring the stream table content is always complete and
deterministic.
Can I use RLS on a stream table to filter reads per role?
Yes. Stream tables are regular PostgreSQL tables, so ALTER TABLE … ENABLE ROW LEVEL SECURITY and CREATE POLICY work exactly as expected.
This is the recommended pattern for multi-tenant filtering:
ALTER TABLE pgtrickle.order_totals ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON pgtrickle.order_totals
USING (tenant_id = current_setting('app.tenant_id')::INT);
One stream table serves all tenants. Per-tenant filtering happens at query time with zero storage duplication.
What happens when I ENABLE or DISABLE RLS on a source table?
pg_trickle's DDL event trigger detects ALTER TABLE … ENABLE ROW LEVEL SECURITY, DISABLE ROW LEVEL SECURITY, FORCE ROW LEVEL SECURITY, and
NO FORCE ROW LEVEL SECURITY on source tables and marks all dependent
stream tables for reinitialisation. The same applies to CREATE POLICY,
ALTER POLICY, and DROP POLICY.
Why are IVM trigger functions SECURITY DEFINER?
In IMMEDIATE mode, the IVM trigger fires in the DML-issuing user's context.
If that user has restricted RLS visibility, the delta query could see only a
subset of the base table rows, producing a corrupt stream table. Making the
trigger function SECURITY DEFINER (owned by the extension installer, typically
a superuser) ensures the delta query always has full visibility. The DML itself
is still subject to the user's own RLS policies — only the stream table
maintenance runs with elevated privileges.
The trigger functions also set search_path = pg_catalog, pgtrickle, pgtrickle_changes, public to prevent search_path hijacking — a security best
practice for all SECURITY DEFINER functions. The public schema is included
because the delta SQL references user tables that typically reside there.
Deployment & Operations
This section covers the operational aspects of running pg_trickle in production: background workers, upgrades, restarts, replicas, Kubernetes, partitioned tables, and multi-database deployments.
How many background workers does pg_trickle use?
pg_trickle uses a two-tier background worker model:
- Launcher (`pg_trickle launcher`) — one per cluster, static. Scans `pg_database` every ~10 seconds and spawns a per-database scheduler for every database where pg_trickle is installed. Automatically re-spawns schedulers that exit.
- Per-database scheduler (`pg_trickle scheduler`) — one dynamic worker per database with pg_trickle installed.
| Component | Workers | Notes |
|---|---|---|
| Launcher | 1 (static) | Cluster-wide; connects to postgres database |
| Scheduler | 1 per database (dynamic) | Persistent per database; drives all refreshes |
| Parallel refresh workers | 0–N per database | Only when pg_trickle.parallel_refresh_mode = 'on' |
| WAL decoder | 0 (shared) | Shares the scheduler's SPI connection |
| Manual refresh | 0 | Runs in the caller's session |
How do I size max_worker_processes?
When max_worker_processes is too low, the launcher silently fails to spawn schedulers for some databases and retries every 5 minutes. Those databases stop refreshing with no error in the stream table itself — you only see it in the PostgreSQL log:
WARNING: pg_trickle launcher: could not spawn scheduler for database 'mydb'
The minimum formula:
max_worker_processes ≥
1 (pg_trickle launcher)
+ N (one scheduler per database with pg_trickle installed)
+ max_dynamic_refresh_workers (only if parallel_refresh_mode = 'on'; default 4)
+ autovacuum_max_workers (default 3)
+ parallel query workers (max_parallel_workers_per_gather × concurrent queries)
+ slots for other extensions (logical replication launcher, etc.)
A practical starting point for a cluster with a handful of databases:
max_worker_processes = 32
This value requires a full PostgreSQL restart (not just reload).
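The sizing formula above can be expressed as a small calculation. A minimal Python sketch (the function name and default slot counts are illustrative; substitute your cluster's actual settings):

```python
def min_worker_processes(
    pgt_databases,                    # databases with pg_trickle installed
    parallel_refresh=False,
    max_dynamic_refresh_workers=4,    # default when parallel_refresh_mode = 'on'
    autovacuum_max_workers=3,         # PostgreSQL default
    parallel_query_slots=8,           # max_parallel_workers_per_gather x concurrent queries
    other_extension_slots=1,          # e.g. logical replication launcher
):
    """Lower bound on max_worker_processes per the formula above."""
    total = 1                         # pg_trickle launcher (one per cluster)
    total += pgt_databases            # one scheduler per database
    if parallel_refresh:
        total += max_dynamic_refresh_workers
    total += autovacuum_max_workers
    total += parallel_query_slots
    total += other_extension_slots
    return total

# Example: 3 databases with pg_trickle, parallel refresh enabled
print(min_worker_processes(3, parallel_refresh=True))  # 20
```

Round up generously: the consequence of undersizing is silent (schedulers fail to spawn), while unused slots cost almost nothing.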
How do I upgrade pg_trickle to a new version?
1. Install the new shared library (replace the `.so`/`.dylib` file in PostgreSQL's `lib` directory).
2. Run the upgrade SQL: `ALTER EXTENSION pg_trickle UPDATE;` This applies migration scripts (e.g., `pg_trickle--0.2.1--0.2.2.sql`) that update catalog tables, add new functions, and migrate data as needed.
3. Restart PostgreSQL if the shared library changed (required for `shared_preload_libraries` changes).
4. Verify: `SELECT pgtrickle.version();`
Zero-downtime upgrades are possible for minor versions (patch releases) that don't change the shared library. Just run ALTER EXTENSION pg_trickle UPDATE — no restart needed.
For detailed instructions, version-specific notes, rollback procedures, and troubleshooting, see the full Upgrading Guide.
How do I know if my shared library and SQL extension versions match?
The background worker checks for version mismatches at startup and logs a
WARNING if the compiled .so version differs from the installed SQL extension
version. You can also check manually:
-- Compiled .so version:
SELECT pgtrickle.version();
-- Installed SQL extension version:
SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle';
If these differ, run ALTER EXTENSION pg_trickle UPDATE; and restart
PostgreSQL if prompted.
Are stream tables preserved during an upgrade?
Yes. ALTER EXTENSION pg_trickle UPDATE applies only additive schema
migrations (new columns, updated function signatures). Existing stream tables,
their data, refresh history, and CDC infrastructure are preserved. The
scheduler resumes normal operation after the upgrade completes.
For version-specific migration notes, see the Upgrading Guide — Version-Specific Notes.
What happens to stream tables during a PostgreSQL restart?
During a restart:
- The scheduler stops. No refreshes occur while PostgreSQL is down.
- CDC triggers are inactive. Source table writes during the restart window are captured when PostgreSQL comes back up (triggers are persistent DDL objects).
- On startup, the scheduler background worker starts, reads the catalog, rebuilds the DAG, and resumes refresh cycles from where it left off.
- Frontier reconciliation. The scheduler detects any gap between the stored frontier LSN and the current WAL position. Source changes that occurred between the last successful refresh and the restart are in the change buffers (for trigger-mode CDC) and will be processed in the first refresh cycle.
Net effect: Stream tables may be stale for the duration of the downtime, but no data is lost. The first refresh cycle after restart catches up automatically.
Can I use pg_trickle on a read replica / standby?
The scheduler does not run on standby servers. When pg_trickle detects it is running in recovery mode (pg_is_in_recovery() = true), the background worker enters a sleep loop and does not attempt any refreshes.
However, stream tables replicated from the primary are readable on the standby — they are regular heap tables and are replicated via physical (streaming) replication like any other table.
Pattern for read-heavy workloads:
- Run pg_trickle on the primary — it performs all refreshes.
- Query stream tables on the standby — read replicas get the latest refreshed data via streaming replication, with replication lag as the only additional delay.
How does pg_trickle work with CloudNativePG / Kubernetes?
pg_trickle is compatible with CloudNativePG. The cnpg/ directory in the repository contains example manifests:
- Dockerfile.ext — builds a PostgreSQL image with pg_trickle pre-installed
- cluster-example.yaml — CloudNativePG Cluster manifest with
shared_preload_libraries = 'pg_trickle'
Key considerations:
- Include `pg_trickle` in `shared_preload_libraries` in the Cluster's `postgresql` configuration.
- The scheduler runs on the primary pod only. Replica pods detect recovery mode and sleep.
- Pod restarts are handled the same way as PostgreSQL restarts (see above).
- Persistent volume claims preserve catalog and change buffers across pod rescheduling.
Does pg_trickle work with partitioned source tables?
Yes. pg_trickle installs CDC triggers on the partitioned parent table, which PostgreSQL automatically propagates to all existing and future partitions. When a row is inserted into any partition, the trigger fires and writes the change to the buffer table.
Caveats:
- `TRUNCATE` on individual partitions fires the partition-level trigger, which is also captured.
- Attaching or detaching partitions (`ALTER TABLE ... ATTACH/DETACH PARTITION`) fires DDL event triggers, which may mark the stream table for reinitialization.
- Row movement between partitions (when the partition key is updated) is captured as a DELETE from the old partition and an INSERT into the new partition.
Can I run pg_trickle in multiple databases on the same cluster?
Yes. Each database gets its own independent scheduler background worker, its own catalog tables, and its own change buffers. Stream tables in different databases do not interact.
Resource planning: Each database with stream tables requires 1 background worker slot in max_worker_processes. If you have many databases, the default of 8 is easily exhausted.
Important: When `max_worker_processes` is exhausted, the launcher silently skips databases it cannot spawn a scheduler for and retries every 5 minutes. This means stream tables in those databases stop refreshing with no visible error — they just go stale. Check the PostgreSQL log for:

WARNING: pg_trickle launcher: could not spawn scheduler for database 'mydb'

If you see this, increase `max_worker_processes` and restart PostgreSQL.
See How do I size max_worker_processes? for the full formula.
-- On each database where you want pg_trickle:
CREATE EXTENSION pg_trickle;
The extension must be created separately in each database — shared_preload_libraries loads the shared library cluster-wide, but the SQL objects (catalog tables, functions) are per-database.
Monitoring & Alerting
pg_trickle provides built-in monitoring views and NOTIFY-based alerting. This section explains the available views, alert events, and failure handling.
How do I list all stream tables in my database?
Several options depending on how much detail you need:
-- Quickest: name + status + mode + staleness
SELECT name, status, refresh_mode, is_populated, staleness
FROM pgtrickle.stream_tables_info;
-- Full stats: refresh counts, rows inserted/deleted, avg duration, error streaks
SELECT * FROM pgtrickle.pg_stat_stream_tables;
-- Live status including consecutive_errors and data_timestamp
SELECT * FROM pgtrickle.pgt_status();
-- Raw catalog (all persisted properties, no computed fields)
SELECT * FROM pgtrickle.pgt_stream_tables;
How do I inspect what pg_trickle is doing right now?
Quick status snapshot:
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status();
Deep dive into a specific stream table — shows the defining query, DVM operator tree, source tables, generated delta SQL, and current WAL frontier:
SELECT * FROM pgtrickle.explain_st('my_table');
Key properties returned:
| Property | Description |
|---|---|
| `dvm_supported` | Whether differential maintenance is possible for this query |
| `operator_tree` | How the DVM engine has decomposed the query |
| `delta_query` | The actual SQL executed during a differential refresh |
| `frontier` | Per-source LSN positions flushed at last refresh |
Recent refresh activity:
-- Last 10 refreshes for a stream table (action, status, rows, duration):
SELECT * FROM pgtrickle.get_refresh_history('my_table', 10);
-- Aggregate refresh stats for all stream tables:
SELECT * FROM pgtrickle.st_refresh_stats();
CDC and slot health:
-- Per-source CDC mode, WAL lag, and alerts:
SELECT * FROM pgtrickle.check_cdc_health();
-- Replication slot health (slot_name, active, lag_bytes):
SELECT * FROM pgtrickle.slot_health();
Real-time event stream:
LISTEN pg_trickle_alert;
-- Receives JSON payloads for: stale_data, auto_suspended, resumed,
-- reinitialize_needed, buffer_growth_warning, refresh_completed, refresh_failed
Pending change buffers (rows not yet consumed by a differential refresh):
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Are there convenience functions for inspecting source tables and CDC buffers?
Yes. pg_trickle provides two convenience functions that complement the existing monitoring suite:
pgtrickle.list_sources(name) — shows every source table a stream table depends on, the CDC mode each uses, and any column-level usage metadata:
SELECT * FROM pgtrickle.list_sources('order_totals');
-- Returns: source_table, source_oid, source_type, cdc_mode, columns_used
pgtrickle.change_buffer_sizes() — shows, for every tracked source table, how many CDC rows are pending (not yet consumed by a differential refresh) and the estimated on-disk size of the change buffer:
SELECT * FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Returns: stream_table, source_table, source_oid, cdc_mode, pending_rows, buffer_bytes
A large pending_rows value for a source table means a differential refresh is overdue or stalled — use pgtrickle.get_refresh_history() to investigate.
Can I see a tree view of all stream table dependencies?
Yes. pgtrickle.dependency_tree() walks the dependency DAG and renders it as an indented ASCII tree:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
Example output:
tree_line | status | refresh_mode
------------------------------------------+--------+--------------
report_summary | ACTIVE | DIFFERENTIAL
├── orders_by_region | ACTIVE | DIFFERENTIAL
│ ├── public.orders [src] | |
│ └── public.customers [src] | |
└── revenue_totals | ACTIVE | DIFFERENTIAL
└── public.orders [src] | |
Each row has node (qualified name), node_type (stream_table or source_table), depth, status, and refresh_mode. Source tables are shown as leaves tagged with [src].
What monitoring views are available?
| View | Description |
|---|---|
| `pgtrickle.stream_tables_info` | Status overview with computed staleness |
| `pgtrickle.pg_stat_stream_tables` | Comprehensive stats (refresh counts, avg duration, error streaks) |
How do I get alerted when something goes wrong?
pg_trickle sends PostgreSQL NOTIFY messages on the pg_trickle_alert channel with JSON payloads:
| Event | When |
|---|---|
| `stale_data` | Staleness exceeds 2× the schedule |
| `auto_suspended` | Stream table suspended after max consecutive errors |
| `reinitialize_needed` | Upstream DDL change detected |
| `buffer_growth_warning` | Change buffer growing unexpectedly |
| `refresh_completed` | Refresh completed successfully |
| `refresh_failed` | Refresh failed |
Listen with:
LISTEN pg_trickle_alert;
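A listener typically routes these events by severity. A minimal Python sketch of such a dispatcher; note that the JSON field name `event` and the severity buckets are assumptions for illustration, so check the payloads your installation actually emits:

```python
import json

# Alert event names documented for the pg_trickle_alert channel
KNOWN_EVENTS = {
    "stale_data", "auto_suspended", "resumed", "reinitialize_needed",
    "buffer_growth_warning", "refresh_completed", "refresh_failed",
}

def classify_alert(payload):
    """Map a NOTIFY JSON payload to a severity bucket.

    Assumes the payload carries an 'event' field (hypothetical name).
    """
    event = json.loads(payload).get("event", "")
    if event not in KNOWN_EVENTS:
        return "unknown"
    if event in {"auto_suspended", "refresh_failed"}:
        return "page"   # needs human attention
    if event in {"stale_data", "buffer_growth_warning", "reinitialize_needed"}:
        return "warn"
    return "info"       # refresh_completed, resumed

print(classify_alert('{"event": "auto_suspended"}'))  # page
```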
What happens when a stream table keeps failing?
After pg_trickle.max_consecutive_errors (default: 3) consecutive failures, the stream table moves to ERROR status and automatic refreshes stop. An auto_suspended NOTIFY alert is sent.
To recover:
-- Fix the underlying issue (e.g., restore a dropped source table), then:
SELECT pgtrickle.alter_stream_table('my_table', status => 'ACTIVE');
Retries use exponential backoff (base 1s, max 60s, ±25% jitter, up to 5 retries before counting as a real failure).
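The backoff schedule above can be sketched as follows. This is an illustrative Python model of the documented policy (base 1 s, cap 60 s, ±25% jitter, up to 5 retries), not the extension's internal code:

```python
import random

def retry_delays(max_retries=5, base_s=1.0, cap_s=60.0, jitter=0.25, rng=None):
    """Per-attempt retry delays: exponential base, capped, with +/-25% jitter."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        base = min(cap_s, base_s * (2 ** attempt))  # 1, 2, 4, 8, 16 ... capped at 60
        delays.append(base * rng.uniform(1 - jitter, 1 + jitter))
    return delays

# Deterministic example run (seeded jitter); real delays vary per attempt
for i, d in enumerate(retry_delays(rng=random.Random(0)), start=1):
    print(f"retry {i}: {d:.2f}s")
```

Jitter spreads retries out so that many stream tables failing at once (e.g., after a dropped source table) do not all retry in lockstep.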
Configuration Reference
All pg_trickle settings are configured via PostgreSQL GUC parameters. The table below lists every available parameter with its type, default, and description.
| GUC | Type | Default | Description |
|---|---|---|---|
| `pg_trickle.enabled` | bool | true | Enable/disable the scheduler. Manual refreshes still work when false. |
| `pg_trickle.scheduler_interval_ms` | int | 1000 | Scheduler wake interval in milliseconds (100–60000) |
| `pg_trickle.min_schedule_seconds` | int | 60 | Minimum allowed schedule duration (1–86400) |
| `pg_trickle.max_consecutive_errors` | int | 3 | Failures before auto-suspending (1–100) |
| `pg_trickle.change_buffer_schema` | text | pgtrickle_changes | Schema for CDC buffer tables |
| `pg_trickle.max_concurrent_refreshes` | int | 4 | Max parallel refresh workers (1–32) |
| `pg_trickle.user_triggers` | text | auto | User trigger handling: auto (detect), off (suppress), on (deprecated alias for auto) |
| `pg_trickle.differential_max_change_ratio` | float | 0.15 | Change ratio threshold for adaptive FULL fallback (0.0–1.0) |
| `pg_trickle.cleanup_use_truncate` | bool | true | Use TRUNCATE instead of DELETE for buffer cleanup |
All GUCs are SUSET context (superuser SET) and take effect without restart, except shared_preload_libraries which requires a PostgreSQL restart.
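The adaptive FULL fallback governed by `pg_trickle.differential_max_change_ratio` amounts to a simple decision rule. A minimal Python sketch of that rule; the function name is hypothetical and the extension's internal heuristics may weigh more factors:

```python
def choose_refresh_action(changed_rows, table_rows, max_change_ratio=0.15):
    """Pick DIFFERENTIAL vs FULL for one refresh cycle.

    Rationale: when a large fraction of the table changed, recomputing
    from scratch is cheaper than merging a huge delta. 0.15 is the
    documented default threshold.
    """
    if table_rows == 0:
        return "FULL"  # empty or unknown size: just recompute
    ratio = changed_rows / table_rows
    return "FULL" if ratio > max_change_ratio else "DIFFERENTIAL"

print(choose_refresh_action(1_000, 1_000_000))    # DIFFERENTIAL (ratio 0.001)
print(choose_refresh_action(200_000, 1_000_000))  # FULL (ratio 0.2 > 0.15)
```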
Troubleshooting
This section covers common problems and how to debug them. If your issue isn’t listed here, check the refresh history for error messages and the monitoring views for status information.
How do I diagnose stalled data flow through stream tables?
See also: Error Reference — comprehensive guide to all pg_trickle error variants with causes and fixes.
If data seems to have stopped flowing -- stream tables show stale results despite DML on the source tables -- follow this systematic diagnostic workflow. Each step narrows the problem from broad health checks down to specific root causes.
Step 0 -- Verify GUC configuration:
Misconfigured GUCs are a common and easy-to-miss cause of stalled or severely throttled data flow. Check all pg_trickle settings in one query:
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'pg_trickle.%'
OR name = 'max_worker_processes'
ORDER BY name;
Key values to check:
| GUC | Safe value | Problem if set to |
|---|---|---|
| `pg_trickle.enabled` | on | off -- stops all automatic refreshes |
| `pg_trickle.tiered_scheduling` | on (fine) | on with all STs at tier = 'frozen' -- silently skips them |
| `pg_trickle.max_consecutive_errors` | 3-10 | 1 -- one transient error suspends the ST permanently |
| `pg_trickle.scheduler_interval_ms` | 100-1000 | Very high (e.g. 60000) -- scheduler only wakes every 60 s |
| `pg_trickle.event_driven_wake` | on | off -- falls back to poll-only; latency equals scheduler_interval_ms |
| `pg_trickle.wake_debounce_ms` | 1-50 | Very high (e.g. 5000) -- coalesces notifications for 5 s before acting |
| `pg_trickle.auto_backoff` | on | Fine normally, but if refreshes take >95% of schedule it silently stretches intervals up to 8x |
| `pg_trickle.default_schedule_seconds` | 1-60 | Very high -- isolated CALCULATED tables refresh very infrequently |
| `max_worker_processes` | >= 16 (typical) | Too low -- workers cannot be spawned; parallel mode silently stalls |
Also check whether any stream tables are frozen:
SELECT pgt_name, refresh_tier
FROM pgtrickle.pgt_stream_tables
WHERE refresh_tier = 'frozen';
Step 1 -- Quick health overview:
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
This single call checks scheduler status, error tables, stale tables, buffer
growth, replication slot lag, and the worker pool. Any WARN or ERROR row
tells you where to look next.
Step 2 -- Check stream table status and staleness:
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
ORDER BY staleness DESC NULLS FIRST;
Look for SUSPENDED status (auto-suspended after repeated errors), high
consecutive_errors, or unexpectedly large staleness.
Step 3 -- Check recent refresh activity:
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20);
If no recent rows appear, the scheduler may not be running. If rows show
ERROR, the error messages explain why refreshes are failing.
Step 4 -- Inspect errors for a specific stream table:
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Returns the last 5 FAILED refresh events with error classification and
suggested remediation steps.
Step 5 -- Check the CDC pipeline (are changes being captured?):
SELECT stream_table, source_table, cdc_mode, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
- `pending_rows = 0` everywhere: either no DML is happening on the source tables, or CDC triggers are missing.
- `pending_rows` growing but stream tables are not refreshing: scheduler or refresh problem (go back to Steps 1-3).
Step 6 -- Verify CDC triggers exist and are enabled:
SELECT source_table, trigger_type, trigger_name
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
Any rows returned here mean change capture is broken for that source table -- DML changes are not being recorded.
Step 7 -- Check CDC slot health (WAL mode only):
SELECT * FROM pgtrickle.check_cdc_health();
Look for alert values like slot_lag_exceeds_threshold or
replication_slot_missing.
Step 8 -- Verify the dependency DAG:
SELECT tree_line, status, refresh_mode
FROM pgtrickle.dependency_tree();
Confirms the dependency graph is wired as expected. A missing edge means upstream changes will not propagate to downstream stream tables.
Step 9 -- Check the parallel worker pool (if using parallel mode):
SELECT * FROM pgtrickle.worker_pool_status();
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status NOT IN ('SUCCEEDED');
Common root causes at a glance:
| Symptom | Diagnostic function | Likely root cause |
|---|---|---|
| No refreshes happening at all | health_check -> scheduler_running | Background worker not running or pg_trickle.enabled = off |
| Stream table in SUSPENDED status | pgt_status | Repeated errors hit max_consecutive_errors threshold |
| Zero pending changes despite DML | trigger_inventory | CDC trigger was dropped or disabled by DDL |
| WAL slot missing or lagging | check_cdc_health, slot_health | Replication slot dropped, or WAL retention exceeded |
| Buffers growing but no refreshes | change_buffer_sizes + refresh_timeline | Scheduler stalled, refresh failing, or lock contention |
| Upstream changes not propagating | dependency_tree | Upstream ST not connected in the DAG |
Unit tests crash with symbol not found in flat namespace on macOS 26+
macOS 26 (Tahoe) changed the dynamic linker (dyld) to eagerly resolve all
flat-namespace symbols at binary load time. pgrx extensions link PostgreSQL
server symbols (e.g. CacheMemoryContext, SPI_connect) with
-Wl,-undefined,dynamic_lookup, which previously resolved lazily. Since
cargo test --lib runs outside the postgres process, those symbols are
missing and the test binary aborts:
dyld[66617]: symbol not found in flat namespace '_CacheMemoryContext'
Use just test-unit — it automatically detects macOS 26+ and injects a
stub library (libpg_stub.dylib) via DYLD_INSERT_LIBRARIES. The stub
provides NULL/no-op definitions for the ~28 PostgreSQL symbols; they are never
called during unit tests (pure Rust logic only).
This does not affect integration tests, E2E tests, just lint,
just build, or the extension running inside PostgreSQL.
See the Installation Guide for details and manual usage.
My stream table is stuck in INITIALIZING status
The initial full refresh may have failed. Check:
SELECT * FROM pgtrickle.get_refresh_history('my_table', 5);
If the error is transient, retry with:
SELECT pgtrickle.refresh_stream_table('my_table');
My stream table shows stale data but the scheduler is running
Common causes:
- TRUNCATE on source table — bypasses CDC triggers. Manual refresh needed.
- Too many errors — check `consecutive_errors` in `pgtrickle.pg_stat_stream_tables`. Resume with `ALTER ... status => 'ACTIVE'`.
- Long-running refresh — check for lock contention or slow defining queries.
- Scheduler disabled — verify `SHOW pg_trickle.enabled;` returns `on`.
I get "cycle detected" when creating a stream table
Stream tables cannot have circular dependencies. If stream table A depends on stream table B and B depends on A (either directly or through a chain of intermediate stream tables), pg_trickle rejects the creation with a clear error message listing the cycle path.
To resolve this, restructure your queries to eliminate the circular reference. Common patterns:
- Extract the shared logic into a single base stream table that both A and B reference.
- Use a regular view instead of a stream table for one side of the dependency.
- Merge the two queries into a single stream table if possible.
A source table was altered and my stream table stopped refreshing
pg_trickle detects DDL changes (column additions, drops, type changes) via event triggers and marks affected stream tables with needs_reinit = true. The next scheduler cycle will attempt to reinitialize the stream table — drop the storage table, recreate it from the current defining query schema, and perform a full refresh.
If the schema change breaks the defining query (e.g., a column referenced in the query was dropped or renamed), the reinitialization will fail repeatedly until the stream table hits max_consecutive_errors and enters ERROR status.
To fix it: Update the defining query and recreate the stream table:
SELECT pgtrickle.drop_stream_table('order_totals');
SELECT pgtrickle.create_stream_table(
name => 'order_totals',
query => 'SELECT id, name FROM orders', -- updated query reflecting new schema
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Check the refresh history for the specific error message:
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 5);
How do I see the delta query generated for a stream table?
SELECT pgtrickle.explain_st('order_totals');
This shows the DVM operator tree, source tables, and the generated delta SQL.
How do I interpret the refresh history?
The pgtrickle.get_refresh_history() function returns the most recent refresh records for a stream table:
SELECT * FROM pgtrickle.get_refresh_history('order_totals', 10);
Key columns:
| Column | Meaning |
|---|---|
| `action` | Refresh type: FULL, DIFFERENTIAL, TOPK, IMMEDIATE, or REINITIALIZE |
| `rows_inserted` | Rows added to the stream table in this cycle |
| `rows_deleted` | Rows removed from the stream table in this cycle |
| `rows_updated` | Rows modified in the stream table (for explicit DML path) |
| `duration_ms` | Wall-clock time for the refresh |
| `error_message` | NULL for success; error text for failures |
| `source_changes` | Number of pending change records processed |
| `fallback_reason` | If DIFFERENTIAL fell back to FULL: change_ratio_exceeded, truncate_detected, or reinitialize |
Patterns to look for:
- High `rows_inserted` + `rows_deleted` with low `source_changes` → possible duplicate rows (keyless source tables)
- `fallback_reason = 'change_ratio_exceeded'` frequently → consider raising `pg_trickle.differential_max_change_ratio` or switching to FULL mode
- Increasing `duration_ms` over time → index maintenance or buffer bloat; consider VACUUM or checking for missing indexes
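These patterns can be checked mechanically against refresh-history rows. A minimal Python sketch: the field names mirror the table above, but the `flag_refresh_patterns` helper and its thresholds are illustrative, not part of pg_trickle:

```python
def flag_refresh_patterns(history):
    """Flag suspicious patterns in refresh-history rows (oldest first).

    Each row is a dict keyed by the column names described above.
    Thresholds (10x churn, 50% fallback rate, 2x duration growth) are
    arbitrary starting points; tune to your workload.
    """
    flags = []
    # Frequent differential -> FULL fallbacks
    fallbacks = [r for r in history
                 if r.get("fallback_reason") == "change_ratio_exceeded"]
    if len(history) >= 2 and len(fallbacks) * 2 >= len(history):
        flags.append("frequent_fallback")
    # Churn far above the number of processed source changes
    for r in history:
        churn = r.get("rows_inserted", 0) + r.get("rows_deleted", 0)
        if r.get("source_changes", 0) > 0 and churn > 10 * r["source_changes"]:
            flags.append("possible_duplicates")
            break
    # Refresh time creeping up over the window
    durs = [r["duration_ms"] for r in history if "duration_ms" in r]
    if len(durs) >= 3 and durs[-1] > 2 * durs[0] > 0:
        flags.append("duration_growth")
    return flags
```

Feeding it the output of `get_refresh_history()` (converted to dicts) gives a quick triage before digging into individual refresh records.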
How can I tell if the scheduler is running?
Several ways to verify:
1. Check the background worker:
SELECT pid, datname, backend_type, state, query
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
If no rows are returned, the scheduler is not running. Common causes:
- `pg_trickle.enabled = false`
- Extension not in `shared_preload_libraries`
- `max_worker_processes` exhausted — the launcher silently skips databases it cannot accommodate and retries every 5 minutes. Check the PostgreSQL log for `WARNING: pg_trickle launcher: could not spawn scheduler for database '...'`.
2. Check recent refresh activity:
SELECT MAX(refreshed_at) AS last_refresh
FROM pgtrickle.pgt_stream_tables
WHERE status = 'ACTIVE';
If the last refresh was long ago relative to the shortest schedule, the scheduler may be stuck.
3. Check PostgreSQL logs:
The scheduler logs startup and shutdown messages at LOG level:
LOG: pg_trickle scheduler started for database "mydb"
LOG: pg_trickle scheduler shutting down (SIGTERM)
How do I debug a stream table that shows stale data?
Follow this diagnostic checklist:
- Is the scheduler running? (See above)
- Is the stream table active? `SELECT pgt_name, status, consecutive_errors FROM pgtrickle.pg_stat_stream_tables WHERE pgt_name = 'my_st';` If status is `ERROR` or `SUSPENDED`, the stream table has been auto-suspended after repeated failures.
- Are there pending changes? `SELECT COUNT(*) FROM pgtrickle_changes.changes_<source_oid>;` If zero, the source table may not have CDC triggers (check `SELECT tgname FROM pg_trigger WHERE tgrelid = '<source_oid>'`).
- Is the refresh failing silently? `SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);` Check for error messages.
- Is there lock contention? Long-running transactions holding locks on the source or stream table can block refreshes. Check `pg_locks` and `pg_stat_activity`.
What does the needs_reinit flag mean and how do I clear it?
The needs_reinit flag in pgtrickle.pgt_stream_tables indicates that the stream table's physical storage needs to be rebuilt — typically because a source table's schema changed.
When needs_reinit = true:
- The scheduler skips normal differential/full refresh.
- Instead, it performs a reinitialize: drop the storage table, recreate it from the current defining query schema, and populate with a full refresh.
- If reinitialization succeeds, `needs_reinit` is cleared automatically.
If reinitialization keeps failing (e.g., the defining query references a dropped column):
-- Fix the underlying issue first, then clear manually:
UPDATE pgtrickle.pgt_stream_tables SET needs_reinit = false WHERE pgt_name = 'my_st';
-- Or drop and recreate:
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
name => 'my_st',
query => 'SELECT ...',
schedule => '1m',
refresh_mode => 'DIFFERENTIAL'
);
Why Are These SQL Features Not Supported?
This section gives detailed technical explanations for each SQL limitation. pg_trickle follows the principle of "fail loudly rather than produce wrong data" — every unsupported feature is detected at stream-table creation time and rejected with a clear error message and a suggested rewrite.
For all of these, returning an explicit error is a deliberate design choice: the alternative would be silently producing incorrect results after a refresh, which is far harder to diagnose.
How does NATURAL JOIN work?
NATURAL JOIN is fully supported. At parse time, pg_trickle resolves the common columns between the two tables (using OpTree::output_columns()) and synthesizes explicit equi-join conditions. This supports INNER, LEFT, RIGHT, and FULL NATURAL JOIN variants.
Internally, NATURAL JOIN is converted to an explicit JOIN ... ON before the DVM engine builds its operator tree, so delta computation works identically to a manually specified equi-join.
Note: The internal __pgt_row_id column is excluded from common column resolution, so NATURAL JOINs between stream tables work correctly.
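The common-column resolution described above can be sketched in a few lines. A minimal Python illustration (the `natural_join_on_clause` helper and the `l`/`r` aliases are hypothetical; the real rewrite operates on the parsed operator tree, not on strings):

```python
def natural_join_on_clause(left_cols, right_cols, left_alias="l", right_alias="r"):
    """Synthesize the explicit equi-join condition a NATURAL JOIN implies.

    Common columns become `l.col = r.col` conjuncts; the internal
    __pgt_row_id column is excluded so NATURAL JOINs between stream
    tables resolve only user-visible columns.
    """
    right_set = set(right_cols)
    common = [c for c in left_cols if c in right_set and c != "__pgt_row_id"]
    if not common:
        return "TRUE"  # no common columns: NATURAL JOIN degenerates to a cross join
    return " AND ".join(f"{left_alias}.{c} = {right_alias}.{c}" for c in common)

print(natural_join_on_clause(
    ["id", "dept", "__pgt_row_id"],
    ["dept", "region", "__pgt_row_id"]))  # l.dept = r.dept
```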
How do GROUPING SETS, CUBE, and ROLLUP work?
GROUPING SETS, CUBE, and ROLLUP are fully supported via an automatic parse-time rewrite. pg_trickle decomposes these constructs into a UNION ALL of separate GROUP BY queries before the DVM engine processes the query.
Explosion guard: `CUBE(N)` generates $2^N$ branches. pg_trickle rejects CUBE/ROLLUP combinations that would produce more than 64 branches to prevent runaway memory usage. Use explicit `GROUPING SETS (...)` instead.
For example:
-- This defining query:
SELECT dept, region, SUM(amount) FROM sales GROUP BY CUBE(dept, region)
-- Is automatically rewritten to:
SELECT dept, region, SUM(amount) FROM sales GROUP BY dept, region
UNION ALL
SELECT dept, NULL::text, SUM(amount) FROM sales GROUP BY dept
UNION ALL
SELECT NULL::text, region, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL::text, NULL::text, SUM(amount) FROM sales
GROUPING() function calls are replaced with integer literal constants corresponding to the grouping level. The rewrite is transparent — the DVM engine sees only standard GROUP BY + UNION ALL operators and can apply incremental delta computation to each branch independently.
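The branch count behind the explosion guard follows standard SQL grouping-set arithmetic: `CUBE(n)` expands to 2^n grouping sets, `ROLLUP(n)` to n + 1, and multiple constructs in one GROUP BY multiply (cross product of the sets). A minimal Python sketch of the guard; the helper name and input shape are illustrative, not the extension's code:

```python
from math import prod

MAX_BRANCHES = 64  # documented explosion-guard limit

def grouping_branches(constructs):
    """Count UNION ALL branches after the grouping-sets rewrite.

    `constructs` is a list like [("cube", 2), ("rollup", 3), ("sets", 4)],
    where the integer is the column count (or, for explicit GROUPING SETS,
    the number of sets listed).
    """
    def count(kind, n):
        if kind == "cube":
            return 2 ** n     # every subset of n columns
        if kind == "rollup":
            return n + 1      # each prefix, plus the grand total
        return n              # explicit GROUPING SETS list of n sets
    total = prod(count(k, n) for k, n in constructs)
    if total > MAX_BRANCHES:
        raise ValueError(f"{total} branches exceeds guard of {MAX_BRANCHES}")
    return total

print(grouping_branches([("cube", 2)]))                 # 4
print(grouping_branches([("cube", 3), ("rollup", 1)]))  # 16
```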
How does DISTINCT ON (…) work?
DISTINCT ON is fully supported via an automatic parse-time rewrite. pg_trickle transparently transforms DISTINCT ON into a ROW_NUMBER() window function subquery:
-- This defining query:
SELECT DISTINCT ON (dept) dept, employee, salary
FROM employees ORDER BY dept, salary DESC
-- Is automatically rewritten to:
SELECT dept, employee, salary FROM (
SELECT dept, employee, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
) sub WHERE rn = 1
The rewrite happens before DVM parsing, so the operator tree sees a standard window function query and can apply partition-based recomputation for incremental delta maintenance.
Why is TABLESAMPLE rejected?
TABLESAMPLE returns a random subset of rows from a table (e.g., FROM orders TABLESAMPLE BERNOULLI(10) gives ~10% of rows).
Stream tables materialize the complete result set of the defining query and keep it up-to-date across refreshes. Baking a random sample into the defining query is not meaningful because:
- **Non-determinism.** Each refresh would sample different rows, making the stream table contents unstable and unpredictable. The delta between refreshes would be dominated by sampling noise, not actual data changes.
- **CDC incompatibility.** The trigger-based change-capture system tracks specific row-level changes (inserts, updates, deletes). A `TABLESAMPLE` defining query has no stable row identity — the "changed rows" concept doesn't apply when the entire sample shifts each cycle.
Rewrite:
-- Instead of sampling in the defining query:
SELECT * FROM orders TABLESAMPLE BERNOULLI(10)
-- Materialize the full result and sample when querying:
SELECT * FROM order_stream_table WHERE random() < 0.1
Why is LIMIT / OFFSET rejected?
Stream tables materialize the complete result set and keep it synchronized with source data. Bare LIMIT/OFFSET (without a recognized pattern) would truncate the result:
- **Undefined ordering.** `LIMIT` without `ORDER BY` returns an arbitrary subset.
- **Delta instability.** When source rows change, the boundary between "in the LIMIT" and "out of the LIMIT" shifts. A single INSERT could evict one row and admit another, requiring the refresh to track the full ordered position of every row.
- **Semantic mismatch.** Users who write `LIMIT 100` typically want to limit what they read, not what is stored.
Exception — TopK pattern: When the defining query has a top-level ORDER BY … LIMIT N (constant integer, optionally with OFFSET M), pg_trickle recognizes this as a TopK query and accepts it. The ORDER BY clause is required — bare LIMIT without ORDER BY is always rejected because it selects an arbitrary subset. With ORDER BY, the top-N boundary is well-defined and the stream table stores exactly those N rows (starting from position M+1 if OFFSET is specified). See the TopK section for details.
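For example, this defining query matches the TopK pattern and is accepted (a sketch using the `create_stream_table` signature shown in the introduction):

```sql
-- Accepted: top-level ORDER BY + constant LIMIT forms a TopK query
SELECT pgtrickle.create_stream_table(
    name     => 'latest_orders',
    query    => 'SELECT * FROM orders ORDER BY created_at DESC LIMIT 100',
    schedule => '30s'
);

-- Rejected: bare LIMIT without ORDER BY selects an arbitrary subset
-- query => 'SELECT * FROM orders LIMIT 100'
```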
Rewrite (when TopK doesn't apply):
-- Instead of:
'SELECT * FROM orders ORDER BY created_at DESC LIMIT 100'
-- Omit LIMIT from the defining query, apply when reading:
SELECT * FROM orders_stream_table ORDER BY created_at DESC LIMIT 100
Why are window functions in expressions rejected?
Window functions like ROW_NUMBER() OVER (…) are supported as standalone columns in stream tables. However, embedding a window function inside an expression — such as CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN ... or SUM(x) OVER (...) + 1 — is rejected.
This restriction exists because:
- **Partition-based recomputation.** pg_trickle's differential mode handles window functions by recomputing entire partitions that were affected by changes. When a window function is buried inside an expression, the DVM engine cannot isolate the window computation from the surrounding expression, making it impossible to correctly identify which partitions to recompute.
- **Expression tree ambiguity.** The DVM parser would need to differentiate the outer expression (arithmetic, `CASE`, etc.) while treating the inner window function specially. This creates a combinatorial explosion of differentiation rules for every possible expression type × window function combination.
Rewrite:
-- Instead of:
SELECT id, CASE WHEN ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) = 1
THEN 'top' ELSE 'other' END AS rank_label
FROM employees
-- Move window function to a separate column, then use a wrapping stream table:
-- ST1:
SELECT id, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
-- ST2 (references ST1):
SELECT id, CASE WHEN rn = 1 THEN 'top' ELSE 'other' END AS rank_label
FROM pgtrickle.employees_ranked
Why is FOR UPDATE / FOR SHARE rejected?
FOR UPDATE and related locking clauses (FOR SHARE, FOR NO KEY UPDATE, FOR KEY SHARE) acquire row-level locks on selected rows. This is incompatible with stream tables because:
- **Refresh semantics.** Stream table contents are managed by the refresh engine using bulk `MERGE` operations. Row-level locks taken during the defining query would conflict with the refresh engine's own locking strategy.
- **No direct DML.** Since users cannot directly modify stream table rows, there is no use case for locking rows inside the defining query. The locks would be held for the duration of the refresh transaction and then released, serving no purpose.
How does ALL (subquery) work?
ALL (subquery) comparisons (e.g., WHERE x > ALL (SELECT y FROM t)) are supported via an automatic rewrite to NOT EXISTS. For example, x > ALL (SELECT y FROM t) is rewritten to NOT EXISTS (SELECT 1 FROM t WHERE y >= x), which pg_trickle handles via its anti-join operator.
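Concretely, the rewrite looks like this (table names are hypothetical):

```sql
-- Defining query with an ALL comparison:
SELECT * FROM products p
WHERE p.price > ALL (SELECT price FROM discontinued);

-- Equivalent NOT EXISTS form handled by the anti-join operator:
SELECT * FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM discontinued d WHERE d.price >= p.price
);
```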
Why is ORDER BY silently discarded?
ORDER BY in the defining query is accepted but ignored. This is consistent with how PostgreSQL treats CREATE MATERIALIZED VIEW AS SELECT ... ORDER BY ... — the ordering is not preserved in the stored data.
Stream tables are heap tables with no guaranteed row order. The ORDER BY in the defining query would only affect the order of the initial INSERT, which has no lasting effect. Apply ordering when querying the stream table:
-- This ORDER BY is meaningless in the defining query:
'SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY total DESC'
-- Instead, order when reading:
SELECT * FROM regional_totals ORDER BY total DESC
Why are statistical aggregates (CORR, COVAR_*, REGR_*) limited to FULL mode?
Regression aggregates like CORR, COVAR_POP, COVAR_SAMP, and the REGR_* family require maintaining running sums of products and squares across the entire group. Unlike COUNT/SUM/AVG (where deltas can be computed from the change alone) or group-rescan aggregates (where only affected groups are re-read), regression aggregates:
- **Lack algebraic delta rules.** There is no closed-form way to update a correlation coefficient from a single row change without access to the full group's data.
- **Would degrade to group-rescan anyway.** Even if supported, the implementation would need to rescan the full group from source — identical to FULL mode for most practical group sizes.
These aggregates work fine in FULL refresh mode, which re-runs the entire query from scratch each cycle.
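A sketch of declaring such a stream table in FULL mode (the `refresh_mode` parameter name is an assumption, inferred from the refresh modes discussed elsewhere in this document):

```sql
-- CORR has no delta rule, so pin the stream table to FULL refresh
SELECT pgtrickle.create_stream_table(
    name         => 'price_qty_corr',
    query        => 'SELECT region, CORR(price, quantity) AS r
                     FROM sales GROUP BY region',
    schedule     => '5m',
    refresh_mode => 'full'  -- assumed parameter name
);
```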
Why Are These Stream Table Operations Restricted?
Stream tables are regular PostgreSQL heap tables under the hood, but their contents are managed exclusively by the refresh engine. This section explains why certain operations that work on ordinary tables are disallowed or unsupported on stream tables, and what to do instead.
Why can't I INSERT, UPDATE, or DELETE rows in a stream table?
Stream table contents are the output of the refresh engine — they represent the materialized result of the defining query at a specific point in time. Direct DML would corrupt this contract in several ways:
- **Row ID integrity.** Every row has a `__pgt_row_id` (a 64-bit xxHash of the group-by key or all columns). The refresh engine uses this for delta `MERGE` — matching incoming deltas against existing rows. A manually inserted row with an incorrect or duplicate `__pgt_row_id` would cause the next differential refresh to produce wrong results (double-counting, missed deletes, or merge conflicts).
- **Frontier inconsistency.** Each refresh records a frontier — a set of per-source LSN positions that represent "data up to this point has been materialized." A manual DML change is not tracked by any frontier. The next differential refresh would either overwrite the change (if the delta touches the same row) or leave the stream table in a state that doesn't match any consistent point-in-time snapshot of the source data.
- **Change buffer desync.** The CDC triggers on source tables write changes to buffer tables. The refresh engine reads these buffers and advances the frontier. Manual DML on the stream table bypasses this pipeline entirely — the buffer and frontier have no record of the change, so future refreshes cannot account for it.
If you need to post-process stream table data, create a view or a second stream table that references the first one.
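For example, instead of UPDATE-ing rows in a stream table directly, layer a second stream table on top of the first (names hypothetical):

```sql
-- ST2 consumes ST1; deltas cascade automatically through the DAG
SELECT pgtrickle.create_stream_table(
    name     => 'active_orders_flagged',
    query    => 'SELECT o.*, (o.amount > 1000) AS is_large
                 FROM active_orders o',
    schedule => '30s'
);
```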
Why can't I add foreign keys to or from a stream table?
Foreign key constraints require that referenced/referencing rows exist at the time of each DML statement. The refresh engine violates this assumption:
- **Bulk `MERGE` ordering.** A differential refresh executes a single `MERGE INTO` statement that applies all deltas (inserts and deletes) atomically. PostgreSQL evaluates FK constraints row-by-row within this `MERGE`. If a parent row is deleted and a new parent inserted in the same delta batch, the child FK check may fail because it sees the delete before the insert — even though the final state would be consistent.
- **Full refresh uses `TRUNCATE` + `INSERT`.** In FULL mode, the refresh engine truncates the stream table and re-inserts all rows. `TRUNCATE` does not fire individual `DELETE` triggers and bypasses FK cascade logic, which would leave referencing tables with dangling references.
- **Cross-table refresh ordering.** If stream table A has an FK referencing stream table B, both tables refresh independently (in topological order, but in separate transactions). There is no guarantee that A's refresh sees B's latest data — the FK constraint could transiently fail between refreshes.
Workaround: Enforce referential integrity in the consuming application or use a view that joins the stream tables and validates the relationship.
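One way to monitor referential integrity without an FK is a validation view over the two stream tables (table and column names are hypothetical):

```sql
-- Rows in the child stream table with no matching parent; should stay empty.
-- Alert if this view ever returns rows.
CREATE VIEW dangling_order_items AS
SELECT i.*
FROM order_items_st i
LEFT JOIN orders_st o ON o.id = i.order_id
WHERE o.id IS NULL;
```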
How do user-defined triggers work on stream tables?
When a DIFFERENTIAL mode stream table has user-defined row-level triggers, the refresh engine uses explicit DML decomposition instead of MERGE:
- **Delta materialized once.** The delta query result is stored in a temporary table (`__pgt_delta_<id>`) to avoid evaluating it three times.
- **DELETE removed rows.** Rows in the stream table whose `__pgt_row_id` is absent from the delta are deleted. `AFTER DELETE` triggers fire with correct `OLD` values.
- **UPDATE changed rows.** Rows whose `__pgt_row_id` exists in both the stream table and delta but whose values differ (checked via `IS DISTINCT FROM`) are updated. `AFTER UPDATE` triggers fire with correct `OLD` and `NEW`. No-op updates (where values are identical) are skipped, preventing spurious triggers.
- **INSERT new rows.** Rows in the delta whose `__pgt_row_id` is absent from the stream table are inserted. `AFTER INSERT` triggers fire with correct `NEW` values.
FULL refresh behavior: Row-level user triggers are automatically suppressed during FULL refresh via DISABLE TRIGGER USER / ENABLE TRIGGER USER. A NOTIFY pgtrickle_refresh is emitted so listeners know a FULL refresh occurred. Use REFRESH MODE DIFFERENTIAL for stream tables that need per-row trigger semantics.
Performance: The explicit DML path adds ~25–60% overhead compared to MERGE for triggered stream tables. Stream tables without user triggers have zero overhead (only a fast pg_trigger check, <0.1 ms).
Control: The pg_trickle.user_triggers GUC controls this behavior:
- `auto` (default): detect user triggers automatically
- `off`: always use MERGE, suppressing triggers
- `on`: deprecated compatibility alias for `auto`
Why can't I ALTER TABLE a stream table directly?
Stream table metadata (defining query, schedule, refresh mode) is stored in the pg_trickle catalog (pgtrickle.pgt_stream_tables). A direct ALTER TABLE would change the physical table without updating the catalog, causing:
- **Column mismatch.** If you add or remove columns, the refresh engine's cached delta query and `MERGE` statement would reference columns that no longer exist (or miss new ones), causing runtime errors.
- **`__pgt_row_id` invalidation.** The row ID hash is computed from the defining query's output columns. Altering the table schema without updating the defining query would make existing row IDs inconsistent with the new column set.
Use pgtrickle.alter_stream_table() to change schedule, refresh mode, or status. To change the defining query or column structure, drop and recreate the stream table.
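For example (parameter names beyond those already shown in this document are assumptions):

```sql
-- Change the schedule without touching the defining query
-- ('schedule' parameter name assumed)
SELECT pgtrickle.alter_stream_table('my_st', schedule => '1m');

-- Changing the column structure requires drop + recreate
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
    'my_st',
    'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
    '1m'
);
```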
Why can't I TRUNCATE a stream table?
TRUNCATE removes all rows instantly but does not update the pg_trickle frontier or change buffers. After a TRUNCATE:
- **Differential refresh sees no changes.** The frontier still records the last-processed LSN. If no new source changes have occurred, the next differential refresh produces an empty delta — leaving the stream table empty even though the source still has data.
- **No recovery path for differential mode.** The refresh engine has no way to detect that the stream table was externally truncated. It assumes the current contents match the frontier.
Use pgtrickle.refresh_stream_table('my_table') to force a full re-materialization, or drop and recreate the stream table if you need a clean slate.
What are the memory limits for delta processing?
The differential refresh path executes the delta query as a single SQL statement. For large batches (e.g., a bulk UPDATE of 10M rows), PostgreSQL may attempt to materialize the entire delta result set in memory. If the delta exceeds work_mem, PostgreSQL will spill to temporary files on disk, which is slower but safe. In extreme cases, OOM (out of memory) can occur if work_mem is set very high and the delta is enormous.
Mitigations:
- **Adaptive fallback.** The `pg_trickle.differential_max_change_ratio` GUC (default 0.15) automatically triggers a FULL refresh when the ratio of pending changes to total rows exceeds the threshold. This prevents large deltas from consuming excessive memory.
- **`work_mem` tuning.** PostgreSQL's `work_mem` setting controls how much memory each sort/hash operation uses before spilling to disk. For pg_trickle workloads, 64–256 MB is typical. Monitor `temp_blks_written` in `pg_stat_statements` to detect spilling.
- **`pg_trickle.merge_work_mem_mb` GUC.** Sets a session-level `work_mem` override during the MERGE execution (default: 0 = use global `work_mem`). This allows higher memory for refresh without affecting other queries.
- **Monitoring.** If `pg_stat_statements` is installed, pg_trickle logs a warning when the MERGE query writes temporary blocks to disk.
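A typical tuning pass combining the GUCs above (values are illustrative, not recommendations):

```sql
-- Fall back to FULL refresh earlier for very large deltas
ALTER SYSTEM SET pg_trickle.differential_max_change_ratio = 0.10;

-- Give the MERGE step more memory without raising global work_mem
ALTER SYSTEM SET pg_trickle.merge_work_mem_mb = 256;

SELECT pg_reload_conf();
```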
Why are refreshes processed sequentially by default?
The default (parallel_refresh_mode = 'off') is sequential because it is simple, correct, and efficient for most workloads. Topological ordering guarantees upstream stream tables refresh before downstream ones with no coordination overhead.
When to consider enabling parallel refresh:
- Your database has many independent stream tables (no shared dependencies).
- Total cycle time equals the sum of all refresh durations, and some refreshes are visibly blocking unrelated ones.
- You have enough `max_worker_processes` headroom (each parallel worker uses one slot).
Enabling parallel refresh (v0.4.0+):
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'on';
SELECT pg_reload_conf();
With parallel_refresh_mode = 'on', the scheduler builds an execution-unit DAG and dispatches independent units to dynamic background workers. Atomic consistency groups and IMMEDIATE-trigger closures remain single-worker for correctness.
See CONFIGURATION.md — Parallel Refresh for tuning guidance including the worker-budget formula.
How many connections does pg_trickle use?
pg_trickle uses the following PostgreSQL connections:
| Component | Connections | When |
|---|---|---|
| Background scheduler | 1 | Always (per database with STs) |
| WAL decoder polling | 0 (shared) | Uses the scheduler's SPI connection |
| Manual refresh | 1 | Per-call (uses caller's session) |
Total: 1 persistent connection per database. WAL decoder polling shares the scheduler's SPI connection rather than opening separate connections.
max_worker_processes: pg_trickle registers 1 background worker per database during _PG_init(). Ensure max_worker_processes (default 8) has room for the pg_trickle worker plus any other extensions.
Advisory locks: The scheduler holds a session-level advisory lock per actively-refreshing ST. These are released immediately after each refresh completes.
Troubleshooting & Failure Mode Runbook
This document covers common failure scenarios, their symptoms, diagnosis steps, and resolution procedures. It is intended for operators and DBAs running pg_trickle in production.
Quick start: Run `SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';` for a single-call triage of your installation.
See also:
- Error Reference — all `PgTrickleError` variants with causes and fixes
- FAQ — Troubleshooting section — common user questions
- Pre-Deployment Checklist — configuration verification
- Configuration — GUC reference
Table of Contents
- Diagnostic Toolkit
- Failure Scenarios
- 1. Scheduler Not Running
- 2. Stream Table Stuck in SUSPENDED Status
- 3. CDC Triggers Missing or Disabled
- 4. WAL Replication Slot Lag or Missing
- 5. Stream Table Stuck in INITIALIZING
- 6. Change Buffers Growing Without Refresh
- 7. Lock Contention Blocking Refresh
- 8. Out-of-Memory During Refresh
- 9. Disk Full / WAL Retention Exceeded
- 10. Circular Pipeline Convergence Failure
- 11. Schema Change Broke Stream Table
- 12. Worker Pool Exhaustion
- 13. Fuse Tripped (Circuit Breaker)
Diagnostic Toolkit
These functions are your primary tools for diagnosing issues:
| Function | Purpose |
|---|---|
| `pgtrickle.health_check()` | Single-call overall health triage (OK/WARN/ERROR) |
| `pgtrickle.pgt_status()` | Status, staleness, error count for all stream tables |
| `pgtrickle.refresh_timeline(N)` | Last N refresh events across all stream tables |
| `pgtrickle.diagnose_errors('name')` | Last 5 failed events with classification and remediation |
| `pgtrickle.change_buffer_sizes()` | CDC pipeline: pending rows and buffer bytes per source |
| `pgtrickle.trigger_inventory()` | CDC trigger presence and enabled state |
| `pgtrickle.check_cdc_health()` | WAL replication slot health (WAL mode only) |
| `pgtrickle.dependency_tree()` | Dependency DAG visualization |
| `pgtrickle.worker_pool_status()` | Parallel refresh worker pool state |
| `pgtrickle.explain_st('name')` | DVM operator tree and generated delta SQL |
Quick health check script:
-- 1. Overall health
SELECT * FROM pgtrickle.health_check() WHERE severity != 'OK';
-- 2. Problem stream tables
SELECT name, status, refresh_mode, consecutive_errors, staleness
FROM pgtrickle.pgt_status()
WHERE status != 'ACTIVE' OR consecutive_errors > 0
ORDER BY consecutive_errors DESC;
-- 3. Recent failures
SELECT start_time, stream_table, action, status, duration_ms, error_message
FROM pgtrickle.refresh_timeline(20)
WHERE status = 'FAILED';
Failure Scenarios
1. Scheduler Not Running
Symptoms:
- No stream tables are refreshing
- `health_check()` reports `scheduler_running = false`
- No `pg_trickle scheduler` process in `pg_stat_activity`
Diagnosis:
-- Check for the scheduler process
SELECT pid, datname, backend_type, state
FROM pg_stat_activity
WHERE backend_type = 'pg_trickle scheduler';
-- Check GUC
SHOW pg_trickle.enabled;
-- Check shared_preload_libraries
SHOW shared_preload_libraries;
Resolution:
| Cause | Fix |
|---|---|
| `pg_trickle.enabled = off` | `ALTER SYSTEM SET pg_trickle.enabled = on; SELECT pg_reload_conf();` |
| Not in `shared_preload_libraries` | Add `pg_trickle` to `shared_preload_libraries` in `postgresql.conf` and restart PostgreSQL |
| `max_worker_processes` exhausted | Increase `max_worker_processes` and restart. The launcher retries every 5 minutes — check PostgreSQL logs for `WARNING: pg_trickle launcher: could not spawn scheduler` |
| Scheduler crashed | Check PostgreSQL logs for crash details. The launcher will auto-restart it. If recurring, check for OOM or resource limits |
2. Stream Table Stuck in SUSPENDED Status
Symptoms:
- Stream table status shows `SUSPENDED`
- `consecutive_errors` is at or above `pg_trickle.max_consecutive_errors`
- No refreshes happening for this stream table
Diagnosis:
-- Check the stream table status
SELECT pgt_name, status, consecutive_errors, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE pgt_name = 'my_stream_table';
-- Get detailed error history
SELECT * FROM pgtrickle.diagnose_errors('my_stream_table');
Resolution:
- Fix the underlying error (check `last_error_message` and `diagnose_errors`)
- Resume the stream table: `SELECT pgtrickle.alter_stream_table('my_stream_table', enabled => true);`
- Trigger a manual refresh to verify: `SELECT pgtrickle.refresh_stream_table('my_stream_table');`
Prevention: Increase pg_trickle.max_consecutive_errors (default 3) if
transient errors are common in your environment:
ALTER SYSTEM SET pg_trickle.max_consecutive_errors = 5;
SELECT pg_reload_conf();
3. CDC Triggers Missing or Disabled
Symptoms:
- Stream table refreshes succeed but show no changes
- `change_buffer_sizes()` shows `pending_rows = 0` despite active DML
- Source tables have no pg_trickle triggers
Diagnosis:
-- Check trigger inventory
SELECT source_table, trigger_type, trigger_name, present, enabled
FROM pgtrickle.trigger_inventory()
WHERE NOT present OR NOT enabled;
-- Manual check on a specific source table
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'public.orders'::regclass
AND tgname LIKE 'pgt_%';
Resolution:
| Cause | Fix |
|---|---|
| Triggers dropped by DDL (e.g., pg_dump + restore without triggers) | Drop and recreate the stream table, or reinitialize: `SELECT pgtrickle.refresh_stream_table('my_st');` |
| Triggers disabled (`ALTER TABLE ... DISABLE TRIGGER`) | `ALTER TABLE source_table ENABLE TRIGGER ALL;` |
| Source gating active | Check SELECT * FROM pgtrickle.source_gates(); and ungate: SELECT pgtrickle.ungate_source('source_table'); |
| WAL mode active but slot missing | See WAL Replication Slot Lag or Missing |
4. WAL Replication Slot Lag or Missing
Symptoms:
- `check_cdc_health()` shows `slot_lag_exceeds_threshold` or `replication_slot_missing`
- WAL disk usage growing unexpectedly
- Stream tables not receiving changes in WAL mode
Diagnosis:
-- Check CDC health
SELECT * FROM pgtrickle.check_cdc_health();
-- Check replication slots directly
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%';
Resolution:
| Cause | Fix |
|---|---|
| Slot dropped externally | pg_trickle will auto-fallback to trigger-based CDC. To recreate: drop and recreate the stream table |
| Slot lagging (WAL accumulation) | Check for long-running transactions: `SELECT pid, age(backend_xmin) FROM pg_stat_activity WHERE backend_xmin IS NOT NULL;`. Terminate idle-in-transaction sessions |
| `wal_level != logical` | WAL CDC requires `wal_level = logical`. Set it and restart PostgreSQL |
| `max_replication_slots` exhausted | Increase `max_replication_slots` and restart |
Fallback: Force trigger-based CDC mode if WAL mode is problematic:
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();
5. Stream Table Stuck in INITIALIZING
Symptoms:
- Stream table status is `INITIALIZING` for an extended period
- The initial full refresh may have failed or is still running
Diagnosis:
-- Check refresh history
SELECT * FROM pgtrickle.get_refresh_history('my_st', 5);
-- Check for active refresh
SELECT pid, state, query, now() - query_start AS running_for
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%' AND state = 'active';
Resolution:
| Cause | Fix |
|---|---|
| Initial refresh failed (check error in history) | Fix the error, then: SELECT pgtrickle.refresh_stream_table('my_st'); |
| Defining query is very slow | Optimize the query, add indexes on source tables, or increase work_mem |
| Lock contention during initial refresh | See Lock Contention |
6. Change Buffers Growing Without Refresh
Symptoms:
- `change_buffer_sizes()` shows large `pending_rows` and growing `buffer_bytes`
- Stream tables are stale
- Refreshes are not running or are failing
Diagnosis:
-- Check buffer sizes
SELECT stream_table, source_table, pending_rows, buffer_bytes
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
-- Check if refreshes are happening
SELECT * FROM pgtrickle.refresh_timeline(10);
-- Check for blocked refresh processes
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE query LIKE '%pgtrickle%';
Resolution:
| Cause | Fix |
|---|---|
| Scheduler not running | See Scheduler Not Running |
| All refreshes failing | Check diagnose_errors() for each affected stream table |
| Lock contention | See Lock Contention |
| Very large buffer causing slow MERGE | Consider lowering pg_trickle.differential_change_ratio_threshold to trigger FULL refresh for large batches |
Emergency: If buffers are dangerously large and you need immediate relief:
-- Force a full refresh (bypasses change buffers)
SELECT pgtrickle.refresh_stream_table('my_st', force_full => true);
7. Lock Contention Blocking Refresh
Symptoms:
- Refresh duration is much longer than usual
- `pg_stat_activity` shows refresh processes in `Lock` wait state
- Long-running transactions on source or stream tables
Diagnosis:
-- Find blocking locks
SELECT blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND NOT bl.granted
JOIN pg_locks gl ON gl.locktype = bl.locktype
AND gl.database IS NOT DISTINCT FROM bl.database
AND gl.relation IS NOT DISTINCT FROM bl.relation
AND gl.page IS NOT DISTINCT FROM bl.page
AND gl.tuple IS NOT DISTINCT FROM bl.tuple
AND gl.pid != bl.pid
AND gl.granted
JOIN pg_stat_activity blocking ON blocking.pid = gl.pid
WHERE blocked.query LIKE '%pgtrickle%';
Resolution:
- Identify and terminate the blocking session if appropriate: `SELECT pg_terminate_backend(<blocking_pid>);`
- Investigate why the blocking transaction is long-running (idle in transaction, slow query, etc.)
- Consider adding `statement_timeout` or `idle_in_transaction_session_timeout` to prevent future occurrences
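The timeouts mentioned above can be set like so (values are illustrative):

```sql
-- Abort statements that run longer than 30 seconds
ALTER SYSTEM SET statement_timeout = '30s';

-- Kill sessions idle inside a transaction for more than 5 minutes
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';

SELECT pg_reload_conf();
```

In practice these are often scoped per role (`ALTER ROLE ... SET ...`) rather than system-wide, so batch jobs with legitimately long statements are not affected.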
8. Out-of-Memory During Refresh
Symptoms:
- Refresh processes killed by OS OOM killer
- PostgreSQL logs show `out of memory` errors
- Stream tables fail with system-category errors
Diagnosis:
# Check OS OOM killer logs
dmesg | grep -i "oom\|killed process" | tail -20
# Check PostgreSQL logs for memory errors
grep -i "out of memory\|oom" /var/log/postgresql/postgresql-*.log | tail -10
-- Check which stream tables have large source data
SELECT stream_table, source_table, pending_rows
FROM pgtrickle.change_buffer_sizes()
ORDER BY pending_rows DESC;
Resolution:
| Cause | Fix |
|---|---|
| Large FULL refresh on big table | Reduce work_mem or maintenance_work_mem to limit per-query memory |
| Large change buffer accumulation | Refresh more frequently (shorter schedule) to keep buffers small |
| Complex query with many joins | Simplify the defining query or break into cascading stream tables |
| Parallel refresh amplifies memory | Reduce pg_trickle.max_concurrent_refreshes |
Tuning:
-- Limit per-refresh memory
SET work_mem = '64MB';
-- Limit concurrent refreshes to reduce peak memory
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 2;
SELECT pg_reload_conf();
9. Disk Full / WAL Retention Exceeded
Symptoms:
- PostgreSQL logs `No space left on device` errors
- WAL directory consuming excessive disk
- Replication slots preventing WAL cleanup
Diagnosis:
# Check disk usage
df -h /var/lib/postgresql/data
du -sh /var/lib/postgresql/data/pg_wal/
-- Check replication slot WAL retention
SELECT slot_name, active,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
-- Check change buffer table sizes
SELECT stream_table, source_table,
pg_size_pretty(buffer_bytes::bigint) AS buffer_size
FROM pgtrickle.change_buffer_sizes()
ORDER BY buffer_bytes DESC;
Resolution:
| Cause | Fix |
|---|---|
| Inactive replication slot holding WAL | Drop the slot: SELECT pg_drop_replication_slot('pgt_...'); |
| Change buffer tables too large | Force full refresh to clear buffers, or refresh more frequently |
| WAL accumulation from long transactions | Terminate idle-in-transaction sessions |
| `max_wal_size` too low | Increase `max_wal_size` in `postgresql.conf` |
Emergency cleanup:
-- Drop inactive pg_trickle replication slots
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name LIKE 'pgt_%' AND NOT active;
10. Circular Pipeline Convergence Failure
Symptoms:
- Stream tables in a circular dependency hit the maximum iteration limit
- Refresh history shows repeated cycles without convergence
- Error messages mention `fixed_point_max_iterations`
Diagnosis:
-- Check for circular dependencies
SELECT * FROM pgtrickle.dependency_tree();
-- Check refresh history for iteration patterns
SELECT start_time, stream_table, action, status, error_message
FROM pgtrickle.refresh_timeline(50)
WHERE stream_table IN ('st_a', 'st_b') -- suspected cycle members
ORDER BY start_time DESC;
Resolution:
- Verify the cycle is intentional (see Circular Dependencies tutorial)
- Increase the iteration limit if convergence is slow: `ALTER SYSTEM SET pg_trickle.fixed_point_max_iterations = 20; SELECT pg_reload_conf();`
- If the cycle never converges, the defining queries may not be monotone. Restructure to eliminate the cycle or ensure monotonicity
11. Schema Change Broke Stream Table
Symptoms:
- Stream table has `needs_reinit = true`
- Reinitialization keeps failing
- Error messages reference dropped or renamed columns
Diagnosis:
-- Check for pending reinit
SELECT pgt_name, needs_reinit, status, last_error_message
FROM pgtrickle.pg_stat_stream_tables
WHERE needs_reinit;
-- Get error details
SELECT * FROM pgtrickle.diagnose_errors('my_st');
Resolution:
If the defining query is still valid after the DDL change, force a reinit:
SELECT pgtrickle.refresh_stream_table('my_st');
If the defining query needs to be updated:
-- Option 1: Alter the defining query
SELECT pgtrickle.alter_stream_table('my_st',
query => 'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column'
);
-- Option 2: Drop and recreate
SELECT pgtrickle.drop_stream_table('my_st');
SELECT pgtrickle.create_stream_table(
'my_st',
'SELECT new_column, SUM(amount) FROM orders GROUP BY new_column',
'1m'
);
12. Worker Pool Exhaustion
Symptoms:
- Refresh latency increases across the board
- Some stream tables refresh while others queue indefinitely
- `worker_pool_status()` shows all workers busy
Diagnosis:
-- Check worker pool
SELECT * FROM pgtrickle.worker_pool_status();
-- Check for long-running parallel jobs
SELECT job_id, unit_key, status, duration_ms
FROM pgtrickle.parallel_job_status(300)
WHERE status = 'RUNNING'
ORDER BY duration_ms DESC;
Resolution:
| Cause | Fix |
|---|---|
| Too few workers for workload | Increase pg_trickle.max_concurrent_refreshes and/or max_worker_processes |
| One stream table monopolizing workers | Check if a single slow refresh is blocking the pool. Consider splitting into smaller stream tables |
| Global worker cap reached | Increase pg_trickle.max_dynamic_refresh_workers |
13. Fuse Tripped (Circuit Breaker)
Symptoms:
- Stream table shows `fuse_state = 'BLOWN'` or refresh is paused
- `fuse_status()` reports a tripped fuse
- No refreshes happening despite active scheduler
Diagnosis:
-- Check fuse status
SELECT * FROM pgtrickle.fuse_status();
Resolution:
Reset the fuse after investigating the root cause:
SELECT pgtrickle.reset_fuse('my_stream_table');
See the Fuse Circuit Breaker tutorial for details on fuse thresholds and configuration.
General Diagnostic Workflow
When investigating any issue, follow this sequence:
1. health_check() → identify which subsystem is unhealthy
2. pgt_status() → find specific affected stream tables
3. diagnose_errors('name') → get root cause for failures
4. refresh_timeline(20) → correlate with recent refresh events
5. change_buffer_sizes() → check CDC pipeline health
6. trigger_inventory() → verify change capture is working
7. dependency_tree() → confirm DAG wiring
8. PostgreSQL logs → low-level crash/resource details
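As a single psql session, steps 1–7 look like the following sketch. It assumes the diagnostic functions above live in the pgtrickle schema (as the other examples in this guide do); 'order_summary' is a placeholder stream table name.

```sql
-- Steps 1–2: overall health, then per-table status
SELECT * FROM pgtrickle.health_check();
SELECT * FROM pgtrickle.pgt_status();

-- Steps 3–4: root cause for one failing table, then recent refresh events
SELECT * FROM pgtrickle.diagnose_errors('order_summary');
SELECT * FROM pgtrickle.refresh_timeline(20);

-- Steps 5–7: CDC buffer health, trigger wiring, DAG structure
SELECT * FROM pgtrickle.change_buffer_sizes();
SELECT * FROM pgtrickle.trigger_inventory();
SELECT * FROM pgtrickle.dependency_tree();
```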
GUC Quick Reference for Troubleshooting
| GUC | Default | What to check |
|---|---|---|
| pg_trickle.enabled | on | Must be on for scheduler to run |
| pg_trickle.max_consecutive_errors | 3 | Stream tables suspend after this many failures |
| pg_trickle.scheduler_interval_ms | 100 | Very high values cause refresh lag |
| pg_trickle.event_driven_wake | on | off = poll-only, higher latency |
| pg_trickle.cdc_mode | auto | trigger for reliable fallback |
| pg_trickle.max_concurrent_refreshes | 4 | Per-database parallel refresh cap |
| pg_trickle.fixed_point_max_iterations | 10 | Circular pipeline iteration limit |
| pg_trickle.differential_change_ratio_threshold | 0.5 | Falls back to FULL above this ratio |
| pg_trickle.auto_backoff | on | Stretches intervals up to 8x under load |
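These GUCs can be adjusted with standard PostgreSQL configuration mechanics. A minimal sketch, assuming the setting takes effect on reload (whether a given pg_trickle GUC is reload-level or requires a restart is extension-specific — check the Configuration reference):

```sql
-- Raise the per-database refresh concurrency cap (GUC from the table above)
ALTER SYSTEM SET pg_trickle.max_concurrent_refreshes = 8;
SELECT pg_reload_conf();

-- Verify the value the running server sees
SHOW pg_trickle.max_concurrent_refreshes;
```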
pg_trickle Error Reference
This document lists all PgTrickleError variants with descriptions, common
causes, and suggested fixes. If you encounter an error not listed here, please
open an issue.
Tip: Most errors include context (table name, OID, or query fragment) in the message text. Use that context to narrow down the root cause.
SQLSTATE Code Reference
Every pg_trickle error includes a PostgreSQL SQLSTATE code for programmatic
error handling. Use SQLSTATE in PL/pgSQL EXCEPTION WHEN blocks or check
the error code in your client library.
| Error Variant | SQLSTATE | Code Name |
|---|---|---|
| QueryParseError | 42000 | SYNTAX_ERROR_OR_ACCESS_RULE_VIOLATION |
| TypeMismatch | 42804 | DATATYPE_MISMATCH |
| UnsupportedOperator | 0A000 | FEATURE_NOT_SUPPORTED |
| CycleDetected | 3F000 | INVALID_SCHEMA_DEFINITION |
| NotFound | 42P01 | UNDEFINED_TABLE |
| AlreadyExists | 42P07 | DUPLICATE_TABLE |
| InvalidArgument | 22023 | INVALID_PARAMETER_VALUE |
| QueryTooComplex | 54000 | PROGRAM_LIMIT_EXCEEDED |
| UpstreamTableDropped | 42P01 | UNDEFINED_TABLE |
| UpstreamSchemaChanged | 42P17 | INVALID_TABLE_DEFINITION |
| LockTimeout | 55P03 | LOCK_NOT_AVAILABLE |
| ReplicationSlotError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| WalTransitionError | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| SpiError | XX000 | INTERNAL_ERROR |
| SpiPermissionError | 42501 | INSUFFICIENT_PRIVILEGE |
| WatermarkBackwardMovement | 22000 | DATA_EXCEPTION |
| WatermarkGroupNotFound | 42704 | UNDEFINED_OBJECT |
| WatermarkGroupAlreadyExists | 42710 | DUPLICATE_OBJECT |
| RefreshSkipped | 55000 | OBJECT_NOT_IN_PREREQUISITE_STATE |
| InternalError | XX000 | INTERNAL_ERROR |
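For example, a PL/pgSQL block can use the standard condition names for these SQLSTATEs to treat transient errors differently from user errors. A sketch — 'order_summary' is a placeholder, and the refresh call's exact signature is as documented in the SQL Reference:

```sql
-- Treat lock timeouts (55P03, LockTimeout) as transient; re-raise
-- unsupported-operator errors (0A000) since those need a query change.
DO $$
BEGIN
  PERFORM pgtrickle.refresh_stream_table('order_summary');
EXCEPTION
  WHEN lock_not_available THEN      -- SQLSTATE 55P03
    RAISE NOTICE 'refresh deferred: lock timeout, will retry later';
  WHEN feature_not_supported THEN   -- SQLSTATE 0A000
    RAISE;
END $$;
```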
Error Categories
pg_trickle classifies errors into four categories that determine retry behavior:
| Category | Retried by scheduler? | Description |
|---|---|---|
| User | No | Invalid queries, type mismatches, DAG cycles. Fix the input. |
| Schema | No (triggers reinitialize) | Upstream DDL changes. The stream table is reinitialized automatically. |
| System | Yes (with backoff) | Lock timeouts, replication slot problems, transient SPI failures. |
| Internal | No | Unexpected bugs. Please report these. |
User Errors
QueryParseError
Message: query parse error: <details>
Description: The defining query could not be parsed or validated by the pg_trickle query analyzer.
Common causes:
- Syntax error in the defining query
- Use of PostgreSQL syntax not yet supported by pgrx's query parser
- A CTE or subquery that cannot be analyzed
Suggested fix: Simplify the query. Check that it runs as a standalone
SELECT statement. Review SQL Reference — Expression Support
for supported syntax.
TypeMismatch
Message: type mismatch: <details>
Description: A type incompatibility was detected between the defining query output and the stream table schema, or between source columns and expected types.
Common causes:
- Column type changed on a source table after stream table creation
- Explicit cast to an incompatible type in the defining query
- UNION branches with mismatched column types
Suggested fix: Ensure column types match. Use explicit CAST() to align
types if needed. If the source table changed, use
pgtrickle.repair_stream_table() to reinitialize.
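As an illustration of the UNION case, explicit casts align the branches. The table and column names here are hypothetical:

```sql
-- Without the casts, mismatched branch types (e.g. numeric vs. integer)
-- raise TypeMismatch at creation time.
SELECT pgtrickle.create_stream_table(
  'all_amounts',
  'SELECT amount::numeric AS amount FROM orders
   UNION ALL
   SELECT refund_amount::numeric FROM refunds',
  '1m'
);
```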
UnsupportedOperator
Message: unsupported operator for DIFFERENTIAL mode: <operator>
Description: The defining query uses an SQL operator or construct that pg_trickle cannot maintain incrementally.
Common causes:
- TABLESAMPLE, GROUPING SETS beyond the branch limit, recursive CTEs with unsupported patterns, certain window function combinations
- Non-monotonic or volatile functions in positions that prevent differential maintenance
Suggested fix: Use refresh_mode => 'FULL' to fall back to full
recomputation:
SELECT pgtrickle.alter_stream_table('my_stream_table',
refresh_mode => 'FULL');
Or restructure the query to avoid the unsupported construct. See SQL Reference — Expression Support.
CycleDetected
Message: cycle detected in dependency graph: A -> B -> C -> A
Description: Adding or altering this stream table would create a circular dependency in the refresh DAG.
Common causes:
- Stream table A depends on stream table B, which depends on A
- Indirect cycles through chains of stream tables
Suggested fix: Restructure the stream table definitions to break the cycle.
Use pgtrickle.get_dependency_graph() to visualize the current DAG. If
circular dependencies are intentional, enable pg_trickle.allow_circular = true
(see Configuration).
NotFound
Message: stream table not found: <name>
Description: The specified stream table does not exist in the
pgtrickle.pgt_stream_tables catalog.
Common causes:
- Typo in the stream table name
- The stream table was already dropped
- Schema-qualified name required but not provided (e.g., myschema.my_st)
Suggested fix: Check the name with pgtrickle.list_stream_tables(). Use
the fully qualified name: schema.table_name.
AlreadyExists
Message: stream table already exists: <name>
Description: A create_stream_table() call was made for a stream table
name that is already registered.
Common causes:
- Re-running a migration or DDL script without IF NOT EXISTS
Suggested fix: Use pgtrickle.create_stream_table_if_not_exists() or
pgtrickle.create_or_replace_stream_table() for idempotent creation.
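A sketch of the idempotent form, assuming it accepts the same positional arguments as create_stream_table() shown earlier (name, query, schedule) — check the SQL Reference for the exact signature:

```sql
-- Safe to re-run in a migration: a no-op if 'my_st' already exists
SELECT pgtrickle.create_stream_table_if_not_exists(
  'my_st',
  'SELECT status, count(*) FROM orders GROUP BY status',
  '1m'
);
```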
InvalidArgument
Message: invalid argument: <details>
Description: An invalid value was passed to a pg_trickle API function.
Common causes:
- Invalid refresh_mode value (must be 'DIFFERENTIAL', 'FULL', or 'AUTO')
- Calling resume_stream_table() on a table that is not suspended
- Invalid schedule interval or threshold value
- Empty or malformed table name
Suggested fix: Check the function signature in the SQL Reference and correct the argument.
QueryTooComplex
Message: query too complex: <details>
Description: The defining query exceeds the maximum parse depth, which protects against stack overflow during query analysis.
Common causes:
- Deeply nested subqueries (> 64 levels by default)
- Large UNION ALL chains
Suggested fix: Simplify the query. If the depth limit is too restrictive,
increase pg_trickle.max_parse_depth (default: 64). See
Configuration.
Schema Errors
UpstreamTableDropped
Message: upstream table dropped: OID <oid>
Description: A source table referenced by the stream table's defining query was dropped.
Common causes:
- DROP TABLE on a source table
- Table replaced via DROP + CREATE (new OID)
Suggested fix: Either recreate the source table with the same schema or
drop the stream table and recreate it. If pg_trickle.block_source_ddl = true
(default), the DROP would have been blocked in the first place.
UpstreamSchemaChanged
Message: upstream table schema changed: OID <oid>
Description: A source table's schema was altered (e.g., column added, dropped, or type changed) in a way that affects the defining query.
Common causes:
- ALTER TABLE ... ADD/DROP/ALTER COLUMN on a source table
- Type change on a column used in the defining query
Suggested fix: The stream table will be automatically reinitialized on the
next scheduler tick. If pg_trickle.block_source_ddl = true (default), most
schema changes are blocked proactively. Use
pgtrickle.alter_stream_table(..., query => '...') to update the defining
query if needed.
System Errors
LockTimeout
Message: lock timeout: <details>
Description: A lock required for refresh could not be acquired within the configured timeout.
Common causes:
- Long-running transactions holding locks on the stream table or source tables
- Concurrent ALTER TABLE or VACUUM FULL operations
- High contention on the change buffer tables
Suggested fix: This error is automatically retried with exponential backoff.
If persistent, investigate long-running transactions with pg_stat_activity.
Consider increasing lock_timeout or reducing refresh frequency.
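The usual starting point is PostgreSQL's own pg_stat_activity view — for example, listing transactions that have been open for more than five minutes:

```sql
-- Long-running transactions that may be holding conflicting locks
SELECT pid, usename, state,
       now() - xact_start AS xact_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND xact_start < now() - interval '5 minutes'
ORDER BY xact_start;
```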
ReplicationSlotError
Message: replication slot error: <details>
Description: An error occurred with the logical replication slot used for WAL-based CDC.
Common causes:
- Replication slot dropped externally
- wal_level changed from logical to a lower level
- Slot lag exceeded max_slot_wal_keep_size
Suggested fix: Check replication slot status with
SELECT * FROM pg_replication_slots. Ensure wal_level = logical. If the slot
was dropped, pg_trickle will recreate it automatically. See
Configuration — WAL CDC.
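A more targeted inspection query against PostgreSQL's pg_replication_slots catalog (the wal_status and safe_wal_size columns exist on PostgreSQL 13 and later):

```sql
-- Is the slot active, and how close is it to being invalidated?
SELECT slot_name, plugin, active, restart_lsn, wal_status, safe_wal_size
FROM pg_replication_slots
WHERE slot_type = 'logical';

SHOW wal_level;  -- must be 'logical' for WAL-based CDC
```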
WalTransitionError
Message: WAL transition error: <details>
Description: An error occurred during the transition from trigger-based CDC to WAL-based CDC.
Common causes:
- wal_level is not logical when cdc_mode = 'auto'
- Transient connection issues during the transition
Suggested fix: Ensure wal_level = logical in postgresql.conf if you
want WAL-based CDC. Otherwise set pg_trickle.cdc_mode = 'trigger' to stay
on trigger-based CDC. This error is retried automatically.
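Both remedies as SQL — note that changing wal_level always requires a server restart, and whether pg_trickle.cdc_mode applies on reload is an assumption to verify against the Configuration reference:

```sql
-- Option A: enable logical WAL so auto mode can transition to WAL-based CDC
ALTER SYSTEM SET wal_level = 'logical';
-- ...restart PostgreSQL, then confirm:
SHOW wal_level;

-- Option B: pin trigger-based CDC if logical WAL is not an option
ALTER SYSTEM SET pg_trickle.cdc_mode = 'trigger';
SELECT pg_reload_conf();
```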
SpiError
Message: SPI error: <details>
Description: A PostgreSQL Server Programming Interface (SPI) error occurred during an internal query.
Common causes:
- Transient serialization failures under high concurrency
- Deadlocks between refresh and concurrent DML
- Connection issues in background workers
- Permanent errors: missing columns, syntax errors in generated SQL
Suggested fix: Transient SPI errors (deadlocks, serialization failures) are
retried automatically. Permanent errors (permission denied, missing objects)
will suspend the stream table after max_consecutive_errors failures. Check
pgtrickle.check_health() for details.
SpiPermissionError
Message: SPI permission error: <details>
Description: The background worker's role lacks required permissions.
Common causes:
- Missing SELECT privilege on a source table
- Missing INSERT/UPDATE/DELETE privilege on the stream table
- Role used by the background worker is not the table owner
Suggested fix: Grant the necessary privileges to the role running pg_trickle's background workers:
GRANT SELECT ON source_table TO pgtrickle_role;
GRANT ALL ON pgtrickle.my_stream_table TO pgtrickle_role;
This error does not count toward the consecutive error suspension limit.
Watermark Errors
WatermarkBackwardMovement
Message: watermark moved backward: <details>
Description: A watermark advancement was rejected because the new value is older than the current watermark, violating monotonicity.
Common causes:
- Clock skew in distributed systems
- Manual watermark manipulation with an incorrect value
- Bug in watermark tracking logic
Suggested fix: Ensure watermark values are monotonically increasing. Check
the current watermark with pgtrickle.get_watermark_groups().
WatermarkGroupNotFound
Message: watermark group not found: <details>
Description: The specified watermark group does not exist.
Common causes:
- Typo in the watermark group name
- The group was deleted or never created
Suggested fix: List existing groups with
pgtrickle.get_watermark_groups().
WatermarkGroupAlreadyExists
Message: watermark group already exists: <details>
Description: A watermark group with this name already exists.
Common causes:
- Re-running a setup script without idempotent guards
Suggested fix: Use a different name or delete the existing group first.
Transient Errors
RefreshSkipped
Message: refresh skipped: <details>
Description: A refresh was skipped because a previous refresh for the same stream table is still running.
Common causes:
- Slow refresh (large delta or complex query) overlapping with the next scheduled cycle
- Multiple manual refresh_stream_table() calls in parallel
Suggested fix: No action needed — the scheduler will retry on the next
cycle. If this happens frequently, increase the schedule interval or
investigate why refreshes are slow using pgtrickle.explain_st().
This error does not count toward the consecutive error suspension limit.
Internal Errors
InternalError
Message: internal error: <details>
Description: An unexpected internal error that indicates a bug in pg_trickle.
Common causes:
- This should not happen in normal operation
Suggested fix: Please report the issue
with the full error message, your PostgreSQL version, and pg_trickle version.
Include the output of pgtrickle.check_health() and the relevant PostgreSQL
log entries.
See Also
Changelog
What's new in pg_trickle — written for everyone, not just developers.
For future plans and upcoming features, see ROADMAP.md.
Table of Contents
- Unreleased
- 0.20.0 — Dog Feeding
- 0.19.0 — 2026-04-13
- 0.18.0 — 2026-04-12
- 0.17.0 — 2026-04-08
- 0.16.0 — 2026-04-06
- 0.15.0 — 2026-04-03
- 0.14.0 — 2026-04-02
- 0.13.0 — 2026-03-31
- 0.12.0 — 2026-03-28
- 0.11.0 — 2026-03-26
- 0.10.0 — 2026-03-25
- 0.9.0 — 2026-03-20
- 0.8.0 — 2026-03-17
- 0.7.0 — 2026-03-16
- 0.6.0 — 2026-03-14
- 0.5.0 — 2026-03-13
- 0.4.0 — 2026-03-12
- 0.3.0 — 2026-03-11
- 0.2.3 — 2026-03-09
- 0.2.2 — 2026-03-08
- 0.2.1 — 2026-03-05
- 0.2.0 — 2026-03-04
- 0.1.3 — 2026-03-02
- 0.1.2 — 2026-02-28
- 0.1.1 — 2026-02-26
- 0.1.0 — 2026-02-26
[Unreleased]
[0.20.0] — Dog Feeding
pg_trickle now monitors itself. Instead of you having to check on
pg_trickle's health manually, this release lets pg_trickle watch its own
performance, spot problems early, and even fix some of them on its own. Five
new stream tables sit in the pgtrickle schema and continuously analyse
refresh history — the same technology you use for your own data, pointed
inward. One SQL call sets everything up; one call tears it down.
We call this dog feeding — pg_trickle uses its own stream-table technology to keep an eye on itself, just like it keeps your data views up to date.
What's new
- One-click self-monitoring — run SELECT pgtrickle.setup_dog_feeding() and pg_trickle creates five monitoring stream tables that continuously track how well it is performing. Run teardown_dog_feeding() to remove them. Both are idempotent — safe to call as many times as you like, even during rolling upgrades.
- Health at a glance — the new dog_feeding_status() function shows the status of all five monitoring views in one query: whether each one exists, its refresh mode, and the last time it refreshed. Quick to run from a monitoring script or dashboard.
- Threshold recommendations — after enough refresh cycles accumulate (typically 10–20 minutes of activity), df_threshold_advice starts producing suggestions for each stream table. Each recommendation includes a confidence level (HIGH / MEDIUM / LOW) and a reason — for example, "DIFF is 73% faster — raise threshold to allow more DIFF". A sla_headroom_pct column shows exactly how much faster incremental refresh is versus full refresh for that table.
- Automatic tuning — set pg_trickle.dog_feeding_auto_apply = 'threshold_only' and pg_trickle will apply HIGH-confidence threshold recommendations automatically. Changes are rate-limited to once per 10 minutes per stream table, and every adjustment is logged to pgt_refresh_history with initiated_by = 'DOG_FEED' so you have a full audit trail.
- Real-time alerts — when pg_trickle detects an anomaly (duration spike exceeding 3× the baseline, or two or more recent failures), it sends a NOTIFY on the pgtrickle_alert channel with a JSON payload. Your application, Alertmanager webhook, or LISTEN client can act immediately without polling.
- Scheduling interference detection — df_scheduling_interference tracks pairs of stream tables that consistently overlap during refresh. When overlap is heavy, the scheduler automatically backs off its poll interval (up to 2× the configured base) to reduce contention.
- Visual dependency graph — the new explain_dag() function renders your full refresh pipeline as a Mermaid or Graphviz DOT diagram. User stream tables appear in blue, dog-feeding tables in green, suspended tables in red. Paste the output into any Mermaid renderer or dot to see exactly how your tables depend on each other.
- Scheduler overhead report — scheduler_overhead() returns metrics for the last hour: total refreshes, how many were dog-feeding, the fraction they represent, and average durations. Useful for confirming that self-monitoring adds negligible cost.
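Consuming the alerts is plain PostgreSQL LISTEN/NOTIFY. From psql, for instance (the channel name is pgtrickle_alert as above; the exact JSON payload fields are not specified here):

```sql
-- Subscribe for the lifetime of this session
LISTEN pgtrickle_alert;

-- Notifications arrive asynchronously; psql prints them after the next
-- command completes. The payload is a JSON document describing the anomaly.
```

A long-lived application connection (or an Alertmanager webhook bridge) would hold the LISTEN open and react to each payload as it arrives.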
What pg_trickle watches
| Monitoring view | What it tracks |
|---|---|
| df_efficiency_rolling | Rolling-window refresh speed, change ratio, DIFF vs FULL counts |
| df_anomaly_signals | Duration spikes (> 3× baseline), error bursts, mode oscillation |
| df_threshold_advice | Per-table threshold recommendations with confidence level and reasoning |
| df_cdc_buffer_trends | Change-capture buffer growth rate per source table; alerts on burst spikes |
| df_scheduling_interference | Refresh overlap patterns; pairs with 3+ concurrent refreshes in the last hour |
Faster and more reliable
- A new index on pgt_refresh_history(pgt_id, start_time) speeds up all dog-feeding queries and general history lookups. Applied automatically during the 0.19.0 → 0.20.0 upgrade.
- Old history records are now pruned in batches of 1,000 rows per transaction (previously one large DELETE), which avoids long lock holds on pgt_refresh_history during the nightly cleanup.
- check_cdc_health() is enriched with spill-risk alerts: if a source table's max burst delta exceeds 10× its average, you get an early warning before the buffer fills.
- explain_st() now shows two new properties: dog_feeding_coverage (none / partial / full) and recommended_refresh_mode, so diagnostics automatically surface self-monitoring data when it is available.
New documentation and tooling
- SQL Reference — a new "Dog Feeding — Self-Monitoring" section covers all five stream tables, setup_dog_feeding(), teardown_dog_feeding(), confidence levels, and the sla_headroom_pct column.
- Getting Started — a new "Day 2 Operations" section walks through enabling dog-feeding, reading recommendations, enabling auto-apply, and visualising the DAG.
- Configuration — pg_trickle.dog_feeding_auto_apply is fully documented with values, rate-limiting behaviour, and the audit trail.
- A ready-made Grafana dashboard (pg_trickle_dog_feeding.json) with five panels covers refresh throughput, anomaly heatmap, threshold calibration, CDC buffer growth, and the scheduling interference matrix.
- A dbt macro (pgtrickle_enable_monitoring) enables monitoring as a post-hook with one line in dbt_project.yml.
- A quick-start SQL script at sql/dog_feeding_setup.sql walks through setup, auto-apply, alert listening, and status verification in six steps.
[0.19.0] — 2026-04-13
Safer, faster, easier to operate. This release closes several security and correctness gaps, adds new conveniences for operators and developers, and significantly improves performance for deployments with many stream tables. The background scheduler finds the next table to refresh 10–15× faster. Four breaking changes are included — all easy to adapt to, each one correcting behaviour that was a source of subtle bugs in production.
Breaking changes
- Only owners can modify their own stream tables — other database users can no longer drop or alter a stream table they did not create. If shared access is intentional, grant superuser or explicitly add the user as owner. Superusers are unaffected.
- Dropping a stream table no longer cascades — drop_stream_table() now behaves like PostgreSQL's own DROP TABLE: it refuses to drop if dependent objects exist, unless you pass cascade => true explicitly. Previously it silently removed all dependents, which surprised operators after restructuring.
- The refresh notification channel was renamed — change LISTEN pgtrickle_refresh to LISTEN pg_trickle_refresh (note the added underscore). The old name was inconsistent with every other channel in the extension.
- The delete_insert refresh strategy was removed — this strategy could produce wrong results for queries containing aggregates or DISTINCT. If you had it configured, pg_trickle logs a warning and automatically switches to the safe auto strategy. No data is lost; the next refresh corrects any affected rows.
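Adapting to the first three changes is mechanical. A sketch ('daily_totals' is a placeholder name):

```sql
-- Cascading drops must now be requested explicitly
SELECT pgtrickle.drop_stream_table('daily_totals', cascade => true);

-- Update any LISTEN clients to the renamed channel
LISTEN pg_trickle_refresh;   -- was: LISTEN pgtrickle_refresh
```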
New features
- Installation health check — version_check() returns the installed extension version, the loaded library version, and the PostgreSQL server version in one row. If the extension was upgraded but the server has not been restarted, you get an explicit warning. Useful in deploy scripts and smoke tests.
- Write and refresh in one step — write_and_refresh(sql, st_name) executes an arbitrary SQL statement and immediately refreshes the named stream table in the same transaction. Downstream readers see consistent results as soon as the transaction commits — no polling loop needed.
- Better connection-pooler support — the new pg_trickle.connection_pooler_mode GUC configures pg_trickle for PgBouncer, pgcat, or Supavisor at the cluster level. Previously each stream table had to be configured individually, which was error-prone on large deployments.
- Automatic refresh history cleanup — pgt_refresh_history is now trimmed automatically after 90 days (configurable with pg_trickle.history_retention_days; set to 0 to disable). Without this, the history table could grow by thousands of rows per day on busy deployments.
- Schema migration tracking — pg_trickle now records which upgrade scripts have been applied in pgtrickle.pgt_schema_version. This makes it straightforward to verify that a deployment is fully up to date and simplifies the rollback story.
- Clearer skip messages — when a refresh is skipped because another refresh of the same stream table is already running, you now see a NOTICE: skipping refresh of <name> — already running message instead of silence. Reduces confusion when debugging slow or stuck schedulers.
- Deeper diagnostics — explain_st() gains a with_analyze parameter. When set to true, it runs EXPLAIN (ANALYZE, BUFFERS) on the refresh query and returns actual row counts, timing, and buffer hit/miss ratios — the same information PostgreSQL's query planner provides for any query, but surfaced inside the stream-table diagnostic tool.
- New deployment guides — step-by-step documentation for PgBouncer, pgcat, Supavisor, CNPG, and Kubernetes deployments, plus an operational runbook for common Kubernetes failure modes.
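The write-and-refresh feature composes with an explicit transaction. A sketch using the two-argument form from the release note ('order_stats' is a placeholder stream table):

```sql
-- Load a batch and expose the derived results atomically: 'order_stats'
-- is already fresh when the transaction commits.
BEGIN;
SELECT pgtrickle.write_and_refresh(
  'INSERT INTO orders (id, status) VALUES (43, ''active'')',
  'order_stats'
);
COMMIT;
```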
Bug fixes
- Fixed a constraint-validation inconsistency in databases upgraded from 0.11.0 or earlier where pgt_refresh_history had a duplicate check entry in the catalog. Affected databases could see spurious constraint errors on busy write paths.
- Error messages throughout the extension now show human-readable table names (e.g. public.orders) instead of raw PostgreSQL OIDs. This affects "source table was dropped", "schema changed", and several other error paths that were previously unreadable without a catalog lookup.
Performance
- 10–15× faster scheduler dispatch — the scheduler now finds the next stream table to process with a direct lookup instead of scanning the full list on every poll cycle. On a deployment with 500 stream tables this drops from ~650 µs to ~45 µs per poll, reducing background CPU overhead significantly at scale.
- Single-query change detection — when the scheduler checks whether any source tables have changed, it now issues one query covering all sources at once instead of one query per source table. On deployments with 50+ source tables this meaningfully reduces the overhead of each scheduler cycle, especially under PgBouncer transaction pooling.
[0.18.0] — 2026-04-12
Hardening & Delta Performance. This release focuses on correctness,
reliability, and giving operators better visibility into what pg_trickle is
doing. Stream tables that group by columns containing NULL values now refresh
correctly in all cases. A new memory safety net prevents runaway refreshes
from consuming too much RAM. Error messages across the board now explain what
went wrong and suggest how to fix it. Two new SQL functions —
health_summary() and cache_stats() — give you a single-query overview of
the entire system, and updated Grafana dashboards make monitoring plug-and-play.
The TPC-H industry benchmark now runs as a nightly regression guard, and
property-based tests mathematically verify the core delta engine's arithmetic.
Highlights
- NULL values in GROUP BY now handled correctly — previous versions could produce wrong results when a stream table grouped by a column that contained NULL values and rows were deleted. The root cause was that NULL group keys broke the internal row-matching logic. This is now fixed: NULL keys are matched correctly during both inserts and deletes, so aggregate stream tables always return the right answer regardless of NULLs in the data.
- Memory safety net for large deltas — if an unexpectedly large batch of changes arrives (for example, a bulk import into a source table), the incremental refresh could previously consume unbounded memory. A new configuration option (pg_trickle.delta_work_mem_cap_mb) lets you set a ceiling. When a refresh would exceed it, pg_trickle automatically falls back to a full refresh instead of risking an out-of-memory crash.
- Early warning when refreshes spill to disk — when the incremental refresh engine runs low on memory, PostgreSQL may spill intermediate data to temporary files on disk, which is much slower. pg_trickle now detects this and sends a notification so you can investigate before performance degrades. If spilling happens repeatedly, the scheduler automatically switches the affected stream table to full refresh.
- One-query system health check — the new pgtrickle.health_summary() function returns a single row with everything you need at a glance: how many stream tables are active, how many are in error or suspended state, the worst staleness across all tables, whether the scheduler is running, and the overall cache hit rate. Perfect for dashboards, alerting rules, or a quick manual check.
- Cache performance visibility — the new pgtrickle.cache_stats() function shows how effectively pg_trickle is reusing its internal query templates. You can see cache hit rates, eviction counts, and current cache size — useful for tuning pg_trickle.template_cache_size on busy systems.
- Better error messages — every error pg_trickle can raise now includes a standard PostgreSQL error code (SQLSTATE), a DETAIL line explaining the context, and a HINT suggesting what to do. Instead of a cryptic internal error, you get actionable guidance like "Table 'orders' was dropped while stream table 'order_summary' depends on it — recreate the source table or drop the stream table."
Monitoring & dashboards
- Updated Grafana dashboards — the bundled pg_trickle_overview.json dashboard now includes panels for template cache hit rate, P99 and average refresh latency, hourly refresh success/failure counts, and cache eviction trends. Import it into Grafana and point it at your Prometheus instance for instant visibility.
- Prometheus metric documentation — all 8 new metrics exposed by cache_stats() and health_summary() are now fully documented in the monitoring guide, with ready-to-use PromQL queries.
Correctness & testing
- TPC-H regression guard — all 22 queries from the TPC-H industry benchmark now run nightly against known-good expected output. If a code change causes any query to return different results, CI fails immediately. This catches subtle correctness regressions that targeted tests might miss.
- Property-based verification of delta arithmetic — 6 property-based tests (2,000 random cases each) verify that the core engine's insert/delete accounting is correct: operations compose in the right order, groups cancel out properly, and no phantom rows appear after mixed workloads. An additional 4 end-to-end property tests exercise the full pipeline from change capture through to the final merged result.
- CDC edge case coverage — new tests cover composite primary keys, generated (computed) columns, NULL values in non-key columns, and domain types — real-world schema patterns that were previously untested.
- dbt integration tests — the dbt adapter now has regression tests for AUTO refresh mode, stream table health checks, and refresh history lifecycle — ensuring the dbt workflow stays reliable across releases.
Scalability
- Scaling guide — a new docs/SCALING.md document covers how to configure pg_trickle for large deployments (200+ stream tables), including worker pool sizing, tiered scheduling, per-database quotas, and tuning profiles for different workload types.
- Buffer growth stress tests — new tests verify that the max_buffer_rows safety limit works correctly under sustained high write rates, including automatic recovery back to incremental refresh after a burst subsides.
Testing infrastructure
- Faster CI on pull requests — 19 additional test files (~197 tests) were moved to the lightweight test runner that does not require building a custom Docker image. Pull request CI is now faster without sacrificing coverage.
- Upgrade path tested — the full upgrade chain from version 0.1.3 through every release up to 0.18.0 is verified automatically in CI, including function availability, schema integrity, and data survival.
Fixed
- Upgrade script completeness — the 0.17.0 → 0.18.0 upgrade migration now includes all new and changed functions (pg_trickle_hash, cache_stats(), health_summary()), so ALTER EXTENSION pg_trickle UPDATE works correctly.
[0.17.0] — 2026-04-08
Query Intelligence & Stability. This release teaches pg_trickle to make smarter decisions about how to refresh each stream table, reduces unnecessary work when only a handful of columns actually changed, and proves correctness through 10,000 automated random mutations every night. Large deployments with hundreds of stream tables now handle schema changes much faster. Alongside these improvements, three new documentation resources make it easier to get started, troubleshoot problems, and migrate from pg_ivm.
Highlights
-
Query-aware refresh decisions — pg_trickle previously used a fixed threshold to decide between incremental and full refresh: if more than 50% of rows changed, switch to full. That works for simple queries but is poorly calibrated for joins or aggregates. The engine now classifies each query by its complexity (simple scan, filter, aggregate, join, or join+aggregate) and weights the cost estimate accordingly. Simple queries stay incremental even at high change rates; expensive join-heavy queries switch to full refresh sooner when the data is largely different. You can also pin a table to always use one strategy with the new
pg_trickle.refresh_strategysetting ('auto'/'differential'/'full'), or tune the aggressiveness withpg_trickle.cost_model_safety_margin. -
Skip columns that did not change — when a row is updated in a wide source table (say, 50 columns) but only 2 columns that the stream table actually uses are modified, pg_trickle previously processed the full change anyway. It now tracks exactly which columns were modified and skips updates that touch none of the relevant columns. For aggregate stream tables the savings go further: a value-only update that does not affect group membership is applied as a single lightweight correction instead of a delete-then-insert pair. On write-heavy workloads with wide tables, this reduces the volume of data flowing through the refresh pipeline by 50–90%.
-
- Faster schema changes on large deployments — every time you create, alter, or drop a stream table, pg_trickle previously rebuilt the entire internal dependency graph from scratch. With 100 stream tables that takes only a few milliseconds, but at 1,000 it becomes noticeable. The graph is now updated incrementally — only the affected edges are touched, leaving everything else in place. At 1,000 stream tables the rebuild time drops from ~600 µs to ~116 µs and no longer scales with the total number of tables in the database.
- Nightly correctness oracle — a new automated test runs 10,000 random data mutations every night against a broad set of query shapes. For each mutation it compares the result of incremental refresh against a full recompute and fails if they ever disagree. This catches subtle correctness bugs that only surface after unusual sequences of inserts, updates, and deletes — the kind that hand-written tests rarely reach.
- `ROWS FROM()` fully supported — queries that use `ROWS FROM()` to call multiple set-returning functions side-by-side are now fully supported in incremental mode, including updates and deletes. This was previously restricted to insert-only workloads.
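A stream table over a `ROWS FROM()` query might look like the following sketch — the table and column names are illustrative, and only the `create_stream_table()` parameters shown in the README are assumed:

```sql
-- Pair up two set-returning functions positionally with ROWS FROM();
-- updates and deletes on the source are now maintained incrementally.
-- The orders schema (jsonb columns items/quantities) is a made-up example.
SELECT pgtrickle.create_stream_table(
  name  => 'expanded_orders',
  query => $$
    SELECT o.id, r.item, r.qty
    FROM orders o,
         ROWS FROM (jsonb_array_elements_text(o.items),
                    jsonb_array_elements_text(o.quantities)) AS r(item, qty)
  $$,
  schedule => '30s'
);
```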
New documentation
- Try it in 60 seconds — a new `playground/` directory contains a `docker compose up` environment with PostgreSQL 18 + pg_trickle pre-wired, sample data loaded, and five stream tables ready to query. No installation required beyond Docker.
- Troubleshooting runbook — `docs/TROUBLESHOOTING.md` covers 13 real-world failure scenarios: scheduler not running, stream table stuck in SUSPENDED state, CDC triggers missing, WAL slot problems, out-of-memory, disk full, circular dependency convergence issues, unexpected schema changes, worker pool exhaustion, and blown fuses. Each scenario lists symptoms, diagnostic queries, and step-by-step resolution.
- Migrating from pg_ivm — `docs/tutorials/MIGRATING_FROM_PG_IVM.md` is a step-by-step guide for teams moving from the pg_ivm extension. It maps every pg_ivm API to its pg_trickle equivalent, explains behavioral differences, and includes ready-to-run SQL examples and a post-migration verification checklist.
- New user FAQ — the top 15 common questions are now answered at the top of `docs/FAQ.md` so new users find answers before scrolling through the full document.
- Post-install verification script — `scripts/verify_install.sql` walks through the complete setup: checks that pg_trickle is loaded, creates a test stream table, runs a refresh, verifies the result, and cleans up. Useful for confirming a fresh installation or diagnosing environment issues.
Stability & code quality
- Safer internal code — the number of `unsafe` Rust blocks in the query parser was reduced from 690 to 441 (a 36% drop) by introducing two helper macros that wrap the most common unsafe patterns. No behavior change; this makes the codebase easier to audit and maintain.
- Cleaner internal structure — the largest source file (`api.rs`, ~9,400 lines) was split into three focused modules. This has no user-visible effect but makes the codebase significantly easier to work with and reduces the risk of regressions from unrelated code being in the same file.
- Refresh logic extracted and tested — seven functions responsible for building the SQL used during refresh were extracted into standalone testable units and covered with 29 new unit tests. This catches regressions in generated SQL templates before they reach production.
[0.16.0] — 2026-04-06
Performance & Refresh Optimization. This release makes stream table
refreshes significantly faster across the board. Small changes to large
tables are now applied without expensive full-table scans. Tables that only
receive new rows (no updates or deletes) use a streamlined path that skips
unnecessary work. Aggregate queries like SUM and COUNT are refreshed
with pinpoint updates instead of recalculating entire groups. A new template
cache eliminates repeated startup work when database connections are recycled.
An automated benchmark system now prevents future changes from accidentally
slowing things down.
Highlights
- Smarter refresh for small changes — when only a handful of rows change in a large stream table (less than 1% of total rows), pg_trickle now uses a faster strategy that skips the full-table comparison. This can reduce refresh time by up to 40% for common workloads where most data stays the same between refreshes. The system picks the best strategy automatically, but you can override it via the `merge_strategy` setting.
- Insert-only fast path — stream tables backed by append-only data sources (like event logs or audit trails that never update or delete rows) are now detected automatically and refreshed using a much simpler, faster path. No configuration is needed — pg_trickle observes your data patterns and switches to the fast path on its own. If an update or delete is later detected, it safely falls back to the standard approach with a warning.
- Faster aggregate refreshes — stream tables that use `SUM`, `COUNT`, `AVG`, or `STDDEV` aggregates now update individual groups directly instead of re-joining against the entire table. For queries with many distinct groups, this can be 5–20× faster. Non-invertible aggregates like `MIN`, `MAX`, and `STRING_AGG` continue using the standard path.
- Template cache for faster cold starts — the first time a database connection refreshes a stream table, pg_trickle normally spends ~45 ms preparing the refresh query. A new cross-connection cache stores these prepared queries so that subsequent connections (including those from connection poolers like PgBouncer) start refreshing in about 1 ms instead.
- Automated performance regression checks — every code change to pg_trickle is now automatically benchmarked before it can be merged. If any operation slows down by more than 10%, the change is blocked until the regression is fixed. This protects users from accidental performance degradation in future releases.
New features
- Error reference guide — a new error reference page documents every error message pg_trickle can produce, explains what caused it, and suggests how to fix it. Useful when troubleshooting unexpected behavior in production.
- Change buffer growth protection — if a stream table's refresh keeps failing, the backlog of unprocessed changes could previously grow without limit, consuming disk space. A new `max_buffer_rows` setting (default: 1,000,000 rows) caps this growth. When the limit is reached, pg_trickle performs a full refresh to clear the backlog and warns you about the situation.
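A minimal sketch of tightening the cap — this assumes the setting lives under the extension's usual `pg_trickle.` GUC prefix, like the other settings in this changelog:

```sql
-- Cap the unprocessed-change backlog at 250k rows instead of the
-- 1,000,000-row default; beyond this, pg_trickle clears the backlog
-- with a full refresh and emits a warning.
SET pg_trickle.max_buffer_rows = 250000;
```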
- Automatic index creation control — pg_trickle has always created helpful indexes on stream tables automatically. A new `auto_index` setting lets you disable this behavior when you want full control over indexing. Stream tables using `SELECT DISTINCT` now also get an automatic index on their distinct columns.
- Compaction and predicate pushdown stats — the `explain_st()` diagnostics function now shows additional information about change buffer compaction thresholds, merge strategy selection, append-only mode, aggregate fast-path status, and template cache hit rates.
Improved
- Configuration guidance — the documentation now includes detailed tuning advice for the `planner_aggressive` and `cleanup_use_truncate` settings, especially for environments using connection poolers like PgBouncer or running under memory pressure.
- Terminal dashboard improvements — the `pgtrickle` TUI dashboard now shows the effective refresh mode for each stream table (e.g., when a table is temporarily downgraded from differential to full refresh). The Alerts tab has been restructured with a clearer table layout and better distinction between "stale data" and "no upstream changes" conditions.
Fixed
- Append-only detection with chained stream tables — stream tables that feed into other stream tables (cascading dependencies) now correctly skip the append-only fast path to avoid data inconsistencies. Previously, a chained stream table could incorrectly use the insert-only path even when downstream tables needed the full change set.
- Append-only heuristic accuracy — the automatic detection of insert-only data sources now also checks the stream table's own change buffer for non-insert operations, avoiding false positives.
- Full refresh fallback for mixed changes — when both a stream table and its source table have pending changes in the same refresh cycle, pg_trickle now correctly falls back to a full refresh to avoid inconsistencies.
- `resume_stream_table()` confirmed working — the function referenced in error messages when a stream table enters `SUSPENDED` state was verified to exist and work correctly (present since v0.2.0).
Testing & quality
- 13 new end-to-end tests covering JOIN correctness across update/delete cycles, window function differential behavior, differential-vs-full equivalence validation, and source table schema evolution resilience.
- 5 new benchmark scenarios covering semi-joins, anti-joins, multi-table join chains, and aggregate queries at varying group counts. Total: 22 benchmark functions.
- 1,700 unit tests pass (up from 1,630 in v0.15.0).
[0.15.0] — 2026-04-03
0.15.0 brings the terminal dashboard to full operational capability, adds safety features that protect against runaway refreshes, and broadens the ecosystem with guides for popular migration and ORM frameworks. It also includes a major internal refactoring of the query parser and a new streaming benchmark suite.
Highlights
- Interactive terminal dashboard — the `pgtrickle` TUI is no longer read-only. Refresh, pause, resume, and repair stream tables directly from the dashboard. A command palette (`:`) with fuzzy search makes common operations fast. The poller reconnects automatically after network interruptions.
- Bulk creation — `pgtrickle.bulk_create()` creates many stream tables in a single atomic transaction, ideal for CI/CD and dbt pipelines.
- Runaway-refresh protection — two new safety nets prevent expensive merges from spiralling: a pre-flight row-count estimate that downgrades to FULL refresh when deltas are too large (`max_delta_estimate_rows`), and a spill detector that forces FULL refresh after repeated temp-file writes (`spill_threshold_blocks`).
- Stuck-watermark alerting — if an upstream ETL pipeline stops advancing its watermark, pg_trickle now pauses affected stream tables and sends a `watermark_stuck` notification so the issue is surfaced immediately rather than silently producing stale data.
- Integration guides — new documentation for Flyway, Liquibase, SQLAlchemy, Django, and dbt Hub helps teams adopt pg_trickle alongside their existing tooling.
New Features
- Volatile function policy — a new `volatile_function_policy` setting lets you choose whether volatile functions (like `random()` or `clock_timestamp()`) should be rejected (the default), allowed with a warning, or allowed silently when creating stream tables.
- Bulk create API — `pgtrickle.bulk_create(definitions)` accepts a JSON array of stream table definitions and creates them all in one transaction. If any definition fails, the entire batch is rolled back.
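A sketch of a two-table batch — assuming each JSON object mirrors the `create_stream_table()` parameters shown in the README (`name`, `query`, `schedule`); the exact key names are defined by the API documentation:

```sql
-- Create two stream tables atomically; if either definition fails,
-- neither is created. Table names and queries are examples.
SELECT pgtrickle.bulk_create('[
  {"name": "active_orders",
   "query": "SELECT * FROM orders WHERE status = ''active''",
   "schedule": "30s"},
  {"name": "order_totals",
   "query": "SELECT customer_id, sum(amount) AS total FROM orders GROUP BY customer_id",
   "schedule": "1m"}
]');
```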
- Enhanced diagnostics — `pgtrickle.explain_st()` now shows refresh timing statistics (min/max/average duration), partition info for partitioned source tables, and a dependency graph you can render with Graphviz.
- Join strategy override — the `merge_join_strategy` setting lets you force a specific join method (`hash_join`, `nested_loop`, or `merge_join`) during delta merges, which can help when the automatic heuristic doesn't suit your workload.
- Pre-flight delta estimation — when `max_delta_estimate_rows` is set, pg_trickle counts the delta rows before merging. If the count exceeds the limit, it falls back to a FULL refresh and logs a notice, preventing out-of-memory conditions on unexpectedly large change sets.
- Spill-aware refresh — if differential merges spill to disk repeatedly (controlled by `spill_threshold_blocks` and `spill_consecutive_limit`), the scheduler switches to FULL refresh automatically.
- Stuck watermark hold-back — the `watermark_holdback_timeout` setting detects watermarks that have not advanced within a configurable window. Downstream stream tables are paused and a `watermark_stuck` notification is emitted until the watermark advances again.
- Cascade drop — `drop_stream_table()` now accepts an optional `cascade` parameter (default `true`). Setting it to `false` raises an error if dependent stream tables exist, matching PostgreSQL's RESTRICT behavior.
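For example (the stream table name is illustrative):

```sql
-- RESTRICT-style drop: raises an error if any downstream
-- stream table still depends on active_orders
SELECT pgtrickle.drop_stream_table('active_orders', cascade => false);
```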
- Nexmark benchmark suite — a 10-query streaming benchmark (modelled on an online auction system) validates correctness under sustained high-frequency inserts, updates, and deletes.
- 17 new end-to-end tests — 7 tests for multi-level stream-table chains (3- and 4-level cascades with mixed refresh modes) and 10 tests for diamond/fan-in topologies with IMMEDIATE mode. No deadlocks were found.
Terminal Dashboard (TUI)
- Write actions — refresh, pause, resume, repair, reset fuse, and gate/ungate operations can now be performed without leaving the dashboard.
- Command palette — press `:` for fuzzy-matched command entry with tab-completion.
- Automatic reconnection — the dashboard reconnects with exponential back-off (up to 15 s) after a connection loss, with a visual indicator.
- Richer views — all 14 views now show additional live data (diagnostics, CDC health, refresh history with row-delta counts, error remediation hints, dependency-graph annotations, worker queue status, and watermark alignment).
- Cross-view filtering — the `/` search filter now persists across all 10 list views.
- Navigation re-fetch — moving between rows in the Detail view immediately fetches fresh data for the selected table.
- Toast messages — write actions show confirmation and error toasts.
- Sort cycling — press `s`/`S` on the Dashboard to cycle through 6 sort modes.
- Mouse support — `--mouse` enables scroll-wheel navigation.
- Theme toggle — `t` or `--theme dark|light` switches colour themes.
- JSON export — `Ctrl+E` or `:export` writes the current view to a file.
- TLS support — `--sslmode` and `--sslrootcert` flags.
Documentation & Ecosystem
- Flyway / Liquibase guide — migration patterns for versioned and repeatable migrations, rollback blocks, and CI environments.
- SQLAlchemy / Django guide — read-only model patterns, write-blocking safeguards, DRF viewsets, and freshness checking.
- dbt Hub readiness — the `dbt-pgtrickle` package is version-synced and ready for dbt Hub submission.
- Kubernetes / CNPG — updated probe configuration and a new deployment section in the Getting Started guide.
- Full documentation review — configuration reference expanded from 23 to 40+ settings, missing SQL reference entries filled in, outdated FAQ answers corrected.
Internal Improvements
- Parser modularisation — the 21,000-line query parser has been split into 5 focused sub-modules (`types`, `validation`, `rewrites`, `sublinks`, and the main entry point). No behavior change — all 1,687 unit tests pass.
- Unsafe audit — every `unsafe` block in the codebase (~750 total) now has a `// SAFETY:` comment explaining why it is sound.
- Shared-memory cache RFC — an RFC for a DSM-based MERGE template cache has been written, informing the v0.16.0 implementation plan.
- TRUNCATE handling verified — TRUNCATE on source tables in trigger CDC mode already triggers a FULL refresh; this is now documented.
- JOIN key-change fix verified — the v0.14.0 correctness fix for simultaneous JOIN key updates and DELETEs has been verified working and the former known-limitation note replaced with a description of the fix.
Bug Fixes
- Fixed a panic in the TUI when deserializing health-check data that returned 64-bit integers where 32-bit was expected.
- Fixed spurious "Error: db error" toasts in the TUI Detail view — background queries now degrade silently instead of surfacing transient errors.
- Fixed incorrect integer type annotations in two E2E tests for IMMEDIATE mode diamond topologies.
[0.14.0] — 2026-04-02
0.14.0 is the Tiered Scheduling, Diagnostics & TUI release. It gives you fine-grained control over how often each stream table refreshes, adds tools that recommend the best refresh strategy for your workload, introduces a full-screen terminal dashboard for managing stream tables without SQL, and includes important security and reliability fixes.
Terminal Dashboard (TUI)
A new `pgtrickle` command-line tool lets you monitor and manage stream tables
from a terminal — no SQL required. Run it with no arguments to launch a
live-updating full-screen dashboard (think `htop` for stream tables), or use
one-shot subcommands like `pgtrickle list`, `pgtrickle status`, or
`pgtrickle refresh` for scripting and CI.
The interactive dashboard includes:
- Live overview — stream table statuses, refresh timing, and issue counts update every 2 seconds, with color-coded health indicators.
- Dependency graph — see how stream tables relate to each other in an ASCII tree view.
- Diagnostics — view refresh mode recommendations with confidence levels.
- CDC health — monitor change buffer sizes with warnings when they grow too large.
- Alert feed — real-time notification display with severity levels.
- Issue detection — automatically spots broken dependency chains, growing buffers, blown fuses, and stale data, with a persistent badge showing the issue count from any view.
- Watch mode — `pgtrickle watch` provides continuous non-interactive output suitable for log aggregation.
- Output formats — all CLI subcommands support `--format json`, `--format csv`, and human-readable table output.
See docs/TUI.md for the full user guide.
Tiered Refresh Scheduling
Stream tables can now be assigned to refresh tiers — hot, warm, cold, or frozen — to control how frequently they refresh:
- Hot (default) — refreshes at the configured interval.
- Warm — refreshes at 2× the interval.
- Cold — refreshes at 10× the interval, ideal for infrequently accessed reports.
- Frozen — pauses automatic refresh entirely until promoted back.
Assign a tier with `ALTER STREAM TABLE ... SET (tier = 'cold')`. A NOTICE is
emitted when demoting from Hot to Cold or Frozen so operators are aware of the
change in refresh frequency.
Smarter Refresh Recommendations
Two new diagnostic functions help you choose the most efficient refresh strategy for each stream table:
- `pgtrickle.recommend_refresh_mode(name)` — analyzes seven workload signals (including change frequency, timing history, query complexity, table size, index coverage, and latency patterns) and recommends FULL or DIFFERENTIAL mode with a confidence level and plain-language explanation. Useful when you're unsure which mode will be faster for a particular table.
- `pgtrickle.refresh_efficiency(name)` — shows per-table refresh performance: how many FULL vs. DIFFERENTIAL refreshes have run, average timing for each, and the speedup factor. Good for monitoring dashboards and alerting.
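Typical usage might look like this sketch — it assumes both functions return row sets (if either returns a single scalar, call it in the select list instead), and the stream table name is an example:

```sql
-- Ask the engine which refresh mode it would pick, and why
SELECT * FROM pgtrickle.recommend_refresh_mode('active_orders');

-- Then compare observed FULL vs. DIFFERENTIAL performance over time
SELECT * FROM pgtrickle.refresh_efficiency('active_orders');
```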
A new tutorial — Tuning Refresh Mode — walks through the process step by step.
Reduced Write Overhead with UNLOGGED Buffers
Enable `pg_trickle.unlogged_buffers = true` and newly created change buffer
tables will skip write-ahead logging, reducing WAL volume by roughly 30%.
This is ideal for workloads where you can tolerate a full re-sync after a
crash (the extension detects the crash and re-syncs automatically).
A utility function — `pgtrickle.convert_buffers_to_unlogged()` — converts
existing buffers in one call. Run it during a maintenance window since it
briefly locks each buffer table.
Instant Error Detection
Previously, when a stream table's refresh hit a permanent error (for example,
a function that doesn't exist for the column type), the extension would retry
several times before giving up. Now it recognizes permanent errors immediately,
sets the stream table status to ERROR with a clear error message, and
stops retrying. You can see the error at a glance in the stream_tables_info
view or the TUI dashboard, and fix it by altering the stream table's query.
Security Hardening
- CDC trigger functions now use `SECURITY DEFINER` — change-data-capture trigger functions run with the privileges of the extension owner rather than the current user, preventing privilege escalation through modified search paths.
- Explicit `SET search_path` — all CDC trigger functions now set `search_path` to `pgtrickle_changes, pg_catalog` to prevent search-path manipulation attacks.
Other Improvements
- Export definitions — `pgtrickle.export_definition(name)` exports a stream table's full configuration as reproducible SQL (`DROP` + `CREATE` + `ALTER` statements), making it easy to version-control or migrate stream table definitions between environments.
- Creation-time warnings — when creating a stream table with aggregates like `MIN`, `MAX`, or `STRING_AGG` in DIFFERENTIAL mode, a warning now suggests that FULL or AUTO mode may be more efficient. For algebraic aggregates (`SUM`/`COUNT`/`AVG`), the warning only appears when the estimated number of groups is below a configurable threshold.
- Simplified settings — the `merge_planner_hints` and `merge_work_mem_mb` settings have been consolidated into a single `planner_aggressive` switch. The old setting names are still accepted but have no effect; the new switch takes over.
- GHCR Docker image — a multi-architecture Docker image (`ghcr.io/grove/pg_trickle`) with PostgreSQL 18.3 and pg_trickle pre-installed is now published automatically on each release.
- Pre-deployment checklist — new PRE_DEPLOYMENT.md with a 10-point checklist for production deployments.
- Best-practice patterns guide — new PATTERNS.md with 6 common patterns: Bronze/Silver/Gold materialization, event sourcing, slowly-changing dimensions, high-fan-out topology, real-time dashboards, and tiered refresh strategies.
- Keyless dedup fix — replaced `MAX(col)` with `(array_agg(col))[1]` for deduplicating keyless scan results, which is more correct for non-orderable types.
Bug Fixes
- ST-on-ST differential refresh — manually refreshing a stream table that reads from another stream table now uses true incremental (DIFFERENTIAL) refresh instead of falling back to a full re-scan. This matches the behavior of the automatic scheduler and is significantly faster for large tables.
- Staleness tracking — the staleness indicator now uses the actual last refresh time instead of an internal data timestamp, making the `pg_stat_stream_tables` view more accurate.
Testing & Reliability
- Soak test — a new long-running stability test validates zero worker crashes, zero ERROR states, and stable memory usage under sustained mixed workload (configurable duration, default 10 minutes).
- Multi-database isolation test — verifies that two databases in the same PostgreSQL cluster run pg_trickle independently without interference.
- 140 TUI tests — comprehensive unit, snapshot, and interaction tests for the terminal dashboard.
- 23 mixed-object E2E tests — validates stream tables alongside regular PostgreSQL views, materialized views, and other objects.
- Scheduler race fixes — eliminated flaky test failures caused by scheduler timing races and a GUC leak between tests.
New SQL Functions
| Function | Purpose |
|---|---|
| `pgtrickle.recommend_refresh_mode(name)` | Workload-based refresh mode recommendation |
| `pgtrickle.refresh_efficiency(name)` | Per-table refresh performance metrics |
| `pgtrickle.export_definition(name)` | Export stream table as reproducible DDL |
| `pgtrickle.convert_buffers_to_unlogged()` | Convert logged change buffers to UNLOGGED |
New Settings
| Setting | Default | Purpose |
|---|---|---|
| `pg_trickle.planner_aggressive` | `true` | Consolidated switch for MERGE planner hints |
| `pg_trickle.unlogged_buffers` | `false` | Create new change buffers as UNLOGGED |
| `pg_trickle.agg_diff_cardinality_threshold` | `1000` | Warn about DIFFERENTIAL mode below this group count |
Deprecated
- `pg_trickle.merge_planner_hints` — use `pg_trickle.planner_aggressive` instead. Still accepted but ignored at runtime.
- `pg_trickle.merge_work_mem_mb` — same; use `planner_aggressive` instead.
Upgrading
Run `ALTER EXTENSION pg_trickle UPDATE;` after installing the new binaries.
The upgrade adds new catalog columns, functions, and the TUI workspace member.
No breaking changes — everything from v0.13.0 continues to work. See
UPGRADING.md for details.
[0.13.0] — 2026-03-31
0.13.0 is the Scalability Foundations release. It makes pg_trickle handle large tables, complex queries, and multi-tenant deployments much more efficiently — and it achieves a major milestone: all 22 TPC-H benchmark queries now run in incremental (DIFFERENTIAL) mode, meaning the engine no longer needs to fall back to slow full-refresh for any standard analytical query pattern.
Smarter Change Detection for Wide Tables
When you UPDATE a few columns in a large table — say, changing a status
column in a 60-column table — pg_trickle used to treat every column as
potentially changed, doing extra work to keep all downstream views up to date.
Now it knows the difference. Columns used in GROUP BY, JOIN, or WHERE clauses are "key columns"; everything else is a "value column." When only value columns change, the engine takes a shortcut: it sends a single correction row instead of a full delete-and-reinsert pair. For wide-table workloads, this can cut the volume of data processed by 50% or more.
Shared Change Buffers
If you have several stream tables watching the same source table, each one used to maintain its own private copy of the change log. That's wasteful. Now they share a single change buffer per source, and each consumer simply tracks how far it has read. The slowest reader protects the buffer for everyone.
You can see how this is working with the new `pgtrickle.shared_buffer_stats()`
function — it shows each buffer, who's reading from it, how many rows are
queued, and whether it's been automatically partitioned for performance.
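For example (assuming the function returns one row per shared buffer):

```sql
-- One row per shared change buffer: its consumers, queued row count,
-- and whether it has been auto-partitioned
SELECT * FROM pgtrickle.shared_buffer_stats();
```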
Automatic Buffer Partitioning
Set `pg_trickle.buffer_partitioning = 'auto'` and pg_trickle will start with
simple, unpartitioned change buffers. If a buffer starts accumulating a lot of
rows (high-throughput sources), it automatically converts to a partitioned
layout where old data can be removed almost instantly instead of deleting rows
one by one.
More Partitioning Options for Stream Tables
Building on the RANGE partitioning added in v0.11.0, you can now partition stream tables in three additional ways:
- Multi-column keys — partition by a combination of columns (`partition_by='region,year'`)
- LIST partitioning — for low-cardinality columns like `status` or `type` (`partition_by='LIST:status'`)
- HASH partitioning — for even distribution across a fixed number of partitions (`partition_by='HASH:customer_id:8'`)
You can also change the partition key of an existing stream table at runtime
with `alter_stream_table(partition_by => ...)` — data is preserved
automatically. If rows land in the default (catch-all) partition, a WARNING
is emitted to prompt you to add explicit partitions.
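A sketch of the new specs in use — the `partition_by` strings come from this release, but passing `partition_by` to `create_stream_table()`, the positional name argument to `alter_stream_table()`, and the table/column names are assumptions for illustration:

```sql
-- LIST partitioning on a low-cardinality column
SELECT pgtrickle.create_stream_table(
  name         => 'orders_by_status',
  query        => 'SELECT * FROM orders',
  schedule     => '1m',
  partition_by => 'LIST:status'
);

-- Re-key an existing stream table to HASH partitioning at runtime;
-- existing data is preserved automatically
SELECT pgtrickle.alter_stream_table('orders_by_status',
  partition_by => 'HASH:customer_id:8');
```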
All 22 TPC-H Queries Now Run Incrementally
The DVM (differential view maintenance) engine received its most significant set of improvements yet, targeting the complex multi-table join patterns found in standard analytical benchmarks:
- Smarter pre-image lookups — instead of reconstructing what the data looked like before a change by subtracting deltas (expensive for large tables), the engine now uses targeted index lookups that only touch the rows that actually changed.
- Predicate pushdown — WHERE conditions from the original query are now pushed into the delta computation, preventing unnecessary cross-products in multi-table joins.
- Deep-join optimizations — queries joining 5+ tables get automatic planner hints (more memory, smarter join strategies) to avoid spilling to disk.
- Scan-count-aware strategy selector — queries that exceed configurable join complexity or delta volume thresholds automatically fall back to full refresh on a per-query basis rather than failing.
The result: all 22 TPC-H queries pass at SF=0.01 in DIFFERENTIAL mode
with zero drift across 3 refresh cycles. The DIFFERENTIAL_SKIP_ALLOWLIST
(queries that previously required full refresh) is now empty.
Refresh Performance Inspection Tools
Two new functions help you understand what pg_trickle is doing under the hood:
- `pgtrickle.explain_delta(name, format)` — shows you the query plan for the auto-generated delta SQL, the same way `EXPLAIN` works for regular queries. Available in text, JSON, XML, or YAML format.
- `pgtrickle.dedup_stats()` — reports how often concurrent writes produce duplicate entries that need pre-processing before the MERGE step.
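For example (the stream table name is illustrative, and `dedup_stats()` is assumed to return a row set):

```sql
-- Inspect the plan for the auto-generated delta SQL, EXPLAIN-style
SELECT pgtrickle.explain_delta('active_orders', 'text');

-- How often concurrent writes force pre-MERGE deduplication
SELECT * FROM pgtrickle.dedup_stats();
```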
Multi-Tenant Worker Quotas
New setting: `pg_trickle.per_database_worker_quota` — if you run many
databases on one PostgreSQL cluster, this prevents a busy database from
monopolizing all the refresh workers. Workers are assigned by priority
(immediate-mode tables first, then hot, warm, and cold), with burst capacity
up to 150% when other databases are idle.
TPC-H Benchmark Harness
You can now measure refresh performance across all 22 TPC-H queries in a
structured way. Run `just bench-tpch` to get per-query timing, FULL vs.
DIFFERENTIAL comparison, and P95 latency numbers. Five synthetic benchmarks
(q01, q05, q08, q18, q21) also measure the pure Rust delta-SQL
generation time without needing a database.
Broader SQL Support
- `IS JSON` predicates (PG 16+) — expressions like `expr IS JSON OBJECT` now work in incremental mode.
- SQL/JSON constructors (PG 16+) — `JSON_OBJECT(...)`, `JSON_ARRAY(...)`, `JSON_OBJECTAGG(...)`, and `JSON_ARRAYAGG(...)` are now accepted.
- Recursive CTEs — recursive queries with non-monotone operators (like `EXCEPT`) correctly fall back to full refresh instead of producing wrong results.
dbt Integration Updates
If you use dbt-pgtrickle, you can now set partitioning and fuse options directly from dbt model config:
- `{{ config(partition_by='customer_id') }}` for partitioned stream tables
- `{{ config(fuse='auto', fuse_ceiling=100000, fuse_sensitivity=3) }}` for circuit-breaker protection
Bug Fixes
- Scheduler cascade fix — stream tables downstream of FULL-mode upstream tables now detect changes correctly via a `last_refresh_at` fallback, preventing stale data in chains where the upstream uses full refresh.
- SUM(CASE WHEN ...) drift fix — aggregate expressions using CASE were occasionally producing slightly wrong incremental results; these are now correctly detected and processed via a group rescan.
- Duplicate column DDL fix — removed a duplicate column definition in the `pgt_stream_tables` DDL that could cause issues on fresh installs.
Testing Improvements
- New regression test suite targeting 9 structural weaknesses: join multi-cycle correctness (7 tests), differential-equals-full equivalence (11 tests), DVM operator execution, failure recovery, and MERGE template unit tests.
- E2E test infrastructure now uses template databases, cutting per-test setup time significantly.
New SQL Functions
| Function | Purpose |
|---|---|
| `pgtrickle.explain_delta(name, format)` | Show the query plan for the delta SQL |
| `pgtrickle.dedup_stats()` | MERGE deduplication frequency counters |
| `pgtrickle.shared_buffer_stats()` | Per-source change buffer status |
| `pgtrickle.explain_refresh_mode(name)` | Why a stream table uses its current refresh mode |
| `pgtrickle.reset_fuse(name)` | Reset a blown circuit-breaker fuse |
| `pgtrickle.fuse_status()` | Fuse state across all stream tables |
New Catalog Columns
Ten new columns on `pgtrickle.pgt_stream_tables`:
| Column | Purpose |
|---|---|
| `effective_refresh_mode` | The actual refresh mode after AUTO resolution |
| `fuse_mode` | Circuit-breaker configuration (off / auto / manual) |
| `fuse_state` | Current fuse state (armed / blown) |
| `fuse_ceiling` | Maximum change count before fuse blows |
| `fuse_sensitivity` | Consecutive cycles above ceiling before triggering |
| `blown_at` | When the fuse last blew |
| `blow_reason` | Why the fuse blew |
| `st_partition_key` | Partition key specification |
| `max_differential_joins` | Maximum join count for differential mode |
| `max_delta_fraction` | Maximum delta-to-table ratio for differential mode |
Upgrading
Run `ALTER EXTENSION pg_trickle UPDATE;` after installing the new binaries.
All new columns and functions are added automatically. No breaking changes —
everything from v0.12.0 continues to work as before. See
UPGRADING.md for details.
[0.12.0] — 2026-03-28
0.12.0 is a correctness, reliability, and developer-experience release built on top of 0.11.0's major new features. It closes the last known wrong-answer bugs for complex join queries, adds tools to help you understand and debug stream table behavior, hardens the scheduler against several edge cases that could cause stale data or crashes, and backs it all with thousands of new automatically generated tests.
Stale Rows Fixed in Stream-Table Chains
What was the problem? When a stream table (B) reads from another stream table (A), each change in A is recorded as a small "what changed" entry — a row added or removed. But the identity key used for those entries was computed differently inside the change buffer than it was inside B's own storage. As a result, when A changed via an upstream UPDATE, B's refresh could silently fail to delete the old version of a row, leaving a stale duplicate.
What changed? The change buffer now computes row identity the same way B does — using a hash of all the data columns rather than the upstream source's primary key. Stale rows after UPDATE no longer appear in stream-table chains. This bug was found and confirmed by the new property-based test suite (see below).
Phantom Rows Fixed for Complex Joins (TPC-H Q7 / Q8 / Q9)
What was the problem? When a stream table's query joins three or more tables together and rows are deleted from more than one join side at the same time, the incremental engine could silently drop the correction — leaving rows in the stream table that should have been removed.
This affected TPC-H queries Q7, Q8, and Q9 (which all involve deep join trees), and any user query with a similar multi-table join structure. A temporary workaround (falling back to full refresh for wide joins) was in place since v0.11.0 and has now been lifted.
What changed? The incremental engine now takes an individual "before snapshot" for each leaf table in the join tree — each one cheaply computed from a single-table comparison — and re-joins them after the delete. This avoids writing multi-gigabyte temp files to disk (the root cause of the original workaround) and eliminates the phantom-row bug entirely. Q7, Q8, and Q9 now run in differential mode without any workarounds.
Type Errors Fixed in Parallel Refresh Chains
What was the problem? When a chain of stream tables is fused into a single execution unit for efficiency (the "bypass" optimisation added in v0.11.0), the internal bypass table used `text` for every column regardless of the actual column type. This caused an `operator does not exist: text > integer` error whenever a downstream stream table had a type-sensitive WHERE clause (e.g. `WHERE amount > 100`), making the parallel worker tests fail across all topologies that included a fused chain.
What changed? Bypass tables now use the real column types. The six parallel-worker benchmark tests now complete in 9–26 seconds rather than timing out after 120 seconds.
Scheduler Fixes for Diamond and ST-on-ST Topologies
Two scheduler bugs that caused incorrect refresh behavior with complex dependency graphs were fixed:
- Diamond timeout. In a diamond topology (A → B, A → C, B+C → D), the L1 arm stream tables (B and C) were created with a 1-minute fixed interval rather than a calculated schedule, so D never received updates within the test window. The scheduler also had a bug loading stream table records by ID that caused silent failures in parallel worker paths. Both are fixed.
- ST-on-ST parallel workers. When an upstream stream table changed, the parallel worker paths (singleton, atomic group, immediate closure, fused chain) were not forcing a full refresh on downstream stream tables the way the main scheduler loop did. This could leave downstream tables stale. The fix ensures all parallel paths treat upstream stream-table changes the same way.
Four New Diagnostic Functions
When stream table behavior is unexpected — wrong refresh mode, a query being rewritten in a surprising way, persistent errors — it previously required reading server logs or source code to understand why. Four new SQL functions expose that internal state directly in queries:
- `pgtrickle.explain_query_rewrite(query TEXT)` — shows exactly how pg_trickle rewrites your query for incremental refresh: which operators were applied, how delta keys are injected, and how aggregates are classified. Useful for understanding why a query got a particular refresh mode.
- `pgtrickle.diagnose_errors(name TEXT)` — shows the last 5 errors for a stream table, each classified by type (correctness, performance, configuration, infrastructure) with a suggested fix.
- `pgtrickle.list_auxiliary_columns(name TEXT)` — lists the internal `__pgt_*` columns that pg_trickle injects into a stream table's query plan, with an explanation of each one's purpose. Helpful when `SELECT *` returns unexpected extra columns.
- `pgtrickle.validate_query(query TEXT)` — analyses a SQL query and reports which refresh mode it would get, which SQL constructs were detected, and any warnings — all without creating a stream table.
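For example, a pre-flight check on a candidate query might look like this (a sketch using the function names above; the shape of each result set is defined by the extension):

```sql
-- Check how a query would be handled before creating a stream table
SELECT * FROM pgtrickle.validate_query(
  'SELECT region, SUM(amount) FROM orders GROUP BY region'
);

-- If an existing stream table misbehaves, ask for its recent errors
SELECT * FROM pgtrickle.diagnose_errors('active_orders');
```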
Multi-Column IN (subquery) Now Gives a Clear Error
What was the problem? A query like `WHERE (col_a, col_b) IN (SELECT x, y FROM …)` passed validation but produced silently wrong results — the engine was only matching on the first column and ignoring the second.
What changed? This construct is now detected at stream table creation time and rejected with a clear error message that recommends rewriting it as `EXISTS (SELECT 1 FROM … WHERE col_a = x AND col_b = y)`.
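The recommended rewrite, sketched on hypothetical table and column names:

```sql
-- Before (now rejected at creation time): multi-column IN (subquery)
-- WHERE (col_a, col_b) IN (SELECT x, y FROM other_table)

-- After: the equivalent EXISTS form that the engine supports
SELECT *
FROM base_table b
WHERE EXISTS (
  SELECT 1 FROM other_table o
  WHERE b.col_a = o.x AND b.col_b = o.y
);
```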
IMMEDIATE Mode Proven Correct Under High Concurrency
IMMEDIATE mode (where the stream table updates inside the same transaction as the source table change) now has a dedicated concurrency stress test: 100–120 concurrent transactions firing simultaneously against the same source table, across five scenarios (all inserts, all updates to distinct rows, all updates to the same row, all deletes, and a mixed workload). Zero lost updates, zero phantom rows, and no deadlocks were observed in any run.
Protection Against Pathological Queries
A new guard prevents a particularly deep or convoluted query from consuming all available stack space and crashing the database backend. When the query analyser recurses more than 64 levels deep (configurable via `pg_trickle.max_parse_depth`), it now returns a clear `QueryTooComplex` error instead of crashing.
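If you have legitimately deep queries, the limit can presumably be raised (a sketch, assuming the GUC accepts a plain integer and is settable per session):

```sql
-- Raise the recursion limit for the query analyser (default: 64)
SET pg_trickle.max_parse_depth = 128;
```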
Tiered Scheduling Now On By Default
The tiered scheduling feature — which automatically slows down cold (infrequently-read) stream tables and speeds up hot ones — is now enabled by default. In large deployments this reduces the scheduler's CPU usage significantly. Stream tables you query often continue refreshing at full speed. Stream tables that nobody has read recently back off gracefully.
If you rely on all stream tables refreshing at the same rate regardless of
read frequency, set pg_trickle.tiered_scheduling = off.
Thousands of Automatically Generated Tests
Two new automated testing systems were added to complement the hand-written test suite:
- Property-based tests — the test framework automatically generates thousands of random DAG shapes, schedule combinations, and edge cases, and checks that the scheduler's ordering guarantees hold for all of them. If any configuration would cause a table to refresh in the wrong order or be spuriously suspended, these tests catch it.
- SQLancer fuzzing — SQLancer generates random SQL queries and checks that pg_trickle's incremental result matches the result of running the same query directly in PostgreSQL. Any mismatch is automatically saved as a permanent regression test, and a weekly CI job runs the fuzzer continuously. At the time of release, zero mismatches have been found.
CDC Write-Side Benchmark Published
A new benchmark suite measures the overhead that pg_trickle's change capture triggers add to your write workload. Results across five scenarios (single-row INSERT, bulk INSERT, bulk UPDATE, bulk DELETE, concurrent writers) are published in docs/BENCHMARK.md. Use these numbers to estimate the impact before deploying pg_trickle on a write-heavy table.
MERGE Template Validation at Test Startup
The SQL templates that pg_trickle generates for applying incremental changes
(the MERGE statements) are now validated with an EXPLAIN dry-run at every
test startup. If a code change accidentally produces a malformed MERGE
template, the tests catch it before any data is processed — rather than
manifesting as a cryptic runtime error.
[0.11.0] — 2026-03-26
This is the biggest release since the initial launch. The headline features are 34× lower latency for real-time workloads, stream-table chains that now refresh incrementally (no more forced full recomputation when one stream table feeds another), declarative partitioning to cut I/O on large tables by up to 100×, a ready-to-use Prometheus and Grafana monitoring stack, and a circuit breaker to protect production databases from runaway change bursts.
34× Lower Latency — Changes Arrive Instantly
Previously, the background worker woke up on a fixed timer every ~500 ms to check for new data, even when nothing had changed. Every change had to wait up to half a second in the change buffer before being processed.
Now, when a source table is modified, the change capture trigger immediately wakes the background worker via a PostgreSQL notification channel. The worker starts processing within ~15 ms of the write committing — a 34× improvement for low-volume workloads. Under heavy DML, a 10 ms debounce window coalesces rapid notifications so the worker isn't flooded.
Event-driven wake is on by default. You can turn it off (`pg_trickle.event_driven_wake = off`) to revert to poll-based wake, and you can tune the debounce window with `pg_trickle.wake_debounce_ms` (default 10).
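The two settings side by side — a sketch showing the documented defaults as session-level `SET`s:

```sql
-- Event-driven wake: trigger notifies the worker, ~15 ms wake latency
SET pg_trickle.event_driven_wake = on;   -- default: on
-- Debounce window that coalesces rapid notifications under heavy DML
SET pg_trickle.wake_debounce_ms = 10;    -- default: 10
```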
Stream-Table-to-Stream-Table Chains Now Refresh Incrementally
Previously, when stream table B's query read from stream table A, pg_trickle had to do a full recomputation of B every time A changed — even if only a few rows in A actually changed. For long chains (A → B → C → D), every hop was a full re-scan.
Now, stream tables can read from other stream tables incrementally. When A refreshes, the rows it added and removed are recorded in a change buffer just like a base table. B wakes up, reads only the changed rows from A, and applies a delta — not a full recomputation. Even when A does a full refresh (e.g. because its query does not support differential mode), a before/after snapshot diff is captured automatically so downstream tables still receive a small insert/delete delta rather than cascading full refreshes through the chain.
Declaratively Partitioned Stream Tables
Stream tables can now be declared with a partition key:
SELECT create_stream_table(
'monthly_sales',
$$ SELECT month, region, SUM(amount) FROM orders GROUP BY 1, 2 $$,
partition_by => 'month'
);
pg_trickle creates a range-partitioned storage table and, when refreshing, automatically restricts the MERGE operation to only the partitions that contain changed rows. For large tables where changes touch only 2–3 out of 100 monthly partitions, this can reduce the MERGE I/O from 10 million rows to ~100,000 — a 100× improvement.
Ready-to-Use Prometheus and Grafana Monitoring
A complete observability stack is now included in the monitoring/ directory:
- `monitoring/prometheus/pg_trickle_queries.yml` — drop-in configuration for `postgres_exporter` that exports 14 metrics covering refresh performance, CDC buffer sizes, staleness, error rates, and per-table status.
- `monitoring/prometheus/alerts.yml` — 8 alerting rules that page you when a stream table goes stale (> 5 min), starts error-looping (≥ 3 consecutive failures), is suspended, or when the CDC buffer exceeds 1 GB.
- `monitoring/grafana/dashboards/pg_trickle_overview.json` — a pre-built Grafana dashboard with six sections: cluster overview, refresh latency time-series, staleness heatmap, CDC lag, per-table drill-down, and scheduler health.
- `monitoring/docker-compose.yml` — brings up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with one command (`docker compose up`). Grafana opens at http://localhost:3000; the dashboard shows live metrics generated by a seed workload of stream tables continuously refreshing synthetic order and product data (see `monitoring/init/01_demo.sql`).
No code changes are needed to use this stack with an existing pg_trickle installation.
Circuit Breaker (Fuse) — Protection Against Runaway Change Bursts
A new circuit breaker mechanism halts refresh for a stream table when its pending change count exceeds a configurable threshold. This protects your database from accidental mass-delete scripts, runaway migrations, or data imports that would otherwise trigger an unexpectedly large and expensive refresh operation.
When the fuse blows, pg_trickle sends a `pgtrickle_alert` PostgreSQL notification that you can subscribe to, and suspends the affected stream table. You then choose how to recover using `reset_fuse()`:
- `reset_fuse(name, action => 'apply')` — process the backlog normally (default).
- `reset_fuse(name, action => 'reinitialize')` — clear the change buffer and repopulate the stream table from scratch.
- `reset_fuse(name, action => 'skip_changes')` — discard the pending changes and resume without reprocessing them.
Configure per-table with `alter_stream_table(fuse => 'on', fuse_ceiling => 10000)` or set a global default with `pg_trickle.fuse_default_ceiling`. Use `fuse_status()` to inspect the blown/active state of all stream tables at once.
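Putting the pieces together — a hedged sketch of a recovery workflow after an intentional bulk import blows a fuse (the table name is illustrative; function names are as documented above):

```sql
-- See which fuses have blown and why
SELECT * FROM pgtrickle.fuse_status();

-- The import was intentional: repopulate from scratch rather than
-- replaying millions of buffered changes
SELECT pgtrickle.reset_fuse('daily_revenue', action => 'reinitialize');
```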
Wider Column Bitmask — No More 63-Column Limit
pg_trickle's change capture tracks which columns were actually modified in each row so that stream tables that reference only a subset of columns can ignore irrelevant updates. Previously, this optimization silently stopped working for source tables with more than 63 columns — all updates were treated as touching every column.
The bitmask has been extended from a 64-bit integer to an arbitrary-width
PostgreSQL VARBIT value, removing the column count cap entirely. Existing
deployments are migrated automatically (the old column value becomes NULL,
which the filter treats conservatively — no rows are silently dropped). Tables
with fewer than 64 columns are unaffected at the data level.
Per-Database Worker Quotas
In multi-tenant environments where multiple databases share a single PostgreSQL instance, all stream-table refresh workers previously competed for the same concurrency pool. A single busy database could crowd out others.
A new GUC `pg_trickle.per_database_worker_quota` sets a soft concurrency limit per database. When the rest of the cluster is lightly loaded (< 80% of available capacity in use), a database can burst to 150% of its quota. When the cluster is busy, each database is held to its base quota.
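A minimal sketch of setting the quota cluster-wide, assuming the GUC takes an integer worker count (not stated explicitly above):

```sql
-- Soft cap: 4 concurrent refresh workers per database;
-- bursting to 150% (6 workers) is allowed when the cluster is idle
ALTER SYSTEM SET pg_trickle.per_database_worker_quota = 4;
SELECT pg_reload_conf();
```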
Refresh work is also now dispatched in priority order: IMMEDIATE mode tables → atomic diamond groups → singleton tables.
DAG Scheduling Performance
For deployments with chains of stream tables (A → B → C), several improvements reduce end-to-end propagation latency:
- Fused single-consumer chains. When a stream table chain has exactly one downstream consumer at each hop, the scheduler fuses the chain into a single execution unit in one background worker. Intermediate deltas are stored in temporary in-memory tables instead of persistent change buffers, eliminating the WAL writes, index maintenance, and cleanup that would normally occur at each hop.
- Batch coalescing. Before a downstream table reads from an upstream change buffer, redundant insert/delete pairs for the same row are cancelled out. This prevents rapid-fire upstream refreshes from accumulating duplicate work for downstream tables.
- Adaptive dispatch polling. The parallel dispatch loop now backs off exponentially (20 ms → 200 ms) instead of using a fixed 200 ms poll, and resets to 20 ms as soon as any worker finishes. Cheap refreshes no longer wait a full 200 ms for the next tick.
- Delta amplification warnings. When a differential refresh produces many more output rows than input rows (default threshold: 100×), a `WARNING` is emitted with the table name, input and output counts, and a tuning hint. `explain_st()` now exposes `amplification_stats` from the last 20 refreshes.
Smarter Diagnostics and Warnings
Several improvements to make problems visible earlier and easier to diagnose:
- Know which refresh mode is actually running. When a stream table is set to `AUTO`, pg_trickle now records which mode it actually chose at each refresh (`DIFFERENTIAL`, `FULL`, etc.) in a new `effective_refresh_mode` column on `pgt_stream_tables`. A new `explain_refresh_mode(name)` function reports the configured mode, the actual mode used, and the reason for any downgrade — all in one query.
- Clearer warning when a stream table falls back to full refresh. If a stream table cannot use differential mode, pg_trickle now emits a `WARNING` message naming the affected table and the reason. Previously this happened silently.
- Warning when using aggregates that require full group rescans. Aggregate functions like `STRING_AGG`, `ARRAY_AGG`, and `JSON_AGG` require re-aggregating the entire group whenever any member changes. pg_trickle now warns at stream table creation time when such aggregates are used in `DIFFERENTIAL` mode, and `explain_st()` classifies each aggregate's maintenance strategy (incremental, auxiliary-state, or group-rescan) so you can understand the cost.
- Better error messages. Errors for unsupported query patterns, cycle detection, upstream schema changes, and query parse failures now include a `DETAIL` field explaining what went wrong and a `HINT` field suggesting how to fix it.
- Invalid parameter combinations are rejected at creation time. For example, using `diamond_schedule_policy='slowest'` without `diamond_consistency='atomic'` now produces a clear error at `create_stream_table`/`alter_stream_table` time rather than silently doing the wrong thing at refresh time.
- TopK queries validate their metadata on every refresh. Stream tables defined with `ORDER BY ... LIMIT N` now recheck that the stored LIMIT/OFFSET metadata still matches the actual query on each refresh. On mismatch, they fall back to a full refresh with a `WARNING` rather than silently producing wrong results.
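For instance, to see why a table configured as `AUTO` keeps downgrading (a sketch; the table name is taken from the partitioning example elsewhere in these notes):

```sql
-- Configured mode, actual mode used, and the reason for any downgrade
SELECT * FROM pgtrickle.explain_refresh_mode('monthly_sales');
```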
Safety and Reliability Improvements
- No more crashes from schema changes. If a source table's schema changes while a refresh is running (e.g. a column is dropped), pg_trickle now catches the error, emits a structured `WARNING` with the table name and error details, and continues refreshing all other stream tables. The scheduler never crashes due to an individual table's error.
- Failure injection tests. New end-to-end tests deliberately drop columns and tables mid-refresh to verify that the scheduler stays alive and other stream tables continue processing correctly.
- Safer defaults. Three default settings have been updated to reflect production-safe behavior:
  - `parallel_refresh_mode` now defaults to `'on'` (was `'off'`). Parallel refresh has been stable for several releases; serial mode is now opt-in.
  - `block_source_ddl` now defaults to `true`. An accidental `ALTER TABLE` on a source table while a stream table depends on it is now blocked by default, with clear instructions on how to temporarily disable the guard if needed.
  - The invalidation ring capacity has been doubled from 32 to 128 slots, reducing the risk of invalidation events being silently discarded under rapid DDL.
Getting Started Guide Restructured
docs/GETTING_STARTED.md has been reorganised into five progressive chapters:
- Hello World — create your first stream table and watch it update.
- Joins, Aggregates & Chains — multi-table dependencies and DAG patterns.
- Scheduling & Backpressure — controlling refresh frequency and auto-backoff.
- Monitoring In Depth — using the five key diagnostic functions and the Prometheus/Grafana stack.
- Advanced Topics — FUSE circuit breaker, partitioned stream tables, IMMEDIATE (in-transaction) IVM, and multi-tenant worker quotas.
TPC-H Correctness Gate Added to CI
Five queries derived from the TPC-H benchmark — covering single-table
GROUP BY, filter-aggregate, CASE WHEN inside SUM, a three-way join, and LEFT
OUTER JOIN with GROUP BY — now run in DIFFERENTIAL mode on every push to main
and daily. Any correctness mismatch between pg_trickle's incremental output and
plain PostgreSQL execution fails the CI build automatically.
Docker Hub Image Improvements
The Dockerfile.hub image that is published to Docker Hub has been expanded
with a comprehensive set of GUC defaults fine-tuned for production use. A new
just build-hub-image recipe builds the image locally for testing.
Bug Fixes
- Scheduler crash after event-driven wake was enabled. The background worker crashed immediately after startup when `event_driven_wake = on` (the default) because the `LISTEN` command was being issued outside of a transaction. Fixed by issuing `LISTEN` inside a short-lived SPI transaction at startup. (#296)
- Spurious full refresh for non-recursive CTEs. Stream tables containing `WITH` clauses that were not recursive (`WITH foo AS (SELECT ...)`) were being incorrectly forced to FULL refresh mode. Only truly recursive CTEs (`WITH RECURSIVE`) require this. Non-recursive CTEs now correctly use differential mode. (#298)
- `DISTINCT ON` inside a CTE body caused a parse error. When a stream table's defining query contained a `WITH` clause whose body used `DISTINCT ON (...)`, the DVM query analyser failed with a parse error. The `DISTINCT ON` clause is now rewritten before analysis so it no longer interferes. (#300)
- Full-refresh fallback warning now names the affected table. When pg_trickle falls back from differential to full refresh, the emitted `WARNING` now includes the stream table name and the reason, making it straightforward to identify which table you need to investigate. (#301)
[0.10.0] — 2026-03-25
The headline features of 0.10.0 are cloud deployment compatibility, query engine correctness fixes, refresh performance, and a friendlier `auto_backoff` developer experience. pg_trickle now works reliably behind PgBouncer — the connection pooler used by default on Supabase, Railway, Neon, and other managed PostgreSQL platforms. A broad set of correctness issues in the incremental query engine is fixed, and several performance optimizations cut refresh time for large tables and busy deployments.
auto_backoff Is Now Much Friendlier on Developer Machines
When `pg_trickle.auto_backoff = true` is enabled, the scheduler automatically slows down stream tables whose refresh cost exceeds their schedule budget — a good safeguard in production. This release makes the feature safe to use alongside short schedules (e.g. `'1s'`) in developer and CI environments:
- Trigger threshold raised from 80% → 95%. Backoff now only activates when a refresh consumes more than 95% of the schedule window. A 900 ms refresh on a 1-second schedule (90%) used to trigger backoff; it no longer does. EC-11 operator alerting continues to fire at 80% (unchanged), so you still get an early warning before the scheduler is actually stuck.
- Maximum slowdown reduced from 64× → 8×. In the worst case, a stream table's effective refresh interval is now capped at 8× its configured schedule (e.g. 8 seconds for a `'1s'` table) instead of 64 seconds. The cap self-heals immediately: a single on-time refresh resets the factor to 1×.
- Backoff events now emit `WARNING` instead of `INFO`. When the scheduler stretches or resets a stream table's effective interval, you will see a `WARNING` message in your PostgreSQL client, including the new effective interval — rather than a silent slowdown with no explanation.
- `auto_backoff` now defaults to on. With the above improvements in place, the feature is safe in all environments. New installations get CPU runaway protection out of the box. To restore the old opt-in behaviour, set `pg_trickle.auto_backoff = off`.
Works Behind PgBouncer
PgBouncer is the most popular PostgreSQL connection pooler. In "transaction mode" — the default setting on most cloud PostgreSQL platforms — it hands a fresh database connection to every transaction, which breaks anything that assumes the same connection stays open between calls (session locks, prepared statements). pg_trickle previously relied on both. This release makes pg_trickle work correctly in such deployments.
- Session locks replaced with row-level locking. The background scheduler now acquires a short-lived row-level lock on each stream table's catalog entry instead of a session-level advisory lock. Row-level locks are released automatically at transaction end — exactly what PgBouncer transaction mode requires. If a concurrent refresh is already running for a given stream table, the scheduler skips that cycle and retries, rather than blocking.
- New `pooler_compatibility_mode` option per stream table. Setting `pooler_compatibility_mode => true` when creating or altering a stream table disables prepared statements and NOTIFY emissions for that table. Leave it off (the default) if you're not behind a pooler — behaviour is unchanged from v0.9.0.
- PgBouncer tested end-to-end. A new automated test suite boots PgBouncer in transaction-pool mode alongside pg_trickle and exercises the full lifecycle — create, refresh, alter, drop — all through the pooler. Run with `just test-pgbouncer`.
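Enabling the option on an existing stream table might look like this (a sketch; assumes `alter_stream_table` accepts the parameter by name, mirroring `create_stream_table`):

```sql
-- Opt a stream table into pooler-safe behaviour
SELECT pgtrickle.alter_stream_table(
  name => 'active_orders',
  pooler_compatibility_mode => true
);
```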
Query Engine Correctness Fixes
Several SQL patterns that appeared to work correctly could produce wrong results silently under the incremental query engine. All of the following are now fixed:
- Recursive queries (`WITH RECURSIVE`) update correctly when rows are deleted. Recursive queries are used for organisation hierarchies, bill-of-materials roll-ups, graph traversals, and similar structures. In DIFFERENTIAL mode, deleting a row from the source previously caused a full recomputation (correct, but expensive — O(n)). Now pg_trickle uses the Delete-and-Rederive algorithm, updating only affected rows at O(delta) cost. Computed expressions like `ancestor.path || ' > ' || node.name` update correctly when any ancestor is renamed or moved.
- SUM over a FULL OUTER JOIN no longer returns 0 instead of NULL. When matched rows on both join sides transition to matched on one side only (creating null-padded rows), the incremental SUM formula previously returned 0 instead of NULL. pg_trickle now tracks how many non-null values exist in each group and produces the correct answer without any full-group rescan.
- Multi-source delta merging is now correct for diamond-shaped queries. A "diamond" topology is when two separate paths through the dependency graph both feed into the same stream table (e.g. table A → both B and C → D). Simultaneous changes on both paths could previously cause some corrections to be silently discarded, leaving D with wrong values. The engine now uses proper weight aggregation (Z-set algebra) so every correction is applied. Six property-based tests verify this for different diamond shapes.
- Statistical aggregates (CORR, COVAR, REGR_*) now update in constant time. All twelve SQL correlation and regression functions — `CORR`, `COVAR_POP`, `COVAR_SAMP`, and the nine `REGR_*` variants — now update incrementally using running totals (Welford-style accumulation) instead of rescanning the whole group. Each changed row is processed once regardless of group size.
- LATERAL subqueries only re-examine correlated rows. When data changes in the inner part of a LATERAL JOIN, pg_trickle previously re-ran the subquery for every row in the outer table. Now it re-runs it only for outer rows that actually correlate with the changed inner data, reducing work from proportional-to-table-size to proportional-to-changes.
- Materialized view sources now work in DIFFERENTIAL mode. Stream tables can use a PostgreSQL materialized view as their data source when `pg_trickle.matview_polling = on` is set. Changes are detected by comparing snapshots, the same mechanism used for foreign table sources.
- Six correctness bugs in the query rewriting engine fixed. These all involved edge cases in how the incremental engine translates SQL:
  - SQL comment fragments such as `/* unsupported ... */` that were being injected into generated SQL and causing runtime syntax errors are now replaced with clear extension-level errors.
  - When a column-rename step (e.g. `EXTRACT(year FROM orderdate) AS o_year`) sits between an aggregate and its source, GROUP BY and aggregate expressions now resolve correctly.
  - `EXCEPT` queries wrapped in a projection no longer silently lose their row multiplicity tracking.
  - A placeholder row identifier value of zero could collide with real row hashes; changed to a sentinel value (`i64::MIN`) outside the normal hash range.
  - Empty scalar subqueries now raise a clear error instead of silently emitting NULL.
- Change capture (CDC) fixes. The UPDATE trigger now correctly handles rows with NULL values in their primary key columns (previously those rows were silently dropped from the change buffer). WAL logical replication publications are automatically rebuilt when a source table is converted to partitioned after the publication was set up — previously this caused the stream table to silently stop updating. TRUNCATE followed by INSERT is handled atomically so post-TRUNCATE inserts are never lost.
Faster Refreshes
- Automatic covering index on stream table row IDs. Stream tables with eight or fewer output columns now automatically get a covering index with `INCLUDE (col1, col2, ...)` on the internal `__pgt_row_id` column. This lets the MERGE step use index-only scans — no heap lookups for matched rows — reducing refresh time by roughly 20–50% in small-delta / large-table scenarios.
- Change buffer compaction. When the pending change buffer grows beyond `pg_trickle.compact_threshold` (default 100,000 rows), pg_trickle compacts it before the next refresh cycle. INSERT→DELETE pairs that cancel each other out are eliminated; multiple sequential changes to the same row are collapsed to a single net change. This reduces delta scan overhead by 50–90% for high-churn tables, and uses `change_id` (not `ctid`) for safe operation under concurrent VACUUM.
- Tiered refresh scheduling. Large deployments can assign stream tables to one of four tiers: Hot (refresh at the configured interval), Warm (2× interval), Cold (10× interval), or Frozen (skip until manually promoted). Gate the feature with `pg_trickle.tiered_scheduling = on` (default off). Set per stream table via `ALTER STREAM TABLE ... SET (tier => 'warm')`. Frozen stream tables are entirely skipped by the scheduler until you promote them.
- Incremental dependency-graph updates. When a stream table is created, altered, or dropped, the internal dependency graph now updates only the affected entries instead of rebuilding the entire graph from scratch. This reduces the latency impact of DDL operations from roughly 50 ms to roughly 1 ms in deployments with 1,000+ stream tables.
- Smarter topo-sort caching inside a scheduler tick. The order in which stream tables are refreshed (topological order through the dependency graph) is now computed once per scheduler tick and reused across all internal callers, eliminating redundant work.
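A minimal sketch of opting into tiered scheduling (the table name is illustrative; the `SET (tier => ...)` form follows the syntax documented above):

```sql
-- Enable the feature, then demote a rarely-read table to the Cold tier
SET pg_trickle.tiered_scheduling = on;
ALTER STREAM TABLE quarterly_archive SET (tier => 'cold');
```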
Better Visibility Into What pg_trickle Is Doing
Several behaviours that previously happened silently now produce a short, actionable message at the moment they occur:
- `ORDER BY` without `LIMIT` warns you at creation time. Adding `ORDER BY` to a stream table's defining query without also adding `LIMIT` has no effect: stream table storage has no guaranteed row order. pg_trickle now emits a `WARNING` pointing you toward the TopK pattern or suggesting you remove the `ORDER BY`.
- `append_only` mode reversions are visible. When pg_trickle automatically exits append-only mode (because deletions or updates were detected in the source), the notice is now emitted at `WARNING` level (was `INFO`, normally suppressed) and also dispatched as a `pgtrickle_alert` notification.
- Cleanup failures escalate after 3 consecutive attempts. If the background worker fails to clean up a source table 3 times in a row, the message is promoted from `DEBUG1` (normally invisible) to `WARNING` so it appears in the server log.
- Diamond dependency with `diamond_consistency = 'none'` now advises you. When you create a stream table that forms a diamond in the dependency graph and explicitly set `diamond_consistency = 'none'`, a `NOTICE` advises you to consider `diamond_consistency = 'atomic'` for consistent cross-branch reads.
- `diamond_consistency` now defaults to `'atomic'`. New stream tables get atomic group semantics by default, meaning all branches of a diamond are refreshed together in a single savepoint before the convergence node is updated. This prevents a read from the convergence node seeing one branch partially updated and the other stale. To restore the old independent behavior, pass `diamond_consistency => 'none'` explicitly.
- Adaptive fallback is visible at the default log level. When a differential refresh falls back to a full refresh because the delta is too large, the message is now emitted at `NOTICE` level (the default `client_min_messages` threshold) instead of `INFO` (usually suppressed in the client session).
- `CALCULATED` schedule without downstream dependents warns you. When a stream table is created with `schedule = 'calculated'` but no existing stream table references it as a downstream dependent, a `NOTICE` explains that the schedule will fall back to `pg_trickle.default_schedule_seconds`.
- Internal `__pgt_*` auxiliary columns are now documented. The hidden columns that the refresh engine may add to stream table physical storage are described in a new section of SQL_REFERENCE.md. This covers all variants, from the always-present `__pgt_row_id` primary key through the aggregate-specific auxiliary columns for AVG, STDDEV, CORR, COVAR, REGR_*, window functions, and recursive CTE depth.
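With the new `'atomic'` default, the old independent behavior must be requested explicitly. A sketch of opting out (the table name and query are illustrative; the parameter names come from this changelog):

```sql
-- Hypothetical stream table; 'atomic' is now the default, so 'none'
-- must be passed explicitly to restore the pre-0.10 behavior.
SELECT pgtrickle.create_stream_table(
    name                => 'regional_totals',
    query               => 'SELECT region, sum(amount) AS total
                            FROM orders GROUP BY region',
    schedule            => '1m',
    diamond_consistency => 'none'
);
```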
Bug Fixes
- Scheduler no longer permanently misses stream tables created under a stale snapshot. `signal_dag_invalidation` is called inside the creating transaction before it commits. If the background scheduler happened to start a new tick and capture a catalog snapshot at that exact instant, the DAG rebuild query would not see the new stream table — yet the version counter was already advanced, so the scheduler would never rebuild again. The affected stream table would then never be scheduled for refresh. Fixed by verifying that every invalidated `pgt_id` is present in the rebuilt DAG after each rebuild. If any are missing, the scheduler signals a full rebuild for the next tick (which starts a fresh transaction that sees all committed data) rather than accepting the stale version. Fixes CI test `test_autorefresh_diamond_cascade`.
Upgrade Notes
- New catalog columns. The 0.9.0 → 0.10.0 upgrade migration adds `pooler_compatibility_mode BOOLEAN` and `refresh_tier TEXT` to `pgt_stream_tables`. Run `ALTER EXTENSION pg_trickle UPDATE TO '0.10.0'` after replacing the extension files. Verification script: `scripts/check_upgrade_completeness.sh`.
- Hidden auxiliary columns for statistical aggregates. Stream tables using `CORR`, `COVAR_POP`, `COVAR_SAMP`, or any `REGR_*` aggregate will get hidden `__pgt_aux_*` columns when created or altered under 0.10.0. These are invisible to normal queries (excluded by the `NOT LIKE '__pgt_%'` convention) and managed automatically.
- `pooler_compatibility_mode` is off by default. Existing stream tables are unaffected. Enable it only for stream tables accessed through PgBouncer transaction-mode pooling.
Additional Bug Fixes (2026-03-24)
Scheduler stability:
- Scheduler no longer crashes when concurrent refreshes compete. The internal function that decides whether to skip a refresh cycle was running a locking query outside a transaction boundary, which PostgreSQL strictly requires. It now runs inside a proper subtransaction, eliminating the crash.
- Auto-backoff no longer causes a transaction conflict in the background worker. When the auto-backoff feature stretches a stream table's refresh interval, it previously tried to open a new transaction inside the background worker's already-open transaction. PostgreSQL does not allow this nesting; the code path is now restructured to avoid it.
Query engine correctness:
- Queries that filter on hidden columns now produce correct results. For example, `SELECT name FROM users WHERE internal_id > 5` — where `internal_id` is not part of the output — could return wrong rows during incremental updates. Fixed.
- JOIN results are correct when both joined tables change at the same time. Simultaneous changes to two stream tables connected by a JOIN could leave the output with stale or duplicated rows. Fixed.
- `NULLIF(a, b)` expressions now work in incremental queries. `NULLIF` returns NULL when its two arguments are equal. It was not recognised by the incremental parser, causing a fallback error. Fixed.
- `LIKE` and `ILIKE` pattern matching now work in filter conditions. Filter expressions such as `WHERE name LIKE 'A%'` or `WHERE description ILIKE '%widget%'` were not handled by the incremental engine. Fixed.
- Subqueries with `ORDER BY`, `LIMIT`, or `OFFSET` are now preserved correctly. When the incremental engine reconstructed a subquery, those clauses were silently dropped. The incremental result no longer differs from a full refresh for such queries.
- Scalar subqueries using `LIMIT` or `OFFSET` are now handled gracefully. Rather than producing a runtime error, the engine falls back to a full refresh for those cases and continues.
SQL parser:
- Wildcard column references (`table.*`) now work for qualified names. A two- or three-part column reference such as `schema.table.*` or `alias.*` caused a parser crash. Fixed.
Change capture and WAL:
- State transitions no longer stall when the WAL replication slot is behind. When a stream table moves through the TRANSITIONING state, pg_trickle now advances the WAL replication slot up-front. This eliminates a lag-check stall that could cause the transition to hang indefinitely under write-heavy workloads.
Security:
- Several low-severity code quality and security scanner alerts from Semgrep and CodeQL are resolved. No user-visible behaviour changes.
[0.9.0] — 2026-03-20
The headline feature of 0.9.0 is incremental aggregate maintenance: when a single row changes inside a group of 100,000 rows, pg_trickle no longer has to re-scan all 100,000 rows to update COUNT, SUM, AVG, STDDEV, or VAR results. Instead it keeps running totals and adjusts them in constant time. Only MIN/MAX still needs a rescan — and only when the deleted value happens to be the current extreme.
Beyond aggregates, this release contains a broad set of performance optimizations that reduce wasted I/O during every refresh cycle, two new configuration knobs, a refresh-group management API, and several bug fixes.
Faster Aggregates
- Constant-time COUNT, SUM, AVG: Changed rows are now applied algebraically (`new_sum = old_sum + inserted − deleted`) instead of re-aggregating the whole group. AVG uses hidden auxiliary SUM and COUNT columns maintained automatically on the stream table.
- Constant-time STDDEV and VAR: Standard-deviation and variance aggregates (`STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, `VAR_SAMP`) now use a sum-of-squares decomposition with a hidden auxiliary column, achieving the same constant-time update as COUNT/SUM/AVG.
- MIN/MAX safety guard: Deleting the row that currently holds the minimum (or maximum) value correctly triggers a rescan of that group. Property-based tests verify this boundary.
- Floating-point drift reset: A new setting (`pg_trickle.algebraic_drift_reset_cycles`) periodically forces a full recomputation to correct any floating-point rounding drift that accumulates over many incremental cycles.
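The sum-of-squares decomposition behind constant-time variance can be illustrated in plain SQL (illustrative only — the real auxiliary columns are hidden and maintained by the refresh engine):

```sql
-- VAR_SAMP(x) = (sum(x*x) - sum(x)^2 / n) / (n - 1), so keeping n,
-- sum(x), and sum(x*x) per group is enough to update the variance in
-- constant time when a single row is inserted or deleted.
SELECT (sum_sq - sum_x * sum_x / n) / (n - 1) AS var_samp
FROM (
    SELECT count(*)::numeric AS n,
           sum(x)            AS sum_x,
           sum(x * x)        AS sum_sq
    FROM (VALUES (1.0), (2.0), (3.0)) AS t(x)
) s;
-- equals var_samp(x) = 1.0 for these three rows
```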
Smarter Refresh Scheduling
- Automatic backoff for overloaded streams: The `pg_trickle.auto_backoff` GUC was introduced here (default off at the time). See the v0.10.0 entry for the improved thresholds, reduced cap, and the flip to `on` by default.
- Index-aware MERGE: A new threshold setting (`pg_trickle.merge_seqscan_threshold`, default 0.001) tells PostgreSQL to use an index lookup instead of a full table scan when only a tiny fraction of the stream table's rows are changing.
Less Wasted I/O
- Skip unchanged columns: The scan operator now checks the CDC trigger's per-row bitmask to skip UPDATE rows where none of the columns your query actually uses were modified. For wide tables where you only reference a few columns, most UPDATE processing is eliminated.
- Skip unchanged sources in joins: When a multi-source join query has three source tables but only one of them changed, the delta branches for the two unchanged sources are now replaced with `FALSE` at plan time. PostgreSQL's planner recognises those branches as empty and skips them entirely.
- Push WHERE filters into the change scan: If your stream table's defining query has a WHERE clause (e.g. `WHERE status = 'shipped'`), that filter is now applied immediately after reading the change buffer — before rows enter the join or aggregate pipeline. Rows that don't match the filter are discarded right away.
- Faster DISTINCT counting: The per-row multiplicity lookup for `SELECT DISTINCT` queries now uses an index-driven scalar subquery instead of a LEFT JOIN, guaranteeing I/O proportional to the number of changed rows regardless of stream table size.
- Scalar subquery short-circuit: When a scalar subquery's inner source has no changes in the current cycle, the expensive outer-table snapshot reconstruction is skipped entirely.
Refresh Group Management
- New SQL functions for grouping stream tables that should always be refreshed together (cross-source snapshot consistency):
  - `pgtrickle.create_refresh_group(name, members, isolation)`
  - `pgtrickle.drop_refresh_group(name)`
  - `pgtrickle.refresh_groups()` — lists all declared groups.
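A usage sketch (the group name, member names, and the isolation value are illustrative — only the function signatures above come from this release):

```sql
-- Group two stream tables so they always refresh together under one snapshot
SELECT pgtrickle.create_refresh_group(
    name      => 'daily_rollups',
    members   => ARRAY['orders_summary', 'revenue_by_region'],
    isolation => 'snapshot'
);

-- Inspect all declared groups
SELECT * FROM pgtrickle.refresh_groups();
```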
Bug Fixes
- Fixed a crash when internal status queries failed: The `source_gates()` and `watermarks()` SQL functions previously crashed the entire PostgreSQL backend process on any internal error. They now report a normal SQL error instead.
- Clearer handling of window functions in expressions: Queries like `CASE WHEN ROW_NUMBER() OVER (...) > 5 THEN ...` were silently accepted but failed at refresh time with a confusing error. pg_trickle now automatically falls back to full refresh mode (in AUTO mode) or warns you at creation time (in explicit DIFFERENTIAL mode).
Documentation
- Documented the known limitation that recursive CTE stream tables in DIFFERENTIAL mode fall back to full recomputation when rows are deleted or updated. Workaround: use `refresh_mode = 'IMMEDIATE'`.
- Documented the `pgt_refresh_groups` catalog table schema and usage.
- Documented the O(partition_size) cost of window function maintenance with mitigation strategies.
Deferred to v0.10.0
The following performance optimizations were evaluated and explicitly deferred. In every case the current behaviour is correct — these items would make certain workloads faster but carry enough implementation risk that they need more design work first:
- Recursive CTE incremental delete/update in DIFFERENTIAL mode (P2-1)
- SUM NULL-transition shortcut for FULL OUTER JOIN aggregates (P2-2)
- Materialized view sources in IMMEDIATE mode (P2-4)
- LATERAL subquery scoped re-execution (P2-6)
- Welford auxiliary columns for CORR/COVAR/REGR_* aggregates (P3-2)
- Merged-delta weight aggregation for multi-source deduplication (B3-2/B3-3)
Upgrade Notes
- New SQL objects: The 0.8.0 → 0.9.0 upgrade migration adds the `pgt_refresh_groups` table and the `restore_stream_tables` function. Run `ALTER EXTENSION pg_trickle UPDATE TO '0.9.0'` after replacing the extension files.
- Hidden auxiliary columns: Stream tables using AVG, STDDEV, or VAR aggregates will automatically get hidden `__pgt_aux_*` columns when created or altered. These columns are invisible to normal queries (filtered by the existing `NOT LIKE '__pgt_%'` convention) and are managed automatically.
- PGXN publishing: Release artifacts are now automatically uploaded to PGXN via GitHub Actions.
[0.8.0] — 2026-03-17
This release focuses on making your streams easier to back up and more reliable under complex scenarios, and on solidifying the core engine through a major expansion of the test suite.
Added
- Backup and Restore Support: You can now safely back up your database using standard `pg_dump` and `pg_restore` commands. The system automatically reconnects all streams and data queues to eliminate downtime during disaster recovery.
- Connection Pooler Opt-In: Replaced the global PgBouncer pooler compatibility setting with a per-stream option. You can now enable connection pooling optimizations selectively on a stream-by-stream basis.
Fixed
- Cyclic Stream Reliability: Fixed internal bugs that occasionally caused streams referencing each other in a loop to get stuck refreshing forever. Streams now accurately detect when row changes stop and naturally settle.
- Large Dependency Chains: Fixed a crash (stack overflow) that could happen if you attempted to drop an extremely large or heavily recursive chain of stream tables sequentially.
- Special Character Support in SQL: Handled an edge case causing errors when multi-byte characters or special non-ASCII symbols were parsed inside certain SQL commands.
- Mac Support for Developer Tooling: Addressed a minor internal tool error stopping test components from automatically building on Apple Silicon machines.
Under the Hood Code and Testing Enhancements
- Massive Testing Hardening: The internal test suite was fundamentally overhauled, adding tens of thousands of continuous automated checks that verify query results stay correct no matter how complex the joins or updates involved.
- Performance Migrations: Began adopting new tooling (`cargo nextest`) to speed up test runs and development iteration.
[0.7.0] — 2026-03-16
0.7.0 makes pg_trickle easier to trust in real-world data pipelines. The big theme of this release is fewer surprises: the scheduler can now wait for late-arriving source data, some circular pipelines can run safely instead of being blocked, more queries stay on incremental refresh, and the system does a better job of deciding when incremental work is no longer worth it.
Added
Multi-source data can wait until it is actually ready
pg_trickle can now delay a refresh until related source tables have all caught up to roughly the same point in time. This is useful for ETL jobs where, for example, `orders` arrives before `order_lines` and refreshing too early would produce a half-finished report.
- New watermark APIs: `advance_watermark(source, watermark)`, `create_watermark_group(name, sources[], tolerance_secs)`, and `drop_watermark_group(name)`.
- New status helpers: `watermarks()`, `watermark_groups()`, and `watermark_status()`.
- The scheduler now skips gated refreshes when grouped sources are too far apart and records the reason in refresh history.
- New catalog tables store per-source watermarks and watermark group definitions.
- 28 end-to-end tests cover normal operation, bad input, tolerance windows, and scheduler behavior.
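A sketch of the intended flow (table names, watermark values, and the tolerance are illustrative; the function names and argument order come from the list above):

```sql
-- Gate refreshes until both sources are within 60 seconds of each other
SELECT pgtrickle.create_watermark_group('orders_etl',
                                        ARRAY['orders', 'order_lines'], 60);

-- Each loader advances its source's watermark as batches land
SELECT pgtrickle.advance_watermark('orders',      '2026-03-16 12:00:00+00');
SELECT pgtrickle.advance_watermark('order_lines', '2026-03-16 11:59:30+00');

-- Check whether the group is close enough to refresh
SELECT * FROM pgtrickle.watermark_status();
```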
Some circular pipelines can now run safely
Stream tables that depend on each other in a loop are no longer always blocked. If the cycle is monotone and uses DIFFERENTIAL mode, pg_trickle can now keep refreshing the group until it stops changing.
- Circular refreshes run to a fixed point, with `pg_trickle.max_fixpoint_iterations` as a safety limit.
- Cycle creation and ALTER validation now check that every member is safe for convergence before allowing the loop.
- `pgtrickle.pgt_status()` now reports `scc_id`, and `pgtrickle.pgt_scc_status()` shows per-cycle-group status.
- `pgtrickle.pgt_stream_tables` now tracks `last_fixpoint_iterations`, so it is easier to spot slow or unstable cycles.
- 6 end-to-end tests cover convergence, rejection of unsafe cycles, non-convergence handling, and cleanup.
More queries stay on incremental refresh
Several query shapes that used to fall back to FULL refresh, or fail outright, now keep working in DIFFERENTIAL and AUTO mode.
- User-defined aggregates created with `CREATE AGGREGATE` now work through the existing group-rescan strategy, including common extension-provided aggregates.
- More complex `OR`-plus-subquery patterns are now rewritten correctly, including cases that need De Morgan normalization and multiple rewrite passes.
- The rewrite pipeline has a guardrail to stop runaway branch explosion.
- A dedicated 14-test end-to-end suite covers these previously missing cases.
Easier packaging ahead of 1.0
The release also adds infrastructure that makes evaluation and future distribution simpler.
- `Dockerfile.hub` and a dedicated CI workflow can build and smoke-test a ready-to-run PostgreSQL 18 image with pg_trickle preinstalled.
- `META.json` adds PGXN package metadata with `release_status: "testing"`.
- CNPG smoke testing is now part of the documented pre-1.0 packaging story.
Improved
Refresh strategy and performance decisions are smarter
The scheduler and refresh engine now make better choices when incremental work is likely to help and back off sooner when it is not.
- Wide tables now use xxh64-based change detection instead of slower MD5-based comparisons.
- Aggregate stream tables can skip expensive incremental work and jump straight to FULL refresh when the pending change set is obviously too large.
- Strategy selection now combines a change-ratio signal with recent refresh history, which helps on workloads with uneven batch sizes.
- DAG levels are extracted explicitly, enabling level-parallel refresh scheduling.
- Small internal hot paths such as column-list building and LSN comparison were tightened to remove avoidable allocations.
Benchmarking is much easier to use and compare
The performance toolchain was expanded so regressions are easier to spot and large-scale behavior is easier to study.
- Benchmarks now support per-cycle output, optional `EXPLAIN ANALYZE` capture, larger 1M-row runs, and more stable Criterion settings.
- New tooling covers cross-run comparison, concurrent writers, and extra query shapes such as window, lateral, CTE, and `UNION ALL` workloads.
- `just bench-docker` makes it easier to run Criterion inside the builder image when local linking is awkward.
Changed
Internal Code Quality: Integration Test Suite Hardening
Completed a full hardening pass of the integration test suite, bringing all items in PLAN_TEST_EVALS_INTEGRATION.md to done:
- Multiset validation — Extracted an `assert_sets_equal()` helper relying on EXCEPT/UNION ALL SQL logic and applied it to workflow tests to ensure storage table state correctly matches the defining query post-refresh.
- Round-trip notifications — `pg_trickle_alert` notifications now verify receipt end-to-end via `sqlx::PgListener`.
- DVM operators — Added unit coverage for complex semi/anti-join behaviors (multi-column, filtered, complementary), multi-table join chains for inner and full joins, and `proptest!` fuzz tests enforcing generated SQL invariants across INNER, SEMI, and ANTI joins.
- Resilience and edge cases — Test coverage for stream table drop cascades verifying dependent object removal, exact error escalation thresholds, and scheduler job lifecycles across queued mock states.
- Cleanups — Standardized naming practices (`test_workflow_*`, `test_infra_*`) and eliminated clock-bound flakes by widening staleness assertions.
Internal low-level code is much safer to audit
This release cuts the amount of low-level unsafe Rust in half without
changing behavior.
- Unsafe blocks were reduced by 51%, from 1,309 to 641.
- Repeated patterns were consolidated into a small set of documented helper functions.
- 37 internal functions no longer need to be marked `unsafe`.
- Existing unit tests continued to pass unchanged after the refactor.
[0.6.0] — 2026-03-14
Added
Idempotent DDL (create_or_replace)
New one-call function for deploying stream tables without worrying about whether they already exist. Replaces the old "check if it exists, then drop and recreate" pattern.
- `create_or_replace_stream_table()` — a single function that does the right thing automatically:
  - Creates the stream table if it doesn't exist yet.
  - Does nothing if the stream table already exists with the same query and settings (logs an INFO so you know it was a no-op).
  - Updates settings (schedule, refresh mode, etc.) if only config changed.
  - Replaces the query if the defining query changed — including automatic schema migration and a full refresh.
- dbt uses it automatically. The `stream_table` materialization now calls `create_or_replace_stream_table()` when running against pg_trickle 0.6.0+, with automatic fallback for older versions.
- Whitespace-insensitive. Cosmetic SQL differences (extra spaces, tabs, newlines) are correctly treated as no-ops — won't trigger unnecessary rebuilds.
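A deployment sketch, reusing the `active_orders` example from the introduction (any parameters beyond those shown in this changelog are assumptions):

```sql
-- Safe to run on every deploy: no-op, settings update, or full replace,
-- depending on what actually changed.
SELECT pgtrickle.create_or_replace_stream_table(
    name     => 'active_orders',
    query    => 'SELECT * FROM orders WHERE status = ''active''',
    schedule => '30s'
);
```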
dbt Integration Enhancements
- Check stream table health from dbt. The new `pgtrickle_stream_table_status()` macro returns whether a stream table is healthy, stale, erroring, or paused. Pair it with the new built-in `stream_table_healthy` test in your `schema.yml` to fail CI when a stream table is behind or broken.
- Refresh everything in the right order. The new `refresh_all_stream_tables` run-operation refreshes all dbt-managed stream tables in dependency order. Run it after `dbt run` and before `dbt test` in your CI pipeline.
Partitioned Source Tables
Stream tables now work with PostgreSQL's declarative table partitioning — RANGE, LIST, and HASH partitioned tables all work as sources out of the box.
- Changes in any partition are captured automatically. CDC triggers fire on the parent table so inserts, updates, and deletes in any child partition are picked up.
- ATTACH PARTITION triggers automatic rebuild. When you attach a new partition, pg_trickle detects the structural change and rebuilds affected stream tables to include the new partition's pre-existing data.
- WAL mode works with partitions. Publications are configured with `publish_via_partition_root = true`, so all partitions report changes under the parent table's identity.
- New tutorial covering partitioned source tables, ATTACH/DETACH behavior, and known caveats (`docs/tutorials/PARTITIONED_TABLES.md`).
Circular Dependency Foundation
Lays the groundwork for stream tables that reference each other in a cycle (A → B → A). The actual cyclic refresh execution is planned for v0.7.0 — this release adds the detection, validation, and safety infrastructure.
- Cycle detection. pg_trickle can now identify groups of stream tables that form circular dependencies.
- Safety checks at creation time. Queries that can't safely participate in a cycle (those using aggregates, EXCEPT, window functions, or NOT EXISTS) are rejected with a clear error explaining why.
- New settings:
  - `pg_trickle.allow_circular` (default: off) — master switch for circular dependencies.
  - `pg_trickle.max_fixpoint_iterations` (default: 100) — prevents runaway loops.
Source Gating Improvements
- `bootstrap_gate_status()` function. Shows which sources are currently gated, when they were gated, how long the gate has been active, and which stream tables are waiting. Useful for debugging "why isn't my stream table refreshing?"
- ETL coordination cookbook. The SQL Reference now includes five step-by-step recipes for common bulk-load patterns.
More SQL Patterns Supported
Two query patterns that previously required workarounds now just work:
- Window functions inside expressions. Queries like `CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN 'top' ELSE 'other' END` or `COALESCE(SUM() OVER (...), 0)` are now accepted and produce correct results. Use FULL refresh mode for these queries — incremental (DIFFERENTIAL) refresh of window-in-expression patterns is not yet supported. Previously, the query was rejected entirely at creation time.
- `ALL (subquery)` comparisons. Queries like `WHERE price < ALL (SELECT price FROM competitors)` are now accepted in both FULL and DIFFERENTIAL modes. Supports all comparison operators (`>`, `>=`, `<`, `<=`, `=`, `<>`) and correctly handles NULL values per the SQL standard.
Operational Safety Improvements
- Function changes detected automatically. If a stream table's query calls a user-defined function and you update that function with `CREATE OR REPLACE FUNCTION`, pg_trickle detects the change and automatically rebuilds the stream table on the next cycle. No manual intervention needed.
- WAL mode explains why it isn't activating. When `cdc_mode = 'auto'` and the system stays on trigger-based tracking, the scheduler now periodically logs the exact reason (e.g., "`wal_level` is not `logical`") and `check_cdc_health()` reports the current mode so you can diagnose the issue.
- WAL + keyless tables rejected early. Creating a stream table with `cdc_mode = 'wal'` on a table that has no primary key and no `REPLICA IDENTITY FULL` is now rejected at creation time with a clear error — instead of silently producing incomplete results later.
- Automatic recovery after backup/restore. When a PostgreSQL server is restored from `pg_basebackup`, WAL replication slots are lost. pg_trickle now detects the missing slot, automatically falls back to trigger-based tracking, and logs a WARNING so you know what happened.
Documentation
- ALL (subquery) worked example in the SQL Reference with sample data and expected results.
- Window-in-expression documentation showing before/after examples of the automatic rewrite.
- Foreign table sources tutorial — step-by-step guide for using `postgres_fdw` foreign tables as stream table sources.
Fixed
- `create_or_replace` whitespace handling. Extra spaces, tabs, and newlines in queries no longer trigger unnecessary rebuilds.
- `create_or_replace` schema incompatibility detection. Incompatible column type changes (e.g., text → integer) are now properly detected and handled.
[0.5.0] — 2026-03-13
Added
Row-Level Security (RLS) Support
Stream tables now work correctly with PostgreSQL's Row-Level Security feature, which lets you control which rows different users can see.
- Refreshes always see all data. When a stream table is refreshed, it computes the full result regardless of RLS policies on the source tables. This matches how PostgreSQL's built-in materialized views work. You then add RLS policies directly on the stream table to control who can read what.
- Internal tables are protected. The internal change-tracking tables used by pg_trickle are shielded from RLS interference, so refreshes won't silently fail if you turn on RLS at the schema level.
- Real-time (IMMEDIATE) mode secured. Triggers that keep stream tables updated in real time now run with elevated privileges and a locked-down search path, preventing data corruption or security bypasses.
- RLS changes are detected automatically. If you enable, disable, or force RLS on a source table, pg_trickle detects the change and marks affected stream tables for a full rebuild.
- New tutorial. Step-by-step guide for setting up per-tenant RLS policies on stream tables (see `docs/tutorials/ROW_LEVEL_SECURITY.md`).
Source Gating for Bulk Loads
New pause/resume mechanism for large data imports. When you're loading a big batch of data into a source table, you can temporarily "gate" it to prevent the background scheduler from triggering refreshes mid-load. Once the load is done, ungate it and everything catches up in a single refresh.
- `gate_source('my_table')` — pauses automatic refreshes for any stream table that depends on `my_table`.
- `ungate_source('my_table')` — resumes automatic refreshes. All changes made during the gate are picked up in the next refresh cycle.
- `source_gates()` — shows which source tables are currently gated, when they were gated, and by whom.
- Manual refresh still works. Even while a source is gated, you can explicitly call `refresh_stream_table()` if needed.
- Gating is idempotent — calling `gate_source()` twice is safe, and gating a source that's already gated is a no-op.
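A bulk-load sketch using these functions (the table name, file path, and `pgtrickle` schema qualification are illustrative):

```sql
SELECT pgtrickle.gate_source('orders');      -- pause dependent refreshes

COPY orders FROM '/tmp/orders_backfill.csv' WITH (FORMAT csv);

SELECT pgtrickle.ungate_source('orders');    -- catch up in one refresh
SELECT * FROM pgtrickle.source_gates();      -- verify nothing is still gated
```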
Append-Only Fast Path
Significant performance improvement for tables that only receive INSERTs
(event logs, audit trails, time-series data, etc.). When you mark a stream
table as append_only, refreshes skip the expensive merge logic (checking
for deletes, updates, and row comparisons) and use a simple, fast insert.
- How to use: Pass `append_only => true` when creating or altering a stream table.
- Safe fallback. If a DELETE or UPDATE is detected on a source table, the extension automatically falls back to the standard refresh path and logs a warning. It won't silently produce wrong results.
- Restrictions. Append-only mode requires DIFFERENTIAL refresh mode and source tables with primary keys.
Usability Improvements
- Manual refresh history. When you manually call `refresh_stream_table()`, the result (success or failure, timing, rows affected) is now recorded in the refresh history, just like scheduled refreshes.
- `quick_health` view. A single-row health summary showing how many stream tables you have, how many are in error or stale, whether the scheduler is running, and an overall status (`OK`, `WARNING`, `CRITICAL`). Easy to plug into monitoring dashboards.
- `create_stream_table_if_not_exists()`. A convenience function that does nothing if the stream table already exists, instead of raising an error. Makes migration scripts and deployment automation simpler.
Smooth Upgrade from 0.4.0
- Existing installations can upgrade with `ALTER EXTENSION pg_trickle UPDATE TO '0.5.0'`. All new features (source gating, append-only mode, quick health view, and the new convenience functions) are included in the upgrade script.
- The upgrade has been verified with automated tests that confirm all 40 SQL objects survive the upgrade intact.
[0.4.0] — 2026-03-12
Added
Parallel Refresh (opt-in)
Stream tables can now be refreshed in parallel, using multiple background workers instead of processing them one at a time. This can dramatically reduce end-to-end refresh latency when you have many independent stream tables.
- Off by default. Set `pg_trickle.parallel_refresh_mode = 'on'` to enable. Use `'dry_run'` to preview what the scheduler would do without changing behavior.
- Automatic dependency awareness. The scheduler figures out which stream tables can safely refresh at the same time and which must wait for others. Stream tables connected by real-time (IMMEDIATE) triggers are always refreshed together to prevent race conditions.
- Atomic groups. When a group of stream tables must succeed or fail together (e.g. diamond dependencies), all members are wrapped in a single transaction — if one fails, the whole group rolls back cleanly.
- Worker pool controls:
  - `pg_trickle.max_dynamic_refresh_workers` (default 4) — cluster-wide cap on concurrent refresh workers.
  - `pg_trickle.max_concurrent_refreshes` — per-database dispatch cap.
- Monitoring:
  - `worker_pool_status()` — shows how many workers are active and the current limits.
  - `parallel_job_status(max_age_seconds)` — lists recent and active refresh jobs with timing and status.
  - `health_check()` now warns when the worker pool is saturated or the job queue is backing up.
- Self-healing. On startup, the scheduler automatically cleans up orphaned jobs and reclaims leaked worker slots from previous crashes.
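A rollout sketch using the GUC and monitoring functions above (assuming the functions live in the `pgtrickle` schema, as elsewhere in this changelog):

```sql
-- Preview first, then flip to 'on'; GUC changes need a config reload
ALTER SYSTEM SET pg_trickle.parallel_refresh_mode = 'dry_run';
SELECT pg_reload_conf();

SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status(300);  -- jobs from the last 5 minutes
```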
Statement-Level CDC Triggers
Change tracking triggers have been upgraded from row-level to statement-level, reducing write-side overhead for bulk INSERT and UPDATE operations. This is now the default for all new and existing stream tables. A benchmark harness is included so you can measure the difference on your own hardware.
dbt Getting Started Example
New `examples/dbt_getting_started/` project with a complete, runnable dbt
example showing org-chart seed data, staging views, and three stream table
models. Includes an automated test script.
Fixed
Refresh Lock Not Released After Errors
Fixed a bug where refresh_stream_table() could get permanently stuck after
a PostgreSQL error (e.g. running out of temp file space). The internal lock
was session-level and survived transaction rollback, causing all future
refreshes for that stream table to report "another refresh is already in
progress". Refresh locks are now transaction-level, so they are automatically
released when the transaction ends — whether it succeeds or fails.
dbt Integration Fixes
- Fixed query quoting in dbt macros that broke when queries contained single quotes.
- Fixed `schedule = none` in dbt being incorrectly mapped to SQL NULL.
- Fixed view inlining when the same view was referenced with different aliases.
Changed
Updated to PostgreSQL 18.3 across CI and test infrastructure.
-
Dependency updates:
tokio1.49 → 1.50 and several GitHub Actions bumps.
Breaking Changes
These behavioural changes shipped in v0.4.0. They improve usability but may require action from users upgrading from v0.3.0.
- Schedule default changed from `'1m'` to `'calculated'`. `create_stream_table` now defaults to `schedule => 'calculated'`, which auto-computes the refresh interval from downstream dependents instead of refreshing every minute. If you relied on the implicit 1-minute default, explicitly pass `schedule => '1m'` to preserve the old behaviour.
- `NULL` schedule input rejected. Passing `schedule => NULL` to `create_stream_table` now returns an error. Use `schedule => 'calculated'` instead — it's explicit and self-documenting.
- Diamond GUCs removed. The cluster-wide GUCs `pg_trickle.diamond_consistency` and `pg_trickle.diamond_schedule_policy` have been removed. Diamond behaviour is now controlled per table via parameters on `create_stream_table()` / `alter_stream_table()`: `diamond_consistency => 'atomic'`, `diamond_schedule_policy => 'slowest'`.
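For users upgrading from v0.3.0, the adjustments might look like the following sketch (parameter names as listed above; the exact call shapes and the table name are illustrative):

```sql
-- Keep the old fixed 1-minute interval by making it explicit
SELECT pgtrickle.create_stream_table(
  name     => 'order_totals',
  query    => 'SELECT customer_id, SUM(amount) AS total
               FROM orders GROUP BY customer_id',
  schedule => '1m'  -- previously the implicit default
);

-- Diamond behaviour moves from cluster-wide GUCs to per-table parameters
SELECT pgtrickle.alter_stream_table(
  name                    => 'order_totals',
  diamond_consistency     => 'atomic',
  diamond_schedule_policy => 'slowest'
);
```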
[0.3.0] — 2026-03-11
This is a correctness and hardening release. No new SQL functions, tables, or
views were added — all changes are in the compiled extension code.
ALTER EXTENSION pg_trickle UPDATE is safe and a no-op for schema objects.
Fixed
Incremental Correctness Fixes
All 18 previously-disabled correctness tests have been re-enabled (0 remaining). The following query patterns now produce correct results during incremental (non-full) refreshes:
- HAVING clause threshold crossing. Queries with `HAVING` filters (e.g. `HAVING SUM(amount) > 100`) now produce correct totals when groups cross the threshold. Previously, a group gaining enough rows to meet the condition would show only the newly added values instead of the correct total.
- FULL OUTER JOIN. Five bugs affecting incremental updates for `FULL OUTER JOIN` queries are fixed: mismatched row identifiers, incorrect handling of compound GROUP BY expressions like `COALESCE(left.col, right.col)`, and wrong NULL handling for SUM aggregates.
- EXISTS with HAVING subqueries. Queries using `WHERE EXISTS (... GROUP BY ... HAVING ...)` now work correctly — the inner GROUP BY and HAVING were previously being silently discarded.
- Correlated scalar subqueries. Correlated subqueries in SELECT, like `(SELECT MAX(e.salary) FROM emp e WHERE e.dept_id = d.id)`, are now automatically rewritten into LEFT JOINs so the incremental engine can handle them correctly.
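The correlated-subquery rewrite can be pictured as follows; this is a conceptual sketch, not the engine's literal output:

```sql
-- Original: correlated scalar subquery evaluated per row of dept
SELECT d.name,
       (SELECT MAX(e.salary) FROM emp e WHERE e.dept_id = d.id) AS top_salary
FROM dept d;

-- Equivalent LEFT JOIN form the incremental engine can maintain
SELECT d.name, agg.top_salary
FROM dept d
LEFT JOIN (
  SELECT e.dept_id, MAX(e.salary) AS top_salary
  FROM emp e
  GROUP BY e.dept_id
) agg ON agg.dept_id = d.id;
```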
Background Worker Detection on PostgreSQL 18
Fixed a bug where health_check() and the scheduler reported zero active
workers on PostgreSQL 18 due to a column name change in system views.
Scheduler Stability
Fixed a loop where the scheduler launcher could get stuck retrying failed database probes indefinitely instead of backing off properly.
Added
Security Tooling
Added static security analysis to the CI pipeline:
- GitHub CodeQL — automated security scanning across all Rust source files. First scan: zero findings.
- `cargo deny` — enforces a license allow-list and flags unmaintained or yanked dependencies.
- Semgrep — custom rules that flag potentially dangerous patterns such as dynamic SQL construction and privilege escalation. Advisory-only (does not block merges).
- Unsafe block inventory — CI tracks the count of unsafe code blocks per file and fails if any file exceeds its baseline, preventing unreviewed growth of low-level code.
[0.2.3] — 2026-03-09
Added
- Unsafe function detection. Queries using non-deterministic functions like `random()` or `clock_timestamp()` are now rejected when creating incremental stream tables, because they can't produce reliable results. Functions like `now()` that return the same value within a transaction are allowed with a warning.
- Per-table change tracking mode. You can now choose how each stream table tracks changes (`'auto'`, `'trigger'`, or `'wal'`) via the `cdc_mode` parameter on `create_stream_table()` and `alter_stream_table()`, instead of relying only on the global setting.
- CDC status view. The new `pgtrickle.pgt_cdc_status` view shows the change tracking mode, replication slot, and transition status for every source table in one place.
- Configurable WAL lag thresholds. The warning and critical thresholds for replication slot lag are now configurable via `pg_trickle.slot_lag_warning_threshold_mb` (default 100 MB) and `pg_trickle.slot_lag_critical_threshold_mb` (default 1024 MB), instead of being hard-coded.
- `pg_trickle_dump` backup tool. A new standalone CLI that exports all your stream table definitions as replayable SQL, ordered by dependency. Useful for backups before upgrades or migrations.
- Upgrade path. `ALTER EXTENSION pg_trickle UPDATE` picks up all new features from this release.
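A sketch of the new per-table controls in use (the table name and threshold value are illustrative):

```sql
-- Pin one high-traffic source to WAL-based change tracking
SELECT pgtrickle.create_stream_table(
  name     => 'events_by_hour',
  query    => 'SELECT date_trunc(''hour'', ts) AS hour, count(*) AS n
               FROM events GROUP BY 1',
  cdc_mode => 'wal'
);

-- Inspect change tracking state for every source table
SELECT * FROM pgtrickle.pgt_cdc_status;

-- Raise the WAL lag warning threshold from the 100 MB default to 256 MB
SET pg_trickle.slot_lag_warning_threshold_mb = 256;
```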
Changed
- After a full refresh, WAL replication slots are now advanced to the current position, preventing unnecessary WAL accumulation and false lag alarms.
- Change buffers are now flushed after a full refresh, fixing a cycle where the scheduler would alternate endlessly between incremental and full refreshes on bulk-loaded tables.
- IMMEDIATE mode now correctly rejects explicit WAL CDC requests with a clear error, since real-time mode uses its own trigger mechanism.
- The `pg_trickle.user_triggers` setting is simplified to `auto` and `off`. The old `on` value still works as an alias for `auto`.
- CI pipelines are faster on PRs — only essential tests run; the full suite runs on merge and on a daily schedule.
[0.2.2] — 2026-03-08
Added
- Change a stream table's query. `alter_stream_table` now accepts a `query` parameter, so you can change what a stream table computes without dropping and recreating it. If the new query's columns are compatible, the underlying storage table is preserved — existing views, policies, and publications continue to work.
- AUTO refresh mode (new default). Stream tables now default to `AUTO` mode, which uses fast incremental updates when the query supports it and automatically falls back to a full recompute when it doesn't. You no longer need to think about whether your query is "incremental-compatible" — just create the stream table and it picks the best strategy.
- Version mismatch warning. The background scheduler now warns if the installed extension version doesn't match the compiled library, making it easier to spot a half-finished upgrade.
- ORDER BY + LIMIT + OFFSET. You can now page through top-N results, e.g. `ORDER BY revenue DESC LIMIT 10 OFFSET 20` to get the third page of top earners.
- Real-time mode: recursive queries. `WITH RECURSIVE` queries (e.g. org-chart hierarchies) now work in IMMEDIATE mode. A depth limit (default 100) prevents infinite loops.
- Real-time mode: top-N queries. `ORDER BY ... LIMIT N` queries now work in IMMEDIATE mode — the top-N rows are recomputed on every data change. Maximum N is controlled by `pg_trickle.ivm_topk_max_limit` (default 1000).
- Foreign table support. Stream tables can now use foreign tables as sources. Changes are detected by comparing snapshots, since foreign tables don't support triggers. Enable with `pg_trickle.foreign_table_polling = on`.
- Documentation reorganization. Configuration and SQL reference docs are reorganized around practical workflows. New sections cover DDL-during-refresh behavior, standby/replica limitations, and PgBouncer constraints.
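Changing a stream table's query in place might look like this (a column-compatible change; sketch only):

```sql
-- Swap the defining query without dropping the stream table; because the
-- output columns are unchanged, the storage table (and any dependent
-- views, policies, or publications) is preserved.
SELECT pgtrickle.alter_stream_table(
  name  => 'active_orders',
  query => 'SELECT * FROM orders WHERE status = ''active'' AND amount > 0'
);
```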
Changed
- Default refresh mode changed from `'DIFFERENTIAL'` to `'AUTO'`.
- Default schedule changed from `'1m'` to `'calculated'` (automatic).
- Default change tracking mode changed from `'trigger'` to `'auto'` — WAL-based tracking starts automatically when available, with trigger-based as fallback.
[0.2.1] — 2026-03-05
Added
- Safe upgrades. New upgrade infrastructure ensures that `ALTER EXTENSION pg_trickle UPDATE` works correctly. A CI check detects missing functions or views in upgrade scripts, and automated tests verify that stream tables survive version-to-version upgrades intact. See docs/UPGRADING.md for the upgrade guide.
- ORDER BY + LIMIT + OFFSET. You can now create stream tables over paged results, like "the second page of the top-100 products by revenue" (`ORDER BY revenue DESC LIMIT 100 OFFSET 100`).
- `'calculated'` schedule. Instead of passing SQL `NULL` to request automatic scheduling, you can now write `schedule => 'calculated'`. Passing `NULL` now gives a helpful error message.
- Documentation expansion. Six new pages in the online book covering dbt integration, contributing guidelines, security policy, release process, and research comparisons with other projects.
- Better warnings and safety checks:
  - Warning when a source table lacks a primary key (duplicate rows are handled safely but less efficiently).
  - Warning when using `SELECT *` (new columns added later can break incremental updates).
  - Alert when the refresh queue is falling behind (> 80% capacity).
  - Guard triggers prevent accidental direct writes to stream table storage.
  - Automatic fallback from WAL to trigger-based change tracking when the replication slot disappears.
  - Nested window functions and complex `WHERE` clauses with `EXISTS` are now handled automatically.
- Change buffer partitioning. For high-throughput tables, change buffers can now be partitioned so that processed data is dropped efficiently.
- Column pruning. The incremental engine now skips source columns not used in the query, reducing I/O for wide tables.
Changed
- Default `schedule` changed from `'1m'` to `'calculated'` (automatic).
- Minimum schedule interval lowered from 60 s to 1 s.
- Cluster-wide diamond consistency settings removed; per-table settings remain and now default to `'atomic'` / `'fastest'`.
Fixed
- The 0.1.3 → 0.2.0 upgrade script was accidentally a no-op, silently skipping 11 new functions. Fixed.
- Queries combining `WITH` (CTEs) and `UNION ALL` now parse correctly.
[0.2.0] — 2026-03-04
Added
- Monitoring & health checks. Six new functions for inspecting your stream tables at runtime (no superuser required):
  - `change_buffer_sizes()` — shows how much pending change data each stream table has queued up.
  - `list_sources(name)` — lists all base tables that feed a given stream table, with row counts and size estimates.
  - `dependency_tree()` — displays an ASCII tree of how your stream tables depend on each other.
  - `health_check()` — quick system triage that checks whether the scheduler is running, flags tables in error or stale states, and warns about large change buffers or WAL lag.
  - `refresh_timeline()` — recent refresh history across all stream tables, showing timing, row counts, and any errors.
  - `trigger_inventory()` — verifies that all required change-tracking triggers are in place and enabled.
- IMMEDIATE refresh mode (real-time updates). The new `'IMMEDIATE'` mode keeps stream tables updated within the same transaction as your data changes. There's no delay — the stream table reflects changes the instant they happen. Supports window functions, LATERAL joins, scalar subqueries, and aggregate queries. You can switch between IMMEDIATE and other modes at any time using `alter_stream_table`.
- Top-N queries (ORDER BY + LIMIT). Queries like `SELECT ... ORDER BY score DESC LIMIT 10` are now supported. The stream table stores only the top N rows and updates them efficiently.
- Diamond dependency consistency. When multiple stream tables share common sources and feed into the same downstream table (a "diamond" pattern), they can now be refreshed as an atomic group — either all succeed or all roll back. This prevents inconsistent reads at convergence points. Controlled via the `diamond_consistency` parameter (default: `'atomic'`).
- Multi-database auto-discovery. The background scheduler now automatically finds and services all databases on the server where pg_trickle is installed. No manual `pg_trickle.database` configuration is required — just install the extension and the scheduler discovers it.
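A quick triage session with these functions could look like the following (invocation style is illustrative; per the list above, none of them require superuser):

```sql
SELECT * FROM pgtrickle.health_check();         -- scheduler up? stale or errored tables?
SELECT * FROM pgtrickle.change_buffer_sizes();  -- pending change volume per stream table
SELECT * FROM pgtrickle.dependency_tree();      -- ASCII view of the stream table DAG
SELECT * FROM pgtrickle.refresh_timeline();     -- recent refreshes: timing, rows, errors
SELECT * FROM pgtrickle.trigger_inventory();    -- are all CDC triggers in place?
```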
Fixed
- Fixed IMMEDIATE mode incorrectly trying to read from change buffer tables (which don't exist in that mode) for certain aggregate queries.
- Fixed type mismatches when join queries had unchanged source tables producing empty change sets.
- Fixed join condition column order being swapped when the right-side table was written first in the `ON` clause (e.g. `ON r.id = l.id`).
- Fixed dbt macros silently rolling back stream table creation because dbt wraps statements in a `ROLLBACK` by default.
- Fixed `LIMIT ALL` being incorrectly rejected as an unsupported LIMIT clause.
- Fixed false "query may produce incorrect incremental results" warnings on simple arithmetic like `depth + 1` or `path || name`.
- Fixed auto-created indexes using the wrong column name when the query had a column alias (e.g. `SELECT id AS department_id`).
[0.1.3] — 2026-03-02
Major hardening release with 50 improvements across correctness, robustness, operational safety, and test coverage.
Added
- DDL change tracking expanded. `ALTER TYPE`, `ALTER POLICY`, and `ALTER DOMAIN` on source tables are now detected and trigger a rebuild of affected stream tables. Previously only column changes were tracked.
- Recursive query safety guard. Recursive CTEs (`WITH RECURSIVE`) are now checked for non-monotonic terms that could produce incorrect incremental results.
- Read replica awareness. The background scheduler detects when it's running on a read replica and skips refresh work, preventing errors.
- Range aggregates rejected. `RANGE_AGG` and `RANGE_INTERSECT_AGG` are now properly rejected in incremental mode with a clear error.
- Refresh history: row counts. Refresh history now records how many rows were inserted, updated, and deleted in each refresh cycle.
- Change buffer alerts. The new `pg_trickle.buffer_alert_threshold` setting lets you configure when to be warned about growing change buffers.
- `st_auto_threshold()` function. Shows the current adaptive threshold that decides when to switch between incremental and full refresh.
- Wide table optimization. Tables with more than 50 columns use a hash shortcut during refresh merges, improving performance.
- Change buffer security. Internal change buffer tables are no longer accessible to `PUBLIC`.
- Documentation. PgBouncer compatibility, keyless table limitations, delta memory bounds, sequential processing rationale, and connection overhead are all now documented in the FAQ.
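The new operational knobs in use (the threshold value is illustrative, and `st_auto_threshold()` taking the stream table name as its argument is an assumption):

```sql
-- Configure when to warn about growing change buffers (value illustrative)
SET pg_trickle.buffer_alert_threshold = 100000;

-- Inspect the adaptive incremental-vs-full refresh cutover
SELECT pgtrickle.st_auto_threshold('active_orders');
```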
TPC-H Correctness Suite: 22/22 Queries Passing
The TPC-H-derived correctness test suite (22 industry-standard analytical queries) now passes completely across multiple rounds of data changes. This validates that incremental refreshes produce identical results to full recomputation for complex real-world query patterns.
Fixed
Window Function Correctness
Fixed incremental maintenance of window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG/LEAD, SUM OVER, etc.) to correctly handle:
- Non-RANGE frame types
- Ranking functions over tied values
- Window functions wrapping aggregates (e.g. `RANK() OVER (ORDER BY SUM(x))`)
- Multiple window functions with different PARTITION BY clauses
INTERSECT / EXCEPT Correctness
Fixed incremental maintenance of INTERSECT and EXCEPT queries that
produced wrong results due to invalid SQL generation.
EXISTS / IN with OR Correctness
Fixed EXISTS and IN subqueries combined with OR in WHERE clauses that
produced wrong results.
Aggregate Correctness
- `MIN`/`MAX` now correctly rescan the source table when the current minimum or maximum value is deleted.
- `STRING_AGG(... ORDER BY ...)` and `ARRAY_AGG(... ORDER BY ...)` no longer silently drop the ORDER BY clause.
[0.1.2] — 2026-02-28
Changed
Project Renamed from pg_stream to pg_trickle
Renamed the entire project from pg_stream to pg_trickle to avoid a
naming collision with an unrelated project. If you were using the old name,
all configuration prefixes changed from pg_stream.* to pg_trickle.*, and
the SQL schemas changed from pgstream to pgtrickle. The "stream tables"
terminology is unchanged.
Fixed
Fixed numerous incremental computation bugs discovered while building a comprehensive correctness test suite based on all 22 TPC-H analytical queries:
- Inner join double-counting. When both sides of a join had changes in the same refresh cycle, some rows were counted twice.
- Shared source cleanup. Cleaning up processed changes for one stream table could accidentally delete entries still needed by another stream table sharing the same source.
- Scalar aggregate identity mismatch. Queries like `SELECT SUM(amount) FROM orders` could produce mismatched row identifiers between the incremental and merge phases. AVG also failed to recompute correctly after partial group changes.
- EXISTS / NOT EXISTS snapshots. Incremental maintenance of `EXISTS` and `NOT EXISTS` subqueries missed pre-change state, producing wrong results.
- Column resolution in complex joins. Several fixes for column name resolution in multi-table joins and nested subqueries.
- COUNT(*) rendering. `COUNT(*)` was sometimes rendered as `COUNT()` (missing the star), causing SQL errors.
- Subquery rewriting. Several subquery patterns (correlated vs. non-correlated scalar subqueries, derived tables in FROM) were incorrectly rewritten, blocking certain queries from being created.
- Cleanup worker crash. The background cleanup worker no longer crashes when it encounters entries for stream tables that were dropped mid-cycle.
Added
TPC-H Correctness Test Suite
Added a comprehensive correctness test suite based on all 22 TPC-H analytical queries. These tests verify that incremental refreshes produce identical results to a full recompute after INSERT, DELETE, and UPDATE mutations. 20 of 22 queries can be created as stream tables; 15 pass full correctness checks at this point (improved to 22/22 in v0.1.3).
[0.1.1] — 2026-02-26
Changed
CloudNativePG Extension Image
Replaced the full PostgreSQL Docker image (~400 MB) with a minimal extension-only image (< 10 MB) following the CloudNativePG Image Volume Extensions specification. This means faster pulls and less disk usage in Kubernetes deployments. The image contains just the extension files — no full PostgreSQL server.
[0.1.0] — 2026-02-26
Initial release of pg_trickle — a PostgreSQL extension that keeps query results automatically up to date as your data changes.
Core Concept
Define a SQL query and a schedule. pg_trickle creates a stream table that stores the query's results and keeps them fresh — either on a schedule (every N seconds) or in real time. When data in your source tables changes, only the affected rows are recomputed instead of re-running the entire query.
What You Can Do
- Create stream tables from `SELECT` queries — joins, aggregates, subqueries, CTEs, window functions, set operations, and more.
- Automatic refresh — a background scheduler refreshes stream tables in dependency order. You can also trigger refreshes manually.
- Incremental updates — the engine automatically figures out how to update only the rows that changed, instead of recomputing everything. This works for most query patterns, including multi-table joins and aggregates.
- Views as sources — views referenced in your query are automatically expanded so change tracking works on the underlying tables.
- Tables without primary keys — supported via content hashing. Tables with primary keys get better performance.
- Hybrid change tracking — starts with lightweight triggers (no special PostgreSQL configuration needed). Can automatically switch to WAL-based tracking for lower overhead when `wal_level = logical` is available.
- Multi-database support — the scheduler automatically discovers all databases on the server where the extension is installed.
- User triggers on stream tables — your own `AFTER` triggers on stream tables fire correctly during incremental refreshes.
- DDL awareness — `ALTER TABLE`, `DROP TABLE`, `CREATE OR REPLACE FUNCTION`, and other DDL on source tables or functions used in your query are detected and handled automatically.
SQL Support
Broad coverage of SQL features:
- Joins: INNER, LEFT, RIGHT, FULL OUTER, NATURAL, LATERAL subqueries, LATERAL set-returning functions (`unnest`, `jsonb_array_elements`, etc.)
- Aggregates: 39 functions including COUNT, SUM, AVG, MIN, MAX, STRING_AGG, ARRAY_AGG, JSON_ARRAYAGG, JSON_OBJECTAGG, statistical regression functions (CORR and the COVAR_ and REGR_ families), and ordered-set aggregates (MODE, PERCENTILE_CONT, PERCENTILE_DISC)
- Window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, SUM OVER, etc., with full frame clause support
- Set operations: UNION, UNION ALL, INTERSECT, EXCEPT
- Subqueries: in FROM, EXISTS/NOT EXISTS, IN/NOT IN, scalar subqueries
- CTEs: `WITH` and `WITH RECURSIVE`
- Special syntax: DISTINCT, DISTINCT ON, GROUPING SETS / CUBE / ROLLUP, CASE WHEN, COALESCE, JSON_TABLE (PostgreSQL 17+)
- Unsafe function detection: queries using non-deterministic functions like `random()` are rejected with a clear error
Monitoring
- `explain_st()` — shows the incremental computation plan
- `st_refresh_stats()`, `get_refresh_history()`, `get_staleness()` — refresh performance and status
- `slot_health()` — WAL replication slot health
- `check_cdc_health()` — change tracking health per source table
- `stream_tables_info` and `pg_stat_stream_tables` views
- NOTIFY alerts for stale data, errors, and refresh events
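For example (function names from the list above; exact signatures and return shapes are illustrative):

```sql
SELECT pgtrickle.explain_st('active_orders');  -- incremental computation plan
SELECT * FROM pgtrickle.get_staleness();       -- how far behind is each stream table?
SELECT * FROM pgtrickle.slot_health();         -- WAL replication slot status
LISTEN pg_trickle_alert;                       -- subscribe to stale/error notifications
```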
Documentation
- Architecture guide, SQL reference, configuration reference, FAQ, getting-started tutorial, and deep-dive tutorials.
Known Limitations
- `TABLESAMPLE`, `LIMIT`/`OFFSET`, `FOR UPDATE`/`FOR SHARE` — not yet supported (clear error messages).
- Window functions inside expressions (e.g. `CASE WHEN ROW_NUMBER() ...`) — not yet supported.
- Circular stream table dependencies — not yet supported.
pg_trickle — Project Roadmap
Last updated: 2026-04-13
Latest release: 0.19.0 (2026-04-13)
Current milestone: v0.21.0 — PostgreSQL 17 Support
For a concise description of what pg_trickle is and why it exists, read
ESSENCE.md — it explains the core problem (full REFRESH MATERIALIZED VIEW recomputation), how the differential dataflow approach
solves it, the hybrid trigger→WAL CDC architecture, and the broad SQL
coverage, all in plain language.
Table of Contents
- Overview
- v0.1.x Series — Released
- v0.2.0 — TopK, Diamond Consistency & Transactional IVM
- v0.2.1 — Upgrade Infrastructure & Documentation
- v0.2.2 — OFFSET, AUTO Mode, ALTER QUERY, Edge Cases & CDC Hardening
- v0.2.3 — Non-Determinism, CDC/Mode Gaps & Operational Polish
- v0.3.0 — DVM Correctness, SAST & Test Coverage
- v0.4.0 — Parallel Refresh & Performance Hardening
- v0.5.0 — Row-Level Security & Operational Controls
- v0.6.0 — Partitioning, Idempotent DDL, Edge Cases & Circular Dependency Foundation
- v0.7.0 — Performance, Watermarks, Circular DAG Execution, Observability & Infrastructure
- v0.8.0 — pg_dump Support & Test Hardening
- v0.9.0 — Incremental Aggregate Maintenance
- v0.10.0 — DVM Hardening, Connection Pooler Compatibility, Core Refresh Optimizations & Infrastructure Prep
- v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness
- v0.12.0 — Correctness, Reliability & Developer Tooling
- v0.13.0 — Scalability Foundations, Partitioning Enhancements, MERGE Profiling & Multi-Tenant Scheduling
- v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics
- v0.15.0 — External Test Suites & Integration
- v0.16.0 — Performance & Refresh Optimization
- v0.17.0 — Query Intelligence & Stability
- v0.18.0 — Hardening & Delta Performance
- v0.19.0 — Production Gap Closure & Distribution
- v0.20.0 — Dogfooding
- v0.21.0 — PostgreSQL 17 Support
- v0.22.0 — PGlite Proof of Concept
- v0.23.0 — Core Extraction (`pg_trickle_core`)
- v0.24.0 — PGlite WASM Extension
- v0.25.0 — PGlite Reactive Integration
- v1.0.0 — Stable Release
- Post-1.0 — Scale, Ecosystem & Platform Expansion
- Effort Summary
- References
Overview
pg_trickle is a PostgreSQL 18 extension that implements streaming tables with incremental view maintenance (IVM) via differential dataflow. The extension is designed for maximum performance, low latency, and high throughput — differential refresh is the default mode, and full refresh is a fallback of last resort. All 13 design phases are complete. This roadmap tracks the path from the v0.1.x series to 1.0 and beyond.
| Version | Theme | Status |
|---|---|---|
| v0.1.x | Core engine, DVM, CDC, scheduling, monitoring | ✅ Released |
| v0.2.0 | TopK, diamond consistency, transactional IVM | ✅ Released |
| v0.2.1 | Upgrade infrastructure & documentation | ✅ Released |
| v0.2.2 | OFFSET, AUTO mode, ALTER QUERY, CDC hardening | ✅ Released |
| v0.2.3 | Non-determinism, CDC/mode gaps, operational polish | ✅ Released |
| v0.3.0 | DVM correctness, SAST & test coverage | ✅ Released |
| v0.4.0 | Parallel refresh & performance hardening | ✅ Released |
| v0.5.0 | Row-level security & operational controls | ✅ Released |
| v0.6.0 | Partitioning, idempotent DDL, circular dependency foundation | ✅ Released |
| v0.7.0 | Performance, watermarks, circular DAG, observability | ✅ Released |
| v0.8.0 | pg_dump support & test hardening | ✅ Released |
| v0.9.0 | Incremental aggregate maintenance | ✅ Released |
| v0.10.0 | DVM hardening, connection pooler compat, refresh optimizations | ✅ Released |
| v0.11.0 | Partitioned stream tables, Prometheus/Grafana, safety hardening | ✅ Released |
| v0.12.0 | Correctness, reliability & developer tooling | ✅ Released |
| v0.13.0 | Scalability foundations, MERGE profiling, multi-tenant scheduling | ✅ Released |
| v0.14.0 | Tiered scheduling, UNLOGGED buffers & diagnostics | ✅ Released |
| v0.15.0 | External test suites & integration | ✅ Released |
| v0.16.0 | Performance & refresh optimization | ✅ Released |
| v0.17.0 | Query intelligence & stability | ✅ Released |
| v0.18.0 | Hardening & delta performance | ✅ Released |
| v0.19.0 | Production gap closure & distribution | ✅ Released |
| v0.20.0 | Dogfooding (pg_trickle monitors itself) | ✅ Released |
| v0.21.0 | PostgreSQL 17 support | Planned |
| v0.22.0 | PGlite proof of concept | Planned |
| v0.23.0 | Core extraction (pg_trickle_core) | Planned |
| v0.24.0 | PGlite WASM extension | Planned |
| v0.25.0 | PGlite reactive integration | Planned |
| v1.0.0 | Stable release (incl. PG 19 compatibility) | Planned |
v0.1.x Series — Released
Completed items (click to expand)
v0.1.0 — Released (2026-02-26)
Status: Released — all 13 design phases implemented.
Core engine, DVM with 21 OpTree operators, trigger-based CDC, DAG-aware scheduling, monitoring, dbt macro package, and 1,300+ tests.
Key additions over pre-release:
- WAL decoder pgoutput edge cases (F4)
- JOIN key column change limitation docs (F7)
- Keyless duplicate-row behavior documented (F11)
- CUBE explosion guard (F14)
v0.1.1 — Released (2026-02-27)
Patch release: WAL decoder keyless pk_hash fix (F2), old_* column population
for UPDATEs (F3), and delete_insert merge strategy removal (F1).
v0.1.2 — Released (2026-02-28)
Patch release: ALTER TYPE/POLICY DDL tracking (F6), window partition key E2E tests (F8), PgBouncer compatibility docs (F12), read replica detection (F16), SPI retry with SQLSTATE classification (F29), and 40+ additional E2E tests.
v0.1.3 — Released (2026-03-01)
Patch release: Completed 50/51 SQL_GAPS_7 items across all tiers. Highlights:
- Adaptive fallback threshold (F27), delta change metrics (F30)
- WAL decoder hardening: replay deduplication, slot lag alerting (F31–F38)
- TPC-H 22-query correctness baseline (22/22 pass, SF=0.01)
- 460 E2E tests (≥ 400 exit criterion met)
- CNPG extension image published to GHCR
See CHANGELOG.md for the full feature list.
v0.2.0 — TopK, Diamond Consistency & Transactional IVM
Status: Released (2026-03-04).
The 51-item SQL_GAPS_7 correctness plan was substantially completed in v0.1.x (50 of 51 items; the final item, F40, landed in v0.2.1). v0.2.0 delivers three major feature additions.
Completed items (click to expand)
| Tier | Items | Status |
|---|---|---|
| 0 — Critical | F1–F3, F5–F6 | ✅ Done in v0.1.1–v0.1.3 |
| 1 — Verification | F8–F10, F12 | ✅ Done in v0.1.2–v0.1.3 |
| 2 — Robustness | F13, F15–F16 | ✅ Done in v0.1.2–v0.1.3 |
| 3 — Test coverage | F17–F26 (62 E2E tests) | ✅ Done in v0.1.2–v0.1.3 |
| 4 — Operational hardening | F27–F39 | ✅ Done in v0.1.3 |
| 4 — Upgrade migrations | F40 | ✅ Done in v0.2.1 |
| 5 — Nice-to-have | F41–F51 | ✅ Done in v0.1.3 |
TPC-H baseline: 22/22 queries pass deterministic correctness checks across
multiple mutation cycles (just test-tpch, SF=0.01).
Queries are derived from the TPC-H Benchmark specification; results are not comparable to published TPC results. TPC Benchmark™ is a trademark of TPC.
ORDER BY / LIMIT / OFFSET — TopK Support ✅
In plain terms: Stream tables can now be defined with `ORDER BY ... LIMIT N` — for example, "keep the top 10 best-selling products". When the underlying data changes, only the top-N slot is updated incrementally rather than recomputing the entire sorted list from scratch every tick.
ORDER BY ... LIMIT N defining queries are accepted and refreshed correctly.
All 9 plan items (TK1–TK9) implemented, including 5 TPC-H queries with ORDER BY
restored (Q2, Q3, Q10, Q18, Q21).
| Item | Description | Status |
|---|---|---|
| TK1 | E2E tests for FETCH FIRST / FETCH NEXT rejection | ✅ Done |
| TK2 | OFFSET without ORDER BY warning in subqueries | ✅ Done |
| TK3 | detect_topk_pattern() + TopKInfo struct in parser.rs | ✅ Done |
| TK4 | Catalog columns: pgt_topk_limit, pgt_topk_order_by | ✅ Done |
| TK5 | TopK-aware refresh path (scoped recomputation via MERGE) | ✅ Done |
| TK6 | DVM pipeline bypass for TopK tables in api.rs | ✅ Done |
| TK7 | E2E + unit tests (e2e_topk_tests.rs, 18 tests) | ✅ Done |
| TK8 | Documentation (SQL Reference, FAQ, CHANGELOG) | ✅ Done |
| TK9 | TPC-H: restored ORDER BY + LIMIT in Q2, Q3, Q10, Q18, Q21 | ✅ Done |
See PLAN_ORDER_BY_LIMIT_OFFSET.md.
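A TopK defining query goes through the same `create_stream_table` API as any other view; a minimal sketch, assuming a hypothetical `products(id, name, units_sold)` table:

```sql
-- Maintain only the top-10 slot incrementally; no full re-sort per tick.
SELECT pgtrickle.create_stream_table(
    name     => 'top_products',
    query    => 'SELECT id, name, units_sold
                 FROM products
                 ORDER BY units_sold DESC
                 LIMIT 10',
    schedule => '30s'
);
```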
Diamond Dependency Consistency ✅
In plain terms: A "diamond" is when two stream tables share the same source (A → B, A → C) and a third (D) reads from both B and C. Without special handling, updating A could refresh B before C, leaving D briefly in an inconsistent state where it sees new-B but old-C. This groups B and C into an atomic refresh unit so D always sees them change together in a single step.
Atomic refresh groups eliminate the inconsistency window in diamond DAGs (A→B→D, A→C→D). All 8 plan items (D1–D8) implemented.
| Item | Description | Status |
|---|---|---|
| D1 | Data structures (Diamond, ConsistencyGroup) in dag.rs | ✅ Done |
| D2 | Diamond detection algorithm in dag.rs | ✅ Done |
| D3 | Consistency group computation in dag.rs | ✅ Done |
| D4 | Catalog columns + GUCs (diamond_consistency, diamond_schedule_policy) | ✅ Done |
| D5 | Scheduler wiring with SAVEPOINT loop | ✅ Done |
| D6 | Monitoring function pgtrickle.diamond_groups() | ✅ Done |
| D7 | E2E test suite (tests/e2e_diamond_tests.rs) | ✅ Done |
| D8 | Documentation (SQL_REFERENCE.md, CONFIGURATION.md, ARCHITECTURE.md) | ✅ Done |
See PLAN_DIAMOND_DEPENDENCY_CONSISTENCY.md.
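The detected groups can be inspected through the D6 monitoring function; `SELECT *` is used here because the exact column set is documented in SQL_REFERENCE.md:

```sql
-- List the consistency groups the scheduler refreshes atomically
SELECT * FROM pgtrickle.diamond_groups();
```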
Transactional IVM — IMMEDIATE Mode ✅
In plain terms: Normally stream tables refresh on a schedule (every N seconds). IMMEDIATE mode updates the stream table inside the same database transaction as the source table change — so by the time your INSERT/UPDATE/DELETE commits, the stream table is already up to date. Zero lag, at the cost of a slightly slower write.
New IMMEDIATE refresh mode that updates stream tables within the same
transaction as base table DML, using statement-level AFTER triggers with
transition tables. Phase 1 (core engine) and Phase 3 (extended SQL support)
are complete. Phase 2 (pg_ivm compatibility layer) is postponed. Phase 4
(performance optimizations) has partial completion (delta SQL template caching).
| Item | Description | Status |
|---|---|---|
| TI1 | RefreshMode::Immediate enum, catalog CHECK, API validation | ✅ Done |
| TI2 | Statement-level IVM trigger functions with transition tables | ✅ Done |
| TI3 | DeltaSource::TransitionTable — Scan operator dual-path | ✅ Done |
| TI4 | Delta application (DELETE + INSERT ON CONFLICT) | ✅ Done |
| TI5 | Advisory lock-based concurrency (IvmLockMode) | ✅ Done |
| TI6 | TRUNCATE handling (full refresh of stream table) | ✅ Done |
| TI7 | alter_stream_table mode switching (DIFFERENTIAL↔IMMEDIATE, FULL↔IMMEDIATE) | ✅ Done |
| TI8 | Query restriction validation (validate_immediate_mode_support) | ✅ Done |
| TI9 | Delta SQL template caching (thread-local IVM_DELTA_CACHE) | ✅ Done |
| TI10 | Window functions, LATERAL, scalar subqueries in IMMEDIATE mode | ✅ Done |
| TI11 | Cascading IMMEDIATE stream tables (ST_A → ST_B) | ✅ Done |
| TI12 | 29 E2E tests + 8 unit tests | ✅ Done |
| TI13 | Documentation (SQL Reference, Architecture, FAQ, CHANGELOG) | ✅ Done |
Remaining performance optimizations (ENR-based transition table access, aggregate fast-path, C-level trigger functions, prepared statement reuse) are tracked under post-1.0 A2.
See PLAN_TRANSACTIONAL_IVM.md.
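A minimal sketch, assuming a `refresh_mode` parameter on `create_stream_table` matching the modes described above (the `orders` table mirrors the README's opening example):

```sql
SELECT pgtrickle.create_stream_table(
    name         => 'active_orders_now',
    query        => 'SELECT * FROM orders WHERE status = ''active''',
    refresh_mode => 'IMMEDIATE'
);

-- The stream table is maintained inside the writing transaction,
-- so the read below already reflects the INSERT before COMMIT.
BEGIN;
INSERT INTO orders (id, status) VALUES (43, 'active');
SELECT count(*) FROM active_orders_now;
COMMIT;
```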
Exit criteria:
- `ORDER BY ... LIMIT N` (TopK) defining queries accepted and refreshed correctly
- TPC-H queries Q2, Q3, Q10, Q18, Q21 pass with original LIMIT restored
- Diamond dependency consistency (D1–D8) implemented and E2E-tested
- IMMEDIATE refresh mode: INSERT/UPDATE/DELETE on base table updates stream table within the same transaction
- Window functions, LATERAL, scalar subqueries work in IMMEDIATE mode
- Cascading IMMEDIATE stream tables (ST_A → ST_B) propagate correctly
- Concurrent transaction tests pass
v0.2.1 — Upgrade Infrastructure & Documentation
Status: Released (2026-03-05).
Patch release focused on upgrade safety, documentation, and three catalog
schema additions via sql/pg_trickle--0.2.0--0.2.1.sql:
Completed items (click to expand)
- `has_keyless_source BOOLEAN NOT NULL DEFAULT FALSE` — EC-06 keyless source flag; switches the apply strategy from MERGE to counted DELETE when set.
- `function_hashes TEXT` — EC-16 function-body hash map; forces a full refresh when a referenced function's body changes silently.
- `topk_offset INT` — OS2 catalog field for paged TopK OFFSET support, shipped and used in this release.
Upgrade Migration Infrastructure ✅
In plain terms: When you run `ALTER EXTENSION pg_trickle UPDATE`, all your stream tables should survive intact. This adds the safety net that makes that true: automated scripts that check every upgrade script covers all database objects, real end-to-end tests that actually perform the upgrade in a test container, and CI gates that catch regressions before they reach users.
Complete safety net for ALTER EXTENSION pg_trickle UPDATE:
| Item | Description | Status |
|---|---|---|
| U1 | scripts/check_upgrade_completeness.sh — CI completeness checker | ✅ Done |
| U2 | sql/archive/ with archived SQL baselines per version | ✅ Done |
| U3 | tests/Dockerfile.e2e-upgrade for real upgrade tests | ✅ Done |
| U4 | 6 upgrade E2E tests (function parity, stream table survival, etc.) | ✅ Done |
| U5 | CI: upgrade-check (every PR) + upgrade-e2e (push-to-main) | ✅ Done |
| U6 | docs/UPGRADING.md user-facing upgrade guide | ✅ Done |
| U7 | just check-upgrade, just build-upgrade-image, just test-upgrade | ✅ Done |
| U8 | Fixed 0.1.3→0.2.0 upgrade script (was no-op placeholder) | ✅ Done |
Documentation Expansion ✅
In plain terms: Added six new pages to the documentation book: a dbt integration guide, contributing guide, security policy, release process, a comparison with the pg_ivm extension, and a deep-dive explaining why row-level triggers were chosen over logical replication for CDC.
GitHub Pages book grew from 14 to 20 pages:
| Page | Section | Source |
|---|---|---|
| dbt Integration | Integrations | dbt-pgtrickle/README.md |
| Contributing | Reference | CONTRIBUTING.md |
| Security Policy | Reference | SECURITY.md |
| Release Process | Reference | docs/RELEASE.md |
| pg_ivm Comparison | Research | plans/ecosystem/GAP_PG_IVM_COMPARISON.md |
| Triggers vs Replication | Research | plans/sql/REPORT_TRIGGERS_VS_REPLICATION.md |
Exit criteria:
- `ALTER EXTENSION pg_trickle UPDATE` from 0.1.3→0.2.0 tested end-to-end
- Completeness check passes (upgrade script covers all pgrx-generated SQL objects)
- CI enforces upgrade script completeness on every PR
- All documentation pages build and render in mdBook
v0.2.2 — OFFSET, AUTO Mode, ALTER QUERY, Edge Cases & CDC Hardening
Status: Released (2026-03-08).
This milestone shipped paged TopK OFFSET support, AUTO-by-default refresh selection, ALTER QUERY, the remaining upgrade-tooling work, edge-case and WAL CDC hardening, IMMEDIATE-mode parity fixes, and the outstanding documentation sweep.
Completed items (click to expand)
ORDER BY + LIMIT + OFFSET (Paged TopK) — Finalization ✅
In plain terms: Extends TopK to support OFFSET — so you can define a stream table as "rows 11–20 of the top-20 best-selling products" (page 2 of a ranked list). Useful for paginated leaderboards, ranked feeds, or any use case where you want a specific window into a sorted result.
Core implementation is complete (parser, catalog, refresh path, docs, 9 E2E
tests). The topk_offset catalog column shipped in v0.2.1 and is exercised
by the paged TopK feature here.
| Item | Description | Status | Ref |
|---|---|---|---|
| OS1 | 9 OFFSET E2E tests in e2e_topk_tests.rs | ✅ Done | PLAN_OFFSET_SUPPORT.md §Step 6 |
| OS2 | sql/pg_trickle--0.2.1--0.2.2.sql — function signature updates (no schema DDL needed) | ✅ Done | PLAN_OFFSET_SUPPORT.md §Step 2 |
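Assuming the same hypothetical `products(id, name, units_sold)` table used earlier, a paged TopK definition is a sketch like:

```sql
-- Rows 11-20 of the ranked list (page 2 of a paginated leaderboard)
SELECT pgtrickle.create_stream_table(
    name  => 'products_page_2',
    query => 'SELECT id, name, units_sold
              FROM products
              ORDER BY units_sold DESC
              LIMIT 10 OFFSET 10'
);
```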
AUTO Refresh Mode ✅
In plain terms: Changes the default from "always try differential (incremental) refresh" to a smart automatic selection: use differential when the query supports it, fall back to a full re-scan when it doesn't. New stream tables also get a calculated schedule interval instead of a hardcoded 1-minute default.
| Item | Description | Status | Ref |
|---|---|---|---|
| AM1 | RefreshMode::Auto — uses DIFFERENTIAL when supported, falls back to FULL | ✅ Done | PLAN_REFRESH_MODE_DEFAULT.md |
| AM2 | create_stream_table default changed from 'DIFFERENTIAL' to 'AUTO' | ✅ Done | — |
| AM3 | create_stream_table schedule default changed from '1m' to 'calculated' | ✅ Done | — |
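With the new defaults, the minimal call is just a name and a query (table name illustrative); the explicit mode and schedule from earlier examples become optional:

```sql
-- refresh_mode defaults to 'AUTO' (DIFFERENTIAL when the query supports
-- it, FULL otherwise); schedule defaults to 'calculated'.
SELECT pgtrickle.create_stream_table(
    name  => 'signups_by_region',
    query => 'SELECT region, count(*) FROM users GROUP BY region'
);
```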
ALTER QUERY ✅
In plain terms: Lets you change the SQL query of an existing stream table without dropping and recreating it. pg_trickle inspects the old and new queries, determines what type of change was made (added a column, dropped a column, or fundamentally incompatible change), and performs the most minimal migration possible — updating in place where it can, rebuilding only when it must.
| Item | Description | Status | Ref |
|---|---|---|---|
| AQ1 | alter_stream_table(query => ...) — validate, classify schema change, migrate storage | ✅ Done | PLAN_ALTER_QUERY.md |
| AQ2 | Schema classification: same, compatible (ADD/DROP COLUMN), incompatible (full rebuild) | ✅ Done | — |
| AQ3 | ALTER-aware cycle detection (check_for_cycles_alter) | ✅ Done | — |
| AQ4 | CDC dependency migration (add/remove triggers for changed sources) | ✅ Done | — |
| AQ5 | SQL Reference & CHANGELOG documentation | ✅ Done | — |
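A sketch of the AQ1 entry point, assuming `alter_stream_table` takes the same named `name`/`query` parameters as `create_stream_table` (columns illustrative):

```sql
-- Adding a column is classified as a compatible change:
-- the storage is migrated in place, no full rebuild.
SELECT pgtrickle.alter_stream_table(
    name  => 'active_orders',
    query => 'SELECT id, status, created_at
              FROM orders
              WHERE status = ''active'''
);
```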
Upgrade Tooling ✅
In plain terms: If the compiled extension library (the `.so` file) is a different version than the SQL objects in the database, the scheduler now warns loudly at startup instead of failing in confusing ways later. Also adds FAQ entries and cross-links for common upgrade questions.
| Item | Description | Status | Ref |
|---|---|---|---|
| UG1 | Version mismatch check — scheduler warns if .so version ≠ SQL version | ✅ Done | PLAN_UPGRADE_MIGRATIONS.md §5.2 |
| UG2 | FAQ upgrade section — 3 new entries with UPGRADING.md cross-links | ✅ Done | PLAN_UPGRADE_MIGRATIONS.md §5.4 |
| UG3 | CI and local upgrade automation now target 0.2.2 (upgrade-check, upgrade-image defaults, upgrade E2E env) | ✅ Done | PLAN_UPGRADE_MIGRATIONS.md |
IMMEDIATE Mode Parity ✅
In plain terms: Closes two remaining SQL patterns that worked in DIFFERENTIAL mode but not in IMMEDIATE mode. Recursive CTEs (queries that reference themselves to compute e.g. graph reachability or org-chart hierarchies) now work in IMMEDIATE mode with a configurable depth guard. TopK (ORDER BY + LIMIT) queries also get a dedicated fast micro-refresh path in IMMEDIATE mode.
Close the gap between DIFFERENTIAL and IMMEDIATE mode SQL coverage for the two remaining high-risk patterns — recursive CTEs and TopK queries.
| Item | Description | Effort | Ref |
|---|---|---|---|
| IM1 | Validate recursive CTE semi-naive in IMMEDIATE mode; add stack-depth guard for deeply recursive defining queries | 2–3d | PLAN_EDGE_CASES_TIVM_IMPL_ORDER.md Stage 6 §5.1 |
| IM2 | TopK in IMMEDIATE mode: statement-level micro-refresh + ivm_topk_max_limit GUC | 2–3d | PLAN_EDGE_CASES_TIVM_IMPL_ORDER.md Stage 6 §5.2 |
IMMEDIATE parity subtotal: ✅ Complete (IM1 + IM2)
Edge Case Hardening ✅
In plain terms: Three targeted fixes for uncommon-but-real scenarios: a cap on CUBE/ROLLUP combinatorial explosion (which can generate thousands of grouping variants from a single query and crash the database); automatic recovery when CDC gets stuck in a "transitioning" state after a database restart; and polling-based change detection for foreign tables (tables in external databases) that can't use triggers or WAL.
Self-contained items from Stage 7 of the edge-cases/TIVM implementation plan.
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC1 | pg_trickle.max_grouping_set_branches GUC — cap CUBE/ROLLUP branch-count explosion | 4h | PLAN_EDGE_CASES.md EC-02 |
| EC2 | Post-restart CDC TRANSITIONING health check — detect stuck CDC transitions after crash or restart | 1d | PLAN_EDGE_CASES.md EC-20 |
| EC3 | Foreign table support: polling-based change detection via periodic re-execution | 2–3d | PLAN_EDGE_CASES.md EC-05 |
Edge-case hardening subtotal: ✅ Complete (EC1 + EC2 + EC3)
Documentation Sweep
In plain terms: Filled three documentation gaps: what happens to an in-flight refresh if you run DDL (ALTER TABLE, DROP INDEX) at the same time; limitations when using pg_trickle on standby replicas; and a PgBouncer configuration guide explaining the session-mode requirement and incompatible settings.
Remaining documentation gaps identified in Stage 7 of the gap analysis.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| DS1 | DDL-during-refresh behaviour: document safe patterns and races | 2h | ✅ Done | PLAN_EDGE_CASES.md EC-17 |
| DS2 | Replication/standby limitations: document in FAQ and Architecture | 3h | ✅ Done | PLAN_EDGE_CASES.md EC-21/22/23 |
| DS3 | PgBouncer configuration guide: session-mode requirements and known incompatibilities | 2h | ✅ Done | PLAN_EDGE_CASES.md EC-28 |
Documentation sweep subtotal: ✅ Complete
WAL CDC Hardening
In plain terms: WAL (Write-Ahead Log) mode tracks changes by reading PostgreSQL's internal replication stream rather than using row-level triggers — which is more efficient and works across concurrent sessions. This work added a complete E2E test suite for WAL mode, hardened the automatic fallback from WAL to trigger mode when WAL isn't available, and promoted `cdc_mode = 'auto'` (try WAL first, fall back to triggers) to the default.
WAL decoder F2–F3 fixes (keyless pk_hash, `old_*` columns for UPDATE) landed in v0.1.3.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| W1 | WAL mode E2E test suite (parallel to trigger suite) | 8–12h | ✅ Done | PLAN_HYBRID_CDC.md |
| W2 | WAL→trigger automatic fallback hardening | 4–6h | ✅ Done | PLAN_HYBRID_CDC.md |
| W3 | Promote pg_trickle.cdc_mode = 'auto' to default | ~1h | ✅ Done | PLAN_HYBRID_CDC.md |
WAL CDC subtotal: ~13–19 hours
Exit criteria:
- `ORDER BY + LIMIT + OFFSET` defining queries accepted, refreshed, and E2E-tested
- `sql/pg_trickle--0.2.1--0.2.2.sql` exists (column pre-provisioned in 0.2.1; function signature updates)
- Upgrade completeness check passes for 0.2.1→0.2.2
- CI and local upgrade-E2E defaults target 0.2.2
- Version check fires at scheduler startup if `.so`/SQL versions diverge
- IMMEDIATE mode: recursive CTE semi-naive validated; `ivm_recursive_max_depth` depth guard added
- IMMEDIATE mode: TopK micro-refresh fully tested end-to-end (10 E2E tests)
- `max_grouping_set_branches` GUC guards CUBE/ROLLUP explosion (3 E2E tests)
- Post-restart CDC TRANSITIONING health check in place
- Foreign table polling-based CDC implemented (3 E2E tests)
- DDL-during-refresh and standby/replication limitations documented
- WAL CDC mode passes full E2E suite
- E2E tests pass (`just build-e2e-image && just test-e2e`)
v0.2.3 — Non-Determinism, CDC/Mode Gaps & Operational Polish
Status: Released (2026-03-09).
Completed items (click to expand)
Goal: Close a small set of high-leverage correctness and operational gaps that do not need to wait for the larger v0.3.0 parallel refresh, security, and partitioning work. This milestone tightens refresh-mode behavior, makes CDC transitions easier to observe, and removes one silent correctness hazard in DIFFERENTIAL mode.
Non-Deterministic Function Handling
In plain terms: Functions like `random()`, `gen_random_uuid()`, and `clock_timestamp()` return a different value every time they're called. In DIFFERENTIAL mode, pg_trickle computes what changed between the old and new result — but if a function changes on every call, the "change" is meaningless and produces phantom rows. This detects such functions at stream-table creation time and rejects them in DIFFERENTIAL mode (they still work fine in FULL or IMMEDIATE mode).
Status: Done. Volatility lookup, OpTree enforcement, E2E coverage, and documentation are complete.
Volatile functions (random(), gen_random_uuid(), clock_timestamp()) break
delta computation in DIFFERENTIAL mode — values change on each evaluation,
causing phantom changes and corrupted row identity hashes. This is a silent
correctness gap.
| Item | Description | Effort | Ref |
|---|---|---|---|
| ND1 | Volatility lookup via pg_proc.provolatile + recursive Expr scanner | Done | PLAN_NON_DETERMINISM.md §Part 1 |
| ND2 | OpTree volatility walker + enforcement policy (reject volatile in DIFFERENTIAL, warn for stable) | Done | PLAN_NON_DETERMINISM.md §Part 2 |
| ND3 | E2E tests (volatile rejected, stable warned, immutable allowed, nested volatile in WHERE) | Done | PLAN_NON_DETERMINISM.md §E2E Tests |
| ND4 | Documentation (SQL_REFERENCE.md, DVM_OPERATORS.md) | Done | PLAN_NON_DETERMINISM.md §Files |
Non-determinism subtotal: ~4–6 hours
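The enforcement shows up at creation time; a hedged sketch (error wording illustrative, not the extension's literal message):

```sql
-- Rejected: random() is volatile, so DIFFERENTIAL deltas would be
-- meaningless and produce phantom rows.
SELECT pgtrickle.create_stream_table(
    name         => 'sampled_orders',
    query        => 'SELECT * FROM orders WHERE random() < 0.1',
    refresh_mode => 'DIFFERENTIAL'
);  -- ERROR: volatile function in defining query

-- Accepted: FULL mode recomputes the whole result each refresh,
-- so volatility is harmless there.
SELECT pgtrickle.create_stream_table(
    name         => 'sampled_orders',
    query        => 'SELECT * FROM orders WHERE random() < 0.1',
    refresh_mode => 'FULL'
);
```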
CDC / Refresh Mode Interaction Gaps ✅
In plain terms: pg_trickle has four CDC modes (trigger, WAL, auto, per-table override) and four refresh modes (FULL, DIFFERENTIAL, IMMEDIATE, AUTO). Not every combination makes sense, and some had silent bugs. This fixed six specific gaps: stale change buffers not being flushed after FULL refreshes (so they got replayed again on the next tick), a missing error for the IMMEDIATE + WAL combination, a new `pgt_cdc_status` monitoring view, per-table CDC mode overrides, and a guard against refreshing stream tables that haven't been populated yet.
Six gaps between the four CDC modes and four refresh modes — missing
validations, resource leaks, and observability holes. Phased from quick wins
(pure Rust) to a larger feature (per-table cdc_mode override).
| Item | Description | Effort | Ref |
|---|---|---|---|
| G6 | Defensive is_populated + empty-frontier check in execute_differential_refresh() | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G6 |
| G2 | Validate IMMEDIATE + cdc_mode='wal' — global-GUC path logs INFO; explicit per-table override is rejected with a clear error | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G2 |
| G3 | Advance WAL replication slot after FULL refresh; flush change buffers | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G3 |
| G4 | Flush change buffers after AUTO→FULL adaptive fallback (prevents ping-pong) | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G4 |
| G5 | pgtrickle.pgt_cdc_status view + NOTIFY on CDC transitions | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G5 |
| G1 | Per-table cdc_mode override (SQL API, catalog, dbt, migration) | Done | PLAN_CDC_MODE_REFRESH_MODE_GAPS.md §G1 |
CDC/refresh mode gaps subtotal: ✅ Complete
Progress: G6 is now implemented in v0.2.3: the low-level differential executor rejects unpopulated stream tables and missing frontiers before it can scan from `0/0`, while the public manual-refresh path continues to fall back to FULL for `initialize => false` stream tables.
Progress: G1 and G2 are now complete: `create_stream_table()` and `alter_stream_table()` accept an optional per-table `cdc_mode` override, the requested value is stored in `pgt_stream_tables.requested_cdc_mode`, dbt forwards the setting, and shared-source WAL transition eligibility is now resolved conservatively from all dependent deferred stream tables. The cluster-wide `pg_trickle.cdc_mode = 'wal'` path still logs INFO for `refresh_mode = 'IMMEDIATE'`, while explicit per-table `cdc_mode => 'wal'` requests are rejected for IMMEDIATE mode with a clear error.
Progress: G3 and G4 are now implemented in v0.2.3: `advance_slot_to_current()` in `wal_decoder.rs` advances WAL slots after each FULL refresh; the shared `post_full_refresh_cleanup()` helper in `refresh.rs` advances all WAL/TRANSITIONING slots and flushes change buffers, called from `scheduler.rs` after every Full/Reinitialize execution and from the adaptive fallback path. This prevents change-buffer ping-pong on bulk-loaded tables.
Progress: G5 is now implemented in v0.2.3: the `pgtrickle.pgt_cdc_status` convenience view has been added, and a `cdc_modes` text-array column surfaces per-source CDC modes in `pgtrickle.pg_stat_stream_tables`. NOTIFY on CDC transitions (TRIGGER → TRANSITIONING → WAL) was already implemented via `emit_cdc_transition_notify()` in `wal_decoder.rs`.
Progress: The SQL upgrade path for these CDC and monitoring changes is in place via `sql/pg_trickle--0.2.2--0.2.3.sql`, which adds `requested_cdc_mode`, updates the `create_stream_table`/`alter_stream_table` signatures, recreates `pgtrickle.pg_stat_stream_tables`, and adds `pgtrickle.pgt_cdc_status` for `ALTER EXTENSION ... UPDATE` users.
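The G1 override rides on the existing API; a sketch using the per-table `cdc_mode` parameter described above (table names illustrative):

```sql
-- Force trigger-based CDC for one stream table even when the
-- cluster default is pg_trickle.cdc_mode = 'wal'.
SELECT pgtrickle.create_stream_table(
    name     => 'audit_summary',
    query    => 'SELECT actor, count(*) FROM audit_log GROUP BY actor',
    cdc_mode => 'trigger'
);

-- Rejected combination (G2): explicit per-table WAL CDC
-- together with IMMEDIATE refresh mode.
SELECT pgtrickle.create_stream_table(
    name         => 'live_summary',
    query        => 'SELECT actor, count(*) FROM audit_log GROUP BY actor',
    refresh_mode => 'IMMEDIATE',
    cdc_mode     => 'wal'
);  -- ERROR
```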
Operational
In plain terms: Four housekeeping improvements: clean up prepared statements when the database catalog changes (prevents stale caches after DDL); make WAL slot lag alert thresholds configurable rather than hardcoded; simplify a confusing GUC setting (`user_triggers`) with a deprecated alias; and add a `pg_trickle_dump` tool that exports all stream table definitions to a replayable SQL file — useful as a backup before running an upgrade.
| Item | Description | Effort | Ref |
|---|---|---|---|
| O1 | Prepared statement cleanup on cache invalidation | Done | GAP_SQL_PHASE_7.md G4.4 |
| O2 | Slot lag alerting thresholds configurable (slot_lag_warning_threshold_mb, slot_lag_critical_threshold_mb) | Done | PLAN_HYBRID_CDC.md §6.2 |
| O3 | Simplify pg_trickle.user_triggers GUC (canonical auto / off, deprecated on alias) | Done | PLAN_FEATURE_CLEANUP.md C5 |
| O4 | pg_trickle_dump: SQL export tool for manual backup before upgrade | Done | PLAN_UPGRADE_MIGRATIONS.md §5.3 |
Operational subtotal: Done
Progress: All four operational items are now shipped in v0.2.3. Warning-level and critical WAL slot lag thresholds are configurable, prepared `__pgt_merge_*` statements are cleaned up on shared cache invalidation, `pg_trickle.user_triggers` is simplified to canonical `auto`/`off` semantics with a deprecated `on` alias, and `pg_trickle_dump` provides a replayable SQL export for upgrade backups.
v0.2.3 total: ~45–66 hours
Exit criteria:
- Volatile functions rejected in DIFFERENTIAL mode; stable functions warned
- DIFFERENTIAL on unpopulated ST returns error (G6)
- IMMEDIATE + explicit `cdc_mode='wal'` rejected with clear error (G2)
- WAL slot advanced after FULL refresh; change buffers flushed (G3)
- Adaptive fallback flushes change buffers; no ping-pong cycles (G4)
- `pgtrickle.pgt_cdc_status` view available; NOTIFY on CDC transitions (G5)
- Prepared statement cache cleanup works after invalidation
- Per-table `cdc_mode` override functional in SQL API and dbt adapter (G1)
- Extension upgrade path tested (0.2.2 → 0.2.3)
v0.3.0 — DVM Correctness, SAST & Test Coverage
Status: Released (2026-03-11).
Completed items (click to expand)
Goal: Re-enable all 18 previously-ignored DVM correctness E2E tests by fixing HAVING, FULL OUTER JOIN, correlated EXISTS+HAVING, and correlated scalar subquery differential computation bugs. Harden the SAST toolchain with privilege-context rules and an unsafe-block baseline. Expand TPC-H coverage with rollback, mode-comparison, single-row, and DAG tests.
DVM Correctness Fixes
In plain terms: The Differential View Maintenance engine — the core algorithm that computes what changed incrementally — had four correctness bugs in specific SQL patterns. Queries using these patterns were silently producing wrong results and had their tests marked "ignored". This release fixes all four: HAVING clauses on aggregates, FULL OUTER JOINs, correlated EXISTS subqueries combined with HAVING, and correlated scalar subqueries in SELECT lists. All 18 previously-ignored E2E tests now pass.
| Item | Description | Status |
|---|---|---|
| DC1 | HAVING clause differential correctness — fix COUNT(*) rewrite and threshold-crossing upward rescan (5 tests un-ignored) | ✅ Done |
| DC2 | FULL OUTER JOIN differential correctness — fix row-id mismatch, compound GROUP BY expressions, SUM NULL semantics, and rescan CTE SELECT list (5 tests un-ignored) | ✅ Done |
| DC3 | Correlated EXISTS with HAVING differential correctness — fix EXISTS sublink parser discarding GROUP BY/HAVING, row-id mismatch for Project(SemiJoin), and diff_project row-id recomputation (1 test un-ignored) | ✅ Done |
| DC4 | Correlated scalar subquery differential correctness — rewrite_correlated_scalar_in_select rewrites correlated scalar subqueries to LEFT JOINs before DVM parsing (2 tests un-ignored) | ✅ Done |
DVM correctness subtotal: 18 previously-ignored E2E tests re-enabled (0 remaining)
SAST Program (Phases 1–3)
In plain terms: Adds formal static security analysis (SAST) to every build. CodeQL and Semgrep scan for known vulnerability patterns — for example, using SECURITY DEFINER functions without locking down `search_path`, or calling `SET ROLE` in ways that could be abused. Separately, every Rust `unsafe {}` block is inventoried and counted; any PR that adds new unsafe blocks beyond the committed baseline fails CI automatically.
| Item | Description | Status |
|---|---|---|
| S1 | CodeQL + cargo deny + initial Semgrep baseline — zero findings across 115 Rust source files | ✅ Done |
| S2 | Narrow rust.panic-in-sql-path scope — exclude src/dvm/** and src/bin/** to eliminate 351 false-positive alerts | ✅ Done |
| S3 | sql.row-security.disabled Semgrep rule — flag SET LOCAL row_security = off | ✅ Done |
| S4 | sql.set-role.present Semgrep rule — flag SET ROLE / RESET ROLE patterns | ✅ Done |
| S5 | Updated sql.security-definer.present message to require explicit SET search_path | ✅ Done |
| S6 | scripts/unsafe_inventory.sh + .unsafe-baseline — per-file unsafe { counter with committed baseline (1309 blocks across 6 files) | ✅ Done |
| S7 | .github/workflows/unsafe-inventory.yml — advisory CI workflow; fails if any file exceeds its baseline | ✅ Done |
| S8 | Remove pull_request trigger from CodeQL + Semgrep workflows (no inline PR annotations; runs on push-to-main + weekly schedule) | ✅ Done |
SAST subtotal: Phases 1–3 complete; Phase 4 rule promotion tracked as post-v0.3.0 cleanup
TPC-H Test Suite Enhancements (T1–T6)
In plain terms: TPC-H is an industry-standard analytical query benchmark — 22 queries against a simulated supply-chain database. This extends the pg_trickle TPC-H test suite to verify four additional scenarios that the basic correctness checks didn't cover: that ROLLBACK atomically undoes an IVM stream table update; that DIFFERENTIAL and IMMEDIATE mode produce identical answers for the same data; that single-row mutations work correctly (not just bulk changes); and that multi-level stream table DAGs refresh in the correct topological order.
| Item | Description | Status |
|---|---|---|
| T1 | __pgt_count < 0 guard in assert_tpch_invariant — over-retraction detector, applies to all existing TPC-H tests | ✅ Done |
| T2 | Skip-set regression guard in DIFFERENTIAL + IMMEDIATE tests — any newly skipped query not in the allowlist fails CI | ✅ Done |
| T3 | test_tpch_immediate_rollback — verify ROLLBACK restores IVM stream table atomically across RF mutations | ✅ Done |
| T4 | test_tpch_differential_vs_immediate — side-by-side comparison: both incremental modes produce identical results after shared mutations | ✅ Done |
| T5 | test_tpch_single_row_mutations + SQL fixtures — single-row INSERT/UPDATE/DELETE IVM trigger paths on Q01/Q06/Q03 | ✅ Done |
| T6a | test_tpch_dag_chain — two-level DAG (Q01 → filtered projection), refreshed in topological order | ✅ Done |
| T6b | test_tpch_dag_multi_parent — multi-parent fan-in (Q01 + Q06 → UNION ALL), DIFFERENTIAL mode | ✅ Done |
TPC-H subtotal: T1–T6 complete; 22/22 TPC-H queries passing
Exit criteria:
- All 18 previously-ignored DVM correctness E2E tests re-enabled
- SAST Phases 1–3 deployed; unsafe baseline committed; CodeQL zero findings
- TPC-H T1–T6 implemented; rollback, differential-vs-immediate, single-row, and DAG tests pass
- Extension upgrade path tested (0.2.3 → 0.3.0)
v0.4.0 — Parallel Refresh & Performance Hardening
Status: Released (2026-03-12).
Completed items (click to expand)
Goal: Deliver true parallel refresh, cut write-side CDC overhead with statement-level triggers, close a cross-source snapshot consistency gap, and ship quick ergonomic and infrastructure improvements. Together these close the main performance and operational gaps before the security and partitioning work begins.
Parallel Refresh
In plain terms: Right now the scheduler refreshes stream tables one at a time. This feature lets multiple stream tables refresh simultaneously — like running several errands at once instead of in a queue. When you have dozens of stream tables, this can cut total refresh latency dramatically.
Detailed implementation is tracked in PLAN_PARALLELISM.md. The older REPORT_PARALLELIZATION.md remains the options-analysis precursor.
| Item | Description | Effort | Ref |
|---|---|---|---|
| P1 | Phase 0–1: instrumentation, dry_run, and execution-unit DAG (atomic groups + IMMEDIATE closures) | 12–20h | PLAN_PARALLELISM.md §10 |
| P2 | Phase 2–4: job table, worker budget, dynamic refresh workers, and ready-queue dispatch | 16–28h | PLAN_PARALLELISM.md §10 |
| P3 | Phase 5–7: composite units, observability, rollout gating, and CI validation | 12–24h | PLAN_PARALLELISM.md §10 |
Progress:
- P1 — Phase 0 + Phase 1 (done): GUCs (`parallel_refresh_mode`, `max_dynamic_refresh_workers`), `ExecutionUnit`/`ExecutionUnitDag` types in `dag.rs`, IMMEDIATE-closure collapsing, dry-run logging in scheduler, 10 new unit tests (1211 total).
- P2 — Phase 2–4 (done): Job table (`pgt_scheduler_jobs`), catalog CRUD, shared-memory token pool (Phase 2). Dynamic worker entry point, spawn helper, reconciliation (Phase 3). Coordinator dispatch loop with ready-queue scheduling, per-db/cluster-wide budget enforcement, transaction-split spawning, dynamic poll interval, 8 new unit tests (Phase 4). 1233 unit tests total.
- P3a — Phase 5 (done): Composite unit execution — `execute_worker_atomic_group()` with C-level sub-transaction rollback, `execute_worker_immediate_closure()` with root-only refresh (IMMEDIATE triggers propagate downstream). Replaces the Phase 3 serial placeholder.
- P3b — Phase 6 (done): Observability — `worker_pool_status()` and `parallel_job_status()` SQL functions; `health_check()` extended with `worker_pool` and `job_queue` checks; docs updated.
- P3c — Phase 7 (done): Rollout — GUC documentation in `CONFIGURATION.md`, worker-budget guidance in `ARCHITECTURE.md`, CI E2E coverage with `PGT_PARALLEL_MODE=on`; the feature stays gated behind the `parallel_refresh_mode = 'off'` default.
Parallel refresh subtotal: ~40–72 hours
Statement-Level CDC Triggers
In plain terms: Previously, when you updated 1,000 rows in a source table, the database fired a "row changed" notification 1,000 times — once per row. Now it fires once per statement, handing off all 1,000 changed rows in a single batch. For bulk operations like data imports or batch updates this is 50–80% cheaper; for single-row changes you won't notice a difference.
Replace per-row AFTER triggers with statement-level triggers using
NEW TABLE AS __pgt_new / OLD TABLE AS __pgt_old. Expected write-side
trigger overhead reduction of 50–80% for bulk DML; neutral for single-row.
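As a rough sketch of the generated DDL's shape, the helper below builds per-event statement triggers with transition tables. The trigger and capture-function names are illustrative assumptions; the actual SQL emitted by build_stmt_trigger_fn_sql in cdc.rs may differ. Note that PostgreSQL only permits transition tables on single-event triggers, so one trigger per DML event is assumed here.

```python
# Hypothetical sketch of statement-level CDC trigger DDL. Trigger and
# function names are illustrative, not pg_trickle's actual output.
# PostgreSQL allows transition tables only on single-event triggers,
# hence one trigger per DML event.

def stmt_trigger_ddl(table: str, capture_fn: str) -> list[str]:
    events = {
        "insert": "REFERENCING NEW TABLE AS __pgt_new",
        "update": "REFERENCING OLD TABLE AS __pgt_old NEW TABLE AS __pgt_new",
        "delete": "REFERENCING OLD TABLE AS __pgt_old",
    }
    return [
        f"CREATE TRIGGER __pgt_{ev}_{table} AFTER {ev.upper()} ON {table} "
        f"{ref} FOR EACH STATEMENT EXECUTE FUNCTION {capture_fn}()"
        for ev, ref in events.items()
    ]

for ddl in stmt_trigger_ddl("orders", "pgtrickle.__pgt_capture"):
    print(ddl)
```

The key point is the `FOR EACH STATEMENT` clause: a 1,000-row `UPDATE` fires the capture function once, with all changed rows available in the `__pgt_new`/`__pgt_old` transition tables.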
| Item | Description | Effort | Ref |
|---|---|---|---|
| B1 | Statement-level trigger creation with transition tables | ✅ Done — build_stmt_trigger_fn_sql in cdc.rs; REFERENCING NEW TABLE AS __pgt_new OLD TABLE AS __pgt_old FOR EACH STATEMENT created by create_change_trigger | |
| B2 | pg_trickle.cdc_trigger_mode = 'statement' / 'row' GUC + migration to replace row-level triggers on ALTER EXTENSION UPDATE | ✅ Done — CdcTriggerMode enum in config.rs; rebuild_cdc_triggers() in api.rs; 0.3.0→0.4.0 upgrade script migrates existing triggers | |
| B3 | Benchmark harness comparing statement-level vs row-level trigger overhead | ✅ Done — bench_stmt_vs_row_cdc_matrix + bench_stmt_vs_row_cdc_quick in e2e_bench_tests.rs; runs via cargo test -- --ignored bench_stmt_vs_row_cdc_matrix | |
Statement-level CDC subtotal: ✅ All done (~14h)
Cross-Source Snapshot Consistency (Phase 1)
In plain terms: Imagine a stream table that joins `orders` and `customers`. If a single transaction updates both tables, the old scheduler could read the new `orders` data but the old `customers` data — a half-applied, internally inconsistent snapshot. This fix takes a "freeze frame" of the change log at the start of each scheduler tick and only processes changes up to that point, so all sources are always read from the same moment in time. Zero configuration required.
At start of each scheduler tick, snapshot pg_current_wal_lsn() as a
tick_watermark and cap all CDC consumption to that LSN. Zero user
configuration — prevents interleaved reads from two sources that were
updated in the same transaction from producing an inconsistent stream table.
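A minimal model of the watermark cap, with LSNs as plain integers rather than pg_lsn values:

```python
# Minimal model of tick-watermark capping: CDC consumption for every
# source stops at the LSN snapshotted when the tick began, so no source
# is read "ahead" of another. LSNs are plain ints for illustration.

def consume_up_to(changes_by_source: dict, tick_watermark: int) -> dict:
    return {
        src: [c for c in changes if c["lsn"] <= tick_watermark]
        for src, changes in changes_by_source.items()
    }

# One transaction touched both tables at LSN 205. A tick whose
# watermark was snapshotted at 200 sees neither half of it, never
# just one.
pending = {
    "orders":    [{"lsn": 100, "op": "I"}, {"lsn": 205, "op": "U"}],
    "customers": [{"lsn": 150, "op": "U"}, {"lsn": 205, "op": "U"}],
}
visible = consume_up_to(pending, 200)
```

The LSN-205 changes are simply picked up whole on the next tick, once the new watermark covers them.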
| Item | Description | Effort | Ref |
|---|---|---|---|
| | Snapshot pg_current_wal_lsn() per tick; cap frontier advance; log in pgt_refresh_history; pg_trickle.tick_watermark_enabled GUC (default on) | ✅ Done | |
Cross-source consistency subtotal: ✅ All done
Ergonomic Hardening
In plain terms: Added helpful warning messages for common mistakes: "your WAL level isn't configured for logical replication", "this source table has no primary key — duplicate rows may appear", "this change will trigger a full re-scan of all source data". Think of these as friendly guardrails that explain why something might not work as expected.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | WARNING at _PG_init when cdc_mode='auto' but wal_level != 'logical' — prevents silent trigger-only operation | ✅ Done | |
| | WARNING at create_stream_table when source has no primary key — surfaces keyless duplicate-row risk | ✅ Done (pre-existing in warn_source_table_properties) | |
| | WARNING when alter_stream_table triggers an implicit full refresh | ✅ Done | |
Ergonomic hardening subtotal: ✅ All done
Code Coverage
In plain terms: Every pull request now automatically reports what percentage of the code is exercised by tests, and which specific lines are never touched. It's like a map that highlights the unlit corners — helpful for spotting blind spots before they become bugs.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | Wire coverage upload into CI: add codecov.yml with patch targets for src/dvm/, add README badge, verify first upload | ✅ Done — reports live at app.codecov.io/github/grove/pg-trickle | |
v0.4.0 total: ~60–94 hours
Exit criteria:

- `max_concurrent_refreshes` drives real parallel refresh via coordinator + dynamic refresh workers
- Statement-level CDC triggers implemented (B1/B2/B3); benchmark harness in `bench_stmt_vs_row_cdc_matrix`
- LSN tick watermark active by default; no interleaved-source inconsistency in E2E tests
- Codecov badge on README; coverage report uploading
- Extension upgrade path tested (0.3.0 → 0.4.0)
v0.5.0 — Row-Level Security & Operational Controls
Status: Released (2026-03-13).
Completed items (click to expand)
Goal: Harden the security context for stream tables and IVM triggers, add source-level pause/resume gating for bulk-load coordination, and deliver small ergonomic improvements.
Row-Level Security (RLS) Support
In plain terms: Row-level security lets you write policies like "user Alice can only see rows where `tenant_id = 'alice'`". Stream tables already honour these policies when users query them. What this work fixes is the machinery behind the scenes — the triggers and refresh functions that build the stream table need to see all rows regardless of who is running them; otherwise they'd produce an incomplete result. This phase hardens those internal components so they always have full visibility, while end-users still see only their filtered slice.
Stream tables materialize the full result set (like MATERIALIZED VIEW). RLS
is applied on the stream table itself for read-side filtering. Phase 1
hardens the security context; Phase 2 adds a tutorial; Phase 3 completes DDL
tracking. Phase 4 (per-role security_invoker) is deferred to post-1.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| R1 | Document RLS semantics in SQL_REFERENCE.md and FAQ.md | 1h | PLAN_ROW_LEVEL_SECURITY.md §3.1 |
| R2 | Disable RLS on change buffer tables (ALTER TABLE ... DISABLE ROW LEVEL SECURITY) | 30min | PLAN_ROW_LEVEL_SECURITY.md §3.1 R2 |
| R3 | Force superuser context for manual refresh_stream_table() (prevent "who refreshed it?" hazard) | 2h | PLAN_ROW_LEVEL_SECURITY.md §3.1 R3 |
| R4 | Force SECURITY DEFINER on IVM trigger functions (IMMEDIATE mode delta queries must see all rows) | 2h | PLAN_ROW_LEVEL_SECURITY.md §3.1 R4 |
| R5 | E2E test: RLS on source table does not affect stream table content | 1h | PLAN_ROW_LEVEL_SECURITY.md §3.1 R5 |
| R6 | Tutorial: RLS on stream tables (enable RLS, per-tenant policies, verify filtering) | 1.5h | PLAN_ROW_LEVEL_SECURITY.md §3.2 R6 |
| R7 | E2E test: RLS on stream table filters reads per role | 1h | PLAN_ROW_LEVEL_SECURITY.md §3.2 R7 |
| R8 | E2E test: IMMEDIATE mode + RLS on stream table | 30min | PLAN_ROW_LEVEL_SECURITY.md §3.2 R8 |
| R9 | Track ENABLE/DISABLE RLS DDL on source tables (AT_EnableRowSecurity et al.) in hooks.rs | 2h | PLAN_ROW_LEVEL_SECURITY.md §3.3 R9 |
| R10 | E2E test: ENABLE RLS on source table triggers reinit | 1h | PLAN_ROW_LEVEL_SECURITY.md §3.3 R10 |
RLS subtotal: ~8–12 hours (Phase 4 `security_invoker` deferred to post-1.0)
Bootstrap Source Gating
In plain terms: A pause/resume switch for individual source tables. If you're bulk-loading 10 million rows into a source table (a nightly ETL import, for example), you can "gate" it first — the scheduler will skip refreshing any stream table that reads from it. Once the load is done you "ungate" it and a single clean refresh runs. Without gating, the CDC system would frantically process millions of intermediate changes during the load, most of which get immediately overwritten anyway.
Allow operators to pause CDC consumption for specific source tables (e.g. during bulk loads or ETL windows) without dropping and recreating stream tables. The scheduler skips any stream table whose transitive source set intersects the current gated set.
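The skip rule can be sketched in a few lines: resolve each stream table's transitive base-table set, then skip it if that set intersects the gated set. The dependency map below is illustrative.

```python
# Sketch of the gating skip rule: a stream table is skipped when any
# base table it (transitively) reads from is currently gated.

def transitive_sources(table: str, reads_from: dict) -> set:
    """Walk the dependency DAG down to every reachable source."""
    seen, stack = set(), [table]
    while stack:
        for dep in reads_from.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def should_skip(table: str, reads_from: dict, gated: set) -> bool:
    return bool(transitive_sources(table, reads_from) & gated)

reads_from = {
    "daily_rollup": ["active_orders"],   # stream table over a stream table
    "active_orders": ["orders"],
    "product_stats": ["products"],
}
gated = {"orders"}
```

With `orders` gated, both `active_orders` and the downstream `daily_rollup` are skipped, while `product_stats` keeps refreshing normally.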
| Item | Description | Effort | Ref |
|---|---|---|---|
| BOOT-1 | pgtrickle.pgt_source_gates catalog table (source_relid, gated, gated_at, gated_by) | 30min | PLAN_BOOTSTRAP_GATING.md |
| BOOT-2 | gate_source(source TEXT) SQL function — sets gate, pg_notify scheduler | 1h | PLAN_BOOTSTRAP_GATING.md |
| BOOT-3 | ungate_source(source TEXT) + source_gates() introspection view | 30min | PLAN_BOOTSTRAP_GATING.md |
| BOOT-4 | Scheduler integration: load gated-source set per tick; skip and log SKIP in pgt_refresh_history | 2–3h | PLAN_BOOTSTRAP_GATING.md |
| BOOT-5 | E2E tests: single-source gate, coordinated multi-source, partial DAG, bootstrap with initialize => false | 3–4h | PLAN_BOOTSTRAP_GATING.md |
Bootstrap source gating subtotal: ~7–9 hours
Ergonomics & API Polish
In plain terms: A handful of quality-of-life improvements: track when someone manually triggered a refresh and log it in the history table; a one-row `quick_health` view that tells you at a glance whether the extension is healthy (total tables, any errors, any stale tables, scheduler running); a `create_stream_table_if_not_exists()` helper so deployment scripts don't crash if the table was already created; and `CALL` syntax wrappers so the functions feel like native PostgreSQL commands rather than extension functions.
| Item | Description | Effort | Ref |
|---|---|---|---|
| ERG-D | Record manual refresh_stream_table() calls in pgt_refresh_history with initiated_by='MANUAL' | 2h | PLAN_ERGONOMICS.md §D |
| ERG-E | pgtrickle.quick_health view — single-row status summary (total_stream_tables, error_tables, stale_tables, scheduler_running, status) | 2h | PLAN_ERGONOMICS.md §E |
| COR-2 | create_stream_table_if_not_exists() convenience wrapper | 30min | PLAN_CREATE_OR_REPLACE.md §COR-2 |
| NAT-CALL | CREATE PROCEDURE wrappers for all four main SQL functions — enables CALL pgtrickle.create_stream_table(...) syntax | Deferred — PostgreSQL does not allow procedures and functions with the same name and argument types | |
Ergonomics subtotal: ~5–5.5 hours (NAT-CALL deferred)
Performance Foundations (Wave 1)
These quick-win items from PLAN_NEW_STUFF.md ship alongside the RLS and operational work. Read the risk analyses in that document before implementing any item.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-3a | MERGE bypass — Append-Only INSERT path: expose APPEND ONLY declaration on CREATE STREAM TABLE; CDC heuristic fallback (fast-path until first DELETE/UPDATE seen) | 1–2 wk | PLAN_NEW_STUFF.md §A-3 |
A-4, B-2, and C-4 deferred to v0.6.0 Performance Wave 2 (scope mismatch with the RLS/operational-controls theme; correctness risk warrants a dedicated wave).
Performance foundations subtotal: ~10–20h (A-3a only)
v0.5.0 total: ~51–97h
Exit criteria:
- RLS semantics documented; change buffers RLS-hardened; IVM triggers SECURITY DEFINER
- RLS on stream table E2E-tested (DIFFERENTIAL + IMMEDIATE)
- `gate_source` / `ungate_source` operational; scheduler skips gated sources correctly
- `quick_health` view and `create_stream_table_if_not_exists` available
- Manual refresh calls recorded in history with `initiated_by='MANUAL'`
- A-3a: Append-Only INSERT path eliminates MERGE for event-sourced stream tables
- Extension upgrade path tested (0.4.0 → 0.5.0)
v0.6.0 — Partitioning, Idempotent DDL, Edge Cases & Circular Dependency Foundation
Status: Released (2026-03-14).
Completed items (click to expand)
Goal: Validate partitioned source tables, add create_or_replace_stream_table
for idempotent deployments (critical for dbt and migration workflows), close all
remaining P0/P1 edge cases and two usability-tier gaps, harden ergonomics and
source gating, expand the dbt integration, fill SQL documentation gaps, and lay
the foundation for circular stream table DAGs.
Partitioning Support (Source Tables)
In plain terms: PostgreSQL lets you split large tables into smaller "partitions" — for example one partition per month for an `orders` table. This is a common technique for managing very large datasets. This work teaches pg_trickle to track all those partitions as a unit, so adding a new monthly partition doesn't silently break stream tables that depend on `orders`. It also handles the special case of foreign tables (tables that live in another database), restricting them to full-scan refresh since they can't be change-tracked the normal way.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | | 8–12h | PLAN_PARTITIONING_SHARDING.md §7 |
| | When you run ALTER TABLE orders ATTACH PARTITION orders_2026_04 ..., pg_trickle notices and rebuilds affected stream tables so the new partition's data is included. Without this, the new partition would be silently ignored. | 4–8h | PLAN_PARTITIONING_SHARDING.md §3.3 |
| | | 2–4h | PLAN_PARTITIONING_SHARDING.md §3.4 |
| | Foreign tables (e.g. postgres_fdw) can't have triggers or WAL tracking. pg_trickle now detects them and automatically uses full-scan refresh mode instead of failing with a confusing error. | 2–4h | PLAN_PARTITIONING_SHARDING.md §6.3 |
| | | 2–4h | PLAN_PARTITIONING_SHARDING.md §8 |
Partitioning subtotal: ~18–32 hours
Idempotent DDL (create_or_replace) ✅
In plain terms: Right now if you run `create_stream_table()` twice with the same name it errors out, and changing the query means `drop_stream_table()` followed by `create_stream_table()` — which loses all the data in between. `create_or_replace_stream_table()` does the right thing automatically: if nothing changed it's a no-op, if only settings changed it updates in place, if the query changed it rebuilds. This is the same pattern as `CREATE OR REPLACE FUNCTION` in PostgreSQL — and it's exactly what the dbt materialization macro needs so every `dbt run` doesn't drop and recreate tables from scratch.
create_or_replace_stream_table() performs a smart diff: no-op if identical,
in-place alter for config-only changes, schema migration for ADD/DROP column,
full rebuild for incompatible changes. Eliminates the drop-and-recreate
pattern used by the dbt materialization macro.
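The decision ladder can be sketched as follows. The definition fields compared here (query, columns, settings) are assumptions about shape, not the extension's actual catalog layout.

```python
# Sketch of the create_or_replace decision ladder: pick the cheapest
# action for the difference between existing and new definitions.
# Field names are illustrative assumptions.

def plan_replace(existing: dict, new: dict) -> str:
    if existing == new:
        return "noop"                # identical definition: do nothing
    if existing["query"] == new["query"]:
        return "alter_settings"      # config-only change (schedule etc.)
    old_cols, new_cols = set(existing["columns"]), set(new["columns"])
    if old_cols <= new_cols or new_cols <= old_cols:
        return "migrate_columns"     # pure ADD COLUMN / DROP COLUMN
    return "rebuild"                 # incompatible query change

base = {
    "query": "SELECT id, total FROM orders",
    "columns": ["id", "total"],
    "settings": {"schedule": "30s"},
}
```

A deployment script can then call the function repeatedly: unchanged models resolve to `noop`, and only genuinely changed stream tables pay the rebuild cost.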
| Item | Description | Effort | Ref |
|---|---|---|---|
| | create_or_replace_stream_table() compares the new definition against the existing one and picks the cheapest path: no-op if identical, settings-only update if just config changed, column migration if columns were added/dropped, or full rebuild if the query is fundamentally different. One function call replaces the drop-and-recreate dance. | 4h | PLAN_CREATE_OR_REPLACE.md |
| | Update the stream_table dbt materialization macro to call create_or_replace instead of dropping and recreating on every dbt run. Existing data survives deployments; only genuinely changed stream tables get rebuilt. | 2h | PLAN_CREATE_OR_REPLACE.md |
| | Upgrade script for ALTER EXTENSION UPDATE. SQL Reference and FAQ updated with usage examples. | 2.5h | PLAN_CREATE_OR_REPLACE.md |
| | | 4h | PLAN_CREATE_OR_REPLACE.md |
Idempotent DDL subtotal: ~12–13 hours
Circular Dependency Foundation ✅
In plain terms: Normally stream tables form a one-way chain: A feeds B, B feeds C. A circular dependency means A feeds B which feeds A — usually a mistake, but occasionally useful for iterative computations like graph reachability or recursive aggregations. This lays the groundwork — the algorithms, catalog columns, and GUC settings — to eventually allow controlled circular stream tables. The actual live execution is completed in v0.7.0.
Forms the prerequisite for full SCC-based fixpoint refresh in v0.7.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | | ~2h | PLAN_CIRCULAR_REFERENCES.md Part 1 |
| | | ~1h | PLAN_CIRCULAR_REFERENCES.md Part 2 |
| | | ~1h | PLAN_CIRCULAR_REFERENCES.md Part 3 |
| | max_fixpoint_iterations (default 100) prevents runaway loops, and allow_circular (default off) is the master switch — circular dependencies are rejected unless you explicitly opt in. | ~30min | PLAN_CIRCULAR_REFERENCES.md Part 4 |
Circular dependency foundation subtotal: ~4.5 hours
Edge Case Hardening
In plain terms: Six remaining edge cases from the PLAN_EDGE_CASES.md catalogue — one data correctness issue (P0), three operational-surprise items (P1), and two usability gaps (P2). Together they close every open edge case above "accepted trade-off" status.
P0 — Data Correctness
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC-19 | WAL-mode CDC on a keyless source table requires REPLICA IDENTITY FULL to send complete row data. Without it, deltas are silently incomplete. This rejects the combination at creation time with a clear error instead of producing wrong results. | 0.5 day | PLAN_EDGE_CASES.md EC-19 |
P1 — Operational Safety
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC-16 | If a stream table's query calls a function like calculate_discount() and someone does CREATE OR REPLACE FUNCTION calculate_discount(...) with new logic, the stream table's cached computation plan becomes stale. This checks function body hashes on each refresh and triggers a rebuild when a change is detected. | 2 days | PLAN_EDGE_CASES.md EC-16 |
| EC-18 | With cdc_mode = 'auto', pg_trickle is supposed to upgrade from trigger-based to WAL-based change tracking when possible. If it stays stuck on triggers (e.g. because wal_level isn't set to logical), there's no feedback. This adds a periodic log message explaining the reason and surfaces it in the health_check() output. | 1 day | PLAN_EDGE_CASES.md EC-18 |
| EC-34 | After restoring from pg_basebackup, replication slots are lost. pg_trickle's WAL decoder would fail trying to read from a slot that no longer exists. This detects the missing slot, automatically falls back to trigger-based tracking, and logs a WARNING so you know what happened. | 1 day | PLAN_EDGE_CASES.md EC-34 |
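The EC-16 detection can be modeled simply: hash each referenced function's body at stream-table creation, re-hash on each refresh, and rebuild on mismatch. Here a plain dict stands in for the system catalog that holds function bodies.

```python
import hashlib

# Model of EC-16 staleness detection: compare each referenced
# function's current body hash against the hash stored when the
# stream table was created. The `catalog` dict stands in for the
# real function-body catalog lookup.

def body_hash(src: str) -> str:
    return hashlib.sha256(src.encode()).hexdigest()

def stale_functions(stored: dict, catalog: dict) -> list:
    """Functions whose body changed since their hash was recorded."""
    return [fn for fn, h in stored.items() if body_hash(catalog[fn]) != h]

catalog = {"calculate_discount": "RETURN price * 0.9"}
stored = {"calculate_discount": body_hash(catalog["calculate_discount"])}

# Someone runs CREATE OR REPLACE FUNCTION with new logic:
catalog["calculate_discount"] = "RETURN price * 0.8"
```

Polling a hash per refresh is cheap, and a non-empty `stale_functions` result is the signal to rebuild the dependent stream table.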
P2 — Usability Gaps
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC-03 | Expressions like CASE WHEN ROW_NUMBER() OVER (...) = 1 THEN 'first' ELSE 'other' END are currently rejected because the incremental engine can't handle a window function nested inside a CASE. This automatically extracts the window function into a preliminary step and rewrites the outer query to reference the precomputed result — so the query pattern just works. | 3–5 days | PLAN_EDGE_CASES.md EC-03 |
| EC-32 | Support ALL (subquery) comparisons. Queries like WHERE price > ALL (SELECT price FROM competitors) (meaning "greater than every row in the subquery") are currently rejected in incremental mode. This rewrites them into an equivalent form the engine can handle, removing a Known Limitation from the changelog. | 2–3 days | PLAN_EDGE_CASES.md EC-32 |
Edge case hardening subtotal: ~9.5–13.5 days
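Whatever form the EC-32 rewrite takes, it has to preserve SQL's three-valued `ALL` semantics — which is exactly where a naive `NOT EXISTS` anti-join goes wrong on NULLs. A minimal reference model of the behaviour the rewrite must reproduce:

```python
# Reference model of SQL's three-valued x > ALL(subquery) semantics.
# A rewrite to an anti-join must reproduce all of these cases,
# including the NULL ones -- hence "NULL-safe".

def gt_all(x, rows):
    result = True                 # vacuously TRUE for an empty subquery
    for r in rows:
        cmp = None if (x is None or r is None) else x > r
        if cmp is False:
            return False          # one refuting row decides the outcome
        if cmp is None:
            result = None         # an unknown comparison leaves it unknown
    return result
```

The tricky cases: an empty subquery yields TRUE, a NULL in the subquery can only downgrade TRUE to NULL (never to FALSE), and a single refuting row wins regardless of NULLs.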
Ergonomics Follow-Up ✅
In plain terms: Several test gaps and a documentation item were left over from the v0.5.0 ergonomics work. These are all small E2E tests that confirm existing features actually produce the warnings and errors they're supposed to — catching regressions before users hit them. The changelog entry documents breaking behavioural changes (the default schedule changed from a fixed "every 1 minute" to an auto-calculated interval, and `NULL` schedule input is now rejected).
| Item | Description | Effort | Ref |
|---|---|---|---|
| | E2E test: verify that 'calculated' as a schedule works (pg_trickle picks an interval based on table size) and that passing NULL gives a clear error instead of silently breaking. Catches regressions in the schedule parser. | 4h | PLAN_ERGONOMICS.md §Remaining follow-up |
| | The diamond_consistency GUC was removed in v0.4.0. Verify that SHOW pg_trickle.diamond_consistency returns an error — not a stale value from a previous installation that confuses users. | 2h | PLAN_ERGONOMICS.md §Remaining follow-up |
| | E2E test: when a user changes a query via alter_stream_table(query => ...), it may trigger an expensive full re-scan. Verify the WARNING appears so users aren't surprised by a sudden spike in load. | 3h | PLAN_ERGONOMICS.md §Remaining follow-up |
| | E2E test: when cdc_mode = 'auto' but PostgreSQL's wal_level isn't set to logical, pg_trickle can't use WAL-based tracking and silently falls back to triggers. Verify the startup WARNING appears so operators know they need to change wal_level. | 3h | PLAN_ERGONOMICS.md §Remaining follow-up |
| | CHANGELOG entry: the default schedule changed and NULL schedule input started being rejected. These behavioural changes need explicit CHANGELOG entries so upgrading users aren't caught off guard. | 2h | PLAN_ERGONOMICS.md §Remaining follow-up |
Ergonomics follow-up subtotal: ~14 hours
Bootstrap Source Gating Follow-Up ✅
In plain terms: Source gating (pause/resume for bulk loads) shipped in v0.5.0 with the core API and scheduler integration. This follow-up adds robustness tests for edge cases that real-world ETL pipelines will hit: What happens if you gate a source twice? What if you re-gate it after ungating? It also adds a dedicated introspection function that shows the full gate lifecycle (when gated, who gated it, how long it's been gated), and documentation showing common ETL coordination patterns like "gate → bulk load → ungate → single clean refresh."
| Item | Description | Effort | Ref |
|---|---|---|---|
| | E2E test: calling gate_source('orders') when orders is already gated is a harmless no-op — not an error. Important for ETL scripts that may retry on failure. | 3h | PLAN_BOOTSTRAP_GATING.md |
| | | 3h | PLAN_BOOTSTRAP_GATING.md |
| | A bootstrap_gate_status() function that shows which sources are gated, when they were gated, who gated them, and how long they've been paused. Useful for debugging when the scheduler seems to be "doing nothing" — it might just be waiting for a gate. | 3h | PLAN_BOOTSTRAP_GATING.md |
| | | 3h | PLAN_BOOTSTRAP_GATING.md |
Bootstrap gating follow-up subtotal: ~12 hours
dbt Integration Enhancements ✅
In plain terms: The dbt macro package (`dbt-pgtrickle`) shipped in v0.4.0 with the core `stream_table` materialization. This adds three improvements: a `stream_table_status` macro that lets dbt models query health information (stale? erroring? how many refreshes?) so you can build dbt tests that fail when a stream table is unhealthy; a bulk `refresh_all_stream_tables` operation for CI pipelines that need everything fresh before running tests; and expanded integration tests covering the `alter_stream_table` flow (which gets more important once `create_or_replace` lands in the same release).
| Item | Description | Effort | Ref |
|---|---|---|---|
| | A stream_table_status() macro that returns whether a stream table is healthy, stale, or erroring — so you can write dbt tests like "fail if the orders summary hasn't refreshed in the last 5 minutes." Makes pg_trickle a first-class citizen in dbt's testing framework. | 3h | PLAN_ECO_SYSTEM.md §Project 1 |
| | A dbt run-operation refresh_all_stream_tables command that refreshes all stream tables in the correct dependency order. Designed for CI pipelines: run it after dbt run and before dbt test to make sure all materialized data is current. | 2h | PLAN_ECO_SYSTEM.md §Project 1 |
| | Expanded integration tests for the alter_stream_table flow against the stream_table materialization. Especially important now that create_or_replace is landing in the same release. | 3h | PLAN_ECO_SYSTEM.md §Project 1 |
dbt integration subtotal: ~8 hours
SQL Documentation Gaps ✅
In plain terms: Once EC-03 (window functions in expressions) and EC-32 (`ALL (subquery)`) are implemented in this release, the documentation needs to explain the new patterns with examples. The foreign table polling CDC feature (shipped in v0.2.2) also needs a worked example showing common setups like `postgres_fdw` source tables with periodic polling.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | Document what you can now write, e.g. WHERE price > ALL (SELECT ...), how pg_trickle rewrites it internally, and a complete worked example with sample data and expected output. | 2h | GAP_SQL_OVERVIEW.md |
| | Document window functions in expressions: "you can now write CASE WHEN ROW_NUMBER() ..., and here's what pg_trickle does under the hood to make it work incrementally." | 2h | PLAN_EDGE_CASES.md EC-03 |
| | Worked example: set up a postgres_fdw foreign table, use it as a stream table source with polling-based change detection, and what to expect in terms of refresh behaviour. This feature shipped in v0.2.2 but was never properly documented with an example. | 1h | Existing feature (v0.2.2) |
SQL documentation subtotal: ~5 hours
v0.6.0 total: ~77–92h
Exit criteria:
- Partitioned source tables E2E-tested; ATTACH PARTITION detected
- WAL mode works with `publish_via_partition_root = true`
- `create_or_replace_stream_table` deployed; dbt macro updated
- SCC algorithm in place; monotonicity checker rejects non-monotone cycles
- WAL + keyless without REPLICA IDENTITY FULL rejected at creation (EC-19)
- `ALTER FUNCTION` body changes detected via `pg_proc` hash polling (EC-16)
- Stuck `auto` CDC mode surfaces explanation in logs and health check (EC-18)
- Missing WAL slot after restore auto-detected with TRIGGER fallback (EC-34)
- Window functions in expressions supported via subquery-lift rewrite (EC-03)
- `ALL (subquery)` rewritten to NULL-safe anti-join (EC-32)
- Ergonomics E2E tests for calculated schedule, warnings, and removed GUCs pass
- `gate_source()` idempotency and re-gating tested; `bootstrap_gate_status()` available
- dbt `stream_table_status()` and `refresh_all_stream_tables` macros shipped
- SQL Reference updated for EC-03, EC-32, and foreign table polling patterns
- Extension upgrade path tested (0.5.0 → 0.6.0)
v0.7.0 — Performance, Watermarks, Circular DAG Execution, Observability & Infrastructure
Status: Released (2026-03-16).
Goal: Land Part 9 performance improvements (parallel refresh scheduling, MERGE strategy optimization, advanced benchmarks), add user-injected temporal watermark gating for batch-ETL coordination, complete the fixpoint scheduler for circular stream table DAGs, ship ready-made Prometheus/Grafana monitoring, and prepare the 1.0 packaging and deployment infrastructure.
Completed items (click to expand)
Watermark Gating
In plain terms: A scheduling control for ETL pipelines where multiple source tables are populated by separate jobs that finish at different times. For example, `orders` might be loaded by a job that finishes at 02:00 and `products` by one that finishes at 03:00. Without watermarks, the scheduler might refresh a stream table that joins the two at 02:30, producing a half-complete result. Watermarks let each ETL job declare "I'm done up to timestamp X", and the scheduler waits until all sources are caught up within a configurable tolerance before proceeding.
Let producers signal their progress so the scheduler only refreshes stream tables when all contributing sources are aligned within a configurable tolerance. The primary use case is nightly batch ETL pipelines where multiple source tables are populated on different schedules.
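The alignment test at the heart of this is small: a watermark group is refresh-ready when the spread between its slowest and fastest source watermark is within the group's tolerance. A sketch:

```python
from datetime import datetime, timedelta

# Sketch of the watermark alignment check: a group is refresh-ready
# when the spread between its slowest and fastest source watermark
# is within the configured tolerance.

def aligned(watermarks: dict, tolerance: timedelta) -> bool:
    ts = list(watermarks.values())
    return max(ts) - min(ts) <= tolerance

# Both nightly jobs have reported completion for the same day:
ready = {
    "orders":   datetime(2026, 3, 15, 2, 0),   # job finished 02:00
    "products": datetime(2026, 3, 15, 3, 0),   # job finished 03:00
}
# The products loader hasn't run yet -- its watermark is a day behind:
lagging = {
    "orders":   datetime(2026, 3, 15, 2, 0),
    "products": datetime(2026, 3, 14, 3, 0),
}
```

With a two-hour tolerance, the first group refreshes and the second is skipped with a misalignment reason until the lagging job calls `advance_watermark`.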
| Item | Description | Effort | Ref |
|---|---|---|---|
| | pgt_watermarks table (source_relid, current_watermark, updated_at, wal_lsn_at_advance); pgt_watermark_groups table (group_name, sources, tolerance) | ✅ Done | PLAN_WATERMARK_GATING.md |
| | advance_watermark(source, watermark) — monotonicity check, store LSN alongside watermark, lightweight scheduler signal | ✅ Done | PLAN_WATERMARK_GATING.md |
| | create_watermark_group(name, sources[], tolerance) / drop_watermark_group() | ✅ Done | PLAN_WATERMARK_GATING.md |
| | Scheduler gating: log SKIP(watermark_misaligned) if not aligned | ✅ Done | PLAN_WATERMARK_GATING.md |
| | watermarks(), watermark_groups(), watermark_status() introspection functions | ✅ Done | PLAN_WATERMARK_GATING.md |
| | | ✅ Done | PLAN_WATERMARK_GATING.md |
Watermark gating: ✅ Complete
Circular Dependencies — Scheduler Integration
In plain terms: Completes the circular DAG work started in v0.6.0. When stream tables reference each other in a cycle (A → B → A), the scheduler now runs them repeatedly until the result stabilises — no more changes flowing through the cycle. This is called "fixpoint iteration", like solving a system of equations by re-running it until the numbers stop moving. If it doesn't converge within a configurable number of rounds (default 100) it surfaces an error rather than looping forever.
Completes the SCC foundation from v0.6.0 with a working fixpoint iteration
loop. Stream tables in a monotone cycle are refreshed repeatedly until
convergence (zero net change) or max_fixpoint_iterations is exceeded.
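The iteration loop can be sketched with a classic monotone computation, transitive reachability over a graph, where each pass adds facts until nothing new appears:

```python
# Sketch of fixpoint refresh on a monotone cycle: re-run the cycle
# until a pass produces zero net change, or raise at the iteration
# cap (the doc's max_fixpoint_iterations, default 100).

def iterate_to_fixpoint(step, state, max_iterations=100):
    for i in range(1, max_iterations + 1):
        nxt = step(state)
        if nxt == state:           # zero net change: converged
            return nxt, i
        state = nxt
    raise RuntimeError("fixpoint did not converge")  # ERROR status

# Classic monotone example: transitive reachability over a graph.
edges = {("a", "b"), ("b", "c"), ("c", "d")}

def step(reach):
    # One refresh pass: join reach with itself and union the result in.
    return reach | {(x, z) for (x, y) in reach for (w, z) in reach if y == w}

closure, iterations = iterate_to_fixpoint(step, set(edges))
```

Monotonicity is what guarantees termination here: each pass can only add pairs, and the set of possible pairs is finite, so the loop must reach a pass with zero net change.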
| Item | Description | Effort | Ref |
|---|---|---|---|
| | iterate_to_fixpoint(), convergence detection from (rows_inserted, rows_deleted), non-convergence → ERROR status | ✅ Done | PLAN_CIRCULAR_REFERENCES.md Part 5 |
| | Cycle creation allowed with allow_circular=true; assign scc_id; recompute SCCs on drop_stream_table | ✅ Done | PLAN_CIRCULAR_REFERENCES.md Part 6 |
| | Expose scc_id + last_fixpoint_iterations in views; pgtrickle.pgt_scc_status() function | ✅ Done | PLAN_CIRCULAR_REFERENCES.md Part 7 |
| | E2E tests (e2e_circular_tests.rs): 6 scenarios (monotone cycle, non-monotone reject, convergence, non-convergence→ERROR, drop breaks cycle, allow_circular=false default) | ✅ Done | PLAN_CIRCULAR_REFERENCES.md Part 8 |
Circular dependencies subtotal: ~19 hours
Last Differential Mode Gaps
In plain terms: Three query patterns that previously fell back to `FULL` refresh in `AUTO` mode — or hard-errored in explicit `DIFFERENTIAL` mode — despite the DVM engine having the infrastructure to handle them. All three gaps are now closed.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G1 | User-defined aggregates: PostGIS spatial aggregates (ST_Union, ST_Collect), pgvector vector averages, and any CREATE AGGREGATE function are rejected. Fix: classify unknown aggregates as AggFunc::UserDefined and route them through the existing group-rescan strategy — no new delta math required. | ✅ Done | PLAN_LAST_DIFFERENTIAL_GAPS.md §G1 |
| G2 | Window functions nested in expressions: RANK() OVER (...) + 1, CASE WHEN ROW_NUMBER() OVER (...) <= 10, COALESCE(LAG(v) OVER (...), 0) etc. are rejected. | ✅ Done (v0.6.0) | PLAN_LAST_DIFFERENTIAL_GAPS.md §G2 |
| G3 | OR+sublink handling covers EXISTS(...) OR … and AND(EXISTS OR …) but gives up on multiple OR+sublink conjuncts. Fix: expand all OR+sublink conjuncts in AND to a cartesian product of UNION branches with a 16-branch explosion guard. | ✅ Done | PLAN_LAST_DIFFERENTIAL_GAPS.md §G3 |
Last differential gaps: ✅ Complete
Pre-1.0 Infrastructure Prep
In plain terms: Three preparatory tasks that make the eventual 1.0 release smoother. A draft Docker Hub image workflow (tests the build but doesn't publish yet); a PGXN metadata file so the extension can eventually be installed with `pgxn install pg_trickle`; and a basic CNPG integration test that verifies the extension image loads correctly in a CloudNativePG cluster. None of these ship user-facing features — they're CI and packaging scaffolding.
| Item | Description | Effort | Ref |
|---|---|---|---|
| | | 5h | ✅ Done |
| | Author META.json and upload a release_status: "testing" package to PGXN so pgxn install pg_trickle works for early adopters now. PGXN explicitly supports pre-stable releases; this gets real-world install testing and establishes registry presence before 1.0. At 1.0 the only change is flipping release_status to "stable". | 2–3h | ✅ Done |
| | | 4h | ✅ Done |
Pre-1.0 infrastructure prep: ✅ Complete
Performance — Regression Fixes & Benchmark Infrastructure (Part 9 S1–S2) ✅ Done
Fixes Criterion benchmark regressions identified in Part 9 and ships five benchmark infrastructure improvements to support data-driven performance decisions.
| Item | Description | Status |
|---|---|---|
| A-3 | Fix prefixed_col_list/20 +34% regression — eliminate intermediate Vec allocation | ✅ Done |
| A-4 | Fix lsn_gt +22% regression — use split_once instead of split().collect() | ✅ Done |
| I-1c | just bench-docker target for running Criterion inside Docker builder image | ✅ Done |
| I-2 | Per-cycle [BENCH_CYCLE] CSV output in E2E benchmarks for external analysis | ✅ Done |
| I-3 | EXPLAIN ANALYZE capture mode (PGS_BENCH_EXPLAIN=true) for delta query plans | ✅ Done |
| I-6 | 1M-row benchmark tier (bench_*_1m_* + bench_large_matrix) | ✅ Done |
| I-8 | Criterion noise reduction (sample_size(200), measurement_time(10s)) | ✅ Done |
Performance — Parallel Refresh, MERGE Optimization & Advanced Benchmarks (Part 9 S4–S6) ✅ Done
DAG level-parallel scheduling, improved MERGE strategy selection (xxh64 hashing, aggregate saturation bypass, cost-based threshold), and expanded benchmark suite (JSON comparison, concurrent writers, window/lateral/CTE).
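Two of the strategy heuristics named above can be sketched as a small decision function. The thresholds and cost model here are illustrative assumptions, not the extension's actual numbers.

```python
# Sketch of two refresh-strategy heuristics: the aggregate saturation
# bypass and a cost-based threshold fed by refresh history. Numbers
# and cost model are illustrative assumptions.

def pick_strategy(changed_rows: int, group_count: int,
                  avg_full_ms: float, avg_delta_ms_per_row: float) -> str:
    # Aggregate saturation bypass: once the delta touches at least as
    # many rows as there are groups, every group is recomputed anyway,
    # so a FULL refresh is no more work and avoids MERGE overhead.
    if group_count and changed_rows >= group_count:
        return "FULL"
    # Cost-based threshold: if the projected delta cost exceeds the
    # historical average full-refresh cost, FULL is the cheaper path.
    if changed_rows * avg_delta_ms_per_row >= avg_full_ms:
        return "FULL"
    return "DIFFERENTIAL"
```

In the extension the inputs would come from CDC batch sizes and `pgt_refresh_history` timings; the point is that the differential path is abandoned exactly when it stops being cheaper.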
| Item | Description | Status |
|---|---|---|
| C-1 | DAG level extraction (topological_levels() on StDag and ExecutionUnitDag) | ✅ Done |
| C-2 | Level-parallel dispatch (existing parallel_dispatch_tick infrastructure sufficient) | ✅ Done |
| C-3 | Result communication (existing SchedulerJob + pgt_refresh_history sufficient) | ✅ Done |
| D-1 | xxh64 hash-based change detection for wide tables (≥50 cols) | ✅ Done |
| D-2 | Aggregate saturation FULL bypass (changes ≥ groups → FULL) | ✅ Done |
| D-3 | Cost-based strategy selection from pgt_refresh_history data | ✅ Done |
| I-4 | Cross-run comparison tool (just bench-compare, JSON output) | ✅ Done |
| I-5 | Concurrent writer benchmarks (1/2/4/8 writers) | ✅ Done |
| I-7 | Window / lateral / CTE / UNION ALL operator benchmarks | ✅ Done |
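D-1's hash-based change detection replaces dozens of per-column comparisons on wide tables with a single per-row digest comparison. A minimal illustrative sketch in Python (stdlib `blake2b` stands in for xxh64; all helper names here are hypothetical, not part of the extension):

```python
import hashlib

def row_digest(row):
    """Hash all column values into one 8-byte digest (stand-in for xxh64)."""
    h = hashlib.blake2b(digest_size=8)
    for value in row:
        # A separator prevents ("ab", "c") colliding with ("a", "bc").
        h.update(repr(value).encode())
        h.update(b"\x1f")
    return h.hexdigest()

def changed_rows(old_rows, new_rows):
    """Detect changed rows positionally by comparing one digest per row
    instead of comparing every column pairwise."""
    return [i for i, (o, n) in enumerate(zip(old_rows, new_rows))
            if row_digest(o) != row_digest(n)]

wide_old = [tuple(range(50)), tuple(range(50))]
wide_new = [tuple(range(50)), tuple(range(49)) + (99,)]  # last column changed
print(changed_rows(wide_old, wide_new))  # [1]
```

For a 50-column table this turns 50 value comparisons per row into one digest comparison, at the cost of computing the hash once per row version.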
v0.7.0 total: ~59–62h
Exit criteria:
- Part 9 performance: DAG levels, xxh64 hashing, aggregate saturation bypass, cost-based threshold, advanced benchmarks
- `advance_watermark` + scheduler gating operational; ETL E2E tests pass
- Monotone circular DAGs converge to fixpoint; non-convergence surfaces as `ERROR`
- UDAs, nested window expressions, and deeply nested OR+sublinks supported in DIFFERENTIAL mode
- Docker Hub image CI workflow builds and smoke-tests successfully
- PGXN `testing` release uploaded; `pgxn install pg_trickle` works
- CNPG integration smoke test passes in CI
- Extension upgrade path tested (`0.6.0 → 0.7.0`)
v0.8.0 — pg_dump Support & Test Hardening
Status: Released
Goal: Complete the pg_dump round-trip story so stream tables survive
pg_dump/pg_restore cycles, and comprehensively harden the
E2E test suites with multiset invariants to mathematically enforce DVM correctness.
Completed items (click to expand)
pg_dump / pg_restore Support
In plain terms: `pg_dump` is the standard PostgreSQL backup tool. Without this, a dump of a database containing stream tables may not capture them correctly — and restoring from that dump would require recreating them by hand. This work teaches `pg_dump` to emit valid SQL for every stream table, and adds logic to automatically re-link orphaned catalog entries when restoring an extension from a backup.
Complete the native DDL story: teach `pg_dump` to emit `CREATE MATERIALIZED VIEW … WITH (pgtrickle.stream = true)` for stream tables and add an event trigger that re-links orphaned catalog entries on extension restore.
| Item | Description | Effort | Ref |
|---|---|---|---|
| NAT-DUMP | generate_dump() + restore_stream_tables() companion functions (done); event trigger on extension load for orphaned catalog entries | 3–4d | PLAN_NATIVE_SYNTAX.md §pg_dump |
| NAT-TEST | E2E tests: pg_dump round-trip, restore from backup, orphaned-entry recovery | 2–3d | PLAN_NATIVE_SYNTAX.md §pg_dump |
pg_dump support subtotal: ~5–7 days
Test Suite Evaluation & Hardening
In plain terms: Replace legacy, row-count-based assertions with comprehensive, order-independent multiset evaluations (`assert_st_matches_query`) across all testing tiers. Proving these multiset invariants mathematically guarantees differential dataflow correctness under chaotic interleavings and edge cases.
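The multiset-evaluation idea behind `assert_st_matches_query` can be illustrated in a few lines of Python (the helper name below is hypothetical; the real assertion lives in the Rust test harness):

```python
from collections import Counter

def multiset_diff(stream_table_rows, query_rows):
    """Order-independent multiset comparison: returns (missing, extra)
    as the symmetric difference between the stream table contents and
    a fresh evaluation of the defining query."""
    st = Counter(stream_table_rows)
    q = Counter(query_rows)
    missing = q - st   # rows the query produces but the stream table lacks
    extra = st - q     # rows the stream table holds but the query does not
    return missing, extra

# Duplicates matter: a multiset check catches a dropped duplicate that a
# DISTINCT-based or row-count check would miss.
missing, extra = multiset_diff([("a",), ("b",)], [("a",), ("a",), ("b",)])
print(dict(missing), dict(extra))  # {('a',): 1} {}
```

Reporting the symmetric difference (rather than a bare pass/fail) is what makes failures debuggable under randomized DML loads.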
| Item | Description | Effort | Ref |
|---|---|---|---|
| TE1 | Unit Test Hardening: Full multiset equality testing for pure-Rust DVM operators | Done | PLAN_EVALS_UNIT |
| TE2 | Light E2E Migration: Expand speed-optimized E2E pipeline with rigorous symmetric difference checks | Done | PLAN_EVALS_LIGHT_E2E |
| TE3 | Integration Concurrency: Prove complex orchestration correctness under transaction delays | Done | PLAN_EVALS_INTEGRATION |
| TE4 | Full E2E Hardening: Validate cross-boundary, multi-DAG cascades, partition handling, and upgrade paths | Done | PLAN_EVALS_FULL_E2E |
| TE5 | TPC-H Smoke Test: Stateful invariant evaluations for heavily randomized DML loads over large matrices | Done | PLAN_EVALS_TPCH |
| TE6 | Property-Based Invariants: Chaotic property testing pipelines for topological boundaries and cyclic executions | Done | PLAN_PROPERTY_BASED_INVARIANTS |
| TE7 | cargo-nextest Migration: Move test suite execution to cargo-nextest to aggressively parallelize and isolate tests, solving wall-clock execution regressions | 1–2d | PLAN_CARGO_NEXTEST |
Test evaluation subtotal: ~11–14 days (mostly completed)
v0.8.0 total: ~16–21 days
Exit criteria:
- Test infrastructure hardened with exact mathematical multiset validation
- Test harness migrated to `cargo-nextest` to fix speed and CI flake regressions
- pg_dump round-trip produces valid, restorable SQL for stream tables (Done)
- Extension upgrade path tested (`0.7.0 → 0.8.0`)
v0.9.0 — Incremental Aggregate Maintenance
Status: Released (2026-03-20).
Goal: Implement algebraic incremental maintenance for decomposable aggregates (COUNT, SUM, AVG, MIN, MAX, STDDEV), reducing per-group refresh from O(group_size) to O(1) for the common case. This is the highest-potential-payoff item in the performance plan — benchmarks show aggregate scenarios going from 2.5 ms to sub-1 ms per group.
Completed items (click to expand)
Critical Bug Fixes
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| G-1 | panic!() in SQL-callable source_gates() and watermarks() functions. Both functions reach panic!() on any SPI error, crashing the PostgreSQL backend process. AGENTS.md explicitly forbids panic!() in code reachable from SQL. Replace both .unwrap_or_else(|e| panic!(…)) calls with pgrx::error!(…) so any SPI failure surfaces as a PostgreSQL ERROR instead. | ~1h | ✅ Done | src/api.rs |
Critical bug fixes subtotal: ~1 hour
Algebraic Aggregate Shortcuts (B-1)
In plain terms: When only one row changes in a group of 100,000, today pg_trickle re-scans all 100,000 rows to recompute the aggregate. Algebraic maintenance keeps running totals instead: `new_sum = old_sum + Δsum`, `new_count = old_count + Δcount`. Only MIN/MAX ever need a rescan — and only when the deleted value was the current minimum or maximum.
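The running-total rule described above can be sketched in a few lines (illustrative Python, not the extension's SQL; the helper name and state layout are hypothetical):

```python
def merge_aggregates(state, delta_inserts, delta_deletes):
    """O(1) algebraic maintenance for one group: SUM and COUNT are
    updated from the delta alone; AVG is derived from the two running
    totals rather than stored independently."""
    state["sum"] += sum(delta_inserts) - sum(delta_deletes)
    state["count"] += len(delta_inserts) - len(delta_deletes)
    state["avg"] = state["sum"] / state["count"] if state["count"] else None
    return state

# One insert and one delete against a group of 4 rows: no group rescan,
# just constant-time arithmetic on the stored totals.
group = {"sum": 100.0, "count": 4, "avg": 25.0}
merge_aggregates(group, delta_inserts=[10.0], delta_deletes=[30.0])
print(group["sum"], group["count"], group["avg"])  # 80.0 4 20.0
```

The cost is independent of group size, which is exactly the O(group_size) → O(1) improvement this milestone targets.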
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| B1-1 | Algebraic rules: COUNT, SUM (already algebraic), AVG (done — aux cols), STDDEV/VAR (done — sum-of-squares decomposition), MIN/MAX with rescan guard (already implemented) | 3–4 wk | ✅ Done | PLAN_NEW_STUFF.md §B-1 |
| B1-2 | Auxiliary column management (__pgt_aux_sum_*, __pgt_aux_count_*, __pgt_aux_sum2_* — done); hidden via __pgt_* naming convention (existing NOT LIKE '__pgt_%' filter) | 1–2 wk | ✅ Done | PLAN_NEW_STUFF.md §B-1 |
| B1-3 | Migration story for existing aggregate stream tables; periodic full-group recomputation to reset floating-point drift | 1 wk | ✅ Done | PLAN_NEW_STUFF.md §B-1 |
| B1-4 | Fallback to full-group recomputation for non-decomposable aggregates (mode, percentile, string_agg with ordering) | 1 wk | ✅ Done | PLAN_NEW_STUFF.md §B-1 |
| B1-5 | Property-based tests: MIN/MAX boundary case (deleting the exact current min or max value must trigger rescan) | 1 wk | ✅ Done | PLAN_NEW_STUFF.md §B-1 |
Implementation Progress
Completed:
- AVG algebraic maintenance (B1-1): AVG no longer triggers a full group-rescan. Classified as `is_algebraic_via_aux()` and tracked via `__pgt_aux_sum_*` / `__pgt_aux_count_*` columns. The merge expression computes `(old_sum + ins - del) / NULLIF(old_count + ins - del, 0)`.
- STDDEV/VAR algebraic maintenance (B1-1): `STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, and `VAR_SAMP` are now algebraic using sum-of-squares decomposition. Auxiliary columns: `__pgt_aux_sum_*` (running SUM), `__pgt_aux_sum2_*` (running SUM(x²)), `__pgt_aux_count_*`. Merge formulas:
  - `VAR_POP = GREATEST(0, (n·sum2 − sum²) / n²)`
  - `VAR_SAMP = GREATEST(0, (n·sum2 − sum²) / (n·(n−1)))`
  - `STDDEV_POP = SQRT(VAR_POP)`, `STDDEV_SAMP = SQRT(VAR_SAMP)`
  - Null guards match PostgreSQL semantics (NULL when count ≤ threshold).
- Auxiliary column infrastructure (B1-2): `create_stream_table()` and `alter_stream_table()` detect AVG/STDDEV/VAR aggregates and automatically add `NUMERIC` sum/sum2 and `BIGINT` count columns. Full refresh and initialization paths inject `SUM(arg)`, `COUNT(arg)`, and `SUM(arg*arg)`. All `__pgt_aux_*` columns are automatically hidden by the existing `NOT LIKE '__pgt_%'` convention used throughout the codebase.
- Non-decomposable fallback (B1-4): Already existed as the group-rescan strategy — any aggregate not classified as algebraic or algebraic-via-aux falls back to full group recomputation.
- Property-based tests (B1-5): Seven proptest tests verify: (a) MIN merge uses `LEAST`, MAX merge uses `GREATEST`; (b) deleting the exact current extremum triggers rescan; (c) delta expressions use matching aggregate functions; (d) AVG is classified as algebraic-via-aux (not group-rescan); (e) STDDEV/VAR use the sum-of-squares algebraic path with GREATEST guard; (f) STDDEV wraps in SQRT, VAR does not; (g) DISTINCT STDDEV falls back (not algebraic).
- Migration story (B1-3): `ALTER QUERY` transitions are handled seamlessly by extending `migrate_aux_columns` to execute `ALTER TABLE ADD COLUMN` or `DROP COLUMN` exactly matching runtime changes in the `new_avg_aux` or `new_sum2_aux` definitions.
- Floating-point drift reset (B1-3): Implemented global GUC `pg_trickle.algebraic_drift_reset_cycles` (0 = disabled) that counts differential refresh attempts in scheduler memory per stream table. When the threshold fires, the action degrades to `RefreshAction::Reinitialize`.
- E2E integration tests: Multi-cycle inserts, updates, and deletes verified for correct handling without regression (added specifically for STDDEV/VAR).
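The sum-of-squares merge formula for `VAR_POP` can be checked against the direct definition in a short Python sketch (helper name hypothetical; this mirrors the SQL merge expression, not the extension's code):

```python
def var_pop_from_aux(n, s, s2):
    """VAR_POP from auxiliary running totals, as in the merge expression
    GREATEST(0, (n*sum2 - sum^2) / n^2). The max(0, ...) guard absorbs
    tiny negative values caused by floating-point rounding."""
    return max(0.0, (n * s2 - s * s) / (n * n))

data = [4.0, 7.0, 13.0, 16.0]
n, s, s2 = len(data), sum(data), sum(x * x for x in data)

# Direct textbook definition for comparison.
mean = s / n
direct = sum((x - mean) ** 2 for x in data) / n

assert abs(var_pop_from_aux(n, s, s2) - direct) < 1e-9
print(var_pop_from_aux(n, s, s2))  # 22.5
```

Because only `n`, `sum`, and `sum2` are stored, an insert or delete updates three totals in O(1) instead of re-reading the group.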
Remaining work:
- Extension upgrade path (`0.8.0 → 0.9.0`): Upgrade SQL stub created. Left as a final pre-release checklist item: generate the final `sql/archive/pg_trickle--0.9.0.sql` with `cargo pgrx package` once all CI checks pass.
- F15 — Selective CDC Column Capture: ✅ Complete. Column-selection pipeline, monitoring exposure via `check_cdc_health().selective_capture`, and 3 E2E integration tests done.
⚠️ Critical: the MIN/MAX maintenance rule is directionally tricky. The correct condition for triggering a rescan is: deleted value equals the current min/max (not when it differs). Getting this backwards silently produces stale aggregates on the most common OLTP delete pattern. See the corrected table and risk analysis in PLAN_NEW_STUFF.md §B-1.
Retraction consideration (B-1): Keep in v0.9.0, but item B1-5 (property-based tests covering the MIN/MAX boundary case) is a hard prerequisite for B1-1, not optional follow-on work. The MIN/MAX rule was stated backwards in the original spec; the corrected rule is now in PLAN_NEW_STUFF.md. Do not merge any MIN/MAX algebraic path until property-based tests confirm: (a) deleting the exact current min triggers a rescan and (b) deleting a non-min value does not. Floating-point drift reset (B1-3) is also required before enabling persistent auxiliary columns.
✅ B1-5 hard prerequisite satisfied. Property-based tests now cover both conditions — see `prop_min_max_rescan_guard_direction` in `tests/property_tests.rs`.
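The corrected rescan direction is small enough to state as executable pseudocode (illustrative Python; `rescan` stands in for the group-rescan query, and the helper name is hypothetical):

```python
def min_after_delete(current_min, deleted, rescan):
    """Rescan only when the deleted value *equals* the current minimum;
    deleting any other value cannot change MIN. Getting this backwards
    silently leaves a stale minimum behind."""
    if deleted == current_min:
        return rescan()          # extremum removed: must rescan the group
    return current_min           # cheap path: MIN is provably unchanged

# Deleting a non-minimum value: no rescan, MIN stays 3.
print(min_after_delete(3, deleted=9, rescan=lambda: min([3, 5])))  # 3
# Deleting the minimum itself: rescan of the remaining rows.
print(min_after_delete(3, deleted=3, rescan=lambda: min([5, 9])))  # 5
```

This is exactly the boundary the B1-5 property tests pin down: the rescan fires on equality, never on difference.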
Algebraic aggregates subtotal: ~7–9 weeks
Advanced SQL Syntax & DVM Capabilities (B-2)
These represent expansions of the DVM engine to handle richer SQL constructs and improve runtime execution consistency.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| B2-1 | LIMIT / OFFSET / ORDER BY. Top-K queries evaluated directly within the DVM engine. | 2–3 wk | ✅ Done | PLAN_ORDER_BY_LIMIT_OFFSET.md |
| B2-2 | LATERAL Joins. Expanding the parser and DVM diff engine to handle LATERAL subqueries. | 2 wk | ✅ Done | PLAN_LATERAL_JOINS.md |
| B2-3 | View Inlining. Allow stream tables to query standard PostgreSQL views natively. | 1-2 wk | ✅ Done | PLAN_VIEW_INLINING.md |
| B2-4 | Synchronous / Transactional IVM. Evaluating DVM diffs synchronously in the same transaction as the DML. | 3 wk | ✅ Done | PLAN_TRANSACTIONAL_IVM.md |
| B2-5 | Cross-Source Snapshot Consistency. Improving engine consistency models when joining multiple tables. | 2 wk | ✅ Done | PLAN_CROSS_SOURCE_SNAPSHOT_CONSISTENCY.md |
| B2-6 | Non-Determinism Guarding. Better handling or rejection of non-deterministic functions (random(), now()). | 1 wk | ✅ Done | PLAN_NON_DETERMINISM.md |
Multi-Table Delta Batching (B-3)
In plain terms: When a join query has three source tables and all three change in the same cycle, today pg_trickle makes three separate passes through the source tables. B-3 merges those passes into one and prunes UNION ALL branches for sources with no changes.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| B3-1 | Intra-query delta-branch pruning: skip UNION ALL branch entirely when a source has zero changes in this cycle | 1–2 wk | ✅ Done | PLAN_NEW_STUFF.md §B-3 |
| B3-2 | Merged-delta generation: weight aggregation (GROUP BY __pgt_row_id, SUM(weight)) for cross-source deduplication; remove zero-weight rows | 3–4 wk | ✅ Done (v0.10.0) | PLAN_NEW_STUFF.md §B-3 |
| B3-3 | Property-based correctness tests for simultaneous multi-source changes; diamond-flow scenarios | 1–2 wk | ✅ Done (v0.10.0) | PLAN_NEW_STUFF.md §B-3 |
✅ B3-2 correctly uses weight aggregation (`GROUP BY __pgt_row_id, SUM(weight)`) instead of `DISTINCT ON`. B3-3 property-based tests (6 diamond-flow scenarios) verify correctness.
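The weight-aggregation rule behind B3-2 can be sketched in Python (hypothetical helper; it mirrors the `GROUP BY __pgt_row_id, SUM(weight)` SQL, with weight +1 per insert and −1 per delete):

```python
from collections import defaultdict

def merge_deltas(*per_source_deltas):
    """Merge per-source deltas of (row_id, weight) pairs. Summing
    weights per row id deduplicates rows seen by multiple source
    branches; rows whose weights cancel to zero are dropped entirely."""
    weights = defaultdict(int)
    for delta in per_source_deltas:
        for row_id, w in delta:
            weights[row_id] += w
    return {rid: w for rid, w in weights.items() if w != 0}

left_delta = [("r1", +1), ("r2", +1)]
right_delta = [("r2", -1), ("r3", -1)]
# r2's insert and delete cancel out and never touch the MERGE.
print(merge_deltas(left_delta, right_delta))  # {'r1': 1, 'r3': -1}
```

Summed weights (unlike `DISTINCT ON`) preserve multiset semantics when the same row id appears in several source branches with different signs.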
Multi-source delta batching subtotal: ~5–8 weeks
Phase 7 Gap Resolutions (DVM Correctness, Syntax & Testing)
These items pull in the remaining correctness edge cases and syntax expansions identified in the Phase 7 SQL Gap Analysis, along with completing exhaustive differential E2E test maturation.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| G1.1 | JOIN Key Column Changes. Handle updates that simultaneously modify a JOIN key and right-side tracked columns. | 3-5d | ✅ Done | GAP_SQL_PHASE_7.md |
| G1.2 | Window Function Partition Drift. Explicit tracking for updates that cause rows to cross PARTITION BY ranges. | 4-6d | ✅ Done | GAP_SQL_PHASE_7.md |
| G1.5/G7.1 | Keyless Table Duplicate Identity. Resolve __pgt_row_id collisions for non-PK tables with exact duplicate rows. | 3-5d | ✅ Done | GAP_SQL_PHASE_7.md |
| G5.6 | Range Aggregates. Support and differentiate RANGE_AGG and RANGE_INTERSECT_AGG. | 1-2d | ✅ Done | GAP_SQL_PHASE_7.md |
| G5.3 | XML Expression Parsing. Native DVM handling for T_XmlExpr syntax trees. | 1-2d | ✅ Done | GAP_SQL_PHASE_7.md |
| G5.5 | NATURAL JOIN Drift Tracking. DVM tracking of schema shifts in NATURAL JOIN between refreshes. | 2-3d | ✅ Done | GAP_SQL_PHASE_7.md |
| F15 | Selective CDC Column Capture. Limit row I/O by only tracking columns referenced in query lineage. | 1-2 wk | ✅ Done | GAP_SQL_PHASE_6.md |
| F40 | Extension Upgrade Migrations. Robust versioned SQL schema migrations. | 1-2 wk | ✅ Done | REPORT_DB_SCHEMA_STABILITY.md |
Phase 7 Gaps subtotal: ~5-7 weeks
Additional Query Engine Improvements
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| A1 | Circular dependency support (SCC fixpoint iteration) | ~40h | ✅ Done | CIRCULAR_REFERENCES.md |
| A7 | Skip-unchanged-column scanning in delta SQL (requires column-usage demand-propagation pass in DVM parser) | ~1–2d | ✅ Done | PLAN_EDGE_CASES_TIVM_IMPL_ORDER.md Stage 4 §3.4 |
| EC-03 | Window-in-expression DIFFERENTIAL fallback warning: emit a WARNING (and eventually an INFO hint) when a stream table with CASE WHEN window_fn() OVER (...) ... silently falls back from DIFFERENTIAL to FULL refresh mode; currently fails at runtime with column st.* does not exist — no user-visible signal exists | ~1d | ✅ Done | PLAN_EDGE_CASES.md §EC-03 |
| A8 | pgt_refresh_groups SQL API: companion functions (pgtrickle.create_refresh_group(), pgtrickle.drop_refresh_group(), pgtrickle.refresh_groups()) for the Cross-Source Snapshot Consistency catalog table introduced in the 0.8.0→0.9.0 upgrade script | ~2–3d | ✅ Done | PLAN_CROSS_SOURCE_SNAPSHOT_CONSISTENCY.md |
Advanced Capabilities subtotal: ~11–13 weeks
DVM Engine Correctness & Performance Hardening (P2)
These items address correctness gaps that silently degrade to full-recompute modes or cause excessive I/O on each differential cycle. All are observable in production workloads.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| P2-1 | Recursive CTE DRed in DIFFERENTIAL mode. Currently, any DELETE or UPDATE against a recursive CTE's source in DIFFERENTIAL mode falls back to O(n) full recompute + diff. The Delete-and-Rederive (DRed) algorithm exists for IMMEDIATE mode only. Implement DRed for DeltaSource::ChangeBuffer so recursive CTE stream tables in DIFFERENTIAL mode maintain O(delta) cost. | 2–3 wk | ⏭️ Deferred to v0.10.0 | src/dvm/operators/recursive_cte.rs |
| P2-2 | SUM NULL-transition rescan for FULL OUTER JOIN aggregates. When SUM sits above a FULL OUTER JOIN and rows transition between matched and unmatched states (matched→NULL), the algebraic formula gives 0 instead of NULL, triggering a child_has_full_join() full-group rescan on every cycle where rows cross that boundary. Implement a targeted correction that avoids full-group rescans in the common case. | 1–2 wk | ⏭️ Deferred to v0.10.0 | src/dvm/operators/aggregate.rs |
| P2-3 | DISTINCT multiplicity-count JOIN overhead. Every differential refresh for SELECT DISTINCT queries joins against the stream table's __pgt_count column for the full stream table, even when only a tiny delta is being processed. Replace with a per-affected-row lookup pattern to limit this to O(delta) I/O. | 1 wk | ✅ Done | src/dvm/operators/distinct.rs |
| P2-4 | Materialized view sources in IMMEDIATE mode (EC-09). Stream tables that use a PostgreSQL materialized view as a source are rejected at creation time when IMMEDIATE mode is requested. Implement a polling-change-detection wrapper (same approach as EC-05 for foreign tables) to support REFRESH MATERIALIZED VIEW-sourced queries in IMMEDIATE mode. | 2–3 wk | ⏭️ Deferred to v0.10.0 | plans/PLAN_EDGE_CASES.md §EC-09 |
| P2-5 | changed_cols bitmask captured but not consumed in delta scan SQL. Every CDC change buffer row stores a changed_cols BIGINT bitmask recording which source columns were modified by an UPDATE. The DVM delta scan CTE reads every UPDATE row regardless of whether any query-referenced column actually changed. Implement a demand-propagation pass to identify referenced columns per Scan, then inject a changed_cols & referenced_mask != 0 filter into the delta CTE WHERE clause. For wide source tables (50+ columns) where a typical UPDATE touches 1–3 columns, this eliminates ~98% of UPDATE rows entering the join/aggregate pipeline. | 2–3 wk | ✅ Done | src/dvm/operators/scan.rs · plans/PLAN_EDGE_CASES_TIVM_IMPL_ORDER.md §Task 3.1 |
| P2-6 | LATERAL subquery inner-source change triggers O(|outer table|) full re-execution. When any inner source has CDC entries in the current window, build_inner_change_branch() re-materializes the entire outer table snapshot and re-executes the lateral subquery for every outer row — O(|outer|) per affected cycle. Gate the outer-table scan behind a join to the inner delta rows so only outer rows correlated with changed inner rows are re-executed. (The analogous scalar subquery fix is P3-3; this is the lateral equivalent.) | 1–2 wk | ⏭️ Deferred to v0.10.0 | src/dvm/operators/lateral_subquery.rs |
| P2-7 | Delta predicate pushdown not implemented. WHERE predicates from the defining query are not pushed into the change buffer scan CTE. A stream table defined as SELECT … FROM orders WHERE status = 'shipped' reads all changes from pgtrickle_changes.changes_<oid> then filters — for 10K changes/cycle with 50 matching the predicate, 9,950 rows traverse the join/aggregate pipeline needlessly. Collect pushable predicates from the Filter node above the Scan; inject new_<col> / old_<col> predicate variants into the delta scan SQL. Care required: UPDATE rows need both old and new column values checked to avoid missing deletions that move rows out of the predicate window. | 2–3 wk | ✅ Done | src/dvm/operators/scan.rs · src/dvm/operators/filter.rs · plans/performance/PLAN_NEW_STUFF.md §B-2 |
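The `changed_cols & referenced_mask != 0` filter from P2-5 is plain bit arithmetic; a minimal Python sketch (all helper names hypothetical, standing in for the generated SQL):

```python
def column_mask(indices):
    """Bitmask with one bit per source column (the changed_cols idea)."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask

def relevant_updates(update_rows, referenced_cols):
    """Keep only UPDATE rows whose changed-column bitmask overlaps the
    columns the query actually references, i.e. the SQL filter
    changed_cols & referenced_mask != 0."""
    ref_mask = column_mask(referenced_cols)
    return [row for row in update_rows if row["changed_cols"] & ref_mask != 0]

updates = [
    {"id": 1, "changed_cols": column_mask([7])},     # only an untracked column
    {"id": 2, "changed_cols": column_mask([0, 7])},  # touches tracked column 0
]
print([r["id"] for r in relevant_updates(updates, referenced_cols=[0, 3])])  # [2]
```

On a wide table where a typical UPDATE touches 1–3 of 50+ columns, this single integer AND per row is what filters out the ~98% of UPDATE rows that cannot affect the query.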
DVM hardening (P2) subtotal: ~6–9 weeks
DVM Performance Trade-offs (P3)
These items are correct as implemented but scale with data size rather than delta size. They are lower priority than P2 but represent solid measurable wins for high-cardinality workloads.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| P3-1 | Window partition full recompute. Any single-row change in a window partition triggers recomputation of the entire partition. Add a partition-size heuristic: if the affected partition exceeds a configurable row threshold, downgrade to FULL refresh for that cycle and emit a pgrx::info!() message. At minimum, document the O(partition_size) cost prominently. | 1 wk | ✅ Done (documented) | src/dvm/operators/window.rs |
| P3-2 | Welford auxiliary columns for CORR/COVAR/REGR_* aggregates. CORR, COVAR_POP, COVAR_SAMP, REGR_* currently use O(group_size) group-rescan. Implement Welford-style auxiliary column accumulation (__pgt_aux_sumx_*, __pgt_aux_sumy_*, __pgt_aux_sumxy_*) to reach O(1) algebraic maintenance identical to the STDDEV/VAR path. | 2–3 wk | ⏭️ Deferred to v0.10.0 | src/dvm/operators/aggregate.rs |
| P3-3 | Scalar subquery C₀ EXCEPT ALL scan. Part 2 of the scalar subquery delta computes C₀ = C_current EXCEPT ALL Δ_inserts UNION ALL Δ_deletes by scanning the full outer snapshot. For large outer tables with an unstable inner source, this scan is proportional to the outer table size. Profile and gate the scan behind an existence check on inner-source stability to avoid it when possible; the WHERE EXISTS (SELECT 1 FROM delta_subquery) guard already handles the trivial case. | 1 wk | ✅ Done | src/dvm/operators/scalar_subquery.rs |
| P3-4 | Index-aware MERGE planning. For small deltas against large stream tables (e.g. 5 delta rows, 10M-row ST), the PostgreSQL planner often chooses a sequential scan of the stream table for the MERGE join on __pgt_row_id, yielding O(n) full-table I/O when an index lookup would be O(log n). Emit SET LOCAL enable_seqscan = off within the MERGE transaction when the delta row count is below a configurable threshold fraction of the ST row count (pg_trickle.merge_seqscan_threshold GUC, default 0.001). | 1–2 wk | ✅ Done | src/refresh.rs · src/config.rs · plans/performance/PLAN_NEW_STUFF.md §A-4 |
| P3-5 | auto_backoff GUC for falling-behind stream tables. EC-11 implemented the scheduler_falling_behind NOTIFY alert at 80% of the refresh budget. The companion auto_backoff GUC that automatically doubles the effective refresh interval when a stream table consistently runs behind was explicitly deferred. Add a pg_trickle.auto_backoff bool GUC (default off); when enabled, track a per-ST exponential backoff factor in scheduler shared state and reset it on the first on-time cycle. Saves CPU runaway when operators are offline to respond manually. | 1–2d | ✅ Done | src/scheduler.rs · src/config.rs · plans/PLAN_EDGE_CASES.md §EC-11 |
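The exponential backoff sketched for P3-5 amounts to a tiny state machine: double the effective interval while refreshes run behind, reset on the first on-time cycle. An illustrative Python sketch (the helper name and the factor cap of 64 are assumptions, not from the extension):

```python
def next_interval(base_interval, backoff_factor, on_time):
    """Per-stream-table auto_backoff: returns (effective_interval,
    new_backoff_factor). On-time cycles reset to the base interval;
    late cycles double the factor (capped — the cap is an assumed
    safeguard, not specified by the plan)."""
    if on_time:
        return base_interval, 1
    factor = min(backoff_factor * 2, 64)
    return base_interval * factor, factor

interval, factor = 30, 1
interval, factor = next_interval(30, factor, on_time=False)   # behind: 60s
interval, factor = next_interval(30, factor, on_time=False)   # behind: 120s
print(interval, factor)  # 120 4
interval, factor = next_interval(30, factor, on_time=True)    # recovered
print(interval, factor)  # 30 1
```

Resetting on the first on-time cycle keeps the degradation temporary: the stream table returns to its configured freshness as soon as it catches up.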
DVM performance trade-offs (P3) subtotal: ~4–7 weeks
Documentation Gaps (D)
| Item | Description | Effort | Status |
|---|---|---|---|
| D1 | Recursive CTE DIFFERENTIAL mode limitation. The O(n) fallback for mixed DELETE/UPDATE against a recursive CTE source is not documented in docs/SQL_REFERENCE.md or docs/DVM_OPERATORS.md. Users hitting DELETE/UPDATE-heavy workloads on recursive CTE stream tables will see unexpectedly slow refresh times with no explanation. Add a "Known Limitations" callout in both files. | ~2h | ✅ Done |
| D2 | pgt_refresh_groups catalog table undocumented. The catalog table added in the 0.8.0→0.9.0 upgrade script is not described in docs/SQL_REFERENCE.md. Even before the full A8 API lands, document the table schema, its purpose, and the manual INSERT/DELETE workflow users can use in the interim. | ~2h | ✅ Done |
v0.9.0 total: ~23–29 weeks
Exit criteria:
- AVG algebraic path implemented (SUM/COUNT auxiliary columns)
- STDDEV/VAR algebraic path implemented (sum-of-squares decomposition)
- MIN/MAX boundary case (delete-the-extremum) covered by property-based tests
- Non-decomposable fallback confirmed (group-rescan strategy)
- Auxiliary columns hidden from user queries via `__pgt_*` naming convention
- Migration path for existing aggregate stream tables tested
- Floating-point drift reset mechanism in place (periodic recompute)
- E2E integration tests for algebraic aggregate paths
- B2-1: Top-K queries (LIMIT/OFFSET/ORDER BY) support
- B2-2: LATERAL Joins support
- B2-3: View Inlining support
- B2-4: Synchronous / Transactional IVM mode
- B2-5: Cross-Source Snapshot Consistency models
- B2-6: Non-Determinism Guarding semantics implemented
- Extension upgrade path tested (`0.8.0 → 0.9.0`)
- G1 Correctness Gaps addressed (G1.1, G1.2, G1.5, G1.6)
- G5 Syntax Gaps addressed (G5.2, G5.3, G5.5, G5.6)
- G6 Test Coverage expanded (G6.1, G6.2, G6.3, G6.5)
- F15: Selective CDC Column Capture (optimize I/O by only tracking columns referenced in query lineage)
- F40: Extension Upgrade Migration Scripts (finalize versioned SQL schema migrations)
- B3-1: Delta-branch pruning for zero-change sources (skip UNION ALL branch when source has no changes)
- B3-2: Merged-delta weight aggregation — implemented in v0.10.0 (weight aggregation replaces DISTINCT ON; B3-3 property tests verify correctness)
- B3-3: Property-based correctness tests for B3-2 — implemented in v0.10.0 (6 diamond-flow E2E property tests)
- EC-03: WARNING emitted when window-in-expression query silently falls back from DIFFERENTIAL to FULL refresh mode
- A8: `pgt_refresh_groups` SQL API (`pgt_add_refresh_group`, `pgt_remove_refresh_group`, `pgt_list_refresh_groups`)
- P2-1: Recursive CTE DRed for DIFFERENTIAL mode — deferred to v0.10.0 (high risk; ChangeBuffer mode lacks old-state context for safe rederivation; recomputation fallback is correct)
- P2-2: SUM NULL-transition rescan optimization — deferred to v0.10.0 (requires auxiliary nonnull-count columns; current rescan approach is correct)
- P2-3: DISTINCT `__pgt_count` lookup scoped to O(delta) I/O per cycle
- P2-4: Materialized view sources in IMMEDIATE mode — deferred to v0.10.0 (requires external polling-change-detection wrapper; out of scope for v0.9.0)
- P3-1: Window partition O(partition_size) cost documented; heuristic downgrade implemented or explicitly deferred
- P3-2: CORR/COVAR_*/REGR_* Welford auxiliary columns — explicitly deferred to v0.10.0 (group-rescan strategy already works correctly for all regression/correlation aggregates)
- P3-3: Scalar subquery C₀ EXCEPT ALL scan gated behind inner-source stability check or explicitly deferred
- D1: Recursive CTE DIFFERENTIAL mode limitation documented in SQL_REFERENCE.md and DVM_OPERATORS.md
- D2: `pgt_refresh_groups` table schema and interim workflow documented in SQL_REFERENCE.md
- G-1: `panic!()` replaced with `pgrx::error!()` in `source_gates()` and `watermarks()` SQL functions
- G-2 (P2-5): `changed_cols` bitmask consumed in delta scan CTE — referenced-column mask filter injected
- G-3 (P2-6): LATERAL subquery inner-source scoping — deferred to v0.10.0 (requires correlation predicate extraction from raw SQL; full re-execution is correct)
- G-4 (P2-7): Delta predicate pushdown implemented (pushable predicates injected into change buffer scan CTE)
- G-5 (P3-4): Index-aware MERGE planning: `SET LOCAL enable_seqscan = off` for small deltas against large STs
- G-6 (P3-5): `auto_backoff` GUC implemented; scheduler doubles interval when stream table falls behind
v0.10.0 — DVM Hardening, Connection Pooler Compatibility, Core Refresh Optimizations & Infrastructure Prep
Status: Released (2026-03-23).
Goal: Land deferred DVM correctness and performance improvements (recursive CTE DRed, FULL OUTER JOIN aggregate fix, LATERAL scoping, Welford regression aggregates, multi-source delta merging); fix a class of post-audit DVM safety issues (SQL comment injection as FROM fragments, silent wrong aggregate results, EC-01 gap for complex join trees) and CDC correctness bugs (NULL-unsafe PK join, TRUNCATE+INSERT race, stale WAL publication after partitioning); deliver the first wave of refresh performance optimizations (index-aware MERGE, predicate pushdown, change buffer compaction, cost-based refresh strategy); enable cloud-native PgBouncer transaction-mode deployments via an opt-in compatibility mode; and complete the pre-1.0 packaging and deployment infrastructure.
Completed items (click to expand)
Connection Pooler Compatibility
In plain terms: PgBouncer is the most widely used PostgreSQL connection pooler — it sits in front of the database and reuses connections across many application threads. In its common "transaction mode" it hands a different physical connection to each transaction, which breaks anything that assumes the same connection persists between calls (session locks, prepared statements). This work introduces an opt-in compatibility mode for pg_trickle so it works correctly in cloud deployments — Supabase, Railway, Neon, and similar platforms that route through PgBouncer by default.
pg_trickle uses session-level advisory locks and PREPARE statements that are
incompatible with PgBouncer transaction-mode pooling. This section introduces an opt-in graceful degradation layer for connection pooler compatibility.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| PB1 | Replace pg_advisory_lock() with catalog row-level locking (FOR UPDATE SKIP LOCKED) | 3–4d | ✅ Done (0.10-adjustments) | PLAN_PG_BOUNCER.md |
| PB2 | Add pooler_compatibility_mode catalog column directly to pgt_stream_tables via CREATE STREAM TABLE ... WITH (...) or alter_stream_table() to bypass PREPARE statements and skip NOTIFY locally | 3–4d | ✅ Done (0.10-adjustments) | PLAN_PG_BOUNCER.md |
| PB3 | E2E validation against PgBouncer transaction-mode (Docker Compose with pooler sidecar) | 1–2d | ✅ Done (0.10-adjustments) | PLAN_EDGE_CASES.md EC-28 |
⚠️ PB1 — `SKIP LOCKED` fails silently, not safely. `pg_advisory_lock()` blocks until the lock is granted, guaranteeing mutual exclusion. `FOR UPDATE SKIP LOCKED` returns zero rows immediately if the row is already locked — meaning a second worker will simply not acquire the lock and proceed as if uncontested, potentially running a concurrent refresh on the same stream table. Before merging PB1, verify that every call site that previously relied on the blocking guarantee now explicitly handles the "lock not acquired" path (e.g. skip this cycle and retry) rather than silently proceeding. The E2E test in PB3 must include a concurrent-refresh scenario that would fail if the skip-and-proceed bug is present.
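The required call-site behavior is easy to model: a non-blocking acquire whose failure path is handled explicitly. An illustrative Python sketch (a `threading.Lock` stands in for the catalog row lock; names are hypothetical):

```python
import threading

refresh_lock = threading.Lock()   # stands in for the catalog row lock

def try_refresh(stream_table):
    """SKIP LOCKED semantics: a non-blocking acquire that returns
    immediately. The caller must treat 'not acquired' as 'skip this
    cycle and retry later', never proceed as if uncontested."""
    if not refresh_lock.acquire(blocking=False):
        return f"skipped {stream_table}: refresh already in progress"
    try:
        return f"refreshed {stream_table}"
    finally:
        refresh_lock.release()

print(try_refresh("active_orders"))   # refreshed active_orders
refresh_lock.acquire()                # simulate a concurrent holder
print(try_refresh("active_orders"))   # skipped active_orders: ...
refresh_lock.release()
```

The bug the warning describes is exactly the missing `if not ... acquire` branch: dropping it makes the second call run the refresh anyway.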
PgBouncer compatibility subtotal: ~7–10 days
DVM Correctness & Performance (deferred from v0.9.0)
In plain terms: These items were evaluated during v0.9.0 and deferred because the current implementations are correct — they just scale with data size rather than delta size in certain edge cases. All produce correct results today; this work makes them faster.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| P2-1 | Recursive CTE DRed in DIFFERENTIAL mode. DELETE/UPDATE against a recursive CTE source falls back to O(n) full recompute + diff. Implement DRed for DeltaSource::ChangeBuffer to maintain O(delta) cost. | 2–3 wk | ✅ Done (0.10-adjustments) | src/dvm/operators/recursive_cte.rs |
| P2-2 | SUM NULL-transition rescan for FULL OUTER JOIN aggregates. When SUM sits above a FULL OUTER JOIN and rows transition between matched/unmatched states, algebraic formula gives 0 instead of NULL, triggering full-group rescan. Implement targeted correction. | 1–2 wk | ✅ Done | src/dvm/operators/aggregate.rs |
| P2-4 | Materialized view sources in IMMEDIATE mode (EC-09). Implement polling-change-detection wrapper for REFRESH MATERIALIZED VIEW-sourced queries in IMMEDIATE mode. | 2–3 wk | ✅ Done | plans/PLAN_EDGE_CASES.md §EC-09 |
| P2-6 | LATERAL subquery inner-source scoped re-execution. Gate outer-table scan behind a join to inner delta rows so only correlated outer rows are re-executed, reducing O(|outer|) to O(delta). | 1–2 wk | ✅ Done | src/dvm/operators/lateral_subquery.rs |
| P3-2 | Welford auxiliary columns for CORR/COVAR/REGR_* aggregates. Implement Welford-style accumulation to reach O(1) algebraic maintenance identical to the STDDEV/VAR path. | 2–3 wk | ✅ Done | src/dvm/operators/aggregate.rs |
| B3-2 | Merged-delta weight aggregation. GROUP BY __pgt_row_id, SUM(weight) for cross-source deduplication; remove zero-weight rows. | 3–4 wk | ✅ Done | PLAN_NEW_STUFF.md §B-3 |
| B3-3 | Property-based correctness tests for simultaneous multi-source changes; diamond-flow scenarios. Hard prerequisite for B3-2. | 1–2 wk | ✅ Done | PLAN_NEW_STUFF.md §B-3 |
✅ B3-2 correctly uses weight aggregation (`GROUP BY __pgt_row_id, SUM(weight)`) instead of `DISTINCT ON`. B3-3 property-based tests verify correctness for 6 diamond-flow topologies (inner join, left join, full join, aggregate, multi-root, deep diamond).
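The B3-2 merge rule can be sketched in plain Rust. This is a minimal illustration of the Z-set-style semantics behind `GROUP BY __pgt_row_id, SUM(weight)` with zero-weight elimination, not the extension's actual code; the type names are hypothetical.

```rust
use std::collections::BTreeMap;

/// One delta from a single branch of a diamond: (content-hash row id, weight).
/// Weight +1 = insertion, -1 = deletion, as in DBSP-style Z-sets.
type Delta = Vec<(i64, i64)>;

/// Merge deltas arriving from multiple branches by summing weights per
/// __pgt_row_id and discarding rows that net to zero — the shape of
/// GROUP BY __pgt_row_id, SUM(weight) ... HAVING SUM(weight) <> 0.
fn merge_deltas(branches: &[Delta]) -> BTreeMap<i64, i64> {
    let mut merged = BTreeMap::new();
    for branch in branches {
        for &(row_id, weight) in branch {
            *merged.entry(row_id).or_insert(0) += weight;
        }
    }
    merged.retain(|_, w| *w != 0); // net-zero rows produce no change
    merged
}
```

A row inserted on one branch and deleted on another cancels out, which is exactly the case `DISTINCT ON` handled incorrectly.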
DVM deferred items subtotal: ~12–19 weeks
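The P3-2 Welford path can be illustrated with a covariance accumulator: each inserted pair updates running means and a co-moment in O(1), mirroring the auxiliary-column approach already used for STDDEV/VAR. A simplified sketch (the struct and field names are illustrative, not the extension's real auxiliary columns):

```rust
/// Welford-style accumulator for covariance: O(1) per inserted (x, y) pair.
#[derive(Default)]
struct CovarAcc {
    n: u64,
    mean_x: f64,
    mean_y: f64,
    c: f64, // co-moment: Σ (x - mean_x)(y - mean_y)
}

impl CovarAcc {
    fn insert(&mut self, x: f64, y: f64) {
        self.n += 1;
        let dx = x - self.mean_x;
        self.mean_x += dx / self.n as f64;
        self.mean_y += (y - self.mean_y) / self.n as f64;
        // dx uses the OLD mean_x, the second factor uses the UPDATED mean_y:
        // this ordering is what keeps the update numerically stable.
        self.c += dx * (y - self.mean_y);
    }

    fn covar_pop(&self) -> Option<f64> {
        (self.n > 0).then(|| self.c / self.n as f64)
    }
}
```

CORR and the REGR_* family are derived from the same sums, so one set of auxiliary counters serves the whole group.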
DVM Safety Fixes & CDC Correctness Hardening
These items were identified during a post-v0.9.0 audit of the DVM engine and CDC pipeline. P0 items produce runtime PostgreSQL syntax errors with no helpful extension-level error; P1 items produce silent wrong results. They affect uncommon query shapes, but users can reach them with no warning.
SQL Comment Injection (P0)
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| SF-1 | build_snapshot_sql catch-all returns an SQL comment as a FROM clause fragment. The _ arm of build_snapshot_sql() returns /* unsupported snapshot for <node> */ which is injected directly into JOIN SQL, producing a PostgreSQL syntax error (syntax error at or near "/") instead of a clear extension error. Affects any RecursiveCte, Except, Intersect, UnionAll, LateralSubquery, LateralFunction, ScalarSubquery, Distinct, or RecursiveSelfRef node appearing as a direct JOIN child. Replace the catch-all arm with PgTrickleError::UnsupportedQuery. | 0.5 d | ✅ Done | src/dvm/operators/join_common.rs |
| SF-2 | Explicit /* unsupported snapshot for distinct */ string in join.rs. Hardcoded variant of SF-1 for the Distinct-child case in inner-join snapshot construction. Same fix: return PgTrickleError::UnsupportedQuery. | 0.5 d | ✅ Done | src/dvm/operators/join.rs |
| SF-3 | parser.rs FROM-clause deparser fallbacks inject SQL comments. /* unsupported RangeSubselect */ and /* unsupported FROM item */ are emitted as FROM clause fragments, causing PostgreSQL syntax errors when the generated SQL is executed. Replace with PgTrickleError::UnsupportedQuery. | 0.5 d | ✅ Done | src/dvm/parser.rs |
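The shape of the SF-1/SF-3 fix is the same everywhere: a deparser fallback must surface a structured error instead of injecting an SQL comment that PostgreSQL later rejects with a bare syntax error. A simplified stand-in (the `Node` and `PgTrickleError` types here are reduced mock-ups of the extension's real types):

```rust
#[derive(Debug, PartialEq)]
enum Node {
    Table(&'static str),
    RecursiveCte,
    Distinct,
}

#[derive(Debug, PartialEq)]
enum PgTrickleError {
    UnsupportedQuery(String),
}

/// Build the snapshot FROM fragment for a plan node. Before the fix the
/// catch-all arm returned "/* unsupported snapshot for <node> */", which was
/// spliced into JOIN SQL and failed with `syntax error at or near "/"`.
fn build_snapshot_sql(node: &Node) -> Result<String, PgTrickleError> {
    match node {
        Node::Table(name) => Ok(format!("SELECT * FROM {name}")),
        other => Err(PgTrickleError::UnsupportedQuery(format!(
            "snapshot not supported for {other:?} as a direct JOIN child"
        ))),
    }
}
```

The caller now fails at planning time with an actionable message instead of handing PostgreSQL malformed SQL at refresh time.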
DVM Correctness Bugs (P1)
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| SF-4 | child_to_from_sql returns None for renamed-column Project nodes, silently skipping group rescan. When a Project with column renames (e.g. EXTRACT(year FROM orderdate) AS o_year) sits between an aggregate and its source, child_to_from_sql() returns None and the group-rescan CTE is omitted without error. Groups crossing COUNT 0→1 or MAX deletion thresholds produce permanently stale aggregate values. Distinct from tracked P2-2 (SUM/FULL OUTER JOIN specific); this affects any complex projection above an aggregate. | 1–2 wk | ✅ Done | src/dvm/operators/aggregate.rs |
| SF-5 | EC-01 fix is incomplete for right-side join subtrees with ≥3 scan nodes. use_pre_change_snapshot() applies a join_scan_count(child) <= 2 threshold to avoid cascading CTE materialization. For right-side join chains with ≥3 scan nodes (TPC-H Q7, Q8, Q9 all qualify), the original EC-01 phantom-row-after-DELETE bug is still present. The roadmap marks EC-01 as "Done" without noting this remaining boundary. Extend the fix to ≥3-scan right subtrees, or document the limitation explicitly with a test that asserts the boundary. | 2–3 wk | ✅ Done (boundary documented with 5 unit tests + DVM_OPERATORS.md limitation note) | src/dvm/operators/join_common.rs |
| SF-6 | EXCEPT __pgt_count columns not forwarded through Project nodes, causing silent wrong results. EXCEPT uses a "retain but mark invisible" design (never emits 'D' events). A Project above EXCEPT that does not propagate __pgt_count_l/__pgt_count_r prevents the MERGE step from distinguishing visible from invisible rows. Enforce count column propagation in the planner or raise PgTrickleError at planning time if a Project over Except drops these columns. | 1–2 wk | ✅ Done | src/dvm/operators/project.rs |
DVM Edge-Condition Correctness (P2)
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| SF-7 | Empty subquery_cols silently emits (SELECT NULL FROM …) as scalar subquery result. When inner column detection fails (e.g. star-expansion from a view source), scalar_col is set to "NULL" and NULL values silently propagate into the stream table with no error raised. Detect empty subquery_cols at planning time and return PgTrickleError::UnsupportedQuery. | 0.5 d | ✅ Done | src/dvm/operators/scalar_subquery.rs |
| SF-8 | Dummy row_id = 0 in lateral inner-change branch can hash-collide with a real outer row. build_inner_change_branch() emits 0::BIGINT AS __pgt_row_id as a placeholder for re-executed outer rows. Since actual row hashes span the full BIGINT range, a real outer row could hash to 0, causing the DISTINCT/MERGE step to conflate it with the dummy entry. Use a sentinel outside the hash range (e.g. (-9223372036854775808)::BIGINT, i.e. MIN(BIGINT)) or add a separate __pgt_is_inner_dummy BOOLEAN discriminator column. | 1 wk | ✅ Done (sentinel changed to i64::MIN) | src/dvm/operators/lateral_subquery.rs |
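The SF-8 collision risk is worth making concrete: real row hashes span the full `i64` range, so `0` is a reachable value, and any reachable value is an unsafe dummy. `i64::MIN` only works as a sentinel if the hash function is constrained never to emit it, which is the assumption behind the fix. A sketch (the remapping function is hypothetical):

```rust
/// Reserved dummy __pgt_row_id for re-executed outer rows (SF-8 fix).
const INNER_DUMMY_ROW_ID: i64 = i64::MIN;

/// Hypothetical row-id hash post-processing: keep the sentinel unreachable
/// by remapping a real hash that happens to land on it.
fn row_id_hash(raw: i64) -> i64 {
    if raw == INNER_DUMMY_ROW_ID { raw + 1 } else { raw }
}

fn is_inner_dummy(row_id: i64) -> bool {
    row_id == INNER_DUMMY_ROW_ID
}
```

The alternative mentioned in the table, a separate `__pgt_is_inner_dummy BOOLEAN` column, removes the reserved-value assumption entirely at the cost of a wider row.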
CDC Correctness (P1–P2)
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| SF-9 | UPDATE trigger uses = (not IS NOT DISTINCT FROM) on composite PK columns, silently dropping rows with NULL PK columns. The __pgt_new JOIN __pgt_old ON pk_a = pk_a AND pk_b = pk_b uses =, so NULL = NULL evaluates to false and those rows are silently dropped from the change buffer. The stream table permanently diverges from the source with no error. Change all PK join conditions in the UPDATE trigger to use IS NOT DISTINCT FROM. | 0.5 d | ✅ Done | src/cdc.rs |
| SF-10 | TRUNCATE marker + same-window INSERT ordering is untested; post-TRUNCATE rows may be missed. If INSERTs arrive after a TRUNCATE but before the scheduler ticks, the change buffer contains both a 'T' marker and 'I' rows. The "TRUNCATE → full refresh → discard buffer" path has no E2E test coverage for this sequencing. A race between the FULL refresh snapshot and in-flight inserts could drop post-TRUNCATE inserted rows. Add a targeted E2E test and verify atomicity of the discard-vs-snapshot sequence. | 0.5 d | ✅ Done (verified: TRUNCATE triggers full refresh which re-reads source; change buffer is discarded atomically within the same transaction) | src/cdc.rs |
| SF-11 | WAL publication goes stale after a source table is later converted to partitioned. create_publication() sets publish_via_partition_root = true only at creation time. If a source table is subsequently converted to partitioned, WAL events arrive with child-partition OIDs, causing lookup failures and a silent CDC stall for that table (no error, stream table silently freezes). Detect post-creation partitioning during publication health checks and rebuild the publication entry. | 1–2 wk | ✅ Done | src/wal_decoder.rs |
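The SF-9 fix hinges on SQL three-valued logic: `NULL = NULL` evaluates to unknown, so a plain `=` join drops any row with a NULL PK column. A sketch of how the NULL-safe trigger join condition might be assembled (the helper name is hypothetical; the generated SQL shape follows the table above):

```rust
/// Build the UPDATE-trigger join condition over composite PK columns.
/// IS NOT DISTINCT FROM treats two NULLs as equal, so rows with NULL PK
/// columns are still matched between the OLD and NEW transition tables.
fn pk_join_condition(pk_cols: &[&str]) -> String {
    pk_cols
        .iter()
        .map(|c| format!("__pgt_new.{c} IS NOT DISTINCT FROM __pgt_old.{c}"))
        .collect::<Vec<_>>()
        .join(" AND ")
}
```

With `=` the same rows silently vanish from the change buffer and the stream table permanently diverges, which is why this is a P1 despite the one-line fix.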
Operational & Documentation Gaps (P3)
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| SF-12 | DiamondSchedulePolicy::Fastest CPU multiplication is undocumented. The default policy refreshes all members of a diamond consistency group whenever any member is due. In an asymmetric diamond (B every 1s, C every 5s, both feeding D), C refreshes 5× more often than scheduled, consuming unexplained CPU. Add a cost-implication warning to CONFIGURATION.md and ARCHITECTURE.md, and explain DiamondSchedulePolicy::Slowest as the low-CPU alternative. | 0.5 d | ✅ Done | src/dag.rs · docs/CONFIGURATION.md |
| SF-13 | ROADMAP inconsistency: B-2 (Delta Predicate Pushdown) listed as ⬜ Not started in v0.10.0 but G-4/P2-7 marked completed in v0.9.0. The v0.9.0 exit criteria mark [x] G-4 (P2-7): Delta predicate pushdown implemented, yet the v0.10.0 table lists B-2 | Delta Predicate Pushdown | ⬜ Not started. If B-2 has additional scope beyond G-4 (e.g. OR-branch handling for deletions, covering index creation, benchmark targets), document that scope explicitly. If B-2 is fully covered by G-4, remove or mark it done in the v0.10.0 table to avoid double-counting effort. | 0.5 d | ✅ Done (B-2 marked as completed by G-4/P2-7) | ROADMAP.md |
DVM safety & CDC hardening subtotal: ~3–4 days (SF-1–3, SF-7, SF-9–10, SF-12–13) + ~6–10 weeks (SF-4–6, SF-8, SF-11)
Core Refresh Optimizations (Wave 2)
Read the risk analyses in PLAN_NEW_STUFF.md before implementing. Implement in this order: A-4 (no schema change), B-2, C-4, then B-4.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| A-4 | Index-Aware MERGE Planning. Planner hint injection (enable_seqscan = off for small-delta / large-target); covering index auto-creation on __pgt_row_id. No schema changes required. | 1–2 wk | ✅ Done | PLAN_NEW_STUFF.md §A-4 |
| B-2 | Delta Predicate Pushdown. Push WHERE predicates from defining query into change-buffer delta_scan CTE; OR old_col handling for deletions; 5–10× delta-row-volume reduction for selective queries. | 2–3 wk | ✅ Done (v0.9.0 as G-4/P2-7) | PLAN_NEW_STUFF.md §B-2 |
| C-4 | Change Buffer Compaction. Net-change compaction (INSERT+DELETE=no-op; UPDATE+UPDATE=single row); run when buffer exceeds pg_trickle.compact_threshold; use advisory lock to serialise with refresh. | 2–3 wk | ✅ Done | PLAN_NEW_STUFF.md §C-4 |
| B-4 | Cost-Based Refresh Strategy. Replace fixed differential_max_change_ratio with a history-driven cost model fitted on pgt_refresh_history; cold-start fallback to fixed threshold. | 2–3 wk | ✅ Done (cost model + adaptive threshold already active) | PLAN_NEW_STUFF.md §B-4 |
⚠️ C-4: The compaction DELETE must use `seq` (the sequence primary key), not `ctid`, as the stable row identifier. `ctid` changes under VACUUM and will silently delete the wrong rows. See the corrected SQL and risk analysis in PLAN_NEW_STUFF.md §C-4.
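The C-4 net-change rules can be sketched per source row. The roadmap specifies INSERT+DELETE → no-op and UPDATE+UPDATE → single row; the remaining combinations below (I+U → I, U+D → D, D+I → U) follow standard CDC compaction semantics and are an assumption of this sketch, not a statement of pg_trickle's exact behaviour:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ev {
    Insert,
    Update,
    Delete,
}

/// Collapse an ordered event run for one source row into at most one event.
/// Real compaction keys on the PK and orders by the sequence column
/// (`change_id`-style), never by ctid.
fn compact(events: &[Ev]) -> Option<Ev> {
    events.iter().copied().fold(None, |acc, ev| match (acc, ev) {
        (None, e) => Some(e),
        (Some(Ev::Insert), Ev::Update) => Some(Ev::Insert), // newest values ride the 'I'
        (Some(Ev::Insert), Ev::Delete) => None,             // net no-op
        (Some(Ev::Update), Ev::Update) => Some(Ev::Update), // single row survives
        (Some(Ev::Update), Ev::Delete) => Some(Ev::Delete),
        (Some(Ev::Delete), Ev::Insert) => Some(Ev::Update), // row existed before window
        (prev, _) => prev, // other sequences are invalid; keep first for the sketch
    })
}
```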
⚠️ A-4 — Planner hint must be transaction-scoped (`SET LOCAL`), never session-scoped (`SET`). The existing P3-4 implementation (already shipped) uses `SET LOCAL enable_seqscan = off`, which PostgreSQL automatically reverts at transaction end. Any extension of A-4 (e.g. the covering index auto-creation path) must continue to use `SET LOCAL`. Using plain `SET` instead would permanently disable seq-scans for the remainder of the session, corrupting planner behaviour for all subsequent queries in that backend.
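The B-4 threshold choice can be sketched as a blend with a cold-start guard. The 60/40 blend and the fixed-GUC fallback come from the exit criteria; the sample cutoff and function signature here are assumptions for illustration:

```rust
/// Pick the effective change-ratio threshold for differential-vs-full
/// strategy selection (B-4 sketch): blend the history-fitted cost-model
/// estimate with the rolling adaptive value, or fall back to the fixed
/// `differential_max_change_ratio` GUC when refresh history is too thin.
fn effective_change_ratio_threshold(
    history_samples: usize,
    cost_based: f64, // from the refresh-history cost model
    adaptive: f64,   // rolling adjustment
    fixed_guc: f64,  // differential_max_change_ratio
) -> f64 {
    const MIN_SAMPLES: usize = 10; // assumed cold-start cutoff
    if history_samples < MIN_SAMPLES {
        fixed_guc
    } else {
        0.6 * cost_based + 0.4 * adaptive
    }
}
```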
Core refresh optimizations subtotal: ~7–11 weeks
Scheduler & DAG Scalability
These items address scheduler CPU efficiency and DAG maintenance overhead at scale. They were identified as C-1 and C-2 in plans/performance/PLAN_NEW_STUFF.md but were not included in earlier milestones.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| G-7 | Tiered refresh scheduling (Hot/Warm/Cold/Frozen). All stream tables currently refresh at their configured interval regardless of how often they are queried. In deployments with many STs, most Cold/Frozen tables consume full scheduler CPU unnecessarily. Introduce four tiers keyed by a per-ST pgtrickle access counter (not pg_stat_user_tables, which is polluted by pg_trickle's own MERGE scans): Hot (≥10 reads/min: refresh at configured interval), Warm (1–10 reads/min: ×2 interval), Cold (<1 read/min: ×10 interval), Frozen (0 reads since last N cycles: suspend until manually promoted). A single GUC pg_trickle.tiered_scheduling (default off) gates the feature. | 3–4 wk | ✅ Done | src/scheduler.rs · plans/performance/PLAN_NEW_STUFF.md §C-1 |
| G-8 | Incremental DAG rebuild on DDL changes. Any CREATE/ALTER/DROP STREAM TABLE currently triggers a full O(V+E) re-query of all pgt_dependencies rows to rebuild the entire DAG. For deployments with 100+ stream tables this adds per-DDL latency and has a race condition: if two DDL events arrive before the scheduler ticks, only the latest pgt_id stored in shared memory may be processed. Replace with a targeted edge-delta approach: the DDL hooks write affected stream table OIDs into a pending-changes queue; the scheduler applies only those edge insertions/deletions, leaving the rest of the graph intact. | 2–3 wk | ✅ Done | src/dag.rs · src/scheduler.rs · plans/performance/PLAN_NEW_STUFF.md §C-2 |
| C2-1 | Ring-buffer DAG invalidation. Replace single pgt_id scalar in shared memory with a bounded ring buffer of affected IDs; full-rebuild fallback on overflow. Hard prerequisite for correctness of G-8 under rapid DDL changes. | 1 wk | ✅ Done | PLAN_NEW_STUFF.md §C-2 |
| C2-2 | Incremental topo-sort. Incremental topo-sort on affected subgraph only; cache sorted schedule in shared memory. | 1–2 wk | ✅ Done | PLAN_NEW_STUFF.md §C-2 |
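The G-7 tier selection reduces to a small function of the per-ST access counter. The tiers and multipliers are exactly those in the table above; the function shape is a hypothetical sketch:

```rust
/// Effective refresh interval under tiered scheduling (G-7 sketch).
/// Hot: configured interval; Warm: ×2; Cold: ×10; Frozen: suspended
/// (None) until manually promoted.
fn effective_interval_secs(reads_per_min: f64, configured: u64, frozen: bool) -> Option<u64> {
    if frozen {
        None
    } else if reads_per_min >= 10.0 {
        Some(configured) // Hot
    } else if reads_per_min >= 1.0 {
        Some(configured * 2) // Warm
    } else {
        Some(configured * 10) // Cold
    }
}
```

Keying the counter on a pgtrickle-owned access tracker rather than `pg_stat_user_tables` matters because the extension's own MERGE scans would otherwise count as reads and keep every table artificially Hot.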
⚠️ A single `pgt_id` scalar in shared memory is vulnerable to overwrite when two DDL changes arrive between scheduler ticks — use a ring buffer (C2-1) or fall back to full rebuild. See PLAN_NEW_STUFF.md §C-2 risk analysis.
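The C2-1 design can be sketched as a bounded buffer with an overflow latch. This is a single-threaded illustration of the contract (push from DDL hooks, drain from the scheduler, full rebuild on overflow); the real structure lives in PostgreSQL shared memory with its own locking:

```rust
/// Bounded invalidation buffer (C2-1 sketch): DDL hooks push affected IDs;
/// on overflow a full-rebuild flag is latched instead of dropping entries.
struct InvalidationRing {
    buf: Vec<u32>,
    cap: usize,
    overflowed: bool,
}

impl InvalidationRing {
    fn new(cap: usize) -> Self {
        Self { buf: Vec::new(), cap, overflowed: false }
    }

    fn push(&mut self, pgt_id: u32) {
        if self.buf.len() == self.cap {
            self.overflowed = true; // scheduler must fall back to full rebuild
        } else {
            self.buf.push(pgt_id);
        }
    }

    /// Scheduler tick: drain pending IDs, or signal a full rebuild.
    fn drain(&mut self) -> Result<Vec<u32>, &'static str> {
        if self.overflowed {
            self.overflowed = false;
            self.buf.clear();
            Err("overflow: full DAG rebuild required")
        } else {
            Ok(std::mem::take(&mut self.buf))
        }
    }
}
```

The key property is that overflow degrades to the old full-rebuild behaviour rather than losing an invalidation, which is what makes G-8 safe under rapid DDL.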
Scheduler & DAG scalability subtotal: ~7–10 weeks
"No Surprises" — Principle of Least Astonishment
In plain terms: pg_trickle does a lot of work automatically — rewriting queries, managing auxiliary columns, transitioning CDC modes, falling back between refresh strategies. Most of this is exactly what users want, but several behaviors happen silently where a brief notification would prevent confusion. This section adds targeted warnings, notices, and documentation so that every implicit behavior is surfaced to the user at the moment it matters.
| Item | Description | Effort | Status | Ref |
|---|---|---|---|---|
| NS-1 | Warn on ORDER BY without LIMIT. Emit WARNING at create_stream_table / alter_stream_table time when query contains ORDER BY without LIMIT: "ORDER BY without LIMIT has no effect on stream tables — storage row order is undefined." | 2–4h | ✅ Done | src/api.rs |
| NS-2 | Warn on append_only auto-revert. Upgrade the info!() to warning!() when append_only is automatically reverted due to DELETE/UPDATE. Add a pgtrickle_alert NOTIFY with category append_only_reverted. | 1–2h | ✅ Done | src/refresh.rs |
| NS-3 | Promote cleanup errors after consecutive failures. Track consecutive drain_pending_cleanups() error count in thread-local state; promote from debug1 to WARNING after 3 consecutive failures for the same source OID. | 2–4h | ✅ Done | src/refresh.rs |
| NS-4 | Document __pgt_* auxiliary columns in SQL_REFERENCE. Add a dedicated subsection listing all implicit columns (__pgt_row_id, __pgt_count, __pgt_sum, __pgt_sum2, __pgt_nonnull, __pgt_covar_*, __pgt_count_l, __pgt_count_r) with the aggregate functions that trigger each. | 2–4h | ✅ Done | docs/SQL_REFERENCE.md |
| NS-5 | NOTICE on diamond detection with diamond_consistency='none'. When create_stream_table detects a diamond dependency and the user hasn't explicitly set diamond_consistency, emit NOTICE: "Diamond dependency detected — consider setting diamond_consistency='atomic' for consistent cross-branch reads." | 2–4h | ✅ Done | src/api.rs · src/dag.rs |
| NS-6 | NOTICE on differential→full fallback. Upgrade the existing info!() in adaptive fallback to NOTICE so it appears at default client_min_messages level. | 0.5–1h | ✅ Done | src/refresh.rs |
| NS-7 | NOTICE on isolated CALCULATED schedule. When create_stream_table creates an ST with schedule='calculated' that has no downstream dependents, emit NOTICE: "No downstream dependents found — schedule will fall back to pg_trickle.default_schedule_seconds (currently Ns)." | 1–2h | ✅ Done | src/api.rs |
"No Surprises" subtotal: ~10–20 hours
v0.10.0 total: ~58–84 hours + ~32–50 weeks DVM, refresh & safety work + ~10–20 hours "No Surprises"
Exit criteria:
- `ALTER EXTENSION pg_trickle UPDATE` tested (0.9.0 → 0.10.0) — upgrade script verified complete via `scripts/check_upgrade_completeness.sh`; adds `spooler_compatibility_mode`, `refresh_tier`, `pgt_refresh_groups`, and updated API function signatures
- All public documentation current and reviewed — SQL_REFERENCE.md, CONFIGURATION.md, CHANGELOG.md, and ROADMAP.md updated for all v0.10.0 features
- G-7: Tiered scheduling (Hot/Warm/Cold/Frozen) implemented; `pg_trickle.tiered_scheduling` GUC gates the feature
- G-8: Incremental DAG rebuild implemented; DDL-triggered edge-delta replaces full O(V+E) re-query
- C2-1: Ring-buffer DAG invalidation safe under rapid consecutive DDL changes
- C2-2: Incremental topo-sort caches sorted schedule; verified by property-based test
- P2-1: Recursive CTE DRed for DIFFERENTIAL mode (O(delta) instead of O(n) recompute) — implemented in 0.10-adjustments
- P2-2: SUM NULL-transition correction for FULL OUTER JOIN aggregates — implemented; `__pgt_aux_nonnull_*` auxiliary column eliminates full-group rescan
- P2-4: Materialized view sources supported in IMMEDIATE mode
- P2-6: LATERAL subquery inner-source scoped re-execution (O(delta) instead of O(|outer|))
- P3-2: CORR/`COVAR_*`/`REGR_*` Welford auxiliary columns for O(1) algebraic maintenance
- B3-2: Merged-delta weight aggregation passes property-based correctness proofs — implemented; replaces DISTINCT ON with GROUP BY + SUM(weight) + HAVING
- B3-3: Property-based tests for simultaneous multi-source changes — implemented; 6 diamond-flow E2E property tests
- A-4: Covering index auto-created on `__pgt_row_id` with INCLUDE clause for ≤8-column schemas; planner hint prevents seq-scan on small delta; `SET LOCAL` confirmed (not `SET`) so hint reverts at transaction end
- B-2: Predicate pushdown reduces delta volume for selective queries — `bench_b2_predicate_pushdown` in `e2e_bench_tests.rs` measures median filtered vs unfiltered refresh time; asserts filtered ≤3× unfiltered (in practice typically faster)
- C-4: Compaction uses `change_id` PK (not `ctid`); correct under concurrent VACUUM; serialised with advisory lock; net-zero elimination + intermediate row collapse
- B-4: Cost model self-calibrates from refresh history (`estimate_cost_based_threshold` + `compute_adaptive_threshold` with 60/40 blend); cold-start fallback to fixed GUC threshold
- PB1: Concurrent-refresh scenario covered by `test_pb1_concurrent_refresh_skip_locked_no_corruption` in `e2e_concurrent_tests.rs`; two concurrent `refresh_stream_table()` calls verified to produce correct data without corruption; `SKIP LOCKED` path confirmed non-blocking
- SF-1: `build_snapshot_sql` catch-all arm uses `pgrx::error!()` instead of injecting an SQL comment as a FROM fragment
- SF-2: Explicit `/* unsupported snapshot for distinct */` string replaced with `PgTrickleError::UnsupportedQuery` in join.rs
- SF-3: `parser.rs` FROM-clause deparser fallbacks replaced with `PgTrickleError::UnsupportedQuery`
- SF-4: `child_to_from_sql` wraps Project in subquery with projected expressions; rescan CTE correctly resolves aliased column names
- SF-5: EC-01 ≤2-scan boundary documented with 5 unit tests asserting the boundary + DVM_OPERATORS.md limitation note explaining the CTE materialization trade-off
- SF-6: `diff_project` forwards `__pgt_count_l`/`__pgt_count_r` through projection when present in child result
- SF-7: Empty `subquery_cols` in scalar subquery returns `PgTrickleError::UnsupportedQuery` rather than emitting `NULL`
- SF-8: Lateral inner-change branch uses `i64::MIN` sentinel instead of `0::BIGINT` as dummy `__pgt_row_id`
- SF-9: UPDATE trigger PK join uses `IS NOT DISTINCT FROM` for all PK columns; NULL-PK rows captured correctly
- SF-10: TRUNCATE + same-window INSERT E2E test passes; post-TRUNCATE rows not dropped
- SF-11: `check_publication_health()` detects post-creation partitioning and rebuilds publication with `publish_via_partition_root = true`
- SF-12: `DiamondSchedulePolicy::Fastest` cost-multiplication documented in CONFIGURATION.md with `Slowest` explanation
- SF-13: B-2 / G-4 roadmap inconsistency resolved; entry reflects actual remaining scope (or marked done if fully completed)
- NS-1: `ORDER BY` without `LIMIT` emits `WARNING` at creation time; E2E test verifies message
- NS-2: `append_only` auto-revert uses `WARNING` (not `INFO`) and sends `pgtrickle_alert` NOTIFY
- NS-3: `drain_pending_cleanups` promotes to `WARNING` after 3 consecutive failures per source OID
- NS-4: `__pgt_*` auxiliary columns documented in SQL_REFERENCE with triggering aggregate functions
- NS-5: Diamond detection with `diamond_consistency='none'` emits `NOTICE` suggesting `'atomic'`
- NS-6: Differential→full adaptive fallback uses `NOTICE` (not `INFO`)
- NS-7: Isolated `CALCULATED` schedule emits `NOTICE` with effective fallback interval
- NS-8: `diamond_consistency` default changed to `'atomic'`; catalog DDL, API code comments, and all documentation updated to match actual runtime behavior (API already resolved `NULL` to `Atomic`)
v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana Observability, Safety Hardening & Correctness
Status: Released 2026-03-26. See CHANGELOG.md §0.11.0 for the full feature list.
Highlights: 34× lower latency via event-driven scheduler wake · incremental ST-to-ST refresh chains · declaratively partitioned stream tables (100× I/O reduction) · ready-to-use Prometheus + Grafana monitoring stack · FUSE circuit breaker · VARBIT changed-column bitmask (no more 63-column cap) · per-database worker quotas · DAG scheduling performance improvements (fused chains, adaptive polling, amplification detection) · TPC-H correctness gate in CI · safer production defaults.
Completed items (click to expand)
Partitioned Stream Tables — Storage (A-1)
In plain terms: A 10M-row stream table partitioned into 100 ranges means only the 2–3 partitions that actually received changes are touched by MERGE — reducing the MERGE scan from 10M rows to ~100K. The partition key must be a user-visible column and the refresh path must inject a verified range predicate.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A1-1 | DDL: CREATE STREAM TABLE … PARTITION BY declaration; catalog column for partition key | 1–2 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-2 | Delta inspection: extract min/max of partition key from delta CTE per scheduler tick | 1 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-3 | MERGE rewrite: inject validated partition-key range predicate or issue per-partition MERGEs via Rust loop | 2–3 wk | PLAN_NEW_STUFF.md §A-1 |
| A1-4 | E2E benchmarks: 10M-row partitioned ST, 0.1% change rate concentrated in 2–3 partitions | 1 wk | PLAN_NEW_STUFF.md §A-1 |
⚠️ MERGE joins on `__pgt_row_id` (a content hash unrelated to the partition key) — partition pruning will not activate automatically. A predicate injection step is mandatory. See PLAN_NEW_STUFF.md §A-1 risk analysis before starting.
Retraction consideration (A-1): The 5–7 week effort estimate is optimistic. The core assumption — that partition pruning can be activated via a `WHERE partition_key BETWEEN ? AND ?` predicate — requires the partition key to be a tracked catalog column (not currently the case) and a verified range derivation from the delta. The alternative (per-partition MERGE loop in Rust) is architecturally sound but requires significant catalog and refresh-path changes. A design spike (2–4 days) producing a written implementation plan must be completed before A1-1 is started. The milestone is at P3 / Very High risk and should not block the 1.0 release if the design spike reveals additional complexity.
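The A1-2/A1-3 predicate derivation can be sketched as: scan the delta for the partition key's min/max, then inject that range into the MERGE so PostgreSQL can prune. This assumes the partition key is a tracked catalog column, which is precisely the open design question noted in the retraction consideration; the function below is a hypothetical illustration with an integer key:

```rust
/// Derive a partition-pruning predicate from the delta's key range.
/// Returns None for an empty delta (nothing to merge, nothing to prune).
fn partition_range_predicate(key_col: &str, delta_keys: &[i64]) -> Option<String> {
    let min = delta_keys.iter().min()?;
    let max = delta_keys.iter().max()?;
    Some(format!("{key_col} BETWEEN {min} AND {max}"))
}
```

A sparse delta touching two distant partitions still produces one wide range; the per-partition MERGE loop alternative avoids that by issuing one MERGE per affected partition instead.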
Partitioned stream tables subtotal: ~5–7 weeks
Multi-Database Scheduler Isolation (C-3)
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-3 | Per-database worker quota GUC (`pg_trickle.per_database_worker_quota`); priority ordering (IMMEDIATE > Hot > Warm > Cold); burst capacity up to 150% when other DBs are under budget. `compute_per_db_quota()` helper with burst threshold at 80% cluster utilisation; `sort_ready_queue_by_priority()` dispatches `ImmediateClosure` first; 7 unit tests. | — | src/scheduler.rs |
Multi-DB isolation subtotal: ✅ Complete
Prometheus & Grafana Observability
In plain terms: Most teams already run Prometheus and Grafana to monitor their databases. This ships ready-to-use configuration files — no custom code, no extension changes — that plug into the standard `postgres_exporter` and light up a Grafana dashboard showing refresh latency, staleness, error rates, CDC lag, and per-stream-table detail. Also includes Prometheus alerting rules so you get paged when a stream table goes stale or starts error-looping. A Docker Compose file lets you try the full observability stack with a single `docker compose up`.
Zero-code monitoring integration. All config files live in a new `monitoring/` directory in the main repo (or a separate `pgtrickle-monitoring` repo). Queries use existing views (`pg_stat_stream_tables`, `check_cdc_health()`, `quick_health`).
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | `monitoring/prometheus/pg_trickle_queries.yml` exports 14 metrics (per-table refresh stats, health summary, CDC buffer sizes, status counts, recent error rate) via `postgres_exporter`. | — | monitoring/prometheus/pg_trickle_queries.yml |
| — | `monitoring/prometheus/alerts.yml` has 8 alerting rules: staleness > 5 min, ≥3 consecutive failures, table SUSPENDED, CDC buffer > 1 GB, scheduler down, high refresh duration, cluster WARNING/CRITICAL. | — | monitoring/prometheus/alerts.yml |
| — | `monitoring/grafana/dashboards/pg_trickle_overview.json` has 6 sections: cluster overview stat panels, refresh performance time-series, staleness heatmap, CDC health graphs, per-table drill-down table with schema/table variable filters. | — | monitoring/grafana/dashboards/pg_trickle_overview.json |
| — | `monitoring/docker-compose.yml` spins up PostgreSQL + pg_trickle + postgres_exporter + Prometheus + Grafana with pre-wired config and demo seed data (`monitoring/init/01_demo.sql`). `docker compose up` → Grafana at :3000. | — | monitoring/docker-compose.yml |
Observability subtotal: ~12 hours ✅
Default Tuning & Safety Defaults (from REPORT_OVERALL_STATUS.md)
These four changes flip conservative defaults to the behavior that is safe and correct in production. All underlying features are implemented and tested; only the default values change. Each keeps the original GUC so operators can revert if needed.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DEF-1 | `parallel_refresh_mode` default to `'on'`. `normalize_parallel_refresh_mode` maps None/unknown → `On`; unit test renamed to `defaults_to_on`. | — | REPORT_OVERALL_STATUS.md §R1 |
| DEF-2 | `auto_backoff` default to `true`. Default flipped to `true`; trigger threshold raised to 95%, cap reduced to 8×, log level raised to WARNING. CONFIGURATION.md updated. | 1–2h | REPORT_OVERALL_STATUS.md §R10 |
| — | `left_snapshot_filtered` pre-filter with `WHERE left_key IN (SELECT DISTINCT right_key FROM delta)` was already present in semi_join.rs. | — | src/dvm/operators/semi_join.rs |
| — | `INVALIDATION_RING_CAPACITY` raised to 128 in shmem.rs. | — | REPORT_OVERALL_STATUS.md §R9 |
| — | `block_source_ddl` default to `true`. Default flipped to `true`; both error messages in hooks.rs include a step-by-step escape-hatch procedure. | — | REPORT_OVERALL_STATUS.md §R12 |
Default tuning subtotal: ~14–21 hours
Safety & Resilience Hardening (Must-Ship)
In plain terms: The background worker should never silently hang or leave a stream table in an undefined state when an internal operation fails. These items replace `panic!`/`unwrap()` in code paths reachable from the background worker with structured errors and graceful recovery.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | `scheduler.rs`, `refresh.rs`, `hooks.rs`: no `panic!`/`unwrap()` outside `#[cfg(test)]`. `check_skip_needed` now logs WARNING on SPI error with table name and error details. Audit finding documented in comment. | — | src/scheduler.rs |
| — | `tests/e2e_safety_tests.rs`: (1) column drop triggers `UpstreamSchemaChanged`, verifies scheduler stays alive and other STs continue; (2) source table drop, same verification. | — | tests/e2e_safety_tests.rs |
Safety hardening subtotal: ~7–12 hours
Correctness & Code Quality Quick Wins (from REPORT_OVERALL_STATUS.md §12–§15)
In plain terms: Six self-contained improvements identified in the deep gap analysis. Each takes under a day and substantially reduces silent failure modes, operator confusion, and diagnostic friction.
Quick Fixes (< 1 hour each)
| Item | Description | Effort | Ref |
|---|---|---|---|
| QF-1 | Stray `println!` output. `println!` replaced with `pgrx::log!()` guarded by new `pg_trickle.log_merge_sql` GUC (default off). | — | src/refresh.rs |
| QF-2 | Log level in `api.rs` raised from `pgrx::info!()` to `pgrx::warning!()`. | — | plans/performance/REPORT_OVERALL_STATUS.md §12 |
| QF-3 | Warn when `append_only` auto-reverts. `pgrx::warning!()` + `emit_alert(AppendOnlyReverted)` already present in refresh.rs. | — | plans/performance/REPORT_OVERALL_STATUS.md §15 |
| QF-4 | Document `unwrap()` invariants. `// INVARIANT:` comments added at four `unwrap()` sites in dvm/parser.rs (after `is_empty()` guard, `len()==1` guards, and non-empty `Err` return). | — | src/dvm/parser.rs |
Quick-fix subtotal: ~3–4 hours
Effective Refresh Mode Tracking (G12-ERM)
In plain terms: When a stream table is configured as `AUTO`, operators currently have no way to discover which mode is actually being used at runtime without reading warning logs. Storing the resolved mode in the catalog and exposing a diagnostic function closes this observability gap.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | `effective_refresh_mode` column added to `pgt_stream_tables`; extension upgrade script pg_trickle--0.10.0--0.11.0.sql created. | — | src/catalog.rs |
| — | `explain_refresh_mode(name TEXT)` SQL function. `pgtrickle.explain_refresh_mode()` returns configured mode, effective mode, and downgrade reason. | — | src/api.rs |
Effective refresh mode subtotal: ~4–7 hours
Correctness Guards (G12-2, G12-AGG)
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | `validate_topk_metadata()` re-parses the reconstructed full query on each TopK refresh; `validate_topk_metadata_fields()` validates stored fields (pure logic, unit-testable). Falls back to FULL + WARNING on mismatch. 7 unit tests. | — | src/refresh.rs |
| — | `classify_agg_strategy()` classifies each aggregate as ALGEBRAIC_INVERTIBLE / ALGEBRAIC_VIA_AUX / SEMI_ALGEBRAIC / GROUP_RESCAN. Warning emitted at `create_stream_table` time for DIFFERENTIAL + group-rescan aggs. Strategy exposed in `explain_st()` as `aggregate_strategies` JSON. 18 unit tests. | — | src/dvm/parser.rs |
Correctness guards subtotal: ✅ Complete
Parameter & Error Hardening (G15-PV, G13-EH)
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | (a) `cdc_mode='wal'` + `refresh_mode='IMMEDIATE'` rejection was already present; (b) `diamond_schedule_policy='slowest'` + `diamond_consistency='none'` now rejected in `create_stream_table_impl` and `alter_stream_table_impl` with structured error. | — | src/api.rs |
| — | `raise_error_with_context()` helper in api.rs uses `ErrorReport::new().set_detail().set_hint()` for UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, and QueryParseError; all 8 API-boundary error sites updated. | — | src/api.rs |
Parameter & error hardening subtotal: ~6–12 hours
Testing: EC-01 Boundary Regression (G17-EC01B-NEG)
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | Regression tests in `join_common.rs` covering 3-way join, 4-way join, right-subtree ≥3 scans, and 2-scan boundary. `// TODO: Remove when EC01B-1/EC01B-2 fixed in v0.12.0` | — | src/dvm/operators/join_common.rs |
EC-01 boundary regression subtotal: ✅ Complete
Documentation Quick Wins (G16-GS, G16-SM, G16-MQR, G15-GUC)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G16-GS | Restructure GETTING_STARTED.md with progressive complexity. Five chapters: (1) Hello World — single-table ST with no join; (2) Multi-table join; (3) Scheduling & backpressure; (4) Monitoring — 5 key functions; (5) Advanced — FUSE, wide bitmask, partitions. Remove the current flat wall-of-SQL structure. ✅ Done in v0.11.0 Phase 11 — 5-chapter structure implemented; Chapter 1 Hello World example added; Chapter 5 Advanced Topics adds inline FUSE, partitioning, IMMEDIATE, and multi-tenant quota examples. | — | docs/GETTING_STARTED.md |
| G16-SM | Support matrix in `docs/DVM_OPERATORS.md` covering all operators × FULL/DIFFERENTIAL/IMMEDIATE modes with caveat footnotes. | — | docs/DVM_OPERATORS.md |
| G16-MQR | Monitoring quick reference in `docs/GETTING_STARTED.md` with `pgt_status()`, `health_check()`, `change_buffer_sizes()`, `dependency_tree()`, `fuse_status()`, Prometheus/Grafana stack, key metrics table, and alert summary. | — | docs/GETTING_STARTED.md |
| G15-GUC | GUC reference in `docs/CONFIGURATION.md`. | — | docs/CONFIGURATION.md |
Documentation subtotal: ~2–3 days
Correctness quick-wins & documentation subtotal: ~1–2 days code + ~2–3 days docs
Should-Ship Additions
Wider Changed-Column Bitmask (>63 columns)
In plain terms: Stream tables built on source tables with more than 63 columns fall back silently to tracking every column on every UPDATE, losing all CDC selectivity. Extending the `changed_cols` field from a `BIGINT` to a `BYTEA` vector removes this cliff without breaking existing deployments.
| Item | Description | Effort | Ref |
|---|---|---|---|
| WB-1 | Extend the CDC trigger changed_cols column from BIGINT to BYTEA; update bitmask encoding/decoding in cdc.rs; add schema migration for existing change buffer tables (tables with <64 columns are unaffected at the data level). | 1–2 wk | REPORT_OVERALL_STATUS.md §R13 |
| WB-2 | E2E test: wide (>63 column) source table; verify only referenced columns trigger delta propagation; benchmark UPDATE selectivity before/after. | 2–4h | tests/e2e_cdc_tests.rs |
Wider bitmask subtotal: ~1–2 weeks + ~4h testing
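To make the WB-1 cliff concrete, here is a minimal sketch of the difference between a 63-bit BIGINT bitmask and a growable byte-vector (BYTEA-style) bitmask. The helper names are illustrative, not pg_trickle's actual `cdc.rs` API.

```rust
/// Set the changed-bit for `col_idx` in a growable byte-vector bitmask.
/// Unlike a single u64 (BIGINT), this has no 63-column ceiling.
fn set_changed_col(mask: &mut Vec<u8>, col_idx: usize) {
    let byte = col_idx / 8;
    if byte >= mask.len() {
        mask.resize(byte + 1, 0); // grow on demand for wide tables
    }
    mask[byte] |= 1 << (col_idx % 8);
}

/// Test whether `col_idx` is marked as changed.
fn col_changed(mask: &[u8], col_idx: usize) -> bool {
    let byte = col_idx / 8;
    byte < mask.len() && mask[byte] & (1 << (col_idx % 8)) != 0
}

fn main() {
    let mut mask: Vec<u8> = Vec::new();
    set_changed_col(&mut mask, 5);
    set_changed_col(&mut mask, 100); // beyond the BIGINT limit of 63
    assert!(col_changed(&mask, 100));
    assert!(!col_changed(&mask, 64));
}
```

Tables with fewer than 64 columns still fit in the first 8 bytes, which is why existing change buffers need no data-level migration.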
Fuse — Anomalous Change Detection
In plain terms: A circuit breaker that stops a stream table from processing an unexpectedly large batch of changes (runaway script, mass delete, data migration) without operator review. A blown fuse halts refresh and emits a pgtrickle_alert NOTIFY; reset_fuse() resumes with a chosen recovery action (apply, reinitialize, or skip_changes).
| Item | Description | Effort | Ref |
|---|---|---|---|
| FUSE-1 | Catalog: fuse state columns on pgt_stream_tables (fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity, blown_at, blow_reason) | 1–2h | PLAN_FUSE.md |
| FUSE-2 | alter_stream_table() new params: fuse, fuse_ceiling, fuse_sensitivity | 1h | PLAN_FUSE.md |
| FUSE-3 | reset_fuse(name, action => 'apply'|'reinitialize'|'skip_changes') SQL function | 1h | PLAN_FUSE.md |
| FUSE-4 | fuse_status() introspection function | 1h | PLAN_FUSE.md |
| FUSE-5 | Scheduler pre-check: count change buffer rows; evaluate threshold; blow fuse + NOTIFY if exceeded | 2–3h | PLAN_FUSE.md |
| FUSE-6 | E2E tests: normal baseline, spike → blow, reset (apply/reinitialize/skip_changes), diamond/DAG interaction | 4–6h | PLAN_FUSE.md |
Fuse subtotal: ~10–14 hours — ✅ Complete
External Correctness Gate (TS1 or TS2)
In plain terms: Run an independent public query corpus through pg_trickle's DIFFERENTIAL mode and assert the results match a vanilla PostgreSQL execution. This catches blind spots that the extension's own test suite cannot, and provides an objective correctness baseline before v1.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| TS1 | sqllogictest suite. Run the PostgreSQL sqllogictest suite through pg_trickle DIFFERENTIAL mode; gate CI on zero correctness mismatches. Preferred choice: broadest query coverage. | 2–3d | PLAN_TESTING_GAPS.md §J |
| TS2 | JOB (Join Order Benchmark). Correctness baseline and refresh latency profiling on realistic multi-join analytical queries. Alternative if sqllogictest setup is too costly. | 1–2d | PLAN_TESTING_GAPS.md §J |
Deliver one of TS1 or TS2; whichever is completed first meets the exit criterion.
External correctness gate subtotal: ~1–3 days
Differential ST-to-ST Refresh (✅ Done)
In plain terms: When stream table B's defining query reads from stream table A, pg_trickle currently forces a FULL refresh of B every time A updates — re-executing B's entire query even when only a handful of rows changed. This feature gives ST-to-ST dependencies the same CDC change buffer that base tables already have, so B refreshes differentially (applying only the delta). Crucially, even when A itself does a FULL refresh, a pre/post snapshot diff is captured so B still receives a small I/D delta rather than cascading FULL through the chain.
| Item | Description | Status | Ref |
|---|---|---|---|
| ST-ST-1 | Change buffer infrastructure. create_st_change_buffer_table() / drop_st_change_buffer_table() in cdc.rs; lifecycle hooks in api.rs; idempotent ensure_st_change_buffer() | ✅ Done | PLAN_ST_TO_ST.md §Phase 1 |
| ST-ST-2 | Delta capture — DIFFERENTIAL path. Force explicit DML when ST has downstream consumers; capture delta from __pgt_delta_{id} to changes_pgt_{id} | ✅ Done | PLAN_ST_TO_ST.md §Phase 2 |
| ST-ST-3 | Delta capture — FULL path. Pre/post snapshot diff writes I/D pairs to changes_pgt_{id}; eliminates cascading FULL | ✅ Done | PLAN_ST_TO_ST.md §7 |
| ST-ST-4 | DVM scan operator for ST sources. Read from changes_pgt_{id}; pgt_-prefixed LSN tokens; extended frontier and placeholder resolver | ✅ Done | PLAN_ST_TO_ST.md §Phase 3 |
| ST-ST-5 | Scheduler integration. Buffer-based change detection in has_stream_table_source_changes(); removed FULL override; frontier augmented with ST source positions | ✅ Done | PLAN_ST_TO_ST.md §Phase 4 |
| ST-ST-6 | Cleanup & lifecycle. cleanup_st_change_buffers_by_frontier() for ST buffers; removed prewarm skip for ST sources; ST buffer cleanup in both differential and full refresh paths | ✅ Done | PLAN_ST_TO_ST.md §Phase 5–6 |
ST-to-ST differential subtotal: ~4.5–6.5 weeks
Adaptive/Event-Driven Scheduler Wake (Must-Ship)
In plain terms: The scheduler currently wakes on a fixed 1-second timer even when nothing has changed. This adds event-driven wake: CDC triggers notify the scheduler immediately when changes arrive. Median end-to-end latency drops from ~515 ms to ~15 ms for low-volume workloads — a 34× improvement. This is a must-ship item because low latency is a primary project goal.
| Item | Description | Effort | Ref |
|---|---|---|---|
| WAKE-1 | pg_notify('pgtrickle_wake', '') after each change buffer INSERT; scheduler issues LISTEN pgtrickle_wake at startup; 10 ms debounce coalesces rapid notifications; poll fallback preserved. New GUCs: event_driven_wake (default true), wake_debounce_ms (default 10). E2E tests in tests/e2e_wake_tests.rs. | — | REPORT_OVERALL_STATUS.md §R16 |
Event-driven wake subtotal: ✅ Complete
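The debounce described above can be sketched as a small pure-logic helper: a notification wakes the scheduler only if the previous wake is older than the debounce window. `Debouncer` and its API are illustrative names for this sketch, not pg_trickle's actual types.

```rust
use std::time::{Duration, Instant};

/// Coalesce rapid wake notifications: at most one wake per debounce window.
struct Debouncer {
    window: Duration,
    last_wake: Option<Instant>,
}

impl Debouncer {
    fn new(window_ms: u64) -> Self {
        Self { window: Duration::from_millis(window_ms), last_wake: None }
    }

    /// Returns true if this notification should wake the scheduler now;
    /// false if it is coalesced into an already-pending wake.
    fn should_wake(&mut self, now: Instant) -> bool {
        match self.last_wake {
            Some(prev) if now.duration_since(prev) < self.window => false,
            _ => {
                self.last_wake = Some(now);
                true
            }
        }
    }
}

fn main() {
    let mut d = Debouncer::new(10); // 10 ms window, the documented default
    let t0 = Instant::now();
    assert!(d.should_wake(t0));                               // first NOTIFY wakes
    assert!(!d.should_wake(t0 + Duration::from_millis(5)));   // coalesced
    assert!(d.should_wake(t0 + Duration::from_millis(15)));   // window elapsed
}
```

The poll fallback stays in place so a lost notification degrades to the old fixed-interval behavior rather than a stall.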
Stretch Goals (if capacity allows after Must-Ship)
| Item | Description | Effort | Ref |
|---|---|---|---|
| STRETCH-1 | — | 2–4d | PLAN_PARTITIONING_SPIKE.md |
| A1-1 | CREATE STREAM TABLE … PARTITION BY; st_partition_key catalog column. partition_by parameter added to all three create_stream_table* functions; st_partition_key TEXT column in catalog; validate_partition_key() validates column exists in output; build_create_table_sql emits PARTITION BY RANGE (key); setup_storage_table creates default catch-all partition and non-unique __pgt_row_id index. | 1–2 wk | PLAN_PARTITIONING_SPIKE.md |
| A1-2 | extract_partition_range() in refresh.rs runs SELECT MIN/MAX(key)::text on the resolved delta SQL; returns None on empty delta (MERGE skipped). | 1 wk | PLAN_PARTITIONING_SPIKE.md §8 |
| A1-3 | inject_partition_predicate() replaces __PGT_PART_PRED__ placeholder in MERGE ON clause with AND st."key" BETWEEN 'min' AND 'max'; CachedMergeTemplate stores delta_sql_template; D-2 prepared statements disabled for partitioned STs. | 2–3 wk | PLAN_PARTITIONING_SPIKE.md §8 |
| A1-4 | EXPLAIN (ANALYZE, BUFFERS) partition-scan verification. tests/e2e_partition_tests.rs covering: initial populate, differential inserts, updates/deletes, empty-delta fast path, EXPLAIN plan verification, invalid partition key rejection; added to light-E2E allowlist. | 1 wk | PLAN_PARTITIONING_SPIKE.md §9 |
Stretch subtotal: STRETCH-1 + A1-1 + A1-2 + A1-3 + A1-4 ✅ All complete
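The A1-2/A1-3 interplay can be sketched as a single placeholder substitution: when the delta is non-empty, the MIN/MAX range replaces `__PGT_PART_PRED__` in the cached MERGE template; when it is empty, no SQL is produced and the MERGE is skipped. This is a simplified sketch under assumed signatures, not the real `refresh.rs` implementation (which, among other things, handles quoting and type casts).

```rust
/// Inject a partition-key range predicate into a MERGE template.
/// `range` is the (min, max) text bounds extracted from the delta;
/// `None` models an empty delta, for which the caller skips the MERGE.
fn inject_partition_predicate(
    template: &str,
    key: &str,
    range: Option<(&str, &str)>,
) -> Option<String> {
    const PLACEHOLDER: &str = "__PGT_PART_PRED__";
    let (min, max) = range?; // empty delta → no MERGE at all
    Some(template.replace(
        PLACEHOLDER,
        &format!("AND st.\"{key}\" BETWEEN '{min}' AND '{max}'"),
    ))
}

fn main() {
    let t = "MERGE INTO st USING d ON st.id = d.id __PGT_PART_PRED__";
    let sql = inject_partition_predicate(t, "created_at",
        Some(("2026-01-01", "2026-01-31"))).unwrap();
    assert!(sql.contains("BETWEEN '2026-01-01' AND '2026-01-31'"));
    assert!(inject_partition_predicate(t, "created_at", None).is_none());
}
```

Because the predicate is constant-foldable text, the planner can prune partitions outside the delta's key range, which is where the ~100× I/O reduction in the A1-4 benchmark comes from.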
DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)
In plain terms: Now that ST-to-ST differential refresh eliminates the "every hop is FULL" bottleneck, the next performance frontier is reducing per-hop overhead and exploiting DAG structure more aggressively. These items target the scheduling and dispatch layer — not the DVM engine — and collectively can reduce end-to-end propagation latency by 30–50% for heterogeneous DAGs.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DAG-1 | Intra-tick pipelining: remaining_upstreams tracking with immediate downstream readiness propagation. No level barrier exists. 3 validation tests. | 2–3 wk | PLAN_DAG_PERFORMANCE.md §8.1 |
| DAG-2 | Adaptive poll interval; WaitLatch with shared-memory completion flags. compute_adaptive_poll_ms() pure-logic helper with exponential backoff (20ms → 200ms); ParallelDispatchState tracks adaptive_poll_ms + completions_this_tick; resets to 20ms on worker completion; 8 unit tests. | 1–2 wk | PLAN_DAG_PERFORMANCE.md §8.2 |
| DAG-3 | Delta amplification detection via pgt_refresh_history. When a join ST amplifies delta beyond a configurable threshold (e.g., output > 100× input), emit a performance WARNING and optionally fall back to FULL for that hop. Expose amplification metrics in explain_st(). pg_trickle.delta_amplification_threshold GUC (default 100×); compute_amplification_ratio + should_warn_amplification pure-logic helpers; WARNING emitted after MERGE with ratio, counts, and tuning hint; explain_st() exposes amplification_stats JSON from last 20 DIFFERENTIAL refreshes; 15 unit tests. | 3–5d | PLAN_DAG_PERFORMANCE.md §8.4 |
| DAG-4 | ST buffer bypass for single-consumer CALCULATED chains, skipping the changes_pgt_ buffer table. Eliminates 2× SPI DML per hop (~20 ms savings per hop for 10K-row deltas). FusedChain execution unit kind; find_fusable_chains() pure-logic detection; capture_delta_to_bypass_table() writes to temp table; DiffContext.st_bypass_tables threads bypass through DVM scan; delta SQL cache bypassed when active; 11+4 unit tests. | 3–4 wk | PLAN_DAG_PERFORMANCE.md §8.3 |
| DAG-5 | ST buffer batch coalescing: cancel redundant I/D pairs on the same __pgt_row_id that accumulate between reads during rapid-fire upstream refreshes. Adapts existing compute_net_effect() logic to the ST buffer schema. compact_st_change_buffer() with build_st_compact_sql() pure-logic helper; advisory lock namespace 0x5047_5500; integrated in execute_differential_refresh() after C-4 base-table compaction; 9 unit tests. | 1–2 wk | PLAN_DAG_PERFORMANCE.md §8.5 |
DAG refresh performance subtotal: ~8–12 weeks
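The DAG-2 backoff logic can be sketched as a pure function of the current interval and whether any worker completed this tick. The function name is taken from the table above; its exact signature here is an assumption for illustration.

```rust
/// Adaptive poll interval: exponential backoff from 20 ms to a 200 ms cap
/// while idle; snap back to 20 ms as soon as a worker completes.
fn compute_adaptive_poll_ms(current_ms: u64, completions_this_tick: u32) -> u64 {
    const MIN_MS: u64 = 20;
    const MAX_MS: u64 = 200;
    if completions_this_tick > 0 {
        MIN_MS // work is flowing: poll eagerly again
    } else {
        (current_ms * 2).min(MAX_MS) // idle: back off exponentially
    }
}

fn main() {
    assert_eq!(compute_adaptive_poll_ms(20, 0), 40);   // idle tick doubles
    assert_eq!(compute_adaptive_poll_ms(160, 0), 200); // capped at 200 ms
    assert_eq!(compute_adaptive_poll_ms(200, 0), 200); // stays at cap
    assert_eq!(compute_adaptive_poll_ms(200, 3), 20);  // completion resets
}
```

Keeping this as a pure helper is what makes the "8 unit tests" in the table cheap: no latch or shared memory is needed to verify the backoff curve.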
v0.11.0 total: ~7–10 weeks (partitioning + isolation) + ~12h observability + ~14–21h default tuning + ~7–12h safety hardening + ~2–4 weeks should-ship (bitmask + fuse + external corpus) + ~4.5–6.5 weeks ST-to-ST differential + ~2–3 weeks event-driven wake + ~1–2 days correctness quick-wins + ~2–3 days documentation + ~8–12 weeks DAG performance
Exit criteria: ✅ All met. Released 2026-03-26.
- Declaratively partitioned stream tables accepted; partition key tracked in catalog — ✅ Done in v0.11.0 Partitioning Spike (STRETCH-1 RFC + A1-1)
- Partitioned storage table created with PARTITION BY RANGE + default catch-all partition — ✅ Done (A1-1 physical DDL)
- Partition-key range predicate injected into MERGE ON clause; empty-delta fast-path skips MERGE — ✅ Done (A1-2 + A1-3)
- Partition-scoped MERGE benchmark: 10M-row ST, 0.1% change rate (expect ~100× I/O reduction) — ✅ Done (A1-4 E2E tests)
- Per-database worker quotas enforced; burst reclaimed within 1 scheduler cycle — ✅ Done in v0.11.0 Phase 11 (pg_trickle.per_database_worker_quota GUC; burst to 150% at < 80% cluster load)
- Prometheus queries + alerting rules + Grafana dashboard shipped — ✅ Done in v0.11.0 Phase 3 (monitoring/ directory)
- DEF-1: parallel_refresh_mode default is 'on'; unit test updated — ✅ Done in v0.11.0 Phase 1
- DEF-2: auto_backoff default is true; CONFIGURATION.md updated — ✅ Done in v0.10.0
- DEF-3: SemiJoin delta-key pre-filter verified already implemented — ✅ Done in v0.11.0 Phase 2 (pre-existing in semi_join.rs)
- DEF-4: Invalidation ring capacity is 128 slots — ✅ Done in v0.11.0 Phase 1
- DEF-5: block_source_ddl default is true; error message includes escape-hatch instructions — ✅ Done in v0.11.0 Phase 1
- SAF-1: No panic!/unwrap() in background worker hot paths; check_skip_needed logs SPI errors — ✅ Done in v0.11.0 Phase 1
- SAF-2: Failure-injection E2E tests in tests/e2e_safety_tests.rs — ✅ Done in v0.11.0 Phase 2
- WB-1+2: Changed-column bitmask supports >63 columns (VARBIT); wide-table CDC selectivity E2E passes; schema migration tested — ✅ Done in v0.11.0 Phase 5
- FUSE-1–6: Fuse blows on configurable change-count threshold; reset_fuse() recovers in all three action modes; diamond/DAG interaction tested — ✅ Done in v0.11.0 Phase 6
- TS2: TPC-H-derived 5-query DIFFERENTIAL correctness gate passes with zero mismatches; gated in CI — ✅ Done in v0.11.0 Phase 9
- QF-1–4: println! replaced with guarded pgrx::log!(); AUTO downgrades emit WARNING; append_only reversion verified already warns; parser invariant sites annotated — ✅ Done in v0.11.0 Phase 1
- G12-ERM: effective_refresh_mode column present in pgt_stream_tables; explain_refresh_mode() returns configured mode, effective mode, downgrade reason — ✅ Done in v0.11.0 Phase 2
- G12-2: TopK path validates assumptions at refresh time; triggers FULL fallback with WARNING on violation — ✅ Done in v0.11.0 Phase 4
- G12-AGG: Group-rescan aggregate warning fires at create_stream_table for DIFFERENTIAL mode; strategy visible in explain_st() — ✅ Done in v0.11.0 Phase 4
- G15-PV: Incompatible cdc_mode/refresh_mode and diamond_schedule_policy combinations rejected at creation time with structured HINT — ✅ Done in v0.11.0 Phase 2
- G13-EH: UnsupportedOperator, CycleDetected, UpstreamSchemaChanged, QueryParseError include DETAIL and HINT fields — ✅ Done in v0.11.0 Phase 2
- G17-EC01B-NEG: Negative regression test documents ≥3-scan fall-back behavior; linked to v0.12.0 EC01B fix — ✅ Done in v0.11.0 Phase 4
- G16-GS/SM/MQR/GUC: GETTING_STARTED restructured (5 chapters + Hello World + Advanced Topics); DVM_OPERATORS support matrix; monitoring quick reference; CONFIGURATION.md GUC matrix — ✅ Done in v0.11.0 Phase 11
- ST-ST-1–6: All ST-to-ST dependencies refresh differentially when upstream has a change buffer; FULL refreshes on upstream produce pre/post I/D diff; no cascading FULL — ✅ Done in v0.11.0 Phase 8
- WAKE-1: Event-driven scheduler wake; median latency ~15 ms (34× improvement); 10 ms debounce; poll fallback — ✅ Done in v0.11.0 Phase 7
- DAG-1: Intra-tick pipelining confirmed in Phase 4 architecture — ✅ Done
- DAG-2: Adaptive poll interval (20 ms → 200 ms exponential backoff) — ✅ Done in v0.11.0 Phase 10
- DAG-3: Delta amplification detection with pg_trickle.delta_amplification_threshold GUC — ✅ Done in v0.11.0 Phase 10
- DAG-4: ST buffer bypass (FusedChain) for single-consumer CALCULATED chains — ✅ Done in v0.11.0 Phase 10
- DAG-5: ST buffer batch coalescing cancels redundant I/D pairs — ✅ Done in v0.11.0 Phase 10
- Extension upgrade path tested (0.10.0 → 0.11.0) — ✅ upgrade SQL in sql/pg_trickle--0.10.0--0.11.0.sql
v0.12.0 — Correctness, Reliability & Developer Tooling
Goal: Close the last known wrong-answer bugs in the incremental query engine, add SQL-callable diagnostic functions for observability, harden the scheduler against edge cases uncovered with deeper topologies, and back the whole release with thousands of automatically generated property and fuzz tests.
Phases 5–8 from the original v0.12.0 scope (Scalability Foundations, Partitioning Enhancements, MERGE Profiling, and dbt Macro Updates) have been moved to v0.13.0 to keep this release tightly focused on correctness and reliability. See §v0.13.0 for those items.
Status: Released (2026-03-28).
Completed items (click to expand)
Anomalous Change Detection (Fuse)
In plain terms: Imagine a source table suddenly receives a million-row batch delete — a bug, runaway script, or intentional purge. Without a fuse, pg_trickle would try to process all of it and potentially overload the database. This adds a circuit breaker: you set a ceiling (e.g. "never process more than 50,000 changes at once"), and if that limit is hit the stream table pauses and sends a notification. You investigate, fix the root cause, then resume with reset_fuse() and choose how to recover (apply the changes, reinitialize from scratch, or skip them entirely).
Per-stream-table fuse that blows when the change buffer row count exceeds a
configurable fixed ceiling or an adaptive μ+kσ threshold derived from
pgt_refresh_history. A blown fuse halts refresh and emits a
pgtrickle_alert NOTIFY; reset_fuse() resumes with a chosen recovery
action.
| Item | Description | Effort | Ref |
|---|---|---|---|
| FUSE-1 | Catalog: fuse state columns on pgt_stream_tables (fuse_mode, fuse_state, fuse_ceiling, fuse_sensitivity, blown_at, blow_reason) | 1–2h | PLAN_FUSE.md |
| FUSE-2 | alter_stream_table() new params: fuse, fuse_ceiling, fuse_sensitivity | 1h | PLAN_FUSE.md |
| FUSE-3 | reset_fuse(name, action => 'apply'|'reinitialize'|'skip_changes') SQL function | 1h | PLAN_FUSE.md |
| FUSE-4 | fuse_status() introspection function | 1h | PLAN_FUSE.md |
| FUSE-5 | Scheduler pre-check: count change buffer rows; evaluate threshold; blow fuse + NOTIFY if exceeded | 2–3h | PLAN_FUSE.md |
| FUSE-6 | E2E tests: normal baseline, spike → blow, reset, diamond/DAG interaction | 4–6h | PLAN_FUSE.md |
Anomalous change detection subtotal: ~10–14 hours
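The FUSE-5 pre-check combines the two thresholds described above: a fixed ceiling, and the adaptive μ + kσ bound over recent refresh history. The sketch below is a pure-logic illustration with an assumed function name, not the scheduler's actual code.

```rust
/// Decide whether the fuse should blow for a pending batch of changes.
/// `history` holds recent per-refresh change counts (from pgt_refresh_history);
/// `sensitivity` is k in the adaptive μ + kσ threshold.
fn fuse_should_blow(
    change_count: u64,
    history: &[u64],
    sensitivity: f64,
    ceiling: Option<u64>,
) -> bool {
    // Fixed ceiling: an absolute, operator-configured limit.
    if let Some(c) = ceiling {
        if change_count > c {
            return true;
        }
    }
    // Adaptive threshold needs some history to estimate mean and stddev.
    if history.len() < 2 {
        return false;
    }
    let n = history.len() as f64;
    let mean = history.iter().sum::<u64>() as f64 / n;
    let var = history
        .iter()
        .map(|&x| {
            let d = x as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    change_count as f64 > mean + sensitivity * var.sqrt()
}

fn main() {
    let history = [80, 100, 120, 100]; // μ = 100, σ ≈ 14.1
    assert!(fuse_should_blow(500, &history, 3.0, None)); // spike blows fuse
    assert!(!fuse_should_blow(110, &history, 3.0, None)); // within μ + 3σ
    assert!(fuse_should_blow(60, &[], 3.0, Some(50))); // fixed ceiling hit
}
```

On a blow, the scheduler would then mark the fuse state in the catalog and emit the pgtrickle_alert NOTIFY, leaving recovery to reset_fuse().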
Correctness — EC-01 Deep Fix (≥3-Scan Join Right Subtrees)
In plain terms: The phantom-row-after-DELETE bug (EC-01) was fixed for join children with ≤2 scan nodes on the right side. Wider join chains — TPC-H Q7, Q8, Q9 all qualify — are still silently affected: when both sides of a join are deleted in the same batch, the DELETE can be silently dropped. The existing EXCEPT ALL snapshot strategy causes PostgreSQL to spill multi-GB temp files for deep join trees, which is why the threshold exists. This work designs a fundamentally different per-subtree snapshot strategy that removes the cap.
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC01B-1 | Remove the join_scan_count(child) <= 2 threshold in use_pre_change_snapshot | — | src/dvm/operators/join_common.rs · plans/PLAN_EDGE_CASES.md §EC-01 |
| EC01B-2 | TPC-H Q7/Q8/Q9 DELETE regression tests | — | tests/e2e_tpch_tests.rs |
EC-01 deep fix subtotal: ~3–4 weeks — ✅ Complete
CDC Write-Side Overhead Benchmark
In plain terms: Every INSERT/UPDATE/DELETE on a source table fires a PL/pgSQL trigger that writes to the change buffer. We have never measured how much write throughput this costs. These benchmarks quantify it across five scenarios (single-row, bulk INSERT, bulk UPDATE, bulk DELETE, concurrent writers) and gate the decision on whether to implement a change_buffer_unlogged GUC that could reduce WAL overhead by ~20–30%.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | tests/e2e_cdc_write_overhead_tests.rs: compare source-only vs. source + stream table DML throughput across five scenarios; report write amplification factor | — | tests/e2e_cdc_write_overhead_tests.rs |
| — | Write-side overhead results published in docs/BENCHMARK.md | — | docs/BENCHMARK.md |
CDC write-side benchmark subtotal: ~3–5 days — ✅ Complete
DAG Topology Benchmark Suite (from PLAN_DAG_BENCHMARK.md)
In plain terms: Production deployments form DAGs with 10–500+ stream tables arranged in chains, fan-outs, diamonds, and mixed topologies. This benchmark suite measures end-to-end propagation latency and throughput through these DAG shapes, validates the theoretical latency formulas from PLAN_DAG_PERFORMANCE.md, and provides regression detection for DAG propagation overhead.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DAG-B1 | — | — | PLAN_DAG_BENCHMARK.md §11.1 |
| DAG-B2 | — | — | PLAN_DAG_BENCHMARK.md §11.2 |
| DAG-B3 | — | — | PLAN_DAG_BENCHMARK.md §11.3 |
| DAG-B4 | Results published in docs/BENCHMARK.md; full suite validation run | — | PLAN_DAG_BENCHMARK.md §11.4 |
DAG topology benchmark subtotal: ~3–5 days — ✅ Complete
Developer Tooling & Observability Functions (from REPORT_OVERALL_STATUS.md §15) ✅ Complete
In plain terms: pg_trickle's diagnostic toolbox today is limited to explain_st() and refresh_history(). Operators debugging unexpected mode changes, query rewrites, or error patterns must read source code or server logs. This section adds four SQL-callable diagnostic functions that surface internal state in a structured, queryable form.
| Item | Description | Effort | Status |
|---|---|---|---|
| DT-1 | explain_query_rewrite(query TEXT) — parse a query through the DVM pipeline and return the rewritten SQL plus a list of passes applied (operator rewrites, delta-key injections, TopK detection, group-rescan classification). Useful for debugging unexpected refresh behavior without creating a stream table. | ~1–2d | ✅ Done in v0.12.0 Phase 2 |
| DT-2 | diagnose_errors(name TEXT) — return the last 5 error events for a stream table, classified by type (correctness, performance, config, infrastructure), with a suggested remediation for each class. | ~2–3d | ✅ Done in v0.12.0 Phase 2 |
| DT-3 | list_auxiliary_columns(name TEXT) — list all __pgt_* internal columns injected into the stream table's query plan with their purpose (delta tracking, row identity, compaction key). Helps users understand unexpected columns in SELECT * output. | ~1d | ✅ Done in v0.12.0 Phase 2 |
| DT-4 | validate_query(query TEXT) — parse and run DVM validation on a query without creating a stream table; return the resolved refresh mode, detected SQL constructs (group-rescan aggregates, non-equijoins, multi-scan subtrees), and any warnings. | ~1–2d | ✅ Done in v0.12.0 Phase 2 |
Developer tooling subtotal: ~5–8 days
Parser Safety, Concurrency & Query Coverage (from REPORT_OVERALL_STATUS.md §13/§12/§17)
Additional correctness and robustness items from the deep gap analysis: a stack-overflow prevention guard for pathological queries, a concurrency stress test for IMMEDIATE mode, and two investigations into known under-documented query constructs.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G13-SD | Recursion depth guard in dvm/parser.rs. Return PgTrickleError::QueryTooComplex if depth exceeds pg_trickle.max_parse_depth (GUC, default 64). Prevents stack-overflow crashes on pathological queries. | — | src/dvm/parser.rs · src/config.rs · src/error.rs |
| G17-IMS | Concurrency stress test for IMMEDIATE refresh mode; assert zero lost updates, zero phantom rows, and no deadlocks. | — | tests/e2e_immediate_concurrency_tests.rs |
| G12-SQL-IN | Multi-column IN (subquery) correctness investigation. Determine behavior when DVM encounters EXPR IN (subquery returning multiple columns). Add a correctness test; if the construct is broken, fix it or document as unsupported with a structured error. | — | tests/e2e_multi_column_in_tests.rs · src/dvm/parser.rs |
| G14-MDED | MERGE deduplication profiling. Profile how often concurrent-write scenarios produce duplicate key entries requiring pre-MERGE compaction. If ≥10% of refresh cycles need dedup, write an RFC for a two-pass MERGE strategy. | ~3–5d | plans/performance/REPORT_OVERALL_STATUS.md §14 |
| G17-MERGEEX | EXPLAIN (COSTS OFF) dry-run checks for generated MERGE SQL templates at E2E test startup. Catches malformed templates before any data is processed. | — | tests/e2e_merge_template_tests.rs |
Parser safety & coverage subtotal: ~9–15 days
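The G13-SD guard is conceptually simple: thread a depth counter through the recursive descent and bail out with a structured error once it exceeds the limit. The sketch below uses a stand-in AST type and error, not pg_trickle's real parse tree or PgTrickleError.

```rust
/// Stand-in for a deeply nestable parse-tree node.
#[derive(Debug)]
enum Expr {
    Leaf,
    Nested(Box<Expr>),
}

/// Stand-in for PgTrickleError::QueryTooComplex.
#[derive(Debug, PartialEq)]
struct QueryTooComplex;

/// Recursive walk that errors out once nesting exceeds `max_depth`
/// (the role pg_trickle.max_parse_depth plays, default 64).
fn check_depth(e: &Expr, depth: u32, max_depth: u32) -> Result<(), QueryTooComplex> {
    if depth > max_depth {
        return Err(QueryTooComplex);
    }
    match e {
        Expr::Leaf => Ok(()),
        Expr::Nested(inner) => check_depth(inner, depth + 1, max_depth),
    }
}

fn main() {
    // Build a pathologically nested expression 70 levels deep.
    let mut e = Expr::Leaf;
    for _ in 0..70 {
        e = Expr::Nested(Box::new(e));
    }
    assert_eq!(check_depth(&e, 0, 64), Err(QueryTooComplex));
    assert!(check_depth(&e, 0, 128).is_ok());
}
```

Failing early with a catchable error is the point: the alternative is blowing the backend's stack, which takes the whole session down.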
Differential Fuzzing (SQLancer)
In plain terms: SQLancer is a SQL fuzzer that generates thousands of syntactically valid but structurally unusual queries and uses mathematical oracles (NoREC, TLP) to prove our DVM engine produces exactly the same results as PostgreSQL's native executor. Unlike hand-written tests, it explores the long tail of NULL semantics, nested aggregations, and edge cases no human would write. Any backend crash or result mismatch becomes a permanent regression test seed.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SQLANCER-1 | Docker-based harness: just sqlancer spins up E2E container; crash-test oracle verifies that no SQLancer-generated create_stream_table call crashes the backend | 3–4d | PLAN_SQLANCER.md §Steps 1–2 |
| SQLANCER-2 | Equivalence oracle: for each generated query Q, assert create_stream_table + refresh output equals native SELECT (multiset comparison); failures auto-committed as proptest regression seeds | 3–4d | PLAN_SQLANCER.md §Step 3 |
| SQLANCER-3 | CI weekly-sqlancer job (daily schedule + manual dispatch); new proptest seed files committed on any detected correctness failure | 1–2d | PLAN_SQLANCER.md |
SQLancer fuzzing subtotal: ~1–2 weeks
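The SQLANCER-2 oracle's "multiset comparison" means duplicates matter but row order does not. A minimal sketch (rows simplified to strings; the real harness compares full result rows):

```rust
use std::collections::HashMap;

/// Multiset equality: same rows with the same multiplicities,
/// ignoring order. This is the comparison the equivalence oracle
/// needs, since SQL results are bags, not sets or sequences.
fn multiset_eq(a: &[&str], b: &[&str]) -> bool {
    fn counts<'r>(rows: &[&'r str]) -> HashMap<&'r str, usize> {
        let mut m = HashMap::new();
        for r in rows {
            *m.entry(*r).or_insert(0) += 1;
        }
        m
    }
    counts(a) == counts(b)
}

fn main() {
    assert!(multiset_eq(&["a", "b", "a"], &["b", "a", "a"])); // order-insensitive
    assert!(!multiset_eq(&["a"], &["a", "a"]));               // multiplicity matters
}
```

Any mismatch between the stream table's contents and the native SELECT under this comparison is a correctness bug, and per SQLANCER-2 the failing query is committed as a proptest regression seed.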
Property-Based Invariant Tests (Items 5 & 6)
In plain terms: Items 1–4 of the property test plan are done. These two remaining items add topology/scheduler stress tests (random DAG shapes with multi-source branch interactions) and pure Rust unit-level properties (ordering monotonicity, SCC bookkeeping correctness). Both slot into the existing proptest harness and provide coverage that example-based tests cannot exhaustively explore.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PROP-5 | Topology / scheduler stress: randomized DAG topologies with multi-source branch interactions; assert no incorrect refresh ordering or spurious suspension | 4–6d | PLAN_TEST_PROPERTY_BASED_INVARIANTS.md §Item 5 |
| PROP-6 | Pure Rust DAG / scheduler helper properties: ordering invariants, monotonic metadata helpers, SCC bookkeeping edge-cases | 2–4d | PLAN_TEST_PROPERTY_BASED_INVARIANTS.md §Item 6 |
Property testing subtotal: ~6–10 days
Async CDC — Research Spike (D-2)
In plain terms: A custom PostgreSQL logical decoding plugin could write changes directly to change buffers without the polling round-trip, cutting CDC latency by ~10× and WAL decoding CPU by 50–80%. This milestone scopes a research spike only — not a full implementation — to validate the key technical constraints.
| Item | Description | Effort | Ref |
|---|---|---|---|
| D2-R | Research spike: prototype in-memory row buffering inside pg_trickle_decoder; validate SPI flush in commit callback; document memory-safety constraints and feasibility; produce a written RFC before any full implementation is started | 2–3 wk | PLAN_NEW_STUFF.md §D-2 |
⚠️ SPI writes inside logical decoding change callbacks are not supported. All row buffering must occur in-memory within the plugin's memory context; flush only in the commit callback. In-memory buffers must handle arbitrarily large transactions. See PLAN_NEW_STUFF.md §D-2 risk analysis before writing any C code.
Retraction candidate (D-2): Even as a research spike, this item introduces C-level complexity (custom output plugin memory management, commit-callback SPI failure handling, arbitrarily large transaction buffering) that substantially exceeds the stated 2–3 week estimate once the architectural constraints are respected. The risk rating is Very High and the SPI-in-change-callback infeasibility makes the originally proposed design non-functional. Recommend moving D-2 to a post-1.0 research backlog entirely; do not include it in a numbered milestone until a separate feasibility study (outside the release cycle) produces a concrete RFC.
D-2 research spike subtotal: ~2–3 weeks
Scalability Foundations (pulled forward from v0.13.0)
In plain terms: These items directly serve the project's primary goal of world-class performance and scalability. Columnar change tracking eliminates wasted delta processing for wide tables, and shared change buffers reduce I/O multiplication in deployments with many stream tables reading from the same source.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-2 | Columnar Change Tracking. Per-column bitmask in CDC triggers; skip rows where no referenced column changed; lightweight UPDATE-only path when only projected columns changed; 50–90% delta-volume reduction for wide-table UPDATE workloads. | 3–4 wk | PLAN_NEW_STUFF.md §A-2 |
| D-4 | Shared Change Buffers. Single buffer per source shared across all dependent STs; multi-frontier cleanup coordination; static-superset column mode for initial implementation. | 3–4 wk | PLAN_NEW_STUFF.md §D-4 |
Scalability foundations subtotal: ~6–8 weeks
Partitioning Enhancements (A1 follow-ons from v0.11.0 spike)
In plain terms: The v0.11.0 spike delivered RANGE partitioning end-to-end. These follow-on items extend coverage to the use cases deliberately deferred from A1: multi-column keys, retrofitting existing stream tables, LIST-based partitions, HASH partitions (which need a different strategy than predicate injection), and operational quality-of-life improvements.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | Multi-column partition keys in partition_by; PARTITION BY RANGE (col_a, col_b); multi-column MIN/MAX extraction; ROW() comparison predicates for partition pruning. parse_partition_key_columns(), composite extract_partition_range(), ROW comparison in inject_partition_predicate(); 5 unit tests + 3 E2E tests | — | src/api.rs, src/refresh.rs |
| — | alter_stream_table(partition_by => …) support. Add/change/remove partition key on existing stream tables; alter_stream_table_partition_key() handles DROP + recreate + full refresh; update_partition_key() in catalog; SQL migration adds parameter; also fixed alter_stream_table_query to preserve partition key. | — | src/api.rs, src/catalog.rs |
| — | LIST partitioning: partition_by => 'LIST:col' creates PARTITION BY LIST storage; PartitionMethod enum dispatches LIST vs RANGE; extract_partition_bounds() uses SELECT DISTINCT for LIST; inject_partition_predicate() emits IN (…) predicate; single-column-only validation. | — | src/api.rs, src/refresh.rs |
| — | HASH partitioning: partition_by => 'HASH:col[:N]' creates PARTITION BY HASH storage with N auto-created child partitions; execute_hash_partitioned_merge() materializes delta → discovers children via pg_inherits → per-child MERGE filtered through satisfies_hash_partition(); build_hash_child_merge() rewrites MERGE targeting ONLY child_partition. | — | src/api.rs, src/refresh.rs |
| — | Default-partition growth warning: warn_default_partition_growth() emits pgrx::warning!() after FULL and DIFFERENTIAL refresh when the default partition has rows; includes example DDL. | — | src/refresh.rs |
Auto-partition creation (TimescaleDB-style automatic chunk management) remains a post-1.0 item as stated in PLAN_PARTITIONING_SPIKE.md §10.
Partitioning enhancements subtotal: ~5–8 weeks
Performance Defaults (from REPORT_OVERALL_STATUS.md)
Targeted improvements identified in the overall status report. None require large design changes; all build on existing infrastructure.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | Auto-promote buffer_partitioning for high-throughput sources. should_promote_inner() throughput-based heuristic; convert_buffer_to_partitioned() runtime migration; auto-promote hook in execute_differential_refresh(); docs/CONFIGURATION.md updated; 10 unit tests + 3 E2E tests | — | REPORT_OVERALL_STATUS.md §R7 |
| PERF-3 | tiered_scheduling default to true. The feature is implemented and tested since v0.10.0. | — | src/config.rs · docs/CONFIGURATION.md |
| — | — | — | REPORT_OVERALL_STATUS.md §R3/R16 |
| — | block_source_ddl default to true. | — | REPORT_OVERALL_STATUS.md §R12 |
| — | — | — | REPORT_OVERALL_STATUS.md §R13 |
Performance defaults subtotal: ~1–3 weeks
DAG Refresh Performance Improvements (from PLAN_DAG_PERFORMANCE.md §8)
➡️ Moved to v0.11.0 — these items build directly on the ST-to-ST differential infrastructure shipped in v0.11.0 Phase 8 and are most impactful while that work is fresh.
v0.12.0 total: ~18–27 weeks + ~6–8 weeks scalability + ~5–8 weeks partitioning enhancements + ~1–3 weeks defaults + ~3–5 weeks developer tooling & observability
Priority tiers: P0 = Phases 1–3 (must ship); P1 = Phases 4 + 7 (target); P2 = Phases 5, 6, 8 (can defer to v0.13.0 as a unit — never partially ship Phase 5/6).
dbt Macro Updates (Phase 8)
Priority P2 — Expose the v0.11.0 SQL API additions (partition_by, fuse, fuse_ceiling, fuse_sensitivity) in the dbt materialization macros so dbt users can configure them via config(...). No catalog changes; pure Jinja/SQL. Can defer to v0.13.0 as a unit.
| Item | Description | Effort |
|---|---|---|
| DBT-1 | partition_by config option wired through stream_table.sql, create_stream_table.sql, and alter_stream_table.sql | ~1d |
| DBT-2 | fuse, fuse_ceiling, fuse_sensitivity config options wired through the materialization and alter macro with change-detection logic | ~1–2d |
| DBT-3 | dbt docs update: README and SQL_REFERENCE.md dbt section | ~0.5d |
dbt macro updates subtotal: ~2–3.5 days
Exit criteria — all met (v0.12.0 Released 2026-03-28):
- EC01B-1/2: No phantom-row drop for ≥3-scan right-subtree joins; TPC-H Q7/Q8/Q9 DELETE regression tests pass ✅
- BENCH-W: Write-side overhead benchmarks published in docs/BENCHMARK.md ✅
- DAG-B1–B4: DAG topology benchmark suite complete ✅
- SQLANCER-1/2/3: Crash-test + equivalence oracles in weekly CI job; zero mismatches ✅
- PROP-5+6: Topology stress and DAG/scheduler helper property tests pass ✅
- DT-1–4: explain_query_rewrite(), diagnose_errors(), list_auxiliary_columns(), validate_query() callable from SQL ✅
- G13-SD: max_parse_depth guard active; pathological query returns QueryTooComplex ✅
- G17-IMS: IMMEDIATE mode concurrency stress test (5 scenarios × 100+ concurrent DML) passes ✅
- G12-SQL-IN: Multi-column IN subquery documented as unsupported with structured error + EXISTS hint ✅
- G17-MERGEEX: MERGE template EXPLAIN validation at E2E test startup ✅
- PERF-3: tiered_scheduling default is true; CONFIGURATION.md updated ✅
- ST-ST-9: Content-hash pk_hash in ST change buffers; stale-row-after-UPDATE bug fixed ✅
- DAG-4 bypass column types fixed; parallel worker tests complete without timeout ✅
- docs/UPGRADING.md updated with v0.11.0→v0.12.0 migration notes ✅
- scripts/check_upgrade_completeness.sh passes ✅
- Extension upgrade path tested (0.11.0 → 0.12.0) ✅
v0.13.0 — Scalability Foundations, Partitioning Enhancements, MERGE Profiling & Multi-Tenant Scheduling
Status: Released (2026-03-31).
Goal: Deliver the scalability foundations deferred from v0.12.0 — columnar change tracking and shared change buffers — alongside the partitioning enhancements that build on v0.11.0's RANGE partitioning spike, a MERGE deduplication profiling pass, the dbt macro updates, per-database worker quotas for multi-tenant deployments, the TPC-H-derived benchmarking harness for data-driven performance validation, and a small SQL coverage cleanup for PG 16+ expression types.
Completed items (click to expand)
Phases from PLAN_0_12_0.md: Phases 5 (Scalability), 6 (Partitioning), 7 (MERGE Profiling), and 8 (dbt Macro Updates). Plus three new phases: 9 (Multi-Tenant Scheduler Isolation), 10 (TPC-H Benchmark Harness), and 11 (SQL Coverage Cleanup).
Scalability Foundations (Phase 5)
In plain terms: These items directly serve the project's primary goal of world-class performance and scalability. Columnar change tracking eliminates wasted delta processing for wide tables, and shared change buffers reduce I/O multiplication in deployments with many stream tables reading from the same source.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-2 | Columnar Change Tracking. Per-column bitmask in CDC triggers; skip rows where no referenced column changed; lightweight UPDATE-only path when only projected columns changed; 50–90% delta-volume reduction for wide-table UPDATE workloads. | 3–4 wk | PLAN_NEW_STUFF.md §A-2 |
| D-4 | Shared Change Buffers. Single buffer per source shared across all dependent STs; multi-frontier cleanup coordination; static-superset column mode for initial implementation. | 3–4 wk | PLAN_NEW_STUFF.md §D-4 |
| PERF-2 | Buffer partitioning for high-throughput sources. A change buffer that exceeds `compact_threshold` in a single refresh cycle is converted to RANGE(lsn) partitioned mode at runtime. | — | REPORT_OVERALL_STATUS.md §R7 |
⚠️ D-4 multi-frontier cleanup correctness verified. `MIN(consumer_frontier)` used in all cleanup paths. Property-based tests with 5–10 consumers and 500 random frontier advancement cases pass.
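As a sketch, the D-4 cleanup invariant can be expressed as a tiny helper (names are illustrative, not pg_trickle's real API): a shared change buffer may only discard rows at or below the minimum frontier across all registered consumers.

```rust
/// Hypothetical sketch of D-4's cleanup rule: the highest LSN safe to
/// delete from a shared buffer is the *minimum* consumer frontier.
/// Returns None when no consumers are registered (keep everything).
fn safe_cleanup_lsn(consumer_frontiers: &[u64]) -> Option<u64> {
    consumer_frontiers.iter().copied().min()
}

fn main() {
    // Three stream tables consuming the same buffer at different frontiers:
    // only rows with lsn <= 900 may be deleted.
    assert_eq!(safe_cleanup_lsn(&[1500, 900, 4200]), Some(900));
    // With no consumers the helper declines to name a cleanup point.
    assert_eq!(safe_cleanup_lsn(&[]), None);
}
```

The property tests mentioned above exercise exactly this invariant under random frontier advancement.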
Scalability foundations subtotal: ~6–8 weeks
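The per-column bitmask described in A-2 can be sketched as follows (an illustrative model, not the CDC trigger implementation): bit *i* is set when column *i* changed, and a delta row is skipped when none of the columns the view references changed.

```rust
/// Illustrative A-2 sketch: compute a bitmask of changed columns by
/// comparing old and new row values (here modelled as string slices).
fn change_bitmask(old: &[&str], new: &[&str]) -> u64 {
    let mut mask = 0u64;
    for (i, (o, n)) in old.iter().zip(new.iter()).enumerate() {
        if o != n {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// `referenced` is a bitmask of the columns the defining query reads;
/// a changed row only produces a delta when the masks intersect.
fn delta_needed(changed: u64, referenced: u64) -> bool {
    changed & referenced != 0
}

fn main() {
    // Four-column row; only column 3 changed.
    let changed = change_bitmask(&["a", "b", "c", "1"], &["a", "b", "c", "2"]);
    assert_eq!(changed, 0b1000);
    // View reads columns 0 and 1 only: the UPDATE is irrelevant, skip it.
    assert!(!delta_needed(changed, 0b0011));
    // View reads column 3: a delta row must be emitted.
    assert!(delta_needed(changed, 0b1000));
}
```

This is where the claimed 50–90% delta-volume reduction comes from: wide tables whose UPDATEs touch unreferenced columns produce no delta work at all.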
Partitioning Enhancements (Phase 6)
In plain terms: The v0.11.0 spike delivered RANGE partitioning end-to-end. These follow-on items extend coverage to the use cases deliberately deferred from A1: multi-column keys, retrofitting existing stream tables, LIST-based partitions, HASH partitions, and operational quality-of-life improvements.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A1-1b | Multi-column RANGE partition keys via partition_by; ROW() predicate for composite keys. | — | src/api.rs, src/refresh.rs |
| A1-1c | alter_stream_table(partition_by => …) support. Add/change/remove partition key with full storage rebuild. | — | src/api.rs, src/catalog.rs |
| A1-1d | PARTITION BY LIST for low-cardinality columns; IN (…) predicate style from the delta. | — | src/api.rs, src/refresh.rs |
| A1-3b | HASH:col[:N] with auto-created child partitions; per-partition MERGE through satisfies_hash_partition(). | — | src/api.rs, src/refresh.rs |
| PART-WARN | warn_default_partition_growth() after FULL and DIFFERENTIAL refresh. | — | src/refresh.rs |
Partitioning enhancements subtotal: ~5–8 weeks
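The `IN (…)` predicate style that A1-1d derives from the delta can be sketched like this (a simplified model: quoting and type handling are naive, and the helper name is hypothetical):

```rust
use std::collections::BTreeSet;

/// Illustrative A1-1d sketch: collect the distinct partition-key values
/// present in a delta batch and emit a pruning predicate for the MERGE,
/// so only the touched LIST partitions are scanned.
fn in_list_predicate(key_col: &str, delta_keys: &[&str]) -> String {
    // BTreeSet deduplicates and gives deterministic (sorted) output.
    let distinct: BTreeSet<&str> = delta_keys.iter().copied().collect();
    let quoted: Vec<String> = distinct.iter().map(|k| format!("'{}'", k)).collect();
    format!("{} IN ({})", key_col, quoted.join(", "))
}

fn main() {
    let pred = in_list_predicate("region", &["emea", "apac", "emea"]);
    assert_eq!(pred, "region IN ('apac', 'emea')");
}
```

PostgreSQL's planner can then prune LIST partitions whose values do not appear in the predicate.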
MERGE Profiling (Phase 7)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G14-MDED | MERGE deduplication profiling. Profile how often concurrent-write scenarios produce duplicate key entries requiring pre-MERGE compaction. If ≥10% of refresh cycles need dedup, write an RFC for a two-pass MERGE strategy. | 3–5d | plans/performance/REPORT_OVERALL_STATUS.md §14 |
| PROF-DLT | Delta SQL query plan profiling (explain_delta() function). Capture EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for auto-generated delta SQL queries to identify PostgreSQL execution bottlenecks (join algorithms, scan types, sort spills). Add pgtrickle.explain_delta(st_name, format DEFAULT 'text') SQL function; optional PGS_PROFILE_DELTA=1 environment variable for E2E test auto-capture to /tmp/delta_plans/<st>.json. Enables identification of operator-level performance issues (semi-join full scans, deep join chains). Prerequisite for data-driven MERGE optimization. | 1–2w | PLAN_TPC_H_BENCHMARKING.md §1-5 |
MERGE profiling subtotal: ~1–3 weeks
dbt Macro Updates (Phase 8)
In plain terms: Expose the v0.11.0 SQL API additions (`partition_by`, `fuse`, `fuse_ceiling`, `fuse_sensitivity`) in the dbt materialization macros so dbt users can configure them via `config(...)`. No catalog changes; pure Jinja/SQL.
| Item | Description | Effort |
|---|---|---|
| DBT-1 | partition_by config option wired through stream_table.sql, create_stream_table.sql, and alter_stream_table.sql | ~1d |
| DBT-2 | fuse, fuse_ceiling, fuse_sensitivity config options wired through the materialization and alter macro with change-detection logic | ~1–2d |
| DBT-3 | dbt docs update: README and SQL_REFERENCE.md dbt section | ~0.5d |
dbt macro updates subtotal: ~2–3.5 days
Multi-Tenant Scheduler Isolation (Phase 9)
In plain terms: As deployments grow past 10 databases on a single cluster, all schedulers compete for the same global background-worker pool. One busy database can starve the others. Phase 9 gives operators per-database quotas and a priority queue so critical databases always get workers.
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-3 | Per-database worker quotas. `pg_trickle.per_database_worker_quota` GUC; priority ordering: IMMEDIATE > Hot > Warm > Cold STs; burst capacity up to 150% when other databases are under quota. Implemented: `compute_per_db_quota()` with 80% burst; tier-aware `sort_ready_queue_by_priority`; 5 unit tests + 6 E2E tests. | — | src/scheduler.rs |
⚠️ C-3 depends on C-1 (tiered scheduling) for Hot/Warm/Cold classification. If C-1 is not ready, fall back to IMMEDIATE > all-other ordering with equal priority within each tier; add full tier-aware ordering as a follow-on when C-1 lands in v0.14.0.
Multi-tenant scheduler isolation subtotal: ~2–3 weeks
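The quota-with-burst idea can be sketched as follows. This is an illustrative model only: the real `compute_per_db_quota()` in `src/scheduler.rs` differs in detail, and the exact interaction of the 150% burst cap with the 80% load threshold is an assumption here.

```rust
/// Hypothetical per-database quota sketch: each database gets an even
/// share of the worker pool, and may burst to 150% of that share while
/// the cluster as a whole is under 80% load.
fn per_db_quota(total_workers: u32, num_dbs: u32, cluster_load: f64) -> u32 {
    if num_dbs == 0 {
        return 0;
    }
    let base = (total_workers / num_dbs).max(1);
    if cluster_load < 0.8 {
        // Other databases are leaving capacity on the table: allow burst.
        ((base as f64 * 1.5).floor() as u32).min(total_workers)
    } else {
        base
    }
}

fn main() {
    // 8 workers across 4 databases: base quota 2, burst quota 3.
    assert_eq!(per_db_quota(8, 4, 0.9), 2);
    assert_eq!(per_db_quota(8, 4, 0.5), 3);
}
```

The point of the burst is that quotas cap starvation, not utilization: idle capacity still gets used.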
TPC-H Benchmark Harness (Phase 10)
In plain terms: The existing TPC-H correctness suite (22/22 queries passing) has no timing infrastructure. Phase 10 adds benchmark mode so we can measure FULL vs DIFFERENTIAL speedups across all 22 queries — the only way to validate that A-2, D-4, and other v0.13.0 changes actually help on realistic analytical workloads, and to catch per-query regressions at larger scale factors.
| Item | Description | Effort | Ref |
|---|---|---|---|
| TPCH-1 | TPCH_BENCH=1 benchmark mode for Phase 3. Instrument test_tpch_full_vs_differential with warm-up cycles (WARMUP_CYCLES=2), reuse extract_last_profile() for [PGS_PROFILE] extraction, emit [TPCH_BENCH] structured output per cycle (query=q01 tier=2 cycle=1 mode=DIFF ms=12.7 decision=0.41 merge=11.3 …). Add print_tpch_summary() with per-query FULL/DIFF median, speedup, P95, and MERGE% table. | 4–5h | PLAN_TPC_H_BENCHMARKING.md §3 |
| TPCH-2 | just bench-tpch / bench-tpch-large / bench-tpch-fast justfile targets. bench-tpch: SF-0.01 with TPCH_BENCH=1; bench-tpch-large: SF-0.1 with 5 cycles; bench-tpch-fast: skip Docker image rebuild. Enables before/after measurement for every v0.13.0 optimization. | 15 min | PLAN_TPC_H_BENCHMARKING.md §3 |
| TPCH-3 | TPC-H OpTree Criterion micro-benchmarks. Add composite OpTree benchmarks to benches/diff_operators.rs representing TPC-H query shapes (diff_tpch_q01, diff_tpch_q05, diff_tpch_q08, diff_tpch_q18, diff_tpch_q21). Measures pure-Rust delta SQL generation time for complex multi-join/semi-join trees; catches DVM engine regressions without a running database. | 4h | PLAN_TPC_H_BENCHMARKING.md §4 |
TPC-H benchmark harness subtotal: ~1 day
SQL Coverage Cleanup (Phase 11)
In plain terms: Three small SQL expression gaps that are unscheduled anywhere. Two are PG 16+ standard SQL syntax currently rejected with errors; one is an audit-gated correctness check for recursive CTEs with non-monotone operators. All are low-effort items that round out DVM coverage without adding scope risk.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SQL-RECUR | Recursive CTE non-monotone divergence audit. Write an E2E test for a recursive CTE with EXCEPT or aggregation in the recursive term (WITH RECURSIVE … SELECT … EXCEPT SELECT …). If the test passes → downgrade G1.3 to P4 (verified correct, no code change). If it fails → add a guard in diff_recursive_cte that detects non-monotone recursive terms and rejects them with ERROR: non-monotone recursive CTEs are not supported in DIFFERENTIAL mode — use FULL. | 6–8h | GAP_SQL_PHASE_7.md §G1.3 |
| SQL-PG16-1 | IS JSON predicate support (PG 16+). expr IS JSON, expr IS JSON OBJECT, expr IS JSON ARRAY, expr IS JSON SCALAR, expr IS JSON WITH UNIQUE KEYS — standard SQL/JSON predicates rejected today. Add a T_JsonIsPredicate arm in parser.rs; the predicate is treated opaquely (no delta decomposition); it passes through to the delta SQL unchanged where the PG executor evaluates it natively. | 2–3h | GAP_SQL_PHASE_6.md §G1.4 |
| SQL-PG16-2 | SQL/JSON constructor support (PG 16+). JSON_OBJECT(…), JSON_ARRAY(…), JSON_OBJECTAGG(…), JSON_ARRAYAGG(…) — standard SQL/JSON constructors (T_JsonConstructorExpr) currently rejected. Add opaque pass-through in parser.rs; treat as scalar expressions (no incremental maintenance of the JSON value itself); handle the aggregate variants the same way as other custom aggregates (full group rescan). | 4–6h | GAP_SQL_PHASE_6.md §G1.5 |
SQL coverage cleanup subtotal: ~1–2 days
DVM Engine Improvements
In plain terms: The delta SQL generated for deep multi-table joins (e.g., TPC-H Q05/Q09 with 6 joined tables) computes identical pre-change snapshots redundantly at every reference site, spilling multi-GB temporary files that exceed `temp_file_limit`. Nested semi-joins (Q20) exhibit an O(n²) blowup from fully materializing the right-side pre-change state. These improvements target the intermediate data volume directly in the delta SQL generator, with TPC-H 22/22 DIFFERENTIAL correctness as the measurable gate.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DI-1 | Named CTE L₀ snapshots. Emit per-leaf pre-change snapshots as named CTEs (NOT MATERIALIZED default; MATERIALIZED when reference count ≥ 3); deduplicate 3–10× redundant EXCEPT ALL evaluations per leaf. Targets Q05/Q09 temp spill root cause. | 2–3d | PLAN_DVM_IMPROVEMENTS.md §DI-1 |
| DI-2 | Pre-image read from change buffer + aggregate UPDATE-split. Replace per-leaf EXCEPT ALL with a NOT EXISTS anti-join on pk_hash + direct old_* read. Per-leaf conditional fallback to EXCEPT ALL when delta exceeds max_delta_fraction for that leaf. Includes aggregate UPDATE-split: the 'D' side of SUM(CASE WHEN …) evaluates using old_* column values, superseding DI-8’s band-aid. | 3.5–5.5d | PLAN_DVM_IMPROVEMENTS.md §DI-2 |
| DI-3 | Group-key filtered aggregate old rescan. Restrict non-algebraic aggregate EXCEPT ALL rescans to affected groups via EXISTS (… IS NOT DISTINCT FROM …) filter. NULL-safe. Independent quick win. | 0.5–1d | PLAN_DVM_IMPROVEMENTS.md §DI-3 |
| DI-6 | Lazy semi-join R_old materialization. Skip EXCEPT ALL for unchanged semi-join right children; push down equi-join key as a filter when R_old is needed. Eliminates Q20-type O(n²) blowup. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-6 |
| DI-4 | Shared R₀ CTE cache. Cache pre-change snapshot SQL by OpTree node identity to avoid regenerating duplicate inline subqueries for shared subtrees. Depends on DI-1. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-4 |
| DI-5 | Part 3 correction consolidation. Consolidate per-node Part 3 correction CTEs for linear inner-join chains into a single term. | 2–3d | PLAN_DVM_IMPROVEMENTS.md §DI-5 |
| DI-7 | Scan-count-aware strategy selector. max_differential_joins and max_delta_fraction per-stream-table options; auto-fallback to FULL refresh when join count or delta-rate threshold is exceeded. Complements DI-2's per-leaf fallback with a coarser per-ST guard at scheduler decision time. | 1–2d | PLAN_DVM_IMPROVEMENTS.md §DI-7 |
| DI-8 | SUM(CASE WHEN …) algebraic drift fix. Detect Expr::Raw("CASE …") in is_algebraically_invertible() and fall back to GROUP_RESCAN. Q14 is unaffected (parsed as ComplexExpression, already GROUP_RESCAN). Correctness band-aid superseded by DI-2’s aggregate UPDATE-split. | ~0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-8 |
| DI-9 | Scheduler skips IMMEDIATE-mode tables. Raise scheduler_interval_ms GUC cap to 600,000 ms; return early from refresh-due check for refresh_mode = IMMEDIATE (verified safe: IMMEDIATE drains TABLE-source buffers synchronously; downstream CALCULATED tables detected via has_stream_table_source_changes() independently). | 0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-9 |
| DI-10 | SF=1 benchmark validation gate. Add bench-tpch-sf1 justfile target (TPCH_SF=1 TPCH_BENCH=1). Gate v0.13.0 release on 22/22 queries at SF=1. CI: manual dispatch only (60–180 min runtime, 4h timeout). | ~0.5d | PLAN_DVM_IMPROVEMENTS.md §DI-10 |
| DI-11 | Predicate pushdown + deep-join L₀ threshold + planner hints. (a) Enable push_filter_into_cross_joins() with scalar-subquery guard. (b) Deep-join L₀ threshold (4+ scans): skip L₀ reconstruction, use L₁ + Part 3 correction. (c) Deep-join planner hints (5+ scans): disable nestloop, raise work_mem, override temp_file_limit. Result: 22/22 TPC-H DIFFERENTIAL. | ~1d | — |
DI-2 promoted from v1.x: CDC `old_*` column capture was completed as part of the typed-column CDC rewrite (already in production). DI-2 scope includes both the join-level pre-image capture (`NOT EXISTS` anti-join) and an aggregate UPDATE-split that uses `old_*` values for the 'D' side of SUM(CASE WHEN …), superseding DI-8's GROUP_RESCAN band-aid.
Implementation order: DI-8 → DI-9 → DI-1 → DI-3 → DI-2 → DI-6 → DI-4 → DI-5 → DI-7 → DI-10 → DI-11
DVM improvements subtotal: ~2–3 weeks (DI-8/DI-9 are small independent fixes; DI-1–DI-7 are the core engine work; DI-10 is a validation run; DI-11 is predicate pushdown + deep-join optimization)
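DI-1's materialization rule can be sketched as a small helper (illustrative names, not the actual generator code): each pre-change leaf snapshot becomes a named CTE, and it is pinned with `MATERIALIZED` only when referenced often enough that computing it once clearly beats inlining.

```rust
/// Illustrative DI-1 sketch: emit a named CTE for a pre-change leaf
/// snapshot, choosing the materialization hint from its reference count
/// (NOT MATERIALIZED by default; MATERIALIZED when referenced >= 3 times).
fn snapshot_cte(name: &str, body: &str, ref_count: usize) -> String {
    let hint = if ref_count >= 3 { "MATERIALIZED" } else { "NOT MATERIALIZED" };
    format!("{} AS {} ({})", name, hint, body)
}

fn main() {
    // A leaf referenced at 4 sites: compute once, reuse.
    let hot = snapshot_cte("lineitem_l0", "SELECT * FROM lineitem_pre", 4);
    assert!(hot.contains("AS MATERIALIZED"));
    // A leaf referenced once: let the planner inline it.
    let cold = snapshot_cte("orders_l0", "SELECT * FROM orders_pre", 1);
    assert!(cold.contains("AS NOT MATERIALIZED"));
}
```

The threshold of 3 mirrors the table above; the dedup win comes from replacing 3–10 identical `EXCEPT ALL` evaluations with one CTE scan.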
Regression-Free Testing Initiative (Q2 2026)
Addresses 9 structural weaknesses identified in the regression risk analysis. Target: reduce regression escape rate from ~15% to <5%.
| Phase | Item | Status |
|---|---|---|
| P1 | Test infrastructure hardening: #[must_use] on poll helpers; wait_for_condition with exponential backoff; assert_column_types_match | ✅ Done (2026-03-28) |
| P2 | Join multi-cycle correctness: 7 tests — LEFT/RIGHT/FULL join, join-key update, both-sides DML, 4-table chain, NULL key | ✅ Done (2026-03-28) |
| P3 | Differential ≡ Full equivalence: 11 tests covering every major DVM operator class; effective_refresh_mode guard | ✅ Done (2026-03-28) |
| P4 | DVM operator execution: LATERAL MAX subquery multi-cycle (5 cycles) + recursive CTE org hierarchy multi-cycle (5 cycles) | ✅ Done (2026-03-28) |
| P5 | Failure recovery & schema evolution: 6 failure recovery tests (FR-1..6 in e2e_failure_recovery_tests.rs) + 5 schema evolution tests (SE-1..5 in e2e_ddl_event_tests.rs) | ✅ Done (2026-03-28) |
| P6 | MERGE template unit tests: 8 pure-Rust tests — determine_refresh_action (×5) + build_is_distinct_clause boundary (×3) in src/refresh.rs | ✅ Done (2026-03-28) |
v0.13.0 total: ~15–23 weeks (Scalability: 6–8w, Partitioning: 5–8w, MERGE Profiling: 1–3w, dbt: 2–3.5d, Multi-tenant: 2–3w, TPC-H harness: ~1d, SQL cleanup: ~1–2d, DVM improvements: ~2–3w)
Exit criteria:
- A-2: Columnar change tracking bitmask skips irrelevant rows; key column classification ✅, `__pgt_key_changed` annotation ✅, P5 value-only fast path ✅, `DiffResult.has_key_changed` signal propagation ✅, MERGE value-only UPDATE optimization ✅, upgrade script ✅ — ✅ Done
- D-4: Shared buffer serves multiple STs via per-source `changes_{oid}` naming; `pgt_change_tracking.tracked_by_pgt_ids` reference counting; `shared_buffer_stats()` observability; property-based test with 5–10 consumers (3 properties, 500 cases); 5 E2E fan-out tests ✅ Done
- PERF-2: `buffer_partitioning = 'auto'` activates RANGE(lsn) partitioned mode for high-throughput sources — throughput-based `should_promote_inner()` heuristic, `convert_buffer_to_partitioned()` runtime migration, 10 unit tests + 3 E2E tests, `docs/CONFIGURATION.md` updated ✅ Done
- A1-1b: Multi-column RANGE partition keys work end-to-end; composite ROW() predicate triggers partition pruning; 3 E2E tests + 5 unit tests ✅ Done
- A1-1c: `alter_stream_table(partition_by => …)` repartitions existing storage table without data loss; add/change/remove tested
- A1-1d: LIST partitioning creates `PARTITION BY LIST` storage; IN-list predicate injected; single-column-only validated; 4 E2E tests pass
- A1-3b: HASH partitioning uses per-partition MERGE loop; auto-creates N child partitions; `satisfies_hash_partition()` filter; 22 unit tests + 6 E2E tests ✅ Done
- PART-WARN: `WARNING` emitted when default partition has rows after refresh; `warn_default_partition_growth()` on both FULL and DIFFERENTIAL paths ✅ Done
- G14-MDED: Deduplication frequency profiling complete; `TOTAL_DIFF_REFRESHES` + `DEDUP_NEEDED_REFRESHES` shared-memory atomic counters; `pgtrickle.dedup_stats()` reports ratio; RFC threshold documented at ≥10% ✅ Done
- PROF-DLT: `pgtrickle.explain_delta(st_name, format)` function captures delta query plans in text/json/xml/yaml; `PGS_PROFILE_DELTA=1` auto-capture to `/tmp/delta_plans/`; documented in SQL_REFERENCE.md ✅ Done
- C-3: Per-database worker quota enforced; tier-aware priority sort (IMMEDIATE > Hot > Warm > Cold) implemented; GUC + E2E quota tests added; `compute_per_db_quota()` with burst at 80% cluster load ✅ Done
- TPCH-1/2: `TPCH_BENCH=1` mode emits `[TPCH_BENCH]` lines + summary table; `just bench-tpch` and `bench-tpch-large` targets functional ✅ Done
- TPCH-3: Five TPC-H OpTree Criterion benchmarks pass and run without a PostgreSQL backend ✅ Done
- DBT-1/2/3: `partition_by`, `fuse`, `fuse_ceiling`, `fuse_sensitivity` exposed in dbt macros; change detection wired; integration tests added; README and SQL_REFERENCE.md updated ✅ Done
- SQL-RECUR: Recursive CTE non-monotone audit complete; G1.3 downgraded to P4 — two Tier 3h E2E tests verify recomputation fallback is correct ✅ Done
- SQL-PG16-1: `IS JSON` predicate accepted in DIFFERENTIAL defining queries; E2E tests in `e2e_expression_tests.rs` confirm correct delta behaviour ✅ Done
- SQL-PG16-2: `JSON_OBJECT`, `JSON_ARRAY`, `JSON_OBJECTAGG`, `JSON_ARRAYAGG` accepted in DIFFERENTIAL defining queries; E2E tests in `e2e_expression_tests.rs` confirm correct delta behaviour ✅ Done
- `scripts/check_upgrade_completeness.sh` passes (all catalog changes in `sql/pg_trickle--0.12.0--0.13.0.sql`) ✅ Done — 58 functions, 8 new columns, all covered
- DI-8: `is_algebraically_invertible()` detects `Expr::Raw("CASE …")` and returns `false` for `SUM(CASE WHEN …)` (Q14 unaffected — `ComplexExpression`); Q12 removed from `DIFFERENTIAL_SKIP_ALLOWLIST`; 4 unit tests ✅ Done
- DI-9: `scheduler_interval_ms` cap raised to 600,000 ms; scheduler skips IMMEDIATE-mode tables in `check_schedule()`; verified safe for CALCULATED dependants ✅ Done
- DI-1: Named CTE L₀ snapshots implemented (`NOT MATERIALIZED` default, `MATERIALIZED` when ref ≥ 3); Q05/Q09 pass DIFFERENTIAL correctness ✅ Done
- DI-2: `NOT EXISTS` anti-join replaces `EXCEPT ALL` in `build_pre_change_snapshot_sql()`; per-leaf conditional `EXCEPT ALL` fallback when delta > `max_delta_fraction`; aggregate UPDATE-split blocked on Q12 drift root cause (DI-8 band-aid retained) ✅ Done
- DI-3: Already implemented — non-algebraic aggregate old rescan filtered via `EXISTS (… IS NOT DISTINCT FROM …)` to affected groups; NULL-safe ✅ Done
- DI-6: Semi-join R_old lazy materialization with key push-down; Q20 DIFF passes at SF=0.01 ✅ Done
- DI-4/5/7: R₀ cache (subset of DI-1), Part 3 threshold raised from 3→5, strategy selector + max_delta_fraction complete ✅ Done
- DI-10: `bench-tpch-sf1` target added; 22/22 queries pass at SF=0.01 (3 cycles, zero drift) ✅ Done
- DI-11: Predicate pushdown enabled with scalar-subquery guard; deep-join L₀ threshold (4 scans); deep-join planner hints (5+ total scans); 22/22 TPC-H DIFFERENTIAL ✅ Done
- Extension upgrade path tested (`0.12.0 → 0.13.0`) ✅ Done
v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics
Status: Released (2026-04-02).
Tiered refresh scheduling, UNLOGGED change buffers, refresh mode diagnostics, error-state circuit breaker, a full-featured TUI dashboard, security hardening (SECURITY DEFINER triggers with explicit search_path), GHCR Docker image, pre-deployment checklist, best-practice patterns guide, and comprehensive E2E test coverage. See CHANGELOG.md for the full feature list.
Completed items (click to expand)
Quick Polish & Error State Circuit Breaker (Phase 1 + 1b) — ✅ Done
- C4: `pg_trickle.planner_aggressive` GUC consolidates `merge_planner_hints` + `merge_work_mem_mb`. Old GUCs deprecated.
- DIAG-2: Creation-time WARNING for group-rescan and low-cardinality algebraic aggregates. `agg_diff_cardinality_threshold` GUC added.
- DOC-OPM: Operator support matrix summary table linked from `SQL_REFERENCE.md`.
- ERR-1: Permanent failures immediately set `ERROR` status with `last_error_message`/`last_error_at`. API calls clear error state. E2E test pending.
Manual Tiered Scheduling (Phase 2 — C-1) — ✅ Done
Tiered scheduling infrastructure was already in place since v0.11/v0.12 (refresh_tier column, RefreshTier enum, ALTER ... SET (tier=...), scheduler multipliers). Phase 2 verified completeness and added:
- C-1b: NOTICE on tier demotion from Hot to Cold/Frozen, alerting operators to the effective interval change.
- C-1c: Scheduler tier-aware multipliers confirmed: Hot ×1, Warm ×2, Cold ×10, Frozen = skip. Gated by `pg_trickle.tiered_scheduling` (default `true` since v0.12.0).
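The C-1c multipliers translate a stream table's base schedule into an effective interval. A minimal sketch, with an illustrative enum rather than the real `RefreshTier` from `src/scheduler.rs`:

```rust
/// Illustrative tier model: Hot x1, Warm x2, Cold x10, Frozen skipped.
#[derive(Clone, Copy)]
enum Tier {
    Hot,
    Warm,
    Cold,
    Frozen,
}

/// Effective refresh interval in seconds, or None when the scheduler
/// skips the table entirely (Frozen tier).
fn effective_interval(base_secs: u64, tier: Tier) -> Option<u64> {
    match tier {
        Tier::Hot => Some(base_secs),
        Tier::Warm => Some(base_secs * 2),
        Tier::Cold => Some(base_secs * 10),
        Tier::Frozen => None, // never refreshed by the scheduler
    }
}

fn main() {
    assert_eq!(effective_interval(30, Tier::Hot), Some(30));
    assert_eq!(effective_interval(30, Tier::Warm), Some(60));
    assert_eq!(effective_interval(30, Tier::Cold), Some(300));
    assert_eq!(effective_interval(30, Tier::Frozen), None);
}
```

A 30-second schedule on a Cold table therefore behaves like a 5-minute one, which is the effective-interval change the C-1b NOTICE alerts operators about.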
UNLOGGED Change Buffers (Phase 3 — D-1) — ✅ Done
- D-1a: `pg_trickle.unlogged_buffers` GUC (default `false`). New change buffer tables created as `UNLOGGED` when enabled, reducing WAL amplification by ~30%.
- D-1b: Crash recovery detection — scheduler detects UNLOGGED buffers emptied by crash (postmaster restart after last refresh) and auto-enqueues FULL refresh.
- D-1c: `pgtrickle.convert_buffers_to_unlogged()` utility function for converting existing logged buffers. Documents lock-window warning.
- D-1e: Documentation in `CONFIGURATION.md` and `SQL_REFERENCE.md`.
Documentation: Best-Practice Patterns Guide (G16-PAT) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| G16-PAT | docs/PATTERNS.md: 6 patterns (Bronze/Silver/Gold, event sourcing, SCD type-1/2, high-fan-out, real-time dashboards, tiered refresh) with SQL examples, anti-patterns, and refresh mode recommendations. | — | ✅ Done |
Patterns guide subtotal: ✅ Done
Long-Running Stability & Multi-Database Testing (G17-SOAK, G17-MDB) — ✅ Done
Soak test validates zero worker crashes, zero ERROR states, and stable RSS under sustained mixed DML. Multi-database test validates catalog isolation, shared-memory independence, and concurrent correctness.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G17-SOAK | tests/e2e_soak_tests.rs with configurable duration, 5 source tables, mixed DML, health checks, RSS monitoring, correctness verification. just test-soak / just test-soak-short. CI job: schedule + manual dispatch. | — | ✅ Done |
| G17-MDB | tests/e2e_mdb_tests.rs with two databases, catalog isolation assertion, concurrent mutation cycles, correctness verification per database. just test-mdb. CI job: schedule + manual dispatch. | — | ✅ Done |
Stability & multi-database testing subtotal: ✅ Done
Container Infrastructure (INFRA-GHCR)
| Item | Description | Effort | Ref |
|---|---|---|---|
| INFRA-GHCR | GHCR Docker image. Dockerfile.ghcr (pinned to postgres:18.3-bookworm) + .github/workflows/ghcr.yml workflow that builds a multi-arch (linux/amd64 + linux/arm64) PostgreSQL 18.3 server image with pg_trickle pre-installed and all sensible GUC defaults baked in. Smoke-tests on amd64 before push. Published to ghcr.io/grove/pg_trickle on every v* tag with immutable (<version>-pg18.3), floating (pg18), and latest tags. Uses GITHUB_TOKEN — no extra secrets. | 4h | — |
Container infrastructure subtotal: ✅ Done
Refresh Mode Diagnostics (DIAG-1) — ✅ Done
Analyzes stream table workload characteristics and recommends the optimal refresh mode. Seven weighted signals (change ratio, empirical timing, query complexity, target size, index coverage, latency variance) produce a composite score with confidence level and human-readable explanation.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | src/diagnostics.rs — pure signal-scoring functions + unit tests | — | ✅ Done |
| — | pgtrickle.recommend_refresh_mode() SQL function | — | ✅ Done |
| — | pgtrickle.refresh_efficiency() function | — | ✅ Done |
The function synthesises 7 weighted signals (historical change ratio 0.30, empirical timing 0.35, current change ratio 0.25, query complexity 0.10, target size 0.10, index coverage 0.05, P95/P50 variance 0.05) into a composite score. Confidence degrades gracefully when history is sparse.
Diagnostics subtotal: ~3.5–7 days
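The weighted-signal synthesis above can be sketched as a dot product over the documented weights. This is an illustrative model: the signal scale, normalization, and confidence handling in `src/diagnostics.rs` are assumptions here, only the weights come from the text.

```rust
/// DIAG-1 sketch: seven signals, each scored in [-1.0, 1.0] where
/// positive favours DIFFERENTIAL, combined by the documented weights.
const WEIGHTS: [(&str, f64); 7] = [
    ("historical_change_ratio", 0.30),
    ("empirical_timing", 0.35),
    ("current_change_ratio", 0.25),
    ("query_complexity", 0.10),
    ("target_size", 0.10),
    ("index_coverage", 0.05),
    ("p95_p50_variance", 0.05),
];

/// Weighted mean of the signals, normalised so the score stays in [-1, 1].
fn composite_score(signals: &[f64; 7]) -> f64 {
    let total: f64 = WEIGHTS.iter().map(|(_, w)| w).sum();
    let dot: f64 = WEIGHTS.iter().zip(signals.iter()).map(|((_, w), s)| w * s).sum();
    dot / total
}

fn main() {
    // All signals strongly favour DIFFERENTIAL: score near +1.
    assert!((composite_score(&[1.0; 7]) - 1.0).abs() < 1e-9);
    // Mixed evidence lands near zero: low confidence either way.
    let mixed = [0.5, -0.5, 0.2, 0.0, -0.2, 0.1, 0.0];
    assert!(composite_score(&mixed).abs() < 0.5);
}
```

Note that the two timing-related signals carry the most weight (0.35 + 0.05), which matches the intuition that measured refresh cost beats any static heuristic.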
Export Definition API (G15-EX) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| G15-EX | export_definition(name TEXT) — export a stream table configuration as reproducible DDL | — | ✅ Done |
G15-EX subtotal: ~1–2 days
TUI Tool (E3-TUI)
In plain terms: A full-featured terminal user interface (TUI) for managing, monitoring, and diagnosing pg_trickle stream tables without touching SQL. Built with ratatui in Rust, it provides a real-time dashboard (think `htop` for stream tables), interactive dependency graph visualization, live refresh log, diagnostics with signal breakdown charts, CDC health monitoring, a GUC configuration editor, and a real-time alert feed — all navigable with keyboard shortcuts and a command palette. It also supports every original CLI command as one-shot subcommands for scripting and CI.
| Item | Description | Effort | Ref |
|---|---|---|---|
| E3-TUI | TUI tool (pgtrickle) for interactive management and monitoring | 8–10d | PLAN_TUI.md |
E3-TUI subtotal: ~8–10 days (T1–T8 implemented: CLI skeleton with 18 subcommands, interactive dashboard with 15 views, watch mode with `--filter`, LISTEN/NOTIFY alerts with JSON parsing, async polling with force-poll, cascade staleness detection, DAG issue detection, sparklines, fuse detail panel, trigger inventory, context-sensitive help, `docs/TUI.md`)
GUC Surface Consolidation (C4)
| Item | Description | Effort | Ref |
|---|---|---|---|
| C4 | Consolidate merge_planner_hints + merge_work_mem_mb into single planner_aggressive boolean. Reduces GUC surface area; existing two GUCs become aliases that emit a deprecation notice. | ~1–2h | PLAN_FEATURE_CLEANUP.md §C4 |
C4 subtotal: ~1–2 hours
Documentation: Pre-Deployment Checklist (DOC-PDC) — ✅ Done
| Item | Description | Effort | Ref |
|---|---|---|---|
| DOC-PDC | docs/PRE_DEPLOYMENT.md: 10-point checklist covering PG version, shared_preload_libraries, WAL configuration, PgBouncer compatibility, recommended GUCs, resource planning, monitoring, validation script. Cross-linked from GETTING_STARTED.md and INSTALL.md. | — | ✅ Done |
DOC-PDC subtotal: ✅ Done
Documentation: Operator Mode Support Matrix Cross-Link (DOC-OPM)
| Item | Description | Effort | Ref |
|---|---|---|---|
| DOC-OPM | Cross-link operator support matrix from SQL_REFERENCE.md. The 60+ operator × FULL/DIFFERENTIAL/IMMEDIATE matrix in DVM_OPERATORS.md is not discoverable from the page users actually read. Add a summary table and prominent link in SQL_REFERENCE.md §Supported SQL Constructs. | ~2–4h | docs/DVM_OPERATORS.md · docs/SQL_REFERENCE.md |
DOC-OPM subtotal: ~2–4 hours
Aggregate Mode Warning at Creation Time (DIAG-2)
In plain terms: Queries with very few distinct GROUP BY groups (e.g. 5 regions from 100K rows) are always faster with FULL refresh — differential overhead exceeds the cost of re-aggregating a tiny result set. Today users discover this only after benchmarking. A creation-time WARNING with an explicit recommendation prevents the surprise. The classification logic is already present in the DVM parser (aggregate strategy classification from `is_algebraically_invertible`, `is_group_rescan`); this item exposes it at the SQL boundary.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DIAG-2 | Aggregate mode warning at create_stream_table time. After parsing the defining query, inspect the top-level operator: if it is an Aggregate node containing non-algebraic (group-rescan) functions such as MIN, MAX, STRING_AGG, ARRAY_AGG, BOOL_AND/OR, emit a WARNING recommending refresh_mode='full' or 'auto' and citing the group-rescan cost. For algebraic aggregates (SUM/COUNT/AVG), emit the warning only when the estimated group cardinality (from pg_stats.n_distinct on the GROUP BY columns) is below pg_trickle.agg_diff_cardinality_threshold (default: 1000 distinct groups), since below this threshold FULL is reliably faster. No behavior change — warning only. | ~2–4h | plans/performance/REPORT_OVERALL_STATUS.md §12.3 |
DIAG-2 subtotal: ~2–4 hours
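The DIAG-2 decision rule described in the table can be sketched as a two-branch check (illustrative types, not pg_trickle's parser types): group-rescan aggregates always warn; algebraic aggregates warn only below the cardinality threshold.

```rust
/// Illustrative DIAG-2 sketch of the warn/no-warn decision.
enum AggClass {
    /// SUM/COUNT/AVG: incrementally maintainable.
    Algebraic,
    /// MIN/MAX/STRING_AGG/ARRAY_AGG/BOOL_AND/OR: needs group rescan.
    GroupRescan,
}

fn should_warn(class: AggClass, est_groups: f64, threshold: f64) -> bool {
    match class {
        AggClass::GroupRescan => true,
        AggClass::Algebraic => est_groups < threshold,
    }
}

fn main() {
    // STRING_AGG-style aggregate: always warn about group-rescan cost.
    assert!(should_warn(AggClass::GroupRescan, 1_000_000.0, 1000.0));
    // SUM over 5 regions: tiny result set, FULL is faster, so warn.
    assert!(should_warn(AggClass::Algebraic, 5.0, 1000.0));
    // SUM over 50k groups: differential pays off, no warning.
    assert!(!should_warn(AggClass::Algebraic, 50_000.0, 1000.0));
}
```

In the real extension `est_groups` would come from `pg_stats.n_distinct` on the GROUP BY columns and `threshold` from `pg_trickle.agg_diff_cardinality_threshold`.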
DIFFERENTIAL Refresh for Manual ST-on-ST Path (FIX-STST-DIFF)
Background: When a stream table reads from another stream table (`calculated` schedule), the scheduler propagates changes via a per-ST change buffer (`pgtrickle_changes.changes_pgt_{id}`) and performs a true DIFFERENTIAL DVM refresh against that buffer. The manual `pgtrickle.refresh_stream_table()` path does not: it currently falls back to an unconditional `TRUNCATE + INSERT` (FULL refresh) for every call. This was introduced as a correctness fix in v0.13.0 (PR #371) to close a scheduler race where the previous no-op guard could leave stale data in place. The FULL fallback is correct but inefficient — it pays a full table scan of all upstream STs even when only a small delta is present.
What needs to happen: Wire `execute_manual_differential_refresh` to use the same `changes_pgt_` change buffers the scheduler already writes. When a manual refresh is requested for a `calculated` ST that has a stored frontier, check each upstream ST's change buffer for rows with `lsn > frontier.get_st_lsn(upstream_pgt_id)`. If new rows exist, apply the DVM delta SQL (same as `execute_differential_refresh`). If no rows exist beyond the frontier, return a true no-op. This also fixes the pre-existing `test_st_on_st_uses_differential_not_full` E2E failure.
| Item | Description | Effort | Ref |
|---|---|---|---|
| FIX-STST-DIFF | In execute_manual_differential_refresh (src/api.rs), replace the unconditional FULL fallback for has_st_source with a proper change-buffer delta path: read rows from changes_pgt_{upstream_pgt_id} beyond the stored frontier LSN, run DVM differential SQL, advance the frontier. Matches the scheduler path exactly. Fixes test_st_on_st_uses_differential_not_full. | — | ✅ Done |
FIX-STST-DIFF subtotal: ~1–2 days
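The frontier check at the heart of FIX-STST-DIFF can be sketched as follows (illustrative names; the real code reads LSNs from the change buffer tables rather than taking them as arguments):

```rust
/// Sketch of the manual ST-on-ST refresh decision: compare each upstream
/// buffer's max buffered LSN against the stored frontier. Rows past the
/// frontier on any upstream mean a differential refresh is needed;
/// otherwise the call is a true no-op.
#[derive(Debug, PartialEq)]
enum RefreshAction {
    NoOp,
    Differential,
}

/// Each tuple is (stored frontier LSN, max LSN present in the buffer).
fn manual_refresh_action(upstreams: &[(u64, u64)]) -> RefreshAction {
    if upstreams.iter().any(|(frontier, max_lsn)| max_lsn > frontier) {
        RefreshAction::Differential
    } else {
        RefreshAction::NoOp
    }
}

fn main() {
    // Both upstreams fully consumed: no-op, no table scan at all.
    assert_eq!(manual_refresh_action(&[(100, 100), (250, 250)]), RefreshAction::NoOp);
    // One upstream has unconsumed rows: apply the DVM delta.
    assert_eq!(manual_refresh_action(&[(100, 180), (250, 250)]), RefreshAction::Differential);
}
```

This is exactly the property the `test_st_on_st_uses_differential_not_full` E2E test asserts: a manual refresh must not degrade to FULL when the buffers say a small delta suffices.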
v0.14.0 total: ~2–6 weeks + ~1wk patterns guide + ~2–4 days stability tests + ~3.5–7 days diagnostics + ~1–2d export API + ~8–10d TUI + ~0.5d docs + ~2–4h aggregate warning + ~1–2d ST-on-ST diff manual path
Exit criteria:
- C-1: Tier classification with manual assignment; Cold STs skip refresh correctly; E2E tested ✅ Done
- D-1: UNLOGGED change buffers opt-in (`unlogged_buffers = false` by default); crash-recovery FULL-refresh path tested; E2E tested ✅ Done
- G16-PAT: Patterns guide published in `docs/PATTERNS.md` covering 6 patterns ✅ Done
- G17-SOAK: Soak test passes with zero worker crashes, zero zombie stream tables, stable memory ✅ Done
- G17-MDB: Multi-database scheduler isolation verified ✅ Done
- DIAG-1: `recommend_refresh_mode()` + `refresh_efficiency()` implemented with 7 signals; E2E tested; tutorial published ✅ Done
- DIAG-2: WARNING emitted at creation time for group-rescan and low-cardinality aggregates; threshold configurable ✅ Done
- G15-EX: `export_definition(name TEXT)` returns valid reproducible DDL; round-trip tested ✅ Done
- E3-TUI: `pgtrickle` TUI binary builds as workspace member; one-shot CLI commands functional with `--format json`; interactive dashboard launches with no subcommand; 15 views with cascade staleness, issue detection, sparklines, force-poll, NOTIFY, and context-sensitive help; documented in `docs/TUI.md` ✅ Done
- C4: `merge_planner_hints` and `merge_work_mem_mb` consolidated into `planner_aggressive` ✅ Done
- DOC-PDC: Pre-deployment checklist published in `docs/PRE_DEPLOYMENT.md` ✅ Done
- DOC-OPM: Operator mode support matrix summary and link added to SQL_REFERENCE.md ✅ Done
- FIX-STST-DIFF: Manual DIFFERENTIAL refresh for ST-on-ST path ✅ Done
- INFRA-GHCR: `ghcr.io/grove/pg_trickle` multi-arch image builds, smoke-tests, and pushes on `v*` tags ✅ Done
- ERR-1: Error-state circuit breaker with E2E test coverage ✅ Done
- Extension upgrade path tested (`0.13.0 → 0.14.0`) ✅ Done
v0.15.0 — External Test Suites & Integration
Status: Released (2026-04-03). All 20 roadmap items complete.
Goal: Validate correctness against independent query corpora and ship the dbt integration as a formal release.
Completed items (click to expand)
External Test Suite Integration
In plain terms: pg_trickle's own tests were written by the pg_trickle team, which means they can have the same blind spots as the code. This adds validation against three independent public benchmarks: PostgreSQL's own SQL conformance suite (sqllogictest), the Join Order Benchmark (a realistic analytical query workload), and Nexmark (a streaming data benchmark). If pg_trickle produces a different answer than PostgreSQL does on the same query, these external suites will catch it.
Validate correctness against independent query corpora beyond TPC-H.
➡️ TS1 and TS2 pulled forward to v0.11.0. Delivering one of TS1 or TS2 is an exit criterion for 0.11.0. TS3 (Nexmark) remains in 0.15.0. If TS1/TS2 slip from 0.11.0, they land here.
| Item | Description | Effort | Ref |
|---|---|---|---|
| TS1 | sqllogictest: PostgreSQL's SQL conformance suite run against stream tables | 2–3d | PLAN_TESTING_GAPS.md §J |
| TS2 | Join Order Benchmark (JOB): realistic analytical query workload | 1–2d | PLAN_TESTING_GAPS.md §J |
| TS3 | Nexmark streaming benchmark: sustained high-frequency DML correctness | 1–2d | PLAN_TESTING_GAPS.md §J |
External test suites subtotal: ~1–2 days (TS3 only; TS1/TS2 in v0.11.0) — ✅ TS3 complete
Documentation Review
In plain terms: A full documentation review polishes everything so the product is ready to be announced to the wider PostgreSQL community.
| Item | Description | Effort | Ref |
|---|---|---|---|
| I2 | Complete documentation review & polish | 4–6h | docs/ |
Documentation subtotal: ✅ Done
Bulk Create API (G15-BC)
| Item | Description | Effort | Ref |
|---|---|---|---|
| G15-BC | bulk_create(definitions JSONB) — create multiple stream tables and their CDC triggers in a single transaction. Useful for dbt/CI pipelines that manage many STs programmatically. | ~2–3d | plans/performance/REPORT_OVERALL_STATUS.md §15 |
G15-BC subtotal: ✅ Completed
Parser Modularization (G13-PRF) — ✅ Done
In plain terms: At ~21,000 lines, `parser.rs` was too large to maintain safely. Split into 5 sub-modules by concern — zero behavior change.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G13-PRF | Split `src/dvm/parser.rs` into `mod.rs`, `types.rs`, `validation.rs`, `rewrites.rs`, `sublinks.rs`. Added `// SAFETY:` comments to all ~750 unsafe blocks (~676 newly documented). | ~3–4wk | plans/performance/REPORT_OVERALL_STATUS.md §13 |
G13-PRF subtotal: ✅ Completed
Watermark Hold-Back Mode (WM-7) — ✅ Done
In plain terms: The watermark gating system (shipped in v0.7.0) lets ETL producers signal their progress. Hold-back mode adds stuck detection: when a watermark is not advanced within a configurable timeout, downstream stream tables are paused and operators are notified.
| Item | Description | Effort | Ref |
|---|---|---|---|
| WM-7 | Watermark hold-back mode. watermark_holdback_timeout GUC detects stuck watermarks; pauses downstream gated STs; emits pgtrickle_alert NOTIFY with watermark_stuck event; auto-resumes with watermark_resumed event when watermark advances. | ✅ Done | PLAN_WATERMARK_GATING.md §4.1 |
WM-7 subtotal: ✅ Done
Delta Cost Estimation (PH-E1) — ✅ Done
In plain terms: Before executing the MERGE, runs a capped COUNT on the delta subquery to estimate output cardinality. If the count exceeds `pg_trickle.max_delta_estimate_rows`, emits a NOTICE and falls back to FULL refresh to prevent OOM or excessive temp-file spills.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PH-E1 | Delta cost estimation. Capped SELECT count(*) FROM (delta LIMIT N+1) before MERGE execution. max_delta_estimate_rows GUC (default: 0 = disabled). Falls back to FULL + NOTICE when exceeded. | — | PLAN_PERFORMANCE_PART_9.md §Phase E |
PH-E1 subtotal: ✅ Complete
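The "capped" part of PH-E1 is the key trick: a bare `count(*)` would scan the whole delta, but wrapping it in a `LIMIT N+1` bounds the work at N+1 rows no matter how large the delta is. A sketch, with `delta_query` as a placeholder for the generated delta SQL and N = 1,000,000 standing in for the GUC value:

```sql
-- Capped cardinality estimate: counting stops after N+1 rows, so even a
-- pathological delta costs at most N+1 rows of scan work.
SELECT count(*) > 1000000 AS too_big
FROM (SELECT 1 FROM delta_query LIMIT 1000001) capped;
-- too_big → emit NOTICE and downgrade this cycle to FULL refresh
```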
dbt Hub Publication (I3) — ✅ Done
In plain terms: `dbt-pgtrickle` is now prepared for dbt Hub publication. The `dbt_project.yml` is version-synced (0.15.0), the README documents both git and Hub install methods, and a submission guide documents the hubcap PR process. Actual Hub listing requires creating a standalone `grove/dbt-pgtrickle` repository and submitting a PR to `dbt-labs/hubcap`.
| Item | Description | Effort | Ref |
|---|---|---|---|
| I3 | Prepared dbt-pgtrickle for dbt Hub publication. Version synced to 0.15.0, README updated with Hub install snippet, submission guide written. Hub listing pending separate repo creation + hubcap PR. | 2–4h | dbt-pgtrickle/ · docs/integrations/dbt-hub-submission.md |
I3 subtotal: ~2–4 hours — ✅ Complete
Hash-Join Planner Hints (PH-D2) — ✅ Done
In plain terms: Added a `pg_trickle.merge_join_strategy` GUC that lets operators manually override the join strategy used during MERGE. Values: `auto` (default heuristic), `hash_join`, `nested_loop`, `merge_join`. The existing delta-size heuristics remain the default (`auto`).
| Item | Description | Effort | Ref |
|---|---|---|---|
| PH-D2 | Hash-join planner hints. Added merge_join_strategy GUC with manual override for join strategy during MERGE. auto preserves existing delta-size heuristics; hash_join/nested_loop/merge_join force specific strategies. | 3–5d | PLAN_PERFORMANCE_PART_9.md §Phase D |
PH-D2 subtotal: ~3–5 days — ✅ Complete
Shared-Memory Template Cache Research Spike (G14-SHC-SPIKE)
In plain terms: Every new database connection that triggers a refresh pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With PgBouncer in transaction mode, this happens on every refresh cycle. This milestone scopes a research spike only: write an RFC, build a prototype, measure whether DSM-based caching eliminates the cold-start. Full implementation stays in v0.16.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G14-SHC-SPIKE | Shared-memory template cache research spike. Write an RFC for DSM + lwlock-based MERGE SQL template caching. Build a prototype benchmark to validate cold-start elimination. Full implementation deferred to v0.16.0. | 2–3d | plans/performance/REPORT_OVERALL_STATUS.md §14 |
G14-SHC-SPIKE subtotal: ~2–3 days — ✅ RFC complete (plans/performance/RFC_SHARED_TEMPLATE_CACHE.md)
TRUNCATE Capture for Trigger-Mode CDC (TRUNC-1)
In plain terms: WAL-mode CDC detects TRUNCATE on source tables and marks downstream stream tables for reinitialization. But trigger-mode CDC has no TRUNCATE handler — a `TRUNCATE` silently leaves the stream table stale. Adding a statement-level TRUNCATE trigger (TRUNCATE does not fire DDL event triggers) that flags affected STs closes this correctness gap.
| Item | Description | Effort | Ref |
|---|---|---|---|
| TRUNC-1 | Statement-level TRUNCATE trigger sets a `needs_reinit.action='T'` marker; refresh engine detects it and falls back to FULL. | 4–6h | plans/adrs/PLAN_ADRS.md ADR-070 |
TRUNC-1 subtotal: ✅ Completed
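PostgreSQL fires TRUNCATE triggers per statement, not per row, which is what makes this cheap to capture. A minimal sketch of the shape such a trigger could take — the function body, `needs_reinit`, and `pgt_sources` are hypothetical names for illustration, not the extension's actual catalog:

```sql
-- Sketch: flag downstream stream tables when a trigger-mode source is truncated.
CREATE OR REPLACE FUNCTION pgtrickle.on_source_truncate() RETURNS trigger AS $$
BEGIN
  -- mark every ST that reads from the truncated relation for reinitialization
  INSERT INTO pgtrickle.needs_reinit (pgt_id, action)
  SELECT pgt_id, 'T'
  FROM pgtrickle.pgt_sources
  WHERE source_relid = TG_RELID;   -- TG_RELID = OID of the truncated table
  RETURN NULL;                     -- return value ignored for AFTER triggers
END $$ LANGUAGE plpgsql;

CREATE TRIGGER pgt_truncate_capture
  AFTER TRUNCATE ON orders
  FOR EACH STATEMENT EXECUTE FUNCTION pgtrickle.on_source_truncate();
```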
Volatile Function Policy GUC (VOL-1)
In plain terms: Volatile functions (`random()`, `clock_timestamp()`, etc.) are correctly rejected at stream table creation time in DIFFERENTIAL and IMMEDIATE modes. But there's no way for users to override this — some want volatile functions in FULL mode. Adding a `volatile_function_policy` GUC with `reject`/`warn`/`allow` modes gives operators control.
| Item | Description | Effort | Ref |
|---|---|---|---|
| VOL-1 | pg_trickle.volatile_function_policy GUC. Add a GUC with values reject (default), warn, allow to control volatile function handling. reject preserves current behavior; warn emits WARNING but allows creation; allow silently permits (user accepts correctness risk). | 3–5h | plans/sql/PLAN_NON_DETERMINISM.md |
VOL-1 subtotal: ✅ Completed
Spill-Aware Refresh (PH-E2)
In plain terms: After PH-E1 adds pre-flight cost estimation, PH-E2 adds post-flight monitoring: track `temp_bytes` from `pg_stat_statements` after each refresh cycle and auto-adjust if spill is excessive.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PH-E2 | Track `temp_bytes` from `pg_stat_statements` after each refresh cycle. If spill exceeds the threshold 3 consecutive times, automatically increase the per-ST `work_mem` override or switch to FULL. Expose in `explain_st()` as `spill_history`. | 1–2 wk | PLAN_PERFORMANCE_PART_9.md §Phase E |
PH-E2 subtotal: ✅ Completed
ORM Integration Guides (E5)
In plain terms: Documentation showing how popular ORMs (SQLAlchemy, Django, etc.) interact with stream tables — model definitions, migrations, and freshness checks. Documentation-only work.
| Item | Description | Effort | Ref |
|---|---|---|---|
| E5 | ORM integrations guide (SQLAlchemy, Django, etc.) | 8–12h | PLAN_ECO_SYSTEM.md §5 |
E5 subtotal: ✅ Done
Flyway / Liquibase Migration Support (E4)
In plain terms: Documentation showing how standard migration frameworks interact with stream tables — CREATE/ALTER/DROP patterns, handling CDC triggers across schema migrations. Documentation-only work.
| Item | Description | Effort | Ref |
|---|---|---|---|
| E4 | Flyway / Liquibase migration support | 8–12h | PLAN_ECO_SYSTEM.md §5 |
E4 subtotal: ✅ Done
JOIN Key Change + DELETE Correctness Fix (EC-01) — ✅ Done (pre-existing)
In plain terms: The phantom-row-after-DELETE bug was fixed in v0.14.0 via the R₀ pre-change snapshot strategy. Part 1 of the JOIN delta is split into 1a (inserts ⋈ R₁) + 1b (deletes ⋈ R₀), ensuring DELETE deltas always find the old join partner. The fix was extended to all join depths via the EC-01B-1 per-leaf CTE strategy, and regression tests (EC-01B-2) cover TPC-H Q07, Q08, Q09.
| Item | Description | Effort | Ref |
|---|---|---|---|
| EC-01 | R₀ pre-change snapshot for JOIN key change + DELETE. Part 1 split into 1a (inserts ⋈ R₁) + 1b (deletes ⋈ R₀). Applied to INNER/LEFT/FULL JOIN. Closes G1.1. | — | GAP_SQL_PHASE_7.md §G1.1 |
EC-01 subtotal: ✅ Complete (implemented in v0.14.0)
Multi-Level ST-on-ST Testing (STST-3)
In plain terms: FIX-STST-DIFF (v0.14.0) fixed 2-level stream-table-on-stream-table DIFFERENTIAL refresh. Some 3-level cascade tests exist, but systematic coverage for 3+ level chains — including mixed refresh modes, concurrent DML at multiple levels, and DELETE/UPDATE propagation through deep chains — is missing. This adds a dedicated test matrix to prevent regressions as cascade depth increases.
| Item | Description | Effort | Ref |
|---|---|---|---|
| STST-3 | Multi-level ST-on-ST test matrix (3+ levels). Systematic coverage: 3-level and 4-level chains, INSERT/UPDATE/DELETE propagation, mixed DIFFERENTIAL/FULL modes, concurrent DML at multiple levels, correctness comparison against materialized-view baseline. | 3–5d | e2e_cascade_regression_tests.rs |
STST-3 subtotal: ✅ Done
Circular Dependencies + IMMEDIATE Mode (CIRC-IMM)
In plain terms: Circular dependencies are rejected at creation time (EC-30), but the interaction between near-circular topologies (e.g. diamond dependencies with IMMEDIATE triggers on both sides) and IMMEDIATE mode is untested territory. This adds targeted testing and, if needed, hardening to ensure IMMEDIATE mode doesn't deadlock or produce incorrect results on complex dependency graphs. Conditional P1 — can slip to v0.16.0 if no issues surface during other IMMEDIATE-mode work.
| Item | Description | Effort | Ref |
|---|---|---|---|
| CIRC-IMM | Circular-dependency + IMMEDIATE mode hardening. Test: diamond deps with IMMEDIATE triggers, near-circular topologies, lock ordering under concurrent DML. Add deadlock detection / timeout guard if issues found. | 3–5d | PLAN_EDGE_CASES.md §EC-30 · PLAN_CIRCULAR_REFERENCES.md |
CIRC-IMM subtotal: ✅ Done
Cross-Session MERGE Cache Staleness Fix (G8.1)
In plain terms: When session A alters a stream table's defining query, session B's cached MERGE SQL template remains stale until B encounters a refresh error or reconnects. Adding a catalog version counter that is bumped on every ALTER QUERY and checked before each refresh closes this race window.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G8.1 | Add a `catalog_version` counter to `pgt_stream_tables`; bump on ALTER QUERY / DROP / reinit. Before each refresh, compare the cached version to the catalog; regenerate the template on mismatch. Shipped as a `CACHE_GENERATION` counter + `defining_query_hash`, which provides cross-session + per-ST invalidation without a schema change. | 4–6h | — |
G8.1 subtotal: ✅ Completed
explain_st() Enhancements (EXPL-ENH) — ✅ Done
In plain terms: Small quality-of-life improvements to the diagnostic function: refresh timing statistics, partition source info, and a dependency-graph visualization snippet in DOT format.
| Item | Description | Effort | Ref |
|---|---|---|---|
| EXPL-ENH | explain_st() enhancements. Added: (a) refresh timing stats (min/max/avg/latest duration from last 20 refreshes), (b) source partition info for partitioned tables, (c) dependency sub-graph visualization in DOT format. | 4–8h | PLAN_FEATURE_CLEANUP.md |
EXPL-ENH subtotal: ~4–8 hours — ✅ Complete
CNPG Operator Hardening (R4)
In plain terms: Kubernetes-native improvements for the CloudNativePG integration: adopt K8s 1.33+ native ImageVolume (replacing the init-container workaround), add liveness/readiness probe integration for pg_trickle health, and test failover behavior with stream tables.
| Item | Description | Effort | Ref |
|---|---|---|---|
| R4 | CNPG operator hardening. Adopt K8s 1.33+ native ImageVolume, add pg_trickle health to CNPG liveness/readiness probes, test primary→replica failover with active stream tables. | 4–6h | PLAN_CLOUDNATIVEPG.md |
R4 subtotal: ~4–6 hours — ✅ Complete
v0.15.0 total: ~52–90h + ~2–3d bulk create + ~3–5d planner hints + ~2–3d cache spike + ~3–4wk parser + ~1–2wk watermark + ~2–4wk delta cost/spill + ~2–3d EC-01 + ~3–5d ST-on-ST + ~3–5d CIRC-IMM
Exit criteria:
- At least one external test corpus (sqllogictest, JOB, or Nexmark) passes
- Complete documentation review done
- G15-BC: `pgtrickle.bulk_create(definitions JSONB)` creates all STs and CDC triggers atomically; tested with 10+ definitions in a single call
- G13-PRF: `parser.rs` split into 5 sub-modules; all ~750 `unsafe` blocks have `// SAFETY:` comments; zero behavior change; all existing tests pass
- WM-7: Stuck watermarks detected and downstream STs paused; `watermark_stuck` alert emitted; auto-resume on watermark advance
- PH-E1: Delta cost estimation via capped COUNT on delta subquery; `max_delta_estimate_rows` GUC; FULL downgrade + NOTICE when threshold exceeded
- PH-E2: Spill-aware auto-adjustment triggers after 3 consecutive spills; `spill_info` exposed in `explain_st()`
- PH-D2: `merge_join_strategy` GUC with manual override (`auto`/`hash_join`/`nested_loop`/`merge_join`)
- G14-SHC-SPIKE: RFC written; prototype benchmark validates or invalidates DSM-based approach
- I2: Complete documentation review done — CONFIGURATION.md GUCs documented (40+), SQL_REFERENCE.md gaps filled, FAQ refs fixed
- TRUNC-1: TRUNCATE on trigger-mode CDC source marks downstream STs for reinit; tested end-to-end
- VOL-1: `volatile_function_policy` GUC controls volatile function handling; `reject`/`warn`/`allow` modes tested
- I3: `dbt-pgtrickle` prepared for dbt Hub; submission guide written; Hub listing pending separate repo + hubcap PR
- E4: Flyway / Liquibase integration guide published in `docs/integrations/flyway-liquibase.md`
- E5: ORM integration guides (SQLAlchemy, Django) published in `docs/integrations/orm.md`
- EC-01: R₀ pre-change snapshot ensures DELETE deltas find old join partners; unit + TPC-H regression tests confirm correctness
- STST-3: 3-level and 4-level ST-on-ST chains tested with INSERT/UPDATE/DELETE propagation; mixed modes covered
- CIRC-IMM: Diamond + near-circular IMMEDIATE topologies tested; no deadlocks or incorrect results
- G8.1: Cross-session MERGE cache invalidation via catalog version counter; tested with concurrent ALTER QUERY + refresh
- EXPL-ENH: `explain_st()` shows refresh timing stats, source partition info, and dependency sub-graph (DOT format)
- R4: CNPG operator hardening — ImageVolume, health probes, failover tested
- Extension upgrade path tested (`0.14.0 → 0.15.0`)
- `just check-version-sync` passes
v0.16.0 — Performance & Refresh Optimization
Status: Released (2026-04-06).
Faster refreshes across the board: sub-1% deltas use DELETE+INSERT instead of MERGE, insert-only stream tables auto-detect and skip the MERGE join, algebraic aggregates apply pinpoint updates, and a cross-backend template cache eliminates cold-start latency. Automated benchmark regression gating prevents future performance degradation.
Completed items (click to expand)
Goal: Attack the MERGE bottleneck from multiple angles — alternative merge strategies, algebraic aggregate shortcuts, append-only bypass, delta filtering, change buffer compaction, shared-memory template caching — and close critical test coverage gaps to validate these new paths.
MERGE Alternatives & Planner Control (Phase D)
In plain terms: MERGE dominates 70–97% of refresh time. This explores whether replacing MERGE with DELETE+INSERT (or INSERT ON CONFLICT + DELETE) is faster for specific patterns — particularly for small deltas against large stream tables where the MERGE join is the bottleneck.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PH-D1 | DELETE+INSERT merge alternative: `DELETE WHERE __pgt_row_id IN (delta_deletes)` + `INSERT ... SELECT FROM delta_inserts`. Benchmark against MERGE for 1K/10K/100K deltas against 1M/10M targets. Gate behind `pg_trickle.merge_strategy = 'auto'\|'merge'\|'delete_insert'` GUC. | 1–2 wk | PLAN_PERFORMANCE_PART_9.md §Phase D |
MERGE alternatives subtotal: ~1–2 weeks
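The `delete_insert` strategy can be sketched as two plain statements in one transaction. `active_orders`, `delta_deletes`, and `delta_inserts` are placeholders for the stream table and the generated delta CTEs:

```sql
-- Sketch of the delete_insert strategy (PH-D1). Updated rows appear in both
-- delta sets: they are deleted first, then re-inserted with their new values,
-- which is why the two statements must run in one transaction.
BEGIN;

DELETE FROM active_orders t
WHERE t.__pgt_row_id IN (SELECT __pgt_row_id FROM delta_deletes);

INSERT INTO active_orders
SELECT * FROM delta_inserts;

COMMIT;
```

The bet is that for small deltas, two index-driven statements beat one MERGE whose join against a large target dominates the cost.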
Algebraic Aggregate UPDATE Fast-Path (B-1)
In plain terms: The current aggregate delta rule recomputes entire groups where the GROUP BY key appears in the delta. For a group with 100K rows where 1 row changed, the aggregate re-scans all 100K rows in that group. For decomposable aggregates (`SUM`/`COUNT`/`AVG`), a direct `UPDATE target SET col = col + Δ` replaces the full MERGE join — dropping aggregate refresh from O(group_size) to O(1) per group.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B-1 | Algebraic aggregate UPDATE fast-path. For GROUP BY queries where all aggregates are algebraically invertible (SUM/COUNT/AVG), replace the MERGE with a direct UPDATE target SET col = col + Δ WHERE group_key = ? for existing groups, plus INSERT for newly-appearing groups and DELETE for groups whose count reaches zero. Eliminates the MERGE join overhead — the dominant cost for aggregate refresh when group cardinality is high. Requires adding __pgt_aux_count / __pgt_aux_sum auxiliary columns to the stream table. Fallback to existing MERGE path for non-algebraic aggregates (MIN, MAX, STRING_AGG, etc.). Gate behind pg_trickle.aggregate_fast_path GUC (default true). Expected impact: 5–20× apply-time reduction for high-cardinality GROUP BY (10K+ distinct groups); aggregate scenarios at 100K/1% projected to drop from ~50ms to sub-1ms apply time. | 4–6 wk | plans/performance/PLAN_NEW_STUFF.md §B-1 · plans/sql/PLAN_TRANSACTIONAL_IVM.md §Phase 4 |
B-1 subtotal: ~4–6 weeks
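The fast-path apply step can be sketched as plain DML per touched group. `daily_totals`, `delta_by_group`, and the column names are illustrative; `__pgt_aux_count` matches the auxiliary column named in the item description:

```sql
-- Sketch of the algebraic fast-path for a SUM-per-day stream table.
-- SUM and COUNT are invertible, so each touched group gets a pinpoint update.
UPDATE daily_totals t
SET total           = t.total + d.sum_delta,          -- add the net Δ
    __pgt_aux_count = t.__pgt_aux_count + d.count_delta
FROM delta_by_group d
WHERE t.day = d.day;                                  -- O(1) per touched group

DELETE FROM daily_totals WHERE __pgt_aux_count = 0;   -- group emptied out
-- Newly-appearing groups are handled by a separate INSERT; AVG is derived
-- as aux_sum / aux_count rather than maintained directly.
```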
Append-Only Stream Tables — MERGE Bypass (A-3-AO)
In plain terms: When a stream table's sources are insert-only (e.g. event logs, append-only tables where CDC never sees DELETE/UPDATE), the MERGE is pure overhead — every delta row is an INSERT, never a match. Bypassing MERGE entirely with a plain `INSERT INTO st SELECT ... FROM delta` removes the join against the target table, takes only `RowExclusiveLock`, and is the single highest-payoff optimization for event-sourced architectures.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-3-AO | `CREATE STREAM TABLE … APPEND ONLY` declaration. When set, refresh uses `INSERT INTO st SELECT ... FROM delta` instead of MERGE — no target-table join, `RowExclusiveLock` only. CDC-observed heuristic fallback: if no DELETE/UPDATE has been seen, use the fast path; fall back to MERGE on first non-insert. Benchmark against MERGE for 1K/10K/100K append deltas. | 1–2 wk | — |
A-3-AO subtotal: ~1–2 weeks
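The bypass itself is a one-liner; the engineering effort is in the safety gating around it. A sketch, with `event_counts_st` and `delta` as placeholder names:

```sql
-- Append-only fast path: a plain INSERT, no join against the target.
-- Takes only RowExclusiveLock, vs. MERGE's target-table join work.
INSERT INTO event_counts_st
SELECT * FROM delta;
-- Per the heuristic fallback, the first DELETE/UPDATE observed in CDC
-- demotes the stream table back to the MERGE path.
```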
Delta Predicate Pushdown (B-2)
In plain terms: For a query like `SELECT ... FROM orders WHERE status = 'shipped'`, if a CDC change row has `status = 'pending'`, the delta processes it through scan → filter → discard. All the scan and join work is wasted. Pushing the WHERE predicate down into the change buffer scan eliminates irrelevant rows before any join processing begins — a 5–10× reduction in delta row volume for selective queries.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B-2 | Delta predicate pushdown. Identify Filter nodes whose predicates reference only columns from a single source table. Inject these predicates into the `delta_scan` CTE as additional WHERE clauses (including `OR old_col = 'value'` for DELETE correctness). Expected impact: 5–10× delta row reduction for queries with < 10% selectivity. | 2–3 wk | plans/performance/PLAN_NEW_STUFF.md §B-2 |
B-2 subtotal: ~2–3 weeks
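The DELETE-correctness wrinkle deserves a concrete shape. A sketch of the pushed-down scan for the `status = 'shipped'` example — the change-buffer name and its `old_`/`new_` column layout are illustrative assumptions:

```sql
-- Sketch of a pushed-down predicate inside the delta scan (B-2). The OR arm
-- keeps DELETE/UPDATE rows whose *old* image matched the filter — dropping
-- them would leave phantom rows behind in the stream table.
WITH delta_scan AS (
  SELECT *
  FROM pgtrickle.changes_pgt_orders
  WHERE new_status = 'shipped'     -- inserts / new images that qualify
     OR old_status = 'shipped'     -- deletes / old images that used to qualify
)
SELECT * FROM delta_scan;          -- irrelevant rows never reach the joins
```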
Shared-Memory Template Caching (G14-SHC)
In plain terms: Every new database connection that triggers a refresh pays a 15–50ms cold-start cost to regenerate the MERGE SQL template. With PgBouncer in transaction mode, this happens on every single refresh cycle. Shared-memory caching stores compiled templates in PostgreSQL DSM so they survive across connections — eliminating the cold-start entirely for steady-state workloads.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G14-SHC | Shared-memory template caching (implementation). Full implementation of DSM + lwlock-based MERGE SQL template caching, building on the G14-SHC-SPIKE RFC from v0.15.0. | ~2–3wk | plans/performance/REPORT_OVERALL_STATUS.md §14 |
G14-SHC subtotal: ~2–3 weeks
PostgreSQL 19 Forward-Compatibility (A3) — Moved to v1.0.0
PG 19 beta not available in time. Items A3-1 through A3-4 deferred to v1.0.0 milestone.
Change Buffer Compaction (C-4)
In plain terms: A high-churn source table can accumulate thousands of changes to the same row between refresh cycles — an INSERT followed by 10 UPDATEs followed by a DELETE is really just "nothing happened." Compaction merges multiple changes to the same row ID into a single net change before the delta query runs, reducing change buffer size by 50–90% for high-churn tables. This directly reduces work for every downstream path (MERGE, DELETE+INSERT, append-only INSERT, predicate pushdown).
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-4 | Change buffer compaction. Merge multiple changes to the same `__pgt_row_id` into a single net change: INSERT+DELETE cancel out; consecutive UPDATEs collapse to one. Trigger on buffer exceeding `pg_trickle.compact_threshold` rows (default: 100K). Expected impact: 50–90% reduction in change buffer size for high-churn tables. | 2–3 wk | — |
C-4 subtotal: ~2–3 weeks
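The net-change rule ("INSERT then 10 UPDATEs then DELETE = nothing happened") reduces to looking at the first and last operation per row id. A sketch, assuming illustrative `op` ('I'/'U'/'D') and `lsn` columns on the change buffer:

```sql
-- Sketch of net-change compaction (C-4): collapse each row's change history
-- to a single net operation before the delta query runs.
WITH per_row AS (
  SELECT __pgt_row_id,
         (array_agg(op ORDER BY lsn))[1]       AS first_op,
         (array_agg(op ORDER BY lsn DESC))[1]  AS last_op
  FROM changes_pgt_orders
  GROUP BY __pgt_row_id
)
SELECT __pgt_row_id,
       CASE
         WHEN first_op = 'I' AND last_op = 'D' THEN NULL  -- cancels out entirely
         WHEN first_op = 'I'                   THEN 'I'   -- net insert
         WHEN last_op  = 'D'                   THEN 'D'   -- net delete
         ELSE 'U'                                         -- net update
       END AS net_op
FROM per_row;
```

Rows whose `net_op` is NULL can be dropped outright, which is where the 50–90% buffer reduction for high-churn tables comes from.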
Test Coverage Hardening (TG2)
In plain terms: The performance optimizations in this release change core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown). Before and alongside these changes, critical test coverage gaps need closing — particularly around operators and scenarios where bugs could hide silently. These gaps were identified in the TESTING_GAPS_2 audit.
High-Priority Gaps
| Item | Description | Effort | Ref |
|---|---|---|---|
| TG2-WIN | Window function DVM execution tests: ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE. | — | TESTING_GAPS_2.md |
| TG2-JOIN | Join multi-cycle tests: INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation. | — | TESTING_GAPS_2.md |
| TG2-EQUIV | Differential ≡ Full equivalence validation for joins, aggregates, and window functions. | — | TESTING_GAPS_2.md |
Medium-Priority Gaps
| Item | Description | Effort | Ref |
|---|---|---|---|
| TG2-MERGE | refresh.rs MERGE template unit tests. Only helpers/enums tested; the core MERGE SQL template generation is untested at the unit level. | 2–3d | TESTING_GAPS_2.md |
| TG2-CANCEL | Timeout/cancellation during refresh. Zero tests for statement_timeout, pg_cancel_backend() during active refresh. Risk: silent failures or resource leaks under production load. | 1–2d | TESTING_GAPS_2.md |
| TG2-SCHEMA | Source table schema evolution. Partial DDL tests exist; type changes and column renames are thin. Risk: silent data corruption on schema change. | 2–3d | TESTING_GAPS_2.md |
TG2 subtotal: ~2–4 weeks (high-priority) + ~1–2 weeks (medium-priority)
Performance Regression CI (BENCH-CI)
In plain terms: v0.16.0 changes core refresh paths (MERGE alternatives, aggregate fast-path, append-only bypass, predicate pushdown, buffer compaction). Without automated benchmarks in CI, performance regressions will slip through silently. This adds a benchmark suite that runs on every PR and compares against a committed baseline — any statistically significant regression blocks the merge.
| Item | Description | Effort | Ref |
|---|---|---|---|
| BENCH-CI-1 | Benchmark harness in CI. Run just bench (Criterion-based) on a fixed hardware profile (GitHub Actions large runner or self-hosted). Capture results as JSON artifacts. Compare against committed baseline using Criterion's --save-baseline / --baseline. | 2–3d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
| BENCH-CI-2 | Regression gate. Parse Criterion JSON output; fail CI if any benchmark regresses by more than 10% (configurable threshold). Report regressions as PR comment with before/after numbers. | 1–2d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
| BENCH-CI-3 | Scenario coverage. Ensure benchmark suite covers: scan, filter, aggregate (algebraic + non-algebraic), join (2-table, 3-table), window function, CTE, TopK, append-only, and mixed workloads. At minimum 1K/10K/100K row scales. | 2–3d | plans/performance/PLAN_PERFORMANCE_PART_9.md §I |
BENCH-CI subtotal: ~1–2 weeks
Auto-Indexing on Stream Table Creation (AUTO-IDX)
In plain terms: pg_ivm automatically creates indexes on GROUP BY columns and primary key columns when creating an incrementally maintained view. pg_trickle currently requires manual index creation, which is a friction point for new users. Auto-indexing creates appropriate indexes at stream table creation time — GROUP BY keys, DISTINCT columns, and the `__pgt_row_id` covering index for MERGE performance.
| Item | Description | Effort | Ref |
|---|---|---|---|
| AUTO-IDX-1 | Auto-create indexes on GROUP BY / DISTINCT columns at `create_stream_table()` time. Gated behind the `pg_trickle.auto_index` GUC. | — | src/api.rs |
| AUTO-IDX-2 | Covering index on `__pgt_row_id`; `pg_trickle.auto_index` GUC (default true). | — | src/api.rs |
AUTO-IDX: ✅ Done
Quick Wins
| Item | Description | Effort | Ref |
|---|---|---|---|
| C2-BUG | `resume_stream_table()` verified operational (present since v0.2.0). | — | — |
| ERR-REF | Error reference published in `docs/ERRORS.md` with all 20 variants documented. Cross-linked from FAQ. | — | docs/ERRORS.md |
| GUC-DEFAULTS | `planner_aggressive` and `cleanup_use_truncate` defaults reviewed; both kept at `true` (correct for most workloads). Added detailed tuning guidance for memory-constrained and PgBouncer environments in CONFIGURATION.md. | — | docs/CONFIGURATION.md |
| BUF-LIMIT | `pg_trickle.max_buffer_rows` GUC added (default: 1M). Forces FULL refresh + truncation when exceeded. | — | src/config.rs · src/refresh.rs |
Quick wins: ✅ Done
v0.16.0 total: ~1–2 weeks (MERGE alts) + ~4–6 weeks (aggregate fast-path) + ~1–2 weeks (append-only) + ~2–3 weeks (predicate pushdown) + ~2–3 weeks (template cache) + ~2–3 weeks (buffer compaction) + ~3–6 weeks (test coverage) + ~1–2 weeks (bench CI) + ~2–3 days (auto-indexing) + ~2–4 hours (quick wins). Note: PG 19 compatibility (A3, ~18–36h) moved to v1.0.0.
Exit criteria:
- PH-D1: DELETE+INSERT strategy implemented and gated behind `merge_strategy` GUC; correctness verified for INSERT/UPDATE/DELETE deltas
- B-1: Algebraic aggregate fast-path replaces MERGE for `SUM`/`COUNT`/`AVG` GROUP BY queries; `aggregate_fast_path` GUC respected; explicit DML path (DELETE+UPDATE+INSERT) used instead of MERGE for all-algebraic aggregates; `explain_st()` exposes `aggregate_path`; existing tests pass — ✅ Done in v0.16.0 Phase 8
- A-3-AO: `CREATE STREAM TABLE … APPEND ONLY` accepted; refresh uses INSERT path; heuristic auto-promotion on insert-only buffers; falls back to MERGE on first non-insert CDC event
- B-2: Delta predicate pushdown implemented for single-source Filter nodes (P2-7); DELETE correctness verified (OR old_col predicate); selective-query benchmarks show delta row reduction
- G14-SHC: Cross-backend template cache eliminates cold-start; catalog-backed L2 cache with `template_cache` GUC; invalidation on DDL; `explain_st()` exposes stats
- A3: PG 19 builds and passes full E2E suite — moved to v1.0.0
- C-4: Change buffer compaction reduces buffer size by ≥50% for high-churn workloads; `compact_threshold` GUC respected; no correctness regressions
- TG2-WIN: Window function DVM execution tests cover ROW_NUMBER, RANK, DENSE_RANK, LAG/LEAD across INSERT/UPDATE/DELETE
- TG2-JOIN: Join multi-cycle tests cover INNER/LEFT/FULL JOIN with UPDATE and DELETE propagation; no silent data loss
- TG2-EQUIV: Differential ≡ Full equivalence validated for joins, aggregates, and window functions
- TG2-MERGE: refresh.rs MERGE template generation has unit test coverage (completed in v0.17.0)
- TG2-CANCEL: Timeout and cancellation during refresh tested; no resource leaks (completed in v0.17.0)
- TG2-SCHEMA: Source table type changes and column renames tested end-to-end
- BENCH-CI: Performance regression CI runs on every PR; 10% regression threshold blocks merge; scenario coverage includes scan/filter/aggregate/join/window/CTE/TopK/SemiJoin/AntiJoin
- AUTO-IDX: Stream tables auto-create indexes on GROUP BY / DISTINCT columns; `__pgt_row_id` covering index for ≤ 8-column tables; `auto_index` GUC respected
- C2-BUG: `resume_stream_table()` verified operational (present since v0.2.0)
- ERR-REF: Error reference doc published with all 20 PgTrickleError variants, common causes, and suggested fixes
- GUC-DEFAULTS: `planner_aggressive` and `cleanup_use_truncate` defaults reviewed; trade-offs documented in CONFIGURATION.md
- BUF-LIMIT: `max_buffer_rows` GUC prevents unbounded change buffer growth; triggers FULL + truncation when exceeded
- Extension upgrade path tested (`0.15.0 → 0.16.0`)
- `just check-version-sync` passes
v0.17.0 — Query Intelligence & Stability
Status: Released (2026-04-08).
Goal: Make the refresh engine smarter, prove correctness through automated
fuzzing, harden for scale, and prepare for adoption. Cost-based strategy
selection replaces the fixed DIFF/FULL threshold, columnar change tracking
skips irrelevant columns in wide-table UPDATEs, SQLancer integration provides
automated semantic proving, incremental DAG rebuild supports 1000+ stream table
deployments, and unsafe block reduction continues the safety hardening toward
1.0. On the adoption side: api.rs modularization improves code maintainability,
a pg_ivm migration guide targets the largest potential adopter audience, a
failure mode runbook equips production teams, and a Docker Compose playground
provides a 60-second tryout experience.
Completed items (click to expand)
Cost-Based Refresh Strategy Selection (B-4)
In plain terms: The current adaptive FULL/DIFFERENTIAL threshold is a fixed ratio (`differential_max_change_ratio`, default 0.5). A join-heavy query may be better off with FULL at a 5% change rate, while a scan-only query benefits from DIFFERENTIAL up to 80%. This replaces the fixed threshold with a cost model trained on each stream table's own refresh history — selecting the cheapest strategy per cycle automatically.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B-4 | Cost-based refresh strategy selection. Collect per-ST statistics (delta_row_count, merge_duration_ms, full_refresh_duration_ms, query_complexity_class) from pgt_refresh_history. Fit a simple linear cost model. Before each refresh, compare estimated_diff_cost(Δ) vs estimated_full_cost × safety_margin and select the cheaper path. Cold-start heuristic (< 10 refreshes) falls back to existing fixed threshold. Gate behind pg_trickle.refresh_strategy = 'auto'|'differential'|'full' GUC. | 2–3 wk | plans/performance/PLAN_NEW_STUFF.md §B-4 |
B-4 subtotal: ~2–3 weeks
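The crudest version of the comparison can be expressed directly in SQL against the refresh history. The table and column names below (`pgt_refresh_history`, `strategy`, `duration_ms`, `delta_row_count`) are illustrative assumptions about the catalog shape, and the "model" here is just two averages:

```sql
-- Sketch of the per-ST cost comparison (B-4): average cost per delta row on
-- the differential path vs. average FULL refresh duration, for pgt_id = 42.
SELECT
  avg(duration_ms::float8 / nullif(delta_row_count, 0))
    FILTER (WHERE strategy = 'differential')  AS ms_per_delta_row,
  avg(duration_ms)
    FILTER (WHERE strategy = 'full')          AS full_ms
FROM pgtrickle.pgt_refresh_history
WHERE pgt_id = 42;
-- Choose DIFFERENTIAL when ms_per_delta_row * pending_delta_rows
-- < full_ms * safety_margin; otherwise FULL. Fewer than 10 history rows
-- → fall back to the fixed-threshold heuristic.
```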
Columnar Change Tracking (A-2-COL)
In plain terms: When a source table UPDATE changes only 1 of 50 columns, the current CDC captures the entire row (old + new) and the delta query processes all columns. If the changed column is not referenced by the stream table's defining query, the entire refresh is wasted work. Columnar change tracking adds a per-column bitmask to CDC events so the delta query can skip irrelevant rows at scan time — a 50–90% reduction in delta volume for wide-table OLTP workloads.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A-2-COL-1 | CDC trigger bitmask. Compute changed_columns bitmask (old.col IS DISTINCT FROM new.col) in the CDC trigger; store as int8 or bit(n) alongside the change row. | 1–2 wk | plans/performance/PLAN_NEW_STUFF.md §A-2 |
| A-2-COL-2 | Delta-scan column filtering. At delta-query build time, consult the bitmask: skip rows where no referenced column changed; use lightweight UPDATE-only path when only projected columns changed (no join keys, no filter predicates, no aggregate keys). | 1–2 wk | plans/performance/PLAN_NEW_STUFF.md §A-2 |
| A-2-COL-3 | Aggregate correction optimization. For aggregates where only the aggregated value column changed (not GROUP BY key), emit a single correction row instead of delete-old + insert-new. | 3–5d | plans/performance/PLAN_NEW_STUFF.md §A-2 |
A-2-COL subtotal: ~3–4 weeks
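The bitmask mechanics can be illustrated concretely. The trigger expression follows the A-2-COL-1 description; the change-buffer table name, column names, and bit assignments here are assumptions for the sketch.

```sql
-- Illustrative bitmask (bit i set when column i changed) and the scan-time
-- filter it enables, for a source table with columns (status, amount).
-- Inside the CDC trigger:
--   changed_columns := (CASE WHEN OLD.status IS DISTINCT FROM NEW.status THEN 1 ELSE 0 END)
--                    | (CASE WHEN OLD.amount IS DISTINCT FROM NEW.amount THEN 2 ELSE 0 END);
-- A stream table whose defining query references only "amount" (mask = 2)
-- can then skip irrelevant UPDATE rows at delta-scan time:
SELECT *
FROM pgt_change_buffer
WHERE change_kind <> 'U'          -- inserts/deletes always matter
   OR changed_columns & 2 <> 0;   -- updates matter only if a referenced column changed
```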
Transactional IVM Phase 4 Remaining (A2)
In plain terms: IMMEDIATE mode (same-transaction refresh) shipped in v0.2.0 using SQL-level statement triggers. Phase 4 completes the transition to lower-overhead C-level triggers and ENR-based transition tables — sharing the transition tuplestore directly between the trigger and the refresh engine instead of copying through a temp table. Also adds prepared statement reuse to eliminate repeated parse/plan overhead for the delta query.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A2-ENR | Deferred post-1.0: pg_sys ENR tuplestore FFI not surfaced by pgrx; carries memory-corruption and pg_upgrade compatibility risk. Revisit after 1.0 stabilisation. | — | PLAN_TRANSACTIONAL_IVM.md §Phase 4 |
| A2-CTR | Deferred post-1.0: CreateTrigger() FFI not surfaced by pgrx; carries memory-corruption and pg_upgrade compatibility risk. Revisit after 1.0 stabilisation. | — | PLAN_TRANSACTIONAL_IVM.md §Phase 4 |
| A2-PS | Shipped: pg_trickle.use_prepared_statements GUC (default true) implemented and wired in refresh.rs; parse/plan overhead eliminated on steady-state workloads. | — | PLAN_TRANSACTIONAL_IVM.md §Phase 4 |
A2 subtotal: 0h remaining (A2-PS shipped; A2-ENR + A2-CTR deferred post-1.0)
ROWS FROM() Support (A8)
In plain terms: ROWS FROM() with multiple set-returning functions is a rarely used SQL feature, but supporting it closes a coverage gap in the parser and DVM pipeline.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A8 | ROWS FROM() with multiple SRF functions. Parser + DVM support for ROWS FROM(generate_series(...), unnest(...)) in defining queries. Very low demand. | ~1–2d | PLAN_TRANSACTIONAL_IVM_PART_2.md Task 2.3 |
A8 subtotal: ~1–2 days
SQLancer Fuzzing Integration (SQLANCER)
In plain terms: pg_trickle's tests were written by the pg_trickle team, which means they share the same assumptions as the code. SQLancer is an automated database testing tool that generates random SQL queries and checks whether the results are correct — it has found hundreds of bugs in PostgreSQL, SQLite, CockroachDB, and TiDB. Integrating SQLancer gives pg_trickle a crash-test oracle (does the parser panic on fuzzed input?), an equivalence oracle (does DIFFERENTIAL mode produce the same answer as FULL?), and stateful DML fuzzing (do random INSERT/UPDATE/DELETE sequences corrupt stream table data?). This is the single highest-value testing investment for finding unknown correctness bugs.
| Item | Description | Effort | Ref |
|---|---|---|---|
| SQLANCER-1 | Fuzzing harness (just sqlancer), Rust LCG query generator, SQLANCER_CASES/SQLANCER_SEED controls, weekly-sqlancer CI job. | — | PLAN_SQLANCER.md §1 |
| SQLANCER-2 | test_sqlancer_crash_oracle / run_crash_oracle() verifies zero backend crashes over 200–2000 fuzzed queries. | — | PLAN_SQLANCER.md §2 |
| SQLANCER-3 | test_sqlancer_diff_vs_full_oracle / run_diff_vs_full_oracle() creates DIFFERENTIAL + FULL stream tables, applies 4 DML mutations, and asserts count parity. Integrated into test_sqlancer_ci_combined. | — | PLAN_SQLANCER.md §3 |
| SQLANCER-4 | test_sqlancer_stateful_dml / run_stateful_dml_fuzzing() runs SQLANCER_MUTATIONS (default 100, nightly 10 000) random INSERT/UPDATE/DELETE mutations with checkpoints every 50. CI: weekly-sqlancer-stateful job (SQLANCER_MUTATIONS=10000). | — | PLAN_SQLANCER.md §4 |
SQLANCER subtotal: 0 remaining (all four items shipped in v0.17.0)
Incremental DAG Rebuild (C-2)
In plain terms: When any DDL change occurs (e.g. ALTER STREAM TABLE, DROP STREAM TABLE), the entire dependency graph is rebuilt from scratch by querying pgt_dependencies. For 1000+ stream tables this becomes expensive — O(V+E) SPI queries. Incremental DAG maintenance records which specific stream table was affected and re-sorts only the affected subgraph, reducing the scheduler latency spike from ~50ms to ~1ms at scale.
| Item | Description | Effort | Ref |
|---|---|---|---|
| C-2-1 | Delta-based rebuild. Record affected pgt_id in a bounded ring buffer in shared memory alongside DAG_REBUILD_SIGNAL. On overflow, fall back to full rebuild. | 1 wk | plans/performance/PLAN_NEW_STUFF.md §C-2 |
| C-2-2 | Incremental topological sort. Add/remove only affected edges and vertices; re-run topological sort on the affected subgraph only. Cache the sorted schedule in shared memory. | 1–2 wk | plans/performance/PLAN_NEW_STUFF.md §C-2 |
C-2 subtotal: ~2–3 weeks
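The "affected subgraph" idea can be sketched as a recursive walk over the dependency catalog: after DDL touches one stream table, only that table and its downstream dependents need re-sorting. The pgt_dependencies column names below are assumptions for illustration.

```sql
-- Hypothetical sketch: collect the vertices that need re-sorting after DDL
-- on stream table 42; everything outside this set keeps its cached order.
WITH RECURSIVE affected AS (
  SELECT 42 AS pgt_id                       -- the stream table the DDL touched
  UNION
  SELECT d.dependent_pgt_id
  FROM pgt_dependencies d
  JOIN affected a ON d.source_pgt_id = a.pgt_id
)
SELECT pgt_id FROM affected;                -- re-run topological sort on these only
```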
Unsafe Block Reduction — Phase 6 (UNSAFE-R1/R2)
In plain terms: pg_trickle achieved a 51% reduction in unsafe blocks (from ~1,300 to 641) in earlier releases. The remaining blocks are concentrated in well-documented field-accessor macros and standalone is_a type checks. Converting these to safe wrappers removes another 150–250 unsafe blocks with minimal risk — a meaningful safety improvement before 1.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| UNSAFE-R1 | Safe field-accessor macros. Replace unsafe { (*node).field } patterns with safe accessor functions. Estimated reduction: ~100–150 unsafe blocks. | 2–4h | PLAN_REDUCED_UNSAFE.md §R1 |
| UNSAFE-R2 | Safe is_a checks. Convert standalone unsafe { is_a(node, T_Foo) } calls to safe wrapper functions. Estimated reduction: ~50–99 unsafe blocks. | 2–4h | PLAN_REDUCED_UNSAFE.md §R2 |
UNSAFE-R1/R2 subtotal: ~4–8 hours
api.rs Modularization (API-MOD)
In plain terms: api.rs is 9,413 lines — the largest file in the codebase. It contains stream table CRUD, ALTER QUERY, CDC management, bulk operations, diagnostics, and monitoring functions all in one file. It needs the same treatment that parser.rs received in v0.15.0 (split from 21K lines into 5 sub-modules). Zero behavior change — purely structural.
| Item | Description | Effort | Ref |
|---|---|---|---|
| API-MOD | Split src/api.rs into sub-modules. Proposed split: api/create.rs (create/drop/alter), api/refresh.rs (refresh entry points), api/cdc.rs (CDC management), api/diagnostics.rs (explain_st, health_check), api/bulk.rs (bulk_create), api/mod.rs (re-exports). Zero behavior change. | 1–2 wk | — |
API-MOD subtotal: ~1–2 weeks
pg_ivm Migration Guide (MIG-IVM)
In plain terms: pg_ivm is the incumbent IVM extension with 1,400+ GitHub stars and 4 years of production use. Many potential pg_trickle adopters are currently using pg_ivm. A step-by-step migration guide — mapping pg_ivm concepts to pg_trickle equivalents, with concrete SQL examples — removes the biggest adoption friction for this audience.
| Item | Description | Effort | Ref |
|---|---|---|---|
| MIG-IVM | pg_ivm → pg_trickle migration guide. Map: create_immv() → create_stream_table(); refresh_immv() → refresh_stream_table(); IMMEDIATE mode equivalence; aggregate coverage differences (5 vs 60+); GUC mapping; worked example migrating a real pg_ivm deployment. Publish as docs/tutorials/MIGRATING_FROM_PG_IVM.md. | 2–3d | docs/research/PG_IVM_COMPARISON.md |
MIG-IVM subtotal: ~2–3 days
Failure Mode Runbook (RUNBOOK)
In plain terms: Production teams need to know what happens when things go wrong — and what to do about it. This documents every failure mode pg_trickle can encounter (scheduler crash, WAL slot lag, OOM during refresh, disk full, replication slot conflict, stuck watermarks, circular convergence failure) with symptoms, diagnosis steps, and resolution procedures. Essential for on-call engineers.
| Item | Description | Effort | Ref |
|---|---|---|---|
| RUNBOOK | Failure mode runbook. Document: scheduler crash recovery, WAL decoder failures, OOM during refresh, disk-full behavior, replication slot conflicts, stuck watermarks, circular convergence timeout, CDC trigger failures, SUSPENDED state recovery, lock contention diagnosis. Include health_check() output interpretation and explain_st() troubleshooting. Publish as docs/TROUBLESHOOTING.md. | 3–5d | docs/PRE_DEPLOYMENT.md |
RUNBOOK subtotal: ~3–5 days
Docker Quickstart Playground (PLAYGROUND)
In plain terms: The fastest way to evaluate any database extension is to run it locally in 60 seconds. A docker-compose.yml with PostgreSQL + pg_trickle pre-installed, sample data (e.g. the org-chart from GETTING_STARTED.md), and a Jupyter notebook or pgAdmin web UI gives potential users a zero-friction tryout experience. This is the single most impactful item for driving initial adoption.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PLAYGROUND | Docker Compose quickstart. docker-compose.yml with: PG 18 + pg_trickle, seed SQL script (org-chart example from GETTING_STARTED.md + TPC-H SF=0.01), pgAdmin web UI (optional). Single docker compose up command. README with guided walkthrough. | 2–3d | docs/GETTING_STARTED.md |
PLAYGROUND subtotal: ~2–3 days
Documentation Polish (DOC-POLISH)
In plain terms: The existing documentation is comprehensive and technically excellent, but it's optimized for users already familiar with IVM and PostgreSQL internals. These items restructure the docs for a better "first hour" experience — simpler getting-started examples, a refresh mode decision guide, a condensed new-user FAQ, and a setup verification checklist. The goal is to reduce cognitive overload for new users without losing the depth that experienced users need.
| Item | Description | Effort | Ref |
|---|---|---|---|
| DOC-HELLO | Simplified "Hello Stream Table" in GETTING_STARTED. Add a Chapter 0 with a single-table, single-aggregate stream table (e.g. SELECT department, count(*) FROM employees GROUP BY department). Create it, insert a row, verify the refresh. Build confidence before the multi-table org-chart example. | 2–4h | docs/GETTING_STARTED.md |
| DOC-DECIDE | Refresh mode decision guide. Flowchart: "Need transactional consistency? → IMMEDIATE. Volatile functions? → FULL. Otherwise → AUTO (DIFFERENTIAL with FULL fallback)." Include when-to-use guidance for each mode with concrete examples. Publish as a section in GETTING_STARTED or as a standalone tutorial. | 2–4h | docs/tutorials/tuning-refresh-mode.md |
| DOC-FAQ-NEW | New User FAQ (top 15 questions). Extract the 15 most common new-user questions from the 3,000-line FAQ into a prominent "New User FAQ" section at the top. Keyword-rich headings for searchability. Link to deep FAQ for details. | 2–3h | docs/FAQ.md |
| DOC-VERIFY | Post-install verification checklist. SQL script that verifies: extension loaded, shared_preload_libraries configured, GUCs set, CDC triggers installable, first stream table creates and refreshes successfully. Runnable as psql -f verify_install.sql. | 2–4h | docs/GETTING_STARTED.md |
| DOC-STUBS | Fill or remove research stubs. PG_IVM_COMPARISON.md (60 bytes) and CUSTOM_SQL_SYNTAX.md (57 bytes) are empty stubs. Either flesh them out (PG_IVM_COMPARISON can draw from the existing comparison data) or remove from SUMMARY.md. | 2–4h | docs/research/ |
DOC-POLISH subtotal: ~2–3 days
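A few of the checks the DOC-VERIFY script might run can be sketched directly; each query should return true on a healthy install. The exact check set is an assumption based on the item description above.

```sql
-- Illustrative verify_install.sql fragment (run via psql -f verify_install.sql)
SELECT current_setting('shared_preload_libraries') LIKE '%pg_trickle%' AS preload_ok;
SELECT count(*) = 1 AS extension_ok
FROM pg_extension WHERE extname = 'pg_trickle';
-- second argument "true" returns NULL instead of erroring if the GUC is absent
SELECT current_setting('pg_trickle.refresh_strategy', true) IS NOT NULL AS gucs_ok;
```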
v0.17.0 total: ~2–3 weeks (cost-based strategy) + ~3–4 weeks (columnar tracking) + ~32–48 hours (TIVM Phase 4) + ~1–2 days (ROWS FROM) + ~2–3 weeks (SQLancer) + ~2–3 weeks (incremental DAG) + ~4–8 hours (unsafe reduction) + ~1–2 weeks (api.rs modularization) + ~2–3 days (pg_ivm migration) + ~3–5 days (failure runbook) + ~2–3 days (Docker playground) + ~2–3 days (doc polish)
Exit criteria:
- B-4: Cost-based strategy selector trained on per-ST history; cold-start fallback to fixed threshold; QueryComplexityClass cost model (scan/filter/aggregate/join/join_agg); refresh_strategy + cost_model_safety_margin GUCs; pre-refresh predictive comparison; 10 unit tests
- A-2-COL: CDC trigger emits changed_cols VARBIT bitmask (COL-1); delta-scan filters irrelevant rows via changed_cols & mask (COL-2); aggregate value-only correction 'V' path halves row volume (COL-3)
- [ ] A2-ENR: 🚫 Deferred post-1.0 — requires raw pg_sys ENR tuplestore FFI (memory-corruption risk); revisit after 1.0 stabilisation
- [ ] A2-CTR: 🚫 Deferred post-1.0 — requires raw CreateTrigger() C FFI (memory-corruption risk); revisit after 1.0 stabilisation
- A2-PS: ✅ Already shipped — pg_trickle.use_prepared_statements GUC (default true) wired in refresh.rs; parse/plan overhead eliminated on steady-state workloads
- A8: ROWS FROM() with multiple SRFs accepted in defining queries; E2E tests cover INSERT/UPDATE/DELETE propagation
- SQLANCER: ✅ SQLANCER-1/2 crash + equivalence oracles shipped in v0.12.0; SQLANCER-3 diff-vs-full oracle and SQLANCER-4 stateful DML soak (10K mutations) added in v0.17.0; weekly-sqlancer-stateful CI job wired
- C-2: Incremental DAG rebuild reduces DDL-triggered latency spike to < 5ms at 100+ STs; ring buffer overflow falls back to full rebuild; no correctness regressions
- UNSAFE-R1/R2: Unsafe block count reduced by 249 (690→441 in parser); is_node_type! and pg_deref! macros; all 1,700 unit tests pass
- API-MOD: api.rs split into 3 sub-modules (mod.rs 5,624 + diagnostics.rs 1,377 + helpers.rs 2,461); zero behavior change; all 1,700 unit tests pass
- MIG-IVM: docs/tutorials/MIGRATING_FROM_PG_IVM.md published with step-by-step migration, API mapping, behavioral differences, SQL upgrade examples, and verification checklist
- RUNBOOK: docs/TROUBLESHOOTING.md covers 13 failure scenarios (scheduler, SUSPENDED, CDC triggers, WAL slots, INITIALIZING, buffer growth, lock contention, OOM, disk full, circular convergence, schema changes, worker pool, fuse) with symptoms, diagnosis, and resolution
- PLAYGROUND: playground/ with docker-compose.yml, seed.sql (3 base tables, 5 stream tables), and README walkthrough
- DOC-HELLO: Chapter 1 "Hello World" in GETTING_STARTED already provides the single-table aggregate example (products/category_summary)
- DOC-DECIDE: Refresh mode decision guide already published as tutorials/tuning-refresh-mode.md with recommend_refresh_mode() and signal breakdown
- DOC-FAQ-NEW: New User FAQ section with 15 keyword-rich entries added at top of FAQ.md
- DOC-VERIFY: scripts/verify_install.sql checks shared_preload_libraries, extension, scheduler, GUCs, and runs an end-to-end stream table cycle
- DOC-STUBS: Research stubs already use {{#include}} directives pointing to substantial content (923 + 1232 lines)
- Extension upgrade path tested (0.16.0 → 0.17.0)
v0.18.0 — Hardening & Delta Performance
Status: Released (2026-04-12).
Release Theme: This release hardens pg_trickle for production at scale and delivers the biggest remaining performance win in the differential refresh path. The Z-set multi-source delta engine merges per-source delta branches into a single GROUP BY + SUM(weight) query, eliminating redundant join evaluation when multiple source tables change in the same cycle. Cross-source snapshot consistency guarantees that multi-source stream tables always read all upstream tables at the same transaction boundary — closing the last known correctness gap. Every production-path .unwrap() is replaced with graceful error propagation, another ~69 unsafe blocks are eliminated, and a populated TPC-H baseline turns the 22-query suite into a true regression canary. SQLancer fuzzing integration provides an external, assumption-free correctness oracle. Together, these changes build the confidence foundation for 1.0.
Completed items (click to expand)
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | Enforce cross-source snapshot consistency | L | P0 |
| CORR-2 | Populate TPC-H expected-output regression guard | XS | P0 |
| CORR-3 | NULL-safe GROUP BY elimination under deletes | S | P1 |
| CORR-4 | Z-set merged-delta weight accounting proof | M | P0 |
| CORR-5 | HAVING-filtered aggregate correction under group depletion | S | P1 |
CORR-1 — Enforce cross-source snapshot consistency (CSS-3)
In plain terms: When a stream table reads from two different source tables, there is a window where it can see source A at a newer point in time than source B — for example, seeing a new order but the old inventory count. Phase 3 completes the tick-watermark enforcement so both sources are always read at the same consistent LSN before any refresh proceeds. Phases 1 and 2 are already complete.
| Item | Description | Effort | Ref |
|---|---|---|---|
| CSS-3-1 | LSN watermark enforcement in the scheduler — hold refresh until all upstream sources reach the same tick boundary | 4–6h | PLAN_CROSS_SOURCE_SNAPSHOT_CONSISTENCY.md §Phase 3 |
| CSS-3-2 | Catalog column pgt_css_watermark_lsn + GUC pg_trickle.cross_source_consistency (default off) | 2–3h | — |
| CSS-3-3 | E2E test: concurrent writes to two sources, assert stream table never sees a split snapshot | 2–3h | — |
CSS-3 subtotal: ~8–12 hours Dependencies: None. Schema change: Yes.
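The watermark gate CSS-3-1 describes could take roughly this shape: before a multi-source refresh, the scheduler checks that every upstream source has been decoded up to the same tick boundary. The table and column names here are purely illustrative assumptions.

```sql
-- Hypothetical pre-refresh gate for stream table 42: refresh proceeds only
-- when all upstream sources have reached the shared tick LSN.
SELECT bool_and(s.decoded_upto_lsn >= t.tick_boundary_lsn) AS safe_to_refresh
FROM pgt_source_progress s
JOIN pgt_tick t ON t.pgt_id = s.pgt_id
WHERE s.pgt_id = 42;
```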
CORR-2 — Populate TPC-H expected-output regression guard (TPCH-BASE)
In plain terms: The TPC-H correctness tests run all 22 queries but the expected-output comparison guard was never populated — so the tests catch structural failures but not quiet result regressions. Populating the baseline turns the suite into a true correctness canary.
| Item | Description | Effort |
|---|---|---|
| TPCH-BASE-1 | Run TPC-H suite once at known-good state; capture output | 30min |
| TPCH-BASE-2 | Populate comparison baseline in e2e_tpch_tests.rs line 89 (remove TODO); verify guard fires on a deliberate regression | 1h |
TPCH-BASE subtotal: ~1–2 hours Dependencies: None. Schema change: No.
CORR-3 — NULL-safe GROUP BY elimination under deletes
In plain terms: When all rows in a GROUP BY group are deleted and the grouping key contains NULLs, the differential engine must correctly remove the group. SQL's three-valued logic in IS DISTINCT FROM may cause delta weight miscounting for NULL keys.
Verify: E2E test with GROUP BY nullable_col, delete all group members,
assert zero rows remain in the stream table.
Dependencies: None. Schema change: No.
CORR-4 — Z-set merged-delta weight accounting proof
In plain terms: Companion correctness gate for PERF-1 (B3-MERGE). The Z-set algebra requires that SUM(weight) across all merged branches for every primary key never produces a spurious net positive or net negative for a single join path.
Verify: property-based tests (proptest) asserting merged_weights == individual_branch_sums for randomly generated multi-source DAGs. All
existing B3-3 diamond-flow tests must pass unchanged.
Dependencies: PERF-1. Schema change: No.
CORR-5 — HAVING-filtered aggregate correction under group depletion
In plain terms: When a HAVING-qualified group loses enough rows to no longer satisfy the predicate (e.g., HAVING count(*) > 5 and 3 of 6 rows are deleted), the differential aggregate path must delete the stream table row rather than leaving a stale row matching the old HAVING predicate.
Verify: E2E test with HAVING + selective deletes crossing the threshold. Dependencies: None. Schema change: No.
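The threshold-crossing scenario, assuming a source table events(grp, id) and a stream table defined as SELECT grp, count(*) AS n FROM events GROUP BY grp HAVING count(*) > 5 (names illustrative):

```sql
INSERT INTO events (grp, id) SELECT 1, g FROM generate_series(1, 6) g;
-- after refresh: group 1 qualifies (n = 6) and appears in the stream table
DELETE FROM events WHERE grp = 1 AND id <= 3;
-- after refresh: n = 3 no longer satisfies HAVING; the differential path
-- must emit a delete for group 1, not an update to n = 3
```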
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | Eliminate production-path .unwrap() calls | S | P0 |
| STAB-2 | unsafe block reduction Phase 1 | M | P1 |
| STAB-3 | Spill detection alerting | S | P1 |
| STAB-4 | Parallel worker orphaned resource cleanup | M | P1 |
| STAB-5 | Upgrade migration test (0.17→0.18) | S | P0 |
| STAB-6 | Error SQLSTATE coverage audit | S | P2 |
STAB-1 — Eliminate production-path .unwrap() calls (SAFE-1)
In plain terms: A small number of SQL-parsing code paths in production (non-test) code call .unwrap() directly — if they encounter unexpected input they will panic the backend process and disconnect all clients. These should propagate errors gracefully instead.
| Item | Description | Effort |
|---|---|---|
| SAFE-1-1 | detect_and_strip_distinct() call in api.rs (L8163) → propagate PgTrickleError | 1h |
| SAFE-1-2 | find_top_level_keyword(sql, "FROM") calls in api.rs (L8229–8258, 3×) → propagate error | 1h |
| SAFE-1-3 | merge_sql[using_start.unwrap()..using_end.unwrap()] in refresh.rs (L6236) → bounds-check | 1h |
| SAFE-1-4 | entry.unwrap() in delta computation loop in refresh.rs (L5992) → return Err | 1h |
| SAFE-1-5 | Chained .unwrap().unwrap() in refresh.rs (L6556–6557) → propagate | 1h |
SAFE-1 subtotal: ~4–6 hours Dependencies: None. Schema change: No.
STAB-2 — unsafe block reduction Phase 1 (UNSAFE-P1)
In plain terms: The DVM parser has 1,286 unsafe blocks — 98% of the total. Phase 1 introduces a single pg_cstr_to_str() safe helper that eliminates ~69 of the most mechanical ones: C-string-to-Rust conversions. No API or behavior change; pure safety improvement.
| Item | Description | Effort | Ref |
|---|---|---|---|
| UNSAFE-P1-1 | Implement pg_cstr_to_str(ptr: *const c_char) -> &str safe wrapper in src/dvm/parser/mod.rs | 1h | PLAN_REDUCED_UNSAFE.md §Phase 1 |
| UNSAFE-P1-2 | Replace ~69 unsafe { CStr::from_ptr(...).to_str()... } call-sites with the safe helper | 4–6h | — |
| UNSAFE-P1-3 | unsafe_inventory.sh baseline update + CI check | 1h | scripts/unsafe_inventory.sh |
UNSAFE-P1 subtotal: ~6–8 hours Dependencies: None. Schema change: No.
STAB-3 — Spill detection alerting (PH-E2)
In plain terms: The GUCs pg_trickle.spill_threshold_blocks and pg_trickle.spill_consecutive_limit already exist to configure spill budgets, but no alert fires when a refresh actually spills to disk. This adds an AlertEvent::SpillThresholdExceeded notification so operators know when large delta queries are hitting disk.
| Item | Description | Effort |
|---|---|---|
| PH-E2-1 | Add AlertEvent::SpillThresholdExceeded variant to src/monitor.rs | 1h |
| PH-E2-2 | Detect spill after MERGE execution; emit alert when consecutive count exceeds limit | 2–3h |
| PH-E2-3 | E2E test: configure low spill threshold, trigger spill, assert alert fires | 1–2h |
PH-E2 subtotal: ~4–6 hours Dependencies: None. Schema change: No.
STAB-4 — Parallel worker orphaned resource cleanup
In plain terms: After a parallel worker panics mid-refresh, advisory locks, __pgt_delta_* temp tables, and partially written change buffer rows may be left behind. The scheduler recovery path must clean these up.
Audit the recovery path to ensure: (a) advisory locks are released on next
scheduler tick, (b) temp tables are cleaned up, (c) change buffer rows are
not double-counted on retry. Verify: E2E test simulating worker crash via
pg_terminate_backend() followed by successful recovery.
Dependencies: None. Schema change: No.
STAB-5 — Upgrade migration test (0.17→0.18)
Extend the upgrade E2E test framework to cover the 0.17.0→0.18.0 migration path and the three-version chain 0.16→0.17→0.18. Verify: catalog column additions, new function signatures, existing stream tables survive, refresh continues working post-upgrade. Dependencies: all schema-changing items (CORR-1). Schema change: No.
STAB-6 — Error SQLSTATE coverage audit
Audit all ereport!() and error!() calls for SQLSTATE classification.
Ensure every user-facing error has a unique, documented SQLSTATE code that
connection poolers and application retry logic can pattern-match. Cross-
reference with docs/ERRORS.md for completeness.
Dependencies: None. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | Z-set multi-source delta engine | L | P0 |
| PERF-2 | Cost-based refresh strategy completion | L | P1 |
| PERF-3 | Zero-change source branch elision | M | P1 |
| PERF-4 | Columnar change tracking Phase 1 — CDC bitmask | L | P1 |
| PERF-5 | Index hint generation for MERGE target | S | P2 |
PERF-1 — Z-set multi-source delta engine (B3-MERGE)
In plain terms: When a stream table joins multiple tables and more than one of those tables receives changes in the same scheduler cycle, the current engine generates one delta branch per source and stacks them in a UNION ALL. With this change those branches are merged into a single GROUP BY + SUM(weight) query using Z-set algebra, eliminating duplicate evaluation of shared join paths. B3-1 (branch pruning) and B3-3 (correctness proofs) are already done; this is the final payoff.
| Item | Description | Effort | Ref |
|---|---|---|---|
| B3-2-1 | Z-set merged-delta generation in src/dvm/diff.rs (DiffEngine::diff_node()) | 8–10h | PLAN_MULTI_TABLE_DELTA_BATCHING.md |
| B3-2-2 | Unit + property-based tests (existing B3-3 diamond-flow tests must pass unchanged) | 2–4h | — |
| B3-2-3 | Benchmark regression check against Part-8 baseline | 2h | — |
B3-MERGE subtotal: ~12–16 hours Dependencies: CORR-4 (property tests must accompany). Schema change: No.
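A schematic of the merged delta for a two-source join stream table over orders ⋈ items. Each branch produces (key, weight) rows in the classic bilinear expansion; merging collapses them into one Z-set aggregation before the MERGE step. All table and column names are illustrative, and weight propagation through the join is simplified.

```sql
-- After B3-MERGE: one aggregation instead of executing each UNION ALL
-- branch independently against the target.
SELECT o_id, item, SUM(weight) AS weight
FROM (
  SELECT d.o_id, i.item, d.weight            -- Δorders ⋈ items
  FROM delta_orders d JOIN items i ON i.id = d.item_id
  UNION ALL
  SELECT o.o_id, di.item, di.weight          -- orders' ⋈ Δitems
  FROM orders o JOIN delta_items di ON di.id = o.item_id
) branches
GROUP BY o_id, item
HAVING SUM(weight) <> 0;   -- rows whose net weight cancels never reach the MERGE
```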
PERF-2 — Cost-based refresh strategy completion (B-4 remainder)
In plain terms: Deferred from v0.17.0. The refresh_strategy GUC landed in the current cycle. The remaining work is the per-ST cost model: collect delta_row_count, merge_duration_ms, full_refresh_duration_ms from pgt_refresh_history; fit a simple linear cost model; a cold-start heuristic (<10 refreshes) falls back to the fixed threshold.
Verify: mixed-workload benchmark showing the model picks the cheaper strategy ≥80% of the time. Dependencies: B-4 Phase 1 (shipped). Schema change: No.
PERF-3 — Zero-change source branch elision
In plain terms: When building a multi-source delta query, skip branches entirely for sources with empty change buffers. Currently all branches are generated and executed regardless of whether a source has changes.
Verify: benchmark showing latency reduction when 1-of-3 sources changes vs. all 3 changing. Dependencies: PERF-1 (applies to the merged delta builder). Schema change: No.
PERF-4 — Columnar change tracking Phase 1 — CDC bitmask (A-2-COL-1)
In plain terms: Deferred from v0.17.0. Compute a changed_columns bitmask (old.col IS DISTINCT FROM new.col) in the CDC trigger; store as int8 or bit(n) alongside the change row. Phase 1 only: bitmask computation + storage. Phase 2 (delta-scan filtering using the bitmask) is deferred to v0.22.0. Provides the foundation for a 50–90% delta volume reduction on wide-table UPDATE workloads.
Gate behind pg_trickle.columnar_tracking GUC (default off).
Dependencies: None. Schema change: Yes (change buffer schema addition).
PERF-5 — Index hint generation for MERGE target
In plain terms: When the stream table has a covering index on the MERGE join keys, bias the planner toward the index to avoid expensive sequential scans during delta application on large stream tables.
Emit SET enable_seqscan = off within the MERGE statement's session.
Verify: EXPLAIN ANALYZE shows index scan on MERGE for tables with PK index.
Dependencies: None. Schema change: No.
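The planner bias can be scoped tightly with SET LOCAL, which confines the setting to the surrounding transaction rather than the whole session. The delta-table name and column set below are illustrative assumptions.

```sql
BEGIN;
SET LOCAL enable_seqscan = off;   -- bias toward the PK/covering index; reverts at COMMIT
MERGE INTO active_orders t
USING __pgt_delta d ON t.id = d.id
WHEN MATCHED AND d.weight < 0 THEN DELETE
WHEN MATCHED THEN UPDATE SET status = d.status
WHEN NOT MATCHED THEN INSERT (id, status) VALUES (d.id, d.status);
COMMIT;
```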
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Change buffer growth stress test at 10× write rate | M | P1 |
| SCAL-2 | Parallel worker utilization profiling at 200+ STs | M | P2 |
| SCAL-3 | Delta working-set memory cap | M | P2 |
SCAL-1 — Change buffer growth stress test at 10× write rate
Run a sustained write load at 10× normal throughput for 30+ minutes with
intentionally slow refresh intervals. Verify the max_buffer_rows cap
triggers correctly, FULL refresh clears the backlog, no disk exhaustion
occurs, and the extension recovers cleanly once write rate normalizes. This
validates the v0.16.0 buffer growth protection under extreme conditions.
Dependencies: None. Schema change: No.
SCAL-2 — Parallel worker utilization profiling at 200+ STs
Profile the scheduler with 200+ stream tables across
pg_trickle.max_workers = 4/8/16 settings. Measure: CPU utilization per
worker, scheduling queue depth, per-ST refresh latency P50/P99. Identify
whether the scheduling loop itself becomes a bottleneck before worker
saturation. Document findings as a scaling guide section.
Dependencies: None. Schema change: No.
SCAL-3 — Delta working-set memory cap
The current delta merge can allocate unbounded work_mem for hash joins. Add
a configurable cap (pg_trickle.delta_work_mem_mb, default: 256 MB) that
triggers FULL refresh fallback when the delta working set would exceed the
limit, preventing OOM on unexpectedly large deltas. Verify: E2E test with low
cap triggers fallback and logs a warning.
Dependencies: None. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | Template cache observability | S | P1 |
| UX-2 | Pre-built Grafana dashboard panels | M | P1 |
| UX-3 | Error message actionability audit | S | P1 |
| UX-4 | Single-endpoint health summary function | S | P2 |
| UX-5 | Prometheus metric completeness audit | XS | P2 |
| UX-6 | TUI surfaces for cache_stats and health_summary | XS | P2 |
UX-1 — Template cache observability (CACHE-OBS)
In plain terms: The delta SQL template cache (IVM_DELTA_CACHE) saves regenerating delta queries on every refresh cycle, but its hit rate is invisible to operators. Adding pgtrickle.cache_stats() lets you see whether the cache is effective and tune pg_trickle.ivm_cache_size accordingly.
| Item | Description | Effort |
|---|---|---|
| CACHE-OBS-1 | Add hit/miss/eviction counters to IVM_DELTA_CACHE | 1h |
| CACHE-OBS-2 | Expose via pgtrickle.cache_stats() returning (hits BIGINT, misses BIGINT, evictions BIGINT, size INT) | 1–2h |
| CACHE-OBS-3 | Documentation and E2E smoke test | 1h |
CACHE-OBS subtotal: ~3–4 hours Dependencies: None. Schema change: No.
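Given the signature proposed in CACHE-OBS-2, an operator could derive a hit rate directly; the 0.9 tuning threshold in the comment is an illustrative rule of thumb, not a shipped default.

```sql
SELECT hits::numeric / NULLIF(hits + misses, 0) AS hit_rate,
       evictions,
       size
FROM pgtrickle.cache_stats();
-- a hit_rate well below 0.9 with nonzero evictions suggests raising
-- pg_trickle.ivm_cache_size
```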
UX-2 — Pre-built Grafana dashboard panels
Extend monitoring/grafana/ with import-ready JSON panels for: refresh
latency P50/P99 histogram, differential vs. FULL refresh ratio over time,
change buffer backlog per stream table, spill event count, template cache hit
rate, and worker utilization gauge. Document import instructions in
monitoring/README.md.
Dependencies: UX-1 (cache stats metric), STAB-3 (spill events). Schema change: No.
UX-3 — Error message actionability audit
Audit all PgTrickleError variants and ereport!()/error!() calls. Ensure
every user-facing error includes: the stream table name (when applicable), the
operation that failed, and a 1-sentence remediation hint. Cross-reference with
docs/ERRORS.md; add missing entries.
Dependencies: None. Schema change: No.
UX-4 — Single-endpoint health summary function
New pgtrickle.health_summary() function returning a single-row JSONB:
total STs, healthy/degraded/error counts, oldest un-refreshed age, largest
buffer backlog, fuse status, scheduler state. Useful for monitoring
integrations (Nagios, Datadog) without parsing multiple views.
Dependencies: None. Schema change: No.
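A monitoring probe against the proposed single-row JSONB might look like this; the key names are assumptions derived from the field list above, not a finalized schema.

```sql
SELECT s ->> 'scheduler_state'    AS scheduler_state,
       (s ->> 'error_count')::int AS error_sts
FROM pgtrickle.health_summary() AS s;
-- a Nagios/Datadog check can alert on error_sts > 0
-- or scheduler_state <> 'running'
```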
UX-5 — Prometheus metric completeness audit
Verify every metric emitted by the extension matches the documented name in
docs/CONFIGURATION.md §Prometheus. Remove undocumented metrics or add
documentation. Ensure metric names follow Prometheus naming conventions
(pgtrickle_* prefix, snake_case, unit suffix).
Dependencies: None. Schema change: No.
UX-6 — TUI surfaces for cache_stats and health_summary
In plain terms: The new pgtrickle.cache_stats() (UX-1) and pgtrickle.health_summary() (UX-4) functions are useful in isolation but are most discoverable when surfaced in the TUI. Even a read-only status panel showing total STs, healthy/degraded/error counts, cache hit rate, and scheduler state would make these endpoints visible to users who reach the extension through pgtrickle-tui rather than raw SQL. Audit pgtrickle-tui/src/ to identify the lightest-weight integration point (likely a new "Health" tab or an expanded "Status" panel). If TUI changes are out of scope for this release, document the gap in docs/TUI.md so it is not silently deferred.
Verify: TUI displays non-zero cache stats and a valid health JSONB row after at least one refresh cycle in the E2E playground environment. Dependencies: UX-1, UX-4. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | TPC-H regression baseline | XS | P0 |
| TEST-2 | SQLancer fuzzing — crash-test oracle | L | P1 |
| TEST-3 | CDC edge cases: NULL PKs, composite PKs, generated columns | M | P1 |
| TEST-4 | Property-based tests for Z-set merged delta | M | P0 |
| TEST-5 | Light E2E eligibility audit | S | P2 |
| TEST-6 | Three-version upgrade chain test (0.16→0.17→0.18) | S | P0 |
| TEST-7 | dbt integration regression coverage | S | P1 |
TEST-1 — TPC-H regression baseline (TPCH-BASE) Same as CORR-2. Capture known-good outputs; verify guard fires on deliberate regression. Dependencies: None. Schema change: No.
TEST-2 — SQLancer fuzzing — crash-test oracle
In plain terms: Deferred from v0.17.0 (second time). Scope reduced to crash-test oracle only for v0.18.0: SQLancer in Docker, configured to feed randomized SQL to the parser and DVM pipeline. Zero-panic guarantee — any input that crashes the extension is a bug. Equivalence oracle (DIFFERENTIAL ≡ FULL) and stateful DML fuzzing deferred to v0.22.0.
Verify: 10K+ fuzzed queries with zero panics. Dependencies: None. Schema change: No.
TEST-3 — CDC edge cases: NULL PKs, composite PKs, generated columns
Create E2E tests covering: (a) tables with nullable PK columns in
differential mode, (b) composite PKs with 3+ columns, (c) GENERATED ALWAYS AS stored columns as source columns, (d) domain-typed columns, (e)
array-typed columns referenced in defining queries.
Dependencies: None. Schema change: No.
TEST-4 — Property-based tests for Z-set merged delta Required companion to PERF-1. proptest-based tests generating random multi-source DAGs (2–5 sources, 1–3 join levels) with random DML sequences. Assert merged delta produces identical stream table state as sequential per-branch application. Detect weight-accounting bugs before they ship. Dependencies: PERF-1. Schema change: No.
TEST-5 — Light E2E eligibility audit
Review all 10 full E2E test files (~90 tests). Identify tests that don't
require custom Docker image features (custom extensions, special
configurations) and can run on the stock postgres:18.3 image. Migrate
eligible tests to reduce CI wall-clock time on PRs.
Dependencies: None. Schema change: No.
TEST-6 — Three-version upgrade chain test (0.16→0.17→0.18) Extend upgrade E2E tests to cover: fresh install of 0.16.0, create stream tables, upgrade to 0.17.0, verify survival, upgrade to 0.18.0, verify survival + new features functional. Dependencies: All schema-changing items. Schema change: No.
TEST-7 — dbt integration regression coverage
In plain terms: The dbt-pgtrickle macro package is the primary adoption vector for teams using dbt, but the integration test suite in dbt-pgtrickle/integration_tests/ currently verifies only happy-path macro expansion. Add regression tests covering: (a) the pgtrickle_stream_table macro with all supported materialisation strategies (differential, full, auto), (b) incremental model compatibility, (c) the pgtrickle_status test macro, (d) teardown and recreation idempotency (drop + re-run produces identical output). Run as part of just test-dbt.
Verify: just test-dbt passes all new cases; idempotency test confirms
identical stream table contents after a full dbt run --full-refresh cycle.
Dependencies: None. Schema change: No.
Conflicts & Risks
-
PERF-1 + CORR-4 + TEST-4 form a mandatory cluster. The Z-set multi-source delta engine (B3-MERGE) is the highest-impact performance item but also touches the DVM engine core. Property-based tests (TEST-4) and the weight accounting proof (CORR-4) are not optional — they must ship alongside PERF-1 to prevent correctness regressions.
-
Two schema changes. CORR-1 (CSS-3) adds pgt_css_watermark_lsn to the catalog. PERF-4 (A-2-COL-1) adds changed_columns to change buffer tables. Both require upgrade migration scripts and freeze-risk coordination. Consider batching both into a single migration file.
-
PERF-3 depends on PERF-1. Zero-change branch elision modifies the same delta query builder as B3-MERGE. Sequence PERF-3 strictly after PERF-1 to avoid merge conflicts and compound risk.
-
TEST-2 (SQLancer) is deferred for the second time. Originally planned for v0.17.0, it remains unstarted. v0.18.0 scopes it to crash-test oracle only (L effort instead of XL), but there is a risk of perpetual deferral. If capacity is tight, prioritize the crash-test oracle as a standalone deliverable rather than deferring the full suite again.
-
PERF-2 (cost model) requires production history data. The per-ST cost model trains on pgt_refresh_history. Users upgrading from v0.17.0 will have a cold history cache. The cold-start heuristic (< 10 refreshes) is critical — test it explicitly.
-
PERF-4 (columnar tracking) changes CDC trigger output. The changed_columns bitmask adds overhead to every trigger invocation. Gate behind a GUC (default off) and benchmark the per-row overhead (< 1μs target) before enabling by default in a later release.
-
B-4 and A-2-COL are carry-overs from v0.17.0. Both were originally scoped for v0.17.0 but not started. They are re-proposed here with reduced scope (B-4 cost model only, A-2-COL Phase 1 bitmask only). If v0.17.0 ships B-4 partially, adjust PERF-2 scope accordingly.
v0.18.0 total: ~70–100 hours
Exit criteria:
- CORR-1: Split-snapshot E2E test passes under concurrent writes; pgt_css_watermark_lsn column added
- CORR-2 / TEST-1: TPC-H baseline populated; deliberate regression detected by the guard
- CORR-3: NULL-keyed GROUP BY group fully removed after all-row delete
- CORR-4 / TEST-4: Property-based Z-set weight tests pass for randomly generated multi-source DAGs
- CORR-5: HAVING-qualified group deleted from stream table when row count drops below threshold
- STAB-1: All production-path unwrap() calls in api.rs and refresh.rs replaced with proper error propagation
- STAB-2: unsafe_inventory.sh reports ≥69 fewer unsafe blocks; CI baseline updated
- STAB-3: Spill alert fires in E2E test with artificially low threshold
- STAB-4: Worker crash recovery E2E test cleans up advisory locks, temp tables, and buffer rows
- STAB-5 / TEST-6: Three-version upgrade chain (0.16→0.17→0.18) passes
- STAB-6: All user-facing errors have documented SQLSTATE codes in docs/ERRORS.md
- PERF-1: Merged multi-source delta implemented; all B3-3 diamond-flow property tests pass unchanged
- PERF-2: Cost model picks cheaper strategy ≥80% of the time on mixed workload benchmark
- PERF-3: Zero-change branch elision shows measurable latency reduction in multi-source benchmark
- PERF-4: changed_columns bitmask stored in change buffer; per-row overhead < 1μs
- PERF-5: Index scan confirmed via EXPLAIN ANALYZE for MERGE on tables with PK covering index
- SCAL-1: Buffer growth stress test at 10× rate completes without disk exhaustion or data loss
- SCAL-2: Profiling report for 200+ STs documented
- SCAL-3: Delta work_mem cap triggers FULL fallback in E2E test
- UX-1: pgtrickle.cache_stats() returns correct counters in smoke test
- UX-2: Grafana dashboard JSON importable; documents refresh latency, buffer backlog, spill events
- UX-3: Error message audit complete; all errors include table name and remediation hint
- UX-4: pgtrickle.health_summary() returns single-row JSONB with correct counts
- UX-5: Prometheus metric names match documentation; no undocumented metrics
- TEST-2: SQLancer crash-test oracle runs 10K+ fuzzed queries with zero panics
- TEST-3: CDC edge case tests cover NULL PKs, composite PKs, generated columns, domain types, arrays
- TEST-5: At least 10 tests migrated from full E2E to light E2E
- TEST-7: dbt regression suite covers all macro strategies and teardown idempotency; just test-dbt passes
- UX-6: TUI (or docs/TUI.md gap note) reflects cache_stats() and health_summary() availability
- Extension upgrade path tested (0.17.0 → 0.18.0)
- just check-version-sync passes
v0.19.0 — Production Gap Closure & Distribution
Status: Released (2026-04-13).
Release Theme
This release closes the most impactful correctness, security, stability, and performance gaps identified in the Phase 7 deep-dive and subsequent audits that v0.18.0 did not address. It removes the unsafe delete_insert merge strategy, adds ownership checks to all DDL-like API functions, hardens the WAL decoder path before it is promoted to production-ready, eliminates O(n²) scheduler dispatch overhead, and ships pg_trickle on standard package registries for the first time. The JOIN delta R₀ fix for simultaneous key-change + right-side delete is the highest-value correctness improvement remaining before 1.0. CDC ordering guarantees, parallel worker crash recovery, delta branch pruning for zero-change sources, and an index-aware MERGE path round out a release that strengthens every layer of the stack. Four to five weeks of focused work delivers measurable correctness improvements, privilege enforcement, catalog index optimizations, a PgBouncer transaction-mode compatibility fix, read-replica safety, and PGXN/apt/rpm distribution.
Completed items (click to expand)
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | Remove unsafe delete_insert merge strategy | XS | P0 |
| CORR-2 | JOIN delta R₀ fix — key change + right-side delete | M | P1 |
| CORR-3 | Track ALTER TYPE / ALTER DOMAIN DDL events | S | P1 |
| CORR-4 | Track ALTER POLICY DDL events for RLS source tables | S | P1 |
| CORR-5 | Fix keyless content-hash collision on identical-content rows | S | P1 |
| CORR-6 | Harden guarded .unwrap() calls in DVM operators | XS | P2 |
| CORR-7 | TRUNCATE + INSERT CDC ordering guarantee | S | P1 |
| CORR-8 | NULL join-key delta handling for INNER/OUTER joins | S | P1 |
CORR-1 — Remove unsafe delete_insert merge strategy
In plain terms: The delete_insert strategy (set via pg_trickle.merge_join_strategy = 'delete_insert') is semantically unsafe for aggregate and DISTINCT queries because the DELETE half executes against already-mutated state, producing phantom deletes. It is slower than standard MERGE for small deltas and incompatible with prepared statements. The auto strategy already covers its only legitimate use case.
| Item | Description | Effort |
|---|---|---|
| CORR-1-1 | Remove delete_insert as a valid enum value; emit ERROR if set with hint to use 'auto'. | XS |
| CORR-1-2 | Add upgrade SQL to detect old GUC value and log a NOTICE. | XS |
Verify: SET pg_trickle.merge_join_strategy = 'delete_insert' raises ERROR
with actionable hint. All existing benchmarks pass.
Dependencies: None. Schema change: No.
CORR-2 — JOIN delta R₀ fix for simultaneous key-change + right-side delete
In plain terms: When a row's join key column is updated (UPDATE orders SET cust_id = 5 WHERE cust_id = 3) in the same refresh cycle as the old join partner (customer 3) is deleted, the DELETE half of the delta finds no match in current_right and is silently dropped, leaving a stale row in the stream table until the next full refresh. The fix applies the R₀ snapshot technique (pre-change right-side state via EXCEPT ALL) symmetrically with the existing L₀ already implemented for Part 2 of the delta. build_snapshot_sql() in join_common.rs already exists.
| Item | Description | Effort |
|---|---|---|
| CORR-2-1 | Add right_part1_source / use_r0 logic mirroring use_l0 in diff_inner_join, diff_left_join, diff_full_join. | M |
| CORR-2-2 | Split Part 1 SQL into two UNION ALL arms for the use_r0 case; update row ID hashing for Part 1b. | M |
| CORR-2-3 | Integration tests: co-delete scenario, UPDATE-then-delete, multi-cycle correctness, TPC-H Q07 regression. | M |
Verify: E2E test where UPDATE orders SET cust_id = new_id and
DELETE FROM customers WHERE id = old_id land in the same refresh cycle produces
correct stream table result without a forced full refresh.
Dependencies: EC-01 R₀ EXCEPT ALL pattern (shipped in v0.15.0). Schema change: No.
CORR-3 — Track ALTER TYPE / ALTER DOMAIN DDL events
In plain terms: When a user-defined type or domain used by a source table column is altered (e.g., extending an enum, changing a domain constraint), the DDL event trigger fires but hooks.rs does not classify it as requiring downstream stream table invalidation. Fix: extend the DDL classifier to catch ALTER TYPE and ALTER DOMAIN and trigger cascade invalidation.
Verify: ALTER TYPE my_enum ADD VALUE 'new_val' on a type used by a source
column triggers the marked-for-reinit flag on dependent stream tables.
Dependencies: None. Schema change: No.
CORR-4 — Track ALTER POLICY DDL events for RLS source tables
In plain terms: If an ALTER POLICY changes the USING expression on a source table, stream tables may silently return wrong results for sessions with active RLS. Fix: detect ALTER POLICY in the DDL classifier and mark dependent stream tables for conservative reinit.
Verify: ALTER POLICY on a source table with dependent stream tables triggers
invalidation. E2E test with RLS policy change confirms correct reinitialization.
Dependencies: None. Schema change: No.
CORR-5 — Fix keyless content-hash collision on identical-content rows
In plain terms: The keyless table path uses a content hash to identify rows. If two rows have completely identical content, they hash to the same bucket. Under concurrent INSERT + DELETE of identical rows, the net-counting approach may attribute a delete to the wrong "copy" of the row, leaving incorrect counts. Fix: incorporate the change buffer's (lsn, op_index) pair into the hash to break ties between otherwise-identical rows.
Verify: E2E test with two identical rows — insert 2, delete 1 in same cycle; stream table retains exactly 1 row. Dependencies: EC-06 keyless path (shipped in prior release). Schema change: No.
CORR-6 — Harden guarded .unwrap() calls in DVM operators
In plain terms: Several DVM operators use .unwrap() on values that are logically guaranteed by a prior is_some() guard, but the coupling is implicit and fragile — a refactor could silently break the invariant, causing a panic in SQL-reachable code. The most fragile instance is ctx.st_qualified_name.as_deref().unwrap() in filter.rs (line ~130), guarded by has_st, which is derived from is_some() several lines earlier. Replace these patterns with if let Some(…) or .unwrap_or_else(|| …) to make the invariant structurally enforced rather than comment-documented.
Verify: grep -rn '\.unwrap()' src/dvm/operators/ returns zero hits outside
test modules. All existing unit tests pass.
Dependencies: None. Schema change: No.
CORR-7 — TRUNCATE + INSERT CDC ordering guarantee
In plain terms: When a TRUNCATE and a subsequent INSERT occur within the same transaction on a source table, the change buffer must preserve their ordering. If the refresh engine processes the INSERT before the TRUNCATE, the stream table loses all rows including the newly inserted ones. The trigger-based CDC path records operations in ctid order within a statement, but cross-statement ordering within a single transaction relies on the change buffer's op_seq column. Verify that op_seq is monotonically increasing across statements and that the refresh engine applies TRUNCATE before INSERT.
Verify: E2E test: BEGIN; TRUNCATE src; INSERT INTO src VALUES (1); COMMIT;
followed by refresh — stream table contains exactly 1 row.
Dependencies: None. Schema change: No.
CORR-8 — NULL join-key delta handling for INNER/OUTER joins
In plain terms: When a join key column contains NULL, the INNER JOIN delta should produce zero matching rows (NULL ≠ NULL in SQL), and LEFT/FULL OUTER JOIN deltas should produce NULL-extended rows. The v0.18.0 NULL GROUP BY fix addressed aggregate grouping but the JOIN delta path’s NULL-key behavior is exercised only indirectly by existing tests. Add explicit coverage: INSERT a row with NULL join key, UPDATE it to a non-NULL key, DELETE it — verify each delta cycle produces correct results under both INNER and LEFT JOIN.
Verify: E2E tests with NULL join keys for INNER JOIN, LEFT JOIN, and FULL JOIN — all delta cycles produce correct results matching a full recompute. Dependencies: None. Schema change: No.
Security
| ID | Title | Effort | Priority |
|---|---|---|---|
| SEC-1 | Add ownership checks to drop_stream_table / alter_stream_table | S | P0 |
| SEC-2 | SQL injection audit for dynamic refresh SQL | XS | P1 |
SEC-1 — Add ownership checks to drop_stream_table / alter_stream_table
In plain terms: Currently, any role with EXECUTE privilege on pgtrickle.drop_stream_table() or pgtrickle.alter_stream_table() can modify or drop any stream table, regardless of who created it. PostgreSQL convention requires that only the owner (or a superuser) can DROP or ALTER an object. Fix: call pg_class_ownercheck(stream_table_oid, GetUserId()) (or the pgrx-safe equivalent) at the top of both functions and raise ERROR: must be owner of stream table "name" if the check fails. create_stream_table already records the creating role as the table owner in pg_class.
Verify: Non-owner role calling pgtrickle.drop_stream_table('other_users_st')
receives ERROR: must be owner of stream table "other_users_st". Superuser
can still drop any stream table. E2E test with two roles confirms.
Dependencies: None. Schema change: No.
SEC-2 — SQL injection audit for dynamic refresh SQL
In plain terms: The refresh engine builds SQL strings dynamically using format!() with user-provided table names, column names, and schema names. While pgrx's quote_identifier() and quote_literal() are used in most places, a focused audit of every format!() call site in refresh.rs, diff.rs, and the operators/ directory ensures no path allows unquoted user input into executable SQL. This is a review-only item — fix any findings immediately as P0.
Verify: Audit checklist signed off — every format!() that incorporates
catalog-derived names uses quote_identifier() or parameterised SPI queries.
Zero unquoted interpolations outside test code.
Dependencies: None. Schema change: No.
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | PgBouncer transaction-mode compatibility guard | M | P1 |
| STAB-2 | Read-replica / hot-standby safety guard | S | P1 |
| STAB-3 | Elevate Semgrep to blocking in CI | XS | P1 |
| STAB-4 | auto_backoff GUC — double interval after 3 falling-behind cycles | S | P2 |
| STAB-5 | Harden unwrap() in scheduler hot path | XS | P2 |
| STAB-6 | Parallel worker crash recovery sweep | M | P1 |
| STAB-7 | Extension version mismatch detection at load | XS | P2 |
STAB-1 — PgBouncer transaction-mode compatibility guard
In plain terms: In PgBouncer transaction mode, session-level state is lost between transactions because different backend connections may serve the same session. pg_trickle uses transaction-scoped advisory locks, which are safe, but also uses prepared statements and SET LOCAL — both of which fail silently in transaction mode, causing incorrect refresh behavior. Adding a pg_trickle.connection_pooler_mode GUC (none/session/transaction) and disabling prepared statements in transaction mode prevents silent misbehavior.
Verify: integration test with PgBouncer transaction mode confirms refreshes
complete correctly without prepared statement errors.
pg_trickle.connection_pooler_mode = 'transaction' documented in
docs/PRE_DEPLOYMENT.md.
Dependencies: None. Schema change: No.
STAB-2 — Read-replica / hot-standby safety guard
In plain terms: If pg_trickle's background worker accidentally starts on a streaming replica (hot standby), it attempts writes to the catalog and crash-loops. Fix: detect pg_is_in_recovery() at worker startup and exit gracefully with LOG: pg_trickle background worker skipped: server is in recovery mode.
Verify: integration test that simulates a replica environment; background worker exits cleanly with the correct log message. No crash loop. Dependencies: None. Schema change: No.
STAB-3 — Elevate Semgrep to blocking in CI
In plain terms: CodeQL and cargo-deny are already blocking in CI; Semgrep runs as advisory-only. Before v1.0.0, all SAST tooling should be blocking. Verify zero findings across all current rules, then flip the CI step from continue-on-error: true to blocking.
Verify: CI step passes in blocking mode. Zero advisory-only bypasses remain. Dependencies: None. Schema change: No.
STAB-4 — auto_backoff GUC for scheduler overload
In plain terms: EC-11 shipped the scheduler_falling_behind alert but deferred auto-remediation. When a stream table has triggered the alert for 3 consecutive cycles, automatically double the effective refresh interval for that table until the next successful on-time cycle. This prevents a single heavy stream table from starving the rest of the queue.
Verify: E2E test with artificially slow stream table; effective interval
doubles after 3 consecutive falling-behind alerts; returns to original
interval after catching up.
Dependencies: EC-11 scheduler_falling_behind (shipped in v0.18.0). Schema change: No.
STAB-5 — Harden unwrap() in scheduler hot path
In plain terms: The scheduler dispatch loop in scheduler.rs uses eu_dag.units().find(|u| u.id == uid).unwrap() at several call sites (lines ~1522, ~1680, ~1751, ~1811, ~1859, ~1885). While the IDs come from the same DAG and are expected to always match, a stale topo-order after a concurrent DDL change could cause a panic inside the background worker. Fix: replace with .ok_or(PgTrickleError::InternalError("unit not found in DAG"))? or use the HashMap introduced by PERF-5. This eliminates the last unwrap() cluster in the scheduler hot path.
Verify: grep -n '\.unwrap()' src/scheduler.rs returns zero hits outside
test-only code. All scheduler integration tests pass.
Dependencies: PERF-5 (HashMap replaces .find().unwrap() pattern). Schema change: No.
STAB-6 — Parallel worker crash recovery sweep
In plain terms: If a background worker is killed (OOM, SIGKILL) or crashes mid-refresh, it may leave behind: (a) orphaned advisory locks that block the next refresh of that stream table, (b) partially consumed rows in the change buffer (consumed but not committed), or (c) incomplete catalog state. Add a startup recovery sweep to the scheduler: on launch, scan for advisory locks held by PIDs that no longer exist (per pg_stat_activity), roll back any xact_status = 'in progress' from dead backends, and reset stream tables stuck in REFRESHING state with no active backend.
Verify: Integration test: kill a worker PID mid-refresh via
pg_terminate_backend(); restart the scheduler; the affected stream table
recovers without manual intervention within one scheduler cycle.
Dependencies: None. Schema change: No.
STAB-7 — Extension version mismatch detection at load
In plain terms: Running ALTER EXTENSION pg_trickle UPDATE updates the SQL objects, but the shared library (pg_trickle.so) remains loaded from the previous version until the server is restarted. This mismatch can cause subtle failures (wrong function signatures, missing struct fields). Add a version check in _PG_init() that compares the compiled-in version string against the SQL-level extversion from pg_extension. Emit a WARNING if they differ and refuse to start background workers until the server is reloaded.
Verify: After ALTER EXTENSION pg_trickle UPDATE without server restart,
the extension log shows WARNING: pg_trickle shared library version (X) does not match installed extension version (Y) — restart PostgreSQL.
Background workers do not start.
Dependencies: None. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | Fix WAL decoder: old_* columns always NULL on UPDATE | S | P1 |
| PERF-2 | Fix WAL decoder: naive pgoutput action string parsing | S | P1 |
| PERF-3 | EXPLAIN (ANALYZE, BUFFERS) surface for delta SQL in explain_st() | S | P2 |
| PERF-4 | Add catalog indexes on pgt_relid and pgt_dependencies(pgt_id) | XS | P1 |
| PERF-5 | Eliminate O(n²) units().find() in scheduler dispatch | S | P1 |
| PERF-6 | Batch has_table_source_changes() into single query | S | P2 |
| PERF-7 | Delta branch pruning for zero-change sources | S | P1 |
| PERF-8 | Index-aware MERGE path selection | S | P2 |
PERF-1 — Fix WAL decoder: old_* columns always NULL on UPDATE
In plain terms: In WAL-based CDC (pg_trickle.wal_enabled = true), the old_col_* values for UPDATE rows are always NULL because the decoder reads new_tuple for both old and new field positions. This breaks R₀ snapshot construction for the WAL path. Fix: correctly write old_tuple fields to the old_col_* buffer columns for UPDATE events. Currently dormant (only manifests with wal_enabled = true).
Verify: WAL decoder integration test: UPDATE source SET pk = new_pk; assert
old_col_pk IS NOT NULL in the change buffer and equals the pre-update value.
Dependencies: None. Schema change: No.
PERF-2 — Fix WAL decoder: naive pgoutput action string parsing
In plain terms: The WAL decoder parses the action type with starts_with("I"), which incorrectly matches any string beginning with "I" (e.g., "INSERT"). Fix: use an exact single-character comparison (== "I") or parse the action byte directly from the pgoutput message buffer. Currently dormant (only manifests with wal_enabled = true).
Verify: WAL decoder unit tests for each action type using exact-match assertion. Fuzz test with action strings longer than 1 character. Dependencies: None. Schema change: No.
PERF-3 — EXPLAIN (ANALYZE, BUFFERS) in explain_st()
In plain terms: pgtrickle.explain_st(name) returns the delta SQL template without execution statistics. Adding a with_analyze BOOLEAN parameter that runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) on the delta SQL gives operators the plan plus actual row counts and buffer hit/miss data — making slow refresh diagnosis much easier.
Verify: pgtrickle.explain_st('my_st', with_analyze => true) returns JSONB
with Plan, Actual Rows, and Shared Hit Blocks fields. Documented in
docs/SQL_REFERENCE.md.
Dependencies: None. Schema change: No.
PERF-4 — Add catalog indexes on pgt_relid and pgt_dependencies(pgt_id)
In plain terms: pgt_stream_tables has an index on status but not on pgt_relid, which is used in hot-path lookups (WHERE pgt_relid = $1) by DDL hooks, CDC trigger installation, and refresh dependency resolution. pgt_dependencies has an index on source_relid but not on pgt_id, which is used when rebuilding a single stream table's dependency set. Adding these two B-tree indexes eliminates sequential scans on these catalog tables at scale.
Verify: \di pgtrickle.idx_pgt_relid and \di pgtrickle.idx_deps_pgt_id
exist after upgrade. EXPLAIN of SELECT * FROM pgtrickle.pgt_stream_tables WHERE pgt_relid = 12345 shows Index Scan.
Dependencies: None. Schema change: Yes (upgrade SQL adds CREATE INDEX).
PERF-5 — Eliminate O(n²) units().find() in scheduler dispatch
In plain terms: The scheduler dispatch loop calls eu_dag.units().find(|u| u.id == uid) inside iteration over topo_order and ready_queue, causing O(n²) behavior per tick. At 500+ stream tables this adds measurable overhead. Fix: build a HashMap<UnitId, &Unit> once per tick and replace all .find() lookups with O(1) map access.
Verify: Benchmark with 500 stream tables shows tick latency < 1ms (currently
~5–10ms). grep -n 'units().find' src/scheduler.rs returns zero hits.
Dependencies: None. Schema change: No.
PERF-6 — Batch has_table_source_changes() into single query
In plain terms: has_table_source_changes() executes N separate SELECT EXISTS(SELECT 1 FROM changes_<oid> LIMIT 1) SPI queries — one per source table per stream table per scheduler tick. For a stream table with 5 sources, this is 5 SPI round-trips. Batching into a single SELECT unnest(ARRAY[oid1, oid2, ...]) AS oid WHERE EXISTS(...) query, or a single UNION ALL subquery, reduces this to 1 SPI call regardless of source count.
Verify: SPI call count for has_table_source_changes() is 1 regardless of
source table count. Scheduler integration tests pass.
Dependencies: None. Schema change: No.
PERF-7 — Delta branch pruning for zero-change sources
In plain terms: In a multi-source JOIN stream table (SELECT * FROM a JOIN b ON ...), the delta has two arms: Δ_a ⋈ b and a ⋈ Δ_b. If only source a has changes, the second arm (a ⋈ Δ_b) reads an empty change buffer and produces zero rows — but the engine still executes the full SQL, including the join against a. Short-circuit: check has_table_source_changes() per source before building each delta arm, and skip arms where the source has zero changes. For a 5-source star join with only 1 changing source, this eliminates 4 of 5 delta arms entirely.
Verify: Benchmark with 5-source JOIN where only 1 source changes; observe
4 of 5 delta arms skipped in explain_st() output. Refresh latency drops
proportionally.
Dependencies: PERF-6 (batched source-change check). Schema change: No.
PERF-8 — Index-aware MERGE path selection
In plain terms: The MERGE statement used during differential refresh joins the delta against the stream table on __pgt_row_id. If the stream table has a covering index on the row ID column (which pg_trickle creates by default), the planner should use an index nested-loop join. However, PostgreSQL's cost model sometimes prefers a hash join for large deltas. Add a targeted SET LOCAL enable_hashjoin = off within the refresh transaction when the delta cardinality is below a configurable threshold (pg_trickle.merge_index_threshold, default 10,000 rows) to steer the planner toward the index path for small deltas.
Verify: EXPLAIN of the MERGE with delta < 10,000 rows shows Index Nested
Loop instead of Hash Join. Benchmark shows improved P99 latency for small
deltas on large stream tables.
Dependencies: None. Schema change: No.
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Read replica compatibility section in docs/SCALING.md | S | P1 |
| SCAL-2 | Multi-database GUC stub (pg_trickle.database_list) | S | P2 |
| SCAL-3 | CNPG operational runbook in docs/SCALING.md | S | P2 |
| SCAL-4 | Partitioned source table impact assessment | M | P2 |
SCAL-1 — Read replica compatibility documentation
In plain terms: The background worker now safely skips on replicas (STAB-2), but the interaction with read replicas for query offloading deserves its own documentation section. Add docs/SCALING.md §Read Replicas covering: which queries are safe on a replica, how pg_is_in_recovery() is used by the extension, and the recommended architecture for OLAP read-offload alongside pg_trickle stream tables.
Verify: docs/SCALING.md has a dedicated replica section.
Dependencies: STAB-2. Schema change: No.
SCAL-2 — Multi-database GUC stub
In plain terms: Post-1.0 multi-database support requires catalog changes. This item adds only the pg_trickle.database_list TEXT GUC declaration, with a default of '' (current database only) and a startup WARNING if set. This reserves the configuration namespace and lets operators test the GUC surface before the full feature ships.
Verify: SHOW pg_trickle.database_list returns ''. Setting a non-empty
value emits a WARNING: "pg_trickle.database_list is not yet implemented."
Dependencies: None. Schema change: No.
SCAL-3 — CNPG operational runbook in docs/SCALING.md
In plain terms: The CNPG (CloudNativePG) smoke test in CI validates that pg_trickle loads and functions on a CNPG-managed cluster, but the operational patterns are not documented. Add a §CNPG / Kubernetes section to docs/SCALING.md covering: cluster-example.yaml annotations for loading the extension, pod restart behavior when the background worker crashes, WAL volume sizing for CDC, recommended shared_preload_libraries configuration, and health check integration with Kubernetes liveness/readiness probes.
Verify: docs/SCALING.md has a CNPG/Kubernetes section. Content reviewed
against actual CNPG deployment behavior.
Dependencies: None. Schema change: No.
SCAL-4 — Partitioned source table impact assessment
In plain terms: Stream tables backed by partitioned source tables (inheritance or declarative partitioning) are untested and likely broken: CDC triggers may be installed only on the parent, change buffers may miss partition-routed inserts, and ALTER TABLE ... ATTACH/DETACH PARTITION DDL events are unhandled. This item is a time-boxed spike (2 days): create a partitioned source, attach a stream table, run INSERT/UPDATE/DELETE through various partitions, and document what works, what breaks, and what the fix scope is. Output: a plans/PLAN_PARTITIONING_SPIKE.md update.
Verify: Spike report documents concrete findings. At minimum: which operations work, which fail, and a rough estimate for full partitioning support. Dependencies: None. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | PGXN release_status → "stable" | XS | P1 |
| UX-2 | Automated Docker Hub release pipeline | S | P1 |
| UX-3 | apt/rpm packaging via PGDG | M | P1 |
| UX-4 | Connection pooler compatibility guide in docs/PRE_DEPLOYMENT.md | S | P1 |
| UX-5 | pgtrickle.write_and_refresh(dml_sql TEXT, st_name TEXT) | S | P2 |
| UX-6 | Change drop_stream_table cascade default to false | XS | P1 |
| UX-7 | Resolve OIDs to table names in error messages | S | P1 |
| UX-8 | Emit NOTICE when refresh_stream_table is skipped | XS | P1 |
| UX-9 | Fix CONFIGURATION.md TOC gaps for 3 undocumented GUCs | XS | P2 |
| UX-10 | TUI per-table refresh latency sparkline | S | P2 |
| UX-11 | pgtrickle.version() diagnostic function | XS | P2 |
UX-1 — PGXN release_status → "stable"
In plain terms: pg_trickle's `META.json` uses `release_status: "testing"`. Flipping to `"stable"` signals production-readiness, enabling the extension to appear in the main PGXN package listing and in downstream package managers that consume the PGXN stable feed. One field change in `META.json`.
Verify: META.json "release_status": "stable". Published PGXN listing
reflects the change after the next PGXN sync.
Dependencies: None. Schema change: No.
UX-2 — Automated Docker Hub release pipeline
In plain terms: Automate publishing `pgtrickle/pg_trickle:<ver>-pg18` and `pgtrickle/pg_trickle:latest` on every tagged release. Wire the existing `Dockerfile.hub` into the GitHub Actions release workflow via `docker/build-push-action`. The `latest` tag tracks the highest non-prerelease version.
Verify: After a test release tag, Docker Hub shows the correct image.
docker pull pgtrickle/pg_trickle:0.19.0-pg18 succeeds and passes the
smoke test.
Dependencies: Dockerfile.hub (already exists). Schema change: No.
UX-3 — apt/rpm packaging via PGDG
In plain terms: PostgreSQL users install extensions via `apt install postgresql-18-pg-trickle` or `dnf install pg_trickle_18`. Submit package specs to pgrpms.org (rpm) and the PGDG apt repository (deb). Generate packages from the GitHub release tarball. This is the highest-impact distribution improvement available.
Verify: apt install postgresql-18-pg-trickle works on Ubuntu 24.04.
dnf install pg_trickle_18 works on RHEL 9. Both pass verify_install.sql.
Dependencies: None. Schema change: No.
UX-4 — Connection pooler compatibility guide
In plain terms: Add a dedicated section to `docs/PRE_DEPLOYMENT.md` covering: PgBouncer session mode (fully compatible), PgBouncer transaction mode (set `pg_trickle.connection_pooler_mode = 'transaction'`), pgpool-II (session mode only), PgCat (session mode only). Include a compatibility matrix and `postgresql.conf` + PgBouncer config snippets.
Verify: PRE_DEPLOYMENT.md pooler section reviewed by a DBA familiar with PgBouncer. All described modes are tested or explicitly marked "untested." Dependencies: STAB-1. Schema change: No.
UX-5 — pgtrickle.write_and_refresh() convenience function
In plain terms: In DIFFERENTIAL mode, a write followed by `refresh_stream_table()` requires two API calls. A single function that executes the DML and triggers a refresh atomically simplifies read-your-writes patterns for applications that need immediate consistency without the overhead of IMMEDIATE mode.
Verify: SELECT pgtrickle.write_and_refresh('INSERT INTO src VALUES (1)', 'my_st')
executes the INSERT and refreshes the stream table. Documented in
docs/SQL_REFERENCE.md.
Dependencies: None. Schema change: No.
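One possible shape for the function, assuming the existing `pgtrickle.refresh_stream_table()` API. This is a sketch, not the planned implementation; the PL/pgSQL body and function name come from the item description above:

```sql
-- Sketch only: a PL/pgSQL shape for the proposed convenience function.
CREATE FUNCTION pgtrickle.write_and_refresh(dml_sql TEXT, st_name TEXT)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
    EXECUTE dml_sql;                                  -- run the caller's DML
    PERFORM pgtrickle.refresh_stream_table(st_name);  -- then refresh the ST
END;
$$;

-- Read-your-writes in one call:
-- SELECT pgtrickle.write_and_refresh('INSERT INTO src VALUES (1)', 'my_st');
```

Because a PL/pgSQL function runs inside the caller's transaction, the DML and the refresh commit or roll back together. Note that `EXECUTE` of caller-supplied SQL would need the same injection scrutiny as SEC-2.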
UX-6 — Change drop_stream_table cascade default to false
In plain terms: `pgtrickle.drop_stream_table(name, cascade)` currently defaults `cascade` to `true`. This violates the PostgreSQL convention where `DROP` defaults to `RESTRICT` and `CASCADE` must be explicit. A user calling `SELECT pgtrickle.drop_stream_table('my_st')` may inadvertently cascade-drop dependent stream tables. Fix: change the default to `false` (RESTRICT). This is a behavior change — existing scripts that rely on the implicit cascade must add `cascade => true` explicitly.
Verify: SELECT pgtrickle.drop_stream_table('parent_st') returns an error
when parent_st has dependents. SELECT pgtrickle.drop_stream_table('parent_st', cascade => true) succeeds. Documented in CHANGELOG as a breaking change.
Dependencies: None. Schema change: No (function signature change only).
UX-7 — Resolve OIDs to table names in error messages
In plain terms: `UpstreamTableDropped(u32)` and `UpstreamSchemaChanged(u32)` display raw PostgreSQL OIDs (e.g., `"upstream table dropped: OID 16384"`). Users cannot easily map OIDs to table names. Fix: resolve the OID to `schema.table` via `pg_class` at error-construction time, or store the name alongside the OID. If the table is already dropped, fall back to `"OID <oid> (table no longer exists)"`.
Verify: UpstreamTableDropped error message shows "upstream table dropped: public.orders" instead of raw OID. Fallback tested with a pre-dropped table.
Dependencies: None. Schema change: No.
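The lookup itself is a straightforward catalog join; a minimal sketch (the literal OID stands in for the value carried by the error):

```sql
-- Resolve an OID to schema.table at error-construction time.
-- Returns zero rows if the table is already dropped, which is the
-- signal to use the "OID <oid> (table no longer exists)" fallback.
SELECT n.nspname || '.' || c.relname AS qualified_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.oid = 16384;  -- OID from the error
```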
UX-8 — Emit NOTICE when refresh_stream_table is skipped
In plain terms: When `refresh_stream_table()` encounters a `RefreshSkipped` condition (e.g., no changes detected, another refresh already in progress), it currently logs at `debug1` level and returns success — invisible to the caller at default log levels. Fix: emit a PostgreSQL `NOTICE` (visible to the calling session) in addition to the `debug1` log, so the caller knows the refresh did not execute.
Verify: SELECT pgtrickle.refresh_stream_table('my_st') with no pending
changes emits NOTICE: refresh skipped for "my_st": no changes detected.
Visible in psql output.
Dependencies: None. Schema change: No.
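The visibility difference is easy to check from psql; a minimal sketch, where the message wording mirrors the Verify step and is illustrative:

```sql
-- A NOTICE reaches the calling session at default client_min_messages,
-- unlike a debug1 log line, which stays in the server log.
DO $$
BEGIN
    RAISE NOTICE 'refresh skipped for "%": no changes detected', 'my_st';
END;
$$;
```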
UX-9 — Fix CONFIGURATION.md TOC gaps
In plain terms: Three GUCs (`delta_work_mem_cap_mb`, `volatile_function_policy`, `unlogged_buffers`) have full documentation sections in `docs/CONFIGURATION.md` but are missing from the table of contents navigation at the top of the file. Additionally, there is a duplicate "Guardrails" entry in the TOC. Fix: add the missing TOC entries and remove the duplicate.
Verify: All ### pg_trickle.* headings in CONFIGURATION.md have a
corresponding TOC link. No duplicate entries.
Dependencies: None. Schema change: No.
UX-10 — TUI per-table refresh latency sparkline
In plain terms: The `pgtrickle` TUI dashboard shows each stream table's current status and last refresh duration, but operators cannot see at a glance whether latency is trending up or down. Add a sparkline column (last 20 refresh latencies, ~80 chars wide) to the stream table list view. The data is already available in `pgt_refresh_history`; the TUI polls it on each tick. This makes performance degradation and recovery immediately visible without switching to Grafana.
Verify: TUI stream table view shows a sparkline column. Sparkline updates
after each refresh cycle. Values match pgt_refresh_history entries.
Dependencies: None. Schema change: No.
UX-11 — pgtrickle.version() diagnostic function
In plain terms: A `SELECT pgtrickle.version()` function that returns the installed extension version, the shared library version, and the target PostgreSQL major version as a composite record. This is standard practice for PostgreSQL extensions (cf. `postgis_full_version()`) and simplifies remote diagnostics — support can ask a user to run one query instead of checking `pg_available_extensions`, `pg_config`, and `SHOW server_version` separately.
Verify: SELECT * FROM pgtrickle.version() returns three fields:
extension_version, library_version, pg_major_version. Values match the
installed state.
Dependencies: None. Schema change: No.
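A sketch of the SQL surface, with the column names taken from the Verify step above. The sources for each field are assumptions — in practice the library version would be compiled into the shared object rather than selected from a catalog:

```sql
-- Sketch: composite-returning diagnostic function.
CREATE FUNCTION pgtrickle.version()
RETURNS TABLE (extension_version TEXT, library_version TEXT, pg_major_version TEXT)
LANGUAGE sql STABLE
AS $$
    SELECT
        (SELECT extversion FROM pg_extension WHERE extname = 'pg_trickle'),
        '<reported by the shared library>',  -- placeholder; baked into the .so
        split_part(current_setting('server_version'), '.', 1);
$$;
```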
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | E2E tests for CORR-2 (JOIN delta R₀ fix) | S | P1 |
| TEST-2 | E2E tests for DDL tracking gaps (CORR-3 / CORR-4) | S | P1 |
| TEST-3 | WAL decoder unit tests for PERF-1 / PERF-2 | S | P1 |
| TEST-4 | PgBouncer transaction-mode integration smoke test | M | P1 |
| TEST-5 | Read-replica guard integration test | S | P1 |
| TEST-6 | Ownership-check privilege tests for SEC-1 | S | P1 |
| TEST-7 | Scheduler dispatch benchmark (500+ STs) | S | P1 |
| TEST-8 | Upgrade E2E tests (e2e_migration_tests.rs) | M | P1 |
| TEST-9 | Extract unit-testable logic from E2E-only paths | M | P1 |
| TEST-10 | TPC-H scale factor coverage (SF-1, SF-10) | S | P2 |
TEST-1 — E2E tests for CORR-2 (JOIN delta R₀ fix)
In plain terms: The co-delete scenario (UPDATE join key + DELETE join partner in same cycle) is currently untested. Add three E2E tests: (a) simultaneous key change + right-side delete; (b) UPDATE key + DELETE multiple right-side rows; (c) multi-cycle correctness after the scenario.
Verify: 3 E2E tests in e2e_join_tests.rs. All pass; intermediate full
refresh not required for correctness.
Dependencies: CORR-2. Schema change: No.
TEST-2 — E2E tests for DDL tracking (CORR-3 / CORR-4)
In plain terms: Add E2E tests verifying that `ALTER TYPE`, `ALTER DOMAIN`, and `ALTER POLICY` DDL events correctly trigger stream table invalidation.
Verify: 3 E2E tests (one per DDL type). Stream table state after reinit is correct. Dependencies: CORR-3, CORR-4. Schema change: No.
TEST-3 — WAL decoder unit tests
In plain terms: Add WAL decoder unit tests that explicitly enable `wal_enabled = true` and verify: (a) `old_col_*` values are non-NULL for UPDATE rows; (b) `pk_hash` is non-zero for keyless tables; (c) action string parsing uses exact comparison.
Verify: 5+ unit tests in tests/wal_decoder_tests.rs using Testcontainers
with WAL mode enabled.
Dependencies: PERF-1, PERF-2. Schema change: No.
TEST-4 — PgBouncer transaction-mode smoke test
In plain terms: Start PgBouncer in transaction mode via Testcontainers, connect pg_trickle through it, and run a basic refresh cycle. Verifies that `connection_pooler_mode = 'transaction'` correctly disables prepared statements and refreshes complete without errors.
Verify: integration test passes with PgBouncer transaction mode container. Dependencies: STAB-1. Schema change: No.
TEST-5 — Read-replica guard integration test
In plain terms: Start a streaming replica via Testcontainers, install pg_trickle on the replica, and verify the background worker exits cleanly with the correct log message rather than crash-looping.
Verify: worker log contains "pg_trickle background worker skipped: server is in recovery mode." No ERROR or FATAL in replica logs. Dependencies: STAB-2. Schema change: No.
TEST-6 — Ownership-check privilege tests for SEC-1
In plain terms: Add E2E tests with two PostgreSQL roles: role A creates a stream table, role B (non-superuser, non-owner) attempts to drop and alter it. Verify that role B receives `ERROR: must be owner of stream table`. Also verify that a superuser can drop/alter any stream table regardless of ownership.
Verify: 3 E2E tests (non-owner drop, non-owner alter, superuser override). Dependencies: SEC-1. Schema change: No.
TEST-7 — Scheduler dispatch benchmark (500+ STs)
In plain terms: Add a Criterion benchmark that creates a mock DAG with 500+ stream tables and measures per-tick dispatch latency. This gates PERF-5 (HashMap optimization) and provides a regression baseline for future scheduler changes. The benchmark should run in the existing `benches/` framework.
Verify: cargo bench --bench scheduler_bench runs and reports P50/P99 tick
latency. Baseline saved for Criterion regression gate.
Dependencies: PERF-5. Schema change: No.
TEST-8 — Upgrade E2E tests (e2e_migration_tests.rs)
In plain terms: The upgrade path from 0.18.0 → 0.19.0 is currently tested only by verifying that `ALTER EXTENSION pg_trickle UPDATE` runs without error. There are no tests that verify (a) existing stream tables continue to function after upgrade, (b) the new catalog schema items (DB-2 FK, DB-3 version table, DB-5 history retention) are present and correct, or (c) stream table data is preserved. Add a Testcontainers-based upgrade E2E test.
Verify: tests/e2e_migration_tests.rs tests: fresh install, upgrade from
previous version with populated stream tables, catalog integrity check,
post-upgrade refresh cycle. All pass.
Dependencies: DB-1, DB-2, DB-3. Schema change: No (tests existing schema).
TEST-9 — Extract unit-testable logic from E2E-only paths
In plain terms: Several core functions in `refresh.rs` and `scheduler.rs` are currently exercised only through end-to-end tests that require a PostgreSQL container. Extracting pure logic from SPI-dependent code and adding direct unit tests makes regressions detectable in seconds instead of minutes. Target: identify 5+ functions (refresh strategy selection, delta cardinality estimation, backoff calculation, topo-sort cycle detection, merge strategy costing) that operate on plain Rust data structures and can be tested with `#[cfg(test)]` modules.
Verify: 5+ new #[cfg(test)] unit tests in src/refresh.rs or
src/scheduler.rs. just test-unit runs them in < 5 seconds.
Dependencies: None. Schema change: No.
TEST-10 — TPC-H scale factor coverage (SF-1, SF-10)
In plain terms: The v0.18.0 TPC-H regression guard runs all 22 queries at a single scale factor. Real-world correctness bugs sometimes only manifest at higher cardinalities where hash collisions, sort spill, and parallel execution change the code path. Add nightly runs at SF-1 (6M rows) and SF-10 (60M rows) alongside the existing default. The SF-10 run doubles as a performance soak test — flag any query whose refresh time regresses by more than 20% compared to the previous nightly.
Verify: CI nightly job runs TPC-H at SF-1 and SF-10. All 22 queries produce correct results at both scales. SF-10 timing baseline saved for regression detection. Dependencies: None. Schema change: No.
Schema Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| DB-1 | Fix duplicate 'DIFFERENTIAL' in two CHECK constraints | XS | P0 |
| DB-2 | Add ON DELETE CASCADE FK on pgt_refresh_history.pgt_id | XS | P0 |
| DB-3 | Add pgtrickle.pgt_schema_version version tracking table | XS | P0 |
| DB-4 | Rename pgtrickle_refresh NOTIFY channel → pg_trickle_refresh | XS | P0 |
| DB-5 | pg_trickle.history_retention_days GUC + scheduler daily cleanup | S | P1 |
| DB-6 | Document public API stability contract in docs/SQL_REFERENCE.md | XS | P1 |
| DB-7 | Add migration script template to sql/ | XS | P1 |
| DB-8 | Validate orphan cleanup in drop_stream_table | XS | P1 |
| DB-9 | pgtrickle.migrate() utility function | S | P2 |
DB-1 — Fix duplicate 'DIFFERENTIAL' in CHECK constraints
In plain terms: Both `pgt_stream_tables.refresh_mode` and `pgt_refresh_history.action` have `'DIFFERENTIAL'` listed twice in their CHECK constraints. While logically harmless, it signals sloppiness and produces confusing output in dumps. Both from `REPORT_DB_SCHEMA_STABILITY.md` §3.1.
Verify: \d+ pgtrickle.pgt_stream_tables and \d+ pgtrickle.pgt_refresh_history
show their CHECK constraints with no duplicate values.
Dependencies: None. Schema change: Yes (upgrade SQL drops/recreates constraints).
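The upgrade SQL amounts to a drop-and-recreate per constraint; a sketch for one of the two tables, where the constraint name and the allowed value list are assumptions for illustration:

```sql
-- Upgrade-SQL sketch: recreate the CHECK constraint with each value once.
ALTER TABLE pgtrickle.pgt_stream_tables
    DROP CONSTRAINT IF EXISTS pgt_stream_tables_refresh_mode_check;

ALTER TABLE pgtrickle.pgt_stream_tables
    ADD CONSTRAINT pgt_stream_tables_refresh_mode_check
    CHECK (refresh_mode IN ('FULL', 'DIFFERENTIAL', 'AUTO'));  -- no duplicates
```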
DB-2 — Add ON DELETE CASCADE FK on pgt_refresh_history.pgt_id
In plain terms: `pgt_refresh_history.pgt_id` references `pgt_stream_tables.pgt_id` logically but has no formal FK. When a stream table is dropped, orphan history rows accumulate indefinitely. Adding `FOREIGN KEY (pgt_id) REFERENCES pgtrickle.pgt_stream_tables(pgt_id) ON DELETE CASCADE` cleans up automatically.
Verify: Drop a stream table; SELECT count(*) FROM pgtrickle.pgt_refresh_history WHERE pgt_id = <dropped_id> returns 0.
Dependencies: None. Schema change: Yes.
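A sketch of the upgrade SQL. Pre-existing orphan rows would make `ADD CONSTRAINT` fail validation, so the migration has to sweep them first; the constraint name is an assumption:

```sql
-- Remove orphans left behind by earlier drops, so FK validation succeeds.
DELETE FROM pgtrickle.pgt_refresh_history h
WHERE NOT EXISTS (
    SELECT 1 FROM pgtrickle.pgt_stream_tables s WHERE s.pgt_id = h.pgt_id
);

-- Then add the cascading FK.
ALTER TABLE pgtrickle.pgt_refresh_history
    ADD CONSTRAINT pgt_refresh_history_pgt_id_fkey
    FOREIGN KEY (pgt_id)
    REFERENCES pgtrickle.pgt_stream_tables (pgt_id)
    ON DELETE CASCADE;
```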
DB-3 — Add pgtrickle.pgt_schema_version version tracking table
In plain terms: There is currently no way for migration scripts to verify which schema version is installed before applying changes. Add a `pgt_schema_version(version TEXT PRIMARY KEY, applied_at TIMESTAMPTZ, description TEXT)` table seeded with the current version. Every future migration script will check this table and insert its target version.
Verify: SELECT version FROM pgtrickle.pgt_schema_version ORDER BY applied_at DESC LIMIT 1 returns the current extension version after upgrade.
Dependencies: None. Schema change: Yes.
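A sketch of the table DDL and seed row, following the column list in the item description; the guard pattern in the trailing comment shows how a future migration would use it:

```sql
-- Version-tracking table, per the DB-3 description.
CREATE TABLE pgtrickle.pgt_schema_version (
    version     TEXT PRIMARY KEY,
    applied_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    description TEXT
);

INSERT INTO pgtrickle.pgt_schema_version (version, description)
VALUES ('0.19.0', 'initial version-tracking seed');

-- Future migrations guard themselves like so (illustrative pattern):
-- DO $$ BEGIN
--     IF NOT EXISTS (SELECT 1 FROM pgtrickle.pgt_schema_version
--                    WHERE version = '0.19.0') THEN
--         RAISE EXCEPTION 'expected schema version 0.19.0 before upgrading';
--     END IF;
-- END $$;
```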
DB-4 — Rename pgtrickle_refresh NOTIFY channel → pg_trickle_refresh
In plain terms: Two existing NOTIFY channels use `pg_trickle_*` naming (`pg_trickle_alert`, `pg_trickle_cdc_transition`). The third uses the inconsistent `pgtrickle_refresh` (no separator). Rename it now, while the project is still pre-1.0. Any external `LISTEN pgtrickle_refresh` in application code must be updated. Document as a breaking change in CHANGELOG.
Verify: LISTEN pg_trickle_refresh receives notifications on refresh events.
LISTEN pgtrickle_refresh receives none.
Dependencies: None. Schema change: No (code change only).
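On the application side, the migration is a one-line swap in every listener session:

```sql
-- Application-side migration after the rename.
UNLISTEN pgtrickle_refresh;   -- old channel; no longer notified after upgrade
LISTEN pg_trickle_refresh;    -- new channel, consistent with pg_trickle_*
```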
DB-5 — pg_trickle.history_retention_days GUC + scheduler cleanup
In plain terms: `pgt_refresh_history` has no retention policy. Production deployments refreshing 100+ stream tables on tight schedules will accumulate millions of rows within months. Add a GUC (default: 30 days) and a daily cleanup step in the scheduler: `DELETE FROM pgtrickle.pgt_refresh_history WHERE start_time < now() - make_interval(...)`.
Verify: SET pg_trickle.history_retention_days = 1 and run the cleanup;
rows older than 1 day are removed. Default retains 30 days.
Dependencies: None. Schema change: No (new GUC + cleanup logic only).
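The cleanup statement the scheduler would run daily, sketched in full; reading the retention window via `current_setting()` is illustrative (the worker could equally read the GUC natively):

```sql
-- Daily retention sweep over the refresh history.
DELETE FROM pgtrickle.pgt_refresh_history
WHERE start_time < now() - make_interval(
    days => current_setting('pg_trickle.history_retention_days')::int
);
```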
DB-6 — Document public API stability contract
In plain terms: The stability contract defined in `REPORT_DB_SCHEMA_STABILITY.md` §5 (Tier 1/2/3 surfaces) is not yet published anywhere users can find it. Add a "Stability Guarantees" section to `docs/SQL_REFERENCE.md` covering: which function signatures are stable, which view columns can be added without a major version, and which internal objects may change with migration scripts.
Verify: docs/SQL_REFERENCE.md has a §Stability Guarantees section linked
from the TOC.
Dependencies: None. Schema change: No.
DB-7 — Add migration script template to sql/
In plain terms: The `sql/pg_trickle--0.18.0--0.19.0.sql` file is currently an empty stub. Populate it with: (a) the DB-1 CHECK constraint fixes, (b) the DB-2 FK addition, (c) the DB-3 schema version table creation, and (d) the DB-4 NOTIFY channel rename notice. Also create a reusable migration script template comment header for future versions.
Verify: ALTER EXTENSION pg_trickle UPDATE on a 0.18.0 instance applies
all schema changes correctly. check_upgrade_completeness.sh passes.
Dependencies: DB-1, DB-2, DB-3, DB-4. Schema change: Yes (this IS the migration script).
DB-8 — Validate orphan cleanup in drop_stream_table
In plain terms: When a stream table is dropped, `pgt_change_tracking` rows with the dropped `pgt_id` in `tracked_by_pgt_ids` (a `BIGINT[]` column) may not be cleaned up if the array contains other IDs. Add an explicit sweep: remove the dropped `pgt_id` from all `tracked_by_pgt_ids` arrays; delete rows where the array becomes empty.
Verify: Create a shared-source ST pair, drop one; SELECT * FROM pgtrickle.pgt_change_tracking shows correct state.
Dependencies: None. Schema change: No.
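The sweep in SQL terms, assuming `:dropped_id` stands for the dropped stream table's `pgt_id` (psql-style parameter, for illustration only):

```sql
-- Pull the dropped ID out of every tracking array...
UPDATE pgtrickle.pgt_change_tracking
SET tracked_by_pgt_ids = array_remove(tracked_by_pgt_ids, :dropped_id)
WHERE tracked_by_pgt_ids @> ARRAY[:dropped_id]::BIGINT[];

-- ...then drop rows no stream table tracks anymore.
DELETE FROM pgtrickle.pgt_change_tracking
WHERE tracked_by_pgt_ids = '{}'::BIGINT[];
```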
DB-9 — pgtrickle.migrate() utility function
In plain terms: Add a `pgtrickle.migrate()` SQL function that iterates over all registered stream tables and applies any pending dynamic object migrations (change buffer schema updates, CDC trigger function regeneration). It is called automatically at the end of `ALTER EXTENSION UPDATE` and can also be called manually after an upgrade to repair STs that were being refreshed during the upgrade window.
Verify: SELECT pgtrickle.migrate() completes without error on a fresh
install and after a version upgrade. Returns a summary of migrated objects.
Dependencies: DB-3 (uses schema version to determine needed migrations). Schema change: No.
v0.19.0 total: ~4–5 weeks
Exit criteria:
- CORR-1: `delete_insert` strategy removed; `ERROR` raised on old GUC value
- CORR-2: JOIN delta R₀ fix: `UPDATE key + DELETE partner` in same cycle produces correct stream table result
- CORR-3: `ALTER TYPE` / `ALTER DOMAIN` DDL events trigger stream table invalidation
- CORR-4: `ALTER POLICY` DDL events trigger stream table invalidation
- CORR-5: Keyless content-hash collision test passes with two identical-content rows
- CORR-6: Zero `.unwrap()` in `src/dvm/operators/` outside test modules
- SEC-1: Non-owner `drop_stream_table` / `alter_stream_table` raises `ERROR: must be owner`
- STAB-1: `pg_trickle.connection_pooler_mode` GUC added; transaction mode disables prepared statements
- STAB-2: Background worker exits cleanly on hot standby with correct log message
- STAB-3: Semgrep elevated to blocking; zero findings verified
- STAB-4: `auto_backoff` GUC: interval doubles after 3 consecutive falling-behind alerts
- STAB-5: Zero `.unwrap()` in scheduler hot path outside test modules
- PERF-1: WAL decoder writes correct `old_col_*` values for UPDATE rows
- PERF-2: WAL decoder uses exact action string comparison
- PERF-4: Catalog indexes on `pgt_relid` and `pgt_dependencies(pgt_id)` exist after upgrade
- PERF-5: Zero `units().find()` in scheduler; HashMap-based O(1) lookup
- PERF-6: `has_table_source_changes()` executes a single SPI query regardless of source count
- SCAL-1: `docs/SCALING.md` replica section added
- UX-1: `META.json` `release_status` → `"stable"`; PGXN listing updated
- UX-2: Docker Hub release automation wired in GitHub Actions
- UX-3: apt/rpm packages available via PGDG
- UX-4: `docs/PRE_DEPLOYMENT.md` connection pooler compatibility guide added
- UX-6: `drop_stream_table` defaults to `cascade => false`
- UX-7: `UpstreamTableDropped` / `UpstreamSchemaChanged` show table name instead of raw OID
- UX-8: `refresh_stream_table` emits NOTICE when refresh is skipped
- UX-9: CONFIGURATION.md TOC complete; no duplicate entries
- TEST-1: 3 JOIN delta R₀ E2E tests pass
- TEST-2: 3 DDL tracking E2E tests pass
- TEST-3: 5+ WAL decoder unit tests pass with `wal_enabled = true`
- TEST-4: PgBouncer transaction-mode integration test passes
- TEST-5: Read-replica guard integration test passes
- TEST-6: 3 ownership-check privilege E2E tests pass
- TEST-7: Scheduler dispatch benchmark baseline saved
- TEST-8: Upgrade E2E tests pass (pre- and post-upgrade stream table correctness)
- DB-1: No duplicate `'DIFFERENTIAL'` in CHECK constraints
- DB-2: `pgt_refresh_history.pgt_id` FK with `ON DELETE CASCADE` added
- DB-3: `pgtrickle.pgt_schema_version` table present and seeded
- DB-4: `pgtrickle_refresh` channel renamed to `pg_trickle_refresh`
- DB-5: `pg_trickle.history_retention_days` GUC active; daily cleanup deletes old rows
- DB-6: `docs/SQL_REFERENCE.md` stability contract section published
- DB-7: `sql/pg_trickle--0.18.0--0.19.0.sql` applies DB-1 through DB-4 changes
- DB-8: `drop_stream_table` leaves no orphan rows in `pgt_change_tracking`
- CORR-7: TRUNCATE + INSERT in same transaction — stream table correct after refresh
- CORR-8: NULL join-key delta correct for INNER, LEFT, and FULL JOIN
- SEC-2: SQL injection audit complete — zero unquoted interpolations in refresh SQL
- STAB-6: Worker crash recovery sweep cleans orphaned locks and stuck REFRESHING state
- STAB-7: Version mismatch WARNING emitted after `ALTER EXTENSION` without restart
- PERF-7: Delta branch pruning skips zero-change source arms in multi-JOIN
- PERF-8: Index-aware MERGE uses nested loop for small deltas on indexed tables
- SCAL-3: `docs/SCALING.md` CNPG/Kubernetes section published
- SCAL-4: Partitioning spike report written with concrete findings
- UX-10: TUI sparkline column visible for refresh latency trend
- UX-11: `pgtrickle.version()` returns extension, library, and PG versions
- TEST-9: 5+ unit tests extracted from E2E-only refresh/scheduler logic
- TEST-10: TPC-H nightly runs at SF-1 and SF-10 with correct results
- Extension upgrade path tested (`0.18.0 → 0.19.0`)
- `just check-version-sync` passes
Conflicts & Risks
- CORR-1 is a user-visible breaking change. Any deployment with `merge_join_strategy = 'delete_insert'` in `postgresql.conf` will error at startup after upgrade. Requires a prominent CHANGELOG entry and a NOTICE during the upgrade migration.
- CORR-2 touches high-traffic diff operators. `diff_inner_join` and `diff_left_join` are the most commonly used operators. Gate the merge behind the TPC-H regression suite + TEST-1. Do not merge without both passing.
- STAB-1 introduces a new GUC. The `pg_trickle.connection_pooler_mode` GUC must be mirrored in upgrade migration SQL, `CONFIGURATION.md`, and `check-version-sync` validation.
- PERF-1/PERF-2 are currently dormant. Changes to `wal_decoder.rs` must be tested with `wal_enabled = true` explicitly. The default trigger-based CDC is unaffected — keep WAL tests behind an explicit env var to avoid slowing down the default test run.
- UX-3 (apt/rpm packaging) depends on PGDG maintainer availability (~8–12h) and can be cut without impacting correctness if it risks delaying the release.
- SEC-1 changes privilege semantics. Existing deployments where non-owner roles call `drop_stream_table` or `alter_stream_table` will break. Requires a CHANGELOG entry and, optionally, a `pg_trickle.skip_ownership_check` GUC (default `false`) for a transition period.
- UX-6 changes the cascade default. Scripts relying on the implicit `cascade => true` will change behavior: DROP will error instead of cascading. Ship alongside SEC-1 and document both breaking changes together.
- PERF-4 requires upgrade SQL. The two `CREATE INDEX` statements must be added to `sql/pg_trickle--0.18.0--0.19.0.sql`. Index creation on a busy system may briefly lock the catalog tables (millisecond-range for small catalogs; document in upgrade notes).
- DB-4 renames the `pgtrickle_refresh` NOTIFY channel. Any application code using `LISTEN pgtrickle_refresh` will stop receiving notifications after upgrade. The old channel name ceases to exist. Document prominently in CHANGELOG and UPGRADING.md.
- DB-2 adds a CASCADE FK. If any external tooling holds open transactions when a stream table is dropped, the cascade may fail under lock. Test in the upgrade E2E (TEST-8) before shipping.
- STAB-6 touches the scheduler startup path. A bug in the recovery sweep could incorrectly reset a stream table that is still being refreshed on a live backend. The sweep must verify that the PID is truly dead via `pg_stat_activity` before taking corrective action.
- PERF-8 disables `hashjoin` within the refresh transaction. If the threshold is set too high, large deltas will use a slower nested-loop path. Make the `merge_index_threshold` GUC tunable and document clearly that it only affects the MERGE step, not the delta SQL.
- SCAL-4 (partitioning spike) may uncover scope too large for v0.19.0. If the spike reveals that full partitioning support requires CDC architectural changes, defer the implementation to a later release and document findings in the spike report.
v0.20.0 — Dog-Feeding (pg_trickle Monitors Itself)
Status: Released (2026-04-15). All 62 items implemented, 1 skipped
(PERF-6 already shipped in v0.19.0). See plans/PLAN_0_20_0.md.
Release Theme: This release implements dog-feeding: pg_trickle uses its own stream tables to maintain reactive analytics over its internal catalog and refresh-history tables. Five dog-feeding stream tables (`df_efficiency_rolling`, `df_anomaly_signals`, `df_threshold_advice`, `df_cdc_buffer_trends`, `df_scheduling_interference`) replace repeated full-scan diagnostic functions with continuously maintained incremental views, enable multi-cycle trend detection for threshold tuning, and surface anomalies reactively. An optional auto-apply policy layer can automatically adjust `auto_threshold` when confidence is high. This validates pg_trickle on its own non-trivial workload and demonstrates the incremental analytics value proposition to users. See plans/PLAN_DOG_FEEDING.md for the full design, architecture, and risk analysis.
Phase 1 — Foundation
| Item | Description | Effort | Ref |
|---|---|---|---|
| DF-F1 | Verify CDC on pgt_refresh_history. Confirm that create_stream_table() installs INSERT triggers on pgt_refresh_history. Fix schema-exclusion logic if the pgtrickle schema is skipped. | 2–4h | PLAN_DOG_FEEDING.md §7 Phase 1 |
| DF-F2 | Create df_efficiency_rolling (DF-1). Maintained rolling-window aggregates over pgt_refresh_history. Replaces refresh_efficiency() full scans. | 2–4h | PLAN_DOG_FEEDING.md §5 DF-1 |
| DF-F3 | E2E test: DF-1 output matches refresh_efficiency(). Insert synthetic history rows, refresh DF-1, assert aggregates agree. | 2–4h | PLAN_DOG_FEEDING.md §8 |
| DF-F4 | pgtrickle.setup_dog_feeding() helper. Single SQL call that creates all five df_* stream tables. | 2–4h | PLAN_DOG_FEEDING.md §7 Phase 4 |
| DF-F5 | pgtrickle.teardown_dog_feeding() helper. Drops all df_* stream tables cleanly. | 1h | PLAN_DOG_FEEDING.md §7 Phase 4 |
Phase 2 — Anomaly Detection
| Item | Description | Effort | Ref |
|---|---|---|---|
| DF-A1 | Create df_anomaly_signals (DF-2). Detects duration spikes, error bursts, and mode oscillation by comparing recent behavior against DF-1 baselines. | 3–5h | PLAN_DOG_FEEDING.md §5 DF-2 |
| DF-A2 | Create df_threshold_advice (DF-3). Multi-cycle threshold recommendation replacing the single-step compute_adaptive_threshold() convergence. | 3–5h | PLAN_DOG_FEEDING.md §5 DF-3 |
| DF-A3 | Verify DAG ordering. DF-1 refreshes before DF-2 and DF-3. | 1–2h | PLAN_DOG_FEEDING.md §7 Phase 2 |
| DF-A4 | E2E test: threshold spike detection. Inject synthetic history making DIFF consistently fast; assert DF-3 recommends raising the threshold. | 2–4h | PLAN_DOG_FEEDING.md §8 |
| DF-A5 | E2E test: anomaly duration spike. Inject a 3× duration spike; assert DF-2 detects it. | 2–4h | PLAN_DOG_FEEDING.md §8 |
Phase 3 — CDC Buffer & Interference
| Item | Description | Effort | Ref |
|---|---|---|---|
| DF-C1 | Create df_cdc_buffer_trends (DF-4). Tracks change-buffer growth rates per source table. May require pgtrickle.cdc_buffer_row_counts() helper for dynamic table names. | 4–8h | PLAN_DOG_FEEDING.md §5 DF-4 |
| DF-C2 | Create df_scheduling_interference (DF-5). Detects concurrent refresh overlap. FULL-refresh mode initially (bounded 1-hour window). | 3–5h | PLAN_DOG_FEEDING.md §5 DF-5 |
| DF-C3 | E2E test: scheduling overlap detection. Create 3 STs with overlapping schedules; verify DF-5 detects overlap. | 2–4h | PLAN_DOG_FEEDING.md §8 |
Phase 4 — GUC & Auto-Apply
| Item | Description | Effort | Ref |
|---|---|---|---|
| DF-G1 | pg_trickle.dog_feeding_auto_apply GUC. Values: off (default) / threshold_only / full. Registered in src/config.rs. | 1–2h | PLAN_DOG_FEEDING.md §6.2 |
| DF-G2 | Auto-apply worker (threshold_only). Post-tick hook reads df_threshold_advice; applies ALTER STREAM TABLE ... SET auto_threshold = <recommended> when confidence is HIGH and delta > 5%. Rate-limited to 1 change per ST per 10 minutes. | 4–8h | PLAN_DOG_FEEDING.md §7 Phase 5 |
| DF-G3 | initiated_by = 'DOG_FEED' audit trail. Log auto-apply changes to pgt_refresh_history. | 1–2h | PLAN_DOG_FEEDING.md §7 Phase 5 |
| DF-G4 | E2E test: auto-apply threshold. Enable threshold_only, inject history making DIFF consistently faster, verify threshold increases automatically. | 2–4h | PLAN_DOG_FEEDING.md §8 |
| DF-G5 | E2E test: rate limiting. Verify no more than 1 threshold change per ST per 10 minutes. | 1–2h | PLAN_DOG_FEEDING.md §8 |
Phase 5 — Operational Diagnostics
| Item | Description | Effort | Ref |
|---|---|---|---|
| OPS-1 | pgtrickle.recommend_refresh_mode(st_name) Reads df_threshold_advice to return a structured recommendation { mode, confidence, reason } rather than computing on demand. | 2–4h | PLAN_DOG_FEEDING.md §10.6 |
| OPS-2 | check_cdc_health() spill-risk enrichment. Query df_cdc_buffer_trends growth rate; emit a spill_risk alert when buffer growth will breach spill_threshold_blocks within 2 cycles. | 2–4h | PLAN_DOG_FEEDING.md §10.3 |
| OPS-3 | pgtrickle.scheduler_overhead() diagnostic function. Returns busy-time ratio, queue depth, avg dispatch latency, and fraction of CPU spent on DF STs vs user STs. | 2–4h | — |
| OPS-4 | pgtrickle.explain_dag() — Mermaid/DOT output. Returns DAG as Mermaid markdown with node colours: user=blue, dog-feeding=green, suspended=red. | 3–4h | — |
| OPS-5 | sql/dog_feeding_setup.sql quick-start template. Runnable script: call setup_dog_feeding(), set dog_feeding_auto_apply = 'threshold_only', configure LISTEN, query initial recommendations. | 1h | — |
| OPS-6 | Workload-aware poll intervals via DF-5 signal. Replace compute_adaptive_poll_ms() exponential backoff with pre-emptive dispatch interval widening when df_scheduling_interference detects contention. | 2–4h | PLAN_DOG_FEEDING.md §10.2 |
| DASH-1 | Grafana Dog-Feeding Dashboard. New monitoring/grafana/dashboards/pg_trickle_dog_feeding.json — 5 panels reading from DF-1 through DF-5. | 4–6h | PLAN_DOG_FEEDING.md §10.5 |
| DBT-1 | dbt pgtrickle_enable_monitoring post-hook macro. Calls setup_dog_feeding() automatically after a successful dbt run; documented in dbt-pgtrickle/. | 2h | — |
OPS-1 — pgtrickle.recommend_refresh_mode(st_name text)
Reads directly from `df_threshold_advice` instead of computing a single-cycle cost comparison on demand (PLAN_DOG_FEEDING.md §10.6). Returns `TABLE(mode text, confidence text, reason text)`. When confidence is LOW (< 10 history rows), emits a fallback with `mode = 'AUTO'` and a reason explaining insufficient data. Integrates with `explain_st()` output.
Verify: call on an ST with ≥ 20 history cycles; assert `mode ∈ {'DIFFERENTIAL','FULL','AUTO'}` and `confidence ∈ {'HIGH','MEDIUM','LOW'}`. Dependencies: DF-A2. Schema change: No.
OPS-2 — check_cdc_health() spill-risk enrichment
Currently `check_cdc_health()` performs full-table scans to detect anomalies. When DF-C1 is active, query the `df_cdc_buffer_trends` growth rate instead. Emit a `spill_risk = 'IMMINENT'` row when the 1-cycle growth rate, extrapolated 2 cycles ahead, exceeds `spill_threshold_blocks`. Falls back to the full scan when dog-feeding is not set up.
Verify: inject 80% of `spill_threshold_blocks` worth of buffer rows with a steep growth rate; assert `check_cdc_health()` returns a spill-risk alert. Dependencies: DF-C1. Schema change: No.
OPS-3 — pgtrickle.scheduler_overhead() diagnostic function
Returns a snapshot of scheduler efficiency: `scheduler_busy_ratio` (fraction of wall-clock time spent executing refreshes), `queue_depth` (STs waiting to be dispatched), `avg_dispatch_latency_ms`, and `df_refresh_fraction` (fraction of busy time attributable to DF STs). This makes PERF-3's < 1% CPU target observable in production without custom monitoring.
Verify: function returns non-NULL values after 5+ refresh cycles; assert `df_refresh_fraction < 0.01` in the soak-test context. Dependencies: DF-D4. Schema change: No (new function only).
OPS-4 — pgtrickle.explain_dag() — Mermaid / DOT graph output
Returns the full refresh DAG as a Mermaid markdown string (default) or Graphviz DOT (via a `format => 'dot'` argument). Node labels show ST name, current mode, and refresh interval. Node colours: user STs = blue, dog-feeding STs = green, suspended = red, fused = orange. Edges show dependency direction. Validates that the DF-1 → DF-2 → DF-3 ordering is correct post-setup.
Verify: `SELECT pgtrickle.explain_dag()` after `setup_dog_feeding()` returns a string containing all five `df_` nodes in green with correct edges. Dependencies: None. Schema change: No (new function only).
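A sketch of the Mermaid rendering OPS-4 describes — the `dag_to_mermaid` helper and the node/edge data are hypothetical; the real function would read the catalog:

```rust
/// Hypothetical sketch of explain_dag()'s Mermaid output. Nodes carry
/// a (name, class) pair; classes mirror the OPS-4 colour scheme.
fn dag_to_mermaid(nodes: &[(&str, &str)], edges: &[(&str, &str)]) -> String {
    let mut out = String::from("graph TD\n");
    for (name, class) in nodes {
        // `name[name]:::class` attaches a classDef-defined style to the node
        out.push_str(&format!("    {name}[{name}]:::{class}\n"));
    }
    for (from, to) in edges {
        out.push_str(&format!("    {from} --> {to}\n"));
    }
    // classDef lines mirror the colour scheme in OPS-4 (hex values illustrative)
    out.push_str("    classDef user fill:#4a90d9\n");
    out.push_str("    classDef dogfeed fill:#3cb371\n");
    out.push_str("    classDef suspended fill:#d9534f\n");
    out
}

fn main() {
    let nodes = [
        ("df_efficiency_rolling", "dogfeed"),
        ("df_anomaly_signals", "dogfeed"),
    ];
    let edges = [("df_efficiency_rolling", "df_anomaly_signals")];
    let mermaid = dag_to_mermaid(&nodes, &edges);
    assert!(mermaid.contains("df_efficiency_rolling --> df_anomaly_signals"));
    println!("{mermaid}");
}
```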
OPS-5 — sql/dog_feeding_setup.sql quick-start template
A standalone SQL script in `sql/` that an operator can run with `psql -f sql/dog_feeding_setup.sql`. Contents: calls `setup_dog_feeding()`, sets `pg_trickle.dog_feeding_auto_apply = 'threshold_only'`, runs `LISTEN pg_trickle_alert`, queries `dog_feeding_status()` for a status summary, and queries `df_threshold_advice` for initial recommendations with a warm-up note. Referenced from the GETTING_STARTED.md Day 2 operations section (UX-4).
Verify: script executes without errors on a fresh install; produces visible output showing 5 active DF STs. Dependencies: DF-F4, DF-G1, UX-4. Schema change: No.
OPS-6 — Workload-aware poll intervals via DF-5 signal
Currently `compute_adaptive_poll_ms()` uses pure exponential backoff that reacts to contention only after it occurs. Replace this with a pre-emptive signal: after each scheduler tick, read the latest `overlap_count` from `df_scheduling_interference`; if `overlap_count >= 2`, increase the dispatch interval for the next tick by 20% before dispatching (capped at `pg_trickle.max_poll_interval_ms`). This closes the dog-feeding feedback loop by letting the analytics directly influence scheduling policy, reducing contention on write-heavy deployments without waiting for timeouts.
Verify: soak test with known-contending STs shows lower `overlap_count` in DF-5 with the signal enabled vs disabled; `scheduler_overhead()` shows a reduced busy-time ratio. Dependencies: DF-C2, OPS-3. Schema change: No.
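The widening rule can be sketched as a pure function — the name `widen_poll_interval` and its signature are assumptions; the real logic would live inside `compute_adaptive_poll_ms()`:

```rust
/// Sketch of the OPS-6 pre-emptive widening rule. `overlap_count`
/// comes from df_scheduling_interference; the cap mirrors
/// pg_trickle.max_poll_interval_ms.
fn widen_poll_interval(current_ms: u64, overlap_count: u32, max_poll_interval_ms: u64) -> u64 {
    if overlap_count >= 2 {
        // contention detected by DF-5: widen the next dispatch interval by 20%
        (current_ms + current_ms / 5).min(max_poll_interval_ms)
    } else {
        current_ms
    }
}

fn main() {
    assert_eq!(widen_poll_interval(1000, 0, 10_000), 1000); // no contention
    assert_eq!(widen_poll_interval(1000, 3, 10_000), 1200); // +20%
    assert_eq!(widen_poll_interval(9000, 3, 10_000), 10_000); // capped
    println!("ok");
}
```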
DASH-1 — Grafana Dog-Feeding Dashboard
Add `monitoring/grafana/dashboards/pg_trickle_dog_feeding.json` alongside the existing `pg_trickle_overview.json`. Five panels: (1) refresh throughput timeline (DF-1 `avg_diff_ms` over time), (2) anomaly heatmap (DF-2 per-ST anomaly type grid), (3) threshold calibration scatter (DF-3 current vs recommended threshold), (4) CDC buffer growth sparklines (DF-4 per-source growth rate), (5) interference matrix (DF-5 overlap heatmap). Provisioned automatically in `monitoring/grafana/provisioning/`.
Verify: `docker compose up` in `monitoring/` loads both dashboards; all five panels resolve without `No data` errors using the postgres-exporter queries. Dependencies: DF-F2, DF-A1, DF-A2, DF-C1, DF-C2. Schema change: No.
DBT-1 — pgtrickle_enable_monitoring dbt post-hook macro
Add a `pgtrickle_enable_monitoring` macro to `dbt-pgtrickle/macros/` that calls `{{ pgtrickle.setup_dog_feeding() }}` and emits a `log()` message confirming activation. Documented in `dbt-pgtrickle/README.md`. Users add `+post-hook: "{{ pgtrickle_enable_monitoring() }}"` to `dbt_project.yml` to auto-enable monitoring after any `dbt run`. Idempotent — safe to call on every run because `setup_dog_feeding()` is already idempotent (STAB-1).
Verify: `just test-dbt` includes a test case that runs the macro twice; asserts `dog_feeding_status()` shows 5 active STs after both calls. Dependencies: DF-F4, STAB-1. Schema change: No.
Documentation & Safety
| Item | Description | Effort | Ref |
|---|---|---|---|
| DF-D1 | SQL_REFERENCE.md: dog-feeding quick start. Document setup_dog_feeding(), teardown_dog_feeding(), all five df_* stream tables, and the auto-apply GUC. | 2–4h | — |
| DF-D2 | CONFIGURATION.md: pg_trickle.dog_feeding_auto_apply GUC. | 1h | — |
| DF-D3 | E2E test: control plane survives DF ST suspension. Drop or suspend all df_* STs; verify the scheduler and refresh logic operate identically. | 2–4h | PLAN_DOG_FEEDING.md §8 |
| DF-D4 | Soak test addition. Add dog-feeding STs to the existing soak test; verify no memory growth or scheduler stalls under 1-hour sustained load. | 2–4h | PLAN_DOG_FEEDING.md §8 |
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | df_threshold_advice output always within [0.01, 0.80] | S | P0 |
| CORR-2 | DF-2 suppresses false-positive spike on first-ever refresh | S | P0 |
| CORR-3 | avg_change_ratio never NaN/Inf on zero-delta streams | S | P0 |
| CORR-4 | CDC INSERT-only invariant verified on pgt_refresh_history | XS | P1 |
| CORR-5 | DF-1 historical window boundary is exclusive, not inclusive | XS | P1 |
CORR-1 — df_threshold_advice output always within [0.01, 0.80]
The `LEAST(0.80, GREATEST(0.01, …))` expression in DF-3 must hold for all input combinations, including NULL `avg_diff_ms`, zero `avg_full_ms`, and extreme ratios. Add a property-based test (proptest) that generates random `(avg_diff_ms, avg_full_ms, current_threshold)` triples and asserts the output is always in the valid range. Any value outside [0.01, 0.80] that reaches auto-apply would corrupt stream table configuration.
Verify: proptest with 10,000 iterations; zero out-of-range results. Dependencies: DF-A2. Schema change: No.
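A Rust model of the clamp, usable as the seed for the CORR-1 proptest; the `clamp_threshold` helper and its ratio formula are simplifications of the real DF-3 SQL expression:

```rust
/// Rust mirror of the DF-3 LEAST(0.80, GREATEST(0.01, …)) clamp, with
/// the SQL NULL cases modelled as Option. Illustrative only.
fn clamp_threshold(avg_diff_ms: Option<f64>, avg_full_ms: Option<f64>, current: f64) -> f64 {
    let raw = match (avg_diff_ms, avg_full_ms) {
        // a NULL input or zero denominator yields no new evidence: keep current
        (Some(d), Some(f)) if f > 0.0 => d / f,
        _ => current,
    };
    // guard against non-finite ratios before clamping (clamp passes NaN through)
    if raw.is_finite() {
        raw.clamp(0.01, 0.80)
    } else {
        current.clamp(0.01, 0.80)
    }
}

fn main() {
    assert_eq!(clamp_threshold(Some(50.0), Some(100.0), 0.3), 0.5);
    assert_eq!(clamp_threshold(None, Some(100.0), 0.3), 0.3); // NULL diff: keep current
    assert_eq!(clamp_threshold(Some(500.0), Some(100.0), 0.3), 0.80); // clamped high
    assert_eq!(clamp_threshold(Some(0.0), Some(100.0), 0.3), 0.01); // clamped low
    println!("ok");
}
```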
CORR-2 — DF-2 suppresses false-positive spike on first-ever refresh
`df_anomaly_signals` compares `latest.duration_ms` against `eff.avg_diff_ms`. On the very first refresh of a stream table there is no rolling average yet (`eff.avg_diff_ms IS NULL`), so the `CASE WHEN` would produce no anomaly. Confirm the LATERAL subquery returns NULL (not 0) when history is empty, and that the `CASE` guard is `> 3.0 * NULLIF(eff.avg_diff_ms, 0)` so a NULL baseline never triggers a spike.
Verify: E2E test creating a brand-new ST; assert `duration_anomaly IS NULL` on the first DF-2 refresh. Dependencies: DF-A1. Schema change: No.
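The NULL-safe guard, modelled with `Option` — an illustrative sketch; the real check is the SQL `CASE` in DF-2:

```rust
/// NULL-safe spike detection mirroring the DF-2 guard
/// `duration > 3.0 * NULLIF(avg_diff_ms, 0)`: a missing or zero
/// baseline (first-ever refresh) must never report a spike.
fn duration_anomaly(latest_ms: f64, baseline_avg_ms: Option<f64>) -> Option<&'static str> {
    match baseline_avg_ms {
        // NULLIF(avg, 0): a zero baseline is treated the same as NULL
        Some(avg) if avg > 0.0 && latest_ms > 3.0 * avg => Some("DURATION_SPIKE"),
        _ => None,
    }
}

fn main() {
    assert_eq!(duration_anomaly(900.0, Some(100.0)), Some("DURATION_SPIKE"));
    assert_eq!(duration_anomaly(900.0, None), None); // first refresh: no baseline
    assert_eq!(duration_anomaly(900.0, Some(0.0)), None); // NULLIF guard
    assert_eq!(duration_anomaly(250.0, Some(100.0)), None); // under 3x: no anomaly
    println!("ok");
}
```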
CORR-3 — avg_change_ratio never NaN/Inf on zero-delta streams
DF-1 computes `avg(h.delta_row_count::float / NULLIF(h.rows_inserted + h.rows_deleted, 0))`. If a stream table runs only FULL refreshes (no DIFF cycles) the divisor is always NULL and `avg()` returns NULL — correct. But if DIFF runs with exactly zero rows inserted and zero deleted (the CDC buffer was empty), `NULLIF` must prevent a divide-by-zero NaN. Verify the guard holds and that `avg_change_ratio` is either a valid float in [0, 1] or NULL.
Verify: E2E test triggering a DIFF refresh on a quiescent source; assert `avg_change_ratio IS NULL OR avg_change_ratio BETWEEN 0 AND 1`. Dependencies: DF-F2. Schema change: No.
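The same guard in `Option` form — a sketch of the semantics, not the DF-1 SQL itself:

```rust
/// Option-based model of `delta_row_count::float / NULLIF(ins + del, 0)`:
/// a zero-change DIFF cycle contributes NULL (None), never NaN.
fn change_ratio(delta_rows: i64, rows_inserted: i64, rows_deleted: i64) -> Option<f64> {
    let denom = rows_inserted + rows_deleted;
    if denom == 0 {
        None // NULLIF(…, 0): the row is excluded from avg() instead of dividing by zero
    } else {
        Some(delta_rows as f64 / denom as f64)
    }
}

fn main() {
    assert_eq!(change_ratio(5, 8, 2), Some(0.5));
    assert_eq!(change_ratio(0, 0, 0), None); // quiescent source: NULL, not NaN
    println!("ok");
}
```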
CORR-4 — CDC INSERT-only invariant verified on pgt_refresh_history
`pgt_refresh_history` is semantically append-only: rows are only ever INSERTed (one per refresh). The CDC trigger installed by DF-F1 must be an INSERT-only trigger (no UPDATE/DELETE triggers). If the trigger were registered as `FOR EACH ROW AFTER INSERT OR UPDATE`, a future catalog UPDATE would generate spurious change-buffer rows and corrupt DF-1 aggregates. Inspect `pg_trigger` to confirm only an `INSERT` trigger exists.
Verify: `SELECT tgtype FROM pg_trigger WHERE tgrelid = 'pgtrickle.pgt_refresh_history'::regclass` returns only INSERT-event triggers. Dependencies: DF-F1. Schema change: No.
CORR-5 — DF-1 historical window boundary is exclusive, not inclusive
The `WHERE h.start_time > now() - interval '1 hour'` clause uses a strict `>` comparison, so a row with `start_time` exactly equal to the boundary is excluded on each pass, preventing double-counting in rolling aggregates. Confirm the query plan uses the index on `(pgt_id, start_time)` (see PERF-1) and that the boundary is consistent across DF-1, DF-2, and DF-4 (all use the same 1-hour lookback).
Verify: unit test comparing aggregate output with a row at the exact boundary; assert it is excluded. Dependencies: DF-F2. Schema change: No.
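The strict boundary, modelled on epoch seconds — a sketch of the semantics, not the SQL:

```rust
/// The strict `>` lookback boundary from DF-1: a row whose start_time
/// lands exactly at (now - lookback) is excluded from the window.
fn in_window(start_time: u64, now: u64, lookback_secs: u64) -> bool {
    start_time > now.saturating_sub(lookback_secs) // strict: boundary row excluded
}

fn main() {
    let now = 10_000;
    assert!(in_window(9_500, now, 3_600)); // well inside the hour
    assert!(!in_window(6_400, now, 3_600)); // exactly now - 1 hour: excluded
    assert!(!in_window(6_000, now, 3_600)); // older than the window
    println!("ok");
}
```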
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | setup_dog_feeding() is fully idempotent | S | P0 |
| STAB-2 | Auto-apply handles ALTER STREAM TABLE failure gracefully | S | P0 |
| STAB-3 | DF STs survive DROP EXTENSION + CREATE EXTENSION cycle | S | P1 |
| STAB-4 | Auto-apply worker checks ST still exists before applying | XS | P1 |
| STAB-5 | teardown_dog_feeding() is safe when some DF STs already removed | XS | P1 |
STAB-1 — setup_dog_feeding() is fully idempotent
Calling `setup_dog_feeding()` a second time while DF STs already exist must not raise an error. Use `IF NOT EXISTS` semantics internally (or check the catalog before creating). The function must also be safe to call concurrently from two sessions. Idempotency is critical for upgrade scripts and Terraform-style declarative deployment workflows.
Verify: call `setup_dog_feeding()` three times in a row; no errors, no duplicate stream tables. Dependencies: DF-F4. Schema change: No.
STAB-2 — Auto-apply handles ALTER STREAM TABLE failure gracefully
The auto-apply post-tick hook reads `df_threshold_advice` and issues `ALTER STREAM TABLE … SET auto_threshold = <recommended>`. If the stream table was dropped between the advice read and the apply (a TOCTOU race), the ALTER will error. Catch SQL errors in the post-tick hook with an appropriate `match` on `PgTrickleError` and log a WARNING rather than crashing the background worker.
Verify: unit test with a mocked `ALTER` that returns `ERROR: relation does not exist`; assert the worker logs a warning and continues to the next advice row. Dependencies: DF-G2. Schema change: No.
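The catch-log-continue shape of the hook, sketched with hypothetical stand-in types (`AdviceRow`, a plain string error) rather than the real `PgTrickleError`:

```rust
/// Sketch of the STAB-2 guard: a failed ALTER (e.g. a TOCTOU-dropped
/// ST) is logged as a WARNING and the loop continues.
struct AdviceRow {
    st_name: &'static str,
    recommended: f64,
}

fn apply_threshold(row: &AdviceRow) -> Result<(), String> {
    // stand-in for `ALTER STREAM TABLE … SET auto_threshold = …`
    if row.st_name == "dropped_st" {
        return Err(format!("relation \"{}\" does not exist", row.st_name));
    }
    println!("ALTER STREAM TABLE {} SET auto_threshold = {}", row.st_name, row.recommended);
    Ok(())
}

fn run_auto_apply(advice: &[AdviceRow]) -> usize {
    let mut applied = 0;
    for row in advice {
        match apply_threshold(row) {
            Ok(()) => applied += 1,
            // WARNING, then continue: never crash the background worker
            Err(e) => eprintln!("WARNING: auto-apply skipped {}: {e}", row.st_name),
        }
    }
    applied
}

fn main() {
    let advice = [
        AdviceRow { st_name: "orders_st", recommended: 0.25 },
        AdviceRow { st_name: "dropped_st", recommended: 0.40 },
        AdviceRow { st_name: "users_st", recommended: 0.10 },
    ];
    assert_eq!(run_auto_apply(&advice), 2); // one failure logged, loop continues
    println!("ok");
}
```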
STAB-3 — DF STs survive DROP EXTENSION + CREATE EXTENSION cycle
`DROP EXTENSION pg_trickle CASCADE` drops all extension-owned objects. After `CREATE EXTENSION pg_trickle`, `setup_dog_feeding()` should recreate the DF STs cleanly. There must be no leftover triggers, orphaned change-buffer tables, or stale catalog rows from the previous installation. This is the most likely failure mode after an emergency rollback + reinstall.
Verify: E2E test: `setup_dog_feeding()` → `DROP EXTENSION CASCADE` → `CREATE EXTENSION` → `setup_dog_feeding()` → insert history → refresh DF-1; assert correct aggregates. Dependencies: DF-F4, DF-F5. Schema change: No.
STAB-4 — Auto-apply worker checks ST still exists before applying
Before issuing `ALTER STREAM TABLE`, the worker should confirm the ST is still in `pgt_stream_tables` and is not in SUSPENDED or FUSED state. Applying a threshold change to a SUSPENDED ST is harmless but wasteful; applying to a FUSED ST is wrong (the fuse exists for a reason). Add a pre-apply guard in the Rust post-tick hook.
Verify: E2E test suspending an ST manually while auto-apply is enabled; assert no threshold change is applied to the suspended stream table. Dependencies: DF-G2. Schema change: No.
STAB-5 — teardown_dog_feeding() is safe when some DF STs already removed
If a user manually drops `df_anomaly_signals` before calling `teardown_dog_feeding()`, the teardown function must not error on `DROP STREAM TABLE df_anomaly_signals`. Use `drop_stream_table(name, if_exists => true)` semantics for each DF table in the teardown. Otherwise a partial teardown leaves the system in an inconsistent state.
Verify: drop two DF STs manually, then call `teardown_dog_feeding()`; assert no errors and the remaining DF STs are gone. Dependencies: DF-F5. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | Index on pgt_refresh_history(pgt_id, start_time) for DF queries | XS | P0 |
| PERF-2 | Benchmark DF-1 vs refresh_efficiency() on 10 K history rows | S | P0 |
| PERF-3 | Dog-feeding scheduler overhead target: < 1% of total CPU | S | P1 |
| PERF-4 | DF-5 self-join uses bounded index scan, not seq-scan | S | P1 |
| PERF-5 | History pruning batch-DELETE with short transactions (no CDC lock contention) | S | P1 |
| PERF-6 | Columnar change tracking Phase 1 — CDC bitmask (deferred from v0.17/v0.18) | M | P1 |
PERF-1 — Index on pgt_refresh_history(pgt_id, start_time) for DF queries
All five DF stream tables filter `pgt_refresh_history` on `(pgt_id, start_time)`. Without a composite index on these columns, the rolling-window WHERE clause forces a sequential scan of the growing history table. Verify the index was created during extension install (check the upgrade migration); if missing, add it as part of the 0.19.0 → 0.20.0 migration script.
Verify: `EXPLAIN (FORMAT TEXT) SELECT … FROM pgtrickle.pgt_refresh_history WHERE pgt_id = 1 AND start_time > now() - interval '1 hour'` shows an index scan. Schema change: Yes (index addition in migration script).
PERF-2 — Benchmark DF-1 vs refresh_efficiency() on 10 K history rows
The primary performance claim of dog-feeding is that a maintained DIFFERENTIAL stream table is cheaper than scanning the full history table on every diagnostic call. Establish a Criterion micro-benchmark that seeds 10 K history rows, then compares (a) a full `SELECT * FROM pgtrickle.refresh_efficiency()` call vs (b) a `SELECT * FROM pgtrickle.df_efficiency_rolling` read after one incremental refresh. The benchmark documents the win concretely.
Verify: Criterion benchmark shows the DF-1 read is at least 5× faster than `refresh_efficiency()` at 10 K rows. Included in `benches/` and run in CI. Dependencies: DF-F2. Schema change: No.
PERF-3 — Dog-feeding scheduler overhead target: < 1% of total CPU
Five DF STs at 48–96 s schedules add background refresh work. Under a realistic load (20 user STs, 10 K history rows), the total time spent refreshing DF STs should be < 1% of total scheduler CPU. Measure in the E2E soak test by comparing scheduler-loop busy time with and without DF STs. If overhead exceeds 1%, relax schedules to 120 s or move DF STs to `refresh_tier = 'cold'`.
Verify: soak test reports DF refresh overhead as a fraction of total scheduler CPU; assert < 1%. Dependencies: DF-D4. Schema change: No.
PERF-4 — DF-5 self-join uses bounded index scan, not seq-scan
`df_scheduling_interference` joins `pgt_refresh_history` to itself on an overlap condition with a 1-hour bound. Without the index from PERF-1 this double scan is O(N²) in history rows. Verify EXPLAIN shows nested-loop index scans (not a hash or merge join over the full table) for both sides of the self-join. If the planner chooses a seq-scan, add `enable_seqscan = off` for the DF-5 query or restructure with a CTE.
Verify: EXPLAIN of the DF-5 query shows index scans on both sides of the JOIN. Dependencies: PERF-1, DF-C2. Schema change: No.
PERF-5 — History pruning batch-DELETE with short transactions
`pg_trickle.history_retention_days` cleanup (shipped in v0.19.0) currently deletes rows in a single long transaction. Under dog-feeding, that transaction holds a lock on `pgt_refresh_history` that can delay CDC trigger INSERTs. Rewrite the purge as batched DELETEs: delete at most 500 rows per transaction, commit between batches, and sleep 50 ms between batches. The index from PERF-1 ensures each batch is an index-range scan, not a seq-scan.
Verify: soak test running the history purge concurrently with DF CDC trigger INSERTs; no lock-wait timeout observed. Batch size configurable via the `pg_trickle.history_purge_batch_size` GUC (default 500). Dependencies: PERF-1. Schema change: No.
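The batching arithmetic, as a quick sanity check — batch size 500 and the 50 ms pause are the plan's defaults, not measured values:

```rust
/// Number of short purge transactions needed for a given backlog
/// (ceiling division: the last partial batch still counts).
fn purge_batches(total_rows: u64, batch_size: u64) -> u64 {
    (total_rows + batch_size - 1) / batch_size
}

fn main() {
    assert_eq!(purge_batches(10_000, 500), 20);
    assert_eq!(purge_batches(10_001, 500), 21); // partial final batch
    assert_eq!(purge_batches(0, 500), 0); // nothing to purge
    // total pause added between batches: (n - 1) * 50 ms
    let pauses_ms = purge_batches(10_000, 500).saturating_sub(1) * 50;
    assert_eq!(pauses_ms, 950);
    println!("ok");
}
```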
PERF-6 — Columnar change tracking Phase 1 — CDC bitmask
Deferred from v0.17.0 (twice) and v0.18.0. Dog-feeding now provides concrete internal workload data that justifies the schema change. Phase 1 only: compute the `changed_columns` bitmask (`old.col IS DISTINCT FROM new.col`) in the CDC trigger for UPDATE rows; store as `int8` in the change buffer. Phase 2 (delta-scan filtering using the bitmask) is deferred to v0.22.0. Gate behind the `pg_trickle.columnar_tracking` GUC (default `off`). This is the foundation for 50–90% delta-volume reduction on wide-table UPDATE workloads.
Verify: UPDATE a 20-column row, changing 2 columns; assert the `changed_columns` bitmask has exactly 2 bits set. `just check-upgrade-all` passes. Dependencies: None. Schema change: Yes (change-buffer schema addition + migration script).
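A Rust model of the bitmask computation the CDC trigger would perform; the `Option<&str>` column model and the `changed_columns` helper are illustrative, not the trigger code:

```rust
/// Sketch of the Phase 1 changed_columns bitmask: bit i is set when
/// column i differs between old and new. Option equality mirrors
/// IS DISTINCT FROM (NULLs compare as values, not as "unknown").
fn changed_columns(old: &[Option<&str>], new: &[Option<&str>]) -> i64 {
    let mut mask: i64 = 0;
    for (i, (o, n)) in old.iter().zip(new.iter()).enumerate() {
        if o != n {
            mask |= 1 << i;
        }
    }
    mask
}

fn main() {
    let old = [Some("a"), Some("b"), None, Some("d")];
    let new = [Some("a"), Some("x"), Some("c"), Some("d")];
    let mask = changed_columns(&old, &new);
    assert_eq!(mask, 0b0110); // columns 1 and 2 changed
    assert_eq!(mask.count_ones(), 2); // exactly 2 bits set, as in the Verify step
    println!("ok");
}
```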
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | DF STs refresh within window at 100 user stream tables | S | P1 |
| SCAL-2 | pgt_refresh_history retention interacts correctly with dog-feeding | S | P1 |
| SCAL-3 | 1-hour rolling window doesn't over-aggregate when history is sparse | XS | P2 |
SCAL-1 — DF STs refresh within window at 100 user stream tables
With 100 user STs generating up to 100 history rows per 48 s window, DF-1 processes up to ~7,500 rows/hour. Verify that the DIFFERENTIAL refresh of DF-1 completes within its 48 s schedule interval at this load, leaving margin for DF-2 and DF-3. If DF-1 duration exceeds 10 s, investigate query plan and index usage. Run as part of the soak-test at high table count.
Verify: soak test with 100 STs; DF-1 refresh duration < 10 s throughout. Dependencies: PERF-1. Schema change: No.
SCAL-2 — pgt_refresh_history retention interacts correctly with dog-feeding
`pg_trickle.history_retention_days` (shipped in v0.19.0, default 90 days) purges old history rows. DF-1 only looks back 1 hour, so retention does not affect correctness. However, the purge job must not hold a long-running lock that delays CDC trigger firing on concurrent INSERTs into the history table. Verify that the cleanup job uses a `DELETE … RETURNING` batch strategy with short transactions to avoid blocking DF CDC triggers.
Verify: E2E test running the history purge job while DF-1 is being refreshed; no lock-wait timeout, no CDC trigger delay. Dependencies: DF-F1. Schema change: No.
SCAL-3 — 1-hour rolling window doesn't over-aggregate when history is sparse
For a stream table that refreshes every 30 minutes (2 refreshes/hour), the DF-1 1-hour window contains at most 2 rows. The `AVG()` aggregate is still meaningful, but `percentile_cont(0.95)` over 2 rows is misleading. Document the minimum sample size (in the `confidence` column of DF-3) and add a note in SQL_REFERENCE.md that DF stats are most meaningful for STs refreshing every 60 s or faster.
Verify: SQL_REFERENCE.md updated; `confidence = 'LOW'` for STs with `total_refreshes < 10`. Dependencies: DF-A2. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | pgtrickle.dog_feeding_status() diagnostic function | S | P0 |
| UX-2 | setup_dog_feeding() warm-up hint when history is sparse | XS | P1 |
| UX-3 | NOTIFY on anomaly via pg_trickle_alert channel | S | P1 |
| UX-4 | GETTING_STARTED.md: "Day 2 operations" section | S | P1 |
| UX-5 | explain_st() shows if a DF ST covers the queried stream table | XS | P2 |
| UX-6 | recommend_refresh_mode() exposed in explain_st() JSON output | XS | P2 |
| UX-7 | scheduler_overhead() output included in TUI diagnostics panel | XS | P2 |
| UX-8 | df_threshold_advice extended with SLA headroom column | S | P2 |
UX-1 — pgtrickle.dog_feeding_status() diagnostic function
A single-query overview of the dog-feeding analytics plane: name, last refresh timestamp, row count, and whether the DF ST is ACTIVE / SUSPENDED / NOT_CREATED. Calling this function is the first thing an operator should run to check that dog-feeding is working. Return type:
`TABLE(df_name text, status text, last_refresh timestamptz, row_count bigint, note text)`.
Verify: function returns 5 rows when all DF STs are active; returns rows with `status = 'NOT_CREATED'` when `setup_dog_feeding()` has not been called. Schema change: No (new function only).
UX-2 — setup_dog_feeding() warm-up hint when history is sparse
If `pgt_refresh_history` has fewer than 50 rows when `setup_dog_feeding()` is called, emit a NOTICE: "Dog-feeding stream tables created. DF analytics will populate as refresh history accumulates (currently N rows; recommend ≥ 50 before consulting df_threshold_advice)." This prevents operators from acting on meaningless LOW-confidence advice immediately after setup.
Verify: call `setup_dog_feeding()` on a fresh install; assert the NOTICE contains the row count and the ≥ 50 recommendation. Dependencies: DF-F4. Schema change: No.
UX-3 — NOTIFY on anomaly via pg_trickle_alert channel
When `df_anomaly_signals` detects a `duration_anomaly IS NOT NULL` or `recent_failures >= 2` after a refresh, emit a `pg_notify('pg_trickle_alert', payload::text)` with `event = 'dog_feed_anomaly'`, the stream table name, anomaly type, last duration, baseline, and a plain-English recommendation. This integrates with existing alert pipelines without requiring a new channel. Fires from a post-refresh trigger on `df_anomaly_signals` or from the auto-apply post-tick hook.
Verify: E2E test LISTENs on `pg_trickle_alert`; inject a 3× duration spike; assert the NOTIFY payload arrives with the correct anomaly type. Dependencies: DF-A1. Schema change: No.
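An illustrative payload shape — field names beyond `event` are assumptions, and the real payload is built by the trigger/hook, not this helper:

```rust
/// Hypothetical shape of the `pg_trickle_alert` payload for a
/// dog_feed_anomaly event, encoded by hand to keep the sketch
/// dependency-free (a real implementation would use a JSON library).
fn anomaly_payload(st: &str, anomaly: &str, last_ms: f64, baseline_ms: f64) -> String {
    format!(
        "{{\"event\":\"dog_feed_anomaly\",\"stream_table\":\"{st}\",\
         \"anomaly\":\"{anomaly}\",\"last_duration_ms\":{last_ms},\
         \"baseline_ms\":{baseline_ms},\
         \"recommendation\":\"inspect recent source churn or raise auto_threshold\"}}"
    )
}

fn main() {
    let p = anomaly_payload("active_orders", "DURATION_SPIKE", 930.0, 210.0);
    assert!(p.contains("\"event\":\"dog_feed_anomaly\""));
    assert!(p.contains("\"anomaly\":\"DURATION_SPIKE\""));
    println!("{p}");
}
```

One constraint worth keeping in mind: NOTIFY payloads are limited to roughly 8000 bytes in a default build, which is part of why the risk list requires a failed notify not to roll back the refresh.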
UX-4 — GETTING_STARTED.md: "Day 2 operations" section
Add a new section to `docs/GETTING_STARTED.md` covering the first steps after initial deployment: (1) enable dog-feeding with `setup_dog_feeding()`, (2) check status with `dog_feeding_status()`, (3) query `df_threshold_advice` to tune thresholds, (4) set up anomaly alerting via LISTEN. This gives new users a clear post-install checklist and demonstrates the dog-feeding value proposition immediately.
Verify: documentation PR reviewed; code examples in GETTING_STARTED.md execute without modification. Dependencies: UX-1, UX-2. Schema change: No.
UX-5 — explain_st() shows if a DF ST covers the queried stream table
When a user calls `pgtrickle.explain_st('my_table')`, append a line `"Dog-feeding coverage: df_efficiency_rolling ✓, df_threshold_advice ✓"` (or `"Not set up — run setup_dog_feeding()"`) to the output. This surfaces the analytics plane to users who might not know dog-feeding exists, without requiring a separate function call.
Verify: `SELECT explain_st('any_table')` output includes a `dog_feeding` field in the JSON output. Dependencies: UX-1. Schema change: No.
UX-8 — df_threshold_advice extended with SLA headroom column
Extend the DF-3 defining query to include a computed `sla_headroom_ms` column: `freshness_deadline_ms - avg_diff_ms` from `pgt_refresh_history`. When `sla_headroom_ms < 0`, add a boolean `sla_breach_risk = true` flag so operators can see at a glance which STs risk missing their freshness SLA on the next DIFFERENTIAL cycle. The `freshness_deadline` column already exists in `pgt_refresh_history` (since v0.2.3). No schema change required.
Verify: create an ST with a tight `freshness_deadline`; run slow synthetic refreshes; assert `df_threshold_advice.sla_breach_risk = true`. Dependencies: DF-A2. Schema change: No (view column addition only).
UX-6 — recommend_refresh_mode() exposed in explain_st() JSON output
`explain_st()` already shows dog-feeding coverage (UX-5). Extend its JSON output with a `recommended_mode` field reading from `df_threshold_advice` (OPS-1). If OPS-1 is not available (no DF setup), fall back to `null` with a `setup_dog_feeding()` hint. Keeps the single-function diagnostic surface comprehensive without requiring separate calls.
Verify: `SELECT explain_st('any_table')` JSON includes `recommended_mode` and `mode_confidence` fields. Dependencies: OPS-1. Schema change: No.
UX-7 — scheduler_overhead() output included in TUI diagnostics panel
The TUI (`pgtrickle-tui`) already shows refresh latency sparklines and ST status. Add a diagnostics panel (toggle key `D`) showing the fields from `scheduler_overhead()`: busy ratio, queue depth, and DF fraction as a percentage. Gives operators hands-on observability without needing psql.
Verify: TUI diagnostics panel shows all three scheduler-overhead fields; `df_refresh_fraction` updates after each DF refresh cycle. Dependencies: OPS-3. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | Property test: DF-3 recommended threshold always ∈ [0.01, 0.80] | S | P0 |
| TEST-2 | Light E2E: dog-feeding create/refresh/teardown full cycle | S | P0 |
| TEST-3 | Upgrade test: pgt_refresh_history rows survive 0.19.0 → 0.20.0 | S | P0 |
| TEST-4 | Regression test: DF STs absent from check_cdc_health() anomaly list | XS | P1 |
| TEST-5 | Stability test: dog-feeding under 1-h soak with 50 user STs | M | P1 |
| TEST-6 | Light E2E: setup_dog_feeding() idempotency (3× call) | XS | P1 |
TEST-1 — Property test: DF-3 recommended threshold always ∈ [0.01, 0.80]
Implements CORR-1 as a `proptest` unit test. Generate random `(avg_diff_ms: 0.0–100_000.0, avg_full_ms: 0.0–100_000.0, current: 0.01–0.80)` triples, compute the DF-3 CASE expression in Rust, and assert the output ∈ [0.01, 0.80]. Can be a pure Rust unit test in `src/refresh.rs` alongside the existing `compute_adaptive_threshold` tests — no database required.
Verify: `just test-unit` passes; 10,000 proptest iterations with zero failures. Dependencies: CORR-1. Schema change: No.
TEST-2 — Light E2E: dog-feeding create/refresh/teardown full cycle
A light E2E test (stock `postgres:18.3` container) that: (1) installs the extension, (2) creates 3 user STs, (3) runs 5 refresh cycles to populate history, (4) calls `setup_dog_feeding()`, (5) refreshes all DF STs once, (6) asserts `dog_feeding_status()` shows 5 active STs, (7) calls `teardown_dog_feeding()`, (8) asserts all DF STs are gone.
Verify: test passes in `just test-light-e2e` with zero assertions failed. Schema change: No.
TEST-3 — Upgrade test: pgt_refresh_history rows survive 0.19.0 → 0.20.0
The 0.19.0 → 0.20.0 migration adds an index to `pgt_refresh_history` (PERF-1). The upgrade must not truncate, reorder, or modify existing history rows. Write an upgrade E2E test: deploy 0.19.0, run 10 refreshes, `ALTER EXTENSION pg_trickle UPDATE`, then assert all 10 history rows are intact and the new index exists.
Verify: upgrade E2E test passes; `SELECT count(*) FROM pgt_refresh_history` unchanged after upgrade. Schema change: Yes (index).
TEST-4 — Regression test: DF STs absent from check_cdc_health() anomaly list
`pgtrickle.check_cdc_health()` scans all stream tables for CDC anomalies. After `setup_dog_feeding()`, DF STs must not appear in the anomaly list just because they are refreshed at longer intervals (48–96 s). Their schedules must be recognised as intentionally relaxed, not "falling behind".
Verify: E2E test: `setup_dog_feeding()` → wait one full DF cycle → assert `check_cdc_health()` returns no anomalies for any `df_` table. Dependencies: DF-F4. Schema change: No.
TEST-5 — Stability test: dog-feeding under 1-h soak with 50 user STs
Extends DF-D4. Runs 50 user STs + 5 DF STs for 1 hour under steady insert load (1 000 rows/min across all sources). Assertions: (a) all DF STs remain ACTIVE, (b) no OOM or background-worker crash, (c) DF-1 avg refresh duration < 5 s throughout, (d) `pgtrickle.dog_feeding_status()` shows 5 active STs at the end of the run.
Verify: soak test passes with all four assertions. Dependencies: DF-D4, SCAL-1. Schema change: No.
TEST-6 — Light E2E: setup_dog_feeding() idempotency (3× call)
Implements STAB-1 as a light E2E test. Call `setup_dog_feeding()` three consecutive times in the same session. Assert: no errors, exactly five `df_` stream tables in `pgt_stream_tables`, and no duplicate triggers in `pg_trigger` for the history table.
Verify: test passes in `just test-light-e2e`; `SELECT count(*) FROM pgtrickle.pgt_stream_tables WHERE pgt_name LIKE 'df_%'` returns 5 after all three calls. Dependencies: STAB-1. Schema change: No.
Conflicts & Risks
- PERF-1 (index addition) requires a migration script change. Adding `CREATE INDEX CONCURRENTLY` to the 0.19.0 → 0.20.0 migration must be tested with `just check-upgrade-all`. `CONCURRENTLY` cannot run inside a transaction block — the migration must issue it outside the default single-transaction DDL wrapper.
- UX-3 (NOTIFY on anomaly) fires from a post-refresh path. If the `pg_notify()` call fails (e.g., payload too large), it must not roll back the DF-2 refresh. Wrap the notify in a `BEGIN … EXCEPTION WHEN OTHERS THEN NULL END` block, or fire it from a deferred trigger.
- STAB-3 (DROP EXTENSION cycle) requires DF STs to be extension-owned or cleanly unregistered. If DF STs are not extension-owned objects, `DROP EXTENSION CASCADE` will not drop them. Either register them as extension members or document that `teardown_dog_feeding()` must be called before `DROP EXTENSION`.
- TEST-5 (soak test) overlaps with the existing soak test in CI. Add it to the daily `stability-tests.yml` workflow rather than `ci.yml` to avoid extending PR CI time. Mark with `#[ignore]` and trigger via `just test-soak`.
- CORR-5 / PERF-4 interaction. The `start_time > now() - interval '1 hour'` boundary and the index depend on the planner choosing an index-range scan. On very busy deployments where the cardinality estimate is off, the planner may prefer a seq-scan. Consider adding `SET enable_seqscan = off` inside the DF stream table queries if plan stability is a concern.
- PERF-6 (columnar tracking) is a schema change — deferred twice already. The `changed_columns` column addition to all change-buffer tables requires a migration script. Gate strictly behind the `pg_trickle.columnar_tracking = off` default. If capacity is tight, PERF-6 can be cut from v0.20.0 without affecting any other item — it shares no code paths with the DF pipeline.
- OPS-2 (`check_cdc_health()` enrichment) has a fallback requirement. When `setup_dog_feeding()` has not been called, the function must fall back to the old full-scan path without error. Guard with a catalog check for `df_cdc_buffer_trends` existence before querying it.
- OPS-4 (`explain_dag()`) output size. At 100+ user STs the Mermaid output may exceed typical terminal width. Offer `format => 'dot'` and `limit => N` arguments to constrain output. Default `format => 'mermaid'` with a NOTICE when the DAG has > 20 nodes.
- OPS-6 (workload-aware poll) writes to the scheduler hot path. The `compute_adaptive_poll_ms()` function is called on every scheduler tick. The DF-5 read must be a single O(1) catalog lookup (latest row only), not a full table scan. Guard with `ORDER BY collected_at DESC LIMIT 1`. If the DF-5 table does not exist (dog-feeding not set up), fall back to the old backoff logic without error.
- DASH-1 (Grafana) depends on postgres-exporter SQL queries. The dashboard panels use custom SQL collectors in the postgres-exporter config. Verify that the `monitoring/` docker-compose already mounts the query config; if not, add a `pg_trickle_df_queries.yaml` collector file alongside the existing exporter config.
- DBT-1 macro idempotency. The `pgtrickle_enable_monitoring` macro calls `setup_dog_feeding()` on every `dbt run`. Document that this is intentionally safe (STAB-1) and adds < 5 ms overhead per run.
v0.20.0 total: ~3–4 weeks
Exit criteria:
- DF-F1: `pgt_refresh_history` receives CDC INSERT triggers when `create_stream_table()` is called
- DF-F2: `df_efficiency_rolling` created and refreshes correctly in DIFFERENTIAL mode
- DF-F3: DF-1 output matches `refresh_efficiency()` results on synthetic history
- DF-F4: `setup_dog_feeding()` creates all five `df_*` stream tables in one call
- DF-F5: `teardown_dog_feeding()` drops all `df_*` tables cleanly with no orphaned triggers
- DF-A1: `df_anomaly_signals` created and detects 3× duration spikes
- DF-A2: `df_threshold_advice` provides HIGH-confidence recommendations after ≥ 20 refresh cycles
- DF-A3: DAG ensures DF-1 refreshes before DF-2 and DF-3 in every scheduler tick
- DF-C1: `df_cdc_buffer_trends` created (FULL or DIFFERENTIAL mode)
- DF-C2: `df_scheduling_interference` detects overlapping concurrent refreshes
- DF-G1: `pg_trickle.dog_feeding_auto_apply` GUC registered with default `off`
- DF-G2: Auto-apply adjusts threshold with ≥ 1 confirmed change in E2E test
- DF-G5: Rate limiting verified — no more than 1 change per ST per 10 minutes
- DF-D3: Suspending all `df_*` STs does not affect control-plane operation
- CORR-1: `df_threshold_advice` output always within [0.01, 0.80] (property test)
- CORR-2: No false-positive DURATION_SPIKE on first-ever refresh of a new ST
- CORR-3: `avg_change_ratio` is NULL or in [0, 1] for zero-delta sources
- CORR-4: Only INSERT triggers (no UPDATE/DELETE) on `pgt_refresh_history`
- STAB-1: `setup_dog_feeding()` called 3× produces no errors and no duplicates
- STAB-2: Auto-apply worker logs WARNING (not panic) when ALTER target disappears
- STAB-3: DROP EXTENSION + CREATE EXTENSION + `setup_dog_feeding()` cycle works cleanly
- PERF-1: `pgt_refresh_history(pgt_id, start_time)` index exists and is used by DF queries
- PERF-2: DF-1 read ≥ 5× faster than `refresh_efficiency()` at 10 K history rows
- UX-1: `pgtrickle.dog_feeding_status()` returns correct status for all five DF STs
- UX-2: `setup_dog_feeding()` emits warm-up NOTICE when history has < 50 rows
- UX-3: `pg_trickle_alert` NOTIFY received within one DF cycle after a 3× duration spike
- TEST-1: Proptest for DF-3 threshold bounds passes 10,000 iterations
- TEST-2: Light E2E full cycle test passes
- TEST-3: Upgrade E2E: history rows intact and index present after 0.19.0 → 0.20.0
- TEST-4: `check_cdc_health()` reports no anomalies for `df_*` tables after setup
- OPS-1: `recommend_refresh_mode()` returns `mode` ∈ {'DIFFERENTIAL','FULL','AUTO'} and `confidence` ∈ {'HIGH','MEDIUM','LOW'}
- OPS-2: `check_cdc_health()` returns spill-risk alert when buffer growth rate extrapolates to breach threshold within 2 cycles
- OPS-3: `scheduler_overhead()` returns non-NULL fields after ≥ 5 refresh cycles; `df_refresh_fraction < 0.01` in soak test
- OPS-4: `explain_dag()` output contains all five `df_*` nodes after `setup_dog_feeding()`
- OPS-5: `sql/dog_feeding_setup.sql` executes without errors on a fresh install
- PERF-5: Concurrent history purge + DF CDC INSERT produces no lock wait timeouts in soak test
- PERF-6: `changed_columns` bitmask stored in change buffer for UPDATE rows when `columnar_tracking = on` (if included)
- OPS-6: Soak test shows lower `overlap_count` in DF-5 with workload-aware poll enabled vs disabled
- DASH-1: `docker compose up` in `monitoring/` loads the pg_trickle_dog_feeding dashboard; all 5 panels show data
- DBT-1: `pgtrickle_enable_monitoring` macro runs twice without error; `dog_feeding_status()` shows 5 active STs after both calls
- UX-8: `df_threshold_advice.sla_breach_risk = true` when `avg_diff_ms > freshness_deadline_ms` on synthetic data
- Extension upgrade path tested (0.19.0 → 0.20.0)
- `just check-version-sync` passes
v0.21.0 — PostgreSQL 17 Support
Release Theme: This release adds PostgreSQL 17 as a supported target alongside PostgreSQL 18. PGlite is built on PostgreSQL 17, so this is a hard prerequisite for the PGlite proof of concept (v0.22.0). The pgrx 0.17.x framework already supports PG 17 — the work is enabling the feature flag, adapting version-sensitive code paths, expanding the CI matrix, and validating the full test suite against a PG 17 instance.
Cargo & Build System
| Item | Description | Effort | Ref |
|---|---|---|---|
| PG17-1 | Add pg17 feature to Cargo.toml. Define pg17 = ["pgrx/pg17", "pgrx-tests/pg17"] feature. Keep default = ["pg18"]. | 1h | — |
| PG17-2 | Broaden #[cfg] guards in src/dag.rs. Three #[cfg(feature = "pg18")] blocks must become #[cfg(any(feature = "pg17", feature = "pg18"))]. | 1–2h | — |
| PG17-3 | Guard NodeTag numeric assertions. src/dvm/parser/mod.rs asserts specific NodeTag integer values (e.g., T_GroupingSet = 107) that shift between PG versions. Gate behind #[cfg(feature = "pg18")] or use per-version value tables. | 2–4h | — |
| PG17-4 | Audit pg_sys::* API surface. Verify that every pg_sys call compiles and behaves correctly on PG 17 bindings. Focus on catalog struct field names, WAL decoder types, and any PG 18-only additions. | 4–8h | — |
CI & Infrastructure
| Item | Description | Effort | Ref |
|---|---|---|---|
| PG17-5 | CI matrix expansion. Add PG 17 build + unit test job to ci.yml. Use postgres:17 Docker image for integration and light E2E tests. | 4–8h | — |
| PG17-6 | justfile parameterisation. Add pg17 variants for build, test, and package recipes (e.g., just build-pg17, just test-e2e-pg17). | 2–4h | — |
| PG17-7 | tests/Dockerfile.e2e PG version parameter. Accept a build arg for the base PostgreSQL image version so the same Dockerfile works for PG 17 and PG 18. | 2–4h | — |
| PG17-8 | Scripts parameterisation. Update run_unit_tests.sh, run_light_e2e_tests.sh, run_e2e_tests.sh to accept a PG version argument instead of hardcoding pg18. | 2–4h | — |
Testing & Validation
| Item | Description | Effort | Ref |
|---|---|---|---|
| PG17-9 | Full E2E suite against PG 17. Run the complete E2E test suite against a PG 17 instance. Fix any parser or catalog incompatibilities that surface. | 1–2d | — |
| PG17-10 | TPC-H validation on PG 17. Run TPC-H benchmark queries on PG 17 to verify differential refresh correctness for complex queries. | 4–8h | — |
| PG17-11 | Upgrade path test. Verify ALTER EXTENSION pg_trickle UPDATE from 0.20.0 to 0.21.0 works on both PG 17 and PG 18. | 2–4h | — |
Documentation
| Item | Description | Effort | Ref |
|---|---|---|---|
| PG17-12 | Update docs and README. Change "PostgreSQL 18 extension" to "PostgreSQL 17/18 extension" in README.md, INSTALL.md, src/lib.rs doc comments, and ARCHITECTURE.md. | 1–2h | — |
| PG17-13 | Docker Hub image variants. Publish images tagged with both PG versions (e.g., :0.20.0-pg17, :0.20.0-pg18). | 2–4h | — |
v0.21.0 total: ~2–4 days
Exit criteria:
- PG17-1: `cargo build --features pg17 --no-default-features` compiles cleanly
- PG17-2/PG17-3: `cargo clippy --features pg17 --no-default-features` passes with zero warnings
- PG17-4: No `pg_sys` compile errors on PG 17 bindings
- PG17-5: CI runs unit + integration + light E2E tests on PG 17
- PG17-9: Full E2E suite passes on PG 17 with zero failures
- PG17-10: TPC-H differential refresh matches full refresh on PG 17
- PG17-11: Extension upgrade path works on both PG 17 and PG 18
- PG17-12: Documentation reflects PG 17/18 dual support
- Extension upgrade path tested (0.20.0 → 0.21.0)
- `just check-version-sync` passes
v0.22.0 — PGlite Proof of Concept
Release Theme: This release validates whether PGlite users want real incremental view maintenance by shipping a lightweight TypeScript plugin with zero core changes. The plugin (`@pgtrickle/pglite-lite`) intercepts DML via statement-level AFTER triggers and applies pre-computed delta SQL for simple patterns — single-table aggregates, two-table inner joins, and filtered scans. It deliberately limits scope to 3–5 SQL patterns to keep effort low while generating a concrete demand signal. If adoption materialises, the full core extraction (v0.23.0) and WASM build (v0.24.0) proceed. The main pg_trickle PostgreSQL extension ships no functional changes in this release — only version bumps and upgrade migration plumbing.
See PLAN_PGLITE.md for the full feasibility report.
PGlite JS Plugin PoC (Strategy C — Phase 0)
In plain terms: PGlite's built-in `live.incrementalQuery()` re-runs the full query on every change and diffs at the JavaScript layer. This proof of concept ships a PGlite plugin (`@pgtrickle/pglite-lite`) that intercepts DML via statement-level AFTER triggers and applies pre-computed delta SQL for simple cases — single-table aggregates and two-table inner joins. It validates whether PGlite users want real IVM and whether the trigger infrastructure works correctly in PGlite's single-user WASM mode. No WASM compilation, no pgrx changes, no core refactoring required.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PGL-0-1 | PGlite trigger infrastructure validation. Empirically verify that statement-level triggers with REFERENCING NEW TABLE AS ... OLD TABLE AS ... work in PGlite's single-user mode. Document any limitations. | 4–8h | PLAN_PGLITE.md §8 Q1 |
| PGL-0-2 | Delta SQL templates for simple patterns. Implement delta SQL generation in TypeScript for: (a) single-table GROUP BY with COUNT/SUM/AVG, (b) two-table INNER JOIN, (c) simple WHERE filter. Pre-compute at createStreamTable() time. | 2–3d | PLAN_PGLITE.md §5 Strategy C |
| PGL-0-3 | PGlite plugin skeleton. TypeScript plugin implementing createStreamTable(), dropStreamTable(), trigger registration, and delta application via PGlite's plugin API. | 2–3d | PLAN_PGLITE.md §5 Strategy C |
| PGL-0-4 | npm package @pgtrickle/pglite-lite. Package, publish, README with usage examples, and 3–5 supported SQL patterns documented. | 1–2d | — |
| PGL-0-5 | Benchmark vs live.incrementalQuery(). Compare latency and throughput for a 10K-row table with single-row inserts. Quantify the IVM advantage. | 1d | PLAN_PGLITE.md §4.2 |
Phase 0 subtotal: ~2–3 weeks
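To make the PGL-0-2 idea concrete, the delta templates are pre-computed SQL strings generated from a parsed pattern. The sketch below is illustrative only — `aggregateDeltaSql`, the `new_rows`/`old_rows` transition-table aliases, and the `k`/`cnt`/`sum` target columns are assumptions, not the plugin's actual API. It covers the single-table GROUP BY + COUNT/SUM pattern:

```typescript
// Shape of a pre-computed delta template for single-table
// GROUP BY + COUNT/SUM. Assumes the stream table has columns
// (k, cnt, sum) with a unique index on k, and that the
// statement-level trigger exposes transition tables aliased
// as new_rows / old_rows.
interface AggregatePattern {
  target: string;   // stream table name
  groupKey: string; // GROUP BY column on the source table
  sumCol: string;   // SUM() argument
}

function aggregateDeltaSql(p: AggregatePattern): string[] {
  const merge = `
WITH raw_delta AS (
  SELECT ${p.groupKey} AS k, 1 AS dcnt, ${p.sumCol} AS dsum FROM new_rows
  UNION ALL
  SELECT ${p.groupKey}, -1, -${p.sumCol} FROM old_rows
),
delta AS (
  -- collapse to one row per key so ON CONFLICT fires at most once
  SELECT k, sum(dcnt) AS dcnt, sum(dsum) AS dsum
    FROM raw_delta GROUP BY k
)
INSERT INTO ${p.target} AS st (k, cnt, sum)
SELECT k, dcnt, dsum FROM delta
ON CONFLICT (k) DO UPDATE
  SET cnt = st.cnt + EXCLUDED.cnt,
      sum = st.sum + EXCLUDED.sum`.trim();
  // Groups whose count reached zero are pruned in a second statement:
  // folding this into the merge via a data-modifying CTE is unsafe,
  // because an outer DELETE cannot see rows updated by its own CTE.
  const prune = `DELETE FROM ${p.target} WHERE cnt = 0`;
  return [merge, prune];
}
```

AVG falls out of the same state: stored as (sum, cnt), reconstructed as sum / cnt at read time. NULL group keys need extra care — that is exactly the CORR-2 pitfall below.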
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | Delta SQL equivalence for supported patterns | M | P0 |
| CORR-2 | NULL-key aggregate correctness in JS delta | S | P0 |
| CORR-3 | Multi-DML transaction atomicity | S | P1 |
CORR-1 — Delta SQL equivalence for supported patterns
In plain terms: The TypeScript delta SQL templates must produce the exact same stream table state as a full query re-evaluation, for every combination of INSERT, UPDATE, and DELETE on the supported patterns (single-table GROUP BY + COUNT/SUM/AVG, two-table INNER JOIN, simple WHERE filter). Correctness is proven by running each DML operation, comparing the delta-maintained result against a fresh `SELECT`, and asserting row-for-row equivalence.
Verify: automated test suite runs 100+ randomised DML sequences per pattern; zero divergence from full re-evaluation. Dependencies: PGL-0-2, PGL-0-3. Schema change: No.
CORR-2 — NULL-key aggregate correctness in JS delta
In plain terms: When a GROUP BY key is NULL, grouping semantics put all NULL keys into a single group of their own — NULLs are treated as "not distinct" for grouping, even though `NULL = NULL` is unknown under three-valued logic. The TypeScript delta templates must handle NULL group keys correctly — insertions into the NULL group, deletions that empty it, and updates that move rows in/out of the NULL group. This is the most common correctness pitfall in hand-rolled IVM.
Verify: E2E test with nullable GROUP BY column; assert NULL group appears, grows, shrinks, and disappears correctly. Dependencies: CORR-1. Schema change: No.
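One concrete guard worth noting (a sketch under assumptions — `streamTableDdl` and the `k`/`cnt`/`sum` columns are illustrative, not the plugin's actual schema): a plain UNIQUE index treats NULLs as distinct, so an ON CONFLICT-based delta merge would insert a fresh row for every NULL-key delta instead of updating the NULL group. PostgreSQL 15+ (and therefore PGlite's PG 17) supports `UNIQUE NULLS NOT DISTINCT`, which makes the NULL group collide like any other key:

```typescript
// Illustrative DDL helper for the stream table behind an aggregate
// pattern. The NULLS NOT DISTINCT clause makes all NULL keys map to
// one row, matching GROUP BY semantics under upsert-based merges.
function streamTableDdl(target: string, keyType: string): string {
  return [
    `CREATE TABLE ${target} (`,
    `  k ${keyType},`,
    `  cnt bigint NOT NULL,`,
    `  sum numeric,`,
    `  UNIQUE NULLS NOT DISTINCT (k)  -- one row for the NULL group`,
    `)`,
  ].join("\n");
}
```

Without this clause, the merge SQL would need `IS NOT DISTINCT FROM` key matching instead of `ON CONFLICT`, which loses the index-arbitrated upsert path.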
CORR-3 — Multi-DML transaction atomicity
In plain terms: PGlite runs in single-connection mode, so a `BEGIN; INSERT ...; DELETE ...; COMMIT` sequence fires two separate statement-level triggers. The plugin must ensure the stream table reflects the net effect of the entire transaction, not an intermediate state. If trigger ordering produces incorrect intermediate results, a post-transaction reconciliation pass is needed.
Verify: test with BEGIN; INSERT; UPDATE; DELETE; COMMIT on a single base
table; stream table matches full re-evaluation after commit.
Dependencies: PGL-0-3. Schema change: No.
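The reconciliation pass mentioned above can be a single atomic statement. A minimal sketch (illustrative names; assumes the defining query does not read the stream table itself): a data-modifying CTE clears the table while the outer INSERT rebuilds it, so readers never observe a half-applied state — the DELETE's snapshot sees only pre-statement rows, and the inserted rows survive.

```typescript
// Rebuild a stream table from its defining query in one statement.
// Used as a correctness fallback when per-statement deltas cannot be
// proven to compose across a multi-statement transaction.
function reconcileSql(target: string, definingQuery: string): string {
  return [
    `WITH cleared AS (DELETE FROM ${target})`,
    `INSERT INTO ${target} ${definingQuery}`,
  ].join("\n");
}
```

This trades the delta speedup for guaranteed convergence, so it should only run when a multi-DML transaction is detected.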
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | Trigger cleanup on dropStreamTable | S | P0 |
| STAB-2 | Graceful error on unsupported SQL | S | P0 |
| STAB-3 | Plugin idempotency (create-drop-create cycle) | S | P1 |
STAB-1 — Trigger cleanup on dropStreamTable
In plain terms: When a user calls `dropStreamTable()`, all statement-level AFTER triggers registered on source tables must be removed. Orphaned triggers would fire on every subsequent DML and attempt to write to a non-existent stream table, causing errors.
Verify: after `dropStreamTable()`, no pg_trickle-related triggers remain in `pg_trigger` for the source tables.
Dependencies: PGL-0-3. Schema change: No.
STAB-2 — Graceful error on unsupported SQL
In plain terms: The PoC supports only 3–5 SQL patterns. If a user passes an unsupported query (e.g., a LEFT JOIN, window function, or recursive CTE), the plugin must throw a clear, actionable error message listing what is supported — not silently produce wrong results or crash.
Verify: createStreamTable() with an unsupported query throws an error
whose message names the unsupported feature and lists supported alternatives.
Dependencies: PGL-0-2. Schema change: No.
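The gate can be sketched as follows (all names and the regex-based checks are illustrative assumptions — a real implementation would inspect the parse tree, not the SQL text):

```typescript
// Reject queries outside the supported patterns with an error that
// names the offending feature and lists the supported alternatives,
// instead of silently producing wrong results.
const SUPPORTED = [
  "single-table GROUP BY with COUNT/SUM/AVG",
  "two-table INNER JOIN",
  "single-table WHERE filter",
];

// Deliberately shallow textual checks for the PoC.
const UNSUPPORTED: Array<[RegExp, string]> = [
  [/\bleft\s+join\b/i, "LEFT JOIN"],
  [/\bover\s*\(/i, "window functions"],
  [/\bwith\s+recursive\b/i, "recursive CTEs"],
];

function assertSupported(query: string): void {
  for (const [pattern, feature] of UNSUPPORTED) {
    if (pattern.test(query)) {
      throw new Error(
        `${feature} is not supported in pglite-lite. ` +
          `Supported patterns: ${SUPPORTED.join("; ")}.`,
      );
    }
  }
}
```

Running the check inside `createStreamTable()` fails fast, before any triggers or state are created.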
STAB-3 — Plugin idempotency (create-drop-create cycle)
In plain terms: Creating a stream table, dropping it, and creating it again with the same name must work without leftover state. Leftover catalog rows, triggers, or temp tables from the first creation must not interfere with the second.
Verify: create-drop-create cycle produces correct results; no duplicate triggers or stale catalog entries. Dependencies: STAB-1. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | Benchmark vs live.incrementalQuery() | M | P0 |
| PERF-2 | Delta overhead profiling per DML | S | P1 |
| PERF-3 | Large result set scalability (10K/100K rows) | S | P1 |
PERF-1 — Benchmark vs live.incrementalQuery() (= PGL-0-5)
In plain terms: The entire value proposition of this PoC depends on being faster than PGlite's built-in `live.incrementalQuery()` for the supported patterns. Produce a public benchmark comparing latency and throughput for single-row inserts into a 10K-row base table across all three supported patterns (aggregate, join, filter).
Verify: delta-maintained stream table refresh latency < 50% of
live.incrementalQuery() latency for all supported patterns at 10K rows.
Dependencies: PGL-0-3, PGL-0-4. Schema change: No.
PERF-2 — Delta overhead profiling per DML
In plain terms: Measure the per-DML overhead added by the statement-level triggers. INSERT-heavy workloads should not suffer more than a 2× latency increase compared to the same INSERT without pg_trickle triggers installed. Profile trigger function execution time, temp table creation, and delta DML.
Verify: microbenchmark shows per-DML overhead < 2 ms for aggregate pattern; < 5 ms for join pattern at 10K source rows. Dependencies: PGL-0-3. Schema change: No.
PERF-3 — Large result set scalability (10K/100K rows)
In plain terms: Verify that the delta approach maintains its advantage over full re-evaluation as base table size grows. At 100K rows, the delta path should be significantly faster than full re-evaluation for single-row changes.
Verify: at 100K base table rows, single-row insert refresh latency is < 10% of full query re-evaluation latency. Dependencies: PERF-1. Schema change: No.
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Multiple stream tables on same source | S | P1 |
| SCAL-2 | Cascading stream table triggers | M | P2 |
| SCAL-3 | Concurrent DML with multiple stream tables | S | P2 |
SCAL-1 — Multiple stream tables on same source
In plain terms: Verify that 3+ stream tables can be maintained from the same base table simultaneously. Each DML fires one trigger per stream table; ensure triggers do not interfere with each other.
Verify: 3 stream tables on the same source; INSERT + UPDATE + DELETE cycle; all 3 produce correct results. Dependencies: PGL-0-3. Schema change: No.
SCAL-2 — Cascading stream table triggers
In plain terms: If stream table B reads from stream table A's underlying storage, an INSERT into A's source should propagate through A's trigger, update A, and then fire B's trigger to update B — all within the same PGlite transaction. Verify this works in PGlite's single-connection environment without deadlocks or infinite trigger loops.
Verify: A->B cascade produces correct results for INSERT/DELETE on A's source. No infinite loops detected. Dependencies: SCAL-1. Schema change: No.
SCAL-3 — Concurrent DML with multiple stream tables
In plain terms: PGlite is single-connection, but a user could issue rapid sequential DML (`INSERT; INSERT; INSERT`) without explicit transactions. Verify all stream tables converge to the correct state.
Verify: 100 sequential INSERTs with 3 stream tables; final state matches full re-evaluation. Dependencies: SCAL-1. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | Getting-started README with copy-paste examples | S | P0 |
| UX-2 | Supported patterns decision table | XS | P0 |
| UX-3 | Error messages include remediation hints | S | P1 |
| UX-4 | TypeScript type definitions | S | P1 |
| UX-5 | ElectricSQL outreach and collaboration | S | P1 |
UX-1 — Getting-started README with copy-paste examples
In plain terms: The npm package README must include 3 complete, copy-pasteable examples — one per supported pattern — that a developer can run in under 2 minutes. Include Node.js and browser (Vite) examples.
Verify: all README examples execute without modification on a fresh PGlite instance. Dependencies: PGL-0-4. Schema change: No.
UX-2 — Supported patterns decision table
In plain terms: A clear table showing which SQL patterns are and are not supported, what error you get for unsupported patterns, and when full support is expected (v0.24.0). This prevents user frustration and sets expectations.
Verify: decision table in README and npm page lists all tested patterns with status (supported / unsupported / planned). Dependencies: None. Schema change: No.
UX-3 — Error messages include remediation hints
In plain terms: Every error thrown by the plugin must include the table name, the failing operation, and a one-sentence hint. Example:
"LEFT JOIN is not supported in pglite-lite. Use @pgtrickle/pglite (v0.24.0+) for full SQL support, or rewrite as INNER JOIN."
Verify: all error paths tested; every error message includes a remediation sentence. Dependencies: STAB-2. Schema change: No.
UX-4 — TypeScript type definitions
In plain terms: Ship `.d.ts` type definitions so TypeScript users get autocomplete and type checking for `createStreamTable()`, `dropStreamTable()`, and configuration options.
Verify: a TypeScript project consumes the plugin with strict mode enabled; no `any` types leak.
Dependencies: PGL-0-4. Schema change: No.
UX-5 — ElectricSQL outreach and collaboration
In plain terms: PGlite is developed by ElectricSQL. Their cooperation is essential for Phase 2 (WASM build). Initiate contact before shipping Phase 0 to gauge interest, validate assumptions about PGlite's trigger infrastructure, and explore potential co-marketing.
Verify: documented exchange with ElectricSQL team (GitHub issue, email, or meeting notes). Dependencies: None. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | Automated correctness suite (all patterns x DML types) | M | P0 |
| TEST-2 | PGlite version compatibility matrix | S | P1 |
| TEST-3 | Regression test: trigger firing order | S | P1 |
| TEST-4 | Bundle size monitoring | XS | P2 |
| TEST-5 | Extension upgrade path (0.21 to 0.22) | S | P0 |
TEST-1 — Automated correctness suite (all patterns x DML types)
In plain terms: For each supported pattern (aggregate, join, filter), run every DML type (INSERT, UPDATE, DELETE, multi-row, TRUNCATE) and assert the stream table matches a fresh full evaluation. This is the primary quality gate.
Verify: Jest/Vitest test suite with > 50 test cases; all pass on PGlite latest. Dependencies: PGL-0-2, PGL-0-3. Schema change: No.
TEST-2 — PGlite version compatibility matrix
In plain terms: PGlite updates frequently. Test the plugin against the last 3 PGlite releases to ensure trigger behavior hasn't changed. Document the minimum supported PGlite version.
Verify: CI matrix runs tests against PGlite N, N-1, N-2. Dependencies: TEST-1. Schema change: No.
TEST-3 — Regression test: trigger firing order
In plain terms: When multiple triggers exist on the same table, PostgreSQL fires them in alphabetical order by trigger name. Verify that trigger naming conventions prevent ordering conflicts with user-defined triggers.
Verify: test with a user-defined AFTER trigger alongside the plugin's trigger; both fire correctly; stream table produces correct results. Dependencies: PGL-0-3. Schema change: No.
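One naming convention that would address the ordering concern (a sketch — the `zzz_pgt_` prefix is an assumption, not the plugin's actual scheme): since PostgreSQL fires same-event triggers in alphabetical order by name, prefixing plugin triggers so they sort last lets user-defined triggers that maintain base-table invariants run before the delta is captured.

```typescript
// Generate a plugin trigger name that sorts after typical
// user-defined trigger names, so it fires last among AFTER
// triggers on the same table and event.
function triggerName(
  streamTable: string,
  sourceTable: string,
  op: "ins" | "upd" | "del",
): string {
  return `zzz_pgt_${streamTable}_${sourceTable}_${op}`;
}
```

The regression test then only needs to assert that no user trigger in the test fixture sorts after the prefix.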
TEST-4 — Bundle size monitoring
In plain terms: The npm package should be small (< 50 KB minified + gzipped) since this is a pure-JS plugin with no WASM. Add a CI check that fails if bundle size exceeds the threshold.
Verify: npm pack --dry-run reports < 50 KB gzipped.
Dependencies: PGL-0-4. Schema change: No.
TEST-5 — Extension upgrade path (0.21 to 0.22)
In plain terms: The main pg_trickle PostgreSQL extension ships no functional changes in v0.22.0, but the upgrade migration path must still be tested. `ALTER EXTENSION pg_trickle UPDATE` from 0.21.0 to 0.22.0 must leave existing stream tables intact.
Verify: upgrade E2E test confirms all existing stream tables survive and refresh correctly after the 0.21.0 -> 0.22.0 upgrade.
Dependencies: None. Schema change: No (PG extension unchanged).
Conflicts & Risks
- Demand uncertainty is the primary risk. This entire milestone is a bet that PGlite users want IVM beyond what pg_ivm provides. If Phase 0 generates no adoption signal, v0.23.0–v0.25.0 should be deprioritised and v1.0.0 proceeds without PGlite. Define a concrete adoption threshold (e.g., > 100 npm weekly downloads within 60 days of publication) as a go/no-go gate for v0.23.0.
- PGlite trigger infrastructure is unverified. PGL-0-1 (trigger validation) is a hard prerequisite for everything else. If statement-level triggers with transition tables do not work in PGlite's single-user mode, the entire Strategy C approach fails and the PoC must pivot to a pure JS diff approach (lower value).
- PGlite version mismatch. PGlite tracks PostgreSQL 17; pg_trickle targets PG 18. The PoC operates at the SQL level and should be unaffected, but if PGlite upgrades to PG 18 mid-cycle, trigger behavior may change. Pin the minimum PGlite version in `package.json`.
- No core Rust changes, but version bump required. The main pg_trickle extension needs a v0.22.0 version bump, upgrade migration SQL, and passing CI even though no functional code changes land. This is low-risk but must not be forgotten.
- ElectricSQL collaboration timing. UX-5 (outreach) should happen early — before v0.22.0 ships — to avoid building something ElectricSQL is already working on or would actively resist. If they signal interest in co-development, Phase 2 scope and timeline may shift.
- TypeScript delta SQL correctness is harder to prove than in Rust. The main extension uses property-based testing and SQLancer for correctness. The TS plugin lacks these tools. TEST-1 must be rigorously designed to compensate — consider porting the proptest approach to a JS property-testing library (e.g., fast-check).
v0.22.0 total: ~2–3 weeks (PGlite plugin) + ~1–2 days (PG extension version bump)
Exit criteria:
- PGL-0-1: Statement-level triggers with transition tables confirmed working in PGlite
- PGL-0-2: Delta SQL correct for single-table aggregate, two-table join, and filtered query
- PGL-0-3: `@pgtrickle/pglite-lite` plugin creates and maintains stream tables in PGlite
- PGL-0-4: npm package published with README and usage examples
- PGL-0-5: Benchmark shows measurable latency improvement over `live.incrementalQuery()` for supported patterns
- CORR-1: Automated delta SQL equivalence tests pass (100+ DML sequences per pattern)
- CORR-2: NULL-key aggregate groups correctly created, updated, and removed
- CORR-3: Multi-DML transaction produces correct net result
- STAB-1: No orphaned triggers after `dropStreamTable()`
- STAB-2: Unsupported SQL patterns produce clear, actionable errors
- STAB-3: Create-drop-create cycle produces correct results
- PERF-1: Delta refresh latency < 50% of `live.incrementalQuery()` at 10K rows
- PERF-3: Delta advantage holds at 100K rows (< 10% of full re-evaluation latency)
- SCAL-1: 3+ stream tables on same source produce correct results
- UX-1: README examples run unmodified on fresh PGlite instance
- UX-2: Supported patterns decision table published
- UX-4: TypeScript type definitions ship with strict-mode compatibility
- TEST-1: > 50 correctness test cases pass on PGlite latest
- TEST-2: CI tests pass against PGlite N, N-1, N-2
- TEST-5: Extension upgrade path tested (0.21.0 -> 0.22.0)
- `just check-version-sync` passes
v0.23.0 — Core Extraction (pg_trickle_core)
Release Theme: This release surgically separates pg_trickle's "brain" — the DVM engine, operator delta SQL generation, query rewrite passes, and DAG computation — into a standalone Rust crate (`pg_trickle_core`) with zero pgrx dependency. The extraction touches ~51,000 lines of code across 30+ source files but produces zero user-visible behavior change: every existing test must pass unchanged. The payoff is threefold: the core crate compiles to WASM (enabling the PGlite extension in v0.24.0), pure-logic unit tests run without a PostgreSQL instance (10× faster CI), and the main extension gains a cleaner internal architecture. Approximately 500 unsafe blocks in the parser require an abstraction layer over raw `pg_sys` node traversal, making this the most technically demanding refactoring in the project's history.
See PLAN_PGLITE.md §5 Strategy A for the full extraction architecture.
Core Crate Extraction (Phase 1)
In plain terms: pg_trickle's "brain" — the code that analyses SQL queries, builds operator trees, and generates delta SQL — is currently tangled with pgrx (the Rust-to-PostgreSQL bridge). This milestone surgically separates the pure logic into its own crate so it can be compiled independently. The existing extension continues to work unchanged; it just imports from `pg_trickle_core` instead of having the code inline. A `trait DatabaseBackend` abstracts SPI and parser access so the core logic can be tested without a running PostgreSQL instance.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PGL-1-1 | Create pg_trickle_core crate. Workspace member with [lib] target, no pgrx dependency. Move OpTree, Expr, Column, AggExpr, and all shared types. | 1–2d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-2 | Extract operator delta SQL generation. Move all src/dvm/operators/ logic (~24K lines, 23 files) into the core crate. Each operator's generate_delta_sql() becomes a pure function taking abstract types. | 3–5d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-3 | Extract auto-rewrite passes. Move view inlining, DISTINCT ON rewrite, GROUPING SETS expansion, and SubLink extraction into pg_trickle_core::rewrites. | 2–3d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-4 | Extract DAG computation. Move dependency graph, topological sort, cycle detection, diamond detection into pg_trickle_core::dag. | 1–2d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-5 | Define trait DatabaseBackend. Abstract trait for SPI queries and raw_parser access. Implement for pgrx in the main extension crate. | 2–3d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-6 | WASM compilation gate. Verify pg_trickle_core compiles to wasm32-unknown-emscripten target. CI check for WASM build. | 1–2d | PLAN_PGLITE.md §5 Strategy A |
| PGL-1-7 | Existing test suite passes. All unit, integration, and E2E tests pass with the refactored crate structure. Zero behavior change. | 2–3d | — |
Phase 1 subtotal: ~3–4 weeks
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | Delta SQL output byte-for-byte equivalence | M | P0 |
| CORR-2 | OpTree serialization round-trip fidelity | S | P0 |
| CORR-3 | Rewrite pass ordering preservation | S | P1 |
| CORR-4 | DAG cycle detection parity after extraction | S | P1 |
CORR-1 — Delta SQL output byte-for-byte equivalence
In plain terms: After the extraction, every operator's `generate_delta_sql()` must produce the exact same SQL string as it did before the refactoring. Any byte-level difference — even whitespace — indicates a semantic shift that could change query plans or correctness. Capture the SQL output for all 22 TPC-H stream tables before and after the extraction and assert bit-for-bit equality.
Verify: snapshot test comparing delta SQL for all TPC-H queries + the full E2E test suite. Any diff fails the build. Dependencies: PGL-1-2. Schema change: No.
CORR-2 — OpTree serialization round-trip fidelity
In plain terms: The `OpTree` types are moving to a new crate. If any field is accidentally dropped or retyped during the move, the delta SQL generator will silently produce wrong output. Add a round-trip test: serialize an `OpTree` to JSON, deserialize it back, and assert structural equality. This catches missing `#[derive]` attributes and field ordering issues.
Verify: proptest generating random OpTrees; serialize-deserialize round-trip produces identical trees. Dependencies: PGL-1-1. Schema change: No.
CORR-3 — Rewrite pass ordering preservation
In plain terms: The auto-rewrite passes (view inlining, DISTINCT ON, GROUPING SETS, SubLink extraction) must execute in the same order after extraction. Reordering could change the resulting OpTree and thereby the delta SQL. Add an integration test that runs all rewrite passes on a complex query (joining 3 tables with DISTINCT ON + GROUPING SETS) and asserts the final OpTree matches a golden snapshot.
Verify: golden-snapshot test for rewrite pass output on complex query. Dependencies: PGL-1-3. Schema change: No.
CORR-4 — DAG cycle detection parity after extraction
In plain terms: The cycle detection algorithm in `dag.rs` has subtleties around self-referencing views and diamond patterns. After moving to the core crate, the algorithm must detect the same cycles. Run the existing cycle-detection unit tests and add 3 new edge cases: self-referencing CTE, diamond with mixed IMMEDIATE/DIFFERENTIAL, and 4-level cascade.
Verify: all existing DAG unit tests pass + 3 new edge-case tests. Dependencies: PGL-1-4. Schema change: No.
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | pg_sys node abstraction layer (~500 unsafe blocks) | L | P0 |
| STAB-2 | Compile-time pgrx dependency leak detection | S | P0 |
| STAB-3 | Cargo workspace configuration correctness | S | P0 |
| STAB-4 | Extension upgrade path (0.22 to 0.23) | S | P0 |
| STAB-5 | Feature-flag isolation for WASM target | S | P1 |
STAB-1 — pg_sys node abstraction layer (~500 unsafe blocks)
In plain terms: `rewrites.rs` (118 unsafe blocks, 295 `pg_sys` refs) and `sublinks.rs` (367 unsafe blocks, 492 `pg_sys` refs) are the most deeply coupled to pgrx. The core crate cannot contain raw `pg_sys` calls. Define a `trait NodeVisitor` (or equivalent) that wraps pg_sys node traversal behind safe method calls. The pgrx backend implements the trait using actual pg_sys pointers; a mock backend can be used for unit tests. This is the single highest-effort item in the release.
Verify: zero `pg_sys::` references in `pg_trickle_core/`; `grep -r pg_sys pg_trickle_core/src/` returns empty.
Dependencies: PGL-1-1, PGL-1-5. Schema change: No.
STAB-2 — Compile-time pgrx dependency leak detection
In plain terms: After extraction, any accidental `use pgrx::*` in the core crate would break the WASM build. Add a CI job that compiles `pg_trickle_core` in isolation (without the pgrx feature) and fails if any pgrx symbol is referenced. This catches leaks immediately rather than at WASM build time.
Verify: cargo build -p pg_trickle_core --no-default-features succeeds in
CI.
Dependencies: PGL-1-1. Schema change: No.
STAB-3 — Cargo workspace configuration correctness
In plain terms: Adding a workspace member changes `Cargo.lock` resolution, feature unification, and `cargo pgrx` behavior. Verify: `cargo pgrx package` still produces a valid `.so`, `cargo test` runs all workspace tests, and `cargo pgrx test` works for the extension crate. The pgrx version must remain pinned at 0.17.x.
Verify: cargo pgrx package, cargo test --workspace, cargo pgrx test
all succeed.
Dependencies: PGL-1-1. Schema change: No.
STAB-4 — Extension upgrade path (0.22 → 0.23)
In plain terms: v0.23.0 makes no SQL-visible changes (same functions, same catalog schema), but the upgrade migration must still be tested. `ALTER EXTENSION pg_trickle UPDATE` from 0.22.0 to 0.23.0 must leave existing stream tables intact and refreshable.
Verify: upgrade E2E test confirms stream tables survive and refresh
correctly after 0.22.0 -> 0.23.0.
STAB-5 — Feature-flag isolation for WASM target
In plain terms: The core crate must compile on both native and WASM. Any platform-specific code (e.g., `std::time::Instant` being unavailable on `wasm32-unknown-emscripten`) must be gated behind `#[cfg]` attributes. Add a CI matrix entry for the WASM target that catches platform leaks.
Verify: cargo build --target wasm32-unknown-emscripten -p pg_trickle_core
succeeds in CI.
Dependencies: PGL-1-6. Schema change: No.
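One common pattern for this kind of gating is to route platform-sensitive calls through a small trait so only the implementation is `#[cfg]`-gated — a sketch under assumed names (`Clock`, `NativeClock` are not the extension's real types):

```rust
use std::time::Duration;

/// Core code asks for elapsed time through a trait instead of calling
/// `std::time::Instant` directly, so the WASM build can supply a
/// different source without touching core logic.
pub trait Clock {
    fn elapsed(&self) -> Duration;
}

/// Native implementation; compiled out entirely on wasm32 targets,
/// where a cfg(target_arch = "wasm32") counterpart would live instead.
#[cfg(not(target_arch = "wasm32"))]
pub struct NativeClock {
    start: std::time::Instant,
}

#[cfg(not(target_arch = "wasm32"))]
impl NativeClock {
    pub fn start() -> Self {
        Self { start: std::time::Instant::now() }
    }
}

#[cfg(not(target_arch = "wasm32"))]
impl Clock for NativeClock {
    fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }
}
```

Because the trait itself is unconditional, call sites stay identical on both targets; only the constructor chosen at the edge differs.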
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | Zero-overhead abstraction for DatabaseBackend | M | P0 |
| PERF-2 | Benchmark regression gate across extraction | S | P0 |
| PERF-3 | Core-only unit test speedup measurement | S | P1 |
PERF-1 — Zero-overhead abstraction for DatabaseBackend
In plain terms: The `trait DatabaseBackend` abstraction can be wired up with dynamic dispatch (`dyn DatabaseBackend`) or static dispatch (generics). For the native extension, the abstraction must add zero measurable overhead. Use monomorphization (generics, not trait objects) for the hot path — delta SQL generation is called on every refresh cycle and must not regress. Measure with Criterion before/after on the `diff_operators` benchmark suite.
Verify: Criterion benchmark shows < 1% regression on diff_operators suite
after extraction.
Dependencies: PGL-1-5. Schema change: No.
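The monomorphization point can be illustrated with a toy slice of the trait — `quote_ident` and `delta_table_ref` are invented for this sketch, not the extension's real API:

```rust
/// Illustrative fragment of a backend trait.
pub trait DatabaseBackend {
    fn quote_ident(&self, ident: &str) -> String;
}

pub struct NativeBackend;

impl DatabaseBackend for NativeBackend {
    fn quote_ident(&self, ident: &str) -> String {
        // Double embedded quotes, PostgreSQL-style.
        format!("\"{}\"", ident.replace('"', "\"\""))
    }
}

/// Generic over `B`: the compiler emits a specialized copy of this
/// function per concrete backend type, so the `quote_ident` call is
/// statically dispatched — no vtable lookup on the refresh hot path.
/// A `&dyn DatabaseBackend` parameter would force dynamic dispatch.
pub fn delta_table_ref<B: DatabaseBackend>(backend: &B, table: &str) -> String {
    format!("SELECT * FROM {}", backend.quote_ident(table))
}
```

With this shape, only cold entry points (extension init, SQL-callable functions) need to pick a concrete `B`; everything downstream inlines as if the trait did not exist.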
PERF-2 — Benchmark regression gate across extraction
In plain terms: The extraction touches 51K lines of code. Even without functional changes, module restructuring can alter inlining, cache locality, and link-time optimization. Run the full Criterion benchmark suite before and after and assert no regression > 5%.
Verify: scripts/criterion_regression_check.py passes with 5% threshold on
all existing benchmarks.
Dependencies: PGL-1-7. Schema change: No.
PERF-3 — Core-only unit test speedup measurement
In plain terms: One of the key benefits of extraction is that `pg_trickle_core` unit tests run without starting PostgreSQL. Measure the wall-clock time for `cargo test -p pg_trickle_core` vs the old in-tree unit tests. Document the speedup in the CHANGELOG — expect 5–10× faster CI for unit-level tests.
Verify: document test execution times before/after in PR description. Dependencies: PGL-1-7. Schema change: No.
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Workspace build parallelism verification | S | P1 |
| SCAL-2 | Core crate binary size for WASM budget | S | P1 |
| SCAL-3 | Incremental compilation impact assessment | S | P2 |
SCAL-1 — Workspace build parallelism verification
In plain terms: With two crates, `cargo build` can compile `pg_trickle_core` and other non-dependent crates in parallel. Verify that the workspace DAG allows parallel compilation, and measure the incremental rebuild time for a change in `pg_trickle_core` only.
Verify: cargo build --timings shows parallel compilation of core crate.
Dependencies: PGL-1-1. Schema change: No.
SCAL-2 — Core crate binary size for WASM budget
In plain terms: v0.24.0 targets a WASM bundle under 2 MB. Measure the compiled size of `pg_trickle_core` for the WASM target now so the budget is known before Phase 2. If it exceeds 5 MB, investigate `wasm-opt` stripping and feature-gating large operator modules.
Verify: wasm32-unknown-emscripten build of pg_trickle_core produces < 5
MB unoptimized. Document size in tracking issue.
Dependencies: PGL-1-6. Schema change: No.
SCAL-3 — Incremental compilation impact assessment
In plain terms: Splitting into two crates changes the incremental compilation boundary. A change in `pg_trickle_core` now forces a recompile of the extension crate. Measure incremental compile time for common edit patterns (add a test, modify an operator, change a rewrite pass) and ensure developer-experience compile times remain under 30 s.
Verify: document incremental compile times for 3 edit patterns. Dependencies: PGL-1-1. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | Workspace-aware justfile targets | S | P0 |
| UX-2 | Developer guide for core crate contributions | S | P1 |
| UX-3 | ARCHITECTURE.md update for two-crate layout | S | P1 |
UX-1 — Workspace-aware justfile targets
In plain terms: Existing `just` targets (`just test-unit`, `just lint`, `just fmt`) must work seamlessly with the new workspace layout. Update the justfile so `just test-unit` runs both `pg_trickle_core` unit tests and extension unit tests. Add `just test-core` for core-only tests.
Verify: all existing just targets pass; just test-core runs core-only
tests in < 5 seconds.
Dependencies: PGL-1-1. Schema change: No.
UX-2 — Developer guide for core crate contributions
In plain terms: Contributors need to know the rules: what goes in `pg_trickle_core` (pure logic, no pgrx) vs the extension crate (SPI, FFI, SQL functions). Add a section to `CONTRIBUTING.md` explaining the crate boundary, the `DatabaseBackend` trait contract, and how to add a new operator to the core crate.
Verify: CONTRIBUTING.md updated with crate boundary rules. Dependencies: PGL-1-5. Schema change: No.
UX-3 — ARCHITECTURE.md update for two-crate layout
In plain terms: The module layout diagrams in `docs/ARCHITECTURE.md` and `AGENTS.md` must reflect the new two-crate structure. Update both files so new contributors see the correct layout.
Verify: docs/ARCHITECTURE.md and AGENTS.md module diagrams show
pg_trickle_core/ and pg_trickle/ crates.
Dependencies: PGL-1-7. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | Delta SQL snapshot tests for all 22 TPC-H queries | M | P0 |
| TEST-2 | Pure-Rust unit tests for extracted operators | L | P0 |
| TEST-3 | Mock DatabaseBackend for in-memory testing | M | P1 |
| TEST-4 | WASM build smoke test in CI | S | P0 |
| TEST-5 | Cargo deny / audit for new crate | XS | P0 |
TEST-1 — Delta SQL snapshot tests for all 22 TPC-H queries
In plain terms: Before extraction, capture the exact delta SQL output for each of the 22 TPC-H stream table definitions. After extraction, run the same generator and diff. Any change is a hard failure. This is the primary correctness gate for the refactoring.
Verify: cargo test -p pg_trickle_core -- snapshot passes with zero diffs.
Dependencies: CORR-1. Schema change: No.
TEST-2 — Pure-Rust unit tests for extracted operators
In plain terms: The 23 operator files currently have ~1,700 unit tests that run inside `cargo pgrx test` (which requires PostgreSQL). After extraction, all pure-logic tests should run via `cargo test -p pg_trickle_core` without a database. Tests that require SPI (e.g., catalog lookups) stay in the extension crate. Audit and migrate every test that can run without PostgreSQL.
Verify: > 80% of existing operator unit tests run in pg_trickle_core
without PostgreSQL.
Dependencies: PGL-1-2, TEST-3. Schema change: No.
TEST-3 — Mock DatabaseBackend for in-memory testing
In plain terms: For core crate tests that need to call the parser or SPI, provide a `MockBackend` that returns canned parse trees and query results. This allows testing the full pipeline (parse → rewrite → operator tree → delta SQL) without PostgreSQL.
Verify: MockBackend supports at least: raw_parser() returning a canned
OpTree, and spi_query() returning a canned result set. 10+ tests use it.
Dependencies: PGL-1-5. Schema change: No.
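A minimal sketch of the canned-result half of such a mock, keyed by exact query text — the trait fragment and method names here are assumptions, not the extension's real `DatabaseBackend` contract:

```rust
use std::collections::HashMap;

/// Illustrative fragment of the backend trait used by core tests.
pub trait DatabaseBackend {
    /// Rows as vectors of stringified column values (simplified).
    fn spi_query(&self, sql: &str) -> Vec<Vec<String>>;
}

/// Returns pre-registered result sets — no PostgreSQL process involved.
pub struct MockBackend {
    canned: HashMap<String, Vec<Vec<String>>>,
}

impl MockBackend {
    pub fn new() -> Self {
        Self { canned: HashMap::new() }
    }

    /// Register the rows to return for a given query text.
    pub fn expect(&mut self, sql: &str, rows: Vec<Vec<String>>) {
        self.canned.insert(sql.to_string(), rows);
    }
}

impl DatabaseBackend for MockBackend {
    fn spi_query(&self, sql: &str) -> Vec<Vec<String>> {
        // Unknown queries return an empty result set; a stricter mock
        // could panic here to catch unexpected SPI calls in tests.
        self.canned.get(sql).cloned().unwrap_or_default()
    }
}
```

The same pattern extends to `raw_parser()`: the mock returns a pre-built OpTree instead of rows, letting the parse → rewrite → delta pipeline run end-to-end in plain `cargo test`.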
TEST-4 — WASM build smoke test in CI
In plain terms: Add a CI job that compiles `pg_trickle_core` to `wasm32-unknown-emscripten` on every PR. This catches platform-specific code leaks before they accumulate. The job does not need to run the WASM binary — just compile it.
Verify: CI job build-wasm passes on every PR targeting the core crate.
Dependencies: PGL-1-6, STAB-5. Schema change: No.
TEST-5 — Cargo deny / audit for new crate
In plain terms: The new `pg_trickle_core` crate may introduce new transitive dependencies. Ensure `cargo deny check` and `cargo audit` cover the new crate and report no advisories.
Verify: cargo deny check and cargo audit pass for the full workspace.
Dependencies: PGL-1-1. Schema change: No.
Conflicts & Risks
- STAB-1 is the critical path. The ~500 unsafe blocks in `rewrites.rs` and `sublinks.rs` require a `NodeVisitor` abstraction over raw `pg_sys` pointer traversal. This is the highest-effort, highest-risk item. If the abstraction proves too leaky (e.g., too many pg_sys node types to wrap), consider leaving `rewrites.rs` and `sublinks.rs` in the extension crate and extracting only operators + DAG + types to the core crate. This reduces v0.23.0 scope but still delivers the WASM-compilable operator engine for v0.24.0.
- PERF-1 must be validated before merging. Introducing a `trait DatabaseBackend` could add vtable-dispatch overhead on the hot refresh path. Use monomorphization (generics) rather than `dyn Trait` for the extension-side implementation. If Criterion shows a regression above 1%, investigate `#[inline]` annotations and LTO settings.
- No schema changes, but workspace restructuring can break `cargo pgrx`. The `cargo-pgrx` tool makes assumptions about workspace layout (e.g., expecting a single `lib.rs` entry point). Test `cargo pgrx package`, `cargo pgrx test`, and `cargo pgrx run` early. If `cargo-pgrx` 0.17.x cannot handle the workspace, consider upgrading to a newer pgrx that supports workspaces, or use a `[patch]` section in `Cargo.toml`.
- TEST-2 depends on TEST-3 (MockBackend). Pure-Rust operator tests need a way to feed canned parse trees. Build the MockBackend early so TEST-2 can proceed.
- WASM target may not be available in standard CI runners. The `wasm32-unknown-emscripten` target requires the Emscripten SDK. Either install it in CI (adds ~2 min of setup) or use a pre-built Docker image with the SDK. Budget for CI setup time.
- Extraction is all-or-nothing per module. Partially extracting a module (e.g., moving half of `rewrites.rs`) creates circular dependencies. Each module must move completely or stay. Plan the extraction order: types → operators → DAG → diff → rewrites → sublinks.
v0.23.0 total: ~3–4 weeks (extraction) + ~1–2 weeks (abstraction layer + testing)
Exit criteria:
- PGL-1-1: `pg_trickle_core` crate exists as a workspace member with zero pgrx dependencies
- PGL-1-2: All operator delta SQL generation lives in the core crate
- PGL-1-3: All auto-rewrite passes live in the core crate
- PGL-1-4: DAG computation lives in the core crate
- PGL-1-5: `trait DatabaseBackend` defined; pgrx implementation passes all existing tests
- PGL-1-6: `cargo build --target wasm32-unknown-emscripten -p pg_trickle_core` succeeds
- PGL-1-7: `just test-all` passes with zero regressions
- CORR-1: Delta SQL snapshot tests pass for all 22 TPC-H queries (byte-for-byte match)
- CORR-2: OpTree serialize-deserialize round-trip passes proptest
- CORR-3: Rewrite pass ordering golden snapshot matches
- CORR-4: DAG cycle detection passes with 3 new edge-case tests
- STAB-1: Zero `pg_sys::` references in `pg_trickle_core/src/`
- STAB-2: `cargo build -p pg_trickle_core --no-default-features` passes in CI
- STAB-3: `cargo pgrx package` and `cargo pgrx test` succeed with workspace layout
- STAB-4: Extension upgrade path tested (0.22.0 -> 0.23.0)
- STAB-5: WASM target builds in CI
- PERF-1: Criterion shows < 1% regression on `diff_operators` benchmark
- PERF-2: Full benchmark suite passes with < 5% regression threshold
- TEST-1: TPC-H delta SQL snapshot tests pass
- TEST-2: > 80% of operator unit tests run without PostgreSQL
- TEST-3: MockBackend used by 10+ core crate tests
- TEST-4: CI `build-wasm` job passes on every PR
- TEST-5: `cargo deny check` and `cargo audit` pass for workspace
- UX-1: All existing `just` targets pass; `just test-core` added
- UX-3: ARCHITECTURE.md and AGENTS.md updated with two-crate layout
- `just check-version-sync` passes
v0.24.0 — PGlite WASM Extension
Release Theme: This release delivers the first working PGlite extension — the moment pg_trickle's incremental view maintenance runs in the browser. By wrapping `pg_trickle_core` (extracted in v0.23.0) in a thin C/FFI shim and compiling to WASM via PGlite's Emscripten toolchain, we ship an npm package (`@pgtrickle/pglite`) that gives PGlite users the full DVM operator vocabulary — outer joins, window functions, subqueries, recursive CTEs — in IMMEDIATE mode. This dramatically exceeds pg_ivm's PGlite offering (INNER joins + basic aggregates only). The release also establishes the cross-platform correctness and performance baselines that all future PGlite work builds on.
See PLAN_PGLITE.md §5 Strategy A and §7 Phase 2 for the full architecture.
PGlite WASM Build (Phase 2)
In plain terms: This phase takes the `pg_trickle_core` crate extracted in v0.23.0 and wraps it in a thin C shim that PGlite's Emscripten-based extension build system can compile to WASM. The result is a PGlite extension package (`@pgtrickle/pglite`) that provides `create_stream_table()`, `drop_stream_table()`, and `alter_stream_table()` — all running in IMMEDIATE mode inside the WASM PostgreSQL engine with the full DVM operator set.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PGL-2-1 | C shim for PGlite. Thin C wrapper bridging PGlite's Emscripten environment to pg_trickle_core via Rust FFI. Handles raw_parser calls through PGlite's built-in PostgreSQL parser. | 1–2wk | PLAN_PGLITE.md §5 Strategy A |
| PGL-2-2 | DatabaseBackend for PGlite. Implement the trait for PGlite's single-connection SPI and built-in parser. Remove advisory lock acquisition (trivial in single-connection). | 3–5d | PLAN_PGLITE.md §5 Strategy A |
| PGL-2-3 | WASM bundle build. Integrate with PGlite's extension toolchain (postgres-pglite). Produce .tar.gz WASM bundle. Target bundle size < 2 MB. | 3–5d | PLAN_PGLITE.md §8 |
| PGL-2-4 | TypeScript wrapper. @pgtrickle/pglite npm package with PGlite plugin API. createStreamTable(), dropStreamTable(), alterStreamTable() with full IMMEDIATE mode support. | 2–3d | PLAN_PGLITE.md §7 Phase 2 |
| PGL-2-5 | IMMEDIATE mode E2E tests on PGlite. Verify inner joins, outer joins, aggregates, DISTINCT, UNION ALL, window functions, subqueries, CTEs (non-recursive + recursive), LATERAL, view inlining, DISTINCT ON, GROUPING SETS. | 1–2wk | PLAN_PGLITE.md §4.1 |
| PGL-2-6 | PG 17 vs PG 18 parse tree compatibility. PGlite tracks PG 17; pg_trickle targets PG 18. Audit and gate any node struct differences with conditional compilation. | 3–5d | PLAN_PGLITE.md §8 |
Phase 2 subtotal: ~5–7 weeks
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | PG 17/18 parse tree node divergence audit | M | P0 |
| CORR-2 | Delta SQL cross-platform equivalence | M | P0 |
| CORR-3 | Advisory lock no-op safety proof | S | P1 |
| CORR-4 | IMMEDIATE trigger ordering in single-connection | S | P1 |
CORR-1 — PG 17/18 parse tree node divergence audit
In plain terms: PGlite embeds PostgreSQL 17's parser; pg_trickle's `OpTree` construction targets PostgreSQL 18 node structs. Any struct layout difference (added fields, renamed members, changed enum values) would cause the C shim to misinterpret parse trees, producing silently wrong delta SQL. Systematically diff the PG 17 and PG 18 parse tree headers (`nodes/parsenodes.h`, `nodes/primnodes.h`) and catalog every node type that pg_trickle traverses. Gate incompatible nodes behind `#[cfg(pg17)]` / `#[cfg(pg18)]` conditional compilation.
Verify: a CI job compiles pg_trickle_core against both PG 17 and PG 18
parse tree headers. A test generates OpTrees from the same SQL on both
versions and asserts structural equality.
Dependencies: PGL-2-6. Schema change: No.
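The gating could look roughly like the following — here `pg17` is assumed to be a cargo feature selecting the PGlite parser headers, and the field lists are purely illustrative (the real divergences come out of the audit):

```rust
/// Node fields traversed under the PG 17 layout (illustrative only).
#[cfg(feature = "pg17")]
pub fn tracked_join_fields() -> &'static [&'static str] {
    &["jointype", "larg", "rarg", "quals"]
}

/// Node fields traversed under the PG 18 layout (illustrative only).
/// If PG 18 added or renamed a member, only this branch changes; core
/// logic downstream consumes the same function signature either way.
#[cfg(not(feature = "pg17"))]
pub fn tracked_join_fields() -> &'static [&'static str] {
    &["jointype", "larg", "rarg", "quals", "usingClause"]
}
```

Keeping the version split behind one function per node type means the compatibility test in the Verify step can simply call the same accessor on both builds and diff the results.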
CORR-2 — Delta SQL cross-platform equivalence
In plain terms: The same SQL view definition must produce exactly the same delta SQL on native PostgreSQL 18 and on PGlite (WASM + PG 17 parser). Any divergence means one platform gets wrong incremental results. Create a snapshot test suite that runs all 22 TPC-H stream table definitions through both the native and WASM `DatabaseBackend` implementations and asserts byte-for-byte identical delta SQL output.
Verify: snapshot comparison test passes for all 22 TPC-H queries on both platforms. Any diff is a hard failure. Dependencies: PGL-2-2, CORR-1. Schema change: No.
CORR-3 — Advisory lock no-op safety proof
In plain terms: The native extension uses `pg_advisory_xact_lock()` to prevent concurrent refresh of the same stream table. PGlite is single-connection, so the lock acquisition is a no-op. Verify that removing the lock cannot cause re-entrancy (a trigger firing `create_stream_table()` from within a refresh) by auditing all SPI call paths from the PGlite `DatabaseBackend` for re-entrant calls.
Verify: code review + integration test that attempts re-entrant refresh from within a trigger. Must error cleanly, not corrupt state. Dependencies: PGL-2-2. Schema change: No.
CORR-4 — IMMEDIATE trigger ordering in single-connection
In plain terms: IMMEDIATE mode relies on AFTER triggers firing in a specific order when multiple source tables are modified in the same statement (e.g., a CTE with multiple INSERTs). Verify that PGlite's trigger execution order matches native PostgreSQL's for the trigger configurations pg_trickle creates.
Verify: integration test with multi-table CTE INSERT on PGlite; assert stream table state matches native. Dependencies: PGL-2-5. Schema change: No.
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | WASM heap OOM graceful degradation | M | P0 |
| STAB-2 | C shim panic/unwind boundary safety | S | P0 |
| STAB-3 | Extension load/unload lifecycle correctness | S | P0 |
| STAB-4 | Native extension upgrade path (0.23 → 0.24) | S | P0 |
| STAB-5 | npm package version synchronization | XS | P1 |
STAB-1 — WASM heap OOM graceful degradation
In plain terms: WASM environments have a finite heap (typically 256 MB in browsers, configurable in Node). A large stream table with many operators could exhaust WASM memory during OpTree construction or delta SQL generation. The extension must detect allocation failures and return a clear PostgreSQL error rather than crashing the WASM instance (which would kill all PGlite state). Implement a memory-aware allocator wrapper or check `emscripten_get_heap_size()` at entry points.
Verify: stress test creating stream tables over increasingly complex views until OOM; assert PGlite remains functional and returns an actionable error. Dependencies: PGL-2-1. Schema change: No.
STAB-2 — C shim panic/unwind boundary safety
In plain terms: Rust panics must not cross the FFI boundary into C. The C shim must catch panics via `std::panic::catch_unwind()` and convert them to PostgreSQL `ereport(ERROR)` calls. Any uncaught panic in WASM would abort the entire PGlite instance. Audit every `#[no_mangle] extern "C"` entry point in the shim for panic safety.
Verify: test that triggers a panic path (e.g., invalid SQL) from TypeScript; assert PGlite returns a SQL error, not a WASM trap. Dependencies: PGL-2-1. Schema change: No.
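The entry-point pattern can be sketched as follows — `pgt_demo_entry` and its status-code convention are invented for illustration; the real shim would convert the error arm into an `ereport(ERROR)` call rather than returning a code:

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical FFI entry point shape: 0 on success, nonzero on error.
/// Every exported function wraps its body in `catch_unwind` so a Rust
/// panic is contained instead of unwinding into C (which would abort
/// the whole WASM instance).
#[no_mangle]
pub extern "C" fn pgt_demo_entry(valid_input: bool) -> i32 {
    let outcome = catch_unwind(AssertUnwindSafe(|| {
        if !valid_input {
            // Stand-in for any internal panic path (bug, failed
            // assertion, allocation failure).
            panic!("invalid input reached the shim");
        }
    }));
    match outcome {
        Ok(()) => 0,
        // Panic contained: report an error; the WASM instance lives on.
        Err(_) => 1,
    }
}
```

Note that `catch_unwind` only helps when the panic strategy is `unwind`; a `panic = "abort"` profile would defeat it, so the WASM build profile needs checking as part of this item.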
STAB-3 — Extension load/unload lifecycle correctness
In plain terms: PGlite extensions can be loaded and unloaded. The C shim must free all Rust-allocated memory on unload and not leave dangling pointers or leaked state. Test the full lifecycle: load extension → create stream tables → drop stream tables → unload extension → reload extension → create new stream tables.
Verify: lifecycle test with memory profiling shows zero leaked allocations after unload/reload cycle. Dependencies: PGL-2-1, PGL-2-4. Schema change: No.
STAB-4 — Native extension upgrade path (0.23 → 0.24)
In plain terms: v0.24.0 adds PGlite support but makes no SQL-visible changes to the native extension. The upgrade migration from 0.23.0 to 0.24.0 must leave existing stream tables intact and refreshable.
Verify: upgrade E2E test confirms stream tables survive and refresh
correctly after 0.23.0 -> 0.24.0.
STAB-5 — npm package version synchronization
In plain terms: The `@pgtrickle/pglite` npm package version must match the extension version (0.24.0). Add a CI check that verifies the `package.json` version matches the `pg_trickle.control` version, similar to the existing `just check-version-sync` target.
Verify: just check-version-sync also validates npm package version.
Dependencies: PGL-2-4. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | WASM vs native refresh latency benchmark | M | P0 |
| PERF-2 | WASM bundle size optimization (< 2 MB target) | M | P0 |
| PERF-3 | PGlite cold-start extension load time | S | P1 |
PERF-1 — WASM vs native refresh latency benchmark
In plain terms: WASM is expected to be 1.5–3× slower than native (per PLAN_PGLITE.md §8). Quantify the actual overhead by benchmarking IMMEDIATE-mode refresh on both platforms using the same schema + data. The overhead must stay below the threshold where IMMEDIATE mode is still faster than full re-evaluation — otherwise PGlite users would be better off just re-running the query. Establish a Criterion-like benchmark suite for PGlite (potentially using Node.js + `@electric-sql/pglite`).
Verify: benchmark report showing WASM refresh latency for 5 representative stream tables (scan, join, aggregate, window, recursive CTE). Document native-to-WASM overhead ratio. Dependencies: PGL-2-5. Schema change: No.
PERF-2 — WASM bundle size optimization (< 2 MB target)
In plain terms: The WASM bundle must be under 2 MB for acceptable download times in browser environments (PostGIS is 8.2 MB, pgcrypto is 1.1 MB — pg_trickle should be closer to pgcrypto). Apply `wasm-opt -Oz`, LTO, `codegen-units = 1`, strip debug info, and feature-gate large operator modules (e.g., recursive CTE, window functions) behind optional features if needed to meet the target.
Verify: CI job measures WASM bundle size after wasm-opt and fails if > 2
MB. Document size breakdown by operator module.
Dependencies: PGL-2-3. Schema change: No.
PERF-3 — PGlite cold-start extension load time
In plain terms: The first `CREATE EXTENSION pg_trickle` in a PGlite session compiles and loads the WASM module. This must complete in < 500 ms in a browser and < 200 ms in Node.js. Measure and optimize by using streaming WASM compilation (`WebAssembly.compileStreaming()`) and ensuring the extension's `_PG_init()` function does minimal work.
Verify: benchmark measuring time from CREATE EXTENSION to first
create_stream_table() on fresh PGlite instance. Document cold-start time.
Dependencies: PGL-2-1, PGL-2-3. Schema change: No.
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Stream table count ceiling in WASM | S | P1 |
| SCAL-2 | Wide-table OpTree memory footprint | S | P1 |
| SCAL-3 | Dataset size practical limit for IMMEDIATE mode | S | P2 |
SCAL-1 — Stream table count ceiling in WASM
In plain terms: Each stream table consumes memory for its OpTree, delta SQL templates, and trigger metadata. In native PostgreSQL with gigabytes of RAM this is trivial, but in a 256 MB WASM heap it matters. Determine the practical limit by creating stream tables in a loop until OOM, then document the ceiling and add a guard that errors at 80% capacity with an actionable message.
Verify: stress test documents the ceiling (e.g., "~200 stream tables with average 3-table join in 256 MB heap"). Guard errors at 80%. Dependencies: STAB-1. Schema change: No.
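The 80% guard itself is simple arithmetic; a minimal sketch, assuming the heap size and a running usage estimate are obtainable at the call site (`check_stream_table_budget` is an invented name):

```rust
/// Hypothetical guard: refuse to create another stream table once the
/// estimated usage crosses 80% of the WASM heap, with an actionable
/// message instead of a later, fatal OOM trap.
pub fn check_stream_table_budget(
    used_bytes: usize,
    heap_bytes: usize,
) -> Result<(), String> {
    // 80% threshold, computed without floating point.
    let threshold = heap_bytes / 5 * 4;
    if used_bytes >= threshold {
        Err(format!(
            "stream table memory budget exhausted ({used_bytes} of \
             {heap_bytes} bytes); drop unused stream tables or run \
             PGlite with a larger heap"
        ))
    } else {
        Ok(())
    }
}
```

The hard part of this item is the `used_bytes` estimate (OpTree + delta SQL templates + trigger metadata per table), not the comparison; the stress test is what calibrates that estimate against real allocations.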
SCAL-2 — Wide-table OpTree memory footprint
In plain terms: A stream table over a 100-column source table produces a large OpTree and long delta SQL strings. Profile the memory consumption of OpTree construction for wide tables and ensure it fits within the WASM heap budget alongside typical stream table counts.
Verify: profile OpTree allocation for 10, 50, 100-column source tables. Document memory per stream table as a function of column count. Dependencies: PGL-2-5. Schema change: No.
SCAL-3 — Dataset size practical limit for IMMEDIATE mode
In plain terms: IMMEDIATE mode fires triggers on every DML, so overhead scales with write frequency. In a WASM environment with ~2× slower execution, determine at what dataset size (rows × columns × writes/second) IMMEDIATE mode becomes impractical. Document the breakpoint so PGlite users know when their use case has outgrown the browser and should migrate to native pg_trickle with DIFFERENTIAL mode.
Verify: benchmark with increasing write rates; document the throughput ceiling (e.g., "> 10K rows/sec INSERT rate degrades stream table latency past 100 ms"). Dependencies: PERF-1. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | TypeScript API ergonomics and type safety | S | P0 |
| UX-2 | PGlite getting-started guide | M | P0 |
| UX-3 | WASM-context error message quality | S | P1 |
| UX-4 | npm package README with runnable examples | S | P1 |
UX-1 — TypeScript API ergonomics and type safety
In plain terms: The `@pgtrickle/pglite` TypeScript API must follow PGlite plugin conventions (`PGlitePlugin` interface, `init()` lifecycle). All methods must be fully typed — no `any` types. The API surface must be minimal: `createStreamTable(sql)`, `dropStreamTable(name)`, `alterStreamTable(name, sql)`, `listStreamTables()`, and `refreshStreamTable(name)`. Review against existing PGlite plugins (`@electric-sql/pglite-repl`, `pglite-vector`) for consistency.
Verify: TypeScript strict mode compilation with no errors. API review against PGlite plugin conventions checklist. Dependencies: PGL-2-4. Schema change: No.
UX-2 — PGlite getting-started guide
In plain terms: A `docs/tutorials/PGLITE_QUICKSTART.md` guide walking a user from `npm install` to a working React app with live stream tables in under 10 minutes. Include: install, create a PGlite instance with the extension, define a source table + stream table, insert data, observe the stream table update. Provide a CodeSandbox / StackBlitz link for a zero-install try-it-now experience.
Verify: a new developer can follow the guide and see a working stream table in PGlite in a browser within 10 minutes. Dependencies: PGL-2-4, UX-1. Schema change: No.
UX-3 — WASM-context error message quality
In plain terms: Error messages from the Rust/C shim must be JavaScript-friendly: no raw pg_sys error codes, no memory addresses. Every error must include the stream table name, the failing SQL fragment, and a remediation hint. Unsupported features (DIFFERENTIAL mode, scheduled refresh, parallel workers) must error with "Not supported in PGlite: <feature>. Use IMMEDIATE mode." rather than cryptic internal errors.
Verify: audit all error paths in the C shim + PGlite DatabaseBackend.
Every error message includes table name + remediation hint.
Dependencies: PGL-2-1, PGL-2-2. Schema change: No.
UX-4 — npm package README with runnable examples
In plain terms: The npm package must have a README with: badge for PGlite compatibility, install command, 3 runnable examples (basic aggregate, join, window function), API reference, link to the full PGlite quickstart guide, and a "Limitations vs native pg_trickle" section clearly stating: no DIFFERENTIAL mode, no scheduled refresh, no parallel workers, PG 17 parser only.
Verify: README renders correctly on npmjs.com; examples are copy-pasteable into a Node.js REPL. Dependencies: PGL-2-4, UX-2. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | Full DVM operator E2E suite on PGlite | L | P0 |
| TEST-2 | PG 17/18 parse tree compatibility tests | M | P0 |
| TEST-3 | WASM memory stress tests | M | P1 |
| TEST-4 | TypeScript integration tests | M | P0 |
| TEST-5 | Bundle size regression gate in CI | S | P0 |
TEST-1 — Full DVM operator E2E suite on PGlite
In plain terms: Run every DVM operator (23 operators across inner join, outer join, full join, semi-join, anti-join, aggregate, distinct, union/intersect/except, subquery, scalar subquery, CTE scan, recursive CTE, lateral function, lateral subquery, window function, scan, filter, project) through IMMEDIATE mode in PGlite. This is the primary correctness gate for the WASM extension. Use a Node.js test harness with `@electric-sql/pglite` to run the tests headlessly.
Verify: test suite with ≥ 1 test per operator (23+ tests) passes in CI using PGlite Node.js. Test matrix: INSERT, UPDATE, DELETE for each operator. Dependencies: PGL-2-5. Schema change: No.
TEST-2 — PG 17/18 parse tree compatibility tests
In plain terms: For every parse tree node type that pg_trickle traverses, generate a test query that exercises that node, parse it on both PG 17 (PGlite) and PG 18 (native), and assert that the resulting `OpTree` is structurally identical. This catches version-specific divergences before they reach users.
Verify: compatibility test suite covers all node types referenced in
pg_trickle_core. Any divergence is a hard failure with clear diagnostic.
Dependencies: CORR-1. Schema change: No.
TEST-3 — WASM memory stress tests
In plain terms: Create increasing numbers of stream tables with increasing complexity until OOM. Verify that: (a) the guard from SCAL-1 fires at 80% capacity, (b) PGlite remains functional after the guard fires, (c) dropping stream tables actually frees memory. Run under different heap sizes (64 MB, 128 MB, 256 MB) to validate the guard thresholds.
Verify: stress test with 3 heap sizes completes without WASM trap. Guard fires at documented threshold. Memory reclaimed after DROP. Dependencies: STAB-1, SCAL-1. Schema change: No.
TEST-4 — TypeScript integration tests
In plain terms: Test the `@pgtrickle/pglite` TypeScript API end-to-end using Jest or Vitest in Node.js. Cover: create/drop/alter stream table, error handling (invalid SQL, unsupported features), plugin lifecycle (init/cleanup), and concurrent operations on different stream tables. Run as part of CI on every PR that touches `pg_trickle_pglite/`.
Verify: ≥ 20 TypeScript integration tests pass in CI. Test coverage report for the TypeScript wrapper shows > 90% line coverage. Dependencies: PGL-2-4, UX-1. Schema change: No.
TEST-5 — Bundle size regression gate in CI
In plain terms: Add a CI job that builds the WASM bundle, runs `wasm-opt`, measures the final `.wasm` file size, and fails if it exceeds 2 MB. Store the current size as a baseline and alert on any increase over 10%. This prevents bundle bloat as features are added.
Verify: CI job check-wasm-size runs on every PR touching
pg_trickle_core/ or pg_trickle_pglite/. Fails at > 2 MB.
Dependencies: PGL-2-3, PERF-2. Schema change: No.
Conflicts & Risks
- CORR-1 (PG 17/18 parse tree compatibility) is the highest risk. PGlite embeds PG 17; pg_trickle targets PG 18. If node struct layouts diverged significantly between versions (e.g., `JoinExpr` gained a field, `RangeTblEntry` changed a flag), the C shim must handle both layouts via conditional compilation. In the worst case, some operators may need version-specific code paths. Start this audit early — it blocks PGL-2-1 and PGL-2-2.
- PERF-2 (bundle size < 2 MB) may conflict with full operator coverage. If the 23-operator delta SQL generator compiles to more than 2 MB, we may need to feature-gate rarely used operators (recursive CTE, GROUPING SETS) behind cargo features. This would weaken the "full DVM vocabulary" claim and require documenting which operators are available by default. Measure early with a minimal build to establish a baseline.
- PGlite's Emscripten toolchain is a moving target. PGlite's extension build system (`postgres-pglite`) is not yet stable. Breaking changes in the toolchain could block PGL-2-3. Pin the PGlite version and track upstream releases. Have a fallback plan: manual Emscripten compilation without the PGlite toolchain.
- STAB-2 (panic boundary) and STAB-1 (OOM handling) interact. A Rust OOM in WASM triggers a panic, which must not cross the FFI boundary. Both items must be implemented together: the OOM guard (STAB-1) sets a pre-panic threshold, and the `catch_unwind` wrapper (STAB-2) is the last-resort safety net.
- No prior C FFI in the codebase. The only C code is `scripts/pg_stub.c` (a test helper). The C shim (PGL-2-1) introduces a new language and toolchain requirement. Keep the C code minimal (< 500 lines), well documented, and covered by the TypeScript integration tests.
- TEST-1 and TEST-4 require a PGlite-based CI runner. This needs Node.js 18+ with `@electric-sql/pglite` in CI — a new CI dependency. Add it to the existing CI matrix as a separate job that only runs when `pg_trickle_pglite/` or `pg_trickle_core/` files are modified.
v0.24.0 total: ~5–7 weeks (WASM build) + ~2–3 weeks (testing + polish)
Exit criteria:
- PGL-2-1: C shim compiles and links against PGlite's WASM PostgreSQL headers
- PGL-2-2: PGlite `DatabaseBackend` passes all IMMEDIATE-mode operator tests
- PGL-2-3: WASM bundle size < 2 MB after `wasm-opt`
- PGL-2-4: `@pgtrickle/pglite` npm package published to npmjs.com
- PGL-2-5: All 23 DVM operators pass E2E tests on PGlite
- PGL-2-6: PG 17 parse tree differences documented and handled with `#[cfg]`
- CORR-1: PG 17/18 parse tree audit complete; compatibility tests pass
- CORR-2: Delta SQL cross-platform snapshot tests pass for all 22 TPC-H queries
- CORR-3: Re-entrant refresh test passes on PGlite
- CORR-4: Multi-table CTE trigger ordering matches native
- STAB-1: OOM stress test: PGlite survives with actionable error
- STAB-2: Panic from invalid SQL returns SQL error, not WASM trap
- STAB-3: Load/unload/reload lifecycle test: zero leaked allocations
- STAB-4: Extension upgrade path tested (`0.23.0 -> 0.24.0`)
- PERF-1: WASM vs native benchmark report published (≤ 3× overhead)
- PERF-2: WASM bundle ≤ 2 MB (CI gated)
- PERF-3: Cold-start load time < 500 ms browser, < 200 ms Node.js
- TEST-1: ≥ 23 operator E2E tests pass on PGlite in CI
- TEST-2: Parse tree compatibility tests cover all traversed node types
- TEST-3: Memory stress tests pass under 64/128/256 MB heap sizes
- TEST-4: ≥ 20 TypeScript integration tests with > 90% line coverage
- TEST-5: CI `check-wasm-size` job passes on every PR
- UX-1: TypeScript strict mode compilation: zero errors
- UX-2: PGlite getting-started guide published with CodeSandbox link
- UX-4: npm README renders correctly on npmjs.com
- `just check-version-sync` passes (incl. npm package version)
v0.25.0 — PGlite Reactive Integration
Release Theme: This release completes the PGlite story by bridging the gap between database-side incremental view maintenance and front-end UI reactivity. By connecting stream table deltas to PGlite's `live.changes()` API and providing framework-specific hooks (`useStreamTable()` for React and Vue), pg_trickle becomes the first IVM engine to offer truly reactive UI bindings — where DOM updates are proportional to changed rows, not result set size. This is the local-first developer's final mile: from `INSERT` to re-render in single-digit milliseconds, with no polling, no diffing, and no full query re-execution.
See PLAN_PGLITE.md §7 Phase 3 for the full reactive integration design.
Reactive Bindings (Phase 3)
In plain terms: Phase 2 gave PGlite users in-engine IVM. This phase connects stream table changes to PGlite's `live.changes()` API and provides framework-specific hooks — `useStreamTable()` for React and a `useStreamTable()` composable for Vue — so UI components automatically re-render when the underlying data changes. For local-first apps like collaborative editors, dashboards, and offline-capable tools, this is the last mile between incremental SQL and reactive UI.
| Item | Description | Effort | Ref |
|---|---|---|---|
| PGL-3-1 | live.changes() bridge. Emit INSERT/UPDATE/DELETE change events from stream table delta application to PGlite's live query system. Keyed by __pgt_row_id. | 3–5d | PLAN_PGLITE.md §7 Phase 3 |
| PGL-3-2 | React hooks. useStreamTable(query) hook that subscribes to stream table changes and returns reactive state. Handles mount/unmount lifecycle. | 3–5d | — |
| PGL-3-3 | Vue composable. useStreamTable(query) composable with equivalent functionality. | 2–3d | — |
| PGL-3-4 | Documentation and examples. Local-first app patterns: collaborative todo list, real-time dashboard, offline-first inventory tracker. Published as @pgtrickle/pglite docs. | 2–3d | — |
| PGL-3-5 | Performance benchmarks. End-to-end latency from INSERT to React re-render. Compare against live.incrementalQuery() for complex queries (3-table join + aggregate). | 1–2d | — |
Phase 3 subtotal: ~2–3 weeks
Correctness
| ID | Title | Effort | Priority |
|---|---|---|---|
| CORR-1 | Change event fidelity vs stream table state | M | P0 |
| CORR-2 | Multi-row DML atomicity in reactive stream | S | P0 |
| CORR-3 | Hook state consistency after rapid mutations | M | P1 |
| CORR-4 | DELETE/re-INSERT identity stability | S | P1 |
CORR-1 — Change event fidelity vs stream table state
In plain terms: The `live.changes()` bridge emits INSERT/UPDATE/DELETE events derived from the IMMEDIATE-mode delta application. If an event is missed, duplicated, or misclassified (e.g., an UPDATE emitted as DELETE + INSERT), the React/Vue state will diverge from the actual stream table contents. For every DML operation on every DVM operator type, assert that the sequence of change events, when applied to an empty accumulator, produces a set identical to `SELECT * FROM stream_table`.
Verify: integration test replaying 1,000 random DML operations across all
operator types; final accumulator state matches SELECT *. Any divergence
is a hard failure.
Dependencies: PGL-3-1. Schema change: No.
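The replay check above can be stated in a few lines of TypeScript. This is a sketch under stated assumptions — the `ChangeEvent` shape and the `__pgt_row_id` keying are inferred from this section, not the actual bridge API:

```typescript
// Sketch of the CORR-1 invariant: applying the emitted change events to an
// empty accumulator must reproduce the stream table's contents exactly.
type Row = Record<string, unknown> & { __pgt_row_id: string };

type ChangeEvent =
  | { kind: "insert"; row: Row }
  | { kind: "update"; row: Row }
  | { kind: "delete"; rowId: string };

function replay(events: ChangeEvent[]): Map<string, Row> {
  const acc = new Map<string, Row>();
  for (const ev of events) {
    switch (ev.kind) {
      case "insert":
      case "update":
        acc.set(ev.row.__pgt_row_id, ev.row); // upsert keyed by row id
        break;
      case "delete":
        acc.delete(ev.rowId);
        break;
    }
  }
  return acc;
}
```

A misclassified event (say, an UPDATE emitted as DELETE + INSERT under a new row id) leaves a different key set than `SELECT *` — exactly the divergence the 1,000-operation replay test is meant to catch.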
CORR-2 — Multi-row DML atomicity in reactive stream
In plain terms: A single `INSERT INTO source SELECT ... FROM generate_series(1, 100)` inserts 100 rows and triggers IMMEDIATE-mode delta application. The `live.changes()` bridge must emit all 100 change events as a single batch — not trickle them one by one — so that React performs a single re-render, not 100. If events leak across batch boundaries, the UI shows intermediate states that never existed in the database.
Verify: test with 100-row INSERT; assert useStreamTable() callback fires
exactly once with all 100 rows. Intermediate renders counted via React
profiler must be ≤ 1.
Dependencies: PGL-3-1, PGL-3-2. Schema change: No.
CORR-3 — Hook state consistency after rapid mutations
In plain terms: If a user performs INSERT → DELETE → INSERT on the same row within 10 ms (e.g., optimistic UI with undo), the hook must resolve to the correct final state. Race conditions between the `live.changes()` event stream and React's asynchronous render cycle could show stale data. The hook must use a monotonic sequence number (from the bridge's event stream) to discard stale updates.
Verify: stress test with 50 rapid mutations on the same row at 1 ms
intervals; final hook state matches SELECT *. Test on both React 18
(concurrent mode) and React 19.
Dependencies: PGL-3-1, PGL-3-2. Schema change: No.
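The stale-update guard described above could look like the following. This is a minimal sketch; the event shape and the `makeLatestWins` name are our own, not part of the planned API:

```typescript
// Sketch of CORR-3's monotonic-sequence guard: each bridge event carries a
// sequence number, and the hook discards anything at or below the newest
// sequence it has already applied.
type SeqUpdate<T> = { seq: number; state: T };

function makeLatestWins<T>(initial: T) {
  let current = initial;
  let lastSeq = -1;
  return {
    apply(u: SeqUpdate<T>): boolean {
      if (u.seq <= lastSeq) return false; // stale or duplicate: discard
      lastSeq = u.seq;
      current = u.state;
      return true;
    },
    get(): T {
      return current;
    },
  };
}
```

If React's render cycle delivers an older snapshot after a newer one, the guard rejects it, so the hook's state can only move forward.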
CORR-4 — DELETE/re-INSERT identity stability
In plain terms: When a row is deleted and a new row with the same PK is inserted, the `__pgt_row_id` changes but the PK doesn't. The change bridge must emit a DELETE for the old `__pgt_row_id` and an INSERT for the new one — not an UPDATE — so that React's reconciler correctly unmounts and remounts the component (not just re-renders it). Wrong identity semantics cause stale closures and event handler leaks.
Verify: test DELETE + INSERT with same PK; verify React component lifecycle (unmount + mount, not just update). Use React DevTools profiler. Dependencies: PGL-3-1, PGL-3-2. Schema change: No.
Stability
| ID | Title | Effort | Priority |
|---|---|---|---|
| STAB-1 | Memory leak prevention in long-lived hooks | M | P0 |
| STAB-2 | Subscription cleanup on component unmount | S | P0 |
| STAB-3 | Error boundary integration for hook failures | S | P0 |
| STAB-4 | Native extension upgrade path (0.24 → 0.25) | S | P0 |
| STAB-5 | Framework version compatibility matrix | S | P1 |
STAB-1 — Memory leak prevention in long-lived hooks
In plain terms: A `useStreamTable()` hook in a long-lived component (e.g., a dashboard that runs for hours) accumulates change events via the `live.changes()` subscription. If the bridge or hook retains references to processed events, memory grows without bound. Implement a bounded event buffer (configurable, default 1,000 events) that discards processed events after they are applied to the hook's state snapshot. Once the buffer fills, the oldest entries are evicted and garbage-collected.
Verify: 4-hour soak test with continuous 1 row/sec mutations. Heap snapshot at 1h and 4h shows < 10% growth. No detached DOM nodes or leaked closures. Dependencies: PGL-3-1, PGL-3-2. Schema change: No.
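A bounded buffer of the kind STAB-1 calls for is a few lines. Sketch only — the class name is ours; the default capacity of 1,000 comes from this section:

```typescript
// Sketch of STAB-1's bounded event buffer: a fixed-capacity queue that
// evicts the oldest entry once full, so processed events cannot accumulate
// for the lifetime of a long-lived hook.
class BoundedBuffer<T> {
  private items: T[] = [];
  constructor(private capacity = 1000) {}

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift(); // evict oldest
  }

  size(): number {
    return this.items.length;
  }
}
```

Because eviction happens on every push past capacity, heap usage is O(capacity) regardless of how many mutations the dashboard sees over hours.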
STAB-2 — Subscription cleanup on component unmount
In plain terms: When a React component using `useStreamTable()` is unmounted (e.g., on route change), the `live.changes()` subscription must be cancelled immediately. Failing to clean up causes: (a) memory leaks from the change listener, (b) "setState on unmounted component" warnings, (c) stale event processing after the component is gone. Use a `useEffect()` cleanup function with an AbortController pattern.
Verify: mount/unmount cycle test (100 cycles); zero console warnings, zero leaked subscriptions (verified via PGlite connection subscription count). Dependencies: PGL-3-2. Schema change: No.
STAB-3 — Error boundary integration for hook failures
In plain terms: If the `live.changes()` bridge throws (e.g., the stream table was dropped while the hook is active), the hook must propagate the error to React's error boundary / Vue's `onErrorCaptured` — not swallow it silently or crash the app. Provide an `onError` callback option and a default that throws to the nearest error boundary.
Verify: test dropping a stream table while useStreamTable() is active;
assert error boundary catches the error with an actionable message.
Dependencies: PGL-3-2, PGL-3-3. Schema change: No.
STAB-4 — Native extension upgrade path (0.24 → 0.25)
In plain terms: v0.25.0 adds reactive bindings at the TypeScript/npm layer only. The native PostgreSQL extension and the PGlite WASM extension must continue to work unchanged. The upgrade migration from 0.24.0 to 0.25.0 must leave existing stream tables and the `@pgtrickle/pglite` WASM extension intact.
Verify: upgrade E2E test confirms stream tables survive and refresh
correctly after 0.24.0 -> 0.25.0. TypeScript API backward compatibility
verified.
Dependencies: None. Schema change: No.
STAB-5 — Framework version compatibility matrix
In plain terms: Test `useStreamTable()` against React 18.x, React 19.x, and Vue 3.4+. Document which framework versions are supported. Future consideration: Svelte 5 (runes), SolidJS, Angular signals — document these as community-contributed integration points, not first-party.
Verify: CI matrix testing React 18, React 19, Vue 3.4. Published compatibility table in npm README. Dependencies: PGL-3-2, PGL-3-3. Schema change: No.
Performance
| ID | Title | Effort | Priority |
|---|---|---|---|
| PERF-1 | INSERT-to-render latency benchmark | M | P0 |
| PERF-2 | Batch rendering efficiency (single re-render) | S | P0 |
| PERF-3 | Bridge overhead vs raw live.changes() | S | P1 |
PERF-1 — INSERT-to-render latency benchmark
In plain terms: Measure the end-to-end latency from `INSERT INTO source_table` to the React component's DOM update. The target is < 50% of `live.incrementalQuery()` latency for a 3-table join + aggregate at 10K rows (per PLAN_PGLITE.md). This is the headline metric: if pg_trickle's reactive path is not significantly faster than PGlite's built-in incremental query, the value proposition collapses.
Verify: benchmark suite with 5 complexity levels (scan, filter, join,
aggregate, window). Publish results as a comparison table against
live.incrementalQuery(). Target: < 50% latency at 10K rows.
Dependencies: PGL-3-1, PGL-3-2, PGL-3-5. Schema change: No.
PERF-2 — Batch rendering efficiency (single re-render)
In plain terms: A bulk INSERT (100 rows) must produce exactly one React re-render, not 100. The change bridge must batch events emitted within the same transaction into a single `live.changes()` notification. Use `queueMicrotask()` or `requestAnimationFrame()` batching in the TypeScript wrapper to coalesce rapid-fire events.
Verify: React profiler shows ≤ 1 render per bulk DML. Test with 1, 10, 100, 1000-row INSERTs; render count is always 1. Dependencies: PGL-3-1, PGL-3-2, CORR-2. Schema change: No.
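Microtask coalescing of the kind PERF-2 describes can be sketched as follows. The `makeBatcher` name and flush callback are our own illustration, not the planned wrapper API:

```typescript
// Sketch of PERF-2's coalescing: events emitted synchronously within the
// same microtask window are flushed as one batch, so a 100-row INSERT
// yields one notification — and hence one re-render.
function makeBatcher<T>(flush: (batch: T[]) => void) {
  let pending: T[] = [];
  let scheduled = false;
  return (event: T) => {
    pending.push(event);
    if (!scheduled) {
      scheduled = true;
      queueMicrotask(() => {
        const batch = pending;
        pending = [];
        scheduled = false;
        flush(batch); // one flush per synchronous burst, not per event
      });
    }
  };
}
```

Note the risk flagged later in this section: if a transaction's events straddle a microtask boundary, this scheme splits them into two batches — which is why explicit transaction-boundary markers in the event protocol are worth considering.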
PERF-3 — Bridge overhead vs raw live.changes()
In plain terms: The change bridge adds a translation layer between the IMMEDIATE-mode delta application and PGlite's `live.changes()` API. Measure the overhead of this translation (serialization, event construction, key mapping) and ensure it is < 5% of total refresh latency. If overhead is higher, optimize the bridge's change event construction (e.g., avoid JSON round-trips; use structured clones).
Verify: micro-benchmark isolating bridge overhead from WASM refresh time. Document overhead as percentage of total INSERT-to-event latency. Dependencies: PGL-3-1. Schema change: No.
Scalability
| ID | Title | Effort | Priority |
|---|---|---|---|
| SCAL-1 | Multiple concurrent subscriptions | S | P1 |
| SCAL-2 | Large result set rendering (10K+ rows) | M | P1 |
| SCAL-3 | Multi-tab / SharedWorker isolation | S | P2 |
SCAL-1 — Multiple concurrent subscriptions
In plain terms: A dashboard page may render 5–10 `useStreamTable()` hooks simultaneously, each watching a different stream table. The bridge must not create per-hook subscriptions to `live.changes()` — instead, use a single multiplexed subscription that fans out to registered hooks. Measure performance with 1, 5, 10, and 20 concurrent hooks.
Verify: benchmark with 20 concurrent useStreamTable() hooks; latency
degradation < 20% vs single hook. Memory growth linear (not quadratic).
Dependencies: PGL-3-1, PGL-3-2. Schema change: No.
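The fan-out structure SCAL-1 calls for is essentially a multiplexer: one upstream listener, many downstream handlers. A sketch (class and method names are ours):

```typescript
// Sketch of SCAL-1's single multiplexed subscription: one upstream
// live.changes() listener dispatches to however many hooks are mounted,
// instead of one subscription per hook.
type Handler<T> = (event: T) => void;

class Multiplexer<T> {
  private handlers = new Set<Handler<T>>();

  /** Register a hook; returns an unsubscribe function for its cleanup. */
  register(h: Handler<T>): () => void {
    this.handlers.add(h);
    return () => {
      this.handlers.delete(h);
    };
  }

  /** Called once per event by the single upstream subscription. */
  dispatch(event: T): void {
    for (const h of this.handlers) h(event);
  }

  count(): number {
    return this.handlers.size;
  }
}
```

With this shape, adding a 20th hook costs one `Set` entry rather than a 20th database-side subscription, which is what keeps memory growth linear rather than quadratic.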
SCAL-2 — Large result set rendering (10K+ rows)
In plain terms: A stream table with 10K+ rows produces a large initial snapshot when `useStreamTable()` mounts. The hook must support virtualized rendering (integrating with libraries like `react-virtual` or `tanstack-virtual`) by providing a stable row identity key (`__pgt_row_id`) and fine-grained change signals (which rows changed, not just "something changed"). Without this, mounting a 10K-row stream table would freeze the UI for seconds.
Verify: demo app with 10K-row stream table using @tanstack/react-virtual.
Mount time < 200 ms. Single-row INSERT re-renders only the affected row,
not the full list.
Dependencies: PGL-3-2, PGL-3-4. Schema change: No.
SCAL-3 — Multi-tab / SharedWorker isolation
In plain terms: In multi-tab apps using PGlite with SharedWorker, each tab gets its own `useStreamTable()` hooks but shares a single PGlite instance. The bridge must correctly fan out change events to all tabs without cross-tab interference or duplicate processing. Document the SharedWorker architecture and test with 3 concurrent tabs.
Verify: 3-tab test with shared PGlite instance via SharedWorker. INSERT in tab 1 causes re-render in all 3 tabs. No duplicate events. No memory leaks across tabs. Dependencies: PGL-3-1. Schema change: No.
Ease of Use
| ID | Title | Effort | Priority |
|---|---|---|---|
| UX-1 | Local-first app example: collaborative todo | M | P0 |
| UX-2 | Real-time dashboard example | M | P0 |
| UX-3 | API reference with interactive playground | S | P1 |
| UX-4 | Migration guide from live.incrementalQuery() | S | P1 |
UX-1 — Local-first app example: collaborative todo
In plain terms: A complete, runnable React app demonstrating pg_trickle + PGlite for a collaborative todo list: multiple "users" (simulated in separate components) INSERT/UPDATE/DELETE todos, and each user's view updates reactively via `useStreamTable()`. Published in the monorepo under `examples/pglite-todo/` with a CodeSandbox link. This is the primary "show, don't tell" marketing asset.
Verify: example app runs in CodeSandbox with zero local setup. README explains every code section. A non-pg_trickle developer can understand it in 5 minutes. Dependencies: PGL-3-2, PGL-3-4. Schema change: No.
UX-2 — Real-time dashboard example
In plain terms: A React dashboard with 3 stream tables: (a) live order count (aggregate), (b) revenue by region (join + aggregate), (c) top products (window function + LIMIT). Data is inserted via a simulated event stream. Each panel updates reactively. Demonstrates the breadth of SQL operators supported in PGlite, beyond what `live.incrementalQuery()` can efficiently handle.
Verify: example app with 3 panels. INSERT 100 orders; all 3 panels update with a single render each. Published to CodeSandbox. Dependencies: PGL-3-2, PGL-3-4. Schema change: No.
UX-3 — API reference with interactive playground
In plain terms: An interactive documentation page (MDX or Storybook) where users can type SQL, create a stream table, insert data, and see the `useStreamTable()` hook update live — all in the browser via PGlite. This replaces the need for a local install for initial exploration.
Verify: playground page loads in < 3 seconds. Users can create a stream table and see reactive updates within 30 seconds of page load. Dependencies: PGL-3-2, UX-1. Schema change: No.
UX-4 — Migration guide from live.incrementalQuery()
In plain terms: Users already on PGlite's `live.incrementalQuery()` need a clear guide showing: (a) when to switch to pg_trickle (complex queries, high-throughput writes, large result sets), (b) how to migrate step by step (replace `live.incrementalQuery(q)` with `createStreamTable(q)` + `useStreamTable(name)`), and (c) what to expect (latency improvement, memory trade-off, SQL surface differences).
Verify: migration guide published in docs. Includes a before/after code diff and a decision flowchart. Dependencies: PGL-3-4, PERF-1. Schema change: No.
Test Coverage
| ID | Title | Effort | Priority |
|---|---|---|---|
| TEST-1 | Change event fidelity suite (all operators) | L | P0 |
| TEST-2 | React hook lifecycle tests | M | P0 |
| TEST-3 | Vue composable lifecycle tests | M | P0 |
| TEST-4 | Cross-framework render count assertions | S | P0 |
| TEST-5 | Long-running soak test for memory leaks | M | P1 |
TEST-1 — Change event fidelity suite (all operators)
In plain terms: For each of the 23 DVM operators, test that the `live.changes()` bridge emits the correct change events for INSERT, UPDATE, and DELETE on the source table. Replay the events into an accumulator and assert that it matches `SELECT * FROM stream_table`. This extends v0.24.0 TEST-1 (operator E2E) by adding the reactive layer.
Verify: ≥ 69 tests (23 operators × 3 DML types). Accumulator matches
SELECT * for every test case.
Dependencies: PGL-3-1, v0.24.0 TEST-1. Schema change: No.
TEST-2 — React hook lifecycle tests
In plain terms: Test the full lifecycle of `useStreamTable()`: (a) initial mount returns current stream table state, (b) INSERT on the source triggers a re-render with new data, (c) unmount cancels the subscription, (d) remount re-subscribes and returns current state, (e) rapid mount/unmount (100 cycles) has no leaks. Use React Testing Library with `renderHook()`.
Verify: ≥ 15 tests covering mount, update, unmount, remount, error, and stress scenarios. Zero console warnings in test output. Dependencies: PGL-3-2. Schema change: No.
TEST-3 — Vue composable lifecycle tests
In plain terms: Equivalent of TEST-2 for Vue: mount, update, unmount, remount, error handling. Use Vue Test Utils with `mount()` and `wrapper.unmount()`. Test with both Options API and Composition API usage patterns.
Verify: ≥ 10 tests covering Vue lifecycle. Zero console warnings. Dependencies: PGL-3-3. Schema change: No.
TEST-4 — Cross-framework render count assertions
In plain terms: For each framework (React, Vue), verify that a bulk INSERT (100 rows) triggers exactly 1 render, not 100. This is the batching correctness test. Use framework-specific profiling APIs (React Profiler, Vue DevTools perf hooks) to count renders.
Verify: render count = 1 for 100-row bulk INSERT in both React and Vue. CI assertion. Dependencies: PGL-3-2, PGL-3-3, PERF-2. Schema change: No.
TEST-5 — Long-running soak test for memory leaks
In plain terms: Run a React app with `useStreamTable()` for 4 hours with 1 mutation/second. Take heap snapshots at 0h, 1h, 2h, and 4h. Assert heap growth < 10%. Check for detached DOM nodes, leaked event listeners, and orphaned closures. This validates STAB-1 under real conditions.
Verify: soak test runs in CI (with a 30-min abbreviated version for PR CI). Full 4-hour version runs in nightly CI. Heap growth < 10%. Dependencies: STAB-1, PGL-3-2. Schema change: No.
Conflicts & Risks
- `live.changes()` API stability. PGlite's `live.changes()` is relatively new and its event format may change between PGlite releases. Pin the PGlite version and add an adapter layer so the bridge can accommodate event format changes without rewriting the React/Vue hooks. If PGlite deprecates `live.changes()` before v0.25.0 ships, fall back to `LISTEN`/`NOTIFY` with a custom channel.
- CORR-2 (batch atomicity) and PERF-2 (single re-render) are coupled. The batching mechanism must ensure correctness (all-or-nothing event delivery) AND performance (a single render). Using `queueMicrotask()` for batching risks splitting a transaction's events across two microtasks if the event stream straddles a microtask boundary. Consider explicit transaction-boundary markers in the bridge's event protocol.
- React concurrent mode complicates CORR-3 (rapid mutations). React 18/19 concurrent features (`startTransition`, `useDeferredValue`) may delay or re-order state updates from `useStreamTable()`. The hook must use `useSyncExternalStore()` (React 18+) to ensure tearing-free reads. This is non-negotiable for correctness.
- SCAL-2 (large result set rendering) requires external library integration. The `useStreamTable()` hook should not bundle a virtualization library — instead, expose stable row keys and fine-grained change signals that integrate with `@tanstack/react-virtual` or similar. Document the pattern but do not create a hard dependency.
- SCAL-3 (SharedWorker) is exploratory. PGlite's SharedWorker support has known limitations (no concurrent transactions). Mark SCAL-3 as P2 and scope it to documentation plus a proof of concept, not production-grade support.
- No native extension changes in v0.25.0. This release lives entirely in the TypeScript/npm layer. Any temptation to add native features (e.g., a `LISTEN`/`NOTIFY` bridge or WebSocket push) should be deferred to post-1.0. Keep the scope tight: reactive bindings + examples + docs.
v0.25.0 total: ~2–3 weeks (bridge + hooks) + ~1–2 weeks (examples + testing + polish)
Exit criteria:
- PGL-3-1: Stream table changes appear in the `live.changes()` event stream
- PGL-3-2: React `useStreamTable()` hook re-renders on stream table changes
- PGL-3-3: Vue `useStreamTable()` composable re-renders on stream table changes
- PGL-3-4: At least 2 example apps published with documentation and CodeSandbox links
- PGL-3-5: End-to-end latency benchmarked and published
- CORR-1: 1,000-operation replay test: accumulator matches `SELECT *` for all operators
- CORR-2: 100-row bulk INSERT triggers exactly 1 re-render
- CORR-3: 50 rapid same-row mutations: final hook state matches `SELECT *`
- CORR-4: DELETE + re-INSERT with same PK: correct unmount/mount lifecycle
- STAB-1: 4-hour soak test: heap growth < 10%
- STAB-2: 100 mount/unmount cycles: zero leaked subscriptions
- STAB-3: Stream table dropped while hook active: error boundary catches
- STAB-4: Extension upgrade path tested (`0.24.0 -> 0.25.0`)
- STAB-5: CI matrix passes for React 18, React 19, Vue 3.4+
- PERF-1: INSERT-to-render latency < 50% of `live.incrementalQuery()` at 10K rows
- PERF-2: Render count = 1 for bulk DML (1, 10, 100, 1000 rows)
- TEST-1: ≥ 69 change event fidelity tests pass (23 operators × 3 DML types)
- TEST-2: ≥ 15 React hook lifecycle tests pass
- TEST-3: ≥ 10 Vue composable lifecycle tests pass
- TEST-4: Cross-framework render count = 1 for bulk DML
- TEST-5: 30-min abbreviated soak test passes in PR CI
- UX-1: Collaborative todo example published to CodeSandbox
- UX-2: Real-time dashboard example published to CodeSandbox
- UX-4: Migration guide from `live.incrementalQuery()` published
- `just check-version-sync` passes (incl. npm package version)
v1.0.0 — Stable Release
Goal: First officially supported release. Semantic versioning locks in. API, catalog schema, and GUC names are considered stable. Focus is distribution — getting pg_trickle onto package registries — and PostgreSQL 19 forward-compatibility.
PostgreSQL 19 Forward-Compatibility (A3)
In plain terms: When PostgreSQL 19 beta stabilises and pgrx 0.18.x ships with PG 19 support, this milestone bumps the pgrx dependency, audits every internal `pg_sys::*` API call for breaking changes, adds conditional compilation gates, and validates the WAL decoder against any pgoutput format changes introduced in PG 19. Moved here from the earlier v0.22.0 milestone because PG 19 beta availability is uncertain.
| Item | Description | Effort | Ref |
|---|---|---|---|
| A3-1 | pgrx version bump to 0.18.x (PG 19 support) + cargo pgrx init --pg19 | 2–4h | PLAN_PG19_COMPAT.md §2 |
| A3-2 | pg_sys::* API audit: heap access, catalog structs, WAL decoder LogicalDecodingContext | 8–16h | PLAN_PG19_COMPAT.md §3 |
| A3-3 | Conditional compilation (#[cfg(feature = "pg19")]) for changed APIs | 4–8h | PLAN_PG19_COMPAT.md §4 |
| A3-4 | CI matrix expansion for PG 19 + full E2E suite run | 4–8h | PLAN_PG19_COMPAT.md |
A3 subtotal: ~18–36 hours
Release engineering
In plain terms: The 1.0 release is the official "we stand behind this API" declaration — from this point on the function names, catalog schema, and configuration settings won't change without a major version bump. The practical work is getting pg_trickle onto standard package registries (PGXN, apt, rpm) so it can be installed with the same commands as any other PostgreSQL extension, and hardening the CloudNativePG integration for Kubernetes deployments.
| Item | Description | Effort | Ref |
|---|---|---|---|
| R1 | Semantic versioning policy + compatibility guarantees | 2–3h | PLAN_VERSIONING.md |
| R2 | apt / rpm packaging (Debian/Ubuntu .deb + RHEL .rpm via PGDG) | 8–12h | PLAN_PACKAGING.md |
| R2b | PGXN release_status → "stable" (flip one field; PGXN testing release ships in v0.7.0) | 30min | PLAN_PACKAGING.md |
| R3 | — | ✅ Done | PLAN_CLOUDNATIVEPG.md |
| R4 | — | 4–6h | PLAN_CLOUDNATIVEPG.md |
| R5 | Docker Hub official image. Publish pgtrickle/pg_trickle:1.0.0-pg18 and :latest to Docker Hub. Sync Dockerfile.hub version tag with release. Automate via GitHub Actions release workflow. | 2–4h | — |
| R6 | Version sync automation. Ensure just check-version-sync covers all version references (Cargo.toml, extension control files, Dockerfile.hub, dbt_project.yml, CNPG manifests). Add to CI as a blocking check. | 2–3h | — |
| SAST-SEMGREP | Elevate Semgrep to blocking in CI. CodeQL and cargo-deny already block; Semgrep is advisory-only. Flip to blocking for consistent safety gating. Before flipping, verify zero findings across all current rules. | 1–2h | PLAN_SAST.md |
v1.0.0 total: ~36–66 hours (incl. PG 19 compat ~18–36h + release engineering ~18–30h)
Exit criteria:
- A3: PG 19 builds and passes full E2E suite
- CI matrix includes PG 19
- Published on PGXN (stable) and apt/rpm via PGDG
- Docker Hub image published (`pgtrickle/pg_trickle:1.0.0-pg18` and `:latest`)
- CNPG extension image published to GHCR (`pg_trickle-ext`)
- CNPG cluster-example.yaml validated (Image Volume approach)
- `just check-version-sync` passes and blocks CI on mismatch
- SAST-SEMGREP: Semgrep elevated to blocking in CI; zero findings verified
- Upgrade path from v0.17.0 tested
- Semantic versioning policy in effect
Post-1.0 — Scale, Ecosystem & Platform Expansion
These are not gated on 1.0 but represent the longer-term horizon. PG backward compatibility (PG 16–18) and native DDL syntax were moved here from v0.16.0 to keep the pre-1.0 milestones focused on performance and correctness.
Ecosystem expansion
In plain terms: Building first-class integrations with the tools most data teams already use — a proper dbt adapter (beyond just a materialization macro), an Airflow provider so you can trigger stream table refreshes from Airflow DAGs, a `pgtrickle` TUI for managing and monitoring stream tables without writing SQL (shipped in v0.14.0), and integration guides for popular ORMs and migration frameworks like Django, SQLAlchemy, Flyway, and Liquibase.
| Item | Description | Effort | Ref |
|---|---|---|---|
| E1 | dbt full adapter (dbt-pgtrickle extending dbt-postgres) | 20–30h | PLAN_DBT_ADAPTER.md |
| E2 | Airflow provider (apache-airflow-providers-pgtrickle) | 16–20h | PLAN_ECO_SYSTEM.md §4 |
| E3 | TUI (`pgtrickle`) for management outside SQL | 4–6d | PLAN_TUI.md |
| E4 | — | 8–12h | PLAN_ECO_SYSTEM.md §5 |
| E5 | — | 8–12h | PLAN_ECO_SYSTEM.md §5 |
Scale
In plain terms: When you have hundreds of stream tables or a very large cluster, the single background worker that drives pg_trickle today can become a bottleneck. These items explore running the scheduler as an external sidecar process (outside the database itself), distributing stream tables across Citus shards for horizontal scale-out, and managing stream tables that span multiple databases in the same PostgreSQL cluster.
| Item | Description | Effort | Ref |
|---|---|---|---|
| S1 | External orchestrator sidecar for 100+ STs | 20–40h | REPORT_PARALLELIZATION.md §D |
| S2 | Citus / distributed PostgreSQL compatibility | ~6 months | plans/infra/CITUS.md |
| S3 | Multi-database support (beyond postgres DB) | TBD | PLAN_MULTI_DATABASE.md |
PG Backward Compatibility (PG 16–18)
In plain terms: pg_trickle currently only targets PostgreSQL 18. This work adds support for PG 16 and PG 17 so teams that haven't yet upgraded can still use the extension. Each PostgreSQL major version has subtly different internal APIs — especially around query parsing and the WAL format used for change-data-capture — so each version needs its own feature flags, build path, and CI test run.
| Item | Description | Effort | Ref |
|---|---|---|---|
| BC1 | Cargo.toml feature flags (pg16, pg17, pg18) + cfg_aliases | 4–8h | PLAN_PG_BACKCOMPAT.md §5.2 Phase 1 |
| BC2 | #[cfg] gate JSON_TABLE nodes in parser.rs (~250 lines, PG 17+) | 12–16h | PLAN_PG_BACKCOMPAT.md §5.2 Phase 2 |
| BC3 | pg_get_viewdef() trailing-semicolon behavior verification | 2–4h | PLAN_PG_BACKCOMPAT.md §5.2 Phase 3 |
| BC4 | CI matrix expansion (PG 16, 17, 18) + parameterized Dockerfiles | 12–16h | PLAN_PG_BACKCOMPAT.md §5.2 Phases 4–5 |
| BC5 | WAL decoder validation against PG 16–17 pgoutput format | 8–12h | PLAN_PG_BACKCOMPAT.md §6A |
Backward compatibility subtotal: ~38–56 hours
Native DDL Syntax
In plain terms: Currently you create stream tables by calling a function: `SELECT pgtrickle.create_stream_table(...)`. This adds support for standard PostgreSQL DDL syntax: `CREATE MATERIALIZED VIEW my_view WITH (pgtrickle.stream = true) AS SELECT ...`. That single change means `pg_dump` can back them up properly, `\dm` in psql lists them, ORMs can introspect them, and migration tools like Flyway treat them like ordinary database objects. Stream tables finally look native to PostgreSQL tooling.
| Item | Description | Effort | Ref |
|---|---|---|---|
| NAT-1 | ProcessUtility_hook infrastructure: register in _PG_init(), dispatch+passthrough, hook chaining with TimescaleDB/pg_stat_statements | 3–5d | PLAN_NATIVE_SYNTAX.md §Tier 2 |
| NAT-2 | CREATE/DROP/REFRESH interception: parse CreateTableAsStmt reloptions, route to internal impls, IF EXISTS handling, CONCURRENTLY no-op | 8–13d | PLAN_NATIVE_SYNTAX.md §Tier 2 |
| NAT-3 | E2E tests: CREATE/DROP/REFRESH via DDL syntax, hook chaining, non-pg_trickle matview passthrough | 2–3d | PLAN_NATIVE_SYNTAX.md §Tier 2 |
Native DDL syntax subtotal: ~13–21 days
Advanced SQL
In plain terms: Longer-horizon features requiring significant research — backward-compatibility to PG 14/15, partitioned stream table storage, and remaining SQL coverage gaps. Several items have been pulled forward to v0.16.0 and v0.17.0.
| Item | Description | Effort | Ref |
|---|---|---|---|
| — | Transactional IVM (pulled forward) | ~36–54h | PLAN_TRANSACTIONAL_IVM.md |
| A3 | PostgreSQL 19 forward-compatibility ➡️ v1.0.0 | ~18–36h | PLAN_PG19_COMPAT.md |
| A4 | PostgreSQL 14–15 backward compatibility | ~40h | PLAN_PG_BACKCOMPAT.md |
| A5 | Partitioned stream table storage (opt-in) | ~60–80h | PLAN_PARTITIONING_SHARDING.md §4 |
| — | Buffer partitioning (`pg_trickle.buffer_partitioning` GUC) | ✅ Done | PLAN_EDGE_CASES_TIVM_IMPL_ORDER.md Stage 4 §3.3 |
| — | `ROWS FROM()` with multiple SRF functions | ~1–2d | PLAN_TRANSACTIONAL_IVM_PART_2.md Task 2.3 |
Parser Modularization & Shared Template Cache (G13-PRF, G14-SHC)
In plain terms: Two large-effort research items identified in the deep gap analysis. Parser modularization is a prerequisite for native DDL syntax (BC2); shared template caching eliminates per-connection cold-start overhead.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G13-PRF | Parser modularization of `src/dvm/parser.rs` | ~3–4wk | plans/performance/REPORT_OVERALL_STATUS.md §13 |
| G14-SHC | Shared template cache | ~2–3wk | plans/performance/REPORT_OVERALL_STATUS.md §14 |
Parser modularization: ✅ Done in v0.15.0. Template caching: ➡️ v0.16.0
Convenience API Functions (G15-BC, G15-EX)
In plain terms: Two quality-of-life API additions that simplify programmatic stream table management, useful for dbt/CI pipelines.
| Item | Description | Effort | Ref |
|---|---|---|---|
| G15-BC | bulk_create(definitions JSONB) — create multiple stream tables and their CDC triggers in a single transaction. Useful for dbt/CI pipelines that manage many STs programmatically. ➡️ Pulled to v0.15.0 | ~2–3d | plans/performance/REPORT_OVERALL_STATUS.md §15 |
| G15-EX | `export_definition(name TEXT)` — export a stream table configuration as reproducible `CREATE STREAM TABLE … WITH (…)` DDL. ➡️ Pulled to v0.14.0 | ~1–2d | plans/performance/REPORT_OVERALL_STATUS.md §15 |
Convenience API subtotal: ~2–3 days (G15-EX pulled to v0.14.0; G15-BC pulled to v0.15.0)
Effort Summary
| Milestone | Effort estimate | Cumulative | Status |
|---|---|---|---|
| v0.1.x — Core engine + correctness | ~30h actual | 30h | ✅ Released |
| v0.2.0 — TopK, Diamond & Transactional IVM | ✔️ Complete | 62–78h | ✅ Released |
| v0.2.1 — Upgrade Infrastructure & Documentation | ~8h | 70–86h | ✅ Released |
| v0.2.2 — OFFSET Support, ALTER QUERY & Upgrade Tooling | ~50–70h | 120–156h | ✅ Released |
| v0.2.3 — Non-Determinism, CDC/Mode Gaps & Operational Polish | 45–66h | 165–222h | ✅ Released |
| v0.3.0 — DVM Correctness, SAST & Test Coverage | ~20–30h | 185–252h | ✅ Released |
| v0.4.0 — Parallel Refresh & Performance Hardening | ~60–94h | 245–346h | ✅ Released |
| v0.5.0 — RLS, Operational Controls + Perf Wave 1 (A-3a only) | ~51–97h | 296–443h | ✅ Released |
| v0.6.0 — Partitioning, Idempotent DDL & Circular Dependency Foundation | ~35–50h | 331–493h | ✅ Released |
| v0.7.0 — Performance, Watermarks, Circular DAG Execution, Observability & Infrastructure | ~59–62h | 390–555h | |
| v0.8.0 — pg_dump Support & Test Hardening | ~16–21d | — | |
| v0.9.0 — Incremental Aggregate Maintenance (B-1) | ~7–9 wk | — | |
| v0.10.0 — DVM Hardening, Connection Pooler Compat, Core Refresh Opts & Infra Prep | ~7–10d + ~26–40 wk | — | |
| v0.11.0 — Partitioned Stream Tables, Prometheus & Grafana, Safety Hardening & Correctness | ~7–10 wk + ~12h obs + ~14–21h defaults + ~7–12h safety + ~2–4 wk should-ship | — | |
| v0.12.0 — Scalability Foundations, Partitioning Enhancements & Correctness | ~18–27 wk + ~6–8 wk scalability + ~5–8 wk partitioning + ~1–3 wk defaults | — | |
| v0.13.0 — Scalability Foundations, Partitioning Enhancements, MERGE Profiling & Multi-Tenant Scheduling | ~15–23 wk | — | |
| v0.14.0 — Tiered Scheduling, UNLOGGED Buffers & Diagnostics | ~2–6 wk + ~1 wk patterns + ~2–4d stability + ~3.5–7d diagnostics + ~1–2d export + ~4–6d TUI + ~0.5d docs | — | |
| v0.15.0 — External Test Suites & Integration | ~40–70h + ~2–3d bulk create + ~3–5d planner hints + ~2–3d cache spike + ~3–4wk parser + ~1–2wk watermark + ~2–4wk delta cost/spill | — | ✅ Released |
| v0.16.0 — Performance & Refresh Optimization | ~1–2wk MERGE alts + ~4–6wk aggregate fast-path + ~1–2wk append-only + ~2–3wk predicate pushdown + ~2–3wk template cache + ~2–3wk buffer compaction + ~3–6wk test coverage + ~1–2wk bench CI + ~2–3d auto-indexing + ~12–22h quick wins | — | |
| v0.17.0 — Query Intelligence & Stability | ~2–3wk cost-based strategy + ~3–4wk columnar tracking + ~32–48h TIVM Phase 4 + ~1–2d ROWS FROM + ~2–3wk SQLancer + ~2–3wk incremental DAG + ~4–8h unsafe reduction + ~1–2wk api.rs mod + ~2–3d migration guide + ~3–5d runbook + ~2–3d playground + ~2–3d doc polish | — | |
| v0.18.0 — Hardening & Delta Performance | ~70–100h | — | |
| v0.19.0 — Production Gap Closure & Distribution | ~4–5 weeks | — | |
| v0.20.0 — Dog-Feeding (pg_trickle monitors itself) | ~3–4wk | — | |
| v0.21.0 — PostgreSQL 17 Support | ~2–4d | — | |
| v0.22.0 — PGlite Proof of Concept | ~2–3wk (plugin) + ~1–2d (version bump) | — | |
| v0.23.0 — Core Extraction (pg_trickle_core) | ~3–4wk (extraction) + ~1–2wk (abstraction + testing) | — |
| v0.24.0 — PGlite WASM Extension | ~5–7wk (WASM build) + ~2–3wk (testing + polish) | — | |
| v0.25.0 — PGlite Reactive Integration | ~2–3wk (bridge + hooks) + ~1–2wk (examples + testing + polish) | — | |
| v1.0.0 — Stable release (incl. PG 19 compat) | ~36–66h | — | |
| Post-1.0 (PG compat + Native DDL) | ~38–56h (PG 16–18) + ~13–21d (Native DDL) | — | |
| Post-1.0 (ecosystem) | 88–134h | — | |
| Post-1.0 (scale) | 6+ months | — |
References
| Document | Purpose |
|---|---|
| CHANGELOG.md | What's been built |
| plans/PLAN.md | Original 13-phase design plan |
| plans/sql/SQL_GAPS_7.md | 53 known gaps, prioritized |
| plans/sql/PLAN_PARALLELISM.md | Detailed implementation plan for true parallel refresh |
| plans/performance/REPORT_PARALLELIZATION.md | Parallelization options analysis |
| plans/performance/STATUS_PERFORMANCE.md | Benchmark results |
| plans/ecosystem/PLAN_ECO_SYSTEM.md | Ecosystem project catalog |
| plans/dbt/PLAN_DBT_ADAPTER.md | Full dbt adapter plan |
| plans/infra/CITUS.md | Citus compatibility plan |
| plans/infra/PLAN_VERSIONING.md | Versioning & compatibility policy |
| plans/infra/PLAN_PACKAGING.md | PGXN / deb / rpm packaging |
| plans/infra/PLAN_DOCKER_IMAGE.md | Official Docker image (superseded by CNPG extension image) |
| plans/ecosystem/PLAN_CLOUDNATIVEPG.md | CNPG Image Volume extension image |
| plans/infra/PLAN_MULTI_DATABASE.md | Multi-database support |
| plans/infra/PLAN_PG19_COMPAT.md | PostgreSQL 19 forward-compatibility |
| plans/sql/PLAN_UPGRADE_MIGRATIONS.md | Extension upgrade migrations |
| plans/sql/PLAN_TRANSACTIONAL_IVM.md | Transactional IVM (immediate, same-transaction refresh) |
| plans/sql/PLAN_ORDER_BY_LIMIT_OFFSET.md | ORDER BY / LIMIT / OFFSET gaps & TopK support |
| plans/sql/PLAN_NON_DETERMINISM.md | Non-deterministic function handling |
| plans/sql/PLAN_ROW_LEVEL_SECURITY.md | Row-Level Security support plan (Phases 1–4) |
| plans/infra/PLAN_PARTITIONING_SHARDING.md | PostgreSQL partitioning & sharding compatibility |
| plans/infra/PLAN_PG_BACKCOMPAT.md | Supporting older PostgreSQL versions (13–17) |
| plans/sql/PLAN_DIAMOND_DEPENDENCY_CONSISTENCY.md | Diamond dependency consistency (multi-path refresh atomicity) |
| plans/adrs/PLAN_ADRS.md | Architectural decisions |
| docs/ARCHITECTURE.md | System architecture |
Release Process
This document describes how to create a release of pg_trickle.
Overview
Releases are fully automated via GitHub Actions. Pushing a version tag (v*)
triggers the Release workflow, which:
- Runs a preflight version-sync check to ensure all version references match the tag
- Builds extension packages for Linux (amd64), macOS (arm64), and Windows (amd64)
- Smoke-tests the Linux artifact against a live PostgreSQL 18 instance
- Creates a GitHub Release with archives and SHA256 checksums
- Builds and pushes a multi-arch extension image to GHCR (for CNPG Image Volumes)
A separate PGXN workflow also fires on the same
v* tag and publishes the source archive to the PostgreSQL Extension Network.
Prerequisites
- Push access to the repository (or a PR merged by a maintainer)
- All CI checks passing on `main` (verify the last run on the version-bump commit succeeded)
- The version in `Cargo.toml` matches the tag you intend to push
- Required GitHub secrets configured (see Required GitHub Secrets below)
Required GitHub Secrets
The release automation uses the following GitHub Actions secrets. Set them under Settings → Secrets and variables → Actions → New repository secret.
| Secret | Used by | Description |
|---|---|---|
| `PGXN_USERNAME` | pgxn.yml | Your PGXN account username. Used to authenticate the curl upload to PGXN Manager when publishing source archives to the PostgreSQL Extension Network. Register at pgxn.org. |
| `PGXN_PASSWORD` | pgxn.yml | Password for the PGXN account above. Never hardcode this — it must be stored as a secret so it is never exposed in logs or committed to the repository. |
| `CODECOV_TOKEN` | coverage.yml | Upload token for Codecov. Used to publish unit and E2E coverage reports. Obtain it from the Codecov dashboard after linking the repository. The workflow degrades gracefully (`fail_ci_if_error: false`) if absent. |
| `BENCHER_API_TOKEN` | benchmarks.yml | API token for Bencher, the continuous benchmarking platform. Used to track Criterion benchmark results on main and detect regressions on pull requests. The benchmark steps are skipped entirely when this secret is absent, so CI still passes without it. Create a project at bencher.dev and copy the token from the project settings. |
Note: The `GITHUB_TOKEN` secret is provided automatically by GitHub Actions and does not need to be configured manually. It is used by the release workflow to create GitHub Releases, by the Docker workflow to push images to GHCR, and by Bencher to post PR comments.
Step-by-Step
1. Decide the version number
Follow Semantic Versioning:
| Change type | Bump | Example |
|---|---|---|
| Breaking SQL API or config change | Major | 1.0.0 → 2.0.0 |
| New feature, backward-compatible | Minor | 0.1.0 → 0.2.0 |
| Bug fix, no API change | Patch | 0.2.0 → 0.2.1 |
| Pre-release / release candidate | Suffix | 0.3.0-rc.1 |
2. Update the version
Four files must have their version bumped together:
# 1. Cargo.toml — the canonical version source for the extension
# Change: version = "0.7.0" → version = "0.8.0"
# 2. pgtrickle-tui/Cargo.toml — the TUI binary; must always match Cargo.toml
# Change: version = "0.7.0" → version = "0.8.0"
# 3. META.json — the PGXN package metadata
# Change both top-level "version" and the nested "provides" version
# 4. CHANGELOG.md
# Rename ## [Unreleased] → ## [0.8.0] — YYYY-MM-DD
# Add a new empty ## [Unreleased] section at the top
Important: `Cargo.toml` (extension) and `pgtrickle-tui/Cargo.toml` (TUI) must always carry the same version. They are built and released together, and a mismatch causes `cargo install --path pgtrickle-tui` to report the wrong version. The `just check-version-sync` script does not currently enforce this, so it must be checked manually.
The extension control file (`pg_trickle.control`) uses `default_version = '@CARGO_VERSION@'`, which pgrx substitutes automatically at build time — no manual edit needed there.
After editing, verify all version-related files are in sync:
just check-version-sync
3. Commit the version bump
git add Cargo.toml pgtrickle-tui/Cargo.toml META.json CHANGELOG.md
git commit -m "release: v0.8.0"
git push origin main
4. Wait for CI to pass and verify upgrade completeness
Ensure the CI workflow passes on main with
the version bump commit. All unit, integration, E2E, and pgrx tests must be
green.
Critical: Before tagging, verify that the upgrade script covers all SQL schema changes:
# Run comprehensive upgrade completeness checks
just check-upgrade-all
# If any check fails (e.g. "ERROR: X new function(s) missing from upgrade script"),
# fix the issue by adding the missing SQL objects to:
# sql/pg_trickle--<prev>--<new>.sql
#
# Then re-run until all checks pass:
just check-upgrade-all # Should print "All 15 upgrade step(s) passed completeness checks."
Why this matters: New SQL functions, views, tables, and columns added in any prior
release must be carried forward in the upgrade script, even if the current release
doesn't change them. The upgrade script is the source of truth for what PostgreSQL
applies when users run ALTER EXTENSION pg_trickle UPDATE.
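A hypothetical fragment showing the carry-forward rule in practice. The function names below are invented for illustration and are not real pg_trickle objects:

```sql
-- sql/pg_trickle--<prev>--<new>.sql (illustrative shape only)

-- Carried forward from the previous release: unchanged objects must still
-- be listed, or ALTER EXTENSION pg_trickle UPDATE will not install them.
CREATE OR REPLACE FUNCTION pgtrickle.prior_helper()  -- hypothetical name
RETURNS integer LANGUAGE sql AS $$ SELECT 1 $$;

-- New in this release
CREATE OR REPLACE FUNCTION pgtrickle.new_helper()    -- hypothetical name
RETURNS integer LANGUAGE sql AS $$ SELECT 2 $$;
```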
Confirm the local and CI upgrade-E2E defaults were advanced to the new release:
just check-version-sync # Verifies ci.yml, justfile, and test defaults
5. Create and push the tag
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0
This triggers the Release workflow automatically.
6. Monitor the release
Watch the Actions tab for progress. The release workflow runs these jobs in order:
preflight ──► build-release (linux, macos, windows)
│
▼
test-release ──► publish-release
──► publish-docker-arch (linux/amd64 + linux/arm64)
│
▼
publish-docker (merge manifest + push :latest)
The PGXN workflow (pgxn.yml) runs independently and publishes the source
archive to pgxn.org in parallel with the release workflow.
7. Make the GHCR package public (first release only)
When a package is pushed to GHCR for the first time it is private by default, even when it is linked to a public open-source repository. You must change the visibility to public once; every later push then keeps it public:
- Go to github.com/⟨owner⟩ → Packages → pg_trickle-ext
- Click Package settings
- Scroll to Danger Zone → Change package visibility → set to Public
After that first change:
- All future pushes keep the package public automatically
- Unauthenticated `docker pull ghcr.io/grove/pg_trickle-ext:...` works
- Storage and bandwidth are free (GHCR open-source advantage)
- The package page shows the README, linked repository, license, and description from the OCI labels
8. Verify the release
Once both workflows complete:
- Check the GitHub Releases page for the new release
- Verify all three platform archives are attached (`.tar.gz` for Linux/macOS, `.zip` for Windows)
- Verify `SHA256SUMS.txt` is present
- Verify the extension image is available at `ghcr.io/grove/pg_trickle-ext:<version>`
- Verify the PGXN upload succeeded: `pgxn info pg_trickle` should show the new version
- Optionally verify the extension image layout:
docker pull ghcr.io/grove/pg_trickle-ext:<version>
ID=$(docker create ghcr.io/grove/pg_trickle-ext:<version>)
docker cp "$ID:/lib/" /tmp/ext-lib/
docker cp "$ID:/share/" /tmp/ext-share/
docker rm "$ID"
ls -la /tmp/ext-lib/ /tmp/ext-share/extension/
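For the checksum step, `sha256sum -c` is the usual tool. A self-contained sketch with a throwaway file standing in for a real release archive (the file name here is illustrative):

```shell
# Simulate a downloaded release archive plus its checksum manifest
workdir=$(mktemp -d)
cd "$workdir"
echo "example artifact" > pg_trickle-example.tar.gz
sha256sum pg_trickle-example.tar.gz > SHA256SUMS.txt

# Verification: prints "<name>: OK" per file, exits non-zero on any mismatch
sha256sum -c SHA256SUMS.txt
```

Run the same `sha256sum -c SHA256SUMS.txt` in the directory holding the downloaded release archives to confirm they match the published manifest.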
Post-Release Checklist
Complete these steps immediately after a release tag has been pushed and both the Release and PGXN workflows have finished successfully.
- Create a post-release branch from `main` (e.g. `post-release-<ver>-a`)
- Bump `Cargo.toml` `version` to the next development version (e.g. `0.12.0` → `0.13.0`)
- Bump `pgtrickle-tui/Cargo.toml` `version` to the same next development version — must always match `Cargo.toml`
- Bump `META.json` — both the top-level `"version"` and the nested `"provides" → "pg_trickle" → "version"` to match
- Write `plans/PLAN_0_<next>_0.md` — initial planning document for the next milestone
- Delete `plans/PLAN_0_<released>_0.md` — remove the now-completed plan
- Wrap roadmap items — in `ROADMAP.md`, wrap all completed items from the old release with `<details>` tags to archive them
- Add `## [Unreleased]` stub to `CHANGELOG.md` above the just-released entry
- Create `sql/pg_trickle--<released>--<next>.sql` — empty upgrade script stub for the next migration hop
- Copy `sql/archive/pg_trickle--<released>.sql` → `sql/archive/pg_trickle--<next>.sql` — placeholder archive baseline for the next version
- Update `justfile` — advance `build-upgrade-image` and `test-upgrade` `to` defaults to `<next>`; update the `build-hub` Docker image tag
- Update `tests/e2e_upgrade_tests.rs` — advance all `unwrap_or("<released>".into())` fallback strings to `<next>`
- Update version numbers in `README.md` — search for occurrences of the released version (e.g. `0.17.0`) and advance them to `<next>`: CNPG image reference (`ghcr.io/grove/pg_trickle-ext:<version>`), dbt `revision` tag, and any other hardcoded version strings. A quick check: `grep -n '<released>' README.md`
- Run `just check-version-sync` — must exit 0 before opening the PR
- Open a PR against `main` with the commit title `chore: start v<next> development cycle`
Preparing for the Next Release (Pre-Work Checklist)
Use this checklist at the start of each new release milestone to ensure the repository is properly set up before development begins. This maps directly to what just check-version-sync verifies.
| File / target | Action | check-version-sync check |
|---|---|---|
| `Cargo.toml` | `version = "<next>"` | canonical version source |
| `META.json` | both `"version"` fields set to `<next>` | PGXN manifest |
| `CHANGELOG.md` | `## [Unreleased]` section present | (manual hygiene) |
| `sql/pg_trickle--<prev>--<next>.sql` | stub file exists | upgrade SQL exists |
| `sql/archive/pg_trickle--<next>.sql` | placeholder file exists (copy of `<prev>`) | archive SQL exists |
| `.github/workflows/ci.yml` | upgrade matrix and chain end at `<next>` | CI matrix up to date |
| `justfile` | `build-upgrade-image` and `test-upgrade` `to` defaults = `<next>` | justfile defaults |
| `tests/e2e_upgrade_tests.rs` | all `unwrap_or` fallbacks = `"<next>"` | e2e fallback strings |
Quick-verify with:
just check-version-sync
# Should print: All version references are in sync.
Release Artifacts
Each release produces:
| Artifact | Description |
|---|---|
| `pg_trickle-<ver>-pg18-linux-amd64.tar.gz` | Extension files for Linux x86_64 |
| `pg_trickle-<ver>-pg18-macos-arm64.tar.gz` | Extension files for macOS Apple Silicon |
| `pg_trickle-<ver>-pg18-windows-amd64.zip` | Extension files for Windows x64 |
| `SHA256SUMS.txt` | SHA-256 checksums for all archives |
| `ghcr.io/grove/pg_trickle-ext:<ver>` | CNPG extension image for Image Volumes (amd64 + arm64) |
Installing from an archive
tar xzf pg_trickle-<version>-pg18-linux-amd64.tar.gz
cd pg_trickle-<version>-pg18-linux-amd64
sudo cp lib/*.so "$(pg_config --pkglibdir)/"
sudo cp extension/*.control extension/*.sql "$(pg_config --sharedir)/extension/"
Then add to postgresql.conf and restart:
shared_preload_libraries = 'pg_trickle'
See INSTALL.md for full installation details.
Pre-releases
Tags containing -rc, -beta, or -alpha (e.g., v0.3.0-rc.1) are
automatically marked as pre-releases on GitHub. Pre-release extension images are
tagged but do not update the latest tag.
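The gating can be reproduced with a small shell predicate. This is assumed logic sketched from the tag patterns above; the actual condition lives in the release workflow YAML and may be written differently:

```shell
# Classify a tag: pre-release if it contains -rc, -beta, or -alpha
is_prerelease() {
  case "$1" in
    *-rc*|*-beta*|*-alpha*) echo "prerelease" ;;
    *)                      echo "stable" ;;
  esac
}

is_prerelease "v0.3.0-rc.1"   # pre-release: GitHub Release marked, :latest untouched
is_prerelease "v0.3.0"        # stable: :latest image tag is also updated
```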
Hotfix Releases
For urgent fixes on an older release:
# Branch from the tag
git checkout -b hotfix/v0.2.1 v0.2.0
# Apply fix, bump version to 0.2.1
git commit -am "fix: ..."
git push origin hotfix/v0.2.1
# Tag from the branch (CI will still run the release workflow)
git tag -a v0.2.1 -m "Release v0.2.1"
git push origin v0.2.1
Files to Update for Each Release
Every release requires manual updates to the files below. Missing any of them leads to version skew between the code, the docs, and the packages.
| File | What to change | Why |
|---|---|---|
| `Cargo.toml` | `version = "x.y.z"` field | The canonical version source. pgrx reads this at build time and substitutes it into `pg_trickle.control` via `@CARGO_VERSION@`. The git tag must match. |
| `META.json` | Both `"version"` fields (top-level and inside `"provides"`) | The PGXN package manifest. The pgxn.yml workflow uploads this file as part of the source archive; a stale version here means the wrong version appears on pgxn.org. |
| `CHANGELOG.md` | Rename `## [Unreleased]` → `## [x.y.z] — YYYY-MM-DD`; add a new empty `## [Unreleased]` at the top | Keeps the public changelog accurate and gives downstream users a dated record of changes. |
| `ROADMAP.md` | Update the preamble's latest-release/current-milestone lines; mark the released milestone done; advance the "We are here" pointer to the next milestone | Keeps the forward-looking plan aligned with reality. Leaves no confusion about what just shipped versus what is next. |
| `README.md` | Update test-count line (`~N unit tests + M E2E tests`) if test counts changed significantly | The README is the first thing users read; stale numbers erode trust. |
| `INSTALL.md` | Update any version numbers in install commands or example URLs | Users copy-paste installation commands; stale versions cause failures. |
| `docs/UPGRADING.md` | Add the new version-specific migration notes and extend the supported upgrade-path table | Documents exactly what `ALTER EXTENSION ... UPDATE` will do and which chains are supported. |
| `sql/pg_trickle--<old>--<new>.sql` | Add or update the hand-authored upgrade script for every SQL-surface change (new objects, changed signatures, changed defaults, view changes). Also carry forward all functions/views/tables added in previous releases — the upgrade script is cumulative. | `ALTER EXTENSION ... UPDATE` only applies what is explicitly scripted; function defaults and signatures stored in `pg_proc` do not update themselves. Omitting a function that existed in `<old>` but is expected in `<new>` will break user upgrades. |
| `sql/archive/pg_trickle--<new>.sql` | Regenerate and commit the full-install SQL baseline for the new version. This file was created as a placeholder copy of `<prev>` at the start of the development cycle — it must be replaced with the actual generated SQL before tagging. Run `cargo pgrx schema` (or the equivalent just target) to produce the final schema, then overwrite the placeholder. | Future upgrade-completeness checks and upgrade E2E tests need an exact baseline for the released version. A stale placeholder from the start of the cycle will cause spurious failures. |
| `.github/workflows/ci.yml`, `justfile`, `tests/build_e2e_upgrade_image.sh`, `tests/Dockerfile.e2e-upgrade` | Advance the upgrade-check chain and default upgrade-E2E target version to the new release | Prevents release automation and local upgrade validation from getting stuck on the previous version after a new migration hop is added. |
| `pg_trickle.control` | No manual edit needed — `default_version` is set to `'@CARGO_VERSION@'` and pgrx substitutes it at build time. Verify the substitution in the built artifact. | Ensures the SQL `CREATE EXTENSION` command installs the right version. |
CRITICAL: After updating `sql/pg_trickle--<old>--<new>.sql`, always run `just check-upgrade-all` to verify that the upgrade script is complete. This checks not just the immediate hop to the new version, but the entire upgrade chain from v0.1.3 onwards. If the check fails (e.g. "ERROR: 3 new function(s) missing"), it means the upgrade script is missing one or more SQL objects that users will expect to have after upgrading. Fix all failures before tagging.
Checklist summary
[ ] Cargo.toml — version bumped
[ ] META.json — both "version" fields updated to match
[ ] CHANGELOG.md — [Unreleased] renamed to [x.y.z] with date; new empty [Unreleased] added
[ ] ROADMAP.md — preamble updated; released milestone marked done
[ ] README.md — test counts current (if materially changed)
[ ] INSTALL.md — version references current
[ ] docs/UPGRADING.md — latest migration notes and supported chains added
[ ] sql/pg_trickle--<old>--<new>.sql — covers every SQL-surface change AND carries forward all previous release functions
[ ] sql/archive/pg_trickle--<new>.sql — regenerated from final schema and committed (replaces the dev-cycle placeholder)
[ ] just check-upgrade-all — all upgrade steps pass completeness checks (not just the one-step hop)
[ ] Upgrade automation defaults — CI/local upgrade checks and E2E target the new version
[ ] just check-version-sync — all version references in sync
[ ] All CI checks on main have passed (verify the last run on the version-bump commit succeeded)
[ ] git tag matches Cargo.toml version
Troubleshooting
Release workflow failed
Go to the Actions tab and identify which job failed. Then follow the appropriate recovery path below.
Option A: Re-run (transient failure)
If the failure is transient — network timeout, registry hiccup, runner issue — you can re-run without changing anything:
- Open the failed workflow run in the Actions tab
- Click Re-run all jobs (or re-run just the failed job)
This works because the v* tag still points to the same commit, and the
workflow uses cancel-in-progress: false so a re-run won't be cancelled.
Option B: Fix code and re-tag
If the failure is a real build or code issue:
# 1. Delete the remote tag
git push origin :refs/tags/v0.2.0
# 2. Delete the local tag
git tag -d v0.2.0
# 3. Fix the issue, commit, and push
git add <files>
git commit -m "fix: ..."
git push origin main
# 4. Re-tag on the new commit and push
git tag -a v0.2.0 -m "Release v0.2.0"
git push origin v0.2.0
This triggers a fresh release workflow run.
Option C: Clean up a partial GitHub Release
If the workflow created a draft or partial Release before failing:
- Go to Releases in the repository
- Delete the broken release (this does not delete the tag)
- Then follow Option A or Option B above
Upgrade script completeness check failed
If just check-upgrade-all reports errors like "ERROR: X new function(s) missing from upgrade script", it means the upgrade SQL script is incomplete:
# 1. Look at the error — it tells you exactly what's missing
just check-upgrade-all # e.g. "ERROR: 3 new function(s) missing from upgrade script:
# - pgtrickle.\"explain_refresh_mode\"
# - pgtrickle.\"fuse_status\"
# - pgtrickle.\"reset_fuse\""
# 2. Find where those objects are defined in the previous release
# (they should already exist in sql/archive/pg_trickle--<prev>.sql)
grep -n "CREATE.*FUNCTION.*explain_refresh_mode" sql/archive/pg_trickle--*.sql
# 3. Copy the function definitions (CREATE OR REPLACE FUNCTION) to the
# upgrade script you're fixing. They should go into:
# sql/pg_trickle--<old>--<new>.sql
#
# Typically, carry-forward functions are grouped in their own section
# at the top of the upgrade script with a comment explaining they're
# from a prior release.
# 4. Re-run the check to verify it passes
just check-upgrade-all
Why this happens: When a new release (e.g. v0.11.0) adds SQL functions, those
functions must be explicitly included in all subsequent upgrade scripts. The upgrade
script is the ground truth — PostgreSQL only applies what is listed in the .sql file.
If you skip a function that users expect, their upgraded extension will be missing
that object.
Common failure causes
| Symptom | Cause | Fix |
|---|---|---|
| Version mismatch error | Cargo.toml version doesn't match the git tag | Run just check-version-sync, fix any skew, commit, delete tag, re-tag (Option B) |
| Build failure | Compilation error in release profile | Fix on main, re-tag (Option B) |
| Docker push failed | Missing permissions | Verify packages: write is in the workflow and GITHUB_TOKEN has GHCR access, then re-run (Option A) |
| Smoke test failed | Extension doesn't load in PostgreSQL | Fix the issue, re-tag (Option B) |
| PGXN upload failed | Missing PGXN_USERNAME / PGXN_PASSWORD secrets, or META.json version not updated | Add the secrets in repository settings; verify META.json version matches the tag; re-run the pgxn.yml workflow from the Actions tab |
| `just check-upgrade-all` reports missing functions/views | Upgrade script is incomplete — new objects from prior releases not carried forward | See "Upgrade script completeness check failed" above for recovery steps |
| Rate limited | GitHub API or GHCR throttling | Wait a few minutes, then re-run (Option A) |
Yanking a release
If a release has a critical issue:
- Mark it as pre-release on the GitHub Releases page (uncheck "Set as the latest release")
- Add a warning to the release notes
- Publish a patch release with the fix
Security Policy
Supported Versions
| Version | Supported |
|---|---|
| 0.13.x (current pre-release) | ✅ |
During pre-1.0 development, only the latest minor version receives security fixes. Once v1.0.0 is released, the two most recent minor versions will receive security fixes.
Reporting a Vulnerability
Please do not report security vulnerabilities via public GitHub Issues.
Use GitHub's built-in private vulnerability reporting:
- Go to the Security tab of this repository
- Click "Report a vulnerability"
- Fill in the details — affected version, description, reproduction steps, and potential impact
We aim to acknowledge reports within 48 hours and provide a fix or mitigation within 14 days for critical issues.
What to Include
A useful report includes:
- PostgreSQL version and `pg_trickle` version
- Minimal reproduction SQL or Rust code
- Description of the unintended behaviour and its security impact
- Whether the vulnerability requires a trusted (superuser) or untrusted role to trigger
Scope
In-scope:
- SQL injection or privilege escalation via `pgtrickle.*` functions
- Memory safety issues in the Rust extension code (buffer overflows, use-after-free, etc.)
- Denial-of-service caused by a low-privilege user triggering runaway resource usage
- Information disclosure through change buffers (`pgtrickle_changes.*`) or monitoring views
Out-of-scope:
- Vulnerabilities in PostgreSQL itself (report to the PostgreSQL security team)
- Vulnerabilities in pgrx (report to pgcentralfoundation/pgrx)
- Issues requiring physical access to the database host
Disclosure Policy
We follow coordinated disclosure. Once a fix is released we will publish a security advisory on GitHub with a CVE if applicable.
pg_trickle vs. DBSP: Similarities and Differences
What They Share (Conceptual Foundation)
pg_trickle explicitly cites DBSP as its theoretical foundation (see PRIOR_ART.md). The key overlap:
| Concept | DBSP (paper) | pg_trickle (implementation) |
|---|---|---|
| Z-set / delta model | Rows annotated with weights (+1/−1) in an abelian group | __pgt_action = 'I'/'D' column on every delta row — effectively Z-sets restricted to {+1, −1} |
| Per-operator differentiation | Recursive Algorithm 4.6: Q^Δ = D ∘ Q ∘ I, decomposed per-operator via the chain rule (Q₁ ∘ Q₂)^Δ = Q₁^Δ ∘ Q₂^Δ | DiffContext::diff_node() walks the OpTree and calls per-operator differentiators (scan, filter, project, join, aggregate, distinct, union, etc.) — same recursive structural decomposition |
| Linear operators are self-incremental | Theorem 3.3: for LTI operator Q, Q^Δ = Q | Filter and Project pass deltas through unchanged (just apply predicate/projection to the delta stream) |
| Bilinear join rule | Theorem 3.4: Δ(a × b) = Δa × Δb + a × Δb + Δa × b | diff_inner_join generates exactly 3 UNION ALL parts: (delta_left ⋈ current_right), (current_left ⋈ delta_right), and optionally (delta_left ⋈ delta_right) |
| Aggregate auxiliary counters | §4.2: counting algorithm for maintaining aggregates with deletions | __pgt_count auxiliary column, LEFT JOIN back to stream table to read old counts and compute new counts |
| Recursive queries | §6: fixed-point iteration with z⁻¹ delay operator, semi-naive evaluation | diff_recursive_cte uses recomputation-diff (DRed-style), not DBSP's native fixed-point circuit |
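Written out as SQL, the bilinear rule produces a delta query of roughly this shape. The table and column names are illustrative; the extension's actual generated CTE chain will differ in detail:

```sql
-- Δ(orders ⋈ customers), sketched as the three UNION ALL branches above
SELECT d.*, c.*                 -- delta_left ⋈ current_right
FROM   delta_orders d
JOIN   customers c USING (customer_id)
UNION ALL
SELECT o.*, d.*                 -- current_left ⋈ delta_right
FROM   orders o
JOIN   delta_customers d USING (customer_id)
UNION ALL
SELECT dl.*, dr.*               -- delta_left ⋈ delta_right (correction term;
FROM   delta_orders dl          -- needed or not depending on whether the
JOIN   delta_customers dr       -- non-delta sides read the old or new snapshot)
       USING (customer_id);
```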
Key Differences
1. Execution model — standalone engine vs. embedded in PostgreSQL
DBSP is a standalone streaming runtime (Rust library, now Feldera). It compiles query plans into dataflow graphs that maintain in-memory state and process continuous micro-batches. Operators are long-lived stateful actors with their own memory.
pg_trickle is an extension inside PostgreSQL. It has no persistent dataflow graph. On each refresh, it generates a single SQL query (CTE chain) that PostgreSQL's own planner/executor evaluates. After execution, no operator state persists — auxiliary state lives in the stream table itself (__pgt_count columns) and change buffer tables.
2. Streams vs. periodic batches
DBSP operates on true infinite streams indexed by logical time t ∈ ℕ. Each "step" processes one micro-batch of changes, and operators carry integration state (I operator = running sum from t=0).
pg_trickle operates in discrete refresh cycles triggered by a lag-based scheduler. There is no integration operator — the "current state" is just the stream table's contents, and changes are consumed from CDC buffer tables between LSN boundaries. Each refresh is a self-contained transaction.
3. Z-set weights vs. binary actions
DBSP uses integer weights in ℤ — rows can have weights > 1 (bags) or < −1 (multiple deletions). This enables correct multiset semantics and composable group algebra.
pg_trickle uses binary actions ('I' insert, 'D' delete, sometimes 'U' update). It doesn't maintain true Z-set weights. For aggregates, the __pgt_count auxiliary column serves a similar purpose but is specific to the aggregate operator — it's not a general weight propagated through the operator tree.
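The gap is easiest to see by collapsing a change buffer to net weights: a true Z-set carries an arbitrary integer per row, while binary actions only encode ±1 per buffered change. An illustrative query (not the extension's internal SQL, and the buffer table name is invented):

```sql
-- Net weight of each key, reconstructed from binary actions
SELECT order_id,
       sum(CASE __pgt_action WHEN 'I' THEN  1
                             WHEN 'D' THEN -1
                             ELSE 0 END) AS net_weight
FROM   change_buffer            -- illustrative buffer table name
GROUP  BY order_id;
```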
4. Integration operator (I)
DBSP: The integration operator I(s)[t] = Σᵢ≤ₜ s[i] is an explicit first-class circuit element. It maintains running sums of changes and is the key mechanism for computing incremental joins (z⁻¹(I(a)) = "accumulated left side up to previous step").
pg_trickle: No explicit integration. The equivalent of I is just "read the current contents of the source/stream table." Join differentiation directly reads the current snapshot of the non-delta side (build_snapshot_sql() generates FROM "public"."orders" r), which implicitly includes all historical changes.
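The bilinear join rule this section leans on can be sanity-checked on toy Z-sets. The sketch below is hypothetical Python (pg_trickle emits the same three terms as SQL CTEs instead); it verifies that Δ(A ⋈ B) = ΔA⋈B + A⋈ΔB + ΔA⋈ΔB agrees with full recomputation:

```python
# Toy check of the bilinear join expansion, with Z-sets as
# {(key, value): weight} dicts. Illustrative only.

def join(a, b):
    """Join on the first tuple field; weights multiply (bilinearity)."""
    out = {}
    for (ka, va), wa in a.items():
        for (kb, vb), wb in b.items():
            if ka == kb:
                row = (ka, va, vb)
                out[row] = out.get(row, 0) + wa * wb
    return {r: w for r, w in out.items() if w != 0}

def add(x, y):
    """Z-set addition; zero-weight rows vanish."""
    out = dict(x)
    for r, w in y.items():
        out[r] = out.get(r, 0) + w
        if out[r] == 0:
            del out[r]
    return out

A  = {(1, 'a'): 1}
dA = {(1, 'a2'): 1}
B  = {(1, 'x'): 1}
dB = {(1, 'y'): 1, (1, 'x'): -1}   # replace x with y

delta = add(add(join(dA, B), join(A, dB)), join(dA, dB))
# Recomputing the join on the updated inputs agrees with the incremental delta:
assert join(add(A, dA), add(B, dB)) == add(join(A, B), delta)
```

The `join(A, dB)` and `join(dA, B)` terms are where the "current snapshot of the non-delta side" enters, which is exactly what `build_snapshot_sql()` supplies in pg_trickle.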
5. Recursion
DBSP: Native fixed-point circuits with z⁻¹ delay. Can incrementally maintain recursive queries (e.g., transitive closure) by iterating only on new changes within each step — semi-naive evaluation generalized to arbitrary recursion.
pg_trickle: Uses recomputation-diff for recursive CTEs — re-executes the full recursive query and anti-joins against current storage to compute the delta. This is correct but not truly incremental for the recursive part.
6. Correctness guarantees
DBSP: Proven correct in Lean. All theorems are machine-checked. The chain rule, cycle rule, and bilinear decomposition are formally verified.
pg_trickle: Verified empirically via property-based tests (the assert_invariant checks that Contents(ST) = Q(DB) after each mutation cycle). No formal proof, but the per-operator rules are direct translations of DBSP's rules.
7. Scope
DBSP: A general-purpose theory and streaming engine. Handles nested relations, streaming aggregation over windows, arbitrary compositions. The Feldera implementation supports a full SQL frontend.
pg_trickle: Focused on materialized views inside PostgreSQL. Supports a specific subset of SQL (scan, filter, project, inner/left/full join, aggregates, DISTINCT, UNION ALL, INTERSECT, EXCEPT, CTEs, window functions, lateral joins). It is not a general streaming engine — it leverages PostgreSQL's own query planner and executor.
Summary
pg_trickle applies DBSP's differentiation rules to generate delta queries, but it is not a DBSP implementation. It borrows the mathematical framework (per-operator differentiation, Z-set-like deltas, bilinear join decomposition) while making fundamentally different architectural choices: embedded in PostgreSQL, no persistent dataflow state, periodic batch execution, and PostgreSQL's planner as the optimizer. Think of it as "DBSP's differentiation algebra, compiled down to SQL CTEs and executed by PostgreSQL."
Prior Art
This document lists the academic papers, PostgreSQL commits, open-source tools,
and standard algorithms whose techniques are reused in pg_trickle.
Maintaining this record serves two purposes:
- Attribution — credit the research and engineering work this project builds upon.
- Independent derivation — demonstrate that every core technique predates and is independent of any single vendor's commercial product.
Differential View Maintenance (DVM)
DBSP — Automatic Incremental View Maintenance
Budiu, M., Ryzhyk, L., McSherry, F., & Tannen, V. (2023). "DBSP: Automatic Incremental View Maintenance for Rich Query Languages." Proceedings of the VLDB Endowment (PVLDB), 16(7), 1601–1614. https://arxiv.org/abs/2203.16684
The Z-set abstraction (rows annotated with +1/−1 multiplicity) is the
theoretical foundation for the __pgt_action column produced by the delta
operators in src/dvm/operators/. The per-operator differentiation rules
(scan, filter, project, join, aggregate, union) are direct applications of
the DBSP lifting operator (D) described in this paper.
See DBSP_COMPARISON.md for a detailed comparison of pg_trickle's architecture with the DBSP model.
Gupta & Mumick — Materialized Views Survey
Gupta, A. & Mumick, I.S. (1995). "Maintenance of Materialized Views: Problems, Techniques, and Applications." IEEE Data Engineering Bulletin, 18(2), 3–18.
Gupta, A. & Mumick, I.S. (1999). Materialized Views: Techniques, Implementations, and Applications. MIT Press. ISBN 978-0-262-57122-7.
The per-operator differentiation rules in src/dvm/operators/ follow the
derivation given in section 3 of the 1995 survey. The counting algorithm
for maintaining aggregates with deletions uses the approach described in
the MIT Press book.
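For intuition, the counting approach can be sketched in a few lines. This is hypothetical Python (pg_trickle's real version is the `__pgt_count` auxiliary column maintained in SQL): a per-group count lets a deletion distinguish "the group shrank" from "the group's last contributor is gone":

```python
# Sketch of count-based aggregate maintenance under deletions, in the
# spirit of Gupta & Mumick's counting algorithm. Illustrative only.

def apply_delta(state, delta):
    """state: {group: (sum, count)}; delta: [(action, group, value)]."""
    for action, grp, val in delta:
        s, c = state.get(grp, (0, 0))
        if action == 'I':
            state[grp] = (s + val, c + 1)
        else:  # 'D'
            s, c = s - val, c - 1
            if c == 0:
                state.pop(grp)       # no contributors left: group vanishes
            else:
                state[grp] = (s, c)
    return state

st = apply_delta({}, [('I', 'east', 10), ('I', 'east', 5), ('I', 'west', 7)])
st = apply_delta(st, [('D', 'east', 10), ('D', 'west', 7)])
print(st)   # {'east': (5, 1)}, and 'west' disappears when its count hits 0
```

Without the count, a delete could drive a SUM to zero with no way to tell whether the group row itself should be removed.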
DBToaster — Higher-order Delta Processing
Koch, C., Ahmad, Y., Kennedy, O., Nikolic, M., Nötzli, A., Olteanu, D., & Zavodny, J. (2014). "DBToaster: Higher-order Delta Processing for Dynamic, Frequently Fresh Views." The VLDB Journal, 23(2), 253–278. https://doi.org/10.1007/s00778-013-0348-4
Inspiration for the recursive delta compilation strategy where the delta of a complex query is itself a query that can be differentiated.
DRed — Deletion and Re-derivation
Gupta, A., Mumick, I.S., & Subrahmanian, V.S. (1993). "Maintaining Views Incrementally." Proceedings of the 1993 ACM SIGMOD International Conference, 157–166.
The DRed algorithm for handling deletions in recursive views is the basis for
the recursive CTE differential refresh strategy in src/dvm/operators/recursive_cte.rs.
Scheduling
Earliest-Deadline-First (EDF)
Liu, C.L. & Layland, J.W. (1973). "Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment." Journal of the ACM, 20(1), 46–61. https://doi.org/10.1145/321738.321743
The schedule-based scheduling in src/scheduler.rs applies the classic
EDF principle: the stream table whose freshness deadline expires soonest is
refreshed first. EDF is optimal for uniprocessor preemptive scheduling and is
a standard technique in operating systems and real-time databases.
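Applied to stream tables, the EDF rule is small. A hedged sketch in hypothetical Python (the actual scheduler in src/scheduler.rs is Rust and tracks more state):

```python
# Minimal EDF pick: deadline = last refresh time + freshness interval;
# among tables that are due, refresh the one whose deadline expired first.
# Illustrative sketch only.

def next_to_refresh(tables, now):
    """tables: {name: (last_refresh, interval_s)}.
    Returns the name with the earliest expired deadline, or None."""
    due = [(last + iv, name) for name, (last, iv) in tables.items()
           if last + iv <= now]
    return min(due)[1] if due else None

tables = {
    'active_orders': (100.0, 30.0),   # deadline 130
    'daily_totals':  (60.0, 60.0),    # deadline 120
}
print(next_to_refresh(tables, now=135.0))  # 'daily_totals' (earliest deadline)
print(next_to_refresh(tables, now=110.0))  # None (nothing due yet)
```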
Topological Sort — Kahn's Algorithm
Kahn, A.B. (1962). "Topological sorting of large networks." Communications of the ACM, 5(11), 558–562. https://doi.org/10.1145/368996.369025
The dependency DAG in src/dag.rs uses Kahn's algorithm for topological
ordering and cycle detection. This is standard computer science curriculum
and appears in every major algorithms textbook (Cormen et al., Sedgewick,
Kleinberg & Tardos).
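Kahn's algorithm itself is short. A hedged Python sketch (illustrative; src/dag.rs is the real Rust implementation):

```python
from collections import deque

# Kahn's algorithm for dependency ordering: repeatedly emit nodes with no
# remaining incoming edges; any leftovers indicate a cycle.

def topo_sort(edges, nodes):
    """edges: (upstream, downstream) pairs. Returns a refresh order,
    or raises ValueError if the dependency graph contains a cycle."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected among stream tables")
    return order

# orders feeds two stream tables; one of those feeds a third.
print(topo_sort([('orders', 'totals'), ('orders', 'recent'),
                 ('totals', 'rollup')],
                ['orders', 'totals', 'recent', 'rollup']))
# ['orders', 'totals', 'recent', 'rollup']
```

Refreshing in this order guarantees every stream table sees its upstream deltas before it runs.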
Change Data Capture (CDC)
PostgreSQL Row-Level Triggers
Row-level AFTER INSERT/UPDATE/DELETE triggers have been available in
PostgreSQL since version 6.x (late 1990s). The trigger-based change capture
pattern used in src/cdc.rs is a well-established PostgreSQL technique:
- PostgreSQL documentation: CREATE TRIGGER — trigger-based CDC has been a standard pattern for decades.
- PostgreSQL wiki: "Trigger-based Change Data Capture in PostgreSQL."
Debezium
Debezium project (Red Hat, open source since 2016). https://debezium.io/
Debezium implements trigger-based and WAL-based CDC for PostgreSQL and other
databases. The change buffer table pattern (pg_trickle_changes.changes_<oid>)
follows a similar approach, modified for single-process consumption within
the PostgreSQL backend.
pgaudit
pgaudit extension (2015). https://github.com/pgaudit/pgaudit
Captures DML via AFTER row-level triggers for audit logging, demonstrating
the same trigger-based change-capture technique in production since 2015.
Materialized View Refresh
PostgreSQL REFRESH MATERIALIZED VIEW CONCURRENTLY
PostgreSQL 9.4 (December 2014, commit 96ef3b8), implemented in src/backend/commands/matview.c.
The snapshot-diff strategy used for recomputation-diff refreshes (where the
full query is re-executed and anti-joined against current storage to compute
inserts and deletes) mirrors the algorithm implemented in PostgreSQL's
REFRESH MATERIALIZED VIEW CONCURRENTLY. This PostgreSQL feature predates
all relevant patents and is publicly documented.
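In miniature, the snapshot-diff strategy reduces to two anti-joins (here, set differences). A hedged Python sketch, illustrative only; the real implementation is SQL, as in PostgreSQL's matview.c:

```python
# Snapshot-diff: re-run the full query, then compute the delta against
# current storage. Inserts are rows only in the recomputed result;
# deletes are rows only in current storage.

def snapshot_diff(stored, recomputed):
    """Return (to_insert, to_delete) that turn `stored` into `recomputed`."""
    to_insert = recomputed - stored     # rows the query now produces
    to_delete = stored - recomputed     # rows it no longer produces
    return to_insert, to_delete

stored = {('widget', 3), ('gadget', 1)}
recomputed = {('widget', 4), ('gadget', 1)}   # widget count changed
ins, dele = snapshot_diff(stored, recomputed)
print(ins)   # {('widget', 4)}
print(dele)  # {('widget', 3)}
```

An update thus appears as a delete/insert pair, which is exactly what applying the delta to the stored table needs.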
SQL MERGE Statement
ISO/IEC 9075:2003 (SQL:2003 standard) — MERGE statement. PostgreSQL 15 (October 2022, commit 7103eba).
The MERGE-based delta application in src/refresh.rs uses the
ISO-standard MERGE statement, independently implemented by Oracle, SQL
Server, DB2, and PostgreSQL. This is not derived from any vendor-specific
implementation.
General Database Theory
Relational Algebra
Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 13(6), 377–387.
The operator tree in src/dvm/parser.rs models standard relational algebra
operators (select, project, join, aggregate, union). These are foundational
database theory from 1970.
Semi-Naive Evaluation
Bancilhon, F. & Ramakrishnan, R. (1986). "An Amateur's Introduction to Recursive Query Processing Strategies." Proceedings ACM SIGMOD, 16–52.
General background for recursive CTE evaluation strategies. PostgreSQL's own
WITH RECURSIVE implementation uses iterative fixpoint evaluation based on
these principles.
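The iteration is easy to demonstrate. A hedged Python sketch of semi-naive transitive closure (illustrative only; PostgreSQL performs the same frontier-only iteration inside `WITH RECURSIVE`):

```python
# Semi-naive fixpoint: each round joins only the newly derived pairs
# (the delta) against the base edges, never the whole closure.

def transitive_closure(edges):
    closure = set(edges)
    delta = set(edges)
    while delta:
        # Extend only the frontier; already-known pairs are not re-joined.
        new = {(a, c) for (a, b) in delta for (b2, c) in edges if b == b2}
        delta = new - closure
        closure |= delta
    return closure

edges = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edges)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```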
This document is maintained for attribution and independent-derivation documentation purposes. It does not constitute legal advice.
Custom SQL Syntax for PostgreSQL Extensions
Comprehensive Technical Research Report
Date: 2026-02-25
Context: pg_trickle extension — evaluating approaches to support CREATE STREAM TABLE syntax or equivalent native-feeling DDL.
Table of Contents
- Executive Summary
- PostgreSQL Parser Hooks / Utility Hooks
- The ProcessUtility_hook Approach
- Raw Parser Extension (gram.y)
- The Utility Command Approach
- Custom Access Methods (CREATE ACCESS METHOD)
- Table Access Method API (PostgreSQL 12+)
- Foreign Data Wrapper Approach
- Event Triggers
- TimescaleDB Continuous Aggregates Pattern
- Citus Distributed DDL Pattern
- PostgreSQL 18 New Features
- COMMENT / OPTIONS Abuse Pattern
- pg_ivm (Incremental View Maintenance) Pattern
- CREATE TABLE ... USING (Table Access Methods) Deep Dive
- Comparison Matrix
- Recommendations for pg_trickle
1. Executive Summary
PostgreSQL's parser is not extensible — there is no parser hook that allows extensions to add new grammar rules. This is a fundamental design constraint. Every approach to "custom DDL syntax" in extensions falls into one of two categories:
- Intercept existing syntax — Use `ProcessUtility_hook` or event triggers to intercept standard DDL (e.g., `CREATE TABLE`, `CREATE VIEW`) and augment its behavior.
- Use a SQL function as the DDL interface — Define `SELECT my_extension.create_thing(...)` as the user-facing API (this is what pg_trickle currently does).
No production PostgreSQL extension ships truly new SQL grammar without forking the PostgreSQL parser. TimescaleDB, Citus, pg_ivm, and others all work within existing syntax boundaries.
2. PostgreSQL Parser Hooks / Utility Hooks
Available Hook Points
PostgreSQL provides several hook function pointers that extensions can override in _PG_init():
| Hook | Header | Purpose |
|---|---|---|
| ProcessUtility_hook | tcop/utility.h | Intercept utility (DDL) statement execution |
| post_parse_analyze_hook | parser/analyze.h | Inspect/modify the analyzed parse tree after semantic analysis |
| planner_hook | optimizer/planner.h | Replace or augment the query planner |
| ExecutorStart_hook | executor/executor.h | Intercept executor startup |
| ExecutorRun_hook | executor/executor.h | Intercept executor row processing |
| ExecutorFinish_hook | executor/executor.h | Intercept executor finish |
| ExecutorEnd_hook | executor/executor.h | Intercept executor cleanup |
| object_access_hook | catalog/objectaccess.h | Notifications when objects are created/modified/dropped |
| emit_log_hook | utils/elog.h | Intercept log messages |
What's Missing: No Parser Hook
There is no parser_hook or raw_parser_hook. The raw parser (the scan.l lexer feeding the gram.y Bison grammar) is compiled into the PostgreSQL server binary. Extensions cannot:
- Add new keywords (e.g., `STREAM`)
- Add new grammar productions (e.g., `CREATE STREAM TABLE`)
- Modify the tokenizer/lexer
- Intercept raw SQL text before parsing
The closest hook is post_parse_analyze_hook, which fires after the SQL has already been parsed and analyzed. By this point:
- The SQL string has already been tokenized and parsed by gram.y
- A parse tree (`Query` node) has been produced
- If the SQL contains unknown syntax, a syntax error has already been raised
Technical Details of post_parse_analyze_hook
/* In src/backend/parser/analyze.c */
typedef void (*post_parse_analyze_hook_type)(ParseState *pstate,
Query *query,
JumbleState *jstate);
post_parse_analyze_hook_type post_parse_analyze_hook = NULL;
Extensions can set this in _PG_init():
static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;
void _PG_init(void) {
prev_post_parse_analyze_hook = post_parse_analyze_hook;
post_parse_analyze_hook = my_post_parse_analyze;
}
Use cases: Query rewriting after parsing (e.g., adding security predicates, row-level security), statistics collection, plan caching invalidation. Not usable for new syntax because parsing has already completed.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Impossible — cannot add new grammar |
| Intercept existing DDL | Yes via ProcessUtility_hook |
| Modify parsed queries | Yes via post_parse_analyze_hook |
| Complexity | Low for hooking, but limited in capability |
| PG version | All modern versions (hooks stable since PG 9.x) |
| Maintenance | Very low — hook signatures rarely change |
3. The ProcessUtility_hook Approach
How It Works
ProcessUtility_hook is the most powerful DDL interception point. It fires for every "utility statement" (DDL, COPY, EXPLAIN, etc.) after parsing but before execution.
typedef void (*ProcessUtility_hook_type)(PlannedStmt *pstmt,
const char *queryString,
bool readOnlyTree,
ProcessUtilityContext context,
ParamListInfo params,
QueryEnvironment *queryEnv,
DestReceiver *dest,
QueryCompletion *qc);
An extension can:
- Inspect the parse tree node — The `PlannedStmt->utilityStmt` field contains the parsed DDL node (e.g., `CreateStmt`, `AlterTableStmt`, `ViewStmt`).
- Modify the parse tree — Change fields before passing to the standard handler.
- Replace execution entirely — Skip calling the standard handler and do something else.
- Post-process — Call the standard handler first, then do additional work.
- Block execution — Raise an error to prevent the DDL.
What Extensions Use This
| Extension | What they intercept | Purpose |
|---|---|---|
| TimescaleDB | CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, etc. | Convert regular tables to hypertables, distribute DDL |
| Citus | Most DDL statements | Propagate DDL to worker nodes |
| pg_partman | CREATE TABLE, partition DDL | Auto-manage partitioning |
| pg_stat_statements | All utility statements | Track DDL execution statistics |
| pgAudit | All utility statements | Audit logging |
| pg_hint_plan | — | Uses post_parse_analyze_hook instead |
| sepgsql | Object creation/modification | Security label enforcement |
Can It Handle New Syntax?
No. It can only intercept DDL that PostgreSQL's parser already understands. You cannot use ProcessUtility_hook to handle CREATE STREAM TABLE because the parser will reject that syntax before the hook is ever called.
However, it can intercept and augment existing syntax:
- `CREATE TABLE ... (some_option)` → Intercept `CreateStmt`, check for special markers, do extra work
- `CREATE VIEW ... WITH (custom_option = true)` → Intercept `ViewStmt`, check `reloptions`
- `CREATE MATERIALIZED VIEW ... WITH (custom = true)` → Same approach
Pattern: Intercepting CREATE TABLE
static void my_process_utility(PlannedStmt *pstmt, const char *queryString, ...) {
Node *parsetree = pstmt->utilityStmt;
if (IsA(parsetree, CreateStmt)) {
CreateStmt *stmt = (CreateStmt *) parsetree;
// Check for a special reloption or table name pattern
ListCell *lc;
foreach(lc, stmt->options) {
DefElem *opt = (DefElem *) lfirst(lc);
if (strcmp(opt->defname, "stream") == 0) {
// This is a stream table! Do custom logic.
create_stream_table_from_ddl(stmt, queryString);
return; // Don't call standard handler
}
}
}
// Pass through to standard handler
if (prev_ProcessUtility)
prev_ProcessUtility(pstmt, ...);
else
standard_ProcessUtility(pstmt, ...);
}
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native CREATE STREAM TABLE | No — parser rejects unknown syntax |
| CREATE TABLE ... WITH (stream=true) | Yes — feasible via reloptions |
| Complexity | Medium — must carefully chain with other extensions |
| PG version | All modern versions |
| Maintenance | Low — hook signature changes rarely (changed in PG14, PG15) |
| Risk | Must always chain prev_ProcessUtility — misbehaving can break other extensions |
4. Raw Parser Extension (gram.y)
How It Works
PostgreSQL's SQL parser is a Bison-generated LALR(1) parser defined in:
src/backend/parser/gram.y— Grammar rules (~18,000 lines)src/backend/parser/scan.l— Flex lexer (tokenizer)src/include/parser/kwlist.h— Reserved/unreserved keyword list
To add CREATE STREAM TABLE, you would:
- Add `STREAM` to the keyword list (unreserved or reserved)
- Add grammar rules to `gram.y`:

      CreateStreamTableStmt:
          CREATE STREAM TABLE qualified_name '(' OptTableElementList ')' OptWith AS SelectStmt
              {
                  CreateStreamTableStmt *n = makeNode(CreateStreamTableStmt);
                  n->relation = $4;
                  n->query = $10;
                  /* ... */
                  $$ = (Node *) n;
              }
          ;

- Add a new `NodeTag` for `CreateStreamTableStmt`
- Handle it in `ProcessUtility`
- Rebuild the PostgreSQL server
This requires forking PostgreSQL. The modified parser is compiled into postgres binary. You cannot ship a grammar modification as a loadable extension (.so/.dylib).
Who Does This?
- YugabyteDB — Fork of PG with custom grammar for distributed features
- CockroachDB — Entirely custom parser (Go, not PG's Bison grammar)
- Amazon Aurora (partially) — Custom grammar additions for Aurora-specific features
- Greenplum — Fork of PG with added grammar for `DISTRIBUTED BY`, `PARTITION BY`, etc.
- ParadeDB — Fork of PG with some custom syntax additions
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native CREATE STREAM TABLE | Yes — full parser-level support |
| Complexity | Very high — must maintain a PG fork |
| PG version | Tied to a single PG version |
| Maintenance | Extremely high — must rebase on every PG release (gram.y changes significantly between major versions) |
| Distribution | Cannot use CREATE EXTENSION; must ship entire modified PostgreSQL |
| User adoption | Very low — users must replace their PostgreSQL installation |
| psql autocomplete | Would work with matching psql modifications |
| pg_dump/pg_restore | Broken unless you also modify those tools |
Verdict: Not viable for an extension. Only viable for a PostgreSQL fork/distribution.
5. The Utility Command Approach
How It Works
Some sources reference a "custom utility command" mechanism. In practice, this does not exist as a formal PostgreSQL extension point. What people sometimes mean is one of:
5a. Using DO Blocks as Custom Commands
DO $$ BEGIN PERFORM pgtrickle.create_stream_table('my_st', 'SELECT ...'); END $$;
This is just a wrapped function call — not a real custom command.
5b. Abusing COMMENT or SET for Command Dispatch
Some extensions parse custom commands from strings:
-- Using SET to pass commands
SET myext.command = 'CREATE STREAM TABLE my_st AS SELECT ...';
SELECT myext.execute_pending_command();
Or using post_parse_analyze_hook to intercept a specially-formatted query:
-- Extension intercepts this via post_parse_analyze_hook
SELECT * FROM myext.dispatch('CREATE STREAM TABLE ...');
5c. Overloading Existing Syntax
Some extensions overload SELECT or CALL:
CALL pgtrickle.create_stream_table('my_st', $$SELECT ...$$);
CALL was introduced in PostgreSQL 11 for stored procedures. With create_stream_table defined as a procedure rather than a function, CALL makes the DDL feel more "command-like" than SELECT function().
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — still a function call in disguise |
| User experience | Moderate — CALL is better than SELECT |
| Complexity | Low |
| PG version | PG11+ for CALL |
| Maintenance | Very low |
6. Custom Access Methods (CREATE ACCESS METHOD)
How It Works
PostgreSQL supports extension-defined access methods (index AMs and table AMs):
CREATE ACCESS METHOD my_am TYPE TABLE HANDLER my_am_handler;
This was introduced in PostgreSQL 9.6 for index AMs and extended to table AMs in PostgreSQL 12. The CREATE ACCESS METHOD statement shows PostgreSQL's philosophy: extensions can define new implementations of existing concepts (tables, indexes) but not new concepts (stream tables).
Table AM vs. Index AM
| Type | Since | Handler Signature | Example |
|---|---|---|---|
| Index AM | PG 9.6 | IndexAmRoutine with scan/insert/delete callbacks | bloom, brin, GiST |
| Table AM | PG 12 | TableAmRoutine with 60+ callbacks | heap (default), columnar (Citus), zedstore (experimental) |
Can We Use This for Stream Tables?
The table AM API defines how tuples are stored and retrieved, not how tables are created or maintained. A stream table's key features are:
- Defining query — Not part of the table AM concept
- Automatic refresh — Not part of the table AM concept
- Change tracking — Could partially overlap with table AM's tuple modification callbacks
- Storage — The actual storage could use heap (default) AM
You could theoretically create a custom table AM that:
- Uses heap storage underneath
- Intercepts INSERT/UPDATE/DELETE to maintain change buffers
- Adds custom metadata
But this would be an extreme abuse of the API. Table AMs are meant for storage engines, not for implementing materialized view semantics.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — CREATE TABLE ... USING my_am is the closest |
| Complexity | Extremely high — 60+ callbacks to implement |
| Fitness | Poor — table AM is about storage, not view maintenance |
| PG version | PG 12+ |
| Maintenance | High — AM API evolves between major versions |
7. Table Access Method API (PostgreSQL 12+)
Deep Technical Details
The Table Access Method (AM) API was introduced in PostgreSQL 12 via commit c2fe139c20 by Andres Freund. It abstracts the storage layer, allowing extensions to replace the default heap storage with custom implementations.
The CREATE TABLE ... USING Syntax
-- Use default AM (heap)
CREATE TABLE normal_table (id int, data text);
-- Use custom AM
CREATE TABLE my_table (id int, data text) USING my_custom_am;
-- Set default for a database
SET default_table_access_method = 'my_custom_am';
TableAmRoutine Structure
The handler function must return a TableAmRoutine struct with callbacks:
typedef struct TableAmRoutine {
NodeTag type;
/* Slot callbacks */
const TupleTableSlotOps *(*slot_callbacks)(Relation rel);
/* Scan callbacks */
TableScanDesc (*scan_begin)(Relation rel, Snapshot snap, int nkeys, ...);
void (*scan_end)(TableScanDesc scan);
void (*scan_rescan)(TableScanDesc scan, ...);
bool (*scan_getnextslot)(TableScanDesc scan, ...);
/* Parallel scan */
Size (*parallelscan_estimate)(Relation rel);
Size (*parallelscan_initialize)(Relation rel, ...);
void (*parallelscan_reinitialize)(Relation rel, ...);
/* Index fetch */
IndexFetchTableData *(*index_fetch_begin)(Relation rel);
void (*index_fetch_reset)(IndexFetchTableData *data);
void (*index_fetch_end)(IndexFetchTableData *data);
bool (*index_fetch_tuple)(IndexFetchTableData *data, ...);
/* Tuple modification */
void (*tuple_insert)(Relation rel, TupleTableSlot *slot, ...);
void (*tuple_insert_speculative)(Relation rel, ...);
void (*tuple_complete_speculative)(Relation rel, ...);
void (*multi_insert)(Relation rel, TupleTableSlot **slots, int nslots, ...);
TM_Result (*tuple_delete)(Relation rel, ItemPointer tid, ...);
TM_Result (*tuple_update)(Relation rel, ItemPointer otid, ...);
TM_Result (*tuple_lock)(Relation rel, ItemPointer tid, ...);
/* DDL callbacks */
void (*relation_set_new_filelocator)(Relation rel, ...);
void (*relation_nontransactional_truncate)(Relation rel);
void (*relation_copy_data)(Relation rel, const RelFileLocator *newrlocator);
void (*relation_copy_for_cluster)(Relation rel, ...);
void (*relation_vacuum)(Relation rel, VacuumParams *params, ...);
bool (*scan_analyze_next_block)(TableScanDesc scan, ...);
bool (*scan_analyze_next_tuple)(TableScanDesc scan, ...);
/* Planner support */
void (*relation_estimate_size)(Relation rel, int32 *attr_widths, ...);
/* ... more callbacks */
} TableAmRoutine;
Hybrid Approach: Table AM + ProcessUtility_hook
A more practical pattern:
- Register a custom table AM (e.g., `stream_am`) that wraps heap
- Use `ProcessUtility_hook` to intercept `CREATE TABLE ... USING stream_am`
- The actual storage uses standard heap via delegation
-- User writes:
CREATE TABLE order_totals (region text, total numeric)
USING stream_am
WITH (query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule = '1m',
refresh_mode = 'DIFFERENTIAL');
Problems with This Approach
- Column list is mandatory — `CREATE TABLE ... USING` requires explicit column definitions. Stream tables should derive columns from the query.
- Query in WITH clause — Storing a full SQL query in `reloptions` is hacky and has length limits.
- No AS SELECT — Table AMs don't support `CREATE TABLE ... AS SELECT` with a USING clause in the standard grammar.
- VACUUM, ANALYZE complexity — Must implement or delegate all maintenance callbacks.
- pg_dump compatibility — pg_dump would dump `CREATE TABLE ... USING stream_am` but not the associated metadata (query, schedule, etc.)
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_am |
| Feels like a stream table | No — still looks like a regular table with options |
| Complexity | Very high |
| pg_dump | Broken — metadata in catalog tables won't be dumped |
| PG version | PG 12+ |
| Maintenance | High — table AM API changes between versions |
8. Foreign Data Wrapper Approach
How It Works
Foreign Data Wrappers (FDW) allow PostgreSQL to access external data sources via CREATE FOREIGN TABLE. An extension can register a custom FDW:
CREATE EXTENSION pg_trickle;
CREATE SERVER stream_server FOREIGN DATA WRAPPER pgtrickle_fdw;
CREATE FOREIGN TABLE order_totals (region text, total numeric)
SERVER stream_server
OPTIONS (
query 'SELECT region, SUM(amount) FROM orders GROUP BY region',
schedule '1m',
refresh_mode 'DIFFERENTIAL'
);
FDW API
The FDW API provides callbacks for:
- `GetForeignRelSize` — Estimate relation size for planning
- `GetForeignPaths` — Generate access paths
- `GetForeignPlan` — Create a plan node
- `BeginForeignScan` — Start scan
- `IterateForeignScan` — Get next tuple
- `EndForeignScan` — End scan
- `AddForeignUpdateTargets` and the ForeignModify callbacks — Support INSERT/UPDATE/DELETE (optional)
How It Could Work for Stream Tables
- Define a custom FDW (`pgtrickle_fdw`)
- The FDW's scan callbacks read from the underlying storage table
- `ProcessUtility_hook` intercepts `CREATE FOREIGN TABLE ... SERVER stream_server` to set up CDC, catalog entries, etc.
- A background worker handles refresh scheduling
Problems
- Foreign tables have restrictions — Cannot have indexes, and declared constraints are assumed rather than enforced. This severely limits usability.
- Query planner limitations — Foreign tables use a separate planning path with potentially worse plan quality.
- No MVCC — Foreign tables typically don't provide snapshot isolation semantics.
- User model confusion — "Foreign table" implies external data, not a derived view.
- EXPLAIN output — Shows "Foreign Scan" instead of "Seq Scan", confusing users.
- pg_dump — Foreign tables are dumped, but server/FDW setup may not transfer correctly.
- Two-step creation — Requires `CREATE SERVER` before `CREATE FOREIGN TABLE`.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE FOREIGN TABLE with options |
| Feels like a stream table | No — foreign tables have different semantics |
| Index support | No — major limitation |
| Trigger support | Partial — row-level triggers work on foreign tables (PG 9.4+) |
| Complexity | Medium |
| PG version | PG 9.1+ |
| Maintenance | Low — FDW API is very stable |
Verdict: Not suitable. The restrictions on foreign tables (no indexes, unenforced constraints) make this impractical for stream tables that need to behave like regular tables.
9. Event Triggers
How It Works
Event triggers fire on DDL events at the database level:
CREATE EVENT TRIGGER my_trigger ON ddl_command_end
WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
EXECUTE FUNCTION my_handler();
Available events:
- `ddl_command_start` — Before DDL execution (PG 9.3+)
- `ddl_command_end` — After DDL execution (PG 9.3+)
- `sql_drop` — When objects are dropped (PG 9.3+)
- `table_rewrite` — When a table is rewritten (PG 9.5+)
Inside the Handler
CREATE FUNCTION my_handler() RETURNS event_trigger AS $$
DECLARE
obj record;
BEGIN
FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
-- obj.objid, obj.object_type, obj.command_tag, etc.
IF obj.command_tag = 'CREATE TABLE' AND obj.object_type = 'table' THEN
-- Check if this table has a special marker
-- (e.g., a specific reloption or comment)
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Pattern: CREATE TABLE + Event Trigger
- User creates a table with a special comment or option:
CREATE TABLE order_totals (region text, total numeric);
COMMENT ON TABLE order_totals IS 'pgtrickle:query=SELECT region...;schedule=1m';
- Event trigger on `ddl_command_end` fires
- Handler registers the stream table in the catalog
Limitations
- Cannot modify the DDL — Event triggers observe DDL, they can't change what happened. On `ddl_command_end`, the table already exists.
- Cannot prevent DDL — On `ddl_command_start`, you can raise an error to prevent it, but you can't redirect it.
- Two-step process — User must `CREATE TABLE` and then mark it somehow (comment, option, separate function call).
- No custom syntax — Event triggers watch existing DDL commands.
- pg_trickle already uses this — For DDL tracking on upstream tables (see `hooks.rs`).
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — watches existing DDL only |
| Complexity | Low |
| Can transform DDL | No — observe only |
| PG version | PG 9.3+ |
| Maintenance | Very low |
| pg_trickle usage | Already used for upstream DDL tracking |
10. TimescaleDB Continuous Aggregates Pattern
How It Works
TimescaleDB continuous aggregates (caggs) demonstrate the most sophisticated approach to custom DDL-like syntax in a PostgreSQL extension. Their evolution is instructive.
Phase 1: Pure Function API (early versions)
-- Create a view, then register it
CREATE VIEW daily_temps AS
SELECT time_bucket('1 day', time) AS day, AVG(temp)
FROM conditions GROUP BY 1;
SELECT add_continuous_aggregate_policy('daily_temps', ...);
Phase 2: CREATE MATERIALIZED VIEW WITH (introduced in TimescaleDB 2.0)
CREATE MATERIALIZED VIEW daily_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, device_id, AVG(temp)
FROM conditions
GROUP BY 1, 2;
How the Hook Chain Works
TimescaleDB's approach uses layered hooks:
- `ProcessUtility_hook` intercepts `CREATE MATERIALIZED VIEW`
- Checks `reloptions` for `timescaledb.continuous` in the `WithClause`
- If found:
  - Does NOT call standard ProcessUtility for the matview
  - Instead creates a regular hypertable (the materialization)
  - Creates an internal view (the user-facing query interface)
  - Registers refresh policies in the catalog
  - Sets up continuous aggregate metadata
- For `REFRESH MATERIALIZED VIEW`, intercepts and routes to their refresh engine
- For `DROP MATERIALIZED VIEW`, intercepts and cleans up all artifacts
The Magic: Reloptions as Extension Point
PostgreSQL's CREATE MATERIALIZED VIEW ... WITH (option = value) passes options as DefElem nodes in the parse tree. The parser treats these as generic key-value pairs — it does NOT validate the option names. This is the key insight: PostgreSQL's parser accepts arbitrary options in WITH clauses.
// In ProcessUtility_hook:
if (IsA(parsetree, CreateTableAsStmt)) {
CreateTableAsStmt *stmt = (CreateTableAsStmt *) parsetree;
if (stmt->objtype == OBJECT_MATVIEW) {
// Check for our custom option in stmt->into->options
bool is_continuous = false;
ListCell *lc;
foreach(lc, stmt->into->options) {
DefElem *opt = (DefElem *) lfirst(lc);
if (strcmp(opt->defname, "timescaledb.continuous") == 0) {
is_continuous = true;
break;
}
}
if (is_continuous) {
// Handle as continuous aggregate
return;
}
}
}
Refresh Policies
-- Add a refresh policy (function call, not DDL)
SELECT add_continuous_aggregate_policy('daily_temps',
start_offset => INTERVAL '1 month',
end_offset => INTERVAL '1 day',
schedule_interval => INTERVAL '1 hour');
What pg_trickle Could Learn
The TimescaleDB pattern for pg_trickle would look like:
-- Option A: CREATE MATERIALIZED VIEW with custom option
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m', pgtrickle.mode = 'DIFFERENTIAL')
AS SELECT region, SUM(amount) FROM orders GROUP BY region;
-- Option B: CREATE TABLE with custom option (less natural)
CREATE TABLE order_totals (region text, total numeric)
WITH (pgtrickle.stream = true);
-- Then separately: SELECT pgtrickle.set_query('order_totals', 'SELECT ...');
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Good — CREATE MATERIALIZED VIEW ... WITH (pgtrickle.stream) looks natural |
| User experience | Very good — familiar DDL syntax with extension options |
| Complexity | High — must implement full ProcessUtility_hook chain |
| pg_dump | Partial — matview DDL is dumped, but custom metadata needs pg_dump extension or config tables |
| PG version | PG 9.3+ (matviews), PG 12+ (better option handling) |
| Maintenance | Medium — must track changes to matview creation internals |
| Shared preload | Required — ProcessUtility_hook needs shared_preload_libraries |
11. Citus Distributed DDL Pattern
How It Works
Citus (now part of Microsoft) demonstrates another approach to extending DDL behavior:
ProcessUtility_hook Chain
Citus has one of the most comprehensive ProcessUtility_hook implementations:
void multi_ProcessUtility(PlannedStmt *pstmt, ...) {
    // 1. Classify the DDL
    Node *parsetree = pstmt->utilityStmt;

    // 2. Check if it affects distributed tables
    if (IsA(parsetree, AlterTableStmt)) {
        // Propagate ALTER TABLE to all worker nodes
        PropagateAlterTable((AlterTableStmt *) parsetree, queryString);
    }

    // 3. Call standard handler (or skip for intercepted commands)
    if (prev_ProcessUtility)
        prev_ProcessUtility(pstmt, ...);
    else
        standard_ProcessUtility(pstmt, ...);

    // 4. Post-processing
    if (IsA(parsetree, CreateStmt)) {
        // Check if we should auto-distribute this table
    }
}
Table Distribution via Function Calls
Citus does NOT add custom DDL syntax. Distribution is done via function calls:
-- Create a regular table
CREATE TABLE events (id bigint, data jsonb, created_at timestamptz);
-- Distribute it (function call, not DDL)
SELECT create_distributed_table('events', 'id');
-- Or create a reference table
SELECT create_reference_table('lookups');
Columnar Storage via Table AM
Citus also provides columnar storage as a table AM:
CREATE TABLE analytics_data (...)
USING columnar;
This uses the table AM API (PostgreSQL 12+) — see Section 7.
What Citus Teaches Us
- Function calls for complex operations — create_distributed_table() is analogous to pgtrickle.create_stream_table().
- ProcessUtility_hook for DDL propagation — Intercept standard DDL and add behavior.
- Table AM for storage — Separate concern from distribution logic.
- No custom syntax — Even with Microsoft's resources, Citus doesn't fork the parser.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — uses function calls like pg_trickle |
| Approach validated | Yes — Citus is used at massive scale with this pattern |
| Complexity | Medium (function API) to High (ProcessUtility_hook) |
| User adoption | Proven successful |
| Maintenance | Low for function API |
12. PostgreSQL 18 New Features
Relevant Extension Points in PG 18
PostgreSQL 18 (released 2025) includes several features relevant to this analysis:
12a. Virtual Generated Columns
PG 18 adds GENERATED ALWAYS AS (expr) VIRTUAL columns. Not directly relevant to stream tables, but shows PostgreSQL's willingness to expand CREATE TABLE syntax incrementally.
12b. Improved Table AM API
PG 18 refines the table AM API with better TOAST handling and improved parallel scan support. This makes custom table AMs slightly more practical.
12c. Enhanced Event Trigger Information
PG 18 expands pg_event_trigger_ddl_commands() with additional metadata fields, making event-trigger-based approaches more capable.
12d. pg_stat_io Improvements
Enhanced I/O statistics infrastructure that could benefit monitoring of stream table refresh operations.
12e. No New Parser Extension Points
PostgreSQL 18 does not add any parser extension mechanism. The parser remains monolithic and non-extensible. There have been occasional discussions on pgsql-hackers about parser hooks, but no concrete proposals have been accepted.
12f. No Custom DDL Extension Points
No new general-purpose DDL extension points beyond the existing hook system.
Looking Forward: Discussion on pgsql-hackers
There have been recurring threads on pgsql-hackers about:
- Extension-defined SQL syntax — Rejected due to complexity and parser architecture
- Loadable parser modules — Theoretical discussions, no implementation
- Extension catalogs — Some interest in allowing extensions to register custom catalogs
None of these are implemented in PG 18.
Pros/Cons
| Aspect | Assessment |
|---|---|
| New syntax extension points | None in PG 18 |
| Table AM improvements | Minor — slightly easier to implement |
| Event trigger improvements | Minor — more metadata available |
| Parser extensibility | Not planned for any upcoming PG release |
13. COMMENT / OPTIONS Abuse Pattern
How It Works
Several extensions use table comments or reloptions as a "poor man's metadata" to tag tables with custom semantics.
Pattern 1: COMMENT-based
CREATE TABLE order_totals (region text, total numeric);
COMMENT ON TABLE order_totals IS '@pgtrickle {"query": "SELECT ...", "schedule": "1m"}';
An event trigger or background worker scans pg_description for tables with the @pgtrickle prefix and processes them.
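A minimal version of that catalog scan could look like the following sketch (the @pgtrickle tag format is the hypothetical one from the example above):

```sql
-- Find tables whose comment carries an @pgtrickle configuration payload
SELECT c.relname,
       substr(d.description, length('@pgtrickle ') + 1) AS config_json
FROM pg_description d
JOIN pg_class c ON c.oid = d.objoid
WHERE d.classoid = 'pg_class'::regclass
  AND d.objsubid = 0                      -- the table itself, not a column
  AND d.description LIKE '@pgtrickle %';
```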
Pattern 2: Reloptions-based
CREATE TABLE order_totals (region text, total numeric)
WITH (fillfactor = 70, pgtrickle.stream = true);
Problem: PostgreSQL validates reloptions against a known list. You cannot add arbitrary options to WITH (...) without registering them. Extensions can register custom reloptions via the add_*_reloption() family (add_bool_reloption(), add_string_reloption(), etc. in access/reloptions.h), but this is a relatively obscure API.
Pattern 3: GUC-based Tagging
-- Set a GUC that our ProcessUtility_hook reads
SET pgtrickle.next_create_is_stream = true;
SET pgtrickle.stream_query = 'SELECT region, SUM(amount) FROM orders GROUP BY region';
-- Hook intercepts this CREATE TABLE and registers it
CREATE TABLE order_totals (region text, total numeric);
-- Reset
RESET pgtrickle.next_create_is_stream;
This is extremely hacky but has been used in practice (some partitioning extensions used similar patterns before native partitioning).
Who Uses This?
- pgmemcache — Uses comments to configure caching behavior
- Some row-level security extensions — Comments to define policies
- pg_partman — Uses a configuration table (not comments) but similar concept
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — abuses existing mechanisms |
| User experience | Poor — fragile, easy to break by editing comments |
| Complexity | Low |
| pg_dump | COMMENT is dumped — metadata survives pg_dump/restore |
| Robustness | Low — comments can be accidentally changed |
| PG version | All versions |
14. pg_ivm (Incremental View Maintenance) Pattern
How It Works
pg_ivm is the most directly comparable extension to pg_trickle. It implements incremental view maintenance for PostgreSQL.
API Design
pg_ivm uses a pure function-call API:
-- Create an incrementally maintainable materialized view
SELECT create_immv('order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region');
-- Refresh
SELECT refresh_immv('order_totals');
-- Drop
DROP TABLE order_totals; -- Just drop the underlying table
Key function: create_immv(name, query) — Creates an "Incrementally Maintainable Materialized View" (IMMV).
Internal Implementation
- create_immv() is a SQL function (not a hook)
- It parses the query, creates a storage table, and sets up triggers on source tables
- IMMVs are stored as regular tables with metadata in a custom catalog (pg_ivm_immv)
- Triggers on source tables automatically update the IMMV on DML
No ProcessUtility_hook
pg_ivm does not use ProcessUtility_hook. It operates entirely through:
- SQL functions (create_immv, refresh_immv)
- Row-level triggers for automatic maintenance
- A custom catalog table for metadata
Why No Custom Syntax?
pg_ivm was developed as a proof-of-concept for PostgreSQL core IVM support. The authors explicitly chose function-call syntax to:
- Avoid the shared_preload_libraries requirement (hooks need it)
- Keep the extension simple and portable
- Focus on the IVM algorithm, not the user interface
Eventually Merged to Core?
There was discussion about upstreaming IVM to PostgreSQL core. If merged, it would get proper syntax (CREATE INCREMENTAL MATERIALIZED VIEW). As an extension, it stays with function calls.
Relevance to pg_trickle
pg_trickle's current API (pgtrickle.create_stream_table()) follows the exact same pattern as pg_ivm. This is the established approach for IVM extensions.
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | No — function calls |
| Complexity | Low — simple function API |
| shared_preload_libraries | Not required for basic function API |
| pg_dump | No — function calls are not dumped; must use custom dump/restore |
| User experience | Moderate — familiar to pg_ivm users |
| Community acceptance | Established pattern for IVM extensions |
15. CREATE TABLE ... USING (Table Access Methods) Deep Dive
Full Syntax
CREATE TABLE tablename (
column1 datatype,
column2 datatype,
...
) USING access_method_name
WITH (storage_parameter = value, ...);
How the Parser Handles USING
In gram.y:
CreateStmt: CREATE OptTemp TABLE ...
            OptTableAccessMethod OptWith ...

OptTableAccessMethod:
    USING name      { $$ = $2; }
    | /* empty */   { $$ = NULL; }
    ;
The USING clause sets CreateStmt->accessMethod to the access method name string.
How ProcessUtility Handles It
In DefineRelation() (src/backend/commands/tablecmds.c):
- If accessMethod is specified, look it up in pg_am
- Verify it's a table AM (not an index AM)
- Store the AM OID in pg_class.relam
- Use the AM's callbacks for all subsequent operations
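The resulting catalog state is easy to inspect; for example, to see which access method a given relation ended up with (table name here is illustrative):

```sql
-- Which table access method does a relation use? (pg_class.relam → pg_am)
SELECT c.relname, am.amname
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relname = 'order_totals';
```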
Custom Reloptions with Table AMs
Table AMs can define custom reloptions via:
static relopt_parse_elt stream_relopt_tab[] = {
    {"query", RELOPT_TYPE_STRING, offsetof(StreamOptions, query)},
    {"schedule", RELOPT_TYPE_STRING, offsetof(StreamOptions, schedule)},
    {"refresh_mode", RELOPT_TYPE_STRING, offsetof(StreamOptions, refresh_mode)},
};
This would allow:
CREATE TABLE order_totals (region text, total numeric)
USING stream_heap
WITH (query = 'SELECT ...', schedule = '1m', refresh_mode = 'DIFFERENTIAL');
Problems Specific to Stream Tables
- Column derivation — Stream tables derive columns from the query, but the plain CREATE TABLE ... USING form requires explicit column definitions, creating redundancy and potential inconsistency.
- AS SELECT caveat — The grammar does accept CREATE TABLE ... USING method AS SELECT (PG 12+):
  CREATE TABLE order_totals
  USING stream_heap
  AS SELECT region, SUM(amount) FROM orders GROUP BY region;
  but this populates the table exactly once at creation; the AM has no hook to re-run the query, so self-maintenance still requires a ProcessUtility_hook or background worker.
- Full AM implementation required — Even if you delegate to heap, you must implement all callbacks and handle edge cases.
- VACUUM/ANALYZE — Must properly delegate to heap for these to work.
- Replication — Logical replication assumes heap tuples; custom AMs may break.
Hybrid Practical Approach
If pursuing this route:
-- Step 1: Set default AM
SET default_table_access_method = 'stream_heap';
-- Step 2: Create with query in options
CREATE TABLE order_totals ()
WITH (pgtrickle.query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
pgtrickle.schedule = '1m');
-- ProcessUtility_hook would:
-- 1. Detect USING stream_heap (or detect our custom reloptions)
-- 2. Parse the query from options
-- 3. Derive columns from the query
-- 4. Create the actual table with proper columns using heap AM
-- 5. Register in pgtrickle catalog
-- 6. Set up CDC
Pros/Cons
| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_heap WITH (...) |
| Column derivation | Not supported — must specify columns or use hook magic |
| Complexity | Very high |
| pg_dump | Good — CREATE TABLE ... USING is properly dumped |
| PG version | PG 12+ |
| Maintenance | High — AM API changes between versions |
16. Comparison Matrix
| Approach | Native Syntax | Complexity | pg_dump | PG Version | Maintenance | Recommended |
|---|---|---|---|---|---|---|
| Function API (current) | No | Low | No* | Any | Very Low | Yes |
| ProcessUtility_hook + MATVIEW WITH | Good | High | Partial | 9.3+ | Medium | Maybe |
| Raw parser fork | Perfect | Very High | No | Fork only | Very High | No |
| Table AM USING | Partial | Very High | Yes | 12+ | High | No |
| FDW FOREIGN TABLE | Partial | Medium | Yes | 9.1+ | Low | No |
| Event triggers alone | No | Low | No | 9.3+ | Low | No |
| COMMENT abuse | No | Low | Yes | Any | Low | No |
| GUC + CREATE TABLE hack | No | Medium | Partial | Any | Medium | No |
| TimescaleDB pattern (MATVIEW + WITH) | Good | High | Partial | 9.3+ | Medium | Best option |
* Custom pg_dump support can be added via pg_dump hook or wrapper script.
17. Recommendations for pg_trickle
Current Approach: Function API (Keep and Enhance)
pg_trickle's current approach (pgtrickle.create_stream_table('name', 'query', ...)) is:
- Proven — Same pattern as pg_ivm, Citus, and many other extensions
- Simple — No shared_preload_libraries required for basic usage
- Maintainable — No hook chains to debug
- Portable — Works on any PG version that supports pgrx
Enhancement opportunities:
-- Current
SELECT pgtrickle.create_stream_table('order_totals',
'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');
-- Enhanced: CALL syntax for more DDL-like feel (PG 11+)
CALL pgtrickle.create_stream_table('order_totals',
$$SELECT region, SUM(amount) FROM orders GROUP BY region$$, '1m');
Future Option: TimescaleDB-style Materialized View Integration
If user demand justifies the complexity, pg_trickle could add a second creation path via ProcessUtility_hook:
-- New native-feeling syntax (requires shared_preload_libraries)
CREATE MATERIALIZED VIEW order_totals
WITH (pgtrickle.stream = true, pgtrickle.schedule = '1m')
AS SELECT region, SUM(amount) FROM orders GROUP BY region
WITH NO DATA;
-- Original function API still works (no hook needed)
SELECT pgtrickle.create_stream_table('order_totals',
'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');
Implementation plan for hook-based approach:
- Register ProcessUtility_hook in _PG_init() (already needed for shared_preload_libraries)
- Intercept CREATE MATERIALIZED VIEW → check for the pgtrickle.stream option
- If found: parse options, call create_stream_table_impl() internally, create a standard storage table instead of a matview
- Intercept DROP MATERIALIZED VIEW → check if the target is a stream table → clean up
- Intercept REFRESH MATERIALIZED VIEW → route to the stream table refresh engine
- Intercept ALTER MATERIALIZED VIEW → route to the stream table alter logic
Estimated complexity: ~800-1200 lines of Rust hook code + tests.
Not Recommended
- Forking PostgreSQL for custom grammar — Maintenance cost is prohibitive
- Table AM approach — Complexity without proportional benefit
- FDW approach — Too many restrictions on foreign tables
- COMMENT abuse — Fragile and poor UX
pg_dump / pg_restore Strategy
Regardless of approach, pg_dump is a challenge. Options:
- Custom dump/restore functions — pgtrickle.dump_config() and pgtrickle.restore_config()
- Migration script generation — pgtrickle.generate_migration() outputs SQL to recreate all stream tables
- Event trigger on restore — Detect when tables are restored and re-register them
- Sidecar file — Generate a companion SQL file alongside pg_dump
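As a sketch of the migration-script option (generate_migration() is a proposed function name from the list above, not part of the current API):

```sql
-- Proposed: emit SQL that recreates every stream table, to be replayed
-- after a plain pg_dump/pg_restore cycle
SELECT pgtrickle.generate_migration();
-- Illustrative output:
--   SELECT pgtrickle.create_stream_table('order_totals',
--     'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');
```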
Appendix A: Hook Registration in pgrx (Rust)
For reference, here's how ProcessUtility_hook registration works in pgrx:
use pgrx::prelude::*;
use pgrx::pg_sys;

static mut PREV_PROCESS_UTILITY_HOOK: pg_sys::ProcessUtility_hook_type = None;

#[pg_guard]
pub extern "C-unwind" fn my_process_utility(
    pstmt: *mut pg_sys::PlannedStmt,
    query_string: *const std::os::raw::c_char,
    read_only_tree: bool,
    context: pg_sys::ProcessUtilityContext,
    params: pg_sys::ParamListInfo,
    query_env: *mut pg_sys::QueryEnvironment,
    dest: *mut pg_sys::DestReceiver,
    qc: *mut pg_sys::QueryCompletion,
) {
    // SAFETY: pstmt is a valid pointer provided by PostgreSQL
    let stmt = unsafe { (*pstmt).utilityStmt };

    // Check if this is a CreateTableAsStmt (materialized view)
    if unsafe { pgrx::is_a(stmt, pg_sys::NodeTag::T_CreateTableAsStmt) } {
        // Check for our custom options...
    }

    // Chain to previous hook or standard handler
    unsafe {
        if let Some(prev) = PREV_PROCESS_UTILITY_HOOK {
            prev(pstmt, query_string, read_only_tree, context, params,
                 query_env, dest, qc);
        } else {
            pg_sys::standard_ProcessUtility(pstmt, query_string, read_only_tree,
                                            context, params, query_env, dest, qc);
        }
    }
}

pub fn register_hooks() {
    unsafe {
        PREV_PROCESS_UTILITY_HOOK = pg_sys::ProcessUtility_hook;
        pg_sys::ProcessUtility_hook = Some(my_process_utility);
    }
}
Appendix B: Key Source Files in PostgreSQL
| File | Purpose |
|---|---|
| src/backend/parser/gram.y | SQL grammar (~18,000 lines) |
| src/backend/parser/scan.l | Lexer/tokenizer |
| src/include/parser/kwlist.h | Keyword definitions |
| src/backend/tcop/utility.c | ProcessUtility() — DDL dispatcher |
| src/backend/commands/tablecmds.c | CREATE/ALTER/DROP TABLE implementation |
| src/backend/commands/createas.c | CREATE TABLE AS / CREATE MATVIEW AS |
| src/include/access/tableam.h | Table Access Method API |
| src/include/foreign/fdwapi.h | FDW API |
| src/backend/commands/event_trigger.c | Event trigger infrastructure |
Appendix C: References
- PostgreSQL Documentation — Table Access Method Interface
- PostgreSQL Documentation — Event Triggers
- PostgreSQL Documentation — Writing A Foreign Data Wrapper
- TimescaleDB Source — process_utility.c
- Citus Source — multi_utility.c
- pg_ivm Source — createas.c
- pgrx Documentation — Hooks
- PostgreSQL Wiki — CustomScanProviders
pg_trickle vs pg_ivm — Comparison Report & Gap Analysis
Date: 2026-02-28 (merged 2026-03-01, updated 2026-03-20)
Author: Internal research
Status: Reference document
1. Executive Summary
Both pg_trickle and pg_ivm implement Incremental View Maintenance (IVM) as
PostgreSQL extensions — the goal of keeping materialized query results up-to-date
without full recomputation. Despite the shared objective they differ fundamentally
in design philosophy, maintenance model, SQL coverage, operational model, and
target audience.
pg_ivm is a mature, widely-deployed C extension (1.4k GitHub stars, 17 releases)
focused on immediate, synchronous IVM that runs inside the same transaction as
the base-table write. pg_trickle is a Rust extension (v0.9.0) offering
both deferred (scheduled) and immediate (transactional) IVM with a richer SQL
dialect, a dependency DAG, and built-in operational tooling.
pg_trickle is significantly ahead of pg_ivm in SQL coverage, operator support,
aggregate support, and operational features. As of v0.2.1, pg_trickle also
matches pg_ivm's core strength — immediate, in-transaction maintenance — via
the IMMEDIATE refresh mode (all phases complete). pg_ivm's one remaining
structural advantage is broader PostgreSQL version support (PG 13–18):
- IMMEDIATE mode — fully implemented. Statement-level AFTER triggers with transition tables update stream tables within the same transaction as base-table DML. Window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (with a stack-depth warning), and TopK micro-refresh are all supported. See PLAN_TRANSACTIONAL_IVM.md.
- AUTO refresh mode — new default for create_stream_table. Selects DIFFERENTIAL when the query supports it and transparently falls back to FULL otherwise, eliminating the need to choose a mode at creation time.
- pg_ivm compatibility layer — postponed. The pgivm.create_immv() / pgivm.refresh_immv() / pgivm.pg_ivm_immv wrappers (Phase 2) are deferred to post-1.0.
- PLAN_PG_BACKCOMPAT.md details backporting pg_trickle to PG 14–18 (recommended) or PG 16–18 (minimum viable), requiring ~2.5–3 weeks of effort, primarily in #[cfg]-gating ~435 lines of JSON/SQL-standard parse-tree handling.
With IMMEDIATE mode fully implemented, Row Level Security support (v0.5.0), pg_dump/restore support (v0.8.0), algebraic aggregate maintenance (v0.9.0), parallel refresh (v0.4.0), circular pipeline support (v0.7.0), watermark APIs (v0.7.0), and 40+ unique features, pg_ivm's only remaining advantages are PG version breadth and production maturity.
2. Project Overview
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Repository | sraoss/pg_ivm | grove/pg-trickle |
| Language | C | Rust (pgrx 0.17) |
| Latest release | 1.13 (2025-10-20) | 0.9.0 (2026-03-20) |
| Stars | ~1,400 | early stage |
| License | PostgreSQL License | Apache 2.0 |
| PG versions | 13 – 18 | 18 only; PG 14–18 planned |
| Schema | pgivm | pgtrickle / pgtrickle_changes |
| Shared library required | Yes (shared_preload_libraries or session_preload_libraries) | Yes (shared_preload_libraries, required for background worker) |
| Background worker | No | Yes (scheduler + optional WAL decoder) |
3. Maintenance Model
This is the most important design difference between the two extensions.
pg_ivm — Immediate Maintenance
pg_ivm updates its views synchronously inside the same transaction that
modified the base table. When a row is inserted/updated/deleted, AFTER row
triggers fire and update the IMMV before the transaction commits.
BEGIN;
UPDATE base_table ...; -- triggers fire here
-- IMMV is updated before COMMIT
COMMIT;
Consequences:
- The IMMV is always exactly consistent with the committed state of the base table — zero staleness.
- Write latency increases by the cost of view maintenance. For large joins or aggregates on popular tables this can be significant.
- Locking: ExclusiveLock is held on the IMMV during maintenance to prevent concurrent anomalies. In REPEATABLE READ or SERIALIZABLE isolation, errors are raised when conflicts are detected.
- TRUNCATE on a base table triggers a full IMMV refresh (for most view types).
- Not compatible with logical replication (subscriber nodes are not updated).
pg_trickle — Deferred, Scheduled Maintenance
pg_trickle updates its stream tables asynchronously, driven by a background worker scheduler. Changes are captured by row-level triggers (or optionally by WAL decoding) into change-buffer tables and are applied in batch on the next refresh cycle.
-- Write path: only a trigger INSERT into change buffer
BEGIN;
UPDATE base_table ...; -- trigger captures delta into pgtrickle_changes.*
COMMIT;
-- Separate refresh cycle (background worker):
apply_delta_to_stream_table(...)
Consequences:
- Write latency is minimized — the trigger write into the change buffer is ~2–50 μs regardless of view complexity.
- Stream tables are stale between refresh cycles. The staleness bound is configurable (e.g. '30s', '5m', '@hourly', or cron expressions).
- Refresh can be triggered manually: pgtrickle.refresh_stream_table(...).
- Multiple stream tables can share a refresh pipeline ordered by dependency (topological DAG scheduling).
- The WAL-based CDC mode (pg_trickle.cdc_mode = 'wal') eliminates trigger overhead entirely when wal_level = logical is available.
- Append-only fast path (v0.5.0): append_only => true skips merge for INSERT-only tables, with auto-fallback if a DELETE/UPDATE is detected.
- Source gating (v0.5.0): pause CDC during bulk loads via gate_source() and ungate_source() to avoid trigger overhead during large batch inserts.
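Taken together, a bulk load might be handled as in this sketch (the gate_source()/ungate_source() call signatures are assumed from the feature descriptions above, and the post-load refresh semantics are an assumption):

```sql
-- Pause CDC triggers on the source table, bulk-load, then resume
SELECT pgtrickle.gate_source('orders');
COPY orders FROM '/data/orders.csv' WITH (FORMAT csv);
SELECT pgtrickle.ungate_source('orders');

-- A manual refresh afterward brings the stream table up to date
SELECT pgtrickle.refresh_stream_table('order_totals');
```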
Implemented: pg_trickle IMMEDIATE Mode
pg_trickle now offers an IMMEDIATE refresh mode (Phase 1 + Phase 3 complete)
that uses statement-level AFTER triggers with transition tables — the same
mechanism pg_ivm uses. Key implementation details:
- Reuses the DVM engine — the Scan operator reads from transition tables (via temporary views) instead of change buffer tables.
- Phase 1 (complete): core IMMEDIATE engine — INSERT/UPDATE/DELETE/TRUNCATE handling, advisory lock-based concurrency (IvmLockMode), mode switching via alter_stream_table, query restriction validation.
- Phase 2 (postponed): pgivm.* compatibility layer for drop-in migration.
- Phase 3 (complete): extended SQL support — window functions, LATERAL, scalar subqueries, cascading IMMEDIATE stream tables, WITH RECURSIVE (IM1: supported with a stack-depth warning), and TopK micro-refresh (IM2: recomputes top-K on every DML, gated by pg_trickle.ivm_topk_max_limit).
- Phase 4 (complete): delta SQL template caching (IVM_DELTA_CACHE); ENR-based transition tables and C-level triggers deferred to post-1.0 as optimizations only.
-- Create an IMMEDIATE stream table (zero staleness)
SELECT pgtrickle.create_stream_table(
'live_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
NULL, -- no schedule needed
'IMMEDIATE'
);
-- Updates propagate within the same transaction
BEGIN;
INSERT INTO orders (region, amount) VALUES ('EU', 100);
SELECT * FROM live_totals; -- already includes the new row
COMMIT;
4. SQL Feature Coverage — Summary
| Dimension | pg_ivm | pg_trickle | Winner |
|---|---|---|---|
| Maintenance timing | Immediate (in-transaction triggers) | Deferred (scheduler/manual) and IMMEDIATE (in-transaction) | pg_trickle (offers both models) |
| PostgreSQL versions | 13–18 | 18 only; PG 14–18 planned | pg_ivm (today); planned parity |
| Aggregate functions | 5 (COUNT, SUM, AVG, MIN, MAX) | 60+ (all built-in aggregates incl. algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR) | pg_trickle |
| FILTER clause on aggregates | No | Yes | pg_trickle |
| HAVING clause | No | Yes | pg_trickle |
| Inner joins | Yes (including self-join) | Yes (including self-join, NATURAL, nested) | pg_trickle |
| Outer joins | Yes (limited — equijoin, single condition, many restrictions) | Yes (LEFT/RIGHT/FULL, nested, complex conditions) | pg_trickle |
| DISTINCT | Yes (reference-counted) | Yes (reference-counted) | Tie |
| DISTINCT ON | No | Yes (auto-rewritten to ROW_NUMBER) | pg_trickle |
| UNION / INTERSECT / EXCEPT | No | Yes (all 6 variants, bag + set) | pg_trickle |
| Window functions | No | Yes (partition recomputation) | pg_trickle |
| CTEs (non-recursive) | Simple only (no aggregates, no DISTINCT inside) | Full (aggregates, DISTINCT, multi-reference shared delta) | pg_trickle |
| CTEs (recursive) | No | Yes (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) | pg_trickle |
| Subqueries in FROM | Simple only (no aggregates/DISTINCT inside) | Full support | pg_trickle |
| EXISTS subqueries | Yes (WHERE only, AND only, no agg/DISTINCT) | Yes (WHERE + targetlist, AND/OR, agg/DISTINCT inside) | pg_trickle |
| NOT EXISTS / NOT IN | No | Yes (anti-join operator) | pg_trickle |
| IN (subquery) | No | Yes (semi-join operator) | pg_trickle |
| Scalar subquery in SELECT | No | Yes (scalar subquery operator) | pg_trickle |
| LATERAL subqueries | No | Yes (row-scoped recomputation) | pg_trickle |
| LATERAL SRFs | No | Yes (jsonb_array_elements, unnest, etc.) | pg_trickle |
| JSON_TABLE (PG 17+) | No | Yes | pg_trickle |
| GROUPING SETS / CUBE / ROLLUP | No | Yes (auto-rewritten to UNION ALL) | pg_trickle |
| Views as sources | No (simple tables only) | Yes (auto-inlined, nested) | pg_trickle |
| Partitioned tables | No | Yes | pg_trickle |
| Foreign tables | No | FULL mode only | pg_trickle |
| Cascading (view-on-view) | No | Yes (DAG-aware scheduling) | pg_trickle |
| Background scheduling | No (user must trigger) | Yes (cron + duration, background worker) | pg_trickle |
| Monitoring / observability | 1 catalog table | Extensive (stats, history, staleness, CDC health, NOTIFY) | pg_trickle |
| CDC mechanism | Triggers only | Hybrid (triggers + optional WAL) | pg_trickle |
| DDL tracking | No automatic handling | Yes (event triggers, auto-reinit) | pg_trickle |
| TRUNCATE handling | Yes (auto-truncate IMMV) | IMMEDIATE mode: full refresh in same txn; DEFERRED: queued full refresh | Tie (functionally equivalent in IMMEDIATE mode) |
| Auto-indexing | Yes (on GROUP BY / DISTINCT / PK columns) | No (user creates indexes) | pg_ivm |
| Row Level Security | Yes (with limitations) | Yes (refreshes see all data; RLS on stream table; IMMEDIATE mode secured) | pg_trickle (richer model) |
| Concurrency model | ExclusiveLock on IMMV during maintenance | Advisory locks, non-blocking reads, parallel refresh | pg_trickle |
| Data type restrictions | Must have btree opclass (no json, xml, point) | No documented type restrictions | pg_trickle |
| Maturity / ecosystem | 4 years, 1.4k stars, PGXN, yum packages | v0.9.0 released, 1,100+ unit tests + 900+ E2E tests, 22 TPC-H benchmarks, dbt integration | pg_ivm |
4.1 Areas Where pg_ivm Wins
Of the ~35 dimensions in the summary table above, pg_ivm holds an advantage in only 3 (down from 6 before IMMEDIATE mode and RLS were implemented). One is substantive, two are temporary gaps with existing plans.
1. PostgreSQL Version Support (substantive, planned resolution)
pg_ivm ships pre-built packages for PostgreSQL 13–18 across all major Linux distros via yum.postgresql.org and PGXN. pg_trickle currently targets PG 18 only.
This is the single largest remaining structural gap. PG 13 is EOL (Nov 2025), but PG 14–17 are widely deployed in production environments. Users on those versions simply cannot use pg_trickle today.
Planned resolution: PLAN_PG_BACKCOMPAT.md
details backporting to PG 14–18 (~2.5–3 weeks). pgrx 0.17 already supports
PG 14–18 via feature flags; ~435 lines in parser.rs need #[cfg] gating
for JSON/SQL-standard parse-tree handling.
2. Auto-Indexing (substantive, low priority)
When pg_ivm creates an IMMV, it automatically adds indexes on columns used in
GROUP BY, DISTINCT, and primary keys. This is a genuine usability advantage
— new users get reasonable read performance without manual intervention.
pg_trickle leaves index creation entirely to the user. For DIFFERENTIAL mode
stream tables, the DVM engine's MERGE-based delta application already uses the
stream table's primary key (which is auto-created), and index-aware MERGE
(pg_trickle.merge_seqscan_threshold, added v0.9.0) uses index lookups for
tiny change ratios, but secondary indexes for read-side query patterns must
be added manually.
Impact: Low — experienced users always create application-specific indexes anyway. Auto-indexing mostly helps onboarding and simple use-cases.
Planned resolution: Tracked as part of the pg_ivm compatibility layer
(Phase 2, postponed to post-1.0). Could also be implemented independently as
a CREATE INDEX IF NOT EXISTS step in create_stream_table.
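Until then, the equivalent of pg_ivm's auto-indexing is a one-liner the user runs after creation (the index name and column choice here are illustrative, for the running order_totals example):

```sql
-- Secondary index on the GROUP BY key for read-side queries
CREATE INDEX IF NOT EXISTS order_totals_region_idx
    ON order_totals (region);
```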
3. Maturity / Ecosystem (temporary, closing over time)
pg_ivm has 4 years of production use, ~1,400 GitHub stars, 17 releases, and is distributed via PGXN, yum, and apt package repositories. It has a track record of stability and a community of users.
pg_trickle is a v0.9.0 series release with 1,100+ unit tests, 200+ integration tests, 570+ light E2E tests, 90+ full E2E tests, and 22 TPC-H correctness benchmarks—but no wide production deployments yet. It lacks the battle-testing that comes from years of real-world usage.
Impact: High for risk-averse organizations considering production adoption. Low for greenfield projects or teams willing to adopt early.
Resolution: This gap closes naturally with time, releases, and adoption.
The dbt integration (dbt-pgtrickle) and CNPG/Kubernetes deployment support
accelerate ecosystem development.
5. Detailed SQL Comparison
5.1 Aggregate Functions
| Function | pg_ivm | pg_trickle |
|---|---|---|
| COUNT(*) / COUNT(expr) | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| SUM | ✅ Algebraic | ✅ Algebraic (O(1) running total, v0.9.0) |
| AVG | ✅ Algebraic (via SUM/COUNT) | ✅ Algebraic (O(1) via SUM/COUNT decomposition, v0.9.0) |
| MIN | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| MAX | ✅ Semi-algebraic (rescan on extremum delete) | ✅ Semi-algebraic (O(1) unless extremum deleted, v0.9.0 safety guard) |
| BOOL_AND / BOOL_OR | ❌ | ✅ Group-rescan |
| STRING_AGG | ❌ | ✅ Group-rescan |
| ARRAY_AGG | ❌ | ✅ Group-rescan |
| JSON_AGG / JSONB_AGG | ❌ | ✅ Group-rescan |
| BIT_AND / BIT_OR / BIT_XOR | ❌ | ✅ Group-rescan |
| JSON_OBJECT_AGG / JSONB_OBJECT_AGG | ❌ | ✅ Group-rescan |
| STDDEV / VARIANCE (all variants) | ❌ | ✅ Algebraic (O(1) sum-of-squares decomposition, v0.9.0) |
| MODE / PERCENTILE_CONT / PERCENTILE_DISC | ❌ | ✅ Group-rescan |
| CORR / COVAR / REGR_* (11 functions) | ❌ | ✅ Group-rescan |
| ANY_VALUE (PG 16+) | ❌ | ✅ Group-rescan |
| JSON_ARRAYAGG / JSON_OBJECTAGG (PG 16+) | ❌ | ✅ Group-rescan |
| User-defined aggregates (CREATE AGGREGATE) | ❌ | ✅ Group-rescan |
| FILTER (WHERE) clause | ❌ | ✅ |
| WITHIN GROUP (ORDER BY) | ❌ | ✅ |
| COUNT(DISTINCT expr) / SUM(DISTINCT expr) | ❌ | ✅ |
| Total | 5 | 60+ |
Gap for pg_ivm: Massive. Only 5 of ~60 built-in aggregate functions are supported.
pg_trickle v0.9.0 also introduced algebraic (O(1)) maintenance for COUNT,
SUM, AVG, STDDEV, and VARIANCE — meaning these aggregates update in constant
time per changed row via running totals, whereas pg_ivm’s algebraic support
is limited to COUNT, SUM, and AVG. pg_trickle additionally supports user-defined
aggregates via group-rescan, and applies floating-point drift correction to
algebraic aggregates (pg_trickle.algebraic_drift_reset_cycles).
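As a sketch of the difference between the two maintenance strategies (the `readings` table and its columns are illustrative, not from either project's docs):

```sql
-- VARIANCE/STDDEV are maintained algebraically via the sum-of-squares
-- decomposition: var = (sum_x2 - sum_x^2 / n) / (n - 1),
-- so each changed row adjusts the running totals n, sum_x, sum_x2 in O(1).
SELECT pgtrickle.create_stream_table(
  'sensor_stats',
  'SELECT sensor_id,
          COUNT(*)        AS n,
          AVG(reading)    AS mean,
          STDDEV(reading) AS spread
   FROM readings
   GROUP BY sensor_id',
  schedule => '30s'
);

-- STRING_AGG has no algebraic form, so maintenance falls back to
-- group-rescan: only the groups that actually changed are recomputed.
SELECT pgtrickle.create_stream_table(
  'sensor_labels',
  'SELECT sensor_id, STRING_AGG(label, '','' ORDER BY label) AS labels
   FROM readings GROUP BY sensor_id'
);
```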
5.2 Joins
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Inner join | ✅ | ✅ |
| Self-join | ✅ | ✅ |
| LEFT JOIN | ✅ (restricted) | ✅ (full) |
| RIGHT JOIN | ✅ (restricted) | ✅ (normalized to LEFT) |
| FULL OUTER JOIN | ✅ (restricted) | ✅ (8-part delta) |
| NATURAL JOIN | ? | ✅ |
| Cross join | ? | ✅ |
| Nested joins (3+ tables) | ✅ | ✅ |
| Non-equi joins (theta) | ? | ✅ |
| Outer join + aggregates | ❌ | ✅ |
| Outer join + subqueries | ❌ | ✅ |
| Outer join + CASE/non-strict | ❌ | ✅ |
| Outer join multi-condition | ❌ (single equality only) | ✅ |
Gap for pg_ivm: Outer joins are heavily restricted — single equijoin condition, no aggregates, no subqueries, no CASE expressions, no IS NULL in WHERE.
5.3 Subqueries
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Simple subquery in FROM | ✅ (no aggregates/DISTINCT inside) | ✅ (full support) |
| EXISTS in WHERE | ✅ (AND only, no agg/DISTINCT inside) | ✅ (AND + OR, full SQL inside) |
| NOT EXISTS in WHERE | ❌ | ✅ (anti-join operator) |
| IN (subquery) | ❌ | ✅ (rewritten to semi-join) |
| NOT IN (subquery) | ❌ | ✅ (rewritten to anti-join) |
| ALL (subquery) | ❌ | ✅ (rewritten to anti-join) |
| Scalar subquery in SELECT | ❌ | ✅ (scalar subquery operator) |
| Scalar subquery in WHERE | ❌ | ✅ (auto-rewritten to CROSS JOIN) |
| LATERAL subquery in FROM | ❌ | ✅ (row-scoped recomputation) |
| LATERAL SRF in FROM | ❌ | ✅ (jsonb_array_elements, unnest, etc.) |
| Subqueries in OR | ❌ | ✅ (auto-rewritten to UNION) |
Gap for pg_ivm: Severely limited subquery support. No anti-joins, no scalar subqueries, no LATERAL, no SRFs.
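Two of these rewrites, sketched with an assumed `customers`/`orders` schema (names are ours, not from the docs):

```sql
-- NOT EXISTS is maintained incrementally as an anti-join:
SELECT pgtrickle.create_stream_table(
  'customers_without_orders',
  'SELECT c.* FROM customers c
   WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)'
);

-- A scalar subquery in WHERE is auto-rewritten internally to a CROSS JOIN
-- against the (incrementally maintained) scalar value:
SELECT pgtrickle.create_stream_table(
  'above_average_orders',
  'SELECT * FROM orders
   WHERE amount > (SELECT AVG(amount) FROM orders)'
);
```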
5.4 CTEs
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Simple non-recursive CTE | ✅ (no aggregates/DISTINCT inside) | ✅ (full SQL inside) |
| Multi-reference CTE | ? | ✅ (shared delta optimization) |
| Chained CTEs | ? | ✅ |
| WITH RECURSIVE | ❌ | ✅ (semi-naive, DRed, recomputation; IMMEDIATE mode with stack-depth warning) |
Gap for pg_ivm: No recursive CTEs, no aggregates/DISTINCT inside CTEs.
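A recursive stream table looks like any other; under semi-naive evaluation, an insert propagates only the newly derivable rows rather than re-running the whole fixpoint. Illustrative sketch assuming an `employees(id, manager_id, name)` table:

```sql
SELECT pgtrickle.create_stream_table(
  'org_chart',
  'WITH RECURSIVE chain AS (
     SELECT id, name, 0 AS depth FROM employees WHERE manager_id IS NULL
     UNION ALL
     SELECT e.id, e.name, c.depth + 1
     FROM employees e JOIN chain c ON e.manager_id = c.id
   )
   SELECT * FROM chain'
);
```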
5.5 Set Operations
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| UNION ALL | ❌ | ✅ |
| UNION (set) | ❌ | ✅ (via DISTINCT + UNION ALL) |
| INTERSECT | ❌ | ✅ (dual-count multiplicity) |
| INTERSECT ALL | ❌ | ✅ |
| EXCEPT | ❌ | ✅ (dual-count multiplicity) |
| EXCEPT ALL | ❌ | ✅ |
Gap for pg_ivm: No set operations at all.
5.6 Window Functions
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| ROW_NUMBER, RANK, DENSE_RANK | ❌ | ✅ |
| SUM/AVG/COUNT OVER () | ❌ | ✅ |
| Frame clauses (ROWS/RANGE/GROUPS) | ❌ | ✅ |
| Named WINDOW clauses | ❌ | ✅ |
| PARTITION BY recomputation | ❌ | ✅ |
Gap for pg_ivm: Window functions are completely unsupported.
5.7 DISTINCT & Grouping
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| SELECT DISTINCT | ✅ | ✅ |
| DISTINCT ON (expr, ...) | ❌ | ✅ (auto-rewritten to ROW_NUMBER) |
| GROUP BY | ✅ | ✅ |
| GROUPING SETS | ❌ | ✅ (auto-rewritten to UNION ALL) |
| CUBE | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| ROLLUP | ❌ | ✅ (auto-rewritten via GROUPING SETS) |
| GROUPING() function | ❌ | ✅ |
| HAVING | ❌ | ✅ |
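The DISTINCT ON rewrite can be pictured as follows — conceptually (the exact generated SQL may differ), a query like `SELECT DISTINCT ON (customer_id) * FROM orders ORDER BY customer_id, created_at DESC` becomes a maintainable ROW_NUMBER form:

```sql
SELECT * FROM (
  SELECT o.*,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY created_at DESC) AS rn
  FROM orders o
) t
WHERE rn = 1;  -- keep the first row per customer_id
```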
5.8 Source Table Types
| Source type | pg_ivm | pg_trickle |
|---|---|---|
| Simple heap tables | ✅ | ✅ |
| Views | ❌ | ✅ (auto-inlined) |
| Materialized views | ❌ | FULL mode only |
| Partitioned tables | ❌ | ✅ |
| Partitions | ❌ | ✅ (via parent) |
| Foreign tables | ❌ | FULL mode only |
| Other IMMVs / stream tables | ❌ | ✅ (DAG cascading) |
Gap for pg_ivm: Only simple heap tables. No views, no partitioned tables, no cascading.
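DAG cascading in practice — a stream table defined over another stream table, so deltas flow through both layers (table names illustrative):

```sql
SELECT pgtrickle.create_stream_table(
  'daily_totals',
  'SELECT region, day, SUM(amount) AS total
   FROM orders GROUP BY region, day'
);

-- The downstream ST reads the upstream ST; on each cycle the scheduler
-- refreshes daily_totals first, then propagates its delta here.
SELECT pgtrickle.create_stream_table(
  'region_totals',
  'SELECT region, SUM(total) AS total FROM daily_totals GROUP BY region'
);
```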
6. API Comparison
pg_ivm API
-- Create an IMMV
SELECT pgivm.create_immv('myview', 'SELECT * FROM mytab');
-- Full refresh (emergency)
SELECT pgivm.refresh_immv('myview', true); -- with data
SELECT pgivm.refresh_immv('myview', false); -- disable maintenance
-- Inspect
SELECT immvrelid, pgivm.get_immv_def(immvrelid)
FROM pgivm.pg_ivm_immv;
-- Drop
DROP TABLE myview;
-- Rename
ALTER TABLE myview RENAME TO myview2;
pg_ivm IMMVs are standard PostgreSQL tables. They can be dropped with
DROP TABLE and renamed with ALTER TABLE.
pg_trickle API
-- Create a stream table (AUTO mode: DIFFERENTIAL when possible, FULL fallback)
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
-- refresh_mode defaults to 'AUTO', schedule defaults to 'calculated'
);
-- Create a stream table (explicit deferred, scheduled)
SELECT pgtrickle.create_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => '2m',
refresh_mode => 'DIFFERENTIAL'
);
-- Create a stream table (immediate, in-transaction)
SELECT pgtrickle.create_stream_table(
'live_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region',
schedule => NULL,
refresh_mode => 'IMMEDIATE'
);
-- Manual refresh
SELECT pgtrickle.refresh_stream_table('order_totals');
-- Alter schedule, mode, or defining query
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '5m');
SELECT pgtrickle.alter_stream_table(
'order_totals',
query => 'SELECT region, SUM(amount) AS total FROM orders WHERE active GROUP BY region'
);
-- Drop
SELECT pgtrickle.drop_stream_table('order_totals');
-- Status and monitoring
SELECT * FROM pgtrickle.pgt_status();
SELECT * FROM pgtrickle.pg_stat_stream_tables;
SELECT * FROM pgtrickle.pgt_stream_tables;
-- DAG inspection
SELECT * FROM pgtrickle.pgt_dependencies;
-- Extended observability (added v0.2.0+)
SELECT * FROM pgtrickle.change_buffer_sizes(); -- CDC buffer health
SELECT * FROM pgtrickle.list_sources('order_totals'); -- source table stats
SELECT * FROM pgtrickle.dependency_tree(); -- ASCII DAG view
SELECT * FROM pgtrickle.health_check(); -- OK/WARN/ERROR triage
SELECT * FROM pgtrickle.refresh_timeline(); -- cross-stream history
SELECT * FROM pgtrickle.trigger_inventory(); -- CDC trigger audit
SELECT * FROM pgtrickle.diamond_groups(); -- diamond consistency groups
-- Source gating (v0.5.0)
SELECT pgtrickle.gate_source('orders'); -- pause CDC
SELECT pgtrickle.ungate_source('orders'); -- resume CDC
SELECT * FROM pgtrickle.source_gates(); -- gate status
-- Watermarks (v0.7.0)
SELECT pgtrickle.advance_watermark('orders', '2026-03-20 12:00:00');
SELECT pgtrickle.create_watermark_group('sync', ARRAY['orders','products'], 30);
SELECT * FROM pgtrickle.watermarks();
SELECT * FROM pgtrickle.watermark_status();
-- Parallel refresh monitoring (v0.4.0)
SELECT * FROM pgtrickle.worker_pool_status();
SELECT * FROM pgtrickle.parallel_job_status();
-- Refresh groups (v0.9.0)
SELECT pgtrickle.create_refresh_group('my_group', ARRAY['st1','st2']);
SELECT pgtrickle.drop_refresh_group('my_group');
-- Idempotent DDL (v0.6.0)
SELECT pgtrickle.create_or_replace_stream_table(
'order_totals',
'SELECT region, SUM(amount) AS total FROM orders GROUP BY region'
);
pg_trickle stream tables are regular PostgreSQL tables but managed through the
pgtrickle schema's API functions. They cannot be renamed with ALTER TABLE
(use alter_stream_table).
7. Scheduling and Dependency Management
| Capability | pg_ivm | pg_trickle |
|---|---|---|
| Automatic scheduling | ❌ (immediate only, no scheduler) | ✅ background worker |
| Manual refresh | ✅ refresh_immv() | ✅ refresh_stream_table() |
| Cron schedules | ❌ | ✅ (standard 5/6-field cron + aliases) |
| Duration-based staleness bounds | ❌ | ✅ ('30s', '5m', '1h', …) |
| Dependency DAG | ❌ | ✅ (stream tables can reference other stream tables) |
| Topological refresh ordering | ❌ | ✅ (upstream refreshes before downstream) |
| CALCULATED schedule propagation | ❌ | ✅ (consumers drive upstream schedules) |
| Parallel refresh | ❌ | ✅ (worker pool with database + cluster caps, v0.4.0) |
| Circular pipeline support | ❌ | ✅ (monotone cycles with fixed-point iteration, v0.7.0) |
| Watermark coordination | ❌ | ✅ (multi-source readiness gates, v0.7.0) |
| Refresh group management | ❌ | ✅ (atomic multi-ST refresh, v0.9.0) |
pg_trickle's DAG scheduling is a significant differentiator: you can build multi-layer pipelines where each downstream stream table is automatically refreshed after its upstream dependencies.
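For example, using the `order_totals` stream table from §6 and assuming any upstream stream tables are left at the default `schedule => 'calculated'` (the cron string below is ours):

```sql
-- Give the table the app actually queries a tight freshness bound; upstream
-- stream tables on 'calculated' inherit the tightest consumer schedule.
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '1m');

-- Standard 5/6-field cron schedules are also accepted, e.g. daily at 02:30:
SELECT pgtrickle.alter_stream_table('order_totals', schedule => '30 2 * * *');
```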
8. Change Data Capture
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Mechanism | AFTER row triggers (inline, same txn) | AFTER row/statement triggers → change buffer |
| WAL-based CDC | ❌ | ✅ optional (pg_trickle.cdc_mode = 'wal') |
| Statement-level triggers | ❌ | ✅ (v0.4.0, reduced overhead for bulk operations) |
| Logical replication slots | Not used | Used in WAL mode only |
| Write-side overhead | Higher (view maintenance in txn) | Lower (small trigger insert only) |
| Change buffer tables | None (applied immediately) | pgtrickle_changes.changes_<oid> |
| TRUNCATE handling | IMMV truncated/refreshed synchronously | Change buffer cleared; full refresh queued |
9. Concurrency and Isolation
pg_ivm
- Holds `ExclusiveLock` on the IMMV during incremental update.
- In `READ COMMITTED`: serializes concurrent updates to the same IMMV.
- In `REPEATABLE READ`/`SERIALIZABLE`: raises an error when a concurrent transaction has already updated the IMMV.
- Single-table INSERT-only IMMVs use the lighter `RowExclusiveLock`.
pg_trickle
- Refresh operations acquire an advisory lock per stream table so only one refresh can run at a time.
- Base table writes are never blocked by refresh operations.
- Parallel refresh (v0.4.0): `pg_trickle.parallel_refresh_mode = 'on'` enables a worker pool with per-database (`max_concurrent_refreshes`, default 4) and cluster-wide (`max_dynamic_refresh_workers`) caps.
- Atomic refresh groups for diamond dependencies.
- Crash recovery: in-flight refreshes are marked failed on restart; the scheduler retries on the next cycle.
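The single-refresh guarantee can be sketched with PostgreSQL's built-in advisory locks; the lock-key derivation here is ours, and pg_trickle's actual scheme may differ:

```sql
-- Transaction-scoped advisory lock keyed on the stream table name.
SELECT pg_try_advisory_xact_lock(hashtext('pgtrickle:order_totals')::bigint)
  AS got_lock;
-- If got_lock is false, another backend is already refreshing this stream
-- table and the scheduler skips it this cycle. Base-table writers never
-- take this lock, so their transactions are unaffected.
```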
10. Observability
| Feature | pg_ivm | pg_trickle |
|---|---|---|
| Catalog of managed views | pgivm.pg_ivm_immv | pgtrickle.pgt_stream_tables |
| Per-refresh timing/history | ❌ | ✅ pgtrickle.pgt_refresh_history |
| Staleness reporting | ❌ | ✅ stale column + get_staleness() |
| Scheduler status | ❌ | ✅ pgtrickle.pgt_status() |
| NOTIFY-based alerting | ❌ | ✅ pgtrickle_refresh channel (10+ alert types) |
| Error tracking | ❌ | ✅ consecutive error counter, last error message |
| dbt integration | ❌ | ✅ dbt-pgtrickle macro package |
| Explain/introspection | ❌ | ✅ explain_st |
| CDC buffer health | ❌ | ✅ pgtrickle.change_buffer_sizes() (v0.2.0) |
| Source table stats | ❌ | ✅ pgtrickle.list_sources() (v0.2.0) |
| Dependency tree view | ❌ | ✅ pgtrickle.dependency_tree() (v0.2.0) |
| Health triage | ❌ | ✅ pgtrickle.health_check() (v0.2.0) |
| Cross-stream refresh history | ❌ | ✅ pgtrickle.refresh_timeline() (v0.2.0) |
| CDC trigger audit | ❌ | ✅ pgtrickle.trigger_inventory() (v0.2.0) |
| Diamond group inspection | ❌ | ✅ pgtrickle.diamond_groups() (v0.2.0) |
| Quick health summary | ❌ | ✅ pgtrickle.quick_health view (v0.5.0) |
| Source gating status | ❌ | ✅ pgtrickle.source_gates() (v0.5.0) |
| Watermark monitoring | ❌ | ✅ pgtrickle.watermarks() / watermark_status() (v0.7.0) |
| Parallel worker status | ❌ | ✅ pgtrickle.worker_pool_status() / parallel_job_status() (v0.4.0) |
| SCC cycle status | ❌ | ✅ pgtrickle.pgt_scc_status() (v0.7.0) |
| Replication slot health | ❌ | ✅ pgtrickle.slot_health() |
| CDC mode per-source | ❌ | ✅ pgtrickle.pgt_cdc_status view |
11. Installation and Deployment
| Attribute | pg_ivm | pg_trickle |
|---|---|---|
| Pre-built packages | RPM via yum.postgresql.org | OCI image, tarball |
| CNPG / Kubernetes | ❌ (no OCI image) | ✅ OCI extension image + CNPG smoke tests |
| Docker local dev | Manual | ✅ documented + Docker Hub image |
| shared_preload_libraries | Required (or session_preload_libraries) | Required |
| Extension upgrade scripts | ✅ (1.0 → 1.1 → … → 1.13) | ✅ (0.1.3 → … → 0.9.0, CI completeness check, upgrade E2E tests) |
| pg_dump / restore | Manual IMMV recreation required | ✅ Standard pg_dump supported (v0.8.0) |
12. Performance Characteristics
pg_ivm
- Write path: slower — every DML statement triggers inline view maintenance. From the README example: a single row update on a 10M-row join IMMV takes ~15 ms vs ~9 ms for a plain table update.
- Read path: instant — IMMV is always current, no refresh needed on read.
- Refresh (full): comparable to `REFRESH MATERIALIZED VIEW` (~20 seconds for a 10M-row join in the example).
pg_trickle
- Write path: minimal overhead — only a small trigger INSERT into the change buffer (~2–50 μs per row). In WAL mode, zero trigger overhead. Statement-level CDC triggers (v0.4.0) further reduce overhead for bulk ops.
- Read path: instant from the materialized table (potentially stale).
- Refresh (differential): proportional to the number of changed rows, not the total table size. A single-row change on a million-row aggregate touches one row's worth of computation. Algebraic aggregates (v0.9.0) like COUNT/SUM/AVG/STDDEV/VAR update in O(1) constant time per changed row.
- Refresh (full): re-runs the entire query; comparable to `REFRESH MATERIALIZED VIEW`.
- Parallel refresh (v0.4.0): linear speedup with worker pool size.
- I/O optimizations (v0.9.0): column skipping, source skipping in joins, WHERE filter push-down, index-aware MERGE for tiny change ratios, scalar subquery short-circuit.
13. Known Limitations
pg_ivm Limitations
- Adds latency to every write on tracked base tables.
- Cannot track tables modified via logical replication (subscriber nodes are not updated).
- `pg_dump`/`pg_upgrade` require manual recreation of all IMMVs.
- Limited aggregate support (no user-defined aggregates, no window functions).
- Column type restrictions (btree operator class required in target list).
- No scheduler or background worker — refresh is immediate only.
- On high-churn tables, `min`/`max` aggregates can trigger expensive rescans.
pg_trickle Limitations
- In DIFFERENTIAL/FULL mode, data is stale between refresh cycles. Use IMMEDIATE mode for zero-staleness, in-transaction consistency.
- Recursive CTEs in IMMEDIATE mode emit a stack-depth warning; very deep recursion may hit PostgreSQL's stack limit.
- Recursive CTEs in DIFFERENTIAL mode fall back to full recomputation for mixed DELETE/UPDATE changes (DRed scheduled for v0.10.0+).
- `LIMIT` without `ORDER BY` is not supported in defining queries. `OFFSET` without `ORDER BY … LIMIT` is not supported.
- Paged TopK (`ORDER BY … LIMIT N OFFSET M`) is fully supported; `ORDER BY` + `LIMIT` (TopK) without OFFSET uses scoped recomputation (MERGE).
- Volatile SQL functions are rejected in DIFFERENTIAL mode.
- Materialized views as sources not supported in DIFFERENTIAL mode.
- Window functions in expressions (e.g. `CASE WHEN ROW_NUMBER() OVER (...) > 5`) require FULL mode.
- Foreign tables as sources require FULL mode.
- `ALTER EXTENSION pg_trickle UPDATE` migration scripts ship from v0.2.1; continuous upgrade path through v0.9.0.
- Targets PostgreSQL 18 only; no backport to PG 13–17 yet (PG 14–18 planned).
- v0.9.x series — extensive testing but not yet production-hardened at scale.
14. PostgreSQL Version Support
| Version | pg_ivm | pg_trickle (current) | pg_trickle (planned) |
|---|---|---|---|
| PG 13 | ✅ | ❌ | ❌ (EOL Nov 2025) |
| PG 14 | ✅ | ❌ | ✅ (full plan) |
| PG 15 | ✅ | ❌ | ✅ (full plan) |
| PG 16 | ✅ | ❌ | ✅ (MVP target) |
| PG 17 | ✅ | ❌ | ✅ (MVP target) |
| PG 18 | ✅ | ✅ | ✅ |
Planned resolution: PLAN_PG_BACKCOMPAT.md:
- Minimum viable (PG 16–18): ~1.5 weeks effort.
- Full target (PG 14–18): ~2.5–3 weeks effort.
- pgrx 0.17.0 already supports PG 14–18 via feature flags.
- ~435 lines in `src/dvm/parser.rs` need `#[cfg]` gating (all in JSON/SQL-standard sections); the remaining ~13,500 lines compile unchanged.
Feature degradation matrix:
| Feature | PG 14 | PG 15 | PG 16 | PG 17 | PG 18 |
|---|---|---|---|---|---|
| Core streaming tables | ✅ | ✅ | ✅ | ✅ | ✅ |
| Trigger-based CDC | ✅ | ✅ | ✅ | ✅ | ✅ |
| Differential refresh | ✅ | ✅ | ✅ | ✅ | ✅ |
| SQL/JSON constructors | — | — | ✅ | ✅ | ✅ |
| JSON_TABLE | — | — | — | ✅ | ✅ |
| WAL-based CDC | Needs test | Needs test | Likely | Likely | ✅ |
15. Features Unique to Each System
Features Unique to pg_trickle (42 items, no pg_ivm equivalent)
- IMMEDIATE + deferred modes (pg_ivm is immediate-only; pg_trickle offers both)
- 60+ aggregate functions (vs 5), including algebraic O(1) for COUNT/SUM/AVG/STDDEV/VAR
- FILTER / HAVING / WITHIN GROUP on aggregates
- Window functions (partition recomputation)
- Set operations (UNION ALL, UNION, INTERSECT, EXCEPT — all 6 variants)
- Recursive CTEs (semi-naive, DRed, recomputation; including IMMEDIATE mode with stack-depth warning)
- LATERAL subqueries and SRFs (jsonb_array_elements, unnest, JSON_TABLE)
- Anti-join / semi-join operators (NOT EXISTS, NOT IN, IN, EXISTS with full SQL)
- Scalar subqueries in SELECT list
- Views as sources (auto-inlined with nested expansion)
- Partitioned table support (RANGE, LIST, HASH with auto-rebuild on ATTACH PARTITION)
- Cascading stream tables (ST referencing other STs via DAG)
- Background scheduler (cron + duration + canonical periods) with multi-database auto-discovery
- GROUPING SETS / CUBE / ROLLUP (auto-rewritten)
- DISTINCT ON (auto-rewritten to ROW_NUMBER)
- Hybrid CDC (trigger → WAL transition)
- DDL change detection and automatic reinitialization (including ALTER FUNCTION body changes)
- Monitoring suite (15+ observability functions: `change_buffer_sizes`, `list_sources`, `dependency_tree`, `health_check`, `refresh_timeline`, `trigger_inventory`, `diamond_groups`, `source_gates`, `watermarks`, `watermark_groups`, `watermark_status`, `worker_pool_status`, `parallel_job_status`, `pgt_scc_status`, `slot_health`, `check_cdc_health`)
- Auto-rewrite pipeline (6 transparent SQL rewrites)
- Volatile function detection
- AUTO refresh mode (smart DIFFERENTIAL/FULL selection with transparent fallback)
- ALTER QUERY — change the defining query of an existing stream table online, with schema-change classification and OID-preserving migration
- dbt macro package (materialization, status macro, health test, refresh operation)
- CNPG / Kubernetes deployment
- SQL/JSON constructors (JSON_OBJECT, JSON_ARRAY, etc.)
- JSON_TABLE support (PG 17+)
- TopK stream tables (ORDER BY + LIMIT, including IMMEDIATE mode via micro-refresh)
- Paged TopK (ORDER BY + LIMIT + OFFSET for server-side pagination)
- Diamond dependency consistency (multi-path refresh atomicity with SAVEPOINT)
- Extension upgrade infrastructure (SQL migration scripts, CI completeness check, upgrade E2E tests, per-release SQL baselines)
- Row Level Security (refreshes see all data; RLS policies on ST itself; IMMEDIATE mode secured; internal change buffers shielded from RLS interference) (v0.5.0)
- Source gating (pause/resume CDC for bulk loads: `gate_source`, `ungate_source`) (v0.5.0)
- Append-only fast path (`append_only => true` skips merge for INSERT-only tables) (v0.5.0)
- Parallel refresh (background worker pool with per-database and cluster-wide caps, atomic groups for diamond dependencies) (v0.4.0)
- Statement-level CDC triggers (reduced write-side overhead for bulk operations) (v0.4.0)
- Circular pipeline support (monotone cycles with fixed-point iteration, `max_fixpoint_iterations` safety limit, SCC status monitoring) (v0.7.0)
- Watermark APIs (delay refresh until multi-source data is ready: `advance_watermark`, `create_watermark_group`, tolerance-based readiness) (v0.7.0)
- pg_dump / pg_restore support (safe backup with auto-reconnect of streams) (v0.8.0)
- Algebraic aggregate maintenance (O(1) constant-time updates for COUNT/SUM/AVG/STDDEV/VAR with floating-point drift correction) (v0.9.0)
- Refresh group management (`create_refresh_group`, `drop_refresh_group` for atomic multi-ST refresh) (v0.9.0)
- Automatic backoff (exponential slowdown for overloaded streams) (v0.9.0)
- Index-aware MERGE (use index lookups for tiny change ratios) (v0.9.0)
Features Unique to pg_ivm (with planned resolutions)
| # | Feature | Status | Ref |
|---|---|---|---|
| 1 | Immediate (synchronous) maintenance | ✅ Closed — IMMEDIATE refresh mode fully implemented (all phases) | PLAN_TRANSACTIONAL_IVM |
| 2 | Auto-index creation on GROUP BY / DISTINCT / PK | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §5.2 |
| 3 | TRUNCATE propagation (auto-truncate IMMV) | ✅ Closed — IMMEDIATE mode fires full refresh on TRUNCATE | PLAN_TRANSACTIONAL_IVM §3.2 |
| 4 | Row Level Security respect | ✅ Closed — v0.5.0: refreshes see all data; RLS on ST itself; IMMEDIATE mode secured; change buffers shielded | ROW_LEVEL_SECURITY.md |
| 5 | PostgreSQL 13–17 support | PG 14–18 backcompat planned (~2.5–3 weeks) | PLAN_PG_BACKCOMPAT |
| 6 | session_preload_libraries | Not applicable (background worker needs shared_preload) | — |
| 7 | Rename via ALTER TABLE | Event trigger support (low effort) | — |
| 8 | Drop via DROP TABLE | Postponed (Phase 2 of transactional IVM) | PLAN_TRANSACTIONAL_IVM §4.3 |
| 9 | Extension upgrade scripts | ✅ Closed — Scripts ship from v0.2.1; CI completeness check and upgrade E2E tests in place | — |
| 10 | pg_dump / pg_restore | ✅ Closed — v0.8.0: safe backup with pg_dump and pg_restore, auto-reconnect streams | — |
Of the 10 items, 5 are now closed (immediate maintenance, TRUNCATE, RLS, upgrade scripts, pg_dump), 3 have concrete implementation plans, and 2 are low-priority or not applicable.
16. Use-Case Fit
| Scenario | Recommended |
|---|---|
| Need views consistent within the same transaction | Either (pg_trickle IMMEDIATE mode or pg_ivm) |
| Application cannot tolerate any view staleness | Either (pg_trickle IMMEDIATE mode or pg_ivm) |
| High write throughput, views can be slightly stale | pg_trickle (DIFFERENTIAL mode) |
| Multi-layer summary pipelines with dependencies | pg_trickle |
| Time-based or cron-driven refresh schedules | pg_trickle |
| Views with complex SQL (window functions, CTEs, UNION) | pg_trickle |
| Simple aggregation with zero-staleness requirement | Either (pg_trickle has richer SQL coverage) |
| Kubernetes / CloudNativePG deployment | pg_trickle |
| dbt integration | pg_trickle |
| Circular / self-referencing pipelines | pg_trickle |
| Multi-source watermark coordination | pg_trickle |
| High-throughput bulk loading (append-only) | pg_trickle (append-only fast path) |
| Row Level Security on analytical summaries | pg_trickle (richer RLS model) |
| pg_dump / pg_restore workflow | pg_trickle |
| PostgreSQL 13–17 | pg_ivm |
| PostgreSQL 18 | pg_trickle (superset of pg_ivm) |
| Production-hardened, stable API | pg_ivm |
| Early adopter, rich SQL coverage needed | pg_trickle |
17. Coexistence
The two extensions can be installed in the same database simultaneously — they
use different schemas (pgivm vs pgtrickle/pgtrickle_changes) and do not
interfere with each other. However, with pg_trickle's IMMEDIATE mode now
available and its dramatically broader feature set (v0.9.0), there is little
reason to use both:
- Use pg_trickle IMMEDIATE for small, critical lookup tables that must be perfectly consistent within transactions (the use-case that previously required pg_ivm).
- Use pg_trickle DIFFERENTIAL/FULL for large analytical summary tables, multi-layer aggregation pipelines, circular pipelines, or views where slight staleness is acceptable.
- Use pg_trickle AUTO (default) to let the system choose the best strategy.
- Use pg_ivm only if you need PostgreSQL 13–17 support or prefer its mature, battle-tested codebase.
18. Recommendations
Planned work that closes pg_ivm gaps
| Priority | Item | Plan | Effort | Closes Gaps |
|---|---|---|---|---|
| ✅ Done | IMMEDIATE refresh mode (all phases) | PLAN_TRANSACTIONAL_IVM | Complete | #1 (immediate maintenance), #3 (TRUNCATE) |
| ✅ Done | Extension upgrade scripts | v0.2.1 release | Complete | #9 (upgrade scripts) |
| ✅ Done | Row Level Security | v0.5.0 release | Complete | #4 (RLS) |
| ✅ Done | pg_dump / pg_restore | v0.8.0 release | Complete | #10 (backup/restore) |
| Postponed | pg_ivm compatibility layer | PLAN_TRANSACTIONAL_IVM Phase 2 | Deferred to post-1.0 | #2 (auto-indexing), #7 (rename), #8 (DROP TABLE) |
| High | PG 16–18 backcompat (MVP) | PLAN_PG_BACKCOMPAT §11 | ~1.5 weeks | #5 (PG version support) |
| Medium | PG 14–18 backcompat (full) | PLAN_PG_BACKCOMPAT §5 | ~2.5–3 weeks | #5 (PG version support) |
Remaining small gaps (no existing plan)
| Priority | Item | Description | Effort |
|---|---|---|---|
| Low | ALTER TABLE RENAME | Detect rename via event trigger, update catalog | 2–4h |
Not worth pursuing
| Item | Reason |
|---|---|
| PG 13 support | EOL since November 2025. Incompatible raw_parser() API. |
| session_preload_libraries | Requires background worker, which needs shared_preload_libraries. |
19. Conclusion
pg_trickle covers all of pg_ivm's SQL surface and extends it dramatically with 55+ additional aggregate functions (including algebraic O(1) maintenance for COUNT/SUM/AVG/STDDEV/VAR), window functions, set operations, recursive CTEs, LATERAL support, anti/semi-joins, circular pipeline support, watermark coordination, parallel refresh, Row Level Security, and a comprehensive operational layer.
The immediate maintenance gap is now fully closed: pg_trickle's IMMEDIATE
refresh mode provides the same in-transaction consistency as pg_ivm, while also
supporting window functions, LATERAL, scalar subqueries, WITH RECURSIVE (IM1),
TopK micro-refresh (IM2), and cascading stream tables in IMMEDIATE mode — all
of which pg_ivm cannot do.
The upgrade infrastructure gap is also closed: v0.2.1 ships SQL migration scripts with continuous upgrade path through v0.9.0, a CI completeness checker, and upgrade E2E tests, matching pg_ivm's upgrade path story.
The Row Level Security gap is closed (v0.5.0): refreshes see all data, RLS policies on the stream table itself control access, and IMMEDIATE mode is secured with shielded change buffers.
The pg_dump/restore gap is closed (v0.8.0): safe backup with standard PostgreSQL tools and automatic stream reconnection on restore.
The one remaining structural gap is PG version support:
- PLAN_PG_BACKCOMPAT details backporting to PG 14–18 (or PG 16–18 as MVP) in ~2.5–3 weeks, primarily by `#[cfg]`-gating ~435 lines of JSON/SQL-standard parse-tree code.
Once backcompat is implemented, pg_trickle will be a strict superset of pg_ivm in every dimension: same immediate maintenance model, comparable PG version support (14–18 vs 13–18, with PG 13 EOL), dramatically wider SQL coverage (60+ aggregates vs 5, 21 DVM operators, 42 unique features), and a complete operational layer that pg_ivm entirely lacks.
For users migrating from pg_ivm, the IMMEDIATE refresh mode already provides
the same zero-staleness guarantee. A full compatibility layer (pgivm.create_immv,
pgivm.refresh_immv, pgivm.pg_ivm_immv) is planned for post-1.0 to enable
zero-change migration.
References
- pg_ivm repository: https://github.com/sraoss/pg_ivm
- pg_trickle repository: https://github.com/grove/pg-trickle
- DBSP differential dataflow paper: https://arxiv.org/abs/2203.16684
- pg_trickle ESSENCE.md: ../../ESSENCE.md
- pg_trickle DVM operators: ../../docs/DVM_OPERATORS.md
- pg_trickle architecture: ../../docs/ARCHITECTURE.md
Triggers vs Logical Replication for CDC in pg_trickle
Status: Evaluation Report (updated with implementation status)
Date: 2026-02-24
Context: ADR-001/ADR-002 in PLAN_ADRS.md · PLAN_USER_TRIGGERS_EXPLICIT_DML.md
Executive Summary
pg_trickle uses row-level AFTER triggers to capture changes on source tables. This report evaluates the trigger-based approach against logical replication (WAL-based CDC) across five dimensions: correctness, performance, operations, and two end-user features — user-defined triggers on stream tables and logical replication subscriptions from stream tables.
Conclusion: Triggers remain the correct choice for the current scope given
operational simplicity and zero-config deployment. The hybrid approach —
trigger bootstrap at creation with automatic WAL transition for steady state —
is now implemented (pg_trickle.cdc_mode GUC, src/wal_decoder.rs). User-defined
triggers on stream tables are also implemented (pg_trickle.user_triggers GUC,
DISABLE TRIGGER USER during refresh). Both were previously recommendations
(§6.2, §6.6) and are now shipped.
However, the atomicity constraint — the original reason for choosing triggers — is primarily a creation-time inconvenience, not a steady-state limitation. Once a stream table exists, logical replication has three significant runtime advantages:
- No write-side overhead — With triggers, every INSERT/UPDATE/DELETE on a tracked source table does extra work before the application's transaction can commit: it runs a PL/pgSQL function, writes a row into a buffer table, and updates an index. This slows down the application. With logical replication, PostgreSQL already writes every change to its internal transaction log (WAL) regardless — the CDC layer simply reads that log after the fact, so the application's writes are not slowed down at all.
- TRUNCATE capture — When someone runs `TRUNCATE` on a source table, row-level triggers do not fire (TRUNCATE replaces the entire file rather than deleting rows one by one). This leaves stream tables silently stale until a manual refresh. Logical replication captures TRUNCATE natively from the WAL, so pg_trickle would know immediately that all rows were removed.
- Change ordering from the transaction log — With triggers, each trigger independently calls `pg_current_wal_lsn()` to timestamp its change. With logical replication, the ordering comes directly from the WAL — the authoritative, global record of all database changes — so change ordering is guaranteed to match commit order, even across concurrent transactions.
The two end-user features (user triggers and logical replication FROM stream tables) are both achievable without changing the CDC mechanism. A hybrid approach (triggers for creation, logical replication for steady-state) deserves serious consideration. See §3 for the full analysis.
1. Background
Current Architecture
CDC triggers on each tracked source table write typed per-column rows into
per-table buffer tables (pgtrickle_changes.changes_<oid>). Each buffer row
captures:
| Column | Purpose |
|---|---|
| `change_id` | BIGSERIAL ordering within a source |
| `lsn` | `pg_current_wal_lsn()` at trigger time |
| `action` | 'I' / 'U' / 'D' |
| `pk_hash` | Content hash of PK columns (optional) |
| `new_<col>` | Per-column NEW values (INSERT/UPDATE) |
| `old_<col>` | Per-column OLD values (UPDATE/DELETE) |
A covering B-tree index `(lsn, pk_hash, change_id) INCLUDE (action)` supports
the differential refresh's LSN-range scan.
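The buffer shape can be sketched as follows; the table name, column list, and LSN value are illustrative (the real generated DDL may differ):

```sql
CREATE INDEX ON pgtrickle_changes.changes_12345
  USING btree (lsn, pk_hash, change_id) INCLUDE (action);

-- A differential refresh then reads exactly one LSN window, in order:
SELECT action, new_id, new_status, old_id, old_status
FROM pgtrickle_changes.changes_12345
WHERE lsn > '0/1A2B3C4D'::pg_lsn       -- LSN of the previous refresh
  AND lsn <= pg_current_wal_lsn()
ORDER BY lsn, change_id;
```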
The Atomicity Constraint
create_stream_table() performs DDL (CREATE TABLE) and DML (catalog inserts)
before setting up CDC. pg_create_logical_replication_slot() cannot execute
inside a transaction that has already performed writes. This makes
single-transaction atomic creation impossible with logical replication — the
decisive factor in the original ADR.
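This is stock PostgreSQL behavior, reproducible without pg_trickle:

```sql
BEGIN;
CREATE TABLE demo (id int);   -- the transaction has now performed writes
SELECT pg_create_logical_replication_slot('demo_slot', 'pgoutput');
-- ERROR: cannot create logical replication slot in transaction
--        that has performed writes
ROLLBACK;
```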
2. Comparison Matrix
2.1 Correctness & Transactional Safety
| Aspect | Triggers | Logical Replication |
|---|---|---|
| Atomic creation | ✅ Same transaction as DDL+catalog | ❌ Slot creation requires separate transaction |
| Change visibility | ✅ Immediate (same transaction) | ⚠️ Asynchronous (after COMMIT + WAL decode) |
| TRUNCATE capture | ❌ Row-level triggers not fired | ✅ WAL emits TRUNCATE since PG 11 |
| Transaction ordering | ✅ Change buffer rows ordered by LSN | ✅ WAL stream preserves commit order |
| Crash recovery | ✅ Buffer tables are WAL-logged; no orphan state | ⚠️ Slot survives crash but may need re-sync |
| Schema change handling | ✅ DDL event hooks rebuild trigger in-place | ⚠️ Requires slot re-creation or output plugin awareness |
Key insight: The TRUNCATE gap is the most significant correctness
limitation of the trigger approach. A statement-level AFTER TRUNCATE trigger
that marks downstream STs for automatic FULL refresh would close this gap
without changing the CDC architecture (see §6 Recommendation 3).
2.2 Performance
| Metric | Triggers | Logical Replication |
|---|---|---|
| Per-row write overhead | ~2–4 μs (narrow INSERT) to ~5–15 μs (wide UPDATE) | ~0 (WAL writes happen regardless) |
| Expected throughput reduction | 1.5–5× on tracked source tables | None on source tables |
| Write amplification | 2× (source WAL + buffer table WAL + index) | 1× (only source WAL) |
| Change buffer storage | Heap table + index per source | WAL segments (shared, recycled) |
| Sequence contention | BIGSERIAL per buffer (lightweight) | N/A |
| Throughput ceiling | ~5,000 writes/sec (estimated) | WAL throughput (much higher) |
| Decoding CPU cost | N/A | Non-trivial; output plugin runs in WAL sender |
| Zero-change refresh | ~3 ms (EXISTS check on empty buffer) | ~3 ms (no pending WAL changes) |
Key insight: Trigger overhead is synchronous — every committing transaction pays the cost. For applications with moderate write rates (<5,000 writes/sec) this is acceptable. For high-throughput OLTP workloads, logical replication's zero write-side overhead is a significant advantage.
2.3 Operational Complexity
| Aspect | Triggers | Logical Replication |
|---|---|---|
| PostgreSQL configuration | None required | wal_level = logical + restart |
| Managed PG compatibility | ✅ Works everywhere | ⚠️ Some providers restrict wal_level |
| WAL retention risk | None (buffer tables are independent) | Slots prevent WAL cleanup; disk exhaustion risk |
| Slot management | N/A | Create, monitor, drop; orphan detection |
| `max_replication_slots` | N/A | Must be sized for number of tracked sources |
| `REPLICA IDENTITY` config | N/A | Required on all tracked source tables |
| Monitoring | Buffer table row counts | Slot lag, WAL retention, decode rate |
| Extension dependencies | None | Output plugin (pgoutput, wal2json, or custom) |
| Upgrade path | CREATE OR REPLACE FUNCTION | Slot protocol version compatibility |
Key insight: Triggers are operationally simpler by a wide margin. Logical replication introduces a class of failure modes (stuck slots, WAL bloat, replica identity misconfiguration) that require dedicated monitoring and operational runbooks.
2.4 Feature: User Triggers on Stream Tables
This addresses end-user triggers on the output stream tables, not CDC triggers on source tables.
| Aspect | Current (Trigger CDC) | With Logical Replication CDC |
|---|---|---|
| Feasibility | ✅ Achievable via session_replication_role | ✅ Same mechanism applies |
| Refresh suppression | SET LOCAL session_replication_role = 'replica' | Same |
| Post-refresh notification | NOTIFY pg_trickle_refresh with metadata | Same |
| MERGE firing pattern | DELETE+INSERT (not UPDATE); must be suppressed | Same — refresh mechanism is independent of CDC |
Key insight: User trigger support on stream tables is orthogonal to the
CDC mechanism and is now implemented. The solution uses ALTER TABLE ... DISABLE TRIGGER USER / ENABLE TRIGGER USER around FULL refresh (avoiding
the session_replication_role conflict with logical replication publishing).
In DIFFERENTIAL mode, explicit per-row DML (INSERT/UPDATE/DELETE) is used
instead of MERGE so that user-defined AFTER triggers fire correctly. The
implementation is controlled by the `pg_trickle.user_triggers` GUC (`auto`/`on`/`off`). See PLAN_USER_TRIGGERS_EXPLICIT_DML.md
for the full design.
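As a usage sketch, an end-user audit trigger on the README's `active_orders` stream table might look like the following. The trigger, function, and `audit_log` table are hypothetical examples, not part of pg_trickle:

```sql
SET pg_trickle.user_triggers = 'on';   -- force-enable the feature

CREATE TABLE audit_log (tbl text, op text, at timestamptz);

CREATE FUNCTION audit_active_orders() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log VALUES ('active_orders', TG_OP, now());
    RETURN NULL;   -- AFTER ROW trigger; return value is ignored
END $$ LANGUAGE plpgsql;

CREATE TRIGGER trg_audit
    AFTER INSERT OR UPDATE OR DELETE ON active_orders
    FOR EACH ROW EXECUTE FUNCTION audit_active_orders();
```

With explicit per-row DML in DIFFERENTIAL mode, each refresh-applied change fires this trigger once per row, as ordinary DML would.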
Note: Sections 2.1–2.5 compare creation-time and operational aspects. For a focused steady-state comparison (what matters once the ST exists), see §3.
2.5 Feature: Logical Replication FROM Stream Tables
This addresses end-users subscribing to stream table changes via PostgreSQL's built-in logical replication.
| Aspect | Status | Notes |
|---|---|---|
| Basic publishing | ✅ Works today | STs are regular heap tables; CREATE PUBLICATION works |
| `__pgt_row_id` column | ⚠️ Replicated by default | Use column list in PUBLICATION to exclude, or document as usable PK |
| Differential refresh | ✅ DELETE+INSERT via MERGE are replicated | Subscriber sees individual DELETEs and INSERTs, not UPDATEs |
| Full refresh | ✅ TRUNCATE + INSERT replicated | Published table needs `REPLICA IDENTITY` set; subscriber receives TRUNCATE + mass INSERT |
| `REPLICA IDENTITY` | Needs configuration | `__pgt_row_id` could serve as unique index for identity |
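A sketch of both configurations from the table above, using the README's `active_orders` stream table (publication and index names are illustrative; column lists in publications require PostgreSQL 15+, and `REPLICA IDENTITY USING INDEX` assumes `__pgt_row_id` is a NOT NULL column):

```sql
-- Option 1: exclude the internal row-id column via a column list.
CREATE PUBLICATION pub_active_orders FOR TABLE active_orders (id, status);

-- Option 2: keep __pgt_row_id and use it as the replica identity.
CREATE UNIQUE INDEX active_orders_row_id ON active_orders (__pgt_row_id);
ALTER TABLE active_orders REPLICA IDENTITY USING INDEX active_orders_row_id;
```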
The session_replication_role Conflict
If the refresh engine sets session_replication_role = 'replica' to suppress
user triggers (Phase 1 of the user-trigger plan), this may also suppress
publication of the DML to logical replication subscribers. When a session
is in replica mode, PostgreSQL treats it as a replication subscriber — DML
performed in that session may not be forwarded to downstream subscribers
(depending on the publication's publish_via_partition_root and the
subscriber's origin setting).
This is a potential conflict between the two features. Options:
| Option | User Triggers Suppressed? | Replication Published? | Drawback |
|---|---|---|---|
| `session_replication_role = 'replica'` | ✅ Yes | ❌ May not be published | Breaks logical replication from STs |
| `ALTER TABLE ... DISABLE TRIGGER USER` | ✅ Yes | ✅ Yes | Requires ACCESS EXCLUSIVE lock |
| `pg_trickle.suppress_user_triggers` GUC → `DISABLE TRIGGER USER` only when needed | ✅ Configurable | ✅ Yes | Lock overhead; crash-safety concern (ENABLE on recovery) |
| `tgisinternal` flag manipulation | ✅ Yes | ✅ Yes | Non-portable; catalog-level hack |
Recommended resolution: Use ALTER TABLE ... DISABLE TRIGGER USER within
a SAVEPOINT, restoring on error. The ACCESS EXCLUSIVE lock is brief (only
held for the catalog update, not the entire refresh). If the user has enabled
both user triggers AND logical replication on a stream table, this is the only
approach that supports both simultaneously. If neither feature is in use, skip
the overhead entirely.
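The recommended pattern can be sketched as follows (table name from the README; since DDL is transactional in PostgreSQL, rolling back to the savepoint also restores the trigger state):

```sql
BEGIN;
SAVEPOINT pgt_refresh;
ALTER TABLE active_orders DISABLE TRIGGER USER;  -- brief ACCESS EXCLUSIVE lock
-- ... FULL refresh DML runs here: published to subscribers,
--     user triggers silent ...
ALTER TABLE active_orders ENABLE TRIGGER USER;
RELEASE SAVEPOINT pgt_refresh;
-- On error instead: ROLLBACK TO SAVEPOINT pgt_refresh;
-- which re-enables the triggers automatically.
COMMIT;
```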
3. Separating Creation-Time from Steady-State
The original ADR chose triggers because pg_create_logical_replication_slot()
cannot execute inside a transaction that has already performed writes. This
report initially treated that constraint as "decisive." But it deserves
scrutiny: the atomicity constraint only affects the create_stream_table()
call — a one-time event. Once a stream table exists, CDC runs for hours,
days, or months. The steady-state characteristics are what actually matter for
performance, correctness, and user experience.
3.1 The Atomicity Constraint Is a Solvable Engineering Problem
The constraint is real but workable. Three approaches exist, all with well-understood trade-offs:
| Approach | How It Works | Downside |
|---|---|---|
| Two-phase creation | Phase 1: DDL + catalog in one transaction. Phase 2: slot creation in a separate transaction. Rollback Phase 1 artifacts on Phase 2 failure. | Brief window where catalog entry exists without CDC. Cleanup on failure adds ~50 lines of code. |
| Background worker handoff | Main transaction creates DDL + catalog + temporary trigger. Background worker creates slot asynchronously, then drops trigger. | No data-loss window (changes between COMMIT and slot creation are captured by the temporary trigger), but the handoff adds complexity (~100 lines). |
| Trigger bootstrap → slot transition | Create with triggers (current approach). After first successful refresh, migrate to logical replication in the background. | Trigger overhead during bootstrap period (minutes). Most natural hybrid approach. |
None of these are architecturally difficult. The two-phase approach is straightforward — if slot creation fails, drop the storage table and catalog entry. The temporary-trigger approach eliminates even the theoretical data-loss window. These are engineering inconveniences, not fundamental blockers.
3.2 Steady-State: Triggers vs Logical Replication (Honest Comparison)
Once the stream table exists and CDC is running, here is how the two approaches compare on their actual runtime merits.
In plain terms: With triggers, every time the application writes a row to a tracked source table, the database does extra work right then and there — calling a function, writing to a buffer table, updating an index — all before the application's transaction can finish. This is like a toll booth on a highway: every car (write) must stop and pay (trigger overhead) before continuing.
With logical replication, the database already writes every change to its internal transaction log (the WAL) as part of normal operation. CDC simply reads that log after the fact, in a separate background process. The application's writes pass through without stopping — there is no toll booth. The cost of reading the log is paid by the database server, but it happens asynchronously and never slows down the application.
Where Logical Replication Wins (Steady-State)
| Dimension | Trigger Impact | Logical Replication Advantage |
|---|---|---|
| Write-path latency | Every INSERT/UPDATE/DELETE on a tracked source pays ~2–15 μs synchronous overhead (PL/pgSQL dispatch, buffer INSERT, index update). This is inside the committing transaction's critical path. | Zero additional write-path cost. WAL writes happen regardless; decoding is asynchronous. Source table DML performance is completely unaffected. |
| Write amplification | Each source row change produces: (1) source table WAL, (2) buffer table heap write, (3) buffer table WAL, (4) buffer index update, (5) index WAL. ~2–3× total write amplification. | 1× — only the source table's normal WAL. No additional heap writes, no secondary indexes. |
| TRUNCATE capture | Cannot capture. Row-level triggers don't fire. Requires a separate statement-level AFTER TRUNCATE workaround (§4) that only marks for reinit — the actual row deletions are invisible to differential mode. | Native. WAL emits TRUNCATE events since PG 11. The decoder receives a clean signal that all rows were removed. |
| Throughput ceiling | Estimated ~5,000 writes/sec on tracked sources before trigger overhead dominates. PL/pgSQL function dispatch is the bottleneck. | Bounded by WAL throughput — typically 50,000–200,000+ writes/sec depending on hardware and wal_buffers. |
| Connection-pool pressure | Trigger executes in the application's connection. Long-running trigger INSERTs can increase connection hold time under load. | Decoding runs in a dedicated WAL sender process. Application connections are unaffected. |
| Vacuum pressure | Buffer tables accumulate dead tuples between cleanups. Each refresh cycle creates bloat that autovacuum must reclaim. | No buffer tables to vacuum. WAL segments are recycled by the WAL management subsystem. |
| Transaction ID consumption | Each trigger INSERT consumes sub-transaction resources within the outer transaction. High-volume batch operations can cause excessive subtransaction overhead. | No additional transaction work. |
Where Triggers Win (Steady-State)
| Dimension | Trigger Advantage | Logical Replication Impact |
|---|---|---|
| Operational simplicity | No external state to manage. Buffer tables are regular heap tables — queryable, monitorable, backed up normally. Drop the trigger and it's gone. | Replication slots are persistent server-side state. A stuck or crashed consumer prevents WAL recycling, potentially filling the disk. Requires monitoring, max_slot_wal_keep_size guards, and orphan-slot cleanup. |
| Zero configuration | Works with any wal_level (minimal, replica, logical). No restart required. No REPLICA IDENTITY configuration. | Requires wal_level = logical (server restart), max_replication_slots sizing, and REPLICA IDENTITY on every tracked source table. Many managed PostgreSQL providers default to wal_level = replica. |
| Schema evolution | DDL event hooks rebuild the trigger function via CREATE OR REPLACE FUNCTION. New columns are added to the buffer table with ADD COLUMN IF NOT EXISTS. Simple, same-transaction, no coordination. | Schema changes on tracked tables require careful handling. The output plugin must be aware of column additions/removals. Slot may need to be recreated. ALTER TABLE during active decoding can cause protocol errors. |
| Debugging & visibility | Change buffers are queryable tables: SELECT * FROM pgtrickle_changes.changes_12345 ORDER BY change_id DESC LIMIT 10. Immediate visibility into what was captured. | WAL is binary and opaque. Inspecting captured changes requires pg_logical_slot_peek_changes() which advances or peeks the slot — disruptive in production. |
| Crash recovery | Buffer tables are WAL-logged and survive crashes. No special recovery needed — the refresh engine picks up from the last frontier LSN. | Slots survive crashes, but the decoding position may be ahead of what pg_trickle has consumed. Requires careful bookkeeping to avoid replaying or losing changes. |
| Multi-source coordination | Each source has an independent buffer table. The refresh engine reads from multiple buffers with independent LSN ranges. No coordination between sources. | Multiple sources could share a single slot (decoding all tables) or use per-source slots. Shared slots require demultiplexing; per-source slots multiply the slot management burden. |
| Isolation | Trigger failure (e.g., buffer table full) raises an error in the application transaction — visible and immediate. | Decoding failure is asynchronous. The application commits successfully, but changes may never reach the buffer. Silent data loss is possible unless monitored. |
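The debugging asymmetry from the table above, side by side (slot and buffer names are illustrative):

```sql
-- Trigger CDC: the change buffer is an ordinary, queryable table.
SELECT change_id, lsn, action
  FROM pgtrickle_changes.changes_12345
 ORDER BY change_id DESC LIMIT 10;

-- Logical replication: inspect pending changes without consuming them.
-- Works only with text-output plugins (e.g. test_decoding); pgoutput
-- produces binary output and needs pg_logical_slot_peek_binary_changes().
SELECT lsn, xid, data
  FROM pg_logical_slot_peek_changes('pgt_slot_orders', NULL, 10);
```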
Neutral (Roughly Equivalent)
| Dimension | Notes |
|---|---|
| Refresh-path performance | Both approaches populate the same buffer table schema. The MERGE/DVM pipeline is identical regardless of how buffers were filled. |
| Zero-change detection | Triggers: EXISTS check on empty buffer (~3 ms). Logical replication: check slot position vs current WAL LSN (~3 ms). Equivalent. |
| Memory footprint | Triggers: PL/pgSQL function cache per backend. Logical replication: WAL sender process + decoding context. Both are modest. |
3.3 When Does Logical Replication Become the Better Choice?
The crossover point depends on workload characteristics:
| Scenario | Better Choice | Why |
|---|---|---|
| < 1,000 writes/sec on tracked sources | Triggers | Overhead is negligible; operational simplicity dominates |
| 1,000–5,000 writes/sec | Either / Triggers still acceptable | Trigger overhead is measurable but unlikely to be the bottleneck |
| > 5,000 writes/sec | Logical Replication | Write-path overhead starts to matter; 2–3× write amplification compounds |
| ETL patterns (TRUNCATE + bulk INSERT) | Logical Replication | Native TRUNCATE capture; no stale-data gap |
| Wide tables (20+ columns) | Logical Replication | Trigger overhead scales with column count (~5–15 μs); WAL overhead does not |
| Managed PostgreSQL with `wal_level` restrictions | Triggers | No choice — logical replication may not be available |
| Many tracked sources (50+) | Logical Replication | Fewer moving parts than 50 triggers + 50 buffer tables + 50 indexes |
| Need logical replication FROM stream tables | Triggers (with caveats) | see §2.5 — session_replication_role conflict with DISABLE TRIGGER USER as workaround |
3.4 Reassessing the Decision
With the atomicity constraint properly scoped as a creation-time concern, the decision to use triggers rests on three remaining pillars:
1. Operational simplicity — no `wal_level` change, no slot management, no `REPLICA IDENTITY` configuration. This is genuinely valuable for an early-stage extension that needs frictionless adoption.
2. Debugging visibility — queryable buffer tables are a major developer-experience advantage. Being able to `SELECT * FROM changes_<oid>` during debugging is invaluable.
3. Zero-config deployment — works on any PostgreSQL 18 instance without server restarts or configuration changes. Critical for managed PostgreSQL environments.
However, these advantages are primarily about developer and operator experience, not about the fundamental capability of the system. A mature pg_trickle deployment that needs high write throughput, TRUNCATE support, or minimal source-table impact would be better served by logical replication in steady-state.
The honest assessment: Triggers are the right choice today for pragmatic reasons (simplicity, early-stage adoption, managed PG compatibility). But the report should not overstate the atomicity constraint as a fundamental blocker — it is a solvable problem. If pg_trickle grows to serve high-throughput production workloads, the migration to logical replication for steady-state CDC should be treated as a planned evolution, not a theoretical future.
4. TRUNCATE: The Gap and How to Close It
This limitation is one of the strongest arguments for logical replication in steady-state — see §3.2 for the comparison.
The TRUNCATE limitation is the most commonly cited drawback of trigger-based CDC. PostgreSQL does not fire row-level triggers for TRUNCATE because TRUNCATE operates at the file level (O(1)) — there are no individual rows to enumerate.
Current Behavior
1. User runs `TRUNCATE source_table`
2. CDC trigger does not fire — change buffer remains empty
3. Scheduler sees zero changes → `NO_DATA` → stream table is stale
4. Stream table shows data from rows that no longer exist
Proposed Fix: Statement-Level AFTER TRUNCATE Trigger
PostgreSQL supports statement-level AFTER TRUNCATE triggers. While they
provide no OLD row data, they can mark downstream stream tables for
reinitialization:
```sql
CREATE TRIGGER pg_trickle_truncate_<oid>
AFTER TRUNCATE ON <source_table>
FOR EACH STATEMENT
EXECUTE FUNCTION pgtrickle.on_source_truncated('<source_oid>');
```
The trigger function would:
- Look up all stream tables that depend on this source
- Mark them `needs_reinit = true` in the catalog
- Cascade transitively to downstream STs
This closes the TRUNCATE gap without changing the CDC architecture. The next scheduler cycle would trigger a FULL refresh automatically.
Effort estimate: ~2–4 hours (trigger creation in `cdc.rs`, PL/pgSQL or
Rust function for `on_source_truncated`, cascade logic reuse from `hooks.rs`).
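The three steps above can be sketched in PL/pgSQL. The catalog table and column names (`pgt_dependencies`, `pgt_stream_tables`, `source_relid`, `st_relid`, `needs_reinit`) are hypothetical stand-ins for pg_trickle's actual catalog, not its real schema:

```sql
CREATE FUNCTION pgtrickle.on_source_truncated() RETURNS trigger AS $$
BEGIN
    -- Mark every ST that depends, directly or transitively,
    -- on the truncated source (OID passed as the trigger argument).
    WITH RECURSIVE affected(relid) AS (
        SELECT TG_ARGV[0]::oid
        UNION
        SELECT d.st_relid
          FROM pgtrickle.pgt_dependencies d   -- hypothetical source → ST edges
          JOIN affected a ON d.source_relid = a.relid
    )
    UPDATE pgtrickle.pgt_stream_tables
       SET needs_reinit = true
     WHERE st_relid IN (SELECT relid FROM affected);
    RETURN NULL;   -- statement-level AFTER triggers return NULL
END $$ LANGUAGE plpgsql;
```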
5. Migration Path: Trigger → Logical Replication (Now Implemented)
Status: Phase A (Hybrid Creation) is now implemented in
`src/wal_decoder.rs`. The `pg_trickle.cdc_mode` GUC controls the behavior (`trigger`/`auto`/`wal`).
As discussed in §3, the atomicity constraint is a creation-time problem with known solutions. The buffer table schema and downstream IVM pipeline are decoupled from the capture mechanism, so migration is isolated to the CDC layer. This should be treated as a planned evolution for high-throughput deployments, not a theoretical future:
Phase A: Hybrid Creation
- `create_stream_table()` continues using triggers for atomic creation
- After first successful full refresh, a background worker creates a replication slot and transitions to WAL-based capture
- Trigger is dropped; buffer table continues to be populated from WAL decode
Phase B: Steady-State WAL Capture
- Background worker runs a logical decoding consumer per tracked source
- WAL changes are decoded and written to the same buffer table schema
- Downstream pipeline (DVM, MERGE, frontier) is unchanged
- TRUNCATE events are captured natively from WAL
Prerequisites
- `wal_level = logical` (must be documented as optional upgrade path)
- `REPLICA IDENTITY` on tracked sources (auto-configured or user-managed)
- Custom output plugin or `pgoutput` + column mapping
- Slot health monitoring (WAL retention alerts, orphan cleanup)
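The server-side prerequisites amount to a few standard commands (values are illustrative; the README's `orders` table stands in for a tracked source):

```sql
-- Requires a server restart to take effect.
ALTER SYSTEM SET wal_level = 'logical';
-- Size for the number of tracked sources plus existing replication uses.
ALTER SYSTEM SET max_replication_slots = 20;

-- Per-source replica identity: DEFAULT uses the primary key;
-- use FULL (or USING INDEX) for tables without a suitable PK.
ALTER TABLE orders REPLICA IDENTITY DEFAULT;
```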
Effort estimate: 3–5 weeks for a production-quality implementation.
6. Recommendations
Recommendation 1: Keep Trigger-Based CDC (For Now)
Operational simplicity and zero-config deployment are strong advantages for an early-stage extension. The performance ceiling (~5,000 writes/sec) is adequate for current target use cases. The atomicity constraint, while solvable (see §3.1), adds creation-time complexity that is not yet justified.
However: This decision should be revisited when any of these triggers are
hit: (a) users report write-path latency from CDC triggers, (b) TRUNCATE-based
ETL patterns become a common pain point, (c) pg_trickle targets environments
where wal_level = logical is already the norm. The steady-state advantages of
logical replication (§3.2) are substantial and should not be dismissed.
Recommendation 2: ✅ IMPLEMENTED — User Trigger Suppression
User-defined triggers on stream tables are now fully supported. The
implementation uses `ALTER TABLE ... DISABLE TRIGGER USER` / `ENABLE TRIGGER USER` around FULL refresh, and explicit per-row DML (INSERT/UPDATE/DELETE)
instead of MERGE during DIFFERENTIAL refresh so user AFTER triggers fire
correctly. Controlled by the `pg_trickle.user_triggers` GUC (`auto`/`on`/`off`).
The session_replication_role approach from the original plan was rejected to
avoid conflict with logical replication publishing (see §2.5).
Recommendation 3: Add TRUNCATE Capture Trigger
Add a statement-level AFTER TRUNCATE trigger on each tracked source table
that marks downstream STs for reinitialization. This closes the most
significant usability gap without changing the CDC architecture.
Recommendation 4: Document Logical Replication FROM Stream Tables
Add documentation and examples for CREATE PUBLICATION on stream tables,
including:
- Column filtering to exclude
__pgt_row_id REPLICA IDENTITYconfiguration using__pgt_row_idas unique index- Behavior during FULL vs DIFFERENTIAL refresh
- Interaction with user trigger suppression
Recommendation 5: Benchmark Trigger Overhead
Execute the benchmark plan in PLAN_TRIGGERS_OVERHEAD.md to establish data-driven thresholds for the logical replication migration crossover point. The results should feed directly into the §3.3 crossover analysis.
Recommendation 6: ✅ IMPLEMENTED — Hybrid CDC Approach
The "trigger bootstrap → slot transition" pattern is now implemented in
`src/wal_decoder.rs` (1152 lines). The implementation includes:
- Automatic transition: After stream table creation with triggers, a background worker creates a logical replication slot and transitions to WAL-based capture.
- GUC control: `pg_trickle.cdc_mode` (`trigger`/`auto`/`wal`) and `pg_trickle.wal_transition_timeout` control the behavior.
- Transition orchestration: Create slot → wait for catch-up → drop trigger. Automatic fallback to triggers if slot creation fails.
- Catalog extension: `pgt_dependencies` gains `cdc_mode`, `slot_name`, `decoder_confirmed_lsn`, `transition_started_at` columns.
- Health monitoring: `pgtrickle.check_cdc_health()` function and `NOTIFY pg_trickle_cdc_transition` notifications.
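Putting the pieces together, a deployment opting into the hybrid mode might run (the result columns of `check_cdc_health()` are not specified in this report, so only the call shape is shown):

```sql
-- Triggers at creation, automatic transition to WAL capture afterwards.
SET pg_trickle.cdc_mode = 'auto';

-- Inspect per-source CDC state (mode, slot, lag).
SELECT * FROM pgtrickle.check_cdc_health();

-- Receive a notification when a source finishes its trigger → WAL transition.
LISTEN pg_trickle_cdc_transition;
```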
7. Decision Log
| # | Decision | Rationale |
|---|---|---|
| D1 | Keep triggers for CDC on source tables — for now | Zero-config, operational simplicity, adequate for current scale |
| D2 | Atomicity constraint is solvable, not fundamental | Two-phase creation and hybrid bootstrap are proven patterns (§3.1) |
| D3 | Logical replication is superior in steady-state | Zero write overhead, TRUNCATE capture, higher throughput ceiling (§3.2) |
| D4 | User triggers on STs are orthogonal to CDC choice | session_replication_role / DISABLE TRIGGER USER works with either approach |
| D5 | Logical replication FROM STs works today | Regular heap tables; needs documentation, not code |
| D6 | TRUNCATE gap is closable with statement-level trigger | Low effort, high impact — but logical replication handles it natively |
| D7 | Hybrid approach is the optimal long-term target | Trigger bootstrap for creation + logical replication for steady-state |
| D8 | User trigger suppression uses DISABLE TRIGGER USER | Avoids session_replication_role conflict with logical replication publishing (§2.5) |
| D9 | Hybrid CDC implemented with auto-transition | pg_trickle.cdc_mode = 'auto' triggers → WAL transition after creation |
| D10 | Explicit DML for DIFFERENTIAL refresh with user triggers | INSERT/UPDATE/DELETE instead of MERGE so AFTER triggers fire correctly |