Best-Practice Patterns for pg_trickle
This guide covers common data modeling patterns and recommended configurations for pg_trickle stream tables. Each pattern includes worked SQL examples, anti-patterns to avoid, and refresh mode recommendations.
Version: v0.14.0+. Some features require recent versions — check SQL_REFERENCE.md for per-feature availability.
Table of Contents
- Pattern 1: Bronze / Silver / Gold Materialization
- Pattern 2: Event Sourcing with Stream Tables
- Pattern 3: Slowly Changing Dimensions (SCD)
- Pattern 4: High-Fan-Out Topology
- Pattern 5: Real-Time Dashboards
- Pattern 6: Tiered Refresh Strategy
- General Guidelines
Pattern 1: Bronze / Silver / Gold Materialization
A multi-layer approach where raw data flows through progressively refined stream tables, similar to a medallion architecture.
Architecture
[raw_events] ← Bronze: raw ingest table (regular table)
↓
[events_cleaned] ← Silver: filtered, deduplicated, typed
↓
[events_aggregated] ← Gold: business-level aggregates
SQL Example
-- Bronze: regular PostgreSQL table (source of truth)
CREATE TABLE raw_events (
event_id BIGSERIAL PRIMARY KEY,
user_id INT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Silver: cleaned and deduplicated events
SELECT pgtrickle.create_stream_table(
'events_cleaned',
$$SELECT DISTINCT ON (event_id)
event_id,
user_id,
event_type,
(payload->>'amount')::numeric AS amount,
received_at
FROM raw_events
WHERE event_type IN ('purchase', 'refund', 'subscription')$$,
schedule => '5s',
refresh_mode => 'DIFFERENTIAL'
);
-- Gold: per-user purchase summary
SELECT pgtrickle.create_stream_table(
'user_purchase_summary',
$$SELECT user_id,
COUNT(*) AS total_purchases,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM events_cleaned
WHERE event_type = 'purchase'
GROUP BY user_id$$,
schedule => 'calculated',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Layer | Refresh Mode | Schedule | Tier |
|---|---|---|---|
| Silver | DIFFERENTIAL | 5s – 30s | hot |
| Gold | DIFFERENTIAL | calculated | hot |
Anti-Patterns
- Don't use FULL refresh for Silver. With frequent small inserts, DIFFERENTIAL is 10–100x faster.
- Don't skip the Silver layer. Joining raw tables directly in Gold queries produces wider joins and slower deltas.
- Don't use IMMEDIATE mode for Gold. Aggregate maintenance on every DML row is expensive — batched DIFFERENTIAL is more efficient.
Pattern 2: Event Sourcing with Stream Tables
Use stream tables as projections of an append-only event log. The source table is the event store; stream tables materialize different read models.
SQL Example
-- Event store (append-only source)
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
aggregate_id UUID NOT NULL,
event_type TEXT NOT NULL,
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Projection 1: Current state per aggregate
SELECT pgtrickle.create_stream_table(
'aggregate_state',
$$SELECT DISTINCT ON (aggregate_id)
aggregate_id,
event_type AS last_event,
payload AS current_state,
created_at AS last_updated
FROM events
ORDER BY aggregate_id, created_at DESC$$,
schedule => '2s',
refresh_mode => 'DIFFERENTIAL'
);
-- Projection 2: Event counts by type per hour
SELECT pgtrickle.create_stream_table(
'hourly_event_counts',
$$SELECT date_trunc('hour', created_at) AS hour,
event_type,
COUNT(*) AS event_count
FROM events
GROUP BY 1, 2$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
Recommended Configuration
| Projection | Refresh Mode | Why |
|---|---|---|
| Current state | DIFFERENTIAL | Small delta per cycle; DISTINCT ON supported |
| Hourly counts | DIFFERENTIAL | Algebraic aggregate (COUNT), efficient delta |
| String aggregations | AUTO | GROUP_RESCAN aggs may benefit from FULL |
Anti-Patterns
- Don't DELETE from the event store. pg_trickle tracks changes via triggers; mixing append and delete on the source creates unnecessary delta complexity. Archive old events to a separate table.
- Don't use
append_only => truewith UPDATE/DELETE patterns. Theappend_onlyflag skips DELETE tracking in the change buffer — only use it when the source truly never updates or deletes.
Pattern 3: Slowly Changing Dimensions (SCD)
SCD Type 1: Overwrite
The stream table always reflects the current state. Source updates overwrite previous values.
-- Source: customer dimension table (updated in place)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
name TEXT NOT NULL,
email TEXT,
tier TEXT DEFAULT 'standard',
updated_at TIMESTAMPTZ DEFAULT now()
);
-- SCD-1: current customer state enriched with order stats
SELECT pgtrickle.create_stream_table(
'customer_360',
$$SELECT c.customer_id,
c.name,
c.email,
c.tier,
COUNT(o.id) AS total_orders,
COALESCE(SUM(o.amount), 0) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.email, c.tier$$,
schedule => '30s',
refresh_mode => 'DIFFERENTIAL'
);
SCD Type 2: History Tracking
For SCD-2, maintain a history table with valid-from/valid-to ranges. The stream table provides the current snapshot.
-- Source: customer history with validity ranges
CREATE TABLE customer_history (
customer_id INT NOT NULL,
name TEXT NOT NULL,
tier TEXT NOT NULL,
valid_from TIMESTAMPTZ NOT NULL,
valid_to TIMESTAMPTZ, -- NULL = current
PRIMARY KEY (customer_id, valid_from)
);
-- Current active records only
SELECT pgtrickle.create_stream_table(
'customers_current',
$$SELECT customer_id, name, tier, valid_from
FROM customer_history
WHERE valid_to IS NULL$$,
schedule => '10s',
refresh_mode => 'DIFFERENTIAL'
);
Anti-Patterns
- Don't use FULL refresh for SCD-1 with large dimension tables. Customer tables with millions of rows but few changes per cycle are ideal for DIFFERENTIAL.
- Don't forget to index
valid_to IS NULLfor SCD-2 sources. Without it, the delta scan touches all historical rows.
Pattern 4: High-Fan-Out Topology
When a single source table feeds many downstream stream tables.
Architecture
[orders]
↙ ↓ ↓ ↘
[daily_totals] [by_region] [by_product] [top_customers]
SQL Example
-- Single source feeding multiple views
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
region TEXT NOT NULL,
product_id INT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE
);
-- Fan-out: 4 stream tables on 1 source
SELECT pgtrickle.create_stream_table('daily_totals',
'SELECT order_date, SUM(amount) AS daily_total, COUNT(*) AS order_count
FROM orders GROUP BY order_date',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_region',
'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY region',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('by_product',
'SELECT product_id, SUM(amount) AS total, COUNT(*) AS cnt
FROM orders GROUP BY product_id',
schedule => '5s', refresh_mode => 'DIFFERENTIAL');
SELECT pgtrickle.create_stream_table('top_customers',
'SELECT customer_id, SUM(amount) AS lifetime_value, COUNT(*) AS order_count
FROM orders GROUP BY customer_id',
schedule => '10s', refresh_mode => 'DIFFERENTIAL');
Recommended Configuration
- All fan-out targets share the same source change buffer — CDC overhead
is paid once regardless of how many stream tables read from
orders. - Use
schedule => 'calculated'on downstream STs when they chain from other stream tables. - Consider
pg_trickle.max_workersif fan-out exceeds 8 (default: 4 workers).
Anti-Patterns
- Don't use IMMEDIATE mode on high-fan-out sources. Each DML row triggers N refreshes (one per downstream ST). Use DIFFERENTIAL with a batched schedule instead.
- Don't set different schedules on STs that should be consistent.
If
daily_totalsandby_regionmust agree, give them the same schedule or usediamond_consistency => 'atomic'.
Pattern 5: Real-Time Dashboards
For dashboards that need sub-second refresh latency.
SQL Example
-- Live order monitor (sub-second freshness)
SELECT pgtrickle.create_stream_table(
'order_monitor',
$$SELECT
date_trunc('minute', order_date) AS minute,
region,
COUNT(*) AS orders,
SUM(amount) AS revenue
FROM orders
WHERE order_date >= CURRENT_DATE
GROUP BY 1, 2$$,
schedule => '1s',
refresh_mode => 'DIFFERENTIAL'
);
-- For truly real-time needs, use IMMEDIATE mode (triggers on each DML)
SELECT pgtrickle.create_stream_table(
'live_counter',
$$SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
FROM orders GROUP BY region$$,
schedule => 'IMMEDIATE',
refresh_mode => 'DIFFERENTIAL'
);
When to Use IMMEDIATE vs Scheduled DIFFERENTIAL
| Scenario | Mode | Why |
|---|---|---|
| Dashboard polls every 1s | 1s | Batched delta amortizes overhead |
| GraphQL subscription, < 100ms | IMMEDIATE | Triggers fire synchronously per DML |
| Aggregate with GROUP_RESCAN | 5s+ | Avoid per-row full rescans |
| High write throughput (>1K/s) | 2s–5s | IMMEDIATE adds latency to each INSERT |
Anti-Patterns
- Don't use IMMEDIATE for complex joins. Each INSERT/UPDATE/DELETE fires the full DVM delta SQL synchronously — multi-table joins in IMMEDIATE mode add significant latency to writes.
- Don't forget
pooler_compatibility_modewith PgBouncer. Transaction pooling drops temp tables between transactions; enable this flag to avoid stale PREPARE statements.
Pattern 6: Tiered Refresh Strategy
Assign refresh importance tiers to control scheduling priority.
-- Hot: real-time operational dashboard
SELECT pgtrickle.create_stream_table('live_metrics', ...);
SELECT pgtrickle.alter_stream_table('live_metrics', tier => 'hot');
-- Warm: hourly business reports (2x interval multiplier)
SELECT pgtrickle.create_stream_table('hourly_report', ...,
schedule => '1m');
SELECT pgtrickle.alter_stream_table('hourly_report', tier => 'warm');
-- Cold: daily analytics (10x interval multiplier)
SELECT pgtrickle.create_stream_table('daily_analytics', ...,
schedule => '5m');
SELECT pgtrickle.alter_stream_table('daily_analytics', tier => 'cold');
-- Frozen: archive/audit (skip refresh entirely)
SELECT pgtrickle.alter_stream_table('audit_log_summary', tier => 'frozen');
Tier Multipliers
| Tier | Schedule Multiplier | Use Case |
|---|---|---|
| hot | 1x | Operational dashboards, alerts |
| warm | 2x | Hourly reports, batch pipelines |
| cold | 10x | Daily analytics, low-priority STs |
| frozen | skip | Paused/archived, manual refresh |
General Guidelines
Choosing a Refresh Mode
| Scenario | Recommended Mode |
|---|---|
| Source has < 5% change ratio per cycle | DIFFERENTIAL |
| Source changes > 50% per cycle | FULL |
| Query is a simple filter/projection | DIFFERENTIAL |
| Query has GROUP_RESCAN aggregates (MIN, MAX) | AUTO |
| Query joins 4+ tables | DIFFERENTIAL |
| Target table < 1000 rows | FULL |
| Need per-row latency guarantee | IMMEDIATE |
Use pgtrickle.recommend_refresh_mode() (v0.14.0+) for automated
analysis:
SELECT pgt_name, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();
Monitoring Checklist
-- Check refresh efficiency across all stream tables
SELECT pgt_name, refresh_mode, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;
-- Find stream tables that might benefit from mode change
SELECT pgt_name, current_mode, recommended_mode, reason
FROM pgtrickle.recommend_refresh_mode()
WHERE recommended_mode != 'KEEP';
-- Check for error states
SELECT pgt_name, status, last_error_message
FROM pgtrickle.stream_tables_info
WHERE status IN ('ERROR', 'SUSPENDED');
-- Export definitions for backup
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;
Common Mistakes
-
Using FULL refresh by default. Start with DIFFERENTIAL — it's correct for 80%+ of workloads. Switch to FULL only when
recommend_refresh_mode()suggests it. -
Over-scheduling. A 1-second schedule on a table with 1-hour change cycles wastes CPU. Match the schedule to actual data arrival rate.
-
Ignoring
append_only. If the source table is truly append-only (no UPDATEs, no DELETEs), setappend_only => trueto halve change buffer writes. -
Not using
calculatedschedule for chained STs. When ST-B reads from ST-A, useschedule => 'calculated'on ST-B to avoid unnecessary refreshes. The scheduler automatically propagates ST-A changes downstream. -
Mixing IMMEDIATE and complex joins. IMMEDIATE mode fires delta SQL on every DML — an 8-table join in IMMEDIATE mode adds 50–200ms to each INSERT. Use scheduled DIFFERENTIAL for complex queries.