Best-Practice Patterns for pg_trickle

This guide covers common data modeling patterns and recommended configurations for pg_trickle stream tables. Each pattern includes worked SQL examples, anti-patterns to avoid, and refresh mode recommendations.

Version: v0.14.0+. Some features require newer versions; check SQL_REFERENCE.md for per-feature availability.


Table of Contents

  • Pattern 1: Bronze / Silver / Gold Materialization
  • Pattern 2: Event Sourcing with Stream Tables
  • Pattern 3: Slowly Changing Dimensions (SCD)
  • Pattern 4: High-Fan-Out Topology
  • Pattern 5: Real-Time Dashboards
  • Pattern 6: Tiered Refresh Strategy
  • General Guidelines

Pattern 1: Bronze / Silver / Gold Materialization

A multi-layer approach where raw data flows through progressively refined stream tables, similar to a medallion architecture.

Architecture

  [raw_events]          ← Bronze: raw ingest table (regular table)
       ↓
  [events_cleaned]      ← Silver: filtered, deduplicated, typed
       ↓
  [events_aggregated]   ← Gold: business-level aggregates

SQL Example

-- Bronze: regular PostgreSQL table (source of truth)
CREATE TABLE raw_events (
    event_id    BIGSERIAL PRIMARY KEY,
    user_id     INT NOT NULL,
    event_type  TEXT NOT NULL,
    payload     JSONB,
    received_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Silver: cleaned and deduplicated events
SELECT pgtrickle.create_stream_table(
    'events_cleaned',
    $$SELECT DISTINCT ON (event_id)
        event_id,
        user_id,
        event_type,
        (payload->>'amount')::numeric AS amount,
        received_at
      FROM raw_events
      WHERE event_type IN ('purchase', 'refund', 'subscription')$$,
    schedule => '5s',
    refresh_mode => 'DIFFERENTIAL'
);

-- Gold: per-user purchase summary
SELECT pgtrickle.create_stream_table(
    'user_purchase_summary',
    $$SELECT user_id,
             COUNT(*) AS total_purchases,
             SUM(amount) AS total_spent,
             AVG(amount) AS avg_order
      FROM events_cleaned
      WHERE event_type = 'purchase'
      GROUP BY user_id$$,
    schedule => 'calculated',
    refresh_mode => 'DIFFERENTIAL'
);

  Layer    Refresh Mode   Schedule     Tier
  Silver   DIFFERENTIAL   5s–30s       hot
  Gold     DIFFERENTIAL   calculated   hot

Anti-Patterns

  • Don't use FULL refresh for Silver. With frequent small inserts, DIFFERENTIAL is 10–100x faster.
  • Don't skip the Silver layer. Joining raw tables directly in Gold queries produces wider joins and slower deltas.
  • Don't use IMMEDIATE mode for Gold. Aggregate maintenance on every DML row is expensive — batched DIFFERENTIAL is more efficient.

Pattern 2: Event Sourcing with Stream Tables

Use stream tables as projections of an append-only event log. The source table is the event store; stream tables materialize different read models.

SQL Example

-- Event store (append-only source)
CREATE TABLE events (
    event_id    BIGSERIAL PRIMARY KEY,
    aggregate_id UUID NOT NULL,
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Projection 1: Current state per aggregate
SELECT pgtrickle.create_stream_table(
    'aggregate_state',
    $$SELECT DISTINCT ON (aggregate_id)
        aggregate_id,
        event_type AS last_event,
        payload AS current_state,
        created_at AS last_updated
      FROM events
      ORDER BY aggregate_id, created_at DESC$$,
    schedule => '2s',
    refresh_mode => 'DIFFERENTIAL'
);

-- Projection 2: Event counts by type per hour
SELECT pgtrickle.create_stream_table(
    'hourly_event_counts',
    $$SELECT date_trunc('hour', created_at) AS hour,
             event_type,
             COUNT(*) AS event_count
      FROM events
      GROUP BY 1, 2$$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

  Projection            Refresh Mode   Why
  Current state         DIFFERENTIAL   Small delta per cycle; DISTINCT ON supported
  Hourly counts         DIFFERENTIAL   Algebraic aggregate (COUNT), efficient delta
  String aggregations   AUTO           GROUP_RESCAN aggs may benefit from FULL

Anti-Patterns

  • Don't DELETE from the event store. pg_trickle tracks changes via triggers; mixing append and delete on the source creates unnecessary delta complexity. Archive old events to a separate table.
  • Don't use append_only => true with UPDATE/DELETE patterns. The append_only flag skips DELETE tracking in the change buffer — only use it when the source truly never updates or deletes.
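
Since the event store in this pattern only ever receives INSERTs, it is a natural fit for the append_only flag. A minimal sketch (the flag name comes from the bullet above; treating it as a named parameter of create_stream_table is an assumption):

-- Append-only source: skip DELETE tracking in the change buffer
SELECT pgtrickle.create_stream_table(
    'events_by_day',
    $$SELECT created_at::date AS day, COUNT(*) AS events
      FROM events
      GROUP BY 1$$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL',
    append_only => true
);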

Pattern 3: Slowly Changing Dimensions (SCD)

SCD Type 1: Overwrite

The stream table always reflects the current state. Source updates overwrite previous values.

-- Source: customer dimension table (updated in place)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT,
    tier        TEXT DEFAULT 'standard',
    updated_at  TIMESTAMPTZ DEFAULT now()
);

-- SCD-1: current customer state enriched with order stats
SELECT pgtrickle.create_stream_table(
    'customer_360',
    $$SELECT c.customer_id,
             c.name,
             c.email,
             c.tier,
             COUNT(o.id) AS total_orders,
             COALESCE(SUM(o.amount), 0) AS lifetime_value
      FROM customers c
      LEFT JOIN orders o ON o.customer_id = c.customer_id
      GROUP BY c.customer_id, c.name, c.email, c.tier$$,
    schedule => '30s',
    refresh_mode => 'DIFFERENTIAL'
);

SCD Type 2: History Tracking

For SCD-2, maintain a history table with valid-from/valid-to ranges. The stream table provides the current snapshot.

-- Source: customer history with validity ranges
CREATE TABLE customer_history (
    customer_id INT NOT NULL,
    name        TEXT NOT NULL,
    tier        TEXT NOT NULL,
    valid_from  TIMESTAMPTZ NOT NULL,
    valid_to    TIMESTAMPTZ,  -- NULL = current
    PRIMARY KEY (customer_id, valid_from)
);

-- Current active records only
SELECT pgtrickle.create_stream_table(
    'customers_current',
    $$SELECT customer_id, name, tier, valid_from
      FROM customer_history
      WHERE valid_to IS NULL$$,
    schedule => '10s',
    refresh_mode => 'DIFFERENTIAL'
);

Anti-Patterns

  • Don't use FULL refresh for SCD-1 with large dimension tables. Customer tables with millions of rows but few changes per cycle are ideal for DIFFERENTIAL.
  • Don't forget a partial index on valid_to IS NULL for SCD-2 sources. Without it, every delta scan touches all historical rows.
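
The partial index for the current-rows predicate is plain PostgreSQL, independent of pg_trickle:

-- Covers the valid_to IS NULL predicate so delta scans skip history
CREATE INDEX customer_history_current_idx
    ON customer_history (customer_id)
    WHERE valid_to IS NULL;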

Pattern 4: High-Fan-Out Topology

When a single source table feeds many downstream stream tables.

Architecture

                    [orders]
                   ↙  ↓  ↓  ↘
  [daily_totals] [by_region] [by_product] [top_customers]

SQL Example

-- Single source feeding multiple views
CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    region      TEXT NOT NULL,
    product_id  INT NOT NULL,
    amount      NUMERIC(10,2) NOT NULL,
    order_date  DATE NOT NULL DEFAULT CURRENT_DATE
);

-- Fan-out: 4 stream tables on 1 source
SELECT pgtrickle.create_stream_table('daily_totals',
    'SELECT order_date, SUM(amount) AS daily_total, COUNT(*) AS order_count
     FROM orders GROUP BY order_date',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('by_region',
    'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
     FROM orders GROUP BY region',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('by_product',
    'SELECT product_id, SUM(amount) AS total, COUNT(*) AS cnt
     FROM orders GROUP BY product_id',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL');

SELECT pgtrickle.create_stream_table('top_customers',
    'SELECT customer_id, SUM(amount) AS lifetime_value, COUNT(*) AS order_count
     FROM orders GROUP BY customer_id',
    schedule => '10s', refresh_mode => 'DIFFERENTIAL');

Notes

  • All fan-out targets share the same source change buffer — CDC overhead is paid once regardless of how many stream tables read from orders.
  • Use schedule => 'calculated' on downstream STs when they chain from other stream tables.
  • Consider raising pg_trickle.max_workers if fan-out exceeds 8 stream tables (default: 4 workers).
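
Raising the worker count is an ordinary configuration change. A sketch, assuming pg_trickle.max_workers is a standard GUC (background-worker slots may require a server restart rather than a reload; check SQL_REFERENCE.md):

-- Allow more concurrent refresh workers for wide fan-out
ALTER SYSTEM SET pg_trickle.max_workers = 8;
SELECT pg_reload_conf();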

Anti-Patterns

  • Don't use IMMEDIATE mode on high-fan-out sources. Each DML row triggers N refreshes (one per downstream ST). Use DIFFERENTIAL with a batched schedule instead.
  • Don't set different schedules on STs that should be consistent. If daily_totals and by_region must agree, give them the same schedule or use diamond_consistency => 'atomic'.
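
Where by_region and daily_totals must agree, diamond_consistency could be applied at creation time. A sketch (the option name and 'atomic' value come from the bullet above; passing it as a named parameter of create_stream_table is an assumption):

SELECT pgtrickle.create_stream_table('by_region',
    'SELECT region, SUM(amount) AS total, COUNT(*) AS cnt
     FROM orders GROUP BY region',
    schedule => '5s', refresh_mode => 'DIFFERENTIAL',
    diamond_consistency => 'atomic');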

Pattern 5: Real-Time Dashboards

For dashboards that need sub-second refresh latency.

SQL Example

-- Live order monitor (sub-second freshness)
-- Note: orders.order_date is a DATE; substitute a timestamptz column
-- for true per-minute granularity.
SELECT pgtrickle.create_stream_table(
    'order_monitor',
    $$SELECT
        date_trunc('minute', order_date) AS minute,
        region,
        COUNT(*) AS orders,
        SUM(amount) AS revenue
      FROM orders
      WHERE order_date >= CURRENT_DATE
      GROUP BY 1, 2$$,
    schedule => '1s',
    refresh_mode => 'DIFFERENTIAL'
);

-- For truly real-time needs, use IMMEDIATE mode (triggers on each DML)
SELECT pgtrickle.create_stream_table(
    'live_counter',
    $$SELECT region, COUNT(*) AS cnt, SUM(amount) AS total
      FROM orders GROUP BY region$$,
    schedule => 'IMMEDIATE',
    refresh_mode => 'DIFFERENTIAL'
);

When to Use IMMEDIATE vs Scheduled DIFFERENTIAL

  Scenario                         Mode / Schedule   Why
  Dashboard polls every 1s         1s                Batched delta amortizes overhead
  GraphQL subscription, < 100ms    IMMEDIATE         Triggers fire synchronously per DML
  Aggregate with GROUP_RESCAN      5s+               Avoid per-row full rescans
  High write throughput (>1K/s)    2s–5s             IMMEDIATE adds latency to each INSERT

Anti-Patterns

  • Don't use IMMEDIATE for complex joins. Each INSERT/UPDATE/DELETE fires the full DVM delta SQL synchronously — multi-table joins in IMMEDIATE mode add significant latency to writes.
  • Don't forget pooler_compatibility_mode with PgBouncer. Transaction pooling drops temp tables between transactions; enable this flag to avoid errors from stale prepared statements.
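
How pooler_compatibility_mode is enabled depends on the deployment. One plausible shape, assuming the flag (named in the bullet above) is exposed as a GUC; the ALTER SYSTEM mechanics here are an assumption:

-- Pooler-safe behavior for PgBouncer transaction pooling
ALTER SYSTEM SET pg_trickle.pooler_compatibility_mode = on;
SELECT pg_reload_conf();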

Pattern 6: Tiered Refresh Strategy

Assign refresh importance tiers to control scheduling priority.

-- Hot: real-time operational dashboard
SELECT pgtrickle.create_stream_table('live_metrics', ...);
SELECT pgtrickle.alter_stream_table('live_metrics', tier => 'hot');

-- Warm: hourly business reports (2x interval multiplier)
SELECT pgtrickle.create_stream_table('hourly_report', ...,
    schedule => '1m');
SELECT pgtrickle.alter_stream_table('hourly_report', tier => 'warm');

-- Cold: daily analytics (10x interval multiplier)
SELECT pgtrickle.create_stream_table('daily_analytics', ...,
    schedule => '5m');
SELECT pgtrickle.alter_stream_table('daily_analytics', tier => 'cold');

-- Frozen: archive/audit (skip refresh entirely)
SELECT pgtrickle.alter_stream_table('audit_log_summary', tier => 'frozen');

Tier Multipliers

  Tier     Schedule Multiplier   Use Case
  hot      1x                    Operational dashboards, alerts
  warm     2x                    Hourly reports, batch pipelines
  cold     10x                   Daily analytics, low-priority STs
  frozen   skip                  Paused/archived, manual refresh

With the multipliers applied, the warm example above (schedule => '1m') refreshes every 2 minutes, and the cold example (schedule => '5m') every 50 minutes.

General Guidelines

Choosing a Refresh Mode

  Scenario                                       Recommended Mode
  Source has < 5% change ratio per cycle         DIFFERENTIAL
  Source changes > 50% per cycle                 FULL
  Query is a simple filter/projection            DIFFERENTIAL
  Query has GROUP_RESCAN aggregates (MIN, MAX)   AUTO
  Query joins 4+ tables                          DIFFERENTIAL
  Target table < 1000 rows                       FULL
  Need per-row latency guarantee                 IMMEDIATE
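
MIN and MAX are the GROUP_RESCAN aggregates named above. A sketch of a stream table that leaves the mode choice to AUTO, reusing the orders table from Pattern 4 ('price_extremes' is an illustrative name):

SELECT pgtrickle.create_stream_table(
    'price_extremes',
    $$SELECT product_id,
             MIN(amount) AS min_price,
             MAX(amount) AS max_price
      FROM orders
      GROUP BY product_id$$,
    schedule => '30s',
    refresh_mode => 'AUTO'
);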

Use pgtrickle.recommend_refresh_mode() (v0.14.0+) for automated analysis:

SELECT pgt_name, recommended_mode, confidence, reason
FROM pgtrickle.recommend_refresh_mode();

Monitoring Checklist

-- Check refresh efficiency across all stream tables
SELECT pgt_name, refresh_mode, diff_speedup, avg_change_ratio
FROM pgtrickle.refresh_efficiency()
ORDER BY total_refreshes DESC;

-- Find stream tables that might benefit from mode change
SELECT pgt_name, current_mode, recommended_mode, reason
FROM pgtrickle.recommend_refresh_mode()
WHERE recommended_mode != 'KEEP';

-- Check for error states
SELECT pgt_name, status, last_error_message
FROM pgtrickle.stream_tables_info
WHERE status IN ('ERROR', 'SUSPENDED');

-- Export definitions for backup
SELECT pgtrickle.export_definition(pgt_schema || '.' || pgt_name)
FROM pgtrickle.pgt_stream_tables;

Common Mistakes

  1. Using FULL refresh by default. Start with DIFFERENTIAL — it's correct for 80%+ of workloads. Switch to FULL only when recommend_refresh_mode() suggests it.

  2. Over-scheduling. A 1-second schedule on a table with 1-hour change cycles wastes CPU. Match the schedule to actual data arrival rate.

  3. Ignoring append_only. If the source table is truly append-only (no UPDATEs, no DELETEs), set append_only => true to halve change buffer writes.

  4. Not using calculated schedule for chained STs. When ST-B reads from ST-A, use schedule => 'calculated' on ST-B to avoid unnecessary refreshes. The scheduler automatically propagates ST-A changes downstream.

  5. Mixing IMMEDIATE and complex joins. IMMEDIATE mode fires delta SQL on every DML — an 8-table join in IMMEDIATE mode adds 50–200ms to each INSERT. Use scheduled DIFFERENTIAL for complex queries.
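
Mistakes 2 and 4 are usually fixable in place rather than by rebuilding. A sketch, assuming alter_stream_table accepts a schedule parameter (this guide only demonstrates its tier parameter):

-- Slow an over-scheduled ST down to match actual data arrival
SELECT pgtrickle.alter_stream_table('daily_analytics', schedule => '1h');

-- Switch a chained ST to dependency-driven refresh
SELECT pgtrickle.alter_stream_table('hourly_report', schedule => 'calculated');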