What Happens When You UPDATE a Row?

This tutorial traces what happens when an UPDATE statement hits a base table that is referenced by a stream table. It covers the trigger capture, the scan-level decomposition into DELETE + INSERT, and how each DVM operator propagates the change — including cases where the group key changes, where JOINs are involved, and where multiple UPDATEs happen within a single refresh window.

Prerequisite: Read WHAT_HAPPENS_ON_INSERT.md first — it introduces the full 7-phase lifecycle. This tutorial focuses on how UPDATE differs.

Setup

Same e-commerce example:

CREATE TABLE orders (
    id       SERIAL PRIMARY KEY,
    customer TEXT NOT NULL,
    amount   NUMERIC(10,2) NOT NULL
);

SELECT pgstream.create_stream_table(
    'customer_totals',
    $$
      SELECT customer, SUM(amount) AS total, COUNT(*) AS order_count
      FROM orders GROUP BY customer
    $$,
    '1m',
    'DIFFERENTIAL'
);

-- Seed some data
INSERT INTO orders (customer, amount) VALUES
    ('alice', 49.99),
    ('alice', 30.00),
    ('bob',   75.00);

After the first refresh, the stream table contains:

customer | total | order_count
---------|-------|------------
alice    | 79.99 | 2
bob      | 75.00 | 1

Case 1: Simple Value UPDATE (Same Group Key)

UPDATE orders SET amount = 59.99 WHERE id = 1;

Alice's first order changes from 49.99 to 59.99. The customer (group key) stays the same.

Phase 1: Trigger Capture

The AFTER UPDATE trigger fires and writes one row to the change buffer with both OLD and NEW values:

pgstream_changes.changes_16384
┌───────────┬─────────────┬────────┬──────────┬──────────┬────────────┬──────────┬────────────┐
│ change_id │ lsn         │ action │ new_cust │ new_amt  │ old_cust   │ old_amt  │ pk_hash    │
├───────────┼─────────────┼────────┼──────────┼──────────┼────────────┼──────────┼────────────┤
│ 4         │ 0/1A3F3000  │ U      │ alice    │ 59.99    │ alice      │ 49.99    │ -837291    │
└───────────┴─────────────┴────────┴──────────┴──────────┴────────────┴──────────┴────────────┘

Key difference from INSERT: the trigger writes both new_* and old_* columns. The pk_hash is computed from NEW.id.

Phase 2–4: Scheduler, Frontier, Change Detection

Identical to the INSERT flow. The scheduler detects one change row in the LSN window.

Phase 5: Scan Differentiation — The U → D+I Split

This is where UPDATE handling diverges fundamentally. The scan delta operator decomposes the UPDATE into two events:

__pgs_row_id | __pgs_action | customer | amount
-------------|--------------|----------|-------
-837291      | D            | alice    | 49.99     ← old values (DELETE)
-837291      | I            | alice    | 59.99     ← new values (INSERT)

Why split into D+I? This is a core IVM principle. Downstream operators (aggregates, joins, filters) don't have special "update" logic — they only understand insertions and deletions. By decomposing the UPDATE:

  • The DELETE event subtracts the old values from running aggregates
  • The INSERT event adds the new values

This algebraic approach handles arbitrary operator trees without operator-specific update logic.

Phase 5 (continued): Aggregate Differentiation

The aggregate operator processes both events against the alice group:

-- DELETE event: subtract old values
alice: total += CASE WHEN action='D' THEN -49.99 END  →  -49.99
alice: count += CASE WHEN action='D' THEN -1 END       →  -1

-- INSERT event: add new values
alice: total += CASE WHEN action='I' THEN +59.99 END  →  +59.99
alice: count += CASE WHEN action='I' THEN +1 END       →  +1

Net effect on alice's group:

total delta:  -49.99 + 59.99 = +10.00
count delta:  -1 + 1 = 0

The aggregate emits this as an INSERT (because the group still exists and its value changed):

customer | total  | order_count | __pgs_row_id | __pgs_action
---------|--------|-------------|--------------|-------------
alice    | +10.00 | 0           | 7283194      | I

Phase 6: MERGE

The MERGE updates the existing row:

-- MERGE WHEN MATCHED AND action = 'I' THEN UPDATE:
-- alice's total: 79.99 + 10.00 = 89.99  (via reference counting)
-- alice's count: 2 + 0 = 2

Wait — that's not right. The MERGE doesn't add deltas; it replaces the row. The aggregate delta query actually computes the new absolute value by combining the stored state with the delta:

COALESCE(existing.total, 0) + delta.total  → 79.99 + 10.00 = 89.99
COALESCE(existing.__pgs_count, 0) + delta.__pgs_count → 2 + 0 = 2

Result:

SELECT * FROM customer_totals;
 customer | total | order_count
----------|-------|------------
 alice    | 89.99 | 2            ← was 79.99
 bob      | 75.00 | 1

Case 2: Group Key Change (Customer Reassignment)

UPDATE orders SET customer = 'bob' WHERE id = 2;

Alice's second order (amount=30.00) is reassigned to Bob. The group key itself changes.

Trigger Capture

change_id | lsn         | action | new_cust | new_amt | old_cust | old_amt | pk_hash
5         | 0/1A3F3100  | U      | bob      | 30.00   | alice    | 30.00   | 4521038

The old and new customer values differ.

Scan Delta: D+I Split

__pgs_row_id | __pgs_action | customer | amount
-------------|--------------|----------|-------
4521038      | D            | alice    | 30.00    ← removes from alice's group
4521038      | I            | bob      | 30.00    ← adds to bob's group

Aggregate Delta

The aggregate groups by customer, so the DELETE and INSERT land in different groups:

Group "alice":
  total delta:  -30.00
  count delta:  -1

Group "bob":
  total delta:  +30.00
  count delta:  +1

After MERGE

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 alice    | 59.99  | 1            ← lost one order (-30.00)
 bob      | 105.00 | 2            ← gained one order (+30.00)

This is why the D+I decomposition is essential. Without it, you'd need special "move between groups" logic. With it, the standard aggregate differentiation handles group key changes naturally.


Case 3: UPDATE That Deletes a Group

-- Alice only has one order left. Reassign it to bob.
UPDATE orders SET customer = 'bob' WHERE id = 1;

Aggregate Delta

Group "alice":
  total delta:    -59.99
  count delta:    -1
  new __pgs_count: 1 - 1 = 0  → group vanishes!

Group "bob":
  total delta:    +59.99
  count delta:    +1

When __pgs_count reaches 0, the aggregate emits a DELETE for alice's group:

customer | total | __pgs_row_id | __pgs_action
---------|-------|--------------|-------------
alice    | —     | 7283194      | D             ← group removed
bob      | ...   | 9182734      | I             ← group updated

The MERGE deletes alice's row entirely:

SELECT * FROM customer_totals;
 customer | total  | order_count
----------|--------|------------
 bob      | 165.00 | 3

Case 4: Multiple UPDATEs on the Same Row (Within One Refresh Window)

What if a row is updated multiple times before the next refresh?

UPDATE orders SET amount = 10.00 WHERE id = 3;  -- bob: 75 → 10
UPDATE orders SET amount = 20.00 WHERE id = 3;  -- bob: 10 → 20
UPDATE orders SET amount = 30.00 WHERE id = 3;  -- bob: 20 → 30

The change buffer now has 3 rows for pk_hash of order #3:

change_id | action | old_amt | new_amt
6         | U      | 75.00   | 10.00
7         | U      | 10.00   | 20.00
8         | U      | 20.00   | 30.00

Net-Effect Computation

The scan delta uses a split fast-path design. Since order #3 has multiple changes (cnt > 1), it takes the multi-change path with window functions:

FIRST_VALUE(action) OVER (PARTITION BY pk_hash ORDER BY change_id)  → 'U'
LAST_VALUE(action) OVER (...)                                        → 'U'

Both first and last actions are 'U', so:

  • DELETE: emits using old values from the earliest change (change_id=6): old_amt = 75.00
  • INSERT: emits using new values from the latest change (change_id=8): new_amt = 30.00

Net delta:

__pgs_row_id | __pgs_action | amount
-------------|--------------|-------
pk_hash_3    | D            | 75.00    ← original value before all changes
pk_hash_3    | I            | 30.00    ← final value after all changes

The aggregate sees -75.00 + 30.00 = -45.00. This is correct regardless of the intermediate values. The intermediate rows (10.00, 20.00) are never seen.


Case 5: INSERT + UPDATE in Same Window

INSERT INTO orders (customer, amount) VALUES ('charlie', 100.00);
UPDATE orders SET amount = 200.00 WHERE customer = 'charlie';

Both happen before the next refresh. The buffer has:

change_id | action | old_amt | new_amt
9         | I      | NULL    | 100.00
10        | U      | 100.00  | 200.00

Net-effect analysis:

  • first_action = 'I' (row didn't exist before this window)
  • last_action = 'U' (row exists after)

Result:

  • No DELETE emitted (first_action = 'I' means the row was born in this window)
  • INSERT with final values: (charlie, 200.00)

The aggregate sees a pure insertion of (charlie, 200.00) — the intermediate value of 100.00 never appears.


Case 6: UPDATE + DELETE in Same Window

UPDATE orders SET amount = 999.99 WHERE id = 3;
DELETE FROM orders WHERE id = 3;

Net-effect:

  • first_action = 'U' (row existed before)
  • last_action = 'D' (row no longer exists)

Result:

  • DELETE with original old values from the first change
  • No INSERT (last_action = 'D')

The aggregate correctly sees only a removal.


Case 7: UPDATE with JOINs

Consider a stream table that joins two tables:

CREATE TABLE customers (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    tier TEXT NOT NULL DEFAULT 'standard'
);

CREATE TABLE orders (
    id          SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(id),
    amount      NUMERIC(10,2)
);

SELECT pgstream.create_stream_table(
    'order_details',
    $$
      SELECT c.name, c.tier, o.amount
      FROM orders o
      JOIN customers c ON o.customer_id = c.id
    $$,
    '1m',
    'DIFFERENTIAL'
);

Now update a customer's tier:

UPDATE customers SET tier = 'premium' WHERE name = 'alice';

How the JOIN Delta Works

The join differentiation follows the formula:

$$\Delta(L \bowtie R) = (\Delta L \bowtie R) \cup (L \bowtie \Delta R) - (\Delta L \bowtie \Delta R)$$

Since only the customers table changed:

  • $\Delta L$ = changes to orders (empty)
  • $\Delta R$ = changes to customers (alice's tier: standard → premium)

So:

  • Part 1: $\Delta\text{orders} \bowtie \text{customers}$ = empty (no order changes)
  • Part 2: $\text{orders} \bowtie \Delta\text{customers}$ = all of alice's orders joined with her tier change
  • Part 3: $\Delta\text{orders} \bowtie \Delta\text{customers}$ = empty (no order changes)

Part 2 produces the delta: for each of alice's orders, DELETE the old row (with tier='standard') and INSERT a new row (with tier='premium').

The stream table is updated to reflect the new tier across all of alice's order rows.


Performance Summary

ScenarioBuffer rowsDelta rows emittedWork
Simple value change12 (D+I)O(1) per group
Group key change12 (D+I, different groups)O(1) per affected group
Group deletion11 (D) + 1 (I) or 1 (D)O(1)
N updates same rowN2 (D first-old + I last-new)O(N) scan, O(1) aggregate
INSERT+UPDATE same window21 (I only)O(1)
UPDATE+DELETE same window21 (D only)O(1)

In all cases, the work is proportional to the number of changed rows, not the total table size. A single UPDATE on a billion-row table produces the same delta cost as on a 10-row table.


Next in This Series