Custom SQL Syntax for PostgreSQL Extensions

Comprehensive Technical Research Report

Date: 2026-02-25
Context: pg_stream extension — evaluating approaches to support CREATE STREAM TABLE syntax or equivalent native-feeling DDL.


Table of Contents

  1. Executive Summary
  2. PostgreSQL Parser Hooks / Utility Hooks
  3. The ProcessUtility_hook Approach
  4. Raw Parser Extension (gram.y)
  5. The Utility Command Approach
  6. Custom Access Methods (CREATE ACCESS METHOD)
  7. Table Access Method API (PostgreSQL 12+)
  8. Foreign Data Wrapper Approach
  9. Event Triggers
  10. TimescaleDB Continuous Aggregates Pattern
  11. Citus Distributed DDL Pattern
  12. PostgreSQL 18 New Features
  13. COMMENT / OPTIONS Abuse Pattern
  14. pg_ivm (Incremental View Maintenance) Pattern
  15. CREATE TABLE ... USING (Table Access Methods) Deep Dive
  16. Comparison Matrix
  17. Recommendations for pg_stream

1. Executive Summary

PostgreSQL's parser is not extensible — there is no parser hook that allows extensions to add new grammar rules. This is a fundamental design constraint. Every approach to "custom DDL syntax" in extensions falls into one of two categories:

  1. Intercept existing syntax — Use ProcessUtility_hook or event triggers to intercept standard DDL (e.g., CREATE TABLE, CREATE VIEW) and augment its behavior.
  2. Use a SQL function as the DDL interface — Define SELECT my_extension.create_thing(...) as the user-facing API (this is what pg_stream currently does).

No production PostgreSQL extension ships truly new SQL grammar without forking the PostgreSQL parser. TimescaleDB, Citus, pg_ivm, and others all work within existing syntax boundaries.


2. PostgreSQL Parser Hooks / Utility Hooks

Available Hook Points

PostgreSQL provides several hook function pointers that extensions can override in _PG_init():

  • ProcessUtility_hook (tcop/utility.h): Intercept utility (DDL) statement execution
  • post_parse_analyze_hook (parser/analyze.h): Inspect/modify the analyzed parse tree after semantic analysis
  • planner_hook (optimizer/planner.h): Replace or augment the query planner
  • ExecutorStart_hook (executor/executor.h): Intercept executor startup
  • ExecutorRun_hook (executor/executor.h): Intercept executor row processing
  • ExecutorFinish_hook (executor/executor.h): Intercept executor finish
  • ExecutorEnd_hook (executor/executor.h): Intercept executor cleanup
  • object_access_hook (catalog/objectaccess.h): Notifications when objects are created/modified/dropped
  • emit_log_hook (utils/elog.h): Intercept log messages

What's Missing: No Parser Hook

There is no parser_hook or raw_parser_hook. The raw parser (the gram.y Bison grammar plus the scan.l Flex lexer) is compiled into the PostgreSQL server binary. Extensions cannot:

  • Add new keywords (e.g., STREAM)
  • Add new grammar productions (e.g., CREATE STREAM TABLE)
  • Modify the tokenizer/lexer
  • Intercept raw SQL text before parsing

The closest hook is post_parse_analyze_hook, which fires after the SQL has already been parsed and analyzed. By this point:

  • The SQL string has already been tokenized and parsed by gram.y
  • A parse tree (Query node) has been produced
  • If the SQL contains unknown syntax, a syntax error has already been raised

Technical Details of post_parse_analyze_hook

/* In src/backend/parser/analyze.c */
typedef void (*post_parse_analyze_hook_type)(ParseState *pstate,
                                             Query *query,
                                             JumbleState *jstate);
post_parse_analyze_hook_type post_parse_analyze_hook = NULL;

Extensions can set this in _PG_init():

static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

void _PG_init(void) {
    prev_post_parse_analyze_hook = post_parse_analyze_hook;
    post_parse_analyze_hook = my_post_parse_analyze;
}

Use cases: Query rewriting after parsing (e.g., adding security predicates, row-level security), statistics collection, plan caching invalidation. Not usable for new syntax because parsing has already completed.

Pros/Cons

  • Native syntax: Impossible — cannot add new grammar
  • Intercept existing DDL: Yes, via ProcessUtility_hook
  • Modify parsed queries: Yes, via post_parse_analyze_hook
  • Complexity: Low for hooking, but limited in capability
  • PG version: All modern versions (hooks stable since PG 9.x)
  • Maintenance: Very low — hook signatures rarely change

3. The ProcessUtility_hook Approach

How It Works

ProcessUtility_hook is the most powerful DDL interception point. It fires for every "utility statement" (DDL, COPY, EXPLAIN, etc.) after parsing but before execution.

typedef void (*ProcessUtility_hook_type)(PlannedStmt *pstmt,
                                         const char *queryString,
                                         bool readOnlyTree,
                                         ProcessUtilityContext context,
                                         ParamListInfo params,
                                         QueryEnvironment *queryEnv,
                                         DestReceiver *dest,
                                         QueryCompletion *qc);

An extension can:

  1. Inspect the parse tree node — The PlannedStmt->utilityStmt field contains the parsed DDL node (e.g., CreateStmt, AlterTableStmt, ViewStmt).
  2. Modify the parse tree — Change fields before passing to the standard handler.
  3. Replace execution entirely — Skip calling the standard handler and do something else.
  4. Post-process — Call the standard handler first, then do additional work.
  5. Block execution — Raise an error to prevent the DDL.

What Extensions Use This

  • TimescaleDB (CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE INDEX, etc.): Convert regular tables to hypertables, distribute DDL
  • Citus (most DDL statements): Propagate DDL to worker nodes
  • pg_partman (CREATE TABLE, partition DDL): Auto-manage partitioning
  • pg_stat_statements (all utility statements): Track DDL execution statistics
  • pgAudit (all utility statements): Audit logging
  • pg_hint_plan: Uses post_parse_analyze_hook instead
  • sepgsql (object creation/modification): Security label enforcement

Can It Handle New Syntax?

No. It can only intercept DDL that PostgreSQL's parser already understands. You cannot use ProcessUtility_hook to handle CREATE STREAM TABLE because the parser will reject that syntax before the hook is ever called.

However, it can intercept and augment existing syntax:

  • CREATE TABLE ... (some_option) → Intercept CreateStmt, check for special markers, do extra work
  • CREATE VIEW ... WITH (custom_option = true) → Intercept ViewStmt, check reloptions
  • CREATE MATERIALIZED VIEW ... WITH (custom = true) → Same approach

Pattern: Intercepting CREATE TABLE

static void my_process_utility(PlannedStmt *pstmt, const char *queryString, ...) {
    Node *parsetree = pstmt->utilityStmt;

    if (IsA(parsetree, CreateStmt)) {
        CreateStmt *stmt = (CreateStmt *) parsetree;
        // Check for a special reloption or table name pattern
        ListCell *lc;
        foreach(lc, stmt->options) {
            DefElem *opt = (DefElem *) lfirst(lc);
            if (strcmp(opt->defname, "stream") == 0) {
                // This is a stream table! Do custom logic.
                create_stream_table_from_ddl(stmt, queryString);
                return; // Don't call standard handler
            }
        }
    }

    // Pass through to standard handler
    if (prev_ProcessUtility)
        prev_ProcessUtility(pstmt, ...);
    else
        standard_ProcessUtility(pstmt, ...);
}

Pros/Cons

  • Native CREATE STREAM TABLE: No — parser rejects unknown syntax
  • CREATE TABLE ... WITH (stream=true): Yes — feasible via reloptions
  • Complexity: Medium — must carefully chain with other extensions
  • PG version: All modern versions
  • Maintenance: Low — the hook signature changes rarely (most recently in PG 13/PG 14)
  • Risk: Must always chain prev_ProcessUtility — a misbehaving hook can break other extensions

4. Raw Parser Extension (gram.y)

How It Works

PostgreSQL's SQL parser is a Bison-generated LALR(1) parser defined in:

  • src/backend/parser/gram.y — Grammar rules (~18,000 lines)
  • src/backend/parser/scan.l — Flex lexer (tokenizer)
  • src/include/parser/kwlist.h — Reserved/unreserved keyword list

To add CREATE STREAM TABLE, you would:

  1. Add STREAM to the keyword list (unreserved or reserved)
  2. Add grammar rules to gram.y:
    CreateStreamTableStmt:
        CREATE STREAM TABLE qualified_name '(' OptTableElementList ')'
        OptWith AS SelectStmt
        {
            CreateStreamTableStmt *n = makeNode(CreateStreamTableStmt);
            n->relation = $4;
            n->query = $10;
            /* ... */
            $$ = (Node *) n;
        }
    ;
    
  3. Add a new NodeTag for CreateStreamTableStmt
  4. Handle it in ProcessUtility
  5. Rebuild the PostgreSQL server

Implications

This requires forking PostgreSQL. The modified parser is compiled into the postgres binary. You cannot ship a grammar modification as a loadable extension (.so/.dylib).

Who Does This?

  • YugabyteDB — Fork of PG with custom grammar for distributed features
  • CockroachDB — Entirely custom parser (Go, not PG's Bison grammar)
  • Amazon Aurora (partially) — Custom grammar additions for Aurora-specific features
  • Greenplum — Fork of PG with added grammar for DISTRIBUTED BY, PARTITION BY etc.
  • ParadeDB — Fork of PG with some custom syntax additions

Pros/Cons

  • Native CREATE STREAM TABLE: Yes — full parser-level support
  • Complexity: Very high — must maintain a PG fork
  • PG version: Tied to a single PG version
  • Maintenance: Extremely high — must rebase on every PG release (gram.y changes significantly between major versions)
  • Distribution: Cannot use CREATE EXTENSION; must ship the entire modified PostgreSQL
  • User adoption: Very low — users must replace their PostgreSQL installation
  • psql autocomplete: Would work, with matching psql modifications
  • pg_dump/pg_restore: Broken unless you also modify those tools
Verdict: Not viable for an extension. Only viable for a PostgreSQL fork/distribution.


5. The Utility Command Approach

How It Works

Some sources reference a "custom utility command" mechanism. In practice, this does not exist as a formal PostgreSQL extension point. What people sometimes mean is one of:

5a. Using DO Blocks as Custom Commands

DO $$ BEGIN PERFORM pgstream.create_stream_table('my_st', 'SELECT ...'); END $$;

This is just a wrapped function call — not a real custom command.

5b. Abusing COMMENT or SET for Command Dispatch

Some extensions parse custom commands from strings:

-- Using SET to pass commands
SET myext.command = 'CREATE STREAM TABLE my_st AS SELECT ...';
SELECT myext.execute_pending_command();

Or using post_parse_analyze_hook to intercept a specially-formatted query:

-- Extension intercepts this via post_parse_analyze_hook
SELECT * FROM myext.dispatch('CREATE STREAM TABLE ...');

5c. Overloading Existing Syntax

Some extensions overload SELECT or CALL:

CALL pgstream.create_stream_table('my_st', $$SELECT ...$$);

CALL was introduced in PostgreSQL 11 for stored procedures. Using it makes the DDL feel more "command-like" than SELECT function().

Pros/Cons

  • Native syntax: No — still a function call in disguise
  • User experience: Moderate — CALL is better than SELECT
  • Complexity: Low
  • PG version: PG 11+ for CALL
  • Maintenance: Very low

6. Custom Access Methods (CREATE ACCESS METHOD)

How It Works

PostgreSQL supports extension-defined access methods (index AMs and table AMs):

CREATE ACCESS METHOD my_am TYPE TABLE HANDLER my_am_handler;

This was introduced in PostgreSQL 9.6 for index AMs and extended to table AMs in PostgreSQL 12. The CREATE ACCESS METHOD statement shows PostgreSQL's philosophy: extensions can define new implementations of existing concepts (tables, indexes) but not new concepts (stream tables).

Table AM vs. Index AM

  • Index AM (PG 9.6): IndexAmRoutine with scan/insert/delete callbacks — examples: bloom, brin, GiST
  • Table AM (PG 12): TableAmRoutine with 60+ callbacks — examples: heap (default), columnar (Citus), zedstore (experimental)

Can We Use This for Stream Tables?

The table AM API defines how tuples are stored and retrieved, not how tables are created or maintained. A stream table's key features are:

  • Defining query — Not part of the table AM concept
  • Automatic refresh — Not part of the table AM concept
  • Change tracking — Could partially overlap with table AM's tuple modification callbacks
  • Storage — The actual storage could use heap (default) AM

You could theoretically create a custom table AM that:

  1. Uses heap storage underneath
  2. Intercepts INSERT/UPDATE/DELETE to maintain change buffers
  3. Adds custom metadata

But this would be an extreme abuse of the API. Table AMs are meant for storage engines, not for implementing materialized view semantics.

Pros/Cons

  • Native syntax: No — CREATE TABLE ... USING my_am is the closest
  • Complexity: Extremely high — 60+ callbacks to implement
  • Fitness: Poor — table AM is about storage, not view maintenance
  • PG version: PG 12+
  • Maintenance: High — AM API evolves between major versions

7. Table Access Method API (PostgreSQL 12+)

Deep Technical Details

The Table Access Method (AM) API was introduced in PostgreSQL 12 via commit c2fe139c20 by Andres Freund. It abstracts the storage layer, allowing extensions to replace the default heap storage with custom implementations.

The CREATE TABLE ... USING Syntax

-- Use default AM (heap)
CREATE TABLE normal_table (id int, data text);

-- Use custom AM
CREATE TABLE my_table (id int, data text) USING my_custom_am;

-- Set default for a database
SET default_table_access_method = 'my_custom_am';

TableAmRoutine Structure

The handler function must return a TableAmRoutine struct with callbacks:

typedef struct TableAmRoutine {
    NodeTag type;

    /* Slot callbacks */
    const TupleTableSlotOps *(*slot_callbacks)(Relation rel);

    /* Scan callbacks */
    TableScanDesc (*scan_begin)(Relation rel, Snapshot snap, int nkeys, ...);
    void (*scan_end)(TableScanDesc scan);
    void (*scan_rescan)(TableScanDesc scan, ...);
    bool (*scan_getnextslot)(TableScanDesc scan, ...);

    /* Parallel scan */
    Size (*parallelscan_estimate)(Relation rel);
    Size (*parallelscan_initialize)(Relation rel, ...);
    void (*parallelscan_reinitialize)(Relation rel, ...);

    /* Index fetch */
    IndexFetchTableData *(*index_fetch_begin)(Relation rel);
    void (*index_fetch_reset)(IndexFetchTableData *data);
    void (*index_fetch_end)(IndexFetchTableData *data);
    bool (*index_fetch_tuple)(IndexFetchTableData *data, ...);

    /* Tuple modification */
    void (*tuple_insert)(Relation rel, TupleTableSlot *slot, ...);
    void (*tuple_insert_speculative)(Relation rel, ...);
    void (*tuple_complete_speculative)(Relation rel, ...);
    void (*multi_insert)(Relation rel, TupleTableSlot **slots, int nslots, ...);
    TM_Result (*tuple_delete)(Relation rel, ItemPointer tid, ...);
    TM_Result (*tuple_update)(Relation rel, ItemPointer otid, ...);
    TM_Result (*tuple_lock)(Relation rel, ItemPointer tid, ...);

    /* DDL callbacks */
    void (*relation_set_new_filelocator)(Relation rel, ...);
    void (*relation_nontransactional_truncate)(Relation rel);
    void (*relation_copy_data)(Relation rel, const RelFileLocator *newrlocator);
    void (*relation_copy_for_cluster)(Relation rel, ...);
    void (*relation_vacuum)(Relation rel, VacuumParams *params, ...);
    bool (*scan_analyze_next_block)(TableScanDesc scan, ...);
    bool (*scan_analyze_next_tuple)(TableScanDesc scan, ...);

    /* Planner support */
    void (*relation_estimate_size)(Relation rel, int32 *attr_widths, ...);

    /* ... more callbacks */
} TableAmRoutine;

Hybrid Approach: Table AM + ProcessUtility_hook

A more practical pattern:

  1. Register a custom table AM (e.g., stream_am) that wraps heap
  2. Use ProcessUtility_hook to intercept CREATE TABLE ... USING stream_am
  3. When detected, perform stream table registration (catalog, CDC, etc.)
  4. The actual storage uses standard heap via delegation
-- User writes:
CREATE TABLE order_totals (region text, total numeric)
    USING stream_am
    WITH (query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
          schedule = '1m',
          refresh_mode = 'DIFFERENTIAL');

Problems with This Approach

  1. Column list is mandatory — CREATE TABLE ... USING requires explicit column definitions. Stream tables should derive columns from the query.
  2. Query in WITH clause — Storing a full SQL query in reloptions is hacky and has length limits.
  3. AS SELECT caveats — CREATE TABLE ... USING am AS SELECT is accepted (PG 12+), but it materializes the query once; nothing associates the defining query with the table afterwards.
  4. VACUUM, ANALYZE complexity — Must implement or delegate all maintenance callbacks.
  5. pg_dump compatibility — pg_dump would dump CREATE TABLE ... USING stream_am but not the associated metadata (query, schedule, etc.)

Pros/Cons

  • Native syntax: Partial — CREATE TABLE ... USING stream_am
  • Feels like a stream table: No — still looks like a regular table with options
  • Complexity: Very high
  • pg_dump: Broken — metadata in catalog tables won't be dumped
  • PG version: PG 12+
  • Maintenance: High — table AM API changes between versions

8. Foreign Data Wrapper Approach

How It Works

Foreign Data Wrappers (FDW) allow PostgreSQL to access external data sources via CREATE FOREIGN TABLE. An extension can register a custom FDW:

CREATE EXTENSION pg_stream;
CREATE SERVER stream_server FOREIGN DATA WRAPPER pgstream_fdw;

CREATE FOREIGN TABLE order_totals (region text, total numeric)
    SERVER stream_server
    OPTIONS (
        query 'SELECT region, SUM(amount) FROM orders GROUP BY region',
        schedule '1m',
        refresh_mode 'DIFFERENTIAL'
    );

FDW API

The FDW API provides callbacks for:

  • GetForeignRelSize — Estimate relation size for planning
  • GetForeignPaths — Generate access paths
  • GetForeignPlan — Create a plan node
  • BeginForeignScan — Start scan
  • IterateForeignScan — Get next tuple
  • EndForeignScan — End scan
  • AddForeignUpdateTargets / PlanForeignModify — Support INSERT/UPDATE/DELETE (optional)

How It Could Work for Stream Tables

  1. Define a custom FDW (pgstream_fdw)
  2. The FDW's scan callbacks read from the underlying storage table
  3. ProcessUtility_hook intercepts CREATE FOREIGN TABLE ... SERVER stream_server to set up CDC, catalog entries, etc.
  4. A background worker handles refresh scheduling

Problems

  1. Foreign tables have restrictions — Cannot have indexes, constraints are declarative only (not enforced), and trigger support is limited (row-level triggers since PG 9.4). This severely limits usability.
  2. Query planner limitations — Foreign tables use a separate planning path with potentially worse plan quality.
  3. No MVCC — Foreign tables typically don't provide snapshot isolation semantics.
  4. User model confusion — "Foreign table" implies external data, not a derived view.
  5. EXPLAIN output — Shows "Foreign Scan" instead of "Seq Scan", confusing users.
  6. pg_dump — Foreign tables are dumped, but server/FDW setup may not transfer correctly.
  7. Two-step creation — Requires CREATE SERVER before CREATE FOREIGN TABLE.

Pros/Cons

  • Native syntax: Partial — CREATE FOREIGN TABLE with options
  • Feels like a stream table: No — foreign tables have different semantics
  • Index support: No — major limitation
  • Trigger support: Limited — row-level triggers only (PG 9.4+)
  • Complexity: Medium
  • PG version: PG 9.1+
  • Maintenance: Low — FDW API is very stable

Verdict: Not suitable. The restrictions on foreign tables (no indexes, limited trigger support) make this impractical for stream tables that need to behave like regular tables.


9. Event Triggers

How It Works

Event triggers fire on DDL events at the database level:

CREATE EVENT TRIGGER my_trigger ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE', 'ALTER TABLE', 'DROP TABLE')
    EXECUTE FUNCTION my_handler();

Available events:

  • ddl_command_start — Before DDL execution (PG 9.3+)
  • ddl_command_end — After DDL execution (PG 9.3+)
  • sql_drop — When objects are dropped (PG 9.3+)
  • table_rewrite — When a table is rewritten (PG 9.5+)

Inside the Handler

CREATE FUNCTION my_handler() RETURNS event_trigger AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands()
    LOOP
        -- obj.objid, obj.object_type, obj.command_tag, etc.
        IF obj.command_tag = 'CREATE TABLE' AND obj.object_type = 'table' THEN
            -- Check if this table has a special marker
            -- (e.g., a specific reloption or comment)
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

Pattern: CREATE TABLE + Event Trigger

  1. User creates a table with a special comment or option:
    CREATE TABLE order_totals (region text, total numeric);
    COMMENT ON TABLE order_totals IS 'pgstream:query=SELECT region...;schedule=1m';
    
  2. Event trigger on ddl_command_end fires
  3. Handler parses the comment, detects stream table intent
  4. Handler registers the stream table in the catalog

Limitations

  1. Cannot modify the DDL — Event triggers observe DDL, they can't change what happened. On ddl_command_end, the table already exists.
  2. Cannot prevent DDL — On ddl_command_start, you can raise an error to prevent it, but you can't redirect it.
  3. Two-step process — User must CREATE TABLE AND then mark it somehow (comment, option, separate function call).
  4. No custom syntax — Event triggers watch existing DDL commands.
  5. pg_stream already uses this — For DDL tracking on upstream tables (see hooks.rs).

Pros/Cons

  • Native syntax: No — watches existing DDL only
  • Complexity: Low
  • Can transform DDL: No — observe only
  • PG version: PG 9.3+
  • Maintenance: Very low
  • pg_stream usage: Already used for upstream DDL tracking

10. TimescaleDB Continuous Aggregates Pattern

How It Works

TimescaleDB continuous aggregates (caggs) demonstrate the most sophisticated approach to custom DDL-like syntax in a PostgreSQL extension. Their evolution is instructive.

Phase 1: Pure Function API (early versions)

-- Create a view, then register it
CREATE VIEW daily_temps AS
SELECT time_bucket('1 day', time) AS day, AVG(temp)
FROM conditions GROUP BY 1;

SELECT add_continuous_aggregate_policy('daily_temps', ...);

Phase 2: CREATE MATERIALIZED VIEW WITH (introduced in TimescaleDB 2.0)

CREATE MATERIALIZED VIEW daily_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day, device_id, AVG(temp)
FROM conditions
GROUP BY 1, 2;

How the Hook Chain Works

TimescaleDB's approach uses layered hooks:

  1. ProcessUtility_hook intercepts CREATE MATERIALIZED VIEW
  2. Checks reloptions for timescaledb.continuous in the WithClause
  3. If found:
    • Does NOT call standard ProcessUtility for the matview
    • Instead creates a regular hypertable (the materialization)
    • Creates an internal view (the user-facing query interface)
    • Registers refresh policies in the catalog
    • Sets up continuous aggregate metadata
  4. For REFRESH MATERIALIZED VIEW, intercepts and routes to their refresh engine
  5. For DROP MATERIALIZED VIEW, intercepts and cleans up all artifacts

The Magic: Reloptions as Extension Point

PostgreSQL's CREATE MATERIALIZED VIEW ... WITH (option = value) passes options as DefElem nodes in the parse tree. The parser treats these as generic key-value pairs — it does NOT validate the option names. This is the key insight: PostgreSQL's parser accepts arbitrary options in WITH clauses.

// In ProcessUtility_hook:
if (IsA(parsetree, CreateTableAsStmt)) {
    CreateTableAsStmt *stmt = (CreateTableAsStmt *) parsetree;
    if (stmt->objtype == OBJECT_MATVIEW) {
        // Check for our custom option in stmt->into->options
        bool is_continuous = false;
        ListCell *lc;
        foreach(lc, stmt->into->options) {
            DefElem *opt = (DefElem *) lfirst(lc);
            if (strcmp(opt->defname, "timescaledb.continuous") == 0) {
                is_continuous = true;
                break;
            }
        }
        if (is_continuous) {
            // Handle as continuous aggregate
            return;
        }
    }
}

Refresh Policies

-- Add a refresh policy (function call, not DDL)
SELECT add_continuous_aggregate_policy('daily_temps',
    start_offset => INTERVAL '1 month',
    end_offset => INTERVAL '1 day',
    schedule_interval => INTERVAL '1 hour');

What pg_stream Could Learn

The TimescaleDB pattern for pg_stream would look like:

-- Option A: CREATE MATERIALIZED VIEW with custom option
CREATE MATERIALIZED VIEW order_totals
WITH (pgstream.stream = true, pgstream.schedule = '1m', pgstream.mode = 'DIFFERENTIAL')
AS SELECT region, SUM(amount) FROM orders GROUP BY region;

-- Option B: CREATE TABLE with custom option (less natural)
CREATE TABLE order_totals (region text, total numeric)
WITH (pgstream.stream = true);
-- Then separately: SELECT pgstream.set_query('order_totals', 'SELECT ...');

Pros/Cons

  • Native syntax: Good — CREATE MATERIALIZED VIEW ... WITH (pgstream.stream) looks natural
  • User experience: Very good — familiar DDL syntax with extension options
  • Complexity: High — must implement the full ProcessUtility_hook chain
  • pg_dump: Partial — matview DDL is dumped, but custom metadata needs a pg_dump extension or config tables
  • PG version: PG 9.3+ (matviews), PG 12+ (better option handling)
  • Maintenance: Medium — must track changes to matview creation internals
  • Shared preload: Required — ProcessUtility_hook needs shared_preload_libraries

11. Citus Distributed DDL Pattern

How It Works

Citus (now part of Microsoft) demonstrates another approach to extending DDL behavior:

ProcessUtility_hook Chain

Citus has one of the most comprehensive ProcessUtility_hook implementations:

void multi_ProcessUtility(PlannedStmt *pstmt, ...) {
    // 1. Classify the DDL
    Node *parsetree = pstmt->utilityStmt;

    // 2. Check if it affects distributed tables
    if (IsA(parsetree, AlterTableStmt)) {
        // Propagate ALTER TABLE to all worker nodes
        PropagateAlterTable((AlterTableStmt *)parsetree, queryString);
    }

    // 3. Call standard handler (or skip for intercepted commands)
    if (prev_ProcessUtility)
        prev_ProcessUtility(pstmt, ...);
    else
        standard_ProcessUtility(pstmt, ...);

    // 4. Post-processing
    if (IsA(parsetree, CreateStmt)) {
        // Check if we should auto-distribute this table
    }
}

Table Distribution via Function Calls

Citus does NOT add custom DDL syntax. Distribution is done via function calls:

-- Create a regular table
CREATE TABLE events (id bigint, data jsonb, created_at timestamptz);

-- Distribute it (function call, not DDL)
SELECT create_distributed_table('events', 'id');

-- Or create a reference table
SELECT create_reference_table('lookups');

Columnar Storage via Table AM

Citus also provides columnar storage as a table AM:

CREATE TABLE analytics_data (...)
    USING columnar;

This uses the table AM API (PostgreSQL 12+) — see Section 7.

What Citus Teaches Us

  • Function calls for complex operations — create_distributed_table() is analogous to pgstream.create_stream_table().
  • ProcessUtility_hook for DDL propagation — Intercept standard DDL and add behavior.
  • Table AM for storage — Separate concern from distribution logic.
  • No custom syntax — Even with Microsoft's resources, Citus doesn't fork the parser.

Pros/Cons

  • Native syntax: No — uses function calls, like pg_stream
  • Approach validated: Yes — Citus is used at massive scale with this pattern
  • Complexity: Medium (function API) to high (ProcessUtility_hook)
  • User adoption: Proven successful
  • Maintenance: Low for the function API

12. PostgreSQL 18 New Features

Relevant Extension Points in PG 18

PostgreSQL 18 (released 2025) includes several features relevant to this analysis:

12a. Virtual Generated Columns

PG 18 adds GENERATED ALWAYS AS (expr) VIRTUAL columns. Not directly relevant to stream tables, but shows PostgreSQL's willingness to expand CREATE TABLE syntax incrementally.

12b. Improved Table AM API

PG 18 refines the table AM API with better TOAST handling and improved parallel scan support. This makes custom table AMs slightly more practical.

12c. Enhanced Event Trigger Information

PG 18 expands pg_event_trigger_ddl_commands() with additional metadata fields, making event-trigger-based approaches more capable.

12d. pg_stat_io Improvements

Enhanced I/O statistics infrastructure that could benefit monitoring of stream table refresh operations.

12e. No New Parser Extension Points

PostgreSQL 18 does not add any parser extension mechanism. The parser remains monolithic and non-extensible. There have been occasional discussions on pgsql-hackers about parser hooks, but no concrete proposals have been accepted.

12f. No Custom DDL Extension Points

No new general-purpose DDL extension points beyond the existing hook system.

Looking Forward: Discussion on pgsql-hackers

There have been recurring threads on pgsql-hackers about:

  • Extension-defined SQL syntax — Rejected due to complexity and parser architecture
  • Loadable parser modules — Theoretical discussions, no implementation
  • Extension catalogs — Some interest in allowing extensions to register custom catalogs

None of these are implemented in PG 18.

Pros/Cons

  • New syntax extension points: None in PG 18
  • Table AM improvements: Minor — slightly easier to implement
  • Event trigger improvements: Minor — more metadata available
  • Parser extensibility: Not planned for any upcoming PG release

13. COMMENT / OPTIONS Abuse Pattern

How It Works

Several extensions use table comments or reloptions as a "poor man's metadata" to tag tables with custom semantics.

Pattern 1: COMMENT-based

CREATE TABLE order_totals (region text, total numeric);
COMMENT ON TABLE order_totals IS '@pgstream {"query": "SELECT ...", "schedule": "1m"}';

An event trigger or background worker scans pg_description for tables with the @pgstream prefix and processes them.

Pattern 2: Reloptions-based

CREATE TABLE order_totals (region text, total numeric)
    WITH (fillfactor = 70, pgstream.stream = true);

Problem: PostgreSQL validates reloptions against a known list. You cannot add arbitrary options to WITH (...) without registering them. Extensions can register custom reloptions via add_reloption_kind() and the add_bool_reloption()/add_int_reloption()/add_string_reloption() family (access/reloptions.h), but this API is aimed at custom access methods and is relatively obscure.

Pattern 3: GUC-based Tagging

-- Set a GUC that our ProcessUtility_hook reads
SET pgstream.next_create_is_stream = true;
SET pgstream.stream_query = 'SELECT region, SUM(amount) FROM orders GROUP BY region';

-- Hook intercepts this CREATE TABLE and registers it
CREATE TABLE order_totals (region text, total numeric);

-- Reset
RESET pgstream.next_create_is_stream;

This is extremely hacky but has been used in practice (some partitioning extensions used similar patterns before native partitioning).

Who Uses This?

  • pgmemcache — Uses comments to configure caching behavior
  • Some row-level security extensions — Comments to define policies
  • pg_partman — Uses a configuration table (not comments) but similar concept

Pros/Cons

  • Native syntax: No — abuses existing mechanisms
  • User experience: Poor — fragile, easy to break by editing comments
  • Complexity: Low
  • pg_dump: COMMENT is dumped — metadata survives pg_dump/restore
  • Robustness: Low — comments can be accidentally changed
  • PG version: All versions

14. pg_ivm (Incremental View Maintenance) Pattern

How It Works

pg_ivm is the most directly comparable extension to pg_stream. It implements incremental view maintenance for PostgreSQL.

API Design

pg_ivm uses a pure function-call API:

-- Create an incrementally maintainable materialized view
SELECT create_immv('order_totals', 'SELECT region, SUM(amount) FROM orders GROUP BY region');

-- Refresh
SELECT refresh_immv('order_totals');

-- Drop
DROP TABLE order_totals;  -- Just drop the underlying table

Key function: create_immv(name, query) — Creates an "Incrementally Maintainable Materialized View" (IMMV).

Internal Implementation

  1. create_immv() is a SQL function (not a hook)
  2. It parses the query, creates a storage table, sets up triggers on source tables
  3. IMMVs are stored as regular tables with metadata in a custom catalog (pg_ivm_immv)
  4. Triggers on source tables automatically update the IMMV on DML

No ProcessUtility_hook

pg_ivm does not use ProcessUtility_hook. It operates entirely through:

  • SQL functions (create_immv, refresh_immv)
  • Row-level triggers for automatic maintenance
  • A custom catalog table for metadata
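Because the metadata lives in an ordinary catalog table, the registered IMMVs can be inspected with plain SQL. A sketch — the exact catalog schema differs between pg_ivm versions, so the column names here are assumptions:

```sql
-- Inspect registered IMMVs (column names may differ across pg_ivm versions)
SELECT immvrelid::regclass AS immv, ispopulated
FROM pg_ivm_immv;
```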

Why No Custom Syntax?

pg_ivm was developed as a proof-of-concept for PostgreSQL core IVM support. The authors explicitly chose function-call syntax to:

  1. Avoid shared_preload_libraries requirement (hooks need it)
  2. Keep the extension simple and portable
  3. Focus on the IVM algorithm, not the user interface

Eventually Merged to Core?

There was discussion about upstreaming IVM to PostgreSQL core. If merged, it would get proper syntax (CREATE INCREMENTAL MATERIALIZED VIEW). As an extension, it stays with function calls.

Relevance to pg_stream

pg_stream's current API (pgstream.create_stream_table()) follows the exact same pattern as pg_ivm. This is the established approach for IVM extensions.

Pros/Cons

| Aspect | Assessment |
|---|---|
| Native syntax | No — function calls |
| Complexity | Low — simple function API |
| shared_preload_libraries | Not required for basic function API |
| pg_dump | No — function calls are not dumped; must use custom dump/restore |
| User experience | Moderate — familiar to pg_ivm users |
| Community acceptance | Established pattern for IVM extensions |

15. CREATE TABLE ... USING (Table Access Methods) Deep Dive

Full Syntax

CREATE TABLE tablename (
    column1 datatype,
    column2 datatype,
    ...
) USING access_method_name
  WITH (storage_parameter = value, ...);
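For a concrete anchor, the same clause works with the built-in heap AM and its standard storage parameters:

```sql
-- Plain heap table with an ordinary storage parameter
CREATE TABLE measurements (
    ts    timestamptz,
    value numeric
) USING heap
  WITH (fillfactor = 70);
```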

How the Parser Handles USING

In gram.y:

CreateStmt: CREATE OptTemp TABLE ...
    table_access_method_clause OptWith ...

table_access_method_clause:
    USING name    { $$ = $2; }
    | /* empty */ { $$ = NULL; }
    ;

The USING clause sets CreateStmt->accessMethod to the access method name string.

How ProcessUtility Handles It

In DefineRelation() (src/backend/commands/tablecmds.c):

  1. If accessMethod is specified, look it up in pg_am
  2. Verify it's a table AM (not an index AM)
  3. Store the AM OID in pg_class.relam
  4. Use the AM's callbacks for all subsequent operations
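The resulting pg_class.relam linkage can be verified from SQL via the system catalogs:

```sql
-- Which table access method does a given table use?
SELECT c.relname, a.amname
FROM pg_class c
JOIN pg_am a ON a.oid = c.relam
WHERE c.relname = 'measurements';
-- amname is 'heap' unless a custom table AM was specified
```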

Custom Reloptions with Table AMs

Table AMs can define custom reloptions via:

static relopt_parse_elt stream_relopt_tab[] = {
    {"query", RELOPT_TYPE_STRING, offsetof(StreamOptions, query)},
    {"schedule", RELOPT_TYPE_STRING, offsetof(StreamOptions, schedule)},
    {"refresh_mode", RELOPT_TYPE_STRING, offsetof(StreamOptions, refresh_mode)},
};

This would allow:

CREATE TABLE order_totals (region text, total numeric)
    USING stream_heap
    WITH (query = 'SELECT ...', schedule = '1m', refresh_mode = 'DIFFERENTIAL');

Problems Specific to Stream Tables

  1. Column derivation — Stream tables derive their columns from the query, while a plain CREATE TABLE ... USING requires explicit column definitions, creating redundancy and potential inconsistency.

  2. AS SELECT caveats — Since PostgreSQL 12 the grammar does accept USING combined with AS SELECT:

    -- Accepted by the PostgreSQL grammar (PG 12+):
    CREATE TABLE order_totals
        USING stream_heap
        AS SELECT region, SUM(amount) FROM orders GROUP BY region;
    
    This derives the columns from the query, but the SELECT's rows are inserted through the custom AM at creation time (unless WITH NO DATA is added), and the extension still needs a hook to capture and persist the defining query for later refreshes.
  3. Full AM implementation required — Even if you delegate to heap, you must implement all callbacks and handle edge cases.

  4. VACUUM/ANALYZE — Must properly delegate to heap for these to work.

  5. Replication — Logical replication assumes heap tuples; custom AMs may break.

Hybrid Practical Approach

If pursuing this route:

-- Step 1: Set default AM
SET default_table_access_method = 'stream_heap';

-- Step 2: Create with query in options
CREATE TABLE order_totals ()
    WITH (pgstream.query = 'SELECT region, SUM(amount) FROM orders GROUP BY region',
          pgstream.schedule = '1m');

-- ProcessUtility_hook would:
-- 1. Detect USING stream_heap (or detect our custom reloptions)
-- 2. Parse the query from options
-- 3. Derive columns from the query
-- 4. Create the actual table with proper columns using heap AM
-- 5. Register in pgstream catalog
-- 6. Set up CDC
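Step 3 (deriving columns from the query) has a cheap SQL-level equivalent the hook could reuse: CREATE TABLE ... AS ... WITH NO DATA plans the query and creates a table with the derived column set without inserting any rows.

```sql
-- Derive the column set from the defining query without materializing rows
CREATE TABLE order_totals AS
SELECT region, SUM(amount) AS total
FROM orders
GROUP BY region
WITH NO DATA;
```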

Pros/Cons

| Aspect | Assessment |
|---|---|
| Native syntax | Partial — CREATE TABLE ... USING stream_heap WITH (...) |
| Column derivation | Problematic — requires AS SELECT or hook support |
| Complexity | Very high |
| pg_dump | Good — CREATE TABLE ... USING is properly dumped |
| PG version | PG 12+ |
| Maintenance | High — AM API changes between versions |

16. Comparison Matrix

| Approach | Native Syntax | Complexity | pg_dump | PG Version | Maintenance | Recommended |
|---|---|---|---|---|---|---|
| Function API (current) | No | Low | No* | Any | Very Low | Yes |
| ProcessUtility_hook + MATVIEW WITH | Good | High | Partial | 9.3+ | Medium | Maybe |
| Raw parser fork | Perfect | Very High | No | Fork only | Very High | No |
| Table AM USING | Partial | Very High | Yes | 12+ | High | No |
| FDW FOREIGN TABLE | Partial | Medium | Yes | 9.1+ | Low | No |
| Event triggers alone | No | Low | No | 9.3+ | Low | No |
| COMMENT abuse | No | Low | Yes | Any | Low | No |
| GUC + CREATE TABLE hack | No | Medium | Partial | Any | Medium | No |
| TimescaleDB pattern (MATVIEW + WITH) | Good | High | Partial | 9.3+ | Medium | Best option |

* Custom pg_dump support can be added via pg_dump hook or wrapper script.


17. Recommendations for pg_stream

Current Approach: Function API (Keep and Enhance)

pg_stream's current approach (pgstream.create_stream_table('name', 'query', ...)) is:

  • Proven — Same pattern as pg_ivm, Citus, and many other extensions
  • Simple — No shared_preload_libraries required for basic usage
  • Maintainable — No hook chains to debug
  • Portable — Works on any PG version that supports pgrx

Enhancement opportunities:

-- Current
SELECT pgstream.create_stream_table('order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');

-- Enhanced: CALL syntax for a more DDL-like feel (PG 11+; CALL requires a procedure, not a function)
CALL pgstream.create_stream_table('order_totals',
    $$SELECT region, SUM(amount) FROM orders GROUP BY region$$, '1m');
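Since CALL only works on procedures, the existing function would need a thin procedure wrapper. A sketch, with the wrapper name assumed (not part of the current pgstream API):

```sql
-- Hypothetical procedure wrapper around the existing function API
CREATE PROCEDURE pgstream.create_stream_table_proc(
    name text, query text, schedule text)
LANGUAGE plpgsql
AS $$
BEGIN
    PERFORM pgstream.create_stream_table(name, query, schedule);
END;
$$;
```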

Future Option: TimescaleDB-style Materialized View Integration

If user demand justifies the complexity, pg_stream could add a second creation path via ProcessUtility_hook:

-- New native-feeling syntax (requires shared_preload_libraries)
CREATE MATERIALIZED VIEW order_totals
WITH (pgstream.stream = true, pgstream.schedule = '1m')
AS SELECT region, SUM(amount) FROM orders GROUP BY region
WITH NO DATA;

-- Original function API still works (no hook needed)
SELECT pgstream.create_stream_table('order_totals',
    'SELECT region, SUM(amount) FROM orders GROUP BY region', '1m');

Implementation plan for hook-based approach:

  1. Register ProcessUtility_hook in _PG_init() (already needed for shared_preload_libraries)
  2. Intercept CREATE MATERIALIZED VIEW → Check for pgstream.stream option
  3. If found: parse options, call create_stream_table_impl() internally, create standard storage table instead of matview
  4. Intercept DROP MATERIALIZED VIEW → Check if target is a stream table → Clean up
  5. Intercept REFRESH MATERIALIZED VIEW → Route to stream table refresh engine
  6. Intercept ALTER MATERIALIZED VIEW → Route to stream table alter logic

Estimated complexity: ~800-1200 lines of Rust hook code + tests.

Approaches to avoid:

  • Forking PostgreSQL for custom grammar — Maintenance cost is prohibitive
  • Table AM approach — Complexity without proportional benefit
  • FDW approach — Too many restrictions on foreign tables
  • COMMENT abuse — Fragile and poor UX

pg_dump / pg_restore Strategy

Regardless of approach, pg_dump is a challenge. Options:

  1. Custom dump/restore functions — pgstream.dump_config() and pgstream.restore_config()
  2. Migration script generation — pgstream.generate_migration() outputs SQL to recreate all stream tables
  3. Event trigger on restore — Detect when tables are restored and re-register them
  4. Sidecar file — Generate a companion SQL file alongside pg_dump
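Option 3 can be sketched with a standard event trigger. Here pgstream.reregister_if_stream_table is a hypothetical helper that would consult the extension's saved configuration:

```sql
-- Hypothetical: re-register restored tables that match saved stream configs
CREATE FUNCTION pgstream.on_ddl_end() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    cmd record;
BEGIN
    FOR cmd IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
        -- cmd.object_identity is e.g. 'public.order_totals'
        PERFORM pgstream.reregister_if_stream_table(cmd.object_identity);
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER pgstream_restore_hook
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE FUNCTION pgstream.on_ddl_end();
```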

Appendix A: Hook Registration in pgrx (Rust)

For reference, here's how ProcessUtility_hook registration works in pgrx:

use pgrx::prelude::*;
use pgrx::pg_sys;

static mut PREV_PROCESS_UTILITY_HOOK: pg_sys::ProcessUtility_hook_type = None;

#[pg_guard]
pub extern "C-unwind" fn my_process_utility(
    pstmt: *mut pg_sys::PlannedStmt,
    query_string: *const std::os::raw::c_char,
    read_only_tree: bool,
    context: pg_sys::ProcessUtilityContext,
    params: pg_sys::ParamListInfo,
    query_env: *mut pg_sys::QueryEnvironment,
    dest: *mut pg_sys::DestReceiver,
    qc: *mut pg_sys::QueryCompletion,
) {
    // SAFETY: pstmt is a valid pointer provided by PostgreSQL
    let stmt = unsafe { (*pstmt).utilityStmt };

    // Check if this is a CreateTableAsStmt (covers CREATE MATERIALIZED VIEW ... AS)
    if unsafe { pgrx::is_a(stmt, pg_sys::NodeTag::T_CreateTableAsStmt) } {
        // Check for our custom options...
    }

    // Chain to the previous hook or the standard handler
    unsafe {
        if let Some(prev) = PREV_PROCESS_UTILITY_HOOK {
            prev(pstmt, query_string, read_only_tree, context,
                 params, query_env, dest, qc);
        } else {
            pg_sys::standard_ProcessUtility(
                pstmt, query_string, read_only_tree, context,
                params, query_env, dest, qc);
        }
    }
}

pub fn register_hooks() {
    unsafe {
        PREV_PROCESS_UTILITY_HOOK = pg_sys::ProcessUtility_hook;
        pg_sys::ProcessUtility_hook = Some(my_process_utility);
    }
}

Appendix B: Key Source Files in PostgreSQL

| File | Purpose |
|---|---|
| src/backend/parser/gram.y | SQL grammar (~18,000 lines) |
| src/backend/parser/scan.l | Lexer/tokenizer |
| src/include/parser/kwlist.h | Keyword definitions |
| src/backend/tcop/utility.c | ProcessUtility() — DDL dispatcher |
| src/backend/commands/tablecmds.c | CREATE/ALTER/DROP TABLE implementation |
| src/backend/commands/createas.c | CREATE TABLE AS / CREATE MATVIEW AS |
| src/include/access/tableam.h | Table Access Method API |
| src/include/foreign/fdwapi.h | FDW API |
| src/backend/commands/event_trigger.c | Event trigger infrastructure |

Appendix C: References

  1. PostgreSQL Documentation — Table Access Method Interface
  2. PostgreSQL Documentation — Event Triggers
  3. PostgreSQL Documentation — Writing A Foreign Data Wrapper
  4. TimescaleDB Source — process_utility.c
  5. Citus Source — multi_utility.c
  6. pg_ivm Source — createas.c
  7. pgrx Documentation — Hooks
  8. PostgreSQL Wiki — CustomScanProviders