Staging as Product Memory
The staging layer is the part of the migration pipeline that most implementations omit. Without it, the pipeline is a pipe: data flows from Oracle to PostgreSQL and there is no persistent record of what happened, what was applied, what was rejected, or what state the pipeline was in before the last restart. A pipe can load data. It cannot answer "why does this row not match" two weeks after the full load, or "which batch caused that lag spike on Tuesday," or "can we safely resume from checkpoint 4,432 after the network failure."
Staging is the audit trail, the checkpoint store, and the error ledger that transforms a data pipe into a product.
11.1 The Staging Layer Architecture
The staging layer is a set of tables in the target PostgreSQL database (or in a dedicated schema) that the migration pipeline writes to as it works. It is not the application data — it is metadata about the migration state.
Three categories of staging tables:
1. Progress tracking: Records what has been done (full load chunks, CDC positions, reconciliation runs).
2. Row-level status: Records the state of individual rows that required special handling — rejected rows, rows with validation failures, rows that required compensating writes.
3. Event log: A time-ordered log of pipeline events (starts, completions, errors, threshold breaches).
-- Staging schema: separate from application data
CREATE SCHEMA _pulsaride;
-- Core staging tables
CREATE TABLE _pulsaride.full_load_status (
    table_name     TEXT NOT NULL,
    chunk_id       BIGINT NOT NULL,
    min_key        TEXT,
    max_key        TEXT,
    rows_expected  BIGINT,
    rows_loaded    BIGINT NOT NULL DEFAULT 0,
    status         TEXT NOT NULL DEFAULT 'PENDING',
    anchor_scn     NUMERIC NOT NULL,
    started_at     TIMESTAMP WITH TIME ZONE,
    completed_at   TIMESTAMP WITH TIME ZONE,
    error_message  TEXT,
    PRIMARY KEY (table_name, chunk_id)
);
CREATE TABLE _pulsaride.cdc_checkpoint (
    table_name        TEXT NOT NULL,
    last_applied_scn  NUMERIC NOT NULL,
    last_event_ts     TIMESTAMP WITH TIME ZONE,
    events_applied    BIGINT NOT NULL DEFAULT 0,
    events_skipped    BIGINT NOT NULL DEFAULT 0,
    updated_at        TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    PRIMARY KEY (table_name)
);
CREATE TABLE _pulsaride.rejected_rows (
    reject_id      BIGSERIAL PRIMARY KEY,
    table_name     TEXT NOT NULL,
    source_pk      TEXT NOT NULL,
    phase          TEXT NOT NULL,  -- 'FULL_LOAD' or 'CDC'
    reject_reason  TEXT NOT NULL,
    reject_detail  JSONB,
    source_row     JSONB,
    created_at     TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
CREATE TABLE _pulsaride.event_log (
    event_id    BIGSERIAL PRIMARY KEY,
    event_type  TEXT NOT NULL,
    table_name  TEXT,
    detail      JSONB,
    created_at  TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
11.2 The Staging Layer as Auditable Ledger
The staging layer's first role is auditing: it provides an answer to "what happened and when" for any migration event. This is not a nice-to-have. Post-cutover incidents — "this account record is wrong, when did it change?" — are common. The staging layer provides the answer without requiring re-running the migration or accessing Oracle.
Full Load Audit
Each chunk in full_load_status records:
- The exact key range loaded (min_key, max_key)
- The anchor SCN at which the data was read
- The number of rows loaded
- The start and end timestamp
This allows reconstructing exactly what Oracle looked like at load time for any range of rows. If a post-cutover discrepancy is reported for account_id = 847,291, the audit query is:
-- Find the full load chunk that loaded account_id 847291
SELECT fl.*
FROM _pulsaride.full_load_status AS fl
WHERE fl.table_name = 'account'
  AND fl.min_key::BIGINT <= 847291
  AND fl.max_key::BIGINT >= 847291
  AND fl.status = 'COMPLETE';
-- To translate anchor_scn into a wall-clock snapshot time, run
--   SELECT SCN_TO_TIMESTAMP(<anchor_scn>) FROM dual;
-- on the Oracle side (the SCN-to-time mapping lives in Oracle, not PostgreSQL).
This tells you exactly when the row was loaded and from what Oracle snapshot. Combined with Oracle's audit table or redo log, you can reconstruct the complete history.
CDC Event Audit
The CDC checkpoint records the last applied SCN per table. Combined with the event log, it is possible to reconstruct which CDC events were applied to any table between any two SCNs.
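Assuming the batch entries carry their SCN range and event count in the detail JSONB (the event type and field names below are illustrative assumptions, not a documented schema), the reconstruction is a range query on the event log:

```sql
-- CDC batches applied to 'account' between two SCNs
-- (event_type value and detail field names are assumed)
SELECT event_id,
       created_at,
       detail->>'scn_start'   AS scn_start,
       detail->>'scn_end'     AS scn_end,
       detail->>'event_count' AS event_count
FROM _pulsaride.event_log
WHERE event_type = 'CDC_BATCH_APPLIED'
  AND table_name = 'account'
  AND (detail->>'scn_end')::NUMERIC   >= 8812000000
  AND (detail->>'scn_start')::NUMERIC <= 8812500000
ORDER BY event_id;
```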
Product note: Pulsaride Transform's event log records CDC events at batch granularity (not per-row) for performance. Each batch entry includes the SCN range, the table, the event count, the apply duration, and any errors. The pulsaride-monitoring dashboard surfaces the event log as a scrollable timeline with filtering by table and event type.
11.3 Row-Level Status: The Rejected Row Ledger
Not every row survives the migration unchanged. Rows are rejected or flagged when:
- A column value violates a PostgreSQL constraint that Oracle did not enforce (e.g., a CHECK constraint that is DISABLE NOVALIDATE in Oracle)
- A LOB column exceeds a configured size limit
- A value cannot be cast to the target type (e.g., 'N/A' stored in a column mapped to NUMERIC)
- A NOT NULL column is NULL in Oracle (deferred constraint violation)
- A transformation rule produces a NULL where the target requires a value
These rows are not silently dropped. They are written to _pulsaride.rejected_rows with the full source row (as JSONB), the reason, and the phase (FULL_LOAD or CDC).
-- Sample rejected row
INSERT INTO _pulsaride.rejected_rows
    (table_name, source_pk, phase, reject_reason, reject_detail, source_row)
VALUES (
    'customer_profile',
    '99821',
    'FULL_LOAD',
    'NOT_NULL_VIOLATION',
    '{"column": "email", "constraint": "customer_profile_email_not_null"}',
    '{"customer_id": 99821, "name": "Acme Corp", "email": null, "status": "ACTIVE"}'
);
The operator reviews rejected rows before cutover. Each row represents a data quality issue in the Oracle source that the target schema exposes. Resolution options:
- Fix the source data in Oracle: preferred, because it corrects the root cause.
- Adjust the target constraint: if the constraint is too strict for real-world data, relax it.
- Apply a transformation: map NULL email to a placeholder value (e.g., 'unknown@placeholder.invalid').
- Accept the rejection: the row represents genuinely invalid data that should not exist in the target.
Product note: Pulsaride Transform exposes rejected rows through a /pulsaride/rejected API endpoint and the monitoring dashboard. The dashboard groups rejected rows by table and reason, making it easy to identify systemic rejection patterns (e.g., "all 2,400 rejections are NOT_NULL_VIOLATION on customer_profile.email") versus individual data anomalies.
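The grouping the dashboard shows can be reproduced directly against the ledger; this query is a sketch of the same rollup:

```sql
-- Systemic pattern vs. one-off anomaly: rejection counts by table, reason, column
SELECT table_name,
       reject_reason,
       reject_detail->>'column' AS column_name,
       COUNT(*)                 AS rejections
FROM _pulsaride.rejected_rows
GROUP BY table_name, reject_reason, reject_detail->>'column'
ORDER BY rejections DESC;
```

A top row with thousands of rejections points at a schema mismatch; a long tail of single-row groups points at genuinely dirty data.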
11.4 Replay Semantics
Replay semantics define what the pipeline does when it restarts after a failure. Two models:
Exactly-once replay: Every event is applied exactly once. Requires transactional coordination between the pipeline state and the data writes. Complex to implement correctly.
At-least-once replay with idempotent writes: Events may be replayed but the write operation is idempotent — applying the same event twice produces the same result as applying it once. Simpler to implement and sufficient for migration purposes.
Pulsaride Transform uses at-least-once with idempotent writes:
- Full load: chunk-level idempotency. Before re-loading a chunk, delete any partial data in that key range, then re-load.
- CDC INSERT/UPDATE events: UPSERT-based application. An UPDATE event becomes an INSERT ... ON CONFLICT DO UPDATE; re-applying it has the same effect.
- CDC DELETE events: DELETE ... WHERE pk = :pk is idempotent: if the row is already deleted, the DELETE does nothing.
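For a hypothetical account table keyed on account_id (table and column names are illustrative), the idempotent apply statements look like this:

```sql
-- UPDATE (or INSERT) event: re-applying produces the same final row
INSERT INTO account (account_id, email, status)
VALUES (847291, 'x@example.com', 'ACTIVE')
ON CONFLICT (account_id) DO UPDATE
SET email  = EXCLUDED.email,
    status = EXCLUDED.status;

-- DELETE event: a no-op if the row is already gone
DELETE FROM account WHERE account_id = 847291;
```

Running either statement twice leaves the table in the same state as running it once, which is exactly the property at-least-once delivery requires.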
The one case where at-least-once is not sufficient: a DELETE followed by an INSERT for the same primary key. If the DELETE is re-applied after the INSERT (because of a replay), the row that should exist is deleted. This case arises when the CDC consumer restarts mid-transaction.
The mitigation is SCN-ordered application: events within a transaction are always applied in SCN order, and replays start from the beginning of the earliest uncommitted transaction, not from the last committed event. This guarantees that a DELETE-then-INSERT pair cannot be replayed as DELETE, INSERT, and then DELETE again.
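One way to make this safe in practice is to advance the checkpoint in the same transaction that applies the batch, so a crash can never leave the checkpoint ahead of the data. A sketch (the actual apply path is product internals; the SCN and counts are illustrative):

```sql
BEGIN;
-- ... apply the batch's UPSERTs and DELETEs in SCN order ...

-- Advance the checkpoint atomically with the data writes
UPDATE _pulsaride.cdc_checkpoint
SET last_applied_scn = 8812500000,
    last_event_ts    = NOW(),
    events_applied   = events_applied + 1843,
    updated_at       = NOW()
WHERE table_name = 'account';
COMMIT;
```

If the process dies before COMMIT, both the data and the checkpoint roll back together, and the replay starts from a consistent position.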
11.5 Schema Evolution in Staging
The staging tables themselves must accommodate schema changes in the source during a migration. If Oracle adds a column to a table while the migration is running:
- Full load chunks already completed did not load that column (it did not exist at read time).
- CDC events for rows changed after the ALTER TABLE will include the new column.
- The target PostgreSQL table does not have the column yet.
Pulsaride Transform handles this through a staging schema_changes table:
CREATE TABLE _pulsaride.schema_changes (
    change_id     BIGSERIAL PRIMARY KEY,
    detected_at   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    detected_scn  NUMERIC NOT NULL,
    table_name    TEXT NOT NULL,
    change_type   TEXT NOT NULL,  -- 'ADD_COLUMN', 'DROP_COLUMN', 'MODIFY_COLUMN'
    column_name   TEXT,
    old_type      TEXT,
    new_type      TEXT,
    applied       BOOLEAN NOT NULL DEFAULT FALSE,
    applied_at    TIMESTAMP WITH TIME ZONE
);
When the CDC consumer encounters an event with a column that does not exist in the target schema, it:
- Pauses CDC application for that table
- Records the schema change in schema_changes
- Alerts the operator via the monitoring system
The operator:
- Reviews the ALTER TABLE that occurred on Oracle
- Applies the equivalent DDL on PostgreSQL
- Marks the schema change as applied in the staging table
- Resumes CDC, which picks up from the point of the schema change
This prevents schema divergence from silently corrupting the migration.
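A sketch of the ledger entries this workflow produces (the table, column, SCN, and types are illustrative):

```sql
-- The consumer records the detected change before pausing the table
INSERT INTO _pulsaride.schema_changes
    (detected_scn, table_name, change_type, column_name, new_type)
VALUES (8812600000, 'account', 'ADD_COLUMN', 'referral_code', 'VARCHAR2(32)');

-- After the operator applies the equivalent DDL on PostgreSQL, e.g.
--   ALTER TABLE account ADD COLUMN referral_code TEXT;
-- the change is marked applied so CDC can resume
UPDATE _pulsaride.schema_changes
SET applied = TRUE, applied_at = NOW()
WHERE table_name = 'account'
  AND column_name = 'referral_code'
  AND applied = FALSE;
```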
11.6 The Staging Layer as Operational Interface
The staging layer is the migration's operational interface. Operations teams who are not engineers can use the monitoring dashboard to answer operational questions without SQL:
- "Is the migration still running?" → Event log: last event timestamp
- "Are we behind?" → CDC checkpoint: lag metric
- "How many rows failed?" → Rejected rows: count by table
- "When will the full load finish?" → Full load status: completion percentage and estimated time
This shifts the operational model from "ask an engineer" to "look at the dashboard." In a migration that runs for 3–7 days with a team that needs sleep, this matters.
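For those who do want SQL, the same questions reduce to one-liners against the staging schema. A sketch of the lag check:

```sql
-- Approximate CDC lag per table: time since the last applied source event
SELECT table_name,
       last_applied_scn,
       NOW() - last_event_ts AS apply_lag
FROM _pulsaride.cdc_checkpoint
ORDER BY apply_lag DESC;
```

Note this measures time since the last applied event, so an idle source table shows growing "lag" even when nothing is behind; the dashboard's lag metric presumably corrects for that, but the raw query is a useful first look.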
11.7 Example: The payment_method Rejection Pattern
During the full load of the payment_method table, 4,421 rows are rejected with NOT_NULL_VIOLATION on column card_token. The target schema declares card_token TEXT NOT NULL, but 4,421 Oracle rows have CARD_TOKEN IS NULL.
The rejected row ledger shows:
SELECT source_row->>'payment_method_id' AS id,
       source_row->>'payment_type'      AS type,
       source_row->>'status'            AS status
FROM _pulsaride.rejected_rows
WHERE table_name = 'payment_method'
  AND reject_reason = 'NOT_NULL_VIOLATION'
  AND reject_detail->>'column' = 'card_token'
LIMIT 10;
Output:
id    | type          | status
------|---------------|---------
10042 | BANK_TRANSFER | INACTIVE
10183 | BANK_TRANSFER | INACTIVE
10291 | CASH          | INACTIVE
...
All 4,421 rejected rows have payment_type IN ('BANK_TRANSFER', 'CASH') — payment types that do not have a card token. The Oracle schema did not enforce NOT NULL on card_token because it knew that non-card payment types would not have a token. The target schema, designed for a card-first world, incorrectly declared it NOT NULL.
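The "all rejections share a payment type" claim is itself one query against the ledger:

```sql
-- Confirm the rejection pattern: counts by payment_type
SELECT source_row->>'payment_type' AS payment_type,
       COUNT(*)                    AS rejected
FROM _pulsaride.rejected_rows
WHERE table_name = 'payment_method'
  AND reject_reason = 'NOT_NULL_VIOLATION'
  AND reject_detail->>'column' = 'card_token'
GROUP BY source_row->>'payment_type';
```

If the result contains only BANK_TRANSFER and CASH, the rejections are a schema-design issue, not a data-quality one.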
Resolution:
- Change the target column definition to card_token TEXT (nullable)
- Re-run the full load for the payment_method table
- All 4,421 rows load successfully
The staging ledger records this as a CONSTRAINT_ADJUSTED event and notes the original constraint definition, the adjustment rationale, and the operator who approved it. This audit trail is available for any post-cutover compliance review.