Dependency-Aware Migration

The depends_on directive lets you migrate tables in any order without worrying about FK constraints. Rows that reference a parent row that has not been migrated yet are automatically deferred and replayed as soon as the parent arrives.

The problem it solves

In a classic Oracle → PostgreSQL migration, FK relationships force you to migrate tables in a strict topological order: customers before orders, orders before order_items. If two jobs run in parallel or a parent row fails, child rows silently fail or require manual re-runs.

With depends_on, execution order becomes irrelevant. Pulsaride tracks the dependency automatically and replays deferred rows without any operator intervention.

YAML configuration

name: orders-migration
version: "1.0"
target_table: orders
reject_policy: FAIL_ROW        # failed rows → DLQ, not abort

depends_on:
  - table: customers            # the parent table this table references
    key: id                     # the PK column in the parent
    via: o.CUSTOMER_ID          # the FK column in this table's source
    on_missing: DEFER           # DEFER | FAIL | SKIP

sources:
  - name: o
    table: ORDERS
fields:
  - name: order_id
    source: o.ORDER_ID
  - name: customer_id
    source: o.CUSTOMER_ID
  # ... other fields

on_missing policy

| Value | Behavior | When to use |
|-------|----------|-------------|
| DEFER | Row is saved to PULSARIDE_DEFERRED and replayed when the parent arrives | Normal FK relationships: the parent will eventually be migrated |
| FAIL | Row is routed to the DLQ as a hard FK violation | The parent should always exist; a missing parent is a data quality error |
| SKIP | Row is silently dropped | Orphaned rows that are acceptable to lose (e.g. deleted parent) |
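The table above boils down to one decision per row once the parent lookup has answered. The Java sketch below models only that decision. The OnMissing and Outcome enums and the decide method are hypothetical illustrations, not part of Pulsaride's API, and the sketch assumes the parent-existence check has already been performed.

```java
// Hypothetical sketch: maps the on_missing policy to what happens to one source row.
// None of these names come from Pulsaride; they only illustrate the policy table.
class OnMissingPolicy {
    // The three values accepted by on_missing in the YAML config.
    enum OnMissing { DEFER, FAIL, SKIP }

    // Possible fates of a single source row (names are illustrative).
    enum Outcome { WRITE, DEFER_TO_TABLE, ROUTE_TO_DLQ, DROP }

    // Assumes the caller already checked whether the parent key exists in the target.
    static Outcome decide(boolean parentExists, OnMissing policy) {
        if (parentExists) {
            return Outcome.WRITE; // parent present: the policy never comes into play
        }
        return switch (policy) {
            case DEFER -> Outcome.DEFER_TO_TABLE; // park in PULSARIDE_DEFERRED for replay
            case FAIL -> Outcome.ROUTE_TO_DLQ;    // hard FK violation
            case SKIP -> Outcome.DROP;            // orphan silently discarded
        };
    }

    public static void main(String[] args) {
        for (OnMissing policy : OnMissing.values()) {
            System.out.println(policy + " with missing parent -> " + decide(false, policy));
        }
    }
}
```

Note that the policy only matters when the parent is missing; a row whose parent already exists is written the same way under all three settings.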

How it works

  1. Route: DeferralEngine.route(row) is called for each source row before it is written. It checks whether the referenced parent key exists in the target.
  2. Defer: If the parent is missing, the row is saved to the PULSARIDE_DEFERRED table along with its blocking table, key value, and full payload.
  3. Write: If the parent exists, the row is written normally.
  4. onWritten hook: When a parent row (e.g. a customers row) is successfully written, DeferralEngine.onWritten(writtenKeys) is called automatically.
  5. Replay: replayPending() looks up all deferred rows blocked on the newly arrived keys and writes them. Multi-level cascades are re-evaluated recursively.
  6. Symmetric: The same hooks work in both batch mode (JdbcRowWriter) and CDC mode (CdcMergeWriter). A customer arriving via CDC triggers replay of deferred orders.
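The six steps can be sketched end to end in memory. In the Java below, the method names echo the steps (route, onWritten, replayPending), but the class, the Row record, the signatures and the two maps standing in for the target table and for PULSARIDE_DEFERRED are all hypothetical; this is an illustration of the control flow, not Pulsaride's implementation. Each row has at most one parent here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the route / defer / write / onWritten / replay loop.
// Method names mirror the documented steps; signatures and data structures are assumptions.
class InMemoryDeferralSketch {
    // A row for targetTable with primary key `key` and at most one parent (parentTable may be null).
    record Row(String targetTable, String key, String parentTable, String parentKey) {}

    // Stand-in for "parent key exists in the target": keys written so far, per table.
    private final Map<String, Set<String>> written = new HashMap<>();
    // Stand-in for PULSARIDE_DEFERRED: blocked rows indexed by "blockingTable:blockingKey".
    private final Map<String, List<Row>> deferred = new HashMap<>();
    // Order in which rows reached the target, kept for inspection.
    final List<String> writeLog = new ArrayList<>();

    // Steps 1-3: write the row if its parent exists, otherwise defer it.
    void route(Row row) {
        boolean hasParent = row.parentTable() != null;
        if (hasParent && !written.getOrDefault(row.parentTable(), Set.of()).contains(row.parentKey())) {
            deferred.computeIfAbsent(row.parentTable() + ":" + row.parentKey(), k -> new ArrayList<>())
                    .add(row);
            return;
        }
        written.computeIfAbsent(row.targetTable(), k -> new HashSet<>()).add(row.key());
        writeLog.add(row.targetTable() + ":" + row.key());
        onWritten(row.targetTable(), List.of(row.key())); // step 4: hook fires after every write
    }

    // Step 4: rows of `table` were written; replay anything blocked on them.
    void onWritten(String table, List<String> writtenKeys) {
        replayPending(table, writtenKeys);
    }

    // Step 5: re-route blocked children. A replayed child that gets written calls onWritten
    // again, which is what lets cascades like customers -> orders -> order_items resolve.
    private void replayPending(String table, List<String> keys) {
        for (String key : keys) {
            List<Row> blocked = deferred.remove(table + ":" + key);
            if (blocked != null) {
                blocked.forEach(this::route);
            }
        }
    }

    int pendingCount() {
        return deferred.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        InMemoryDeferralSketch engine = new InMemoryDeferralSketch();
        // Deliberately migrate in reverse dependency order.
        engine.route(new Row("order_items", "OI1", "orders", "O1"));
        engine.route(new Row("orders", "O1", "customers", "C1"));
        System.out.println("pending before parent: " + engine.pendingCount()); // 2
        engine.route(new Row("customers", "C1", null, null));
        System.out.println(engine.writeLog); // [customers:C1, orders:O1, order_items:OI1]
        System.out.println("pending after parent: " + engine.pendingCount()); // 0
    }
}
```

Writing the single customers row is enough to drain both levels of the chain, because every successful write, including a replayed one, goes back through the same hook.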

Multi-table cascade example

# order_items depends on both orders AND products
name: order-items-migration
target_table: order_items
depends_on:
  - table: orders
    key: order_id
    via: oi.ORDER_ID
    on_missing: DEFER
  - table: products
    key: product_id
    via: oi.PRODUCT_ID
    on_missing: DEFER

# order_items are deferred until BOTH orders and products exist.
# They replay as soon as the last dependency is resolved.
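Because the PULSARIDE_DEFERRED schema has a single blocking_table column, one plausible way to handle several dependencies (an assumption, not something this page confirms) is to park the row under its first missing parent and re-check all of its dependencies on every replay. The Java sketch below uses hypothetical Dep and Row records to show that a row then lands in the target only after its last dependency arrives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a row with several parents (order_items -> orders AND products).
// Assumption: a deferred row waits on ONE blocking dependency at a time (like the single
// blocking_table column) and all dependencies are re-checked whenever it is replayed.
class MultiDependencySketch {
    record Dep(String table, String key) {}                       // parent table + referenced key
    record Row(String targetTable, String key, List<Dep> deps) {}

    private final Map<String, Set<String>> written = new HashMap<>(); // keys present per table
    private final Map<String, List<Row>> deferred = new HashMap<>();  // "table:key" -> blocked rows
    final List<String> writeLog = new ArrayList<>();

    void route(Row row) {
        // First declared dependency whose parent key is not yet in the target, if any.
        Optional<Dep> missing = row.deps().stream()
                .filter(d -> !written.getOrDefault(d.table(), Set.of()).contains(d.key()))
                .findFirst();
        if (missing.isPresent()) {
            Dep d = missing.get();
            deferred.computeIfAbsent(d.table() + ":" + d.key(), k -> new ArrayList<>()).add(row);
            return;
        }
        written.computeIfAbsent(row.targetTable(), k -> new HashSet<>()).add(row.key());
        writeLog.add(row.targetTable() + ":" + row.key());
        // Replay children blocked on this row; each is re-evaluated against ALL its dependencies,
        // so it may simply be re-deferred on the next missing parent.
        List<Row> blocked = deferred.remove(row.targetTable() + ":" + row.key());
        if (blocked != null) {
            blocked.forEach(this::route);
        }
    }

    public static void main(String[] args) {
        MultiDependencySketch s = new MultiDependencySketch();
        s.route(new Row("order_items", "OI1", List.of(new Dep("orders", "O1"), new Dep("products", "P1"))));
        s.route(new Row("orders", "O1", List.of()));   // OI1 replays, then is re-deferred on products:P1
        System.out.println(s.writeLog);                 // [orders:O1]
        s.route(new Row("products", "P1", List.of())); // last dependency resolved: OI1 is written
        System.out.println(s.writeLog);                 // [orders:O1, products:P1, order_items:OI1]
    }
}
```

Re-checking every dependency on replay is what keeps the outcome independent of which parent arrives first; swapping the orders and products calls yields the same final set of written rows.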

Performance optimisation — preload

If the parent table is already fully migrated before child migration starts, call DeferralEngine.preload(table, keyColumn) at step initialisation. This bulk-loads all known parent keys into an in-memory HashSet, eliminating per-row DB lookups and avoiding any deferral.

// In your Spring Batch step listener:
deferralEngine.preload("customers", "id");
// All customer IDs are now in-memory — orders write without deferral.
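The preload idea amounts to consulting an in-memory set before paying for a per-row lookup. The Java sketch below is a hypothetical stand-in: ParentKeyCache, parentExists and the dbLookup predicate are illustrative names, and keys are passed in directly, whereas the real DeferralEngine.preload(table, keyColumn) presumably reads them from the target table itself.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical sketch of preloading parent keys into a HashSet consulted before any
// per-row lookup. Not Pulsaride's API; the lookup is injected as a predicate.
class ParentKeyCache {
    private final Map<String, Set<String>> preloaded = new HashMap<>();
    private final BiPredicate<String, String> dbLookup; // stand-in for a per-row query on the target
    int dbLookups = 0;                                   // counts fallback round trips

    ParentKeyCache(BiPredicate<String, String> dbLookup) {
        this.dbLookup = dbLookup;
    }

    // Bulk-load every known key of a parent table once, at step initialisation.
    void preload(String table, Collection<String> keys) {
        preloaded.computeIfAbsent(table, t -> new HashSet<>()).addAll(keys);
    }

    // In-memory hit first; fall back to the lookup only for keys not seen during preload.
    boolean parentExists(String table, String key) {
        Set<String> keys = preloaded.get(table);
        if (keys != null && keys.contains(key)) {
            return true;
        }
        dbLookups++;
        return dbLookup.test(table, key);
    }

    public static void main(String[] args) {
        ParentKeyCache cache = new ParentKeyCache((table, key) -> false); // pretend the DB has nothing else
        cache.preload("customers", List.of("C1", "C2"));
        System.out.println(cache.parentExists("customers", "C1")); // true
        System.out.println(cache.parentExists("customers", "C3")); // false
        System.out.println("fallback lookups: " + cache.dbLookups); // 1
    }
}
```

The fallback is a deliberate choice in this sketch: a preloaded set is a snapshot, so without it any parent written after preload would look missing and its children would be deferred unnecessarily.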

CLI commands

# List all deferred rows
pt migrate deferred list

# Filter by destination table
pt migrate deferred list --target orders

# Filter by the blocking dependency
pt migrate deferred list --blocking customers

# Manually trigger replay (useful after a parent batch completes out-of-band)
pt migrate deferred replay --blocking customers

Diagnostics

# Check 7 of pt diagnose shows pending deferred rows per blocking table:
pt diagnose --target-url "$TGT_URL"

# Check 7: Deferred rows
#   customers: 3525 pending  (oldest: 2026-04-05T14:32:11Z)
#   products:  0 pending

PULSARIDE_DEFERRED schema

The table is created automatically at startup alongside MIGRATION_RUNS and DLQ_EVENTS, so no manual DDL is required.

CREATE TABLE PULSARIDE_DEFERRED (
  id              BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  target_table    VARCHAR(255) NOT NULL,
  blocking_table  VARCHAR(255) NOT NULL,
  blocking_key    VARCHAR(255) NOT NULL,
  payload         TEXT         NOT NULL,  -- JSON row snapshot
  deferred_at     TIMESTAMP    NOT NULL,
  replayed_at     TIMESTAMP               -- null until successfully written
);
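To connect the schema with the Check 7 output shown earlier, the Java sketch below mirrors one PULSARIDE_DEFERRED row as a record and summarizes pending rows (replayed_at still null) per blocking table with their count and oldest deferred_at. DeferredRow, PendingSummary and pendingByBlockingTable are hypothetical names; this illustrates the bookkeeping the columns allow, not how pt diagnose is implemented.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical Java mirror of PULSARIDE_DEFERRED plus a per-blocking-table summary of pending
// rows (count and oldest deferred_at), similar in spirit to Check 7 of pt diagnose.
class DeferredReport {
    // One row of PULSARIDE_DEFERRED; field names follow the DDL columns.
    record DeferredRow(long id, String targetTable, String blockingTable, String blockingKey,
                       String payload, Instant deferredAt, Instant replayedAt) {
        boolean pending() {
            return replayedAt == null; // replayed_at stays null until the row is written
        }
    }

    record PendingSummary(long count, Instant oldest) {}

    static Map<String, PendingSummary> pendingByBlockingTable(List<DeferredRow> rows) {
        Map<String, PendingSummary> summary = new TreeMap<>(); // sorted by blocking table name
        for (DeferredRow r : rows) {
            if (!r.pending()) {
                continue;
            }
            summary.merge(r.blockingTable(), new PendingSummary(1, r.deferredAt()),
                    (a, b) -> new PendingSummary(a.count() + b.count(),
                            a.oldest().isBefore(b.oldest()) ? a.oldest() : b.oldest()));
        }
        return summary;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2026-04-05T14:32:11Z");
        List<DeferredRow> rows = List.of(
                new DeferredRow(1, "orders", "customers", "C1", "{}", t0.plusSeconds(60), null),
                new DeferredRow(2, "orders", "customers", "C2", "{}", t0, null),
                new DeferredRow(3, "order_items", "products", "P1", "{}", t0, t0.plusSeconds(5)));
        pendingByBlockingTable(rows).forEach((table, s) ->
                System.out.println(table + ": " + s.count() + " pending (oldest: " + s.oldest() + ")"));
        // customers: 2 pending (oldest: 2026-04-05T14:32:11Z)
    }
}
```

Against the real table, roughly the same numbers would come from SELECT blocking_table, COUNT(*), MIN(deferred_at) FROM PULSARIDE_DEFERRED WHERE replayed_at IS NULL GROUP BY blocking_table; whether pt diagnose runs exactly that query is not stated here. Either way, a blocking table with no pending rows (like products in the sample output) produces no group and has to be reported separately.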

Guarantees

  • Execution order is irrelevant. Migrating customers before orders or orders before customers produces the same end state.
  • No double-writes. Once a deferred row is replayed and written, replayed_at is stamped; subsequent replay attempts are no-ops.
  • Backward compatible. Configs without depends_on are unaffected — the DeferralEngine is a no-op when no dependencies are declared.
  • CDC symmetric. A parent arriving via a Debezium CDC stream triggers the same replay path as a parent arriving via batch full-load.
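The no-double-writes guarantee rests on replay skipping any entry whose replayed_at is already set. The Java sketch below illustrates that idea with a hypothetical mutable Entry class and an in-memory list standing in for the target table; it is not Pulsaride's code.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "no double-writes" guarantee: a deferred entry is written at most
// once because replay skips entries whose replayedAt is already stamped.
class IdempotentReplaySketch {
    // Mutable stand-in for one PULSARIDE_DEFERRED row.
    static final class Entry {
        final String blockingTable;
        final String blockingKey;
        final String payload;
        Instant replayedAt; // null until successfully written

        Entry(String blockingTable, String blockingKey, String payload) {
            this.blockingTable = blockingTable;
            this.blockingKey = blockingKey;
            this.payload = payload;
        }
    }

    final List<Entry> entries = new ArrayList<>();
    final List<String> targetWrites = new ArrayList<>(); // payloads that reached the target

    // Replays entries blocked on (table, key) and returns how many this call wrote.
    int replay(String table, String key) {
        int written = 0;
        for (Entry e : entries) {
            if (e.replayedAt != null) {
                continue; // already replayed: a repeat attempt is a no-op
            }
            if (e.blockingTable.equals(table) && e.blockingKey.equals(key)) {
                targetWrites.add(e.payload);  // write the row to the target
                e.replayedAt = Instant.now(); // stamp only after the write succeeded
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        IdempotentReplaySketch sketch = new IdempotentReplaySketch();
        sketch.entries.add(new Entry("customers", "C1", "{\"order_id\": 1}"));
        System.out.println(sketch.replay("customers", "C1")); // 1
        System.out.println(sketch.replay("customers", "C1")); // 0: second attempt is a no-op
        System.out.println(sketch.targetWrites.size());       // 1
    }
}
```

In a real database the target write and the replayed_at stamp would presumably need to commit in the same transaction (or the write be an idempotent upsert) for the guarantee to hold across a crash between the two; this page does not say which approach Pulsaride takes.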