Dependency-Aware Migration
The depends_on directive lets you migrate tables in any order without worrying about FK constraints. Rows that reference a parent row that hasn't been migrated yet are automatically deferred and replayed the moment the parent arrives.
The problem it solves
In a classic Oracle → PostgreSQL migration, FK relationships force you to migrate tables in a strict topological order: customers before orders, orders before order_items. If two jobs run in parallel or a parent row fails, child rows silently fail or require manual re-runs.
With depends_on, execution order becomes irrelevant. Pulsaride tracks the dependency automatically and replays deferred rows without any operator intervention.
YAML configuration
name: orders-migration
version: "1.0"
target_table: orders
reject_policy: FAIL_ROW      # failed rows → DLQ, not abort
depends_on:
  - table: customers         # the parent table that must exist first
    key: id                  # the PK column in the parent
    via: o.CUSTOMER_ID       # the FK column in this table's source
    on_missing: DEFER        # DEFER | FAIL | SKIP
sources:
  - name: o
    table: ORDERS
fields:
  - name: order_id
    source: o.ORDER_ID
  - name: customer_id
    source: o.CUSTOMER_ID
  # ... other fields
on_missing policy
| Value | Behavior | When to use |
|---|---|---|
| DEFER | Row is saved to PULSARIDE_DEFERRED and replayed when the parent arrives | Normal FK relationships: the parent will eventually be migrated |
| FAIL | Row is routed to the DLQ as a hard FK violation | Parent should always exist: a missing parent is a data quality error |
| SKIP | Row is silently dropped | Orphaned rows that are acceptable to lose (e.g. deleted parent) |
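The three policies reduce to a single branch that is taken only when the parent key is absent. The Java sketch below illustrates that decision. The OnMissing and Outcome enums and the decide method are hypothetical names chosen for illustration, not part of Pulsaride's API, and the parent lookup is modeled as an in-memory set rather than a query against the target.

```java
import java.util.Set;

class OnMissingDemo {
    /** Mirrors the three on_missing values from the table above. */
    enum OnMissing { DEFER, FAIL, SKIP }

    /** Hypothetical outcome labels (assumption, not Pulsaride types). */
    enum Outcome { WRITE, DEFERRED, DLQ, DROPPED }

    /** Decide what happens to a child row given the known parent keys. */
    static Outcome decide(Set<String> parentKeys, String foreignKey, OnMissing policy) {
        if (parentKeys.contains(foreignKey)) {
            return Outcome.WRITE; // parent exists: the policy is never consulted
        }
        switch (policy) {
            case DEFER: return Outcome.DEFERRED; // park the row until the parent arrives
            case FAIL:  return Outcome.DLQ;      // treat as a hard FK violation
            default:    return Outcome.DROPPED;  // SKIP: the row is discarded
        }
    }

    public static void main(String[] args) {
        Set<String> customers = Set.of("42");
        System.out.println(decide(customers, "42", OnMissing.FAIL));  // WRITE
        System.out.println(decide(customers, "7", OnMissing.DEFER));  // DEFERRED
        System.out.println(decide(customers, "7", OnMissing.FAIL));   // DLQ
        System.out.println(decide(customers, "7", OnMissing.SKIP));   // DROPPED
    }
}
```

Note that the policy only matters for rows whose parent is missing; rows with an existing parent are written regardless of the configured value.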
How it works
- Route: DeferralEngine.route(row) is called for each source row before writing. It checks whether the referenced parent key exists in the target.
- Defer: If the parent is missing, the row is saved to the PULSARIDE_DEFERRED table with its blocking table, key value, and full payload.
- Write: If the parent exists, the row is written normally.
- onWritten hook: When a parent row is successfully written (e.g. a customers row), DeferralEngine.onWritten(writtenKeys) is called automatically.
- Replay: replayPending() looks up all deferred rows blocked on the newly arrived keys and writes them. Multi-level cascades are re-evaluated recursively.
- Symmetric: The same hooks work in both batch mode (JdbcRowWriter) and CDC mode (CdcMergeWriter). A customer arriving via CDC will trigger replay of deferred orders.
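The route, defer, write, and replay flow can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not Pulsaride's implementation: the DeferralSketch class, its Row type, and the "table:key" string identifiers are hypothetical, each row has at most one parent, and the target database and the PULSARIDE_DEFERRED table are replaced by a set and a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DeferralSketch {
    /** A row with its own key and the parent it references (null parent = root table). */
    static final class Row {
        final String table, key, parentTable, parentKey;
        Row(String table, String key, String parentTable, String parentKey) {
            this.table = table; this.key = key;
            this.parentTable = parentTable; this.parentKey = parentKey;
        }
    }

    final Set<String> written = new HashSet<>();              // stands in for the target
    final Map<String, List<Row>> deferred = new HashMap<>();  // stands in for PULSARIDE_DEFERRED
    final List<String> log = new ArrayList<>();               // write order, for inspection

    static String id(String table, String key) { return table + ":" + key; }

    /** Route: write the row if its parent exists, otherwise defer it. */
    void route(Row row) {
        if (row.parentTable != null) {
            String parentId = id(row.parentTable, row.parentKey);
            if (!written.contains(parentId)) {
                deferred.computeIfAbsent(parentId, k -> new ArrayList<>()).add(row);
                return;
            }
        }
        write(row);
    }

    /** Write the row, then fire the onWritten hook for its key. */
    void write(Row row) {
        String rowId = id(row.table, row.key);
        written.add(rowId);
        log.add(rowId);
        onWritten(rowId);
    }

    /** Replay every row that was blocked on the key that just arrived. */
    void onWritten(String writtenId) {
        List<Row> unblocked = deferred.remove(writtenId);
        if (unblocked == null) return;
        for (Row r : unblocked) write(r); // recursion handles multi-level cascades
    }

    int pending() {
        return deferred.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        DeferralSketch engine = new DeferralSketch();
        engine.route(new Row("order_items", "100", "orders", "10")); // deferred
        engine.route(new Row("orders", "10", "customers", "1"));     // deferred
        engine.route(new Row("customers", "1", null, null));         // written, triggers replay
        System.out.println(engine.log);       // [customers:1, orders:10, order_items:100]
        System.out.println(engine.pending()); // 0
    }
}
```

Routing the rows in the worst possible order (grandchild, child, parent) still yields parent-first writes, because each write triggers replay of the rows it unblocks.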
Multi-table cascade example
# order_items depends on both orders AND products
name: order-items-migration
target_table: order_items
depends_on:
  - table: orders
    key: order_id
    via: oi.ORDER_ID
    on_missing: DEFER
  - table: products
    key: product_id
    via: oi.PRODUCT_ID
    on_missing: DEFER
# order_items are deferred until BOTH orders and products exist.
# They replay as soon as the last dependency is resolved.
Performance optimisation: preload
If the parent table is already fully migrated before child migration starts, call DeferralEngine.preload(table, keyColumn) at step initialisation. This bulk-loads all known parent keys into an in-memory HashSet, eliminating per-row DB lookups and avoiding any deferral.
// In your Spring Batch step listener:
deferralEngine.preload("customers", "id");
// All customer IDs are now in memory, so orders write without deferral.
CLI commands
# List all deferred rows
pt migrate deferred list
# Filter by destination table
pt migrate deferred list --target orders
# Filter by the blocking dependency
pt migrate deferred list --blocking customers
# Manually trigger replay (useful after a parent batch completes out-of-band)
pt migrate deferred replay --blocking customers
Diagnostics
# pt diagnose Check 7 shows deferred counts per blocking table:
pt diagnose --target-url "$TGT_URL"
# Check 7: Deferred rows
#   customers: 3525 pending (oldest: 2026-04-05T14:32:11Z)
#   products: 0 pending
PULSARIDE_DEFERRED schema
The table is created automatically at startup alongside MIGRATION_RUNS and DLQ_EVENTS. No manual DDL required.
CREATE TABLE PULSARIDE_DEFERRED (
  id             BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  target_table   VARCHAR(255) NOT NULL,
  blocking_table VARCHAR(255) NOT NULL,
  blocking_key   VARCHAR(255) NOT NULL,
  payload        TEXT NOT NULL,       -- JSON row snapshot
  deferred_at    TIMESTAMP NOT NULL,
  replayed_at    TIMESTAMP            -- null until successfully written
);
Guarantees
- Execution order is irrelevant. Migrating customers before orders or orders before customers produces an identical end state.
- No double-writes. Once a deferred row is replayed and written, replayed_at is stamped; subsequent replay attempts are no-ops.
- Backward compatible. Configs without depends_on are unaffected; the DeferralEngine is a no-op when no dependencies are declared.
- CDC symmetric. A parent arriving via a Debezium CDC stream triggers the same replay path as a parent arriving via batch full-load.
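The no-double-writes guarantee amounts to an idempotent replay guarded by the replayed_at stamp. The Java sketch below illustrates the idea under stated assumptions: ReplayOnceSketch and DeferredRow are hypothetical names, the target is a plain list, and the stamp is set in memory rather than in the PULSARIDE_DEFERRED table. It does not model concurrent replays or transactional behavior.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class ReplayOnceSketch {
    /** In-memory stand-in for one PULSARIDE_DEFERRED record. */
    static final class DeferredRow {
        final String payload;
        Instant replayedAt; // null until successfully written, like replayed_at
        DeferredRow(String payload) { this.payload = payload; }
    }

    final List<String> target = new ArrayList<>(); // stands in for the target table

    /** Writes the row only if it has not been replayed yet; returns true if a write happened. */
    boolean replay(DeferredRow row) {
        if (row.replayedAt != null) {
            return false; // already stamped: this attempt is a no-op
        }
        target.add(row.payload);
        row.replayedAt = Instant.now(); // stamp after the write succeeds
        return true;
    }

    public static void main(String[] args) {
        ReplayOnceSketch sketch = new ReplayOnceSketch();
        DeferredRow row = new DeferredRow("{\"order_id\": 10}");
        System.out.println(sketch.replay(row));   // true
        System.out.println(sketch.replay(row));   // false
        System.out.println(sketch.target.size()); // 1
    }
}
```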