Validation

A migration can be complete without being correct. Row counts can match while field values are truncated, incorrectly mapped, or silently corrupted.

Validation in pulsaride-transform gives you confidence at both levels — fast completeness checks and deep field-level accuracy — without manual SQL queries or separate tooling.

How it fits in the pipeline

Source DB
   ↓  migration (full-load or CDC)
Target DB
   ↓  L1 — row count check        (always runs)
   ↓  L2 — sampled field diff      (runs when expect: is configured)
   ↓
Dashboard  /pulsaride
DLQ table  pulsaride_dlq

Did all my data move? (L1)

What it gives you

Confirmation that no rows were lost or duplicated during migration. If the source had 12,483 rows and the target has 12,483 rows, L1 passes. If the counts diverge, the table is markedFAIL and the run exits with code 1.

How it works

After each table completes, pulsaride-transform queries both databases and compares row counts. The check runs automatically — no configuration required. Results are exposed via Spring Actuator:

GET /actuator/health/pulsaride

{
  "status": "UP",
  "details": {
    "products": { "status": "PASS", "source": 12483, "target": 12483 },
    "orders":   { "status": "PASS", "source":  4201, "target":  4201 },
    "invoices": { "status": "FAIL", "source":  8900, "target":  8897, "delta": -3 }
  }
}

The same status is visible per-table in the monitoring dashboard at /pulsaride. A FAIL on any table blocks the cutover signal.

Is my data actually correct? (L2)

What it gives you

Field-level confidence. L2 samples a percentage of migrated rows, joins source and target on the key columns you specify, and compares every field value. Mismatches are written to the DLQ with the source value, target value, and field name — so you can see exactly what drifted.

How it works

L2 activates when you add an expect: block to your migration YAML. Thesample_pct controls how many rows are diff-checked; key_columnstells the engine how to join the two sides:

# migration/orders.yaml
name: orders
target_table: orders
expect:
  min_rows: 4000
  sample_pct: 5           # check 5% of rows field-by-field
  key_columns: [order_id]
  max_dlq_rows: 10        # fail the run if more than 10 rows land in DLQ

Fields added or renamed by your transform rules are diff-checked against the mapped target column — not the original source column name.

Inspecting failures

When validation fails, issues surface in two places:

Monitoring dashboard (/pulsaride) — real-time per-table status, DLQ depth, and diff mismatch count.
DLQ table (pulsaride_dlq) — one row per rejected or mismatched record, with the source payload and rejection reason.

-- See all L2 mismatches for the orders table
SELECT
  source_key,
  field_name,
  source_value,
  target_value,
  reason,
  created_at
FROM pulsaride_dlq
WHERE table_name = 'orders'
  AND status     = 'DIFF_MISMATCH'
ORDER BY created_at DESC
LIMIT 50;

The DLQ is created automatically in the target database. Each row includes the full source payload, the specific field that failed, and the run ID for correlation.

When to use each level

Situation	Recommended level
Large tables, fast pipeline, no field transforms	L1 only (default)
Tables with field mappings, type casts, or expressions	L1 + L2 with `sample_pct: 2–5`
Critical tables (payments, identities, inventory)	L1 + L2 with `sample_pct: 10–20` and `max_dlq_rows: 0`
Pre-cutover final check	L1 + L2 with `sample_pct: 100` on key tables

Real-world example: silent data corruption

Consider a migration that maps PRODUCT_NAME VARCHAR2(200) in Oracle toname VARCHAR(100) in PostgreSQL. The row count check (L1) passes because every row transferred. But any product name over 100 characters was silently truncated.

L1 would not catch this. L2 catches it immediately: the sampled diff finds a mismatch on the name field and writes the affected rows to the DLQ withreason: VALUE_TRUNCATED.

Adding an expect assertion catches it at the schema level before migration even runs:

fields:
  - name: name
    source: p.PRODUCT_NAME
    target_type: varchar(100)
    expect: "length(?) <= 100"   # fails at row level if violated

expect: rules reference

Rule	Type	Description
`min_rows`	integer	Fails the run if the target row count is below this value after migration
`sample_pct`	float (0–100)	Percentage of source rows to sample for L2 field-level diff
`key_columns`	list of strings	Columns used to join source and target rows during the diff
`max_dlq_rows`	integer	Fails the run if the DLQ row count for this table exceeds this value