Validation
A migration can be complete without being correct. Row counts can match while field values are truncated, incorrectly mapped, or silently corrupted.
Validation in pulsaride-transform gives you confidence at both levels — fast completeness checks and deep field-level accuracy — without manual SQL queries or separate tooling.
How it fits in the pipeline
Source DB ↓ migration (full-load or CDC) Target DB ↓ L1 — row count check (always runs) ↓ L2 — sampled field diff (runs when expect: is configured) ↓ Dashboard /pulsaride DLQ table pulsaride_dlq
Did all my data move? (L1)
What it gives you
Confirmation that no rows were lost or duplicated during migration. If the source had 12,483 rows and the target has 12,483 rows, L1 passes. If the counts diverge, the table is markedFAIL and the run exits with code 1.
How it works
After each table completes, pulsaride-transform queries both databases and compares row counts. The check runs automatically — no configuration required. Results are exposed via Spring Actuator:
GET /actuator/health/pulsaride
{
"status": "UP",
"details": {
"products": { "status": "PASS", "source": 12483, "target": 12483 },
"orders": { "status": "PASS", "source": 4201, "target": 4201 },
"invoices": { "status": "FAIL", "source": 8900, "target": 8897, "delta": -3 }
}
}The same status is visible per-table in the monitoring dashboard at /pulsaride. A FAIL on any table blocks the cutover signal.
Is my data actually correct? (L2)
What it gives you
Field-level confidence. L2 samples a percentage of migrated rows, joins source and target on the key columns you specify, and compares every field value. Mismatches are written to the DLQ with the source value, target value, and field name — so you can see exactly what drifted.
How it works
L2 activates when you add an expect: block to your migration YAML. Thesample_pct controls how many rows are diff-checked; key_columnstells the engine how to join the two sides:
# migration/orders.yaml name: orders target_table: orders expect: min_rows: 4000 sample_pct: 5 # check 5% of rows field-by-field key_columns: [order_id] max_dlq_rows: 10 # fail the run if more than 10 rows land in DLQ
Fields added or renamed by your transform rules are diff-checked against the mapped target column — not the original source column name.
Inspecting failures
When validation fails, issues surface in two places:
- Monitoring dashboard (
/pulsaride) — real-time per-table status, DLQ depth, and diff mismatch count. - DLQ table (
pulsaride_dlq) — one row per rejected or mismatched record, with the source payload and rejection reason.
-- See all L2 mismatches for the orders table SELECT source_key, field_name, source_value, target_value, reason, created_at FROM pulsaride_dlq WHERE table_name = 'orders' AND status = 'DIFF_MISMATCH' ORDER BY created_at DESC LIMIT 50;
The DLQ is created automatically in the target database. Each row includes the full source payload, the specific field that failed, and the run ID for correlation.
When to use each level
| Situation | Recommended level |
|---|---|
| Large tables, fast pipeline, no field transforms | L1 only (default) |
| Tables with field mappings, type casts, or expressions | L1 + L2 with sample_pct: 2–5 |
| Critical tables (payments, identities, inventory) | L1 + L2 with sample_pct: 10–20 and max_dlq_rows: 0 |
| Pre-cutover final check | L1 + L2 with sample_pct: 100 on key tables |
Real-world example: silent data corruption
Consider a migration that maps PRODUCT_NAME VARCHAR2(200) in Oracle toname VARCHAR(100) in PostgreSQL. The row count check (L1) passes because every row transferred. But any product name over 100 characters was silently truncated.
L1 would not catch this. L2 catches it immediately: the sampled diff finds a mismatch on the name field and writes the affected rows to the DLQ withreason: VALUE_TRUNCATED.
Adding an expect assertion catches it at the schema level before migration even runs:
fields:
- name: name
source: p.PRODUCT_NAME
target_type: varchar(100)
expect: "length(?) <= 100" # fails at row level if violatedexpect: rules reference
| Rule | Type | Description |
|---|---|---|
min_rows | integer | Fails the run if the target row count is below this value after migration |
sample_pct | float (0–100) | Percentage of source rows to sample for L2 field-level diff |
key_columns | list of strings | Columns used to join source and target rows during the diff |
max_dlq_rows | integer | Fails the run if the DLQ row count for this table exceeds this value |