Chapter 16

Making Sense of Reconciliation Outputs

A reconciliation run produces a report. The report contains counts, hashes, sample comparisons, and aggregate deltas. The challenge is not generating the report — it is reading it correctly. A reconciliation output that reports mismatches is not always a data problem. It may be expected behavior, timing, or a false positive from a type mapping that the reconciliation engine does not account for.

This chapter covers how to read reconciliation reports, how to classify mismatches by root cause, how to identify false positives, and what operator action each type of finding requires.

16.1 The Anatomy of a Reconciliation Report

A Pulsaride Transform reconciliation report has four sections:

Summary: Overall status, timestamp, CDC lag at run time, number of tables checked, number of passing/failing tables.

Table-level results: One row per table showing the reconciliation mode, result (PASS/FAIL/WARN), and the numeric delta.

Discrepancy detail: For each failing or warning table, the specific rows or aggregates that differ, with source and target values.

Classification: Each discrepancy is classified by type (see 16.2) to guide operator action.

RECONCILIATION REPORT
Run ID: recon-4421
Timestamp: 2026-04-15 02:14:33 UTC
CDC Lag: 18 seconds

SUMMARY
  Tables checked: 340
  Tables PASS: 337
  Tables WARN: 2
  Tables FAIL: 1

TABLE RESULTS
  account         COUNT  PASS  Oracle: 847,331   PostgreSQL: 847,331  Delta: 0
  ledger_entry    HASH   FAIL  Segment mismatch in segment 73/100
  order_header    COUNT  WARN  Oracle: 2,441,882  PostgreSQL: 2,441,876  Delta: -6
  customer        SAMPLE WARN  3/1000 sample rows differ (see detail)

DISCREPANCY DETAIL
  ledger_entry (HASH FAIL)
    Segment 73: hash mismatch
    Drill-down: 3 rows
      ledger_id=9,201,441  amount: Oracle=1500.0000  PG=1499.9900  Delta=0.01
      ledger_id=9,201,442  amount: Oracle=750.0000   PG=749.9900   Delta=0.01
      ledger_id=9,201,889  amount: Oracle=2250.0000  PG=2249.9900  Delta=0.01

  order_header (COUNT WARN)
    Delta: -6 rows  (PostgreSQL has 6 fewer rows)
    CDC lag at run time: 18 seconds
    Classification: EXPECTED_CDC_DELTA (6 rows likely in-flight)

  customer (SAMPLE WARN)
    Row customer_id=44,891:
      email: Oracle='User@EXAMPLE.COM'  PG='user@example.com'
      Classification: CASE_NORMALIZATION (email was lowercased during migration)
    Row customer_id=71,022:
      phone_number: Oracle='+1 (555) 012-3456'  PG='+15550123456'
      Classification: NORMALIZATION_RULE (phone_format rule applied)
    Row customer_id=88,104:
      status: Oracle='ACTIVE'  PG='ACTIVE'  Match — false positive in sample hash
      Classification: FALSE_POSITIVE (hash collision artifact, values identical)

16.2 Discrepancy Classification

Each discrepancy falls into one of eight classifications. The classification drives the operator action.

Classification 1: EXPECTED_CDC_DELTA

What it is: The row or count difference is within the expected range given the current CDC lag.

Example: 6 missing rows in order_header when CDC lag is 18 seconds and the insert rate is 0.5 rows/second. Expected in-flight: 18 × 0.5 = 9 rows. Delta of 6 is within expectation.

Operator action: None. Monitor — the delta should reduce to zero as CDC lag decreases.

Escalation trigger: If the delta does not decrease as CDC lag decreases, reclassify as DATA_LOSS.

Classification 2: DATA_LOSS

What it is: PostgreSQL has fewer rows than Oracle, and the difference cannot be explained by CDC lag.

Example: 4,221 missing rows in transaction when CDC lag is 2 seconds. The insert rate is 10 rows/second. Expected in-flight: 20 rows. Actual delta: 4,221.

Operator action: Immediate investigation. Root causes:

Full load chunk failed silently (check staging ledger)
CDC event gap (check _pulsaride.scn_gaps)
Rows filtered by a transformation rule that should not have excluded them

Classification 3: VALUE_DRIFT

What it is: Row exists in both systems but column values differ, and the difference is not explained by a known normalization rule.

Example: ledger_entry.amount differs by $0.01 for 3 rows. No normalization rule applies to amount.

Operator action: Investigate the specific row history in Oracle (flashback query to understand when the value changed, whether CDC captured the change) and apply compensating write if the Oracle value is correct.

Classification 4: NORMALIZATION_RULE

What it is: Column values differ because a configured transformation rule was applied during migration and the reconciliation engine is comparing pre-normalization Oracle values to post-normalization PostgreSQL values.

Example: customer.phone_number differs because a phone_format: E164 rule was applied. Oracle stores +1 (555) 012-3456; PostgreSQL stores +15550123456. Both are correct representations.

Operator action: Verify the rule is configured correctly. If the difference is expected, add this column to the excludeFromComparison list for this rule. No data correction needed.

Product note: Pulsaride Transform's reconciliation engine reads the normalization rule configuration and applies the inverse transformation to the PostgreSQL value before comparison, eliminating false positives from expected normalization. If a VALUE_DRIFT discrepancy appears in a normalized column, it means either the normalization was applied inconsistently or the CDC event bypassed the normalization rule.

Classification 5: CASE_NORMALIZATION

What it is: A string column differs by case. Caused by case normalization rules (email lowercase, status uppercase) that were applied inconsistently.

Example: customer.email differs: Oracle has User@EXAMPLE.COM, PostgreSQL has user@example.com. A lowercase normalization was applied during full load.

Operator action: Verify the rule was applied consistently. If CDC events for this row did not apply the normalization (raw value copied without transformation), this is a CDC normalization gap — update the CDC transformation pipeline to apply the same rule.

Classification 6: TYPE_ENCODING

What it is: Identical semantic values that differ in representation due to type mapping differences.

Example:

Oracle TIMESTAMP value: 2024-01-15 14:30:00 (no timezone)
PostgreSQL TIMESTAMP WITH TIME ZONE value: 2024-01-15 14:30:00+00 (UTC)
Comparison: mismatch (string comparison fails), but values are semantically identical

Example:

Oracle NUMBER(5,2) value: 100 (stored as integer, no decimal point)
PostgreSQL NUMERIC(5,2) value: 100.00 (stored with decimal precision)
Comparison: mismatch if comparing strings, match if comparing numeric values

Operator action: Update the reconciliation comparison logic to account for the type encoding difference. No data correction needed — values are semantically identical.

Classification 7: FALSE_POSITIVE

What it is: The reconciliation engine reported a mismatch but the values are actually identical. Caused by comparison implementation bugs, encoding artifacts, or hash collisions.

Example: A sample reconciliation flags customer_id=88,104 as differing, but fetching the actual row shows identical values. This can occur with certain Unicode normalization forms that are visually identical but byte-different (e.g., ñ as a single character vs n + combining tilde).

Operator action: Investigate the specific comparison that failed. If values are genuinely identical, add the row to a known-false-positive list and report the bug to pulsaride support with the comparison details.

Classification 8: SCHEMA_DRIFT

What it is: Oracle and PostgreSQL schemas have diverged — a column was added or dropped in Oracle during parallel running, and the target has not been updated.

Example: A new column risk_score NUMERIC(5,2) was added to customer in Oracle on April 12. PostgreSQL does not have this column. Reconciliation for customer shows column count mismatch.

Operator action: Apply the schema change to PostgreSQL (add the column). Update the CDC transformation to include the new column. Re-run reconciliation for customer.

16.3 Reading the False Positive Rate

Every reconciliation run has a false positive rate — discrepancies that appear in the report but do not represent actual data errors. A high false positive rate degrades operator trust in the report: operators who see 200 discrepancies and know 190 of them are false positives will start dismissing real ones.

Managing the false positive rate requires:

Configuring normalization rules correctly. The reconciliation engine applies inverse transformations for known rules. Correctly configured rules eliminate the largest source of false positives.
Excluding CDC-lag deltas. Count deltas within the CDC-lag tolerance are classified as EXPECTED_CDC_DELTA and filtered from the FAIL/WARN display.
Type mapping configuration. Column-level comparison rules handle encoding differences (timestamp timezone, numeric precision).
Baseline tracking. After the first reconciliation run, baseline the expected discrepancy rate. Subsequent runs compare against the baseline — only new discrepancies are flagged.

Product note: The monitoring dashboard shows a false positive trend: the percentage of discrepancies classified as NORMALIZATION_RULE, CASE_NORMALIZATION, TYPE_ENCODING, or FALSE_POSITIVE over the past 7 days. A rising false positive trend indicates misconfigured normalization rules. The goal is < 5% of reported discrepancies being false positives.

16.4 The Operator Workflow for a Failed Reconciliation

When a table shows FAIL status, the operator workflow is:

1. Read the discrepancy detail for the failing table
2. Classify each discrepancy (use the 8-classification framework)
3. For EXPECTED_CDC_DELTA: wait 15 minutes, re-run reconciliation
4. For DATA_LOSS: open incident, investigate staging ledger and SCN gaps
5. For VALUE_DRIFT: investigate CDC event history, generate compensating write
6. For NORMALIZATION_RULE / CASE_NORMALIZATION: verify rule configuration
7. For TYPE_ENCODING: update comparison configuration
8. For FALSE_POSITIVE: document and escalate to product team
9. After resolution: re-run reconciliation for the failing table
10. If PASS: update table status in monitoring dashboard

Most FAIL results on day 1–2 of parallel running are EXPECTED_CDC_DELTA. Most FAIL results on day 3+ (after CDC is stable) are VALUE_DRIFT or DATA_LOSS. This pattern is a signal: if you are seeing many EXPECTED_CDC_DELTA on day 5 of parallel running, CDC is not catching up and the lag problem must be addressed before cutover.

16.5 Example: Three Discrepancies, Three Resolutions

Discrepancy 1: order_header count delta of 6 rows, CDC lag 18 seconds Classification: EXPECTED_CDC_DELTA. Action: none. Re-run in 5 minutes: delta is 0. PASS.

Discrepancy 2: ledger_entry amount differences of $0.01 in 3 rows Classification: VALUE_DRIFT. Action: fetch the Oracle values for ledger_ids 9,201,441, 9,201,442, 9,201,889. Oracle values: 1500.0000, 750.0000, 2250.0000. Investigation: a rounding transformation was applied during full load that rounded to 2 decimal places before casting to NUMERIC(15,4). The source values (1499.99, 749.99, 2249.99) were in Oracle and a rounding error pushed them to these values. The migration normalized incorrectly — the correct values are the Oracle values. Compensating writes restore the correct amounts. Re-run: PASS.

Discrepancy 3: customer.email case differences for 3 rows Classification: CASE_NORMALIZATION. Action: verify that all 3 rows had the email lowercased by the full load normalization. Yes — confirmed. These 3 rows happen to have emails that were stored in mixed case in Oracle. The normalization was applied correctly. CDC was also applied correctly (CDC preserves the email as Oracle sends it, then applies lowercase normalization). Expected difference. Add email column to the reconciliation's caseInsensitiveColumns list for customer. Re-run: PASS.

All three discrepancies resolved without data corruption. The monitoring dashboard shows all 340 tables GREEN. The parallel running day 3 reconciliation report: 0 FAIL, 0 WARN. Cutover is on track.

← Previous

Chapter 15Reconciliation-Driven Validation

Chapter 17How Reconciliation Improves the Product