Appendix A: Edge-Case Playbook

This appendix documents ten migration edge cases with their symptoms, root causes, and resolutions. Each case represents a pattern observed in real Oracle-to-PostgreSQL migrations. The playbook format is: symptom first (what you observe), then root cause (why it happened), then resolution (what to do).


Case 1: UNDO Retention Exhaustion During Full Load

Symptom: Full load fails mid-run with ORA-30052: invalid lower limit snapshot expression. The error appears in the Oracle JDBC driver output during a flashback query. The pipeline logs show the error on a specific table's chunk (e.g., chunk 4,892 of the audit_log table).

Root cause: The full load issues AS OF SCN queries that use Oracle undo data to reconstruct the table's state at the anchor SCN. If the full load takes longer than Oracle's UNDO_RETENTION setting (typically 900 seconds = 15 minutes, but may be set to 3600 or more), Oracle discards the undo data needed to serve the flashback query.

Resolution: Option A (preferred): Increase UNDO_RETENTION before running the next full load:

ALTER SYSTEM SET UNDO_RETENTION = 86400 SCOPE=BOTH; -- 24 hours

Verify the undo tablespace has sufficient space to support the increased retention.
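
A quick way to check both numbers, assuming you can query the V$ performance views, is a sketch like this — TUNED_UNDORETENTION in V$UNDOSTAT shows the retention Oracle has actually been able to sustain given the undo tablespace size:

-- Current UNDO_RETENTION setting, in seconds
SELECT value AS undo_retention_sec
FROM v$parameter
WHERE name = 'undo_retention';

-- Retention Oracle has actually sustained, and the longest query it served
SELECT MAX(tuned_undoretention) AS tuned_retention_sec,
       MAX(maxquerylen)         AS longest_query_sec
FROM v$undostat;

If tuned_retention_sec stays well below the new setting, grow the undo tablespace before relying on the longer retention.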

Option B: Enable continueWithoutFlashback in the pipeline configuration. Chunks that cannot use AS OF SCN fall back to live reads. These chunks are marked LOADED_WITHOUT_FLASHBACK in the staging ledger and receive additional reconciliation scrutiny.

Option C: Reduce full load duration. Increase worker count, increase chunk size for non-LOB tables, or split the load across multiple runs starting from the same anchor SCN (requires Oracle undo to be available for the entire multi-run period).

Prevention: Before starting the full load, estimate load duration and verify that Oracle's undo retention covers the full period. Formula: estimated_load_hours × 3600 < UNDO_RETENTION. For example, a load expected to take 6 hours needs UNDO_RETENTION of at least 21,600 seconds.


Case 2: LOB Column Causes Memory Pressure

Symptom: The pipeline process runs out of heap memory during the full load of a table with CLOB columns, and a worker thread dies with java.lang.OutOfMemoryError: Java heap space. The table in question has a document_body CLOB column with some values exceeding 10 MB.

Root cause: The default JDBC result set fetch size reads a fixed number of rows into memory at once. If the fetch size is 1,000 rows and some CLOBs are 10 MB each, a single fetch can pull 1,000 × 10 MB = 10 GB into memory — far exceeding the JVM heap.

Resolution: Reduce chunk size for LOB-heavy tables:

fullLoad:
  tables:
    - name: customer_document
      chunkSize: 500      # reduced from default 50000
      fetchSize: 10       # JDBC fetch size per chunk

Enable streaming LOB reads, which pipe each LOB value directly to the writer without buffering the full value in memory:

fullLoad:
  tables:
    - name: customer_document
      lobColumns:
        - column: document_body
          streamingMode: true
          maxSizeBytes: 52428800  # reject LOBs > 50MB

Prevention: During assessment, query for maximum LOB sizes:

SELECT MAX(DBMS_LOB.GETLENGTH(document_body)) FROM customer_document;

For tables with LOBs > 1 MB, configure streaming mode proactively.
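
To cover the whole schema rather than one table at a time, an inventory query along these lines (run with DBA privileges; the MYAPP owner follows the convention used elsewhere in this appendix) lists every LOB column to check:

-- All LOB columns in the application schema
SELECT table_name, column_name, data_type
FROM dba_tab_columns
WHERE owner = 'MYAPP'
AND data_type IN ('CLOB', 'NCLOB', 'BLOB');

Run the MAX(DBMS_LOB.GETLENGTH(...)) check against each column this returns.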


Case 3: Foreign Key Cycle Not Detected During Assessment

Symptom: The full load fails during wave 2 with a PostgreSQL FK violation: ERROR: insert or update on table "order_line" violates foreign key constraint "fk_ol_promotion". Investigation shows order_line has a FK to promotion, but promotion was not in wave 1 — it was in wave 2 with order_line, meaning neither was loaded before the other.

Root cause: Assessment cycle detection found the order_line → order_header → order_line cycle and resolved it, but missed a second cycle: order_line → promotion → promotion_code → order_line. The second cycle involved a table (promotion_code) that was not in the assessed scope (it was a newer table added after the assessment was run).

Resolution: Re-run the assessment to include the recently added table:

pulsaride-transform assess --refresh-dependencies

The new assessment detects the promotion_code cycle and adds it to the cycle resolution plan. A deferred FK strategy is applied to order_line.promotion_id.

Prevention: Run the assessment immediately before starting the full load, not weeks before. Schema changes between assessment and load can introduce new dependencies.
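
To spot-check the dependency graph yourself before the load, the foreign-key edges can be pulled straight from the Oracle catalog — a minimal sketch; feeding the edge list into any off-the-shelf cycle detector then reveals cycles the assessment must handle:

-- One row per FK edge: child table → parent table
SELECT child.table_name  AS child_table,
       parent.table_name AS parent_table
FROM dba_constraints child
JOIN dba_constraints parent
  ON child.r_owner = parent.owner
 AND child.r_constraint_name = parent.constraint_name
WHERE child.constraint_type = 'R'
AND child.owner = 'MYAPP';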


Case 4: SCN Wrap

Symptom: CDC stops delivering events. The last applied SCN is 9,999,999,999 (or near the Oracle MAXSCN). Subsequent SCNs in Oracle appear to be very small numbers (near 1). CDC events for those SCNs are not delivered because the pipeline's SCN comparison logic assumes SCNs are monotonically increasing.

Root cause: Oracle SCNs can technically wrap, though exhausting the 48-bit SCN space takes decades even on a busy system. More commonly, the SCN format changed between Oracle versions (extended SCNs in Oracle 12.2+), and the CDC consumer was not updated to handle values wider than the classic 6-byte (48-bit) format. The "wrap" is actually an SCN format change that looks like a wrap to an old consumer.

Resolution: Update Pulsaride Transform to the version that supports extended SCN (Oracle 12.2+ compatibility). Restart CDC from the last confirmed applied SCN (available in the staging checkpoint table).
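
To rule out a genuine wrap before blaming the consumer, compare the stuck checkpoint against the live SCN on the source (assuming V$ access):

-- If this is far beyond the stuck SCN, the consumer, not Oracle, is at fault
SELECT current_scn FROM v$database;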

If the SCN format change is the cause, also verify that the Oracle JDBC driver version matches the database version: an older driver can mishandle values that a newer server produces, so upgrade the driver alongside the pipeline.


Case 5: CHAR Padding Causing Unique Constraint Violations

Symptom: After the full load of product_code, application lookups against PostgreSQL stop finding existing codes (WHERE code = 'ABC' returns nothing), and during testing the application inserts 'ABC' as a new row. When trailing spaces are later trimmed from the loaded data and the unique index on code is built, the build fails:

ERROR: could not create unique index "uq_product_code"
DETAIL: Key (code)=(ABC) is duplicated.

Root cause: Oracle CHAR columns are fixed-length, padded with spaces: a CHAR(10) value 'ABC' is stored as 'ABC       ' (7 trailing spaces). Oracle hides the padding through blank-padded comparison semantics — comparing the column to the literal 'ABC' pads the literal, so lookups match. PostgreSQL TEXT columns keep the padding and compare exactly, so 'ABC       ' does not equal 'ABC' and the padded row is invisible to the application. The table therefore ends up holding both 'ABC       ' (padded, from the full load) and 'ABC' (unpadded, inserted by the application that could not see the padded row). Trimming collapses the two into genuine duplicates, which the index build surfaces.

Resolution: Add a RTRIM normalization rule for CHAR columns during full load:

transformations:
  - table: product_code
    column: code
    operation: rtrim

This trims trailing spaces from all CHAR values during load, restoring the lookup behavior the application saw in Oracle. Resolve any duplicate rows created while the padded values were invisible, then rebuild the unique index. Verify that the application also RTRIMs values when querying and inserting.
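
If padded values have already landed in PostgreSQL, a one-off cleanup along these lines (a sketch — run it before recreating the unique index) normalizes them in place:

-- Trim only rows that actually carry trailing spaces
UPDATE product_code
SET code = RTRIM(code)
WHERE code <> RTRIM(code);
-- Any rows that now collide are genuine duplicates; resolve them
-- before running CREATE UNIQUE INDEX.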

Prevention: During assessment, identify all CHAR (not VARCHAR2) columns in the source schema:

SELECT table_name, column_name, data_length
FROM dba_tab_columns
WHERE data_type = 'CHAR'
AND owner = 'MYAPP';

Apply RTRIM normalization to all of them preemptively.


Case 6: Trigger Firing Order Causes Application Error

Symptom: After the PL/SQL triggers are migrated to PostgreSQL and the application is pointed at PostgreSQL (during testing), a specific workflow produces incorrect results. The workflow: update account.status → trigger fires → audit row inserted → second trigger fires → notification sent. In PostgreSQL, the notification trigger fires before the audit trigger, reversing the intended order.

Root cause: Oracle fires same-event triggers in an unspecified order (in practice, stable by creation time); the audit trigger was created first, so it fired first. PostgreSQL fires triggers for the same event in alphabetical order by trigger name. The original Oracle triggers were named account_status_10_audit and account_status_20_alert — the numeric prefix documented the intended order. During migration the prefixes were dropped, leaving account_status_alert and account_status_audit; alphabetically, alert sorts before audit, so the notification trigger now fires first.

Resolution: Rename PostgreSQL triggers to enforce the required order using alphabetical prefixes:

ALTER TRIGGER account_status_audit ON account RENAME TO account_status_10_audit;
ALTER TRIGGER account_status_alert ON account RENAME TO account_status_20_alert;

Prevention: Preserve Oracle trigger numeric naming prefixes during migration. Pulsaride Transform's trigger migration includes an option preserveOrderPrefix: true that retains any leading numeric prefix in the original trigger name.
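
After renaming, the effective firing order can be read straight from the catalog: for triggers on the same event, PostgreSQL fires them in the alphabetical order this query returns (the public schema is an assumption):

SELECT event_object_table, trigger_name, action_timing, event_manipulation
FROM information_schema.triggers
WHERE trigger_schema = 'public'
ORDER BY event_object_table, trigger_name;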


Case 7: XMLTYPE Column Migration

Symptom: A table has an XMLTYPE column. The full load fails with ORA-00932: inconsistent datatypes when the pipeline tries to select the column with a standard JDBC query.

Root cause: Oracle XMLTYPE requires special JDBC handling — it cannot be read as a standard column using getString(). It requires Oracle's XMLType.createXML() or OPAQUE.getBytes() depending on the storage model (object-relational vs CLOB-based storage).

Resolution: Configure the XMLTYPE column for special handling:

fullLoad:
  tables:
    - name: product_spec
      columns:
        - name: spec_xml
          sourceType: XMLTYPE
          targetType: XML           # PostgreSQL XML type
          extractMethod: CLOB       # extract as CLOB then cast

The pipeline issues:

-- Oracle: extract XMLTYPE as CLOB
SELECT p.product_id, p.spec_xml.getClobVal() FROM product_spec p ...

Then inserts to PostgreSQL as XML type. Validate that the application uses PostgreSQL's XML functions (xpath(), XMLTABLE) rather than Oracle's XMLDOM package.
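
As a target-side sanity check, a query along these lines confirms the migrated column behaves as real XML — the /spec document shape and its version attribute here are hypothetical:

-- PostgreSQL: shred one attribute out of each document
SELECT p.product_id, x.spec_version
FROM product_spec p,
     XMLTABLE('/spec' PASSING p.spec_xml
              COLUMNS spec_version text PATH '@version') x;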


Case 8: Archived Log Gap During High-Volume Period

Symptom: CDC reports a SCN gap: WARN: SCN gap detected: 24,944,883–25,011,099. No events delivered for this range. The gap corresponds to the period 02:00–02:45 AM when a nightly batch job ran.

Root cause: Oracle switched redo logs rapidly during the batch job (high DML volume caused frequent log switches). The archived log retention policy was set to delete logs older than 2 hours. By the time the CDC consumer tried to read the archived log that covered 24,944,883–25,011,099, it had been deleted.

Resolution:

  1. Check whether the gap affects any tables in scope:

    SELECT seg_name, COUNT(*) FROM v$logmnr_contents
    WHERE scn BETWEEN 24944883 AND 25011099
    AND seg_owner = 'MYAPP'
    GROUP BY seg_name;
    -- If the archived logs covering this range were deleted, LogMiner
    -- raises ORA-01284 (file cannot be opened) instead of returning rows
    

    If the logs are gone, point-in-time recovery of the log contents is not possible (the V$ARCHIVED_LOG check after this list shows which logs survive).

  2. Run targeted reconciliation for all tables that had DML during the batch window:

    reconciliation:
      tables:
        - name: order_header
          mode: hash
          scnRange:
            from: 24944883
            to: 25011099
    
  3. Apply compensating writes for any discrepancies found.
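
To see quickly whether any archived log still covers the gap, query V$ARCHIVED_LOG on the source — the DELETED column shows whether RMAN has already removed the file:

-- Archived logs whose SCN range overlaps the gap
SELECT sequence#, first_change#, next_change#, name, deleted
FROM v$archived_log
WHERE first_change# <= 25011099
AND next_change# >= 24944883
ORDER BY sequence#;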

Prevention: Increase Oracle archive log retention to cover the full CDC catch-up window plus 24 hours of buffer:

-- Oracle RMAN: disable automatic archived log deletion...
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE;
-- ...and have the scheduled cleanup job retain 7 days of logs
DELETE ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-7';


Case 9: NULL Semantics in CHECK Constraint

Symptom: A CHECK constraint that was disabled in Oracle (DISABLE NOVALIDATE) was recreated in PostgreSQL and added NOT VALID after the full load. CDC apply then fails whenever a change touches one of the 3,412 rows that violate it:

ERROR: new row for relation "account" violates check constraint "chk_account_rating"
DETAIL: Failing row contains (NULL, 'INACTIVE', ...).

The Oracle constraint is CHECK (rating IN ('A', 'B', 'C', 'D')). The migrated DDL, however, reads CHECK (rating IS NOT NULL AND rating IN ('A', 'B', 'C', 'D')).

Root cause: In Oracle, NULL IN ('A', 'B', 'C', 'D') is neither TRUE nor FALSE — it is NULL, and Oracle treats a CHECK constraint as passing when its expression evaluates to NULL (not just TRUE). Oracle therefore allows NULL values in a column with a CHECK (col IN (...)) constraint, even when the constraint is enabled.

PostgreSQL applies exactly the same rule — a CHECK constraint passes if the expression is TRUE or NULL — so a faithful translation of CHECK (rating IN (...)) would have accepted the NULL rows too. The failures come from the extra rating IS NOT NULL conjunct in the migrated constraint, which turns the previously invisible NULLs into violations. The deeper issue is the data: because the Oracle constraint was DISABLE NOVALIDATE, it was never checked, and NULL ratings accumulated that nobody expected.

Investigation: The 3,412 failing rows have rating IS NULL and status = 'INACTIVE'. The application logic allows NULL rating for inactive accounts, but this was only enforced at the application level (not by Oracle's disabled constraint). The constraint should be:

CHECK ((rating IS NULL AND status = 'INACTIVE') OR (rating IS NOT NULL AND rating IN ('A', 'B', 'C', 'D')))

The rating IS NOT NULL conjunct is deliberate: without it, a NULL rating on an active account would make the expression evaluate to NULL, which a CHECK constraint accepts.

Resolution: Adjust the constraint to match the actual business rule, apply the adjusted constraint in PostgreSQL, and validate the existing rows.
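
A minimal sketch of the swap — adding the corrected constraint NOT VALID keeps CDC flowing, and VALIDATE CONSTRAINT then checks the existing rows without blocking writes:

ALTER TABLE account DROP CONSTRAINT chk_account_rating;
ALTER TABLE account ADD CONSTRAINT chk_account_rating
  CHECK ((rating IS NULL AND status = 'INACTIVE')
      OR (rating IS NOT NULL AND rating IN ('A', 'B', 'C', 'D')))
  NOT VALID;
-- Checks existing rows while taking only a light lock
ALTER TABLE account VALIDATE CONSTRAINT chk_account_rating;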


Case 10: NUMBER(38) Primary Key and BIGINT Overflow

Symptom: The full load completes successfully. During CDC, an INSERT event fails:

ERROR: value "92233720368547758071" is out of range for type bigint
DETAIL: ledger_entry.ledger_id received value 92233720368547758071

Root cause: The Oracle ledger_id column is NUMBER(38) and the PostgreSQL column was mapped to BIGINT. BIGINT maximum value is 9,223,372,036,854,775,807 (about 9.2 × 10^18). The CDC event contains a value of 92,233,720,368,547,758,071 — larger than BIGINT's maximum, generated by an Oracle sequence with MAXVALUE = 9 × 10^37.

This did not occur during the full load because the table's current maximum value was within BIGINT range. A batch job ran during parallel running that generated a very large sequence value (possibly by resetting the sequence to a high start value).

Resolution:

  1. Change the PostgreSQL column from BIGINT to NUMERIC(38):
    ALTER TABLE ledger_entry ALTER COLUMN ledger_id TYPE NUMERIC(38);
    
  2. Update the primary key and foreign keys referencing ledger_id to use NUMERIC(38) (the catalog query after this list finds the referencing constraints).
  3. Resume CDC.
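
One way to enumerate the foreign keys step 2 must touch is to ask the PostgreSQL catalog which constraints reference the table — a minimal sketch that lists constraints by table rather than by column:

SELECT conname            AS constraint_name,
       conrelid::regclass AS referencing_table
FROM pg_constraint
WHERE contype = 'f'
AND confrelid = 'ledger_entry'::regclass;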

Prevention: During assessment, check Oracle NUMBER columns for scale=0 (integer semantics) and precision=38 (maximum Oracle integer):

SELECT table_name, column_name, data_precision, data_scale
FROM dba_tab_columns
WHERE data_type = 'NUMBER'
AND data_precision = 38
AND data_scale = 0
AND owner = 'MYAPP';

Map these to PostgreSQL NUMERIC(38), not BIGINT. See Chapter 4 for the full type mapping discussion.
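
Columns are only half the exposure: an Oracle sequence whose ceiling exceeds the BIGINT range can overflow a BIGINT key later even when current values fit. A companion check along these lines flags such sequences:

-- Sequences that can outgrow PostgreSQL BIGINT (max 9223372036854775807)
SELECT sequence_name, max_value, last_number
FROM dba_sequences
WHERE sequence_owner = 'MYAPP'
AND max_value > 9223372036854775807;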
