Appendix B: Practical Labs
These six labs are designed to be run against local Oracle and PostgreSQL instances: Oracle Database 19c, or the free Oracle Database 21c Express Edition (XE), and PostgreSQL 14 or later. Each lab covers a specific migration scenario with setup SQL, step-by-step instructions, expected output, and verification queries.
Prerequisites: Docker Desktop installed, with the Oracle XE and PostgreSQL images available.
Lab 1: Empty String Semantics
Objective: Observe Oracle empty string = NULL behavior, set up a migration with normalization, and verify correct handling in PostgreSQL.
Setup
Start Oracle XE and PostgreSQL:
docker run -d --name oracle-xe -p 1521:1521 -e ORACLE_PWD=oraclepass container-registry.oracle.com/database/express:21.3.0-xe
docker run -d --name pg15 -p 5432:5432 -e POSTGRES_PASSWORD=pgpass postgres:15
Connect to Oracle and create the test schema:
-- Oracle: create test table and data
-- (connect as SYSTEM to the XEPDB1 pluggable database first)
CREATE USER lab_user IDENTIFIED BY labpass;
GRANT CONNECT, RESOURCE TO lab_user;
ALTER USER lab_user QUOTA UNLIMITED ON USERS;
CONNECT lab_user/labpass@//localhost:1521/XEPDB1
CREATE TABLE customer (
customer_id NUMBER PRIMARY KEY,
name VARCHAR2(100) NOT NULL,
email VARCHAR2(200),
code VARCHAR2(20)
);
-- Insert: some with NULL, some with empty string, some with value
INSERT INTO customer VALUES (1, 'Alice', 'alice@example.com', 'C001');
INSERT INTO customer VALUES (2, 'Bob', '', '');
INSERT INTO customer VALUES (3, 'Charlie', NULL, NULL);
INSERT INTO customer VALUES (4, 'Diana', '', 'C004');
COMMIT;
Observation Step
-- Oracle: what does NULL mean for empty strings?
SELECT customer_id, name,
CASE WHEN email IS NULL THEN 'IS NULL' ELSE 'NOT NULL' END AS email_null,
CASE WHEN email = '' THEN 'IS EMPTY' ELSE 'NOT EMPTY' END AS email_empty,
email
FROM customer;
Expected output:
1 Alice NOT NULL NOT EMPTY alice@example.com
2 Bob IS NULL NOT EMPTY (null)
3 Charlie IS NULL NOT EMPTY (null)
4 Diana IS NULL NOT EMPTY (null)
All empty strings are returned as NULL by Oracle. Rows 2, 3, and 4 all show IS NULL.
PostgreSQL Migration
-- PostgreSQL: same table
CREATE TABLE customer (
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT,
code TEXT
);
-- Load without normalization (as Oracle delivers values — all empty strings are NULL)
INSERT INTO customer VALUES (1, 'Alice', 'alice@example.com', 'C001');
INSERT INTO customer VALUES (2, 'Bob', NULL, NULL);
INSERT INTO customer VALUES (3, 'Charlie', NULL, NULL);
INSERT INTO customer VALUES (4, 'Diana', NULL, 'C004');
Verification
-- PostgreSQL: verify null counts match Oracle
SELECT COUNT(*) FILTER (WHERE email IS NULL) AS null_email_pg FROM customer;
-- Returns: 3 (matches Oracle)
SELECT COUNT(*) FILTER (WHERE email = '') AS empty_email_pg FROM customer;
-- Returns: 0 (no empty strings — correct, Oracle never had them)
Adding a New Row After Migration
-- PostgreSQL: application inserts an empty email
INSERT INTO customer VALUES (5, 'Eve', '', 'C005');
-- Now check:
SELECT customer_id, email,
CASE WHEN email IS NULL THEN 'NULL' ELSE email END AS email_display
FROM customer WHERE customer_id = 5;
-- Returns: 5, '', '' ← empty string preserved! Different from Oracle behavior.
This demonstrates why a normalizeEmpty rule must also apply to new inserts. The application must be updated to treat empty strings as NULL before inserting.
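If the rule lives in the load pipeline or application layer rather than in SQL, it amounts to a tiny mapping function. A minimal Python sketch (function names are illustrative, not from any particular tool):

```python
def normalize_empty(value):
    # Oracle stores '' as NULL, so map the empty string to None before
    # writing to PostgreSQL. A single space is a real one-character value
    # in Oracle and must be kept as-is.
    return None if value == "" else value

def normalize_row(row):
    # Apply the rule to every string column in a row dict.
    return {k: normalize_empty(v) if isinstance(v, str) else v
            for k, v in row.items()}
```

Applying `normalize_row` on every insert path, not just during migration, is what keeps row 5 ('Eve') consistent with the migrated data.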
Lab 2: Sequence Advancement After Full Load
Objective: Observe the duplicate key failure from sequence drift and verify the sequence advancement fix.
Setup
-- Oracle
CREATE SEQUENCE customer_seq START WITH 1 INCREMENT BY 1 NOCACHE;
CREATE TABLE customer (
customer_id NUMBER DEFAULT customer_seq.NEXTVAL PRIMARY KEY,
name VARCHAR2(100)
);
-- Simulate existing data: 1000 customers
BEGIN
FOR i IN 1..1000 LOOP
INSERT INTO customer (name) VALUES ('Customer ' || i);
END LOOP;
COMMIT;
END;
/
Full Load Simulation
-- PostgreSQL: create table and sequence (starting at 1 — the mistake)
CREATE SEQUENCE customer_seq START WITH 1;
CREATE TABLE customer (
customer_id INTEGER PRIMARY KEY,
name TEXT
);
-- Simulate the full load: copy all 1000 rows with their Oracle IDs.
-- (A real load would SELECT customer_id, name FROM customer on Oracle;
-- here we generate identical rows directly in PostgreSQL.)
INSERT INTO customer
SELECT g, 'Customer ' || g
FROM generate_series(1, 1000) AS g;
Demonstrate the Failure
-- PostgreSQL: try to insert a new customer using the sequence
INSERT INTO customer (customer_id, name)
VALUES (nextval('customer_seq'), 'New Customer');
-- ERROR: duplicate key value violates unique constraint "customer_pkey"
-- DETAIL: Key (customer_id)=(1) already exists.
The sequence was never advanced, so its first nextval() returns 1, and customer_id=1 already exists. Failure.
Apply the Fix
-- Check current max
SELECT MAX(customer_id) FROM customer;
-- Returns: 1000
-- Advance the sequence past the max + safety margin
SELECT setval('customer_seq', 11000); -- 1000 + 10000 safety margin
-- Try insert again
INSERT INTO customer (customer_id, name)
VALUES (nextval('customer_seq'), 'New Customer');
-- Succeeds: customer_id = 11001
SELECT * FROM customer WHERE customer_id > 10999;
-- Returns: 11001, New Customer
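The fix generalizes to any table/sequence pair: the setval() target is just the migrated maximum plus a margin for rows the source keeps generating before cutover. A small Python sketch of the arithmetic (names are illustrative):

```python
def sequence_restart_value(current_max, safety_margin=10_000):
    # Value to pass to setval() after a full load: the highest migrated
    # key plus a safety margin. An empty table (max is None) starts the
    # sequence at just the margin.
    return (current_max or 0) + safety_margin
```

With `current_max=1000` and the default margin this yields 11000, matching the setval() call above; the next nextval() then returns 11001.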
Lab 3: CHAR Padding and RTRIM
Objective: Observe CHAR padding behavior and verify RTRIM normalization.
Setup
-- Oracle
CREATE TABLE product_code (
code CHAR(10) PRIMARY KEY,
product VARCHAR2(100)
);
INSERT INTO product_code VALUES ('ABC', 'Alpha Product');
INSERT INTO product_code VALUES ('XYZ', 'Beta Product');
INSERT INTO product_code VALUES ('A1B2', 'Gamma Product');
COMMIT;
Observe Oracle CHAR Behavior
-- Oracle: check actual stored length
SELECT code, LENGTH(code) AS len, DUMP(code) AS bytes
FROM product_code;
Expected:
ABC  | 10 | Typ=96 Len=10: 65,66,67,32,32,32,32,32,32,32
XYZ  | 10 | Typ=96 Len=10: 88,89,90,32,32,32,32,32,32,32
A1B2 | 10 | Typ=96 Len=10: 65,49,66,50,32,32,32,32,32,32
All values are padded to length 10.
PostgreSQL Migration (Without RTRIM — Incorrect)
-- PostgreSQL: store as TEXT
CREATE TABLE product_code (
code TEXT PRIMARY KEY,
product TEXT
);
-- Load without normalization
INSERT INTO product_code VALUES ('ABC       ', 'Alpha Product');
INSERT INTO product_code VALUES ('XYZ       ', 'Beta Product');
INSERT INTO product_code VALUES ('A1B2      ', 'Gamma Product');
-- Problem: application queries with unpadded code
SELECT * FROM product_code WHERE code = 'ABC';
-- Returns: 0 rows (TEXT comparison is exact: 'ABC' <> 'ABC       ')
PostgreSQL Migration (With RTRIM — Correct)
-- Truncate and reload with RTRIM
TRUNCATE product_code;
INSERT INTO product_code VALUES (RTRIM('ABC       '), 'Alpha Product');
INSERT INTO product_code VALUES (RTRIM('XYZ       '), 'Beta Product');
INSERT INTO product_code VALUES (RTRIM('A1B2      '), 'Gamma Product');
-- Now the query works
SELECT * FROM product_code WHERE code = 'ABC';
-- Returns: ABC, Alpha Product ✓
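The trailing-space rule is easy to get wrong when implemented in pipeline code. A minimal Python sketch of the normalization (illustrative, not from any particular library):

```python
def rtrim_char(value):
    # Strip only trailing spaces, the 0x20 bytes Oracle CHAR pads with.
    # Leading and embedded spaces are real data and must survive;
    # NULLs pass through untouched.
    return value.rstrip(" ") if value is not None else None
```

Using `rstrip(" ")` rather than a bare `strip()` matters: `' AB '` must become `' AB'`, not `'AB'`.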
Lab 4: Full Load Checkpoint and Restart
Objective: Simulate a full load with checkpointing, introduce a failure, and verify restart from checkpoint.
Setup
-- PostgreSQL: create the target table and staging ledger
CREATE TABLE transaction (
transaction_id BIGINT PRIMARY KEY,
account_id BIGINT,
amount NUMERIC(12,2),
created_at TIMESTAMP WITH TIME ZONE
);
CREATE TABLE _pulsaride_full_load_status (
table_name TEXT NOT NULL,
chunk_id BIGINT NOT NULL,
min_key BIGINT,
max_key BIGINT,
rows_loaded BIGINT NOT NULL DEFAULT 0,
rows_expected BIGINT NOT NULL DEFAULT 0,
status TEXT NOT NULL DEFAULT 'PENDING',
completed_at TIMESTAMP WITH TIME ZONE,
PRIMARY KEY (table_name, chunk_id)
);
-- Plan 5 chunks of 1000 rows each (5000 total rows)
INSERT INTO _pulsaride_full_load_status
(table_name, chunk_id, min_key, max_key, rows_expected, status)
VALUES
('transaction', 1, 1, 1000, 1000, 'PENDING'),
('transaction', 2, 1001, 2000, 1000, 'PENDING'),
('transaction', 3, 2001, 3000, 1000, 'PENDING'),
('transaction', 4, 3001, 4000, 1000, 'PENDING'),
('transaction', 5, 4001, 5000, 1000, 'PENDING');
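The chunk plan above can be generated rather than hand-written. A Python sketch under the same assumptions as this lab (contiguous keys, fixed chunk size; all names are illustrative):

```python
def plan_chunks(min_key, max_key, chunk_size):
    # Split a contiguous key range into fixed-size chunks, mirroring
    # the rows inserted into the full-load status ledger.
    chunks, chunk_id, lo = [], 1, min_key
    while lo <= max_key:
        hi = min(lo + chunk_size - 1, max_key)
        chunks.append({"chunk_id": chunk_id, "min_key": lo, "max_key": hi,
                       "rows_expected": hi - lo + 1, "status": "PENDING"})
        chunk_id += 1
        lo = hi + 1
    return chunks
```

`plan_chunks(1, 5000, 1000)` reproduces the five PENDING rows inserted above, including a correctly sized final chunk when the range does not divide evenly.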
Simulate Chunks 1–2 Completing
-- Load chunks 1 and 2
INSERT INTO transaction
SELECT generate_series(1, 2000), 10001, (random() * 1000)::NUMERIC(12,2), NOW();
-- Mark complete
UPDATE _pulsaride_full_load_status
SET status = 'COMPLETE', rows_loaded = 1000, completed_at = NOW()
WHERE table_name = 'transaction' AND chunk_id IN (1, 2);
Simulate a Partial Write on Chunk 3 (Failure)
-- Load only 600 of 1000 rows (partial write)
INSERT INTO transaction
SELECT generate_series(2001, 2600), 10002, (random() * 1000)::NUMERIC(12,2), NOW();
-- Mark as complete (bug: marked complete before all rows loaded)
UPDATE _pulsaride_full_load_status
SET status = 'COMPLETE', rows_loaded = 600, completed_at = NOW()
WHERE table_name = 'transaction' AND chunk_id = 3;
Restart Detection
-- Detect partial writes at restart
SELECT chunk_id, status, rows_loaded, rows_expected,
rows_loaded - rows_expected AS delta
FROM _pulsaride_full_load_status
WHERE table_name = 'transaction'
AND status = 'COMPLETE'
AND rows_loaded != rows_expected;
-- Returns: chunk_id=3, rows_loaded=600, rows_expected=1000, delta=-400
Recovery
-- Delete partial data from chunk 3
DELETE FROM transaction WHERE transaction_id BETWEEN 2001 AND 3000;
-- Reset chunk 3 to PENDING
UPDATE _pulsaride_full_load_status
SET status = 'PENDING', rows_loaded = 0, completed_at = NULL
WHERE table_name = 'transaction' AND chunk_id = 3;
-- Reload chunk 3 correctly
INSERT INTO transaction
SELECT generate_series(2001, 3000), 10002, (random() * 1000)::NUMERIC(12,2), NOW();
UPDATE _pulsaride_full_load_status
SET status = 'COMPLETE', rows_loaded = 1000, completed_at = NOW()
WHERE table_name = 'transaction' AND chunk_id = 3;
-- Verify: all 3 loaded chunks have correct row counts
SELECT chunk_id, status, rows_loaded, rows_expected FROM _pulsaride_full_load_status
WHERE table_name = 'transaction' AND status = 'COMPLETE';
-- Returns: 3 rows, all with rows_loaded = rows_expected ✓
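The restart decision itself is simple enough to sketch in code: reload any chunk that is not verifiably complete. A Python sketch over the ledger rows (illustrative names, ledger state as it stood at the failure in this lab):

```python
def chunks_to_reload(ledger):
    # A chunk must be wiped and reloaded at restart if it never reached
    # COMPLETE, or if it claims COMPLETE but its loaded count disagrees
    # with the expected count (a partial write marked complete).
    return [c["chunk_id"] for c in ledger
            if c["status"] != "COMPLETE"
            or c["rows_loaded"] != c["rows_expected"]]

# Ledger state at the moment of restart in this lab:
ledger = [
    {"chunk_id": 1, "status": "COMPLETE", "rows_loaded": 1000, "rows_expected": 1000},
    {"chunk_id": 2, "status": "COMPLETE", "rows_loaded": 1000, "rows_expected": 1000},
    {"chunk_id": 3, "status": "COMPLETE", "rows_loaded": 600,  "rows_expected": 1000},
    {"chunk_id": 4, "status": "PENDING",  "rows_loaded": 0,    "rows_expected": 1000},
    {"chunk_id": 5, "status": "PENDING",  "rows_loaded": 0,    "rows_expected": 1000},
]
```

Chunk 3 is flagged despite its COMPLETE status, which is exactly the partial write the SQL detection query above surfaces.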
Lab 5: Count and Hash Reconciliation
Objective: Run manual count and hash reconciliation and observe a discrepancy.
Setup
-- Oracle: source table
CREATE TABLE account (
account_id NUMBER PRIMARY KEY,
user_name VARCHAR2(100),
balance NUMBER(12,2),
status VARCHAR2(20)
);
INSERT INTO account VALUES (1, 'Alice', 1500.00, 'ACTIVE');
INSERT INTO account VALUES (2, 'Bob', 750.00, 'ACTIVE');
INSERT INTO account VALUES (3, 'Charlie', 2250.00, 'INACTIVE');
INSERT INTO account VALUES (4, 'Diana', 0.00, 'ACTIVE');
COMMIT;
-- PostgreSQL: target table with a deliberate discrepancy
CREATE TABLE account (
account_id INTEGER PRIMARY KEY,
user_name TEXT,
balance NUMERIC(12,2),
status TEXT
);
INSERT INTO account VALUES (1, 'Alice', 1499.99, 'ACTIVE'); -- wrong balance!
INSERT INTO account VALUES (2, 'Bob', 750.00, 'ACTIVE');
INSERT INTO account VALUES (3, 'Charlie', 2250.00, 'INACTIVE');
INSERT INTO account VALUES (4, 'Diana', 0.00, 'ACTIVE');
Count Reconciliation
-- Oracle
SELECT COUNT(*) FROM account;
-- Returns: 4
-- PostgreSQL
SELECT COUNT(*) FROM account;
-- Returns: 4
-- Count match: PASS
Count reconciliation does not detect the $0.01 balance error.
Hash Reconciliation
-- Oracle: row-level MD5 hash. (ORA_HASH would never match PostgreSQL's
-- MD5 output; STANDARD_HASH produces a comparable digest.)
SELECT account_id,
LOWER(RAWTOHEX(STANDARD_HASH(
TO_CHAR(account_id) || '|' || user_name || '|' ||
TO_CHAR(balance, 'FM9999999990.00') || '|' || status,
'MD5'))) AS row_hash
FROM account ORDER BY account_id;
-- PostgreSQL: equivalent hash over the same canonical string
SELECT account_id,
MD5(
account_id::TEXT || '|' || user_name || '|' ||
TO_CHAR(balance, 'FM9999999990.00') || '|' || status
) AS row_hash
FROM account ORDER BY account_id;
Compare hash values row by row. account_id=1 will produce different hashes (due to the balance discrepancy). This identifies the exact row with the error.
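The same comparison can be scripted outside either database. A Python sketch using a pipe-delimited canonical form with a fixed two-decimal balance, so that 1500 and 1500.00 hash identically (names are illustrative):

```python
import hashlib

def row_hash(account_id, user_name, balance, status):
    # Canonical per-row MD5: the balance is rendered with exactly two
    # decimals, mirroring the TO_CHAR format masks used in SQL, so
    # numerically equal values always produce the same digest.
    canonical = f"{account_id}|{user_name}|{balance:.2f}|{status}"
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

source = row_hash(1, "Alice", 1500.00, "ACTIVE")   # Oracle row
target = row_hash(1, "Alice", 1499.99, "ACTIVE")   # PostgreSQL row
```

The two digests differ, pinpointing account_id=1 as the discrepant row without transferring the row data itself.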
Verification Query
-- PostgreSQL: verify the discrepancy
SELECT 'Oracle: 1500.00' AS source, 'PostgreSQL: 1499.99' AS target,
1500.00 - 1499.99 AS delta;
-- Returns: delta = 0.01
-- Apply compensating write
UPDATE account SET balance = 1500.00 WHERE account_id = 1;
-- Re-run hash reconciliation: now all hashes match.
Lab 6: CDC Deduplication with UPSERT
Objective: Simulate CDC event deduplication using idempotent UPSERT.
Setup
-- PostgreSQL: target table
CREATE TABLE account (
account_id INTEGER PRIMARY KEY,
user_name TEXT,
balance NUMERIC(12,2),
updated_at TIMESTAMP WITH TIME ZONE
);
INSERT INTO account VALUES (1, 'Alice', 1500.00, '2026-04-15 10:00:00+00');
INSERT INTO account VALUES (2, 'Bob', 750.00, '2026-04-15 09:00:00+00');
Simulate CDC Events (including a re-delivered event)
-- CDC events to apply (in order):
-- Event 1: UPDATE account_id=1, balance=1600.00, ts=10:05
-- Event 2: UPDATE account_id=1, balance=1600.00, ts=10:05 (re-delivered — duplicate)
-- Event 3: UPDATE account_id=1, balance=1700.00, ts=10:10
-- Event 4: UPDATE account_id=2, balance=800.00, ts=09:30
-- Apply Event 1: UPSERT
INSERT INTO account (account_id, user_name, balance, updated_at)
VALUES (1, 'Alice', 1600.00, '2026-04-15 10:05:00+00')
ON CONFLICT (account_id) DO UPDATE SET
balance = EXCLUDED.balance,
updated_at = EXCLUDED.updated_at
WHERE account.updated_at <= EXCLUDED.updated_at;
-- account_id=1: balance=1600.00 ✓
-- Apply Event 2: same event re-delivered
INSERT INTO account (account_id, user_name, balance, updated_at)
VALUES (1, 'Alice', 1600.00, '2026-04-15 10:05:00+00')
ON CONFLICT (account_id) DO UPDATE SET
balance = EXCLUDED.balance,
updated_at = EXCLUDED.updated_at
WHERE account.updated_at <= EXCLUDED.updated_at;
-- Idempotent: equal timestamps pass the WHERE clause, but the values
-- written are identical, so the row's state is unchanged ✓
-- Apply Event 3: newer event
INSERT INTO account (account_id, user_name, balance, updated_at)
VALUES (1, 'Alice', 1700.00, '2026-04-15 10:10:00+00')
ON CONFLICT (account_id) DO UPDATE SET
balance = EXCLUDED.balance,
updated_at = EXCLUDED.updated_at
WHERE account.updated_at <= EXCLUDED.updated_at;
-- account_id=1: balance=1700.00 ✓
-- Simulate: re-deliver Event 1 AFTER Event 3 (out-of-order replay)
INSERT INTO account (account_id, user_name, balance, updated_at)
VALUES (1, 'Alice', 1600.00, '2026-04-15 10:05:00+00')
ON CONFLICT (account_id) DO UPDATE SET
balance = EXCLUDED.balance,
updated_at = EXCLUDED.updated_at
WHERE account.updated_at <= EXCLUDED.updated_at;
-- Older timestamp: WHERE condition false, no update. balance remains 1700.00 ✓
Verification
SELECT account_id, balance, updated_at FROM account;
-- Returns:
-- 1, 1700.00, 2026-04-15 10:10:00+00 ← correct (newer event wins)
-- 2, 750.00, 2026-04-15 09:00:00+00 ← Event 4 not yet applied
Apply Event 4 and verify:
INSERT INTO account (account_id, user_name, balance, updated_at)
VALUES (2, 'Bob', 800.00, '2026-04-15 09:30:00+00')
ON CONFLICT (account_id) DO UPDATE SET
balance = EXCLUDED.balance,
updated_at = EXCLUDED.updated_at
WHERE account.updated_at <= EXCLUDED.updated_at;
SELECT * FROM account;
-- 1, Alice, 1700.00, 10:10
-- 2, Bob, 800.00, 09:30 ✓
The UPSERT pattern correctly handles duplicates, out-of-order events, and re-deliveries without data corruption.
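The last-writer-wins rule is independent of SQL. A Python sketch applying the same logic to this lab's event stream, with the table modeled as a dict keyed by account_id (all names are illustrative):

```python
def apply_event(table, event):
    # Last-writer-wins upsert: apply the event only if the row is absent
    # or its stored timestamp is not newer than the event's. This is the
    # dict analogue of the WHERE clause on the ON CONFLICT update.
    row = table.get(event["account_id"])
    if row is None or row["updated_at"] <= event["updated_at"]:
        table[event["account_id"]] = {
            "balance": event["balance"],
            "updated_at": event["updated_at"],
        }

accounts = {}
events = [
    {"account_id": 1, "balance": 1600.00, "updated_at": "2026-04-15T10:05"},
    {"account_id": 1, "balance": 1600.00, "updated_at": "2026-04-15T10:05"},  # duplicate delivery
    {"account_id": 1, "balance": 1700.00, "updated_at": "2026-04-15T10:10"},
    {"account_id": 1, "balance": 1600.00, "updated_at": "2026-04-15T10:05"},  # out-of-order replay
]
for e in events:
    apply_event(accounts, e)
```

After the loop the balance is 1700.00: the duplicate was a no-op and the stale replay was rejected, matching the SQL trace above. (ISO-8601 timestamp strings compare correctly as plain strings, which keeps the sketch dependency-free.)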