Cutover Readiness and the Final Switch
The cutover is the moment the application stops writing to Oracle and starts writing to PostgreSQL. Everything before this moment was preparation. Everything after it is operation on a new system.
The cutover is not an event that happens suddenly. It is the outcome of a readiness process — a series of conditions that must be met, checks that must pass, approvals that must be given, and a procedure that must be executed in a specific order. Done well, a cutover is a 19-minute planned operation during a maintenance window. Done poorly, it is a 4-hour incident with an unclear decision about whether to proceed or roll back.
This chapter covers the complete cutover readiness framework, the go/no-go decision, the cutover procedure, and the rollback decision tree.
19.1 The Readiness Framework
Cutover readiness is the state in which all of the following are true simultaneously:
- CDC lag is below threshold. The migration pipeline is not behind — changes in Oracle are reaching PostgreSQL within the configured lag threshold.
- Reconciliation SLA gates are passing. All blocking-tier tables have clean reconciliation for the required sustained period.
- Zero blocking discrepancies. No RED tables in the reconciliation dashboard.
- Schema parity confirmed. The target schema matches the source schema exactly — all DDL has been applied, all schema changes from Oracle during parallel running have been reflected.
- Sequence values advanced. All sequence-backed primary keys have been advanced past the maximum current Oracle value plus the safety margin.
- Application team confirmation. The application team has tested the PostgreSQL target and confirmed that the application functions correctly.
- Rollback pipeline ready. The reverse replication pipeline is configured, tested, and ready to start at cutover.
- Maintenance window confirmed. The cutover window, notification of all affected parties, and blackout approvals are in place.
The pipeline enforces items 1–5 as automated gates. Items 6–8 are process gates that require human confirmation before the cutover command is available.
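To make the automated side concrete, here is a minimal sketch of how a pipeline might evaluate items 1 through 5, assuming a hypothetical metrics client. The method names, gate names, and thresholds are illustrative assumptions, not Pulsaride Transform's actual interface.

from dataclasses import dataclass

@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str

def check_automated_gates(m) -> list[GateResult]:
    """Evaluate readiness items 1-5; any failed gate blocks the cutover command.
    'm' is a hypothetical metrics client; every method name here is assumed."""
    lag = m.max_cdc_lag_seconds()               # item 1: CDC lag
    clean = m.clean_hours("financial_tables")   # item 2: sustained-clean SLA
    red = m.red_table_count()                   # item 3: blocking discrepancies
    drift = m.open_findings("SCHEMA_DRIFT")     # item 4: schema parity
    done, total = m.sequences_advanced()        # item 5: sequence advancement
    return [
        GateResult("cdc_lag", lag < 30, f"max lag {lag}s"),
        GateResult("sla_gate", clean >= 24, f"{clean}h clean"),
        GateResult("red_tables", red == 0, f"{red} RED tables"),
        GateResult("schema_parity", drift == 0, f"{drift} open findings"),
        GateResult("sequences", done == total, f"{done}/{total} advanced"),
    ]

The cutover command described in 19.2 is, in effect, this function plus a rule that the failure list must be empty before anything destructive happens.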
The Preparation That Makes 19 Minutes Possible
The cutover procedure is 19 minutes long. The preparation for it is 18 days. Understanding that relationship is the single most important thing this chapter communicates.
MIGRATION READINESS → CUTOVER TIMELINE
T-18 days Pipeline starts. Full load complete. CDC capturing live changes.
T-7 days First reconciliation SLA gate: all blocking-tier tables GREEN.
T-3 days Zero RED tables for 72 consecutive hours (sustained clean).
T-1 day Go/no-go checklist reviewed. Rollback pipeline tested.
T-2 hours Automated gates re-checked. All 6 confirmed GREEN.
T+00 Maintenance window opens. Oracle writes paused.
T+03 CDC drained. Lag = 0 on all 340 tables.
T+08–18 Final reconciliation pass (340 tables, hash on financial tier).
T+19 Application traffic switched to PostgreSQL.
T+4h Rollback window closes. Rollback pipeline stood down. Migration confirmed complete.
Teams that experience 4-hour cutovers are compressing this preparation. They are trying to do at T+00 what should have happened during the 7 days prior. The 19 minutes at the end are not impressive execution — they are the consequence of not opening the cutover window until the pipeline had proven it was ready.
19.2 The Go/No-Go Checklist
The formal go/no-go checklist is reviewed 2 hours before the cutover window:
GO / NO-GO CHECKLIST
Cutover window: 2026-04-19 02:00–06:00 UTC
AUTOMATED GATES (checked by pipeline at T-15min)
[ ] CDC lag < 30 seconds for all tables Last check: ✓ 18s
[ ] Reconciliation: 0 RED tables Last check: ✓ 0
[ ] Reconciliation: financial_tables SLA gate passing (24h) Last check: ✓ 28h clean
[ ] Schema parity: 0 open SCHEMA_DRIFT findings Last check: ✓ 0
[ ] Sequence advancement applied: all sequences Last check: ✓ 5/5
[ ] Zero ORPHANED_EVENT findings Last check: ✓ 0
PROCESS GATES (human confirmation required)
[ ] Application testing: functional test suite passed on PostgreSQL
Confirmed by: [application team lead]
[ ] Performance testing: load test run on PostgreSQL, p99 latency acceptable
Confirmed by: [performance engineer]
[ ] Rollback pipeline: tested with 10-minute reverse replication dry run
Confirmed by: [migration lead]
[ ] Notification: all stakeholders notified of window
Confirmed by: [project manager]
[ ] DBA approval: Oracle DBA confirms source is healthy, no pending issues
Confirmed by: [Oracle DBA]
[ ] Business approval: migration sponsor gives final sign-off
Confirmed by: [CTO / Engineering Director]
GO/NO-GO DECISION:
[ ] ALL gates pass → GO
[ ] Any gate fail → NO-GO (document reason, reschedule)
Decision made at: ___________ By: ___________
Product note: The Pulsaride Transform cutover command checks all automated gates before executing. If any automated gate fails at the time of the cutover command, the command is rejected with a specific error message. The process gates are recorded via a confirmation API that the team uses to document approvals.
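The confirmation API is described here only by its purpose, so the following sketch is purely illustrative: a hypothetical HTTP call that records one process-gate approval. The endpoint path, payload fields, and authentication scheme are all assumptions.

import requests  # third-party HTTP client (pip install requests)

def confirm_process_gate(base_url: str, gate: str, approver: str, token: str) -> dict:
    """Record a human approval for one process gate (hypothetical endpoint)."""
    resp = requests.post(
        f"{base_url}/api/cutover/confirmations",  # assumed path, not the real API
        json={"gate": gate, "confirmed_by": approver},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

The shape matters more than the call: each process gate becomes a recorded artifact with an approver's name attached, which is exactly what the T-2h review walks through.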
19.3 The Cutover Procedure
The cutover procedure is a timed sequence of steps. Each step has a defined duration, a success criterion, and a decision point for whether to proceed or roll back.
CUTOVER PROCEDURE — T=0: 02:00 UTC
T+00: 02:00 — Maintenance window opens
- Announce on operations Slack channel: "Cutover window open"
- Begin recording (screen capture for incident review if needed)
T+01: 02:01 — Stop application writes to Oracle
- Change load balancer rule: route 0% of write traffic to Oracle app tier
- Or: put Oracle into read-only mode (ALTER DATABASE OPEN READ ONLY)
- Or: application-level write lock (depends on application architecture)
- VERIFY: Oracle write count drops to 0 in metrics dashboard
- MAXIMUM DURATION: 2 minutes. If writes don't stop: ABORT cutover.
T+03: 02:03 — Wait for CDC to drain
- Monitor CDC lag metric: wait for lag to reach 0 (all in-flight events applied)
- VERIFY: all tables show lag = 0, last applied SCN = current Oracle SCN
- MAXIMUM DURATION: 5 minutes. If lag does not reach 0: ABORT cutover.
T+08: 02:08 — Final reconciliation pass
- Run count reconciliation on all 340 tables
- Run hash reconciliation on blocking-tier tables (financial, customer, account)
- VERIFY: 0 RED tables, 0 unexpected discrepancies
- MAXIMUM DURATION: 10 minutes. If reconciliation fails: EVALUATE (see 19.5).
T+18: 02:18 — Advance sequences final time
- Query Oracle for final sequence values (will not change again — Oracle is read-only)
- Set PostgreSQL sequences to Oracle values + 1
- VERIFY: all sequences confirmed advanced
T+20: 02:20 — Start reverse replication
- Start the PostgreSQL → Oracle reverse replication pipeline
- VERIFY: first reverse event received within 60 seconds
- This establishes the rollback capability for the rollback window
T+21: 02:21 — Switch application to PostgreSQL
- Change application database connection string from Oracle to PostgreSQL
- Or: update the load balancer backend from Oracle app tier to PostgreSQL app tier
- VERIFY: first application write to PostgreSQL succeeds
- VERIFY: PostgreSQL write metrics show incoming traffic
T+22: 02:22 — Smoke test
- Run automated smoke test suite against the live PostgreSQL target
- Tests: login, core business operation, data read, data write
- MAXIMUM DURATION: 3 minutes. If smoke tests fail: ROLLBACK.
T+25: 02:25 — Cutover confirmed
- Announce on Slack: "Cutover complete. PostgreSQL is live."
- Begin monitoring period: 30 minutes of heightened alert sensitivity
- Rollback window: OPEN (will close at T+4h = 06:00 UTC)
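Two of the steps above reduce to a few lines of SQL, and the T+18 sequence advancement is worth spelling out because the off-by-one is easy to get wrong. A minimal sketch, assuming already-open python-oracledb and psycopg2 cursors; the sequence names are illustrative.

# Final sequence advancement (T+18). Oracle is read-only at this point, so
# USER_SEQUENCES.LAST_NUMBER can no longer move and is a safe upper bound
# (it includes any cache headroom).
SEQUENCES = ["orders_seq", "payments_seq"]  # illustrative; the real list has 5

def advance_sequences(ora_cur, pg_cur):
    for seq in SEQUENCES:
        ora_cur.execute(
            "SELECT last_number FROM user_sequences WHERE sequence_name = :n",
            n=seq.upper(),
        )
        last_number = int(ora_cur.fetchone()[0])
        # After this, the next nextval() on PostgreSQL returns last_number + 1.
        pg_cur.execute("SELECT setval(%s, %s + 1, false)", (seq, last_number))
        # VERIFY by reading the value back rather than trusting the write.
        pg_cur.execute(
            "SELECT last_value FROM pg_sequences WHERE sequencename = %s", (seq,)
        )
        assert pg_cur.fetchone()[0] == last_number + 1, f"{seq} not advanced"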
19.4 The Lag Threshold Decision
The drain phase (T+03) contains the most time-sensitive decision in the cutover. If CDC lag has not reached zero by the end of the drain window:
Cause 1: A large transaction committed in Oracle just before the write stop. Resolution: Wait longer (extend drain window by 2 minutes). Re-evaluate.
Cause 2: A CDC gap or processing error. Resolution: Check the event stream for CDC errors. If the remaining events are stuck, ABORT and investigate.
Cause 3: The lag was higher than expected when the write stop occurred. Resolution: If lag is decreasing at a predictable rate, calculate whether it will reach zero within the extended drain window. If yes, wait. If no, ABORT.
The decision rule: if lag is not zero at T+08 (5-minute drain limit) and is not decreasing, ABORT the cutover.
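That rule is mechanical enough to sketch as a drain loop with a single extension, matching Cause 1 and Cause 3 above. The lag source is a callable supplied by whatever monitoring stack is in place; the polling interval and names are illustrative.

import time

DRAIN_LIMIT_S = 5 * 60   # the 5-minute drain window (T+03 to T+08)
EXTENSION_S = 2 * 60     # one 2-minute extension, allowed only if lag is falling

def wait_for_drain(current_lag_s, poll_s=5) -> bool:
    """Return True once lag reaches 0; False means ABORT the cutover."""
    deadline = time.monotonic() + DRAIN_LIMIT_S
    extended = False
    prev = float("inf")
    while True:
        lag = current_lag_s()
        if lag == 0:
            return True
        if time.monotonic() >= deadline:
            # The decision rule: extend once if lag is clearly decreasing,
            # otherwise abort rather than burn the maintenance window.
            if not extended and lag < prev:
                deadline += EXTENSION_S
                extended = True
            else:
                return False
        prev = lag
        time.sleep(poll_s)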
19.5 The Rollback Decision Tree
The rollback decision arises when something goes wrong after T+21 (application switched to PostgreSQL). The decision tree:
Is the application producing errors?
  YES → Is the error rate > configured threshold (e.g., 5% of requests)?
    YES → Are the errors in PostgreSQL writes?
      YES → Is it recoverable without rollback (e.g., a missing index, a query plan issue)?
        YES → Fix it without rollback. Continue.
        NO → Is the rollback window still open?
          YES → INITIATE ROLLBACK (see 19.6)
          NO → Escalate. Rollback is now a business decision.
      NO → Investigate non-write errors. May not require rollback.
    NO → Continue. Low error rate is expected during warm-up.
  NO → Continue monitoring.

Is data appearing corrupted?
  YES → STOP all writes immediately.
    Run emergency reconciliation.
    If confirmed corrupted: INITIATE ROLLBACK.
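Written as code, the tree flattens into an ordered series of guards, which is also a useful way to review it for gaps. This is a sketch; real incidents rarely reduce to clean booleans, and the threshold is whatever the team configured.

def rollback_decision(error_rate, threshold, errors_in_pg_writes,
                      recoverable_in_place, rollback_window_open,
                      data_corrupted):
    """Map the observed state to the action named in the decision tree."""
    if data_corrupted:
        return "STOP WRITES; EMERGENCY RECONCILIATION; ROLLBACK IF CONFIRMED"
    if error_rate == 0:
        return "CONTINUE MONITORING"
    if error_rate <= threshold:
        return "CONTINUE (warm-up noise is expected)"
    if not errors_in_pg_writes:
        return "INVESTIGATE (may not require rollback)"
    if recoverable_in_place:
        return "FIX WITHOUT ROLLBACK; CONTINUE"
    if rollback_window_open:
        return "INITIATE ROLLBACK (19.6)"
    return "ESCALATE (rollback is now a business decision)"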
19.6 The Rollback Procedure
Rollback is the controlled reversal of the cutover. It returns Oracle to the primary role and PostgreSQL to standby.
ROLLBACK PROCEDURE (if initiated during rollback window)
T+00: DECISION MADE — Initiate rollback
- Announce: "Rolling back. Oracle will resume primary."
T+01: Stop application writes to PostgreSQL
- Same mechanism as T+01 in cutover, reversed
- VERIFY: PostgreSQL write count drops to 0
T+02: Apply pending reverse replication events to Oracle
- The reverse replication pipeline has been capturing PostgreSQL writes
- Apply all pending events to Oracle
- VERIFY: Oracle has all post-cutover writes
T+05: Switch application back to Oracle
- Reverse the connection string / load balancer change
- VERIFY: application traffic flows to Oracle
T+06: Smoke test on Oracle
- Run smoke tests against Oracle
- VERIFY: application functions correctly
T+08: Rollback confirmed
- Announce: "Rollback complete. Oracle is primary."
- Stop reverse replication (no longer needed)
- Schedule migration post-mortem: what went wrong?
- Do NOT re-attempt cutover within the same 24-hour window
Product note: The rollback procedure requires that the reverse replication pipeline (started at T+20) was successfully recording post-cutover PostgreSQL writes. If the reverse replication pipeline was not started or had not yet confirmed its first event, rollback is still possible but data written to PostgreSQL after cutover (before the rollback) may need to be manually reconciled back to Oracle.
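One practical consequence: the "Rollback window: OPEN" announcement at T+25 should be gated on that first-event confirmation. A minimal sketch, assuming a callable that reports whether the reverse pipeline has captured at least one PostgreSQL event; the names and timeout are illustrative.

import time

def reverse_pipeline_confirmed(first_event_seen, timeout_s=60) -> bool:
    """Hold the rollback-window-open announcement until the T+20 VERIFY passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if first_event_seen():  # hypothetical health check on the reverse pipeline
            return True
        time.sleep(2)
    # Rollback remains possible without this, but any post-cutover writes
    # would need manual reconciliation back to Oracle, per the note above.
    return False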
19.7 The Maintenance Window: Planning for Timing
The maintenance window must be sized correctly. A 2-hour window for a cutover that takes 25 minutes sounds comfortable, but:
- The smoke test might take longer than expected (5–10 minutes instead of 3)
- The lag drain might spike (a batch job ran at T-15 minutes)
- The final reconciliation might find a discrepancy that needs evaluation (10–20 minutes)
A tight 30-minute window for a complex migration is insufficient. A realistic minimum is 2 hours for the cutover procedure plus rollback buffer. For migrations with > 500 tables or financial systems with zero-tolerance reconciliation, a 4-hour window is appropriate.
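The arithmetic behind that minimum is worth making explicit. The figures below are the per-step maxima from 19.3 plus the buffers named above; the grouping into a worst case is an illustration, not a guarantee.

step_maxima_min = {
    "stop writes": 2,
    "CDC drain, with one 2-minute extension": 7,
    "final reconciliation": 10,
    "sequences, reverse replication, switch": 4,
    "smoke test": 3,
}
worst_case_cutover = sum(step_maxima_min.values())   # 26 minutes
rollback_procedure = 10                              # 19.6 takes ~8, with margin
discrepancy_evaluation = 20                          # upper bound from the list above
total = worst_case_cutover + rollback_procedure + discrepancy_evaluation
print(f"worst case: {total} minutes")                # 56 minutes
# A 2-hour window roughly doubles this worst case. A 30-minute window leaves
# 4 minutes of slack before a rollback is even possible.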
Day and time selection: Cutover during the lowest-traffic period for the business. For a B2B application with US business hours: Saturday 2–6 AM Eastern is typical. For a global e-commerce application: evaluate traffic patterns region by region — there may be no low-traffic period, requiring application-level traffic shaping.
19.8 Example: The Successful 19-Minute Cutover
A financial services application migrating 340 tables, 2 billion rows:
- Maintenance window: 2026-04-19 02:00–06:00 UTC (Saturday)
- All go/no-go gates passed at T-2h
- CDC lag at T-15min: 11 seconds
02:00: Maintenance window open
02:01: Oracle writes stopped. Metrics confirm: 0 writes/second to Oracle within 45s.
02:02: CDC drain in progress. Lag: 11s → 7s → 4s → 2s → 1s → 0s (at 02:04)
02:04: Lag = 0. All 340 tables confirmed. CDC drain complete.
02:04: Final reconciliation started.
02:16: Reconciliation complete. 340 GREEN, 0 YELLOW, 0 RED.
02:16: Sequence advancement executed. 5 sequences advanced.
02:17: Reverse replication started. First event confirmed within 30s.
02:18: Application switched to PostgreSQL (load balancer update).
First PostgreSQL write confirmed at 02:18:14.
02:18: Smoke tests started.
02:19: Smoke tests PASSED (7/7 tests in 48 seconds).
02:19: Cutover confirmed. PostgreSQL is live.
Duration: 19 minutes.
Rollback window: open until 06:19 UTC.
The team monitors the application for 90 minutes. Error rates are within normal bounds. The rollback window closes without incident. The Oracle instance moves to read-only standby for 30 days (as per the decommission plan), then is decommissioned.
The 19-minute cutover was the outcome of 18 days of pipeline running, 72 hours of parallel running, 7 days of reconciliation passing continuously, and a go/no-go process that confirmed all 12 checklist items. The 19 minutes at the end were the visible tip of the preparation beneath.
The Contrast: The Same Migration Without the Readiness Framework
The same team had attempted this migration 11 months earlier using a hand-assembled approach: AWS DMS for CDC, a custom Python reconciliation script run once the night before cutover, and a manually maintained go/no-go checklist.
The previous attempt:
- Reconciliation ran once, at T-16 hours. It passed — counts matched on all tables.
- No hash comparison on financial tables. No SLA gate requiring sustained clean state.
- CDC lag was 45 seconds at the start of the cutover window. The team judged it acceptable.
- At T+1h40: the application team flagged wrong account balances for approximately 2% of records.
- At T+2h20: investigation revealed 18,000 rows that had been written to the staging table during the full-load overlap window but never promoted to the target table. Count reconciliation had not flagged them because the staging rows were included in the counts, so the totals matched while the target was short.
- At T+2h40: decision made to roll back. Oracle restored as primary.
- Re-migration required 3 additional weeks to design and run continuous reconciliation correctly.
What changed in the second attempt: continuous reconciliation ran throughout parallel running. The same staging backlog would have surfaced as ORPHANED_EVENT findings within 6 hours of occurring, the SLA gate requiring 7 days of zero RED tables would have failed, and the cutover window would never have opened. The problem would have been resolved during parallel running, not discovered at T+2h after go-live.
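For completeness, here is a sketch of the check that was missing in the first attempt: rows sitting in a staging table past a promotion deadline get raised as a finding instead of being silently folded into the counts. The table and column names are illustrative assumptions.

from datetime import datetime, timedelta, timezone

PROMOTION_DEADLINE = timedelta(hours=6)

def find_staging_backlog(pg_cur, staging_table="account_balances_staging"):
    """Flag rows loaded into staging but never promoted to the target table."""
    cutoff = datetime.now(timezone.utc) - PROMOTION_DEADLINE
    # staging_table comes from the migration config, never from user input
    pg_cur.execute(
        f"SELECT count(*) FROM {staging_table} "
        "WHERE loaded_at < %s AND promoted_at IS NULL",
        (cutoff,),
    )
    (backlog,) = pg_cur.fetchone()
    if backlog:
        # An ORPHANED_EVENT-style finding turns the table RED, which holds
        # the SLA gate and keeps the cutover window from opening.
        return {"finding": "ORPHANED_EVENT", "table": staging_table, "rows": backlog}
    return None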
The 19-minute cutover is not a product claim. It is a reproducible outcome of the readiness framework this chapter describes.