File-Driven Migration Pipeline

A pulsaride-pipeline.yml at the repo root describes the entire migration lifecycle as a declarative DAG — preflight → migrate → validate → cutover — executed with a single pt pipeline run command. The format is inspired by GitLab CI.

The problem it solves

Without a pipeline file, a migration requires 5–6 manual commands run in the right order. The operator tracks state in their head. Failures require manual recovery. There is no single artifact that represents "the migration".

With a pipeline file, the entire migration lifecycle is version-controlled, reproducible, and resumable. One command runs it end-to-end, and one file captures the whole plan for review in a PR.

Pipeline file

# pulsaride-pipeline.yml
version: "1.0"
stages: [preflight, migrate, validate, cutover]

jobs:
  check-schema:
    stage: preflight
    command: pt preflight customers.yaml --target-url $TGT_URL
    on_failure: abort            # stop everything if schema is incompatible

  migrate-customers:
    stage: migrate
    command: pt migrate run customers.yaml

  migrate-orders:
    stage: migrate
    command: pt migrate run orders.yaml
    needs: [migrate-customers]   # ← table-level ordering

  migrate-products:
    stage: migrate
    command: pt migrate run products.yaml

  migrate-order-items:
    stage: migrate
    command: pt migrate run order-items.yaml
    needs: [migrate-orders, migrate-products]

  validate-counts:
    stage: validate
    command: pt validate counts --table customers orders order_items
    needs: [migrate-customers, migrate-orders, migrate-order-items]

  validate-sample:
    stage: validate
    command: pt validate sample --table orders --sample 0.02
    needs: [migrate-orders]

  cutover-gate:
    stage: cutover
    command: pt migrate cutover orders.yaml --verify
    needs: [validate-counts, validate-sample]
    when: manual                 # ← human approval required
    on_failure: abort

Ordering layers

Two ordering concepts exist at different granularities — they are complementary, never mixed:

Keyword      Where           Granularity                        Enforced by
needs:       pipeline file   Table-level (job must complete)    PipelineDagExecutor
depends_on:  transform YAML  Row-level (parent row must exist)  DeferralEngine
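
The distinction can be illustrated with a small sketch. needs: gates whole jobs, while depends_on: defers individual rows whose parent row has not been written yet. The names here (parent_ids, deferred) are illustrative, not DeferralEngine's real API:

```python
# Row-level deferral sketch: an order row is only writable once its parent
# customer row exists in the target. Names are illustrative assumptions.
parent_ids = {1, 2}                                  # customer ids already written
rows = [{"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 3}]          # parent 3 not migrated yet

written = [r for r in rows if r["customer_id"] in parent_ids]
deferred = [r for r in rows if r["customer_id"] not in parent_ids]
print(len(written), len(deferred))                   # 1 1
```

needs: would instead hold back the entire migrate-orders job until migrate-customers completes, regardless of individual rows.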

Job options

Field       Values                         Default     Description
stage       any stage name                 required    Which stage this job belongs to
command     pt ...                         required    The pt CLI command to execute
needs       list of job names              []          Jobs that must succeed before this one starts
when        on_success · always · manual   on_success  When to run; manual pauses until pt pipeline approve
on_failure  continue · abort               continue    abort stops the entire pipeline immediately
retry       integer                        0           Max automatic retries on failure
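
The defaults in the table can be modeled as a small record type. This is a sketch, not Pulsaride's actual internals; JobSpec and its field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of one pipeline job, mirroring the option table above.
@dataclass
class JobSpec:
    name: str
    stage: str                      # required: which stage this job belongs to
    command: str                    # required: the pt CLI command to execute
    needs: List[str] = field(default_factory=list)  # default: []
    when: str = "on_success"        # on_success | always | manual
    on_failure: str = "continue"    # continue | abort
    retry: int = 0                  # max automatic retries on failure

job = JobSpec(name="migrate-orders", stage="migrate",
              command="pt migrate run orders.yaml",
              needs=["migrate-customers"])
print(job.when, job.on_failure, job.retry)  # on_success continue 0
```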

CLI commands

# Generate pipeline file from Oracle schema (FK graph → needs: auto-derived)
pt scaffold --source-url "$SRC_URL" --output pulsaride-pipeline.yml

# Run the full pipeline end-to-end
pt pipeline run pulsaride-pipeline.yml \
  --parallelism 4          # max parallel jobs per stage (default: 4)

# Check current status (ASCII DAG)
pt pipeline status pulsaride-pipeline.yml
#   Stage: preflight
#     ✓  check-schema
#   Stage: migrate
#     ✓  migrate-customers
#     ⏳ migrate-orders  (needs: migrate-customers)
#     ⏸  migrate-order-items  (needs: migrate-orders, migrate-products)
#   Stage: validate
#     ⏸  validate-counts
#   Stage: cutover
#     ⏸  cutover-gate

# Resume after a failure (skips DONE jobs)
pt pipeline resume pulsaride-pipeline.yml \
  --run-id abc123def456 \
  --from migrate           # optional: re-run from this stage
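
pt scaffold derives needs: edges from the source schema's foreign keys: each FK child table depends on its parent being migrated first. Under that assumption, the derivation could look roughly like this (the FK list and job-naming convention are illustrative):

```python
# Sketch: derive job-level needs: edges from a foreign-key graph.
fks = [
    ("orders", "customers"),        # orders.customer_id -> customers.id
    ("order_items", "orders"),
    ("order_items", "products"),
]

def job_name(table: str) -> str:
    # Assumed convention: migrate-<table>, with underscores hyphenated.
    return "migrate-" + table.replace("_", "-")

needs = {}
for child, parent in fks:
    needs.setdefault(job_name(child), []).append(job_name(parent))

print(needs)
# {'migrate-orders': ['migrate-customers'],
#  'migrate-order-items': ['migrate-orders', 'migrate-products']}
```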

Execution model

  1. Stages run sequentially. All jobs in stage N must reach a terminal state before stage N+1 begins.
  2. Jobs within a stage run in parallel as soon as their needs: are satisfied. Parallelism is bounded by --parallelism (default 4).
  3. Failure propagation: if a job fails, all downstream jobs that needs: it are automatically SKIPPED. Jobs with on_failure: abort stop the pipeline immediately.
  4. Manual gates: jobs with when: manual pause and print an approval prompt; the pipeline then waits for pt pipeline approve --job <name>. The approve command ships in v1.7.0; until then, manual jobs are skipped.
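
The rules above can be sketched in a few lines. This is a simplified model, not PipelineDagExecutor itself: parallelism is omitted, jobs are assumed to be listed in dependency order, and all names are illustrative:

```python
# Execution-model sketch: sequential stages, needs-gated jobs, and
# downstream SKIPPED propagation when a needed job fails.
def execute(stages, jobs, run_job):
    """jobs: {name: {"stage": str, "needs": [str], "on_failure": str}}"""
    status = {}
    for stage in stages:                               # 1. stages run sequentially
        for name, spec in jobs.items():
            if spec["stage"] != stage:
                continue
            if any(status.get(n) != "DONE" for n in spec["needs"]):
                status[name] = "SKIPPED"               # 3. failed/skipped need -> skip
                continue
            status[name] = "DONE" if run_job(name) else "FAILED"
            if status[name] == "FAILED" and spec["on_failure"] == "abort":
                return status                          # abort stops the pipeline
    return status

stages = ["migrate", "validate"]
jobs = {
    "migrate-customers": {"stage": "migrate", "needs": [], "on_failure": "continue"},
    "migrate-orders": {"stage": "migrate", "needs": ["migrate-customers"],
                       "on_failure": "continue"},
    "validate-counts": {"stage": "validate", "needs": ["migrate-orders"],
                        "on_failure": "continue"},
}
# Simulate migrate-orders failing: its downstream validate job is SKIPPED.
result = execute(stages, jobs, run_job=lambda name: name != "migrate-orders")
print(result)
# {'migrate-customers': 'DONE', 'migrate-orders': 'FAILED',
#  'validate-counts': 'SKIPPED'}
```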

State persistence and resume

Each pipeline run writes an append-only JSONL file to .pulsaride/pipeline-runs/<run-id>.jsonl. Each line records one job status transition. On resume, completed jobs are skipped — no re-migration of already-written data.

# Pipeline state file (one line per transition)
{"pipeline_run_id":"abc123","job":"migrate-customers","stage":"migrate","status":"DONE","started_at":"2026-04-07T14:00:00Z","finished_at":"2026-04-07T14:12:33Z","exit_code":0,"error":""}
{"pipeline_run_id":"abc123","job":"migrate-orders","stage":"migrate","status":"FAILED","finished_at":"2026-04-07T14:15:01Z","exit_code":1,"error":"exit code 1"}

CI/CD integration

# GitHub Actions — trigger full migration pipeline on push to migration branch
name: Run Migration
on:
  push:
    branches: [migration/oracle-to-pg]

jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # assumes the pt CLI is available on the runner (installed in a prior step)
      - name: Run Pulsaride pipeline
        run: |
          pt pipeline run pulsaride-pipeline.yml \
            --parallelism 8
        env:
          PT_SRC_PASS: ${{ secrets.SRC_DB_PASS }}
          PT_TGT_PASS: ${{ secrets.TGT_DB_PASS }}

Monitoring in pt serve

While the pipeline runs, open the pt serve dashboard in a browser. The Migration Health panel shows the cutover confidence verdict, deferred row counters, and root-cause hints, all refreshing every 10 seconds. The Pipeline Status tab shows the current DAG state.

Guarantees

  • Idempotent resume. A job marked DONE is never re-executed on resume. Safe to interrupt and resume at any point.
  • No double-writes. RunRegistry idempotency guard prevents re-running an already-completed migration job.
  • Backward compatible. Existing pt migrate run workflows are unchanged — the pipeline is an optional orchestration layer on top.
  • Fully auditable. The pipeline file + state JSONL + RunRegistry = complete evidence chain for compliance reviews.
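
The no-double-writes guarantee amounts to a check-before-run guard. A minimal sketch with an in-memory stand-in for the persistent RunRegistry (the real key would include the run id and a config hash; these names are assumptions):

```python
# Idempotency-guard sketch: refuse to re-execute a job already recorded
# as completed. Registry here is a stand-in for Pulsaride's RunRegistry.
class Registry:
    def __init__(self):
        self._done = set()

    def guard(self, job, run_fn):
        if job in self._done:
            return "SKIPPED"            # already completed: never re-execute
        run_fn()
        self._done.add(job)
        return "DONE"

reg = Registry()
calls = []
first = reg.guard("migrate-customers", lambda: calls.append(1))
second = reg.guard("migrate-customers", lambda: calls.append(1))
print(first, second, len(calls))   # DONE SKIPPED 1
```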