File-Driven Migration Pipeline
A `pulsaride-pipeline.yml` at the repo root describes the entire migration lifecycle as a declarative DAG — preflight → migrate → validate → cutover — executed with a single `pt pipeline run` command. Inspired by GitLab CI.
The problem it solves
Without a pipeline file, a migration requires 5–6 manual commands run in the right order. The operator tracks state in their head. Failures require manual recovery. There is no single artifact that represents "the migration".
With a pipeline file, the entire migration lifecycle is version-controlled, reproducible, and resumable. One command runs it end-to-end; one file is reviewed in a PR.
Pipeline file
```yaml
# pulsaride-pipeline.yml
version: "1.0"
stages: [preflight, migrate, validate, cutover]

jobs:
  check-schema:
    stage: preflight
    command: pt preflight customers.yaml --target-url $TGT_URL
    on_failure: abort # stop everything if schema is incompatible

  migrate-customers:
    stage: migrate
    command: pt migrate run customers.yaml

  migrate-orders:
    stage: migrate
    command: pt migrate run orders.yaml
    needs: [migrate-customers] # ← table-level ordering

  migrate-order-items:
    stage: migrate
    command: pt migrate run order-items.yaml
    needs: [migrate-orders, migrate-products]

  validate-counts:
    stage: validate
    command: pt validate counts --table customers orders order_items
    needs: [migrate-customers, migrate-orders, migrate-order-items]

  validate-sample:
    stage: validate
    command: pt validate sample --table orders --sample 0.02
    needs: [migrate-orders]

  cutover-gate:
    stage: cutover
    command: pt migrate cutover orders.yaml --verify
    needs: [validate-counts, validate-sample]
    when: manual # ← human approval required
    on_failure: abort
```

Ordering layers
Two ordering concepts exist at different granularities — they are complementary, never mixed:
| Keyword | Where | Granularity | Enforced by |
|---|---|---|---|
| `needs:` | pipeline file | Table-level (job must complete) | PipelineDagExecutor |
| `depends_on:` | transform YAML | Row-level (parent row must exist) | DeferralEngine |
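To see the two layers side by side: the pipeline job below waits for a whole upstream job, while the transform YAML defers individual rows. The `depends_on:` shape shown here is a hypothetical illustration — the exact transform-YAML fields are not specified in this section.

```yaml
# pulsaride-pipeline.yml — table-level: this job starts only after
# migrate-customers has completed (enforced by PipelineDagExecutor)
migrate-orders:
  stage: migrate
  command: pt migrate run orders.yaml
  needs: [migrate-customers]

# orders.yaml (transform YAML) — row-level: an order row is deferred until
# its parent customer row exists (enforced by DeferralEngine).
# Hypothetical field shape, for illustration only:
depends_on:
  - table: customers
    column: customer_id
```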
Job options
| Field | Values | Default | Description |
|---|---|---|---|
| `stage` | any stage name | required | Which stage this job belongs to |
| `command` | `pt ...` | required | The pt CLI command to execute |
| `needs` | list of job names | `[]` | Jobs that must succeed before this one starts |
| `when` | `on_success` · `always` · `manual` | `on_success` | When to run. `manual` pauses — resume with `pt pipeline approve` |
| `on_failure` | `continue` · `abort` | `continue` | `abort` stops the entire pipeline immediately |
| `retry` | integer | `0` | Max automatic retries on failure |
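A job that combines several of these options might look like the following sketch; every keyword and value comes from the table above, and the specific numbers are illustrative.

```yaml
validate-counts:
  stage: validate
  command: pt validate counts --table customers
  needs: [migrate-customers]   # start only after this job succeeds
  when: on_success             # the default; `manual` would pause for approval
  on_failure: abort            # stop the whole pipeline if counts mismatch
  retry: 2                     # up to 2 automatic retries before failing
```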
CLI commands
```shell
# Generate pipeline file from Oracle schema (FK graph → needs: auto-derived)
pt scaffold --source-url "$SRC_URL" --output pulsaride-pipeline.yml

# Run the full pipeline end-to-end
pt pipeline run pulsaride-pipeline.yml \
  --parallelism 4   # max parallel jobs per stage (default: 4)

# Check current status (ASCII DAG)
pt pipeline status pulsaride-pipeline.yml
# Stage: preflight
#   ✓ check-schema
# Stage: migrate
#   ✓ migrate-customers
#   ⏳ migrate-orders       (needs: migrate-customers)
#   ⏸ migrate-order-items  (needs: migrate-orders, migrate-products)
# Stage: validate
#   ⏸ validate-counts
# Stage: cutover
#   ⏸ cutover-gate

# Resume after a failure (skips DONE jobs)
pt pipeline resume pulsaride-pipeline.yml \
  --run-id abc123def456 \
  --from migrate   # optional: re-run from this stage
```
Execution model
- Stages run sequentially. All jobs in stage N must reach a terminal state before stage N+1 begins.
- Jobs within a stage run in parallel as soon as their `needs:` are satisfied. Parallelism is bounded by `--parallelism` (default 4).
- Failure propagation: if a job fails, all downstream jobs that `needs:` it are automatically SKIPPED. Jobs with `on_failure: abort` stop the pipeline immediately.
- Manual gates: jobs with `when: manual` pause and print an approval prompt. The pipeline waits until `pt pipeline approve --job <name>` is run (coming in v1.7.0). Until then, manual jobs are skipped.
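The failure-propagation rule above — a failed job marks every transitive dependent SKIPPED — can be sketched as a breadth-first walk over the inverted `needs:` graph. This is a minimal illustration, not the actual PipelineDagExecutor; the job names are taken from the example pipeline file.

```python
from collections import deque

def propagate_failure(jobs: dict[str, list[str]], failed: str) -> set[str]:
    """jobs maps job name -> its `needs:` list.
    Returns the set of jobs to mark SKIPPED after `failed` fails."""
    # Invert the edges: for each job, which jobs depend on it
    dependents: dict[str, list[str]] = {name: [] for name in jobs}
    for name, needs in jobs.items():
        for dep in needs:
            dependents[dep].append(name)
    skipped: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in skipped:
                skipped.add(child)
                queue.append(child)
    return skipped

jobs = {
    "migrate-customers": [],
    "migrate-orders": ["migrate-customers"],
    "migrate-order-items": ["migrate-orders"],
    "validate-counts": ["migrate-customers", "migrate-orders", "migrate-order-items"],
}
print(sorted(propagate_failure(jobs, "migrate-orders")))
# → ['migrate-order-items', 'validate-counts']
```

Note that `migrate-customers` is untouched: only jobs downstream of the failure are skipped, which is what makes a later resume cheap.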
State persistence and resume
Each pipeline run writes an append-only JSONL file to .pulsaride/pipeline-runs/<run-id>.jsonl. Each line records one job status transition. On resume, completed jobs are skipped — no re-migration of already-written data.
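Because the state file is append-only, resume logic can replay it and let the last recorded status per job win. A minimal sketch (not the actual resume implementation) of deriving the skip set:

```python
import json

def jobs_to_skip(jsonl_text: str) -> set[str]:
    """Replay a pipeline-run JSONL file; return jobs whose final status is DONE."""
    final_status: dict[str, str] = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        final_status[record["job"]] = record["status"]  # later lines override
    return {job for job, status in final_status.items() if status == "DONE"}

state = (
    '{"pipeline_run_id":"abc123","job":"migrate-customers","status":"DONE"}\n'
    '{"pipeline_run_id":"abc123","job":"migrate-orders","status":"FAILED"}\n'
)
print(jobs_to_skip(state))  # → {'migrate-customers'}
```

"Last status wins" is what makes the append-only format safe: a job that was FAILED and later retried to DONE is skipped on the next resume.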
```jsonl
# Pipeline state file (one line per transition)
{"pipeline_run_id":"abc123","job":"migrate-customers","stage":"migrate","status":"DONE","started_at":"2026-04-07T14:00:00Z","finished_at":"2026-04-07T14:12:33Z","exit_code":0,"error":""}
{"pipeline_run_id":"abc123","job":"migrate-orders","stage":"migrate","status":"FAILED","finished_at":"2026-04-07T14:15:01Z","exit_code":1,"error":"exit code 1"}
```

CI/CD integration
```yaml
# GitHub Actions — trigger full migration pipeline on push to migration branch
name: Run Migration
on:
  push:
    branches: [migration/oracle-to-pg]
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Pulsaride pipeline
        run: |
          pt pipeline run pulsaride-pipeline.yml \
            --parallelism 8
        env:
          PT_SRC_PASS: ${{ secrets.SRC_DB_PASS }}
          PT_TGT_PASS: ${{ secrets.TGT_DB_PASS }}
```

Monitoring in pt serve
While the pipeline runs, open the `pt serve` UI in a browser. The Migration Health panel shows the cutover confidence verdict, deferred row counters, and root-cause hints — all updating every 10 seconds. The pipeline status tab shows the current DAG state.
Guarantees
- Idempotent resume. A job marked DONE is never re-executed on resume. Safe to interrupt and resume at any point.
- No double-writes. The RunRegistry idempotency guard prevents re-running an already-completed migration job.
- Backward compatible. Existing `pt migrate run` workflows are unchanged — the pipeline is an optional orchestration layer on top.
- Fully auditable. The pipeline file + state JSONL + RunRegistry = complete evidence chain for compliance reviews.