Chapter 2: Reading the Source System Before Touching It
Most migration failures begin before the first row is copied.
They begin when a team decides it already understands the source system.
That assumption usually looks harmless. A few schemas are identified. Some tables are classified as "business-critical." A conversion tool is run against DDL. A proof of concept succeeds on a narrow subset. Then the real migration begins and the hidden shape of the source system starts asserting itself:
- a trigger writes into an audit table nobody inventoried
- a sequence is used by three applications, not one
- a package contains implicit key-generation logic
- a child table is hotter than its parent
- a CDC topic is dominated by one table that was classified as secondary
- two foreign keys form a cycle the team only notices during replay
This chapter argues for a simple rule: before designing the target, read the source as if it were an unknown production system, not as if it were a neat schema model.
Pulsaride Transform treats source discovery as a first-class stage of the migration product because almost every later decision depends on it:
- data type mapping
- transform rules
- load order
- CDC strategy
- replay logic
- validation strategy
- cutover readiness
The goal is not to collect metadata for its own sake. The goal is to reduce surprise. Every object you fail to inventory early becomes an outage candidate later.
2.1 Inventorying schemas, tables, views, packages, triggers, jobs, and sequences
The first deliverable of a migration program should not be converted DDL. It should be an inventory that answers three questions:
- What exists?
- What matters operationally?
- What hides behavior rather than just structure?
Start with object families, not tables only
Teams often inventory tables and stop there. That is not enough.
For Oracle -> PostgreSQL migration, the minimum source inventory should cover:
- schemas and owners
- tables and partitions
- views and materialized views
- primary keys, unique constraints, foreign keys, and check constraints
- indexes
- sequences
- triggers
- packages, procedures, and functions
- scheduler jobs
- synonyms
- database links if they influence runtime behavior
If your inventory does not include packages, triggers, jobs, and sequences, it is not a migration inventory. It is only a storage inventory.
Capture both metadata and operational hints
A useful inventory does not stop at object names. For each object, capture attributes that drive migration design:
- object type
- owner
- creation and last DDL time
- row count estimate
- average row length
- presence of LOBs
- partitioning
- dependencies
- whether the object is read-only, write-heavy, or append-only
- whether application logic depends on it synchronously
For example, a table with 50,000 rows may still be high-risk if:
- it sits on a hot transaction path
- it feeds many child tables
- it is populated by triggers
- it is used to allocate keys or statuses
Practical Oracle inventory queries
Start with object counts by schema:
SELECT owner, object_type, COUNT(*) AS object_count
FROM all_objects
WHERE owner IN ('APP_SCHEMA')
GROUP BY owner, object_type
ORDER BY owner, object_type;
Inventory tables with approximate volume:
SELECT owner,
table_name,
num_rows,
blocks,
avg_row_len,
partitioned,
temporary
FROM all_tables
WHERE owner = 'APP_SCHEMA'
ORDER BY num_rows DESC NULLS LAST;
Inventory sequences:
SELECT sequence_owner,
sequence_name,
min_value,
max_value,
increment_by,
cache_size,
last_number
FROM all_sequences
WHERE sequence_owner = 'APP_SCHEMA'
ORDER BY sequence_name;
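One detail worth capturing now: LAST_NUMBER already includes the cached-but-unissued range, so the safe PostgreSQL restart point is LAST_NUMBER plus headroom, not the last value the application saw. A generator sketch run on the Oracle side (the 1000 headroom is an arbitrary illustrative margin):
-- Emits one PostgreSQL setval statement per sequence. LAST_NUMBER covers
-- Oracle's cached range; the headroom guards against values handed out
-- near snapshot time but not yet visible.
SELECT 'SELECT setval(''' || LOWER(sequence_owner) || '.'
       || LOWER(sequence_name) || ''', '
       || (last_number + 1000) || ');' AS pg_restart_stmt
FROM all_sequences
WHERE sequence_owner = 'APP_SCHEMA';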
Inventory triggers:
SELECT owner,
trigger_name,
table_name,
triggering_event,
trigger_type,
status
FROM all_triggers
WHERE owner = 'APP_SCHEMA'
ORDER BY table_name, trigger_name;
Inventory PL/SQL objects:
SELECT owner,
object_name,
object_type,
status
FROM all_objects
WHERE owner = 'APP_SCHEMA'
AND object_type IN ('PACKAGE', 'PACKAGE BODY', 'PROCEDURE', 'FUNCTION')
ORDER BY object_type, object_name;
Inventory scheduler jobs:
SELECT owner,
job_name,
enabled,
state,
job_action,
repeat_interval
FROM all_scheduler_jobs
WHERE owner = 'APP_SCHEMA'
ORDER BY job_name;
What to do with the results
Do not dump this into a spreadsheet and move on. Use it to classify objects into product-relevant buckets:
- automatically convertible
- structurally convertible but behaviorally risky
- requires declarative transform rules
- requires procedural rewrite
- requires manual treatment or phased migration
This is where Pulsaride Transform becomes useful in a repeatable way. The inventory is not just documentation. It is the input to a migration pipeline.
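A minimal first-pass sketch of that classification, assuming only the dictionary queries above (bucket names and heuristics are illustrative, not Pulsaride Transform outputs):
SELECT t.table_name,
       CASE
         -- trigger logic means behavior to review, not just structure
         WHEN trg.table_name IS NOT NULL THEN 'BEHAVIORALLY_RISKY'
         -- LOB columns usually need dedicated transform and load rules
         WHEN lob.table_name IS NOT NULL THEN 'NEEDS_TRANSFORM_RULES'
         ELSE 'AUTO_CONVERTIBLE_CANDIDATE'
       END AS first_pass_bucket
FROM all_tables t
LEFT JOIN (
  SELECT DISTINCT table_owner, table_name
  FROM all_triggers
) trg
  ON trg.table_owner = t.owner
 AND trg.table_name = t.table_name
LEFT JOIN (
  SELECT DISTINCT owner, table_name
  FROM all_tab_columns
  WHERE data_type IN ('BLOB', 'CLOB', 'NCLOB', 'LONG', 'LONG RAW')
) lob
  ON lob.owner = t.owner
 AND lob.table_name = t.table_name
WHERE t.owner = 'APP_SCHEMA'
ORDER BY first_pass_bucket, t.table_name;
A real classifier would weigh many more signals, but even this crude pass turns a flat inventory into a reviewable work queue.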
Why this matters for CDC and reconciliation
The reconciliation-oriented architecture we examined already assumes that some tables will behave differently in migration:
- some have parent-child dependencies
- some depend on reference data already present in target
- some are likely to generate waiting states
- some will dominate change volume
If the source inventory misses those facts, the CDC design is blind from the first day.
2.2 Detecting unsupported Oracle patterns before migration starts
The worst time to discover an unsupported Oracle pattern is during replay, cutover rehearsal, or production switchover.
The second-worst time is after code generation has already baked bad assumptions into the target pipeline.
Unsupported or hard-to-port patterns must be detected before transform rules are finalized.
Unsupported does not always mean impossible
There are three categories:
- Directly portable: straightforward table, index, and constraint patterns with known PostgreSQL equivalents.
- Portable with redesign: objects or behaviors that can move, but not mechanically. Sequences, packages, triggers, and some index types usually land here.
- Operationally unsupported: patterns that should block the migration plan until the team decides on a replacement strategy.
The real discipline is not technical purity. It is explicitness.
Patterns that should trigger early review
At minimum, flag:
- packages with business logic
- triggers that populate columns or cascade writes
- autonomous transactions
- dynamic SQL in PL/SQL
- database jobs that mutate business tables
- materialized views relied on by the application
- function-based indexes tied to Oracle-specific functions
- bitmap indexes on write-heavy tables
- synonyms that hide ownership and coupling
- LOB-heavy tables
- tables without stable primary keys
- surrogate keys populated outside the application
Each of these patterns increases the odds that a schema-only migration story is incomplete.
Example: a harmless trigger that is not harmless
Suppose a table INVOICE looks simple:
- primary key
- a few numeric and date columns
- normal indexes
A schema conversion tool will happily recreate it in PostgreSQL.
But if Oracle has a BEFORE INSERT trigger that:
- reads from a sequence
- writes an audit row
- normalizes a status field
- calls a package function
then the table does not behave like its DDL suggests. If you miss that trigger in assessment, the target system may accept data but still fail business invariants.
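A hypothetical sketch of such a trigger makes the hidden behavior concrete (all names are illustrative; invoice_seq and billing_pkg are assumptions, and the direct NEXTVAL assignment uses 11g+ syntax):
CREATE OR REPLACE TRIGGER trg_invoice_bi
BEFORE INSERT ON invoice
FOR EACH ROW
BEGIN
  -- key allocation the application never sees
  IF :NEW.id IS NULL THEN
    :NEW.id := invoice_seq.NEXTVAL;
  END IF;

  -- normalization the DDL does not reveal
  :NEW.status := UPPER(TRIM(:NEW.status));

  -- side effect into a table that may be missing from the inventory
  INSERT INTO invoice_audit (invoice_id, changed_at)
  VALUES (:NEW.id, SYSDATE);

  -- business logic hidden behind a package call
  :NEW.total_gross := billing_pkg.apply_tax(:NEW.total_net);
END;
/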
Practical detection queries
Find invalid or nontrivial PL/SQL objects:
SELECT owner, object_name, object_type, status
FROM all_objects
WHERE owner = 'APP_SCHEMA'
AND object_type IN ('PACKAGE', 'PACKAGE BODY', 'PROCEDURE', 'FUNCTION', 'TRIGGER')
ORDER BY status, object_type, object_name;
Search for Oracle-specific constructs in source code:
SELECT owner, name, type, line, text
FROM all_source
WHERE owner = 'APP_SCHEMA'
AND (
UPPER(text) LIKE '%PRAGMA AUTONOMOUS_TRANSACTION%'
OR UPPER(text) LIKE '%EXECUTE IMMEDIATE%'
OR UPPER(text) LIKE '%DBMS_JOB%'
OR UPPER(text) LIKE '%DBMS_SCHEDULER%'
OR UPPER(text) LIKE '%ROWNUM%'
OR UPPER(text) LIKE '%CONNECT BY%'
)
ORDER BY name, line;
Find tables without primary keys:
SELECT t.owner, t.table_name
FROM all_tables t
LEFT JOIN all_constraints c
ON c.owner = t.owner
AND c.table_name = t.table_name
AND c.constraint_type = 'P'
WHERE t.owner = 'APP_SCHEMA'
AND c.constraint_name IS NULL
ORDER BY t.table_name;
Product implication
A complete product must surface these findings as migration blockers or design decisions, not as buried assessment notes.
For Pulsaride Transform, this means the assessment stage should produce explicit outcomes such as:
- "table can be auto-mapped"
- "table requires sequence handling"
- "table requires trigger behavior replacement"
- "object excluded from wave 1"
- "requires two-pass migration"
- "requires manual procedural rewrite"
That is how unsupported patterns stop being surprises and become planned work.
2.3 Building a dependency graph from foreign keys and application flows
A source system is not just a set of tables. It is a graph.
This graph matters because migration order, CDC replay order, reconciliation interpretation, and cutover readiness all depend on it.
Foreign keys are necessary, but not sufficient
Most teams start with database foreign keys. That is correct, but incomplete.
The dependency graph should include two layers:
- structural dependencies: derived from primary key / foreign key relationships
- application-flow dependencies: derived from how the application creates and uses records in practice
Why both?
Because real systems often contain:
- missing foreign keys in the physical schema
- optional FKs that are mandatory in business practice
- denormalized tables that depend on upstream events without formal constraints
- reference tables loaded out of band
Why the graph matters to the product
The reconciliation-centered CDC architecture we reviewed already points to this need:
- rows may enter WAITING_DEP when parent entities are not yet available
- rows may enter WAITING_REF when reference data is assumed to exist in target
- replay logic depends on knowing which writes unblock which pending records
Without a dependency graph, these behaviors become ad hoc conditionals. With a dependency graph, they become declarative product behavior.
Building the structural graph
Extract foreign key edges first:
SELECT a.owner,
a.table_name,
a.constraint_name,
a.r_constraint_name,
b.table_name AS referenced_table
FROM all_constraints a
JOIN all_constraints b
ON a.r_constraint_name = b.constraint_name
AND a.r_owner = b.owner
WHERE a.owner = 'APP_SCHEMA'
AND a.constraint_type = 'R'
ORDER BY a.table_name, b.table_name;
Then extract column-level edges:
SELECT acc.owner,
acc.table_name,
acc.column_name,
ac_r.table_name AS referenced_table
FROM all_cons_columns acc
JOIN all_constraints ac
ON ac.owner = acc.owner
AND ac.constraint_name = acc.constraint_name
JOIN all_constraints ac_r
ON ac.r_owner = ac_r.owner
AND ac.r_constraint_name = ac_r.constraint_name
WHERE ac.owner = 'APP_SCHEMA'
AND ac.constraint_type = 'R'
ORDER BY acc.table_name, acc.position;
Adding application-flow edges
This is where assessment must leave the comfort of metadata.
Interrogate application behavior and transaction flow:
- Which tables are created in the same user action?
- Which rows can legitimately arrive before the parent?
- Which lookups are expected to be preloaded?
- Which tables are event sinks rather than masters?
- Which writes are idempotent and which are not?
For example:
- ORDER_LINE depends structurally on ORDER
- ORDER may depend structurally on CUSTOMER
- ORDER may also depend operationally on a reference domain such as ORDER_STATUS
- an audit table may depend on ORDER by application flow even if no FK exists
Turn the graph into product rules
The product should not stop at drawing the graph. It should derive behavior from it:
- load ordering for full load (sketched below)
- dependency rules for CDC resolution
- replay triggers for child rows
- validation grouping by business domain
This is exactly where Pulsaride Transform should remain declarative. The graph should feed the engine, not just the documentation.
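As a sketch of the first item, a full-load order can be derived from FK depth with a recursive query. This assumes a cycle-free graph; the CYCLE guard below only prevents infinite recursion, it does not resolve cycles, which section 2.5 treats as a design decision:
WITH fk_edges (child_table, parent_table) AS (
  SELECT a.table_name,
         pa.table_name
  FROM all_constraints a
  JOIN all_constraints pa
    ON pa.owner = a.r_owner
   AND pa.constraint_name = a.r_constraint_name
  WHERE a.owner = 'APP_SCHEMA'
    AND a.constraint_type = 'R'
    AND a.table_name <> pa.table_name  -- self-references handled separately
),
load_depth (table_name, lvl) AS (
  -- level 0: tables that reference nothing
  SELECT t.table_name, 0
  FROM all_tables t
  WHERE t.owner = 'APP_SCHEMA'
    AND t.table_name NOT IN (SELECT child_table FROM fk_edges)
  UNION ALL
  -- a child loads after the deepest of its parents
  SELECT e.child_table, d.lvl + 1
  FROM load_depth d
  JOIN fk_edges e
    ON e.parent_table = d.table_name
) CYCLE table_name SET looped TO 'Y' DEFAULT 'N'
SELECT table_name,
       MAX(lvl) AS load_level
FROM load_depth
WHERE looped = 'N'
GROUP BY table_name
ORDER BY load_level, table_name;
The same edge set, extended with application-flow edges, is what the CDC resolution and replay rules should consume.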
2.4 Spotting high-risk tables: hot tables, wide rows, LOB-heavy tables, audit tables
Not all tables deserve equal treatment.
One of the fastest ways to make a migration brittle is to assume that every table fits the same extraction, CDC, replay, and validation pattern.
High-risk tables need special handling early.
Hot tables
A hot table is not necessarily the largest table. It is the table that changes often enough to shape CDC pressure, replay debt, and cutover risk.
Typical examples:
- order headers
- order lines
- payment records
- session or state tables
- operational logs tied to business workflows
Hot tables matter because they influence:
- Kafka partition skew
- backlog growth
- replay windows
- lock contention in target
- how meaningful lag metrics are
In the CDC design material we reviewed, one of the explicit operational concerns was hot partitions and the possibility of moving from a single topic to multiple topics when one table dominates the stream. That concern only exists if the assessment phase identifies hot tables early.
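ALL_TABLES does not expose change rates, but Oracle tracks approximate DML counts since the last statistics collection in ALL_TAB_MODIFICATIONS. Treat the numbers as a relative signal rather than a precise rate; counters reset when statistics are gathered, and recent activity may not be flushed yet:
SELECT table_owner,
       table_name,
       inserts,
       updates,
       deletes
FROM all_tab_modifications
WHERE table_owner = 'APP_SCHEMA'
  AND partition_name IS NULL  -- table-level rows only
ORDER BY (inserts + updates + deletes) DESC;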
Wide rows
Wide rows hurt more than storage. They affect:
- extraction bandwidth
- staging storage
- serialization cost
- WAL volume in PostgreSQL
- checksum cost
- replay throughput
An update to one business field can end up rewriting a heavy row image repeatedly if the CDC payload is after-image oriented.
Inventory wide tables with:
SELECT owner,
table_name,
avg_row_len,
num_rows
FROM all_tables
WHERE owner = 'APP_SCHEMA'
ORDER BY avg_row_len DESC NULLS LAST;
LOB-heavy tables
LOB-heavy tables deserve explicit classification because they often break optimistic throughput assumptions.
Watch for:
- BLOB, CLOB, and NCLOB columns
- XML payloads
- document or attachment tables
Find them with:
SELECT owner, table_name, column_name, data_type
FROM all_tab_columns
WHERE owner = 'APP_SCHEMA'
AND data_type IN ('BLOB', 'CLOB', 'NCLOB', 'LONG', 'LONG RAW')
ORDER BY table_name, column_name;
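Presence alone does not show weight. If you have access to DBA_SEGMENTS (USER_SEGMENTS works when connected as the schema owner), a join against ALL_LOBS gives approximate LOB segment sizes:
SELECT l.table_name,
       l.column_name,
       ROUND(s.bytes / 1024 / 1024) AS lob_mb
FROM all_lobs l
JOIN dba_segments s
  ON s.owner = l.owner
 AND s.segment_name = l.segment_name
WHERE l.owner = 'APP_SCHEMA'
ORDER BY s.bytes DESC;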
These tables may require:
- separate load strategy
- lower batch size
- different validation strategy
- cutover isolation
Audit and history tables
Audit tables are deceptively tricky. They are often:
- large
- append-only
- low business value for day-one cutover
- high operational value for compliance and traceability
Treating them like ordinary transactional tables can waste migration effort and distort CDC capacity planning.
A product should be able to classify some tables as:
- migrate in later wave
- backfill only
- summarize rather than replicate
- exclude from near-real-time cutover path
Why high-risk classification should be productized
This is not just a project checklist. It belongs in the product because the same patterns repeat:
- hot tables need a different CDC strategy
- wide tables need different performance defaults
- LOB-heavy tables need different batching
- audit tables need different business priority
If Pulsaride Transform encodes none of this, the same analysis will be redone manually in every program.
2.5 Discovering cycles before runtime: fail-fast dependency validation
Cycles are where elegant migration diagrams go to die.
A team may believe it has a clean parent-child ordering, only to discover at runtime that two tables depend on each other directly or indirectly. At that point, retries do not help. The engine simply oscillates between blocked states.
This is why cycle detection belongs in assessment and startup validation, not in operations.
Why runtime discovery is too late
If the first sign of a cycle is:
- rows piling up in WAITING_DEP
- unexplained cutover backlog
then the product has already failed the operator.
The right behavior is fail-fast:
- build the dependency graph before consumption starts
- detect cycles
- refuse to start until a strategy is chosen
The v2 CDC architecture material already moves in exactly this direction by replacing runtime CYCLE_DETECTED handling with startup validation of dependency rules. That is the right product instinct.
What counts as a cycle
A cycle may be:
- direct: A -> B and B -> A
- indirect: A -> B -> C -> A
- partly physical and partly logical: one edge from an FK, another from application flow
All three matter.
Practical cycle detection query
A quick Oracle query can reveal suspicious mutual foreign keys:
SELECT a.table_name,
       pa.table_name AS referenced_table
FROM all_constraints a
JOIN all_constraints pa
  ON pa.owner = a.r_owner
 AND pa.constraint_name = a.r_constraint_name
JOIN all_constraints b
  ON b.owner = pa.owner
 AND b.table_name = pa.table_name
 AND b.constraint_type = 'R'
JOIN all_constraints pb
  ON pb.owner = b.r_owner
 AND pb.constraint_name = b.r_constraint_name
WHERE a.owner = 'APP_SCHEMA'
  AND a.constraint_type = 'R'
  AND pb.table_name = a.table_name;
This is not a full graph algorithm, but it surfaces obvious mutual dependencies early.
The product should go further and run an actual graph traversal over the complete dependency model, including declarative dependency rules derived from the source schema and business flows.
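A sketch of such a traversal, still in plain Oracle SQL, uses a recursive query with the CYCLE clause (11.2+) to walk FK edges and print any path that revisits a table. Application-flow edges are not in the dictionary, so a real implementation would union them in from the product's own rule store:
WITH fk_edges (child_table, parent_table) AS (
  SELECT a.table_name,
         pa.table_name
  FROM all_constraints a
  JOIN all_constraints pa
    ON pa.owner = a.r_owner
   AND pa.constraint_name = a.r_constraint_name
  WHERE a.owner = 'APP_SCHEMA'
    AND a.constraint_type = 'R'
),
walk (start_table, current_table, path) AS (
  SELECT child_table,
         parent_table,
         CAST(child_table || ' -> ' || parent_table AS VARCHAR2(4000))
  FROM fk_edges
  UNION ALL
  SELECT w.start_table,
         e.parent_table,
         w.path || ' -> ' || e.parent_table
  FROM walk w
  JOIN fk_edges e
    ON e.child_table = w.current_table
) CYCLE current_table SET is_cycle TO 'Y' DEFAULT 'N'
SELECT path
FROM walk
WHERE is_cycle = 'Y';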
What to do when a cycle exists
A cycle is not always a showstopper, but it is always a design decision.
Typical responses are:
- make one side nullable temporarily
- split migration into two passes
- defer one FK population until after initial load
- reclassify one edge as nonblocking for CDC
What matters is that this choice is deliberate and documented before runtime.
Why this belongs in the product
A migration product that cannot validate its dependency model before execution is incomplete.
Pulsaride Transform should treat dependency validation the way a compiler treats syntax errors:
- discover early
- explain precisely
- block unsafe execution
That is cheaper than operational debugging by an order of magnitude.
2.6 Example: a migration blocked by a hidden mutual foreign key
Consider a simplified case with two tables:
- ACCOUNT
- ACCOUNT_PROFILE
At first glance, the migration plan seems obvious:
- load ACCOUNT
- then load ACCOUNT_PROFILE
- replay changes with ACCOUNT as parent
But deeper inspection reveals:
- ACCOUNT_PROFILE.ACCOUNT_ID references ACCOUNT.ID
- ACCOUNT.DEFAULT_PROFILE_ID references ACCOUNT_PROFILE.ID
That is a mutual dependency.
What happens if the team misses it
The full load may appear to work if constraints are relaxed or load order is fudged.
But the CDC engine then starts showing a pathological pattern:
- ACCOUNT_PROFILE changes wait for ACCOUNT
- ACCOUNT changes wait for ACCOUNT_PROFILE
- replay backlog grows
- retries do nothing
- operators see a healthy consumer but an unhealthy target
If the architecture exposes statuses like WAITING_DEP, the symptom becomes visible. But if the cycle is only discovered there, the damage is already operational.
What the product should have done earlier
During assessment:
- detect the FK cycle
- classify the pair as a two-pass migration candidate
- require an explicit strategy before enabling CDC rules
During transform planning:
- decide which edge is temporarily nullable or deferred
- define the second-pass enrichment/backfill logic
During validation:
- add targeted checks for rows missing the deferred relation
A better migration plan
One safe strategy is:
- migrate ACCOUNT without enforcing DEFAULT_PROFILE_ID
- migrate ACCOUNT_PROFILE
- backfill ACCOUNT.DEFAULT_PROFILE_ID
- validate missing links
- only then enforce the final constraint posture
This is not as elegant as pretending the cycle does not exist. It is much more reliable.
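On the PostgreSQL side, the final enforcement step maps naturally onto NOT VALID and VALIDATE CONSTRAINT, so new writes are checked immediately while existing rows are validated only after the backfill (table and constraint names are illustrative):
-- enforce for new writes without checking existing rows yet
ALTER TABLE account
  ADD CONSTRAINT fk_account_default_profile
  FOREIGN KEY (default_profile_id)
  REFERENCES account_profile (id)
  NOT VALID;

-- targeted check for rows still missing the deferred relation
SELECT a.id
FROM account a
LEFT JOIN account_profile p
  ON p.id = a.default_profile_id
WHERE a.default_profile_id IS NOT NULL
  AND p.id IS NULL;

-- once the backfill is clean, validate the whole table
ALTER TABLE account VALIDATE CONSTRAINT fk_account_default_profile;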
The lesson
The point of source-system reading is not to gather trivia. It is to discover design-shaping facts while they are still cheap to act on.
A hidden mutual foreign key is a perfect example. If found during assessment, it becomes:
- a declarative dependency rule
- a two-pass migration plan
- a known validation target
If found during cutover week, it becomes:
- a backlog spike
- a manual workaround
- a trust problem
That difference is exactly why Pulsaride Transform must begin with source understanding rather than target enthusiasm.
Why This Matters in Practice
Source-system reading is where a migration product either earns trust or accumulates hidden debt.
Every later design choice depends on whether the source estate has been understood at the right level: not just as tables and columns, but as a live system with dependencies, hidden behaviors, hot paths, and procedural logic. Poor assessment quickly turns into schedule slippage, unplanned exceptions, and expensive late-stage redesign.
In other words, assessment quality determines whether the product looks disciplined or improvisational once the migration leaves the lab.
Where Generic Approaches Fall Short
Generic tools usually start from the metadata they can extract most easily:
- tables
- columns
- types
- constraints
That is useful, but incomplete. They often miss or underweight:
- triggers that encode business behavior
- sequences and implicit key-generation paths
- packages and scheduler jobs that mutate data outside obvious transaction flows
- operational hotness and table criticality
- graph-level risk such as mutual dependencies and hidden cycles
The result is a shallow inventory that is good enough to generate output, but not good enough to govern a real migration program.
How Pulsaride Transform Helps
Pulsaride Transform should treat assessment as executable product input, not as a document-generation step.
That means the product should convert source discovery into structured migration decisions:
- object classification: auto, risky, manual, phased
- dependency graph extraction from foreign keys and application flows
- high-risk table detection for hot, wide, and LOB-heavy tables
- fail-fast dependency validation before runtime
- explicit treatment of unsupported Oracle constructs as blockers or redesign items
This is also where the product becomes reusable. Once these patterns are captured declaratively, each new migration wave starts from a stronger baseline instead of repeating the same discovery work from scratch.
What This Means for Migration Teams
Strong source assessment reduces the most damaging category of migration cost: surprise.
For teams running a real migration, it improves outcomes in four ways:
- better estimation before execution
- fewer hidden scope expansions after kickoff
- clearer boundary between product capability and manual work
- more credible wave planning and cutover preparation
In practical terms, a solution that can say early "these 40 tables are straightforward, these 8 require two-pass treatment, these 3 need procedural redesign" is much easier to trust than a tool that looks generic until the exceptions start arriving.