Chapter 4: Data Type Mapping That Breaks in Production

Type mapping is where many migrations first become dangerous, precisely because it looks deceptively simple.

At the beginning of a project, data types seem straightforward. There is a source column, there is a target column, and there is usually at least one obvious PostgreSQL type that appears to fit. That creates false confidence. A migration team starts thinking in terms of mechanical translation:

  • NUMBER becomes numeric
  • DATE becomes timestamp
  • CLOB becomes text
  • RAW becomes bytea

Sometimes that works.

Sometimes it creates a target system that loads cleanly, passes basic smoke tests, and then begins to fail under real application behavior, real reconciliation, and real cutover pressure.

The problem is not that Oracle and PostgreSQL are incompatible. The problem is that data types carry semantics:

  • precision
  • range
  • padding rules
  • timezone expectations
  • ordering behavior
  • comparison behavior
  • serialization cost
  • storage cost

If these semantics are not preserved intentionally, the migration may move data without preserving meaning.

That is why Transform Pulsaride has to treat type mapping as a product capability rather than a one-time conversion table. The right question is never just "which target type compiles?" The right question is "which target type preserves behavior, performance, and operational predictability?"

4.1 NUMBER without precision: integer, bigint, numeric, or disaster

Oracle NUMBER is one of the most dangerous "simple" types in migration.

On paper, it looks flexible. In practice, that flexibility is exactly what creates ambiguity.

Why NUMBER is tricky

An Oracle column may be declared as:

  • NUMBER
  • NUMBER(10)
  • NUMBER(10,0)
  • NUMBER(18,2)
  • NUMBER(38)

These declarations do not all mean the same thing operationally, and they should not all land on the same PostgreSQL type.

Common wrong mappings

The most common mistakes are:

  • mapping every NUMBER to numeric
  • mapping every integral-looking NUMBER to bigint
  • trusting the declared precision without checking actual data

Each mistake creates a different class of problem.

If everything becomes numeric, the migration may succeed but:

  • indexes become heavier than necessary
  • joins become slower
  • application code loses assumptions about integer identity fields
  • storage and CPU cost rise quietly

If everything becomes bigint, the opposite risk appears:

  • some values exceed the range
  • decimal semantics are lost
  • application overflows emerge later

Practical assessment questions

Before mapping NUMBER, inspect:

  • declared precision and scale
  • actual min and max values
  • presence of fractional values
  • business meaning of the column
  • whether the column is used in joins, keys, counters, or money calculations

Useful Oracle checks:

SELECT column_name, data_precision, data_scale
FROM all_tab_columns
WHERE owner = 'APP_SCHEMA'
  AND table_name = 'ORDERS'
  AND data_type = 'NUMBER';

And for actual data shape:

SELECT MIN(amount) AS min_val,
       MAX(amount) AS max_val
FROM APP_SCHEMA.ORDERS;

To detect fractional values:

SELECT COUNT(*) AS fractional_rows
FROM APP_SCHEMA.ORDERS
WHERE amount != TRUNC(amount);

A safer mapping mindset

As a working rule:

  • identifiers and counters often want integer or bigint
  • financial values often want numeric(p,s)
  • unconstrained NUMBER columns should not be mapped blindly

Transform Pulsaride should therefore map NUMBER using both metadata and observed usage, not metadata alone.
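
As an illustration, if the checks above confirm an integral identifier that stays within 64-bit range and a monetary column with two decimals, the target DDL might look like the following. This is only a sketch; the table and column names are hypothetical, and the choices hold only if the observed data supports them.

CREATE TABLE app_schema.orders (
    order_id   bigint PRIMARY KEY,      -- integral identifier, verified within bigint range
    amount     numeric(18,2) NOT NULL,  -- monetary value, declared precision preserved
    item_count integer                  -- small counter, no fractional values observed
);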

4.2 DATE vs TIMESTAMP vs TIMESTAMPTZ

Time columns are among the most common sources of post-cutover confusion.

The root cause is simple: Oracle and PostgreSQL time types look similar enough to encourage overconfidence, but different enough to create silent drift.

Oracle DATE is not PostgreSQL date

Oracle DATE stores both date and time to the second.

That means a direct conceptual mapping from Oracle DATE to PostgreSQL date is often wrong. It strips time from the value and changes ordering, comparison, and business logic in one move.
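
A quick source-side check shows whether an Oracle DATE column actually carries time-of-day information. This is a sketch; the table and column names are hypothetical.

SELECT COUNT(*) AS rows_with_time_of_day
FROM APP_SCHEMA.ORDERS
WHERE created_at != TRUNC(created_at);

If this count is nonzero, mapping the column to PostgreSQL date silently discards information.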

The three usual targets

Migration teams often choose between:

  • PostgreSQL date
  • PostgreSQL timestamp
  • PostgreSQL timestamptz

The correct choice depends on meaning, not habit.

Use date only if the source column is truly calendar-only in business meaning.

Use timestamp if the application expects a local wall-clock value with no timezone semantics.

Use timestamptz if the value must remain unambiguous across environments, services, or sessions.

Where migrations go wrong

Common mistakes include:

  • truncating Oracle DATE into PostgreSQL date
  • using timestamp where timezone correctness matters
  • using timestamptz without aligning application expectations
  • comparing times across systems with different session timezones

Why reconciliation gets noisy

Timestamp drift often appears as:

  • off-by-hours mismatches
  • values that look different but refer to the same instant
  • count agreement with row-level diff failures

A reconciliation job that compares raw string forms of timestamps is especially likely to produce misleading results.
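
The effect is easy to reproduce on the PostgreSQL side: the same stored instant renders differently depending on the session timezone. This is a sketch using a literal value.

SET timezone = 'UTC';
SELECT TIMESTAMPTZ '2024-03-01 12:00:00+00';   -- renders as 2024-03-01 12:00:00+00

SET timezone = 'America/New_York';
SELECT TIMESTAMPTZ '2024-03-01 12:00:00+00';   -- renders as 2024-03-01 07:00:00-05

Reconciliation should therefore compare instants in a canonical form, for example after converting both sides to UTC, rather than comparing rendered strings.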

Practical check

Before mapping, determine whether the source column is used as:

  • business date
  • technical update timestamp
  • event timestamp
  • SLA clock
  • partitioning key

These use cases should not all share one default.

4.3 CHAR semantics, padded blanks, and equality surprises

Fixed-width character types create subtle drift even when values look identical in UI screens.

Oracle and PostgreSQL both support CHAR, but behavior still matters

The practical issue is not only storage. It is comparison.

A code such as 'A' may become 'A ' depending on type, client library, and migration path. That affects:

  • equality
  • distinct counts
  • checksum results
  • joins on code columns
  • reconciliation diffs
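
The comparison difference is easy to see directly in PostgreSQL. This is a sketch with literal values.

SELECT 'A '::char(3) = 'A'::char(3) AS char_equal,   -- true: trailing blanks are insignificant for char
       'A '::text    = 'A'::text    AS text_equal;   -- false: the trailing blank is significant for text

Which of these behaviors the application actually depends on is the question the mapping has to answer.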

Where this hurts

The damage is highest on:

  • country or status codes
  • denormalized business flags
  • old integration fields
  • legacy fixed-format identifiers

Wrong instinct

A migration team sees CHAR(10) and assumes it should stay CHAR(10) forever.

That is often the wrong decision.

Sometimes the right target is text or varchar, combined with explicit normalization and padding rules at the edge.

Practical rule

For each CHAR column, ask:

  • is fixed width meaningful or incidental?
  • do consuming systems rely on padded values?
  • should trailing blanks be preserved, trimmed, or normalized?
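
One way to answer the first two questions is to compare stored and trimmed lengths in the source. This is a sketch; the table and column names are hypothetical.

SELECT LENGTH(status_code)        AS stored_len,
       LENGTH(RTRIM(status_code)) AS trimmed_len,
       COUNT(*)                   AS row_count
FROM APP_SCHEMA.ORDERS
GROUP BY LENGTH(status_code), LENGTH(RTRIM(status_code));

If trimmed lengths vary while stored lengths stay constant, the fixed width is padding, not meaning.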

Transform Pulsaride should not treat CHAR preservation as a default virtue. It should preserve behavior, not historical awkwardness.

4.4 CLOB, NCLOB, BLOB, RAW, and binary payload migration

Large and binary types break optimistic migration assumptions faster than ordinary scalar columns.

Why LOB-heavy tables are special

LOBs affect:

  • extraction throughput
  • CDC payload size
  • staging volume
  • reconciliation cost
  • retry impact
  • target write amplification

A migration plan that works well on narrow transactional rows may collapse on document-heavy tables.

Typical mappings

  • CLOB -> text
  • NCLOB -> text
  • BLOB -> bytea
  • RAW -> bytea

These mappings are often acceptable technically, but that does not mean the table should flow through the same pipeline settings as everything else.
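
Before reusing default pipeline settings, it helps to profile LOB sizes in the source. This is a sketch; DBMS_LOB.GETLENGTH is a standard Oracle function, while the table and column names are hypothetical.

SELECT COUNT(*)                                 AS row_count,
       ROUND(AVG(DBMS_LOB.GETLENGTH(doc_body))) AS avg_length,
       MAX(DBMS_LOB.GETLENGTH(doc_body))        AS max_length
FROM APP_SCHEMA.DOCUMENT_STORE;

Large averages or extreme maxima usually justify separate batch sizes and a separate validation strategy for that table.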

Questions to answer early

  • Are large objects updated often or rarely?
  • Can they be migrated in the same wave as transactional tables?
  • Should they use a lower batch size?
  • Does the target application read them synchronously?
  • Is row-level reconciliation practical, or is hash-based validation better?

Special case: binary identifiers versus binary content

RAW(16) used for identifiers is very different from a BLOB carrying document payload. Treating both under one generic "binary mapping" policy is a mistake.

Transform Pulsaride should classify LOB-heavy and binary-heavy tables early so that performance defaults, validation methods, and cutover expectations remain realistic.

4.5 XMLTYPE, JSON stored as CLOB, and semi-structured columns

Semi-structured columns are where migration teams often discover that schema conversion and data conversion have drifted apart.

Why these columns are difficult

The type itself may be easy to move, but the behavior around it is not.

Typical cases:

  • Oracle XMLTYPE
  • JSON stored as CLOB
  • application payload blobs encoded as text

The migration decision is not just "where should this data fit?" It is also:

  • should the structure remain opaque?
  • should it become PostgreSQL jsonb?
  • are there downstream queries depending on XPath or string matching?
  • are there indexing opportunities after migration?

The wrong shortcut

Dump everything into text and call the migration done.

That can work for archival content, but it is often wrong for active data. Once the target application begins querying inside these columns, the team realizes too late that the chosen mapping preserved bytes but not usability.

A more careful approach

For each semi-structured column:

  • classify it as archival, operational, or queryable
  • detect invalid JSON or malformed XML in the source
  • decide whether to preserve, convert, or dual-store
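
For the second step, Oracle 12c and later can flag malformed JSON stored in a CLOB with the IS JSON condition. This is a sketch; the table and column names are hypothetical.

SELECT COUNT(*) AS invalid_json_rows
FROM APP_SCHEMA.EVENT_PAYLOAD
WHERE payload IS NOT NULL
  AND payload IS NOT JSON;

Rows that fail this check need an explicit decision before the column is mapped to jsonb, because PostgreSQL will reject them at insert time.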

Transform Pulsaride should support this as an explicit mapping choice rather than forcing one default for all document-like fields.

4.6 Oracle BOOLEAN emulation and PostgreSQL booleans

Many Oracle systems do not use a native boolean in the same way PostgreSQL does.

Instead, they emulate booleans with:

  • NUMBER(1)
  • CHAR(1)
  • VARCHAR2(1)
  • values like Y/N, 1/0, T/F, O/N

Why this breaks late

The schema mapping itself is easy:

  • source flag -> target boolean

The semantics are harder:

  • what values are actually present?
  • are nulls meaningful?
  • are there invalid historical values?
  • does the application depend on three states rather than two?

Example risk

A source flag uses:

  • 1 = true
  • 0 = false
  • NULL = not evaluated yet

If the migration team flattens this into a non-null PostgreSQL boolean with default false, it silently removes a business state.

Safer migration rule

Do not map flag columns by type alone. Map them by domain.

That means:

  • inventory actual values
  • decide whether the target is two-state or three-state
  • validate illegal source values before cutover
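
A minimal version of the first two steps can be expressed directly in SQL. This is a sketch; the table name, column name, and value domain are hypothetical.

-- Inventory the actual flag domain, including NULL (Oracle side)
SELECT is_active, COUNT(*) AS row_count
FROM APP_SCHEMA.CUSTOMERS
GROUP BY is_active;

-- Explicit three-state mapping applied at load time (PostgreSQL side)
SELECT CASE is_active
         WHEN '1' THEN true
         WHEN '0' THEN false
         ELSE NULL        -- preserve the "not evaluated yet" state
       END AS is_active
FROM staging.customers;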

This is another place where Transform Pulsaride should prefer declarative value-domain mapping over custom patch logic.

4.7 Interval types, scientific notation, and precision loss

Some of the hardest type problems appear in the long tail, not in the common columns.

Interval-like values

Legacy Oracle systems often store durations in inconsistent ways:

  • numeric seconds
  • text durations
  • interval-like strings
  • custom unit columns

Migrating these into PostgreSQL interval may be correct, but only if the source meaning is stable and explicit.
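
If the source meaning is confirmed, for example "duration in whole seconds", the PostgreSQL-side conversion can be made explicit. This is a sketch; the table and column names are hypothetical.

SELECT make_interval(secs => duration_seconds) AS duration
FROM staging.job_runs;

The point is not the function call but the prerequisite: the unit and meaning must be established before any such conversion is trusted.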

Scientific notation and serialization traps

Very large or very small numeric values may move through:

  • CSV
  • JSON
  • JDBC drivers
  • CDC serializers

and change textual representation along the way. Even when the numeric value survives, checksums and row-level diffs may become noisy if the comparison logic is not normalization-aware.

Precision loss

Precision loss often comes from:

  • converting decimal values through floating-point paths
  • application code reading numeric through double
  • implicit cast chains in staging or replay

This is especially dangerous because the row still "looks right enough" until audited carefully.
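
The effect is easy to demonstrate in PostgreSQL by round-tripping a decimal through double precision, which is what an unnoticed floating-point path does implicitly. This is a sketch with a literal value.

SELECT n                            AS original,
       n::double precision::numeric AS after_float_roundtrip
FROM (VALUES (0.12345678901234567890123::numeric)) AS t(n);

The two columns differ, yet both look like plausible decimal values in a report.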

Transform Pulsaride should therefore treat rare numeric and interval-like patterns as first-class assessment findings, not as afterthoughts.

4.8 Reconciliation patterns for type drift

Type drift usually appears in reconciliation before it becomes a visible production incident.

That is why validation should not only compare rows. It should help classify what kind of drift occurred.

Common type-drift signatures

  • NULL versus ''
  • timestamp offset drift
  • numeric text representation drift
  • trailing blank drift on fixed-width fields
  • boolean-domain drift
  • precision truncation on decimal values

Why counts are weak here

Most type-mapping errors do not change row count.

They change:

  • value interpretation
  • grouping behavior
  • uniqueness
  • downstream logic

That means a migration can pass:

  • table count checks
  • ingest throughput checks
  • CDC lag checks

and still fail semantically.

Better reconciliation patterns

Useful validation for type mapping includes:

  • normalized comparisons for known semantic rules
  • bucketed checksum comparison on business-relevant columns
  • sample diffs for high-risk tables
  • targeted validation for columns with explicit mapping policy
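
A bucketed, normalization-aware checksum on the PostgreSQL side might look like the following. This is a sketch; the table, columns, bucket count, and normalization rules are hypothetical, and the source side needs an equivalent query so the hashes are comparable.

SELECT order_id % 64 AS bucket,
       md5(string_agg(
             order_id::text
             || '|' || coalesce(trim(status_code), '')
             || '|' || coalesce(to_char(created_at AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:MI:SS'), ''),
             ',' ORDER BY order_id)) AS bucket_hash
FROM orders
GROUP BY order_id % 64;

The normalization steps, trimming padded codes, rendering timestamps in UTC, and handling NULL deliberately, are exactly the semantic rules discussed above, encoded at the point where the comparison happens.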

Transform Pulsaride should use reconciliation not just to detect drift, but to make drift diagnosable. A useful validation report should help a team distinguish:

  • bad mapping choice
  • bad source data
  • bad normalization rule
  • bad replay behavior

That distinction shortens incident analysis dramatically.

4.9 Example: NUMBER(38) that silently overflows an application assumption

Consider a legacy Oracle table:

CREATE TABLE PAYMENT_EVENT (
    EVENT_ID NUMBER(38) PRIMARY KEY,
    AMOUNT   NUMBER(18,2),
    STATUS   VARCHAR2(20)
);

During assessment, the team sees:

  • EVENT_ID is integral
  • the application treats it as an identifier
  • the target application code uses Java Long

The migration seems obvious:

  • map EVENT_ID to PostgreSQL bigint

That works in test because the early datasets contain small values.

Months later, production replay hits a range of identifiers generated by another subsystem. Some values exceed the bigint range, or sit close enough to it to trigger application conversion problems during deserialization and downstream processing.

What goes wrong

At first, nothing obvious breaks in the database design review.

Then the real symptoms appear:

  • row writes failing on range overflow
  • replay retries climbing
  • partial backlog growing
  • reconciliation showing missing rows on a subset of events

The root cause is not transport. It is a bad type assumption.

What a safer process would have done

Before deciding on bigint, the team should have checked:

  • actual maximum values in production-like data
  • all producing systems for EVENT_ID
  • application assumptions about integer range
  • whether the identifier is arithmetic or opaque
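
The first check is a single query against production-like data, compared with the bigint ceiling of 9223372036854775807. This is a sketch against the table above.

SELECT MAX(EVENT_ID) AS max_event_id
FROM APP_SCHEMA.PAYMENT_EVENT;

A maximum anywhere near that ceiling, or any producing system that generates identifiers outside the team's control, is a signal that bigint is the wrong bet.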

A safer target may have been:

  • PostgreSQL numeric(38,0)

combined with:

  • explicit application handling
  • explicit indexing review
  • reconciliation rules that compare values without lossy casts

The lesson

The dangerous part of type mapping is not the column declaration. It is the invisible assumption behind it.

A migration product earns its value here by preventing "reasonable-looking" mappings from surviving long enough to become production incidents. That is why Transform Pulsaride needs metadata analysis, value inspection, reconciliation-aware validation, and explicit mapping policy in the same workflow.

Why This Matters in Practice

Type mapping errors are some of the most expensive migration defects because they often survive the early phases of a project. The schema compiles, the rows load, and the pipeline stays green. The real problem only appears later, when a query slows down, a constraint behaves differently, a timestamp shifts meaning, or a high-value record fails reconciliation.

That is why type mapping has to be treated as a behavior-preservation problem, not as a lookup-table exercise.

Where Generic Approaches Fall Short

Generic approaches usually rely on static conversion tables:

  • one source type
  • one target type

That is fast, but it ignores the factors that actually shape runtime behavior:

  • real value ranges
  • fractional versus integral use
  • timezone semantics
  • padding and comparison rules
  • document usage patterns
  • application assumptions

The result is a migration that may be syntactically correct while remaining operationally fragile.

How Transform Pulsaride Helps

Transform Pulsaride is most useful here when it combines four things in one flow:

  • metadata inspection
  • source-value profiling
  • explicit mapping policy
  • reconciliation that can detect type-driven drift

That combination makes type mapping reviewable, testable, and repeatable across migration waves instead of leaving it buried in project-specific scripts or assumptions.

What This Means for Migration Teams

For teams running real migrations, good type mapping reduces two kinds of pain at once:

  • technical pain during load, replay, and application startup
  • diagnostic pain after cutover, when subtle type drift becomes hard to trace back to its source

When Transform Pulsaride handles type mapping with explicit policy and validation, the target system becomes more predictable, the reconciliation outputs become more useful, and the migration team spends less time rediscovering the same class of defects table by table.
