Chapter 3: When '' Becomes NULL - The Empty String Trap
Some migration bugs are loud.
They crash a load. They break a constraint. They stop a replay worker. Everyone sees them immediately.
The empty-string trap is worse.
It often produces a migration that looks healthy:
- row counts match
- tables exist
- CDC is flowing
- no exception is raised
And yet the meaning of the data has changed.
That is why this chapter matters. The Oracle-to-PostgreSQL empty-string difference is one of the cleanest examples of a broader rule: migration errors often come from semantic drift, not from syntax errors.
In Oracle, a zero-length string stored in a VARCHAR2 or CHAR column is treated as NULL.
In PostgreSQL, the empty string '' and NULL are different values.
That one difference leaks everywhere:
- uniqueness
- check constraints
- application filters
- business keys
- reconciliation logic
- CDC replay behavior
If the product does not handle it explicitly, the same bug gets rediscovered in every migration wave.
Pulsaride Transform therefore has to treat empty-string normalization as a product concern, not as a per-project cleanup script.
3.1 Oracle empty string semantics vs PostgreSQL text semantics
The heart of the problem is simple:
- in Oracle, '' for character data collapses to NULL
- in PostgreSQL, '' remains an empty string and is not NULL
That means the same source record can take on a different logical meaning after migration depending on how it is loaded and normalized.
Why this difference is dangerous
A migration team may inspect a source column and conclude:
- "this field is often empty"
- "this field is nullable"
- "this field is optional"
But those are not the same statement.
In Oracle, a field may appear to contain "empty strings" in application behavior while the database actually stores them as null-like values. In PostgreSQL, the target can preserve that behavior only if the product chooses to preserve it.
A minimal comparison
Oracle behavior:
-- Oracle
INSERT INTO customer(code) VALUES ('');
SELECT CASE
WHEN code IS NULL THEN 'NULL'
ELSE 'NOT NULL'
END AS code_state
FROM customer;
Operationally, Oracle treats that inserted empty string as NULL.
PostgreSQL behavior:
-- PostgreSQL
INSERT INTO customer(code) VALUES ('');
SELECT CASE
WHEN code IS NULL THEN 'NULL'
ELSE 'NOT NULL'
END AS code_state
FROM customer;
In PostgreSQL, the result is NOT NULL, because the empty string is a real value.
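The contrast can be reproduced end to end in a few lines. The sketch below uses SQLite as a stand-in for PostgreSQL, since both engines keep '' and NULL distinct; the table and column names are illustrative:

```python
import sqlite3

# SQLite, like PostgreSQL, keeps '' and NULL distinct, so it can stand in
# for the target-side behavior described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (code TEXT)")
conn.execute("INSERT INTO customer(code) VALUES ('')")
conn.execute("INSERT INTO customer(code) VALUES (NULL)")

rows = conn.execute("""
    SELECT CASE WHEN code IS NULL THEN 'NULL' ELSE 'NOT NULL' END
    FROM customer ORDER BY rowid
""").fetchall()
print(rows)  # [('NOT NULL',), ('NULL',)]
```

An Oracle source would report NULL for both rows; the target reports them as two different states.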
Why migration tooling gets this wrong
A naive migration pipeline often does one of two things:
- copies source values into target text columns as-is
- serializes values through JSON or JDBC in a way that preserves '' when the source application originally treated it as "no value"
The result is a target that is technically valid but semantically different.
Where the issue hides
This problem is especially common in:
- optional business keys
- status codes
- free-text optional fields
- denormalized lookup codes
- integration identifiers
These fields often sit at the boundary between "present but blank" and "absent," and Oracle erases that distinction while PostgreSQL preserves it.
That is why the product needs a declared policy, not guesswork.
3.2 Why a field that "looked populated" becomes null-sensitive after migration
One of the most confusing symptoms in a migration is this:
The source team says a field is "filled." The target team says the same field is "sometimes empty." Both teams appear to be looking at the same business data.
The disagreement often comes from null-sensitivity after migration.
The illusion of populated data
Imagine a CUSTOMER_CODE column used by an old application form. Users frequently leave it blank. In Oracle:
- the application submits ''
- Oracle folds it into NULL
- downstream logic treats it as missing
After migration, the product preserves '' in PostgreSQL because no explicit normalization rule exists.
Now the field is no longer missing. It is present, but blank.
That changes every downstream query that uses:
- IS NULL
- IS NOT NULL
- COALESCE
- uniqueness assumptions
- business-key construction
Why this becomes a CDC problem too
The issue does not only affect full load.
During CDC:
- source-side updates may continue to emit blank-looking values
- the target pipeline may replay them idempotently
- each replay reinforces the wrong semantic state
This is exactly the kind of problem that reconciliation can detect repeatedly without fixing automatically unless the product knows how to normalize it.
The shift from "optional" to "present"
The dangerous change is not cosmetic. It is logical:
- Oracle meaning: "value absent"
- PostgreSQL accidental meaning: "value present but empty"
Many business rules depend on that difference.
A few common examples:
- a code is optional only when NULL
- a row is "ready for publication" only when an identifier is not null
- a fallback lookup runs only when a field is null
- a deduplication process groups by missing business key
After migration, those branches start behaving differently.
Queries that silently change meaning
A query like this may have been safe in Oracle:
SELECT *
FROM customer
WHERE customer_code IS NOT NULL;
After migration, rows with customer_code = '' now qualify as "present" in PostgreSQL even though they were functionally absent in Oracle.
That is how a field that "looked populated" becomes null-sensitive. The migration preserved bytes but changed the truth conditions around them.
3.3 Unique indexes, check constraints, and application logic under the empty-string trap
This is where the bug stops being theoretical.
The empty-string trap changes how the target enforces rules.
Unique indexes and duplicate semantics
Suppose the source model allows many customers without a business code.
In Oracle, multiple rows with '' behave like multiple rows with NULL. That is usually compatible with a unique constraint, because Oracle does not include entries whose key columns are all NULL in the unique index, so null-like values never collide.
In PostgreSQL, if the migration stores those values as '', they are all the same real value.
That can produce:
- unique constraint violations during load
- false duplicates during reconciliation
- application inserts failing after cutover
Example:
CREATE TABLE customer (
id BIGINT PRIMARY KEY,
customer_code TEXT UNIQUE
);
INSERT INTO customer VALUES (1, '');
INSERT INTO customer VALUES (2, '');
In PostgreSQL, the second insert fails under a standard unique constraint.
What looked like "many missing codes" becomes "many identical codes."
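The same failure can be sketched outside the database shell. Again SQLite stands in for PostgreSQL, since both enforce uniqueness over '' as a real value while allowing multiple NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_code TEXT UNIQUE)")
conn.execute("INSERT INTO customer VALUES (1, '')")
try:
    # A second blank code collides: '' is one shared concrete value.
    conn.execute("INSERT INTO customer VALUES (2, '')")
    blank_failed = False
except sqlite3.IntegrityError:
    blank_failed = True

# Multiple NULL codes remain allowed under the same unique constraint.
conn.execute("INSERT INTO customer VALUES (3, NULL)")
conn.execute("INSERT INTO customer VALUES (4, NULL)")
print(blank_failed)  # True
```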
Check constraints do not mean the same thing
A subtle and important case is:
CHECK (customer_code <> '')
In Oracle, inserting '' results in null-like behavior, and a check constraint treats unknown as acceptable rather than false. The empty-looking input may therefore pass.
In PostgreSQL, '' <> '' is false, so the same data is rejected.
That means a migrated check constraint can become stricter than the original operational behavior even when the DDL looks conceptually equivalent.
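A runnable sketch of that stricter behavior, again with SQLite standing in for PostgreSQL (both treat an unknown check result as a pass and a false result as a violation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_code TEXT CHECK (customer_code <> ''))")

# NULL <> '' evaluates to UNKNOWN, which a CHECK constraint accepts.
conn.execute("INSERT INTO customer VALUES (NULL)")
try:
    # '' <> '' is false, so the same-looking blank input is rejected.
    conn.execute("INSERT INTO customer VALUES ('')")
    blank_rejected = False
except sqlite3.IntegrityError:
    blank_rejected = True
print(blank_rejected)  # True
```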
Application logic built on NULL
Application code often assumes Oracle semantics without naming them.
Typical patterns include:
- NVL(customer_code, 'UNKNOWN')
- WHERE customer_code IS NULL
- "if code missing, derive from another source"
- "if code present, skip enrichment"
After migration, rows with '' no longer enter the null branch.
The application may therefore:
- skip fallback logic
- publish incomplete records
- classify records incorrectly
- build incorrect natural keys
Why this affects migration products, not just applications
A mature migration product cannot say, "this is application behavior, not my problem."
If the product migrates the field into a state that violates original semantics, it has already created downstream defects.
Pulsaride Transform should therefore own:
- explicit empty-string handling in mappings
- clear defaults for text normalization
- warnings when unique or check rules are exposed to this semantic shift
Otherwise the same class of defect reappears across tables and projects.
3.4 Reconciliation symptoms: mismatched counts, false duplicates, unexpected validation failures
The empty-string trap often appears first in reconciliation, not in runtime errors.
That makes it especially dangerous because teams may treat the mismatch as a data-quality nuisance rather than as a migration semantics bug.
Symptom 1: counts differ only on filtered subsets
Global row counts may match while filtered reconciliation fails:
-- PostgreSQL target
SELECT COUNT(*)
FROM customer
WHERE customer_code IS NOT NULL;
The target count may be higher than the source-equivalent logic because blank strings are now counted as present values.
This is a classic case where:
- total table counts are green
- business-rule counts are red
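The pattern can be sketched as a toy reconciliation run, with SQLite standing in for both sides and invented sample rows: total counts agree while the filtered business-rule count drifts.

```python
import sqlite3

src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE customer (customer_code TEXT)")

# Source NULLs arrived in the target as '' because no normalization rule ran.
src.executemany("INSERT INTO customer VALUES (?)", [("A1",), (None,), (None,)])
tgt.executemany("INSERT INTO customer VALUES (?)", [("A1",), ("",), ("",)])

def total(db):
    return db.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

def with_code(db):
    return db.execute(
        "SELECT COUNT(*) FROM customer WHERE customer_code IS NOT NULL"
    ).fetchone()[0]

print(total(src) == total(tgt))        # True: table counts are green
print(with_code(src), with_code(tgt))  # 1 3: business-rule counts are red
```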
Symptom 2: false duplicates
If multiple source rows had effectively missing codes in Oracle, but the target stores them as '', downstream uniqueness or deduplication logic may start treating them as collisions.
Reconciliation then surfaces:
- duplicate key reports
- unexpected merge matches
- increased conflict counts during replay
Symptom 3: sampled diff failures
A row-level diff may report:
- source value = NULL
- target value = ''
Technically the fields are different. Operationally, the team may argue they are "equivalent enough." That argument is dangerous because equivalence depends on the consuming logic.
For some fields, NULL and '' are equivalent enough.
For others, they are not:
- business keys
- external IDs
- readiness flags
- enrichment triggers
The product needs to know which case it is in.
Symptom 4: unstable replay corrections
In a reconciliation-driven architecture, drift may be detected and rows may be sent for replay or correction. If the normalization bug is still present, the same mismatch keeps coming back.
That pattern is a strong smell:
- the row "repairs" successfully
- the reconciliation check fails again later
- no infrastructure error is present
This is usually not a transport problem. It is a semantics problem.
Symptom 5: unexpected validation failures around nullability
Validation reports may suddenly show:
- more "non-null" target rows than expected
- more rows failing a business-key completeness check
- more rows bypassing enrichment steps
These are not random anomalies. They are often the exact footprint of Oracle empty-string collapse being lost in PostgreSQL.
What reconciliation should report
A good migration product should not just say "row mismatch." It should classify useful patterns:
- NULL -> ''
- '' -> NULL
- whitespace-only normalization
- repeated same-column drift across many rows
That is the difference between a diagnostic product and a generic diff tool.
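One way to make that classification concrete is a small diff classifier; the pattern names mirror the list above, and the function name is invented for illustration:

```python
def classify_drift(source_value, target_value):
    """Classify a source/target pair into a named semantic-drift pattern."""
    if source_value is None and target_value == "":
        return "NULL -> ''"
    if source_value == "" and target_value is None:
        return "'' -> NULL"
    if source_value == target_value:
        return "match"
    if (source_value or "").strip() == (target_value or "").strip():
        return "whitespace-only normalization"
    return "value mismatch"

print(classify_drift(None, ""))      # NULL -> ''
print(classify_drift(" A1 ", "A1"))  # whitespace-only normalization
```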
3.5 Product patterns: normalization rules, pre-steps, and explicit null policy
The empty-string trap is common enough that it should be solved once, at the product level.
Not every migration needs the same policy, but every migration needs an explicit policy.
Pattern 1: field-level normalization rules
The safest default is not global magic. It is declarative normalization by field category.
For example:
- business keys: collapse blank to NULL
- optional free text: preserve blank if business chooses to preserve it
- reference codes: collapse blank to NULL
- operator comments: preserve original shape
This should be expressible as configuration, not custom Java or ad hoc SQL.
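A hypothetical shape for that configuration, expressed here as plain Python data; the category names, column mappings, and policy identifiers are all invented for illustration:

```python
# Per-category normalization policies, declared once at the product level.
NORMALIZATION_POLICIES = {
    "business_key":   "blank_and_whitespace_to_null",
    "reference_code": "blank_to_null",
    "free_text":      "preserve",
    "comment":        "preserve",
}

# Each mapped column declares a category instead of embedding custom code.
COLUMN_CATEGORIES = {
    ("customer", "customer_code"): "business_key",
    ("customer", "customer_name"): "free_text",
}

def policy_for(table, column):
    return NORMALIZATION_POLICIES[COLUMN_CATEGORIES[(table, column)]]

print(policy_for("customer", "customer_code"))  # blank_and_whitespace_to_null
```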
Pattern 2: pre-steps in staging
A practical product pattern is to normalize in staging before target promotion.
Typical SQL shape:
UPDATE customer_staging
SET customer_code = NULL
WHERE customer_code = '';
That sounds trivial, but in a reusable product it should become a generic step:
- normalize blank-to-null
- optionally trim whitespace
- target selected columns only
- emit metrics on affected rows
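A minimal sketch of such a generic step, using SQLite for the staging database; the function name and metric shape are invented:

```python
import sqlite3

def normalize_blank_to_null(conn, table, columns, trim=True):
    """Set blank (optionally whitespace-only) values to NULL, per column.

    Table and column names are assumed to come from trusted mapping config,
    not user input, since they are interpolated into the SQL text.
    """
    metrics = {}
    for col in columns:
        expr = f"TRIM({col})" if trim else col
        cur = conn.execute(f"UPDATE {table} SET {col} = NULL WHERE {expr} = ''")
        metrics[col] = cur.rowcount  # emit affected-row metric
    return metrics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_staging (customer_code TEXT)")
conn.executemany("INSERT INTO customer_staging VALUES (?)",
                 [("A1",), ("",), ("   ",), (None,)])
metrics = normalize_blank_to_null(conn, "customer_staging", ["customer_code"])
print(metrics)  # {'customer_code': 2}
```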
Pattern 3: explicit null policy in mapping contracts
Every mapped column should be able to declare one of a few policies:
- preserve as-is
- trim only
- blank-to-null
- blank-and-whitespace-to-null
- reject blank as invalid
This makes migration behavior reviewable before runtime.
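Applying a declared policy to a single value can be sketched as one small function; the policy identifiers mirror the list above and are illustrative:

```python
def apply_policy(value, policy):
    if policy == "preserve":
        return value
    if policy == "trim_only":
        return None if value is None else value.strip()
    if policy == "blank_to_null":
        return None if value == "" else value
    if policy == "blank_and_whitespace_to_null":
        return None if value is not None and value.strip() == "" else value
    if policy == "reject_blank":
        if value is not None and value.strip() == "":
            raise ValueError("blank value not allowed")
        return value
    raise ValueError(f"unknown policy: {policy}")

print(apply_policy("  ", "blank_and_whitespace_to_null"))  # None
print(apply_policy("  ", "blank_to_null") == "  ")         # True: only '' collapses
```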
Pattern 4: reconciliation-aware normalization
Validation should understand the chosen policy.
If the declared policy is blank-to-null, reconciliation should not keep flagging NULL vs '' as unexplained drift. It should either:
- normalize before comparing
- or classify the difference as a policy mismatch, not a random row mismatch
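The first option can be sketched as a policy-aware comparator: both sides are normalized under the declared policy before diffing, so a declared blank-to-null rule stops re-surfacing as drift. Function and policy names are illustrative:

```python
def values_match(source_value, target_value, policy):
    def norm(v):
        if v is None:
            return None
        if policy == "blank_to_null" and v == "":
            return None
        if policy == "blank_and_whitespace_to_null" and v.strip() == "":
            return None
        return v
    return norm(source_value) == norm(target_value)

print(values_match(None, "", "blank_to_null"))  # True: explained by policy
print(values_match(None, "", "preserve"))       # False: real, reportable drift
```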
Pattern 5: constraint-safe rollout
If empty-string normalization affects unique or check constraints, the product should apply the normalization before enabling or validating the final target rules.
Otherwise the pipeline can fail in the wrong place:
- load first
- constraint later
- unexpected duplicate explosion
Pattern 6: product feedback loop
If the same normalization issue appears across multiple migrations, it should graduate from project script to product feature.
This is the broader lesson:
- repeated manual cleanup means the product is missing a declarative capability
Pulsaride Transform should therefore treat empty-string handling as part of its transform vocabulary, not as one-off data hygiene.
3.6 Example: customer code '' turning into business-key drift
Consider a customer master table with an optional external code:
CUSTOMER(
CUSTOMER_ID,
CUSTOMER_CODE,
CUSTOMER_NAME,
UPDATED_AT
)
In Oracle, the business has long tolerated blank customer codes. Operationally, those rows are treated as "no code assigned yet."
During migration:
- the schema is converted successfully
- the full load copies rows into PostgreSQL
- CDC starts and remains healthy
- top-level counts match
Everything looks fine until reconciliation starts showing drift in downstream views and business reports.
What actually happened
Some source rows were functionally:
CUSTOMER_CODE = NULL
But the target rows became:
CUSTOMER_CODE = ''
Now a downstream rule that builds a business key like:
COALESCE(customer_code, 'NO_CODE')
behaves differently.
Expected Oracle-era meaning:
- missing code -> 'NO_CODE'
Actual PostgreSQL meaning after bad migration:
- empty string -> ''
So the target system produces:
- a blank business key instead of a fallback key
- different grouping behavior in reports
- inconsistent deduplication behavior
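The difference is easy to verify directly; SQLite again stands in for PostgreSQL, since COALESCE behaves the same way in both:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COALESCE only substitutes for NULL, never for a preserved empty string.
null_key = conn.execute("SELECT COALESCE(NULL, 'NO_CODE')").fetchone()[0]
blank_key = conn.execute("SELECT COALESCE('', 'NO_CODE')").fetchone()[0]
print(null_key)         # NO_CODE
print(repr(blank_key))  # ''
```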
The reconciliation footprint
The team starts seeing:
- filtered count mismatches on "customers with code"
- repeated row-level diffs on
CUSTOMER_CODE - duplicate-looking records in business-key reports
- no obvious CDC lag issue
If the product is immature, the team blames data quality.
If the product is better, it recognizes a migration semantic drift pattern.
The fix
The durable fix is not "run this SQL once in prod."
The durable fix is:
- declare CUSTOMER_CODE as blank-to-null
- normalize in staging before promotion
- re-run reconciliation under the declared policy
- add a regression test so the behavior never reappears
Possible staging normalization:
UPDATE customer_staging
SET customer_code = NULL
WHERE TRIM(customer_code) = '';
Possible validation query after normalization:
SELECT COUNT(*)
FROM customer
WHERE customer_code IS NOT NULL;
Now the target regains the intended semantics:
- missing stays missing
- present stays present
- reconciliation stops re-reporting the same drift
The lesson
The empty-string trap is not a corner case. It is a miniature version of the entire migration problem.
The database moved.
The data moved.
But unless the product also migrates meaning, the migration is incomplete.
That is why Pulsaride Transform needs normalization rules, reconciliation-aware validation, and explicit semantic policies. Without them, the product can copy rows faithfully and still reproduce the wrong system.
Why This Matters in Practice
This chapter is about a small-looking difference that creates disproportionately large migration risk.
The empty-string trap proves that data migration is not only about moving values; it is about preserving meaning. It also shows why subtle defects are so dangerous: they are exactly the kind of issues generic tools often let through because the rows still move and the schema still compiles.
If a migration solution cannot handle a difference as fundamental as Oracle '' versus PostgreSQL NULL, it becomes much harder to trust it on larger semantic gaps.
Where Generic Approaches Fall Short
Generic tools tend to operate at one of two levels:
- schema equivalence
- row movement
They often do not model semantic policies such as:
- whether blank means absent
- whether whitespace-only values should be normalized
- whether uniqueness should be evaluated before or after normalization
- whether reconciliation should treat NULL versus '' as benign, suspicious, or blocking
This is why the same defect is so often rediscovered late. The tooling copies data faithfully enough to pass technical smoke tests, but not faithfully enough to preserve business intent.
How Pulsaride Transform Helps
Pulsaride Transform should treat empty-string handling as a reusable transform capability with explicit policy, not as a hidden cleanup task.
In product terms, that means:
- declarative normalization rules per field or field class
- pre-promotion staging steps such as blank-to-null or trim-and-null
- validation that understands declared normalization policy
- reconciliation outputs that classify semantic drift instead of just reporting raw mismatch
- regression tests that ensure once-fixed semantic traps do not return in later waves
This turns an error-prone migration detail into a controlled and reviewable product feature.
What This Means for Migration Teams
Subtle semantic defects are among the most expensive defects in migration because they often survive long enough to damage trust.
When Pulsaride Transform handles edge cases like empty-string normalization explicitly, it shows two useful qualities in practice:
- the product is built from real migration failure patterns
- the product reduces post-cutover business drift, not just data movement effort
That matters because migration teams need confidence: confidence that the target system will behave like the source where it should, and that the product can explain differences where it should not.