Chapter 3: When '' Becomes NULL - The Empty String Trap
Some migration bugs are loud.
They crash a load. They break a constraint. They stop a replay worker. Everyone sees them immediately.
The empty-string trap is worse.
It often produces a migration that looks healthy:
- row counts match
- tables exist
- CDC is flowing
- no exception is raised
And yet the meaning of the data has changed.
That is why this chapter matters. The Oracle-to-PostgreSQL empty-string difference is one of the cleanest examples of a broader rule: migration errors often come from semantic drift, not from syntax errors.
In Oracle, a zero-length string stored in a VARCHAR2 or CHAR column is treated as NULL.
In PostgreSQL, the empty string '' and NULL are different values.
That one difference leaks everywhere:
- uniqueness
- check constraints
- application filters
- business keys
- reconciliation logic
- CDC replay behavior
If the product does not handle it explicitly, the same bug gets rediscovered in every migration wave.
Pulsaride Transform therefore has to treat empty-string normalization as a product concern, not as a per-project cleanup script.
3.1 Oracle empty string semantics vs PostgreSQL text semantics
The heart of the problem is simple:
- in Oracle, '' for character data collapses to NULL
- in PostgreSQL, '' remains an empty string and is not NULL
That means the same source record can take on a different logical meaning after migration depending on how it is loaded and normalized.
Why this difference is dangerous
A migration team may inspect a source column and conclude:
- "this field is often empty"
- "this field is nullable"
- "this field is optional"
But those are not the same statement.
In Oracle, a field may appear to contain "empty strings" in application behavior while the database actually stores them as null-like values. In PostgreSQL, the target can preserve that behavior only if the product chooses to preserve it.
A minimal comparison
Oracle behavior:
-- Oracle
INSERT INTO customer(code) VALUES ('');
SELECT CASE
WHEN code IS NULL THEN 'NULL'
ELSE 'NOT NULL'
END AS code_state
FROM customer;
Operationally, Oracle treats that inserted empty string as NULL.
PostgreSQL behavior:
-- PostgreSQL
INSERT INTO customer(code) VALUES ('');
SELECT CASE
WHEN code IS NULL THEN 'NULL'
ELSE 'NOT NULL'
END AS code_state
FROM customer;
In PostgreSQL, the result is NOT NULL, because the empty string is a real value.
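The contrast can be reproduced end to end in a few lines. The sketch below uses SQLite as a stand-in for PostgreSQL, since both engines keep '' and NULL distinct; the table and column names are illustrative:

```python
import sqlite3

# SQLite, like PostgreSQL, keeps '' and NULL distinct, so it can stand in
# for the target-side behavior described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (code TEXT)")
conn.execute("INSERT INTO customer(code) VALUES ('')")
conn.execute("INSERT INTO customer(code) VALUES (NULL)")

rows = conn.execute("""
    SELECT CASE WHEN code IS NULL THEN 'NULL' ELSE 'NOT NULL' END
    FROM customer ORDER BY rowid
""").fetchall()
print(rows)  # [('NOT NULL',), ('NULL',)]
```

An Oracle source would report NULL for both rows; the target reports them as two different states.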
Why migration tooling gets this wrong
A naive migration pipeline often does one of two things:
- copies source values into target text columns as-is
- serializes values through JSON or JDBC in a way that preserves '' when the source application originally treated it as "no value"
The result is a target that is technically valid but semantically different.
Where the issue hides
This problem is especially common in:
- optional business keys
- status codes
- free-text optional fields
- denormalized lookup codes
- integration identifiers
These fields often sit at the boundary between "present but blank" and "absent," and Oracle erases that distinction while PostgreSQL preserves it.
That is why the product needs a declared policy, not guesswork.
3.2 Why a field that "looked populated" becomes null-sensitive after migration
One of the most confusing symptoms in a migration is this:
The source team says a field is "filled." The target team says the same field is "sometimes empty." Both teams appear to be looking at the same business data.
The disagreement often comes from null-sensitivity after migration.
The illusion of populated data
Imagine a CUSTOMER_CODE column used by an old application form. Users frequently leave it blank. In Oracle:
- the application submits ''
- Oracle folds it into NULL
- downstream logic treats it as missing
After migration, the product preserves '' in PostgreSQL because no explicit normalization rule exists.
Now the field is no longer missing. It is present, but blank.
That changes every downstream query that uses:
- IS NULL
- IS NOT NULL
- COALESCE
- uniqueness assumptions
- business-key construction
Why this becomes a CDC problem too
The issue does not only affect full load.
During CDC:
- source-side updates may continue to emit blank-looking values
- the target pipeline may replay them idempotently
- each replay reinforces the wrong semantic state
This is exactly the kind of problem that reconciliation can detect repeatedly without fixing automatically unless the product knows how to normalize it.
The shift from "optional" to "present"
The dangerous change is not cosmetic. It is logical:
- Oracle meaning: "value absent"
- PostgreSQL accidental meaning: "value present but empty"
Many business rules depend on that difference.
A few common examples:
- a code is optional only when NULL
- a row is "ready for publication" only when an identifier is not null
- a fallback lookup runs only when a field is null
- a deduplication process groups by missing business key
After migration, those branches start behaving differently.
Queries that silently change meaning
A query like this may have been safe in Oracle:
SELECT *
FROM customer
WHERE customer_code IS NOT NULL;
After migration, rows with customer_code = '' now qualify as "present" in PostgreSQL even though they were functionally absent in Oracle.
That is how a field that "looked populated" becomes null-sensitive. The migration preserved bytes but changed the truth conditions around them.
3.3 Unique indexes, check constraints, and application logic under the empty-string trap
This is where the bug stops being theoretical.
The empty-string trap changes how the target enforces rules.
Unique indexes and duplicate semantics
Suppose the source model allows many customers without a business code.
In Oracle, multiple rows with '' behave like multiple rows with NULL. That is usually compatible with a unique constraint, because Oracle does not include entries whose key columns are all NULL in the unique index, so null-like values never collide.
In PostgreSQL, if the migration stores those values as '', they are all the same real value.
That can produce:
- unique constraint violations during load
- false duplicates during reconciliation
- application inserts failing after cutover
Example:
CREATE TABLE customer (
id BIGINT PRIMARY KEY,
customer_code TEXT UNIQUE
);
INSERT INTO customer VALUES (1, '');
INSERT INTO customer VALUES (2, '');
In PostgreSQL, the second insert fails under a standard unique constraint.
What looked like "many missing codes" becomes "many identical codes."
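The same failure can be sketched outside the database shell. Again SQLite stands in for PostgreSQL, since both enforce uniqueness over '' as a real value while allowing multiple NULLs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_code TEXT UNIQUE)")
conn.execute("INSERT INTO customer VALUES (1, '')")
try:
    # A second blank code collides: '' is one shared concrete value.
    conn.execute("INSERT INTO customer VALUES (2, '')")
    blank_failed = False
except sqlite3.IntegrityError:
    blank_failed = True

# Multiple NULL codes remain allowed under the same unique constraint.
conn.execute("INSERT INTO customer VALUES (3, NULL)")
conn.execute("INSERT INTO customer VALUES (4, NULL)")
print(blank_failed)  # True
```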
Check constraints do not mean the same thing
A subtle and important case is:
CHECK (customer_code <> '')
In Oracle, inserting '' results in null-like behavior, and a check constraint treats unknown as acceptable rather than false. The empty-looking input may therefore pass.
In PostgreSQL, '' <> '' is false, so the same data is rejected.
That means a migrated check constraint can become stricter than the original operational behavior even when the DDL looks conceptually equivalent.
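A runnable sketch of that stricter behavior, again with SQLite standing in for PostgreSQL (both treat an unknown check result as a pass and a false result as a violation):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_code TEXT CHECK (customer_code <> ''))")

# NULL <> '' evaluates to UNKNOWN, which a CHECK constraint accepts.
conn.execute("INSERT INTO customer VALUES (NULL)")
try:
    # '' <> '' is false, so the same-looking blank input is rejected.
    conn.execute("INSERT INTO customer VALUES ('')")
    blank_rejected = False
except sqlite3.IntegrityError:
    blank_rejected = True
print(blank_rejected)  # True
```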
Application logic built on NULL
Application code often assumes Oracle semantics without naming them.
Typical patterns include:
- NVL(customer_code, 'UNKNOWN')
- WHERE customer_code IS NULL
- "if code missing, derive from another source"
- "if code present, skip enrichment"
After migration, rows with '' no longer enter the null branch.
The application may therefore:
- skip fallback logic
- publish incomplete records
- classify records incorrectly
- build incorrect natural keys
Why this affects migration products, not just applications
A mature migration product cannot say, "this is application behavior, not my problem."
If the product migrates the field into a state that violates original semantics, it has already created downstream defects.
Pulsaride Transform should therefore own:
- explicit empty-string handling in mappings
- clear defaults for text normalization
- warnings when unique or check rules are exposed to this semantic shift
Otherwise the same class of defect reappears across tables and projects.
3.4 Reconciliation symptoms: mismatched counts, false duplicates, unexpected validation failures
The empty-string trap often appears first in reconciliation, not in runtime errors.
That makes it especially dangerous because teams may treat the mismatch as a data-quality nuisance rather than as a migration semantics bug.
Symptom 1: counts differ only on filtered subsets
Global row counts may match while filtered reconciliation fails:
-- PostgreSQL target
SELECT COUNT(*)
FROM customer
WHERE customer_code IS NOT NULL;
The target count may be higher than the source-equivalent logic because blank strings are now counted as present values.
This is a classic case where:
- total table counts are green
- business-rule counts are red
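The pattern can be sketched as a toy reconciliation run, with SQLite standing in for both sides and invented sample rows: total counts agree while the filtered business-rule count drifts.

```python
import sqlite3

src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE customer (customer_code TEXT)")

# Source NULLs arrived in the target as '' because no normalization rule ran.
src.executemany("INSERT INTO customer VALUES (?)", [("A1",), (None,), (None,)])
tgt.executemany("INSERT INTO customer VALUES (?)", [("A1",), ("",), ("",)])

def total(db):
    return db.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

def with_code(db):
    return db.execute(
        "SELECT COUNT(*) FROM customer WHERE customer_code IS NOT NULL"
    ).fetchone()[0]

print(total(src) == total(tgt))        # True: table counts are green
print(with_code(src), with_code(tgt))  # 1 3: business-rule counts are red
```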
Symptom 2: false duplicates
If multiple source rows had effectively missing codes in Oracle, but the target stores them as '', downstream uniqueness or deduplication logic may start treating them as collisions.
Reconciliation then surfaces:
- duplicate key reports
- unexpected merge matches
- increased conflict counts during replay
Symptom 3: sampled diff failures
A row-level diff may report:
- source value = NULL
- target value = ''
Technically the fields are different. Operationally, the team may argue they are "equivalent enough." That argument is dangerous because equivalence depends on the consuming logic.
For some fields, NULL and '' are equivalent enough.
For others, they are not:
- business keys
- external IDs
- readiness flags
- enrichment triggers
The product needs to know which case it is in.
Symptom 4: unstable replay corrections
In a reconciliation-driven architecture, drift may be detected and rows may be sent for replay or correction. If the normalization bug is still present, the same mismatch keeps coming back.
That pattern is a strong smell:
- the row "repairs" successfully
- the reconciliation check fails again later
- no infrastructure error is present
This is usually not a transport problem. It is a semantics problem.
Symptom 5: unexpected validation failures around nullability
Validation reports may suddenly show:
- more "non-null" target rows than expected
- more rows failing a business-key completeness check
- more rows bypassing enrichment steps
These are not random anomalies. They are often the exact footprint of Oracle empty-string collapse being lost in PostgreSQL.
What reconciliation should report
A good migration product should not just say "row mismatch." It should classify useful patterns:
- NULL -> ''
- '' -> NULL
- whitespace-only normalization
- repeated same-column drift across many rows
That is the difference between a diagnostic product and a generic diff tool.
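One way to make that classification concrete is a small diff classifier; the pattern names mirror the list above, and the function name is invented for illustration:

```python
def classify_drift(source_value, target_value):
    """Classify a source/target pair into a named semantic-drift pattern."""
    if source_value is None and target_value == "":
        return "NULL -> ''"
    if source_value == "" and target_value is None:
        return "'' -> NULL"
    if source_value == target_value:
        return "match"
    if (source_value or "").strip() == (target_value or "").strip():
        return "whitespace-only normalization"
    return "value mismatch"

print(classify_drift(None, ""))      # NULL -> ''
print(classify_drift(" A1 ", "A1"))  # whitespace-only normalization
```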
3.5 Product patterns: normalization rules, pre-steps, and explicit null policy
The empty-string trap is common enough that it should be solved once, at the product level.
Not every migration needs the same policy, but every migration needs an explicit policy.
Pattern 1: field-level normalization rules
The safest default is not global magic. It is declarative normalization by field category.
For example:
- business keys: collapse blank to NULL
- optional free text: preserve blank if business chooses to preserve it
- reference codes: collapse blank to NULL
- operator comments: preserve original shape
This should be expressible as configuration, not custom Java or ad hoc SQL.
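A hypothetical shape for that configuration, expressed here as plain Python data; the category names, column mappings, and policy identifiers are all invented for illustration:

```python
# Per-category normalization policies, declared once at the product level.
NORMALIZATION_POLICIES = {
    "business_key":   "blank_and_whitespace_to_null",
    "reference_code": "blank_to_null",
    "free_text":      "preserve",
    "comment":        "preserve",
}

# Each mapped column declares a category instead of embedding custom code.
COLUMN_CATEGORIES = {
    ("customer", "customer_code"): "business_key",
    ("customer", "customer_name"): "free_text",
}

def policy_for(table, column):
    return NORMALIZATION_POLICIES[COLUMN_CATEGORIES[(table, column)]]

print(policy_for("customer", "customer_code"))  # blank_and_whitespace_to_null
```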
Pattern 2: pre-steps in staging
A practical product pattern is to normalize in staging before target promotion.
Typical SQL shape:
UPDATE customer_staging
SET customer_code = NULL
WHERE customer_code = '';
That sounds trivial, but in a reusable product it should become a generic step:
- normalize blank-to-null
- optionally trim whitespace
- target selected columns only
- emit metrics on affected rows
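A minimal sketch of such a generic step, using SQLite for the staging database; the function name and metric shape are invented:

```python
import sqlite3

def normalize_blank_to_null(conn, table, columns, trim=True):
    """Set blank (optionally whitespace-only) values to NULL, per column.

    Table and column names are assumed to come from trusted mapping config,
    not user input, since they are interpolated into the SQL text.
    """
    metrics = {}
    for col in columns:
        expr = f"TRIM({col})" if trim else col
        cur = conn.execute(f"UPDATE {table} SET {col} = NULL WHERE {expr} = ''")
        metrics[col] = cur.rowcount  # emit affected-row metric
    return metrics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_staging (customer_code TEXT)")
conn.executemany("INSERT INTO customer_staging VALUES (?)",
                 [("A1",), ("",), ("   ",), (None,)])
metrics = normalize_blank_to_null(conn, "customer_staging", ["customer_code"])
print(metrics)  # {'customer_code': 2}
```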
Pattern 3: explicit null policy in mapping contracts
Every mapped column should be able to declare one of a few policies:
- preserve as-is
- trim only
- blank-to-null
- blank-and-whitespace-to-null
- reject blank as invalid
This makes migration behavior reviewable before runtime.
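Applying a declared policy to a single value can be sketched as one small function; the policy identifiers mirror the list above and are illustrative:

```python
def apply_policy(value, policy):
    if policy == "preserve":
        return value
    if policy == "trim_only":
        return None if value is None else value.strip()
    if policy == "blank_to_null":
        return None if value == "" else value
    if policy == "blank_and_whitespace_to_null":
        return None if value is not None and value.strip() == "" else value
    if policy == "reject_blank":
        if value is not None and value.strip() == "":
            raise ValueError("blank value not allowed")
        return value
    raise ValueError(f"unknown policy: {policy}")

print(apply_policy("  ", "blank_and_whitespace_to_null"))  # None
print(apply_policy("  ", "blank_to_null") == "  ")         # True: only '' collapses
```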
Pattern 4: reconciliation-aware normalization
Validation should understand the chosen policy.
If the declared policy is blank-to-null, reconciliation should not keep flagging NULL vs '' as unexplained drift. It should either:
- normalize before comparing
- or classify the difference as a policy mismatch, not a random row mismatch
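The first option can be sketched as a policy-aware comparator: both sides are normalized under the declared policy before diffing, so a declared blank-to-null rule stops re-surfacing as drift. Function and policy names are illustrative:

```python
def values_match(source_value, target_value, policy):
    def norm(v):
        if v is None:
            return None
        if policy == "blank_to_null" and v == "":
            return None
        if policy == "blank_and_whitespace_to_null" and v.strip() == "":
            return None
        return v
    return norm(source_value) == norm(target_value)

print(values_match(None, "", "blank_to_null"))  # True: explained by policy
print(values_match(None, "", "preserve"))       # False: real, reportable drift
```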
Pattern 5: constraint-safe rollout
If empty-string normalization affects unique or check constraints, the product should apply the normalization before enabling or validating the final target rules.
Otherwise the pipeline can fail in the wrong place:
- load first
- constraint later
- unexpected duplicate explosion
Pattern 6: product feedback loop
If the same normalization issue appears across multiple migrations, it should graduate from project script to product feature.
This is the broader lesson:
- repeated manual cleanup means the product is missing a declarative capability
Pulsaride Transform should therefore treat empty-string handling as part of its transform vocabulary, not as one-off data hygiene.
3.6 Example: customer code '' turning into business-key drift
Consider a customer master table with an optional external code:
CUSTOMER(
CUSTOMER_ID,
CUSTOMER_CODE,
CUSTOMER_NAME,
UPDATED_AT
)
In Oracle, the business has long tolerated blank customer codes. Operationally, those rows are treated as "no code assigned yet."
During migration:
- the schema is converted successfully
- the full load copies rows into PostgreSQL
- CDC starts and remains healthy
- top-level counts match
Everything looks fine until reconciliation starts showing drift in downstream views and business reports.
What actually happened
Some source rows were functionally:
CUSTOMER_CODE = NULL
But the target rows became:
CUSTOMER_CODE = ''
Now a downstream rule that builds a business key like:
COALESCE(customer_code, 'NO_CODE')
behaves differently.
Expected Oracle-era meaning:
- missing code -> 'NO_CODE'
Actual PostgreSQL meaning after bad migration:
- empty string -> ''
So the target system produces:
- a blank business key instead of a fallback key
- different grouping behavior in reports
- inconsistent deduplication behavior
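The difference is easy to verify directly; SQLite again stands in for PostgreSQL, since COALESCE behaves the same way in both:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COALESCE only substitutes for NULL, never for a preserved empty string.
null_key = conn.execute("SELECT COALESCE(NULL, 'NO_CODE')").fetchone()[0]
blank_key = conn.execute("SELECT COALESCE('', 'NO_CODE')").fetchone()[0]
print(null_key)         # NO_CODE
print(repr(blank_key))  # ''
```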
The reconciliation footprint
The team starts seeing:
- filtered count mismatches on "customers with code"
- repeated row-level diffs on
CUSTOMER_CODE - duplicate-looking records in business-key reports
- no obvious CDC lag issue
If the product is immature, the team blames data quality.
If the product is better, it recognizes a migration semantic drift pattern.
The fix
The durable fix is not "run this SQL once in prod."
The durable fix is:
- declare CUSTOMER_CODE as blank-to-null
- normalize in staging before promotion
- re-run reconciliation under the declared policy
- add a regression test so the behavior never reappears
Possible staging normalization:
UPDATE customer_staging
SET customer_code = NULL
WHERE TRIM(customer_code) = '';
Possible validation query after normalization:
SELECT COUNT(*)
FROM customer
WHERE customer_code IS NOT NULL;
Now the target regains the intended semantics:
- missing stays missing
- present stays present
- reconciliation stops re-reporting the same drift
The lesson
The empty-string trap is not a corner case. It is a miniature version of the entire migration problem.
The database moved.
The data moved.
But unless the product also migrates meaning, the migration is incomplete.
That is why Pulsaride Transform needs normalization rules, reconciliation-aware validation, and explicit semantic policies. Without them, the product can copy rows faithfully and still reproduce the wrong system.
Why This Matters in Practice
This chapter is about a small-looking difference that creates disproportionately large migration risk.
The empty-string trap proves that data migration is not only about moving values; it is about preserving meaning. It also shows why subtle defects are so dangerous: they are exactly the kind of issues generic tools often let through because the rows still move and the schema still compiles.
If a migration solution cannot handle a difference as fundamental as Oracle '' versus PostgreSQL NULL, it becomes much harder to trust it on larger semantic gaps.
Where Generic Approaches Fall Short
Generic tools tend to operate at one of two levels:
- schema equivalence
- row movement
They often do not model semantic policies such as:
- whether blank means absent
- whether whitespace-only values should be normalized
- whether uniqueness should be evaluated before or after normalization
- whether reconciliation should treat NULL versus '' as benign, suspicious, or blocking
This is why the same defect is so often rediscovered late. The tooling copies data faithfully enough to pass technical smoke tests, but not faithfully enough to preserve business intent.
How Pulsaride Transform Helps
Pulsaride Transform should treat empty-string handling as a reusable transform capability with explicit policy, not as a hidden cleanup task.
In product terms, that means:
- declarative normalization rules per field or field class
- pre-promotion staging steps such as blank-to-null or trim-and-null
- validation that understands declared normalization policy
- reconciliation outputs that classify semantic drift instead of just reporting raw mismatch
- regression tests that ensure once-fixed semantic traps do not return in later waves
This turns an error-prone migration detail into a controlled and reviewable product feature.
What This Means for Migration Teams
Subtle semantic defects are among the most expensive defects in migration because they often survive long enough to damage trust.
When Pulsaride Transform handles edge cases like empty-string normalization explicitly, it shows two useful qualities in practice:
- the product is built from real migration failure patterns
- the product reduces post-cutover business drift, not just data movement effort
That matters because migration teams need confidence: confidence that the target system will behave like the source where it should, and that the product can explain differences where it should not.