Nov 8, 2021

Testing Transformations Without Pretending Data Is Code

I like tests for data transformations, but I do not think data behaves exactly like application code. The inputs are messier, the domain rules change, and many failures are about meaning rather than syntax.

That changed my testing strategy. I still value unit-style checks around tricky transformation logic, but I also want invariant checks, reconciliation tests, schema tests, and sample-based reviews that catch semantic mistakes.

The best tests describe what must remain true. Primary keys should be unique. Revenue should reconcile. Timestamps should not move backward. A status should come from an allowed set.

Testing data well means accepting uncertainty while still drawing hard lines around the assumptions that matter.

Choosing the Right Test

I became more careful about matching the test to the risk. A schema test protects the interface. A uniqueness test protects joins. A reconciliation test protects financial or operational meaning. A distribution test protects against silent shifts. Those are different jobs.

The mistake I made earlier was wanting one testing style to cover everything. Data systems need layers. Some tests should fail a build before code ships. Some should run after data lands. Some should warn rather than block because the signal needs human judgment.

I also learned to test examples around edge cases. Business logic often lives in exceptions: canceled orders, merged accounts, late-arriving events, deleted users, timezone changes. A small set of representative examples can catch mistakes that broad aggregate checks miss.

The goal is not to make data perfectly testable. The goal is to make the most important assumptions executable. When those assumptions are written as tests, the system becomes easier to change because engineers can see what they are promising not to break.

Tests as Communication

Good tests communicate intent. A future engineer can read a uniqueness test and understand that the table is expected to be one row per account. They can read a reconciliation test and understand which source total the dataset must match.

That communication value matters because data transformations often outlive the original author. Tests preserve assumptions in a form the system can enforce. They are documentation with teeth.

I also became more comfortable with tests that are imperfect but useful. Some checks catch only a class of failure. Some thresholds need tuning. Some warnings require human review. That is fine. The alternative is often no signal at all.

The mature testing strategy is layered and pragmatic. Catch obvious breakage early, protect critical invariants strongly, and give humans enough context to investigate ambiguous shifts.