Aug 10, 2020

Data Contracts Before They Were Fashionable

Before I had a polished vocabulary for data contracts, I kept running into the same problem: one team changed data for reasonable local reasons, and another team absorbed the breakage.

The solution was not only tooling. It was making expectations explicit. Which fields are guaranteed? What changes require notice? What checks run before publication? Who approves breaking changes?

As systems grew, informal trust was not enough. We needed agreements that were lightweight enough to use and concrete enough to prevent silent surprises.

That is still how I think about data contracts. They are not bureaucracy when done well. They are a way to keep independent teams moving without turning every change into a production risk.

Keeping Contracts Lightweight

The challenge with contracts is making them useful without making them oppressive. If every small data change requires a committee, teams will route around the process. If no changes are governed, downstream systems absorb the chaos. The middle ground is explicit compatibility: a short, shared rule for which changes are safe to ship and which need coordination first.

I like contracts that distinguish additive changes from breaking changes. Adding a nullable field may be fine. Renaming a required field, changing the grain, or redefining a status value should trigger a stronger process. The contract should make that distinction obvious.
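One way to make that distinction mechanical is to compare two versions of a schema and label the difference. The sketch below is illustrative only: the Field shape, the classify_change function, and the policy of treating a new non-nullable field as breaking are all assumptions, not a description of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a published schema (hypothetical shape for this sketch)."""
    name: str
    dtype: str
    nullable: bool


def classify_change(old: list[Field], new: list[Field]) -> tuple[str, list[str]]:
    """Return ("none" | "additive" | "breaking", reasons) for a schema change."""
    old_by_name = {f.name: f for f in old}
    new_by_name = {f.name: f for f in new}
    breaking: list[str] = []
    additive: list[str] = []

    for name, before in old_by_name.items():
        after = new_by_name.get(name)
        if after is None:
            # To a consumer, a rename looks like a removal plus an addition.
            breaking.append(f"removed or renamed field: {name}")
        elif after.dtype != before.dtype:
            breaking.append(f"type changed for {name}: {before.dtype} -> {after.dtype}")
        elif after.nullable and not before.nullable:
            breaking.append(f"{name} became nullable")

    for name, after in new_by_name.items():
        if name not in old_by_name:
            if after.nullable:
                additive.append(f"added nullable field: {name}")
            else:
                # Assumed policy: a new required field needs review (backfill, defaults).
                breaking.append(f"added required field: {name}")

    if breaking:
        return "breaking", breaking
    if additive:
        return "additive", additive
    return "none", []


# Usage with made-up schemas.
v1 = [Field("order_id", "string", False), Field("status", "string", False)]
v2 = v1 + [Field("coupon_code", "string", True)]
v3 = [Field("order_id", "string", False), Field("order_status", "string", False)]

print(classify_change(v1, v2))
print(classify_change(v1, v3))
```

A comparison like this only sees structure. A change of grain or a redefined status value leaves the schema untouched, so those still depend on the producer declaring them.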

The best version also includes automated checks. Humans should agree on the rules, but machines should enforce the boring parts. Schema validation, required field checks, enum checks, and freshness expectations are all places where automation can protect the agreement.
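As a rough illustration of the boring parts, here is a minimal sketch of a pre-publication check covering required fields, enum values, and freshness. The CONTRACT dictionary, the check_batch function, and the field names are hypothetical, and the 24-hour freshness window is an arbitrary example value.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an "orders" dataset; all names are illustrative.
CONTRACT = {
    "required": ["order_id", "status", "updated_at"],
    "enums": {"status": {"placed", "shipped", "cancelled"}},
    "max_staleness": timedelta(hours=24),
}


def check_batch(rows, contract, now):
    """Return a list of violations; an empty list means the batch passes."""
    violations = []
    for i, row in enumerate(rows):
        for field in contract["required"]:
            if row.get(field) is None:
                violations.append(f"row {i}: missing required field {field!r}")
        for field, allowed in contract["enums"].items():
            value = row.get(field)
            if value is not None and value not in allowed:
                violations.append(f"row {i}: {field}={value!r} is not an allowed value")

    # Freshness: the newest record must fall inside the agreed window.
    timestamps = [r["updated_at"] for r in rows if r.get("updated_at") is not None]
    if not timestamps:
        violations.append("no usable updated_at values; cannot establish freshness")
    elif now - max(timestamps) > contract["max_staleness"]:
        violations.append(f"stale data: newest record is from {max(timestamps).isoformat()}")
    return violations


# Usage with made-up rows: one bad enum value and one missing key.
now = datetime(2020, 8, 10, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": "a1", "status": "placed", "updated_at": now - timedelta(hours=1)},
    {"order_id": "a2", "status": "refunded", "updated_at": now - timedelta(hours=2)},
    {"order_id": None, "status": "shipped", "updated_at": now - timedelta(hours=3)},
]
for violation in check_batch(rows, CONTRACT, now):
    print(violation)
```

Returning every violation instead of stopping at the first one is a deliberate choice in this sketch: the producer sees the full extent of the problem in a single run.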

This was the stage where I began seeing contracts as a scaling tool. As teams grow, nobody can hold every dependency in their head. Contracts let teams change local systems while still respecting the shared data surface.

Contracts and Trust Boundaries

Contracts also clarify trust boundaries. A downstream team should not need to inspect a producer’s application code to know whether a dataset is safe to consume. The contract should expose the relevant guarantees at the boundary.

That boundary can include schema, freshness, uniqueness, allowed values, historical coverage, and change notice. The exact guarantees vary by dataset, but the act of writing them down forces the producer and consumer to agree on what matters.
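Writing the guarantees down can be as simple as a small declarative record. The sketch below assumes a hypothetical DatasetContract structure with one field per guarantee listed above, plus a duplicate_keys helper that checks the uniqueness guarantee. The dataset name, owner, and default values are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DatasetContract:
    """Guarantees a producer publishes at the boundary (hypothetical structure)."""
    dataset: str
    owner: str
    schema: dict[str, str]            # column name -> type
    unique_key: tuple[str, ...]       # columns that together identify one row
    allowed_values: dict[str, frozenset[str]] = field(default_factory=dict)
    max_staleness: timedelta = timedelta(hours=24)
    history_start: Optional[date] = None   # earliest date the dataset covers
    breaking_change_notice: timedelta = timedelta(days=30)


def duplicate_keys(rows, contract):
    """Return key values that appear more than once, violating uniqueness."""
    counts = Counter(tuple(row[c] for c in contract.unique_key) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Usage with an invented dataset and rows.
orders = DatasetContract(
    dataset="orders_daily",
    owner="checkout-team",
    schema={"order_id": "string", "status": "string", "order_date": "date"},
    unique_key=("order_id",),
    allowed_values={"status": frozenset({"placed", "shipped", "cancelled"})},
    history_start=date(2018, 1, 1),
)
rows = [{"order_id": "a1"}, {"order_id": "a2"}, {"order_id": "a1"}]
print(duplicate_keys(rows, orders))
```

A consumer can read a record like this without opening the producer's code, and the same record can drive automated checks.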

I also learned that contracts need owners on both sides. Producers own publication quality. Consumers own responsible use. If a consumer uses a field outside its stated purpose, the contract gives both teams a way to have that conversation clearly.

This is the difference between process and bureaucracy. Bad process slows everyone down without reducing risk. Good process makes the risky parts explicit so teams can move with more confidence.