Aug 5, 2024

AI Systems Need Data Contracts Too

Data contracts are usually discussed around tables and events, but AI systems need the same discipline around less traditional interfaces.

Prompts have expected variables. Retrieval pipelines have eligibility rules. Evaluation datasets have labeling assumptions. Feedback events have semantics. If those interfaces drift, model behavior can change in ways that are hard to diagnose.

I like making these contracts explicit. What inputs does a prompt expect? What documents can retrieval use? What metadata is required for access control? What counts as positive feedback?
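As a minimal sketch of the first question, a prompt can declare the variables it expects and refuse to render without them. The `PromptContract` class and its field names are hypothetical, not taken from any particular framework; the only library call it relies on is the standard library's `string.Formatter().parse`.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class PromptContract:
    """Hypothetical contract: a template plus the variables it must receive."""
    name: str
    version: str
    template: str
    required: frozenset

    def __post_init__(self):
        # Fail at definition time if the template and the declared variables disagree.
        placeholders = {
            field for _, field, _, _ in Formatter().parse(self.template) if field
        }
        if placeholders != set(self.required):
            raise ValueError(
                f"template uses {sorted(placeholders)}, "
                f"contract declares {sorted(self.required)}"
            )

    def render(self, **variables):
        # Fail at call time if the caller drifts from the declared inputs.
        provided = set(variables)
        missing = self.required - provided
        unexpected = provided - self.required
        if missing or unexpected:
            raise ValueError(
                f"missing={sorted(missing)} unexpected={sorted(unexpected)}"
            )
        return self.template.format(**variables)


ANSWER_PROMPT = PromptContract(
    name="answer_question",  # illustrative name
    version="1.0.0",
    template="Question: {question}\nContext: {context}",
    required=frozenset({"question", "context"}),
)
```

Calling `ANSWER_PROMPT.render(question="...")` without `context` raises a `ValueError` instead of sending a half-filled prompt to the model, which turns silent drift into a loud failure.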

The more probabilistic the product feels, the more important deterministic interfaces become. Contracts give teams something solid to debug when behavior gets fuzzy.

Contracting the Fuzzy Parts

AI systems contain interfaces that are easy to overlook because they are not always tables. A prompt template is an interface. A tool schema is an interface. A retrieval filter is an interface. An evaluation rubric is an interface. When any of these changes silently, behavior can shift in ways that are difficult to explain.

I like making those interfaces versioned and observable. Which prompt version produced this answer? Which retriever version selected this context? Which tool schema was available? Which evaluation rubric judged the output? These details make debugging possible.
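One way to make those questions answerable is to attach a small trace record to every answer. The sketch below assumes each interface carries a version string; `AnswerTrace` and `changed_components` are illustrative names, not an existing API.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnswerTrace:
    """Hypothetical record of which interface versions produced one answer."""
    request_id: str
    prompt_version: str
    retriever_version: str
    tool_schema_version: str
    rubric_version: str


def changed_components(before, after):
    """Return {component: (old_version, new_version)} for every version that differs."""
    old, new = asdict(before), asdict(after)
    return {
        key: (old[key], new[key])
        for key in old
        if key != "request_id" and old[key] != new[key]
    }
```

Comparing the trace of a good answer with the trace of a bad one narrows the search to the interfaces whose versions actually differ.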

Contracts also help with ownership. If a source document collection feeds retrieval, the source owner needs a way to understand how changes affect the AI product. If an application team changes tool behavior, the evaluation data should reveal whether outcomes moved.

The point is not to make AI development slow. The point is to keep iteration legible. Fast changes are safer when the system can tell you exactly what changed and what happened afterward.

Interfaces for Evaluation

Evaluation data needs contracts too. A label should mean the same thing across reviewers. A feedback event should describe whether the user disliked the answer, the retrieval, the tone, or the task outcome. Without that clarity, a negative signal cannot be attributed to the part of the system that caused it, and feedback becomes too noisy to act on.
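A feedback contract can be as small as an enumeration of what the user is reacting to. The sketch below is an assumption about how such an event might look; the four targets come from the paragraph above, while the field names and the `parse_feedback` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackTarget(Enum):
    ANSWER = "answer"
    RETRIEVAL = "retrieval"
    TONE = "tone"
    TASK_OUTCOME = "task_outcome"


@dataclass(frozen=True)
class FeedbackEvent:
    request_id: str
    target: FeedbackTarget
    positive: bool


def parse_feedback(raw):
    """Validate a raw event dict; reject anything the contract does not define."""
    target = FeedbackTarget(raw["target"])  # raises ValueError for unknown targets
    if not isinstance(raw["positive"], bool):
        raise ValueError("'positive' must be a boolean")
    return FeedbackEvent(
        request_id=str(raw["request_id"]),
        target=target,
        positive=raw["positive"],
    )
```

An event with `"target": "vibes"` is rejected at ingestion rather than quietly counted as a generic thumbs-down.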

Tool calls need even stronger contracts. If a model can call a tool, the schema should define required fields, allowed values, side effects, and permission requirements. Tool behavior should be versioned like any other production interface.
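Here is a rough sketch of a tool contract that covers those four elements and checks a proposed call before it runs. The `refund_order` tool, its fields, and the permission string are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolContract:
    """Hypothetical contract for one version of one tool."""
    name: str
    version: str
    required_fields: frozenset
    allowed_values: dict      # field name -> set of permitted values
    has_side_effects: bool    # recorded so callers can, e.g., require confirmation
    required_permission: str


def validate_call(contract, arguments, caller_permissions):
    """Return a list of contract violations; an empty list means the call may proceed."""
    errors = []
    if contract.required_permission not in caller_permissions:
        errors.append(f"missing permission: {contract.required_permission}")
    for name in sorted(contract.required_fields - set(arguments)):
        errors.append(f"missing field: {name}")
    for name, allowed in contract.allowed_values.items():
        if name in arguments and arguments[name] not in allowed:
            errors.append(f"invalid value for {name}: {arguments[name]!r}")
    return errors


REFUND_V2 = ToolContract(
    name="refund_order",  # illustrative tool
    version="2.0.0",
    required_fields=frozenset({"order_id", "reason"}),
    allowed_values={"reason": {"damaged", "late", "duplicate"}},
    has_side_effects=True,
    required_permission="refunds:write",
)
```

Returning a list of violations rather than raising on the first one is a design choice: the full list can be logged against the tool version, so a spike in one kind of violation points at a schema change.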

Retrieval contracts should describe which sources are eligible, how freshness is handled, and which metadata is required for filtering. Those rules should not live only in application code, where source owners and evaluators cannot see or review them.
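One way to lift those rules out of application code is to declare them as data and apply them through a single function. This sketch assumes documents arrive as dicts with `source`, `updated_at`, and `metadata` keys, and that freshness means a maximum age; all of these names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrievalContract:
    """Hypothetical declaration of what retrieval is allowed to use."""
    version: str
    eligible_sources: frozenset
    max_age: timedelta
    required_metadata: frozenset


def is_eligible(contract, doc, now):
    """Apply the contract to one document; `now` is passed in to keep the check testable."""
    updated_at = doc.get("updated_at")
    if updated_at is None:
        return False  # freshness cannot be established
    metadata = doc.get("metadata", {})
    return (
        doc.get("source") in contract.eligible_sources
        and contract.required_metadata.issubset(metadata)
        and now - updated_at <= contract.max_age
    )


def filter_documents(contract, docs, now):
    return [doc for doc in docs if is_eligible(contract, doc, now)]
```

Because the contract is plain data with a version, it can be reviewed by the source owner and recorded alongside each answer.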

AI systems feel flexible at the surface, but underneath they need crisp interfaces. That is what lets teams improve them without losing control of the behavior.