Where Data Engineering Meets AI Product Engineering
The more I look at AI product engineering, the more I see data engineering at the center of it.
Useful AI products need trustworthy context, privacy-aware retrieval, evaluation datasets, feedback loops, lineage, monitoring, and clear ownership. Those are not side quests. They determine whether the product can improve safely over time.
My current view is that the best AI systems will be built by teams that combine product taste with infrastructure discipline. The model matters, but so does everything that surrounds it.
That is the overlap I am most interested in: building data systems that make intelligent products reliable enough to earn trust.
The Work Ahead
The next generation of AI products will need engineers who are comfortable moving between product behavior and infrastructure detail. It is not enough to know that a model responded poorly. Teams need to know whether the issue came from source data, retrieval, permissions, prompt design, tool behavior, evaluation gaps, or mismatched user expectations.
That diagnosis is a data problem. It requires instrumentation, stable identifiers, good schemas, representative evaluation sets, and a platform that keeps product interactions analyzable without collecting more data than it should.
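To make that concrete, here is a minimal sketch of what one instrumented interaction record could look like. Every name in it (`InteractionTrace`, `Stage`, `failures_by_stage`, and the individual fields) is hypothetical and chosen for illustration, not taken from any particular tracing library. The record stores identifiers and versions, not raw user text, which is one way to stay analyzable while collecting less.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class Stage(Enum):
    """Places a poor response might originate (hypothetical taxonomy)."""
    SOURCE_DATA = "source_data"
    RETRIEVAL = "retrieval"
    PERMISSIONS = "permissions"
    PROMPT = "prompt"
    TOOL = "tool"
    EVALUATION_GAP = "evaluation_gap"
    USER_EXPECTATION = "user_expectation"


@dataclass(frozen=True)
class InteractionTrace:
    """One product interaction, described by stable identifiers only."""
    trace_id: str                       # stable ID for joining with feedback
    workflow: str                       # product workflow the request belonged to
    prompt_version: str                 # which prompt template was used
    retrieved_doc_ids: Tuple[str, ...]  # IDs of retrieved documents, not their text
    index_version: str                  # which retrieval index served the request
    failed_stage: Optional[Stage] = None  # filled in after triage, if any


def failures_by_stage(traces: Iterable[InteractionTrace]) -> Counter:
    """Count triaged failures per stage, ignoring traces with no failure."""
    return Counter(t.failed_stage for t in traces if t.failed_stage is not None)
```

With records shaped like this, "the model responded poorly" can be broken down into counts per stage, and each count can be traced back to a specific prompt version or index version.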
I am especially interested in systems that make improvement safer. Can a team test retrieval changes before shipping? Can privacy constraints be enforced automatically? Can evaluation data reveal regressions by workflow? Can product teams understand quality without becoming ML infrastructure experts?
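One of those questions, whether evaluation data can reveal regressions by workflow, can be sketched in a few lines. The code below assumes evaluation results arrive as simple `(workflow, passed)` pairs and that a pass-rate drop above a chosen tolerance counts as a regression. The function names and the 5% default tolerance are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

EvalResult = Tuple[str, bool]  # (workflow name, whether the case passed)


def pass_rates(results: Iterable[EvalResult]) -> Dict[str, float]:
    """Compute the fraction of passing cases for each workflow."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for workflow, passed in results:
        totals[workflow] += 1
        passes[workflow] += int(passed)
    return {w: passes[w] / totals[w] for w in totals}


def regressions_by_workflow(
    baseline: Iterable[EvalResult],
    candidate: Iterable[EvalResult],
    tolerance: float = 0.05,  # assumed threshold; tune per product
) -> Dict[str, float]:
    """Return workflows whose pass rate dropped by more than `tolerance`."""
    base = pass_rates(baseline)
    cand = pass_rates(candidate)
    drops: Dict[str, float] = {}
    for workflow, base_rate in base.items():
        if workflow not in cand:
            continue  # no candidate data; cannot judge this workflow
        drop = base_rate - cand[workflow]
        if drop > tolerance:
            drops[workflow] = drop
    return drops
```

Grouping by workflow matters because an aggregate score can stay flat while one workflow gets noticeably worse and another improves. A per-workflow report is also something a product team can read without knowing how the evaluation pipeline is built.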
This is where my interests converge. Data engineering gives AI products their memory, evidence, and guardrails. Product engineering gives the infrastructure a reason to exist. The best work happens when those two sides inform each other from the start.
The Kind of Systems I Want to Build
The systems I want to build make quality visible. They help teams understand why an AI product behaved a certain way, whether data was fresh and permission-safe, and how a change affected real workflows.
They also make responsibility visible. Owners, contracts, retention rules, evaluation rubrics, and release decisions should not be hidden in scattered documents. They should be part of the operating surface of the system.
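As a rough illustration of what "part of the operating surface" could mean, the sketch below expresses an owner and a retention rule as data that code can check, not as a line in a document. `DatasetContract`, its fields, and `is_expired` are hypothetical names, and a real contract would likely carry more (schema, evaluation rubric, release approvals).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DatasetContract:
    """Ownership and retention metadata for one dataset (hypothetical shape)."""
    name: str
    owner: str                    # team or person accountable for the data
    retention_days: int           # how long records may be kept
    contains_personal_data: bool  # flags data that needs stricter handling


def is_expired(contract: DatasetContract, created_at: datetime, now: datetime) -> bool:
    """Return True if a record is older than the contract's retention window."""
    return now - created_at > timedelta(days=contract.retention_days)


# Example: a feedback dataset owned by a hypothetical team, kept for 30 days.
feedback = DatasetContract(
    name="assistant_feedback",
    owner="search-quality-team",
    retention_days=30,
    contains_personal_data=True,
)
created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(feedback, created, datetime(2024, 3, 1, tzinfo=timezone.utc)))
```

Once the rule lives in a structure like this, a scheduled job can enforce it and a dashboard can show who owns what, so neither depends on someone remembering where the policy was written down.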
I do not think this work is only about sophisticated models. It is about the discipline around them: the data products, interfaces, feedback loops, and reliability practices that let teams improve without losing trust.
That is why I keep coming back to the same themes. ML infrastructure, privacy engineering, reliable data systems, and LLM product engineering are not separate interests for me. They are different angles on the same problem: building intelligent systems that are useful, accountable, and dependable.