Where Data Engineering Meets AI Product Engineering

My current view on the overlap between reliable data platforms and useful AI products.

Lessons From Reliable Data Systems

A synthesis of what years of pipeline, privacy, and ML infrastructure work have taught me.

Data Engineering for Agentic Products

How tool-using AI products raise the bar for logs, permissions, and evaluation data.

Building Trustworthy Feature Stores

What feature infrastructure needs beyond a place to store feature values.

The Hard Part of Reliable AI

Why reliable AI products depend on ordinary engineering discipline around changing systems.

Measuring Retrieval Quality

A deeper look at evaluating the data layer behind LLM product behavior.

From Pipelines to Platforms

How my perspective shifted from building individual workflows to enabling teams.

AI Systems Need Data Contracts Too

Why prompts, retrieval context, and evaluation data need explicit interfaces.

Designing for Data Minimization

How privacy constraints can lead to cleaner, more purposeful data architecture.

Reliable Data Systems Have Memory

A more mature view of run history, decisions, and institutional knowledge.

Privacy in AI Systems

Why AI products make data minimization and permission boundaries even more important.

Evaluating LLM Products

A practical view of evaluation data, feedback loops, and product quality.

Retrieval Is Data Engineering

How LLM applications made familiar data quality problems show up in a new interface.

Data Platforms Need Product Thinking

Why internal platforms should be designed around workflows, not just capabilities.

Cost Is an Observability Signal

How cloud spend helped me spot inefficiencies in data systems before they became incidents.

Deleting Data Is Engineering

Why retention, deletion, and lifecycle controls deserve real design attention.

Serving Features Reliably

A note on the gap between offline feature logic and production serving expectations.

ML Infrastructure Is a Feedback System

Why I started thinking beyond training pipelines and toward learning loops.

Testing Transformations Without Pretending Data Is Code

A more practical view of testing analytics and pipeline logic.

Lineage as a Debugging Tool

How lineage became more useful to me when I stopped treating it as a catalog feature.

Privacy Is a Systems Property

Why privacy engineering belongs in architecture decisions, not only review checklists.

Feature Pipelines Are Data Products

How ML feature work changed the way I thought about ownership and interfaces.

Observability for Humans

A reflection on alerts, dashboards, and making data systems easier to operate.

Data Contracts Before They Were Fashionable

How producer-consumer expectations became a recurring theme in my data engineering work.

Idempotency as Calm

Why rerunnable jobs made data operations feel less dramatic.

Reliability During Uncertainty

How changing business conditions made me think harder about freshness, drift, and operational signals.

Documenting the Weird Parts

Why the most useful data documentation often explains exceptions, not happy paths.

Partitions and Practical Performance

A note on learning to make warehouse performance understandable instead of mysterious.

Backfills Taught Me Humility

What historical data corrections revealed about assumptions hidden inside pipelines.

Data Quality Is Product Quality

How I began connecting backend data issues to the product experiences people actually see.

Batch Jobs Need Owners

A reflection on why scheduled jobs become fragile when nobody clearly owns their behavior.

Debugging With Row Counts

A simple reliability habit that helped me understand data movement before reaching for bigger tools.

Schemas Are Promises

Why I started treating schemas as contracts between teams, not just database metadata.

Learning From My First Broken Pipeline

Early notes on why a data pipeline that runs is not always a data pipeline that works.