Blog
Writing and notes.
Thoughts on data engineering, ML infrastructure, privacy-aware systems, and applied AI product engineering.
Where Data Engineering Meets AI Product Engineering
My current view on the overlap between reliable data platforms and useful AI products.
Lessons From Reliable Data Systems
A synthesis of what years of pipeline, privacy, and ML infrastructure work have taught me.
Data Engineering for Agentic Products
How tool-using AI products raise the bar for logs, permissions, and evaluation data.
Building Trustworthy Feature Stores
What feature infrastructure needs beyond a place to store feature values.
The Hard Part of Reliable AI
Why reliable AI products depend on ordinary engineering discipline around changing systems.
Measuring Retrieval Quality
A deeper look at evaluating the data layer behind LLM product behavior.
From Pipelines to Platforms
How my perspective shifted from building individual workflows to enabling teams.
AI Systems Need Data Contracts Too
Why prompts, retrieval context, and evaluation data need explicit interfaces.
Designing for Data Minimization
How privacy constraints can lead to cleaner, more purposeful data architecture.
Reliable Data Systems Have Memory
A more mature view of run history, decisions, and institutional knowledge.
Privacy in AI Systems
Why AI products make data minimization and permission boundaries even more important.
Evaluating LLM Products
A practical view of evaluation data, feedback loops, and product quality.
Retrieval Is Data Engineering
How LLM applications surface familiar data quality problems through a new interface.
Data Platforms Need Product Thinking
Why internal platforms should be designed around workflows, not just capabilities.
Cost Is an Observability Signal
How cloud spend helped me see inefficient data systems before they became incidents.
Deleting Data Is Engineering
Why retention, deletion, and lifecycle controls deserve real design attention.
Serving Features Reliably
A note on the gap between offline feature logic and production serving expectations.
ML Infrastructure Is a Feedback System
Why I started thinking beyond training pipelines and toward learning loops.
Testing Transformations Without Pretending Data Is Code
A more practical view of testing analytics and pipeline logic.
Lineage as a Debugging Tool
How lineage became more useful to me when I stopped treating it as a catalog feature.
Privacy Is a Systems Property
Why privacy engineering belongs in architecture decisions, not only review checklists.
Feature Pipelines Are Data Products
How ML feature work changed the way I thought about ownership and interfaces.
Observability for Humans
A reflection on alerts, dashboards, and making data systems easier to operate.
Data Contracts Before They Were Fashionable
How producer-consumer expectations became a recurring theme in my data engineering work.
Idempotency as Calm
Why rerunnable jobs made data operations feel less dramatic.
Reliability During Uncertainty
How changing business conditions made me think harder about freshness, drift, and operational signals.
Documenting the Weird Parts
Why the most useful data documentation often explains exceptions, not happy paths.
Partitions and Practical Performance
A note on learning to make warehouse performance understandable instead of mysterious.
Backfills Taught Me Humility
What historical data corrections revealed about assumptions hidden inside pipelines.
Data Quality Is Product Quality
How I began connecting backend data issues to the product experiences people actually see.
Batch Jobs Need Owners
A reflection on why scheduled jobs become fragile when nobody clearly owns their behavior.
Debugging With Row Counts
A simple reliability habit that helped me understand data movement before reaching for heavier tooling.
Schemas Are Promises
Why I started treating schemas as contracts between teams, not just database metadata.
Learning From My First Broken Pipeline
Early notes on why a data pipeline that runs is not always a data pipeline that works.