Aug 14, 2018

Debugging With Row Counts

One of the most useful early habits I built was logging row counts at every meaningful step. It sounds too simple to be worth writing about, but it changed how I debugged.

Before that, a failed report could send me searching through code, source files, and warehouse tables without a clear starting point. Row counts gave me a map. I could see where data dropped, duplicated, or stopped arriving.

Later, I learned more sophisticated approaches: data quality frameworks, lineage, freshness monitors, and anomaly detection. But the instinct was the same. Make the pipeline explain itself.

Good observability starts with humble questions. How much data came in? How much came out? Does that match what we expected?

Why Simple Signals Worked

Row counts worked because they were easy to explain. I did not need a specialized tool or a deep theory of observability to show that one step received 4 million rows and the next step produced 40,000. That kind of signal immediately narrows the search space.
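
A minimal sketch of the habit, assuming each stage hands back something with a length (a list here). The names log_count and biggest_drop are hypothetical helpers for illustration, not part of any library.

```python
import logging

logger = logging.getLogger("pipeline.row_counts")


def log_count(stage, rows, counts):
    """Log how many rows a stage produced and remember it in `counts`."""
    n = len(rows)
    counts.append((stage, n))
    logger.info("stage=%s rows=%d", stage, n)
    return rows  # pass the data through so calls can wrap each stage


def biggest_drop(counts):
    """Return (from_stage, to_stage, rows_lost) for the largest drop
    between consecutive stages, or None if no stage lost rows."""
    worst = None
    for (prev_stage, prev_n), (stage, n) in zip(counts, counts[1:]):
        lost = prev_n - n
        if lost > 0 and (worst is None or lost > worst[2]):
            worst = (prev_stage, stage, lost)
    return worst
```

Wrapping each stage's output in log_count leaves a trail in the logs, and biggest_drop points at the pair of stages to look at first.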

The pattern also forced me to think in stages. Instead of treating a pipeline as one black box, I began seeing it as a series of contracts: extract, parse, filter, join, aggregate, publish. Each stage should have an expected relationship with the previous stage. Sometimes the relationship is exact: a parse step might be expected to emit one row for every row it reads. Sometimes it is directional: a filter should never emit more rows than it receives. Either way, it should be named.
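
One way to name those relationships is to write them down as data. The sketch below is an assumption about how that could look: the relation names ("equal", "at_most", "at_least") and the check_contracts function are invented for illustration.

```python
import operator

# Each relation compares a downstream count to its upstream count.
RELATIONS = {
    "equal": operator.eq,     # exact: downstream == upstream
    "at_most": operator.le,   # directional: downstream <= upstream
    "at_least": operator.ge,  # directional: downstream >= upstream
}


def check_contracts(counts, contracts):
    """counts: dict of stage name -> row count.
    contracts: list of (upstream, downstream, relation) tuples.
    Returns a list of human-readable violations (empty if all hold)."""
    violations = []
    for upstream, downstream, relation in contracts:
        up, down = counts[upstream], counts[downstream]
        if not RELATIONS[relation](down, up):
            violations.append(
                f"{downstream} ({down}) should be {relation} {upstream} ({up})"
            )
    return violations
```

For example, a contract list of ("extract", "parse", "equal") and ("parse", "filter", "at_most") says that parsing must not lose rows and filtering must not create them.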

Over time, I enriched those checks with distributions, freshness windows, duplicate detection, and reconciliation against source systems. But the row-count habit remained useful because it was the first version of a larger idea: a pipeline should produce evidence while it runs.

That evidence matters most when something goes wrong. Debugging is expensive when every question requires a new query. It is much cheaper when the system has already left a trail of clues for the engineer who has to pick it up.

From Counts to Expectations

The next step after logging row counts was learning to attach expectations to them. A count by itself is a fact. A count compared to history, source totals, or business rules becomes a signal. That difference matters.
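
As a rough sketch of turning a fact into a signal, the function below compares today's count to the median of recent counts. The function name, the choice of median as the baseline, and the 20 percent default tolerance are all assumptions made for illustration.

```python
from statistics import median


def compare_to_history(today, history, tolerance=0.2):
    """Compare today's row count to the median of past counts.

    Returns (relative_change, within_tolerance), where relative_change
    is negative for a drop and positive for a rise.
    """
    baseline = median(history)
    if baseline == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    change = (today - baseline) / baseline
    return change, abs(change) <= tolerance
```

The same shape works for source totals or business rules: swap the baseline, keep the comparison.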

For example, a 20 percent drop may be normal on a holiday, suspicious on a weekday, and expected after a product launch changes event behavior. The check needs context to avoid becoming noise. This is where simple monitoring begins to require product understanding.
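
A small sketch of what that context could look like in code. The holiday calendar and the per-day-type thresholds below are made-up values, and the helper names are hypothetical. A product launch would need its own override, which this sketch does not model.

```python
from datetime import date

# Hypothetical calendar and thresholds; real values come from the business.
HOLIDAYS = {date(2018, 12, 25)}
MAX_DROP = {"holiday": 0.40, "weekend": 0.30, "weekday": 0.10}


def day_type(day):
    """Classify a date as 'holiday', 'weekend', or 'weekday'."""
    if day in HOLIDAYS:
        return "holiday"
    return "weekend" if day.weekday() >= 5 else "weekday"


def drop_is_suspicious(day, baseline, observed):
    """True if the drop from baseline exceeds what is normal for this
    kind of day. Assumes baseline > 0."""
    drop = (baseline - observed) / baseline
    return drop > MAX_DROP[day_type(day)]
```

With these thresholds, the same 20 percent drop is flagged on an ordinary weekday and ignored on the listed holiday.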

I also found that row counts are a good teaching tool. They make pipeline behavior visible to people who do not want to read transformation code. When an analyst or product manager can see where data changed shape, debugging becomes more collaborative.

Even now, with better tools available, I do not dismiss simple checks. They are often the first layer of observability. If a system cannot answer basic volume questions, it usually cannot answer deeper reliability questions either.