Nov 7, 2022

Cost Is an Observability Signal

I used to think about cost mostly as a budget concern. Then I started treating it as an observability signal.

Unexpected cost often points to something interesting: a query scanning too much history, a join multiplying rows, a backfill running repeatedly, a table growing without retention, or an orchestration bug retrying work.

Cost data can reveal system behavior that normal success/failure metrics miss. A green pipeline that costs twice as much this week is still telling you something.

The goal is not to make engineers afraid of using compute. The goal is to make cost visible enough that teams can connect engineering choices to operational consequences.

Reading the Cost Graph

Cost spikes are often symptoms of design drift. A table grows beyond its original purpose. A daily query starts scanning years of history. A backfill gets scheduled as if it were incremental. A retry loop hides behind green task status but burns compute all afternoon.

I began treating cost reviews like system reviews. Which jobs dominate spend? Is that spend tied to business value? Are expensive queries expensive for good reasons, or because the data model makes the common path inefficient?

This also made platform conversations more grounded. Instead of saying “optimize your queries” in the abstract, we could point to specific patterns: missing partition filters, avoidable cross joins, duplicated intermediate tables, or transformations that should be materialized once instead of recomputed by every consumer.

Cost visibility should not shame teams. It should help them see the shape of the system they are building. When cost becomes observable, performance and reliability conversations get much more concrete.

Cost as Feedback, Not Punishment

The tone around cost matters. If cost review feels punitive, engineers hide or avoid the conversation. If it feels like feedback, it becomes another way to understand system behavior.

I like cost dashboards that map spend to workflows, not just teams. Which product metrics are expensive to compute? Which ML features cost the most to refresh? Which exploratory workloads became production dependencies without redesign?

This helps separate valuable cost from waste. Some expensive jobs are worth it because they support important decisions. Others are expensive because the system evolved accidentally. The goal is to know the difference.

Cost awareness also improves architecture. It encourages clearer data grains, better materialization choices, retention discipline, and more thoughtful backfills. In that sense, cost is not just a finance signal. It is an engineering quality signal.