The Telemetry Tax: When Watching Your Systems Costs More Than Running Them

Somewhere in your cloud bill is a line item that grew faster than the thing it measures. It is not compute. It is not storage. It is the cost of watching compute and storage. The clearest public signal of how far this has gone surfaced on a Datadog earnings call, where executives referenced a single customer running a roughly $65 million annual observability bill. Reporting from The Pragmatic Engineer, citing multiple engineering sources, identified that customer as Coinbase, with the bill accumulated in 2021 and the company described internally as an "early optimizer." Read that again: $65 million per year, and the vendor called the situation an early-stage optimization problem. The New Stack walked through why every engineering leader should treat that number as a warning, not a curiosity.

The number is extreme. The pattern is not. Across the mid-market and enterprise, monitoring, logging, tracing, and incident tooling have quietly crossed a line: in a growing share of organizations, the telemetry stack now costs more than the systems it observes. Practitioners have a name for it. They call it the observability tax, and in one documented mid-size SaaS profile a fleet of 150 hosts on AWS spent close to 70% of its infrastructure budget just to observe that infrastructure.

This is not a tooling complaint. It is a finance problem hiding inside an engineering invoice. Most CFOs modeled cloud cost as a function of usage. Almost none modeled the cost of measuring usage, which is now compounding on its own curve. You are paying a premium to watch a system you could have run for less.

How the Tax Got So Big

Three forces compound at once, and together they explain why observability bills behave less like a utility and more like a runaway meter.

The pricing model multiplies, it does not add. Datadog's published list prices charge $15 per host per month for Infrastructure Pro and $23 per host on Enterprise, with APM stacking another $31 to $40 per host and log management billed at $0.10 per ingested GB plus $1.70 per million indexed events. Each product is priced per host, per GB, or per million events, so a single well-instrumented machine is billed several times across several SKUs. SigNoz's breakdown of the pricing caveats shows how a per-host headline rate becomes a multi-line invoice once APM, logs, synthetics, and custom metrics are switched on.
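Here is a minimal sketch of how one host gets billed several times over. The rates are the list prices quoted above, with APM taken at the top of the $31 to $40 range. The usage figures (30 GB of logs ingested and 10 million events indexed per host per month) are assumptions for illustration. Real invoices also add custom metrics, synthetics, and retention tiers, so this understates the total.

```python
# Sketch: one host billed across several SKUs at the list prices quoted above.
# The usage figures at the bottom are assumptions for illustration only.

INFRA_ENTERPRISE_PER_HOST = 23.00  # Infrastructure Enterprise, per host per month
APM_PER_HOST = 40.00               # top of the quoted $31 to $40 APM range
LOG_INGEST_PER_GB = 0.10           # per ingested GB
LOG_INDEX_PER_MILLION = 1.70       # per million indexed log events


def monthly_line_items(ingested_gb: float, indexed_million_events: float) -> dict[str, float]:
    """Return each SKU's monthly charge for a single instrumented host."""
    return {
        "infrastructure": INFRA_ENTERPRISE_PER_HOST,
        "apm": APM_PER_HOST,
        "log_ingest": ingested_gb * LOG_INGEST_PER_GB,
        "log_index": indexed_million_events * LOG_INDEX_PER_MILLION,
    }


# Assumed usage: 30 GB ingested and 10 million events indexed per host per month.
items = monthly_line_items(ingested_gb=30, indexed_million_events=10)
for sku, cost in items.items():
    print(f"{sku:<15} ${cost:>7.2f}")
per_host = sum(items.values())
print(f"{'total per host':<15} ${per_host:>7.2f}")
print(f"{'x 150 hosts':<15} ${per_host * 150:,.2f}")
```

Under these assumptions the $23 headline becomes about $83 per host across four SKUs. Log indexing alone outweighs the infrastructure fee.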

Telemetry volume is growing faster than the business. Cribl, which builds pipelines specifically to thin this firehose, puts the growth rate plainly: telemetry data is expanding at roughly 29% a year. Compounded, that doubles your volume, and any bill priced on volume, roughly every 33 months. The shift to cloud-native architecture poured accelerant on this. Chronosphere reports that container-based environments emit between 10 and 100 times more observability data than the VM-based systems they replaced, and that 69% of companies are now worried about the rate of their own data growth.
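The doubling period follows directly from the growth rate. Here is a small sketch that computes it, along with the inverse: the annual rate a given doubling period would imply. It assumes a constant growth rate, which real telemetry volume only approximates.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a quantity compounding at `annual_growth` (0.29 means 29%) to double."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_rate_for_doubling(months: float) -> float:
    """Annual growth rate that would double a quantity every `months` months."""
    return 2 ** (12 / months) - 1


years = doubling_time_years(0.29)
print(f"29% a year doubles volume in {years:.2f} years (about {years * 12:.0f} months)")
print(f"For comparison, doubling every 18 months would require {annual_rate_for_doubling(18):.0%} a year")
```

This prints 2.72 years (about 33 months) for 29% growth. An 18-month doubling period would require about 59% annual growth.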

High cardinality is the silent detonator. Cardinality is the number of unique tag-value combinations in your metrics. Each combination is its own time series, which is where bills go vertical without warning. Honeycomb's explainer gives the canonical example: a metric tracked across three regions and four environments is twelve time series. Add a single high-cardinality tag like user_id across a million users, and that becomes as many as twelve million time series. In a per-metric pricing model, that one tag is the difference between a rounding error and a budget emergency. Datadog's own custom-metrics billing documentation confirms the mechanism. Each unique combination of metric name and tag values counts as a separate custom metric. You get an allotment per host, then pay for every custom metric beyond it, counted across the whole account. One verbose microservice emitting 500-plus custom metrics quietly rewrites your invoice.
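A sketch of the arithmetic, assuming the worst case where every combination of tag values actually occurs. The allotment of 100 custom metrics per host, pooled across 150 hosts, is a placeholder; check the figure in your own plan. The sketch counts each time series as one custom metric, matching the name-plus-tags definition above.

```python
from math import prod


def series_count(tag_cardinalities: dict[str, int]) -> int:
    """Worst-case time series for one metric name: if every combination of tag
    values actually occurs, the per-tag cardinalities multiply."""
    return prod(tag_cardinalities.values())


def custom_metric_overage(total_custom_metrics: int, hosts: int, allotment_per_host: int) -> int:
    """Custom metrics beyond an account-wide pool of hosts * allotment_per_host."""
    return max(0, total_custom_metrics - hosts * allotment_per_host)


base = {"region": 3, "environment": 4}
with_user = {**base, "user_id": 1_000_000}

print(series_count(base))       # 12
print(series_count(with_user))  # 12000000

# Hypothetical allotment of 100 custom metrics per host, pooled over 150 hosts.
print(custom_metric_overage(series_count(with_user), hosts=150, allotment_per_host=100))  # 11985000
```

The pooled allotment absorbs 15,000 series. That is nowhere near enough to cover a single user_id tag.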

Where the Observability Dollar Actually Goes

The tax is not one charge. It is a stack of them, and most teams cannot tell you the split without an audit. Using the documented 150-host SaaS profile that totals $16,893 per month, or $202,716 per year, here is where the money lands.

Cost driver	Monthly (150-host profile)	What pushes it up
APM (per host)	$5,925	Tracing every service, retained traces, per-host stacking
Log management (ingest + index)	$4,500	Volume growth, verbose logs, long retention windows
Infrastructure metrics (per host)	$3,450	Host count, custom-metric overages, cardinality
Incident management	$2,050	Per-seat pricing, on-call team size
Status page, error tracking, uptime	$968	Per-event and per-seat add-ons
Total observability	$16,893	vs. ~$10,000 to $15,000 AWS compute

The bottom row is the whole argument. Compute for those same 150 hosts runs roughly $10,000 to $15,000 a month, so observability costs about 1.1 to 1.7 times as much. The watching costs more than the running. And two of the four largest line items are volume-driven: log management, and infrastructure metrics through custom-metric overages and cardinality. That means they grow on the 29% curve whether or not the business does.
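The table's arithmetic is easy to check. This sketch sums the monthly rows and compares the total with the quoted compute range. It also backs the APM and infrastructure rows out to per-host figures, which line up with the list prices quoted earlier.

```python
# Monthly figures (USD) from the 150-host profile table above.
observability = {
    "APM": 5_925,
    "Log management": 4_500,
    "Infrastructure metrics": 3_450,
    "Incident management": 2_050,
    "Status page, error tracking, uptime": 968,
}
compute_low, compute_high = 10_000, 15_000  # quoted monthly AWS compute range

monthly = sum(observability.values())
print(f"Observability: ${monthly:,} per month, ${monthly * 12:,} per year")
print(f"Multiple of compute: {monthly / compute_high:.2f}x to {monthly / compute_low:.2f}x")

# Per-host view: the APM and infrastructure rows divided across 150 hosts.
for driver in ("APM", "Infrastructure metrics"):
    print(f"{driver}: ${observability[driver] / 150:.2f} per host")
```

The rows sum to $16,893 a month ($202,716 a year), or 1.13x to 1.69x compute. APM works out to $39.50 per host and infrastructure to $23.00, consistent with the upper APM tier and Infrastructure Enterprise.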

It gets more pointed. Datadog's Cloud Cost Management product, the feature meant to help you control cloud spend, is itself billed at a percentage of the cloud spend it monitors. You pay a cut of your bill to a tool whose job is to lower your bill. That is the tax in its purest form.
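The structural point holds whatever the exact rate. A tool billed as a share of the spend it monitors only pays for itself once it cuts spend by more than roughly that share. Here is a minimal sketch. The 2% fee and the assumption that the fee applies to post-optimization spend are hypothetical; this post does not state Datadog's actual rate or billing basis.

```python
def net_monthly_benefit(cloud_spend: float, savings_rate: float, fee_rate: float) -> float:
    """Savings delivered minus the tool's fee, assuming (hypothetically) the fee is
    a flat percentage of the post-optimization cloud spend the tool still monitors."""
    remaining_spend = cloud_spend * (1 - savings_rate)
    return cloud_spend * savings_rate - fee_rate * remaining_spend


def break_even_savings_rate(fee_rate: float) -> float:
    """Savings rate where net benefit is zero: s = f * (1 - s), so s = f / (1 + f)."""
    return fee_rate / (1 + fee_rate)


FEE_RATE = 0.02  # hypothetical 2% fee, not a quoted Datadog price
spend = 100_000  # hypothetical monthly cloud spend, USD

print(f"Break-even savings rate: {break_even_savings_rate(FEE_RATE):.2%}")
for s in (0.01, 0.05):
    print(f"Cut spend {s:.0%}: net ${net_monthly_benefit(spend, s, FEE_RATE):,.0f} per month")
```

At a 2% fee, a tool that trims only 1% of spend leaves you worse off. The fee is also permanent, while the savings must be re-earned every month.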

Practitioners Have Already Voted

This is not a fringe worry pushed by vendors selling the fix. It is the dominant concern of the people who run these systems. In Grafana Labs' 2024 Observability Survey, 61% of respondents named cost or unexpected bills as one of their biggest concerns, ahead of every technical worry on the list. The same survey documents the sprawl feeding those bills: 70% of teams run four or more observability technologies, and respondents collectively named 62 different tools in use. Four overlapping bills, four data formats, four contracts, and no single throat to choke when the invoice spikes.

The macro picture confirms the trajectory. The observability market sat around $4.1 billion in 2024 and is forecast to compound at roughly 16% annually through 2034. The spend is not plateauing. It is accelerating, on top of a per-host model, on top of 29% annual data growth.
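To see what that forecast implies in dollars, here is a compound-growth sketch. It assumes the 16% rate holds every year, whereas a forecast CAGR only describes the average.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Compound `value` forward at a constant annual growth rate."""
    return value * (1 + annual_rate) ** years


BASE_2024 = 4.1  # market size in USD billions, as quoted above
CAGR = 0.16      # forecast compound annual growth rate, as quoted above

for year in (2024, 2029, 2034):
    print(year, f"${project(BASE_2024, CAGR, year - 2024):.1f}B")
```

At a constant 16%, the market roughly doubles by 2029 ($8.6B) and lands near $18.1B by 2034, about 4.4 times the 2024 figure. That result is arithmetic on the two quoted numbers, not a figure published in the forecast.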

The Reframe: You Are Renting Hindsight

Here is the contrarian read most budget reviews miss. Observability is sold as risk reduction, so it gets waved through procurement like insurance. But unlike insurance, the premium is indexed to your own verbosity. Every new log line, every high-cardinality tag a well-meaning engineer adds for a debugging session and never removes, every service that gets traced "just in case" raises the premium permanently. You are not buying a fixed policy. You are renting hindsight by the gigabyte, and the meter never resets.

The vendors understand this perfectly, which is why the growth metric they put front and center is not total customer count. It is the number of customers paying them enormous sums. As of December 31, 2025, Datadog reported 603 customers with annual recurring revenue of $1 million or more, up 31% year over year, and about 4,310 customers at $100,000 or more, up 19%. A year earlier those numbers were 462 and 3,610 respectively. Total 2025 revenue reached $3.43 billion. The business model is not landing customers. It is expanding them, and the expansion vector is the same data growth that shows up as your problem and their revenue.

The Escape Hatch the Vendors Don't Advertise: OpenTelemetry

Every number in this post traces back to one structural fact: the data behind the bill is collected by a vendor's proprietary agent. When your application is instrumented with a closed SaaS SDK, the data, the format, and the destination are all owned by the company sending you the invoice. You cannot shop the data elsewhere because the data only speaks one dialect. That is not a monitoring problem. It is a procurement problem, and it has a fix the incumbents would rather you not study too closely.

That fix is OpenTelemetry (OTel), the vendor-neutral instrumentation standard governed by the CNCF. In May 2026 the foundation graduated the project. It noted that OTel ranks second in project velocity among more than 240 projects in the cloud native ecosystem, behind only Kubernetes, with more than 12,000 contributors from more than 2,800 companies. This is not a fringe experiment. CNCF survey data ranked OpenTelemetry as the most widely used non-graduated project among end users at the time of measurement. Independent industry polling found 48.5% of IT organizations already using it and another 25.3% planning to adopt.

Here is why that matters to your budget. OTel splits instrumentation from the backend. Your code emits data in one open protocol (OTLP), and the collector decides where it lands. Switching analytics platforms becomes a configuration change, not a re-instrumentation project across every service you own. That single decoupling lets you route data to cheaper or self-hosted destinations: the Grafana stack (Loki for logs, Mimir for metrics, Tempo for traces), ClickHouse-backed tools, or any OTLP-compatible backend, without touching a line of application code.
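To make the decoupling concrete, here is a toy model in plain Python. It is not the OpenTelemetry SDK or Collector API. `Span`, `ToyCollector`, `counting_exporter`, and the backend names are hypothetical stand-ins. In a real deployment, the application's OTel SDK exports OTLP to a collector, and the collector's pipeline configuration decides which exporters receive the data. The sketch shows only the property that matters for the budget: the backend changes, and the instrumented function does not.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Span:
    """Stand-in for one vendor-neutral telemetry record (think: an OTLP span)."""
    service: str
    name: str
    duration_ms: float
    attributes: dict[str, str] = field(default_factory=dict)


Exporter = Callable[[list[Span]], None]


def counting_exporter(name: str, sink: dict[str, int]) -> Exporter:
    """Hypothetical backend that just counts the records it receives."""
    def export(batch: list[Span]) -> None:
        sink[name] = sink.get(name, 0) + len(batch)
    return export


class ToyCollector:
    """Routes neutral records to whichever exporters the config lists.
    Application code only ever calls `receive`; it never names a backend."""

    def __init__(self, exporters: dict[str, Exporter], pipeline: list[str]) -> None:
        self.exporters = exporters
        self.pipeline = pipeline  # the "configuration": which backends get data

    def receive(self, batch: list[Span]) -> None:
        for name in self.pipeline:
            self.exporters[name](batch)


def handle_checkout(emit: Callable[[list[Span]], None]) -> None:
    """Instrumented application code: identical no matter where data lands."""
    emit([Span("checkout", "POST /pay", 42.0, {"region": "us-east-1"})])


received: dict[str, int] = {}
exporters = {
    "vendor_saas": counting_exporter("vendor_saas", received),
    "self_hosted": counting_exporter("self_hosted", received),
}

# Before: everything goes to the SaaS vendor.
handle_checkout(ToyCollector(exporters, ["vendor_saas"]).receive)
# After: a config change reroutes to a self-hosted backend; handle_checkout is untouched.
handle_checkout(ToyCollector(exporters, ["self_hosted"]).receive)

print(received)  # {'vendor_saas': 1, 'self_hosted': 1}
```

Listing both backend names in the pipeline would dual-ship the same data. That is how a team can trial a cheaper destination before cutting over, without touching application code.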

The savings are measured, not theoretical. Per the Grafana OpenTelemetry report, 57% of organizations reduced costs after adopting OpenTelemetry, 84% of those saw at least a 10% decrease, and 46.4% reported over 20% ROI. Self-hosting analyses show the math tips toward owning your stack once telemetry volume crosses roughly 50 GB per day, the exact volume that makes per-host SaaS pricing punishing.

The operator read is blunt. If your telemetry is locked to one company's agent, you have no negotiating leverage and no exit. The vendor knows it, which is precisely why renewal quotes climb. OpenTelemetry hands both back: leverage at the negotiating table because you can credibly walk, and an exit ramp because the data was never trapped in the first place. Adopting OTel is the structural prerequisite. What you do with that freedom (what you collect, what you sample, what you drop, and what you refuse to pay to store) is the operational discipline the next section covers.

What Disciplined Operators Do Instead

The fix is not to stop measuring. It is to treat telemetry as a managed cost with an owner, a budget, and a value test, the same way you treat compute. The teams who have broken the curve do a recognizable set of things.

Put a pipeline between emission and ingestion. Telemetry pipelines filter, sample, and route before data hits a per-GB meter. Cribl reports that aggressive filtering and routing can cut the volume sent to expensive tools by 50% or more without losing the signal that matters.
Aggregate before you store. Abnormal Security had 10 to 12 million active metric series, on pace to climb toward 50 million. By aggregating, they cut 98% of those metrics and ran roughly 10 times more cost-effectively than the SaaS and self-managed alternatives. Most of that data was never queried.
Govern cardinality at the source. Treat a new high-cardinality tag as a budget decision, not an engineering convenience. The jump from twelve to twelve million time series is a code review away.
Tier retention ruthlessly. Hot, queryable storage for the recent window; cheap cold storage for compliance. Most teams index everything at the most expensive tier by default.
Run a value test. If a dashboard or metric has not been queried in 90 days, it is a cost with no consumer. Cut it.
Model telemetry in the FinOps budget. If observability is not a named line with a forecast and an owner, it will grow at 29% a year unsupervised. What gets measured gets managed, including the cost of measuring.
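Three of these practices fit in a few lines of code: filter and sample in a pipeline, aggregate away a high-cardinality label, and run a 90-day value test. Here is a minimal plain-Python sketch. Production pipelines (Cribl, or processors in an OpenTelemetry Collector) express the same ideas as configuration. The severity policy, the 10% INFO sampling ratio, the label names, and the metric names and dates are all hypothetical.

```python
import random
from collections import defaultdict
from datetime import date, timedelta
from typing import Optional

Labels = dict[str, str]
SeriesKey = tuple[tuple[str, str], ...]


def thin_logs(records: list[dict], info_keep_ratio: float, rng: random.Random) -> list[dict]:
    """Pipeline filter: keep every WARN/ERROR record, sample INFO, drop everything else."""
    kept = []
    for rec in records:
        level = rec["level"]
        if level in ("WARN", "ERROR"):
            kept.append(rec)
        elif level == "INFO" and rng.random() < info_keep_ratio:
            kept.append(rec)
    return kept


def aggregate_without(points: list[tuple[Labels, float]], drop: str) -> dict[SeriesKey, float]:
    """Aggregate before storing: sum counter values across the `drop` label."""
    totals: dict[SeriesKey, float] = defaultdict(float)
    for labels, value in points:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != drop))
        totals[key] += value
    return dict(totals)


def unqueried(last_queried: dict[str, Optional[date]], today: date, window_days: int = 90) -> list[str]:
    """Value test: metrics never queried, or not queried within `window_days`."""
    cutoff = today - timedelta(days=window_days)
    return sorted(name for name, seen in last_queried.items() if seen is None or seen < cutoff)


# 1. Filter and sample: hypothetical mix of 5 errors, 1,000 INFO lines, 4,000 DEBUG lines.
logs = [{"level": "ERROR"}] * 5 + [{"level": "INFO"}] * 1000 + [{"level": "DEBUG"}] * 4000
kept = thin_logs(logs, info_keep_ratio=0.1, rng=random.Random(7))
print(f"{len(logs)} log records -> {len(kept)} shipped")

# 2. Aggregate: 1,500 per-user series collapse to one series per region.
points = [({"region": "us", "user_id": str(u)}, 1.0) for u in range(1000)]
points += [({"region": "eu", "user_id": str(u)}, 1.0) for u in range(500)]
rolled = aggregate_without(points, drop="user_id")
print(f"{len(points)} series -> {len(rolled)} series")  # 1500 series -> 2 series

# 3. Value test with hypothetical metric names and query dates.
last_seen = {
    "checkout.latency": date(2025, 6, 29),
    "legacy.queue_depth": date(2025, 1, 2),
    "debug.cache_hits": None,
}
print(unqueried(last_seen, today=date(2025, 6, 30)))  # ['debug.cache_hits', 'legacy.queue_depth']
```

Two design notes. Because INFO is sampled at 10%, any count derived from shipped INFO lines needs to be scaled back up by 10 to estimate the true rate. And summing across the dropped label is only correct for counters; gauges need a different reducer, such as max or average.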

The Bill Nobody Sized

The uncomfortable truth is that observability became expensive precisely because it works. It is genuinely useful, so teams instrument everything, retain everything, and tag everything, and the per-host, per-GB, per-metric pricing model converts that diligence into a compounding liability. The vendors did not trick anyone. They built a meter and let customer behavior fill it, then booked the result as net revenue retention.

The discipline is the same discipline that should govern any infrastructure line: know what each dollar buys, kill what nothing consumes, and never let a cost grow on autopilot because it is labeled "safety." A $65 million bill does not arrive in one quarter. It arrives one unexamined log line at a time, and the only defense is an operator who reads the meter before the meter reads the budget.

Strategia-X is the senior-operator advisory that helps leaders model the costs nobody put in the spreadsheet, including the telemetry tax, at strategia-x.com.

-Rocky

#Observability #FinOps #Monitoring #Datadog #CloudCosts #Telemetry #SRE #ITOperations #StrategiaX #RockyStack #EngineeringDreams

