Fresher Is Not Faster - Why Cloud Costs Refuse to Show Up in Real Time!

Shannon
46 minutes ago
10 min read

As with previous blog posts, all code can be found here!

The Norns sat beside the Well of Urd weaving fate into the roots of Yggdrasil. The thread was already spun long before anyone knew how the story would end, and by the time the consequences arrived, they were simply discovering decisions that had already been made.

I found myself thinking about that recently (HUGE Norse mythology fan) after yet another conversation about cloud billing, because the parallels are stronger than you might expect (#nerdalert). One of the questions I hear most often when talking with customers is whether they can see cloud costs in real time. Sometimes the question is about Azure, sometimes it's AWS, and sometimes it's GCP. Occasionally someone will ask whether there is an API that is more current than the billing export, or whether polling every few minutes gets them closer to a live view of spend. It is an incredibly logical question, because we have spent years building operational platforms around near real-time telemetry. We expect CPU metrics to appear almost immediately, we expect logs to stream in seconds, and we expect distributed traces to tell us exactly what just happened. Naturally, people assume cloud costs should behave the exact same way.

The interesting part is that almost everyone starts by looking in the wrong place (me included). The assumption is usually that the billing export is slow because it's a file, while the API must somehow be plugged into a live meter running behind the scenes. If the export lands every night but the API answers in a few hundred milliseconds, surely the API must know something the export does not, RIGHT??

Wrong... unfortunately!

I have watched teams build dashboards around that assumption, shorten polling intervals from hours to minutes, and spend weeks trying to squeeze more freshness out of billing APIs because they believed the bottleneck lived in how they were retrieving their spend data. The reality is much less exciting, but it is also much more important to understand, because it fundamentally changes how you think about cloud cost visibility. The cost of what you deployed this morning is already being determined whether you can see it or not. Your virtual machines are running, your Kubernetes cluster is scheduling workloads, your storage accounts are accumulating data, and your applications are happily generating requests. Every one of those activities is creating usage that will eventually become a charge somewhere on your invoice.

The meter is not waiting for tonight's export to begin turning, and it is not waiting for you to query an API before deciding what you owe. What you are actually waiting on is something entirely different. You are waiting for the cloud provider to observe the usage, calculate the billable quantities, apply pricing, account for reservations or commitments, reconcile everything into a billing record, and finally publish that information somewhere other services can consume it. Until that pipeline finishes, there simply is not a definitive number for any API to return. Instead of asking whether an API is faster than an export, we should be asking whether either of them can possibly know something that has not been calculated yet. Once you frame the problem that way, the conversation changes almost immediately, because the bottleneck is the billing pipeline itself, which is owned entirely by the cloud provider. You can ask for the data every thirty seconds if you would like, but if the provider has not finished producing the billing statement of record yet, you are simply asking the same unanswered question more frequently.

Query Latency vs. Data Latency

This becomes much easier to understand if you separate query latency from data latency, because they are two completely different problems that often get lumped together. Query latency is how quickly an API responds after you ask a question. Data latency is how old the information was before you asked it. One is measured in milliseconds while the other is measured in hours, and optimizing one does absolutely nothing for the other. I like comparing this to checking your bank account on your phone, because the banking app opens almost instantly, yet the purchase you made fifteen minutes ago probably has not yet settled. The application is fast, but the transaction simply has not finished working its way through the banking system.

Refreshing the app every thirty seconds does not convince the bank to process your payment any sooner, and cloud billing behaves almost exactly the same way.
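A toy model in Python makes this concrete. The provider here is hypothetical: it publishes a snapshot every 8 hours, and each snapshot only covers usage that is at least 12 hours old. Both numbers are invented for illustration and are not any provider's real figures.

```python
from datetime import datetime, timedelta

# Invented pipeline parameters for illustration; not any provider's real figures.
PROCESSING_LAG = timedelta(hours=12)  # usage must be this old before it is billed
REFRESH_EVERY = timedelta(hours=8)    # the provider publishes a new snapshot on this cadence


def newest_visible_usage(asked_at: datetime, epoch: datetime) -> datetime:
    """Newest usage timestamp that a query made at `asked_at` can see.

    Snapshots publish at epoch, epoch + REFRESH_EVERY, ... and each one
    covers usage up to (publish time - PROCESSING_LAG).
    """
    last_publish = epoch + ((asked_at - epoch) // REFRESH_EVERY) * REFRESH_EVERY
    return last_publish - PROCESSING_LAG


def data_age_range(poll_every: timedelta, window: timedelta, epoch: datetime):
    """(freshest, stalest) age of the data a poller sees over `window`."""
    ages = []
    t = epoch
    while t < epoch + window:
        ages.append(t - newest_visible_usage(t, epoch))
        t += poll_every
    return min(ages), max(ages)


epoch = datetime(2026, 6, 1)
for poll in (timedelta(seconds=30), timedelta(minutes=5), timedelta(hours=1)):
    freshest, stalest = data_age_range(poll, timedelta(days=1), epoch)
    print(f"poll every {poll}: data is {freshest} to {stalest} old")
```

Every poll interval hits the same 12-hour floor. The thirty-second poller just sees the same snapshots more often.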

To their credit, the cloud providers are not hiding any of this. AWS documents that Cost Explorer refreshes on roughly a daily cadence and that current month costs generally do not appear until after the underlying usage has been processed. Azure explains that cost data for Enterprise Agreement and Microsoft Customer Agreement customers typically appears somewhere between eight and twenty-four hours after usage, while other billing models can take even longer. GCP is probably the most straightforward of the three, because the documentation simply tells you that detailed billing data is usually available within about a day and does not promise a specific latency for exports. The wording differs from provider to provider, but the architecture is remarkably consistent, because all three platforms have to perform the same fundamental work before they can publish authoritative billing data.

Ironically, GCP makes the lesson easier to understand than either AWS or Azure. When people talk about querying GCP costs, they are often querying the exported billing tables sitting in BigQuery. In other words, they are running SQL against a dataset that has already been produced by the billing pipeline. The query itself might finish in two seconds, but if the table is eighteen hours behind reality, you have simply received yesterday's truth very quickly. Once you recognize that pattern, it is much easier to see the same architecture hiding behind AWS Cost Explorer and Azure Cost Management.
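You can measure that gap directly. The standard BigQuery export records when each row was appended (export_time) alongside when the usage happened (usage_end_time). Here is a minimal sketch that assumes the standard export schema and uses a placeholder table name you would swap for your own:

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical table name; point this at your own billing export table.
TABLE = "my-billing-project.billing_export.gcp_billing_export_v1_0123AB_4567CD_89EF01"

# export_time is when a row was appended; usage_end_time is when the usage happened.
STALENESS_SQL = f"""SELECT
  MAX(export_time) AS last_export,
  MAX(usage_end_time) AS newest_usage
FROM `{TABLE}`
WHERE export_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY)"""


def hours_behind(ts: datetime, now: Optional[datetime] = None) -> float:
    """How many hours `ts` trails `now` (both timezone-aware; now defaults to UTC now)."""
    now = now or datetime.now(timezone.utc)
    return (now - ts).total_seconds() / 3600


def report_staleness() -> None:
    # Imported here so the helpers above work without the BigQuery client installed.
    from google.cloud import bigquery

    row = next(iter(bigquery.Client().query(STALENESS_SQL).result()))
    if row.newest_usage is None:
        print("No rows exported in the last 3 days.")
        return
    print(f"last export:  {row.last_export}  ({hours_behind(row.last_export):.1f}h ago)")
    print(f"newest usage: {row.newest_usage}  ({hours_behind(row.newest_usage):.1f}h ago)")
```

Calling report_staleness() runs one small query. The hours it prints are your data latency, no matter how quickly the query itself came back.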

Where the Time Actually Goes

Whenever I explain this on a whiteboard, I usually draw the pipeline, because pictures tend to settle the argument much faster than documentation. Most people imagine the delay sitting somewhere between the API and their dashboard, but that is almost never where the time is actually being spent. The overwhelming majority of the latency lives further upstream while the provider is still metering usage, rating that usage against thousands of SKUs, applying discounts and commitments, reconciling billing records, and finally publishing the results into the billing platform. Polling more frequently only optimizes the smallest and fastest part of the pipeline, because you are shaving seconds off a process that is fundamentally waiting on work measured in hours.

  spend happens
      │
      ▼ the floor: the provider has to meter it, rate it, and bill it
      │ AWS up to ~24h | Azure 8 to 72h | GCP within a day, sometimes longer
      │
  cost data lands in the billing store
      │
      ▼  refresh cadence, minutes to hours
      │  AWS CE ~3x/day | Azure ~6x/day | GCP a few times/day, no SLA
      │
  CUR file / Cost API / BigQuery export (same source, same truth)
      │
      ▼  your poll interval, seconds
      │
  your dashboard
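Adding up the stages shows why polling is a rounding error. The hours below are ballpark readings of the diagram, combined as a simple additive worst case. They are illustrative and not provider guarantees.

```python
# Ballpark stage latencies in hours, read loosely off the diagram above.
# Illustrative only; none of these are provider guarantees.
STAGES_HOURS = {
    "meter, rate, bill (provider)": 24.0,
    "refresh cadence (provider)": 8.0,  # roughly three refreshes a day
    "your poll interval": 30 / 3600,    # thirty seconds
}


def worst_case_staleness(stages: dict) -> float:
    """Simple additive upper bound: each stage at its slowest, back to back."""
    return sum(stages.values())


def share_of_delay(stages: dict, stage: str) -> float:
    """Fraction of the worst-case delay contributed by one stage."""
    return stages[stage] / worst_case_staleness(stages)


print(f"worst case: {worst_case_staleness(STAGES_HOURS):.2f}h")
for name in STAGES_HOURS:
    print(f"  {name:<30} {share_of_delay(STAGES_HOURS, name):8.3%}")
```

The poll interval accounts for well under a tenth of a percent of the total, so halving it buys you nothing you could ever see.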

It helps to hold three pictures in your head at the same time to understand your options. Your detailed bill is the monthly bank statement, which is authoritative, fully reconciled, and lands well after the fact for formal accounting. The cost API or BigQuery query is checking your balance in the banking app, which feels live but only reflects settled transactions. The only thing that is truly live is the running tab you keep in your own head while you are ordering the third round. Most of us pour our energy into making the banking app refresh faster, but the move that actually pays off is learning to keep the tab yourself.
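In code, the tab can be as small as a dictionary of counted usage multiplied by a price you already hold. This RunningTab class is a hypothetical sketch, and the meter names and prices in the example are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RunningTab:
    """A minimal in-process cost tab: usage you counted times a price you hold.

    `prices` maps a hypothetical meter name to a unit price in USD.
    """
    prices: dict
    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, meter: str, quantity: float) -> float:
        """Add `quantity` units of `meter` to the tab and return their cost."""
        cost = quantity * self.prices[meter]  # KeyError on an unpriced meter is deliberate
        self.totals[meter] += cost
        return cost

    @property
    def total(self) -> float:
        return sum(self.totals.values())


# Made-up prices, purely for illustration.
tab = RunningTab(prices={"vm.d2s.hour": 0.096, "egress.gb": 0.087})
tab.record("vm.d2s.hour", 10)
tab.record("egress.gb", 50)
print(f"running tab: ${tab.total:.2f}")
```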

Keeping the tab is more approachable than it sounds. It works for anything you can count yourself, like instance-hours off your metrics, request counts at the gateway, or bytes heading out the door. It works beautifully for LLM spend, because the token counts come right back inside the response. You are not even estimating usage there, you are just reading it off the receipt the model hands you and applying a price you already know. All three clouds publish those prices openly through the AWS Price List API, the Azure Retail Prices API (which does not even ask you to authenticate), and the GCP Cloud Billing Catalog API, so you can cache them, refresh them on a schedule, and build a little price book of your own that never goes stale for long.
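As one example of building that price book, here is a sketch against the Azure Retail Prices API. It uses only the standard library, since no credentials are involved. The response fields (Items, NextPageLink, type, productName, meterName, armRegionName, retailPrice, unitOfMeasure, tierMinimumUnits) follow that API's documented shape. The key layout and filtering choices are assumptions made for this sketch.

```python
import json
import urllib.parse
import urllib.request

RETAIL_PRICES_URL = "https://prices.azure.com/api/retail/prices"

# Example OData filter; swap in the SKU and region you actually run.
EXAMPLE_FILTER = (
    "serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' "
    "and armSkuName eq 'Standard_D2s_v5' and priceType eq 'Consumption'"
)


def fetch_items(odata_filter: str):
    """Yield price rows from the Azure Retail Prices API, following NextPageLink."""
    url = RETAIL_PRICES_URL + "?" + urllib.parse.urlencode({"$filter": odata_filter})
    while url:
        with urllib.request.urlopen(url, timeout=30) as resp:  # no credentials needed
            page = json.load(resp)
        yield from page.get("Items", [])
        url = page.get("NextPageLink")


def build_price_book(items):
    """Map (productName, meterName, armRegionName) -> (retailPrice, unitOfMeasure).

    Keeps pay-as-you-go rows only (type == "Consumption") and the first pricing
    tier, so reservation, dev/test, and volume-tier rows do not overwrite list price.
    """
    book = {}
    for item in items:
        if item.get("type") != "Consumption" or item.get("tierMinimumUnits", 0) > 0:
            continue
        key = (item["productName"], item["meterName"], item["armRegionName"])
        book[key] = (item["retailPrice"], item["unitOfMeasure"])
    return book
```

Because build_price_book(fetch_items(EXAMPLE_FILTER)) makes live HTTP calls, run it on a schedule and cache the result rather than calling it on every request.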

Path A: A Fresher Read of Slow Truth

Before we build the live stuff, it is worth having the lagged read too, because daily service-level trends are still incredibly useful. Here is the same month-to-date-by-service question asked once per cloud. On AWS, just remember that every Cost Explorer call costs a literal penny, so you should resist the urge to drop it in a tight loop.

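import boto3
from datetime import date

ce = boto3.client("ce", region_name="us-east-1")

today = date.today()
start = today.replace(day=1).isoformat()
end = today.isoformat()  # end date is exclusive, so today's partial data is left out

# On the 1st of the month Start == End, which Cost Explorer rejects.
if start == end:
    raise SystemExit("The month started today; nothing settled to report yet.")

request = {
    "TimePeriod": {"Start": start, "End": end},
    "Granularity": "DAILY",
    "Metrics": ["UnblendedCost"],
    "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
}

while True:
    resp = ce.get_cost_and_usage(**request)  # every page is a separately billed request
    for window in resp["ResultsByTime"]:
        day = window["TimePeriod"]["Start"]
        for group in window["Groups"]:
            svc = group["Keys"][0]
            amt = float(group["Metrics"]["UnblendedCost"]["Amount"])
            print(f"{day}  {svc:<32} ${amt:,.2f}")
    token = resp.get("NextPageToken")
    if not token:
        break
    request["NextPageToken"] = token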
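That penny adds up faster than the data changes. Here is a quick back-of-the-envelope helper, assuming the $0.01-per-paginated-request rate and a 30-day month:

```python
COST_PER_REQUEST_USD = 0.01  # AWS's published Cost Explorer API rate per paginated request


def monthly_polling_cost(poll_every_s: float, pages_per_poll: int = 1, days: int = 30) -> float:
    """Rough monthly charge for polling Cost Explorer on a fixed schedule."""
    polls = days * 24 * 3600 / poll_every_s
    return polls * pages_per_poll * COST_PER_REQUEST_USD


for label, seconds in [("every 30s", 30), ("every 5 min", 300), ("hourly", 3600), ("3x per day", 8 * 3600)]:
    print(f"{label:<12} ${monthly_polling_cost(seconds):>8,.2f} per month")
```

At one page per poll, polling every thirty seconds works out to roughly $864 a month to re-read data that only refreshes a few times a day. Three polls a day costs about $0.90.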

Azure asks the same question through its Query API. It uses the exact same idea and hits the exact same freshness ceiling as the others.

import requests
from azure.identity import DefaultAzureCredential

cred = DefaultAzureCredential()
token = cred.get_token("https://management.azure.com/.default").token

sub = "00000000-0000-0000-0000-000000000000"
url = (
    f"https://management.azure.com/subscriptions/{sub}"
    "/providers/Microsoft.CostManagement/query"
    "?api-version=2023-11-01"  # worth checking for a newer stable version
)

body = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "Daily",
        "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        "grouping": [{"type": "Dimension", "name": "ServiceName"}],
    },
}

headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

while url:
    resp = requests.post(url, headers=headers, json=body, timeout=30)
    resp.raise_for_status()
    props = resp.json()["properties"]
    # Rows are positional; pair each value with its column name.
    columns = [col["name"] for col in props["columns"]]
    for row in props["rows"]:
        print(dict(zip(columns, row)))
    url = props.get("nextLink")  # only set when more rows remain
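The Query API returns positional rows, one per service per day. If you only want month-to-date totals per service, a small helper can collapse them. It assumes the response names its columns Cost and ServiceName, which you should confirm against props["columns"] in your own output. The sample payload is invented.

```python
from collections import defaultdict


def totals_by_service(props: dict, cost_col: str = "Cost", group_col: str = "ServiceName") -> dict:
    """Collapse a Cost Management Query API 'properties' payload into per-service totals.

    Columns are looked up by name from props["columns"] rather than by position.
    The default column names are assumptions; check them against your own response.
    """
    names = [c["name"] for c in props["columns"]]
    cost_i, group_i = names.index(cost_col), names.index(group_col)
    totals = defaultdict(float)
    for row in props["rows"]:
        totals[row[group_i]] += float(row[cost_i])
    # Largest spenders first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Invented sample shaped like a daily, service-grouped response.
sample = {
    "columns": [
        {"name": "Cost", "type": "Number"},
        {"name": "UsageDate", "type": "Number"},
        {"name": "ServiceName", "type": "String"},
        {"name": "Currency", "type": "String"},
    ],
    "rows": [
        [12.5, 20260601, "Virtual Machines", "USD"],
        [3.25, 20260601, "Storage", "USD"],
        [11.0, 20260602, "Virtual Machines", "USD"],
    ],
}
print(totals_by_service(sample))
```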

Then we have GCP, where the API is literally just SQL over the BigQuery export. This is the cleanest demonstration of the whole idea, because the query returns instantly while the table underneath it is hours old.

from google.cloud import bigquery

client = bigquery.Client()
# Standard export table; the detailed one is gcp_billing_export_resource_v1_*
TABLE = "my-billing-project.billing_export.gcp_billing_export_v1_0123AB_4567CD_89EF01"

# The aliases deliberately avoid the table's own `service` and `cost` column
# names, which would make GROUP BY / ORDER BY ambiguous.
sql = f"""SELECT
  service.description AS service_name,
  ROUND(SUM(cost), 2) AS total_cost
FROM `{TABLE}`
WHERE usage_start_time >= TIMESTAMP(DATE_TRUNC(CURRENT_DATE(), MONTH))
GROUP BY service_name
ORDER BY total_cost DESC"""

for row in client.query(sql).result():
    print(f"{row.service_name:<32} ${row.total_cost:,.2f}")
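One wrinkle with the GCP export is that cost is recorded before credits. Credits (free tier, committed-use and sustained-use discounts, promotions) sit in a repeated credits field as negative amounts. Here is a sketch of a net-of-credits variant, using a placeholder table name:

```python
# Hypothetical table name; substitute your own standard export table.
TABLE = "my-billing-project.billing_export.gcp_billing_export_v1_0123AB_4567CD_89EF01"


def month_to_date_net_sql(table: str) -> str:
    """SQL for month-to-date cost per service, before and after credits.

    `credits` is a repeated field, so it is unnested per row and summed;
    IFNULL covers rows that carry no credits at all.
    """
    return f"""SELECT
  service.description AS service_name,
  ROUND(SUM(cost), 2) AS gross_cost,
  ROUND(SUM(cost) + SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) AS c), 0)), 2) AS net_cost
FROM `{table}`
WHERE usage_start_time >= TIMESTAMP(DATE_TRUNC(CURRENT_DATE(), MONTH))
GROUP BY service_name
ORDER BY net_cost DESC"""


def print_month_to_date_net(table: str = TABLE) -> None:
    # Imported lazily so the SQL builder works without the BigQuery client installed.
    from google.cloud import bigquery

    for row in bigquery.Client().query(month_to_date_net_sql(table)).result():
        print(f"{row.service_name:<32} gross ${row.gross_cost:,.2f}  net ${row.net_cost:,.2f}")
```

The net number is still yesterday's truth. It's just a more honest version of it.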

If you would rather not build a poller at all, each cloud hands you an event-driven hook that fires off this same lagged data: AWS Cost Anomaly Detection, Azure Budgets alerts, and GCP Budgets with Pub/Sub notifications. Lovely for catching a trend, still not a tripwire.
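On the GCP side, a budget notification arrives as a Pub/Sub message whose data is base64-encoded JSON. Below is a minimal parser. It assumes the documented fields: budgetDisplayName, costAmount, budgetAmount, currencyCode, costIntervalStart, and alertThresholdExceeded, which only appears once a threshold has been crossed. Keep in mind that costAmount comes from the same lagged billing data as everything else in this section.

```python
import base64
import json
from typing import Optional


def parse_budget_notification(data_b64: str) -> dict:
    """Decode the base64 `data` of a GCP budget Pub/Sub message into a dict."""
    return json.loads(base64.b64decode(data_b64).decode("utf-8"))


def threshold_crossed(msg: dict) -> Optional[float]:
    """The threshold fraction most recently crossed, or None if none has been."""
    return msg.get("alertThresholdExceeded")


def summarize(msg: dict) -> str:
    """One-line human summary of a budget notification."""
    pct = 100 * msg["costAmount"] / msg["budgetAmount"]
    return (f"{msg['budgetDisplayName']}: {msg['costAmount']:.2f} of "
            f"{msg['budgetAmount']:.2f} {msg['currencyCode']} ({pct:.0f}%) "
            f"since {msg['costIntervalStart']}")
```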

None of this has to be Python, either. The CLI, PowerShell, and the SDKs all hit the same endpoints and get back the same slow truth, so your choice changes the ergonomics, not the latency. The CLI is the multitool on your belt. It's perfect for interactive poking or a one-off check, but it becomes miserable the moment you need loops or state.

# AWS
aws ce get-cost-and-usage \
  --time-period Start=2026-06-01,End=2026-06-17 \
  --granularity DAILY --metrics UnblendedCost \
  --group-by Type=DIMENSION,Key=SERVICE --region us-east-1

# Azure (needs: az extension add --name costmanagement)
az costmanagement query --type ActualCost --timeframe MonthToDate \
  --scope "/subscriptions/<sub-id>" --dataset-granularity Daily \
  --dataset-aggregation '{"totalCost":{"name":"Cost","function":"Sum"}}' \
  --dataset-grouping name=ServiceName type=Dimension

# GCP has no cost-query CLI, so it's bq over the export
bq query --use_legacy_sql=false \
'SELECT service.description AS service_name, ROUND(SUM(cost),2) AS total_cost
 FROM `proj.dataset.gcp_billing_export_v1_XXXX`
 WHERE usage_start_time >= TIMESTAMP(DATE_TRUNC(CURRENT_DATE(), MONTH))
 GROUP BY service_name ORDER BY total_cost DESC'

PowerShell is the native admin language on the Azure side, and it gives you real objects back instead of making you scrape text. It drops straight into an Azure Automation runbook or a scheduled task, so you can push results into Log Analytics. AWS is well covered through the AWS.Tools modules. Google deprecated Cloud Tools for PowerShell, though, which leaves you with just the gcloud and bq CLIs there.

# Azure
Invoke-AzCostManagementQuery -Type ActualCost -Timeframe MonthToDate `
  -Scope "/subscriptions/<sub-id>" -DatasetGranularity Daily `
  -DatasetAggregation @{ totalCost = @{ name = 'Cost'; function = 'Sum' } } `
  -DatasetGrouping @(@{ type = 'Dimension'; name = 'ServiceName' })

# AWS
Get-CECostAndUsage -TimePeriod_Start 2026-06-01 -TimePeriod_End 2026-06-17 `
  -Granularity DAILY -Metric UnblendedCost `
  -GroupBy @{ Type = 'DIMENSION'; Key = 'SERVICE' } -Region us-east-1

Path B: The Live Tab

Now let us look at the live tab, which exists the exact moment a call returns. The number you care about only lives inside the running process that made the call, so the live tab has to be built in your application code rather than a shell script. Wrap whatever you are paying for, read the usage off the response, apply your cached price, and emit the metric immediately.

import time, json, logging

# USD per 1,000,000 tokens. Treat this as a cache of the vendor price
# list, not gospel. Refresh it from the published pricing on a schedule.
# The keys are shorthand: key the real book by the exact model ID you
# send, so the lookup in cost_of() matches the request.
PRICE_BOOK = {
    "claude-opus":   {"input": 15.00, "output": 75.00},
    "claude-sonnet": {"input":  3.00, "output": 15.00},
    "gpt-frontier":  {"input":  5.00, "output": 15.00},
    "gemini-pro":    {"input":  1.25, "output":  5.00},
}

def cost_of(model, input_tokens, output_tokens):
    rates = PRICE_BOOK[model]  # a KeyError here means an unpriced model, so fail loudly
    return (
        (input_tokens / 1_000_000) * rates["input"]
        + (output_tokens / 1_000_000) * rates["output"]
    )

def tracked_completion(client, model, messages):
    started = time.monotonic()
    # This call and the usage fields below follow the Anthropic SDK. OpenAI's
    # Chat Completions reports usage.prompt_tokens / usage.completion_tokens,
    # and Gemini reports usage_metadata.prompt_token_count /
    # candidates_token_count, so adapt these lines per provider. Anthropic
    # reports prompt-cache tokens in separate usage fields, not priced here.
    resp = client.messages.create(model=model, messages=messages, max_tokens=1024)

    usage = resp.usage
    cost = cost_of(model, usage.input_tokens, usage.output_tokens)

    # Your near real-time cost signal. It is alive the moment the call
    # returns. No billing pipeline stands between you and this number.
    emit_metric(
        "llm.cost.usd",
        cost,
        {
            "model": model,
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
            "latency_ms": round((time.monotonic() - started) * 1000),
        },
    )
    return resp

Wire the emitter into wherever your signals already live. With an OpenTelemetry exporter, the metric ships straight into Application Insights or Datadog, so the cost sits right next to the rest of your service telemetry.

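import json, logging
from opentelemetry import metrics

# Without a configured MeterProvider (set up by your exporter at startup),
# this meter is a no-op and the counter goes nowhere.
meter = metrics.get_meter("finops.tokens")
cost_counter = meter.create_counter("llm.cost.usd", unit="USD")

def emit_metric(name, value, attributes):
    # Keep metric attributes low-cardinality. Raw token counts and latency
    # would mint a new time series for nearly every call, so they go to the
    # log line instead of onto the counter.
    cost_counter.add(value, {"model": attributes.get("model", "unknown")})
    logging.info(json.dumps({"metric": name, "value": value, **attributes}))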

That is the whole trick to gaining real visibility. Now when some over-eager agent decides to recurse through your entire knowledge base at 2am, you get an alarm in seconds instead of a nasty surprise on next month's invoice. That is the difference between an actionable cost signal and a monthly cost autopsy.
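That alarm can be as simple as a sliding-window check on the per-call cost you are already emitting. The SpendTripwire class below is a hypothetical in-process sketch. In most setups you would put the same threshold on llm.cost.usd in your monitoring backend instead. The limit and window are yours to choose.

```python
import time
from collections import deque
from typing import Callable


class SpendTripwire:
    """Trip when spend inside a sliding time window exceeds a limit.

    `now` defaults to time.monotonic and can be swapped for a fake clock in tests.
    """

    def __init__(self, limit_usd: float, window_s: float,
                 now: Callable[[], float] = time.monotonic):
        self.limit_usd = limit_usd
        self.window_s = window_s
        self.now = now
        self.events = deque()  # (timestamp, cost) pairs, oldest first
        self.in_window = 0.0   # running sum of costs still inside the window

    def add(self, cost_usd: float) -> bool:
        """Record one call's cost; return True if the window total is over the limit."""
        t = self.now()
        self.events.append((t, cost_usd))
        self.in_window += cost_usd
        # Evict calls that have aged out of the window.
        while self.events and self.events[0][0] <= t - self.window_s:
            _, old_cost = self.events.popleft()
            self.in_window -= old_cost
        return self.in_window > self.limit_usd
```

In tracked_completion, you would call tripwire.add(cost) right after emit_metric and page someone (or stop the agent) when it returns True.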

That live signal does come with one honest catch, and it is worth saying out loud. Your number is a list-price estimate, and it will not match the final bill. The statement knows things your tab never will, including negotiated discounts, savings plans, committed-use discounts, and taxes. So lean on both, each for what it's good at. The estimate steers you in real time, and once a month you set it beside the detailed export, measure how far the drift has wandered, and carry that correction factor forward.
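The monthly correction is just a ratio. In this sketch the numbers are illustrative: the tab said $1,000 last month, and the detailed export said $820 after discounts and commitments, so the bill came in 18% below the tab.

```python
def correction_factor(billed_usd: float, estimated_usd: float) -> float:
    """Ratio of what the statement said to what your live tab said for the same period."""
    if estimated_usd <= 0:
        raise ValueError("need a positive estimate to compare against")
    return billed_usd / estimated_usd


def corrected(live_estimate_usd: float, factor: float) -> float:
    """Scale today's list-price estimate by last period's observed drift."""
    return live_estimate_usd * factor


# Illustrative numbers only.
factor = correction_factor(billed_usd=820.0, estimated_usd=1000.0)
print(f"factor {factor:.2f}; today's $150 estimate reads as ${corrected(150.0, factor):.2f}")
```

Taxes can push the factor above 1, so let the data tell you which way it drifts instead of assuming.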

And if you are hoping FOCUS rescues you here, it helps, just not with this. The FinOps Open Cost and Usage Specification (FOCUS) is doing real and overdue work by giving us one schema for that statement across every major provider. However, it standardizes the shape of the batch file rather than the speed of its delivery. It is a schema fix, not a latency fix, so the real-time problem stays exactly where it started, which is with you.
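You can see both halves of that in a few lines. The sketch below reads FOCUS-formatted CSV and sums EffectiveCost by ServiceName, which works the same whichever provider produced the file. It also reports the newest ChargePeriodEnd, which is exactly as far behind as before. The column names come from the FOCUS spec. The CSV layout is an assumption, since some exports arrive as Parquet.

```python
import csv
from collections import defaultdict
from datetime import datetime


def focus_summary(lines):
    """Sum EffectiveCost by ServiceName from FOCUS-formatted CSV text, and
    return the newest ChargePeriodEnd so the lag stays visible.

    `lines` is anything csv.DictReader accepts, such as an open file.
    """
    totals = defaultdict(float)
    newest = None
    for row in csv.DictReader(lines):
        totals[row["ServiceName"]] += float(row["EffectiveCost"] or 0)
        # FOCUS timestamps are UTC; fromisoformat only accepts "Z" from Python 3.11.
        end = datetime.fromisoformat(row["ChargePeriodEnd"].replace("Z", "+00:00"))
        if newest is None or end > newest:
            newest = end
    return dict(totals), newest
```

Pass it an open file, for example `with open("focus_export.csv", newline="", encoding="utf-8") as f: focus_summary(f)`, where the filename is hypothetical.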

So that's where I leave it when talking to customers. The provider owns the statement of record, and no amount of polling will pull that number forward, because it does not exist until the pipeline finishes writing it. What you own is the tab. Meter the usage yourself, apply a price you already hold, and emit the signal the moment the work happens. It will be a little wrong, and it will be live, and live-but-slightly-wrong is the number that catches a runaway before it hardens into a line item, instead of explaining it to you long after the money is gone.

The Norns finished weaving this morning's bill the moment you deployed your workloads, and they are not going to read you the weave any sooner. So pull up a seat beside your own well and learn to watch the threads as they are spun. That is as close to real time as cloud cost is ever going to get, and once you stop chasing the statement and start keeping the tab, it turns out to be close enough.

CLOUDY MUSINGS

If I can do it, you can, too!
