Usage, quota, and billing metrics

This document explains how usage reporting, token accounting, quota/billing errors, and session-level metrics are implemented in the extracted Copilot CLI app.js bundle. The user-visible command is /usage, but the runtime also emits assistant.usage, session.usage_info, model metrics, quota errors, and shutdown summaries.

The important implementation point is that the CLI tracks two related but different things:

  • live context-window usage, emitted as session.usage_info;
  • accumulated request/cost/session metrics, emitted through assistant.usage, /usage, and session.shutdown.

Because app.js is bundled/minified, symbol names are unstable. Line references below are searchable anchors in the extracted bundle and will shift across releases.

Source anchors

| Area | Anchor strings / minified symbols | Approx. app.js line | What it shows |
| --- | --- | --- | --- |
| Slash command | /usage, Display session usage metrics and statistics | 4643, 1300 | User command renders accumulated session usage. |
| Live context usage | session.usage_info, tokenLimit, currentTokens, messagesLength | 3062, 4361, 4481 | Ephemeral event reports current context-window size and token breakdown. |
| API usage event | assistant.usage, inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens, reasoningTokens | 4361, 4471 | Per-call usage events feed the metrics tracker. |
| Premium request metric | totalPremiumRequests | 1300, 4033, 4361, 4396 | Accumulated request-cost metric shown in /usage and shutdown telemetry. |
| AI Units metric | totalNanoAiu, AI Units | 3092, 4033, 6591 | Token-based billing/AI Units are tracked separately from premium request count. |
| API duration | totalApiDurationMs | 4033, 4361, 4396 | Session accumulates time spent in model API calls. |
| Per-model stats | modelMetrics, requests, usage, tokenDetails | 4033, 4361, 4396 | Shutdown and UI summarize request/token usage by model. |
| Code-change stats | codeChanges, linesAdded, linesRemoved, filesModified | 1300, 4033, 4361 | Usage display includes aggregate edit impact. |
| Quota/billing errors | billing_not_configured, session_quota_exceeded, quota_exceeded | 191 | 402 quota/billing responses have user-specific messages. |
| Rate-limit coupling | eligibleForAutoSwitch, rate_limit | 4361, 4487 | Rate-limit errors can trigger auto-mode switch behavior. |

Metric map

flowchart TD
Provider[Model provider response] --> UsageEvent[assistant.usage]
UsageEvent --> Tracker[Usage metrics tracker]
Truncator[Context/truncation calculator] --> UsageInfo[session.usage_info]
Tracker --> Slash["/usage output"]
Tracker --> Shutdown[session.shutdown metrics]
Provider --> Errors{error response}
Errors --> Quota[quota/billing messages]
Errors --> RateLimit[rate_limit and auto-mode switch]

/usage command

The /usage command renders a compact session summary. The implementation builds output with the following parts:

  • a Session Usage heading;
  • code changes, shown as +linesAdded -linesRemoved;
  • the request total, labeled as either Premium or AI Units depending on account/billing mode;
  • elapsed session duration;
  • token totals when available: input, output, cached, and reasoning.

The command reads from the session’s usageMetrics object rather than recomputing history from raw provider responses.
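
The rendering described above can be sketched roughly as follows. This is a minimal illustration, not the bundle's code: the interface name, the tokens sub-object, the tokenBilling flag, the line labels other than "Session Usage" and "Requests: <n> Premium", the duration format, and the division by 1e9 (taking "nano" at its SI meaning) are all assumptions.

```typescript
// Hypothetical shape of the accumulated metrics object. Only the names
// linesAdded, linesRemoved, totalPremiumRequests, totalNanoAiu, and
// sessionStartTime come from the bundle; the layout is assumed.
interface UsageMetricsSketch {
  codeChanges: { linesAdded: number; linesRemoved: number };
  totalPremiumRequests: number;
  totalNanoAiu?: number;
  sessionStartTime: number; // millisecond timestamp
  tokens?: { input: number; output: number; cached: number; reasoning: number };
}

// Format a millisecond span as "<m>m <s>s" (assumed display format).
function formatDuration(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  return `${Math.floor(totalSeconds / 60)}m ${totalSeconds % 60}s`;
}

// Build the summary lines from accumulated totals only; no provider
// responses are re-read here.
function renderUsageSummary(
  m: UsageMetricsSketch,
  now: number,
  tokenBilling: boolean, // assumed flag standing in for account/billing mode
): string[] {
  const lines: string[] = ["Session Usage"];
  lines.push(`Changes: +${m.codeChanges.linesAdded} -${m.codeChanges.linesRemoved}`);
  if (tokenBilling && m.totalNanoAiu !== undefined) {
    // Assumption: 1 AI Unit = 1e9 nano AI Units.
    lines.push(`AI Units: ${(m.totalNanoAiu / 1e9).toFixed(3)}`);
  } else {
    lines.push(`Requests: ${m.totalPremiumRequests} Premium`);
  }
  lines.push(`Duration: ${formatDuration(now - m.sessionStartTime)}`);
  if (m.tokens) {
    const t = m.tokens;
    lines.push(
      `Tokens: ${t.input} in, ${t.output} out, ${t.cached} cached, ${t.reasoning} reasoning`,
    );
  }
  return lines;
}
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the sketch deterministic and easy to check.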

Live context usage: session.usage_info

session.usage_info is an ephemeral event describing current context-window pressure. Its schema includes:

| Field | Meaning |
| --- | --- |
| tokenLimit | Maximum prompt/context tokens for the active model. |
| currentTokens | Current total tokens in the context window. |
| messagesLength | Number of messages currently in context. |
| systemTokens | Optional system/developer message token count. |
| conversationTokens | Optional user/assistant/tool conversation token count. |
| toolDefinitionsTokens | Optional token count for model-visible tool definitions. |
| isInitial | Whether the event corresponds to initial context calculation. |

This event is emitted by truncation/compaction-related code and by context calculation paths. It helps UI surfaces display context usage without waiting for a model API response.
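
As an illustration of how a UI surface might consume this event, the sketch below derives a fill percentage and remaining headroom from tokenLimit and currentTokens. The interface name, the contextPressure helper, the optionality of isInitial, and the rounding and clamping choices are assumptions, not something the bundle is known to do.

```typescript
// Field names follow the schema table; the interface itself is hypothetical.
interface UsageInfoSketch {
  tokenLimit: number;
  currentTokens: number;
  messagesLength: number;
  systemTokens?: number;
  conversationTokens?: number;
  toolDefinitionsTokens?: number;
  isInitial?: boolean; // optionality is an assumption
}

// Hypothetical helper: how full the context window is, as a whole percent
// clamped to 0..100, plus the number of tokens still available.
function contextPressure(info: UsageInfoSketch): { percentUsed: number; remaining: number } {
  if (info.tokenLimit <= 0) {
    // Guard against a missing or zero limit to avoid dividing by zero.
    return { percentUsed: 0, remaining: 0 };
  }
  const ratio = info.currentTokens / info.tokenLimit;
  const percentUsed = Math.max(0, Math.min(100, Math.round(ratio * 100)));
  const remaining = Math.max(0, info.tokenLimit - info.currentTokens);
  return { percentUsed, remaining };
}
```

For example, an event with tokenLimit 128000 and currentTokens 32000 yields 25 percent used and 96000 tokens remaining.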

API usage: assistant.usage

assistant.usage is also ephemeral, but it represents a model API call rather than context-window state. Its schema includes:

| Field | Meaning |
| --- | --- |
| model | Model identifier used for the call. |
| inputTokens | Input tokens consumed. |
| outputTokens | Output tokens produced. |
| cacheReadTokens | Prompt-cache read tokens. |
| cacheWriteTokens | Prompt-cache write tokens. |
| reasoningTokens | Reasoning-token count when provider reports it. |
| copilotUsage / tokenDetails | Token-based billing details when present. |
| totalNanoAiu | Nano AI-unit cost for token-based billing. |

A metrics tracker processes this event and updates per-model and session-level totals.
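
The accumulation step might look something like the sketch below. The class name, the ModelTotals shape, and the choice to treat missing counters as zero are assumptions; only the event field names are taken from the schema above.

```typescript
// Event fields follow the assistant.usage schema; optionality is assumed.
interface AssistantUsageSketch {
  model: string;
  inputTokens?: number;
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
  reasoningTokens?: number;
  totalNanoAiu?: number;
}

// Hypothetical per-model accumulator.
interface ModelTotals {
  requests: number;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  reasoningTokens: number;
  totalNanoAiu: number;
}

class UsageTrackerSketch {
  readonly perModel = new Map<string, ModelTotals>();
  totalNanoAiu = 0;

  // Fold one per-call event into the per-model and session-level totals.
  record(event: AssistantUsageSketch): void {
    let totals = this.perModel.get(event.model);
    if (!totals) {
      totals = {
        requests: 0,
        inputTokens: 0,
        outputTokens: 0,
        cacheReadTokens: 0,
        cacheWriteTokens: 0,
        reasoningTokens: 0,
        totalNanoAiu: 0,
      };
      this.perModel.set(event.model, totals);
    }
    totals.requests += 1;
    totals.inputTokens += event.inputTokens ?? 0;
    totals.outputTokens += event.outputTokens ?? 0;
    totals.cacheReadTokens += event.cacheReadTokens ?? 0;
    totals.cacheWriteTokens += event.cacheWriteTokens ?? 0;
    totals.reasoningTokens += event.reasoningTokens ?? 0;
    const nano = event.totalNanoAiu ?? 0;
    totals.totalNanoAiu += nano;
    this.totalNanoAiu += nano;
  }
}
```

Keeping running totals this way means later consumers can read a summary without replaying every event.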

Aggregated session metrics

The session metrics tracker accumulates:

| Metric family | Examples |
| --- | --- |
| Request cost | totalPremiumRequests, per-model requests.count, requests.cost. |
| AI Units | totalNanoAiu, per-model totalNanoAiu, token-based billing details. |
| Tokens | input, output, cache read/write, reasoning, token-type details. |
| Time | totalApiDurationMs, session duration since sessionStartTime. |
| Code changes | lines added, lines removed, modified file count. |
| Model state | currentModel, last-call input/output tokens, per-model breakdown. |

The /usage command shows a user-friendly subset. session.shutdown emits a fuller summary for telemetry and logs.

Shutdown event

The session.shutdown schema includes accumulated usage and code-change data:

| Field | Meaning |
| --- | --- |
| shutdownType | routine or error. |
| errorReason | Error string when shutdown is not routine. |
| totalPremiumRequests | Session-wide premium request cost. |
| totalNanoAiu | Optional accumulated AI Units cost. |
| tokenDetails | Optional token-type counts. |
| totalApiDurationMs | Total time spent in API calls. |
| sessionStartTime | Millisecond timestamp of session start. |
| codeChanges | Lines/files modified by the session. |
| modelMetrics | Per-model request and token breakdown. |
| currentTokens | Context tokens at shutdown, when known. |
| systemTokens, conversationTokens, toolDefinitionsTokens | Shutdown context-window breakdown. |

The telemetry projection expands per-model metrics into model-specific fields and restricted properties.
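
One way to picture that expansion is a flattening step that turns the nested per-model map into model-keyed scalar fields. The sketch below is purely illustrative: the function name, the dotted key format, and the inputTokens/outputTokens members under usage are assumptions, and it does not model the restricted-property split.

```typescript
// Hypothetical per-model entry; requests.count and requests.cost are named
// in the source, the members of usage are assumed.
interface ModelMetricSketch {
  requests: { count: number; cost: number };
  usage: { inputTokens: number; outputTokens: number };
}

// Flatten { [model]: metrics } into "<model>.<path>" -> number fields.
// The dotted key format is an illustration, not the bundle's naming.
function projectModelMetrics(
  modelMetrics: Record<string, ModelMetricSketch>,
): Record<string, number> {
  const fields: Record<string, number> = {};
  Object.entries(modelMetrics).forEach(([model, m]) => {
    fields[`${model}.requests.count`] = m.requests.count;
    fields[`${model}.requests.cost`] = m.requests.cost;
    fields[`${model}.usage.inputTokens`] = m.usage.inputTokens;
    fields[`${model}.usage.outputTokens`] = m.usage.outputTokens;
  });
  return fields;
}
```

Flat scalar fields are typically easier for telemetry backends to index than nested objects, which is a plausible reason for a projection step of this kind.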

Premium requests versus AI Units

The UI distinguishes two billing vocabulary paths:

| Display | Inferred mode |
| --- | --- |
| Requests: <n> Premium | Premium request accounting. |
| AI Units | Token-based billing / AI-unit accounting. |

The bundle tracks both totalPremiumRequests and totalNanoAiu. In some account modes, the UI labels cost as AI Units rather than Premium requests. The internal metrics still retain token counts and model-level details either way.

Token details and nano AIU

The helper that parses Copilot usage maps provider usage fields like:

  • token_details[].token_type;
  • token_details[].token_count;
  • token_details[].batch_size;
  • token_details[].cost_per_batch;
  • total_nano_aiu.

Those details become tokenDetails and totalNanoAiu in internal events/metrics. This allows the CLI to support both simple token counters and token-based billing metadata without changing the high-level event flow.
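
A minimal sketch of that mapping, assuming a straightforward snake_case to camelCase translation. The function and interface names and the camelCase member names inside each detail (tokenType, tokenCount, batchSize, costPerBatch) are assumptions; only the provider field names, tokenDetails, and totalNanoAiu come from the bundle.

```typescript
// Provider-side shape, using the field names listed above.
interface RawTokenDetail {
  token_type: string;
  token_count: number;
  batch_size: number;
  cost_per_batch: number;
}
interface RawCopilotUsage {
  token_details?: RawTokenDetail[];
  total_nano_aiu?: number;
}

// Internal shape; the member names here are hypothetical.
interface TokenDetailSketch {
  tokenType: string;
  tokenCount: number;
  batchSize: number;
  costPerBatch: number;
}
interface ParsedUsageSketch {
  tokenDetails?: TokenDetailSketch[];
  totalNanoAiu?: number;
}

// Translate provider usage into internal fields, leaving absent data absent
// so that simple token counters keep working without billing metadata.
function parseCopilotUsage(raw?: RawCopilotUsage): ParsedUsageSketch {
  const parsed: ParsedUsageSketch = {};
  if (!raw) {
    return parsed;
  }
  if (Array.isArray(raw.token_details)) {
    parsed.tokenDetails = raw.token_details.map((d) => ({
      tokenType: d.token_type,
      tokenCount: d.token_count,
      batchSize: d.batch_size,
      costPerBatch: d.cost_per_batch,
    }));
  }
  if (typeof raw.total_nano_aiu === "number") {
    parsed.totalNanoAiu = raw.total_nano_aiu;
  }
  return parsed;
}
```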

Quota and billing errors

The bundle contains explicit messages for 402 quota/billing cases:

| Error code | User-facing meaning |
| --- | --- |
| billing_not_configured | Multiple Copilot licenses are available and the user must configure which license to use. |
| session_quota_exceeded | The current session reached its spending limit; start a new session to continue. |
| quota_exceeded | Monthly included AI credits are exhausted; wait for reset or increase budget. |

These are distinct from HTTP 429 rate limits. Quota/billing errors are about entitlement or budget. Rate limits are about request pacing or provider capacity.
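
The distinction can be sketched as a small classifier keyed on HTTP status and error code. The function name, the result shape, and the message strings (paraphrased from the table above, not copied from the bundle) are assumptions.

```typescript
type QuotaErrorCode = "billing_not_configured" | "session_quota_exceeded" | "quota_exceeded";

// Paraphrased meanings from the table above; not the bundle's exact strings.
const QUOTA_MESSAGES: Record<QuotaErrorCode, string> = {
  billing_not_configured:
    "Multiple Copilot licenses are available; configure which license to use.",
  session_quota_exceeded:
    "This session reached its spending limit; start a new session to continue.",
  quota_exceeded:
    "Monthly included AI credits are exhausted; wait for reset or increase the budget.",
};

// Type guard: is this string one of the known quota/billing codes?
function isQuotaErrorCode(code: string | undefined): code is QuotaErrorCode {
  return code !== undefined && Object.prototype.hasOwnProperty.call(QUOTA_MESSAGES, code);
}

// Hypothetical result shape for illustration.
interface FailureSketch {
  kind: "quota_or_billing" | "rate_limit" | "other";
  message?: string;
}

// 402 means entitlement or budget; 429 means pacing or capacity.
function classifyFailure(status: number, code?: string): FailureSketch {
  if (status === 402) {
    return isQuotaErrorCode(code)
      ? { kind: "quota_or_billing", message: QUOTA_MESSAGES[code] }
      : { kind: "quota_or_billing" };
  }
  if (status === 429) {
    return { kind: "rate_limit" };
  }
  return { kind: "other" };
}
```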

Rate-limit relationship

session.error carries an errorType, one value of which is rate_limit, and its schema includes eligibleForAutoSwitch. When a non-auto model hits a model, user, or integration rate limit and a fallback model path exists, the runtime can either emit an auto_mode_switch.requested event or switch silently when continuation settings allow it.

This connects usage/quota tracking to resilience behavior:

  • quota/billing failures usually require user/account action;
  • rate limits may be recoverable through retry, waiting, or automatic model switching;
  • both kinds of failure are represented as structured session events.
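
The rate-limit branch described in this section can be sketched as a small decision function. Everything beyond the names errorType, rate_limit, eligibleForAutoSwitch, and auto_mode_switch.requested is an assumption, including the option names and the exact order of the checks.

```typescript
// Subset of the session.error payload relevant here; shape is assumed.
interface SessionErrorSketch {
  errorType: string;
  eligibleForAutoSwitch?: boolean;
}

// Hypothetical runtime state consulted when deciding what to do next.
interface SwitchContextSketch {
  usingAutoModel: boolean;     // already in auto mode, nothing to switch to
  fallbackAvailable: boolean;  // a fallback model path exists
  allowSilentSwitch: boolean;  // continuation settings permit switching quietly
}

type NextStep = "auto_mode_switch.requested" | "switch_silently" | "surface_error";

function nextStepAfterError(err: SessionErrorSketch, ctx: SwitchContextSketch): NextStep {
  // Quota/billing and other failures need user or account action.
  if (err.errorType !== "rate_limit") {
    return "surface_error";
  }
  // Only a non-auto model with a fallback path is a switch candidate.
  if (ctx.usingAutoModel || !ctx.fallbackAvailable || err.eligibleForAutoSwitch !== true) {
    return "surface_error";
  }
  return ctx.allowSilentSwitch ? "switch_silently" : "auto_mode_switch.requested";
}
```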

Context usage versus billing usage

The two event families, and the two surfaces that summarize them, are easy to confuse. Each answers a different question:

| Event | Answers |
| --- | --- |
| session.usage_info | “How full is the current context window?” |
| assistant.usage | “What did the last model API call consume/cost?” |
| /usage | “What has this session accumulated so far?” |
| session.shutdown | “What final metrics should be logged for the whole session?” |

A session can have context usage before an API call, and an API call can produce usage even when the visible context window later changes due to truncation or compaction.

End-to-end metric flow

sequenceDiagram
participant Runtime
participant Provider
participant Tracker
participant UI
participant Telemetry
Runtime->>UI: session.usage_info(current context)
Runtime->>Provider: model request
Provider-->>Runtime: response with token/cost usage
Runtime->>Tracker: assistant.usage
Tracker->>Tracker: update totals and per-model metrics
UI->>Tracker: /usage
Tracker-->>UI: session usage summary
Runtime->>Telemetry: session.shutdown metrics

Relationship to other docs

  • resilience-rate-limits-concurrency.md explains retry, rate-limit, and auto-mode switching behavior.
  • model-api-routing.md explains provider response normalization and where usage arrives from.
  • conversation-compaction.md explains context-window pressure and compaction triggers.
  • system-events-and-ui-projection.md explains how ephemeral usage events reach UI clients.
  • observability-update-shutdown.md explains telemetry and shutdown reporting.

Created and maintained by Yingting Huang.