Command center

Cloud billing ≠ AI economics.

One operating view for AI spend, usage, value, budget burn, provider exposure, and actions — built for usage-based AI pricing.

Decision 01

Will the AI budget surprise us?

Visibility, budget exhaustion date, invoice-only gaps, and value-linked spend.

Decision 02

What should we fix first?

Ranked actions by savings, owner, confidence, and operational risk.

Decision 03

What breaks during a provider incident?

Vendor SLO/SLA posture, exposure matrix, brownout policy, and rerouting controls.

Total spend · 30d
$420K

↑ 20% MoM

Total tokens · 30d
14.5B

in 4.6 · out 2.3 · cached 7.6

Active applications
38

+3 this quarter

Agentic workflows
412

212K exec/mo

FY26 projection
$4.31M

$3.56M w/ actions

Provider concentration
43.6%

Azure · threshold 40%

Spend vs forecast
weekly $K · budget $86K/wk · 90% interval
Apr 3Apr 17May 1May 15May 29Jun 12*Jul 3*anomaly +$11.2K
Growth is driven by agentic workflows, not seat expansion. See the FY26 decision gap →
Live usage events
tokens, model, latency, retries, cost
Live

Digital Diligence

claude-opus-4.8

$1.94

44.1K tok

Vigil

gemini-2.5-pro

$0.09

12.3K tok

Tax Assist

claude-sonnet-4.6

$0.18

15.6K tok

aIQ Chat

gpt-4.1

$0.41

31.2K tok

Recommended actions
Only the four actions that materially change the forecast are shown here

Re-route Digital Diligence premium calls

Opus mix 26% → 12%; blended rate drops ~ $1.9/M tok

$310K

Enforce Advisory & Firmwide budgets

Advisory is at 113% of Q3 pace; add BU controls and alerts

$230K

Expand prompt caching to Clara & Tax

Cached share 52% → 60% across repeated-context workloads

$130K

Fix Vigil retry loop

Failure→retry rate 7.8% → target ~2%

$80K
CFO Control

Know before the AI bill arrives.

A CFO-grade control layer for AI usage visibility, budget burn, sticker-shock risk, and value yield.

Use this when: the CFO asks whether AI cost visibility is mature enough to prevent sticker shock and whether usage maps to value.
AI cost visibility maturity

72% visible. 28% still creates surprise.

Tokay measures how much AI spend is instrumented, attributed, forecastable, controlled, and linked to business value.

72%VISIBLE
Instrumented apps
84%
Spend attributed
91%
Near-real-time coverage
68%
Under active controls
57%
Value-linked spend
41%
Sticker shock watch
Budget exhaustion forecast
At risk
Annual token budget exhausted in
4.7 months

Current run-rate is $312K/mo vs original plan of $88K/mo. Driver: agentic workflows + premium model mix.

Jul52%
Aug68%
Sep82%
Oct97%
Nov116%
Dec133%

Visibility gap

6 tools

Six AI tools still lack reliable token metadata. Two are invoice-only and cannot be forecasted accurately.

Step-function change

6.0×

Agentic workflow consumption is up sharply and changes the cost curve faster than seat-based planning can handle.

Avoidable spend

$750K

Four actions close the gap: reroute premium calls, enforce budgets, expand caching, and fix retries.

Adoption friction to remove

Six tools are not instrumented. Two are invoice-only. Three high-usage apps lack owner-level attribution. Fixing this is a prerequisite for trusted forecasting.

Governance without killing experimentation

Cap sandbox and non-prod agent usage, but keep approved learning paths open. Tokay separates protected work from discretionary exploration.

Board-level proof point

Move from “AI spend increased” to “AI spend increased because of these workflows, with this value, and these controls.”

Value yield
Are tokens producing durable value?

Productive usage

61% of AI spend maps to repeat workflows with measurable productivity signals.

Experimental usage

24% is exploratory, sandbox, or non-prod usage that should have budget caps.

Dropped adoption

7 tools showed usage drop-off after rollout and need ROI review.

Waste signals

$166K tied to retries, premium overuse, and low-cache repeated context.

CFO questions Tokay now answers
Designed for usage-based AI pricing
Will we blow the annual token budget?Yes, at current run-rate the budget is exhausted in 4.7 months.
Where is usage invisible?Six tools are missing coverage; 14% of spend is still delayed or invoice-only.
Is usage creating value?41% of spend is directly value-linked; 24% remains experimental and should be capped.
Usage

Consumption & attribution.

Every chart is interactive. Click tokens, apps, models, business units, providers, or actions to inspect the driver and next move.

Use this when: teams ask who is consuming tokens, which apps/platforms drive usage, and where attribution is weak.
Token composition
parts of 14.5B · cached ~0.1× input rate

Click any usage chart

Select a segment, app, model, provider, BU, or action to inspect the economics behind it.

Tokens by application
area = tokens · color = 90d growth

aIQ Chat

3.9B · +12%

Clara

2.8B · +6%

Digital Diligence

1.7B · +41%

Copilot

2.4B · +9%

Tax Assist

1.5B · +4%

Vigil

1.1B · +28%

KnowledgeHub

1.0B · -3%

aIQ Capture

0.8B · +16%

CHAMP

0.7B · +11%

HAWK

0.6B · +33%

Digital Gateway

0.9B · +22%

Reading: Digital Diligence +41% is the mix problem; aIQ Chat is the volume problem.
Apps and platforms under token management
Expanded Tokay coverage · click any item to inspect source, owner, and economics

aIQ Capture

Capture and extraction workloads

0.8B tokAI Labs

CHAMP

Workflow platform

0.7B tokAdvisory

HAWK

Risk and monitoring agent

0.6B tokRisk

Digital Gateway

Enterprise platform entry point

0.9B tokFirmwide
Models by spend
ranked · $K · 30d
GPT-4.1
$122K
Claude Sonnet 4.6
$96K
GPT-4.1-mini
$64K
Claude Opus 4.8
$59K
Gemini 2.5 Pro
$48K
Premium models are 43% of spend on 21% of tokens — routing 30% of Digital Diligence Opus calls saves $310K.
BU consumption vs budget share
spend % vs allocation % · spikes = over-consuming
FirmwideAuditAdvisoryDeal AdvTaxAI Labs
Spend escapes the budget outline at Firmwide and Deal Advisory. Unattributed: 5.5%.
Provider mix
share of spend · concentration threshold 40%

Azure OpenAI

43.6% · $183.1K

Anthropic

29.0% · $121.9K

Google Gemini

17.1% · $71.8K

AWS Bedrock

10.3% · $43.2K

Azure exceeds threshold by 3.6 pts, driven by aIQ default routing.
Recommended usage actions
Click an action to inspect source chart and economic logic

Re-route Digital Diligence premium calls

$310K

Model mix action

Enforce Advisory & Firmwide budgets

$230K

Budget action

Expand prompt caching to Clara & Tax

$130K

Cache action

Fix Vigil retry loop

$80K

Reliability action

Cost

Find the leak. Open the insight. Take the action.

Each card points to a cost driver with a recommended move and supporting evidence.

Use this when: leadership wants to understand what is driving AI cost and which actions reduce spend without reducing AI value.
Premium leakage

Digital Diligence is overusing Opus-class models

$310K

26% of runs use premium routing; ~30% can shift to Sonnet or Gemini with low business risk.

Budget pressure

Advisory spend is running above budget pace

$230K

Advisory is at 113% of Q3 budget pace; Firmwide is trending toward the same zone.

Cache opportunity

Repeated context is still paying full input rate

$130K

Clara and Tax repeat large context blocks that should be converted into cached-token patterns.

Spend by business unit
Where AI dollars go
Business unitSpend% totalTrendReading
Firmwide$122K29%+8%aIQ Chat + Copilot
Advisory$118K28%+24%Digital Diligence premium mix
Tax$73K17%+6%Tax Assist scale-up
Audit$54K13%+9%Evidence workflows
Current best move
Largest single savings lever

Shift addressable premium traffic first

It creates the largest savings and lowers provider concentration risk.

  • Primary model: Claude Opus 4.8
  • Target routes: Sonnet 4.6, Gemini 2.5 Pro
  • Owner: Platform Engineering
  • Timeline: 2–3 weeks
Forecast

Run the scenario. See the decision gap move.

Forecast is interactive: scenario presets and sliders recalculate the projected branch, gap, and recommended action mix.

Use this when: finance asks what happens next quarter under different growth, routing, caching, and retry assumptions.
FY26 cumulative spend — scenario branches
Base: $4.31M · With actions: $3.56M · Gap: $750K
OctNovDecJanFebMarAprMayJunJul*Sep*
Base projection$4.31M
With actions$3.56M
Decision gap$750K
RiskMedium
Run scenario
Presets and tunable assumptions

Current run-rate

Baseline forecast using current token volume, model mix, cache share, retry rate, and provider routing.

Scenario actions
Updated by scenario assumptions

Select a scenario action

Click an action above to inspect the assumption, expected savings, owner, and operating risk.

Capacity

Protect the right work when models or providers get constrained.

Simulate brownouts by provider and model tier, then see which workloads should be protected, degraded, or throttled.

Capacity scenario planner
Specify provider and model
Protect14
Degrade19
Throttle37
Cost avoided$184K
Brownout recommendation
For selected scenario

Azure OpenAI · GPT-4.1

Tokay will protect the highest-value workloads, degrade lower-priority usage, and throttle low-value experimentation if pressure continues.

  • Protect: Clara, Tax Assist, Audit Evidence
  • Degrade: aIQ Chat casual usage, Research Pro
  • Throttle: sandbox harnesses, experimentation queues
Reliability

Vendor reliability, brownout exposure, and waste risk.

Tokay connects provider incidents, vendor SLO/SLA commitments, latency/error trends, vulnerable apps, and premium-model waste into one operational view.

Use this when: platform and finance teams need to know what workloads are exposed during vendor incidents or brownouts.
Active provider incidents
1

Anthropic · elevated errors across Claude models

SLO compliance
97.9%

Target 99.5% for frontier-model APIs

Workloads exposed
42

18 critical · 24 degradable

Retry waste
$86K

Projected if current error rate persists

Provider incident monitor
Status, impact, SLA/SLO posture, and mitigation
Incident identified
Vendor incident

Elevated error rate across multiple Claude models

Impacts Claude API, Claude Console, Claude Code, and dependent enterprise workflows. Tokay detected higher retry cost in Digital Diligence, Clara, and Tax Assist.

24m since detection
InvestigatingVendor status detected · error rate elevated
IdentifiedTokay mapped affected apps and token waste
MitigatingRoute eligible traffic to alternate models
ResolvedReturn traffic after error budget recovers
SLA / SLO impact
How vendor reliability translates into enterprise action

Vendor SLA

Commercial commitment. Used for credits, vendor performance, and contractual review.

Operating SLO

Internal target for AI workloads. Used to trigger brownout, routing, and throttling decisions.

Error budget

Allowed unreliability before corrective action. Tokay calculates burn by provider, model, and workload.

Recovery policy

When traffic returns to normal routing after latency, error rate, and retry cost stabilize.

Vendor reliability scorecard
Interactive score by provider · click to update the reliability detail panel

Anthropic

97.9%

Active incident · Claude family

Azure OpenAI

99.1%

No incident · concentration risk

Google Gemini

98.8%

Fallback route · moderate variance

AWS Bedrock

98.6%

Tertiary path

Most vulnerable apps
Provider dependency, retry sensitivity, and business criticality
ApplicationRiskProvider exposureMitigation
Digital DiligenceAdvisory workflow HighClaude Opus · 61%Route eligible tasks to Gemini/Sonnet fallback
VigilAgent workflow HighGemini Pro · 48%Cap retries and validate tool calls
aIQ ChatFirmwide assistant MediumAzure OpenAI · 74%Degrade casual usage during brownout
Low-value premium usage
Frontier models used where business value does not justify the cost
ApplicationCurrent modelValueRecommendation
Digital DiligenceClaude Opus 4.8SelectiveRoute many tasks to Sonnet
KnowledgeHubGPT-4.1Low-mediumShift default to 4.1-mini
Sandbox HarnessGPT-4.1LowMove to mini tier
Reliability scenario simulator
Change vendor error rate and provider availability; Tokay recalculates exposure and actions
Critical apps exposed18
Retry cost burn$86K
Traffic to reroute32%
PolicyDegrade
Brownout dependency matrix
Provider-model exposure by application · click cells to inspect reroute logic
App
Claude
Azure
Gemini
Bedrock
Digital Diligence
High
Med
Fallback
Low
Clara
High
Low
Med
Low
aIQ Chat
Low
High
Med
Low
HAWK
Med
Low
High
Med
1

Detect provider degradation

Status page + internal telemetry + retry spike.

live
2

Map exposure

Apps, owners, workflows, and BU criticality.

42
3

Decide policy

Protect / degrade / reroute / throttle.

policy
4

Execute controls

Routing, fallbacks, retry caps, and evidence.

auto
Reliability response controls
Click a control to update the response plan

Reroute eligible Claude traffic

Move low-risk summarization and extraction workloads to Gemini/Sonnet fallback.

32%

Degrade casual assistant usage

Preserve client delivery, tax review, and audit evidence while reducing non-critical UX quality.

24 apps

Throttle sandbox harnesses

Pause low-value experiments and non-prod agents until error budget recovers.

$41K
DetectStatus + error spike
MapApps, owners, workflows
PolicyDegrade
ExecuteRouting + throttles
RecoverReturn traffic

Reroute eligible Claude traffic

Tokay recommends shifting addressable Claude traffic from affected premium routes to Gemini/Sonnet fallback while preserving high-value workflows.

Ask Tokay

Delegate the investigation, then watch the run unfold.

Tokay behaves like a long-horizon agent harness over tokenomics data: it plans, queries marts, tests scenarios, writes findings, and returns actions with evidence.

Use this when: a leader delegates a multi-step investigation and wants evidence, artifacts, and actions rather than another dashboard.
Investigation queue
Click a task to load and run it
Where are we overpaying for intelligence?

Find leakage across model mix, retries, caching, and BU budgets.

What will AI spend look like in the next 30, 60, and 90 days?

Forecast the decision gap and identify mitigation actions.

What happens if agent traffic doubles?

Stress agentic workflows, step count, retries, and model routing.

Which workflows are expensive because of retries?

Find failure loops and repeated-token waste.

If capacity becomes constrained, which workloads should be protected?

Build protect / degrade / throttle policy.

Which models provide the best cost-to-latency ratio?

Compare reliability, latency, and realized cost.

Overpayment investigation

usage → cost → model mix → workflow waste → action plan

Run complete · evidence grounded
Tokay can run multi-step investigations over usage, cost, forecast, capacity, and reliability marts.

Run plan

Long-horizon

Agent trace

18s

Where are we overpaying for intelligence?

High confidence

Tokay found three overpayment zones: Digital Diligence premium routing, Vigil retry waste, and repeated-context workloads in Clara and Tax that should use more cached tokens.

Evidence retrieved

Artifacts generated

    Next actions

    Skills4
    Evidence7 marts