Cloud billing ≠ AI economics.
One operating view for AI spend, usage, value, budget burn, provider exposure, and actions — built for usage-based AI pricing.
Will the AI budget surprise us?
Visibility, budget exhaustion date, invoice-only gaps, and value-linked spend.
What should we fix first?
Ranked actions by savings, owner, confidence, and operational risk.
What breaks during a provider incident?
Vendor SLO/SLA posture, exposure matrix, brownout policy, and rerouting controls.
↑ 20% MoM
in 4.6 · out 2.3 · cached 7.6
+3 this quarter
212K exec/mo
$3.56M w/ actions
Azure · threshold 40%
Digital Diligence
claude-opus-4.8
44.1K tok
Vigil
gemini-2.5-pro
12.3K tok
Tax Assist
claude-sonnet-4.6
15.6K tok
aIQ Chat
gpt-4.1
31.2K tok
Re-route Digital Diligence premium calls
Opus mix 26% → 12%; blended rate drops ~ $1.9/M tok
Enforce Advisory & Firmwide budgets
Advisory is at 113% of Q3 pace; add BU controls and alerts
Expand prompt caching to Clara & Tax
Cached share 52% → 60% across repeated-context workloads
Fix Vigil retry loop
Failure→retry rate 7.8% → target ~2%
Know before the AI bill arrives.
A CFO-grade control layer for AI usage visibility, budget burn, sticker-shock risk, and value yield.
72% visible. 28% still creates surprise.
Tokay measures how much AI spend is instrumented, attributed, forecastable, controlled, and linked to business value.
Current run-rate is $312K/mo vs original plan of $88K/mo. Driver: agentic workflows + premium model mix.
Visibility gap
6 toolsSix AI tools still lack reliable token metadata. Two are invoice-only and cannot be forecasted accurately.
Step-function change
6.0×Agentic workflow consumption is up sharply and changes the cost curve faster than seat-based planning can handle.
Avoidable spend
$750KFour actions close the gap: reroute premium calls, enforce budgets, expand caching, and fix retries.
Adoption friction to remove
Six tools are not instrumented. Two are invoice-only. Three high-usage apps lack owner-level attribution. Fixing this is a prerequisite for trusted forecasting.
Governance without killing experimentation
Cap sandbox and non-prod agent usage, but keep approved learning paths open. Tokay separates protected work from discretionary exploration.
Board-level proof point
Move from “AI spend increased” to “AI spend increased because of these workflows, with this value, and these controls.”
Productive usage
61% of AI spend maps to repeat workflows with measurable productivity signals.
Experimental usage
24% is exploratory, sandbox, or non-prod usage that should have budget caps.
Dropped adoption
7 tools showed usage drop-off after rollout and need ROI review.
Waste signals
$166K tied to retries, premium overuse, and low-cache repeated context.
| Will we blow the annual token budget?Yes, at current run-rate the budget is exhausted in 4.7 months. |
| Where is usage invisible?Six tools are missing coverage; 14% of spend is still delayed or invoice-only. |
| Is usage creating value?41% of spend is directly value-linked; 24% remains experimental and should be capped. |
Consumption & attribution.
Every chart is interactive. Click tokens, apps, models, business units, providers, or actions to inspect the driver and next move.
Click any usage chart
Select a segment, app, model, provider, BU, or action to inspect the economics behind it.
aIQ Chat
3.9B · +12%
Clara
2.8B · +6%
Digital Diligence
1.7B · +41%
Copilot
2.4B · +9%
Tax Assist
1.5B · +4%
Vigil
1.1B · +28%
KnowledgeHub
1.0B · -3%
aIQ Capture
0.8B · +16%
CHAMP
0.7B · +11%
HAWK
0.6B · +33%
Digital Gateway
0.9B · +22%
aIQ Capture
Capture and extraction workloads
CHAMP
Workflow platform
HAWK
Risk and monitoring agent
Digital Gateway
Enterprise platform entry point
Azure OpenAI
43.6% · $183.1K
Anthropic
29.0% · $121.9K
Google Gemini
17.1% · $71.8K
AWS Bedrock
10.3% · $43.2K
Re-route Digital Diligence premium calls
$310KModel mix action
Enforce Advisory & Firmwide budgets
$230KBudget action
Expand prompt caching to Clara & Tax
$130KCache action
Fix Vigil retry loop
$80KReliability action
Find the leak. Open the insight. Take the action.
Each card points to a cost driver with a recommended move and supporting evidence.
Digital Diligence is overusing Opus-class models
26% of runs use premium routing; ~30% can shift to Sonnet or Gemini with low business risk.
Advisory spend is running above budget pace
Advisory is at 113% of Q3 budget pace; Firmwide is trending toward the same zone.
Repeated context is still paying full input rate
Clara and Tax repeat large context blocks that should be converted into cached-token patterns.
| Business unit | Spend | % total | Trend | Reading |
|---|---|---|---|---|
| Firmwide | $122K | 29% | +8% | aIQ Chat + Copilot |
| Advisory | $118K | 28% | +24% | Digital Diligence premium mix |
| Tax | $73K | 17% | +6% | Tax Assist scale-up |
| Audit | $54K | 13% | +9% | Evidence workflows |
Shift addressable premium traffic first
It creates the largest savings and lowers provider concentration risk.
- Primary model: Claude Opus 4.8
- Target routes: Sonnet 4.6, Gemini 2.5 Pro
- Owner: Platform Engineering
- Timeline: 2–3 weeks
Run the scenario. See the decision gap move.
Forecast is interactive: scenario presets and sliders recalculate the projected branch, gap, and recommended action mix.
Current run-rate
Baseline forecast using current token volume, model mix, cache share, retry rate, and provider routing.
Select a scenario action
Click an action above to inspect the assumption, expected savings, owner, and operating risk.
Protect the right work when models or providers get constrained.
Simulate brownouts by provider and model tier, then see which workloads should be protected, degraded, or throttled.
Azure OpenAI · GPT-4.1
Tokay will protect the highest-value workloads, degrade lower-priority usage, and throttle low-value experimentation if pressure continues.
- Protect: Clara, Tax Assist, Audit Evidence
- Degrade: aIQ Chat casual usage, Research Pro
- Throttle: sandbox harnesses, experimentation queues
Vendor reliability, brownout exposure, and waste risk.
Tokay connects provider incidents, vendor SLO/SLA commitments, latency/error trends, vulnerable apps, and premium-model waste into one operational view.
Anthropic · elevated errors across Claude models
Target 99.5% for frontier-model APIs
18 critical · 24 degradable
Projected if current error rate persists
Elevated error rate across multiple Claude models
Impacts Claude API, Claude Console, Claude Code, and dependent enterprise workflows. Tokay detected higher retry cost in Digital Diligence, Clara, and Tax Assist.
Vendor SLA
Commercial commitment. Used for credits, vendor performance, and contractual review.
Operating SLO
Internal target for AI workloads. Used to trigger brownout, routing, and throttling decisions.
Error budget
Allowed unreliability before corrective action. Tokay calculates burn by provider, model, and workload.
Recovery policy
When traffic returns to normal routing after latency, error rate, and retry cost stabilize.
Anthropic
97.9%Active incident · Claude family
Azure OpenAI
99.1%No incident · concentration risk
Google Gemini
98.8%Fallback route · moderate variance
AWS Bedrock
98.6%Tertiary path
| Application | Risk | Provider exposure | Mitigation |
|---|---|---|---|
| Digital DiligenceAdvisory workflow | High | Claude Opus · 61% | Route eligible tasks to Gemini/Sonnet fallback |
| VigilAgent workflow | High | Gemini Pro · 48% | Cap retries and validate tool calls |
| aIQ ChatFirmwide assistant | Medium | Azure OpenAI · 74% | Degrade casual usage during brownout |
| Application | Current model | Value | Recommendation |
|---|---|---|---|
| Digital Diligence | Claude Opus 4.8 | Selective | Route many tasks to Sonnet |
| KnowledgeHub | GPT-4.1 | Low-medium | Shift default to 4.1-mini |
| Sandbox Harness | GPT-4.1 | Low | Move to mini tier |
Detect provider degradation
Status page + internal telemetry + retry spike.
Map exposure
Apps, owners, workflows, and BU criticality.
Decide policy
Protect / degrade / reroute / throttle.
Execute controls
Routing, fallbacks, retry caps, and evidence.
Reroute eligible Claude traffic
Move low-risk summarization and extraction workloads to Gemini/Sonnet fallback.
32%Degrade casual assistant usage
Preserve client delivery, tax review, and audit evidence while reducing non-critical UX quality.
24 appsThrottle sandbox harnesses
Pause low-value experiments and non-prod agents until error budget recovers.
$41KReroute eligible Claude traffic
Tokay recommends shifting addressable Claude traffic from affected premium routes to Gemini/Sonnet fallback while preserving high-value workflows.
Delegate the investigation, then watch the run unfold.
Tokay behaves like a long-horizon agent harness over tokenomics data: it plans, queries marts, tests scenarios, writes findings, and returns actions with evidence.
Find leakage across model mix, retries, caching, and BU budgets.
Forecast the decision gap and identify mitigation actions.
Stress agentic workflows, step count, retries, and model routing.
Find failure loops and repeated-token waste.
Build protect / degrade / throttle policy.
Compare reliability, latency, and realized cost.
Overpayment investigation
usage → cost → model mix → workflow waste → action plan
Run plan
Long-horizonAgent trace
Where are we overpaying for intelligence?
High confidenceTokay found three overpayment zones: Digital Diligence premium routing, Vigil retry waste, and repeated-context workloads in Clara and Tax that should use more cached tokens.