Command center

Cloud billing ≠ AI economics.

One operating view for AI spend, usage, value, budget burn, provider exposure, and actions — built for usage-based AI pricing.

Decision 01

Will the AI budget surprise us?

Visibility, budget exhaustion date, invoice-only gaps, and value-linked spend.

Decision 02

What should we fix first?

Ranked actions by savings, owner, confidence, and operational risk.

Decision 03

What breaks during a provider incident?

Vendor SLO/SLA posture, exposure matrix, brownout policy, and rerouting controls.

Total spend · 30d

$420K

↑ 20% MoM

Total tokens · 30d

14.5B

in 4.6 · out 2.3 · cached 7.6

Active applications

+3 this quarter

Agentic workflows

412

212K exec/mo

FY26 projection

$4.31M

$3.56M w/ actions

Provider concentration

43.6%

Azure · threshold 40%

Spend vs forecast

weekly $K · budget $86K/wk · 90% interval

Growth is driven by agentic workflows, not seat expansion. See the FY26 decision gap →

Live usage events

tokens, model, latency, retries, cost

Live

Digital Diligence

claude-opus-4.8

$1.94

44.1K tok

Vigil

gemini-2.5-pro

$0.09

12.3K tok

Tax Assist

claude-sonnet-4.6

$0.18

15.6K tok

aIQ Chat

gpt-4.1

$0.41

31.2K tok

Recommended actions

Only the four actions that materially change the forecast are shown here

Re-route Digital Diligence premium calls

Opus mix 26% → 12%; blended rate drops ~ $1.9/M tok

$310K

Enforce Advisory & Firmwide budgets

Advisory is at 113% of Q3 pace; add BU controls and alerts

$230K

Expand prompt caching to Clara & Tax

Cached share 52% → 60% across repeated-context workloads

$130K

Fix Vigil retry loop

Failure→retry rate 7.8% → target ~2%

$80K

CFO Control

Know before the AI bill arrives.

A CFO-grade control layer for AI usage visibility, budget burn, sticker-shock risk, and value yield.

Use this when: the CFO asks whether AI cost visibility is mature enough to prevent sticker shock and whether usage maps to value.

AI cost visibility maturity

72% visible. 28% still creates surprise.

Tokay measures how much AI spend is instrumented, attributed, forecastable, controlled, and linked to business value.

72%VISIBLE

Instrumented apps

84%

Spend attributed

91%

Near-real-time coverage

68%

Under active controls

57%

Value-linked spend

41%

Sticker shock watch

Budget exhaustion forecast

At risk

Annual token budget exhausted in

4.7 months

Current run-rate is $312K/mo vs original plan of $88K/mo. Driver: agentic workflows + premium model mix.

Jul52%

Aug68%

Sep82%

Oct97%

Nov116%

Dec133%

Visibility gap

6 tools

Six AI tools still lack reliable token metadata. Two are invoice-only and cannot be forecasted accurately.

Step-function change

6.0×

Agentic workflow consumption is up sharply and changes the cost curve faster than seat-based planning can handle.

Avoidable spend

$750K

Four actions close the gap: reroute premium calls, enforce budgets, expand caching, and fix retries.

Adoption friction to remove

Six tools are not instrumented. Two are invoice-only. Three high-usage apps lack owner-level attribution. Fixing this is a prerequisite for trusted forecasting.

Governance without killing experimentation

Cap sandbox and non-prod agent usage, but keep approved learning paths open. Tokay separates protected work from discretionary exploration.

Board-level proof point

Move from “AI spend increased” to “AI spend increased because of these workflows, with this value, and these controls.”

Value yield

Are tokens producing durable value?

Productive usage

61% of AI spend maps to repeat workflows with measurable productivity signals.

Experimental usage

24% is exploratory, sandbox, or non-prod usage that should have budget caps.

Dropped adoption

7 tools showed usage drop-off after rollout and need ROI review.

Waste signals

$166K tied to retries, premium overuse, and low-cache repeated context.

CFO questions Tokay now answers

Designed for usage-based AI pricing

Will we blow the annual token budget?Yes, at current run-rate the budget is exhausted in 4.7 months.

Where is usage invisible?Six tools are missing coverage; 14% of spend is still delayed or invoice-only.

Is usage creating value?41% of spend is directly value-linked; 24% remains experimental and should be capped.

Usage

Consumption & attribution.

Every chart is interactive. Click tokens, apps, models, business units, providers, or actions to inspect the driver and next move.

Use this when: teams ask who is consuming tokens, which apps/platforms drive usage, and where attribution is weak.

Token composition

parts of 14.5B · cached ~0.1× input rate

Click any usage chart

Select a segment, app, model, provider, BU, or action to inspect the economics behind it.

Tokens by application

area = tokens · color = 90d growth

aIQ Chat

3.9B · +12%

Clara

2.8B · +6%

Digital Diligence

1.7B · +41%

Copilot

2.4B · +9%

Tax Assist

1.5B · +4%

Vigil

1.1B · +28%

KnowledgeHub

1.0B · -3%

aIQ Capture

0.8B · +16%

CHAMP

0.7B · +11%

HAWK

0.6B · +33%

Digital Gateway

0.9B · +22%

Reading: Digital Diligence +41% is the mix problem; aIQ Chat is the volume problem.

Apps and platforms under token management

Expanded Tokay coverage · click any item to inspect source, owner, and economics

aIQ Capture

Capture and extraction workloads

0.8B tokAI Labs

CHAMP

Workflow platform

0.7B tokAdvisory

HAWK

Risk and monitoring agent

0.6B tokRisk

Digital Gateway

Enterprise platform entry point

0.9B tokFirmwide

Models by spend

ranked · $K · 30d

GPT-4.1

$122K

Claude Sonnet 4.6

$96K

GPT-4.1-mini

$64K

Claude Opus 4.8

$59K

Gemini 2.5 Pro

$48K

Premium models are 43% of spend on 21% of tokens — routing 30% of Digital Diligence Opus calls saves $310K.

BU consumption vs budget share

spend % vs allocation % · spikes = over-consuming

Spend escapes the budget outline at Firmwide and Deal Advisory. Unattributed: 5.5%.

Provider mix

share of spend · concentration threshold 40%

Azure OpenAI

43.6% · $183.1K

Anthropic

29.0% · $121.9K

Google Gemini

17.1% · $71.8K

AWS Bedrock

10.3% · $43.2K

Azure exceeds threshold by 3.6 pts, driven by aIQ default routing.

Recommended usage actions

Click an action to inspect source chart and economic logic

Re-route Digital Diligence premium calls

$310K

Model mix action

Enforce Advisory & Firmwide budgets

$230K

Budget action

Expand prompt caching to Clara & Tax

$130K

Cache action

Fix Vigil retry loop

$80K

Reliability action

Cost

Find the leak. Open the insight. Take the action.

Each card points to a cost driver with a recommended move and supporting evidence.

Use this when: leadership wants to understand what is driving AI cost and which actions reduce spend without reducing AI value.

Premium leakage

Digital Diligence is overusing Opus-class models

$310K

26% of runs use premium routing; ~30% can shift to Sonnet or Gemini with low business risk.

Budget pressure

Advisory spend is running above budget pace

$230K

Advisory is at 113% of Q3 budget pace; Firmwide is trending toward the same zone.

Cache opportunity

Repeated context is still paying full input rate

$130K

Clara and Tax repeat large context blocks that should be converted into cached-token patterns.

Spend by business unit

Where AI dollars go

Business unit	Spend	% total	Trend	Reading
Firmwide	$122K	29%	+8%	aIQ Chat + Copilot
Advisory	$118K	28%	+24%	Digital Diligence premium mix
Tax	$73K	17%	+6%	Tax Assist scale-up
Audit	$54K	13%	+9%	Evidence workflows

Current best move

Largest single savings lever

Shift addressable premium traffic first

It creates the largest savings and lowers provider concentration risk.

Primary model: Claude Opus 4.8
Target routes: Sonnet 4.6, Gemini 2.5 Pro
Owner: Platform Engineering
Timeline: 2–3 weeks

Forecast

Run the scenario. See the decision gap move.

Forecast is interactive: scenario presets and sliders recalculate the projected branch, gap, and recommended action mix.

Use this when: finance asks what happens next quarter under different growth, routing, caching, and retry assumptions.

FY26 cumulative spend — scenario branches

Base: $4.31M · With actions: $3.56M · Gap: $750K

Base projection$4.31M

With actions$3.56M

Decision gap$750K

RiskMedium

Run scenario

Presets and tunable assumptions

Agent traffic growth 1.0×

Premium model share 43%

Cached token share 52%

Retry rate 7.8%

Current run-rate

Baseline forecast using current token volume, model mix, cache share, retry rate, and provider routing.

Scenario actions

Updated by scenario assumptions

Select a scenario action

Click an action above to inspect the assumption, expected savings, owner, and operating risk.

Capacity

Protect the right work when models or providers get constrained.

Simulate brownouts by provider and model tier, then see which workloads should be protected, degraded, or throttled.

Capacity scenario planner

Specify provider and model

Provider

Model focus

Capacity reduction 20%

Traffic growth 2.0×

Premium access Restricted

Protect14

Degrade19

Throttle37

Cost avoided$184K

Brownout recommendation

For selected scenario

Azure OpenAI · GPT-4.1

Tokay will protect the highest-value workloads, degrade lower-priority usage, and throttle low-value experimentation if pressure continues.

Protect: Clara, Tax Assist, Audit Evidence
Degrade: aIQ Chat casual usage, Research Pro
Throttle: sandbox harnesses, experimentation queues

Reliability

Vendor reliability, brownout exposure, and waste risk.

Tokay connects provider incidents, vendor SLO/SLA commitments, latency/error trends, vulnerable apps, and premium-model waste into one operational view.

Use this when: platform and finance teams need to know what workloads are exposed during vendor incidents or brownouts.

Active provider incidents

Anthropic · elevated errors across Claude models

SLO compliance

97.9%

Target 99.5% for frontier-model APIs

Workloads exposed

18 critical · 24 degradable

Retry waste

$86K

Projected if current error rate persists

Provider incident monitor

Status, impact, SLA/SLO posture, and mitigation

Incident identified

Vendor incident

Elevated error rate across multiple Claude models

Impacts Claude API, Claude Console, Claude Code, and dependent enterprise workflows. Tokay detected higher retry cost in Digital Diligence, Clara, and Tax Assist.

24m since detection

InvestigatingVendor status detected · error rate elevated

IdentifiedTokay mapped affected apps and token waste

MitigatingRoute eligible traffic to alternate models

ResolvedReturn traffic after error budget recovers

SLA / SLO impact

How vendor reliability translates into enterprise action

Vendor SLA

Commercial commitment. Used for credits, vendor performance, and contractual review.

Operating SLO

Internal target for AI workloads. Used to trigger brownout, routing, and throttling decisions.

Error budget

Allowed unreliability before corrective action. Tokay calculates burn by provider, model, and workload.

Recovery policy

When traffic returns to normal routing after latency, error rate, and retry cost stabilize.

Vendor reliability scorecard

Interactive score by provider · click to update the reliability detail panel

Anthropic

97.9%

Active incident · Claude family

Azure OpenAI

99.1%

No incident · concentration risk

Google Gemini

98.8%

Fallback route · moderate variance

AWS Bedrock

98.6%

Tertiary path

Most vulnerable apps

Provider dependency, retry sensitivity, and business criticality

Application	Risk	Provider exposure	Mitigation
Digital DiligenceAdvisory workflow	High	Claude Opus · 61%	Route eligible tasks to Gemini/Sonnet fallback
VigilAgent workflow	High	Gemini Pro · 48%	Cap retries and validate tool calls
aIQ ChatFirmwide assistant	Medium	Azure OpenAI · 74%	Degrade casual usage during brownout

Low-value premium usage

Frontier models used where business value does not justify the cost

Application	Current model	Value	Recommendation
Digital Diligence	Claude Opus 4.8	Selective	Route many tasks to Sonnet
KnowledgeHub	GPT-4.1	Low-medium	Shift default to 4.1-mini
Sandbox Harness	GPT-4.1	Low	Move to mini tier

Reliability scenario simulator

Change vendor error rate and provider availability; Tokay recalculates exposure and actions

Claude error rate 8%

Azure latency degradation 18%

Fallback capacity 65%

Critical apps exposed18

Retry cost burn$86K

Traffic to reroute32%

PolicyDegrade

Brownout dependency matrix

Provider-model exposure by application · click cells to inspect reroute logic

App

Claude

Azure

Gemini

Bedrock

Digital Diligence

High

Med

Fallback

Low

Clara

High

Low

Med

Low

aIQ Chat

Low

High

Med

Low

HAWK

Med

Low

High

Med

Detect provider degradation

Status page + internal telemetry + retry spike.

live

Map exposure

Apps, owners, workflows, and BU criticality.

Decide policy

Protect / degrade / reroute / throttle.

policy

Execute controls

Routing, fallbacks, retry caps, and evidence.

auto

Reliability response controls

Click a control to update the response plan

Reroute eligible Claude traffic

Move low-risk summarization and extraction workloads to Gemini/Sonnet fallback.

32%

Degrade casual assistant usage

Preserve client delivery, tax review, and audit evidence while reducing non-critical UX quality.

24 apps

Throttle sandbox harnesses

Pause low-value experiments and non-prod agents until error budget recovers.

$41K

DetectStatus + error spike

MapApps, owners, workflows

PolicyDegrade

ExecuteRouting + throttles

RecoverReturn traffic

Reroute eligible Claude traffic

Tokay recommends shifting addressable Claude traffic from affected premium routes to Gemini/Sonnet fallback while preserving high-value workflows.

Ask Tokay

Delegate the investigation, then watch the run unfold.

Tokay behaves like a long-horizon agent harness over tokenomics data: it plans, queries marts, tests scenarios, writes findings, and returns actions with evidence.

Use this when: a leader delegates a multi-step investigation and wants evidence, artifacts, and actions rather than another dashboard.

Investigation queue

Click a task to load and run it

Where are we overpaying for intelligence?

Find leakage across model mix, retries, caching, and BU budgets.

What will AI spend look like in the next 30, 60, and 90 days?

Forecast the decision gap and identify mitigation actions.

What happens if agent traffic doubles?

Stress agentic workflows, step count, retries, and model routing.

Which workflows are expensive because of retries?

Find failure loops and repeated-token waste.

If capacity becomes constrained, which workloads should be protected?

Build protect / degrade / throttle policy.

Which models provide the best cost-to-latency ratio?

Compare reliability, latency, and realized cost.

Overpayment investigation

usage → cost → model mix → workflow waste → action plan

Run complete · evidence grounded

Tokay can run multi-step investigations over usage, cost, forecast, capacity, and reliability marts.

Run plan

Long-horizon

Agent trace

18s

Where are we overpaying for intelligence?

High confidence

Tokay found three overpayment zones: Digital Diligence premium routing, Vigil retry waste, and repeated-context workloads in Clara and Tax that should use more cached tokens.