Skip to main content
Token Access Protocol

Pay for AI token-by-token.
Shut the tap the moment it goes wrong.

A Solana state-channel payment protocol for streaming LLM inference. Either side can halt at any output-token boundary; settlement only ever moves USDC for output the consumer actually accepted. Built on x402.

< 1¢
Bounded loss per side mid-stream
~200 ms
Default halt grace period
~92%
Refunded per halted response
x402
HTTP payment standard, extended

See it in action

A consumer asks for JSON. The producer streams tokens; the consumer signs incremental commitments. When the model goes off-topic, the evaluator halts the stream — and the unspent deposit refunds on-chain.

session.stream(prompt)
// click Start to stream tokens
Tokens received0/38
Cumulative paid$0.000000
Prepaid input floor$0.000012
Refund pending settlement$0.000190
Idle · channel open, deposit escrowed, evaluator armed

The structural shift

Today's pay-after-delivery model bundles commitment and value in two coarse points; TAP makes them flow together, token by token. The trust window collapses from "full request value" to "a few tokens of inference cost".

TodayPay-after-delivery API
TAPStreaming state-channel
Halt mid-response
No — full response is billed
Either side, any token boundary
Refund unused tokens
Not possible after delivery
On-chain, automatic at settlement
Producer trusts consumer for
Pre-funded account / KYC
< 1¢ of unsigned tokens
Consumer trusts producer for
Full request value, every request
< 1¢ trailing buffer
Per-token settlement cost
Bundled at end (no per-token rail)
Off-chain. One Solana tx per session
Input-cost protection
None — bundled with output
prepaid_input locked at channel open

Why it works

Six properties together — none new on its own, but the combination is what makes per-token settlement economically viable.

Bilateral halt

Either party stops by ceasing their next action. No explicit halt message, no attack surface, no DoS vector. The session settles at the token boundary you chose.

How halt works

Input prepay (§4.9)

Producer's prefill compute is non-refundable but bounded. The consumer locally re-tokenizes the prompt with the producer's declared tokenizer; mismatch aborts before any escrow.

Read §4.9

Built on x402

TAP extends Coinbase's x402 standard into variable-cost streaming. Discovery, channel-open, and settlement compose with existing x402 facilitators with no contradictions.

x402 relationship

Solana-native

Anchor program enforces prepaid_input ≤ cumulative_paid ≤ deposit and the dispute-window state machine. Fast finality, sub-cent fees, PDAs as channel state.

On-chain interface

Pluggable evaluators

Halt-on-quality is application-level. Compose JSON-schema, length, repetition, topic-drift, content-policy. Or write a callable that returns Decision.HALT — the SDK handles the rest.

Evaluators

A streaming substrate

The same channel construction maps to audio, video, GPU rental, metered APIs. Drops are tokens; per-second metering is just a different unit. LLM inference is the showcase, not the limit.

Beyond LLM

Stop paying for tokens you'll throw away.

Build a producer or consumer in a few lines. The reference demo wraps Gemini 2.5 Flash; adapters are scaffolded for Anthropic, OpenAI, and local Ollama. The on-chain program is deployed to Solana devnet under 2tqofcitv1LHFGCLCmR9Kyke6TmArQwpHSinWWtmCje9.