
Evaluators

An evaluator is any callable from accumulated output to Decision:

from tap.evaluators.base import Decision, Evaluator

def my_evaluator(text: str) -> Decision:
    return Decision.HALT if "forbidden" in text else Decision.CONTINUE

The session calls the evaluator after each token chunk; the first HALT stops the session and force-signs a final commit.

Decision is a 2-value enum:

| Value | Meaning |
| --- | --- |
| `Decision.CONTINUE` | Keep streaming. |
| `Decision.HALT` | Stop the session and force-sign the final commit. |

Built-in evaluators

json_schema(schema)

Halts as soon as the streaming output can no longer be extended into a valid instance of the supplied JSON schema. Tolerates leading markdown fences (```json) and applies a 32-character warm-up window so partial preambles don't trip the syntactic gate.

If jsonschema is installed, full Draft-2020-12 validation runs once the buffer parses; otherwise the check degrades to syntax-only.

evaluators.json_schema({
    "type": "object",
    "required": ["title"],
    "properties": {"title": {"type": "string"}},
})
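For intuition, here is a rough, hypothetical sketch of the check's shape. The real prefix-extendability test (deciding whether a partial buffer can still become valid JSON) is more involved and is elided here, and `Decision` is redefined locally to keep the example self-contained:

```python
import json
from enum import Enum

try:
    import jsonschema  # optional: enables full schema validation
except ImportError:
    jsonschema = None

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def json_schema_sketch(schema):
    def evaluate(text: str) -> Decision:
        buf = text.lstrip()
        if buf.startswith("```"):  # tolerate a leading markdown fence
            buf = buf.split("\n", 1)[1] if "\n" in buf else ""
        if len(buf) < 32:  # warm-up window: don't judge tiny prefixes
            return Decision.CONTINUE
        try:
            instance = json.loads(buf)
        except json.JSONDecodeError:
            # A real implementation would halt here if the buffer can no
            # longer be extended into valid JSON; that test is elided.
            return Decision.CONTINUE
        if jsonschema is not None:
            try:
                jsonschema.validate(instance, schema)
            except jsonschema.ValidationError:
                return Decision.HALT
        return Decision.CONTINUE
    evaluate.name = "json_schema"
    return evaluate
```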

length_cap(max_chars)

Halts when accumulated output reaches max_chars. Raises ValueError if max_chars <= 0. The name reports as length_cap(<n>) so session.halted_by shows the bound.
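The contract is small enough to sketch in a few lines. This is a hypothetical re-implementation, not the library's source; `Decision` is redefined locally so the example runs standalone:

```python
from enum import Enum

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def length_cap_sketch(max_chars: int):
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    def evaluate(text: str) -> Decision:
        return Decision.HALT if len(text) >= max_chars else Decision.CONTINUE
    evaluate.name = f"length_cap({max_chars})"  # surfaces in session.halted_by
    return evaluate
```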

topic_drift(reference, *, threshold=0.05, min_chars=200, scorer=None)

Halts when scorer(reference, accumulated) falls below threshold. Default scorer is a cheap token-overlap heuristic; pass your own for embedding-based scoring. min_chars avoids halting on very short prefixes where the score is noisy.
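The default scorer isn't specified beyond "token-overlap heuristic"; one plausible shape, plus the halting rule around it, looks like this (hypothetical sketch; `Decision` redefined locally):

```python
from enum import Enum

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def overlap_score(reference: str, accumulated: str) -> float:
    """Fraction of the reference's tokens that appear in the output."""
    ref = set(reference.lower().split())
    acc = set(accumulated.lower().split())
    if not ref or not acc:
        return 0.0
    return len(ref & acc) / len(ref)

def topic_drift_sketch(reference, *, threshold=0.05, min_chars=200, scorer=None):
    score = scorer or overlap_score
    def evaluate(text: str) -> Decision:
        if len(text) < min_chars:  # too short: the score would be noisy
            return Decision.CONTINUE
        return Decision.HALT if score(reference, text) < threshold else Decision.CONTINUE
    evaluate.name = "topic_drift"
    return evaluate
```

Passing a custom `scorer` (an embedding-based cosine similarity, say) slots into the same place without changing the halting logic.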

repetition_guard(*, window=200, threshold=0.4)

Halts when more than threshold of the last window characters are covered by a single 8-char n-gram — the most common token-padding failure mode (whitepaper §5.3.5). Window must be ≥ 32 characters; threshold must be in (0.0, 1.0].
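Coverage can be approximated by counting occurrences of the most frequent 8-gram in the window. A rough, hypothetical sketch (it overestimates coverage slightly, since occurrences may overlap; `Decision` redefined locally):

```python
from collections import Counter
from enum import Enum

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def repetition_guard_sketch(*, window: int = 200, threshold: float = 0.4, n: int = 8):
    if window < 32:
        raise ValueError("window must be >= 32")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.0, 1.0]")
    def evaluate(text: str) -> Decision:
        tail = text[-window:]
        if len(tail) < n:
            return Decision.CONTINUE
        counts = Counter(tail[i:i + n] for i in range(len(tail) - n + 1))
        _, hits = counts.most_common(1)[0]
        covered = min(hits * n, len(tail))  # overlaps make this an upper bound
        return Decision.HALT if covered / len(tail) > threshold else Decision.CONTINUE
    evaluate.name = "repetition_guard"
    return evaluate
```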

content_policy(banned)

Halts on a substring or compiled regex match. Plain strings are matched case-insensitively; pre-compiled re.Patterns use their own flags.

import re

evaluators.content_policy([
    "API_KEY",
    re.compile(r"\b[A-Z0-9]{32,}\b"),  # candidate secret
])

Composing evaluators

evaluators.compose(*evaluators) ORs them together — the first to return HALT wins. The composed evaluator's name is "compose(<inner names>)", so session.halted_by still surfaces which inner check fired.

evaluator = evaluators.compose(
    evaluators.json_schema(EXPECTED_SCHEMA),
    evaluators.length_cap(2_000),
    evaluators.repetition_guard(),
)

Pass the composed evaluator to consumer.open_session(evaluator=...).
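Composition itself is just a short-circuit OR over the inner checks. A minimal, hypothetical sketch; the `", "` delimiter inside the composed name is an assumption, and `Decision` is redefined locally:

```python
from enum import Enum

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def compose_sketch(*evaluators):
    def evaluate(text: str) -> Decision:
        for ev in evaluators:
            if ev(text) == Decision.HALT:  # first HALT wins
                return Decision.HALT
        return Decision.CONTINUE
    inner = ", ".join(getattr(ev, "name", "evaluator") for ev in evaluators)
    evaluate.name = f"compose({inner})"
    return evaluate

# Illustrative inner evaluator for the usage below.
def halt_on_x(text: str) -> Decision:
    return Decision.HALT if "x" in text else Decision.CONTINUE
halt_on_x.name = "halt_on_x"
```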

Writing custom evaluators

Any callable conforming to Evaluator = Callable[[str], Decision] works:

class TopicGuard:
    name = "topic_guard"  # surfaces in session.halted_by

    def __init__(self, allowed_topics: set[str]):
        self.allowed = allowed_topics

    def __call__(self, accumulated: str) -> Decision:
        # ... inspect text, return CONTINUE or HALT
        return Decision.CONTINUE

Set .name on the callable so session.halted_by reports a meaningful value instead of a generic "evaluator".
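For plain functions, the attribute can be set after definition. A small hypothetical example (`Decision` redefined locally; the `no_at_signs` check is illustrative only):

```python
from enum import Enum

class Decision(Enum):  # local stand-in for tap.evaluators.base.Decision
    CONTINUE = "continue"
    HALT = "halt"

def no_at_signs(accumulated: str) -> Decision:
    # Hypothetical check: halt if the output starts leaking email-like text.
    return Decision.HALT if "@" in accumulated else Decision.CONTINUE

no_at_signs.name = "no_at_signs"  # surfaces in session.halted_by
```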

Latency considerations

The evaluator runs after every token. Heavy per-token work — calls to a remote API, expensive parsing — will starve the streaming loop. Rule of thumb: evaluators should complete in under 5ms at the median.

For evaluators that must be expensive (a classifier model, for example), accumulate output and run the check every K tokens internally:

def heavy_evaluator():
    count = 0

    def _(text: str) -> Decision:
        nonlocal count
        count += 1
        if count % 50 != 0:
            return Decision.CONTINUE
        return run_classifier(text)  # expensive check runs every 50th call

    return _