Research

What we've learned building the determinism layer. Specific findings, real data, conservative claims.

Finding

~70% of agent tool-call work is non-creative

Across multiple real agent traces, the majority of tool calls are tasks a deterministic catalog could answer in microseconds: format validation, ID parsing, content-type detection, secret-shape recognition. The model adds no value to these and frequently hallucinates them. We treat this as the foundational case for the determinism layer.

Data: Multi-vendor agent traces (anonymized) plus public agent benchmarks

Impact: A signed catalog can absorb the deterministic 70% at $0/call and bit-exact reliability, freeing the remaining 30% for the model to do its real job.

Finding

Schism Hunter: differential attestation finds CVE-class parser drift in milliseconds

Implementation differences in standard format parsers (IPv4, URL, JSON) routinely create CVE-class security bugs (e.g. CVE-2021-29921). By fuzzing two implementations against each other and recording divergences, we can bound the equivalence rate cryptographically and pinpoint the exact divergence inputs in real time.

Data: Public CVE corpus + production parser libraries across 4 languages

Impact: In one demo run: 5 CVE-class IPv4 divergences caught in 380 ms. Before-the-fact, not after-the-fact.

Finding

A live LLM can be distilled to a perfect deterministic FSM in milliseconds

For sufficiently constrained capability surfaces, an LLM's observed behavior is exactly representable as a small finite-state machine. Live Claude Haiku 4.5 was distilled to a perfect 3-state minimal FSM in 8 ms, then signed and added to the catalog. The FSM matches the LLM bit-for-bit on the in-distribution surface AND runs at deterministic-catalog speed.

Data: Live API distillation against Anthropic Claude Haiku 4.5 on a constrained capability surface

Impact: Capability narrowing as a productized primitive: identify the deterministic shape inside model behavior and replace it with a signed artifact.

Finding

Cross-validation evidence is binary, not statistical

755 hand-authored regex artifacts achieved 100.0000% match-rate against a Python reference oracle on adversarial corpora. This is not "pretty good"; it is provably correct on the corpus by construction. Any divergence is a bug, not a tradeoff.

Data: 755 cores, multi-million-input adversarial cross-validation runs

Impact: Catalog correctness is auditable as a property, not approximated as a metric.

A note for non-technical readers: each finding above is an empirical result from running our catalog against real agents and real models. The mechanisms behind them are described at a high level in the whitepaper; the implementation details are deliberately kept private.