Inside the Sub-Millisecond Fast-Path
How Sprappy Filter scores most prompts in under a millisecond — and why that latency budget changes what you can build.
The Latency Budget Problem
Every control you add to the request path spends latency. If filtering costs 200ms, teams disable it for latency-sensitive features. The only filter that stays installed is one fast enough to forget about. That is the design goal of the sub-millisecond fast-path.
What "Fast-Path" Means
The fast-path is the pattern tier. It applies deterministic pattern matching across the threat categories and returns a verdict without any neural-network inference. For the large fraction of traffic where the answer is clear-cut, this resolves in sub-millisecond time.
Why Patterns Are Fast
Pattern matching is a well-understood, highly optimized operation. Compiled pattern automata scan text in linear time. There is no model to load, no inference to run, no GPU to wait on. The work is bounded by the length of the input, not the size of a network.
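To make the idea concrete, here is a minimal sketch of a pattern tier, not Sprappy Filter's actual implementation. The category names and patterns are hypothetical, chosen only for illustration. One caveat: Python's built-in `re` module is a backtracking engine and does not guarantee linear-time scanning; automaton-based engines such as RE2 do. The sketch shows the shape of the approach: compile once, then scan each input in a single pass.

```python
import re
from typing import Optional

# Hypothetical categories and patterns, for illustration only.
PATTERNS = {
    "prompt_injection": r"ignore (?:all )?(?:previous|prior) instructions",
    "secret_exfiltration": r"(?:reveal|print|show) (?:your|the) system prompt",
}

# Compile once at startup: a single alternation with one named group per
# category, so one pass over the text checks every category.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items()),
    re.IGNORECASE,
)


def scan(text: str) -> Optional[str]:
    """Return the name of the first matching category, or None."""
    match = COMBINED.search(text)
    return match.lastgroup if match else None
```

Calling `scan("Please IGNORE previous instructions")` returns `"prompt_injection"`, while a benign prompt returns `None`. Nothing is loaded or allocated per request beyond the match itself, which is why the cost tracks input length.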
The Cascade Decision
The fast-path is not the whole story — it is the first stage of a cascade. When the pattern tier returns a confident verdict, that is the answer. When confidence is low or signals conflict, the prompt escalates to the transformer cascade for the ambiguous middle band.
Prompt -> Pattern fast-path (sub-ms)
  |-- confident -> verdict
  |-- uncertain -> Transformer cascade -> verdict
This means you pay transformer latency only on the prompts that genuinely need it.
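The routing decision can be sketched in a few lines. Everything here is an assumption made for illustration: the `Verdict` shape, the `0.9` threshold, and the two toy scoring functions stand in for the real pattern tier and transformer cascade, whose interfaces are not described in this article.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A scorer takes a prompt and returns (label, confidence in [0, 1]).
Scorer = Callable[[str], Tuple[str, float]]


@dataclass
class Verdict:
    label: str         # e.g. "allow" or "block"
    confidence: float  # confidence reported by the tier that answered
    tier: str          # "pattern" or "transformer"


def cascade(prompt: str, fast_path: Scorer, slow_path: Scorer,
            threshold: float = 0.9) -> Verdict:
    """Answer from the fast path when it is confident, otherwise escalate."""
    label, confidence = fast_path(prompt)
    if confidence >= threshold:
        return Verdict(label, confidence, "pattern")
    # Only uncertain prompts pay for the slower stage.
    label, confidence = slow_path(prompt)
    return Verdict(label, confidence, "transformer")


def toy_fast_path(prompt: str) -> Tuple[str, float]:
    """Hypothetical stand-in for the pattern tier."""
    lowered = prompt.lower()
    if "ignore previous instructions" in lowered:
        return ("block", 0.99)   # clear-cut threat
    if "system prompt" in lowered:
        return ("allow", 0.5)    # weak signal: not confident either way
    return ("allow", 0.95)       # clear-cut benign


def toy_slow_path(prompt: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a model call."""
    return ("block", 0.8)
```

With these stubs, `cascade("hello", toy_fast_path, toy_slow_path)` is answered by the pattern tier, while a prompt mentioning a system prompt falls below the threshold and is answered by the second stage.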
What This Enables
A sub-millisecond fast-path changes what is feasible:
- Filter on every keystroke in an autocomplete flow
- Score streaming chunks without stalling the stream
- Run filtering inline in a serverless function without blowing the cold-start budget
- Deploy at the edge where round trips are expensive
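Scoring a stream has one wrinkle worth showing: a pattern can straddle a chunk boundary. A minimal sketch, assuming a single hypothetical pattern and an assumed overlap size, keeps a short tail of earlier text and scans it together with each new chunk. This is one way to do it, not a description of how Sprappy Filter handles streams.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern, for illustration only.
PATTERN = re.compile(r"ignore previous instructions", re.IGNORECASE)

# Characters carried over between chunks (an assumed value). It should be
# at least the longest expected match length minus one.
OVERLAP = 64


def score_stream(chunks: Iterable[str]) -> Iterator[Tuple[str, bool]]:
    """Yield (chunk, flagged) pairs without buffering the whole stream."""
    tail = ""
    for chunk in chunks:
        window = tail + chunk
        # Count only matches that end inside the new chunk, so a match that
        # sits entirely in the carried-over tail is not reported twice.
        flagged = any(m.end() > len(tail) for m in PATTERN.finditer(window))
        yield chunk, flagged
        tail = window[-OVERLAP:]
```

Each chunk is yielded as soon as it has been scanned, so the stream never waits on more than one chunk plus the small overlap window.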
The Offline Angle
Because the fast-path is pattern-based, it ships as the free offline engine — WASM or native. You can run the sub-millisecond tier entirely on-device with no network call. Clear-cut threats get caught at the edge; only the ambiguous ones reach the hosted cascade at https://api.sprapp.com/v1/filter.
Honest Performance Claims
Sub-millisecond describes the pattern fast-path on typical inputs. Very long inputs take longer simply because there is more text to scan. The 95% figure covers only the clear-cut cases that the fast-path resolves on its own; the ambiguous cases that escalate to the transformer cascade are not sub-millisecond, by design.
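A quick back-of-the-envelope calculation shows how the two tiers combine. The sketch below assumes the 95% is read as the share of traffic resolved by the fast-path, and the timings (0.5 ms and 40 ms) are hypothetical placeholders, not measured Sprappy Filter numbers.

```python
def expected_latency_ms(fast_ms: float, cascade_ms: float,
                        resolved_fraction: float) -> float:
    """Mean per-prompt latency when every prompt pays the fast-path cost
    and only the unresolved share also pays the cascade cost."""
    if not 0.0 <= resolved_fraction <= 1.0:
        raise ValueError("resolved_fraction must be between 0 and 1")
    return fast_ms + (1.0 - resolved_fraction) * cascade_ms


# Hypothetical timings: 0.5 ms fast-path, 40 ms cascade, 95% resolved early.
print(round(expected_latency_ms(0.5, 40.0, 0.95), 3))  # 2.5
```

Under those assumed numbers the mean lands around 2.5 ms, far closer to the fast-path than to the cascade. The mean hides the tail, though: with 5% of prompts escalating, the 99th-percentile latency is still set by the cascade.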
The Tradeoff in One Sentence
Spend microseconds resolving the obvious 95%, and reserve real compute for the hard middle band. That is what keeps a filter fast enough to leave on everywhere.