Block, Sanitize, Allow: Designing Around a Ternary Verdict
A binary safe/unsafe verdict is too blunt for real applications. The block/sanitize/allow model gives you a middle path.
Why Binary Falls Short
Many safety systems return a boolean: safe or unsafe. That forces a brutal choice — pass the prompt through untouched, or reject it entirely. Real prompts are rarely that clean. A legitimate support message might contain one email address. A useful document might carry one injected sentence. Binary verdicts make you discard the good with the bad.
The Three Outcomes
A call to the Sprappy Filter endpoint, https://api.sprapp.com/v1/filter, returns one of three actions:
- Allow — the prompt is clean; forward it as-is.
- Sanitize — the prompt is mostly fine but contains an offending span; strip or redact that span and forward the rest.
- Block — the prompt is a clear threat; do not forward it.
This ternary maps onto application logic far more naturally than a boolean.
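One way to make that mapping explicit is to model the verdict as a discriminated union. The sketch below is a minimal illustration, not the documented schema: it assumes the response is JSON with an `action` field, a `categories` list, and a `sanitized` string on sanitize verdicts (the same field names the handler snippet in this article uses). The `parseVerdict` helper is hypothetical.

```typescript
// Hypothetical response shape. The field names `action`, `sanitized`, and
// `categories` are assumptions for illustration, not a documented schema.
type Verdict =
  | { action: "allow"; categories: string[] }
  | { action: "sanitize"; sanitized: string; categories: string[] }
  | { action: "block"; categories: string[] };

// Narrow untyped JSON into a Verdict. Anything unexpected throws, so a
// malformed response is never mistaken for "allow".
function parseVerdict(raw: unknown): Verdict {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("verdict must be an object");
  }
  const r = raw as Record<string, unknown>;
  // Keep only string entries; a missing list becomes an empty one.
  const categories: string[] = Array.isArray(r.categories)
    ? r.categories.filter((c: unknown): c is string => typeof c === "string")
    : [];
  switch (r.action) {
    case "allow":
      return { action: "allow", categories };
    case "block":
      return { action: "block", categories };
    case "sanitize":
      if (typeof r.sanitized !== "string") {
        throw new Error("sanitize verdict is missing its cleaned text");
      }
      return { action: "sanitize", sanitized: r.sanitized, categories };
    default:
      throw new Error(`unknown action: ${String(r.action)}`);
  }
}
```

Because only the sanitize variant carries `sanitized`, the compiler rejects code that reads the cleaned text from an allow or block verdict.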
When Each Applies
Allow is the common case for normal traffic. Most prompts are benign and resolve here through the sub-millisecond pattern fast-path.
Sanitize shines for PII and indirect injection. You keep the legitimate content and remove only the problematic part — redact the credit card, strip the injected instruction.
Block is for unambiguous threats: live credentials, clear malware, direct injection with no salvageable content.
Designing Your Handlers
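function handle(prompt, verdict) {
  switch (verdict.action) {
    case "allow": return forward(prompt);               // clean: pass through untouched
    case "sanitize": return forward(verdict.sanitized); // forward the filter's cleaned text
    case "block": return reject(verdict.categories);    // refuse, reporting what triggered it
    default: throw new Error("unknown action: " + verdict.action); // fail closed
  }
}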
Notice that sanitize returns the cleaned text. Your code does not have to reimplement redaction — it forwards what the filter hands back.
The UX Dimension
The three verdicts let you craft better user experiences. A blocked prompt can show a clear refusal. A sanitized prompt can quietly proceed, or optionally tell the user "we removed an email address before processing." Allow is invisible. This beats a single "request denied" wall.
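The user-facing side of this can be sketched as a small function that turns a verdict into an optional notice. This is an illustration under assumptions: the `Verdict` shape, the `userNotice` name, the `announceRedactions` flag, and the message wording are all hypothetical, not part of the filter's API.

```typescript
// Hypothetical verdict shape; only `action` and `categories` are used here.
type Action = "allow" | "sanitize" | "block";
interface Verdict {
  action: Action;
  categories: string[];
}

// Returns the message to show the user, or null when nothing should be shown.
function userNotice(verdict: Verdict, announceRedactions: boolean = true): string | null {
  switch (verdict.action) {
    case "allow":
      return null; // allow is invisible to the user
    case "sanitize": {
      if (!announceRedactions) return null; // proceed quietly
      const what = verdict.categories.join(", ") || "some content";
      return `We removed ${what} before processing.`;
    }
    case "block":
      return "This request was not processed because it was flagged as unsafe.";
    default: {
      // Compile-time exhaustiveness check: a new action must be handled here.
      const unreachable: never = verdict.action;
      throw new Error(`unhandled action: ${String(unreachable)}`);
    }
  }
}
```

Whether to announce a redaction is a product decision, so the sketch leaves it as a flag rather than hard-coding either behavior.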
Honest Framing
Sanitize is powerful but not magic. If a prompt is adversarial through and through, there is nothing safe to extract — that is a block. And no verdict is infallible; the pattern tier handles clear-cut cases (around 95%) and the transformer cascade handles the ambiguous middle (up to 97.1%), but edge cases exist.
Mapping to Categories
The verdict comes with the categories that triggered it. A block on prompt_injection should be handled differently from a block on credential_theft. Use the category breakdown to route, log, and alert appropriately.
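As a rough sketch of category-based routing, the table below maps each category to a response level and picks the most severe one on a blocked prompt. The category names `prompt_injection` and `credential_theft` come from the text; the severity levels, their assignments, and the `routeBlock` helper are assumptions chosen for illustration.

```typescript
// Severity levels are illustrative, not values defined by the filter.
type Severity = "log" | "alert" | "page";

// Assumed policy: which response each category deserves.
const BLOCK_SEVERITY: { [category: string]: Severity | undefined } = {
  prompt_injection: "alert",
  credential_theft: "page",
};

// Ordering used to compare severities.
const RANK: Record<Severity, number> = { log: 0, alert: 1, page: 2 };

// Pick the most severe response among the categories on a blocked prompt.
// Unknown categories fall back to "log" so they are still recorded.
function routeBlock(categories: string[]): Severity {
  let worst: Severity = "log";
  for (const category of categories) {
    const severity: Severity = BLOCK_SEVERITY[category] ?? "log";
    if (RANK[severity] > RANK[worst]) worst = severity;
  }
  return worst;
}
```

Keeping the policy in a lookup table means a new category can be routed by adding one entry, without touching the handler logic.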
A ternary verdict is a small design choice with large downstream benefits. It lets you keep what is useful and remove what is dangerous, instead of choosing between them.