Guardrails · 2025-11-20 · 7 min read

Block, Sanitize, Allow: Designing Around a Ternary Verdict

A binary safe/unsafe verdict is too blunt for real applications. The block/sanitize/allow model gives you a middle path.

Tags: guardrails, verdict model, sanitization, AI security, Sprappy Filter

Why Binary Falls Short

Many safety systems return a boolean: safe or unsafe. That forces a brutal choice — pass the prompt through untouched, or reject it entirely. Real prompts are rarely that clean. A legitimate support message might contain one email address. A useful document might carry one injected sentence. Binary verdicts make you discard the good with the bad.

The Three Outcomes

Sprappy Filter returns one of three actions from https://api.sprapp.com/v1/filter:

Allow — the prompt is clean; forward it as-is.
Sanitize — the prompt is mostly fine but contains an offending span; strip or redact that span and forward the rest.
Block — the prompt is a clear threat; do not forward it.

This ternary maps onto application logic far more naturally than a boolean.
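One way to carry the three outcomes into application code is a discriminated union, so the cleaned text is only reachable on the sanitize branch. The TypeScript sketch below assumes a response shape inferred from the fields this article uses (action, sanitized, categories). The Verdict type, the parseVerdict helper, and the idea that categories is a list of strings are illustrative assumptions, not the documented schema.

```typescript
// Hypothetical shape, inferred from the fields this article uses
// (action, sanitized, categories). The real response schema may differ.
type Verdict =
  | { action: "allow"; categories: string[] }
  | { action: "sanitize"; sanitized: string; categories: string[] }
  | { action: "block"; categories: string[] };

// Validate an untrusted JSON value before the application branches on it.
// Anything that does not match one of the three outcomes throws, so an
// unexpected response is never treated as an implicit "allow".
function parseVerdict(raw: unknown): Verdict {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("verdict is not an object");
  }
  const v = raw as Record<string, unknown>;

  // Assumption: categories is a list of strings; treat it as empty if absent.
  const categories = Array.isArray(v.categories)
    ? v.categories.filter((c): c is string => typeof c === "string")
    : [];

  switch (v.action) {
    case "allow":
      return { action: "allow", categories };
    case "block":
      return { action: "block", categories };
    case "sanitize":
      if (typeof v.sanitized !== "string") {
        throw new Error("sanitize verdict is missing its sanitized text");
      }
      return { action: "sanitize", sanitized: v.sanitized, categories };
    default:
      throw new Error(`unknown verdict action: ${String(v.action)}`);
  }
}
```

Rejecting anything outside the three known actions keeps the application failing closed if the response ever changes shape.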

When Each Applies

Allow is the common case for normal traffic. Most prompts are benign and resolve here through the sub-millisecond pattern fast-path.

Sanitize shines for PII and indirect injection. You keep the legitimate content and remove only the problematic part — redact the credit card, strip the injected instruction.

Block is for unambiguous threats: live credentials, clear malware, direct injection with no salvageable content.

Designing Your Handlers

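switch (verdict.action) {
  case "allow":    return forward(prompt);             // clean: pass through unchanged
  case "sanitize": return forward(verdict.sanitized);  // forward the filter's cleaned text
  case "block":    return reject(verdict.categories);  // refuse, reporting what triggered it
  default:         throw new Error(`Unexpected verdict action: ${verdict.action}`);  // fail closed
}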

Notice that a sanitize verdict carries the cleaned text in verdict.sanitized. Your code does not have to reimplement redaction; it forwards what the filter hands back.

The UX Dimension

The three verdicts let you craft better user experiences. A blocked prompt can show a clear refusal. A sanitized prompt can quietly proceed, or optionally tell the user "we removed an email address before processing." Allow is invisible. This beats a single "request denied" wall.
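The mapping from verdict to user experience can be sketched as a small pure function. This is a minimal TypeScript illustration. The Verdict shape is inferred from the fields used in this article, and UserOutcome, toUserOutcome, the notifyOnSanitize flag, and the message wording are all hypothetical placeholders.

```typescript
// Hypothetical verdict shape, inferred from the fields used in this article.
type Verdict =
  | { action: "allow"; categories: string[] }
  | { action: "sanitize"; sanitized: string; categories: string[] }
  | { action: "block"; categories: string[] };

// What the UI should do with a verdict: whether to proceed, and what
// (if anything) to tell the user. The wording is placeholder copy.
interface UserOutcome {
  proceed: boolean;
  notice: string | null;
}

function toUserOutcome(verdict: Verdict, notifyOnSanitize = false): UserOutcome {
  switch (verdict.action) {
    case "allow":
      // Invisible to the user.
      return { proceed: true, notice: null };
    case "sanitize":
      // Proceed quietly by default; optionally disclose the redaction.
      return {
        proceed: true,
        notice: notifyOnSanitize
          ? "We removed some sensitive content before processing your request."
          : null,
      };
    case "block":
      // A clear refusal instead of a silent failure.
      return { proceed: false, notice: "We can't process this request." };
  }
}
```

Keeping this separate from the forwarding logic lets product teams change the copy, or switch the sanitize notice on, without touching the security path.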

Honest Framing

Sanitize is powerful but not magic. If a prompt is adversarial through and through, there is nothing safe to extract — that is a block. And no verdict is infallible; the pattern tier handles clear-cut cases (around 95%) and the transformer cascade handles the ambiguous middle (up to 97.1%), but edge cases exist.

Mapping to Categories

The verdict comes with the categories that triggered it. A block on prompt_injection should be handled differently from a block on credential_theft. Use the category breakdown to route, log, and alert appropriately.
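Category-based routing can be as simple as a lookup table. The TypeScript sketch below uses the two category names mentioned above. The Route values, the severity ordering, and the choice of which category maps to which route are assumptions made for illustration, not recommendations from the filter itself.

```typescript
// Illustrative routes, ordered from most to least severe (an assumption).
type Route = "page_security" | "log_and_count" | "log_only";
const PRIORITY: Route[] = ["page_security", "log_and_count", "log_only"];

// Only prompt_injection and credential_theft are named in this article;
// any other category falls back to the least severe route.
const ROUTES = new Map<string, Route>([
  ["credential_theft", "page_security"],
  ["prompt_injection", "log_and_count"],
]);

// Pick the most severe route among all categories attached to a verdict.
function routeFor(categories: string[]): Route {
  let best: Route = "log_only";
  for (const category of categories) {
    const route = ROUTES.get(category) ?? "log_only";
    if (PRIORITY.indexOf(route) < PRIORITY.indexOf(best)) {
      best = route;
    }
  }
  return best;
}
```

A verdict can carry more than one category, so the function resolves to the most severe route rather than the first match.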

A ternary verdict is a small design choice with large downstream benefits. It lets you keep what is useful and remove what is dangerous, instead of choosing between them.

Written by SPRAPP Team
