Quick Start: Pre-LLM Threat Scoring With Sprappy Filter
Add a Sprappy Filter threat-scoring check in front of your LLM calls in a few steps.
Why Score Before the LLM
Sprappy Filter scores incoming text for threats — prompt injection, abuse, policy violations — before it ever reaches your model. Running this check first means you can reject or flag dangerous input cheaply, without spending an LLM call on it. This tutorial adds a basic check to an existing pipeline.
Step 1: Get a Filter Key
Provision a Sprappy Filter key from your dashboard and store it in an environment variable. The setup steps per language are in the docs at https://doc.sprapp.com.
Step 2: Score Incoming Text
Before you call your LLM, send the user's input to the Filter scoring endpoint. It returns a threat score and category signals. The call is designed to be fast and cheap relative to an LLM invocation, so it fits in front of a hot path.
Step 3: Decide on a Threshold
The score is only useful with a policy attached. Pick a threshold: below it, pass the request through to your LLM; above it, block, flag for review, or downgrade. There is no universal correct threshold — it depends on how costly a false block is versus a missed threat for your application.
Step 4: Handle the Block Path
Decide what a blocked request does. Returning a generic refusal is safest; silently dropping is confusing to users; flagging for human review is good when false positives are costly. Build this path deliberately rather than leaving it implicit.
Step 5: Log and Tune
Log scores and outcomes. Over time, review cases near your threshold — both the blocks that should have passed and the passes that should have blocked. Adjust the threshold based on real traffic, not on guesses. The docs at https://doc.sprapp.com describe useful fields to log.
What This Does and Doesn't Do
Sprappy Filter is a fast first-line screen, not a complete safety guarantee. It reduces obvious malicious input cheaply, but no pre-filter catches everything, and you should still apply downstream safeguards. Treat it as defense in depth, not a single point of trust.
Next Steps
Once basic scoring works, look at combining Filter scores with category-specific handling in the docs at https://doc.sprapp.com — for example, routing suspected injection differently from suspected abuse.