The 25 Threat Categories Every Prompt Should Be Scored Against
From prompt injection to AI-generated content, a tour of the full threat taxonomy Sprappy Filter scores every prompt against.
Why a Broad Taxonomy
Threats to AI applications do not fit in one bucket. A single prompt can be a phishing lure, an exfiltration attempt, and a social-engineering play at once. Scoring against a broad, independent set of categories means you see the full risk profile rather than a single label.
Injection and Manipulation
- Prompt injection — hijacking the model's instructions (OWASP LLM01).
- Social engineering — manipulating a human or agent into harmful action.
- Impersonation — pretending to be someone with authority.
- Business email compromise (BEC) — impersonation aimed at fraudulent transfers.
Data and Secrets
- PII exposure — personal data crossing the boundary.
- Data exfiltration — attempts to extract data the model can reach.
- Credential theft — secrets pasted in or phished out.
- Compliance risk — content that triggers regulatory concern.
Malicious Code and Infrastructure
- Code injection — malicious code or commands.
- Malware — known malicious payloads or signatures.
- Network threat — indicators of network-level attacks.
- File indicator — references to malicious files.
- API abuse — patterns that misuse APIs or exhaust resources.
Fraud and Deception
- Phishing — credential-harvesting lures.
- Quishing — QR-code-based phishing.
- Scam — fraudulent schemes.
- Extortion — threats demanding payment or action.
- Misinformation — knowingly false content.
Harmful Content
- Hate — hateful content targeting groups.
- Harassment — targeted abuse.
- Violence — violent or threatening content.
- NSFW — sexual or explicit material.
- Profanity — coarse language.
Noise and Provenance
- Spam — unsolicited bulk content.
- AI-generated — content likely produced by a model.
Independent Scoring
Each category is scored independently. Sprappy Filter does not collapse them into one number — a prompt can score low on injection but high on PII, and you see both. That granularity lets you set per-category policy.
curl -X POST https://api.sprapp.com/v1/filter \
  -H "Content-Type: application/json" \
  -d '{"input": "your prompt", "explain": true}'
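For readers calling the endpoint from application code, here is a minimal Python sketch of the same request. The endpoint and request body come from the curl call above. The response shape is an assumption: the top-level "scores" object and the category key names are hypothetical and used only for illustration, so check the real schema before relying on them.

```python
import json
import urllib.request

FILTER_URL = "https://api.sprapp.com/v1/filter"  # endpoint from the curl example


def build_filter_request(prompt: str, explain: bool = True) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"input": prompt, "explain": explain}).encode("utf-8")
    return urllib.request.Request(
        FILTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def category_scores(response_body: str) -> dict[str, float]:
    """Extract per-category scores from a response body.

    ASSUMPTION: the response has a top-level "scores" object mapping
    category names to numbers. This shape is hypothetical.
    """
    payload = json.loads(response_body)
    return {name: float(value) for name, value in payload.get("scores", {}).items()}


# Hypothetical response body: low on injection, high on PII, both visible.
sample_response = '{"scores": {"prompt_injection": 0.04, "pii_exposure": 0.91}}'
scores = category_scores(sample_response)
```

To send the request, pass the result of build_filter_request to urllib.request.urlopen and feed the decoded body to category_scores.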
How Detection Splits Across Tiers
Categories with clean structure — credential theft, structured PII, known malware — resolve in the sub-millisecond pattern tier, catching roughly 95% of clear-cut cases. Intent-heavy categories — social engineering, misinformation, harassment — lean on the transformer cascade for the ambiguous middle band, reaching 97.1%.
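The tier split can be pictured with a small Python sketch. This is not Sprappy Filter's implementation. The regexes, category keys, and the stub_model_tier function are all hypothetical stand-ins, chosen only to show a cheap pattern pass resolving structured categories while a slower model pass is consulted for whatever remains.

```python
import re
from typing import Callable

# ASSUMPTION: illustrative category keys, not the product's real identifiers.
ALL_CATEGORIES = ("credential_theft", "pii_exposure", "social_engineering")

# Hypothetical pattern tier: regexes for clearly structured cases.
PATTERNS: dict[str, re.Pattern[str]] = {
    "credential_theft": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # AWS access key ID shape
    "pii_exposure": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like shape
}

ModelTier = Callable[[str, list[str]], dict[str, float]]


def pattern_tier(text: str) -> dict[str, float]:
    """Return a score of 1.0 for every category whose pattern matches."""
    return {name: 1.0 for name, rx in PATTERNS.items() if rx.search(text)}


def stub_model_tier(text: str, categories: list[str]) -> dict[str, float]:
    """Placeholder for a learned classifier; scores every requested category 0.0."""
    return {category: 0.0 for category in categories}


def score(text: str, model_tier: ModelTier) -> dict[str, float]:
    """Resolve what the pattern tier can, then ask the model tier for the rest."""
    scores = pattern_tier(text)
    unresolved = [c for c in ALL_CATEGORIES if c not in scores]
    if unresolved:  # skip the expensive tier when nothing is left to score
        scores.update(model_tier(text, unresolved))
    return scores
```

Because the model tier only receives the unresolved categories, a pattern hit on one category never hides a score for another, which keeps the per-category independence described earlier.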
Using the Breakdown
Route high-PII prompts to sanitize, block clear malware, alert on BEC, and log everything. Scoring against the full taxonomy at https://api.sprapp.com/v1/filter turns a vague "is this safe?" into a precise, actionable risk profile.
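A per-category policy like this can be expressed as a small lookup table. The Python sketch below is one possible shape. The category keys, thresholds, and action names are assumptions made for illustration, and the input is assumed to be a mapping of category name to a score between 0 and 1.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("prompt_filter")

# ASSUMPTION: keys, thresholds, and action names are illustrative only.
POLICY: dict[str, tuple[float, str]] = {
    "pii_exposure": (0.7, "sanitize"),
    "malware": (0.9, "block"),
    "bec": (0.6, "alert"),
}


@dataclass
class Decision:
    """Actions triggered by one prompt's score breakdown."""
    actions: list[str] = field(default_factory=list)

    @property
    def blocked(self) -> bool:
        return "block" in self.actions


def decide(scores: dict[str, float]) -> Decision:
    """Map independent per-category scores to actions; always log the breakdown."""
    logger.info("filter scores: %s", scores)
    decision = Decision()
    for category, (threshold, action) in POLICY.items():
        # Each category is checked on its own; a missing score counts as 0.0.
        if scores.get(category, 0.0) >= threshold:
            decision.actions.append(action)
    return decision
```

Calling decide({"pii_exposure": 0.91, "bec": 0.65}) yields a Decision whose actions are sanitize and alert, with blocked left false because the malware threshold was not met.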