AI Security | 2025-12-18 | 8 min read

The 25 Threat Categories Every Prompt Should Be Scored Against

From prompt injection to AI-generated content, a tour of the full threat taxonomy Sprappy Filter scores every prompt against.

Tags: threat categories, threat taxonomy, AI security, Sprappy Filter, content classification

Why a Broad Taxonomy

Threats to AI applications do not fit in one bucket. A single prompt can be a phishing lure, an exfiltration attempt, and a social-engineering play at once. Scoring against a broad, independent set of categories means you see the full risk profile rather than a single label.

Injection and Manipulation

Prompt injection — hijacking the model's instructions (OWASP LLM01).
Social engineering — manipulating a human or agent into harmful action.
Impersonation — pretending to be someone with authority.
Business email compromise (BEC) — impersonation aimed at fraudulent transfers.

Data and Secrets

PII exposure — personal data crossing the boundary.
Data exfiltration — attempts to extract data the model can reach.
Credential theft — secrets pasted in or phished out.
Compliance risk — content that triggers regulatory concern.

Malicious Code and Infrastructure

Code injection — malicious code or commands.
Malware — known malicious payloads or signatures.
Network threat — indicators of network-level attacks.
File indicator — references to malicious files.
API abuse — patterns that misuse APIs or exhaust resources.

Fraud and Deception

Phishing — credential-harvesting lures.
Quishing — QR-code-based phishing.
Scam — fraudulent schemes.
Extortion — threats demanding payment or action.
Misinformation — knowingly false content.

Harmful Content

Hate — hateful content targeting groups.
Harassment — targeted abuse.
Violence — violent or threatening content.
NSFW — sexual or explicit material.
Profanity — coarse language.

Noise and Provenance

Spam — unsolicited bulk content.
AI-generated — content likely produced by a model.
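
For anyone wiring this taxonomy into configuration or tests, the six groups above can be written down as a small lookup table. This is only a sketch: the snake_case identifiers below are hypothetical labels chosen for illustration, not confirmed field names from the Sprappy Filter API.

```python
from typing import Dict, List

# Hypothetical identifiers for the 25 categories, grouped as in the article.
# These names are assumptions for illustration, not confirmed API fields.
TAXONOMY: Dict[str, List[str]] = {
    "injection_and_manipulation": [
        "prompt_injection", "social_engineering", "impersonation", "bec",
    ],
    "data_and_secrets": [
        "pii_exposure", "data_exfiltration", "credential_theft", "compliance_risk",
    ],
    "malicious_code_and_infrastructure": [
        "code_injection", "malware", "network_threat", "file_indicator", "api_abuse",
    ],
    "fraud_and_deception": [
        "phishing", "quishing", "scam", "extortion", "misinformation",
    ],
    "harmful_content": [
        "hate", "harassment", "violence", "nsfw", "profanity",
    ],
    "noise_and_provenance": ["spam", "ai_generated"],
}

# Flat list of every category, in article order.
ALL_CATEGORIES: List[str] = [c for cats in TAXONOMY.values() for c in cats]

# Reverse lookup: category -> the group it belongs to.
GROUP_OF: Dict[str, str] = {c: g for g, cats in TAXONOMY.items() for c in cats}

print(len(ALL_CATEGORIES), "categories in", len(TAXONOMY), "groups")
```

A table like this makes it easy to reject a policy file that names a category the taxonomy does not contain.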

Independent Scoring

Each category is scored independently. Sprappy Filter does not collapse them into one number — a prompt can score low on injection but high on PII, and you see both. That granularity lets you set per-category policy.

curl -X POST https://api.sprapp.com/v1/filter \
  -H "Content-Type: application/json" \
  -d '{"input": "your prompt", "explain": true}'
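
To make the per-category idea concrete, here is a minimal sketch of applying separate thresholds to independent scores. It assumes the response can be reduced to a mapping from category name to a score between 0 and 1; that shape, the category names, and the threshold values are all assumptions for illustration, since the article does not document the response format.

```python
from typing import Dict, List


def flagged_categories(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Return the categories whose score meets or exceeds its own threshold.

    Categories without an explicit threshold fall back to `default`.
    """
    return sorted(
        cat for cat, score in scores.items()
        if score >= thresholds.get(cat, default)
    )


# Hypothetical scores: low on injection, high on PII, middling on profanity.
scores = {"prompt_injection": 0.04, "pii_exposure": 0.91, "profanity": 0.55}

# Hypothetical policy: strict on PII, lenient on profanity.
thresholds = {"pii_exposure": 0.7, "profanity": 0.9}

print(flagged_categories(scores, thresholds))  # → ['pii_exposure']
```

Because nothing is collapsed into one number, the low injection score and the high PII score both stay visible, and each is judged against its own cutoff.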

How Detection Splits Across Tiers

Categories with clean structure — credential theft, structured PII, known malware — resolve in the sub-millisecond pattern tier, catching roughly 95% of clear-cut cases. Intent-heavy categories — social engineering, misinformation, harassment — lean on the transformer cascade for the ambiguous middle band, reaching 97.1%.
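
The tiering described above can be sketched as a two-step cascade: cheap patterns first, a model only when the patterns find nothing. This is a simplified illustration, not Sprappy Filter's implementation. The two regexes, the category names, and the `model_tier` callable (a stand-in for the transformer cascade) are all assumptions, and the sketch stops at the first tier on any pattern hit, which a real system may not do.

```python
import re
from typing import Callable, Dict

Scores = Dict[str, float]

# Illustrative patterns only; a real pattern tier would carry many more.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    # Shape of an AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
    "credential_theft": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Shape of a US Social Security number: 3-2-4 digits.
    "pii_exposure": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def pattern_tier(text: str) -> Scores:
    """Score structured categories with regexes; a match counts as a certain hit."""
    return {cat: 1.0 for cat, rx in PATTERNS.items() if rx.search(text)}


def score(text: str, model_tier: Callable[[str], Scores]) -> Scores:
    """Try the cheap pattern tier first and call the model only if nothing matched."""
    hits = pattern_tier(text)
    if hits:
        return hits  # clear-cut case, resolved without the model
    return model_tier(text)  # ambiguous case, handed to the slower tier


def stub_model(text: str) -> Scores:
    """Hypothetical placeholder for a transformer classifier."""
    return {"social_engineering": 0.8}


print(score("my key is AKIAABCDEFGHIJKLMNOP", stub_model))
print(score("please wire the funds before noon", stub_model))
```

The design point is cost: the model is only paid for on prompts the patterns cannot settle.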

Using the Breakdown

Route high-PII prompts to sanitize, block clear malware, alert on BEC, and log everything. The full taxonomy at https://api.sprapp.com/v1/filter turns a vague "is this safe?" into a precise, actionable risk profile.
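
The routing rules above can be sketched as a small function that maps a score breakdown to a list of actions. The category names, the single 0.8 cutoff, and the action strings are assumptions for illustration; in practice each category would likely get its own threshold.

```python
from typing import Dict, List


def route(scores: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Map per-category scores to actions: log always, then block, sanitize, alert."""
    actions = ["log"]  # every prompt is logged, whatever it scores
    if scores.get("malware", 0.0) >= threshold:
        actions.append("block")  # clear malware is stopped outright
    if scores.get("pii_exposure", 0.0) >= threshold:
        actions.append("sanitize")  # high-PII prompts go to a redaction step
    if scores.get("bec", 0.0) >= threshold:
        actions.append("alert")  # BEC signals notify a human
    return actions


print(route({"pii_exposure": 0.93, "bec": 0.12}))  # → ['log', 'sanitize']
print(route({"malware": 0.99, "bec": 0.85}))       # → ['log', 'block', 'alert']
```

Returning a list rather than a single verdict keeps the actions as independent as the scores that trigger them.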

Written by Sprappy Filter Team

