Detecting PII Before It Reaches Your LLM
Personal data sent to a model can leak, persist in logs, or violate compliance rules. Catch it at the door with prompt-level PII detection.
The PII Problem in AI Apps
Every prompt a user submits might contain personal data — a name, an email, a national ID, a credit card. Once that text reaches your model provider, it may be logged, cached, or used in ways you do not fully control. Pre-LLM PII detection lets you decide what crosses that boundary.
What Counts as PII
PII spans a wide range: direct identifiers (names, emails, phone numbers), government IDs (SSN, passport, tax IDs), financial data (card numbers, IBANs), and quasi-identifiers that become identifying in combination. Sprappy Filter scores prompts for PII exposure as one of its 25 categories.
Detection Is Not Trivial
Some PII has clean structure. A credit card number matches a recognizable pattern and passes a Luhn check — the pattern tier catches these in sub-millisecond time with high precision.
Other PII is messy. A name is just words. "Pat Taylor" could be a person or a place. Context matters, and this is where the transformer cascade helps disambiguate the ambiguous middle band.
Block, Sanitize, or Allow
For PII, sanitize is usually the right verdict. You do not want to reject a legitimate support request because it mentions an email address — you want to redact the email and forward the rest.
curl -X POST https://api.sprapp.com/v1/filter \
-H "Content-Type: application/json" \
-d '{"input": "My card is 4111 1111 1111 1111, please help", "mode": "sanitize"}'
The response identifies the PII span so you can redact before forwarding to your model.
Compliance Drivers
GDPR, CCPA, HIPAA, and similar frameworks constrain how personal data moves and is stored. Detecting PII at the prompt boundary gives you a documented control point: you can show auditors exactly where personal data is identified and what happens to it. This maps to OWASP LLM06, Sensitive Information Disclosure.
The Honest Caveats
PII detection is high-recall but not perfect. Unusual formats, transliterated names, and novel identifier types can evade detection. The pattern tier reliably catches structured identifiers — roughly 95% of clear-cut cases — and the transformer tier improves on the fuzzy ones, but you should not treat any filter as a guarantee of zero leakage.
Implementation Pattern
- Send every inbound prompt to https://api.sprapp.com/v1/filter
- On a PII finding, sanitize the identified spans
- Forward the redacted prompt to your model
- Log the verdict, not the raw PII, for your audit trail
PII detection at the door is one of the highest-value, lowest-friction controls you can add to an AI application.