I built an AI security Firewall and made it open source because production apps were leaking SSNs to OpenAI


TL;DR

Sentinel Protocol is an open-source local security proxy for LLM API calls that scans inputs and outputs with 81 engines to prevent PII leaks and attacks, requiring minimal code changes.

Key Takeaways

  • Sentinel Protocol acts as a local proxy between apps and LLM providers, running 81 security engines to block or redact PII and detect injections without cloud calls.
  • It includes features like a two-way PII vault for tokenization, multi-layer injection detection, agentic security for MCP, and real-time streaming redaction to protect data.
  • The tool offers compliance support with OWASP LLM Top 10 mapping, generates evidence reports, and has low overhead (<5ms), making it easy to integrate with existing code.

Tags

ai, agents, mcp, privacy, AI security, open source, PII protection, LLM proxy, compliance

About a year ago I started keeping a list.
Every production AI integration I saw that shipped with zero input validation, zero output scanning, zero audit trail.
The list got long fast.

The pattern was always the same - OpenAI SDK, one API call per user message, return the result.
Clean, fast to build, completely unprotected.

So I spent almost a year building Sentinel Protocol and today I'm open-sourcing it.


What it is

A local security proxy for LLM API calls. It sits between your application and
any LLM provider - OpenAI, Anthropic, Google Gemini, Ollama, DeepSeek, Groq, and others - and runs 81 security engines on every request.

Zero cloud calls for security decisions. Everything runs on your machine. The
audit trail is a plain JSONL file that stays local.


Getting started

npx --yes --package sentinel-protocol sentinel bootstrap --profile paranoid --mode enforce --dashboard

Proxy starts at http://127.0.0.1:8787. Dashboard at http://127.0.0.1:8788.

Change one line in your SDK:

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'http://127.0.0.1:8787/v1',
  defaultHeaders: { 'x-sentinel-target': 'openai' }
});

Everything else stays identical. Your app code doesn't change.

The PII problem

The PII engine handles 40+ pattern types with severity-tiered actions:

| Severity | Examples | Action |
| --- | --- | --- |
| Critical | SSN, credit card, passport | Block (403), never reaches the model |
| High | API keys, AWS credentials, tax ID | Block (403) |
| Medium | Email, phone, physical address | Silently redact to a placeholder |
| Low | IP addresses | Log and pass |
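The severity tiers above boil down to a simple dispatch. Here is a minimal sketch of that idea; the type names and the severity map are illustrative, not Sentinel's actual API:

```typescript
// Hypothetical sketch of severity-tiered PII actions.
type PiiAction = "block" | "redact" | "log";

const SEVERITY: Record<string, "critical" | "high" | "medium" | "low"> = {
  ssn_us: "critical",
  credit_card: "critical",
  aws_secret_key: "high",
  email: "medium",
  ip_address: "low",
};

function actionFor(piiType: string): PiiAction {
  const sev = SEVERITY[piiType] ?? "low";
  if (sev === "critical" || sev === "high") return "block"; // 403, never reaches the model
  if (sev === "medium") return "redact";                    // swap in a placeholder
  return "log";                                             // pass through, audit only
}
```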

When you block an SSN:

{
  "error": "PII_DETECTED",
  "reason": "pii_detected",
  "pii_types": ["ssn_us"],
  "correlation_id": "52360b2d-4b92-4b30-9ace-32fae427c323"
}

The PII never left your machine. The audit log has the timestamp, type, correlation ID, and duration. Your users' data stayed local.
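Your client should expect that 403 and turn it into something a user can act on. A sketch, assuming the field names from the error body shown above (the handler itself is illustrative):

```typescript
// Sketch: surfacing a Sentinel block to the end user instead of crashing.
interface SentinelBlock {
  error: string;
  reason: string;
  pii_types?: string[];
  correlation_id: string;
}

function userMessage(body: SentinelBlock): string {
  if (body.error === "PII_DETECTED") {
    const types = (body.pii_types ?? []).join(", ");
    return `Your message contained sensitive data (${types}) and was not sent. Ref: ${body.correlation_id}`;
  }
  return `Request blocked (${body.reason}). Ref: ${body.correlation_id}`;
}
```

The correlation ID is worth keeping in the message: it lets you find the matching JSONL audit entry later.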

The two-way PII vault goes further. It tokenizes PII before the request leaves (the model sees a reference token, not the real value), then swaps the real value back into the model's response. End to end, the actual value is never transmitted.
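The tokenize/detokenize round trip can be sketched in a few lines. This is a toy illustration of the concept; the token format and in-memory store are assumptions, not Sentinel's implementation:

```typescript
// Minimal sketch of a two-way PII vault: tokenize before egress,
// detokenize after the response is back on your machine.
class PiiVault {
  private store = new Map<string, string>();
  private n = 0;

  tokenize(value: string): string {
    const token = `<<PII_${++this.n}>>`;
    this.store.set(token, value);
    return token; // the model only ever sees this reference
  }

  detokenize(text: string): string {
    let out = text;
    for (const [token, value] of this.store) out = out.split(token).join(value);
    return out; // real value restored locally, after the round trip
  }
}
```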

Injection detection

Three layers, running simultaneously per request:

LFRL classifier: Custom rule language (RULE...WHEN...THEN) plus a learned scoring function. Calibrated confidence 0.0–1.0. Configurable block threshold (default: 0.85). Every rule is inspectable - no black box.

Prompt rebuff: Canary token detection + perplexity scoring. Catches adversarial text that's lexically valid but statistically anomalous.

Semantic scanner: ONNX local embeddings (all-MiniLM-L6-v2) computing cosine similarity against a threat signature corpus. Catches semantically similar injections that don't match known lexical patterns.

The injection-merge layer combines all three signals with configurable weights into a single decision.
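The weighted merge of the three layers can be sketched as follows. The weights and threshold below are placeholders for illustration; only the default block threshold of 0.85 comes from the text above:

```typescript
// Sketch of the injection-merge idea: combine three calibrated 0.0-1.0 scores
// with configurable weights into a single block/pass decision.
interface InjectionSignals {
  lfrl: number;     // rule + learned classifier
  rebuff: number;   // canary token + perplexity
  semantic: number; // embedding similarity vs threat corpus
}

function mergeScores(
  s: InjectionSignals,
  weights = { lfrl: 0.5, rebuff: 0.2, semantic: 0.3 }, // illustrative weights
  blockThreshold = 0.85,
): { score: number; block: boolean } {
  const score =
    s.lfrl * weights.lfrl + s.rebuff * weights.rebuff + s.semantic * weights.semantic;
  return { score, block: score >= blockThreshold };
}
```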

Agentic and MCP security

For teams building with tool-using agents and MCP:

MCP poisoning: A malicious MCP server can return a crafted tool result and redirect the agent's next action. Sentinel's MCP poisoning detector analyzes tool call results for hijacking signals before they influence the agent.

Shadow MCP: Detects unauthorized MCP servers impersonating legitimate ones.

MCP certificate pinning: TLS cert validation against expected fingerprints for known MCP servers.

Swarm protocol: HMAC-authenticated inter-agent messaging. Agents can't impersonate each other in multi-agent setups.
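HMAC-authenticated messaging is a standard construction; a minimal sketch with Node's built-in crypto (key handling and wire format here are assumptions, not the swarm protocol's actual details):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch: each inter-agent message carries an HMAC over its contents,
// so a peer without the shared key cannot forge or tamper with it.
function sign(message: string, sharedKey: string): string {
  return createHmac("sha256", sharedKey).update(message).digest("hex");
}

function verify(message: string, mac: string, sharedKey: string): boolean {
  const expected = Buffer.from(sign(message, sharedKey), "hex");
  const given = Buffer.from(mac, "hex");
  // constant-time comparison avoids leaking how many bytes matched
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```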

Loop breaker: Detects and terminates infinite agent recursion before the budget burns.

Intent drift: Tracks whether an agent's behavior over a session is diverging from its constitutional goal.

Egress scanning: Most security tooling stops at the input. Sentinel scans what comes back out:

Hallucination tripwire: Catches fabricated URLs, nonexistent package imports, numeric contradictions within the same response, and improbable citation patterns. Deterministic. Works directly on output text without needing the input context.

Stego exfil detector: Zero-width characters and invisible Unicode code points are a real exfiltration vector. Tools can embed hidden data in what appears to be clean natural language text. Sentinel scans for it.
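The core of a zero-width scan is a code-point check. A sketch using a small illustrative subset of the invisible characters (a real detector covers many more):

```typescript
// Sketch: flag zero-width / invisible code points that can smuggle hidden
// data inside otherwise clean-looking natural language text.
const INVISIBLES = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function findStego(text: string): { clean: string; hiddenCount: number } {
  const matches = text.match(INVISIBLES) ?? [];
  return { clean: text.replace(INVISIBLES, ""), hiddenCount: matches.length };
}
```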

Streaming (SSE) redaction: In real time, as SSE chunks arrive. Not after the stream ends but during. The transform buffers partial sentences, scans, and either forwards or redacts before the chunk reaches your client.
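The partial-sentence buffering idea can be sketched like this. The redactor callback is a stand-in for Sentinel's real scanners, and the sentence-boundary heuristic is deliberately simplistic:

```typescript
// Sketch: hold streamed text until a sentence boundary, scan the complete
// sentence, then release it - so redaction happens mid-stream, not after it.
function makeStreamRedactor(redact: (s: string) => string) {
  let buffer = "";
  return {
    push(chunk: string): string {
      buffer += chunk;
      const boundary = buffer.lastIndexOf(". ");
      if (boundary === -1) return ""; // no complete sentence yet, keep buffering
      const ready = buffer.slice(0, boundary + 2);
      buffer = buffer.slice(boundary + 2);
      return redact(ready); // only scanned text reaches the client
    },
    flush(): string {
      const rest = redact(buffer); // scan whatever remains at stream end
      buffer = "";
      return rest;
    },
  };
}
```

The subtlety the post mentions is visible even here: a value like an SSN can be split across two chunks, so scanning chunk-by-chunk without buffering would miss it.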

Output classifier: Four categories - toxicity, code execution, hallucination signals, unauthorized disclosure. Each with configurable warn/block thresholds and context-window dampening to keep the FP rate reasonable.

Compliance and governance: Every blocked event gets OWASP LLM Top 10 category mapping and MITRE ATLAS technique attribution automatically. The compliance engine generates SOC2, GDPR, HIPAA, and EU AI Act Article 12 evidence reports on demand.

The forensic debugger lets you replay any blocked request against a changed config. That's useful when you're tuning thresholds: make a change, replay the historical block, see whether the updated config would have still caught it.

Formal verification: TLA+ spec for the security decision pipeline, Alloy spec for policy consistency. These are in the repo. You can run them.

The numbers

  • 52,069 lines of source code
  • 81 security engines
  • 139 test suites, 567 tests, 0 failures
  • 306 linted files, 0 warnings
  • 9 total runtime dependencies
  • <5ms proxy overhead at p95
  • 0 npm audit vulnerabilities
  • OWASP LLM Top 10: 10/10 categories covered

What I got wrong

Egress is harder than ingress. Scanning natural language output has an inherently higher FP rate than scanning structured input patterns. I had to build context windowing and n-gram scoring specifically to make the output classifier useful at the default thresholds.

Streaming was harder than I expected. Getting real-time SSE redaction right required three rewrites. The partial sentence buffering across chunks is subtle.

MCP should have been there from day one. I added most of the agentic security late. Tool-using agents and MCP integrations are the fastest-growing attack surface.

No install — try it now

npx --yes --package sentinel-protocol sentinel bootstrap --profile paranoid --mode enforce --dashboard

Global install

npm install -g sentinel-protocol
sentinel bootstrap --profile paranoid --mode enforce --dashboard

Embed in your app

npm install sentinel-protocol

GitHub: https://github.com/myProjectsRavi/sentinel-protocol

MIT.
No telemetry.
No accounts.

Built by a developer who got tired of the answer being "trust the model."
