Because your AI agent needs a filter too.
Your AI agent reads your files, sends your emails, talks to your customers, and spends your money. ClawFilters decides what it's allowed to do — and stops it before it crosses the line. Built to the security standard of regulated industries. Free for everyone. Runs on your own hardware. Your data never leaves.
Quietfire AI spent years building toward this architecture in stealth. When these two bodies of research appeared, they gave words to what was already being built — from two independent directions. ClawFilters is the working implementation of both.
"Imagine a disposition dial that goes from fully corrigible…to fully autonomous. Neither extreme is safe. AI agency is incrementally expanded the more trust is established."
Anthropic's model spec names the architecture: a dial between full human control and full agent autonomy. Current AI should sit toward the corrigible end. As trust is established through demonstrated behavior, the dial moves. The framework is clear. What was missing was the mechanism - something that holds the position, measures behavior, and moves the dial deliberately. That is ClawFilters.
Anthropic Model Spec →
"The notion of Service Level Agreement (SLA) for AI agents is still largely open and would require new research efforts to tackle the properties that make AI agents unique."
Jouneaux and Cabot named OversightLevel as a first-class quality metric for AI agent services and proposed a formal DSL for expressing those commitments in machine-readable form. They solved the specification problem - how to say what you promise. ClawFilters solves the enforcement problem - how to guarantee it holds before any tool call executes. ClawFilters expresses its governance commitments in their exact JSON format, using their vocabulary.
Compliant agents run unimpeded. ClawFilters scores every action in real time — if a recently promoted agent begins to drift, the filter catches it before the next tool call executes. No after-the-fact audit. No human required to notice first. The guardrail is at the gate.
These two bodies of work describe the same architecture from different angles. I had been building toward this for some time. When I found them, they gave words to the vision — the philosophical frameworks of Anthropic and Jouneaux & Cabot, enforced at every action an AI agent takes.
Every AI agent starts fully restricted. It earns its way up, one approved action at a time. Each level is a different filter — different tools allowed, different actions blocked, different oversight required. ClawFilters enforces whichever level the agent has earned, on every call.
A trust level is not permanent. ClawFilters scores every action continuously. One violation drops the score. Enough violations trigger automatic demotion back to Quarantine — no matter how high the agent climbed. Promotion requires a human decision. Demotion requires nothing but the filter doing its job. Trust is not declared. It is earned and it can be lost. Read the Agent Autonomy SLA →
Every agent starts at Quarantine — no tools, no external access, no autonomy. It earns its way up one verified action at a time through demonstrated good behavior and your explicit approval. The moment it steps out of bounds, the filter catches it. Demotion is instant. An agent that behaves badly enough goes straight back to Quarantine, no matter how far it climbed. No reset button. No bypass. The filter holds.
Quarantine: All actions require human approval. Read-only tools only. Zero autonomous execution.
Probation: Internal tools allowed. External calls still gated. Write access requires approval.
Resident: Read/write autonomous. High-risk actions (financial, delete, new domains) still gated.
Citizen: Full autonomous operation. Anomaly-flagged actions require approval. Demonstrated reliability.
Agent: Full earned autonomy. Anomalies are advisory only - logged, not gating. Pre-authorized action profile. Trust fully earned.
ClawFilters measures and enforces. HITL gates promote. Promotion is sequential - every step requires an explicit human decision through the approval gate. Demotion is instant and can skip levels. A behavioral score below 50% triggers automatic demotion to Quarantine. Your data never leaves your network unless you authorize it.
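The promotion and demotion rules above can be sketched in a few lines. This is a minimal illustration, not ClawFilters' actual code: the tier names and the 50% threshold come from this page, while `Tier`, `promote`, and `enforce_score` are hypothetical names chosen for the example.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The five trust tiers named on this page, lowest to highest."""
    QUARANTINE = 0
    PROBATION = 1
    RESIDENT = 2
    CITIZEN = 3
    AGENT = 4


# "A behavioral score below 50%" expressed on the 0.0-1.0 scale.
DEMOTION_THRESHOLD = 0.5


def promote(current: Tier, human_approved: bool) -> Tier:
    """Move up exactly one tier, and only with an explicit human decision."""
    if not human_approved or current == Tier.AGENT:
        return current
    return Tier(current + 1)


def enforce_score(current: Tier, score: float) -> Tier:
    """Demote straight to Quarantine when the score falls below the threshold."""
    if score < DEMOTION_THRESHOLD:
        return Tier.QUARANTINE
    return current
```

Note the asymmetry: `promote` can only ever return the next tier up, while `enforce_score` can drop an agent from any tier to Quarantine in one step.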
This is the engine. Every time an AI agent does anything, ClawFilters scores it against five behavioral principles in real time. The score starts at 1.0 and only goes down. It is the number you look at when deciding whether an agent has earned more trust. Not a report generated after the fact. Not a checkbox. A live measurement that opens or closes the gate — before the action executes.
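A score that "starts at 1.0 and only goes down" might look like the sketch below. The penalty sizes and the `BehavioralScore` class are assumptions made for illustration; this page does not say how ClawFilters weights each of the five principles.

```python
from dataclasses import dataclass, field


@dataclass
class BehavioralScore:
    """Hypothetical running score: starts at 1.0 and never increases."""
    value: float = 1.0
    history: list = field(default_factory=list)

    def record(self, action: str, penalty: float) -> float:
        """Apply a non-negative penalty for an action and return the new score."""
        if penalty < 0:
            # A negative penalty would raise the score, which the model forbids.
            raise ValueError("penalty must be non-negative")
        self.value = max(0.0, self.value - penalty)
        self.history.append((action, penalty))
        return self.value
```

Keeping the per-action history alongside the number is what would let a reviewer see which actions caused a drop before deciding on promotion.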
Agents operate autonomously within defined boundaries. Destructive, irreversible, or trust-crossing actions require explicit human approval before execution.
Every agent action is logged to a cryptographic audit chain. Users see what agents did, why, and what they plan to do next. Nothing is hidden.
Agents act within their defined role. Behavioral baselines detect deviations. When uncertain, agents escalate to humans rather than assume.
Data never crosses tenant boundaries. No telemetry, no cloud callbacks. All agent operations run on your own hardware - your data stays yours.
Zero-trust architecture with cryptographic message signing between all agents. Nonce replay protection. Tamper-evident audit chain on every action.
Five principles. Every principle scored at every action. One live number that drives what happens next.
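One common way to build a tamper-evident audit chain like the one mentioned in the principles above is a hash chain, where each entry commits to the one before it. The sketch below assumes SHA-256 and a JSON record layout; both are illustrative choices, and the function names are hypothetical rather than taken from the ClawFilters codebase.

```python
import hashlib
import json

# Placeholder "previous hash" for the first entry in the chain.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a record, linking it to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Changing any earlier record changes its hash, which no longer matches the `prev` value stored in the next entry, so `verify` fails from that point on.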
This is a live simulation of the ClawFilters governance pipeline — the same logic running in production. Pick a trust level to set what the agent has earned. Pick a tool or action it wants to run. Hit submit. The filter either allows it, sends it to a human for approval (that's the HITL gate — a real person approves or rejects it before anything executes), or blocks it outright. When an action is blocked, watch the behavioral score drop in real time.
Trust tiers define what an AI agent is allowed to do autonomously, what requires human approval, and what is blocked outright. Tiers are earned through demonstrated behavior and human authorization - never assigned at setup.
Behavioral Score
Submit a blocked action - watch the score drop.
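The three outcomes the simulation shows (allow, send to a human, block) can be approximated from the tier descriptions on this page. This is a rough sketch under stated assumptions: the `Action` fields, the lowercase tier strings, and the fail-closed default for unknown tiers are all hypothetical, and the real pipeline has more steps than this.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    HITL = "needs_human_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class Action:
    kind: str                # assumed categories: "read", "write", "external"
    high_risk: bool = False  # e.g. financial, delete, new domains
    anomalous: bool = False  # flagged by anomaly detection


def decide(tier: str, action: Action) -> Decision:
    """Map a tier and a requested action to one of the three gate outcomes."""
    if tier == "quarantine":
        # Read-only tools only, and even those need human approval.
        return Decision.HITL if action.kind == "read" else Decision.BLOCK
    if tier == "probation":
        # Internal reads run; writes and external calls are gated.
        if action.kind == "read" and not action.high_risk:
            return Decision.ALLOW
        return Decision.HITL
    if tier == "resident":
        return Decision.HITL if action.high_risk else Decision.ALLOW
    if tier == "citizen":
        return Decision.HITL if action.anomalous else Decision.ALLOW
    if tier == "agent":
        # Anomalies are advisory only at this tier.
        return Decision.ALLOW
    # Assumption: an unrecognized tier fails closed.
    return Decision.BLOCK
```

For example, `decide("resident", Action("write", high_risk=True))` returns `Decision.HITL`, while the same call for an agent at the top tier returns `Decision.ALLOW`.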
AI agent frameworks ship with no governance layer built in. No oversight. No behavioral scoring. No trust tiers. Agents get full capability by default — oversight has to be added by you, deliberately, after the fact. Most people never do. Here is what that looks like.
Leading AI agent frameworks reached massive adoption before any governance layer existed. Every install is ungoverned by default.
Live on the public internet with no authentication. Direct agent access to files, APIs, and execution environments. (Kaspersky, 2025)
Supply-chain attacks in installable agent plugins - credential theft, privilege escalation, silent data exfiltration baked in at install time.
CVE-2026-25253: one request steals auth tokens, disables safety guardrails, escapes the sandbox, and hands over full host control.
ClawFilters doesn't just restrict AI agents - it governs them. You provide strategic direction. The platform provides deterministic enforcement that can’t be prompt-injected, hallucinated away, or bypassed by a clever instruction.
This is the difference: model-level guardrails can be prompt-injected. ClawFilters' enforcement is architectural. Even if an agent produces a malicious instruction, it cannot execute unless the agent's machine identity has the specific, time-scoped rights to perform that action.
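To make "specific, time-scoped rights" concrete, here is one way such a check could work. The `Grant` structure and `is_authorized` function are hypothetical; this page does not describe how ClawFilters actually stores or evaluates machine-identity rights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical right: one agent, one action, one validity window."""
    agent_id: str
    action: str
    not_before: float  # Unix seconds, inclusive
    expires_at: float  # Unix seconds, exclusive


def is_authorized(grants: list, agent_id: str, action: str, now: float) -> bool:
    """True only if some grant covers this agent, this action, and this moment."""
    return any(
        g.agent_id == agent_id
        and g.action == action
        and g.not_before <= now < g.expires_at
        for g in grants
    )
```

The check never looks at what the model said, only at who is asking, what for, and when. That is why an injected instruction cannot talk its way past it.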
These are not third-party claims. ClawFilters runs its own security test suite against itself on every build: injection attacks, infrastructure killed mid-request, API fuzzing with 100,000+ generated payloads, static analysis, and performance under load. Every number below is a result from the filter testing the filter. The bar is high. It clears it every time.
Security · Chaos/Resilience · API Contract · Performance/Load · Static Analysis - all passing. Tested with Schemathesis, Bandit, and pip-audit.
Real governance decisions against a real AI agent. Real kill switches. Real human approval gates — where a human sees the pending action and either approves or rejects it before anything executes. Four short videos. No narration needed.
An AI agent tries to call a blocked tool. ClawFilters evaluates the call at step 4 of the governance pipeline and rejects the action before it executes.
One API call suspends any agent instantly. All subsequent actions are rejected at the governance gate - no re-entry until a human administrator reinstates.
A high-risk action triggers the human-in-the-loop (HITL) gate — a real person reviews the pending action. The reviewer rejects it. The agent receives a denial and cannot proceed.
A gated action awaits human review. The reviewer approves it. The agent proceeds with the confirmed action logged to the cryptographic audit trail.
Full source and governance pipeline at github.com/QuietFireAI/ClawFilters.
Every AI platform asks you to trust their cloud with your most sensitive information. ClawFilters doesn't. All AI processing runs on your own machines. Your encryption keys are yours. Data only leaves your network when you explicitly authorize it — and every outbound request is logged, governed, and auditable. The same protection a law firm or hospital needs is what you get by default. Your data. Your hardware. Your rules.
Client communications, case strategy, and work product stay on your infrastructure. No cloud provider can be subpoenaed for data they never received.
Patient health information is encrypted, de-identified using all 18 HIPAA Safe Harbor identifiers, and never transmitted without explicit authorization.
All AI processing runs on your own machines via Ollama for local inference. No OpenAI. No Google. No data sent to third-party services. Your information stays on your hardware unless you choose otherwise.
The same security stack that clears regulated industry audits runs on your home server. Every line of code is public. Every claim is verifiable. Open source under Apache 2.0 - free for any use, personal or commercial.
The compliance documentation standard was set by regulated industries. You get all of it — whether you're a law firm, a pizza shop, or running it on your home server. These are the documents your customers, partners, and IT teams will eventually ask for. You have them on day one. The bar is high. You clear it automatically.
The security audit report that enterprise customers and their procurement teams require before signing. 64 controls across 5 Trust Service Criteria with management assertion and evidence mapping.
The contract your legal team needs before any customer data touches an AI system. Required under GDPR, HIPAA, and most enterprise vendor agreements. 13-section template with 3 annexes, ready to fill in and sign.
The package you hand to a security firm before they test your system. Saves days of scoping work. Attack surface inventory of 162 endpoints, OWASP Top 10 mapping, scoped test plan for third-party assessors.
Proof that your system can survive and recover from failure - required by most regulated industries and enterprise security reviews. RPO = 24 hours (data loss window), RTO = 15 minutes (recovery time), both verified by an automated test script.
"Who is responsible for what?" - the first question every auditor and customer legal team asks. 12-domain table that answers it clearly: what ClawFilters handles, what you handle, and what you configure together.
Docker Swarm and Kubernetes deployment paths with component HA strategies and data replication matrix.
No SaaS dependencies. No OpenAI, Google Cloud, or other external API calls for core functionality. Your local VRAM, your residential IP, your data sovereignty.
Strong enough for a law firm.
Made for you and me.
The security standard was set by regulated industries. Everybody gets it. That's the point.
A spare PC. A home NAS. A server rack. Wherever Docker runs, ClawFilters runs. No cloud account. No API keys. No subscription. Clone it, install it, and your AI agents are governed. The same three steps for a solo user at home and a firm with fifty machines.
ClawFilters is live on GitHub under Apache 2.0. No sign-up, no waitlist - just clone and go. Full deployment guide here →
A computer, a NAS, a mini-PC in a closet. ClawFilters runs wherever Docker runs. The installer downloads everything you need, including your local AI model via Ollama.
Your AI agents start at Quarantine with restricted privileges. You decide when they earn more. Every action is logged, every decision is yours. That's it.
Get notified of releases and security advisories. No spam, nothing else.
"Claw" refers to AI agents that take actions on your behalf - reading files, calling APIs, executing code, sending messages. These agents are powerful, but without governance they're a security crisis. ClawFilters acts as a governed MCP proxy: your AI agent connects to ClawFilters, and every action is evaluated against trust levels, behavioral scoring, anomaly detection, and approval gates before execution. You control the claw. It doesn't control you.
Every AI agent starts at Quarantine with restricted privileges. Promotion to Probation, Resident, Citizen, and Agent requires explicit human approval and demonstrated behavioral compliance. Demotion is instant and can skip levels - any agent whose behavioral score drops below 50% is automatically demoted to Quarantine. The fifth tier, Agent, represents full earned autonomy: anomalies are advisory only, not gating. Trust is earned sequentially and revoked immediately at any level.
No. ClawFilters ships with Ollama - a local AI model runner that operates entirely on your hardware. Your AI inference never touches OpenAI, Anthropic, Google, or any cloud LLM service. Ollama handles all local inference so your data stays where it belongs. You do not need a cloud API key, a cloud account, or an internet connection once the initial setup is complete. No prompt you send, no data your AI agents process, and no governance decision ever leaves your network. Your encryption keys, your data, your infrastructure. We cannot access your data even if we wanted to.
SOC 2 Type I (64 controls documented), HIPAA/HITECH (full Security Rule mapping), HITRUST CSF (12 domains), CJIS, GDPR, PCI DSS, ABA Model Rules, and FRCP Rule 37(e) for legal hold. Every control maps to a source file and a passing test.
ClawFilters has a kill switch. One API call suspends any AI agent instance immediately. All actions are rejected at step 2 of the governance pipeline - before trust levels, before behavioral scoring, before everything. The agent cannot reinstate itself. Only a human administrator can restore it after review.
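The key property of the kill switch is ordering: the suspension check runs before any later evaluation, so nothing else gets a chance to execute for a suspended agent. The sketch below is an illustrative assumption; the in-memory `suspended` set and the `run_pipeline` function are hypothetical names, not the real API.

```python
# Hypothetical in-memory registry of suspended agent IDs.
suspended = set()


def suspend(agent_id: str) -> None:
    """Kill switch: takes effect on the very next evaluation."""
    suspended.add(agent_id)


def reinstate(agent_id: str, by_human_admin: bool) -> None:
    """Only a human administrator may lift a suspension."""
    if not by_human_admin:
        raise PermissionError("only a human administrator can reinstate an agent")
    suspended.discard(agent_id)


def run_pipeline(agent_id: str, later_checks: list) -> str:
    """Check suspension first, then run the remaining checks in order."""
    if agent_id in suspended:
        # Short-circuit: trust levels and scoring are never consulted.
        return "rejected"
    for check in later_checks:
        if not check(agent_id):
            return "rejected"
    return "passed"
```

Because `run_pipeline` returns before touching `later_checks`, a suspended agent cannot influence trust evaluation or scoring at all, and it has no path to call `reinstate` successfully on its own behalf.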
Those products send your data to their clouds and give agents broad autonomy by default. ClawFilters does neither. Your data stays on your network unless you explicitly authorize otherwise. And every AI agent starts at Quarantine with restricted privileges, earning trust through demonstrated behavior. For anyone handling sensitive data - business records, client communications, personal information - both of those distinctions are the entire point.
Yes. ClawFilters is designed for self-hosted deployment via Docker Compose. It runs on a NAS, a rack server, or a VM. Your local VRAM for inference via Ollama, your residential IP for network identity. No cloud account required.
You'll need basic comfort with installing software. If you've ever set up a home media server, installed an app on a NAS, or followed a step-by-step guide to set up a router, you can run ClawFilters. We're building plain-language setup guides and a guided installer to make this as approachable as possible. The same platform that clears regulated industry audits will run on your home server - and we want both audiences to succeed.
Yes. ClawFilters is open source under the Apache License 2.0. The full codebase - every security rule, every governance engine, every audit mechanism - is public. Use it for any purpose: personal, commercial, production, research. No paywalls, no commercial license required. Enterprise support and consulting are available through Quietfire AI.
The current release is the governance engine: trust tiers, behavioral scoring, kill switch, HITL approval gates, cryptographic audit trail, and the full API. What's next is the interface that makes it approachable without reading API docs. The first build sprint after launch focuses on: a browser-based AI agent dashboard (trust level, behavioral score, violation history, and recent actions in one view), demotion explanation cards (when a score drops, you see exactly which actions caused it and which principle was violated), a guided agent registration flow, and a read-only audit log viewer. The API already exposes everything needed for all of it. The governance engine is done - the dashboard catches up next.
ClawFilters is open source under Apache 2.0. Free for any use, forever. We ship real updates — major releases, security advisories, new capabilities. Drop your email and you’ll hear about it directly. No newsletters. No noise. Just the things that matter.
Or go straight to the source — star it on GitHub and watch from there.