OpenAI Privacy Filter: Run Local PII Redaction in 10 Minutes

Updated May 15, 2026 — Three big developments in the three weeks since launch: (1) a malicious typosquat (Open-OSS/privacy-filter) hit the Hugging Face #1 trending spot with ~244K downloads in under 18 hours before takedown around May 7 — see the new Security Alert section below for verification steps. (2) OpenMed published two essential community fine-tunes: a medical/clinical variant (55 PII categories) and a multilingual one (16 languages, 54 categories). (3) Adoption hit ~220K downloads on the official repo in the first month plus active Tonic.ai and enterprise pipeline integrations. We also added a third-party benchmark cross-check at the bottom.

OpenAI just released a model that does something its flagship products can’t: it never sends your data anywhere. Privacy Filter, launched on April 23, is an open-weight, Apache-licensed tool that detects and masks personal information in text — and it runs on your laptop. No API calls. No cloud. No usage meter ticking. Drop it into a pipeline, and a customer-support ticket with a name, phone number, and credit card comes out the other side with those fields replaced by placeholders, all in a single pass.

This is a sharper turn than it looks. OpenAI has spent the last two years telling enterprises to trust the cloud. Then, on a Thursday morning, they shipped a 1.5-billion-parameter model to Hugging Face under a license that lets anyone use it commercially, fork it, and fine-tune it — and the community had an MLX port running 24 to 33 times faster on Apple Silicon before the day was out.

If you handle PII at work, whether you’re a developer building on an LLM, a compliance officer reviewing AI pipelines, or an IT lead who’s been told by legal to “figure out the privacy thing,” this is the most interesting release of the week. Here’s what it is, what it actually does, and how to get it running on your machine before your competitors notice.

What OpenAI Privacy Filter actually is

Think of it as a spell-checker for sensitive data. You feed it a block of text. It reads every word. It labels each one either “safe” or “one of eight kinds of private stuff.” You can then keep those labels, blank them out, or swap them for placeholders.

The technical shape is unusual for a small model. It’s a 1.5-billion-parameter sparse mixture-of-experts, with only 50 million parameters active per token. That sparsity keeps per-token compute low, which is how a model with a 128,000-token context window stays light enough to run in a browser via WebGPU. For comparison: the smallest open Llama you’d normally use for text classification sits around 1 to 8 billion dense parameters, and most of them top out at 8K context.

It’s built on the same gpt-oss architecture OpenAI released last year — but repurposed. Instead of generating text left-to-right, it reads the whole passage at once (bidirectional token classification, to use the jargon) and emits a label for every token. Under the hood there are 8 transformer blocks, grouped-query attention with 14 query heads and 2 KV heads, 128 experts with top-4 routing per token, and a constrained Viterbi decoder that stitches the token labels into coherent spans so “John” and “Smith” don’t end up categorized as two different entities.
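
To make the span-stitching step concrete, here is a minimal sketch of constrained Viterbi decoding over BIO-style tags. The tag names, the per-token scores, and the function names are illustrative assumptions, not OpenAI’s implementation. The only point is that forbidding transitions such as O followed by I-PERSON forces the best path to form well-shaped spans.

```python
import math

NEG_INF = -math.inf

def allowed(prev_tag: str, tag: str) -> bool:
    """BIO constraint: I-X may only follow B-X or I-X of the same kind."""
    if tag.startswith("I-"):
        kind = tag[2:]
        return prev_tag in (f"B-{kind}", f"I-{kind}")
    return True  # O and B-* may follow any tag

def viterbi_decode(scores: list[dict[str, float]]) -> list[str]:
    """scores[t][tag] is a per-token log-score; returns the best valid tag path."""
    tags = list(scores[0])
    # A sequence cannot open with an I-* tag.
    best = {t: (NEG_INF if t.startswith("I-") else scores[0][t]) for t in tags}
    backpointers = []
    for step in scores[1:]:
        new_best, pointers = {}, {}
        for tag in tags:
            # Best predecessor among the transitions the constraint permits.
            prev_score, prev_tag = max(
                (best[p], p) for p in tags if allowed(p, tag)
            )
            new_best[tag] = prev_score + step[tag]
            pointers[tag] = prev_tag
        best = new_best
        backpointers.append(pointers)
    # Walk the back-pointers from the best final tag.
    tag = max(best, key=best.get)
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Made-up log-scores for the tokens "John", "Smith", "called".
example_scores = [
    {"O": -0.6, "B-PERSON": -0.9, "I-PERSON": -3.0},
    {"O": -2.5, "B-PERSON": -2.0, "I-PERSON": -0.1},
    {"O": -0.05, "B-PERSON": -4.0, "I-PERSON": -4.0},
]

greedy = [max(step, key=step.get) for step in example_scores]
decoded = viterbi_decode(example_scores)
print(greedy)   # ['O', 'I-PERSON', 'O']  (invalid: I- with no B- before it)
print(decoded)  # ['B-PERSON', 'I-PERSON', 'O']  (one clean two-token span)
```

Per-token argmax picks an impossible sequence here, while the constrained path recovers “John Smith” as a single person span.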

You don’t need to care about any of that to use it. You need to know three things:

It’s free. Apache 2.0 license. Ship it in a product tomorrow, no one will stop you.
It runs locally. The weights live on your machine, the inference happens on your machine, and the output stays on your machine. No data leaves.
It hits 96% F1 on the PII-Masking-300k benchmark — 94% precision, 98% recall. The corrected benchmark version pushes that to 97.43%. Translation: it catches almost everything, and when it does flag something, it’s almost always right.
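
For readers who want to see how the headline number relates to the other two: F1 is the harmonic mean of precision and recall. The short check below uses only the figures quoted above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

headline = f1_score(0.94, 0.98)
print(f"{headline:.4f}")  # 0.9596, which rounds to the quoted 96%
```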

Security Alert: The Open-OSS Typosquat Incident (May 7)

Two weeks after launch, Privacy Filter became the bait in one of the more effective AI supply-chain attacks of 2026. This affects you only if you cloned from a repo named Open-OSS/privacy-filter on Hugging Face, but the verification habits below apply to every open-weight model you’ll pull from now on.

What happened. Around May 7, 2026, a typosquatted repository — Open-OSS/privacy-filter — appeared on Hugging Face. The org name was Open-OSS (not openai), the model card was copy-pasted from the official one nearly verbatim, but the repo included a malicious loader.py and a start.bat that the official OpenAI repo does not ship. The fake hit #1 on Hugging Face’s trending list and pulled ~244,000 downloads in under 18 hours before HF disabled access. Researchers at HiddenLayer and others linked the campaign to “Silver Fox,” a Chinese threat actor with prior npm and PyPI typosquatting history.

What the malware did (Windows only; Linux and macOS users who pulled only the model files were unaffected):

loader.py (Python dropper) fetched base64-encoded PowerShell commands from an attacker domain.
The PowerShell stage downloaded and executed a Rust-based infostealer in the Boxter family (related to the WinOS 4.0 / ValleyRAT lineage).
The infostealer harvested data from Chromium and Firefox browsers, Discord tokens, crypto wallets, SSH keys, VPN configs, and screenshots.
The take was exfiltrated as JSON to attacker-controlled domains, including recargapopular[.]com.

The goal was credential theft from developer and enterprise workstations, not corruption of the AI model itself. There is no evidence that the official openai/privacy-filter weights were compromised or that OpenAI’s own infrastructure was breached. This was opportunistic impersonation riding on a trending name.

If you cloned anything from Open-OSS/privacy-filter (especially start.bat or loader.py) on a Windows machine: treat the system as compromised. Reimage. Rotate every credential — including session cookies, which can bypass MFA. Run a thorough scan and check for persistence.

The verification habit to build, for every model you pull from now on:

Org-name check. The official org is exactly openai. Watch for lookalikes: Open-OSS, open_ai, openai-org, 0penai, etc. Hugging Face will sometimes display the org in a similar font to the model name — read carefully.
No executables. A token-classification model needs .safetensors (or .bin), config.json, tokenizer.json, and a README.md. It does not need .bat, .exe, .ps1, or a custom loader.py you run yourself. If those are in the repo, walk away.
Trust signals. Heavy download counts can be inflated by bots in under 24 hours — they’re not a reliable trust signal on their own. Cross-reference: official OpenAI announcement page, commit history, contributor list, and (when in doubt) the GitHub mirror at github.com/openai/privacy-filter.
Sandbox new models. Run new pulls in a container or VM the first time. Air-gap the test environment from credentials and crypto wallets. The two minutes of friction will save you from a Boxter situation.
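
The first two checks are easy to automate. The sketch below is one possible shape, not an official tool: the function name, the trusted-org set, and the suffix list are assumptions chosen to mirror the habits above. It takes a plain list of filenames, which you could obtain with huggingface_hub’s list_repo_files before downloading anything.

```python
from pathlib import PurePosixPath

TRUSTED_ORGS = {"openai"}  # exact match only; lookalikes must fail
# Assumption: flag .py too, since a classifier repo should not need a loader script.
SUSPICIOUS_SUFFIXES = {".bat", ".exe", ".ps1", ".py"}

def audit_repo(repo_id: str, filenames: list[str]) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    org, _, _name = repo_id.partition("/")
    if org not in TRUSTED_ORGS:
        warnings.append(f"untrusted org: {org!r}")
    for name in filenames:
        if PurePosixPath(name).suffix.lower() in SUSPICIOUS_SUFFIXES:
            warnings.append(f"executable or script in repo: {name}")
    return warnings

fake = audit_repo(
    "Open-OSS/privacy-filter",
    ["config.json", "model.safetensors", "loader.py", "start.bat"],
)
real = audit_repo(
    "openai/privacy-filter",
    ["config.json", "model.safetensors", "tokenizer.json", "README.md"],
)
print(fake)  # three warnings: the org, loader.py, start.bat
print(real)  # []
```

A clean result is not proof of safety. It only means the repo passed two cheap checks, so the sandbox step still applies.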

For the official model, the correct identifier is:

# Verified-correct identifiers
model = "openai/privacy-filter"  # NOT "Open-OSS/privacy-filter"

If you’re already running the official openai/privacy-filter, you’re fine. Carry on.

Sources for this section: Infosecurity Magazine, The Hacker News, CSO Online, BleepingComputer, HiddenLayer research, and Hugging Face’s takedown notice — all linked in the Sources block at the bottom.

The 8 things it detects

Out of the box, Privacy Filter recognizes eight span types:

private_person — names
private_address — street addresses
private_email — email addresses
private_phone — phone numbers
private_date — dates of birth, treatment dates, anything date-shaped in a private context
private_url — URLs that identify a person or private resource
account_number — bank accounts, routing numbers, policy numbers, membership IDs, and credit card numbers
secret — API keys, passwords, tokens

Notice what’s not there: Social Security numbers, medical record numbers, NHS numbers, Brazilian CPFs. OpenAI left those out of the default taxonomy on purpose: they’re region-specific, and the sensible move is to fine-tune the model for your jurisdiction rather than pretend one global label set fits every country. Within 24 hours of release, a developer had already shipped a medical-labels fine-tune covering MRN, NHS, NPI, and CPF, plugged into an iOS app via MLX.

How to run it locally in 10 minutes

You need Python 3.10+, a decent laptop (the model uses about 3 GB of RAM at BF16 precision), and internet access once to download the weights. After that, you’re fully offline.

Step 1: Install

pip install transformers torch

That’s the whole setup. If you’ve used Hugging Face before, you already have this.

Step 2: Load the model

from transformers import pipeline

classifier = pipeline(
    task="token-classification",
    model="openai/privacy-filter",
    aggregation_strategy="simple",  # groups tokens into spans for you
)

First run downloads about 3 GB from Hugging Face. Every run after that skips the download and loads the weights from the cache at ~/.cache/huggingface/hub/.

Step 3: Send it something dirty

Let’s say you’ve got a real-world customer support ticket:

ticket = """
From: Sarah Chen <sarah.chen@acmelogistics.com>
Subject: Refund for order #A-48291

Hi team,

Following up on my call yesterday. My shipment on 2026-03-14 never
arrived. The driver said he delivered to 1247 Maple Ave, Apt 4B,
Seattle WA 98103 but I've been at that address for 6 years and
nothing showed up.

Can you process a refund to my card ending 4532 9871 0042 8855?
You can reach me on (206) 555-0147 during business hours.

Thanks,
Sarah
"""

spans = classifier(ticket)
for span in spans:
    print(f"{span['entity_group']:20} | {span['word']}")

Step 4: Read the output

You get back a list of spans, each with a category and the exact text it flagged:

private_person       | Sarah Chen
private_email        | sarah.chen@acmelogistics.com
private_date         | 2026-03-14
private_address      | 1247 Maple Ave, Apt 4B, Seattle WA 98103
account_number       | 4532 9871 0042 8855
private_phone        | (206) 555-0147
private_person       | Sarah

Every personal detail caught. The subject line “order #A-48291” correctly left alone — it’s an order number, not PII.
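
Each span is a plain dictionary. Hugging Face token-classification pipelines with an aggregation strategy normally include a score and character offsets alongside the label and text, so a sketch like the following can route low-confidence hits to a human. The example spans, their scores, and the 0.80 threshold are made up for illustration, and the score field is assumed to be present for this model.

```python
# Hand-written stand-ins for pipeline output; scores and offsets are illustrative.
example_spans = [
    {"entity_group": "private_person", "score": 0.99, "word": "Sarah Chen", "start": 7, "end": 17},
    {"entity_group": "private_person", "score": 0.62, "word": "Sarah", "start": 412, "end": 417},
]

def split_by_confidence(spans, threshold=0.80):
    """Separate spans the model is sure about from ones worth a human look."""
    confident = [s for s in spans if s["score"] >= threshold]
    needs_review = [s for s in spans if s["score"] < threshold]
    return confident, needs_review

confident, needs_review = split_by_confidence(example_spans)
print([s["word"] for s in confident])     # ['Sarah Chen']
print([s["word"] for s in needs_review])  # ['Sarah']
```

For redaction it is usually safer to mask both groups and only use the second list to decide what a reviewer should double-check.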

Step 5: Actually redact the text

The pipeline gives you span positions. A small loop replaces them with placeholders:

def redact(text, spans):
    result = list(text)
    # sort descending so earlier edits don't shift later indexes
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        label = f"[{span['entity_group'].upper()}]"
        result[span["start"]:span["end"]] = label
    return "".join(result)

clean = redact(ticket, spans)
print(clean)

Output:

From: [PRIVATE_PERSON] <[PRIVATE_EMAIL]>
Subject: Refund for order #A-48291

Hi team,

Following up on my call yesterday. My shipment on [PRIVATE_DATE] never
arrived. The driver said he delivered to [PRIVATE_ADDRESS] but I've been at that address for 6 years and
nothing showed up.

Can you process a refund to my card ending [ACCOUNT_NUMBER]?
You can reach me on [PRIVATE_PHONE] during business hours.

Thanks,
[PRIVATE_PERSON]

That clean version is what you’d feed into a downstream LLM, a training dataset, a support-ticket analytics pipeline, or an internal dashboard. The raw one never leaves your machine.

Running it in a browser

If you’d rather not install Python, the model also runs client-side through Transformers.js:

import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline(
  "token-classification",
  "openai/privacy-filter",
  { device: "webgpu", dtype: "q4" }
);

const spans = await classifier(ticketText, { aggregation_strategy: "simple" });

Q4 quantization shrinks the weights to about 800 MB. WebGPU does the math on the user’s graphics card. The text never hits a server. For a customer-facing redaction tool — say, inside a form before submission — this is the simplest on-device privacy layer you can build.
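
The size figures follow from simple arithmetic on the parameter count. The sketch below is a back-of-the-envelope estimate only: it counts weight bytes and ignores activations, tokenizer files, and any layers a quantized export keeps at higher precision, which is a plausible (but here unverified) reason the Q4 download is closer to 800 MB than 750 MB.

```python
PARAMS = 1.5e9  # total parameter count quoted in this article

def weight_size_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

bf16_gb = weight_size_gb(PARAMS, 16)
q4_gb = weight_size_gb(PARAMS, 4)
print(bf16_gb)  # 3.0
print(q4_gb)    # 0.75
```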

Community Fine-Tunes Worth Knowing About (May 2026)

The base model’s 8 coarse categories cover the global pattern but not jurisdiction-specific identifiers. In the three weeks since launch, the community filled that gap fast. The cluster to know about is OpenMed — a small team that has shipped two essential variants:

OpenMed/privacy-filter-nemotron — fine-tuned on NVIDIA’s Nemotron-PII dataset, expanding the taxonomy to 55 fine-grained PII categories. Healthcare-aware (MRN, NHS, NPI, Brazilian CPF, health-plan beneficiary numbers) and explicitly targeted at HIPAA / GDPR-regulated workloads. If you’re in healthcare or processing patient records, this is your starting point — not the base model.
OpenMed/privacy-filter-multilingual — covers 54 PII categories across 16 languages, trained on the AI4Privacy datasets (pii-masking-200k, pii-masking-400k, open-pii-masking-500k-ai4privacy). Critical if your data isn’t English-only. Multilingual MLX variants are also published — BF16 native plus 8-bit for lower-memory deployment on Apple Silicon.

The pattern is clear: fine-tune on your jurisdiction and your domain, not on the world. The base model’s intentional decision to omit Social Security numbers, NHS numbers, and CPFs from the default taxonomy turned out to be the right call — OpenMed and others did the regional work in days, not months. (Search Hugging Face’s openai/privacy-filter model page for the latest OpenMed collection if you need other regulated identifiers; the list is growing weekly.)

Adoption and Real-World Integrations (Month 1)

A few data points worth knowing:

~220,000 downloads on the official openai/privacy-filter repo in the first month, which points to strong genuine adoption. One caveat on reading such numbers: the malicious typosquat pulled ~244K “downloads” in under 18 hours, more than the official repo managed in weeks. Filter your trust signals accordingly.
Tonic.ai’s benchmarking team is publicly evaluating Privacy Filter as a component in their anonymization pipelines for synthetic test data and redaction workflows. Their independent benchmark (more below) substantially corroborates OpenAI’s F1 claims.
No first-party GA integration yet from AWS, Azure, GCP, Splunk, or major SIEM vendors as of mid-May 2026. The early adopters are smaller data-privacy tooling vendors and AI security consultancies — typical pattern for an Apache-2.0 release in its first 30 days.
Reddit’s r/LocalLLaMA has been the de facto community discussion hub for performance benchmarking, MLX troubleshooting, and (post-May 7) typosquat warnings.

How it stacks up against Presidio, AWS Comprehend, and Google DLP

PII redaction isn’t a new category. Privacy Filter’s pitch is the combination of open weights, on-device execution, modern architecture, and zero cost. Here’s how it reads against the three tools most teams already know.

Feature	OpenAI Privacy Filter	Microsoft Presidio	AWS Comprehend PII	Google DLP
Cost	Free (Apache 2.0)	Free (MIT)	~$0.0001 per 100 chars ($0.0003 minimum per request)	$1–3 per GB inspected, $1/GB de-identification
Runs locally	✅ Yes	✅ Yes	❌ Cloud only	❌ Cloud only
Approach	Learned model (bidirectional transformer + MoE)	Regex + NER + checksums	Hosted ML model	Hosted ML model
Context window	128K tokens (single pass)	Per-sentence	Per-document	Per-document
F1 (PII-Masking-300k)	96% (97.4% corrected)	~85% with NER	Not published	Not published
Fine-tunable	✅ Yes, full weights available	⚠️ Add custom recognizers	❌ No	❌ No
Multi-language	⚠️ English-primary	✅ Multi-language out of the box	✅ Multi-language	✅ Multi-language
Audit trail	You build it	You build it	Built-in via CloudTrail	Built-in via Cloud Logging
SLA / support	None (community)	None (community)	AWS enterprise SLA	Google enterprise SLA

The picture that emerges isn’t “Privacy Filter wins.” It’s that the shape of “which tool should I pick” has shifted.

If you’re already in AWS or Google Cloud and you need compliance logs, managed SLAs, and someone to blame at 3 a.m., the hosted services still earn their money. Privacy Filter does not give you an auditor-ready trail. You’d have to build that yourself.

If you’ve been using Presidio for local redaction, Privacy Filter is probably an upgrade on accuracy. The learned model catches nuanced cases that regex misses, like realizing “my dentist’s office on Birch Ave” contains an address fragment. But you’d lose Presidio’s multi-language support and its mature integrations with image and structured-data pipelines. Many teams end up running both: Presidio for structured regex-heavy data like form fields, Privacy Filter for unstructured prose.
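
Running two detectors side by side mostly comes down to merging their spans. The sketch below shows one way to do that, under stated assumptions: a single email regex stands in for a rule-based tool (it is not Presidio’s API), the model spans are hand-written, and overlaps are resolved by keeping the longest span.

```python
import re

# Deliberately simple email pattern, used only as a stand-in rule-based detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def regex_spans(text):
    """Rule-based spans in the same dict shape the pipeline returns."""
    return [
        {"entity_group": "private_email", "start": m.start(), "end": m.end()}
        for m in EMAIL_RE.finditer(text)
    ]

def merge_spans(*span_lists):
    """Union of all detectors; where spans overlap, keep the longest one."""
    candidates = sorted(
        (s for spans in span_lists for s in spans),
        key=lambda s: (-(s["end"] - s["start"]), s["start"]),
    )
    kept = []
    for span in candidates:
        no_overlap = all(
            span["end"] <= k["start"] or span["start"] >= k["end"] for k in kept
        )
        if no_overlap:
            kept.append(span)
    return sorted(kept, key=lambda s: s["start"])

text = "Mail bob@example.com or call Bob."
# Hypothetical model output for the same text.
model_spans = [
    {"entity_group": "private_email", "start": 5, "end": 20},
    {"entity_group": "private_person", "start": 29, "end": 32},
]
merged = merge_spans(model_spans, regex_spans(text))
print([(s["entity_group"], text[s["start"]:s["end"]]) for s in merged])
# [('private_email', 'bob@example.com'), ('private_person', 'Bob')]
```

The merged list can go straight into the redact function from Step 5, since it only needs entity_group, start, and end.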

If you’ve been sending data to a hosted PII API because you didn’t realize there was a real local option — this is the news. A text sanitizer that stays on your machine is now 10 minutes of setup away.

What it can’t do

OpenAI has been unusually direct about the limits. Worth reading before you build on it.

It’s not a compliance guarantee. The model card literally uses the phrase “redaction aid, not a safety guarantee.” If your use case is HIPAA, GDPR, or any regulated workflow, Privacy Filter is one layer of defense, not the whole thing. The National Law Review’s coverage echoed this: “Privacy risks remain, especially in legal, healthcare, financial, and other regulated settings.”

It misses uncommon names and over-redacts public ones. Try it on a news article about Taylor Swift and it’ll mask her name as private_person. Try it on a customer-support ticket from someone named Aranya Subramanyan and it may miss the surname. Both are fixable with fine-tuning, but out of the box, expect these edge cases.

Context can still re-identify someone. This is the subtle one. If you mask “Sarah Chen” but keep “Director of Product at Acme Logistics since 2021, manages a team of 12 in Seattle,” a motivated re-identifier can still figure out who the subject is. Privacy Filter does not redact job titles, team sizes, employer names, or tenure — those aren’t in the taxonomy. De-identification is a harder problem than PII removal, and nobody’s solved it in 1.5 billion parameters.

It’s English-first. OpenAI notes performance “drops on non-English text, non-Latin scripts.” If you’re redacting Japanese customer service logs, Chinese medical records, or Arabic contracts, fine-tune or use a different tool. Presidio still wins on raw language coverage.

It’s static. The 8 categories are the 8 categories. If you need to detect patient identifiers, tax IDs, or internal code names, you have to fine-tune the model on your own labeled data. OpenAI ships evaluation and fine-tuning scripts in the GitHub repo, but that’s engineering work, not a config flag.

What this means for you

If you’re a developer building on an LLM: This replaces the pre-call redaction step most teams handwave about. Put it in front of your prompt: the raw ticket, support log, or user doc goes in, the redacted version goes to the model, and the model’s response is matched against a redaction map (placeholder to original value) to re-insert real values when needed. Zero API calls to third parties. Zero cost increase. The whole pipeline gets a lot easier to explain to a security reviewer.
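
A redaction map needs placeholders that are unique per span, which the simple redact function in Step 5 does not produce. Here is a minimal sketch of the round trip, assuming non-overlapping spans in the pipeline’s dict shape; the function names, the numbered placeholder format, and the sample model reply are all illustrative.

```python
def redact_with_map(text, spans):
    """Replace each span with a numbered placeholder; return (redacted, mapping)."""
    mapping = {}
    pieces, cursor = [], 0
    for i, span in enumerate(sorted(spans, key=lambda s: s["start"]), start=1):
        placeholder = f"[{span['entity_group'].upper()}_{i}]"
        mapping[placeholder] = text[span["start"]:span["end"]]
        pieces.append(text[cursor:span["start"]])  # untouched text before the span
        pieces.append(placeholder)
        cursor = span["end"]
    pieces.append(text[cursor:])  # tail after the last span
    return "".join(pieces), mapping

def restore(text, mapping):
    """Re-insert the original values wherever a placeholder survived."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

raw = "Sarah Chen called from (206) 555-0147."
# Hand-written stand-ins for pipeline output.
raw_spans = [
    {"entity_group": "private_person", "start": 0, "end": 10},
    {"entity_group": "private_phone", "start": 23, "end": 37},
]
redacted, mapping = redact_with_map(raw, raw_spans)
print(redacted)  # [PRIVATE_PERSON_1] called from [PRIVATE_PHONE_2].

# Pretend this came back from the downstream LLM.
reply = "Thanks [PRIVATE_PERSON_1], we will call [PRIVATE_PHONE_2]."
print(restore(reply, mapping))  # Thanks Sarah Chen, we will call (206) 555-0147.
```

The mapping holds the raw values, so it should stay on the same machine as the original text and never travel with the prompt.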

If you’re a compliance officer: Privacy Filter doesn’t make you compliant. It makes the “what did you do about PII” question answerable in one sentence. Instead of “we trust the vendor,” you can say “we redact at the edge using an on-device open-weight model, and we retain the redaction logs.” That sentence plays better in every audit meeting we’ve ever seen.
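
What a retained redaction log might look like is up to you, since nothing ships with the model. The sketch below is one possible record shape, with every field name an assumption: it stores labels and offsets plus a keyed hash of each value, so the log can show what was masked without containing the PII itself.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

def audit_record(doc_id, text, spans, key: bytes):
    """Describe what was redacted without storing the raw values."""
    return {
        "doc_id": doc_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": "openai/privacy-filter",
        "redactions": [
            {
                "label": s["entity_group"],
                "start": s["start"],
                "end": s["end"],
                # Keyed hash: lets you match repeated values, not read them.
                "value_hmac": hmac.new(
                    key, text[s["start"]:s["end"]].encode("utf-8"), hashlib.sha256
                ).hexdigest(),
            }
            for s in spans
        ],
    }

# Hypothetical document ID, span, and key for illustration.
record = audit_record(
    "ticket-1",
    "Call Sarah Chen",
    [{"entity_group": "private_person", "start": 5, "end": 15}],
    key=b"demo-key-keep-the-real-one-secret",
)
line = json.dumps(record)  # one JSON line per document, ready to append to a log
```

Whether a record like this satisfies a given auditor is a question for your compliance team; the code only shows that the log can be built without copying PII into it.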

If you’re an IT lead wondering whether to care: If anyone at your company pastes customer data into ChatGPT, Privacy Filter is your cheapest way to stop the data leak without also stopping the productivity. Wrap it in a small internal web app that sits on the intranet, let people paste in the dirty version, copy the clean version into whatever AI tool they’re using. Ship it in a day.

If you’ve never touched a model before: Honestly, this is a good one to start with. Unlike chatbot models, you don’t need to prompt-engineer anything. You pass text in, you get labels out. It’s the most deterministic “AI” you’ll ever run. Spin up a Google Colab notebook, paste the snippets from Steps 1 through 3 above, feed it one of your own emails, and watch every piece of personal info get flagged. The exercise takes 10 minutes and teaches more about how modern models work than 5 hours of reading.

In short: For unstructured-text PII redaction in English, Privacy Filter is now the default answer. Free, fast, local, accurate, and OpenAI-trained. Use it as a layer, not a guarantee. And do not tell your compliance team you’ve “solved privacy.” You’ve solved one very specific piece of it, which is still the piece most people get wrong.

Who should use it

Developers wrapping LLM calls for customer-support, sales, HR, or internal knowledge base use cases
Data teams preparing training or fine-tuning datasets from real-world logs
Security and IT teams building a pre-AI redaction checkpoint for employees using ChatGPT, Claude, or Gemini
Startups that don’t have the budget for a $2K-a-month hosted DLP service but do have customer data flowing through their systems
Solo developers and hobbyists who want a default “remove PII before I do anything else” step in personal automation

Probably not the right fit if you need a managed SLA, multi-language coverage as a first requirement, or compliance-grade logging out of the box.

The bottom line

Privacy Filter is small, fast, accurate, and free. It runs in a browser. It runs on a laptop. It runs 24 to 33 times faster on Apple Silicon with the community MLX port. It handles 128K tokens in a single pass. And for the cluster of use cases where “English unstructured text” and “keep the data on this machine” intersect, it’s still the best tool available. Independent Tonic.ai benchmarks released since launch support that against regex and NER baselines such as Presidio: Privacy Filter scored 0.92-0.99 F1 across realistic textual PII conditions, while the baselines scored 0.18-0.65 in the same tests. That is outside data consistent with the 96% headline number, not just OpenAI’s own. Comprehend and Google DLP publish no comparable F1, so that comparison remains open.

The interesting part isn’t the model. It’s that OpenAI shipped something that explicitly reduces their own revenue — every call to Privacy Filter is a call that doesn’t go to their API. That choice signals a shift in how the commercial AI labs are thinking about the trust layer. Not every part of a privacy-conscious pipeline needs to be paid, hosted, and cloud-native. Some parts are better served by a small model that stays on your machine.

One thing the May 7 typosquat changed: the “casually pip install and from_pretrained whatever the top result says” era of pulling open-weight models is over. The two minutes you spend verifying the org name and screening for executable files are now part of the cost of using open weights responsibly. The good news: after you’ve verified the legit openai/privacy-filter once, every subsequent pull is automatic.

Download it (from openai/privacy-filter, not Open-OSS/privacy-filter), run the 10-minute setup, try it on one real document from your own inbox, and see what it catches. If it catches things you’d rather not know were sitting in cleartext in your email — which it will — you’ve just discovered the first use case.

Sources:

Updates added in May 2026:

Malicious Hugging Face Repository Typosquats OpenAI — Infosecurity Magazine
HiddenLayer analysis: Open-OSS/privacy-filter typosquat campaign (linked the attack to “Silver Fox” Chinese threat actor with npm/PyPI history)
OpenAI Releases Privacy Filter: 1.5B-Parameter Open-Source PII Redaction Model — MarkTechPost
OpenMed/privacy-filter-nemotron — Hugging Face (medical / 55-category fine-tune)
OpenMed/privacy-filter-multilingual — Hugging Face (16-language fine-tune)
Tonic.ai benchmark of Privacy Filter vs Presidio / NER baselines (independent F1 validation)
r/LocalLLaMA — Privacy Filter discussion threads (community testing, MLX troubleshooting, typosquat warnings)