Named Entity Recognition
How to extract people, organizations, dates, locations, and amounts from unstructured text — NER methods, tools, and real-world applications.
Finding the Needles in the Haystack
🔄 Lesson 4 covered text classification — labeling entire documents. But sometimes you need to go deeper: not “what is this document about?” but “what specific things does it mention?” That’s named entity recognition.
NER identifies and classifies specific items in text — people, organizations, locations, dates, monetary amounts, and more. It’s the difference between knowing a document discusses a lawsuit and knowing it mentions “John Smith,” “Acme Corp,” “$2.3 million,” and “March 15, 2026.”
What NER Extracts
Standard NER recognizes these entity types:
| Entity Type | Label | Examples |
|---|---|---|
| Person | PER | “Elon Musk,” “Dr. Sarah Chen” |
| Organization | ORG | “Google,” “United Nations,” “MIT” |
| Location/GPE | LOC/GPE | “New York,” “Japan,” “the Amazon” |
| Date/Time | DATE/TIME | “March 2026,” “last Tuesday,” “3pm” |
| Money | MONEY | “$4.5 million,” “€200,” “¥10,000” |
| Percentage | PERCENT | “15%,” “two-thirds” |
| Product | PRODUCT | “iPhone 17,” “Model Y” |
These are the standard types. Domain-specific NER adds specialized categories — DRUG, DISEASE, and GENE for biomedical text; COURT, STATUTE, and JURISDICTION for legal documents.
How NER Works
NER is fundamentally a sequence labeling task. Each word (token) in the text gets a label indicating whether it’s part of an entity and, if so, what type.
BIO tagging is the standard format:
- B- = Beginning of an entity
- I- = Inside (continuation of) an entity
- O = Outside (not an entity)
Example: “Barack Obama visited New York City”
| Token | Label |
|---|---|
| Barack | B-PER |
| Obama | I-PER |
| visited | O |
| New | B-LOC |
| York | I-LOC |
| City | I-LOC |
The model’s job: assign the correct BIO label to every token in the text.
✅ Quick Check: Why does NER use BIO tags instead of simply labeling each word as PERSON, LOCATION, or NONE? Because entities span multiple words. Without B/I distinction, the model can’t tell where one entity ends and another begins. In “New York Times Building,” is it one entity or two? BIO makes it clear: “New” = B-ORG, “York” = I-ORG, “Times” = I-ORG, “Building” = O (or B-FAC for facility). The B tag signals “a new entity starts here.”
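The B/I distinction can be made concrete with a small decoder that groups BIO-tagged tokens back into entity spans. A minimal pure-Python sketch (the tokens and labels reuse the example above; this is illustrative, not any particular library's decoder):

```python
def decode_bio(tokens, labels):
    """Group BIO-labeled tokens into (entity_text, entity_type) spans."""
    entities, current_tokens, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):          # B- signals "a new entity starts here"
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)    # continuation of the open entity
        else:                               # O, or a stray I- tag
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:                      # flush the last open entity
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Barack", "Obama", "visited", "New", "York", "City"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
print(decode_bio(tokens, labels))
# → [('Barack Obama', 'PER'), ('New York City', 'LOC')]
```

Note how two adjacent entities of the same type stay separate only because the second one starts with a fresh B- tag — exactly the ambiguity the Quick Check describes.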
NER Approaches
Rule-based NER — Uses patterns, dictionaries, and regular expressions. “If a capitalized word follows ‘Mr.’, it’s a PERSON; if a word matches a country dictionary, it’s a GPE.”
- Strengths: no training data needed, fully interpretable, high precision for known patterns
- Weakness: brittle — misses anything not covered by rules
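A rule-based tagger along these lines fits in a few lines of Python. This is a deliberately toy sketch: the title pattern and the three-entry country dictionary are illustrative stand-ins for real rule sets and gazetteers.

```python
import re

COUNTRIES = {"Japan", "France", "Brazil"}          # toy gazetteer
TITLE_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr)\.\s+([A-Z][a-z]+)")

def rule_based_ner(text):
    entities = []
    # Rule 1: a capitalized word after a title is a PERSON
    for match in TITLE_PATTERN.finditer(text):
        entities.append((match.group(1), "PER"))
    # Rule 2: any capitalized word found in the country dictionary is a GPE
    for word in re.findall(r"[A-Z][a-z]+", text):
        if word in COUNTRIES:
            entities.append((word, "GPE"))
    return entities

print(rule_based_ner("Dr. Chen flew to Japan last week."))
# → [('Chen', 'PER'), ('Japan', 'GPE')]
```

The brittleness is visible immediately: “Dr. Chen flew to Germany” finds no GPE, because Germany isn’t in the dictionary.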
Statistical NER (CRF) — Conditional Random Fields model the sequence of labels as a whole, considering neighboring labels and word features. For years this was the standard.
- Strengths: considers sequence context, good with moderate data
- Weakness: requires manual feature engineering
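That “manual feature engineering” looks something like the function below: for each token you hand-craft a feature dictionary from the word itself and its neighbors, and the CRF learns label weights over those features. A simplified sketch of typical features, not any specific CRF library’s exact format:

```python
def token_features(tokens, i):
    """Hand-crafted features for token i, as fed to a CRF."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),     # capitalized → likely entity
        "word.isupper": word.isupper(),     # acronyms like "MIT"
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],               # "-ton", "-ing", ...
    }
    # Neighboring words give the CRF its sequence context
    features["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    features["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return features

tokens = ["Mr.", "Smith", "joined", "Google"]
print(token_features(tokens, 1)["prev.lower"])    # → 'mr.'
print(token_features(tokens, 1)["word.istitle"])  # → True
```

Every feature here is a human decision — which is exactly the engineering burden that neural approaches later removed.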
Neural NER (BiLSTM-CRF) — Bidirectional LSTM reads the sentence in both directions, capturing context from left and right. A CRF layer on top ensures valid label sequences (you can’t have I-PER following B-ORG).
- Strengths: learns features automatically, strong performance
- Weakness: slower to train than CRF alone
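The constraint the CRF layer enforces can be stated as a simple transition rule: an I- tag is valid only directly after a B- or I- tag of the same entity type. A minimal sketch of that rule (just the check itself, not an actual CRF layer):

```python
def is_valid_transition(prev_label, label):
    """Return True if `label` may follow `prev_label` under BIO rules."""
    if label.startswith("I-"):
        # I-X must continue an entity of the same type X
        return prev_label in (f"B-{label[2:]}", f"I-{label[2:]}")
    return True  # O and B- tags can follow anything

print(is_valid_transition("B-PER", "I-PER"))  # → True
print(is_valid_transition("B-ORG", "I-PER"))  # → False, the example from the text
print(is_valid_transition("O", "I-LOC"))      # → False, an entity can't start with I-
```

In a real BiLSTM-CRF these constraints aren’t hard-coded checks; the CRF learns transition scores that make invalid sequences effectively impossible during decoding.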
Transformer NER (BERT-NER) — The current standard. Fine-tune a pretrained BERT model to predict BIO tags for each token. BERT’s deep contextual understanding handles ambiguity far better than previous approaches.
- Strengths: state-of-the-art accuracy, handles ambiguous entities, captures long-range context
- Weakness: requires GPU, slower inference
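One practical wrinkle with BERT-NER: BERT splits rare words into subword pieces (WordPiece), so per-piece predictions must be merged back into whole words, typically by keeping the label predicted for each word’s first piece. A pure-Python sketch of that merge (the `##` continuation marker is WordPiece’s convention; the pieces and labels here are illustrative):

```python
def merge_wordpieces(pieces, labels):
    """Merge WordPiece tokens ('Wash', '##ington') back into words,
    keeping the label predicted for each word's first piece."""
    words, word_labels = [], []
    for piece, label in zip(pieces, labels):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]          # glue continuation onto previous word
        else:
            words.append(piece)
            word_labels.append(label)       # first piece's label wins
    return list(zip(words, word_labels))

pieces = ["Wash", "##ington", "visited", "Micro", "##soft"]
labels = ["B-LOC", "I-LOC", "O", "B-ORG", "I-ORG"]
print(merge_wordpieces(pieces, labels))
# → [('Washington', 'B-LOC'), ('visited', 'O'), ('Microsoft', 'B-ORG')]
```

Libraries such as Hugging Face Transformers handle this alignment for you, but it’s worth knowing it happens: token-level metrics and word-level metrics can differ because of it.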
Real-World NER Applications
Healthcare: Extract drug names, dosages, conditions, and procedures from clinical notes. A hospital processes thousands of notes daily — NER structures this unstructured data for analytics, billing, and research. Specialized models like scispaCy achieve 85-95% accuracy on biomedical entities.
Legal: Identify parties, dates, jurisdictions, statutes, and monetary amounts in contracts and case filings. NER reduces legal document review time by 75% — what took months of paralegal work now takes hours of automated processing plus targeted human review.
Finance: Extract company names, financial figures, dates, and events from earnings reports, SEC filings, and news. Trading firms use NER to process thousands of documents per second, spotting relevant information before human analysts can read the first paragraph.
Social media: Monitor brand mentions, competitor names, product references, and key people across millions of posts. NER combined with sentiment analysis tells you not just that people are talking about your brand, but what specifically they’re saying about which products.
✅ Quick Check: A financial firm wants to extract company names from news articles. “Apple reported strong earnings” should identify Apple as ORG. “I ate an apple for lunch” should not. Why is this hard for NER, and what approach handles it best? This is entity ambiguity — the same word can be an entity in one context and not in another. Rule-based systems fail because “Apple” matches their company dictionary in both cases. Transformer-based NER (BERT-NER) handles this because it considers the full sentence context. “Reported earnings” strongly signals a company context; “ate for lunch” signals food context.
Building NER Systems
Off-the-shelf tools:
| Tool | Strengths | Entity Types |
|---|---|---|
| spaCy | Fast, production-ready, 75+ languages | PER, ORG, GPE, DATE, MONEY, + more |
| Hugging Face | BERT-based, highest accuracy | Depends on model chosen |
| Stanza (Stanford) | Research-grade, 66 languages | Standard types |
| AWS Comprehend | No-code API, scalable | PER, ORG, LOC, DATE, + custom |
When to build custom NER: Off-the-shelf models work for standard entities (people, companies, places). Build custom NER when you need domain-specific entities — drug names, legal citations, product SKUs, financial instruments — that general models don’t recognize.
Key Takeaways
- NER extracts specific entities (people, orgs, dates, amounts) from unstructured text — sequence labeling with BIO tags
- Evolution: rule-based → CRF → BiLSTM-CRF → transformer-based (BERT-NER is current standard)
- Context determines entity type — “Apple” is ORG or fruit depending on surrounding words
- Domain-specific NER requires domain-specific models — news NER fails on medical text
- Applications: legal review (75% time reduction), healthcare records, financial analysis, social media monitoring
- Tools: spaCy (fast, production), Hugging Face (highest accuracy), AWS Comprehend (no-code)
Up Next
NER tells you what is mentioned in text. But what do people think about those things? Lesson 6 covers sentiment analysis — detecting opinions, emotions, and attitudes in text at scale.