Named Entity Recognition
How to extract people, organizations, dates, locations, and amounts from unstructured text — NER methods, tools, and real-world applications.
Finding the Needles in the Haystack
🔄 Lesson 4 covered text classification — labeling entire documents. But sometimes you need to go deeper: not “what is this document about?” but “what specific things does it mention?” That’s named entity recognition.
NER identifies and classifies specific items in text — people, organizations, locations, dates, monetary amounts, and more. It’s the difference between knowing a document discusses a lawsuit and knowing it mentions “John Smith,” “Acme Corp,” “$2.3 million,” and “March 15, 2026.”
What NER Extracts
Standard NER recognizes these entity types:
| Entity Type | Label | Examples |
|---|---|---|
| Person | PER | “Elon Musk,” “Dr. Sarah Chen” |
| Organization | ORG | “Google,” “United Nations,” “MIT” |
| Location/GPE | LOC/GPE | “New York,” “Japan,” “the Amazon” |
| Date/Time | DATE/TIME | “March 2026,” “last Tuesday,” “3pm” |
| Money | MONEY | “$4.5 million,” “€200,” “¥10,000” |
| Percentage | PERCENT | “15%,” “two-thirds” |
| Product | PRODUCT | “iPhone 17,” “Model Y” |
These are the standard types. Domain-specific NER adds specialized categories — DRUG, DISEASE, and GENE for biomedical text; COURT, STATUTE, and JURISDICTION for legal documents.
How NER Works
NER is fundamentally a sequence labeling task. Each word (token) in the text gets a label indicating whether it’s part of an entity and, if so, what type.
BIO tagging is the standard format:
- B- = Beginning of an entity
- I- = Inside (continuation of) an entity
- O = Outside (not an entity)
Example: “Barack Obama visited New York City”
| Token | Label |
|---|---|
| Barack | B-PER |
| Obama | I-PER |
| visited | O |
| New | B-LOC |
| York | I-LOC |
| City | I-LOC |
The model’s job: assign the correct BIO label to every token in the text.
✅ Quick Check: Why does NER use BIO tags instead of simply labeling each word as PERSON, LOCATION, or NONE? Because entities span multiple words. Without B/I distinction, the model can’t tell where one entity ends and another begins. In “New York Times Building,” is it one entity or two? BIO makes it clear: “New” = B-ORG, “York” = I-ORG, “Times” = I-ORG, “Building” = O (or B-FAC for facility). The B tag signals “a new entity starts here.”
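The B/I distinction can be made concrete with a small decoder that groups BIO-tagged tokens back into entity spans. A minimal pure-Python sketch (the tokens and labels reuse the example above; this is illustrative, not any particular library's decoder):

```python
def decode_bio(tokens, labels):
    """Group BIO-labeled tokens into (entity_text, entity_type) spans."""
    entities, current_tokens, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):          # B- signals "a new entity starts here"
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)    # continuation of the open entity
        else:                               # O, or a stray I- tag
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:                      # flush the last open entity
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["Barack", "Obama", "visited", "New", "York", "City"]
labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
print(decode_bio(tokens, labels))
# → [('Barack Obama', 'PER'), ('New York City', 'LOC')]
```

Note how two adjacent entities of the same type stay separate only because the second one starts with a fresh B- tag — exactly the ambiguity the Quick Check describes.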
NER Approaches
Rule-based NER — Uses patterns, dictionaries, and regular expressions. “If a capitalized word follows ‘Mr.’, it’s a PERSON; if a word matches a country dictionary, it’s a GPE.”
- Strengths: no training data needed, fully interpretable, high precision for known patterns
- Weakness: brittle — misses anything not covered by rules
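A rule-based tagger along these lines fits in a few lines of Python. This is a deliberately toy sketch: the title pattern and the three-entry country dictionary are illustrative stand-ins for real rule sets and gazetteers.

```python
import re

COUNTRIES = {"Japan", "France", "Brazil"}          # toy gazetteer
TITLE_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr)\.\s+([A-Z][a-z]+)")

def rule_based_ner(text):
    entities = []
    # Rule 1: a capitalized word after a title is a PERSON
    for match in TITLE_PATTERN.finditer(text):
        entities.append((match.group(1), "PER"))
    # Rule 2: any capitalized word found in the country dictionary is a GPE
    for word in re.findall(r"[A-Z][a-z]+", text):
        if word in COUNTRIES:
            entities.append((word, "GPE"))
    return entities

print(rule_based_ner("Dr. Chen flew to Japan last week."))
# → [('Chen', 'PER'), ('Japan', 'GPE')]
```

The brittleness is visible immediately: “Dr. Chen flew to Germany” finds no GPE, because Germany isn’t in the dictionary.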
Statistical NER (CRF) — Conditional Random Fields model the sequence of labels as a whole, considering neighboring labels and word features. For years this was the standard.
- Strengths: considers sequence context, good with moderate data
- Weakness: requires manual feature engineering
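That “manual feature engineering” looks something like the function below: for each token you hand-craft a feature dictionary from the word itself and its neighbors, and the CRF learns label weights over those features. A simplified sketch of typical features, not any specific CRF library’s exact format:

```python
def token_features(tokens, i):
    """Hand-crafted features for token i, as fed to a CRF."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),     # capitalized → likely entity
        "word.isupper": word.isupper(),     # acronyms like "MIT"
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],               # "-ton", "-ing", ...
    }
    # Neighboring words give the CRF its sequence context
    features["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    features["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return features

tokens = ["Mr.", "Smith", "joined", "Google"]
print(token_features(tokens, 1)["prev.lower"])    # → 'mr.'
print(token_features(tokens, 1)["word.istitle"])  # → True
```

Every feature here is a human decision — which is exactly the engineering burden that neural approaches later removed.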
Neural NER (BiLSTM-CRF) — Bidirectional LSTM reads the sentence in both directions, capturing context from left and right. A CRF layer on top ensures valid label sequences (you can’t have I-PER following B-ORG).
- Strengths: learns features automatically, strong performance
- Weakness: slower to train than CRF alone
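The constraint the CRF layer enforces can be stated as a simple transition rule: an I- tag is valid only directly after a B- or I- tag of the same entity type. A minimal sketch of that rule (just the check itself, not an actual CRF layer):

```python
def is_valid_transition(prev_label, label):
    """Return True if `label` may follow `prev_label` under BIO rules."""
    if label.startswith("I-"):
        # I-X must continue an entity of the same type X
        return prev_label in (f"B-{label[2:]}", f"I-{label[2:]}")
    return True  # O and B- tags can follow anything

print(is_valid_transition("B-PER", "I-PER"))  # → True
print(is_valid_transition("B-ORG", "I-PER"))  # → False, the example from the text
print(is_valid_transition("O", "I-LOC"))      # → False, an entity can't start with I-
```

In a real BiLSTM-CRF these constraints aren’t hard-coded checks; the CRF learns transition scores that make invalid sequences effectively impossible during decoding.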
Transformer NER (BERT-NER) — The current standard. Fine-tune a pretrained BERT model to predict BIO tags for each token. BERT’s deep contextual understanding handles ambiguity far better than previous approaches.
- Strengths: state-of-the-art accuracy, handles ambiguous entities, captures long-range context
- Weakness: requires GPU, slower inference
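One practical wrinkle with BERT-NER: BERT splits rare words into subword pieces (WordPiece), so per-piece predictions must be merged back into whole words, typically by keeping the label predicted for each word’s first piece. A pure-Python sketch of that merge (the `##` continuation marker is WordPiece’s convention; the pieces and labels here are illustrative):

```python
def merge_wordpieces(pieces, labels):
    """Merge WordPiece tokens ('Wash', '##ington') back into words,
    keeping the label predicted for each word's first piece."""
    words, word_labels = [], []
    for piece, label in zip(pieces, labels):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]          # glue continuation onto previous word
        else:
            words.append(piece)
            word_labels.append(label)       # first piece's label wins
    return list(zip(words, word_labels))

pieces = ["Wash", "##ington", "visited", "Micro", "##soft"]
labels = ["B-LOC", "I-LOC", "O", "B-ORG", "I-ORG"]
print(merge_wordpieces(pieces, labels))
# → [('Washington', 'B-LOC'), ('visited', 'O'), ('Microsoft', 'B-ORG')]
```

Libraries such as Hugging Face Transformers handle this alignment for you, but it’s worth knowing it happens: token-level metrics and word-level metrics can differ because of it.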
Real-World NER Applications
Healthcare: Extract drug names, dosages, conditions, and procedures from clinical notes. A hospital processes thousands of notes daily — NER structures this unstructured data for analytics, billing, and research. Specialized models like scispaCy achieve 85-95% accuracy on biomedical entities.
Legal: Identify parties, dates, jurisdictions, statutes, and monetary amounts in contracts and case filings. NER reduces legal document review time by 75% — what took months of paralegal work now takes hours of automated processing plus targeted human review.
Finance: Extract company names, financial figures, dates, and events from earnings reports, SEC filings, and news. Trading firms use NER to process thousands of documents per second, spotting relevant information before human analysts can read the first paragraph.
Social media: Monitor brand mentions, competitor names, product references, and key people across millions of posts. NER combined with sentiment analysis tells you not just that people are talking about your brand, but what specifically they’re saying about which products.
✅ Quick Check: A financial firm wants to extract company names from news articles. “Apple reported strong earnings” should identify Apple as ORG. “I ate an apple for lunch” should not. Why is this hard for NER, and what approach handles it best? This is entity ambiguity — the same word can be an entity in one context and not in another. Rule-based systems fail because “Apple” matches their company dictionary in both cases. Transformer-based NER (BERT-NER) handles this because it considers the full sentence context. “Reported earnings” strongly signals a company context; “ate for lunch” signals food context.
Building NER Systems
Off-the-shelf tools:
| Tool | Strengths | Entity Types |
|---|---|---|
| spaCy | Fast, production-ready, 75+ languages | PER, ORG, GPE, DATE, MONEY, + more |
| Hugging Face | BERT-based, highest accuracy | Depends on model chosen |
| Stanza (Stanford) | Research-grade, 66 languages | Standard types |
| AWS Comprehend | No-code API, scalable | PER, ORG, LOC, DATE, + custom |
When to build custom NER: Off-the-shelf models work for standard entities (people, companies, places). Build custom NER when you need domain-specific entities — drug names, legal citations, product SKUs, financial instruments — that general models don’t recognize.
Key Takeaways
- NER extracts specific entities (people, orgs, dates, amounts) from unstructured text — sequence labeling with BIO tags
- Evolution: rule-based → CRF → BiLSTM-CRF → transformer-based (BERT-NER is current standard)
- Context determines entity type — “Apple” is ORG or fruit depending on surrounding words
- Domain-specific NER requires domain-specific models — news NER fails on medical text
- Applications: legal review (75% time reduction), healthcare records, financial analysis, social media monitoring
- Tools: spaCy (fast, production), Hugging Face (highest accuracy), AWS Comprehend (no-code)
Up Next
NER tells you what is mentioned in text. But what do people think about those things? Lesson 6 covers sentiment analysis — detecting opinions, emotions, and attitudes in text at scale.