Sentiment Analysis
How to detect opinions, emotions, and attitudes in text — document-level, aspect-based, and fine-grained sentiment analysis with real-world applications.
Reading the Room at Scale
🔄 Lesson 5 covered NER — extracting what’s mentioned in text. Now we turn to a different question: what do people think about those things? Sentiment analysis detects opinions, emotions, and attitudes in text — and it’s one of the most commercially valuable NLP applications.
Every company with customers has opinions to analyze: product reviews, support tickets, social media mentions, survey responses, employee feedback. Sentiment analysis turns this flood of unstructured opinion into structured, actionable data.
Levels of Sentiment Analysis
Sentiment analysis operates at three levels of granularity:
Document-level: Classify the entire text as positive, negative, or neutral.
- “This restaurant is amazing” → Positive
- “Terrible service, never coming back” → Negative
- Simplest to build, but loses nuance in mixed-sentiment texts
Sentence-level: Classify each sentence independently.
- “The food was excellent. The waiter was rude.” → Sentence 1: Positive, Sentence 2: Negative
- Better granularity, but still misses what specifically is positive or negative
Aspect-based: Identify specific aspects and the sentiment toward each.
- “The camera is great but battery life is disappointing” → Camera: Positive, Battery: Negative
- Most actionable — tells you exactly what to fix and what to promote
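A minimal sketch of aspect-based sentiment, assuming a toy keyword-matching approach: split the review on contrast markers like "but" so each clause carries one opinion, then match aspect keywords and opinion words within each clause. The aspect and sentiment word lists are illustrative placeholders, not a real lexicon — production systems use trained aspect extractors instead.

```python
import re

# Illustrative keyword sets — a real system would use a learned model
ASPECTS = {
    "camera": {"camera", "photo", "lens"},
    "battery": {"battery", "charge", "charging"},
}
POSITIVE = {"great", "excellent", "amazing"}
NEGATIVE = {"disappointing", "terrible", "poor"}

def aspect_sentiment(text: str) -> dict:
    """Assign sentiment to each aspect mentioned in the text."""
    # Split on contrast markers so each clause carries one opinion
    clauses = re.split(r"\bbut\b|,|;", text.lower())
    results = {}
    for clause in clauses:
        words = set(clause.split())
        for aspect, keywords in ASPECTS.items():
            if keywords & words:
                if POSITIVE & words:
                    results[aspect] = "positive"
                elif NEGATIVE & words:
                    results[aspect] = "negative"
    return results

print(aspect_sentiment("The camera is great but battery life is disappointing"))
# → {'camera': 'positive', 'battery': 'negative'}
```

The clause split matters: without it, "great" in the first clause would bleed into the battery judgment in the second.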
✅ Quick Check: An e-commerce company wants to know why customers return products. They have 500,000 return reason comments. Which level of sentiment analysis provides the most actionable insights?

Aspect-based — it identifies specific product aspects driving dissatisfaction (sizing, quality, color accuracy, packaging). Document-level would just tell you most returns are negative (obvious). Aspect-based tells you that 40% of clothing returns mention “runs small” — a sizing chart fix could reduce returns significantly.
How Sentiment Analysis Works
Lexicon-based approach: Use a dictionary of words pre-labeled with sentiment scores. “Excellent” = +3, “terrible” = -3, “okay” = 0. Sum the scores across the document.
- Strengths: no training data needed, interpretable, fast
- Weaknesses: misses context, sarcasm, and negation (“not good” scores the same as “good” because “not” carries no score of its own)
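The lexicon approach is a few lines of code. This sketch uses made-up word scores (not a real lexicon like VADER or SentiWordNet), and it deliberately shows the negation failure described above:

```python
# Illustrative word scores — a real system would load a published lexicon
LEXICON = {"excellent": 3, "amazing": 3, "good": 2, "okay": 0,
           "bad": -2, "terrible": -3, "rude": -2}

def lexicon_score(text: str) -> int:
    """Sum the sentiment scores of all known words; unknown words score 0."""
    return sum(LEXICON.get(word, 0) for word in text.lower().split())

print(lexicon_score("Excellent food"))  # → 3
print(lexicon_score("not good"))        # → 2 — "not" contributes nothing, so negation is lost
```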
Machine learning approach: Train a classifier (Lesson 4 techniques) on labeled examples. TF-IDF + Logistic Regression is a strong baseline. BERT fine-tuned on sentiment data is the current standard.
- Strengths: captures context, handles nuance, higher accuracy
- Weakness: requires labeled training data
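The TF-IDF + Logistic Regression baseline can be sketched with scikit-learn in a few lines. The six training examples here are purely illustrative — a real classifier needs thousands of labeled reviews:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set — stand-in for a real labeled corpus
texts = [
    "amazing product works great",
    "excellent quality highly recommend",
    "love it five stars",
    "terrible broke in a week",
    "awful waste of money",
    "horrible never buying again",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# Pipeline: text → TF-IDF vectors → logistic regression
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["excellent product, love it"]))
```

The same pipeline object handles vectorizing and classifying, so new text can be passed to `predict` as raw strings.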
LLM-based approach: Prompt an LLM (GPT-4, Claude) to classify sentiment with zero or few examples. “Classify the sentiment of this review as positive, negative, or neutral: [text]”
- Strengths: no training data needed, handles nuance and sarcasm, flexible
- Weaknesses: expensive at scale, slower, less consistent than fine-tuned models
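A sketch of the zero-shot prompt pattern. The prompt-building function below is runnable; the actual API call is provider-specific (OpenAI, Anthropic, etc.) and left as a comment rather than invented here:

```python
def build_prompt(review: str) -> str:
    """Build a zero-shot sentiment classification prompt for an LLM."""
    return (
        "Classify the sentiment of this review as positive, negative, "
        "or neutral. Reply with one word only.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )

prompt = build_prompt("Oh wonderful, my flight is delayed again.")
# Send `prompt` to your provider's chat/completions endpoint,
# then parse the one-word reply, e.g. response_text.strip().lower()
```

Constraining the model to "one word only" makes the reply easy to parse and keeps per-call costs down.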
The Sarcasm Problem
Sarcasm is sentiment analysis’s hardest challenge. “Oh wonderful, my flight is delayed again. What a fantastic airline.” Every word is positive. The meaning is deeply negative.
Lexicon-based systems fail completely — they see “wonderful” and “fantastic” and classify this as positive. Even basic ML models struggle because the surface-level signals conflict with the actual meaning.
What helps:
- Training on sarcastic data — models need sarcasm-labeled examples to learn the pattern
- Context-aware models — BERT detects the incongruity between “wonderful” and “delayed again”
- Emoji and punctuation signals — sarcasm often comes with specific markers (🙄, excessive exclamation marks, quotation marks around positive words)
- Separate sarcasm classifier — detect sarcasm first, then adjust sentiment
Current state: BERT-based sarcasm detection achieves 75-85% accuracy. Not perfect, but workable for most business applications.
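The emoji-and-punctuation signals above can be checked with simple heuristics. This sketch flags sarcasm *candidates* for a downstream classifier — it is not a sarcasm detector on its own, and the marker set is illustrative:

```python
import re

SARCASM_EMOJI = {"🙄", "🙃"}  # illustrative subset of sarcasm-associated emoji

def sarcasm_markers(text: str) -> list:
    """Return surface-level markers that often accompany sarcasm."""
    markers = []
    if any(e in text for e in SARCASM_EMOJI):
        markers.append("emoji")
    if re.search(r"!{2,}", text):          # excessive exclamation marks
        markers.append("excess_exclamation")
    if re.search(r'"\w+"', text):          # scare quotes around a word
        markers.append("scare_quotes")
    return markers

print(sarcasm_markers('What a "fantastic" airline!!! 🙄'))
# → ['emoji', 'excess_exclamation', 'scare_quotes']
```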
Real-World Applications
Brand monitoring: Companies track sentiment about their brand, products, and competitors across social media, news, and review platforms. A sudden negative spike can signal a PR crisis, a product defect, or a competitor’s successful campaign.
Financial markets: Hedge funds analyze news sentiment and social media sentiment to predict stock movements. A study by JP Morgan found that Twitter sentiment predicted stock returns with modest but statistically significant accuracy.
Product development: Aspect-based sentiment on product reviews tells product teams exactly what customers love and hate. “Great camera but terrible battery” across thousands of reviews is a clear signal for the next product iteration.
Employee experience: Companies analyze internal survey comments, Glassdoor reviews, and Slack messages (with consent) to measure engagement and identify issues before they become attrition.
✅ Quick Check: A restaurant analyzes 10,000 Yelp reviews. Document-level sentiment shows 65% positive. Aspect-based sentiment reveals: Food (80% positive), Service (45% positive), Wait time (20% positive), Ambiance (70% positive). Which insight is more valuable for the restaurant owner?

The aspect-based breakdown. 65% positive overall doesn’t help — the owner can’t improve “overall.” But knowing service is the weakest link (45%) and wait time is terrible (20%) gives them specific, actionable targets. Fix wait times (maybe add a reservation system or hire more staff), then train service staff. Food and ambiance are strengths to maintain and market.
Sentiment at Scale: Practical Considerations
Handling negation: “Not bad,” “hardly impressive,” “far from perfect” — negation flips or modifies sentiment. Lexicon approaches handle this poorly. ML models handle it better. Transformer models handle it best because they consider the full sentence context.
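A simple way to patch negation into a lexicon scorer is to flip a word's polarity when a negator appears just before it. The lexicon and negator list here are illustrative, and multi-word negators like "far from" are crudely reduced to their first token — real systems use dependency parses or transformer context instead:

```python
LEXICON = {"bad": -2, "good": 2, "impressive": 2, "perfect": 3}
NEGATORS = {"not", "hardly", "never", "far"}  # "far from" simplified to "far"

def score_with_negation(text: str) -> int:
    """Lexicon score, flipping polarity after a nearby negator."""
    tokens = text.lower().split()
    total = 0
    for i, tok in enumerate(tokens):
        score = LEXICON.get(tok, 0)
        # Flip polarity if either of the two preceding tokens is a negator
        if score and NEGATORS & set(tokens[max(0, i - 2):i]):
            score = -score
        total += score
    return total

print(score_with_negation("not bad"))           # → 2 (flipped from -2)
print(score_with_negation("far from perfect"))  # → -3 (flipped from +3)
```

Even this crude fix captures "not bad" and "far from perfect" — exactly the cases the plain lexicon approach gets backwards.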
Domain adaptation: Sentiment words mean different things in different domains. “Unpredictable” is negative for a car (reliability issue) but positive for a thriller movie (keeps you guessing). Models trained on movie reviews underperform on product reviews — and vice versa.
Multilingual sentiment: Sentiment expressions vary dramatically across languages and cultures. Japanese readers express criticism more indirectly than American reviewers. A direct translation of a Japanese review might read as “neutral” when the original carries clear negative intent. Multilingual models help, but cultural calibration is still needed.
Key Takeaways
- Sentiment analysis detects opinions at three levels: document (overall), sentence (per sentence), and aspect-based (per feature)
- Aspect-based sentiment is most actionable — it tells you exactly what to fix and what to promote
- Methods: lexicon-based (fast, no training data), ML classifiers (better accuracy), LLM-based (most flexible, expensive at scale)
- Sarcasm remains the hardest challenge — BERT-based models detect it at 75-85% accuracy
- Domain matters — “unpredictable” is negative for cars, positive for thrillers
- Never make major business decisions on raw sentiment scores alone — always investigate context, sample size, and specificity
Up Next
Lessons 2-6 covered the NLP task toolkit — preprocessing, representations, classification, NER, and sentiment. Lesson 7 zooms out to the technology that changed everything: how transformers and LLMs reshaped every NLP task you’ve learned about.