Lesson 5 (15 min)

Data Journalism and Analysis

Turn raw data into stories using AI — analyze datasets, find patterns, create visualizations, and make complex information accessible to your audience.

🔄 Quick Recall: In the previous lesson, you learned to use AI as a writing and editing partner. Now let’s add another powerful capability — using AI to find stories hidden inside datasets that would take weeks to analyze manually.

Data Is Everywhere. Stories Are Hidden.

Government agencies publish thousands of datasets — budgets, crime statistics, environmental reports, inspection records. Companies file quarterly reports with detailed financials. Nonprofits release impact data. Census bureaus produce demographic portraits of every community.

Most of this data goes unreported. Not because it’s uninteresting, but because analyzing it takes time and expertise that many newsrooms lack. AI changes that equation. You don’t need to be a data scientist to find the story in a spreadsheet.

Getting Data Into AI

Structured data (CSV, Excel): Many AI tools that accept file uploads can read spreadsheets directly. Upload the file and describe what you want to know.

PDF tables: Government reports often lock data in PDF tables. Ask AI to extract the data into a structured format first, then analyze it.

Unstructured data: Meeting minutes, inspection reports, and complaint logs contain data buried in text. AI can extract and categorize this information.

I have a dataset of [describe: what it contains, how many rows, time period covered].

Start by:
1. Describing the dataset structure (columns, data types, coverage)
2. Identifying any data quality issues (missing values, outliers, inconsistencies)
3. Calculating basic summary statistics
4. Identifying the 3-5 most notable findings or patterns

Then I'll tell you what to dig into deeper.

Quick Check: Why should you ask AI to identify data quality issues before diving into analysis?

Because dirty data produces false stories. If 30% of records are missing a key field, any analysis of that field is unreliable. If a spike reflects a change in how the data was reported rather than a real-world change, you’d write a false trend story. Checking data quality first prevents you from building a story on a foundation of noise.
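If you want to double-check what the AI reports about missing values, the idea can be sketched in a few lines of Python. This is a minimal sketch, not a required workflow: the sample data, the column names, and the `missing_share` helper are all invented for illustration.

```python
import csv
import io

# Hypothetical sample standing in for an uploaded CSV; the columns are invented.
SAMPLE = """facility,inspection_date,score
A,2023-01-10,91
B,2023-01-12,
C,,78
D,2023-02-01,85
"""

def missing_share(rows, columns):
    """Return the fraction of rows with a blank value, per column."""
    total = len(rows)
    if total == 0:
        return {}
    return {
        col: sum(1 for r in rows if not (r[col] or "").strip()) / total
        for col in columns
    }

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
shares = missing_share(rows, reader.fieldnames)

for col, share in shares.items():
    print(f"{col}: {share:.0%} missing")
```

A column with a high missing share is one to ask about before you trust any analysis built on it.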

Finding the Story in the Data

Once you understand the data, look for stories:

Trend analysis: What’s changing over time? Is crime going up in specific neighborhoods? Are inspection failures increasing at certain facilities?

Analyze this data for trends over [time period]:
1. What metrics are changing most dramatically?
2. When did the changes start?
3. Are the changes concentrated in specific categories or locations?
4. Are there any reversals or inflection points?
5. Which trends would be most newsworthy to a [local/national/trade] audience?
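To sanity-check a trend the AI describes, you can recompute the year-over-year percent change yourself. The sketch below assumes you have yearly totals; the numbers and the `pct_changes` helper are made up for illustration.

```python
def pct_changes(series):
    """Year-over-year percent change for a {year: value} mapping."""
    years = sorted(series)
    return {
        curr: (series[curr] - series[prev]) / series[prev] * 100
        for prev, curr in zip(years, years[1:])
    }

# Invented counts, for illustration only.
failed_inspections = {2019: 40, 2020: 42, 2021: 63, 2022: 60}

changes = pct_changes(failed_inspections)
# The year with the largest swing in either direction is a candidate inflection point.
biggest_year = max(changes, key=lambda year: abs(changes[year]))

for year, change in changes.items():
    print(f"{year}: {change:+.1f}%")
print("Largest change:", biggest_year)
```

A large jump in a single year is exactly where to ask whether reporting rules changed before calling it a trend.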

Outlier detection: What stands out? Which school district spends dramatically more or less per student? Which hospital has an unusually high complication rate?
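One common way to flag outliers is a modified z-score built on the median, which is less distorted by the extreme values than an average would be. The sketch below is one possible approach, not the only one: the district names and spending figures are invented, and the 3.5 cutoff is a widely used rule of thumb rather than a fixed standard.

```python
from statistics import median

def flag_outliers(values, cutoff=3.5):
    """Return keys whose modified z-score (median/MAD based) exceeds the cutoff."""
    med = median(values.values())
    # MAD: the median of absolute deviations from the median.
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread to measure against
    return [
        key for key, v in values.items()
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]

# Invented per-student spending figures, for illustration only.
spending = {
    "North": 11200,
    "East": 11800,
    "South": 12100,
    "West": 12500,
    "Central": 19900,
}

outliers = flag_outliers(spending)
print(outliers)
```

A flagged value is a question to report out (data entry error, different accounting, or a real difference), not a finding in itself.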

Comparison analysis: How do different groups compare? Are there disparities by zip code, race, income level, or political affiliation?
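Group comparisons usually need rates, not raw counts, because groups differ in size. Here is a small sketch of the arithmetic; the area names and figures are hypothetical.

```python
def rate_per_1000(count, population):
    """Events per 1,000 residents."""
    return count * 1000 / population

# Invented figures, for illustration only.
areas = {
    "Zip A": {"complaints": 120, "population": 40000},
    "Zip B": {"complaints": 90, "population": 15000},
}

rates = {
    name: rate_per_1000(d["complaints"], d["population"])
    for name, d in areas.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} complaints per 1,000 residents")
```

In this made-up example, Zip A has more complaints in total, but Zip B has twice the rate once population is taken into account.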

Pattern matching: Do certain events correlate? Do restaurant inspection scores predict later health code violations? Do police response times correlate with neighborhood demographics?
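When the AI says two measures "correlate," it is typically reporting something like Pearson's r, a number between -1 and 1. The sketch below computes it by hand so you can see what goes into it. The data points are invented, and `pearson_r` is a helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values for five neighborhoods, for illustration only.
response_minutes = [6, 8, 9, 12, 15]
median_income_thousands = [82, 70, 65, 51, 40]

r = pearson_r(response_minutes, median_income_thousands)
print(f"r = {r:.2f}")
```

A value near -1 or 1 means the two measures move together closely in this sample. It says nothing about why, which is where the reporting starts.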

Making Data Accessible

Raw numbers don’t resonate. Stories do. AI helps you translate data into narrative:

I found this pattern in the data: [describe finding].

Help me make this accessible to a general audience:
1. Write a one-sentence summary a reader would immediately understand
2. Create an analogy that makes the scale tangible (e.g., "That's enough to fill 15 Olympic swimming pools")
3. Identify the human impact — who is affected and how?
4. Suggest a person or community I should profile to give this data a face
5. Draft a "by the numbers" sidebar with 4-5 key statistics

The best data stories don’t lead with numbers. They lead with people affected by the numbers, then use data to show the scope.

Visualization Basics

AI can help you create or conceptualize charts, even if you’re not a designer:

Choose the right chart type:

  • Trends over time → line chart
  • Comparing categories → bar chart
  • Showing proportions → pie chart (sparingly) or stacked bar
  • Geographic patterns → map
  • Relationships between variables → scatter plot

I want to visualize [specific data finding] for a web article.

1. Which chart type best communicates this finding?
2. What should the axes/labels be?
3. What color or design choices would make the key finding immediately obvious?
4. What caption would help readers interpret the chart correctly?
5. Are there any common ways this type of visualization can be misleading?
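One common way a bar chart misleads is a vertical axis that does not start at zero. The sketch below shows the arithmetic behind that effect; the values and the `apparent_ratio` helper are invented for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at baseline."""
    return (b - baseline) / (a - baseline)

# Two invented values that differ by 10%.
honest = apparent_ratio(50, 55)                 # axis starts at 0
truncated = apparent_ratio(50, 55, baseline=45) # axis starts at 45

print(f"Axis from 0: second bar looks {honest:.1f}x as tall")
print(f"Axis from 45: second bar looks {truncated:.1f}x as tall")
```

With the axis starting at 45, a 10% difference is drawn as a bar twice as tall, which is why the chart's baseline belongs in your review of any visualization.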

Data Ethics for Journalists

When using AI to analyze data, maintain these standards:

Show your work. Document your methodology so others can reproduce your analysis. What data did you use? How did you clean it? What did you ask AI to do? What did you exclude?

Acknowledge limitations. No dataset is perfect. Be transparent about gaps, potential biases in collection, and what your analysis cannot tell you.

Don’t overstate causation. AI can find correlations easily. Correlation isn’t causation. A trend in the data is a lead for reporting, not a conclusion.

Protect privacy. Data that’s technically public can still identify individuals when combined. Be cautious about publishing granular data that could harm specific people.
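One common safeguard before publishing granular tables is to hide any count small enough to point at specific people. The sketch below is a minimal illustration of that idea. The threshold of 10 is an assumption chosen for the example, not a standard, and the block names and counts are invented.

```python
def suppress_small_cells(counts, threshold=10):
    """Replace counts below the threshold with None so they are not published."""
    return {
        key: (value if value >= threshold else None)
        for key, value in counts.items()
    }

# Invented counts per city block, for illustration only.
cases_by_block = {"Block 1": 48, "Block 2": 3, "Block 3": 17}

publishable = suppress_small_cells(cases_by_block)

for block, count in publishable.items():
    print(block, count if count is not None else "suppressed")
```

Your newsroom or the data provider may set a different threshold, and suppressed cells can sometimes be recovered from published totals, so check the whole table rather than each number in isolation.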

Exercise: Data Story Sprint

Find a public dataset relevant to your beat (try data.gov, your city’s open data portal, or a federal agency’s statistics page):

  1. Upload it to an AI tool and run the initial analysis prompt
  2. Identify the most newsworthy finding
  3. Use the “making data accessible” prompt to translate the finding for general audiences
  4. Draft the first two paragraphs of a data-driven story based on what you found
  5. List three sources you’d need to interview to complete the story

Key Takeaways

  • AI makes data journalism accessible to reporters without coding skills — describe what you want to know in plain language
  • Always check data quality before analysis to avoid building stories on noise or errors
  • Look for trends, outliers, comparisons, and correlations — each reveals different types of stories
  • The best data stories lead with people, not numbers — use AI to find the human impact
  • Document your methodology transparently: data source, cleaning steps, analysis approach, and limitations
  • Correlation from AI analysis is a lead for reporting, not a publishable conclusion

Up Next: In the next lesson, we’ll tackle the ethical challenges of AI in journalism — bias, disclosure, deepfakes, and drawing the line between AI assistance and AI authorship.

Knowledge Check

1. What's the most important first step when AI identifies a 'significant pattern' in a dataset?

2. How can a journalist with no coding experience use AI for data analysis?

3. When publishing data-driven stories, what should you always include?
