Data Analysis & Statistics
Select the right statistical tests, generate publication-quality visualizations, and identify patterns in complex datasets — with AI-guided analysis workflows.
“Which statistical test should I use?” is the most common question researchers ask — and the one most likely to get a wrong answer from AI if you don’t provide context. AI tools can run any test instantly. The challenge isn’t computation; it’s choosing the right test and verifying the assumptions hold for your specific data.
🔄 Quick Recall: In the previous lesson, you designed experiments with AI — including power analysis and confound identification. Now those design decisions pay off: well-designed experiments are straightforward to analyze. Poorly designed experiments create statistical nightmares no AI can fix.
The Statistical Test Selection Framework
Before running any test, answer these five questions. AI can help, but you must verify.
| Question | Why It Matters | Example |
|---|---|---|
| What’s your research question? | Determines the test family | Comparing groups → t-test family; relationships → correlation/regression |
| How many variables? | Narrows the test | 1 IV + 1 DV → simple; multiple IVs → factorial/multivariate |
| What type of data? | Constrains test choice | Continuous → parametric; ordinal → non-parametric; categorical → chi-square |
| How many groups? | Specifies the exact test | 2 groups → t-test; 3+ groups → ANOVA; repeated → paired/within |
| Are assumptions met? | Validates or invalidates | Normal distribution, equal variance, independence |
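The five questions above can be sketched as a small decision helper. This is a hypothetical illustration (the function name and branching are my own, covering only the common one-IV designs from the table, not an exhaustive decision tree), and assumption checks still come before any final choice:

```python
# Hypothetical helper mirroring the five questions above (illustration only).
# Covers common one-IV designs from the table; not an exhaustive tree.
def suggest_test(dv_type: str, n_groups: int, paired: bool) -> str:
    """Suggest a likely test for one categorical IV and one DV."""
    if dv_type == "categorical":
        return "chi-square test of independence"
    if dv_type == "ordinal":
        if n_groups == 2:
            return "Wilcoxon signed-rank" if paired else "Mann-Whitney U"
        return "Friedman test" if paired else "Kruskal-Wallis"
    # continuous DV (parametric family, assumptions permitting)
    if n_groups == 2:
        return "paired t-test" if paired else "independent-samples t-test"
    return "repeated-measures ANOVA" if paired else "one-way ANOVA"

print(suggest_test("continuous", 3, paired=False))  # → one-way ANOVA
```

A real decision also depends on the assumption checks in the next section: a "continuous" DV that fails normality pushes you down the non-parametric branch.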
Help me select the appropriate statistical test:
Research question: [what am I testing?]
Independent variable(s): [name, type: continuous/categorical/ordinal]
Dependent variable(s): [name, type: continuous/categorical/ordinal]
Design: [between-subjects / within-subjects / mixed]
Number of groups/levels: [how many]
Sample size: [N per group]
Data characteristics:
- Normal distribution? [yes / no / unknown — I'll check]
- Equal variances? [yes / no / unknown — I'll check]
- Independent observations? [yes / no — explain if no]
- Any repeated measures? [yes / no]
Recommend:
1. The primary statistical test (with justification)
2. The assumption checks I need to run first
3. What to do if assumptions are violated (non-parametric alternative)
4. The effect size measure to report alongside p-values
5. How to report the results in APA/journal format
✅ Quick Check: You have three groups of plants (different fertilizers) and you measured plant height after 30 days. What’s the most likely appropriate test? (Answer: One-way ANOVA — you have one categorical IV with 3 levels and one continuous DV. If the ANOVA is significant, follow up with post-hoc tests like Tukey’s HSD to identify which groups differ.)
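The fertilizer scenario can be run directly with SciPy. A minimal sketch on invented plant heights (the numbers are made up for illustration); `scipy.stats.tukey_hsd` requires a recent SciPy version, so the post-hoc call is guarded:

```python
from scipy import stats

# Invented plant heights (cm) after 30 days, one list per fertilizer
a = [42.1, 44.3, 41.8, 43.5, 42.9, 44.0]
b = [46.2, 47.8, 45.9, 48.1, 46.5, 47.3]
c = [42.5, 43.1, 41.9, 43.8, 42.2, 43.4]

f_stat, p = stats.f_oneway(a, b, c)
print(f"One-way ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

# Run post-hoc comparisons only if the omnibus test is significant;
# stats.tukey_hsd needs a recent SciPy, so guard the call
if p < 0.05 and hasattr(stats, "tukey_hsd"):
    print(stats.tukey_hsd(a, b, c))
```

Here group B clearly differs from A and C, so the omnibus F is large and Tukey's HSD pinpoints which pairs drive it.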
Assumption Checking Workflow
This is where most AI-assisted analyses go wrong — skipping straight to the test.
Check statistical assumptions for my planned analysis:
Test I plan to run: [test name]
Data: [describe your dataset briefly]
Run and interpret:
1. Normality: Shapiro-Wilk test + Q-Q plot (for each group)
2. Homogeneity of variance: Levene's test
3. Independence: Describe your data collection (are observations truly independent?)
4. Linearity: Scatter plot (if regression/correlation)
5. Outliers: Box plots, z-scores > 3, or Mahalanobis distance
For each assumption:
- Result of the test
- Visual inspection (what the plot shows)
- Decision: assumption met / borderline / violated
- If violated: recommended alternative test or transformation
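The first two checks in the workflow above can be scripted. A sketch using SciPy on simulated stand-in data (replace `groups` with your own measurements before drawing any conclusions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-in data: three groups drawn from a normal distribution
groups = [rng.normal(50, 5, 30) for _ in range(3)]

# 1. Normality per group: Shapiro-Wilk (small p = evidence against normality)
shapiro_results = [stats.shapiro(g) for g in groups]
for i, (w, p) in enumerate(shapiro_results, 1):
    print(f"Group {i}: W = {w:.3f}, p = {p:.3f}")

# 2. Homogeneity of variance across groups: Levene's test
levene_stat, levene_p = stats.levene(*groups)
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")

# Decision sketch: if normality fails, fall back to Kruskal-Wallis;
# if only variances differ, consider Welch's ANOVA or a transformation.
```

Pair each test with its plot (Q-Q for normality, box plots for spread): with small samples the tests have little power, and with large samples they flag trivial deviations, so the visual check is not optional.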
AI tools for assumption checking:
| Tool | Strength |
|---|---|
| Julius AI | Upload data → automatic normality tests, Q-Q plots, box plots |
| JASP | Point-and-click assumption checks built into every analysis |
| AI assistants | Explain test results, suggest alternatives when assumptions fail |
| GraphPad Prism | Life science-focused assumption testing with guided decisions |
AI-Generated Visualizations
Good visualizations reveal patterns that summary statistics hide. AI generates them quickly, but you decide what matters.
Create publication-quality visualizations for my data:
Data description: [what your variables are]
Analysis: [what statistical test you're running]
Journal target: [which journal's style]
Generate:
1. The primary result figure (show the main finding)
2. Assumption check plots (Q-Q, residuals, box plots)
3. Exploratory visualizations (distributions, relationships)
Format requirements:
- Resolution: 300 DPI minimum
- Font: [journal requirement, typically Arial or Helvetica]
- Color: accessible color palette (colorblind-safe)
- Axis labels: clear, with units
- Error bars: specify SE, SD, or 95% CI
Visualization best practices:
- Show individual data points when N < 30 (don’t hide behind bar charts)
- Use violin plots or raincloud plots instead of bar charts for distributions
- Include error bars with clear labels (SE and 95% CI communicate very different things)
- Use colorblind-safe palettes (roughly 8% of men have color vision deficiency)
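These practices can be combined in one figure. A Matplotlib sketch with invented scores: every individual point shown (jittered), a colorblind-safe palette (Okabe-Ito blue and vermillion), 300 DPI, labeled axes, and 95% CI error bars. The data and file name are illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# Invented scores for two groups of 12 (small N → show every point)
a = np.array([44, 52, 49, 55, 47, 50, 53, 46, 51, 48, 54, 45], dtype=float)
b = np.array([49, 57, 54, 60, 52, 55, 58, 51, 56, 53, 59, 50], dtype=float)
rng = np.random.default_rng(1)  # used only to jitter points horizontally

fig, ax = plt.subplots(figsize=(4, 4), dpi=300)
colors = ["#0072B2", "#D55E00"]  # Okabe-Ito blue / vermillion (colorblind-safe)
for i, data in enumerate([a, b]):
    x = np.full(data.size, i) + rng.uniform(-0.08, 0.08, data.size)
    ax.scatter(x, data, alpha=0.6, color=colors[i])
    se = data.std(ddof=1) / np.sqrt(data.size)
    ax.errorbar(i, data.mean(), yerr=1.96 * se, fmt="o", color="black",
                capsize=5)  # state in the caption that bars are 95% CIs
ax.set_xticks([0, 1])
ax.set_xticklabels(["Group A", "Group B"])
ax.set_ylabel("Score (arbitrary units)")
fig.savefig("group_comparison.png", bbox_inches="tight")
```

The same skeleton works for violin or raincloud layers: add `ax.violinplot` behind the scatter rather than replacing the points.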
✅ Quick Check: A bar chart shows Group A mean = 50 and Group B mean = 55, with non-overlapping error bars. Is this enough to conclude the groups differ? (Answer: It depends on what the error bars represent. SE bars can appear non-overlapping even when the difference isn’t significant. 95% CI bars that don’t overlap usually indicate significance, but overlapping CIs don’t necessarily mean non-significance. Always report the actual test statistic and p-value — error bars are visual aids, not statistical tests.)
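The gap between SE bars and 95% CI bars is easy to make concrete. For an invented sample, the CI half-width is nearly twice the SE:

```python
import math

# Invented summary statistics for one sample
n, sd = 25, 10.0
se = sd / math.sqrt(n)   # standard error of the mean = 2.0
ci_half = 1.96 * se      # normal-approximation 95% CI half-width = 3.92
print(f"SE bar half-width: {se}, 95% CI bar half-width: {ci_half}")
# CI bars are almost twice as wide: two SE bars can fail to overlap
# even when the t-test says the group difference is not significant.
```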
Handling Multiple Comparisons
When AI helps you explore data, it’s easy to run dozens of tests without realizing the accumulated error rate.
| Number of Tests | Chance of at Least One False Positive (α = 0.05) |
|---|---|
| 1 | 5% |
| 5 | 23% |
| 10 | 40% |
| 20 | 64% |
| 50 | 92% |
Correction methods:
- Bonferroni: Conservative — divide α by number of tests (α/n)
- FDR (Benjamini-Hochberg): Less conservative — controls false discovery rate
- Pre-registration: Best approach — specify your primary analyses in advance, label everything else as exploratory
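Both the error-rate table and the correction methods can be checked in a few lines of plain Python. The Benjamini-Hochberg helper below is my own minimal implementation of the standard procedure, run on invented p-values:

```python
# Family-wise error rate: P(at least one false positive) = 1 - (1 - alpha)^n
alpha = 0.05
for n in (1, 5, 10, 20, 50):
    print(f"{n} tests: {1 - (1 - alpha) ** n:.0%}")

# Bonferroni: test each p-value against alpha / n
print("Bonferroni threshold for 10 tests:", alpha / 10)  # 0.005

def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected under Benjamini-Hochberg FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose sorted p-value clears its BH threshold
    for rank, idx in enumerate(order, 1):
        if pvals[idx] <= (rank / m) * alpha:
            k = rank
    return sorted(order[:k])

# Invented p-values: BH rejects three, Bonferroni (p <= 0.01) only two
print(benjamini_hochberg([0.001, 0.008, 0.028, 0.041, 0.22]))  # → [0, 1, 2]
```

The loop reproduces the table above (5%, 23%, 40%, 64%, 92%), and the final comparison shows why BH is preferred for exploratory screens: it recovers true effects that Bonferroni's stricter threshold discards.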
Practice Exercise
- Take a dataset from your current research and run the statistical test selection prompt
- Before running the selected test, run the assumption checking workflow — do all assumptions hold?
- Generate at least one publication-quality visualization using Julius AI or your preferred tool
Key Takeaways
- Always select statistical tests based on your data structure and design — don’t accept whatever AI runs by default
- Check all assumptions before trusting results: normality, equal variance, independence, linearity
- Report effect sizes alongside p-values — statistical significance without practical significance is empty
- Correct for multiple comparisons when running exploratory analyses — the more tests you run, the higher the false positive rate
- Label all post-hoc findings clearly as exploratory, not confirmatory — transparency is essential for credibility
- Show individual data points in visualizations when possible — bar charts hide important distribution information
Up Next
In the next lesson, you’ll learn to write your paper with AI assistance — from structuring your manuscript to polishing your prose while maintaining your scientific voice.