Neural Networks
How artificial neurons work — weights, biases, activation functions, and the forward pass from input to prediction.
The Building Block
Every deep learning model — from a simple spam classifier to GPT-4 — is built from the same fundamental unit: the artificial neuron. Understanding how a single neuron works gives you the key to understanding entire networks.
The Artificial Neuron
An artificial neuron does four things:
- Receives inputs — numbers from the previous layer (or from raw data for the first layer)
- Multiplies each input by a weight — weights determine how important each input is
- Adds a bias — a constant that shifts the result
- Applies an activation function — introduces non-linearity (the crucial step)
Analogy: Think of a neuron as a voting system. Each input casts a vote (the value), weighted by importance (the weight). The bias is a baseline preference. The activation function decides whether the total vote is strong enough to “fire” — to pass a signal to the next layer.
Input₁ × Weight₁ + Input₂ × Weight₂ + ... + Bias → Activation Function → Output
The weights and biases are what the network learns during training. Before training, they’re random. After training, they encode the patterns the network discovered.
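The four steps above fit in a few lines of code. Here is a minimal sketch of a single neuron with a ReLU activation; the input values, weights, and bias are made-up illustrations, not learned values:

```python
import numpy as np

def relu(x):
    """ReLU activation: pass positive values through, zero out negatives."""
    return np.maximum(0, x)

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus bias, then activation."""
    z = np.dot(inputs, weights) + bias  # input₁·weight₁ + input₂·weight₂ + ... + bias
    return relu(z)

# Hypothetical values -- in a real network these are learned during training
inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8,  0.1, 0.4])
bias    = -0.5

print(neuron(inputs, weights, bias))  # 0.5*0.8 + (-1.2)*0.1 + 3.0*0.4 - 0.5 = 0.98
```

Swap `relu` for a different activation function and the rest of the computation stays identical.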
Layers
Neurons are organized into layers, and layers stack to form a network:
| Layer Type | Role | Example |
|---|---|---|
| Input layer | Receives raw data | 784 neurons for a 28×28 pixel image (one per pixel) |
| Hidden layers | Transform data, learn features | 128, 256, or 512 neurons per layer |
| Output layer | Produces the prediction | 1 neuron (spam/not spam) or 10 neurons (digits 0-9) |
Input layer: One neuron per feature. For an image, each pixel becomes an input. For tabular data, each column becomes an input.
Hidden layers: Where the learning happens. Each neuron in a hidden layer receives the outputs of all neurons in the previous layer, processes them, and passes its output to the next layer. More layers = capacity to learn more complex patterns.
Output layer: Translates the network’s internal representation into a useful prediction. The number of output neurons matches the task: 1 for binary classification, 10 for classifying digits, hundreds for text generation.
✅ Quick Check: A neural network classifies handwritten digits (0-9). The input images are 28×28 pixels. How many neurons are in the input layer and the output layer?

Answer: Input: 784 neurons (28 × 28 = 784 pixels, each becomes one input). Output: 10 neurons (one for each digit 0-9). Each output neuron produces a probability — the highest probability is the predicted digit.
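Because every hidden neuron connects to every neuron in the previous layer, each layer's weights form a matrix, and the layer sizes alone determine how many parameters the network learns. A quick sketch, using a hypothetical 784 → 128 → 10 digit classifier:

```python
# Hypothetical layer sizes for a digit classifier: 784 inputs -> 128 hidden -> 10 outputs
layer_sizes = [784, 128, 10]

# Each layer's weight matrix maps previous-layer outputs to this layer's neurons
shapes = [(m, n) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
print(shapes)  # [(784, 128), (128, 10)]

# Total learned parameters: one weight per connection, plus one bias per neuron
params = sum(m * n + n for m, n in shapes)
print(params)  # 101770
```

Even this small network has over a hundred thousand weights and biases to learn, which is why training (Lesson 3) needs an efficient algorithm rather than trial and error.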
Activation Functions
Activation functions are the critical ingredient that makes deep learning work. Without them, stacking layers is pointless — a 100-layer linear network collapses to a single linear operation.
ReLU (Rectified Linear Unit): The most widely used activation. If the input is positive, pass it through unchanged. If negative, output zero.
ReLU(x) = max(0, x)
Why ReLU works: It’s simple, fast to compute, and avoids the “vanishing gradient” problem that plagued earlier activations. The downside: neurons that output zero can “die” and stop learning (the dying ReLU problem).
Sigmoid: Squashes any value into the range 0 to 1. Used in the output layer for binary classification (probability of belonging to a class).
Softmax: Extends sigmoid to multiple classes. Takes a vector of raw scores and converts them to probabilities that sum to 1. Used in the output layer for multi-class classification.
Tanh: Squashes values to the range -1 to 1. Sometimes used in hidden layers, though ReLU is generally preferred.
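All four activations are one-liners. A sketch using NumPy (the stability trick in `softmax` of subtracting the maximum is standard practice to avoid overflow, not something the definitions require):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)          # negatives -> 0, positives unchanged

def sigmoid(x):
    return 1 / (1 + np.exp(-x))     # squashes to (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))       # subtract max for numerical stability
    return e / e.sum()              # probabilities that sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))            # [0. 0. 3.]
print(sigmoid(0.0))       # 0.5 -- a raw score of 0 maps to 50% probability
print(softmax(x))         # three probabilities summing to 1
print(np.tanh(x))         # squashed to (-1, 1)
```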
The Forward Pass
When you give a neural network an input (like an image), it processes the data through every layer from input to output. This is called the forward pass.
Step by step:
- Input data enters the input layer (e.g., 784 pixel values)
- Each input is multiplied by its weight and summed at each neuron in the first hidden layer
- The activation function is applied to each sum
- The outputs become inputs to the next hidden layer
- Repeat through all hidden layers
- The output layer produces the final prediction
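The steps above can be sketched directly. This is a hypothetical, untrained spam classifier (50 features → 64 hidden neurons → 1 output); the weights are random placeholders, so the prediction is meaningless until training adjusts them:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical untrained network: random weights, zero biases
W1, b1 = rng.normal(scale=0.1, size=(50, 64)), np.zeros(64)  # input -> hidden
W2, b2 = rng.normal(scale=0.1, size=(64, 1)),  np.zeros(1)   # hidden -> output

def forward(x):
    """Forward pass: each layer computes weighted sums + bias, then activation."""
    h = relu(x @ W1 + b1)        # hidden layer (64 values)
    return sigmoid(h @ W2 + b2)  # output layer: spam probability in (0, 1)

x = rng.normal(size=50)  # one example with 50 features
p = forward(x)
print(p)  # a single probability between 0 and 1
```

Note that `forward` only reads the weights; nothing updates them. That update step is what training adds.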
During the forward pass, no learning happens — the network just computes a prediction using its current weights. Learning (adjusting weights) happens during training, which we’ll cover in Lesson 3.
✅ Quick Check: You have a 3-layer network (input → hidden → output) for classifying emails as spam or not spam. The input has 50 features (word counts, sender info, etc.). The hidden layer has 64 neurons with ReLU activation. The output layer has 1 neuron. What activation function should the output neuron use?

Answer: Sigmoid — because this is binary classification (spam or not spam). Sigmoid converts the output to a probability between 0 and 1. If the output is 0.87, there's an 87% probability the email is spam.
Key Takeaways
- Artificial neurons receive inputs, multiply by weights, add bias, and apply an activation function
- Weights and biases are learned during training — they encode the patterns the network discovers
- Three layer types: input (raw data), hidden (feature learning), output (prediction)
- Activation functions (ReLU, sigmoid, softmax) introduce non-linearity — without them, depth is meaningless
- The forward pass processes data through all layers to produce a prediction — no learning happens during this step
Up Next
The forward pass produces a prediction. But how does the network improve? Lesson 3 covers the training process — loss functions, backpropagation, and gradient descent. This is how neural networks actually learn.