Neural Networks
How artificial neurons work — weights, biases, activation functions, and the forward pass from input to prediction.
The Building Block
Every deep learning model — from a simple spam classifier to GPT-4 — is built from the same fundamental unit: the artificial neuron. Understanding how a single neuron works gives you the key to understanding entire networks.
The Artificial Neuron
An artificial neuron does four things:
- Receives inputs — numbers from the previous layer (or from raw data for the first layer)
- Multiplies each input by a weight — weights determine how important each input is
- Adds a bias — a constant that shifts the result
- Applies an activation function — introduces non-linearity (the crucial step)
Analogy: Think of a neuron as a voting system. Each input casts a vote (the value), weighted by importance (the weight). The bias is a baseline preference. The activation function decides whether the total vote is strong enough to “fire” — to pass a signal to the next layer.
Input₁ × Weight₁ + Input₂ × Weight₂ + ... + Bias → Activation Function → Output
The weights and biases are what the network learns during training. Before training, they’re random. After training, they encode the patterns the network discovered.
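The four steps above fit in a few lines of code. Here is a minimal sketch of a single neuron with a ReLU activation; the input values, weights, and bias are made-up illustrations, not learned values:

```python
import numpy as np

def relu(x):
    """ReLU activation: pass positive values through, zero out negatives."""
    return np.maximum(0, x)

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus bias, then activation."""
    z = np.dot(inputs, weights) + bias  # input₁·weight₁ + input₂·weight₂ + ... + bias
    return relu(z)

# Hypothetical values -- in a real network these are learned during training
inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8,  0.1, 0.4])
bias    = -0.5

print(neuron(inputs, weights, bias))  # 0.5*0.8 + (-1.2)*0.1 + 3.0*0.4 - 0.5 = 0.98
```

Swap `relu` for a different activation function and the rest of the computation stays identical.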
Layers
Neurons are organized into layers, and layers stack to form a network:
| Layer Type | Role | Example |
|---|---|---|
| Input layer | Receives raw data | 784 neurons for a 28×28 pixel image (one per pixel) |
| Hidden layers | Transform data, learn features | 128, 256, or 512 neurons per layer |
| Output layer | Produces the prediction | 1 neuron (spam/not spam) or 10 neurons (digits 0-9) |
Input layer: One neuron per feature. For an image, each pixel becomes an input. For tabular data, each column becomes an input.
Hidden layers: Where the learning happens. Each neuron in a hidden layer receives the outputs of all neurons in the previous layer, processes them, and passes its output to the next layer. More layers = capacity to learn more complex patterns.
Output layer: Translates the network’s internal representation into a useful prediction. The number of output neurons matches the task: 1 for binary classification, 10 for classifying digits, hundreds for text generation.
✅ Quick Check: A neural network classifies handwritten digits (0-9). The input images are 28×28 pixels. How many neurons are in the input layer and the output layer?

Answer: Input: 784 neurons (28 × 28 = 784 pixels, each becomes one input). Output: 10 neurons (one for each digit 0-9). Each output neuron produces a probability — the highest probability is the predicted digit.
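Because every hidden neuron connects to every neuron in the previous layer, each layer's weights form a matrix, and the layer sizes alone determine how many parameters the network learns. A quick sketch, using a hypothetical 784 → 128 → 10 digit classifier:

```python
# Hypothetical layer sizes for a digit classifier: 784 inputs -> 128 hidden -> 10 outputs
layer_sizes = [784, 128, 10]

# Each layer's weight matrix maps previous-layer outputs to this layer's neurons
shapes = [(m, n) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
print(shapes)  # [(784, 128), (128, 10)]

# Total learned parameters: one weight per connection, plus one bias per neuron
params = sum(m * n + n for m, n in shapes)
print(params)  # 101770
```

Even this small network has over a hundred thousand weights and biases to learn, which is why training (Lesson 3) needs an efficient algorithm rather than trial and error.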
Activation Functions
Activation functions are the critical ingredient that makes deep learning work. Without them, stacking layers is pointless — a 100-layer linear network collapses to a single linear operation.
ReLU (Rectified Linear Unit): The most widely used activation. If the input is positive, pass it through unchanged. If negative, output zero.
ReLU(x) = max(0, x)
Why ReLU works: It’s simple, fast to compute, and avoids the “vanishing gradient” problem that plagued earlier activations. The downside: neurons that output zero can “die” and stop learning (the dying ReLU problem).
Sigmoid: Squashes any value into the range 0 to 1. Used in the output layer for binary classification (probability of belonging to a class).
Softmax: Extends sigmoid to multiple classes. Takes a vector of raw scores and converts them to probabilities that sum to 1. Used in the output layer for multi-class classification.
Tanh: Squashes values to the range -1 to 1. Sometimes used in hidden layers, though ReLU is generally preferred.
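All four activations are one-liners. A sketch using NumPy (the stability trick in `softmax` of subtracting the maximum is standard practice to avoid overflow, not something the definitions require):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)          # negatives -> 0, positives unchanged

def sigmoid(x):
    return 1 / (1 + np.exp(-x))     # squashes to (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))       # subtract max for numerical stability
    return e / e.sum()              # probabilities that sum to 1

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))            # [0. 0. 3.]
print(sigmoid(0.0))       # 0.5 -- a raw score of 0 maps to 50% probability
print(softmax(x))         # three probabilities summing to 1
print(np.tanh(x))         # squashed to (-1, 1)
```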
The Forward Pass
When you give a neural network an input (like an image), it processes the data through every layer from input to output. This is called the forward pass.
Step by step:
- Input data enters the input layer (e.g., 784 pixel values)
- Each input is multiplied by its weight and summed at each neuron in the first hidden layer
- The activation function is applied to each sum
- The outputs become inputs to the next hidden layer
- Repeat through all hidden layers
- The output layer produces the final prediction
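The steps above can be sketched directly. This is a hypothetical, untrained spam classifier (50 features → 64 hidden neurons → 1 output); the weights are random placeholders, so the prediction is meaningless until training adjusts them:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical untrained network: random weights, zero biases
W1, b1 = rng.normal(scale=0.1, size=(50, 64)), np.zeros(64)  # input -> hidden
W2, b2 = rng.normal(scale=0.1, size=(64, 1)),  np.zeros(1)   # hidden -> output

def forward(x):
    """Forward pass: each layer computes weighted sums + bias, then activation."""
    h = relu(x @ W1 + b1)        # hidden layer (64 values)
    return sigmoid(h @ W2 + b2)  # output layer: spam probability in (0, 1)

x = rng.normal(size=50)  # one example with 50 features
p = forward(x)
print(p)  # a single probability between 0 and 1
```

Note that `forward` only reads the weights; nothing updates them. That update step is what training adds.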
During the forward pass, no learning happens — the network just computes a prediction using its current weights. Learning (adjusting weights) happens during training, which we’ll cover in Lesson 3.
✅ Quick Check: You have a 3-layer network (input → hidden → output) for classifying emails as spam or not spam. The input has 50 features (word counts, sender info, etc.). The hidden layer has 64 neurons with ReLU activation. The output layer has 1 neuron. What activation function should the output neuron use?

Answer: Sigmoid — because this is binary classification (spam or not spam). Sigmoid converts the output to a probability between 0 and 1. If the output is 0.87, there's an 87% probability the email is spam.
Key Takeaways
- Artificial neurons receive inputs, multiply by weights, add bias, and apply an activation function
- Weights and biases are learned during training — they encode the patterns the network discovers
- Three layer types: input (raw data), hidden (feature learning), output (prediction)
- Activation functions (ReLU, sigmoid, softmax) introduce non-linearity — without them, depth is meaningless
- The forward pass processes data through all layers to produce a prediction — no learning happens during this step
Up Next
The forward pass produces a prediction. But how does the network improve? Lesson 3 covers the training process — loss functions, backpropagation, and gradient descent. This is how neural networks actually learn.