Artificial Neural Networks (ANNs) are inspired by the human brain.
| Biological | Artificial |
|---|---|
| Neuron | Node/Unit |
| Dendrites | Input connections |
| Axon | Output |
| Synapse | Weight |
| Firing threshold | Activation function |
Human brain: ~10¹¹ neurons, each connected to ~10⁴ others.
A perceptron computes a weighted sum of inputs and applies an activation function.
Input: x = (x₁, x₂, ..., xₙ)
Weights: w = (w₁, w₂, ..., wₙ)
Bias: w₀
Net input: net = w₀ + Σ wᵢxᵢ = w·x + b
Output: y = f(net)
| Function | Formula | Range | Use |
|---|---|---|---|
| Step | 1 if net>0, else 0 | {0,1} | Binary classification |
| Sigmoid | 1/(1+e^(-net)) | (0,1) | Probability output |
| Tanh | (e^x - e^(-x))/(e^x + e^(-x)) | (-1,1) | Hidden layers |
| ReLU | max(0, net) | [0,∞) | Deep networks |
| Softmax | e^(xᵢ)/Σe^(xⱼ) | (0,1) | Multi-class output |
For linearly separable problems, the perceptron learning rule converges.
Initialize weights w randomly (small values)
Repeat until convergence:
For each training example (x, y_true):
y_pred = step(w·x)
if y_pred ≠ y_true:
w ← w + η(y_true - y_pred)x
where η = learning rate (0 < η ≤ 1)
Training data:
x1 x2 | y
0 0 | 0
0 1 | 0
1 0 | 0
1 1 | 1
Initial weights: w=(0.5, 0.5), bias=0, η=0.1
Iteration 1: (0,0,0) → net=-0.5, pred=0, correct
Iteration 2: (0,1,0) → net=0.5, pred=1, WRONG
w₀ ← 0 + 0.1×(0-1)×1 = -0.1 (bias update)
w₁ ← 0.5 + 0.1×(0-1)×0 = 0.5
w₂ ← 0.5 + 0.1×(0-1)×1 = 0.4
...
(Continue until convergence)
If training data is linearly separable, the perceptron learning algorithm will converge to a correct solution in finite steps.
Perceptron CANNOT solve XOR:
x1 x2 | XOR
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
No single line separates 0s from 1s → Need multilayer network
Adds hidden layers between input and output to solve non-linear problems.
Input Layer → Hidden Layer(s) → Output Layer
Example (3-2-1 network):
x₁ ──┐
x₂ ──┤── h₁ ──┐
x₃ ──┤ ├── y
└── h₂ ──┘
For each layer:
netⱼ = Σᵢ wᵢⱼ × aᵢ + bⱼ
aⱼ = f(netⱼ)
where:
aᵢ = output of previous layer neuron i
wᵢⱼ = weight from i to j
f = activation function
Backpropagation computes gradients of the loss with respect to all weights using the chain rule.
MSE for regression:
L = (1/2) Σ (y_true - y_pred)²
Cross-entropy for classification:
L = -Σ y_true × log(y_pred)
1. FORWARD PASS: Compute outputs layer by layer
2. COMPUTE OUTPUT ERROR:
δₖ = (yₖ - oₖ) × f'(netₖ) [output layer]
3. BACKPROPAGATE ERROR:
δⱼ = f'(netⱼ) × Σₖ wⱼₖ × δₖ [hidden layer]
4. UPDATE WEIGHTS:
wᵢⱼ ← wᵢⱼ + η × δⱼ × aᵢ
5. REPEAT until convergence
Network: 2 inputs → 2 hidden (sigmoid) → 1 output (sigmoid)
Initial weights (example):
w₁₁=0.5, w₂₁=0.5 (input→hidden1)
w₁₂=-0.5, w₂₂=0.5 (input→hidden2)
v₁=0.5, v₂=-0.5 (hidden→output)
Input: x=(1,0), target y=1
Forward pass:
h1_net = 0.5×1 + 0.5×0 = 0.5
h1_out = sigmoid(0.5) = 0.622
h2_net = -0.5×1 + 0.5×0 = -0.5
h2_out = sigmoid(-0.5) = 0.378
out_net = 0.5×0.622 + (-0.5)×0.378 = 0.122
out = sigmoid(0.122) = 0.530
Loss: L = 0.5×(1-0.530)² = 0.110
Backward pass:
δ_out = (1-0.530) × 0.530×(1-0.530) = 0.117
δ_h1 = 0.622×(1-0.622) × 0.5×0.117 = 0.014
δ_h2 = 0.378×(1-0.378) × (-0.5)×0.117 = -0.014
Weight updates (η=0.1):
v₁ ← 0.5 + 0.1×0.117×0.622 = 0.507
v₂ ← -0.5 + 0.1×0.117×0.378 = -0.496
w₁₁ ← 0.5 + 0.1×0.014×1 = 0.501
Strengths:
Weaknesses:
Improvements:
Application: Recognize person from 20×20 pixel face images.
Architecture:
Input: 400 units (20×20 pixels, grayscale 0-1)
Hidden: 3 hidden layers (200, 100, 50 units)
Output: 1 unit per person (softmax)
Training:
Dataset: 1000 face images, 20 people (50 each)
Train/Test: 800/200
Results after training:
Training accuracy: 98%
Test accuracy: 94%
Preprocessing: Normalize pixel values, align faces.
Short Answer (2 marks each)
Long Answer (8 marks each)
Think & Apply
A neural network for digit recognition has:
Calculate total number of weights in this network.
The gradient is vanishing in a deep network. What would you change: activation function, weight initialization, or architecture? Justify.