Machine Learning (ML) is a subset of AI that enables systems to learn from data and improve performance without being explicitly programmed.
Tom Mitchell’s Definition:
“A computer program is said to learn from experience E with respect to some task T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
| Domain | Application | Type of ML |
|---|---|---|
| Healthcare | Disease diagnosis | Classification |
| Finance | Stock price prediction | Regression |
| E-commerce | Product recommendations | Unsupervised |
| Gaming | Game playing agents | Reinforcement |
| NLP | Language translation | Supervised |
| Security | Fraud detection | Classification |
| Robotics | Robot navigation | Reinforcement |
| Computer Vision | Face recognition | Classification |
Supervised learning learns from labeled training data: each example pairs an input with its correct output.
Training: (x1,y1), (x2,y2), ..., (xn,yn)
Goal: Learn function f: X → Y
Test: Predict y for new x
Sub-types: classification (discrete labels) and regression (continuous values)
Algorithms: Decision Trees, SVM, Neural Networks, Naive Bayes
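
The training/goal/test steps above can be sketched with the simplest possible learner. This is a minimal illustration only, assuming one numeric input and a straight-line model fitted by ordinary least squares; the function name `fit_line` and the three training pairs are made up for the example and are not part of the notes.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (1-D input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # Return the learned function f: X -> Y
    return lambda x: slope * x + intercept


# Hypothetical labeled training pairs (x_i, y_i), invented for illustration.
train_x = [1.0, 2.0, 3.0]
train_y = [2.0, 4.0, 6.0]

f = fit_line(train_x, train_y)  # Training: learn f from (x_i, y_i)
prediction = f(4.0)             # Test: predict y for a new x
print(prediction)
```

The same fit-then-predict pattern applies to the algorithms listed above; only the form of f changes.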
Unsupervised learning learns from unlabeled data and finds hidden patterns or structure.
Training: x1, x2, ..., xn (no labels)
Goal: Discover structure in data
Sub-types: clustering, dimensionality reduction
Example: Customer segmentation — group customers by buying behavior without predefined categories.
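
As a rough sketch of the customer segmentation example, the code below groups customers by a single number (monthly spend) using k-means, one common clustering algorithm. The notes do not name an algorithm here, so k-means is an assumption, and the spend values, the starting centroids, and the name `kmeans_1d` are all hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return labels, centroids


# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 98]
labels, centers = kmeans_1d(spend, centroids=[10, 12])
print(labels, centers)
```

No categories are given in advance: the two groups (low and high spenders) emerge from the data alone.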
Semi-supervised learning uses a small amount of labeled data together with a large amount of unlabeled data.
Why useful: labeling data is expensive, while raw unlabeled data is cheap to collect.
Example: a photo service such as Google Photos can combine a few labeled photos with many unlabeled photos to learn face recognition.
In reinforcement learning, an agent learns by interacting with an environment and receiving rewards or penalties.
Agent → Action → Environment → State + Reward → Agent
Key elements: agent, environment, state, action, reward
Example: AlphaGo learns to play Go by playing millions of games, receiving a positive reward for winning and a negative reward for losing.
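
The agent-environment loop can be sketched on a toy problem far smaller than Go. The example below uses tabular Q-learning, which is one standard reinforcement learning algorithm and not the method AlphaGo uses. The four-state corridor, the reward of 1 at the goal, and the constants `ALPHA` and `GAMMA` are all assumptions chosen for illustration.

```python
import random

N_STATES = 4             # states 0..3; state 3 is the goal (terminal)
ACTIONS = (-1, +1)       # move left or move right
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Environment: apply the action, return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on steps per episode
            if state == N_STATES - 1:
                break
            action = rng.choice(ACTIONS)  # agent explores at random
            next_state, reward = step(state, action)
            # Q-learning update toward reward + discounted best future value.
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
            state = next_state
    return q


q_table = train()
# Greedy policy: in each non-terminal state, pick the highest-valued action.
policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)])
    for s in range(N_STATES - 1)
}
print(policy)
```

After training, the greedy policy moves right (+1) in every non-terminal state, because that is the only way to reach the rewarded goal.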
Association rule learning finds rules describing associations between variables.
Example — Market Basket Analysis:
{bread, butter} → {milk} support=30%, confidence=70%
"30% of all transactions contain bread, butter, and milk together; 70% of customers who buy bread and butter also buy milk"
The Apriori algorithm is commonly used for association rule mining.
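
Support and confidence can be computed directly from a list of transactions. The sketch below only evaluates one given rule and is not an implementation of Apriori. The five transactions are hypothetical, so the resulting numbers differ from the 30% and 70% quoted above.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "eggs"},
]

rule_support = support(transactions, {"bread", "butter", "milk"})
rule_confidence = confidence(transactions, {"bread", "butter"}, {"milk"})
print(rule_support, rule_confidence)
```

Here 2 of 5 baskets contain all three items (support 0.4), and 2 of the 3 baskets with bread and butter also contain milk (confidence about 0.67).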
1. Data Collection
↓
2. Data Preprocessing
(cleaning, normalization, feature selection)
↓
3. Model Selection
↓
4. Training
↓
5. Evaluation
↓
6. Deployment
↓
7. Monitoring & Updating
Underfitting (high bias): model too simple → high training error, high test error
Good fit: low training error, low test error
Overfitting (high variance): model too complex → low training error, high test error
Total Error = Bias² + Variance + Irreducible Noise
High Bias (Underfitting): Doesn't capture data patterns
High Variance (Overfitting): Memorizes training data, poor generalization
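
The total-error formula above can be written out term by term. This is the standard decomposition under assumptions that the notes do not state explicitly: squared-error loss, and data generated as y = f(x) + ε with zero-mean noise of variance σ². The symbols f (true function), f̂ (learned model), and σ² are introduced here for illustration, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```

The bias term measures how far the average model is from the truth, the variance term measures how much the model changes from one training set to another, and σ² cannot be reduced by any model.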
| Set | Purpose | Typical Size |
|---|---|---|
| Training | Fit model parameters | 60-70% |
| Validation | Tune hyperparameters | 10-20% |
| Test | Final evaluation | 20-30% |
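
A three-way split like the one in the table can be sketched with the standard library alone. The 60/20/20 proportions are one choice within the ranges above. The function name `split_dataset`, the seed, and the use of `range(100)` as stand-in data are assumptions for illustration.

```python
import random


def split_dataset(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into train / validation / test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains
    return train, val, test


# Stand-in dataset: 100 example indices.
train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting matters when the data is ordered (for example by date or by class); the test part should be used only once, for the final evaluation.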
Confusion Matrix:

| | Predicted Pos | Predicted Neg |
|---|---|---|
| Actual Pos | TP | FN |
| Actual Neg | FP | TN |

| Metric | Formula |
|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) |
| Precision | TP/(TP+FP) |
| Recall | TP/(TP+FN) |
| F1-Score | 2×(Precision×Recall)/(Precision+Recall) |
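
The four formulas translate directly into code. The sketch below is a plain transcription of the table; the function name `classification_metrics` and the confusion-matrix counts are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, invented for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

With these counts, accuracy is 70/100, precision is 40/50, recall is 40/60, and F1 is 80/110. The sketch does not guard against division by zero, which occurs when there are no predicted positives or no actual positives.
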
| Metric | Formula | Meaning |
|---|---|---|
| MAE | Σ\|y-ŷ\|/n | Average absolute error |
| MSE | Σ(y-ŷ)²/n | Penalizes large errors |
| RMSE | √MSE | Same units as y |
| R² | 1 - SS_res/SS_tot | Fraction of variance explained (at most 1; negative if the model is worse than predicting the mean) |
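
The regression metrics can be computed the same way. This is a minimal sketch that transcribes the formulas in the table; the function name `regression_metrics` and the four true/predicted values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² for paired true and predicted values."""
    n = len(y_true)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical true values and predictions, invented for illustration.
scores = regression_metrics([3, 5, 7, 9], [2, 5, 8, 10])
print(scores)
```

Here the errors are 1, 0, -1, -1, so MAE and MSE are both 0.75, SS_res is 3, SS_tot is 20, and R² is 0.85.
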
Short Answer (2 marks each)
Long Answer (8 marks each)
Think & Apply
For each scenario, identify the type of ML and appropriate algorithm:
A spam filter has: TP=90, FP=10, FN=5, TN=895. Calculate accuracy, precision, recall and F1-score. Which metric is most important for a spam filter?