DECISION TREE LEARNING & INSTANCE-BASED LEARNING
Introduction to Decision Tree Learning
Decision Tree Learning is a supervised machine learning method used for classification and prediction. It represents decisions in a tree-like structure where:
- Nodes represent tests or conditions
- Branches represent possible outcomes
- Leaf nodes represent final decisions or classes
Example: Suppose we want to predict whether a person will buy a product.
Decision tree example:
                Age > 30?
               /         \
             Yes          No
              |            |
       Income > 50k?     No Buy
          /      \
        Yes       No
         |         |
   Buy Product   No Buy
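The example tree above can be read directly as nested if/else conditions. A minimal Python sketch (the thresholds and labels come from the diagram; the function name is our own):

```python
def will_buy(age: int, income: int) -> str:
    """Classify a customer by walking the example decision tree."""
    if age > 30:                     # root node test
        if income > 50_000:          # internal node test
            return "Buy Product"     # leaf node
        return "No Buy"              # leaf node
    return "No Buy"                  # leaf node for the 'No' branch of the root

print(will_buy(age=40, income=60_000))  # → Buy Product
print(will_buy(age=25, income=80_000))  # → No Buy
```

Each path from the root to a leaf corresponds to one chain of conditions, which is why decision trees are easy to interpret.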
Components of Decision Tree
| Component | Meaning |
|---|---|
| Root Node | Starting point of tree |
| Internal Node | Decision condition |
| Branch | Outcome of condition |
| Leaf Node | Final result or class |
Applications
- Medical diagnosis
- Customer behavior prediction
- Fraud detection
- Credit risk analysis
Decision Tree Learning Algorithm
The decision tree learning algorithm builds a tree by recursively dividing the dataset based on attributes.
Steps of the Algorithm
- Start with the entire dataset as the root node.
- Select the best attribute that splits the data.
- Divide the dataset into subsets based on attribute values.
- Create child nodes.
- Repeat the process until:
  - All data belongs to the same class
  - No attributes remain
This process is called recursive partitioning.
Inductive Bias
Inductive bias refers to the assumptions made by a learning algorithm to generalize from training data.
In simple words: When a machine learning model learns from limited examples, it assumes certain patterns to predict unseen data.
Example: Suppose a decision tree learns that birds with wings and feathers can fly.
Even if it has not seen a particular new bird, it assumes that bird can fly because it matches the learned pattern.
Importance
Inductive bias helps the model:
- Make predictions on unseen data
- Avoid overfitting
- Learn general patterns
Inductive Inference with Decision Trees
Inductive inference means drawing general conclusions from specific examples.
Decision trees perform inductive inference by:
- Studying training examples
- Finding patterns
- Creating decision rules
Example
Training Data:
| Weather | Play Cricket |
|---|---|
| Sunny | No |
| Rainy | Yes |
| Overcast | Yes |
The decision tree may infer the rules:
If Weather = Sunny → No Play
If Weather = Rainy → Play
If Weather = Overcast → Play
This rule is generalized for future predictions.
Entropy (Information Theory)
Entropy measures the uncertainty or impurity in data.
If the data is mixed, entropy is high.
If the data is pure, entropy is low.
Entropy Formula

Entropy(S) = − Σᵢ₌₁ᶜ pᵢ · log₂(pᵢ)

Where:
| Symbol | Meaning |
|---|---|
| S | Dataset |
| pᵢ | Probability of class i |
| c | Number of classes |
Example
Dataset:
| Class | Count |
|---|---|
| Yes | 9 |
| No | 5 |
Here Entropy(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.940 bits.
If both classes are equally distributed → entropy is maximal (1 bit for two classes).
If only one class exists → Entropy = 0.
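The entropy of the 9 Yes / 5 No dataset above can be checked with a short script (plain Python, no external libraries; the helper name `entropy` is our own):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]   # skip empty classes
    return sum(-p * math.log2(p) for p in probs)

print(round(entropy([9, 5]), 3))   # mixed classes → high entropy (≈ 0.940)
print(entropy([14, 0]))            # pure dataset → entropy 0.0
print(entropy([7, 7]))             # perfectly balanced → maximal entropy 1.0
```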
Information Gain
Information Gain measures how much information an attribute gives about the class. It helps determine which attribute should be used to split the dataset.
Formula

Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ)

Where:
| Symbol | Meaning |
|---|---|
| S | Dataset |
| A | Attribute |
| Sᵥ | Subset of S where attribute A has value v |
Interpretation
Higher information gain → better attribute for splitting.
Example attributes:
- Age
- Income
- Education
The attribute with highest information gain becomes the root node.
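The information gain formula can be checked directly in code. A minimal sketch (the tiny weather dataset below is assumed for illustration only):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    gain = entropy([r[target] for r in rows])
    for v in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# Hypothetical toy dataset (assumed for illustration)
data = [
    {"Weather": "Sunny",    "Play": "No"},
    {"Weather": "Sunny",    "Play": "No"},
    {"Weather": "Rainy",    "Play": "Yes"},
    {"Weather": "Overcast", "Play": "Yes"},
]
# Weather separates the classes perfectly, so its gain equals the full entropy
print(round(information_gain(data, "Weather", "Play"), 3))  # → 1.0
```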
ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm developed by Ross Quinlan.
It builds the decision tree using information gain.
Steps of ID3 Algorithm
- Calculate entropy of dataset.
- Compute information gain for each attribute.
- Choose attribute with highest information gain.
- Create node and split dataset.
- Repeat the process for each subset.
- Stop when:
  - All examples belong to the same class
  - No attributes remain
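The steps above can be sketched as a recursive function. This is a simplified illustration of the ID3 idea, not Quinlan's full algorithm; the toy dataset and helper names are assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs, target):
    """Pick the attribute with the highest information gain."""
    def gain(a):
        g = entropy([r[target] for r in rows])
        for v in {r[a] for r in rows}:
            sub = [r[target] for r in rows if r[a] == v]
            g -= len(sub) / len(rows) * entropy(sub)
        return g
    return max(attrs, key=gain)

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # all examples in one class → leaf
        return labels[0]
    if not attrs:                             # no attributes left → majority class
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs, target)
    tree = {a: {}}
    for v in {r[a] for r in rows}:            # one child node per attribute value
        subset = [r for r in rows if r[a] == v]
        tree[a][v] = id3(subset, [x for x in attrs if x != a], target)
    return tree

# Hypothetical toy data (assumed for illustration)
data = [
    {"Weather": "Sunny", "Windy": "No",  "Play": "No"},
    {"Weather": "Sunny", "Windy": "Yes", "Play": "No"},
    {"Weather": "Rainy", "Windy": "No",  "Play": "Yes"},
    {"Weather": "Rainy", "Windy": "Yes", "Play": "No"},
]
print(id3(data, ["Weather", "Windy"], "Play"))
```

The returned nested dictionary mirrors the tree: the outer key is the root attribute, each value branch leads either to a subtree or to a class label.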
Advantages of ID3
- Simple to implement
- Works well with categorical data
- Easy to interpret
Limitations
- Cannot handle continuous data directly
- Prone to overfitting
Issues in Decision Tree Learning
Decision tree algorithms face several challenges.
| Issue | Explanation |
|---|---|
| Overfitting | Tree becomes too complex and fits training data perfectly but fails on new data |
| Handling Continuous Attributes | Numeric attributes must be discretized or split on thresholds; basic algorithms such as ID3 handle only categorical data |
| Missing Values | Difficult to process incomplete data |
| Bias toward Attributes with Many Values | Attributes with more values may appear more informative |
| Tree Complexity | Very large trees are difficult to interpret |
Solutions
- Pruning (reducing tree size)
- Handling missing values
- Using improved algorithms such as:
  - C4.5
  - CART
Summary Table
| Topic | Key Idea |
|---|---|
| Decision Tree | Tree structure for classification |
| Inductive Bias | Assumptions for generalization |
| Inductive Inference | Learning general rules from examples |
| Entropy | Measure of data impurity |
| Information Gain | Measures usefulness of attribute |
| ID3 Algorithm | Builds tree using information gain |
| Issues | Overfitting, missing data, complexity |
Important MCA Exam Questions
- Explain Decision Tree Learning with diagram.
- Define Entropy and Information Gain.
- Explain ID3 Algorithm with steps.
- What is Inductive Bias in Machine Learning?
- Explain Issues in Decision Tree Learning.
INSTANCE-BASED LEARNING
Instance-Based Learning (IBL) is a type of machine learning method where the model learns by storing training examples and comparing new data with those examples.
Instead of creating a general model, the system remembers past instances (examples) and uses them to make predictions.
In simple words: Learning happens by comparing new cases with previously stored cases.
Example: Suppose a system stores medical records of patients.
When a new patient arrives, the system compares the new patient with similar previous patients to predict disease.
Key Idea
Prediction is based on similarity between instances.
Characteristics
| Feature | Explanation |
|---|---|
| Lazy Learning | Model waits until prediction time |
| Memory-Based | Stores training examples |
| Similarity Measure | Uses distance measures like Euclidean distance |
k-Nearest Neighbour (k-NN) Learning
k-Nearest Neighbour (k-NN) is the most popular instance-based learning algorithm.
It classifies a new data point by looking at the k nearest data points in the training dataset.
How k-NN Works
Steps:
- Choose the value of k (number of neighbors).
- Calculate the distance between new data and all training data.
- Select the k closest neighbors.
- Assign the class that appears most frequently among those neighbors.
Distance Formula (Euclidean Distance)

d(x, y) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )

Where:
| Symbol | Meaning |
|---|---|
| x | New data point |
| y | Training data point |
| n | Number of features |
Example
Suppose we classify fruit type using features:
- Weight
- Color
If most neighbors are Apples, the new fruit is classified as Apple.
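The k-NN steps above can be sketched in a few lines of plain Python. The fruit data below is hypothetical (weight in grams plus an assumed numeric "colour score"), as are the helper names:

```python
import math
from collections import Counter

def euclidean(x, y):
    """d(x, y) = sqrt of the sum of squared feature differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(query, examples, k=3):
    """examples: list of (feature_vector, label) pairs."""
    # 1-2. compute distances and sort; 3. keep the k closest neighbours
    neighbours = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    # 4. majority vote among the k nearest labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical fruit examples: (weight, colour score), label
fruits = [
    ((150, 0.8), "Apple"),
    ((160, 0.9), "Apple"),
    ((170, 0.7), "Apple"),
    ((120, 0.3), "Orange"),
    ((110, 0.2), "Orange"),
]
print(knn_classify((155, 0.75), fruits, k=3))  # → Apple
```

Note there is no training step: all work happens at prediction time, which is exactly the "lazy learning" property from the table above.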
Advantages
- Simple to implement
- No training phase
- Works well with small datasets
Disadvantages
- Slow for large datasets
- Sensitive to irrelevant features
- Requires large memory
Locally Weighted Regression (LWR)
Locally Weighted Regression is a type of regression algorithm where predictions are made using nearby data points.
Instead of fitting a single global model, LWR creates local models for each prediction.
Idea
Points closer to the query point get higher importance (weight).
Example: Suppose we want to predict house prices.
A house price prediction depends more on nearby houses rather than houses in another city.
So the algorithm gives more weight to closer houses.
Key Characteristics
| Feature | Explanation |
|---|---|
| Local Model | Model built around nearby data |
| Weighted Data | Nearby points have higher weights |
| Lazy Learning | Model built during prediction |
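The weighted local fit can be sketched for a single feature using a Gaussian weighting kernel and weighted least squares. This is a minimal illustration; the bandwidth parameter `tau` and the house data are assumptions:

```python
import math

def lwr_predict(x_query, xs, ys, tau=1.0):
    """Locally weighted linear regression at one query point.

    Each training point gets a Gaussian weight that decays with its
    distance from the query; a weighted least-squares line is then fit.
    """
    w = [math.exp(-(x - x_query) ** 2 / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw   # weighted means
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    slope = num / den if den else 0.0
    return y_bar + slope * (x_query - x_bar)             # local line at the query

# Hypothetical house sizes (sq. m) and prices, for illustration
sizes  = [50, 60, 70, 80, 90]
prices = [25, 30, 36, 41, 47]
print(round(lwr_predict(65, sizes, prices, tau=10.0), 1))
```

A smaller `tau` makes the model more local (only very near houses matter); a large `tau` approaches an ordinary global linear fit.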
Applications
- Economic forecasting
- Demand prediction
- Time-series analysis
Radial Basis Function (RBF) Networks
Radial Basis Function Networks are a type of Artificial Neural Network used for function approximation and classification.
They use radial basis functions as activation functions.
Structure of RBF Network
An RBF network has three layers:
- Input Layer
- Hidden Layer (Radial basis functions)
- Output Layer
Radial Basis Function

φ(x) = exp(−γ · ‖x − c‖²)

Where:
| Symbol | Meaning |
|---|---|
| x | Input vector |
| c | Center |
| γ | Parameter controlling spread |
The function gives higher values for points closer to the center.
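A quick sketch of the Gaussian radial basis function confirms this behaviour (plain Python; the sample points are assumptions):

```python
import math

def rbf(x, c, gamma=1.0):
    """Gaussian radial basis function: phi(x) = exp(-gamma * ||x - c||^2)."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-gamma * sq_dist)

centre = (0.0, 0.0)
print(rbf((0.0, 0.0), centre))   # at the centre → 1.0 (maximum activation)
print(rbf((1.0, 0.0), centre))   # farther away → smaller value (≈ 0.368)
print(rbf((3.0, 0.0), centre))   # much farther → close to 0
```

In an RBF network, the hidden layer computes one such activation per centre, and the output layer combines them with learned weights.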
Applications
- Pattern recognition
- Speech recognition
- Image processing
- Function approximation
Advantages
- Fast learning
- Good for nonlinear problems
Case-Based Learning
Case-Based Learning (CBL) is a method where the system solves new problems using solutions from similar past cases.
It is widely used in expert systems.
Process of Case-Based Learning
The system follows four steps:
| Step | Explanation |
|---|---|
| Retrieve | Find similar past cases |
| Reuse | Apply solution from past case |
| Revise | Modify solution if necessary |
| Retain | Store new case for future use |
Example: Legal system: A lawyer solves a case by studying similar past court cases.
Similarly, case-based learning uses previous cases to solve new problems.
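The Retrieve → Reuse → Revise → Retain cycle can be sketched as follows. The help-desk cases, the set-overlap similarity, and the helper names are all assumptions for illustration:

```python
def retrieve(case_base, query, similarity):
    """Retrieve: find the most similar past case."""
    return max(case_base, key=lambda case: similarity(case["problem"], query))

def solve(case_base, query, similarity, revise):
    best = retrieve(case_base, query, similarity)        # Retrieve
    solution = revise(best["solution"], query)           # Reuse + Revise
    case_base.append({"problem": query,                  # Retain for future use
                      "solution": solution})
    return solution

# Hypothetical help-desk cases: problems are sets of symptom keywords
cases = [
    {"problem": {"no_power", "black_screen"}, "solution": "check power cable"},
    {"problem": {"slow", "many_tabs"},        "solution": "close unused programs"},
]

overlap = lambda a, b: len(a & b)    # similarity = number of shared symptoms
keep = lambda sol, query: sol        # trivial revise step: reuse unchanged

print(solve(cases, {"black_screen", "beeping"}, overlap, keep))  # → check power cable
print(len(cases))                                                # → 3 (new case retained)
```

Real case-based reasoning systems use much richer similarity measures and revision rules, but the four-step loop is the same.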
Applications
- Medical diagnosis systems
- Customer support systems
- Legal reasoning systems
- Help desk systems
Comparison of Instance-Based Learning Methods
| Method | Purpose |
|---|---|
| k-NN | Classification based on nearest neighbors |
| Locally Weighted Regression | Local regression prediction |
| RBF Networks | Neural network using radial functions |
| Case-Based Learning | Solve problems using past cases |
Advantages of Instance-Based Learning
- Simple concept
- No training phase required
- Adapts quickly to new data
- Effective for pattern recognition
Limitations of Instance-Based Learning
| Problem | Explanation |
|---|---|
| High Memory Requirement | Stores entire dataset |
| Slow Prediction | Needs to compare many instances |
| Sensitive to Noise | Outliers affect results |
| Curse of Dimensionality | Performance decreases with many features |
Summary
Instance-Based Learning is a memory-based machine learning approach where predictions are made by comparing new instances with stored examples. Algorithms like k-NN, Locally Weighted Regression, RBF networks, and Case-Based Learning rely on similarity measures to make predictions. These methods are simple and flexible but can require large storage and computational resources.
Important MCA Exam Questions
- Explain Instance-Based Learning with characteristics.
- Explain k-Nearest Neighbour algorithm with example.
- What is Locally Weighted Regression?
- Explain Radial Basis Function Networks.
- Explain Case-Based Learning with steps.