DECISION TREE LEARNING & INSTANCE-BASED LEARNING



Introduction to Decision Tree Learning

Decision Tree Learning is a supervised machine learning method used for classification and prediction. It represents decisions in a tree-like structure where:

  • Nodes represent tests or conditions
  • Branches represent possible outcomes
  • Leaf nodes represent final decisions or classes

Example: Suppose we want to predict whether a person will buy a product.

Decision tree example:

Age > 30?
├── Yes → Income > 50k?
│         ├── Yes → Buy Product
│         └── No → No Buy
└── No → No Buy

Components of Decision Tree

Component     | Meaning
Root Node     | Starting point of the tree
Internal Node | Decision condition
Branch        | Outcome of a condition
Leaf Node     | Final result or class

Applications

  • Medical diagnosis
  • Customer behavior prediction
  • Fraud detection
  • Credit risk analysis

Decision Tree Learning Algorithm

The decision tree learning algorithm builds a tree by recursively dividing the dataset based on attributes.

Steps of the Algorithm

  1. Start with the entire dataset as the root node.
  2. Select the best attribute that splits the data.
  3. Divide the dataset into subsets based on attribute values.
  4. Create child nodes.
  5. Repeat the process until:
    • All data belongs to same class
    • No attributes remain

This process is called recursive partitioning.
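The steps above can be sketched in Python. This is a minimal illustration of recursive partitioning on a toy "buy a product" dataset; the attribute-selection rule here (take the first remaining attribute) is only a placeholder, since the actual selection criterion (information gain) is covered later.

```python
def partition(rows, attr):
    """Group rows into subsets keyed by the value of one attribute."""
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row)
    return subsets

def build_tree(rows, attrs, label="label"):
    labels = {row[label] for row in rows}
    if len(labels) == 1:          # stop: all data belongs to the same class
        return labels.pop()
    if not attrs:                 # stop: no attributes remain -> majority class
        return max(labels, key=lambda c: sum(r[label] == c for r in rows))
    attr = attrs[0]               # placeholder selection rule (not information gain)
    rest = attrs[1:]
    return {attr: {v: build_tree(sub, rest, label)
                   for v, sub in partition(rows, attr).items()}}

# Toy dataset matching the Age/Income example; values are illustrative only.
data = [
    {"Age>30": "Yes", "Income>50k": "Yes", "label": "Buy"},
    {"Age>30": "Yes", "Income>50k": "No",  "label": "No Buy"},
    {"Age>30": "No",  "Income>50k": "Yes", "label": "No Buy"},
]
tree = build_tree(data, ["Age>30", "Income>50k"])
print(tree)
```

Each internal node becomes a nested dictionary keyed by attribute values, and each leaf is a class label.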

Inductive Bias

Inductive bias refers to the assumptions made by a learning algorithm to generalize from training data.

In simple words: When a machine learning model learns from limited examples, it assumes certain patterns to predict unseen data.

Example: Suppose a decision tree learns from its training data that birds with wings and feathers can fly. When it encounters a new bird it has never seen before, it assumes the bird can fly, because the bird matches the learned pattern.

Importance

Inductive bias helps the model:

  • Make predictions on unseen data
  • Avoid overfitting
  • Learn general patterns

Inductive Inference with Decision Trees

Inductive inference means drawing general conclusions from specific examples.

Decision trees perform inductive inference by:

  1. Studying training examples
  2. Finding patterns
  3. Creating decision rules

Example

Training Data:

Weather  | Play Cricket
Sunny    | No
Rainy    | Yes
Overcast | Yes

The decision tree may infer the rules:

If Weather = Sunny → No Play
If Weather = Rainy → Play
If Weather = Overcast → Play

This rule is generalized for future predictions.

Entropy (Information Theory)

Entropy measures the uncertainty or impurity in data.

If the data is mixed, entropy is high.
If the data is pure, entropy is low.

Entropy Formula

Entropy(S) = - \sum_{i=1}^{c} p_i \log_2 p_i

Where:

Symbol | Meaning
S      | Dataset
pᵢ     | Probability of class i
c      | Number of classes

Example

Dataset:

Class | Count
Yes   | 9
No    | 5

For this dataset: Entropy(S) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) ≈ 0.940.

If both classes are equally distributed → entropy is at its maximum (1 for two classes).

If only one class exists → entropy = 0.
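The 9-Yes/5-No example can be checked numerically. A minimal sketch in Python; the helper name `entropy` is ours:

```python
import math

def entropy(counts):
    """Entropy(S) = -sum(p_i * log2(p_i)) over nonzero class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # mixed classes (9 Yes, 5 No): about 0.94
print(entropy([7, 7]))            # equal split: maximum entropy for 2 classes, 1.0
print(entropy([14]))              # one class only: 0.0
```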

Information Gain

Information Gain measures how much information an attribute gives about the class. It helps determine which attribute should be used to split the dataset.

Formula

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

Where:

Symbol | Meaning
S      | Dataset
A      | Attribute
Sᵥ     | Subset of S where attribute A has value v

Interpretation

Higher information gain → better attribute for splitting.

Example attributes:

  • Age
  • Income
  • Education

The attribute with highest information gain becomes the root node.
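A small sketch of the gain formula, applied to the Weather/Play table from the earlier inductive-inference example (function names are ours):

```python
import math

def entropy(labels):
    total = len(labels)
    return sum(-(labels.count(c) / total) * math.log2(labels.count(c) / total)
               for c in set(labels))

def info_gain(rows, attr, target):
    """Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)."""
    gain = entropy([r[target] for r in rows])
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

rows = [
    {"Weather": "Sunny",    "Play": "No"},
    {"Weather": "Rainy",    "Play": "Yes"},
    {"Weather": "Overcast", "Play": "Yes"},
]
# Each weather value yields a pure subset, so splitting on Weather removes
# all uncertainty: the gain equals Entropy(S).
print(round(info_gain(rows, "Weather", "Play"), 3))  # about 0.918
```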

ID3 Algorithm

ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm developed by Ross Quinlan.

It builds the decision tree using information gain.

Steps of ID3 Algorithm

  1. Calculate entropy of dataset.
  2. Compute information gain for each attribute.
  3. Choose attribute with highest information gain.
  4. Create node and split dataset.
  5. Repeat process for each subset.
  6. Stop when:
    • All examples belong to same class
    • No attributes remain
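The six steps above can be combined into a compact ID3 sketch. The toy dataset reuses the earlier Weather/Play example; the structure and function names are our illustration, not Quinlan's original code.

```python
import math

def entropy(labels):
    total = len(labels)
    return sum(-(labels.count(c) / total) * math.log2(labels.count(c) / total)
               for c in set(labels))

def info_gain(rows, attr, target):
    gain = entropy([r[target] for r in rows])          # step 1: dataset entropy
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain                                        # step 2: gain per attribute

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                          # stop: all one class
        return labels[0]
    if not attrs:                                      # stop: no attributes remain
        return max(set(labels), key=labels.count)
    best = max(attrs, key=lambda a: info_gain(rows, a, target))  # step 3
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}  # steps 4-5: split, recurse

rows = [
    {"Weather": "Sunny",    "Play": "No"},
    {"Weather": "Rainy",    "Play": "Yes"},
    {"Weather": "Overcast", "Play": "Yes"},
]
print(id3(rows, ["Weather"], "Play"))
```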

Advantages of ID3

  • Simple to implement
  • Works well with categorical data
  • Easy to interpret

Limitation

  • Cannot handle continuous data directly
  • Prone to overfitting

Issues in Decision Tree Learning

Decision tree algorithms face several challenges.

Issue | Explanation
Overfitting | Tree becomes too complex, fits training data perfectly but fails on new data
Handling Continuous Attributes | Decision trees work better with categorical data
Missing Values | Difficult to process incomplete data
Bias toward Many-Valued Attributes | Attributes with more values may appear more informative
Tree Complexity | Very large trees are difficult to interpret

Solutions

  • Pruning (reducing tree size)
  • Handling missing values
  • Using improved algorithms like:
    • C4.5
    • CART

Summary Table

Topic | Key Idea
Decision Tree | Tree structure for classification
Inductive Bias | Assumptions for generalization
Inductive Inference | Learning general rules from examples
Entropy | Measure of data impurity
Information Gain | Measures usefulness of an attribute
ID3 Algorithm | Builds tree using information gain
Issues | Overfitting, missing data, complexity

Important MCA Exam Questions

  1. Explain Decision Tree Learning with diagram.
  2. Define Entropy and Information Gain.
  3. Explain ID3 Algorithm with steps.
  4. What is Inductive Bias in Machine Learning?
  5. Explain Issues in Decision Tree Learning.

INSTANCE-BASED LEARNING

Instance-Based Learning (IBL) is a type of machine learning method where the model learns by storing training examples and comparing new data with those examples.

Instead of creating a general model, the system remembers past instances (examples) and uses them to make predictions.

In simple words: Learning happens by comparing new cases with previously stored cases.

Example: Suppose a system stores medical records of patients.

When a new patient arrives, the system compares the new patient with similar previous patients to predict disease.

Key Idea

Prediction is based on similarity between instances.

Characteristics

Feature | Explanation
Lazy Learning | Model waits until prediction time
Memory-Based | Stores training examples
Similarity Measure | Uses distance measures like Euclidean distance

k-Nearest Neighbour (k-NN) Learning

k-Nearest Neighbour (k-NN) is the most popular instance-based learning algorithm.

It classifies a new data point by looking at the k nearest data points in the training dataset.

How k-NN Works

Steps:

  1. Choose the value of k (number of neighbors).
  2. Calculate the distance between new data and all training data.
  3. Select the k closest neighbors.
  4. Assign the class that appears most frequently among those neighbors.

Distance Formula (Euclidean Distance)

d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Where:

Symbol | Meaning
x      | New data point
y      | Training data point
n      | Number of features

Example

Suppose we classify fruit type using features:

  • Weight
  • Color

If most neighbors are Apples, the new fruit is classified as Apple.
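The fruit example can be sketched directly from the four steps above. The feature values (weight in grams, a colour score) are illustrative assumptions:

```python
import math
from collections import Counter

def euclidean(x, y):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(train, query, k=3):
    """train: list of (features, label) pairs; return majority label of k nearest."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy fruit data: (weight, colour score) -> label. Values are made up.
train = [
    ((150, 0.8), "Apple"), ((170, 0.7), "Apple"), ((160, 0.9), "Apple"),
    ((120, 0.3), "Orange"), ((110, 0.2), "Orange"),
]
print(knn_classify(train, (155, 0.75), k=3))  # nearest neighbours are apples
```

Note there is no training step: all work happens at prediction time, which is why k-NN is called a lazy learner.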

Advantages

  • Simple to implement
  • No training phase
  • Works well with small datasets

Disadvantages

  • Slow for large datasets
  • Sensitive to irrelevant features
  • Requires large memory

Locally Weighted Regression (LWR)

Locally Weighted Regression is a type of regression algorithm where predictions are made using nearby data points.

Instead of fitting a single global model, LWR creates local models for each prediction.

Idea

Points closer to the query point receive higher importance (weight).

Example: Suppose we want to predict house prices.

A house price prediction depends more on nearby houses rather than houses in another city.

So the algorithm gives more weight to closer houses.
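A minimal one-dimensional sketch of this idea, fitting a weighted straight line around each query point. The Gaussian kernel and the bandwidth parameter `tau` are common choices assumed here, not taken from the text:

```python
import math

def lwr_predict(xs, ys, x_query, tau=1.0):
    """Locally weighted linear regression: fit a line near x_query and evaluate it."""
    # Gaussian kernel: points near the query get weights close to 1.
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw   = sum(w)
    swx  = sum(wi * x for wi, x in zip(w, xs))
    swy  = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Solve the 2x2 weighted least-squares normal equations for slope b, intercept a.
    b = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a = (swy - b * swx) / sw
    return a + b * x_query

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]               # points on the line y = 2x + 1
print(round(lwr_predict(xs, ys, 2.5), 6))  # recovers the line: 6.0
```

Because the model is rebuilt for every query, LWR is lazy in the same sense as k-NN.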

Key Characteristics

Feature | Explanation
Local Model | Model built around nearby data
Weighted Data | Nearby points have higher weights
Lazy Learning | Model built during prediction

Applications

  • Economic forecasting
  • Demand prediction
  • Time-series analysis

Radial Basis Function (RBF) Networks

Radial Basis Function Networks are a type of Artificial Neural Network used for function approximation and classification.

They use radial basis functions as activation functions.

Structure of RBF Network

An RBF network has three layers:

  1. Input Layer
  2. Hidden Layer (Radial basis functions)
  3. Output Layer

Radial Basis Function

\phi(x) = e^{-\gamma \|x - c\|^2}

Where:

Symbol | Meaning
x      | Input vector
c      | Center
γ      | Parameter controlling the spread

The function gives higher values for points closer to the center.
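A direct sketch of the basis function itself (the function name and test points are ours):

```python
import math

def rbf(x, c, gamma=1.0):
    """phi(x) = exp(-gamma * ||x - c||^2); peaks at 1.0 when x equals the center c."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-gamma * sq_dist)

center = (0.0, 0.0)
print(rbf((0.0, 0.0), center))  # at the center: 1.0
print(rbf((1.0, 0.0), center))  # one unit away: exp(-1), about 0.368
```

In a full RBF network, each hidden unit computes one such phi around its own center, and the output layer takes a weighted sum of these activations.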

Applications

  • Pattern recognition
  • Speech recognition
  • Image processing
  • Function approximation

Advantages

  • Fast learning
  • Good for nonlinear problems

Case-Based Learning

Case-Based Learning (CBL) is a method where the system solves new problems using solutions from similar past cases.

It is widely used in expert systems.

Process of Case-Based Learning

The system follows four steps:

Step     | Explanation
Retrieve | Find similar past cases
Reuse    | Apply solution from past case
Revise   | Modify solution if necessary
Retain   | Store new case for future use

Example: Legal system: A lawyer solves a case by studying similar past court cases.

Similarly, case-based learning uses previous cases to solve new problems.
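The four-step cycle can be sketched on a toy medical example. The similarity measure here (count of shared symptoms) and all case data are illustrative assumptions; real systems use much richer representations:

```python
def similarity(case_a, case_b):
    """Toy similarity: number of symptoms the two cases share."""
    return len(set(case_a["symptoms"]) & set(case_b["symptoms"]))

def solve(case_base, new_case):
    best = max(case_base, key=lambda c: similarity(c, new_case))  # Retrieve
    solution = best["diagnosis"]                                  # Reuse
    new_case["diagnosis"] = solution                              # Revise would adjust it here
    case_base.append(new_case)                                    # Retain for future queries
    return solution

case_base = [
    {"symptoms": ["fever", "cough"], "diagnosis": "flu"},
    {"symptoms": ["rash", "itch"],   "diagnosis": "allergy"},
]
print(solve(case_base, {"symptoms": ["fever", "cough", "fatigue"]}))  # flu
```

Note that the case base grows with every solved problem, which is exactly the Retain step.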

Applications

  • Medical diagnosis systems
  • Customer support systems
  • Legal reasoning systems
  • Help desk systems

Comparison of Instance-Based Learning Methods

Method | Purpose
k-NN | Classification based on nearest neighbors
Locally Weighted Regression | Local regression prediction
RBF Networks | Neural network using radial basis functions
Case-Based Learning | Solve problems using past cases

Advantages of Instance-Based Learning

  1. Simple concept
  2. No training phase required
  3. Adapts quickly to new data
  4. Effective for pattern recognition

Limitations of Instance-Based Learning

Problem | Explanation
High Memory Requirement | Stores the entire dataset
Slow Prediction | Needs to compare many instances
Sensitive to Noise | Outliers affect results
Curse of Dimensionality | Performance decreases with many features

Summary

Instance-Based Learning is a memory-based machine learning approach where predictions are made by comparing new instances with stored examples. Algorithms like k-NN, Locally Weighted Regression, RBF networks, and Case-Based Learning rely on similarity measures to make predictions. These methods are simple and flexible but can require large storage and computational resources.

Important MCA Exam Questions

  1. Explain Instance-Based Learning with characteristics.
  2. Explain k-Nearest Neighbour algorithm with example.
  3. What is Locally Weighted Regression?
  4. Explain Radial Basis Function Networks.
  5. Explain Case-Based Learning with steps.