Regression, Bayesian Learning & Support Vector Machine



REGRESSION

Regression is a machine learning technique used to predict continuous values, that is, numbers that can take any value within a range.

Examples

  • Predicting house price
  • Predicting temperature
  • Predicting sales revenue
  • Predicting stock price

In regression, we try to find the relationship between variables.

Example: Sales = f(Advertising Budget)

If advertising increases → sales may increase.

LINEAR REGRESSION

Linear regression is the simplest regression algorithm used to predict a numeric value based on the relationship between variables. It assumes that the relationship between variables is linear (straight line).

Mathematical Representation

y = mx + b

Symbol | Meaning
y | Predicted value
x | Input variable
m | Slope of the line
b | Intercept

Example: Suppose we want to predict house price based on size.

House Size (sq ft) | Price
800 | 20 lakh
1000 | 25 lakh
1200 | 30 lakh

The algorithm finds the best straight line that fits these data points.
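For a table this small, the best-fit line can be computed directly with the ordinary least-squares formulas; a minimal Python sketch using the data from the table above (the 1100 sq ft query is an illustrative extra):

```python
# Least-squares fit of price (lakh) against house size (sq ft),
# using the three data points from the table above.
sizes = [800, 1000, 1200]   # x: house size in sq ft
prices = [20, 25, 30]       # y: price in lakh

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Slope m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), intercept b = ȳ - m·x̄
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
den = sum((x - mean_x) ** 2 for x in sizes)
m = num / den
b = mean_y - m * mean_x

predicted = m * 1100 + b    # predict the price of an 1100 sq ft house
```

For this data the fitted line is price = 0.025 × size, so an 1100 sq ft house is predicted at 27.5 lakh.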

Applications

  • Sales forecasting
  • Stock market prediction
  • Weather forecasting
  • Business growth prediction

Advantages

  • Simple and easy to implement
  • Easy to interpret

Disadvantages

  • Works only when relationship is linear
  • Sensitive to outliers

LOGISTIC REGRESSION

Logistic regression is used when the output is categorical (classification) rather than continuous.

Example outputs:

  • Yes / No
  • True / False
  • Spam / Not Spam
  • Disease / No Disease

Even though it is called regression, it is mainly used for classification problems.

Logistic Function

P(y=1|x) = 1 / (1 + e^(−z))

Where:

z = wx + b

This formula converts any value into a probability between 0 and 1.
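A small Python sketch of the logistic function; the weight w, bias b, and input x below are assumed illustrative values, not taken from the text:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (assumed) model parameters and input.
w, b = 0.8, -1.0
x = 2.0

p = sigmoid(w * x + b)                    # z = wx + b
label = "Spam" if p > 0.5 else "Not Spam"
```

At z = 0 the function returns exactly 0.5, which is why 0.5 is the natural decision threshold.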

Example

Spam Detection:

If probability > 0.5 → Spam
Otherwise → Not Spam

Applications

  • Email spam detection
  • Disease prediction
  • Customer churn prediction
  • Credit risk analysis

BAYESIAN LEARNING

Bayesian learning is a machine learning method based on probability theory. It uses Bayes theorem to update the probability of a hypothesis when new data is available.

Example: If a patient has fever and cough, Bayesian learning helps predict the probability of flu or infection.

BAYES THEOREM

Bayes theorem is used to calculate the probability of an event based on prior knowledge.

P(A|B) = P(B|A) · P(A) / P(B)

where,

Term | Meaning
P(A|B) | Posterior probability (probability of A given B)
P(B|A) | Likelihood (probability of B given A)
P(A) | Prior probability
P(B) | Probability of evidence

Example

Disease prediction:

A = Person has disease
B = Person has symptoms

Bayes theorem helps calculate probability of disease given symptoms.
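The calculation on this example is a single application of the formula; a minimal Python sketch (all three input probabilities are assumed for illustration):

```python
# Bayes theorem on the disease example: P(A|B) = P(B|A) * P(A) / P(B).
# The probabilities below are assumed illustrative values.
p_disease = 0.01                  # P(A): prior probability of disease
p_symptoms_given_disease = 0.9    # P(B|A): likelihood of symptoms given disease
p_symptoms = 0.05                 # P(B): overall probability of symptoms

p_disease_given_symptoms = p_symptoms_given_disease * p_disease / p_symptoms
```

Here the posterior comes out to 0.18: the symptoms raise the probability of disease from 1% to 18%, but it is still far from certain.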

CONCEPT LEARNING

Concept learning means learning a general concept from specific examples. Example: We show the system several examples of birds.

Example | Features
Sparrow | Wings, feathers
Eagle | Wings, feathers
Parrot | Wings, feathers

The system learns the concept:

Bird = animal with wings and feathers
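This generalization step can be sketched as intersecting the feature sets of the positive examples (a simplified, Find-S-style illustration using the table above):

```python
# Concept learning sketch: keep only the features common to all
# positive examples of the concept "bird".
examples = {
    "Sparrow": {"wings", "feathers"},
    "Eagle":   {"wings", "feathers"},
    "Parrot":  {"wings", "feathers"},
}

# The learned concept is the intersection of all feature sets.
concept = set.intersection(*examples.values())
```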

Applications

  • Image recognition
  • Pattern recognition
  • Object detection

BAYES OPTIMAL CLASSIFIER

The Bayes Optimal Classifier is, in theory, the best possible classifier in machine learning.

It chooses the classification with the highest probability.

Idea

Given several hypotheses:

H1, H2, H3

The classifier weights each hypothesis by its posterior probability and selects the classification with the highest combined (posterior-weighted) probability.
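A Python sketch of this voting idea; the posteriors and per-hypothesis predictions are assumed illustrative values:

```python
# Bayes optimal classification sketch: each hypothesis h has a posterior
# P(h|D) and votes for a label; the label with the largest total
# posterior-weighted vote wins. All numbers are illustrative assumptions.
posteriors = {"H1": 0.4, "H2": 0.3, "H3": 0.3}
predictions = {"H1": "positive", "H2": "negative", "H3": "negative"}

scores = {}
for h, p in posteriors.items():
    label = predictions[h]
    scores[label] = scores.get(label, 0.0) + p

best_label = max(scores, key=scores.get)
```

Note that H1 is the single most probable hypothesis and predicts "positive", yet the combined vote picks "negative" (0.6 vs 0.4). This is what distinguishes the Bayes optimal classifier from simply choosing the best single hypothesis.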

Advantage

  • Produces minimum possible classification error

Limitation

  • Computationally expensive
  • Hard to implement for large datasets

NAÏVE BAYES CLASSIFIER

  • Naïve Bayes is a simple and powerful classification algorithm based on Bayes theorem.
  • It assumes that features are independent of each other.
  • This assumption is called the naïve assumption.

Example

Spam detection:

Email features:

  • Contains "offer"
  • Contains "free"
  • Contains "discount"

Naïve Bayes assumes these words independently affect spam probability.

Formula

It uses Bayes theorem together with the independence assumption:

P(Class | Features) ∝ P(Class) × P(Feature₁ | Class) × ... × P(Featureₙ | Class)
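The spam example above can be worked through as a minimal Python sketch; all the probabilities below are assumed illustrative values:

```python
# Tiny Naive Bayes spam sketch under the independence assumption:
# score(class) = P(class) * product of P(word | class) for observed words.
# All probabilities are assumed for illustration.
p_spam, p_ham = 0.4, 0.6
p_word_given_spam = {"offer": 0.6, "free": 0.7, "discount": 0.5}
p_word_given_ham  = {"offer": 0.1, "free": 0.2, "discount": 0.1}

words = ["offer", "free"]            # words observed in the email

score_spam, score_ham = p_spam, p_ham
for w in words:
    score_spam *= p_word_given_spam[w]
    score_ham *= p_word_given_ham[w]

# Normalize the two scores into a probability.
p_spam_given_words = score_spam / (score_spam + score_ham)
```

With these numbers the email is classified as spam with probability about 0.93, because the independent word likelihoods strongly favour the spam class.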

Applications

  • Email spam filtering
  • Sentiment analysis
  • Document classification
  • News categorization

Advantages

  • Fast and efficient
  • Works well with large datasets

Disadvantages

  • Assumes feature independence (not always realistic)

BAYESIAN BELIEF NETWORKS

Bayesian Belief Networks (BBN) are graphical models representing probabilistic relationships between variables.

Structure:

Nodes → Random variables
Edges → Dependency between variables

Example

Medical diagnosis system

Smoking → Lung Disease
Lung Disease → Cough

This network helps calculate probabilities of diseases.
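The chain Smoking → Lung Disease → Cough can be sketched as small conditional probability tables; all the numbers below are assumed illustrative values:

```python
# Bayesian belief network sketch for the chain
# Smoking -> Lung Disease -> Cough. Probabilities are illustrative.
p_smoking = 0.3
p_disease_given = {True: 0.2, False: 0.02}   # P(Lung Disease | Smoking)
p_cough_given   = {True: 0.8, False: 0.1}    # P(Cough | Lung Disease)

# Marginal P(Lung Disease): sum over both smoking states.
p_disease = (p_smoking * p_disease_given[True]
             + (1 - p_smoking) * p_disease_given[False])

# Marginal P(Cough): sum over both lung-disease states.
p_cough = (p_disease * p_cough_given[True]
           + (1 - p_disease) * p_cough_given[False])
```

Each marginal is obtained by summing over the parent variable's states, which is exactly how the network's edges encode dependency.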

Applications

  • Medical diagnosis
  • Risk analysis
  • Fraud detection
  • Decision support systems

EM ALGORITHM (Expectation-Maximization)

The EM algorithm is used when data has missing or hidden variables. It is commonly used in clustering and probabilistic models.

Two Main Steps

Step | Explanation
Expectation (E-Step) | Estimate the missing or hidden data using current parameters
Maximization (M-Step) | Update the parameters to maximize the likelihood

Process

  1. Start with initial parameters
  2. Estimate hidden variables
  3. Update model parameters
  4. Repeat until convergence
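The four steps above can be sketched with a tiny one-dimensional, two-component Gaussian mixture; the data, initial means, fixed variance, and fixed mixing weights are all illustrative simplifications:

```python
import math

# Minimal EM sketch for a two-component 1D Gaussian mixture.
# For simplicity the variance and mixing weights are held fixed and
# only the component means are updated. Data values are illustrative.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
mu = [0.0, 6.0]          # step 1: initial parameter guesses
sigma2 = 1.0             # fixed variance
weights = [0.5, 0.5]     # fixed mixing weights

def gauss(x, mean):
    """Gaussian density with mean `mean` and variance sigma2."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

for _ in range(20):      # step 4: repeat until (approximate) convergence
    # E-step (step 2): responsibility of each component for each point.
    resp = []
    for x in data:
        p = [weights[k] * gauss(x, mu[k]) for k in range(2)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    # M-step (step 3): each mean becomes a responsibility-weighted average.
    for k in range(2):
        num = sum(r[k] * x for r, x in zip(resp, data))
        den = sum(r[k] for r in resp)
        mu[k] = num / den
```

The means converge to roughly 1.0 and 5.0, the centers of the two clusters hidden in the data.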

Applications

  • Gaussian Mixture Models
  • Image segmentation
  • Speech recognition
  • Clustering

SUMMARY TABLE

Topic | Key Idea
Linear Regression | Predict continuous values using a straight line
Logistic Regression | Classification using probability
Bayes Theorem | Calculates probability using prior knowledge
Concept Learning | Learning a general concept from examples
Bayes Optimal Classifier | Best theoretical classifier
Naïve Bayes | Simple probabilistic classifier
Bayesian Belief Networks | Graphical probability models
EM Algorithm | Handles hidden variables

For Exams, the most important questions from this unit are:

  1. Explain Linear Regression with diagram.
  2. Explain Logistic Regression and its applications.
  3. Explain Bayes theorem with example.
  4. What is Naïve Bayes classifier?
  5. Explain Bayesian belief networks.
  6. Explain EM Algorithm with steps.

SUPPORT VECTOR MACHINE (SVM)

Introduction to Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used mainly for classification and sometimes regression problems. Its main goal is to separate data into different classes by finding the best boundary between them.

Example problems:

  • Spam email detection
  • Image classification (cat vs dog)
  • Face recognition
  • Medical diagnosis

The boundary that separates the classes is called a Hyperplane.

Example: Suppose we have two types of data points:

  • Class A
  • Class B

SVM tries to draw a line (in 2D) or plane (in higher dimensions) that separates these classes with the maximum margin.

Hyperplane (Decision Surface)

A hyperplane is a boundary that separates data into different classes.

For 2D data → hyperplane is a line
For 3D data → hyperplane is a plane
For higher dimensions → hyperplane is a decision surface

w · x + b = 0

Where:

Symbol | Meaning
w | Weight vector
x | Input data point
b | Bias

Key Concept: Margin

  • Margin = distance between the hyperplane and the nearest data points.
  • SVM tries to maximize this margin.
  • The nearest data points are called Support Vectors.
  • These support vectors determine the position of the hyperplane.
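A small Python sketch of the hyperplane and margin idea in 2D; the weight vector w and bias b are assumed illustrative values:

```python
# Hyperplane sketch in 2D: classify a point by the sign of w·x + b,
# and measure its distance from the hyperplane. w and b are assumed values.
w = [1.0, -1.0]
b = 0.0

def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def distance(x):
    """Distance of x from the hyperplane: |w·x + b| / ||w||.
    The smallest such distance over the training points is the margin,
    and the points achieving it are the support vectors."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = sum(wi * wi for wi in w) ** 0.5
    return abs(score) / norm
```

For example, the point (3, 1) falls on the +1 side at distance √2 from the line, while (1, 3) falls on the −1 side.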

Types of Support Vector Kernels

Sometimes data cannot be separated using a straight line.
In such cases, SVM uses kernel functions to transform data into a higher dimension.

Common kernels include:

Kernel Type | Purpose
Linear Kernel | For linearly separable data
Polynomial Kernel | For curved decision boundaries
Gaussian (RBF) Kernel | For complex patterns

Linear Kernel

Linear kernel is used when the dataset can be separated by a straight line.

Example:

Class A and Class B are easily separable.

Formula:

K(xi, xj) = xi · xj

Where:

xi and xj are input vectors.

Applications

  • Text classification
  • Spam detection
  • Document categorization

Advantages:

  • Simple
  • Fast computation

Polynomial Kernel

Polynomial kernel is used when data has a curved relationship.

Formula:

K(xi, xj) = (xi · xj + c)^d

Where:

Parameter | Meaning
c | Constant
d | Degree of the polynomial

This kernel allows SVM to create curved decision boundaries.

Applications:

  • Image processing
  • Natural language processing

Gaussian Kernel (RBF Kernel)

Gaussian kernel is also called Radial Basis Function (RBF) kernel.

It is the most commonly used kernel because it can handle complex nonlinear data.

Formula:

K(xi, xj) = e^(−γ ||xi − xj||²)

Where:

Symbol | Meaning
γ (gamma) | Kernel parameter
||xi − xj|| | Distance between the points

This kernel maps data into very high dimensional space.
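The three kernel formulas above can be compared side by side in a short Python sketch; the values of c, d, and gamma are assumed illustrative hyperparameters:

```python
import math

# The three SVM kernels from the text, applied to a pair of 2D points.
# c, d, and gamma are assumed illustrative hyperparameter values.
def linear_kernel(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))            # xi · xj

def polynomial_kernel(xi, xj, c=1.0, d=2):
    return (linear_kernel(xi, xj) + c) ** d              # (xi · xj + c)^d

def rbf_kernel(xi, xj, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))  # ||xi - xj||²
    return math.exp(-gamma * sq_dist)

xi, xj = [1.0, 2.0], [2.0, 0.0]
lin = linear_kernel(xi, xj)        # 2.0
poly = polynomial_kernel(xi, xj)   # (2 + 1)^2 = 9.0
rbf = rbf_kernel(xi, xj)           # exp(-0.5 * 5)
```

Each kernel returns a single similarity number for the pair of points; the SVM only ever needs these pairwise values, which is why the kernel trick avoids computing the high-dimensional mapping explicitly.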

Applications:

  • Face recognition
  • Bioinformatics
  • Handwriting recognition

Properties of Support Vector Machine

SVM has several important characteristics.

Property | Explanation
Maximum Margin | SVM finds the boundary with the largest margin
Robust to Overfitting | Works well with high-dimensional data
Uses Support Vectors | Only the nearest data points affect the model
Kernel Trick | Handles nonlinear data using kernels
Effective in High Dimensions | Works well when the number of features is large

Advantages of SVM

  1. High accuracy
  2. Effective in high dimensional spaces
  3. Works well with small datasets
  4. Memory efficient
  5. Can handle nonlinear classification

Issues in Support Vector Machine

Despite many advantages, SVM also has some limitations.

Issue | Explanation
High Training Time | Training can be slow for large datasets
Kernel Selection | Choosing the correct kernel is difficult
Parameter Tuning | Parameters like C and gamma must be optimized
Not Suitable for Very Large Datasets | Computationally expensive
Difficult Interpretation | The model is harder to interpret than decision trees

Real Life Applications of SVM

Field | Application
Healthcare | Disease diagnosis
Finance | Credit risk analysis
Cybersecurity | Malware detection
Image Processing | Object recognition
Marketing | Customer segmentation

Summary

Support Vector Machine is a powerful supervised learning algorithm used for classification and regression. It works by finding the optimal hyperplane that maximizes the margin between different classes. With the help of kernel functions, SVM can handle both linear and nonlinear datasets. Although it provides high accuracy, it requires proper kernel selection and parameter tuning.

Important Exam Questions (MCA)

  1. Explain Support Vector Machine with diagram.
  2. What is a hyperplane in SVM?
  3. Explain types of kernel functions in SVM.
  4. Write advantages and disadvantages of SVM.
  5. Explain Gaussian kernel and polynomial kernel.