Regression, Bayesian Learning & Support Vector Machine



REGRESSION

Regression is a machine learning technique used to predict continuous values, that is, numbers that can take any value within a range.

Examples

  • Predicting house price
  • Predicting temperature
  • Predicting sales revenue
  • Predicting stock price

In regression, we try to find the relationship between variables.

Example: Sales = f(Advertising Budget)

If advertising increases → sales may increase.

LINEAR REGRESSION

Linear regression is the simplest regression algorithm used to predict a numeric value based on the relationship between variables. It assumes that the relationship between variables is linear (straight line).

Mathematical Representation

y = mx + b

Symbol | Meaning
y | Predicted value
x | Input variable
m | Slope of the line
b | Intercept

Example: Suppose we want to predict house price based on size.

House Size (sq ft) | Price
800 | 20 lakh
1000 | 25 lakh
1200 | 30 lakh

The algorithm finds the best straight line that fits these data points.
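For a table this small, the best-fit line can be computed directly with the ordinary least-squares formulas; a minimal Python sketch using the data from the table above (the 1100 sq ft query is an illustrative extra):

```python
# Least-squares fit of price (lakh) against house size (sq ft),
# using the three data points from the table above.
sizes = [800, 1000, 1200]   # x: house size in sq ft
prices = [20, 25, 30]       # y: price in lakh

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Slope m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), intercept b = ȳ - m·x̄
num = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
den = sum((x - mean_x) ** 2 for x in sizes)
m = num / den
b = mean_y - m * mean_x

predicted = m * 1100 + b    # predict the price of an 1100 sq ft house
```

For this data the fitted line is price = 0.025 × size, so an 1100 sq ft house is predicted at 27.5 lakh.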

Applications

  • Sales forecasting
  • Stock market prediction
  • Weather forecasting
  • Business growth prediction

Advantages

  • Simple and easy to implement
  • Easy to interpret

Disadvantages

  • Works only when relationship is linear
  • Sensitive to outliers

LOGISTIC REGRESSION

Logistic regression is used when the output is categorical (classification) rather than continuous.

Example outputs:

  • Yes / No
  • True / False
  • Spam / Not Spam
  • Disease / No Disease

Even though it is called regression, it is mainly used for classification problems.

Logistic Function

P(y=1|x) = 1 / (1 + e^(−z))

Where:

z = wx + b

This formula converts any value into a probability between 0 and 1.
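A small Python sketch of the logistic function; the weight w, bias b, and input x below are assumed illustrative values, not taken from the text:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (assumed) model parameters and input.
w, b = 0.8, -1.0
x = 2.0

p = sigmoid(w * x + b)                    # z = wx + b
label = "Spam" if p > 0.5 else "Not Spam"
```

At z = 0 the function returns exactly 0.5, which is why 0.5 is the natural decision threshold.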

Example

Spam Detection:

If probability > 0.5 → Spam
Otherwise → Not Spam

Applications

  • Email spam detection
  • Disease prediction
  • Customer churn prediction
  • Credit risk analysis

BAYESIAN LEARNING

Bayesian learning is a machine learning method based on probability theory. It uses Bayes theorem to update the probability of a hypothesis when new data is available.

Example: If a patient has fever and cough, Bayesian learning helps predict the probability of flu or infection.

BAYES THEOREM

Bayes theorem is used to calculate the probability of an event based on prior knowledge.

P(A|B) = P(B|A) · P(A) / P(B)

where,

Term | Meaning
P(A|B) | Posterior probability (probability of A given B)
P(B|A) | Likelihood (probability of B given A)
P(A) | Prior probability
P(B) | Probability of evidence

Example

Disease prediction:

A = Person has disease
B = Person has symptoms

Bayes theorem helps calculate probability of disease given symptoms.
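The calculation on this example is a single application of the formula; a minimal Python sketch (all three input probabilities are assumed for illustration):

```python
# Bayes theorem on the disease example: P(A|B) = P(B|A) * P(A) / P(B).
# The probabilities below are assumed illustrative values.
p_disease = 0.01                  # P(A): prior probability of disease
p_symptoms_given_disease = 0.9    # P(B|A): likelihood of symptoms given disease
p_symptoms = 0.05                 # P(B): overall probability of symptoms

p_disease_given_symptoms = p_symptoms_given_disease * p_disease / p_symptoms
```

Here the posterior comes out to 0.18: the symptoms raise the probability of disease from 1% to 18%, but it is still far from certain.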

CONCEPT LEARNING

Concept learning means learning a general concept from specific examples. Example: We show the system several examples of birds.

Example | Features
Sparrow | Wings, feathers
Eagle | Wings, feathers
Parrot | Wings, feathers

The system learns the concept:

Bird = animal with wings and feathers
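This generalization step can be sketched as intersecting the feature sets of the positive examples (a simplified, Find-S-style illustration using the table above):

```python
# Concept learning sketch: keep only the features common to all
# positive examples of the concept "bird".
examples = {
    "Sparrow": {"wings", "feathers"},
    "Eagle":   {"wings", "feathers"},
    "Parrot":  {"wings", "feathers"},
}

# The learned concept is the intersection of all feature sets.
concept = set.intersection(*examples.values())
```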

Applications

  • Image recognition
  • Pattern recognition
  • Object detection

BAYES OPTIMAL CLASSIFIER

The Bayes Optimal Classifier is, in theory, the best possible classifier in machine learning.

It chooses the classification with the highest probability.

Idea

Given several hypotheses:

H1, H2, H3

The classifier weights each hypothesis by its posterior probability and selects the classification with the highest combined (posterior-weighted) probability.
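A Python sketch of this voting idea; the posteriors and per-hypothesis predictions are assumed illustrative values:

```python
# Bayes optimal classification sketch: each hypothesis h has a posterior
# P(h|D) and votes for a label; the label with the largest total
# posterior-weighted vote wins. All numbers are illustrative assumptions.
posteriors = {"H1": 0.4, "H2": 0.3, "H3": 0.3}
predictions = {"H1": "positive", "H2": "negative", "H3": "negative"}

scores = {}
for h, p in posteriors.items():
    label = predictions[h]
    scores[label] = scores.get(label, 0.0) + p

best_label = max(scores, key=scores.get)
```

Note that H1 is the single most probable hypothesis and predicts "positive", yet the combined vote picks "negative" (0.6 vs 0.4). This is what distinguishes the Bayes optimal classifier from simply choosing the best single hypothesis.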

Advantage

  • Produces minimum possible classification error

Limitation

  • Computationally expensive
  • Hard to implement for large datasets

NAÏVE BAYES CLASSIFIER

  • Naïve Bayes is a simple and powerful classification algorithm based on Bayes theorem.
  • It assumes that features are independent of each other.
  • This assumption is called the naïve assumption.

Example

Spam detection:

Email features:

  • Contains "offer"
  • Contains "free"
  • Contains "discount"

Naïve Bayes assumes these words independently affect spam probability.

Formula

It uses Bayes theorem together with the independence assumption:

P(Class | Features) ∝ P(Class) × P(Feature₁ | Class) × ... × P(Featureₙ | Class)
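The spam example above can be worked through as a minimal Python sketch; all the probabilities below are assumed illustrative values:

```python
# Tiny Naive Bayes spam sketch under the independence assumption:
# score(class) = P(class) * product of P(word | class) for observed words.
# All probabilities are assumed for illustration.
p_spam, p_ham = 0.4, 0.6
p_word_given_spam = {"offer": 0.6, "free": 0.7, "discount": 0.5}
p_word_given_ham  = {"offer": 0.1, "free": 0.2, "discount": 0.1}

words = ["offer", "free"]            # words observed in the email

score_spam, score_ham = p_spam, p_ham
for w in words:
    score_spam *= p_word_given_spam[w]
    score_ham *= p_word_given_ham[w]

# Normalize the two scores into a probability.
p_spam_given_words = score_spam / (score_spam + score_ham)
```

With these numbers the email is classified as spam with probability about 0.93, because the independent word likelihoods strongly favour the spam class.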

Applications

  • Email spam filtering
  • Sentiment analysis
  • Document classification
  • News categorization

Advantages

  • Fast and efficient
  • Works well with large datasets

Disadvantages

  • Assumes feature independence (not always realistic)

BAYESIAN BELIEF NETWORKS

Bayesian Belief Networks (BBN) are graphical models representing probabilistic relationships between variables.

Structure:

Nodes → Random variables
Edges → Dependency between variables

Example

Medical diagnosis system

Smoking → Lung Disease
Lung Disease → Cough

This network helps calculate probabilities of diseases.
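The chain Smoking → Lung Disease → Cough can be sketched as small conditional probability tables; all the numbers below are assumed illustrative values:

```python
# Bayesian belief network sketch for the chain
# Smoking -> Lung Disease -> Cough. Probabilities are illustrative.
p_smoking = 0.3
p_disease_given = {True: 0.2, False: 0.02}   # P(Lung Disease | Smoking)
p_cough_given   = {True: 0.8, False: 0.1}    # P(Cough | Lung Disease)

# Marginal P(Lung Disease): sum over both smoking states.
p_disease = (p_smoking * p_disease_given[True]
             + (1 - p_smoking) * p_disease_given[False])

# Marginal P(Cough): sum over both lung-disease states.
p_cough = (p_disease * p_cough_given[True]
           + (1 - p_disease) * p_cough_given[False])
```

Each marginal is obtained by summing over the parent variable's states, which is exactly how the network's edges encode dependency.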

Applications

  • Medical diagnosis
  • Risk analysis
  • Fraud detection
  • Decision support systems

EM ALGORITHM (Expectation-Maximization)

The EM algorithm is used when data has missing or hidden variables. It is commonly used in clustering and probabilistic models.

Two Main Steps

Step | Explanation
Expectation (E-Step) | Estimate the missing or hidden data using current parameters
Maximization (M-Step) | Update the parameters to maximize the likelihood

Process

  1. Start with initial parameters
  2. Estimate hidden variables
  3. Update model parameters
  4. Repeat until convergence
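The four steps above can be sketched with a tiny one-dimensional, two-component Gaussian mixture; the data, initial means, fixed variance, and fixed mixing weights are all illustrative simplifications:

```python
import math

# Minimal EM sketch for a two-component 1D Gaussian mixture.
# For simplicity the variance and mixing weights are held fixed and
# only the component means are updated. Data values are illustrative.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
mu = [0.0, 6.0]          # step 1: initial parameter guesses
sigma2 = 1.0             # fixed variance
weights = [0.5, 0.5]     # fixed mixing weights

def gauss(x, mean):
    """Gaussian density with mean `mean` and variance sigma2."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

for _ in range(20):      # step 4: repeat until (approximate) convergence
    # E-step (step 2): responsibility of each component for each point.
    resp = []
    for x in data:
        p = [weights[k] * gauss(x, mu[k]) for k in range(2)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    # M-step (step 3): each mean becomes a responsibility-weighted average.
    for k in range(2):
        num = sum(r[k] * x for r, x in zip(resp, data))
        den = sum(r[k] for r in resp)
        mu[k] = num / den
```

The means converge to roughly 1.0 and 5.0, the centers of the two clusters hidden in the data.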

Applications

  • Gaussian Mixture Models
  • Image segmentation
  • Speech recognition
  • Clustering

SUMMARY TABLE

Topic | Key Idea
Linear Regression | Predict continuous values using a straight line
Logistic Regression | Classification using probability
Bayes Theorem | Calculates probability using prior knowledge
Concept Learning | Learning a general concept from examples
Bayes Optimal Classifier | Best theoretical classifier
Naïve Bayes | Simple probabilistic classifier
Bayesian Belief Networks | Graphical probability models
EM Algorithm | Handles hidden variables

For Exams, the most important questions from this unit are:

  1. Explain Linear Regression with diagram.
  2. Explain Logistic Regression and its applications.
  3. Explain Bayes theorem with example.
  4. What is Naïve Bayes classifier?
  5. Explain Bayesian belief networks.
  6. Explain EM Algorithm with steps.

SUPPORT VECTOR MACHINE (SVM)

Introduction to Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used mainly for classification and sometimes regression problems. Its main goal is to separate data into different classes by finding the best boundary between them.

Example problems:

  • Spam email detection
  • Image classification (cat vs dog)
  • Face recognition
  • Medical diagnosis

The boundary that separates the classes is called a Hyperplane.

Example: Suppose we have two types of data points:

  • Class A
  • Class B

SVM tries to draw a line (in 2D) or plane (in higher dimensions) that separates these classes with the maximum margin.

Hyperplane (Decision Surface)

A hyperplane is a boundary that separates data into different classes.

For 2D data → hyperplane is a line
For 3D data → hyperplane is a plane
For higher dimensions → hyperplane is a decision surface

w · x + b = 0

Where:

Symbol | Meaning
w | Weight vector
x | Input data point
b | Bias

Key Concept: Margin

  • Margin = distance between the hyperplane and the nearest data points.
  • SVM tries to maximize this margin.
  • The nearest data points are called Support Vectors.
  • These support vectors determine the position of the hyperplane.
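A small Python sketch of the hyperplane and margin idea in 2D; the weight vector w and bias b are assumed illustrative values:

```python
# Hyperplane sketch in 2D: classify a point by the sign of w·x + b,
# and measure its distance from the hyperplane. w and b are assumed values.
w = [1.0, -1.0]
b = 0.0

def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def distance(x):
    """Distance of x from the hyperplane: |w·x + b| / ||w||.
    The smallest such distance over the training points is the margin,
    and the points achieving it are the support vectors."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = sum(wi * wi for wi in w) ** 0.5
    return abs(score) / norm
```

For example, the point (3, 1) falls on the +1 side at distance √2 from the line, while (1, 3) falls on the −1 side.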

Types of Support Vector Kernels

Sometimes data cannot be separated using a straight line.
In such cases, SVM uses kernel functions to transform data into a higher dimension.

Common kernels include:

Kernel Type | Purpose
Linear Kernel | For linearly separable data
Polynomial Kernel | For curved decision boundaries
Gaussian (RBF) Kernel | For complex patterns

Linear Kernel

Linear kernel is used when the dataset can be separated by a straight line.

Example:

Class A and Class B are easily separable.

Formula:

K(xi, xj) = xi · xj

Where:

xi and xj are input vectors.

Applications

  • Text classification
  • Spam detection
  • Document categorization

Advantages:

  • Simple
  • Fast computation

Polynomial Kernel

Polynomial kernel is used when data has a curved relationship.

Formula:

K(xi, xj) = (xi · xj + c)^d

Where:

Parameter | Meaning
c | Constant
d | Degree of the polynomial

This kernel allows SVM to create curved decision boundaries.

Applications:

  • Image processing
  • Natural language processing

Gaussian Kernel (RBF Kernel)

Gaussian kernel is also called Radial Basis Function (RBF) kernel.

It is the most commonly used kernel because it can handle complex nonlinear data.

Formula:

K(xi, xj) = e^(−γ ||xi − xj||²)

Where:

Symbol | Meaning
γ (gamma) | Kernel parameter
||xi − xj|| | Distance between the points

This kernel maps data into very high dimensional space.
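The three kernel formulas above can be compared side by side in a short Python sketch; the values of c, d, and gamma are assumed illustrative hyperparameters:

```python
import math

# The three SVM kernels from the text, applied to a pair of 2D points.
# c, d, and gamma are assumed illustrative hyperparameter values.
def linear_kernel(xi, xj):
    return sum(a * b for a, b in zip(xi, xj))            # xi · xj

def polynomial_kernel(xi, xj, c=1.0, d=2):
    return (linear_kernel(xi, xj) + c) ** d              # (xi · xj + c)^d

def rbf_kernel(xi, xj, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))  # ||xi - xj||²
    return math.exp(-gamma * sq_dist)

xi, xj = [1.0, 2.0], [2.0, 0.0]
lin = linear_kernel(xi, xj)        # 2.0
poly = polynomial_kernel(xi, xj)   # (2 + 1)^2 = 9.0
rbf = rbf_kernel(xi, xj)           # exp(-0.5 * 5)
```

Each kernel returns a single similarity number for the pair of points; the SVM only ever needs these pairwise values, which is why the kernel trick avoids computing the high-dimensional mapping explicitly.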

Applications:

  • Face recognition
  • Bioinformatics
  • Handwriting recognition

Properties of Support Vector Machine

SVM has several important characteristics.

Property | Explanation
Maximum Margin | SVM finds the boundary with the largest margin
Robust to Overfitting | Works well with high-dimensional data
Uses Support Vectors | Only the nearest data points affect the model
Kernel Trick | Handles nonlinear data using kernels
Effective in High Dimensions | Works well when the number of features is large

Advantages of SVM

  1. High accuracy
  2. Effective in high dimensional spaces
  3. Works well with small datasets
  4. Memory efficient
  5. Can handle nonlinear classification

Issues in Support Vector Machine

Despite many advantages, SVM also has some limitations.

Issue | Explanation
High Training Time | Training can be slow for large datasets
Kernel Selection | Choosing the correct kernel is difficult
Parameter Tuning | Parameters like C and gamma must be optimized
Not Suitable for Very Large Datasets | Computationally expensive
Difficult Interpretation | The model is harder to interpret than decision trees

Real Life Applications of SVM

Field | Application
Healthcare | Disease diagnosis
Finance | Credit risk analysis
Cybersecurity | Malware detection
Image Processing | Object recognition
Marketing | Customer segmentation

Summary

Support Vector Machine is a powerful supervised learning algorithm used for classification and regression. It works by finding the optimal hyperplane that maximizes the margin between different classes. With the help of kernel functions, SVM can handle both linear and nonlinear datasets. Although it provides high accuracy, it requires proper kernel selection and parameter tuning.

Important Exam Questions (MCA)

  1. Explain Support Vector Machine with diagram.
  2. What is a hyperplane in SVM?
  3. Explain types of kernel functions in SVM.
  4. Write advantages and disadvantages of SVM.
  5. Explain Gaussian kernel and polynomial kernel.