Regression, Bayesian Learning & Support Vector Machine
REGRESSION
Regression is a machine learning technique used to predict continuous values, i.e. numbers that can take any value within a range.
Examples
- Predicting house price
- Predicting temperature
- Predicting sales revenue
- Predicting stock price
In regression, we try to find the relationship between variables.
Example: Sales = f(Advertising Budget)
If advertising increases → sales may increase.
LINEAR REGRESSION
Linear regression is the simplest regression algorithm used to predict a numeric value based on the relationship between variables. It assumes that the relationship between variables is linear (straight line).
Mathematical Representation
y = mx + b
| Symbol | Meaning |
|---|---|
| y | Predicted value |
| x | Input variable |
| m | Slope of line |
| b | Intercept |
Example: Suppose we want to predict house price based on size.
| House Size (sq ft) | Price |
|---|---|
| 800 | 20 lakh |
| 1000 | 25 lakh |
| 1200 | 30 lakh |
The algorithm finds the best straight line that fits these data points.
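For the table above, the best-fit line can be computed in closed form with ordinary least squares; a minimal pure-Python sketch:

```python
# Ordinary least squares for simple linear regression: y = m*x + b
# Data from the table above: house size (sq ft) -> price (lakh)
sizes = [800, 1000, 1200]
prices = [20, 25, 30]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Closed-form OLS estimates for slope and intercept
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
    sum((x - mean_x) ** 2 for x in sizes)
b = mean_y - m * mean_x

print(m, b)          # slope and intercept of the best-fit line
print(m * 900 + b)   # predicted price for a 900 sq ft house, in lakh
```

Here the data lie exactly on a line, so the fit is perfect; with real data the line minimizes the sum of squared errors instead.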
Applications
- Sales forecasting
- Stock market prediction
- Weather forecasting
- Business growth prediction
Advantages
- Simple and easy to implement
- Easy to interpret
Disadvantages
- Works only when relationship is linear
- Sensitive to outliers
LOGISTIC REGRESSION
Logistic regression is used when the output is categorical (classification) rather than continuous.
Example outputs:
- Yes / No
- True / False
- Spam / Not Spam
- Disease / No Disease
Even though it is called regression, it is mainly used for classification problems.
Logistic Function
σ(z) = 1 / (1 + e^(−z))
Where:
z = wx + b
This function converts any value of z into a probability between 0 and 1.
Example
Spam Detection:
If probability > 0.5 → Spam
Otherwise → Not Spam
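A minimal sketch of this decision rule (the weight w and bias b below are assumed for illustration, not learned from data):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned parameters for a one-feature spam score
w, b = 1.5, -2.0

def classify(x, threshold=0.5):
    p = sigmoid(w * x + b)
    return "Spam" if p > threshold else "Not Spam"

print(sigmoid(0))      # 0.5 — the midpoint of the curve
print(classify(3.0))   # high score -> Spam
print(classify(0.0))   # low score -> Not Spam
```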
Applications
- Email spam detection
- Disease prediction
- Customer churn prediction
- Credit risk analysis
BAYESIAN LEARNING
Bayesian learning is a machine learning method based on probability theory. It uses Bayes theorem to update the probability of a hypothesis when new data is available.
Example: If a patient has fever and cough, Bayesian learning helps predict the probability of flu or infection.
BAYES THEOREM
Bayes theorem is used to calculate the probability of an event based on prior knowledge.
P(A|B) = P(B|A) × P(A) / P(B)
where,
| Term | Meaning |
|---|---|
| P(A\|B) | Posterior probability (probability of A given B) |
| P(B\|A) | Likelihood (probability of B given A) |
| P(A) | Prior probability |
| P(B) | Probability of evidence |
Example
Disease prediction:
A = Person has disease
B = Person has symptoms
Bayes theorem helps calculate probability of disease given symptoms.
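The disease example can be worked through numerically; the probabilities below are assumed for illustration only:

```python
# Illustrative numbers (assumed, not from real medical data):
p_disease = 0.01             # P(A): prior probability of disease
p_symptoms_disease = 0.90    # P(B|A): symptoms given disease
p_symptoms_healthy = 0.05    # P(B|not A): symptoms without disease

# Total probability of the evidence: P(B)
p_symptoms = (p_symptoms_disease * p_disease
              + p_symptoms_healthy * (1 - p_disease))

# Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_symptoms = p_symptoms_disease * p_disease / p_symptoms
print(p_disease_given_symptoms)
```

Even with a 90% likelihood of symptoms given disease, the posterior stays low (about 15%) because the prior is only 1% — a classic illustration of why the prior matters.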
CONCEPT LEARNING
Concept learning means learning a general concept from specific examples. Example: We show the system several examples of birds.
| Example | Features |
|---|---|
| Sparrow | Wings, feathers |
| Eagle | Wings, feathers |
| Parrot | Wings, feathers |
The system learns the concept:
Bird = animal with wings and feathers
Applications
- Image recognition
- Pattern recognition
- Object detection
BAYES OPTIMAL CLASSIFIER
The Bayes Optimal Classifier is the theoretically best classifier in machine learning.
It chooses the classification with the highest probability.
Idea
Given several hypotheses:
H1, H2, H3
The classifier selects the hypothesis that has the highest posterior probability.
Advantage
- Produces minimum possible classification error
Limitation
- Computationally expensive
- Hard to implement for large datasets
NAÏVE BAYES CLASSIFIER
- Naïve Bayes is a simple and powerful classification algorithm based on Bayes theorem.
- It assumes that features are independent of each other.
- This independence assumption is called the naïve assumption.
Example
Spam detection:
Email features:
- Contains "offer"
- Contains "free"
- Contains "discount"
Naïve Bayes assumes these words independently affect spam probability.
Formula
It applies Bayes theorem with the independence assumption:
P(Class | Features) ∝ P(Class) × P(F1 | Class) × P(F2 | Class) × … × P(Fn | Class)
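A tiny hand-computed sketch of the spam example (the class priors and word probabilities below are assumed for illustration, not learned from a corpus):

```python
# Hand-set probabilities (illustrative assumptions)
p_spam = 0.4
p_ham = 0.6
# P(word appears | class), treated as independent given the class
p_word_given_spam = {"offer": 0.7, "free": 0.8, "discount": 0.6}
p_word_given_ham = {"offer": 0.1, "free": 0.2, "discount": 0.1}

def spam_probability(words):
    """Naive Bayes: multiply prior by each word likelihood, then normalize."""
    s, h = p_spam, p_ham
    for w in words:
        s *= p_word_given_spam[w]
        h *= p_word_given_ham[w]
    return s / (s + h)

email = ["offer", "free"]
print(spam_probability(email))  # probability the email is spam
```

With just two "spammy" words the posterior already exceeds 0.9, showing how quickly independent evidence accumulates.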
Applications
- Email spam filtering
- Sentiment analysis
- Document classification
- News categorization
Advantages
- Fast and efficient
- Works well with large datasets
Disadvantages
- Assumes feature independence (not always realistic)
BAYESIAN BELIEF NETWORKS
Bayesian Belief Networks (BBN) are graphical models representing probabilistic relationships between variables.
Structure:
Nodes → Random variables
Edges → Dependency between variables
Example
Medical diagnosis system
Smoking → Lung Disease
Lung Disease → Cough
This network helps calculate probabilities of diseases.
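The chain above can be evaluated by marginalizing over each variable in turn; the conditional probabilities below are assumed for illustration:

```python
# Conditional probability tables for the chain Smoking -> Lung Disease -> Cough
# (illustrative assumed numbers)
p_smoking = 0.3                              # P(Smoking)
p_disease_given = {True: 0.4, False: 0.05}   # P(Lung Disease | Smoking)
p_cough_given = {True: 0.8, False: 0.1}      # P(Cough | Lung Disease)

# Marginalize: P(Lung Disease) = sum over Smoking states
p_disease = (p_disease_given[True] * p_smoking
             + p_disease_given[False] * (1 - p_smoking))

# Marginalize again: P(Cough) = sum over Lung Disease states
p_cough = (p_cough_given[True] * p_disease
           + p_cough_given[False] * (1 - p_disease))

print(p_disease, p_cough)
```

Each node only needs a table conditioned on its parents; the joint distribution never has to be written out in full, which is the key efficiency of a BBN.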
Applications
- Medical diagnosis
- Risk analysis
- Fraud detection
- Decision support systems
EM ALGORITHM (Expectation-Maximization)
The EM algorithm is used when data has missing or hidden variables. It is commonly used in clustering and probabilistic models.
Two Main Steps
| Step | Explanation |
|---|---|
| Expectation (E-Step) | Estimate missing data |
| Maximization (M-Step) | Update parameters to maximize probability |
Process
- Start with initial parameters
- Estimate hidden variables
- Update model parameters
- Repeat until convergence
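The E-step/M-step loop above can be sketched for a two-component 1-D Gaussian mixture (a minimal illustration on synthetic data, not an optimized implementation):

```python
import math
import random

# Synthetic data: two hidden clusters around 0 and 5
random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(5, 1) for _ in range(200)]

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Step 1: start with initial (deliberately rough) parameters
mu = [1.0, 4.0]
sigma = [1.0, 1.0]
pi = [0.5, 0.5]

for _ in range(50):
    # E-step: estimate the hidden assignments (responsibilities)
    resp = []
    for x in data:
        w = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
        total = sum(w)
        resp.append([wk / total for wk in w])
    # M-step: update parameters to maximize the expected likelihood
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sigma[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)
        pi[k] = nk / len(data)

print(sorted(mu))  # estimated means, close to the true 0 and 5
```

The loop alternates exactly the two steps in the table: responsibilities play the role of the "missing" cluster labels, and the M-step refits each Gaussian using them as soft weights.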
Applications
- Gaussian Mixture Models
- Image segmentation
- Speech recognition
- Clustering
SUMMARY TABLE
| Topic | Key Idea |
|---|---|
| Linear Regression | Predict continuous values using straight line |
| Logistic Regression | Classification using probability |
| Bayes Theorem | Calculates probability using prior knowledge |
| Concept Learning | Learning general concept from examples |
| Bayes Optimal Classifier | Best theoretical classifier |
| Naïve Bayes | Simple probabilistic classifier |
| Bayesian Belief Networks | Graphical probability models |
| EM Algorithm | Handles hidden variables |
For Exams, the most important questions from this unit are:
- Explain Linear Regression with diagram.
- Explain Logistic Regression and its applications.
- Explain Bayes theorem with example.
- What is Naïve Bayes classifier?
- Explain Bayesian belief networks.
- Explain EM Algorithm with steps.
SUPPORT VECTOR MACHINE (SVM)
Introduction to Support Vector Machine
Support Vector Machine (SVM) is a supervised machine learning algorithm used mainly for classification and sometimes regression problems. Its main goal is to separate data into different classes by finding the best boundary between them.
Example problems:
- Spam email detection
- Image classification (cat vs dog)
- Face recognition
- Medical diagnosis
The boundary that separates the classes is called a Hyperplane.
Example: Suppose we have two types of data points:
- Class A
- Class B
SVM tries to draw a line (in 2D) or plane (in higher dimensions) that separates these classes with the maximum margin.
Hyperplane (Decision Surface)
A hyperplane is a boundary that separates data into different classes.
For 2D data → hyperplane is a line
For 3D data → hyperplane is a plane
For higher dimensions → hyperplane is a decision surface
Equation of the hyperplane:
w · x + b = 0
Where:
| Symbol | Meaning |
|---|---|
| w | Weight vector |
| x | Input data point |
| b | Bias |
Key Concept: Margin
- Margin = distance between the hyperplane and the nearest data points.
- SVM tries to maximize this margin.
- The nearest data points are called Support Vectors.
- These support vectors determine the position of the hyperplane.
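A small numeric sketch of margin and support vectors. The hyperplane (w, b) below is chosen by hand rather than optimized, so it only illustrates the definitions; a real SVM solver would search for the (w, b) that maximizes the margin:

```python
import math

# Toy 2-D points for two classes (assumed data)
class_a = [(1, 1), (2, 1), (1, 2)]   # label +1
class_b = [(4, 4), (5, 4), (4, 5)]   # label -1

# A candidate separating hyperplane w . x + b = 0 (here: x + y = 5)
w = (-1.0, -1.0)
b = 5.0

def signed_distance(p):
    """Distance from point p to the hyperplane, with sign giving the side."""
    return (w[0] * p[0] + w[1] * p[1] + b) / math.hypot(*w)

# Margin = distance from the hyperplane to the nearest data point
distances = [abs(signed_distance(p)) for p in class_a + class_b]
margin = min(distances)

# Support vectors = the points that sit at exactly that minimum distance
support_vectors = [p for p in class_a + class_b
                   if abs(abs(signed_distance(p)) - margin) < 1e-9]

print(margin)
print(support_vectors)
```

Moving or removing any non-support-vector point leaves the margin unchanged, which is why only the support vectors determine the hyperplane.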
Types of Support Vector Kernels
Sometimes data cannot be separated using a straight line.
In such cases, SVM uses kernel functions to transform data into a higher dimension.
Common kernels include:
| Kernel Type | Purpose |
|---|---|
| Linear Kernel | For linearly separable data |
| Polynomial Kernel | For curved decision boundaries |
| Gaussian (RBF) Kernel | For complex patterns |
Linear Kernel
Linear kernel is used when the dataset can be separated by a straight line.
Example:
Class A and Class B are easily separable.
Formula:
K(xi, xj) = xi · xj
Where:
xi and xj are input vectors.
Applications
- Text classification
- Spam detection
- Document categorization
Advantages:
- Simple
- Fast computation
Polynomial Kernel
Polynomial kernel is used when data has a curved relationship.
Formula:
K(xi, xj) = (xi · xj + c)^d
Where:
| Parameter | Meaning |
|---|---|
| c | Constant |
| d | Degree of polynomial |
This kernel allows SVM to create curved decision boundaries.
Applications:
- Image processing
- Natural language processing
Gaussian Kernel (RBF Kernel)
Gaussian kernel is also called Radial Basis Function (RBF) kernel.
It is the most commonly used kernel because it can handle complex nonlinear data.
Formula:
K(xi, xj) = exp(−γ ‖xi − xj‖²)
Where:
| Symbol | Meaning |
|---|---|
| γ (gamma) | Kernel parameter |
| ‖xi − xj‖ | Euclidean distance between the two points |
This kernel maps data into very high dimensional space.
Applications:
- Face recognition
- Bioinformatics
- Handwriting recognition
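The three kernel formulas can be compared on a small example (the parameter values c, d, and γ below are assumed defaults for illustration):

```python
import math

def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (x . y + c)^d"""
    return (linear_kernel(x, y) + c) ** d

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, y))       # 1*3 + 2*4 = 11
print(polynomial_kernel(x, y))   # (11 + 1)^2 = 144
print(rbf_kernel(x, x))          # identical points -> similarity 1.0
```

Each kernel returns a similarity score between two points; the RBF kernel decays toward 0 as points move apart and equals 1 for identical points, which is what lets it model highly local, nonlinear structure.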
Properties of Support Vector Machine
SVM has several important characteristics.
| Property | Explanation |
|---|---|
| Maximum Margin | SVM finds boundary with largest margin |
| Robust to Overfitting | Works well with high dimensional data |
| Uses Support Vectors | Only important data points affect model |
| Kernel Trick | Handles nonlinear data using kernels |
| Effective in High Dimensions | Works well when features are large |
Advantages of SVM
- High accuracy
- Effective in high dimensional spaces
- Works well with small datasets
- Memory efficient
- Can handle nonlinear classification
Issues in Support Vector Machine
Despite many advantages, SVM also has some limitations.
| Issue | Explanation |
|---|---|
| High Training Time | Training can be slow for large datasets |
| Kernel Selection | Choosing correct kernel is difficult |
| Parameter Tuning | Parameters like C and gamma must be optimized |
| Not Suitable for Large Datasets | Computationally expensive |
| Difficult Interpretation | Model is harder to interpret than decision trees |
Real Life Applications of SVM
| Field | Application |
|---|---|
| Healthcare | Disease diagnosis |
| Finance | Credit risk analysis |
| Cybersecurity | Malware detection |
| Image Processing | Object recognition |
| Marketing | Customer segmentation |
Summary
Support Vector Machine is a powerful supervised learning algorithm used for classification and regression. It works by finding the optimal hyperplane that maximizes the margin between different classes. With the help of kernel functions, SVM can handle both linear and nonlinear datasets. Although it provides high accuracy, it requires proper kernel selection and parameter tuning.
Important Exam Questions (MCA)
- Explain Support Vector Machine with diagram.
- What is a hyperplane in SVM?
- Explain types of kernel functions in SVM.
- Write advantages and disadvantages of SVM.
- Explain Gaussian kernel and polynomial kernel.