Unsupervised Learning and Clustering



Unsupervised learning is a type of machine learning where the computer learns from data without any given answers. This means no one tells the computer what is right or wrong. The system only looks at the data and tries to find patterns, similarities, or groups by itself.

It is like giving a box of mixed items to a student and asking them to arrange the items in meaningful groups without any instructions. The computer uses the structure of data to understand how things are related.

In daily life, we often do unsupervised learning without realising it. For example, when you arrange photos in your mobile gallery into folders like family, friends, and college, you group similar photos together. 

No one tells you how to group them; you decide based on similarity. In the same way, the computer groups similar data items.

Key Ideas

  • No labelled data is given

  • Computer finds hidden patterns

  • Useful when large data has no answers

Exam Tip
Unsupervised learning = learning without a teacher.

Why Unsupervised Learning Matters

Unsupervised learning helps us understand large amounts of data easily. Companies collect huge amounts of data from users every day. Without grouping or organising it, this data is hard to use.

Unsupervised learning helps companies see trends, user behaviour, and customer interest. This makes better business decisions possible.

For example, online shopping websites group customers based on what they buy. Students who buy programming books are placed in one group. Students who buy novels are placed in another group. These groups help websites suggest correct products.

Key Ideas

  • Handles big data

  • Finds hidden structure

  • Helps in business and research

Remember This
Unsupervised learning turns raw data into useful information.

Clustering

Clustering is the process of dividing data into groups so that similar items stay together. Each group is called a cluster. Items inside one cluster are very similar, while items in different clusters are different from each other. Clustering is one of the most important tasks in unsupervised learning.

Think of your college library. Books are arranged into sections like programming, mathematics, and management. Books with similar topics stay together. This is clustering in real life.

Key Ideas

  • Groups similar data

  • Each group = cluster

  • No predefined labels

Exam Tip
Clustering = grouping similar objects.

Criterion Functions for Clustering

A criterion function is a rule that tells us how good a clustering result is. It measures the quality of clusters. The main aim is to make data inside a cluster as similar as possible and data in different clusters as different as possible.

Imagine you arrange students into study groups. If students in one group have very different subjects, then grouping is poor. If students in one group study the same subject, grouping is good. The criterion function checks this quality.

Key Ideas

  • Measures cluster quality

  • Helps compare clustering results

  • Used to improve clustering

Remember This
Better clustering = high similarity inside, low similarity outside.

Square Error Criterion (Basic Idea)

The square error adds up the squared distances between data points and the centre of their cluster. Distance simply means how far two items are from each other. The aim is to keep this total small. A smaller value means better clustering.

Think of a group of students standing around a class leader. If all students stand close to the leader, grouping is good. If students stand far away, grouping is poor.

Key Ideas

  • Measures closeness

  • Smaller value = better clusters

  • Used in K-means
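The idea above can be shown in a few lines of code. This is a minimal sketch, not a library function: the points, labels, and centres below are made-up illustration values.

```python
# Square-error criterion: for each point, take the squared distance to
# its cluster centre and add everything up. Smaller total = better clusters.

def square_error(points, labels, centres):
    """Sum of squared distances from each 2-D point to its cluster centre."""
    total = 0.0
    for (x, y), label in zip(points, labels):
        cx, cy = centres[label]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

points = [(1, 1), (2, 1), (8, 8), (9, 9)]    # invented example data
labels = [0, 0, 1, 1]                        # point i belongs to cluster labels[i]
centres = [(1.5, 1.0), (8.5, 8.5)]           # one centre per cluster

print(square_error(points, labels, centres))
```

Here the two clusters are tight around their centres, so the error is small. Moving a point far from its centre would make the value grow, which is exactly what the criterion is meant to detect.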

Iterative Square-Error Partitional Clustering

This method divides data into a fixed number of clusters and improves the result step by step. The word iterative means repeating steps again and again. The algorithm keeps changing clusters until the error becomes very small.

For example, you first divide students randomly into two groups. After seeing mistakes, you rearrange the students. You repeat this until the groups become correct.

Key Ideas

  • Fixed number of clusters

  • Repeats steps

  • Minimises square error

K-Means Clustering

K-means is the most popular partitional clustering method. K is the number of clusters. The algorithm chooses K centres and assigns each data point to the nearest centre. Then it updates the centres and repeats the process.

Suppose you want to divide students into three groups based on marks. You choose K = 3. The computer groups students into three clusters and keeps improving the groups.

Key Ideas

  • Choose K value

  • Find cluster centres

  • Repeat until stable

Exam Tip
K-means = simple and fast clustering method.

Steps of the K-Means Algorithm

The algorithm first selects K initial centres. Next, each data point goes to the nearest centre. Then new centres are calculated. These steps repeat until clusters stop changing.

Think of organising hostel rooms. You first place students randomly. Then you rearrange students based on habits. You repeat until everyone fits well.

Key Ideas

  • Select K

  • Assign data

  • Update centre

  • Repeat
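The four steps above can be sketched in pure Python. This is a hedged, simplified version for 1-D data (student marks), using the first K points as naive initial centres; real implementations pick starting centres more carefully.

```python
# K-means on 1-D data: choose K initial centres, assign each point to the
# nearest centre, recompute each centre as its cluster mean, and repeat
# until the centres stop changing.

def kmeans(points, k, max_iters=100):
    centres = points[:k]                          # step 1: initial centres
    clusters = []
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:                          # step 2: assign to nearest centre
            nearest = min(range(k), key=lambda i: (p - centres[i]) ** 2)
            clusters[nearest].append(p)
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]   # step 3: update centres
        if new_centres == centres:                # step 4: stop when stable
            break
        centres = new_centres
    return centres, clusters

marks = [35, 38, 40, 62, 65, 90, 92, 95]         # invented example marks
centres, clusters = kmeans(marks, k=3)
print(sorted(clusters, key=min))
```

Notice that the final grouping depends on the starting centres; running K-means with different initial centres can give different clusters, which is why the square-error criterion is used to compare runs.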

Agglomerative Hierarchical Clustering

Agglomerative clustering builds clusters step by step. At the beginning, each data point is its own cluster. Then the closest clusters are merged again and again until only one big cluster remains.

Imagine each student stands alone. Then, students who know each other join. Then small groups join into bigger groups.

Key Ideas

  • Bottom-up approach

  • Merges clusters

  • Creates tree structure

Dendrogram (Tree Diagram)

A dendrogram is a tree-like diagram that shows how clusters merge. It helps us understand cluster formation visually.

Think of a family tree showing relationships. A dendrogram shows how data groups connect.

Key Ideas

  • Tree diagram

  • Shows merging

  • Used in hierarchical clustering

Differences: K-Means vs Hierarchical

Feature              | K-Means        | Hierarchical
Number of clusters   | Must choose K  | Not needed
Speed                | Fast           | Slow
Structure            | Flat           | Tree

Cluster Validation

Cluster validation checks whether a clustering result is good or not. It helps ensure that clusters make sense and are useful.

Imagine a teacher checking the quality of group projects. If the groups are poorly formed, the teacher reorganises them.

Key Ideas

  • Checks quality

  • Finds best result

  • Avoids poor clusters

Types of Cluster Validation

Internal validation checks clustering using the data itself. External validation compares with known correct grouping. Relative validation compares different clustering methods.

Example: You compare two ways of grouping students and choose the better one.

Key Ideas

  • Internal

  • External

  • Relative
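A simple internal-validation measure can be sketched in code. This is one possible sanity check, not a standard library routine: it compares the average spread inside clusters against the separation between cluster centres, on invented 1-D data.

```python
# Internal validation sketch: within-cluster spread divided by
# between-centre distance. A small ratio suggests compact,
# well-separated clusters; a large ratio suggests poor clustering.

def compactness(cluster, centre):
    """Average distance of cluster members from their centre."""
    return sum(abs(x - centre) for x in cluster) / len(cluster)

good = {"A": [1, 2, 3], "B": [20, 21, 22]}      # tight, far apart
poor = {"A": [1, 12, 22], "B": [2, 11, 21]}     # spread out, overlapping

ratios = {}
for name, clusters in (("good", good), ("poor", poor)):
    centres = {k: sum(v) / len(v) for k, v in clusters.items()}
    within = sum(compactness(v, centres[k]) for k, v in clusters.items()) / 2
    between = abs(centres["A"] - centres["B"])
    ratios[name] = within / between
    print(name, round(ratios[name], 3))
```

The "good" grouping gets a far smaller ratio than the "poor" one, so the measure agrees with our intuition; a relative validation would run this check on several clustering results and keep the best.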

Why This Topic Matters

Clustering helps in customer analysis, medical research, image processing, and recommendation systems. It improves business and technology.

Example: Netflix groups users by movie interest.

Key Ideas

  • Used in industry

  • Helps decision making

Possible Exam Questions

Short Questions

  • Define unsupervised learning

  • What is clustering?

  • Explain K-means

Long Questions

  • Explain the K-means algorithm

  • Describe hierarchical clustering

  • Discuss cluster validation

Remember This

  • Unsupervised learning = no labels

  • Clustering = grouping

  • K-means = partitional

  • Hierarchical = tree-based

Detailed Summary

Unsupervised learning allows machines to learn from data without answers. Clustering is the most important task of unsupervised learning. It groups similar data into clusters. Criterion functions measure cluster quality. K-means is fast and simple, while hierarchical clustering builds tree-like groups.

Cluster validation ensures results are correct. These techniques help companies understand data and improve services.

Key Takeaways

  • Data can organise itself

  • Similar items form clusters

  • Clustering supports real-world systems

These notes are written to build a strong understanding and help students score well in exams.