Unit 4: Sampling




Sampling: Basic Concepts

Sampling is the process of selecting a small group (sample) from a large group (population) to study and draw conclusions.

Defining the Universe

The universe (also called the target population) refers to the entire group of people, items, or data that a researcher is interested in studying.

Types of Universe

  • Finite Universe: Limited number of items. Example: Number of students in a school.
  • Infinite Universe: Unlimited number. Example: Number of stars in the sky.

Example: If a company wants to study customer satisfaction, the universe could be all customers who purchased their product in the last year.

Concepts of Statistical Population

A statistical population is the complete set of observations or data that have something in common and are the focus of a statistical study.

Types of Population

  • Real Population: Actually exists (e.g., employees in a company).
  • Hypothetical Population: Does not exist physically but is assumed for study (e.g., the results of tossing a coin an infinite number of times).

Example: In a survey about the reading habits of college students, the statistical population is all college students in the city or region.

Sample

A sample is a subset of the population selected for analysis. It should represent the characteristics of the whole population.

Why use a sample?

  • Saves time and cost.
  • Practical and manageable.

Example: Out of 5,000 college students, selecting 500 of them to survey creates a sample.

Characteristics of a Good Sample

A good sample must have the following characteristics:

Sampling Frame: Definition and Practical Approach

A sampling frame is a list or database that includes all the elements (individuals, items, or units) in the population from which a sample is actually drawn. It serves as a bridge between the target population and the sample.

In simple terms: A sampling frame is a list of people or things you can choose your sample from.

📚 Example of Sampling Frame

Practical Approach to Determine the Sampling Frame

Here’s a step-by-step, practical method to determine the sampling frame:

Step 1: Define the Target Population Clearly

Start by defining who or what you want to study. Example: A company wants to understand the satisfaction level of customers who purchased a new product in the last 6 months.

Step 2: Identify Available Sources of Information

Find out where you can obtain a list of the units in your population. Example: CRM software, sales database, email subscriber list, college admission records, employee database.

Step 3: Ensure the Frame is Updated and Complete

Check if the list is:
  • Recent and up-to-date
  • Complete (includes all relevant units)
  • Free of duplicates
Example: Avoid using a customer list from last year if your study is about current customers.

Step 4: Remove Ineligible Units (if any)

Filter out people/items who do not qualify as part of your population. Example: Remove customers who have returned the product if your study covers only customers who kept it.
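Steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the record fields (`id`, `months_since_purchase`, `returned`) and the helper `build_sampling_frame` are hypothetical names, not part of any real CRM system.

```python
# Hypothetical raw customer list; the field names are illustrative assumptions.
raw_list = [
    {"id": 101, "months_since_purchase": 2, "returned": False},
    {"id": 102, "months_since_purchase": 9, "returned": False},  # outside the 6-month window
    {"id": 103, "months_since_purchase": 4, "returned": True},   # returned the product
    {"id": 101, "months_since_purchase": 2, "returned": False},  # duplicate entry
    {"id": 104, "months_since_purchase": 5, "returned": False},
]


def build_sampling_frame(records, max_months=6):
    """Return a frame with duplicates, outdated and ineligible units removed."""
    seen_ids = set()
    frame = []
    for record in records:
        if record["id"] in seen_ids:
            continue  # Step 3: drop duplicates
        seen_ids.add(record["id"])
        if record["months_since_purchase"] > max_months:
            continue  # Step 3: drop entries outside the study period
        if record["returned"]:
            continue  # Step 4: drop ineligible units
        frame.append(record)
    return frame


frame = build_sampling_frame(raw_list)
print([record["id"] for record in frame])  # [101, 104]
```

Only the cleaned frame should be passed on to the sampling technique chosen in Step 5.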

Step 5: Choose the Sampling Technique

Once you have a clean frame, choose how to pick the sample:
  • Simple Random Sampling
  • Stratified Sampling
  • Systematic Sampling, etc.

🔍 Importance of a Good Sampling Frame

A good sampling frame ensures:
  • High accuracy in results
  • Representation of the whole population
  • Reduction of sampling bias
  • Reliable and valid conclusions

❌ Common Problems in Sampling Frames

Example: Research Topic: Studying the effectiveness of online learning among MBA students at XYZ University.
  • Target Population: All MBA students enrolled in 2024–2025.
  • Sampling Frame: Official list of enrolled students from the registrar's office.
  • Sampling Method: Randomly select 200 students from the list.

Sampling Errors

Sampling error occurs when the sample chosen does not perfectly represent the population. It happens because only a part of the population is studied, not the whole. Example: If you're surveying 100 students out of 1,000 and most are from one department, results may not reflect the entire college’s views.

Causes:

  • Small sample size
  • Improper sampling technique
  • Non-random selection
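The department example can be put into numbers. The sketch below uses a made-up college of 1,000 students and a made-up opinion rule (80% support in department A, 40% elsewhere); all figures are assumptions chosen so the arithmetic is easy to follow.

```python
def supports(department, i):
    """Hypothetical opinion rule: 80% support in department A, 40% elsewhere."""
    return i % 5 != 0 if department == "A" else i % 5 < 2


# Hypothetical college: 4 departments x 250 students = 1,000 students.
population = {
    dept: [supports(dept, i) for i in range(250)]
    for dept in ("A", "B", "C", "D")
}


def support_rate(answers):
    """Share of True answers in a list of booleans."""
    return sum(answers) / len(answers)


population_rate = support_rate(
    [answer for dept in population.values() for answer in dept]
)

# Non-random sample of 100: 70 students from department A, 10 from each other department.
skewed_sample = (
    population["A"][:70]
    + population["B"][:10]
    + population["C"][:10]
    + population["D"][:10]
)
sample_rate = support_rate(skewed_sample)

print(population_rate)                          # 0.5
print(sample_rate)                              # 0.68
print(round(sample_rate - population_rate, 2))  # 0.18
```

The sample overstates support by 18 percentage points because one department dominates it.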

Non-Sampling Errors

Non-sampling errors are errors that occur during the data collection, processing, or analysis — even if the sample is perfect.

Types of Non-Sampling Errors:

Methods to Reduce Sampling and Non-Sampling Errors

Sample Size Constraints

Sample size constraints refer to limitations that affect how many people or items can be included in a sample.

Common Constraints

  • Budget: Limited money for data collection
  • Time: Less time to gather responses
  • Resources: Limited access to people or databases
Effect: Too small a sample size may lead to unreliable or biased results.

Non-Response

Non-response occurs when some selected participants do not respond or refuse to participate.

Types:

  • Unit Non-Response: The selected participant does not respond at all.
  • Item Non-Response: Respondent skips some questions.
Example: In an online survey, if 200 people are contacted and only 120 respond, the 80 non-responses can cause bias.
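The response rate for this example is a one-line calculation, shown here as a small sketch (the variable names are illustrative):

```python
contacted = 200  # people invited to the online survey
responded = 120  # people who completed it

non_responses = contacted - responded
response_rate = responded / contacted

print(non_responses)           # 80
print(f"{response_rate:.0%}")  # 60%
```

A 60% response rate means conclusions rest on the 120 respondents only; if the 80 non-respondents differ systematically from them, the results are biased.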

How to Reduce Non-Response:

  • Send follow-up reminders
  • Offer incentives
  • Keep survey short and easy
  • Ensure confidentiality

✅ Quick Summary Table

Probability Sampling

Probability sampling is a method in which every member of the population has a known, non-zero chance of being selected in the sample. The chances are equal in simple random sampling but need not be equal in every design. It reduces bias and increases the reliability of results.

Simple Random Sampling

In this method, each member of the population has an equal and independent chance of being selected.

How it works:

  • Use random number tables or computer software
  • No pattern is followed
Example: A teacher wants to select 10 students from a class of 50. She assigns each student a number and uses a random number generator to select 10 students.
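The teacher's procedure can be sketched with Python's standard `random` module. The helper name `simple_random_sample` and the `seed` parameter (used only to make the draw repeatable) are assumptions for illustration.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each with an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)  # sampling without replacement


students = list(range(1, 51))  # roll numbers 1 to 50
selected = simple_random_sample(students, 10, seed=42)
print(sorted(selected))
```

`random.sample` draws without replacement, so no student can be picked twice.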

Advantages:

  • Simple to understand
  • No bias in selection

Disadvantages:

  • Requires a complete list of the population, which is often hard to obtain for large populations

Systematic Sampling

In systematic sampling, a sample is drawn by selecting every k-th unit from a list after a random start.
Formula: Sampling Interval (k) = Population size / Sample size
Example: From a list of 1,000 customers, you want to select 100. So, k = 1000/100 = 10. After choosing a random start point between 1 and 10 (say 5), select every 10th customer: 5, 15, 25, 35…
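A minimal sketch of the same procedure in Python follows. The function name `systematic_sample` is hypothetical, positions are treated as 1-based to match the example, and the interval uses integer division, which assumes the population size is at least the sample size.

```python
import random


def systematic_sample(units, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = len(units) // sample_size  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)
    # Positions are 1-based: start, start + k, start + 2k, ...
    positions = range(start, len(units) + 1, k)
    return [units[p - 1] for p in positions][:sample_size]


customers = list(range(1, 1001))  # customer numbers 1 to 1,000
sample = systematic_sample(customers, 100, start=5)
print(sample[:4])   # [5, 15, 25, 35]
print(len(sample))  # 100
```

Leaving `start` unset lets the function pick the random start itself.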

Advantages

  • Easy to implement
  • Quick and cost-effective

Disadvantages

  • If there's a hidden pattern, results may be biased

Stratified Random Sampling

The population is divided into homogeneous subgroups (strata) based on a certain characteristic (e.g., age, gender), and then a random sample is taken from each stratum. Example: A university wants to survey 200 students. It divides students into 4 groups based on year (1st, 2nd, 3rd, 4th year) and randomly selects 50 students from each.
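The university example can be sketched as follows. The enrolment figures (1,000 students per year) and the student ID format are invented for illustration, and equal allocation of 50 per stratum follows the example rather than any general rule.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of per_stratum units from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }


# Hypothetical enrolment: 1,000 students in each of the 4 years.
strata = {
    year: [f"Y{year}-{i:04d}" for i in range(1, 1001)]
    for year in (1, 2, 3, 4)
}

sample = stratified_sample(strata, per_stratum=50, seed=1)
print({year: len(chosen) for year, chosen in sample.items()})  # {1: 50, 2: 50, 3: 50, 4: 50}
print(sum(len(chosen) for chosen in sample.values()))          # 200
```

When strata differ in size, the number drawn from each stratum is often made proportional to its size instead.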

Advantages

  • Ensures representation of all groups
  • Usually more precise than simple random sampling when the strata differ from one another

Disadvantages

  • Requires detailed population information
  • More complex to administer

Area Sampling

Area sampling is used when the population is spread across a large geographical area. It divides the area into sections, then samples are taken from selected sections.

Used in: Field surveys, national census, rural marketing research

Example: To survey rural households in a state, divide the state into districts (areas), then randomly select some districts, then villages, then households.
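The district, village, household selection can be sketched as three nested random draws. The state layout below (6 districts, 5 villages each, 20 households each) and the function name `area_sample` are hypothetical.

```python
import random


def area_sample(state, n_districts, n_villages, n_households, seed=None):
    """Randomly pick districts, then villages within them, then households."""
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(state), n_districts):
        villages = state[district]
        for village in rng.sample(sorted(villages), n_villages):
            for household in rng.sample(villages[village], n_households):
                selected.append((district, village, household))
    return selected


# Hypothetical state: 6 districts x 5 villages x 20 households.
state = {
    f"District-{d}": {
        f"Village-{d}.{v}": [f"HH-{d}.{v}.{h}" for h in range(1, 21)]
        for v in range(1, 6)
    }
    for d in range(1, 7)
}

households = area_sample(state, n_districts=2, n_villages=3, n_households=4, seed=7)
print(len(households))  # 24 = 2 districts x 3 villages x 4 households
```

Field teams then only need to travel to 2 districts and 6 villages rather than the whole state.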

Advantages

  • Practical for large-scale studies
  • Cost-effective for dispersed populations

Disadvantages

  • May miss variation within each area if not sampled well

Cluster Sampling

The population is divided into clusters (groups), each of which ideally resembles the whole population. Entire clusters are then randomly selected, and all units within the selected clusters are studied.

Difference from Stratified Sampling:

  • In stratified, elements are similar within each group but different across groups.
  • In cluster, each cluster is a mini-version of the population.
Example: A researcher wants to study school children in a city. Instead of selecting students from all schools, the researcher randomly selects 5 schools (clusters) and surveys all students in them.
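A short sketch of the school example: the random draw happens at the cluster level, and every unit in a chosen cluster is included. The city layout (20 schools of 30 students) and the name `cluster_sample` are assumptions.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical city: 20 schools with 30 students each.
schools = {
    f"School-{s:02d}": [f"S{s:02d}-{i:02d}" for i in range(1, 31)]
    for s in range(1, 21)
}

chosen_schools, students = cluster_sample(schools, 5, seed=3)
print(len(chosen_schools), len(students))  # 5 150
```

Contrast this with stratified sampling, where some units are drawn from every group.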

Advantages

  • Saves time and cost
  • Useful when population list is not available

Disadvantages

  • Less accurate than stratified sampling
  • High chance of sampling error if clusters are not well chosen

✅ Comparison Table

Non-Probability Sampling

In non-probability sampling, the chance of any given member of the population being selected is unknown, and some members may have no chance of selection at all. Selection is often based on the researcher’s judgment, ease of access, or specific purpose. It is commonly used in qualitative research, exploratory studies, and when time or budget is limited.

Judgment Sampling (Expert Sampling)

The researcher selects units based on their knowledge and judgment about who will be the best representative of the population. Example: A marketing expert selects only experienced customers for feedback on a new luxury product.

Advantages
  • Useful when only experts or informed individuals are needed
  • Saves time

Disadvantages

  • Highly subjective
  • Risk of bias

Convenience Sampling

Samples are taken from those who are easily available or willing to participate. Example: A student surveys people at a shopping mall or interviews classmates because they are easy to reach.

Advantages

  • Very easy and quick
  • Low cost

Disadvantages

  • Not representative of the population
  • High chance of bias

Purposive Sampling

Also called selective sampling. The researcher selects specific individuals or groups for a purpose, usually because they meet certain criteria. Example: A researcher studying diabetes selects only patients diagnosed within the last year.

Advantages

  • Focused data collection
  • Useful for studying a specific subgroup

Disadvantages

  • Limited generalizability
  • Risk of excluding relevant views

Quota Sampling

The population is divided into subgroups, a quota (target number) is set for each subgroup, and units are chosen non-randomly until each quota is filled. Example: A researcher needs 50 men and 50 women for a study, so they keep interviewing until both quotas are filled, but selection is based on convenience.
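The quota logic can be sketched as a loop that accepts whoever comes along until each quota is full. The stream of passers-by below is invented, and the function name `quota_sample` is hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Accept people in the order encountered until every quota is filled."""
    filled = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # all quotas met, stop interviewing
    return filled


# Hypothetical passers-by in the order met: two men for every woman.
arrivals = [(f"P{i}", "men" if i % 3 else "women") for i in range(1, 301)]

sample = quota_sample(arrivals, {"men": 50, "women": 50})
print({group: len(people) for group, people in sample.items()})  # {'men': 50, 'women': 50}
```

Nothing in the loop is random: who ends up in the sample depends entirely on who happens to arrive first.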

Advantages

  • Ensures subgroup representation
  • Easier to manage than stratified sampling

Disadvantages

  • Not random within quotas
  • Still prone to bias

Snowball Sampling

Used when the population is hard to access. Existing participants refer other potential participants, and the sample grows like a “snowball.” Example: Studies of drug users, underground artists, or rare disease patients, where initial contacts introduce more people.
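The referral process can be sketched as a walk through a referral network, starting from the initial contacts. The network below and the name `snowball_sample` are made up for illustration; real referral chains are not known in advance.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals from the initial contacts."""
    sampled = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # avoid recruiting the same person twice
                seen.add(contact)
                queue.append(contact)
    return sampled


# Hypothetical referral network: each participant names the people they can introduce.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
}

print(snowball_sample(referrals, seeds=["A"], max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

Everyone in the sample is connected to the seed "A", which is why well-connected groups tend to be over-represented.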

Advantages

  • Effective for hidden or hard-to-reach groups
  • Builds trust through referrals

Disadvantages

  • Limited control over sample composition
  • Potential for over-representation of connected groups

✅ Comparison Table

Determining Sample Size

Sample size is the number of units (people, items, cases) selected from the population to be included in a study. Choosing the right sample size matters: too small a sample gives imprecise results, and too large a sample wastes resources.

🛠️ Practical Considerations in Sampling and Sample Size

Before calculating the sample size, a researcher must consider these real-world factors:

Sample Size Determination (Formula-Based)

For probability sampling, the sample size is often calculated using the following formula:
n = (Z² × p × q) / E²
Where:
  • n = Required sample size
  • Z = Z value (based on confidence level, e.g., 1.96 for 95%)
  • p = Estimated proportion of the population having the attribute
  • q = 1 − p
  • E = Margin of error (allowed error rate, like 5% = 0.05)

📊 Example: Suppose you want to find out how many people like a new product.

  • Confidence level: 95% → Z = 1.96
  • Expected proportion (p): 0.5 (if unknown, take 50%)
  • q = 1 − 0.5 = 0.5
  • Margin of error (E): 5% = 0.05
  • n = (1.96² × 0.5 × 0.5) / 0.05²
  • n = (3.8416 × 0.25) / 0.0025
  • n = 0.9604 / 0.0025 = 384.16, rounded up to 385
So, you need a sample size of 385 respondents.
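The same calculation can be checked in Python. The function name `required_sample_size` is an illustrative choice; the formula is the one given above, and the result is rounded up because a fraction of a respondent cannot be surveyed.

```python
import math


def required_sample_size(z, p, margin_of_error):
    """Apply n = (Z^2 * p * q) / E^2 and round up to a whole respondent."""
    q = 1 - p
    n = (z ** 2 * p * q) / margin_of_error ** 2
    return n, math.ceil(n)


exact, rounded_up = required_sample_size(z=1.96, p=0.5, margin_of_error=0.05)
print(round(exact, 2))  # 384.16
print(rounded_up)       # 385
```

Using p = 0.5 gives the largest value of p × q, so it is the cautious choice when the true proportion is unknown.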

Tips for Determining Sample Size

Balance Between Accuracy and Cost

You need to balance:
  • Accuracy needed (bigger sample)
  • Time & money available (smaller sample)
So, sample size is always a trade-off between statistical needs and practical limitations.
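The trade-off can be made concrete by rearranging the sample size formula to E = Z × √(p × q / n) and computing the margin of error for a few sample sizes. The sample sizes below are arbitrary illustrations, and `margin_of_error` is a hypothetical helper name.

```python
import math


def margin_of_error(z, p, n):
    """Margin of error E = Z * sqrt(p * q / n), rearranged from n = Z^2 * p * q / E^2."""
    q = 1 - p
    return z * math.sqrt(p * q / n)


# 95% confidence (Z = 1.96) and p = 0.5, as in the worked example.
for n in (100, 400, 1600):
    print(n, f"{margin_of_error(1.96, 0.5, n):.4f}")
# 100 0.0980
# 400 0.0490
# 1600 0.0245
```

Each halving of the margin of error requires four times as many respondents, so extra accuracy becomes more expensive the more of it you want.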