Unit 2: Data Science, Big Data & Cloud Computing




Introduction

Today’s businesses run on data. Whether it's Amazon recommending products or banks detecting fraud, data science, big data, and cloud computing are core pillars of digital transformation.

Data vs Information

Term | Meaning | Example | Business Importance
Data | Raw facts, symbols, numbers, or text collected from sources | 500 customer purchase records | No direct meaning on its own, but the base for analysis
Information | Processed data that carries meaning | “70% of customers buy online at night” | Helps managers make decisions

Key Point: Data becomes information after processing and analysis.
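
The step from data to information can be sketched in a few lines of Python. This is only an illustration: the purchase records, the `is_night` helper, and the 20:00 to 05:59 definition of "night" are assumptions made for the example, not part of the course material.

```python
from datetime import datetime

# Hypothetical raw data: (customer_id, channel, purchase timestamp)
purchases = [
    ("C1", "online", "2024-03-01 22:15"),
    ("C2", "online", "2024-03-01 23:40"),
    ("C3", "store", "2024-03-02 11:05"),
    ("C4", "online", "2024-03-02 21:30"),
    ("C5", "online", "2024-03-03 09:10"),
]


def is_night(timestamp: str) -> bool:
    """Treat 20:00-05:59 as night (an assumed definition)."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    return hour >= 20 or hour < 6


def share_online_at_night(records) -> float:
    """Percentage of all purchases that were made online at night."""
    if not records:
        return 0.0
    hits = sum(
        1 for _, channel, ts in records if channel == "online" and is_night(ts)
    )
    return 100 * hits / len(records)


night_share = share_online_at_night(purchases)
print(f"{night_share:.0f}% of purchases were made online at night")
# prints: 60% of purchases were made online at night
```

The five rows are data; the single percentage printed at the end is information a manager can act on.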

Data Value Chain

The data value chain shows how data creates business value.

Stage | Description | Example in Business
1. Data Generation | Data produced from sources | Customer browsing behavior
2. Data Collection | Gathering data | Website logs, CRM system
3. Data Storage | Saving data securely | Database, cloud storage
4. Data Processing | Cleaning & organizing | Removing duplicate records
5. Data Analysis | Extracting insights | Predicting sales trends
6. Data Visualization | Presenting data in charts | Sales dashboard
7. Decision Making | Using insights for strategy | Launching night-time offers

Goal: Convert data → actionable insights → business value.

Types of Data

Type | Meaning | Example | Use Case
Structured Data | Organized in tables/rows | Excel, SQL database | Finance, HR records
Unstructured Data | Free-form, not in tables | Emails, videos, social media posts | Sentiment analysis, marketing
Semi-structured Data | Has tags but is not fully structured | XML, JSON | Web data, APIs
Big Data | Extremely large, complex data | Data from YouTube, Amazon | AI, automation, trend prediction
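
To make the difference between structured and semi-structured data concrete, here is a small Python sketch. The order records, field names, and values are invented for illustration. The CSV text has the same columns in every row, while the JSON record nests fields and lists under tags.

```python
import csv
import io
import json

# Structured: every row has the same fixed columns (hypothetical data).
csv_text = "order_id,customer,amount\n101,Asha,250\n102,Ravi,400\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: tagged fields, with nesting that can vary per record.
json_text = (
    '{"order_id": 103, "customer": {"name": "Meera", "tags": ["prime"]}, '
    '"items": [{"sku": "A1", "qty": 2}]}'
)
order = json.loads(json_text)

total_structured = sum(int(r["amount"]) for r in rows)   # column lookup
customer_name = order["customer"]["name"]                # nested lookup
item_count = sum(item["qty"] for item in order["items"])

print(total_structured, customer_name, item_count)
```

Unstructured data (emails, videos) has no such fields at all, which is why it usually needs text or image analysis before it can be queried this way.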

The Data Pipeline

The data pipeline is the step-by-step flow of data from source to insight.

1. Data Collection

  • Gathering data from multiple sources
  • Tools: Google Analytics, CRM, social media, sensors, surveys

Example: Collecting customer clicks from a shopping app.

2. Data Cleaning

  • Removing missing, duplicate, or incorrect data
  • Standardizing formats

Why? Clean data = accurate results

Example: Removing invalid email IDs and empty entries.
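
A minimal cleaning sketch in Python, assuming records arrive as dictionaries with `name` and `email` fields (both hypothetical). The email pattern is a deliberately simple check, not a full validator.

```python
import re

# Simple shape check (something@something.suffix); an assumption, not full RFC validation.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

raw_records = [
    {"name": "Asha", "email": "Asha@Example.com "},
    {"name": "Ravi", "email": "ravi@example"},        # invalid: no domain suffix
    {"name": "", "email": ""},                        # empty entry
    {"name": "Asha", "email": "asha@example.com"},    # duplicate once standardized
    {"name": "Meera", "email": "meera@example.org"},
]


def clean(records):
    """Standardize emails, then drop empty, invalid, and duplicate rows."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec["email"].strip().lower()   # standardize the format
        if not email or not EMAIL_PATTERN.match(email):
            continue                           # missing or incorrect
        if email in seen:
            continue                           # duplicate
        seen.add(email)
        cleaned.append({"name": rec["name"].strip(), "email": email})
    return cleaned


clean_records = clean(raw_records)
print([r["email"] for r in clean_records])
```

Note that standardizing the format comes first: the duplicate is only detected after whitespace and letter case have been normalized.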

3. Data Storage

  • Storing data safely for future use

Storage Type | Example
Traditional Databases | MySQL, Oracle
Data Warehouse | Amazon Redshift, Snowflake
Cloud Storage | AWS S3, Google Cloud Storage

4. Data Analysis

  • Applying statistics, ML models to find patterns

Example: Finding which product sells most on weekends.

Tools: Excel, Python, R, Power BI, Tableau
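
As a rough sketch of the weekend example, the following Python snippet totals units sold on Saturdays and Sundays. The sales records and product names are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sales records: (date, product, units sold)
sales = [
    (date(2024, 3, 1), "Headphones", 3),   # Friday
    (date(2024, 3, 2), "Headphones", 5),   # Saturday
    (date(2024, 3, 2), "Charger", 2),      # Saturday
    (date(2024, 3, 3), "Charger", 3),      # Sunday
    (date(2024, 3, 3), "Headphones", 1),   # Sunday
    (date(2024, 3, 4), "Charger", 9),      # Monday
]


def weekend_units(records):
    """Total units per product, counting only Saturday and Sunday sales."""
    totals = Counter()
    for day, product, units in records:
        if day.weekday() >= 5:   # Monday is 0, so 5 and 6 are the weekend
            totals[product] += units
    return totals


totals = weekend_units(sales)
top_product, top_units = totals.most_common(1)[0]
print(top_product, top_units)   # prints: Headphones 6
```

The charger sells more overall, but the headphones lead on weekends, which is the kind of pattern this stage is meant to surface.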

5. Data Curation

  • Managing and organizing datasets
  • Ensuring quality, labels, documentation

Goal: Make data reusable & reliable.

6. Data Visualization

  • Present insights using graphs & dashboards

Tool | Use
Tableau / Power BI | Business dashboards
Excel charts | Basic visualization
Python (Matplotlib) | Data graphs

Example: A dashboard showing monthly sales trends.
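
The idea behind a sales chart can be shown without any charting tool. The sketch below draws a plain-text bar chart using only the Python standard library; the monthly figures and the `text_bar_chart` helper are hypothetical, and a real dashboard would use one of the tools listed above.

```python
# Hypothetical monthly sales figures (units are arbitrary).
monthly_sales = {"Jan": 120, "Feb": 150, "Mar": 90, "Apr": 180}


def text_bar_chart(data, width=30):
    """Return one text line per entry, with bars scaled to the largest value."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<4}{bar} {value}")
    return lines


chart = text_bar_chart(monthly_sales)
print("\n".join(chart))
```

Even in this crude form, the dip in March and the peak in April are visible at a glance, which is the point of visualization.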

Big Data & Cloud Computing in Business

Concept | Explanation | Business Benefit
Big Data | Handling huge, real-time data | Better prediction & personalization
Cloud Computing | Storing & processing data online | Cost-effective, scalable, secure

Real-World Example: Netflix uses big data + cloud to recommend shows.

Why MBA Students Must Learn This

Skill | Career Benefit
Understanding data | Helps in strategic decision making
Analytics knowledge | Needed in marketing, finance, HR
Cloud concepts | Useful in IT & digital transformation roles

Key Takeaways

  • Data → Information → Insight → Business Value
  • Data pipeline ensures data flows smoothly from source to decision-making
  • Cloud & big data help handle large-scale digital business operations

Short Summary for Exams

Data is raw facts; information is processed data.
Data value chain: generation → collection → storage → processing → analysis → visualization → decision-making.
Data pipeline automates this flow.
Cloud computing provides scalable online data storage & processing.
Big data refers to large, complex data sets that modern digital businesses analyze for insight.

Big Data: Key Concepts

Concept | Meaning | Example
Volume | Very large amount of data | Netflix user streaming data
Velocity | Speed at which data is generated & processed | Stock market price updates
Variety | Data in different formats | Videos, social media posts, text, spreadsheets
Veracity | Accuracy & reliability of data | Customer reviews vs fake reviews
Value | Business benefit from data | Amazon recommendations increasing sales

These are called the 5Vs of Big Data.

Why Big Data is Important

  • Real-time decision making
  • Understanding customer behavior
  • Competitive advantage
  • Helps in automation & prediction

Business Use Cases of Big Data

Industry | Use Case | Example
Retail & E-Commerce | Personalized recommendations | Amazon, Flipkart suggestions
Banking & Finance | Fraud detection | Detecting unusual transactions
Healthcare | Disease prediction, patient data analysis | AI-based medical diagnosis
Marketing | Customer segmentation, targeted ads | Google & Meta ads personalization
Transport | Route optimization | Uber / Ola demand prediction
Manufacturing | Predictive maintenance | Machine failure prediction
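
To give a feel for the fraud detection row above, here is a toy Python sketch that flags a transaction as unusual when it lies far from a customer's past spending. The transaction history, the `is_unusual` helper, and the threshold of 3 standard deviations are assumptions for illustration; production fraud systems use far richer models and many more signals.

```python
from statistics import mean, stdev

# Hypothetical past transaction amounts for one customer.
history = [1200, 950, 1100, 1300, 1050, 1250, 1000]


def is_unusual(amount, past, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    mu = mean(past)
    sigma = stdev(past)   # needs at least two past values
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold


print(is_unusual(1150, history))    # a typical amount
print(is_unusual(25000, history))   # far outside the usual range
```

The same distance-from-normal idea, applied to sensor readings instead of payments, underlies simple forms of predictive maintenance.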

Big Data Conclusion: Big data helps businesses move from reactive management → predictive & proactive decisions.

Role of Data Science in Business Analytics & Decision-Making

What is Data Science?

Data Science = Statistics + Programming + Business knowledge, combined to extract insights and support decisions.

Key Roles in Business

Role | Explanation | Example
Descriptive Analytics | What happened? | Dashboard of last month's sales
Diagnostic Analytics | Why did it happen? | Customer churn analysis
Predictive Analytics | What will happen? | Forecasting next month’s revenue
Prescriptive Analytics | What should we do? | Suggesting marketing budget allocation
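
The difference between descriptive and predictive analytics can be sketched with a tiny example. The revenue figures are hypothetical, and the forecast uses a simple least-squares straight line, which is only one of many possible forecasting methods.

```python
# Hypothetical monthly revenue for months 1 to 4.
revenue = [100, 110, 120, 130]

# Descriptive: summarize what already happened.
average_revenue = sum(revenue) / len(revenue)


def linear_forecast(values):
    """Fit a least-squares line through (1, v1), (2, v2), ... and extend it one step."""
    n = len(values)   # needs at least two values
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n + 1)


# Predictive: estimate what will happen next month.
forecast = linear_forecast(revenue)
print(average_revenue, forecast)   # prints: 115.0 140.0
```

Prescriptive analytics would go one step further and recommend an action, for example how to split a marketing budget given that forecast.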

How Data Science helps Decision-Making

  • Identifies trends & opportunities
  • Reduces business risk
  • Improves customer experience
  • Helps set pricing strategies
  • Improves operational efficiency

Example: Zomato uses data science for delivery time prediction, pricing & restaurant recommendations.

Cloud Computing Fundamentals

What is Cloud Computing?

Cloud computing means using remote servers via the internet to store, manage & process data instead of local computers.

Key Features

Feature | Meaning
On-demand service | Use when needed
Scalability | Increase or decrease resources anytime
Cost-efficient | Pay only for usage
Global access | Use anywhere online
Data security | Advanced backup, encryption
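
Why scalability and pay-per-use matter together can be shown with simple arithmetic. In the sketch below, the hourly demand and the price of 1.0 per server-hour are invented, and the same price is assumed for both options purely to keep the comparison simple; real cloud and on-premise costs differ in many other ways.

```python
# Hypothetical number of servers needed in each of six hours.
hourly_demand = [2, 2, 3, 8, 10, 4]
PRICE_PER_SERVER_HOUR = 1.0   # assumed, identical for both options


def provision_for_peak_cost(demand, price):
    """Fixed capacity sized for the busiest hour and paid for every hour."""
    return max(demand) * len(demand) * price


def pay_per_use_cost(demand, price):
    """Capacity scaled up and down each hour; pay only for what is used."""
    return sum(demand) * price


fixed_cost = provision_for_peak_cost(hourly_demand, PRICE_PER_SERVER_HOUR)
usage_cost = pay_per_use_cost(hourly_demand, PRICE_PER_SERVER_HOUR)
print(fixed_cost, usage_cost)   # prints: 60.0 29.0
```

The gap between the two numbers is the capacity that sits idle when infrastructure is sized for peak demand.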

Cloud Service Models

Model | Full Form | What it provides | Example | For Whom
IaaS | Infrastructure-as-a-Service | Servers, storage, networking | AWS EC2, Google Compute Engine | IT admins, developers
PaaS | Platform-as-a-Service | Tools to build & deploy apps | Google App Engine, AWS Elastic Beanstalk | Developers
SaaS | Software-as-a-Service | Ready-to-use software | Gmail, Salesforce, MS Office 365 | End users

Simple Example to Remember

You want to eat pizza | Type
Make everything from scratch at home | On-premise (no cloud)
Buy ready dough & toppings, bake at home | IaaS
Buy half-baked pizza, just heat | PaaS
Order ready-to-eat pizza | SaaS

Learning Trick:

SaaS = Ready Software
PaaS = Platform to build software
IaaS = Hardware on rent

Key MBA Takeaways

  • Big Data enables scalable, real-time, data-driven decisions
  • Data Science turns raw data into strategic insights
  • Cloud Computing gives flexibility, cost-efficiency, and scalability
  • IaaS/PaaS/SaaS are core models every MBA must understand

Short Exam Answer (Memory Hint)

Big Data is characterized by the 5Vs (Volume, Velocity, Variety, Veracity, Value).
Data Science enables descriptive, diagnostic, predictive, and prescriptive analytics for better decisions.
Cloud Computing provides online computing services through IaaS (hardware), PaaS (application platform), and SaaS (ready software).

Cloud Deployment Models

Cloud deployment model = Type of cloud environment based on ownership & accessibility

Deployment Model | Meaning | Features | Example Use Case | Examples
Public Cloud | Cloud resources shared by multiple users over the internet | Low cost, scalable, pay-as-you-go | Startups, SaaS companies, large-scale apps | AWS, Google Cloud, Microsoft Azure
Private Cloud | Cloud environment dedicated to one organization | Highly secure, customizable | Banks, government, hospitals | VMware, OpenStack, IBM Private Cloud
Hybrid Cloud | Mix of public + private cloud with data sharing between them | Balance of security & scalability | Enterprises handling sensitive + public data | AWS Outposts, Azure Arc, Google Anthos
Community Cloud | Shared by organizations with common regulatory or industry needs | Secure, collaborative | Universities, research institutions | Government/university collaboration clouds
Multi-Cloud (modern) | Use of multiple cloud providers together | Avoids vendor lock-in, better reliability | Corporates running different workloads on different clouds | AWS + Google Cloud + Azure

Simple Memory Trick

Public = Shared
Private = Dedicated
Hybrid = Mix
Community = Group
Multi-Cloud = Multiple cloud providers

Why Deployment Models Matter for Managers

  • Helps decide cost vs security requirements
  • Guides IT infrastructure planning
  • Ensures compliance (e.g., banking, healthcare)

Cloud Platforms for Data Storage, Management & Scalable Analytics

Cloud platforms allow businesses to store, process and analyze massive data without buying servers.

Top Cloud Platforms

Platform | Key Services | Purpose
Amazon AWS | S3, Redshift, EC2, EMR, Athena, Glue | Big data storage, cloud computing & analytics
Google Cloud Platform (GCP) | BigQuery, Cloud Storage, Dataproc, Pub/Sub | Real-time analytics, AI/ML workloads
Microsoft Azure | Azure Blob Storage, Azure Synapse Analytics (formerly SQL Data Warehouse), HDInsight | Enterprise analytics, hybrid cloud
IBM Cloud | IBM Db2, Watson AI | AI-based analytics
Oracle Cloud | Oracle Autonomous DB | Database-intensive analytics

Cloud Storage Services (for business data)

Cloud Provider | Storage Service | Use
AWS | S3, Glacier | Object storage, archival
Google Cloud | Cloud Storage | Object storage
Microsoft Azure | Blob Storage | Cloud file & data store

Cloud Data Management Services

Service | Platform | Purpose
AWS RDS / DynamoDB | AWS | Database management
BigQuery | Google | Large-scale SQL analytics
Azure SQL | Microsoft | Cloud database
Snowflake (on multi-cloud) | Snowflake | Data warehouse & analytics

Cloud Analytics & Processing Tools

Area | Tools | Function
Big Data Processing | AWS EMR, GCP Dataproc, Azure HDInsight | Distributed processing (Hadoop/Spark)
Data Integration (ETL) | AWS Glue, Azure Data Factory | Clean + transform data
Real-Time Analytics | AWS Kinesis, GCP Pub/Sub, Azure Stream Analytics | Streaming data from apps/sensors
AI/ML Platforms | AWS SageMaker, GCP Vertex AI, Azure ML Studio | Machine learning & AI analytics

Why Businesses Use Cloud for Analytics

Benefit | Explanation
Scalability | Handle huge data (petabytes+) anytime
Cost-efficient | Pay only for usage
Fast data processing | Real-time analytics
Global access | Work from anywhere
High security | Data encryption & backups
Innovation-friendly | AI, IoT, big-data ready

Business Example Scenarios

Industry | Cloud Use Case
E-commerce (Flipkart, Amazon) | Customer recommendation engine
Banking & Fintech (HDFC, Paytm) | Fraud detection, secure cloud storage
Healthcare | Patient record management, AI diagnosis
Logistics (Delhivery) | GPS tracking & real-time routing
Education (BYJU’S) | Online learning content delivery

Quick Exam Answer

Cloud deployment models include Public, Private, Hybrid, Community, and Multi-cloud.
Cloud platforms like AWS, Google Cloud, and Azure offer scalable services for data storage (S3, Blob), management (RDS, BigQuery), and analytics (EMR, BigQuery, HDInsight).
They help businesses process Big Data efficiently and support AI-based decision making.