Data Warehouse

Overview of Data Warehousing

In today’s world, organizations generate a huge amount of data every day from different sources such as sales systems, banking software, websites, mobile apps, and ERP systems. This data is scattered, unorganized, and stored in different formats.

Data Warehousing is a technique used to collect, store, integrate, and manage large volumes of data from multiple sources in one central place so that meaningful analysis and decision-making can be done.

Simply put: Operational systems run the business, while data warehouses help analyze the business.

Definition of Data Warehouse

A Data Warehouse is a centralized repository that stores integrated, historical, and subject-oriented data to support decision-making and reporting.

Key Characteristics

Feature	Meaning (Simple Language)
Subject-oriented	Data is organized by topics like sales, customers, finance
Integrated	Data comes from multiple sources and is made consistent
Time-variant	Stores historical data (years/months/days)
Non-volatile	Data is not frequently changed or deleted

Data Warehousing Components

A data warehouse system is made up of several important components that work together.

Major Components Explained

Component	Description
Data Sources	OLTP systems, databases, ERP, CRM, files, APIs
ETL Tools	Extract, Transform, Load data into the warehouse
Staging Area	Temporary storage before data is cleaned
Data Warehouse Database	Central storage of processed data
Metadata Repository	Stores data definitions and rules
OLAP Tools	Used for analysis and reporting
Front-End Tools	Dashboards, reports, data visualization

Building a Data Warehouse (Step-by-Step)

Building a data warehouse is a systematic process.

Step 1: Requirement Analysis

Identify business needs
Decide what data is required
Understand reporting requirements

Step 2: Data Source Identification

Identify operational databases
Understand data formats and structures

Step 3: ETL Process

Extract data from sources
Transform data (clean, remove duplicates, standardize)
Load data into the warehouse

Step 4: Warehouse Design

Choose schema (Star or Snowflake)
Design fact tables and dimension tables

Step 5: Data Storage

Store historical and summarized data
Optimize for fast querying

Step 6: Testing and Deployment

Validate data accuracy
Deploy warehouse for users

Warehouse Database

The Warehouse Database is the core of the data warehouse where all cleaned and integrated data is stored.

Types of Tables Used

Table Type	Purpose
Fact Table	Stores quantitative data (sales amount, quantity)
Dimension Table	Stores descriptive data (customer, time, product)
Aggregate Table	Stores summarized data for faster queries

Schema Types

Schema	Description
Star Schema	Simple design, one fact table connected to dimensions
Snowflake Schema	Normalized dimensions, complex structure
Galaxy Schema	Multiple fact tables

Mapping the Data Warehouse to a Multiprocessor Architecture

To handle large volumes of data efficiently, data warehouses are mapped to multiprocessor systems.

Why Multiprocessor Architecture?

Faster query processing
High scalability
Better performance for large datasets

Types of Multiprocessor Architectures

Architecture	Description
Shared Memory	All processors share the same memory
Shared Disk	Each processor has its own memory but shares disk
Shared Nothing	Each processor has its own memory and disk

How Mapping Works

Data is partitioned across processors
Queries are executed in parallel
Results are combined for final output

Advantages of Data Warehousing

Benefit	Explanation
Better decision-making	Uses historical and integrated data
Faster queries	Optimized for analysis
Data consistency	Single version of truth
Business intelligence	Supports forecasting and trends

Real-Life Example

Example: Retail Company

Sales data from multiple branches
Customer data from CRM
Inventory data from ERP

All data is stored in a data warehouse
Managers analyze sales trends, customer behavior, and inventory needs

Summary (Exam-Friendly)

Data Warehousing stores large historical data
Helps in analysis, not daily transactions
Uses ETL process
Organized using fact and dimension tables
Supports parallel processing using multiprocessor architecture

Difference Between Database System and Data Warehouse

Multidimensional Data Model, Data Cubes, Star, Snowflake, Fact Constellation Concepts

Difference Between Database System and Data Warehouse

This is a very important MCA exam question and is often asked in table form.

Basis	Database System (OLTP)	Data Warehouse (OLAP)
Purpose	Daily operations	Analysis & decision making
Data Type	Current, real-time data	Historical data
Data Size	Small to medium	Very large
Updates	Frequent insert, update, delete	Rarely updated
Queries	Simple queries	Complex analytical queries
Users	Clerks, staff	Managers, analysts
Example	Banking transaction	Sales trend analysis
Performance	Optimized for speed of transactions	Optimized for reporting
Data Structure	Highly normalized	Denormalized

In simple words:

Database = runs the business
Data Warehouse = analyzes the business

Multidimensional Data Model

The Multidimensional Data Model is used in data warehouses to represent data in multiple dimensions (views).

Key Elements

Element	Meaning
Fact	Numeric data (sales, profit)
Dimension	Descriptive data (time, product, location)
Measure	Values stored in fact table

Example

Sales can be analyzed by:

Time (Year, Month)
Product
Region

Data Cubes

A Data Cube is a logical representation of multidimensional data.

Explanation in Simple Language

Think of a cube where:

X-axis = Product
Y-axis = Location
Z-axis = Time

Each cell shows sales value.

Operations on Data Cube

Operation	Meaning
Roll-up	Summarize data
Drill-down	View detailed data
Slice	Select one dimension
Dice	Select multiple dimensions
Pivot	Rotate cube view

Star Schema Concept

The Star Schema is the simplest and most commonly used schema in data warehouses.

Structure

One Fact Table in the center
Multiple Dimension Tables around it
Shape looks like a ⭐ star

Advantages

Advantage	Reason
Simple design	Easy to understand
Fast queries	Fewer joins
Easy maintenance	Simple structure

Snowflake Schema Concept

The Snowflake Schema is an extension of the star schema with normalized dimensions.

Key Features

Feature	Description
Normalized	Dimension tables are split
Complex	More joins required
Storage	Less redundancy

Star vs Snowflake (Quick View)

Basis	Star Schema	Snowflake Schema
Complexity	Low	High
Joins	Fewer	More
Storage	More	Less

Fact Constellation (Galaxy Schema)

A Fact Constellation schema contains multiple fact tables sharing common dimension tables.

Example

Fact Tables: Sales, Shipping
Shared Dimensions: Time, Product, Location

Characteristics

Feature	Description
Fact Tables	More than one
Complexity	High
Usage	Large enterprises
Flexibility	Very high

Concept Summary (One-Look Revision Table)

Concept	Meaning
Database System	Handles daily transactions
Data Warehouse	Supports analysis
Multidimensional Model	Data viewed from many angles
Data Cube	Logical multidimensional structure
Star Schema	Simple fact-dimension design
Snowflake Schema	Normalized dimension design
Fact Constellation	Multiple fact tables

Exam-Friendly Conclusion

Data Warehouse is read-oriented
Uses multidimensional data model
Data cubes improve analytical queries
Star schema is simplest
Snowflake reduces redundancy
Fact constellation supports complex analysis

Data Warehouse

Overview of Data Warehousing

Definition of Data Warehouse

Key Characteristics

Data Warehousing Components

Major Components Explained

Building a Data Warehouse (Step-by-Step)

Step 1: Requirement Analysis

Step 2: Data Source Identification

Step 3: ETL Process

Step 4: Warehouse Design

Step 5: Data Storage

Step 6: Testing and Deployment

Warehouse Database

Types of Tables Used

Schema Types

Mapping the Data Warehouse to a Multiprocessor Architecture

Why Multiprocessor Architecture?

Types of Multiprocessor Architectures

How Mapping Works

Advantages of Data Warehousing

Real-Life Example

Summary (Exam-Friendly)

Difference Between Database System and Data Warehouse

Multidimensional Data Model, Data Cubes, Star, Snowflake, Fact Constellation Concepts

Difference Between Database System and Data Warehouse

Multidimensional Data Model

Key Elements

Example

Data Cubes

Explanation in Simple Language

Operations on Data Cube

Star Schema Concept

Structure

Advantages

Snowflake Schema Concept

Key Features

Star vs Snowflake (Quick View)

Fact Constellation (Galaxy Schema)

Example

Characteristics

Concept Summary (One-Look Revision Table)

Exam-Friendly Conclusion

You might like