Introduction to Distributed Data Processing
Distributed Data Processing means processing data across multiple computers (nodes) connected via a network.
Features
- Data stored at multiple locations
- Processing done collaboratively
- Network-based communication
Diagram: Distributed Data Processing
Node 1         Node 2         Node 3
(Data + App)   (Data + App)   (Data + App)
      \            |            /
         Network Communication
Advantages
- Faster processing
- Resource sharing
- Fault tolerance
Distributed Database System (DDBS)
A Distributed Database System is a collection of multiple, logically interrelated databases distributed over a network.
Key Characteristics
- Data is physically distributed
- Appears as a single database to users
- Controlled by a Distributed DBMS (DDBMS)
Diagram: Distributed Database
          User
            |
       DDBMS Layer
      /     |     \
  Site1   Site2   Site3
  (DB1)   (DB2)   (DB3)
Promises of Distributed Database Systems
Why is DDBMS important?
Advantages / Promises
| Feature | Description |
|---|---|
| Transparency | User sees one DB |
| Reliability | Failure of one node doesn’t stop the system |
| Scalability | Easy to add nodes |
| Performance | Parallel processing |
| Availability | Data stays accessible even when some sites are down |
Types of Transparency
| Type | Meaning |
|---|---|
| Location Transparency | User doesn’t know the data location |
| Replication Transparency | User doesn’t know that multiple copies exist |
| Fragmentation Transparency | User doesn’t know the data is split into fragments |
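Location transparency can be pictured as a catalog lookup that happens behind the user's back. The sketch below is only an illustration: `CATALOG`, `SITES`, and `read_table` are hypothetical names, and plain dictionaries stand in for remote databases.

```python
# Hypothetical catalog: table name -> site that stores it.
CATALOG = {"students": "site1", "courses": "site2"}

# Hypothetical per-site storage, standing in for remote databases.
SITES = {
    "site1": {"students": [{"id": 1, "name": "Asha"}]},
    "site2": {"courses": [{"id": 10, "title": "DBMS"}]},
}

def read_table(table):
    """Return the rows of a table; the caller never names a site."""
    site = CATALOG[table]      # the location lookup is hidden from the user
    return SITES[site][table]

# The user asks for "courses" without knowing it lives on site2.
rows = read_table("courses")
```

If the table later moves to another site, only the catalog entry changes; the call `read_table("courses")` stays the same.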
Problem Areas in DDBMS
Challenges
Problems Table
| Problem | Description |
|---|---|
| Data Consistency | Maintaining the same data across nodes |
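| Network Failure | Lost or delayed communication between sites |
| Security | Protecting data stored at and transferred between sites |
| Concurrency Control | Coordinating simultaneous access by multiple users |
| Query Optimization | Choosing an efficient execution plan across sites |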
Example Problem
If one node fails during an update → the copies of the data on different nodes may become inconsistent.
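The example problem can be reproduced with a small simulation. This is a minimal sketch, assuming a naive "write to whichever replicas are reachable" update rule; the `Replica` class and `write_all` function are hypothetical and do not represent how a real DDBMS handles failures.

```python
class Replica:
    """A hypothetical site holding one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, value):
        if not self.up:
            return False       # a failed site misses the update
        self.data[key] = value
        return True

def write_all(replicas, key, value):
    """Naive update: write to every reachable replica, skip failed ones."""
    return [r.name for r in replicas if r.write(key, value)]

replicas = [Replica("site1"), Replica("site2"), Replica("site3")]
write_all(replicas, "balance", 100)

replicas[1].up = False                          # site2 fails
updated = write_all(replicas, "balance", 80)    # only site1 and site3 see 80
replicas[1].up = True                           # site2 recovers with stale data

values = {r.name: r.data["balance"] for r in replicas}
```

After the run, `values` shows site2 still holding the old balance while the other sites hold the new one, which is exactly the inconsistency the DDBMS has to prevent or repair.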
Distributed DBMS Architecture
Architecture defines the structure and interaction of components in DDBMS.
Architectural Models for Distributed DBMS
Client-Server Architecture
Diagram
Client → Request → Server → Database
Features
- Clients send requests
- Server processes data
Peer-to-Peer Architecture
Diagram
Node1 ↔ Node2 ↔ Node3
(All equal nodes)
Features
- No central server
- Each node acts as a client & server
Multi-tier Architecture
Diagram
Client → Application Server → Database Server
Features
- Better security
- Scalable design
Comparison of Architectures
| Architecture | Advantage | Disadvantage |
|---|---|---|
| Client-Server | Simple | Server overload |
| Peer-to-Peer | Flexible | Complex |
| Multi-tier | Secure | Costly |
DDBMS Architecture
Components
Architecture Diagram
User Interface
|
Global Query Processor
|
Local Query Processor
|
Local Databases
Layers Explanation
1. Global Query Processor
- Converts user query into sub-queries
2. Local Query Processor
- Executes queries at local sites
3. Data Manager
- Handles storage & retrieval
Types of Distributed DBMS
Based on Homogeneity
| Type | Description |
|---|---|
| Homogeneous | Same DBMS |
| Heterogeneous | Different DBMS |
Based on Data Distribution
| Type | Description |
|---|---|
| Replicated | Data copies |
| Fragmented | Data split |
| Hybrid | Both |
Data Fragmentation
Breaking a database into smaller pieces (fragments).
Types
| Type | Description |
|---|---|
| Horizontal | Rows divided |
| Vertical | Columns divided |
| Hybrid | Both |
Fragmentation Diagram
Table: Students
----------------------
| ID | Name | Marks |
----------------------
Horizontal:
Site1 → ID 1–50
Site2 → ID 51–100
Vertical:
Site1 → ID, Name
Site2 → ID, Marks
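The diagram above can be sketched in code. This is only an illustration: the sample rows are made up, and `horizontal_fragment` and `vertical_fragment` are hypothetical helper names, with Python lists standing in for the tables stored at each site.

```python
# Made-up sample rows for the Students table.
students = [
    {"id": 1, "name": "Asha", "marks": 82},
    {"id": 50, "name": "Ravi", "marks": 67},
    {"id": 51, "name": "Meena", "marks": 91},
    {"id": 100, "name": "John", "marks": 74},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def vertical_fragment(rows, columns):
    """Keep only the named columns of every row."""
    return [{c: row[c] for c in columns} for row in rows]

# Horizontal: rows are divided by ID range.
site1_h = horizontal_fragment(students, lambda r: 1 <= r["id"] <= 50)
site2_h = horizontal_fragment(students, lambda r: 51 <= r["id"] <= 100)

# Vertical: columns are divided, and ID is kept in both fragments.
site1_v = vertical_fragment(students, ["id", "name"])
site2_v = vertical_fragment(students, ["id", "marks"])
```

Note that both vertical fragments keep the `id` column, so the rows can be matched up again later.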
Data Replication
Storing multiple copies of data at different locations.
Types
| Type | Description |
|---|---|
| Full | Entire DB copied |
| Partial | Some data copied |
Advantages
- High availability
- Faster access
Disadvantages
- Update complexity
- Storage cost
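The trade-off between availability and update complexity can be sketched as follows. The `placement` map and both function names are hypothetical; the example assumes one table copied to every site and another kept at a single site.

```python
# Hypothetical placement: table name -> sites holding a copy.
placement = {
    "students": ["site1", "site2", "site3"],   # copied to every site
    "marks": ["site2"],                         # stored at one site only
}

def readable_from(table, down_sites):
    """Return the sites that can still serve a read of this table."""
    return [s for s in placement[table] if s not in down_sites]

def copies_to_update(table):
    """Every copy has to be written when the table is updated."""
    return len(placement[table])

# With site2 down, students is still readable but marks is not.
students_sites = readable_from("students", {"site2"})
marks_sites = readable_from("marks", {"site2"})
```

More copies mean more sites can answer a read, but `copies_to_update` also grows, which is the update complexity and storage cost listed above.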
Combined DDBMS Working
User Query
|
Global Processor
|
Fragmentation / Replication
|
Local Sites Execution
|
Result Combined
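The workflow above can be sketched as a global function that sends the same sub-query to every site and merges the partial results. This is a simplified illustration, not a real query processor: `SITES`, `local_query`, and `global_query` are hypothetical names, the data is made up, and threads stand in for parallel execution at separate sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of Students, one per site.
SITES = {
    "site1": [{"id": 1, "name": "Asha", "marks": 82},
              {"id": 2, "name": "Ravi", "marks": 45}],
    "site2": [{"id": 51, "name": "Meena", "marks": 91},
              {"id": 52, "name": "John", "marks": 38}],
}

def local_query(site, min_marks):
    """Sub-query executed at one site against its own fragment."""
    return [row for row in SITES[site] if row["marks"] >= min_marks]

def global_query(min_marks):
    """Send the sub-query to every site in parallel and combine the results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: local_query(s, min_marks), SITES))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["id"])

passed = global_query(50)
```

The user calls only `global_query`; the split into per-site sub-queries and the merge step stay hidden.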
Important Exam Questions
Short Questions
- Define DDBMS.
- What is data fragmentation?
- What is replication?
Long Questions
- Explain the architecture of DDBMS.
- Describe the advantages and problems of DDBMS.
- Compare Client-Server and Peer-to-Peer.
Practical/Theory Mix
- Explain fragmentation with an example.
- Draw an architecture diagram of DDBMS.
- Discuss transparency in DDBMS.
Final Summary
- DDBMS = distributed + single system view
- Architectures → Client-Server, P2P, Multi-tier
- Fragmentation & Replication → core concepts
- Problems → consistency, security, network
Database Design in Distributed DBMS
Overview of Distributed Database Design
Database design in DDBMS focuses on how data is divided, stored, and managed across multiple sites.
Objectives
- Efficient data access
- High availability
- Minimum communication cost
- Data consistency
Design Process Diagram
Global Database Design
↓
Fragmentation
↓
Allocation
↓
Local Database Design
Alternative Design Strategies
These define how we design a distributed database.
Top-Down Approach
Design starts from a global schema, which is then divided into fragments.
Diagram
Global Schema
↓
Fragmentation
↓
Allocation
Advantages
- Better control
- Uniform design
- Suitable for new systems
Disadvantages
- Complex
- Time-consuming
Bottom-Up Approach
Existing databases are integrated into one distributed system.
Diagram
Local Databases
↓
Integration
↓
Global Schema
Advantages
- Easy for existing systems
- Faster implementation
Disadvantages
- Data inconsistency
- Integration issues
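Bottom-up integration can be sketched as mapping each local schema onto a shared global schema and then merging rows by key. Everything here is hypothetical: the two local databases, their column names, and the `MAPPINGS` table are invented for illustration, and real integration also has to resolve conflicting values, which this sketch ignores.

```python
# Hypothetical local databases with differing column names.
library_db = [{"student_id": 1, "full_name": "Asha"}]
exam_db = [{"sid": 1, "score": 82}]

# Hypothetical mappings: local column name -> global column name.
MAPPINGS = {
    "library": {"student_id": "id", "full_name": "name"},
    "exam": {"sid": "id", "score": "marks"},
}

def to_global(rows, mapping):
    """Rename local columns to the global schema's column names."""
    return [{mapping[col]: val for col, val in row.items()} for row in rows]

def integrate(*sources):
    """Merge rows from all sources that share the same global id."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["id"], {}).update(row)
    return list(merged.values())

global_view = integrate(
    to_global(library_db, MAPPINGS["library"]),
    to_global(exam_db, MAPPINGS["exam"]),
)
```

If two sources disagreed on a value for the same column, `update` would silently keep the last one, which is one form of the data inconsistency listed above.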
Comparison Table
| Feature | Top-Down | Bottom-Up |
|---|---|---|
| Start Point | Global schema | Local DBs |
| Use Case | New system | Existing system |
| Complexity | High | Moderate |
Distribution Design Issues
These are key challenges in distributing data.
Important Issues
- Data Distribution: Where to store data?
- Replication: How many copies of data?
- Fragmentation: How to divide data?
- Allocation: Where to place fragments?
- Transparency: Hide complexity from users
Design Issues Diagram
Data Distribution
|
Fragmentation
|
Replication
|
Allocation
Fragmentation
Breaking a database into smaller parts (fragments).
Why Fragmentation?
- Improve performance
- Reduce data transfer
- Increase parallelism
Types of Fragmentation
1. Horizontal Fragmentation
Rows are divided.
Students Table:
ID | Name | City
Site1 → City = Delhi
Site2 → City = Mumbai
2. Vertical Fragmentation
Columns are divided.
Site1 → ID, Name
Site2 → ID, Marks
3. Hybrid Fragmentation
A combination of both.
Fragmentation Comparison
| Type | Based On | Example |
|---|---|---|
| Horizontal | Rows | Region-wise data |
| Vertical | Columns | Sensitive data |
| Hybrid | Both | Complex systems |
Fragmentation Rules
- Completeness → No data loss
- Reconstruction → Can rebuild the original table
- Disjointness → No overlap between fragments (except the key column repeated in vertical fragments)
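The three rules can be checked mechanically on a small table. The sketch below uses made-up rows and hypothetical helper names: horizontal fragments are rebuilt with a union, vertical fragments with a join on the shared `id` column.

```python
# Made-up Students rows, already sorted by id.
table = [
    {"id": 1, "name": "Asha", "city": "Delhi"},
    {"id": 2, "name": "Ravi", "city": "Mumbai"},
    {"id": 3, "name": "Meena", "city": "Delhi"},
]

def reconstruct_horizontal(fragments):
    """Union of row fragments, sorted by id for comparison."""
    rows = [row for frag in fragments for row in frag]
    return sorted(rows, key=lambda r: r["id"])

def reconstruct_vertical(left, right, key="id"):
    """Join of column fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

def is_disjoint(fragments, key="id"):
    """True if no row key appears in more than one fragment."""
    keys = [row[key] for frag in fragments for row in frag]
    return len(keys) == len(set(keys))

delhi = [r for r in table if r["city"] == "Delhi"]
mumbai = [r for r in table if r["city"] == "Mumbai"]
id_name = [{"id": r["id"], "name": r["name"]} for r in table]
id_city = [{"id": r["id"], "city": r["city"]} for r in table]

complete_h = reconstruct_horizontal([delhi, mumbai]) == table   # completeness + reconstruction
disjoint_h = is_disjoint([delhi, mumbai])                       # disjointness
rebuilt_v = reconstruct_vertical(id_name, id_city) == table     # reconstruction by join
```

If a student had a city other than Delhi or Mumbai, the row would fall into no fragment and `complete_h` would be `False`, which is a completeness violation.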
Allocation
Placing fragments at different sites.
Types of Allocation
1. Centralized Allocation
All data → One site
- Simple but less reliable
2. Distributed Allocation
Fragment1 → Site1
Fragment2 → Site2
- Better performance
3. Replicated Allocation
Same data → Multiple sites
- High availability
Allocation Comparison
| Type | Advantage | Disadvantage |
|---|---|---|
| Centralized | Simple | Single point failure |
| Distributed | Fast access | Complex |
| Replicated | Reliable | High cost |
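One simple way to compare allocation choices is to count how many accesses have to cross the network. The sketch below uses a greedy rule (place each fragment where it is accessed most) as an assumed heuristic, not a standard algorithm from these notes; the access counts and function names are hypothetical.

```python
# Hypothetical access counts: fragment -> {site: number of accesses}.
access_counts = {
    "students_delhi": {"delhi": 120, "mumbai": 15},
    "students_mumbai": {"delhi": 10, "mumbai": 200},
}

def allocate(counts_by_fragment):
    """Greedy rule: place each fragment at the site that accesses it most."""
    return {frag: max(counts, key=counts.get)
            for frag, counts in counts_by_fragment.items()}

def remote_accesses(counts_by_fragment, allocation):
    """Count accesses that come from a site other than the one holding the fragment."""
    return sum(n
               for frag, counts in counts_by_fragment.items()
               for site, n in counts.items()
               if site != allocation[frag])

distributed = allocate(access_counts)
centralized = {frag: "delhi" for frag in access_counts}   # all data at one site

distributed_cost = remote_accesses(access_counts, distributed)
centralized_cost = remote_accesses(access_counts, centralized)
```

With these made-up numbers the distributed placement leaves far fewer remote accesses than putting everything in Delhi, which matches the "fast access" advantage in the table.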
Fragmentation vs Allocation
| Feature | Fragmentation | Allocation |
|---|---|---|
| Meaning | Divide data | Place data |
| Purpose | Efficiency | Availability |
| Example | Split a table | Store a fragment at a site |
Combined Design Workflow
Global Schema
↓
Fragmentation
↓
Allocation
↓
Replication
↓
Execution
Important Exam Questions
Short Questions
- Define fragmentation.
- What is allocation?
- Differentiate between horizontal and vertical fragmentation.
Long Questions
- Explain database design strategies in DDBMS.
- Discuss fragmentation types with examples.
- Explain allocation strategies.
Case-Based Question
- Design fragmentation for a student database.
- Suggest an allocation strategy for a banking system.
Final Summary
- Top-Down → Start global → divide
- Bottom-Up → Merge local DBs
- Fragmentation → split data
- Allocation → place data
- Goal → performance + availability + efficiency