Distributed DBMS Reliability & Parallel Database Systems
Distributed DBMS Reliability
Reliability Concepts
Reliability is the ability of a system to function correctly even in the presence of failures.
Key Goals
- Continuous operation
- Data consistency
- Fault recovery
Reliability Metrics
| Measure | Description |
|---|---|
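| Availability | Fraction of time the system is operational |
| Mean Time Between Failures (MTBF) | Average operating time between consecutive failures |
| Mean Time To Repair (MTTR) | Average time to restore service after a failure |
| Failure Rate | Number of failures per unit time (1 / MTBF for a constant failure rate) |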
Formula
Availability = MTBF / (MTBF + MTTR)
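As a quick worked example of the formula above, the following sketch computes availability from MTBF and MTTR. The function name `availability` and the sample figures (999 hours between failures, 1 hour to repair) are assumptions chosen for illustration only.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), with both values in the same unit."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system: fails every 999 hours on average, takes 1 hour to repair.
print(availability(999, 1))  # 0.999
```

Shortening MTTR raises availability just as lengthening MTBF does, which is why fast recovery matters as much as avoiding failures.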
Fault Tolerance in Distributed Systems
The ability of a system to continue operating despite the failure of some of its components.
Techniques
| Technique | Description |
|---|---|
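| Replication | Multiple copies of data kept at different sites |
| Redundancy | Backup hardware or software components |
| Checkpointing | Periodically save system state to limit recovery work |
| Logging | Record transaction operations so they can be undone or redone |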
Fault Tolerance Diagram
Failure → Detection → Recovery → Resume Operation
Failures in Distributed DBMS
Types of Failures
Failure Types Table
| Failure Type | Description |
|---|---|
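| Transaction Failure | A transaction aborts, e.g. because of a logical error or bad input |
| System Crash | Hardware or software crash; main-memory contents are lost |
| Site Failure | An entire node becomes unavailable |
| Network Failure | Communication breakdown (lost messages or broken links) |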
Examples
- Power failure at one node (site failure)
- Network disconnection (network failure)
Local Reliability Protocols
Protocols that ensure reliability at the single-site level.
Techniques
- Logging: Record all operations
- Checkpointing: Save DB state periodically
- Recovery Manager: Restores the system after a crash
Local Recovery Flow
Failure → Log Check → Rollback/Redo → Recovery
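The recovery flow above can be sketched as a small log scan. This is a simplified illustration, not a full recovery algorithm: the log record layout (`("WRITE", txn, key, old, new)` and `("COMMIT", txn)`), the function name `recover`, and the sample data are all assumptions made for the example.

```python
def recover(db, log):
    """Return the database state after a simplified redo/undo pass over the log."""
    # Log check: find which transactions reached their commit record.
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Redo: replay writes of committed transactions in log order.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _txn, key, _old, new = rec
            db[key] = new

    # Rollback: reverse writes of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            db[key] = old
    return db


# Hypothetical crash: T1 committed but its write never reached disk;
# T2 did not commit but its write did reach disk.
log = [
    ("WRITE", "T1", "A", 100, 80),
    ("COMMIT", "T1"),
    ("WRITE", "T2", "B", 50, 70),
]
disk_state = {"A": 100, "B": 70}
print(recover(disk_state, log))  # {'A': 80, 'B': 50}
```

A checkpoint would let the scan start from the last saved state instead of the beginning of the log.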
Distributed Reliability Protocols
Ensure reliability across multiple sites.
Important Protocol
Two-Phase Commit (2PC)
2PC Process
Coordinator → Prepare
↓
Participants → Vote (Yes/No)
↓
Coordinator → Commit / Abort
Phases
| Phase | Description |
|---|---|
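| Prepare | Coordinator asks every participant whether it can commit; each votes Yes or No |
| Commit | Coordinator decides Commit only if all votes are Yes, otherwise Abort, and sends the decision to all participants |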
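Advantages
- Ensures consistency: either all sites commit or all abort
Disadvantages
- Blocking problem: a participant that has voted Yes must wait if the coordinator fails before sending its decision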
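The two phases can be sketched in a few lines of Python. This is a minimal in-memory illustration under strong assumptions: no real network, no timeouts, and no coordinator or participant crashes. The names `Participant`, `Vote` and `two_phase_commit` are hypothetical.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for local checks (locks, constraints)
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote, and move to READY if willing to commit.
        if self.can_commit:
            self.state = "READY"
            return Vote.YES
        self.state = "ABORTED"
        return Vote.NO

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    # Phase 1 (Prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (Commit/Abort): commit only if every vote is YES.
    decision = "COMMIT" if all(v is Vote.YES for v in votes) else "ABORT"
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


print(two_phase_commit([Participant("S1"), Participant("S2")]))         # COMMIT
print(two_phase_commit([Participant("S1"), Participant("S2", False)]))  # ABORT
```

In this sketch a participant in the READY state always receives a decision. In a real system, if the coordinator crashes between the two phases, READY participants cannot decide on their own, which is the blocking problem.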
Site Failure
When a complete node becomes unavailable.
Handling Techniques
- Replication
- Failover mechanisms
- Recovery protocols
Site Failure Diagram
Site Failure → Switch to Backup Site → Continue Execution
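A failover mechanism of this kind can be sketched as picking the first site that is still reachable. The function `pick_site`, the site names, and the `status` dictionary (standing in for a real health check) are assumptions for illustration.

```python
def pick_site(sites, is_up):
    """Return the first available site, in order of preference."""
    for site in sites:
        if is_up.get(site, False):
            return site
    raise RuntimeError("no site is available")


# Hypothetical situation: the primary site has failed, the backup is healthy.
status = {"primary": False, "backup": True}
print(pick_site(["primary", "backup"], status))  # backup
```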
Network Partitioning
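The network splits into isolated groups of nodes.
Problem
- Nodes in different groups cannot communicate
- Replicas updated independently in different groups can become inconsistent
Solution
- Majority voting: only the group holding a majority of the nodes continues to process updates
- Replication control: restrict which replicas may be updated during the partition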
Partition Example
Cluster A ←X→ Cluster B
(No communication)
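Majority voting reduces to a simple check: a group of nodes may continue only if it holds more than half of all nodes. A minimal sketch, assuming one vote per node and a hypothetical helper named `has_majority`:

```python
def has_majority(group_size: int, total_nodes: int) -> bool:
    """True if a partition group holds a strict majority of all nodes."""
    return group_size > total_nodes // 2


# Hypothetical 5-node system split into groups of 3 and 2.
print(has_majority(3, 5))  # True  -> this group keeps processing updates
print(has_majority(2, 5))  # False -> this group must stop accepting updates
```

Because two disjoint groups cannot both hold a strict majority, at most one side of the partition accepts updates. With an even split (for example 2 and 2 out of 4), neither side has a majority.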
Parallel Database System
A database system where multiple processors execute queries simultaneously.
Goals
- Speed up query processing
- Improve throughput
Parallel DB Architectures
1. Shared Memory
Diagram
CPU1, CPU2 → Shared Memory → DB
Features
- Fast communication
- Limited scalability
2. Shared Disk
Diagram
CPU1, CPU2 → Shared Disk
Features
- Each CPU has its own memory; all CPUs access the same disks
- Moderate scalability
3. Shared Nothing
Diagram
Node1 (CPU+Disk)
Node2 (CPU+Disk)
Node3 (CPU+Disk)
Features
- High scalability
- No shared memory or disk; nodes communicate only over the network
Architecture Comparison
| Architecture | Scalability | Complexity |
|---|---|---|
| Shared Memory | Low | Low |
| Shared Disk | Medium | Medium |
| Shared Nothing | High | High |
Parallel Data Placement
Distributing data across multiple processors.
Techniques
| Method | Description |
|---|---|
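| Round Robin | Rows are assigned to nodes in turn, giving an even distribution |
| Hash Partitioning | Node is chosen by applying a hash function to the partitioning attribute |
| Range Partitioning | Node is chosen by the value range the partitioning attribute falls into |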
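The three placement methods in the table can each be sketched as a function that maps a row to a node number. The function names, the use of CRC32 as the hash function, and the sample range boundaries are assumptions chosen for illustration.

```python
import bisect
import zlib


def round_robin(row_index: int, num_nodes: int) -> int:
    """Row i goes to node i mod n, so rows spread evenly."""
    return row_index % num_nodes


def hash_partition(key, num_nodes: int) -> int:
    """Node chosen from a hash of the key (CRC32 is stable across runs)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def range_partition(value, boundaries) -> int:
    """Node chosen by range; boundaries must be sorted.

    With boundaries [100, 200]: node 0 holds values < 100,
    node 1 holds 100 <= value < 200, node 2 holds values >= 200.
    """
    return bisect.bisect_right(boundaries, value)


print(round_robin(5, 3))                 # 2
print(range_partition(150, [100, 200]))  # 1
print(hash_partition("cust-42", 4) in range(4))  # True
```

Round robin balances load well but gives no help in locating a particular row. Hash partitioning supports exact-match lookups on the key, and range partitioning supports range queries but can become skewed if values cluster in one range.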
Parallel Query Processing
Executing queries using multiple processors simultaneously.
Types
| Type | Description |
|---|---|
| Inter-query | Different queries run in parallel with one another |
| Intra-query | A single query is split into parts that run in parallel |
Query Parallelism
Query → Split → Execute on Nodes → Combine Results
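The split, execute, combine pattern can be sketched with a parallel sum, which stands in for an aggregate query such as a SUM over a table. This is only a structural illustration: it uses threads from the standard library as stand-ins for nodes, so in CPython it shows the shape of the computation rather than a real speed-up. The names `partial_sum` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Work done independently on one node's partition."""
    return sum(partition)


def parallel_sum(values, num_workers=3):
    # Split: deal the rows out to the workers round-robin style.
    partitions = [values[i::num_workers] for i in range(num_workers)]
    # Execute on nodes: each worker aggregates its own partition.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combine results: merge the partial aggregates.
    return sum(partials)


print(parallel_sum(list(range(1, 101))))  # 5050
```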
Load Balancing
Distributing workload evenly across nodes.
Importance
- Avoid overload
- Improve performance
Techniques
- Dynamic allocation
- Task scheduling
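One simple way to schedule tasks for balance is to give each task to the node that currently has the least load. The sketch below is a greedy heuristic chosen for illustration, not the method of any particular system; the function `assign_tasks` and the task costs are assumptions.

```python
import heapq


def assign_tasks(task_costs, num_nodes):
    """Greedy balancing: largest tasks first, each to the least-loaded node."""
    heap = [(0, node) for node in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    for cost in sorted(task_costs, reverse=True):
        load, node = heapq.heappop(heap)  # node with the smallest load so far
        assignment[node].append(cost)
        heapq.heappush(heap, (load + cost, node))
    return assignment


# Hypothetical task costs spread over 2 nodes.
print(assign_tasks([5, 3, 8, 2, 4], 2))  # {0: [8, 3], 1: [5, 4, 2]}
```

Here both nodes end up with a total load of 11. The greedy rule does not always find a perfect balance, but it avoids putting all the heavy tasks on one node.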
Database Clusters
A group of interconnected database servers (nodes) acting as a single system.
Types
| Type | Description |
|---|---|
| Active-Active | All nodes serve requests at the same time |
| Active-Passive | A standby (backup) node takes over only when the active node fails |
Cluster Diagram
Node1 ↔ Node2 ↔ Node3
|
Shared System View
Combined Workflow
User Query
↓
Parallel Execution
↓
Distributed Nodes
↓
Result Merge
Important Exam Questions
Short Questions
- Define reliability
- What is 2PC?
- What is a parallel DBMS?
Long Questions
- Explain fault tolerance techniques
- Describe distributed reliability protocols
- Explain parallel database architectures
Case-Based Questions
- Explain the handling of network partition
- Compare shared memory and shared nothing
- Explain load balancing
Final Summary
- Reliability → system works despite failures
- 2PC → ensures distributed consistency
- Failures → transaction, system, site, network
- Parallel DB → faster execution using multiple nodes
- Shared Nothing → most scalable architecture