Distributed DBMS Reliability & Parallel Database Systems



Distributed DBMS Reliability

Reliability Concepts

Reliability is the ability of a system to function correctly even in the presence of failures.

Key Goals

  • Continuous operation
  • Data consistency
  • Fault recovery

Reliability Metrics

Measure                            Description
Availability                       System uptime
Mean Time Between Failures (MTBF)  Average time between failures
Mean Time To Repair (MTTR)         Time to recover
Failure Rate                       Frequency of failure

Formula

Availability = MTBF / (MTBF + MTTR)
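As a quick worked example of the formula, the sketch below computes availability for one hypothetical node. The function name availability and the MTBF/MTTR figures are illustrative assumptions, not values from these notes.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical node: fails on average every 990 hours, takes 10 hours to repair.
a = availability(990, 10)
print(f"{a:.2%}")  # 99.00%
```

Shortening MTTR raises availability just as lengthening MTBF does, which is why fast recovery matters as much as failure prevention.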

Fault Tolerance in Distributed Systems

The ability of a system to continue working despite failures.

Techniques

Technique      Description
Replication    Multiple copies of data
Redundancy     Backup components
Checkpointing  Save system state
Logging        Record transactions

Fault Tolerance Diagram

Failure → Detection → Recovery → Resume Operation

Failures in Distributed DBMS

Types of Failures

Failure Types Table

Failure Type         Description
Transaction Failure  Logical error
System Crash         Hardware/software crash
Site Failure         Entire node failure
Network Failure      Communication breakdown

Example

  • Power failure at one node
  • Network disconnection

Local Reliability Protocols

Protocols that ensure reliability at the single-site level.

Techniques

  • Logging: Record all operations
  • Checkpointing: Save DB state periodically
  • Recovery Manager: Restores the system after a crash

Local Recovery Flow

Failure → Log Check → Rollback/Redo → Recovery
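The recovery flow above can be sketched as a small log-based routine: roll back writes of transactions that never committed, then redo writes of committed ones. The record layout ("WRITE", txn, key, old, new), the function name recover, and the sample data are assumptions made for illustration. A real recovery manager also uses checkpoints to avoid scanning the whole log.

```python
def recover(db: dict, log: list) -> dict:
    """Restore a consistent state from the on-disk db and the log."""
    # Log check: a transaction counts as committed only if its COMMIT record exists.
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}

    # Rollback (undo): scan backward, restoring old values of uncommitted writes.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] not in committed:
            _, _, key, old, _new = rec
            db[key] = old

    # Redo: scan forward, reapplying new values of committed writes.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, key, _old, new = rec
            db[key] = new
    return db


# Hypothetical crash: T1 committed (only partly flushed), T2 was still running.
log = [
    ("BEGIN", "T1"),
    ("WRITE", "T1", "A", 100, 50),
    ("WRITE", "T1", "B", 0, 50),
    ("COMMIT", "T1"),
    ("BEGIN", "T2"),
    ("WRITE", "T2", "B", 50, 999),
]
disk_state = {"A": 50, "B": 999}
print(recover(disk_state, log))  # {'A': 50, 'B': 50}
```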

Distributed Reliability Protocols

Protocols that ensure reliability across multiple sites.

Important Protocol

Two-Phase Commit (2PC)

2PC Process

Coordinator → Prepare

Participants → Vote (Yes/No)

Coordinator → Commit / Abort

Phases

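Phase    Description
Prepare  Coordinator asks every participant whether it can commit; each votes Yes or No
Commit   Coordinator sends the final decision: commit if all voted Yes, otherwise abort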
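The two phases can be sketched in a few lines. This is a minimal in-memory illustration, assuming hypothetical Participant and two_phase_commit names; it leaves out the logging, timeouts, and message passing that a real 2PC implementation needs.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self) -> Vote:
        # Phase 1: vote. A YES voter moves to READY and must await the decision.
        self.state = "READY" if self.can_commit else "ABORTED"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants) -> str:
    # Phase 1 (Prepare): the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (Commit): commit only if every vote is YES, otherwise abort.
    decision = "COMMIT" if all(v is Vote.YES for v in votes) else "ABORT"
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


print(two_phase_commit([Participant("s1"), Participant("s2")]))         # COMMIT
print(two_phase_commit([Participant("s1"), Participant("s2", False)]))  # ABORT
```

A single No vote is enough to abort the whole transaction, which is what keeps all sites in agreement.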

Advantages

  • Ensures consistency: either all sites commit or all sites abort

Disadvantages

  • Blocking problem: if the coordinator fails after participants vote Yes, they must wait for its decision

Site Failure

A site failure occurs when an entire node becomes unavailable.

Handling Techniques

  • Replication
  • Failover mechanisms
  • Recovery protocols

Site Failure Diagram

Site Failure → Switch to Backup Site → Continue Execution
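The switch to a backup site can be sketched as trying sites in priority order. The Site class and execute_with_failover function are hypothetical names for illustration. The sketch assumes the backup already holds a replica of the data and ignores how a failure is actually detected.

```python
class Site:
    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up

    def execute(self, query: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} ran: {query}"


def execute_with_failover(query: str, sites) -> str:
    """Try sites in priority order; the first one that responds handles the query."""
    for site in sites:
        try:
            return site.execute(query)
        except ConnectionError:
            continue  # site failure detected, switch to the next site
    raise RuntimeError("no site available")


primary = Site("primary", up=False)  # simulated site failure
backup = Site("backup")
print(execute_with_failover("SELECT 1", [primary, backup]))  # backup ran: SELECT 1
```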

Network Partitioning

The network splits into isolated groups of nodes that cannot communicate with each other.

Problem

  • Nodes in different groups cannot communicate
  • Data inconsistency if both groups keep accepting updates

Solution

  • Majority voting
  • Replication control
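Majority voting can be illustrated with a simple quorum check: a group may accept updates only if it can reach more than half of all nodes, so at most one side of a partition stays writable. The function name has_quorum and the cluster sizes are assumptions for this sketch.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """True if this group holds a strict majority of all nodes."""
    return 2 * reachable > total


# Hypothetical 5-node system split into Cluster A (3 nodes) and Cluster B (2 nodes).
total_nodes = 5
print(has_quorum(3, total_nodes))  # True: Cluster A may accept updates
print(has_quorum(2, total_nodes))  # False: Cluster B must reject updates
```

With an even number of nodes, an exact half-and-half split leaves neither side with a majority, so no updates are accepted until the partition heals.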

Partition Example

Cluster A ←X→ Cluster B
(No communication)

Parallel Database System

A database system where multiple processors execute queries simultaneously.

Goals

  • Speed up query processing
  • Improve throughput

Parallel DB Architectures

1. Shared Memory

Diagram

CPU1, CPU2 → Shared Memory → DB

Features

  • Fast communication
  • Limited scalability

2. Shared Disk

Diagram

CPU1, CPU2 → Shared Disk

Features

  • Moderate scalability

3. Shared Nothing

Node1 (CPU+Disk)
Node2 (CPU+Disk)
Node3 (CPU+Disk)

Features

  • High scalability
  • No resource sharing

Architecture Comparison

Architecture    Scalability  Complexity
Shared Memory   Low          Low
Shared Disk     Medium       Medium
Shared Nothing  High         High

Parallel Data Placement

Distributing data across multiple processors.

Techniques

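Method              Description
Round Robin         Rows are assigned to nodes in turn, giving an even distribution
Hash Partitioning   Node is chosen by a hash function applied to the partitioning key
Range Partitioning  Node is chosen by the value range the key falls into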
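The three placement methods can be sketched as functions that map a row to a node number. The function names, the use of CRC32 as the hash, and the boundary values are illustrative assumptions; real systems choose their own hash functions and range boundaries.

```python
import bisect
import zlib


def round_robin(row_index: int, n_nodes: int) -> int:
    """The i-th inserted row goes to node i mod n."""
    return row_index % n_nodes


def hash_partition(key: str, n_nodes: int) -> int:
    """Node chosen by a deterministic hash of the partitioning key."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def range_partition(value, boundaries) -> int:
    """boundaries is a sorted list of split points; n split points give n + 1 nodes."""
    return bisect.bisect_right(boundaries, value)


print(round_robin(7, 3))                  # 1
print(range_partition(150, [100, 200]))   # 1 (values in [100, 200) go to node 1)
print(hash_partition("customer-42", 4))   # some node in 0..3, same on every run
```

Round robin balances load but cannot route a lookup to one node. Hash partitioning suits exact-match lookups, and range partitioning suits range queries.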

Parallel Query Processing

Executing queries using multiple processors simultaneously.

Types

Type         Description
Inter-query  Multiple queries run in parallel with each other
Intra-query  A single query is split into parts that run in parallel

Query Parallelism

Query → Split → Execute on Nodes → Combine Results
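The split, execute, and combine steps can be sketched with a thread pool standing in for the nodes. The partition contents and the names local_sum and parallel_sum are assumptions. Python threads only illustrate the structure here; a real parallel DBMS runs each part on a separate processor or machine.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum(partition):
    """Work done on one node: filter and aggregate its own partition."""
    return sum(amount for amount in partition if amount > 0)


def parallel_sum(partitions):
    # Split: the data is already partitioned, one list per node.
    # Execute on nodes: each partition is processed by its own worker.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_sum, partitions))
    # Combine results: merge the partial aggregates into the final answer.
    return sum(partials)


# Hypothetical query: SUM(amount) WHERE amount > 0 over three partitions.
partitions = [[10, -5, 20], [30, 40], [-1, 50]]
print(parallel_sum(partitions))  # 150
```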

Load Balancing

Distributing workload evenly across nodes.

Importance

  • Avoid overload
  • Improve performance

Techniques

  • Dynamic allocation
  • Task scheduling
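One simple way to schedule tasks evenly is a greedy rule: always give the next task to the node with the least load so far. The sketch below is one such heuristic, not the only technique. The function name assign_tasks, the task costs, and the node names are hypothetical.

```python
import heapq


def assign_tasks(task_costs, node_names):
    """Greedy scheduling: largest tasks first, each to the least-loaded node."""
    heap = [(0, name) for name in node_names]  # (current load, node)
    heapq.heapify(heap)
    assignment = {name: [] for name in node_names}
    for cost in sorted(task_costs, reverse=True):
        load, name = heapq.heappop(heap)       # node with the smallest load
        assignment[name].append(cost)
        heapq.heappush(heap, (load + cost, name))
    return assignment


plan = assign_tasks([5, 3, 8, 2, 4], ["n1", "n2"])
print(plan)  # {'n1': [8, 3], 'n2': [5, 4, 2]}
print({name: sum(costs) for name, costs in plan.items()})  # {'n1': 11, 'n2': 11}
```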

Database Clusters

A group of interconnected database servers acting as a single system.

Types

Type            Description
Active-Active   All nodes are active and serve requests
Active-Passive  A backup node stays on standby and takes over if the active node fails

Cluster Diagram

Node1 ↔ Node2 ↔ Node3
|
Shared System View

Combined Workflow

User Query → Parallel Execution → Distributed Nodes → Result Merge

Important Exam Questions

Short Questions

  • Define reliability
  • What is 2PC?
  • What is a parallel DBMS?

Long Questions

  • Explain fault tolerance techniques
  • Describe distributed reliability protocols
  • Explain parallel database architectures

Case-Based Questions

  • Explain the handling of network partition
  • Compare shared memory and shared nothing
  • Explain load balancing

Final Summary

  • Reliability → system works despite failures
  • 2PC → ensures distributed consistency
  • Failures → transaction, system crash, site, network
  • Parallel DB → faster execution using multiple nodes
  • Shared Nothing → most scalable architecture