Unit 4: Transaction Processing Concept



Transaction Processing Concepts (DBMS)

A transaction is a logical unit of work that consists of one or more database operations.

Example

Transfer ₹1000 from Account A to Account B:

  1. Debit A
  2. Credit B

Both operations together form one transaction.

Characteristics of a Transaction (ACID Properties)

PropertyMeaning
AtomicityTransaction is all-or-nothing
ConsistencyDatabase moves from one valid state to another
IsolationTransactions do not interfere with each other
DurabilityChanges are permanent after commit

Transaction States

StateDescription
ActiveTransaction is executing
Partially CommittedLast statement executed
CommittedTransaction completed successfully
FailedError occurred
AbortedTransaction rolled back

Serializability

Serializability ensures that the result of concurrent transactions is equivalent to some serial execution.

Serial Schedule

  • Transactions execute one after another
  • No interleaving

Non-Serial Schedule

  • Operations are interleaved
  • Improves performance but may cause inconsistency

Testing of Serializability

Serializability is tested using:

  1. Conflict Serializability
  2. View Serializability

Serializability of Schedules

A schedule is the order in which operations of multiple transactions are executed.

Types of Schedules

Schedule TypeDescription
Serial ScheduleOne transaction completes before another starts
Non-Serial ScheduleOperations are interleaved
Serializable ScheduleEquivalent to some serial schedule

Conflict Serializable Schedule

Conflicting Operations

Two operations conflict if:

  • They belong to different transactions
  • They operate on the same data item
  • At least one is a write operation

Examples:

  • Read–Write
  • Write–Read
  • Write–Write

(Read–Read does NOT conflict)

Conflict Serializability Test (Precedence Graph)

Steps

  1. Create a node for each transaction

  2. Draw an edge Ti → Tj if:

    • Ti’s conflicting operation occurs before Tj’s

  3. Check for cycles

Rule

  • No cycle → Conflict Serializable
  • Cycle exists → Not Conflict Serializable

View Serializable Schedule

A schedule is view serializable if it produces the same result as a serial schedule based on:

  1. Initial read
  2. Final write
  3. Read-from relationship

Characteristics

  • More general than conflict serializability
  • Difficult to test
  • All conflict-serializable schedules are view-serializable

Difference: Conflict vs View Serializability

BasisConflict SerializabilityView Serializability
TestPrecedence graphComplex logical test
StrictnessMore strictLess strict
Practical useCommonly usedRare

Recoverability

A schedule is recoverable if a transaction commits only after the transactions whose data it read have committed.

Example

  • T1 writes X
  • T2 reads X
  • T2 commits before T1 → Not Recoverable

Types of Schedules Based on Recoverability

TypeDescription
RecoverableCommit order is safe
CascadelessNo transaction reads uncommitted data
StrictNeither read nor write uncommitted data

Strict ⊂ Cascadeless ⊂ Recoverable

Recovery from Transaction Failures

Failures can occur due to:

  • System crash
  • Power failure
  • Software error
  • Transaction error

Recovery Goal

  • Maintain atomicity and durability

Recovery Actions

ActionMeaning
UNDORoll back transaction
REDOReapply committed transaction

Log-Based Recovery

A log records all database modifications.

Log Record Format

  • <Ti, X, old_value, new_value>

Write-Ahead Logging (WAL) Rule

  1. Log record must be written before data is written to disk
  2. Commit record written before commit

Recovery Using Logs

  • UNDO all uncommitted transactions
  • REDO all committed transactions

Checkpoints

A checkpoint is a mechanism to reduce recovery time.

What Happens at Checkpoint

  • All log records written to disk
  • Database buffers flushed
  • Checkpoint entry written in log

Advantages of Checkpoints

  • Faster recovery
  • Reduced log scanning
  • Improved performance

Deadlock Handling

A deadlock occurs when two or more transactions wait indefinitely for each other to release resources.

Deadlock Conditions

  1. Mutual exclusion
  2. Hold and wait
  3. No preemption
  4. Circular wait

Deadlock Handling Techniques

1. Deadlock Prevention

Ensures deadlock conditions never occur.

Techniques:

  • Wait-Die scheme
  • Wound-Wait scheme

2. Deadlock Avoidance

System avoids unsafe states.

  • Banker’s Algorithm (conceptual)

3. Deadlock Detection

  • Use Wait-for Graph
  • Detect cycles

4. Deadlock Recovery

  • Abort one or more transactions
  • Rollback transactions

Deadlock Handling Comparison

MethodKey Idea
PreventionDisallow conditions
AvoidancePredict safe states
DetectionAllow & detect
RecoveryRollback transactions

Exam-Focused Summary (MCA)

  • Transaction is a unit of work
  • ACID properties ensure reliability
  • Serializability ensures correctness
  • Conflict serializability uses precedence graph
  • View serializability is more general
  • Recoverability ensures safe commits
  • Log-based recovery uses UNDO and REDO
  • Checkpoints reduce recovery cost
  • Deadlock handling ensures smooth concurrency

Distributed Database System (DDBMS)

A Distributed Database System is a collection of multiple logically related databases distributed over different network sites, connected through a communication network, and managed by a Distributed DBMS (DDBMS).

Key Objectives

  • Data sharing across locations
  • Improved reliability and availability
  • Better performance through parallelism
  • Local autonomy

Distributed Data Storage

Distributed data storage refers to how data is stored across multiple sites in a distributed database environment.

Goals of Distributed Data Storage

  • Reduce data access time
  • Improve system availability
  • Increase reliability
  • Support scalability

Data Fragmentation

Fragmentation divides a database into smaller parts called fragments, which are stored at different sites.

Types of Fragmentation

TypeDescriptionExample
Horizontal FragmentationDivides rowsSTUDENT where Course = 'MCA'
Vertical FragmentationDivides columnsSTUDENT(RollNo, Name)
Hybrid FragmentationCombination of bothHorizontal + Vertical

Data Replication

Replication means storing copies of data at multiple sites.

Types of Replication

TypeDescription
Full ReplicationEntire database at each site
Partial ReplicationOnly selected fragments replicated
No ReplicationEach fragment stored at one site only

Advantages

  • High availability
  • Faster data access
  • Fault tolerance

Disadvantages

  • Update overhead
  • Consistency maintenance

Data Allocation

Data allocation decides where fragments and replicas are stored.

Allocation Strategies

  • Centralized allocation
  • Distributed allocation
  • Replicated allocation

Concurrency Control in Distributed Databases

Concurrency control ensures that simultaneous execution of transactions across multiple sites maintains database consistency.

Challenges in Distributed Concurrency Control

  • Network delays
  • Partial failures
  • Lack of global clock
  • Data replication issues

Distributed Locking

Similar to centralized locking but applied across sites.

Types of Locks

  • Shared (Read) Lock
  • Exclusive (Write) Lock

Problems

  • Deadlocks across sites
  • Communication overhead

Two-Phase Locking (2PL)

A transaction follows two phases:

  1. Growing Phase – acquire locks
  2. Shrinking Phase – release locks

Distributed 2PL

  • Locks managed across sites
  • Ensures serializability
  • May cause distributed deadlocks

Timestamp-Based Concurrency Control

  • Each transaction is assigned a unique timestamp
  • Older transactions get priority

Advantages

  • No deadlocks
  • Simple logic

Disadvantages

  • Transaction rollbacks
  • Starvation possible

Optimistic Concurrency Control

  • Transactions execute without locking
  • Validation done at commit time

Phases

  1. Read Phase
  2. Validation Phase
  3. Write Phase

Comparison of Concurrency Control Methods

MethodDeadlockOverhead
Lock-BasedPossibleHigh
Timestamp-BasedNoMedium
OptimisticNoLow

Directory System in Distributed Databases

A Directory System keeps information about:

  • Data location
  • Fragment details
  • Replicas
  • Global schema

It helps users and DBMS to locate data transparently.

Functions of Directory System

  • Maintain global schema
  • Track fragments and replicas
  • Support query processing
  • Ensure location transparency

Types of Directory Systems

1. Centralized Directory

  • Single site maintains directory

Advantages

  • Simple design

Disadvantages

  • Single point of failure
  • Performance bottleneck

2. Distributed Directory

  • Directory information distributed across sites

Advantages

  • No single point of failure
  • Better performance

Disadvantages

  • Complexity
  • Synchronization issues

3. Replicated Directory

  • Directory replicated at multiple sites

Advantages

  • High availability
  • Fast access

Disadvantages

  • Update overhead

Comparison of Directory Systems

TypeReliabilityComplexity
CentralizedLowLow
DistributedHighHigh
ReplicatedVery HighMedium

Transparency in Distributed Databases

The directory system supports different types of transparency:

  • Location Transparency – User need not know data location
  • Replication Transparency – User unaware of replicas
  • Fragmentation Transparency – User sees logical database

Exam-Focused Summary (MCA)

  • Distributed databases store data across multiple sites
  • Data storage uses fragmentation, replication, and allocation
  • Concurrency control maintains consistency in parallel transactions
  • Distributed locking and 2PL ensure serializability
  • Timestamp and optimistic methods reduce deadlocks
  • Directory system manages metadata and data location
  • Replicated directory provides best availability