Transaction Processing Concept

Transaction Processing Concepts (DBMS)

A transaction is a logical unit of work that consists of one or more database operations.

Example

Transfer ₹1000 from Account A to Account B:

Debit A
Credit B

Both operations together form one transaction.

Characteristics of a Transaction (ACID Properties)

Property	Meaning
Atomicity	Transaction is all-or-nothing
Consistency	Database moves from one valid state to another
Isolation	Transactions do not interfere with each other
Durability	Changes are permanent after commit

Transaction States

State	Description
Active	Transaction is executing
Partially Committed	Last statement executed
Committed	Transaction completed successfully
Failed	Error occurred
Aborted	Transaction rolled back

Serializability

Serializability ensures that the result of concurrent transactions is equivalent to some serial execution.

Serial Schedule

Transactions execute one after another
No interleaving

Non-Serial Schedule

Operations are interleaved
Improves performance but may cause inconsistency

Testing of Serializability

Serializability is tested using:

Conflict Serializability
View Serializability

Serializability of Schedules

A schedule is the order in which operations of multiple transactions are executed.

Types of Schedules

Schedule Type	Description
Serial Schedule	One transaction completes before another starts
Non-Serial Schedule	Operations are interleaved
Serializable Schedule	Equivalent to some serial schedule

Conflict Serializable Schedule

Conflicting Operations

Two operations conflict if:

They belong to different transactions
They operate on the same data item
At least one is a write operation

Examples:

Read–Write
Write–Read
Write–Write

(Read–Read does NOT conflict)

Conflict Serializability Test (Precedence Graph)

Steps

Create a node for each transaction
Draw an edge Ti → Tj if:
- Ti’s conflicting operation occurs before Tj’s
Check for cycles

Rule

No cycle → Conflict Serializable
Cycle exists → Not Conflict Serializable

View Serializable Schedule

A schedule is view serializable if it produces the same result as a serial schedule based on:

Initial read
Final write
Read-from relationship

Characteristics

More general than conflict serializability
Difficult to test
All conflict-serializable schedules are view-serializable

Difference: Conflict vs View Serializability

Basis	Conflict Serializability	View Serializability
Test	Precedence graph	Complex logical test
Strictness	More strict	Less strict
Practical use	Commonly used	Rare

Recoverability

A schedule is recoverable if a transaction commits only after the transactions whose data it read have committed.

Example

T1 writes X
T2 reads X
T2 commits before T1 → Not Recoverable

Types of Schedules Based on Recoverability

Type	Description
Recoverable	Commit order is safe
Cascadeless	No transaction reads uncommitted data
Strict	Neither read nor write uncommitted data

Strict ⊂ Cascadeless ⊂ Recoverable

Recovery from Transaction Failures

Failures can occur due to:

System crash
Power failure
Software error
Transaction error

Recovery Goal

Maintain atomicity and durability

Recovery Actions

Action	Meaning
UNDO	Roll back transaction
REDO	Reapply committed transaction

Log-Based Recovery

A log records all database modifications.

Log Record Format

<Ti, X, old_value, new_value>

Write-Ahead Logging (WAL) Rule

Log record must be written before data is written to disk
Commit record written before commit

Recovery Using Logs

UNDO all uncommitted transactions
REDO all committed transactions

Checkpoints

A checkpoint is a mechanism to reduce recovery time.

What Happens at Checkpoint

All log records written to disk
Database buffers flushed
Checkpoint entry written in log

Advantages of Checkpoints

Faster recovery
Reduced log scanning
Improved performance

Deadlock Handling

A deadlock occurs when two or more transactions wait indefinitely for each other to release resources.

Deadlock Conditions

Mutual exclusion
Hold and wait
No preemption
Circular wait

Deadlock Handling Techniques

1. Deadlock Prevention

Ensures deadlock conditions never occur.

Techniques:

Wait-Die scheme
Wound-Wait scheme

2. Deadlock Avoidance

System avoids unsafe states.

Banker’s Algorithm (conceptual)

3. Deadlock Detection

Use Wait-for Graph
Detect cycles

4. Deadlock Recovery

Abort one or more transactions
Rollback transactions

Deadlock Handling Comparison

Method	Key Idea
Prevention	Disallow conditions
Avoidance	Predict safe states
Detection	Allow & detect
Recovery	Rollback transactions

Exam-Focused Summary (MCA)

Transaction is a unit of work
ACID properties ensure reliability
Serializability ensures correctness
Conflict serializability uses precedence graph
View serializability is more general
Recoverability ensures safe commits
Log-based recovery uses UNDO and REDO
Checkpoints reduce recovery cost
Deadlock handling ensures smooth concurrency

Distributed Database System (DDBMS)

A Distributed Database System is a collection of multiple logically related databases distributed over different network sites, connected through a communication network, and managed by a Distributed DBMS (DDBMS).

Key Objectives

Data sharing across locations
Improved reliability and availability
Better performance through parallelism
Local autonomy

Distributed Data Storage

Distributed data storage refers to how data is stored across multiple sites in a distributed database environment.

Goals of Distributed Data Storage

Reduce data access time
Improve system availability
Increase reliability
Support scalability

Data Fragmentation

Fragmentation divides a database into smaller parts called fragments, which are stored at different sites.

Types of Fragmentation

Type	Description	Example
Horizontal Fragmentation	Divides rows	STUDENT where Course = 'MCA'
Vertical Fragmentation	Divides columns	STUDENT(RollNo, Name)
Hybrid Fragmentation	Combination of both	Horizontal + Vertical

Data Replication

Replication means storing copies of data at multiple sites.

Types of Replication

Type	Description
Full Replication	Entire database at each site
Partial Replication	Only selected fragments replicated
No Replication	Each fragment stored at one site only

Advantages

High availability
Faster data access
Fault tolerance

Disadvantages

Update overhead
Consistency maintenance

Data Allocation

Data allocation decides where fragments and replicas are stored.

Allocation Strategies

Centralized allocation
Distributed allocation
Replicated allocation

Concurrency Control in Distributed Databases

Concurrency control ensures that simultaneous execution of transactions across multiple sites maintains database consistency.

Challenges in Distributed Concurrency Control

Network delays
Partial failures
Lack of global clock
Data replication issues

Distributed Locking

Similar to centralized locking but applied across sites.

Types of Locks

Shared (Read) Lock
Exclusive (Write) Lock

Problems

Deadlocks across sites
Communication overhead

Two-Phase Locking (2PL)

A transaction follows two phases:

Growing Phase – acquire locks
Shrinking Phase – release locks

Distributed 2PL

Locks managed across sites
Ensures serializability
May cause distributed deadlocks

Timestamp-Based Concurrency Control

Each transaction is assigned a unique timestamp
Older transactions get priority

Advantages

No deadlocks
Simple logic

Disadvantages

Transaction rollbacks
Starvation possible

Optimistic Concurrency Control

Transactions execute without locking
Validation done at commit time

Phases

Read Phase
Validation Phase
Write Phase

Comparison of Concurrency Control Methods

Method	Deadlock	Overhead
Lock-Based	Possible	High
Timestamp-Based	No	Medium
Optimistic	No	Low

Directory System in Distributed Databases

A Directory System keeps information about:

Data location
Fragment details
Replicas
Global schema

It helps users and DBMS to locate data transparently.

Functions of Directory System

Maintain global schema
Track fragments and replicas
Support query processing
Ensure location transparency

Types of Directory Systems

1. Centralized Directory

Single site maintains directory

Advantages

Simple design

Disadvantages

Single point of failure
Performance bottleneck

2. Distributed Directory

Directory information distributed across sites

Advantages

No single point of failure
Better performance

Disadvantages

Complexity
Synchronization issues

3. Replicated Directory

Directory replicated at multiple sites

Advantages

High availability
Fast access

Disadvantages

Update overhead

Comparison of Directory Systems

Type	Reliability	Complexity
Centralized	Low	Low
Distributed	High	High
Replicated	Very High	Medium

Transparency in Distributed Databases

The directory system supports different types of transparency:

Location Transparency – User need not know data location
Replication Transparency – User unaware of replicas
Fragmentation Transparency – User sees logical database

Exam-Focused Summary (MCA)

Distributed databases store data across multiple sites
Data storage uses fragmentation, replication, and allocation
Concurrency control maintains consistency in parallel transactions
Distributed locking and 2PL ensure serializability
Timestamp and optimistic methods reduce deadlocks
Directory system manages metadata and data location
Replicated directory provides best availability

Transaction Processing Concept