Unit 4: Transaction Processing Concept
Transaction Processing Concepts (DBMS)
A transaction is a logical unit of work that consists of one or more database operations.
Example
Transfer ₹1000 from Account A to Account B:
- Debit A
- Credit B
Both operations together form one transaction.
Characteristics of a Transaction (ACID Properties)
| Property | Meaning |
|---|---|
| Atomicity | Transaction is all-or-nothing |
| Consistency | Database moves from one valid state to another |
| Isolation | Transactions do not interfere with each other |
| Durability | Changes are permanent after commit |
Transaction States
| State | Description |
|---|---|
| Active | Transaction is executing |
| Partially Committed | Last statement executed |
| Committed | Transaction completed successfully |
| Failed | Error occurred |
| Aborted | Transaction rolled back |
Serializability
Serializability ensures that the result of concurrent transactions is equivalent to some serial execution.
Serial Schedule
- Transactions execute one after another
- No interleaving
Non-Serial Schedule
- Operations are interleaved
- Improves performance but may cause inconsistency
Testing of Serializability
Serializability is tested using:
- Conflict Serializability
- View Serializability
Serializability of Schedules
A schedule is the order in which operations of multiple transactions are executed.
Types of Schedules
| Schedule Type | Description |
|---|---|
| Serial Schedule | One transaction completes before another starts |
| Non-Serial Schedule | Operations are interleaved |
| Serializable Schedule | Equivalent to some serial schedule |
Conflict Serializable Schedule
Conflicting Operations
Two operations conflict if:
- They belong to different transactions
- They operate on the same data item
- At least one is a write operation
Examples:
- Read–Write
- Write–Read
- Write–Write
(Read–Read does NOT conflict)
Conflict Serializability Test (Precedence Graph)
Steps
-
Create a node for each transaction
-
Draw an edge Ti → Tj if:
-
Ti’s conflicting operation occurs before Tj’s
-
-
Check for cycles
Rule
- No cycle → Conflict Serializable
- Cycle exists → Not Conflict Serializable
View Serializable Schedule
A schedule is view serializable if it produces the same result as a serial schedule based on:
- Initial read
- Final write
- Read-from relationship
Characteristics
- More general than conflict serializability
- Difficult to test
- All conflict-serializable schedules are view-serializable
Difference: Conflict vs View Serializability
| Basis | Conflict Serializability | View Serializability |
|---|---|---|
| Test | Precedence graph | Complex logical test |
| Strictness | More strict | Less strict |
| Practical use | Commonly used | Rare |
Recoverability
A schedule is recoverable if a transaction commits only after the transactions whose data it read have committed.
Example
- T1 writes X
- T2 reads X
- T2 commits before T1 → Not Recoverable
Types of Schedules Based on Recoverability
| Type | Description |
|---|---|
| Recoverable | Commit order is safe |
| Cascadeless | No transaction reads uncommitted data |
| Strict | Neither read nor write uncommitted data |
Strict ⊂ Cascadeless ⊂ Recoverable
Recovery from Transaction Failures
Failures can occur due to:
- System crash
- Power failure
- Software error
- Transaction error
Recovery Goal
-
Maintain atomicity and durability
Recovery Actions
| Action | Meaning |
|---|---|
| UNDO | Roll back transaction |
| REDO | Reapply committed transaction |
Log-Based Recovery
A log records all database modifications.
Log Record Format
-
<Ti, X, old_value, new_value>
Write-Ahead Logging (WAL) Rule
- Log record must be written before data is written to disk
- Commit record written before commit
Recovery Using Logs
- UNDO all uncommitted transactions
- REDO all committed transactions
Checkpoints
A checkpoint is a mechanism to reduce recovery time.
What Happens at Checkpoint
- All log records written to disk
- Database buffers flushed
- Checkpoint entry written in log
Advantages of Checkpoints
- Faster recovery
- Reduced log scanning
- Improved performance
Deadlock Handling
A deadlock occurs when two or more transactions wait indefinitely for each other to release resources.
Deadlock Conditions
- Mutual exclusion
- Hold and wait
- No preemption
- Circular wait
Deadlock Handling Techniques
1. Deadlock Prevention
Ensures deadlock conditions never occur.
Techniques:
- Wait-Die scheme
- Wound-Wait scheme
2. Deadlock Avoidance
System avoids unsafe states.
-
Banker’s Algorithm (conceptual)
3. Deadlock Detection
- Use Wait-for Graph
- Detect cycles
4. Deadlock Recovery
- Abort one or more transactions
- Rollback transactions
Deadlock Handling Comparison
| Method | Key Idea |
|---|---|
| Prevention | Disallow conditions |
| Avoidance | Predict safe states |
| Detection | Allow & detect |
| Recovery | Rollback transactions |
Exam-Focused Summary (MCA)
- Transaction is a unit of work
- ACID properties ensure reliability
- Serializability ensures correctness
- Conflict serializability uses precedence graph
- View serializability is more general
- Recoverability ensures safe commits
- Log-based recovery uses UNDO and REDO
- Checkpoints reduce recovery cost
- Deadlock handling ensures smooth concurrency
Distributed Database System (DDBMS)
A Distributed Database System is a collection of multiple logically related databases distributed over different network sites, connected through a communication network, and managed by a Distributed DBMS (DDBMS).
Key Objectives
- Data sharing across locations
- Improved reliability and availability
- Better performance through parallelism
- Local autonomy
Distributed Data Storage
Distributed data storage refers to how data is stored across multiple sites in a distributed database environment.
Goals of Distributed Data Storage
- Reduce data access time
- Improve system availability
- Increase reliability
- Support scalability
Data Fragmentation
Fragmentation divides a database into smaller parts called fragments, which are stored at different sites.
Types of Fragmentation
| Type | Description | Example |
|---|---|---|
| Horizontal Fragmentation | Divides rows | STUDENT where Course = 'MCA' |
| Vertical Fragmentation | Divides columns | STUDENT(RollNo, Name) |
| Hybrid Fragmentation | Combination of both | Horizontal + Vertical |
Data Replication
Replication means storing copies of data at multiple sites.
Types of Replication
| Type | Description |
|---|---|
| Full Replication | Entire database at each site |
| Partial Replication | Only selected fragments replicated |
| No Replication | Each fragment stored at one site only |
Advantages
- High availability
- Faster data access
- Fault tolerance
Disadvantages
- Update overhead
- Consistency maintenance
Data Allocation
Data allocation decides where fragments and replicas are stored.
Allocation Strategies
- Centralized allocation
- Distributed allocation
- Replicated allocation
Concurrency Control in Distributed Databases
Concurrency control ensures that simultaneous execution of transactions across multiple sites maintains database consistency.
Challenges in Distributed Concurrency Control
- Network delays
- Partial failures
- Lack of global clock
- Data replication issues
Distributed Locking
Similar to centralized locking but applied across sites.
Types of Locks
- Shared (Read) Lock
- Exclusive (Write) Lock
Problems
- Deadlocks across sites
- Communication overhead
Two-Phase Locking (2PL)
A transaction follows two phases:
- Growing Phase – acquire locks
- Shrinking Phase – release locks
Distributed 2PL
- Locks managed across sites
- Ensures serializability
- May cause distributed deadlocks
Timestamp-Based Concurrency Control
- Each transaction is assigned a unique timestamp
- Older transactions get priority
Advantages
- No deadlocks
- Simple logic
Disadvantages
- Transaction rollbacks
- Starvation possible
Optimistic Concurrency Control
- Transactions execute without locking
- Validation done at commit time
Phases
- Read Phase
- Validation Phase
- Write Phase
Comparison of Concurrency Control Methods
| Method | Deadlock | Overhead |
|---|---|---|
| Lock-Based | Possible | High |
| Timestamp-Based | No | Medium |
| Optimistic | No | Low |
Directory System in Distributed Databases
A Directory System keeps information about:
- Data location
- Fragment details
- Replicas
- Global schema
It helps users and DBMS to locate data transparently.
Functions of Directory System
- Maintain global schema
- Track fragments and replicas
- Support query processing
- Ensure location transparency
Types of Directory Systems
1. Centralized Directory
-
Single site maintains directory
Advantages
-
Simple design
Disadvantages
- Single point of failure
- Performance bottleneck
2. Distributed Directory
-
Directory information distributed across sites
Advantages
- No single point of failure
- Better performance
Disadvantages
- Complexity
- Synchronization issues
3. Replicated Directory
-
Directory replicated at multiple sites
Advantages
- High availability
- Fast access
Disadvantages
-
Update overhead
Comparison of Directory Systems
| Type | Reliability | Complexity |
|---|---|---|
| Centralized | Low | Low |
| Distributed | High | High |
| Replicated | Very High | Medium |
Transparency in Distributed Databases
The directory system supports different types of transparency:
- Location Transparency – User need not know data location
- Replication Transparency – User unaware of replicas
- Fragmentation Transparency – User sees logical database
Exam-Focused Summary (MCA)
- Distributed databases store data across multiple sites
- Data storage uses fragmentation, replication, and allocation
- Concurrency control maintains consistency in parallel transactions
- Distributed locking and 2PL ensure serializability
- Timestamp and optimistic methods reduce deadlocks
- Directory system manages metadata and data location
- Replicated directory provides best availability