Unit 4: Database Implementation & Management




Database Implementation & Management

When we create a database, it’s not just about tables and queries —
there’s a lot going on behind the scenes to store data efficiently, process queries faster, and reduce cost/time.

Let’s understand each topic one by one 

Database Storage and Physical Structures

A Database Storage Structure refers to how data is actually stored inside a database system (on hard disk or memory). Databases organize data in a way that makes it easy to access and update quickly.

Levels of Data Storage

LevelDescriptionExample
Logical LevelHow users see data (tables, columns)SELECT * FROM Customers;
Physical LevelHow data is stored on diskData files, indexes, pages, blocks
Buffer / CacheTemporary memory area for fast accessRAM used for query results

Indexing

Imagine a book index — instead of reading every page, you check the index to find a topic quickly.
The same concept applies in databases.

Indexing is a technique to speed up data retrieval from large tables.

Example:

Without Index:
To find “EmpID = 105”, the system checks every record one by one.
With Index:
It directly jumps to the location where EmpID = 105 is stored.

Types of Indexes:

TypeDescription
Primary IndexAutomatically created on primary key column
Secondary IndexCreated on non-key columns for faster search
Clustered IndexData rows are physically arranged according to the index
Non-clustered IndexIndex created separately from the actual data table

SQL Example:

CREATE INDEX idx_name ON Employees(Name);

B-Trees (Balanced Trees)

A B-tree (Balanced Tree) is a data structure used to organize and access data efficiently in indexing.

Think of a B-tree as a hierarchical structure like a family tree:

  • Each node can store multiple keys.
  • Root node → main entry point.
  • Leaf nodes → actual data.

Why B-trees are useful:

  • Keep data sorted
  • Allow fast searching, insertion, and deletion
  • Reduce the number of disk reads

Example:

When you search for EmpID = 50:

  • The B-tree checks top (root) nodes first
  • Then moves down the branches until it finds the right leaf node.

Used in: Indexes in databases (MySQL, Oracle, etc.)

Hashing

Hashing is another way to locate data quickly. It uses a mathematical function (hash function) to convert a key value into an address.

Example:

If EmployeeID = 105,
A hash function might map it to address 10, so the data for that employee is stored at that location.

Types of Hashing:

TypeDescription
Static HashingHash table size is fixed
Dynamic HashingHash table grows/shrinks as data changes

Hashing is best for exact searches, like finding a record by ID.

Query Processing and Optimization

When you run a SQL command like:

SELECT * FROM Employees WHERE Salary > 50000;

The DBMS does several steps internally to process and optimize the query for faster results.

Steps in Query Processing

StepDescription
1. ParsingSQL query is checked for syntax errors
2. TranslationSQL is converted into a low-level query (relational algebra)
3. OptimizationDBMS chooses the best way (fastest path) to execute the query
4. ExecutionQuery runs and retrieves data from disk or memory

Query Optimization

Query Optimization means improving the performance of SQL queries —
so that results come faster and with minimum resource use (CPU, memory, I/O).

Techniques Used:

TechniqueDescription
Using IndexesAccess fewer rows by using indexed columns
Join OptimizationChoose the most efficient join method (nested loop, merge join, hash join)
Predicate PushdownFilter data early (apply WHERE condition first)
Query RewritingRewrite query in a faster equivalent form
CachingStore frequently used results in memory

Query Cost Analysis

When the database runs a query, it consumes resources
like CPU time, memory, and disk I/O (input/output operations).

Query Cost Analysis measures these to find the most efficient execution plan.

Factors Affecting Query Cost:

FactorDescription
Disk Access TimeTime to read/write data from storage
CPU UsageTime for processing and sorting data
Network CostFor distributed databases
Memory UsageAmount of RAM used for temporary data

Goal: Choose the query plan with the lowest cost (fastest execution).

Summary Table

TopicMeaningExample / Use
Database StorageHow data is stored physicallyPages, Blocks, Files
IndexingData lookup shortcutIndex on Employee Name
B-TreesHierarchical data structure for indexesUsed in MySQL indexing
HashingUses hash function to locate dataHash(EmployeeID) = memory address
Query ProcessingSteps DBMS takes to run a queryParsing → Optimization → Execution
Query OptimizationImprove speed & reduce costUse indexes, rewrite query
Query Cost AnalysisMeasure of efficiencyDisk + CPU + Memory cost

In Simple Words:

Database Implementation = How data is stored.
Query Optimization = How queries run faster.
Goal = Fast access + Less cost + Accurate results.

Database Security, Recovery & Transaction Management

Backup, Recovery, and Disaster Management

Backup

A backup is a copy of database data that can be used to restore information if something goes wrong.

Think of backup like saving a copy of your project file on Google Drive — if your laptop crashes, you can still recover your work.

Types of Backups

TypeDescriptionExample
Full BackupComplete copy of the entire databaseWeekly full database copy
Incremental BackupBacks up only the changes made after the last backupDaily small backups
Differential BackupBacks up changes made after the last full backupEvery 2–3 days backup of all recent changes

SQL Example:

BACKUP DATABASE myDB TO DISK = 'D:\Backup\myDB.bak';

Recovery

Recovery is the process of restoring the database after a failure using backup files and logs.

There are two main types of failures:

  1. System Crash – Power failure, hardware crash
  2. Transaction Failure – Error during transaction (e.g., division by zero, data conflict)

Recovery Techniques

TechniqueDescription
Roll ForwardRestore database using backup + log files to bring it to latest state
Roll BackUndo incomplete transactions using logs (like CTRL + Z for database)

Disaster Management

When something serious happens (like a fire, server crash, or cyber attack), disaster management ensures the database can be quickly restored.

Steps in Disaster Management:

  1. Regular Backups – Store copies in multiple locations (onsite + cloud)
  2. Redundancy – Use duplicate servers (mirroring/replication)
  3. Disaster Recovery Plan (DRP) – A formal plan for what to do after failure
  4. Testing – Regularly test recovery methods to ensure they work

Example: Banks keep real-time mirror servers in different cities to avoid data loss.

Concurrency Control

In a multi-user system, many users access the database at the same time.

Concurrency Control ensures that:

  • Multiple users can work together without conflicting
  • The accuracy and consistency of data are maintained

Example:

Two employees withdraw money from the same bank account at the same time.
Without control → Wrong balance
With control → Transactions handled one after another safely.

Locking

Locking prevents multiple users from changing the same data at once.

TypeDescriptionExample
Shared LockData can be read by many users but not modifiedViewing balance
Exclusive LockData can be modified by only one userUpdating salary

When a user locks a row, others must wait until it’s released.

Timestamping

Each transaction is given a unique timestamp (time of start).
The database uses these timestamps to decide the order of transactions.

CaseResult
Older transaction → executes firstAllowed
Newer one → waits or rolls backEnsures consistency

Used in systems where time order matters (like online ticket booking).

Deadlocks

A deadlock happens when two or more transactions wait for each other forever.

Example

  • Transaction 1 locks “Account A” and needs “Account B”.
  • Transaction 2 locks “Account B” and needs “Account A”.→ Both wait endlessly.

Deadlock Solutions:

  1. Deadlock Prevention: Stop conditions before they happen (give timeout or order of locking).
  2. Deadlock Detection: DBMS checks for cycles of waiting and stops one transaction.
  3. Deadlock Recovery: Abort one transaction and restart it.

Transaction Management

A transaction is a small logical unit of work —
like transferring ₹1000 from one account to another.

Example:

1. Deduct ₹1000 from Account A 2. Add ₹1000 to Account B

Both steps must complete together — if one fails, both should be cancelled.

ACID Properties

To ensure reliability, every transaction must follow ACID properties 👇

PropertyMeaningExample
A – AtomicityAll steps of a transaction must succeed or none at allMoney deducted & added both or neither
C – ConsistencyDatabase must remain valid before & after transactionTotal money before = total after
I – IsolationTransactions must not interfere with each otherTwo users withdrawing same time → no conflict
D – DurabilityOnce a transaction is committed, it remains saved even after crashSystem crash → committed data still safe

Transaction States

StateMeaning
ActiveTransaction is running
Partially CommittedAll operations done, waiting to save
CommittedSuccessfully completed
FailedError occurred
AbortedRolled back or undone

Transaction Control Commands (TCL)

CommandPurpose
COMMITSaves all changes permanently
ROLLBACKUndoes recent changes
SAVEPOINTSets a checkpoint in a transaction to roll back to

Summary Table

TopicMeaningKey Point
BackupCopy of databaseUsed for recovery
RecoveryRestoring data after failureRollback / Rollforward
Disaster ManagementPlan to restore system after major failureReplication, DRP
LockingPrevents simultaneous data modificationShared / Exclusive
TimestampingOrders transactions by timePrevents conflict
DeadlockTwo transactions waiting for each otherDetection & recovery
TransactionUnit of workMust follow ACID
ACIDReliability principlesAtomicity, Consistency, Isolation, Durability

In Short:

Backup & Recovery → Protect data
Concurrency Control → Prevent conflicts
Transaction Management → Ensure reliability

ACID → The golden rule of database safety