Unit 2: Data Modelling & Database Design
What is Data Modelling?
Data modelling is the process of visually designing how data will be stored, connected, and accessed in a database.
In simple words
Data modelling is like creating a blueprint before building a database.
Just like architects draw a plan before constructing a building, IT teams draw a data model before building a database.
What is the ER (Entity-Relationship) Model?
ER Model (Entity-Relationship Model) represents data and how different data items are related.
ER Model uses ER diagrams to show:
- Entities (things)
- Attributes (details about things)
- Relationships (how things are connected)
Key Concepts
1. Entity
An entity is any real-world object or concept about which data is stored.
| Entity | Real-life meaning |
|---|---|
| Customer | Person who buys products |
| Product | Item being sold |
| Employee | Person working in company |
| Order | Customer’s purchase record |
Entity = Noun (Thing)
2. Attributes
Attributes are properties or details of an entity.
| Entity | Attributes (Details) |
|---|---|
| Customer | Customer ID, Name, Phone, Email |
| Product | Product ID, Name, Price, Category |
| Employee | Employee ID, Name, Salary, Department |
Attribute = Property/Characteristic
3. Types of Attributes
| Type | Meaning | Example |
|---|---|---|
| Key Attribute | Uniquely identifies an entity | Student Roll Number, Employee ID |
| Composite Attribute | Can be broken into smaller parts | Name → First Name + Last Name |
| Multivalued Attribute | Has more than one value | Phone numbers, Email IDs |
| Derived Attribute | Can be calculated | Age (from Date of Birth), Total Price (qty × rate) |
4. Relationships
A relationship shows how entities are connected.
| Relationship Type | Meaning | Example |
|---|---|---|
| One-to-One (1:1) | One entity linked to one | CEO → One company |
| One-to-Many (1:M) | One entity linked to many | One customer → many orders |
| Many-to-Many (M:M) | Many entities linked to many | Students ↔ Courses |
Relationship = Verb (links between entities)
5. Constraints
Constraints are rules applied to ensure data correctness.
| Type | Example |
|---|---|
| Primary key | Every Customer ID must be unique |
| Foreign key | An order must belong to an existing customer |
| Not null | Phone number cannot be blank |
| Check | Age must be greater than 18 |
ER Diagram Example
ER Diagram Representation:
(One customer can place many orders)
Extended ER Features (Advanced Concepts)
Generalization
Combining similar entities into a higher-level entity
“Bottom-up approach”
| Example |
|---|
| Car + Bike + Truck → Vehicle (general entity) |
Specialization
Breaking a general entity into sub-entities
“Top-down approach”
| Example |
|---|
| Employee → Manager, Worker, Intern |
Aggregation
Combining a relationship and entity into a higher-level entity
Used when relationship itself has attributes
| Example |
|---|
| Project + Employee = Works-On → Department supervises this |
Aggregation is helpful in complex systems like HRMS, banking, ERP.
Why is ER Modelling Important for Business?
| Benefit | Business Impact |
|---|---|
| Clear data structure | Saves time, avoids confusion |
| Reduces errors | Ensures accuracy in reports |
| Improves communication | Business & IT understand each other |
| Better database performance | Smooth operations & transactions |
| Supports decision-making | Better analytics & dashboards |
Summary (Quick Revision Sheet)
| Concept | Meaning |
|---|---|
| Entity | Object about which data is stored |
| Attribute | Characteristics of entity |
| Relationship | How entities are linked |
| Key Attribute | Unique identifier |
| Composite Attribute | Can be divided |
| Multivalued Attribute | Multiple values |
| Derived Attribute | Calculated |
| Generalization | Merge similar entities |
| Specialization | Divide into sub-entities |
| Aggregation | Group entity + relationship |
Final Tip
ER Model = Visual plan for a database
It helps companies avoid data errors, duplication, and confusion.
Relational Database Model
Think of a relational database as a set of neat spreadsheets (tables) that are linked together. Each table holds information about one “thing” (entity) — customers, orders, products — and the links between tables keep data accurate and avoid repetition.
1. Structure — what a relational model looks like
Table (Relation): like a spreadsheet with rows and columns.
- Row (Tuple): one record / one real-world instance (e.g., one customer).
- Column (Attribute): one piece of information about that record (e.g., name, phone).
Relational Schema: the design/blueprint showing all tables and how they relate.
- Example:
CUSTOMER(CustomerID, Name, Phone, Email) - Example:
ORDER(OrderID, OrderDate, CustomerID, TotalAmount)
2. Keys — how tables are linked and how we ensure uniqueness
Primary Key (PK): a column (or set of columns) that uniquely identifies each row in a table.
- Example:
CustomerIDuniquely identifies a customer. - Rules: not null, unique.
- Foreign Key (FK): a column in one table that refers to the primary key of another table — this creates the relationship. Example:
Order.CustomerIDis an FK that points toCustomer.CustomerID. - Candidate Key: a column (or combo) that could be a primary key (there may be several), but one is chosen as PK.
- Composite Key: PK made of more than one column (e.g.,
StudentID + CourseIDfor an enrollment table). - Surrogate Key: an artificial key (like an auto-increment
ID) used instead of a natural key.
3. Integrity Constraints — rules to keep data correct
- Entity Integrity: primary key must not be null and must be unique.
- Referential Integrity: foreign key must either be null (if allowed) or match an existing primary key in the referenced table. This prevents “orphan” records.
- Domain Constraint: each column contains values from an allowed set or type (e.g.,
Agemust be numeric between 0–120). - Unique Constraint: column(s) must have unique values (e.g.,
Emailif required unique). - Check Constraints: custom rules (e.g.,
Salary >= 0).
Business impact: these constraints prevent data errors (like orders with no customer, duplicate IDs, wrong data types).
4. Normalization — cleaning tables to reduce redundancy and anomalies
Normalization breaks a messy table into well-structured tables to avoid:
- Insertion anomaly (can’t add certain data without others),
- Update anomaly (need to update same data in many places),
- Deletion anomaly (deleting one thing removes needed info).
We’ll show a simple running example using a single messy table and transform it.
Initial (unnormalized) table — SALES_RAW
| SaleID | CustomerName | CustomerPhone | ProductID | ProductName | Qty | UnitPrice | OrderDate |
|---|---|---|---|---|---|---|---|
| 1 | Rahul | 98765 | P100 | Pen | 10 | 5 | 2025-10-01 |
| 2 | Rahul | 98765 | P200 | Notebook | 2 | 50 | 2025-10-01 |
| 3 | Priya | 91234 | P100 | Pen | 20 | 5 | 2025-10-02 |
Problems: customer details repeated, product details repeated, price repeated — lots of redundancy.
1NF — First Normal Form (remove repeating groups, atomic values)
Rule: Each column must contain atomic (single) values; table must have a primary key.
Transform by ensuring one product per row (our raw table above is already row-per-product). Ensure columns are atomic (no lists). Choose PK: SaleID + ProductID (or create SaleLineID).
Result: still redundant (customer and product info repeat), but now in 1NF.
2NF — Second Normal Form (remove partial dependency)
Rule: Table must be in 1NF and non-key attributes must depend on the whole primary key (not part of it). Applies when PK is composite.
If our PK is SaleID + ProductID, attributes like CustomerName depend only on SaleID (part of key) → move them out.
Split into:
SALE(SaleID, OrderDate, CustomerID)- SALE_LINE(SaleLineID or SaleID+ProductID, SaleID, ProductID, Qty, UnitPrice)
And create:
CUSTOMER(CustomerID, Name, Phone)- PRODUCT(ProductID, ProductName)
Now attributes are stored where they fully depend on the table’s PK.
3NF — Third Normal Form (remove transitive dependency)
Rule: In 2NF, and no non-key attribute depends on another non-key attribute.
Example problem: if PRODUCT table stored CategoryName AND CategoryDescription in the same table, and CategoryDescription depends on CategoryName (non-key depends on non-key), we should move categories to a separate table.
Resulting tables:
CUSTOMER(CustomerID, Name, Phone)- PRODUCT(ProductID, ProductName, CategoryID)
- CATEGORY(CategoryID, CategoryName, CategoryDesc)
- SALE(SaleID, OrderDate, CustomerID)
- SALE_LINE(SaleLineID, SaleID, ProductID, Qty, UnitPrice)
Now each non-key attribute depends only on the key.
BCNF — Boyce-Codd Normal Form (stronger than 3NF)
Rule: For every functional dependency X → Y, X must be a superkey. It fixes edge cases where 3NF still allows anomalies.
Example: Suppose CourseInstructor table had Course → Instructor and also Instructor → Course (in special contexts). If a non-key determines another column, BCNF enforces further splitting.
In practice: BCNF is applied when you find unusual dependencies; many well-designed schemas already satisfy BCNF.
Relational Schema & Table Design
- Start with ER model: identify entities, relationships, attributes.
- Choose clear primary keys: stable, unique, preferably surrogate keys (
CustomerIDauto-number). - Use foreign keys for links: name them clearly (
CustomerIDinSALEpoints toCUSTOMER). - Avoid duplicate data: normalize to at least 3NF unless performance reasons require denormalization.
- Use meaningful column names:
order_datenotod. - Define data types & sizes:
VARCHAR(100)for name,DECIMAL(10,2)for money. - Add constraints: PK, FK, NOT NULL, UNIQUE, CHECK — these protect data.
- Index important columns: for faster searches (e.g., index
OrderDate), but don’t over-index. - Document the schema: a simple data dictionary describing each table/column.
- Plan for growth and backups: partition large tables, think about archiving old data.
Trade-off: normalization reduces redundancy but may require more joins for queries — balance for performance.
Role & Responsibilities of the Database Administrator (DBA)
The DBA is the person (or team) who keeps the database healthy, secure, and available. Think of the DBA as the “facilities manager” for company data.
Key responsibilities:
- Installation & Configuration: install DBMS software, configure for the organization.
- Schema Design Support: work with developers/analysts to design efficient schemas and ensure best practices.
- Security & Access Control: create users/roles, grant privileges, enforce least privilege, audit logins.
- Backup & Recovery: schedule backups, test restores, create disaster recovery plans.
- Performance Tuning: monitor queries, add indexes, tune parameters, manage resources.
- Capacity Planning: plan storage, CPU, memory needs; forecast growth.
- High Availability & Replication: set up replication, clustering, failover solutions.
- Maintenance Tasks: patching, upgrades, reorganizing tables/indexes, statistics updates.
- Data Integrity & Quality: enforce constraints, help with ETL and data cleansing processes.
- Monitoring & Alerts: set up alerts for failures, slow queries, resource saturation.
- Support & Troubleshooting: help developers, resolve production incidents.
- Documentation & Policies: maintain runbooks, access policies, backup procedures.
- Compliance & Audits: support legal/finance audits; ensure regulatory compliance (e.g., data retention).
Business impact: good DBAs reduce downtime, prevent data loss, improve performance, and protect sensitive data — directly affecting company reliability and trust.
Quick Practical Examples (mini checklists)
When designing a Customer table
- Columns:
CustomerID (PK),Name,Phone,Email (UNIQUE?),AddressID (FK) - Constraints:
CustomerIDPK,EmailUNIQUE (if required),PhoneCHECK or format rule.
When creating an Order table:
- Columns:
OrderID (PK),OrderDate,CustomerID (FK),TotalAmount - Design: Don’t store customer name here — link to
CUSTOMER. UseSALE_LINEfor products per order.
Summary
- A relational DB = tables + keys + relationships.
- Keys (PK/FK) enforce identity and links.
- Integrity constraints keep data valid (no orphans, correct types).
- Normalization (1NF → 2NF → 3NF → BCNF) removes redundancy and anomalies — aim for 3NF for business apps unless there’s a performance justification.
- Good table design balances normalization, query speed, and simplicity.
- DBA ensures the database runs, is secure, and recovers from failures.