Unit 2: Data Modelling & Database Design




What is Data Modelling?

Data modelling is the process of visually designing how data will be stored, connected, and accessed in a database.

In simple words

Data modelling is like creating a blueprint before building a database.

Just like architects draw a plan before constructing a building, IT teams draw a data model before building a database.

What is the ER (Entity-Relationship) Model?

ER Model (Entity-Relationship Model) represents data and how different data items are related.

ER Model uses ER diagrams to show:

  • Entities (things)
  • Attributes (details about things)
  • Relationships (how things are connected)

Key Concepts

1. Entity

An entity is any real-world object or concept about which data is stored.

EntityReal-life meaning
CustomerPerson who buys products
ProductItem being sold
EmployeePerson working in company
OrderCustomer’s purchase record

Entity = Noun (Thing)

2. Attributes

Attributes are properties or details of an entity.

EntityAttributes (Details)
CustomerCustomer ID, Name, Phone, Email
ProductProduct ID, Name, Price, Category
EmployeeEmployee ID, Name, Salary, Department

Attribute = Property/Characteristic

3. Types of Attributes

TypeMeaningExample
Key AttributeUniquely identifies an entityStudent Roll Number, Employee ID
Composite AttributeCan be broken into smaller partsName → First Name + Last Name
Multivalued AttributeHas more than one valuePhone numbers, Email IDs
Derived AttributeCan be calculatedAge (from Date of Birth), Total Price (qty × rate)

4. Relationships

A relationship shows how entities are connected.

Relationship TypeMeaningExample
One-to-One (1:1)One entity linked to oneCEO → One company
One-to-Many (1:M)One entity linked to manyOne customer → many orders
Many-to-Many (M:M)Many entities linked to manyStudents ↔ Courses

Relationship = Verb (links between entities)

5. Constraints

Constraints are rules applied to ensure data correctness.

TypeExample
Primary keyEvery Customer ID must be unique
Foreign keyAn order must belong to an existing customer
Not nullPhone number cannot be blank
CheckAge must be greater than 18

ER Diagram Example

Customer (CustomerID, Name, Phone) Orders (OrderID, Date, Amount) Relationship: Customer places Order (1-to-Many)

ER Diagram Representation:

Customer 1 -----< places >----- M Order

(One customer can place many orders)

Extended ER Features (Advanced Concepts)

Generalization

Combining similar entities into a higher-level entity
“Bottom-up approach”

Example
Car + Bike + Truck → Vehicle (general entity)

Specialization

Breaking a general entity into sub-entities
“Top-down approach”

Example
Employee → Manager, Worker, Intern

Aggregation

Combining a relationship and entity into a higher-level entity
Used when relationship itself has attributes

Example
Project + Employee = Works-On → Department supervises this

Aggregation is helpful in complex systems like HRMS, banking, ERP.

Why is ER Modelling Important for Business?

BenefitBusiness Impact
Clear data structureSaves time, avoids confusion
Reduces errorsEnsures accuracy in reports
Improves communicationBusiness & IT understand each other
Better database performanceSmooth operations & transactions
Supports decision-makingBetter analytics & dashboards

Summary (Quick Revision Sheet)

ConceptMeaning
EntityObject about which data is stored
AttributeCharacteristics of entity
RelationshipHow entities are linked
Key AttributeUnique identifier
Composite AttributeCan be divided
Multivalued AttributeMultiple values
Derived AttributeCalculated
GeneralizationMerge similar entities
SpecializationDivide into sub-entities
AggregationGroup entity + relationship

Final Tip

ER Model = Visual plan for a database
It helps companies avoid data errors, duplication, and confusion.

Relational Database Model 

Think of a relational database as a set of neat spreadsheets (tables) that are linked together. Each table holds information about one “thing” (entity) — customers, orders, products — and the links between tables keep data accurate and avoid repetition.

1. Structure — what a relational model looks like

Table (Relation): like a spreadsheet with rows and columns.

  • Row (Tuple): one record / one real-world instance (e.g., one customer).
  • Column (Attribute): one piece of information about that record (e.g., name, phone).

Relational Schema: the design/blueprint showing all tables and how they relate.

  • Example: CUSTOMER(CustomerID, Name, Phone, Email)
  • Example: ORDER(OrderID, OrderDate, CustomerID, TotalAmount)

2. Keys — how tables are linked and how we ensure uniqueness

Primary Key (PK): a column (or set of columns) that uniquely identifies each row in a table.

  • Example: CustomerID uniquely identifies a customer.
  • Rules: not null, unique.
  • Foreign Key (FK): a column in one table that refers to the primary key of another table — this creates the relationship.  Example: Order.CustomerID is an FK that points to Customer.CustomerID.
  • Candidate Key: a column (or combo) that could be a primary key (there may be several), but one is chosen as PK.
  • Composite Key: PK made of more than one column (e.g., StudentID + CourseID for an enrollment table).
  • Surrogate Key: an artificial key (like an auto-increment ID) used instead of a natural key.

3. Integrity Constraints — rules to keep data correct

  • Entity Integrity: primary key must not be null and must be unique.
  • Referential Integrity: foreign key must either be null (if allowed) or match an existing primary key in the referenced table. This prevents “orphan” records.
  • Domain Constraint: each column contains values from an allowed set or type (e.g., Age must be numeric between 0–120).
  • Unique Constraint: column(s) must have unique values (e.g., Email if required unique).
  • Check Constraints: custom rules (e.g., Salary >= 0).

Business impact: these constraints prevent data errors (like orders with no customer, duplicate IDs, wrong data types).

4. Normalization — cleaning tables to reduce redundancy and anomalies

Normalization breaks a messy table into well-structured tables to avoid:

  • Insertion anomaly (can’t add certain data without others),
  • Update anomaly (need to update same data in many places),
  • Deletion anomaly (deleting one thing removes needed info).

We’ll show a simple running example using a single messy table and transform it.

Initial (unnormalized) table — SALES_RAW

SaleIDCustomerNameCustomerPhoneProductIDProductNameQtyUnitPriceOrderDate
1Rahul98765P100Pen1052025-10-01
2Rahul98765P200Notebook2502025-10-01
3Priya91234P100Pen2052025-10-02

Problems: customer details repeated, product details repeated, price repeated — lots of redundancy.

1NF — First Normal Form (remove repeating groups, atomic values)

Rule: Each column must contain atomic (single) values; table must have a primary key.

Transform by ensuring one product per row (our raw table above is already row-per-product). Ensure columns are atomic (no lists). Choose PK: SaleID + ProductID (or create SaleLineID).

Result: still redundant (customer and product info repeat), but now in 1NF.

2NF — Second Normal Form (remove partial dependency)

Rule: Table must be in 1NF and non-key attributes must depend on the whole primary key (not part of it). Applies when PK is composite.

If our PK is SaleID + ProductID, attributes like CustomerName depend only on SaleID (part of key) → move them out.

Split into:

  • SALE(SaleID, OrderDate, CustomerID)
  • SALE_LINE(SaleLineID or SaleID+ProductID, SaleID, ProductID, Qty, UnitPrice)

And create:

  • CUSTOMER(CustomerID, Name, Phone)
  • PRODUCT(ProductID, ProductName)

Now attributes are stored where they fully depend on the table’s PK.

3NF — Third Normal Form (remove transitive dependency)

Rule: In 2NF, and no non-key attribute depends on another non-key attribute.

Example problem: if PRODUCT table stored CategoryName AND CategoryDescription in the same table, and CategoryDescription depends on CategoryName (non-key depends on non-key), we should move categories to a separate table.

Resulting tables:

  • CUSTOMER(CustomerID, Name, Phone)
  • PRODUCT(ProductID, ProductName, CategoryID)
  • CATEGORY(CategoryID, CategoryName, CategoryDesc)
  • SALE(SaleID, OrderDate, CustomerID)
  • SALE_LINE(SaleLineID, SaleID, ProductID, Qty, UnitPrice)

Now each non-key attribute depends only on the key.

BCNF — Boyce-Codd Normal Form (stronger than 3NF)

Rule: For every functional dependency X → Y, X must be a superkey. It fixes edge cases where 3NF still allows anomalies.

Example: Suppose CourseInstructor table had Course → Instructor and also Instructor → Course (in special contexts). If a non-key determines another column, BCNF enforces further splitting.

In practice: BCNF is applied when you find unusual dependencies; many well-designed schemas already satisfy BCNF.

Relational Schema & Table Design 

  • Start with ER model: identify entities, relationships, attributes.
  • Choose clear primary keys: stable, unique, preferably surrogate keys (CustomerID auto-number).
  • Use foreign keys for links: name them clearly (CustomerID in SALE points to CUSTOMER).
  • Avoid duplicate data: normalize to at least 3NF unless performance reasons require denormalization.
  • Use meaningful column names: order_date not od.
  • Define data types & sizes: VARCHAR(100) for name, DECIMAL(10,2) for money.
  • Add constraints: PK, FK, NOT NULL, UNIQUE, CHECK — these protect data.
  • Index important columns: for faster searches (e.g., index OrderDate), but don’t over-index.
  • Document the schema: a simple data dictionary describing each table/column.
  • Plan for growth and backups: partition large tables, think about archiving old data.

Trade-off: normalization reduces redundancy but may require more joins for queries — balance for performance.

Role & Responsibilities of the Database Administrator (DBA)

The DBA is the person (or team) who keeps the database healthy, secure, and available. Think of the DBA as the “facilities manager” for company data.

Key responsibilities:

  • Installation & Configuration: install DBMS software, configure for the organization.
  • Schema Design Support: work with developers/analysts to design efficient schemas and ensure best practices.
  • Security & Access Control: create users/roles, grant privileges, enforce least privilege, audit logins.
  • Backup & Recovery: schedule backups, test restores, create disaster recovery plans.
  • Performance Tuning: monitor queries, add indexes, tune parameters, manage resources.
  • Capacity Planning: plan storage, CPU, memory needs; forecast growth.
  • High Availability & Replication: set up replication, clustering, failover solutions.
  • Maintenance Tasks: patching, upgrades, reorganizing tables/indexes, statistics updates.
  • Data Integrity & Quality: enforce constraints, help with ETL and data cleansing processes.
  • Monitoring & Alerts: set up alerts for failures, slow queries, resource saturation.
  • Support & Troubleshooting: help developers, resolve production incidents.
  • Documentation & Policies: maintain runbooks, access policies, backup procedures.
  • Compliance & Audits: support legal/finance audits; ensure regulatory compliance (e.g., data retention).

Business impact: good DBAs reduce downtime, prevent data loss, improve performance, and protect sensitive data — directly affecting company reliability and trust.

Quick Practical Examples (mini checklists)

When designing a Customer table

  • Columns: CustomerID (PK), Name, Phone, Email (UNIQUE?), AddressID (FK)
  • Constraints: CustomerID PK, Email UNIQUE (if required), Phone CHECK or format rule.

When creating an Order table:

  • Columns: OrderID (PK), OrderDate, CustomerID (FK), TotalAmount
  • Design: Don’t store customer name here — link to CUSTOMER. Use SALE_LINE for products per order.

Summary 

  • A relational DB = tables + keys + relationships.
  • Keys (PK/FK) enforce identity and links.
  • Integrity constraints keep data valid (no orphans, correct types).
  • Normalization (1NF → 2NF → 3NF → BCNF) removes redundancy and anomalies — aim for 3NF for business apps unless there’s a performance justification.
  • Good table design balances normalization, query speed, and simplicity.
  • DBA ensures the database runs, is secure, and recovers from failures.