
DATABASE MANAGEMENT SYSTEM ​

Authors

πŸ§‘β€πŸ’» RK ROY & Senju Nikhil

🧾 "Databases aren't just about dataβ€”they're about trust, speed, and reliability."


πŸ“š Data ​

Data is a collection of raw, unorganized facts and details like numbers, text, images, and more.

  • Data is measured in terms of bits and bytes
  • Data can be recorded, but it has no meaning until it is processed

Types of Data ​

Quantitative data

  • Numerical form
  • Examples: weight, volume, temperature

Qualitative data

  • Descriptive, but not numerical
  • Examples: gender, ethnicity, religion

Information ​

Information is processed data: it provides context and enables decision making.

Database ​

Database is an electronic place/system where data is stored in a way that it can be easily accessed, managed, and updated. To make real use of data, we need Database Management Systems (DBMS).

DBMS ​

A DBMS (Database Management System) is software that interacts with users, applications, and the database itself to store, manage, and retrieve data efficiently and securely.

Advantages:

  • Data Integrity and Consistency: Ensures accuracy and consistency of data across the database
  • Data Sharing: Multiple users and applications can access the database simultaneously
  • Data Security: Allows different users to have different levels of access

DBMS Architecture ​

View of Data (Three Schema Architecture) ​

The purpose is to hide unnecessary details from users and make databases easier to use through three levels of abstraction:

Physical Level (Internal Level) ​

  • Describes how data is actually stored on disk (files, indexes, etc.)
  • Deals with low-level data structures, storage allocation, compression, encryption
  • Goal: Store data efficiently

Logical Level (Conceptual Level) ​

  • Describes what data is stored and relationships among data
  • Users at this level don't worry about physical storage details
  • Database designers (DBAs) use this level to plan the database structure
  • Goal: Make database easy to design and manage

View Level (External Level) ​

  • Describes how users see the data
  • Different users can have different views of the same database
  • Uses views/schemas to show only relevant parts of the data
  • Also helps secure data by restricting what each user group can see
  • Goal: Provide personalized, secure data views
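
As a small illustration, here is a hedged SQL sketch of the view level in action. The base table, columns, and the `placement_team` role are invented for this example, but the pattern (a view exposing only some columns, plus a grant on the view instead of the base table) is how DBMSs typically provide restricted user views.

```sql
-- Logical level: the full student relation (hypothetical schema)
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name       VARCHAR(100),
    branch     VARCHAR(50),
    phone      VARCHAR(15),
    cgpa       DECIMAL(4, 2)
);

-- View level: the placement cell sees only the columns it needs;
-- personal contact details stay hidden
CREATE VIEW placement_view AS
SELECT student_id, name, branch, cgpa
FROM student;

-- Restrict the user group to the view instead of the base table
GRANT SELECT ON placement_view TO placement_team;
```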

[Figure: Database Architecture]

Data Models ​

Data models provide a way to describe the design of a DB at the logical level. Underlying the structure of the DB is the data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

Examples: ER model, Relational Model, object-oriented model, object-relational data model

Database Languages ​

Data Definition Language (DDL)

  • Used to specify the database schema
  • We specify consistency constraints, which must be checked every time DB is updated

Data Manipulation Language (DML)

  • Used to express database queries and updates
  • Data manipulation involves:
    • Retrieval of information stored in DB
    • Insertion of new information into DB
    • Deletion of information from the DB
    • Updating existing information stored in DB
  • The query language is the part of DML used to specify statements that request the retrieval of information

In practice, both sets of features are provided by a single database language, e.g., SQL.
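
For instance, a minimal SQL sketch (the table and values are invented for illustration) showing DDL defining a schema with a consistency constraint, followed by DML performing the four manipulation operations:

```sql
-- DDL: define the schema and a consistency constraint
CREATE TABLE account (
    acc_no  INT PRIMARY KEY,
    balance DECIMAL(12, 2) CHECK (balance >= 0)
);

-- DML: insert, retrieve (query), update, delete
INSERT INTO account (acc_no, balance) VALUES (101, 5000.00);
SELECT acc_no, balance FROM account WHERE balance > 1000;
UPDATE account SET balance = balance - 500 WHERE acc_no = 101;
DELETE FROM account WHERE acc_no = 101;
```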


Relational Model ​

Relational Model organizes the data in the form of tables.

Key Concepts:

  • Tuple: A single row of the table representing a single data point/unique record
  • Degree of table: Number of attributes/columns in a given table/relation
  • Cardinality: Total number of tuples in a given relation

Integrity Constraints ​

Integrity constraints in databases are rules that ensure the accuracy, consistency, and validity of data stored in a relational database. They prevent invalid data from being entered and maintain the logical correctness of relationships among tables.

Domain Integrity ​

Ensures that values in a column are valid according to the column's data type, range, or format.

```sql
age INT CHECK (age >= 0);
```

Entity Integrity ​

Ensures that every table has a primary key, and that primary key values are unique and not NULL. Prevents duplicate or missing identifiers for rows.

```sql
student_id INT PRIMARY KEY;
```

Referential Integrity ​

Ensures consistency between foreign keys and the primary key they reference. Prevents "orphan" records.

```sql
FOREIGN KEY (course_id) REFERENCES Course(course_id);
```

Key Integrity ​

Guarantees uniqueness of data in certain columns. Includes Primary Key, Unique Key, and Foreign Key constraints.

```sql
email VARCHAR(100) UNIQUE;
```

Check Constraints ​

Define a condition that each row must satisfy.

```sql
salary DECIMAL CHECK (salary > 0);
```

Relational Model Keys ​

πŸ”‘ Simple Key ​

Key with only one attribute.

πŸ”‘ Primary Key ​

  • A column or set of columns that uniquely identifies each row in a table
  • Cannot have null values
  • There is only one primary key per table
  • Example: student_id in student table

πŸ”‘ Candidate Key ​

  • A minimal super key: no attribute can be removed while still uniquely identifying each row
  • Any column (or minimal combination of columns) that can uniquely identify a row
  • A table can have multiple candidate keys, but only one is chosen as the primary key

πŸ”‘ Super Key ​

  • Candidate Key βˆͺ any other attributes
  • A set of one or more attributes that can uniquely identify a row
  • Candidate key βŠ† Super key
  • May contain extra columns

πŸ”‘ Alternate Key ​

  • A candidate key that is not chosen as the primary key
  • Still uniquely identifies rows

πŸ”‘ Foreign Key ​

  • A column in one table that refers to the primary key of another table
  • Used to link tables together

πŸ”‘ Composite Key ​

  • A key made up of two or more columns that together uniquely identify a row
  • Used when a single column is not enough

πŸ”‘ Unique Key ​

  • Like a candidate key: ensures all values are unique
  • May allow NULL values (behaviour varies by DBMS: SQL Server permits a single NULL, while MySQL and PostgreSQL permit many)
  • Can be used to enforce uniqueness on non-primary attributes

πŸ”‘ Compound Key ​

  • A composite key whose constituent attributes are themselves keys, typically a primary key formed from two (or more) foreign keys
  • Like a composite key, it consists of more than one attribute

[Figure: Keys Diagram]
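
A hedged SQL sketch of several of these keys on invented tables (column-level REFERENCES syntax as in standard SQL/PostgreSQL): student_id is the chosen primary key, email is an alternate (and candidate) key, and enrollment uses a composite primary key built from two foreign keys, i.e., a compound key.

```sql
CREATE TABLE student (
    student_id INT PRIMARY KEY,         -- primary key (the chosen candidate key)
    email      VARCHAR(100) UNIQUE,     -- alternate key: also a candidate key
    name       VARCHAR(100)
);

CREATE TABLE course (
    course_id INT PRIMARY KEY,
    title     VARCHAR(100)
);

CREATE TABLE enrollment (
    student_id INT REFERENCES student(student_id),  -- foreign key
    course_id  INT REFERENCES course(course_id),    -- foreign key
    grade      CHAR(2),
    PRIMARY KEY (student_id, course_id)             -- composite (and compound) key
);
```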

Key Constraints ​

NOT NULL ​

Prevents a column from containing NULL values, ensuring every row supplies a value for that column.

UNIQUE ​

Ensures that the values in the column are unique.

DEFAULT ​

Used to set the default value to the column. The default value is added to the columns if no value is specified for them.

CHECK ​

Ensures that the values in a column satisfy a specified condition, so data integrity is maintained whenever rows are inserted or updated.

PRIMARY KEY ​

This is an attribute or set of attributes that can uniquely identify each entity in the entity set. Must contain unique as well as not null values.

FOREIGN KEY ​

Whenever there is a relationship between two entity sets, they share a common attribute: the primary key of one entity set becomes a foreign key in the other, linking the two tables together.
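
To tie these constraints together, a hedged sketch (tables and values are made up) showing all of them declared in one place:

```sql
CREATE TABLE customer (
    customer_id INT PRIMARY KEY                    -- PRIMARY KEY: unique + NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    order_ref   VARCHAR(20) UNIQUE,                -- UNIQUE: no duplicate references
    customer_id INT NOT NULL,                      -- NOT NULL: a value is required
    amount      DECIMAL(10, 2) CHECK (amount > 0), -- CHECK: condition on every row
    status      VARCHAR(20) DEFAULT 'PENDING',     -- DEFAULT: used when no value is given
    FOREIGN KEY (customer_id)
        REFERENCES customer(customer_id)           -- FOREIGN KEY: must match a customer
);
```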


Entity Relationship Model ​

Data Models ​

Collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

ER Model ​

A high-level data model based on a perception of the real world as a collection of basic objects, called entities, and of relationships among these objects. The graphical representation of the ER model is the ER diagram, which acts as a blueprint of the DB.

Entity ​

An Entity is a "thing" or "object" in the real world that is distinguishable from all other objects.

  • Entity can be uniquely identified (by a primary attribute, aka Primary Key)
  • Strong Entity: Can be uniquely identified
  • Weak Entity: Can't be uniquely identified, depends on some other strong entity

Attributes ​

An entity is represented by a set of attributes. Each entity has a value for each of its attributes. For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute.

Types of Attributes ​

Simple

  • Attributes which can't be divided further
  • Example: Customer's account number, Student's Roll number

Composite

  • Can be divided into subparts (other attributes)
  • Example: Name can be divided into first-name, middle-name, last-name

Single Valued

  • Holds only a single value for each entity
  • Example: Student ID, loan-number

Multi-Valued

  • Attribute having more than one value
  • Example: phone-number, nominee-name

Derived

  • Value can be derived from other related attributes
  • Example: Age, loan-age, membership-period

Relationships ​

Association among two or more entities.

Types:

  • Strong Relationship: Between two independent entities
  • Weak Relationship: Between weak entity and its owner/strong entity

Degree of Relationship:

  • Unary: Only one entity participates
  • Binary: Two entities participate
  • Ternary: Three entities participate

Relationship Cardinalities ​

  • One to One
  • One to Many
  • Many to One
  • Many to Many
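
A hedged SQL sketch (all tables invented) of how these cardinalities are usually mapped to relations: a UNIQUE foreign key for one-to-one, a plain foreign key for one-to-many / many-to-one, and a junction table for many-to-many.

```sql
-- One to One: each person has at most one passport
CREATE TABLE person   (person_id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE passport (
    passport_no VARCHAR(20) PRIMARY KEY,
    person_id   INT UNIQUE REFERENCES person(person_id)
);

-- One to Many / Many to One: one department has many employees
CREATE TABLE department (dept_id INT PRIMARY KEY, dept_name VARCHAR(50));
CREATE TABLE employee (
    emp_id  INT PRIMARY KEY,
    dept_id INT REFERENCES department(dept_id)
);

-- Many to Many: employees work on many projects and vice versa (junction table)
CREATE TABLE project  (proj_id INT PRIMARY KEY, title VARCHAR(100));
CREATE TABLE works_on (
    emp_id  INT REFERENCES employee(emp_id),
    proj_id INT REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
```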

Strong vs Weak Entity ​

Strong Entity:

  • Can be uniquely identified by its own attributes
  • Has a primary key
  • Shown with single rectangle in ER diagram

Weak Entity:

  • Cannot be uniquely identified by its own attributes alone
  • Depends on a strong entity (owner entity)
  • Shown with double rectangle in ER diagram

Normalization ​

DBMS Normalization is a systematic approach to decompose tables to eliminate data redundancy. It is a multi-step process that puts data into tabular form, removes duplicate data, and sets up relationships between tables.

Why Normalization? ​

  • Eliminating redundant data
  • Keeping data consistent
  • Storage optimization
  • Making database structure more scalable and adaptable

Problems Without Normalization ​

If a table is not properly normalized and has data redundancy, it will:

  • Consume extra memory
  • Make it difficult to handle and update data
  • Lead to Insertion, Update, and Deletion Anomalies

Example Table:

| rollno | name | branch | hod | office_tel |
|--------|------|--------|------|------------|
| 401 | Akon | CSE | Mr. X | 123 |
| 402 | Bkon | CSE | Mr. X | 123 |
| 403 | Ckon | CSE | Mr. X | 123 |
| 404 | Dkon | CSE | Mr. X | 123 |

Insertion Anomaly ​

  • Branch details (HOD, office_tel) cannot be recorded until at least one student opts for that branch
  • Branch information is repeated for every student in the branch

Update Anomaly ​

  • If HOD changes, all student records must be updated
  • Risk of data inconsistency if some records are missed

Deletion Anomaly ​

  • Deleting the last student in a branch loses branch information
  • Two different entities (Student and Branch) should not be kept together

Normal Forms ​

First Normal Form (1NF) ​

Rules:

  • Should only have single (atomic) valued attributes/columns
  • All columns should have unique names

Example:

Not in 1NF:

| StudentID | Name | Subjects |
|-----------|------|----------|
| 1 | Raj | Math, Physics |
| 2 | Roy | Chemistry, Biology |

In 1NF:

| StudentID | Name | Subject |
|-----------|------|---------|
| 1 | Raj | Math |
| 1 | Raj | Physics |
| 2 | Roy | Chemistry |
| 2 | Roy | Biology |

Second Normal Form (2NF) ​

Rules:

  • Must be in 1NF
  • Should not have partial dependencies

Partial Dependency: When a non-prime attribute depends on part of a composite primary key.

Example:

Not in 2NF:

| StudentID | SubjectID | StudentName | SubjectName |
|-----------|-----------|-------------|-------------|
| 1 | 101 | Raj | Math |
| 1 | 102 | Raj | Physics |

In 2NF (split into 3 tables):

Students:

| StudentID | StudentName |
|-----------|-------------|
| 1 | Raj |
| 2 | Roy |

Subjects:

| SubjectID | SubjectName |
|-----------|-------------|
| 101 | Math |
| 102 | Physics |

Enrollments:

| StudentID | SubjectID |
|-----------|-----------|
| 1 | 101 |
| 1 | 102 |
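
The same decomposition can be written as SQL DDL (a sketch; column types are assumed): every non-prime attribute now depends on a whole key, so the partial dependencies are gone.

```sql
CREATE TABLE Students (
    StudentID   INT PRIMARY KEY,
    StudentName VARCHAR(100)            -- depends on the whole key (StudentID)
);

CREATE TABLE Subjects (
    SubjectID   INT PRIMARY KEY,
    SubjectName VARCHAR(100)            -- depends on the whole key (SubjectID)
);

CREATE TABLE Enrollments (
    StudentID INT REFERENCES Students(StudentID),
    SubjectID INT REFERENCES Subjects(SubjectID),
    PRIMARY KEY (StudentID, SubjectID)  -- only the composite key remains here
);
```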

Third Normal Form (3NF) ​

Rules:

  • Must be in 2NF
  • No transitive dependency

Transitive Dependency: When a non-prime attribute depends on another non-prime attribute.

Example:

Not in 3NF:

| EmployeeID | EmployeeName | DeptID | DeptName | DeptLocation |
|------------|--------------|--------|----------|--------------|
| 1 | Aman | 10 | IT | Mumbai |
| 2 | Neha | 20 | HR | Delhi |

In 3NF:

Employees:

| EmployeeID | EmployeeName | DeptID |
|------------|--------------|--------|
| 1 | Aman | 10 |
| 2 | Neha | 20 |

Departments:

| DeptID | DeptName | DeptLocation |
|--------|----------|--------------|
| 10 | IT | Mumbai |
| 20 | HR | Delhi |

Boyce-Codd Normal Form (BCNF) ​

Rules:

  • Must be in 3NF
  • For each functional dependency (X β†’ Y), X should be a Super Key

Example:

Not in BCNF:

| studentId | Course | Instructor |
|-----------|--------|------------|
| 1 | DBMS | Raj |
| 2 | DS | Raj |
| 3 | AI | Roy |

In BCNF:

Courses:

| Course | Instructor |
|--------|------------|
| DBMS | Raj |
| DS | Raj |
| AI | Roy |

Student Courses:

| studentId | Course |
|-----------|--------|
| 1 | DBMS |
| 2 | DS |
| 3 | AI |

Fourth Normal Form (4NF) ​

Rules:

  • Must be in BCNF
  • No multi-valued dependency

Example:

Not in 4NF:

| Student | Course | Hobby |
|---------|--------|-------|
| Raj | DBMS | Cricket |
| Raj | DBMS | Music |
| Raj | AI | Cricket |
| Raj | AI | Music |

In 4NF:

Student Courses:

| Student | Course |
|---------|--------|
| Raj | DBMS |
| Raj | AI |

Student Hobbies:

| Student | Hobby |
|---------|-------|
| Raj | Cricket |
| Raj | Music |

Fifth Normal Form (5NF) ​

Rules:

  • Must be in 4NF
  • Also called Project-Join Normal Form (PJNF)
  • Most advanced level of normalization
  • A table in 5NF cannot be decomposed any further without losing information

Example:

Not in 5NF:

| Supplier | Part | Project |
|----------|------|---------|
| S1 | P1 | J1 |
| S1 | P1 | J2 |
| S1 | P2 | J1 |
| S1 | P2 | J2 |

In 5NF:

Supplier Parts:

| Supplier | Part |
|----------|------|
| S1 | P1 |
| S1 | P2 |

Supplier Projects:

| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |

Part Projects:

| Part | Project |
|------|---------|
| P1 | J1 |
| P1 | J2 |
| P2 | J1 |
| P2 | J2 |

Advantages of Normalization ​

  • Reduced data redundancy
  • Improved data consistency
  • Simplified database design
  • Improved query performance
  • Easier database maintenance

When to Use Normalization vs Denormalization ​

Normalization is best for:

  • Transactional systems (banking, enterprise apps)
  • Where data integrity is paramount

Denormalization is ideal for:

  • Read-heavy applications
  • Data warehousing
  • Reporting systems
  • Where performance is more critical than integrity

Transactions & Concurrency Control ​

A transaction is a sequence of one or more operations performed on the database as a single logical unit of work. It ensures that either all operations are successfully executed (committed) or none of them take effect (rolled back).

ACID Properties ​

Atomicity ​

  • Transaction is all or nothing
  • If any part fails, entire transaction is rolled back
  • Example: Money transfer - both debit and credit must succeed
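
A hedged sketch of the money-transfer example as a single transaction (account numbers and amounts invented; `BEGIN` is PostgreSQL-style, MySQL uses `START TRANSACTION`):

```sql
BEGIN;
UPDATE account SET balance = balance - 500 WHERE acc_no = 101;  -- debit
UPDATE account SET balance = balance + 500 WHERE acc_no = 202;  -- credit
COMMIT;   -- both changes take effect together

-- If anything fails before COMMIT, the DBMS (or an explicit ROLLBACK;)
-- undoes the partial work, so the transfer is all or nothing.
```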

Consistency ​

  • Transaction must keep database in valid state
  • All constraints must hold
  • Moves from one consistent state to another

Isolation ​

  • Transactions run independently
  • One transaction's operations should not affect another's
  • Example: Two users withdrawing from same account

Durability ​

  • Once committed, changes are permanent
  • Survive system crashes
  • Example: Updated balance remains after power failure

Transaction Schedules ​

Serial Schedule:

  • Transactions execute one at a time
  • Ensures consistency but reduces throughput

Non-Serial Schedule:

  • Multiple transactions execute concurrently
  • Interleaving operations
  • Improves efficiency but requires concurrency control

Concurrency Control ​

Ensures consistency and integrity when multiple transactions execute simultaneously. Main goal is to prevent:

  • Dirty Read: Reading uncommitted data
  • Lost Update: Two transactions overwrite each other's changes
  • Unrepeatable Read: The same query returns different values within one transaction
  • Phantom Read: New rows appear when a query is repeated within one transaction

DBMS uses locks, timestamps, and isolation levels to prevent these issues.
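
For example, a hedged sketch (standard SQL / PostgreSQL-style syntax; the supported levels, defaults, and statement placement vary by DBMS) of raising the isolation level and taking an explicit row lock:

```sql
-- Stronger isolation: repeated reads inside this transaction see the same data
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM account WHERE acc_no = 101;
-- ... later reads of acc_no 101 return the same value, avoiding unrepeatable reads
COMMIT;

-- Explicit lock: prevents a lost update when two sessions modify the same row
BEGIN;
SELECT balance FROM account WHERE acc_no = 101 FOR UPDATE;
UPDATE account SET balance = balance - 500 WHERE acc_no = 101;
COMMIT;
```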


Indexing in DBMS ​

Disk Structure ​


  • Each block typically 512B
  • Multiple rows stored per block
  • Index table reduces search blocks needed

Index Benefits ​

Without indexing:

  • Need to scan all data blocks
  • Example: 100 records, 4 per block = 25 blocks to search

With indexing:

  • Index table stores key-value pairs
  • Key = ID, Value = pointer to block
  • Drastically reduces search time
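
As a sketch (assuming a hypothetical student table with a rollno column), creating an index so equality and range lookups no longer scan every block; most DBMSs implement such indexes as B+ trees, covered next:

```sql
-- Without an index: the DBMS scans every data block of the table
SELECT * FROM student WHERE rollno = 403;

-- Build an index on the search key (most DBMSs use a B+ tree internally)
CREATE INDEX idx_student_rollno ON student (rollno);

-- The same lookup, and range queries, now follow the index to the right block(s)
SELECT * FROM student WHERE rollno = 403;
SELECT * FROM student WHERE rollno BETWEEN 401 AND 404;
```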

B-Trees ​

B-Tree is a specialized m-way tree designed for disk-based storage systems.


Properties:

  • All leaf nodes at same level
  • Keys stored in ascending order
  • Non-leaf nodes (except root) have at least ⌈m/2βŒ‰ children
  • Keys inserted bottom-up

Height formulas (for a B-Tree of order m with n keys and minimum degree t = ⌈m/2βŒ‰):

  • Minimum height: ⌈log_m(n + 1)βŒ‰ βˆ’ 1
  • Maximum height: ⌊log_t((n + 1) / 2)βŒ‹
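
A quick worked example with the formulas above (values chosen only for illustration): for a B-Tree of order m = 4, the minimum degree is t = ⌈4/2βŒ‰ = 2, and with n = 1000 keys:

  • Minimum height = ⌈log_4(1001)βŒ‰ βˆ’ 1 = 5 βˆ’ 1 = 4
  • Maximum height = ⌊log_2(1001 / 2)βŒ‹ = ⌊log_2(500.5)βŒ‹ = 8

So the height of a tree holding those 1000 keys lies between 4 and 8, depending on how full its nodes are.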

B+ Trees ​

Advanced version where:

  • All data stored in leaf nodes
  • Internal nodes contain only keys for navigation
  • Leaf nodes are linked together
  • Supports efficient range queries

BST vs B-Tree ​

BST:

  • Each node has at most 2 children
  • O(log n) if balanced, O(n) if skewed
  • Stored in memory (RAM)

B-Tree:

  • Many keys per node
  • Fewer disk I/O operations
  • Always balanced
  • Optimized for databases

B-Tree vs B+ Tree ​

| Feature | B-Tree | B+ Tree |
|---------|--------|---------|
| Data Storage | Internal + leaf nodes | Leaf nodes only |
| Search Efficiency | May stop at internal nodes | Always goes to a leaf |
| Range Queries | Need in-order traversal | Fast (linked leaves) |
| Tree Height | Larger | Smaller |
| Use Case | General purpose | DBMS, file systems |

Types of Indexing ​

Primary Index ​

  • Built on primary key
  • Ordered index
  • Can be dense or sparse


Dense Primary Index:

  • Entry for every record
  • Fast search but large size


Sparse Primary Index:

  • One entry per block
  • Smaller index, slightly slower


Clustering Index ​

  • Built on non-primary key field
  • Records physically grouped by clustering field
  • One entry per distinct value


Secondary Index ​

  • Built on non-clustering field
  • Data file can be unordered
  • Dense index (entry for every record)
  • Multiple secondary indexes allowed


Dense vs Sparse Index ​


| Feature | Dense Index | Sparse Index |
|---------|-------------|--------------|
| Entries | One per record | One per block |
| Space | High | Low |
| Search | Very fast | Slower |
| Maintenance | Costly | Easier |

Single Level vs Multilevel Indexing ​


Single Level:

  • One index file
  • Slower for large data

Multilevel:

  • Indexes on indexes
  • Hierarchical structure (like B-Tree)
  • Much faster for large databases

Index Types Summary ​


| Index Type | Key Field | Record Order | Multiple Allowed |
|------------|-----------|--------------|------------------|
| Primary | Primary Key | Sorted | No |
| Secondary | Non-Primary | Unsorted | Yes |
| Clustered | Any (usually PK) | Sorted | No |
| Non-Clustered | Any | Unsorted | Yes |

NoSQL Databases ​

NoSQL stands for "Not Only SQL". Designed for:

  • Unstructured/semi-structured data
  • High scalability and availability
  • Schema-less flexibility
  • Distributed architecture

Key Characteristics ​

  • Schema-free: No predefined structure
  • Non-tabular: Uses JSON, key-value, etc.
  • Handles big data: High volume and velocity
  • Horizontal scaling: Distribute across servers
  • Cloud-friendly: Built for cloud-native apps

Advantages ​

Flexible Schema ​

  • No fixed schema required
  • Easy to adapt to changes

Horizontal Scaling ​

  • Add more servers to handle load
  • Achieved via sharding or replica sets

High Availability ​

  • Auto-replication across servers
  • System continues if nodes fail

Fast Reads & Writes ​

  • No joins needed
  • Ideal for real-time apps

Built-in Caching ​

  • Some NoSQL DBs (like Redis) serve as in-memory caches

When to Use NoSQL ​

  • Fast-paced Agile development
  • Unstructured/semi-structured data
  • Huge data volumes
  • Scalable architecture needed
  • Real-time streaming, IoT, microservices

Types of NoSQL Databases ​

Key-Value Store ​

  • Simplest type: key-value pairs
  • Examples: Redis, DynamoDB
  • Use: Caching, session management

Document Store ​

  • Stores JSON-like documents
  • Flexible schema
  • Examples: MongoDB, CouchDB
  • Use: E-commerce, CMS, mobile apps

Column-Oriented Store ​

  • Stores data in columns
  • Efficient for analytics
  • Examples: Cassandra, HBase
  • Use: Time-series, IoT, analytics

Graph Store ​

  • Nodes and edges for relationships
  • Examples: Neo4j, Amazon Neptune
  • Use: Social networks, fraud detection

Disadvantages ​

  • Data redundancy
  • Costly deletes/updates
  • Not one-size-fits-all
  • Limited ACID support (varies)
  • Lacks constraint enforcement

SQL vs NoSQL ​

| Feature | SQL | NoSQL |
|---------|-----|-------|
| Data Model | Tables | JSON, key-value, graphs |
| Schema | Fixed | Flexible |
| Scaling | Vertical | Horizontal |
| ACID | Full support | Partial |
| JOINs | Supported | Not needed |
| Examples | MySQL, PostgreSQL | MongoDB, Redis, Cassandra |

Database Types ​

1. Relational Databases (RDBMS) ​

  • Based on Relational Model
  • Data in tables with rows and columns
  • Use SQL
  • ACID properties
  • Examples: MySQL, PostgreSQL, Oracle

2. Object-Oriented Databases ​

  • Based on OOP concepts
  • Data as objects with methods
  • Examples: ObjectDB, db4o
  • Use: CAD/CAM, scientific apps

3. Hierarchical Databases ​

  • Tree-like structure
  • Parent-child relationships
  • One-to-many only
  • Examples: IBM IMS
  • Use: File systems, XML

4. Network Databases ​

  • Graph structure
  • Many-to-many relationships
  • Examples: IDMS
  • Use: Telecom, engineering

Clustering in DBMS ​

Clustering connects multiple servers to manage a single database system.

Benefits ​

Data Redundancy ​

  • Each node has data replica
  • Prevents data loss

Load Balancing ​

  • Requests distributed across servers
  • No single machine overloaded

High Availability ​

  • System remains available if nodes fail
  • Critical for mission-critical systems

Scalability ​

  • Easily add new nodes

Fault Tolerance ​

  • Other servers continue if one fails

Architecture ​

Load Balancer
  β”œβ”€β”€ Node 1 (DB)
  β”œβ”€β”€ Node 2 (DB)
  └── Node 3 (DB)

Partitioning & Sharding ​

Partitioning ​

Splitting a large table into smaller, manageable parts.

Vertical Partitioning ​

  • Divides by columns
  • Each partition stores subset of attributes

Horizontal Partitioning ​

  • Divides by rows
  • Each partition contains different rows
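
A hedged sketch of horizontal (range) partitioning using PostgreSQL-style declarative partitioning; the table name, columns, and boundaries are invented, and the syntax differs in other DBMSs:

```sql
CREATE TABLE trips (
    trip_id   BIGINT,
    trip_date DATE NOT NULL,
    fare      DECIMAL(8, 2)
) PARTITION BY RANGE (trip_date);

CREATE TABLE trips_2023 PARTITION OF trips
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE trips_2024 PARTITION OF trips
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- A query that filters on the partition key only touches the relevant partition
SELECT COUNT(*) FROM trips WHERE trip_date >= DATE '2024-06-01';
```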

When to Use ​

  • Dataset too large
  • Slow query response
  • Multiple concurrent requests

Benefits ​

| Feature | Benefit |
|---------|---------|
| Parallelism | Multiple queries on different partitions |
| Availability | If one partition fails, others stay online |
| Performance | Smaller tables, faster indexes |
| Manageability | Easier backup/archival |

Sharding ​

Special form of horizontal partitioning where each shard is on a separate database server.

Example:

  • Users 1 to 1M β†’ Shard 1
  • Users 1M+1 to 2M β†’ Shard 2

Advantages ​

  • Scalability
  • Availability
  • Performance

Disadvantages ​

  • Complexity
  • Data skew
  • Re-sharding difficulty
  • Analytical queries harder

Scatter-Gather Problem ​

Query must search all shards even if few records match.

```sql
SELECT * FROM users WHERE email LIKE '%@gmail.com';
```

Results:

  • High network traffic
  • Increased query time
  • CPU usage on all shards

CAP Theorem ​

Distributed system can guarantee only two of three:

  • C – Consistency: All nodes see same data
  • A – Availability: Every request gets response
  • P – Partition Tolerance: System works despite network failures

CAP Trade-Offs ​

| Model | Guarantees | Sacrifices | Example | Use Case |
|-------|------------|------------|---------|----------|
| CP | Consistency + Partition Tolerance | Availability | MongoDB | Banking |
| AP | Availability + Partition Tolerance | Consistency | Cassandra | Social apps |
| CA | Consistency + Availability | Partition Tolerance | MySQL | Enterprise |

Note: Partition Tolerance is essential in distributed systems, so real choice is between Consistency and Availability.


Master-Slave Architecture ​

  • Master DB: Handles all writes
  • Slave DBs: Handle reads (replicas)

How It Works ​

  • Writes go to master
  • Reads served from slaves
  • Replication: Synchronous or Asynchronous

CQRS Pattern ​

  • Command (Write): Master
  • Query (Read): Slaves

Advantages ​

  • Scalability
  • High availability
  • Load balancing
  • Data safety
  • Performance

Challenges ​

  • Replication lag
  • Write bottleneck
  • Complex configuration

Scaling Patterns (Case Study) ​

Pattern 1: Query Optimization ​

  • Use indexes
  • Caching (Redis)
  • Connection pooling

Pattern 2: Vertical Scaling ​

  • Upgrade hardware
  • More RAM/CPU

Pattern 3: CQRS ​

  • Read replicas
  • Separate read/write

Pattern 4: Multi-Primary ​

  • Distribute writes
  • Logical ring replication

Pattern 5: Functional Partitioning ​

  • Split by functionality
  • User DB, Trip DB, etc.

Pattern 6: Sharding ​

  • Horizontal scaling
  • 50+ machines
  • Region-based

Pattern 7: Data Center Partitioning ​

  • Multiple global data centers
  • Geo-routing
  • Cross-DC replication

Summary ​

This comprehensive guide covers:

Database Fundamentals ​

  • ER diagrams and relationships
  • Normalization (1NF through 5NF)
  • Keys and constraints
  • Database architecture

Indexing ​

  • B-Trees and B+ Trees
  • Primary, secondary, clustered indexes
  • Dense and sparse indexing

Transactions ​

  • ACID properties
  • Concurrency control
  • Transaction schedules

Advanced Topics ​

  • NoSQL databases
  • Clustering and partitioning
  • CAP theorem
  • Scaling patterns
