Database Normalization
What do you mean by Database Normalization?
Database normalization is a process used to organize a database into tables and columns to minimize redundancy and dependency. The main goal is to divide large tables into smaller, more manageable ones, reducing data duplication and ensuring data integrity. The process involves several stages, known as normal forms (NF), each with specific criteria. Here’s a breakdown:
First Normal Form (1NF) in Database Normalization
- Atomicity: Each column must contain atomic (indivisible) values, meaning each column should contain only one value per row.
- Uniqueness: Each column should have a unique name.
- Order irrelevance: The order in which data is stored does not matter.
Example
Suppose we have a table with the following structure:
OrderID | CustomerName | Products |
1 | John Doe | Apple, Banana |
2 | Jane Smith | Banana, Orange |
This table violates 1NF because the “Products” column contains multiple values. To normalize it to 1NF, we split it into separate rows:
OrderID | CustomerName | Products |
1 | John Doe | Apple |
1 | John Doe | Banana |
2 | Jane Smith | Banana |
2 | Jane Smith | Orange |
Second Normal Form (2NF) in Database Normalization
- 1NF compliance: The table must already be in 1NF.
- Elimination of partial dependency: All non-key attributes must be fully functionally dependent on the primary key.
Example
Consider a table in 1NF:
OrderId | Product | CustomerName | CustomerAddress |
1 | Apple | John Doe | 123 Elm St |
2 | Banana | John Doe | 123 Elm St |
The “CustomerAddress” depends only on “CustomerName” (partial dependency), not the whole primary key (“OrderID” and “Product”). To normalize to 2NF, we create separate tables:
Orders Table:
OrderID | Product |
1 | Apple |
1 | Banana |
Customers Table:
CustomerName | CustomerAddress |
John Doe | 123 Elm St |
Third Normal Form (3NF) in Database Normalization
- 2NF compliance: The table must already be in 2NF.
- Elimination of transitive dependency: Non-key columns should not depend on other non-key columns.
Example
A table in 2NF might look like this:
OrderID | Product | CustomerID | CustomerAddress |
1 | Apple | 1001 | 123 Elm St |
1 | Banana | 1001 | 123 Elm St |
Here, “CustomerAddress” depends on “CustomerID”, not directly on the primary key. To normalize it to 3NF, we separate the dependent attributes into a new table:
Orders Table:
OrderID | Product | CustomerId |
1 | Apple | 1001 |
1 | Banana | 1001 |
Customers Table:
CustomerID | CustomerAddress |
1001 | 123 Elm St |
Boyce-Codd Normal Form (BCNF) in Database Normalization
- 3NF compliance: The table must already be in 3NF.
- More stringent condition: Every determinant must be a candidate key.
Example
Consider a table:
CourseID | Instructor | Department |
C101 | Dr. Smith | Physics |
C102 | Dr. Jones | Chemistry |
If each department has only one instructor, and each course is taught by one instructor, we might face anomalies. To normalize to BCNF, we ensure that every determinant is a candidate key:
Courses Table:
CourseID | Instructor |
C101 | Dr. Smith |
C102 | Dr. Jones |
Instructors Table:
Instructor | Department |
Dr. Smith | Physics |
Dr. Jones | Chemistry |
Fourth Normal Form (4NF) in Database Normalization
- BCNF compliance: The table must already be in BCNF.
- No multi-valued dependencies: A record should not have two or more independent multi-valued facts about an entity.
Example
Consider a table:
StudentID | Course | Hobby |
1001 | Math | Chess |
1001 | Science | Chess |
1001 | Math | Painting |
Here, “Course” and “Hobby” are independent multi-valued facts about “StudentID”. To normalize to 4NF, we separate them into different tables:
StudentCourses Table:
StudentId | Course |
1001 | Math |
1001 | Science |
StudentHobbies Table:
StudentId | Hobby |
1001 | Chess |
1001 | Painting |
Fifth Normal Form (5NF) in Database Normalization
- 4NF compliance: The table must already be in 4NF.
- No join dependency: Decomposing the table should not result in any loss of data when the tables are joined back together.
Summary
Database normalization involves organizing the columns and tables of a database to reduce redundancy and improve data integrity. The stages of normalization are designed to ensure that the data is stored efficiently and logically, reducing anomalies and improving the consistency of the database.