**Normalization** (notes by Sumit Verma)

I will discuss all the key points of Normalization in DBMS in this blog.

Normalization is the process of breaking up the table or relation into multiple tables or relations so that we can reduce redundancy and we can make our database more efficient as well as error free.

Consider **StudentInfo** table,

Id | Name | Department |

1 | S1 | CSE |

2 | S2 | EC |

3 | S3 | CSE |

4 | S5 | IT |

5 | S4 | CSE |

Now if you have to add one more column for name of HOD also, so now Student table will look like,

Id | Name | Department | HOD |

1 | S1 | CSE | Prof. A |

2 | S2 | EC | Prof. B |

3 | S3 | CSE | Prof. A |

4 | S5 | IT | Prof. C |

5 | S4 | CSE | Prof. A |

Here you can see that whenever name of department is repeating, HOD name is also repeating, this is causing redundancy in the database.

Now suppose we also want to add phone number column for every department. So our table will be like,

Id | Name | Department | HOD | PNO |

1 | S1 | CSE | Prof. A | 32272 |

2 | S2 | EC | Prof. B | 67897 |

3 | S3 | CSE | Prof. A | 32272 |

4 | S5 | IT | Prof. C | 67327 |

5 | S4 | CSE | Prof. A | 32272 |

Now suppose if phone number of CSE department changed from 32272 to 32222, you have to make the changes at three places.

Id | Name | Department | HOD | PNO |

1 | S1 | CSE | Prof. A | 32222 |

2 | S2 | EC | Prof. B | 67897 |

3 | S3 | CSE | Prof. A | 32222 |

4 | S5 | IT | Prof. C | 67327 |

5 | S4 | CSE | Prof. A | 32222 |

The same is true for deletion also. So you can see that for insertion, deletion as well as updation, this database is inefficient.

So we will use normalization here,

We will break StudentInfo table into two tables as Student and Department as,

**Student**:

Id | Name | Department |

1 | S1 | CSE |

2 | S2 | EC |

3 | S3 | CSE |

4 | S5 | IT |

5 | S4 | CSE |

**Department**:

Dept_ID | HOD | PNO |

CSE | Prof. A | 32222 |

EC | Prof. B | 67897 |

IT | Prof. C | 67327 |

We can make Department column of Student table as a foreign key which will refer to Dept_ID of Department table.

Here whenever we will need details of department, we will refer to the **Department** table with the help of the foreign key.

Now, suppose you want to update phone number of any department so you just need to update that only one place. Hence, our database is more efficient now.

**First Normal Form (1NF)**:

The following criteria must be satisfied for 1NF.

- Values of each attribute are atomic.
- No composite values
- All entries in any column must be of the same kind
- Each column must have a unique name
- No two rows are identical

Consider the below **Student **table

Roll | Name | Courses |

101 | Asif | DBMS, CN, CD, SE |

102 | Amit | CO, OS |

103 | Arpit | CD, OS, CN, DBMS |

Here you can see that column Courses contain multiple values in a single row. This is not allowed in 1NF.

We can convert this table into 1NF as below,

Roll | Name | Courses |

101 | Asif | DBMS |

101 | Asif | CN |

101 | Asif | CD |

101 | Asif | SE |

102 | Amit | CO |

102 | Amit | OS |

103 | Arpit | CD |

103 | Arpit | OS |

103 | Arpit | CN |

103 | Arpit | DBMS |

**Second Normal Form (2NF):**

A relation is said to be in 2NF if it is in 1NF and all non-prime attributes are fully functional dependent on any key of R.

Example: Consider the relation R(A,B,C,D,E,F) and the following functional dependencies,

F = { A→ BCDEF

BC →ADEF

B→F

D→E }

Candidate Keys = {A, BC}

Prime Attributes = A, B, C

Non prime Attributes = D, E, F

BC is a candidate key so any of it’s subset can not determine other attributes of the relation but B is determining attribute F here.

So B→F is a partial dependency and hence this relation is not in 2NF.

In order to make it 2NF, we will break the relation into two parts: R1 (ABCDE) and R2 (BF).

Now the relations R1 and R2 are in 2NF.

**Third Normal Form (3NF):**

*Criteria for 3NF:*

1. Relation should be in 2NF.

2. No non-prime attribute should be transitively dependent on candidate key.

OR There should not be the case that a non prime attribute is determined by another non prime attribute.

Example: Consider the relation R(A,B,C,D,E,F) and the following functional dependencies,

F = { A→ BCDEF

BC →ADEF

B→F

D→E }

**Check for 2NF**:

Candidate Keys = {A, BC}

Prime Attributes = A, B, C

Non prime Attributes = D, E, F

BC is a candidate key so any of it’s subset can not determine other attributes of the relation but B is determining attribute F here.

So B→F is a partial dependency and hence this relation is not in 2NF.

In order to make it 2NF, we will break the relation into two parts: R1 (ABCDE) and R2 (BF).

Now the relations R1 and R2 are in 2NF.

**Check for 3NF:**

We have two relations now R1 (ABCDE) and R2 (BF).

**R****2**** (BF)**: We have B→F dependency here.

Candidate key = B

Prime Attribute = B

Non Prime Attribute = F

Since there is only one non prime attribute so no chance of violation of 3NF. So *R**2*** (BF) is in 3NF**.

**R****1**** (ABCDE)**: We have following dependencies here:

A→ BCDEF

BC →ADEF

D→E

Candidate keys = A, BC

Prime Attributes = A,B,C

Non Prime Attributes = D, E

Here D→E dependency is violating 3NF because ‘D’ is a non prime attribute and it is determining another non prime attribute ‘E’. So in order to make that relation in 3NF we will break the relation into two parts as,

1. R11(ABCD)

2. R12(DE)

Now all relations *R**11**(ABCD), R**12**(DE) and R**2*** (BF) are in 3NF**.

**Boyce Codd Normal Form (BCNF):**

Criteria for BCNF:

1. Relation should be in 3NF

2. For each dependency X→Y, X should be a super key.

Example: Consider the realtion R(ABCD) and the following FDs,

F = { A→BCD

BC→AD

D→B

}

Candidate Keys = A, BC

Prime Attributes = A, B, C

Non prime attributes = D

**Check for 2NF**:

There is no partial dependency, hence relation is in 2NF.

**Check for 3NF**:

There is only one non prime attribute so no chance of 3NF violation. Hence relation is in 3NF.

**Check for BCNF**:

Here *D→B is violating BCNF because ‘D’ is not a superkey*. So it is not in BCNF.

In order to make it BCNF, relation will be divided into two parts as,

1. R1(ADC)

2. R2(DB)

Now **relations R****1****(ADC) and R****2****(DB) are in BCNF.**

**Thanks for reading.**