Normalization of relations.
Normalization of relations
NORMALIZING RELATIONS
Normalizing is the process of restructuring relations to a form which: -
• Minimizes duplication of data in a database
• Operations of adding, deleting, modifying data in a database do not lead to inconsistent data in a database
• Retrieval of data simplified
WHY NORMALIZE ?
Relations are normalized to ensure that, collection of relations do not unnecessarily hold duplicate data. It is easy to modify a data item as it gets modified in all relations where it appears and hence no consistency is there. When data is deleted accidentally, required data does not get deleted. It also simplifies retrieval of required data.
HOW ARE RELATIONS NORMALIZED ?
UNNORMALIZED RELATION
1.Observe order for many items
2.Item lines has many attributes-called composite attributes
3.Each tuple has variable length
4.Difficult to store due to non-uniformity
5.Given item code difficult to find qty-ordered and hence called Unnormalized relation
FIRST NORMAL FORM
Identify the composite attributes, convert the composite attributes to individual attributes. Duplicate the common attributes as many times as lines in composite attribute. Every attribute now describes single property and not multiple properties, some data will be duplicated. Now this is called First normal form (1NF) also called flat file.
FIRST NORMAL FORM – 1NF
HIGHER NORMAL FORMS
First normal form is first essential step in normalization. Higher normal forms known as 2NF, 3NF, BCNF, 4NF, 5NF also exist. Each is an improvement of the preceding one. A higher normal form also satisfies requirements of a lower normal form
Higher normalization steps are based on :
• Detecting dependence between attributes
• Identifying key attributes
• Detecting multivalued dependency between attributes
FUNCTIONAL DEPENDENCY
Given X,Y as two attributes in a relation
Given X if only one value of Y corresponds to it then Y is functionally dependent on X
WHY NORMALIZE RELATIONS-REVISITED
We normalize relations to ensure the following:
While operating on data base we do not lose data or introduce inconsistencies. Insertion of new data should not force leaving blank fields for some attributes. We do not delete vital information during update. In a normalized relation it is possible to change the values of the attribute without exhaustively searching all tuples of the relation.
EXAMPLE TO SHOW NEED FOR NORMALIZATION
DELETION: If order no1886 is deleted the fact that item code 4629 costs 20.25 is lost
UPDATE: If price of item 4627 is changed, all instances of this item code have to be changed by exhaustive search-errors possible
IDEAL NORMALIZATION
At the end of normalization a normalized relation
•Should have no data values duplicated in rows
•Every attribute in a row must have a value
•Deletion of a row must not lead to accidental loss of information
•Adding a row should not affect other rows
•A value of an attribute in a row can be changed independent of other rows
SECOND NORMAL FORM (2NF)
A relation is in 2NF if it is in 1NF, non-key attributes are functionally dependent on key attribute and if there is a composite key then no non-key attribute is functionally depend on one part of the key.
2NF FORM
1 NF Orders Relation
NON KEY ATTRIBUTES WHOLLY DEPENDENT ON KEY
• Repetition of order date removed.
• If order 1886 for item 4629 is cancelled the price/unit is lost in INF as the whole tuple would be deleted.
• In 2NF item price not lost when order 1886 for item 4629 cancelled. Only row 4 in order details deleted.
• Duplication of data in a relation is not there.
THIRD NORMAL FORM
A Relation in 2NF may have functional dependency between some Non-key attributes. This needs further normalization as the non-keys being dependent leads to unnecessary duplication of data. Normalization to 3NF ensures that there is no functional dependency between non-key attributes.
EXAMPLE
Student (Roll no, name, dept, year, hostelname )
- If students in a given year are all put in one hostel then year and the hostel are functionally dependent
- Year implies hostel-hostel name unnecessarily duplicated
- If all students of year 1 are moved to another hostel many tuples need to be changed.
NORMALIZATION TO 3NF
Student( Roll no, name, dept, year )
Hostel (year, hostel)
This is in 3NF
Example :1
Employee (empcode,name,salary,project no,termination date of project)
* termination date (non-key attribute)
Dependent on project no. another non-key attribute
•Thus needs normalization
3NF relations :
Employee(empcode,name,salary,projectno)
Example:2
Passenger(Ticket code,Passenger name,Train no,Departure time,Fare) Train no. and departure time are non-key attributes and are functionally dependent
3NF Relations :
Passenger(Ticket code ,Passenger name,Train no, Fare)
Train details (Train no., departure time)
BOYCE-CODD NORMAL FORM
Assume
* Relation has more than 1 possible key
* Keys are composite
* Composite keys have common attribute
* Non-key attributes not dependent on one another
Thus though the relation is in 3NF, still there could be problems due to unnecessary duplication and loss of data accidentally.
EXAMPLE
Professor (Prof code, Dept, Head of Dept, Percent time)
• Observe two possible composite keys (Prof code, Dept) or (Prof code,Head of Dept)
• Observe Head of dept name is repeated
• If professor P2 resigns the fact that Rao is Head of Chemistry is lost as lines 3 & 4 will be deleted
The dependency diagrams are:
•Percentage time a Prof. spends in the department is dependent on Prof code and Department
• Head of Dept depends on department
NEED FOR BCNF
Observe the given relation is in 3NF as non key attributes are independent of one another and wholly dependent on key. However there are problems due to the fact that there are two possible composite keys, and attribute of on of the composite key depends on a attribute of other possible composite key
NORMALIZING TO BCNF
•Identify the dependent attributes in the possible composite keys
•Remove them and create a new relation
EXAMPLE
Composite keys
FOURTH NORMAL FORM
4NF is needed when there are multi-valued dependencies
• Example :
(Vendor, Project, Item) relations Assumptions :
-A vendor capable of supplying many items to many projects
-A project needs many items
-Project may order the same item from many vendors
Vendor-Project-Item supply capability relation
Problems
•Item I1 duplicated for VI and also for V3
•If VI is to supply to project P2 but the item to be supplied is not decided there will be blank in item column
Solution:
•Split vendor-project-item relation into two relations
•Resulting relation must have not more than one independent multivalued dependency
RESULTING RELATIONS
Vendor Capability Project needs
OBSERVE NO UNNECESSARY DUPLICATION NEED FOR 5NF
•In 4NF relations vendor capability to supply items and projects need for items are there.
•They are obtained by splitting the given relation
•Looking at relation project-item we see that project P2 requires item I1
•From vendor item relation we see that I1 is supplied by V1.
•This would lead us to infer that(V1,P1,I1)must be a tuple in the original relation but it is not.In other words V1 does not supply item I1 to project P2.
•This spurious tuple has occurred because vendor V1 may not be allowed to supply item I1 to project P2
•Similarly another spurious tuple is (V3, P3, I1)
•We thus need a third relation which specifies the vendors who are allowed to supply to projects
Additional relation for 5NF
The above relation is in addition to the two relations of 4NF.
Comments
Post a Comment