Objectives
6.1 Fourth Normal Form
6.2 An aside on 4NF
6.3 Fifth Normal Form
6.4 Normalisation is not a panacea
Functional dependencies are the most important dependencies in database design. However there are forms of redundancy that cannot be detected from functional dependencies, but can be detected using other dependencies. There are two further normal forms Fourth Normal Form and Fifth Normal Form which deal with these additional forms of redundancy.
You also consider some cases where normalisation does not give the best conceptual design.
Reading
R&G Chapter 19 section 19.8.
After you have studied this unit, you should be able to:
- explain what is meant by 4NF and 5NF
- normalise a set of relations up to 5 NF given adequate information
- be able to pose questions to elicit from the user the information needed to normalise to 5NF
- be aware of cases where normalisation does not give an appropriate design.
Normalisation up to BCNF is concerned with single valued facts. Fourth normal form is concerned with multivalued facts.
4NF asks whether an all key relation represents ONE complex multivalued fact, or two (or more) simpler multivalued facts.
Consider :
Student4 (Sno, sname, tutorno, tutorname, courseno, coursename)
FDs
- Sno-> sname
- tutorno -> tutorname
- courseno->coursename
Using the closure algorithm we can determine that the key is {Sno, tutorno, courseno}.
Casting into BCNF we get
Student ( Sno, Sname)
Tutor ( tutorno, tutorname)
Course ( courseno,coursename)
and
STC (Sno, tutorno, courseno)Student, Tutor and Course are in BCNF and we can be sure there is no redundancy.
STC is also in BCNF. But could there be redundancy in STC? The answer is that depends on the semantics of STC. We will look at two cases
CASE1: STC means that each student does a set of courses, and each course is tutored by a set of tutors.
suppose we want to represent
- course c1 is tutored by tutors t1,t3,t5
- course c5 is tutored by tutors t3,t7
- course c1 is studied by students s1,s2
- course c5 is studied by student s1
We could represent this information in the relation STC as follows:
student course tutor s1 c1 t1 s1 c1 t3 s1 c1 t5 s1 c5 t3 s1 c5 t7 s2 c1 t1 s2 c1 t3 s2 c1 t5 However there is quite a bit of redundancy, we see that student s1 studies course c5 more than once. In this case STC would be subject to update anomalies such as:
- we cannot store which tutors teach a course if no student is doing the course
- if the set of tutors changes it has to be changed for every student doing the course
- it is possible to introduce inconsistency (how?)
If fact although there is no functional dependency in STC there is a multivalued dependency (MVD).
Reading
Read R&G 19.8.1 for an explanation of a multivalued dependency.
Can you see that
- course ->-> student
- course ->-> tutor
A relation R is in 4NF iff for every MVD X ->->Y that holds in R either
- Y is a subset of X or
- XY = R or
- X is a superkey
In our example course ->-> student, but none of the above is true, therefore STC is not in 4NF.
STC should be split into two relations as follows:
student course s1 c1 s1 c5 s2 c1 and
course tutor c1 t1 c1 t3 c1 t5 c5 t3 c5 t7 This contains the same information as STC but is not subject to update anomalies.
CASE 2:
Now suppose that the semantics of STC are that different students see different tutors for the courses they study e.g.
- s1 is taught c1 by tutors t1 and t7
- s2 is taught c1 by tutors t1, t3 and t5
In this case course does not multidetermine student, and STC is in 4NF. The three attributes are needed together to represent the information. (Try splitting it as in CASE1, what goes wrong?)
Another example : Consider the relation
Drinks (Name, Pub, Beer)
This might mean that a drinker (Name) is picky about what he/she drinks where: maybe Fred likes to drink IPA or Luton Pride in the Hens' Teeth, but only likes Tweekstons in the Ferreter's Arms. In this case only one (complex) multivalued fact is being held, and the relation cannot be losslessly decomposed. It is already in 4NF.
However, it might be that Drinks represents TWO simpler multivalued facts, the set of pubs a drinker likes to visit, and the set of drinks they like to drink. If we use Drinks to hold these two multivalued facts it will be subject to update anomalies. In this case normalisation to 4NF would show that Drinks should be cast into (Name, Beer) and (Name, Pub).
Exercises
1) Suppose we have a relation SPJ (sno, pno, jno), this shows which suppliers (snos) supply which parts (pnos) to which projects (jnos). What question(s) would you ask your user to find out whether SPJ is in 4NF? Remember you must phrase your question in terms of the problem domain, it is probably no good asking the user about MVDs!
2) Suppose you have a relation concerned with cookery as follows:
uses ( cook, ingredient, dish )
What questions would you ask the user to find out whether uses is in 4NF or not?
To Contents
Recall the relation SPJ (sno, pno, jno), this shows which suppliers (snos) supply which parts (pnos) to which projects (jnos).
Suppose you have decided it is in 4NF, it represents one complex multivalued fact -- which supplier supplies which parts to which projects.
In this case you need the complete SPJ (sno, pno, jno) relation to represent this complex fact.
THIS DOES NOT MEAN that you do not need relations such as canSupply(sno,pno), i.e. a list of which parts each supplier can supply, AS WELL. In an ideal world the roles of the attributes will have been carefully thought out before you begin to normalise, such an expectation is unrealistic.
To Contents
Reading
R&G 19.8.3 and 19.8.4
Fifth Normal Form is the last normal form based on lossless-join decomposition. An informal summary of 5NF is 'if even after 4NF you can still find a lossless way of splitting an all key relation, then you should do it'.
There is no easy way of testing whether a relation is in 5NF other than testing every possible split.
Here is an example which shows a relation which can be split into 3 projections without loss of data:
Flying Quills is a business that provides secretarial support for multi-lingual conferences. Thus
- conference1 might want support in English, French, German
- conference2 might want Mandarin and Russian
Flying Quills has a number of secretaries with various language skills.
- Fred is fluent in English, French, Russian;
- Sven is fluent is Swedish;
- Sally is fluent in English and German.
If a secretary attends a conference then they do provided support in all the languages in which they are skilled.
Clearly Flying Quills want to see some output something like this:
conference need support conference1 English French
German
Fred, Sally Fred
Sally
conference2
Mandarin Russian
??? Fred
Now this is not normalised at all. Should this view be supported by the following base table?
FQ (conference, need, support)
conference need support conference1 English Fred conference1 English Sally conference1 French Fred conference1 German Sally conference2 Russian Fred conference2 Mandarin no-one none Swedish Sven Well this table turns out to be subject to many update anomalies. If we delete Fred we would also delete that conference1 need French, and conference2 needs Russian. We could start putting no-one in in place of Fred's name, but again this is a recipe for disaster, do we put no-one in line1 and if not how do we know not to?
Flying Quills depend on their reliability: this is just not going to work.
However we can cast the relation into 3 lossless projections... (omitting projections with none or no-one in as they are a bodge)
needs
conference need conference1 English conference1 French conference1 German conference2 Russian conference2 Mandarin
provides
need support English Fred English Sally French Fred German Sally Russian Fred Swedish Sven
assigned
conference support conference1 Fred conference1 Sally conference2 Fred
Oddly if you join (natural join) two of these relations you potentially create spurious tuples, but if you join all three
e.g. ((needs nat join provides) nat join assigned)
the result is lossless.
So, Flying Quills should store their data in three tables and derive the overall view as an when they want it.
Recognising Relations not in 5NF
Firstly any relation in 3NF which is not all key is also in 5NF.
Relations which are in 4NF but not 5NF are rare.
A relation is in 4NF but not 5NF if and only if a specific pattern occurs in the data, and that is that three facts in the data imply a third
- fact a,b e.g. conference1 needs English, and
- fact b,c e.g. English is one of Fred's skills, and
- fact c,a e.g. Fred is attending conference1
imply
- fact a,b,c Fred will provide English at conference1
To Contents
In this section we consider the semantics of the data, and whether the normalisation process has been semantically rich enough:
It is not sufficient just to normalise data and leave it at that, there are a number of questions that should be asked about the data before the design is implemented. We look at 3 examples, there are more, as you gain experience in database design you will be able to add to this list.
a) Are any key values likely to be null?
Given relation R (A,B,C,D)
and FDs {{A->B}, {A->C}, {C->D}}
Normalisation produces:
R1(A,B,C) & R2(C,D)
But what if values of the attribute C are prone to being unknown, or not existing.
For example
Person (PersonId, name, birthdate, starsign)
- PersonId-> name, birthdate
- birthdate-> starsign
Suppose people often withhold their birthdate, but will disclose their star sign e.g.
PersonId name birthdate starsign p07 Mallinder null Libra If we normalise we can store
PersonId name birthdate p07 mallinder null but we can't put anything in here as we can't have a null value for a key
birthdate starsign So, if we normalise we cannot store that P07's starsign is Libra.
If key values are likely to be null, the design must be denormalised.
b) are there any hidden entities
Consider the following, it is a small hospital and all consultants can be uniquely identified by name.
Patient2 ( patientNo, patientName, consultantName)
patientNo patientName consultantName p234 Jones Smith p456 Heppinstall Smith p007 Peace Mallinder
What normal form is Patient2 in? The answer is that it is in 5NF, because it is very definitely in 3NF and it is not all key.
Yet there are update anomalies e.g. where would you store the name of a new consultant who hasn't yet got a case load?
It would be better to have a consultant entity. You can either make a separate existence list of consultants.
Patient2 ( patientNo, patientName, consultantName)
patientNo patientName consultantName p234 Jones Smith p456 Heppinstall Smith p007 Peace Mallinder
Consultant (consultantName)
consultantName Smith Smith Mallinder
or you could introduce a consultant identifier number in which case Consultant would be a result of normalisation
c) Are there any 'near' alternate keys
Suppose we had information about students, their registration number, their name, their home address and their LEA (local education authority).
Now home address does determine LEA. So normalisation would give:
Student (regNo, name, address) & Address (address, LEA)
This is not sensible,
- occasionally siblings attend the University at the same time, but it is unusual.
- we have no need to store an address in Address unless it also occurs also in Student (i.e. it is some student's home address)
- we never need to 'work out' the LEA from the address
In this case, where 'address' is almost an alternate key of Student, it is better to keep the design as below:
Student2 (regNo, name, address, LEA)
These notes are concerned with one aspect of the logical database design, i.e. normalisation.
The technique of normalisation is founded on strong theory, which we follow to 5NF. Fifth Normal Form is the final normal form based on lossless-join decomposition.
The unit concludes with a discussion of some of the limitations of normalisation.
To Contents
functional dependency update anomaly lossless join 1,2,3,4,5NF, BCNF full functional dependency multi-valued dependency trivial dependency logical database design single valued fact multivalued fact
To contents
6.1 exercise, you could ask a questions such as "do suppliers supply parts to a common pool or are parts supplied to specific named projects?" If the former then sno ->-> pno, and so SPJ would not be in 4NF.
6.1 exercise 2, similarly you could ask " given a dish such as spaghetti bolognaise, do all cooks use exactly the same ingredients or do cooks have their own recipies with different ingredients?" If the former then dish ->-> ingredient (i.e. the ingredients are indepented of who is cooking the dish) and uses is not in 4NF.
Last Updated: 17/06/02 by M.Davis@herts.ac.uk
© University of Hertfordshire Higher Education Corporation (2002)