Department of Computer Science

Database

Unit 6: Further Normalisation


To: Database course contents

Contents

Preview

Objectives

6.1   Fourth Normal Form
6.2   An aside on 4NF
6.3   Fifth Normal Form
6.4   Normalisation is not a panacea

Summary

Terms used

Preview

Functional dependencies are the most important dependencies in database design. However there are forms of redundancy that cannot be detected from functional dependencies, but can be detected using other dependencies.  There are two further normal forms Fourth Normal Form and Fifth Normal Form which deal with these additional forms of redundancy.

You also consider some cases where normalisation does not give the best conceptual design.

Reading

R&G Chapter 19 section 19.8.


Objectives

After you have studied this unit, you should be able to:


6.1 Fourth Normal Form

Normalisation up to BCNF is concerned with single valued facts. Fourth normal form is concerned with multivalued facts.

4NF asks whether an all key relation represents ONE complex multivalued fact, or two (or more) simpler multivalued facts.

Consider :   

Student4 (Sno, sname, tutorno, tutorname, courseno, coursename)

FDs

Using the closure algorithm we can determine that the key is {Sno, tutorno, courseno}.

Casting into BCNF we get

Student ( Sno, Sname)
Tutor ( tutorno, tutorname)
Course ( courseno,coursename)
and
STC (Sno, tutorno, courseno)

Student, Tutor and Course are in BCNF and we can be sure there is no redundancy.

STC is also in BCNF. But could there be redundancy in STC?  The answer is that depends on the semantics of STC. We will look at two cases

CASE1: STC means that each student does a set of courses, and each course is tutored by a set of tutors.

suppose we want to represent

 

We could represent this information in the relation STC as follows:

student course tutor
s1 c1 t1
s1 c1 t3
s1 c1 t5
s1 c5 t3
s1 c5 t7
s2 c1 t1
s2 c1 t3
s2 c1 t5

However there is quite a bit of redundancy, we see that student s1 studies course c5 more than once. In this case STC would be subject to update anomalies such as:

  1. we cannot store which tutors teach a course if no student is doing the course
  2. if the set of tutors changes it has to be changed for every student doing the course
  3. it is possible to introduce inconsistency (how?)

If fact although there is no functional dependency in STC there is a multivalued dependency (MVD).


Reading

Read R&G 19.8.1 for an explanation of a multivalued dependency.


Can you see that

  • course ->-> student
  • course ->-> tutor

A relation R is in 4NF iff for every MVD X ->->Y that holds in R either

In our example course ->-> student, but none of the above is true, therefore STC is not in 4NF.

STC should be split into two relations as follows:

student course
s1 c1
s1 c5
s2 c1

and

course tutor
c1 t1
c1 t3
c1 t5
c5 t3
c5 t7

This contains the same information as STC but is not subject to update anomalies.


CASE 2:
Now suppose that the semantics of STC are that different students see different tutors for the courses they study e.g.

In this case course does not multidetermine student, and STC is in 4NF. The three attributes are needed together to represent the information. (Try splitting it as in CASE1, what goes wrong?)


Another example : Consider the relation

    Drinks (Name, Pub, Beer)

This might mean that a drinker (Name) is picky about what he/she drinks where: maybe Fred likes to drink IPA or Luton Pride in the Hens' Teeth, but only likes Tweekstons in the Ferreter's Arms. In this case only one (complex) multivalued fact is being held, and the relation cannot be losslessly decomposed. It is already in 4NF.

However, it might be that Drinks represents TWO simpler multivalued facts, the set of pubs a drinker likes to visit, and the set of drinks they like to drink. If we use Drinks to hold these two multivalued facts it will be subject to update anomalies. In this case normalisation to 4NF would show that Drinks should be cast into  (Name, Beer) and (Name, Pub).


Exercises 

1) Suppose we have a relation SPJ (sno, pno, jno), this shows which suppliers (snos) supply which parts (pnos) to which projects (jnos).  What question(s) would you ask your user to find out whether SPJ is in 4NF? Remember you must phrase your question in terms of the problem domain, it is probably no good asking the user about   MVDs!

2) Suppose you have a relation concerned with cookery as follows:

uses ( cook, ingredient, dish )

What questions would you ask the user to find out whether uses is in 4NF or not?

Answers

To Contents


6.2 An aside on 4NF

Recall the relation SPJ (sno, pno, jno), this shows which suppliers (snos) supply which parts (pnos) to which projects (jnos).

Suppose you have decided it is in 4NF, it represents one complex multivalued fact -- which supplier supplies which parts to which projects.

In this case you need the complete SPJ (sno, pno, jno) relation to represent this complex fact.

THIS DOES NOT MEAN that you do not need relations such as canSupply(sno,pno), i.e. a list of which parts each supplier can supply, AS WELL. In an ideal world the roles of the attributes will have been carefully thought out before you begin to normalise, such an expectation is unrealistic.

To Contents


6.3 Fifth Normal Form

Reading

R&G 19.8.3 and 19.8.4


Fifth Normal Form is the last normal form based on lossless-join decomposition. An informal summary of 5NF is 'if even after 4NF you can still find a lossless way of splitting an all key relation, then you should do it'.

There is no easy way of testing whether a relation is in 5NF other than testing every possible split.


Here is an example which shows a relation which can be split into 3 projections without loss of data:

Flying Quills is a business that provides secretarial support for multi-lingual conferences. Thus

Flying Quills has a number of secretaries with various language skills.

If a secretary attends a conference then they do provided support in all the languages in which they are skilled.

Clearly Flying Quills want to see some output something like this:

conference need support
conference1 English

French

German

Fred, Sally

Fred

Sally

conference2

 

Mandarin

Russian

???

Fred

 

Now this is not normalised at all. Should this view be supported by the following base table?

FQ (conference, need, support)

conference need support
conference1 English Fred
conference1 English Sally
conference1 French Fred
conference1 German Sally
conference2 Russian Fred
conference2 Mandarin no-one
none Swedish Sven

Well this table turns out to be subject to many update anomalies. If we delete Fred we would also delete that conference1 need French, and conference2 needs Russian. We could start putting no-one in in place of Fred's name, but again this is a recipe for disaster, do we put no-one in line1 and if not how do we know not to?

Flying Quills depend on their reliability: this is just not going to work.

However we can cast the relation into 3 lossless projections... (omitting projections with none or no-one in as they are a bodge)

needs

conference need
conference1 English
conference1 French
conference1 German
conference2 Russian
conference2 Mandarin

provides

need support
English Fred
English Sally
French Fred
German Sally
Russian Fred
Swedish Sven

assigned

conference support
conference1 Fred
conference1 Sally
conference2 Fred

 

Oddly if you join (natural join) two of these relations you potentially create spurious tuples,  but if you join all three

e.g. ((needs nat join provides) nat join assigned)

the result is lossless.

So, Flying Quills should store their data in three tables and derive the overall view as an when they want it.


Recognising Relations not in 5NF

Firstly any relation in 3NF which is not all key is also in 5NF.

Relations which are in 4NF but not 5NF are rare.

A relation is in 4NF but not 5NF if and only if a specific pattern occurs in the data, and that is that three facts in the data imply a third

  • fact a,b      e.g. conference1 needs English, and
  • fact b,c      e.g. English is one of Fred's skills, and
  • fact c,a       e.g. Fred is attending conference1

imply

  • fact a,b,c    Fred will provide English at conference1

To Contents


6.4 Normalisation is not a panacea

In this section we consider the semantics of the data, and whether the normalisation process has been semantically rich enough:

It is not sufficient just to normalise data and leave it at that, there are a number of questions that should be asked about the data before the design is implemented. We look at 3 examples, there are more, as you gain experience in database design you will be able to add to this list.

a) Are any key values likely to be null?

Given relation R (A,B,C,D)

and FDs {{A->B}, {A->C}, {C->D}}

Normalisation produces:

R1(A,B,C)  &   R2(C,D)

But what if values of the attribute C are prone to being unknown, or not existing.

For example

Person (PersonId, name, birthdate, starsign)

Suppose people often withhold their birthdate, but will disclose their star sign e.g.

PersonId name birthdate starsign
p07 Mallinder null Libra

If we normalise we can store

PersonId name birthdate
p07 mallinder null

but we can't put anything in here as we can't have a null value for a key

birthdate starsign

So, if we normalise we cannot store that P07's starsign is Libra.

If key values are likely to be null, the design must be denormalised.


b) are there any hidden entities

Consider the following, it is a small hospital and all consultants can be uniquely identified by name.

Patient2 ( patientNo, patientName, consultantName)

patientNo patientName consultantName
p234 Jones Smith
p456 Heppinstall Smith
p007 Peace Mallinder

What normal form is Patient2 in? The answer is that it is in 5NF, because it is very definitely in 3NF and it is not all key.

Yet there are update anomalies e.g. where would you store the name of a new consultant who hasn't yet got a case load?

It would be better to have a consultant entity. You can either make a separate existence list of consultants.

Patient2 ( patientNo, patientName, consultantName)

patientNo patientName consultantName
p234 Jones Smith
p456 Heppinstall Smith
p007 Peace Mallinder

 

Consultant (consultantName)

consultantName
Smith
Smith
Mallinder

or you could introduce a consultant identifier number in which case Consultant would be  a result of normalisation


c) Are there any 'near' alternate keys

Suppose we had information about students,  their registration number, their name, their home address and their LEA (local education authority).

Now home address does determine LEA. So normalisation would give:

Student (regNo, name, address)   &  Address (address, LEA)

This is not sensible,

In this case, where 'address' is almost an alternate key of Student, it is better to keep the design as below:

Student2 (regNo, name, address, LEA)


Summary

These notes are concerned with one aspect of  the logical database design, i.e. normalisation.

The technique of normalisation is founded on strong theory, which we follow to 5NF. Fifth Normal Form is the final normal form based on lossless-join decomposition.

The unit concludes with a discussion of some of the limitations of normalisation.

To Contents


Terms

functional dependency update anomaly
lossless join
1,2,3,4,5NF, BCNF full functional dependency
multi-valued dependency
trivial dependency
logical database design single valued fact
multivalued fact

To contents


Answers

6.1 exercise, you could ask a questions such as "do suppliers supply parts to a common pool or are parts supplied to specific named projects?"  If the former then sno ->-> pno, and so SPJ would not be in 4NF.

6.1 exercise 2, similarly you could ask " given a dish such as spaghetti bolognaise, do all cooks use exactly the same ingredients or do cooks have their own recipies with different ingredients?" If the former then dish ->-> ingredient (i.e. the ingredients are indepented of who is cooking the dish) and uses is not in 4NF.


Last Updated: 17/06/02 by M.Davis@herts.ac.uk

© University of Hertfordshire Higher Education Corporation (2002)

Disclaimer