uhlogo.gif (6404 bytes) Department of Computer Science

Data Dictionaries

To: Systems Design and Development


Read pp 8-18 of Notes, and prepare Exercises 1.4-1.6 for next tutorial.

Notation for representing structure of data items.

Need to express:

composition (sequence?) - how an item is made up of simpler ones (its attributes).

repetition - items which are repeated in (e.g.) lists, arrays, etc.

selection - values for items chosen from alternatives

optionality - items which are not always present.

E.g: The Tutorial List I put up on the CS2 Notice Board.

The data on this consists of:

a Title
a Version Number
a Date
a day and time and room and a list of students' names
another day and time and room and list of students' names
(yet) another day and time and room and list of students' names

In this example, we have composition and repetition.

How much repetition?


Symbols used in DD Notation

(see Notes, page 9)

Names a meaningful name for every composite or basic data item

=        means 'is defined as', or 'is made up of'

+        means 'and'

{}        means zero or more of whatever's inside, i.e. repetition

n{}m        means between n and m (inclusive)

[ | | ]        means one of the listed attributes is present.

()        means item inside is optional

" "        enclose literals (actual values)

* *           enclose comments - define meaning of data, informally.


Tutorial List Example

TutorialList = Title + VersionNumber + Date + {TutorialDetails}

TutorialDetails = DayOfWeek + TimeSlot + Room + StudentList

StudentList = {FamilyName + FirstName}

or…..

TutorialList = Title + VersionNumber + Date +
                        {DayOfWeek + TimeSlot + Room + {FamilyName +
                         FirstName}}

How far do we go?

VersionNumber = Digit +"." + Digit

Digit = ["1"|"2"|"3"|"4"|…..]

TimeSlot = StartHour + "-" + EndHour

StartHour = ["9"|"10"|"11"|"12"…..]

too far?

Questions???


 SDD Course Plan

CoursePlan = DateOfPlan + VersionNumber + Titles + {WeekDetails}

WeekDetails = WeekNo + StartDate +
                                [TeachingWeek |NonTeachingWeek]

NonTeachingWeek = ["admin week"| "induction week"|
                                        "student centred learning"|..]

TeachingWeek = 2{LectureDetails}2 + (TutorialDetails) + (PracticalWork)

LectureDetails = *Description of Lecture Content*

StartDate = Date
DateOfPlan = Date
Date = *date in format "dd-mmm-yy"*

etc, etc.


Data Dictionary - Conclusions

Simple way of describing syntax of composite data.

Meaning or Semantics captured informally, by:

'Meaningful' names for data

Comments explaining constraints and usage

Wide range of uses - any situation in which composite data is found:

particularly data 'on the move'

particularly data crossing the system boundary:

customers' source documents

input data on data entry forms

output data on reports/screens, etc

Internal data needs more attention.....

 



Entity-Relationship Modelling

Read pp19-39 of Notes (next week).

Models the system's internal data, persistent/stored data, 'data at rest'

A 'logical' model - independent of current or eventual implementation; no consideration of where it will be stored, on what media, etc.

Aims:

identify simple data items and relationships between them (complex items reduced to simple ones)

a model of the stored data requirements independent of specific processing requirements

a minimal model - no duplicated data, no data that can be derived from other data

Some similarities with 'Object Modelling' (later).

Origins with relational database design:

may provide basis for system design around a RDBMS

...but we also use it as a fundamental analysis technique (not just useful when a database implementation is anticipated)

Uses DD notation - but rules about which structures are permitted ("normal forms)".


ER Modelling - Terminology

entity simple class or type of 'thing' or object the system needs to hold data about.

e.g. STUDENT

entity occurrence one particular STUDENT

STUDENT = {Student}

attributes of an entity data items which make up that entity

e.g. Student = Name + Address + EnrolNo + DateOfBirth + …

value of attribute value for a particular occurrence

e.g ("Fred Karno", "42 …", 452189, 18/5/79, …..)

key of entity attribute or group of attributes whose value(s) uniquely identify an occurrence

e.g. Student = Name + Address + EnrolNo + DateOfBirth + …

relationship an association between two entities

e.g. STUDENT is registered for COURSE

 

foreign key attribute of one entity which is key of another - establishes relationship.

e.g. Student = Name + Address + EnrolNo + DateOfBirth + CourseCode* + …

(note - only allows one COURSE per STUDENT!)

cardinality/degree of relationship (between A and B)

1 to 1 each A occurrence is related to just 1 B occurrence, and vice versa

reminv.gif (1636 bytes)

 

1 to many each A occurrence is related to many B occurrences, but each B is related to only one A

custord.gif (1671 bytes)

 

many to many each A occurrence is related to many B occurrences, and vice versa (problematic!).

studcour.gif (1708 bytes)


ER Modelling - Example

A Project Management System

Each employee belongs to a single department, and has one manager within that department. An employee can work on several projects at the same time.

Each project has a start date, a finish date, and a number of employees assigned to it. One employee is assigned as the project manager. Projects are identified by a project code. There is a requirement to list all projects due to finish in a certain week.

Most projects are carried out for a single client, although there are internal projects for which there is no client. At any one time, a client may have several projects under way. Clients are identified by a code.


Entities as Tables

PROJECT = {Project}

Project = ProjectCode + StartDate + FinishDate + …..

Project
Code           StartDate                    FinishDate ……..

4713 1/1/98 31/1/98
4722 1/3/98 31/10/99
4801 1/6/98 31/12/00

CLIENT = {Client}

Client = ClientCode + CompanyName + CompanyAddress + ..

173 Joe Brown & Sons 32 High St
177 Finefare Supermarkets 77 Upper St
182 Wondrous Widgets 19 Hornsey Lane

 

The Client-Project Relationship

clieproj.gif (1634 bytes)

 

Project = ProjectCode + StartDate + FinishDate + ClientCode* + …..

Code         StartDate                      FinishDate                  Code ……

4713 1/1/98 31/1/98 177
4722 1/3/98 31/10/99 215
4801 1/6/98 31/12/00 177

ER Modelling (Continued)

Notes p27:

RULE FOR A GOOD ENTITY MODEL

For each entity.....
    the value of every attribute must depend on ...

        the key
        the whole key
        nothing but the key


Example - Hotel Bookings

The room booking side of a small hotel is to be computerised. The hotel has a number of rooms.

Each room has a basic (double) price and a supplementary price for extra children. These prices also depend on the time of year - it is more expensive at Christmas, during the summer and around bank holidays, for example. There are 3 seasonal bands.

The system must enable the hotel proprietor to answer phone calls from prospective clients (eg rooms available now and in the future, with costs), make provisional bookings, do mailings of previous clients, prepare clients' bill (ignore extras such as papers, drinks etc).


How do we go about creating a first entity model for a system?

• Make a list of objects about which the system must store data - your candidate entities.
        Think about what you know of the client's "problem domain" - e.g: nouns in problem statement.

• Investigate the attributes of each candidate entity. Write a Data Dictionary entry.

• Make sure occurrences of each candidate entity can be uniquely identified. Check there is a key.

• Remove duplicated and derived data. Ensure that (non-key) data items appear only in one place in the model.
        Check that the RULE applies.

• Investigate relationships between the candidate entities.
        Draft an Entity Relationship Diagram.

• Validate the entity descriptions.
        Make sure you have captured all the data the system needs to function!
        Check that the RULE applies.

• Iterate!


Hotel Bookings ER Model

We are obviously concerned with entities such as CLIENT, ROOM, BOOKING. A first attempt:

ROOM = {Room}
Room = RoomNumber + NumberOfDoubles + NumberOfSingles + WashingFacilities +
                    *other info about the room* + {Season + BasicPrice + SupplementaryPrice}

Season = ["low"|"middle"|"high"]

CLIENT = {Client}
Client = NameAndAddress + ClientTelephoneNumber + *other info*
NameAndAddress = ClientName + ClientAddress

BOOKING ={Booking}
Booking = RoomNumber + DateOfArrival + NameAndAddress* + NumberOfGuests +
                        ExpectedStay *in days* + (Deposit)

 hotel1.gif (2604 bytes)

 

Does the RULE apply? No - there are attributes in ROOM which cannot be found just from the key. To find the prices, you need both to know the RoomNumber and the Season.

We solve this problem by creating a new entity ROOM-PRICE, which allows us to find the prices for any room in any season. These attributes are removed from Room:

Room = RoomNumber + NumberOfDoubles + NumberOfSingles + WashingFacilities +
                *other info about the room*

ROOM-PRICE = RoomNumber + Season + BasicPrice + SupplementaryPrice
Season = ["low"|"middle"|"high"]

 hotel2.gif (1681 bytes)

 What about current occupancy? Some clients will book and then occupy, some will occupy without booking, some will book without occupying......

ROOM-OCCUPANCY = {RoomOccupancy}
RoomOccupancy = RoomNumber + DateArrived + NameAndAddress + (ExpectedLeavingDate)

 

 hotel3.gif (3730 bytes)

 

 Can we find the Booking from the Room-Occupancy, and if not what else do we need?


 Data Normalization and Normal Forms

Techniques which help in eliminating redundancy, and in deriving relationships.

They can be used either to check that a proposed entity model is minimal, or to derive a minimal model from an inferior one.

 

These rules will not help you if you don’t understand the data and their inter-relationships.


First Normal Form

An entity is in 1NF if and only if it has a key and there are no repeated attributes or groups of attributes.

To get to 1NF:

  1. select a key
  1. for each repeating group:

a) remove it from the entity,

b) create a new entity with these attributes and a key which consists of both the key to the original entity and a key to the repeating group.


Second Normal Form

An entity is in 2NF if and only if it is in 1NF, and has no attributes which require only part of the key to identify them uniquely (hence only applies to entities with composite keys).

To get to 2NF:

For each subset of the key which identifies an attribute (or set of attributes):

  1. create a new entity with this (subset of the) original key and these attributes
  2. check that this new entity has a key

Third Normal Form

An entity is in 3NF if and only if it is in 2NF, and has no attributes which depend only on some other non-key attribute(s).

To get to 3NF:

For each pair of (sets of) attributes which are so related:

  1. decide on the direction of dependency. If given A, the value of B is uniquely determined, then B depends on A, in which case:

    a)  make A the key to a new entity, C, and include B as attribute(s) of C

    b)  leave A in the original entity as a ‘foreign’ key to C, but remove B from the original entity.


Normalization Example

STOCKINFO =

{ProductCode + ProductDescription + ProductPrice + ProductClassCode + ProductClassDescription + TotalStock +

{AreaCode + AreaName +

{DepotNo + DepotAddress + ProductQtyIS + ProductROL

    }

}

}

--------------------------------------------------

To get to 1NF, we have to remove the iterations and identify keys. Renaming the entity, and taking out the {AreaCode...} iteration gives:-

PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
                ProductClassDescription + TotalStock

AREA = {Area}
Area = ProductCode + AreaCode + AreaName +{DepotNo + DepotAddress +
         ProductQtyIS  + ProductROL}

------------------------------------------------

Taking out the inner {DepotNo...} iteration gives:-

PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
                 ProductClassDescription + TotalStock

AREA = {Area}
Area = ProductCode + AreaCode + AreaName

DEPOT = {Depot}
Depot = ProductCode + AreaCode + DepotNo + DepotAddress + ProductQtyIS + ProductROL

-------------------------------------------------

Now in 1st NF as there are no repeating groups. Perhaps we should note that the names are somewhat misleading - AREA is about what products are stored in which areas, and DEPOT likewise contains information about products stored in depots in each area.

Now we look for attributes which only depend on part of a key:-

PRODUCT = {Product}

Product is already in 2NF, as it doesn't have a compound key.

AreaName in AREA only depends on AreaCode, giving two entities, which we will call AREA_1 and AREA_2 for the moment:-

AREA_1 = {Area_1}
Area_1 = Product_Code + AreaCode

AREA_2 = {Area_2}
Area_2 = AreaCode + AreaName

DepotAddress depends only on DepotNo and AreaCode components of the key - assuming that depots are numbered in the same way in each area (so that, for example, there is a Depot 2 in Area B and also a Depot 2 in Area C.

We assume that this entity is about a particular product, in a particular depot in a particular area, so the other attributes depend on the whole key.

DEPOT_1 = {Depot_1}
Depot_1 = ProductCode + AreaCode + DepotNo + ProductQtyIS + ProductROL

DEPOT_2 = {Depot_2}
Depot_2 = AreaCode + DepotNo + DepotAddress

-------------------------------------------------

Now in 2nd NF. We can now see that AREA_1 simply tells us which products are in which areas; we have this information already in DEPOT_1, so I'll scrap AREA_1. AREA_2 tells us what names particular areas have, so I'll simply call it AREA. Similarly DEPOT_2 will be renamed DEPOT, and DEPOT_1 will be renamed PRODUCT_IN_DEPOT.

Summarizing at this stage (second normal form), and replacing ‘AreaCode + DepotNo’ in PRODUCT_IN_DEPOT by ‘KeyToDepot’ gives:-

PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
                 ProductClassDescription + TotalStock

AREA = {Area}
Area = AreaCode + AreaName

DEPOT = {Depot}
Depot = KeyToDepot + DepotAddress
KeyToDepot = AreaCode + DepotNo

PRODUCT_IN_DEPOT ={Product_in_Depot}
Product_in_Depot = ProductCode + KeyToDepot + ProductQtyIS + ProductROL

-------------------------------------------------

Now look for attributes that depend on other non-key attributes.

ProductClassDescription depends only on ProductClassCode, so we split PRODUCT.

Also ‘TotalStock’ in PRODUCT is derivable by summing all the values of ’ProductQtyIS’ for all occurrences of PRODUCT_IN_DEPOT, so we remove it:-

PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode*

PRODUCT_CLASS = {Product_Class}
Product_Class = ProductClassCode + ProductClassDescription

AREA = {Area}
Area = AreaCode + AreaName

PRODUCT_IN_DEPOT = {Product_in_Depot}
Product_in_Depot = ProductCode + KeyToDepot + ProductQtyIS + ProductROL

DEPOT = {Depot}
Depot = KeyToDepot + DepotAddress
KeyToDepot = AreaCode + DepotNo

Now in 3NF. All the other non-key attributes are independent.

 

stock.gif (3668 bytes)


Last Updated: 27/10/98 by M.Wood@herts.ac.uk

© University of Hertfordshire Higher Education Corporation (1998)

uhlogo.gif (6404 bytes)Disclaimer