Department of Computer ScienceRead pp 8-18 of Notes, and prepare Exercises 1.4-1.6 for next tutorial.
Notation for representing structure of data items.
Need to express:
composition (sequence?) - how an item is made up of simpler ones (its attributes).
repetition - items which are repeated in (e.g.) lists, arrays, etc.
selection - values for items chosen from alternatives
optionality - items which are not always present.
E.g: The Tutorial List I put up on the CS2 Notice Board.
The data on this consists of:
a Title
a Version Number
a Date
a day and time and room and a list of students' names
another day and time and room and list of students' names
(yet) another day and time and room and list of students' names
In this example, we have composition and repetition.
How much repetition?
(see Notes, page 9)
Names a meaningful name for every composite or basic data item
= means 'is defined as', or 'is made up of'
+ means 'and'
{} means zero or more of whatever's inside, i.e. repetition
n{}m means between n and m (inclusive)
[ | | ] means one of the listed attributes is present.
() means item inside is optional
" " enclose literals (actual values)
* * enclose comments - define meaning of data, informally.
Tutorial List Example
TutorialList = Title + VersionNumber + Date + {TutorialDetails}
TutorialDetails = DayOfWeek + TimeSlot + Room + StudentList
StudentList = {FamilyName + FirstName}
or ..
TutorialList = Title + VersionNumber + Date +
{DayOfWeek + TimeSlot + Room + {FamilyName +
FirstName}}
VersionNumber = Digit +"." + Digit
Digit = ["1"|"2"|"3"|"4"| ..]
TimeSlot = StartHour + "-" + EndHour
StartHour = ["9"|"10"|"11"|"12" ..]
too far?
Questions???
CoursePlan = DateOfPlan + VersionNumber + Titles + {WeekDetails}
WeekDetails = WeekNo + StartDate +
[TeachingWeek |NonTeachingWeek]NonTeachingWeek = ["admin week"| "induction week"|
"student centred learning"|..]TeachingWeek = 2{LectureDetails}2 + (TutorialDetails) + (PracticalWork)
LectureDetails = *Description of Lecture Content*
StartDate = Date
DateOfPlan = Date
Date = *date in format "dd-mmm-yy"*etc, etc.
Simple way of describing syntax of composite data.
Meaning or Semantics captured informally, by:
'Meaningful' names for data
Comments explaining constraints and usage
Wide range of uses - any situation in which composite data is found:
particularly data 'on the move'
particularly data crossing the system boundary:
customers' source documents
input data on data entry forms
output data on reports/screens, etc
Internal data needs more attention.....
Read pp19-39 of Notes (next week).
Models the system's internal data, persistent/stored data, 'data at rest'
A 'logical' model - independent of current or eventual implementation; no consideration of where it will be stored, on what media, etc.
Aims:
identify simple data items and relationships between them (complex items reduced to simple ones)
a model of the stored data requirements independent of specific processing requirements
a minimal model - no duplicated data, no data that can be derived from other data
Some similarities with 'Object Modelling' (later).
Origins with relational database design:
may provide basis for system design around a RDBMS
...but we also use it as a fundamental analysis technique (not just useful when a database implementation is anticipated)
Uses DD notation - but rules about which structures are permitted ("normal forms)".
entity simple class or type of 'thing' or object the system needs to hold data about.
e.g. STUDENT
entity occurrence one particular STUDENT
STUDENT = {Student}
attributes of an entity data items which make up that entity
e.g. Student = Name + Address + EnrolNo + DateOfBirth +
value of attribute value for a particular occurrence
e.g ("Fred Karno", "42 ", 452189, 18/5/79, ..)
key of entity attribute or group of attributes whose value(s) uniquely identify an occurrence
e.g. Student = Name + Address + EnrolNo + DateOfBirth +
relationship an association between two entities
e.g. STUDENT is registered for COURSE
foreign key attribute of one entity which is key of another - establishes relationship.
e.g. Student = Name + Address + EnrolNo + DateOfBirth + CourseCode* +
(note - only allows one COURSE per STUDENT!)
cardinality/degree of relationship (between A and B)
1 to 1 each A occurrence is related to just 1 B occurrence, and vice versa
1 to many each A occurrence is related to many B occurrences, but each B is related to only one A
many to many each A occurrence is related to many B occurrences, and vice versa (problematic!).
ER Modelling - Example
Each employee belongs to a single department, and has one manager within that department. An employee can work on several projects at the same time.
Each project has a start date, a finish date, and a number of employees assigned to it. One employee is assigned as the project manager. Projects are identified by a project code. There is a requirement to list all projects due to finish in a certain week.
Most projects are carried out for a single client, although there are internal projects for which there is no client. At any one time, a client may have several projects under way. Clients are identified by a code.
PROJECT = {Project}
Project = ProjectCode + StartDate + FinishDate + ..
Project
Code StartDate
FinishDate
..
| 4713 | 1/1/98 | 31/1/98 | |
| 4722 | 1/3/98 | 31/10/99 | |
| 4801 | 1/6/98 | 31/12/00 | |
CLIENT = {Client}
Client = ClientCode + CompanyName + CompanyAddress + ..
| 173 | Joe Brown & Sons | 32 High St | |
| 177 | Finefare Supermarkets | 77 Upper St | |
| 182 | Wondrous Widgets | 19 Hornsey Lane | |

Project = ProjectCode + StartDate + FinishDate + ClientCode* + ..
Code StartDate FinishDate Code
| 4713 | 1/1/98 | 31/1/98 | 177 | |
| 4722 | 1/3/98 | 31/10/99 | 215 | |
| 4801 | 1/6/98 | 31/12/00 | 177 | |
ER Modelling (Continued)
Notes p27:
RULE FOR A GOOD ENTITY MODEL
For each entity.....
the value of every attribute must depend on ...
the key
the whole key
nothing but the key
Example - Hotel Bookings
The room booking side of a small hotel is to be computerised. The hotel has a number of rooms.
Each room has a basic (double) price and a supplementary price for extra children. These prices also depend on the time of year - it is more expensive at Christmas, during the summer and around bank holidays, for example. There are 3 seasonal bands.
The system must enable the hotel proprietor to answer phone calls from prospective clients (eg rooms available now and in the future, with costs), make provisional bookings, do mailings of previous clients, prepare clients' bill (ignore extras such as papers, drinks etc).
Make a list of objects about which the system must store data - your candidate entities.
Think about what you know of the client's "problem domain" - e.g: nouns in problem statement. Investigate the attributes of each candidate entity. Write a Data Dictionary entry.
Make sure occurrences of each candidate entity can be uniquely identified. Check there is a key.
Remove duplicated and derived data. Ensure that (non-key) data items appear only in one place in the model.
Check that the RULE applies. Investigate relationships between the candidate entities.
Draft an Entity Relationship Diagram. Validate the entity descriptions.
Make sure you have captured all the data the system needs to function!
Check that the RULE applies. Iterate!
We are obviously concerned with entities such as CLIENT, ROOM, BOOKING. A first attempt:
ROOM = {Room}
Room = RoomNumber + NumberOfDoubles + NumberOfSingles + WashingFacilities +
*other info about the room* + {Season + BasicPrice + SupplementaryPrice}Season = ["low"|"middle"|"high"]
CLIENT = {Client}
Client = NameAndAddress + ClientTelephoneNumber + *other info*
NameAndAddress = ClientName + ClientAddressBOOKING ={Booking}
Booking = RoomNumber + DateOfArrival + NameAndAddress* + NumberOfGuests +
ExpectedStay *in days* + (Deposit)

Does the RULE apply? No - there are attributes in ROOM which cannot be found just from the key. To find the prices, you need both to know the RoomNumber and the Season.
We solve this problem by creating a new entity ROOM-PRICE, which allows us to find the prices for any room in any season. These attributes are removed from Room:
Room = RoomNumber + NumberOfDoubles + NumberOfSingles + WashingFacilities +
*other info about the room*ROOM-PRICE = RoomNumber + Season + BasicPrice + SupplementaryPrice
Season = ["low"|"middle"|"high"]

What about current occupancy? Some clients will book and then occupy, some will occupy without booking, some will book without occupying......
ROOM-OCCUPANCY = {RoomOccupancy}
RoomOccupancy = RoomNumber + DateArrived + NameAndAddress + (ExpectedLeavingDate)

Can we find the Booking from the Room-Occupancy, and if not what else do we need?
Techniques which help in eliminating redundancy, and in deriving relationships.
They can be used either to check that a proposed entity model is minimal, or to derive a minimal model from an inferior one.
These rules will not help you if you dont understand the data and their inter-relationships.
An entity is in 1NF if and only if it has a key and there are no repeated attributes or groups of attributes.
To get to 1NF:
- select a key
- for each repeating group:
a) remove it from the entity,
b) create a new entity with these attributes and a key which consists of both the key to the original entity and a key to the repeating group.
Second Normal Form
An entity is in 2NF if and only if it is in 1NF, and has no attributes which require only part of the key to identify them uniquely (hence only applies to entities with composite keys).
To get to 2NF:
For each subset of the key which identifies an attribute (or set of attributes):
- create a new entity with this (subset of the) original key and these attributes
- check that this new entity has a key
An entity is in 3NF if and only if it is in 2NF, and has no attributes which depend only on some other non-key attribute(s).
To get to 3NF:
For each pair of (sets of) attributes which are so related:
- decide on the direction of dependency. If given A, the value of B is uniquely determined, then B depends on A, in which case:
a) make A the key to a new entity, C, and include B as attribute(s) of C
b) leave A in the original entity as a foreign key to C, but remove B from the original entity.
STOCKINFO =
{ProductCode + ProductDescription + ProductPrice + ProductClassCode + ProductClassDescription + TotalStock +
{AreaCode + AreaName +
{DepotNo + DepotAddress + ProductQtyIS + ProductROL
}
}
}
--------------------------------------------------
To get to 1NF, we have to remove the iterations and identify keys. Renaming the entity, and taking out the {AreaCode...} iteration gives:-
PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
ProductClassDescription + TotalStockAREA = {Area}
Area = ProductCode + AreaCode + AreaName +{DepotNo + DepotAddress +
ProductQtyIS + ProductROL}------------------------------------------------
Taking out the inner {DepotNo...} iteration gives:-
PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
ProductClassDescription + TotalStockAREA = {Area}
Area = ProductCode + AreaCode + AreaNameDEPOT = {Depot}
Depot = ProductCode + AreaCode + DepotNo + DepotAddress + ProductQtyIS + ProductROL-------------------------------------------------
Now in 1st NF as there are no repeating groups. Perhaps we should note that the names are somewhat misleading - AREA is about what products are stored in which areas, and DEPOT likewise contains information about products stored in depots in each area.
Now we look for attributes which only depend on part of a key:-
PRODUCT = {Product}
Product is already in 2NF, as it doesn't have a compound key.
AreaName in AREA only depends on AreaCode, giving two entities, which we will call AREA_1 and AREA_2 for the moment:-
AREA_1 = {Area_1}
Area_1 = Product_Code + AreaCodeAREA_2 = {Area_2}
Area_2 = AreaCode + AreaName
DepotAddress depends only on DepotNo and AreaCode components of the key - assuming that depots are numbered in the same way in each area (so that, for example, there is a Depot 2 in Area B and also a Depot 2 in Area C.
We assume that this entity is about a particular product, in a particular depot in a particular area, so the other attributes depend on the whole key.
DEPOT_1 = {Depot_1}
Depot_1 = ProductCode + AreaCode + DepotNo + ProductQtyIS + ProductROLDEPOT_2 = {Depot_2}
Depot_2 = AreaCode + DepotNo + DepotAddress-------------------------------------------------
Now in 2nd NF. We can now see that AREA_1 simply tells us which products are in which areas; we have this information already in DEPOT_1, so I'll scrap AREA_1. AREA_2 tells us what names particular areas have, so I'll simply call it AREA. Similarly DEPOT_2 will be renamed DEPOT, and DEPOT_1 will be renamed PRODUCT_IN_DEPOT.
Summarizing at this stage (second normal form), and replacing AreaCode + DepotNo in PRODUCT_IN_DEPOT by KeyToDepot gives:-
PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode +
ProductClassDescription + TotalStockAREA = {Area}
Area = AreaCode + AreaNameDEPOT = {Depot}
Depot = KeyToDepot + DepotAddress
KeyToDepot = AreaCode + DepotNoPRODUCT_IN_DEPOT ={Product_in_Depot}
Product_in_Depot = ProductCode + KeyToDepot + ProductQtyIS + ProductROL-------------------------------------------------
Now look for attributes that depend on other non-key attributes.
ProductClassDescription depends only on ProductClassCode, so we split PRODUCT.
Also TotalStock in PRODUCT is derivable by summing all the values of ProductQtyIS for all occurrences of PRODUCT_IN_DEPOT, so we remove it:-
PRODUCT = {Product}
Product = ProductCode + ProductDescription + ProductPrice + ProductClassCode*PRODUCT_CLASS = {Product_Class}
Product_Class = ProductClassCode + ProductClassDescriptionAREA = {Area}
Area = AreaCode + AreaNamePRODUCT_IN_DEPOT = {Product_in_Depot}
Product_in_Depot = ProductCode + KeyToDepot + ProductQtyIS + ProductROLDEPOT = {Depot}
Depot = KeyToDepot + DepotAddress
KeyToDepot = AreaCode + DepotNoNow in 3NF. All the other non-key attributes are independent.

Last Updated: 27/10/98 by M.Wood@herts.ac.uk
© University of Hertfordshire Higher Education Corporation (1998)