Normalization Homework

This assignment will give you more more practice working with Normalization.

Deadline: Wednesday 2/24 before class. NO LATE SUBMISSIONS WILL BE ACCEPTED!

Note: Unlike most homeworks in this class, for this assignment you may discuss your approach and solutions with other students (including in the class’s public Slack channels). The assignment will only be graded for completion. However, you will need to thoroughly understand these concepts in order to do well on the exam!

Why are we allowing you to copy answers from other students? Well, actually you shouldn’t do that if you want to do well, but you should discuss your ideas and justify your approaches with each other. Explaining to other students how to solve a problem is the best way to learn.

Assignment Repository

Create a github classrooms repository using this link: https://classroom.github.com/a/ipq46Wav

All students must create their own repository and must type out their own answers, even if they work together with other students.

For this assignment you must answer the questions in the Readme.md file (also shown below). You should enter your answers directly into that file. All of your answers should be within <blockquote> tags so they are easy to see.

Question 1: Update Anomalies

Consider the schema below which captures the same attributes that we store in the employee database (homework 3). Bold attributes (i.e., surrounded by ** in markdown) are the key to that relation. Assume the “usual” dependencies (from the schema defined in class/homework).

EMP_DEPT (ENAME, SSN, BDATE, ADDR,DNO, DNAME, MGRSSN)
EMP_PROJ (SSN, PNUM, HOURS, ENAME, PNAME, PLOC)

For this schema, do you expect any problems (update anomalies) with any of the following queries? Explain (with examples).

a) Insert the information “John Smith works in department 5” (with John Smith’s birthdate, ssn, address and dno, dname and mgrssn).
b) Create a new “Sales” department in the company with department number 9.
c) Delete Project number 10.

Question 2: Normal Forms

A. Explain how a Relation that meets First Normal Form (1NF) can reduce redundancy compared to a Relation which is not normalized.

B. Explain how a Relation that meets Second Normal Form (2NF) can reduce redundancy compared to a Relation that only meets 1NF.

C. Explain how a Relation that meets Third Normal Form (3NF) can reduce redundancy compared to a Relation that only meets 2NF.

D. Explain how a Relation that meets Boyce-Codd Normal Form (BCNF) can reduce redundancy compared to a Relation that only meets 3NF.

Question 3: Functional Dependencies

Consider a Relation R3 = (A, B, C, D, E, F) which has the following functional dependencies F:

A -> BC
CD -> E
CD -> F
B -> F

Which of the following must also hold:

A -> B
A -> C
A -> E
A -> F
C -> E
AD -> E

Question 4: FDs and Candidate Keys

We know that a Candidate Key is a minimal set of attributes which are sufficient to uniquely identify each tuple in a relation, i.e., all other attributes must be functionally dependent on the set of attributes that make up the Candidate Key. Thus a candidate key must be a minimal set of attributes which can appear on the left hand side of functional dependencies, but will produce a closure that includes all other attributes on the right hand side.

A. Consider the Relation R3 = (A, B, C, D, E, F) from the prior question, with the same set of functional dependencies. What Attributes can be used to define a Candidate Key for R3?

B. Consider the Relation R4 = (A,C,B,D,E), with Functional Dependencies: A -> B, C -> D. What is the Candidate Key for R4?

Question 5: FDs and Normal Forms

A. Consider the Relation R5 = (V, W, X, Y, Z), with the following functional dependencies:

V -> X
WY -> X
VWY -> Z

In this relation, (V, W, Y) is the Candidate Key. What normal form does R5 satisfy? You may assume that all tuples are unique and attributes are atomic.

B. Consider the relation R3 = (A, B, C, D), with the following functional dependencies:

AB -> C
C -> D

What is the Candidate Key for this relation? What normal form does R3 satisfy? You may assume that all tuples are unique and attributes are atomic.

Question 6: Decomposition

Suppose we decompose Relation R5 (described above) into two tables, R51 and R52:

R51 = (V, W, Y, Z)
R52 = (V, W, X)

A. Will this be a loss-free decomposition, i.e., will we still be able to reconstruct all data by joining the two tables together? What normal form will R51 and R52 be in?

B. Propose a set of tables which will provide a loss-free decomposition and ensures all tables are in 3NF.

Question 7: More Decomposition

Consider data about class registrations: Class = (course_id, title, dept_name, credits, sec_id, semester, year, building, room_number, capacity, time_slot_id)

For conciseness, we will abbreviate all tables as follows: Course_id (C), title (T), dept_name (D), credits (H), sec_id (S), semester (A), year (Y), building (B), room_number (R), capacity (N), time_slot_id (M).

Thus our simplified input schema is Class = (C,T,D,H,S,A,Y,B,R,N,M).

This relation has the following dependencies:

C -> TDH (given a course ID, we know the title, department, and credit hours)
BR -> N (Given a building and room number we know the room’s capacity)
CSAY -> BRM (Given a course ID, section ID, and semester, we know the building, room number, and time slot it was taught in)

A. What is the Candidate Key for the Class relation?

B. Decompose the Class Relation into smaller tables until it achieves BCNF normalization. You should be able to reach BCNF after two stages of decomposition (i.e., using 3 tables).