NetNews Usenet Archive 1992 #27

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #27 / NN_1992_27.iso / spool / comp / database / 8045 < prev next >

Wrap

Internet Message Format | 1992-11-23 | 7.1 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!natinst.com!hrd769.brooks.af.mil!hrd769.brooks.af.mil!not-for-mail From: burgess@hrd769.brooks.af.mil (Dave Burgess) Newsgroups: comp.databases Subject: Re: Normalizing 2NF -> 3NF Date: 23 Nov 1992 11:21:39 -0600 Organization: Armstrong Lab MIS, Brooks AFB TX Lines: 128 Distribution: world Message-ID: <1er3v3INNqfc@hrd769.brooks.af.mil> References: <69674@cup.portal.com> <1ejt9qINN340@hrd769.brooks.af.mil> <70064@cup.portal.com> NNTP-Posting-Host: hrd769.brooks.af.mil In article <70064@cup.portal.com> DVaz@cup.portal.com (David D Vazquez) writes: >Dave Burges writes: >> >> Where is the redundancy? In the foreign key element? That is where your >>cardinality will help determine the actual implementation, but we are talking >>Form Normalization, not implementation. In normalization, the concepts of >>query speed and disk space have no place. Save them for the physical >>implementation phase. >> > >I agree, but physically implemented tables generally tend to look alot like >their abstractly designed relatives. You can also save alot of time and >headaches if you implement your tables just like they've been designed. Just >think, you've spent 7 hours designing a database only to have your programmer/ >implementor spend another 7 hours trying to figure out the better implementa- >tion (and for something silly like R1(A*,B)/R2(A*,C) vs. R1(A*,B)/R2(B*,C). >The database implementor shouldn't have to make these kinds of decisions! I agree 100%. That is why they should implement according to the design, not try to second guess what was intended. I have a 'REAL LIFE' example of this that I am not willing to share on the net, but E-Mail me and I will be glad to discuss it. It shows what happens when the implementation team codes up what they understood the situation to be, rather then implementing what the designer had documented. The nightmare continues. >Supertype/subtype and recursive relationship implementation should be the extent >of implementation issues. > That is what makes the Database methodology so inviting. It allows us the ability to document these types of relationships in (relatively) easy to do pieces that illuminate (versus obfuscate) the issue. Before I continue, I need to make sure everyone understands what I meant by 'Dorm'. In the military, every building at a training site has a 'name', whether it is a squadron name DORM or a historical figure HALL or a simple building number. In addition, each building on the 'campus' has a street address. If the term 'dorm' was taken as a room number, then we are debating different issues. For example, the student lives in 'Kilday Hall', but the street address is '123 34th Street West'. No one uses the address, except to get mail into the mail room. All other correspondence refers to the 'dorm name'. If the term 'dorm' were to be taken as a 'room number' then the relation R1(A*,B,C) could only be further normalized through the use of a compound key relationship where B and C are used in R2(C+B*) since this is the only way to develop a candidate key from the R2 relation. If this were the case and no other data about (dorm+room) were kept, we would still need to have it separated out to avoid an anomoly where the rooms disappear when no one lives in them unless we implement a kludge like 'VACANT' moves in whenever the rooms are emptied. >'Redundancy?', you ask. Given R(Student, Dorm, Dorm Address); if you implement >as R1(Student, Dorm) and R2(Student, Dorm Address), you are repeating values >of Dorm Address for each student as well as not enforcing Dorm -> Dorm Address >(or vice versa); you'd have to do some extra coding.. Also, you'd need two >copies of Student's names (one to find the dorm and one to find the dorm's >address); this is a waste. The dorm address is not dependent on the student; it is dependent on the dormitory. In fact, if the students all move out of the dormitory in your example, it ceases to exist. I don't think THAT was your intent (although it would make urban renewal a lot easier, and I have known students where, when they moved out, it was as much work as building a new dorm to fix it back up :-)) Therefore, the relationship in the example could never be R1(A*,B) R2(A*,C). The dorm address is transitively dependent on the student, but functionally dependent on the dormitory. Therefore, the only possible 3NF relations from the relationship are: R1(A*,B)/R2(B*,C) or R1(A*,C)/R2(C*,B) Since either the Dorm address or the dormitory name could be a candidate key to their relation, the selected foreign key into R2 is moot. Another way to look at it to further understanding might be to recognize the one-to-many relationships at work. Student (1) to (1) Dormitory Student (1) to (1) Dorm address Dormitory (1) to (1) Dorm Address Dormitory (1) to (m) Student Dorm Address (1) to (1) Dormitory Dorm Address (1) to (m) Student I don't want to draw pictures, but the 1-M relationship between (Dorm Name Student) implies that the relation should be split to AVOID redundant data being entered in the 2NF relation. In other words, the dorm information would be entered once (address/dorm name) and all of the students would reference that combination through a foreign key (address or name). > >The problem I see is that you never have one correct answer when it comes to >getting a database into 3NF. Should there be one, maybe? > > -Dave. "dvaz@cup.portal.com and dvazquez@sdiserv.cs.uno.edu" I have seen dozens of database designs for the sample problem in the class I teach. None were ever the same. On the other hand, there are certain features which every class has implemented the same way: - All the successful designs relied on the analysis of the anomolies that their design would and wouldn't address. - The successful ones were reality based, in that the relationships between the relations correlated very closely to the discovered real life situations. - The unsuccessful ones were over-simplified in order to make postpone decision making to the implementation phase. - The confusing ones may or may not have worked. They were unmappable to a physical implementation method, or relied on a single database product (e.g. Oracle) to limit their design. This made implementing them using a different database model (Flat File access model/Network Model/ Hierarchical Model). - None have every successfully overdesigned. The ones where a lot of work was done on in the design stages were very easy to implement. They were not always optimally executable, but the were easy to implement. The optimization of the database is a separate issue, and relies on the access patterns, rather than any relationships of the data. I understand that I have painted many of these statements with a pretty wide brush, but please understand that I get a little religious about implementation being where the work is done. I have been bowled down too many times by poorly designed but very excellently implemented database systems. TSgt Dave Burgess NCOIC AL/MIS Brooks AFB, TX