Principles of Database Design James J. Cimino NIH Clinical Center.
Principles of Database Design
James J. Cimino
NIH Clinical Center
Outline
• Definition
• Motivation
• History and evolution
• Design principles
• Design methods
• Exercises
• Take-Home Messages
Database: Definition
• A collection of data that:
– is organized
– usually computer-based
– represents repetitive information implicitly
– supports retrieval
Information
• Content
– Name
– Date
– Diagnosis
– Medication
– Problem
– Procedure
– Visit
• Structure
– Field
– Record
– Table
– Database
Paper Database as Expert System
Motivation
• Power and flexibility depend on data model
• Database is the realization of data model
• Evaluation of commercial products
• Communicating with vendors and IT staff
• Building your own databases
In the beginning, there were punch cards…
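One field per card (patient ID, field name, value, card sequence number):
1234567 NAME SANDIEGO, CARMEN 0001
1234567 ADDR 123 MAIN STREET  0002
1234567 DOB  02/01/1948       0003
One record per card, fixed columns:
1234567SANDIEGO,CARMEN123 MAIN020148
Columns: 1-7, 8-22, 23-30, 31-36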
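The fixed-column card above can be read by slicing on the column ranges. The following is a minimal sketch, assuming the four ranges (1-7, 8-22, 23-30, 31-36) hold a patient ID, name, address, and date of birth; the field names are hypothetical labels, and the address is assumed to be "123 MAIN" with its space preserved so that it fills eight columns.

```python
# Column layout from the slide (1-based, inclusive).
# The field names are hypothetical labels added for illustration.
FIELDS = [
    ("patient_id", 1, 7),
    ("name", 8, 22),
    ("address", 23, 30),
    ("dob", 31, 36),
]


def parse_card(card: str) -> dict:
    """Slice one fixed-width record into named fields."""
    return {name: card[start - 1:end].strip() for name, start, end in FIELDS}


# Assumption: the address field keeps its space ("123 MAIN" = 8 columns).
card = "1234567SANDIEGO,CARMEN123 MAIN020148"
record = parse_card(card)
print(record)
```

The meaning of each character depends entirely on its position, which is why the layout has to be documented outside the data.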
Sequential Files
• Stored on magnetic tape
• Longer (or shorter) than 80 characters
• 8-bit characters (256 characters)
• Variable-length records
• Random access possible
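• Slowwwwww…..
Variable-length records, each preceded by its length (Len=21 Data, Len=16 Data):
0211234567SandiegoCarmen0161234568CiminoJim
Data with a separate index of ID and Loc pairs:
12345678901234567890123
SandiegoCarmenCiminoJim
ID 1234567 Loc 0001, ID 1234568 Loc 0015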
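The two layouts on this slide can be sketched in a few lines. This is an illustrative sketch only, assuming a three-digit length prefix for the first layout and 1-based `Loc` values for the second; the function names are hypothetical.

```python
def read_records(stream: str, prefix_width: int = 3) -> list:
    """Read length-prefixed records one after another, start to end."""
    records = []
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + prefix_width])  # e.g. "021" -> 21
        pos += prefix_width
        records.append(stream[pos:pos + length])
        pos += length
    return records


tape = "0211234567SandiegoCarmen0161234568CiminoJim"
records = read_records(tape)

# Second layout: the data carries no lengths; a separate index holds (ID, Loc).
data = "SandiegoCarmenCiminoJim"
index = [("1234567", 1), ("1234568", 15)]  # Loc is 1-based, as on the slide


def lookup(patient_id: str):
    """Return the data for one ID; a record ends where the next one starts."""
    for i, (rec_id, loc) in enumerate(index):
        if rec_id == patient_id:
            end = index[i + 1][1] - 1 if i + 1 < len(index) else len(data)
            return data[loc - 1:end]
    return None


print(records)
print(lookup("1234568"))
```

In the first layout, reaching the last record means reading every record before it, which is the source of the slowness noted on the slide.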
Random Access Files
• Disk storage with moving heads
• Larger capacity (MB!)
• Addressable records and fields using pointers
• Indexes created as lists of pointers
• Separation of physical and logical models
• Can be difficult to recover if index corrupted
Random Access Files
0000: Name 1000, Addr 2000, DOB 6000
1000: Sandiego, Carmen
2000: 123 Main Street
3000:
4000:
5000:
6000: Feb 1, 1948
Indexed Files
Index:
0000: 0000
0004: 4000
0008: 2000
Data:
0000: Cimino, James
1000:
2000: Norton, Cathy
3000:
4000: Lindberg, Don
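The index on this slide lists pointers in alphabetical order of the names they point to, even though the data is stored in a different order. A minimal sketch of how such an index might be searched, assuming the addresses are plain integers and using a hypothetical `find` function:

```python
# Data file: address -> record (empty slots at 1000 and 3000 omitted).
data_file = {
    0: "Cimino, James",
    2000: "Norton, Cathy",
    4000: "Lindberg, Don",
}

# Index: pointers sorted by the name each one points to.
index = [0, 4000, 2000]


def find(name: str):
    """Binary search over the index, reading only the records it probes."""
    lo, hi = 0, len(index) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        record = data_file[index[mid]]
        if record == name:
            return index[mid]
        if record < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


print(find("Lindberg, Don"))
print([data_file[p] for p in index])  # names in sorted order via the index
```

The data never has to be physically re-sorted; only the small list of pointers is kept in order.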
Hierarchical Databases
1234567
  Sandiego, Carmen
  123 Main Street
  Labs
    Chem7
      Na 136, K 4.3
    Chem7
      Na 142, K 3.9
Hierarchical Databases
• Easy to use
• Efficient storage
• “Tree walking” is fast
• Queries across trees are slow
• Flexible
• Too flexible: chaos is allowed
• Too easy to modify
• Difficult to document complex structures
Hierarchical Databases
^EMR(1234567)="Sandiego, Carmen"
^EMR(1234567,"Address")="123 Main Street"
^EMR(1234567,"Chem7","2/2/02","Na")=136
^EMR(1234567,"Chem7","2/2/02","K")=4.3
^EMR(1234567,"Chem7","2/3/02","Na")=142
^EMR(1234567,"Chem7","2/3/02","K")=3.9
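The same tree can be approximated with nested dictionaries to show why "tree walking" is fast while queries across trees are slow. This is only a sketch: the `"name"` key is an assumption (in the original notation the patient node itself holds the name), and `patients_with_na_above` is a hypothetical helper.

```python
emr = {
    1234567: {
        "name": "Sandiego, Carmen",  # assumed key; the node value in the slide
        "Address": "123 Main Street",
        "Chem7": {
            "2/2/02": {"Na": 136, "K": 4.3},
            "2/3/02": {"Na": 142, "K": 3.9},
        },
    },
}

# Tree walking: one patient, one path, no searching.
sodium = emr[1234567]["Chem7"]["2/3/02"]["Na"]


def patients_with_na_above(threshold: float) -> list:
    """Cross-tree query: every patient's subtree has to be visited."""
    hits = []
    for patient_id, node in emr.items():
        for date, results in node.get("Chem7", {}).items():
            if results.get("Na", 0) > threshold:
                hits.append((patient_id, date))
    return hits


print(sodium)
print(patients_with_na_above(140))
```

Nothing in the structure stops a different patient from storing results under another key or at another depth, which is the "chaos is allowed" problem.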
Hierarchical Chaos
1234567
Admissions
Admission 1
Admit Date: 2/2/02
Primary DX: CHF
Other DX
AODM
Flag: S
A Fib
Flag: P
Network Databases
1234567
Gyn Clinic
Pap
Dr. Jones
Sandiego
Gyn Visit
Gyn Clinic
Secretary
305-1000 Service
Ms Smith
Beeper 34
2 Main St.
8AM-5PM
305-2500
Extensible Markup Language (XML) Databases
• SGML is a metalanguage
• SGML is used to write Document Type Definitions (DTDs) that define languages
• HTML is a language with an SGML DTD
– Tags are for formatting/presentation syntax
• XML is a proper subset of SGML
• XML defines tags that convey semantics
• We could write “Health Markup Language” (“HML”) in XML (if we could agree on the semantics and tags)
• Tags may or may not be stored with data
<document>
  <document.id>CXR001</document.id>
  <doc.date>19991101</doc.date>
  <document.type>
    <identifier>P5-00010</identifier>
    <text>Chest X-Ray</text>
  </document.type>
  <document.body>
    <findings>No infiltrate, cardiac shadow not enlarged...</findings>
    <impression>Normal X-ray</impression>
  </document.body>
</document>
<patient>
  <patient.id>
    <id.value>1234789</id.value>
  </patient.id>
  <patient.name>
    <family.name>Sandiego</family.name>
    <given.name>Carmen</given.name>
    <suffix>M.D.</suffix>
  </patient.name>
  <patient.dob>19230113</patient.dob>
  <patient.sex value="male"/>
  <inpatient/>
</patient>
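Because each value travels with its tag, a program can pull fields out by name rather than by position. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the patient fragment nests as in a well-formed document; the `first_text` helper and the keys of `record` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed nesting of the patient fragment shown on the slide.
xml_text = """
<patient>
  <patient.id><id.value>1234789</id.value></patient.id>
  <patient.name>
    <family.name>Sandiego</family.name>
    <given.name>Carmen</given.name>
    <suffix>M.D.</suffix>
  </patient.name>
  <patient.dob>19230113</patient.dob>
  <patient.sex value="male"/>
  <inpatient/>
</patient>
"""

root = ET.fromstring(xml_text.strip())


def first_text(tag: str):
    """Text of the first element with this exact tag, or None if absent."""
    element = next(root.iter(tag), None)
    return element.text if element is not None else None


record = {
    "id": first_text("id.value"),
    "family_name": first_text("family.name"),
    "given_name": first_text("given.name"),
    "dob": first_text("patient.dob"),
    "sex": next(root.iter("patient.sex")).get("value"),
    # An empty tag carries meaning by its presence alone.
    "inpatient": next(root.iter("inpatient"), None) is not None,
}
print(record)
```

A field that is missing from the document simply comes back as `None`, which is how sparse data is handled without reserving space for it.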
Extensible Markup Language (XML) Databases
• Strengths
– Flexibility to represent wide range of data
– Data carries its field assignment
– Sparse data handled compactly
– Tags can have platform-specific display
• Weaknesses
– Immature database tools
– Verbose
– I/O intensive
– A trade-off of decreased efficiency for increased flexibility; ? scalability
Relational Databases - Features
• Tables with columns and rows
• Logical vs. physical representation
• Multiple indexes
• Inter-table relationships
• Virtual sequential files (with simultaneous update)
Relational Databases
Patient
Pt_UI  Lname   Fname
12345  Smith   Elmer
12346  Jones   Barbara
12347  Clark   Arthur
12348  Jones   Casey
12349  Sample  Steve

Lab_test
Pt_UI  Tname    Date
12345  Na       5/30/96
42353  CBC      5/30/96
47756  ESR      5/30/96
12348  HBsAg    5/30/96
34523  Amylase  5/30/96

From table “Patient”, get “Pt_UI” where “Lname”=“Jones” and “Fname”=“Casey”, and then get “Tname” and “Date” from table “Lab_test” for the same “Pt_UI”
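The two-step lookup described above corresponds to a join on the shared `Pt_UI` column. The following sketch uses Python's built-in `sqlite3` module with an in-memory database; the column types are assumptions, and the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (Pt_UI INTEGER PRIMARY KEY, Lname TEXT, Fname TEXT)")
conn.execute("CREATE TABLE Lab_test (Pt_UI INTEGER, Tname TEXT, Date TEXT)")

conn.executemany(
    "INSERT INTO Patient VALUES (?, ?, ?)",
    [
        (12345, "Smith", "Elmer"),
        (12346, "Jones", "Barbara"),
        (12347, "Clark", "Arthur"),
        (12348, "Jones", "Casey"),
        (12349, "Sample", "Steve"),
    ],
)
conn.executemany(
    "INSERT INTO Lab_test VALUES (?, ?, ?)",
    [
        (12345, "Na", "5/30/96"),
        (42353, "CBC", "5/30/96"),
        (47756, "ESR", "5/30/96"),
        (12348, "HBsAg", "5/30/96"),
        (34523, "Amylase", "5/30/96"),
    ],
)

# Find the patient by name, then fetch that patient's tests via Pt_UI.
rows = conn.execute(
    """
    SELECT Lab_test.Tname, Lab_test.Date
    FROM Patient
    JOIN Lab_test ON Lab_test.Pt_UI = Patient.Pt_UI
    WHERE Patient.Lname = ? AND Patient.Fname = ?
    """,
    ("Jones", "Casey"),
).fetchall()
print(rows)
```

The query names what is wanted (tables, columns, conditions) and leaves the choice of index and access path to the database system.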
Normalization
• Efficient database organization
• Eliminate redundant data
• Ensure data dependencies make sense
• E.F. Codd, 1970: the relational model and first normal form; later work by Codd and others extended this to five normal forms
• First Normal Form
– Eliminate duplicative columns
– Create separate tables for each group of related data
– Identify each row with a unique column or set of columns (the primary key)
Normalization (continued)
• Second normal form
– Remove subsets of data that apply to multiple rows of a table and place them in a separate table
– Create relationships between these new tables and their predecessors through the use of foreign keys
• Third normal form
– Remove columns that are not dependent upon the primary key
Relational Databases - Advantages
• Comprehensible
• Multiple “views” possible
• Easy to modify
• New elements don’t “break” programs
• Database management systems (DBMS)
– Referential integrity
– “Reorg” for efficiency
– Access control
– Locking for multiple simultaneous use
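Referential integrity means the DBMS itself refuses rows that point at nothing. A small sketch with Python's `sqlite3`, assuming the Patient and Lab_test tables from the earlier slide; the foreign-key declaration and the patient number 99999 are hypothetical additions for illustration. SQLite only enforces foreign keys when the `foreign_keys` pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE Patient (Pt_UI INTEGER PRIMARY KEY, Lname TEXT, Fname TEXT)")
conn.execute(
    """
    CREATE TABLE Lab_test (
        Pt_UI INTEGER NOT NULL REFERENCES Patient(Pt_UI),
        Tname TEXT,
        Date  TEXT
    )
    """
)

conn.execute("INSERT INTO Patient VALUES (12348, 'Jones', 'Casey')")
conn.execute("INSERT INTO Lab_test VALUES (12348, 'HBsAg', '5/30/96')")

# 99999 is a hypothetical ID with no matching Patient row.
try:
    conn.execute("INSERT INTO Lab_test VALUES (99999, 'CBC', '5/30/96')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM Lab_test").fetchone()[0]
print(rejected, stored)
```

The check lives in the database rather than in each application, so every program that writes to the table is held to the same rule.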
Relational Databases - Disadvantages
• Storage overhead
• I/O-intense
• Cost
Systems Design Fundamentals
• The data model is the most critical aspect
• Data model should reflect real world objects and relationships to ensure durability
• A correct data model outlasts applications, including many not anticipated at system start-up
System Design: Basic Concepts
• The world contains “things”
• Develop abstractions called “objects”
• Group objects by criteria, and represent each abstract object as an empty table
Patient ID Name Physician Phone No.
Types of Objects (Tables)
• Tangible things (book, person)
• Roles (doctor, patient, supervisor)
• Events (ordering of a lab test)
• Interactions - bind two or more other objects via a transaction (“purchase” relates buyer to seller)
Objects
• All of the real-world things in the set (the “instances”) have the same characteristics
• All instances conform to the same rules
LICENSE
Doctor   License   Exp. Date  Specialty
Casey    123 ABC   Jan 2004   Surgery
Kildare  691XKY    Mar 2005   Medicine
Holiday  12-A-962             Sculpture
No holes in the table (Holiday has no Exp. Date)
No strange values (“Sculpture” is not a medical specialty)
Basic Concepts (continued)
• Empty tables can be filled in to represent the real world things from which the object was abstracted
Patient ID Name Physician Phone No.
3131313 John Smith Casey 867-5309
1234567 Ben Casey Kildare PA6-9000
2121212 Mary Jones Holiday 555-1234
Basic Concepts (continued)
• Relationships between objects are “attributes” of those objects
Patient ID Name Physician Phone No.
Physician Address Phone
Relationship: “Has-Doc”
Patient Has-Doc Physician
Table Notation
Empty Table form:
Patient_Admissions
Pt_ID Date_Adm Time_Adm Unit Room
Graphical Form:
Patient_Admissions
* Pt_ID
* Date_Adm
- Time_Adm
- Unit
- Room
Textual Form:
Patient_Admissions (Pt_ID, Date_Adm, Time_Adm, Unit, Room)
Formalisms for Tables
• Rule 1: One instance of an object has:
– exactly one value for each attribute
– only one element per row-column intersection
– no repeating groups
– no true “holes” in table
• Rule 2: Attributes contain no internal structure
Formalisms for Tables
Not ok:
Name  Sex-Age  Weight   Glucose1  Glucose2
Mary  F-32     133,135  201
Joe   M-43     190      116       93
Joe   M-43     190      88

Ok:
Name  Sex  Age
Mary  F    32
Joe   M    43

Name  Weight
Mary  133
Mary  135
Joe   190

Name  Glucose
Mary  201
Joe   116
Joe   93
Joe   88
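The step from the "Not ok" table to the three "Ok" tables can be carried out mechanically. This is an illustrative sketch, not a general normalization tool: it splits the structured `Sex-Age` attribute (Rule 2), unpacks the multi-valued `Weight` cell, and turns the repeating `Glucose1`/`Glucose2` columns into rows (Rule 1). The variable names are hypothetical.

```python
# The "Not ok" table, one dict per row; an empty string marks a hole.
not_ok = [
    {"Name": "Mary", "Sex-Age": "F-32", "Weight": "133,135", "Glucose1": "201", "Glucose2": ""},
    {"Name": "Joe", "Sex-Age": "M-43", "Weight": "190", "Glucose1": "116", "Glucose2": "93"},
    {"Name": "Joe", "Sex-Age": "M-43", "Weight": "190", "Glucose1": "88", "Glucose2": ""},
]

people, weights, glucoses = [], [], []

for row in not_ok:
    # Rule 2: no internal structure, so "F-32" becomes two attributes.
    sex, age = row["Sex-Age"].split("-")
    person = (row["Name"], sex, int(age))
    if person not in people:
        people.append(person)

    # Rule 1: one value per cell, so "133,135" becomes two rows.
    for value in row["Weight"].split(","):
        entry = (row["Name"], int(value))
        if entry not in weights:
            weights.append(entry)

    # Rule 1: no repeating groups, so each glucose column becomes a row.
    for column in ("Glucose1", "Glucose2"):
        if row[column]:
            glucoses.append((row["Name"], int(row[column])))

print(people)
print(weights)
print(glucoses)
```

Note that the holes disappear on their own: a missing second glucose value is simply a row that does not exist.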
Formalisms for Tables
• Rule 1: One instance of an object has:
– exactly one value for each attribute
– only one element per row-column intersection
– no repeating groups
– no true “holes” in table
• Rule 2: Attributes contain no internal structure
• Rule 3: Every attribute should represent a characteristic of the entire object, not a characteristic of a limited part of the object
Formalisms for Tables
Not OK:
Patient Table
* Person Identifier
- Person Name
- Date of Birth
- Date of Admission (attribute of encounter, not patient)
OK:
Patient Table
* Person Identifier
- Person Name
- Date of Birth
Admission Table
* Admission ID
- Person Identifier
- Date of Admission
Relationships
• Relationship: an abstraction of an association between real world things
– Patient OCCUPIES Bed
– Library CONTAINS Books
– Specimen IS ASSAYED BY Lab Method
• Inverse relationships:
– Bed IS OCCUPIED BY Patient
– Book IS LENT BY Library
Relationship Types
One-to-One: Patient, Bed
One-to-Many: Patient, Disease
Many-to-Many: Patient, Doctors
Modeling Many-to-Many Relationships
DRUG MANUFACTURER
* manufacturer name
- other attributes
DRUG
* generic name
- other attributes
LICENSE
* manufacturer name
* generic name
- date licensed
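The LICENSE table above resolves the many-to-many relationship by holding one row per manufacturer-drug pair, keyed on both names. A minimal `sqlite3` sketch under stated assumptions: the snake_case table and column names are adaptations of the slide's labels, and all rows (the manufacturers, drugs, and dates) are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE drug_manufacturer (manufacturer_name TEXT PRIMARY KEY);
    CREATE TABLE drug (generic_name TEXT PRIMARY KEY);

    -- One row per (manufacturer, drug) pair; the pair is the primary key.
    CREATE TABLE license (
        manufacturer_name TEXT NOT NULL REFERENCES drug_manufacturer(manufacturer_name),
        generic_name      TEXT NOT NULL REFERENCES drug(generic_name),
        date_licensed     TEXT,
        PRIMARY KEY (manufacturer_name, generic_name)
    );

    -- Hypothetical sample rows.
    INSERT INTO drug_manufacturer VALUES ('Maker A'), ('Maker B');
    INSERT INTO drug VALUES ('aspirin'), ('ibuprofen');
    INSERT INTO license VALUES
        ('Maker A', 'aspirin',   '2001-01-01'),
        ('Maker A', 'ibuprofen', '2002-01-01'),
        ('Maker B', 'aspirin',   '2003-01-01');
    """
)

# The relationship can be followed in either direction.
makers_of_aspirin = [
    row[0]
    for row in conn.execute(
        "SELECT manufacturer_name FROM license WHERE generic_name = ? "
        "ORDER BY manufacturer_name",
        ("aspirin",),
    )
]
drugs_of_maker_a = [
    row[0]
    for row in conn.execute(
        "SELECT generic_name FROM license WHERE manufacturer_name = ? "
        "ORDER BY generic_name",
        ("Maker A",),
    )
]
print(makers_of_aspirin, drugs_of_maker_a)
```

Attributes that belong to the pairing itself, such as the date licensed, have a natural home in the linking table rather than in either parent.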
Exercise: Devise a Relational Model for MEDLINE citations
PMID- 2405204
TI- Medical informatics. An emerging academic discipline and institutional priority.
AB- Information management constitutes a major activity of the health care…
AD- Department of Radiology, Brigham and Women's Hospital, Boston, MA 02115
AU- Greenes RA
AU- Shortliffe EH
LA- eng
PT- Journal Article
PT- Review
JT- JAMA : the journal of the American Medical Association
SO- JAMA. 1990 Feb 23;263(8):1114-20.
MH- Career Choice
MH- Hospital Information Systems
MH- Information Systems
MH- Medical Informatics/education/organization & administration/*trends
MH- Medical Informatics Applications
MH- National Library of Medicine (U.S.)
MH- Research
MH- Training Support
MH- United States
Case Presentation
The patient is a 50 year old, Native American female who presents to the emergency room (ER) with the chief complaint of lip numbness, nausea and chest pain.
The patient was generally well until about one half hour prior to arrival in the ER, while eating dinner at a seafood restaurant in Rock Harbor, MA. She was finishing a dinner of New England clam chowder, lobster, steamed clams, and corn on the cob when she noted onset of symptoms. Others in her party ate fish and chips, although two other people ate the clam chowder; none ate the steamers.
She gives a history of hypertension and states that she was getting a "capsule, half green, half blue-green" from her private doctor. She also reports that she was treated in the past for tuberculosis while she was pregnant, but doesn't remember what she was treated with or for how long. She reports that she was at another hospital on the other side of town, where she had a liver biopsy. She reports that she thinks the diagnosis was "hemachromatosis". The patient reports an allergy to Bufferin.
Physical examination revealed a well-developed, well-nourished diaphoretic female in moderate respiratory distress. Vital signs showed a pulse of 110, a respiratory rate of 8, an oral temperature of 100.3, and a blood pressure of 150/100. Examination revealed rales over both lower lung fields. Abdominal exam revealed a tender, palpable liver edge. Neurologic exam revealed dysarthria, diffuse muscle weakness, and hyperreflexia.
Chem7 (serum): Glucose 100 (70-105)
Chem7 (plasma): Glucose 150 (75-110)
CBC: Hgb 15 (12.0-15.8), Hct 45 (42.4-48.0), WBC 11,000 (3,540-9,060), Platelets 145K (165-415K)
A fingerstick blood sugar was 80
Urinalysis showed protein of 1+ and glucose of 0
A blood culture was positive for methicillin-resistant Staphylococcus aureus (MRSA)
ECG - Sinus Rhythm, 74 BPM, Axis -30 degrees, ST segment 2mm elevated and T-waves down in leads I, L, V5 and V6
Chest X-ray - Left upper lobe infiltrate, left ventricular hypertrophy
The patient's nurse reported that the patient seemed more worried about who would care for her elderly father if anything happened to her.
A medical student reviewing the case wonders whether paralytic shellfish poisoning could cause a myocardial infarction; she decides to do a literature search.
The patient was treated with activated charcoal and stomach lavage, followed by enteric-coated aspirin. Due to worsening respiratory insufficiency, she was intubated and placed on mechanical ventilation.
Take-Home Messages
• Data model is the most critical aspect of system design and function
• Data models should reflect real world objects and their relationships to ensure durability
• A correct data model outlasts applications, including many not anticipated at system start-up