1 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 9/8/2015 Database Design and...
-
Upload
camron-jones -
Category
Documents
-
view
217 -
download
0
Transcript of 1 Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation 9/8/2015 Database Design and...
04/19/23
1Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Database Designand Decisions
GISC 6383 Management and Implementation of GIS
Note: some figures in this document are adapted from:
State of New York GIS Development Guides
ftp://ftp.sara.nysed.gov/pub/gis/sara.zip
chapter 3: conceptual design
chapter 6: database planning and design
chapter 7: database construction
04/19/23
2Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
1. Conceptually Model USERS view(Tomlinson Chapter 6,7)
2. Logically Model the Data– Define data entities and their
relationships– Identify representation of entities and
match to spatial data model(Tomlinson Chapter 8, & 9 thru p.136)
3. Physically Model the DataCreate physical database design for selected
software (Oracle, ArcGIS, etc)
4. Design Process for obtaining and converting Data from source
ESRI Steps for Building Geodatabase1. Model the user’sView of the data
2. Define objects and relationships
3. Select geographic representation.
4. Match to geodatabase elements5. Organize geodatabase structure(Zeiler, p. 18)
Data Base Design Process: steps
Conceptual
Physical
Object Oriented Analysis and Design (OOAD)
Database Schema in UML Diagram(using Visio)
Reality
Geodatabase Generation and Population
[
(Review from earlier lecture.)
04/19/23
3Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
• We covered Conceptual Modeling/Needs Assessment in the lecture on Implementation Steps
• Tonight we focus on Logical Design and Physical Design
• These are the guts of database design/data modeling
Loosely corresponds to Tomlinson’s Chapter 8: Create a data designand Chapter 9 thru p. 136
04/19/23
4Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Objectives of Database Design: – Satisfies and supports organization’s objectives – Contains all data but no redundant data
• Minimizes redundancy across the organization
– Allows for different users to access same data• Consistent and flexible data retrieval and analysis
– Accommodates different views of the same data• Based on user and application needs• Increases likelihood of users developing applications
– Appropriately represents & organizes geographic features– Maintains the data so its currency is assured– Secures data by distinguishing applications (& users) which
• create data (add records for new entities)• update data (maintain & modify existing data records)• read data (use but can’t modify in any way)• delete data (remove records from database)(CRUD)
04/19/23
5Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Logical Data Modeling
04/19/23
6Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Why build a model?• We build models of systems in order to break them up into
well-defined subsystems, because it helps us to overcome the difficulties in comprehending such systems in their entirety.
• As the complexity of systems increase, so does the difficulty of comprehending them and the importance of good modeling techniques to assist us in managing them.
• Good models with well-defined semantics are essential for communication among project teams and to assure architectural soundness.
In other words, developing a model for an industrial-strength software system prior to its construction or renovation is as essential as having a blueprint for a large building.
Luis X. B. Mourão http://cplus.about.com
04/19/23
7Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Logical Data Modeling – It is an entirely data-driven
process– It encourages comprehensive
understanding of business information requirements
– It enables effective communication among designers, developers, and users throughout the design process
– It forms the basis for designing correct, consistent, sharable, and flexible databases using any database technology
• Correct– an accurate and faithful
representation of the way information is used in the business
• Consistent– no contradictions in the way
information objects are named, defined, related, & documented
• Sharable– accessible by multiple applications
and users to meet varying access requirements
• Flexible– easily updateable when
implementation or business changes
04/19/23
8Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Logical Data Modeling in practice• purpose of data model is to ensure that data has been
identified in a completely rigorous and unambiguous fashion on which both user and GIS analyst agree
• logical data models define entities (the unit about which we collect information—e.g. people, companies), the attributes of entities (the information collected—age, salary), and the relationships between entities (companies pay salaries to people)
• developed through use of – entity-relationship diagrams which show relationships
among all data throughout the organization– data dictionaries (structured lists) which document each
entity, its attributes, and its relationships
04/19/23
9Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Entity Relationship (E-R) Diagrams and Logical Data Modeling• Entity (noun)
– objects or things to be included in the database– person, place thing or concept about which you wish to record info.– for example: employee, company, citizen, street
• Attribute (adjective) (but they are often nouns—e.g. a name!!)– characteristics or measurements to be recorded for the entities– fact or nondecomposable piece of information describing an entity– for example: age, dob, owner, street type
• Relationships association between entities (verb)employee---works for----company company--------has-------employees
transformers ----are mounted on----poles land parcel---has----owner
• Cardinality of Relationships (“adverb”)– one to one– one to many– many to many
• Business Rules (attribute domains and validation rules) – requirements that attributes or relationships must meet– specifications which preserve the integrity of the logical data model by governing the values attributes
may assume or the cardinality relationships may take on– For example: # of kids is an integer between 0 and 24; poles have 0 to 3 transformers
Country ----has--- Capital city (one capital city per country)
Company (one) <----Work for---- ---has------> (many) EmployeesParcels < ----have----> owner s (parcels >1 owner; owners>1 parcel)
04/19/23
10Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Name
Age
Sex
Name
Function
Size
Employee Works in Department
Has
DependentsOccupant_Name
Unit_Number
Job Title
Name
Age
Relationshlip to employer
Building_Name
Height
Floor_area
Owner_Name
Owner_Address
Situs_Address
Building Located on Parcel
Has
OccupantOccupant_Name
Unit_Number
ID #
ConventionalE-R Diagram
GIS E-R Diagram
04/19/23
11Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Entity or Attribute? owner as examplePerennial problem in modeling
• Do you think of it as a number or text >>>attribute
• Does it have attributes of its own >>>entity– Owner has name, billing address,
• Does it have a relationship with other things >>>entity
– “Owner receives tax bills”
• Does it repeat elsewhere >>>entity– Owner could own many parcels
• Is it important by itself >>>entity– An owner IS important
If in doubt, make it an entity, Can change later in physical model
ParcelPINOwner
ParcelPIN
OwnerName
This? Or this?
Some rules for attributes• Primitives: no meaning in itself
– e.g 50
• All values of same kind – e.g all integers: 4, 6 not 4, SIX
• Describe entity not another attribute – e.g. owner name describes owner of parcel, not
the parcel
• Never a list: e.g. owner: John Smith, Bill Jones
• Never repeatede.g. owner1, owner2, owner3
• Undecomposable or composite?e.g. 117 West Plano Road
Exceptions are always made!!!
e.g. in 911 when speed matters
04/19/23
12Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
…...data dictionaries (structured lists) document each entity, its attributes, and its relationships
Entity Definition
Entity Name:
Definition:
Unique identifier:
Attributes:
Building
A unique structure with roof
Bldg_Id
Bldg_Id, Bldg_Name, Floor_Area,Height
Relationships: located on parcel has occupant
Entities which have spatial expression need additional conceptualization… …see following slides
When using UML techniques to create Entity-Relationship diagrams, entities, attributes and relationships can all be documented within the diagram.
Data dictionaries become part of the metadata.
Recording measurement units (meters, feet) for attributes is critical and problemattic. Can include in: --name: length_in_feet (clumsy) --metadata (metadata gets lost, forgotten, or separated from data)
04/19/23
13Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Spatial Data Modeling
Spatial data differs in two key ways, and these must be incorporated:– entities have a corresponding spatial expression.
In ESRI terms, • Objects: entities without spatial expression (e.g. owner)• Features: entities with spatial expression (e.g. parcel)• An entity, when given spatial properties, becomes a
GIS spatial data layer
– relationships may have a spatial expression also
04/19/23
14Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
…in essence, you need to decide how to represent each entity spatially
• Point or a point symbol
• Line or line types
• Areas or polygons
• Surfaces or surface drapes
• Raster format such as scanned paper documents
• Images such as photos, satellite or clip art
• or plain old Alpha or numeric
Note that:
• one entity could be visualized through another
– footprint on a lot map
• might be different at different scales– airport a point at one scale and polygon at
another
• might be different for two applications– street as line for routing– street as polygon for pavement
management
• attributes might be displayed graphically as annotation
04/19/23
15Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
…..identifying entities’ spatial representation(examples)• Property associated
– Legal parcel– Assessor parcel– Parcel boundary– plat map– Parcel photograph– Owner– Address– Land value
• Street associated– Street– Street segment– Intersection– Traffic light– Traffic analysis zone– Bus route– Bus stop
– polygon– polygon– line string– raster– image– alphanumeric– alphanumeric– numeric
– line string– line segment– node– point– polygon– route– point
pixels
abcdef123
110210.67
(And these will need to be supported by the GIS software you select)
in essence, you need to decide how to represent each entity spatially
04/19/23
16Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Additionally, you will need to identify the classic
spatial data properties for each layer• Scale ranges at which data is required to be
displayed• minimum resolution required to support intended
applications• minimum accuracy required to support intended
applications• Projection(s) in which data will be
– stored and – used (may not be the same)
04/19/23
17Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Spatial Relationships
Spatial Descriptive Common GIS E-R Model Relationship Verbs Implementation Symbol
Connectivity
Contiguity
Containment
Proximity
Coincidence
Connect, link
Adjacent,abutt
Contained,containing,within
Closest,nearest
Coincident,Coterminous
Topology
Topology
Spatial joinoperation
Spatial joinoperation
Spatial join.operation
street segments link to street network
cities common border
lot within floodplain
house nearest fire_hydrant
Valve and gauge same manhole
How GIS system might implement the relationship
…identifying spatial relationships, and how they will be implemented
Possible ER symbol (not standardized)
04/19/23
18Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
STREETSEGMENT
CONTAINS
POLYGON G T
WETLANDS
CONTAINS
CONTAINS
CONTAINS
HAS
OCCUPANT
ADJACENT
HAS
CONTAINS
WITHIN
LINE G T
WATERMAIN
LINE G T
WATER SERVICECONNECTION
LINK
LINK
LINE G T
COINCIDENTLINE
INTERSECT
NODE G TLINK
ABUTTING
NEARESTADDRESS
HYDRANT
NODE G T
VALVE
NODE G T
STREET SYSTEM
POLYGON G T
FLOODPLAIN
POLYGON G T
SOILS
POLYGON G T
POLYGON G
BUILDING
POLYGON G T
POLYGON G T
POLYGON G T POLYGON G T
POLYGON G T
STREETSEGMENT
POLYGON G T
INTERSECTION
PARCEL
ZONING
CENSUSTRACT
CENSUSBLOCK
TRAFFICZONE
Example E-R Diagram with Spatial Concepts for Urban Application
Contains:--15 entities--16 relationsAttributes not included for simplicity
Corrected from original source
04/19/23
19Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
E-R DiagramFinal E-R diagram should be verified with users for:• required entities• appropriate spatial representation of entities• required attributes• appropriate spatial relations/operations
Once verified, the E-R diagram becomes the basis for the physical data base design.
04/19/23
20Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Creating E-R diagrams: Unified Modeling Language• E-R diagrams can be drawn by hand or with any drawing package• CASE (computer aided software engineering) tools emerged in
the 1980s and early 90s) to aid in the process (e.g. TI’s Composer)
• Each used their own proprietary language and symbolization• UML (Unified Modeling Language) developed in mid/late 1990s
to provide standardized modeling language based on object-oriented concepts.– Initiated at Rational Software Corporation in 1994/95 by merging of Grady
Booch’s (Booch Model), Jim Rumbaugh’s (OMT--Object Modeling Technique) and Ivar Jacobson’s (OOSE--Object-Oriented Software Engineering) method
• Existence of UML standard allows data base vendors to support automated conversion of conceptual data models to physical data base designs
• ArcGISs support use of MS Visio2000/02/03 (enterprise edition [2000], professional [02/03] for full support) [see Zeiler, p. 19-20] – Sample geodatabase schemas (templates) available for different industries
• Other UML based products: Rational Rose, Paradigm Plus, Oracle2000, ERwin (from Computer Associates)
04/19/23
21Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Physical Data Modeling
• Creating physical database design for the selected database software – Oracle, SQL Server, ArcGIS, Intergraph, etc
• We will focus on physical database design for an ESRI Geodatabase
Loosely corresponds to Tomlinson’s Chapter 8: Create a Data Design
…physical data base design involvesFor entities and attributes • Representing (“mapping”)* all entities and their attributes into one or more relational
tables in a selected RDBMS and determining keys for forming relationships• For each spatial entity, selecting an appropriate
– GIS data type e.g. polygon, line, point, surface
– GIS data set format for storage: e.g. geodatabase, coverage, grid, tin, image, etc
– Spatial Reference System • coordinate system (geographic or projected, etc.)• Spatial domain • Precision (or resolution)
• Selecting an appropriate Data Type for each field (attribute)– e.g. for ESRI: string, short integer, long integer, float, double, blob.
For relationships (associations)• for regular (non-spatial) relationships, identifying which of the RDBMS’s normal
query structures or relational operators will handle the relationship– if won’t do it, develop specs for a custom application– See Appendix for discussion of relational operators
• for spatial relations, identify which capabilities of software will handle the desired operation (e.g. nearest neighbor identification)
– if won’t do it, develop specs for a custom application
*Note: Computer people talk about “mapping” entities to tables rather than “representing”. Mapping in this sense does not mean producing cartographic output!
04/19/23
23Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Example of “mapping” a relationship for RDBMS attribute tables --the relation contains is mapped to a table join --the building table has a foreign or secondary key, (parcel ID#) topermit a join with the primary key (also parcel ID#) in parcel table The spatial data structure is shown as two generic layers
PARCEL
POLYGON POLYGON
BUILDINGCONTAINS
LAYER
PARCEL ID #
BUILDING ID #PARCEL ID #
LAYERPARCEL TABLE
BUILDING TABLE
table join
ConceptualE/R diagram
PhysicalDatabase design
(primary key)
(foreign key)(Primary key=foreign key)
(one) (many)
ParcelPOLYGON G T
ARC AAT TIC BND ETC. PAT
AreaPerimeterPoly ID #Sub_bl_lot#
Parcel
ParcelINFO Parcel ID#Owner_nameOwner_addSitus_addDepthFront_footageAssessed_valueLast_sale_dateLast_sale_price
Previous Values
Parcel ID#YearOwner_nameOwner_addressAssessed_value
Attributes represented in Oracle Tables
Attribute List of Entity "Parcel"Parcel [subdivision_block_lot#,
owner_name, owner_addresssitus_address, area, depth,front_footage, assessed_value,last_sale_date, last_sale_price(owner_name, owner_address,assessed_value as of Jan. 1 forlast two years)]
ARC/INFO Spatial Database Structure (coverage)
A key field for parcels is formed by concatenating subdivision, block & lot #s
Example of Spatial
Database “Mapping” for ArcInfo
coverage
What is wrong with this design?
spatial expression represented as an ArcInfo polygon coverage
PAT=Polygon Attribute Table
WATER MAIN
LINK
VALVE
HYDRANT
WATER MAIN ID #
VALVE ID #
HYDRANT ID #
WATER MAIN
WATER MAIN ID #
VALVE
VALVE ID #
HYDRANT
HYDRANT ID #
Attribute: SQL-Server TABLES
Spatial: geodatabase
Feature data set Feature class
Feature class
Design decisions:• why distribmains and transmains as separate feature classes?• why not valve with gate and hydrant subclasses? (--different attributes)• why prodwell1 and prodwell2?
Exampleof Database Mapping:for geodatabase
04/19/23
26Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
ESRI-related DB Design Decisions
ESRI DB Design Decision Overview• Type for spatial data layer
– Vector (point, line or polygon), raster, tin ?• Format for spatial data
– Coverage, shapefile or geodatabase ?Geodatabase Design Decisions• Feature Datasets
– Stand-alone feature classes or feature data sets (fds)?– Spatial Reference system for each feature data set
• Coordinate system: lat/long or projected? which projection? what parameters?• Spatial Domain (extent) and Precision ?
• Feature Classes (tables)– Subtypes or separate feature classes?
• roads feature class with road_type subtype, or separate freeway, arterial, street feature classes?• Attributes Types and Validation
– Type (string, long integer, short integer, etc.) ?– Validation Rule through application of Attribute Domain ?
• Domains and Defaults• Relationships and Associations between feature classes
– Implement Geometric networks and/or topology rules?– Implement relates or joins in the database or in ArcMap documents?
• Type of Geodatabase– Personal geodatabase or SDE-based geodatabase ?
04/19/23
28Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Spatial Data TypesOverall, ArcGIS 9 supports at least four representations of geographic data.
• Vector data for representing features– CAD, Coverages, shapefiles, geodatabases
• Raster data for images, and surfaces.– Image data in .bmp, .tiff, .jpeg, .sid, ERDAS formats
– Raster data in discrete or continuous GRIDS (ESRI’s native file format for raster)• Discrete grids can have an attribute data table, continuous do not
– Raster data in a Geodatabase (as of ArcGIS 9)
• Triangulated Irregular Networks (TINS) for surfaces.– Although TINS are a vector format, as of ArcGIS 9.1, they are not yet supported by
the Personal Geodatabase and must be stored in coverage workspaces or SDE geodatabase
• Tabular data (sometimes called Event tables).
– List of X,Y coordinates for points (such as may be output from GPS)
– “Locators” for finding a geographic position from an address.
A decision must be made as to the spatial data type for each layer.
04/19/23
29Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Personal GeodatabaseFeature data setFeature class (feature type = polygon)Feature class (feature type = arc)
Coverage (= feature class)Feature type (arc)Feature type (point)Feature type (polygon)Feature type (point)
In a gdb, feature class can have only one feature type.
A coverage can have multiple feature types-now viewed as a shortcoming.
Coverage (= feature class)Feature type (arc)Feature type (point)Locator (table)Raster ShapefileShapefile
Tracts feature class table
Features(rows)
Feature type
Feature ID(key field)
Secondary orForeign key
Spatial File Formats--example
04/19/23
30Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Texas geodatabaseDallas County feature data
setfeature classes
All feature classes extentwithin a feature data datum set must be in the projectionsame spatialreference system
Plano feature data setfeature class feature class
Stand-alone feature classesEach stand-alone feature class maybe in a differentspatial reference system US Geodatabase
Stand-alone feature class
Geodatabase Design Decisions—example
Rasters and TINs can be stored within a SDE geodatabase but not a personal geodatabase
04/19/23
31Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Geodatabase
Feature datasetsSpatial Reference
Object classes and subtypesFeature Classes and subtypesRelationship classesGeometric Network Topology
Domains
Validation Rules
Raster Datasets (SDE Only)rasters
TIN datasets (SDE only )nodes, edges, faces
Locatorsaddresses x,y locationsZip codes place namesroute locations
Anatomy of a GeodatabaseGeodatabases contain: feature datasets, raster
datasets, TIN datasets (planned 9.2), locators
Feature datasets contain various objects which all share a common spatial reference
Objects (e.g. Jane Blow, land owner) are instances of object classes (e.g. land owners) and have no spatial form.
Features, stored in feature classes, are spatial objects (e.g. land parcels) which are similar and have same spatial form (e.g. polygon)
Object (or feature) classes are tables, with objects (or features) in the rows of the table
Attributes are in the columns of the table Subtypes are an alternative to multiple object
(or feature) classes (e.g. ‘concrete’, ‘asphalt’, ‘gravel’ road subtypes): think of subtype as the most significant classification variable (attribute) in the class table
04/19/23
32Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
GeodatabaseFeature datasets
Spatial Reference
Object classes and subtypesFeature Classes and subtypesRelationship classesGeometric Network Topology
Domains
Validation Rules
Raster Datasets (SDE only)rasters
TIN datasets (SDE omly)nodes, edges, faces
Locatorsaddresses x,y locationsZip codes place namesroute locations
Anatomy of a Geodatabase contdRelationship classes are tables containing general
relationships between objects and/or features (e.g. between work order object class and roads feature class)
Geometric Networks models flows thru linear systems such as streams, sewers, raods
Topology models relationships among lines and areas (e.g. common state/county boundary)
Domains are sets of valid and/or default attribute values: (e.g. road lane count default is 2; valid values are integers 1-12 )
Validation rules control feature and attribute integrity by applying domains. 3 types: attribute rules (applied to attributes or subtypes e.g gravel road 1 or 2 lanes only); connectivity rules (e.g. gravel road cannot connect to freeway); and relationship rules (constrains cardinality of a relationship e.g. gravel road can have no more than 4 segments at an intersection)
Simple behaviors are realized thru domains and validation rules
Complex behaviors and custom objects are realized by extending rules with custom programming
04/19/23
33Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Feature classes (FC), feature datasets (fds) and subtypes• Spatial features (e.g. a land parcel) are grouped into feature classes: a table with spatial data
– Data in FC must have same topology type (all points, all lines, all polygons)• Water feature class with lakes (polygon) and streams (line) not permitted
– Minimizing the number of feature classes improves performance• Use different feature classes only when attributes are significantly different
– Use roads feature class rather than freeway, arterial, streets feature classes– Use subtype to differentiate freeway, arterials, streets (all have similar attributes)
• Subtypes are “subclasses” within a feature class that allow you to further distinguish objects without creating new feature classes
– based on a single column’s values (must be integer or long integer)– Same subtype has similar attribute values and behaviors– Use where attributes are the same across all subtypes
• Feature classes can be grouped into feature datasets (fds) or “spatial folders”– All feature classes in a fds must have the same spatial reference system, but may have different
topology (can have points and lines and polygons in same fds) – Organize by thematic similarity e.g transportation– If you wish to create a geometric network, must be in same fds– If you wish to create topology, must be in same fds – If they share geometry (street forms political boundary), should be in same fds– Security (read/write permissions, etc..) applied at the fds not the fc level!!!!
04/19/23
34Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Data Types for Attributes• For every attribute field, must select a data type• Each RDBMS stores data slightly differently• ESRI generic data types will translate into closest RDBMS equivalent• Values given below may differ with RDBMS used ESRI Generic Data TypesString: text field. Be sure its length (number of characters), absolute or what you specify, is sufficient to record
longest data value.Short Integer: (or integer) whole numbers (no decimal point) generally +/-32,767 (2 bytes). OK for size of family, not OK for city sizeLong Integer: (or long) only supports integers to +/- 2,147,483,647 (4 bytes) Float: (or single) single precision floating point; again, be careful-- supports decimal point but perhaps only 6
digits long with decimal moveable 34 places (E34) (4 bytes)Double: double precision floating point; the safest-- supports 12-15 digits with decimal moveable up to 308
places (E308) (8 bytes)Blob: binary long decimal for special programming applicationsNote terminology:• Precision: the total number of digits (before plus after decimal)• Scale: number of digits after decimal
04/19/23
35Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Domains and DefaultsWhy Use Them? • Data Integrity: prevents entry of invalid (“obviously wrong”) data values• Data Efficiency: choose from a set of valid values rather than type in each timeDomains define a set of legal values for a field’s attributes• Range domain: specifies a valid range of values for numerical attributes
– A water pipe must be between 1 and 100 inches wide• Coded value domain: specifies a valid set of values for an attributes. Can apply
to any type of attributes– Parcels can only have RES or VAC land use values
• Domains are defined as a geodatabase property & then applied as appropriate– Multiple objects in the same database may use the same domain– May be applied to an entire field (attribute), or separately by subtype
Defaults are values automatically assigned when a feature is created– Of course, may be changed during data entry/edit process– Again, may be applied to an entire field (attribute), or separately by subtype
Again, the physical design process requires decisions about domains and defaults, and to what they should be applied.
04/19/23
36Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Implementing Associations and Relationships Associations (general term) may be implemented as• A relationship class within the GDB.
– both classes (tables) in the relationship (i.e. the related tables) must be within the GDB– permanent part of the data model and relationship rules are enforced – Is itself a class, stored in tables, with properties and behaviors– support 1:1 and 1:M cardinality, and many to many using a key class table.– Strictly speaking, everything else is an “association”
• An ArcMap relate– Functionally similar to a relationship class – supports both 1:1 and 1:M cardinality– Can form a relate to tables outside the GDB
• INFO tables, Access tables, dbf tables• Tables in other databases via ODBC (object data base connectivity)
– Local to the ArcMap document therefore is essentially temporary
• An ArcMap join– Supports 1:1 cardinality only– Links matching objects and visualizes them as rows in a single table– Can match objects either
• Non-spatially using key attribute fields in each table: exists only temporarily• Spatially using containment, nearness, etc. criteria: saved as new feature class or new shapefile
• An ArcMap hyperlink– Attribute in a table stores a hyperlink to a document outside GDB
• path name to file-based documents (spreadsheets, text, photos, video or sound clips, etc.)• URLs for Internet documents
– Not part of the GDB in any way; implemented via layer properties in ArcMAP
--Use relationship class when the referential integrity of data is important--Joins are simpler and require less overhead.
04/19/23
37Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Some Special Case Relationships• Many to Many relationships
– Many class A objects match many class B (parcels & owners)– Implemented via an “attributed relationship”– Intermediate table is created to store the relationship
• Aggregation v. Composition and Simple v. Complex Relationships– Aggregation: e.g. dog has bowl, collar valve has valve box
• Aggregation implemented through simple relationships• Peer to peer: delete one, the other remains: Dog dies but bowl and collar remain
– Composition: e.g. dog has feet, tail valve has maintenance records• Composition implemented through complex relationships• Enforced dependency: delete one, and the other goes also: Dog dies, feet and tail gone
LegalArea Zoning PIN1.5 RES 1
0.75 COM 20.75 RES 30.18 RES 40.18 AGR 5
PIN Percent OWN1 100 12 50 13 50 23 25 33 75 44 75 14 25 35 100 3
Origin PK
Origin FKDestination FK
Own Name1 Mary2 Joe3 Raj4 Pedro
Destination PK
Parcel Ownership Owner
04/19/23
38Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Spatial ReferenceAll feature classes within a feature dataset must have the same spatial reference.
• Coordinate System– Datum– Geographic (lat/long) or projected?– Projection parameters: central meridian, standard parallels, coordinate system origin
(false easting and northing)– Measurement (map) units: dd (for lat/long); feet, meters, etc. (for projected)
• Spatial domain– The allowable coordinate range for the geographic coordinates
• X/Y Domain: MinX, MaxX, MinY, MaxY (horizontal extent)– Domain defaults to 3 times the the actual data extent (100% on either side)
• Z Domain: Min, Max (vertical extent)• M Domain: Min, Max (other parameter, e.g. distance from river mouth ) (can differ within
feature data set)
– Once created, the spatial domain for feature dataset/class cannot be changed.– Data outside domain will require a new feature dataset or standalone feature class.
• Precision– Number of system storage units (SU) per one map measurement unit (MU)
• If precision is 1 and mu= 1 meter ( 1 SU per MU), cannot record values less than 1 meter • If precision is 100 and mu= 1 meter (100 SUs per MU), can record values
to 1/100 = .01 = 1 cm
04/19/23
39Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Precision and Spatial DomainWilson NC data
Y values
2302168 238705223344962249613
3232852555 52555
X values
WilsonCity.shpShapefile Extent(range of actual data
when fds created)
Geodatabase Spatial Domain100%-200% wider on all sides thereforeDomain range is at least 3 times Extent range
(Exact amount depends on Precision)
--Geodatabase coordinates are stored as 4byte long integer. This provides 10 significant digits with max value of 2,147,483,648-- map value is multiplied by precision when stored (and converted back when displayed), so min Y values, for example, actually stored as 793707x15624=1,2400,888,268-- the values are also shifted when stored so that data is centered in storage space so you only have to ensure that max. range times precision is less than 2,147,483,648 137438x15624=2,147,331,300 < 2,147,483,648 OK, otherwise reduce domain or precision
Precision is: 15,624Since map units are feet, will support accuracy to 15624/12= 1,302th of an inch!!!
You have 10 significant digits to work with. Precision in essence controls were you put the decimal.If map unit is meters and precision is 1000, you record down to the nearest millimeter. E.g. the map value 1,123,456.1236 is stored as 1,123,456,124
With GRS80, world circumference is 40,075,016 m. therefore can map world at approx. 1.9 cm accuracy (40,075,016*100)/ 2,147,483,648
137,438
13
7,4
38
Domain range in x
Extent range in y
704002
656268
745972
793707
47734
47734
41970
Extent range in x
Dom
ain
rang
e in
y
04/19/23
40Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Personal versus MultiUser (ArcSDE) Geodatabase• Personal Geodatabase.
– Implemented as a Microsoft Access database (*.mdb file) by using MS Jet engine which is installed along with ArcGIS8.
– Microsoft Access license not needed, but its handy to have for attribute data development
– Can be placed on local or network drives
– if on network drive and a user has edit access, other users can’t access (single user editing).
– Intended for personal or small work-group use
– can handle small to moderately sized datasets.• Max of 250,000 features per feature class (table)• maximum size is 2.0 GB
– In general, has the full functionality of ArcSDE geodatabase except
• versioning.
• Multi-user editing
– If a personal ggdb is deleted in ArcCatalog (or by Windows Explorer) , its gone.
• One .mdb file can contain a lot of data. Be carefull!!
04/19/23
41Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Personal versus MultiUser (ArcSDE) Geodatabase
• Enterprise Multiuser ArcSDE Geodatabase– ArcSDE is a data access extension to ArcGIS 8 & 9 that serves geodatabases to
ArcGIS applications running on PC’s connected via a TCP/IP network.– supports concurrent editing by multiple users.
• Supports versioning where multiple users can concurrently edit different versions of a layer and any conflicts are resolved when versions are saved back to the original layer
– significantly higher speeds for data access than shapefiles or personal geodatabases– Supports very large databases without the need to tile or otherwise ‘subdivide’ the
data– ArcCatalog only creates and deletes connections to ArcSDE geodatabases, it can’t
delete the database– Can be deployed on UNIX or Windows NT.
• Many use UNIX platform for ArcSDE and DBMS, and XP for GIS applications– ArcSDE is centrally tuned and managed by a DBA.– Back-up and security procedures implemented in the DBMS apply to the GIS data.– Can build SQL applications to access tables in a remote geodatabase.– Requires a server with a DBMS (e.g Oracle, SQL Server) and ArcSDE.
SDE dbGISUser
server
04/19/23
42Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Generating Geodatabase Schema• “Schema” is the definition of objects contained within a database
– For a geodatabse, objects may include Domains, Tables, FeatureClasses, Relationships or GeometricNetworks.
• Four solutions for schema generation/management in ArcGIS 9
1. Manual creation using ArcCatalog, or
2. Design from scratch in Microsoft Visio, output to XMI* Repository, and use ArcCatalog's Case Tool to import from XMI
3. As above, but begin with one of ESRI existing Sample Data Models**, and edit
4. GeodatabaseDesigner extension (free ArcScript downloadable from www.esri.com) which can be used in conjunction with first 2 or, in some cases, as an alternative.
*XMI is a new standard for storing object models. Supported by Visio 2002 and later
**ESRI has sample data models for a variety of areas:--hydro, parcel, transportation, utilities, etc..--download from Web site
04/19/23
43Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Pros and Cons • ArcCatalog Menu and Wizards
– Simple and easy to use; documented within ArcCatalog– Easy to introduce gross errors in the schema through either omission or addition
• Visio(UML) and the ArcCatalog Case Tool– UML (Unified Modeling Language) is a standard for modeling object-oriented objects and their properties, thus
can be applied to other databases– object inheritance significantly reduces duplication– Uses Visio's strong graphical functionality for an easy way to visualize the design.– However,
• need to buy it• More complex with steep initial learning curve• only supports a subset of geodatabase properties
• Geodatabase Designer (GD)– Fast and free to use and distribute (although not officially supported) – Supports all geodatabase properties and all ArcSDE RDBMS's.– The only bi-directional solution : schema can be EXPORTED and IMPORTED– However,
• A proprietary solution which only works with ESRI geodatabase • Only displays in html text, but can use Geodatabase Diagrammer (another free extension) to display in Visio
04/19/23
44Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Database Normalization• An important step in the physical design is database
normalization• Developing a table structure which:
– Reduces or eliminates redundancy– Makes tables easy to manage– Simplifies changes in the future
• There is an entire theory of database normalization– we don’t have time to go into it
• Just present an example– The usual goal is to create a table structure which is in 3rd
normal form (3NF)
Parcel_ID Parcel_ad Block Precinct Councillor City Mayor Own1_name Own1_ad Own2_name Own2_ad Value8 501 Sadowski 1 1001 Smith Big Green Sadowski. M 501 Sadowski 105,4509 590 Sadowski 2 1002 Jones Big Green Adams, K 590 Sadowski Adams, M 590 Sadowski 89,78036 1001 Adnan 4 1002 Jones Big Green Sadowski, M 501 Sadowski 101,50075 1175 Dadley 12 1004 Hassan Little White Kroeger 592 Tierney Bertrand. K 1097 Bertrand 98,000
Unnormalized (flat file)
OWNER TABLE key fieldsOwner_ID Owner_name Owner_ad
001 Sadowski. M 501 Sadowski002 Adams, K 590 Sadowski004 Kroeger 592 Tierney003 Adams, M 590 Sadowski005 Bertrand. K 1097 Bertrand
PARCEL TABLEParcel_ID street_no street_name Block Owner_ID Value
8 501 Sadowski 1 001 105,4509 590 Sadowski 2 002 89,780
36 1001 Adnan 4 001 101,50075 1175 Dadlexz 12 004 98,0009 590 Sadowski 2 003 89,780
75 1175 Dadlexz 12 005 98,000COUNCILOR TABLE MAYOR TABLE
Precinct Councillor City City Mayor1001 Smith Big Big Green1002 Jones Big Little White1004 Hassan Little
3rd Normal form:--all fields are determined by primary key field
See: Appendix IIfor more detail
04/19/23
46Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Data Importing Vs. Data Loading• Importing
– Creates new features within a new feature class or geodatabase table.
• The features class or table cannot exist before importing
– Database schema is imported at the same time– Often involves conversion from other formats e.g. coverages
• Loading– Appends features into an existing feature class. – Existing feature class must have the same schema as the data
sources– Can be accomplished with:
• Simple Data Loader (ArcCatalog)• Object Loader Wizard (ArcMap)
04/19/23
47Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Conclusion• The outcome of these steps is:
– A rigorous design for our database: a “database schema”
– The design of a process for obtaining the data elements that will populate our database schema
• Identifying a data source and the necessary processing sequence for each layer
• covered in Implementation Steps lecture
Next time, we will go into the lab and look at some of this in practice.This will involve many ESRI-specific design decisions as outlined in:
dbdecisions.ppt
04/19/23
48Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Appendix I
DBMS Relational Operators
“for regular (non-spatial) relationships (in the ER Diagram or UML model), physical database design involves identifying which of the RDBMS’s normal query structures or relational operators will handle the relationship”
04/19/23
49Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
RDBMS: Relational Operators• Select (or Restrict)
– retrieves a subset of rows from a table based on value(s) in a column or columns
• Project– retrieves a subset of columns from a table, removing duplicates from the result
• Product– produces the set of all rows that are the concatenation of a row from one relational table with a row from another relational
table– (usually an intermediate step; not useful otherwise)
• Join– horizontally combines (contatenates) rows in one table with rows in another (or the same) table, including only rows which
meet some selection criteria relating columns of the two tables– Combines product and select
• Union– vertically combines (stacks) rows of one table with rows in the same or a different table
• Intersection– results in rows common to two (or more) relational tables
• Difference– results in rows that appear in one table but not another
• Division– results in common values in one table for which there are other matching column values corresponding to every row in another
table
Examples follow in the next three slides……
04/19/23
50Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
PRODUCT of STORE and SUPPLIER Tables
Supplier Supplier Store Store
supplier_name flavor Name Location
Mr Chip vanilla Mr Chip New York
Mr Chip vanilla Bean, Inc. New York
Mr Chip vanilla Mrs Mousse'sNew Jersy
Mr Chip vanilla Mr Mousse's New Jersy
Mr Chip vanilla Sancho's California
Mr Chip vanilla Diet-Crème Colorado
Mr Chip vanilla Freeze-it Alaska
Mr Chip Chocolate Mr Chip New York
Mr Chip Chocolate Bean, Inc. New York
Mr Chip Chocolate Mrs Mousse'sNew Jersy
Mr Chip Chocolate Mr Mousse's New Jersy
Mr Chip Chocolate Sancho's California
Mr Chip Chocolate Diet-Crème Colorado
Mr Chip Chocolate Freeze-it Alaska
Mrs Chip Avocado Mr Chip New York
Mrs Chip Avocado Bean, Inc. New York
Mrs Chip Avocado Mrs Mousse'sNew Jersy
Mrs Chip Avocado Mr Mousse's New Jersy
Mrs Chip Avocado Sancho's California
Mrs Chip Avocado Diet-Crème Colorado
Mrs Chip Avocado Freeze-it Alaska
Mrs Chip date-nut Mr Chip New York
Mrs Chip date-nut Bean, Inc. New York
Mrs Chip date-nut Mrs Mousse'sNew Jersy
Mrs Chip date-nut Mr Mousse's New Jersy
Mrs Chip date-nut Sancho's California
Mrs Chip date-nut Diet-Crème Colorado
Mrs Chip date-nut Freeze-it Alaska
Diet-crème cottage chees Mr Chip New York
Diet-crème cottage chees Bean, Inc. New York
Diet-crème cottage chees Mrs Mousse'sNew Jersy
Diet-crème cottage chees Mr Mousse's New Jersy
Diet-crème cottage chees Sancho's California
Diet-crème cottage chees Diet-Crème Colorado
Diet-crème cottage chees Freeze-it Alaska
Diet-crème skim milk Mr Chip New York
Diet-crème skim milk Bean, Inc. New York
Diet-crème skim milk Mrs Mousse'sNew Jersy
Diet-crème skim milk Mr Mousse's New Jersy
Diet-crème skim milk Sancho's California
Diet-crème skim milk Diet-Crème Colorado
Diet-crème skim milk Freeze-it Alaska
STORE TABLEStore_Name LocationMr Chip New YorkBean, Inc. New YorkMrs Mousse's New JersyMr Mousse's New JersySancho's CaliforniaDiet-Crème ColoradoFreeze-it Alaska
SUPPLIER TABLESupplier_Name FlavorMr Chip vanillaMr Chip ChocolateMrs Chip avovadoMrs Chip date-nutDiet-Crème cottage cheesDiet-Crème skim milk
SELECT (RESTRICT) of Store Table on Location = New JerseyStore_Name LocationMrs Mousse's New JersyMr Mousse's New Jersy
PROJECT of Store table on Location column
LocationNew YorkNew JerseyCaliforniaColoradoAlaska
JOIN Store Table and Supplier Table by vendor name (same as Select from Product when supplier_name=store_nameMr Chip vanilla Mr Chip New YorkMr Chip Chocolate Mr Chip New YorkDiet-crème cottage cheesDiet-Crème Colorado(note: this is an equi-join; there are other types)
Base Tables
04/19/23
51Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
EUROPE_STORE TableStore_Name Location StatusSir Chip London open Monsieur Chip Paris closedSenor Chip Madrid open
STORE TABLEStore_Name LocationMr Chip New YorkBean, Inc. New YorkMrs Mousse'sNew JersyMr Mousse's New JersySancho's CaliforniaDiet-Crème ColoradoFreeze-it Alaska
UNION of STORE and EUROPE_STORE TablesStore_Name Location StatusMr Chip New York nullBean, Inc. New York nullMrs Mousse's New JersynullMr Mousse's New JersynullSancho's California nullDiet-Crème Colorado nullFreeze-it Alaska nullSir Chip London open Monsieur Chip Paris closedSenor Chip Madrid open (duplicates removed, if any)
MY_FAVORITES TableStore_Name LocationComrade Chip MoscowSir Chip LondonHerr Chip Berlin
INTERSECTION of MY_FAVORITES and EUROPE_STORE TablesStore_Name Location StatusSir Chip London open
DIFFERENCE of MY_FAVORITES and EUROPE_STOREStore_Name LocationComrade Chip MoscowHerr Chip Berlin
DIFFERENCE of EUROPE_STORE and MY_FAVORITESStore_Name Location StatusMonsieur Chip Paris closedSenor Chip Madrid open
Base Tables
04/19/23
52Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Terri’s job function is to check if employees of Big City X have taken required courses in the city’s employee training program. The process is to compare courses taken with required courses. Terri can be replaced by a RDBMS Division relational operator!
Completed Course TableStudent ID# S_Name Course# C_Name
10 Fred mis101 data10 Fred gis101 gis10 Fred mis201 program10 Fred mis301 networks30 Karen mis101 data20 John gis201 gps30 Karen gis101 gis30 Karen mis301 networks30 Karen mis201 program
Required Course TableCourse# C_Namemis101 datamis201 programmis301 networks
DIVISION of Completed_Course Table by Required_Course TableStudent ID# S_Name
10 Fred30 Karen
Base Tables
The division operator identifies Fred and Karen
04/19/23
53Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Appendix IIDatabase Normalization Detail
•Developing a table structure which:–Reduces or eliminates redundancy–Makes tables easy to mange–Simplifies changes in the future
•Our goal is normally to have all tables in third normal form (3NF)
04/19/23
54Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Unnormalized Data (Flat File)
Unnormalized form (flat file)
• You work for the county. In this particular state, the county records land ownership, values property, and manages all elections held in the county.
• Some of the information you need is shown in the flat file above
• This format has many problems, some of which are pointed to above
Parcel_ID Parcel_ad Block Precinct Councillor City Mayor Own1_name Own1_ad Own2_name Own2_ad Value8 501 Sadowski 1 1001 Smith Big Green Sadowski. M 501 Sadowski 105,4509 590 Sadowski 2 1002 Jones Big Green Adams, K 590 Sadowski Adams, M 590 Sadowski 89,78036 1001 Adnan 4 1002 Jones Big Green Sadowski, M 501 Sadowski 101,50075 1175 Dadley 12 1004 Hassan Little White Kroeger 592 Tierney Bertrand. K 1097 Bertrand 98,000
repeating groups of fields.What if there are 3 (or 25) owners?
Not smallest meaningful value.How can you sort by street, then number?
First Normal Form (1NF)First Normal Form (1NF):•Each field contains smallest meaningful value
•Parcel_ad split into two variables (street_no & street_name), thus can sort on street, then number•Owner_ad left as complex attribute ‘cos only used for mailing
•No repeating fields (owner1, owner2, etc..)•There is now no limit on number of owners per parcel
Parcel_ID street_no street_name Block Precinct Councillor City Mayor Owner_ID Owner_name Owner_ad Value8 501 Sadowski 1 1001 Smith Big Green 001 Sadowski. M 501 Sadowski 105,4509 590 Sadowski 2 1002 Jones Big Green 002 Adams, K 590 Sadowski 89,78036 1001 Adnan 4 1002 Jones Big Green 001 Sadowski, M 501 Sadowski 101,50075 1175 Dadlexz 12 1004 Hassan Little White 004 Kroeger 592 Tierney 98,0009 590 Sadowski 2 1002 Jones Big Green 003 Adams, M 590 Sadowski 89,78075 1175 Dadlexz 12 1004 Hassan Little White 005 Bertrand. K 1097 Bertrand 98,000
However, problems in that:•Must use multiple primary key fields (parcel_id and owner_id) to uniquely identify a record •multiple repeating values when there are two (or more) owners: Street_no, street_name, block, precinct,councillor, mayor, city, Owner_name, Owner_ad all have repeats. Wastes space, and
•If an owner’s address changes, multiple records (rows) must be updated•If a parcel is sold, and the owner does not own any other property (e.g. Kroeger, Adams, M or Bertrand), information about that owner is lost
04/19/23
56Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Second Normal Form (2NF):concept of 2NF and problem with 1NF table
• Second Normal Form (2NF) requires that every non-key field (attribute) be “functionally dependent” on the primary key– Functional dependency is a relationship between attributes such that knowing
one attribute automatically determines the other
• Tables with multiple fields making up the primary key are not 2NF– This usually shows up as repeating values in an attribute fields
– For example, owner_ID repeats, and knowing the owner does not determine the councilor
• knowing owner_ID (part of primary key) is 001, does not determine Councilor, which could be Jones or Smith.
Parcel_ID street_no street_name Block Precinct Councillor City Mayor Owner_ID Owner_name Owner_ad Value8 501 Sadowski 1 1001 Smith Big Green 001 Sadowski. M 501 Sadowski 105,4509 590 Sadowski 2 1002 Jones Big Green 002 Adams, K 590 Sadowski 89,78036 1001 Adnan 4 1002 Jones Big Green 001 Sadowski, M 501 Sadowski 101,50075 1175 Dadlexz 12 1004 Hassan Little White 004 Kroeger 592 Tierney 98,0009 590 Sadowski 2 1002 Jones Big Green 003 Adams, M 590 Sadowski 89,78075 1175 Dadlexz 12 1004 Hassan Little White 005 Bertrand. K 1097 Bertrand 98,000
(1NF table)
04/19/23
57Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Second Normal Form (2NF): example of 2NF form PRECINCT TABLE key field
Precinct Councilor City Mayor1001 Smith Big Green1002 Jones Big Green1004 Hassan Little White
OWNER TABLEOwner_ID Owner_name Owner_ad
001 Sadowski. M 501 Sadowski002 Adams, K 590 Sadowski004 Kroeger 592 Tierney003 Adams, M 590 Sadowski005 Bertrand. K 1097 Bertrand
PARCEL TABLEParcel_ID street_no street_name Block Precinct Owner_ID Value
8 501 Sadowski 1 1001 001 105,4509 590 Sadowski 2 1002 002 89,78036 1001 Adnan 4 1002 001 101,50075 1175 Dadlexz 12 1004 004 98,0009 590 Sadowski 2 1002 003 89,780
• In each table, there is only one key field, and knowing its value determines the value of all other attributes
– Satisfies criteria for 2NF
– Far fewer repeats and duplicate editing problems
• Note that there are still shortcomings, for example– if the mayor of city “big” changes, we must update two records
(2NF tables)
04/19/23
58Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
Third Normal Form (3NF)• 3NF requires that no non-key field be a fact about another non-key field
– This will be violated if there is transitive dependency in a table• Transitive functional dependency occurs when the value in a non-key field is determined by a value in another
non-key field• The value for city determines mayor, (and neither of these is the key field)
• In 3NF, fields can only be attributes of the primary key, and not of some other field– Tables not in 3NF usually have repeating values in a non-key field (e,g mayor field in PRECINCT table)– Mayor ‘Green’ is a fact about city (a non-key field), not about precinct (the key field)
PRECINCT TABLE (2NF)Precinct Councilor City Mayor
1001 Smith Big Green1002 Jones Big Green1004 Hassan Little White
COUNCILOR TABLE (3NF)Precinct Councillor City MAYOR TABLE (3NF)
1001 Smith Big City Mayor1002 Jones Big Big Green1004 Hassan Little Little White
Precinct Table (in 2NF) is split into Councilor and Mayor tables in 3NF
04/19/23
59Ron Briggs, UT-Dallas GISC 6383 GIS Management and Implementation
4th and 5th Normal Form• Our goal is usually to have all tables in at least 3rd normal form• 4th and 5th normal forms also exist, but these can add disadvantages (for
example, processing inefficiency) as well as advantages• For example, 5th Normal form has no duplicated data, but requires junction
tables to link data and form relationships
PRECINCT TABLE (5NF) COUNCILOR TABLE (5NF) CITY TABLE (5NF)Precinct Voter_count Councilor Age City Mayor Population
1001 3245 Smith 27 Big Green 500,0001002 5600 Jones 85 Little White 5,0001004 2001 Hassan 431005 750
Junction Tables
Precint Councilor Councilor City1001 Smith Smith Big1002 Jones Jones Big1004 Hassan Hassan Little1005 Hassan