Post on 07-Apr-2015
GIS DATA MODELLING AND MANAGEMENT
Prof. Ganesh D Bhutkar, Subject Teacher
GIS Data Modelling and Management 1
Student Group:
Sohan Pachhade, BE IT 2008-09, J-29
Vivek Bamne, BE IT 2008-09, J-06
Sanyog Salve, BE IT 2008-09, J-34
Reference: Chapters 8 & 9: Spatial Data Modeling and GIS Data Management
M. Anji Reddy, Remote Sensing and GIS, B S Publications, Second Edition, 2006.
SPATIAL DATA MODELLING
• Spatial data modelling is a precise and clear process for turning data about spatial entities into graphical representations.
• The two main approaches by which a computer can handle and display spatial entities are:
1. Raster Approach
2. Vector Approach
• A map contains spatial elements like monuments, roads, rivers and parks.
• Spatial modelling is very useful in understanding geographical problems.
STAGES OF GIS DATA MODELLING
• Identifying the spatial features from the real world that are of interest in the context of the application.
• Representing the conceptual data model by an appropriate spatial data model. This involves choosing between one of the two approaches: raster or vector.
• Selecting an appropriate spatial data structure to store the model within the computer. The spatial data structure is the physical way in which entities are coded for the purpose of storage and manipulation.
GRAPHIC REPRESENTATION OF SPATIAL DATA
• An entity is an element in reality.
• Geographical entities can be represented by 3 main entity types, i.e. points, lines and areas.
• There are two additional spatial entities:
1. Surface: It is used to represent continuous features or phenomena. For these features, there is a value at every location. e.g. Temperature, Population Density.
2. Network: It is a series of interconnecting lines along which there is a flow of data, objects or materials. e.g. A road network along which there is a flow of traffic to and from the areas.
CHALLENGES IN DEFINITION OF ENTITIES
1. How to select the proper entity type to provide an appropriate representation?
2. How to represent changes over time?
e.g. Vegetation in a forest may be a continuous feature which can be represented by a surface for ecologists, whereas it may be represented as a series of discrete area entities by government officials.
RASTER DATA REPRESENTATION
• The terrain is divided into a number of parcels or units called grid cells. Each grid cell is of the same size and hence occupies the same amount of geographical space.
• It does not provide precise locational information, because a location is known only to the nearest grid cell. The simplest way of including attribute data for each entity is to assign each cell a number representing the attribute, like a class of land cover. E.g. 0 for Water and 1 for Land.
• The resolution is given by m * n, i.e. columns * rows.
Problems with raster representation:
1. Lack of absolute locational information.
2. Reduced spatial accuracy and reliability of distance.
3. Need for large storage capacity.
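The raster idea above can be sketched in a few lines of Python. This is only an illustration: the grid contents, the `cell_size` value, the top-left origin and the helper name `cell_at` are assumptions made for the example, while the 0 = Water, 1 = Land coding follows the slide.

```python
# Land-cover raster: each cell stores a class code (0 = Water, 1 = Land).
grid = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]

rows = len(grid)
cols = len(grid[0])
cell_size = 10.0  # assumed ground size of one cell, in metres


def cell_at(x, y):
    """Return (row, col) of the cell containing ground point (x, y).

    The origin is assumed to be the top-left corner of the grid,
    with x increasing to the east and y increasing to the south.
    """
    col = int(x // cell_size)
    row = int(y // cell_size)
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("point lies outside the grid")
    return row, col


# Two different ground points fall into the same cell, so the raster
# cannot tell them apart: location is known only to the nearest cell.
r1, c1 = cell_at(21.0, 3.0)
r2, c2 = cell_at(29.5, 9.9)
print(cols, "*", rows, "cells")
print((r1, c1), (r2, c2), grid[r1][c1])
```

Storing one number per cell keeps the structure simple, but every cell is stored even when large areas share the same value, which is where the storage problem listed above comes from.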
VECTOR DATA REPRESENTATION
• Vector representation allows us to give spatial locations precisely.
• All entities are represented using points (the basic building blocks) having x and y co-ordinates.
• Line and area entities are constructed by connecting a series of points into chains and polygons.
• Attributes are linked through software linkage.
Problems with vector representation:
1. Selection of an appropriate number of points to construct an entity.
2. Representation of networks and surfaces is complex.
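A minimal Python sketch of the vector idea, assuming points are plain (x, y) tuples and attributes are linked to geometry through a shared identifier. The entity names, identifiers and coordinates are made up for illustration, and the shoelace formula used for the area is a standard technique brought in for the example rather than something taken from the source.

```python
import math

# A point is an (x, y) pair; chains and polygons are lists of points.
well = (2.0, 3.0)
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]               # chain (line entity)
park = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]   # polygon (area entity)

# Attributes are kept apart from the geometry and linked by an identifier.
geometry = {"P1": well, "L1": road, "A1": park}
attributes = {
    "P1": {"type": "well"},
    "L1": {"type": "road"},
    "A1": {"type": "park"},
}


def chain_length(points):
    """Sum of straight-line distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def polygon_area(points):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(attributes["L1"]["type"], chain_length(geometry["L1"]))
print(attributes["A1"]["type"], polygon_area(geometry["A1"]))
```

The road here uses only three points; a curved road would need many more, which is the "appropriate number of points" problem noted above.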
TYPES OF RASTER GIS MODELS
1. GRID Model
2. IMGRID Model
3. MAP (Map Analysis Package) Model
COMPACT RASTER DATA MODELS
• Compacting data reduces the information content to the absolute minimum.
• Compact data is needed for efficient storage and faster retrieval.
• Based on the nature of GIS data and the available facilities, the compact methods are grouped as:
1. Run-Length Coding
2. Raster Chain Codes
3. Block Codes
4. Quadtrees
COMPACT RASTER DATA MODELS (Contd..)
RUN LENGTH CODES
• Each grid cell has a numerical value corresponding to a category of data.
• If there are 500 * 500 grid cells, then 250000 numbers have to be typed.
• There are long strings of the same number in each row. Such a long string is called a run. Every run has some length, which is used for compactness - (R, N).
• Its disadvantage is that it works on a row by row basis, so it’s tedious.
RASTER CHAIN CODES
• This method of data reduction works by defining the boundary of the entity.
• Here the directions are represented by numbers to avoid mistakes. (0 is North, 1 is East, 2 is South, 3 is West)
• The method of storing data is based on (X,Y,N,D), where (X,Y) - start point, N - no. of cells & D - direction.
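Both codings above can be sketched in Python. Two assumptions are made here: a run is stored as a (value, run length) pair, which is one plausible reading of the (R, N) notation, and rows are numbered downwards so that North decreases y. The function names are hypothetical.

```python
def run_length_encode(row):
    """Encode one raster row as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


# Direction codes from the slide: 0 = North, 1 = East, 2 = South, 3 = West.
# Rows are assumed to be numbered downwards, so North decreases y.
STEP = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def trace_chain(x, y, segments):
    """Follow (N, D) segments from start cell (x, y); return visited cells."""
    cells = [(x, y)]
    for n, d in segments:
        dx, dy = STEP[d]
        for _ in range(n):
            x, y = x + dx, y + dy
            cells.append((x, y))
    return cells


row = [0, 0, 0, 1, 1, 0, 0, 0, 0]
runs = run_length_encode(row)
print(runs)

# Boundary of a square entity: 2 cells east, 2 south, 2 west, 2 north.
boundary = trace_chain(2, 2, [(2, 1), (2, 2), (2, 3), (2, 0)])
print(boundary)
```

In this example nine cell values shrink to three runs, and the chain code stores the boundary as a start cell plus four short segments instead of every boundary cell.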
COMPACT RASTER DATA MODELS (Contd..)
BLOCK CODES
• A modified run length code, i.e. it selects a square group of cells, assigns a starting point (the centre or a corner), picks a grid cell value and tells the computer how wide the square of grid cells is, based on the no. of cells.
• Effective method of reducing the storage space for most thematically layered digital data in GIS.
QUADTREES
• It’s a difficult approach which works on a square group of cells.
• The map is successively divided into uniform square groups of grid cells with the same attribute value.
• At each step the map is divided into 4 quadrants: NW, NE, SW, SE.
• This method is only possible with the raster data model and is quite innovative because it uses recursion and divides the image into quads or quarters down to the smallest unit cell.
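The recursive subdivision behind quadtrees can be sketched as follows. This is a minimal illustration that assumes a square grid whose side is a power of two; the function names and the dictionary-based tree layout are choices made for the example, not a standard format.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value if the square block is uniform, else a dict of quadrants."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if uniform:
        return first
    half = size // 2
    return {
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
    }


def count_leaves(node):
    """Count the uniform blocks stored in the tree."""
    if isinstance(node, dict):
        return sum(count_leaves(child) for child in node.values())
    return 1


grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
print(tree)
print(count_leaves(tree), "blocks instead of", len(grid) * len(grid), "cells")
```

Only the mixed SE quadrant is split down to single cells; the three uniform quadrants are each stored as one value.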
TYPES OF VECTOR GIS MODELS & COMPACT MODELS
1. Spaghetti model
2. Topological Models (GBF / DIME, TIGER & POLYVRT)
3. Shape file
Compact Models:
1. Galton’s Model
2. Freeman-Huffman Chain Codes
COMPARISON OF DATA MODELS
Parameter                        RASTER                              VECTOR
1. Data Structure                Simple                              Complex
2. Data Structure Compactness    Lesser                              More
3. Overlay Operations            Easily & efficiently implemented    More difficult to implement
4. High Spatial Variability      Efficiently represented             Inefficient
5. Topological Relationships     More difficult to represent         Efficient encoding of topology
6. Graphical Output              Less aesthetically pleasing         Better suited
7. Base                          Location-based                      Object-based
DATABASE MANAGEMENT SYSTEM (DBMS)
A DBMS is software that controls the storage, retrieval and modification of data in a database.
It is designed for -
• File handling & management
• Record maintenance
• Extraction of information from data (Queries)
• Maintenance of data security and integrity
• Application building (Reports)
DBMS APPLICATIONS
• Travel agency system,
• Banking system,
• Library management system,
• Railway reservation system,
• Student admission system,
• Financial accounting system etc.
FUNCTIONS OF DBMS
• Security : It refers to protection of data against accidental or intentional disclosure to unauthorized persons and protection against unauthorized access, modification or destruction of the database.
• Integrity : It is the ability to protect data from system problems through a variety of assurance measures like range checking, backup and recovery.
• Synchronization : It refers to forms of protection against inconsistencies that can result from multiple simultaneous users.
• Physical data independence : It means the underlying data storage & manipulation hardware should not matter to the user.
FUNCTIONS OF DBMS (Contd..)
• Minimization of redundancy : Redundancy is generally not advisable in a database, and storing and manipulating the dependencies increases the difficulty of working with the data. So a DBMS uses normalization.
• Efficiency : Data retrieval operations mainly depend on the volume of data, the method of data encoding, the design of the database structures and the complexity of the query.
COMPONENTS OF DBMS
1. Data definition
2. Storage definition
3. Database administration
4. Data manipulation
• In data retrieval, a mapping must be made between the high-level objects in a query language statement and the physical location of the data on the storage device.
• A query compiler or optimizer is used to optimize the code so that retrieval performance is improved.
GIS DATA FILE MANAGEMENT
Following are the basic file structures used in GIS:
Simple List:
Records are placed in the order in which they are entered. The main advantage is that adding a record only requires appending it. The disadvantage is the lack of structure, which makes searching very inefficient.
Ordered Sequential Files:
These use alphabetic characters. Data is arranged in recognizable sequences against which individual records can be compared. The normal search strategy is a sort of divide and conquer approach, which reduces the search time needed to get the data.
GIS DATA FILE MANAGEMENT (Contd..)
Indexed Files:
These are superior to the rest of the methods. They are based on an index or code and use a pointer to locate a record. This type of search has 3 requirements: first, it requires a criterion beforehand; second, it requires recalculation of the index from the original data; third, sequential search methods are needed to obtain information.
Relative File:
These are like indexed files, but the index used is the record number.
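The four file structures can be compared with a small Python sketch. The record contents, the variable names and the use of a dictionary as the index are assumptions made for illustration; real GIS file formats are more involved.

```python
import bisect

# Records in the order they were entered: (key, value).
records = [("river", 40), ("park", 12), ("road", 75), ("monument", 8)]


def search_simple_list(recs, key):
    """Simple list: scan records in entry order until the key is found."""
    for name, value in recs:
        if name == key:
            return value
    return None


# Ordered sequential file: keep records sorted by key, then halve the
# search range at each step (divide and conquer) using bisect.
ordered = sorted(records)
ordered_keys = [name for name, _ in ordered]


def search_ordered(key):
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered[i][1]
    return None


# Indexed file: a separate index maps each key to a record position (pointer).
index = {name: pos for pos, (name, _) in enumerate(records)}


def search_indexed(key):
    pos = index.get(key)
    return None if pos is None else records[pos][1]


# Relative file: the record number itself is the index.
def read_relative(record_number):
    return records[record_number]


print(search_simple_list(records, "road"))
print(search_ordered("park"))
print(search_indexed("monument"))
print(read_relative(0))
```

Appending to `records` is cheap for the simple list, but the sorted copy and the index both have to be rebuilt or updated after each change, which matches the index recalculation requirement mentioned above.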
BUILDING GIS MODELS
Four options to build a GIS real world model are:
LCGU (Least Common Geographical Units) based GIS : It integrates all pertinent spatial data records into a single set of all classes.
Layer based GIS : Each layer reflects a different set of attributes. It is a series of thematic layers. GIS data is broken down into logical terrain units related to layers.
Feature based GIS : It is a new approach where GIS features are stored as spatial or non-spatial data.
Object oriented GIS : Features are not divided into layers, but grouped into classes and hierarchies of objects. The advantage is reusability, but implementation is complex.
DATABASE MODELS
• An implementation issue is the integration of GIS with existing internal databases. Most of these databases are relational.
• Other models by which a real world database model is built are the hierarchical and network database models.
• Almost all existing and most widely used GIS software like ARC / INFO are based on RDBMS.
• RDBMS is Relational DBMS; it is easy to learn and well suited for ad hoc queries. A relational query language like SQL is very easy to learn.
• Three most popular data modeling approaches are record-based, object-based and object-relational based on ER Diagram.
HIERARCHICAL DATABASE MODEL
• When the data has a parent-child or one-to-many relation, it is called a hierarchical model.
• This model has many advantages like
- easy to understand,
- easy to update or expand,
- good for quick data retrieval.
• This model has many disadvantages like
- large index files to be maintained,
- certain attribute values are repeated, so redundancy increases, more storage space is occupied and data access becomes slow.
NETWORK DATABASE MODEL
• When the data has many-to-many relationships, it is called a network systems model.
• This model has many advantages like
- more flexibility,
- avoids redundancy.
• This model has many disadvantages like
- overhead of pointers,
- complex system,
- more no. of pointers, so more storage space.
RELATIONAL DATABASE MODEL
• It is a collection of tabular relations, each with a set of attributes.
• Data is stored as a set of rows called tuples, consisting of values for each attribute.
• There are two schemas upon which the entire database depends. They are the relation schema and the database schema.
• Relation Schema – It is usually declared when the database is set up and does not change much during the life span of the system.
• Database schema – It is the set of relation schemas of a relational database, with some constraints.
• Primitive operations of relational algebra - Union, Difference, Intersection, Join etc.
• Relational algebra provides a specific set of rules for the design and function of these systems.
• Relational join is a linking mechanism to match / relate data in one table to another.
RELATIONAL DATABASE MODEL (Contd..)
• A single column or multiple columns can be used to define the search strategy, and this search criterion is called the primary key.
• When a primary key in one table is related to a column in a second table, the column in the second table to which the primary key is linked is called a foreign key.
• In the process of relational joins, redundancy is often created. A set of rules called Normal Forms has been established to reduce it.
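Primary keys, foreign keys and a relational join can be tried out with Python's built-in `sqlite3` module. The sketch below is only an illustration: the `district` and `parcel` tables, their columns and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE district (
        district_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE parcel (
        parcel_id   INTEGER PRIMARY KEY,
        land_use    TEXT NOT NULL,
        district_id INTEGER NOT NULL REFERENCES district(district_id)
    );
    INSERT INTO district VALUES (1, 'North'), (2, 'South');
    INSERT INTO parcel VALUES (10, 'forest', 1), (11, 'urban', 1), (12, 'water', 2);
    """
)

# Relational join: match the foreign key in parcel to the primary key in district.
rows = conn.execute(
    """
    SELECT parcel.parcel_id, parcel.land_use, district.name
    FROM parcel
    JOIN district ON parcel.district_id = district.district_id
    ORDER BY parcel.parcel_id
    """
).fetchall()
print(rows)

# A foreign key value that matches no primary key is rejected.
try:
    conn.execute("INSERT INTO parcel VALUES (13, 'urban', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invalid district rejected:", rejected)
```

The district name is stored once and reached through the join, rather than being repeated in every parcel row.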
RELATIONAL DATABASE MODEL (Contd..)
There are THREE basic normal forms.
• 1st Normal Form - There should be a single value only in each row location.
• 2nd Normal Form - Every column that is not a primary key should be totally dependent on the primary key.
• 3rd Normal Form - Columns should depend on primary keys, but primary keys should not depend on any non-primary key.
There are more advanced normal forms available. They can be used to improve quality of the database,
STRUCTURED QUERY LANGUAGE (SQL)
• The tables stored in the database are queried using SQL, and the results represent virtual views.
• Queries may be related to one table. e.g. Which hotels in the city are five star? The answer to the query can be Hotel Taj.
• Also, queries may be related to many tables. e.g. Which tourists originating from Europe stay more in five star hotels in the city? (Two tables involved may be Tourist and Occupancy).
• Advantages of SQL - Completeness, Simplicity, Pseudo English language style.
• SQL was not developed to handle geographical concepts like “near to”, “far from”, “connected to” etc.
• RDBMS software supporting SQL – ARC / INFO, ORACLE, Geovision.
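The two example questions above can be written as SQL and run through Python's built-in `sqlite3` module. Everything about the schema here is an assumption for illustration: the slide names only the Tourist and Occupancy tables, so the `hotel` table, all column names and all rows other than Hotel Taj are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hotel (name TEXT PRIMARY KEY, stars INTEGER);
    CREATE TABLE tourist (tourist_id INTEGER PRIMARY KEY, name TEXT, origin TEXT);
    CREATE TABLE occupancy (
        tourist_id INTEGER REFERENCES tourist(tourist_id),
        hotel_name TEXT REFERENCES hotel(name),
        nights     INTEGER
    );
    INSERT INTO hotel VALUES ('Hotel Taj', 5), ('Hotel A', 3);
    INSERT INTO tourist VALUES (1, 'Tourist X', 'Europe'),
                               (2, 'Tourist Y', 'Asia'),
                               (3, 'Tourist Z', 'Europe');
    INSERT INTO occupancy VALUES (1, 'Hotel Taj', 4), (2, 'Hotel Taj', 2),
                                 (3, 'Hotel A', 5), (1, 'Hotel A', 1);
    """
)

# Single-table query: which hotels are five star?
five_star = conn.execute("SELECT name FROM hotel WHERE stars = 5").fetchall()
print(five_star)

# Multi-table query: nights spent in five star hotels by tourists from Europe.
european_nights = conn.execute(
    """
    SELECT tourist.name, SUM(occupancy.nights)
    FROM tourist
    JOIN occupancy ON occupancy.tourist_id = tourist.tourist_id
    JOIN hotel ON hotel.name = occupancy.hotel_name
    WHERE tourist.origin = 'Europe' AND hotel.stars = 5
    GROUP BY tourist.name
    """
).fetchall()
print(european_nights)
```

Both queries filter on attribute values only. A question such as "which hotels are near to the station" has no equivalent in this plain SQL, which is the limitation noted above.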
LOCATION BASED REPRESENTATION FOR SPATIO-TEMPORAL DATA
• It is a layered approach where each layer holds information about a single thematic domain at a single known time.
• Data is stored in terms of “snapshots”.
• Drawbacks:
1. Data volume is enormous.
2. Accessing data is a time consuming process.
3. Individual changes w.r.t. cells can’t be determined.
TEMPORAL GRID APPROACH
• A variable length list is associated with each pixel.
• Each entry records a change at that location with the new value and time (event history).
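The temporal grid approach can be sketched in Python as one event list per cell. The grid size, the initial value 0 at time 0, the assumption that changes arrive in time order and the function names are all choices made for this illustration.

```python
import bisect

# One variable-length event list per cell: [(time, new_value), ...] in time order.
rows, cols = 2, 2
history = [[[(0, 0)] for _ in range(cols)] for _ in range(rows)]


def record_change(row, col, time, value):
    """Append an event only when the cell value actually changes."""
    events = history[row][col]
    if events[-1][1] != value:
        events.append((time, value))


def value_at(row, col, time):
    """Return the cell value that was in force at the given time."""
    events = history[row][col]
    times = [t for t, _ in events]
    i = bisect.bisect_right(times, time) - 1
    if i < 0:
        raise ValueError("time is earlier than the first recorded event")
    return events[i][1]


record_change(0, 1, 5, 1)   # cell (0, 1) changes from 0 to 1 at time 5
record_change(0, 1, 9, 0)   # and back to 0 at time 9
record_change(1, 0, 7, 0)   # value is unchanged, so nothing is stored

print(history[0][1])
print(value_at(0, 1, 4), value_at(0, 1, 5), value_at(0, 1, 10))
```

Unlike full snapshots, a cell that never changes keeps a single entry, and the change history of any individual cell can be read directly from its list.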
ENTITY BASED REPRESENTATION FOR SPATIO-TEMPORAL DATA
Also called: AMENDMENT VECTOR APPROACH
• It tracks the changes in geometry of entities w.r.t. time.
• Changes are incrementally recorded (vectors).
• As time progresses, the number of amendment vectors grows, increasing complexity.
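A rough Python sketch of the amendment idea, under strong simplifying assumptions: an amendment here only moves one existing vertex of one polygon, and amendments are appended in time order. The names `amend` and `geometry_at` and the tuple layout are hypothetical.

```python
base_polygon = [(0, 0), (4, 0), (4, 3), (0, 3)]   # geometry at time 0

# Each amendment: (time, vertex_index, new_point), appended in time order.
amendments = []


def amend(time, vertex_index, new_point):
    """Record an incremental change instead of storing a new full geometry."""
    amendments.append((time, vertex_index, new_point))


def geometry_at(time):
    """Replay all amendments up to and including the given time."""
    points = list(base_polygon)
    for t, i, p in amendments:
        if t > time:
            break
        points[i] = p
    return points


amend(2, 1, (5, 0))
amend(6, 2, (5, 4))

print(geometry_at(0))
print(geometry_at(3))
print(geometry_at(6))
```

Every query has to replay the amendment list from the base geometry, so the work grows with the number of amendments, which illustrates the complexity point above.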
TIME BASED REPRESENTATION FOR SPATIO-TEMPORAL DATA
• Time-based representations for spatio-temporal data use time as the organizational basis.
• With this type of representation, the changes related to time are explicitly stored.
• This type of representation has the unique advantage of facilitating time-based queries.
• Adding new events as time progresses is also straightforward: they are simply added to the end of the timeline.
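A minimal Python sketch of a time-ordered event list, assuming each event is a (time, location, new value) tuple and that events arrive in time order. The event contents and function names are invented for illustration.

```python
import bisect

# Timeline ordered by time: each entry is (time, location, new_value).
timeline = []


def add_event(time, location, value):
    """New events are assumed to arrive in time order and go on the end."""
    if timeline and time < timeline[-1][0]:
        raise ValueError("events must be added in time order")
    timeline.append((time, location, value))


def events_between(start, end):
    """Return all events with start <= time <= end."""
    times = [t for t, _, _ in timeline]
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_right(times, end)
    return timeline[lo:hi]


add_event(1, (0, 1), "forest")
add_event(4, (2, 2), "urban")
add_event(4, (0, 1), "urban")
add_event(9, (1, 0), "water")

print(events_between(2, 5))
```

Because the list is already sorted by time, a query such as "what changed between time 2 and time 5" is a simple slice, whereas asking for the history of one location would require scanning the whole timeline.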
THANK YOU !