Post on 30-Jan-2016
04/22/23 Department of Civil, Architectural & Environmental Engineering 1
HYDROSEEK and HYDROTAGGER: A Search Engine for Hydrologists
GIS in Water Resources Lecture
M. Piasecki
November, 2007
Lecture
• Demo of HydroSeek: What are the search criteria? Functionality of the Engine Interface
• Data Sources: Common Sources, Common Problems (Completeness, Syntax, Semantics)
• Ontologies: Ontology details, Concept-to-data variable tagging
• Architecture: Flow Chart, Technologies used
• Demo of HydroTagger: Why the Tagging? Technologies
www.HydroSeek.org
HIS Goals
• Hydrologic Data Access System – better access to a large volume of high quality hydrologic data
• Support for Observatories – synthesizing hydrologic data for a region
• Advancement of Hydrologic Science – data modeling and advanced analysis
• Hydrologic Education – better data in the classroom, basin-focused teaching
Objective
Search multiple heterogeneous data sources simultaneously regardless of semantic or structural differences between them.
What we are doing now …
[Diagram: separate request/return exchanges with each data source (NWIS, NARR, NAWQA, NAM-12)]
What we would like to do …
[Diagram: one generic request goes to a Semantic Mediator, which issues GetValues calls to the data sources (NWIS, NAWQA, NARR); labelled HODM]
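The mediator idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SemanticMediator` class, the `make_adapter` helper, the mapping of terms to sources, and all numeric values are hypothetical and are not taken from the HydroSeek code base.

```python
from typing import Callable, Dict, List

# An adapter takes a generic concept name and returns that source's values.
Adapter = Callable[[str], List[float]]


def make_adapter(local_names: Dict[str, str],
                 records: Dict[str, List[float]]) -> Adapter:
    """Build a GetValues-style adapter for one source (hypothetical)."""
    def get_values(concept: str) -> List[float]:
        # Translate the generic concept into the source's own variable name.
        local_name = local_names.get(concept)
        if local_name is None:
            return []
        return records.get(local_name, [])
    return get_values


class SemanticMediator:
    """Fans one generic request out to every registered source."""

    def __init__(self) -> None:
        self._adapters: Dict[str, Adapter] = {}

    def register(self, source: str, adapter: Adapter) -> None:
        self._adapters[source] = adapter

    def get_values(self, concept: str) -> Dict[str, List[float]]:
        return {source: adapter(concept)
                for source, adapter in self._adapters.items()}


# Made-up data: which source uses which term is an assumption.
mediator = SemanticMediator()
mediator.register("NWIS", make_adapter(
    {"water level": "stage height"}, {"stage height": [2.1, 2.3]}))
mediator.register("NAWQA", make_adapter(
    {"water level": "Elevation, water surface"},
    {"Elevation, water surface": [101.4]}))

result = mediator.get_values("water level")
```

The caller issues a single request for a concept and never sees the source-specific variable names, which is the point of placing a mediator between the user and the repositories.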
Data sources …
USGS
EPA
CIMS
TCEQ
NADP
Spatial Coverage
STORET has 758 sites in Texas, TCEQ has 8,407.
STORET has 47,602 sites in Florida, NWIS has 27,906.
NWIS has 121,545 sites in Minnesota, STORET has 22,260.
Data Availability
Temporal Coverage
[Figure panels: Nitrogen; 1957-1977, 1977-2003, 2003-2007]
Interface Problem
NWIS ~175 form elements on a single page
STORET + NWIS + TCEQ + CIMS = ???
A drop-down menu: ∞
String search across parameter list? How about synonyms?
‘Elevation, water surface’ vs. ‘stage height’
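A plain string search would miss the synonym pair quoted above. The sketch below shows one way a synonym table can close that gap; the `SYNONYMS` table, the shared concept name "water level", and the function names are illustrative assumptions, not HydroSeek's actual data or API.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table: each source-specific term maps to one concept.
SYNONYMS: Dict[str, str] = {
    "stage height": "water level",
    "elevation, water surface": "water level",
}


def concept_for(term: str) -> Optional[str]:
    """Return the shared concept for a term, ignoring case and padding."""
    return SYNONYMS.get(term.strip().lower())


def search(query: str, parameters: List[str]) -> List[str]:
    """Return every parameter that maps to the same concept as the query."""
    target = concept_for(query)
    if target is None:
        return []
    return [p for p in parameters if concept_for(p) == target]


matches = search("Stage height", ["Elevation, water surface", "Discharge"])
```

Here a query for "Stage height" finds "Elevation, water surface" even though the two strings share no words.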
Completeness Problem: Metadata Catalog
• Better query performance
• Freedom
• Fewer errors
Availability of geographic identifiers for stations in EPA STORET:
Total Number of Sites                  274,918
Sites with geographic coordinates      274,435
Sites with State/County information    273,113
Sites with Hydrologic Unit Codes       128,646
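To make the completeness gap concrete, the STORET counts above can be turned into coverage percentages. The short calculation below uses only the four numbers from the slide; the variable and function names are arbitrary.

```python
# Site counts for EPA STORET as given on the slide.
TOTAL_SITES = 274_918
identifier_counts = {
    "geographic coordinates": 274_435,
    "State/County information": 273_113,
    "Hydrologic Unit Codes": 128_646,
}


def coverage_percent(count: int, total: int) -> float:
    """Share of sites carrying an identifier, in percent (one decimal)."""
    return round(100.0 * count / total, 1)


coverage = {name: coverage_percent(count, TOTAL_SITES)
            for name, count in identifier_counts.items()}

for name, percent in coverage.items():
    print(f"{name}: {percent}%")
```

Coordinates and State/County information are present for more than 99% of sites, while fewer than half of the sites carry a Hydrologic Unit Code.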
Heterogeneity Problem
• Syntax: e.g. date & time formats, Gregorian versus Julian
• Data format/structure: e.g. XML, HTML, tab/tilde/comma-separated text, gzipped tarballs…
• Semantics: more …
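The syntax problem can be illustrated with dates. The sketch below normalizes a few date spellings to one representation using Python's standard library. It assumes that "Julian" refers to the ordinal year plus day-of-year form (e.g. `2006225`), and the list of accepted formats is chosen for illustration rather than taken from any of the repositories.

```python
from datetime import date, datetime

# Candidate formats, tried in order. "%Y%j" is year + day of year.
# "%d %B %Y" relies on English month names (default C locale).
FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%j")


def parse_date(text: str) -> date:
    """Return a calendar date for any of the supported spellings."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


samples = ["2006-08-13", "08/13/2006", "13 August 2006", "2006225"]
parsed = [parse_date(s) for s in samples]
```

All four sample strings denote the same day, so a mediator that normalizes dates this way can compare time series from sources that disagree on notation.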
Issues with Semantics
• Hyponymy: parameter “Groundwater level”, “Stream stage”, “Reservoir level” versus “Water level”
• Pseudo-hyponymy due to lack of metadata: parameter “Manganese, 6N hydrochloric acid extracted, recoverable, dry weight, milligrams per kilogram” versus “Manganese, milligrams per kilogram”
• Synonymy: ‘Total Kjeldahl Nitrogen’ vs. ‘Ammonia+Organic Nitrogen’
Search Strategy
Search → Fine tune → Retrieve
rather than
Search → Retrieve
Avoid ‘high precision, low recall’ and ‘low precision, high recall’ problems.
Layered Ontology Model
Navigation
Compound
Core
Knowledge Base: OWL Ontologies
‘Escherichia coli’ = ‘E. coli’
‘E. coli’ is-a ‘Indicator Organism’
‘Copper’ is-a ‘Micronutrient’
‘Copper’ isMeasuredIn ‘Medium’
‘Medium’ = {Water, Soil…}
‘Micronutrient’ is-a ‘Nutrient’
• Supports classification of search results
• Entities in the ontology are associated with measured variables in a relational database
• Helps solve semantic heterogeneity issues between data repositories
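The is-a and equivalence statements on this slide are enough for a small worked example of how an ontology supports classification. The sketch below stores them in plain dictionaries rather than OWL, and it assumes each term has a single parent, which real OWL ontologies do not require; the function names are illustrative.

```python
from typing import Dict, List

# Statements copied from the slide.
IS_A: Dict[str, str] = {
    "E. coli": "Indicator Organism",
    "Copper": "Micronutrient",
    "Micronutrient": "Nutrient",
}
SAME_AS: Dict[str, str] = {"Escherichia coli": "E. coli"}


def canonical(term: str) -> str:
    """Resolve an equivalent name to the term used in IS_A."""
    return SAME_AS.get(term, term)


def ancestors(term: str) -> List[str]:
    """Walk the is-a chain upward, nearest concept first."""
    chain: List[str] = []
    current = canonical(term)
    while current in IS_A:
        current = IS_A[current]
        chain.append(current)
    return chain


def is_a(term: str, concept: str) -> bool:
    """True if the term falls under the concept, directly or transitively."""
    return concept in ancestors(term)


copper_chain = ancestors("Copper")
```

Because is-a is followed transitively, a search for "Nutrient" can return copper measurements even though no repository labels them that way.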
Point Observations Information Model
Data Source: USGS
Network: Streamflow gages
Sites: Neuse River near Clayton, NC
Variables: Discharge, stage (Daily or instantaneous)
Values: 206 cfs, 13 August 2006
{Value, Time, Qualifier, Offset}
• A data source operates an observation network
• A network is a set of observation sites
• A site is a point location where one or more variables are measured
• A variable is a property describing the flow or quality of water
• A value is an observation of a variable at a particular time
• A qualifier is a symbol that provides additional information about the value
• An offset allows specification of measurements at various depths in water
http://www.cuahsi.org/his/webservices.html
GetSites
GetSiteInfo
GetVariables
GetVariableInfo
GetValues
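The hierarchy of the Point Observations Information Model maps naturally onto nested record types. The sketch below is a simplified illustration using Python dataclasses: the class layout, the `units` field, and the choice to nest values under a variable at each site are assumptions made for brevity, not the actual ODM schema or the WaterOneFlow response format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Value:
    value: float
    time: datetime
    qualifier: Optional[str] = None   # extra information about the value
    offset: Optional[float] = None    # e.g. depth below the water surface


@dataclass
class Variable:
    name: str
    units: str                        # assumed field, not on the slide
    values: List[Value] = field(default_factory=list)


@dataclass
class Site:
    name: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Network:
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class DataSource:
    name: str
    networks: List[Network] = field(default_factory=list)


# The example from the slide: 206 cfs at the Neuse River on 13 August 2006.
discharge = Variable("Discharge", "cfs",
                     [Value(206.0, datetime(2006, 8, 13))])
site = Site("Neuse River near Clayton, NC", [discharge])
usgs = DataSource("USGS", [Network("Streamflow gages", [site])])

first_value = usgs.networks[0].sites[0].variables[0].values[0]
```

Reading from the outside in follows the same order as the web service calls listed above: sites first, then variables, then values.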
Hydroseek Webservices
• Most Hydroseek functions are available as web services (SOAP)
• Support for queries using Global Change Master Directory (GCMD) keywords
• Supports output in Geography Markup Language (GML) as well as WaterML
[Architecture diagram. Labels: Drexel Server; HydroSeek; Native Services; Microsoft Server; Virtual Earth Map; San Diego Supercomputer Center Server; WaterOneFlow services (USGS Daily, EPA STORET, USGS Realtime, TCEQ, CIMS)]
GetStations
Request (BoundingBox)
Response
GetStationsByHU
Request (HUC_Code)
Response
GetStationCatalogueFiltered
Request
Response
GetStationCatalogue
Architecture Outline: Inside the CUAHSI HOD Module
• Allows searching multiple heterogeneous data sources simultaneously regardless of semantic or structural differences between them
• Modular & extensible
The Database-Ontology Link
www.HydroTagger.org
HydroSeek ODM needed an upgrade, i.e. additional tables:
1) MappingsApproved_Table
2) FrequentUpDates_Table
How does the Tagging work?
Step 1
Users need to register on the web site before they can use the HydroTagger.
When registering, select the testbed site you are affiliated with. Each testbed site needs ONE administrator, who can then admit additional users for that specific testbed site.
Please send an email identifying the designated tagger site administrator so we can promote that person to the role.
How does the Tagging work?
WATERS Network Information System
Step 2
The “Sniffer” jumps into action and trawls through the testbed sites to find and identify new variable names (once a week, currently every Sunday night).
It does so by using the regular web services published through the WSDL (no “hacking”!!!).
It returns i) data update information and ii) the variable names used, and compares these to those used by HydroSeek.
How does the Tagging work?
Step 3
The Tagger now updates the HydroSeek catalogue (an amalgamation of all 10 testbed catalogues) with the newly found data entries.
If it finds a new variable name (introduced during the data loading process using the Data-Loader), it puts it into a table and offers it up to the HydroTagger GUI for semantic tagging.

Test-Bed   VarName      Site exists?  VarName?  Content          Action
CCBay      DOConcSuf    Y             Y         new data         update Cat (Time)
CCBay      DOConcBot    Y             N         new variable     place in TaggerBin => DO
CCBay      DOConcMid    N             Y         new data         update Cat (Site+Time)
SRBHOS     DO_Water     Y             Y         new data         update Cat (Time)
Minnehaha  TempSurf     Y             N         new variable     place in TaggerBin => Temp
Minnehaha  StreamDOCon  Y             N         new variable     place in TaggerBin => DO
SantaFe    WaterDOCon   Y             N         new variable     place in TaggerBin => DO
SantaFe    GoldConc     Y             N         new var/no conc  place in TaggerBin => ??
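The decision pattern in the table above can be written as a small function. This is a sketch of the logic as it reads from the table, not the Tagger's actual code; the function name is hypothetical, and the case of an unknown site combined with an unknown variable name does not appear in the table, so routing it to the TaggerBin is an assumption.

```python
def tagger_action(site_exists: bool, varname_known: bool) -> str:
    """Choose the catalogue action for one sniffed (site, variable) pair."""
    if not varname_known:
        # New variable names wait in the TaggerBin for semantic tagging.
        # (Assumed to apply as well when the site is also new.)
        return "place in TaggerBin"
    if site_exists:
        # Known site and known variable: only the time range changes.
        return "update Cat (Time)"
    # Known variable at a new site: add the site and the time range.
    return "update Cat (Site+Time)"


# Rows from the table: (test bed, variable name, site exists?, name known?)
rows = [
    ("CCBay", "DOConcSuf", True, True),
    ("CCBay", "DOConcBot", True, False),
    ("CCBay", "DOConcMid", False, True),
]
actions = [tagger_action(site, known) for _, _, site, known in rows]
```

The concept a binned variable is finally mapped to (DO, Temp, or none at all, as with GoldConc) is decided by a person in the HydroTagger GUI, so it is deliberately left out of the function.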
Thank you… Questions?