
A Semantically-Enabled Provenance-Aware Water Quality Portal

Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano

Deborah L. McGuinness
Tetherless World Senior Constellation Chair

Professor of Computer and Cognitive Science
Rensselaer Polytechnic Institute

Troy, NY, USA


Introduction

• Real-life motivating example:

– In 2009, in Bristol County, Rhode Island, children started getting sick with symptoms such as diarrhea. The cause was found to be polluted water.

– Public concerns: “When did the contamination begin?”, “How did this happen?”, “How can we keep it from happening again?”

– We need environmental informatics systems that can automatically integrate and analyze water quality data.


Challenges

1. Raw data come from multiple sources in different formats – difficult to integrate and query.

2. The semantics of the water quality data are not explicitly encoded in the data – machines can’t process the data automatically.

3. Large volumes of data, due to the large spatial region, long time span, and large number of pollutants and regulated limits – analysis can be time-consuming and complex.


TWC-SWQP

• Identifies point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA.

• Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems.

• Enables and empowers citizens and scientists to better explore water-related information.


System Architecture

[Figure: SWQP system architecture diagram; recoverable labels: “access”, “Virtuoso”]


SemantAQUA Workflow

[Figure: SemantAQUA workflow diagram; recoverable labels: Archive, CSV2RDF4LOD Enhance, derive, integrate, archive, Publish, CSV2RDF4LOD Direct, visualize]
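The workflow includes both a direct and an enhanced CSV2RDF4LOD conversion step. As a rough illustration of what a direct conversion produces, the following minimal Python sketch turns each CSV row into a subject and each non-empty cell into one N-Triples statement. The base URI (`http://example.org/source/dataset/`), the `thing_N` and `vocab/<column>` naming, and the sample row are all hypothetical; this is not csv2rdf4lod's actual code or URI scheme.

```python
import csv
import io

# Hypothetical base URI; csv2rdf4lod's actual URI scheme is not reproduced here.
BASE = "http://example.org/source/dataset/"


def direct_convert(csv_text, base=BASE):
    """Direct conversion: one subject per row, one triple per non-empty cell.

    Column headers become predicate URIs and cell values become plain string
    literals. Returns a list of N-Triples lines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}thing_{row_number}>"
        for column, value in row.items():
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            predicate = f"<{base}vocab/{column.strip().replace(' ', '_')}>"
            # Escape backslashes, quotes, and newlines as N-Triples requires.
            literal = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# Hypothetical sample row.
sample = "site,characteristic,value,unit\nsite-1,Arsenic,0.02,mg/L\n"
for line in direct_convert(sample):
    print(line)
```

An enhancement pass would then typically replace these generic predicates and string literals with ontology terms and typed values.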


Ontology

• Core TWC Water ontology

– Extends existing best-practice ontologies, e.g., SWEET and OWL-Time.

– Includes terms for relevant pollution concepts.

– Can be used to conclude that “any water source that has a measurement outside of its allowable range is a polluted water source.”

Portion of the TWC Water Ontology.
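The core ontology's rule can be paraphrased as: a water source is polluted if any of its measurements lies outside its allowable range. A minimal Python sketch of that rule follows. The class and field names and the example ranges are hypothetical. A real deployment would express the rule in OWL and let a reasoner derive the classification rather than hand-coding it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    characteristic: str
    value: float
    unit: str


@dataclass(frozen=True)
class AllowableRange:
    characteristic: str
    unit: str
    minimum: Optional[float] = None  # None means no lower bound
    maximum: Optional[float] = None  # None means no upper bound

    def violated_by(self, m: Measurement) -> bool:
        # Compare only like with like: same characteristic and same unit.
        if (m.characteristic, m.unit) != (self.characteristic, self.unit):
            return False
        too_low = self.minimum is not None and m.value < self.minimum
        too_high = self.maximum is not None and m.value > self.maximum
        return too_low or too_high


def is_polluted(measurements, ranges) -> bool:
    """Polluted if ANY measurement lies outside ANY applicable allowable range."""
    return any(r.violated_by(m) for m in measurements for r in ranges)


# Illustrative ranges only; not taken from any regulation.
ranges = [
    AllowableRange("pH", "pH units", minimum=6.5, maximum=8.5),
    AllowableRange("Nitrate", "mg/L", maximum=10.0),
]
print(is_polluted([Measurement("pH", 7.2, "pH units")], ranges))  # False
print(is_polluted([Measurement("pH", 5.9, "pH units")], ranges))  # True
```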


Ontology

• Regulation ontology

– Models the federal and state water quality regulations for drinking water sources.

– Can be used to define limits; for example, in California, 0.01 mg/L is the limit for arsenic.

– Combined with the core ontology, we can infer that “any water source that contains more than 0.01 mg/L of arsenic is a polluted water source.”

Portion of the California Regulation Ontology.
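The two-step inference on this slide can be sketched in Python as follows. First, the regulation ontology flags an excessive measurement. Then the core ontology classifies that measurement's water source as polluted. Only the California arsenic limit comes from the slide. The dictionary encoding, function names, and sample wells are hypothetical.

```python
# Hypothetical encoding of one regulation: characteristic -> (limit, unit).
# The arsenic value (0.01 mg/L) is the California limit quoted on the slide.
CA_LIMITS = {"Arsenic": (0.01, "mg/L")}


def excessive_measurements(measurements, limits):
    """Regulation step: measurements whose value exceeds the limit for their characteristic."""
    result = []
    for m in measurements:
        rule = limits.get(m["characteristic"])
        if rule is None:
            continue  # this regulation sets no limit for the characteristic
        limit, unit = rule
        if m["unit"] == unit and m["value"] > limit:
            result.append(m)
    return result


def polluted_sources(measurements, limits):
    """Core-ontology step: any source with an excessive measurement is polluted."""
    return sorted({m["source"] for m in excessive_measurements(measurements, limits)})


# Hypothetical samples.
samples = [
    {"source": "well-A", "characteristic": "Arsenic", "value": 0.012, "unit": "mg/L"},
    {"source": "well-B", "characteristic": "Arsenic", "value": 0.010, "unit": "mg/L"},
    {"source": "well-C", "characteristic": "Lead", "value": 0.5, "unit": "mg/L"},
]
print(polluted_sources(samples, CA_LIMITS))  # ['well-A']
```

The test is a strict “greater than”, so a sample sitting exactly at 0.01 mg/L (well-B) is not flagged. Well-C is not flagged either, because this encoding only contains an arsenic rule.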


Provenance

• Preserves provenance in the Proof Markup Language (PML).

• Data-source-level provenance:

– The captured provenance data are used to support provenance-based queries.

• Reasoning-level provenance:

– When a water source has been marked as polluted, the user can access supporting provenance data for the explanation, including the URLs of the source data, intermediate data, and converted data.
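SWQP records provenance in PML. The sketch below does not use PML. Instead, it models the derivation chain as plain Python objects to show how an explanation can walk from the converted data back through intermediate data to the source URL. The `Artifact` class, its `role` and `derived_from` fields, and the example.org URLs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Artifact:
    url: str
    role: str  # "source", "intermediate", or "converted"
    derived_from: Optional["Artifact"] = None


def explanation_trail(artifact: Artifact) -> List[Tuple[str, str]]:
    """Follow derived_from links back to the original data; newest first."""
    trail = []
    current = artifact
    while current is not None:
        trail.append((current.role, current.url))
        current = current.derived_from
    return trail


# Hypothetical URLs for illustration.
raw = Artifact("http://example.org/raw/epa-ri.csv", "source")
intermediate = Artifact("http://example.org/intermediate/epa-ri.ttl", "intermediate", raw)
converted = Artifact("http://example.org/converted/epa-ri.ttl", "converted", intermediate)
for role, url in explanation_trail(converted):
    print(role, url)
```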


Visualization

1. Presents analyzed results on Google Maps

2. Presents an explanation of water source pollution

3. Presents possible health effects of contaminants

4. Presents a “facet”-type filter to select the type of data

5. Presents a link to the authority, where users can report problems

[Screenshot: SWQP map interface, with callouts 1–5 marking the features listed above]

http://was.tw.rpi.edu/swqp/map.html
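One way to read the facet filter is as conjunctive filtering across facets, with a disjunction over the values selected within a single facet. The following Python sketch shows that pattern, plus the per-value counts a facet panel often displays. The item records and the facet names `type` and `status` are hypothetical. The two item types mirror the USGS water sites and EPA-regulated facilities the portal shows.

```python
from collections import Counter

# Hypothetical map items; the facet names ("type", "status") are illustrative.
items = [
    {"name": "site-1", "type": "water site", "status": "polluted"},
    {"name": "site-2", "type": "water site", "status": "clean"},
    {"name": "plant-1", "type": "facility", "status": "polluted"},
]


def facet_filter(items, selections):
    """Keep items matching every selected facet; within a facet, any selected value matches."""
    return [
        item for item in items
        if all(item[facet] in allowed for facet, allowed in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items)


selected = facet_filter(items, {"type": {"water site"}, "status": {"polluted"}})
print([i["name"] for i in selected])  # ['site-1']
print(facet_counts(items, "type"))
```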


Visualization

• Time-series visualization:

– Presents data as a time series so users can explore and analyze it.

[Figure: time-series chart annotated “Limit value: 15” and “Violation, measured value: 50”]

http://was.tw.rpi.edu/swqp/trend/epaTrend.html?state=RI&county=3&site=http%3A%2F%2Ftw2.tw.rpi.edu%2Fzhengj3%2Fowl%2Fepa.owl%23facility-110000312135
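The chart marks a violation where the measured value (50) exceeds the limit value (15). The check behind such a marker can be sketched in Python as shown below. Apart from that limit/value pair, the dates and other readings are hypothetical. Units are omitted because the slide does not state them.

```python
# Only the pair (limit 15, measured value 50) comes from the slide; the dates
# and the other readings are hypothetical, and units are omitted.
LIMIT = 15.0
series = [("2009-01", 12.0), ("2009-02", 50.0), ("2009-03", 9.0)]


def violations(series, limit):
    """Return (date, value, value / limit) for every point strictly above the limit."""
    return [(date, value, value / limit) for date, value in series if value > limit]


for date, value, ratio in violations(series, LIMIT):
    # Prints: 2009-02: measured 50.0 vs limit 15.0 (3.33x the limit)
    print(f"{date}: measured {value} vs limit {LIMIT} ({ratio:.2f}x the limit)")
```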


Demo


Data

• EPA data:

– Provides measurements of pollutants in the water discharged by facilities, as well as threshold values for up to five test types for each pollutant.

• USGS data:

– Provides measurements of substances contained in water samples collected at USGS data-collection stations.

• Regulation data:

– Provides lists of pollutants and their maximum contaminant levels.


Selected Follow-up options

[Screenshot: follow-up options, with labels “Limit” and “Violation”]


Results

• Semantic data integration provides an effective, low-cost approach to integrating data from various sources.

• SWQP integrates data from various sources, including EPA, USGS, and state governments.

• Linking to external data: “twcwater:Arsenic” is linked to “dbpedia:Arsenic” using owl:sameAs.

• We generated 89.58 million triples for the USGS datasets and 105.99 million triples for the EPA datasets. This required only 2 person-days.
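owl:sameAs asserts that two names denote the same individual. As a result, statements made about `twcwater:Arsenic` also hold for `dbpedia:Arsenic`, and vice versa. A triple store or reasoner handles this natively. The Python sketch below only approximates the effect by grouping the statements of sameAs-linked names under one representative. `twcwater:Pollutant` and the sample triples are hypothetical.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def merge_same_as(triples):
    """Group the statements of resources linked by owl:sameAs under one representative.

    Uses union-find so that chains of sameAs links (a = b, b = c) collapse too.
    Only subjects are canonicalized; objects are left as-is for brevity.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Use the lexicographically smaller name as representative (deterministic).
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    merged = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)].add((p, o))
    return dict(merged)


triples = [
    ("twcwater:Arsenic", SAME_AS, "dbpedia:Arsenic"),
    ("twcwater:Arsenic", "rdf:type", "twcwater:Pollutant"),  # hypothetical class
    ("dbpedia:Arsenic", "rdfs:label", '"Arsenic"@en'),
]
print(merge_same_as(triples))
```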


Results

• Querying and reasoning supported by semantic technologies improve responsiveness and simplify the development of web applications.

• SPARQL queries narrow down the data, so we can reason over only the data relevant to one selected regulation.

• Reasoning reduces the complexity of the queries a developer needs to write for software applications.


Results

• Provenance information encoded using semantic web technologies supports transparency and trust.

• SWQP provides detailed provenance information:

– Original data, intermediate data, data source

• “What if” scenarios: users may trust data from certain authorities only.

– Users can apply a stricter regulation from another state to a local water source.
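The “what if” scenario combines two user choices. The first is which authorities to trust, checked against each measurement's provenance. The second is which regulation's limits to apply. A minimal Python sketch follows. The records, the authority names other than EPA and USGS, and both limit tables are hypothetical, and all values are assumed to share one unit.

```python
# Hypothetical records: each measurement carries the authority named in its provenance.
measurements = [
    {"source": "well-A", "characteristic": "Arsenic", "value": 0.008, "authority": "USGS"},
    {"source": "well-B", "characteristic": "Arsenic", "value": 0.020, "authority": "unverified-lab"},
]

# Hypothetical limits (one shared unit assumed): a local rule and a stricter one.
LOCAL_LIMITS = {"Arsenic": 0.010}
STRICTER_LIMITS = {"Arsenic": 0.005}


def polluted_sources(measurements, limits, trusted_authorities):
    """Evaluate only measurements whose provenance names a trusted authority."""
    return sorted({
        m["source"]
        for m in measurements
        if m["authority"] in trusted_authorities
        and m["characteristic"] in limits
        and m["value"] > limits[m["characteristic"]]
    })


trusted = {"EPA", "USGS"}
print(polluted_sources(measurements, LOCAL_LIMITS, trusted))     # []
print(polluted_sources(measurements, STRICTER_LIMITS, trusted))  # ['well-A']
```

Switching the limit table is the “stricter regulation from another state” case. Changing the trusted set shows how the same data can yield different conclusions depending on which sources the user accepts.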


Discussion

• Future work

– Expand SWQP to support all 50 states.

– Add flood and weather information and their effects on water sources.

– Model the health effects of exposure to excessive pollutant levels in water, and support reasoning over these effects.

– Expand SWQP to other environmental topics: soil quality, air quality.

– Get the community involved: users can comment on each water source or report problems to the authorities.


Conclusion

• SWQP is a web portal that allows citizens and professionals to easily explore water quality information.

• SWQP illustrates the benefits of applying semantic web technologies to water quality research:

– Data integration, provenance, automatic reasoning.

• The SWQP architecture can easily be applied to other environmental topics:

– Air quality, soil quality, etc.


Questions?

http://tw.rpi.edu/web/project/SemantAQUA

http://inference-web.org/wiki/Semantic_Water_Quality_Portal


BACKUP SLIDES


Related work

• Other work focuses on facilitating water quality management [13, 14] and wastewater treatment [15] via knowledge sharing and reuse.

– [13] presents a system that integrates water quality data from multiple sources and retrieves data using semantic relationships among the data.

– [14] presents an ontology-based knowledge management system (KMS) that can be integrated into numerical flow and water quality modeling to provide assistance in selecting a model and its pertinent parameters.

– [15] is an environmental decision-support system for wastewater management that augments classic rule-based and case-based reasoning with a domain ontology.

• SWQP:

– SWQP differs from these projects in that it supports provenance-based queries.

– SWQP is built on standard semantic technologies (e.g., OWL, SPARQL, Pellet, Virtuoso) and thus can easily be replicated or expanded.


Queries for result 2

(1) Without reasoning: compare each measurement with the regulation limit explicitly

SELECT * WHERE {
  ?watersource twcwater:hasMeasurement ?measurement .
  ?measurement twcwater:hasValue ?value ;
               twcwater:hasCharacteristic ?characteristic ;
               twcwater:hasUnit ?unit .
  ?regulation twcwater:hasValue ?limit ;
              twcwater:hasCharacteristic ?characteristic ;
              twcwater:hasUnit ?unit .
  ?watersource geo:lat ?lat ;
               geo:long ?long .
  FILTER( ?value > ?limit )
}

(2) With reasoning: rely on the inferred classification of polluted water sources

SELECT * WHERE {
  ?watersource rdf:type twcwater:pollutedWaterSource ;
               geo:lat ?lat ;
               geo:long ?long .
}


New Ontology

• New regulation ontology

– Reuses sweet:Measurement instead of using owl:sameAs.

– Defines cardinality restrictions.

– Defines datatype restrictions.

Portion of the EPA regulation ontology
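As an illustration of what cardinality and datatype restrictions express, assume (hypothetically) that the regulation ontology requires each measurement to have exactly one `hasValue`, typed as `xsd:decimal`. The Python sketch below checks records against those two constraints under a closed-world reading. OWL itself uses open-world semantics, where a missing value does not make the data inconsistent. This check is therefore closer in spirit to a validation language such as SHACL than to OWL reasoning.

```python
import re

# Lexical form of xsd:decimal (XML Schema 1.1): optional sign, digits, optional fraction.
XSD_DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_measurement(properties):
    """Closed-world check of two illustrative restrictions on a measurement:
    exactly one hasValue (cardinality) whose lexical form is an xsd:decimal (datatype).
    Returns a list of problem descriptions; an empty list means the record passes.
    """
    values = properties.get("hasValue", [])
    problems = []
    if len(values) != 1:
        problems.append(f"expected exactly 1 hasValue, found {len(values)}")
    for v in values:
        if not XSD_DECIMAL.fullmatch(v):
            problems.append(f"hasValue {v!r} is not a valid xsd:decimal")
    return problems


print(check_measurement({"hasValue": ["0.012"]}))  # []
print(check_measurement({"hasValue": ["0.012", "0.5"]}))
print(check_measurement({"hasValue": ["n/a"]}))
```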


New Ontology

• TWC Environment Monitoring Ontology

– Can be extended to use different regulations.

– Uses the SWEET ontology.

– More general ontology: aims to support monitoring not just water but anything related to the environment, e.g., air quality.