Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications
![Page 1: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/1.jpg)
Semantics-empowered Big Data Processing for PCS Applications
Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
![Page 2: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/2.jpg)
Prasad 2
Outline
• 5 V’s of Big Data Research
• Semantic Perception for Scalability
• Lightweight semantics to manage heterogeneity
  – Cost-benefit trade-off and continuum
• Hybrid Knowledge Representation and Reasoning
  – Anomaly, Correlation, Causation
11/15/2013
![Page 3: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/3.jpg)
5 V’s of Big Data Research
Volume
Velocity
Variety
Veracity
Value
Big Data => Smart Data
![Page 4: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/4.jpg)
Volume: Assorted Examples
• 25+ billion sensors have been deployed.
• About 250 TB of sensor data are generated for a NY-LA flight on a Boeing 737.
• A Parkinson’s disease dataset that tracked 16 patients over 8 weeks, using 7 mobile-phone sensors, is 12 GB.
Check engine light analogy
![Page 5: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/5.jpg)
Volume: Semantic Perception
• Abstracting machine-sensed data
  – E.g., fine-grained to coarse-grained
  – E.g., average, peak, rate of change
• Extracting human-comprehensible features/entities
• Machine perception
  – Derive conclusions using domain models and hybrid abductive/deductive reasoning
Goal: Human-accessible situational awareness and actionable intelligence for decision making
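The abstraction step named above (average, peak, rate of change) can be illustrated with a minimal sketch. The function name `summarize`, the `(timestamp_seconds, value)` input shape, and the heart-rate sample are assumptions made for illustration; they are not taken from the slides.

```python
from statistics import mean

def summarize(readings):
    """Abstract (timestamp_seconds, value) pairs into coarse-grained features."""
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    values = [v for _, v in readings]
    (t0, v0), (t1, v1) = readings[0], readings[-1]
    return {
        "average": mean(values),
        "peak": max(values),
        # End-to-end rate of change, in value units per second
        "rate_of_change": (v1 - v0) / (t1 - t0),
    }

# Hypothetical fine-grained readings: one heart-rate sample per minute
heart_rate = [(0, 72.0), (60, 75.0), (120, 90.0), (180, 84.0)]
summary = summarize(heart_rate)
```

A downstream reasoner would then work with the three summary values rather than with every raw sample.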
![Page 6: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/6.jpg)
Weather Use Case
• Machine-sensed phenomena
  – temperature, precipitation, humidity, wind speed, etc.
• Human-perceived features
  – blizzard, flurry, rain storm, clear, etc.
  – categories of hurricanes (SSHWS)
• Machine perception
  – Using domain models from NOAA
• Ultimately, generate weather alerts …
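As one concrete instance of turning a machine-sensed value into a human-perceived feature, the sketch below maps sustained wind speed to a Saffir-Simpson Hurricane Wind Scale (SSHWS) category. The thresholds are the published SSHWS bounds in mph; the function and table names are illustrative assumptions, and this is not the NOAA domain model referred to in the slide.

```python
# SSHWS: minimum sustained wind speed (mph) for each category, highest first
SSHWS_MIN_MPH = [(157, 5), (130, 4), (111, 3), (96, 2), (74, 1)]

def hurricane_category(sustained_wind_mph):
    """Return the SSHWS category (1-5), or None below hurricane strength."""
    for threshold, category in SSHWS_MIN_MPH:
        if sustained_wind_mph >= threshold:
            return category
    return None
```

For example, `hurricane_category(120)` falls in the 111-129 mph band and so yields category 3.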
![Page 7: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/7.jpg)
Parkinson’s Disease Use Case
• Data from machine sensors
  – accelerometer, GPS, compass, microphone, etc.
• Human-perceived features
  – tremors, walking style, balance, slurred speech, etc.
• Machine perception
  – Using domain models (to be created) to diagnose and monitor disease progression
• Ultimately, recommend options to control chronic conditions …
![Page 8: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/8.jpg)
Heart Failure Use Case
• Machine-sensed data
  – weight, heart rate, blood pressure, oxygen level, etc.
• Human-perceived features
  – Risk level for hospital readmission of CHF/ADHR patient
• Machine perception
  – Using domain models (to be created) to monitor the heart condition of a cardiac patient after hospital discharge
• Ultimately, recommend treatments to reduce preventable hospital readmissions …
![Page 9: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/9.jpg)
Asthma Use Case
• Data from machine sensors
  – Environmental sensors, physiological sensors, etc.
• Human-perceived features
  – Asthma severity gleaned from frequency of asthma attacks, wheezing, coughing, sleeplessness, etc.
• Machine perception
  – Using domain models (to be created) to monitor asthma patients and their surroundings
• Ultimately, recommend prevention and control options …
![Page 10: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/10.jpg)
Traffic Use Case
• Data from machine sensors, social media streams, and planned event schedules
  – Traffic flow sensors (link speed, link volume), event-specific tweets, etc.
• Human-perceived features
  – traffic delays and congestion, etc.
• Machine perception
  – Using domain models (to be created) to understand traffic patterns in response to events
• Ultimately, recommend traffic management options …
![Page 11: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/11.jpg)
Heterogeneity in a Physical-Cyber-Social System
[Figure: heterogeneous traffic data from 511.org. Labels: Traffic Monitoring, Link Description, Slow moving traffic, Schedule Information, Scheduled Event]
![Page 12: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/12.jpg)
Volume with a Twist
Resource-constrained reasoning on mobile devices
Goal: Boolean encodings to ensure feasibility, efficiency, and economy
![Page 13: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/13.jpg)
Perception Cycle* that exploits background knowledge / domain models
[Figure: cycle between “Observe Property” and “Perceive Feature”, informed by Prior Knowledge]
1. Explanation: abstracting raw data for human comprehension
2. Discrimination: focus generation for disambiguation and action (incl. human in the loop)
* based on Neisser’s cognitive model of perception
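The two steps of the cycle can be sketched with plain sets. This is a simplified reading, not the authors' formalization: it assumes a single feature must account for all observed properties, and that a property is discriminating when some, but not all, remaining candidate features expect it. The `MODEL` contents are toy data.

```python
# Hypothetical domain model: feature -> properties it is expected to exhibit
MODEL = {
    "blizzard": {"freezing_temperature", "snowfall", "high_wind"},
    "flurry": {"freezing_temperature", "snowfall"},
    "rain_storm": {"rainfall", "high_wind"},
}

def explain(observed):
    """Explanation: features whose expected properties cover every observation."""
    return {f for f, props in MODEL.items() if observed <= props}

def discriminate(observed, candidates):
    """Discrimination: unobserved properties that would narrow the candidates."""
    expected = set().union(*(MODEL[f] for f in candidates)) - observed
    return {p for p in expected if any(p not in MODEL[f] for f in candidates)}

candidates = explain({"snowfall"})
next_to_observe = discriminate({"snowfall"}, candidates)
```

Observing snowfall leaves blizzard and flurry as candidates; wind is the property worth checking next because only one of the two expects it, whereas freezing temperature is expected by both and would not separate them.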
![Page 14: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/14.jpg)
Virtues of Our Approach to Semantic Perception
Blends simplicity, effectiveness, and scalability.
• Declarative specification of explanation and discrimination;
• With applications (e.g., to healthcare) that are of contemporary relevance and interdisciplinary;
• Using encodings/algorithms that are significant (asymptotic order of magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and
• Prototyped using extant PCs and mobile devices.
![Page 15: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/15.jpg)
Efficiency Improvement
Evaluation on a mobile device
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
[Figure: complexity growth reduced from O(n^3) < x < O(n^4) to O(n)]
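To give a feel for why Boolean encodings help on constrained devices, the sketch below encodes each feature's expected properties as a bit mask, so that the coverage test for explanation becomes a single bitwise AND per feature. The encoding, names, and toy data are assumptions for illustration; the slides do not specify the actual encoding used.

```python
PROPERTIES = ["freezing_temperature", "snowfall", "high_wind", "rainfall"]
BIT = {name: 1 << i for i, name in enumerate(PROPERTIES)}

def encode(props):
    """Pack a collection of property names into one integer bit mask."""
    mask = 0
    for p in props:
        mask |= BIT[p]
    return mask

# Hypothetical domain model, one mask per feature
FEATURES = {
    "blizzard": encode(["freezing_temperature", "snowfall", "high_wind"]),
    "flurry": encode(["freezing_temperature", "snowfall"]),
    "rain_storm": encode(["rainfall", "high_wind"]),
}

def explain(observed_mask):
    """A feature explains the observations if every observed bit is set in its mask."""
    return [f for f, mask in FEATURES.items() if (observed_mask & mask) == observed_mask]
```

With this layout, explanation is one pass over the features, which matches the spirit of the linear growth reported above without claiming to reproduce it.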
![Page 16: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/16.jpg)
Volume and Velocity
• Lightweight semantics-based adaptive/continuous filtering
  – E.g., track the evolution of crowd-sourced and verified Wikipedia event pages to rank the relevance of Twitter hashtags in a disaster-response use case
• Building domain models dynamically
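A minimal sketch of the filtering idea: score each hashtag by how often its term appears in the current text of an event page, and re-rank whenever the page changes. The scoring rule, the function names, and the sample page text are all assumptions; the slides do not describe the actual ranking method.

```python
import re
from collections import Counter

def terms(text):
    """Lowercase alphanumeric tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_hashtags(event_page_text, hashtags):
    """Order hashtags by how often their term occurs in the event page."""
    counts = Counter(terms(event_page_text))
    scored = [(counts[tag.lstrip("#").lower()], tag) for tag in hashtags]
    # Descending score; the sort is stable, so ties keep input order
    return [tag for _, tag in sorted(scored, key=lambda s: -s[0])]

# Two hypothetical revisions of the same event page
page_v1 = "Flooding in the city after the storm."
page_v2 = page_v1 + " Evacuation ordered; evacuation shelters open."
tags = ["#evacuation", "#storm", "#concert"]
ranking_v1 = rank_hashtags(page_v1, tags)
ranking_v2 = rank_hashtags(page_v2, tags)
```

As the page evolves from the first revision to the second, `#evacuation` overtakes `#storm`, which is the continuous re-ranking behavior the bullet describes.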
![Page 17: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/17.jpg)
Dynamic Model Creation
Continuous Semantics
[Figure example: “Heliopolis is a suburb of Cairo.”]
![Page 18: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/18.jpg)
Variety
Syntactic and semantic heterogeneity
• in textual and sensor data,
• in (legacy) materials data
• in (long tail) geosciences data
Idea: Semantics-empowered integration
![Page 19: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/19.jpg)
Variety (What?): Materials/Geosciences Use Case
• Structured Data (e.g., relational)
• Semi-structured, Heterogeneous Documents (e.g., Publications and technical specs, which usually include text, numerics, maps and images)
• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
![Page 20: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/20.jpg)
Variety (How?/Why?): Granularity of Semantics & Applications
• Lightweight semantics: File and document-level annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
Cost-benefit trade-off and continuum
![Page 21: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/21.jpg)
Challenges Associated with Typical Spreadsheet/Table
• Meant for human consumption
• Irregular
  – Not a simple rectangular grid
• Heterogeneous
  – Not all rows are interpreted similarly
• Complex
  – Meaning of each row and each column is context dependent
• Footnotes modify meaning of entries (esp. in materials and process specifications)
![Page 22: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/22.jpg)
![Page 23: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/23.jpg)
Practical Semi-Automatic Content Extraction
• DESIGN: Develop regular data structures that can be used to formalize tabular information.
  – Provide a natural expression of data
  – Provide semantics to data, thereby removing potential ambiguities
  – Enable automatic translation
• USE: Manual population of regular tables and automatic translation into LOD
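The "automatic translation into LOD" step can be sketched as a function that turns one row of a regular table into N-Triples lines. The `http://example.org/` namespace, the column names, and the sample row are hypothetical; a real translation would map columns to terms from an agreed vocabulary.

```python
BASE = "http://example.org/materials/"  # hypothetical namespace
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

def row_to_ntriples(row, key="id"):
    """Translate one regular-table row (a dict) into N-Triples lines."""
    subject = f"<{BASE}{row[key]}>"
    lines = []
    for column, value in row.items():
        if column == key:
            continue
        predicate = f"<{BASE}prop/{column}>"
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            obj = f'"{value}"^^<{XSD_DOUBLE}>'  # typed numeric literal
        else:
            # Plain string literal with backslashes and quotes escaped
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines

row = {"id": "alloy-42", "name": "Example alloy", "density_g_cm3": 4.43}
triples = row_to_ntriples(row)
```

Because the table is regular, every row translates the same way, which is what makes the manual-population, automatic-translation split practical.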
![Page 24: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/24.jpg)
Variety (What?): Sensor Data Use Case
Develop/learn domain models to exploit complementary and corroborative information
• To relate patterns in multimodal data to “situation”
• To integrate machine-sensed and human-sensed data
![Page 25: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/25.jpg)
Variety: Hybrid KRR
Blending data-driven models with declarative knowledge
  – Data-driven: Bottom-up, correlation-based, statistical
  – Declarative: Top-down, causal/taxonomical, logical
  – Refine structure to better estimate parameters
E.g., Traffic Analytics using PGMs + KBs
![Page 26: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/26.jpg)
Variety (Why?): Hybrid KRR
Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.
-- David Brooks of The New York Times
However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.
![Page 27: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/27.jpg)
Correlations vs Causation vs Anomalies
• Correlations due to common cause or origin
• Coincidental due to data skew or misrepresentation
• Coincidental new discovery
• Strong correlation vs causation
• Anomalous and accidental
• Correlation turning into causation
![Page 28: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/28.jpg)
Correlations vs Causation vs Anomalies
• Correlations due to common cause or origin
  – E.g., Planets: Copernicus > Kepler > Newton > Einstein
• Coincidental due to data skew or misrepresentation
  – E.g., Tall policy claims made by politicians!
• Coincidental new discovery
  – E.g., Hurricanes and Strawberry Pop-Tarts sales
• Strong correlation vs causation
  – E.g., Spicy foods vs Helicobacter pylori: stomach ulcers
• Anomalous and accidental
  – E.g., CO2 levels and obesity
• Correlation turning into causation
  – E.g., Pavlovian learning: conditional reflex
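The first case, correlation due to a common cause, can be demonstrated on synthetic data: two variables that never influence each other still correlate because both depend on a third. The sketch below is illustrative only; the variables, noise levels, and seed are arbitrary assumptions, and removing a linear effect by least squares is just one simple way to control for the common cause.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """Remove the least-squares linear effect of zs from ys."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]   # hidden common cause
x = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only
y = [zi + rng.gauss(0, 1) for zi in z]       # depends on z only

raw = pearson(x, y)                                   # clearly positive
partial = pearson(residuals(x, z), residuals(y, z))   # near zero
```

Once the common cause is accounted for, the apparent link between `x` and `y` largely disappears, which is the kind of justification the preceding slide asks for before trusting an inferred correlation.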
![Page 29: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/29.jpg)
Paradoxes: The Seeds of Progress
Correlations vs Causation vs Anomalies
![Page 30: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/30.jpg)
Veracity
A lot of existing work on trust ontologies, metrics and models, and on provenance tracking
• Homogeneous data: Statistical techniques
• Heterogeneous data: Semantic models
Open Problem: Develop semantics of trust using expressive frameworks that are both declarative and computational
• To make explicit all aspects that go into trust formation, to inspire confidence in inferences
![Page 31: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/31.jpg)
Veracity
Machine sensing: objective, quantitative, but prone to environmental effects, battery life, …
Human sensing: subjective, qualitative, but prone to bias, perceptual errors, rumors, …
Open problem: Improving trustworthiness by combining machine sensing and human sensing
  – E.g., 2002 Überlingen mid-air collision: pilot incorrectly following the traffic controller’s advice over the electronic TCAS system’s recommendation
![Page 32: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/32.jpg)
(More on) Value
Learning domain models from “big data” for prediction
E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification
Idea: Exploit “emotion-hashtagged” tweets as training dataset
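A minimal sketch of how emotion-hashtagged tweets could be turned into labeled training examples: the hashtag supplies the label and is stripped from the text. The label set, the restriction to a trailing hashtag, and the function name are assumptions for illustration, not details from the slides.

```python
import re

EMOTIONS = {"joy", "sadness", "anger", "fear"}  # hypothetical label set

def to_training_example(tweet):
    """Return (text_without_label, emotion) if the tweet ends with an emotion hashtag."""
    match = re.search(r"\s*#(\w+)\s*$", tweet)
    if not match or match.group(1).lower() not in EMOTIONS:
        return None
    return tweet[:match.start()], match.group(1).lower()

example = to_training_example("Finally got the job! #joy")
```

Tweets without a recognized trailing emotion hashtag return `None` and would simply be left out of the training set.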
![Page 33: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/33.jpg)
(More on) Value
Discovering gaps and enriching domain models using data
E.g., Data-driven knowledge acquisition method for domain knowledge enrichment in healthcare
Idea: Use associations between diseases, symptoms and medications in EMR documents
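One simple way to surface gaps, sketched under stated assumptions: count how often a disease and a medication appear in the same record, and report frequent pairs that the existing knowledge base does not yet link. The record layout, the `KNOWN` set, the support threshold, and the toy records are hypothetical; the slides do not specify the association measure used.

```python
from collections import Counter
from itertools import product

KNOWN = {("hypertension", "lisinopril")}  # hypothetical existing KB links

def candidate_links(records, min_support=2):
    """Return frequent disease-medication pairs that are missing from KNOWN."""
    counts = Counter()
    for rec in records:
        counts.update(product(set(rec["diseases"]), set(rec["medications"])))
    return sorted(pair for pair, n in counts.items()
                  if n >= min_support and pair not in KNOWN)

# Toy records standing in for annotated EMR documents
records = [
    {"diseases": ["hypertension", "diabetes"], "medications": ["lisinopril", "metformin"]},
    {"diseases": ["diabetes"], "medications": ["metformin"]},
    {"diseases": ["hypertension"], "medications": ["lisinopril"]},
]
gaps = candidate_links(records)
```

Candidates found this way would still need expert review before being added to the domain model, since co-occurrence alone does not establish a treatment relationship.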
![Page 34: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/34.jpg)
Conclusions
• Glimpse of our research organized around the 5 V’s of Big Data
• Discussed role in harnessing Value
  – Semantic Perception (Volume)
  – Continuum of Semantic models to manage Heterogeneity (Variety)
  – Hybrid KRR: Probabilistic + Logical (Variety)
  – Continuous Semantics (Velocity)
  – Trust Models (Veracity)
![Page 35: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications](https://reader030.fdocuments.us/reader030/viewer/2022013003/554b983ab4c9052d448b4b41/html5/thumbnails/35.jpg)
Thank you, and please visit us at
http://knoesis.org/
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
Special Thanks to: Pramod Anantharam and Cory Henson