Role of Big Data for Smart City Applications
1
Role of Big Data for Smart City Applications
Pramod Anantharam, Amit Sheth, Kno.e.sis, Wright State University
Thanks to Payam Barnaghi
2
Amit Sheth’s PhD students
Ashutosh Jadhav
Hemant Purohit
Vinh Nguyen
Lu Chen
Pramod Anantharam
Sujan Perera
Alan Smith
Maryam Panahiazar
Sarasi Lalithsena
Cory Henson
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Kno.e.sis in 2013 = ~100 researchers (15 faculty, ~50 PhD students)
Pavan Kapanipathi
Shreyansh Bhatt
Acknowledgements: Kno.e.sis team, Funds - NSF, NIH, AFRL, Industry…
3
• Among top universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search: among top 10 in June 2014)
• Among the largest academic groups in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups)
• 100 researchers including 15 world-class faculty (>3K citations/faculty avg) and ~45 PhD students, practically all funded
• Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)
4
Top organization in WWW: 10-yr Field Rating (MAS)
5
Let’s talk Big Data @ Kno.e.sis
• Social Media Big Data – Twitris, eDrugTrends
• Sensor/IoT Big Data – CityPulse, kHealth
• Healthcare Big Data – kHealth, EMR, Prediction
• Biomedical Big Data – Biomarker from NextGen Sequencing and Proteomics, SCOONER
• Big and Smart Data Certificate
Kno.e.sis private cloud: 864 CPU cores, 18TB RAM, 17TB SSD, 435TB disk
6
Smart Cities and Back to the future
Thanks to Dr. Payam Barnaghi for sharing the slide
7
Source: LA Times, http://documents.latimes.com/la-2013/
Future cities: a view from 1998
Thanks to Dr. Payam Barnaghi for sharing the slide
8
Image courtesy: Avatar wiki
Thanks to Dr. Payam Barnaghi for sharing the slide
9
Thanks to Dr. Payam Barnaghi for sharing the slide
10
Why: Improved Economic/Social/Human development in an era of increased Urbanization
What? All aspects of economy: Agricultural + Manufacturing + Service + Knowledge
How?• Next
Imperatives
11
Enablers of Economic Developments
Image credit: http://www.rcet.org/twd/students/socialstudies/ss_extensions_1intro.html
Image credit: http://www.shutterstock.com/pic-157118819/stock-vector-conceptual-tag-cloud-containing-words-related-to-smart-city-digital-city-infrastructure-ict.html
Economic development on trade routes
Civilizations on river banks
Economic development now increasingly relies on digital infrastructure
12
Over 340 million people lived in Indian cities in 2008, and that number is expected to grow to 590 million by 2030, leading to rapid urbanization1
We are increasingly moving from Agriculture to Industry to Services. The next growth should be toward the Knowledge Economy
1 http://www.mckinsey.com/insights/urbanization/urban_awakening_in_india
General Economic Trends
13
One aspect of characterizing a City: All its functions
Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html
14
15
Five Key Elements of Smart City*
Utility Services
Transportation Services
Social Infrastructure
Safety & Health Services
Recycling Services
* By Indian Urban Development Ministry
16
http://www.tribalcafe.co.uk/big-data-infographic/
Unprecedented Digital Data Growth
• Everything is becoming data driven
• Many types of data: Physical, Cyber, and Social
• Effective collection and use of this Big Data has to be a core part of designing Smart Cities
17
• Increased citizen participation (Social)
• Increased monitoring using sensors (Physical)
• Increased Digital Government (eGov) data (Cyber)
Understanding wealth of data
Let’s not develop future applications with constraints of the past
http://www.informationweek.com/government/leadership/digital-civic-engagement-us-lags/d/d-id/1113938
India ranks 8th in civic engagement!
18
What do we need for developing Smart City Applications?
http://wiki.knoesis.org/index.php/PCS
Amit Sheth, Pramod Anantharam, Cory Henson, 'Physical-Cyber-Social Computing: An Early 21st Century Approach,' IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb., 2013. http://doi.ieeecomputersociety.org/10.1109/MIS.2013.20
Physical
Cyber
Social*
Developers need to consider observations from Physical-Cyber-Social systems when building Smart City applications
*http://www.ichangemycity.com/
19
Physical: Sensors monitoring physical world
- Programmable devices
- Off-the-shelf gadgets/tools
Thanks to Dr. Payam Barnaghi for sharing the slide
20
Cyber: Observations pushed to the cyber world
Thanks to Dr. Payam Barnaghi for sharing the slide
21
Motion sensor
ECG sensor
World Wide Web
Road block, A3
Social: People interacting with the physical world
Thanks to Dr. Payam Barnaghi for sharing the slide
22
• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context
Scope of this talk
23
• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context
24
Dynamic schedule update of Public Transport vehicles in a City Lacking Traffic Instrumentation*
Pramod Anantharam
Joint work with Biplav Srivastava and Raj Gupta, IBM IRL
Aug 31, 2012
*Work done as part of internship at IBM Research
25
Motivation
By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001)1
1 The Crisis of Public Transport in India
2 IBM Smarter Traffic
Modes of transportation in Indian Cities
Texas Transportation Institute (TTI) Congestion report in U.S.
26
Motivation: Why SMS for Events?
• Prevalence
– In India, 11 cities provide notifications to citizens using SMS
– SMS based alerts common for business transactions
– Low-cost phones constitute 95% of all phones (~930 million mobile connections in India2)
• Social media (Facebook, Twitter) and SMS
– Commuters prefer dynamic updates such as SMS versus any other form of traffic updates1.
1 Caulfield et al. Factors Which Influence the Preferences for real-time Public Transport Information, Association of European Transport and contributors 2007
2 http://en.wikipedia.org/wiki/Communications_in_India
27
Problem
• Input:
– Traffic related text alerts, domain knowledge, public transport routes, and historical data.
• Output:
– Events in desired form
– Impact of events on public transport routes (e.g. probability of delay given location + event)
• Challenges:
– No instrumentation (sensors), leading to sparse and imprecise information; event extraction from free text.
28
Solution Components
As events are reported to MDU (Multi-modal Dynamic Update):
• Traffic event detection from SMS alerts – event <Type, Time (Reported, Published), Location (From, To, On), Description>
• Reasoning over traffic events for delay assessment
– Find stops in the region affected by event (Qualitative)
– Estimate delay at stops (Quantitative)
  • Consider time of day and history of such events
  • Have an attenuation function based on event types
– Propagate delay estimates to neighboring stops
  • Account for time, schedule and direction of travel
29
Map labels: Signal Enclave, Vasant Vihar
Stops shown: vasant vihar model school; c.p.w.d.cly. vasant vihar; paschim marg vasant vihar; vasant vihar(t); vasant vihar depot.
“Traffic movement is slow from Sanjay point towards Vasant Vihar due to break down of an HTV in front of Signal [email protected],210612.”
Illustration from New Delhi (India)
eventtype = BreakDown
eventdescription = “Traffic movement is slow from Sanjay point towards Vasant Vihar due to break down of an HTV in front of Signal [email protected],210612.”
eventstartloc = Sanjay Point
eventendloc = Vasant vihar
eventonloc = Signal Enclave
eventtime = June 21, 2012, 10:15am
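The extraction illustrated above can be sketched roughly as follows. This is a minimal sketch, not the method used in the talk: it assumes alerts follow one sentence template ("from ... towards ... due to ... in front of ..."), the `EVENT_TYPES` keyword map is hypothetical, and the trailing timestamp token of the sample alert is left out of the example string.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword-to-type map; the talk's event taxonomy is not given.
EVENT_TYPES = {"break down": "BreakDown", "accident": "Accident"}

# Assumed template: "... from <start> towards <end> due to <cause> in front of <on>."
ALERT_PATTERN = re.compile(
    r"from (?P<start>.+?) towards (?P<end>.+?) due to (?P<cause>.+?)"
    r" in front of (?P<on>.+?)\.?$",
    re.IGNORECASE,
)


@dataclass
class TrafficEvent:
    event_type: str
    start_loc: str
    end_loc: str
    on_loc: str
    description: str


def extract_event(alert: str) -> Optional[TrafficEvent]:
    """Return a structured event, or None when the alert does not fit the template."""
    match = ALERT_PATTERN.search(alert.strip())
    if match is None:
        return None
    cause = match.group("cause").lower()
    # First keyword found in the cause phrase decides the event type.
    event_type = next((t for k, t in EVENT_TYPES.items() if k in cause), "Unknown")
    return TrafficEvent(
        event_type,
        match.group("start"),
        match.group("end"),
        match.group("on"),
        alert,
    )


alert = (
    "Traffic movement is slow from Sanjay point towards Vasant Vihar "
    "due to break down of an HTV in front of Signal Enclave."
)
event = extract_event(alert)
print(event)
```

A template like this only covers alerts phrased the same way; free-text alerts with other wording would return `None` and need additional patterns or a learned extractor.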
30
Evaluation: Event Extraction
• Run for ~50 messages in Delhi
• Accurate extraction of location from, to and type.
Sample
31
Bayesian Model: Impact of Events on Delay
The probability of having a delay at a stop, Si, given events observed at the stop, is given by
32
Impact (Delay) Propagation Across Stops
Vehicle moves from S1 towards S4
Actual steps are by the loopy belief propagation algorithm
Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press (2009)
Assumption: delay at a node is directly influenced by delay at the next node.
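To make the stated assumption concrete, here is a deliberately simplified sketch. It is not loopy belief propagation: it only pushes a delay probability from the affected stop back to the stops a vehicle visits earlier, shrinking it by a constant factor per hop. The function name, the `attenuation` value, and the stop labels are assumptions for illustration.

```python
def propagate_delay(stops, event_stop, p_delay, attenuation=0.5):
    """Spread a delay probability to the stops that precede the affected stop.

    `stops` is ordered in the direction of travel. Because delay at a stop is
    assumed to be influenced by delay at the next stop, the effect travels
    backwards along the route and is attenuated at every hop.
    """
    probabilities = {stop: 0.0 for stop in stops}
    current = p_delay
    for index in range(stops.index(event_stop), -1, -1):
        probabilities[stops[index]] = current
        current *= attenuation
    return probabilities


# Vehicle moves from S1 towards S4; an event is observed at S3.
route = ["S1", "S2", "S3", "S4"]
delays = propagate_delay(route, "S3", p_delay=0.8)
print(delays)
```

Stops after the event location keep probability zero here, which reflects only the direction-of-travel assumption; a full model would also account for time of day and schedule.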
33
Application: adaptive route recommendations in IRL Transit
34
1-Slide Summary: Multi-Mode Commuting Recommender in Delhi And Bangalore
Highlights
• Published data of multiple authorities used; repeatable process
• Multiple modes searched
• Preference over modes, time, hops and number of choices supported; more extensions, like fare, possible
• Integration of results with map as future work; already done as part of other projects, viz. SCRIBE-STAT
IRL – Transit on March 2012
First Version
35
IRL – Transit in Aug 2012
Key Points
• SMS message from city
• Event and location identified
• Impact assessed
• Impact used in search
36
Matching Stop Names to OSM Location
• 3931 multi-modal stops in Delhi
• Matching algorithm involves chunking of stop names, distance metrics, and voting to select the best match.
• Matches categorized as confident (>50% of techniques resulted in the match), possible (=50%), and uncertain (<50%).
• 1496 stops (confident matches) mapped to OSM locations.
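The voting idea can be sketched as below. The three similarity techniques (token Jaccard, character ratio via `difflib.SequenceMatcher`, token containment) are stand-ins chosen for illustration; the talk does not say which distance metrics were used, and the candidate OSM names are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def tokens(name):
    """Chunk a stop name into lowercase alphanumeric tokens."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).split()


def jaccard(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def char_ratio(a, b):
    return SequenceMatcher(None, " ".join(tokens(a)), " ".join(tokens(b))).ratio()


def containment(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa) if sa else 0.0


TECHNIQUES = (jaccard, char_ratio, containment)


def match_stop(stop_name, osm_names):
    """Each technique votes for its best candidate; label by the winner's vote share."""
    votes = Counter(
        max(osm_names, key=lambda candidate: technique(stop_name, candidate))
        for technique in TECHNIQUES
    )
    best, count = votes.most_common(1)[0]
    share = count / len(TECHNIQUES)
    if share > 0.5:
        label = "confident"
    elif share == 0.5:
        label = "possible"
    else:
        label = "uncertain"
    return best, label


result = match_stop(
    "vasant vihar depot.",
    ["Signal Enclave", "Vasant Vihar Depot", "Sanjay Point"],
)
print(result)
```

With an odd number of techniques the "possible" (=50%) label cannot occur; an even number of techniques would be needed to exercise that category.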
OSM names
37
Event Information
VasantVihar
StartLoc = Sanjay Point
EndLoc = VasantVihar
OnLoc = Signal Enclave
Lat-lon = 28.5561025,77.1645187
textual observation
Backgroundknowledge
GIS Location
Assess impact
Observations
parameterize
Flowchart steps:
• Fetch text message (text message with metadata)
• Extract locations StartLoc, EndLoc, and OnLoc
• Is location present? (Yes/No); extract GIS location using Open Street Maps
• Extract event information
• Is event present? (Yes/No); uses domain knowledge of categorization of events
Traffic movement is slow from Sanjay point towards VasantVihar due to break down of an HTV in front of Signal [email protected],210612.
STOPID STOPNAME
321 c.p.w.d. cly. vasantvihar
369 vasantvihar (t)
814 vasantvihar model school
956 vasantviharcpwdcly.
957 paschimmargvasantvihar
1274 vasantvihar depot
STOPID STOPNAME
957 paschimmargvasantvihar
814 vasantvihar model school
321 c.p.w.d. cly. vasantvihar
956 vasantviharcpwdcly.
1274 vasantvihar depot
369 vasantvihar (t)
IRL-Transit routes (ordered)
StartLoc = Sanjay Point
EndLoc = VasantVihar
OnLoc = Signal Enclave
Event = break down of an HTV

StartLoc = Sanjay Point
EndLoc = VasantVihar
OnLoc = Signal Enclave
Event = break down of an HTV
Event Type = BreakDown
IRL-Transit routes (unordered)
Signal Enclave
38
Evaluation: Reasoning over traffic events
• Traffic alerts collected for 10 cities in India for two years.
• Prior probability of events computed using these alerts.
• Probability of having a delay given an event type at ten locations in Delhi is summarized:
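As a rough illustration of how such a probability could be estimated from historical alerts, the sketch below counts how often each event type coincided with a delay. The record format and the sample history are invented for the example and are not data from the study.

```python
from collections import defaultdict


def delay_probability(records):
    """Estimate P(delay | event type) as a relative frequency.

    `records` is an iterable of (event_type, was_delayed) pairs.
    """
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for event_type, was_delayed in records:
        totals[event_type] += 1
        if was_delayed:
            delayed[event_type] += 1
    return {event_type: delayed[event_type] / totals[event_type] for event_type in totals}


# Hypothetical history for a single location.
history = [
    ("BreakDown", True),
    ("BreakDown", True),
    ("BreakDown", False),
    ("Procession", True),
]
probs = delay_probability(history)
print(probs)
```

Relative frequencies are unreliable for event types with few observations, so a real estimate over sparse alerts would likely need smoothing or a prior.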
39
Number of SMS messages for bus stops in Delhi for 2 years (Aug 2010 – Aug 2012)
• 344 stops with updates
• 3931 total stops
40
• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context
41
CityPulse Consortium
Partners:
• Industrial: SIE, ERIC
• SME: AI
• Higher Education: UNIS, NUIG, UASO, WSU
• City: BR, AA
Duration: 36 months
CityPulse
42
43
Analytics Toolbox
Context-aware Decision Support, Visualisation
Knowledge-based Stream Processing
Real-Time Monitoring & Testing
Accuracy & Trust Modelling
Semantic Integration
On Demand Data Federation
Open Reference Data Sets
Real-Time IoT Information Extraction
IoT Stream Processing
Federation of Heterogeneous Data Streams
Design-Time, Run-Time, Testing
Exposure APIs
In summary
44
Data
Domain Knowledge
Social systems
Interactions
Open Interfaces
Ambient Intelligence
Quality and Trust
Privacy and Security
Open Data
45
Use cases
46
Scenario ranking
47
101 Scenarios
48
101 Scenarios
• http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements
49
Public parking space availability prediction
http://www.ict-citypulse.eu/scenarios/scenarios
• Finding a parking space in a city can be challenging
• Predicting the probability of parking given various input variables such as scheduled events, time of day & location.
• Reduced emission and frustration for citizens
50
• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context
51
Extracting City Events from Social Streams
Toward a Citizen Centered Smart City
Pramod Anantharam1
1Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio, USA
http://www.ict-citypulse.eu/page/
Mentor/Supervisor: Dr. Payam Barnaghi
52
Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html
Public Safety
Urban planning
Gov. & agency admin.
Energy & water
Environmental
Transportation
Social Programs
Healthcare
Education
Pulse of a City (CityPulse)
53
• Are people talking about city infrastructure on Twitter?

• Can we extract city infrastructure related events from Twitter?
• How can we leverage event and location knowledge bases for event extraction?
• How well can we extract city events?
Research Questions
54
Are People Talking About City Infrastructure on Twitter?
55
Some Challenges in Extracting Events from Tweets
• No well accepted definition of ‘events related to a city’
• Tweets are short (140 characters) and informal, which makes them hard to analyze
– Entity, location, time, and type of the event
• Multiple reports of the same event and sparse reports of some events (biased sample)
– Numbers don’t necessarily indicate intensity
• Validation of the solution is hard due to the open domain nature of the problem
56
Formal Text Informal Text
Closed Domain
Open Domain [Roitman et al. 2012][Kumaran and Allan 2004]
[Lampos and Cristianini 2012]
[Becker et al. 2011]
[Wang et al. 2012]
[Ritter et al. 2012]
Related Work on Event Extraction
57
City Event Extraction Solution Architecture
Input: Tweets from a city
Stages: City Event Annotation, City Event Extraction
Components: POS Tagging; Hybrid NER + Event term extraction; Geohashing; Temporal Estimation; Impact Assessment; Event Aggregation
Knowledge sources: City Infrastructure; OSM Locations; SCRIBE ontology; 511.org hierarchy
58
• City Event Annotation
– Automated creation of training data
– Annotation task (our CRF model vs. baseline CRF model)
• City Event Extraction
– Use aggregation algorithm for event extraction
– Extracted events AND ground truth
• Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
– Over 8 million tweets
– Over 162 million sensor data points
– 311 active events and 170 scheduled events
Evaluation
59
Ground Truth Data (only incident reports) -- City Event Extraction
We have around 162 million data records from sensors monitoring over 3,700 links in the San Francisco Bay Area
A data record: <link_id, link_speed, link_volume, link_travel_time, time_stamp>
GREEN – Active Events
YELLOW – Scheduled Events
311 active events and 170 scheduled events
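A sensor record with the five fields named above might be represented as in the following sketch. The comma-separated layout, the field types, the ISO timestamp format, and the `is_slow` threshold rule are all assumptions; the slides give only the field names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LinkRecord:
    link_id: int
    link_speed: float
    link_volume: int
    link_travel_time: float
    time_stamp: datetime


def parse_record(line: str) -> LinkRecord:
    """Parse one comma-separated record (assumed layout, ISO 8601 timestamp)."""
    link_id, speed, volume, travel_time, stamp = (field.strip() for field in line.split(","))
    return LinkRecord(
        int(link_id),
        float(speed),
        int(volume),
        float(travel_time),
        datetime.fromisoformat(stamp),
    )


def is_slow(record: LinkRecord, free_flow_speed: float, ratio: float = 0.5) -> bool:
    """Flag a link as slow when its speed drops below a fraction of free-flow speed."""
    return record.link_speed < ratio * free_flow_speed


record = parse_record("101, 22.5, 340, 95.0, 2013-08-15T08:30:00")
print(record, is_slow(record, free_flow_speed=60.0))
```

A flag like `is_slow` is one simple way to turn raw speeds into the "slow moving traffic" observations that can then be compared against reported events.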
60
Evaluation – Extracted Events AND Ground Truth
61
Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases
Pramod Anantharam, T. K. Prasad, Amit Sheth
Ohio Center of Excellence in Knowledge-enabled Computing (kno.e.sis)
Wright State University, Dayton, Ohio
2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013)
62
[Figure: slow moving traffic on a road link, annotated with the link description, scheduled events, and schedule information from 511.org]
63
Uncertainty in the Real-world
• Observation: Slow Moving Traffic
• Multiple Causes (Uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.
– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic
64
Why Probabilistic Graphical Models?
“As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.” -- Albert Einstein, 1921.
“Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity …”
-- Michael Jordan, UC Berkeley, 1998.
65
Graphical Models – Bayesian Network Example
Random variables (nodes): Cold, IcyRoad, PoorVisibility, SlowTraffic
An edge between random variables indicates direct dependence; missing edges encode conditional independence

Conditional Probability Tables (CPT)

P(Cold)
T 0.33
F 0.67

P(IcyRoad | Cold)
            Cold=T  Cold=F
IcyRoad=T   0.75    0.05
IcyRoad=F   0.25    0.95

P(PoorVisibility | Cold)
                   Cold=T  Cold=F
PoorVisibility=T   0.85    0.40
PoorVisibility=F   0.15    0.60

P(SlowTraffic | IcyRoad, PoorVisibility), columns as labeled on the slide
                IcyRoad        PoorVisibility
                T      F       T      F
SlowTraffic=T   0.85   0.4     0.9    0.2
SlowTraffic=F   0.15   0.6     0.1    0.8

A graphical model has structure (nodes and edges) and parameters; CPD – continuous variables, CPT – discrete variables
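A short worked example shows how these tables are used. The sketch below takes only the Cold, IcyRoad, and PoorVisibility numbers from the slide and computes two marginals by summing over Cold, plus one posterior with Bayes' rule. The variable and function names are made up for the example, and the SlowTraffic table is left out because its column layout is ambiguous in the transcript.

```python
# CPT values from the example network; keys are the truth value of Cold.
P_COLD = {True: 0.33, False: 0.67}
P_ICY_GIVEN_COLD = {True: 0.75, False: 0.05}      # P(IcyRoad=T | Cold)
P_POORVIS_GIVEN_COLD = {True: 0.85, False: 0.40}  # P(PoorVisibility=T | Cold)


def marginal(p_child_given_cold):
    """P(child=T), obtained by summing out Cold."""
    return sum(p_child_given_cold[cold] * P_COLD[cold] for cold in (True, False))


def posterior_cold(p_child_given_cold):
    """P(Cold=T | child=T) by Bayes' rule."""
    joint = p_child_given_cold[True] * P_COLD[True]
    return joint / marginal(p_child_given_cold)


p_icy = marginal(P_ICY_GIVEN_COLD)
p_poor_vis = marginal(P_POORVIS_GIVEN_COLD)
p_cold_given_icy = posterior_cold(P_ICY_GIVEN_COLD)

print(f"P(IcyRoad=T) = {p_icy:.4f}")
print(f"P(PoorVisibility=T) = {p_poor_vis:.4f}")
print(f"P(Cold=T | IcyRoad=T) = {p_cold_given_icy:.4f}")
```

Observing an icy road raises the belief in cold weather from the prior 0.33 to roughly 0.88, which is the kind of diagnostic reasoning the network structure enables.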
66
How do we get nodes and edges?
Domain Experts
ColdWeather
PoorVisibility
SlowTraffic
IcyRoad
Declarative domain knowledge
Variables and relationships
Causal knowledge
Linked Open Data
ColdWeather (YES/NO)  IcyRoad (ON/OFF)  PoorVisibility (YES/NO)  SlowTraffic (YES/NO)
1                     0                 1                        1
1                     1                 1                        0
1                     1                 1                        1
1                     0                 1                        0
Domain Observations
Domain Knowledge
Structure and parameters
WinterSeason Otherknowledge
67
Domain Knowledge
• Declarative knowledge about various domains is increasingly being published on the web1,2.
• Declarative knowledge describes concepts and relationships in a domain (structure).
• Linked Open Data may be used to derive prior probabilities of events (parameters).
• In this work, we focus only on the use of declarative knowledge for structure using ConceptNet 5.
1 http://conceptnet5.media.mit.edu/
2 http://linkeddata.org/
68
ConceptNet 5
http://conceptnet5.media.mit.edu/web/c/en/traffic_jam
Relations shown:
• go to baseball game Causes traffic jam
• traffic accident Causes traffic jam
• traffic jam CapableOf slow traffic
• traffic jam CapableOf occur twice each day
• bad weather CapableOf slow traffic
• road ice Causes accident
• go to concert HasSubevent car crash
• accident RelatedTo car crash
Categories linked via is_a: Delay, ActiveEvent, ScheduledEvent, TimeOfDay, BadWeather
69
Key Idea
• Probabilistic Graphical Models (PGM) use statistical approaches to uncover correlations.
• Declarative knowledge curated by humans provides richer relationships including causal knowledge.
• Goal: Utilize declarative knowledge with PGM structure learning algorithms to build richer (quality and coverage) models.
70
Complementing graphical model structure extraction
Traffic data from sensors deployed on road network in San Francisco Bay Area
Variables from the data: traffic jam (Link Description), baseball game (Scheduled Event), time of day
Knowledge from ConceptNet5:
• go to baseball game Causes traffic jam
• traffic jam CapableOf occur twice each day
• traffic jam CapableOf slow traffic
• bad weather CapableOf slow traffic
Steps:
• Add missing random variables (e.g. bad weather)
• Add missing links
• Add link direction
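The three complementing steps can be sketched on plain Python sets as follows. This is only an illustration under assumptions: the `augment` helper, the `aliases` map from ConceptNet phrases to variable names, the choice to treat `Causes` and `CapableOf` as directed influence, and the starting undirected edges are all hypothetical rather than taken from the paper.

```python
# Assumption: these relation types are read as directed influence.
DIRECTED_RELATIONS = {"Causes", "CapableOf"}


def augment(nodes, undirected_edges, triples, aliases=None):
    """Complement a learned structure with (subject, relation, object) triples.

    Adds variables the data lacked, adds links the learner missed, and orients
    links that were learned without a direction.
    """
    aliases = aliases or {}
    nodes = set(nodes)
    directed = set()
    remaining = {frozenset(edge) for edge in undirected_edges}
    for subject, relation, obj in triples:
        if relation not in DIRECTED_RELATIONS:
            continue
        source = aliases.get(subject, subject)
        target = aliases.get(obj, obj)
        nodes.update((source, target))                 # add missing random variables
        remaining.discard(frozenset((source, target)))  # this link now has a direction
        directed.add((source, target))                 # add missing link or direction
    return nodes, directed, remaining


learned_nodes = {"baseball game", "traffic jam", "time of day"}
learned_edges = [("baseball game", "traffic jam"), ("time of day", "traffic jam")]
knowledge = [
    ("go to baseball game", "Causes", "traffic jam"),
    ("bad weather", "CapableOf", "slow traffic"),
    ("traffic jam", "CapableOf", "slow traffic"),
]
all_nodes, directed_edges, undirected_left = augment(
    learned_nodes,
    learned_edges,
    knowledge,
    aliases={"go to baseball game": "baseball game"},
)
print(sorted(all_nodes))
print(sorted(directed_edges))
```

Edges the knowledge base says nothing about stay undirected, so a data-driven method would still be needed to orient them.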
71
Smart Cities: Opportunities
• empower citizens
• provide more business opportunities for companies (and SMEs) and private sector services
• create better governance of our cities and better public services
• provide smarter monitoring and control
• improve energy efficiency, create greener environments…
• create better healthcare, elderly-care…
Thanks to Dr. Payam Barnaghi for sharing the slide
72
Smart Cities: Challenges
• Adherence to open data standards by all the city authorities
• Sufficient guidance and support for city authorities in managing their data
• Reliability and quality of citizen reporting of city events
• Privacy and Security issues in event reporting
73
Thank you