Role of Big Data for Smart City Applications

Post on 22-Nov-2014


Talk given to the Smart City course students at CEPT University, Oct 19, 2014.
* Overview of Physical (IoT/Sensor), Cyber (OpenGov), and Social (Citizen Sensing) data
* Relevance to city departments
* Three smart city applications (from India, Europe, and the US)
More on the course: http://indianexpress.com/article/india/india-others/cept-launches-first-ever-course-on-smart-cities/

Transcript of Role of Big Data for Smart City Applications

1

Role of Big Data for Smart City Applications

Pramod Anantharam, Amit Sheth, Kno.e.sis, Wright State University

Thanks to Payam Barnaghi

2

Amit Sheth’s PhD students

Ashutosh Jadhav

Hemant Purohit

Vinh Nguyen

Lu Chen

Pramod Anantharam

Sujan Perera

Alan Smith

Maryam Panahiazar

Sarasi Lalithsena

Cory Henson

Kalpa Gunaratna

Delroy Cameron

Sanjaya Wijeratne

Wenbo Wang

Kno.e.sis in 2013 = ~100 researchers (15 faculty, ~50 PhD students)

Pavan Kapanipathi

Shreyansh Bhatt

Acknowledgements: Kno.e.sis team, Funds - NSF, NIH, AFRL, Industry…

3

• Among the top universities in the world in World Wide Web research (cf. 10-yr impact, Microsoft Academic Search: among the top 10 in June 2014)

• Among the largest academic groups in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications

• Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups)

• 100 researchers including 15 world-class faculty (>3K citations/faculty avg) and ~45 PhD students, practically all funded

• Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …)

4

Top organization in WWW: 10-yr Field Rating (MAS)

5

Let’s talk Big Data @ Kno.e.sis

• Social Media Big Data – Twitris, eDrugTrends
• Sensor/IoT Big Data – CityPulse, kHealth
• Healthcare Big Data – kHealth, EMR, Prediction
• Biomedical Big Data – Biomarker from NextGen Sequencing and Proteomics, SCOONER
• Big and Smart Data Certificate

Kno.e.sis private cloud: 864 CPU cores, 18TB RAM, 17TB SSD, 435TB disk

6

Smart Cities and Back to the future

Thanks to Dr. Payam Barnaghi for sharing the slide

7

Source: LA Times, http://documents.latimes.com/la-2013/

Future cities: a view from 1998

Thanks to Dr. Payam Barnaghi for sharing the slide

8

Image courtesy: Avatar wiki

Thanks to Dr. Payam Barnaghi for sharing the slide

9

Thanks to Dr. Payam Barnaghi for sharing the slide

10

Why: Improved Economic/Social/Human development in an era of increased Urbanization

What? All aspects of economy: Agricultural + Manufacturing + Service + Knowledge

How?
• Next

Imperatives

11

Enablers of Economic Developments

Image credit: http://www.rcet.org/twd/students/socialstudies/ss_extensions_1intro.html

Image credit: http://www.shutterstock.com/pic-157118819/stock-vector-conceptual-tag-cloud-containing-words-related-to-smart-city-digital-city-infrastructure-ict.html

Economic development on trade routes

Civilizations on river banks

Economic development now increasingly relies on digital infrastructure

12

Over 340 million people lived in India's cities in 2008, and that number is expected to grow to 590 million by 2030, leading to rapid urbanization¹

We are increasingly moving from Agriculture → Industry → Services. The next growth should be toward a Knowledge Economy.

¹ http://www.mckinsey.com/insights/urbanization/urban_awakening_in_india

General Economic Trends

13

One aspect of characterizing a City: All its functions

Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html

14

15

Five Key Elements of Smart City*

Utility Services

Transportation Services

Social Infrastructure

Safety & Health Services

Recycling Services

* By Indian Urban Development Ministry

16

http://www.tribalcafe.co.uk/big-data-infographic/

Unprecedented Digital Data Growth

• Everything is becoming data driven
• Many types of data: Physical, Cyber, and Social
• Effective collection and use of this Big Data has to be a core part of designing Smart Cities

17

• Increased citizen participation (Social)
• Increased monitoring using sensors (Physical)
• Increased Digital Government (eGov) data (Cyber)

Understanding wealth of data

Let’s not develop future applications with constraints of the past

http://www.informationweek.com/government/leadership/digital-civic-engagement-us-lags/d/d-id/1113938

India ranks 8th in civic engagement!

18

What do we need for developing Smart City Applications?

http://wiki.knoesis.org/index.php/PCS

Amit Sheth, Pramod Anantharam, Cory Henson, 'Physical-Cyber-Social Computing: An Early 21st Century Approach,' IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb., 2013. http://doi.ieeecomputersociety.org/10.1109/MIS.2013.20

Physical

Cyber

Social*

Developers need to consider observations from Physical-Cyber-Social systems when building Smart City applications

*http://www.ichangemycity.com/

19

Physical: Sensors monitoring physical world

- Programmable devices
- Off-the-shelf gadgets/tools

Thanks to Dr. Payam Barnaghi for sharing the slide

20

Cyber: Observations pushed to the cyber world

Thanks to Dr. Payam Barnaghi for sharing the slide

21

Figure labels: Motion sensor, ECG sensor, World Wide Web, Road block (A3)

Social: People interacting with the physical world

Thanks to Dr. Payam Barnaghi for sharing the slide

22

• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context

Scope of this talk

23

• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context

24

Dynamic Schedule Update of Public Transport Vehicles in a City Lacking Traffic Instrumentation*

Pramod Anantharam

Joint work with Biplav Srivastava and Raj Gupta, IBM IRL

Aug 31, 2012

*Work done as part of internship at IBM Research

25

Motivation

By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001)¹

¹ The Crisis of Public Transport in India

² IBM Smarter Traffic

Modes of transportation in Indian Cities

Texas Transportation Institute (TTI) Congestion report in U.S.

26

Motivation: Why SMS for Events?

• Prevalence
– In India, 11 cities provide notifications to citizens using SMS
– SMS-based alerts are common for business transactions
– Low-cost phones constitute 95% of all phones (~930 million mobile connections in India²)
• Social media (Facebook, Twitter) and SMS
– Commuters prefer dynamic updates such as SMS versus any other form of traffic updates¹

¹ Caulfield et al. Factors Which Influence the Preferences for Real-Time Public Transport Information, Association of European Transport and contributors 2007

² http://en.wikipedia.org/wiki/Communications_in_India

27

Problem

• Input
– Traffic related text alerts, domain knowledge, public transport routes, and historical data.
• Output
– Events in desired form
– Impact of events on public transport routes (e.g. probability of delay given location + event)
• Challenges
– No instrumentation (sensors) leading to sparse and imprecise information, event extraction from free text.

28

Solution Components

As events are reported to MDU (Multi-modal Dynamic Update):
• Traffic event detection from SMS alerts – event <Type, Time (Reported, Published), Location (From, To, On), Description>
• Reasoning over traffic events for delay assessment
– Find stops in the region affected by event (Qualitative)
– Estimate delay at stops (Quantitative)
   • Consider time of day and history of such events
   • Have an attenuation function based on event types
– Propagate delay estimates to neighboring stops
   • Account for time, schedule and direction of travel

29

Map labels: Signal Enclave, Vasant Vihar

Bus stops shown on the map:
vasant vihar model school
c.p.w.d.cly. vasant vihar
paschim marg vasant vihar
vasant vihar(t)
vasant vihar depot.

“Traffic movement is slow from Sanjay point towards Vasant Vihar due to break down of an HTV in front of Signal Enclave.msg@10.15am,210612.”

Illustration from New Delhi (India)

eventtype = BreakDown
eventdescription = “Traffic movement is slow from Sanjay point towards Vasant Vihar due to break down of an HTV in front of Signal Enclave.msg@10.15am,210612.”
eventstartloc = Sanjay Point
eventendloc = Vasant vihar
eventonloc = Signal Enclave
eventtime = June 21, 2012, 10:15am
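The extraction illustrated above can be sketched in a few lines of Python. This is only a minimal illustration fitted to the single sample alert, not the extractor used in the work: the regular expressions, the `EVENT_KEYWORDS` map, the `TrafficEvent` class, and `parse_alert` are all hypothetical names introduced here, and the `ddmmyy` reading of `210612` is inferred from the event time shown on the slide.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical keyword-to-type map; the slide only shows the BreakDown type.
EVENT_KEYWORDS = {"break down": "BreakDown", "breakdown": "BreakDown"}

# Patterns fitted to the one sample alert; real alerts would need more rules.
ROUTE_RE = re.compile(r"from (?P<start>.+?) towards (?P<end>.+?) due to", re.I)
ON_RE = re.compile(r"in front of (?P<on>.+?)\.\s*msg@", re.I)
TIME_RE = re.compile(
    r"msg@(?P<h>\d{1,2})\.(?P<m>\d{2})(?P<ampm>am|pm),(?P<date>\d{6})", re.I
)


@dataclass
class TrafficEvent:
    event_type: Optional[str]
    start_loc: Optional[str]
    end_loc: Optional[str]
    on_loc: Optional[str]
    reported: Optional[datetime]
    description: str


def parse_alert(text: str) -> TrafficEvent:
    """Extract event fields from one SMS alert; parts not found stay None."""
    route = ROUTE_RE.search(text)
    on = ON_RE.search(text)
    stamp = TIME_RE.search(text)
    reported = None
    if stamp:
        # The date is assumed to be ddmmyy, e.g. 210612 for June 21, 2012.
        clock = f"{stamp['h']}:{stamp['m']}{stamp['ampm'].upper()}"
        reported = datetime.strptime(f"{stamp['date']} {clock}", "%d%m%y %I:%M%p")
    lowered = text.lower()
    event_type = next((t for k, t in EVENT_KEYWORDS.items() if k in lowered), None)
    return TrafficEvent(
        event_type=event_type,
        start_loc=route["start"] if route else None,
        end_loc=route["end"] if route else None,
        on_loc=on["on"] if on else None,
        reported=reported,
        description=text,
    )


SAMPLE = (
    "Traffic movement is slow from Sanjay point towards Vasant Vihar "
    "due to break down of an HTV in front of Signal Enclave."
    "msg@10.15am,210612."
)
event = parse_alert(SAMPLE)
```

Keeping every field optional reflects the challenge named earlier in the talk: alerts are free text, so any of the locations or the event type may be missing from a given message.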

30

Evaluation: Event Extraction

• Run for ~50 messages in Delhi
• Accurate extraction of location from, to and type.

Sample

31

Bayesian Model: Impact of Events on Delay

The probability of having a delay at a stop, Si, given the events observed at the stop, is given by

32

Impact (Delay) Propagation Across Stops

Vehicle moves from S1 towards S4

Actual steps are performed by the loopy belief propagation algorithm.

Koller, D., Friedman, N.: Probabilistic Graphical Models: Principles and Techniques. MIT Press (2009)

Assumption: delay at a node is directly influenced by delay at the next node.
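The stated assumption (delay at a stop is influenced by delay at the next stop on the route) can be illustrated with a deliberately simplified sketch. It replaces loopy belief propagation with a plain geometric attenuation, so it is not the method used in the work; `propagate_delay` and the `attenuation` factor of 0.5 are hypothetical choices made only for this example.

```python
from typing import Dict, List


def propagate_delay(
    route: List[str],
    event_stop: str,
    p_delay: float,
    attenuation: float = 0.5,
) -> Dict[str, float]:
    """Push a delay probability upstream from the stop where the event is.

    `route` lists stops in travel order. Stops after the event stop are
    left at 0.0, since a vehicle only meets the delay on its way toward it.
    """
    if not 0.0 <= p_delay <= 1.0 or not 0.0 <= attenuation <= 1.0:
        raise ValueError("p_delay and attenuation must lie in [0, 1]")
    event_index = route.index(event_stop)
    result = {stop: 0.0 for stop in route}
    current = p_delay
    # Walk backwards: each earlier stop inherits a weakened delay estimate.
    for i in range(event_index, -1, -1):
        result[route[i]] = current
        current *= attenuation
    return result


# Vehicle moves from S1 towards S4; the event is observed at S4.
delays = propagate_delay(["S1", "S2", "S3", "S4"], "S4", p_delay=0.8)
```

With these inputs the estimate halves at each step away from S4. A belief-propagation implementation would instead derive these values from the conditional probabilities between neighboring stops.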

33

Application: adaptive route recommendations in IRL Transit

34

1-Slide Summary: Multi-Mode Commuting Recommender in Delhi and Bangalore

Highlights
• Published data of multiple authorities used; repeatable process
• Multiple modes searched
• Preferences over modes, time, hops, and number of choices supported; more extensions, like fare, possible
• Integration of results with map as future work; already done as part of other projects, viz. SCRIBE-STAT

IRL – Transit on March 2012

First Version

35

IRL – Transit in Aug 2012

Key Points
• SMS message from city
• Event and location identified
• Impact assessed
• Impact used in search

36

Matching Stop Names to OSM Location

• 3931 multi-modal stops in Delhi
• Matching algorithm involves chunking of stop names, distance metrics, and voting to select the best match.
• Matches categorized as confident (>50% of techniques resulted in the match), possible (=50%), and uncertain (<50%).
• 1496 stops (confident matches) mapped to OSM locations.
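The voting scheme described above might look roughly like the following sketch. The four similarity techniques, the `min_score` threshold of 0.6, and all function names are assumptions made for illustration; the slides do not say which distance metrics were actually used. Only the confident / possible / uncertain thresholds come from the talk.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def chunks(name: str) -> List[str]:
    """Split a stop name into lowercase alphanumeric chunks."""
    return [t for t in re.split(r"[^a-z0-9]+", name.lower()) if t]


def char_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def chunk_jaccard(a: str, b: str) -> float:
    sa, sb = set(chunks(a)), set(chunks(b))
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0


def sorted_chunk_similarity(a: str, b: str) -> float:
    return SequenceMatcher(
        None, " ".join(sorted(chunks(a))), " ".join(sorted(chunks(b)))
    ).ratio()


def squashed_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, "".join(chunks(a)), "".join(chunks(b))).ratio()


# Hypothetical set of techniques; each one casts at most one vote.
TECHNIQUES = [char_similarity, chunk_jaccard, sorted_chunk_similarity,
              squashed_similarity]


def match_stop(
    stop_name: str, osm_names: List[str], min_score: float = 0.6
) -> Tuple[Optional[str], str]:
    """Return (best OSM name or None, 'confident' | 'possible' | 'uncertain')."""
    if not osm_names:
        return None, "uncertain"
    votes: Counter = Counter()
    for technique in TECHNIQUES:
        best = max(osm_names, key=lambda cand: technique(stop_name, cand))
        if technique(stop_name, best) >= min_score:
            votes[best] += 1
    if not votes:
        return None, "uncertain"
    winner, count = votes.most_common(1)[0]
    share = count / len(TECHNIQUES)
    if share > 0.5:
        return winner, "confident"
    return winner, "possible" if share == 0.5 else "uncertain"


osm = ["Vasant Vihar Depot", "Signal Enclave", "Sanjay Point"]
match, label = match_stop("vasantvihar depot", osm)
```

Here three of the four techniques agree on the depot, so the match is labeled confident; the chunk-overlap technique abstains because `vasantvihar` is written as one word in the transit data, which is exactly the kind of disagreement the voting step is meant to absorb.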

OSM names

37

Event Information: from SMS text to impact assessment (flowchart)

Flowchart labels: Fetch text message (text message with metadata); Extract Locations StartLoc, EndLoc, and OnLoc; Is location present? (Yes/No); Extract GIS location using Open Street Maps; Extract Event Information; Is event present? (Yes/No); Domain knowledge of categorization of events; Background knowledge; textual observation; Observations; GIS Location; parameterize; Assess impact

Text message:
Traffic movement is slow from Sanjay point towards VasantVihar due to break down of an HTV in front of Signal Enclave.msg@10.15am,210612.

Extracted locations:
StartLoc = Sanjay Point
EndLoc = VasantVihar
OnLoc = Signal Enclave
Lat-lon = 28.5561025,77.1645187

Extracted event:
Event = break down of an HTV
Event Type = BreakDown

IRL-Transit routes (unordered)

STOPID STOPNAME
321 c.p.w.d. cly. vasantvihar
369 vasantvihar (t)
814 vasantvihar model school
956 vasantviharcpwdcly.
957 paschimmargvasantvihar
1274 vasantvihar depot

IRL-Transit routes (ordered)

STOPID STOPNAME
957 paschimmargvasantvihar
814 vasantvihar model school
321 c.p.w.d. cly. vasantvihar
956 vasantviharcpwdcly.
1274 vasantvihar depot
369 vasantvihar (t)

38

Evaluation: Reasoning over traffic events

• Traffic alerts collected for 10 cities in India for two years.

• Prior probability of events computed using these alerts.

• Probability of having a delay given an event type at ten locations in Delhi is summarized:

39

Number of SMS messages for bus stops in Delhi for 2 years (Aug 2010 – Aug 2012)

• 344 stops with updates
• 3931 total stops

40

• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context

41

CityPulse Consortium

Partners:
Industrial: SIE, ERIC
SME: AI
Higher Education: UNIS, NUIG, UASO, WSU
City: BR, AA

Duration: 36 months

42

43

Framework components (diagram labels):
Analytics Toolbox
Context-aware Decision Support, Visualisation
Knowledge-based Stream Processing
Real-Time Monitoring & Testing
Accuracy & Trust Modelling
Semantic Integration
On Demand Data Federation
Open Reference Data Sets
Real-Time IoT Information Extraction
IoT Stream Processing
Federation of Heterogeneous Data Streams
Exposure APIs

Phases: Design-Time, Run-Time, Testing

In summary

44

Diagram labels: Data, Domain Knowledge, Social systems, Interactions, Open Interfaces, Ambient Intelligence, Quality and Trust, Privacy and Security, Open Data

45

Use cases

46

Scenario ranking

47

101 Scenarios

48

101 Scenarios

• http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements

49

Public parking space availability prediction

http://www.ict-citypulse.eu/scenarios/scenarios

• Finding parking space in a city can be challenging
• Predicting the probability of parking given various input variables such as scheduled events, time of day & location.
• Reduced emission and frustration for citizens

50

• Smart City application in Indian Context
• Smart City Use Cases in Developed World
– Smart City application in European Context
– Smart City application in US context

51

Extracting City Events from Social Streams

Toward a Citizen Centered Smart City

Pramod Anantharam¹

¹ Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA

http://www.ict-citypulse.eu/page/

Mentor/Supervisor: Dr. Payam Barnaghi

52

Image credit: http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/index.html

City functions shown: Public Safety, Urban planning, Gov. & agency admin., Energy & water, Environmental, Transportation, Social Programs, Healthcare, Education

Pulse of a City (CityPulse)

53

• Are people talking about city infrastructure on Twitter?

• Can we extract city infrastructure related events from Twitter?

• How can we leverage event and location knowledge bases for event extraction?

• How well can we extract city events?

Research Questions

54

Are People Talking About City Infrastructure on Twitter?

55

Some Challenges in Extracting Events from Tweets

• No well accepted definition of ‘events related to a city’
• Tweets are short (140 characters) and their informal nature makes them hard to analyze
– Entity, location, time, and type of the event
• Multiple reports of the same event and sparse reports of some events (biased sample)
– Numbers don’t necessarily indicate intensity
• Validation of the solution is hard due to the open domain nature of the problem

56

Formal Text Informal Text

Closed Domain

Open Domain [Roitman et al. 2012][Kumaran and Allan 2004]

[Lampos and Cristianini 2012]

[Becker et al. 2011]

[Wang et al. 2012]

[Ritter et al. 2012]

Related Work on Event Extraction

57

City Infrastructure

Tweets from a city

POS Tagging

Hybrid NER + Event term extraction

Geohashing

Temporal Estimation

Impact Assessment

Event Aggregation

OSM Locations

SCRIBE ontology

511.org hierarchy

City Event Extraction

City Event Extraction Solution Architecture

City Event Annotation

58

• City Event Annotation
– Automated creation of training data
– Annotation task (our CRF model vs. baseline CRF model)
• City Event Extraction
– Use aggregation algorithm for event extraction
– Extracted events AND ground truth
• Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
– Over 8 million tweets
– Over 162 million sensor data points
– 311 active events and 170 scheduled events

Evaluation

59

Ground Truth Data (only incident reports) -- City Event Extraction

We have around 162 million data records from sensors monitoring over 3,700 links in the San Francisco Bay Area

A data record: <link_id, link_speed, link_volume, link_travel_time, time_stamp>

GREEN – Active Events

YELLOW – Scheduled Events

311 active events and 170 scheduled events

60

Evaluation – Extracted Events AND Ground Truth

61

Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases

Pramod Anantharam, T. K. Prasad, Amit Sheth

Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)

Wright State University, Dayton, Ohio

2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013)

62

Slow moving traffic

Link Description

Scheduled Event

Scheduled Event

511.org

511.org

Schedule Information

511.org

63

Uncertainty in the Real-world

• Observation: Slow Moving Traffic
• Multiple Causes (Uncertain about the cause):
– Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc.
– Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc.
– Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm
• Each of these events may have a varying impact on traffic

64

Why Probabilistic Graphical Models?

“As far as the laws of mathematics refer to reality, they are not certain, as far as they are certain, they do not refer to reality” -- Albert Einstein, 1921.

“Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity …”

-- Michael Jordan, UC Berkeley, 1998.

65

Graphical Models – Bayesian Network Example

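Nodes (random variables): Cold, IcyRoad, PoorVisibility, SlowTraffic

Random variable: a node in the graph

Edge between random variables: indicates a direct dependence (the absence of an edge encodes conditional independence)

P(Cold):
T  0.33
F  0.67

P(IcyRoad | Cold):
             Cold=T  Cold=F
IcyRoad=T    0.75    0.05
IcyRoad=F    0.25    0.95

P(PoorVisibility | Cold):
                    Cold=T  Cold=F
PoorVisibility=T    0.85    0.40
PoorVisibility=F    0.15    0.60

SlowTraffic given IcyRoad and PoorVisibility (columns as printed on the slide):
                IcyRoad=T  IcyRoad=F  PoorVisibility=T  PoorVisibility=F
SlowTraffic=T   0.85       0.4        0.9               0.2
SlowTraffic=F   0.15       0.6        0.1               0.8

Conditional Probability Table (CPT)

A graphical model has structure (nodes and edges) and parameters; CPD – continuous variables, CPT – discrete variables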
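To make the role of structure and parameters concrete, here is a small sketch that performs exact inference by enumeration on part of the example network. It uses only the three tables whose values are unambiguous in the slide: P(Cold=T) = 0.33, P(IcyRoad=T | Cold) = 0.75 / 0.05, and P(PoorVisibility=T | Cold) = 0.85 / 0.40. The SlowTraffic table is left out, and the function names are hypothetical.

```python
from itertools import product

# Parameters read from the slide's tables (True means the variable is T).
P_COLD = {True: 0.33, False: 0.67}
P_ICY_GIVEN_COLD = {True: 0.75, False: 0.05}      # P(IcyRoad=T | Cold)
P_POORVIS_GIVEN_COLD = {True: 0.85, False: 0.40}  # P(PoorVisibility=T | Cold)


def joint(cold: bool, icy: bool, poor_vis: bool) -> float:
    """P(Cold, IcyRoad, PoorVisibility) via the network's factorization."""
    p_icy = P_ICY_GIVEN_COLD[cold] if icy else 1.0 - P_ICY_GIVEN_COLD[cold]
    p_vis = (P_POORVIS_GIVEN_COLD[cold] if poor_vis
             else 1.0 - P_POORVIS_GIVEN_COLD[cold])
    return P_COLD[cold] * p_icy * p_vis


def marginal_icy() -> float:
    """P(IcyRoad=T), summing the joint over the other two variables."""
    return sum(joint(c, True, v) for c, v in product((True, False), repeat=2))


def posterior_cold_given_icy() -> float:
    """P(Cold=T | IcyRoad=T) by Bayes' rule."""
    numerator = sum(joint(True, True, v) for v in (True, False))
    return numerator / marginal_icy()


p_icy = marginal_icy()
p_cold_given_icy = posterior_cold_given_icy()
```

Working it by hand gives P(IcyRoad=T) = 0.33 × 0.75 + 0.67 × 0.05 = 0.281, and observing an icy road raises the belief in cold weather from 0.33 to 0.2475 / 0.281 ≈ 0.88. Enumeration is fine for four binary variables; larger networks need algorithms such as the belief propagation mentioned earlier in the talk.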

66

How do we get nodes and edges?

Domain Experts

ColdWeather

PoorVisibility

SlowTraffic

IcyRoad

Declarative domain knowledge

Variables and relationships

Causal knowledge

Linked Open Data

ColdWeather (YES/NO)  IcyRoad (ON/OFF)  PoorVisibility (YES/NO)  SlowTraffic (YES/NO)
1                     0                 1                        1
1                     1                 1                        0
1                     1                 1                        1
1                     0                 1                        0

Domain Observations

Domain Knowledge

Structure and parameters

WinterSeason Otherknowledge

67

Domain Knowledge

• Declarative knowledge about various domains is increasingly being published on the web¹,².

• Declarative knowledge describes concepts and relationships in a domain (structure).

• Linked Open Data may be used to derive prior probabilities of events (parameters).

• In this work, we focus only on the use of declarative knowledge for structure using ConceptNet 5.

¹ http://conceptnet5.media.mit.edu/

² http://linkeddata.org/

68

ConceptNet 5

http://conceptnet5.media.mit.edu/web/c/en/traffic_jam

Relations shown in the diagram:
go to baseball game Causes traffic jam
traffic accident Causes traffic jam
traffic jam CapableOf slow traffic
traffic jam CapableOf occur twice each day
bad weather CapableOf slow traffic
road ice Causes accident
go to concert HasSubevent car crash
accident RelatedTo car crash

Categories attached to these concepts via is_a: Delay, ActiveEvent, ScheduledEvent, TimeOfDay, BadWeather

69

Key Idea

• Probabilistic Graphical Models (PGM) use statistical approaches to uncover correlations.

• Declarative knowledge curated by humans provides richer relationships, including causal knowledge.

• Goal: Use declarative knowledge with PGM structure learning algorithms to build richer models (in quality and coverage).

70

Complementing graphical model structure extraction

Data: Traffic data from sensors deployed on road network in San Francisco Bay Area (Link Description, Scheduled Event)

Knowledge from ConceptNet5:
go to baseball game Causes traffic jam
bad weather CapableOf slow traffic
traffic jam CapableOf occur twice each day
traffic jam CapableOf slow traffic

Steps, illustrated on the variables baseball game, traffic jam, time of day, bad weather, and slow traffic:
• Add missing random variables
• Add missing links
• Add link direction
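The three complementing steps can be sketched with plain Python sets. This is an illustrative simplification rather than the structure learning procedure from the work: `complement_structure`, the choice of which relations imply a direction (`DIRECTED_RELATIONS`), and the starting graph are all assumptions made for this example, and the triples are used verbatim without mapping concepts such as `occur twice each day` onto a `time of day` variable.

```python
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Assumption: these relations are read as "subject influences object".
DIRECTED_RELATIONS = {"Causes", "CapableOf"}


def complement_structure(
    variables: Set[str],
    undirected_links: Set[FrozenSet[str]],
    triples: List[Triple],
) -> Tuple[Set[str], Set[Tuple[str, str]], Set[FrozenSet[str]]]:
    """Return (all variables, directed links, links still undirected)."""
    nodes = set(variables)
    directed: Set[Tuple[str, str]] = set()
    remaining = set(undirected_links)
    for subj, rel, obj in triples:
        if rel not in DIRECTED_RELATIONS:
            continue
        nodes.update((subj, obj))                   # add missing random variables
        remaining.discard(frozenset((subj, obj)))   # add link direction
        directed.add((subj, obj))                   # add missing links
    return nodes, directed, remaining


# Hypothetical structure learned from sensor data alone.
learned_vars = {"traffic jam", "slow traffic"}
learned_links = {frozenset(("traffic jam", "slow traffic"))}

knowledge = [
    ("go to baseball game", "Causes", "traffic jam"),
    ("traffic jam", "CapableOf", "slow traffic"),
    ("bad weather", "CapableOf", "slow traffic"),
]

nodes, directed, undirected = complement_structure(
    learned_vars, learned_links, knowledge
)
```

After the call, `nodes` also contains `go to baseball game` and `bad weather`, and the link between `traffic jam` and `slow traffic` has a direction. A real pipeline would additionally check that the resulting graph stays acyclic before learning parameters.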

71

Smart Cities: Opportunities

• empower citizens
• provide more business opportunities for companies (and SMEs) and private sector services
• create better governance of our cities and better public services
• provide smarter monitoring and control
• improve energy efficiency, create greener environments…
• create better healthcare, elderly-care…

Thanks to Dr. Payam Barnaghi for sharing the slide

72

Smart Cities: Challenges

• Adherence to open data standards by all the city authorities

• Sufficient guidance and support for city authorities in managing their data

• Reliability and quality of citizen reporting of city events

• Privacy and Security issues in event reporting

73

Thank you