Crisis Informatics (November 2013)

Post on 09-May-2015

Talk at Microsoft Research, New York City, November 2013.

Crisis informatics: Finding relevant and credible information on social media during disasters

January 2010

How/when did it start for me?

Carlos Castillo – chato@acm.org – http://www.chato.cl/research/

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities

Publication titles

Fertile grounds for applied research

✔ Problem of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Collaborators

• Muhammad Imran – QCRI
• Hemant Purohit – Wright State Univ.
• Alexandra Olteanu – EPFL
• Jakob Rogstadius – Univ. of Madeira
• Ioanna Lykourentzou – INRIA
• Shady Elbassuoni – Univ. of Beirut
• Lalana Kagal et al. – CSAIL MIT
• Fernando Diaz – Microsoft

Outline

• Motivation
• Handling crisis tweets
• Crowdsourced verification
• Ongoing work
  – Automatic classification
  – Resource matchmaking

Crisis Mapping

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

I don't have time for social networks!

• We all have spare capacity
  – Television, TV series, Internet sites
• We overestimate ourselves in general
  – Don't underestimate social media users, it is a bad starting point

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed on fiber/copper, plus latency

• After ~100km seismic waves may be overtaken by tweets about them

http://xkcd.com/723/
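
The ~100 km figure can be checked with the numbers on this slide. The following is a minimal sketch, assuming that network propagation time is negligible compared with the 20-30 second posting delay; the function name and the optional latency parameter are illustrative and not part of the talk.

```python
def overtake_distance_km(wave_speed_km_s, tweet_delay_s, network_latency_s=0.0):
    """Distance beyond which a tweet arrives before the seismic wave.

    The wave needs distance / speed seconds to arrive. The tweet needs
    tweet_delay_s + network_latency_s, assumed independent of distance.
    Setting both times equal and solving for distance gives the result.
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


if __name__ == "__main__":
    # Ranges taken from the slide: 3-5 km/s and 20-30 seconds.
    for speed in (3, 5):
        for delay in (20, 30):
            distance = overtake_distance_km(speed, delay)
            print(f"{speed} km/s, {delay} s delay: {distance} km")
```

With the slide's ranges the break-even distance falls between 60 and 150 km, which brackets the ~100 km estimate.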

Crisis Mappers Conference 2013: Next week!

Classifying and extracting information from tweets

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Our approach

1. Filtering
2. Classification
3. Extraction

1. Filtering

• Is disaster-related? Yes / No
• Contributes to situational awareness? Yes / No

Labeling task

Classify the following tweet from Hurricane Sandy as:
● Personal: only of interest to author and immediate circle of friends
● Informative: interesting to other people
● Off-topic: not related to Hurricane Sandy
● Other/can't judge

Advice on labeling

• Your instructions will never be correct the first time you try
  – e.g. personal / eyewitness
  – Instructions must be re-written reactively
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If you can't do it, the task has to be divided

2. Classification

Filtered tweets are classified into categories: Caution & Advice, Information Sources, Damage & Casualties, Donations, Health, Shelter, Food, Water, Logistics, ...

Distribution of tweet types: Joplin Tornado (2011)

[Pie chart. Categories: Caution/Advice, Info Source, Donations, Casualties/Damage, Unknown. Shares shown: 50%, 18%, 16%, 10%, 6%.]

Classification results

Class                 AUC
Caution and advice    0.91
Information source    0.76
Donations             0.89
Casualties/damage     0.87
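
For readers less familiar with the metric, here is a minimal sketch of how an AUC value can be computed for one class against the rest. The slide does not say how the reported figures were obtained; the function below uses the standard pairwise definition, and the scores and labels in the usage note are toy values, not data from the talk.

```python
def auc(scores, labels):
    """Area under the ROC curve for one class (one-vs-rest).

    Equals the probability that a randomly chosen positive item gets a
    higher classifier score than a randomly chosen negative item, with
    ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])` compares four positive/negative pairs, three of which are ordered correctly. The quadratic loop is fine for a sketch; a rank-based formulation would be preferable on large test sets.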

3. Extraction

Classified tweets, e.g.:

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Extraction

• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
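
The regular-expression route for hashtags, mentions and URLs can be sketched as follows. The patterns are deliberately simplified assumptions for illustration; Twitter's own text library handles many edge cases (Unicode, trailing punctuation, shortened links) that these do not, and the function name is hypothetical.

```python
import re

# Simplified, illustrative patterns (assumptions, not Twitter's rules).
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def extract_entities(tweet):
    """Return the hashtags, user mentions and URLs found in a tweet."""
    return {
        "hashtags": HASHTAG.findall(tweet),
        "mentions": MENTION.findall(tweet),
        "urls": URL.findall(tweet),
    }


if __name__ == "__main__":
    example = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported "
               "at Central Park and JFK airport in #NYC this hour. #Sandy")
    print(extract_entities(example))
```

Temporal expressions and free-text items such as damage reports do not follow fixed surface patterns like these, which is where the tagger-based heuristics and the supervised learner come in.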

Labels for extraction

• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from each tweet

Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Diagram: HMM vs. linear-chain CRF, each drawn as a chain of hidden variables over observed variables.]
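
To make the point about representing observations concrete, here is a minimal sketch of per-token features of the kind a linear-chain CRF can consume. The feature set, the feature names and the list-of-dicts layout are assumptions chosen for illustration; they are not the features used in the talk or in any particular toolkit.

```python
def token_features(tokens, i):
    """Describe the i-th token of a tokenized tweet as a feature dict."""
    tok = tokens[i]
    last = len(tokens) - 1
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == last,
        # Neighbouring words give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < last else "<END>",
    }


def tweet_features(tokens):
    """One feature dict per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeler would then be trained on pairs of such feature sequences and per-token labels, for instance marking which tokens belong to the phrase a human annotator copied from the tweet.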

Tool

• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning
• Very easy to use: simply replace the labels in the training set (part-of-speech tags) with any other labels, and re-train

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Extractor evaluation

Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%

• For precision, an extraction counts as correct if it has one word or more in common with what humans extracted
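
The overlap criterion can be sketched in a few lines. The slide only defines when an extraction counts as correct; the denominators used below (extractions produced for precision, human-labeled tweets for recall) and the whitespace tokenization are assumptions, and the example phrases in the usage note are made up.

```python
def words(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def overlaps(extracted, reference):
    """True if the two phrases share one word or more."""
    return bool(words(extracted) & words(reference))


def precision_recall(system, human):
    """Compare system extractions with human ones, aligned by tweet.

    An entry of None means nothing was extracted (system) or labeled (human).
    """
    produced = sum(1 for s in system if s is not None)
    labeled = sum(1 for h in human if h is not None)
    correct = sum(
        1 for s, h in zip(system, human)
        if s is not None and h is not None and overlaps(s, h)
    )
    precision = correct / produced if produced else 0.0
    recall = correct / labeled if labeled else 0.0
    return precision, recall
```

For instance, with system output `["tornado watch", None, "send money", "bridges"]` and human labels `["a tornado watch in effect", "wind gusts over 60 mph", "donate blood", "NYC bridges"]`, two of three produced extractions overlap with the human phrase, and two of four labeled tweets are recovered. A lenient criterion like this rewards partial matches, which helps explain how precision can stay high while recall drops across disasters.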

Donations matching

• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
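
As a reminder of what the metric measures, here is a minimal sketch of average precision for one ranked list of candidate matches. The slide does not describe the exact evaluation protocol, so this uses the standard definition; the function name and the `total_relevant` parameter are illustrative.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision of a ranking.

    ranked_relevance: booleans, best-ranked candidate first, True where the
    candidate is a correct match for the request (or offer) being served.
    total_relevant: number of correct matches that exist; defaults to the
    number found in the ranking.
    """
    if total_relevant is None:
        total_relevant = sum(1 for r in ranked_relevance if r)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / total_relevant
```

A ranking that places correct matches near the top scores close to 1, while one that buries them scores close to 0, so the metric is sensitive to ordering and not only to whether a match is found at all.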

Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems (Submitted for publication)

Design objectives and principles

Low latency
  – Example metric: End-to-end time
  – Automatic components: Keep items moving
  – Crowdsourced components: Trivial tasks

High throughput
  – Example metric: Output items per unit of time
  – Automatic components: High-performance processing
  – Crowdsourced components: Task automation

Load adaptability
  – Example metric: Rate response function
  – Automatic components: Load shedding, load queueing
  – Crowdsourced components: Task prioritization

Cost effectiveness
  – Example metric: Cost vs. quality, throughput, etc.
  – Automatic components: N/A
  – Crowdsourced components: Task frugality

High quality
  – Example metric: Application-dependent
  – Design principle: Redundancy, aggregation and quality control
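
The redundancy-and-aggregation principle for quality can be sketched as a majority vote over several workers' labels for the same item. This is one common aggregation rule, shown here as an assumption; the slides do not specify which rule is used, and the `min_agreement` threshold is a hypothetical parameter.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Return the majority label, or None if agreement is too low.

    votes: labels given by different workers to the same item.
    min_agreement: minimum fraction of votes the winning label must have.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None
```

Items that come back as `None` could be sent to additional workers or dropped, which is where quality trades off against cost and latency.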

Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element

http://aidr.qcri.org/

Self-service for crisis-related classification

Unstructured text reports

Structured information

Report Classifier

Model Builder

Crowdsourced active learning

Library of training data
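
One common way to choose which items to send to the crowd in active learning is uncertainty sampling: ask humans about the items the current model is least sure of. The sketch below assumes a binary classifier that returns a probability per item; the slide does not state which selection strategy AIDR uses, and the function and parameter names are hypothetical.

```python
def select_for_labeling(items, predict_proba, budget):
    """Pick the `budget` items whose predicted probability is closest to 0.5.

    items: candidate items (e.g. tweet texts or ids).
    predict_proba: callable mapping an item to P(item is in the class).
    budget: how many items the crowd can label in this round.
    """
    ranked = sorted(items, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:budget]
```

For example, given predicted probabilities `{"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}` and a budget of two, items `b` and `d` would be sent for labeling; their new labels then go back into the training data and the model is rebuilt.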

Preliminary results: efficiency

Maximum documented input load during a natural disaster = 270 tweets/sec.

Preliminary results: effectiveness

Task: Informative vs. {Personal, Other}

Free software

• AIDR is free software
• The official launch date is November 20th during the Crisis Mappers conference in Nairobi, Kenya

Mobile applications

Fuming Shih, Oshani Seneviratne, Daniela Miao, Ilaria Liccardi, Lalana Kagal, Evan Patton, Patrick Meier, Carlos Castillo: Democratizing Mobile App Development for Disaster Management. To be presented at the IJCAI Workshop on Semantic Cities. Beijing, China, 2013.

Mobile components (AppInventor)

• Components useful for DIY emergency response apps
  – e.g. off-line tolerant photo uploads
• Aggregating/federating linked open data

Helping developers query linked data

Resource matching

Crowdsourced verification

Crowdsourced verification for crisis information

• Veri.ly
• Joint project between MASDAR and QCRI
• Iyad Rahwan, Abdulfatai Popoola, Dmytro Krasnoshtan, Attila Toth (MASDAR), Victor Naroditskiy (Univ. Southampton) + QCRI

Closing remarks

Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems

Some venues

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

• SWDM – Workshop on Social Web for Disaster Management

• SMERST – Social Media and Semantic Technologies in Emergency Response

+ the usual suspects, depending on your area ;-)

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Thank you!

Carlos Castillo · chato@acm.org · http://www.chato.cl/research/

With thanks to Patrick Meier for several slides