Crisis Informatics (November 2013)

Crisis informatics: Finding relevant and credible information on social media during disasters

January 2010

How/when did it start for me?

Carlos Castillo – chato@acm.org
http://www.chato.cl/research/

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities

State of the art

At least 650 publications:

Crisis Analysis (52)

Crisis Management (280)

Situational Awareness (58)

Social Media (203)

Mobile Phones (64)

Crowdsourcing (109)

Software and Tools (90)

Human-Computer Interaction (28)

Natural Language Processing (33)

Trust and Security (31)

Geographical Analysis (45)

Source: http://humanitariancomp.referata.com/

Publication titles

Fertile grounds for applied research

✔ Problems of global significance
✔ Solved with labor-intensive methods
✔ Better solution provides a public good
✔ Large and noisy data sets available
✔ Engage volunteer communities
• Relevance to practitioners?

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Collaborators

Muhammad Imran – QCRI
Hemant Purohit – Wright State Univ.
Alexandra Olteanu – EPFL
Jakob Rogstadius – Univ. of Madeira
Ioanna Lykourentzou – INRIA
Shady Elbassuoni – American Univ. of Beirut
Lalana Kagal et al. – CSAIL MIT
Fernando Diaz – Microsoft

Outline

• Motivation
• Handling crisis tweets
• Crowdsourced verification
• Ongoing work
  – Automatic classification
  – Resource matchmaking

Crisis Mapping

Hemant Purohit, Carlos Castillo, Patrick Meier and Amit Sheth: Crisis Mapping, Citizen Sensing and Social Media Analytics. Tutorial at ICWSM, May 2013.

I don't have time for social networks!

• We all have spare capacity
  – Television, TV series, Internet sites
• We overestimate ourselves in general
  – Don't underestimate social media users, it is a bad starting point

An earthquake hits a Twitter user

• When an earthquake strikes, the first tweets are posted 20-30 seconds later

• Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to the speed of light over fiber/copper, plus latency

• After ~100km seismic waves may be overtaken by tweets about them

http://xkcd.com/723/
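The ~100 km figure on this slide can be checked with a quick calculation. The following is a minimal sketch, assuming that a tweet reaches any reader a fixed time after the quake (posting delay plus network latency) regardless of distance; the function name and the latency parameter are hypothetical, not from the slides.

```python
def overtaking_distance_km(wave_speed_km_s: float, tweet_delay_s: float,
                           network_latency_s: float = 0.0) -> float:
    """Distance from the epicenter at which a tweet and the seismic wave arrive together.

    Assumption: the tweet is posted `tweet_delay_s` seconds after the quake and
    needs `network_latency_s` seconds to reach a reader, independent of distance.
    Beyond the returned distance, the tweet arrives before the shaking does.
    """
    return wave_speed_km_s * (tweet_delay_s + network_latency_s)


# Ranges quoted on the slide: waves at 3-5 km/s, first tweets after 20-30 s.
for speed in (3.0, 5.0):
    for delay in (20.0, 30.0):
        distance = overtaking_distance_km(speed, delay)
        print(f"{speed} km/s, {delay:.0f} s delay -> {distance:.0f} km")
```

With the slide's ranges the break-even distance falls between 60 km and 150 km, which is consistent with the rough value of 100 km.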

Crisis Mappers Conference 2013: Next week!

Classifying and extracting information from tweets

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. In SWDM. Rio de Janeiro, Brazil, 2013.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Extracting Information Nuggets from Disaster-Related Messages in Social Media. In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Our approach

Filtering → Classification → Extraction

1. Filtering

Is disaster-related? → Yes → Contributes to situational awareness? → Yes

Labeling task

Classify the following tweet from Hurricane Sandy as:
● Personal: only of interest to author and immediate circle of friends
● Informative: interesting to other people
● Off-topic: not related to Hurricane Sandy
● Other/can't judge

Advice on labeling

• Your instructions will never be correct the first time you try
  – e.g. personal / eyewitness
  – Instructions must be re-written reactively
  – Perform small-scale labeling first
• Instructions must be concrete and brief
  – If you can't do it, the task has to be divided

2. Classification

Filtered tweets → classes:
• Caution & Advice
• Information Sources
• Damage & Casualties
• Donations
• Health
• Shelter
• Logistics

Distribution of tweet types: Joplin Tornado (2011)

[Chart legend: Caution/Advice, Info Source, Donations, Casualties/Damage, Unknown]

Classification results

Class AUC

Caution and advice 0.91

Information source 0.76

Donations 0.89

Casualties/damage 0.87
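For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen positive tweet is scored higher than a randomly chosen negative one. The following is a minimal sketch of that definition, assuming binary labels and real-valued classifier scores; it is not the evaluation code behind the numbers above, and the example data are made up.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth labels.
    scores: iterable of classifier scores (higher means "more likely positive").
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 positive and 2 negative tweets for one class.
example_labels = [1, 1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(round(auc(example_labels, example_scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of the two groups.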

3. Extraction

Classified tweets

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Extraction

• #hashtags, @user mentions, URLs, etc.
  – Regular expressions
  – Text library from Twitter
• Temporal expressions
  – Part-of-speech tagger + heuristics
  – Natty library
• Supervised learning
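The regular-expression route for hashtags, mentions and URLs can be sketched as follows. The patterns are deliberately simplified assumptions, not those of Twitter's text library, which handles many more edge cases (for instance, this sketch would treat the "@acm" in an e-mail address as a mention).

```python
import re

# Simplified, illustrative patterns (assumptions, not Twitter's official rules).
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")


def extract_entities(tweet: str) -> dict:
    """Return the hashtags, user mentions and URLs found in a tweet."""
    urls = URL.findall(tweet)
    # Blank out URLs first so that fragments such as "#section" inside a URL
    # are not reported as hashtags.
    text = URL.sub(" ", tweet)
    return {
        "hashtags": HASHTAG.findall(text),
        "mentions": MENTION.findall(text),
        "urls": urls,
    }


tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
print(extract_entities(tweet))
```

Removing URLs before matching hashtags is a design choice made here to avoid false positives; a production system would more likely rely on a maintained library.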

Labels for extraction

• Type-dependent instruction
• Ask evaluators to copy-paste a word/phrase from each tweet

Learning: Conditional Random Fields

• Used extensively in NLP for part-of-speech tagging and information extraction

• Representation of observations is important (capitalization, position, etc.)

[Figure: graphical models of an HMM and a linear-chain CRF, showing hidden and observed variables]

• CMU ARK Twitter NLP
  – Tokenization
  – Feature extraction
  – CRF learning

• Very easy to use: simply replace the labels in the training set (part-of-speech tags) with any other labels, and re-train
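To illustrate what "representation of observations" means for a sequence labeler, here is a minimal sketch of a per-token feature function of the kind typically fed to a linear-chain CRF. It is not the feature set of CMU ARK Twitter NLP; the feature names and the whitespace tokenization are assumptions made for illustration.

```python
def token_features(tokens, i):
    """Describe token i by its shape, its position and its neighbors."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_caps": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "position": i,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
        # Neighboring words give the model local context.
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<START>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>",
    }


def sentence_features(tokens):
    """One feature dictionary per token, in order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Naive whitespace tokenization, for illustration only.
tokens = "There is a tornado watch in effect tonight .".split()
for feats in sentence_features(tokens)[:4]:
    print(feats["lower"], feats["is_capitalized"], feats["prev_lower"])
```

Each training example then pairs such a feature sequence with a label sequence (part-of-speech tags, or the copied phrases marked by evaluators), which is what makes swapping the label set straightforward.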

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Extractor evaluation

Setting                                     Recall  Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%     90%
Train 2/3 Sandy, Test 1/3 Sandy             41%     79%
Train Joplin, Test Sandy                    11%     78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%     81%

• Precision here means: the extracted text has one or more words in common with what humans extracted
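The word-overlap criterion can be sketched as below. The slide only defines when an extraction counts as correct; how recall was computed is not stated, so the recall definition here (share of human extractions for which the system produced an overlapping one) is an assumption, as are the function names and the example pairs.

```python
def words(phrase):
    """Lower-cased set of whitespace-separated words in a phrase."""
    return set(phrase.lower().split())


def overlap_scores(pairs):
    """Compute (precision, recall) under the one-word-in-common criterion.

    pairs: list of (human_phrase, system_phrase); either may be None when
    nothing was extracted for that tweet.
    """
    hits = sum(1 for human, system in pairs
               if human and system and words(human) & words(system))
    n_system = sum(1 for _, system in pairs if system)
    n_human = sum(1 for human, _ in pairs if human)
    precision = hits / n_system if n_system else 0.0
    recall = hits / n_human if n_human else 0.0
    return precision, recall


# Hypothetical extractions, loosely based on the example tweets.
pairs = [
    ("tornado watch in effect tonight", "tornado watch"),  # overlap: hit
    ("closing of NYC bridges", None),                      # system found nothing
    ("Wind gusts over 60 mph", "Central Park"),            # no word in common
    ("donate blood", "send money or donate blood"),        # overlap: hit
]
precision, recall = overlap_scores(pairs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because a single shared word is enough, this criterion is lenient, which is worth keeping in mind when reading the precision column.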

Donations matching

• Identify and match requests/offers for donations
  – Money, clothing, food, shelter, volunteers, blood

Average precision = 0.21 (0.16 if only text similarity is used)
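Assuming the figure above refers to the standard ranking metric, average precision for one request can be computed from the ranked list of candidate offers as in this minimal sketch. The function name and the example ranking are hypothetical, and the sketch assumes every relevant offer appears somewhere in the ranking.

```python
def average_precision(ranked_relevance):
    """Average of precision@k taken at each rank k that holds a relevant item.

    ranked_relevance: list of booleans, best-ranked candidate first.
    Returns 0.0 when no relevant item is present.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical ranking of offers for one request: relevant at ranks 1 and 3.
print(round(average_precision([True, False, True]), 3))
```

Averaging this value over all requests gives a single score that rewards placing matching offers near the top of each list.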

Crowdsourced stream processing systems

Muhammad Imran, Ioanna Lykourentzou and Carlos Castillo: Engineering Crowdsourced Stream Processing Systems (Submitted for publication)

Design objectives and principles

• Low latency (example metric: end-to-end time)
  – Automatic components: keep items moving
  – Crowdsourced components: trivial tasks
• High throughput (example metric: output items per unit of time)
  – Automatic components: high-performance processing
  – Crowdsourced components: task automation
• Load adaptability (example metric: rate response function)
  – Automatic components: load shedding, load queueing
  – Crowdsourced components: task prioritization
• Cost effectiveness (example metric: cost vs. quality, throughput, etc.)
  – Automatic components: N/A
  – Crowdsourced components: task frugality
• High quality (example metric: application-dependent)
  – Redundancy, aggregation and quality control
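Load shedding and load queueing can be illustrated with a bounded buffer placed in front of a slower processing stage. This is a minimal sketch under stated assumptions, not the design from the paper: the class name is hypothetical, and dropping the oldest item when the buffer is full is one possible shedding policy among several.

```python
from collections import deque


class SheddingQueue:
    """Bounded FIFO buffer that sheds the oldest item when it is full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0  # number of items shed so far

    def push(self, item) -> None:
        if len(self.items) >= self.capacity:
            # Load shedding: discard the oldest queued item to make room.
            self.items.popleft()
            self.dropped += 1
        # Load queueing: buffer the item until the consumer is ready.
        self.items.append(item)

    def pop(self):
        """Return the oldest queued item, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


queue = SheddingQueue(capacity=2)
for tweet_id in (1, 2, 3):
    queue.push(tweet_id)
print(list(queue.items), queue.dropped)
```

Dropping the oldest items favors fresh content, which fits the low-latency objective; a system that must not lose data would instead queue to disk or slow down the producer.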

Design patterns

● QA loop

● Task assignment

● Process/verify

● Supervised learning

● Crowdwork sub-task chaining

● Humans are not a bottleneck

● Humans review every output element

http://aidr.qcri.org/

Self-service for crisis-related classification

Unstructured text reports

Structured information

ReportClassifier

ModelBuilder

Crowdsourced active learning

Library of training data

Preliminary results: efficiency

Maximum documented input load during a natural disaster = 270 tweets/sec.

Preliminary results: effectiveness

Task: Informative vs. {Personal, Other}

Free software

• AIDR is free software
• The official launch date is November 20th during the Crisis Mappers conference in Nairobi, Kenya

Mobile applications

Fuming Shih, Oshani Seneviratne, Daniela Miao, Ilaria Liccardi, Lalana Kagal, Evan Patton, Patrick Meier, Carlos Castillo: Democratizing Mobile App Development for Disaster Management. To be presented at the IJCAI Workshop on Semantic Cities. Beijing, China, 2013.

Mobile components (AppInventor)

• Components useful for DIY emergency response apps
  – e.g. off-line tolerant photo uploads
• Aggregating/federating linked open data

Helping developers query linked data

Resource matching

Crowdsourced verification

Crowdsourced verification for crisis information

• Veri.ly
• Joint project between MASDAR and QCRI
• Iyad Rahwan, Abdulfatai Popoola, Dmytro Krasnoshtan, Attila Toth (MASDAR), Victor Naroditskiy (Univ. Southampton) + QCRI

Closing remarks

Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems

Some venues

• ISCRAM – International Conference on Information Systems for Crisis Response and Management

• SWDM – Workshop on Social Web for Disaster Management

• SMERST – Social Media and Semantic Technologies in Emergency Response

+ the usual suspects, depending on your area ;-)

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Thank you!

Carlos Castillo · chato@acm.org
http://www.chato.cl/research/

With thanks to Patrick Meier for several slides
