Disaster data informatics for situation awareness

Ashutosh Jadhav
Ohio Center of Excellence in Knowledge Enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH

Research Problem

Disaster Data Informatics for Situation Awareness

Expedite the decision-making process in a disaster situation by identifying useful, actionable information from social media.

1. Informativeness analysis
a. Identify information-rich tweet messages (filtering out noisy tweets) based on a variety of analyses

2. Classifying information-rich messages
a. People at the disaster site, including suffering people asking for help
b. Global response to the disaster (opinions, comments, news, etc.)

3. Expediting the decision-making process and situational awareness
a. Using (2.a), understand the needs at the disaster site
b. Connect resources to needs

Informativeness Analysis

Motivation: Information Overload

● 5,500 tweets per second during the Japanese earthquake and tsunami

"Within a minute of the quake, there were more than 40,000 earthquake-related Tweets. The micro-blogging site said it hit about 5,500 Tweets per second on the quake......"
- The New York Times

How can useful and actionable information be found quickly in such a huge stream of incoming event data?

Multidimensional Event Analysis

Multidimensional data: each dimension is compared for data generated at the disaster location and data generated around the world.

● Who generates the data? (People)
○ At the disaster location: affected people, NGO volunteers
○ Around the world: people not directly involved in the disaster

● What data is generated? (Content)
○ At the disaster location: reports about the current situation, needs for resources, medical and other emergencies, complaints, etc.
○ Around the world: opinions, concerns, sympathy, desire to help; sharing of related news, blogs and other multimedia

● How is the data generated? (Network)
○ At the disaster location: social media (Twitter, FB); SMS and Web reports to involved NGOs and government organizations
○ Around the world: mainly through social media (Twitter, Facebook, blogs, etc.)

● Why is the data generated? (Intention)
○ At the disaster location: seeking help; informing about the current situation, needs, etc.
○ Around the world: sharing personal viewpoints on disaster-related incidents

● When is the data generated? (Time)
○ At the disaster location: after the disaster, in the recovery and rebuild phase
○ Around the world: mostly after the disaster

Research problem

How can we identify useful, informative (actionable) information that can be used to expedite decision making and situational awareness in a disaster situation?

Approach

Informativeness Analysis - Definition

● Useful/actionable information in a disaster situation that can help achieve better and faster situation awareness

Example messages

● We need tent, cover, rice. Uneted Nation never Help us since the earthquake, we live in Carre-four, Lapot street,
● if women and children are victim of rape or other agressions in provisionnal shelter, what number can we call to have fast assistance.
● We are still under the sheets. We do not have: Tents, prelates, sanitary articles and household etc. Bastien the city Alix fontamara 27
● we don't have some water in the delmas camp 40b
● We need tent indelmas 18 because we don't find nothing in the area.
● How can we find help and food in fontamara 43 rue menos
● A father, whose wife passed away, and has two children who need medical attention. One child has a broken arm, and he is afraid of infection

Data set

● Social networking messages
○ Twitter, Facebook
● News articles
○ News websites, external links from tweets and Facebook statuses
● NGO messages
○ Ushahidi messages/reports
● Mobile messages
○ SMS

Informativeness Analysis

Content Analysis
● Structure and syntactic analysis
● Linguistic analysis
● Text analysis
● Metadata analysis

People Analysis
● Author profile description
● Social connectivity
● Activity level
● Author credibility/influence

News Analysis
● Content analysis
● Social share analysis
● URL credibility
● Alexa analysis

Semantic Analysis
● Content annotation using a disaster domain model, considering entities mentioned, needs, resources, location, organizations, people, disaster type, etc.

Content Analysis

● Structure and syntactic analysis
○ Message length
○ Number of words, special characters, slang words, dictionary words
● Linguistic analysis
○ Number of nouns, verbs, adverbs, adjectives
○ POS patterns
● Text analysis
○ N-gram analysis
○ TF-IDF statistics
○ Entities (DBpedia/ontology)
● Metadata analysis
○ Publish time
○ Location (explicit and implicit)
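A few of the content features listed above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `structural_features` and `tf_idf` are hypothetical, tokenization is a simple regular expression, and the TF-IDF variant used here (term frequency times log(N/df)) is one common choice, not necessarily the one intended in the proposal.

```python
import math
import re
from collections import Counter


def structural_features(message: str) -> dict:
    """Count simple surface features of one message (hypothetical helper)."""
    words = re.findall(r"[A-Za-z0-9']+", message)
    # Special characters: anything that is neither alphanumeric nor whitespace.
    special = [ch for ch in message if not ch.isalnum() and not ch.isspace()]
    return {
        "length": len(message),
        "num_words": len(words),
        "num_special_chars": len(special),
    }


def tf_idf(messages: list[str]) -> list[dict]:
    """Return one {term: weight} dict per message, using tf * log(N / df)."""
    tokenized = [re.findall(r"[a-z0-9']+", m.lower()) for m in messages]
    n_docs = len(tokenized)
    # Document frequency: in how many messages each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty messages
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Terms that occur in every message get a weight of zero under this variant, so words shared by all tweets about an event contribute nothing, while terms specific to one message stand out.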

People Analysis

● Author profile description
○ Profession
○ Demographic information (age, gender, location)
● Social connectivity
○ Number of followers and accounts followed
● Activity level
○ Number of tweets
○ Number of tweets "on topic"
● Author credibility/influence
○ Klout
○ SocialMatica
○ PeerIndex
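The connectivity and activity features could be derived from basic account counts, as in the minimal sketch below. The `Author` record, its field names, and the two ratios are assumptions made for illustration; external influence scores such as Klout are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Author:
    """Hypothetical record of the raw counts available for one author."""
    followers: int
    following: int
    total_tweets: int
    on_topic_tweets: int


def author_features(author: Author) -> dict:
    """Derive simple connectivity and activity features from raw counts."""
    return {
        # Followers per followed account; max(..., 1) avoids division by zero.
        "follower_ratio": author.followers / max(author.following, 1),
        # Fraction of the author's tweets that are about the event.
        "on_topic_share": author.on_topic_tweets / max(author.total_tweets, 1),
    }
```

For example, `author_features(Author(followers=200, following=100, total_tweets=50, on_topic_tweets=10))` gives a follower ratio of 2.0 and an on-topic share of 0.2.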

News Analysis

● News and other event-related stories are linked in many of the event-related messages (tweets, etc.), primarily because of:
○ Message size limitation (140 characters for Twitter)
○ Bringing in external authoritative context
● Analyzing news and other event-related stories plays a crucial role in event analysis
● There are many news stories about the event:
■ Which news stories should we focus on?
■ How can useful and actionable information nuggets be extracted from these news stories?

News Analysis

Content analysis
- Structure and syntactic analysis
- Linguistic analysis
- Text analysis
- Metadata analysis

Social share analysis
- Number of tweets, retweets
- Facebook shares, likes, comments, recommendations
- Google Plus, LinkedIn shares

URL credibility
- Google PageRank
- Local credibility (?)

Alexa analysis (Alexa is a web information company)
- Alexa global and country rank
- Alexa URL authority
- Alexa URL & subdomain mozRank
- Alexa page & domain authority
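One simple way to decide which news stories to focus on is to count how often each linked URL is shared in the incoming messages. The sketch below assumes messages are plain strings with `http://` or `https://` links; `count_shared_urls` and `count_shared_domains` are hypothetical helper names, and credibility scores from external services are out of scope.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Simple pattern: a URL runs until the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def count_shared_urls(messages: list[str]) -> Counter:
    """Count in how many messages each URL appears."""
    counts = Counter()
    for message in messages:
        # Strip trailing punctuation, then de-duplicate within one message.
        urls = {u.rstrip(".,)") for u in URL_PATTERN.findall(message)}
        counts.update(urls)
    return counts


def count_shared_domains(url_counts: Counter) -> Counter:
    """Aggregate URL share counts by host name."""
    domains = Counter()
    for url, n in url_counts.items():
        domains[urlsplit(url).netloc.lower()] += n
    return domains
```

Calling `count_shared_urls(messages).most_common(10)` then yields the ten most frequently linked stories, which can be passed on to the content and credibility analyses.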

Semantic Analysis

● Content annotation using a disaster domain model, considering a variety of entities mentioned (DBpedia)
○ Needs, resources, location, organizations, people, disaster type, etc.

Semantic Disaster Model

Reuse (or formalise and build) a disaster domain model considering:

● Disaster type: earthquake, floods, terror attack (knowing the disaster type helps in understanding the needs)
● Needs: model of basic human needs in disasters, such as food, water, medicines, shelter, etc.
● Resources: model of resources that can satisfy a need, e.g. need: thirsty -> resource: water, fruit juice; need: hungry -> resource: food
● Location: location of incidents, geo-location data
● Organization: involved government and non-government organizations
● People & social role: model of people based on gender, age group, and role (mother, father, son, etc.). This can help in reasoning about needs; for example, if a mother and a baby are mentioned, the need may be milk.
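The need -> resource part of such a model can be illustrated with a small lookup table. This is a sketch under assumptions, not the proposed domain model: the keyword sets and the shelter and medical entries are hypothetical, and only the water and food mappings come from the examples above. A real model would more likely be an ontology than a Python dictionary.

```python
import re

# Hypothetical keyword sets used to spot a need in a message.
NEED_KEYWORDS = {
    "water": {"water", "thirsty"},
    "food": {"food", "rice", "hungry"},
    "shelter": {"tent", "tents", "shelter"},
    "medical": {"medical", "medicine", "medicines", "infection"},
}

# Resources that can satisfy each need (water and food follow the slide;
# the shelter and medical entries are illustrative assumptions).
RESOURCES_FOR_NEED = {
    "water": ["water", "fruit juice"],
    "food": ["food"],
    "shelter": ["tent"],
    "medical": ["medicines"],
}


def detect_needs(message: str) -> list[str]:
    """Return the sorted list of needs whose keywords occur in the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return sorted(need for need, kws in NEED_KEYWORDS.items() if tokens & kws)


def suggest_resources(message: str) -> dict:
    """Map each detected need to the resources that could satisfy it."""
    return {need: RESOURCES_FOR_NEED[need] for need in detect_needs(message)}
```

Applied to one of the example messages, `suggest_resources("we don't have some water in the delmas camp 40b")` returns `{"water": ["water", "fruit juice"]}`.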

QCRI Internship Proposal