Twitcident: Fighting Fire with Information from Social Web Streamsfabianabel.de/papers/2012-wis- ·...

download Twitcident: Fighting Fire with Information from Social Web Streamsfabianabel.de/papers/2012-wis- · 2012-04-05 · Twitcident: Fighting Fire with Information from Social Web Streams

If you can't read please download the document

Transcript of Twitcident: Fighting Fire with Information from Social Web Streamsfabianabel.de/papers/2012-wis- ·...

  • Twitcident: Fighting Fire with Information fromSocial Web Streams

    Fabian Abel, Claudia Hauff,Geert-Jan Houben, Ke Tao

    TU Delft, Web Information SystemsPO Box 5031, 2600 GA Delft, the Netherlands

    {f.abel,c.hauff,g.j.p.m.houben,k.tao}@tudelft.nl

    Richard StronkmanTwitcident.com

    Koning Wilheminaplein 400, Amsterdam, theNetherlands

    [email protected]

    ABSTRACTIn this paper, we present Twitcident, a framework and Web-based system for filtering, searching and analyzing informa-tion about real-world incidents or crises. Twitcident con-nects to emergency broadcasting services and automaticallystarts tracking and filtering information from Social Webstreams (Twitter) when a new incident occurs. It enrichesthe semantics of streamed Twitter messages to profile in-cidents and to continuously improve and adapt the infor-mation filtering to the current temporal context. Facetedsearch and analytical tools allow users to retrieve particularinformation fragments and overview and analyze the currentsituation as reported on the Social Web.

    Demo: http://wis.ewi.tudelft.nl/twitcident/

    Categories and Subject DescriptorsH.3.3 [Information Systems]: Information Search and Re-trieval—Information filtering

    General TermsAlgorithms, Design, Experimentation

    KeywordsSocial Web Streams, Filtering, Faceted Search, Semantics

    1. INTRODUCTIONDuring crisis situations such as large fires, storms or other

    types of incidents, people nowadays report and discuss abouttheir observations, experiences, and opinions in various So-cial Web streams. Recent studies show that data from theSocial Web and particularly Twitter helps to detect inci-dents [4] or to analyze the information that people reportabout an incident [2]. However, (i) automatically filteringrelevant information from Social Web streams and (ii) mak-ing the information accessible and findable in a given inci-

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.Copyright 2012 ACM X-XXXXX-XX-X/XX/XX ...$5.00.

    incident description

    Incident Profiling

    Semantic Enrichment

    Incident Detection

    Faceted Search &

    Realtime Analytics

    enriched filtered media

    enriched media

    Filtering

    media

    incident-relevant media

    incident profile

    Social Web

    NER

    Classification

    Linkage

    Metadata Extraction

    Social Media Aggregation

    Emergency Services & General Public

    information need

    relevant media & analytics

    Emergency Broadcasters

    Figure 1: Twitcident architecture: (i) incident pro-filing and filtering of social media that is relevantto an incident (green boxes) and (ii) faceted searchand realtime analytics functionality to explore andoverview the media (blue box). Both types of com-ponents benefit from the semantic enrichment of theaggregated media.

    dent context are two fundamental challenges that have notbeen researched sufficiently yet. In this paper, we demon-strate solutions to these challenges.

    Filtering and search in Social Web streams differs fromtraditional Web search. For example, Teevan et al. [6] re-veal significant differences in the search behavior on Twittercompared to Web search: Twitter users are specifically in-terested in information related to events and often use therudimentary search functionality of Twitter to monitor in-formation streams. With Twitcident, we introduce a frame-work that automates the process of monitoring relevant in-formation that is published in Social Web streams. Facetedsearch capabilities [1] allow end-users to further filter andexplore the information.

    2. TWITCIDENTIn this section, we first overview the architecture of the

    Twitcident framework and then detail its key components(see Figure 1). The Web-based front-end of the Twitcidentsystem is depicted in Figure 3.

  • P2000 Broadcast

    Twitcident Framework

    Initial query: (Moerdijk OR Chemie-Pack) AND (fire OR smoke OR flame…) SINCE:2011-01-05 Refined query based on incident

    profiling: (Moerdijk OR Dordrecht…) AND (#moerdijkFire OR toxic…) …

    2.

    3. 4.

    Incident in Twitcident:

    Twitter

    Twitcident system (a) (b)

    Broadcasted incident description: Prio 1 fire : : Vlasweg : 4 4782PW Moerdijk :: Chemie Pack

    1.

    Figure 2: Incident detection: (1) as soon as an in-cident is broadcasted via the P2000 network, theTwitcident framework (2) transforms the encodedP2000 message into an initial incident query to (3)collect Twitter messages that are possibly relevantfor the incident so that (4) information about theincident can be accessed via the Twitcident system.Over time, the incident profiling effects refinementsof the queries that are used to collect tweets. Thescreenshot shows the dashboard of popular incidentsthat are (and have been) monitored by Twitcident.

    2.1 Architecture and OverviewThe core framework functionality is triggered by an inci-

    dent detection module that senses for incidents being broad-casted by emergency services. Whenever an incident is de-tected, Twitcident starts a new thread for profiling the in-cident and aggregating social media and Twitter messagesfrom the Web. The collected messages are further processedby the semantic enrichment module which features namedentity recognition (NER), classification of messages, linkageof messages to external Web resources and further metadataextraction. The semantic enrichment is one of the key en-abling components of the Twitcident framework as it (i) sup-ports semantic filtering of Twitter messages to identify thosetweets that are relevant for a given incident, (ii) allows forfaceted search on the filtered media and (iii) provides meansfor summarizing information about incidents and perform-ing realtime analytics.

    In the Twitcident system, both faceted search and re-altime analytics are made available to client users via agraphical user interface that is displayed in Figure 3. Thesearch functionality allows end-users to further filter mes-sages about an incident while analytics deliver diagrams andgadgets that enable users to analyze and overview how peo-ple report about the incident on the Social Web. We nowdiscuss each of the components of our architecture in detail.

    2.2 Incident DetectionFor detecting incidents, the Twitcident system relies on

    emergency broadcasting services. In the Netherlands, in-cidents which require the police, fire department or otherpublic emergency services to take an action and which aremoreover of interest to the general public, are immediatelypublished via the P2000 communication network and de-scribe what type of incident has happened, where and whenit happened and also what scale the incident is classifiedas. Figure 2(a) shows an example P2000 message inform-

    ing about a large fire incident that happened in the city ofMoerdijk, the Netherlands, on January 5th 2011. The figurevisualizes the automatic workflow that is triggered whenevera new incident is reported. If a new incident is detected thenthe Twitcident framework translates the broadcasted mes-sage into an initial incident profile that is applied as query tocollect relevant messages from Twitter. All incidents thatare monitored by the Twitcident system are listed on thedashboard that is depicted in Figure 2(b).

    2.3 Incident Profiling and FilteringWhile monitoring an incident, Twitcident continuously

    adapts the incident profiling to improve the filtering of mes-sages. This process is realized via the following components(see Figure 1): (i) incident profiling, (ii) social media aggre-gation, (iii) semantic enrichment and (iv) filtering.

    2.3.1 Incident ProfilingBased on the initial incident description and the collected,

    enriched Social Web messages, the incident profiling mod-ule generates a profile that is used to refine the media ag-gregation and the filtering. An incident profile is a set ofweighted attribute-value pairs that describe the character-istics of the incident. Attributes may refer to concepts thatare related to the incident (e.g. locations, persons) and theweights specify the importance of an attribute-value pair forthe incident. Therefore, the aforementioned fire that hap-pened in Moerdijk may have the following incident profile:P(imoerdijk) = {((location, Moerdijk), 1.0), ((location, Dor-drecht), 0.73), ((type, Fire), 1.0), . . . }. Incident profiles arecontinuously updated to adapt to topic changes that arisewithin an incident. To prevent topic drift, we combine thecurrent profile with the initial incident profile following aclassical mixture approach.

    2.3.2 Social Media AggregationBased on the incident profiling, the Twitcident system

    exploits the social media aggregation component to collectTwitter messages as well as related pictures and videos thatare posted on platforms such as Twitpic or Twitvid1 re-spectively. Twitcident utilizes both the REST API and theStreaming API of Twitter to collect messages. The RESTAPI enables Twitcident to collect also those incident-relatedtweets that have been posted before the emergency servicesreported about a given incident.

    2.3.3 Semantic EnrichmentThe aggregated Social Web content (Twitter messages) is

    processed by the semantic enrichment component of Twit-cident which features the following functionality.

    NER. The named entity recognition (NER) module as-sembles four different services2 for detecting entities suchas persons, locations or organizations that are mentioned intweets. As these services only function for English texts,Twitcident translates non-English tweets to English3. The

    1http://twitpic.com and http://twitvid.com2http://dbpedia.org/spotlight, http://alchemyapi.com, http://opencalais.com, http://zemanta.com3Language detection:http://code.google.com/p/language-detection/Translation: http://code.google.com/apis/language/translate/overview.html

  • !"#$%#&'()$&%#*(%+(,-.,(/%.0(*%-+(1%

    !"#$%/.%'(.'2(%*('.*$%#3.4$%.0(*%-+(1%

    !"#$%$5'(%.6%$"7,8&%#*(%+(,-.,(/%7,%$"(%$9(($&1%

    :#;%

  • !"#$%!"&&%

    !"$'%

    !"&(%!"$)%

    !"%

    !"'&%

    !"*$%

    !%!"&%!"#%!"$%!"*%!"(%!"'%!"+%

    ,-./012%3456-7408%%

    9-:;?/@-=%3456-7408%

    !"#$%#&'$%()*+,,%

    ABC%

    CD&!%

    CD$!%

    E-2/55%

    (a) Filtering

    RAW Datasetting_description as settingnameOfSearchEngine as strategy avg(IF(avgRankOfSelectedFVP