Validating search protocols for mining of health and disease events on Twitter

Aditya Lia Ramadona 1,2*, Lutfan Lazuardi 3, Sulistyawati 1,4, Anwar Dwi Cahyono 5, Åsa Holmner 6, Hari Kusnanto 3, Joacim Rocklöv 1

The International Conference on Public Health (ICPH), Solo, Indonesia; September 14-15, 2016

https://arxiv.org/abs/1608.05910

Introduction

Twitter
• free social networking and micro-blogging service
• 140-character messages: news, events, personal feelings and experiences, …
• May 2016: 24.34 million active Indonesian users, ~10% (Statista, 2016)

Twitter offers a continuous stream of public data that
• might contain health-related information
• can be explored for public health monitoring and surveillance purposes (Paul et al. 2016)

Indonesia Social Media Trend (Jakpat, 2016)

Introduction

Previous studies
• Signorini et al. 2011: tracking levels of disease activity
• Eichstaedt et al. 2015: predicting heart disease mortality
• Strom et al. 2013: measuring health-related quality of life
• many more…

Methodological challenges
• data and language processing
• model development

www.bahasakita.com

Subjects and Methods

Develop groups of words and phrases relevant to disease symptoms and health outcomes in Bahasa Indonesia

[Diagram: historical Twitter; Twitter stream; 14 d; real-time]

Subjects and Methods

Sentiment analysis
• examining a tweet from Twitter feeds
• the decisions were made by people with expert knowledge

Millions of tweets: manual assessment is time-consuming and inefficient

Replicating expert assessment
• develop a model, interpret results and adjust the model
• make predictions

Results: text analysis

Historical Twitter feeds: 390 tweets
• search query: "rumah OR sakit OR rawat OR inap OR demam OR panas -cuaca OR berdarah OR pendarahan OR tombosit OR badan OR muntah OR badan OR tua OR ':('"

Preprocessing
• removing retweets and eliminating some noise
• removing punctuation, numbers, capitalization, and Bahasa Indonesia stop-words (e.g. kamu and aja)

Example, before and after preprocessing:
[107] "@XYZ kamu izin aja, bilang kamu sakit :(("
[107] "xyz izin bilang sakit"
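
The preprocessing steps above can be sketched in a few lines. This is a minimal Python illustration, not the authors' code (the slides point to R tooling). The stop-word set contains only the two examples named on the slide, and the `RT @` retweet test is an assumed heuristic.

```python
import re

# Only "kamu" and "aja" come from the slides; a real run would need a
# full Bahasa Indonesia stop-word list (assumption: not shown here).
STOP_WORDS = {"kamu", "aja"}


def is_retweet(tweet: str) -> bool:
    """Assumed heuristic: classic retweets start with 'RT @'."""
    return tweet.startswith("RT @")


def preprocess(tweet: str) -> str:
    """Lower-case, strip punctuation and numbers, drop stop-words."""
    text = tweet.lower()                    # remove capitalization
    text = re.sub(r"[^a-z\s]", " ", text)   # punctuation, digits, emoticons
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


if __name__ == "__main__":
    raw = '@XYZ kamu izin aja, bilang kamu sakit :(('
    if not is_retweet(raw):
        print(preprocess(raw))
```

Note that the `@` sign is removed as punctuation while the handle itself is kept, which matches the slide's example output.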

Results: text analysis

1,632 words

• words most highly correlated with sakit (sick, ill, pain):
  hati (0.23) ~ shame, broken heart, …
  rasa (0.13) ~ pain
  perut (0.12) ~ stomach ache

Figure 1. Words that appear more than 10 times
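
The slides do not say how the word correlations were computed. One common approach, assumed here, is the Pearson correlation between per-tweet term counts. The sketch below uses a tiny invented corpus, so its numbers are unrelated to the 0.23, 0.13 and 0.12 reported above; all function names are hypothetical.

```python
from math import sqrt


def term_vector(docs, term):
    """Number of occurrences of `term` in each preprocessed tweet."""
    return [doc.split().count(term) for doc in docs]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / sqrt(vx * vy)


def associations(docs, target, min_corr=0.1):
    """Words whose counts correlate with `target` at or above `min_corr`."""
    vocab = {w for d in docs for w in d.split()} - {target}
    t = term_vector(docs, target)
    scored = [(w, pearson(t, term_vector(docs, w))) for w in vocab]
    return sorted((p for p in scored if p[1] >= min_corr),
                  key=lambda p: -p[1])


if __name__ == "__main__":
    # Invented toy corpus, for illustration only.
    toy = ["sakit hati", "sakit perut", "rumah sakit",
           "izin bilang", "badan panas"]
    for word, r in associations(toy, "sakit"):
        print(word, round(r, 2))
```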

Results: model development

Predictors
• the 22 words with the highest frequencies
• count of the predictor words in a tweet
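
As a rough illustration of the predictor construction, the sketch below picks the most frequent words in a corpus and counts how often each occurs in a tweet. The function names and the toy corpus are hypothetical; the slides do not list the 22 predictor words or state whether counts were kept per word or summed.

```python
from collections import Counter


def top_words(docs, n):
    """The n most frequent words across all preprocessed tweets."""
    counts = Counter(w for d in docs for w in d.split())
    return [w for w, _ in counts.most_common(n)]


def predictor_counts(doc, predictors):
    """One count per predictor word for a single tweet."""
    tokens = doc.split()
    return [tokens.count(p) for p in predictors]


if __name__ == "__main__":
    # Invented toy corpus; the study used n = 22 on 390 tweets.
    toy = ["sakit perut", "sakit hati sakit", "rumah sakit"]
    predictors = top_words(toy, 3)
    print(predictors)
    print([predictor_counts(d, predictors) for d in toy])
```

Summing each row would give a single "number of predictor words" feature instead of one feature per word.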

Classification and Regression Trees (CART) model (Breiman et al. 1984)
• rpart package (Therneau et al. 2015)
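
A CART classifier grows a tree by repeatedly choosing the feature and threshold that give the purest child nodes. The sketch below shows that single step using the Gini impurity for binary labels. It is a simplified stand-in written in plain Python, not rpart itself, and it omits pruning, surrogate splits and the other features a full implementation provides.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Search every feature and threshold for the lowest weighted impurity.

    rows: list of feature lists (e.g. predictor word counts per tweet)
    labels: list of 0/1 expert labels
    Returns (feature_index, threshold, weighted_gini) or None.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for thr in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= thr]
            right = [y for r, y in zip(rows, labels) if r[j] > thr]
            if not left or not right:
                continue  # a split must send tweets to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, thr, score)
    return best


if __name__ == "__main__":
    # Invented toy data: two word-count features, four tweets.
    rows = [[1, 0], [2, 0], [0, 1], [0, 0]]
    labels = [1, 1, 0, 0]
    print(best_split(rows, labels))
```

Applying `best_split` recursively to each child node until a stopping rule is met yields the full tree.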

Results: model development

390 tweets from historical Twitter feeds
• 273 tweets (70%): training
• 117 tweets (30%): validation
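
The 70/30 partition can be reproduced with a seeded shuffle. The slides do not say how tweets were assigned to each set, so random assignment, the seed value and the function name below are assumptions.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and cut it into two parts."""
    rng = random.Random(seed)      # fixed seed keeps the split repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    tweet_ids = range(390)         # placeholder identifiers
    train, validation = train_validation_split(tweet_ids)
    print(len(train), len(validation))
```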

1,145,649 tweets from Twitter stream feeds: testing
• Indonesia: between 11°S and 6°N and between 95°E and 141°E
• 7 days: 26 July – 1 August 2016
• 100 sampled from 6,109 TRUE results
• 100 sampled from 1,139,540 FALSE results
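
The geographic filter and the review sample can be sketched as follows. The bounding box comes from the slide; everything else is assumed: that TRUE and FALSE are the model's classifications of the streamed tweets, that the 100 tweets per class were drawn at random, and the function names and seed.

```python
import random

# Bounding box from the slide: 11°S to 6°N, 95°E to 141°E.
LAT_MIN, LAT_MAX = -11.0, 6.0
LON_MIN, LON_MAX = 95.0, 141.0


def in_indonesia_box(lat, lon):
    """True if a coordinate falls inside the streaming bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def sample_for_review(predictions, per_class=100, seed=0):
    """Draw up to `per_class` tweets from each predicted class.

    predictions: list of (tweet_id, predicted_bool) pairs
    """
    rng = random.Random(seed)
    positives = [t for t, p in predictions if p]
    negatives = [t for t, p in predictions if not p]
    return (rng.sample(positives, min(per_class, len(positives))),
            rng.sample(negatives, min(per_class, len(negatives))))


if __name__ == "__main__":
    print(in_indonesia_box(-7.8, 110.4))   # approx. Yogyakarta
    print(in_indonesia_box(63.8, 20.3))    # approx. Umeå
```

The sampled tweets would then be labelled by experts and compared with the model's output.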

Results

Model performance              Validation   Testing
AUC                            0.82         0.70
Sensitivity (%)                80.0         42.0
Specificity (%)                84.6         98.0
Positive predictive value (%)  86.7         95.5
Negative predictive value (%)  77.2         62.8
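
The four percentage metrics follow from a 2×2 confusion matrix. The slides do not report the underlying counts. The counts in the sketch below are back-calculated assumptions that reproduce the reported percentages under the standard definitions, with the expert label taken as the reference. They should not be read as the authors' actual tables.

```python
def metrics(tp, fn, fp, tn):
    """Standard diagnostic metrics, in percent, from confusion counts."""
    return {
        "sensitivity": 100 * tp / (tp + fn),   # true positives found
        "specificity": 100 * tn / (tn + fp),   # true negatives found
        "ppv": 100 * tp / (tp + fp),           # predicted positives correct
        "npv": 100 * tn / (tn + fn),           # predicted negatives correct
    }


# Assumed counts (not given in the slides): 117 validation tweets,
# 200 testing tweets.
validation = metrics(tp=52, fn=13, fp=8, tn=44)
testing = metrics(tp=42, fn=58, fp=2, tn=98)

if __name__ == "__main__":
    for name, result in (("validation", validation), ("testing", testing)):
        print(name, {k: round(v, 1) for k, v in result.items()})
```

Under these assumed counts, the drop in sensitivity from validation to testing means the model missed most expert-positive tweets in the streamed sample, while almost everything it flagged was confirmed.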

Limitations + Challenges = Future Work

Team members involved
• academics, health workers

Twitter users
• telecommunications infrastructure
• characteristics of people

Methods
• data: streaming (Indonesia, 7 days, 24 hours a day: ~1.5 GB in CSV format)
• model: CART, RandomForest, GBM, …

Summary

Monitoring of public sentiment on Twitter + contextual knowledge
• a nearly real-time proxy for health-related indicators

Models do not replace expert judgment
• accurately analyze small amounts of information (tweets)
• improve and refine the model
• bias and emotion: integrate assessments of many experts

1 Department of Public Health and Clinical Medicine, Epidemiology and Global Health, Umeå University
2 Center for Environmental Studies, Universitas Gadjah Mada
3 Department of Public Health, Faculty of Medicine, Universitas Gadjah Mada
4 Department of Public Health, Universitas Ahmad Dahlan
5 District Health Office, Yogyakarta
6 Department of Radiation Sciences, Umeå University

*[email protected]

www.themexpert.com/images/easyblog_articles/270/twitter_cover.jpg