Validating search protocols for mining of health and disease events on Twitter
-
Upload
aditya-l-ramadona -
Category
Data & Analytics
-
view
77 -
download
0
Transcript of Validating search protocols for mining of health and disease events on Twitter
Validating search protocols for mining of health and disease events
on TwitterAditya Lia Ramadona1,2*, Lutfan Lazuardi3, Sulistyawati1,4,
Anwar Dwi Cahyono5, Åsa Holmner6, Hari Kusnanto3, Joacim Rocklöv1
The International Conference on Public Health (ICPH)Solo, Indonesia; September 14-15, 2016
https://arxiv.org/abs/1608.05910
Introduction Twitter• free social networking and micro-
blogging service • 140-character: news, events,
personal feeling and experiences, …• May 2016: 24.34 million Indonesian
active users ~ 10% (Statista, 2016)
Twitter offers streams of the public data flowing
• might contain health-related information
• can be explored for public health monitoring and surveillance purposes (Paul et al. 2016)
Indonesia Social Media Trend (Jakpat, 2016)
Introduction
Previous studies • Signorini et al. 2011: track levels of disease activity
• Eichstaedt et al. 2015: predicts heart disease mortality
• Strom et al. 2013: measuring health-related quality of life
• many more…
Methodological challenges • data and language processing
• model development
www.bahasakita.com
Subjects and Methods
Develop groups of words and phrases relevant to disease symptoms and health outcomes in the Bahasa Indonesia
historical Twitter
Twitter stream14dreal-time
Subjects and Methods
Sentiment analysis• examining a tweet from Twitter feeds
• the decisions were made by people with expert knowledge
millions of tweets: time-consuming and inefficient
Replicating expert assessment• develop a model, interpret results and adjust the model
• make predictions
Results: text analysis
Historical Twitter feeds: 390 tweets• "rumah OR sakit OR rawat OR inap OR demam OR panas -cuaca OR berdarah
OR pendarahan OR tombosit OR badan OR muntah OR badan OR tua OR ':('"
Preprocessing • removing retweets and eliminate some noise
• removing punctuation, numbers, capitalization, and the Bahasa stop-words (e.g. kamu and aja)
[107] "@XYZ kamu izin aja, bilang kamu sakit :(("[107] "xyz izin bilang sakit"
Results: text analysis
1,632 words
• the most highly correlate words: sakit (sick, ill, pain)
hati (0.23) ~ shame, broken heart, …rasa (0.13) ~ painperut (0.12) ~ stomach ache
Figure 1. Words that appear more than 10 times
Results: model development
Predictors
• highest words frequencies (22)
• counting the number of the predictor words in a tweet
Classification and Regression Trees model (Breiman et al. 1983)
• rpart package (Therneau et al. 2015)
Results: model development
390 tweets
historical Twitter feeds
• 273 tweets (70%): training
• 117 tweets (30%): validating
1,145,649 tweets
Twitter stream feeds: testingIndonesia: between 11°S and 6°N and 95°E and 141°E, 7 days: 26th July – 1st August 2016
• 100 from 6,109 TRUE results
• 100 from 1,139,540 FALSE result
Results
Model Performance Validation Testing
AUC 0.82 0.70
Sensitivity 80.0 42.0
Specifity 84.6 98.0
Positive Predictive Value 86.7 95.5
Negative Predictive Value 77.2 62.8
Limitations + Challenges = Future Work
team member involved• academics, health workers
Twitter users• telecommunications infrastructure
• characteristics of people
methods• data: streaming (Indonesia, 7d/24h ~ 1.5GB in csv format)
• model: CART, RandomForest, GBM, …
Summary
Monitoring of public sentiment on Twitter + contextual knowledge • a nearly real-time proxy for health-related indicators
Models do not replace expert judgment• accurately analyze small amounts of information (tweets)
• improve and refine the model
• bias and emotion: integrate assessments of many experts
1 Department of Public Health and Clinical Medicine, Epidemiology and Global Health, Umeå University2 Center for Environmental Studies, Universitas Gadjah Mada
3 Department of Public Health, Faculty of Medicine, Universitas Gadjah Mada4 Department of Public Health, Universitas Ahmad Dahlan
5 District Health Office, Yogyakarta6 Department of Radiation Sciences, Umeå University
www.themexpert.com/images/easyblog_articles/270/twitter_cover.jpg