Ambient Geographic
Information and
Biosurveillance
Capstone Presentation
Todd Barr
March 20, 2013
“Classic” Biosurveillance
• Reports Only the Cases that are handled by Medical
Professionals
• Data is sent to the Centers for Disease Control and
Prevention
• Data is Aggregated to the State level
• Standard Turnaround time is 7 to 10 days, depending on
the data and the level of the crisis
Ambient Geographic
Information
• Ambient Geographic Information (AGI) differs from
Volunteered Geographic Information (VGI): it is harvested
passively from posts rather than contributed deliberately
• Most Commonly Captured from Twitter, Facebook and
Foursquare
• Can be used to trace vectors through Social Networks
• Can Determine “Hot Spots” of activity via Hashtags,
keywords and modifiers
• Starting to be used in Biosurveillance, but still does not
have buy-in from the “establishment”
Risk Terrain Modeling
• Originally Used to Predict Crime
• Core Concept is that Certain activities are related to
Geographic Features (Assaults tend to occur near certain
Liquor Stores, Bars or Entertainment Venues)
• Leads to a Spatial Understanding for Strategic Decision
Making
• Allows Decision Makers to make the best use of their
Resources
AGI and RTM Enhancing
Biosurveillance
• AGI
• Allows Real Time Disease Information to be consumed
and Analyzed both Spatially and Textually
• No turnaround time
• Not Aggregated to a State level
• RTM
• Generation of an RTM Map for Public Health by County
• People in the lesser served areas are less likely to seek medical
attention and less likely to have symptoms/ailments reported
Data Collection - RTM
• Used the Criteria from the Publication “County Health Rankings and Roadmaps: a Healthier Nation County by County”
• 32 influencers on health and health care quality
• Examples
• Number of Medical Doctors in County
• Proximity to Medical Care
• Percentage of Population with Health Insurance
• Divided Counties into Quartiles
• 152 counties had no Data
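The quartile step above can be sketched in a few lines of Python. This is only an illustration, not the script used in the study: the composite score per county, the county names, and the convention that the lowest scores fall in Quartile 1 are all assumptions made for the example.

```python
from statistics import quantiles

def assign_quartiles(scores):
    """Map each county to a quartile (1-4) from its composite score.

    Counties whose score is None are labeled "No Data".
    Assumption: lower scores land in Quartile 1.
    """
    known = [s for s in scores.values() if s is not None]
    q1, q2, q3 = quantiles(known, n=4)  # the three quartile cut points
    result = {}
    for county, score in scores.items():
        if score is None:
            result[county] = "No Data"
        elif score <= q1:
            result[county] = 1
        elif score <= q2:
            result[county] = 2
        elif score <= q3:
            result[county] = 3
        else:
            result[county] = 4
    return result

# Hypothetical composite scores, invented for illustration only.
example_scores = {
    "County A": 0.12, "County B": 0.35, "County C": 0.48, "County D": 0.51,
    "County E": 0.77, "County F": 0.80, "County G": 0.93, "County H": 0.99,
    "County I": None,  # a county with no data
}
county_quartiles = assign_quartiles(example_scores)
```

Keeping "No Data" as its own label mirrors the slide, where counties without data are reported separately rather than forced into a quartile.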
Data Collection - AGI
• Used a Python Script to Collect Tweets within the US and
populate a spreadsheet
• Collected an average of 40,000 tweets a night
• Roughly 5% of those Tweets had location data
• Used Hashtags, Keywords and Modifiers to determine if
they were talking about the Flu, or getting a Flu shot
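A minimal sketch of the hashtag, keyword and modifier filter described above, assuming plain tweet text as input. The keyword set is taken from the key word chart in this deck; the modifier set and the category names are hypothetical, since the actual modifiers are not listed.

```python
import re

# Key words and hashtags named in the deck's key word chart.
FLU_KEYWORDS = {"flu", "influenza", "h1n1", "h3n2", "h5n1", "adenovirus"}
# Hypothetical modifiers that suggest a vaccination rather than an illness.
SHOT_MODIFIERS = {"shot", "vaccine", "vaccinated"}

def classify_tweet(text):
    """Return "flu_shot", "flu", or "unrelated" for one tweet."""
    # Lowercase and split into alphanumeric tokens, so "#flu" becomes "flu".
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    if not tokens & FLU_KEYWORDS:
        return "unrelated"
    if tokens & SHOT_MODIFIERS:
        return "flu_shot"
    return "flu"
```

For example, classify_tweet("Got my flu shot today") yields "flu_shot", while classify_tweet("Stuck in bed with the #flu") yields "flu". Matching on whole tokens avoids false hits on words such as "influence".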
The Study
• Collection of Flu Related Geolocated Tweets within the
United States from the week of January 5 to the week
ending February 2
• Determined how many of those Tweets were in each
Quartile
• Compared the Results to the CDC Data from that same
timeframe
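Counting tweets per quartile could look roughly like the sketch below. It assumes each geolocated tweet has already been matched to a county; the county names, the lookup table, and the function name are hypothetical.

```python
from collections import Counter

def tweets_per_quartile(tweet_counties, county_quartile):
    """Tally tweets by the risk quartile of the county they fall in.

    Counties missing from the lookup are counted under "No Data".
    """
    tally = Counter()
    for county in tweet_counties:
        tally[county_quartile.get(county, "No Data")] += 1
    return tally

# Hypothetical lookup and tweet locations, for illustration only.
lookup = {"County A": 1, "County B": 2, "County C": 4}
tweets = ["County A", "County C", "County A", "County X", "County B"]
quartile_totals = tweets_per_quartile(tweets, lookup)
```

The resulting totals per quartile can then be set against the CDC figures for the same week.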
Data Cleaning - AGI
• Total Usable Tweets: 25,000
• Geocoding Issues
• Most had City and State
• Some just had State
• Others had full State Names which did not Geocode
• Others had Clinics for Cities and Cities for States
• Used both ESRI Online Geocoding as well as CartoDB
• ESRI Online Geolocated 75% of the total tweets
• CartoDB Geolocated 90% of the total tweets
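One of the issues above, full state names that did not geocode, can be handled with a cleanup pass before geocoding. The sketch below is an assumption about how that might be done, not the cleaning actually performed; the lookup table is deliberately partial and the function name is hypothetical.

```python
# Partial lookup for illustration; a real table would cover every state.
STATE_ABBREVIATIONS = {
    "maryland": "MD",
    "virginia": "VA",
    "california": "CA",
    "florida": "FL",
}

def normalize_location(raw):
    """Rewrite 'City, Full State Name' as 'City, ST' where the state is known."""
    parts = [p.strip() for p in raw.split(",")]
    state = parts[-1]
    abbreviation = STATE_ABBREVIATIONS.get(state.lower())
    if abbreviation:
        parts[-1] = abbreviation
    elif len(state) == 2:
        parts[-1] = state.upper()  # already an abbreviation; fix casing only
    return ", ".join(parts)
```

With this table, "Baltimore, Maryland" becomes "Baltimore, MD" and a state-only value such as "Virginia" becomes "VA". Values the table does not recognize pass through unchanged, so they can still be reviewed by hand.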
Data Metrics – Key Words
[Bar chart: tweet counts by Key Word and Hashtag (flu, Influenza, h1n1, H3N2, H5N1, Adenovirus); vertical axis 0 to 30,000]
Data Metrics - Modifiers
[Bar chart: tweet counts by Tweet Modifier; vertical axis 0 to 3,000]
Data Metrics – by State
[Bar chart: tweet counts by state; vertical axis 0 to 3,000]
Data Metrics – by Quartile
[Chart: Total Tweets By Quartile; categories: Quartile 1, Quartile 2, Quartile 3, Quartile 4, No Data]
Maps – All Tweets
Maps – Tweets January 5
Maps – CDC ILI January 5
Maps – Tweets January 12
Maps – CDC ILI January 12
Maps – Tweets January 19
Maps – CDC ILI January 19
Maps – Tweets January 26
Maps – CDC ILI January 26
Maps – Tweets February 2
Maps – CDC ILI February 2
Conclusions
• Social Media can be used as a new tool in the
Biosurveillance Toolkit
• Tweets are nearly evenly distributed between the Risk
Quartiles
• Social Media shows trends that are reflected in the CDC
Data