Internet Search Term Surveillance for Influenza


Page 1

Internet Search Term Surveillance for Influenza

Philip M. Polgreen¹ (philip-polgreen@uiowa.edu), Yiling Chen³, Forrest Nelson², David M. Pennock³

Departments of ¹Internal Medicine and ²Economics, The University of Iowa, Iowa City, IA; ³Yahoo! Research, New York, NY

Page 2

Disclosures

Disclosures: PMP: An Influenza Advisory Board Member for Roche

YC and DMP were employees of Yahoo! Research

Funding: RWJF, CDC, NIH

Page 3

Motivation

There are multiple surveillance system components for influenza in the U.S., including:

Mortality from Influenza and Pneumonia

Influenza-Like Illness (ILI)

Culture Data

However… they all report disease activity after it occurs

The only local (i.e., state-level) data is a weekly influenza activity report from each state

Page 4

Motivation

Influenza occurs in regular seasonal cycles, but the character and timing of each season vary

Historically, despite the seriousness of the disease and the potential benefit from advance warning, forecasts of influenza activity have not been routinely available in the U.S.

Page 5

Motivation

Benefits of an influenza forecast (even a few weeks in advance) include extra time for:

Preparing for an increased number of patients admitted for influenza complications

Administering prophylactic medications to persons in high-risk groups

Vaccinating high-risk individuals and healthcare workers

Page 6

Motivation

The Internet is an increasingly important source of medical information

Patients/Families

Medical Providers

Thus, analysis of the volume of Internet search traffic may provide information about disease activity over time

An analysis of search terms can produce accurate and useful statistics about the unemployment rate

Ettredge M, Gerdes J, Karuga G. Using web-based search data to predict macroeconomic statistics. Commun ACM. 2005;48(11):87-92.

Page 7

Goals

The purpose of this project was to:

(1) determine the temporal relationship between search terms for influenza and actual disease occurrence

(2) determine if and to what extent an increase in search frequency precedes official measures of influenza activity

(3) explore the feasibility of building a search-based prediction market for infectious diseases

Page 8

Methods

De-identified search query logs were obtained daily from http://search.yahoo.com starting in March 2004

Unique queries originating from the U.S. and containing influenza-related search terms were counted daily

Searches had to include either FLU or INFLUENZA

Searches were excluded if they included BIRD, AVIAN, or PANDEMIC

We also excluded searches containing SHOT, VACCINATION, or VACCINE, to avoid capturing queries about influenza vaccination
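The inclusion and exclusion rules on this slide can be sketched as a small filter. This is only an illustration: the slide does not say how terms were matched, so whole-word, case-insensitive matching is an assumption, and the function name and sample queries are hypothetical.

```python
import re

# Terms taken from the slide.
INCLUDE_TERMS = {"flu", "influenza"}
EXCLUDE_TERMS = {"bird", "avian", "pandemic", "shot", "vaccination", "vaccine"}


def is_influenza_query(query: str) -> bool:
    """Return True if a query passes the include and exclude rules."""
    # Assumption: terms are matched as whole words, ignoring case.
    words = set(re.findall(r"[a-z]+", query.lower()))
    has_included = bool(words & INCLUDE_TERMS)
    has_excluded = bool(words & EXCLUDE_TERMS)
    return has_included and not has_excluded


# Hypothetical sample queries, not taken from the study's logs.
sample_queries = [
    "flu symptoms",
    "Bird Flu outbreak",
    "influenza vaccine side effects",
    "fluid dynamics",
]
kept = [q for q in sample_queries if is_influenza_query(q)]
print(kept)  # ['flu symptoms']
```

Whole-word matching keeps unrelated words such as "fluid" from counting as "flu"; a substring match would behave differently.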

Page 9

Methods

Daily search counts were divided by the total number of U.S. searches to get the daily fraction of influenza-related searches

We then averaged the daily fractions within each week, for every week of the year
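The two steps above (daily fraction, then weekly average) might look like the following sketch. The input layout, the function names, and the counts are hypothetical, and grouping by ISO week is an assumption; the slide does not state which week definition was used.

```python
from collections import defaultdict
from datetime import date


def daily_fraction(flu_count: int, total_count: int) -> float:
    """Fraction of all U.S. searches on one day that were influenza-related."""
    return flu_count / total_count


def weekly_mean_fractions(daily_counts):
    """Average the daily fractions within each week.

    daily_counts: iterable of (date, flu_count, total_count) tuples.
    Returns {(year, week): mean daily fraction}.
    Assumption: weeks are ISO weeks (Monday start).
    """
    by_week = defaultdict(list)
    for day, flu_count, total_count in daily_counts:
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(daily_fraction(flu_count, total_count))
    return {week: sum(vals) / len(vals) for week, vals in sorted(by_week.items())}


# Made-up counts for illustration only.
example_counts = [
    (date(2005, 1, 3), 10, 100_000),
    (date(2005, 1, 4), 30, 100_000),
    (date(2005, 1, 10), 50, 100_000),
]
weekly = weekly_mean_fractions(example_counts)
for week, fraction in weekly.items():
    print(week, fraction)
```

Averaging the daily fractions, rather than pooling the week's counts, gives each day equal weight regardless of total search volume, which matches the wording on the slide.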

Page 10

Methods

Influenza Surveillance Data from March 2004 to August 2007

1. Weekly Influenza Culture Data: Proportion of Positive Cultures

Clinical laboratories throughout the U.S. that are either World Health Organization (WHO) Collaborating Laboratories or National Respiratory and Enteric Virus Surveillance System (NREVSS) laboratories report the total number of respiratory specimens tested and the number positive for influenza types

2. 122 Cities Mortality Reporting System:

Each week, participating cities report the total number of death certificates received and the number that list pneumonia or influenza as the underlying and/or contributing cause of death. Based on the city data, we obtained influenza mortality data for the 9 U.S. census regions and the whole country

Page 11

[Figure: Searches for Influenza and Positive Influenza Cultures by Week. Y-axes: Percentage of Positive Influenza Cultures; Percentage of Influenza-Related Searches. X-axis: Week-Year (1-2005 to 1-2008). Series: Internet Searches, Positive Influenza Cultures.]

Page 12

[Figure: Searches for Influenza and Mortality from Influenza and Pneumonia by Week. Y-axes: Mortality from Influenza and Pneumonia; Percentage of Influenza-Related Searches. X-axis: Week-Year (1-2005 to 1-2008). Series: Internet Searches, Mortality.]

Page 13

Search and Positive Cultures

We fit a linear model to test how well search frequency predicts the percentage of positive influenza cultures

In the model, t is a time trend (measured in weeks), C_t is the rate of positive cultures in week t, and s_{t-x} is the search frequency in week t-x

To determine the appropriate lag, we examined lags of 0-10 weeks
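One linear specification consistent with the variable definitions on this slide is sketched below. The coefficient symbols α, β, γ and the error term ε_t are assumed notation, not taken from the slide; β corresponds to the coefficient on s_{t-x}.

```latex
C_t = \alpha + \beta\, s_{t-x} + \gamma\, t + \varepsilon_t,
\qquad x \in \{0, 1, \dots, 10\}
```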

Page 14

Searches and Mortality

Using the mortality data, we fit another linear model with the same format, where m_t is the number of deaths during week t and all other variables are as defined earlier

To determine the appropriate lag, we examined lags of 0-10 weeks
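The lag scan described on these two slides can be sketched as follows: for each lag x from 0 to 10, regress the outcome on the lagged search series plus a time trend and compare R². This is not the authors' code. The ordinary least squares fit via normal equations, the function names, and the synthetic series are all assumptions made for illustration.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lagged_model(outcome, search, lag):
    """OLS fit of outcome[t] = a + b * search[t - lag] + c * t.

    Returns ([a, b, c], r_squared). Only weeks t >= lag are used.
    """
    weeks = range(lag, len(outcome))
    rows = [[1.0, search[t - lag], float(t)] for t in weeks]
    y = [outcome[t] for t in weeks]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def scan_lags(outcome, search, max_lag=10):
    """Return {lag: R^2} for lags 0..max_lag."""
    return {lag: fit_lagged_model(outcome, search, lag)[1]
            for lag in range(max_lag + 1)}


# Synthetic (made-up) weekly series in which the outcome trails search by 3 weeks.
search = [1.0 + math.sin(2 * math.pi * t / 52) for t in range(156)]
outcome = [5.0 + 40.0 * search[max(t - 3, 0)] + 0.01 * t for t in range(156)]

r_squared_by_lag = scan_lags(outcome, search)
best_lag = max(r_squared_by_lag, key=r_squared_by_lag.get)
print(best_lag)  # 3 for this synthetic series
```

Each lag drops the first x weeks of the outcome series, so fits at different lags use slightly different numbers of observations.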

Page 15

[Figure: Predicted Values for Positive Influenza Cultures Based on Searches and Actual Values by Week. X-axis: Week-Year (1-2005 to 1-2008). Series: Predicted Positive Cultures, Positive Influenza Cultures.]

Page 16

[Figure: Predicted Values for Mortality from Influenza and Pneumonia Based on Searches and Actual Values by Week. X-axis: Week-Year (1-2005 to 1-2008). Series: Predicted Mortality, Mortality.]

Page 17

Culture Results

Positive Influenza Culture Regression Results

Lag x (weeks)   Coefficient (s_{t-x})   Std. Error   t       P > |t|   R²
0               239636.2                18301.99     13.09   <0.001    0.4672
1               242579.5                18218.11     13.32   <0.001    0.4723
2               239568.6                18487.33     12.96   <0.001    0.4568
3               234749.1                18848.97     12.45   <0.001    0.4356
4               229446.4                19225.16     11.93   <0.001    0.4134
5               223257.3                19628.85     11.37   <0.001    0.3890
6               215900.2                20064.8      10.76   <0.001    0.3618
7               206683.5                20565.4      10.05   <0.001    0.3300
8               195520.6                21118.44     9.26    <0.001    0.2943
9               184502.1                21619.25     8.53    <0.001    0.2610
10              173491.3                22164.1      7.83    <0.001    0.2305

Page 18

Mortality Results

Influenza Mortality Regression Results

Lag x (weeks)   Coefficient (s_{t-x})   Std. Error   t       P > |t|   R²
0               3300788                 436385.8     7.56    <0.001    0.2075
1               3810620                 415148.2     9.18    <0.001    0.2787
2               4194847                 394455.2     10.63   <0.001    0.3418
3               4445665                 378633.3     11.74   <0.001    0.3882
4               4604043                 367573.4     12.53   <0.001    0.4198
5               4625652                 368166.3     12.56   <0.001    0.4229
6               4461079                 379889.1     11.74   <0.001    0.3919
7               4314867                 390405       11.05   <0.001    0.3649
8               4248610                 396362.5     10.72   <0.001    0.3523
9               3992864                 410770.2     9.72    <0.001    0.3111
10              3767351                 422055.3     8.93    <0.001    0.2765
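The two regression tables can be read programmatically, for example to find the lag with the highest R² in each and to confirm that each t value is the coefficient divided by its standard error. The numbers below are copied from the tables; the variable and function names are hypothetical.

```python
# lag: (coefficient on s_{t-x}, standard error, R^2), copied from the tables.
culture_results = {
    0: (239636.2, 18301.99, 0.4672), 1: (242579.5, 18218.11, 0.4723),
    2: (239568.6, 18487.33, 0.4568), 3: (234749.1, 18848.97, 0.4356),
    4: (229446.4, 19225.16, 0.4134), 5: (223257.3, 19628.85, 0.3890),
    6: (215900.2, 20064.8, 0.3618), 7: (206683.5, 20565.4, 0.3300),
    8: (195520.6, 21118.44, 0.2943), 9: (184502.1, 21619.25, 0.2610),
    10: (173491.3, 22164.1, 0.2305),
}
mortality_results = {
    0: (3300788, 436385.8, 0.2075), 1: (3810620, 415148.2, 0.2787),
    2: (4194847, 394455.2, 0.3418), 3: (4445665, 378633.3, 0.3882),
    4: (4604043, 367573.4, 0.4198), 5: (4625652, 368166.3, 0.4229),
    6: (4461079, 379889.1, 0.3919), 7: (4314867, 390405, 0.3649),
    8: (4248610, 396362.5, 0.3523), 9: (3992864, 410770.2, 0.3111),
    10: (3767351, 422055.3, 0.2765),
}


def lag_with_highest_r_squared(results):
    """Return the lag whose fit has the largest R^2."""
    return max(results, key=lambda lag: results[lag][2])


def t_statistic(coefficient, std_error):
    """t value for a regression coefficient: estimate / standard error."""
    return coefficient / std_error


best_culture_lag = lag_with_highest_r_squared(culture_results)
best_mortality_lag = lag_with_highest_r_squared(mortality_results)
print(best_culture_lag, best_mortality_lag)  # 1 5
```

In these tables the fit peaks at a 1-week lag for positive cultures (R² = 0.4723) and a 5-week lag for mortality (R² = 0.4229).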

Page 19

Limitations

With only four years of data, the inferences we can draw are limited

Some proportion of searches may be generated by news reports and not actual disease activity (celebrity effect)

Other searches might concern related topics that do not reflect influenza activity (e.g., influenza vaccination)

Page 20

Limitations

Two U.S. influenza search fraction series: one that excludes vaccination-related terms and the other that does not.

Page 21

Limitations

Lack of availability of this data to researchers – privacy and proprietary concerns

The geographic data gleaned from search terms is extracted from IP addresses and may not always represent actual geographic location

We could reproduce our results at a census region level

There is a lack of generally available surveillance data against which to compare search data

Page 22

Summary & Conclusions

A temporal association exists between search term frequency and influenza disease activity

Influenza-related search term activity seems to precede an increase in influenza culture data by at least 4 weeks, and deaths from pneumonia and influenza by at least 7 weeks

“Search-term surveillance” may provide an inexpensive supplement to more traditional disease-surveillance systems

Page 23

Future Work

Search term surveillance is not limited to influenza

It could also be used for emerging and re-emerging infectious diseases, and to detect changes in phenomena related to chronic diseases

Search term surveillance of symptom-based searches (e.g., diarrhea) may help detect outbreaks if search levels rise above an established baseline

Search-Based Prediction Markets (how this experiment started)
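The idea of flagging symptom-based searches that rise above an established baseline could be sketched as follows. The slide does not define the baseline, so the rule used here (mean plus k standard deviations of a reference period) is an assumption, and the function name and numbers are made up for illustration.

```python
from statistics import mean, stdev


def weeks_above_baseline(baseline_weeks, current_weeks, k=2.0):
    """Return indices of current weeks whose search level exceeds the threshold.

    Assumed rule: threshold = mean + k * sample standard deviation of the
    baseline period. The slide only says "an established baseline".
    """
    threshold = mean(baseline_weeks) + k * stdev(baseline_weeks)
    return [i for i, level in enumerate(current_weeks) if level > threshold]


# Made-up weekly search levels (e.g., queries per million searches).
baseline = [10, 12, 11, 9, 10, 12, 11, 9]
current = [11, 12, 14, 18, 10]
flagged = weeks_above_baseline(baseline, current)
print(flagged)  # [2, 3]
```

A larger k trades sensitivity for fewer false alarms; a real system would also need to handle seasonality and news-driven spikes, which this sketch ignores.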

Page 24

Future Directions (Search Markets)

Experimental markets called prediction (or decision) markets are created for the sole purpose of making forecasts and have been used successfully in a number of contexts

In situations involving uncertainty regarding future events, markets can be used to aggregate information from various individuals to predict future events (i.e., information can be extracted from the prices derived in experimental markets)

Page 25

Future Directions

The Iowa Electronic Market (the first prediction market) has a consistent track record of making more accurate forecasts of political elections than any national poll. For 6 presidential elections, the average prediction error has been under 1.5%, while opinion polls for those same elections have had an average error of 2.5%.

HEWLETT-PACKARD has used experimental markets to forecast the sales of its printers more accurately than its statisticians.

ELI LILLY has designed markets to predict which developmental drugs have the best chance of advancing through clinical trials.

GOOGLE has used markets (based on IEM research) to successfully forecast product launch dates, new office openings, and other events of strategic importance.

The Iowa Influenza Prediction Market has predicted influenza activity 2-4 weeks in advance.

ProMED-mail Iowa H5N1 Market has predicted the number of human cases of avian influenza months in advance.

Page 26

Search-Based Prediction Markets for Health Topics

Yahoo! Tech Buzz Game: a fantasy (i.e., not real money) prediction market for high-tech products, concepts and trends.

The participants' goal was to predict how popular various technologies would be in the future. Popularity, or buzz, was measured by Yahoo! Search frequency over time.

Predictions were made by buying stock in the products or technologies you believe will succeed, and selling stock in the technologies you think will flop.

In other words, you “put your fantasy dollars where your mouth is.”

Thus, our original (and current) goal is to build a search market for diseases