2009 EpiSPIDER CDC GIS Day

Herman Tolentino, MD, Director, Public Health Informatics Fellowship Program

Transcript of 2009 EpiSPIDER CDC GIS Day

Page 1: 2009 EpiSPIDER CDC GIS Day

Herman Tolentino, MD, Director, Public Health Informatics Fellowship Program

Page 2: 2009 EpiSPIDER CDC GIS Day

Presentation Outline

• What is EpiSPIDER?
• Why was EpiSPIDER built?
• What is event-based surveillance?
• How was EpiSPIDER built?
• The EpiSPIDER “Information Ecosystem”
• Evolution of EpiSPIDER
• How has EpiSPIDER been used?
• What are the challenges in implementing EpiSPIDER?
• Overall challenges in event-based surveillance
• Next steps
• Summary

Page 3: 2009 EpiSPIDER CDC GIS Day

What is EpiSPIDER?

The acronym stands for Semantic Processing and Integration of Distributed Electronic Resources for Epidemics and disasters

Key words
• Semantic processing
• Integration of distributed electronic resources
  • “Mashup”
  • Visualization

Page 4: 2009 EpiSPIDER CDC GIS Day

Why was EpiSPIDER built?

2005: Request from ProMED Mail to represent their emerging infectious disease reports in time and space and provide RSS feeds to their members

2006: Growth beyond ProMED Mail and Google Maps

2009 and beyond: Leveraging linked data to reduce information overload

Page 5: 2009 EpiSPIDER CDC GIS Day

Why was EpiSPIDER built?

• Early response to disease outbreaks is a public health priority
• Emerging infectious diseases may not be part of routine public health reporting in many countries
• We can potentially leverage non-traditional sources of data to provide practitioners with early warning
• Specifically, leverage Internet killer applications to collect and exchange health event information
• Extracting and visualizing event information from unstructured data can be done using computer algorithms such as natural language processing (NLP) and text mining (80% of health information remains locked in free text)

The Role of Information Technology and Surveillance Systems in Bioterrorism Readiness. Bioterrorism and Health System Preparedness, Issue Brief No. 5. AHRQ Publication No. 05-0072, March 2005. Agency for Healthcare Research and Quality, Rockville, MD. http://www.ahrq.gov/news/ulp/btbriefs/btbrief5.htm

Page 6: 2009 EpiSPIDER CDC GIS Day

What is event-based surveillance? (WHO definition)

Definition: The organized and rapid capture of information about events that are a potential risk to public health

Can be rumors and other ad-hoc reports transmitted through formal channels (i.e. established routine reporting systems) and informal channels (i.e. media, health workers and nongovernmental organizations reports), including:
• Events related to the occurrence of disease in humans, such as clustered cases of a disease or syndromes, unusual disease patterns or unexpected deaths as recognized by health workers and other key informants in the country; and
• Events related to potential exposure for humans, such as events related to diseases and deaths in animals, contaminated food products or water, and environmental hazards including chemical and radio-nuclear events.

Information received through event-based surveillance should be rapidly assessed for the risk the event poses to public health and responded to appropriately

Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf

Page 7: 2009 EpiSPIDER CDC GIS Day

Role of event-based surveillance in national surveillance system (WHO)

Source: WHO, A guide to establishing event-based surveillance, 2008. URL: http://www.wpro.who.int/internet/resources.ashx/CSR/Publications/eventbasedsurv.pdf

Indicator-based surveillance
Routine reporting of cases of disease, including:
• Notifiable disease surveillance system
• Sentinel surveillance
• Laboratory-based surveillance
Commonly:
• Health care facility based
• Weekly, monthly reporting

Event-based surveillance
Rapid detection, reporting, confirmation, assessment of public health events, including:
• Clusters of disease
• Rumors of unexplained deaths
Commonly:
• Immediate reporting

Response
• Linked to surveillance
• National and subnational capacity to respond to alerts

Page 8: 2009 EpiSPIDER CDC GIS Day

Role of event-based surveillance in national surveillance (ECDC)

[Diagram: epidemic intelligence framework]
• Indicator-based component: surveillance systems; data; collect, analyse, interpret
• Event-based component: event monitoring; events; capture, filter, validate
• Remaining diagram labels: signal, assess, public health alert, investigate, control measures
• Disseminate
  • Confidential: EWRS
  • Restricted access: network inquiries, ECDC threat bulletin
  • Public: Eurosurveillance, press release, web site

Paquet C, et al. Epidemic intelligence: A new framework for strengthening disease surveillance in Europe. Euro Surveill. 2006;11(12):212-4. URL: http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=665

Page 9: 2009 EpiSPIDER CDC GIS Day

Major challenges in developing automated event-based surveillance systems

Can event-based surveillance systems be automated?

Major challenges:
• Describing what information can be extracted from event reports
• Identifying methods to extract desired information
• Identifying methods to convert unstructured to structured data
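To make the last challenge concrete, here is a minimal sketch of turning one line of unstructured text into a structured record. The subject-line shape, the `parse_subject` helper and the field names are assumptions made for illustration; this is not EpiSPIDER's actual extraction code, and real report titles vary far more than one pattern can cover.

```python
import re

# Assumed subject-line shape, for illustration only:
#   "PRO/AH/EDR> Avian influenza (12): Egypt, human case"
SUBJECT_RE = re.compile(
    r"^(?:PRO\S*>\s*)?"            # optional mailing-list prefix
    r"(?P<disease>[^:(]+?)\s*"     # disease name, up to "(" or ":"
    r"(?:\((?P<seq>\d+)\))?\s*"    # optional report sequence number
    r":\s*(?P<place>[^,]+)"        # first location after the colon
)

def parse_subject(subject):
    """Return a structured record, or None when the line does not fit the pattern."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    seq = match.group("seq")
    return {
        "disease": match.group("disease").strip().lower(),
        "sequence": int(seq) if seq else None,
        "place": match.group("place").strip(),
    }

record = parse_subject("PRO/AH/EDR> Avian influenza (12): Egypt, human case")
print(record)
```

Lines that do not match return `None` rather than a guess, so they can be routed to a slower method such as an NLP service or manual review.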

Page 10: 2009 EpiSPIDER CDC GIS Day

How was EpiSPIDER built?

Began as a fellowship project in 2005 with Dr. Raoul Kamadjeu

On a “shoestring budget,” utilizing open-source software and freely available web services and data sources
• Linux, Apache, MySQL and PHP (LAMP)
• Initially Scalable Vector Graphics, then Yahoo Maps and Google Maps
• Existing RSS feeds and unstructured web content

Custom-developed NLP later replaced with OpenCalais NLP web service
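As an illustration of the "existing RSS feeds" building block, the sketch below reads items from an RSS 2.0 document using only the Python standard library. EpiSPIDER itself ran on PHP; Python is used here purely for illustration, and the sample feed, its URLs and the `read_items` helper are invented for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented sample feed; a real harvester would download this text from a feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example outbreak feed</title>
    <item>
      <title>Cholera: Zimbabwe</title>
      <link>http://example.org/reports/1</link>
      <pubDate>Mon, 02 Nov 2009 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Measles: Philippines</title>
      <link>http://example.org/reports/2</link>
      <pubDate>Tue, 03 Nov 2009 08:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def read_items(rss_text):
    """Return one dict per <item>, with the RFC 822 date parsed into a datetime."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items

items = read_items(SAMPLE_RSS)
for entry in items:
    print(entry["published"].date(), entry["title"])
```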

Page 11: 2009 EpiSPIDER CDC GIS Day

The Ecosystem

Definition: Any natural unit or entity including living and non-living parts that interact to produce a stable system through cyclic exchange of materials [NASA Earth Observatory Glossary].

The concept can be applied to Internet-based applications that function as information-consuming or information-producing “organisms” that interact with each other in an interdependent way through exchange of information.

This information “ecosystem” has:
• Producers of data
• Transformers of data
• Consumers of data

http://earthobservatory.nasa.gov/Glossary/?mode=all
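The producer, transformer and consumer roles can be sketched as a small chain of functions. This is a toy model of the idea only; the feed names, sample texts and place list are made up, and the real ecosystem exchanges data between independent web services rather than between functions in one program.

```python
def producer():
    """Producer: emits raw items (invented examples)."""
    yield {"source": "feed-a", "text": "Cholera outbreak reported in Harare"}
    yield {"source": "feed-b", "text": "Measles cases rise in Manila"}

def tag_places(items, known_places):
    """Transformer: enriches each item with the place names found in its text."""
    for item in items:
        text = item["text"].lower()
        places = [p for p in known_places if p.lower() in text]
        yield {**item, "places": places}

def consumer(items):
    """Consumer: turns enriched items into display lines."""
    return [f'{i["source"]}: {", ".join(i["places"]) or "unknown"}' for i in items]

summary = consumer(tag_places(producer(), ["Harare", "Manila"]))
print(summary)
```

Each stage depends only on the shape of the data it receives, which is why a transformer can be swapped for another service without touching the producers or consumers.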

Page 12: 2009 EpiSPIDER CDC GIS Day

Graphical depiction of “ecosystem”

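[Diagram: EpiSPIDER at the center of three layers labeled Producers, Transformers and Consumers, connected mostly by RSS]
• Data sources (unstructured text and RSS): ProMED Mail, UNDP, CIA, WAHID, Google News, Moreover, Reuters, WHO, GDACS, Twitter
• Web services: Yahoo Pipes, Dapper, OpenCalais, Alchemy, UMLSKS, uClassifier, Geonames, Google Translate, Yahoo Maps, Wikipedia
• Delivery: Exhibit (faceted browsing), Google Maps, mobile provider (SMS)
• Formats and protocols shown: RSS, GeoRSS, KML, JSON, RDF, XML, SOAP, REST, SMTP, SMS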

Page 13: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER Web Services (categories by task)

Task category              Services
Information retrieval      Search engines, RSS feeds, raw HTML sources
Information extraction     Dapper, Yahoo Pipes, Alchemy
Language identification    Alchemy, Twitter, uClassifier
Language translation       Google Translate
Keyword extraction         Alchemy
Named entity recognition   OpenCalais, Alchemy
Text classification        uClassifier
Visualization              SIMILE Exhibit, Google Visualization API, Google Maps
Georeferencing             Google Maps, Yahoo Maps, Geonames, Twitter, OpenCalais, Alchemy
Concept annotation         UMLS Knowledge Source Server

Page 14: 2009 EpiSPIDER CDC GIS Day

Technology Adoption Timeline

2005
• Data sources: RSS feeds (2), unstructured content (1)
• Visualization tools: Scalable Vector Graphics, JPGraph
• Web services: Yahoo Maps, askMEDLINE
• Products: RSS feeds, visualizations

2006
• Data sources: RSS feeds (4), unstructured content (3), email
• Visualization tools: Google Maps, Yahoo Maps, JPGraph
• Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames
• Products: RSS feeds, visualizations

2007
• Data sources: RSS feeds (8), unstructured content (4), email
• Visualization tools: SIMILE Exhibit, AJAX visualization tools
• Web services: Yahoo Maps, Google Maps, askMEDLINE, Geonames, Wikipedia
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products

2008
• Data sources: RSS feeds (8), unstructured content (4), email (server)
• Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth
• Web services: Yahoo Maps, Google Maps, Google Visualization API (1), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products

2009
• Data sources: RSS feeds (9), unstructured content (6), linked data, email (server), social networks (Twitter)
• Visualization tools: SIMILE Exhibit, AJAX visualization tools, Google Earth, Wordle
• Web services: Yahoo Maps, Google Maps, Google Translate, Google Visualization API (3), askMEDLINE, Geonames, Wikipedia, UMLSKS, OpenCalais, Yahoo Pipes, Dapper, uClassifier, Alchemy, Twitter, URL services
• Products: RSS and GeoRSS feeds, KML feeds, SMS, visualizations, custom products

Page 15: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2005-2006 (Scalable Vector Graphics map interface)

Page 16: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2006 (Google Maps interface)

Page 17: 2009 EpiSPIDER CDC GIS Day

ProMED Mail RSS Feeds, 2006

Page 18: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2009 (SIMILE Exhibit interface)

Page 19: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2009

Page 20: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2009

Page 21: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2008 (KML feeds for Google Earth)
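A KML feed of this kind is an XML document with one Placemark per event. The sketch below shows one way such a document could be produced with the Python standard library; the event dictionaries and the `events_to_kml` helper are hypothetical, and the original system's implementation is not shown in the slides. KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def events_to_kml(events):
    """Serialize events (dicts with title, lat, lon) as a KML document string."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for event in events:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = event["title"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f'{event["lon"]},{event["lat"]}'
        )
    return ET.tostring(kml, encoding="unicode")

kml_text = events_to_kml([
    {"title": "Cholera: Harare", "lat": -17.83, "lon": 31.05},  # invented sample event
])
print(kml_text)
```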

Page 22: 2009 EpiSPIDER CDC GIS Day

EpiSPIDER, 2009 (SMS using mobile provider gateways)

Message types shown: Server Load Alert, RSS Feed Outage, ProMED Mail Latest
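Mobile provider gateways of this kind typically accept an ordinary email addressed to the phone number at a provider domain and forward it as a text message. The sketch below only builds such a message; the gateway domain, phone number, sender address and the `build_sms_email` helper are placeholders, and the 160-character cut-off is a common SMS limit assumed here rather than taken from the slides.

```python
from email.message import EmailMessage

SMS_LIMIT = 160  # assumed single-message limit

def build_sms_email(phone_number, gateway_domain, text, sender):
    """Build an email that an email-to-SMS gateway would forward as a text message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = f"{phone_number}@{gateway_domain}"
    message.set_content(text[:SMS_LIMIT])  # truncate rather than send several parts
    return message

alert = build_sms_email(
    phone_number="5550100",              # placeholder number
    gateway_domain="sms.example.com",    # placeholder gateway
    text="Server Load Alert: load average above threshold",
    sender="alerts@example.org",
)
print(alert["To"])
# Sending is left out on purpose; with a reachable mail server it would be:
#   smtplib.SMTP("localhost").send_message(alert)
```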

Page 23: 2009 EpiSPIDER CDC GIS Day

How has EpiSPIDER been used?

Page 24: 2009 EpiSPIDER CDC GIS Day

How has EpiSPIDER been used?

Access by type (most to least)
• RSS
• Exhibit
• KML

Access by organization
• Government agencies
• Academic institutions
• Research organizations
• Health departments

Access by individuals

Page 25: 2009 EpiSPIDER CDC GIS Day

Challenges in implementing EpiSPIDER

• Changing nature of data
• Emergent nature of web services
• Understanding and developing connections with complex APIs
• Information extraction and data linking challenges
• Service delivery expansion increases resource demands

Page 26: 2009 EpiSPIDER CDC GIS Day

Changing nature of web data (Challenges in implementing EpiSPIDER)

Challenges with underlying HTML structure
• Non-standard HTML use prevents effective parsing of content

Need to map data to shared terminologies, ontologies and knowledge metadata
• For better integration into an information ecosystem, the system needs to let other “organisms” know what information it needs and what type of information it produces
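One common way to cope with non-standard HTML is to use a tolerant, event-driven parser instead of a strict XML parser. The sketch below uses Python's built-in `html.parser`, which keeps going when tags are left unclosed; the `TextExtractor` class and the sample markup are illustrative and are not the approach documented for EpiSPIDER.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text and ignore script and style content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Unclosed <p> and <b> tags and a missing </html>: typical "tag soup"
messy = ("<html><body><script>var x = 1;</script>"
         "<p>Cholera outbreak<br>reported in <b>Harare<p>Second report</body>")
print(extract_text(messy))
```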

Page 27: 2009 EpiSPIDER CDC GIS Day

Emergent nature of web services (Challenges in implementing EpiSPIDER)

Adapting to changing interfaces
• Must go beyond “taping” applications together manually: need for automated “duct tape” adjustments
• Difficult for some interfaces (non-SOAP)

Feed URL changes
• Have to subscribe to multiple mailing lists

Changes in data structure of service response
• Service may have new data elements
• Example: new Twitter geolocation elements
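A consumer can protect itself against changing response structures by treating every element as optional and falling back to older fields. In the sketch below, the `geo` object with a latitude and longitude pair and the `user.location` profile field are assumptions about the response shape used for illustration, and `extract_location` is a hypothetical helper.

```python
import json

def extract_location(status):
    """Prefer explicit coordinates when present, else fall back to profile text."""
    geo = status.get("geo")
    if isinstance(geo, dict):
        coords = geo.get("coordinates")
        if isinstance(coords, (list, tuple)) and len(coords) == 2:
            return {"lat": float(coords[0]), "lon": float(coords[1]), "method": "geo"}
    user = status.get("user") or {}
    place = user.get("location")
    if place:
        return {"place_name": place, "method": "profile"}
    return None  # nothing usable; the caller decides what to do

# Invented responses: one with the newer element, one without it
newer = json.loads('{"text": "flu cases", "geo": {"type": "Point", "coordinates": [14.6, 121.0]}}')
older = json.loads('{"text": "flu cases", "geo": null, "user": {"location": "Manila"}}')

print(extract_location(newer))
print(extract_location(older))
```

Because unknown or missing elements never raise an error, a new field added by the service is simply ignored until the code is updated to use it.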

Page 28: 2009 EpiSPIDER CDC GIS Day

Understanding complex APIs (Challenges in implementing EpiSPIDER)

APIs are in continuous development
• Complexity increasing
• Knowledge base rapidly expanding
• Example: OpenCalais and Alchemy added named entities, relationships and linked data (Wikipedia, Freebase) for disambiguation

Promising developments
• Number of APIs in different task categories increasing

Page 29: 2009 EpiSPIDER CDC GIS Day

Information extraction and data linking challenges (Challenges in implementing EpiSPIDER)

Named entity recognition and disambiguation
• Named entity recognition of emerging diseases by web services may lag behind and provide non-specific references
• Example: H1N1 may be tagged only as “influenza” (non-specific)

Missing piece: UMLS Knowledge Source Server named-entity extraction and concept annotation web service
• Currently a standalone download: MetaMap Transfer
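One possible stopgap for the non-specific tagging problem is a local post-processing step that checks the original text for more specific terms after a service returns a generic tag. The term list and the `refine_tag` helper below are hypothetical; a maintained terminology such as the UMLS would be the better long-term source.

```python
import re

# Hypothetical, hand-maintained refinements for generic disease tags
SPECIFIC_TERMS = {
    "influenza": [
        ("H1N1", re.compile(r"\bH1N1\b", re.IGNORECASE)),
        ("H5N1", re.compile(r"\bH5N1\b", re.IGNORECASE)),
    ],
}

def refine_tag(generic_tag, text):
    """Append a more specific label when the source text mentions one."""
    for label, pattern in SPECIFIC_TERMS.get(generic_tag.lower(), []):
        if pattern.search(text):
            return f"{generic_tag} ({label})"
    return generic_tag  # no refinement known; keep the service's tag

print(refine_tag("influenza", "Three new cases of H1N1 confirmed"))
print(refine_tag("cholera", "Cholera cases reported"))
```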

Page 30: 2009 EpiSPIDER CDC GIS Day

Service delivery increases resource demands (Challenges in implementing EpiSPIDER)

Managing contention for scarce computing resources
• How to process huge amounts of information without crashing the server
• Automated responses to certain parameters (feedback loop)
• Avoiding process collisions

Alerting mechanisms
• How to send alerts when the server is about to crash
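The feedback loop and collision avoidance described above can be sketched with two small pieces: a rule that shrinks the work batch as load rises, and a lock file that stops a second harvester from starting while one is running. The thresholds, batch sizes, file name and both helpers are assumptions for illustration, not values from the presentation.

```python
import os
import tempfile

def choose_batch_size(load_per_cpu, normal=100, reduced=20,
                      reduce_above=0.7, pause_above=1.5):
    """Feedback rule: process fewer items as load rises, none when overloaded."""
    if load_per_cpu >= pause_above:
        return 0  # skip this cycle; a good moment to send a load alert
    if load_per_cpu >= reduce_above:
        return reduced
    return normal

class SingleRunLock:
    """Lock file created atomically so two runs cannot overlap."""

    def __init__(self, path):
        self.path = path
        self.fd = None

    def acquire(self):
        try:
            # O_EXCL makes creation fail if the file already exists
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self.fd, str(os.getpid()).encode())
        return True

    def release(self):
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None

lock_path = os.path.join(tempfile.mkdtemp(), "harvest.lock")
first, second = SingleRunLock(lock_path), SingleRunLock(lock_path)
got_first = first.acquire()
got_second = second.acquire()   # simulates an overlapping run
first.release()
print(got_first, got_second, choose_batch_size(1.0))
# On Unix, the load input could come from os.getloadavg()[0] / os.cpu_count().
```

A lock left behind by a crashed process would block later runs, so a production version would also need a way to detect stale locks.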

Page 31: 2009 EpiSPIDER CDC GIS Day

Overall challenges in event-based surveillance for public health threats

Increasing dependence on and need for development of semantic tools to:
• Identify emerging outbreaks
• Assign outbreak severity
• Track escalation/decline, social disruption and government response over time

Promoting semantic data sharing among similar systems
• Shared terminologies
• Ontologies
• Knowledge metadata

Chute C. Biosurveillance, Classification, and Semantic Health Technologies (editorial), J Am Med Inform Assoc. 2008;15:172–173.

Page 32: 2009 EpiSPIDER CDC GIS Day

Advantages of web services

Main advantages
• Outsource complex tasks to agents who can devote resources and economies of scale to deliver high-quality, reliable service and outputs
• Promote use of standards for information exchange

Other advantages
• Develop and reuse standard tools for processing unstructured information

Page 33: 2009 EpiSPIDER CDC GIS Day

What could be next steps?

Critical
• Incorporating and mapping the knowledge base to an ontology for event-based surveillance, to enable sharing of data across event-based surveillance systems
• Implementing event-based surveillance systems at national level to enable targeted, distributed collection of event-based data
• Exposing the underlying database as Resource Description Framework (RDF) or other standards-based data
• Collaboration across event-based surveillance systems to enable system-to-system interoperability

Non-critical
• Continue to explore new data sources
• Annotated view of news articles
• Providing citizen reporting and participatory information processing interfaces to end-users
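To show what exposing a database row as RDF could look like, the sketch below writes one event as N-Triples, the simplest line-based RDF syntax. The `example.org` base URI, the vocabulary terms, the sample event and the helper functions are all placeholders; a real deployment would reuse terms from a shared ontology, which is exactly the first critical step listed above.

```python
BASE = "http://example.org/event/"    # placeholder namespace for event identifiers
VOCAB = "http://example.org/vocab#"   # placeholder vocabulary
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def event_to_ntriples(event):
    """Serialize one event row as three subject-predicate-object statements."""
    subject = f"<{BASE}{event['id']}>"
    lines = [
        f"{subject} <{VOCAB}disease> {literal(event['disease'])} .",
        f"{subject} <{VOCAB}country> {literal(event['country'])} .",
        f"{subject} <{VOCAB}reported> {literal(event['reported'])}^^<{XSD_DATE}> .",
    ]
    return "\n".join(lines)

# Invented sample row
triples = event_to_ntriples(
    {"id": 42, "disease": "cholera", "country": "Zimbabwe", "reported": "2009-11-02"}
)
print(triples)
```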

Page 34: 2009 EpiSPIDER CDC GIS Day

Summary

Inflection point in evolution of web services just “around the corner”

Challenges remain in:
• Automation and integration of web services in event-based surveillance systems
• Integrating event-based surveillance in national surveillance systems (local public health context)
• Enabling sharing of data across event-based surveillance systems

Page 35: 2009 EpiSPIDER CDC GIS Day

Acknowledgements

• NCIRD: Raoul Kamadjeu
• NLM: Paul Fontelo, Fang Liu, Olivier Bodenreider
• ProMED Mail: Larry Madoff, Marjorie Pollack, Alison Bodenheimer, Drew Tenenholz

The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.