Challenges of Developing a Global Alerting System
American Chemical Society National Meeting - Chicago
Symposium Honoring Gary Wiggins
March 25th, 2007
Leah Sandvoss
Information Scientist
2
Honors
Chemical Informatics program initiation
Mentorship
Independent study on Chemical Information
3
Outline
Definitions
Overview of Global Alerting System
Business Analysis – pre-project
Information Retrieval
Metadata
Users
Project limitations
Lessons Learned
4
Definitions
Alerts/Selective Dissemination of Information (SDI)/Current Awareness – a stored search strategy which is run periodically against a database to return any newly added results to the end-user
Information Retrieval – the systematic storage and recovery of data, as from a file or database
Knowledge Management – refers to a range of practices used by organizations to identify, create, represent, and distribute knowledge for reuse, awareness and learning across the organization.
Business Analysis – act of gathering and translating business issues and needs into a form that can be given to appropriate people to form solutions
Metadata – data about data. “Zip Code” is the metadata for the piece of data “92121”; “Abstract” is the metadata for the actual abstract text
Unstructured data – data whose structure is not readily machine-readable
Controlled vocabulary – carefully selected list of words and phrases which are used to tag units of information to make them more retrievable by a search (ex: cancer vs neoplasm)
Sources: Chicago Manual Style (CMS): information retrieval. Dictionary.com. Dictionary.com Unabridged (v 1.1). Random House, Inc. http://dictionary.reference.com/browse/information retrieval (accessed: February 21, 2007). “Consulting Skills for Business Analysis” course by Watermark Learning
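The alert/SDI definition above can be illustrated with a minimal sketch: a stored search strategy is rerun against a database, and only records added since the previous run are returned. The `StoredAlert` class, the record fields (`title`, `added`), and the simple keyword match are illustrative assumptions, not part of the system described in this talk.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredAlert:
    """A saved search strategy plus the date it was last run (hypothetical model)."""
    keywords: list
    last_run: date

    def run(self, database, today):
        """Return records added since the last run that match any keyword."""
        hits = [
            rec for rec in database
            if rec["added"] > self.last_run
            and any(k.lower() in rec["title"].lower() for k in self.keywords)
        ]
        self.last_run = today  # the next run only sees newer records
        return hits


# Toy "database": each record carries the date it was added (made-up data).
database = [
    {"title": "Cluster headache syndrome", "added": date(2007, 1, 10)},
    {"title": "Headache management review", "added": date(2007, 2, 15)},
    {"title": "Benzene toxicity", "added": date(2007, 2, 20)},
]

alert = StoredAlert(keywords=["headache"], last_run=date(2007, 2, 1))
new_hits = alert.run(database, today=date(2007, 2, 21))
print([rec["title"] for rec in new_hits])
```

Running the same alert again without new records returns an empty list, which is the "newly added results only" behavior the definition describes.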
5
Knowledge Management
Each company in the healthcare and pharmaceutical sector has spent an average of US$274,000 per annum on knowledge management over the past three years (ref 2001)
Dyer, G. and McDonough B. (2001) Vertical targets for knowledge management vendors. International Data Corporation. Document No. 25535
6
State of Biomedical Literature Mining
Source: Jensen, L.J., Saric, J., Bork, P. Literature mining for the biologist: from information extraction to biological discovery. Nature Reviews Genetics, vol 7, February 2006
7
Current Awareness System
Common platform to deliver many types of information, providing a common process for inserting the information
Compiles results from multiple information retrieval systems
Allows for the collection, review, analysis, and summarization of information types
Export capabilities
Uses controlled vocabularies
Provides structured, actionable information
Portal
Patents
Literature
News
Books
TOCs
Internal
Integration layer
Key Op Leaders
8
Business Analysis
9
Business Analysis
A team was formed in 2002 to look at key information products available to the end-user, coined “value-added products”
First focus was on the products providing alerts/SDIs
– Used online alert survey from an existing internal system to identify user needs. Approximately 230 total respondents.
– Held discussions among team members about workflow based on their experience with customer needs
– Conducted a per-month cost comparison of various alerting services
Several products provided overlapping information, resulting in duplication of effort among the information scientists
Recommendations were made for a future system and workflow for managing alerts
10
Business Analysis
In late 2004, a project team was formed to develop a tool to manage alerts as well as search results
First goal was to provide a repository to manage content. Tool would:
– Allow information scientists to contribute, manage, and disseminate content
– Replace existing current awareness products
– Provide automation where possible to save time on the part of information scientists
Second goal was to develop a tool to display content to the end-user in one interface with a common format
Environmental scan was performed, but the decision was made to develop the product in-house
– Facilitates incorporating internal content
Focus of this talk is on the repository development
11
Information Retrieval
12
Information Retrieval – Licensing Issues
Needed to determine what was “in scope” for existing contracts
Copyright restriction questions
– Are vendor-produced abstracts subject to copyright restrictions?
– Does it violate copyright to redistribute results?
– Can full-text of article be used and classified?
– Can a screen scrape be performed on an HTML page?
– Can complete citation(s) be stored? If not, what fields can be stored? (i.e., a unique identifier so that user can get back to complete citation)
– For stored content, is there an expiration date?
Results Options
– Format - XML, plain text, HTML, etc.
– Transfer type - sFTP, e-mail, HTML view
13
Unstructured or Semi-structured Data – BRS/Tagged
UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off attacks. [Review]
AU Marks DR. Rapoport AM
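A parser for this kind of tagged record might look like the following minimal sketch. It assumes that every field starts on its own line with a two-letter uppercase tag (UI, TI, AU as in the sample above) and that untagged lines continue the previous field; real vendor formats vary, and the function name `parse_tagged_record` is hypothetical.

```python
import re

# Assumption: each field starts on its own line with a two-letter uppercase tag.
TAG_LINE = re.compile(r"^([A-Z]{2})\s+(.*)$")


def parse_tagged_record(text):
    """Parse one tagged record into a dict of tag -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            current, value = match.groups()
            fields[current] = value
        elif current is not None:
            # Untagged line: treat it as a continuation of the previous field.
            fields[current] += " " + line
    return fields


record = """\
UI 92158846
TI Cluster headache syndrome. Ways to abort or ward off
   attacks. [Review]
AU Marks DR. Rapoport AM
"""

parsed = parse_tagged_record(record)
print(parsed["TI"])
```

A known weakness of this sketch is that a continuation line that happens to begin with two capital letters and a space would be misread as a new tag, which is one reason such data counts as only semi-structured.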
14
Unstructured or Semi-structured data - HTML
15
Unstructured or Semi-structured data - TOC
16
Information Retrieval – Process
Supported Commercial Databases
Supported News Sources
Other Sources
Information Scientists
E-mail inbox
Rules applied for strategy setup and delivery
Parsers applied
Repository
17
Metadata
18
Source System Metadata
Source system - defined set of metadata to which the vendor tags would map
To define core fields, looked at range of fields provided by all the databases of interest from the different vendors
Used Dublin Core fields where applicable
Abstract
Author
Classification
Date Granted
Device Manufacturer
Device Trade Name
Diseases
Edition Subset
External Reference ID
Gene Info
Keywords
Language
Literature Title
Literature Type
Location
Methods & Equipment
Miscellaneous
Molecular Sequence
Molecular Source Number
Open URL issn
Open URL issue
Organism Info
Patent Assignee
Patent Class
Patent Country
Patent Number
Personal Name as subject
Publish Date
Publisher
Sequence Data
Space Flight Mission
Subject Heading
Title
TOC Categories
19
Metadata Mapping
Database Name | Database Tag Name | Target Metadata Field Name
Biosis Previews | Concept Code | Classification
Derwent World Patents Index | Title Index Terms and Additional Words | Keywords
Derwent World Patents Index | Derwent Accession Number | External Reference ID
SciSearch | Cited Work | Cited Reference
CAB Abstracts | Organism Descriptors | Organism Info
Medline | Country of Publication | Location
Minimum set of fields / database standpoint
Exclude fields not used for search or retrieval (ex: Item URL, Locally Held, Local Messages, Record Owner, Update Code, Notes, Order Number, Price, Abbreviated Source, Reprint Address, etc.)
Manual process by subject matter experts (information scientists)
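The mapping on this slide can be pictured as a lookup table keyed by database and vendor tag. The sketch below uses the six mappings listed above; the `map_record` function, the sample vendor record, and the choice to silently drop unmapped tags are assumptions made for illustration, not a description of how the source system was actually built.

```python
# (database name, vendor tag name) -> target metadata field, from the slide.
FIELD_MAP = {
    ("Biosis Previews", "Concept Code"): "Classification",
    ("Derwent World Patents Index", "Title Index Terms and Additional Words"): "Keywords",
    ("Derwent World Patents Index", "Derwent Accession Number"): "External Reference ID",
    ("SciSearch", "Cited Work"): "Cited Reference",
    ("CAB Abstracts", "Organism Descriptors"): "Organism Info",
    ("Medline", "Country of Publication"): "Location",
}


def map_record(database, vendor_record):
    """Translate vendor tags to target fields; unmapped tags are dropped."""
    mapped = {}
    for tag, value in vendor_record.items():
        target = FIELD_MAP.get((database, tag))
        if target is not None:
            mapped[target] = value
    return mapped


# Hypothetical vendor record; the values are made up for illustration.
vendor_record = {
    "Country of Publication": "United States",
    "Price": "n/a",  # excluded: not used for search or retrieval
}
print(map_record("Medline", vendor_record))
```

Keying on the pair (database, tag) rather than the tag alone reflects that the same tag name can mean different things in different vendor files, which is why each new database needed its own mapping pass.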
20
Metadata - Content Objects
Content objects defined to differentiate content types on the backend
Contained unique metadata as well as overlapping metadata
Choices for end-user interface
Content Objects:
21
Metadata - Controlled Vocabulary
Controlled terms enhance search and retrieval capability
– Terms are selected by user (information scientist) for tagging content items
– Use preferred term, then list of synonyms
– Standard terminology lists as pick lists (ex: Therapeutic area, disease)
Authoritative sources were used to determine appropriate values
– Internal vocabularies
– National Library of Medicine Medical Subject Headings (MeSH)
– Medical Dictionary for Regulatory Activities (MedDRA)
Repository
Metathesaurus
Authoritative Classifications
MeSH
MedDRA Benzene
Internal
Benzene
Cyclohexatriene
Figure source: DATAFUSION, Inc copyright 1999
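The "preferred term, then list of synonyms" idea can be sketched as a small lookup that normalizes free-text terms to preferred terms. The Benzene/Cyclohexatriene pair comes from the figure; treating "Neoplasms" as the preferred term for "cancer", and the function names `preferred_term` and `tag_item`, are assumptions for illustration only.

```python
# Preferred term -> synonyms. Benzene/Cyclohexatriene comes from the figure;
# "Neoplasms" as the preferred term for "cancer" is an assumption here.
VOCABULARY = {
    "Benzene": ["Cyclohexatriene"],
    "Neoplasms": ["cancer", "neoplasm"],
}

# Build a case-insensitive lookup from every synonym and preferred term.
LOOKUP = {}
for preferred, synonyms in VOCABULARY.items():
    for term in [preferred, *synonyms]:
        LOOKUP[term.lower()] = preferred


def preferred_term(term):
    """Return the preferred term for a synonym, or None if the term is unknown."""
    return LOOKUP.get(term.strip().lower())


def tag_item(free_terms):
    """Map free-text terms to a sorted, de-duplicated list of preferred terms."""
    return sorted({p for t in free_terms if (p := preferred_term(t)) is not None})


print(tag_item(["cyclohexatriene", "Benzene", "Cancer", "unknown"]))
```

Because both "cyclohexatriene" and "Benzene" resolve to the same preferred term, a search on either retrieves the same tagged items, which is the retrieval benefit the slide describes.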
22
Users
23
Users
Information Scientists
– Set up alert strategies in vendor databases as well as the source system repository
– Involved in interactive sessions with the tool to discuss content needs and find bugs in the system
Information Scientists
End-Users
End-users
– Used the portal which displayed content
– Involved early on in the initial requirements gathering, then engaged by the information scientists to test the tool
24
Project Limitations
25
Project Limitations – Source System
For every new vendor file/database that needed to be added to the system, a manual mapping from the vendor database fields to the target metadata had to be performed
Repository interface was cumbersome
– Setting up a strategy was quite time-consuming as there was no auto-population of data
– Opening new windows within the system was quite slow
New version of source system arrived mid-project
An approver role was required to allow an alert strategy to be set up
System did not provide robust Boolean searching at the time
Only had one expert on the source system
26
Project Limitations - Organizational
Key reasons why projects fail:
– Inattentiveness to organizational change
– Sponsorship is lost or changes
– Lack of budget/resources
Other Factors
Project team leaders and members changed several times throughout life of project
Other applications identified to integrate into the solution were also “new” or in development
IT resources not well supported
NO-GO decision was made near production
27
Lessons Learned
For a multi-year project:
Manage change
– Knowledge transfer
– Sustain momentum
Sustain business sponsorship
Plan the budget carefully
Involve influencing parties (vendors/publishers) early
Current awareness system:
Portal concept well-supported by end-users
– Flexibility on their part to manage alerts
– Integrated several different content types
Common workflow supported by information scientists
28
Summary
Knowledge Management is a continuous challenge
A need still exists for a global current awareness system
Follow-up plans
Currently evaluating commercially available products
Internal efforts to filter, consolidate, and analyze content for customers
29
Acknowledgements
Ajit Acharya; Amy Tellez-Karsten; Andrew Horgan; Angela Liu; Angelika Wendler-Awasthi; Ann Young; Barb Miller; Barbara Breen; Beverly Kucharski; Bill Gillick; Bob Berger; Bryon Tilley; Cara Evans; Chandra Aitha; Chris Duhl, West Pole; Christina Carr; Christina Keil; Christine Ng; Claire Hogikyan; Clare Challenger; Cleazoe Malek; Dan Cooney, West Pole; David Walsh; Ed Pelic; Elaine Logan; Emory Emrich; Fradwin Marmol; Francis Di Bella; Getu Diro; Hennie Oswald
Ian Parsons; Iradj Reza; Jan Carr; Janet Smith; Jill Maddox; Julie Grannis; Karen Erani; Karl Royer; Kathy Cornish; Kathy VanLeeuwen; Ken Drake; Kevin Ogborne; Kim Johnson; Kirsten Kliwinski; Leah Sandvoss; Maheshkar Porandla; Mark Mitchell; Mary Skousen; Michele Wang; Michele Wolfe; Murali Nandula; Nathaniel Dunford; Nicola Cooper; Pam Kubiak; Pat Burke; Penny Miller; Peter Dresslar, Metamatics; Pragati Mithal; Raj Dandamudi; Ravneesh Sachdev
Rich Steel; Richard Nicholas; Rob Exposito; Rob Purdue; Robert Linde; Shuntai Wang; Simona Hendl; Srilekha Komma, Keane, Inc; Susan Suchetta; Suzan Quick, West Pole; Thomas Knowles; Veronica Trimble; Vishal Kumar
30
Thanks
31
Backup Slides – Requirements from VAP team
Developed Requirements. Key documents included:
– A proposed “alert service model” which included questions regarding alert gadget data entry, working with clients and ROI metrics.
– A list of roles and responsibilities of stakeholders involved in “Global alerting process”, including IM Colleagues and Pfizer Colleagues.
– A detailed description of the requirements needed for a future alerting system. It includes requirements for processing and managing alerts, archiving/distribution/retention issues, and delivery and service to clients.
– A list of all of the various types of alerts that are currently used within IM and the location at which each is run.
– A process model that describes how an end-user might look for/subscribe to alerts. Also included is a process that would be used by the IM colleague when setting up alerts.
– A “client need” summary, provided from the IM perspective
32
Backup - Dublin Core Metadata Elements
Contributor
Coverage
Creator
Date
Description
Format
Resource Identifier
Language
Publisher
Relation
Rights Management
Source
Subject and Keywords
Title
Resource Type