Developing a Semantic Search Application A Pharma Case Study
![Page 1: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/1.jpg)
Developing a Semantic Search Application
A Pharma Case Study
Tom Reamy
Chief Knowledge Architect
KAPS Group
http://www.kapsgroup.com
Program Chair – Text Analytics World
Taxonomy Boot Camp: Washington DC, 2013
![Page 2: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/2.jpg)
KAPS Group: General
Knowledge Architecture Professional Services – Network of Consultants
Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
– Attensity, Clarabridge, Lexalytics
Strategy – IM & KM – Text Analytics, Social Media, Integration
Services:
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Fast Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Presentations, Articles, White Papers – http://www.kapsgroup.com
![Page 3: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/3.jpg)
Project
Agile Methodology
Goal – evaluate the ability of semantic technologies to:
– Replace manual annotation of scientific documents – automated or semi-automated
– Discover new entities and relationships
– Provide users with self-service capabilities
Goal – feasibility and effort level
![Page 4: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/4.jpg)
Components – Technology, Resources
Cambridge Semantics, Linguamatics, SAS Enterprise Content Categorization
– Initial integration – passing results as XML
Content – scientific journal articles
Taxonomy – MeSH – select small subset
Access to a “customer” – critical for success
![Page 5: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/5.jpg)
Three rounds – Iterations
Visualization – faceted search, sort by date, author, journal
– Cambridge Semantics
Round 1 – PDF from their database
– Needed to create additional structure and metadata
– No such thing as unstructured content
Round 2 & 3 – XML with full metadata from PubMed
Entity Recognition – Species, Document Type, Study Type, Drug Names, Disease Names, Adverse Events
![Page 6: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/6.jpg)
Components & Approach
Rules or sample documents?
– Need more precision and granularity than documents can do
– Training sets – not as easy as thought
First Rules – text indicators to define sections of the document
– Objectives, Abstract, Purpose, Aim – all the “same” section
Separate logic of the rules from the text
– Stable rules, changing text
Scores – relevancy with thresholds
– Not just frequency of words
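One way to picture "stable rules, changing text" is a small Python sketch in which the vocabulary lives in data tables and the logic never names a specific heading. This is only an illustration, not the project's implementation: the names `SECTION_SYNONYMS`, `SECTION_WEIGHTS`, `normalize_section`, and `relevancy`, and the weight values, are all assumptions.

```python
# Text layer: vocabulary that is expected to change (illustrative values only).
SECTION_SYNONYMS = {
    "objective": ["objective", "objectives", "abstract", "purpose", "aim"],
    "methods": ["methods", "materials and methods"],
}

# Assumed weights: a hit in an objective-like section counts more than one in methods.
SECTION_WEIGHTS = {"objective": 3.0, "methods": 1.0}


def normalize_section(heading):
    """Logic layer: map a raw heading to a canonical section name, or None."""
    cleaned = heading.strip().rstrip(":").strip().lower()
    for canonical, variants in SECTION_SYNONYMS.items():
        if cleaned in variants:
            return canonical
    return None


def relevancy(term, sections):
    """Sum the weights of every recognized section whose text mentions the term."""
    score = 0.0
    for heading, text in sections.items():
        canonical = normalize_section(heading)
        if canonical and term.lower() in text.lower():
            score += SECTION_WEIGHTS.get(canonical, 0.0)
    return score


def passes_threshold(score, threshold=2.0):
    """Keep a tag only when its relevancy score reaches the threshold."""
    return score >= threshold
```

Adding a new heading variant then means editing `SECTION_SYNONYMS` only; the scoring functions stay untouched, and the score depends on where a term appears rather than on raw word frequency.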
![Page 7: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/7.jpg)
Document Type Rules
![Page 8: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/8.jpg)
Document Type Rules
(START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]", _/article:"[Objective]",
_/article:"[Results]", _/article:"[Discussion]", (OR, _/article:"clinical trial*", _/article:"humans", (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe",
_/article:"use", _/article:"animals"),
Clinical Trial Rule: If the article has sections like Abstract or Methods AND has phrases around “clinical trials / Humans”, and no words like “animals” within 5 words of the “clinical trial” words, count it and add to a relevancy score.
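The plain-language reading of the Clinical Trial Rule can be approximated in ordinary Python. The sketch below is not the rule-engine syntax shown on the slide. It assumes that `START_2000` limits matching to the first 2000 words and that `DIST_5` means "within 5 words"; the function name `clinical_trial_score` and the tokenization are hypothetical.

```python
import re

SECTION_MARKERS = {"abstract", "methods", "objective", "results", "discussion"}
EXCLUDED = {"approved", "safe", "use", "animals"}


def clinical_trial_score(text, window=2000, distance=5):
    """Count clinical-trial indicators that have no excluded word nearby.

    Returns 0 when none of the expected section markers is present.
    """
    # Lowercase alphabetic tokens only, limited to the first `window` words.
    tokens = re.findall(r"[a-z]+", text.lower())[:window]
    if not SECTION_MARKERS.intersection(tokens):
        return 0

    score = 0
    for i, tok in enumerate(tokens):
        is_trial = (
            tok == "clinical"
            and i + 1 < len(tokens)
            and tokens[i + 1].startswith("trial")  # covers "clinical trial*"
        )
        if tok != "humans" and not is_trial:
            continue
        nearby = tokens[max(0, i - distance): i + distance + 1]
        if not EXCLUDED.intersection(nearby):
            score += 1
    return score
```

The returned count can then be compared against a threshold, in the spirit of "count it and add to a relevancy score".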
![Page 9: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/9.jpg)
Rules for Drug Names and Diseases
Primary issue – major mentions, not every mention
– Combination of noun phrase extraction and categorization
– Results – virtually 100%
Taxonomy of drug names and diseases
Capture general diseases like thrombosis and specific types like deep vein, cerebral, and cardiac
Combine text about arthritis and synonyms with text like “Journal of Rheumatology”
![Page 10: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/10.jpg)
![Page 11: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/11.jpg)
Rules for Drug Names and Diseases
(OR, _/article/title:"[clonidine]", (AND, _/article/mesh:"[clonidine]", _/article/abstract:"[clonidine]"), (MINOC_2, _/article/abstract:"[clonidine]"), (START_500, (MINOC_2, "[clonidine]")))
Means:
– Any variation of drug name in title – high score
– Any variation in MeSH Keywords AND in abstract – high score
– Any variation in Abstract at least 2x – good score
– Any variation in first 500 words at least 2x – suspect
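The four tiers above can be sketched as a cascade of checks in Python. This is a simplified illustration under stated assumptions: it matches only the exact drug name (the rule matches "any variation"), and the function names `count_mentions` and `mention_level` and the returned labels are hypothetical.

```python
import re


def count_mentions(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def mention_level(drug, title, mesh_terms, abstract, body):
    """Classify how prominently a drug is mentioned in one article."""
    in_mesh = any(drug.lower() == m.lower() for m in mesh_terms)
    abstract_hits = count_mentions(drug, abstract)

    if count_mentions(drug, title) >= 1:
        return "high"      # named in the title
    if in_mesh and abstract_hits >= 1:
        return "high"      # MeSH keyword AND present in the abstract
    if abstract_hits >= 2:
        return "good"      # at least twice in the abstract
    first_500 = " ".join(body.split()[:500])
    if count_mentions(drug, first_500) >= 2:
        return "suspect"   # at least twice in the first 500 words
    return "none"
```

Ordering the checks from strongest to weakest evidence means an article receives the highest tier it qualifies for, which is one way to separate major mentions from passing ones.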
![Page 12: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/12.jpg)
Rules for Drug Names and Diseases
Results:
– Wide range by type – 70-100% recall and precision
Focus mostly on precision – difficult to test recall
One deep dive area indicated that 90%+ scores for both precision and recall could be built with a moderate level of effort
Not linear effort – 30% accuracy does not mean 1/3 done
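For reference, precision and recall can be computed from sets of document IDs as in the short Python sketch below. The function name `precision_recall` and the sample IDs are illustrative; no project data is reproduced here.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for predicted vs. expected document IDs."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Precision: share of tagged documents that were tagged correctly.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of documents that should have been tagged and were.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical example: 4 documents tagged, 5 should have been tagged, 3 overlap.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d5", "d6"})
```

Precision needs only a review of the documents the rule tagged, while recall needs the complete `expected` set, which means someone has to review documents the rule did not tag. That is one reason recall is harder to test.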
![Page 13: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/13.jpg)
Iteration 3
Complete treatment of disease state:
– Indication (disease you want to treat)
– Concomitant disease
– Adverse or side effects
Use XML metadata – some variant of “adverse”
Any combination of words associated with a disease (depression) and any of the words that indicated an adverse event or effect
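A minimal Python sketch of the co-occurrence idea: flag text where a disease word and an adverse-event indicator appear together. The term lists, the sentence-level scope, and the function name `adverse_event_sentences` are assumptions made for illustration; the slides do not say what scope the actual rules used.

```python
import re

# Illustrative vocabularies; a real taxonomy would supply many more terms.
DISEASE_TERMS = {"depression", "depressive"}
ADVERSE_TERMS = {"adverse", "side effect"}


def adverse_event_sentences(text):
    """Return sentences that mention both a disease term and an adverse indicator."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        has_disease = any(term in lowered for term in DISEASE_TERMS)
        has_adverse = any(term in lowered for term in ADVERSE_TERMS)
        if has_disease and has_adverse:
            hits.append(sentence)
    return hits
```

In this sketch a sentence that names depression only as the indication is not flagged, while one that pairs it with an adverse-event word is, which mirrors the distinction between indication and adverse effect listed above.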
![Page 14: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/14.jpg)
Conclusion
Project was a success!
Useful results – as defined by the customer
Reasonable and doable effort level – both for initial development and maintenance
Essential Success Factors
– Rules not documents, training sets (starting point)
– Full platform for disambiguation of noun phrase extraction, major-minor mention
– Separation of logic and text
Semantic Search works!
– If you do it smart!
![Page 15: Developing a Semantic Search Application A Pharma Case Study](https://reader036.fdocuments.us/reader036/viewer/2022070500/5681688a550346895ddf09a4/html5/thumbnails/15.jpg)
Questions?
Tom Reamy
[email protected]
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
www.TextAnalyticsWorld.com March 17-19, San Francisco