Understanding User Intents in Online Health Forums
Thomas Zhang, Jason H.D. Cho, Chengxiang Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics
Newport Beach, California
22nd September 2014
Online Health Forums
• Purpose: To provide a convenient platform to facilitate discussion among patients and professionals
• Huge user base, and still growing!
• In 2011, 80% of all web users searched for health information online, of which 6% participated in health related discussions
• Forums contain valuable information
– Contain rich, often first hand experiences
Deficiencies of Forums
• Threads are scattered
• Similar questions are asked again and again
• Keyword search is inadequate
– Finding several keyword matches in a thread does not necessarily mean that the thread is relevant
Post about cholinergic urticaria in April 2004
Received 3rd and final reply a week later
Post from March 2012
No replies as of July 2014
Applications of Intents
• Improving thread retrieval
– e.g. A thread whose original post matches both the keywords and the intent specified by the user is more likely to be helpful
• Filtering threads
– e.g. To treat a condition, only look at posts asking about treatment
• Understanding user behavior in forums
– i.e. users of different forums have different intents
This Paper
• Introduces the problem of identifying user intents in health forums as a classification problem
• Derives the first taxonomy of user intents
• Designs a set of novel features for use with machine learning to solve the problem
• Creates the first dataset for evaluation, and conducts experiments to make empirical findings
Roadmap
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Hierarchical classification
– Feature design
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
Problem Formulation
Given an original thread post $p$ from our dataset $D$ with intent $i_p$ from a taxonomy of user intents $I$, denote by $S_p$ the sentence representation of $p$.
Classify $p$ as some $i \in I$ using $S_p$ as evidence. $p$ is correctly classified if and only if $i = i_p$.
Taxonomy Derivation
• No taxonomy exists for health forum intents
• Solution: Create our own!
• First reduce top ten most commonly asked generic questions by doctors (Ely et al, 2000) into three intent classes
– Classes match the intents of users who search for health information online (Choudhury et al, 2014)
• Next introduce two additional intent classes that are specific to health forum posts
Taxonomy
• Manage: How should I manage or treat condition X?
• Cause: What is the cause of symptom/physical/test finding X?
• Adverse: Can drug or treatment X cause adverse finding Y?
• Combo: Combination (at least two of first three)
• Story: Story telling, news, sharing or asking about experience, soliciting support, or others
Where are we?
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Feature Selection
– Hierarchical classification
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
Support Vector Machines (SVM)
• Main idea: Learn a hyperplane from examples to separate them into two classes
• Use learned hyperplane to classify unseen examples
• Capable of non-linear and multiclass classification
• Shown to have good performance on high dimensional data
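The hyperplane idea can be made concrete with a small sketch. The slides do not say which SVM implementation, kernel, or parameters were used, so the following is only an illustration of the linear, two-class case: it fits a hyperplane by subgradient descent on the regularized hinge loss. The function names, the toy data, and the hyperparameter values are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    examples: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 200,
    learning_rate: float = 0.1,
    reg: float = 0.01,
) -> Tuple[List[float], float]:
    """Fit weights w and bias b by subgradient descent on the hinge loss."""
    w = [0.0] * len(examples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):  # each y must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # L2 regularization shrinks the weights on every step.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # The example lies inside the margin or on the wrong side,
                # so move the hyperplane away from it.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Classify x by the side of the learned hyperplane it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable data (made up for illustration).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(predict(w, b, [4.0, 4.0]), predict(w, b, [-4.0, -4.0]))
```

Multiclass problems such as the intent taxonomy are commonly handled by combining several such two-class separators (for example one-vs-rest); which scheme the authors used is not stated in the slides.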
Post Representation
• How should we represent posts?
– SVMs require examples to be represented as a vector of features
• What are features?
– Some measurable property of the observed data
• How should we select them?
Feature Selection
A good feature should be:
1. Generic enough to be found in many posts
2. Sufficiently discriminative for different intents
Solution: Patterns!
• Sequence of (possibly non-contiguous) tokens that represent recurring text patterns in sentences
• Very generic
– Lowercasing, stemming
– POS tagging
– UMLS semantic group tagging
• Very discriminative
– “What could X be…?” signifies Cause intent, but “What does X do…?” signifies Manage intent
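A pattern as described here can be read as an ordered token sequence that must appear in a sentence with gaps allowed. The sketch below shows one way to test for such a match and to turn a list of patterns into binary features. It is an illustration only: the helper names are hypothetical, the normalizer only lowercases (no stemming or tagging), and the slides do not say whether the authors cap the gap length.

```python
from typing import List, Sequence


def normalize(sentence: str) -> List[str]:
    """Lowercase and split on whitespace; a fuller pipeline would also stem."""
    return sentence.lower().replace("?", " ").split()


def matches_pattern(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if the pattern tokens occur in order, gaps allowed."""
    remaining = iter(tokens)
    # `tok in remaining` consumes the iterator up to the first hit, so each
    # pattern token has to be found after the previous one.
    return all(tok in remaining for tok in pattern)


def pattern_features(patterns: Sequence[Sequence[str]], tokens: Sequence[str]) -> List[int]:
    """Binary feature vector: 1 where the sentence matches the pattern."""
    return [int(matches_pattern(p, tokens)) for p in patterns]


# Patterns modeled on the two examples from the slide.
CAUSE = ["what", "could", "be"]
MANAGE = ["what", "does", "do"]

rash = normalize("What could this rash be?")
drug = normalize("What does Zyrtec do for hives?")
print(pattern_features([CAUSE, MANAGE], rash))
print(pattern_features([CAUSE, MANAGE], drug))
```

The two sentences differ by only a few words, yet each matches exactly one of the two patterns, which is the discriminative property the slide points to.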
Pattern Types
Each pattern falls under one of four types:
• LSP: Lowercased + stemmed tokens only
– E.g. “…what can caus…”
• POSP: LSP + POS tags
– E.g. “…how to <VERB>…”
• SGP: LSP + semantic group tags
– E.g. “…if <CHEM> works…”
• ALL: All types of tokens and tags
– E.g. “…<CHEM> make <PRP> feel…”
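The four types can be sketched by letting a pattern element be either a literal stem or a `<TAG>` placeholder that matches a token's POS tag or semantic group. This is a hypothetical illustration: the `Token` structure, the tag sets, and the hand-tagged example sentence are assumptions, and in practice the tags would come from a POS tagger and a semantic group annotator rather than being typed in.

```python
from typing import NamedTuple, Optional, Sequence

# Small illustrative tag inventories (assumed, not the authors' full sets).
POS_TAGS = {"VERB", "PRP", "NOUN", "ADJ"}
GROUP_TAGS = {"CHEM", "DISO"}


class Token(NamedTuple):
    stem: str              # lowercased, stemmed surface form
    pos: str               # POS tag
    group: Optional[str]   # semantic group tag, or None if unlabeled


def element_matches(element: str, token: Token) -> bool:
    """A pattern element is a literal stem or a <TAG> placeholder."""
    if element.startswith("<") and element.endswith(">"):
        tag = element[1:-1]
        return tag == token.pos or tag == token.group
    return element == token.stem


def pattern_matches(pattern: Sequence[str], tokens: Sequence[Token]) -> bool:
    """In-order match; other tokens may sit between pattern elements."""
    remaining = iter(tokens)
    return all(any(element_matches(e, t) for t in remaining) for e in pattern)


def pattern_type(pattern: Sequence[str]) -> str:
    """Name the type by which kinds of tags the pattern contains."""
    tags = {e[1:-1] for e in pattern if e.startswith("<") and e.endswith(">")}
    has_pos = bool(tags & POS_TAGS)
    has_group = bool(tags & GROUP_TAGS)
    if has_pos and has_group:
        return "ALL"
    if has_pos:
        return "POSP"
    if has_group:
        return "SGP"
    return "LSP"


# Hand-tagged tokens for "Benadryl make you feel sleepy" (illustrative only).
tokens = [
    Token("benadryl", "NOUN", "CHEM"),
    Token("make", "VERB", None),
    Token("you", "PRP", None),
    Token("feel", "VERB", None),
    Token("sleepi", "ADJ", None),
]
all_pattern = ["<CHEM>", "make", "<PRP>", "feel"]
print(pattern_type(all_pattern), pattern_matches(all_pattern, tokens))
```

Replacing a drug name with `<CHEM>` is what lets one pattern cover posts about many different drugs, at the cost of needing a tagger at both training and prediction time.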
UMLS Semantic Groups
• MetaMap labels text phrases with semantic group labels from the UMLS Metathesaurus
Caveat
• Patterns possess limitations
– Difficult to achieve good coverage without sacrificing discriminative properties
– Impossible to extract for posts with large content variations (e.g. Story posts)
• However, we still want complete coverage of our dataset!
Solution: Hierarchical Classification!
• Two cascading SVM classifiers
– The first uses binary pattern features (Pattern SVM)
– The second uses unigram features with TF-IDF weighting (Word SVM)
• Complete coverage allows comparison with unigram baseline
Flowchart: Input Post → Match ≥ 1 pattern? → Yes: Pattern SVM, No: Word SVM → Output Class
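The routing in this flowchart can be sketched in a few lines. The sketch assumes the reading that a post matching at least one pattern goes to the Pattern SVM and every other post falls through to the Word SVM. The two classifiers are passed in as plain functions, and the stand-ins used below are hard-coded stubs for illustration, not trained models.

```python
from typing import Callable, Sequence

Classifier = Callable[[str], str]
Matcher = Callable[[str, str], bool]


def hierarchical_classify(
    post: str,
    patterns: Sequence[str],
    matcher: Matcher,
    pattern_clf: Classifier,
    word_clf: Classifier,
) -> str:
    """Route a post to the pattern classifier if any pattern matches it."""
    if any(matcher(pattern, post) for pattern in patterns):
        return pattern_clf(post)
    # No pattern fired, so fall back to the word-based classifier,
    # which can label any post and therefore gives complete coverage.
    return word_clf(post)


def contains(pattern: str, post: str) -> bool:
    """Stub matcher: plain substring test on the lowercased post."""
    return pattern in post.lower()


def stub_pattern_clf(post: str) -> str:
    """Stub standing in for the trained Pattern SVM."""
    return "Cause" if "what could" in post.lower() else "Manage"


def stub_word_clf(post: str) -> str:
    """Stub standing in for the trained Word SVM."""
    return "Story"


patterns = ["what could", "how to"]
for post in ["What could this rash be?", "Just sharing my experience."]:
    print(post, "->", hierarchical_classify(post, patterns, contains, stub_pattern_clf, stub_word_clf))
```

Keeping the routing separate from the classifiers means either stage can be retrained or swapped without touching the other.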
Where are we?
1. Problem formulation
2. Intent taxonomy derivation
3. Methodology
– Support vector machines
– Hierarchical classification
– Feature design
4. Evaluation
– Dataset
– Experiments
– Results
5. Intents in MedHelp forums
6. Wrap-up
Dataset
• No labeled dataset exists, since this is a new problem
• So we create our own!
– 1,192 original HealthBoards posts, evenly divided among four topics: allergies, breast cancer, depression, and heart disease
• Ideally want more posts, but labeling is expensive
• Why the four topics?
Dataset Labeling
• Labeling done by two CS students
– Substantial* agreement with medical students ()
– Substantial* agreement between themselves (, labels match)
• Combo posts labeled by a third CS student according to their underlying classes
– A Combo post is predicted correctly if a classifier outputs one of its class labels
*Per Landis and Koch, 1977
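The agreement figures themselves did not survive in this transcript, and the slide does not name the statistic. Since the Landis and Koch scale is defined for kappa, the sketch below assumes Cohen's kappa between two annotators and maps the result onto that scale. The label lists are made up for illustration and are not the study's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected if both annotators labeled at random
    # according to their own label frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal interpretation of kappa per Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Made-up labels from two hypothetical annotators.
annotator_1 = ["Manage", "Cause", "Cause", "Story", "Adverse", "Manage"]
annotator_2 = ["Manage", "Cause", "Story", "Story", "Adverse", "Cause"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3), landis_koch(kappa))
```

On that scale "substantial" corresponds to kappa between 0.61 and 0.80, which is the range the slide's footnote refers to.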
Experiments
• What is the best performing set of patterns?
– Try different type combinations of patterns
• How does hierarchical compare with baseline?
– Five-fold cross validation (CV)
• Does performance suffer if we train on posts from three topics and test on the fourth?
– Four-fold forum CV
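The four-fold forum CV amounts to holding out one topic at a time. A minimal sketch of generating those splits follows; the function name and the placeholder post IDs are assumptions for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def forum_cv_splits(
    posts_by_topic: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_topic, train_posts, test_posts), one fold per topic."""
    for held_out in posts_by_topic:
        train = [
            post
            for topic, posts in posts_by_topic.items()
            if topic != held_out
            for post in posts
        ]
        test = list(posts_by_topic[held_out])
        yield held_out, train, test


# Placeholder post IDs; the real dataset has 1,192 posts over these four topics.
posts_by_topic = {
    "allergies": ["a1", "a2"],
    "breast cancer": ["b1", "b2"],
    "depression": ["d1", "d2"],
    "heart disease": ["h1", "h2"],
}
folds = list(forum_cv_splits(posts_by_topic))
for topic, train, test in folds:
    print(topic, len(train), len(test))
```

Because no post from the held-out topic is seen during training, a drop in accuracy relative to ordinary five-fold CV would indicate that the features depend on topic-specific vocabulary.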
Selecting a Pattern Set
$$P = \frac{\text{Cor.}}{\text{Tot.}}, \qquad R = \frac{\text{Cor.}}{|M| + |C| + |A|}, \qquad F_1 = \frac{2PR}{P + R}$$
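The precision, recall, and F1 definitions on this slide can be computed as sketched below. The sketch assumes that Tot. is the number of posts the pattern classifier labeled, that Cor. is how many of those labels are correct, and that |M|, |C|, and |A| are the numbers of Manage, Cause, and Adverse posts. It also applies the earlier rule that a Combo post counts as correct if the prediction is any one of its labels. The data is made up.

```python
from typing import Dict, Set, Tuple


def pattern_scores(
    predictions: Dict[int, str],
    gold_labels: Dict[int, Set[str]],
    num_target_posts: int,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1).

    predictions: post id -> predicted label, only for posts that were labeled.
    gold_labels: post id -> set of acceptable labels (several for Combo posts).
    num_target_posts: |M| + |C| + |A| under the assumption stated above.
    """
    total = len(predictions)
    correct = sum(pred in gold_labels[pid] for pid, pred in predictions.items())
    precision = correct / total if total else 0.0
    recall = correct / num_target_posts if num_target_posts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: four posts matched a pattern, eight posts are in scope.
predictions = {1: "Manage", 2: "Cause", 3: "Adverse", 4: "Cause"}
gold_labels = {
    1: {"Manage"},
    2: {"Cause"},
    3: {"Manage"},
    4: {"Cause", "Adverse"},  # a Combo post with two underlying classes
}
precision, recall, f1 = pattern_scores(predictions, gold_labels, num_target_posts=8)
print(precision, recall, f1)
```

With three of four predictions correct and eight posts in scope, precision is well above recall, which mirrors the high-precision, low-recall behavior the deck reports for patterns.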
CV Takeaways
• Patterns reach labeling agreement upper bound
• Overall improvement is underwhelming, why?
• Patterns give high precision but low recall
– Why is this acceptable?
• Patterns generalize well across forum topics
Tables: Hierarchical Classification Performance; Word Classifier (Baseline) Performance
Intents in MedHelp Forums
We applied our Pattern SVM to 61,225 MedHelp posts split across allergies, breast cancer, depression, and heart disease
Concluding Remarks
• Introduced the new problem of forum post intent analysis
• Designed the first taxonomy and dataset for classification
• Proposed a novel set of pattern features for SVMs
• Showed empirically that patterns give high classification precision while generalizing well across forums
Future Work
• Administer study of health forum user intents
• Expand pattern feature set to improve recall
• Handle classification of Story posts
• Identify all intents from Combo posts
• Further evaluation with larger datasets
Thank you!
Questions? Comments?