Automatic Extraction of Side Effect Information from Consumer Drug Reviews

AUTOMATIC EXTRACTION OF SIDE EFFECT INFORMATION FROM CONSUMER DRUG REVIEWS

SUPERVISED BY:

Assoc Prof Khoo Soo Guan, Christopher

Wee Kim Wee School of Communication and Information

20 April, 2015

PRESENTED BY:

Abdul Rachman (G1400808F)

Paudel Sunil (G1400834A)

Sathasivamoorthy Nirathan (G1301369K)

Introduction

• Text mining and information extraction from user reviews on a social media health site (www.webmd.com).

• Extracting side effect information for psychotropic drugs.

• Psychotropic drugs alter chemical levels in the brain and affect behavior, emotions and mood.

• In the past, pharmacies provided side effect information based on clinical trials.

• These days, trusted health sites (like www.fda.gov) provide lists of probable side effects.

• Sometimes, users experience side effects that are not mentioned on the label of the medicine.

Reviews from www.webmd.com

Objectives

• To develop an information extraction method that extracts side effect information from online drug reviews (www.webmd.com)

• To compare the extracted side effects with the ones listed on www.fda.gov

Information extraction method

• Side effect information : awful headache

• Pattern : the only side effect has been ____________________

Information extraction method

• Side effect Information : shaking, restlessness and dizziness

• Pattern : side effects are _______________

Information extraction method

• Side effect information: nausea (misspelled by the user in the review). Typos like this are a pain point in text mining

• Pattern : _________ is a side effect

Information extraction method

• Text extracted as the side effect by the proposed method:

When the information follows the pattern: everything after the pattern up to the full stop

When the information precedes the pattern: everything from the beginning of the sentence up to the pattern
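The extraction rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenters' implementation: the function names, the regex-based sentence splitter, and the sample review text (assembled from the slide examples) are all assumptions.

```python
import re

def split_sentences(review):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]

def extract_after(sentence, pattern):
    """Text following `pattern`, up to the end of the sentence (the full stop)."""
    idx = sentence.lower().find(pattern.lower())
    if idx == -1:
        return None  # pattern does not occur in this sentence
    return sentence[idx + len(pattern):].strip(" .!?")

def extract_before(sentence, pattern):
    """Text from the beginning of the sentence up to `pattern`."""
    idx = sentence.lower().find(pattern.lower())
    if idx == -1:
        return None
    return sentence[:idx].strip(" .!?")

# Hypothetical review text built from the slide examples.
review = "I have taken it for two weeks. The only side effect has been awful headache."
for sentence in split_sentences(review):
    found = extract_after(sentence, "the only side effect has been")
    if found:
        print(found)
```

Because the rule takes everything up to the sentence boundary, anything else the user wrote in the same sentence is extracted as well, which is one way noise can enter the results.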

Overall approach for constructing extraction patterns

• Goal: to construct a set of good candidate patterns, i.e. patterns that are accurate and have good coverage

Good coverage: the pattern must occur several times (more than 2)

Accuracy: more than 60%
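The two thresholds can be applied as a simple filter. The sketch below assumes that each candidate pattern comes with two counts from a training sample: how often it occurred and how many of those occurrences yielded a relevant side effect. The slides do not say how accuracy was computed, so this definition, the function name, and the example counts are hypothetical.

```python
def select_good_patterns(stats, min_occurrences=2, min_accuracy=0.60):
    """Keep patterns that occur more than `min_occurrences` times
    and whose accuracy is above `min_accuracy`.

    `stats` maps pattern -> (occurrences, correct_extractions).
    """
    good = []
    for pattern, (occurrences, correct) in stats.items():
        if occurrences <= min_occurrences:
            continue  # coverage too low: must occur more than 2 times
        accuracy = correct / occurrences
        if accuracy > min_accuracy:
            good.append(pattern)
    return good

# Made-up counts for illustration only.
stats = {
    "side effects are": (10, 8),    # 80% accurate, kept
    "is a side effect": (2, 2),     # occurs only twice, dropped
    "had no side effect": (5, 2),   # 40% accurate, dropped
}
print(select_good_patterns(stats))
```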

Overall approach for constructing extraction patterns

• Generation of N-grams: N ranging from 3 to 6

• For this study, we investigate only one seed word: “side effect”
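A minimal sketch of this step, assuming word-level N-grams and a plain regex tokenizer (neither is specified on the slide). The helper names are hypothetical. For simplicity the sketch ignores sentence boundaries, so an N-gram may span two sentences.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; punctuation is dropped (simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())

def seed_ngrams(text, seed="side effect", n_min=3, n_max=6):
    """Return all word N-grams (n_min..n_max) that contain the seed phrase."""
    tokens = tokenize(text)
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            # The leading space forces the match to start at a word boundary;
            # "side effects" still matches because it begins with the seed.
            if f" {seed}" in f" {gram}":
                grams.append(gram)
    return grams

def count_candidates(reviews):
    """Count how often each candidate N-gram occurs across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(seed_ngrams(review))
    return counts

grams = seed_ngrams("The only side effect has been awful headache.")
print(grams[:2])
```

The resulting counts could feed the coverage check (a pattern must occur more than 2 times) before accuracy is assessed.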

Extraction Method

• Side effect information is extracted using the generated patterns

• Patterns are matched against the reviews and the side effects are extracted automatically

Challenges Faced

• Extraction of negative information

Challenges Faced

• Users don’t follow a proper structure in their writing

Analysis of Extracted information

• Total No of Patterns: 505

• Total No of Reviews: 801

• Total No of Side Effect information Retrieved: 63

• Total No of relevant side effect information retrieved: 50

• Total No of relevant side effect information available: 71

Precision, Recall and F1 measure

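• Precision:

$$\text{Precision} = \frac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of side effect information retrieved}} \times 100 = \frac{50}{63} \times 100 = 79.37\%$$

• Recall:

$$\text{Recall} = \frac{\text{Total number of relevant side effect information retrieved}}{\text{Total number of relevant side effect information available}} \times 100 = \frac{50}{71} \times 100 = 70.42\%$$

• F1:

$$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = 2 \times \frac{79.37 \times 70.42}{79.37 + 70.42} = 74.63\%$$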
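The three measures can be checked with a short calculation from the counts reported in the analysis (63 retrieved, 50 relevant retrieved, 71 relevant available). The function name is an assumption made for this sketch.

```python
def precision_recall_f1(retrieved, relevant_retrieved, relevant_available):
    """Compute precision, recall and F1 as fractions between 0 and 1."""
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / relevant_available
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the analysis of extracted information.
p, r, f1 = precision_recall_f1(retrieved=63, relevant_retrieved=50, relevant_available=71)
print(f"Precision: {p:.2%}  Recall: {r:.2%}  F1: {f1:.2%}")
# prints: Precision: 79.37%  Recall: 70.42%  F1: 74.63%
```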

Error Analysis

• 21 relevant side effects were missed

• Reasons:

Free-form writing by users (1)

Pattern construction was not possible (2)

The pattern's accuracy in the training data sample was less than 60% (3)

Error Analysis

• 13 non-relevant pieces of side effect information were extracted

• Reason:

Even good patterns may extract some incorrect information

• All of these patterns had an accuracy above 60% in the training sample

Comparison of Side Effects

• Extracted side effects of 15 drugs were compared with those listed on www.fda.gov

• Drug selection criterion:

Minimum of 30 reviews in the training sample

• A few of the side effects users complained about are similar in meaning to those listed

Comparison of Side Effects

• A few of the extracted side effects are not mentioned in the list at all
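One simple way to find extracted side effects that are missing from a reference list is a set difference after light normalization. The sketch below is an assumption about how such a comparison could be done, not the method used in the study. The extracted terms come from the earlier slide examples, and the listed terms are made-up placeholders, not actual www.fda.gov data.

```python
def unlisted_side_effects(extracted, listed):
    """Return extracted side effects that do not appear in the reference list.

    Matching is exact after lowercasing and trimming, so terms that are
    merely similar in meaning are NOT recognized as the same side effect.
    """
    listed_norm = {s.strip().lower() for s in listed}
    extracted_norm = {s.strip().lower() for s in extracted}
    return sorted(extracted_norm - listed_norm)

extracted = ["Dizziness", "restlessness", "shaking"]  # from the slide examples
listed = ["dizziness", "nausea"]                      # hypothetical placeholder list
print(unlisted_side_effects(extracted, listed))
```

Exact matching is the main limitation here: side effects that users describe in different words from the reference list would be reported as unlisted, so handling them would need manual review or a synonym resource.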

Conclusion & Future Work

• The side effects were extracted using the candidate patterns

• The extracted side effects were compared with those on www.fda.gov, and a few of them were found not to be listed on the site

• The extracted information contains a lot of noise; future work is needed to extract only the side effects and leave the noise behind.

• Other seed words such as downside, bad news, symptom and ill effect could be used to increase the accuracy of the end results.

Thank You !!!

Q & A