Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's...
-
Upload
knoesis-center-wright-state-university -
Category
Technology
-
view
51 -
download
2
Transcript of Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's...
![Page 1: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/1.jpg)
Domain-Specific Document Retrieval Framework for
Near Real-time Social Health Data
1
Master’s Thesis
Swapnil SoniCommittees
Prof. Amit P. Sheth (Advisor)
Prof. Krishnaprasad Thirunarayan
Dr. Tanvi Banerjee
Collaborator
Ashutosh Jadhav
Contact:LinkedIn: https://www.linkedin.com/in/swapnilsoniknoesisHome page: http://knoesis.org/researchers/swapnil/http://knoesis.org/
![Page 2: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/2.jpg)
Outline
2
Background, Objective
Problem Statement
Data Collection
Pattern Extraction Analysis
Results and Evaluation
Demonstration
Conclusion
![Page 3: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/3.jpg)
Background, Objective
Problem Statement
Data Collection
Pattern Extraction Analysis
Results and Evaluation
Demonstration
Conclusion
3
Outline
![Page 4: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/4.jpg)
4
Background
Sources: Pew research
http://www.pewinternet.org/2010/03/24/health-information/
http://www.pewinternet.org/2011/02/01/health-topics-3/
Online health resources are easily accessible
and provide information about most of health
topics.
These resources can help non-experts to make
more informed decisions and play a vital role
in improving health literacy.
![Page 5: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/5.jpg)
5
According to the pew research*,
45% of U.S. adults are dealing with at least one chronic condition
Of those who are living with two or more conditions, 45% have diabetes
*http://www.pewinternet.org/files/old-media/Files/Reports/2013/PIP_TrackingforHealth%20with%20appendix.pdf
Background
![Page 6: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/6.jpg)
6
Health Information Seeking
Web search engine Social media
Choudhury et. al., Seeking and Sharing Health Information Online: Comparing Search Engines and Social Media,ACM,2014
Teevan et. Al., #TwitterSearch: A Comparison of Microblog Search and Web Search, 2011
Real-time content
Popular trends
Online health
information-seeker
Learn about basic facts
Get deeper understanding
about a topic of interest
![Page 7: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/7.jpg)
7
Online health
information-seeker
Real-time content
Relevant
Reliable information
Health Information Seeking
Social media search-engine
![Page 8: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/8.jpg)
8
Health Information Seeking Challenges
Keyword-based techniques are based on the
interpretation of keywords
Search results may not be real-time
![Page 9: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/9.jpg)
9
Example: How to control diabetes
Keyword-basedNot real-time
![Page 10: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/10.jpg)
10
To provide a platform to ask health-related questions
in near real-time, reliable, and relevant health
information shared on social media.
Objective
![Page 11: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/11.jpg)
11
In the US 18% internet users use Twitter.
As we know, there are 500 million tweets per day and around 75K
verified healthcare professionals accounts from all over the world.
152K: number of health tweets every day by professionals in health-
care.
Twitter as a Data Source
Twitter has become a new source of information overload in health-care
![Page 12: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/12.jpg)
12
Problem
How to extract near real-time, reliable and relevant
documents from the health information shared on
Twitter for a given user query?
![Page 13: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/13.jpg)
Outline
13
Background, Objective
Problem Statement
Data Collection
Pattern Extraction Analysis
Results and Evaluation
Demonstration
Conclusion
![Page 14: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/14.jpg)
14
Data Collection
![Page 15: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/15.jpg)
Predefined questions:
Selected most frequently asked questions from Mayo clinic, WebMD, etc.
Dynamic questions:
User can ask any question
15
Categories of Questions
![Page 16: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/16.jpg)
16
System Architecture
User interface
Database
patternsURL social media
rank
Similarity score
Calc
Patterns Rank
Calc
URL
content
extractor
Hadoop-based Pattern
extractor
Pattern
extractorURL
share &
like
counts
extractor
23
4
5
1
Language Identifier
URL extractor
URL resolver
DBHandler
Apache Storm
Processing pipeline
![Page 17: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/17.jpg)
17
Apache Storm
It is a distributed, real-time computation
system.
Spouts and Bolts are basic components in
storm for real-time processing of data.
Networks of spouts and bolts are packaged
into a “topology”, which is submitted to storm
cluster.
![Page 18: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/18.jpg)
18
Crawler
Spout
Topology Architecture
Language
identifierBolt
Object
Modeling
Hashtag
extractorURL
Extractor
URL
resolver
A spout which crawls in real-time based on keywords
It allows only English tweets
It is used for retrieving a
hashtag from the tweets
It converts tweet object to Java
object
Extract URLs from tweets
It expands the URL(s) from short to
its original form
![Page 19: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/19.jpg)
Outline
19
Background, Objective
Problem Statement
Data Collection
Pattern Extraction Analysis
Results and Evaluation
Demonstration
Conclusion
![Page 20: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/20.jpg)
20
Component: Pattern Extractor
User interface
DatabaseLanguage Identifier
URL extractor
URL resolver
DBHandler
URL
content
extractor
Apache Storm
Processing pipeline
patternsURL social media
rank
Similarity score
Calc
Patterns Rank
Calc
Hadoop-based Pattern
extractor
Pattern
extractor
URL
share &
like
counts
extractor
23
4
5
1
![Page 21: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/21.jpg)
21
Content Extractor:
To extract content from the URLs (present in the tweets).
URL(s) Share & Like counts Extractor:
Popularity of a source: To measure the content popularity, we have used social
media shares and likes counts of the URLs.
Facebook shares, Facebook like count, Twitter share count.
Reliability of a source: Google domain page rank of the URLs.
Extractors
![Page 22: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/22.jpg)
22
Pattern Extractor
Pattern-based Mining
Triple Subject, predicate, and objectQuestion
Construct an AQL query
A noun or noun phrase, or a
verb or verb
![Page 23: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/23.jpg)
23
AQL is a language used for building queries that pulls structured
information from unstructured or semi-structured text.
Syntax of AQL is similar to that of Structured Query Language (SQL).
AQL file
AOG
SystemTData
folder
Contains all the patterns.
Result contains pattern.
Annotation Query Language (AQL)
![Page 24: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/24.jpg)
5 easy natural remedies to control diabetes : If you are a diabetic or know someone who is a diabeti... http://bit.ly/13oypg4
24
Pattern Extractor: Example
How to control diabetes?
X control diabetes
X control blood sugar
X handle blood sugar
X handle diabetes
UMLSWordNet
Synsets
![Page 25: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/25.jpg)
This module extracts triples (patterns) from unstructured (tweets and
URLs’ content) based on predefined questions (AQL queries).
The text analytic engine executes AQL queries--an interval of six hours.
25
Predefined Questions: Pattern Mining
![Page 26: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/26.jpg)
26
How to control diabetes by exercise?
Part-of-speech
taggerControl (verb), diabetes(noun), exercise(noun)
Query builderWordNetSynsets
Query executer
diabetes control exercise
exercise control diabetes
Dynamic Query Processing Architecture
![Page 27: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/27.jpg)
Question Extracted Pattern Paragraph
How to control
diabetes?
control blood sugar Exercise is a healthy way to lower and control blood sugar levels
within your body. Doing exercise and lifting weights will improve
your condition significantly. http://t.co/88ulxDPFTo
How to control
diabetes?
insulin into the blood
stream to handle
When a meal is eaten, the pancreas will send larger amounts of
insulin into the blood stream to handle the food
http://t.co/WsCWiNqhb9
How to control
diabetes?
remove sugar Since people with Type 2 diabetes tend to accumulate sugar in
their blood due to their inability to efficiently remove sugar from
the blood http://t.co/aHqKJjrTPY
27
Results
Question Extracted Pattern Paragraph
What are the
Symptoms of
diabetes?
Diabetics tend to get Diabetics should be very cautious when having a pedicure.
Diabetics tend to get bad infections in the feet, so you must be
very aware of any puncture or cut you notice on your feet.
http://t.co/HqJBjBtrXC
What are the
causes of
diabetes?
can cause diabetes Smoking isn’t healthy for anyone but can be very dangerous if
you’re a diabetic. This habit produces many poor health issues.
Smoking makes a person’s insulin resistant, in can cause
diabetes to develop http://t.co/Ca5SaXRL6w
![Page 28: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/28.jpg)
PatternsURL share and
like counts
Pattern Rank
Calculator
Pattern Rank Calculator Architecture
24
Similarity
Score
Calculator
Database
1a 1b 2
3
![Page 29: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/29.jpg)
29
Features Set
Popularity Relevancy
Facebook share counts
Facebook like counts
Twitter share count
Vector based
similarity score
Reliability
Google domain rank
![Page 30: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/30.jpg)
30
Query Expansion
How to control diabetes?
How to control diabetes?
How to control blood sugar?
How to handle blood sugar?
How to handle diabetes?
0.81 (TF-IDF score)
0.0 (TF-
IDF score)
0.81(TF-IDF score)
0.77(TF-IDF score)
Exercise controls diabetes
Natural way to handle blood sugar
![Page 31: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/31.jpg)
31
0.638 0.630.678
0.376
0.639 0.667 0.694
0.556
NAIVEBAYES SUPPORTVECTOR RANDOMFOREST ADABOOSTM1
Social Media share and like count + Jaccard Index on query expansion
Precision Recall
0.7530.687
0.793
0.501
0.722 0.750.806
0.583
NAIVEBAYES SUPPORTVECTOR RANDOMFOREST ADABOOSTM1
Social Media share and like count + TF-IDF on query expansion
Precision Recall
Experiments: ML Classifiers
![Page 32: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/32.jpg)
Outline
32
Background, Objective
Problem Statement
Data Collection
Pattern Extraction Analysis
Results and Evaluation
Demonstration
Conclusion
![Page 33: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/33.jpg)
Evaluation
33
Reliability, Relevancy, and Real-time
Pattern Generator
Query Expansion based on Relevance Feedback
![Page 34: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/34.jpg)
34
Evaluation: Reliability, Relevancy & Real-time
Reliability:
• Based on URL’s (extracted news article) Google domain pagerank
• Filtration criteria is URL’s Google domain pagerank should be greater than 4
Relevancy:
• Based on qualitative approach
• For a given question, user survey participants judge the relevancy of the result set from 1)
Twitter search 2) Social Health Signals 3) Google time bound search and assign relevancy score
from 1 (low) to 3(high)
Real-time:
• Timeliness (trends) of a retrieved document. We have considered only 6 hours
data to find out information of a user’s given query
• Example: breaking news on diabetes
![Page 35: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/35.jpg)
Collected the top 10 results from these sources: Twitter search, Social
Health Signal, and Google time-bound search
35
Evaluation: Relevancy
Queries (Frequently Asked Query)
1) How to control diabetes?
2) What are the causes of diabetes?
3) What are the symptoms of diabetes?
![Page 36: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/36.jpg)
Presented the top 10 results from all the three sources for each of the query
to participants
Participants judge each document of a query on a scale of 1 to 3 (i.e. 1-Not
good, 2-good, and 3-very good)
To calculate average rank, we have used the following formula*:
𝐑𝐚𝐧𝐤 =𝑻𝒐𝒕𝒂𝒍𝑪𝒐𝒖𝒏𝒕𝒐𝒇𝟑′𝒔 ∗ 𝟑 + 𝑻𝒐𝒕𝒂𝒍𝑪𝒐𝒖𝒏𝒕𝒐𝒇𝟐′𝒔 ∗ 𝟐 + 𝑻𝒐𝒕𝒂𝒍𝑪𝒐𝒖𝒏𝒕𝒐𝒇𝟏′𝒔 ∗ 𝟏
𝑵𝒐. 𝒐𝒇 𝑷𝒂𝒓𝒕𝒊𝒄𝒊𝒑𝒂𝒏𝒕𝒔
36
Evaluation: Relevancy
*http://help.surveymonkey.com/articles/en_US/kb/What-is-the-Rating-Average-and-how-is-it-calculated
![Page 37: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/37.jpg)
37
Evaluation: Relevancy
How to control diabetes?
Result 1
Result 2
Result 3
Score 1 Score 2 Score 3
![Page 38: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/38.jpg)
38
Evaluation: Relevancy
search
Social
Health
Signal
Query 1 40% 50%
Query 2 10% 60%
Query 3 40% 50%
search
Social
Health
Signal
Query 1 10% 10%
Query 2 30% 10%
Query 3 30% 20%
search
Social
Health
Signal
Query 1 50% 40%
Query 2 60% 30%
Query 3 30% 30%
Bad
GoodVery Good
Google-
time bound
10%
50%
10%
Google-time
bound
40%
10%
70%
Google-time
bound
50%
40%
20%
![Page 39: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/39.jpg)
nDCG@K (Normal Discounted cumulative gain)
nDCG@K can handle multiple levels of relevance
It gives more weightage to a higher position document than a lower
ranking position document
39
Evaluation Matrices: Relevancy
![Page 40: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/40.jpg)
40
Twitter-
Search
Social Health
Signal
DCG 9.68 12.72
IDCG 13.33 13.76
nDCG 0.726 0.924
Twitter-
Search
Social Health
Signal
DCG 9.67 13.15
IDCG 10.55 14.15
nDCG 0.91 0.92
Twitter-
Search
Social Health
Signal
DCG 10.75 11.47
IDCG 12.69 13.45
nDCG 0.84 0.85
Time-Bound
9.12
9.81
0.929
Time-Bound
10.03
12.76
0.78
Time-Bound
10.76
10.89
0.98
Query 2Query 1
Query 3
Evaluation Matrices: Relevancy
![Page 41: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/41.jpg)
41
Evaluation: Popularity
Google time-bound Social Health Signal
Query: How to control diabetes?
Facebook (Share + Like )
CountsTwitter Share Counts
4 0
0 0
0 0
0 4
0 0
0 0
1 2
52 1211
229 0
Facebook (Share + Like )
CountsTwitter Share Counts
3910 1843
213 8
81 90
0 128
149 826
0 24
0 20
0 24
0 2
![Page 42: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/42.jpg)
42
Google Time-Bound Search
Obesity cause diabetes Overweight cause diabetes
![Page 43: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/43.jpg)
43
URL Title: Replacing sugary drinks with water may reduce diabetes risk
Extracted Pattern: 'obese is a major risk'
URL Title: More Evidence Links Diabetes to Alzheimer's Disease
Extracted Pattern: Overweight May Decrease Mortality Risk
URL Title :The facts about sugar
Extracted Pattern: ‘overweight can increase your risk ‘
URL Title : Having Diabetes Can Increase Your Alzheimer's Risk Via Blood Glucose And
Brain Plaque Link
Extracted Pattern : obesity can also increase our risk
URL Title : Diabetes Study Suggests a Little Extra Weight Tied to Longer Survival
Extracted Pattern: risk for dying than overweight
Social Health SignalObesity less Mortality Risk
![Page 44: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/44.jpg)
44
Demo
http://knoesis-twit.cs.wright.edu/SocialHealthSignal/
![Page 45: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/45.jpg)
45
Future Work
Evaluation
Relevancy on more Queries
Pattern Generator
Query Expansion based on Relevance Feedback
Semantic Categorization
Performance improvement for dynamic queries
![Page 46: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/46.jpg)
46
Conclusion
◦ Twitter has become a popular tool for seeking health information.
◦ It is very difficult task to extract relevant, and reliable health document
from Twitter in near real-time
◦ We address this problem, by using state-of-the-art approaches such as
◦ Semantics-based pattern mining
◦ TF-IDF relevancy score on query expansion
◦ Content popularity: Social media share and like counts
◦ Reliability : Google domain page rank
![Page 47: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/47.jpg)
47
Acknowledgements
Dr. Amit Sheth Dr. T.K Prasad Dr. Tanvi Banerjee Ashutosh Jadhav
![Page 48: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/48.jpg)
48
Thanks!
Questions?
![Page 49: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/49.jpg)
49
Social Health
Signal
Screenshots
![Page 50: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/50.jpg)
50
Home Screen
![Page 51: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/51.jpg)
51
Search & Explore Screen
![Page 52: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/52.jpg)
52
Top 10 URLs Screen
![Page 53: Domain Specific Document Retrieval Framework for Near Real-time Social Health Data - Swapnil Soni's MS thesis presentation](https://reader033.fdocuments.us/reader033/viewer/2022042818/55aae7571a28abff5d8b4630/html5/thumbnails/53.jpg)
53
Tweet Locations Screen