
Big Data Recommendation Approaches for Healthcare

Samee U. Khan

Department of Electrical and Computer Engineering

North Dakota State University

Fargo, ND 58108-6050, USA

[email protected]

Outline

• Introduction – Personal & Topic
• Recommendation system models
• Big data recommendation system applications
• Case studies

May 31, 2018


Recommendation Systems and Big Data

Dimensions
• Volume: ever-growing data, e.g. 500 million tweets daily [1] and 600 TB daily on Facebook [2]
• Velocity: time-sensitive applications, e.g. scrutinizing millions of trade events for fraud
• Variety: structured (relational data) and unstructured (text, audio, video, log files, etc.) data; 80% is unstructured [3]
• Veracity: data authenticity and correctness

Recommendation systems (introduced in the 1990s):
• Information filtering
• Personalization
• Recommend items/services

Customers’ perspective

Providers’ perspective

• Finding items of interest

• Narrow down choices

• Customizations
• Predict needs

• Understanding customers’ behavior

• Increase sales
• Product promotion
• Trend analysis

Big Data
Recent Web trends require tools and methodologies to efficiently manage data for curation, processing, and storage.

Challenges
• Storage
• Availability
• Reliability
• Computations
• Scalability

[1] “Internet Live Stats,” http://www.internetlivestats.com/twitter-statistics/, accessed on April 10, 2018.
[2] “Fcode,” https://code.facebook.com/posts/229861827208629/scaling-the-facebook-data-warehouse-to-300-pb/, accessed on April 10, 2018.
[3] “Unstructured Data—A Growing Problem,” https://www.waterfordtechnologies.com/unstructured-data-growing-problem/#more-10513, accessed on April 10, 2018.


Recommendation System Models
• Collaborative Filtering
• Content Based Filtering
• Hybrid Filtering

Collaborative Filtering
• Information filtering through human behavior/user profiles
• Commonly employed in commercial recommender systems
  • Example: Amazon

Issues with Collaborative Filtering
• Cold Start
  • Requires enough users/items in the system
• Sparsity
  • Occurs due to scarce data points
• Long-tail/Popularity Bias
  • Recommendation of popular items only
• Scalability
  • Occurs due to increase in users and items

[Figure: collaborative filtering pipeline. The active user's preferences and the existing users database feed the recommendation module, which finds users with similar tastes and generates top-N recommendations.]
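The collaborative filtering steps named on this slide (find users with similar tastes, then generate top-N recommendations) can be sketched as follows. This is a minimal illustration, not the system presented in the talk: the ratings, the item and user names, the use of cosine similarity, and the similarity-weighted sum used for scoring are all assumptions.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two rating dicts; missing items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n(active, users, n=2):
    """Rank items the active user has not rated by similarity-weighted ratings."""
    scores = {}
    for ratings in users.values():
        sim = cosine_similarity(active, ratings)
        if sim <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in ratings.items():
            if item not in active:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical ratings (1 to 5) from the existing users database.
existing_users = {
    "u1": {"A": 5, "B": 4, "C": 5},
    "u2": {"A": 1, "B": 2, "D": 5},
    "u3": {"A": 5, "B": 5, "C": 4, "E": 2},
}
active_user = {"A": 5, "B": 4}

recommendations = top_n(active_user, existing_users, n=2)
print(recommendations)
```

Because the score is a sum over similar users, items rated by many users tend to rank higher, which is one way the popularity bias listed among the issues can arise.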


Recommendation System Models

Content Based Filtering
• Recommendations based on the contents of items instead of users' ratings or opinions
• Requires no information about other users
• Recommendations for users with unique tastes
• No cold start and sparsity issues
• Capable of recommending new or unpopular items

Issues with Content Based Filtering
• Requires meaningful encoding of content features
• Inability to utilize judgement quality of other users

Hybrid Filtering
• Combination of collaborative and content based filtering
  • Popularity data
  • Contents

Issues with Hybrid Filtering
• Datasets interoperability

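[Figure: content based filtering pipeline. Contents used in the past feed profile learning, which builds the active user's profile; the recommendation module uses the profile to produce top-N recommendations.]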
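Content based filtering learns a profile from the contents a user consumed in the past and matches unseen items against it. The sketch below illustrates that idea under stated assumptions: the item names, the three-dimensional feature encoding, the averaging used for profile learning, and the cosine match are all hypothetical choices, not details from the talk.

```python
from math import sqrt


def learn_profile(past_items, features):
    """Profile learning: average the feature vectors of past items."""
    dims = len(next(iter(features.values())))
    profile = [0.0] * dims
    for item in past_items:
        for k, value in enumerate(features[item]):
            profile[k] += value / len(past_items)
    return profile


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(past_items, features, n=2):
    """Return the n unseen items whose content best matches the profile."""
    profile = learn_profile(past_items, features)
    candidates = [item for item in features if item not in past_items]
    candidates.sort(key=lambda item: cosine(profile, features[item]), reverse=True)
    return candidates[:n]


# Hypothetical content encoding: [cardiology, diabetes, nutrition].
item_features = {
    "article_a": [1.0, 0.0, 0.0],
    "article_b": [1.0, 0.0, 1.0],
    "article_c": [0.0, 1.0, 0.0],
    "article_d": [1.0, 0.0, 0.5],
    "article_e": [0.0, 0.0, 1.0],
}
used_in_past = ["article_a", "article_b"]

suggested = recommend(used_in_past, item_features, n=2)
print(suggested)
```

No other users appear anywhere in the computation, which is why this approach avoids cold start and sparsity but depends entirely on a meaningful encoding of the content features.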


Big Data Recommendation Systems Applications

• Healthcare
  • Health expert recommendation from social media
  • Disease risk assessment (prediction)
  • Health insurance recommendation
• Route Recommendation Systems
  • Social venues
  • Large-scale evacuation


Case Study I: Personalized Healthcare Services [1]

• Increasing trend of finding health information online
  • Health related searches by 93 million Americans (Pew Internet & American Life Project) [2]
• Health information from online health communities
  • Exchange and share disease specific experiences
  • Psychological support from peers (example: patientslikeme [3])
• Increased healthcare expenditure
  • U.S. healthcare expenses are approx. 18.2% of GDP as of 2018 [4]
• Key Contributions:
  • Disease risk assessment (NHANES 2009-2010 dataset [5])
  • Health expert recommendation from Twitter
    • 1,500,000,000+ healthcare tweets [6]
    • 30,000 provider profiles (MD, RN, etc.)
    • 15,780 predefined topics
    • 16,283 health communities

[1] A. Abbas, M. Ali, M. U. S. Khan, and S. U. Khan, “Personalized Healthcare Cloud Services for Disease Risk Assessment and Wellness Management using Social Media,” Pervasive and Mobile Computing, vol. 28, pp. 81-96, 2016.
[2] “NBCNews,” http://www.nbcnews.com/id/3077086/t/more-people-search-health-online/#.Ws4ss4hubIU, accessed on April 11, 2018.
[3] “Patientslikeme,” http://www.patientslikeme.com/, accessed on April 11, 2018.
[4] “Statista: The Statistical Portal,” https://www.statista.com/statistics/184968/us-health-expenditure-as-percent-of-gdp-since-1960/, accessed on April 11, 2018.
[5] “National Health and Nutrition Examination Survey,” http://wwwn.cdc.gov/nchs/nhanes/search/nhanes09_10.aspx, accessed on September 29, 2014.
[6] “Healthcare Social Media Analytics,” http://www.symplur.com/healthcare-social-media-analytics/, accessed on April 11, 2018.



[Figure: architecture of the personalized healthcare service, with numbered steps 1 to 9 and a block labeled "Parallel and periodic jobs preprocessing".
Collaborative filtering based disease risk assessment: disease risk assessment request → disease specific profiles retrieval → retrieval of important profile attributes → similarity computation → profile matching. Inputs: the requesting user's profile and existing users' profiles for multiple diseases.
Twitter based health expert recommendation: health expert recommendation request → tweets retrieval → tweets tokenization → candidate expert identification → disease specific segregation of experts through HITS → ranked list of experts → recommended list of experts.]

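Disease risk prediction (symbol names reconstructed from the slide annotations):

\[
P(q, d) = \bar{r}_q + \frac{\sum_{e \in E} \mathrm{sim}(q, e)\,\bigl(r_{e,d} - \bar{r}_e\bigr)}{\sum_{e \in E} \mathrm{sim}(q, e)}
\]

where \bar{r}_q is the mean of q (the requesting user), sim(q, e) is the similarity score, r_{e,d} is the value of disease d for existing user e, \bar{r}_e is the mean for the particular attribute, and the sums run over the set labeled "attributes" on the slide.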
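The disease risk prediction on this slide adds, to the mean of the requesting user q, a similarity-weighted average of how far each existing user e deviates from their own mean for disease d. A minimal sketch of that computation follows; the function name, the tuple layout, and the numbers are assumptions for illustration and are not taken from the paper.

```python
def predict_risk(query_mean, neighbours):
    """Similarity-weighted prediction for a requesting user.

    neighbours: list of (similarity, value_for_disease, neighbour_mean)
    tuples, one per existing user. All names here are illustrative.
    """
    numerator = sum(sim * (value - mean) for sim, value, mean in neighbours)
    denominator = sum(sim for sim, _, _ in neighbours)
    if denominator == 0:
        return query_mean  # no similar users: fall back to the user's own mean
    return query_mean + numerator / denominator


# Hypothetical existing users: (similarity to q, value for disease d, mean).
existing = [
    (0.9, 1.0, 0.5),
    (0.6, 0.0, 0.3),
    (0.3, 1.0, 0.6),
]
risk = predict_risk(0.4, existing)
print(round(risk, 4))
```

With these numbers the weighted deviation is 0.39 / 1.8, so the prediction moves from the user's mean of 0.4 up to about 0.6167.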

Hyperlink Induced Topic Search (HITS)—hubs and authorities



User-keyword matrix

      K1   K2   K3   K4   K5   K6
U1     5    1    2    2    5    1
U2     -    3    2    8    2    -
U3     3    1    2    4    6    -
U4     4    -    -    -    -   11

Hub score

Iteration No.   U1      U2      U3      U4
1               0.281   0.218   0.255   0.234
38              0.275   0.249   0.288   0.196

Authority score

Iteration No.   K1      K2      K3      K4      K5      K6
1               0.197   0.060   0.067   0.235   0.246   0.191
39              0.190   0.065   0.068   0.258   0.254   0.163
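HITS alternates between two updates until the scores settle: a keyword's authority score is the weighted sum of the hub scores of the users who use it, and a user's hub score is the weighted sum of the authority scores of the keywords they use. The sketch below runs that iteration on the user-keyword matrix from this slide, treating "-" as 0. The uniform initialization, the normalization to a sum of 1, and the fixed iteration count are assumptions, so the printed scores are not expected to match the slide's values exactly.

```python
def hits(matrix, iterations=50):
    """Run HITS on a user-by-keyword count matrix.

    Returns (hub scores per user, authority scores per keyword),
    each normalized to sum to 1 (an assumed normalization).
    """
    n_users, n_keys = len(matrix), len(matrix[0])
    hubs = [1.0 / n_users] * n_users
    auths = [1.0 / n_keys] * n_keys
    for _ in range(iterations):
        # Authority update: keywords used heavily by good hubs score high.
        auths = [sum(matrix[u][k] * hubs[u] for u in range(n_users))
                 for k in range(n_keys)]
        total = sum(auths) or 1.0
        auths = [a / total for a in auths]
        # Hub update: users who use authoritative keywords score high.
        hubs = [sum(matrix[u][k] * auths[k] for k in range(n_keys))
                for u in range(n_users)]
        total = sum(hubs) or 1.0
        hubs = [h / total for h in hubs]
    return hubs, auths


# User-keyword matrix from the slide; missing entries ("-") are taken as 0.
user_keyword = [
    [5, 1, 2, 2, 5, 1],   # U1
    [0, 3, 2, 8, 2, 0],   # U2
    [3, 1, 2, 4, 6, 0],   # U3
    [4, 0, 0, 0, 0, 11],  # U4
]
hub_scores, authority_scores = hits(user_keyword)
print([round(h, 3) for h in hub_scores])
print([round(a, 3) for a in authority_scores])
```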

Evaluation

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F\text{-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]

TP = True Positive, FP = False Positive, TN = True Negative, FN = False Negative

CFDRA Evaluation (compared classifiers)
• CART: data partitioning to classify the presence or absence of disease
• Logistic Regression: relationship between disease data attributes to determine outcomes
• Naïve Bayes: probability for the presence or absence of disease
• BF tree: best data split to determine the outcomes
• BayesNet: DAG to represent the relationship between disease and symptoms
• MLP: attributes provided at input layer to produce output
• Random Forest: creation of multiple trees
• Rotation Forest: splitting of dataset and evaluation through decision tree
• SVM: hyperplane separates patients from non-patients

EUR Evaluation (compared methods)
• RowSum: health related keyword count
• Paul et al. [2]: topical authority identification (tweets, retweets, self-similarity)
• Cheng et al. [3]: local experts in an area

[2] A. Paul and S. Counts, “Identifying topical authorities in microblogs,” in Proceedings of the fourth ACM international conference on Web search and data mining, 2011, pp. 45-54.
[3] Z. Cheng, J. Caverlee, H. Barthwal, and V. Bachani, “Who is the Barbecue King of Texas? A Geo-Spatial Approach to Finding Local Experts on Twitter,” in Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014, pp. 335-344.

CFDRA profile attributes: “age”, “gender”, “ethnicity/race”, “height”, “weight”, “diagnosed high blood sugar or pre-diabetes”, “diabetes family history”, “physical activity”, “ever observed high blood pressure”, “blood cholesterol”, “smoking”, and “ever diagnosed diabetes”.

EUR data: health related tweets extracted from Twitter.

Precision: when it predicts YES, how often is it correct?

Recall: when it is actually YES, how often does it predict YES?
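The three evaluation measures follow directly from the counts of true positives (TP), false positives (FP), and false negatives (FN): precision is TP / (TP + FP), recall is TP / (TP + FN), and the F-measure is their harmonic mean. A minimal sketch, with hypothetical counts and a zero fallback for empty denominators added as an assumption:

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 40 correct YES predictions, 10 wrong YES predictions,
# and 20 actual YES cases that were missed.
precision, recall, f_measure = precision_recall_f(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f_measure, 3))
```

Here precision is 0.8, recall is about 0.667, and the F-measure is about 0.727; true negatives do not enter any of the three measures.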


Case Study I: Experimental Results

[Figure: Proposed CFDRA performance comparison. Precision, recall, and F-measure (0 to 1) for CART, RF, LR, Naïve Bayes, BF, MLP, BayesNet, RoF, SVM, and CFDRA.]

[Figure: Precision, recall, and F-measure score comparison of the proposed EUR method against RowSum, Paul et al., and Cheng et al.; x-axis: no. of recommendations (5, 10, 15, 20).]

Case Study I: Scalability Analysis

[Figure: CFDRA scalability analysis by varying the no. of profiles and no. of processors; y-axis: time (sec.); x-axis: no. of processors (1 to 12); series: 5K, 10K, and 15K profiles.]

[Figure: EUR scalability analysis by varying the no. of profiles and no. of processors; y-axis: time (sec.); x-axis: no. of processors (1 to 11); series: 103 MB, 206 MB, and 309 MB.]

[Figure: CFDRA transactions per second (TPS) per processor; x-axis: no. of processors (2 to 12); series: 5K, 10K, and 15K profiles. TPS = no. of profiles compared.]

[Figure: EUR transactions per second (TPS) per processor; x-axis: no. of processors (2 to 12); series: 103 MB, 206 MB, and 309 MB. TPS = amount of data in MB.]

Case Study II: Health Insurance Plan Recommendation [1]

• Patient Protection and Affordable Care Act (PPACA)
  • Marketplaces
    • Medical plans (78,000) [2]
    • Dental plans (45,000) [3]
    • Expected increase in near future
  • Private insurance providers
• Limited capabilities of the contemporary Web based tools
  • Challenges
    • Multi-faceted requirements: cost, coverage
    • Information filtering: difficult to find relevant information

[1] A. Abbas, M. U. S. Khan, A. Yusoff, Y. Sadikaj, J. Ashley, and S. U. Khan, “Personalized Health Insurance Recommendation Services,” IEEE Transactions on Cloud Computing (under review).
[2] QHP landscape individual market, https://data.healthcare.gov/dataset/QHPLandscape-Individual-Market-Medical/b8in-sz6k, 2015 (accessed on April 12, 2018).
[3] Dental plan information for individuals and families, https://www.healthcare.gov/dental-plan-information/, 2015 (accessed on April 12, 2018).


Case Study II: Health Insurance Plan Recommendation [1]

[Figure: cloud based health insurance plan recommendation, with numbered steps 1 to 5. Users request health insurance plans through an interface to the cloud; insurance plans are retrieved from the Web and given an ontological representation; parallel jobs perform plan clustering, similarity computation, and plan ranking; a ranked list of plans is returned as implicit and explicit recommendations.]

Key Accomplishments:
• Plan evaluation based on various criteria, such as premium, copay, deductibles, and out-of-pocket limit
• Implicit plan recommendations at the start (solution to the cold start issue)
• Explicit plan recommendations based on user stated requirements
• Plan clustering to minimize the number of comparisons
• A ranking methodology to rank the plans
• A methodology to avoid the long-tail issue of recommender systems

Implicit Recommendations
• Recommendations offered on first interaction with the system
• Based on plan popularity
• Initial popularity computation to overcome cold start

Explicit Recommendations
• Recommendations based on user stated requirements
• Similarity between the plans and requirements
• Ranking using Multi-attribute Utility Theory

Cluster identification

Plan ranking (only the structure of the formula survives in the slide text; the arguments of the similarity term are not recoverable):

\[
\mathrm{Rank}_i = \bigl( \mathrm{sim}_i \times ( w_i \times s_i ) \bigr)
\]

where sim_i is the similarity score, w_i the weights of the decision criteria, and s_i the satisfiability.
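The ranking on this slide combines a similarity score with weights of decision criteria and a satisfiability term, in the spirit of Multi-attribute Utility Theory. The sketch below shows one way such a score could be computed. It is an assumption-heavy illustration rather than the authors' method: the satisfiability function, the summation over criteria, the weights, the per-plan similarity values, and the plan data are all hypothetical.

```python
CRITERIA = ("premium", "copay", "deductible", "out_of_pocket_limit")


def satisfiability(plan_value, user_limit):
    """1.0 when the plan meets the user's stated limit, else a partial score.

    This ratio-based definition is an assumption made for the sketch.
    """
    if plan_value <= user_limit:
        return 1.0
    return user_limit / plan_value


def rank_plans(plans, requirements, weights, similarity):
    """Order plans by similarity times the weighted satisfiability utility."""
    scores = {}
    for name, plan in plans.items():
        utility = sum(weights[c] * satisfiability(plan[c], requirements[c])
                      for c in CRITERIA)
        scores[name] = similarity[name] * utility
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical plans, user requirements (upper limits), and weights.
plans = {
    "plan_a": {"premium": 300, "copay": 20, "deductible": 1000, "out_of_pocket_limit": 5000},
    "plan_b": {"premium": 450, "copay": 10, "deductible": 500, "out_of_pocket_limit": 4000},
    "plan_c": {"premium": 250, "copay": 40, "deductible": 3000, "out_of_pocket_limit": 8000},
}
requirements = {"premium": 350, "copay": 25, "deductible": 1500, "out_of_pocket_limit": 6000}
weights = {"premium": 0.4, "copay": 0.1, "deductible": 0.3, "out_of_pocket_limit": 0.2}
similarity = {"plan_a": 0.9, "plan_b": 0.8, "plan_c": 0.7}  # assumed precomputed

ranked = rank_plans(plans, requirements, weights, similarity)
print(ranked)
```

In this example plan_a satisfies every stated limit and ranks first, while plan_c is penalized on three of the four criteria.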


Case Study II: Experimental Results

[Figure: recall, precision, and F-measure scores (0 to 1) for Voronoi, DBSCAN, FCM, and Bclust.; x-axis: cluster size (5 to 25).]

[Figure: scalability analysis; y-axis: time (sec); x-axis: no. of processors (1 to 11); series: 3K, 6K, and 12K plans.]

Compared clustering methods
• DBSCAN: clustering based on density connected points
• Fuzzy C Mean (FCM): closeness to center
• Voronoi: partitioning into cells based on ranking distance of plans
• Bayesian Clustering (Bclust.): cluster merging through statistical hypothesis test

Case Study III: A Route Recommendation Service For Large-scale Evacuations [1]

[Figure: route recommendation service architecture, with numbered steps 1 to 4. Labels: TCC, RSU, congestion, computer cluster, real-time map processing, real-time route computation. Real-time routes are recommended through RSUs and other media.]

Route recommendation service:
• Checks space in each shelter
• Location of each member of group
• Density and congestion on each route
• Routes with least time to reach the same shelter for each member of group

Key Accomplishments:
• A scalable service capable of route recommendation during an emergency evacuation:
  • efficient traffic flows
  • leads to minimum congestion of the roads

Challenges:
• Scalability:
  • big data graphs handling and partitioning
• Dynamic factors:
  • road congestions
  • road safety
  • shelter space

[1] M. U. S. Khan, O. Khalid, Y. Huang, F. Zhang, R. Ranjan, S. U. Khan, J. Cao, K. Li, B. Veeravalli, and A. Zomaya, “MacroServ: A Route Recommendation Service for Large-Scale Evacuations,” IEEE Transactions on Services Computing, vol. 10, no. 4, pp. 589-602, 2017.

Case Study III: Experimental Results

[Figure: average travel times with varying number of departing vehicles from each intersection; y-axis: average travel time (minutes); x-axis: number of cars per minute.]

[Figure: effect of road damage by varying departure time (scale parameter α of Weibull distribution); y-axis: average travel time (mins); x-axis: scale parameter α.]

[Figure: average congestion with respect to time with damaged road network; y-axis: average congestion; x-axis: time (mins).]

[Figure: effect of population increase in future 3 years on average car travel time on damaged network; y-axis: average travel time (mins); x-axis: year.]

Other axis label recovered from the slide: average evacuations per minute.

Case Study III: Experimental Results (continued)

[Figures: number of messages versus number of partitions; communication/computation versus number of partitions; average recommendation generation time (sec) versus number of partitions.]

• Doubling the size of the region increases the recommendation generation time by an average of 26%.
• Adding a single processor decreases the recommendation generation time by an average of 9%.
• Doubling the size of the map decreases the average number of vehicles crossing from one zone to another by 76%.

Future Work

• Case Study I:
  • Identification of health experts from the same geographical area where enquiring users reside
  • Identification of fake Twitter profiles through tweet analysis
• Case Study II:
  • Insurance plan recommendation through existing users' characteristics
• Case Study III:
  • Considering additional parameters for emergency evacuations:
    • Drivers' behavior
    • Evacuees' compliance with the recommended routes
