
YICHIA WANG 2011/01/10

Weekly Meeting

Big Picture

Study the relationships between the support that a member receives from health support groups and his/her commitment to the groups

Hypotheses -> literature review
Appropriate data -> breast cancer community
Better ground truth regarding social support -> MTurk
Better social support classifiers -> error analysis

Rousseau and Aube, 2010

Supervisor and coworker support -> affective commitment (-> staying in the organization)
Supervisor support and coworker support are additive
Moderators:
Job resource adequacy
Ambient conditions

Social support -?-> commitment to discussion groups (-?-> improvement in quality of life)
Roles of support provider: doctor, caregiver, etc.
Types of support: informational and emotional
Moderators:
Forum functions
Individual status
The relationship between the support provider and receiver
…

Breast Cancer Discussion Forums - Data Collection

Study the relationships between the support produced in these support groups and members’ commitment to the groups.

Collaborate with Dong

Crawl data:
All user profiles: 82,150 members
All threads: 66,532 threads in 65 forums

Mechanical Turk

Mechanical Turk Results

100 ACOR messages + 10 ACOR messages with gold standard
Assignments completed: 1100/1100 (100%)
Average submit time: 83 seconds

Result analysis
Cronbach’s alpha for 110 messages:
Emotional support = 0.91
Informational support = 0.91
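Cronbach's alpha treats each of the k judgments per message as an "item": alpha = k/(k-1) * (1 - sum of item variances / variance of the summed score). A minimal sketch, assuming the ratings for one support type are arranged as one row per message with the same number of numeric judgments in every row. On MTurk the workers differ from message to message, so the columns are judgment slots rather than fixed raters; how the slides arranged them is not stated. The function name and toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for ratings given as one row per message.

    Each row holds k numeric judgments; judgment slot j is treated as item j.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(row totals))
    """
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every message needs the same number (>= 2) of judgments")
    # Variance of each judgment slot across messages.
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed score per message.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: two perfectly agreeing judgment slots.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```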

Correlation for 10 messages:
Individual judgment level = 0.59
Average judgment level = 0.87
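The two correlation levels can be read as follows (an interpretation, not stated on the slide): "individual" correlates every single worker judgment with the gold score of its message, while "average" first averages the workers' judgments per message and then correlates those means with the gold scores. Averaging cancels some per-worker noise, which is consistent with 0.87 > 0.59. A minimal sketch under that assumption; the function names and toy numbers are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation; raises ZeroDivisionError if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def individual_level(judgments, gold):
    """Correlate every single worker judgment with its message's gold score."""
    xs = [j for row in judgments for j in row]
    ys = [g for row, g in zip(judgments, gold) for _ in row]
    return pearson(xs, ys)


def average_level(judgments, gold):
    """Correlate the mean judgment per message with the gold score."""
    means = [sum(row) / len(row) for row in judgments]
    return pearson(means, gold)


# Hypothetical toy data: 3 gold-standard messages, 2 workers each.
judgments = [[1, 2], [2, 3], [4, 3]]
gold = [1, 2, 3]
print(individual_level(judgments, gold), average_level(judgments, gold))
```

On this toy data the individual level is roughly 0.85 while the averaged level is 1.0, the same direction of effect as on the slide.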

Acquiring More MTurk Data

Upload more data:
1000 messages from breast cancer communities
50 messages from ACOR
50 messages from Bambina
$0.05 * 1100 * 10 * 1.1 = $605
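The budget line multiplies the reward per assignment ($0.05) by the number of messages (1000 + 50 + 50 = 1100), the workers per message (10, as in the 1100 assignments for 110 ACOR messages), and 1.1, presumably Amazon's 10% MTurk commission at the time. A minimal sketch of that arithmetic; the function name is hypothetical, and `Decimal` is used so the cents do not drift the way binary floats can.

```python
from decimal import Decimal


def mturk_cost(n_messages, workers_per_message, reward_per_assignment, fee_rate):
    """Budget = reward per assignment x messages x workers per message x (1 + fee).

    Values are converted through str() to Decimal so cents do not drift
    the way binary floats can.
    """
    reward = Decimal(str(reward_per_assignment))
    fee = Decimal(str(fee_rate))
    return reward * n_messages * workers_per_message * (1 + fee)


# 1000 breast cancer + 50 ACOR + 50 Bambina messages, 10 workers each,
# $0.05 per assignment, assumed 10% MTurk fee.
print(mturk_cost(1000 + 50 + 50, 10, 0.05, 0.10))  # 605.000
```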

ACOR VERSUS BAMBINA

Social Support Classifier

Bambina’s Corpus

SOL-Cancer Forum (support online cancer forum)
First 2 weeks of March 2000
1149 messages

Emotional support: 519 messages
Informational support: 359 messages

Inter-rater reliability on 400 messages: > 70%

Providing-emotional-support

Bambina features | # of features | Naïve Bayes Acc. | Naïve Bayes Kappa | SVM Acc. | SVM Kappa
TagHelper | 300 | 0.65 | 0.32 | 0.74 | 0.46
Domain Kldg | All (10) | 0.63 | 0.27 | 0.66 | 0.29
LDA | All (20) | 0.57 | 0.19 | 0.65 | 0.28
Domain Kldg + LDA | All (30) | 0.59 | 0.23 | 0.70 | 0.39
TagHelper + Domain Kldg | 300 | 0.66 | 0.33 | 0.75 | 0.48
TagHelper + LDA | 300 | 0.64 | 0.30 | 0.73 | 0.46
TagHelper + Domain Kldg + LDA | 300 | 0.64 | 0.31 | 0.74 | 0.47
TagHelper | 50 | 0.62 | 0.28 | 0.73 | 0.45
TagHelper + Domain Kldg + LDA | 50 | 0.62 | 0.27 | 0.75 | 0.49

SVM Model Feature Analysis (PES)

Weight | Feature | Weight | Feature | Weight | Feature
1.6493 | agree | 0.9240 | chemo | -0.6776 | VBD_FW
1.5327 | regards | 0.8897 | faith | -0.7068 | lda16
1.4401 | inspirational | 0.8844 | marlene | -0.7331 | carcinoma
1.3882 | alot | 0.8589 | NNS_VBN | -0.7405 | single
1.3803 | lda14 | 0.8530 | among | -0.7668 | s
1.3555 | POLAR_POS | 0.8287 | deb | -0.7682 | 6
1.3322 | chris | 0.6826 | your | -0.7718 | lda17
1.3189 | caregiver | 0.6813 | great | -0.7907 | CD_CC
1.2998 | researchers | 0.6686 | according | -0.7921 | chemotherapy
1.2947 | don | 0.6607 | recent | -0.7990 | 0
1.2789 | prayers | 0.6520 | tumors | -0.8324 | lda20
1.2472 | husband | 0.6486 | love | -0.8351 | overall
1.0985 | physicians | 0.6279 | 20 | -0.8397 | lol
1.0561 | thanks | 0.6236 | hope | -0.8407 | medscape
1.0315 | joy | 0.6223 | certainly | -0.8474 | 20www
1.0236 | luck | 0.6120 | _NNP | -0.8589 | medicine
1.0195 | god | 0.6060 | lda12 | -0.8668 | were
1.0134 | art | 0.5626 | VBP_PRP<dollar> | -0.9670 | effect
1.0042 | tommy | 0.5624 | hear | -0.9703 | invite
1.0000 | coaster | 0.5558 | glad | -0.9960 | use
1.0000 | roller | 0.5493 | things | -1.0000 | treating
0.9864 | oncology | 0.5411 | andy | -1.0443 | lda11
0.9491 | studies | 0.5195 | informational | -1.0628 | lda5
0.9354 | thank | 0.5174 | following | -1.2514 | drugs
0.9256 | SYM_CD | 0.5168 | qksk | -1.3466 | american
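A ranking like the one above can be pulled from any linear model by sorting its per-feature weights. A minimal sketch, assuming the weights have already been exported as a feature-to-weight mapping (the slides do not say which toolkit produced them) and, as the feature lists suggest, that positive weights point toward the support class. The function name is hypothetical.

```python
def top_features(weights, k=5):
    """Return the k most positive and the k most negative (feature, weight) pairs.

    `weights` maps feature name -> weight of a linear classifier.
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    positive = [kv for kv in ranked if kv[1] > 0][:k]
    # Walk the ranking from the bottom to get the most negative weights first.
    negative = [kv for kv in reversed(ranked) if kv[1] < 0][:k]
    return positive, negative


# A few weights copied from the PES table above.
pes = {"agree": 1.6493, "regards": 1.5327, "chemo": 0.9240,
       "VBD_FW": -0.6776, "drugs": -1.2514, "american": -1.3466}
print(top_features(pes, k=2))
# ([('agree', 1.6493), ('regards', 1.5327)], [('american', -1.3466), ('drugs', -1.2514)])
```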

lda14

time care

people best

life lot

tell stuff

heart having

make find

feel took

say comes

going second

better pity

today pot

sure night

home living

way beautiful

things able

long journey

friends hours

days chance

know couple

age world

got worked

tomorrow place

times word

think keep

live married

lda5

www x

information main

thalidomide page

death morning

site leukemia

ask htm

children edu

com adenocarcinoma

talking links

sites difficult

forum line

name s

welcome similar

help origin

20http y

new wonderful

talk lillian

html asp

brain metasearch

friend stay

org learn

support myeloma

melanoma positive

full about

yes 3f
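Word lists such as lda14 and lda5 are typically produced by ranking the vocabulary by each topic's word weights, and the features lda1 … lda20 are presumably a message's topic proportions. A minimal sketch, assuming a topic-word weight matrix and a per-message topic distribution from some LDA implementation (neither is specified on the slides); the function names, the 1-based feature numbering, and the toy data are hypothetical.

```python
def top_words(topic_word, vocab, n=10):
    """List the n highest-weight words for each topic.

    topic_word[t][i] is the weight (e.g. probability) of vocab[i] in topic t.
    """
    tops = []
    for weights in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: weights[i], reverse=True)
        tops.append([vocab[i] for i in ranked[:n]])
    return tops


def lda_features(theta):
    """Turn one message's topic proportions into features lda1, lda2, ...

    The 1-based naming is an assumption about how lda1..lda20 were numbered.
    """
    return {f"lda{t}": p for t, p in enumerate(theta, start=1)}


# Hypothetical 2-topic model over a 4-word vocabulary.
vocab = ["www", "prayers", "hope", "com"]
topic_word = [[0.5, 0.0, 0.1, 0.4], [0.0, 0.6, 0.3, 0.1]]
print(top_words(topic_word, vocab, n=2))  # [['www', 'com'], ['prayers', 'hope']]
print(lda_features([0.25, 0.75]))         # {'lda1': 0.25, 'lda2': 0.75}
```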

Error Analysis: Emotional-support

Confusion matrix
Emotional support usually occurs at the beginning or at the end of a message
Extract features from specific parts of messages
Ambiguities caused by positive words
We should focus on specific categories of positive words, such as encouragement and prayer
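One way to act on the suggestion above to extract features from specific parts of messages is to emit separate unigram features for the opening and closing sentences. A minimal sketch under stated assumptions: naive sentence splitting on ". ! ?", lowercase word tokens, and hypothetical BEGIN_/END_ prefixes; the slides do not say how the position-specific features would be built.

```python
import re


def positional_unigrams(message, n_sentences=1):
    """Unigram features from the first and last n sentences of a message.

    Tokens are prefixed with BEGIN_ or END_ so a classifier can weight
    words differently by position. Sentence splitting is a naive split on
    . ! ? followed by whitespace.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", message.strip()) if s]
    begin = sentences[:n_sentences]
    # Skip END_ when the message has at most n_sentences sentences
    # (they are all BEGIN_ sentences already).
    end = sentences[-n_sentences:] if len(sentences) > n_sentences else []
    features = set()
    for prefix, part in (("BEGIN_", begin), ("END_", end)):
        for sentence in part:
            for token in re.findall(r"[a-z']+", sentence.lower()):
                features.add(prefix + token)
    return features


print(sorted(positional_unigrams("Hi Betty. The pump runs all day. Hugs and prayers!")))
# ['BEGIN_betty', 'BEGIN_hi', 'END_and', 'END_hugs', 'END_prayers']
```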

   a   b   <-- classified as
 505 125 |   a = No
 164 355 |   b = Yes
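Accuracy and Cohen's kappa can be recovered directly from a Weka-style confusion matrix like the one above (rows are actual classes, columns are the "classified as" predictions). A minimal sketch; the function name is hypothetical. For this matrix it gives accuracy of about 0.75 and kappa of about 0.49, the same values as the TagHelper + Domain Kldg + LDA (50 features) SVM row of the emotional-support table, so presumably the matrix comes from that model.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] counts messages whose actual class is i and predicted class is j
    (Weka layout: rows = actual, columns = "classified as").
    """
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance given the actual and predicted class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, (observed - expected) / (1 - expected)


acc, kappa = accuracy_and_kappa([[505, 125], [164, 355]])
print(round(acc, 2), round(kappa, 2))  # 0.75 0.49
```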

Providing-information-support

Bambina features | # of features | Naïve Bayes Acc. | Naïve Bayes Kappa | SVM Acc. | SVM Kappa
TagHelper | 300 | 0.77 | 0.47 | 0.82 | 0.58
Domain Kldg | All (10) | 0.76 | 0.40 | 0.70 | 0.08
LDA | All (20) | 0.82 | 0.54 | 0.81 | 0.49
Domain Kldg + LDA | All (30) | 0.83 | 0.56 | 0.83 | 0.54
TagHelper + Domain Kldg | 300 | 0.77 | 0.47 | 0.82 | 0.58
TagHelper + LDA | 300 | 0.80 | 0.53 | 0.82 | 0.58
TagHelper + Domain Kldg + LDA | 300 | 0.81 | 0.53 | 0.83 | 0.58
TagHelper | 50 | 0.79 | 0.48 | 0.83 | 0.55
TagHelper + Domain Kldg + LDA | 50 | 0.81 | 0.52 | 0.84 | 0.58

SVM Model Feature Analysis (PIS)

Weight | Feature | Weight | Feature | Weight | Feature
1.9209 | number | 0.8031 | journal | -0.5906 | our
1.7349 | opinion | 0.8010 | national | -0.6830 | IN_CD
1.6541 | food | 0.7792 | known | -0.6853 | nextpart
1.4625 | medical | 0.7760 | NNP_MD | -0.7040 | studies
1.3858 | diagnosis | 0.7646 | fda | -0.7050 | lda8
1.3204 | org | 0.7634 | FW_ | -0.7067 | 20warmly
1.2818 | risk | 0.7625 | PERCENT_SIGN | -0.7096 | VBG_NN
1.1778 | lda2 | 0.7569 | women | -0.7136 | trials
1.1382 | JJR_NNS | 0.7414 | 800 | -0.7171 | line_length
1.1250 | tips | 0.7393 | 20we | -0.7288 | CC_NNP
1.0686 | dr | 0.7392 | SUBJ_WEAK | -0.7616 | CC_JJ
1.0320 | 20www | 0.7101 | 20the | -0.8033 | com
1.0061 | x | 0.7099 | NNS_VBZ | -0.8042 | lda16
1.0000 | associated | 0.7015 | tumor | -0.8399 | SUBJ_STRONG
0.9974 | _FW | 0.6836 | _ | -0.8519 | PRP<dollar>_NNP
0.9913 | study | 0.6810 | institute | -0.8560 | IN_VBN
0.9458 | SEMI_COLON | 0.6810 | increased | -0.8609 | CD_JJ
0.9284 | among | 0.6649 | cell | -0.9168 | look
0.9082 | lda10 | 0.6529 | lda12 | -0.9889 | POS_ADJ
0.8938 | advanced | 0.6320 | NNPS_NNP | -0.9891 | including
0.8718 | NNS_ | 0.6239 | blood | -1.1614 | cells
0.8715 | trial | 0.6146 | lda9 | -1.2121 | s
0.8495 | within | 0.6042 | TO_JJ | -1.4048 | lda4
0.8170 | lda1 | 0.5963 | VBZ_JJ | -1.4447 | lda18
0.8112 | following | 0.5828 | survival | -1.6828 | thanks

lda1

years cancer

m seeds

d primary

blood conventional

children 29

20 injury

hospital philadelphia

old serious

free question

john tumors

physician mucosal

white net

cell visit

anderson life

info study

phase brain

iron head

apricot co

eligible protocol

fda everyone

therapy appear

problems suggestion

course helped

side refractory

research seed

lda4

message postema

original truly

hi reply

thanks certainly

i chuck

jeanne list

mom address

nancy mcafee

from prayers

you gone

thank debby

lillian hopefully

nanc michigan

send sure

john attitude

hear hall

love 95

wrote with

glad too

ya pc

peggy grodin

kissinger touch

hope rambeau

deb watch

mail joy

Error Analysis: Informational-support

Confusion matrix
Informational support can occur anywhere in a message
More present tense

(Predicted: No, actual: Yes)
dusti i have on an infusion pump. it is connected to a line in my chest and pumps 24 hrs a day. the pump is carried in a fanny pack or i can lay it beside me. there is enough line to allow you to move a short distance. my husband gives me the shots and i do get flu like symptoms from them. i am still hoping the gemzar will help your mom. it is weird but sometimes one person will respond and another won't so don't give up. betty

   a   b   <-- classified as
 761  29 |   a = No
 159 200 |   b = Yes
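Per-class precision and recall make the dominant error in the confusion matrix above explicit: 159 informational messages are predicted as No. A minimal sketch, assuming the Weka layout (rows = actual, columns = predicted); the function name is hypothetical. For this matrix recall is about 0.96 for No but only about 0.56 for Yes, while Yes precision is about 0.87, so the classifier mostly misses informational support rather than inventing it. The overall accuracy (961/1149, about 0.84) matches the TagHelper + Domain Kldg + LDA (50 features) SVM row of the informational-support table.

```python
def per_class_scores(matrix, labels):
    """Precision, recall and F1 per class from a Weka-style confusion matrix
    (rows = actual class, columns = predicted class)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


for label, (p, r, f) in per_class_scores([[761, 29], [159, 200]], ["No", "Yes"]).items():
    print(label, round(p, 2), round(r, 2), round(f, 2))
```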

Discussion

We can manually select meaningful and important features from the trained machine learning models and use them to construct a feature set for social support detection

References

Domain Knowledge Features

DRUG: Normalized number of drug mentions in a message

POLAR_POS: Normalized number of positive clues in a message

POLAR_NEG: Normalized number of negative clues in a message

SUBJ_STRONG: Normalized number of strongly subjective clues in a message

SUBJ_WEAK: Normalized number of weakly subjective clues in a message
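The lexicon-based features above all count how many tokens of a message match a word list and normalize the count. A minimal sketch, assuming normalization by the number of tokens and using tiny hypothetical lexicons; the actual drug list and polarity/subjectivity clue lists are not given on the slides (the clue terminology resembles a subjectivity lexicon such as MPQA's, but that is an assumption). For the example message it returns 1/7 for DRUG and 3/7 for POLAR_POS.

```python
import re

# Hypothetical toy lexicons; the real drug list and polarity/subjectivity
# clue lists behind DRUG, POLAR_*, SUBJ_* are not given on the slides.
LEXICONS = {
    "DRUG": {"gemzar", "thalidomide"},
    "POLAR_POS": {"glad", "love", "hope"},
}


def normalized_clue_counts(message, lexicons=LEXICONS):
    """Per-feature count of lexicon hits divided by the number of tokens.

    Dividing by token count is an assumed normalization; multi-word clues
    and stemming are ignored in this sketch.
    """
    tokens = re.findall(r"[a-z0-9']+", message.lower())
    n = len(tokens) or 1  # avoid dividing by zero on empty messages
    return {name: sum(tok in words for tok in tokens) / n
            for name, words in lexicons.items()}


print(normalized_clue_counts("glad the gemzar helped, love and hope"))
```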

Domain Knowledge Features (2)

QUESTION: Number of question sentences in a message, using a rule-based approach for identifying questions
Direct questions: a sentence ends with “?”, or a sentence starts with a question word, such as “what”, “do”, …
Indirect questions: “I am wondering if …”, “I want …”

Negation: Normalized number of negation words in a message (not, couldn’t, won’t, …)
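The question and negation rules above can be sketched as follows. The word lists contain only the examples named on the slide (the slide's lists end in "…" and are not completed here), sentence splitting is naive, and dividing the negation count by the number of tokens is an assumed normalization; all function names are hypothetical.

```python
import re

# Only the examples the slide names; the slide's lists end in "…",
# so extend these for real use.
QUESTION_WORDS = {"what", "do"}
INDIRECT_PATTERNS = (r"\bi am wondering if\b", r"\bi want\b")
NEGATIONS = {"not", "couldn't", "won't"}


def _tokens(text):
    # Lowercase and map the typographic apostrophe to ' so "won’t" == "won't".
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def is_question(sentence):
    s = sentence.strip().lower()
    tokens = _tokens(s)
    direct = s.endswith("?") or (bool(tokens) and tokens[0] in QUESTION_WORDS)
    indirect = any(re.search(p, s) for p in INDIRECT_PATTERNS)
    return direct or indirect


def question_count(message):
    """QUESTION: number of sentences the rules flag as questions (not normalized)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", message) if s.strip()]
    return sum(is_question(s) for s in sentences)


def negation_ratio(message):
    """Negation: negation words divided by token count (assumed normalization)."""
    tokens = _tokens(message)
    return sum(t in NEGATIONS for t in tokens) / (len(tokens) or 1)


print(question_count("Do you know the dose? I am wondering if it helps. The pump runs all day."))  # 2
```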

Domain Knowledge Features (3)

POS_NNP: Normalized number of proper nouns in a message

POS_ADJ: Normalized number of adjectives in a message

POS_PAST: Normalized number of past-tense verbs in a message
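Given part-of-speech tags, the three POS features above reduce to counting tags in a group and dividing by message length. A minimal sketch, assuming the message has already been tagged with Penn Treebank tags by some POS tagger, that normalization is by token count, and that the tag groups below match the slides' definitions (for example, whether NNPS or VBN are included is a guess). The names are hypothetical.

```python
# Penn Treebank tag groups; which tags the slides' features actually count
# is an assumption (e.g. whether NNPS or VBN are included).
TAG_GROUPS = {
    "POS_NNP": {"NNP", "NNPS"},       # proper nouns
    "POS_ADJ": {"JJ", "JJR", "JJS"},  # adjectives
    "POS_PAST": {"VBD"},              # past-tense verbs
}


def pos_features(tagged_tokens):
    """Fraction of tokens in each tag group.

    `tagged_tokens` is a list of (word, tag) pairs from a Penn Treebank POS tagger.
    """
    n = len(tagged_tokens) or 1
    return {name: sum(tag in tags for _, tag in tagged_tokens) / n
            for name, tags in TAG_GROUPS.items()}


print(pos_features([("Betty", "NNP"), ("had", "VBD"), ("mild", "JJ"),
                    ("flu", "NN"), ("symptoms", "NNS")]))
# {'POS_NNP': 0.2, 'POS_ADJ': 0.2, 'POS_PAST': 0.2}
```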

Providing-emotional-support

Bambina features | # of features | Naïve Bayes Acc. | Naïve Bayes Kappa | SVM Acc. | SVM Kappa
TagHelper | 300 | 0.65 | 0.32 | 0.74 | 0.46
TagHelper + Domain Kldg | 300 | 0.66 | 0.33 | 0.75 | 0.48
Domain Kldg | All (10) | 0.63 | 0.27 | 0.66 | 0.29

ACOR features | # of features | Naïve Bayes Acc. | Naïve Bayes Kappa | SVM Acc. | SVM Kappa
LIWC | 52 | 0.79 | 0.24 | 0.85 | 0.10
LIWC + TagHelper | 300 | 0.80 | 0.35 | 0.85 | 0.31
LIWC + TagHelper + Domain Kldg | 300 | 0.79 | 0.34 | 0.85 | 0.30
LIWC + TagHelper + Domain Kldg + LDA | 300 | 0.77 | 0.31 | 0.85 | 0.30
LIWC + Domain Kldg + LDA | All (80) | 0.71 | 0.22 | 0.86 | 0.00

Providing-information-support

ACOR features

# of features Naïve Bayes Acc. Kappa

SVM Acc. Kappa

LIWC 52

LIWC + Unigrams 300

LIWC + Unigrams + Domain Kldg 300 0.66 0.33

LIWC + Unigrams + Domain Kldg + LDA 300 0.68 0.33

LIWC + Domain Kldg + LDA All (80) 0.64 0.28 0.69 0.38

Bambina features | # of features | Naïve Bayes Acc. | Naïve Bayes Kappa | SVM Acc. | SVM Kappa
TagHelper | 300 | 0.77 | 0.47 | 0.82 | 0.58
TagHelper + Domain Kldg | 300 | 0.77 | 0.47 | 0.82 | 0.58
Domain Kldg | All (10) | 0.76 | 0.40 | 0.70 | 0.08
