YICHIA WANG 2011/01/10 Weekly Meeting. Big Picture Study the relationships between the support that...
Big Picture
Study the relationships between the support that a member received from health support groups and his/her commitment to the groups
Hypotheses -> literature review
Appropriate data -> breast cancer community
Better ground truth regarding social support -> mTurk
Better social support classifiers -> error analysis
Rousseau and Aube, 2010
Supervisor and coworker support -> affective commitment (-> stay in the organization)
Supervisor and coworker support are additive
Moderators
  Job resource adequacy
  Ambient conditions
Social support –?-> commitment to discussion groups (–?-> improvement of life quality)
Roles of support provider: doctor, caregiver, … etc.
Types of support: informational and emotional
Moderators
  Forum functions
  Individual status
  The relationship between the support provider and receiver
  …
Breast Cancer Discussion Forums - Data Collection
Study the relationships between the support produced in these support groups and members’ commitment to the groups.
Collaborate with Dong
Crawl data
  All user profiles: 82,150 members
  All threads: 65 forums, 66,532 threads
Mechanical Turk Results
100 ACOR messages + 10 ACOR messages with gold standard
Assignments completed: 1100/1100 (100%)
Average submit time: 83 seconds
Result analysis
  Cronbach’s alpha for 110 messages
    Emotional support = 0.91
    Informational support = 0.91
  Correlation for 10 messages
    Individual judgment level = 0.59
    Average judgment level = 0.87
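Cronbach's alpha summarizes how consistently the raters scored the same messages. A minimal sketch of the standard formula follows, treating each rater as an item and each message as a case. That framing, the function name `cronbach_alpha`, and the toy ratings are assumptions for illustration, not the study's data or procedure.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """ratings[i][j] is the score that rater j gave to message i."""
    k = len(ratings[0])
    if k < 2:
        raise ValueError("need at least two raters")
    # Variance of each rater's scores across all messages.
    rater_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed score per message.
    total_var = pvariance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)


# Hypothetical toy ratings: 4 messages, 3 raters.
toy_ratings = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5]]
print(round(cronbach_alpha(toy_ratings), 2))
```

Values close to 1, such as the 0.91 reported above, indicate that the raters largely agree on the ordering of the messages.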
Acquiring More Mturk Data
Upload more data
  1000 messages from breast cancer communities
  50 messages from ACOR
  50 messages from Bambina
$0.05 * 1100 * 10 * 1.1 = $605
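The cost figure above can be reproduced with a short calculation. This sketch assumes that 10 is the number of ratings collected per message (consistent with 1100 assignments for 110 messages in the earlier batch), that $0.05 is the reward per rating, and that the factor 1.1 is a 10% service fee. Those readings and the function name `batch_cost` are assumptions, not stated on the slide.

```python
from decimal import Decimal


def batch_cost(n_messages, ratings_per_message, reward, fee_rate):
    """Total cost in dollars; reward and fee_rate are passed as strings
    so that Decimal keeps the arithmetic exact."""
    base = n_messages * ratings_per_message * Decimal(reward)
    return base * (1 + Decimal(fee_rate))


# 1000 breast cancer + 50 ACOR + 50 Bambina messages.
n_messages = 1000 + 50 + 50
total = batch_cost(n_messages, 10, "0.05", "0.10")
print(f"${total:.2f}")
```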
Bambina’s Corpus
SOL-Cancer Forum (support online cancer forum)
First 2 weeks of March 2000
1149 messages
Emotional support: 519 messages Informational support: 359 messages
Inter-rater reliability of 400 messages > 70%
Providing-emotional-support
Bambina
                                                       Naïve Bayes       SVM
Features                               # of features   Acc.    Kappa     Acc.    Kappa
TagHelper                              300             0.65    0.32      0.74    0.46
Domain Kldg                            All (10)        0.63    0.27      0.66    0.29
LDA                                    All (20)        0.57    0.19      0.65    0.28
Domain Kldg + LDA                      All (30)        0.59    0.23      0.70    0.39
TagHelper + Domain Kldg                300             0.66    0.33      0.75    0.48
TagHelper + LDA                        300             0.64    0.30      0.73    0.46
TagHelper + Domain Kldg + LDA          300             0.64    0.31      0.74    0.47
TagHelper                              50              0.62    0.28      0.73    0.45
TagHelper + Domain Kldg + LDA          50              0.62    0.27      0.75    0.49
SVM Model Feature Analysis (PES)
Weight Feature Weight Feature Weight Feature
1.6493 agree 0.9240 chemo -0.6776 VBD_FW
1.5327 regards 0.8897 faith -0.7068 lda16
1.4401 inspirational 0.8844 marlene -0.7331 carcinoma
1.3882 alot 0.8589 NNS_VBN -0.7405 single
1.3803 lda14 0.8530 among -0.7668 s
1.3555 POLAR_POS 0.8287 deb -0.7682 6
1.3322 chris 0.6826 your -0.7718 lda17
1.3189 caregiver 0.6813 great -0.7907 CD_CC
1.2998 researchers 0.6686 according -0.7921 chemotherapy
1.2947 don 0.6607 recent -0.7990 0
1.2789 prayers 0.6520 tumors -0.8324 lda20
1.2472 husband 0.6486 love -0.8351 overall
1.0985 physicians 0.6279 20 -0.8397 lol
1.0561 thanks 0.6236 hope -0.8407 medscape
1.0315 joy 0.6223 certainly -0.8474 20www
1.0236 luck 0.6120 _NNP -0.8589 medicine
1.0195 god 0.6060 lda12 -0.8668 were
1.0134 art 0.5626 VBP_PRP<dollar> -0.9670 effect
1.0042 tommy 0.5624 hear -0.9703 invite
1.0000 coaster 0.5558 glad -0.9960 use
1.0000 roller 0.5493 things -1.0000 treating
0.9864 oncology 0.5411 andy -1.0443 lda11
0.9491 studies 0.5195 informational -1.0628 lda5
0.9354 thank 0.5174 following -1.2514 drugs
0.9256 SYM_CD 0.5168 qksk -1.3466 american
lda14
time care
people best
life lot
tell stuff
heart having
make find
feel took
say comes
going second
better pity
today pot
sure night
home living
way beautiful
things able
long journey
friends hours
days chance
know couple
age world
got worked
tomorrow place
times word
think keep
live married
lda5
www x
information main
thalidomide page
death morning
site leukemia
ask htm
children edu
com adenocarcinoma
talking links
sites difficult
forum line
name s
welcome similar
help origin
20http y
new wonderful
talk lillian
html asp
brain metasearch
friend stay
org learn
support myeloma
melanoma positive
full about
yes 3f
Error Analysis: Emotional-support
Confusion matrix
    a    b   <-- classified as
  505  125 |  a = No
  164  355 |  b = Yes
Emotional support usually occurs at the beginning or at the end of the message
  Extract features from specific parts of messages
Ambiguities caused by positive words
  We should focus on specific categories of positive words, such as encouragement, prayer
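The accuracy and kappa columns in the result tables can be derived from a confusion matrix like the one above. A minimal sketch, assuming the usual Weka layout (rows are actual classes, columns are predicted classes); the function name `accuracy_and_kappa` is illustrative.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of messages of actual class i classified as j."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of messages on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement from the row (actual) and column (predicted) totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Emotional-support confusion matrix from the slide (rows: No, Yes).
acc, kappa = accuracy_and_kappa([[505, 125], [164, 355]])
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")
```

For this matrix the result is roughly 0.75 accuracy and 0.49 kappa, in line with the best SVM figures reported for emotional support.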
Providing-information-support
Bambina
                                                       Naïve Bayes       SVM
Features                               # of features   Acc.    Kappa     Acc.    Kappa
TagHelper                              300             0.77    0.47      0.82    0.58
Domain Kldg                            All (10)        0.76    0.40      0.70    0.08
LDA                                    All (20)        0.82    0.54      0.81    0.49
Domain Kldg + LDA                      All (30)        0.83    0.56      0.83    0.54
TagHelper + Domain Kldg                300             0.77    0.47      0.82    0.58
TagHelper + LDA                        300             0.80    0.53      0.82    0.58
TagHelper + Domain Kldg + LDA          300             0.81    0.53      0.83    0.58
TagHelper                              50              0.79    0.48      0.83    0.55
TagHelper + Domain Kldg + LDA          50              0.81    0.52      0.84    0.58
SVM Model Feature Analysis (PIS)
Weight Feature Weight Feature Weight Feature
1.9209 number 0.8031 journal -0.5906 our
1.7349 opinion 0.8010 national -0.6830 IN_CD
1.6541 food 0.7792 known -0.6853 nextpart
1.4625 medical 0.7760 NNP_MD -0.7040 studies
1.3858 diagnosis 0.7646 fda -0.7050 lda8
1.3204 org 0.7634 FW_ -0.7067 20warmly
1.2818 risk 0.7625 PERCENT_SIGN -0.7096 VBG_NN
1.1778 lda2 0.7569 women -0.7136 trials
1.1382 JJR_NNS 0.7414 800 -0.7171 line_length
1.1250 tips 0.7393 20we -0.7288 CC_NNP
1.0686 dr 0.7392 SUBJ_WEAK -0.7616 CC_JJ
1.0320 20www 0.7101 20the -0.8033 com
1.0061 x 0.7099 NNS_VBZ -0.8042 lda16
1.0000 associated 0.7015 tumor -0.8399 SUBJ_STRONG
0.9974 _FW 0.6836 _ -0.8519 PRP<dollar>_NNP
0.9913 study 0.6810 institute -0.8560 IN_VBN
0.9458 SEMI_COLON 0.6810 increased -0.8609 CD_JJ
0.9284 among 0.6649 cell -0.9168 look
0.9082 lda10 0.6529 lda12 -0.9889 POS_ADJ
0.8938 advanced 0.6320 NNPS_NNP -0.9891 including
0.8718 NNS_ 0.6239 blood -1.1614 cells
0.8715 trial 0.6146 lda9 -1.2121 s
0.8495 within 0.6042 TO_JJ -1.4048 lda4
0.8170 lda1 0.5963 VBZ_JJ -1.4447 lda18
0.8112 following 0.5828 survival -1.6828 thanks
lda1
years cancer
m seeds
d primary
blood conventional
children 29
20 injury
hospital philadelphia
old serious
free question
john tumors
physician mucosal
white net
cell visit
anderson life
info study
phase brain
iron head
apricot co
eligible protocol
fda everyone
therapy appear
problems suggestion
course helped
side refractory
research seed
lda4
message postema
original truly
hi reply
thanks certainly
i chuck
jeanne list
mom address
nancy mcafee
from prayers
you gone
thank debby
lillian hopefully
nanc michigan
send sure
john attitude
hear hall
love 95
wrote with
glad too
ya pc
peggy grodin
kissinger touch
hope rambeau
deb watch
mail joy
Error Analysis: Informational-support
Confusion matrix
    a    b   <-- classified as
  761   29 |  a = No
  159  200 |  b = Yes
Informational support can occur anywhere in a message
More present tense
(Predicted: no and actual: yes)
  dusti i have on an infusion pump. it is connected to a line in my chest and pumps 24 hrs a day. the pump is carried in a fanny pack or i can lay it beside me. there is enough line to allow you to move a short distance. my husband gives me the shots and i do get flu like symptoms from them. i am still hoping the gemzar will help your mom. it is weird but sometimes one person will respond and another won't so don't give up. betty
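In the confusion matrix above, most errors are informational messages that the classifier missed (159 predicted "No" but actually "Yes", against 29 the other way round). A minimal sketch that makes this asymmetry explicit through precision and recall for the "Yes" class, again assuming rows are actual classes and columns are predicted classes; the function name `precision_recall` is illustrative.

```python
def precision_recall(matrix, positive=1):
    """Precision and recall for one class of a confusion matrix
    whose rows are actual classes and columns are predicted classes."""
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp                 # actual Yes, predicted No
    fp = sum(row[positive] for row in matrix) - tp  # actual No, predicted Yes
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Informational-support confusion matrix from the slide (rows: No, Yes).
precision, recall = precision_recall([[761, 29], [159, 200]])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here precision is 200/229 (about 0.87) while recall is 200/359 (about 0.56), so the classifier is conservative: what it labels as informational support is usually right, but it misses many such messages, like the example shown above.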
Discussion
We can manually select meaningful and important features from machine learning models and construct a feature set for social support detection
Domain Knowledge Features
DRUG Normalized number of drugs in a message
POLAR_POS Normalized number of positive clues in a message
POLAR_NEG Normalized number of negative clues in a message
SUBJ_STRONG Normalized number of strong subjective clues in a message
SUBJ_WEAK Normalized number of weak subjective clues in a message
Domain Knowledge Features (2)
QUESTION Number of question sentences in a message
  Rule-based approach for identifying questions
    Direct questions
      A sentence ends with “?”
      A sentence starts with question words, such as “what”, “do”, …
    Indirect questions
      I am wondering if …
      I want …
Negation Normalized number of negation words in a message
  not, couldn’t, won’t, …
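The QUESTION and Negation rules above can be sketched in a few lines. The word lists below contain only the examples named on the slide (the full lists are elided there), the sentence splitter is deliberately naive, and normalizing by the number of tokens in the message is an assumption, since the slide does not say what the counts are normalized by. All function and constant names are hypothetical.

```python
import re

# Only the examples given on the slide; the real lists are longer.
QUESTION_WORDS = {"what", "do"}
INDIRECT_PATTERNS = ("i am wondering if", "i want")
NEGATION_WORDS = {"not", "couldn't", "won't"}


def _normalize(text):
    # Lowercase and map curly apostrophes to straight ones.
    return text.lower().replace("\u2019", "'")


def split_sentences(message):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", message) if s.strip()]


def is_question(sentence):
    text = _normalize(sentence.strip())
    if text.endswith("?"):                      # direct: ends with "?"
        return True
    words = re.findall(r"[a-z']+", text)
    if words and words[0] in QUESTION_WORDS:    # direct: starts with a question word
        return True
    return text.startswith(INDIRECT_PATTERNS)   # indirect question patterns


def count_questions(message):
    return sum(is_question(s) for s in split_sentences(message))


def normalized_negation(message):
    # Assumption: normalize by the total number of tokens in the message.
    tokens = re.findall(r"[a-z']+", _normalize(message))
    if not tokens:
        return 0.0
    return sum(t in NEGATION_WORDS for t in tokens) / len(tokens)


msg = "I am wondering if gemzar helps. What did your doctor say? It did not work."
print(count_questions(msg), normalized_negation(msg))
```

Simple prefix rules like "I want" will also match non-questions, so a rule set of this kind trades precision for coverage.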
Domain Knowledge Features (3)
POS_NNP Normalized number of proper nouns in a message
POS_ADJ Normalized number of adjectives in a message
POS_PAST Normalized number of past tense verbs in a message
Providing-emotional-support
Bambina
                                                       Naïve Bayes       SVM
Features                               # of features   Acc.    Kappa     Acc.    Kappa
TagHelper                              300             0.65    0.32      0.74    0.46
TagHelper + Domain Kldg                300             0.66    0.33      0.75    0.48
Domain Kldg                            All (10)        0.63    0.27      0.66    0.29
ACOR
                                                       Naïve Bayes       SVM
Features                               # of features   Acc.    Kappa     Acc.    Kappa
LIWC                                   52              0.79    0.24      0.85    0.10
LIWC + TagHelper                       300             0.80    0.35      0.85    0.31
LIWC + TagHelper + Domain Kldg         300             0.79    0.34      0.85    0.30
LIWC + TagHelper + Domain Kldg + LDA   300             0.77    0.31      0.85    0.30
LIWC + Domain Kldg + LDA               All (80)        0.71    0.22      0.86    0.00
Providing-information-support
ACOR
Features, # of features, Naïve Bayes (Acc., Kappa), SVM (Acc., Kappa)
LIWC 52
LIWC + Unigrams 300
LIWC + Unigrams + Domain Kldg 300 0.66 0.33
LIWC + Unigrams + Domain Kldg + LDA 300 0.68 0.33
LIWC + Domain Kldg + LDA All (80) 0.64 0.28 0.69 0.38
Bambina
                                                       Naïve Bayes       SVM
Features                               # of features   Acc.    Kappa     Acc.    Kappa
TagHelper                              300             0.77    0.47      0.82    0.58
TagHelper + Domain Kldg                300             0.77    0.47      0.82    0.58
Domain Kldg                            All (10)        0.76    0.40      0.70    0.08
Regular project meetings
3:00PM Eastern, 3001 NSH
Call in: 1-800-882-3610 or 1-412-380-2000 passcode 1949227#
AdobeConnect information:
To host a meeting: login: '[email protected]' password: '.adobe1' (no quotes). Then go to MEETINGS tab.
To join a meeting: http://connectnow.acrobat.com/robertkraut493