Rated Aspect Summarization of Short Comments
Yue Lu, ChengXiang Zhai, and Neel Sundaresan
2
Web 2.0 Opinions Everywhere
Overall Rating: Novotel, iPhone, Sushi Kame, ……
Seller’s Feedback on eBay
23,385 Feedback received
Very fast shipping and awesome price!!!
3
Seller’s Feedback on eBay
4
Need More Specific Aspects!
Fast shipping
Good service
Is this seller rated high/low mainly because of service?
Which seller provides fast shipping?
5
6
Rated Aspect Summarization
23,385 Feedback received
Summary columns: Aspect, Aspect Rating, Representative Phrase, Support Information
Challenges:
– How to identify coherent aspects? Aspects consistent with user interest?
– How to accurately rate each aspect?
– How to get meaningful phrases supporting the ratings?
Overall Approach
7
Step 1: Aspect Discovery and Clustering
Step 2: Aspect Rating Prediction
Step 3: Extract Representative Phrases
8
Preprocessing of Short Comments
Shallow parsing: each comment becomes (Modifier, Head Term) phrases
Comment 1: Very fast shipping and awesome price!!!
Comment 2: Great business, honest seller

Source  Head Term (feature)  Modifier (opinion)
1       shipping             fast
1       price                awesome
2       business             great
2       seller               honest
Step 1: Aspect Discovery & Clustering
9
Step 1: Aspect Discovery and Clustering
Step 2: Aspect Rating Prediction
Step 3: Extract Representative Phrases
10
Method (1): Head Term Clustering

Head Term  Modifier
shipping   fast
seller     honest
seller     reliable
delivery   quick
shipping   fast

Head Term  Modifiers
Shipping   fast:100 speedy:80 slow:50 …
Delivery   fast:120 speedy:85 slow:70 …
Seller     honest:80 reliable:60 …

Clustering: e.g. k-means
Support = Cluster Size
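The clustering idea on this slide can be sketched as follows: each head term is represented by its normalized modifier counts, and head terms with similar modifier profiles are grouped by k-means. This is a minimal sketch, not the authors' implementation. The counts are the illustrative numbers from the slide with the elided "…" entries left out; the function names, the choice of k = 2, and the fixed seed indices are assumptions.

```python
from math import sqrt

# Modifier counts per head term, copied from the slide (elided entries omitted).
counts = {
    "shipping": {"fast": 100, "speedy": 80, "slow": 50},
    "delivery": {"fast": 120, "speedy": 85, "slow": 70},
    "seller": {"honest": 80, "reliable": 60},
}
vocab = sorted({m for mods in counts.values() for m in mods})

def to_vector(mods):
    """L2-normalized modifier-count vector over the shared vocabulary."""
    vec = [float(mods.get(m, 0)) for m in vocab]
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, seeds, iters=20):
    """Plain k-means; `seeds` are indices of the initial centroids (assumed)."""
    centroids = [points[i][:] for i in seeds]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

terms = list(counts)
labels = kmeans([to_vector(counts[t]) for t in terms], seeds=[0, 2])
clusters = {}
for term, label in zip(terms, labels):
    clusters.setdefault(label, []).append(term)
# Support = cluster size (counted here in head terms).
support = {label: len(members) for label, members in clusters.items()}
```

With these toy counts, "shipping" and "delivery" share modifiers and land in one cluster, while "seller" forms its own.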
Method (2): Unstructured PLSA

Head Term  Modifier
shipping   fast
seller     honest
seller     reliable
delivery   quick
shipping   fast

Topics 1, 2, …, k over words w (diagram labels d1, d2, …, dk):
Topic 1: shipping 0.3, delivery 0.2
Topic 2: service 0.32, exchange 0.2
Topic k: email 0.25, comm. 0.22
[Hofmann 99] Topic model = unigram language model = multinomial distribution
11
Method (2): Unstructured PLSA
Phrases (Head Term, Modifier): shipping fast; seller honest; seller reliable; delivery quick; shipping fast
Topics 1, 2, …, k over words w (diagram labels d1, d2, …, dk), word probabilities unknown:
Topic 1: shipping ?, delivery ?
Topic 2: service ?, exchange ?
Topic k: email ?, comm. ?
[Hofmann 99] Topic model = unigram language model = multinomial distribution
Estimation: e.g. EM with MLE
12
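The estimation step named on this slide can be sketched as a small EM loop for PLSA. This is a generic maximum-likelihood PLSA sketch, not the authors' code. It assumes each document is a non-empty bag of head terms; the toy documents, the function name `plsa`, and the random initialization are illustrative assumptions.

```python
import random

def plsa(docs, k, iters=50, seed=0):
    """Fit PLSA by EM (maximum likelihood).

    docs: list of non-empty {word: count} dicts.
    Returns (pi, theta) with pi[d][j] = p(topic j | doc d)
    and theta[j][w] = p(w | topic j).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    # Random positive initialization, normalized to valid distributions.
    theta = []
    for _ in range(k):
        raw = {w: rng.random() + 0.1 for w in vocab}
        total = sum(raw.values())
        theta.append({w: v / total for w, v in raw.items()})
    pi = [[1.0 / k] * k for _ in docs]

    for _ in range(iters):
        new_theta = [{w: 0.0 for w in vocab} for _ in range(k)]
        new_pi = [[0.0] * k for _ in docs]
        for d, doc in enumerate(docs):
            for w, c in doc.items():
                # E-step: posterior over topics for this (doc, word) pair.
                post = [pi[d][j] * theta[j][w] for j in range(k)]
                z = sum(post)
                for j in range(k):
                    r = c * post[j] / z
                    new_pi[d][j] += r
                    new_theta[j][w] += r
        # M-step: renormalize the expected counts.
        for d in range(len(docs)):
            total = sum(new_pi[d])
            pi[d] = [x / total for x in new_pi[d]]
        for j in range(k):
            total = sum(new_theta[j].values())
            theta[j] = {w: x / total for w, x in new_theta[j].items()}
    return pi, theta

# Hypothetical toy documents built from the head terms on the slide.
docs = [
    {"shipping": 2, "delivery": 1},
    {"service": 1, "exchange": 1},
    {"email": 1, "comm.": 1},
    {"shipping": 1, "delivery": 2},
]
pi, theta = plsa(docs, k=3)
```

Each row of `theta` plays the role of one "word: probability" list on the slide; the actual values depend on the data and on the random start.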
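Method (3): Structured PLSA
Phrases (Head Term, Modifier): delivery fast; seller honest; seller reliable; delivery quick; shipping fast
Topics 1, 2, …, k over words w (diagram labels d1, d2, …, dk), word probabilities unknown:
Topic 1: shipping ?, delivery ?
Topic 2: service ?, exchange ?
Topic k: email ?, comm. ?
Head Term counts grouped by Modifier (modifiers shown: slow, fast):
shipping: 70, delivery: 80, response: 10, delivery: 30, shipping: 180
13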
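One way to read the Head Term counts per Modifier on this slide is that the phrase structure is kept by grouping head terms under the modifier they occur with. The sketch below builds such modifier-level "documents" from (head term, modifier) pairs. Treating each modifier as a document of head terms is an assumption about how the structure is used, and `modifier_documents` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def modifier_documents(phrases):
    """Group head terms by the modifier they co-occur with."""
    docs = defaultdict(Counter)
    for head, modifier in phrases:
        docs[modifier][head] += 1
    return {modifier: dict(heads) for modifier, heads in docs.items()}

# (head term, modifier) pairs as listed on the slide.
phrases = [
    ("delivery", "fast"),
    ("seller", "honest"),
    ("seller", "reliable"),
    ("delivery", "quick"),
    ("shipping", "fast"),
]
docs = modifier_documents(phrases)
```

Here `docs["fast"]` holds one count each for "delivery" and "shipping", which is the same shape as the per-modifier count lists on the slide.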
Method (2) (3): Topics → Aspects
Topics 1, 2, …, k over words w (diagram labels d1, d2, …, dk):
Topic 1: shipping 0.3, delivery 0.2
Topic 2: service 0.32, exchange 0.2
Topic k: email 0.25, comm. 0.22
Support = Topic Coverage
14
Method (2) (3): Adding Prior to PLSA
Topics 1, 2, …, k over words w (diagram labels d1, d2, …, dk), word probabilities unknown:
Topic 1: shipping ?, delivery ?
Topic 2: service ?, exchange ?
Topic k: email ?, comm. ?
Dirichlet Prior Topics:
a1: shipping, delivery
a2: email, comm.
Estimation: e.g. EM with Maximum A Posteriori (MAP) instead of MLE
15
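The switch from MLE to MAP can be illustrated on the word-distribution update alone. With a conjugate Dirichlet prior, the prior words act as pseudo-counts added to the expected counts from the E-step. This is a sketch of that standard update, not the authors' code; `sigma` (the prior strength) and all numbers are hypothetical, and the prior is assumed to sum to 1.

```python
def map_word_update(expected_counts, prior, sigma):
    """MAP re-estimate of p(w | topic) from E-step counts plus a prior.

    expected_counts: {word: expected count for this topic}
    prior: {word: p(w | a_j)}, assumed to sum to 1
    sigma: prior strength in pseudo-counts (hypothetical parameter)
    """
    words = set(expected_counts) | set(prior)
    total = sum(expected_counts.values()) + sigma
    return {
        w: (expected_counts.get(w, 0.0) + sigma * prior.get(w, 0.0)) / total
        for w in words
    }

# Hypothetical numbers: a prior topic a1 over "shipping" and "delivery".
a1 = {"shipping": 0.5, "delivery": 0.5}
updated = map_word_update({"shipping": 3.0, "service": 1.0}, a1, sigma=4.0)
```

Setting `sigma` to 0 recovers the plain MLE update; larger values pull the topic toward the prior words.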
Step 2: Aspect Rating Prediction
16
Step 1: Aspect Discovery and Clustering
Step 2: Aspect Rating Prediction
Step 3: Extract Representative Phrases
Method (1): Local Prediction

Source  Head Term  Modifier  Aspect
1       product    great     Product
1       shipping   fast      Shipping
2       product    fine      Product
2       packaged   poorly    Packaging
2       delivery   slow      Shipping
…

What if?
17
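Local prediction can be sketched as each phrase inheriting the overall rating of the comment it came from. The example below is a minimal sketch under that reading. The overall ratings (1 for the first comment, 0 for the second) and the head-term-to-aspect mapping are assumptions for illustration.

```python
# Two comments with assumed overall ratings; phrases are (head term, modifier).
comments = [
    {"overall": 1, "phrases": [("product", "great"), ("shipping", "fast")]},
    {"overall": 0, "phrases": [("product", "fine"), ("packaged", "poorly"),
                               ("delivery", "slow")]},
]
# Assumed output of Step 1: which aspect each head term belongs to.
aspect_of = {
    "product": "Product",
    "shipping": "Shipping",
    "delivery": "Shipping",
    "packaged": "Packaging",
}

def local_prediction(comments):
    """Give every phrase the overall rating of its own comment."""
    rated = []
    for comment in comments:
        for head, modifier in comment["phrases"]:
            rated.append((aspect_of[head], f"{modifier} {head}", comment["overall"]))
    return rated

rated = local_prediction(comments)
```

The "What if?" case shows up directly: "fine product" receives rating 0 only because it shares a comment with "slow delivery".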
Method (2): Global Prediction

Source  Head Term  Modifier  Aspect
1       product    great     Product
1       shipping   fast      Shipping
2       product    fine      Product
2       packaged   poorly    Packaging
2       delivery   slow      Shipping
…

Modifiers collected per aspect:
Shipping: fast, timely, quick, fast, slow, quickly, fast, great, bad
Shipping: slow, bad, fast, poor, slowly, unbearable, quick, poor
Language Model:
Shipping: fast 0.2 timely 0.2 quick 0.2 … … slow 0.01
Shipping: slow 0.4 bad 0.2 … … quick 0.02 fast 0.01
What if? slow shipping
18
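Global prediction can be sketched as building one modifier language model per rating level for an aspect, then rating each phrase by the model that gives its modifier the highest probability. The two modifier lists are the Shipping lists from the slide; assigning the first list to rating 1 and the second to rating 0, the additive smoothing constant, and the function names are assumptions.

```python
from collections import Counter

# Shipping modifiers from the slide; the rating labels 1 and 0 are assumed.
modifiers_by_rating = {
    1: ["fast", "timely", "quick", "fast", "slow", "quickly", "fast", "great", "bad"],
    0: ["slow", "bad", "fast", "poor", "slowly", "unbearable", "quick", "poor"],
}
vocab = sorted({m for mods in modifiers_by_rating.values() for m in mods})

def language_model(mods, smoothing=0.1):
    """Additively smoothed unigram model over the shared vocabulary."""
    counts = Counter(mods)
    total = len(mods) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}

models = {rating: language_model(mods) for rating, mods in modifiers_by_rating.items()}

def global_prediction(modifier):
    """Rating whose language model gives the modifier the highest probability.

    Modifiers outside the vocabulary score 0 under every model, so the
    first rating key is returned for them.
    """
    return max(models, key=lambda rating: models[rating].get(modifier, 0.0))
```

For example, `global_prediction("slow")` returns 0 here regardless of the overall rating of the comment the phrase came from, which is the point of the "What if? slow shipping" note.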
19
Method (1) (2): Rating Aggregation

Aspect     Phrases                                        Aspect Rating
Shipping   slow shipping, fast delivery, quick shipping   AVG 2.33 stars
Packaging  badly wrapped, poor packaging, well packaged   AVG 1.67 stars
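The aggregation on this slide is an average of phrase ratings per aspect. The slide does not show the individual phrase ratings, so the values below (1 for the negative phrases, 3 for the positive ones) are assumptions chosen only because they reproduce the 2.33 and 1.67 averages.

```python
# (aspect, phrase, assumed phrase rating); the ratings are NOT from the slide.
phrase_ratings = [
    ("Shipping", "slow shipping", 1),
    ("Shipping", "fast delivery", 3),
    ("Shipping", "quick shipping", 3),
    ("Packaging", "badly wrapped", 1),
    ("Packaging", "poor packaging", 1),
    ("Packaging", "well packaged", 3),
]

def aggregate(phrase_ratings):
    """Average the phrase ratings within each aspect."""
    by_aspect = {}
    for aspect, _phrase, rating in phrase_ratings:
        by_aspect.setdefault(aspect, []).append(rating)
    return {aspect: sum(r) / len(r) for aspect, r in by_aspect.items()}

aspect_rating = aggregate(phrase_ratings)
```

Under these assumed ratings, Shipping averages 7/3 and Packaging 5/3, matching the two figures on the slide after rounding to two decimals.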
Step 3: Representative Phrases
20
Step 1: Aspect Discovery and Clustering
Step 2: Aspect Rating Prediction
Step 3: Extract Representative Phrases
21
Step 3: Top K Frequent Phrases
Pipeline: Step 1, Step 2, Step 3
Shipping phrases: slow delivery, fast delivery, quick shipping, bad shipping
Top phrases: Fast shipping, Timely delivery, Quickly arrived
Top phrases: Slow shipment, Bad shipping, Slow delivery
Support = Phrase Freq. (50)
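Picking the top K frequent phrases for one aspect and rating level can be sketched with a frequency count. The phrase frequencies below are hypothetical; only the phrase strings come from the slide.

```python
from collections import Counter

def top_k_phrases(phrases, k):
    """Return the k most frequent phrases with their frequency (the support)."""
    return Counter(phrases).most_common(k)

# Hypothetical phrase occurrences for one (aspect, rating) bucket.
shipping_positive = (
    ["fast shipping"] * 3 + ["timely delivery"] * 2 + ["quickly arrived"]
)
top2 = top_k_phrases(shipping_positive, k=2)
```

Each returned pair is a phrase and its frequency, which is the support value attached to the phrase in the summary.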
22
Experiments: eBay Data Set
28 eBay sellers with high feedback scores for the past year

Statistics                    Mean     STD
# of comments/seller          57,055   62,395
# of phrases/comment          1.5533   0.0442
overall rating (positive %)   97.9     0.95

Positive rating: 1; Neutral rating: 0; Negative rating: 0
23
Experiments: Evaluate Step 1
Step 1: Aspect Discovery & Clustering
Gold standard: human labeled clusters
24
Eval Step 1: Aspect Coverage
Aspect Coverage measures the percentage of covered aspects
[Figure: Aspect Coverage (y-axis) vs. Top K Clusters (x-axis) for k-means, Unstructured PLSA, and Structured PLSA]
25
Eval Step 1: Clustering Accuracy
Clustering Accuracy measures the cluster coherence

Method             Clustering Accuracy
K-means            0.36
Unstructured PLSA  0.32
Structured PLSA    0.52

Still much room for improvement!

Human Agreement
          Seller1  Seller2  Seller3  AVG
Annot1-2  0.6610   0.5484   0.6515   0.6203
Annot1-3  0.7846   0.6806   0.7143   0.7265
Annot2-3  0.7414   0.6667   0.6154   0.6745
AVG       0.7290   0.6319   0.6604   0.6738

Low Agreement; Varies a lot
26
Experiments: Evaluate Step 2
Step 2: Aspect Rating Prediction
27
Detailed Seller Ratings (DSR) as Gold Standard
Gold standard: user DSR ratings
DSR criteria as priors of aspects
28
Eval Step 2: Correlation

Step 1       Step 2   Kendall’s tau    Pearson
Baseline              0.2892           0.3162
K-means      Local    0.1106 (-62%)    0.1735 (-45%)
K-means      Global   0.1225 (-58%)    -0.0250 (-108%)
Unstr. PLSA  Local    0.2815           0.4158
Unstr. PLSA  Global   0.4958 (+76%)    0.5781 (+39%)
Str. PLSA    Local    0.1905           0.4517
Str. PLSA    Global   0.4167 (+119%)   0.6118 (+35%)

Correlation measures the effectiveness of ranking the four DSRs for a given seller
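The two correlation measures in this table can be computed for one seller as sketched below. The Kendall variant shown is tau-a (ties count as neither concordant nor discordant), which is an assumption since the slide does not say which variant was used. The DSR values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical true and predicted ratings for the four DSRs of one seller.
true_dsr = [4.8, 4.9, 4.7, 4.6]
predicted = [4.5, 4.7, 4.6, 4.0]
tau = kendall_tau(true_dsr, predicted)
r = pearson(true_dsr, predicted)
```

In this toy case five of the six DSR pairs are ordered the same way in both lists and one is reversed, so tau is (5 - 1) / 6.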
29
Eval Step 2: Ranking Loss

Step 1       Step 2   AVG of 3 DSR
Baseline              0.2363
K-means      Local    0.2170 (-8%)
K-means      Global   0.6307 (+167%)
Unstr. PLSA  Local    0.1977 (-16%)
Unstr. PLSA  Global   0.2101 (-11%)
Str. PLSA    Local    0.1909 (-19%)
Str. PLSA    Global   0.1534 (-35%)

Ranking Loss measures the distance between the true and predicted ratings (smaller is better)
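Ranking loss as described here can be sketched as the average distance between true and predicted ratings. Using the absolute difference as the distance is an assumption, and the ratings below are hypothetical.

```python
def ranking_loss(true_ratings, predicted_ratings):
    """Mean absolute difference between true and predicted ratings."""
    pairs = list(zip(true_ratings, predicted_ratings))
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

# Hypothetical ratings for three DSRs of one seller.
loss = ranking_loss([5, 4, 4], [5, 3, 4])
```

A loss of 0 means every predicted rating matches the true one, which is why smaller values are better.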
30
Experiments: Evaluate Step 3
Step 3: Representative Phrases
Questions:
– How do previous steps affect the phrase quality?
31
Eval Step 3: Human Labeling
DSR criteria: Item as Described; Communication; Shipping time; Shipping and Handling Charges
Rating levels per DSR: Rating 1, Rating 0
Example phrases: Fast delivery, Prompt email, Slow shipping, …; Excessive postage, As promised, …
32
Eval Step 3: Measures & Results

Step 1       Step 2   Prec.    Recall
K-means      Local    0.3055   0.3510
K-means      Global   0.2635   0.2923
Unstr. PLSA  Local    0.4127   0.4605
Unstr. PLSA  Global   0.4008   0.4435
Str. PLSA    Local    0.5925   0.6379
Str. PLSA    Global   0.5611   0.5952

Information Retrieval measures:
Human generated phrases = "relevant documents"
Computer generated phrases = "retrieved documents"
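With human phrases as the relevant set and computer phrases as the retrieved set, precision and recall follow the usual definitions. The sketch below assumes exact string match between phrases, and the two phrase sets are hypothetical.

```python
def precision_recall(human_phrases, system_phrases):
    """Precision and recall of system phrases against human phrases."""
    relevant = set(human_phrases)
    retrieved = set(system_phrases)
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical phrase sets for one (DSR, rating) cell.
human = ["fast delivery", "prompt email", "slow shipping"]
system = ["fast delivery", "slow shipping", "great seller", "quick shipping"]
precision, recall = precision_recall(human, system)
```

Here two of the four system phrases are also human phrases, and they cover two of the three human phrases.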
33
Summary
• Novel problem
– Rated Aspect Summarization
• General Methods
– Three steps
– Effective on eBay Feedback Comments
• Future Work
– Evaluate on other data
– Three steps → One optimization framework
34
Thank you!
PLSA & EM Formulas
Structured PLSA & EM Formulas
Incorporated with prior
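The formulas on these backup slides did not survive in the transcript. For orientation, the standard PLSA mixture model and its EM updates are given below in a common notation. The symbols ($c(w,d)$ for counts, $\pi_{d,j}$ for topic coverage, $\theta_j$ for topic word distributions, $a_j$ for a prior topic, $\sigma$ for the prior strength) are assumptions and may differ from the slides. One possible reading of the structured variant is the same updates with $d$ ranging over modifiers and $w$ over head terms.

```latex
% Mixture model and log-likelihood
p(w \mid d) = \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j),
\qquad
\log p(C \mid \Lambda) = \sum_{d} \sum_{w} c(w,d) \log \sum_{j=1}^{k} \pi_{d,j}\, p(w \mid \theta_j)

% E-step
p(z_{d,w} = j) =
  \frac{\pi_{d,j}^{(n)}\, p^{(n)}(w \mid \theta_j)}
       {\sum_{j'=1}^{k} \pi_{d,j'}^{(n)}\, p^{(n)}(w \mid \theta_{j'})}

% M-step (maximum likelihood)
\pi_{d,j}^{(n+1)} =
  \frac{\sum_{w} c(w,d)\, p(z_{d,w} = j)}
       {\sum_{j'=1}^{k} \sum_{w} c(w,d)\, p(z_{d,w} = j')},
\qquad
p^{(n+1)}(w \mid \theta_j) =
  \frac{\sum_{d} c(w,d)\, p(z_{d,w} = j)}
       {\sum_{w'} \sum_{d} c(w',d)\, p(z_{d,w'} = j)}

% M-step with a conjugate (Dirichlet) prior, MAP estimate
p^{(n+1)}(w \mid \theta_j) =
  \frac{\sum_{d} c(w,d)\, p(z_{d,w} = j) + \sigma\, p(w \mid a_j)}
       {\sum_{w'} \sum_{d} c(w',d)\, p(z_{d,w'} = j) + \sigma}
```

Setting $\sigma = 0$ in the last update recovers the maximum-likelihood M-step.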