Product Review Summarization from a Deeper Perspective
Duy Khang Ly, Kazunari Sugiyama, Ziheng Lin, Min-Yen Kan
National University of Singapore
WING, NUS
Introduction
• Other customers can consult the reviews when deciding whether to buy the product
• Manufacturers can obtain feedback from customers
“Best photos that I have ever taken and a joy to use”
“fantastic results”
“754 customer reviews”
Introduction
Output summaries of existing systems [Hu and Liu, KDD'04], [Hu and Liu, AAAI'04], [Popescu and Etzioni, HLT/EMNLP'05]
a. Lens
(+): 57 sentences
1. The lens feels very solid!
2. I have taken a whole bunch of excellent pictures with this lens.
…
(-): 15 sentences
1. I do not satisfy with the included lens kit.
2. The lens cap is very loose and come off very easily !
…
b. Battery Life
(+): 32 sentences
1. The battery lasts for ever on one single charge.
2. The battery duration is amazing !
…
(-): 8 sentences
1. I experienced very short battery life from this camera.
2. It uses a heavy battery.
…
• Does not organize the sentences within each sentiment class
• Users must read through all the sentences to learn the reasons behind each sentiment
Introduction
Output of the desired summary that our system aims to produce
a. Lens
(+): The lens feels very solid! (+10 similar)
(-): I think the lens does not worth it, it's a bit too fragile. (+2 similar)
(+): I have taken a lot of excellent pictures with this lens. (+7 similar)
(-): Don't buy this lens, I always get my pictures blurred. (+0 similar)
…
b. Battery Life
(+): The battery lasts for ever on one single charge. (+18 similar)
(-): I experienced very short battery life from this camera. (+4 similar)
(+): 0 sentences
(-): It uses a heavy battery.
…
• Provides a representative reason for each sentiment
• Users can read a concise summary
Proposed Method
[Figure: System overview. Product Reviews → (1) PRODUCT FACET IDENTIFICATION: Pre-processing (using syntactic roles) → Association Rule Mining → Post-processing → Infrequent Facet Extraction → (2) SUMMARIZATION: Opinionated Sentence Extraction (e.g., "The lens is too plastic!", "The price of this lens is affordable!", "The output pictures are crystal clear.", "I like the sharpness of the picture.") → Subtopic Clustering (Sentence Representation → Sentence Clustering) → Compact Presentation → Output Summary]
Product Facet Identification
• Pre-processing: POS tagging, extraction of nouns and noun phrases, and use of syntactic roles to filter away noisy results
• Association rule mining: identify all frequent explicit product facets
• Post-processing: remove irrelevant facets
• Infrequent facet extraction: discover infrequent facets
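The association-rule-mining step can be illustrated with a minimal Apriori-style search for frequent noun itemsets. This sketch assumes three things. First, each review sentence has already been reduced to its set of nouns and noun phrases by pre-processing. Second, only frequent itemsets (not full rules) are needed as facet candidates. Third, the function name `frequent_facets` and the 1% default `min_support` are illustrative and not values taken from the slides. Post-processing and infrequent-facet extraction are not shown.

```python
from collections import Counter
from itertools import combinations


def frequent_facets(transactions, min_support=0.01, max_size=3):
    """Hypothetical Apriori-style search for noun itemsets that occur in at
    least `min_support` (a fraction; default is an assumption) of sentences.

    `transactions` holds one collection of nouns/noun phrases per sentence.
    Returns {frozenset(itemset): number of sentences containing it}.
    """
    n = len(transactions)
    sentences = [frozenset(t) for t in transactions]
    result = {}
    # Level-1 candidates: every distinct noun.
    candidates = {frozenset([w]) for s in sentences for w in s}
    k = 1
    while candidates and k <= max_size:
        # Count how many sentences contain each candidate itemset.
        counts = Counter(c for s in sentences for c in candidates if c <= s)
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        # Join step: build (k+1)-itemsets whose k-subsets are all frequent
        # (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical toy data: nouns extracted from six review sentences.
    noun_sets = [
        {"battery", "life"},
        {"battery", "life", "camera"},
        {"lens"},
        {"lens", "cap"},
        {"picture"},
        {"battery"},
    ]
    found = frequent_facets(noun_sets, min_support=0.3)
    for itemset, count in sorted((tuple(sorted(s)), c) for s, c in found.items()):
        print(itemset, count)
    # ('battery',) 3
    # ('battery', 'life') 2
    # ('lens',) 2
    # ('life',) 2
```

Multi-word itemsets such as {battery, life} are candidates for noun-phrase facets ("battery life"). In a full system, post-processing would then prune candidates that are not real facets.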
Summarization
[Figure: Summarization pipeline. Opinionated Sentence Extraction (e.g., 1. The lens is too plastic! 2. The price of this lens is affordable! … / 1. The output pictures are crystal clear. 2. I like the sharpness of the picture. …) → Subtopic Clustering (Sentence Representation → Sentence Clustering) → Compact Presentation]
Opinionated Sentence Extraction [Ding et al., WSDM'08]
• Assign a polarity score to each sentence
• Compute the score as the sum of the polarity scores of its constituent words
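A minimal sketch of the word-sum polarity score described above. The tiny `LEXICON` is hypothetical, and a real system would use a full opinion lexicon. The sketch also ignores negation and other context handling. A sentence is kept as opinionated when its score is non-zero, and the sign of the score decides its (+) or (-) group.

```python
import re

# Hypothetical toy lexicon: +1 for positive opinion words, -1 for negative ones.
LEXICON = {
    "solid": 1, "excellent": 1, "amazing": 1, "affordable": 1, "clear": 1,
    "fragile": -1, "blurred": -1, "loose": -1, "heavy": -1,
}


def polarity_score(sentence, lexicon=LEXICON):
    """Sentence polarity = sum of the polarity scores of its words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0) for w in words)


def extract_opinionated(sentences, lexicon=LEXICON):
    """Keep sentences with a non-zero score, grouped by the sign of the score."""
    groups = {"+": [], "-": []}
    for s in sentences:
        score = polarity_score(s, lexicon)
        if score > 0:
            groups["+"].append(s)
        elif score < 0:
            groups["-"].append(s)
    return groups


if __name__ == "__main__":
    reviews = [
        "The lens feels very solid!",
        "I bought it last week.",
        "It uses a heavy battery.",
    ]
    print(extract_opinionated(reviews))
    # {'+': ['The lens feels very solid!'], '-': ['It uses a heavy battery.']}
```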
Sentence Representation
• Compute content-based pairwise similarities between all resulting opinion sentences
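The slides do not specify the content representation. The sketch below assumes one common choice: cosine similarity over lower-cased bag-of-words term-frequency vectors, with no stop-word removal or term weighting. The function names are illustrative only.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Term-frequency vector of lower-cased word tokens."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Pairwise content similarity between all opinion sentences."""
    vectors = [bag_of_words(s) for s in sentences]
    return [[cosine(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    sim = similarity_matrix([
        "The battery lasts for ever on one single charge.",
        "The battery duration is amazing!",
        "The lens feels very solid!",
    ])
    for row in sim:
        print([round(x, 2) for x in row])
```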
Sentence Clustering
• Hierarchical clustering with group-average distance
• Non-hierarchical clustering
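The hierarchical option can be sketched as agglomerative clustering with group-average linkage. At each step, the two clusters with the highest mean pairwise similarity are merged. Stopping at a fixed number of clusters `k` is an assumption for this sketch, because the slides do not state the stopping criterion. The non-hierarchical option is not shown.

```python
from itertools import combinations


def group_average_clustering(sim, k):
    """Agglomerative clustering with group-average linkage.

    `sim` is a symmetric similarity matrix. Clusters are merged until `k`
    remain (a fixed-k stopping rule, assumed for this sketch).
    Returns a list of clusters, each a list of sentence indices.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(sim))]

    def average_similarity(a, b):
        # Mean similarity over all cross-cluster sentence pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations yields i < j
    return clusters


if __name__ == "__main__":
    # Hypothetical similarity values for four opinion sentences.
    sim = [
        [1.0, 0.9, 0.1, 0.0],
        [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8],
        [0.0, 0.1, 0.8, 1.0],
    ]
    print(group_average_clustering(sim, 2))  # [[0, 1], [2, 3]]
```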
Compact Presentation
• Select the most representative sentence in each cluster
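The slides do not define "most representative". This sketch assumes the cluster medoid: the member with the highest average similarity to the other members. It then renders the cluster in the "(+): sentence (+N similar)" style of the desired summary. The function names, the third example sentence, and the similarity values are all hypothetical.

```python
def representative(cluster, sim):
    """Index of the member with the highest mean similarity to the other
    members of its cluster (the medoid; an assumed selection rule)."""
    if len(cluster) == 1:
        return cluster[0]
    return max(
        cluster,
        key=lambda i: sum(sim[i][j] for j in cluster if j != i) / (len(cluster) - 1),
    )


def compact_line(cluster, sentences, sim, polarity):
    """Render one cluster as '(<polarity>): <sentence> (+N similar)'."""
    rep = representative(cluster, sim)
    return f"({polarity}): {sentences[rep]} (+{len(cluster) - 1} similar)"


if __name__ == "__main__":
    # Hypothetical sentences and similarity values.
    sentences = [
        "The battery lasts for ever on one single charge.",
        "The battery duration is amazing!",
        "Battery life is great.",
    ]
    sim = [[1.0, 0.9, 0.5], [0.9, 1.0, 0.8], [0.5, 0.8, 1.0]]
    print(compact_line([0, 1, 2], sentences, sim, "+"))
    # (+): The battery duration is amazing! (+2 similar)
```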
Experiments
Experimental Data
Three products from [Hu and Liu, KDD'04]
Product      Number of sentences
Camera       160
Phone        139
DVD player   111
Evaluation Measures
(1) Product facet identification
- Recall, precision
(2) Summarization
- Purity, inverse purity
- F (harmonic mean of purity and inverse purity) [Hotho et al., GLDV-Journal for Computational Linguistics and Language Technology '05]
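For facet identification, recall and precision compare the extracted facets against the manually extracted ones. The slides do not state how facets are matched, so the sketch below assumes exact string matching. The function name and the example lists are hypothetical.

```python
def facet_precision_recall(extracted, manual):
    """Precision and recall of extracted facets against a manual gold list
    (exact string matching is an assumption of this sketch)."""
    extracted, manual = set(extracted), set(manual)
    correct = len(extracted & manual)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(manual) if manual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical lists: 3 of the 4 extracted facets appear among 5 gold facets.
    p, r = facet_precision_recall(
        ["battery", "picture", "lens", "box"],
        ["battery", "picture", "lens", "flash", "memory"],
    )
    print(p, r)  # 0.75 0.6
```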
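Purity
(i) For each generated cluster, precision is computed with respect to each label, and the maximum value is selected.
(ii) The overall purity is the average of (i), weighted by cluster size. For clusters $C_i$, labels $L_j$, and $n$ documents:
$\mathrm{Purity} = \sum_i \frac{|C_i|}{n} \max_j \frac{|C_i \cap L_j|}{|C_i|}$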
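Example: 20 target documents carrying four labels (of sizes 8, 5, 4, and 3) are grouped into three clusters: C1 (8 documents), C2 (5), and C3 (7).
(i) Maximum precision of each cluster: C1: 4/8, C2: 2/5, C3: 5/7
(ii) Purity for this clustering result:
$\mathrm{Purity} = \frac{8}{20}\cdot\frac{4}{8} + \frac{5}{20}\cdot\frac{2}{5} + \frac{7}{20}\cdot\frac{5}{7} = \frac{11}{20} = 0.55$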
Inverse purity
(i) For each label, recall is computed with respect to each generated cluster, and the maximum value is selected.
(ii) The overall inverse purity is the average of (i), weighted by label size:
$\mathrm{InversePurity} = \sum_j \frac{|L_j|}{n} \max_i \frac{|C_i \cap L_j|}{|L_j|}$
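Example (same 20 target documents and clusters C1, C2, C3):
(i) Maximum recall of each label: label of size 8: 5/8, size 5: 4/5, size 4: 2/4, size 3: 2/3
(ii) Inverse purity for this clustering result:
$\mathrm{InversePurity} = \frac{3}{20}\cdot\frac{2}{3} + \frac{4}{20}\cdot\frac{2}{4} + \frac{5}{20}\cdot\frac{4}{5} + \frac{8}{20}\cdot\frac{5}{8} = \frac{13}{20} = 0.65$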
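F1-measure
Weighted harmonic mean of purity and inverse purity:
$F = \frac{1}{\alpha\,\frac{1}{\mathrm{Purity}} + (1-\alpha)\,\frac{1}{\mathrm{InversePurity}}}$, with $\alpha = 0.5$, which gives $F = \frac{2\cdot\mathrm{Purity}\cdot\mathrm{InversePurity}}{\mathrm{Purity} + \mathrm{InversePurity}}$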
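The three clustering measures can be checked with a short sketch. The slides give only the per-cluster and per-label maxima of the worked examples. The full label assignment below (labels A to D of sizes 8, 5, 4, 3) is therefore a hypothetical one chosen to reproduce those maxima. Because $(|C_i|/n)\cdot(\max_j |C_i \cap L_j|/|C_i|) = \max_j |C_i \cap L_j|/n$, purity reduces to the sum of per-cluster majority counts divided by $n$. Inverse purity reduces in the same way.

```python
from collections import Counter


def purity(clusters):
    """Size-weighted average of each cluster's maximum precision,
    i.e. the sum of per-cluster majority-label counts divided by n."""
    n = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters) / n


def inverse_purity(clusters):
    """Size-weighted average of each label's maximum recall over clusters."""
    n = sum(len(c) for c in clusters)
    counts = [Counter(c) for c in clusters]
    labels = set().union(*counts)
    return sum(max(cnt[label] for cnt in counts) for label in labels) / n


def f_measure(p, ip, alpha=0.5):
    """Weighted harmonic mean of purity and inverse purity."""
    return 1.0 / (alpha / p + (1.0 - alpha) / ip)


if __name__ == "__main__":
    # Hypothetical label assignment matching the worked-example maxima.
    c1 = ["B"] * 4 + ["A"] * 2 + ["C"] * 2   # 8 documents, majority 4
    c2 = ["A"] * 1 + ["D"] * 2 + ["C"] * 2   # 5 documents, majority 2
    c3 = ["A"] * 5 + ["B"] * 1 + ["D"] * 1   # 7 documents, majority 5
    p, ip = purity([c1, c2, c3]), inverse_purity([c1, c2, c3])
    print(round(p, 3), round(ip, 3), round(f_measure(p, ip), 3))  # 0.55 0.65 0.596
```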
(1) Product Facet Identification
Examples of extracted facets:
Camera: “battery,” “picture,” “lens”
Phone: “signal,” “headset”
DVD player: “remote control,” “format”
(1) Product Facet Identification
Performance of the product facet identification component [Hu and Liu, KDD'04]
Data      Manually extracted facets   Association mining (R / P)   Post-processing (R / P)   Infrequent facet (R / P)
Camera    79                          0.671 / 0.552                0.658 / 0.825             0.822 / 0.747
Phone     67                          0.731 / 0.563                0.716 / 0.828             0.761 / 0.718
DVD       49                          0.754 / 0.531                0.754 / 0.765             0.797 / 0.793
Average   65                          0.719 / 0.549                0.709 / 0.806             0.793 / 0.753

Performance of the product facet identification component [Hu and Liu, KDD'04] + syntactic role
Data      Manually extracted facets   Association mining (R / P)   Post-processing (R / P)   Infrequent facet (R / P)
Camera    79                          0.671 / 0.646                0.658 / 0.894             0.822 / 0.842
Phone     67                          0.731 / 0.648                0.716 / 0.903             0.761 / 0.769
DVD       49                          0.754 / 0.610                0.754 / 0.818             0.797 / 0.867
Average   65                          0.719 / 0.634                0.709 / 0.872             0.793 / 0.826
(R = recall, P = precision)
(2) Summarization
Number of facets in each product, with the number of manually defined clusters (subtopics) per facet
Data         Facets (number of manually defined clusters)                                                      Average
Camera       Battery (4), Memory (3), Flash (4), LCD (6), Lens (7), Megapixels (5), Mode (6), Shutter (6)      5.13
Phone        Battery (3), Camera (3), Headset (4), Radio (3), Service (5), Signal (3), Size (3), Speaker (4)    3.50
DVD player   Price (1), Remote (4), Format (1), Design (1), Service (1), Picture (4)                            2.00
The camera has the richest facets, averaging 5.13 subtopic clusters per facet versus 3.50 for the phone and 2.00 for the DVD player.
(2) Summarization
Performance of summarization (F1-measure)
Data     Facet        Manually defined clusters   Hierarchical   Non-hierarchical   Random
Camera   Battery      4                           0.702          0.733              0.596
         Memory       3                           0.783          0.707              0.563
         Flash        4                           0.628          0.693              0.550
         LCD          6                           0.606          0.722              0.473
         Lens         7                           0.884          0.884              0.571
         Megapixels   5                           0.543          0.626              0.473
         Mode         6                           0.897          0.897              0.556
         Shutter      6                           0.760          0.760              0.555
         Average      5.13                        0.725          0.753              0.542
DVD      Price        1                           0.833          0.865              0.688
         Remote       4                           0.682          0.643              0.579
         Format       1                           0.833          0.727              0.667
         Design       1                           1.000          1.000              1.000
         Service      1                           0.850          0.686              0.686
         Picture      4                           0.824          0.824              0.474
         Average      2.00                        0.837          0.791              0.682
• Hierarchical clustering is more effective when the number of subtopics is small (DVD player average: 0.837 vs. 0.791).
• Non-hierarchical clustering is more effective when the number of subtopics is large (camera average: 0.753 vs. 0.725).
Conclusion
• Designed a system that summarizes product reviews and organizes them into a structured, extractive summary
- Product facet identification: syntactic role information within a sentence raises precision at every stage without changing recall.
- Summarization: both hierarchical and non-hierarchical clustering outperform random clustering on average.
Future Work
• Recognize brand names to improve facet identification, e.g., "My Canon camera has longer battery life than Nikon."
Thank you very much!