
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3477–3486, Brussels, Belgium, October 31 - November 4, 2018. ©2018 Association for Computational Linguistics


ExtRA: Extracting Prominent Review Aspects from Customer Feedback

Zhiyi Luo, Shanshan Huang, Frank F. Xu, Bill Yuchen Lin, Hanyuan Shi, Kenny Q. Zhu
Shanghai Jiao Tong University, Shanghai, China

{jessherlock, huangss 33}@sjtu.edu.cn, [email protected]
{yuchenlin, shihanyuan}@sjtu.edu.cn, [email protected]

Abstract

Many existing systems for analyzing and summarizing customer reviews about products or services are based on a number of prominent review aspects. Conventionally, the prominent review aspects of a product type are determined manually. This costly approach cannot scale to large and cross-domain services such as Amazon.com, Taobao.com or Yelp.com, where there are a large number of product types and new products emerge almost every day. In this paper, we propose a novel framework for extracting the most prominent aspects of a given product type from textual reviews. The proposed framework, ExtRA, automatically extracts the K most prominent aspect terms or phrases, which do not overlap semantically, without supervision. Extensive experiments show that ExtRA is effective and achieves state-of-the-art performance on a dataset consisting of different product types.

1 Introduction

Online user reviews are an essential part of e-commerce. Popular e-commerce websites feature an enormous amount of text reviews, especially for popular products and services. To improve the user experience and expedite the shopping process, many websites provide qualitative and quantitative analysis and summaries of user reviews, typically organized by different prominent review aspects. For instance, Figure 1 shows a short review passage from a customer on TripAdvisor.com; the customer is also asked to give scores on several specific aspects of the hotel, such as location and cleanliness. With aspect-based review summaries, potential customers can assess a product from various essential aspects efficiently and directly. Aspect-based review summaries also offer an effective way to group products by their prominent aspects and hence enable quick comparison.

Figure 1: An example user review about a hotel on TripAdvisor. The grades are organized by different prominent review aspects: value, rooms, etc.

Existing approaches for producing such prominent aspect terms have largely involved manual work (Poria et al., 2014; Qiu et al., 2011). This is feasible for web services that only sell (or review) a small number of product types in the same domain. For example, TripAdvisor.com only features travel-related products, and Cars.com only reviews automobiles, so human annotators can provide appropriate aspect terms for customers based on their domain knowledge. While human knowledge is useful in characterizing a product type, such a manual approach does not scale well for general-purpose e-commerce platforms, such as Amazon, eBay, or Yelp, which feature too many product types, not to mention that new product and service types are emerging every day. In these cases, manually selecting and pre-defining aspect terms for each type is too costly and even impractical.

Moreover, the key aspects of a product type may also change over time. For example, in the past, people cared more about screen size and signal strength when reviewing cell phones. These aspects are not so much of an issue nowadays; people instead focus on battery life, processing speed, etc. Therefore, there is a growing need to automatically extract prominent aspects from user


reviews.

A related but different task is aspect-based opinion mining (Su et al., 2008; Zeng and Li, 2013). Here, techniques have been developed to automatically mine product-specific "opinion phrases" such as those shown in Figure 2. In this example, the most frequently mentioned opinion phrases about a phone model are displayed along with their mention frequencies. The goal is a fine-grained opinion summary on possibly overlapping aspects of a particular product. For example, "good looks" and "beautiful screen" both comment on the "appearance" aspect of the phone. However, these aspects are implicit and can't be used in aspect-based review summarization directly. The main disadvantage of these opinion phrases is that their aspects differ from product to product, making it difficult to compare products side by side.

fast system (196)      long battery-life (193)
good design (236)      high call-quality (163)
nice functions (181)   good value (282)


looks good (462)       beautiful screen (398)
nice functions (356)   high resolution (872)
good camera (218)      good value (628)


Figure 2: Automatic review summarization for two mobile phones on an e-commerce website.

The goal of this paper is to develop an unsupervised framework for automatically extracting the K most prominent, non-overlapping review aspects for a given type of product from user review texts. Developing such an unsupervised framework is challenging for the following reasons:

• The extracted prominent aspects not only need to cover as many customer concerns as possible but also have little semantic overlap.

• The expression of user opinions is highly versatile: aspect terms can be expressed either explicitly or implicitly. For example, the mention of "pocket" implies the aspect "size".

• Product reviews are information-rich. A short comment may target multiple aspects, so topics shift quickly from sentence to sentence.

Most previous unsupervised approaches for the prominent aspect extraction task are variants of topic modeling techniques (Lakkaraju et al., 2011; Lin and He, 2009; Wang et al., 2011a). The main problem with such approaches is that they typically use only word frequency and co-occurrence information, and thus degrade when extracting aspects from sentences that appear different on the surface but actually discuss similar aspects.

Given all review text about a certain product type, our framework, ExtRA, extracts the most prominent aspect terms in four main steps: first, it extracts potential aspect terms from the text corpus by lexico-syntactic analysis; then it associates the terms with synsets in WordNet and induces a subgraph that connects these terms; after that, it ranks the aspect terms by a personalized PageRank algorithm on the subgraph; and finally it picks the top K non-overlapping terms using the subsumption relations in the subgraph.

The main contributions of this paper are as follows:

1. We propose a novel framework for extracting prominent aspects from customer review corpora (Section 2), and provide an evaluation dataset for future work in this research area.

2. Extensive experiments show that our unsupervised framework is effective and outperforms the state-of-the-art methods by a substantial margin (Section 3).

2 Framework

In this section, we first state the review aspect extraction problem, then present the workflow of our method, shown in Figure 3.

2.1 Problem Statement

The review aspect extraction problem is: given all the text reviews about one type of product or service, extract K words (or phrases), each of which represents a prominent and distinct review aspect.

For instance, if the given product type is hotel, we expect a successful extraction framework to extract K = 5 aspect terms as follows: room, location, staff, breakfast, pool.

2.2 Aspect Candidates Extraction

Following the observation of Liu (2004; 2015), we assume that aspect terms are nouns and noun phrases. First, we design a set of effective syntactic rules, which can be applied across domains, to collect the aspect candidates from review texts.


ID   Sentence
(1)  It's a great zoom lens, but it's too much of a risk for me for that much money.
(2)  The only issues that I have with the camera is somewhat slower autofocus, a noisy shutter and cheap lens cover.
(3)  The shutter is quick and quiet.

Figure 3: Overall framework. (The diagram walks the example sentences above through the four stages: Stage 1, Aspect Candidates Extraction via syntactic rules and phrase mining with AutoPhrase; Stage 2, Aspect Taxonomy Construction by WordNet noun synset matching, aspect candidate clustering, and taxonomy extraction from WordNet; Stage 3, Aspect Ranking by random walk (personalized PageRank) on the aspect graph; Stage 4, Aspect Generation with deduplication and removal of taxonomy overlaps.)

We mainly use the adjectival modifier dependency relation (amod) and the nominal subject relation (nsubj) to extract the aspect-opinion pairs 〈N,A〉. In addition, we leverage the conjunction relation (conj) between adjectives to complement the extracted pairs. Formally, the extraction rules can be specified as follows:

Rule 1. If amod(N,A), then extract 〈N,A〉.

Rule 2. If nsubj(A,N), then extract 〈N,A〉.

Rule 3. If 〈N,Ai〉 and conj(Ai, Aj), then extract 〈N,Aj〉.

Here, N indicates a noun, and A (e.g., Ai, Aj) is an adjective. The dependencies (e.g., amod(N,A)) are expressed as rel(head, dependent), where rel is the dependency relation that holds between head and dependent. Note that many aspects are expressed by phrases; thus, we extend the candidates to phrases by introducing the following extension rules:

Rule E1. If 〈N,A〉 and N−1 N ∈ P, then use 〈N−1 N, A〉 to replace 〈N,A〉.

Rule E2. If 〈N,A〉 and N N+1 ∈ P, then use 〈N N+1, A〉 to replace 〈N,A〉.

where N−1 and N+1 denote nouns, and the subscript represents the displacement from N in the sentence. We use AutoPhrase (Liu et al., 2017) to extract a set of phrases P with high coherence. We then use P to filter out incoherent phrases, so as to obtain high-quality phrases as aspect candidates. The example in Figure 3 (Stage 1) demonstrates the extraction process. For example,

we extract the pair 〈zoom lens, great〉 from sentence (1) by applying Rule 1 and Rule E1. Similarly, the extraction rules match 〈autofocus, slower〉, 〈shutter, noisy〉, 〈lens cover, cheap〉 in sentence (2) and 〈shutter, quick〉, 〈shutter, quiet〉 in sentence (3) as potential aspect-opinion pairs. After extracting such pairs from the text reviews, we sort them by the number of occurrences, and extract the nouns and noun phrases in the top pairs as aspect candidates, assuming that the most prominent aspects are subsumed by those candidate terms.
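As a minimal, self-contained sketch, Rules 1-3 can be applied over pre-parsed dependency triples. The triple format and the `extract_pairs` helper are illustrative assumptions, not the paper's actual implementation; any dependency parser can supply the rel(head, dependent) edges.

```python
def extract_pairs(deps, nouns, adjs):
    """Apply Rules 1-3 to dependency triples (rel, head, dependent),
    returning aspect-opinion pairs <N, A>."""
    pairs = set()
    for rel, head, dep in deps:
        if rel == "amod" and head in nouns and dep in adjs:     # Rule 1
            pairs.add((head, dep))
        elif rel == "nsubj" and head in adjs and dep in nouns:  # Rule 2
            pairs.add((dep, head))
    for rel, head, dep in deps:                                 # Rule 3
        if rel == "conj" and head in adjs and dep in adjs:
            for n, a in list(pairs):
                if a == head:
                    pairs.add((n, dep))
    return pairs

# Sentence (3), "The shutter is quick and quiet.", parses to
# nsubj(quick, shutter) and conj(quick, quiet):
deps = [("nsubj", "quick", "shutter"), ("conj", "quick", "quiet")]
pairs = extract_pairs(deps, nouns={"shutter"}, adjs={"quick", "quiet"})
print(sorted(pairs))  # -> [('shutter', 'quick'), ('shutter', 'quiet')]
```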

2.3 Aspect Taxonomy Construction

The aspect candidates extracted in the last stage come with counts of how often they are modified by adjectives. We can directly use such raw counts to rank the aspect candidates; this is one of the baseline models in our experiments. However, such a ranking usually suffers from the aspect overlapping problem, which violates the principle of pursuing both coverage and distinctiveness of prominent aspects. For example, given the number of prominent aspects K as 5, we may extract both the 'location' and 'place' aspects from hotel reviews. To solve this problem, we construct an aspect taxonomy that captures such overlapping information between aspect candidates by leveraging the WordNet ontology.

2.3.1 WordNet Synset Matching

First, we need to match our aspect candidates onto WordNet synsets. The accuracy of synset matching is very important for our aspect taxonomy construction. This is actually a classical word sense disambiguation (WSD) problem. Our initial attempt is to use a Python WSD tool (Tan,


2014). For each aspect candidate, we take it as the target and randomly sample a set of sentences that contain it. We use the extended word sense disambiguation algorithm (Banerjee and Pedersen, 2003) in this tool. We count the total occurrences of each noun sense (synset) of the candidate and match the candidate to the most frequent synset. However, such a method is not good enough for our problem, as shown in the results later: it only considers the local context information within the review sentence. What's more, review sentences are usually very short and colloquial, which makes it more difficult for a common WSD algorithm to match properly. Therefore, it is critical to construct more reliable contexts for aspect candidate matching.

To achieve this goal, we cluster aspect candidates with similar semantics together. Then, for each aspect candidate, we take the other candidates within the same cluster as its context for later disambiguation. As shown in the first step of Stage 2 in Figure 3, semantically similar aspect candidates such as lens, lens cover, zoom lens, exposure and shutter are clustered together. For example, we can disambiguate the sense of shutter by leveraging lens, lens cover, zoom lens, and exposure. We observed that our aspect candidates can be clustered at a fine granularity with a two-stage k-means clustering method,[1] which generates better context for the aspect candidates. More specifically, for a particular aspect candidate a_t from the cluster C = {a_1, a_2, ..., a_t, ..., a_n}, we calculate the context vector of a_t as:

c(a_t) = \sum_{i=1, i \neq t}^{n} E(a_i),    (1)

where c(a_t) denotes the context vector of a_t, and E(a_i) represents the embedding of a_i. The set of candidate synsets S(a_t) = {s^t_1, s^t_2, ..., s^t_m} consists of the noun senses (e.g., s^t_i) of a_t from WordNet. Each sense s^t_i is associated with a gloss g^t_i (i.e., a brief definition of s^t_i) which covers the semantics of the sense. Therefore, we encode s^t_i as the summation of the word vectors in g^t_i:

v(s^t_i) = \sum_{j=1}^{q} E(w^{t,i}_j),    (2)

where W(g^t_i) is the sequence of words in g^t_i, i.e., W(g^t_i) = [w^{t,i}_1, w^{t,i}_2, ..., w^{t,i}_q]. For each candidate

[1] The implementation details are in Section 3.2.

sense s^t_i of the aspect candidate a_t, we calculate the cosine similarity between v(s^t_i) and c(a_t), and match a_t to the most similar s^t_i.
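The matching step can be sketched as follows. The toy 2-dimensional embeddings and gloss word lists are invented for illustration, and `match_synset` is a hypothetical helper, not the paper's code.

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def vec_sum(vectors):
    return [sum(dims) for dims in zip(*vectors)]

def match_synset(target, cluster, glosses, emb):
    """Match `target` to the sense whose gloss vector (Eq. 2) is most
    similar to the context vector built from cluster mates (Eq. 1)."""
    context = vec_sum([emb[a] for a in cluster if a != target])
    return max(glosses,
               key=lambda s: cosine(vec_sum([emb[w] for w in glosses[s]]),
                                    context))

# Toy 2-d embeddings and gloss word lists (illustrative values only).
emb = {"lens": [1.0, 0.1], "exposure": [0.9, 0.2],
       "camera": [1.0, 0.0], "window": [0.0, 1.0], "cover": [0.1, 0.9]}
glosses = {"shutter.n.01": ["camera", "lens"],   # photographic sense
           "shutter.n.02": ["window", "cover"]}  # window-blind sense
sense = match_synset("shutter", ["lens", "exposure", "shutter"], glosses, emb)
print(sense)  # -> shutter.n.01
```

The camera-related cluster mates pull the context vector toward the photographic sense, which is exactly the effect the clustering step is designed to achieve.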

2.3.2 Aspect Taxonomy Extraction from WordNet

To construct the aspect taxonomy from WordNet, we first extract the hypernym paths for every synset matched in the previous step. By definition, a hypernym path p of synset s is the is-a relation path from s to the root synset (i.e., entity.n.01 for nouns). We extract the hypernym paths for each matched synset s_i in the WordNet ontology. Next, we scan over all the collected paths once to construct the aspect taxonomy, which is a directed acyclic graph (DAG). In p, s_1 is the synset matched from our potential aspects, and s_{i+1} is the hypernym of s_i. As shown in Step 2 of Stage 2 in Figure 3, we match the aspect candidate shutter to shutter.n.01. The only hypernym path of shutter.n.01 is [shutter.n.01, optical_device.n.01, device.n.01, ..., entity.n.01].

However, a matched synset usually has multiple hypernym paths in WordNet. We use the following strategy to compact and minimize the aspect taxonomy:

• Among all the paths from an aspect candidate s_1, we keep those paths that contain more than one aspect candidate, unless there is only one path from s_1. If all paths contain only one aspect candidate (s_1 itself), we keep all of them.

• To further optimize the taxonomy structure, we induce a minimum subgraph from the original taxonomy using a heuristic algorithm (Kou et al., 1981). Such a subgraph satisfies the following conditions: 1) it contains all the nodes matched from aspect candidates; 2) the total number of nodes in the graph is minimal. Consequently, the induced graph is a weakly connected DAG.

After acquiring the aspect taxonomy for the given product or service, we can now tell whether two aspects semantically overlap.
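The path-keeping rule above can be sketched under the assumption that hypernym paths are given as lists of synset names; the `keep_paths` helper and the toy paths are illustrative, not taken from the paper.

```python
def keep_paths(paths, candidates):
    """Keep the hypernym paths that contain more than one aspect
    candidate; if no path does, keep all of them."""
    multi = [p for p in paths if sum(1 for s in p if s in candidates) > 1]
    return multi if multi else paths

cands = {"shutter.n.01", "lens.n.01", "zoom_lens.n.01"}

# Only one candidate (shutter.n.01 itself) appears on either path,
# so both paths are kept:
paths = [["shutter.n.01", "optical_device.n.01", "device.n.01", "entity.n.01"],
         ["shutter.n.01", "barrier.n.01", "entity.n.01"]]
print(len(keep_paths(paths, cands)))  # -> 2

# The first path for zoom_lens.n.01 also passes through lens.n.01
# (two candidates), so the single-candidate path is dropped:
paths2 = [["zoom_lens.n.01", "lens.n.01", "optical_device.n.01", "entity.n.01"],
          ["zoom_lens.n.01", "artifact.n.01", "entity.n.01"]]
print(len(keep_paths(paths2, cands)))  # -> 1
```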

2.4 Aspect Ranking

In this section, we propose a novel method based on personalized PageRank to compute overall rank values for the potential aspects by leveraging the aspect taxonomy.


Let the aspect taxonomy be a graph G = (V,E). Each node v ∈ V is a synset in the aspect taxonomy and is encoded as a vector by instantiating E in (2) with GloVe embeddings. Each edge e = 〈u, v〉 carries a weight, which is the semantic similarity between the nodes u and v, computed using cosine similarity.

Next, we perform random walks on the constructed aspect taxonomy. The propagation starts from the candidate aspect nodes in the aspect taxonomy, which are called seeds here. The rank values (aspect importance) of all nodes are:

x_t = (1 - α) · A x_{t-1} + α · E,    (3)

where t is the time step in the random walk process. In the initial state E (i.e., x_0), the aspect importance is distributed only over the seeds (v ∈ V_b). E_i is the i-th dimension of E, indicating the portion of aspect importance on node s_i at time step 0. E is calculated as follows:

E_i = f(le(s_i)) / \sum_{j=1}^{n} f(le(s_j)),  if s_i is a seed;
E_i = 0,                                        otherwise,    (4)

where n is the number of nodes in the graph, s_i is the synset node, le(s_i) denotes the lemma form of s_i, and f(le(s_i)) represents the frequency with which le(s_i) is modified by adjectives.

The aspect importance is updated using the transition probability matrix A, whose entries are the normalized weights on the edges of the taxonomy. α is the teleport probability, i.e., the probability of returning to the initial distribution at each time step; α determines how far importance propagates over the taxonomy.
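The update in Eq. (3) can be sketched with plain Python lists. The toy transition matrix and seed vector below are illustrative; in practice A comes from the normalized edge weights of the taxonomy and E from Eq. (4).

```python
def personalized_pagerank(A, E, alpha=0.5, iters=50):
    """Iterate Eq. (3): x_t = (1 - alpha) * A x_{t-1} + alpha * E.

    A[i][j] is the (normalized) weight of the edge from node j to
    node i; E is the seed distribution of Eq. (4)."""
    n = len(E)
    x = list(E)
    for _ in range(iters):
        x = [(1 - alpha) * sum(A[i][j] * x[j] for j in range(n)) + alpha * E[i]
             for i in range(n)]
    return x

# Toy 3-node chain 0 -> 1 -> 2 with all importance seeded on node 0.
A = [[0.0, 0.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
E = [1.0, 0.0, 0.0]
x = personalized_pagerank(A, E, alpha=0.5)
print(x)  # converges to [0.5, 0.25, 0.125]
```

With α = 0.5, half of a node's importance propagates one hop further at each step, so each extra hop from the seed halves the rank value, matching the expected walk length 1/α = 2 used in the experiments.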

2.5 Aspect Generation

Finally, we generate the prominent aspects using the rank values of the aspects as well as the is-a relations in the aspect taxonomy.

We sort le(s_i) in decreasing order of rank value and essentially take the top aspects from the sorted list. However, there are two types of overlap that we need to avoid: i) duplicates: different synset nodes may map to the same aspect, i.e., le(s_i) = le(s_j) with s_i ≠ s_j; ii) taxonomy overlap: a later aspect in the list is a hypernym or hyponym of one of the previous aspects. To this end, we simply skip overlapping aspects and move along the list until we generate K non-overlapping prominent aspects.
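A sketch of this selection loop, assuming a ranked synset list, a lemma function, and an overlap predicate derived from the taxonomy (all names are illustrative):

```python
def top_k_aspects(ranked_synsets, lemma, overlaps, k):
    """Walk the ranked list, skipping duplicate lemmas and synsets
    that overlap (hypernym/hyponym) with an already-chosen aspect."""
    chosen, lemmas = [], set()
    for s in ranked_synsets:
        l = lemma(s)
        if l in lemmas:                          # duplicate lemma
            continue
        if any(overlaps(s, c) for c in chosen):  # taxonomy overlap
            continue
        chosen.append(s)
        lemmas.add(l)
        if len(chosen) == k:
            break
    return [lemma(s) for s in chosen]

# zoom_lens is a hyponym of the higher-ranked lens, so it is skipped:
ranked = ["lens.n.01", "zoom_lens.n.01", "shutter.n.01", "room.n.01"]
hypo = {("zoom_lens.n.01", "lens.n.01")}
overlaps = lambda a, b: (a, b) in hypo or (b, a) in hypo
print(top_k_aspects(ranked, lambda s: s.split(".")[0], overlaps, 2))
# -> ['lens', 'shutter']
```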

3 Experiments

We compare the ExtRA framework with multiple strong baselines on extracting aspect terms from user reviews. We first introduce the dataset and the competing models, then present quantitative evaluation as well as qualitative analysis of the different models.

3.1 Dataset

We use customer review corpora for 6 kinds of products and services[2] collected from popular websites, including Amazon, TripAdvisor and Yelp. The number of hotel reviews (Wang et al., 2011b) in the original corpus is huge; therefore, we randomly sample 20% of those reviews for our experiments. The statistics of the corpora are shown in Table 1.

Table 1: Dataset statistics.

Product type    Source        #Reviews
hotel           TripAdvisor   3,155,765
mobile phone    Amazon          185,980
mp3 player      Amazon           30,996
laptop          Amazon           40,744
cameras         Amazon          471,113
restaurant      Yelp            269,000

Existing published aspect extraction datasets (Hu and Liu, 2004; Popescu and Etzioni, 2007; Pavlopoulos and Androutsopoulos, 2014; Ding et al., 2008) include only fine-grained aspects from reviews, which are not suitable for evaluating the performance of prominent aspect extraction. Therefore, we build a new evaluation dataset particularly for this task. Following previous work (Ganu et al., 2009; Brody and Elhadad, 2010; Zhao et al., 2010; Wang et al., 2015) as well as popular commercial websites (e.g., TripAdvisor), most of which manually label 3-6 prominent aspects for rating, we set K to five. We ask each annotator, who is familiar with the domain, to give the 5 aspect terms they think are most important for each category. We have five annotators in total.[3] One prominent aspect can be expressed by different terms, so it is difficult to achieve satisfactory inter-annotator agreement. We propose two evaluation methods, especially the soft accuracy in Section 3.3.1, to compensate for

[2] The data is available from http://times.cs.uiuc.edu/˜wang296/Data/ and https://www.yelp.com/dataset

[3] The complete labeled set of ExtRA is released at http://adapt.seiee.sjtu.edu.cn/extra/.


this problem. To obtain relatively higher inter-annotator agreement, we provide the annotators with the top 100 most frequent aspect candidates as hints, though they are not required to pick labels from these candidates. The inter-annotator agreement for each product type, shown in Table 3, is computed as the average Jaccard similarity between every two annotators.

3.2 Baselines and ExtRA

We introduce three topic-modeling-based baselines for the task: LDA (Blei et al., 2003), BTM (Cheng et al., 2014) and MG-LDA (Titov and McDonald, 2008). MG-LDA is a strong baseline that attempts to capture multi-grain topics (i.e., global and local), where the local topics correspond to the rateable prominent aspects. We treat each review as a document, run these models to extract K topics, and then select the most probable words in each topic as the extracted aspect terms. To avoid extracting the same aspect w from different topics, we keep w only for the topic t with the highest probability p(w|t), then re-select aspects for the other topics until we get K different aspects. For a fair comparison among models, the number of target aspects K is set to 5. The hyper-parameter of MG-LDA (the number of global topics) is set to 30 with fine-tuning.
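The re-selection procedure for the topic-model baselines can be sketched as follows; the helper name and the toy topic-word distributions are assumptions for illustration, not the baselines' actual code.

```python
def select_topic_aspects(p_w_given_t):
    """For each topic, pick its most probable word, but award a word
    only to the topic where p(w|t) is highest; other topics fall back
    to their next-best word (a sketch of the re-selection above)."""
    taken, aspects = set(), {}
    for t, dist in p_w_given_t.items():
        for w, p in sorted(dist.items(), key=lambda kv: -kv[1]):
            claimed_elsewhere = any(other.get(w, 0) > p
                                    for ot, other in p_w_given_t.items()
                                    if ot != t)
            if w not in taken and not claimed_elsewhere:
                aspects[t] = w
                taken.add(w)
                break
    return aspects

# "room" is most probable in both topics, but p(room|0) > p(room|1),
# so topic 1 falls back to "staff":
p = {0: {"room": 0.5, "pool": 0.2}, 1: {"room": 0.3, "staff": 0.25}}
print(select_topic_aspects(p))  # -> {0: 'room', 1: 'staff'}
```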

Another syntactic rule-based baseline, AmodExt, comes from the first stage of our framework. After extracting the aspect candidates using the amod rule in Section 2.2, we sort the aspect candidates by their counts of extracted occurrences and select the top K candidates as the prominent aspects.

ABAE (He et al., 2017) is a neural model that infers K aspect types, each of which is a ranked list of representative words. To generate K prominent aspects, we first infer K aspect types using ABAE, then select the most representative word from each aspect type.

For ExtRA, in the taxonomy construction stage, we use a two-stage k-means clustering method for the synset matching task, with the number of clusters auto-tuned using the silhouette score (Rousseeuw, 1987). We use the SkipGram model (Mikolov et al., 2013) to train the embeddings on review texts for k-means clustering, setting the dimension of the embeddings to 100 and running 64 epochs for each product corpus. In the aspect ranking stage, we empirically set the teleport probability α to 0.5, which means the expected walk length

from the seeds is 1/α = 2.

3.3 Evaluation

In this section, we compare ExtRA with five baseline models both quantitatively and qualitatively.

3.3.1 Quantitative Evaluation

First, we perform two experiments to justify our aspect taxonomy construction stage:

• To justify the synset matching step, we compare our proposed clustering method with a classical WSD algorithm (Lesk) on matching accuracy. We manually label 100 sampled synset nodes for each category. The synset matching accuracies are shown in Table 2; our clustering method is clearly effective for the synset matching task.

• We induce the aspect taxonomy using a heuristic algorithm to obtain a more compact and aspect-oriented subgraph. We show the size of the aspect taxonomy before and after taxonomy minimization in Figure 4.

Next, we evaluate our model as well as the above baselines on the evaluation dataset described above. We did not remove duplicate aspect labels for this evaluation, since repeated aspects are assumed to be better. For a given category, we first calculate the percentage of the 25 labels that exactly match one of the 5 aspect terms generated by the model, which we call the hard accuracy of the model. Formally, $\mathrm{Aspects}(m) = [a_1, a_2, a_3, a_4, a_5]$ denotes the five prominent aspects generated by model m for the given category, and $L = [l_1, l_2, \ldots, l_{25}]$ are the 25 gold aspect terms, where $L^{(h)} = [l_{5h-4}, \ldots, l_{5h}]$ are from the h-th human annotator. The hard accuracy is defined as:

$$\mathrm{hacc}(m) = \frac{\sum_{i=1}^{25} \mathrm{hit}(\mathrm{Aspects}(m),\, l_i)}{25} \quad (5)$$

$$\mathrm{hit}(\mathrm{Aspects}(m),\, l_i) =
\begin{cases}
1, & l_i \in \mathrm{Aspects}(m) \\
0, & \text{otherwise}
\end{cases} \quad (6)$$
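Under the definitions in Eqs. (5)-(6), hard accuracy is a few lines of code; the aspect and label lists here are hypothetical:

```python
def hard_accuracy(aspects, gold_labels):
    """hacc: fraction of gold labels (5 per annotator, 25 in total)
    that exactly match one of the model's generated aspect terms."""
    return sum(1 for l in gold_labels if l in aspects) / len(gold_labels)

aspects = ["room", "location", "view", "staff", "service"]
gold = ["room", "staff", "pool", "view", "food"]  # one annotator's labels
print(hard_accuracy(aspects, gold))  # 0.6
```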

However, counting the number of exact matches makes the accuracy score discrete and coarse. Besides, it penalizes aspect terms that don't match the label but actually have similar meanings. To



Table 2: WordNet synset matching accuracies

          hotel   mp3    cameras  mobile phone  laptop  restaurant
LESK      0.71    0.59   0.62     0.64          0.53    0.65
Cluster   0.86    0.83   0.74     0.80          0.78    0.69

Table 3: Inter-annotator agreements

          hotel   mp3    cameras  mobile phone  laptop  restaurant
Jaccard   0.470   0.554  0.304    0.440         0.271   0.671

[Figure 4: Statistics of the induced aspect taxonomy (#nodes and #edges for hotel, mp3, cameras, mobile phone, laptop, restaurant) before and after taxonomy minimization.]

remedy this, we propose a soft accuracy evaluation measure. For each set of five gold labels from the h-th annotator, we first align each generated aspect $a_k \in \mathrm{Aspects}(m)$ with one gold aspect $l_j \in L^{(h)}$ (i.e., $\mathrm{align}^{(h)}(a_k) = l_j$). We align exact matches first, and then choose the optimal alignment for the remaining terms by permuting all possible alignments; the optimal alignment $\mathrm{align}^{(h)}(a_k)$ is the one that achieves the maximum soft accuracy. We then calculate the soft matching score between $\mathrm{Aspects}(m)$ and $L^{(h)}$ as $\sum_{k=1}^{K} \mathrm{sim}(a_k, \mathrm{align}^{(h)}(a_k))$, where sim is the cosine similarity computed with GloVe embeddings (Pennington et al., 2014).4 The soft accuracy measure is then:

$$\mathrm{sacc}(m) = \frac{1}{5K} \sum_{h=1}^{5} \sum_{k=1}^{K} \mathrm{sim}\big(a_k,\, \mathrm{align}^{(h)}(a_k)\big), \quad (7)$$

where K = 5 in this case. The comparison results are shown in Table 4.
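The alignment in Eq. (7) can be sketched by brute-force permutation, which subsumes the exact-match-first heuristic since it finds the globally optimal alignment. The one-hot "embeddings" below stand in for GloVe vectors, and the per-term normalization reflects our reading of Eq. (7):

```python
from itertools import permutations

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return num / den if den else 0.0

def soft_match(aspects, gold, emb):
    """Best one-to-one alignment of generated aspects to one
    annotator's gold labels, scored by summed cosine similarity."""
    return max(sum(cosine(emb[a], emb[g]) for a, g in zip(aspects, perm))
               for perm in permutations(gold, len(aspects)))

def soft_accuracy(aspects, gold_sets, emb):
    """sacc: average the optimal alignment score over annotators,
    normalized by the number of aspect terms."""
    total = sum(soft_match(aspects, g, emb) for g in gold_sets)
    return total / (len(gold_sets) * len(aspects))

# Toy one-hot embeddings; each annotator matches exactly one aspect.
emb = {"room": (1, 0, 0), "location": (0, 1, 0), "view": (0, 0, 1)}
aspects = ["room", "location"]
gold_sets = [["room", "view"], ["location", "view"]]
print(soft_accuracy(aspects, gold_sets, emb))  # 0.5
```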

Our model (ExtRA) outperforms all the other baselines in every category except cameras under the hard accuracy measure. In addition, ExtRA is the best model on four out of six products under the soft accuracy measure. As shown in Table 2, the accuracy of synset matching is relatively low for

4 We use the 300-dimensional GloVe embeddings trained on 840B tokens of Common Crawl data.

Table 4: Comparison of hard (upper row) and soft (lower row) accuracies using different models for aspect extraction.

               LDA    BTM    MG-LDA  ABAE   AmodExt  ExtRA
hotel          0.16   0.16   0.16    0.16   0.44     0.56
               0.50   0.49   0.67    0.35   0.65     0.70
mp3            0.0    0.08   0.08    0.0    0.35     0.44
               0.47   0.49   0.47    0.32   0.58     0.60
camera         0.24   0.40   0.28    0.04   0.04     0.32
               0.56   0.69   0.54    0.29   0.41     0.55
mobile phone   0.16   0.0    0.28    0.0    0.52     0.60
               0.58   0.33   0.58    0.31   0.73     0.71
laptop         0.08   0.24   0.24    0.0    0.24     0.28
               0.40   0.50   0.50    0.22   0.51     0.53
restaurant     0.20   0.0    0.0     0.0    0.56     0.56
               0.49   0.38   0.42    0.29   0.77     0.72

cameras and restaurant, resulting in the lower accuracy in overall aspect extraction.

3.3.2 Qualitative Analysis

To qualitatively evaluate the different models, we present the 5 aspect terms extracted by each model in each domain in Table 5. Our model (ExtRA) has a clear advantage over the other baselines: it extracts reasonable aspects, and it extracts not only words but also phrases as prominent aspects, e.g., sound quality and image quality. The proposed model also avoids the overlapping aspects that appear in our strong baseline (AmodExt) by deduplicating with the generated aspect taxonomy. For example, AmodExt extracts both location and place as top aspects for hotel, but they denote nearly the same concept. The results from the other baseline methods inevitably contain sentiment words and opinions, like good, nice, and great. Our model avoids this drawback by extracting aspect candidates only from nouns and using syntactic rules to find words that are frequently modified by adjectives.

Table 5: The five prominent aspect terms extracted by each model

hotel
  LDA      room, pool, stay, good, nice
  BTM      walk, good, room, stay, check
  MGLDA    room, stay, good, location, staff
  ABAE     shouted, room, terrific, accommodation, alexanderplatz
  AmodExt  room, location, place, view, staff
  ExtRA    room, location, view, staff, service

mp3
  LDA      work, great, good, music, ipod
  BTM      battery, ipod, work, song, good
  MGLDA    battery, ipod, music, song, good
  ABAE     documentation, content, portability, bought, table
  AmodExt  drive, quality, sound, feature, device
  ExtRA    drive, sound quality, feature, screen, software

cameras
  LDA      lens, picture, buy, video, mode
  BTM      battery, picture, function, lens, good
  MGLDA    battery, picture, good, mpcture, mode
  ABAE     toy, picture, mailed, ultrazoom, sharpness
  AmodExt  picture, photo, quality, feature, shot
  ExtRA    image quality, photograph, feature, shot, lens

mobile phone
  LDA      battery, buy, good, apps, work
  BTM      core, good, work, para, apps
  MGLDA    work, battery, screen, good, card
  ABAE     cracked, amazing, continuously, archive, bought
  AmodExt  feature, screen, price, camera, quality
  ExtRA    feature, price, screen, quality, service

laptop
  LDA      screen, good, buy, drive, chromebook
  BTM      windows, screen, work, drive, good
  MGLDA    windows, battery, screen, good, year
  ABAE     salign, returned, affordable, downloads, position
  AmodExt  drive, machine, price, screen, life
  ExtRA    drive, price, screen, deal, performance

restaurant
  LDA      food, good, room, time, great
  BTM      good, room, pour, time, order
  MGLDA    great, good, place, time, make
  ABAE     jones, polite, told, chickpea, place
  AmodExt  food, service, place, experience, price
  ExtRA    service, food, experience, company, price

4 Related Work

Existing research on aspect-based review analysis has focused on mining opinions for given aspects (Su et al., 2008; Zeng and Li, 2013) or jointly extracting aspects and sentiment (Lin and He, 2009; Zhao et al., 2010; Qiu et al., 2011; Wang et al., 2015; Liu et al., 2016). These works are mostly interested in detecting aspect words in a given sentence, whereas our goal is to extract the most prominent aspects of a type of product from a large number of reviews about that product type. We divide the existing work on review aspect extraction into three types:

• rule-based methods, most of which utilize handcrafted rules to extract candidate aspects and then apply a clustering algorithm to them.

• topic modeling based methods, which directly model topics from texts and then extract aspects from the topics.

• neural network based methods, which take advantage of recent deep neural network models.

4.1 Rule-based Methods

These methods leverage statistical and syntactic word features to manually design rules that recognize aspect candidates in texts. Poria et al. (2014) use manually crafted mining rules. Qiu et al. (2011) also used rules, plus the Double Propagation method, to better relate sentiment to aspects. Gindl et al. (2013) combine Double Propagation with anaphora resolution to identify co-references and improve accuracy. Su et al. (2008) used a clustering method to map implicit aspect candidates (assumed in that paper to be the noun forms of adjectives) to explicit aspects. Zeng and Li (2013) mapped implicit features to explicit features using a set of sentiment words and by clustering explicit feature-sentiment pairs. Rana and Cheah (2017) propose a two-fold rule-based model using rules defined by sequential patterns: the first fold extracts aspects associated with domain-independent opinions, and the second fold extracts aspects associated with domain-dependent opinions.

However, such rule-based models are designed for extracting product features and cannot easily be adapted to our K most prominent aspect extraction problem. Besides, most of them require human effort to collect lexicons and to carefully design complex rules, and thus do not scale well.

4.2 Topic Modeling Based Methods

Most work in this area is based on two basic models, pLSA (Hofmann, 1999) and LDA (Blei et al., 2003). Variants of these models consider two special features of review texts: 1) topics shift quickly between sentences, and 2) sentiment plays an important role, with a strong correlation between sentiments and aspects. The approach of Lin et al. (2011) models aspects and sentiments in parallel per review. Lin et al. (2009) model the dependency between the latent aspects and ratings. Wang et al. (2011a) proposed a generative model which incorporates topic modeling techniques into



the latent rating regression model (Wang et al., 2010). Moghaddam and Ester (2012) provide a nice summary of some basic variants of LDA for opinion mining. Instead of using topics, our method relies on word embeddings to capture the latent semantics of words and phrases, and achieves better results. MG-LDA (Titov and McDonald, 2008) is a variant of LDA that models topics at different granularities, extending standard topic models such as LDA and pLSA to induce multi-grain topics. D-PLDA (Moghaddam and Ester, 2012) is an LDA variant designed specifically for modeling topics from user reviews. D-PLDA considers only opinion-related terms and phrases, with nouns and phrases controlled by two separate hidden parameters. As a result, the model needs aspects, ratings, and phrases as input, all of which are expensive to obtain.

4.3 Neural Network Based Methods

He et al. (2017) propose a neural attention model for identifying aspect terms. Their goal is similar to ours, but instead of directly comparing their extracted terms with the gold standard, they ask human judges to manually map the extracted terms to one of the prominent gold aspects before computing precision/recall. This evaluation methodology mixes machine results with human judgment, which we consider problematic. Our experiments showed that their output aspects are too fine-grained to be used as prominent aspects.

5 Conclusion

In this paper, we propose ExtRA, an unsupervised framework for extracting the most prominent aspect terms for a type of product or service from user reviews, which benefits both qualitative and quantitative aspect-based review summarization. Using WordNet as a backbone and running personalized PageRank on the resulting network, we produce aspect terms that are both important and non-overlapping. Results show that this approach is more effective than a number of strong baselines.

Acknowledgment

Kenny Q. Zhu is the contact author and was supported by NSFC grants 91646205 and 61373031.

Thanks to the anonymous reviewers for their valu-able feedback.

References

Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In IJCAI, volume 3, pages 805–810.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research.

Samuel Brody and Noemie Elhadad. 2010. An unsupervised aspect-sentiment model for online reviews. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 804–812. Association for Computational Linguistics.

Xueqi Cheng, Xiaohui Yan, Yanyan Lan, and Jiafeng Guo. 2014. BTM: Topic modeling over short texts. TKDE.

Xiaowen Ding, Bing Liu, and Philip S. Yu. 2008. A holistic lexicon-based approach to opinion mining. In Proc. of WSDM.

Gayatree Ganu, Noemie Elhadad, and Amelie Marian. 2009. Beyond the stars: improving rating predictions using review text content. In WebDB, volume 9, pages 1–6. Citeseer.

Stefan Gindl, Albert Weichselbraun, and Arno Scharl. 2013. Rule-based opinion target and aspect extraction to acquire affective knowledge. In Proc. of WWW.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2017. An unsupervised neural attention model for aspect extraction. In ACL.

Thomas Hofmann. 1999. Probabilistic latent semantic indexing. In Proc. of SIGIR.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proc. of KDD.

L. Kou, George Markowsky, and Leonard Berman. 1981. A fast algorithm for Steiner trees. Acta Informatica.

Himabindu Lakkaraju, Chiranjib Bhattacharyya, Indrajit Bhattacharya, and Srujana Merugu. 2011. Exploiting coherence for the simultaneous discovery of latent facets and associated sentiments. In SDM.

Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proc. of CIKM.

Jialu Liu, Jingbo Shang, and Jiawei Han. 2017. Phrase mining from massive text and its applications. Synthesis Lectures on Data Mining and Knowledge Discovery.

Qian Liu, Zhiqiang Gao, Bing Liu, and Yuanlin Zhang. 2015. Automated rule selection for aspect extraction in opinion mining. In IJCAI, volume 15, pages 1291–1297.

Qian Liu, Bing Liu, Yuanlin Zhang, Doo Soon Kim, and Zhiqiang Gao. 2016. Improving opinion aspect extraction using semantic similarity and aspect associations.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Proc. of NIPS.

Samaneh Moghaddam and Martin Ester. 2012. On the design of LDA models for aspect-based opinion mining. In Proc. of CIKM.

John Pavlopoulos and Ion Androutsopoulos. 2014. Aspect term extraction for sentiment analysis: New datasets, new evaluation measures and an improved unsupervised method. In Proc. of LASM@EACL.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proc. of EMNLP.

Ana-Maria Popescu and Oren Etzioni. 2007. Extracting product features and opinions from reviews. In Natural Language Processing and Text Mining.

Soujanya Poria, Erik Cambria, Lun-Wei Ku, Chen Gui, and Alexander Gelbukh. 2014. A rule-based approach to aspect extraction from product reviews. In Proc. of the Workshop on Natural Language Processing for Social Media.

Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target extraction through double propagation. Computational Linguistics.

Toqir A. Rana and Yu-N Cheah. 2017. A two-fold rule-based model for aspect extraction. Expert Systems with Applications.

Peter J. Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65.

Qi Su, Xinying Xu, Honglei Guo, Zhili Guo, Xian Wu, Xiaoxun Zhang, Bin Swen, and Zhong Su. 2008. Hidden sentiment association in Chinese web opinion mining. In Proc. of WWW.

Liling Tan. 2014. Pywsd: Python implementations of word sense disambiguation (WSD) technologies [software].

Ivan Titov and Ryan McDonald. 2008. Modeling online reviews with multi-grain topic models. In WWW.

Hongning Wang, Yue Lu, and ChengXiang Zhai. 2010. Latent aspect rating analysis on review text data: a rating regression approach. In Proc. of KDD.

Hongning Wang, Yue Lu, and ChengXiang Zhai. 2011a. Latent aspect rating analysis without aspect keyword supervision. In Proc. of KDD.

Hongning Wang, Chi Wang, ChengXiang Zhai, and Jiawei Han. 2011b. Learning online discussion structures by conditional random fields. In SIGIR.

Linlin Wang, Kang Liu, Zhu Cao, Jun Zhao, and Gerard de Melo. 2015. Sentiment-aspect extraction based on restricted Boltzmann machines. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 616–625.

Lingwei Zeng and Fang Li. 2013. A classification-based approach for implicit feature identification. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data.

Wayne Xin Zhao, Jing Jiang, Hongfei Yan, and Xiaoming Li. 2010. Jointly modeling aspects and opinions with a MaxEnt-LDA hybrid. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 56–65. Association for Computational Linguistics.