Exploring customer sentiment from Tripadvisor's online reviews · 2017-06-26


1. Introduction · 2. Literature review · 3. Dataset and experiment procedure · 4. Preliminary result · 5. Discussions

FEC - Hoa Sen University

Exploring customer sentiment from Tripadvisor’s online reviews

Hung Nguyen (1), Thang Phan (1), Thanh Le (1) and Su Nguyen (2)

(1) Hoa Sen University
(2) Victoria University of Wellington, NZ

Research Project, 2017

HCM 2017 Research Project 1 / 27


Outline

1. Introduction

2. Literature review

3. Dataset and experiment procedure

4. Preliminary result

5. Discussions


Hospitality media

ONLINE REVIEW & BOOKING

• 93% of global travelers say their booking decisions are impacted by online reviews.

• 91% of global hotels say reviews are important for bookings.

The TripBarometer by TripAdvisor is based on an online survey conducted in Dec. 2012-Jan. 2013. A total of 35,042 people from 26 countries spanning 7 regions participated in the online survey. The sample is made up of 15,595 consumers and 19,447 businesses, making it the world’s largest combined accommodation and traveler survey.


Hospitality media

A study by Market Metrix (Hotel & Motel Management, January 13, 2010) showed that:

Customer feedback is an integral part of the continuous improvement process implemented in the hotel industry, and yet a comprehensive characterization of the customer experience is difficult to achieve.


Web scraper

How did we come up with the idea?

Web Scraper: user-generated content (UGC) sites such as TripAdvisor, Bookings and Airbnb provide a huge amount of data, including:

• reviewers' rating metrics, defined by the site's managers, and

• reviewers' opinions, given in users' comments.


Data mining

RapidMiner: RapidMiner gave us a starting point to come together and think about applying data mining and machine learning techniques in research.


Why "Exploring customer sentiment from Tripadvisor’s online reviews"?

We want a mechanism that continuously improves service operations and strategies by identifying potential issues and understanding customer behaviors, and that helps travel portals provide useful information (better analysis and better reviews) and insights to travelers.


Research Objectives

1. Identify which aspects/features of a hotel travelers are talking about.

2. Measure travelers' sentiment toward different aspects of hotels in Vietnam.

3. Create a summation and find the emerging opinions of customers toward different aspects.


Terminology

• Corpus/Corpora

• Aspects/Features

• Sentiment

• Tokenization

• Lemmatization

• Part-Of-Speech (POS) tagging

• Summation


Travelers' comments on TripAdvisor


Summation

Figure: Summation in Topic Modeling. Kim et al. 2011


2. Literature review

2.1 Mining aspects/features

There are basically two techniques for discovering aspects/features in corpora:

• Symbolic approaches that rely on the syntactic description of terms, namely noun phrases (Agrawal et al. 1994; Liu et al. 1998; Justeson et al. 1995; Jacquemin et al. 2001; Hu et al. 2004; OPINE, Popescu and Etzioni 2005).

• Statistical approaches that exploit the fact that the words composing a term tend to occur close to each other and to recur (Church and Hanks 1990; Daille 1996; Qiu et al. 2009).
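The statistical idea can be illustrated with pointwise mutual information (PMI), the kind of association measure studied by Church and Hanks (1990). The following is a minimal sketch, not the procedure used in this project: the function name `pmi_bigrams`, the `min_count` threshold, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_bigrams(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), estimated from raw counts.
    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy corpus (illustrative only, not taken from the dataset).
toy_corpus = [
    "the swimming pool was clean",
    "we loved the swimming pool",
    "the room was clean",
    "the staff was friendly",
]
pmi_scores = pmi_bigrams(toy_corpus)
best_pair = max(pmi_scores, key=pmi_scores.get)
```

On this toy input the pair ("swimming", "pool") scores highest, because the two words almost never appear apart. A high-scoring pair of this kind is a candidate multi-word aspect.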


2. Literature review

2.2 Sentiment classification

There are two forms: document sentiment classification and aspect-based sentiment analysis. The first form classifies an opinion document (e.g., a product review) as expressing a positive or negative opinion or sentiment. The task is also commonly known as document-level sentiment classification.

• Supervised learning: Pang et al. 2002 applied Bayesian classification and SVM to classify movie reviews into two classes, positive and negative. Tan et al. 2008 and Qiu et al. 2009 used opinion words to label a portion of informative examples. Melville et al. 2009 incorporated lexical knowledge into supervised learning to enhance accuracy.

• Unsupervised learning: Turney 2002 uses known opinion words for classification; Blei et al. 2003 use Latent Dirichlet Allocation (LDA); Taboada et al. 2010 define some phrases that are likely to be opinionated.
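To make the supervised, document-level setting concrete, here is a small sketch of a multinomial Naive Bayes classifier with add-one smoothing, in the spirit of the Bayesian baseline mentioned above. It is an illustration only: the class name `NaiveBayesSentiment` and the four training sentences are invented for this example and are not part of the study.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence, so drop them.
        tokens = [t for t in text.lower().split() if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(self.vocabulary)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples.
train_docs = [
    "great location and friendly staff",
    "excellent breakfast and clean room",
    "dirty room and rude staff",
    "terrible service and noisy room",
]
train_labels = ["positive", "positive", "negative", "negative"]
classifier = NaiveBayesSentiment().fit(train_docs, train_labels)
prediction = classifier.predict("friendly staff and excellent breakfast")
```

With these toy examples `prediction` is "positive". A real experiment would need far more labelled reviews and a held-out test set.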


2. Literature review

• For aspect-based sentiment analysis there is the pioneering work of Hu and Liu (2004), which introduced the more interesting problem of aspect-based sentiment analysis, where polarity is assigned not to sentences or documents but to the individual aspects discussed in them.

• Nishikawa et al. 2010, Paul et al. 2010 and Tata et al. 2010 also tried to produce more readable and quantitative text summaries.


3. Dataset

First part of the dataset:

• 403 of 410 hotels in Ho Chi Minh City have reviews in English.

• 58,381 travelers wrote reviews. There are 5 travel types: couple, solo, on business, with friends and with family.

• 74,238 reviews.

• 587,073 sentences.


3. Dataset and experiment procedure


3. General framework

1. Data collection from TripAdvisor

2. Text preprocessing
• stopword removal
• tokenization
• lemmatization
• POS tagging

3. Topic/Aspect identification
• Noun/noun phrase extraction
• clustering techniques

4. Sentiment analysis
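Part of the text preprocessing step can be sketched with the Python standard library alone. The sketch below covers sentence splitting, tokenization and stopword removal only. Lemmatization and POS tagging normally require a linguistic toolkit and are left out, since the slides do not say which one was used. The stopword list is deliberately tiny and, like the function names, is an assumption made for illustration. Stopwords are filtered from tokens, so tokenization runs first here.

```python
import re

# A deliberately tiny stopword list (assumption); real lists are much longer.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "them", "to", "with", "you"}

# Lowercase words, optionally with an internal apostrophe (e.g. "wouldn't").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def split_sentences(review):
    """Split a review into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Lowercase the sentence and extract word tokens."""
    return TOKEN_PATTERN.findall(sentence.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]


review = "Breakfast is excellent. The service is impeccable."
processed = [remove_stopwords(tokenize(s)) for s in split_sentences(review)]
# processed == [["breakfast", "excellent"], ["service", "impeccable"]]
```

Working sentence by sentence matters for the later steps, because topics and sentiment are assigned per sentence rather than per review.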


4. Preliminary result

• Aspects/Features identification: LDA, noun/noun phrase mining, cluster analysis.

• Aspect-based sentiment analysis.


Aspects/Features identification - LDA

Latent Dirichlet allocation (LDA) is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words (Blei et al. 2003).
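The generative story can be sketched in a few lines: draw a topic mixture for the document from a Dirichlet prior, then, for every word position, draw a topic from that mixture and a word from the chosen topic. This is only a simulation of the model's assumptions, not a fitting procedure. The two toy topics, their word probabilities, and the prior `alpha` are invented for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalising Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story.

    topics: list of {word: probability} dicts, one per topic.
    alpha:  Dirichlet prior over the per-document topic proportions.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        topic_index = rng.choices(range(len(topics)), weights=theta)[0]
        topic = topics[topic_index]
        word = rng.choices(list(topic), weights=list(topic.values()))[0]
        words.append(word)
    return theta, words


# Hypothetical topics; the probabilities are made up.
toy_topics = [
    {"room": 0.5, "bathroom": 0.3, "window": 0.2},
    {"service": 0.6, "staff": 0.3, "concierge": 0.1},
]
rng = random.Random(0)
theta, document = generate_document(toy_topics, alpha=[0.5, 0.5], length=8, rng=rng)
```

Fitting LDA runs this story in reverse: given only the observed words, it estimates the topics and each document's mixture `theta`.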


Aspects/Features identification - LDA

We can then infer the distribution of topics to which a specific review belongs.


Aspects/Features identification - Noun/Noun Phrase mining

With POS tagging we can find all the nouns and noun phrases in our corpus. Then, by setting a pruning level, we obtain a set of seed aspects.
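A minimal sketch of this idea, restricted to single nouns: count the words tagged as nouns and keep those that reach a minimum frequency. It assumes the tokens already carry Penn Treebank POS tags and reads the pruning level as a minimum count. Both are assumptions, as are the function name `seed_aspects` and the hand-tagged sample sentences.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags


def seed_aspects(tagged_sentences, min_count=2):
    """Return nouns that occur at least `min_count` times, most frequent first."""
    counts = Counter(
        word.lower()
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag in NOUN_TAGS
    )
    return [(word, count) for word, count in counts.most_common() if count >= min_count]


# Hand-tagged toy input (illustrative; a POS tagger would normally produce this).
tagged = [
    [("Breakfast", "NN"), ("is", "VBZ"), ("excellent", "JJ")],
    [("The", "DT"), ("service", "NN"), ("is", "VBZ"), ("impeccable", "JJ")],
    [("Great", "JJ"), ("breakfast", "NN"), ("and", "CC"), ("service", "NN")],
    [("The", "DT"), ("view", "NN"), ("is", "VBZ"), ("great", "JJ")],
]
aspects = seed_aspects(tagged)
```

Here "breakfast" and "service" survive with two occurrences each, while "view" is pruned. Raising `min_count` trades recall for cleaner seed aspects.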


Aspect-based sentiment analysis

Example: "From the moment you enter the lobby you are amazed with the design of the hotel. So much gold and colors and crystals. Design is very specific and loud and either you like it or not. I loved it as it made me feel like Alice in wonderland. Choosing the room be sure to pick the higher floors because the view on the river is great. Rooms are spacious with comfortable beds, large bathrooms and all you might need in them. Breakfast is excellent. Try Pho soup if you would like to try something local and delicious. The service is impeccable. The only thing that I wouldn’t recommend is to take the city tour recommended by concierge because this was the worst city tour we ever had. The guide never took us to places mentioned in a tour description. Left us with some lousy excuse in a cafe we didn’t want, just so the time for his tour would pass and when we asked him to show us a nice shop where we can buy some quality souvenirs, not just the same cheap stuf that you can buy on every corner, he took us to the shop next door to the hotel."


Aspect-based sentiment analysis

• Choosing the room be sure to pick the higher floors because the view on the river is great.
  Topic ID: [17] Sentiment: Sentiment(polarity=0.516, subjectivity=0.712)

• Rooms are spacious with comfortable beds, large bathrooms and all you might need in them.
  Topic ID: [4] Sentiment: Sentiment(polarity=0.307, subjectivity=0.614)

• Breakfast is excellent.
  Topic ID: [19] Sentiment: Sentiment(polarity=1.0, subjectivity=1.0)
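Sentence-level scores like these can be rolled up into a per-aspect summary. The sketch below groups the three scored sentences above by topic ID and averages their polarity and subjectivity. The function name `summarize_by_topic` and the choice of a plain mean are assumptions, since the slides do not describe the aggregation step. The `Sentiment(polarity=..., subjectivity=...)` format resembles the named tuple returned by the TextBlob library, although the slides do not name the tool.

```python
from collections import defaultdict
from statistics import mean

# (topic_id, polarity, subjectivity) for the three sentences on the slide.
scored_sentences = [
    (17, 0.516, 0.712),
    (4, 0.307, 0.614),
    (19, 1.0, 1.0),
]


def summarize_by_topic(rows):
    """Average polarity and subjectivity per topic ID."""
    grouped = defaultdict(list)
    for topic_id, polarity, subjectivity in rows:
        grouped[topic_id].append((polarity, subjectivity))
    return {
        topic_id: {
            "sentences": len(values),
            "mean_polarity": mean(p for p, _ in values),
            "mean_subjectivity": mean(s for _, s in values),
        }
        for topic_id, values in grouped.items()
    }


summary = summarize_by_topic(scored_sentences)
```

With one sentence per topic the means simply repeat the input values. Over a full corpus the same grouping gives one sentiment score per aspect for a hotel or for a traveler type.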


Aspect-based sentiment analysis

Topic 17:
service 1.03286882763
staff 0.21220842169
customer 0.0581641926689
location 0.0528627688579
food 0.051651014844
breakfast 0.0478642835504
level 0.0431687367464
quality 0.0302938503484
excellent 0.0299909118449
concierge 0.0280218115723


Aspect-based sentiment analysis

Topic 4:
room 1.0
staff 0.072922619728
bathroom 0.0472053949023
book 0.0418905017716
breakfast 0.037718596411
window 0.0365184592525
size 0.0320036575609
club 0.0281746485313
pool 0.0234312492856
location 0.0232598011201
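One simple way to use such word weights is to score a sentence against each topic by summing the weights of the words it contains. The slides do not say how sentences were assigned to topics, so this keyword-weight heuristic is an assumption and not LDA inference. The weights are a rounded subset of the Topic 17 and Topic 4 lists, and the function names are hypothetical.

```python
# Word weights taken from the Topic 17 and Topic 4 lists, rounded to 3 decimals.
TOPIC_WEIGHTS = {
    17: {"service": 1.033, "staff": 0.212, "customer": 0.058, "concierge": 0.028},
    4: {"room": 1.0, "staff": 0.073, "bathroom": 0.047, "window": 0.037},
}


def score_topics(tokens, topic_weights=TOPIC_WEIGHTS):
    """Sum the weights of matching tokens for each topic."""
    return {
        topic_id: sum(weights.get(token, 0.0) for token in tokens)
        for topic_id, weights in topic_weights.items()
    }


def best_topic(tokens, topic_weights=TOPIC_WEIGHTS):
    """Return the highest-scoring topic ID, or None if no token matches."""
    scores = score_topics(tokens, topic_weights)
    topic_id = max(scores, key=scores.get)
    return topic_id if scores[topic_id] > 0 else None


# Tokens are assumed to be lowercased and lemmatized already.
room_topic = best_topic(["room", "spacious", "comfortable", "bed", "bathroom"])
service_topic = best_topic(["service", "impeccable"])
```

Under this heuristic the sentence about spacious rooms maps to Topic 4 and the sentence about impeccable service maps to Topic 17. A sentence with no matching word returns `None`.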


Discussions


References

[1] Turney P., Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-2002), 2002.

[2] Hu M., Liu B., Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177. ACM, 2004.

[3] Qiu G., Liu B., Bu J., Chen C., Expanding domain sentiment lexicon through double propagation. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, volume 9, pages 1199–1204, 2009.

[4] Paltoglou G., Thelwall M., A study of information retrieval weighting schemes for sentiment analysis. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1386–1395, 2010.

[5] Jo Y., Oh A. H., Aspect and sentiment unification model for online review analysis. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 815–824. ACM, 2011.
