Sentiment Analysis Symposium
Sentiment & Triangulation
© Anderson Analytics LLC. All Rights Reserved
Why Now: Difference in Technology
• Different AI Levels of Understanding Text Data
Levels 1 to 5, from lowest to highest:
• Word count (including inflected forms). Output example: Bed=2, Room=5, Wine=6
• Grouping of synonyms. Output example: Great=(fantastic, excellent, wonderful); Dirty=(filthy, smelly, dirty)
• Word association: grouping of related terms. Output example: Furniture=(chair, table, couch); Food=(bread, shrimp)
• Detecting positive/negative sentiment. Output example: Furniture+<positive>; Food+<negative>
• Meaning in context; implication. Output examples: “People talking about their dining experience”; “People talking about how the dining experience relates to their overall vacation experience”
Accuracy: ranges from “machine more accurate than human” to “human more accurate than machine”
Timeline: Past → Present
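The first four levels (word counts through sentiment) can be approximated with simple lookup tables. The sketch below is a minimal illustration under stated assumptions, not Anderson Analytics' or any vendor's method: the inflection, synonym, category and polarity dictionaries are hypothetical toy tables, and summing word polarity per sentence is a deliberately crude stand-in for real sentiment detection. Meaning in context is out of reach for this approach.

```python
import re
from collections import Counter

# Hypothetical toy dictionaries; a real system would use a lemmatizer and curated lexicons.
INFLECTIONS = {"beds": "bed", "rooms": "room", "wines": "wine"}
SYNONYMS = {"fantastic": "great", "excellent": "great", "wonderful": "great",
            "filthy": "dirty", "smelly": "dirty"}
CATEGORIES = {"chair": "furniture", "table": "furniture", "couch": "furniture",
              "bread": "food", "shrimp": "food"}
POLARITY = {"great": 1, "dirty": -1}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(word):
    """Fold inflected forms onto a base form, then map synonyms onto one canonical term."""
    base = INFLECTIONS.get(word, word)
    return SYNONYMS.get(base, base)


def word_counts(text):
    """Level 1: word counts, with inflected forms counted under their base form."""
    return Counter(INFLECTIONS.get(w, w) for w in tokenize(text))


def synonym_counts(text):
    """Level 2: counts after synonyms are grouped under a canonical term."""
    return Counter(normalize(w) for w in tokenize(text))


def concept_sentiment(text):
    """Levels 3-4: tag each related-term category in a sentence with that sentence's net polarity."""
    tags = []
    for sentence in re.split(r"[.!?]+", text):
        words = [normalize(w) for w in tokenize(sentence)]
        score = sum(POLARITY.get(w, 0) for w in words)
        if score == 0:
            continue  # net-neutral sentence: no sentiment tag
        label = "<positive>" if score > 0 else "<negative>"
        for category in sorted({CATEGORIES[w] for w in words if w in CATEGORIES}):
            tags.append(f"{category}+{label}")
    return tags


review = "The couch was fantastic. The shrimp was filthy and smelly. Two beds, one bed."
print(word_counts(review)["bed"])       # 2
print(synonym_counts(review)["dirty"])  # 2
print(concept_sentiment(review))        # ['furniture+<positive>', 'food+<negative>']
```

Each level reuses the normalization of the level below it, which mirrors how the output examples build on one another.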
Advantage: Machine Coding vs. Human Coding
• Iterative
• Difficulty in modifying code book
• Inter-coder reliability issues
• Similar surveys can be coded easily
Input: Text data
Diagram Copyright © Anderson Analytics, LLC
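Inter-coder reliability is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement two coders would reach by chance. The slide does not name a statistic; kappa is used here as a standard example, and the code labels below are hypothetical. The same function can compare a human coder with a machine coder.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders, corrected for chance agreement."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders independently pick the same code.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


human = ["praise", "praise", "complaint", "complaint"]
machine = ["praise", "complaint", "complaint", "complaint"]
print(cohens_kappa(human, machine))  # 0.5
```

Here raw agreement is 0.75, but half of that would be expected by chance, so kappa is 0.5.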
AA Text Mining vs. Qualitative
[Diagram: Qualitative Identified Concepts and Text Mining Identified Concepts within the Universe of text data in a study; Extreme Outliers]
Qualitative analysis accounts for only a small sample of the available data set, so concept proportionality, importance and relevance can be distorted, and extreme outliers might be overlooked.
Text mining accounts for most of the data, so concept extraction and data categorization are more accurate, and extreme outliers can be identified.
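The sampling point can be made concrete: a concept mentioned by only a handful of respondents can vanish entirely from a small random sample, while a pass over all of the data always sees it. This is a minimal sketch on a hypothetical corpus in which each document is already reduced to a list of concepts; the function names are illustrative only.

```python
import random
from collections import Counter


def concept_counts(docs):
    """Number of documents in which each concept appears (each doc is a list of concepts)."""
    return Counter(concept for doc in docs for concept in set(doc))


def missed_by_sample(docs, sample_size, seed=0):
    """Concepts present in the full data set but absent from a random sample of it."""
    sample = random.Random(seed).sample(docs, sample_size)
    full, seen = concept_counts(docs), concept_counts(sample)
    return {concept: n for concept, n in full.items() if concept not in seen}


# Hypothetical corpus: 990 routine comments and 10 mentioning a rare, serious issue.
docs = [["service", "room"]] * 990 + [["service", "fire alarm"]] * 10
print(concept_counts(docs)["fire alarm"])  # 10
print(missed_by_sample(docs, 50, seed=1))
```

With a 50-document sample, the rare concept is missed more often than not, which is the kind of extreme outlier the slide says qualitative review can overlook.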
Validation Through Triangulation
• I. Quantitative: Data Mining/Visualization (Neural Nets, Factoring, Clustering, Logistic Regression…)
• II. Psychological: Review/Confirmation of Psychological Measures
• III. Qualitative: Review/Confirmation of Verbatim Concepts and Themes
• Text Mining (non a priori); Random Sample (a priori)
• Triangulated Validation
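One simple way to operationalize triangulation, assuming each method yields one label per respondent, is to measure how often each pair of methods agrees and to flag respondents on which they conflict for review. This is a hypothetical sketch: the method names and labels are placeholders, not outputs from the study.

```python
from itertools import combinations


def _check(sources):
    """Validate that all methods labeled the same, non-zero number of respondents."""
    lengths = {len(labels) for labels in sources.values()}
    if len(sources) < 2 or len(lengths) != 1 or 0 in lengths:
        raise ValueError("need two or more non-empty label lists of equal length")
    return lengths.pop()


def pairwise_agreement(sources):
    """Share of respondents on which each pair of methods assigns the same label."""
    n = _check(sources)
    return {
        (a, b): sum(x == y for x, y in zip(sources[a], sources[b])) / n
        for a, b in combinations(sorted(sources), 2)
    }


def disagreements(sources):
    """Indices of respondents where the methods do not all agree (candidates for review)."""
    n = _check(sources)
    return [i for i in range(n) if len({labels[i] for labels in sources.values()}) > 1]


sources = {
    "quantitative": ["satisfied", "satisfied", "dissatisfied"],
    "psychological": ["satisfied", "dissatisfied", "dissatisfied"],
    "qualitative": ["satisfied", "satisfied", "dissatisfied"],
}
print(pairwise_agreement(sources)[("qualitative", "quantitative")])  # 1.0
print(disagreements(sources))  # [1]
```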
Copyright 2005, SPSS Inc.
An Unexplored Opportunity: Listening to “The Voice of a Million Customers”
• About 750 properties; 300,000 rooms; 82 countries
•6 Major Brands
•1 Million Surveys Analyzed each year
• Current Database 5+ million records
[Concept cloud: “Good…”, “Service…”, “…Bad…”, “…Bathroom…”, “Bed…”, “Not Clean…”, “…Reservation”, “…Not Working”, “Disappointed…”, “…Management”, “…Check-In”, “…Charge”, “Excellent…”, “Loud…”, “Not Acceptable…”, “…Not Friendly”]
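Several of these concepts (“Not Clean…”, “…Not Working”, “…Not Friendly”) only make sense if a negator stays attached to the word it modifies. The sketch below shows one minimal way to do that when counting concepts across survey comments; the negator and stopword lists are hypothetical and far smaller than what a production text-mining tool would use, and this is not the SPSS method used in the project.

```python
import re
from collections import Counter

# Hypothetical minimal word lists; production tools ship much larger, curated resources.
NEGATORS = {"not", "no", "never"}
STOPWORDS = {"the", "a", "an", "was", "is", "were", "and", "very"}


def extract_concepts(comment):
    """Split a comment into one-word concepts, keeping a negator attached to the next word."""
    words = re.findall(r"[a-z']+", comment.lower())
    concepts, i = [], 0
    while i < len(words):
        if words[i] in NEGATORS and i + 1 < len(words):
            concepts.append(f"{words[i]} {words[i + 1]}")  # e.g. "not clean", "not working"
            i += 2
        else:
            if words[i] not in STOPWORDS:
                concepts.append(words[i])
            i += 1
    return concepts


def concept_frequencies(comments):
    """Count concepts across a collection of free-text survey comments."""
    return Counter(c for comment in comments for c in extract_concepts(comment))


comments = [
    "The room was not clean.",
    "Room not clean and TV not working!",
    "Excellent service",
]
print(concept_frequencies(comments).most_common(2))  # [('room', 2), ('not clean', 2)]
```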
Listening to “The Voice of a Million Customers”
[Concept cloud: “…Check-In”, “Good…”, “Not Clean…”, “…Not Working”, “Disappointed…”, “Excellent…”, “Loud…”, “…Not Friendly”, “…Management”, “…Charge”]
*For Example Only/Concepts Disguised
About Your Customers
• Visualizing Data (100+ posts/user)
– Data flows like a river; data has shape
– Network Chart
Value to Starwood Hotels and the Hospitality Industry
Starwood Hotels and Resorts was delighted to participate in this text mining project. Understanding the key words that drive verbal satisfaction could provide another important tool for General Managers to ensure that a guest's stay is a great one. Being better able to judge how satisfied a guest is while they are still at the hotel provides another opportunity to make the guest's experience a positive one, which is the most important factor in the decision to return to the hotel and ultimately to drive true preference for our brands.
Rebecca Gillan, VP, Global Market Research and Guest Satisfaction, Starwood Hotels and Resorts Worldwide, Inc.
The Future…
• 2008 LinkedIn Study
• LinkedIn Database vs. Profile Text vs. Member Survey
• Sampling:
– Panelists vs. SNS members: lower income AND lower seniority
– However, willing to take relevant studies through network
• Text Mining:
– Able to predict income AND purchasing power
– Predict, keep short, ask fewer questions
• “Headline”
• Title
• Schools
• Companies
• Connections
• …
Text Mine (Sample & Predict)
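The “Sample” step can be sketched as learning, from members who answered the survey, how each headline term relates to income. The hypothetical function below ranks terms by the mean income of the members whose headline contains them, which is the same form as the table of headline terms that follows. It is an illustrative assumption, not the study's actual procedure: it splits on single words only, whereas multi-word terms such as “human resources” would need phrase handling.

```python
import re
from collections import defaultdict
from statistics import mean


def term_income_table(records, min_count=1):
    """Rank headline terms by the mean income of the members whose headline contains them.

    records: iterable of (headline, income) pairs from a surveyed sample.
    Returns (term, rank, mean_income) tuples, highest mean first.
    """
    incomes = defaultdict(list)
    for headline, income in records:
        for term in set(re.findall(r"[a-z]+", headline.lower())):
            incomes[term].append(income)
    # Drop terms seen fewer than min_count times; their means are too noisy to rank.
    means = {t: mean(v) for t, v in incomes.items() if len(v) >= min_count}
    ordered = sorted(means.items(), key=lambda item: (-item[1], item[0]))
    return [(term, rank, m) for rank, (term, m) in enumerate(ordered, start=1)]


# Hypothetical sample of surveyed members (headline, self-reported income in USD).
sample = [
    ("VP Sales", 200_000),
    ("Sales Manager", 100_000),
    ("Student", 30_000),
]
for row in term_income_table(sample):
    print(row)
```

In practice `min_count` would be set well above 1 so that a single respondent cannot put a term at the top of the ranking.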
12
Most Used Terms in User Headlines (and their monetary value)
Columns: term, income rank, mean income, purchase power rank, mean purchase power
vp 1 $190,000 3 $200,250
advertising 2 $187,500 4 $175,000
contractor 3 $150,000 5 $154,375
chief__officers 4 $145,455 1 $252,262
partner 5 $126,429 25 $54,500
executive 6 $121,094 15 $99,444
owner 7 $118,625 21 $73,698
sales 8 $118,000 24 $57,759
marketing 9 $116,667 12 $105,375
consultant 10 $116,486 29 $40,227
director 11 $115,330 6 $137,712
financial 12 $113,636 14 $99,900
senior 13 $111,116 17 $89,515
operations 14 $103,125 18 $88,750
technology 15 $99,286 8 $127,500
manager 16 $99,042 11 $108,601
computer 17 $97,500 34 $13,750
engineer 18 $92,857 27 $49,528
software 19 $91,912 32 $28,646
services 20 $88,226 23 $58,882
information 21 $87,500 2 $212,500
associates 22 $87,083 16 $95,429
human resources 23 $85,833 22 $61,042
analyst 24 $83,594 20 $81,447
development 25 $83,462 33 $15,735
professional 26 $78,421 26 $53,301
assistant 27 $77,344 9 $116,406
account 28 $77,206 28 $42,105
program 29 $70,833 7 $128,056
medical 30 $66,667 13 $104,444
attorney 31 $66,250 30 $36,250
real_estate 32 $65,625 19 $84,722
designer 33 $65,625 31 $30,417
health 34 $63,824 10 $114,211
teacher 35 $45,833 36 $1,667
student 36 $30,441 35 $3,977
• Teacher makes $46K; spending power $1.7K
• Student makes $30K; spending power $4K
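The “Predict” step can then estimate income for a member who was never surveyed, from the headline alone. The sketch below uses a few per-term mean incomes copied from the table above and simply averages the means of the known terms it finds. This naive averaging is a hypothetical illustration, not the model used in the study; `predict_income` and its `fallback` parameter are invented names.

```python
import re

# A few per-term mean incomes (USD) copied from the table above; a real model would use all terms.
TERM_MEAN_INCOME = {
    "vp": 190_000,
    "director": 115_330,
    "manager": 99_042,
    "engineer": 92_857,
    "software": 91_912,
    "teacher": 45_833,
    "student": 30_441,
}


def predict_income(headline, term_means=TERM_MEAN_INCOME, fallback=None):
    """Naive estimate: average the mean incomes of the known terms found in the headline."""
    terms = re.findall(r"[a-z_]+", headline.lower())
    hits = [term_means[t] for t in terms if t in term_means]
    if not hits:
        return fallback  # no known term: fall back to a survey question or a population mean
    return sum(hits) / len(hits)


print(predict_income("Software Engineer"))  # 92384.5
print(predict_income("Student"))            # 30441.0
print(predict_income("Chef"))               # None
```

When a headline yields a confident estimate, the income question can be dropped from the survey, which is the “keep short, ask fewer questions” idea above.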
• First Sample n=53,873 records.
• Original Seed (1,000 US + 1,000 ROW, i.e. rest of world) + First Level Connections (approx. 30 connections per seed)
• Zooming in to explore micro level networks on LinkedIn using Clementine Web Charting (n=5,000)
Visualizing Social Networks On LinkedIn
Legend: 1st Level Connection; Original Seed
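A seed-plus-first-level-connection sample can be sketched as the union of the seeds and their direct connections. Because a member connected to two seeds is counted only once, such a sample can contain fewer members than the number of seeds times one plus the average number of connections. The network, member ids and function name below are hypothetical.

```python
def first_level_sample(seeds, connections):
    """Build a seed + first-level-connection sample.

    seeds: member ids chosen as seeds.
    connections: dict mapping a member id to the ids of its direct connections.
    Returns (members, degree): the de-duplicated set of sampled members and
    each seed's number of first-level connections.
    """
    seeds = list(seeds)
    members = set(seeds)
    degree = {}
    for seed in seeds:
        neighbours = set(connections.get(seed, ()))
        degree[seed] = len(neighbours)
        members |= neighbours  # shared connections are added only once
    return members, degree


# Hypothetical toy network: two seeds that share one connection ("conn_c").
connections = {
    "seed_a": ["conn_c", "conn_d"],
    "seed_b": ["conn_c", "conn_e", "conn_f"],
}
members, degree = first_level_sample(["seed_a", "seed_b"], connections)
print(len(members))  # 6
print(degree)        # {'seed_a': 2, 'seed_b': 3}
```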
Visualizing Social Networks On LinkedIn
Seed A (22 Connections)
Connection C
Seed B (213 Connections)
Legend: 1st Level Connection; Original Seed
The Future
LinkedIn Segments – Important Variables (Neural Net)
• LI Interests/Purpose
• Work Situation*
• Purchase Behavior*
• Use of LinkedIn
*Variables NOT used in clustering
The Future - SNS
The Future - SNS
Source: Anderson Analytics April 2009
Top Related