Introduction to Text Analytics for Customer Insights
Getting started with, and realising ROI on, Text Analytics
Ryan Stuart, Founder & CTO
Who am I?
• Software Engineer (previously?)
• Founder & CTO of Kapiche
• Working in the Text Analytics industry since 2008
• Interests: Distributed Computing, Database Design, Machine Learning
@rstuart85 / @Kapiche Official
rstuart85 / Kapiche
Who are you?
Raise your hand if you are a…
• Engineer / Developer / Technical;
• Data Scientist;
• Academic;
• Market Researcher;
• Statistician;
• Someone with “analyst” or “risk” in your job title; or
• Other.
Overview
• Overview of Text Analytics
  – What is it?
  – Different Types
• Who are Kapiche?
• Solving Business Problems with Text Analytics
  – Automation
  – Enterprise Search
  – Voice of the Customer (with demo)
  – Machine Learning
• Resources
Overview of Text Analytics
What is Text Analytics?
“…the process of analyzing unstructured text, extracting relevant information, and transforming it into useful business intelligence.”
• Consider a big customer survey with two questions:
  – How likely are you to recommend Microsoft to your family, friends or colleagues? (0-10)
  – Why did you give us that score?
• You get 10,000 responses to your survey. Now what?
  – Maybe add more structure to the survey?
  – Maybe send it offshore to be understood?
  – Enter Text Analytics.
Text Analytics performs a kind of dimensionality reduction: it turns free text into a lower-dimensional representation of the data that can then be analysed.
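A minimal sketch of that reduction, assuming a plain bag-of-words representation (the slide does not say which technique is meant): each free-text answer is mapped onto counts over only the k most frequent non-stopword terms in the corpus. The sample answers, the stopword list and the helper names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical answers to "Why did you give us that score?"
RESPONSES = [
    "Support was slow and the price is too high",
    "Great price, great support",
    "The update broke my laptop, support was slow",
    "Love the new laptop design",
]

# Tiny illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"the", "and", "is", "was", "my", "too", "new"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_k_vocabulary(docs, k):
    """Pick the k most frequent terms across the whole corpus."""
    counts = Counter(t for d in docs for t in tokenize(d))
    return [term for term, _ in counts.most_common(k)]


def to_vector(doc, vocab):
    """Represent one document as term counts over the reduced vocabulary only."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


vocab = top_k_vocabulary(RESPONSES, k=5)
vectors = [to_vector(r, vocab) for r in RESPONSES]
print(vocab)    # ['support', 'slow', 'price', 'great', 'laptop']
print(vectors)  # [[1, 1, 1, 0, 0], [1, 0, 1, 2, 0], [1, 1, 0, 0, 1], [0, 0, 0, 0, 1]]
```

Here ten distinct content words collapse to a 5-number vector per answer; a real pipeline would work over far more text and usually use a learned reduction rather than a simple frequency cut-off.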
Types of Text Analytics
• Entity Extraction (NER):
  – Mark up text with entity tags such as Person, Organisation and Time.
  – Used to improve the processing/routing of text.
• Classification:
  – The process of labelling a piece of text from a fixed set of labels.
  – Sentiment Analysis and Categorisation are both examples of classification.
• Topic Modeling:
  – Identifying the high-level constructs (topics or ideas) present in the text.
  – Some approaches treat topics as abstract constructs useful for specific tasks (e.g. “more like this” search). Others use them as a mechanism for understanding the data.
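Production NER systems use trained statistical models. As a deliberately simplified sketch of what “marking up text with entity tags” produces, the snippet below uses a hand-written gazetteer (a lookup table of known names). The gazetteer entries, the XML-like tag format and the `tag_entities` helper are hypothetical.

```python
import re

# Hypothetical gazetteer: known surface strings mapped to entity types.
GAZETTEER = {
    "Ryan Stuart": "PERSON",
    "Kapiche": "ORGANISATION",
    "Microsoft": "ORGANISATION",
    "2008": "TIME",
}

# Longest names first, so a multi-word entry wins over any shorter overlapping entry.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def _markup(match):
    """Wrap one matched name in <TYPE>...</TYPE> tags."""
    name = match.group(1)
    label = GAZETTEER[name]
    return f"<{label}>{name}</{label}>"


def tag_entities(text):
    """Return the text with every gazetteer match marked up with its entity type."""
    return _PATTERN.sub(_markup, text)


sentence = "Ryan Stuart founded Kapiche and has worked in text analytics since 2008."
print(tag_entities(sentence))
```

A lookup table cannot recognise names it has never seen, which is exactly the gap a learned NER model fills; the output format, however, is the same kind of markup a downstream router would consume.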
Who are Kapiche?
What does Kapiche do?
• Strip away all the marketing lingo and Kapiche does automatic Topic Modeling.
• Not the abstract variety; the understandable variety.
• The goal is to understand large amounts of data quickly.
• But what is a topic, and how is one identified?
What is a Topic?
• Remember, most text analytics is just noise reduction.
• Kapiche uses a purely mathematical approach to determine which terms in a text corpus have high entropy.
• This is done by combining a term’s influence with its frequency.
• Once these nodes of information have been identified, we begin to build topics around them.
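The slide does not give Kapiche’s formula, and the sketch below does not compute entropy. Purely as an illustrative stand-in for “influence combined with frequency”, it builds a term co-occurrence graph, takes a term’s influence to be its degree (how many distinct terms it shares a document with), multiplies that by its frequency, and then grows a topic around the top-scoring node from its most frequent co-occurring neighbours. The scoring rule, documents and function names are all hypothetical, not Kapiche’s method.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical pre-tokenised documents (stopwords already removed).
DOCS = [
    ["support", "slow", "price"],
    ["price", "great", "support"],
    ["laptop", "support", "slow", "update"],
    ["laptop", "design"],
]


def score_terms(docs):
    """Score = frequency x influence, with influence = co-occurrence degree (an assumption)."""
    freq = Counter(t for doc in docs for t in doc)
    neighbours = defaultdict(Counter)  # term -> how often each other term co-occurs with it
    for doc in docs:
        # sorted() keeps insertion order, and therefore tie-breaking, deterministic
        for a, b in combinations(sorted(set(doc)), 2):
            neighbours[a][b] += 1
            neighbours[b][a] += 1
    scores = {t: freq[t] * len(neighbours[t]) for t in freq}
    return scores, neighbours


def build_topic(seed, neighbours, size=2):
    """A 'topic' here is the seed term plus its most frequent co-occurring terms."""
    return [seed] + [t for t, _ in neighbours[seed].most_common(size)]


scores, neighbours = score_terms(DOCS)
seed = max(scores, key=scores.get)   # the highest-scoring "node of information"
topic = build_topic(seed, neighbours)
print(seed, topic)
```

Swapping in a different influence measure (for example a centrality score) only changes `score_terms`; the “find strong nodes, then grow topics around them” shape stays the same.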
Understand the Data using Topics
Understanding the Topic Model helps us understand the data.
Solving Business Problems with Text Analytics
Automation (prediction?)
• Text Analytics can help automate a range of business processes.
• NER and Classification can be used to:
  – Assign support tickets to the right person (routing);
  – Determine whether an email is spam;
  – Automatically tag new documents in a database;
  – Detect fraud.
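Ticket routing and spam filtering are both text classification. As a minimal sketch, assuming a multinomial naive Bayes model with add-one smoothing (a common baseline, not something the slide prescribes), the snippet below learns from a handful of hypothetical labelled support tickets and routes new ones. The team labels, training texts and the `NaiveBayesRouter` class are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled tickets: (text, team that should handle it)
TRAINING = [
    ("I was charged twice on my invoice", "billing"),
    ("refund for the duplicate charge please", "billing"),
    ("the app crashes when I log in", "technical"),
    ("login page shows an error and crashes", "technical"),
]


def tokenize(text):
    return text.lower().split()


class NaiveBayesRouter:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, examples):
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.label_counts.items():
            # log P(label) + sum over words of log P(word | label), smoothed over the vocabulary
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


router = NaiveBayesRouter().fit(TRAINING)
print(router.predict("app crashes on login"))      # technical
print(router.predict("please refund the charge"))  # billing
```

The same fit/predict shape covers spam detection or auto-tagging; only the labels and training data change.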
Enterprise Search
• Combining Topic Modeling with Classification / NER opens up a number of different approaches to search.
• NER can be used for “semantic search”.
• Abstract Topic Modeling (the type where the topics are abstract constructs) is great for “More Like This”.
• Concrete Topic Modeling is great for understanding the search results and finding what you are looking for (quick demo).
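“More Like This” ranks documents by their similarity to a chosen one. As a hedged sketch, the snippet below compares plain term-count vectors with cosine similarity; an abstract topic model would compare documents in topic space instead, so using raw term counts here is a simplifying assumption. The documents and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical document store
DOCS = {
    "doc1": "battery life on the new laptop is poor",
    "doc2": "laptop battery drains fast",
    "doc3": "customer support resolved my billing issue",
}


def vectorize(text):
    """Sparse term-count vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def more_like_this(doc_id, docs, k=2):
    """Return the ids of the k other documents most similar to docs[doc_id]."""
    query = vectorize(docs[doc_id])
    scored = [(cosine(query, vectorize(text)), other)
              for other, text in docs.items() if other != doc_id]
    return [other for _, other in sorted(scored, reverse=True)[:k]]


print(more_like_this("doc1", DOCS))  # ['doc2', 'doc3']
```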
Voice of the Customer
• Perhaps the most powerful tool in sales and marketing is knowing what your customers think about your brand / product / business.
• It has always been possible to just ask them of course, but what do you do with the responses? Read them all?
• Actually, that is exactly the approach most companies take: they develop complicated coding frameworks and offshore the work.
• Obviously, that approach is expensive and seriously flawed (human bias?), so much so that surveys are tailored to make knowledge easier to extract.
Sentiment Analysis for VotC
• Sentiment Analysis is usually how people get started. It has problems, though:
Gee, I really love the complimentary snacks on Virgin Airlines!
• Sentiment analysis is traditionally treated as a classification problem solved with machine learning.
• It generally requires a new model for each data domain.
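The example sentence shows one of those problems. A naive lexicon-based scorer, sketched below with a tiny hand-made word list (an assumption, not any particular product’s lexicon), sees “love” and rates the sentence positive, even though the leading “Gee” suggests the customer may be sarcastic. Word counting alone cannot tell the difference.

```python
import re

# Hypothetical mini-lexicon: word -> polarity weight
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "bad": -1, "terrible": -1}


def lexicon_sentiment(text):
    """Sum word polarities: > 0 is positive, < 0 is negative, 0 is neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("Gee, I really love the complimentary snacks on Virgin Airlines!"))
# positive, whether or not the customer meant it sarcastically
```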
Topic Modeling for VotC
• Companies like Kapiche (and Luminoso, for example) are trying to make it easy to understand your customers.
• The approach is generally based around some degree of automated insight extraction.
• In the case of Kapiche, we are trying to reduce the noise to significantly decrease the time it takes to understand customers.
• This technology doesn’t replace the analyst! It does reduce the amount of expertise needed, though.
Demo!
Future of VotC
• The current best practice for survey design, which is a bunch of structured multiple-choice questions, is flawed.
• It’s built around the idea that automating the extraction of insights from text is hard.
• These complex surveys also result in low engagement rates.
• Technology like this has the ability to change how we design customer surveys.
• I propose simple surveys with only 2 questions (e.g. a score and a “why?”).
• Also consider how we are extracting value from social media, call centre data, etc.
Machine Learning
• In Machine Learning terms, another way to describe dimensionality reduction is feature extraction.
• Combining features extracted with Text Analytics techniques with structured data to build a classifier has many uses, for example:
  – News reports and stock price changes?
  – Book content and customer review scores?
  – Movie scripts and critic ratings?
• The traditional approach here has been Bag of Words.
• Newer methods such as Word2Vec and GloVe are emerging that don’t discard the structure of the text.
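A quick way to see what Bag of Words throws away: two sentences with different meanings can map to exactly the same feature vector. The sketch below (sentences and the `recommend_score` field are hypothetical) builds count vectors over a shared vocabulary and then appends one structured field, as you would before training a classifier. Word2Vec and GloVe learn word vectors from the contexts words appear in; they are not implemented here.

```python
from collections import Counter


def bag_of_words(sentence, vocab):
    """Count vector over a fixed vocabulary; word order is discarded."""
    counts = Counter(sentence.lower().split())
    return [counts[w] for w in vocab]


a = "the customer praised the agent"
b = "the agent praised the customer"
vocab = sorted(set(a.split()) | set(b.split()))  # ['agent', 'customer', 'praised', 'the']

vec_a = bag_of_words(a, vocab)
vec_b = bag_of_words(b, vocab)
print(vec_a == vec_b)  # True: different meanings, identical features

# Hypothetical structured field (e.g. a 0-10 survey score) appended to the text features.
recommend_score = 9
features = vec_a + [recommend_score]
print(features)  # [1, 1, 1, 2, 9]
```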
Resources
• Word2Vec - https://en.wikipedia.org/wiki/Word2vec
• GloVe - http://nlp.stanford.edu/projects/glove/
• Sentiment Analysis - https://blog.monkeylearn.com/sentiment-analysis-apis-benchmark/
• Kapiche for Research - https://research.kapiche.com
• Gensim - https://radimrehurek.com/gensim/index.html
• NLTK - http://www.nltk.org/