capturing the value of unstructured data: introduction to text mining

Copyright © 2013, SAS Institute Inc. All rights reserved. CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth (“M-E”) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc.

Transcript of capturing the value of unstructured data: introduction to text mining

Page 1: capturing the value of unstructured data: introduction to text mining

CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING

Mary-Elizabeth (“M-E”) Eddlestone
Principal Systems Engineer, Analytics
SAS Customer Loyalty, SAS Institute, Inc.

Page 2: capturing the value of unstructured data: introduction to text mining

Is there valuable information “locked away” in your unstructured data?

Page 3: capturing the value of unstructured data: introduction to text mining

CURRENT SITUATION: COMMON QUESTIONS ABOUT TEXTUAL DATA SOURCES

How can I leverage our textual data sources?

What value can they bring?

Are there hidden insights within text data sources that can help my organization? Such as call center notes, emails, news, government filings, social media…

How can I leverage both unstructured and structured data sources? Customer data + customer feedback?

Need to get the most from text data!

Can I also use text data to analyze and predict the future? To reduce fraud, reduce churn, improve sales, reduce costs…

Page 4: capturing the value of unstructured data: introduction to text mining

WHAT IF YOU COULD…

Extract key information from text data? E.g. people, places, companies

See how things are related to each other across a large number of documents and messages?

Discover main ideas and topics across all documents and messages?

Find patterns across text and non-text data that can predict the future?

Page 5: capturing the value of unstructured data: introduction to text mining

WHAT IF YOU COULD…

Discover new insights from large text data sources

Extract key patterns from text data to predict the future

Customers
• Discover current topics about your products from customer opinions
• Find patterns within customer feedback that predict strong interest in upsell opportunities

Fraud
• Detect anomalies from usual topics described in text reports, text applications or feedback
• Find patterns in reports that may predict or relate to suspicious behavior

Public Opinion
• Understand previously unknown issues and concerns from citizen discussions on Twitter and forums
• Extract key opinions from citizen feedback to forecast citizen sentiment in the near future

Page 6: capturing the value of unstructured data: introduction to text mining

WHERE IS TEXT MINING USED?

Text mining has numerous applications in any industry.

• Government: Detect fraudulent activity. Spot emerging trends and public concerns.
• Finance: Retention of current customer base using call center transcriptions or transcribed audio. Identification of potentially fraudulent activities.
• Insurance: Identify fraudulent claims. Track competitive intelligence. Brand management.
• Life Sciences: Identify adverse events. Recommend appropriate research materials.
• Manufacturing: Reduce time to detect root cause of product issues. Identify trends in market segments.
• Telecommunications: Help prevent churn and suggest up-sell/cross-sell opportunities for individual customers.
• Retail: Identify the most profitable customers and the underlying reasons for their loyalty. Brand management.

Page 7: capturing the value of unstructured data: introduction to text mining

TEXT MINING

Page 8: capturing the value of unstructured data: introduction to text mining

SAS® Text Analytics

Domain-Driven: Information Organization and Access
• SAS Enterprise Content Categorization
• SAS Ontology Management

Analysis-Driven: Predictive Modeling, Discover Trends and Patterns
• SAS Text Miner
• SAS Sentiment Analysis

Page 9: capturing the value of unstructured data: introduction to text mining

SAS® TEXT MINER

• A complete solution for discovering insights or predicting behaviour and outcomes, combining the data mining capabilities of SAS® Enterprise Miner™ with SAS natural language processing (NLP) and advanced linguistic technologies.

• What is Concept Extraction?
  Automatically locating and extracting key information from documents, based on rules and advanced linguistic logic.

• What is Concept Linking?
  Looking within a large corpus of text documents to discover how concepts and key information are associated with each other.

• What is Topic Discovery?
  Analysing a large corpus of text documents to discover topics by grouping messages that have very similar content.
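
Concept linking can be pictured as counting how often two concepts turn up in the same document. The following is a minimal Python sketch of that idea, not SAS Text Miner's own algorithm: the function name `link_strength`, the sample documents, and the choice of a Jaccard ratio as the association measure are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def link_strength(docs):
    """Return {(term_a, term_b): jaccard} for concepts that share a document.

    `docs` is a list of sets of already-extracted concepts (hypothetical input).
    """
    doc_freq = Counter()   # number of documents containing each concept
    pair_freq = Counter()  # number of documents containing both concepts
    for concepts in docs:
        doc_freq.update(concepts)
        pair_freq.update(combinations(sorted(concepts), 2))
    # Jaccard ratio: shared documents divided by documents with either concept.
    return {
        (a, b): n / (doc_freq[a] + doc_freq[b] - n)
        for (a, b), n in pair_freq.items()
    }

docs = [
    {"brake", "noise", "dealer"},
    {"brake", "noise"},
    {"battery", "dealer"},
]
links = link_strength(docs)
```

In this toy corpus "brake" and "noise" always appear together, so their link is the strongest; pairs that share only some documents score lower.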

Page 10: capturing the value of unstructured data: introduction to text mining

HOW DOES TEXT MINING WORK? EXPLORING & DISCOVERING INSIGHTS

1. Input text messages: e.g. Twitter data, reports, email, news, forum messages

2. Parse & explore text data: break down text and explore relationships of key concepts such as persons, places, organizations…

3. Discover topics: cluster documents of similar content and describe them with important key words

Page 11: capturing the value of unstructured data: introduction to text mining

HOW DOES TEXT MINING WORK? DISCOVER PATTERNS FOR PREDICTIVE MODELING

1. Input text messages with relevant structured data: e.g. email, call center notes, applications, together with customer data

2. Parse text data and discover topics: break down text into structured data, group messages of similar content

3. Predictive modeling with text data: text data input into models may provide reliable information to predict outcomes and behavior, e.g. predict activity that is likely fraudulent…
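
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `mentions_` column prefix, the sample notes, and the naive substring match are all invented here, and a real flow would feed the combined table to a proper model rather than compare two rates.

```python
def add_text_flags(customers, notes, terms):
    """Append one 0/1 column per term to each structured customer record."""
    rows = []
    for cust in customers:
        text = notes.get(cust["id"], "").lower()
        row = dict(cust)
        for term in terms:
            # Naive substring match; a real parser would tokenize first.
            row["mentions_" + term] = int(term in text)
        rows.append(row)
    return rows

def outcome_rate(rows, flag, outcome):
    """Share of rows with outcome == 1, split by whether the text flag is set."""
    rates = {}
    for value in (0, 1):
        group = [r for r in rows if r[flag] == value]
        rates[value] = sum(r[outcome] for r in group) / len(group) if group else None
    return rates

# Hypothetical structured data and call center notes, keyed by customer id.
customers = [
    {"id": 1, "tenure_months": 3, "churned": 1},
    {"id": 2, "tenure_months": 40, "churned": 0},
    {"id": 3, "tenure_months": 12, "churned": 1},
    {"id": 4, "tenure_months": 25, "churned": 0},
]
notes = {
    1: "Customer asked how to cancel the contract",
    2: "Billing question resolved",
    3: "Wants to cancel after outage",
    4: "Asked about cancel fees, then upgraded",
}
rows = add_text_flags(customers, notes, ["cancel"])
rates = outcome_rate(rows, "mentions_cancel", "churned")
```

Here `rates[1]` is the churn rate among customers whose notes mention "cancel" and `rates[0]` the rate among the rest; a gap between the two suggests the text-derived column is worth passing to a predictive model.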

Page 12: capturing the value of unstructured data: introduction to text mining

WHAT CAN WE DISCOVER?

Discover relationships between concepts described in a large corpus of text data: how are persons, places, organizations related?

Discover topics mentioned in text data: what are the main topics mentioned? What are the rare topics?

Discover patterns related to structured data: e.g. how is feedback related to customer purchase behavior?

Page 13: capturing the value of unstructured data: introduction to text mining

EXAMPLE – DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA

From customer complaints to engineer logs to legal documents, it is a considerable challenge to draw insights from large amounts of information, and usually infeasible by manual means.

This is even more difficult when we wish to detect concepts and patterns within the documents, in order to find trends and detect high-risk events.

How can we analyse millions of documents quickly and identify key patterns and cases of high risk (e.g. risk of fraudulent activity)?

THE DRIVER SIDE SEAT BELT SOMETIMES FAILS TO RETRACT. WHEN I PULLED THE BELT OUT, IT STAYED OUT AND WOULD NOT RETRACT. I INSPECTED THE AREA AND FOUND NO INTERFERENCE. THIS HAPPENED ON A SAT. I DROVE THE VEHICLE SAT. AND SUN WITH A FAULTY BELT. I CALLED THE DEALERS SERVICE DEPT. TOLD THEM THE PROBLEM BUT COULDN'T GET IN FOR A WEEK.

Page 14: capturing the value of unstructured data: introduction to text mining

EXAMPLE – DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA

SAS Text Miner automates the manual comprehension of text documents, uncovering relationships and trends among concepts mentioned across documents. It allows drill-down analysis and is integrated with predictive modeling within SAS Enterprise Miner.

In this example, we look at a large database of car faults.

Car Fault Records

THE DRIVER SIDE SEAT BELT SOMETIMES FAILS TO RETRACT. WHEN I PULLED THE BELT OUT, IT STAYED OUT AND WOULD NOT RETRACT. I INSPECTED THE AREA AND FOUND NO INTERFERENCE…

Here, SAS Text Miner runs text parsing on thousands of car fault reports:
• Recognizing and extracting entities and parts of speech
• Supporting a wide range of languages
• Producing a detailed term/document matrix
• Allowing deeper analysis and visualization of insights

Page 15: capturing the value of unstructured data: introduction to text mining

EXAMPLE – DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA

This allows us to discover relationships between concepts across all messages, e.g. what is usually mentioned with issues such as “brake problems”?

Discover topics mentioned in text data, e.g.
• Understand the main topics: “dealerships”…
• Uncover the emerging topics: “battery issues”…

Discover patterns related to structured data, e.g. complaints about “engine trouble” are associated with a higher chance of car accidents.

Page 16: capturing the value of unstructured data: introduction to text mining

EXAMPLE – DISCOVERING INSIGHTS FROM CUSTOMER COMPLAINT DATA

How does this help?

• Discovery of new insights/topics:
  • Text data (forum messages, emails, logs, records) typically contains rich, yet sparse or uncommon, insights.
  • Text mining allows you to:
    • Parse and extract information from text data
    • Reliably filter and retain important information
    • Automatically group documents into similar topics, allowing discovery of important/large topics or rare/small topics

• Text mining input in predictive modeling:
  • Documents and records often contain important facts that can reliably predict outcomes. For example, any mention of bad maintenance habits will likely result in earlier car failure.
  • Empowered by SAS Natural Language Processing and wide multi-language support, text mining discovers key trends within large amounts of text, to be used as clean, reliable input in data mining analysis.

Page 17: capturing the value of unstructured data: introduction to text mining

BENEFITS

• SAS Text Miner helps your organization to:
  • Uncover previously undetected associations and relationships
  • Get a complete view of the data, and drill down to specific documents for more insight
  • Automate time-consuming tasks of reading and understanding text
  • Analyse both text and non-text data to produce predictive models that spot more opportunities and recognize trends more accurately

Discover hidden patterns from text data for insights and predictive modeling!

Page 18: capturing the value of unstructured data: introduction to text mining

SAS® TEXT MINER

Page 19: capturing the value of unstructured data: introduction to text mining

SAS® TEXT MINER – ANALYTICAL WORKFLOW

[Workflow diagram labels: Raw Data; Text Mining; Model with Structured and Unstructured Data]

Page 20: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Page 21: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Start with a table that contains either:
- Documents saved as a variable (column)
- A column that points to physical text files
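
The two input layouts can be handled by one small loader. The sketch below is plain Python rather than an Enterprise Miner flow, and the column names `text` and `path`, the function `load_documents`, and the sample file are assumptions chosen for the example.

```python
import os
import tempfile

def load_documents(rows, text_col="text", path_col="path"):
    """Return one string per row, reading from disk when only a path is given."""
    documents = []
    for row in rows:
        if row.get(text_col):
            documents.append(row[text_col])          # full text stored in the table
        else:
            with open(row[path_col], encoding="utf-8") as handle:
                documents.append(handle.read())      # table stores a pointer to a file
    return documents

# Demo: one row holds the text itself, the other points to a text file.
with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "complaint_2.txt")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("Seat belt fails to retract.")
    rows = [
        {"id": 1, "text": "Battery drains overnight."},
        {"id": 2, "path": file_path},
    ]
    documents = load_documents(rows)
```

Either way, the later steps see the same thing: one text string per table row.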

Page 22: capturing the value of unstructured data: introduction to text mining

EXAMPLE INPUT DATA VARIABLE CONTAINS FULL TEXT

Page 23: capturing the value of unstructured data: introduction to text mining

EXAMPLE INPUT DATA VARIABLE CONTAINS POINTER TO TEXT FILE

Page 24: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Apply natural language processing algorithms to parse the documents and quantify information about the terms in the corpus.

Page 25: capturing the value of unstructured data: introduction to text mining

TEXT PARSING NODE

• Tokenization: break sentences or documents into terms
• Stemming: identify the root form of a word (run, runs, running, ran, etc.)
• Synonyms
• Remove low-information words such as a, an, and the (stop list)
• Part-of-speech identification (noun, verb, etc.)
• Identify standard and custom entities (names, places, etc.)
  - Multiword terms or phrases (“blue screen of death”)
  - Import custom entities, facts, and events as defined in SAS Enterprise Content Categorization (ECC)
  - Include negation entities from SAS ECC for Sentiment Analysis
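
To make the first few parsing steps concrete, here is a minimal Python sketch of tokenization, a stop list, and stemming, ending in per-document term counts. It is a toy under stated assumptions: `STOP_LIST`, `naive_stem`, and the sample sentences are invented for illustration, and the suffix-stripping stemmer is far cruder than a linguistic one (it would not map "ran" to "run", for instance).

```python
import re
from collections import Counter

STOP_LIST = {"a", "an", "and", "the", "to", "it", "i"}  # tiny illustrative stop list

def naive_stem(token):
    """Strip a few common English suffixes; a stand-in for real stemming."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def parse(document):
    """Tokenize, drop stop-list words, and stem what is left."""
    tokens = re.findall(r"[a-z']+", document.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_LIST]

def term_document_matrix(documents):
    """Return (sorted terms, one Counter of term frequencies per document)."""
    counts = [Counter(parse(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    return terms, counts

documents = [
    "The seat belt fails to retract.",
    "Belts failed and the dealer inspected it.",
]
terms, counts = term_document_matrix(documents)
```

After parsing, "fails" and "failed" collapse to the same term, so both documents contribute to one row of the term/document matrix instead of two.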

Page 26: capturing the value of unstructured data: introduction to text mining

SUPPORTED LANGUAGES

Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Korean, Polish, Portuguese, Spanish, Swedish, Czech, Danish, Finnish, Greek, Hebrew, Hungarian, Indonesian, Norwegian, Romanian, Russian, Slovak, Thai, Turkish, and Vietnamese

New in SAS 9.3: Russian, Greek, Vietnamese, Turkish, Czech, Indonesian, Thai, Danish, Norwegian, Slovak, Finnish, Romanian, Hebrew, Hungarian, Korean

Page 27: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Perform spell-checking and refine synonym lists. Discover related concepts using Concept Linking. Perform full text search. Subset documents and/or terms for further analysis.

Page 28: capturing the value of unstructured data: introduction to text mining

TEXT FILTER NODE

• Spell checking
• Concept Linking
• Full text search
• Define additional synonyms
• Sub-setting management of terms and documents that are passed to subsequent nodes
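
Two of these filtering ideas, synonym mapping and term sub-setting, fit in a short Python sketch. It is not how the Text Filter node is implemented: the function `filter_terms`, the hand-made synonym list, and the rule "keep terms found in at least `min_docs` documents" are assumptions chosen to keep the example small.

```python
from collections import Counter

def filter_terms(parsed_docs, synonyms, min_docs=2):
    """Map synonyms to a parent term, then keep terms found in >= min_docs documents."""
    mapped = [[synonyms.get(t, t) for t in doc] for doc in parsed_docs]
    # Document frequency: count each term once per document.
    doc_freq = Counter(term for doc in mapped for term in set(doc))
    kept = {term for term, n in doc_freq.items() if n >= min_docs}
    return [[t for t in doc if t in kept] for doc in mapped], sorted(kept)

parsed_docs = [
    ["brake", "squeal", "dealer"],
    ["break", "noise", "dealer"],   # "break" is a misspelling of "brake"
    ["battery", "dealer"],
]
synonyms = {"break": "brake", "squeal": "noise"}  # hypothetical, hand-made list
filtered, kept = filter_terms(parsed_docs, synonyms)
```

Treating the misspelling as a synonym of the correct term lets both documents count toward "brake", while the one-off term "battery" is dropped before later steps.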

Page 29: capturing the value of unstructured data: introduction to text mining

FILTER VIEWER

Page 30: capturing the value of unstructured data: introduction to text mining

SAS Text Mining

Page 31: capturing the value of unstructured data: introduction to text mining

CONCEPT LINKING

Page 32: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Analyze the documents to create topics and assign each document to one or more topics. In addition to derived topics, users can add their own topic definitions.

Page 33: capturing the value of unstructured data: introduction to text mining

TEXT TOPIC NODE

• Multiple topics per document
• Soft clustering using rotated SVD (PROC SVD followed by PROC FACTOR)
• Allows automatic creation of single and multi-word topics
• User-defined topics and editing of automatic topics
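
The idea behind a rotated SVD can be sketched with NumPy. This is a hedged illustration, not the node's implementation: the slide names PROC SVD and PROC FACTOR but does not say which rotation is used, so the varimax rotation, the toy term counts, the choice of `k = 2`, and the membership cutoff below are all assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (terms x topics) loading matrix with the varimax criterion."""
    n_terms, n_topics = loadings.shape
    rotation = np.eye(n_topics)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n_terms
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                      # nearest orthogonal matrix
        if s.sum() < objective * (1 + tol):    # stop when the criterion stalls
            break
        objective = s.sum()
    return loadings @ rotation, rotation

# Rows are documents, columns are term counts (toy data, invented for this sketch).
terms = ["brake", "noise", "battery", "charge", "dealer"]
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [0, 0, 2, 1, 1],
    [0, 0, 1, 2, 0],
    [1, 1, 1, 1, 2],
], dtype=float)

k = 2
_, s, vt = np.linalg.svd(X, full_matrices=False)
loadings = vt[:k].T * s[:k]            # term loadings on the first k dimensions
rotated, rotation = varimax(loadings)
doc_scores = X @ rotated               # one score per document per topic
# Soft assignment: a document joins every topic whose score passes a cutoff.
cutoff = 0.5 * np.abs(doc_scores).max()
membership = np.abs(doc_scores) >= cutoff
```

Because membership is decided topic by topic, a single document can end up in more than one topic, which is the "multiple topics per document" behaviour listed above.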

Page 34: capturing the value of unstructured data: introduction to text mining

INTERACTIVE TOPIC VIEWER

Page 35: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Analyze the documents to create clusters and assign each document to a single cluster.
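
Unlike topics, clusters give each document exactly one label. The sketch below uses plain k-means in Python as a stand-in, since the slides do not say which clustering algorithm is used; the function `kmeans`, the two-dimensional document vectors, and the hand-picked starting centroids are assumptions for illustration.

```python
def kmeans(points, initial_centroids, max_iter=20):
    """Plain k-means: every point ends up with exactly one cluster label."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate input
    labels = []
    for _ in range(max_iter):
        # Assign each point to the centroid with the smallest squared distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical document vectors, e.g. two SVD dimensions per document.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = kmeans(points, [points[0], points[2]])
```

Each entry of `labels` is a single cluster index, in contrast to the topic approach where a document may belong to several topics at once.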

Page 36: capturing the value of unstructured data: introduction to text mining

CLUSTER VIEWER

Page 37: capturing the value of unstructured data: introduction to text mining

CLUSTER VIEWER

Page 38: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS

Clusters can be further explored using the Segment Profile node to identify factors that differentiate data segments from the population.

Page 39: capturing the value of unstructured data: introduction to text mining

SEGMENT PROFILE

• The Segment Profile node is available on the Assess tab of Enterprise Miner.

• It allows the examination of segmented or clustered data to identify factors that differentiate data segments from the population.
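
One simple way to see what "differentiates a segment from the population" means is to compare each variable's segment mean with its overall mean. The Python sketch below does only that; the statistics the Segment Profile node actually reports are not described in these slides, and the function `profile_segments`, the column names, and the sample rows are assumptions.

```python
def profile_segments(rows, segment_key, variables):
    """For each segment, report each variable's mean minus the population mean."""
    population = {v: sum(r[v] for r in rows) / len(rows) for v in variables}
    segments = {}
    for row in rows:
        segments.setdefault(row[segment_key], []).append(row)
    profile = {}
    for name, members in segments.items():
        profile[name] = {
            v: sum(r[v] for r in members) / len(members) - population[v]
            for v in variables
        }
    return profile

# Hypothetical clustered complaint records with two structured variables.
rows = [
    {"cluster": "A", "vehicle_age": 2, "accident": 0},
    {"cluster": "A", "vehicle_age": 4, "accident": 0},
    {"cluster": "B", "vehicle_age": 10, "accident": 1},
    {"cluster": "B", "vehicle_age": 8, "accident": 1},
]
profile = profile_segments(rows, "cluster", ["vehicle_age", "accident"])
```

Variables with the largest positive or negative differences are the ones that set a segment apart; here cluster B skews toward older vehicles and more accidents than the population as a whole.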

Page 40: capturing the value of unstructured data: introduction to text mining

SEGMENT PROFILE

Page 41: capturing the value of unstructured data: introduction to text mining

EXAMPLE TEXT MINING PROCESS FLOWS: PREDICTION

Several methods are available to use the unstructured data to create predictions.

Page 44: capturing the value of unstructured data: introduction to text mining

LEARNING MORE

Page 45: capturing the value of unstructured data: introduction to text mining

SAS® TEXT MINER RESOURCES

SAS Text Miner Product Web Site: http://www.sas.com/text-analytics/text-miner/index.html

SAS Text Miner Technical Support Web Site: http://support.sas.com/software/products/txtminer/index.html

SAS Text Miner Technical Forum (Join Today!): https://communities.sas.com/community/support-communities/sas_data_mining_and_text_mining

SAS Training
• Data Miner Training Path: http://support.sas.com/training/us/paths/dm.html
• Courses for SAS® Text Miner: https://support.sas.com/edu/prodcourses.html?code=TM&ctry=US

Page 46: capturing the value of unstructured data: introduction to text mining

http://support.sas.com/documentation/onlinedoc/txtminer/index.html

Step-by-step how-to guide

Page 47: capturing the value of unstructured data: introduction to text mining

Data for the step-by-step how-to guide

Page 48: capturing the value of unstructured data: introduction to text mining

DISCUSSION FORUMS

http://communities.sas.com

Page 49: capturing the value of unstructured data: introduction to text mining

DISCUSSION FORUMS

https://communities.sas.com/community/support-communities/text-analytics

Page 50: capturing the value of unstructured data: introduction to text mining

COMPLIMENTARY ON-DEMAND WORKSHOPS

http://www.sas.com/reg/offer/corp/handson

Page 51: capturing the value of unstructured data: introduction to text mining

www.SAS.com

THANK YOU FOR USING SAS!