Summit slide loop ny

Introduction to Text Analytics October 2, 2013 Dr. Stuart Shulman Phone No.: +1-413-345-8939 E-mail: [email protected]

Transcript of Summit slide loop ny

Page 1: Summit slide loop ny

Introduction to Text Analytics

October 2, 2013

Dr. Stuart Shulman

Phone No.: +1-413-345-8939

E-mail: [email protected]

Page 2: Summit slide loop ny

The Value Proposition

Our solution helps users easily discover information to:

• streamline business processes

• increase ROI & create new business opportunities

• identify positive and negative trends

• discover unique, rare or unexpected information

Page 3: Summit slide loop ny

How Do These Tools Help Analysts?

Page 4: Summit slide loop ny

What This Means for Analysis

Page 5: Summit slide loop ny

The Core Methods

Coding and Classifying Text Data

Page 6: Summit slide loop ny

Iteration and Re-Use Are Critical Techniques

Page 7: Summit slide loop ny

Measure Everything Starting With Human Agreement

Page 8: Summit slide loop ny

The Core DiscoverText Approach

Page 9: Summit slide loop ny

An Indispensable Role for Humans

Page 10: Summit slide loop ny

Innovation Happens in Groups

Page 11: Summit slide loop ny

“CoderRank” – A Lifetime Accuracy Measurement

Vision Critical Patent Pending – “Enhanced Machine Learning”

Page 12: Summit slide loop ny

Five Essential Tools for Text Analytics

1. Search

2. Filtering on Metadata

3. Human Coding

4. Automated Clustering

5. Machine Classification

Page 13: Summit slide loop ny

A Social Media Use Case

Sifting and Sorting Relevant Data

Page 14: Summit slide loop ny

Great Researchers Demand Transparent Tools

Page 15: Summit slide loop ny

The HMC is a Leading-Edge Gnip Customer

Page 16: Summit slide loop ny

Gnip Data Streams and Search Filters

Page 17: Summit slide loop ny

Fair Warning

This part of the presentation contains strong and potentially quite offensive, inappropriate, disturbing, or just completely stupid language.

Page 18: Summit slide loop ny

Studying Media Campaign Effects

Page 19: Summit slide loop ny

Create Custom Machine Classifiers

Yes

No

No

Page 20: Summit slide loop ny

Search is Fundamental for Purposive Sampling

Page 21: Summit slide loop ny

Defined Search Speeds Up Discovery

Page 22: Summit slide loop ny

Tumblr. – “The Wild West of the Internet”

Page 23: Summit slide loop ny

Stupid Stuff People Do & Tweet

redacted

redacted

Page 24: Summit slide loop ny

Are These Tweets Just Social Garbage?

redacted

redacted

Page 25: Summit slide loop ny

Signs of Health Fear Engagement

redacted

redacted

Page 26: Summit slide loop ny

An IdeaScreen Use Case

Concept Testing Data

Page 27: Summit slide loop ny

Raw VoC Data: A Fortune 500 Tech Company

Page 28: Summit slide loop ny

Near Duplicate Clusters Can Be Interesting
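
One common way to surface near-duplicate clusters in free text is to compare overlapping word sequences (shingles) with Jaccard similarity. The sketch below is an illustration under assumptions, not the DiscoverText clustering method, which the slides do not describe; the function names, the shingle size `k`, and the `threshold` value are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(texts, threshold=0.5, k=3):
    """Greedy clustering: each text joins the first cluster whose
    representative (its first member) is similar enough, else it
    starts a new cluster. Returns lists of indices into `texts`."""
    clusters = []  # list of (representative shingle set, member indices)
    for idx, text in enumerate(texts):
        sh = shingles(text, k)
        for rep, members in clusters:
            if jaccard(sh, rep) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((sh, [idx]))
    return [members for _, members in clusters]


# Hypothetical example data, not taken from the presentation.
sample = [
    "please stop the new fee now",
    "please stop the new fee today",
    "I love the product",
]
groups = cluster_near_duplicates(sample)
```

With these assumed settings the first two texts share three of five distinct shingles and land in one cluster, while the third stands alone. Greedy assignment is order-dependent; it is chosen here only for brevity.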

Page 29: Summit slide loop ny

Two Naturally Occurring Clusters of Free Text

Page 30: Summit slide loop ny

Wherever Humans Go in Numbers, There Are Clusters

Page 31: Summit slide loop ny

1st Wave of Human Coding Blazes a Trail

Page 32: Summit slide loop ny

A ‘Simple’ Coding Scheme with No Coder Training

Page 33: Summit slide loop ny

Filtering Based on Classifier Scores
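
Filtering on classifier scores usually means keeping only the items whose score for a given code meets a chosen cutoff. The following is a minimal sketch of that idea, assuming each item is a dictionary with a `scores` mapping from code name to a value between 0 and 1; this data layout, the function name, and the cutoff are hypothetical and are not the DiscoverText API.

```python
def filter_by_score(items, label, min_score):
    """Keep items whose classifier score for `label` is >= min_score.
    Items with no score for `label` are treated as scoring 0.0."""
    return [
        item for item in items
        if item["scores"].get(label, 0.0) >= min_score
    ]


# Hypothetical scored items, not taken from the presentation.
scored_items = [
    {"text": "item one", "scores": {"relevant": 0.92}},
    {"text": "item two", "scores": {"relevant": 0.40}},
    {"text": "item three", "scores": {}},
]
likely_relevant = filter_by_score(scored_items, "relevant", 0.8)
```

Raising the cutoff trades recall for precision, so the value is typically tuned against a human-coded sample.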

Page 34: Summit slide loop ny

Testing Coder Agreement on a Small Sample

Page 35: Summit slide loop ny

Measuring Inter-Coder Agreement
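
A standard statistic for agreement between two coders is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The slides do not say which statistic DiscoverText reports, so the sketch below is an assumption offered for illustration; the function name and the example labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items.
    kappa = (observed - expected) / (1 - expected)"""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of items on which both coders chose the same code.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each coder's marginal code frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical code for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two coders on four items.
coder_a = ["yes", "yes", "no", "no"]
coder_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(coder_a, coder_b)  # observed 0.75, expected 0.5
```

Here the coders agree on three of four items (0.75), chance agreement is 0.5, and kappa is 0.5. Kappa handles exactly two coders; larger coding teams would need a different statistic such as Fleiss' kappa or Krippendorff's alpha.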

Page 36: Summit slide loop ny

Validation of Coders & Codes

Page 37: Summit slide loop ny

Text Analytics is a Series of Buckets & Datasets

Page 38: Summit slide loop ny

Breaking Down Concerns by Subtype

Page 39: Summit slide loop ny

Breaking Down Advocacy by Pro and Con

Page 40: Summit slide loop ny

A New Vision Critical Front End

The First Preview of the New Release

Page 41: Summit slide loop ny

The New VC Front End for DiscoverText

Page 42: Summit slide loop ny

Coding Items to Train a Classifier

Page 43: Summit slide loop ny

Leverage Item Metadata While Coding or Filtering

Page 44: Summit slide loop ny

Code Items in a List View