Deep Machine Reading for Customer Analytics
Transcript of Deep Machine Reading for Customer Analytics
![Page 1: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/1.jpg)
Deep Machine Reading for Customer Analytics
Naveen Ashish
Hutch Data Commonwealth, Fred Hutchinson Cancer Research Center
August 11th 2016
Seattle Natural Language Processing Meetup
![Page 2: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/2.jpg)
Deep Understanding of Text
DeepThought: A text understanding engine focused on feedback understanding and analytics
Smart Health Informatics Platform (SHIP): Mining insights from patient conversations
An Automated Abstract Reader (AAR): Determining “directionality” from peer-reviewed literature research abstracts
NLP MACHINE-LEARNING SEMANTICS
![Page 3: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/3.jpg)
Hutch Data Commonwealth
An analytics platform
Hutch Data Commonwealth will be a hub for data science, expanding Hutch's research scope with data resources and tools that will generate new ideas and opportunities.
https://www.fredhutch.org/en/labs/hutch-data-commonwealth.html
![Page 4: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/4.jpg)
Context…
Health Informatics
DeepThought AAR SHIP whatpatientsknow.com
2011-2013 2013 – 2016 2013-2015
http://www.dropthought.com http://www.praedicat.com
![Page 5: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/5.jpg)
Work with …
Senthil Ravichandran Ajith Ravi
Karan Chaudhry
Lauren Caston
![Page 6: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/6.jpg)
DeepThought
DropThought Comment Data
Survey Comment Data Social Media Reviews/ Feeds
Structured Actionable Insights for Institutions(with highly intuitive visualization and actionable insights)
Native Apps
Web Apps
Integrated Solutions
DeepThought: Analyzing Feedback Data
![Page 7: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/7.jpg)
Feedback Examples
Multiple, diverse domains Collected at “point-of-experience”
The food is great, cool setting by the lake, but the service can be a LOT better !!
We have to get through way too many levels for approval for anything, simplify !
I think the XYZs are really beginning to get it on the college crisis, with the student loan proposal in the manifesto
The hotel staff is universally terrific, assisted us with early check-in and other needs. Very prompt service with a smile
Professor ABC did a great job ! His lectures are so well prepared. I’d suggest though we incorporate more deep learning material in the course.
![Page 8: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/8.jpg)
Health Domain: Patient Experiences
![Page 9: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/9.jpg)
Health: WhatPatientsKnow.com
![Page 10: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/10.jpg)
Insurance Risk: Automated Abstract Reader
Software solution for insurance risk assessment
Focus on (risk due to) industrial chemicals
Comprehensive risk score
Signal from peer-reviewed literature is one major factor
![Page 11: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/11.jpg)
Unstructured Data
  Feedback data from devices or systems of record, surveys, market research, social media etc.
  Health discussion boards
  PubMed research abstracts
Deep Text Analysis
  DeepThought
  SHIP
  AAR
Structured Insights for Business
Deep Text Analysis
![Page 12: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/12.jpg)
Manually Done → Automate !
All applications require a structured representation of the (unstructured) data
A structured database/meta-base that powers
  Analytics dashboards
  Data coding processes
  Risk assessment computations
  Consumer health portals
  ….
Manual extraction processes are typically in place
Goal is to eliminate or alleviate manual effort
![Page 13: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/13.jpg)
Deep Text Analysis
Base Text Analytics
  Features: Extraction of keywords; polar sentiment words
  Value: Limited actionability; low accuracy; not effective in dealing with sparse data sets
  Examples: Alchemy API, Clustify
Advanced Text Analytics
  Features: Richer space of extracted concepts; document analysis features
  Value: Variable actionability; medium accuracy; not effective in dealing with sparse data sets
  Examples: Clarabridge, Lexalytics
Deep Text Analytics
  Features: Abstract expressions, concepts and relationships; deep contextual understanding; domain(s) specific applicability
  Value: High actionability; high accuracy; effective in dealing with sparse data sets
  Examples: Stanford Deep Learner; DeepThought, AAR and SHIP
![Page 14: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/14.jpg)
Topics
Architecture Functionality
Expressions Sentiment Directionality
Scale Training Learning
Some Results
![Page 15: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/15.jpg)
Architecture
Unstructured Data
Text Analysis
  Ngram Analysis
  Part-of-speech (POS) Tagging
  Entity Extraction (Ontology driven)
  Entity Extraction (Unsupervised)
  TFIDF
Semantic & Language Analysis
  Semantic Resolution
  Customized Topic Model
  Natural Language Parsing
  Semantic Tagging
ONTOLOGIES
Machine Learning Classification
  Best of Breed Sentiment & Category Classifiers (Dictionary based, Feature Driven, Unsupervised Feature Driven / Neural Nets)
  Sentiment Ensemble
  Category Ensemble
Active Learning (Eliza)
Structured Feedback
![Page 16: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/16.jpg)
Ngram Analysis
Part-of-speech (POS) Tagging
Entity Extraction(Ontology driven)
Unsupervised Entity Extraction
Built in named entity extraction
POS tag based
TFIDF
Unstructured Data
Discriminative Patterns
Data driven analysis helps identify key
terminology
Significant leverage of “open” & proprietary knowledge resources
Key Indicator Phrases
POS tagged text
Entities
Specialized Entities Normalized Entities
Overview of Text Analysis Architecture
ONTOLOGIES
Wikipedia User Defined
Other Open Source
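The TFIDF step in the text analysis layer can be illustrated with a minimal sketch. This is not the DeepThought implementation; the whitespace tokenizer, the unsmoothed idf formula and the sample comments are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical feedback comments.
comments = [
    "the service is slow",
    "the coffee is great",
    "the checkout is slow",
]
scores = tfidf(comments)
```

Terms that occur in every comment ("the", "is") score zero, while rarer terms such as "service" score highest, which is the sense in which data driven analysis surfaces key terminology.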
![Page 17: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/17.jpg)
ONTOLOGIES
Semantic Resolution
Topic Model
Natural Language Parsing
Semantic & Language Analysis Architecture
Semantic Tagging
Cohesive Topic Mining
Parsing as required for relationship establishment
Most domains benefit from domain knowledge
Specialized Entities Normalized Entities
Semantic cohesion layer over customized topic model
Entities Concepts Topics Categories Roles Relationships
WordNet Freebase Synonym Resolver
LDA Pachinko SVD NMF
Wikipedia
Other Open Source
User Defined
![Page 18: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/18.jpg)
Eliza
Active Learning
Ensemble
Ensemble
Overview of Classification Architecture
SENTIMENT CLASSIFIERS
CATEGORY CLASSIFIERS
Classifier ensembles for “best of breed”
Learn from feedback
Classified and Structured Data
Entities Concepts Topics Categories Roles Relationships
SENTIMENT/ CATEGORY
CLASSIFIERS
Best of breed Sentiment Classifiers
Knowledge driven and Deep Learning
Best of breed Category Classifier
Dictionary based, Feature Driven and Unsupervised Feature Driven
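The slides do not say how the ensemble combines its member classifiers. One common choice is a majority vote, sketched below as an assumption. The three member classifiers are hypothetical stand-ins for the dictionary based and feature driven classifiers named above.

```python
from collections import Counter

def dictionary_classifier(text):
    # Hypothetical dictionary-based classifier: looks for polar words.
    negative_words = {"slow", "bad", "rude"}
    return "NEGATIVE" if negative_words & set(text.lower().split()) else "POSITIVE"

def excess_classifier(text):
    # Hypothetical feature-driven classifier: "too <something>" signals a complaint.
    return "NEGATIVE" if " too " in f" {text.lower()} " else "POSITIVE"

def length_classifier(text):
    # Deliberately weak hypothetical classifier, to show the vote absorbing noise.
    return "NEGATIVE" if len(text.split()) > 8 else "POSITIVE"

def ensemble_predict(text, classifiers):
    """Return (label, agreement), where agreement is the winning vote share."""
    votes = Counter(clf(text) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)

classifiers = [dictionary_classifier, excess_classifier, length_classifier]
label, agreement = ensemble_predict("Service is too slow", classifiers)
```

Here two of the three members vote NEGATIVE, so the ensemble returns NEGATIVE with an agreement of 2/3. The vote share can double as a rough confidence signal.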
![Page 19: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/19.jpg)
Deep Machine Reading is ….
The ability to distill the abstract from text
The ability to comprehensively extract multiple concepts and relationships from the text
The ability to link extracted elements to known concepts
The ability to use the text (data) itself, to improve understanding of that text
![Page 20: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/20.jpg)
The Abstract, in Text
The abstract, not explicitly mentioned !
What falls in this category
  Expressions
  Contextual sentiment
  Aspects or Categories
I think you need better chefs SUGGESTION
The mocha is too sweet NEGATIVE
I used to take Lipitor for … PERSONAL EXPERIENCE
The dim lights have a cozy effect …. AMBIENCE
![Page 21: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/21.jpg)
Classification, rather than Extrication
Much of the technology, up to recently, is extrication focused
  Extricate particular terms, elements, concepts from the text
Extrication
  Named-Entity extraction
    PERSONS, ORGANIZATIONS, LOCATIONS, …
  Sentiment extraction
    Based on polar words
Need for much more sophisticated classification of text snippets
  Along different dimensions of interest
![Page 22: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/22.jpg)
Deeper Text Analysis → Better Insights
Goal: Get actionable insights from data !
Hypothesis: Deeper extraction → Better insights !
The top advice items advised for skin rash are aloe vera, vitamin E oil and oatmeal
Complaints comprise 36% of the overall feedback with top issues being slow service, drinks and coffee
73% of all research articles indicate that Cadmium is a causal factor for asthma
![Page 23: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/23.jpg)
Expressions
Beyond entities and sentiment: EXPRESSIONS
Ashish et al (2012). The Smart Health Informatics Platform: From Patient Conversations to Big Data to Insights. 2nd International Advanced Health Informatics Conference, April 2012, Toronto
![Page 24: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/24.jpg)
Expressions
…showers had no hot water !… COMPLAINT
..you should have more veggie options… SUGGESTION
RETAIL/ENTERPRISE
..meats on special this weekend… ANNOUNCEMENT
..this is the best store on the west side… ADVOCACY
There is hardly any evidence to suggest a link between salt and diabetes -
These results confirm that high intake of salt leads to increase in BP +
RISK ASSESSMENT
![Page 25: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/25.jpg)
Expressions
You should try Vitamin E oil … ADVICE
..I have had arthritis since 1991… EXPERIENCE
HEALTH
..for me lipitor worked like a charm… OUTCOME
![Page 26: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/26.jpg)
The Indicators: “Give Aways”
A combination of multiple types of elements !
…showers had no hot water !… COMPLAINT
(You) should have more veggie options… SUGGESTION
..i have been on lipitor… EXPERIENCE
..this is the best store on the west side… ADVOCACY
![Page 27: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/27.jpg)
Approach: Given Indicators
NLP
  Identification of individual elements (unsupervised)
  Relationships between elements
Semantics
  Identification of individual elements (knowledge driven)
Machine Learning Classification
  Combine elements, classify
![Page 28: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/28.jpg)
Expression Classification: Relevant Features
Curated lexicons of specific indicative phrases
  Examples: “could you”, “I took”, ….
  Approach
    Manual creation of “seed” lexicons
    Automated expansion from data plus resources such as WordNet
The sentiment
  For instance, a Complaint would almost always have negative sentiment
Punctuation, other expressions or emoticons
![Page 29: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/29.jpg)
Expression Classification Features
Positional information of words, phrases, or part-of-speech patterns in the sentence
  Suggestions will usually begin with certain ‘request’ words
Custom patterns
  Such as subject-verb-object for PERSONAL EXPERIENCE
Ontology concepts
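A few of the feature types listed for expression classification (indicative phrase lexicons, positional 'request' words, punctuation, and sentiment) can be sketched as a feature extractor. The lexicon contents beyond “could you” and “I took”, the request-word list and the feature names are illustrative assumptions, and the sentiment value is assumed to come from an upstream classifier.

```python
import re

# Hypothetical seed lexicons; the slides name "could you" and "I took" as examples.
INDICATOR_PHRASES = {
    "SUGGESTION": ["could you", "you should", "i'd suggest"],
    "EXPERIENCE": ["i took", "i have had", "i used to"],
}
# Hypothetical list of 'request' words.
REQUEST_WORDS = {"could", "please", "can", "would"}

def expression_features(sentence, sentiment=None):
    """Build a flat feature dict for one sentence."""
    text = sentence.lower()
    tokens = re.findall(r"[a-z']+", text)
    features = {}
    # Lexicon features: does any indicative phrase for a class occur?
    for label, phrases in INDICATOR_PHRASES.items():
        features[f"lex_{label.lower()}"] = any(p in text for p in phrases)
    # Positional feature: does the sentence begin with a 'request' word?
    features["starts_with_request"] = bool(tokens) and tokens[0] in REQUEST_WORDS
    # Punctuation features.
    features["exclamations"] = sentence.count("!")
    features["has_question_mark"] = "?" in sentence
    # Sentiment is assumed to be supplied by an upstream classifier.
    features["sentiment"] = sentiment
    return features

feats = expression_features("Could you add more veggie options?", sentiment="NEUTRAL")
```

A dict like `feats` would then be vectorized and passed to whichever classifier is selected. Part-of-speech patterns and ontology concepts are omitted here because they need a tagger and a knowledge source.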
![Page 30: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/30.jpg)
Expression Classification: Results
Have achieved 75% precision and recall for all expressions considered
Factors
  Feature engineering
  Classifier selection
  Knowledge engineering
![Page 31: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/31.jpg)
Before Automated Classification: Patterns Developed Manually
SoL: Sequences of Labels
Labels
  LEX-FOODADJ: spicy
  LEX-EXCESS: too, very
  ONT-FOOD
  POS-NOUN
Sequences (Patterns)
  ANY LEX-EXCESS LEX-FOODADJ ANY → Negative
  POS-VB POS-MD * → Suggestion
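A manually written sequence-of-labels pattern can be matched with a small recursive matcher, sketched here for the first pattern on this slide. The matching semantics (ANY consumes zero or more tokens) and the extra lexicon entries "sweet" and "salty" are assumptions. The POS-based pattern is left out because it needs a part-of-speech tagger.

```python
# Hypothetical label lexicons; the slides give "spicy" and "too, very" as examples.
LEXICONS = {
    "LEX-FOODADJ": {"spicy", "sweet", "salty"},
    "LEX-EXCESS": {"too", "very"},
}

# (pattern, signal) pairs; ANY is assumed to match zero or more tokens.
PATTERNS = [
    (["ANY", "LEX-EXCESS", "LEX-FOODADJ", "ANY"], "Negative"),
]

def label_tokens(sentence):
    """Map each token to the set of labels it carries."""
    tokens = sentence.lower().replace("!", " ").replace(".", " ").split()
    return [{name for name, words in LEXICONS.items() if tok in words}
            for tok in tokens]

def matches(pattern, labels):
    """Recursively match a label pattern against per-token label sets."""
    if not pattern:
        return not labels
    head, rest = pattern[0], pattern[1:]
    if head == "ANY":
        # Try letting the wildcard consume 0, 1, 2, ... tokens.
        return any(matches(rest, labels[i:]) for i in range(len(labels) + 1))
    return bool(labels) and head in labels[0] and matches(rest, labels[1:])

def classify(sentence):
    """Return the signal of the first matching pattern, or None."""
    labels = label_tokens(sentence)
    for pattern, signal in PATTERNS:
        if matches(pattern, labels):
            return signal
    return None

result = classify("The mocha is too sweet")
```

The sentence is labeled Negative without containing any polar word, because "too" followed by a food adjective matches the pattern.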
![Page 32: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/32.jpg)
Sequence Tagging Classifiers
Conditional Random Field (CRF)
  “Semi-Markovian” models
  Take the “neighboring classification” into account
  Use for label sequencing
Applicability to feedback classification
  Use in multi-step classification
    Classify (intermediate) labels using CRF
    Use labels as features for text signal classification (such as sentiment or other)
  In some cases the text signal (say sentiment) is identified by particular terms in a particular sequence in the text
    CRF to classify these “labels” accurately
    From labels to signal
![Page 33: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/33.jpg)
Baseline Classifiers for Expressions
Mallet and Weka
  NaiveBayes
  MaxEnt
  CRF
N-gram based
  Uni, bi and trigram features
Baseline ~ 10% accuracy
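A baseline of this kind can be sketched as a multinomial Naive Bayes classifier over unigram, bigram and trigram features. The talk used Mallet and Weka; the code below is a stand-alone pure-Python illustration, not either toolkit, and the toy training set is an assumption built from example sentences in these slides.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, max_n=3):
    """Unigram, bigram and trigram features of a sentence."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # class prior
            for feature in ngrams(text):
                if feature in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training set (an assumption, far too small for real use).
train = [
    ("showers had no hot water", "COMPLAINT"),
    ("the service is slow", "COMPLAINT"),
    ("you should have more veggie options", "SUGGESTION"),
    ("i think you need better chefs", "SUGGESTION"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
prediction = model.predict("you should have more staff")
```

A model like this sees only surface n-grams, which is one reason such baselines fall short of the feature-engineered classifiers described earlier.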
![Page 34: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/34.jpg)
Contextual Sentiment
(Just) polar words can be misleading !
Polar words may not be present at all !
Combination of elements
CRF Classifier
The mocha is too sweet
Wait time is over an hour
Aisles are too narrow
Service is slow
![Page 35: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/35.jpg)
Qualified Sentiment
Classify negative comments
Further segregate into
  Immediately actionable items
  ‘Long term’ issues
Approach
  Curation of Ngrams for each type of negative comment
  Classifier
![Page 36: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/36.jpg)
Topic Mining
Motivated by feedback survey analytics
  People can talk about “anything”
  Interested in broad ‘topics’ of discussion
  But the set of topics is dynamic, not necessarily known
Unsupervised topic mining
  LDA: Latent Dirichlet Allocation
  As-is led to very fragmented topics that were semantically not meaningful
Solution: consolidation of terms using WordNet
  Expand terms using WordNet synonyms
  Consolidate with manual curation after
Semi-automated approach
![Page 37: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/37.jpg)
Cohesive Topic Mining
Problem with WordNet (synonym) expansion
  Prone to semantic divergence
  Example: Presentation, Project(or), Milestones
(Almost) strongly connected components in relationship graph
Manual review after
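One way to read the connected-components idea is to merge two terms only when the synonym relationship holds in both directions, so that one-way chains cannot drag unrelated terms into a topic. The sketch below is a simplification of "(almost) strongly connected components", not the method used in DeepThought, and the expansion dictionary is hypothetical rather than real WordNet output.

```python
def cohesive_groups(synonyms):
    """Group terms linked by mutual synonym edges, using union-find."""
    parent = {term: term for term in synonyms}
    for targets in synonyms.values():
        for t in targets:
            parent.setdefault(t, t)

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for term, targets in synonyms.items():
        for other in targets:
            # Merge only when the relationship holds in both directions.
            if term in synonyms.get(other, ()):
                parent[find(term)] = find(other)

    groups = {}
    for term in parent:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical expansion output: term -> terms suggested as its synonyms.
expansion = {
    "presentation": ["talk", "slides", "projector"],
    "talk": ["presentation"],
    "slides": ["presentation"],
    "projector": ["project"],
    "project": ["milestones"],
    "milestones": ["project"],
}
groups = cohesive_groups(expansion)
```

The one-way chain presentation, projector, project no longer pulls "milestones" into the presentation topic. The resulting groups would still go to manual review.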
![Page 38: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/38.jpg)
Semantics / Knowledge Engineering
![Page 39: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/39.jpg)
Semantics
Domain knowledge is not ‘nice-to-have’ but critical
HEALTH
• Condition names
• Drug names
• Symptoms
• Procedures
• ..
RETAIL
• Food items
• Other products
• Competitors
• …
RESEARCH
• Chemical substances
• Harmful conditions
• MeSH terms
• …
![Page 40: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/40.jpg)
Leverage Existing Knowledge Sources
Health informatics
  UMLS http://www.nlm.nih.gov/research/umls/
  NCI Thesaurus http://ncit.nci.nih.gov/
  SNOMED http://www.nlm.nih.gov/snomed
Retail
  DMOZ http://www.dmoz.org
Many other
  Freebase http://www.freebase.com
  Wikipedia, DBPedia
  OpenData data.gov
![Page 41: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/41.jpg)
Knowledge Engineering Tools
Getting available ontologies into usable formats
  Available as database dumps, RDF, or Web data
“Mini” ontology creation
  Curate manually when possible (small dictionaries)
  Example: list of competitors
API access
  Freebase https://www.freebase.com/query
    Query using ‘MQL’ (Metaweb Query Language, SPARQL-like)
  BioPortal http://data.bioontology.org/documentation
Provided sometimes by customer !
  Leveraging MeSH for abstract reader
![Page 42: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/42.jpg)
Practical Requirements
Confidence Measures
  Quantitative confidence score for extracted elements
  Binary confidence Y/N
    Not confident: routed for manual review
‘Explanation’ for classification
  Relevant snippets
    “….and the checkout times continue to be long despite …”
    Complaint
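The confidence requirement can be sketched as a small routing step: a quantitative score is collapsed to Y/N, low-confidence items go to manual review, and the supporting snippet travels along as the explanation. The `Extraction` type, the field names and the 0.7 threshold are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    label: str
    confidence: float  # quantitative score, assumed to lie in [0, 1]
    snippet: str       # text span offered as the 'explanation'

def route(extraction, threshold=0.7):
    """Collapse the score to a binary Y/N and pick a destination."""
    confident = extraction.confidence >= threshold
    return {
        "label": extraction.label,
        "confident": "Y" if confident else "N",
        "destination": "auto" if confident else "manual_review",
        "explanation": extraction.snippet,
    }

item = Extraction("Complaint", 0.55, "the checkout times continue to be long")
decision = route(item)
```

With a score of 0.55 against the assumed threshold, this item is marked "N" and routed to manual review.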
![Page 43: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/43.jpg)
Machine-Learning Classification
Core Classifiers
Decision Tree Family (J48, Random Forest, …)
  Description: Many discrete features (linguistic)
  Reason for incorporation: Can handle “raw” features; learns from both discrete and numerical features
Regression (Logistic, SVM)
  Description: Numerical features also prevalent
  Reason for incorporation: Text classification problems are typically high-dimensional, which regression can handle well
Hybrid (SMO)
  Reason for incorporation: Combine advantages of both decision tree (forest) and regression driven classifiers
![Page 44: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/44.jpg)
Machine-Learning Classification
Sequence Taggers
  Label Sequences (Conditional Random Field, CRF)
    Description: Sequences of labels, especially in feedback text
    Reason for incorporation: Markovian modeling where sequencing information in text is critical; some classification tasks require recognizing sequences of labels in the text
Neural Networks
  Multi-layer Perceptrons
    Can work with “lower level” or less sophisticated features
Deep Learning
  Deep Learner
    Description: Vectorized features
    Reason for incorporation: Unsupervised feature learning
![Page 45: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/45.jpg)
SHIP: Structuring Health Conversations
Health domain expressions
Strong need for normalization
  Conditions, Medications, Symptoms, ….
  Done with biomedical ontologies
Sentiment analysis can be challenging
  The word ‘cancer’ for instance
  Specific: “fever going up”, “cholesterol gone down”
Modeling conversation threads
![Page 46: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/46.jpg)
Abstract Reader Challenges
The notion of tell sentences and their identification
  Classification approach
  Did involve custom phrase lexicons amongst other features
  Had to analyze clauses in sentences
  Negation
Semantic resolution of terms and abbreviations
  Curated synonym lists
  Determine from abstract text itself
Anchoring to MeSH
![Page 47: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/47.jpg)
Scale: Training and Learning
![Page 48: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/48.jpg)
Optimizing Training: Active Learning
Active learning for training optimization
In DeepThought
![Page 49: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/49.jpg)
Active Learning
DropThought: active learning with Eliza
Sampling Strategies
  Cluster based
    Data is typically skewed in our domains
    Language (phrases) is common and repetitive
  Uncertainty based
    Select samples that are most uncertain to classify
  Model change
    Samples that most affect the model
  Error change
    Samples that induce the most error reduction
  Variance reduction
Diagram: Classifiers, Eliza, Classified Output, Iterative retraining
Reduction by algorithm
  Cluster Based: 71%
  Uncertainty Based: 66%
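Uncertainty based sampling can be sketched as ranking unlabeled samples by the entropy of the classifier's predicted class distribution and sending the top few for labeling. Entropy is one of several possible uncertainty measures and is an assumption here, as are the sample ids and probabilities.

```python
import math

def entropy(probabilities):
    """Shannon entropy of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

def select_for_labeling(predictions, k):
    """Return the ids of the k samples the classifier is least sure about."""
    ranked = sorted(predictions, key=lambda item: entropy(item[1]), reverse=True)
    return [sample_id for sample_id, _ in ranked[:k]]

# Hypothetical classifier outputs: (sample id, [P(negative), P(positive)]).
predictions = [
    ("c1", [0.98, 0.02]),
    ("c2", [0.55, 0.45]),
    ("c3", [0.80, 0.20]),
    ("c4", [0.50, 0.50]),
]
to_label = select_for_labeling(predictions, k=2)
```

The two near-even predictions, c4 and c2, are selected. Once labeled, they would be added to the training set and the classifiers retrained, repeating the loop.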
![Page 50: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/50.jpg)
Transfer Learning
Training challenges
  Training data is sparse at best, if not completely unavailable
  Expensive and time consuming to curate training data
Solutions
  Factor related data that can be used as training data
    For instance, data on identical or related topics from Wikipedia and other publicly available sources (such as a news corpus)
  Exploit publicly available opinion data (Amazon or Yelp reviews) towards sentiment training data
Results
  Benefits are category specific: it proves useful for certain classes but not universally
  Important to identify external datasets with similar language and terminology
![Page 51: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/51.jpg)
Some Results
![Page 52: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/52.jpg)
Overview of DropThought Classifier Performance across Data-sets
Performance = Correct Classifications / Total Classifications
![Page 53: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/53.jpg)
Some Evaluation Insights
SVM
  Trend 1: Non-linear kernels (RBF and others) give better results than linear kernels
    Expected, as the number of features is not extremely high
  Trend 2: Accuracy is generally better at smaller values of C
    Number of target classes is typically small
Trees (Random Forest, Extra Trees)
  Trend 1: Number of trees: accuracy increases with the number of trees, peaking at around 500 trees
  Trend 2: Number of features for split: accuracy is higher at lower values
    Optimal near log(number of features)
![Page 54: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/54.jpg)
Glassdoor- Sentiment
Extra Trees Random Forest
![Page 55: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/55.jpg)
Abstract Reader Results
![Page 56: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/56.jpg)
“Animal” performance: over 93% accurate
Actual
Predicted
![Page 57: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/57.jpg)
“Human” performance: over 97% accurate
Actual
Predicted
![Page 58: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/58.jpg)
False-positive versus False-negative
Corresponds to <5% FP, FN rates
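For reference, accuracy (correct classifications over total classifications) and the false-positive and false-negative rates can all be derived from paired actual and predicted labels. The label names and the ten sample predictions below are made up for illustration and are not the Abstract Reader results.

```python
def binary_rates(actual, predicted, positive):
    """Accuracy, false-positive rate and false-negative rate for one class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return {
        "accuracy": (tp + tn) / len(actual),
        # FP rate: share of actual negatives predicted positive.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        # FN rate: share of actual positives predicted negative.
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Hypothetical labels: "c" = causal link indicated, "n" = not indicated.
actual    = ["c", "c", "c", "c", "n", "n", "n", "n", "n", "n"]
predicted = ["c", "c", "c", "n", "n", "n", "n", "n", "n", "c"]
rates = binary_rates(actual, predicted, positive="c")
```

In this toy data one causal abstract is missed and one non-causal abstract is flagged, giving an accuracy of 0.8, a false-negative rate of 1/4 and a false-positive rate of 1/6.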
![Page 59: Deep Machine Reading for Customer Analytics](https://reader035.fdocuments.us/reader035/viewer/2022070513/58859a951a28abd2498b57a5/html5/thumbnails/59.jpg)
Conclusions
Text understanding
  Deep understanding
    Distilling the abstract, feature engineering
    Mining patterns for extraction
    Semi-automated expansion of pattern lexicon
    Knowledge engineering
  The unsupervised aspect
Scaling
  Transfer learning
  Active learning
Outcomes
  SHIP: Expression classification
  DeepThought: Acquisition
  AAR: Integration into risk assessment software