Aspiring Minds | Svar
Aspiring Minds
www.aspiringminds.com
Spoken English Evaluation: Machine Learning with Crowd Intelligence
Varun Aggarwal
Presented at KDD 2015 and ACL 2015
Problem Statement & Motivation
Importance of spoken English
The English language has a very high socio-economic impact: people who speak it fluently are reported to earn 30-50% more than peers who don’t.
Scalable grading of spoken English is needed by companies, training organizations and individuals.
Problem Statement
Scalable grading of spontaneous English speech, as good as experts.
Why are automated methods not accurate?
Speaker-independent speech recognition for spontaneous speech is a hard problem!
Proposed system architecture
Crowdsourcing helps us get accurate transcriptions. Crowd grades also help improve accuracy.
[Figure: system architecture diagram, with labels including Crowd Grades and FA Features]
Crowdsourcing task
Worker quality control
• Each worker is assigned a risk level that reflects the quality of their past work.
• Based on this risk level, the system determines how many gold standard tasks to give the worker and when to give them.
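The quality-control idea above can be sketched in a few lines of Python. This is only an illustration: the risk level names, the accuracy thresholds, and the gold-task intervals in `GOLD_EVERY_N_TASKS` are assumptions made for the example, since the slides do not state the actual states or schedule.

```python
from dataclasses import dataclass

# Hypothetical schedule: one gold standard task every N tasks, per risk level.
# The real levels and intervals are not given in the slides.
GOLD_EVERY_N_TASKS = {"low": 10, "medium": 5, "high": 2}


@dataclass
class Worker:
    gold_correct: int = 0      # gold standard tasks answered correctly
    gold_total: int = 0        # gold standard tasks seen so far
    tasks_since_gold: int = 0  # regular tasks done since the last gold task

    def risk_level(self) -> str:
        """Map past gold-task accuracy to a risk level (assumed thresholds)."""
        if self.gold_total == 0:
            return "high"  # no track record yet, so treat as risky
        accuracy = self.gold_correct / self.gold_total
        if accuracy >= 0.9:
            return "low"
        if accuracy >= 0.7:
            return "medium"
        return "high"

    def next_task_is_gold(self) -> bool:
        """Decide whether the next task handed out should be a gold task."""
        interval = GOLD_EVERY_N_TASKS[self.risk_level()]
        return self.tasks_since_gold + 1 >= interval

    def record_task(self, was_gold: bool, correct: bool = False) -> None:
        """Update the worker's history after a task is completed."""
        if was_gold:
            self.gold_total += 1
            self.gold_correct += int(correct)
            self.tasks_since_gold = 0
        else:
            self.tasks_since_gold += 1
```

With this schedule a new worker is checked on every second task, while a worker with a strong record is checked only on every tenth.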
Supervised learning setup
Experiment Details
• Sample size: 566
• 319 from India
• 247 from the Philippines
Expert Grading
• Two expert raters
• Overall score based on Pronunciation/Fluency and Content-Org/Grammar.
• Inter-rater correlation ~0.8.
The learning task
• Modelling done separately for the Indian and Philippines sets.
• Linear ridge regression, neural networks and SVM regression with different kernels were used to build the models.
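As a rough illustration of the learning task, the sketch below fits a linear ridge regression separately for each country set using the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy on centred data. The feature values, target scores, and `alpha` are made up for the example, and the helper names (`solve`, `fit_ridge`, `predict`) are hypothetical; the slides do not describe the actual implementation or hyperparameters.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Return (weights, intercept) minimising ||y - Xw - b||^2 + alpha * ||w||^2."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    # Centre the data so the intercept is not penalised.
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(d)]
    w = solve(gram, xty)
    b = y_mean - sum(wi * mi for wi, mi in zip(w, x_mean))
    return w, b


def predict(X, w, b):
    """Predict a score for each feature row."""
    return [sum(wi * xi for wi, xi in zip(w, row)) + b for row in X]


# Made-up toy data: (feature rows, expert scores), one set per country.
datasets = {
    "India": ([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]],
              [1.5, 2.4, 3.6, 4.5]),
    "Philippines": ([[1.0, 0.5], [2.0, 0.3], [3.0, 0.6], [4.0, 0.2]],
                    [2.0, 2.8, 4.1, 4.7]),
}

# One model per set, mirroring the separate modelling described above.
models = {name: fit_ridge(X, y, alpha=0.1) for name, (X, y) in datasets.items()}
```

Larger values of `alpha` shrink the weights toward zero, which trades some fit on the training data for stability on small samples.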
Case study
• Studied deployment of the proposed algorithm in the Philippines.
• The event had 500 applicants for the role of customer support executive. The scoring algorithm was tested on a subset of 150 students.
• An internal expert graded each candidate’s speech as hirable or not hirable.
Features used
We use three classes of features:
• Forced Alignment features (FA)
  • The speech sample is force-aligned on the crowdsourced transcription.
  • Features such as rate of speech, position and length of pauses, log likelihood of recognition, posterior probability, hesitations and repetitions are derived.
• Natural Language Processing features (NLP)
  • Surface-level features: number of words, complexity or difficulty of words, and the number of common words used.
  • Semantic features such as coherence of the text, context of the words spoken, sentiment of the text and grammatical correctness.
• Crowd Grades (CG)
  • The crowd provides scores on pronunciation, fluency, content organization and grammar.
  • These grades are combined to form a composite score.
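To make two of these feature classes concrete, here is a minimal sketch that derives rate of speech and pause statistics from word-level timings, and combines crowd grades into one composite score. The alignment tuples, the 0.3 second pause threshold, and the unweighted mean used for the composite are all assumptions for illustration; the slides do not say how pauses are detected or how the grades are combined.

```python
from statistics import mean

# Hypothetical forced-alignment output: (word, start_sec, end_sec).
alignment = [
    ("i", 0.00, 0.20),
    ("work", 0.25, 0.60),
    ("in", 1.20, 1.35),
    ("customer", 1.40, 1.95),
    ("support", 2.00, 2.50),
]


def fa_features(words, min_pause=0.3):
    """Derive simple timing features from word-level alignment."""
    duration = words[-1][2] - words[0][1]
    # Silence between the end of one word and the start of the next.
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]  # assumed pause threshold
    return {
        "rate_of_speech": len(words) / duration,  # words per second
        "num_pauses": len(pauses),
        "mean_pause_len": mean(pauses) if pauses else 0.0,
    }


def composite_crowd_grade(grades):
    """Average the four trait scores per worker, then average across workers.

    The unweighted mean is an assumed combination rule.
    """
    traits = ("pronunciation", "fluency", "content_organization", "grammar")
    return mean(mean(g[t] for t in traits) for g in grades)


# Made-up grades from two crowd workers for one speech sample.
crowd_grades = [
    {"pronunciation": 4, "fluency": 3, "content_organization": 4, "grammar": 5},
    {"pronunciation": 3, "fluency": 3, "content_organization": 4, "grammar": 4},
]

features = fa_features(alignment)
features["crowd_grade"] = composite_crowd_grade(crowd_grades)
```

The resulting dictionary is the kind of per-sample feature vector a regression model could consume once the NLP features are appended.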
Experiment and Results
Crowdsourced transcriptions + crowd grades outperform all other methods.
Accuracy nears inter-expert agreement (~0.8).
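Assuming accuracy here is measured the same way as inter-rater agreement, that is, as a Pearson correlation between two sets of scores, the comparison could be computed as in the sketch below. The score lists are made up purely to show the calculation and are not the reported results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Made-up scores for five speech samples.
expert_a = [3.0, 4.5, 2.0, 5.0, 3.5]
expert_b = [3.5, 4.0, 2.5, 4.5, 3.0]
model = [3.2, 4.1, 2.4, 4.8, 3.1]

inter_expert = pearson(expert_a, expert_b)  # agreement between the experts
model_vs_expert = pearson(model, expert_a)  # model compared against one expert
```

A model correlation close to the inter-expert value suggests the automated score agrees with an expert about as well as a second expert would.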
Summing it up
• Svar provides an automated assessment of a candidate’s pronunciation and fluency.
• Crowdsourcing, in addition to NLP features, yields reliable composite scores.
• Speech assessments can be made scalable, with accuracy nearly matching expert opinion.