GPU-ACCELERATED HMM FOR SPEECH RECOGNITION Leiming Yu, Yash Ukidave and David Kaeli ECE, Northeastern University HUCAA 2014


Page 1:

GPU-ACCELERATED HMM FOR SPEECH RECOGNITION

Leiming Yu, Yash Ukidave and David Kaeli ECE, Northeastern University

HUCAA 2014

Page 2:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 3:

Background

• Translate Speech to Text

• Speaker Dependent / Speaker Independent

• Applications
  * Natural Language Processing
  * Home Automation
  * In-car Voice Control
  * Speaker Verification
  * Automated Banking
  * Personal Intelligent Assistants (Apple Siri, Samsung S Voice)
  * etc.

[http://www.kecl.ntt.co.jp]

Page 4:

DTW (Dynamic Time Warping)

A template-based approach to measure similarity between two temporal sequences which may vary in time or speed.

[opticalengineering.spiedigitallibrary.org]

Page 5:

DTW (Dynamic Time Warping)

DTW Pros:
1) Handles timing variation
2) Recognizes speech at reasonable cost

DTW Cons:
1) Template choosing
2) Ending point detection (VAD, acoustic noise)
3) Words with weak fricatives, close to acoustic background

For i := 1 to n
    For j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j], DTW[i, j-1], DTW[i-1, j-1])
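The recurrence above can be written out as a small runnable sketch. The function name `dtw_distance` and the default absolute-difference cost are assumptions made for illustration; the slide only specifies a generic distance D.

```python
import math


def dtw_distance(s, t, dist=lambda a, b: abs(a - b)):
    """Return the DTW alignment cost between sequences s and t."""
    n, m = len(s), len(t)
    # dtw[i][j] holds the cost of the best alignment of s[:i] with t[:j].
    # Row 0 and column 0 are boundary cells; only dtw[0][0] is reachable.
    dtw = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(s[i - 1], t[j - 1])
            dtw[i][j] = cost + min(
                dtw[i - 1][j],      # s advances, t stays
                dtw[i][j - 1],      # t advances, s stays
                dtw[i - 1][j - 1],  # both advance
            )
    return dtw[n][m]


# The second sequence repeats one sample, which DTW absorbs at no cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The table has (n + 1) x (m + 1) cells, so time and memory both grow with the product of the two sequence lengths.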

Page 6:

Neural Networks

Algorithms that mimic the brain.

Simplified Interpretation:
* takes a set of input features
* goes through a set of hidden layers
* produces the posterior probabilities as the output
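The three steps above can be sketched as a tiny feed-forward pass. Everything concrete here is an assumption for illustration: the sigmoid hidden units, the softmax output, the layer sizes, and the weight values are not taken from the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(zs)
    exps = [math.exp(z - top) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def affine(weights, bias, x):
    """Compute W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(x, hidden_layers, output_layer):
    """Map input features to posterior probabilities over the classes."""
    a = x
    for weights, bias in hidden_layers:
        a = [sigmoid(z) for z in affine(weights, bias, a)]
    w_out, b_out = output_layer
    return softmax(affine(w_out, b_out, a))


# Hypothetical network: 2 input features, 3 hidden units, 2 output classes.
hidden = [([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])]
output = ([[0.2, -0.5, 0.3], [-0.4, 0.6, 0.1]], [0.0, 0.0])
posteriors = forward([1.0, 2.0], hidden, output)
```

The softmax output is non-negative and sums to one, which is what allows it to be read as a set of posterior probabilities.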

Page 7:

Neural Networks

a_i^(j): “activation” of unit i in layer j

Θ^(j): matrix of weights controlling function mapping from layer j to layer j+1

[Figure: multi-class output example with classes Bike, Pedestrian, Car, Parking Meter; output shown for the Pedestrian case]

[Machine Learning, Coursera]

Page 8:

Neural Networks

Equation Example

Page 9:

Neural Networks Example

Hint:

* effective in recognizing individual phones and isolated words as short-time units

* not ideal for continuous recognition tasks largely due to the poor ability to model temporal dependencies.

Page 10:

Hidden Markov Model

In a Hidden Markov Model,

* the states are hidden
* outputs that depend on the states are visible

x: states
y: possible observations
a: state transition probabilities
b: output probabilities
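To make the roles of x, y, a and b concrete, here is a minimal sketch that samples from a small discrete HMM. The two-state model, its probability values, the `initial` distribution, and the function name `sample` are all hypothetical and chosen only for illustration.

```python
import random

states = ["x1", "x2"]                # hidden states (x)
observations = ["y1", "y2", "y3"]    # possible observations (y)
initial = [0.6, 0.4]                 # assumed starting distribution
a = [[0.7, 0.3],                     # a[i][j]: P(next state j | state i)
     [0.4, 0.6]]
b = [[0.5, 0.4, 0.1],                # b[i][k]: P(observation k | state i)
     [0.1, 0.3, 0.6]]


def sample(length, rng):
    """Generate a hidden state path and the visible sequence it emits."""
    hidden, visible = [], []
    state = rng.choices(range(len(states)), weights=initial)[0]
    for _ in range(length):
        hidden.append(states[state])
        emitted = rng.choices(range(len(observations)), weights=b[state])[0]
        visible.append(observations[emitted])
        state = rng.choices(range(len(states)), weights=a[state])[0]
    return hidden, visible


hidden_path, visible_seq = sample(5, random.Random(0))
```

A recognizer only ever sees `visible_seq`; `hidden_path` is what the forward, backward and EM procedures reason about indirectly.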

[wikipedia]

Page 11:

Hidden Markov Model

The temporal transition of the hidden states fits well with the nature of phoneme transition.

Hint:
* Handles temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input.
* Hybrid with NN to leverage each modeling technique
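As a rough illustration of how a GMM scores an acoustic feature vector for one hidden state, the sketch below evaluates a mixture log-likelihood. The diagonal covariance, the function names, and the example values are assumptions; the slides do not state the covariance structure used.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log of the weighted sum of component densities (log-sum-exp)."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


# Hypothetical two-component mixture over 2-dimensional feature vectors.
score = gmm_log_likelihood(
    [0.2, -0.1],
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```

Working in the log domain avoids underflow when feature vectors have many dimensions.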

Page 12:

Motivation

• Parallel Architecture

multi-core CPU to many-core GPU (graphics + general purpose)

• Massive Parallelism in Speech Recognition Systems

Neural Networks, HMMs, etc., are both computation and memory intensive

• GPGPU Evolution
  * Dynamic Parallelism
  * Concurrent Kernel Execution
  * Hyper-Q
  * Device Partitioning
  * Virtual Memory Addressing
  * GPU-GPU Data Transfer, etc.

• Previous works

• Our goal is to use modern GPU features to accelerate speech recognition

Page 13:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 14:

Hidden Markov Model

Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.

1966: Leonard Baum described the underlying mathematical theory.

1989: Lawrence Rabiner wrote a paper with the most comprehensive description of it.

Page 15:

Hidden Markov Model

HMM Stages

* causal transitional probabilities between states

* observation depends on current state, not predecessor

Page 16:

Hidden Markov Model

Forward

Backward

Expectation-Maximization

Page 17:

HMM-Forward
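A minimal sketch of the standard forward recursion follows, assuming discrete observation symbols rather than the GMM-based observation probabilities discussed elsewhere in the talk. The function name, the argument layout, and the toy model values are hypothetical.

```python
def forward(initial, trans, emit, obs):
    """Return the forward table alpha and the sequence likelihood.

    alpha[t][j] is the joint probability of the first t + 1 observations
    and of being in state j at time t.
    """
    n = len(initial)
    # Initialization: start in state i and emit the first observation.
    alpha = [[initial[i] * emit[i][obs[0]] for i in range(n)]]
    # Induction: sum over all predecessor states, then emit obs[t].
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([
            emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ])
    # Termination: any final state is acceptable.
    return alpha, sum(alpha[-1])


# Hypothetical two-state model with two observation symbols (0 and 1).
initial = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
alpha, likelihood = forward(initial, trans, emit, [0, 1])
```

Within one time step, each entry alpha[t][j] depends only on the previous row, so the states can be processed independently; the time steps themselves remain sequential.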

Page 18:

Hidden Markov Model

Forward

Backward

Expectation-Maximization

Page 19:

HMM Backward

[Figure: trellis over time steps t-1, t, t+1, t+2 with states i and j, labeled 𝛼_i(t), 𝛼_ij, 𝛽_j(x_{t+1}) and 𝛽_j(t+1)]
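A minimal sketch of the standard backward recursion, again assuming discrete observation symbols. The function name and the toy model values are hypothetical.

```python
def backward(trans, emit, obs):
    """Return the backward table beta.

    beta[t][i] is the probability of the observations after time t,
    given that the model is in state i at time t.
    """
    n, length = len(trans), len(obs)
    # Initialization: nothing is left to observe after the last step.
    beta = [[1.0] * n for _ in range(length)]
    # Induction runs from the end of the sequence toward the start.
    for t in range(length - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(n)
            )
    return beta


# Hypothetical two-state model with two observation symbols (0 and 1).
initial = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1]
beta = backward(trans, emit, obs)

# Combining beta at time 0 with the start probabilities gives the same
# sequence likelihood that a forward pass would produce.
likelihood = sum(
    initial[i] * emit[i][obs[0]] * beta[0][i] for i in range(len(initial))
)
```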

Page 20:

HMM-EM

Variable Definitions:
* Initial Probability
* Transition Prob., Observation Prob.
* Forward Variable, Backward Variable

Other Variables During Estimation:
* the estimated state transition probability matrix, epsilon (𝜀)
* the estimated probability of being in a particular state at time t, gamma (𝛾)
* Multivariate Normal Probability Density Function

Update Obs. Prob. from Gaussian Mixture Models
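The estimation variables can be illustrated with one textbook Baum-Welch step. This is a sketch under simplifying assumptions, not the authors' implementation: observations are discrete symbols instead of GMM-modeled feature vectors, only the transition matrix is re-estimated, and all function names and model values are hypothetical.

```python
def forward(initial, trans, emit, obs):
    n = len(initial)
    alpha = [[initial[i] * emit[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        alpha.append([emit[j][obs[t]] * sum(prev[i] * trans[i][j] for i in range(n))
                      for j in range(n)])
    return alpha


def backward(trans, emit, obs):
    n, length = len(trans), len(obs)
    beta = [[1.0] * n for _ in range(length)]
    for t in range(length - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def expectation(initial, trans, emit, obs):
    """Compute gamma[t][i] and epsilon[t][i][j] from alpha and beta."""
    n, length = len(initial), len(obs)
    alpha = forward(initial, trans, emit, obs)
    beta = backward(trans, emit, obs)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(length)]
    # epsilon[t][i][j]: probability of moving from i at t to j at t + 1.
    epsilon = [[[alpha[t][i] * trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                 / likelihood for j in range(n)] for i in range(n)]
               for t in range(length - 1)]
    return gamma, epsilon


def reestimate_transitions(gamma, epsilon):
    """Maximization step for the transition matrix only."""
    n, steps = len(gamma[0]), len(epsilon)
    return [[sum(epsilon[t][i][j] for t in range(steps))
             / sum(gamma[t][i] for t in range(steps))
             for j in range(n)] for i in range(n)]


# Hypothetical two-state model and a short observation sequence.
initial = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.5], [0.1, 0.9]]
gamma, epsilon = expectation(initial, trans, emit, [0, 1, 1, 0])
new_trans = reestimate_transitions(gamma, epsilon)
```

A full training loop would repeat the expectation and re-estimation steps until the likelihood stops improving, and would also update the initial and observation probabilities.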

Page 21:

HMM-EM

Page 22:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 23:

GPGPU

Programming Model

Page 24:

GPGPU

GPU Hierarchical Memory System

[http://www.biomedcentral.com]

• Visibility

• Performance Penalty

Page 25:

GPGPU

[www.math-cs.gordon.edu]

• Visibility

• Performance Penalty

Page 26:

GPGPU

GPU-powered Ecosystem

1) Programming Model
* CUDA
* OpenCL
* OpenACC, etc.

2) High Performance Libraries
* cuBLAS
* Thrust
* MAGMA (CUDA/OpenCL/Intel Xeon Phi)
* Armadillo (C++ Linear Algebra Library), drop-in libraries, etc.

3) Tuning/Profiling Tools
* Nvidia: nvprof / nvvp
* AMD: CodeXL

4) Consortium Standards
Heterogeneous System Architecture (HSA) Foundation

Page 27:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 28:

Results

Platform Specs

Page 29:

Results

Mitigate Data Transfer Latency

Pinned Memory Size
current process limit: ulimit -l (in KB)
hard limit: ulimit -H -l
increase the limit: ulimit -S -l 16384
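The same locked-memory limits can be inspected from a program. The sketch below uses Python's standard `resource` module on a Unix-like system; the helper name `memlock_limits_kb` is hypothetical, and how a given GPU driver accounts pinned allocations against this limit is not covered here.

```python
import resource


def memlock_limits_kb():
    """Return the (soft, hard) locked-memory limits in KB.

    getrlimit reports bytes, while `ulimit -l` reports KB, so the values
    are converted. An unlimited setting is returned as None.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kb(value):
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kb(soft), to_kb(hard)


soft_kb, hard_kb = memlock_limits_kb()
print("soft limit (KB):", "unlimited" if soft_kb is None else soft_kb)
print("hard limit (KB):", "unlimited" if hard_kb is None else hard_kb)
```

The soft limit corresponds to `ulimit -S -l` and can be raised by the process up to the hard limit reported by `ulimit -H -l`.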

Page 30:

Results

Page 31:

Results

A Practice to Efficiently Utilize Memory System

Page 32:

Results

Page 33:

Results

Hyper-Q Feature

Page 34:

Results

Running Multiple Word Recognition Tasks

Page 35:

Results

Page 36:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 37:

Future Work

• Integrate with Parallel Feature Extraction

• Power Efficiency Implementation and Analysis

• Embedded System Development, Jetson TK1 etc.

• Improve generality, LMs (language models)

• Improve robustness, Front-end noise cancelation

• Go with the trend!

Page 38:

QUESTIONS ?