
GPU-ACCELERATED HMM FOR SPEECH RECOGNITION Leiming Yu, Yash Ukidave and David Kaeli ECE, Northeastern University HUCAA 2014


Page 1:

GPU-ACCELERATED HMM FOR SPEECH RECOGNITION

Leiming Yu, Yash Ukidave and David Kaeli ECE, Northeastern University

HUCAA 2014

Page 2:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 3:

Background

• Translate Speech to Text

• Speaker Dependent / Speaker Independent

• Applications
* Natural Language Processing
* Home Automation
* In-car Voice Control
* Speaker Verification
* Automated Banking
* Personal Intelligent Assistants (Apple Siri, Samsung S Voice)
* etc.

[http://www.kecl.ntt.co.jp]

Page 4:

DTW: Dynamic Time Warping

A template-based approach to measure similarity between two temporal sequences which may vary in time or speed.

[opticalengineering.spiedigitallibrary.org]

Page 5:

DTW: Dynamic Time Warping

DTW Pros:
1) Handles timing variation
2) Recognizes speech at reasonable cost

DTW Cons:
1) Template selection
2) End-point detection (VAD, acoustic noise)
3) Words with weak fricatives, close to the acoustic background

For i := 1 to n
    For j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j], DTW[i, j-1], DTW[i-1, j-1])
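The recurrence above can be sketched as a small, self-contained Python function. This is a minimal illustration, not the presentation's GPU code; the 1-D sequences and the absolute-difference local cost D are assumptions for the example.

```python
import numpy as np

def dtw_distance(s, t):
    """Classic O(n*m) DTW between two 1-D sequences.

    D[i, j] holds the minimum cumulative cost of aligning s[:i] with t[:j],
    using the same three-way minimum as the slide's recurrence.
    """
    n, m = len(s), len(t)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])        # local distance D(s[i], t[j])
            # insertion, deletion, or match step
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because DTW warps the time axis, a sequence aligned against a stretched copy of itself still scores zero, which is exactly the "handles timing variation" property listed above.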

Page 6:

Neural Networks
Algorithms that mimic the brain.

Simplified Interpretation:
* takes a set of input features
* passes them through a set of hidden layers
* produces the posterior probabilities as the output
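As a hedged sketch of that interpretation: the layer sizes, sigmoid hidden units, softmax output, and random weights below are illustrative assumptions, not the presentation's actual network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())        # shift for numerical stability
    return e / e.sum()

def forward_pass(x, weights):
    """Propagate input features through hidden layers to class posteriors."""
    a = x
    for W in weights[:-1]:
        a = sigmoid(W @ a)          # hidden-layer activations
    return softmax(weights[-1] @ a) # posterior probabilities over classes

# 13 input features (e.g. one MFCC frame), one hidden layer, 4 output classes
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 13)), rng.normal(size=(4, 8))]
posteriors = forward_pass(rng.normal(size=13), weights)
```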

Page 7:

Neural Networks

a_i^(j) — “activation” of unit i in layer j

Θ^(j) — matrix of weights controlling the function mapping from layer j to layer j+1

Output classes: Bike, Pedestrian, Car, Parking Meter
(e.g., the “Pedestrian” output unit is the active one if the input is a pedestrian)

[Machine Learning, Coursera]

Page 8:

Neural Networks

Equation Example

Page 9:

Neural Networks Example

Hint:
* effective at recognizing individual phones and isolated words as short-time units
* not ideal for continuous recognition tasks, largely due to a poor ability to model temporal dependencies

Page 10:

Hidden Markov Model
In a Hidden Markov Model,
* the states are hidden
* outputs that depend on the states are visible

x — states
y — possible observations
a — state transition probabilities
b — output probabilities

[wikipedia]
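Following the style of the Wikipedia example the slide cites, a toy HMM with states x, observations y, transition probabilities a, and output probabilities b can be written down directly. All numbers here are illustrative assumptions.

```python
import numpy as np

# A toy 2-state HMM in the slide's notation
states = ["Rainy", "Sunny"]                # x: hidden states
observations = ["walk", "shop", "clean"]   # y: possible observations
pi = np.array([0.6, 0.4])                  # initial state probabilities
a = np.array([[0.7, 0.3],                  # a[i, j] = P(x_{t+1}=j | x_t=i)
              [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5],             # b[i, k] = P(y_t=k | x_t=i)
              [0.6, 0.3, 0.1]])
```

Each row of a and b is a probability distribution, so the rows must sum to one; that invariant is what the forward/backward recursions on the later slides rely on.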

Page 11:

Hidden Markov Model
The temporal transitions of the hidden states fit well with the nature of phoneme transitions.

Hint:
* Handles the temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input
* Can be hybridized with NNs to leverage the strengths of each modeling technique

Page 12:

Motivation
• Parallel Architecture

multi-core CPUs to many-core GPUs (graphics + general purpose)

• Massive Parallelism in Speech Recognition Systems
Neural Networks, HMMs, etc., are both computation and memory intensive

• GPGPU Evolution
* Dynamic Parallelism
* Concurrent Kernel Execution
* Hyper-Q
* Device Partitioning
* Virtual Memory Addressing
* GPU-GPU Data Transfer, etc.

• Previous work

• Our goal is to use modern GPU features to accelerate Speech Recognition

Page 13:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 14:

Hidden Markov Model
Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.

In 1966, Leonard Baum described the underlying mathematical theory.

In 1989, Lawrence Rabiner published the most comprehensive tutorial description of it.

Page 15:

Hidden Markov Model
HMM Stages

* causal transition probabilities between states

* each observation depends on the current state, not its predecessors

Page 16:

Hidden Markov Model

Forward

Backward

Expectation-Maximization

Page 17:

HMM-Forward
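A minimal sketch of the forward recursion (initialization, then induction over time). The toy parameters are illustrative assumptions; the row sum of the final forward variables gives the total observation likelihood.

```python
import numpy as np

def hmm_forward(pi, a, b, obs):
    """Forward variable alpha[t, i] = P(y_1..y_t, x_t = i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * b[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]  # induction step
    return alpha

# Toy parameters (illustrative, not from the presentation)
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
obs = [0, 1, 2]
likelihood = hmm_forward(pi, a, b, obs)[-1].sum()     # P(y_1..y_T)
```

The recursion reduces the exponential sum over all state paths to O(T·N²) work; each time step is an independent matrix-vector product, which is the parallelism a GPU kernel exploits.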

Page 18:

Hidden Markov Model

Forward

Backward

Expectation-Maximization

Page 19:

HMM Backward

(Figure: trellis of states i and j across time steps t-1, t, t+1, t+2)
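The backward recursion over the trellis can be sketched analogously: beta is initialized to one at the final frame and filled right-to-left. The toy parameters are illustrative assumptions.

```python
import numpy as np

def hmm_backward(a, b, obs):
    """Backward variable beta[t, i] = P(y_{t+1}..y_T | x_t = i)."""
    T, N = len(obs), a.shape[0]
    beta = np.ones((T, N))                            # beta[T-1, i] = 1
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    return beta

# Toy parameters (illustrative, not from the presentation)
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
obs = [0, 1, 2]
beta = hmm_backward(a, b, obs)
likelihood = (pi * b[:, obs[0]] * beta[0]).sum()      # same P(y) as the forward pass
```

Weighting beta at t = 0 by the initial and emission probabilities recovers the same total likelihood as the forward pass, a standard sanity check for the two recursions.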

Page 20:

HMM-EM

Variable Definitions:
* Initial Probability
* Transition Prob., Observation Prob.
* Forward Variable, Backward Variable

Other Variables During Estimation:
* epsilon — the estimated state transition probability matrix
* gamma — the estimated probability of being in a particular state at time t
* Multivariate Normal Probability Density Function — updates the observation probabilities from Gaussian Mixture Models
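A sketch of the E-step quantities named above: gamma, and the pairwise transition posterior (epsilon, often written xi), computed from the forward and backward variables. This version assumes discrete output probabilities b for brevity; a GMM-based system, as on the slide, would instead evaluate per-state multivariate-normal mixture densities in place of the b lookups. All parameters are illustrative.

```python
import numpy as np

def e_step(pi, a, b, obs):
    """E-step: gamma[t,i] = P(x_t=i | y), xi[t,i,j] = P(x_t=i, x_{t+1}=j | y)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * b[:, obs[0]]
    for t in range(1, T):                              # forward pass
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
    for t in range(T - 2, -1, -1):                     # backward pass
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    lik = alpha[-1].sum()                              # P(y)
    gamma = alpha * beta / lik                         # state occupancy
    # xi[t,i,j] = alpha[t,i] * a[i,j] * b[j, y_{t+1}] * beta[t+1,j] / P(y)
    xi = (alpha[:-1, :, None] * a[None, :, :] *
          (b[:, obs[1:]].T * beta[1:])[:, None, :]) / lik
    return gamma, xi

# Toy parameters (illustrative, not from the presentation)
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
gamma, xi = e_step(pi, a, b, [0, 1, 2])
```

The M-step then re-estimates a from ratios of summed xi and gamma counts; marginalizing xi over the destination state must reproduce gamma, which makes a convenient correctness check.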

Page 21:

HMM-EM

Page 22:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 23:

GPGPU

Programming Model

Page 24:

GPGPU
GPU Hierarchical Memory System

[http://www.biomedcentral.com]

• Visibility

• Performance Penalty

Page 25:

GPGPU

[www.math-cs.gordon.edu]

• Visibility

• Performance Penalty

Page 26:

GPGPU
GPU-powered Ecosystem

1) Programming Models
* CUDA
* OpenCL
* OpenACC, etc.

2) High-Performance Libraries
* cuBLAS
* Thrust
* MAGMA (CUDA/OpenCL/Intel Xeon Phi)
* Armadillo (C++ linear algebra library), drop-in libraries, etc.

3) Tuning/Profiling Tools
* Nvidia: nvprof / nvvp
* AMD: CodeXL

4) Consortium Standards
* Heterogeneous System Architecture (HSA) Foundation

Page 27:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 28:

Results
Platform Specs

Page 29:

Results
Mitigating Data Transfer Latency

Pinned Memory Size
current process limit: ulimit -l (in KB)
hardware limit: ulimit -H -l
increase the limit: ulimit -S -l 16384

Page 30:

Results

Page 31:

Results
A Practice to Efficiently Utilize the Memory System

Page 32:

Results

Page 33:

Results

Hyper-Q Feature

Page 34:

Results

Running Multiple Word Recognition Tasks

Page 35:

Results

Page 36:

Outline

Background & Motivation

HMM

GPGPU

Results

Future Work

Page 37:

Future Work

• Integrate with Parallel Feature Extraction

• Power Efficiency Implementation and Analysis

• Embedded System Development, Jetson TK1, etc.

• Improve generality, with language models (LMs)

• Improve robustness, with front-end noise cancellation

• Go with the trend!

Page 38:

QUESTIONS ?