Tech Seminar Doc - 8/2/2019
Speech To Text Conversion
CVSR COLLEGE OF ENGINEERING, DEPARTMENT OF ECE
CHAPTER 1
INTRODUCTION
Since the beginning of the 21st century, revolutionary developments have taken place in the technical and communication fields. These developments have affected several areas such as medicine, industry, and even translation. This generation faced many problems and difficulties; however, people searched for new ideas that save time, effort, and money. In translation, many computer programs, electronic tools, and websites have been established to serve this area, for instance online translation sites, translation programs, electronic dictionaries, corpora, etc. One area of interest is speech recognition.
Speech recognition is the process of capturing spoken words using a microphone or telephone and converting them into a digitally stored set of words. Speech recognition reduces the overhead caused by alternate communication methods. Speech has not been used much in the field of electronics and computers due to the complexity and variety of speech signals and sounds. However, with modern processes, algorithms, and methods we can process speech signals easily and recognize the text.
1.1 OBJECTIVE
This seminar presents an on-line speech-to-text engine, implemented as a system-on-a-programmable-chip (SOPC) solution. The system acquires speech at run time through a microphone and processes the sampled speech to recognize the uttered text. A hidden Markov model (HMM) is used for speech recognition, which converts the speech to text. The recognized text can be stored in a file on a PC that connects to an FPGA on a development board using a standard RS-232 serial cable. Our speech-to-text system directly acquires and converts speech to text. It can supplement other larger systems, giving users a different choice for data entry. A speech-to-text system can also improve system accessibility by providing data entry options for blind, deaf, or physically handicapped users.
CHAPTER 2
DESIGN IMPLEMENTATION
DESCRIPTION OF PARTS
2.1 AUDIO INPUT HARDWARE
PULSE CODE MODULATION
Pulse Code Modulation (PCM) is an extension of pulse amplitude modulation (PAM) wherein each analogue sample value is quantized into a discrete value for representation as a digital code word. Thus, as shown below, a PAM system can be converted into a PCM system by adding a suitable analogue-to-digital (A/D) converter at the source and a digital-to-analogue (D/A) converter at the destination.
Fig 2.1 PCM Modulation and Demodulation
In PCM the speech signal is converted from analogue to digital form. PCM is standardised for telephony by the ITU-T (International Telecommunication Union - Telecommunication Standardization Sector, a branch of the UN), in a series of recommendations called the G series. For example, the ITU-T recommendations for out-of-band signal rejection in PCM voice coders require that 14 dB of attenuation is provided at 4 kHz. Also, the ITU-T transmission quality specification for telephony terminals requires that the frequency response of the handset microphone has a sharp roll-off from 3.4 kHz.
In quantization the levels are assigned a binary codeword. All sample values falling between two quantization levels are considered to be located at the centre of the quantization interval. In this manner the quantization process introduces a certain amount of error or distortion into the signal samples. This error, known as quantization noise, is minimised by establishing a large number of small quantization intervals. Of course, as the number of quantization intervals increases, so must the number of bits increase to uniquely identify the quantization intervals. For example, if an analogue voltage level is to be converted to a digital system with 8 discrete levels or quantization steps, three bits are required. In the ITU-T version there are 256 quantization steps, 128 positive and 128 negative, requiring 8 bits. A positive level is represented by having bit 8 (MSB) at 0, and for a negative level the MSB is 1.

[Fig 2.1 blocks: modulator - analogue input → LPF → sampler → A/D converter → binary coder → parallel-to-serial converter → PCM output; demodulator - PCM input → digital pulse generator → serial-to-parallel converter → D/A converter → analogue output.]
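The centre-of-interval reconstruction described above can be sketched as a uniform quantizer. This is a simplification for illustration only: telephony PCM per the ITU-T recommendations actually uses logarithmic (A-law/µ-law) companding rather than uniform steps, and the function names here are our own.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Uniform quantizer sketch: maps x in [-1, 1) onto 2^bits levels and
 * reconstructs at the centre of the quantization interval. */
uint32_t pcm_encode(double x, int bits) {
    uint32_t levels = 1u << bits;
    double step = 2.0 / levels;
    if (x < -1.0) x = -1.0;            /* clip to the coder range */
    if (x > 1.0 - step) x = 1.0 - step;
    return (uint32_t)((x + 1.0) / step);
}

double pcm_decode(uint32_t code, int bits) {
    double step = 2.0 / (1u << bits);
    return (code + 0.5) * step - 1.0;  /* centre of the interval */
}

/* Quantization error is at most half a step, i.e. 1 / 2^bits. */
double pcm_error(double x, int bits) {
    return fabs(x - pcm_decode(pcm_encode(x, bits), bits));
}
```

More bits shrink the error bound, which is the trade-off the paragraph above describes: 3 bits give 8 coarse steps, while 8 bits give the 256 steps used in the ITU-T version.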
2.2 FPGA INTERFACE
The trend in hardware design is towards implementing a complete system, intended for various applications, on a single chip. The advent of high-density FPGAs with high-capacity RAMs, and support for soft-core processors such as Altera's Nios II processor, has enabled designers to implement a complete system on a chip. FPGAs provide the following benefits:
- FPGA systems are portable, cost effective, and consume very little power compared to PCs. A complete system can be implemented easily on a single chip because complex integrated circuits (ICs) with millions of gates are available now.
- System-on-a-programmable-chip (SOPC) Builder can trim months from a design cycle by simplifying and accelerating the design process. It integrates complex system components such as intellectual property (IP) blocks, memories, and interfaces to off-chip devices, including application-specific standard products (ASSPs) and ASICs, on Altera high-density FPGAs.
- SOPC methodology gives the designer flexibility when writing code, and supports both high-level languages (HLLs) and hardware description languages (HDLs). The time-critical blocks can be implemented in an HDL while the remaining blocks are implemented in an HLL. It is easy to change the existing algorithm in hardware and software by simply modifying the code.
- FPGAs provide the best of both worlds: a microcontroller or RISC processor can efficiently perform control and decision-making operations while the FPGA can perform digital signal processing (DSP) operations and other computationally intensive tasks.
- An FPGA supports hardware/software co-design in which the time-critical blocks are written in HDL and implemented as hardware units, while the remaining application logic is written in C. The challenge is to find a good tradeoff between the two: both the processor and the custom hardware must be optimally designed such that neither is idle or under-utilized.
2.3 NIOS II PROCESSOR
The Nios II processor is a general-purpose RISC processor core with the following features:
- Full 32-bit instruction set, data path, and address space
- 32 general-purpose registers
- Optional shadow register sets
- 32 interrupt sources
- External interrupt controller interface for more interrupt sources
- Single-instruction 32 x 32 multiply and divide producing a 32-bit result
- Floating-point instructions for single-precision floating-point operations
- Single-instruction barrel shifter
- Access to a variety of on-chip peripherals, and interfaces to off-chip memories and peripherals
- Hardware-assisted debug module enabling processor start, stop, step, and trace under control of the Nios II software development tools
- Optional memory management unit (MMU) to support operating systems that require MMUs
- Optional memory protection unit (MPU)
- Software development environment based on the GNU C/C++ tool chain and the Nios II Software Build Tools (SBT) for Eclipse
- Integration with Altera's SignalTap II Embedded Logic Analyzer, enabling real-time analysis of instructions and data along with other signals in the FPGA design
- Instruction set architecture (ISA) compatible across all Nios II processor systems
- Performance up to 250 DMIPS
A Nios II processor system is equivalent to a microcontroller or computer on a chip that includes a processor and a combination of peripherals and memory on a single chip. A Nios II processor system consists of a Nios II processor core, a set of on-chip peripherals, on-chip memory, and interfaces to off-chip memory, all implemented on a single Altera device. Like a microcontroller family, all Nios II processor systems use a consistent instruction set and programming model.
2.4 DEVELOPMENT AND EDUCATION BOARD (DE2)
The DE2 package includes:
- DE2 board.
- USB cable for FPGA programming and control.
- CD-ROM containing the DE2 documentation and supporting materials, including the User Manual, the Control Panel utility, reference designs and demonstrations, device datasheets, tutorials, and a set of laboratory exercises.
- CD-ROMs containing Altera's Quartus II Web Edition and the Nios II Embedded Design Suite Evaluation Edition software.
- Bag of six rubber (silicon) covers for the DE2 board stands. The bag also contains some extender pins, which can be used to facilitate easier probing of the board's I/O expansion headers with testing equipment.
- Clear plastic cover for the board.
- 9V DC wall-mount power supply.
The DE2 board has many features that allow the user to implement a wide range of designed circuits, from simple circuits to various multimedia projects.
The following hardware is provided on the DE2 board:
- Altera Cyclone II 2C35 FPGA device
- Altera Serial Configuration device - EPCS16
- USB Blaster (on board) for programming and user API control; both JTAG and Active Serial (AS) programming modes are supported
- 512-Kbyte SRAM
- 8-Mbyte SDRAM
- 4-Mbyte Flash memory (1 Mbyte on some boards)
- SD Card socket
- 4 pushbutton switches
- 18 toggle switches
- 18 red user LEDs
- 9 green user LEDs
- 50-MHz oscillator and 27-MHz oscillator for clock sources
- 24-bit CD-quality audio CODEC with line-in, line-out, and microphone-in jacks
- VGA DAC (10-bit high-speed triple DACs) with VGA-out connector
- TV Decoder (NTSC/PAL) and TV-in connector
- 10/100 Ethernet Controller with a connector
- USB Host/Slave Controller with USB type A and type B connectors
- RS-232 transceiver and 9-pin connector
- PS/2 mouse/keyboard connector
- IrDA transceiver
- Two 40-pin Expansion Headers with diode protection
In addition to these hardware features, the DE2 board has software support for standard I/O interfaces and a control panel facility for accessing various components. Also, software is provided for a number of demonstrations that illustrate the advanced capabilities of the DE2 board.
CHAPTER 3
DESIGN IMPLEMENTATION
3.1 BLOCK DIAGRAM
Fig 3.1 Block diagram
[Fig 3.1 blocks: speech acquisition (through microphone) → speech preprocessing → hidden Markov model → text storage (external hardware).]
3.2 FUNCTIONAL DESCRIPTION
Functionally, the speech-to-text conversion has the following blocks:
- Speech acquisition
- Speech preprocessing
- Hidden Markov model (HMM)
- Text storage
SPEECH ACQUISITION
During speech acquisition, speech samples are obtained from the speaker in real time and stored in memory for preprocessing. The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 kHz. The system needs a parallel/serial interface to the Nios II processor and an application running on the processor that acquires and stores data in memory. The received samples are stored into memory on the Altera Development and Education (DE2) development board.
The codec requires initial configuration, which is performed using custom hardware implemented in the Altera Cyclone II FPGA on the board. The audio codec provides a serial communication interface, which is connected to a UART. We used SOPC Builder to add the UART to the Nios II processor to enable the interface. The UART is connected to the processor through the Avalon bus. The C application running on the HAL transfers data from the UART to the SDRAM. Direct memory access (DMA) transfers data efficiently and quickly, and we may use it instead in future designs.
SPEECH PREPROCESSING
The speech signal consists of the uttered digit along with a pause period and background noise. Preprocessing reduces the amount of processing required in later stages. Generally, preprocessing involves taking the speech samples as input, blocking the samples into frames, and returning a unique pattern for each sample, as described in the following steps.
1. The system must identify useful or significant samples from the speech signal. To
accomplish this goal, the system divides the speech samples into overlapped frames.
2. The system checks the frames for voice activity using endpoint detection and energy
threshold calculations.
3. The speech samples are passed through a pre-emphasis filter.
4. The frames with voice activity are passed through a Hamming window.
5. The system performs autocorrelation analysis on each frame.
6. The system finds linear predictive coding (LPC) coefficients using the Levinson-Durbin algorithm.
7. From the LPC coefficients, the system determines the cepstral coefficients and weighs
them using a tapered window. The cepstral coefficients serve as feature vectors.
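Step 6 above, turning autocorrelation values into LPC coefficients, can be sketched with the Levinson-Durbin recursion. The array conventions and the order bound of 32 are assumptions for this sketch, not the authors' actual code.

```c
#include <assert.h>
#include <math.h>

/* Levinson-Durbin recursion: solves the Toeplitz normal equations to
 * obtain the LPC polynomial A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p from
 * autocorrelation values r[0..p]. Returns the final prediction error
 * energy, which shrinks at every order. */
double levinson_durbin(const double *r, double *a, int p) {
    double tmp[32];                 /* assumes p < 32 */
    double err = r[0];
    a[0] = 1.0;
    for (int i = 1; i <= p; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc += a[j] * r[i - j];
        double k = -acc / err;      /* reflection coefficient */
        for (int j = 1; j < i; j++)
            tmp[j] = a[j] + k * a[i - j];
        for (int j = 1; j < i; j++)
            a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;         /* updated error energy */
    }
    return err;
}
```

For an autocorrelation sequence r(k) = 0.5^k (a first-order process), the recursion recovers a single non-zero predictor coefficient a[1] = -0.5 and a zero second-order term, as expected.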
Fig 3.2 Flow chart for speech preprocessing
[Fig 3.2 flow: sampled speech input → block into frames → find maximum energy and ZCR → find starting and ending of sample → pre-emphasis filter → windowing on frame → autocorrelation → LPC/cepstral analysis → cepstrum output for sample.]
VOICE ACTIVITY DETECTION
The system uses the endpoint detection algorithm to find the start and end points of the speech. The speech is sliced into frames that are 450 samples long. Next, the system finds the energy and number of zero crossings of each frame. The threshold energy and zero-crossing value are determined based on the computed values, and only frames crossing the threshold are considered, removing most background noise. We include a small number of frames beyond the starting and ending frames so that we do not miss starting or ending parts that do not cross the threshold but may be important for recognition.
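The per-frame energy and zero-crossing measurements used above can be sketched as below. The report does not specify how the two thresholds are combined, so the OR-combination and the function names are illustrative assumptions (voiced frames tend to have high energy, while unvoiced fricatives often show a high zero-crossing rate instead).

```c
#include <assert.h>

/* Energy of a frame of 16-bit samples: sum of squared amplitudes. */
double frame_energy(const short *frame, int n) {
    double e = 0.0;
    for (int i = 0; i < n; i++)
        e += (double)frame[i] * (double)frame[i];
    return e;
}

/* Number of sign changes between consecutive samples. */
int frame_zero_crossings(const short *frame, int n) {
    int zc = 0;
    for (int i = 1; i < n; i++)
        if ((frame[i - 1] >= 0) != (frame[i] >= 0))
            zc++;
    return zc;
}

/* A frame is kept when either measure crosses its threshold. */
int frame_is_active(const short *frame, int n,
                    double energy_thresh, int zc_thresh) {
    return frame_energy(frame, n) > energy_thresh ||
           frame_zero_crossings(frame, n) > zc_thresh;
}
```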
PRE-EMPHASIS
The digitized speech signal s(n) is put through a first-order filter to flatten the signal spectrally and make it less susceptible to finite precision effects later in the signal processing. The filter is represented by the equation
H(z) = 1 - a·z^-1, where a = 0.9375.
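As a sketch, the filter H(z) = 1 - a·z^-1 is a one-line difference equation y(n) = x(n) - a·x(n-1); the zero initial condition is an assumption.

```c
#include <assert.h>
#include <math.h>

#define PREEMPH_A 0.9375   /* = 15/16, as given above */

/* In-place pre-emphasis: y(n) = x(n) - a * x(n-1). */
void pre_emphasis(double *x, int n) {
    double prev = 0.0;              /* assume silence before frame */
    for (int i = 0; i < n; i++) {
        double cur = x[i];
        x[i] = cur - PREEMPH_A * prev;
        prev = cur;
    }
}
```

The value 0.9375 equals 15/16, which also lets a fixed-point implementation compute a·x(n-1) with only a shift and a subtraction.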
FRAME BLOCKING
Speech frames are formed with a duration of 56.25 ms (N = 450 samples) and a frame shift of 18.75 ms (M = 150 samples), so that adjacent frames overlap by 300 samples. The overlapping ensures that the resulting LPC spectral estimates are correlated from frame to frame and are quite smooth.
x_q(n) = s(Mq + n)
with n = 0 to N - 1 and q = 0 to L - 1, where L is the number of frames.
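The frame-blocking equation x_q(n) = s(Mq + n) translates directly into a copy loop; the bounds check is an illustrative assumption.

```c
#include <assert.h>

#define N_FRAME 450   /* frame length N, as above */
#define M_SHIFT 150   /* frame shift M, as above  */

/* Copies frame q, x_q(n) = s(M*q + n), into out[0..N_FRAME-1].
 * Returns 0 on success and -1 when the frame would run past the
 * end of the signal. */
int block_frame(const double *s, int s_len, int q, double *out) {
    int start = M_SHIFT * q;
    if (start < 0 || start + N_FRAME > s_len)
        return -1;
    for (int n = 0; n < N_FRAME; n++)
        out[n] = s[start + n];
    return 0;
}
```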
WINDOWING
A Hamming window is applied to each frame to minimize signal discontinuities at the beginning and end of the frame according to the equation:
x_q(n) = x_q(n) · w(n)
where w(n) = 0.54 - 0.46 cos(2πn/(N - 1)).
HIDDEN MARKOV MODEL
The hidden Markov model (HMM) is a powerful statistical tool for modeling generative sequences that can be characterised by an underlying process generating an observable sequence. HMMs have found application in many areas interested in signal processing, and in particular speech processing, but have also been applied with success to low-level natural language processing (NLP) tasks such as part-of-speech tagging, phrase chunking, and extracting target information from documents. Andrei Markov gave his name to the mathematical theory of Markov processes in the early twentieth century, but it was Baum and his colleagues who developed the theory of HMMs in the 1960s [2].
HMM TRAINING
An important part of speech-to-text conversion using pattern recognition is training. Training involves creating a pattern representative of the features of a class using one or more test patterns that correspond to speech sounds of the same class. The resulting pattern (generally called a reference pattern) is an example or template, derived from some type of averaging technique. It can also be a model that characterizes the reference pattern statistics. A model commonly used for speech recognition is the HMM, which is a statistical model used for modeling an unknown system using an observed output sequence. The system trains the HMM for each digit in the vocabulary using the Baum-Welch algorithm. The codebook index created during preprocessing is the observation vector for the HMM model.
After preprocessing the input speech samples to extract feature vectors, the system builds the codebook. The codebook is the reference code space that we can use to compare input feature vectors. The weighted cepstrum matrices for various users and digits are compared with the codebook. The nearest corresponding codebook vector indices are sent to the Baum-Welch algorithm for training an HMM model.
The HMM characterizes the system using three matrices:
- A: the state transition probability distribution.
- B: the observation symbol probability distribution.
- π: the initial state distribution.
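As a sketch, the three matrices can be held in a single structure. The state count is illustrative, the codebook size of 128 follows the training flow, and the validity check simply confirms that every probability row sums to 1.

```c
#include <assert.h>

#define N_STATES 5      /* illustrative state count             */
#define M_SYMBOLS 128   /* codebook size, per the training flow */

typedef struct {
    double A[N_STATES][N_STATES];    /* state transition probabilities   */
    double B[N_STATES][M_SYMBOLS];   /* observation symbol probabilities */
    double pi[N_STATES];             /* initial state distribution       */
} Hmm;

static int nearly_one(double s, double tol) {
    return s > 1.0 - tol && s < 1.0 + tol;
}

/* Every row of A and B, and pi itself, must sum to 1. */
int hmm_is_valid(const Hmm *h, double tol) {
    double sp = 0.0;
    for (int i = 0; i < N_STATES; i++)
        sp += h->pi[i];
    if (!nearly_one(sp, tol))
        return 0;
    for (int i = 0; i < N_STATES; i++) {
        double sa = 0.0, sb = 0.0;
        for (int j = 0; j < N_STATES; j++)
            sa += h->A[i][j];
        for (int k = 0; k < M_SYMBOLS; k++)
            sb += h->B[i][k];
        if (!nearly_one(sa, tol) || !nearly_one(sb, tol))
            return 0;
    }
    return 1;
}
```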
Any digit is completely characterized by its corresponding A, B, and π matrices. The A, B, and π matrices are modeled using the Baum-Welch algorithm, which is an iterative procedure (we limit the iterations to 20). The Baum-Welch algorithm gives 3 matrices for each digit corresponding to the 3 users with whom we created the vocabulary set. The A, B, and π matrices are averaged over the users to generalize them for user-independent recognition.
For the design to recognize the same digit uttered by a user for which the design has not been trained, the zero probabilities in the B matrix are replaced with a low value so that it gives a non-zero value on recognition. To some extent, this arrangement overcomes the problem of limited training data. Training is a one-time process. Due to the complexity and resource requirements, it is performed using standalone PC application software that we created by compiling our C program into an executable. For recognition, we compile the same C program but target it to run on the Nios II processor instead. We were able to accomplish this cross-compilation because of the wide support for the C language in the Nios II processor IDE.
The C program running on the PC takes the digit speech samples from a MATLAB output file and performs preprocessing, feature vector extraction, vector quantization, Baum-Welch modeling, and averaging, which outputs the normalized A, B, and π matrices for each digit. The normalized A, B, and π matrices are then embedded in the recognition C program code and stored in the DE2 development board's SDRAM using the Nios II IDE.
BAUM-WELCH ALGORITHM
The Baum-Welch algorithm is an instance of a general algorithm, the Expectation-Maximisation (EM) algorithm, which maximises the probability of the observations depending on hidden data. The algorithm requires specifying the number of states n of the learnt model. It finds a local maximum in the parameter space of n-state HMMs, rather than a global maximum.
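Baum-Welch re-estimation is built on the forward (and backward) probabilities. A minimal sketch of the forward recursion for a toy-sized model is shown below; the sizes are assumptions, and a real implementation would scale each step or work in the log domain to avoid underflow on long sequences.

```c
#include <assert.h>
#include <math.h>

#define FWD_N 2   /* states  (toy size) */
#define FWD_M 2   /* symbols (toy size) */

/* Forward recursion: alpha_0(i) = pi(i) * B(i, o_0), then
 * alpha_t(j) = [sum_i alpha_{t-1}(i) * A(i, j)] * B(j, o_t).
 * Returns P(O | model) = sum_i alpha_{T-1}(i). */
double forward_prob(double A[FWD_N][FWD_N], double B[FWD_N][FWD_M],
                    double pi[FWD_N], const int *obs, int T) {
    double alpha[FWD_N], next[FWD_N];
    for (int i = 0; i < FWD_N; i++)
        alpha[i] = pi[i] * B[i][obs[0]];
    for (int t = 1; t < T; t++) {
        for (int j = 0; j < FWD_N; j++) {
            double s = 0.0;
            for (int i = 0; i < FWD_N; i++)
                s += alpha[i] * A[i][j];
            next[j] = s * B[j][obs[t]];
        }
        for (int j = 0; j < FWD_N; j++)
            alpha[j] = next[j];
    }
    double p = 0.0;
    for (int i = 0; i < FWD_N; i++)
        p += alpha[i];
    return p;
}
```

For a deterministic model that alternates between two states and emits the state's own symbol, the sequence 0, 1, 0 has probability 1 and the sequence 0, 0 has probability 0, which is a quick sanity check of the recursion.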
Fig 3.3 Flow chart for Training
[Fig 3.3 flow: speech sample input for each digit → preprocessing (weighted cepstrum) → append weighted cepstrum for each user for digits 0-9 → compress the vector space by vector quantization (codebook of size 128) → find the index of the nearest codebook vector (in a Euclidean sense) for each frame of the input speech → train the HMM for each digit for each user and average the parameters (A, B, π) over all users → trained models for each digit.]
HMM-BASED RECOGNITION
Recognition or pattern classification is the process of comparing the unknown test
pattern with each sound class reference pattern and computing a measure of similarity
(distance) between the test pattern and each reference pattern. The digit is recognized using a
maximum likelihood estimate, such as the Viterbi decoding algorithm, which implies that the
digit whose model has the maximum probability is the spoken digit.
Preprocessing, feature vector extraction, and codebook generation are the same as in HMM training. The input speech sample is preprocessed and the feature vector is extracted.
Then, the index of the nearest codebook vector for each frame is sent to all digit models. The
model with the maximum probability is chosen as the recognized digit.
After preprocessing in the Nios II processor, the required data is passed to the
hardware for Viterbi decoding. Viterbi decoding is computationally intensive so we
implemented it in the FPGA for better execution speed, taking advantage of
hardware/software co-design. We wrote the Viterbi decoder in Verilog HDL and included it
as a custom instruction in the Nios II processor. Data passes through the dataa and datab ports
and the prefix port is used for control operations. The custom instruction copies or adds two
floating-point numbers from dataa and datab, depending on the prefix input. The output
(result) is sent back to the Nios II processor for further maximum likelihood estimation.
VITERBI ALGORITHM
The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states (called the Viterbi path) that results in a sequence of observed events, especially in the context of Markov information sources and, more generally, hidden Markov models. The forward algorithm is a closely related algorithm for computing the probability of a sequence of observed events. These algorithms belong to the realm of probability theory.
The algorithm makes a number of assumptions:
- First, both the observed events and hidden events must be in a sequence. The sequence is often temporal, i.e. in time order of occurrence.
- Second, these two sequences need to be aligned: an instance of an observed event needs to correspond to exactly one instance of a hidden event.
- Third, computing the most likely hidden sequence up to a certain point t (which leads to a particular state) must depend only on the observed event at point t, and the most likely sequence which leads to that state at point t - 1.
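Under the assumptions above, the Viterbi recursion keeps, for each state, the probability of the best path ending there plus a backpointer for recovering the path. The toy sizes and names below are illustrative, not the Verilog implementation described in the text, and practical decoders work with log probabilities to avoid underflow.

```c
#include <assert.h>
#include <math.h>

#define VIT_N 2       /* states (toy size)              */
#define VIT_M 2       /* symbols (toy size)             */
#define VIT_T_MAX 64  /* assumed maximum sequence length */

/* Viterbi decoding: delta_t(j) = max_i [delta_{t-1}(i) * A(i, j)] * B(j, o_t),
 * with backpointers psi. Returns the best-path probability and fills
 * path[0..T-1] with the corresponding state sequence. */
double viterbi(double A[VIT_N][VIT_N], double B[VIT_N][VIT_M],
               double pi[VIT_N], const int *obs, int T, int *path) {
    double delta[VIT_N], next[VIT_N];
    int psi[VIT_T_MAX][VIT_N];
    for (int i = 0; i < VIT_N; i++)
        delta[i] = pi[i] * B[i][obs[0]];
    for (int t = 1; t < T; t++) {
        for (int j = 0; j < VIT_N; j++) {
            double best = -1.0;
            int arg = 0;
            for (int i = 0; i < VIT_N; i++) {
                double v = delta[i] * A[i][j];
                if (v > best) { best = v; arg = i; }
            }
            next[j] = best * B[j][obs[t]];
            psi[t][j] = arg;
        }
        for (int j = 0; j < VIT_N; j++)
            delta[j] = next[j];
    }
    double best = -1.0;
    int arg = 0;
    for (int i = 0; i < VIT_N; i++)
        if (delta[i] > best) { best = delta[i]; arg = i; }
    path[T - 1] = arg;
    for (int t = T - 1; t > 0; t--)     /* backtrack */
        path[t - 1] = psi[t][path[t]];
    return best;
}
```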
Fig 3.4 Flow chart for HMM-Recognition
[Fig 3.4 flow: sample speech input to be recognized → preprocessing (with codebook input) → find the index of the nearest codebook vector (in a Euclidean sense) for each frame vector → find the probability of the input being digit k = 1 to M, using the trained digit models for all digits and all users → find the model with the maximum probability and the corresponding digit → recognized digit.]
TEXT STORAGE
Our speech-to-text conversion system can send the recognized digit to a PC via the serial, USB, or Ethernet interface for backup or archiving. For our testing, we used a serial cable to connect the PC and the RS-232 port on the DE2 board. The Nios II processor on the DE2 board sends the digital speech data to a PC; a target program running on the PC receives the text and writes it to the disk.
We wrote the PC program in Visual Basic 6 (VB) using a Microsoft serial port control. The VB program must run in the background for the PC to receive the data and write it to the hard disk. The Windows HyperTerminal software or any other RS-232 serial communication receiver could also be used to receive and view the data. The serial port communication runs at 115,200 bits per second (bps) with 8 data bits, 1 stop bit, and no parity. Handshaking signals are not required. The speech-to-text conversion system can also operate as a standalone network device using an Ethernet interface for PC communication and appropriate speech recognition server software designed for the Nios II processor.
Fig 3.5 Flow chart for text storage
[Fig 3.5 flow: start → wait until serial port data is received → ask for a filename → open the file → copy the serial buffer to the file; when the serial port data is over, close the file → stop.]
3.3 WORKING
The microphone input port with the audio codec receives the signal, amplifies it, and converts it into 16-bit PCM digital samples at a sampling rate of 8 kHz, which are transmitted to the Nios II processor in the FPGA through the communication interface.
Generally, a speech signal consists of noise-speech-noise, so the detection of the actual speech in the given samples is important. The speech signal is divided into frames of 450 samples each with an overlap of 300 samples, i.e., two-thirds of a frame length. The speech is separated from the pauses using voice activity detection (VAD) techniques. The received samples are stored into memory on the Altera Development and Education (DE2) development board.
The speech signal consists of the uttered digit along with a pause period and background noise. Preprocessing involves taking the speech samples as input, blocking the samples into frames, and returning a unique pattern for each sample. The system performs speech analysis using the linear predictive coding (LPC) method. From the LPC coefficients we get the weighted cepstral coefficients and cepstral time derivatives, which form the feature vector for a frame. Then, the system performs vector quantization using a vector codebook. The resulting vectors form the observation sequence. For each word in the vocabulary, the system builds a hidden Markov model (HMM) and trains the model during the training phase. The training steps, from VAD to HMM model building, are performed using PC-based C programs. The resulting HMM models are then loaded onto the FPGA for the recognition phase.
In the recognition phase, the speech is acquired dynamically from the microphone through a codec and is stored in the FPGA's memory. These speech samples are preprocessed, and the probability of getting the observation sequence for each model is calculated. The uttered word is recognized based on a maximum likelihood estimation.
3.4 SOFTWARE AND HARDWARE USED
- Sound recorder
- MATLAB version 6
- C (Dev-C++ with gcc and gdb)
- Quartus II version 5.1
- SOPC Builder version 5.1
- Nios II processor
- Nios II IDE version 5.1
- MegaCore IP library
- Altera Development and Education (DE2) board
- Microphone and headset
- RS-232 cable
CHAPTER 4
APPLICATIONS
- Interactive voice response systems (IVRS)
- Voice dialing in mobile phones and telephones
- Hands-free dialing in wireless Bluetooth headsets
- PIN and numeric password entry modules
- Automated teller machines (ATMs)
CHAPTER 5
REFERENCES
1. Topic taken from seminartopics.co.in/ece-seminar-topics/
2. Garg, Mohit. "Linear Prediction Algorithms." Indian Institute of Technology, Bombay, India, Apr. 2003.
3. Li, Gongjun, and Taiyi Huang. "An Improved Training Algorithm in HMM-Based Speech Recognition." National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing.
4. Altera Nios II documentation.