A11

3062

University of Pittsburgh

Swanson School of Engineering March 7, 2013

DEVELOPMENTS IN VOICE RECOGNITION TECHNOLOGY

Ronald Buttermore ([email protected], 0012 Bon 6:00), Naseem Lee-Perkins ([email protected], 0012 Budny 10:00)

Abstract – Voice recognition software is “the process of

taking the spoken word as an input to a computer program”

[1]. Through this software, the goal is to be able to control

a computer efficiently and intuitively [2]. Voice recognition,

often only thought of as a way to command a cell phone, has

many beneficial applications which can be expanded once

improved to a more user friendly and accurate product.

In this paper, the objectives will be to explain and

analyze the importance of voice recognition and its

applications through detail of research and development of

voice and speech software, designs and implementations in

current and future technologies, and existing problems with

technologies and possible solutions. We will also detail the

impact of voice recognition’s integration into engineering

and society.

Voice recognition software and its improvement, in ease

of use and accuracy, could benefit the lives of many people,

as the applications of the technology would increase. In

addition, these applications have great potential with

respect to sustainability and considering the needs of future

generations along with meeting our own. These applications

work towards benefiting the consumer and corporations

alike. Voice recognition can allow users to go hands free

and result in faster interactions. Companies that utilize the

technologies may experience more productive workers, with

higher output. Ultimately voice recognition creates a more

appealing product which should generate new sales not only

for companies making the software but also for companies

using the software.

To achieve our objective of displaying the importance of

voice recognition technologies we will start by describing

the process about how speech recognition works from a

technical standpoint. We will be thoroughly outlining all

pros and cons of the development of this software. Finally,

we will analyze the impact of this technology on society. In

conclusion, this paper will highlight the benefits of voice and

speech recognition technologies by analyzing current

products, how voice recognition works, and its societal and

ethical impact.

Key Words – Acoustic Model, HMM model, Language

Model, Linguistic Interpretations, User Interface, Voice

Recognition

VOICE RECOGNITION SYSTEMS

Voice recognition systems are electronic systems that

help electronic devices respond to speech. Voice recognition

systems have many uses but with current flaws voice

recognition systems do not reach their full potential. This

paper will focus on the critique of voice recognition systems

and possible improvements to be made to the current

software, which will increase the uses and benefits of the

software.

We will begin by explaining how voice recognition

works. We will include hardware requirements and

descriptions of different techniques used by different

software to carry out these models.

A description of multiple uses by many different

individuals will be included to explain the importance and

benefits of voice recognition software. These uses will prove

the importance of voice recognition software and illustrate

why flaws and the improvement of flaws are necessary.

However, there are already proposed solutions for these

flaws that will increase the effectiveness of the systems

which will benefit many people. Recent developments in

modeling speech through improving recognition models have

helped voice recognition move forward.

These recent improvements have opened up possibilities

of new and reworked technologies that apply voice

recognition software. These technologies have the potential

to benefit and help progress society and its processes,

specifically through the sustainable increase in quality of

life.

WHAT IS VOICE RECOGNITION AND

HOW DOES IT WORK?

A voice recognition system translates spoken word to

digital signals that are processed to perform specific tasks.

However, voice recognition systems have requirements for

hardware and software in order to undergo this process.

Requirements for Voice Recognition Software

Basic voice recognition systems require a 200 megahertz

Pentium processor, a minimum of 64 megabytes of RAM, a

basic microphone, and a minimum of a 16-bit sound card

[2]. Although most voice recognition software will run on

this minimum hardware, improvements in these components can

make a dramatic difference in its performance. Upgrading

the RAM, processor, and sound card as well as

investing in a better microphone can improve

the voice recognition system [2]. Along

with these hardware requirements, voice recognition requires

the use of software to collect, analyze, and interpret the data.

Different software approaches this process in different ways.

Acoustic and Language Models

The acoustic and language models are the basic

processes in which sound is taken from the microphone and

processed through the computer.

In the acoustic model the voice is analyzed. After the

user speaks into the microphone, background noise and

unnecessary changes in volume are removed. Mathematical

calculations are used to take the voice and convert it to a

range of frequencies, which correspond to pitches in sound.

The data is then analyzed and converted to digital

representations of phonemes, which are the basic sounds of

language [2].
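
The conversion from a recorded signal to a range of frequencies can be sketched as follows. This is a minimal illustration, assuming a naive discrete Fourier transform written with the Python standard library; the function names, the 1600 Hz sample rate, and the 200 Hz test tone are hypothetical and do not come from any product described in [2].

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each non-negative frequency bin of a naive DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):  # a real signal only needs bins 0..n/2
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest bin, ignoring the DC bin."""
    magnitudes = dft_magnitudes(samples)
    strongest = max(range(1, len(magnitudes)), key=lambda i: magnitudes[i])
    return strongest * sample_rate / len(samples)


# Hypothetical input: a pure 200 Hz tone sampled at 1600 Hz for 64 samples.
SAMPLE_RATE = 1600
tone = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(64)]
peak_hz = dominant_frequency(tone, SAMPLE_RATE)
print(peak_hz)
```

A real acoustic model would apply this kind of transform to short overlapping frames of speech and would use a fast Fourier transform rather than the quadratic-time loop shown here.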

Next, the language model analyzes the content of speech.

The language model compares the combinations of

phonemes to words in a dictionary, the database of the most

common words in the English language. Dictionaries, or

databases, of individual software may differ based on how

each developer designs the language model process. Once

the language model decides which word was spoken, the

digital signal of the word is ready to be displayed or carry

out its designated task [2].
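
The dictionary comparison described above can be sketched as a simple lookup. The phoneme symbols, the tiny dictionary, and the usage counts below are invented for illustration; a real language model would hold many thousands of entries and a richer scoring scheme.

```python
# Hypothetical pronunciation dictionary: each phoneme sequence maps to the
# words that share that pronunciation, with an invented usage count per word.
PRONUNCIATIONS = {
    ("HH", "EH", "L", "OW"): {"hello": 120},
    ("T", "UW"): {"to": 900, "too": 150, "two": 200},
    ("K", "AE", "T"): {"cat": 80},
}


def decode_word(phonemes):
    """Return the most frequent word for a phoneme sequence, or None if unknown."""
    candidates = PRONUNCIATIONS.get(tuple(phonemes))
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(decode_word(["HH", "EH", "L", "OW"]))
print(decode_word(["T", "UW"]))
```

Because the sketch always returns the most frequent candidate, the sequence shared by "to", "too", and "two" always decodes to the same word, whatever the speaker intended.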

While acoustic and language models are the basic ways

voice recognition systems work, each system has a different

way of approaching these systems. The design of software

by each developer creates different techniques for voice

recognition systems.

Template Matching

Template matching is a technique that uses user input

and is referred to as a user dependent method. It has the

highest accuracy, at 98%, but in turn has the most limitations

[3]. Template matching starts with asking the user to speak

several words or terms into a microphone. The words are

repeated multiple times and a statistical average of the

sample words is stored and used as a template [3]. Other

programs may come already loaded with digital voice

sample templates. Once the user speaks into the microphone

the electrical signal is converted to a digital signal using an

analog to digital converter and is stored in the memory. The

analog to digital converter takes a physical quantity and

converts it to a digital number [2]. Digital signals are

compared to those of the templates, and meaning is deduced

producing a word [3]. Some software also uses

pre-programmed rules to help find meaning [4].
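
The training and comparison steps of template matching can be sketched as follows. We assume, for simplicity, that every utterance has already been reduced to a fixed-length list of numbers; the feature values and the two-word vocabulary are hypothetical, and real systems must also align utterances of different lengths.

```python
import math


def average_template(repetitions):
    """Average several feature vectors of the same word into one template."""
    count = len(repetitions)
    return [sum(column) / count for column in zip(*repetitions)]


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates):
    """Return the word whose stored template is closest to the input."""
    return min(templates, key=lambda word: distance(features, templates[word]))


# Hypothetical training data: the user repeats each word three times.
training = {
    "yes": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1], [1.1, 0.1, 0.1]],
    "no": [[0.1, 0.9, 0.8], [0.2, 1.0, 0.7], [0.0, 1.1, 0.9]],
}
templates = {word: average_template(reps) for word, reps in training.items()}
result = recognize([0.95, 0.25, 0.15], templates)
print(result)
```

The sketch also shows why the method is user dependent: the templates encode one speaker's samples, so a different voice may fall closer to the wrong template.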

Feature Analysis

Feature Analysis is the speaker independent technique of

voice recognition software. It processes voice input using

Fourier Transforms or Linear Predictive Coding. Fourier

Transforms take a “generic function”, based on a quantity

like time, and convert it to another function based on a

physical property like frequency [5]. Fragniere, van Schaik,

and Vittoz describe the objective of Linear Predictive

Coding as a way “to predict the current value of the signal

using a linear combination of previous samples, each

weighted by a coefficient [6].” These two processes mimic

digital signals of real voice and the software attempts to find

characteristic similarities between expected inputs and actual

digitized voice input [3]. The similarities produce results that

correspond to a word. These similarities are present for a

wide range of speakers, so no user training is required. Once

a more efficient speaker independent software, like feature

analysis, is developed the system would be able to account

for accents, speed of delivery, pitch, volume, and inflection

[3].
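
The idea behind Linear Predictive Coding quoted above, predicting the current sample from a weighted sum of previous samples, can be sketched as follows. The coefficients here are not estimated from speech: we use the known fact that a pure sinusoid of angular frequency w satisfies x[n] = 2cos(w)x[n-1] - x[n-2], so two coefficients predict it exactly. The signal and function names are hypothetical.

```python
import math


def lpc_predict(history, coefficients):
    """Predict the next sample as sum of a_k * x[n-k], where history[-1] is x[n-1]."""
    return sum(a * history[-k] for k, a in enumerate(coefficients, start=1))


def prediction_errors(signal, coefficients):
    """Return actual minus predicted for every sample that has enough history."""
    order = len(coefficients)
    return [
        signal[n] - lpc_predict(signal[:n], coefficients)
        for n in range(order, len(signal))
    ]


# Hypothetical signal: a pure tone, which a two-coefficient predictor fits exactly.
omega = 2 * math.pi * 0.05
signal = [math.sin(omega * n) for n in range(40)]
coefficients = [2 * math.cos(omega), -1.0]
max_error = max(abs(e) for e in prediction_errors(signal, coefficients))
print(max_error)
```

For real speech the coefficients are fitted to each short frame so that the prediction error is small, and the fitted coefficients themselves serve as the features that are compared.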

THE APPLICATIONS AND

IMPORTANCE OF VOICE RECOGNITION

SOFTWARE

Voice recognition can have many applications, which

can benefit the lives of many people. Uses of voice

recognition software include learning tools, corporate and

consumer uses, helping people with disabilities, and

government use.

Learning Uses and Benefits

Learning can also benefit from the use of voice

recognition software. It can be used to improve a user’s first

language addressing issues like pronunciation which can be

affected by ethnicity, social class, and education [7].

Because voice recognition systems are often looking for

traditional pronunciation of a word, error messages can alert

users to incorrect pronunciation. With repeated use of voice

recognition software along with error messages, users can

learn and practice to correctly pronounce words. A similar

idea can help users learn a second language as software can

be programmed to mimic accents of other languages [7].

Again with practice and error analysis, voice recognition

software can shape a user’s pronunciation and delivery of a

non-native language. Voice recognition can also be used to

help individuals with learning disabilities like dyslexia, as it

allows them to easily use a computer and encourages writing

[8].

Voice recognition software can help improve the

education level of many individuals through helping with

speech problems and learning disorders, and as a tool to

teach a second language. Also through aiding the learning

process of a second language, voice recognition systems

help break down cultural barriers, letting users connect with

people without a language barrier. This is an increasingly

important aspect of education as the world becomes more

and more diverse and it is important to be able to

communicate and connect with people from different areas

of the world. The importance of education and learning is a

basic need that could be improved by voice recognition

technology, which would also potentially improve the

quality of life for many people.

Consumer Uses and Benefits

Voice recognition software can be utilized by the

everyday consumer. Consumers can use voice recognition

software to dictate emails, navigate applications on electrical

devices, create documents, and search the web [9]. Voice

recognition systems also help people with disabilities.

People with speech disorders and people who are deaf or

hard of hearing can benefit from voice recognition software

[10]. Voice recognition systems can help facilitate faster

interactions between the deaf or hard of hearing with others,

especially those who do not know sign language, as speech

can be displayed as it is spoken. Voice recognition systems

can also help people with motor problems, physical

disabilities, and learning disabilities because it would allow

them to use electronic devices without having to use their

hands [8].

Voice recognition allows for hands free use of computers

and electric devices. This allows for faster interactions with

electric devices as options are no longer limited to screen

size and there is no need to thumb through pages of options.

Voice recognition also saves the consumer time when typing

is involved because words can often be dictated faster than

they can be typed.

With new laws that require drivers to use electronic

devices hands free, voice recognition systems are now more

useful in phones and other devices. The ability to control

these devices without the use of hands makes roads safer for

drivers and pedestrians. With safer roads, increased

productivity, and aid towards the disabled, voice recognition

software has obvious benefits towards the improvement of

quality of life.

Corporate Uses and Benefits

Many different professions can use voice recognition

systems to help improve productivity. Voice recognition can

help reporting, healthcare systems, and call centers.

Healthcare systems often use voice recognition software as a

way to quickly enter electronic health records [11]. In call

centers, voice recognition allows callers to complete different activities or

transactions without waiting to talk to a live representative.

Careers in reporting often require quick typing; however

with the use of voice recognition systems words can be more

easily recorded and processed.

The main benefit of voice recognition software for

companies is increased productivity. In healthcare, workers

can enter information faster. Call centers don’t have to be

flooded with calls, as computers can process human input

and deliver results, freeing up workers to complete other

tasks. Reporters no longer have to worry about quick typing

and shortening words, with the risk of error and later

misunderstanding, as speech can be recorded and processed

as it is spoken. Implementing voice recognition software into

companies can thus improve a company’s productivity.

Government Uses and Benefits

Voice recognition software can be implemented to help

the government. Many different government agencies use

voice recognition including law enforcement, legal

departments, and case working [6]. These agencies may use

voice recognition software in much of the way that corporate

companies do. Employees can write quicker, easily and

quickly search documents, and quickly navigate the internet

and computers.

However, the government can also use voice recognition

software in the military. The most common use of voice

recognition software in the military is in “command and

control” which allows control of a device or a system

through spoken commands. The military also uses voice

recognition software for speaker detection, and speaker

verification [12]. This allows the military to detect when

someone is speaking and confirm the identity of that person.

Government agencies, like companies, also

experience increased productivity from voice recognition

software. However, since government agencies

tend to deal with more important matters, the productivity

experience is more beneficial to the public. Military uses of

voice recognition present the public with benefits in safety

since voice recognition systems can improve and aid

operations. Implementing voice recognition systems in

“command and control” helps keep soldiers safe, as they no

longer are required to enter some of the dangerous situations

where some of the control devices must go.

However, the uses and benefits of voice recognition systems

are currently limited because of design flaws in certain

software, which processes and analyzes the voice input, and in

hardware, which has yet to make full use of voice

recognition technology.

CURRENT FLAWS AND LIMITATIONS

While voice recognition software does work, there are

some flaws and limitations to the current software that can

be improved in order to make the systems more user-friendly

and effective.

One of the current flaws in some of the voice recognition

software is the use of word systems. Most word systems are

discrete or connected. This means they can only process one

word or short phrases, respectively. This system requires

users to use clear articulate speech, making sure not to slur

their words. Users must be especially careful not to slur the

ends of their words in connected word systems where it is

easy to slur the end of one word into the beginning of the

other [3]. This creates a problem with users as speech

becomes unnatural. These word systems present problems

for some uses of voice recognition software. One example of

this issue is in reporting where voice recognition software is

needed to rapidly record speech. However, speech may not

always be correctly pronounced or clearly articulated. While

the discrete and connected word systems are the easiest to

implement, there is a third type of word system that can be

used, the continuous speech system. Unfortunately

continuous speech systems are the hardest to implement.

Another problem with voice recognition systems is

homophones [2]. Homophones, words that sound the same

but have different meanings, are difficult to distinguish in

voice recognition systems. Because voice recognition

systems match the patterns of the sounds of phonemes with

words in a dictionary, homophones present multiple choices

in the system and the correct word is not always selected.

While some voice recognition systems use statistical models

to select the most probable word, this does not always

present the correct word. Other systems attempt to use

trigrams, which analyze the context to decide which word to

use. However, if the system is also operating with a discrete

word system, which is only a single word, there isn’t any

context to help analyze the meaning of the word.
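
The use of trigrams to choose between homophones can be sketched as follows. The miniature corpus and the function names are hypothetical; a real system would estimate trigram counts from a very large body of text and smooth them.

```python
from collections import Counter

# Hypothetical miniature corpus standing in for a large training text.
corpus = (
    "i want to go to the store . i have two dogs . "
    "that is too much . we have two cats ."
).split()


def build_trigram_counts(tokens):
    """Count every run of three consecutive tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def pick_homophone(previous_two, candidates, counts):
    """Choose the candidate seen most often after the two preceding words."""
    first, second = previous_two
    return max(candidates, key=lambda word: counts[(first, second, word)])


counts = build_trigram_counts(corpus)
homophones = ["to", "too", "two"]
print(pick_homophone(("i", "have"), homophones, counts))
print(pick_homophone(("i", "want"), homophones, counts))
```

When no preceding words are available, as in a discrete word system, every candidate has a count of zero and the function simply returns the first one in the list, which illustrates the limitation described above.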

User dependent systems also present multiple problems

for voice recognition systems [2]. Because user dependent

systems need to be adjusted by the user in order to become

familiar with an accent or speech pattern, they are not

practical for widespread use. This presents problems for

professional uses like reporting or call-centers, where a wide

range of accents, dialects, and speech patterns need to be

understood.

Not only would the flaws keep companies from

effectively completing tasks but in cases like a call center,

where there is client interaction, clients may grow frustrated

with the process and company. These flaws can also hinder

learning efforts and keep individuals with disabilities from

easily using electronic devices which could be detrimental in

a world which is becoming more reliant on electrical

devices.

IMPROVEMENT SOLUTIONS

Overcoming Flaws

The flaws discussed above provide software and

hardware developers a challenge to overcome language

interpretation barriers such as speech variations among

different users, innate language problems like homophones,

and user dependent systems. These problems are being

worked on by companies and individuals in order to

optimize the efficiency and accuracy of voice recognition

systems and improve technologies that are beneficial in

various practical applications. Solutions that are currently

being developed and implemented are software designs such

as the Neural Network Algorithm, Hidden Markov Model,

the MFCC algorithm, and combinations of multiple models

and algorithms. These solutions have beneficial applications

in areas such as security, convenience and accessibility for

the handicapped, emotion recognition, aiding scientific

research, and performing everyday tasks.

Modeling Speech

While speech recognition rates are low compared to a

technology like image recognition, research and

development of this software has of late grown steadily due

to growing expectations. The main challenge in designing

voice recognition software is not only analyzing frequency

and change in frequency of the voice, but also to attempt to

replicate with an algorithm how the brain recognizes

phonemes and syllables in speech. Two different main

models have been proposed and studied in order to extract

the speech signals from a voice and analyze their feature

parameters to have the computer recognize the speech input.

One model is based on a probabilistic approach algorithm,

while the other is an analytical algorithm. The Neural

Network (NN) recognition algorithm, mainly analytical, uses

a large coefficient matrix to match feature parameters of

syllables and words to an output index. This approach is not very

efficient, however: since every individual speech signal

must be run through the algorithm in order to be matched,

the algorithm must start from the beginning with every new

speech input, which results in a slow rate of

recognition. A more probabilistic approach such as the

Hidden Markov Model (HMM) has proven to be faster with

recognition when faced with a larger number of speech

samples. The HMM attempts to improve recognition rates

by overcoming challenges such as rate of speech, ambient

noise, and varying voices by more successfully extracting

unique feature parameters of audio signals generated by the

voice [13].

Hidden Markov Model

The Hidden Markov Model selects quantitative

parameters in order to identify the unique features of speech.

An algorithm named the Mel-Frequency Cepstral

Coefficients (MFCC) algorithm is most commonly used to

identify specific parameters from a voice signal, which the

HMM uses to isolate further parameters. The MFCC is a

way of representing sound as a cepstrum (power spectrum

on a non-linear mel scale of frequency) by transforming the

cepstrum into a coefficient matrix. The matrix is a

representation of the syllable or word, which the HMM uses

to match to an initial database using various matrix

transformations to analyze based on a sequential

probabilistic model. As new voice and speech inputs are

transformed into MFCC matrices for different speakers and

syllables/words and inputted into the HMM, the HMM’s

initial probability is changed due to the variation of these

conditions. This transition probability is combined with the

initial probability based on the difference in number of states

and events in both transformations of MFCC matrices to

form a resulting observational probability. The three

probabilities (initial, transition, and observation) are then

used as variables to compare a test word or phrase to a

pre-trained test word or phrase. Improvements on these three

variables are taken into account by the HMM when

comparing inputs. Applications of the HMM have been

tested, and research has found that when the number of

sample inputs increases, the recognition rate increases,

showing the HMM to be progressively more accurate.

A Mel-frequency index increase also slightly increases

accuracy, but slows processing time [13].

HMM ALGORITHM RECOGNITION RATE [13]

The rate of recognition of the HMM algorithm based on

various factors
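
How initial, transition, and observation probabilities combine to score an input against a trained word can be sketched with the standard forward algorithm. This is a generic textbook illustration and not the specific procedure of [13]: the two-state models, the probabilities, and the symbols "lo" and "hi" (standing in for quantized MFCC vectors) are all hypothetical.

```python
def forward_likelihood(observations, model):
    """Return P(observations | model) using the forward algorithm."""
    states = model["states"]
    initial = model["initial"]
    transition = model["transition"]
    emission = model["emission"]
    # Initialisation: start in each state and emit the first symbol.
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    # Induction: move between states, then emit the next symbol.
    for symbol in observations[1:]:
        alpha = {
            s: emission[s][symbol]
            * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def make_model(emission):
    """Build a hypothetical left-to-right two-state word model."""
    return {
        "states": ["A", "B"],
        "initial": {"A": 1.0, "B": 0.0},
        "transition": {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.0, "B": 1.0}},
        "emission": emission,
    }


WORD_MODELS = {
    "yes": make_model({"A": {"lo": 0.9, "hi": 0.1}, "B": {"lo": 0.2, "hi": 0.8}}),
    "no": make_model({"A": {"lo": 0.1, "hi": 0.9}, "B": {"lo": 0.8, "hi": 0.2}}),
}

observed = ["lo", "lo", "hi"]
scores = {word: forward_likelihood(observed, m) for word, m in WORD_MODELS.items()}
best_word = max(scores, key=scores.get)
print(best_word, scores)
```

Recognition then amounts to keeping one model per word or syllable and reporting the model that gives the input the highest likelihood.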

MFCC Improvements

While the HMM focuses on interpreting user speech as

language, a solution that attempts to more accurately

recognize the speaker’s identity is a robust computer voice

recognition improvement of the MFCC algorithm based on

an individual user. This improvement tests a user’s voice

patterns and utilizes an improved MFCC algorithm to

produce a faster, more accurate result for voice recognition.

It attempts to achieve text-independent speaker

identification, or the identification of a user by their voice

without regard to the user’s specific language or words.

The modifications to the original MFCC include slightly

changing steps of the MFCC algorithm that convert the

sound into a MFCC matrix. After the speech signal is first

blocked into frames, it is windowed in order to minimize

error in signal discontinuities. Using the function of a

Kaiser window instead of a traditional Hamming window is

an improvement in that it minimizes the mean square error

instead of maximum error. A Fast Fourier Transform is

used to convert the windowed frames from the time domain to

the frequency domain; the result is then filtered using the Mel scale,

which attempts to emulate the non-linear way the brain processes

different frequencies of speech. Modifying the

way the Fast Fourier Transform is applied over a Mel

filter has been shown to reduce computing costs and make the

algorithm more efficient. Another suggested feature

matching technique (similar to HMM) is called Vector

Quantization. In this method, vectors assigned to MFCC are

mapped to a large area of space instead of matrices used in

HMM. Each small region within is referred to as a cluster,

with multiple centers of clusters (codewords) comprising a

codebook. To feature match, input MFCC vectors are

compared to clustered MFCC initial training vectors.

Research has shown these modifications to the MFCC

technique reduce the run time of the algorithm (from 0.12 ms

to 0.11 ms) and increase the accuracy (from 66% to 80%) in

a sample of 50 tests [14].
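
The first steps of the MFCC front end described above (blocking into frames, windowing, and the Mel scale) can be sketched as follows. This is not the modified algorithm of [14]: the frame length, hop size, and Kaiser beta value are hypothetical, the Mel conversion uses one commonly quoted formula, and the Fourier transform, filter bank, and cepstral steps are omitted.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels with the common 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def frame_signal(samples, frame_len, hop):
    """Block a signal into overlapping frames of frame_len samples."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming_window(n):
    """Traditional Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= (x / 2) / k  # term now equals (x/2)^k / k!
        total += term * term
    return total


def kaiser_window(n, beta):
    """Kaiser window of length n; beta controls how sharply the edges taper."""
    denom = bessel_i0(beta)
    window = []
    for i in range(n):
        ratio = 2 * i / (n - 1) - 1  # runs from -1 to 1 across the frame
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1 - ratio * ratio))) / denom)
    return window


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]


# Hypothetical signal and framing parameters.
signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(100)]
frames = frame_signal(signal, frame_len=25, hop=10)
kaiser = kaiser_window(25, beta=5.0)
windowed = [apply_window(frame, kaiser) for frame in frames]
print(len(frames), round(hz_to_mel(1000.0), 1))
```

Both windows taper each frame toward zero at its edges, which is what reduces the error caused by signal discontinuities at frame boundaries.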

Combining Methods

Various methods to obtain faster, more accurate speech

recognition, such as the ones previously detailed, all have

their own strengths and weaknesses. A solution proposed in

order to eliminate weaknesses in different approaches to

improving voice recognition and interpretation is to combine

multiple methods. One of these combinations is using the

HMM with Prediction by Partial Matching (PPM) method.

PPM is a technique that uses statistical analysis based on

context to predict the next character in a series. When

combined, a new system is created that uses the voice and

word recognition strengths of HMM with the prediction

matching of PPM to greatly improve input processing and

recognition [15]. Combinations of multiple proposed

solutions could be the key to the goal of increasingly

accurate voice recognition.
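
The context-based prediction that PPM contributes can be sketched in a simplified form. Full PPM blends several context lengths using escape probabilities; the sketch below only backs off from the longest known context to shorter ones, and the training text and function names are hypothetical. It is not the combined HMM and PPM system of [15].

```python
from collections import Counter, defaultdict


def train_contexts(text, max_order):
    """Count which character follows every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i in range(len(text)):
        for order in range(max_order + 1):
            if i - order >= 0:
                counts[text[i - order:i]][text[i]] += 1
    return counts


def predict_next(history, counts, max_order):
    """Predict the next character from the longest context seen in training."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None


# Hypothetical training text.
MAX_ORDER = 2
counts = train_contexts("the cat sat on the mat", MAX_ORDER)
print(predict_next("th", counts, MAX_ORDER))
print(predict_next(" c", counts, MAX_ORDER))
```

In a combined system, predictions of this kind could be used to re-rank the candidates that the HMM proposes, favoring those that fit the preceding text.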

BENEFITS OF IMPROVING VOICE

RECOGNITION SYSTEMS

Practical Applications

Improved voice recognition technology is a further step

towards beneficial applications of the software in many

fields and aspects of life. These aspects can be applied in

consumer, corporate, industrial, and scientific fields in

various ways such as security, automation systems, ease of

use increases (convenience), and aiding other voice based

tasks. As the reliability and integration of this technology

increase, the number of practical uses we find will also

increase.

Security

One of the benefits provided by the improvement of

voice recognition software is the major application it has in

security. The potential for vocal based security systems

gives an added layer of protection in an era of information.

Speech recognition makes it possible to verify a person’s

identity through voice interfaces such as the phone and

computer microphone. This gives a person access to various

control and security actions such as controlling a mobile

device, voice banking, voice mail, voice activated security

control over certain devices, and remote access to a

computer [14]. Biometric means of controlling computer

systems help prevent fraud and theft based on an

individual’s unique physical characteristics [15].

Ease of Use and Handicap Accessibility

Another benefit of voice recognition systems is that they make computer systems easier to control. This is probably one of the most useful applications of voice recognition software, as it provides a means of control based on one of a person’s most basic and instinctive processes: speech. Not only does this make communication and control easier, it also gives people who have difficulty operating computers, or even basic machines, a way to avoid controlling systems physically. One home automation system was developed to assist people with disabilities and the elderly with basic household tasks they may be unable to do themselves. By combining speech recognition software with wireless networks, it allows the user to control most electronic devices by voice, achieving 79.8% accuracy with 1225 commands [16].

Sustainability: Looking Forward

As our use of technology grows and advances, more and more emphasis is being placed on the sustainability of our current systems and lifestyles, whether environmental or societal. There is a growing awareness of the challenges we must overcome in order to sustain an equilibrium between meeting our present needs and ensuring that we leave behind a world where future generations can live comfortably. While a discussion of environmental impact is less meaningful for computer technologies and software than for most other engineering topics, voice recognition technology presents advantages that offer flexibility in sustaining and improving quality of life. As voice recognition technologies improve and unfold into new applications, they become a tool that could be used for quality of life and sustainability enhancements.

Sustainability with regard to quality of life is an important factor to consider when weighing the importance of a technology such as voice recognition software. Voice recognition software provides clear improvements to quality of life. One example is its direct application for people with disabilities. Certain functionalities of the technologies involved would allow people with disabilities to become more independent. Less reliance on other people creates a more enjoyable, free experience in which they can find more meaningful interactions and come closer to living a typical lifestyle. Even for people without disabilities in a home or work environment, less work and effort has been shown to reduce stress, so voice recognition technology increases quality of life wherever it cuts down on manual labor.

Along with creating more significant experiences for people through its simple and intuitive nature, voice recognition technology could also help improve quality of life through applications that make the world a safer place. Governments would have a better way to operate smoothly and protect their citizens through voice verification systems and secure identity techniques. Even smaller impacts, such as hands-free control of a mobile device in a motor vehicle, could help reduce accidents and create safer roads. Eliminating distractions by making it possible to control electronics by voice could make the world safer and improve quality of life.

CONCLUSION: FUTURE DEVELOPMENTS OF VOICE RECOGNITION SYSTEMS

In conclusion, voice recognition software, its improvements, and its technological applications have the potential to positively affect the future of human society. Its useful applications in security, productivity, and efficiency provide concrete reasons why the development, improvement, and integration of this technology are so important. From consumer uses to government uses, voice recognition is an important technology that has the potential to benefit the lives of many people and to sustainably improve the quality of life for future generations.

REFERENCES

[1] (2009, April 30). “Voice Recognition Systems.” US Food and Drug Administration. http://www.fda.gov/ICECI/Inspections/InspectionGuides/InspectionTechnicalGuides/ucm093579.htm

[2] S. Miastkowski. (2000, April 14). “How It Works: Speech Recognition.” PC World. (Online Article). http://www.pcworld.com/article/16276/article.html

[3] J. Baumann. “Voice Recognition.” Human Interface Technology Lab. (Online Article). http://www.hitl.washington.edu/scivw/EVE/index.html

[4] D. Borghino. (2012, August 24). “New approach promises more accurate speech recognition software.” Gizmag. (Online Article). http://www.gizmag.com/speech-recognition-ntnu/23870/

[5] T. Tao. “Fourier Transforms.” Department of Mathematics, UCLA. http://www.math.ucla.edu/~tao/preprints/fourier.pdf

[6] E. Fragniere, A. van Schaik, E. Vittoz. “Linear Predictive Coding of Speech Using an Analogue Cochlear Model.” Swiss Federal Institute of Technology. (Online Article). http://goo.gl/sA5nB

[7] A. Neri, C. Cucchiarini, W. Strik. “Automatic Speech Recognition for Second Language Learning: How and Why it Actually Works.” Department of Language and Speech, University of Nijmegen. (Online Article). http://goo.gl/XiN80

[8] “Speech Recognition For Learning.” Brainline.org. (Online Article). http://www.brainline.org/content/2010/12/speech-recognition-for-learning_pageall.html

[9] (2008, August). “Dragon NaturallySpeaking Professional for Government.” Nuance Communications. (Online Article). http://www.nuance.com/ucmprod/groups/dragon/documents/webasset/nd_004913.pdf

[10] R. Hoyt. (2010, January 1). “Lessons Learned from Implementation of Voice Recognition for Documentation in the Military Electronic Health Record System.” National Center for Biotechnology Information. (Online Article). http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2805557/

[11] (2005, December). “Use of Speech and Language Technology in Military Environments.” NATO Research and Technology Organisation. (Online Article). http://www.stephanepigeon.com/Docs/TR-IST-037-ALL.pdf

[14] S. Jarng. (2011). “HMM Voice Recognition Algorithm Coding.” IEEE. (Online Article). http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5772321

[15] C. Leon. (2009). “Robust Computer Voice Recognition Using Improved MFCC Algorithm.” IEEE. (Online Article). http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5260824

[16] B. Wang, J. Zhang. (2010). “A novel voice recognition model based on HMM and fuzzy PPM.” IEEE. (Online Article). http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5656855

ADDITIONAL SOURCES

AlShu’eili, H. "Voice Recognition Based Wireless Home Automation System." IEEE Xplore. IEEE, 2011. Web. <http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=>.

Aziz, A. "Security System Using Biometric Technology: Design and Implementation of Voice Recognition System (VRS)." IEEE Xplore. IEEE, 2008. Web. <http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=>.

ACKNOWLEDGMENTS

Thanks go to the Swanson School of Engineering, the engineering professors and writing instructors, the library, coffee, and the Internet.