7/31/2019 VOICERSFINAL
1/74
VOICE RESPONSE SYSTEM
ABSTRACT
A voice response system is a computer system that responds to voice commands, rather
than input from a keystroke or a mouse. Uses for this kind of system range from convenience to
necessity to security. People who are visually or otherwise physically impaired are prime
candidates for a voice response system. Because they cannot see or otherwise access a keyboard
or mouse, they have no way to access a computer without a voice response system, unless they
want to depend entirely on other people. Being able literally to tell a computer what to do may be
a revelation for someone who ordinarily has little hope of controlling a computer. A voice
response system would also come in handy for someone who is not physically impaired. With a
voice response system, you wouldn't need to be very close to your computer in order to access it
or give it commands. As long as you are in earshot of the PC, it can use its voice response system
to accept voice commands from you in the same way that it traditionally accepts keystroke and
mouse commands.
The system acquires speech at run time through a microphone and processes the sampled
speech to recognize the uttered text. Sphinx-4 is a speech recognition system written entirely in
the Java(TM) programming language. A VRS is an intelligent system that enables the user to
instruct the computer to perform actions through voice commands, and to form his own repository
of commands and map them to appropriate actions. The recognized text is matched to the
corresponding action.
CONTENTS
1. Introduction
2. Voice recognition
Relevance to the Project
Application of voice recognition
3. Working of the Project
Speech Engine
JSAPI
JSAPI classes and interfaces
Speech Synthesis
Speech Recognition
Components
Speech Recognition Weakness & Flaws
Future of Speech Recognition
JSGF Grammar Format
Sphinx Speech Recognition System
4. Feasibility Study & Requirement Analysis
5. System Analysis & System Design
6. Data flow diagram
Context diagram
Level 1
Level 2
7. Code Snippets
8. Results and Screenshots
9. Discussion
10. Conclusion
11. Bibliography
Chapter 1
INTRODUCTION
A VRS is an intelligent system that enables the user to instruct the computer to
perform actions through voice commands, and to form his own repository of commands and
map them to appropriate actions.
A voice response system is a computer system that responds to voice commands, rather
than input from a keystroke or a mouse. Uses for this kind of system range from convenience to
necessity to security. People who are visually or otherwise physically impaired are prime
candidates for a voice response system. Because they cannot see or otherwise access a keyboard
or mouse, they have no way to access a computer without a voice response system, unless they
want to depend entirely on other people. Being able literally to tell a computer what to do may be
a revelation for someone who ordinarily has little hope of controlling a computer. A voice
response system would also come in handy for someone who is not physically impaired. With a
voice response system, you wouldn't need to be very close to your computer in order to access it
or give it commands. As long as you are in earshot of the PC, it can use its voice response system
to accept voice commands from you in the same way that it traditionally accepts keystroke and
mouse commands.
Key points that outline the implemented idea are:
VRS runs as a background process.
Based on the instruction, multiple processes are created.
While the background process keeps on listening to the user requirements,
independent processes are continuously created in response to the input voice
instruction.
Voice recognition could also be enabled in the processes executed on top, but this has been
avoided because it interferes with the background process.
A VRS library has been built which includes some basic commands:
1. DATA FILE - Opens a list of saved files that may be opened.
2. SONGS - Opens a list of songs that may be played.
3. MOVIES - Opens a list of movies that may be played.
4. NEWS - Reads the news from a given website.
5. SNAP - Opens a picture.
The library may be further extended by the user for his own specific requirements. User.gram
has been included in the src along with directions to add an action map for this purpose.
Technologies used in implementation:
Sphinx 4.
JSAPI.
Java Programming Language.
JSGF grammar files.
The relevance and use of each of the above is discussed later in the document. The
code has been developed in Eclipse. The paths used in mapping actions are absolute and hence
system dependent.
The requirement of this project is to develop an intelligent system which:
1. is capable of taking voice input.
2. interprets the input command.
3. processes the command to map it to the action set.
4. has an action set that contains a mapping of each input to the corresponding response.
5. has an adaptive mechanism to handle more mappings and add them to the action set.
6. Example: the voice input "draw circle" draws a circle on the screen.
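The command-to-action mapping described above can be sketched as a small dispatch table. This is a minimal illustration, not the project's actual code; the action strings ("open-song-list" and so on) are hypothetical stand-ins for the real process launches.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the VRS action set: recognized text is looked up in a map and
// resolved to an action. Command names mirror the VRS library above; the
// action strings are illustrative placeholders.
public class VrsDispatch {
    private static final Map<String, String> ACTIONS = new HashMap<>();
    static {
        ACTIONS.put("data file", "open-file-list");
        ACTIONS.put("songs", "open-song-list");
        ACTIONS.put("movies", "open-movie-list");
        ACTIONS.put("news", "read-news");
        ACTIONS.put("snap", "open-picture");
    }

    // Map recognized text to an action; unknown commands yield "no-op".
    public static String resolve(String recognizedText) {
        return ACTIONS.getOrDefault(recognizedText.trim().toLowerCase(), "no-op");
    }

    public static void main(String[] args) {
        System.out.println(resolve("SONGS"));   // open-song-list
        System.out.println(resolve("weather")); // no-op
    }
}
```

Extending the action set, as requirement 5 asks, amounts to adding a new entry to the map.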
Chapter 2
VOICE RECOGNITION
The term voice recognition is sometimes used to refer to recognition systems that must be
trained to a particular speaker, as is the case for most desktop recognition software.
1. Voice recognition converts speech to text.
2. Recognizing the speaker can simplify the task of translating speech.
3. Voice recognition aims to generalize the task without being tied to a single speaker.
Relevance to the Project
1. Voice recognition is used to map a voice command to its corresponding action. This is
brought about by converting speech to text.
2. The API used for recognizing voice is trained (by default) to understand an American
male accent recorded at 16 kHz.
3. The program matches the input voice with the voice on which it is trained and maps it to the
best possible result.
Although the idea of recognizing voice may seem fairly simple, there are a lot of real-time
problems. Some include:
A large amount of memory is required to store voice files.
Noise interference reduces accuracy.
Comparing our accent with the trained voice often gives rise to absurd results.
The precision of the system is directly proportional to the complexity of the source code.
APPLICATIONS OF VOICE RECOGNITION
Health Care
In the health care domain, even in the wake of improving speech recognition technologies,
medical transcriptionists (MTs) have not yet become obsolete. The services provided may be
redistributed rather than replaced. Speech recognition is used to enable deaf people to understand
the spoken word via speech to text conversion, which is very helpful.
Military
Substantial efforts have been devoted in the last decade to the test and evaluation of speech
recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for
the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16VISTA), the program in
France on installing speech recognition systems on Mirage aircraft, and programs in the UK
dealing with a variety of aircraft platforms. In these programs, speech recognizers have been
operated successfully in fighter aircraft with applications including: setting radio frequencies,
commanding an autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight displays. Generally, only very limited, constrained
vocabularies have been used successfully, and a major effort has been devoted to integration of
the speech recognizer with the avionics system.
Telephony and other domains
ASR in the field of telephony is now commonplace and in the field of computer gaming and
simulation is becoming more widespread. Despite the high level of integration with word
processing in general personal computing, however, ASR in the field of document production
has not seen the expected increases in use. The improvement of mobile processor speeds made
feasible the speech-enabled Symbian and Windows Mobile smartphones. Speech is used mostly
as part of the user interface, for creating pre-defined or custom speech commands. Leading
software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance
Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo
Software (Speereo Voice Translator) and SVOX.
People with disabilities
People with disabilities can benefit from speech recognition programs. Speech recognition is
especially useful for people who have difficulty using their hands, ranging from mild repetitive
stress injuries to involved disabilities that preclude using conventional computer input devices.
In fact, people who used the keyboard a lot and developed RSI became an urgent early market
for speech recognition. Speech recognition is used in deaf telephony, such as voicemail to text,
relay services, and captioned telephone. Individuals with learning disabilities who have problems
with thought-to-paper communication (essentially they think of an idea but it is processed
incorrectly causing it to end up differently on paper) can benefit from the software.
Home Automation
With convenience as the priority, such a program also finds application in home automation.
Home automation may include centralized control of lighting, heating, ventilation, air
conditioning and other systems, to provide improved convenience, comfort, energy efficiency
and security.
Transcription
Transcription in the linguistic sense is the conversion of a representation of language into
another representation of language, usually in the same language but in a different form.
Transcription should not be confused with translation, which in linguistics usually means
converting from one language to another, such as from English to Spanish. The most common
type of transcription is from a spoken-language source into text.
Chapter 3
WORKING OF THE PROJECT
In this chapter, we will cover all the elements required for the working of the project and then
converge the requirements to explain the solution design implemented for the project.
Speech Engine
The speech engine loads a list of words to be recognized. This list of words is called a grammar.
The engine takes as input distinct characteristics of sound derived from the waveform and
compares them with its own acoustic model. The engine searches its acoustic space, using the
grammar to guide this search. It then determines which words in the grammar the audio most
closely matches and returns a result.
(Figure: Speech Engine)
Java Speech API (JSAPI)
The Java Speech API (JSAPI) is an application programming interface for cross-platform
support of command and control recognizers, dictation systems, and speech synthesizers.
Although JSAPI defines an interface only, there are several implementations created by third
parties, for example FreeTTS.
The Java Speech API enables speech applications to interact with speech engines in a
common, standardized, and implementation-independent manner. Speech engines from different
vendors can be accessed using the Java Speech API, as long as they are JSAPI-compliant. With
JSAPI, speech applications can use speech engine functionality such as selecting a specific
language or a voice, as well as any required audio resources. JSAPI provides an API for both
speech synthesis and speech recognition.
Java Speech API classes and interfaces
The different classes and interfaces that form the Java Speech API are grouped into the following
three packages:
javax.speech: Contains classes and interfaces for a generic speech engine.
javax.speech.synthesis: Contains classes and interfaces for speech synthesis.
javax.speech.recognition: Contains classes and interfaces for speech recognition.
The Central class is like a factory class that all Java Speech API applications use. It provides
static methods to enable the access of speech synthesis and speech recognition engines. The
Engine interface encapsulates the generic operations that a Java Speech API-compliant speech
engine should provide for speech applications.
Speech applications can primarily use methods to perform actions such as retrieving the
properties and state of the speech engine and allocating and deallocating resources for a speech
engine. In addition, the Engine interface exposes mechanisms to pause and resume the audio
stream generated or processed by the speech engine. The Engine interface is subclassed by the
Synthesizer and Recognizer interfaces, which define additional speech synthesis and speech
recognition functionality. The Synthesizer interface encapsulates the operations that a Java
Speech API-compliant speech synthesis engine should provide for speech applications.
The Java Speech API is based on the event-handling model of AWT components. Events
generated by the speech engine can be identified and handled as required. There are two ways to
handle speech engine events: through the EngineListener interface or through the EngineAdapter
class.
(Figure: JSAPI stack)
Features:
Converts speech to text.
Converts text to speech and delivers it in various speech formats.
Supports events based on the Java event queue.
Easy-to-implement API that interoperates with multiple Java-based applications like applets
and Swing applications.
Interacts seamlessly with the AWT event queue.
Supports annotations using JSML to improve pronunciation and naturalness in speech.
Supports grammar definitions using JSGF.
Ability to adapt to the language of the speaker.
Two core speech technologies are supported through the Java Speech API: speech
synthesis and speech recognition.
Speech synthesis
Speech synthesis provides the reverse process of producing synthetic speech from text
generated by an application, an applet, or a user. It is often referred to as text-to-speech
technology.
The major steps in producing speech from text are as follows:
Structure analysis: Processes the input text to determine where paragraphs, sentences, and
other structures start and end. For most languages, punctuation and formatting data are used
in this stage.
Text pre-processing: Analyzes the input text for special constructs of the language. In
English, special treatment is required for abbreviations, acronyms, dates, times, numbers,
currency amounts, e-mail addresses, and many other forms. Other languages need special
processing for these forms, and most languages have other specialized requirements.
The remaining steps convert the spoken text to speech:
Text-to-phoneme conversion: Converts each word to phonemes. A phoneme is a basic unit
of sound in a language.
Prosody analysis: Processes the sentence structure, words, and phonemes to determine the
appropriate prosody for the sentence.
Waveform production: Uses the phonemes and prosody information to produce the audio
waveform for each sentence.
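The text pre-processing step described above can be illustrated with a toy expander that rewrites abbreviations and digits into speakable words. This is a sketch only; a real synthesizer's pre-processor covers far more constructs (dates, currency, e-mail addresses), and the expansion table here is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of text pre-processing for speech synthesis: expand a
// few abbreviations and a digit into speakable words. The table is
// illustrative, not a complete normalization rule set.
public class TextPreprocess {
    private static final Map<String, String> EXPANSIONS = new LinkedHashMap<>();
    static {
        EXPANSIONS.put("Dr.", "Doctor");
        EXPANSIONS.put("St.", "Street");
        EXPANSIONS.put("3", "three");
    }

    // Apply each expansion in insertion order to the input text.
    public static String expand(String text) {
        String out = text;
        for (Map.Entry<String, String> e : EXPANSIONS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("Dr. Smith lives at 3 Elm St."));
        // Doctor Smith lives at three Elm Street
    }
}
```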
Speech synthesizers can make errors in any of the processing steps described above.
Human ears are well-tuned to detecting these errors, but careful work by developers can
minimize errors and improve the speech output quality. The Java Speech API and the Java
Speech API Markup Language (JSML) provide many ways for you to improve the output quality
of a speech synthesizer.
Speech Recognition
Speech recognition provides computers with the ability to listen to spoken language and
determine what has been said. In other words, it processes audio input containing speech by
converting it to text.
Speech Recognition System
Components:
With the help of a microphone, audio is input to the system; the PC sound card produces the
equivalent digital representation of the received audio.
Digitization
The process of converting the analog signal into a digital form is known as digitization. It
involves both sampling and quantization. Sampling converts a continuous signal into a discrete
signal, while quantization is the process of approximating a continuous range of values by a
finite set of levels.
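The sampling and quantization steps can be sketched as follows. The sample rate and bit depth here are illustrative choices for the example, not the project's actual settings.

```java
// Minimal sketch of digitization: sampling a continuous sine tone and
// quantizing each sample to one of 2^bits discrete levels.
public class Digitize {
    // Sampling: evaluate the continuous signal at discrete time steps.
    public static int[] sample(double freqHz, double seconds, int rateHz) {
        int n = (int) (seconds * rateHz);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            double t = (double) i / rateHz;
            double amplitude = Math.sin(2 * Math.PI * freqHz * t); // continuous signal
            out[i] = quantize(amplitude, 8);                       // discrete level
        }
        return out;
    }

    // Quantization: map an amplitude in [-1, 1] to an integer level.
    public static int quantize(double amplitude, int bits) {
        int levels = 1 << bits;
        int q = (int) Math.round((amplitude + 1.0) / 2.0 * (levels - 1));
        return Math.max(0, Math.min(levels - 1, q));
    }

    public static void main(String[] args) {
        int[] pcm = sample(440.0, 0.01, 16000); // 10 ms of a 440 Hz tone
        System.out.println(pcm.length);          // 160 samples
    }
}
```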
(Figure: Speech Recognition System)
Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions,
and using software to create statistical representations of the sounds that make up each word. It
is used by a speech recognition engine to recognize speech. The software acoustic model breaks
the words into phonemes.
Language Model
Language modeling is used in many natural language processing applications. In speech
recognition, the language model tries to capture the properties of a language and to predict the
next word in the speech sequence. The software language model compares the phonemes to
words in its built-in dictionary.
Speech engine
The job of the speech recognition engine is to convert the input audio into text; to accomplish
this it uses all sorts of data, software algorithms and statistics. Its first operation is digitization,
as discussed earlier: converting the audio into a format suitable for further processing. Once the
audio signal is in the proper format, the engine searches for the best match by considering the
words it knows; once the signal is recognized, it returns the corresponding text string.
The major steps of a typical speech recognizer are as follows:
Grammar design: Defines the words that may be spoken by a user and the patterns in
which they may be spoken.
Signal processing: Analyzes the spectrum (the frequency) characteristics of the incoming
audio.
Phoneme recognition: Compares the spectrum patterns to the patterns of the phonemes
of the language being recognized.
Word recognition: Compares the sequence of likely phonemes against the words and
patterns of words specified by the active grammars.
Result generation: Provides the application with information about the words the
recognizer has detected in the incoming audio. The result information is always provided
once recognition of a single utterance (often a sentence) is complete, but may also be
provided during the recognition process. The result always indicates the recognizer's best
guess of what a user said, but may also indicate alternative guesses.
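The word-recognition step above can be illustrated with a toy lookup from a phoneme sequence to a word. The ARPAbet-style entries below are illustrative only, not taken from a real Sphinx dictionary.

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of the word-recognition step: match a recognized phoneme
// sequence against a small pronunciation dictionary. Entries are
// illustrative ARPAbet-style transcriptions.
public class WordRecognition {
    private static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("OW P AH N", "open");
        DICTIONARY.put("K L OW Z", "close");
        DICTIONARY.put("F AY L", "file");
    }

    // Return the word matching the phoneme sequence, or a marker if none.
    public static String lookup(String phonemes) {
        return DICTIONARY.getOrDefault(phonemes, "<unknown>");
    }

    public static void main(String[] args) {
        System.out.println(lookup("OW P AH N")); // open
        System.out.println(lookup("Z Z Z"));     // <unknown>
    }
}
```

A real recognizer scores many candidate phoneme sequences against the grammar rather than doing a single exact lookup, but the dictionary mapping is the same idea.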
A grammar is an object in the Java Speech API that indicates what words a user is
expected to say and in what patterns those words may occur. Grammars are important to speech
recognizers because they constrain the recognition process. These constraints make recognition
faster and more accurate because the recognizer does not have to check for bizarre sentences.
The Java Speech API supports two basic grammar types: rule grammars and dictation
grammars. These types differ in various ways, including how applications set up the grammars;
the types of sentences they allow; how results are provided; the amount of computational
resources required; and how they are used in application design. Rule grammars are defined
by JSGF, the Java Speech Grammar Format.
(Figure: Speech Recognition Workflow)
Speech Recognition Weaknesses and Flaws
Despite all these advantages and benefits, a hundred-percent-perfect speech recognition system
has yet to be developed. A number of factors can reduce the accuracy and performance of a
speech recognition program.
The speech recognition process is easy for a human but a difficult task for a machine. Compared
with the human mind, speech recognition programs seem less intelligent: for a person, the
capability of thinking, understanding and reacting is natural, while for a computer program
recognition is a complicated task. The program first needs to understand the spoken words with
respect to their meanings, and it has to strike a sufficient balance between the words, noise and
pauses. A human has a built-in capability of filtering the noise from speech, while a machine
requires training; the computer requires help in separating the speech sound from the other
sounds.
A few factors that are considerable in this regard are:
Homonyms:
Words that are spelled differently and have different meanings but sound the same, for example
"there" and "their", or "be" and "bee". It is a challenge for a machine to distinguish between
such phrases that sound alike.
Overlapping speech:
A second challenge in the process is to understand speech uttered by different users; current
systems have difficulty separating simultaneous speech from multiple users.
Noise factor:
The program requires hearing the words uttered by a human distinctly and clearly. Any extra
sound can create interference: first you need to place the system away from noisy environments,
and then speak clearly, or else the machine will get confused and mix up the words.
The Future of Speech Recognition:
Accuracy will become better and better.
Dictation speech recognition will gradually become accepted.
Greater use will be made of intelligent systems which will attempt to guess what the
speaker intended to say, rather than what was actually said, as people often misspeak and
make unintentional mistakes.
Microphone and sound systems will be designed to adapt more quickly to changing
background noise levels, different environments, with better recognition of extraneous
material to be discarded.
JSGF Grammar Format
Speech recognition systems provide computers with the ability to listen to user speech and
determine what is said. Current technology does not yet support unconstrained speech
recognition: the ability to listen to any speech in any context and transcribe it accurately. To
achieve reasonable recognition accuracy and response time, current speech recognizers constrain
what they listen for by using grammars.
The Java Speech Grammar Format (JSGF) defines a platform-independent, vendor-independent
way of describing one type of grammar, a rule grammar (also known as a command and control
grammar or regular grammar). It uses a textual representation that is readable and editable by
both developers and computers, and can be included in Java source code. The other major
grammar type, the dictation grammar, is not discussed in this document.
A rule grammar specifies the types of utterances a user might say (a spoken utterance is similar
to a written sentence). For example, a simple window control grammar might listen for "open a
file", "close the window", and similar commands.
What the user can say depends upon the context: is the user controlling an email application,
reading a credit card number, or selecting a font? Applications know the context, so applications
are responsible for providing a speech recognizer with appropriate grammars.
This document is the specification for the Java Speech Grammar Format. First, the basic naming
and structural mechanisms are described. Following that, the basic components of the grammar,
the grammar header and the grammar body, are described. The grammar header declares the
grammar name and lists the imported rules and grammars. The grammar body defines the rules
of this grammar as combinations of speakable text and references to other rules. Finally, some
simple examples of grammar declarations are provided. Grammars are used by speech
recognizers to determine what the recognizer should listen for, and so describe the utterances a
user may say.
A Java Speech Grammar Format document starts with a self-identifying header. This header
identifies that the document contains JSGF and indicates the version of JSGF being used
(currently V1.0); the header line is written as #JSGF V1.0;. The grammar body defines rules.
Each rule is defined in a rule definition, and a rule is defined only once in a grammar. The order
of definition of rules is not significant. A rule definition has the form
<ruleName> = ruleExpansion; and a rule may be prefixed with "public" to make it visible
outside the grammar: public <ruleName> = ruleExpansion;.
Sphinx Speech Recognition System
Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.
Sphinx is a continuous-speech, speaker-independent recognition system with large-vocabulary
recognition, making use of hidden Markov model (HMM) acoustic models and an n-gram
statistical language model.
Each component of the architecture is explained below:
Recognizer- Contains the main components of Sphinx-4, which are the front end, the linguist,
and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.
Audio - The data to be decoded. This is audio in most systems, but it can also be configured to
accept other forms of data, e.g., spectral or cepstral data.
Front End- Performs digital signal processing (DSP) on the incoming data.
Feature- The output of the front end are features, which are used for decoding in the rest of the
system.
Linguist- Embodies the linguistic knowledge of the system, which are the acoustic model, the
dictionary, and the language model. The linguist produces a search graph structure on which the
search manager performs search using different algorithms.
(Figure: Sphinx-4 Architecture)
Acoustic Model- Contains a representation (often statistical) of a sound, often created by
training using lots of acoustic data.
Dictionary- Responsible for determining how a word is pronounced.
Language Model- Contains a representation (often statistical) of the probability of
occurrence of words.
Search Graph- The graph structure produced by the linguist according to certain criteria
(e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the
language model.
Decoder- Contains the search manager.
Search Manager- Performs the search using a certain algorithm, e.g., breadth-first
search, best-first search, depth-first search, etc. Also contains the feature scorer and the
pruner.
Active List- A list of tokens representing all the states in the search graph that are active
in the current feature frame.
Scorer- Scores the current feature frame against all the active states in the Active List.
Pruner- Prunes the active list according to certain strategies.
Result- The decoded result, which usually contains the N-best results.
Configuration Manager- Loads the Sphinx-4 configuration data from an XML-based
file, and manages the component life cycle for objects.
The need for Sphinx-4:
Need to overcome Sphinx-3's limitations
Need for flexibility in acoustic modeling
Requires handling of multimodal inputs, with information fusion at various levels
Need for more accurate decoders
Need for expansion of language model capabilities
Facilitates the incorporation of several new online algorithms that are currently difficult
to incorporate into Sphinx-3
Need for better application interfaces
The Sphinx of the new millennium:
An open-source project by Carnegie Mellon University, Sun Microsystems Inc. and MERL
Written entirely in Java(TM)
Highly modularized and flexible architecture
Supports any acoustic model structure
Supports most types of language models: CFGs, n-grams, and combinations
New algorithms for obtaining word-level hypotheses
Multimodal inputs
Flexible APIs
Recognition Issue:
Good voice data is the key to good recognition. The quality of recognition is directly related to
the quality of the voice data. As part of the Sphinx-4 project, a trainer will be developed to
provide good voice data.
How does a Recognizer Work?
Goal:
Audio goes in; results come out.
Three application types:
Isolated words
Command / Control
General dictation
Front-End:
Transforms the speech waveform into features used by recognition
Features are sets of mel-frequency cepstrum coefficients (MFCC)
MFCCs model the human auditory system
The Front-End is a set of signal-processing filters
Pluggable architecture
Knowledge Base:
The data that drives the decoder
Consists of three sets of data:
Dictionary
Acoustic Model
Language Model
Needs to scale between the three application types
DICTIONARY:
Maps words to pronunciations
Provides word classification information (such as part-of-speech)
A single word may have multiple pronunciations
Pronunciations are represented as phonemes or other units
Can vary in size from a dozen words to more than 100,000 words
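The point that a single word may carry multiple pronunciations can be sketched as a small map from words to pronunciation lists. The ARPAbet-style entries are illustrative, not taken from a real Sphinx dictionary file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a pronunciation dictionary where one word maps to several
// pronunciations. Entries are illustrative ARPAbet-style transcriptions.
public class PronunciationDictionary {
    private static final Map<String, List<String>> DICT = new HashMap<>();
    static {
        DICT.put("read", Arrays.asList("R IY D", "R EH D")); // present vs. past tense
        DICT.put("the", Arrays.asList("DH AH", "DH IY"));
    }

    // All known pronunciations of a word; empty list if the word is unknown.
    public static List<String> pronunciations(String word) {
        return DICT.getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        System.out.println(pronunciations("read").size()); // 2
    }
}
```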
Language Model:
Describes what is likely to be spoken in a particular context
Uses a stochastic approach: word transitions are defined in terms of transition probabilities
Helps to constrain the search space
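The stochastic word-transition idea above can be sketched as a toy bigram model, in which the probability of the next word given the previous one is estimated from counts. The counts in the example are made up for illustration; a real model is trained on a large text corpus.

```java
import java.util.HashMap;
import java.util.Map;

// Toy bigram language model: P(next | previous) estimated from observed
// word-pair counts.
public class BigramModel {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one occurrence of the word pair (previous, next).
    public void observe(String previous, String next) {
        counts.computeIfAbsent(previous, k -> new HashMap<>())
              .merge(next, 1, Integer::sum);
    }

    // P(next | previous) = count(previous, next) / count(previous, *)
    public double probability(String previous, String next) {
        Map<String, Integer> following = counts.get(previous);
        if (following == null) return 0.0;
        int total = following.values().stream().mapToInt(Integer::intValue).sum();
        return following.getOrDefault(next, 0) / (double) total;
    }

    public static void main(String[] args) {
        BigramModel lm = new BigramModel();
        lm.observe("open", "file");
        lm.observe("open", "file");
        lm.observe("open", "window");
        System.out.println(lm.probability("open", "file")); // 2/3
    }
}
```

During decoding, such transition probabilities let the recognizer prefer word sequences that are likely in the language, which is how the language model constrains the search space.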
Acoustic Models:
Database of statistical models
Each statistical model represents a single unit of speech, such as a word or phoneme
Acoustic models are created/trained by analyzing large corpora of labeled speech
Acoustic models can be speaker-dependent or speaker-independent
Chapter 4
FEASIBILITY STUDY & REQUIREMENT ANALYSIS
SOFTWARE DEVELOPMENT LIFE CYCLE
Since the inception of this project, all software engineering principles have been followed. The
project has passed through all the stages of the software development life cycle (SDLC). A
development process consists of various phases, each phase ending with a defined output. The
main reason for following the SDLC process is that it breaks the problem of developing software
into a set of successively performed phases, each phase handling a different concern of software
development. Object technologies lead to reuse, and reuse (of program components) leads to
faster software development and higher-quality programs. Object-oriented software is easy to
maintain because its structure is inherently decoupled. In addition, object-oriented systems are
easier to adapt and easier to scale. The object-oriented process moves through an evolutionary
spiral that starts with customer communication. It is here that the problem domain is defined and
the basic problem classes are identified. Planning establishes a foundation for the object-oriented
project plan.
FEASIBILITY STUDY
The project is feasible because voice recognition is already frequently used in various areas such
as the military, telephony and healthcare. It is also used by leading industries for recognizing
their employees in their attendance process. So the project is feasible and can be completed in
the given period. A real-time voice recognition security system can be developed using different
algorithms.
THREE PHASES OF FEASIBILITY STUDY
Technical Feasibility:
It involves determining whether or not a system can actually be constructed to solve the problem
at hand. The technical issues raised during the feasibility stage of the investigation relate to the
achievability of the project's goals and the possibility of completing the project.
Economical Feasibility:
This feasibility deals with the cost/benefit analysis. A number of intangible benefits like user
friendliness, robustness and security were pointed out. The cost that will be incurred upon the
implementation of this project would be quite nominal.
Operational Feasibility:
The developed system will be very reliable and user friendly. All the features and operations that
we will implement in our project are possible to implement and thus feasible. This will facilitate
easy use and adoption of the system. With the use of menus and the proper validation required,
it becomes fully understandable to the common user and operational with the user.
STEPS INVOLVED IN THE FEASIBILITY ANALYSIS
Feasibility is carried out in the following steps:
Form a project team and appoint a project leader:
First of all, the project management of the organization forms a separate team for each
independent project. The team comprises one or more system analysts and programmers, with a
project leader. The project leader is responsible for planning and managing the development
activities of the system.
Start the preliminary investigation:
The system analyst of each project team starts the preliminary investigation using different
fact-finding techniques.
Prepare the current system flow chart:
After the preliminary investigation, the analysts prepare the system flowchart of the current
system. These charts describe the general working of the system graphically.
Determine objective of the proposed system:
The major objectives of the proposed system are listed by each analyst and are discussed with
reference to the current system.
Describe the deficiencies of the current system:
On studying the current system flowchart, the analysts list the deficiencies of the current system,
such as missing functions, redundant processing and inefficient flows.
Prepare the proposed system flow chart:
After determining the major objectives of the proposed system, the analysts prepare the proposed
system flowchart. The flowcharts of the proposed system are then compared with those of the
current system.
Determine the technical feasibility:
The existing computer systems (hardware/software) of the concerned department are identified
and their technical specifications are noted down. The analyst then decides whether the existing
systems are sufficient for the technical requirements of the proposed system.
Determine the operational feasibility:
After determining the economic feasibility, the analysts identify the responsible users of the
system and hence determine the operational feasibility of the project.
Presentation of feasibility analysis:
During the feasibility study, the analysts also keep working on the feasibility report. At the end,
the feasibility analysis report is given to the management along with an oral presentation.
Feasibility Analysis report:
The feasibility analysis report is a formal document for management use, prepared by the system
analyst during or after the feasibility study. This report generally contains the following sections.
Covering letter:
It formally presents the report, with a brief description of the project problem along with the
recommendations to be considered.
Table of contents:
It lists the sections of the feasibility study report along with their page numbers.
Description of the existing system:
A brief description of the existing system along with the purpose and scope of the project.
System requirement:
The system requirements, which are either derived from the existing system or from the
discussion with the users, are presented in this section.
Description of proposed system:
It presents a general description of the proposed system, highlighting its role in solving the
problem. A description of the output reports to be generated by the system, in the desired
formats, is also presented.
Development plan:
It presents a detailed plan with the starting and completion dates for the different phases of the
SDLC. Complementary plans are also needed for hardware and software evaluation, purchase
and installation.
Technical feasibility findings:
It presents the findings of the technical feasibility study along with recommendations.
Costs and benefits:
The detailed findings of the cost and benefit analysis are presented in this section. The savings
and benefits are highlighted to justify the economic feasibility of the project.
Operational feasibility findings:
It presents the findings of the operational feasibility study along with the human resource
requirements to implement the system.
REQUIREMENT ANALYSIS
A requirement is a condition or capability that must be met or possessed by a system to satisfy a
contract, standard, specification or other formally imposed obligation of the client. This phase
ends with the Software Requirements Specifications (SRS). The SRS is a document that
completely describes what the proposed software should do without describing how the software
will do it.
SOFTWARE REQUIREMENTS SPECIFICATIONS
System analysis is a technique for carrying out system requirements and project management
using structured analysis, for specifying both manual and automated systems. In system analysis
the focus is on inquiring into the current organizational environment, defining the system
requirements, making recommendations for system improvement and determining the feasibility
of the system.
Analysis Methodology:
A complete understanding of the requirements is essential for the success of a project. This
understanding is built by gathering information; doing so effectively requires sensitivity,
common sense and knowledge of what to gather, when to gather it and how to use it. Various
tools are available for gathering information during the system analysis phase.
The phases are:-
1. Familiarity with the present system through available documentation, such as procedure
manuals, documents and their flow, interviews of user staff and on-site observation.
2. Definition of the decision making associated with managing the system. This is important
for determining what information is required of the system; conducting interviews clarifies
the decision points and how decisions are made in the user area.
3. Once the decision points are identified, a detailed investigation may be conducted to define
the information requirements. The information gathered is analyzed and documented, and
discrepancies between the decision system and the information gathered from the
information system are identified. This concludes the analysis and sets the stage for system
design.
Type of Information Needed:
Organization-based information deals with policies, objectives, goals and structure. User-based
information focuses on information requirements. Work-based information addresses the work
flow, methods, procedures and workstations. We are interested in what happens to data at
various points in the system.
SYSTEM REQUIREMENTS:
SOFTWARE REQUIREMENTS:
Language: Java SDK (with the Eclipse IDE)
Front-end tool: Sphinx-4
Back-end tool: Oracle 10g for the database
Operating system: Windows XP/7
Microsoft Word is used for documentation.
HARDWARE REQUIREMENTS:
Processor: PC with a Pentium IV-class processor, 600 MHz; recommended: Pentium IV-class,
1.63 GHz
RAM: 1 GB
Hard disk space: 20 GB on the system drive, 10 GB for the development environment
Microphone: good-quality microphone
Chapter 5
SYSTEM ANALYSIS AND SYSTEM DESIGN
Requirement analysis defines WHAT the system should do; design tells HOW to do it. This
is the simplest way to define system design. Any design has to be constantly evaluated to ensure
that it meets its requirements and is practical and workable in the given environment. If there are
a number of alternatives, then all alternatives must be evaluated and the best possible solution
must be implemented.
SYSTEM ANALYSIS
System analysis describes the process of gathering and analyzing facts about the existing
operations of the prevailing situation, so that an effective and accurate computerized system may
be designed and implemented if found feasible. This is required in order to understand the
problem that has to be solved. The problem may be of any kind: computerizing an existing
system, developing an entirely new system, or a combination of the two. The aim of the analysis
phase is not to solve the problem itself but to determine how the problem can be solved. For this,
a logical model of the system is required, providing the way to solve the problem and achieve
the desired goal. The logical view of the system is provided to the developer and the user for
decision making, so that the developer can feel at ease in designing the system.
SPECIFICATION OF PROJECT
The proposed system should have following features:
1. It should be able to store voices in .wav format.
2. It should be able to store usernames in database.
3. It should provide the option for existing and new user.
4. It should have the ability of processing voice prints.
5. It should closely match the voices.
6. It should recognize speech up to a reasonable extent.
7. It should provide proper guidance to the user to use it.
8. It should give fast results.
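Requirement 1 above (storing voices in .wav format) can be met with the standard javax.sound.sampled API. The sketch below writes raw PCM bytes to a WAV file; the 16 kHz/16-bit mono format and the class name are assumptions for illustration, not taken from the project code:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class WavStore {

    // Write raw PCM samples (16 kHz, 16-bit, mono, little-endian) to a
    // .wav file; this sample format is an assumption for illustration.
    public static void save(byte[] pcm, File out) throws IOException {
        AudioFormat fmt = new AudioFormat(16000f, 16, 1, true, false);
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), fmt,
                pcm.length / fmt.getFrameSize())) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] silence = new byte[16000 * 2]; // one second of silence
        File f = new File("sample.wav");
        save(silence, f);
        System.out.println("wrote " + f.length() + " bytes");
    }
}
```

In the real system the byte array would come from a TargetDataLine capturing the microphone rather than a buffer of silence.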
SYSTEM DESIGN
System design is the technique of creating a system that takes into account factors such as
needs, performance levels, database design, hardware specifications, and data management. It is
the most important part in the development of the system, for it is in the design phase that the
developer brings into existence the proposed system that the analyst thought of in the analysis
phase.
DESIGN CONCEPT
Software design sits at the technical kernel of software engineering and is applied regardless of
the software process model that is used. After the software requirements have been analyzed and
specified, software design is the first of three technical activities (design, code generation and
test) that are required to build and verify the software. Each activity transforms information in a
manner that ultimately results in validated computer software. The design transforms the
information domain model created during analysis into the data structures that will be required
to implement the software. The data objects, the relationship diagram and the detailed data
content depicted in the data dictionary provide the basis for the design activity. As aforesaid,
design is the phase of software engineering that largely determines the success or failure of a
project. In our project, the Voice Response System, we have spent the maximum time on speech
preprocessing and processing, so that it becomes easier for the system to match the recognized
text to commands. Data flow diagrams for the project have also been developed, and the
repository of commands and its mapping to actions are well defined. Another part which took
much of our consideration was the user interaction: we decided to let the user select each option
by voice and then execute the corresponding task. The architectural design defines the
relationships between the major structural elements of the software, the design patterns that can
be used to achieve the requirements that have been defined for the system, and the constraints
that affect the way in which the architectural design patterns can be applied. The interface
design describes how the software communicates within itself, with the systems that interoperate
with it, and with the humans who use it; an interface implies a flow of information and a specific
type of behavior. Design is the phase where quality is fostered: design provides us with
representations of software that can be assessed for quality, and it is the only way that we can
accurately translate a customer's requirements into a finished software product or system.
Design serves as the foundation for the software support steps that follow.
Chapter 6
DATA FLOW DIAGRAM
A data flow diagram is a graphical tool used to describe and analyze the movement of data
through a system. DFDs are the central tool and the basis from which the other components are
developed. The transformation of data from input to output, through processes, may be
described logically and independently of the physical components associated with the system;
such diagrams are known as logical data flow diagrams. The physical data flow diagrams show
the actual implementation and movement of data between people, departments and workstations.
A full description of a system actually consists of a set of data flow diagrams, developed
using one of two familiar notations: Yourdon, or Gane and Sarson. Each component in a DFD is
labelled with a descriptive name, and each process is further identified with a number that will
be used for identification purposes. DFDs are developed in several levels: each process in a
lower-level diagram can be broken down into a more detailed DFD at the next level. The
top-level diagram is often called the context diagram. It consists of a single process bubble,
which plays a vital role in studying the current system. The process in the context-level diagram
is exploded into other processes at the first-level DFD.
The idea behind the explosion of a process into more processes is that the understanding at
one level of detail is exploded into greater detail at the next level. This is done until no further
explosion is necessary and an adequate amount of detail is described for the analyst to
understand the process.
Larry Constantine first developed the DFD as a way of expressing system requirements in
a graphical form; this led to modular design.
A DFD, also known as a bubble chart, has the purpose of clarifying system requirements
and identifying the major transformations that will become programs in system design. It is the
starting point of design, carried down to the lowest level of detail. A DFD consists of a series of
bubbles joined by data flows in the system.
DFD SYMBOLS:
In a DFD, there are four symbols:
1. A square defines a source (originator) or destination of system data.
2. An arrow identifies data flow; it is the pipeline through which information flows.
3. A circle or bubble represents a process that transforms incoming data flows into outgoing
data flows.
4. An open rectangle is a data store: data at rest, or a temporary repository of data.
CONSTRUCTING DFD:
Several rules of thumb are used in drawing DFDs:
1. Processes should be named and numbered for easy reference. Each name should be
representative of the process.
2. The direction of flow is from top to bottom and from left to right. Data traditionally flow
from source to destination, although they may flow back to the source. One way to indicate
this is to draw a long flow line back to the source. An alternative way is to repeat the source
symbol as a destination; since it is used more than once in the DFD, it is marked with a
short diagonal.
3. When a process is exploded into lower-level details, the sub-processes are numbered.
4. The names of data stores and destinations are written in capital letters. Process and data
flow names have the first letter of each word capitalized.
5. A DFD typically shows the minimum contents of a data store. Each data store should
contain all the data elements that flow in and out. Missing interfaces, redundancies and the
like are then accounted for, often through interviews.
SALIENT FEATURES OF A DFD:
1. The DFD shows the flow of data, not of control; loops and decisions are control
considerations and do not appear on a DFD.
2. The DFD does not indicate the time factor involved in any process, i.e. whether the data
flows take place daily, weekly, monthly or yearly.
3. The sequence of events is not brought out on the DFD.
TYPES OF DATA FLOW DIAGRAMS:
1. Current Physical
2. Current Logical
3. New Logical
4. New Physical
CURRENT PHYSICAL:
In the current physical DFD, process labels include the names of people or their positions, or
the names of the computer systems that might provide some of the overall system processing;
the labels include an identification of the technology used to process the data. Similarly, data
flows and data stores are often labelled with the names of the actual physical media on which
the data are stored, such as file folders, computer files, business forms or computer tapes.
CURRENT LOGICAL:
The physical aspects of the system are removed as much as possible, so that the current
system is reduced to its essence: the data and the processes that transform them, regardless of
their actual physical form.
NEW LOGICAL:
This is exactly like the current logical model if the users were completely happy with the
functionality of the current system but had problems with how it was implemented. Typically,
though, the new logical model will differ from the current logical model by having additional
functions, obsolete functions removed, and inefficient flows reorganized.
NEW PHYSICAL:
The new physical represents only the physical implementation of the new system.
RULES GOVERNING THE DFDS
PROCESS:
1) No process can have only outputs.
2) No process can have only inputs. If an object has only inputs, then it must be a sink.
3) A process has a verb phrase label.
DATA STORE:
Data cannot move directly from one data store to another data store; a process must move data
from the source and place the data into the data store. A data store has a noun phrase label.
SOURCE OR SINK
The origin and/or destination of data.
1) Data cannot move directly from a source to a sink; it must be moved by a process.
2) A source and/or sink has a noun phrase label.
DATA FLOW
1) A data flow has only one direction of flow between symbols. It may flow in both
directions between a process and a data store, to show a read before an update; the
latter is usually indicated, however, by two separate arrows, since the two happen at
different times.
2) A join in a DFD means that exactly the same data come from any of two or more
different processes, data stores or sources to a common location.
3) A data flow cannot go directly back to the same process it leaves. There must be at
least one other process that handles the data flow, produces some other data flow, and
returns the original data to the beginning process.
4) A data flow to a data store means update (delete or change).
5) A data flow from a data store means retrieve or use.
6) A data flow has a noun phrase label. More than one data flow noun phrase can appear
on a single arrow, as long as all of the flows on the same arrow move together as one
package.
DEVELOPING DATA-FLOW DIAGRAM
Top-Down Approach:
The system designer first makes a context-level DFD (Level 0), which shows the
interaction (data flows) between the system (represented by one process) and the system
environment (represented by terminators). The system is then decomposed in a lower-level
DFD (Level 1) into a set of processes, data stores, and the data flows between these processes
and data stores. Each process is in turn decomposed into an even-lower-level diagram
containing its sub-processes. This approach then continues for the subsequent sub-processes,
until a necessary and sufficient level of detail is reached; the final processes are called primitive
processes.
DATA FLOW DIAGRAM LEVELS
Context Level Diagram:
This level shows the overall context of the system and its operating environment, and shows the
whole system as just one process. It does not usually show data stores, unless they are "owned"
by external systems, e.g. accessed by but not maintained by this system; these are then often
shown as external entities.
Level 1 (High Level Diagram):
This level (Level 1) shows all processes at the first level of numbering, data stores, external
entities and the data flows between them. The purpose of this level is to show the major high-
level processes of the system and their interrelation. A process model will have one, and only
one, level-1 diagram. A level-1 diagram must be balanced with its parent context level diagram,
i.e. there must be the same external entities and the same data flows; these can be broken down
into more detail in Level 1.
LEVEL 2 DFD DIAGRAM:
The name and identifier of the higher-level process are shown at the top of the lower-level
diagram. A frame represents the boundary of the process; data flows across the frame must
relate to data flows at the higher level. A data store used by only one process is usually shown
as internal to that process at the lower level. Processes with no further decomposition are
marked with an asterisk (*).
Chapter 7
CODE SNIPPETS
1. RSSReader Class
package com.cvrce.projects.launcher;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class RSSReader {
private static RSSReader instance = null;
private RSSReader() {
}
public static RSSReader getInstance() {
if(instance == null) {
instance = new RSSReader();
}
return instance;
}
public String writeNews() {
String s=new String("hello and welcome to News Reader Application. ");
String newsInBrief = new String("Briefing the headlines?");
String headLines = new String("! The headlines are?");
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL("http://feeds.bbci.co.uk/news/world/asia/rss.xml"); // your feed url
Document doc = builder.parse(u.openStream());
NodeList nodes = doc.getElementsByTagName("item");
for(int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
// loop body reconstructed: collect each news item's title
s = s + getElementValue(element, "title") + ". ";
}
} catch (Exception ex) {
ex.printStackTrace();
}
return s;
}
private String getCharacterDataFromElement(Element e) {
try {
Node child = e.getFirstChild();
if(child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
return cd.getData();
}
}
catch(Exception ex) {
}
return "";
} //private String getCharacterDataFromElement
protected float getFloat(String value) {
if(value != null && !value.equals("")) {
return Float.parseFloat(value);
}
return 0;
}
protected String getElementValue(Element parent,String label) {
return getCharacterDataFromElement((Element)parent.getElementsByTagName(label).item(0));
}
/*public static void main(String[] args) {
RSSReader reader = RSSReader.getInstance();
reader.writeNews();}
*/
}
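RSSReader above reads the live BBC feed; the same DOM calls can be exercised offline by parsing a small RSS fragment from a string. The fragment and class name below are illustrative only:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RssParseDemo {

    // Extract all <title> texts from an RSS document given as a string.
    static java.util.List<String> titles(String rss) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(rss.getBytes("UTF-8")));
        NodeList items = doc.getElementsByTagName("title");
        java.util.List<String> out = new java.util.ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            out.add(items.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // A tiny RSS fragment standing in for the live feed.
        String rss = "<rss><channel>"
                + "<item><title>Headline one</title></item>"
                + "<item><title>Headline two</title></item>"
                + "</channel></rss>";
        for (String t : titles(rss)) {
            System.out.println(t);
        }
    }
}
```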
2. TaskLauncher1 Class
package com.cvrce.projects.launcher;
import java.awt.*;
//import java.awt.event.*;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import com.sun.speech.freetts.audio.AudioPlayer;
import java.io.*;
import edu.cmu.sphinx.frontend.util.Microphone;
public class TaskLauncher1 extends Frame {
static int type; //mediaType=1 for movie, =2 for song and =3 for file
Frame f;
TextArea t1;
public TaskLauncher1() {
f=new Frame("BBC News");
//setLayout(new FlowLayout());
t1=new TextArea(200,200);
//t1.setSize(100, 50);
f.add(t1);
f.setSize(1200,700);
}
public Boolean launchTask(String task)
{
System.out.println("Launcher received : "+task);
// Microphone microphone=new Microphone();
try {
if(task.contains("movie"))
{
type=1;
// microphone.stopRecording();
String s=new String("Select your movie! say? 1? for Sixth
sense? 2? for Illusionist? 3? for Madagascar? 4? for shrek? and 5? for Impact");
voice1(s);
//microphone.startRecording();
}
else if(task.contains("song"))
{
type=2;
String s=new String("Select your Music? say 1? for Chak
de India? 2? for Give me some sun shine? 3? for iss pal? 4? for miss independent and 5? for
Kaash ik din ");
voice1(s);
}
//Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\Low.mp3");
//Runtime.getRuntime().exec("E:\\Music\\Low.mp3");
else if(task.contains("data file"))
{
type=3;
int i=0;
String s=new String("Select whose biodata file to read? say
1? for samarpita? 2? for pranita? 3? for snigdha? and 4? for ellora green");
voice1(s);
//fileread(i);
}
if(task.contains("one"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\Sixth_sense.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\ChakDe.mp3");
}
//if type is file
if(type==3)
fileread(1);
}
//if user says two
if(task.contains("two"))
{
//if media type is movie
if(type == 1)
{
//play 2nd movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\The_Illusionist.avi");
}
//if media type is song
if(type == 2)
{
//play 2nd song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\3idiots04.mp3");
}
//if type is file
if(type==3)
fileread(2);
}
//if user says Three
if(task.contains("three"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\madagascar2.mkv");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\Ispal.mp3");
}
//if type is file
if(type==3)
fileread(3);
}
//if user says four
if(task.contains("four"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\Shrek1.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\MissIndependent.mp3");
}
//if type is file
if(type==3)
fileread(4);
}
//if user says five
if(task.contains("five"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
D:\\Impact.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\dwnlds\\showbiz03.mp3");
}
}
else if(task.contains("news"))readRSS();
else if(task.contains("snap"))
Runtime.getRuntime().exec("D:\\PicasaPhotoViewer D:\\friends.jpg");
else
{
String s=new String("");
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return false;
}
public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices();
}
public void voice1(String s)
{
listAllVoices();
String voiceName = "kevin16";
/* The VoiceManager manages all the voices for FreeTTS.
*/
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice(voiceName);
if (helloVoice == null) {
System.err.println("Cannot find a voice named "
+ voiceName + ". Please specify a different voice.");
System.exit(1);
}
/* Allocates the resources for the voice.
*/
helloVoice.allocate();
/* Synthesize speech.
*/
helloVoice.speak(s);
helloVoice.deallocate();
}
public void fileread(int i)throws Exception
{
String s1=new String();
if (i==1)
{
s1="D:/sambiodata.txt";
//Runtime.getRuntime().exec("D://sambiodata.txt");
}
if (i==2)
{
s1="D:/prabiodata.txt";
//Runtime.getRuntime().exec("D://prabiodata.txt");
}
if (i==3)
{
s1="E:/snicv.txt";
//Runtime.getRuntime().exec("E://snicv.txt");
}
if (i==4)
{
s1="E:/ellucv.txt";
//Runtime.getRuntime().exec("E://ellucv.txt");
}
FileReader fr = new FileReader(s1);
BufferedReader br = new BufferedReader(fr);
String s2;
while((s2 = br.readLine())!= null) {
System.out.println(s2);
voice1(s2);
}
fr.close();
}
public void readRSS()
{
RSSReader reader = RSSReader.getInstance();
String s=reader.writeNews();
f.setVisible(true);
t1.setText(s);
//speak the news
voice1(s);
}
}
3. VoiceResponseSystem Class
/*
* Copyright 1999-2004 Carnegie Mellon University.
* Portions Copyright 2004 Sun Microsystems, Inc.
* Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
* All Rights Reserved. Use is subject to license terms.
*
* See the file "license.terms" for information on usage and
* redistribution of this file, and for a DISCLAIMER OF ALL
* WARRANTIES.
*
*/
package com.cvrce.projects.speech;
import com.cvrce.projects.launcher.TaskLauncher1;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
/**
* A Program showing a simple speech application built using Sphinx-4. This application uses
* the Sphinx-4
* endpointer, which automatically segments incoming audio into utterances and silences.
*/
public class VoiceResponseSystem {
public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices();
}
public void voice1(String s)
{
listAllVoices();
String voiceName = "kevin16";
System.out.println();
//System.out.println("Using voice: " + voiceName);
/* The VoiceManager manages all the voices for FreeTTS.
*/
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice(voiceName);
if (helloVoice == null) {
System.err.println(
"Cannot find a voice named "
+ voiceName + ". Please specify a different voice.");
System.exit(1);
}
/* Allocates the resources for the voice.
*/
helloVoice.allocate();
/* Synthesize speech.
*/
helloVoice.speak(s);
helloVoice.deallocate();
}
public static void main(String[] args) {
String s1= new String("Hello and welcome to Voice response system?! select your
option? " +
" say movie? to watch a movie? song? to listen a song?! news? to listen
news? " + "Data file? to listen the contents of biodata file? and? say snap? to view a
picture?");
VoiceResponseSystem v1=new VoiceResponseSystem();
//v1.voice1(s1);
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(VoiceResponseSystem.class.getResource("vrs.config.xml"));
}
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone, or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
System.out.println("Ask: Song/News/Data File/Movie/Snap");
// loop the recognition until the program exits.
while (true) {
System.out.println("Start speaking.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestFinalResultNoFiller();
System.out.println("You said: " + resultText + '\n');
TaskLauncher1 tl = new TaskLauncher1();
tl.launchTask(resultText);
// microphone.stopRecording();
// recognizer.deallocate();
} else {
}
}
}
}
4. Grammar File
#JSGF V1.0;
/**
* JSGF Grammar for Hello World example
*/
grammar hello;
public <command> = ( Song | News | Data File | Movie | One | Two | Three | Four | Five | Snap );
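The abstract notes that a user can form his own repository of commands and map them to actions; with JSGF this amounts to adding alternatives or named sub-rules to the grammar. A sketch of how such an extension might look (the rule names and the user-defined command words below are hypothetical, not part of the project):

```
#JSGF V1.0;

grammar commands;

// <media> mirrors the project's vocabulary; <custom> holds
// user-defined commands (illustrative examples only).
public <command> = <media> | <custom>;
<media> = Song | News | Data File | Movie | Snap;
<custom> = Open Browser | Lock Screen;
```

Each new alternative in <custom> would then be mapped to an action in the task launcher, just as the built-in words are.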
Chapter 8
RESULT & SCREENSHOTS
After running the application, it asks the user to choose an option by saying the code
allocated to each action. There are 5 actions:
1. Song
2. Snap
3. Movie
4. News
5. Data File
Output of each action is described below.
1. Selecting a song
Example 1:
After selecting song, it asks for further options under this action, such as saying one
for the song Chak de India, two for Give me some sunshine, etc.
Example 2:
2. Selecting a photo:
After selecting the option snap, it opens the picture friends.jpg, as shown below.
3. Selecting a movie:
After selecting movie, it asks for further options under this action, such as saying
one for the movie The Sixth Sense, two for The Illusionist, etc.
Example 1: selected movie 4: Shrek2
Example 2:
4. Selecting News:
After selecting this option, it connects to the BBC News RSS feed, i.e.
http://feeds.bbci.co.uk/news/world/asia/rss.xml
5. Selecting a data file to read:
After selecting data file, it asks for further options under this action, such as saying
one for the file sambiodata.txt, two for prabiodata.txt, etc.
Selected file 3: snicv.txt
Chapter 9
DISCUSSION
The modular framework of Sphinx-4 has permitted us to do some things very easily thathave been traditionally difficult. The modular nature of Sphinx-4 also provides it with the ability
to use modules whose implementations range from general to specific applications of an
algorithm. For example, we were able to improve the runtime speed for the RM1 regression test
by almost 2 orders of magnitude merely by plugging in a new Linguist and leaving the rest of the
system the same. Furthermore, the modularity of Sphinx-4 also allows it to support a wide
variety of tasks. For example, the various SearchManager implementations allow Sphinx-4 to
efficiently support tasks ranging from small-vocabulary, traditional CFG-based
command-and-control applications to large-vocabulary applications that use stochastic
language models.
The modular nature of Sphinx-4 was enabled primarily by the use of the Java
programming language. In particular, the ability of the Java platform to load code at run time
permits simple support for the pluggable framework, and the Java programming language
construct of interfaces permits separation of the framework design from the implementation.
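The combination of interfaces and run-time class loading described above can be sketched in a few lines. The Linguist interface and class names below are simplified stand-ins for Sphinx-4's actual types, not its real API.

```java
// Simplified stand-in for Sphinx-4's pluggable design: a module is an
// interface, and which implementation to use can be chosen by name at
// run time, so swapping in a different Linguist requires no
// recompilation of the framework itself.
class Pluggable {
    public interface Linguist {
        String name();
    }

    public static class FlatLinguist implements Linguist {
        public String name() { return "flat"; }
    }

    // Loads an implementation by its fully qualified class name, the
    // way a configuration manager might do from a config file entry.
    static Linguist load(String className) throws Exception {
        return (Linguist) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }
}
```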
The Java platform also provides Sphinx-4 with a number of other advantages:
Sphinx-4 can run on a variety of platforms without the need for recompilation
The rich set of platform APIs greatly reduces coding time
Built-in support for multithreading makes it simple to experiment with distributing decoding
tasks across multiple threads
Automatic garbage collection helps developers to concentrate on algorithm development
instead of memory leaks
On the downside, the Java platform can have issues with memory footprint. Also related
to memory, some speech engines access the platform memory directly in order to
optimize memory throughput during decoding. Direct access to the platform memory model
is not permitted by the Java programming language. A common misconception people have
regarding the Java programming language is that it is too slow. When developing Sphinx-4, we
carefully instrumented the code to measure various aspects of the system, comparing the results
to its predecessor.
Table I provides a summary showing that Sphinx-4 performs well (for both WER and RT,
a lower number indicates better performance). An interesting result helps to demonstrate
the strength of the pluggable and modular design of Sphinx-4: for each task,
we were able to plug in implementations of the Linguist and SearchManager that were
optimized for that particular task, allowing Sphinx-4 to perform much better. Another interesting
aspect of the performance study shows us that raw computing speed is not our biggest concern
when it comes to RT performance. For the two-CPU results in this table, we used a Scorer that
equally divided the scoring task across the available CPUs. While the increase in speed is
noticeable, it is not as dramatic as we expected. Further analysis helped us determine that only
about 30 percent of the CPU time is spent doing the actual scoring of the acoustic model states.
The remaining 70 percent is spent doing non-scoring activity, such as growing and pruning the
ActiveList. Our results also show that the Java platform's garbage collection mechanism only
accounts for 2-3 percent of the overall CPU usage.
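The Scorer described above, which divides state scoring equally across the available CPUs, can be sketched with a standard thread pool. The class below is an illustrative stand-in, not Sphinx-4's actual Scorer, and the per-state "score" is a placeholder computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of a Scorer that splits the scoring task equally
// across worker threads, as described in the text. Sphinx-4's real
// scorer evaluates acoustic model states; here each "state" is just a
// score value, and the result is the best score over all states.
class ParallelScorer {
    static double scoreAll(double[] states, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (states.length + nThreads - 1) / nThreads;
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk;
                final int hi = Math.min(states.length, lo + chunk);
                // Each worker scores its own slice of the state list.
                parts.add(pool.submit(() -> {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int i = lo; i < hi; i++) best = Math.max(best, states[i]);
                    return best;
                }));
            }
            double best = Double.NEGATIVE_INFINITY;
            for (Future<Double> f : parts) best = Math.max(best, f.get());
            return best;
        } finally {
            pool.shutdown();
        }
    }
}
```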
TEST                   WER     RT
TI46 (11 words)        0.168   0.02
TIDIGITS (11 words)    0.549   0.05
AN4 (79 words)         1.192   0.20
RM1 (1000 words)       2.739   0.40
WSJ5K (5000 words)     7.174   0.96
(Sphinx-4 performance. Word error rate (WER) is given in percent. Real-time (RT) speed is the ratio
of the time taken to decode an utterance to the duration of the utterance.)
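The WER figures in the table are word-level edit distances expressed as a percentage of the reference length. The standard computation (not taken from the report) looks like this:

```java
// Word error rate as used in the table: the word-level edit distance
// (substitutions + insertions + deletions) between a reference
// transcript and the recognizer's hypothesis, divided by the number of
// reference words, expressed in percent.
class Wer {
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;   // deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;   // insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return 100.0 * d[ref.length][hyp.length] / ref.length;
    }
}
```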
Results:
The test cases mentioned above have been found to produce correct results, provided that
the voice is recognized correctly. However, voice recognition is not 100 percent accurate and
may sometimes lead to frustrating results.
Known Bugs/Defects
Since the project is based on voice recognition, the accuracy while working is not very
high. Sometimes, we may speak as loudly and with as clear a pronunciation as possible, and yet
the program might misunderstand what is spoken. This cannot be attributed to a bug in the
project, but it is certainly a defect, and it arises from a large number of
factors. Some of these factors are noise interference from the environment, differences between
the accent of the user and the accent the program is trained to understand, and so on.
Workaround:
While no perfect solution for this can be implemented, we can have a workaround: to
train the program to understand the accent of a specific user, which will in turn result in higher
accuracy.
Chapter 10
CONCLUSION
ADVANTAGES:
Able to write text through both keyboard and voice input.
Voice recognition of different Notepad commands, such as open, save, and clear.
Opens different Windows software based on voice input.
Reduces the time required for writing text.
Provides significant help for people with disabilities.
Lower operational costs.
DISADVANTAGES:
Low accuracy.
Performs poorly in noisy environments.
After careful development of the Sphinx-4 framework, we created a number of differing
implementations for each module in the framework. For example, the Front End implementations
support MFCC, PLP, and LPC feature extraction; the Linguist implementations support a variety
of language models, including CFGs, FSTs, and N-Grams; and the Decoder supports a variety of
Search Manager implementations. Using the Configuration Manager, the various
implementations of the modules can be combined in various ways, supporting our claim that we
have developed a flexible pluggable framework. Furthermore, the framework is performing well
both in speed and accuracy when compared to its predecessors. The Sphinx-4 framework is
already proving itself to be research-ready, easily supporting a variety of research work as well
as a specialized Linguist. We view this as only the very beginning, however, and expect Sphinx-4 to
support future areas of core speech recognition research. Finally, the source code to Sphinx-4 is
freely available. The license permits others to do academic and commercial research and to
develop products without requiring any licensing fees. More information is available at
http://cmusphinx.sourceforge.net/sphinx4.
This thesis/project work on the voice response system started with a brief introduction to
the technology and its applications in different sectors. The project part of the report was based
on software development for the voice response system. At a later stage, we discussed different
tools for bringing that idea into practice. After the development of the software, it was
finally tested, the results were discussed, and a few deficiencies were brought to light. After the
testing work, the advantages of the software were described, and suggestions for further enhancement
and improvement were discussed.
Future Enhancements
This work can be taken into more detail, and more can be done on the project in
order to bring in modifications and additional features. The current software doesn't support a large
vocabulary; work can be done to accumulate a larger number of samples and increase
the efficiency of the software. The current version of the software covers only a few areas, but
more areas can be covered, and effort will be made in this regard.
Chapter 11
BIBLIOGRAPHY
[1] S. Young, "The HTK hidden Markov model toolkit: Design and philosophy," Cambridge
University Engineering Department, UK, Tech. Rep. CUED/F-INFENG/TR152, Sept. 1994.
[2] N. Deshmukh, A. Ganapathiraju, J. Hamaker, J. Picone, and M. Ordowski, "A public domain
speech-to-text system," in Proceedings of the 6th European Conference on Speech
Communication and Technology, vol. 5, Budapest, Hungary, Sept. 1999, pp. 2127-2130.
[3] X. X. Li, Y. Zhao, X. Pi, L. H. Liang, and A. V. Nefian, "Audio-visual continuous speech
recognition using a coupled hidden Markov model," in Proceedings of the 7th International
Conference on Spoken Language Processing, Denver, CO, Sept. 2002, pp. 213-216.
[4] K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition
system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1,
pp. 35-45, Jan. 1990.
[5] X. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, and R. Rosenfeld, "The SPHINX-II speech
recognition system: an overview," Computer Speech and Language, vol. 7, no. 2,
pp. 137-148, 1993.
[6] M. K. Ravishankar, "Efficient algorithms for speech recognition," PhD Thesis (CMU
Technical Report CS-96-143), Carnegie Mellon University, Pittsburgh, PA, 1996.
[7] P. Lamere, P. Kwok, W. Walker, E. Gouvea, R. Singh, B. Raj, and P. Wolf, "Design of the
CMU Sphinx-4 decoder," in Proceedings of the 8th European Conference on Speech
Communication and Technology, Geneve, Switzerland, Sept. 2003, pp. 1181-1184.
[8] J. K. Baker, "The Dragon system - an overview," IEEE Transactions on Acoustic, Speech
and Signal Processing, vol. 23, no. 1, Feb. 1975, pp. 24-29.
[9] B. T. Lowerre, "The Harpy speech recognition system," Ph.D. dissertation, Carnegie Mellon
University, Pittsburgh, PA, 1976.
[10] J. K. Baker, "Stochastic modeling for automatic speech understanding," in Speech
Recognition, R. Reddy, Ed. New York: Academic Press, 1975, pp. 521-542.
[11] P. Placeway, S. Chen, M. Eskenazi, U. Jain, V. Parikh, B. Raj, M. Ravishankar, R.
Rosenfeld, K. Seymore, M. Siegler, R. Stern, and E. Thayer, "The 1996 HUB-4 Sphinx-3
system," in Proceedings of the DARPA Speech Recognition Workshop. Chantilly, VA: DARPA,
Feb. 1997. [Online]. Available:
http://www.nist.gov/speech/publications/darpa97/pdf/placewa1.pdf
[12] M. Ravishankar, "Some results on search complexity vs accuracy," in Proceedings of the
DARPA Speech Recognition Workshop. Chantilly, VA: DARPA, Feb. 1997. [Online]. Available:
http://www.nist.gov/speech/publications/darpa97/pdf/ravisha1.pdf
[13] F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998.
[14] X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang, and M. Mahajan, "From SPHINX-II to
Whisper: Making speech recognition usable," in Automatic Speech and Speaker Recognition,
Advanced Topics, C. Lee, F. Soong, and K. Paliwal, Eds. Norwell, MA: Kluwer Academic
Publishers, 1996.
[15] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on
Acoustic, Speech and Signal Processing, vol. 28, no. 4, Aug. 1980.
[16] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the
Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
[17] NIST. Speech recognition scoring package (score). [Online]. Available:
http://www.nist.gov/speech/tools
[18] G. D. Forney, "The Viterbi algorithm," Proceedings of The IEEE, vol. 61, no. 3,
pp. 268-278, 1973.
[19] P. Kenny, R. Hollan, V. Gupta, M. Lenning, P. Mermelstein, and D. O'Shaughnessy,
"A*-admissible heuristics for rapid lexical access," IEEE Transactions on Speech and Audio
Processing, vol. 1, no. 1, pp. 49-59, Jan. 1993.
[20] Java speech API grammar format (JSGF). [Online]. Available:
http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/
[21] M. Mohri, "Finite-state transducers in language and speech processing," Computational
Linguistics, vol. 23, no. 2, pp. 269-311, 1997.
[22] P. Clarkson and R. Rosenfeld, "Statistical language modeling using the CMU-Cambridge
toolkit," in Proceedings of the 5th European Conference on Speech Communication and
Technology, Rhodes, Greece, Sept. 1997.
[23] Carnegie Mellon University. CMU pronouncing dictionary. [Online]. Available:
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
[24] S. J. Young, N. H. Russell, and J. H. S. Russell, "Token passing: A simple conceptual
model for connected speech recognition systems," Cambridge University Engineering Dept, UK,
Tech. Rep. CUED/F-INFENG/TR38, 1989.
[25] R. Singh, M. Warmuth, B. Raj, and P. Lamere, "Classification with free energy at raised
temperatures," in Proceedings of the 8th European Conference on Speech Communication and
Technology, Geneve, Switzerland, Sept. 2003, pp. 1773-1776.
[26] P. Kwok, "A technique for the integration of multiple parallel feature streams in the
Sphinx-4 speech recognition system," Masters Thesis (Sun Labs TR-2003-0341), Harvard
University, Cambridge, MA, June 2003.
[27] P. Price, W. M. Fisher, J. Bernstein, and D. S. Pallett, "The DARPA 1000-word resource
management database for continuous speech recognition," in Proceedings of the International
Conference on Acoustics, Speech and Signal Processing, vol. 1. IEEE, 1988, pp. 651-654.
[28] G. R. Doddington and T. B. Schalk, "Speech recognition: Turning theory to practice,"
IEEE Spectrum, vol. 18, no. 9, pp. 26-32, Sept. 1981.
[29] R. G. Leonard and G. R. Doddington, "A database for speaker-independent digit
recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal
Processing, vol. 3. IEEE, 1984, p. 42.11.
[30] J. Garofolo, E. Voorhees, C. Auzanne, V. Stanford, and B. Lund, "Design and preparation
of the 1996 HUB-4 broadcast news benchmark test corpora," in Proceedings of the DARPA
Speech Recognition Workshop. Chantilly, Virginia: Morgan Kaufmann, Feb. 1997, pp. 15-21.
[31] (2003, Mar.) Sphinx-4 trainer design. [Online]. Available:
http://www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/TrainerDesign
[32] J. R. Glass, "A probabilistic framework for segment-based speech recognition," Computer
Speech and Language, vol. 17, no. 2, pp. 137-152, Apr. 2003.