7/31/2019 VOICERSFINAL
1/74
VOICE RESPONSE SYSTEM
ABSTRACT
A voice response system is a computer system that responds to voice commands, rather
than input from a keystroke or a mouse. Uses for this kind of system range from convenience to
necessity to security. People who are visually or otherwise physically impaired are prime
candidates for a voice response system. Because they cannot see or otherwise access a keyboard
or mouse, they have no way to access a computer without a voice response system, unless they
want to depend entirely on other people. Being able literally to tell a computer what to do may be
a revelation for someone who ordinarily has little hope of controlling a computer. A voice
response system would also come in handy for someone who is not physically impaired. With a
voice response system, you wouldn't need to be very close to your computer in order to access it
or give it commands. As long as you are in earshot of the PC, it can use its voice response system
to accept voice commands from you in the same way that it traditionally accepts keystroke and
mouse commands.
The system acquires speech at run time through a microphone and processes the sampled
speech to recognize the uttered text. Sphinx-4 is a speech recognition system written entirely in
the Java(TM) programming language. A VRS is an intelligent system that enables the user to
instruct the computer to perform actions through voice commands, and to form his own repository
of commands and map them to appropriate actions. The recognized text is matched to the
corresponding action.
CONTENTS
1. Introduction
2. Voice recognition
Relevance to the Project
Application of voice recognition
3. Working of the Project
Speech Engine
JSAPI
JSAPI classes and interfaces
Speech Synthesis
Speech Recognition
Components
Speech Recognition Weakness & Flaws
Future of Speech Recognition
JSGF Grammar Format
Sphinx Speech Recognition System
4. Feasibility Study & Requirement Analysis
5. System Analysis & System Design
6. Data flow diagram
Context diagram
Level 1
Level 2
7. Code Snippets
8. Results and Screenshots
9. Discussion
10. Conclusion
11. Bibliography
Chapter 1
INTRODUCTION
A VRS is an intelligent system that enables the user to instruct the computer to
perform actions through voice commands, and to form his own repository of commands and
map them to appropriate actions.
A voice response system is a computer system that responds to voice commands, rather
than input from a keystroke or a mouse. Uses for this kind of system range from convenience to
necessity to security. People who are visually or otherwise physically impaired are prime
candidates for a voice response system. Because they cannot see or otherwise access a keyboard
or mouse, they have no way to access a computer without a voice response system, unless they
want to depend entirely on other people. Being able literally to tell a computer what to do may be
a revelation for someone who ordinarily has little hope of controlling a computer. A voice
response system would also come in handy for someone who is not physically impaired. With a
voice response system, you wouldn't need to be very close to your computer in order to access it
or give it commands. As long as you are in earshot of the PC, it can use its voice response system
to accept voice commands from you in the same way that it traditionally accepts keystroke and
mouse commands.
Key points that outline the implemented idea are:
VRS runs as a background process.
Based on the instruction, multiple processes are created.
While the background process keeps on listening to the user requirements,
independent processes are continuously created in response to the input voice
instruction.
Voice recognition could also be enabled in the processes executed on top, but this has been
avoided because it interferes with the background process.
A VRS library has been built which includes some basic commands:
1. DATA FILE - Opens a list of saved files that may be opened.
2. SONGS - Opens a list of songs that may be played.
3. MOVIES - Opens a list of movies that may be played.
4. NEWS - Reads the news from a given website.
5. SNAP - Opens a picture.
The library may be further extended by the user for his own specific requirements. User.gram
has been included in the src along with directions to add an action map for this purpose.
Technologies used in implementation:
Sphinx 4.
JSAPI.
Java Programming Language.
JSGF grammar files.
The relevance and use of each of the above is discussed later in the document. The
code has been developed in Eclipse. The paths used in mapping actions are absolute and hence
system dependent.
The requirement of this project is to develop an intelligent system which:
1. is capable of taking voice input.
2. interprets the input command.
3. processes the command to map it to the action set.
4. has an action set that contains a mapping of each input to the corresponding response.
5. has an adaptive mechanism to handle more mappings and add them to the action set.
6. Example: the voice input "draw circle" draws a circle on the screen.
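The command-to-action mapping described above can be sketched as a small dispatch table. This is a minimal illustration, not the project's actual code; the action strings ("open-song-list" and so on) are hypothetical stand-ins for the real process launches.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the VRS action set: recognized text is looked up in a map and
// resolved to an action. Command names mirror the VRS library above; the
// action strings are illustrative placeholders.
public class VrsDispatch {
    private static final Map<String, String> ACTIONS = new HashMap<>();
    static {
        ACTIONS.put("data file", "open-file-list");
        ACTIONS.put("songs", "open-song-list");
        ACTIONS.put("movies", "open-movie-list");
        ACTIONS.put("news", "read-news");
        ACTIONS.put("snap", "open-picture");
    }

    // Map recognized text to an action; unknown commands yield "no-op".
    public static String resolve(String recognizedText) {
        return ACTIONS.getOrDefault(recognizedText.trim().toLowerCase(), "no-op");
    }

    public static void main(String[] args) {
        System.out.println(resolve("SONGS"));   // open-song-list
        System.out.println(resolve("weather")); // no-op
    }
}
```

Extending the action set, as requirement 5 asks, amounts to adding a new entry to the map.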
Chapter 2
VOICE RECOGNITION
The term voice recognition is sometimes used to refer to recognition systems that must be
trained to a particular speaker, as is the case for most desktop recognition software.
1. Voice recognition converts speech to text.
2. Recognizing the speaker can simplify the task of translating speech.
3. Voice recognition aims to generalize the task without being tied to a single speaker.
Relevance to the Project
1. Voice recognition is used to map a voice command to its corresponding action. This is
brought about by converting speech to text.
2. The API used for recognizing voice is trained (by default) to understand an American
male accent recorded at 16 kHz.
3. The program matches the input voice with the voice on which it is trained and maps it to the
best possible result.
Although the idea of recognizing voice may seem fairly simple, there are a lot of real-time
problems. Some include:
A large amount of memory is required to store voice files.
Noise interference reduces accuracy.
Comparing our accent with the trained voice often gives rise to absurd results.
The precision of the system is directly proportional to the complexity of the source code.
APPLICATIONS OF VOICE RECOGNITION
Health Care
In the health care domain, even in the wake of improving speech recognition technologies,
medical transcriptionists (MTs) have not yet become obsolete. The services provided may be
redistributed rather than replaced. Speech recognition is used to enable deaf people to understand
the spoken word via speech to text conversion, which is very helpful.
Military
Substantial efforts have been devoted in the last decade to the test and evaluation of speech
recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for
the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16VISTA), the program in
France on installing speech recognition systems on Mirage aircraft, and programs in the UK
dealing with a variety of aircraft platforms. In these programs, speech recognizers have been
operated successfully in fighter aircraft with applications including: setting radio frequencies,
commanding an autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight displays. Generally, only very limited, constrained
vocabularies have been used successfully, and a major effort has been devoted to integration of
the speech recognizer with the avionics system.
Telephony and other domains
ASR in the field of telephony is now commonplace and in the field of computer gaming and
simulation is becoming more widespread. Despite the high level of integration with word
processing in general personal computing, however, ASR in the field of document production
has not seen the expected increases in use. The improvement of mobile processor speeds made
feasible the speech-enabled Symbian and Windows Mobile smartphones. Speech is used mostly
as part of the user interface, for creating pre-defined or custom speech commands. Leading
software vendors in this field are Microsoft Corporation (Microsoft Voice Command), Nuance
Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo
Software (Speereo Voice Translator) and SVOX.
People with disabilities
People with disabilities can benefit from speech recognition programs. Speech recognition is
especially useful for people who have difficulty using their hands, ranging from mild repetitive
stress injuries to involved disabilities that preclude using conventional computer input devices.
In fact, people who used the keyboard a lot and developed RSI became an urgent early market
for speech recognition. Speech recognition is used in deaf telephony, such as voicemail to text,
relay services, and captioned telephone. Individuals with learning disabilities who have problems
with thought-to-paper communication (essentially they think of an idea but it is processed
incorrectly causing it to end up differently on paper) can benefit from the software.
Home Automation
With convenience as the priority, such a program also finds application in home automation.
Home automation may include centralized control of lighting, heating, ventilation, air
conditioning and other systems, to provide improved convenience, comfort, energy efficiency
and security.
Transcription
Transcription in the linguistic sense is the conversion of a representation of language into
another representation of language, usually in the same language but in a different form.
Transcription should not be confused with translation, which in linguistics usually means
converting from one language to another, such as from English to Spanish. The most common
type of transcription is from a spoken-language source into text.
Chapter 3
WORKING OF THE PROJECT
In this chapter, we will cover all the elements required for the working of the project and then
converge the requirements to explain the solution design implemented for the project.
Speech Engine
The speech engine loads a list of words to be recognized. This list of words is called a grammar.
The engine takes as input distinct characteristics of sound derived from the waveform and
compares them with its own acoustic model. The engine searches its acoustic space, using the
grammar to guide this search. It then determines which words in the grammar the audio most
closely matches and returns a result.
(Figure: Speech Engine)
Java Speech API (JSAPI)
The Java Speech API (JSAPI) is an application programming interface for cross-platform
support of command and control recognizers, dictation systems, and speech synthesizers.
Although JSAPI defines an interface only, there are several implementations created by third
parties, for example FreeTTS.
The Java Speech API enables speech applications to interact with speech engines in a
common, standardized, and implementation-independent manner. Speech engines from different
vendors can be accessed using the Java Speech API, as long as they are JSAPI-compliant. With
JSAPI, speech applications can use speech engine functionality such as selecting a specific
language or a voice, as well as any required audio resources. JSAPI provides an API for both
speech synthesis and speech recognition.
Java Speech API classes and interfaces
The different classes and interfaces that form the Java Speech API are grouped into the following
three packages:
javax.speech: Contains classes and interfaces for a generic speech engine.
javax.speech.synthesis: Contains classes and interfaces for speech synthesis.
javax.speech.recognition: Contains classes and interfaces for speech recognition.
The Central class is like a factory class that all Java Speech API applications use. It provides
static methods to enable the access of speech synthesis and speech recognition engines. The
Engine interface encapsulates the generic operations that a Java Speech API-compliant speech
engine should provide for speech applications.
Speech applications can primarily use methods to perform actions such as retrieving the
properties and state of the speech engine and allocating and deallocating resources for a speech
engine. In addition, the Engine interface exposes mechanisms to pause and resume the audio
stream generated or processed by the speech engine. The Engine interface is subclassed by the
Synthesizer and Recognizer interfaces, which define additional speech synthesis and speech
recognition functionality. The Synthesizer interface encapsulates the operations that a Java
Speech API-compliant speech synthesis engine should provide for speech applications.
The Java Speech API is based on the event-handling model of AWT components. Events
generated by the speech engine can be identified and handled as required. There are two ways to
handle speech engine events: through the EngineListener interface or through the EngineAdapter
class.
(Figure: JSAPI stack)
Features:
Converts speech to text.
Converts text to speech and delivers it in various speech formats.
Supports events based on the Java event queue.
Easy-to-implement API that interoperates with multiple Java-based applications like applets
and Swing applications.
Interacts seamlessly with the AWT event queue.
Supports annotations using JSML to improve pronunciation and naturalness in speech.
Supports grammar definitions using JSGF.
Ability to adapt to the language of the speaker.
Two core speech technologies are supported through the Java Speech API: speech
synthesis and speech recognition.
Speech synthesis
Speech synthesis provides the reverse process of producing synthetic speech from text
generated by an application, an applet, or a user. It is often referred to as text-to-speech
technology.
The major steps in producing speech from text are as follows:
Structure analysis: Processes the input text to determine where paragraphs, sentences, and
other structures start and end. For most languages, punctuation and formatting data are used
in this stage.
Text pre-processing: Analyzes the input text for special constructs of the language. In
English, special treatment is required for abbreviations, acronyms, dates, times, numbers,
currency amounts, e-mail addresses, and many other forms. Other languages need special
processing for these forms, and most languages have other specialized requirements.
The remaining steps convert the spoken text to speech:
Text-to-phoneme conversion: Converts each word to phonemes. A phoneme is a basic unit
of sound in a language.
Prosody analysis: Processes the sentence structure, words, and phonemes to determine the
appropriate prosody for the sentence.
Waveform production: Uses the phonemes and prosody information to produce the audio
waveform for each sentence.
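The text pre-processing step described above can be illustrated with a toy expander that rewrites abbreviations and digits into speakable words. This is a sketch only; a real synthesizer's pre-processor covers far more constructs (dates, currency, e-mail addresses), and the expansion table here is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of text pre-processing for speech synthesis: expand a
// few abbreviations and a digit into speakable words. The table is
// illustrative, not a complete normalization rule set.
public class TextPreprocess {
    private static final Map<String, String> EXPANSIONS = new LinkedHashMap<>();
    static {
        EXPANSIONS.put("Dr.", "Doctor");
        EXPANSIONS.put("St.", "Street");
        EXPANSIONS.put("3", "three");
    }

    // Apply each expansion in insertion order to the input text.
    public static String expand(String text) {
        String out = text;
        for (Map.Entry<String, String> e : EXPANSIONS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("Dr. Smith lives at 3 Elm St."));
        // Doctor Smith lives at three Elm Street
    }
}
```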
Speech synthesizers can make errors in any of the processing steps described above.
Human ears are well-tuned to detecting these errors, but careful work by developers can
minimize errors and improve the speech output quality. The Java Speech API and the Java
Speech API Markup Language (JSML) provide many ways for you to improve the output quality
of a speech synthesizer.
Speech Recognition
Speech recognition provides computers with the ability to listen to spoken language and
determine what has been said. In other words, it processes audio input containing speech by
converting it to text.
Speech Recognition System
Components:
With the help of a microphone, audio is input to the system; the PC sound card produces the
equivalent digital representation of the received audio.
Digitization
The process of converting the analog signal into a digital form is known as digitization. It
involves both sampling and quantization. Sampling converts a continuous signal into a discrete
signal, while quantization is the process of approximating a continuous range of values by a
finite set of levels.
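The sampling and quantization steps can be sketched as follows. The sample rate and bit depth here are illustrative choices for the example, not the project's actual settings.

```java
// Minimal sketch of digitization: sampling a continuous sine tone and
// quantizing each sample to one of 2^bits discrete levels.
public class Digitize {
    // Sampling: evaluate the continuous signal at discrete time steps.
    public static int[] sample(double freqHz, double seconds, int rateHz) {
        int n = (int) (seconds * rateHz);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            double t = (double) i / rateHz;
            double amplitude = Math.sin(2 * Math.PI * freqHz * t); // continuous signal
            out[i] = quantize(amplitude, 8);                       // discrete level
        }
        return out;
    }

    // Quantization: map an amplitude in [-1, 1] to an integer level.
    public static int quantize(double amplitude, int bits) {
        int levels = 1 << bits;
        int q = (int) Math.round((amplitude + 1.0) / 2.0 * (levels - 1));
        return Math.max(0, Math.min(levels - 1, q));
    }

    public static void main(String[] args) {
        int[] pcm = sample(440.0, 0.01, 16000); // 10 ms of a 440 Hz tone
        System.out.println(pcm.length);          // 160 samples
    }
}
```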
(Figure: Speech Recognition System)
Acoustic Model
An acoustic model is created by taking audio recordings of speech and their text transcriptions,
and using software to create statistical representations of the sounds that make up each word. It
is used by a speech recognition engine to recognize speech. The software acoustic model breaks
the words into phonemes.
Language Model
Language modeling is used in many natural language processing applications. In speech
recognition, the language model tries to capture the properties of a language and to predict the
next word in the speech sequence. The software language model compares the phonemes to
words in its built-in dictionary.
Speech engine
The job of the speech recognition engine is to convert the input audio into text; to accomplish
this it uses all sorts of data, software algorithms and statistics. Its first operation is digitization,
as discussed earlier: converting the audio into a format suitable for further processing. Once the
audio signal is in the proper format, the engine searches for the best match by considering the
words it knows; once the signal is recognized, it returns the corresponding text string.
The major steps of a typical speech recognizer are as follows:
Grammar design: Defines the words that may be spoken by a user and the patterns in
which they may be spoken.
Signal processing: Analyzes the spectrum (the frequency) characteristics of the incoming
audio.
Phoneme recognition: Compares the spectrum patterns to the patterns of the phonemes
of the language being recognized.
Word recognition: Compares the sequence of likely phonemes against the words and
patterns of words specified by the active grammars.
Result generation: Provides the application with information about the words the
recognizer has detected in the incoming audio. The result information is always provided
once recognition of a single utterance (often a sentence) is complete, but may also be
provided during the recognition process. The result always indicates the recognizer's best
guess of what a user said, but may also indicate alternative guesses.
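The word-recognition step above can be illustrated with a toy lookup from a phoneme sequence to a word. The ARPAbet-style entries below are illustrative only, not taken from a real Sphinx dictionary.

```java
import java.util.HashMap;
import java.util.Map;

// Toy version of the word-recognition step: match a recognized phoneme
// sequence against a small pronunciation dictionary. Entries are
// illustrative ARPAbet-style transcriptions.
public class WordRecognition {
    private static final Map<String, String> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put("OW P AH N", "open");
        DICTIONARY.put("K L OW Z", "close");
        DICTIONARY.put("F AY L", "file");
    }

    // Return the word matching the phoneme sequence, or a marker if none.
    public static String lookup(String phonemes) {
        return DICTIONARY.getOrDefault(phonemes, "<unknown>");
    }

    public static void main(String[] args) {
        System.out.println(lookup("OW P AH N")); // open
        System.out.println(lookup("Z Z Z"));     // <unknown>
    }
}
```

A real recognizer scores many candidate phoneme sequences against the grammar rather than doing a single exact lookup, but the dictionary mapping is the same idea.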
A grammar is an object in the Java Speech API that indicates what words a user is
expected to say and in what patterns those words may occur. Grammars are important to speech
recognizers because they constrain the recognition process. These constraints make recognition
faster and more accurate because the recognizer does not have to check for bizarre sentences.
The Java Speech API supports two basic grammar types: rule grammars and dictation
grammars. These types differ in various ways, including how applications set up the grammars;
the types of sentences they allow; how results are provided; the amount of computational
resources required; and how they are used in application design. Rule grammars are defined
by JSGF, the Java Speech Grammar Format.
(Figure: Speech Recognition Workflow)
Speech Recognition Weaknesses and Flaws
Despite all these advantages and benefits, a hundred-percent-perfect speech recognition system
has yet to be developed. A number of factors can reduce the accuracy and performance of a
speech recognition program.
The speech recognition process is easy for a human but a difficult task for a machine. Compared
with the human mind, speech recognition programs seem less intelligent: for a person, the
capability of thinking, understanding and reacting is natural, while for a computer program
recognition is a complicated task. The program first needs to understand the spoken words with
respect to their meanings, and it has to strike a sufficient balance between the words, noise and
pauses. A human has a built-in capability of filtering the noise from speech, while a machine
requires training; the computer requires help in separating the speech sound from the other
sounds.
A few factors that are considerable in this regard are:
Homonyms:
Words that are spelled differently and have different meanings but sound the same, for example
"there" and "their", or "be" and "bee". It is a challenge for a machine to distinguish between
such phrases that sound alike.
Overlapping speech:
A second challenge in the process is to understand speech uttered by different users; current
systems have difficulty separating simultaneous speech from multiple users.
Noise factor:
The program requires hearing the words uttered by a human distinctly and clearly. Any extra
sound can create interference: first you need to place the system away from noisy environments,
and then speak clearly, or else the machine will get confused and mix up the words.
The Future of Speech Recognition:
Accuracy will become better and better.
Dictation speech recognition will gradually become accepted.
Greater use will be made of intelligent systems which will attempt to guess what the
speaker intended to say, rather than what was actually said, as people often misspeak and
make unintentional mistakes.
Microphone and sound systems will be designed to adapt more quickly to changing
background noise levels, different environments, with better recognition of extraneous
material to be discarded.
JSGF Grammar Format
Speech recognition systems provide computers with the ability to listen to user speech and
determine what is said. Current technology does not yet support unconstrained speech
recognition: the ability to listen to any speech in any context and transcribe it accurately. To
achieve reasonable recognition accuracy and response time, current speech recognizers constrain
what they listen for by using grammars.
The Java Speech Grammar Format (JSGF) defines a platform-independent, vendor-independent
way of describing one type of grammar, a rule grammar (also known as a command and control
grammar or regular grammar). It uses a textual representation that is readable and editable by
both developers and computers, and can be included in Java source code. The other major
grammar type, the dictation grammar, is not discussed in this document.
A rule grammar specifies the types of utterances a user might say (a spoken utterance is similar
to a written sentence). For example, a simple window control grammar might listen for "open a
file", "close the window", and similar commands.
What the user can say depends upon the context: is the user controlling an email application,
reading a credit card number, or selecting a font? Applications know the context, so applications
are responsible for providing a speech recognizer with appropriate grammars.
This document is the specification for the Java Speech Grammar Format. First, the basic naming
and structural mechanisms are described. Following that, the basic components of the grammar,
the grammar header and the grammar body, are described. The grammar header declares the
grammar name and lists the imported rules and grammars. The grammar body defines the rules
of this grammar as combinations of speakable text and references to other rules. Finally, some
simple examples of grammar declarations are provided. Grammars are used by speech
recognizers to determine what the recognizer should listen for, and so describe the utterances a
user may say.
A Java Speech Grammar Format document starts with a self-identifying header. This header
identifies that the document contains JSGF and indicates the version of JSGF being used
(currently V1.0); the header line is written as #JSGF V1.0;. The grammar body defines rules.
Each rule is defined in a rule definition, and a rule is defined only once in a grammar. The order
of definition of rules is not significant. A rule definition has the form
<ruleName> = ruleExpansion; and a rule may be prefixed with "public" to make it visible
outside the grammar: public <ruleName> = ruleExpansion;.
Sphinx Speech Recognition System
Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.
Sphinx is a continuous-speech, speaker-independent recognition system with large-vocabulary
recognition, making use of hidden Markov model (HMM) acoustic models and an n-gram
statistical language model.
Each component of the architecture is explained below:
Recognizer- Contains the main components of Sphinx-4, which are the front end, the linguist,
and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.
Audio - The data to be decoded. This is audio in most systems, but it can also be configured to
accept other forms of data, e.g., spectral or cepstral data.
Front End- Performs digital signal processing (DSP) on the incoming data.
Feature- The output of the front end are features, which are used for decoding in the rest of the
system.
Linguist- Embodies the linguistic knowledge of the system, which are the acoustic model, the
dictionary, and the language model. The linguist produces a search graph structure on which the
search manager performs search using different algorithms.
(Figure: Sphinx-4 Architecture)
Acoustic Model- Contains a representation (often statistical) of a sound, often created by
training using lots of acoustic data.
Dictionary- Responsible for determining how a word is pronounced.
Language Model- Contains a representation (often statistical) of the probability of
occurrence of words.
Search Graph- The graph structure produced by the linguist according to certain criteria
(e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the
language model.
Decoder- Contains the search manager.
Search Manager- Performs the search using a certain algorithm, e.g., breadth-first
search, best-first search, depth-first search, etc. Also contains the feature scorer and the
pruner.
Active List- A list of tokens representing all the states in the search graph that are active
in the current feature frame.
Scorer- Scores the current feature frame against all the active states in the Active List.
Pruner- Prunes the active list according to certain strategies.
Result- The decoded result, which usually contains the N-best results.
Configuration Manager- Loads the Sphinx-4 configuration data from an XML-based
file, and manages the component life cycle for objects.
The need for Sphinx-4:
Need to overcome Sphinx-3's limitations
Need for flexibility in acoustic modeling
Requires handling of multimodal inputs, with information fusion at various levels
Need for more accurate decoders
Need for expansion of language model capabilities
Facilitates the incorporation of several new online algorithms that are currently difficult
to incorporate into Sphinx-3
Need for better application interfaces
The Sphinx of the new millennium:
An open-source project by Carnegie Mellon University, Sun Microsystems Inc. and MERL
Written entirely in Java(TM)
Highly modularized and flexible architecture
Supports any acoustic model structure
Supports most types of language models: CFGs, n-grams, and combinations
New algorithms for obtaining word-level hypotheses
Multimodal inputs
Flexible APIs
Recognition Issue:
Good voice data is the key to good recognition. The quality of recognition is directly related to
the quality of the voice data. As part of the Sphinx-4 project, a trainer will be developed to
provide good voice data.
How does a Recognizer Work?
Goal:
Audio goes in; results come out.
Three application types:
Isolated words
Command / Control
General dictation
Front-End:
Transforms the speech waveform into features used by recognition
Features are sets of mel-frequency cepstrum coefficients (MFCC)
MFCCs model the human auditory system
The Front-End is a set of signal-processing filters
Pluggable architecture
Knowledge Base:
The data that drives the decoder
Consists of three sets of data:
Dictionary
Acoustic Model
Language Model
Needs to scale between the three application types
DICTIONARY:
Maps words to pronunciations
Provides word classification information (such as part-of-speech)
A single word may have multiple pronunciations
Pronunciations are represented as phonemes or other units
Can vary in size from a dozen words to more than 100,000 words
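The point that a single word may carry multiple pronunciations can be sketched as a small map from words to pronunciation lists. The ARPAbet-style entries are illustrative, not taken from a real Sphinx dictionary file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a pronunciation dictionary where one word maps to several
// pronunciations. Entries are illustrative ARPAbet-style transcriptions.
public class PronunciationDictionary {
    private static final Map<String, List<String>> DICT = new HashMap<>();
    static {
        DICT.put("read", Arrays.asList("R IY D", "R EH D")); // present vs. past tense
        DICT.put("the", Arrays.asList("DH AH", "DH IY"));
    }

    // All known pronunciations of a word; empty list if the word is unknown.
    public static List<String> pronunciations(String word) {
        return DICT.getOrDefault(word, List.of());
    }

    public static void main(String[] args) {
        System.out.println(pronunciations("read").size()); // 2
    }
}
```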
Language Model:
Describes what is likely to be spoken in a particular context
Uses a stochastic approach: word transitions are defined in terms of transition probabilities
Helps to constrain the search space
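The stochastic word-transition idea above can be sketched as a toy bigram model, in which the probability of the next word given the previous one is estimated from counts. The counts in the example are made up for illustration; a real model is trained on a large text corpus.

```java
import java.util.HashMap;
import java.util.Map;

// Toy bigram language model: P(next | previous) estimated from observed
// word-pair counts.
public class BigramModel {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record one occurrence of the word pair (previous, next).
    public void observe(String previous, String next) {
        counts.computeIfAbsent(previous, k -> new HashMap<>())
              .merge(next, 1, Integer::sum);
    }

    // P(next | previous) = count(previous, next) / count(previous, *)
    public double probability(String previous, String next) {
        Map<String, Integer> following = counts.get(previous);
        if (following == null) return 0.0;
        int total = following.values().stream().mapToInt(Integer::intValue).sum();
        return following.getOrDefault(next, 0) / (double) total;
    }

    public static void main(String[] args) {
        BigramModel lm = new BigramModel();
        lm.observe("open", "file");
        lm.observe("open", "file");
        lm.observe("open", "window");
        System.out.println(lm.probability("open", "file")); // 2/3
    }
}
```

During decoding, such transition probabilities let the recognizer prefer word sequences that are likely in the language, which is how the language model constrains the search space.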
Acoustic Models:
Database of statistical models
Each statistical model represents a single unit of speech, such as a word or phoneme
Acoustic models are created/trained by analyzing large corpora of labeled speech
Acoustic models can be speaker-dependent or speaker-independent
Chapter 4
FEASIBILITY STUDY & REQUIREMENT ANALYSIS
SOFTWARE DEVELOPMENT LIFE CYCLE
Since the inception of this project, all software engineering principles have been followed. The
project has passed through all the stages of the software development life cycle (SDLC). A
development process consists of various phases, each phase ending with a defined output. The
main reason for following the SDLC process is that it breaks the problem of developing software
into a set of successively performed phases, each phase handling a different concern of software
development. Object technologies lead to reuse, and reuse (of program components) leads to
faster software development and higher-quality programs. Object-oriented software is easy to
maintain because its structure is inherently decoupled. In addition, object-oriented systems are
easier to adapt and easier to scale. The object-oriented process moves through an evolutionary
spiral that starts with customer communication. It is here that the problem domain is defined and
the basic problem classes are identified. Planning establishes a foundation for the object-oriented
project plan.
FEASIBILITY STUDY
The project is feasible because voice recognition is already frequently used in various areas such
as the military, telephony and healthcare. It is also used by leading industries for recognizing
their employees in their attendance process. So the project is feasible and can be completed in
the given period. A real-time voice recognition security system can be developed using different
algorithms.
THREE PHASES OF FEASIBILITY STUDY
Technical Feasibility:
It involves determining whether or not a system can actually be constructed to solve the problem
at hand. The technical issues raised during the feasibility stage of the investigation relate to the
achievability of the project's goals and the possibility of completing the project.
Economical Feasibility:
This feasibility deals with the cost/benefit analysis. A number of intangible benefits like user
friendliness, robustness and security were pointed out. The cost that will be incurred upon the
implementation of this project would be quite nominal.
Operational Feasibility:
The developed system will be very reliable and user friendly. All the features and operations that
we will implement in our project are possible to implement and thus feasible. This will facilitate
easy use and adoption of the system. With the use of menus and the proper validation required,
it becomes fully understandable to the common user and operational with the user.
STEPS INVOLVED IN THE FEASIBILITY ANALYSIS
Feasibility is carried out in the following steps:
Form a project team and appoint a project leader:
First of all, the project management of the organization forms a separate team for each
independent project. The team comprises one or more system analysts and programmers, with a
project leader. The project leader is responsible for planning and managing the development
activities of the system.
Start the preliminary investigation:
The system analyst of each project team starts the preliminary investigation using different
fact-finding techniques.
Prepare the current system flow chart:
After the preliminary investigation, the analysts prepare the system flowchart of the current
system. These charts describe the general working of the system graphically.
Determine objective of the proposed system:
The major objectives of the proposed system are listed by each analyst and are discussed with
reference to the current system.
Describe the deficiencies of the current system:
On studying the current system flowchart, the analysts list the deficiencies of the current system,
such as missing functions, redundant processing and inefficient flows.
Prepare the proposed system flow chart:
After determining the major objectives of the proposed system, the analysts prepare the proposed
system flowchart. The flowcharts of the proposed system are then compared with those of the
current system.
Determine the technical feasibility:
The existing computer systems (hardware/software) of the concerned department are identified
and their technical specifications are noted down. The analyst then decides whether the existing
systems are sufficient for the technical requirements of the proposed system.
Determine the operational feasibility:
After determining the economic feasibility, the analysts identify the responsible users of the
system and hence determine the operational feasibility of the project.
Presentation of feasibility analysis:
During the feasibility study, the analysts also keep working on the feasibility report. At the end,
the feasibility analysis report is given to the management along with an oral presentation.
Feasibility Analysis report:
The feasibility analysis report is a formal document for management use, prepared by the system
analyst during or after the feasibility study. This report generally contains the following sections.
Covering letter:
It formally presents the report, with a brief description of the project problem along with the
recommendations to be considered.
Table of contents:
It lists the sections of the feasibility study report along with their page numbers.
Description of the existing system:
A brief description of the existing system along with the purpose and scope of the project.
System requirement:
The system requirements, which are either derived from the existing system or from the
discussion with the users, are presented in this section.
Description of proposed system:
It presents a general description of the proposed system, highlighting its role in solving the
problem. A description of the output reports to be generated by the system, in the desired
formats, is also presented.
Development plan:
It presents a detailed plan with the starting and completion dates for the different phases of the
SDLC. Complementary plans are also needed for hardware and software evaluation, purchase
and installation.
Technical feasibility findings:
It presents the findings of the technical feasibility study along with recommendations.
Costs and benefits:
The detailed findings of the cost and benefit analysis are presented in this section. The savings
and benefits are highlighted to justify the economic feasibility of the project.
Operational feasibility findings:
It presents the findings of the operational feasibility study along with the human resource
requirements to implement the system.
REQUIREMENT ANALYSIS
A requirement is a condition or capability that must be met or possessed by a system to satisfy a
contract, standard, specification or other formally imposed obligation of the client. This phase
ends with the Software Requirements Specifications (SRS). The SRS is a document that
completely describes what the proposed software should do without describing how the software
will do it.
SOFTWARE REQUIREMENTS SPECIFICATIONS
System analysis is a technique for carrying out system requirements and project management
using structured analysis, for specifying both manual and automated systems. In system analysis
the focus is on inquiring into the current organizational environment, defining the system
requirements, making recommendations for system improvement and determining the feasibility
of the system.
Analysis Methodology:
A complete understanding of the requirements is essential for the success of a project. This
understanding is built by gathering information; doing so effectively requires sensitivity,
common sense and knowledge of what to gather, when to gather it and how to use it. Various
tools are available for gathering information during the system analysis phase.
The phases are:-
1. Familiarity with the present system through available documentation, such as procedure
manuals, documents and their flow, interviews of user staff and on-site observation.
2. Definition of the decision making associated with managing the system. This is important
for determining what information is required of the system; conducting interviews clarifies
the decision points and how decisions are made in the user area.
3. Once the decision points are identified, a detailed investigation may be conducted to define
the information requirements. The information gathered is analyzed and documented, and
discrepancies between the decision system and the information gathered from the
information system are identified. This concludes the analysis and sets the stage for system
design.
Type of Information Needed:
Organization-based information deals with policies, objectives, goals and structure. User-based
information focuses on information requirements. Work-based information addresses the work
flow, methods, procedures and workstations. We are interested in what happens to data at
various points in the system.
SYSTEM REQUIREMENTS:
SOFTWARE REQUIREMENTS:
Language: Java SDK (with the Eclipse IDE)
Front-end tool: Sphinx-4
Back-end tool: Oracle 10g for the database
Operating system: Windows XP/7
Microsoft Word is used for documentation.
HARDWARE REQUIREMENTS:
Processor: PC with a Pentium IV-class processor, 600 MHz; recommended: Pentium IV-class,
1.63 GHz
RAM: 1 GB
Hard disk space: 20 GB on the system drive, 10 GB for the development environment
Microphone: good-quality microphone
Chapter 5
SYSTEM ANALYSIS AND SYSTEM DESIGN
Requirement analysis defines WHAT the system should do; design tells HOW to do it. This
is the simplest way to define system design. Any design has to be constantly evaluated to ensure
that it meets its requirements and is practical and workable in the given environment. If there are
a number of alternatives, then all alternatives must be evaluated and the best possible solution
must be implemented.
SYSTEM ANALYSIS
System analysis describes the process of gathering and analyzing facts about the existing
operations of the prevailing situation, so that an effective and accurate computerized system may
be designed and implemented if found feasible. This is required in order to understand the
problem that has to be solved. The problem may be of any kind: computerizing an existing
system, developing an entirely new system, or a combination of the two. The aim of the analysis
phase is not to solve the problem itself but to determine how the problem can be solved. For this,
a logical model of the system is required, providing the way to solve the problem and achieve
the desired goal. The logical view of the system is provided to the developer and the user for
decision making, so that the developer can feel at ease in designing the system.
SPECIFICATION OF PROJECT
The proposed system should have following features:
1. It should be able to store voices in .wav format.
2. It should be able to store usernames in database.
3. It should provide the option for existing and new user.
4. It should have the ability of processing voice prints.
5. It should closely match the voices.
6. It should recognize speech up to a reasonable extent.
7. It should provide proper guidance to the user to use it.
8. It should give fast results.
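Requirement 1 above (storing voices in .wav format) can be met with the standard javax.sound.sampled API. The sketch below writes raw PCM bytes to a WAV file; the 16 kHz/16-bit mono format and the class name are assumptions for illustration, not taken from the project code:

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;

public class WavStore {

    // Write raw PCM samples (16 kHz, 16-bit, mono, little-endian) to a
    // .wav file; this sample format is an assumption for illustration.
    public static void save(byte[] pcm, File out) throws IOException {
        AudioFormat fmt = new AudioFormat(16000f, 16, 1, true, false);
        try (AudioInputStream ais = new AudioInputStream(
                new ByteArrayInputStream(pcm), fmt,
                pcm.length / fmt.getFrameSize())) {
            AudioSystem.write(ais, AudioFileFormat.Type.WAVE, out);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] silence = new byte[16000 * 2]; // one second of silence
        File f = new File("sample.wav");
        save(silence, f);
        System.out.println("wrote " + f.length() + " bytes");
    }
}
```

In the real system the byte array would come from a TargetDataLine capturing the microphone rather than a buffer of silence.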
SYSTEM DESIGN
System design is the technique of creating a system that takes into account factors such as
needs, performance levels, database design, hardware specifications, and data management. It is
the most important part in the development of the system, for it is in the design phase that the
developer brings into existence the proposed system that the analyst thought of in the analysis
phase.
DESIGN CONCEPT
Software design sits at the technical kernel of software engineering and is applied regardless of
the software process model that is used. After the software requirements have been analyzed and
specified, software design is the first of three technical activities (design, code generation and
test) that are required to build and verify the software. Each activity transforms information in a
manner that ultimately results in validated computer software. The design transforms the
information domain model created during analysis into the data structures that will be required
to implement the software. The data objects, the relationship diagram and the detailed data
content depicted in the data dictionary provide the basis for the design activity. As aforesaid,
design is the phase of software engineering that largely determines the success or failure of a
project. In our project, the Voice Response System, we have spent the maximum time on speech
preprocessing and processing, so that it becomes easier for the system to match the recognized
text to commands. Data flow diagrams for the project have also been developed, and the
repository of commands and its mapping to actions are well defined. Another part which took
much of our consideration was the user interaction: we decided to let the user select each option
by voice and then execute the corresponding task. The architectural design defines the
relationships between the major structural elements of the software, the design patterns that can
be used to achieve the requirements that have been defined for the system, and the constraints
that affect the way in which the architectural design patterns can be applied. The interface
design describes how the software communicates within itself, with the systems that interoperate
with it, and with the humans who use it; an interface implies a flow of information and a specific
type of behavior. Design is the phase where quality is fostered: design provides us with
representations of software that can be assessed for quality, and it is the only way that we can
accurately translate a customer's requirements into a finished software product or system.
Design serves as the foundation for the software support steps that follow.
Chapter 6
DATA FLOW DIAGRAM
A data flow diagram is a graphical tool used to describe and analyze the movement of data
through a system. DFDs are the central tool and the basis from which the other components are
developed. The transformation of data from input to output, through processes, may be
described logically and independently of the physical components associated with the system;
such diagrams are known as logical data flow diagrams. The physical data flow diagrams show
the actual implementation and movement of data between people, departments and workstations.
A full description of a system actually consists of a set of data flow diagrams, developed
using one of two familiar notations: Yourdon, or Gane and Sarson. Each component in a DFD is
labelled with a descriptive name, and each process is further identified with a number that will
be used for identification purposes. DFDs are developed in several levels: each process in a
lower-level diagram can be broken down into a more detailed DFD at the next level. The
top-level diagram is often called the context diagram. It consists of a single process bubble,
which plays a vital role in studying the current system. The process in the context-level diagram
is exploded into other processes at the first-level DFD.
The idea behind the explosion of a process into more processes is that the understanding at
one level of detail is exploded into greater detail at the next level. This is done until no further
explosion is necessary and an adequate amount of detail is described for the analyst to
understand the process.
Larry Constantine first developed the DFD as a way of expressing system requirements in
a graphical form; this led to modular design.
A DFD, also known as a bubble chart, has the purpose of clarifying system requirements
and identifying the major transformations that will become programs in system design. It is the
starting point of design, carried down to the lowest level of detail. A DFD consists of a series of
bubbles joined by data flows in the system.
DFD SYMBOLS:
In a DFD, there are four symbols:
1. A square defines a source (originator) or destination of system data.
2. An arrow identifies data flow; it is the pipeline through which information flows.
3. A circle or bubble represents a process that transforms incoming data flows into outgoing
data flows.
4. An open rectangle is a data store: data at rest, or a temporary repository of data.
CONSTRUCTING DFD:
Several rules of thumb are used in drawing DFDs:
1. Processes should be named and numbered for easy reference. Each name should be
representative of the process.
2. The direction of flow is from top to bottom and from left to right. Data traditionally flow
from source to destination, although they may flow back to the source. One way to indicate
this is to draw a long flow line back to the source. An alternative way is to repeat the source
symbol as a destination; since it is used more than once in the DFD, it is marked with a
short diagonal.
3. When a process is exploded into lower-level details, the sub-processes are numbered.
4. The names of data stores and destinations are written in capital letters. Process and data
flow names have the first letter of each word capitalized.
5. A DFD typically shows the minimum contents of a data store. Each data store should
contain all the data elements that flow in and out. Missing interfaces, redundancies and the
like are then accounted for, often through interviews.
SALIENT FEATURES OF A DFD:
1. The DFD shows the flow of data, not of control; loops and decisions are control
considerations and do not appear on a DFD.
2. The DFD does not indicate the time factor involved in any process, i.e. whether the data
flows take place daily, weekly, monthly or yearly.
3. The sequence of events is not brought out on the DFD.
TYPES OF DATA FLOW DIAGRAMS:
1. Current Physical
2. Current Logical
3. New Logical
4. New Physical
CURRENT PHYSICAL:
In the current physical DFD, process labels include the names of people or their positions, or
the names of the computer systems that might provide some of the overall system processing;
the labels include an identification of the technology used to process the data. Similarly, data
flows and data stores are often labelled with the names of the actual physical media on which
the data are stored, such as file folders, computer files, business forms or computer tapes.
CURRENT LOGICAL:
The physical aspects of the system are removed as much as possible, so that the current
system is reduced to its essence: the data and the processes that transform them, regardless of
their actual physical form.
NEW LOGICAL:
This is exactly like the current logical model if the users were completely happy with the
functionality of the current system but had problems with how it was implemented. Typically,
though, the new logical model will differ from the current logical model by having additional
functions, obsolete functions removed, and inefficient flows reorganized.
NEW PHYSICAL:
The new physical represents only the physical implementation of the new system.
RULES GOVERNING THE DFDS
PROCESS:
1) No process can have only outputs.
2) No process can have only inputs. If an object has only inputs, then it must be a sink.
3) A process has a verb phrase label.
DATA STORE:
Data cannot move directly from one data store to another data store; a process must move data
from the source and place the data into the data store. A data store has a noun phrase label.
SOURCE OR SINK
The origin and/or destination of data.
1) Data cannot move directly from a source to a sink; it must be moved by a process.
2) A source and/or sink has a noun phrase label.
DATA FLOW
1) A data flow has only one direction of flow between symbols. It may flow in both
directions between a process and a data store, to show a read before an update; the
latter is usually indicated, however, by two separate arrows, since the two happen at
different times.
2) A join in a DFD means that exactly the same data come from any of two or more
different processes, data stores or sources to a common location.
3) A data flow cannot go directly back to the same process it leaves. There must be at
least one other process that handles the data flow, produces some other data flow, and
returns the original data to the beginning process.
4) A data flow to a data store means update (delete or change).
5) A data flow from a data store means retrieve or use.
6) A data flow has a noun phrase label. More than one data flow noun phrase can appear
on a single arrow, as long as all of the flows on the same arrow move together as one
package.
DEVELOPING DATA-FLOW DIAGRAM
Top-Down Approach:
The system designer first makes a context-level DFD (Level 0), which shows the
interaction (data flows) between the system (represented by one process) and the system
environment (represented by terminators). The system is then decomposed in a lower-level
DFD (Level 1) into a set of processes, data stores, and the data flows between these processes
and data stores. Each process is in turn decomposed into an even-lower-level diagram
containing its sub-processes. This approach then continues for the subsequent sub-processes,
until a necessary and sufficient level of detail is reached; the final processes are called primitive
processes.
DATA FLOW DIAGRAM LEVELS
Context Level Diagram:
This level shows the overall context of the system and its operating environment, and shows the
whole system as just one process. It does not usually show data stores, unless they are "owned"
by external systems, e.g. accessed by but not maintained by this system; these are then often
shown as external entities.
Level 1 (High Level Diagram):
This level (Level 1) shows all processes at the first level of numbering, data stores, external
entities and the data flows between them. The purpose of this level is to show the major high-
level processes of the system and their interrelation. A process model will have one, and only
one, level-1 diagram. A level-1 diagram must be balanced with its parent context level diagram,
i.e. there must be the same external entities and the same data flows; these can be broken down
into more detail in Level 1.
LEVEL 2 DFD DIAGRAM:
The name and identifier of the higher-level process are shown at the top of the lower-level
diagram. A frame represents the boundary of the process; data flows across the frame must
relate to data flows at the higher level. A data store used by only one process is usually shown
as internal to that process at the lower level. Processes with no further decomposition are
marked with an asterisk (*).
Chapter 7
CODE SNIPPETS
1. RSSReader Class
package com.cvrce.projects.launcher;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class RSSReader {
private static RSSReader instance = null;
private RSSReader() {
}
public static RSSReader getInstance() {
if(instance == null) {
instance = new RSSReader();
}
return instance;
}
public String writeNews() {
String s=new String("hello and welcome to News Reader Application. ");
String newsInBrief = new String("Briefing the headlines?");
String headLines = new String("! The headlines are?");
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
URL u = new URL("http://feeds.bbci.co.uk/news/world/asia/rss.xml"); // your feed url
Document doc = builder.parse(u.openStream());
NodeList nodes = doc.getElementsByTagName("item");
for(int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
// loop body reconstructed: collect each news item's title
s = s + getElementValue(element, "title") + ". ";
}
} catch (Exception ex) {
ex.printStackTrace();
}
return s;
}
private String getCharacterDataFromElement(Element e) {
try {
Node child = e.getFirstChild();
if(child instanceof CharacterData) {
CharacterData cd = (CharacterData) child;
return cd.getData();
}
}
catch(Exception ex) {
}
return "";
} //private String getCharacterDataFromElement
protected float getFloat(String value) {
if(value != null && !value.equals("")) {
return Float.parseFloat(value);
}
return 0;
}
protected String getElementValue(Element parent,String label) {
return getCharacterDataFromElement((Element)parent.getElementsByTagName(label).item(0));
}
/*public static void main(String[] args) {
RSSReader reader = RSSReader.getInstance();
reader.writeNews();}
*/
}
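RSSReader above reads the live BBC feed; the same DOM calls can be exercised offline by parsing a small RSS fragment from a string. The fragment and class name below are illustrative only:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RssParseDemo {

    // Extract all <title> texts from an RSS document given as a string.
    static java.util.List<String> titles(String rss) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(rss.getBytes("UTF-8")));
        NodeList items = doc.getElementsByTagName("title");
        java.util.List<String> out = new java.util.ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            out.add(items.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // A tiny RSS fragment standing in for the live feed.
        String rss = "<rss><channel>"
                + "<item><title>Headline one</title></item>"
                + "<item><title>Headline two</title></item>"
                + "</channel></rss>";
        for (String t : titles(rss)) {
            System.out.println(t);
        }
    }
}
```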
2. TaskLauncher1 Class
package com.cvrce.projects.launcher;
import java.awt.*;
//import java.awt.event.*;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import com.sun.speech.freetts.audio.AudioPlayer;
import java.io.*;
import edu.cmu.sphinx.frontend.util.Microphone;
public class TaskLauncher1 extends Frame {
static int type; //mediaType=1 for movie, =2 for song and =3 for file
Frame f;
TextArea t1;
public TaskLauncher1() {
f=new Frame("BBC News");
//setLayout(new FlowLayout());
t1=new TextArea(200,200);
//t1.setSize(100, 50);
f.add(t1);
f.setSize(1200,700);
}
public Boolean launchTask(String task)
{
System.out.println("Launcher received : "+task);
// Microphone microphone=new Microphone();
try {
if(task.contains("movie"))
{
type=1;
// microphone.stopRecording();
String s=new String("Select your movie! say? 1? for Sixth
sense? 2? for Illusionist? 3? for Madagascar? 4? for shrek? and 5? for Impact");
voice1(s);
//microphone.startRecording();
}
else if(task.contains("song"))
{
type=2;
String s=new String("Select your Music? say 1? for Chak
de India? 2? for Give me some sun shine? 3? for iss pal? 4? for miss independent and 5? for
Kaash ik din ");
voice1(s);
}
//Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\Low.mp3");
//Runtime.getRuntime().exec("E:\\Music\\Low.mp3");
else if(task.contains("data file"))
{
type=3;
int i=0;
String s=new String("Select whose biodata file to read? say
1? for samarpita? 2? for pranita? 3? for snigdha? and 4? for ellora green");
voice1(s);
//fileread(i);
}
if(task.contains("one"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\Sixth_sense.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\ChakDe.mp3");
}
//if type is file
if(type==3)
fileread(1);
}
//if user says two
if(task.contains("two"))
{
//if media type is movie
if(type == 1)
{
//play 2nd movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\The_Illusionist.avi");
}
//if media type is song
if(type == 2)
{
//play 2nd song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\3idiots04.mp3");
}
//if type is file
if(type==3)
fileread(2);
}
//if user says Three
if(task.contains("three"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\madagascar2.mkv");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\Ispal.mp3");
}
//if type is file
if(type==3)
fileread(3);
}
//if user says four
if(task.contains("four"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Movies\\Shrek1.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\MissIndependent.mp3");
}
//if type is file
if(type==3)
fileread(4);
}
//if user says five
if(task.contains("five"))
{
//if media type is movie
if(type == 1)
{
//play first movie
Runtime.getRuntime().exec("D:\\VLC\\vlc
D:\\Impact.avi");
}
//if media type is song
if(type == 2)
{
//play first song
Runtime.getRuntime().exec("D:\\VLC\\vlc
E:\\Music\\dwnlds\\showbiz03.mp3");
}
}
else if(task.contains("news"))readRSS();
else if(task.contains("snap"))
Runtime.getRuntime().exec("D:\\PicasaPhotoViewer D:\\friends.jpg");
else
{
String s=new String("");
}
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return false;
}
public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices();
}
public void voice1(String s)
{
listAllVoices();
String voiceName = "kevin16";
/* The VoiceManager manages all the voices for FreeTTS.
*/
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice(voiceName);
if (helloVoice == null) {
System.err.println("Cannot find a voice named "
+ voiceName + ". Please specify a different voice.");
System.exit(1);
}
/* Allocates the resources for the voice.
*/
helloVoice.allocate();
/* Synthesize speech.
*/
helloVoice.speak(s);
helloVoice.deallocate();
}
public void fileread(int i)throws Exception
{
String s1=new String();
if (i==1)
{
s1="D:/sambiodata.txt";
//Runtime.getRuntime().exec("D://sambiodata.txt");
}
if (i==2)
{
s1="D:/prabiodata.txt";
//Runtime.getRuntime().exec("D://prabiodata.txt");
}
if (i==3)
{
s1="E:/snicv.txt";
//Runtime.getRuntime().exec("E://snicv.txt");
}
if (i==4)
{
s1="E:/ellucv.txt";
//Runtime.getRuntime().exec("E://ellucv.txt");
}
FileReader fr = new FileReader(s1);
BufferedReader br = new BufferedReader(fr);
String s2;
while((s2 = br.readLine())!= null) {
System.out.println(s2);
voice1(s2);
}
fr.close();
}
public void readRSS()
{
RSSReader reader = RSSReader.getInstance();
String s=reader.writeNews();
f.setVisible(true);
t1.setText(s);
//speak the news
voice1(s);
}
}
3. VoiceResponseSystem Class
/*
* Copyright 1999-2004 Carnegie Mellon University.
* Portions Copyright 2004 Sun Microsystems, Inc.
* Portions Copyright 2004 Mitsubishi Electric Research Laboratories.
* All Rights Reserved. Use is subject to license terms.
*
* See the file "license.terms" for information on usage and
* redistribution of this file, and for a DISCLAIMER OF ALL
* WARRANTIES.
*
*/
package com.cvrce.projects.speech;
import com.cvrce.projects.launcher.TaskLauncher1;
import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
/**
* A Program showing a simple speech application built using Sphinx-4. This application uses
* the Sphinx-4
* endpointer, which automatically segments incoming audio into utterances and silences.
*/
public class VoiceResponseSystem {
public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices();
}
public void voice1(String s)
{
listAllVoices();
String voiceName = "kevin16";
System.out.println();
//System.out.println("Using voice: " + voiceName);
/* The VoiceManager manages all the voices for FreeTTS.
*/
VoiceManager voiceManager = VoiceManager.getInstance();
Voice helloVoice = voiceManager.getVoice(voiceName);
if (helloVoice == null) {
System.err.println(
"Cannot find a voice named "
+ voiceName + ". Please specify a different voice.");
System.exit(1);
}
/* Allocates the resources for the voice.
*/
helloVoice.allocate();
/* Synthesize speech.
*/
helloVoice.speak(s);
helloVoice.deallocate();
}
public static void main(String[] args) {
String s1= new String("Hello and welcome to Voice response system?! select your
option? " +
" say movie? to watch a movie? song? to listen a song?! news? to listen
news? " + "Data file? to listen the contents of biodata file? and? say snap? to view a
picture?");
VoiceResponseSystem v1=new VoiceResponseSystem();
//v1.voice1(s1);
ConfigurationManager cm;
if (args.length > 0) {
cm = new ConfigurationManager(args[0]);
} else {
cm = new
ConfigurationManager(VoiceResponseSystem.class.getResource("vrs.config.xml"));
}
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
recognizer.allocate();
// start the microphone, or exit the program if this is not possible
Microphone microphone = (Microphone) cm.lookup("microphone");
if (!microphone.startRecording()) {
System.out.println("Cannot start microphone.");
recognizer.deallocate();
System.exit(1);
}
System.out.println("Ask: Song/News/Data File/Movie/Snap");
// loop the recognition until the program exits.
while (true) {
System.out.println("Start speaking.\n");
Result result = recognizer.recognize();
if (result != null) {
String resultText = result.getBestFinalResultNoFiller();
System.out.println("You said: " + resultText + '\n');
TaskLauncher1 tl = new TaskLauncher1();
tl.launchTask(resultText);
// microphone.stopRecording();
// recognizer.deallocate();
} else {
}
}
}
}
4. Grammar File
#JSGF V1.0;
/**
* JSGF Grammar for Hello World example
*/
grammar hello;
public <command> = ( Song | News | Data File | Movie | One | Two | Three | Four | Five | Snap );
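The abstract notes that a user can form his own repository of commands and map them to actions; with JSGF this amounts to adding alternatives or named sub-rules to the grammar. A sketch of how such an extension might look (the rule names and the user-defined command words below are hypothetical, not part of the project):

```
#JSGF V1.0;

grammar commands;

// <media> mirrors the project's vocabulary; <custom> holds
// user-defined commands (illustrative examples only).
public <command> = <media> | <custom>;
<media> = Song | News | Data File | Movie | Snap;
<custom> = Open Browser | Lock Screen;
```

Each new alternative in <custom> would then be mapped to an action in the task launcher, just as the built-in words are.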
Chapter 8
RESULT & SCREENSHOTS
After running the application, it asks the user to choose an option by saying the code
allocated to each action. There are 5 actions:
1. Song
2. Snap
3. Movie
4. News
5. Data File
Output of each action is described below.
1. Selecting a song
Example 1:
After selecting song, it asks for further options under this action, such as saying one
for the song Chak de India, two for Give me some sunshine, etc.
Example 2:
2. Selecting a photo:
After selecting the option snap, it opens the picture friends.jpg, as shown below.
3. Selecting a movie:
After selecting movie, it asks for further options under this action, such as saying
one for the movie The Sixth Sense, two for The Illusionist, etc.
Example 1: selected movie 4: Shrek2
Example 2:
4. Selecting News:
After selecting this option, it connects to the BBC News RSS feed, i.e.
http://feeds.bbci.co.uk/news/world/asia/rss.xml
5. Selecting a data file to read:
After selecting data file, it asks for further options under this action, such as saying
one for the file sambiodata.txt, two for prabiodata.txt, etc.
Selected file 3: snicv.txt
Chapter 9
DISCUSSION
The modular framework of Sphinx-4 has permitted us to do some things very easily thathave been traditionally difficult. The modular nature of Sphinx-4 also provides it with the ability
to use modules whose implementations range from general to specific applications of an
algorithm. For example, we were able to improve the runtime speed for the RM1 regression test
by almost 2 orders of magnitude merely by plugging in a new Linguist and leaving the rest of the
system the same. Furthermore, the modularity of Sphinx-4 also allows it to support a wide
variety of tasks. For example, the various SearchManager implementations allow Sphinx-4 to
efficiently support tasks ranging from small-vocabulary, traditional CFG-based
command-and-control applications to large-vocabulary applications that use stochastic
language models.
The modular nature of Sphinx-4 was enabled primarily by the use of the Java
programming language. In particular, the ability of the Java platform to load code at run time
permits simple support for the pluggable framework, and the Java programming language
construct of interfaces permits separation of the framework design from the implementation.
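The combination of interfaces and run-time class loading described above can be sketched in a few lines. The Linguist interface and class names below are simplified stand-ins for Sphinx-4's actual types, not its real API.

```java
// Simplified stand-in for Sphinx-4's pluggable design: a module is an
// interface, and which implementation to use can be chosen by name at
// run time, so swapping in a different Linguist requires no
// recompilation of the framework itself.
class Pluggable {
    public interface Linguist {
        String name();
    }

    public static class FlatLinguist implements Linguist {
        public String name() { return "flat"; }
    }

    // Loads an implementation by its fully qualified class name, the
    // way a configuration manager might do from a config file entry.
    static Linguist load(String className) throws Exception {
        return (Linguist) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }
}
```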
The Java platform also provides Sphinx-4 with a number of other advantages:
Sphinx-4 can run on a variety of platforms without the need for recompilation
The rich set of platform APIs greatly reduces coding time
Built-in support for multithreading makes it simple to experiment with distributing decoding
tasks across multiple threads
Automatic garbage collection helps developers to concentrate on algorithm development
instead of memory leaks
On the downside, the Java platform can have issues with memory footprint. Also related
to memory, some speech engines access the platform memory directly in order to
optimize memory throughput during decoding. Direct access to the platform memory model
is not permitted by the Java programming language. A common misconception people have
regarding the Java programming language is that it is too slow. When developing Sphinx-4, we
carefully instrumented the code to measure various aspects of the system, comparing the results
to its predecessor.
Table I provides a summary showing that Sphinx-4 performs well (for both WER and RT,
a lower number indicates better performance). An interesting result helps to demonstrate
the strength of the pluggable and modular design of Sphinx-4: for each task,
we were able to plug in implementations of the Linguist and SearchManager that were
optimized for that particular task, allowing Sphinx-4 to perform much better. Another interesting
aspect of the performance study shows us that raw computing speed is not our biggest concern
when it comes to RT performance. For the two-CPU results in this table, we used a Scorer that
equally divided the scoring task across the available CPUs. While the increase in speed is
noticeable, it is not as dramatic as we expected. Further analysis helped us determine that only
about 30 percent of the CPU time is spent doing the actual scoring of the acoustic model states.
The remaining 70 percent is spent doing non-scoring activity, such as growing and pruning the
ActiveList. Our results also show that the Java platform's garbage collection mechanism only
accounts for 2-3 percent of the overall CPU usage.
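The Scorer described above, which divides state scoring equally across the available CPUs, can be sketched with a standard thread pool. The class below is an illustrative stand-in, not Sphinx-4's actual Scorer, and the per-state "score" is a placeholder computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch of a Scorer that splits the scoring task equally
// across worker threads, as described in the text. Sphinx-4's real
// scorer evaluates acoustic model states; here each "state" is just a
// score value, and the result is the best score over all states.
class ParallelScorer {
    static double scoreAll(double[] states, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (states.length + nThreads - 1) / nThreads;
            List<Future<Double>> parts = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                final int lo = t * chunk;
                final int hi = Math.min(states.length, lo + chunk);
                // Each worker scores its own slice of the state list.
                parts.add(pool.submit(() -> {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int i = lo; i < hi; i++) best = Math.max(best, states[i]);
                    return best;
                }));
            }
            double best = Double.NEGATIVE_INFINITY;
            for (Future<Double> f : parts) best = Math.max(best, f.get());
            return best;
        } finally {
            pool.shutdown();
        }
    }
}
```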
TEST                   WER     RT
TI46 (11 words)        0.168   0.02
TIDIGITS (11 words)    0.549   0.05
AN4 (79 words)         1.192   0.20
RM1 (1000 words)       2.739   0.40
WSJ5K (5000 words)     7.174   0.96
(Sphinx-4 performance. Word error rate (WER) is given in percent. Real-time (RT) speed is the ratio
of the time taken to decode an utterance to the duration of the utterance.)
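The WER figures in the table are word-level edit distances expressed as a percentage of the reference length. The standard computation (not taken from the report) looks like this:

```java
// Word error rate as used in the table: the word-level edit distance
// (substitutions + insertions + deletions) between a reference
// transcript and the recognizer's hypothesis, divided by the number of
// reference words, expressed in percent.
class Wer {
    static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;   // deletions
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;   // insertions
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return 100.0 * d[ref.length][hyp.length] / ref.length;
    }
}
```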
Results:
The test cases mentioned above have been found to produce correct results, provided that
the voice is recognized correctly. However, voice recognition is not 100 percent accurate and
may sometimes lead to frustrating results.
Known Bugs/Defects
Since the project is based on voice recognition, the accuracy while working is not very
high. Sometimes, we may speak as loudly and with as clear a pronunciation as possible, and yet
the program might misunderstand what is spoken. This cannot be attributed to a bug in the
project, but it is certainly a defect, and it arises from a large number of
factors. Some of these factors are noise interference from the environment, differences between
the accent of the user and the accent the program is trained to understand, and so on.
Workaround:
While no perfect solution for this can be implemented, we can have a workaround: to
train the program to understand the accent of a specific user, which will in turn result in higher
accuracy.
Chapter 10
CONCLUSION
ADVANTAGES:
Able to write text through both keyboard and voice input.
Voice recognition of different Notepad commands, such as open, save, and clear.
Opens different Windows software based on voice input.
Reduces the time required for writing text.
Provides significant help for people with disabilities.
Lower operational costs.
DISADVANTAGES:
Low accuracy.
Performs poorly in noisy environments.
After careful development of the Sphinx-4 framework, we created a number of differing
implementations for each module in the framework. For example, the Front End implementations
support MFCC, PLP, and LPC feature extraction; the Linguist implementations support a variety
of language models, including CFGs, FSTs, and N-Grams; and the Decoder supports a variety of
Search Manager implementations. Using the Configuration Manager, the various
implementations of the modules can be combined in various ways, supporting our claim that we
have developed a flexible pluggable framework. Furthermore, the framework is performing well
both in speed and accuracy when compared to its predecessors. The Sphinx-4 framework is
already proving itself to be research-ready, easily supporting a variety of research work as well
as a specialized Linguist. We view this as only the very beginning, however, and expect Sphinx-4 to
support future areas of core speech recognition research. Finally, the source code to Sphinx-4 is
freely available. The license permits others to do academic and commercial research and to
develop products without requiring any licensing fees. More information is available at
http://cmusphinx.sourceforge.net/sphinx4.
This thesis/project work on the voice response system started with a brief introduction to
the technology and its applications in different sectors. The project part of the report was based
on software development for the voice response system. At a later stage, we discussed different
tools for bringing that idea into practice. After the development of the software, it was
finally tested, the results were discussed, and a few deficiencies were brought to light. After the
testing work, the advantages of the software were described, and suggestions for further enhancement
and improvement were discussed.
Future Enhancements
This work can be taken into more detail, and more can be done on the project in
order to bring in modifications and additional features. The current software doesn't support a large
vocabulary; work can be done to accumulate a larger number of samples and increase
the efficiency of the software. The current version of the software covers only a few areas, but
more areas can be covered, and effort will be made in this regard.
Chapter 11
BIBLIOGRAPHY
[1] S. Young, "The HTK hidden Markov model toolkit: Design and philosophy," Cambridge
University Engineering Department, UK, Tech. Rep. CUED/F-INFENG/TR152, Sept. 1994.
[2] N. Deshmukh, A. Ganapathiraju, J. Hamaker, J. Picone, and M. Ordowski, "A public domain
speech-to-text system," in Proceedings of the 6th European Conference on Speech
Communication and Technology, vol. 5, Budapest, Hungary, Sept. 1999, pp. 2127-2130.
[3] X. X. Li, Y. Zhao, X. Pi, L. H. Liang, and A. V. Nefian, "Audio-visual continuous speech
recognition using a coupled hidden Markov model," in Proceedings of the 7th International
Conference on Spoken Language Processing, Denver, CO, Sept. 2002, pp. 213-216.
[4] K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition
system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1,
pp. 35-45, Jan. 1990.
[5] X. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, and R. Rosenfeld, "The SPHINX-II speech
recognition system: an overview," Computer Speech and Language, vol. 7, no. 2,
pp. 137-148, 1993.
[6] M. K. Ravishankar, "Efficient algorithms for speech recognition," PhD Thesis (CMU
Technical Report CS-96-143), Carnegie Mellon University, Pittsburgh, PA, 1996.
[7] P. Lamere, P. Kwok, W. Walker, E. Gouvea, R. Singh, B. Raj, and P. Wolf, "Design of the
CMU Sphinx-4 decoder," in Proceedings of the 8th European Conference on Speech
Communication and Technology, Geneve, Switzerland, Sept. 2003, pp. 1181-1184.
[8] J. K. Baker, "The Dragon system - an overview," IEEE Transactions on Acoustic, Speech
and Signal Processing, vol. 23, no. 1, Feb. 1975, pp. 24-29.
[9] B. T. Lowerre, "The Harpy speech recognition system," Ph.D. dissertation, Carnegie Mellon
University, Pittsburgh, PA, 1976.
[10] J. K. Baker, "Stochastic modeling for automatic speech understanding," in Speech
Recognition, R. Reddy, Ed. New York: Academic Press, 1975, pp. 521-542.
[11] P. Placeway, S. Chen, M. Eskenazi, U. Jain, V. Parikh, B. Raj, M. Ravishankar, R.
Rosenfeld, K. Seymore, M. Siegler, R. Stern, and E. Thayer, "The 1996 HUB-4 Sphinx-3
system," in Proceedings of the DARPA Speech Recognition Workshop. Chantilly, VA: DARPA,
Feb. 1997. [Online]. Available:
http://www.nist.gov/speech/publications/darpa97/pdf/placewa1.pdf
[12] M. Ravishankar, "Some results on search complexity vs accuracy," in Proceedings of the
DARPA Speech Recognition Workshop. Chantilly, VA: DARPA, Feb. 1997. [Online]. Available:
http://www.nist.gov/speech/publications/darpa97/pdf/ravisha1.pdf
[13] F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998.
[14] X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang, and M. Mahajan, "From SPHINX-II to
Whisper: Making speech recognition usable," in Automatic Speech and Speaker Recognition,
Advanced Topics, C. Lee, F. Soong, and K. Paliwal, Eds. Norwell, MA: Kluwer Academic
Publishers, 1996.
[15] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for
monosyllabic word recognition in continuously spoken sentences," IEEE Transactions on
Acoustic, Speech and Signal Processing, vol. 28, no. 4, Aug. 1980.
[16] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the
Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
[17] NIST. Speech recognition scoring package (score). [Online]. Available:
http://www.nist.gov/speech/tools
[18] G. D. Forney, "The Viterbi algorithm," Proceedings of The IEEE, vol. 61, no. 3,
pp. 268-278, 1973.
[19] P. Kenny, R. Hollan, V. Gupta, M. Lenning, P. Mermelstein, and D. O'Shaughnessy,
"A*-admissible heuristics for rapid lexical access," IEEE Transactions on Speech and Audio
Processing, vol. 1, no. 1, pp. 49-59, Jan. 1993.
[20] Java speech API grammar format (JSGF). [Online]. Available:
http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/
[21] M. Mohri, "Finite-state transducers in language and speech processing," Computational
Linguistics, vol. 23, no. 2, pp. 269-311, 1997.
[22] P. Clarkson and R. Rosenfeld, "Statistical language modeling using the CMU-Cambridge
toolkit," in Proceedings of the 5th European Conference on Speech Communication and
Technology, Rhodes, Greece, Sept. 1997.
[23] Carnegie Mellon University. CMU pronouncing dictionary. [Online]. Available:
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
[24] S. J. Young, N. H. Russell, and J. H. S. Russell, "Token passing: A simple conceptual
model for connected speech recognition systems," Cambridge University Engineering Dept, UK,
Tech. Rep. CUED/F-INFENG/TR38, 1989.
[25] R. Singh, M. Warmuth, B. Raj, and P. Lamere, "Classification with free energy at raised
temperatures," in Proceedings of the 8th European Conference on Speech Communication and
Technology, Geneve, Switzerland, Sept. 2003, pp. 1773-1776.
[26] P. Kwok, "A technique for the integration of multiple parallel feature streams in the
Sphinx-4 speech recognition system," Masters Thesis (Sun Labs TR-2003-0341), Harvard
University, Cambridge, MA, June 2003.
[27] P. Price, W. M. Fisher, J. Bernstein, and D. S. Pallett, "The DARPA 1000-word resource
management database for continuous speech recognition," in Proceedings of the International
Conference on Acoustics, Speech and Signal Processing, vol. 1. IEEE, 1988, pp. 651-654.
[28] G. R. Doddington and T. B. Schalk, "Speech recognition: Turning theory to practice,"
IEEE Spectrum, vol. 18, no. 9, pp. 26-32, Sept. 1981.
[29] R. G. Leonard and G. R. Doddington, "A database for speaker-independent digit
recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal
Processing, vol. 3. IEEE, 1984, p. 42.11.
[30] J. Garofolo, E. Voorhees, C. Auzanne, V. Stanford, and B. Lund, "Design and preparation
of the 1996 HUB-4 broadcast news benchmark test corpora," in Proceedings of the DARPA
Speech Recognition Workshop. Chantilly, Virginia: Morgan Kaufmann, Feb. 1997, pp. 15-21.
[31] (2003, Mar.) Sphinx-4 trainer design. [Online]. Available:
http://www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/TrainerDesign
[32] J. R. Glass, "A probabilistic framework for segment-based speech recognition," Computer
Speech and Language, vol. 17, no. 2, pp. 137-152, Apr. 2003.