
    VOICE RESPONSE SYSTEM

    ABSTRACT

    A voice response system is a computer system that responds to voice commands, rather

    than input from a keystroke or a mouse. Uses for this kind of system range from convenience to

    necessity to security. People who are visually or otherwise physically impaired are prime

    candidates for a voice response system. Because they cannot see or otherwise access a keyboard

    or mouse, they have no way to access a computer without a voice response system, unless they

    want to depend entirely on other people. Being able literally to tell a computer what to do may be

    a revelation for someone who ordinarily has little hope of controlling a computer. A voice

    response system would also come in handy for someone who is not physically impaired. With a

    voice response system, you wouldn't need to be very close to your computer in order to access it

    or give it commands. As long as you are in earshot of the PC, it can use its voice response system

    to accept voice commands from you in the same way that it traditionally accepts keystroke and

    mouse commands.

    The system acquires speech at run time through a microphone and processes the sampled

    speech to recognize the uttered text. Sphinx-4 is a speech recognition system written entirely in

the Java(TM) programming language. A VRS is an intelligent system which enables the user to instruct the computer to perform actions through voice commands and also to form his own repository of commands and map them to appropriate actions. The recognized text will be matched to the corresponding action.


    CONTENTS

    1. Introduction

    2. Voice recognition

Relevance of the Project

    Application of voice recognition

    3. Working of the Project

    Speech Engine

    JSAPI

    JSAPI classes and interfaces

    Speech Synthesis

    Speech Recognition

    Components

    Speech Recognition Weakness & Flaws

    Future of Speech Recognition

    JSGF Grammar Format

    Sphinx Speech Recognition System

    4. Feasibility Study & Requirement Analysis

    5. System Analysis & System Design

    6. Data flow diagram

    Context diagram

    Level 1

    Level 2

7. Code Snippets

8. Results and Screenshots

9. Discussion

10. Conclusion

11. Bibliography


    Chapter 1

    INTRODUCTION

A VRS is an intelligent system which enables the user to instruct the computer to perform actions through voice commands and also to form his own repository of commands and map them to appropriate actions.

    A voice response system is a computer system that responds to voice commands, rather

    than input from a keystroke or a mouse. Uses for this kind of system range from convenience to

necessity to security. People who are visually or otherwise physically impaired are prime candidates for a voice response system. Because they cannot see or otherwise access a keyboard

    or mouse, they have no way to access a computer without a voice response system, unless they

    want to depend entirely on other people. Being able literally to tell a computer what to do may be

    a revelation for someone who ordinarily has little hope of controlling a computer. A voice

    response system would also come in handy for someone who is not physically impaired. With a

    voice response system, you wouldn't need to be very close to your computer in order to access it

    or give it commands. As long as you are in earshot of the PC, it can use its voice response system

    to accept voice commands from you in the same way that it traditionally accepts keystroke and

    mouse commands.

    Key points that outline the implemented idea are:

    VRS runs as a background process.

    Based on the instruction, multiple processes are created.

    While the background process keeps on listening to the user requirements,

    independent processes are continuously created in response to the input voice

    instruction.

Voice recognition could also be enabled in the processes executed on top, but this has been avoided as it interferes with the background process.


A VRS library has been built which includes some basic commands.

1. DATA FILE - Opens a list of saved files that may be read.

2. SONGS - Opens a list of songs that may be played.

3. MOVIES - Opens a list of movies that may be played.

4. NEWS - Reads the news from a given website.

5. SNAP - Opens a picture.

    The library may be further extended by the user for his own specific requirements. User.gram

    has been included in the src along with directions to add an action map for this purpose.
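As a small illustration, a hypothetical fragment of such a grammar file is shown below (the actual contents of User.gram are not reproduced in this document, so the rule and command names here are assumptions). A new command is added as another alternative in the rule and then mapped to an action in the launcher code:

#JSGF V1.0;

grammar user;

// Hypothetical top-level rule listing the built-in commands.
// A new command word would be added as another alternative here.
public <userCommand> = data file | songs | movies | news | snap;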

    Technologies used in implementation:

    Sphinx 4.

    JSAPI.

    Java Programming Language.

JSGF grammar files.

The relevance and use of each of the above are discussed later in the document. The

    code has been developed in Eclipse. The paths used in mapping actions are absolute and hence

    system dependent.

    The requirement of this project is to develop an intelligent system which:

    1. is capable of taking voice input.

    2. interprets the input command.

    3. processes the command to map it to the action set.

4. maintains an action set containing the mapping of each input to its corresponding response.

5. has an adaptive mechanism to handle more mappings and add them to the action set.

6. Example: the voice input "draw circle" results in a circle being drawn on the screen.


    Chapter 2

    VOICE RECOGNITION

    The term voice recognition is sometimes used to refer to recognition systems that must be

trained to a particular speaker, as is the case for most desktop recognition software.

    1. Voice Recognition: Converts speech to text.

    2. Recognizing the speaker can simplify the task of translating speech.

3. Voice recognition aims to generalize the task without being targeted at a single speaker.

Relevance to the Project

1. Voice recognition is used to map a voice command to its corresponding action. This is brought about by converting speech to text.

2. The API used for recognizing voice is trained, by default, to understand an American male accent sampled at 16 kHz.

    3. The program matches the input voice with the voice on which it is trained and maps it to the

    best possible result.

Although the idea of recognizing voice may seem fairly simple, there are a lot of real-time problems. Some include:

A large amount of memory is required to store voice files.

Noise interference reduces accuracy.

Comparing the user's accent with the trained voice often gives rise to absurd results.

The precision of the system is directly proportional to the complexity of the source code.

    APPLICATIONS OF VOICE RECOGNITION

    Health Care

    In the health care domain, even in the wake of improving speech recognition technologies,

    medical transcriptionists (MTs) have not yet become obsolete. The services provided may be


redistributed rather than replaced. Speech recognition is also used to enable deaf people to follow the spoken word via speech-to-text conversion.

    Military

    Substantial efforts have been devoted in the last decade to the test and evaluation of speech

    recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for

    the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16VISTA), the program in

    France on installing speech recognition systems on Mirage aircraft, and programs in the UK

    dealing with a variety of aircraft platforms. In these programs, speech recognizers have been

    operated successfully in fighter aircraft with applications including: setting radio frequencies,

    commanding an autopilot system, setting steer-point coordinates and weapons release

    parameters, and controlling flight displays. Generally, only very limited, constrained

    vocabularies have been used successfully, and a major effort has been devoted to integration of

    the speech recognizer with the avionics system.

    Telephony and other domains

    ASR in the field of telephony is now commonplace and in the field of computer gaming and

    simulation is becoming more widespread. Despite the high level of integration with word

    processing in general personal computing, however, ASR in the field of document production

    has not seen the expected increases in use. The improvement of mobile processor speeds made

    feasible the speech-enabled Symbian and Windows Mobile Smartphones. Speech is used mostly

as a part of the user interface, for creating pre-defined or custom speech commands. Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator), and SVOX.


    People with disabilities

    People with disabilities can benefit from speech recognition programs. Speech recognition is

    especially useful for people who have difficulty using their hands, ranging from mild repetitive

    stress injuries to involved disabilities that preclude using conventional computer input devices.

    In fact, people who used the keyboard a lot and developed RSI became an urgent early market

    for speech recognition. Speech recognition is used in deaf telephony, such as voicemail to text,

    relay services, and captioned telephone. Individuals with learning disabilities who have problems

    with thought-to-paper communication (essentially they think of an idea but it is processed

    incorrectly causing it to end up differently on paper) can benefit from the software.

    Home Automation

Luxury being the priority, such a program also finds application in home automation. Home

    automation may include centralized control of lighting, heating, ventilation, air conditioning and

    other systems, to provide improved convenience, comfort, energy efficiency and security.

    Transcription

    Transcription in the linguistic sense is the conversion of a representation of language into

    another representation of language, usually in the same language but in a different form.

    Transcription should not be confused with translation, which in linguistics usually means

    converting from one language to another, such as from English to Spanish. The most common

    type of transcription is from a spoken-language source into text.


    Chapter 3

    WORKING OF THE PROJECT

    In this chapter, we will cover all the elements required for the working of the project and then

    converge the requirements to explain the solution design implemented for the project.

    Speech Engine

    The Speech Engine loads a list of words to be recognized. This list of words is called a grammar.

It takes as input distinct characteristics of sound derived from the waveform and compares them with its own acoustic model. The engine searches its acoustic space, using the grammar to guide this search. It then determines which words in the grammar the audio most closely matches and returns a result.

    Speech Engine


    Java Speech API/JSAPI

    The Java Speech API (JSAPI) is an application programming interface for cross-

    platform support of command and control recognizers, dictation systems, and speech

synthesizers. Although JSAPI defines only an interface, there are several implementations created by third parties, for example FreeTTS.

    The Java Speech API enables speech applications to interact with speech engines in a

    common, standardized, and implementation-independent manner. Speech engines from different

    vendors can be accessed using the Java Speech API, as long as they are JSAPI-compliant. With

    JSAPI, speech applications can use speech engine functionality such as selecting a specific

    language or a voice, as well as any required audio resources. JSAPI provides an API for both

    speech synthesis and speech recognition.


The Java Speech API's classes and interfaces

    The different classes and interfaces that form the Java Speech API are grouped into the following

    three packages:

    javax.speech: Contains classes and interfaces for a generic speech engine.

    javax.speech.synthesis: Contains classes and interfaces for speech synthesis.

    javax.speech.recognition: Contains classes and interfaces for speech recognition.

    The Central class is like a factory class that all Java Speech API applications use. It provides

    static methods to enable the access of speech synthesis and speech recognition engines. The

    Engine interface encapsulates the generic operations that a Java Speech API-compliant speech

    engine should provide for speech applications.

    Speech applications can primarily use methods to perform actions such as retrieving the

    properties and state of the speech engine and allocating and deallocating resources for a speech

    engine. In addition, the Engine interface exposes mechanisms to pause and resume the audio


    stream generated or processed by the speech engine. The Engine interface is subclassed by the

    Synthesizer and Recognizer interfaces, which define additional speech synthesis and speech

    recognition functionality. The Synthesizer interface encapsulates the operations that a Java

    Speech API-compliant speech synthesis engine should provide for speech applications.

    The Java Speech API is based on the event-handling model of AWT components. Events

    generated by the speech engine can be identified and handled as required. There are two ways to

    handle speech engine events: through the EngineListener interface or through the EngineAdapter

    class.
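As a minimal sketch of the engine lifecycle described above (assuming a JSAPI-compliant synthesizer such as FreeTTS is installed and registered), an application obtains an engine from Central, allocates it, uses it, and deallocates it:

import java.util.Locale;
import javax.speech.Central;
import javax.speech.synthesis.Synthesizer;
import javax.speech.synthesis.SynthesizerModeDesc;

public class EngineLifecycleSketch {
    public static void main(String[] args) throws Exception {
        // Ask the Central factory for a synthesizer matching a US English mode.
        Synthesizer synth = Central.createSynthesizer(new SynthesizerModeDesc(Locale.US));
        synth.allocate();                 // acquire engine resources
        synth.resume();                   // ensure the audio stream is not paused
        synth.speakPlainText("Voice response system ready", null);
        synth.waitEngineState(Synthesizer.QUEUE_EMPTY);
        synth.deallocate();               // release engine resources
    }
}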

    JSAPI STACK


    Features:

    Converts speech to text.

Converts text and delivers it in various formats of speech.

Supports events based on the Java event queue.

An easy-to-implement API that interoperates with multiple Java-based applications such as applets and Swing applications.

    Interacts seamlessly with the AWT event queue.

    Supports annotations using JSML to improve pronunciation and naturalness in speech.

    Supports grammar definitions using JSGF.

    Ability to adapt to the language of the speaker.

    Two core speech technologies are supported through the Java Speech API: speech

synthesis and speech recognition.

    Speech synthesis

    Speech synthesis provides the reverse process of producing synthetic speech from text

    generated by an application, an applet, or a user. It is often referred to as text-to-speech

    technology.

    The major steps in producing speech from text are as follows:

    Structure analysis: Processes the input text to determine where paragraphs, sentences, and

    other structures start and end. For most languages, punctuation and formatting data are used

    in this stage.

    Text pre-processing: Analyzes the input text for special constructs of the language. In

    English, special treatment is required for abbreviations, acronyms, dates, times, numbers,

    currency amounts, e-mail addresses, and many other forms. Other languages need special

processing for these forms, and most languages have other specialized requirements.


    The remaining steps convert the spoken text to speech:

    Text-to-phoneme conversion: Converts each word to phonemes. A phoneme is a basic unit

    of sound in a language.

    Prosody analysis: Processes the sentence structure, words, and phonemes to determine the

    appropriate prosody for the sentence.

    Waveform production: Uses the phonemes and prosody information to produce the audio

    waveform for each sentence.

    Speech synthesizers can make errors in any of the processing steps described above.

    Human ears are well-tuned to detecting these errors, but careful work by developers can

    minimize errors and improve the speech output quality. The Java Speech API and the Java

Speech API Markup Language (JSML) provide many ways for you to improve the output quality

    of a speech synthesizer.
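A minimal synthesis sketch using FreeTTS (the JSAPI implementation mentioned earlier; the voice name "kevin16" is the 16 kHz voice bundled with FreeTTS, and the FreeTTS jars are assumed to be on the classpath):

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class SpeakSketch {
    public static void main(String[] args) {
        // Look up a bundled voice by name.
        Voice voice = VoiceManager.getInstance().getVoice("kevin16");
        if (voice == null) {
            System.err.println("Voice not found; check the FreeTTS jars on the classpath.");
            return;
        }
        voice.allocate();                 // load the voice's resources
        voice.speak("Hello from the voice response system.");
        voice.deallocate();               // release the voice's resources
    }
}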

    Speech Recognition

    Speech recognition provides computers with the ability to listen to spoken language and

    determine what has been said. In other words, it processes audio input containing speech by

    converting it to text.

    Speech Recognition System

    Components:

With the help of a microphone, audio is input to the system; the PC sound card produces the equivalent digital representation of the received audio.

Digitization

The process of converting the analog signal into a digital form is known as digitization; it involves both sampling and quantization. Sampling converts a continuous signal into a discrete signal, while quantization is the process of approximating a continuous range of values.
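A minimal capture sketch using the standard Java Sound API (the format parameters are an assumption chosen to match a typical recognizer input: 16 kHz, 16-bit, mono, signed, little-endian):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class CaptureSketch {
    public static void main(String[] args) throws Exception {
        // 16 kHz sample rate, 16 bits per sample, 1 channel, signed, little-endian.
        AudioFormat format = new AudioFormat(16000.0f, 16, 1, true, false);
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();
        byte[] buffer = new byte[3200];   // 100 ms of audio at this rate
        int read = line.read(buffer, 0, buffer.length);
        System.out.println("Captured " + read + " bytes of digitized speech");
        line.stop();
        line.close();
    }
}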


    SPEECH RECOGNITION SYSTEM

    Acoustic Model

    An acoustic model is created by taking audio recordings of speech, and their text transcriptions,

    and using software to create statistical representations of the sounds that make up each word. It is

    used by a speech recognition engine to recognize speech .The software acoustic model breaks the

    words into the phonemes.

    Language Model

Language modeling is used in many natural language processing applications; in speech recognition, it tries to capture the properties of a language and to predict the next word in the speech sequence. The software language model compares the phonemes to words in its built-in dictionary.

    Speech engine

The job of the speech recognition engine is to convert the input audio into text; to accomplish this it uses all sorts of data, software algorithms, and statistics. Its first operation is digitization, as discussed earlier: converting the audio into a format suitable for further processing. Once the audio signal is in the proper format, the engine searches for the best match by considering the words it knows; once the signal is recognized, it returns the corresponding text string.

    The major steps of a typical speech recognizer are as follows:

    Grammar design: Defines the words that may be spoken by a user and the patterns in

    which they may be spoken.

    Signal processing: Analyzes the spectrum (the frequency) characteristics of the incoming

    audio.

    Phoneme recognition: Compares the spectrum patterns to the patterns of the phonemes

    of the language being recognized.

    Word recognition: Compares the sequence of likely phonemes against the words and

    patterns of words specified by the active grammars.

    Result generation: Provides the application with information about the words the

recognizer has detected in the incoming audio. The result information is always provided once recognition of a single utterance (often a sentence) is complete, but may also be

    provided during the recognition process. The result always indicates the recognizer's best

    guess of what a user said, but may also indicate alternative guesses.


A grammar is an object in the Java Speech API that indicates what words a user is

    expected to say and in what patterns those words may occur. Grammars are important to speech

    recognizers because they constrain the recognition process. These constraints make recognition

    faster and more accurate because the recognizer does not have to check for bizarre sentences.

    The Java Speech API supports two basic grammar types: rule grammars and dictation

    grammars. These types differ in various ways, including how applications set up the grammars;

    the types of sentences they allow; how results are provided; the amount of computational

    resources required; and how they are used in application design. Rule grammars are defined

    by JSGF, the Java Speech Grammar Format.
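A minimal recognition sketch using the JSAPI rule-grammar path (assuming a JSAPI-compliant recognizer is installed; the file name commands.gram is a hypothetical JSGF grammar):

import java.io.FileReader;
import java.util.Locale;
import javax.speech.Central;
import javax.speech.recognition.EngineModeDesc;
import javax.speech.recognition.Recognizer;
import javax.speech.recognition.Result;
import javax.speech.recognition.ResultAdapter;
import javax.speech.recognition.ResultEvent;
import javax.speech.recognition.ResultToken;
import javax.speech.recognition.RuleGrammar;

public class CommandListenerSketch {
    public static void main(String[] args) throws Exception {
        Recognizer rec = Central.createRecognizer(new EngineModeDesc(Locale.US));
        rec.allocate();
        // Constrain recognition with a JSGF rule grammar.
        RuleGrammar grammar = rec.loadJSGF(new FileReader("commands.gram"));
        grammar.setEnabled(true);
        rec.addResultListener(new ResultAdapter() {
            public void resultAccepted(ResultEvent e) {
                // Print the recognizer's best guess for the finished utterance.
                Result result = (Result) e.getSource();
                StringBuilder said = new StringBuilder();
                for (ResultToken token : result.getBestTokens()) {
                    said.append(token.getSpokenText()).append(' ');
                }
                System.out.println("Heard: " + said.toString().trim());
            }
        });
        rec.commitChanges();   // apply the grammar change
        rec.requestFocus();    // request the speech focus
        rec.resume();          // start listening
    }
}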

    Speech Recognition Workflow

    Speech Recognition weakness and flaws

Despite all these advantages and benefits, a hundred-percent-perfect speech recognition system has yet to be developed. There are a number of factors that can reduce the accuracy and performance of a speech recognition program.

Speech recognition is easy for a human but a difficult task for a machine. Compared with a human mind, speech recognition programs seem less intelligent: for a human, the capability of thinking, understanding, and reacting is natural, while for a computer program it is a complicated task. The program first needs to understand the spoken words with respect to their meanings, and it has to strike a sufficient balance between words, noise, and pauses. A human has a built-in capability of filtering noise from speech, while a machine requires training; a computer requires help in separating the speech sound from other sounds.

A few factors that are considerable in this regard are:

    Homonyms:

Words that are spelled differently and have different meanings but sound the same, for example "there" and "their", or "be" and "bee". It is a challenge for a machine to distinguish between such phrases that sound alike.

    Overlapping speeches:

A second challenge in the process is to understand speech uttered by different users; current systems have difficulty separating simultaneous speech from multiple users.

    Noise factor:

    The program requires hearing the words uttered by a human distinctly and clearly. Any extra

sound can create interference; the system should first be placed away from noisy environments, and the user should then speak clearly, or else the machine will get confused and mix up the words.

    The Future Of Speech Recognition:

    Accuracy will become better and better.

    Dictation speech recognition will gradually become accepted.

Greater use will be made of intelligent systems which will attempt to guess what the

    speaker intended to say, rather than what was actually said, as people often misspeak and

    make unintentional mistakes.


Microphone and sound systems will be designed to adapt more quickly to changing background noise levels and different environments, with better recognition and discarding of extraneous material.

    JSGF Grammar Format

    Speech recognition systems provide computers with the ability to listen to user speech and

    determine what is said. Current technology does not yet support unconstrained speech

    recognition: the ability to listen to any speech in any context and transcribe it accurately. To

    achieve reasonable recognition accuracy and response time, current speech recognizers constrain

    what they listen for by using grammars.

    The Java Speech Grammar Format (JSGF) defines a platform-independent, vendor-independent

    way of describing one type of grammar, a rule grammar (also known as a command and control

    grammar or regular grammar). It uses a textual representation that is readable and editable by

    both developers and computers, and can be included in Java source code. The other major

    grammar type, the dictation grammar, is not discussed in this document.

    A rule grammar specifies the types of utterances a user might say (a spoken utterance is similar

    to a written sentence). For example, a simple window control grammar might listen for "open a

    file", "close the window", and similar commands.

    What the user can say depends upon the context: is the user controlling an email application,

    reading a credit card number, or selecting a font? Applications know the context, so applications

    are responsible for providing a speech recognizer with appropriate grammars.

    This document is the specification for the Java Speech Grammar Format. First, the basic naming

    and structural mechanisms are described. Following that, the basic components of the grammar,

    the grammar header and the grammar body, are described. The grammar header declares the

    grammar name and lists the imported rules and grammars. The grammar body defines the rules

    of this grammar as combinations of speakable text and references to other rules. Finally, some


    simple examples of grammar declarations are provided. Grammars are used by speech

    recognizers to determine what the recognizer should listen for, and so describe the utterances a

    user may say.

A Java Speech Grammar Format document starts with a self-identifying header. This header identifies that the document contains JSGF and indicates the version of JSGF being used (currently V1.0): #JSGF V1.0;. The grammar body defines rules. Each rule is defined in a rule definition, and a rule is defined only once in a grammar. The order of definition of rules is not significant. The two forms of a rule definition are <ruleName> = ruleExpansion; and public <ruleName> = ruleExpansion;.
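Putting the pieces together, a small example grammar in JSGF might look like this (a hypothetical grammar written for illustration, following the rules above):

#JSGF V1.0;

grammar com.example.windowControl;

// Public rule: what the user may say at the top level.
public <command> = <action> [the] <object>;

// Private rules referenced by the public rule.
<action> = open | close | minimize;
<object> = file | window | menu;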

    Sphinx Speech Recognition System

    Sphinx-4 is a speech recognition system written entirely in the Java(TM) programming language.

Sphinx is a continuous-speech, speaker-independent recognition system with large-vocabulary recognition, making use of hidden Markov model (HMM) acoustic models and an n-gram statistical language model.

    Each component of the architecture is explained below:

    Recognizer- Contains the main components of Sphinx-4, which are the front end, the linguist,

    and the decoder. The application interacts with the Sphinx-4 system mainly via the Recognizer.

    Audio - The data to be decoded. This is audio in most systems, but it can also be configured to

    accept other forms of data, e.g., spectral or cepstral data.

    Front End- Performs digital signal processing (DSP) on the incoming data.

Feature- The output of the front end is a set of features, which are used for decoding in the rest of the system.


    Linguist- Embodies the linguistic knowledge of the system, which are the acoustic model, the

    dictionary, and the language model. The linguist produces a search graph structure on which the

    search manager performs search using different algorithms.


    Sphinx-4 Architecture

    Acoustic Model- Contains a representation (often statistical) of a sound, often created by

    training using lots of acoustic data


    Dictionary- Responsible for determining how a word is pronounced.

    Language Model- Contains a representation (often statistical) of the probability of

    occurrence of words.

    Search Graph- The graph structure produced by the linguist according to certain criteria

    (e.g., the grammar), using knowledge from the dictionary, the acoustic model, and the

    language model.

    Decoder- Contains the search manager.

Search Manager- Performs the search using a certain algorithm, e.g., breadth-first search, best-first search, depth-first search, etc. Also contains the feature scorer and the pruner.

    Active List- A list of tokens representing all the states in the search graph that are active

    in the current feature frame.

    Scorer- Scores the current feature frame against all the active states in the Active List.

    Pruner- Prunes the active list according to certain strategies.

    Result- The decoded result, which usually contains the N-best results.

    Configuration Manager- loads the Sphinx-4 configuration data from an XML based

    file, and manages the component life cycle for objects.
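A minimal sketch of how an application drives these components (the classic Sphinx-4 usage pattern; config.xml is a hypothetical Sphinx-4 XML configuration file defining "recognizer" and "microphone" components):

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class SphinxSketch {
    public static void main(String[] args) {
        // The ConfigurationManager wires up the front end, linguist, and decoder.
        ConfigurationManager cm =
                new ConfigurationManager(SphinxSketch.class.getResource("config.xml"));
        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        microphone.startRecording();

        while (true) {
            // recognize() blocks until an utterance has been decoded.
            Result result = recognizer.recognize();
            if (result == null) {
                break;
            }
            System.out.println("You said: " + result.getBestFinalResultNoFiller());
        }
    }
}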


The need for Sphinx-4:

Need to overcome Sphinx-3's limitations

Need for flexibility in acoustic modeling

Need to handle multimodal inputs, with information fusion at various levels

Need for more correct decoders

Need for expansion of language model capabilities

Need to facilitate the incorporation of several new online algorithms that are currently difficult to incorporate into Sphinx-3

Need for better application interfaces

The SPHINX of the new millennium:

An open-source project by Carnegie Mellon University, Sun Microsystems Inc., and MERL

Written entirely in Java(TM)

Highly modularized and flexible architecture

Supports any acoustic model structure

Supports most types of language models: CFGs, N-grams, combinations

New algorithms for obtaining word-level hypotheses

Multimodal inputs

Flexible APIs


    Recognition Issue:

Good voice data is the key to good recognition! The quality of recognition is directly related to the quality of the voice data. As part of the Sphinx-4 project, a trainer will be developed to give us good voice data.

    How does a Recognizer Work?


    Goal:

    Audio goes in

    Results come out

Three application types:

Isolated words

Command / Control

General dictation

    Front-End:

    Transforms speech waveform into features used by recognition

    Features are sets of mel-frequency cepstrum coefficients (MFCC)

MFCCs model the human auditory system

    Front-End is a set of signal processing filters

    Pluggable architecture


    Knowledge Base:

    The data that drives the decoder

    Consists of three sets of data:

    Dictionary

    Acoustic Model

    Language Model

    Needs to scale between the three application types

    DICTIONARY:

    Maps words to pronunciations

Provides word classification information (such as part-of-speech)

    Single word may have multiple pronunciations

    Pronunciations represented as phones or other units

    Can vary in size from a dozen words to >100,000 words
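For illustration, dictionary entries in the style of the CMU Pronouncing Dictionary used by Sphinx might look like the following (illustrative entries, not taken from the project's actual dictionary); note the second, alternate pronunciation of READ:

HELLO      HH AH L OW
READ       R EH D
READ(2)    R IY D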


    Language Model:

    Describes what is likely to be spoken in a particular context

Uses a stochastic approach. Word transitions are defined in terms of transition probabilities.

    Helps to constrain the search space
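As a toy illustration of the stochastic approach (not the project's actual model), a bigram probability can be estimated from counts as P(w2 | w1) = count(w1 w2) / count(w1):

import java.util.HashMap;
import java.util.Map;

public class BigramSketch {
    public static void main(String[] args) {
        String[] corpus = "open the file close the file open the window".split(" ");
        Map<String, Integer> unigram = new HashMap<>();
        Map<String, Integer> bigram = new HashMap<>();
        for (int i = 0; i < corpus.length; i++) {
            unigram.merge(corpus[i], 1, Integer::sum);       // count single words
            if (i + 1 < corpus.length) {
                bigram.merge(corpus[i] + " " + corpus[i + 1], 1, Integer::sum);
            }
        }
        // P(file | the) = count("the file") / count("the") = 2 / 3
        double p = (double) bigram.get("the file") / unigram.get("the");
        System.out.println("P(file | the) = " + p);
    }
}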

    Acoustic Models:

    Database of statistical models

Each statistical model represents a single unit of speech such as a word or phoneme

    Acoustic Models are created/trained by analyzing large corpora of labeled speech

    Acoustic Models can be speaker dependent or speaker independent


    Chapter 4

    FEASIBILITY STUDY, REQUIREMENT ANALYSIS

    SOFTWARE DEVELOPMENT LIFE CYCLE

Since the inception of this project, all software engineering principles have been followed. The project has passed through all the stages of the software development life cycle (SDLC). A development process consists of various phases, each phase ending with a defined output. The main reason for following the SDLC process is that it breaks the problem of developing software into a set of successively performed phases, each phase handling a different concern of software development. Object technologies lead to reuse, and reuse (of program components) leads to faster software development and higher-quality programs. Object-oriented software is easy to maintain because its structure is inherently decoupled. In addition, object-oriented systems are easier to adapt and easier to scale. The object-oriented process moves through an evolutionary spiral that starts with customer communication. It is here that the problem domain is defined and basic problem classes are identified. Planning establishes a foundation for the object-oriented project plan.

FEASIBILITY STUDY

It is feasible because such systems are already frequently used in various areas like the military, telephony, and healthcare. Voice recognition is also used by top industries for the recognition of their employees in their attendance process. So the project is feasible and can be completed in the given period. A real-time voice recognition security system can be developed using different algorithms.

    THREE PHASES OF FEASIBILITY STUDY

    Technical Feasibility:

    It involves determining whether or not a system can actually be constructed to solve the problem

    at hand. The technical issues raised during the feasibility stage of investigation are related to

the achievability of the project's goals and the possibility of completing the project.


    Economical Feasibility:

    This feasibility deals with the cost/benefit analysis. A number of intangible benefits like user

    friendliness, robustness and security were pointed out. The cost that will be incurred upon the

    implementation of this project would be quite nominal.

    Operational Feasibility:

    The developed system will be very reliable and user friendly. All the features and operations that

    we will implement in our project are possible to implement and thus feasible. This will facilitate

easy use and adoptability of the system. With the use of menus and the required validation, it becomes fully understandable to, and operable by, the common user.

    STEPS INVOLVED IN THE FEASIBILITY ANALYSIS

    Feasibility is carried out in the following steps:

    Form a project team and appoint a project leader:

First of all, the project management of the organization forms a separate team for each independent project; the team comprises one or more system analysts and programmers, with a project leader. The project

    leader is responsible for planning and managing the development activities of the system.

    Starts preliminary investigation:

The system analyst of each project team starts the preliminary investigation using different fact-finding techniques.

    Prepare the current system flow chart:

After the preliminary investigation, the analysts prepare the system flowchart of the current system. These charts describe the general working of the system in a graphical way.

    Determine objective of the proposed system:

The major objectives of the proposed system are listed by each analyst and are discussed with reference to the current system.


Describe the deficiencies of the current system:

On studying the current system flowchart, the analysts note the deficiencies of the current system that the proposed system must address.

    Prepare the proposed system flow chart:

After determining the major objectives of the proposed system, the analysts prepare their system

flowchart. System flowcharts of the proposed system are compared with those of the current system.

    Determining the technical feasibility:

    The existing computer systems (hardware/software) of the concerned department are identified

    and their technical specifications are noted down. The analyst decides whether the existing

systems are sufficient for the technical requirements of the proposed system or not.

    Determine the operational feasibility:

After determining the economic feasibility, the analysts identify the responsible users of the system

    and hence determine the operational feasibility of the project.

    Presentation of feasibility analysis:

During the feasibility study, the analysts also keep working on the feasibility report. At the end, the feasibility analysis report is given to the management along with an oral presentation.

    Feasibility Analysis report:

The feasibility analysis report is a formal document for management use, prepared by the system analyst during or after the feasibility study. This report generally contains the following sections:

    Covering letter:

It formally presents the report, with a brief description of the project problem along with the recommendations to be considered.


Table of contents:

It lists the sections of the feasibility study report along with their page numbers.

    Description of the existing system:

    A brief description of the existing system along with the purpose and scope of the project.

    System requirement:

    The system requirements, which are either derived from the existing system or from the

    discussion with the users, are presented in this section.

    Description of proposed system:

It presents a general description of the proposed system, highlighting its role in solving the problem. A description of the output reports to be generated by the system is also presented in the

    desired formats.

    Development plan:

It presents a detailed plan with the starting and completion dates for the different phases of the SDLC. Complementary plans are also needed for hardware and software evaluation, purchase, and installation.

    Technical feasibility finding:

It presents the findings of the technical feasibility study along with recommendations.

    Costs and benefits:

The detailed findings of the cost and benefit analysis are presented in this section. The savings and benefits are highlighted to justify the economic feasibility of the project.

Operational feasibility finding:

It presents the findings of the operational feasibility study along with the human resource requirements to implement the system.


    REQUIREMENT ANALYSIS

    A requirement is a condition or capability that must be met or possessed by a system to satisfy a

    contract, standard, specification or other formally imposed specification of the client. This phase

ends with the Software Requirements Specification (SRS). The SRS is a document that

    completely describes what the proposed software should do without describing how the software

    will do it.

SOFTWARE REQUIREMENTS SPECIFICATION

System analysis is a technique for carrying out system requirements and project management using structured analysis, for specifying both manual and automated systems. In system analysis the focus is on inquiry into the current organizational environment, defining the system requirements, making recommendations for system improvement, and determining the feasibility of the system.

    Analysis Methodology:

A complete understanding of the requirements is essential for the success of a project. This is achieved by gathering information; the approach and manner of gathering call for sensitivity, common sense, and knowledge of what to gather, when to gather it, and what to use in securing the information. There are various tools for information gathering during the phase of system analysis.

    The phases are:-

1. Familiarity with the present system through available documentation, such as procedure manuals, documents and their flow, interviews of user staff, and on-site observation.

2. Definition of the decision making associated with managing the system. This is important for determining what information is required of the system; conducting interviews clarifies the decision points and how decisions are made in the user area.

3. Once the decision points are identified, a study may be conducted to define the information requirements. The information gathered is analyzed and documented. Discrepancies between the decision system and the information gathered from the information system are identified. This concludes the analysis and sets the stage for system design.


    Type of Information Needed:

Organization-based information deals with policies, objectives, goals, and structure. User-based information focuses on information requirements. Work-based information addresses the workflow, methods and procedures, and workstations. We are interested in what happens to data at various points in the system.

    SYSTEM REQUIREMENTS:

    SOFTWARE REQUIREMENTS:

Language: Java SDK, Eclipse

Front-end tool: Sphinx-4

Back-end tool: Oracle 10g for the database

Operating system: Windows XP/7

Documentation: Microsoft Word

    HARDWARE REQUIREMENTS:

Processor: PC with a Pentium IV-class processor, 600 MHz (recommended: Pentium IV-class, 1.63 GHz)

RAM: 1 GB

Hard disk space: 20 GB on the system drive, 10 GB for the development environment

Microphone: good-quality microphone


    Chapter 5

    SYSTEM ANALYSIS AND SYSTEM DESIGN

Requirement analysis defines WHAT the system should do; design tells HOW to do it. This is the simplest way to define system design. Any design has to be constantly evaluated to ensure that it meets its requirements and is practical and workable in the given environment. If there are a number of alternatives, then all alternatives must be evaluated and the best possible solution must be implemented.

    SYSTEM ANALYSIS

System analysis is a term used to describe the process of collecting and analyzing facts about the existing operation of the prevailing situation, so that an effective computerized system may be designed and implemented if found feasible. This is required in order to understand the problem that has to be solved. The problem may be of any kind, like computerizing an existing system, developing an entirely new system, or a combination of the two. To solve the problem in the actual sense is not the aim of the design phase; the aim is to see how the problem can be solved. For this, a logical model of the system is required, providing a way to solve the problem and achieve the desired goal. The logical view of the system is provided to the developer and the user for decision making, so that the developer can feel at ease in designing the system.

    SPECIFICATION OF PROJECT

The proposed system should have the following features:

    1. It should be able to store voices in .wav format.

    2. It should be able to store usernames in database.

    3. It should provide the option for existing and new user.


    4. It should have the ability of processing voice prints.

    5. It should closely match the voices.

    6. It should recognize speech up to a reasonable extent.

    7. It should provide proper guidance to the user to use it.

    8. It should give fast results.

    SYSTEM DESIGN

System design is the technique of creating a system that takes into account such factors as needs, performance levels, database design, hardware specifications, and data management. It is the most important part in the development of the system, as in the design phase the developer brings into existence the proposed system that the analyst thought of in the analysis phase.

    DESIGN CONCEPT

Software design sits at the technical kernel of software engineering and is applied regardless of the software process model that is used. After software requirements have been analyzed and specified, software design is the first of three technical activities (design, code generation, and test) that are required to build and verify the software. Each activity transforms information in a manner that ultimately results in validated computer software. The design transforms the information domain model created during analysis into the data structures that will be required to implement the software. The data objects, the relationship diagram, and the detailed data content depicted in the data dictionary provide the basis for the design activity. As aforesaid, design is the phase of software engineering that determines the success or complete failure of a project. In our project we spent the maximum time on input preprocessing and processing, and data flow diagrams for the project have been developed. The training database structures are well defined, with a complete description of the data used. Another part which took most of our consideration is that we decided to let the user directly give the path of the input in the dialog box and then execute it. The architectural design defines the relationship between the major structural elements of the software, the design patterns that can be used to achieve the requirements that have been defined for the system, and the constraints that affect the way in which architectural design patterns can be applied. The interface design describes how the software communicates within itself, with systems that interoperate with it, and with the humans who use it. An interface implies a flow of information and a specific type of behavior. Design is the phase where quality is fostered in software development. Design provides us with representations of software that can be assessed for quality. Design is the only way that we can accurately translate a customer's requirements into a finished software product or system. Software design serves as the foundation of the software support steps that follow.


    Chapter 6

    DATA FLOW DIAGRAM

A data flow diagram is a graphical tool used to describe and analyze the movement of data through a system. These diagrams are the central tool, and the basis from which the other components are developed. The transformation of data from input to output, through processes, may be described logically and independently of the physical components associated with the system. These are known as logical data flow diagrams. The physical data flow diagrams show the actual implementation and movement of data between people, departments, and workstations.

A full description of a system actually consists of a set of data flow diagrams. The data flow diagrams are developed using two familiar notations: Yourdon, and Gane and Sarson. Each component in a DFD is labelled with a descriptive name. Each process is further identified with a number that will be used for identification purposes. The development of DFDs is done in several levels. Each process in a lower-level diagram can be broken down into a more detailed DFD at the next level. The top-level diagram is often called the context diagram. It consists of a single process, which plays a vital role in studying the current system. The process in the context-level diagram is exploded into other processes at the first-level DFD.

The idea behind the explosion of a process into more processes is that understanding at one level of detail is exploded into greater detail at the next level. This is done until no further explosion is necessary and an adequate amount of detail is described for the analyst to understand the process.

Larry Constantine first developed the DFD as a way of expressing system requirements in a graphical form; this led to modular design.

A DFD, also known as a bubble chart, has the purpose of clarifying system requirements and identifying major transformations that will become programs in system design. It is thus the starting point of design, down to the lowest level of detail. A DFD consists of a series of bubbles joined by data flows in the system.


    DFD SYMBOLS:

In the DFD, there are four symbols:

1. A square defines a source (originator) or destination of system data.

2. An arrow identifies data flow. It is the pipeline through which information flows.

3. A circle or a bubble represents a process that transforms incoming data flows into outgoing data flows.

4. An open rectangle is a data store: data at rest, or a temporary repository of data.

    Process that transforms data flow.

    Source or Destination of data

    Data flow

    Data Store

    CONSTRUCTING DFD:

    Several rules of thumb are used in drawing DFDs:

    1. Process should be named and numbered for an easy reference. Each name should be

    representative of the process.

    2. The direction of flow is from top to bottom and from left to right. Data traditionally flow

    from source to the destination although they may flow back to the source. One way to

indicate this is to draw a long flow line back to the source. An alternative way is to repeat the


    source symbol as a destination. Since it is used more than once in the DFD it is marked with

    a short diagonal.

    3. When a process is exploded into lower level details, they are numbered.

    4. The names of data stores and destinations are written in capital letters. Process and dataflow

names have the first letter of each word capitalized.

5. A DFD typically shows the minimum contents of a data store. Each data store should contain all the data elements that flow in and out. Missing interfaces, redundancies, and the like are then accounted for, often through interviews.

SALIENT FEATURES OF DFD:

1. The DFD shows the flow of data, not of control; loops and decisions are control considerations and do not appear on a DFD.

2. The DFD does not indicate the time factor involved in any process, i.e., whether the data flow takes place daily, weekly, monthly, or yearly.

3. The sequence of events is not brought out on the DFD.

    TYPES OF DATA FLOW DIAGRAMS:

    1. Current Physical

    2. Current Logical

    3. New Logical

    4. New Physical

CURRENT PHYSICAL:

In a current physical DFD, process labels include the names of people or their positions, or the names of the computer systems that might provide some of the overall system processing; the label includes an identification of the technology used to process the data. Similarly, data flows and data stores are often labelled with the names of the actual physical media on which data are stored, such as file folders, computer files, business forms, or computer tapes.


    CURRENT LOGICAL:

The physical aspects of the system are removed as much as possible, so that the current system is reduced to its essence: the data and the processes that transform them, regardless of their actual physical form.

    NEW LOGICAL:

This is exactly like the current logical model if the user is completely happy with the functionality of the current system but has problems with how it is implemented. Typically, the new logical model will differ from the current logical model by having additional functions, obsolete functions removed, and inefficient flows reorganized.

    NEW PHYSICAL:

    The new physical represents only the physical implementation of the new system.

    RULES GOVERNING THE DFDS

PROCESS:

1) No process can have only outputs.

2) No process can have only inputs. If an object has only inputs, then it must be a sink.

3) A process has a verb phrase label.

DATA STORE:

Data cannot move directly from one data store to another data store; a process must move the data from the source and place it into the data store. A data store has a noun phrase label.

SOURCE OR SINK:

The origin and/or destination of data.

1) Data cannot move directly from a source to a sink; it must be moved by a process.

2) A source and/or sink has a noun phrase label.


    DATA FLOW

1) A data flow has only one direction of flow between symbols. It may flow in both directions between a process and a data store, to show a read before an update. The latter is usually indicated, however, by two separate arrows, since these happen at different times.

2) A join in a DFD means that exactly the same data comes from any of two or more different processes, data stores, or sinks to a common location.

3) A data flow cannot go directly back to the same process it leaves. There must be at least one other process that handles the data flow, produces some other data flow, and returns the original data to the beginning process.

4) A data flow to a data store means update (delete or change).

5) A data flow from a data store means retrieve or use.

6) A data flow has a noun phrase label; more than one data flow noun phrase can appear on a single arrow, as long as all of the flows on the same arrow move together as one package.

    DEVELOPING DATA-FLOW DIAGRAM

    Top-Down Approach:

    The system designer makes "a context level DFD" or Level 0, which shows the

    "interaction" (data flows) between "the system" (represented by one process) and "the system

    environment" (represented by terminators).The system is "decomposed in lower-level DFD

    (Level 1)" into a set of "processes, data stores, and the data flows between these processes

    and data stores" .Each process is then decomposed into an "even-lower-level diagram containing

    its sub processes". This approach "then continues on the subsequent sub processes", until a

necessary and sufficient level of detail is reached, which is called the primitive process level.


    DATA FLOW DIAGRAM LEVELS

    Context Level Diagram:

    This level shows the overall context of the system and its operating environment and shows the

    whole system as just one process. It does not usually show data stores, unless they are "owned"

    by external systems, e.g. are accessed by but not maintained by this system, however, these are

    often shown as external entities.


    Level 1 (High Level Diagram):

    This level (Level 1) shows all processes at the first level of numbering, data stores, external

    entities and the data flows between them. The purpose of this level is to show the major high-

    level processes of the system and their interrelation. A process model will have one, and only

    one, level-1 diagram. A level-1 diagram must be balanced with its parent context level diagram,

    i.e. there must be the same external entities and the same data flows, these can be broken down

    to more detail in level 1.


    LEVEL 2 DFD DIAGRAM:

The name and identifier of the higher-level process are shown at the top of the lower-level diagram.

The frame represents the boundary of the process.

Data flows across the frame must relate to data flows at the higher level.

Data stores used by only one process are usually shown as internal to that process at the lower level.

Processes with no further decomposition are marked with an asterisk.


    Chapter 7

    CODE SNIPPETS

    1.RSSReader Classpackage com.cvrce.projects.launcher;

    import java.net.URL;

    import javax.xml.parsers.DocumentBuilder;

    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.CharacterData;

    import org.w3c.dom.Document;import org.w3c.dom.Element;

    import org.w3c.dom.Node;

    import org.w3c.dom.NodeList;

    public class RSSReader {

    private static RSSReader instance = null;

    private RSSReader() {

    }

    public static RSSReader getInstance() {

    if(instance == null) {

    instance = new RSSReader();

    }

    return instance;

    }

    public String writeNews() {

    String s=new String("hello and welcome to News Reader Application. ");

String newsInBrief = new String("Briefing the headlines, ");

String headLines = new String("The headlines are: ");

    try {


    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

    URL u = new URL("http://feeds.bbci.co.uk/news/world/asia/rss.xml"); // your feed url

    Document doc = builder.parse(u.openStream());

    NodeList nodes = doc.getElementsByTagName("item");

for (int i = 0; i < nodes.getLength(); i++) {
// ... (the loop body and the end of writeNews were lost in transcription;
// a hedged reconstruction is sketched after this listing) ...

private String getCharacterDataFromElement(Element e) {

try {

    Node child = e.getFirstChild();

    if(child instanceof CharacterData) {

    CharacterData cd = (CharacterData) child;

    return cd.getData();

    }

    }

    catch(Exception ex) {

    }

    return "";

    } //private String getCharacterDataFromElement

    protected float getFloat(String value) {

if(value != null && !value.equals("")) {
return Float.parseFloat(value);

    }

    return 0;

    }

    protected String getElementValue(Element parent,String label) {

    return getCharacterDataFromElement((Element)parent.getElementsByTagName(label).item(0));

    }

    /*public static void main(String[] args) {

    RSSReader reader = RSSReader.getInstance();

reader.writeNews();
}

    */

    }
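The transcript drops the body of the writeNews loop, along with the end of the method, at a page break. Below is a minimal, hypothetical reconstruction of what that body plausibly did, reusing the getElementValue helper defined in the class; this is an assumption, not the original code:

// Hypothetical reconstruction of the truncated writeNews loop body:
for (int i = 0; i < nodes.getLength(); i++) {
    Element element = (Element) nodes.item(i);
    // append each item's title and description to the text to be spoken
    s += headLines + " " + getElementValue(element, "title") + ". "
       + newsInBrief + " " + getElementValue(element, "description") + ". ";
}
return s; // readRSS displays this text and speaks it via voice1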

2. TaskLauncher1 Class

    package com.cvrce.projects.launcher;

    import java.awt.*;

    //import java.awt.event.*;

    import com.sun.speech.freetts.Voice;

    import com.sun.speech.freetts.VoiceManager;

    import com.sun.speech.freetts.audio.AudioPlayer;

    import java.io.*;

    import edu.cmu.sphinx.frontend.util.Microphone;


    public class TaskLauncher1 extends Frame {

    static int type; //mediaType=1 for movie, =2 for song and =3 for file

    Frame f;

    TextArea t1;

    public TaskLauncher1() {

    f=new Frame("BBC News");

    //setLayout(new FlowLayout());

    t1=new TextArea(200,200);

    //t1.setSize(100, 50);

    f.add(t1);

    f.setSize(1200,700);

    }

public boolean launchTask(String task)

    {

    System.out.println("Launcher received : "+task);

    // Microphone microphone=new Microphone();

    try {

    if(task.contains("movie"))

{
type=1;

    // microphone.stopRecording();

    String s=new String("Select your movie! say? 1? for Sixth

    sense? 2? for Illusionist? 3? for Madagascar? 4? for shrek? and 5? for Impact");

    voice1(s);

    //microphone.startRecording();

    }

    else if(task.contains("song"))

    {

    type=2;

    String s=new String("Select your Music? say 1? for Chak

    de India? 2? for Give me some sun shine? 3? for iss pal? 4? for miss independent and 5? for

    Kaash ik din ");

    voice1(s);


    }

    //Runtime.getRuntime().exec("D:\\VLC\\vlc

    E:\\Music\\Low.mp3");

    //Runtime.getRuntime().exec("E:\\Music\\Low.mp3");

    else if(task.contains("data file"))

    {

    type=3;

    int i=0;

    String s=new String("Select whose biodata file to read? say

    1? for samarpita? 2? for pranita? 3? for snigdha? and 4? for ellora green");

    voice1(s);

    //fileread(i);

}
if(task.contains("one"))

    {

    //if media type is movie

    if(type == 1)

    {

    //play first movie

    Runtime.getRuntime().exec("D:\\VLC\\vlc

    E:\\Movies\\Sixth_sense.avi");

    }

    //if media type is song

    if(type == 2)

    {

    //play first song

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\ChakDe.mp3");

    }

    //if type is file

    if(type==3)

    fileread(1);

    }


    //if user says two

    if(task.contains("two"))

    {

    //if media type is movie

    if(type == 1)

    {

    //play 2nd movie

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Movies\\The_Illusionist.avi");

    }

    //if media type is song

    if(type == 2)

{
//play 2nd song

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\3idiots04.mp3");

    }

    //if type is file

    if(type==3)

    fileread(2);

    }

    //if user says Three

    if(task.contains("three"))

    {

    //if media type is movie

    if(type == 1)

    {

//play 3rd movie

    Runtime.getRuntime().exec("D:\\VLC\\vlc

    E:\\Movies\\madagascar2.mkv");

    }

    //if media type is song

    if(type == 2)


    {

//play 3rd song
Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\Ispal.mp3");

    }

    //if type is file

    if(type==3)

    fileread(3);

    }

    //if user says four

    if(task.contains("four"))

{
//if media type is movie

    if(type == 1)

    {

//play 4th movie

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Movies\\Shrek1.avi");

    }

    //if media type is song

if(type == 2)
{
//play 4th song

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\MissIndependent.mp3");

    }

    //if type is file

    if(type==3)

    fileread(4);

    }


    //if user says five

    if(task.contains("five"))

    {

    //if media type is movie

    if(type == 1)

    {

//play 5th movie

Runtime.getRuntime().exec("D:\\VLC\\vlc D:\\Impact.avi");

    }

    //if media type is song

    if(type == 2)

{
//play 5th song

Runtime.getRuntime().exec("D:\\VLC\\vlc E:\\Music\\dwnlds\\showbiz03.mp3");

    }

    }

    else if(task.contains("news"))readRSS();

    else if(task.contains("snap"))

    Runtime.getRuntime().exec("D:\\PicasaPhotoViewerD:\\friends.jpg");

    else

    {

    String s=new String("");

    }

    } catch (Exception e) {

    // TODO Auto-generated catch block

    e.printStackTrace();


    }

    return false;

    }

public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices(); // the list is fetched but not otherwise used here
}

    public void voice1(String s)

    {

    listAllVoices();

    String voiceName = "kevin16";

    /* The VoiceManager manages all the voices for FreeTTS.

    */

    VoiceManager voiceManager = VoiceManager.getInstance();

    Voice helloVoice = voiceManager.getVoice(voiceName);

    if (helloVoice == null) {

    System.err.println("Cannot find a voice named "

    + voiceName + ". Please specify a different voice.");

    System.exit(1);

    }

    /* Allocates the resources for the voice.

    */

    helloVoice.allocate();

    /* Synthesize speech.

    */

    helloVoice.speak(s);

    helloVoice.deallocate();


    }

public void fileread(int i) throws Exception

    {

    String s1=new String();

    if (i==1)

    {

    s1="D:/sambiodata.txt";

    //Runtime.getRuntime().exec("D://sambiodata.txt");

    }

    if (i==2)

    {

    s1="D:/prabiodata.txt";

    //Runtime.getRuntime().exec("D://prabiodata.txt");

    }

    if (i==3)

    {

    s1="E:/snicv.txt";

    //Runtime.getRuntime().exec("E://snicv.txt");

    }

    if (i==4)

    {

    s1="E:/ellucv.txt";

    //Runtime.getRuntime().exec("E://ellucv.txt");}

    FileReader fr = new FileReader(s1);

    BufferedReader br = new BufferedReader(fr);

    String s2;

    while((s2 = br.readLine())!= null) {

    System.out.println(s2);

    voice1(s2);

    }

    fr.close();

    }


    public void readRSS()

    {

    RSSReader reader = RSSReader.getInstance();

    String s=reader.writeNews();

    f.setVisible(true);

    t1.setText(s);

    //speak the news

    voice1(s);

    }

    }
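The long if/else chains above hard-code each command. As a hedged sketch of the "repository of commands" idea described in the introduction, the same dispatch can be table-driven; the class below is an illustrative assumption, not part of the project source, and only the paths and the VLC launch call are taken from the listing above.

package com.cvrce.projects.launcher;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: spoken words mapped to media paths in a table,
// replacing the repeated if(task.contains(...)) blocks of TaskLauncher1.
public class CommandRepository {

    private final Map<String, String> songs = new LinkedHashMap<String, String>();

    public CommandRepository() {
        // paths copied from the TaskLauncher1 listing above
        songs.put("one", "E:\\Music\\ChakDe.mp3");
        songs.put("two", "E:\\Music\\3idiots04.mp3");
        songs.put("three", "E:\\Music\\Ispal.mp3");
    }

    public void play(String spokenWord) throws IOException {
        String path = songs.get(spokenWord);
        if (path != null) {
            // same launch mechanism TaskLauncher1 uses
            Runtime.getRuntime().exec("D:\\VLC\\vlc " + path);
        }
    }
}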

3. VoiceResponseSystem Class

    /*

    * Copyright 1999-2004 Carnegie Mellon University.

    * Portions Copyright 2004 Sun Microsystems, Inc.

    * Portions Copyright 2004 Mitsubishi Electric Research Laboratories.

    * All Rights Reserved. Use is subject to license terms.

    *

    * See the file "license.terms" for information on usage and

    * redistribution of this file, and for a DISCLAIMER OF ALL

    * WARRANTIES.*

    */

    package com.cvrce.projects.speech;

    import com.cvrce.projects.launcher.TaskLauncher1;

    import com.sun.speech.freetts.Voice;

    import com.sun.speech.freetts.VoiceManager;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;

    import edu.cmu.sphinx.result.Result;

    import edu.cmu.sphinx.util.props.ConfigurationManager;

    /**


    * A Program showing a simple speech application built using Sphinx-4. This application uses

    the Sphinx-4

    * endpointer, which automatically segments incoming audio into utterances and silences.

    */

    public class VoiceResponseSystem {

public void listAllVoices() {
VoiceManager voiceManager = VoiceManager.getInstance();
Voice[] voices = voiceManager.getVoices(); // the list is fetched but not otherwise used here
}

    public void voice1(String s)

    {

    listAllVoices();

    String voiceName = "kevin16";

    System.out.println();

    //System.out.println("Using voice: " + voiceName);

    /* The VoiceManager manages all the voices for FreeTTS.

*/
VoiceManager voiceManager = VoiceManager.getInstance();

    Voice helloVoice = voiceManager.getVoice(voiceName);

    if (helloVoice == null) {

    System.err.println(

    "Cannot find a voice named "

    + voiceName + ". Please specify a different voice.");

    System.exit(1);

    }

    /* Allocates the resources for the voice.

    */

    helloVoice.allocate();


    /* Synthesize speech.

    */

    helloVoice.speak(s);

    helloVoice.deallocate();

    }

    public static void main(String[] args) {

String s1 = new String("Hello and welcome to Voice Response System! Select your option: " +
"say movie to watch a movie, song to listen to a song, news to listen to the news, " +
"data file to listen to the contents of a biodata file, and say snap to view a picture.");

    VoiceResponseSystem v1=new VoiceResponseSystem();

    //v1.voice1(s1);

    ConfigurationManager cm;

if (args.length > 0) {
cm = new ConfigurationManager(args[0]);

    } else {

cm = new ConfigurationManager(VoiceResponseSystem.class.getResource("vrs.config.xml"));

    }

    Recognizer recognizer = (Recognizer) cm.lookup("recognizer");

    recognizer.allocate();

// start the microphone, or exit the program if this is not possible

    Microphone microphone = (Microphone) cm.lookup("microphone");

    if (!microphone.startRecording()) {

    System.out.println("Cannot start microphone.");

    recognizer.deallocate();


    System.exit(1);

    }

    System.out.println("Ask: Song/News/Data File/Movie/Snap");

// loop the recognition until the program exits.

    while (true) {

    System.out.println("Start speaking.\n");

    Result result = recognizer.recognize();

    if (result != null) {

    String resultText = result.getBestFinalResultNoFiller();

    System.out.println("You said: " + resultText + '\n');

TaskLauncher1 tl = new TaskLauncher1();
tl.launchTask(resultText);

    // microphone.stopRecording();

    // recognizer.deallocate();

    } else {

    }

    }

    }

    }
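As a usage note, the application starts from the main class above. A hedged launch command for Windows, assuming the Sphinx-4 and FreeTTS jars sit alongside the compiled classes (the jar names here are placeholders, not the project's actual file names):

java -cp .;sphinx4.jar;freetts.jar com.cvrce.projects.speech.VoiceResponseSystem

An optional first argument can point at an alternative configuration file, as the args handling in main shows.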

4. Grammar File

    #JSGF V1.0;

    /**

    * JSGF Grammar for Hello World example

    */

    grammar hello;

public <command> = ( Song | News | Data File | Movie | One | Two | Three | Four | Five | Snap );

// (the rule name between "public" and "=" was lost in transcription; <command> is a stand-in)


    Chapter 8

RESULTS & SCREENSHOTS

After running the application, it asks the user to choose an option by saying the code allocated to each action. There are five actions:

1. Song

2. Snap

3. Movie

4. News

5. Data File


    Output of each action is described below.
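Before looking at the individual actions, it may help to see the shape of the console interaction. The sample below is assembled only from the print statements visible in the code listings above; the recognized words depend on the speaker and the microphone, so this is an illustrative sketch rather than captured output.

Ask: Song/News/Data File/Movie/Snap
Start speaking.

You said: song
Launcher received : song
Start speaking.

You said: one
Launcher received : one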

1. Selecting a song:

    Example 1:

After selecting song, it asks for further options under this action, such as saying one for the song Chak De India, two for Give Me Some Sunshine, and so on.


    Example 2:


2. Selecting a photo:

After selecting the option snap, it opens the picture friends.jpg, as shown below.


3. Selecting a movie:

After selecting movie, it asks for further options under this action, such as saying one for the movie The Sixth Sense, two for The Illusionist, and so on.

Example 1: selected movie 4: Shrek2


    Example 2:


4. Selecting news:

After selecting this option, it connects to the BBC News RSS feed, i.e., http://feeds.bbci.co.uk/news/world/asia/rss.xml.


5. Selecting a data file to read:

After selecting data file, it asks for further options under this action, such as saying one for the file sambiodata.txt, two for prabiodata.txt, and so on.

Selected file 3: snicv.txt


    Chapter 9

    DISCUSSION

The modular framework of Sphinx-4 has permitted us to do some things very easily that have traditionally been difficult. The modular nature of Sphinx-4 also gives it the ability to use modules whose implementations range from general to application-specific versions of an algorithm. For example, we were able to improve the runtime speed for the RM1 regression test by almost two orders of magnitude merely by plugging in a new Linguist and leaving the rest of the system the same. Furthermore, the modularity of Sphinx-4 allows it to support a wide variety of tasks: the various SearchManager implementations allow Sphinx-4 to efficiently support tasks that range from small-vocabulary to large-vocabulary recognition, and the various Linguist implementations allow it to support different tasks, such as traditional CFG-based command-and-control applications in addition to applications that use stochastic language models.
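In this project, that pluggability shows up concretely in the configuration file: components named in vrs.config.xml are looked up by name, so swapping a Linguist or SearchManager is a configuration edit rather than a code change. A brief sketch, using only the lookups that already appear in the VoiceResponseSystem listing:

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Components are resolved by name from the XML configuration; re-pointing
// "recognizer" at a different Linguist or SearchManager in vrs.config.xml
// re-targets the decoder without touching this code.
ConfigurationManager cm = new ConfigurationManager(
        VoiceResponseSystem.class.getResource("vrs.config.xml"));
Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
Microphone microphone = (Microphone) cm.lookup("microphone");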

    The modular nature of Sphinx-4 was enabled primarily by the use of the Java

    programming language. In particular, the ability of the Java platform to load code at run time

    permits simple support for the pluggable framework, and the Java programming language

    construct of interfaces permits separation of the framework design from the implementation.

    The Java platform also provides Sphinx-4 with a number of other advantages:

Sphinx-4 can run on a variety of platforms without the need for recompilation.

The rich set of platform APIs greatly reduces coding time.

Built-in support for multithreading makes it simple to experiment with distributing decoding tasks across multiple threads.

Automatic garbage collection helps developers to concentrate on algorithm development instead of memory leaks.

On the downside, the Java platform can have issues with memory footprint. Also related to memory, some speech engines access the platform memory directly in order to optimize memory throughput during decoding; such direct access to the platform memory model is not permitted by the Java programming language. A common misconception about the Java programming language is that it is too slow. When developing Sphinx-4, we carefully instrumented the code to measure various aspects of the system, comparing the results to its predecessors.


Table I provides a summary showing that Sphinx-4 performs well (for both WER and RT, a lower number indicates better performance). An interesting result helps to demonstrate the strength of the pluggable and modular design of Sphinx-4: we were able to plug in different implementations of the Linguist and SearchManager that were optimized for the particular tasks, allowing Sphinx-4 to perform much better. Another interesting aspect of the performance study shows that raw computing speed is not our biggest concern when it comes to RT performance. For the 2-CPU results, we used a Scorer that divided the scoring task equally across the available CPUs. While the increase in speed is noticeable, it is not as dramatic as we expected. Further analysis helped us determine that only about 30 percent of the CPU time is spent doing the actual scoring of the acoustic model states; the remaining 70 percent is spent on non-scoring activity, such as growing and pruning the ActiveList. Our results also show that the Java platform's garbage collection mechanism accounts for only 2-3 percent of the overall CPU usage.

TEST                   WER (%)   RT
TI46 (11 words)        0.168     0.02
TIDIGITS (11 words)    0.549     0.05
AN4 (79 words)         1.192     0.20
RM1 (1,000 words)      2.739     0.40
WSJ5K (5,000 words)    7.174     0.96

(Table I: Sphinx-4 performance. Word error rate (WER) is given in percent. Real-time (RT) speed is the ratio of the time taken to decode an utterance to the duration of the utterance; for example, an RT of 0.40 means a 10-second utterance is decoded in about 4 seconds.)

    Results:

The test cases mentioned above have been found to produce correct results, provided the voice is recognized correctly. However, voice recognition is not 100 percent accurate, and it may sometimes lead to frustrating results.


    Known Bugs/Defects

Since the project is based on voice recognition, its working accuracy is not very high. Sometimes it may so happen that we speak at the loudest of our voice levels, with pronunciation as clear as possible, and yet the program misunderstands what is spoken. This cannot be attributed to a bug in the project, but it is certainly a defect, and it arises from a large number of factors. Some of these factors are noise interference from the environment and differences between the accent of the user and the accent the program is trained to understand.

    Workaround:

While no perfect solution for this can be implemented, there is a workaround: train the program to understand the accent of a specific user, which will in turn result in higher accuracy.


    Chapter 10

    CONCLUSION

    ADVANTAGES:

Able to write text through both keyboard and voice input.

Voice recognition of different Notepad commands such as open, save, and clear.

Opens different Windows software based on voice input.

Reduces the time required to write text.

Provides significant help for people with disabilities.

Lower operational costs.

DISADVANTAGES:

Low accuracy.

Not good in noisy environments.

After careful development of the Sphinx-4 framework, we created a number of differing implementations for each module in the framework. For example, the Front End implementations support MFCC, PLP, and LPC feature extraction; the Linguist implementations support a variety of language models, including CFGs, FSTs, and N-Grams; and the Decoder supports a variety of SearchManager implementations. Using the ConfigurationManager, the various implementations of the modules can be combined in different ways, supporting our claim that we have developed a flexible, pluggable framework. Furthermore, the framework performs well in both speed and accuracy when compared to its predecessors. The Sphinx-4 framework is already proving itself to be research-ready, easily supporting various work as well as specialized Linguists. We view this as only the very beginning, however, and expect Sphinx-4 to support future areas of core speech recognition research. Finally, the source code of Sphinx-4 is freely available. The license permits others to do academic and commercial research and to develop products without requiring any licensing fees. More information is available at http://cmusphinx.sourceforge.net/sphinx4.

This thesis/project work on a voice response system started with a brief introduction to the technology and its applications in different sectors. The project part of the report was based on software development for a voice response system. In the later stages we discussed different tools for bringing that idea into practical work. After the development of the software, it was tested, the results were discussed, and a few deficiencies were brought to light. After the testing work, the advantages of the software were described, and suggestions for further enhancement and improvement were discussed.

    Future Enhancements

This work can be taken into more detail, and more can be done on the project in order to bring in modifications and additional features. The current software doesn't support a large vocabulary; further work will be done to accumulate a larger number of samples and increase the efficiency of the software. The current version of the software supports only a few areas, but more areas can be covered, and effort will be made in this regard.


    Chapter 11

    BIBLIOGRAPHY

[1] S. Young, "The HTK hidden Markov model toolkit: Design and philosophy," Cambridge University Engineering Department, UK, Tech. Rep. CUED/F-INFENG/TR152, Sept. 1994.

[2] N. Deshmukh, A. Ganapathiraju, J. Hamaker, J. Picone, and M. Ordowski, "A public domain speech-to-text system," in Proceedings of the 6th European Conference on Speech Communication and Technology, vol. 5, Budapest, Hungary, Sept. 1999, pp. 2127-2130.

[3] X. X. Li, Y. Zhao, X. Pi, L. H. Liang, and A. V. Nefian, "Audio-visual continuous speech recognition using a coupled hidden Markov model," in Proceedings of the 7th International Conference on Spoken Language Processing, Denver, CO, Sept. 2002, pp. 213-216.

[4] K. F. Lee, H. W. Hon, and R. Reddy, "An overview of the SPHINX speech recognition system," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, no. 1, pp. 35-45, Jan. 1990.

[5] X. Huang, F. Alleva, H. W. Hon, M. Y. Hwang, and R. Rosenfeld, "The SPHINX-II speech recognition system: an overview," Computer Speech and Language, vol. 7, no. 2, pp. 137-148, 1993.

[6] M. K. Ravishankar, "Efficient algorithms for speech recognition," PhD Thesis (CMU Technical Report CS-96-143), Carnegie Mellon University, Pittsburgh, PA, 1996.

[7] P. Lamere, P. Kwok, W. Walker, E. Gouvea, R. Singh, B. Raj, and P. Wolf, "Design of the CMU Sphinx-4 decoder," in Proceedings of the 8th European Conference on Speech Communication and Technology, Geneve, Switzerland, Sept. 2003, pp. 1181-1184.

[8] J. K. Baker, "The Dragon system - an overview," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 23, no. 1, Feb. 1975, pp. 24-29.

[9] B. T. Lowerre, "The Harpy speech recognition system," Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, 1976.

[10] J. K. Baker, "Stochastic modeling for automatic speech understanding," in Speech Recognition, R. Reddy, Ed. New York: Academic Press, 1975, pp. 521-542.


[11] P. Placeway, S. Chen, M. Eskenazi, U. Jain, V. Parikh, B. Raj, M. Ravishankar, R. Rosenfeld, K. Seymore, M. Siegler, R. Stern, and E. Thayer, "The 1996 HUB-4 Sphinx-3 system," in Proceedings of the DARPA Speech Recognition Workshop. Chantilly, VA: DARPA, Feb. 1997. [Online]. Available: http://www.nist.gov/speech/publications/darpa97/pdf/placewa1.pdf

[12] M. Ravishankar, "Some results on search complexity vs accuracy," in Proceedings of the DARPA Speech Recognition Workshop. Chantilly, VA: DARPA, Feb. 1997. [Online]. Available: http://www.nist.gov/speech/publications/darpa97/pdf/ravisha1.pdf

[13] F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998.

[14] X. Huang, A. Acero, F. Alleva, M. Hwang, L. Jiang, and M. Mahajan, "From SPHINX-II to Whisper: Making speech recognition usable," in Automatic Speech and Speaker Recognition, Advanced Topics, C. Lee, F. Soong, and K. Paliwal, Eds. Norwell, MA: Kluwer Academic Publishers, 1996.

[15] S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllable word recognition in continuously spoken sentences," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 28, no. 4, Aug. 1980.

[16] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.

[17] NIST. Speech recognition scoring package (score). [Online]. Available: http://www.nist.gov/speech/tools

[18] G. D. Forney, "The Viterbi algorithm," Proceedings of the IEEE, vol. 61, no. 3, pp. 268-278, 1973.

[19] P. Kenny, R. Hollan, V. Gupta, M. Lenning, P. Mermelstein, and D. O'Shaughnessy, "A*-admissible heuristics for rapid lexical access," IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 49-59, Jan. 1993.

[20] Java Speech API grammar format (JSGF). [Online]. Available: http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/

[21] M. Mohri, "Finite-state transducers in language and speech processing," Computational Linguistics, vol. 23, no. 2, pp. 269-311, 1997.

[22] P. Clarkson and R. Rosenfeld, "Statistical language modeling using the CMU-Cambridge toolkit," in Proceedings of the 5th European Conference on Speech Communication and Technology, Rhodes, Greece, Sept. 1997.

[23] Carnegie Mellon University. CMU pronouncing dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict


[24] S. J. Young, N. H. Russell, and J. H. S. Thornton, "Token passing: A simple conceptual model for connected speech recognition systems," Cambridge University Engineering Dept, UK, Tech. Rep. CUED/F-INFENG/TR38, 1989.

[25] R. Singh, M. Warmuth, B. Raj, and P. Lamere, "Classification with free energy at raised temperatures," in Proceedings of the 8th European Conference on Speech Communication and Technology, Geneve, Switzerland, Sept. 2003, pp. 1773-1776.

[26] P. Kwok, "A technique for the integration of multiple parallel feature streams in the Sphinx-4 speech recognition system," Master's Thesis (Sun Labs TR-2003-0341), Harvard University, Cambridge, MA, June 2003.

[27] P. Price, W. M. Fisher, J. Bernstein, and D. S. Pallett, "The DARPA 1000-word resource management database for continuous speech recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 1. IEEE, 1988, pp. 651-654.

[28] G. R. Doddington and T. B. Schalk, "Speech recognition: Turning theory to practice," IEEE Spectrum, vol. 18, no. 9, pp. 26-32, Sept. 1981.

[29] R. G. Leonard and G. R. Doddington, "A database for speaker-independent digit recognition," in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 3. IEEE, 1984, p. 42.11.

[30] J. Garofolo, E. Voorhees, C. Auzanne, V. Stanford, and B. Lund, "Design and preparation of the 1996 HUB-4 broadcast news benchmark test corpora," in Proceedings of the DARPA Speech Recognition Workshop. Chantilly, Virginia: Morgan Kaufmann, Feb. 1997, pp. 15-21.

[31] (2003, Mar.) Sphinx-4 trainer design. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmusphinx/twiki/view/Sphinx4/TrainerDesign

[32] J. R. Glass, "A probabilistic framework for segment-based speech recognition," Computer Speech and Language, vol. 17, no. 2, pp. 137-152, Apr. 2003.
