Voice Based Image Viewer

  • 7/31/2019 Voice Based Image Viewer

    8/19/12

    Voice Based Image Viewer

    Guided by: Mr. Arpit Agrawal

    Prepared by: Neha Agrawal (7I629), Mayank Gupta (7I623), Mehul Gupta (7I624)

    Contents

    Project goals

    Benefits

    Use case

    Java Speech API

    How does a speech-enabled application work?

    Speech recognition system

    Speech synthesis system

    Challenges

    Why Java?

    Project Goals

    To build software based on speech technology.

    To convert the human voice into commands, with the system talking back to the user in response.

    To try to bring about a revolution over traditional technology.

    Benefits

    A major benefit of incorporating speech in an application is that speech is natural.

    It improves accessibility to computers for many people with physical limitations.

    Interactive voice response systems are an alternative to touch-tone interfaces.

    Dictation systems can be considerably faster than typed input.

    It reduces the risk of repetitive strain injury and other problems caused by current interfaces.

    Remote access to the system becomes possible.

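    Use Case

    [Use case diagram: the User actor triggers the viewer actions Open, Next, Previous, Zoom, Rotate, Slideshow, Save, and Flip by voice. Arrow labels in the diagram: on saying 'one'; on saying 'two'; on saying 'three', 'four'...; on saying 'seven'; on saying 'six'.]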
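
    The use case above amounts to a lookup from a recognized word to a viewer action. The following is a minimal sketch of that idea, not the project's actual code. The class name VoiceCommandDemo, the Action enum, and the resolve method are hypothetical, and the pairing of each number word with an action is an assumption, since the diagram text does not preserve which word triggers which action.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class VoiceCommandDemo {

    /** Viewer actions named on the use-case slide. */
    public enum Action { OPEN, NEXT, PREVIOUS, ZOOM, ROTATE, SLIDESHOW, SAVE, FLIP }

    // Hypothetical word-to-action table: the slide does not state which
    // spoken number triggers which action, so this pairing is assumed.
    private static final Map<String, Action> COMMANDS = new LinkedHashMap<>();
    static {
        COMMANDS.put("one", Action.OPEN);
        COMMANDS.put("two", Action.NEXT);
        COMMANDS.put("three", Action.PREVIOUS);
        COMMANDS.put("four", Action.ZOOM);
        COMMANDS.put("five", Action.ROTATE);
        COMMANDS.put("six", Action.SLIDESHOW);
        COMMANDS.put("seven", Action.SAVE);
        COMMANDS.put("eight", Action.FLIP);
    }

    /** Maps recognized text to an action; empty if the word is not a command. */
    public static Optional<Action> resolve(String recognizedText) {
        if (recognizedText == null) {
            return Optional.empty();
        }
        // Normalize case and surrounding whitespace before the lookup.
        String key = recognizedText.trim().toLowerCase(Locale.ROOT);
        return Optional.ofNullable(COMMANDS.get(key));
    }

    public static void main(String[] args) {
        for (String spoken : new String[] {"one", " Seven ", "hello"}) {
            String outcome = resolve(spoken).map(Enum::name).orElse("(ignored)");
            System.out.println(spoken.trim() + " -> " + outcome);
        }
    }
}
```

    Returning an Optional keeps unrecognized words from triggering any action, which matters when the recognizer mishears the user.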

    Java Speech API

    The Java Speech API allows Java applications to incorporate speech technology into their user interfaces.

    It defines a cross-platform API to support command-and-control recognizers, dictation systems, and speech synthesizers.

    Two core speech technologies are supported:

    1. Speech recognition

    2. Speech synthesis

    Java Speech API Stack

    "Speech engine" is the generic term for a system designed to deal with either speech input or speech output.

    A JSAPI engine provides an actual implementation of the Java classes and interfaces defined by the API.

    How does a speech-enabled application work?

    Speech Recognition System

    It processes audio input containing speech and converts it to text through the following steps:

    Grammar design

    Signal processing

    Phoneme recognition

    Word recognition

    Result generation

    Grammar design

    A grammar defines what a recognizer should listen for in incoming speech.

    Any grammar defines the set of tokens a user can say and the patterns in which those words are spoken.

    A grammar is active or inactive depending on whether the recognizer is matching incoming audio against it or not.
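
    The three properties above (a token set, allowed patterns, and an active flag) can be illustrated with a toy grammar in plain Java. This is a sketch under stated assumptions, not the JSAPI or Sphinx grammar API: the SimpleGrammar class, its accepts method, and the verb and target words are all hypothetical, and the only pattern modeled is a verb optionally followed by a target.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SimpleGrammarDemo {

    /** Toy grammar with one pattern: a verb, optionally followed by a target. */
    public static class SimpleGrammar {
        private final Set<String> verbs;
        private final Set<String> targets;
        private boolean active; // inactive until explicitly enabled

        public SimpleGrammar(List<String> verbs, List<String> targets) {
            this.verbs = new HashSet<>(verbs);
            this.targets = new HashSet<>(targets);
        }

        public void setActive(boolean active) {
            this.active = active;
        }

        /** True only if the grammar is active and the words fit the pattern. */
        public boolean accepts(String utterance) {
            if (!active || utterance == null) {
                return false;
            }
            String[] words = utterance.trim().toLowerCase(Locale.ROOT).split("\\s+");
            if (words.length == 1) {
                return verbs.contains(words[0]);
            }
            if (words.length == 2) {
                return verbs.contains(words[0]) && targets.contains(words[1]);
            }
            return false; // longer utterances fall outside the pattern
        }
    }

    public static void main(String[] args) {
        SimpleGrammar grammar = new SimpleGrammar(
                Arrays.asList("zoom", "rotate"),
                Arrays.asList("in", "out", "left", "right"));
        System.out.println(grammar.accepts("zoom in")); // inactive grammar rejects
        grammar.setActive(true);
        System.out.println(grammar.accepts("zoom in"));
        System.out.println(grammar.accepts("in zoom")); // right tokens, wrong order
    }
}
```

    Restricting the recognizer to a small grammar like this is what makes command-and-control recognition tractable: words outside the token set, or in the wrong order, are simply rejected.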

    Sphinx

    A JSAPI-compliant speech recognizer.

    It provides speech features through the classes and interfaces defined in its JAR files.

    Sequence Diagram: Basic Flow

    Alternate Flow

    Speech Synthesis System

    A speech synthesizer converts written text into spoken language.

    Speech synthesis is also referred to as text-to-speech (TTS) conversion.

    The synthesis system makes use of a speech output queue.
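
    The role of a speech output queue can be sketched in plain Java: text items are queued in order and spoken one at a time, so a new response does not cut off one that is still pending. This is an illustration only. The SpeechOutputQueue class and its methods are hypothetical, and printing to the console stands in for a real synthesizer call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

public class SpeechQueueDemo {

    /** Hypothetical first-in, first-out queue of text waiting to be spoken. */
    public static class SpeechOutputQueue {
        private final Queue<String> pending = new ArrayDeque<>();
        private final List<String> spoken = new ArrayList<>();

        /** Queues text for output; blank or null text is ignored. */
        public void speak(String text) {
            if (text != null && !text.trim().isEmpty()) {
                pending.add(text.trim());
            }
        }

        /** Speaks the oldest queued item; returns false if nothing is queued. */
        public boolean speakNext() {
            String next = pending.poll();
            if (next == null) {
                return false;
            }
            // Placeholder for a real TTS engine call (assumption).
            System.out.println("[TTS] " + next);
            spoken.add(next);
            return true;
        }

        /** Drops everything that has not been spoken yet. */
        public void cancelAll() {
            pending.clear();
        }

        public int pendingCount() {
            return pending.size();
        }

        public List<String> spokenSoFar() {
            return Collections.unmodifiableList(spoken);
        }
    }

    public static void main(String[] args) {
        SpeechOutputQueue queue = new SpeechOutputQueue();
        queue.speak("Image opened");
        queue.speak("Showing next image");
        while (queue.speakNext()) {
            // Items are spoken in the order they were queued.
        }
    }
}
```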

    FreeTTS

    A JSAPI-compliant speech synthesizer.

    Challenges

    Transience: What did you say?

    Speech is transient. Once you hear it or say it, it's gone. By contrast, graphics are persistent.

    Invisibility: What can I say?

    The lack of visibility makes it challenging to communicate the functional boundaries of an application to the user.

    Asymmetry

    People can produce speech easily and ...

    Why Java?

    Java is easy to learn

    Java is object-oriented

    Java is platform independent

    Java's Speech API

    Java is distributed

    Thank you

    Queries invited
