Voice Enabled Desktop Interaction and Control System (VEDICS).

Desktop at Your Command


Nischal Rao & Bharat Joshi, Rashtreeya Vidyalaya College of Engineering

Transcript of Voice Enabled Desktop Interaction and Control System (VEDICS).

Page 1: Voice Enabled Desktop Interaction and Control System (VEDICS).

Desktop at Your Command

Page 2: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Team Members: Nischal E Rao Bharat Joshi Suhas Kamath N Sharath M Puranik

• Project Guide: Prof. Shantharam Nayak

• Carried out at: R.V. College of Engineering, Bangalore, India.

Page 3: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Voice Enabled Desktop Interaction and Control System (VEDICS) is a software solution for controlling the desktop system using voice-based commands.

• The system takes audio signals as input, processes and recognizes them, and executes the desired action on the desktop system.

Page 4: Voice Enabled Desktop Interaction and Control System (VEDICS).

• All software products should incorporate accessibility features to enable differently-abled people to use the software easily and efficiently.

• For persons with physical disabilities, the ability to simply talk to a computer could be a priceless asset.

• Hands-free computing is more convenient than conventional I/O.

Page 5: Voice Enabled Desktop Interaction and Control System (VEDICS).

• The user should be able to

o access any element present on the user’s screen.

o run common programs and applications.

o navigate through the file system.

o perform common window operations like minimize, maximize, close etc.

• User commands should be easy to remember and use.

• The user must be able to turn the system on and off whenever required.

Page 6: Voice Enabled Desktop Interaction and Control System (VEDICS).

• VEDICS follows the MVC design pattern.

• Any speech-to-text converter can be used with VEDICS.

• VEDICS uses a feedback mechanism to learn what is being displayed on the desktop.

• Accuracy increases because only words relevant to the current screen are recognized.
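
A minimal sketch of the two design points above (a swappable speech-to-text converter, and recognition limited to relevant words). The names `SpeechToTextConverter`, `FakeConverter` and `recognize` are hypothetical and are not taken from the VEDICS code; the fake recognizer simply treats the audio bytes as text so the example stays runnable.

```python
from abc import ABC, abstractmethod
from typing import Optional, Set


class SpeechToTextConverter(ABC):
    """Hypothetical interface: any recognizer implementing it could be plugged in."""

    @abstractmethod
    def recognize(self, audio: bytes, vocabulary: Set[str]) -> Optional[str]:
        """Return the recognized phrase, or None if nothing in the vocabulary matches."""


class FakeConverter(SpeechToTextConverter):
    """Stand-in recognizer for illustration: treats the audio bytes as UTF-8 text."""

    def recognize(self, audio: bytes, vocabulary: Set[str]) -> Optional[str]:
        text = audio.decode("utf-8").strip().lower()
        # Only phrases in the current vocabulary are accepted.
        return text if text in vocabulary else None


converter = FakeConverter()
print(converter.recognize(b"File", {"file", "edit"}))    # accepted
print(converter.recognize(b"banana", {"file", "edit"}))  # rejected
```

Keeping the recognizer behind one small interface is what would let the rest of the system stay unchanged if Sphinx 4 were replaced by another engine.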

Page 7: Voice Enabled Desktop Interaction and Control System (VEDICS).

[Figure: block diagram with the components Speech-to-text Converter, Desktop Control System and User’s Desktop, connected by flows labelled Recognized Text, Command, Currently visible objects, and Grammar and Names of visible elements.]

Page 8: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Speech to text Conversion

Speech To Text Converter

Page 9: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Grammar and Dictionary are used to convert sound signals into text.

Speech To Text Converter

Grammar

Dictionary

Page 10: Voice Enabled Desktop Interaction and Control System (VEDICS).

• The recognized text is given as input to the Desktop Control System.

Speech To Text Converter

“Open Firefox” Desktop Control System

Grammar

Dictionary

Page 11: Voice Enabled Desktop Interaction and Control System (VEDICS).

• The Desktop Control System determines the command to execute on the desktop.

Speech To Text Converter

Desktop Control System

Open_firefox command
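
The mapping from recognized text to a desktop command could be sketched as a lookup table. This is an illustration only: the table `COMMANDS`, the function `resolve_command` and the program names in it are assumptions, not the commands VEDICS actually defines.

```python
import shlex
from typing import List, Optional

# Hypothetical command table: spoken phrase -> command line to run.
COMMANDS = {
    "open firefox": "firefox",
    "open terminal": "gnome-terminal",
}


def resolve_command(recognized_text: str) -> Optional[List[str]]:
    """Return the argument list for the recognized phrase, or None if unknown."""
    # Normalize case and repeated whitespace before the lookup.
    key = " ".join(recognized_text.lower().split())
    command_line = COMMANDS.get(key)
    return shlex.split(command_line) if command_line else None


print(resolve_command("Open Firefox"))
print(resolve_command("open emacs"))
```

The returned argument list could then be handed to `subprocess.Popen` to launch the program; that step is left out here so the sketch has no side effects.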

Page 12: Voice Enabled Desktop Interaction and Control System (VEDICS).

• After successful execution, the names of objects visible on the screen are collected.

Speech To Text Converter

Desktop Control System

“File” | “Edit” | “Google”

Page 13: Voice Enabled Desktop Interaction and Control System (VEDICS).

• The collected names are used to update the grammar and the dictionary files.

Speech To Text Converter

Desktop Control System

“File”, “Edit”, “Google”

Grammar

Dictionary
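
As a rough sketch of the grammar update, the collected names can be written out as a single JSGF rule of alternatives. The function `build_jsgf` and the rule name `<element>` are hypothetical, and the sketch assumes the names are plain words with no JSGF special characters; the real VEDICS grammar is likely richer than one rule.

```python
from typing import Iterable


def build_jsgf(grammar_name: str, visible_names: Iterable[str]) -> str:
    """Build a one-rule JSGF grammar whose alternatives are the visible names."""
    seen = set()
    words = []
    for name in visible_names:
        # Normalize case and whitespace; keep the first occurrence of each name.
        word = " ".join(name.split()).lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    if not words:
        raise ValueError("at least one visible name is required")
    alternatives = " | ".join(words)
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <element> = {alternatives};\n"
    )


print(build_jsgf("desktop", ["File", "Edit", "Google"]))
```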

Page 14: Voice Enabled Desktop Interaction and Control System (VEDICS).

• The updated grammar and dictionary files are used in the next recognition cycle.

Speech To Text Converter

Updated Grammar

Updated Dictionary
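
The whole cycle (recognize, execute, collect visible names, update the vocabulary) can be simulated in a few lines. Everything here is a toy model under stated assumptions: `SCREENS` is a hypothetical stand-in for the real desktop, `GLOBAL_COMMANDS` is an assumed set of always-available phrases, and `run_cycle` receives the utterance as text rather than audio.

```python
from typing import Optional, Set, Tuple

# Hypothetical model of the desktop: names that become visible after each command.
SCREENS = {
    "open firefox": ["file", "edit", "google"],
    "file": ["new window", "quit"],
}

# Assumed set of phrases that stay available in every cycle.
GLOBAL_COMMANDS = {"open firefox"}


def run_cycle(utterance: str, vocabulary: Set[str]) -> Tuple[Optional[str], Set[str]]:
    """Run one cycle; return (executed command or None, vocabulary for the next cycle)."""
    if utterance not in vocabulary:
        # Rejected: the phrase is not part of the current grammar.
        return None, vocabulary
    visible = SCREENS.get(utterance, [])
    # The next cycle recognizes the newly visible names plus the global commands.
    return utterance, set(visible) | GLOBAL_COMMANDS


vocabulary = set(GLOBAL_COMMANDS)
for spoken in ["file", "open firefox", "file"]:
    executed, vocabulary = run_cycle(spoken, vocabulary)
    print(spoken, "->", executed, sorted(vocabulary))
```

In this run the first "file" is rejected because nothing named "file" is on screen yet; after "open firefox" it becomes a valid command.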

Page 15: Voice Enabled Desktop Interaction and Control System (VEDICS).

• VEDICS consists of the following parts:

o Sphinx 4 Sub-system: Open source tool used to convert speech to text.

o Desktop Control Sub-system: Translates the converted text into the corresponding command and executes it on the desktop. It re-creates the grammar file based on what is displayed on the screen.

o Logios Tool: Used to generate a new dictionary based on what is displayed on the screen.

Page 17: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Accuracy of VEDICS depends on the accuracy of Sphinx 4.

• Summary of performance of Sphinx 4:

Parameter                                Performance
Vocabulary Size                          79
Word Error Rate (in %)                   1.192
RT Ratio in Single CPU Configuration*    0.25
RT Ratio in Dual CPU Configuration*      0.20

* RT Ratio: Ratio of utterance duration to the time taken to decode the utterance.
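
The slides do not say how the word error rate was measured. The usual definition is the word-level edit distance (substitutions, deletions and insertions) divided by the number of words in the reference transcript, which can be sketched as follows; the function name `word_error_rate` and the example phrases are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("open the file menu", "open file menu"))
```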

Page 18: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Increased accuracy due to the context-aware nature of VEDICS.

• Use of a small vocabulary further improves accuracy.

• Use of Logios enables recognition of custom words. Words with any sequence of characters can be recognized.

• Almost all components on the desktop are accessible.

Page 19: Voice Enabled Desktop Interaction and Control System (VEDICS).

• VEDICS can be used to perform most actions that can be done using a pointing device.

• Using voice to access and control the desktop has many advantages. This feature can be a boon to differently-abled people.

• VEDICS can navigate through the file system, open applications, control the desktop window, and recognize almost any word.

• VEDICS is context aware. It determines what is currently being displayed on the desktop and dynamically generates the grammar and the dictionary.

Page 20: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Dictation facility: The ability to dictate into a text editor or text field.

• Artificial Intelligence in VEDICS.

• If two objects on the screen have the same name, the user should be able to select the right object.

• The user should be able to either pronounce the entire word or spell individual characters of the word.

• Facility to add custom commands to suit the user.

• Screen Reader Facility.

Page 21: Voice Enabled Desktop Interaction and Control System (VEDICS).

Project Link: http://vedics.sourceforge.net/

References:

• Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, Joe Woelfel, “Sphinx-4: A Flexible Open Source Framework for Speech Recognition”, SML Technical Report, Sun Microsystems, SMLI TR-2004-139, Nov. 2004

• Kai-Fu Lee, Hsiao-Wuen Hon, Raj Reddy, “An Overview of the SPHINX Speech Recognition System”, IEEE Transactions on Acoustics Speech and Signal Processing, Vol 38, No. 1, Jan, 1990.

• Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, Michael Stal, “Pattern-Oriented Software Architecture – Vol 1: A System of Patterns”, Wiley Publications, 1996.

Page 22: Voice Enabled Desktop Interaction and Control System (VEDICS).

• Gnome Voice Control [Online]. Available: http://live.gnome.org/GnomeVoiceControl

• “Java Speech Grammar Format (JSGF)” [Online]. Available: http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/

• “Logios Lexicon Tool” [Online]. Available: http://www.speech.cs.cmu.edu/tools/lextool.html

• “Gnome Accessibility API” [Online]. Available: http://library.gnome.org/devel/at-spi-cspi/

• “Libwnck: Window Navigator Construction Kit” [Online]. Available: http://library.gnome.org/devel/libwnck/

• “GConf Configuration System” [Online]. Available: http://library.gnome.org/devel/gconf/
