Voice Enabled Desktop Interaction and Control System (VEDICS).

Desktop at Your Command

• Team Members: Nischal E Rao, Bharat Joshi, Suhas Kamath N, Sharath M Puranik

• Project Guide: Prof. Shantharam Nayak

• Carried out at: R.V. College of Engineering, Bangalore, India.

• Voice Enabled Desktop Interaction and Control System (VEDICS) is a software solution for controlling the desktop system using voice-based commands.

• The system takes audio signals as input, processes and recognizes them, and executes the desired action on the desktop system.

• All software products should incorporate accessibility features to enable differently-abled people to use the software easily and efficiently.

• For persons with physical disabilities, the ability to simply talk to a computer could be a priceless asset.

• Hands-free computing is more convenient than conventional I/O.

• The user should be able to:

o access any element present on the user’s screen.

o run common programs and applications.

o navigate through the file system.

o perform common window operations like minimize, maximize, close etc.

• User commands should be easy to remember and use.

• The user must be able to turn the system on and off whenever required.

• VEDICS follows the MVC design pattern.

• Any speech-to-text converter can be used with VEDICS.

• VEDICS uses a feedback mechanism to learn what is being displayed on the desktop.

• Accuracy increases because only words relevant to the current screen are recognized.

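[Block diagram: the Speech-to-text Converter passes the Recognized Text to the Desktop Control System, which issues a Command to the User’s Desktop. The Currently visible objects on the desktop are fed back to the Desktop Control System, which supplies the Grammar and Names of visible elements to the Speech-to-text Converter.]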

• Speech to text Conversion

• The Speech-to-text Converter uses the Grammar and the Dictionary to convert sound signals into text.

• The recognized text, for example “Open Firefox”, is given as input to the Desktop Control System.

• The Desktop Control System determines the command to execute on the desktop, for example the Open_firefox command.
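As a rough illustration of this step, the sketch below maps recognized text to a command name. The command table, the `resolve_command` function and the `activate:` prefix are hypothetical and are not taken from the VEDICS code base; only the “Open Firefox” to Open_firefox pairing comes from the slides.

```python
# Hypothetical command table: spoken phrase -> internal command name.
# Only the "open firefox" entry mirrors the example on the slide.
COMMANDS = {
    "open firefox": "open_firefox",
    "minimize window": "minimize_window",
    "close window": "close_window",
}


def resolve_command(recognized_text, visible_elements=()):
    """Return a command name for the recognized text, or None if nothing matches."""
    # Normalize case and whitespace so "Open  Firefox" matches "open firefox".
    phrase = " ".join(recognized_text.lower().split())
    if phrase in COMMANDS:
        return COMMANDS[phrase]
    # Otherwise treat the phrase as the name of an element visible on screen.
    for name in visible_elements:
        if phrase == name.lower():
            return "activate:" + name
    return None
```

For example, `resolve_command("Open Firefox")` yields `"open_firefox"`, while a phrase that matches neither a command nor a visible element yields `None`.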

• After successful execution, the names of objects visible on the screen are collected, for example “File”, “Edit” and “Google”.

• The collected names (“File”, “Edit”, “Google”) are used to update the grammar and the dictionary files.
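A minimal sketch of how such a grammar file could be regenerated from the collected names, assuming the grammar is written in JSGF (listed in the references) as a single rule of alternatives. The function name `build_jsgf`, the grammar name `vedics` and the rule name `<element>` are illustrative assumptions; the slides do not show the actual grammar layout, and the dictionary update (done with Logios) is not covered here.

```python
import re


def build_jsgf(names, grammar_name="vedics"):
    """Build a JSGF grammar whose single public rule accepts any given name."""
    words = []
    for name in names:
        # Keep lowercase letters, digits and spaces; drop other characters.
        token = re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()
        if token and token not in words:
            words.append(token)
    if not words:
        raise ValueError("no usable element names")
    alternatives = " | ".join(words)
    return "#JSGF V1.0;\n\ngrammar {};\n\npublic <element> = {};\n".format(
        grammar_name, alternatives
    )
```

Calling `build_jsgf(["File", "Edit", "Google"])` produces a grammar whose rule line reads `public <element> = file | edit | google;`.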

• The updated grammar and dictionary files are used in the next recognition cycle.
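The cycle described in these steps can be sketched as a loop in which the vocabulary for each utterance depends on what the previous command left on screen. Everything here is a simplified stand-in: `run_cycles`, the callables passed to it and the `demo` stubs are hypothetical, and plain strings take the place of audio input.

```python
def run_cycles(utterances, recognize, execute, visible_names, base_vocabulary):
    """Run one recognition cycle per utterance; return the vocabulary used in each."""
    vocabulary = set(base_vocabulary)
    used = []
    for audio in utterances:
        used.append(sorted(vocabulary))
        text = recognize(audio, vocabulary)  # only in-vocabulary phrases can match
        if text is None:
            continue  # nothing recognized; the vocabulary stays as it was
        execute(text)
        # Feedback step: rebuild the vocabulary from what is now on screen.
        vocabulary = set(base_vocabulary) | set(visible_names())
    return used


def demo():
    """Stub desktop: opening Firefox makes three menu names visible."""
    screen = []

    def recognize(audio, vocabulary):
        return audio if audio in vocabulary else None

    def execute(text):
        if text == "open firefox":
            screen[:] = ["file", "edit", "google"]

    return run_cycles(
        ["file", "open firefox", "file"],
        recognize,
        execute,
        lambda: list(screen),
        ["open firefox"],
    )
```

In the stub run, the first “file” is rejected because it is not yet in the vocabulary; after “open firefox” executes, the third cycle can recognize it.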

• VEDICS consists of the following parts:

o Sphinx 4 Sub-system: Open source tool used to convert speech to text.

o Desktop Control Sub-system: Maps the converted text to the corresponding command and executes it on the desktop. It re-creates the grammar file based on what is displayed on the screen.

o Logios Tool: Used to generate a new dictionary based on what is displayed on the screen.

• Accuracy of VEDICS depends on the accuracy of Sphinx 4.

• Summary of the performance of Sphinx 4:

Parameter                                Performance
Vocabulary Size                          79
Word Error Rate (in %)                   1.192
RT Ratio in Single CPU Configuration*    0.25
RT Ratio in Dual CPU Configuration*      0.20

* RT Ratio: Ratio of utterance duration to the time taken to decode the utterance.
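For readers unfamiliar with the word error rate metric, the sketch below computes it in the standard way: the minimum number of word substitutions, deletions and insertions needed to turn the recognized text into the reference text, divided by the number of reference words. This is a generic illustration, not the evaluation code behind the Sphinx 4 figures; the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate in percent, via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

For example, recognizing “open file menu” when the user said “open the file menu” is one deletion out of four reference words, so the function returns 25.0.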

• Increased accuracy due to the context-aware nature of VEDICS.

• Use of a small vocabulary further improves accuracy.

• Use of Logios enables recognition of custom words. Words with any sequence of characters can be recognized.

• Almost all components on the desktop are accessible.

• VEDICS can be used to perform most actions that can be done using a pointing device.

• Using voice to access and control the desktop has many advantages. This feature can be a boon to differently-abled people.

• VEDICS can navigate through the file system, open applications, control the desktop window, and recognize almost any word.

• VEDICS is context aware. It determines what is currently being displayed on the desktop and dynamically generates the grammar and the dictionary.

• Dictation facility: The ability to dictate into a text editor or text field.

• Artificial Intelligence in VEDICS.

• If two objects on the screen have the same name, the user should be able to select the right one.

• The user should be able to either pronounce the entire word or spell individual characters of the word.

• Facility to add custom commands to suit the user.

• Screen Reader Facility.

Project Link: http://vedics.sourceforge.net/

References:

• Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, Joe Woelfel, “Sphinx-4: A Flexible Open Source Framework for Speech Recognition”, SML Technical Report, Sun Microsystems, SMLI TR-2004-139, Nov. 2004

• Kai-Fu Lee, Hsiao-Wuen Hon, Raj Reddy, “An Overview of the SPHINX Speech Recognition System”, IEEE Transactions on Acoustics Speech and Signal Processing, Vol 38, No. 1, Jan, 1990.

• Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, Michael Stal, “Pattern-Oriented Software Architecture – Vol 1: A System of Patterns”, Wiley Publications, 1996.

• Gnome Voice Control [Online]. Available: http://live.gnome.org/GnomeVoiceControl

• “Java Speech Grammar Format (JSGF)” [Online]. Available: http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/

• “Logios Lexicon Tool” [Online]. Available: http://www.speech.cs.cmu.edu/tools/lextool.html

• “Gnome Accessibility API” [Online]. Available: http://library.gnome.org/devel/at-spi-cspi/

• “Libwnck: Window Navigator Construction Kit” [Online]. Available: http://library.gnome.org/devel/libwnck/

• “GConf Configuration System” [Online]. Available: http://library.gnome.org/devel/gconf/
