Voice input and speech recognition system in tourism/social media

Slide 1

Voice Input &amp; Speech Recognition Systems in Tourism/Social Media Networks

Meaning
A voice input device is one in which speech is used to input data or system commands directly into a system. Such equipment uses speech recognition processes and can replace or supplement other input devices. Some voice input devices can recognize spoken words from a predefined vocabulary; others have to be trained for a particular speaker. When the operator utters a vocabulary item, the matching data input is displayed as characters on a screen, where the operator can verify it.
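To make the idea of a speaker-trained, predefined vocabulary concrete, here is a minimal Python sketch of isolated-word recognition by template matching with dynamic time warping (DTW), a classic technique for this kind of system. It assumes each utterance has already been converted into a sequence of acoustic feature values, simplified here to one number per frame. The templates, the numbers, and the function names dtw_distance and recognize are hypothetical. DTW lets a word spoken more slowly still line up with its stored template.

```python
import math
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = lowest total cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b frame repeats
                                 cost[i][j - 1],      # b advances, a frame repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance: Sequence[float],
              templates: Dict[str, List[float]]) -> str:
    """Return the vocabulary word whose stored template is closest."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical templates recorded by one speaker during training.
templates = {
    "yes": [1.0, 3.0, 4.0, 2.0],
    "no": [5.0, 5.0, 1.0],
}
spoken = [1.0, 1.0, 3.0, 4.0, 4.0, 2.0]  # "yes", said more slowly
word = recognize(spoken, templates)
print(f"Recognized: {word!r}, please confirm")
```

The recognized word would then be shown on screen for the operator to confirm. Real systems compare multidimensional feature vectors (for example, MFCCs) rather than single numbers, but the alignment idea is the same.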

Speech recognition (SR) is the interdisciplinary subfield of computational linguistics that incorporates knowledge and research from the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable computers and computerized devices, such as those categorized as smart technologies and robotics, to recognize spoken language and translate it into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).

Siri for iOS

Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., finding a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft systems (usually termed Direct Voice Input).
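Once speech has been turned into text, a voice user interface still has to decide what the words mean. The following minimal Python sketch maps transcripts like the examples above to actions using a few regular expressions. The command names, the patterns, and the route function are hypothetical; production assistants typically rely on grammars or trained intent classifiers instead.

```python
import re
from typing import Optional, Tuple

# Hypothetical command patterns, checked in order.
COMMANDS = [
    ("call", re.compile(r"^call (?P<target>.+)$")),
    ("collect_call", re.compile(r"\bcollect call\b")),
    ("lights_on", re.compile(r"^turn on the (?P<target>.+) lights?$")),
]


def route(transcript: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map recognized text to (action, target), or None if nothing matches."""
    # Normalize case and trailing punctuation added by the recognizer.
    text = transcript.lower().strip().rstrip(".!?")
    for action, pattern in COMMANDS:
        match = pattern.search(text)
        if match:
            # Patterns without a named "target" group yield None here.
            return action, match.groupdict().get("target")
    return None


print(route("Call home"))
print(route("I would like to make a collect call"))
print(route("Turn on the kitchen lights"))
```

Keeping the patterns in an ordered list makes it easy to see which command wins when more than one could match.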

Facts

Speech input can eliminate the need for a typewriter, keyboard, and similar input devices. From the technology perspective, speech recognition has a long history with several waves of major innovations. Major speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, and iFlytek (China), many of which have publicized that the core technology in their speech recognition systems is based on deep learning.

Used for translation

In tourism:
- Car GPS systems
- Mobile map applications for directions, such as Google Maps and HERE Maps
- Apps such as WhatsApp and Viber, and most social networking sites

Other key areas:
- Digital security locks

Skype and other social networking sites

Accuracy
The accuracy of speech recognition varies with the following factors:
Vocabulary size and confusability
- Error rates increase as the vocabulary size grows. For example, the 10 digits "zero" to "nine" can be recognized essentially perfectly, but vocabulary sizes of 200, 5,000, or 100,000 words may have error rates of 3%, 7%, or 45%, respectively.
- A vocabulary is hard to recognize if it contains confusable words. For example, the 26 letters of the English alphabet are difficult to discriminate because many of them sound alike (most notoriously the E-set: "B, C, D, E, G, P, T, V, Z"); an 8% error rate is considered good for this vocabulary.
Speaker dependence vs. independence
- A speaker-dependent system is intended for use by a single speaker.
- A speaker-independent system is intended for use by any speaker, which is more difficult to achieve.
Isolated, discontinuous, or continuous speech
- Isolated speech uses single words, which makes it easier to recognize.
- Discontinuous speech uses full sentences separated by silence, which makes it about as easy to recognize as isolated speech.
- Continuous speech uses naturally spoken sentences, which makes it harder to recognize than either isolated or discontinuous speech.
Task and language constraints
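One common way to quantify error rates like these is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the correct transcript, divided by the number of words in the correct transcript. The minimal Python sketch below computes it with a word-level edit distance. The slide does not say how the quoted percentages were measured, so treating them as WER is an assumption. The example pair is the military mishearing mentioned in this transcript.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)


ref = "send reinforcements we're going to advance"
hyp = "send three and fourpence we're going to a dance"
print(f"WER: {word_error_rate(ref, hyp):.2f}")
```

Here 5 edits against a 6-word reference give a WER of about 0.83. Because insertions are counted, WER can exceed 1.0 when the output is much longer than the reference.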

Let's make it simple
The trouble is, listening is much harder than it looks (or sounds): there are all sorts of different problems going on at the same time.
When someone speaks to you in the street, there's the sheer difficulty of separating their words (what scientists would call the acoustic signal) from the background noise, especially in something like a cocktail party, where the "noise" is similar speech from other conversations.
When people talk quickly and run all their words together in a long stream, how do we know exactly when one word ends and the next one begins? (Did they just say "dancing and smile" or "dance, sing, and smile"?)
There's the problem of how everyone's voice is a little bit different, and the way our voices change from moment to moment. How do our brains figure out that a word like "bird" means exactly the same thing when it's trilled by a ten-year-old girl or boomed by her forty-year-old father?
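The word-boundary problem can be illustrated by enumerating every way an unbroken stream splits into known words. In this minimal Python sketch, letters stand in for speech sounds, and the tiny lexicon and the segmentations function are hypothetical. A real recognizer works on acoustic units and scores candidate splits with a language model rather than listing them all.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple


def segmentations(stream: str, lexicon: FrozenSet[str]) -> List[Tuple[str, ...]]:
    """Return every way to split an unbroken stream into lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Tuple[Tuple[str, ...], ...]:
        if start == len(stream):
            return ((),)  # exactly one way to segment an empty remainder
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in lexicon:
                # Prefix this word to every segmentation of the rest.
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return list(split_from(0))


# Hypothetical toy lexicon; letters stand in for speech sounds.
lexicon = frozenset({"we", "are", "now", "here", "no", "where", "nowhere"})
for words in segmentations("wearenowhere", lexicon):
    print(" ".join(words))
```

The same stream yields several valid readings, which is exactly the ambiguity a listener (or a recognizer) must resolve from context.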

What about words like "red" and "read" that sound identical but mean totally different things (homophones, as they're called)? How does our brain know which word the speaker means?
What about sentences that are misheard to mean radically different things? There's the age-old military example of "send reinforcements, we're going to advance" being misheard as "send three and fourpence, we're going to a dance", and all of us can probably think of song lyrics we've hilariously misunderstood the same way (I always chuckle when I hear Kate Bush singing about "the cattle burning over your shoulder").
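One standard way a recognizer chooses between homophones such as "red" and "read" is to score each candidate against the surrounding words with a language model. This minimal Python sketch uses only the single preceding word and a table of hypothetical bigram counts; the BIGRAM_COUNTS table and the pick_homophone function are illustrative, not real corpus data.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical bigram counts, as if gathered from a text corpus.
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("have", "read"): 12,
    ("have", "red"): 1,
    ("the", "red"): 9,
    ("the", "read"): 2,
}


def pick_homophone(previous_word: str, candidates: Iterable[str]) -> str:
    """Choose the candidate that most often follows previous_word."""
    # Unseen pairs count as 0; ties go to the first candidate listed.
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))


print(pick_homophone("have", ["red", "read"]))
print(pick_homophone("the", ["red", "read"]))
```

Real language models use much longer contexts and smoothed probabilities estimated from large corpora, but the principle of letting context decide is the same.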

Thank you