Speech recognition: ready to take off?

By: Ma Jie (A0129447X), Niu Rui (A0040287J), Nguyen Gia Huy (A0045581E), Liu Lili (A0132407R), Tan Gee Kwang (A0147159X)

Transcript of Speech recognition: ready to take off?

Page 1: Speech recognition: ready to take off?

By: Ma Jie (A0129447X), Niu Rui (A0040287J), Nguyen Gia Huy (A0045581E), Liu Lili (A0132407R), Tan Gee Kwang (A0147159X)

Speech Recognition: Ready to Take Off?

Page 2: Speech recognition: ready to take off?

Overview

Performance of SR

• Siri

• Other applications

SR improvement

• Underlying technology

Emerging Application

• Avionics

• Field Automation

Page 3: Speech recognition: ready to take off?

Overview

Performance of SR

• Siri

• Other applications

SR improvement

• Underlying technology

Emerging Application

• Avionics

• Field Automation

Page 4: Speech recognition: ready to take off?
Page 5: Speech recognition: ready to take off?
Page 6: Speech recognition: ready to take off?

In 2013, an Intelligent Voice survey showed that only 15% of respondents said they had used Siri in iOS 7. Nearly half believed Apple had “oversold Siri’s voice recognition capabilities”.

At WWDC 2015, Apple’s software engineering vice president claimed that Siri gets 1 billion requests a week.

Performance of Siri

• Does basic math faster

• Finds facts two times faster

• Sets alarms four times faster than you

• Tweets more than two times faster than you

• Converts measurements

Page 7: Speech recognition: ready to take off?

Siri Usage Rate Detail and Customer Satisfaction

Source: http://www.imore.com/siri-months-community-report-card

Do you use Siri on your iOS device? (poll chart)

Answer options: Yes, and I like it; Yes, but it could be better; Yes, and I'm neutral; No: tried it and didn't like it; No: I didn't even try because I have no desire; Other

Chart values: 15%, 36%, 10%, 20%, 12%, 7%

Source: http://www.besttechie.com/2013/03/07/do-people-still-use-siri/

Page 8: Speech recognition: ready to take off?

Performance of Siri

Apple claims that in iOS 9, Siri will be up to 40 percent faster and 40 percent more accurate.

What has held it back?

1. There is a learning curve.

2. It’s far from perfect.

3. The use cases are limited.

4. Lack of integration with third-party apps.

Page 9: Speech recognition: ready to take off?

Speech Recognition Market

Source: Matt M., Joshua S., and David H. 2014. Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry

Over the past 50 years, technological breakthroughs have enabled SR to become a reality.

Coupled with advances in CPU power and enhanced software algorithms, SR achieved steep improvement and commercial feasibility after the 1990s.

Page 10: Speech recognition: ready to take off?

Current Applications of SR

Applications in various industries

Call Centers

Medical Industries

Education

Automotive

Home Automation

Page 11: Speech recognition: ready to take off?

Students with disabilities used an SR-powered Hosted Transcription System (HTS) to convert digitized audio and video into accessible multimedia transcripts

In 2011, 52% of Canadian disability service providers interviewed reported using speech-to-text supports

Strengthened by lowering WER

Problems:

– Scalability to meet temporal demands

– Fixed cost for infrastructure

SR in Education – Liberated Learning Project (LLR)

Quality

Cost

Source: http://www.transcribeyourclass.ca/financial.html

Page 12: Speech recognition: ready to take off?

IHS Automotive: About 25% of U.S. motorists use speech recognition in their cars daily, and 53% use it at least once a week; by 2020, 68 million vehicles worldwide will have voice controls, an increase of 84% from 37 million in 2014.
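The 84% figure can be checked directly from the two fleet sizes quoted on this slide. The sketch below is only an arithmetic illustration: it assumes the 2014 and 2020 values are exactly 37 and 68 million, and the function names are made up for this example rather than taken from the source.

```python
def percent_increase(start: float, end: float) -> float:
    """Total growth from start to end, expressed as a percentage."""
    return (end - start) / start * 100.0


def implied_annual_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate (in %) that turns start into end after `years` years."""
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0


vehicles_2014 = 37e6  # figure quoted on the slide
vehicles_2020 = 68e6  # figure quoted on the slide

total = percent_increase(vehicles_2014, vehicles_2020)
yearly = implied_annual_rate(vehicles_2014, vehicles_2020, years=2020 - 2014)

print(f"total increase 2014-2020: {total:.0f}%")
print(f"implied yearly growth:    {yearly:.1f}%")
```

The second number is the steady yearly rate implied by the two endpoints, which is a convenient way to compare this forecast with CAGR figures quoted elsewhere.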

SR in Automotive

Page 13: Speech recognition: ready to take off?

Most SR systems on today’s market have about 50 to 60 voice commands

Commonly used features: Make calls, play music, temperature control, navigation.

More features available: Reminders, Send emails, search nearby restaurants/shops/petrol stations, real-time traffic conditions, connect to other SR control system (e.g. home automation)…

SR in Automotive

Page 14: Speech recognition: ready to take off?

Nuance – Dragon Drive Platform

– Cloud-based voice and content solutions

– Integrated with in-vehicle cloud-based search capabilities from Telenav, leader of location-based services (Source: Telenav, Nov 3, 2015)

– Attractive features: reads out the daily update when the driver enters the car; connects your home to your car through LG HomeChat software

SR in Automotive

Video: https://www.youtube.com/watch?v=laxXWUxXcWs

Page 15: Speech recognition: ready to take off?

Problems encountered with ASR in cars -

– Doesn’t recognize/misinterprets verbal commands (63 percent)

– Doesn’t recognize/misinterprets names/words (44 percent)

– Doesn’t recognize/misinterprets numbers (31 percent)

– Wind noise

– Language accents

– Imperfect speech recognition software might prove to be a distraction

SR in Automotive

Page 16: Speech recognition: ready to take off?

SR in Home Automation

Smart home

– Lighting control (Vocca)

– TV (Apple TV)

– Personal Assistant (Echo, Homey)

Page 17: Speech recognition: ready to take off?

SR in Home Automation – Apple TV

The Apple TV uses Siri search as the glue that holds all those individual apps together. Voice commands (also found on Roku, Android TV and Amazon Fire TV) are easier than entering names on a virtual keyboard. And despite some rough edges, Siri is more helpful than the rest.

Siri’s advantage is more advanced queries.

Six degrees of Kevin Bacon

Filter TV episodes by actors

Rewind

Siri’s limitation:

Pronunciation of difficult names

TV show recognition by genres

Source: http://www.wsj.com/articles/apple-tv-review-a-giant-iphone-for-your-living-room-1446080460

The TV of the future needs to be as powerful and easy to use as an iPhone, and this Apple TV is the first box—and the first Apple TV—to achieve that.

Page 18: Speech recognition: ready to take off?

Amazon Echo – limited launch on November 6, 2014; wide launch on June 23, 2015

Can answer general questions, reorder the items you buy frequently from Amazon, and play music

SR in Home Automation

Source: http://www.amazon.com/Amazon-SK705DI-Echo/dp/B00X4WHP5E/ref=sr_1_1?ie=UTF8&qid=1446173814&sr=8-1&keywords=amazon+echo

Source: http://www.cnet.com/products/amazon-echo-review/

Page 19: Speech recognition: ready to take off?

Apple's HomeKit

– A framework for communicating with and controlling connected accessories in a user’s home, announced at Apple WWDC 2014.

SR in Home Automation

HomeKit-certified devices

ecobee3: Use sensors and a thermostat to keep tabs on your home’s temp.

Elgato: A variety of Elgato’s Eve sensors will give you all kinds of information about what’s going on inside your home. (Door & Window, Energy, Weather, Room)

iHome: Connect ordinary devices into the smart plug, and you can start controlling them with your phone.

Insteon: The company’s hub can control all its products, including lights and locks, even from outside your home.

Lutron: Control your lights and shades with its bridges and kits.

iDevices: Plug anything into the company’s indoor or outdoor switch to make the device smart, and control your climate with the thermostat.

Schlage: You’ll be able to ask Siri to lock and unlock your door.

August: The smart lock company announced a doorbell camera and keypad to its lineup, but it’s just the new lock that works with Siri for now.

Coming: Plugs, Thermostats (Honeywell Lyric), Lighting (Philips), Alarm System (Honeywell Lynx Security System)

Partnerships: Chamberlain MA Garage, Cree, Friday Smart Lock, GE (color-changing LEDs), Haier (smart air-conditioner), Incipio, Kwikset, Netatmo, Osram Sylvania, Philips Hue, SkyBell, Withings (baby monitors)

Source: http://www.digitaltrends.com/home/a-list-of-apple-homekit-compatible-devices/

Total price: US$2000

Page 20: Speech recognition: ready to take off?

SR in Home Automation

Source: http://publications.lib.chalmers.se/records/fulltext/203117/203117.pdf

Most common used features

Other features that users would like

Page 21: Speech recognition: ready to take off?

There is a user base for SR (doctors, drivers, smartphone users…)

But the fact is that most customers have only tried SR a few times, or use only basic commands when they have to (driving, busy hands, etc.)

Why?

– SR doesn’t recognize complicated commands, which limits the features

– SR reacts very slowly

– It takes time to train it

– Interaction with SR is not natural; words must be clear and without emotion

– Bad first impression; no interest in trying again even though SR is improving

Summary of Challenges in SR

Customers don’t think that using SR is necessary in their daily life!

Page 22: Speech recognition: ready to take off?

Overview

Performance of SR

• Siri

• Other applications

SR improvement

• Underlying technology

Emerging Application

• Avionics

• Field Automation

Page 23: Speech recognition: ready to take off?

Dimension: Speed

Requirements: Process the algorithms

Components: Processor

Underlying Technology of Speech Recognition

Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html

Page 24: Speech recognition: ready to take off?

Dimension: Accuracy

Requirements and achievements:

• Quality of signal received: background noise elimination, channel effect elimination

• Acoustic scoring: deep learning, acoustic database

• Language matching: modelling, language database

Underlying Technology of Speech Recognition

Page 25: Speech recognition: ready to take off?


Underlying Technology of Speech Recognition

Components: Microphone

Page 26: Speech recognition: ready to take off?


Underlying Technology of Speech Recognition

Components: Memory

• Speech recognition relies on a database, which can be local or in the cloud.

• Memory performance lags far behind processor performance, so the bottleneck of an SR system (SRS) is memory speed (or network speed when the database is in the cloud).

Source: http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/architecture-2008/lec07-cache.html

Page 27: Speech recognition: ready to take off?


Underlying Technology of Speech Recognition

Components: Algorithms

Page 28: Speech recognition: ready to take off?

Noise Elimination Algorithm Performance

• Noise has two main effects on the speech representation: distortion in the representation space, and a loss of information.

• Studies show that noise compensation methods help to improve accuracy at different SNR (signal-to-noise ratio) levels and distances.

Source: Angel de la T. et al. Speech Recognition Under Noise Conditions: Compensation Methods

Source: Pedro J. Moreno, 1996, Speech Recognition in Noisy Environments
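For readers unfamiliar with SNR, it is usually reported in decibels as 10·log10(signal power / noise power). The sketch below is a minimal illustration of that formula only, not of the compensation methods in the cited studies. The test tone and the alternating ±0.1 "noise" are synthetic stand-ins chosen so the example is deterministic, and the helper names are assumptions of this example.

```python
import math


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


sample_rate = 8000  # Hz, assumed for the example
# 100 Hz tone with amplitude 1.0, covering ten full periods (800 samples).
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
# Deterministic stand-in for background noise with amplitude 0.1.
noise = [0.1 if n % 2 == 0 else -0.1 for n in range(800)]

print(f"SNR: {snr_db(tone, noise):.2f} dB")

# Halving the noise amplitude quarters its power and raises the SNR by about 6 dB.
quieter = [x / 2 for x in noise]
print(f"SNR with half the noise amplitude: {snr_db(tone, quieter):.2f} dB")
```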

Page 29: Speech recognition: ready to take off?

Speakers may have different accents, dialects, or pronunciations, and speak in different styles, at different rates, and in different emotional states.

Deep learning, introduced in 2006, attempts to learn multiple levels of representation of increasing complexity/abstraction.

A new architecture, the deep belief network (DBN)-HMM, was developed in 2012.

Deep Learning

Page 30: Speech recognition: ready to take off?

The idea dates from the 1970s, but progress was very slow because of computational and data limitations

Deep learning - one step closer to artificial intelligence

Deep Learning

More data, faster hardware

Page 31: Speech recognition: ready to take off?

Word error rate (WER) for SR technology in automotive has been reduced to below 1%

Accuracy of SR

Source: http://whatsnext.nuance.com/in-the-labs/deep-learning-in-connected-cars/
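WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below shows that standard calculation; the example sentences are invented for illustration, and the slides do not describe how the cited vendors measure their own figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


reference = "set an alarm for seven am"   # what the speaker said (invented example)
hypothesis = "set alarm for eleven am"    # what the recognizer returned (invented example)
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.1%}")
```

Here one word is dropped and one is substituted against a six-word reference, so two errors are counted. Because insertions count as errors, WER can exceed 100% for very poor output.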

Page 32: Speech recognition: ready to take off?

Overall WER improvement for SR

Accuracy of SR

Source: http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/

Page 33: Speech recognition: ready to take off?

Accuracy of SR

According to Baidu, its error rates using GPUs were 6.56% in a clean environment and 19.06% in noisy environments

Apple claims that Siri in iOS 9 has only a 5% word error rate

Siri in iOS 9 asks you to teach it your voice whenever you change to a new language

Source: NVIDIA GTC: The Race To Perfect Voice Recognition Using GPUs

TARGET: < 0.1% or even 0%

Page 34: Speech recognition: ready to take off?

How will SR improve further?

Customers don’t think that using SR is necessary in their daily life!

BUT IF: SR is faster and smarter at understanding commands, with more features available

Customers might start thinking: Why not try SR?

For example: the ability to recognize multilingual content, direct links to third-party apps, allowing multiple users to interact at the same time…

Page 35: Speech recognition: ready to take off?

So, when will SR like Siri be widely used by customers?

2020 to 2025

– Improvement of deep learning (Apple acquired VocalIQ in Oct 2015) for more intelligent algorithms

– Improvement of big data: multiple channels to enhance the databases used in modeling, for higher accuracy

– Improvement of mobile networks: faster response for a better customer experience

– With the diffusion of smart devices and apps, new customers will have more chances to adopt SR before old habits form

– Potential new standard for the human-machine interface

– Cost will fall further as core components improve

How will SR improve further?

Page 36: Speech recognition: ready to take off?

Speech Recognition: Future Market Trend

Voice will be the most important area for growth in mobile user interfaces

Tractica forecasts that SR will reach $5.1 billion by 2024, at a CAGR of 40%

Strongest market - Consumer-facing market: Mobile device authentication and control of wearable devices

Page 37: Speech recognition: ready to take off?

The report Global Automotive Voice Recognition Market 2014-2018 forecasts that the automotive voice recognition sector will grow at a 10.59% CAGR to 2018

Speech Recognition: Future Market Trend

SR market in Automotive

Page 38: Speech recognition: ready to take off?

Market for Home automation

– Annual growth rate could reach 67% over the next 5 years

– Revenue reaches $61 billion; with a 52% compound annual growth rate, the value is forecast to reach $490 billion in 2019
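The relationship between the $61 billion, 52% and $490 billion figures can be explored with the compound-growth formula, value × (1 + rate)^years. This is a rough sketch under one assumption the slide does not state explicitly: that $61 billion is the starting value and growth compounds once per year. The function names are illustrative.

```python
import math


def project(value: float, rate: float, years: int) -> float:
    """Value after compounding at `rate` (0.52 means 52%) for `years` years."""
    return value * (1.0 + rate) ** years


def years_to_reach(start: float, target: float, rate: float) -> float:
    """Number of years of compounding needed to grow from start to target."""
    return math.log(target / start) / math.log(1.0 + rate)


start_billion = 61.0    # figure quoted on the slide
target_billion = 490.0  # figure quoted on the slide
cagr = 0.52             # figure quoted on the slide

for year in range(6):
    print(f"year {year}: ${project(start_billion, cagr, year):.0f} billion")

needed = years_to_reach(start_billion, target_billion, cagr)
print(f"years needed to reach ${target_billion:.0f} billion: {needed:.2f}")
```

Under that assumption the target is reached after roughly five years of compounding, which is consistent with the five-year horizon mentioned in the first bullet.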

Speech Recognition: Future Market Trend

Page 39: Speech recognition: ready to take off?

Overview

Performance of SR

• Siri

• Other applications

SR improvement

• Underlying technology

Emerging Application

• Avionics

• Field Automation

Page 40: Speech recognition: ready to take off?

SR in Avionics - Head-in and Head-out in cockpit

Multi-function displays with menu structures many tiers deep

The pilot needs one hand on the collective and the other on the joystick

Page 41: Speech recognition: ready to take off?

SR in Avionics

Speech recognition reduces workload and frees pilots' hands.

With more head-up time, the pilot can focus on flying the aircraft and responding to the outside environment.

Noise elimination and integration with onboard system

http://www.speech.sri.com/press/airforce-print-news-oct15-2007.pdf

http://www.gizmag.com/go/7484/

Page 42: Speech recognition: ready to take off?

Navigation Functions

• Entering waypoints and inputting FMS data

• Reduce confusion

Communication Functions

• Change frequencies of channel by voice control

• Query system by “asking”

Checklist

• Task list

• Avionics monitor

Safety and security are roadblocks for SR adoption in avionics

Entry level functions with low safety concerns

SR in Avionics

Page 43: Speech recognition: ready to take off?

SR Deployment in Avionics

Timeline: 2000, 2007, 2008, 2014, 2015

Typhoon, Gazelle, F-35 & F-22, Sferion Assistance System

Direct input voice system; speaker-independent system

Start in civil avionics

Pro Line Fusion flight deck

Page 44: Speech recognition: ready to take off?

It is not a technology problem, but more of an acceptance problem.

Air transport will accept SR after a product actually comes out and proves its value

SR Commercialization in Avionics

"We've hit our sweet spot finally and its gotten to the point where its getting very, very close to being product ready in terms of being mature enough to get out there."

- Geoff Shapiro from Rockwell Collins

Source: http://www.aviationtoday.com/av/topstories/Rockwell-Collins-Rapidly-Advancing-Cockpit-Voice-Recognition-Technology_83515.html#.Vjm710b0wTY

Page 45: Speech recognition: ready to take off?

SR in Field Automation

Equipment inspection in the field using portable devices with an embedded speech recognition system

Enter data faster and reduce the cost

Page 46: Speech recognition: ready to take off?

Source: https://www.earthworksaction.org/issues/detail/oil_and_gas_noise#.Vh_SNN-qpBc

Source: http://www.ehjournal.net/content/14/1/18

SR in Field Automation

Noise levels are very high, so noise elimination will be more challenging

Page 47: Speech recognition: ready to take off?

Robots designed for dedicated functions can only receive pre-defined instructions

Low demands on noise elimination, processing and memory
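A pre-defined instruction set can be pictured as a small lookup table from recognized phrases to actions. The sketch below assumes the recognizer has already produced a text transcript. The command phrases, action names and the `match_command` helper are hypothetical, and the fuzzy matching via Python's standard `difflib` module is an illustrative choice, not something the slides describe.

```python
import difflib

# Hypothetical fixed command set for a single-purpose household robot.
COMMANDS = {
    "start cleaning": "START_CLEANING",
    "stop": "STOP",
    "return to dock": "RETURN_TO_DOCK",
}


def match_command(transcript: str, cutoff: float = 0.8):
    """Map a transcript to a pre-defined action, or None if nothing is close enough."""
    # Normalize case and whitespace before comparing.
    text = " ".join(transcript.lower().split())
    # get_close_matches tolerates small recognition errors up to the cutoff.
    close = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[close[0]] if close else None


print(match_command("Start  Cleaning"))
print(match_command("return to doc"))
print(match_command("what is the weather tomorrow"))
```

Anything outside the fixed vocabulary is simply rejected, which is why such a robot needs far less processing and memory than an open-ended assistant.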

SR in Personal Robot for Family

Page 48: Speech recognition: ready to take off?

Artificial Intelligence – Key technology for future improvement of SR

We should “talk” rather than type

Artificial intelligence should be deployable in any complex environment, with the capacity to understand instructions

High demands on noise elimination, processing and memory

Page 49: Speech recognition: ready to take off?

SR in the future – everywhere in your life

Driving in the car, shopping in the mall, eating in the canteen

Page 50: Speech recognition: ready to take off?

Q&A