Omar Tawakol at AI Frontiers: The Rise Of Voice-Activated Assistants In The Workplace

ENTERPRISE VOICE AI

Exoskeletons, not Robots

• Four attributes of exoskeletons
– Enhance not replace
– Collective intelligence
– Fits your workflow
– Human-in-the-loop (optional)


The most adopted form of enterprise collaboration

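[Figure: bubble chart of enterprise collaboration channels (Email, Meetings (Voice), IM, Enterprise Apps). Axes: employee time spent (low to high) and information generation (low to high). Size of bubble represents activation opportunity. Annotation: lacks activation.]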

• $1,000: avg. labor cost per 60-minute meeting

• $75M: Fortune 50 company wasted labor

• 9B: US meetings per year (100B globally)

• 37%: time spent in meetings

voicera = voice collaboration

• Connect what you say with what you do

• meet eva, your in-meeting AI assistant that takes notes


step 1: call or invite [email protected] to your meetings
step 2: interact through voice cues or “taps”
step 3: review email and share through Voicera

secular trends


Enterprise

Voice Collaboration

Consumer

“Gartner predicts that by 2020, 60% of meetings with three or more participants will involve a virtual assistant.”


Agenda

Actions

Decisions

Artifacts

Feedback

Meeting Threads

horizontal use: collaborate and share information with clarity…

conversations inbox

Post Meeting Inbox View


A different type of competitive advantage


• Oracle Data Cloud & Classic Data Network Effects

• AI can create a compounding competitive advantage*

[Figure: network effect cycle. Labels: more data sellers, more data buyers, better monetization.]

…but producing this type of advantage isn’t business as usual.

[Figure: compounding advantage cycle. Labels: better experience, more interaction data, better algorithmic results, deeper preferences learned.]

*GGVC term

Building the data pipeline

• Bootstrap through acquiring data & labels

• Generate production data

• Process for accurate, continuous labels (e.g. FP, TP, and FN)

• Compress learning cycles w/ model automation:
– Creation
– Judgement
– Parameter tuning/learning
– Deployment
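The judgement and parameter-tuning steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not Voicera's pipeline: the labeled data is a hypothetical list of (confidence score, keyword actually spoken) pairs, and the names `Counts`, `judge`, and `tune` are invented for this sketch. The tuner picks the threshold with the best recall among those that meet a precision floor.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Labeled outcomes of a keyword spotter on a data set."""
    tp: int = 0
    fp: int = 0
    fn: int = 0

    @property
    def precision(self) -> float:
        fired = self.tp + self.fp
        # Convention for this sketch: no triggers at all counts as precision 0.
        return self.tp / fired if fired else 0.0

    @property
    def recall(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


def judge(threshold, labeled):
    """Judgement step: count TP, FP and FN for one candidate threshold."""
    counts = Counts()
    for score, is_keyword in labeled:
        fired = score >= threshold
        if fired and is_keyword:
            counts.tp += 1
        elif fired:
            counts.fp += 1
        elif is_keyword:
            counts.fn += 1
    return counts


def tune(candidates, labeled, min_precision):
    """Tuning step: best-recall threshold that still meets the precision floor.

    Returns (threshold, Counts), or None if no candidate qualifies.
    """
    best = None
    for threshold in candidates:
        counts = judge(threshold, labeled)
        if counts.precision < min_precision:
            continue
        if best is None or counts.recall > best[1].recall:
            best = (threshold, counts)
    return best


# Hypothetical labeled data: (confidence score, keyword actually spoken).
labeled = [(0.95, True), (0.90, True), (0.80, False),
           (0.70, True), (0.40, False), (0.30, True)]
strict = tune([0.5, 0.75, 0.85], labeled, min_precision=0.99)
relaxed = tune([0.5, 0.75, 0.85], labeled, min_precision=0.70)
print(strict, relaxed)
```

A deployment step would then promote the selected threshold only if it beats the model currently in production on the same labeled data.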


Example: Key Word Spotting

• Goal: every utterance in which the keyword is spoken gets a higher confidence than any utterance in which the keyword is not spoken

• The most common measure to evaluate keyword spotters is AUC (area under the precision-recall curve)

• Alternatively, we also use Recall @ Near 100% Precision
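Both measures can be computed from a list of scored, labeled utterances. The sketch below is an illustration, not the evaluation code behind these slides: it estimates the area under the precision-recall curve with the step-wise sum known as average precision, and the function names and sample data are made up for the example. It assumes at least one utterance really contains the keyword.

```python
def pr_points(scores, labels):
    """(precision, recall) at each distinct threshold, from high score to low."""
    total_pos = sum(1 for label in labels if label)
    if total_pos == 0:
        raise ValueError("need at least one positive utterance")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Utterances with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points


def average_precision(points):
    """Step-wise area under the PR curve: sum of precision * recall gain."""
    area = 0.0
    prev_recall = 0.0
    for precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


def recall_at_precision(points, min_precision):
    """Best recall among operating points with precision >= min_precision."""
    return max((r for p, r in points if p >= min_precision), default=0.0)


# Made-up example: 1 means the keyword was really spoken.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
points = pr_points(scores, labels)
print(average_precision(points), recall_at_precision(points, 0.99))
```

The second number is the recall available if the spotter is only allowed to fire at thresholds where it is almost never wrong, which is the operating region that matters when false triggers are costly.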


Technical Challenges

• Telephony is the least common denominator
– 8 kHz sampling rate

• A wide variety of microphones & meeting environments

• High Social Cost of False Triggers

• Online Decoding: Very Fast & Small footprint

• Handle different accents and pronunciations


Avoiding Judgement Errors: Survivor Bias Example

• A KWS (keyword spotter) creates FP and TP, and misses FN

• FP & TP are easily labeled (FN are harder)

• Survivor bias misjudges performance of next candidate

• The new algorithm is judged too favorably on FP
– b/c it won’t generate the same false positives
– but it would generate its own false positives

• The new algorithm is judged too harshly on FN
– b/c it is judged against the logged TP and fails on >0% of them
– but it could accurately identify the previous algorithm’s FN
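A toy example makes the two biases concrete. The utterance IDs and both trigger sets below are invented for illustration; the only assumption is that labels exist solely for utterances on which the current (incumbent) spotter fired.

```python
def true_metrics(triggers, truth):
    """(TP, FP, FN) measured against full ground truth."""
    return (len(triggers & truth),
            len(triggers - truth),
            len(truth - triggers))


def logged_metrics(candidate, incumbent, truth):
    """(TP, FP, FN) for the candidate when only the incumbent's triggers are labeled."""
    logged_tp = incumbent & truth   # incumbent triggers labeled correct
    logged_fp = incumbent - truth   # incumbent triggers labeled false
    return (len(candidate & logged_tp),
            len(candidate & logged_fp),
            len(logged_tp - candidate))


# Invented utterance IDs: 1..10 really contain the keyword.
truth = set(range(1, 11))
incumbent = {1, 2, 3, 4, 5, 6, 7, 8, 101, 102}       # misses 9 and 10
candidate = {1, 2, 3, 4, 5, 6, 7, 9, 10, 201, 202}   # misses 8

print("incumbent, full truth:", true_metrics(incumbent, truth))
print("candidate, full truth:", true_metrics(candidate, truth))
print("candidate, logged only:", logged_metrics(candidate, incumbent, truth))
```

Against full ground truth the candidate finds 9 of 10 keywords with 2 false positives, versus 8 of 10 with 2 false positives for the incumbent. Judged only on the incumbent's log, the candidate appears to have no false positives (its own, 201 and 202, were never labeled) and appears to lose a true positive with nothing to show for it (its catches of 9 and 10 are invisible).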


Results

• We train a number of models for various keywords

• On average we achieve:
– A precision of ~0.0005%: false trigger every 3 (1-hour meetings)
– A recall of ~90%: 1 of 10 voice commands missed

• The results vary dramatically based on environments

• Our online training runs continuously

• Please visit: http://voicera.com to sign up and use


Performance over time

[Figure: recall and precision over time]