Enterprise Voice AI
Exoskeletons, not Robots
• Four attributes of exoskeletons
– Enhance, not replace
– Collective intelligence
– Fits your workflow
– Human-in-the-loop (optional)
The most adopted form of enterprise collaboration
[Figure: bubble chart of collaboration channels (Meetings (Voice), IM, Enterprise Apps). Axes: employee time spent and information generation, each from low to high. Size of bubble represents activation opportunity. Annotation: "lacks activation".]
$1,000: avg. labor cost per 60-minute meeting
$75M: Fortune 50 company wasted labor
9B: US meetings per year (100B globally)
37%: time spent in meetings
voicera = voice collaboration
• Connect what you say with what you do
• meet eva, your in-meeting AI assistant that takes notes
step 1: call or invite [email protected] to your meetings
step 2: interact through voice cues or “taps”
step 3: review email and share through Voicera
secular trends
[Figure labels: Enterprise, Voice Collaboration, Consumer]
“Gartner predicts that by 2020, 60% of meetings with three or more participants will involve a virtual assistant.”
[Figure labels: Meeting Threads; Agenda, Actions, Decisions, Artifacts, Feedback]
horizontal use: collaborate and share information with clarity…
conversations inbox
Post Meeting Inbox View
A different type of competitive advantage
• Oracle Data Cloud & Classic Data Network Effects
• AI can create a compounding competitive advantage*
[Figure: network effect loop. Labels: More Data Sellers, More Data Buyers, Better Monetization.]
…but producing this type of advantage isn’t business as usual.
[Figure: compounding advantage loop. Labels: Better Experience, More interaction data, Better algorithmic results, Deeper preferences learned.]
*GGVC term
Building the data pipeline
• Bootstrap through acquiring data & labels
• Generate production data
• Process for accurate, continuous labels (e.g. FP, TP, and FN)
• Compress learning cycles w/ model automation:
– Creation
– Judgement
– Parameter tuning/learning
– Deployment
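The labeling step above (FP, TP, and FN) can be sketched as follows. This is a minimal illustration, not Voicera's pipeline: the function name `label_detections`, the `Counts` class, the 0.5-second matching tolerance, and the greedy matching rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int = 0  # detections that match a labeled keyword occurrence
    fp: int = 0  # detections with no matching occurrence (false triggers)
    fn: int = 0  # labeled occurrences that no detection matched (misses)

    @property
    def precision(self) -> float:
        fired = self.tp + self.fp
        return self.tp / fired if fired else 0.0

    @property
    def recall(self) -> float:
        spoken = self.tp + self.fn
        return self.tp / spoken if spoken else 0.0


def label_detections(detections, references, tolerance=0.5):
    """Greedily match detection times to reference times (both in seconds).

    Hypothetical helper: a detection counts as a TP if an unmatched labeled
    occurrence lies within `tolerance` seconds, otherwise as an FP.
    """
    counts = Counts()
    unmatched = sorted(references)
    for t in sorted(detections):
        # Closest still-unmatched reference inside the tolerance window, if any.
        match = min(
            (r for r in unmatched if abs(r - t) <= tolerance),
            key=lambda r: abs(r - t),
            default=None,
        )
        if match is None:
            counts.fp += 1
        else:
            counts.tp += 1
            unmatched.remove(match)
    # Whatever was spoken but never matched is a miss.
    counts.fn = len(unmatched)
    return counts
```

For example, `label_detections([1.0, 5.2, 9.0], [1.1, 5.0, 12.0])` yields two TPs, one FP (the detection at 9.0 s), and one FN (the occurrence at 12.0 s). Note that the FN count depends on having reference labels for keywords the detector never fired on, which is the harder part to collect.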
Example: Key Word Spotting
• Goal: any utterance in which the keyword is spoken should receive a higher confidence than any utterance in which the keyword is not spoken
• The most common measure to evaluate keyword spotters is AUC (Area Under Precision & Recall Curve)
• Alternatively, we also use Recall @ Near 100% Precision
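Both measures can be computed from per-utterance confidence scores and ground-truth labels. The sketch below is an assumption-laden illustration: the function names are hypothetical, the area is estimated with the step-wise "average precision" sum (one common estimator, not necessarily the one used here), and labels are assumed to be 0/1 integers with at least one positive.

```python
def pr_points(scores, labels):
    """Return (precision, recall) at each threshold, sweeping from the highest score down."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)  # assumes at least one keyword utterance
    points, tp, fp = [], 0, 0
    for i, (score, is_keyword) in enumerate(ranked):
        tp += is_keyword
        fp += 1 - is_keyword
        # Emit a point only once all utterances tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / (tp + fp), tp / total_pos))
    return points


def average_precision(points):
    """Step-wise area under the precision-recall curve."""
    area, prev_recall = 0.0, 0.0
    for precision, recall in points:
        area += precision * (recall - prev_recall)
        prev_recall = recall
    return area


def recall_at_precision(points, min_precision=0.99):
    """Highest recall among thresholds whose precision is at least min_precision."""
    return max((r for p, r in points if p >= min_precision), default=0.0)


# Made-up scores: 1 = keyword spoken, 0 = keyword not spoken.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1, 0, 0]
points = pr_points(scores, labels)
```

When every keyword utterance outscores every non-keyword utterance (the goal stated above), the area is 1.0. In the made-up data, one non-keyword utterance outranks a keyword utterance, so the area drops below 1.0 and recall at near-100% precision is only 2 of 3.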
Technical Challenges
• Telephony is the least common denominator
– 8 kHz sampling rate
• A wide variety of microphones & meeting environments
• High Social Cost of False Triggers
• Online Decoding: Very Fast & Small footprint
• Handle different accents and pronunciations
Avoiding Judgement Errors: Survivor Bias Example
• A KWS produces FPs and TPs, and misses FNs
• FPs & TPs are easily labeled (FNs are harder)
• Survivor bias misjudges the performance of the next candidate
• On FPs, the evaluation is biased in favor of the new algorithm
– because it won’t generate the same false positives
– but it would generate its own false positives
• On FNs, the evaluation is biased against the new algorithm
– because it is judged against the existing TPs and fails on more than 0% of them
– but it could accurately identify the previous algorithm’s FNs
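The effect can be reproduced on a toy data set. Everything below is made up for illustration: the ten utterances, the two hypothetical detectors ("old" and "new"), and the helper `precision_recall`. The only assumption carried over from the slide is that utterances get logged and labeled only when the deployed (old) detector fires.

```python
# Each utterance: (is_keyword, old_fires, new_fires). Values are invented.
utterances = [
    (True,  True,  True),   # old TP, new also catches it
    (True,  True,  True),
    (True,  True,  False),  # old TP that the new model misses
    (True,  False, True),   # old FN that the new model recovers
    (True,  False, True),
    (False, True,  False),  # old FP that the new model avoids
    (False, True,  False),
    (False, False, True),   # new model's own FP, never logged by the old model
    (False, False, False),
    (False, False, False),
]


def precision_recall(rows, fires_index):
    """Precision and recall of the detector stored at column `fires_index`."""
    tp = sum(1 for r in rows if r[0] and r[fires_index])
    fp = sum(1 for r in rows if not r[0] and r[fires_index])
    fn = sum(1 for r in rows if r[0] and not r[fires_index])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Only utterances on which the old model fired "survive" into the labeled set.
survivors = [r for r in utterances if r[1]]

old_logged = precision_recall(survivors, 1)   # old model on its own logs
new_logged = precision_recall(survivors, 2)   # new model judged on the old model's logs
old_full = precision_recall(utterances, 1)    # old model on all utterances
new_full = precision_recall(utterances, 2)    # new model on all utterances
```

On the survivor set the new detector shows perfect precision (its own false positive was never logged) but lower recall than the old one (it is scored only against the old TPs). On the full set, with these invented numbers, it is better than the old detector on both measures, which is the misjudgement the slide warns about.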
Results
• We train a number of models for various keywords
• On average we achieve:
– A precision of ~0.0005%: one false trigger every three 1-hour meetings
– A recall of ~90%: 1 of 10 voice commands missed
• The results vary dramatically based on environments
• Our models are trained online, continuously
• Please visit http://voicera.com to sign up and use it
Performance over time
[Figure: recall and precision plotted over time]