Behrooz Chitsaz, Microsoft Research
Lorrie Apple Johnson, U.S. Department of Energy
Multimedia Research
• Speech search
• Face identification
• Object recognition
• Video browsing
• Semantic extraction
• (3D) Segmentation
• (3D) Image search
Speech Applications
• Speech as interface
• Speech as 1st class content
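Speech recognition
• Example utterance: “Hello World”
• Spectral analysis: produces the observation sequence $o_1 \ldots o_T$
• Matching (decoding): time alignment, most likely hypothesis
– $\hat{W} = \arg\max_{w_1 \ldots w_N} p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$
• Knowledge sources used by the decoder
– Acoustic models: $p(o_1 \ldots o_T \mid \text{phoneme})$
– Dictionary: $P(\text{phonemes} \mid w)$
– Grammar (language model): $P(w_1 \ldots w_N)$
• Output: the recognized word sequence $\hat{W} = (\hat{w}_1 \ldots \hat{w}_N)$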
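The decoding rule on this slide can be illustrated with a toy example. The sketch below is not the MAVIS recognizer: the three hypotheses and their log scores are invented, and a real decoder searches a far larger space. It only shows how the acoustic term and the language-model term combine, working in log space so the product becomes a sum.

```python
# Hypothetical toy scores, invented for illustration only.
# ACOUSTIC_LOGLIK[W] stands in for log p(o_1..o_T | W);
# LANGUAGE_LOGPROB[W] stands in for log P(W).
ACOUSTIC_LOGLIK = {
    ("hello", "world"): -12.0,
    ("hello", "word"): -11.5,
    ("yellow", "world"): -13.0,
}
LANGUAGE_LOGPROB = {
    ("hello", "world"): -2.0,
    ("hello", "word"): -6.0,
    ("yellow", "world"): -7.0,
}


def decode(acoustic, language):
    """Return the hypothesis W maximizing log p(O|W) + log P(W)."""
    return max(acoustic, key=lambda w: acoustic[w] + language[w])


best = decode(ACOUSTIC_LOGLIK, LANGUAGE_LOGPROB)
print(" ".join(best))
```

With these made-up numbers the acoustic score alone slightly favors "hello word", and the language model shifts the decision to "hello world".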
MAVIS technology
• Indexing automatic transcripts as text
– Automatic transcription accuracy is only 50-80%
• MAVIS techniques
– Word-level lattice indexing
    • Index word alternatives: robust to recognizer errors
        • 50-140% accuracy improvement
    • Index timing: navigate to exact point in video
– Vocabulary adaptation
    • Use NLP and Bing Search to expand word dictionary
– Automatic keywords to expose to search engines
    • Enables discovery of speech content through search engines
    • By-product of vocabulary adaptation
– See http://research.microsoft.com/mavis
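A minimal sketch of the idea behind word-level lattice indexing follows. It is an assumption-laden illustration, not the MAVIS data format: the lattice layout, the posterior values, and the names `build_index`, `search`, and `one_best` are all hypothetical. It shows why indexing alternatives with their timing can recover a spoken term that the single best transcript misses.

```python
from collections import defaultdict

# Hypothetical lattice: each time slot lists competing word alternatives
# as (word, posterior). All values are invented for illustration.
LATTICE = [
    {"start": 12.0, "alternatives": [("nuclear", 0.55), ("new", 0.30), ("clear", 0.15)]},
    {"start": 12.6, "alternatives": [("fusion", 0.48), ("vision", 0.52)]},
]


def build_index(video_id, lattice):
    """Index every alternative, keeping its start time and posterior."""
    index = defaultdict(list)
    for slot in lattice:
        for word, posterior in slot["alternatives"]:
            index[word].append((video_id, slot["start"], posterior))
    return index


def search(index, term):
    """Return (video, start time in seconds, posterior) hits, best first."""
    return sorted(index.get(term, []), key=lambda hit: -hit[2])


def one_best(lattice):
    """The plain transcript: only the top-scoring word in each slot."""
    return [max(slot["alternatives"], key=lambda a: a[1])[0] for slot in lattice]


index = build_index("video-1", LATTICE)
print(one_best(LATTICE))
print(search(index, "fusion"))
```

In this toy lattice the plain transcript reads "nuclear vision", so a text search for "fusion" would fail, while the lattice index still returns a hit together with the time offset to jump to.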
MAVIS Architecture
[Diagram: SQL Server(s) and Web server(s), with numbered steps]
1. Submit audio/video RSS
2. Retrieve AIB
3. Import AIB in SQL
4. Search/retrieve results
• Store content to be processed in temporary Azure storage
• Do vocabulary adaptation using Bing
• Run recognition engine on content
• Store results of recognition process (AIB)
U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission
• DOE invests > $10 billion/year in basic sciences, clean energy technology, and nuclear research.
• The immediate output from this investment is Information… Knowledge… R&D results
• OSTI’s mission is to accelerate scientific progress by accelerating access to this information.
OSTI’s Core Products
• Information Bridge
• Science Accelerator
• Science.gov
WorldWideScience.org
Emerging Forms of Scientific Information Require New Tools
• Numeric data, multimedia, and social media are emerging forms of scientific information
• Each form presents special opportunities and challenges
Search and Retrieval Challenges with Multimedia Science Information
• Lack of written transcripts, i.e. no “full text” to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often up to an hour or more
OSTI and Microsoft Research Partnership
• Video files collected from DOE’s National Laboratories
• RSS feeds with metadata and URLs sent to Microsoft Research
• Audio indexing performed via MAVIS
• Audio index blob (AIB) returned to OSTI and integrated with SQL servers
• Users can search for a precise term within the video, and be directed to the exact point in the video where the term was spoken
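As an illustration of the RSS hand-off in this workflow, the sketch below reads item metadata and media URLs from a feed using Python's standard library. The feed content, the `example.org` URL, and the `read_items` helper are hypothetical, and the use of RSS 2.0 `enclosure` elements to carry the video URL is an assumption; the slides do not describe the actual feed layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for illustration. It assumes each video is
# published as an RSS 2.0 <item> whose <enclosure> carries the media URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example lab videos</title>
    <item>
      <title>Example lecture</title>
      <description>Placeholder abstract</description>
      <enclosure url="https://example.org/videos/lecture.mp4"
                 type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Extract title, description, and media URL for each feed item."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "url": enclosure.get("url") if enclosure is not None else None,
        })
    return items


for entry in read_items(SAMPLE_FEED):
    print(entry["title"], entry["url"])
```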
Looking to the Future
• Additional content from DOE researchers
• Integration of multimedia searches into WorldWideScience.org by June
• High quality automatic closed captions
• Multilingual translation capabilities