SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks Brandon Muramatsu [email protected] Andrew McKinney [email protected] Peter Wilkins [email protected] MIT, Office of Educational Innovation and Technology Citation: Muramatsu, B., McKinney, A., Wilkins, P. (2010). SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks. Presented at NERCOMP 2010: Providence, Rhode Island, March 9, 2010. Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

description

Need to find a specific segment in an hour-long web video, webcast or podcast of a lecture? Want to read a transcript of that lecture? Want to bookmark, annotate, or discuss video or audio clips from an entire lecture? The SpokenMedia project at MIT is developing a web-based service to enable automatic lecture transcription. The project is also developing a suite of tools and services to improve interaction with webcasts and podcasts, enabling students and faculty to create rich media notebooks to support their learning and teaching. Presented by Brandon Muramatsu, Andrew McKinney and Peter Wilkins at NERCOMP 2010, Providence, Rhode Island, March 9, 2010.

Transcript of SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Page 1: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Brandon Muramatsu [email protected]

Andrew McKinney [email protected]

Peter Wilkins [email protected]

MIT, Office of Educational Innovation and Technology

Citation: Muramatsu, B., McKinney, A., Wilkins, P. (2010). SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks. Presented at NERCOMP 2010: Providence, Rhode Island, March 9, 2010.

Unless otherwise specified, this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License

Page 2: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

BREAKING NEWS… YouTube announces captions on all videos… News at 11…

…we now return you to your regularly scheduled presentation…

YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website: http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html

SpokenMedia: What to do if your videos aren’t in YouTube

Page 3: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Unless otherwise specified this work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License (creativecommons.org/licenses/by-nc-sa/3.0/us/)

Why are we doing this?

• More & more videos on the Web
  – Universities recording course lectures
  – Students relying upon Web video for courses

MIT OCW 8.01: Professor Lewin puts his life on the line in Lecture 11 by demonstrating his faith in the Conservation of Mechanical Energy.

Page 4: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What video? Where?

iTunes U

Page 5: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What are the challenges?

• Search
  – Volume
  – Segmented by Web, Video
  – Text title and Description

Google Search for “angular momentum”, performed April 2009

Page 6: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What about Bing?

Bing Search for “angular momentum”, performed August 2009

Page 7: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What are the Challenges?

• Description
  – Course and Lecture Title
  – Summary
  – Metadata?

YouTube, MIT OCW Physics 8.01 - Lecture 20. Retrieved August 2009

webcast.berkeley, Physics 8A, 002, Spring 2009. Retrieved August 2009

Page 8: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What are the challenges? Use

• Interaction & Use
  – Transcripts / captions
    • Do they exist?
    • Cost?
  – Full video vs. segments

Lewin, W. (1999). Lec 20 | 8.01 Physics I: Classical Mechanics, Fall 1999. Retrieved August 1, 2009 from YouTube Website: http://www.youtube.com/watch?v=ibePFvo22x4

“GOD!!!51 MINUTES!!i think i'll pass..”

– slourdas, YouTube

Page 9: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Search thru the Static

We’re living in a video world…but only have text to use for search…

flickr @futureatlas.com

Page 10: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Why do we need these tools?

• Improve search and retrieval
• Improve user experience
• Captioning for accessibility?
• Facilitate translation?
• Other uses?

Page 11: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

YouTube Announcement

YouTube. (2010, March 4). The Future Will be Captioned: Improving Accessibility on YouTube. Retrieved on March 8, 2010 from YouTube Website: http://youtube-global.blogspot.com/2010/03/future-will-be-captioned-improving.html

Page 12: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Comparing SpokenMedia and YouTube Auto-Caption?

YouTube
• Scale ✔
• Research-basis ✔
• For all videos ✔ (soon)
• No transcript/caption export (?)
• YouTube hosted
• Accuracy based on general patterns (?)
• No transcript editing (?)

SpokenMedia
• Limited
• Research-basis ✔
• Service by request
• Transcript/caption export available ✔
• Hosted anywhere ✔
• Accuracy based on custom models ✔ (soon)
• Transcript editing ✔ (soon)

Page 13: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Developing SpokenMedia…

• What do we have at MIT?
  – Existing videos & audio, new video
  – Lecture notes, slides, etc. (descriptive text)
  – Multiple videos/audio by same lecturer
  – Diverse topics/disciplines
• Research from Spoken Language Systems Group!!!

Page 14: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Enabling Research

• Spoken Lecture: research project
• Speech recognition & automated transcription of lectures
• Why lectures?
  – Conversational, spontaneous, starts/stops
  – Different from broadcast news, other types of speech recognition
  – Specialized vocabularies

James Glass [email protected]

Page 15: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Spoken Lecture Project

• Processor, browser, workflow
• Prototyped with lecture & seminar video
  – MIT OCW (~300 hours, lectures)
  – MIT World (~80 hours, seminar speakers)

Supported with iCampus MIT/Microsoft Alliance funding

James Glass [email protected]

Page 16: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Tech Transfer Timeline: Research -> Service

[Timeline figure: axis from 1990 to 2010 with 2000, 2006 and 2009 marked; labeled “Spoken Language Systems Group Research”]

Page 17: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Let’s see a demo!

Page 18: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Demo

Page 19: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

How Does it Work?

Lecture Transcription Workflow

Page 20: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Recognizer Accuracy? ~85%

• Accuracy
  – Domain Model and Acoustic Model
  – Internal validity measure
  – Single 100% accurate transcript for a full course

Ongoing research by Jim Glass and his team
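
The slides do not say how the ~85% figure is computed. Recognizer accuracy is commonly reported as one minus the word error rate (WER) against a hand-corrected reference transcript, so the following is only a minimal sketch under that assumption. The function name `word_error_rate` and the reference sentence are illustrative, not taken from the SpokenMedia toolset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Assumed reference wording (hypothetical); the hypothesis is the
# recognizer output quoted on the "Transcript Errors" slide.
reference = "introduce both torque and angular momentum"
hypothesis = "introduce both fork an angular momentum"
wer = word_error_rate(reference, hypothesis)
accuracy = 1.0 - wer
print(f"WER = {wer:.2f}, accuracy = {accuracy:.2f}")
```

A single hand-corrected transcript, as mentioned on the slide, would serve as the `reference` when scoring the other lectures in a course.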

Page 21: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

What works today?

Lecture Transcription Workflow

Page 22: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Transcript “Errors”

• “angular momentum and forks it’s extremely non intuitive”
  – “folks”?
  – “torques”?
• “introduce both fork an angular momentum”
  – “torque”!

Page 23: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

That’s what we have today…

• Features
  – Video linked transcripts
  – “Bouncing Ball” follow along
  – Search within a video
  – Multiple transcript language support
• Challenges
  – Accuracy (partial toolset)
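
The features listed above all depend on a transcript whose words carry timestamps. The slides do not describe SpokenMedia's actual data format, so the sketch below simply assumes a list of words with start times; `TimedWord`, `find_phrase`, and `word_at` are hypothetical names. It shows how "search within a video" and a "bouncing ball" highlight could both be derived from that one structure.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the video (assumed format)


def find_phrase(words: List[TimedWord], phrase: str) -> List[float]:
    """Return the start time of every occurrence of `phrase`."""
    target = phrase.lower().split()
    if not target:
        return []
    tokens = [w.text.lower() for w in words]
    return [
        words[i].start
        for i in range(len(tokens) - len(target) + 1)
        if tokens[i:i + len(target)] == target
    ]


def word_at(words: List[TimedWord], t: float) -> Optional[TimedWord]:
    """Return the word being spoken at playback time `t`, if any."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    return words[i] if i >= 0 else None


# Made-up example data, not taken from a real lecture transcript.
transcript = [
    TimedWord("today", 0.0), TimedWord("we", 0.4), TimedWord("introduce", 0.6),
    TimedWord("angular", 1.2), TimedWord("momentum", 1.7), TimedWord("and", 2.3),
    TimedWord("torque", 2.5),
]
hits = find_phrase(transcript, "angular momentum")  # seek targets for the player
current = word_at(transcript, 1.5)                  # word to highlight at 1.5 s
```

A player could seek to each time in `hits` for search results and call `word_at` on every time update to move the highlight.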

Page 24: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Where are we heading?

• Improved accuracy
• Automate and improve processing
• Search across multiple video transcripts
• Starting a lecture transcription service

Page 25: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Lecture Transcription Service

• Integrate with media production workflows
  – At MIT, University of Queensland
• Stand-alone service
  – Test with external content (video) producers

Page 26: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

A Lecture Transcription Service? Caveats

• Lecture-style content (technology optimized)
• Up to 85% accuracy
  – (good for search, not sure about accessibility)
• English-language audio
  – (need much more research for other languages)
• Processing hosted at MIT (current thinking)
  – Submit jobs via MIT-run service
  – Contribute audio, models, transcript for further research

Page 27: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Test it for yourself!

http://spokenmedia.mit.edu/

http://sm.mit.edu/upload

Page 28: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Toward Rich Media Notebooks: Improving the User Experience

• Innovative player interfaces (soon)
  – Bookmarking and annotation
  – Clip creation and authoring
• Transcript editing (soon)
• Searching across collections of videos

Page 29: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Player with Annotation Mockup

Page 30: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Editing Interfaces

Soon (we’re designing the editing interfaces right now)

Page 31: SpokenMedia: Automatic Lecture Transcription and Rich Media Notebooks

Thanks!

spokenmedia.mit.edu
