This site may only be a placeholder for a more permanent home.

About the Lecture Transcription Service

The MIT Office of Educational Innovation and Technology (OEIT) is exploring how to transform the existing Spoken Lecture Project research into a production-ready Lecture Transcription Service. The work described here builds upon the Spoken Lecture Project developed through the MIT/Microsoft iCampus Alliance by James Glass and his team at MIT.

When we look at lecture transcription as a service, the key aspects appear to be the following (see the sketch after the list):

  • queuing
  • domain model creation
  • speaker model creation (optional)
  • processing (audio disaggregation, speech recognition)
  • video/transcript linkage for display via a browser/viewer
  • closed captioning
  • final compile
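Below is a minimal sketch of these stages strung together as a single queued job, written in Python. The stage functions (build_domain_model, recognize, write_captions, and so on) are hypothetical stand-ins for the actual Spoken Lecture components, and the in-process queue stands in for whatever queuing mechanism a production service would actually use.

    from dataclasses import dataclass, field
    from queue import Queue
    from typing import List, Optional

    # Hypothetical stand-ins for the actual Spoken Lecture components.
    def build_domain_model(docs): return {"vocab": set()}
    def build_speaker_model(video_path): return "generic-speaker-model"
    def extract_audio(video_path): return b""
    def recognize(audio, domain_model, speaker_model): return []
    def write_browser_index(job): pass
    def write_captions(job): pass
    def package(job): pass

    @dataclass
    class TranscriptionJob:
        """One lecture video moving through the service."""
        video_path: str
        supplemental_docs: List[str] = field(default_factory=list)  # slides, papers, textbook indexes
        speaker_model: Optional[str] = None                         # reused when the speaker is known
        transcript: List[dict] = field(default_factory=list)        # timed transcript segments

    job_queue = Queue()                                             # 1. queuing

    def process(job):
        domain_model = build_domain_model(job.supplemental_docs)    # 2. domain model creation
        if job.speaker_model is None:
            job.speaker_model = build_speaker_model(job.video_path) # 3. speaker model (optional)
        audio = extract_audio(job.video_path)                       # 4. audio disaggregation ...
        job.transcript = recognize(audio, domain_model, job.speaker_model)  # ... and speech recognition
        write_browser_index(job)                                    # 5. video/transcript linkage
        write_captions(job)                                         # 6. closed captioning
        package(job)                                                # 7. final compile

    job_queue.put(TranscriptionJob("lecture01.rm", ["lecture01_slides.txt"]))
    while not job_queue.empty():
        process(job_queue.get())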

In a generic video production/webcast workflow, the lecture transcription service should probably occur between capture and transcoding to multiple formats. It could occur after transcoding, but then multiple versions of a given video would likely need to be processed.

Caveat: This service should not be viewed as a solution for the 100% accurate automated transcription that a university or cultural organization might require.

Project Docs

Project Plan March 2009
Use Cases
Podcast System (MIT)

Meeting Notes: LTS ConfCall 040109-040209

Spoken Lecture Project

The Spoken Lecture Project processes video to create a transcript and/or segmented video, and links the two through a project-developed browser. Through the Spoken Lecture browser one can search for a term or phrase, select a video to watch, view the transcript, and control the video by selecting within the transcript. The current browser implementation uses RealVideo. The current technology resides on and runs from Jim Glass's research cluster in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).
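A rough sketch of the linkage the browser depends on: assuming each transcript segment carries start and end times, a term search can return the matching segments and selecting a transcript line can seek the player to that offset. The segment format and field names below are illustrative, not the project's actual data model.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Segment:
        start: float   # seconds from the start of the video
        end: float
        text: str

    # Illustrative segments; in practice these come out of the recognition step.
    segments: List[Segment] = [
        Segment(0.0, 7.5, "today we will derive the wave equation"),
        Segment(7.5, 15.2, "starting from newton's second law applied to a string"),
    ]

    def search(term: str, segs: List[Segment]) -> List[Segment]:
        """Return the segments whose text contains the term (case-insensitive)."""
        term = term.lower()
        return [s for s in segs if term in s.text.lower()]

    def seek_offset(segment: Segment) -> float:
        """Clicking a transcript line in the browser would seek the player here."""
        return segment.start

    for hit in search("wave equation", segments):
        print(f"play from {seek_offset(hit):.1f}s: {hit.text}")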

The technology is tuned to "lectures": the typical speaker format is important (or at least appears to be), and the vocabularies used in many lectures do not match the existing, common word dictionaries used by speech recognition programs (e.g., those built from Broadcast News).

In addition to the input video, the technology requires any number of documents (e.g., existing transcripts, text from slides, research papers, indexes from textbooks) that are used to create a new domain model or to augment an existing one. The domain model is critical for speech recognition, transcript development, and ultimately video segmentation for search and retrieval. (Research has shown poor overlap between existing speech recognition dictionaries and the terms used in typical science and engineering academic lectures.) The technology can also be used to create a speaker model.
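The sketch below illustrates why those documents matter, using a plain word list as a stand-in for the recognizer's dictionary: the terms that appear in the lecture materials but not in the base dictionary are exactly the ones a domain model needs to add. The file names and tokenization are placeholders, not the project's actual inputs.

    import re
    from pathlib import Path
    from typing import Iterable, Set

    def tokenize(text: str) -> Set[str]:
        """Lower-cased word tokens; a real system would also handle acronyms, formulas, etc."""
        return set(re.findall(r"[a-z']+", text.lower()))

    def domain_vocabulary(doc_paths: Iterable[str]) -> Set[str]:
        """Union of the terms found in slides, transcripts, papers, textbook indexes, ..."""
        vocab = set()
        for path in doc_paths:
            vocab |= tokenize(Path(path).read_text(errors="ignore"))
        return vocab

    # File names are placeholders: a generic recognizer word list and two lecture documents.
    base_dictionary = tokenize(Path("base_dictionary.txt").read_text(errors="ignore"))
    lecture_vocab = domain_vocabulary(["lecture01_slides.txt", "textbook_index.txt"])

    # The out-of-vocabulary terms are exactly what the domain model needs to add.
    oov = lecture_vocab - base_dictionary
    print(f"{len(oov)} lecture terms are missing from the base dictionary")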

Collaborators

CSAIL: James (Jim) Glass, Scott Cyphers

OEIT: Brandon Muramatsu, Andrew McKinney, Peter Wilkins

University of Queensland: Phil Long, John Zornig

Contacts/Clients

This is a list of potential partners and/or clients.

MIT

  • OpenCourseWare (Cec d'Oliveira, Kate James)
  • AMPS (Larry Gallagher)
  • Physics Lectures (Peter Dourmashkin)
  • Shakespeare (ask Peter/Andrew)

University of Queensland

  • Medical Patient Examinations
  • Robyn Williams Interviews (science-related, Australian Broadcasting Corporation)

External

  • OpenCast/Matterhorn (Mara Hancock)
  • UC Berkeley ETS (Mara Hancock)
  • Cal State University System (Gerry Hanley)
  • Cal State Sacramento (JP Bayard)
  • Johns Hopkins School of Public Health (Sukon Kanchanaraksa)
  • Yale OCW (Jeff Levick)
  • Stanford (???)
  • Japan OCW (ask Vijay)
  • VideoLectures.net

