PURPOSE
Recognition of Hands-Free Speech and Hand Pointing Action for Conversational TV
Yasuo Ariki, Tetsuya Takiguchi and Atsushi Sako
Department of Computer and System Engineering, Kobe University
A conversational TV that users can ask for information about TV content.
HANDS-FREE SPEECH RECOGNITION
Multi-modal interaction with hands-free speech recognition and hand pointing recognition.
Multimedia analysis of broadcast content.
Context awareness to understand user intention.
What is the meaning of the word BROAD-BAND?
What is that?
Multimedia database
Internet
Presentation
Transform
Integration
Analysis
Retrieval
Retrieval
Presentation
Modality recognition: hands-free speech, speaker direction, hand pointing action
Front end processing
Metadata extraction: news, drama, soccer, baseball
Back end processing
Show me the goal
Show me the latest event
Who is he?
CONVERSATIONAL TV
Observed signal
Beam forming
Hands-free speech input
Speaker direction estimation
User utterance section detection
Acoustic model adaptation
Speech recognition
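The poster does not say how the speaker direction is estimated. As a minimal sketch, assuming a two-microphone setup and a time-difference-of-arrival approach (both assumptions, not taken from the poster), the delay between the two observed signals can be found by cross-correlation and converted to an arrival angle. The function names, the microphone spacing and the sample rate below are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A positive lag means `other` received the sound later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_from_delay(lag, sample_rate, mic_distance):
    """Convert a sample lag into an arrival angle in degrees (0 = straight ahead)."""
    ratio = SPEED_OF_SOUND * (lag / sample_rate) / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # keep asin() within its domain
    return math.degrees(math.asin(ratio))


# Hypothetical example: the second microphone hears the same pulse 3 samples later.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=5)
angle = direction_from_delay(lag, sample_rate=16000, mic_distance=0.2)
```

The sign of the angle indicates which microphone the sound reached first. An estimated direction of this kind could then steer the beamforming stage toward the speaker.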
User recognition: who, how many
Watching history, personal profile, user context, speaking style
Enquiry, conversation, monologue
Hands-free utterance section
Speech/Noise GMM
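The utterance section detection with speech and noise GMMs can be illustrated roughly as follows. This is a sketch under stated assumptions: it uses a single feature per frame (log energy) and one-dimensional GMMs with hand-picked parameters, whereas the poster gives neither the features nor the model sizes. All names and parameter values are hypothetical.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D GMM.

    `components` is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for weight, mean, var in components:
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        total += weight * norm * math.exp(-((x - mean) ** 2) / (2.0 * var))
    return math.log(total + 1e-300)  # small floor avoids log(0)


def frame_log_energy(frame):
    """Log of the mean squared amplitude of one frame of samples."""
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(energy + 1e-10)


def detect_utterances(frames, speech_gmm, noise_gmm):
    """Return (start, end) frame index pairs, end exclusive, classified as speech."""
    sections, start = [], None
    for i, frame in enumerate(frames):
        x = frame_log_energy(frame)
        is_speech = gmm_log_likelihood(x, speech_gmm) > gmm_log_likelihood(x, noise_gmm)
        if is_speech and start is None:
            start = i  # an utterance section begins
        elif not is_speech and start is not None:
            sections.append((start, i))  # the section ended at the previous frame
            start = None
    if start is not None:
        sections.append((start, len(frames)))
    return sections


# Hypothetical model parameters, chosen by hand for illustration only.
SPEECH_GMM = [(0.5, -2.0, 1.0), (0.5, 0.0, 1.0)]
NOISE_GMM = [(1.0, -9.0, 1.0)]
```

In practice both models would be trained on labelled speech and noise data, and the frame decisions would usually be smoothed before sections are cut.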
Emotion recognition: satisfaction
Video editing
Preferred content retrieval
Summary and explanation
Context awareness: pronouns, abbreviations
User analysis
Content analysis, editing, retrieval
Recognition of intention
Action recognition: finger mouse
Recognition of requirement
Recognition of profile
Recognition of mind
Speech recognition
CONTEXT AWARENESS
Skin color region extraction and noise reduction from camera images
Two-dimensional coordinate estimation of the fingertip and head in the images
Three-dimensional coordinate estimation of the fingertip and head in the real world by camera calibration
Estimation of the pointed coordinate on the screen
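The last step above can be sketched geometrically. The poster does not state how the pointed coordinate is derived from the two 3-D points; a common assumption, used here, is that the user points along the line from the head through the fingertip, and that the screen lies in the plane z = 0 of the calibrated world coordinate system with the user at z > 0. The function name and the coordinate convention are hypothetical.

```python
def pointed_screen_coordinate(head, fingertip):
    """Intersect the head-to-fingertip ray with the screen plane z = 0.

    `head` and `fingertip` are (x, y, z) world coordinates in metres.
    Returns (x, y) on the screen plane, or None if the ray never reaches it.
    """
    hx, hy, hz = head
    fx, fy, fz = fingertip
    dz = fz - hz
    if dz == 0:
        return None  # the ray runs parallel to the screen
    t = -hz / dz  # ray parameter at which z becomes 0
    if t <= 0:
        return None  # the user is pointing away from the screen
    return (hx + t * (fx - hx), hy + t * (fy - hy))


# Hypothetical example: head 2 m from the screen, fingertip 0.5 m in front of it.
point = pointed_screen_coordinate((0.0, 1.5, 2.0), (0.2, 1.4, 1.5))
```

The resulting plane coordinate would still need to be scaled to screen pixels and checked against the screen bounds before it can drive a "finger mouse" cursor.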
HAND POINTING RECOGNITION
DEMONSTRATION 1 DEMONSTRATION 2 DEMONSTRATION 3
Function: You can tell your television what you want to watch.
Example
Method:
Face extraction and recognition
User speech recognition
Estimation of what the user wants to see, and of whom
Presentation of explanation video
The same utterance, such as “Show me the grand slam”, yields different videos depending on which player is on screen.
Mr. Matsui is on.
Mr. Ichiro is on.
Grand slam of Mr. Matsui
Grand slam of Mr. Ichiro
Show me the grand slam
Function: You can watch sports in your preferred style.
Example
Method:
Video generation by digital camera work
Event recognition
User speech recognition
Estimation of what the user wants
Presentation of the corresponding video
The television understands even a request such as “Show me the previous goal”.
Scenes with events
Show me more individual plays
Show me the previous goal
Function: You can ask your television about things you do not know.
Example
What is it?
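“It” refers to the most recent important word.
The Tokyo District Court, pointing out that the issuance of stock acquisition rights by Nippon Broadcasting System is grossly unfair, issued a decision prohibiting the issuance.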
Speech recognition result
Method:
Announcer speech recognition
Important word extraction (TF-IDF)
User speech recognition
Estimation of the important word
Presentation of explanation video
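The important word extraction and the resolution of “it” can be sketched as follows. This is a minimal illustration, assuming the recognized announcer speech is already split into words, a plain TF-IDF definition (term frequency times log of inverse document frequency), and a fixed score threshold; none of these details are given in the poster, and the function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(target_doc, corpus):
    """Score each word of `target_doc` (a list of words) against `corpus`.

    `corpus` is a list of documents (each a list of words) and must include
    `target_doc`. Uses tf = count / len(doc) and idf = log(N / df).
    """
    counts = Counter(target_doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for doc in corpus if word in doc)  # documents containing the word
        scores[word] = (count / len(target_doc)) * math.log(n_docs / df)
    return scores


def last_important_word(target_doc, corpus, threshold):
    """Return the most recently spoken word whose TF-IDF score exceeds `threshold`."""
    scores = tf_idf(target_doc, corpus)
    for word in reversed(target_doc):
        if scores[word] > threshold:
            return word
    return None


# Hypothetical transcripts standing in for recognized announcer speech.
news = ["court", "bans", "the", "warrant", "issue", "by", "the", "broadcaster"]
others = [["the", "game", "ended", "by", "a", "goal"],
          ["the", "weather", "is", "fine"]]
corpus = [news] + others
referent = last_important_word(news, corpus, threshold=0.1)
```

Words that occur in every document, such as "the", receive a score of zero and are skipped, so a question like “What is it?” is mapped to the latest content word instead of the latest word overall.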
Scenes with the important words
The television understands even a question such as “What is it?”.