![Page 1: Audio Media Processing - sap.ist.i.kyoto-u.ac.jpsap.ist.i.kyoto-u.ac.jp/members/yoshii/lectures/... · Hello! 10 Signal processing + Machine learning + Application . Bayesian](https://reader034.fdocuments.us/reader034/viewer/2022050511/5f9c6cd9eecb277b9c10e4cc/html5/thumbnails/1.jpg)
1
Audio Media Processing
Graduate School of Informatics Kyoto University
Kazuyoshi Yoshii
2 Audio Media Processing
Statistical machine learning
Bayesian theory, deep neural network,
optimization
Symbol processing
Phrasal, syntactic, topical analysis
Signal processing
Separation, identification, dereverberation
Computational auditory model
3 Keyword: Listen to Sound
• Listen to “speech”
  ▪ Prince-Shotoku robot
    ◆ Simultaneous speech recognition
  ▪ Microphone-array processing
    ◆ Sound source separation and dereverberation
• Listen to “music”
  ▪ Music understanding and performance
    ◆ Sound source separation and music transcription
    ◆ Co-playing and accompaniment
• Listen to “environmental sounds”
  ▪ Object detection in a disaster environment
  ▪ Analysis of frog calling
4
Listen to Speech
5 Shotoku-Taishi (Prince Shotoku)
• A legendary figure said to have understood ten people speaking simultaneously
6 Simultaneous Speech Recognition
• Isolated word → Continuous speech
• Evaluation using three robots in a large room
Closeness between speakers
2002, 2003, 2005, 2006
7 Open-Source Software for Robot Audition HARK
8 Shotoku-Taishi Robot 2012
• Can many-eared robots go beyond humans?
16-channel microphone-array processing; sound directions are given
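When the sound directions are given, one of the simplest ways to exploit a microphone array is delay-and-sum beamforming. The sketch below is only an illustration under assumptions that do not come from the slides: the per-channel arrival delays are already known as whole numbers of samples, and the function name `delay_and_sum` is hypothetical. It is not claimed to be the method used in the Shotoku-Taishi robot or in HARK.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its known arrival delay (in samples) and average.

    channels: list of equal-rate sample lists, one per microphone.
    delays:   non-negative integer delay of the target source at each
              microphone relative to the earliest microphone (assumed known).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Number of output samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    # After shifting, the target adds coherently; sounds arriving from other
    # directions stay misaligned and are attenuated by the averaging.
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(n)
    ]
```

For example, if the second microphone hears the target two samples later than the first, `delay_and_sum([mic0, mic1], [0, 2])` returns the time-aligned average of the two channels.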
9 Microphone Array Processing
• Separate mixture signals into an unknown number of sound sources with unknown reverberation times
How are you?
Hello!
10 Bayesian Unified Formulation
Signal processing + Machine learning + Application
• A modern, principled approach to the conventional chicken-and-egg problem
  ▪ Cf. errors propagate in a cascade framework (localization → separation → dereverberation)
Source dereverberation
Source localization
Source separation
Nonparametric Bayesian model
Hot topic!
11 Localization + Separation + Dereverberation
With dereverberation
Without dereverberation
Two simultaneous utterances
12 Application to Real Environment • Separate overlapping utterances in a noisy environment
Clock-tower international hall Microphone array
Observed mixture signal
Separated signals
Utterances Background noise
13
Listen to Music
14 Music Co-playing with Humans
• Real-time score-position tracking
  ▪ Listen to the partner’s playing with the robot’s own ears
The robot can follow tempo changes made by the co-player
15 Lyric-to-Audio Synchronization • Efficient navigation to a section of interest
16 Music Understanding • Parts-based representation of music ▪ Combinations of “pitches” and “timbres”
Timbres (filters) Pitches (sources)
⊗Composition
Superposition
Music audio signals
17 Music Signal Decomposition • Timbre-based audio source separation
Reconstructed spectrogram
Observed spectrogram
Timbre weights
Estimated filters (timbres)
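The decomposition of an observed spectrogram into estimated filters and their weights can be illustrated with plain nonnegative matrix factorization (NMF). This is a minimal sketch under assumptions: the slides do not name NMF, the lecture's actual model is probably a richer source-filter formulation, and the function names (`nmf`, `frob_error`) and the Euclidean multiplicative updates are choices made here for illustration. Columns of `W` play the role of the spectral templates (timbres) and rows of `H` the role of their time-varying weights.

```python
import random


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius distance between V and the reconstruction W H."""
    R = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, R) for v, r in zip(vr, rr))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize a nonnegative matrix V (freq x time) as W (freq x rank) times H (rank x time)."""
    rng = random.Random(seed)
    n_freq, n_time = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(n_iter):
        # Weights update: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * n / (d + eps) for h, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # Templates update: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H
```

Because the updates only multiply nonnegative numbers, `W` and `H` stay nonnegative, which is what makes the result readable as additive parts. Calling `nmf(V, rank=2)` on a small magnitude spectrogram `V` returns the two factors, and `frob_error(V, W, H)` measures how closely the reconstructed spectrogram matches the observed one.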
18 Replacement of Drum Parts • Edit only drum parts in mixture signals
19 Replacement of Guitar Solo • Edit only a guitar part while preserving original timbres
20 Timbre and Effect Estimation • Preserve the timbres and reverberation of the original guitar solo
Guitar part (phrase)
Accompanying parts
Reverb.
・Audio signal ・Music score
Part separation and effect estimation
Score of new guitar solo
Synthesized guitar solo
Timbre
21 Songle: Active Music Listening Service
• We can enjoy automatically estimated, visualized, and sonified musical elements of songs on the Web
Global view
Zoom view
Structure
Chords
Melody
Beats
songle.jp
22 A New Form of Outreach Activity
• We can amplify user contributions by using machine-learning techniques
  ▪ Corrections by some users → Retraining → Accuracy improvement → Reward to all users
Beat correction
Melody correction
Chord correction
Structure correction
23
Listen to Environmental Sounds
24 Give Ears to Flying/Snake Robots
• Robot audition could help in disaster response
Nikkei Shimbun, March 24, 2013
25 Audition in Flying Robots • Use a microphone array for localization
26 Visualization of Frog Calling • Discriminate calling of two kinds of frogs
27 Visualization of Bird Singing
• Separation and localization in a park
Reiji Suzuki@Nagoya Univ.
28
Robot Audition
29 Why Robot Audition?
Conventional problem: We need to speak close to the microphones
Why? The microphones inevitably pick up noise along with the target utterances
Foo Bar
Hello ♪
Our approach: We aim to separate and recognize sounds
Sound Source Localization (SSL)
Sound Source Separation (SSS)
Computational Auditory Scene Analysis (CASA)
30 Robot Audition for MC Robots
• Robot MCs need to interact with multiple people
  ▪ Use the open-source robot audition software “HARK”, developed by Honda Research Institute Japan and Kyoto University
  ▪ HATTACK 25: Speech-based quiz game
A player's position can be identified by his or her voice
Inspired by a well-known quiz game in Japan
31 Demo: HATTACK25
• Players can barge in while the robot is reading a question
  ▪ To answer, players have to say “yes!” first
  ▪ Impossible for standard dialogue systems
Panel Players
32 Robot Audition for Flying Robots
• Localize sound sources on the ground by using flying robots with microphones
Sound Source
Sound Source Map
Mic. Array
Estimated Sound Source Location
Robot audition is disturbed by self-generated noise
1. Learn the self-generated noise (with a Gaussian process)
2. Suppress the noise in the input
Video of the flying robot
33 Robot Audition for Rescue Robot
• Estimate robot shapes and detect sound sources in collapsed buildings
How to estimate microphone positions on the robot? Statistical signal processing techniques
Length: 3-8 m, width: 3-5 cm
Moves forward by self-locomotion
Moving hose-shaped robot
Time Difference of Arrival
HELP!
We designed a state-space model of the robot posture and estimated the posture by measuring the TDOA of sounds
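The time difference of arrival (TDOA) between two microphones can be measured by finding the lag that maximizes their cross-correlation. The following is a minimal sketch under assumptions that are not in the slides: a brute-force search over integer lags, a far-field source, and a speed of sound of 343 m/s. The function names are hypothetical, and the state-space posture model itself is not reproduced here.

```python
import math


def estimate_tdoa(x, y, max_lag, fs):
    """Return the delay of signal y relative to x in seconds.

    A positive value means the sound reached microphone y later than x.
    x, y:    sample lists recorded at the same sampling rate fs (Hz).
    max_lag: largest delay, in samples, that is searched in either direction.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of x[i] * y[i + lag].
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


def tdoa_to_angle(tdoa, mic_distance, speed_of_sound=343.0):
    """Convert a TDOA to a far-field arrival angle in degrees (0 = broadside)."""
    ratio = speed_of_sound * tdoa / mic_distance
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))
```

With several microphones along the robot body, pairwise TDOAs like these would serve as the observations that constrain the microphone positions; how they enter the state-space model is beyond this sketch.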
34 Posture Estimation for a Hose-shaped Robot
• Accurately estimate shapes even when obstacles exist
Wall
Correct
Estimated
Robot (3.5 m)
35 Keyword: Listen to Sound
• Listen to “speech”
  ▪ Prince-Shotoku robot
    ◆ Simultaneous speech recognition
  ▪ Microphone-array processing
    ◆ Sound source separation and dereverberation
• Listen to “music”
  ▪ Music understanding and performance
    ◆ Sound source separation and music transcription
    ◆ Co-playing and accompaniment
• Listen to “environmental sounds”
  ▪ Object detection in a disaster environment
  ▪ Analysis of frog calling