ELEN 6820: Speech and Audio Processing
Prof. D. Ellis, Columbia University
Midterm Presentation
High Quality Music Metacompression Using Repeated-Segment Residuals
Asheesh Kashyap, Spring 2005

Music Compression
Music contains much self-similarity and repetition at various levels of detail (many repeated segments).
Redundancy can be removed by storing a single copy of a repeated segment and referencing it each time it recurs.
The technique can be used in conjunction with other audio compression methods, such as MP3 (hence "metacompression").
The concept has already been explored by Joseph Hazboun, “Detection of Audio Similarity for Redundancy Removal”, ELEN 6820, Spring 2004.
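The store-once, reference-many idea can be illustrated with a toy sketch. It assumes exact segment matches and made-up integer "samples", whereas the method discussed in these slides matches segments by correlation. The names `encode_with_references` and `decode` are hypothetical.

```python
def encode_with_references(segments):
    """Store each distinct segment once; replace every occurrence by an index."""
    store = []      # unique segments, in order of first appearance
    index_of = {}   # segment content -> position in `store`
    refs = []       # one reference per original segment
    for seg in segments:
        key = tuple(seg)
        if key not in index_of:
            index_of[key] = len(store)
            store.append(list(seg))
        refs.append(index_of[key])
    return store, refs


def decode(store, refs):
    """Rebuild the original segment sequence from the store and references."""
    return [list(store[i]) for i in refs]


# Toy "song": five segments, two of which repeat (illustrative data only).
song = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]
store, refs = encode_with_references(song)
```

Here three segments are stored instead of five, and `decode(store, refs)` returns the original sequence.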

Previous Work
Hazboun’s Method
Phase I: Divide the song into 1 s segments and correlate each segment with every other segment. Keep pairs with corr > 0.78.
Phase II: Group successive 1 s segments together.
Phase III: Find similar 256 ms pairs with corr > 0.82 (fine tuning).
Phase IV: Align segments using a 2 ms STFT correlation.
Phase V: Compare segments based on the sum of spectral energy over each frequency, and discard segments with similarity < 0.995.*
Phase VI: Remove overlapping segments and define new, longer similar segments (new start and end points).
Phase VII: Encode the audio stream by removing redundant segments.
* In Hazboun’s example, an identical tune with different lyrics has a correlation of 0.968.
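A minimal sketch of Phase I follows. It assumes "correlate" means the zero-lag Pearson correlation of raw samples, which the slides do not specify. The function name `phase1_candidates` and the toy signal are hypothetical and are not taken from Hazboun's code.

```python
import numpy as np


def phase1_candidates(signal, fs, seg_seconds=1.0, threshold=0.78):
    """Return (i, j, corr) for segment pairs whose correlation exceeds threshold."""
    seg_len = int(round(seg_seconds * fs))
    n_segs = len(signal) // seg_len
    if n_segs < 2:
        return []
    # Drop the incomplete tail and put one segment in each row.
    segs = np.asarray(signal[: n_segs * seg_len], dtype=float)
    segs = segs.reshape(n_segs, seg_len)
    corr = np.corrcoef(segs)  # n_segs x n_segs matrix of Pearson correlations
    pairs = []
    for i in range(n_segs):
        for j in range(i + 1, n_segs):
            if corr[i, j] > threshold:
                pairs.append((i, j, float(corr[i, j])))
    return pairs


# Toy signal at a made-up rate of 100 Hz: segment 2 is a noisy copy of segment 0.
rng = np.random.default_rng(0)
a = rng.standard_normal(100)
b = rng.standard_normal(100)
signal = np.concatenate([a, b, a + 0.1 * rng.standard_normal(100)])
pairs = phase1_candidates(signal, fs=100)
```

Comparing every segment with every other is quadratic in the number of segments, which is presumably why coarse 1 s segments are used first and finer 256 ms comparisons are left to Phase III.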

Extensions to Previous Work
Current methods, such as Hazboun’s, apply a simple replacement scheme for repeated segments.
This imposes a high standard for audio similarity (corr > 0.995): audibly dissimilar segments are removed from consideration (conservative).
Extension A: Relax the similarity constraint by storing residuals (the error difference between reference and repeated segments).
Extension B: Separate music and voice components (music has more self-similarity).
Validate performance using two samples from contemporary, techno, and classical music.

Extension A: Residuals
Residuals: the error difference between reference and repeated segments.
[Figure: reference − repeated = residual]
Transmitting residuals allows more precise reconstruction of the original waveform (higher quality) and relaxes the audio similarity constraint.
Residuals should compress well, as they contain much less information than the original signal (lower amplitude and/or fewer components). This is also the basis of video compression.
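The residual relation (residual = reference − repeated) can be sketched as below, assuming already aligned, equal-length 16-bit PCM segments. The helper names `make_residual` and `reconstruct` and the sample values are hypothetical.

```python
import numpy as np


def make_residual(reference, repeated):
    """residual = reference - repeated, in a wider dtype to avoid int16 overflow."""
    return reference.astype(np.int32) - repeated.astype(np.int32)


def reconstruct(reference, residual):
    """Recover the repeated segment: repeated = reference - residual."""
    return (reference.astype(np.int32) - residual).astype(np.int16)


# Illustrative samples only; the last two pairs exercise the int16 extremes.
reference = np.array([1000, -2000, 32767, -32768], dtype=np.int16)
repeated = np.array([990, -1990, -32768, 32767], dtype=np.int16)
residual = make_residual(reference, repeated)
restored = reconstruct(reference, residual)
```

Before any lossy coding of the residual, reconstruction is exact. Once the residual itself is compressed with a lossy codec, the reconstruction is only approximate.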

Extension A: Modification to Hazboun’s Code
Change Phase V to relax the similarity requirement from 0.995 down to 0.945 in 0.010 increments.
This should allow us to compress segments with similar music but different vocals.
Modify Phase VII to generate residuals for repeated segments instead of removing the segment.
Convert wave to MP3 and compare compression with the baseline (i.e., converting the unmodified song from wave to MP3).
Convert the MP3 back to wave files, decode, and compare SNR with the original decoded song.
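The threshold sweep and the SNR comparison might look like the following sketch. It assumes SNR is measured sample by sample, with the difference between the two time-aligned signals treated as noise. The slides do not state the exact definition, and `snr_db` is a hypothetical helper.

```python
import numpy as np

# Phase V thresholds to try: 0.995 down to 0.945 in steps of 0.010.
thresholds = [round(0.995 - 0.010 * k, 3) for k in range(6)]


def snr_db(original, decoded):
    """SNR in dB, treating (original - decoded) as the noise signal."""
    original = np.asarray(original, dtype=float)
    decoded = np.asarray(decoded, dtype=float)
    noise_power = np.sum((original - decoded) ** 2)
    if noise_power == 0.0:
        return float("inf")  # identical signals
    return float(10.0 * np.log10(np.sum(original ** 2) / noise_power))


# Toy check: a 10% amplitude error corresponds to an SNR of about 20 dB.
example_snr = snr_db([1, -1, 1, -1], [0.9, -0.9, 0.9, -0.9])
```

Each threshold would be run through the full pipeline, recording both the MP3 file size and the SNR so that compression can be traded off against quality.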

Extension B: Separating Voice from Music
Separating voice from music may result in improved compression.
Changed lyrics produce different formants, which can hamper our correlation/alignment.
Challenging part:
Separation of music and voice is an extremely difficult problem.
Compressing voice and music components separately requires two streams or files (compression needs to be much better).
Perfect separation is not required for our purposes (our goal is compression).
Correlation and alignment are performed on segments with voice removed, but encoding uses the original segments (the music component will be maximally compressed).

Extension B: Simple Algorithm
[Figure: spectrogram of a clip from U2, "Real Thing"; time axis 0 to 3.5, frequency axis 0 to 5000]
Formants are still visible in the presence of music.
Use cepstral analysis to find the maximum peak in the range 70-255 Hz (voice excitation pitch) for each time slice.
Build a filter bank that attenuates frequencies at the pitch harmonics.
Take the derivative across the spectrogram to minimize horizontal bands (musical notes).
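The cepstral pitch step can be sketched for a single time slice as follows. The sketch assumes a real cepstrum of a Hamming-windowed frame and uses a synthetic 100 Hz pulse train in place of real audio. The function name `cepstral_pitch`, the frame length, and the 8 kHz sample rate are illustrative choices, not values from the slides.

```python
import numpy as np


def cepstral_pitch(frame, fs, fmin=70.0, fmax=255.0):
    """Estimate the pitch of one frame from the peak of its real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag, n=len(frame))
    # Convert the pitch range to a quefrency (period) range in samples.
    q_lo = int(np.ceil(fs / fmax))   # shortest period searched
    q_hi = int(np.floor(fs / fmin))  # longest period searched
    q_peak = q_lo + int(np.argmax(cepstrum[q_lo : q_hi + 1]))
    return fs / q_peak


# Synthetic test frame: one pulse every 80 samples at 8 kHz, i.e. 100 Hz.
fs = 8000
frame = np.zeros(2048)
frame[::80] = 1.0
pitch = cepstral_pitch(frame, fs)
```

The estimated pitch for each time slice would then set the harmonic frequencies that the filter bank attenuates.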