Music Database Query by Audio Input Zvika Ben-Haim Advisor: Gal Ashour.

Purpose of the Project
Recorded melody → Software → Song name

Presentation Overview
• Demonstration
• Internals
• Results
• Conclusions

Program Demonstration

Inside the Program
Vocal Input → Pitch Detection and Volume Detection → Segmentation → Database Search → List of Best Matches

Definition of Input
The input is sung by a human, who does not need to have any knowledge of music.
The program was optimized for singing using the syllables “da-da-da” or “ti-ti-ti”. All testing was performed on this type of input.
Input
Pitch Detection
Segmentation
Search

Pitch Detection
The super-resolution pitch detection algorithm achieves accurate detection values without increasing CPU time, by performing linear interpolation on a low sampling rate recording.
Detection is performed in a pitch-synchronous fashion (one pitch value for each cycle).
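
The idea of refining a pitch estimate by interpolating a low-rate recording can be sketched as follows. This is not the super-resolution algorithm used in the project, only a minimal illustration under stated assumptions: the function names, the 80-500 Hz search range, the 0.95 peak threshold, and the 0.05-sample step are all hypothetical, and the sketch analyses a single frame rather than working pitch-synchronously.

```python
import math


def similarity(samples, frame_len, lag):
    """Normalized correlation between a frame and a copy delayed by `lag` samples.

    `lag` may be fractional: the delayed samples are linearly interpolated
    between their two nearest neighbours.
    """
    whole = int(math.floor(lag))
    frac = lag - whole
    dot = energy_a = energy_b = 0.0
    for i in range(frame_len):
        a = samples[i]
        b = (1.0 - frac) * samples[i + whole] + frac * samples[i + whole + 1]
        dot += a * b
        energy_a += a * a
        energy_b += b * b
    if energy_a == 0.0 or energy_b == 0.0:
        return 0.0
    return dot / math.sqrt(energy_a * energy_b)


def estimate_pitch(samples, sample_rate, f_min=80.0, f_max=500.0, step=0.05):
    """Return an estimated fundamental frequency in Hz for one frame."""
    lo = int(sample_rate / f_max)             # shortest period, in samples
    hi = int(math.ceil(sample_rate / f_min))  # longest period, in samples
    frame_len = len(samples) - hi - 2
    if frame_len <= 0:
        raise ValueError("recording too short for the requested pitch range")

    # Coarse search over whole-sample lags.
    coarse = {lag: similarity(samples, frame_len, lag) for lag in range(lo, hi + 1)}
    best = max(coarse.values())
    peak = max(coarse, key=coarse.get)
    # Prefer the shortest lag that is a local peak and nearly as good as the
    # best one, so that a double period is not mistaken for the period.
    for lag in range(lo + 1, hi):
        if (coarse[lag] >= 0.95 * best
                and coarse[lag] >= coarse[lag - 1]
                and coarse[lag] >= coarse[lag + 1]):
            peak = lag
            break

    # Fine search at fractional lags around the coarse peak.
    steps = int(round(2.0 / step))
    candidates = [peak - 1.0 + k * step for k in range(steps + 1)]
    fine = max(candidates, key=lambda lag: similarity(samples, frame_len, lag))
    return sample_rate / fine
```

At 8 kHz a whole-sample lag can only represent a 220 Hz tone as 8000/36 ≈ 222.2 Hz; the fractional search narrows this without raising the sampling rate.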

Pitch/Volume Detection
[Figure: detected pitch (frequency in semitones) and volume plotted against time (sec).]

Segmentation (1/3)
Sequence of Pitches and Volumes → Volume-Based Segmentation → Pitch-Based Segmentation → Decision
• Voice → Note Identification → Sequence of Notes
• Noise → Ignore

Segmentation (2/3)
Volume Segmentation: Possible notes are identified as regions in which the volume is higher than a trigger value.
Thus, it’s important to separate each note from the next by a short quiet period, e.g. by pronouncing “ta-ta-ta” rather than “la-la-la”.
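
Volume-based segmentation as described here amounts to finding runs of frames whose volume exceeds the trigger. A minimal sketch, assuming the volume has already been measured once per frame; the function name and the list-of-numbers representation are illustrative, not taken from the project:

```python
def volume_segments(volumes, trigger):
    """Return (start, end) frame index pairs where volume stays above trigger.

    `end` is exclusive, so volumes[start:end] is the candidate note.
    """
    segments = []
    start = None
    for index, volume in enumerate(volumes):
        if volume > trigger and start is None:
            start = index                     # a loud region begins
        elif volume <= trigger and start is not None:
            segments.append((start, index))   # the quiet gap closes the region
            start = None
    if start is not None:                     # recording ended while still loud
        segments.append((start, len(volumes)))
    return segments
```

Two notes sung without a quiet gap between them never drop below the trigger, so they come out as a single segment, which is why “ta-ta-ta” works better than “la-la-la”.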

Segmentation (3/3)
Pitch Segmentation: Within each segment, find the longest region in which the pitch is relatively constant.
Noise Removal: If this region is very short, then the segment is assumed to be noise, and it is ignored.
Conversion to Notes: The frequency of the note is identified by an iterative averaging technique.
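
The three steps above can be sketched as follows. The slides do not give the tolerance, the minimum length, or the details of the iterative averaging, so everything here is an assumption: pitches are taken to be in semitones, "relatively constant" is read as staying within `tolerance` semitones, and the averaging is a simple loop that repeatedly discards values far from the current mean.

```python
def longest_stable_run(pitches, tolerance=1.0):
    """Return (start, end) of the longest run whose pitch range is within tolerance."""
    best = (0, 0)
    for start in range(len(pitches)):
        low = high = pitches[start]
        end = start
        while end < len(pitches):
            low = min(low, pitches[end])
            high = max(high, pitches[end])
            if high - low > tolerance:
                break
            end += 1
        if end - start > best[1] - best[0]:
            best = (start, end)
    return best


def iterative_average(values, max_deviation=0.5, max_rounds=10):
    """Average the values, dropping outliers and re-averaging until stable."""
    kept = list(values)
    for _ in range(max_rounds):
        mean = sum(kept) / len(kept)
        trimmed = [v for v in kept if abs(v - mean) <= max_deviation]
        if not trimmed or len(trimmed) == len(kept):
            break
        kept = trimmed
    return sum(kept) / len(kept)


def segment_to_note(pitches, min_length=5, tolerance=1.0):
    """Return the note (nearest semitone) for a segment, or None if it looks like noise."""
    start, end = longest_stable_run(pitches, tolerance)
    if end - start < min_length:
        return None                       # stable region too short: treat as noise
    return round(iterative_average(pitches[start:end]))
```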

Segmentation Example

Database Search
Sequence of Notes
→ Convert to relative frequencies and durations
→ Find edit distance for each database entry
→ Sort by increasing edit cost
→ List of Best Matches
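
A minimal sketch of this search pipeline, assuming each note is a `(pitch_in_semitones, duration)` pair and that the caller supplies the distance function. The names `to_relative` and `rank_matches`, and the use of a duration ratio for relative duration, are illustrative choices rather than the project's own:

```python
def to_relative(notes):
    """Convert absolute notes into (interval, duration_ratio) steps.

    Working with differences and ratios makes the result independent of the
    key and the tempo in which the melody was sung.
    """
    steps = []
    for (pitch_a, dur_a), (pitch_b, dur_b) in zip(notes, notes[1:]):
        steps.append((pitch_b - pitch_a, dur_b / dur_a))
    return steps


def rank_matches(query_notes, database, distance):
    """Return database song names sorted by increasing distance to the query.

    `database` maps a song name to its list of notes; `distance` is any
    function that compares two relative sequences and returns a cost.
    """
    query = to_relative(query_notes)
    scored = [(distance(query, to_relative(notes)), name)
              for name, notes in database.items()]
    scored.sort()
    return [name for _, name in scored]
```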

Edit Distance (1/3)
Purpose: Correction of errors in singing and in previous identification steps.
Mechanism: The edit distance is the minimum cost required to transform one string into another. The following changes can be applied at given costs:
• Change one character into another
• Insert one character
• Delete one character
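
An edit distance of this kind is usually computed with the standard dynamic-programming recurrence. The notation is an assumption, not taken from the slides: $a_1 \dots a_m$ and $b_1 \dots b_n$ are the two strings, $c_{\mathrm{sub}}$, $c_{\mathrm{ins}}$ and $c_{\mathrm{del}}$ are the costs of the three changes, and $c_{\mathrm{sub}}(x, x) = 0$.

```latex
\begin{aligned}
D(0,0) &= 0, \\
D(i,0) &= D(i-1,0) + c_{\mathrm{del}}, \\
D(0,j) &= D(0,j-1) + c_{\mathrm{ins}}, \\
D(i,j) &= \min\bigl\{\, D(i-1,j-1) + c_{\mathrm{sub}}(a_i, b_j),\;
                       D(i-1,j) + c_{\mathrm{del}},\;
                       D(i,j-1) + c_{\mathrm{ins}} \,\bigr\}.
\end{aligned}
```

The edit distance between the two strings is then $D(m,n)$.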

Edit Distance (2/3)
Example: How to make an elephant become elegant:
elephant
→ eleghant (Replace)
→ elegant (Delete)
Total edit distance is the cost of replacing ‘p’ with ‘g’, plus the cost of deleting ‘h’.
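
The elephant example can be checked with a short dynamic-programming sketch. The unit default costs and the function signature are assumptions for illustration; the slides do not state the costs actually used.

```python
def edit_distance(source, target, substitute_cost=None,
                  insert_cost=1.0, delete_cost=1.0):
    """Minimum total cost of turning `source` into `target`.

    Works on any sequences (strings, lists of notes). `substitute_cost` is a
    function of two items; by default it is 0 for equal items and 1 otherwise.
    """
    if substitute_cost is None:
        substitute_cost = lambda a, b: 0.0 if a == b else 1.0

    rows, cols = len(source) + 1, len(target) + 1
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):              # delete every item of source
        table[i][0] = table[i - 1][0] + delete_cost
    for j in range(1, cols):              # insert every item of target
        table[0][j] = table[0][j - 1] + insert_cost

    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j - 1] + substitute_cost(source[i - 1], target[j - 1]),
                table[i - 1][j] + delete_cost,
                table[i][j - 1] + insert_cost,
            )
    return table[-1][-1]
```

With unit costs, `edit_distance("elephant", "elegant")` is one replacement plus one deletion.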

Edit Distance (3/3)
Algorithms differ by the content of the strings being compared. Three algorithms were checked:
• Parsons code: Only the direction of pitch change is compared (up, down, or repeat).
• Frequency similarity: The direction and size of pitch change (e.g., up 3 semitones).
• Frequency/Duration similarity: Both pitch change and relative duration of notes (e.g., up 3 semitones, and a longer note).
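
The three kinds of string can be sketched as three encodings of the same melody. The function names, the U/D/R letters, and the use of a duration ratio for relative duration are assumptions made for illustration:

```python
def parsons_code(pitches):
    """Direction of each pitch change: U (up), D (down) or R (repeat)."""
    code = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            code.append("U")
        elif current < previous:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


def intervals(pitches):
    """Direction and size of each pitch change, in semitones."""
    return [current - previous for previous, current in zip(pitches, pitches[1:])]


def frequency_duration(notes):
    """Pitch change plus relative duration for each pair of consecutive notes.

    `notes` is a list of (pitch_in_semitones, duration) pairs; a ratio above 1
    means the second note is longer than the first.
    """
    return [(pitch_b - pitch_a, dur_b / dur_a)
            for (pitch_a, dur_a), (pitch_b, dur_b) in zip(notes, notes[1:])]
```

The melodies `[60, 63, 63, 59]` and `[60, 61, 61, 60]` share the Parsons code `URD` but have different intervals, which illustrates how much less information the Parsons code carries.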

Results

Simulation
Simulations of the search engine were performed in order to have a larger ensemble, from which a detection probability was calculated.
Random noise was added to the first few notes of a tune. The tune was then applied to the search engine.
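
A simulation of this kind can be sketched as a small Monte Carlo loop. The slides do not describe the noise model or the number of trials, so both are assumptions here: each interval is shifted by one semitone with probability `error_rate`, and a simple absolute-difference mismatch stands in for the real search engine. Every database entry is assumed to contain at least `n_notes` intervals.

```python
import random


def add_noise(intervals, n_notes, error_rate, rng):
    """Take the first n_notes intervals, shifting each by ±1 with probability error_rate."""
    noisy = []
    for interval in intervals[:n_notes]:
        if rng.random() < error_rate:
            interval += rng.choice((-1, 1))
        noisy.append(interval)
    return noisy


def mismatch(query, entry):
    """Stand-in distance: total absolute interval difference over the query length."""
    return sum(abs(q - e) for q, e in zip(query, entry))


def detection_probability(database, n_notes, error_rate, trials=200, seed=0):
    """Fraction of noisy queries whose best match is the tune they came from."""
    rng = random.Random(seed)             # fixed seed keeps runs repeatable
    names = sorted(database)
    hits = 0
    for _ in range(trials):
        target = rng.choice(names)
        query = add_noise(database[target], n_notes, error_rate, rng)
        best = min(names, key=lambda name: mismatch(query, database[name]))
        hits += best == target
    return hits / trials
```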

Comparison of Search Algorithms
[Figure: probability of correct identification (%) versus number of notes in query (3 to 10), for the Parsons, Frequency, and Frequency/Duration algorithms.]

Empirical Test
Subjects listened to a sample query. Then, they chose a song from the database, and were told to sing it in a similar manner.
Number of test subjects: 14
Number of recorded songs: 64
Number of songs in database: 197

Empirical Results

| Algorithm | Identified as Top Match | Identified as Top Five |
| --- | --- | --- |
| Freq/Dur | 80% | 86% |
| Frequency | 77% | 88% |
| Parsons | 52% | 73% |

H: 45%-65%

Conclusions
Combined frequency/duration search is the most robust search algorithm tested, and outperforms the Parsons code search by a wide margin.
The program performs better than an average human under the tested conditions.

Summary
A successful melody search engine has been created.
Real-time software implementation is possible.
The new frequency/duration search algorithm was found to be more effective than the existing Parsons code search.

The End