1
“Emotion in Music” organizers' endeavor at the Crowdsourcing task: A Multimodal Approach to Drop Detection in Electronic Dance Music
Anna Aljanaki2*, Mohammad Soleymani1*, Frans Wiering2, Remco C. Veltkamp2
1 University of Geneva, Switzerland
2 Utrecht University, Netherlands
* Equal technical contributions
2
Problem definition
• Given an electronic music excerpt, its timed comments, and labels from MTurk, automatically identify whether the excerpt fully or partially contains a drop.
3
Material
• 15 second excerpts with timed comments including the term “drop”
• MPEG Layer 3 files
• Metadata including the comments
• Labels from the crowd
• 164 excerpts with full agreement (105: full drop; 4: partial drop; 54: no drop)
• 70 excerpts with no agreement
4
Solutions
• Labels from crowdsourcing
1. Majority vote (MV)
2. Dawid-Skene (DS)
• Labels from crowdsourcing + comments
3. Naïve Bayesian classifier
• Labels from crowdsourcing + content
4. Logistic regression
5
Using labels (wisdom of crowd)
Aashish Sheshadri and Matthew Lease. SQUARE: A Benchmark for Research on Computing Crowd Consensus. In Proceedings of the 1st AAAI Conference on Human Computation (HCOMP), 2013
6
Solution 1: Majority vote
• 3 labels each
• Calculate the majority
• If there is no agreement then the estimated label is 2 (partial drop)
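The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' code: the function name `majority_vote` and the constant names are hypothetical, and the label encoding (1 = full drop, 2 = partial drop, 3 = no drop) is taken from the results slide.

```python
from collections import Counter

# Label encoding as used on the slides.
FULL_DROP, PARTIAL_DROP, NO_DROP = 1, 2, 3


def majority_vote(labels, fallback=PARTIAL_DROP):
    """Return the label given by more than half of the workers.

    If no label reaches a strict majority (with 3 workers: all three
    disagree), return the fallback label, here 2 (partial drop).
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else fallback


print(majority_vote([1, 1, 3]))  # two of three workers agree on "full drop"
print(majority_vote([1, 2, 3]))  # no agreement, falls back to "partial drop"
```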
7
Solution 2: Dawid-Skene
• Dawid and Skene (1979) proposed a method to combine a number of uncertain decisions (clinician-patient)
• The method estimates a confusion matrix for every labeler using Expectation-Maximization; the estimates (probabilities) are initialized by majority vote.
• We then look at the probability of the true response given the label from each of the three workers and pick the highest one.
• Get-Another-Label toolbox: https://github.com/ipeirotis/Get-Another-Label
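To make the EM procedure concrete, here is a minimal pure-Python sketch of a Dawid-Skene style estimator. It is not the Get-Another-Label implementation: the function name `dawid_skene`, the input layout, the fixed iteration count and the `smoothing` constant are all assumptions made for illustration, and the final decision here is simply the class with the highest posterior.

```python
from collections import defaultdict


def dawid_skene(votes, classes, n_iter=50, smoothing=0.01):
    """votes: {item: [(worker, label), ...]} -> {item: {class: posterior}}."""
    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = {
        item: {k: sum(1 for _, l in pairs if l == k) / len(pairs) for k in classes}
        for item, pairs in votes.items()
    }
    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker,
        # conf[worker][true_class][given_label].
        prior = {k: sum(p[k] for p in post.values()) / len(post) for k in classes}
        conf = defaultdict(
            lambda: {k: {l: smoothing for l in classes} for k in classes}
        )
        for item, pairs in votes.items():
            for worker, label in pairs:
                for k in classes:
                    conf[worker][k][label] += post[item][k]
        for worker in conf:
            for k in classes:
                total = sum(conf[worker][k].values())
                for l in classes:
                    conf[worker][k][l] /= total
        # E-step: posterior over the true class of every item.
        for item, pairs in votes.items():
            scores = {}
            for k in classes:
                score = prior[k]
                for worker, label in pairs:
                    score *= conf[worker][k][label]
                scores[k] = score
            norm = sum(scores.values())
            post[item] = {k: scores[k] / norm for k in classes}
    return post


# Toy data (made up): three workers, five excerpts, labels 1/2/3.
votes = {
    "a": [("w1", 1), ("w2", 1), ("w3", 1)],
    "b": [("w1", 1), ("w2", 1), ("w3", 1)],
    "c": [("w1", 3), ("w2", 3), ("w3", 3)],
    "d": [("w1", 3), ("w2", 3), ("w3", 3)],
    "e": [("w1", 1), ("w2", 1), ("w3", 3)],
}
posteriors = dawid_skene(votes, classes=[1, 2, 3])
estimated = {item: max(p, key=p.get) for item, p in posteriors.items()}
print(estimated)
```

With only three labels per excerpt and mostly reliable workers, such an estimator tends to agree with majority vote, which is consistent with the identical scores reported for runs 1 and 2.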
8
Solution 3: Majority Vote + comments
• For the excerpts with full or partial agreement we do not touch the MV labels
• For the remaining 70 excerpts
– Features:
  • labels from workers
  • Number of times comments contain the term “drop” (We did not normalize by the number of comments; it was a mistake!)
– Naïve Bayesian classifier trained on the samples with partial or full agreement
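The comment feature, and the normalization that was missed, can be sketched as follows. This is an illustrative assumption about how the count could be computed: the function name `drop_mention_features` is hypothetical, a comment is counted once if it contains the substring "drop" (so "dropped" also matches), and the example comments are made up.

```python
def drop_mention_features(comments, term="drop"):
    """Return (raw count, normalized count) of comments mentioning the term.

    The raw count is the feature described on the slide; the normalized
    value divides by the number of comments, which is the step the
    authors note they skipped.
    """
    mentions = sum(1 for comment in comments if term in comment.lower())
    normalized = mentions / len(comments) if comments else 0.0
    return mentions, normalized


comments = ["Sick drop!", "that DROP at 0:45", "nice track"]
raw, normalized = drop_mention_features(comments)
print(raw, normalized)
```

Without normalization, a heavily commented excerpt gets a larger feature value than a rarely commented one even when the same share of listeners mention a drop.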
9
Solution 4: MV + acoustic (1)
• Again only the samples with no agreement were changed.
• Trained on the samples with full agreement (164 samples)
• Assumption: there is a moment of silence or a quieter segment right after the drop
• Energy from 100ms segments extracted and smoothed
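A minimal sketch of the energy extraction step, assuming the audio has already been decoded into a list of mono samples. The function names, the use of mean squared amplitude as "energy", and the moving-average smoother with its `width` parameter are assumptions; the slides do not state which energy measure or smoothing filter was used.

```python
def frame_energy(samples, sample_rate, frame_ms=100):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def moving_average(values, width=5):
    """Smooth a sequence with a centered moving average (shorter at the edges)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Synthetic signal: 100 ms of full-scale samples followed by 100 ms of silence.
samples = [1.0] * 1000 + [0.0] * 1000
energy = frame_energy(samples, sample_rate=10000)
print(energy)
print(moving_average(energy, width=3))
```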
10
Solution 4: MV + acoustic (2)
11
Solution 4: MV + acoustic (3)
• Features:
– The value of the biggest local minimum in an excerpt
– The fraction of the biggest minimum to an average minimum
– The number of potential drop events, as detected by decrease in loudness bigger than threshold
– The dynamic range of the excerpt
• Logistic regression for binary classification (we did not consider class 2 due to not having enough samples)
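The four features could be computed from a smoothed energy curve roughly as below. This is a sketch under several assumptions that the slides leave open: "biggest local minimum" is read as the deepest (lowest-energy) local minimum, a drop event is a frame-to-frame energy decrease larger than `drop_threshold`, and dynamic range is the difference between the maximum and minimum energy. The function name, the threshold value and the example curve are hypothetical.

```python
def acoustic_features(energy, drop_threshold=0.5):
    """Compute illustrative drop-related features from an energy curve."""
    # Local minima: interior frames lower than both neighbours.
    minima = [
        energy[i]
        for i in range(1, len(energy) - 1)
        if energy[i] < energy[i - 1] and energy[i] < energy[i + 1]
    ]
    deepest = min(minima) if minima else min(energy)
    mean_minimum = sum(minima) / len(minima) if minima else deepest
    ratio = deepest / mean_minimum if mean_minimum > 0 else 0.0
    # Potential drop events: sudden decreases between consecutive frames.
    n_events = sum(
        1 for prev, cur in zip(energy, energy[1:]) if prev - cur > drop_threshold
    )
    return {
        "deepest_minimum": deepest,
        "minimum_ratio": ratio,
        "n_drop_events": n_events,
        "dynamic_range": max(energy) - min(energy),
    }


features = acoustic_features([1.0, 0.9, 0.2, 0.8, 0.6, 0.7, 1.0])
print(features)
```

A feature vector like this, one per excerpt, would then be passed to a binary logistic regression (full drop versus no drop), as described in the last bullet.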
12
Results
Run  Method         F1-score  Full drop (1)  Part. drop (2)  No drop (3)
1    Majority vote  0.69      0.72           0.31            0.75
2    Dawid-Skene    0.69      0.72           0.31            0.75
3    MV + comments  0.7       0.73           0.28            0.76
4    MV + acoustic  0.71      0.72           0.27            0.79
No significant improvement compared to majority vote
13
Lessons learned
• With non-malicious workers and enough labels, majority vote is very hard to beat
• The scarcity of samples from the second class (partial drop) reduces our performance
• In the future, separate development and evaluation sets would be beneficial
14
Summary
• We primarily used the labels from MTurk since we believed they would be superior
• We proposed possible approaches that take advantage of the metadata and content when MV is indecisive
• As expected, we did not beat the majority vote