Using search engines for classification: does it still work?
![Page 1: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/1.jpg)
USING SEARCH ENGINES FOR CLASSIFICATION: DOES IT STILL WORK?
Sten Govaerts, Nik Corthaut, Erik Duval
![Page 2: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/2.jpg)
• Our problem
• Classification using search engines
• The setup
• The evaluation
• Conclusion
![Page 3: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/3.jpg)
TUNIFY
![Page 6: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/6.jpg)
HOW DOES IT WORK?
• manually annotated metadata
• 5 music experts at Aristo Music and different consultants
• almost 80,000 songs
• but, not enough...
![Page 7: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/7.jpg)
PROBLEMS
• satisfying the music choice of all customers
• retail and catering differ from you and me!
• new markets
• react fast on emerging music trends
• adding the full Belgian library catalog
![Page 8: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/8.jpg)
GENERATE THE METADATA
• from different sources:
  • the audio signal
  • web sources
  • the Aristo database
  • attention metadata
• using our metadata generation framework: SamgI
![Page 9: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/9.jpg)
GENRE...
• our master thesis looked at different ways to generate genre...
![Page 10: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/10.jpg)
ONE APPROACH...
• M. Schedl, T. Pohle, P. Knees, G. Widmer, "Assigning and Visualizing Music Genres by Web-based Co-occurrence Analysis", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 260-265.
• G. Geleijnse, J. Korst, "Web-based Artist Categorization", Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 266-271.
![Page 11: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/11.jpg)
CLASSIFICATION WITH SEARCH ENGINES
using co-occurrence
![Page 13: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/13.jpg)
CLASSIFICATION WITH SEARCH ENGINES
Artist + Genre + Schema
using co-occurrence
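The co-occurrence idea on this slide can be sketched roughly as follows: fill a query schema with the artist and each candidate genre, ask a search engine how many results it reports, normalise by the artist's own result count, and pick the genre with the highest score. This is only a minimal sketch, not the authors' implementation. The schema string, the normalisation by the artist's count, and the `page_count` callable (stubbed here with invented numbers instead of a real search API) are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable

# Hypothetical query schema; the slides do not spell out the one actually used.
SCHEMA = '"{artist}" "{genre}" music'


def genre_scores(
    artist: str,
    genres: Iterable[str],
    page_count: Callable[[str], int],
) -> Dict[str, float]:
    """Score each genre by its co-occurrence with the artist (assumed normalisation)."""
    artist_hits = page_count(f'"{artist}"')
    scores = {}
    for genre in genres:
        both = page_count(SCHEMA.format(artist=artist, genre=genre))
        # Relative co-occurrence; 0.0 when the artist has no hits at all.
        scores[genre] = both / artist_hits if artist_hits else 0.0
    return scores


def classify(artist: str, genres: Iterable[str], page_count: Callable[[str], int]) -> str:
    """Return the genre with the highest co-occurrence score."""
    scores = genre_scores(artist, genres, page_count)
    return max(scores, key=scores.get)


# Stub standing in for a search engine; these counts are invented.
FAKE_COUNTS = {
    '"Peter Tosh"': 1000,
    '"Peter Tosh" "Reggae" music': 400,
    '"Peter Tosh" "Jazz" music': 50,
    '"Peter Tosh" "Metal" music': 10,
}


def fake_page_count(query: str) -> int:
    """Look up an invented result count; unknown queries count as zero."""
    return FAKE_COUNTS.get(query, 0)


predicted = classify("Peter Tosh", ["Reggae", "Jazz", "Metal"], fake_page_count)
```

Passing `page_count` in as a parameter keeps the scoring logic independent of any particular search engine, which matters here because the talk compares several engines on the same task.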
![Page 17: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/17.jpg)
![Page 19: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/19.jpg)
Rock: 0.013
Blues: 0.009
Country: 0.013
Jazz: 0.005
Pop: 0.015
Metal: 0.009
![Page 20: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/20.jpg)
RESULTS
• master thesis student’s results were much worse
• what happened?
• did Google's search result counts change?
• does the Google Search API return different results?
• is the student’s implementation correct?
![Page 21: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/21.jpg)
HOW TO EVALUATE THIS?
• re-run the original experiment
• evaluate on the same data set: 1995 artists and 9 genres.
• different search engines: Google, Yahoo! and Live! Search.
• over time: 8 times over a period of 36 days.
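One way to organise such a repeated evaluation is to compute one accuracy value per engine per run and then compare the mean and spread across runs. The sketch below assumes predictions and ground truth are plain artist-to-genre dictionaries. The engine names and accuracy values in `runs` are placeholders, not results from the talk, and the single wrong prediction is invented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def accuracy(predicted: Dict[str, str], truth: Dict[str, str]) -> float:
    """Fraction of artists in `truth` whose predicted genre matches."""
    correct = sum(
        1 for artist, genre in truth.items() if predicted.get(artist) == genre
    )
    return correct / len(truth)


def summarise(runs: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Mean and population standard deviation of accuracy per engine."""
    return {
        engine: {"mean": mean(values), "stdev": pstdev(values)}
        for engine, values in runs.items()
    }


# Ground truth for four artists; the prediction for Roy Rogers is deliberately wrong.
truth = {"Peter Tosh": "Reggae", "Art Pepper": "Jazz", "Roy Rogers": "Country", "IceT": "Rap"}
run_output = {"Peter Tosh": "Reggae", "Art Pepper": "Jazz", "Roy Rogers": "Folk", "IceT": "Rap"}
run_accuracy = accuracy(run_output, truth)

# Placeholder accuracies for several runs of two hypothetical engines.
runs = {
    "engine_a": [0.75, 0.75, 0.25, 0.75],
    "engine_b": [0.5, 0.5, 0.5, 0.5],
}
summary = summarise(runs)
```

A large standard deviation for one engine would point to the kind of day-to-day fluctuation that repeated runs are meant to expose.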
![Page 22: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/22.jpg)
THE DATA SET
Blues, Country, Electronic, Folk, Jazz, Metal, Rap, Reggae, RnB
![Page 23: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/23.jpg)
THE DATA SET
Blues: 9%
Country: 12%
Electronic: 5%
Folk: 4%
Jazz: 41%
Metal: 13%
Rap: 2%
Reggae: 3%
RnB: 10%
![Page 25: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/25.jpg)
![Page 26: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/26.jpg)
MOTION CHART
• http://hmdb.cs.kuleuven.be/muzik/gapminder.html
![Page 27: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/27.jpg)
![Page 28: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/28.jpg)
![Page 29: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/29.jpg)
MORE FINE-GRAINED...
• 18 artists
• more search engines: Google.co.uk/.fr/.be, uk/fr.search.yahoo.com
• twice a day for 53 days
• 250,000 queries!
![Page 30: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/30.jpg)
| Artist | Genre |
| --- | --- |
| 2 Pac | Rap |
| Alan Lomax | Folk |
| Art Pepper | Jazz |
| Cradle of Filth | Metal |
| David Parsons | Electronic |
| Desmond Dekker | Reggae |
| Downpour | Metal |
| IceT | Rap |
| Jerry Butler | RnB |
| Joy Lynn White | Country |
| Louisiana Red | Blues |
| Lou Rawls | RnB |
| LTJ Bukem | Electronic |
| Peter Tosh | Reggae |
| Pinetop Smith | Jazz |
| Robert Johnson | Blues |
| Roy Rogers | Country |
| Steeleye Span | Folk |
![Page 31: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/31.jpg)
![Page 32: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/32.jpg)
MAIN SEARCH ENGINE RESULTS
![Page 33: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/33.jpg)
REGIONAL GOOGLES
![Page 34: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/34.jpg)
![Page 35: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/35.jpg)
WHAT TO USE?
• use Google when it is stable; otherwise rely on Yahoo!
• when is it stable? test with a small set
• some artists get classified incorrectly on bad days
• compare the accuracy achieved with the test set to the average.
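The selection rule on this slide could be sketched as: classify a small test set with Google, compare the resulting accuracy with the average of earlier runs, and fall back to Yahoo! on a bad day. The function names, the `tolerance` threshold of 0.05, and the accuracy history below are assumptions for illustration; the slides do not state how large a drop counts as unstable.

```python
from statistics import mean
from typing import Sequence


def is_stable(
    todays_accuracy: float,
    past_accuracies: Sequence[float],
    tolerance: float = 0.05,  # assumed threshold, not taken from the talk
) -> bool:
    """True when today's test-set accuracy is at most `tolerance` below the historical average."""
    if not past_accuracies:
        return True  # nothing to compare against yet
    return todays_accuracy >= mean(past_accuracies) - tolerance


def choose_engine(
    google_today: float,
    google_history: Sequence[float],
    tolerance: float = 0.05,
) -> str:
    """Prefer Google when it looks stable, otherwise fall back to Yahoo!."""
    if is_stable(google_today, google_history, tolerance):
        return "Google"
    return "Yahoo!"


# Invented accuracy history for the small test set.
history = [0.80, 0.82, 0.78, 0.80]
good_day = choose_engine(0.79, history)
bad_day = choose_engine(0.55, history)
```

Because the check runs on a small test set only, it costs few queries compared with classifying the full catalogue on a day when the results turn out to be unreliable.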
![Page 36: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/36.jpg)
CONCLUSION
• still works after 3 years
• Google -> Yahoo! -> Live! Search
• why does Google fluctuate?
• a generic, all-purpose version of the classifier is implemented in our metadata generation framework, SamgI
![Page 37: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/37.jpg)
FUTURE WORK
• understand the performance differences of regional search engines
• use alternative search engines
• tweak the genre taxonomy depending on the search engine
![Page 38: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/38.jpg)
Q & A.
![Page 39: Using search engines for classification: does it still work?](https://reader033.fdocuments.us/reader033/viewer/2022060107/5549b653b4c905e5048b4a25/html5/thumbnails/39.jpg)
DEMO METADATA GENERATION
• http://ariadne.cs.kuleuven.be/samgi-service/