Foraging for Music Donald Byrd School of Informatics & Jacobs School of Music, Indiana University rev. 10 April 2008

Page 1

Foraging for Music

Donald Byrd

School of Informatics & Jacobs School of Music,

Indiana University

rev. 10 April 2008

Page 2 (8 April 08)

What’s the Problem?

• How much music is there?
  – Music holdings of Library of Congress: over 10M items
    • Most is notation, especially CWMN (Conventional Western Music Notation), not audio
    • Includes over 6M pieces of sheet music, 10’s/100’s of thousands of scores of operas, symphonies, etc.
• Today
  – iTunes: 6M tracks
  – P2P: 15B tracks
• Tomorrow
  – “All music will be on line”
• People have very diverse tastes, etc.

Page 3 (8 Apr. 08)

Classification: Logician General’s Warning

• Classification is dangerous to your understanding
  – Almost everything in the real world is messy, ill-defined
  – Absolute correlations between characteristics are rare
    • Example: Ginger Baker says Cream wasn’t a rock group
    • Example: did Bach write piano music?
  – People say “an X has characteristics A, B, C…”
  – Usually mean “an X has A, & usually B, C…”
  – Leads to:
    • People who know better claiming absolute correlations
    • “Is it this or that or that?” questions that don’t have an answer
    • Don changing his mind
• But lack of classification is dangerous to understanding!
• Should we abandon (hierarchic) classifications?
  – Of course not! They're too useful, & impossible to avoid
  – Just be on guard for misleading things, consider alternatives

Page 4 (27 Jan. 06)

Basic Representations of Music & Audio

Audio (e.g., CD, MP3): like speech

Time-stamped Events (e.g., MIDI file): like unformatted text

Music Notation: like text with complex formatting

[Figure: one example each of Digital Audio, Time-stamped Events, and Music Notation]

Page 5 (rev. 15 Feb.)

Basic and Specific Representations vs. Encodings

[Figure: diagram with three columns (Audio, Time-stamped Events, Music Notation). Basic and specific representations appear above a dividing line; encodings appear below it. Labels in the figure:
  Audio: Waveform; Red Book (CD), .WAV
  Time-stamped Events: Time-stamped MIDI, Time-stamped expMIDI, Csound score; SMF, expMIDI File, Csound score
  Music Notation: CMN, Mensural not., Gamelan not., Tablature; Notelist, MusicXML, Finale ETF]

Page 6 (27 Mar. 07)

A Similarity Scale for Content-Based Music IR

• Categories describe how similar to query the items to be found are expected to be (from closest to most distant)

• Detailed audio characteristics in common
  1. Same music, arrangement, performance venue, session, performance, & recording
  2. …
  4. Same music, arrangement, performance venue; different session, performance, recording
• No detailed audio characteristics in common
  6. Same music, different arrangement; or different but closely-related music, e.g., conservative variations (Mozart, etc.), many covers, minor revisions
  7. Different & less closely-related music: freer variations (Brahms, much jazz, etc.), wilder covers, extensive revisions
  8. Music in same genre, style, etc.
  9. Music influenced by other music

Page 7 (6 Mar. 06)

Ways of Finding Music (1)

• How can you find information/music you’re interested in?
  – You know some of it
  – You know something about it
  – “Someone else” knows something about your interests
  – => Content, Metadata, and “Collaboration”
• Metadata
  – “Data about data”: information about a thing, not thing itself (or part)
  – Includes the standard library idea of bibliographic information, plus information about structure of the content
  – Metadata is the traditional library way
  – Also basis for iTunes, etc.: iTunes Music Library.xml
  – iTunes, Winamp, etc., use ID3 tags in MP3’s
• Content (as in content-based retrieval)
  – Cf. tasks in Music Similarity Scale
• Collaborative
  – “People who bought this also bought…”
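The collaborative idea ("People who bought this also bought…") can be illustrated with a simple co-occurrence count. This is only a minimal sketch, not how any particular store works: the function name `also_bought` and the purchase histories are hypothetical, invented for illustration.

```python
from collections import Counter

def also_bought(baskets, item, top_n=3):
    """Rank other items by how often they share a basket with `item`."""
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        if item in items:
            # Count every other item bought together with `item`
            counts.update(items - {item})
    # Sort by descending count, then by name for a deterministic order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical purchase histories, one list per customer
baskets = [
    ["Kind of Blue", "A Love Supreme", "Time Out"],
    ["Kind of Blue", "A Love Supreme"],
    ["Kind of Blue", "Time Out", "Abbey Road"],
    ["Abbey Road", "Revolver"],
]

print(also_bought(baskets, "Kind of Blue"))
```

Note that the sketch never looks at the music or its metadata, only at what other people did; that is what separates the collaborative approach from the other two.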

Page 8 (8 Mar. 06)

Ways of Finding Music (2)

• Do you just want to find the music now, or do you want to put in a “standing order”?

• => Searching and Filtering
• Searching: data stays the same; information need changes
• Filtering: information need stays the same; data changes
  – Closely related to recommender systems
  – Sometimes called “routing”

• Collaborative approach to identifying music makes sense for filtering, but not for searching(?)
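The searching/filtering distinction can be sketched in a few lines. In this rough illustration a search runs a changing need against a fixed collection, while a "standing order" holds one need fixed and is offered items as they arrive. The tag-based matching rule, the class name `StandingOrder`, and the sample items are all assumptions made for the example.

```python
def matches(need, item):
    """True if every keyword in the information need is among the item's tags."""
    return need <= item["tags"]

def search(collection, need):
    """Searching: the collection is fixed; each call can bring a new need."""
    return [item["title"] for item in collection if matches(need, item)]

class StandingOrder:
    """Filtering: the need is fixed; items are offered as they arrive."""

    def __init__(self, need):
        self.need = need
        self.hits = []

    def offer(self, item):
        if matches(self.need, item):
            self.hits.append(item["title"])

# Hypothetical items with hand-assigned tags
collection = [
    {"title": "Goldberg Variations", "tags": {"baroque", "keyboard"}},
    {"title": "Brandenburg Concerto No. 3", "tags": {"baroque", "orchestral"}},
    {"title": "So What", "tags": {"jazz", "modal"}},
]

# Searching: same data, two different needs
print(search(collection, {"baroque"}))
print(search(collection, {"jazz"}))

# Filtering: one need, data arriving over time
order = StandingOrder({"baroque", "keyboard"})
for new_item in collection:
    order.offer(new_item)
print(order.hits)
```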

Page 9 (6 Mar. 08)

Ways of Finding Music (3)

• Most combinations make sense & seem useful

                | Searching | Filtering
By content      | Shazam, NightingaleSearch, Themefinder | FOAFing the Music, Pandora
By metadata     | iTunes, Amazon.com, Variations2, etc.; also Wikipedia, Google! | iTunes RSS feed generator, FOAFing the Music
Collaboratively | N/A(?) | Amazon.com, Last.fm; word of mouth!

Page 10 (22 March 07)

Searching: Metadata (the old and new way) vs. Content (in the middle)

• To librarians, “searching” means searching metadata
  – Has been around as long as library catalogs (c. 300 B.C.?)
• To IR experts, it means searching content
  – Only since advent of IR: started with experiments in 1950’s
• Ordinary people don’t distinguish
  – Expert estimate: 50% of real-life information needs involve both
• The two approaches are slowly coming together
  – Metadata-creating “games” (Listen Game, etc.) should help a lot
  – Need ways to manage both together

Page 11 (8 Apr. 08)

To the Rescue: Music Recommenders! (1)

• Music Recommendation Tutorial
  – by Paul Lamere & Òscar Celma, at ISMIR 2007
  – Introduction: Why music recommendation is important
    • 4-5: the Long Tail -- 6-10: different types of uses
  – 20 Formalization of the recommendation problem
    • 26-31: users & items -- 64-80: genre & other text tags
  – 105 Recommendation algorithms
  – 135 Problems with recommenders
    • 136-155: social recommenders -- 156-157: content-based
  – 158 Recommender examples
    • 159ff: social -- 168ff: content (Pandora) -- 180ff: hybrid
  – 184 Evaluation of recommenders
    • 188ff: metrics -- 191-192: mainstream vs. eclectic users
  – 246 Conclusions / Future

Page 12 (8 Apr. 08)

To the Rescue: Music Recommenders! (2)

• Tim Westergren’s approach: Pandora
  – “Music Genome Project” defined 400 “genes” (attributes)
  – Every piece (song) has value 1 thru 10 assigned for each
  – ...completely manual: done by experts w/ degrees in music theory, etc.
  – Mostly content-based
  – Has major advantages, but hybrid (social & content) is probably best

Page 13 (rev. 10 April 08)

“I don’t want similar music, I want something completely different!” (1)

• Much research, many commercial ventures designed to help people find music similar to something they have

• …but what about people who want something very different?
  – May not be that unusual: cf. Celma & Lamere “mainstream vs. eclectic users” slides
  – E.g., something as far as possible from Britney Spears
• Don has “Seriously Weird” playlist & “Music as Different as Possible” project
• How about Brian Whitman’s “Eigenmusic” approach?
  – Problem: parameters too low-level, not perceptually significant!

Page 14 (10 April 08)

“I don’t want similar music, I want something completely different!” (2)

• How practical it is to make a system do this depends on its representation of music
  – Must represent perceptual features well enough
• MusicStrands’ representation (every song is an attribute) doesn’t help much
  – …though might be possible to infer from network
• Pandora “music genome” (400 attributes for all music) is ideal
  – Find points far away instead of nearby in 400-D metric space
  – Could do “Anti-Britney Spears Radio”!
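The "far away instead of nearby" idea can be sketched as a nearest-neighbor ranking with the sort order reversed. This is only an illustration under stated assumptions: Euclidean distance is one possible metric (the slides do not name one), the catalog uses 4 made-up attributes on a 1 to 10 scale rather than 400, and the titles and the function `rank_by_distance` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two attribute vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(seed, catalog, farthest=True):
    """Rank every other piece by its distance from the seed piece.

    farthest=True gives "anti-similarity" order; False gives the usual
    most-similar-first order.
    """
    seed_vec = catalog[seed]
    others = [(title, distance(seed_vec, vec))
              for title, vec in catalog.items() if title != seed]
    return sorted(others, key=lambda pair: pair[1], reverse=farthest)

# Hypothetical catalog: 4 attributes per piece, each scored 1 to 10
catalog = {
    "Pop seed":    [9, 8, 2, 1],
    "Similar pop": [8, 8, 3, 1],
    "Free jazz":   [2, 3, 9, 8],
    "Drone piece": [1, 1, 10, 10],
}

for title, dist in rank_by_distance("Pop seed", catalog):
    print(f"{title}: {dist:.2f}")
```

A ranking like this is only as meaningful as the attributes behind it, which is the slide's point about representation: the same code run on low-level signal parameters would return pieces that are numerically distant but not necessarily perceived as different.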

Page 15 (8 April 08)

Good Research Is Difficult (1)

• 1. Hard to evaluate reliability of info sources
  – Especially difficult on the Web
  – Ex: www.dhmo.org
• 2. People see what they expect to see
  – Ex: use of kitchen sponges increases E. coli
• 3. Almost everything in the world is complex, messy, etc.
  – Backus (in Musical Acoustics): why musicians’ explanations in acoustics are almost always wrong
  – “Classification: Logician General’s Warning”
  – Ex: What was the first piano? What is a trombone?

Page 16 (8 April 08)

Good Research Is Difficult (2)

• 4. Easy to overgeneralize
  – Ex: Blair & Maron (1985): An Evaluation of Retrieval Effectiveness for a Full-text Document-Retrieval System. CACM 28(3)
    • Famous paper in text-IR research world
    • Well-thought-out, meticulously done large-scale study
    • Conclusion (essentially): fulltext IR (vs. using abstracts, hand indexing) isn’t worth the trouble(!)
    • Faulty assumptions:
      – Litigation is typical domain, so recall is critical; no statistical methods; storage is expensive; text must be entered for IR system
  – Ex (fiction, but very plausible): Asimov short story: “Not Final”

Page 17 (10 April 08)

Further Information

• Music Recommendation Tutorial
  – by Paul Lamere & Òscar Celma, at ISMIR 2007
  – http://mtg.upf.edu/~ocelma/MusicRecommendationTutorial-ISMIR2007/
• Paul Lamere’s “Duke Listens!” blog
  – http://blogs.sun.com/plamere/
• My “Information Sources for Music Informatics Students”
  – http://www.informatics.indiana.edu/donbyrd/Teach/GeneralInformationSources.html