The Echo Nest at Music and Bits, October 21 2009

Thursday, October 22, 2009

I am losing my voice. I am sorry.

I am normally louder than this. I also added text to the pictures.


A Short (Personal) History of Computers Listening to Music

1999-2009


I was a musician for a while. Electronic music.

“Intelligent dance music” (worst genre name ever)


“Fish / Cut bait”

Handheld-music (1998-2001)
I made my own software to make music.
Did it make me a better musician? Definitely not.


It was 1999. Lots of stuff was happening.


I learned about music from reading web sites. Forums, mailing lists.


You could now download a song faster than real time. I figured things would change quick.


So I went to grad school. I studied information retrieval, language processing.


Columbia University, NYC

MIT Media Lab, finishing my dissertation


People were starting to apply IR techniques to music. Audio files are treated like text.

FFT frames became words. Songs became “documents.”
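A minimal sketch of the “frames as words” idea, not any particular system's pipeline. It assumes each frame is already a short spectral vector and that a codebook of representative frames exists. The names `nearest_codeword` and `song_to_document` and the 3-bin toy data are made up for illustration.

```python
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codebook vector closest to this frame."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def song_to_document(frames, codebook):
    """Turn a song (a list of spectral frames) into a bag of 'word' counts."""
    return Counter(nearest_codeword(frame, codebook) for frame in frames)


# Hypothetical 3-bin spectra; a real system would use FFT magnitudes.
codebook = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
frames = [(0.9, 0.1, 0.0), (0.8, 0.0, 0.2), (0.1, 0.0, 0.7)]

doc = song_to_document(frames, codebook)
print(doc)  # word index -> how many frames mapped to it
```

Once a song is a bag of counts, any text retrieval method applies to it, which is exactly why the approach was tempting.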


There’s a problem with that. Just because you can convert an mp3 to #s doesn’t mean you understand it.


“Music IR” was born. The applications are varied, but most have nothing to do with music.


Retrieving Music by Rhythmic Similarity

cal excerpt. (The effect of varying the truncated regions was not examined, and it is not unlikely that other values may result in better retrieval performance.)

4.1.1 Euclidean Distance
Three different distance measures were used. The first was straightforward squared Euclidean distance measure, or the sum of the squares of the element-by-element differences between the values, as used in Experiment 1. For evaluation, each excerpt was used as a query. Each of the 15 corpus documents was then ranked by similarity to each of the 15 queries using the squared Euclidean distance. (For the purposes of ranking, the squared distance serves as well as the distance, as the square root function is monotonic.) Each query had 2 relevant documents in the corpus, so this was chosen as the cutoff point for measuring retrieval precision. Thus there were 30 relevant documents for this query set. For each query, documents were ranked by increasing Euclidean distance from the query. Using this measure, 24 of the 30 possible documents were relevant (i.e. from the same relevance class), giving a retrieval precision of 80%. (More sophisticated analyses such as ROC curves, are probably not warranted due to the small corpus size.)

4.1.2 Cosine Distance
The second measure used is a cosine metric, similar to that described in the previous section. This distance measure may be preferable because it is less sensitive to the actual magnitudes of the vectors involved. This measure proved to perform significantly better than the Euclidean distance. Using this measure, 29 of the 30 documents retrieved were relevant, giving a retrieval precision of 96.7% at this cutoff.

4.1.3 Fourier Beat Spectral Coefficients
The final distance measure is based on the Fourier coefficients of the beat spectrum, because they can represent the rough spectral shape with many fewer parameters. A more compact representation is valuable for a number of reasons: for example, fewer elements speeds distance comparisons and also reduces the amount of data that must be stored to represent each file. To this effect, the fast Fourier transform was computed for each beat spectral vector. The log of the magnitude was then determined, and the mean subtracted from each coefficient. Because high “frequencies” in the beat spectra are not rhythmically significant, the transform results were truncated to the 25 lowest coefficients. Additionally the zeroth coefficient was ignored, as the DC component is insignificant for zero-mean data. The cosine distance metric was computed for the 24 zero-mean Fourier coefficients, which served as the final distance metric. Experimentally, this measure performed identically to the cosine metric, yielding 29 of 30 relevant documents or 96.7% precision. Note that this performance was achieved using an order of magnitude fewer parameters.

Though this corpus is admittedly very small, there is no reason that the methods presented here could not be scaled to thousands or even millions of works. Computing the beat spectrum is computationally quite reasonable and can be done several times faster than real time, and even more rapidly if spectral parameters can be derived directly from MP3 compressed data as in [12] and [13]. Additionally, well-known database organization methods can dra-

[Figure 5. Euclidean Distance vs. Tempo. x-axis: Tempo (bpm), 110 to 130; y-axis: squared Euclidean distance, 0 to 1.5; legend entries from 110 bpm to 130 bpm.]
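To make the excerpt's comparison concrete, here is a small sketch of squared Euclidean distance, cosine distance, and precision at a cutoff. The vectors and labels are invented toy data, not the paper's beat spectra, and leaving the query out of its own ranking is an assumption.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared element-by-element differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; ignores the overall magnitude of the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def precision_at_cutoff(query_id, corpus, labels, distance, cutoff=2):
    """Fraction of the `cutoff` nearest documents that share the query's class."""
    others = [doc_id for doc_id in corpus if doc_id != query_id]
    ranked = sorted(others, key=lambda d: distance(corpus[query_id], corpus[d]))
    hits = sum(labels[d] == labels[query_id] for d in ranked[:cutoff])
    return hits / cutoff


# Toy corpus: "a_loud" has the same shape as "a", just scaled up.
corpus = {
    "a": (1.0, 2.0, 3.0),
    "a_loud": (2.0, 4.0, 6.0),
    "b": (3.0, 2.0, 1.0),
    "b_loud": (6.0, 4.0, 2.0),
}
labels = {"a": "slow", "a_loud": "slow", "b": "fast", "b_loud": "fast"}

euclid_p = precision_at_cutoff("a", corpus, labels, squared_euclidean, cutoff=1)
cosine_p = precision_at_cutoff("a", corpus, labels, cosine_distance, cutoff=1)
```

On this toy data the Euclidean ranking picks the wrong class for query "a" because the louder copy is far away in magnitude, while the cosine ranking finds it. That is the "less sensitive to the actual magnitudes" point from the excerpt.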


The worst offender: “Genre Identification”
Countless PhDs on this useless task.

Trying to teach a computer a marketing construct.


Show of hands:

Is Bjork “electronic, pop, jazz”?


At MIT I convinced someone to buy lots of computers


And tried to figure out how to get music into music analysis.


Even something simple, like detecting holiday music, is very hard.


I decided that if I could get a computer to make holiday music, we could claim we understand it.


Music Acquisition (2001-)
This is automatically generated holiday music, based on listening to 1,000 Christmas songs.


It should be a funny joke that you can run statistics on millions of things and “understand it.”


I built Eigenradio in 2003 to show people what computers hear when they hear music.


There’s obviously so much more to music than the audio signal, and that other stuff is probably more important.


My brother makes music with sine waves and nothing else and gets a 9.7 on Pitchfork. This is fascinating!

Were the sine waves that good?


Review Regression (2004)

It turns out if you understand language and audio at the same time you start learning a lot more.


Here we predict ratings on All Music Guide and Pitchfork by listening to the audio and reading about the artist.


Audio alone was terrible. Text alone was better than audio. Both together were the best.


[Figure: four scatter plots. AMG Ratings (2 to 8) vs. Pitchfork Ratings (20 to 100); randomly selected AMG Ratings vs. Pitchfork Ratings; AMG Ratings vs. Audio-derived Ratings; Pitchfork Ratings vs. Audio-derived Ratings. Values printed on the figure: .147 [.080] and .127 [.082].]


I became interested in more ridiculous questions:
“Can we find the saddest song in the world?”


So I started a company in 2005 with my co-founder Tristan, also at the Lab.


Tristan is a DSP “machine listening” expert and I handled the text side.


MAGIC


Why does the Echo Nest exist?


The best music experience is still very manual. I am still reading about music, not using a recommender.


& the act of listening to music is easier than ever


But data is hard. Most designers make very bad decisions because their tools are inefficient.


Collaborative filtering (X who did Y also did Z) is so easy to make; but it’s also so terrible.

The SQL join is destroying music.
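Here is roughly what “X who did Y also did Z” looks like as code: a self-join on listening histories, counted up. This is a toy sketch, not anyone's production recommender; the function names and the artist names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def also_liked(histories):
    """For each artist, count how many listeners it shares with every other artist."""
    co = {}
    for artists in histories:
        for a, b in combinations(sorted(set(artists)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(artist, co, n=1):
    """Return the n artists that most often co-occur with `artist`."""
    return [name for name, _ in co.get(artist, Counter()).most_common(n)]


# Invented histories: everyone has heard the popular act.
histories = [
    ["Popular Act", "Tiny Band"],
    ["Popular Act", "Other Band"],
    ["Popular Act", "Tiny Band", "Other Band"],
    ["Popular Act"],
]

co = also_liked(histories)
print(recommend("Tiny Band", co))
print(recommend("Other Band", co))
```

It is a dozen lines, which is why it is everywhere. In this toy data both small bands point straight at the popular act, because whoever already has the most listeners wins every count.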


In 2005 we modeled the worst case scenario, in which collaborative filtering was the only way for an artist to get noticed.

The popular ones would eat the unknown ones alive.

3 sets of 3 artists each remained.


Set A: Britney Spears, Backstreet Boys, Christina Aguilera

Set B: Alice in Chains, Korn, Faith No More

Set C: Chris Isaak, Bob Dylan, Crowded House


So the Echo Nest gives everyone great data. They can decide on their own how to show it.


The Echo Nest 2005

Somerville, MA USA
2 people
2 computers
Lots of ideas
1m documents
10,000 artists
100,000 songs
0 public facing sites


The Echo Nest 2009

Somerville, MA USA
20 people
200 computers
Lots of products
5bn documents
1,000,000 artists
many millions of songs
0 public facing sites


What We Do


“Know everything about music and listeners.”

“Give (and sell) great data to everyone.”

“Do it automatically with no bias, on everything.”


Code, Customers, Crawling

Machine Learning, NLP, DSP


Artist Data
• Tag Clouds
• Similar Artists
• Analytics
• Familiarity
• Hotttnesss
• Blogs
• News
• Reviews
• Audio
• Video
• Profile Sites
• Misspellings
• Aliases

Song Data
• Similar Songs
• Tempo
• Key
• Mode
• Time Signature
• Beats
• Downbeats
• Segments
• Timbre
• Pitch
• Loudness
• Sections

Listener Data
• demographics - age, gender, location
• psychographics - preferences, lifestyle
• music preference
• listening patterns
• tastemaker profiling - writers, bloggers


We have a lot of data and we have a lot of products.

We sell mostly to social networks, labels, video games, PR firms, and musicians.


Similarity
Acoustic analysis
Artist metrics
Feeds
Remix
Recommendation
Search / Tags
Metadata
Predictive analytics


Two things make us special:

Scale and Platform


Our scale is limitless. We have hundreds of computers.

We always do our computation on everything. We can learn about new music very quickly.


                              All Music Guide   Pandora   The Echo Nest
known artists                 280,000           80,000    1,000,000
years to get there            18                8         1
time to understand one album  1 week            1 day     <1 minute
cost to understand one album  $400              $40       $0.001

Scale


Our platform is huge. We have thousands of “free” developers using our API.

Our customers use the same platform. So do we.


Platform


We sell two main products:

Fanalytics is a predictive analytics toolset for artists.

The Knowledge is a dynamic metadata service (recommendation, feeds, data) for web sites.


Fanalytics lets artists and labels get a view into the world of online music.

We recommend blogs for artists. We show predicted analytics on activity.


Predictive analytics

Artist metrics


We also maintain a popular open source remixing community and code base so people can make awesome free mashups, remixes, web sites using our tech.

Not much of a business but we love it.


Remix


“DonkDJ.com” was made using Remix. It automatically “donks” (ask someone what this means) any song you upload.


Morecowbell.dj adds cowbell to any song.

This Is My Jam was a pre-Muxtape (by one day) mixtape sharing site that only let you use 30s samples and made a total mess of the output.

Like I said, not much of a business.


We also have artists using Remix. Our data is now powering some next generation electronic music.


I’ve always wanted to hear Michael Jackson trying to sing Amerie’s “One Thing” automatically by comparing timbre, pitch and loudness distances.

-B.L.
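A rough sketch of what “comparing timbre, pitch and loudness distances” could mean in code: for each segment of the target song, pick the closest-sounding segment from the source singer. This is not the Remix API. The dictionary fields, the equal weights, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
def segment_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of squared differences in timbre, pitch and loudness."""
    w_timbre, w_pitch, w_loud = weights
    timbre = sum((x - y) ** 2 for x, y in zip(a["timbre"], b["timbre"]))
    pitch = sum((x - y) ** 2 for x, y in zip(a["pitch"], b["pitch"]))
    loud = (a["loudness"] - b["loudness"]) ** 2
    return w_timbre * timbre + w_pitch * pitch + w_loud * loud


def resing(target_segments, source_segments):
    """For each target segment, return the index of the nearest source segment."""
    return [
        min(range(len(source_segments)),
            key=lambda i: segment_distance(target, source_segments[i]))
        for target in target_segments
    ]


# Invented segments; real analysis data would have longer vectors.
source = [
    {"timbre": (0.0, 1.0), "pitch": (1.0, 0.0), "loudness": -10.0},
    {"timbre": (1.0, 0.0), "pitch": (0.0, 1.0), "loudness": -20.0},
]
target = [
    {"timbre": (0.9, 0.1), "pitch": (0.0, 1.0), "loudness": -19.0},
    {"timbre": (0.1, 0.8), "pitch": (1.0, 0.0), "loudness": -11.0},
]

order = resing(target, source)
print(order)  # which source segment to play in place of each target segment
```

Playing the source segments back in that order is the whole trick. Changing the weights changes which singer's quirks survive.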


James Brown... FOREVER.


Remix also works on video


Let’s hear Daft Punk’s “Revolution 909” played by a fight scene from Undefeatable!

-Y.A.


Our analysis data powers a lot of visualizers and video games (rhythm games on your own MP3s)


Acoustic analysis


The Knowledge is a much better music data service.
Customers can subscribe to constantly-updated similarity, metadata, feeds, recommendations, etc.


Our similarity and recommendation data is some of the best, because we use so many sources and we know about all artists even if they are tiny.


Similarity

Feeds


Since our similarity is based on so many features (popularity, audio analysis, text analysis, structured metadata, influences, ...), we provide our customers with the knobs and let them decide what is important for the task.

We do not give a “single answer.” There is no single answer.
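One way to picture the knobs, as a sketch only: each feature gives its own similarity score, and the customer supplies the weights. The function names, feature keys, scores, and artist names below are hypothetical, and a weighted average is just one simple way to blend them.

```python
def blended_similarity(scores, knobs):
    """Weighted average of per-feature similarity scores (each in 0..1)."""
    total = sum(knobs.values())
    if total == 0:
        raise ValueError("at least one knob must be non-zero")
    return sum(knobs[f] * scores.get(f, 0.0) for f in knobs) / total


def rank(candidates, knobs):
    """Order candidate artists from most to least similar under these knobs."""
    return sorted(
        candidates,
        key=lambda name: blended_similarity(candidates[name], knobs),
        reverse=True,
    )


# Invented per-feature similarities to some seed artist.
candidates = {
    "Artist A": {"audio": 0.9, "text": 0.2, "popularity": 0.1},
    "Artist B": {"audio": 0.3, "text": 0.8, "popularity": 0.9},
}

sounds_like = rank(candidates, {"audio": 1.0, "text": 0.0, "popularity": 0.0})
talked_about_like = rank(candidates, {"audio": 0.0, "text": 1.0, "popularity": 1.0})
```

The same two candidates come back in opposite orders depending on the knobs, which is the point of not giving a single answer.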


Similarity


We can build paths between artists on any vector
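A sketch of how a path between two artists might be found, assuming a graph whose edges carry a similarity score for whichever feature you care about. Using Dijkstra's algorithm with cost `1 - similarity` is a choice made for this example, and the graph, the artist names, and `artist_path` are hypothetical.

```python
import heapq


def artist_path(graph, start, goal):
    """Cheapest path where each hop costs 1 - similarity (Dijkstra's algorithm)."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, similarity in graph.get(node, {}).items():
            if neighbor not in seen:
                step = 1.0 - similarity
                heapq.heappush(queue, (cost + step, neighbor, path + [neighbor]))
    return None  # no route between the two artists


# Invented similarities on one feature (say, audio).
graph = {
    "Artist A": {"Artist B": 0.9, "Artist C": 0.1},
    "Artist B": {"Artist A": 0.9, "Artist C": 0.9},
    "Artist C": {"Artist A": 0.1, "Artist B": 0.9},
}

path = artist_path(graph, "Artist A", "Artist C")
print(path)
```

Two close hops beat one distant jump here, so the path goes through the artist in the middle. Swap in a graph built from a different feature and the route changes.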


Similarity

Acoustic analysis

Search / Tags


Our future:


1. Listener analytics


We’ve been running large scale data mining on millions of listeners to help with analytics, for example a gender predictor based on your music taste.


Here are the basis vectors, the strongest correlators of gender:


Male            Female
Pet Shop Boys   Eternal
Fort Minor      Metro Station
Justice         Gackt
Mike Oldfield   Paolo Nutini
U2              London After Midnight


2. More musicians to use our remix tools


(I’ve noticed the better you are with computers, the worse your music is. This may just be me)


[Figure: “Music goodness” (0% to 100%) plotted against “Computers know-how” (nothing, not much, a little, somewhat, pretty good, expert, dork prime).]


3. Search anything APIs


We will soon make all of our acoustic data available for searching and browsing (right now it has to be your content):

“Find me a drum hit in this collection that sounds like the break in ‘Single Ladies’”


Combined with Remix this will allow anyone to compose music that uses all music in the world


>> import random
>> from echonest import search
>> segments = search.query("voice", soundsLike="bjork", pitch="F#")
>> len(segments)
65706
>> random.shuffle(segments)  # shuffles in place and returns None
>> new_song = segments.write("bjork2009.mp3")


To wrap up:

1. Don’t trust computers
2. But trust us, really
3. Sorry I can’t speak very well

