DiscoRank: optimizing discoverability on SoundCloud

Amélie Anglade

• Developer at SoundCloud

• SoundCloud is the world’s largest social sound platform

• Academic background in Music Information Retrieval (MIR)

• Design, prototype and implement Machine Learning algorithms for music discovery

DISCOVERABILITY?

PAGERANK

• The web is a graph:
  • nodes = web pages
  • edges = hyperlinks

• The (Page)rank of a node depends on the link structure of the graph

WEB AND PAGERANK

RANDOM SURFER

Nodes visited more often:
  • Nodes with many incoming links
  • Nodes whose links come from frequently visited nodes

RANDOM SURFER
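The random-surfer intuition can be checked with a small simulation. The sketch below is illustrative only: the four-node graph, the name `simulate_surfer` and the step count are assumptions, not taken from the talk. The surfer follows a random out-link at each click, and the visit frequencies are counted.

```python
import random
from collections import Counter

# Hypothetical toy graph: node -> nodes it links to (every node has an out-link).
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

def simulate_surfer(links, steps, seed=0):
    """Follow random out-links for `steps` clicks and return visit frequencies."""
    rng = random.Random(seed)
    node = rng.choice(sorted(links))  # arbitrary starting page
    visits = Counter()
    for _ in range(steps):
        node = rng.choice(links[node])  # click one out-link at random
        visits[node] += 1
    return {name: visits[name] / steps for name in links}

freq = simulate_surfer(LINKS, steps=100_000)
for name in sorted(freq, key=freq.get, reverse=True):
    print(name, round(freq[name], 3))
```

In this toy graph, "c" collects links from three nodes and "a" is linked only from the frequently visited "c", so both end up visited far more often than "b". Node "d" has no incoming links and is never reached by following links alone.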

Adjacency matrix A

COMPUTING THE PAGERANK

Transition probability matrix M

Probability distribution of surfer’s position

Adjacency matrix A
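As a rough illustration of how the three objects named on this slide relate, the sketch below row-normalizes an adjacency matrix A into a transition probability matrix M and advances the surfer's probability distribution by one click. The row-vector convention (x' = x M), the example matrix and the function names are assumptions made for the example.

```python
def transition_matrix(adjacency):
    """Row-normalize A so that M[i][j] = A[i][j] / (number of out-links of i)."""
    matrix = []
    for row in adjacency:
        out_degree = sum(row)
        if out_degree == 0:
            # A dead-end node has no out-links to follow; it needs teleporting.
            raise ValueError("dead-end node: handle it with a teleport step")
        matrix.append([value / out_degree for value in row])
    return matrix

def step(distribution, matrix):
    """One click of the surfer: x' = x M (row vector times matrix)."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)]

# Hypothetical 4-node graph: A[i][j] = 1 if node i links to node j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
M = transition_matrix(A)
x0 = [0.25, 0.25, 0.25, 0.25]  # surfer equally likely to start anywhere
x1 = step(x0, M)               # distribution after one click
```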

TELEPORT

With N nodes in the graph, the probability of teleporting to any given node (including the current one) is 1/N

TELEPORT

At regular node: invoke teleport operation with probability α and standard random walk with probability (1 - α)
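The teleport rule can be written down for a single node. This is a minimal sketch, assuming the usual textbook treatment in which a dead-end node (no out-links) always teleports; the function name `teleport_row` and the value α = 0.15 are illustrative choices, not values given in the talk.

```python
def teleport_row(out_links, n_nodes, alpha):
    """Transition probabilities out of one node, teleport included.

    out_links: indices of the nodes this node links to.
    """
    if not out_links:
        # Assumed dead-end rule: always teleport, 1/N to every node.
        return [1.0 / n_nodes] * n_nodes
    # Regular node: teleport with probability alpha, spread as alpha/N ...
    row = [alpha / n_nodes] * n_nodes
    # ... and follow one of the out-links with probability (1 - alpha).
    share = (1.0 - alpha) / len(out_links)
    for target in out_links:
        row[target] += share
    return row

regular = teleport_row([1, 2], n_nodes=4, alpha=0.15)
dead_end = teleport_row([], n_nodes=4, alpha=0.15)
```

Each row still sums to 1, so the result remains a valid transition probability matrix once every node has been processed this way.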

Probability distribution of the surfer at any time is a vector.

That vector converges to a steady state: the PageRank vector.

PAGERANK EQUATION
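One common way to reach the steady state described above is power iteration: start from a uniform vector and keep applying the teleport-adjusted random walk until the vector stops changing. The sketch below is a generic PageRank iteration under assumed parameters (α = 0.15, an L1 tolerance of 1e-10, a hypothetical four-node graph); it is not presented as SoundCloud's implementation.

```python
def pagerank(links, alpha=0.15, tol=1e-10, max_iter=1000):
    """links[i] lists the nodes that node i points to. Returns the rank vector."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Mass sitting on dead-end nodes is spread uniformly (always teleport).
        dangling = sum(rank[i] for i, targets in enumerate(links) if not targets)
        # Every node receives the teleport share from regular and dead-end nodes.
        base = (alpha * (1.0 - dangling) + dangling) / n
        new_rank = [base] * n
        # Regular random-walk step, taken with probability (1 - alpha).
        for i, targets in enumerate(links):
            for target in targets:
                new_rank[target] += (1.0 - alpha) * rank[i] / len(targets)
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank

LINKS = [[1, 2], [2], [0], [2]]  # hypothetical graph, node 3 has no in-links
rank = pagerank(LINKS)
```

In this example node 2, which receives links from three nodes, gets the highest rank, while node 3 only ever receives the teleport share α/N.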

SOUNDCLOUD DISCORANK

DISCORANK

[Figure: SoundCloud entities (e.g. a Playlist) connected by “favorite”, “follow” and “featured in” edges]

• Search across People, Sounds, Sets, Groups

• One unique rank vector that contains all entities

• Weight the links based on the type of event:

  • User favorites Track
  • Track is featured in Playlist

• New big (but sparse) adjacency matrix:

UNIVERSAL SEARCH
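A small sketch of the idea of one graph holding every entity type, with links weighted by event type. The weight values, the id format ("user:1", "track:7") and the edge directions are assumptions made for illustration; the talk does not give them. The adjacency is kept sparse by storing only the pairs that actually occur.

```python
from collections import defaultdict

# Hypothetical weights per event type; the real values are not in the talk.
EVENT_WEIGHTS = {"favorite": 1.0, "follow": 2.0, "featured_in": 0.5}

def build_adjacency(events):
    """Sparse weighted adjacency: source -> {target: accumulated weight}."""
    adjacency = defaultdict(lambda: defaultdict(float))
    for source, event_type, target in events:
        adjacency[source][target] += EVENT_WEIGHTS[event_type]
    return adjacency

# Users, tracks and playlists all live in the same graph.
events = [
    ("user:1", "favorite", "track:7"),
    ("user:1", "follow", "user:2"),
    ("track:7", "featured_in", "playlist:3"),
    ("user:2", "favorite", "track:7"),
]
adjacency = build_adjacency(events)
```

Because every entity is a node in the same graph, a single rank vector computed over this adjacency covers people, sounds and sets at once.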

• How do we identify content that is trending?

• The more recent an event (listen, favorite, etc.), the higher its weight

• Multiply each event (= edge) by a time decay:

• New adjacency matrix:

BACK TO EXPLORE
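The decay function itself is not recoverable from the transcript. As one plausible shape, the sketch below uses an exponential decay with a half-life; the half-life of 7 days, the tuple layout of an event and the function names are all assumptions.

```python
from collections import defaultdict

def decayed_weight(base_weight, age_days, half_life_days=7.0):
    """Assumed exponential decay: the weight halves every `half_life_days`."""
    return base_weight * 0.5 ** (age_days / half_life_days)

def build_decayed_adjacency(events):
    """Sum the decayed weights of all events on the same (source, target) edge."""
    adjacency = defaultdict(float)
    for source, target, base_weight, age_days in events:
        adjacency[(source, target)] += decayed_weight(base_weight, age_days)
    return dict(adjacency)

events = [
    ("user:1", "track:7", 1.0, 0.0),   # favorite made today
    ("user:2", "track:7", 1.0, 7.0),   # same kind of event, one week old
    ("user:3", "track:8", 1.0, 14.0),  # two weeks old
]
adjacency = build_decayed_adjacency(events)
```

With this choice, a track whose events are all recent accumulates more edge weight than one with the same number of older events, which is the behaviour the slide describes for trending content.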

PERFORMANCE OPTIMIZATION

• Millions of entities (= nodes) and events (= edges)

• First DiscoRank: several hours of computation

• Trimmed down to a few minutes using:
  • Sparse matrix
  • Optimized storage of the graph in memory
  • Versioned copies of the DiscoRank

• So technically we could compute the DiscoRank in real time

A VERY LARGE GRAPH

• Re-mapping entity ids

• Memory optimization so the graph fits in memory:
  • All edge details are stored in memory in a byte[]
  • Buffer the byte[] into an opaque byte block pool
  • No objects
  • Sort the buffered byte[] in place

• On disk and when computing the DiscoRank:
  • Delta-encoded, ordered adjacency lists:
    • One “from” node, several “to” nodes
    • Delta encode the “to” node ids
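The id re-mapping and object-free edge storage listed above can be pictured with a short sketch. It is written in Python for consistency with the other examples, and the record layout (two 32-bit ids plus a 32-bit float weight), the function names and the sample ids are assumptions. It also sorts the records before packing rather than sorting the buffer in place, so it only approximates what the slide describes.

```python
import struct

# Assumed fixed-width record: from-id, to-id, weight = 12 bytes per edge.
EDGE = struct.Struct("<IIf")

def remap_ids(edges):
    """Map arbitrary entity ids to dense integers 0..N-1 in first-seen order."""
    index = {}
    for source, target, _weight in edges:
        for entity in (source, target):
            index.setdefault(entity, len(index))
    return index

def pack_edges(edges, index):
    """Store all edges in one flat bytearray instead of one object per edge."""
    records = sorted((index[s], index[t], w) for s, t, w in edges)
    buffer = bytearray(EDGE.size * len(records))
    for position, record in enumerate(records):
        EDGE.pack_into(buffer, position * EDGE.size, *record)
    return buffer

def read_edge(buffer, position):
    """Decode the edge stored at the given record position."""
    return EDGE.unpack_from(buffer, position * EDGE.size)

edges = [("user:9001", "track:42", 1.0), ("track:42", "playlist:7", 0.5)]
index = remap_ids(edges)
buffer = pack_edges(edges, index)
```

Dense ids keep the records small and let the rank vector be a plain array indexed by id, and a flat buffer avoids per-edge object overhead.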

USING SPARSITY
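Delta encoding of an ordered adjacency list, as mentioned above, stores each “to” id as the difference from the previous one. The sketch below shows the encode and decode steps on a made-up list; the function names and the ids are illustrative.

```python
def delta_encode(sorted_ids):
    """Replace each id by its gap to the previous id (the first gap is from 0)."""
    deltas = []
    previous = 0
    for node_id in sorted_ids:
        deltas.append(node_id - previous)
        previous = node_id
    return deltas

def delta_decode(deltas):
    """Rebuild the original ids by accumulating the gaps."""
    ids = []
    current = 0
    for delta in deltas:
        current += delta
        ids.append(current)
    return ids

# One "from" node with several ordered "to" nodes (hypothetical ids).
to_nodes = [1000, 1003, 1010, 1500]
encoded = delta_encode(to_nodes)  # [1000, 3, 7, 490]
assert delta_decode(encoded) == to_nodes
```

The gaps are usually much smaller than the ids themselves, so they pair well with a variable-length integer encoding in which small numbers take fewer bytes.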

• We keep versioned copies of:
  • the DiscoRank vector of results
  • the DiscoRank graph

• We rebuild the entire DiscoRank graph from scratch once a week

• In between:
  • we create additional graph segments with new entities and events
  • and use the results of the previous DiscoRank run as a prior for the DiscoRank computation

• Side effect:
  • Also allows for experimentation

VERSIONED DISCORANK
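Using the previous run as a prior amounts to warm-starting the iteration. The sketch below assumes a generic power iteration (not SoundCloud's code), gives newly added entities a small uniform share of the prior, and compares a cold start with a warm start on a hypothetical graph that gained one node.

```python
def discorank(links, prior=None, alpha=0.15, tol=1e-10, max_iter=1000):
    """Power iteration starting from `prior` if given, else from uniform.

    Returns (rank vector, number of iterations performed).
    """
    n = len(links)
    rank = list(prior) if prior is not None else [1.0 / n] * n
    for iteration in range(1, max_iter + 1):
        dangling = sum(rank[i] for i, targets in enumerate(links) if not targets)
        base = (alpha * (1.0 - dangling) + dangling) / n
        new_rank = [base] * n
        for i, targets in enumerate(links):
            for target in targets:
                new_rank[target] += (1.0 - alpha) * rank[i] / len(targets)
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank, iteration

def extend_prior(previous_rank, n_nodes):
    """Give entities unseen in the previous run a uniform share, then renormalize."""
    padded = list(previous_rank) + [1.0 / n_nodes] * (n_nodes - len(previous_rank))
    total = sum(padded)
    return [value / total for value in padded]

old_links = [[1, 2], [2], [0], [2]]
new_links = [[1, 2], [2], [0], [2], [2]]  # one new entity and one new event

old_rank, _ = discorank(old_links)
cold, cold_iterations = discorank(new_links)
warm, warm_iterations = discorank(new_links, prior=extend_prior(old_rank, 5))
```

Both starts converge to the same vector. When the graph has changed only slightly, the warm start typically begins closer to it and needs fewer iterations, which can be checked by comparing `warm_iterations` with `cold_iterations`.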

• MySQL batch jobs

• DiscoRank results stored in HDFS

• At the end of every DiscoRank run we reload it into ElasticSearch:
  • For each item we combine its Lucene score with its DiscoRank

INTEGRATION IN OUR INFRASTRUCTURE
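The talk does not say how the Lucene score and the DiscoRank are combined. Purely as an illustration, the sketch below boosts each hit's text score by its DiscoRank relative to the best-ranked hit in the result set; the formula, the `rank_weight` parameter and all names are hypothetical, and the code runs outside ElasticSearch on plain Python data.

```python
def rerank(hits, discorank, rank_weight=1.0):
    """Hypothetical blend of text relevance and DiscoRank.

    hits: list of (doc_id, text_score) pairs, e.g. from a search engine.
    discorank: dict mapping doc_id to its rank value.
    """
    if not hits:
        return []
    # Normalize by the best rank among the hits (fall back to 1.0 if all are 0).
    top = max(discorank.get(doc_id, 0.0) for doc_id, _ in hits) or 1.0
    scored = [
        (doc_id, text_score * (1.0 + rank_weight * discorank.get(doc_id, 0.0) / top))
        for doc_id, text_score in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

hits = [("track:1", 2.0), ("track:2", 1.8), ("track:3", 0.5)]
ranks = {"track:1": 0.001, "track:2": 0.004, "track:3": 0.002}
results = rerank(hits, ranks)
```

Here "track:2" overtakes "track:1" despite a slightly lower text score, because its rank is four times higher.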

Amélie Anglade

Sound/Music Information Retrieval Engineer

about.me/utstikkar

@utstikkar

We’re hiring!

www.soundcloud.com
