BBC Music : going native on the web

BBC Music : Going native on the web Tom Scott and Matthew Shorter BBC W1


The BBC has recently launched an ambitious beta project to publish artist pages for every artist in the world. The pages will eventually aggregate all data and content for each artist from across the BBC, and will also include external data sources, starting with Wikipedia and the open-source music database MusicBrainz. Matthew Shorter and Tom Scott presented some of the background to this initiative, including how we’ve harnessed this corner of the web to publish and keep up to date 350,000 web pages, and discussed the editorial implications of this radically new way of working for the BBC.

Page 1: BBC Music : going native on the web

BBC Music : Going native on the web

Tom Scott and Matthew Shorter


Page 2: BBC Music : going native on the web

a bit of background


not connected with the rest of the web

google hates us

… is incoherent because it’s unconnected.

Much of what we produce and broadcast is difficult to find via Google (other search indexes are also available).

Page 3: BBC Music : going native on the web

our strategy

build music credibility online for the BBC

help people find new programmes they will love

help people find new music

become the de facto place on the web for artist information

be part of the web

Page 4: BBC Music : going native on the web

being part of the web

persistent urls for every resource

semantically linked and accessible to man and machine

lots of links to others

permissive license

in other words : “linked open data”

TimBL described four simple rules to do the web right:

1. Use URIs to identify things on the web as resources

2. Use HTTP so people can dereference them

3. Provide information about the resource when it is dereferenced

4. Include onward links

What this gives you is a highly interlinked web of resources - where each resource is linked to other resources that are contextually relevant.
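As a rough illustration of the four rules and the interlinked web they produce, here is a minimal Python sketch. It is not how the BBC pages are built: the `Resource` class, the in-memory `WEB` dictionary standing in for HTTP, and the `example.org` URIs are all assumptions invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str                                        # rule 1: a URI identifies the thing
    description: str                                # rule 3: information returned on dereference
    links: list[str] = field(default_factory=list)  # rule 4: onward links


# Hypothetical URIs, invented for illustration only.
WEB = {
    r.uri: r
    for r in [
        Resource("http://example.org/artists/a1", "An artist",
                 ["http://example.org/programmes/p1"]),
        Resource("http://example.org/programmes/p1", "A programme featuring the artist",
                 ["http://example.org/artists/a1", "http://example.org/events/e1"]),
        Resource("http://example.org/events/e1", "An event"),
    ]
}


def dereference(uri: str) -> Resource:
    """Stand-in for an HTTP GET (rule 2): look the URI up and return its description."""
    return WEB[uri]


def reachable(start: str) -> set[str]:
    """Follow onward links from one resource to everything connected to it."""
    seen: set[str] = set()
    queue = [start]
    while queue:
        uri = queue.pop()
        if uri in seen or uri not in WEB:
            continue
        seen.add(uri)
        queue.extend(dereference(uri).links)
    return seen
```

Starting from the artist URI, `reachable` finds the programme and, through it, the event: each resource is reached only by following links that another resource published.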

The idea of linked open data adds a new requirement - that of permissive licensing - so others can reuse data in new contexts.

But why?

Because it benefits us now and in the long term. Publishing a web page or any other piece of content online is useful, but if it is part of a network then its value is greatly increased. This is the network effect.

One consequence of the network effect is that the addition of a node by one individual indirectly benefits others who are part of the network — for example by purchasing a telephone a person makes other telephones more useful.

By building the web in this fashion our new artist pages, although useful in their own right, become much more useful when they are joined to programmes, directly linking to those programmes that feature that artist. The same goes for events. And of course the network effect goes both ways; it goes all ways. Linking artists to programmes also makes the programme pages more valuable, because there is now more context, more discovery and serendipity.

And that’s just within the BBC. By joining our data with the rest of the web the network effect is magnified yet further. That benefits the BBC, but it also benefits the web at large. The BBC has a role that transcends its business needs - we can help create public value around our content for others, for individuals and businesses.

Page 5: BBC Music : going native on the web

linked open data graph

This then is the LOD graph - a graph representing all the data sources that are semantically linked and accessible to man and machine AND published under a permissive license.

And this month the BBC has added two nodes to this graph - BBC programmes and BBC playcount data (and artist pages).

Page 6: BBC Music : going native on the web


different scale to anything the BBC has done before

need to automate as much as possible

need to link to existing BBC content

need to let others use this data

As we’ll see, there are lots and lots of pages - hundreds of thousands of them - and they all need to be contextually linked up.

We needed to automate as much as possible - integrating with broadcast systems and data elsewhere on the web.

We also released our data via APIs under the backstage, non-commercial license.

Page 7: BBC Music : going native on the web


using the web as a cms with reactive moderation

musicbrainz to provide core metadata + web scale identifiers

work with others to encourage the adoption of musicbrainz

wikipedia to provide basic biographical information

integrating with broadcast systems and PIPs

integration with

So somewhat ironically - in a world where we are trying to reduce the number of Content Management Systems we have in effect ended up using a new one - the web itself.

MusicBrainz provides metadata about artists, releases and labels and, possibly more interestingly, web scale identifiers. These IDs, the codes at the end of the URLs, are unique to each artist and are being used by us, MusicBrainz obviously and

The more people and the more sites that use these identifiers, the better for everyone, because it makes it easier to link everything up. So we are working to encourage their adoption elsewhere in the industry - for example by NME and the commercial radio networks.
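To show why a shared identifier makes linking easy, here is a small Python sketch that pulls the identifier off the end of a URL. MusicBrainz identifiers are UUIDs; the function name `extract_mbid`, the `example.org` and `example.net` URLs and the identifier value itself are made up for illustration and do not describe any real site's URL scheme.

```python
import re
import uuid
from typing import Optional

# A UUID at the very end of the URL path, with an optional trailing slash.
_MBID_AT_END = re.compile(
    r"([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})/?$"
)


def extract_mbid(url: str) -> Optional[str]:
    """Return the identifier at the end of the URL in canonical lowercase form, or None."""
    # Drop any query string or fragment before looking at the end of the path.
    path = url.split("?", 1)[0].split("#", 1)[0]
    match = _MBID_AT_END.search(path)
    if match is None:
        return None
    return str(uuid.UUID(match.group(1)))


# Two hypothetical sites with different URL layouts but the same identifier.
site_a = "http://example.org/music/artists/12345678-1234-1234-1234-123456789ABC"
site_b = "http://example.net/artist/12345678-1234-1234-1234-123456789abc/?view=full"
same_artist = extract_mbid(site_a) == extract_mbid(site_b)
```

Because both sites end their URLs with the same identifier, a plain equality check is enough to know they describe the same artist; no name matching is needed.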

Because MusicBrainz includes URLs for Wikipedia, we can go and fetch the introductory biographical text for each artist. We then monitor Wikipedia (via the IRC channel) for updates to those pages.

We are therefore able to get near real-time updates from Wikipedia, and updates from MusicBrainz within an hour.
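The monitoring step might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it assumes each change notification carries the page title in double square brackets, possibly wrapped in IRC colour codes, and that we hold a mapping from Wikipedia page title to our own artist key. The function names, the sample line and the `artist-1` key are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Set

# IRC formatting: colour codes (\x03 plus optional digits) and bold/reset/etc. markers.
_CONTROL_CODES = re.compile(r"\x03\d{0,2}(?:,\d{1,2})?|[\x02\x0f\x16\x1d\x1f]")
# Assumed notification shape: the changed page title appears as [[Title]].
_TITLE = re.compile(r"\[\[(.+?)\]\]")


def changed_title(line: str) -> Optional[str]:
    """Return the page title named in a change notification, if there is one."""
    plain = _CONTROL_CODES.sub("", line)
    match = _TITLE.search(plain)
    return match.group(1) if match else None


def artists_to_refresh(lines: Iterable[str], title_to_artist: Dict[str, str]) -> Set[str]:
    """Collect the artists whose Wikipedia page changed, so their biography can be refetched."""
    stale: Set[str] = set()
    for line in lines:
        title = changed_title(line)
        if title in title_to_artist:
            stale.add(title_to_artist[title])
    return stale
```

Only pages we already link to trigger a refetch, so the vast majority of edits on the channel are ignored cheaply.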

Internally… integrating with our broadcast systems and pushing this data into the programmes space

And finally, news stories tend to include links to the official artist homepage when they cover a story about an artist. We can look for these links and match them to URLs in MusicBrainz (which also include the ‘official site’ URLs). This gives us a neat, automated mechanism to cross-reference BBC news stories from artist pages.
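A minimal Python sketch of this matching idea, assuming we already hold the known URLs for each artist: normalise every URL, index the known ones, then look up each link found in a story. The normalisation rule (ignore scheme, a leading "www." and a trailing slash), the function names and the sample data are assumptions for illustration, not a description of the production system.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # The path is left case-sensitive to avoid false matches.
    return host + parts.path.rstrip("/")


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a piece of HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def artists_mentioned(story_html: str, artist_urls: Dict[str, List[str]]) -> Set[str]:
    """Return the artists whose known URLs are linked from the story."""
    index = {normalise(url): artist
             for artist, urls in artist_urls.items() for url in urls}
    collector = _LinkCollector()
    collector.feed(story_html)
    return {index[key] for key in map(normalise, collector.hrefs) if key in index}
```

Matching on links rather than on artist names sidesteps ambiguous or misspelled names: a story counts as being about an artist only if it links to a URL already recorded for that artist.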

Page 8: BBC Music : going native on the web


380,000 persistent artist pages

integration with /programmes

creative commons licensed album reviews

web, mobile and machine views

Page 9: BBC Music : going native on the web

coming soon

A sneak peek at the artist pages in the not too distant future...

Page 10: BBC Music : going native on the web

coming soon

News stories and blogs contain links to the official artist pages, MySpace pages etc. - in other words, links that we know are about the artist because they are also in MusicBrainz.

So we can monitor BBC blogs and news stories for these URLs and if we find one then we know that the story is about that artist and then add the link to the artist page.

Page 11: BBC Music : going native on the web

Album reviews - brought in line with the new visual design and delivered via the new tech stack.

Creative Commons license.

Page 12: BBC Music : going native on the web

open data

Releasing our reviews under a Creative Commons license means others can use them (as long as they abide by the terms of the license), as Channel 4 are doing with Abbey Road.

I think this is brilliant - but it did raise a few eyebrows in certain quarters!

Page 13: BBC Music : going native on the web


link to programmes and events

releases, tracks and works

off-schedule content

recommendation and personalisation

a new music site

third party content…

So what might the future look like?

There are a bunch of things that we want to do with BBC content and new functionality, plus working with MusicBrainz to extend the existing schema.

But we are also interested in how we might work with third party services. For example...

Page 14: BBC Music : going native on the web

future maybe?

What about photos from Flickr, or elsewhere?


Page 15: BBC Music : going native on the web

future maybe?

Data from Last.FM - we could include similar artists or Top 10 tracks (attention data).

Page 16: BBC Music : going native on the web

future maybe?

Or news from blogs - via Technorati

Page 17: BBC Music : going native on the web


biographies - wikipedia

photos - flickr, smugmug, photobucket etc.

music metadata [including links] - musicbrainz

blogs - technorati, google

attention data -

audio and video - ???

But there are a number of sources for this data - which should we use? Which content areas should we consider? What criteria could we use to make a selection?

Page 18: BBC Music : going native on the web

possible selection criteria

technical feasibility

unique content

best of breed

license to use & downstream rights for content

terms of use for api

existing moderation/ filters


Selection criteria might include... What else? Are these right?