Rattle Blog

Connecting concepts: joining up the BBC

We had a good session with the BBC last week looking at their work on linked data and creating dynamic semantic richness (that’s a mouthful).

The BBC’s remit to link to external sources more often has prompted a lot of thinking and doing in the area of dynamic linking. Manual linking has several problems: archive links date, it’s expensive, it’s static, and you can’t aggregate the links or do anything else interesting with them. Dynamic linking has its own problems: defining entities “well enough”, providing relevant ways into different data around a concept, and making sure that data is, well, ‘right’ for the context. It sounds easy, but it isn’t. BBC News has been trialling Apture as one route to linked data and better linking out, with some success.
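To make the “defining entities” problem a bit more concrete, here’s a minimal Python sketch of the naive end of dynamic linking: spot known labels in a piece of text and point each one at a DBpedia resource URI. The label dictionary, the `link_entities` function and the example sentence are all hypothetical, and it isn’t meant to represent how Apture or Muddy Boots work. Note what it skips: deciding *which* London or which Brown a story actually means, which is exactly where the hard work goes.

```python
import re

# Hypothetical surface-form dictionary: lower-cased label -> DBpedia resource URI.
# A real system would build this from DBpedia labels and redirects.
LABELS = {
    "gordon brown": "http://dbpedia.org/resource/Gordon_Brown",
    "london": "http://dbpedia.org/resource/London",
    "bbc": "http://dbpedia.org/resource/BBC",
}


def link_entities(text, labels=LABELS):
    """Return (surface form, URI) pairs in text order.

    Longer labels are tried first so that a long match is not split
    into shorter, overlapping ones. No disambiguation is attempted.
    """
    found = []
    taken = [False] * len(text)  # characters already claimed by a match
    for label in sorted(labels, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match already recorded
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append((m.start(), m.group(0), labels[label]))
    # Sort by position in the text, then drop the offsets.
    return [(surface, uri) for _, surface, uri in sorted(found)]


if __name__ == "__main__":
    for surface, uri in link_entities("Gordon Brown visited the BBC in London."):
        print(surface, "->", uri)
```

A dictionary lookup like this is cheap and fully dynamic, but it only ever finds entities it already knows the name of, and it happily links the wrong thing when a name is ambiguous.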

The two-day ‘event’ was really insightful for two reasons: firstly, it showed how DBpedia are moving forward in their bid to become the de facto resource for much of the development of the semantic web; secondly, it showed how the BBC is working to provide structured, linked content, both through ‘bottom-up’ projects like programme pages and through ‘top-down’ projects like Apture and their ‘Meta’ project. The end point for the BBC is not just a better user experience and a better public service, but an architecture that is embedded in the web rather than sitting on top of it, which has to be good for the web as a whole.

For our part, Rob presented Muddy Boots, the project we’ve been doing for the BBC. The DBpedia chaps were also there (hello!) and presented some of their plans for developing the service. One of their ideas is to take dumps of some of the LOD (linked open data) sets out there. This seems to run counter to the idea of linked data, but they explained that there are efficiency gains to be had if they take in the LOD data themselves and also provide access to it. It’s what Freebase do, of course, and there was some discussion as to whether Freebase might actually be a better resource, but its commercial nature was felt to be an impediment to building on it: if Freebase were to move to a fee-based model, that could affect the BBC’s ability to use it.

It was nice to meet the DBpedia guys, as we’re heavily plugged into their architecture for presenting disambiguation and semantic richness from named entities (we know this is a person because they have the following attributes in the database, and therefore we can present back relevant objects such as homepage, age, constituency etc.). What I found particularly interesting was the work they intend to do on structuring infoboxes from Wikipedia. Infoboxes are loosely structured templates that help summarise a subject. There are thousands of them on Wikipedia, and at the moment DBpedia can’t really use them because they aren’t consistently structured templates at all:

They are a broad class of templates commonly used in articles to present certain summary or overview information about the subject. In theory, the fields in an infobox should be consistent across every article using it; in practice, however, this is rarely the case, for a number of reasons.
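Here’s a rough sketch of what “structuring” an infobox involves, together with the attribute-based reasoning described earlier (“we know this is a person because they have the following attributes”). It maps the inconsistent field names editors actually use onto one canonical set of properties, then infers a type from which properties are present and picks the ones worth showing. The template mapping, property names and type rules below are hypothetical illustrations, not DBpedia’s actual mappings or ontology.

```python
# Hypothetical mapping for one infobox template: the various field names
# editors actually use -> a single canonical property name.
FIELD_MAP = {
    "infobox officeholder": {
        "born": "birthDate",
        "birth_date": "birthDate",
        "website": "homepage",
        "homepage": "homepage",
        "constituency": "constituency",
        "constituency_mp": "constituency",
    },
}

# Hypothetical type rules, most specific first:
# (type name, properties required to infer it, properties worth presenting).
TYPE_RULES = [
    ("Politician", {"birthDate", "constituency"}, ["homepage", "birthDate", "constituency"]),
    ("Person", {"birthDate"}, ["homepage", "birthDate"]),
]


def normalise(template, fields):
    """Map raw infobox fields to canonical properties; unmapped fields are dropped."""
    mapping = FIELD_MAP.get(template.strip().lower(), {})
    props = {}
    for key, value in fields.items():
        canonical = mapping.get(key.strip().lower())
        if canonical and value.strip():
            props.setdefault(canonical, value.strip())  # first non-empty value wins
    return props


def infer_type(props):
    """Return (type, properties to present) for the first rule whose requirements are met."""
    for type_name, required, shown in TYPE_RULES:
        if required.issubset(props):
            return type_name, {p: props[p] for p in shown if p in props}
    return None, {}


if __name__ == "__main__":
    raw = {"Born": "1 January 1950", "website": "http://example.org",
           "constituency_mp": "Example North", "name": "A. N. Example"}
    print(infer_type(normalise("Infobox officeholder", raw)))
```

Hand-written mappings like `FIELD_MAP` are where the manual effort goes: each template needs its own, which is why it matters which templates get processed first.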

The DBpedia chaps said they were going to manually process 400-500 of the infoboxes for structured data, as this would account for 80% of all infobox templates. That means we can soon start to mine the really interesting summaries in a structured way for *most* concepts. Woo.
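A figure like that is what you’d expect from a long-tailed distribution, where a relatively small number of templates account for most uses. A minimal sketch of the cumulative-coverage calculation, assuming coverage is measured by how often each template is used (the function name and the counts are made up for illustration):

```python
def templates_for_coverage(usage_counts, target_percent=80):
    """Return the fewest templates (most-used first) whose combined usage
    reaches target_percent of all usages, plus the fraction covered."""
    total = sum(usage_counts.values())
    if total == 0:
        return [], 0.0
    ranked = sorted(usage_counts.items(), key=lambda kv: kv[1], reverse=True)
    chosen, covered = [], 0
    for name, count in ranked:
        chosen.append(name)
        covered += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= target_percent * total:
            break
    return chosen, covered / total


if __name__ == "__main__":
    # Hypothetical usage counts per template.
    counts = {"Infobox person": 50, "Infobox settlement": 30,
              "Infobox album": 15, "Infobox rare": 5}
    print(templates_for_coverage(counts))
    # -> (['Infobox person', 'Infobox settlement'], 0.8)
```

With real usage data the same loop tells you how far down the ranked list you have to go before the remaining templates stop being worth the manual effort.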

The BBC also showed some of their work defining concepts within a given body of text and then presenting back topic (aggregation) pages based on those concepts. Neat stuff. We’re hoping to plug Muddy in as one of their ‘meta’ sources soon. It’s an exciting time to be looking at the world of “point-at-things”, as there now seems to be a critical mass of work enabling linked data to forge semantic relationships. Our hack of Muddy into the BBC Music artist pages is a prime example: it draws two disparate databases (music and news) together around a concept (the artist). It’s a neat way to make a seamless connection and gives the user a nice horizontal navigation experience.
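At heart, drawing two databases together around a concept is a join on a shared identifier. As a minimal sketch, assuming each record from each source has already been tagged with the DBpedia URI of the concept it’s about (the record structure, field names, `build_topic_pages` function and example data are all hypothetical), building topic pages is just grouping by that URI:

```python
from collections import defaultdict

# Hypothetical records from two unrelated sources, each already tagged
# with the DBpedia URI of the concept it is about.
MUSIC = [
    {"uri": "http://dbpedia.org/resource/Example_Band", "title": "Example Band: artist page"},
]
NEWS = [
    {"uri": "http://dbpedia.org/resource/Example_Band", "headline": "Example Band announce tour"},
    {"uri": "http://dbpedia.org/resource/London", "headline": "Example headline about London"},
]


def build_topic_pages(**sources):
    """Group records from every named source under the concept URI they share.

    Each topic page is a dict of source name -> list of records, with an
    empty list for sources that have nothing about that concept.
    """
    pages = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            pages[record["uri"]][name].append(record)
    return dict(pages)


if __name__ == "__main__":
    for uri, page in build_topic_pages(music=MUSIC, news=NEWS).items():
        print(uri, {source: len(items) for source, items in page.items()})
```

The grouping itself is trivial; the value comes from both sources agreeing on the same identifier, which is why shared URIs such as DBpedia’s matter so much.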

You can see Rob’s presentation below, with a hint of slideology goodness:
