To meet the challenges that media and journalism face in our new global and digital age, we need to solve the problems of packaging, distribution and relevance. For a large public broadcasting company this means a new need for streamlined workflows, efficient content handling and decentralised ways to aggregate content. We at the Finnish Broadcasting Company (Yle) believe metadata is the key to accomplishing this.
Investing in metadata
We have been running our Drupal site for the Swedish-speaking part of Yle for a bit over two years now, and have written a few blog posts about metadata (in Swedish). They have revolved around best practices, what others are doing, and the annotation workflow.
On our site we have annotated all articles with terms from the Finnish thesaurus and ontology service Finto (previously named Onki). Last fall we also started using Freebase for people, organizations, places, events and media. So far we have used over 10 000 general terms from the KOKO ontology and over 5 000 terms from Freebase. This whole period has been a time of investment, and now it’s time to cash in on it.
Getting the metadata sorted out
There is a project within Yle called Metatiedot kuntoon (link in Finnish), which translates as ’Getting the metadata sorted out’. The objective of the project is to unify the metadata used within Yle, to bridge the silos created by the many systems, organisational units and languages in our company, and to open up our content as data.
We at the Swedish-speaking department are a sort of miniature of the whole organisation. As a smaller and more agile unit, we have had the opportunity to experiment and try new paths. We have been the first to use the above-mentioned semantic annotation, which is compliant with the principles of the semantic web. And we have now continued with the practical job of linking the data.
At the time of writing (June 17th 2014) we have linked together Yle content created in different parts of Yle’s organisation, on different publishing platforms and in different languages. We have also been able to form indirect connections between the written articles and the as yet untagged video and audio content on our on-demand streaming service, Arenan. Through these graph relations we can enrich our knowledge of our own video and audio content.
An API-driven approach
The “traditional” way of building the semantic web and doing linked data has been through RDF and SPARQL. Our work has taken a somewhat different path, mostly due to limited resources and the application of lean development. For example, the decision to start using Freebase rather than, say, DBpedia was mostly dictated by the fact that it had a better API for our purposes. We have also been uninterested in (read: have not had the resources for) building our own schemas and ontologies and charting the world according to Yle. So we have chosen existing ontologies and metadata repositories for the annotation of our content.
Against that background, RDF and SPARQL have seemed too steep a learning curve and too large an initial investment. But we have not wanted to stray from the track laid down by the semantic web effort. So we have taken great care to ensure the quality of the semantics, and have safeguarded machine readability through RDFa, Schema.org, Dublin Core and JSON-LD.
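As a rough illustration of what machine-readable annotation can look like, here is a minimal Python sketch that serialises an annotated article as JSON-LD using Schema.org terms. The `NewsArticle` type and the `headline`, `inLanguage`, `about` and `sameAs` properties are part of Schema.org; the function name, the example.org URLs and the concept URI are assumptions made up for the example and do not describe the markup actually used on our site.

```python
import json


def article_to_jsonld(headline, language, url, concepts):
    """Build a JSON-LD description of an annotated article.

    `concepts` is a list of (label, concept_uri) pairs. The URIs stand in
    for identifiers from an external repository such as Finto or Freebase.
    """
    return {
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "inLanguage": language,
        "url": url,
        # Each annotation points at a shared concept rather than a free-text keyword.
        "about": [
            {"@type": "Thing", "name": label, "sameAs": uri}
            for label, uri in concepts
        ],
    }


# Hypothetical article and concept URI, for illustration only.
doc = article_to_jsonld(
    headline="Flooding in Serbia",
    language="sv",
    url="https://example.org/articles/1",
    concepts=[("flood", "https://example.org/concepts/flood")],
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Because every annotation carries a concept URI instead of a language-specific keyword, articles written in different languages can point at the same concept.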
At the same time there is a large ongoing effort within Yle to put all our content up on APIs (link in Finnish). We have APIs for programmes, articles, images, user statistics and so forth. And now we also have an API for metadata.
We send our articles from Drupal to the articles API. From there their metadata is also sent to a Neo4j graph database, which we can then query for relations through the meta API. At the moment our front end uses four different queries against the meta API:
- Linking content tagged with the same tag in different languages (all the metadata repositories we use are multilingual, supporting at least Finnish, Swedish and English).
- Giving recommendations for content with similar tagging.
- Giving recommendations for content with similar tagging, but in a different language.
- Giving indirect graph-based recommendations of audio and video content from our on-demand service, Arenan, based on article-tag relations.
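The logic behind these four queries can be sketched in a few lines of Python. This is a simplified in-memory stand-in, not our actual Neo4j queries or the meta API: the article identifiers, tags, clip names and function names are all hypothetical, and ranking by the number of shared tags is an assumption chosen to keep the example short.

```python
from collections import Counter

# Hypothetical toy data. Tags are language-independent concept identifiers,
# so articles in different languages can share them.
ARTICLES = {
    "sv-1": {"lang": "sv", "tags": {"flood", "serbia"}, "clips": []},
    "sv-2": {"lang": "sv", "tags": {"flood", "weather"}, "clips": ["clip-a"]},
    "fi-1": {"lang": "fi", "tags": {"flood"}, "clips": ["clip-b"]},
    "sv-3": {"lang": "sv", "tags": {"sports"}, "clips": ["clip-c"]},
}


def by_tag(tag, lang=None):
    """Query 1: articles carrying a tag, optionally limited to one language."""
    return sorted(
        article_id
        for article_id, data in ARTICLES.items()
        if tag in data["tags"] and (lang is None or data["lang"] == lang)
    )


def ranked_neighbours(article_id):
    """Other articles sharing at least one tag, most shared tags first."""
    tags = ARTICLES[article_id]["tags"]
    counts = Counter()
    for other_id, other in ARTICLES.items():
        overlap = len(tags & other["tags"])
        if other_id != article_id and overlap:
            counts[other_id] = overlap
    # Ties are broken alphabetically to keep the output deterministic.
    return sorted(counts, key=lambda other_id: (-counts[other_id], other_id))


def similar(article_id, same_language=True):
    """Queries 2 and 3: similarly tagged content in the same or another language."""
    own_lang = ARTICLES[article_id]["lang"]
    return [
        other_id
        for other_id in ranked_neighbours(article_id)
        if (ARTICLES[other_id]["lang"] == own_lang) == same_language
    ]


def related_clips(article_id):
    """Query 4: clips reached indirectly via article -> tag -> article -> clip."""
    clips = []
    for other_id in ranked_neighbours(article_id):
        for clip in ARTICLES[other_id]["clips"]:
            if clip not in clips:
                clips.append(clip)
    return clips


print(by_tag("flood", lang="fi"))
print(similar("sv-1"))
print(similar("sv-1", same_language=False))
print(related_clips("sv-1"))
```

The last function shows why the clips themselves do not need to be tagged: they inherit their context from the articles that embed them.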
These are now working on our development platforms at a proof-of-concept level. They will gradually be put into production as soon as we get some queries optimised for speed and our AD and front-end development team (of one) gets the rough edges sorted out.
Recommendations are the key to relevance
In the image above you see an article about flooding in Serbia. In the right-hand column there are two clips that have been embedded in other articles with similar metadata, showing related flooding clips. The subject listing below the videos offers more reading on the same subject(s). Furthest down on the right is an article that also relates to flooding, from our Finnish-speaking colleagues.
Future proofing media publishing on the web
As more and more traffic to our website comes through search engines and recommendations directly into specific content, we strive to build our service according to the credo that ’every page is a front page’. This means that wherever you land on our site you should get the full public service spectrum of content and find relevant stories to follow up on.
At the same time the editorial resources are highly limited, and we want our journalists to make content, not edit subject pages. So all this will have to be as highly automated as possible.
Additionally, we are preparing to open up our APIs for third parties to build services upon, for users to fine-tune their own content flows (a first product built on Yle’s APIs is Uutisvahti / The News guard, a personalised news app) and for free aggregation over the internet by humans as well as machines.
In this way we hope to keep up our relevance and presence to our audience through high-quality public service journalism in today’s fragmented media reality.