DUI-heatmap Finland 2012

A month ago I created a heatmap visualization of driving under the influence records in Finland in 2012. The story and the visualization itself were fairly popular on svenska.yle.fi during the week of the release, ranking as the most read news article (see 'har-aker-rattfylleristerna-fast-se-karta' in the inrikes category). It also generated a fair amount of feedback through the comments section about the usability of heatmap visualizations in general.

Heatmap of DUI incidents in the Helsinki area

Heatmaps are a bit tricky to use as visualizations. The XKCD comic has tackled the essence of it: if a map correlates heavily with the population distribution, it may only tell the viewer what they can already expect, namely that things happen where there are a lot of people. When the visualization is dynamic you also get more detail on demand: zooming in shows more detail and zooming out shows less, which can mean that at a certain distance there is just a huge red blob on top of the map.

The data for the map was acquired by a journalist via an information request to the police. We received it as an Excel file with 20352 lines of raw data on DUI incidents in Finland. The data included an identification number for the police records, the municipal area and a mysterious area code, the street address of the incident, the weekday of the incident, and the date and time of the incident. As usual the data was somewhat 'dirty' and required a thorough clean-up to fit our needs.

Raw data in Excel

Data itself may reveal patterns and information about how the measurement was done and how the data was collected. In our case we noticed that the naming convention for incident addresses is fairly imaginative. There is an address field that police officers fill in, among other things, when they report DUI incidents into the system. This field should contain the address of the incident so it can be pinpointed if necessary. However, there doesn't seem to be a unified practice for reporting the addresses.

The address field seems to be filled in in many different ways, which makes exact pinpointing of the incident later on a fairly difficult task. For example, an incident that happened at the crossroad of two roads might be marked as road1 X road2, or road1, road2 crossroad, or road2xroad1, or road1 at crossroad road2, etc. Or the address field has been used to give more information about the place of the incident instead of just an address, e.g. Address xyz, from the yard.
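Just to illustrate the kind of heuristic clean-up this variation forces, here is a minimal sketch of how one might try to catch the crossroad variants described above (the patterns, function name and canonical form are my own assumptions for illustration, not what we actually used):

```python
import re

# Heuristic patterns for crossroad-style addresses such as
# "Road1 X Road2", "Road1, Road2 crossroad" or "Road1 at crossroad Road2".
# The spaceless "road2xroad1" variant would need even fuzzier handling.
CROSSROAD_PATTERNS = [
    re.compile(r"^(?P<a>.+?)\s+[xX]\s+(?P<b>.+)$"),
    re.compile(r"^(?P<a>.+?),\s*(?P<b>.+?)\s+crossroad$", re.IGNORECASE),
    re.compile(r"^(?P<a>.+?)\s+at\s+crossroad\s+(?P<b>.+)$", re.IGNORECASE),
]

def normalize_crossroad(address):
    """Return a canonical 'road1 & road2' string, or None if no pattern matches."""
    for pattern in CROSSROAD_PATTERNS:
        match = pattern.match(address.strip())
        if match:
            # Sort the two road names so "A X B" and "B X A" map to the same key.
            roads = sorted(road.strip().lower() for road in (match.group("a"), match.group("b")))
            return " & ".join(roads)
    return None

# normalize_crossroad("Hämeenkatu X Itsenäisyydenkatu") -> "hämeenkatu & itsenäisyydenkatu"
```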

My personal favourite entry in the address field was 'Kylän Kohdalla Jäällä' = 'at the region of the village, on the ice (of the lake)', which couldn't be much vaguer in terms of locating the exact spot.

Sure, all these addresses could probably be pinpointed by someone with knowledge of the local surroundings, but for outsiders the location stays a mystery. For a future version of such a system I'd highly recommend adding a field for longitude and latitude coordinates. But enough ranting and back to the topic.

The dataset is fairly large (20k+ addresses), and if one wants an overview of geographically distributed data, one approach is to plot the incidents on a map. But which tool to use?

I personally prefer to use existing methods and tools to visualize and gain insight into data, and after googling for a while I bumped into a heatmap JavaScript library called Heatmap.js. It was chosen as the technology for implementing the test case, and a city was picked at random: Tampere.

The next task was to figure out how to translate the data from the Excel format into a format that Heatmap.js accepts. Heatmap.js takes a list of latitude and longitude coordinates together with a weight: the number of incidents at those coordinates. The weight can be calculated in Excel by counting the occurrences of each address, but the addresses themselves need to be converted into coordinates. This conversion of an address into longitude and latitude is usually referred to as geocoding.
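As a rough sketch of how that weighting step could be scripted (the file names, the 'address' column and the exact output fields are my assumptions; Heatmap.js map plugins generally take points with coordinates and a count-like weight, but the field names depend on the version and plugin in use):

```python
import csv
import json
from collections import Counter

# Count how many DUI incidents were reported at each address
# (assumes the Excel sheet has been exported as CSV with an 'address' column).
with open("dui_incidents.csv", newline="", encoding="utf-8") as f:
    counts = Counter(row["address"].strip().lower() for row in csv.DictReader(f))

# 'geocoded.json' maps each unique address to [lat, lon]; it is produced by
# the geocoding step described further below.
with open("geocoded.json", encoding="utf-8") as f:
    geocoded = json.load(f)

points = []
for address, coords in geocoded.items():
    if coords and address in counts:  # skip addresses that failed to geocode
        lat, lon = coords
        points.append({"lat": lat, "lng": lon, "count": counts[address]})

with open("heatmap_data.json", "w", encoding="utf-8") as f:
    json.dump({"max": max((p["count"] for p in points), default=1), "data": points}, f)
```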

Geocoding multiple addresses through the Yahoo API

There is a fair number of web-based tools for quickly geocoding a single address into GPS coordinates. I used http://www.gpsvisualizer.com/geocoder/. It's probably not the best, but it gets the job done. Its downside is geocoding multiple addresses: gpsvisualizer.com offers the option of using the Yahoo geocoder for multiple addresses, but sadly it's fairly inaccurate in Finland. For example, I tried to geocode street addresses from Tampere and ended up with a wad of nearly identical coordinates.

But it’s possible to geocode a single address at a time with Google geocoder at gpsvisualizer.com, which gives much more accurate readings, but as you would guess it is also a lot slower method. As I had a deadline breathing down my neck and 20k+ addresses to geocode I gave it a shot. In my case roughly a week of typing and copy+pasting, so clearly a slow method.

Geocoding a single address through the Google API

A much better way to do this is to use the Google Geocoding API and build your own batch geocoder. This is the method I would use now if I had to do another address-based visualization. But at the time the deadline was looming and I was busy getting the visualization forward, so I accepted the fact that I'd be spending a considerable amount of time smashing the same button combination over and over.

Geocoding multiple addresses through the Google API
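For reference, here is a minimal sketch of what such a DIY batch geocoder could look like (the file names, the delay and the region hint are my assumptions; it uses the public Google Geocoding API JSON endpoint, which requires an API key and is subject to usage limits and terms of service):

```python
import json
import time
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # assumption: a Google Geocoding API key
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode(address):
    """Return [lat, lon] for an address, or None if Google finds no match."""
    params = urllib.parse.urlencode(
        {"address": address, "region": "fi", "key": API_KEY}
    )
    with urllib.request.urlopen(f"{GEOCODE_URL}?{params}") as response:
        result = json.load(response)
    if result.get("status") == "OK":
        location = result["results"][0]["geometry"]["location"]
        return [location["lat"], location["lng"]]
    return None

# One unique address per line; this could equally be the keys of the
# counts computed in the earlier sketch.
geocoded = {}
with open("unique_addresses.txt", encoding="utf-8") as f:
    for line in f:
        address = line.strip()
        if address:
            geocoded[address] = geocode(address)
            time.sleep(0.2)  # crude rate limiting to stay under request quotas

with open("geocoded.json", "w", encoding="utf-8") as f:
    json.dump(geocoded, f, ensure_ascii=False)
```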

Once the data was in an appropriate format for the visualization tool, it was a fairly straightforward job to fine-tune it and release it.

What I learned from the project, or at least had reconfirmed, is that 80 percent of the time goes into cleaning, transforming and fiddling with the data to get it into a format that can be presented, and the remaining 20 percent into fine-tuning and adding extra functionality. An interesting dataset and a fun project; when the data is interesting, it seems to translate fairly well into 'news'.

the last round

Hej

After two weeks of intensive building and testing we are now 5 days from the premiere. On 12.3 at 14:30 (local time) we will air our first live webcast from our studio.

What has happened:

In week 8 (18-22.2) we had our pilot week, where we tested both the technical side and the programme substance.

On the technical side I was forced to use an old Sony Anycast (SD-SDI) as a mixer; my PTZ (Panasonic HE120) arrived the afternoon before the first pilot day, and my AG-AF101 arrived just before the last pilot day.

My conclusions from the pilot week were:

– I can't get Screenmonkey to work as an insert playout system over SDI (DeckLink), even though it works OK with my laptop over HDMI.

– For intercom it is easier to take an old Clear-Com and make it work together with my BMD Studio Converter than to use Datavideo's intercom, even if I lose the tally.

– CasparCG is working very well as a plain video playout system. When I finally got my CasparCG system up and running, I was amazed at how easy it was to use. I still haven't had the time to figure out how to configure both channels on my Duo card to play out two different signals simultaneously.

– And my LED lights are still on their way from China to Finland and will arrive "too late". This is one of those times I'm glad to be working for a big company, where it isn't a problem to borrow 6 Arris and 6 miniflows for a couple of weeks.

 

I will post some pictures as soon as I have the possibility. I will also try to dig a bit deeper into CasparCG.

 

best Markus

The making of a Yle Drupal Distro (YDD)

From the start of the project (Swedish Yle's New Drupal, SYND) we had a goal to build it as a distro. What is better than someone else being able to use the same code base? 🙂 With YDD we reached the internal distro milestone, an important step if we some day also want to be able to make it a public distro.

During January we spent two weeks making our own code more abstract (for some reason there is always something that gets hard coded), making new features that were wanted for the PoC (Proof of concept) of FYND (Finnish Yle’s New Drupal), setting up a new dev environment and splitting up the install profiles to support YDD, SYND and FYND.

The structure we decided to use is as follows: YDD contains the basic functionality that all sites should have, while SYND and FYND each contain their own modifications. For example, recipes, Maktbasen and some Svenska Yle specific styling are found only in SYND. The themes in fynd_themes and synd_themes are subthemes of the theme in yle_themes.

The Git structure and install profile logic:

FYND
– fynd_modules
– fynd_features
– fynd_themes
– fynd_profile

YDD
– yle_modules
– yle_features
– yle_themes

SYND
– synd_modules
– synd_features
– synd_themes
– synd_profile

With this structure we are able to use a common core and, if needed, add more functionality, translations or styling per site. Issues we ran into include views that were created while the system language was Swedish: switching the system to English before exporting does not help, as the exported view will still be in Swedish.

Another issue was that a view limiting content based on a taxonomy was exported with the VID instead of the machine name of the taxonomy. We solved it by changing the view so that the machine name was used in the filter.

We also had some fields and taxonomies in the wrong place: they had been packaged into the wrong feature, so they were not available in FYND when features needed only by SYND were not enabled.

Moving theming into the features and modules was also something we now needed to do, since not all the views that had styling were going to be used in both SYND and FYND.

The RSS feeds generated by Views are one open question. It would be good to share them between the install profiles, but settings like the site name and headlines are hard-coded in the view.

Being more thorough about what is placed where, and making sure that the language was always English, would have made the process even smoother.

One thing we should have done in an earlier phase is change the name of the live production install profile. Trying to do it at the same time as we moved some of the modules from yle_modules to synd_modules caused problems when migrating the production database.

The process itself turned out to be a good way to also do a review of the code. Even though we have had two external reviews (thanks Bala, @dasphere), it sometimes takes deep knowledge of a site to realize that something has unnecessary overhead. Having to split the site forces you to look at parts of the code that otherwise do not get looked at that often. We ended up merging and/or removing four features/modules.

Working with Jari Lana, who does Drupal for YleX, Oppiminen & co, was also a good way to get a fourth review from someone who could also make improvements directly in the code.

Even if FYND is never launched, the work will not go to waste. SYND has reached a new level of abstraction and gained new features and improvements. If the need arises, a new site can be up and running in no time.

Currently svenska.yle.fi is running on the new YDD + SYND setup.

Who is Jarno?

Greetings everyone!

My name is Jarno Marttila and I am the new 'Teemo', or in case you don't know Teemo yet: I'm the new data journalist for Svenska Yle. I joined the merry band of YLE just a few weeks ago, in mid-January.

Well, who am I? I guess I'm a lot of things, or maybe one could even say, in Finnish, a 'jokapaikan höylä' (a jack of all trades) when it comes to data and information analysis and visualization. I'm a 28-year-old Diploma Engineer with a major in Hypermedia, though I have studied a little bit of this and a little bit of that along the way.

For the past three years I have worked as a researcher at the Intelligent Information Systems Laboratory (previously known as the Hypermedia Laboratory) at Tampere University of Technology, with tasks involving all kinds of cool things, including but not limited to social network analysis, information visualization and web development. My tools of choice for visualizing and analyzing graphs and networks have been Gephi, Gource and JavaScript libraries such as d3.js, JIT and Highcharts. In web development I've mainly dealt with Drupal.

In my Master's thesis I studied data-driven social network analysis in the context of the Children's Parliament of Finland. Lately I've been into information visualization techniques and methods for creating insight into complex datasets. At TUT I've done many projects related to gaining and communicating insight into different kinds of data, whether it was studying the impact of a government official in social media or mapping service potential for customers in heavy industry.

Cliques, networks, outliers, and the factors and facts that explain why data is what it is, what connects to what, and why things are how they are excite me. Hence the jump into telling stories with information visualization and data analysis in the context of news, or in grander terms data journalism, was a natural leap of faith for me.

At YLE I wish to create interesting and important data-journalistic stories for people to consume, as there's almost nothing more intriguing than finding stories in data and implementing them so that they communicate to readers.

You can also find me on:

LinkedIn
Twitter

The Finnish Twitter elite, and then some

Twittercensus' charting of Finnish-speaking Twitter users reveals several interesting tendencies that have previously been much discussed but less scrutinized, especially in numbers. Judging from the main cluster, we have a small and homogeneous crowd of active users, active both online and in society, it seems.

This charting, made by Hampus Brynolf (@HampusBrynolf) of @Intellecta, is based solely on language, so this is essentially a chart of Finnish-speaking Twitter. You could be located in Fiji; as long as you're writing in Finnish, you're in. That, on the other hand, means that Finns tweeting mainly in English (for example Alexander Stubb, @AlexStubb) or in Swedish (for example Peppe Öhman, @peppepeppepeppe) are excluded from this survey. A more specific description of the methodology can be read here.

The small Finnish Twitter

First of all, it really is a small crowd of active users we encounter here. Twittercensus lands at a figure of 64K Finnish-speaking Twitter accounts, of which only 26K are active (1 tweet/30 days; these are included in the graph) and an even smaller crowd of 5000 accounts are judged to be very active (1 tweet/day). According to a previous estimate there would be as many as 300K Finnish Twitter accounts, but that of course includes tweeps using all languages as well as passive and/or lurking accounts. Meanwhile, Toni Nummela's (@toninummela) ongoing Twitter survey had by 18.2.2013 found over 25K active Finnish Twitter accounts.

The homogenous Finnish Twitter

The graph is also quite homogeneous. There is only one clearly distinct cluster, consisting of tweeps with a shared interest in manga/anime (please correct me if this description is inaccurate). Besides this, only six other clusters are to be found, as seen in the picture below.

Twittercensus Finland cluster map

The six main clusters blend into each other to a very high degree. When you look closer at them, there are some noteworthy tendencies to be found. First of all, there is media everywhere, not separated into a cluster of its own. Most news media are in the light blue cluster, as are the politicians. This could potentially be an interesting basis for a detailed analysis of the power structures between politicians and the news media.

But there is also media to be found in the green business cluster (e.g. @KauppalehtiFI), in the entertainment cluster (e.g. @Maikkari) and in the sports cluster (e.g. @YleUrheilu). How these large clusters integrate with each other is well illustrated by one of Finland's largest Twitter profiles, @TuomasEnbuske. As a journalist he sits surrounded by the mainly light blue journalists/politicians cluster, but he himself is counted into the purple entertainment cluster, most likely largely through his followers, who are mainly entertainment-oriented. Also note to how large an extent the different clusters blend into each other when you zoom in and look closer at the cluster structure of this surprisingly homogeneous graph.

Tuomas Enbuske and the graph of many colours

The Nordic perspective

Hampus Brynolf has made similar surveys of the Swedish-, Danish- and Norwegian-language Twitter users. Putting these figures side by side gives some rather interesting insights.

Twittercensus Nordic comparison chart

Finland and Denmark are both relatively small Twitter nations with quite similar figures and degrees of engagement. Sweden and Norway are both significantly larger Twitter nations. But note that the population of Norway is actually slightly smaller than the populations of Finland and Denmark, which makes Norway the Nordic Twitter giant per capita. That can also be seen in the relatively larger size of the Norwegians' social spheres. It is also interesting to note that the Swedes and the Norwegians are clearly more active tweeters than the Finns or the Danes. So it seems an active community feeds itself and promotes growth.

Comparing the Swedish and the Finnish graphs also strengthens the notion that Finnish Twitter is very homogeneous. In this mapping of the Swedish graph no fewer than 25 different clusters are identified. (Note the light blue Finland-Swedish cluster on the left.) One significant difference is that the regional clusters clearly identifiable in Sweden seem to be completely missing from the Finnish graph.

Twitter elite?

So what should we make of it all? Is Finnish Twitter basically a small, tightly interconnected elite setting the agenda? Well, I do see several signs implying precisely that in the graph. But that is of course not the whole truth. I see it as a mainly positive thing that media can be found within all groups of interest. And I read the lack of regional clusters mainly as a democratic tendency, where everyone has the opportunity to have their voice heard within this community regardless of where they are active. But there has also long been a strong over-emphasis in the media on Twitter as a platform. So it goes both ways. Twitter gets far more media attention than any other social network relative to its rather low number of users. And that is in itself a rather compelling reason to join the conversation.


5 days to first pilot-webcast

It's now 5 days to go until our first week. On Tuesday we will try to make the first pilot (which will not be webcast).

I have realized that I will have about half of the stuff I need. My biggest fear is that my cameras are supposed to arrive in Finland on Monday, which would leave me one night to get them working. In a way it's nice to know that the lights and the light rigging won't be ready, so I can work with secondary solutions. On the solutions I have decided the following: as mixer I will use the Sony MCS-8M, and the cameras will be Panasonic AG-AF101s and HE160 robot cameras.

The fun part is that the parts I do have seem to be working quite well, so I'm semi-confident that most things will run smoothly.

I attach new pictures from the studio floor and the control room.

markus


 

DIY TV-station

Hi

I stumbled over the following link when I was googling whether Screenmonkey can be output through a DeckLink card:

DIY BROADCAST : How to build your own Internet TV Channel with Open-Source & other goodies


 

This blog by Nicolas Well shows me, as a public service man, that the industry still needs public service companies who are not afraid to try to build something they need and then share it for free (meaning Ingex and CasparCG).

I can also tip everyone to go and check out BBC Research and Development. Amazing stuff.

b Markus

Running scared?

Good morning.

After a good night's sleep I am starting to doubt whether I should choose the BMD ATEM 1 M/E (2 M/E) switchers. The reason is that I "found" the Sony MCS-8M switcher. It is an updated version of the old Sony Anycast. The kitchen psychologist in me tells me that this is a way for me to play it safe. I have been using the Anycast for over half a decade without any problems, so given my limited time schedule I think the MCS-8M is a way for me to buy myself more time. What I don't know is whether it's a better switcher or not.

http://pro.sony.com/bbsc/ssr/product-MCS8M/

Markus

webstudio in progress

Hi

Now I have come so far that I have started to doubt myself. But still, I have started on a first flowchart.

I just got a Sony BRC-H900 for testing and I was surprised in two ways. The first was when I had it connected to a Sony HD monitor and thought the picture was brilliant. Then I took it to our studio, connected it as SD beside our 6-year-old BRC-300, made a spot with both daylight and tungsten next to each other, took down the lighting and compared the two pictures on the same monitor. I was really surprised to see that I could easily mix those pictures together. Of course the H900 had more detail in the blacks and a little less noise, but not so much that you could justify spending around 10000 € per camera. Of course, if you're going to go HD then it is a different thing, but still I'm now leaning towards the Pana HE120. Next week I will have the opportunity to test the H900, the HE120 and the AG-AF101 with my TVS switcher, which I think will tell me a lot. I will record a short sample if I remember.

I took some pictures with my phone of the floor where we will build our studio.

And here is where our control room will be.

markus

Webstudio

This is my attempt at taking a new approach to technical planning. I find it interesting to share and discuss my opinions about how things should function, and I also believe there are several points of view that I haven't thought of.

This is a call to make something different.

What are we doing?

We are building what we call a web studio with the possibility to stream live. The project has two phases.

Phase One: We build a studio floor in Radio X3M's office and, about 30 metres from that, our video and audio control room in an old radio control room.

Phase Two: Near the control room we build a small studio (5x6 m) for more static programmes. The vision is a video wall and 4 PTZ cameras.

Our deadline for Phase One is that our first programme should air on 12.3.

The programme will consist of interviews (2 or 3 people), some pre-made inserts and sometimes artists performing live acoustically.

From the content side there have been the following requests:

– a visual look reminiscent of DSLR

– a screen to play live stuff like Skype, web pages, YouTube videos and so forth

– we have 2 camera operators and 1 sound person for 3 hours every recording session

 

My thoughts on how to build it are:

2x Panasonic AG-AF101 cameras

3x Panasonic AW-HE120W cameras

Mixer: BMD ATEM 1 M/E

X-keys-controlboard (1×24)

2x Panasonic AG-HPD24 P2-recorders

BMD 16×16 videohub

2x BMD Camera converter and 1x Studio converter

1 digital sound desk (probably a Yamaha)

 

To break down why I'm considering this setup:

YLE is a P2 company whose production formats are 576i50 and 1080i50.

 

For the cameras I checked the following:

Canon and Nikon DSLRs have an output of 1080i60 (checked with the 550D, 7D, 5D Mk II & D5000)

Panasonic DSLRs could work, but the HDMI out is problematic

The BMD Cinema Camera has an output of 1080p25, and delivery can't be promised

The Panasonic AF101 has 1080i50 out through BNC

For the PTZs I checked both the Panasonic HE120W and the Sony BRC-Z700. Somehow I just like the Panasonic picture more. But has anyone used both, so that you have hands-on experience?

As the mixer, I thought the ATEM 1 M/E is the most versatile and can handle both HD and SD. I don't think the hardware panel is needed; I think it's more effective to have a set of X-keys to mix with, and then control keyers, graphics and more with the mouse. Maybe this is influenced by the fact that I don't see the vision mixer as the one who chooses graphics or starts inserts and so on.

I don't see us needing more than the 1 M/E's 8 inputs in the near future.

 

Because I see the importance of being able to split live stuff to both the screen in the studio and to the mixer, I'm thinking of using Screenmonkey. Does anyone have experience with Screenmonkey? One limitation of Screenmonkey is that you can't scroll down web pages. Has anyone else come across this, or is it just my setup?

 

For the recording media I think P2 beats the BMD HyperDeck Studio in two ways: our users are used to having their own P2 cards with them, and the HyperDeck Studio SSDs are not directly usable in PCs.

 

For the lighting I'm testing Socanland 100W LED Fresnels. Does anyone have any experience with these, or has anyone compared 30×30 bi-focus LED panels to Fresnel LEDs? I think a Fresnel is better to use, and comparing the build of the Socanland to the Sola, I think the Socanland has a big advantage. But I'm a bit concerned about how long the Socanland would last.

 

Markus Nygård

Technical Producer / Svenska YLE