Grouping countries according to flag similarity

This topic is apparently interesting enough that it warrants its own discussion on Quora. People there rely on the keen observational powers of the human mind, but for this article, I tried to group the flags algorithmically.

I plotted the results on the map below. Countries with the same color have similar flags. The brighter the color, the bigger the group of countries with similar flags.

Launch the interactive viewer to explore the groups yourself.

Countries by flag similarity


Here are some flag groups. To see them all, click the image above.

[Images: flags_7, flags_80, flags_124, flags_27]

How I grouped the flags

I used a machine learning algorithm called k-means clustering. It’s really a rudimentary exercise, but the results are good enough to publish on this wee blog.

The algorithm accepts the units to be grouped as vectors, so I had to vectorize the images first, that is to say, convert each of them into a long string of numbers. Each image was partitioned into a grid, then the average color value for each cell was computed. The grid was 24 x 24 cells big, which I found to be enough for simple flags. These color values were converted into the HSB color space and experimentally weighted, then copied into a vector. These vectors were fed into the k-means algorithm with the requested number of clusters set to 120 (there are 240 different flags). You can see the results in the viewer.
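The vectorization step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the function name `vectorize`, the nested-list image format and the default weights are all assumptions, and the actual weights were found experimentally and are not given here.

```python
import colorsys


def vectorize(pixels, grid=24, weights=(1.0, 1.0, 1.0)):
    """Turn an image into a flat vector of weighted HSB cell averages.

    pixels:  list of rows, each row a list of (r, g, b) tuples in 0..255.
    weights: hypothetical weights for hue, saturation and brightness.
    """
    height, width = len(pixels), len(pixels[0])
    vector = []
    for gy in range(grid):
        # Pixel rows covered by this grid row (always at least one row).
        y0 = gy * height // grid
        y1 = max(y0 + 1, (gy + 1) * height // grid)
        for gx in range(grid):
            x0 = gx * width // grid
            x1 = max(x0 + 1, (gx + 1) * width // grid)
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Average each RGB channel over the cell, scaled to 0..1.
            r, g, b = (
                sum(p[c] for p in cell) / (255.0 * len(cell)) for c in range(3)
            )
            # HSB is the same color model that colorsys calls HSV.
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            vector.extend((h * weights[0], s * weights[1], v * weights[2]))
    return vector
```

With the default 24 x 24 grid, every flag becomes a vector of 24 * 24 * 3 = 1728 numbers, regardless of the original image size.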

The number of clusters was set experimentally, and the clustering is not perfect. For example, the Canadian flag is grouped with some very unlikely lookalikes.
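For readers who haven't met k-means before, here is a bare-bones sketch of the algorithm in plain Python. It is only an illustration under stated assumptions: the function name `kmeans`, the fixed iteration count and the random initialization are choices made for this example, and a library implementation would normally be used instead.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means: returns a cluster index for each input vector."""
    rng = random.Random(seed)
    # Start from k randomly chosen input vectors as centroids.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid
        # (squared Euclidean distance).
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Calling `kmeans(flag_vectors, 120)` on a list of flag vectors would mirror the setup described above. Because the result depends on the starting centroids, different seeds can give different groupings, which is one reason odd matches can show up.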

See also the other post with k-means clustering, K-means clustering with Processing.js



Presence of faces in House of Cards TV Series by episode

I was wondering whether the presence of faces in video content was an indicator of anything, and if so, of what. So I decided to scan episodes of a popular TV series and analyze them, second by second, for the number of faces in the video frames, and then compare the charts of various episodes. Here is the result of this research.

I decided to analyze House Of Cards, partly because it’s a great series, but also because it’s character-focused, so there are many scenes with a lot of people. I built an interactive viewer that lets you see which faces were recognized at a particular point in time in Episode 3, which contains a variety of scenes with many people in them.

Launch the viewer, or continue reading for a short description of the technology.

Launch the House of Cards Face Recognition Interactive Viewer


To pull this off, I used the OpenCV computer vision library, which has a good capability to recognize faces. As the computer watches TV, this tool scans every frame for faces, and, if it finds any, communicates the relevant rectangles, so they can be drawn or extracted and saved.
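As a rough idea of what such a scan can look like, here is a sketch using OpenCV's Python bindings. The post only says that OpenCV was used, so the Haar cascade detector, the `scaleFactor` and `minNeighbors` values and the function names are all assumptions made for this example.

```python
def detect_faces(frame, cascade):
    """Return (x, y, w, h) rectangles for faces found in one BGR frame."""
    import cv2  # imported lazily; needs the opencv-python package

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are illustrative values, not tuned ones.
    found = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(int(n) for n in rect) for rect in found]


def faces_per_frame(video_path):
    """Yield (frame_index, rectangles) for every frame of a video file."""
    import cv2

    # Frontal-face Haar cascade shipped with opencv-python (an assumed choice).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    capture = cv2.VideoCapture(video_path)
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file or read error
                break
            yield index, detect_faces(frame, cascade)
            index += 1
    finally:
        capture.release()
```

Each rectangle can then be drawn onto the frame or used to crop the face out of it, for example with `frame[y:y + h, x:x + w]`.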

Here’s a screenshot of a scene in a church. It’s immediately apparent that the tool does not do such a good job, for many faces remain unrecognized. Still, many are recognized.

Recognized faces in House of Cards

Recognized faces in the church scene, Episode 3

In this frame below, more faces are recognized.


There are also many false positives. The computer sometimes thinks that something is a face when it most certainly is not, as in the picture below. If one looks carefully, one can sometimes see something face-like in these rectangles.


To construct the viewer, I extracted individual faces from frames so I could display them on the page. They are of various sizes and look like this:

[Images: extracted faces]

To construct the charts, I just counted the faces in each second, then displayed the time series for each episode.


This is the final chart. It’s a series of timelines that show how many faces were recognized per second. Why are some lines orange, and some yellow?

As the scanning of video frames progressed, some faces were recognized in only one frame of an entire second (there are 23 frames in a second). Other faces were recognized in more frames, and others in yet more. I thought this would be a good indicator of face detection reliability, but that’s not so. If it tells us anything, it’s how steady the camera was in that section.
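The per-second counting and the "how many frames" measure described above could be computed along these lines. This is a sketch under assumptions: the post doesn't say how the faces within one second were combined, so taking the maximum per frame is a guess, and the function name and the input format are made up for the example.

```python
from collections import defaultdict


def summarize_by_second(frame_counts, fps=23):
    """Group per-frame face counts into per-second summaries.

    frame_counts: list where item i is the number of faces found in frame i.
    Returns {second: (max_faces, frames_with_faces)}.
    """
    buckets = defaultdict(list)
    for index, count in enumerate(frame_counts):
        buckets[index // fps].append(count)
    return {
        # max_faces: the most faces seen in any single frame of that second
        # (assumed aggregation); frames_with_faces: in how many of the
        # frames at least one face was found.
        second: (max(counts), sum(1 for c in counts if c > 0))
        for second, counts in buckets.items()
    }
```

The second number in each pair is the kind of value that could drive the line color: a second where faces show up in only one frame would be drawn differently from one where they show up in all 23.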

House of Cards face recognition charts by episode

My inspiration was small multiples, a visualization technique which allows for easier comparison of several datasets from the same domain. Wikipedia says:

A small multiple (sometimes called trellis chart, lattice chart, grid chart, or panel chart) is a series or grid of small similar graphics or charts, allowing them to be easily compared. The term was popularized by Edward Tufte.

According to Tufte (Envisioning Information, p. 67):

At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.


As always, if anyone is interested in the code, mail me. My address is on the About page.

My brainwaves during the final episode of Breaking Bad

This is a follow-up to the first self-quantizing post here, my heart rate during the latest episode of Game of Thrones. See also Graphs of recognized faces per second in House of Cards episodes. This time I thought it’d be fun to measure my brainwaves while watching a critical episode of another TV show.

Breaking Bad is a great TV show and I really recommend it. Even Anthony Hopkins wrote a much publicized fan letter to the crew and the main actor. I watched it avidly until the episode with the fly. Then I took a pause that somehow extended itself up until the finale.

After that, all my information came from the media and from my girlfriend, who still watched it on a regular basis. So these measurements were taken by a person who isn’t particularly biased, in the sense of having any emotional involvement with the onscreen characters.

What do brainwaves measure, and what do the levels mean? Here’s a quote from Wikipedia:

  • delta: adult slow-wave sleep, in babies, has been found during some continuous-attention tasks.
  • theta: young children, drowsiness or arousal in older children and adults, idling, associated with inhibition of elicited responses (has been found to spike in situations where a person is actively trying to repress a response or action).
  • alpha: relaxed/reflecting, closing the eyes, also associated with inhibition control, seemingly with the purpose of timing inhibitory activity in different locations across the brain.
  • beta: alert, active, busy, or anxious thinking, active concentration.
  • gamma: displays during cross-modal sensory processing (perception that combines two different senses, such as sound and sight), also is shown during short-term memory matching of recognized objects, sounds, or tactile sensations.

There’s also mu, but the Mindwave doesn’t measure it.

Here’s the EEG graph overlaid on the frames. The EEG values have been averaged per shown frame.

The colors are:

  • red: low alpha,
  • orange: high alpha,
  • pink: low beta,
  • light blue: high beta,
  • green: Attention (synthetic NeuroSky value).

Breaking Bad final episode EEG chart

To measure the brainwaves, I used the NeuroSky Mindwave. It’s a convenient and portable personal EEG. It’s a little limited, and one has to learn how to use it properly, but it has a professional quality DSP chip that it uses to calculate two levels the company calls “Attention” and “Meditation”. It also outputs standard alpha, beta, gamma, theta and delta waves.

It looks like this:

Neurosky Mindwave

By “limited” I mean that it samples brainwave data only twice a second. So whatever is happening in your brain now, you can measure half a second later in the worst case.
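With readings arriving about twice a second, averaging them per shown frame (as in the overlay chart) comes down to bucketing the samples by time. A minimal sketch, assuming the readings are stored as (time in seconds, value) pairs and that one frame is shown for every `seconds_per_frame` seconds of video; both the format and the names are hypothetical:

```python
def average_per_frame(samples, seconds_per_frame):
    """Average (time_in_seconds, value) samples into one value per shown frame."""
    sums, counts = {}, {}
    for t, value in samples:
        # Index of the shown frame this sample falls into.
        frame = int(t // seconds_per_frame)
        sums[frame] = sums.get(frame, 0.0) + value
        counts[frame] = counts.get(frame, 0) + 1
    return {frame: sums[frame] / counts[frame] for frame in sorted(sums)}
```

The same function can be applied to each band (low alpha, high alpha, low beta and so on) separately to get one line per band.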

This is the “attention” chart during the episode:

Breaking Bad final episode EEG chart (attention)

Here is the video with onscreen readings. It’s just another way of presenting the same data as in the picture above, except there are more brainwave frequencies shown.

Breaking bad final episode fast forward with EEG readings from Marko O’Hara on Vimeo.

I hope I’m not in copyright violation for that video. It’s essentially unwatchable story-wise.

I’m not totally satisfied with the images and video produced here, but I’m not watching the episode again. I must also admit that I can’t really interpret the charts and video. Attention is self-explanatory, and elevated beta levels also mean increased attention, but do high alpha values mean that I was falling asleep? I was pretty alert while watching.

There’s also the possibility of interference. The EEG is essentially a very sensitive voltmeter that measures minute potential changes. Twitching facial muscles, blinking, yawning, etc. all interfere with the readings. I did look at my second monitor quite a few times to check if the data was being written to a file, so maybe some spikes come from that. All in all, I don’t think there are any spoilers here.

Here are some more charts:



Building ages in Ljubljana, Slovenia

Such is the beauty of open data that when I saw the excellent Portland: The Age of a City by Justin Palmer, I immediately wanted to do something similar, but for my town. The people at the government office (GURS) were kind enough to provide me with the files, and after some coding, here it is.

It’s an exploration of how the city grew through the last century. Blue is old, violet younger, red still younger, bright red the youngest.

Launch the interactive map showing structure ages in Ljubljana


Here’s the number of structures built per year. I was able to identify causes for some spikes in building activity, but not all:

  • 1899: four years after the big earthquake,
  • 1919: rebuilding after WW1? I’m not sure there was much destruction here,
  • 1929: more building; in 1929 Ljubljana became the capital of Dravska banovina,
  • 1949: rebuilding after WW2,
  • 1959, 1969, 1979, 1989: might be effects of Yugoslav loans, but I suspect it’s more an effect of administrative laziness, with new buildings being entered into the records at the end of each decade,
  • 2004: the last surge of prosperity in independent Slovenia.
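Spikes like the ones listed above can be picked out of the raw data with a few lines of code. The sketch below assumes the construction years are available as a plain list of integers; the function names and the "twice the neighbouring years" threshold are choices made for this example, not part of the original analysis.

```python
from collections import Counter


def structures_per_year(build_years):
    """Count how many structures were built in each year."""
    return Counter(build_years)


def spike_years(counts, factor=2.0):
    """Years whose count is at least `factor` times the mean of their two neighbours."""
    spikes = []
    for year in sorted(counts):
        neighbours = (counts.get(year - 1, 0) + counts.get(year + 1, 0)) / 2
        # max(..., 1) keeps isolated single buildings from counting as spikes.
        if counts[year] >= factor * max(neighbours, 1):
            spikes.append(year)
    return spikes


def share_in_years_ending_with(counts, digit=9):
    """Fraction of all structures recorded in years ending with `digit`."""
    total = sum(counts.values())
    return sum(n for year, n in counts.items() if year % 10 == digit) / total
```

If construction were spread evenly, about a tenth of all structures would fall in years ending in 9, so a clearly larger share from `share_in_years_ending_with` would be consistent with the end-of-decade bookkeeping suspicion.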

Generally, it’s been going downhill from 1969 on. The best spots were probably taken by then.


Here’s an animation of the whole thing. It shows the city’s evolution between the years 1500 and 2013, since there’s not much happening before that.

City of Ljubljana – growth between years 1500 – 2013 from Marko O’Hara on Vimeo.

The map was made with TileMill, the animation with Processing.

See also the real estate prices map.


My heart rate during the latest episode of Game Of Thrones

See also: My brainwaves while watching the final episode of Breaking Bad and another TV visualization, Number of faces per second in episodes of House of Cards.

I admit that I like Game Of Thrones. I like the story and the way the TV show is done. So when I read on Monday about the supposedly heartbreaking episode “Rains Of Castamere”, in which Robb Stark, his wife and his mother get killed along with many others, I thought I’d test my affection by recording my pulse while watching, then plotting it to see how it correlated with the visual narration.
Here it is:


To do that, I used my old Arduino and PulseSensor to record my pulse, then used Processing to graph the averages on a picture along with the corresponding frames. There are some errors where the sensor registered a double pulse when I jumped or moved, but it averaged out decently.
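The averaging can be sketched like this, in Python rather than Processing. It assumes the recording is a sorted list of beat timestamps in seconds. Dropping beats that follow the previous one too closely is an extra step added here to deal with double pulses; the post relies on averaging alone, and the 0.3 second threshold and 10 second window are illustrative values.

```python
def average_bpm(beat_times, window=10.0, min_interval=0.3):
    """Average heart rate (beats per minute) for each `window` seconds.

    beat_times:   sorted beat timestamps in seconds.
    min_interval: a beat closer than this to the previous kept beat is
                  treated as a double count and dropped (an assumption).
    """
    kept = []
    for t in beat_times:
        if kept and t - kept[-1] < min_interval:
            continue  # too soon after the last beat: likely a double pulse
        kept.append(t)

    windows = {}
    for previous, current in zip(kept, kept[1:]):
        # Instantaneous rate from the gap between two consecutive beats.
        bpm = 60.0 / (current - previous)
        windows.setdefault(int(current // window), []).append(bpm)
    return {w: sum(v) / len(v) for w, v in sorted(windows.items())}
```

Each key in the result is a window index, so window 0 covers the first 10 seconds, window 1 the next 10, and so on; those averages are what would be plotted next to the matching frames.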

If you saw the episode, you’ll probably relate.

Update: if anyone wants the code (Processing / Arduino), send me an email. My address is on the About page.