Tagged: visualization

A project for Transparency International Slovenija – visualization of lobbying contacts between state officials and lobbyists


On the basis of a previous post, Transparency International Slovenia asked me to collaborate on some projects. This is one of them, and it was launched today on a separate site: kdovpliva.si (English: whoinfluences.si).

It’s an attempt to visualize several networks of lobbyists, their companies, politicians and state institutions. Perhaps the most interesting part is the network of lobbying contacts, which was constructed with data containing around 700 reported contacts between 2011 and late 2014.

As you may imagine, not every lobbying contact is reported. For those that are, records are kept at the Komisija za preprečevanje korupcije (Commission for the Prevention of Corruption, a state institution). Transparency International Slovenia obtained those records as PDF files, since the institution refused to provide them in a machine-readable format. They hired a few volunteers to copy and paste the information into spreadsheets, then handed the spreadsheets to me to visualize.

You can see the results below. Click here or on the image to open the site in a new window. It’s in Slovenian. For the methodology, continue reading below the image.

App screenshot – lobbying contacts


Network construction

The meaning of every network is determined by the nature of its nodes and connections. Here, we have four node types:

  • lobbyists
  • those who were lobbied – state officials
  • organizations on whose behalf lobbying was performed
  • state institutions at which the abovementioned officials work

A lobbying contact is initiated by a company or an organization, which employs a lobbyist to do the work. The lobbyist then contacts state officials of sufficient influence, who work at the appropriate state institution.

So an organization is connected to the lobbyist with a weight of 2, the lobbyist to a state official with a weight of 1, and the state official to her institution with a weight of 2. The weights signify the approximate loyalty between these entities. We presupposed that lobbyists are more loyal to their clients than they are to the state officials, with whom they must be in a promiscuous relationship. Furthermore, the state officials are also supposed to be more loyal to their employers than to the lobbyists, although this is a daring supposition. But let’s say they are, or at least that they should be.
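The weighting scheme can be sketched in a few lines of Python. This is only an illustration under assumptions: the record keys (`organization`, `lobbyist`, `official`, `institution`) and the sample values are hypothetical, not the actual spreadsheet columns, and the real site may well have been built differently.

```python
from collections import defaultdict

# Edge weights from the text: organization-lobbyist 2, lobbyist-official 1,
# official-institution 2.
WEIGHTS = {
    ("organization", "lobbyist"): 2,
    ("lobbyist", "official"): 1,
    ("official", "institution"): 2,
}

def build_network(contacts):
    """Return (edges, contact_counts) from a list of contact records.

    Each record is a dict with the hypothetical keys
    'organization', 'lobbyist', 'official' and 'institution'.
    """
    edges = {}                          # (node_a, node_b) -> loyalty weight
    contact_counts = defaultdict(int)   # (node_a, node_b) -> reported contacts
    for record in contacts:
        for (kind_a, kind_b), weight in WEIGHTS.items():
            # Prefix node ids with their type so that a person and a company
            # with the same name never collapse into one node.
            edge = ((kind_a, record[kind_a]), (kind_b, record[kind_b]))
            edges[edge] = weight
            contact_counts[edge] += 1
    return edges, dict(contact_counts)

# Made-up sample records, for illustration only.
sample = [
    {"organization": "Org A", "lobbyist": "Lobbyist 1",
     "official": "Official X", "institution": "Agency Q"},
    {"organization": "Org A", "lobbyist": "Lobbyist 1",
     "official": "Official Y", "institution": "Agency Q"},
]
edges, counts = build_network(sample)
```

Keeping the contact count separate from the loyalty weight means the frequency of contact can drive things like font size without changing the structure of the network.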

After some processing, the network emerged. Immediately apparent are the interest groups, centered around seats of power. Here’s an image of the pharmaceutical lobby. It’s centered on the Public Agency for Pharmaceuticals and Medicine. The main actors of influence are companies such as Merck, Novartis, Eli Lilly, Aventis, etc.

Pharmaceutical lobby

A click on the agency node brings up a panel with some details, such as a list of companies (font size indicates the frequency of contact), lobbying purposes and a timeline of lobbying contacts. Here we can see that Novartis and Krka were the most active companies, and that they lobbied on pricing and to limit potential competition from producers of generic drugs.

You can explore the network yourself to see the other interest groups.

Who lobbied the drug agency?


Some advice from the Information Commissioner

Unfortunately, we had to omit lobbyists’ names for reasons of supposed privacy. The Information Commissioner strongly advised us not to display them on the basis of some EU ruling. I’m not an expert in EU law, and perhaps there are good reasons for this. On the other hand, there may not be. I fail to see why this information would not be in the public interest, since these decisions have an impact on a significant number of taxpayers, if not all of them.

Anyway, we have the names. After all, we had to use them to connect the network. They are present in the raw data, just not displayed.

We’re probably going to continue developing this project, as new information comes to light and new rulings regarding privacy are issued.

Stay tuned!

Addresses with most registered companies in Slovenian towns


An article appeared that attempted to expose the questionable practices of some Slovenian entrepreneurs. The scheme goes like this: establish a company, perform some work, bleed it dry, then establish a new one and move all the workers into it, while avoiding paying benefits and a sizable portion of salaries. When the new company has served its purpose, establish another one, and so on, for as long as it works. These companies are frequently registered at the same address.

The article says that there are as many as 120 companies registered in one residential building. But because of a weakness in the law, state inspectors can’t put an end to such practices.

I wanted to see these addresses on the map, so here’s an attempt. For every address with more than five companies, there’s a dot, with color and radius proportional to the number of companies registered there. The biggest dots represent business buildings, in which predominantly legitimate businesses reside. My data sources didn’t allow for filtering out just the residential buildings.
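The filtering step described above can be sketched roughly as follows. The input format (a list of name and address pairs), the sample data and the scale factor are assumptions made for illustration; the post does not say how the data was actually processed.

```python
from collections import Counter

MIN_COMPANIES = 5  # an address needs more than this many companies to get a dot

def crowded_addresses(companies, min_companies=MIN_COMPANIES):
    """Map each address with more than `min_companies` companies to its count."""
    counts = Counter(address for _name, address in companies)
    return {addr: n for addr, n in counts.items() if n > min_companies}

def marker_radius(n_companies, pixels_per_company=0.5):
    """Radius proportional to the number of companies (the factor is assumed)."""
    return n_companies * pixels_per_company

# Made-up data: seven companies at one address, three at another.
companies = [(f"Company {i}", "Example Street 1") for i in range(7)]
companies += [(f"Other {i}", "Example Street 2") for i in range(3)]
dots = crowded_addresses(companies)
```

Only "Example Street 1" survives the filter here, since the second address has just three companies.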

You can see the standalone map here. (In Slovene.)

Interactive map showing addresses with most companies

Clicking on a marker displays a popup with a list of companies, sorted by date of establishment, youngest first. There’s also a chart of the predominant business categories at that address. The categories that the article mentions as most prone to the scheme in question are Construction and Retail. So even if this map can’t really show the locations of these questionable companies, it can maybe help with their discovery. If there’s a big dot with predominantly these categories, there’s a certain possibility that some of these fraudulent companies are there.

Most addresses shown here of course don’t have anything to do with any illegal activity.

Data source: Zemljevid Najdi.si.


Interactive visualization of Global Gender Gap Index 2013 report


This is a brief visualization of the Global Gender Gap Index 2013 report by the World Economic Forum. As the report authors say,

The Global Gender Gap Index examines the gap between men and women in four fundamental categories (subindexes): Economic Participation and Opportunity, Educational Attainment, Health and Survival and Political Empowerment. Table 1 displays all four of these subindexes and the 14 different indicators that compose them, along with the sources of data used for each.

I thought it would be nice to try to visualize the data and make it as interactive as I could, and learn d3.js in the process. I actually tried to use all the data in the report, which one can see in graphical form by clicking on countries on the world map, or by selecting the categories in the dropdown.

There are several categories:

  • economy,
  • education,
  • health, and
  • politics

In addition to that, I calculated the differences between 2013 and previous years. These maps are also accessible through the dropdown menu, or simply by scrolling up and down.
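The year-to-year differences amount to a simple subtraction per country. Here is a minimal sketch, assuming the scores are held in a nested dictionary keyed by country and year; that structure and the numbers are invented for illustration and are not taken from the report.

```python
def score_changes(scores_by_year, latest=2013):
    """For each country, return {earlier_year: latest_score - earlier_score}."""
    changes = {}
    for country, by_year in scores_by_year.items():
        if latest not in by_year:
            continue  # no score for the latest year to compare against
        changes[country] = {
            year: round(by_year[latest] - score, 4)
            for year, score in by_year.items()
            if year != latest
        }
    return changes

# Made-up scores, not taken from the report.
scores = {
    "Country A": {2012: 0.70, 2013: 0.72},
    "Country B": {2012: 0.65},
}
changes = score_changes(scores)
```

A positive value means the gap narrowed between the earlier year and 2013. Countries without a 2013 score are skipped.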

Launch the viewer here, or click the image below.

Interactive visualization, Global gender gap 2013


Copied straight from the report:

Economic Participation and Opportunity

This subindex is captured through three concepts: the participation gap, the remuneration gap and the advancement gap. The participation gap is captured using the difference in labour force participation rates. The remuneration gap is captured through a hard data indicator (ratio of estimated female-to-male earned income) and a qualitative variable calculated through the World Economic Forum’s Executive Opinion Survey (wage equality for similar work). Finally, the gap between the advancement of women and men is captured through two hard data statistics (the ratio of women to men among legislators, senior officials and managers, and the ratio of women to men among technical and professional workers).

Educational Attainment

In this subindex, the gap between women’s and men’s current access to education is captured through ratios of women to men in primary-, secondary- and tertiary-level education. A longer-term view of the country’s ability to educate women and men in equal numbers is captured through the ratio of the female literacy rate to the male literacy rate.

Health and Survival

This subindex provides an overview of the differences between women’s and men’s health. To do this, we use two indicators. The first is the sex ratio at birth, which aims specifically to capture the phenomenon of “missing women” prevalent in many countries with a strong son preference. Second, we use the gap between women’s and men’s healthy life expectancy, calculated by the World Health Organization. This measure provides an estimate of the number of years that women and men can expect to live in good health by taking into account the years lost to violence, disease, malnutrition or other relevant factors.

Political Empowerment

This subindex measures the gap between men and women at the highest level of political decision-making, through the ratio of women to men in minister-level positions and the ratio of women to men in parliamentary positions. In addition, we include the ratio of women to men in terms of years in executive office (prime minister or president) for the last 50 years. A clear drawback in this category is the absence of any indicators capturing differences between the participation of women and men at local levels of government. Should such data become available at a global level in future years, they will be considered for inclusion in the Global Gender Gap Index.

Score changes

Out of the 110 countries that have been involved every year since 2006, 95 (86%) have improved their performance over the last four years, while 15 (14%) have shown widening gaps. Ten countries have closed the gap on both the Health and Survival and Educational Attainment subindexes. No country has closed the economic participation gap or the political empowerment gap. On the Economic Participation and Opportunity subindex, the highest-ranking country (Norway) has closed over 84% of its gender gap, while the lowest ranking country (Syria) has closed only 25% of its economic gender gap. There is similar variation in the Political Empowerment subindex. The highest-ranking country (Iceland) has closed almost 75% of its gender gap whereas the two lowest-ranking countries (Brunei Darussalam and Qatar) have closed none of the political empowerment gap according to this measure.

Presence of faces in House of Cards TV Series by episode


I was wondering if the presence of faces in video content was an indicator of anything, and if so, of what. So I decided to scan episodes of a popular TV series and analyze them, second by second, for the number of faces in video frames, and then compare the charts of various episodes. Here is the result of this research.

I decided to analyze House of Cards, partly because it’s a great series, but also because it’s character-focused, so there are many scenes with a lot of people. I built an interactive viewer, which lets you see which faces were recognized at a particular point in time in Episode 3, which contains a variety of scenes with many people in them.

Launch the viewer, or continue reading for a short description of the technology.

Launch the House of Cards Face Recognition Interactive Viewer


To pull this off, I used the OpenCV computer vision library, which has a good capability to recognize faces. As the computer watches TV, this tool scans every frame for faces, and, if it finds any, communicates the relevant rectangles, so they can be drawn or extracted and saved.
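A frame-by-frame scan of this kind might look roughly like the sketch below. The post does not say which detector was used. The Haar cascade frontal-face classifier bundled with the `opencv-python` package is an assumption, as are the parameter values and the helper names.

```python
def detect_faces_per_frame(video_path, scale_factor=1.1, min_neighbors=5):
    """Yield (frame_index, rectangles) for every frame of a video file.

    Each rectangle is an (x, y, width, height) tuple for one detected face.
    """
    import cv2  # imported here so the pure helper below works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    capture = cv2.VideoCapture(video_path)
    frame_index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # end of the video (or a read error)
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(
                gray, scaleFactor=scale_factor, minNeighbors=min_neighbors
            )
            yield frame_index, [tuple(int(v) for v in face) for face in faces]
            frame_index += 1
    finally:
        capture.release()

def crop_box(rect, frame_width, frame_height):
    """Clamp an (x, y, w, h) rectangle to the frame.

    Returns (left, top, right, bottom), ready for slicing a frame array.
    """
    x, y, w, h = rect
    left, top = max(0, x), max(0, y)
    right, bottom = min(frame_width, x + w), min(frame_height, y + h)
    return left, top, right, bottom
```

With a box from `crop_box`, a face could be extracted as `frame[top:bottom, left:right]` and saved. Raising `min_neighbors` usually trades missed faces for fewer false positives.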

Here’s a screenshot of a scene in a church. It’s immediately apparent that the tool does not do a very good job, since many faces remain unrecognized. Still, many are recognized.

Recognized faces in House of Cards

Recognized faces in the church scene, Episode 3

In this frame below, more faces are recognized.


There are also many false positives. The computer sometimes thinks that something is a face when it most certainly is not, as in the picture below. If one looks carefully, one can sometimes see something face-like in these rectangles.


To construct the viewer, I extracted individual faces from frames so I could display them on the page. They are of various sizes and look like this:


To construct the charts, I just counted the faces in each second, then displayed the time series for each episode.


This is the final chart. It’s a series of timelines that show how many faces were recognized per second. Why are some lines orange, and some yellow?

As the scanning of video frames progressed, some faces were recognized in only one frame of an entire second (there are 23 frames in a second). Some other faces were recognized in more frames, and others in yet more frames. I thought this would be a good indicator of face detection reliability, but that’s not so. If it tells us anything, it’s how steady the camera was in that section.
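The per-second counting can be sketched as follows. It assumes the 23 in the text refers to frames per second, and that the count for a second is the largest number of faces seen in any of its frames. The post does not spell out either detail, so treat both as guesses.

```python
from collections import defaultdict

def summarize_seconds(detections, frames_per_second=23):
    """Summarize per-frame face counts into per-second values.

    `detections` is an iterable of (frame_index, face_count) pairs.
    Returns {second: (max_faces, frames_with_faces)}.
    """
    max_faces = defaultdict(int)
    frames_with_faces = defaultdict(int)
    for frame_index, face_count in detections:
        second = frame_index // frames_per_second
        max_faces[second] = max(max_faces[second], face_count)
        if face_count > 0:
            frames_with_faces[second] += 1
    return {s: (max_faces[s], frames_with_faces[s]) for s in sorted(max_faces)}

# Made-up detections: (frame index, number of faces found in that frame).
detections = [(0, 1), (1, 0), (5, 2), (23, 0), (24, 1)]
summary = summarize_seconds(detections)
```

The second number in each pair, the count of frames with at least one face, is the kind of value that could drive a line’s color.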

House of Cards face recognition charts by episode

My inspiration was small multiples, a visualization technique which allows for easier comparison of several datasets from the same domain. Wikipedia says:

A small multiple (sometimes called trellis chart, lattice chart, grid chart, or panel chart) is a series or grid of small similar graphics or charts, allowing them to be easily compared. The term was popularized by Edward Tufte.

According to Tufte (Envisioning Information, p. 67):

At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.


As always, if anyone is interested in the code, mail me. My address is on the About page.

My brainwaves during the final episode of Breaking Bad


This is a follow-up to the first self-quantizing post here, my heart rate during the latest episode of Game of Thrones. See also Graphs of recognized faces per second in House of Cards episodes. This time I thought it’d be fun to measure my brainwaves while watching a critical episode of another TV show.

Breaking Bad is a great TV show, and I really recommend it. Even Anthony Hopkins wrote a much-publicized fan letter to the crew and the main actor. I watched it avidly until the episode with the fly. Then I took a pause that somehow extended itself up until the finale.

After that, all my information came from the media and from my girlfriend, who still watched it regularly. So these measurements were taken by a person without much emotional involvement with the onscreen characters.

What do brainwaves measure, and what do the levels mean? Here’s a quote from Wikipedia:

  • delta: adult slow-wave sleep, in babies, has been found during some continuous-attention tasks.
  • theta: young children, drowsiness or arousal in older children and adults, idling, associated with inhibition of elicited responses (has been found to spike in situations where a person is actively trying to repress a response or action).
  • alpha: relaxed/reflecting, closing the eyes, also associated with inhibition control, seemingly with the purpose of timing inhibitory activity in different locations across the brain.
  • beta: alert, active, busy, or anxious thinking, active concentration.
  • gamma: displays during cross-modal sensory processing (perception that combines two different senses, such as sound and sight), also is shown during short-term memory matching of recognized objects, sounds, or tactile sensations.

There’s also mu, but the Mindwave doesn’t measure it.

Here’s the EEG graph overlaid on the frames. The EEG values have been averaged per shown frame.

The colors are:

  • red: low alpha,
  • orange: high alpha,
  • pink: low beta,
  • light blue: high beta,
  • green: Attention (synthetic NeuroSky value).

Breaking Bad final episode EEG chart
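The per-frame averaging mentioned above can be sketched as bucketing timestamped readings into fixed-length intervals, one per displayed frame. The interval length and the readings here are made up, and the function name is hypothetical. The actual processing may have differed.

```python
from statistics import mean

def average_per_interval(samples, interval_seconds):
    """Average (timestamp_seconds, value) samples within fixed-length intervals.

    Returns {interval_index: mean_value}; intervals without samples are omitted.
    """
    buckets = {}
    for timestamp, value in samples:
        buckets.setdefault(int(timestamp // interval_seconds), []).append(value)
    return {index: mean(values) for index, values in sorted(buckets.items())}

# Made-up readings arriving twice a second.
samples = [(0.0, 10), (0.5, 20), (1.0, 30), (1.5, 50), (2.0, 40)]
averaged = average_per_interval(samples, interval_seconds=1.0)
```

Each displayed frame then gets one averaged value per brainwave band, which smooths out single-sample spikes.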

To measure the brainwaves, I used the NeuroSky Mindwave. It’s a convenient and portable personal EEG. It’s a little limited, and one has to learn how to use it properly, but it has a professional quality DSP chip that it uses to calculate two levels the company calls “Attention” and “Meditation”. It also outputs standard alpha, beta, gamma, theta and delta waves.

It looks like this:

Neurosky Mindwave

By “limited” I mean that it samples brainwave data only twice a second. So whatever is happening in your brain now, you can only measure it up to half a second later in the worst case.

This is the “attention” chart during the episode:

Breaking Bad final episode EEG chart (attention)

Here is the video with onscreen readings. It’s just another way of presenting the same data as in the picture above, except that more brainwave frequencies are shown.

Breaking bad final episode fast forward with EEG readings from Marko O’Hara on Vimeo.

I hope I’m not in copyright violation for that video. It’s essentially unwatchable story-wise.

I’m not totally satisfied with the images and video produced here, but I’m not watching the episode again. I must also admit that I can’t really interpret the charts and video. Attention is self-explanatory, and elevated beta levels also mean increased attention, but do high alpha values mean that I was falling asleep? I was pretty alert while watching.

There’s also the possibility of interference. The EEG is essentially a very sensitive voltmeter that measures minute potential changes. Twitching facial muscles, blinking, yawning and so on all interfere with the readings. I did look at my second monitor quite a few times to check if the data was being written to a file, so maybe some spikes come from that. All in all, I don’t think there are any spoilers here.

Here are some more charts: