Addresses with most registered companies in Slovenian towns

An article appeared that attempted to expose the questionable practices of some Slovenian entrepreneurs. The scheme goes like this: establish a company, perform some work, bleed it dry, then establish a new one and move all the workers into it, avoiding paying benefits and a sizable portion of salaries along the way. When the new company has served its purpose, establish another one, and so on, for as long as it works. These companies are frequently registered at the same address.

The article says that as many as 120 companies are registered in one residential building. But because of a weakness in the law, state inspectors can’t put an end to this practice.

I wanted to see these addresses on a map, so here’s an attempt. For every address with more than five companies there’s a dot, with color and radius proportional to the number of companies registered there. The biggest dots represent business buildings, in which predominantly legitimate businesses reside. My data sources didn’t allow for filtering out just the residential buildings.
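The filtering and scaling described above can be sketched in a few lines of Python. This is an illustration only, not the code behind the map: the function name `address_markers`, the `max_radius` value and the red-channel color ramp are all assumptions, and the sample addresses are made up.

```python
from collections import Counter


def address_markers(company_addresses, min_companies=5, max_radius=30.0):
    """Return one marker per address that hosts more than `min_companies` companies.

    `company_addresses` holds one address string per registered company.
    Radius and color intensity scale linearly with the company count
    (an assumed scaling; the real map may use a different one).
    """
    counts = Counter(company_addresses)
    busy = {addr: n for addr, n in counts.items() if n > min_companies}
    if not busy:
        return []
    largest = max(busy.values())
    markers = []
    # Biggest addresses first, so they can be drawn underneath the smaller dots.
    for addr, n in sorted(busy.items(), key=lambda item: -item[1]):
        share = n / largest
        markers.append({
            "address": addr,
            "companies": n,
            "radius": max_radius * share,
            "red": round(255 * share),  # hypothetical color ramp
        })
    return markers


# Made-up sample: only "A" and "B" have more than five companies.
sample = ["A"] * 12 + ["B"] * 6 + ["C"] * 5
for marker in address_markers(sample):
    print(marker)
```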

You can see the standalone map here. (In Slovene.)

Interactive map showing addresses with most companies

Clicking on a marker displays a popup with a list of companies, sorted by date of establishment, youngest first. There’s also a chart of the predominant business categories at that address. The categories that the article mentions as most prone to the scheme in question are Construction and Retail. So even if this map can’t really show the locations of these questionable companies, it may help in discovering them. If there’s a big dot with predominantly these categories, there’s a certain possibility that some of these fraudulent companies are there.
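For the curious, the popup contents could be prepared roughly like this. It is a minimal sketch under assumptions: the record fields (`name`, `established`, `category`), the function name `popup_content` and the three sample companies are hypothetical and do not come from the actual dataset.

```python
from collections import Counter
from datetime import date


def popup_content(companies):
    """Return (company names, youngest first) and category counts, most common first."""
    newest_first = sorted(companies, key=lambda c: c["established"], reverse=True)
    categories = Counter(c["category"] for c in companies).most_common()
    return [c["name"] for c in newest_first], categories


# Hypothetical companies registered at one address.
sample_companies = [
    {"name": "Alfa", "established": date(2009, 5, 1), "category": "Construction"},
    {"name": "Beta", "established": date(2013, 2, 14), "category": "Retail"},
    {"name": "Gama", "established": date(2011, 9, 30), "category": "Construction"},
]

names, categories = popup_content(sample_companies)
print(names)
print(categories)
```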

Most addresses shown here of course don’t have anything to do with any illegal activity.

Data source: Zemljevid Najdi.si.

Interactive visualization of Global Gender Gap Index 2013 report

This is a brief visualization of the Global Gender Gap Index 2013 report by the World Economic Forum. As the report authors say,

The Global Gender Gap Index examines the gap between men and women in four fundamental categories (subindexes): Economic Participation and Opportunity, Educational Attainment, Health and Survival and Political Empowerment. Table 1 displays all four of these subindexes and the 14 different indicators that compose them, along with the sources of data used for each.

I thought it would be nice to try to visualize the data and make it as interactive as I could, and learn d3.js in the process. I actually tried to put all the data in the report to use; it can be seen in graphical form by clicking on countries on the world map, or by selecting categories in the dropdown.

There are several categories:

  • economy,
  • education,
  • health, and
  • politics

In addition to that, I calculated the differences between 2013 and previous years. These maps are also accessible through the dropdown menu, or simply by scrolling up and down.
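The year-to-year differences amount to a subtraction per country. Here is a minimal Python sketch of that step (the viewer itself is built with d3.js, so this is not its code). The function name `score_changes`, the data layout and the sample scores are assumptions made up for illustration; countries missing from either year are simply skipped.

```python
def score_changes(scores_by_year, current_year=2013):
    """Return {earlier_year: {country: current score minus earlier score}}.

    `scores_by_year` maps a year to a {country: score} dictionary.
    A positive difference means the gap has narrowed since that year.
    """
    current = scores_by_year[current_year]
    changes = {}
    for year, scores in scores_by_year.items():
        if year == current_year:
            continue
        changes[year] = {
            country: current[country] - score
            for country, score in scores.items()
            if country in current  # skip countries without a current score
        }
    return changes


# Made-up scores, not taken from the report.
sample_scores = {
    2013: {"A": 0.75, "B": 0.5},
    2012: {"A": 0.5, "C": 0.625},
}
print(score_changes(sample_scores))
```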

Launch the viewer here, or click the image below.

Interactive visualization, Global gender gap 2013

Copied straight from the report:

Economic Participation and Opportunity

This subindex is captured through three concepts: the participation gap, the remuneration gap and the advancement gap. The participation gap is captured using the difference in labour force participation rates. The remuneration gap is captured through a hard data indicator (ratio of estimated female-to-male earned income) and a qualitative variable calculated through the World Economic Forum’s Executive Opinion Survey (wage equality for similar work). Finally, the gap between the advancement of women and men is captured through two hard data statistics (the ratio of women to men among legislators, senior officials and managers, and the ratio of women to men among technical and professional workers).

Educational Attainment

In this subindex, the gap between women’s and men’s current access to education is captured through ratios of women to men in primary-, secondary- and tertiary-level education. A longer-term view of the country’s ability to educate women and men in equal numbers is captured through the ratio of the female literacy rate to the male literacy rate.

Health and Survival

This subindex provides an overview of the differences between women’s and men’s health. To do this, we use two indicators. The first is the sex ratio at birth, which aims specifically to capture the phenomenon of “missing women” prevalent in many countries with a strong son preference. Second, we use the gap between women’s and men’s healthy life expectancy, calculated by the World Health Organization. This measure provides an estimate of the number of years that women and men can expect to live in good health by taking into account the years lost to violence, disease, malnutrition or other relevant factors.

Political Empowerment

This subindex measures the gap between men and women at the highest level of political decision-making, through the ratio of women to men in minister-level positions and the ratio of women to men in parliamentary positions. In addition, we include the ratio of women to men in terms of years in executive office (prime minister or president) for the last 50 years. A clear drawback in this category is the absence of any indicators capturing differences between the participation of women and men at local levels of government. Should such data become available at a global level in future years, they will be considered for inclusion in the Global Gender Gap Index.

Score changes

Out of the 110 countries that have been involved every year since 2006, 95 (86%) have improved their performance over the last four years, while 15 (14%) have shown widening gaps. Ten countries have closed the gap on both the Health and Survival and Educational Attainment subindexes. No country has closed the economic participation gap or the political empowerment gap. On the Economic Participation and Opportunity subindex, the highest-ranking country (Norway) has closed over 84% of its gender gap, while the lowest ranking country (Syria) has closed only 25% of its economic gender gap. There is similar variation in the Political Empowerment subindex. The highest-ranking country (Iceland) has closed almost 75% of its gender gap whereas the two lowest-ranking countries (Brunei Darussalam and Qatar) have closed none of the political empowerment gap according to this measure.

Presence of faces in House of Cards TV Series by episode

I was wondering whether the presence of faces in video content was an indicator of anything, and if so, of what. So I decided to scan episodes of a popular TV series and analyze them, second by second, for the number of faces in video frames, and then compare charts of various episodes. Here is the result of this research.

I decided to analyze House of Cards, partly because it’s a great series, but also because it’s character-focused, so there are many scenes with a lot of people. I built an interactive viewer that shows which faces were recognized at a particular point in time in Episode 3, which contains a variety of scenes with many people in them.

Launch the viewer, or continue reading for a short description of the technology.

Launch the House of Cards Face Recognition Interactive Viewer

Technology

To pull this off, I used the OpenCV computer vision library, which is quite capable of recognizing faces. As the computer watches TV, this tool scans every frame for faces and, if it finds any, returns the relevant rectangles so they can be drawn or extracted and saved.

Here’s a screenshot of a scene in a church. It’s immediately apparent that the tool doesn’t do a very good job, since many faces remain unrecognized. Still, many are recognized.

Recognized faces in House of Cards

Recognized faces in the church scene, Episode 3

In this frame below, more faces are recognized.

hoc-0201

There are also many false positives. The computer sometimes thinks that something is a face when it most certainly is not, as in the picture below. If one looks carefully, one can sometimes see something face-like in these rectangles.

hoc-0018

To construct the viewer, I extracted individual faces from the frames so I could display them on the page. They come in various sizes.

To construct the charts, I just counted the faces in each second, then displayed the time series for each episode.
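The counting step could look something like the sketch below. It assumes the detector has already produced one face count per frame, and it takes the highest count seen within each second as that second's value; that aggregation rule, the function name `faces_per_second` and the sample numbers are assumptions, not a description of the actual script.

```python
from collections import defaultdict


def faces_per_second(frame_counts, fps):
    """Collapse per-frame face counts into one value per second of video.

    `frame_counts[i]` is the number of faces found in frame i, and `fps` is
    the frame rate. Each second gets the largest count among its frames.
    """
    per_second = defaultdict(int)
    for frame_index, faces in enumerate(frame_counts):
        second = int(frame_index // fps)
        per_second[second] = max(per_second[second], faces)
    # Frames are consecutive, so the seconds run from 0 upward without gaps.
    return [per_second[s] for s in range(len(per_second))]


# Made-up example: ten frames at 4 frames per second.
print(faces_per_second([0, 1, 2, 1, 0, 0, 0, 0, 3, 3], fps=4))
```

Plotting the returned list against time gives one timeline per episode.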

Results

This is the final chart. It’s a series of timelines that show how many faces were recognized per second. Why are some lines orange, and some yellow?

As the scanning of video frames progressed, some faces were recognized in only one of the 23 frames in a given second. Other faces were recognized in more frames, and others in yet more. I thought this would be a good indicator of face detection reliability, but that’s not so. If it tells anything, it’s how steady the camera was in that section.

House of Cards face recognition charts by episode

My inspiration was small multiples, a visualization technique which allows for easier comparison of several datasets from the same domain. Wikipedia says:

A small multiple (sometimes called trellis chart, lattice chart, grid chart, or panel chart) is a series or grid of small similar graphics or charts, allowing them to be easily compared. The term was popularized by Edward Tufte.

According to Tufte (Envisioning Information, p. 67):

At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.

 

As always, if anyone is interested in the code, mail me. My address is on the About page.

My brainwaves during the final episode of Breaking Bad

This is a follow-up to the first self-quantifying post here, my heart rate during the latest episode of Game of Thrones. See also Graphs of recognized faces per second in House of Cards episodes. This time I thought it’d be fun to measure my brainwaves while watching a critical episode of another TV show.

Breaking Bad is a great TV show, I really recommend it. Even Anthony Hopkins wrote a much publicized fan letter to the crew and the main actor. I watched it avidly until the episode with the fly. Then I took a pause that somehow extended itself up until the finale.

After that, all my information came from the media and from my girlfriend, who still watched it regularly. So these measurements were taken by someone with little emotional involvement with the onscreen characters.

What do brainwaves measure, and what do the levels mean? Here’s a quote from Wikipedia:

  • delta: adult slow-wave sleep, in babies, has been found during some continuous-attention tasks.
  • theta: young children, drowsiness or arousal in older children and adults, idling, associated with inhibition of elicited responses (has been found to spike in situations where a person is actively trying to repress a response or action).
  • alpha: relaxed/reflecting, closing the eyes, also associated with inhibition control, seemingly with the purpose of timing inhibitory activity in different locations across the brain.
  • beta: alert, active, busy, or anxious thinking, active concentration.
  • gamma: displays during cross-modal sensory processing (perception that combines two different senses, such as sound and sight), also is shown during short-term memory matching of recognized objects, sounds, or tactile sensations.

There’s also mu, but the Mindwave doesn’t measure it.

Here’s the EEG graph overlaid on the frames. The EEG values have been averaged per shown frame.

The colors are:

  • red: low alpha,
  • orange: high alpha,
  • pink: low beta,
  • light blue: high beta,
  • green: Attention (synthetic NeuroSky value).

Breaking Bad final episode EEG chart
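The per-frame averaging behind the chart above can be sketched as follows. It assumes the readings are stored as (seconds, value) pairs and that one frame is shown per fixed interval; the function name `average_per_frame`, the interval and the sample readings are hypothetical.

```python
from statistics import mean


def average_per_frame(samples, frame_interval):
    """Average EEG readings over the time span covered by each shown frame.

    `samples` is a list of (seconds_from_start, value) pairs and
    `frame_interval` is the number of seconds each shown frame stands for.
    Returns {frame_index: mean value}, skipping frames without readings.
    """
    buckets = {}
    for seconds, value in samples:
        buckets.setdefault(int(seconds // frame_interval), []).append(value)
    return {frame: mean(values) for frame, values in sorted(buckets.items())}


# Made-up readings, two per second, with one shown frame per second.
readings = [(0.0, 10), (0.5, 20), (1.0, 30), (1.5, 50), (2.0, 40)]
print(average_per_frame(readings, frame_interval=1.0))
```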

To measure the brainwaves, I used the NeuroSky Mindwave. It’s a convenient and portable personal EEG. It’s a little limited, and one has to learn how to use it properly, but it has a professional quality DSP chip that it uses to calculate two levels the company calls “Attention” and “Meditation”. It also outputs standard alpha, beta, gamma, theta and delta waves.

It looks like this:

Neurosky Mindwave

By “limited” I mean that it samples brainwave data only twice a second. So whatever is happening in your brain right now shows up in the measurements up to half a second later.

This is the “attention” chart during the episode:

Breaking Bad final episode EEG chart (attention)

Here is the video with onscreen readings. It’s just another way of presenting the same data as in the picture above, except that more brainwave frequencies are shown.

Breaking bad final episode fast forward with EEG readings from Marko O’Hara on Vimeo.

I hope I’m not in copyright violation for that video. It’s essentially unwatchable story-wise.

I’m not totally satisfied with the images and video produced here, but I’m not watching the episode again. I must also admit that I can’t really interpret the charts and video. Attention is self-explanatory, and elevated beta levels also mean increased attention, but do high alpha values mean that I was falling asleep? I was pretty alert while watching.

There’s also the possibility of interference. The EEG is essentially a very sensitive voltmeter that measures minute changes in potential. Twitching facial muscles, blinking, yawning and so on all interfere with the readings. I did look at my second monitor quite a few times to check that the data was being written to a file, so maybe some spikes come from that. All in all, I don’t think there are any spoilers here.

Here are some more charts:

Slovenian real estate prices mapped

There has recently been a flurry of activity by self-made mappers on the net, and major media have noticed. It seems that the proliferation of tools such as the excellent TileMill makes creating custom maps relatively painless, though still laborious.

In my experience, a major hurdle in this process is getting good data. Governments and corporations around the globe have made acquiring the goods easier, but the quality frequently leaves one wanting. More about this particular dataset later.

This map is my attempt to visualize real estate prices in Slovenia. Buildings are colored according to the most expensive unit they contain, except in some cases where data is bad. More below.

See the map!

A map of real estate prices in Slovenia.

About the dataset

This dataset is provided by GURS, a government institution. I used it before, to make the map of structure ages in Ljubljana. It comes in a variety of formats, such as SHP (geometry) and text (building properties) files, which were clearly dumped from database tables.

It has some severe problems. For example, some bigger and more expensive buildings contain many units, but these units all hold the same value regardless of their useful area. To make matters more complicated, other multiunit buildings hold different values for the units they contain; they are, in other words, recorded correctly. Then there are building compounds, like the nuclear power plant in Krško, in which every building clearly holds the exorbitant value of the entire compound. Some other buildings have a value of zero, and so on.

All of this doesn’t even start to address the quality of the valuation the government inspectors performed. In the opinion of many property owners, the values are too low. There’s a new round of valuation coming, in which the values are reportedly bound to drop by a further five to twenty percent, if I remember correctly. It will be interesting to make another map with the valuation differences some day.

Massaging the data

This means that the above map is my interpretation of the dataset beyond the visualization itself. In calculating values for visualization, there were several decisions I made:

  • For multiunit buildings, I calculated the price per square meter for every unit, then colored the building according to its most expensive unit. This was necessary because some buildings contain many communal areas, garages and parking lots, which are all valued independently. I first tried a simple average, but apartment buildings with many parking boxes and garages then came out deceptively cheap. I wanted the map to be apartment-oriented, so this decision was needed to make it reflect the market more accurately.
  • For incorrectly recorded buildings, in which every unit carries the same (high) value, I took the price of one unit and divided it by the sum of unit areas. I could have divided by the area of one unit only, but which one? There’s no easy answer. The average seemed the way to go.
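The two rules above can be sketched in Python as follows. This is a simplified illustration rather than the actual processing script: the function name `building_price_per_m2` is made up, the sample values are invented, and detecting an incorrectly recorded building by checking that all of its units share one identical value is an assumption.

```python
def building_price_per_m2(units):
    """Return the price per square meter used to color one building.

    `units` is a list of (value_eur, area_m2) pairs. Units with a zero value
    or zero area are ignored. Returns None if no usable unit remains.
    """
    priced = [(value, area) for value, area in units if value > 0 and area > 0]
    if not priced:
        return None
    distinct_values = {value for value, _ in priced}
    if len(priced) > 1 and len(distinct_values) == 1:
        # Assumed sign of a badly recorded building: every unit carries the
        # same value, so treat it as the value of the whole building and
        # spread it over the summed area of all units.
        return priced[0][0] / sum(area for _, area in priced)
    # Normal case: the building takes the price of its most expensive unit.
    return max(value / area for value, area in priced)


# Made-up examples: an apartment with a garage, then a badly recorded building.
print(building_price_per_m2([(100000, 50), (5000, 12.5)]))
print(building_price_per_m2([(900000, 100), (900000, 200)]))
```

Taking the maximum rather than the mean is what keeps cheap garages and parking boxes from dragging an apartment building down.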

I also made a list of the most expensive buildings by their total value in euros. Individual unit values were summed, except in the cases described in the second bullet point above; there I simply took the price of one unit. It’s accessible as a separate vector layer under the “Most expensive buildings” menu item.

Findings

Turns out the most expensive buildings are mostly power plants, which is not surprising. In Ljubljana, two of the most expensive buildings were completed recently. Well, the Stožice stadium was not really completed. I don’t know whether it was paid for or not – this is a discourse best suited for political tabloids. See the gallery:

It’s also hardly surprising that the capital and the coast are the areas with the most expensive real estate. The state of the city of Maribor is sad to see, though, at least in comparison to Ljubljana.

I suggest taking the tour in the map itself, where I go into a little more depth for some towns and cities. Also, be sure to click “Most expensive buildings”, then hover the mouse pointer over the highlighted buildings to get an idea of their total cost and price per square meter, which in many cases diverge dramatically.

Here are two charts showing price/m2 distribution at different intervals in time.
This one is an all-time chart. Most buildings are valued low, since all ages were taken into account.
realestate-chart-m2

This one shows the period between year 2008 and now, in other words, since the crisis struck. Nevertheless, more expensive buildings seem to prevail. No wonder, since they are new. But that probably also means that there’s more apartment building construction relative to countryside development. I’m not really a real estate expert, so if anyone has a suggestion, comment away.

realestate-chart-m2-2008

Credits

Inspiration for the tour was this excellent visualization by the Pulitzer Center.

I also have to thank the kind people at GURS for providing me with data. They know it’s flawed somewhat, but all in all it’s not so bad.

Disclaimer

As I’ve noted before, this map is a result of my interpretation of government data. I’m in no way responsible for any misunderstandings arising from this map. If you want to see the actual valuation of your building or building unit, please consult GURS or use their web application to find out.

See also

Structure ages map in Ljubljana.