by virostatiq - December 24, 2013 (updated April 11, 2014)

Slovenian business activity by city as animated heatmaps

A few months ago, while researching the business hours of various categories of establishments in Slovenia, I thought it would be nice to visualize the density of open establishments on a map. I decided on a heatmap style, although I later discovered that my chosen implementation had some drawbacks.

Getting the data

Data on the business hours of commercial establishments is traditionally not open, for many reasons. Two of them are that (1) this information can be commercially exploited, and (2) opening hours change frequently, so keeping the database current and reliable takes considerable effort from its owner.

First I toyed with the idea of crawling the entire directory of odpiralnicasi.com. Then I thought about making a version for London, Amsterdam or San Francisco with Yelp data, for which I would have to crawl an entire Yelp city directory, a task I’m not sure would succeed. Yelp would probably block my IP before I could harvest a significant portion of what interested me.

So I decided to use the Najdi.si maps business directory. Disclosure: I work there, so I have access to the database with various business data, which is kept current.

For every company, I extracted only the name, geo coordinates, business hours and business category, and then constructed the animated maps. Before I delve into that, here is a short video of economic activity in Slovenia over the course of a typical Monday.

Economic activity in Slovenia from Marko O’Hara on Vimeo.

The animated chart you see at the bottom shows the number of active establishments in various economic categories, such as Restaurants and catering, Industry, Shopping, etc. The full list is:

blue: Computers and IT,
red: Restaurants and catering,
green: Home and garden,
yellow: Beauty and health,
pink: General business,
orange: Free time,
violet: Industry,
magenta: Culture and schooling

Rendering the maps and constructing the visualization

Rendering one frame for one city at a specific time is just a matter of setting the appropriate latitude, longitude and zoom level on the map, selecting the desired time and plotting all establishments that are open at that time. I used Processing to do that, and for the heat map part I used this excellent example by Philipp Seifried. As a finishing touch, I made the maps switch between day and night styles at appropriate times.
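The "plot everything that is open at that time" step can be sketched as a simple filter. This is a minimal Python illustration, not the original Processing code: the `Establishment` fields, the minutes-after-midnight representation and the handling of hours that run past midnight are all assumptions, and the sample entries are made up.

```python
from dataclasses import dataclass


@dataclass
class Establishment:
    name: str
    lat: float
    lon: float
    category: str
    opens: int   # minutes after midnight
    closes: int  # minutes after midnight; smaller than `opens` if open past midnight


def is_open(e, minute):
    """Return True if establishment `e` is open at `minute` (0..1439)."""
    if e.opens == e.closes:
        return True  # assumption: identical times mean open around the clock
    if e.opens < e.closes:
        return e.opens <= minute < e.closes
    # Opening hours wrap around midnight, e.g. 18:00 to 02:00.
    return minute >= e.opens or minute < e.closes


def open_at(establishments, minute):
    """Select the establishments to plot for the frame at `minute`."""
    return [e for e in establishments if is_open(e, minute)]


# Hypothetical sample data, for illustration only.
sample = [
    Establishment("Shop A", 46.05, 14.51, "General business", 8 * 60, 20 * 60),
    Establishment("Bar B", 46.06, 14.50, "Restaurants and catering", 18 * 60, 2 * 60),
]
print([e.name for e in open_at(sample, 12 * 60)])  # frame at noon
print([e.name for e in open_at(sample, 23 * 60)])  # frame at 23:00
```

Each video frame would then call `open_at` once with the frame's time and pass the resulting coordinates to the heatmap renderer.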

To render an entire video, I had to write a parallel rendering queue, otherwise rendering a single video would have taken an eternity. The Eclipse project is available by email request.

To complicate things a bit, I decided to include up to four different places on the same map, so the viewer could compare opening hours across economic categories in Ljubljana, or see how different cities wake up and go to sleep at different times.

A typical frame looks like this:

Video frame / comparison of business activity in Nova Gorica, Koper, Celje and Novo Mesto at noon

Here’s an example for different economic activities in Ljubljana:

Economic activity in Ljubljana – four categories from Marko O’Hara on Vimeo.

top left: General business
top right: Restaurants and catering
bottom left: Industry
bottom right: Beauty and health

Here’s a comparison between Ljubljana and the city of Maribor:

opentimes ljmb.mp4 from Marko O’Hara on Vimeo.

left: Ljubljana
right: Maribor

And here is a comparison of business activity in Nova Gorica, Koper, Celje and Novo Mesto:

opentimes kpnmceng.mp4 from Marko O’Hara on Vimeo.

top left: Nova Gorica
top right: Koper
bottom left: Novo Mesto
bottom right: Celje

Commentary

I mostly did this to be able to visually compare levels of business activity in Ljubljana. The heatmap technique I employed turned out to be somewhat unreliable for video, because it colors the dots relative to the highest concentration in the current frame. The concentration and the absolute number of active businesses change from frame to frame, so it can look as if there’s more activity at night than during the day.
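The effect is easy to reproduce with a few numbers. The sketch below is a Python illustration with made-up counts, not the heatmap code used for the videos. It compares scaling each frame by its own maximum with scaling all frames by one shared maximum, which is one possible fix. It assumes every frame contains at least one cell.

```python
def normalize_per_frame(frames):
    """Scale every frame by its own peak, so each frame's hottest cell is 1.0."""
    result = []
    for frame in frames:
        peak = max(frame) or 1  # avoid division by zero for an all-zero frame
        result.append([value / peak for value in frame])
    return result


def normalize_global(frames):
    """Scale every frame by the peak over the whole video."""
    peak = max(max(frame) for frame in frames) or 1
    return [[value / peak for value in frame] for frame in frames]


# Hypothetical counts of open businesses in three map cells.
noon = [120, 40, 10]
night = [6, 2, 1]

print(normalize_per_frame([noon, night]))  # the busiest cell is 1.0 in both frames
print(normalize_global([noon, night]))     # the night frame stays close to zero
```

With per-frame scaling the night frame looks as hot as the noon frame, because only the ratios within a frame survive. With a shared maximum the frames stay comparable, at the cost of a nearly blank map when few businesses are open.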

Even so, it’s clear that restaurants, bars and clubs are still pretty much open when other activity starts to die down.

This is Ljubljana at noon, again:

top left: General business
top right: Restaurants and catering
bottom left: Industry
bottom right: Beauty and health

The big spot in the northeast is the mall region, where an untold number of businesses operate in ten or more big malls. Business concentration there dwarfs everything else in the city, except maybe in the industrial category.

Below is Ljubljana at eight o’clock in the evening. Pretty much everything has closed down except for eating and drinking, and maybe the cinema in the mall.

Below: Ljubljana at ten o’clock in the evening. Some businesses don’t close down at all. I double-checked the primary data source and it’s true: there are, for example, cleaning services that stay open through the night.

I’m relatively satisfied with the results, except for the heatmap issue. I may correct that if I get the data for a bigger city.

by virostatiq - July 23, 2013 (updated April 11, 2014)

Food democracy: foodstuffs according to their democratic value

Update: we won the competition!

This is a contribution to the Memefest 2013 Food Democracy competition, made in collaboration with Miha Mazzini. Food democracy generally means more involved citizen participation in the food production and supply chain, but here we have a different take on the topic. In Miha’s words, taken from the project form:

Describe your idea and concept of your work in relation to the festival outlines:

Brain has developed as an organ to help us fill the stomach. A lonely stomach hunting and gathering in the savannah has less chance to survive than a group of them, so the societies have developed.

So, the content of the stomachs must mirror the structure of the society – what are the preferred foods for authoritarian regimes and what for democracies?

We took all of the recipes from Food.com and democracy indexes of The Economist and Wikipedia; we linked national cooking recipes with the countries, split recipes into ingredients and added democracy indexes to them.

What kind of communication approach do you use?
Spoof scientific report on real data.

What are in your opinion concrete benefits to the society because of your communication?

To see how very sweet life in democracy really is.

What did you personally learn from creating your submitted work?

Person should choose their restaurant even more carefully than the country.

Why is your work, GOOD communication WORK?

It’s fun getting some food for thought.

Launch the interactive visualization.

More details below.

Main idea:
To rank foodstuffs according to their democratic value, if such a thing exists.

Data sources:
Allrecipes.com and food.com were crawled, and structured information (ingredients, national provenance, …) was extracted from individual recipe pages (~150,000 of them).
Economist Intelligence Unit for democracy indices of individual countries.

Construction:
A network (graph in math parlance) was constructed such that each recipe’s country was associated with one of four main nodes, which represent four democracy groups: authoritarian (0–2.5 on the EIU scale), poor (2.5–5), good (5–7.5) and democratic (7.5–10). Then the recipe ingredients were connected to one of these groups. Finally, the ForceAtlas2 algorithm was run on the network, producing the result you see in the visualization.
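The grouping step can be sketched as follows. This is a Python illustration only, since the original graph was built in Java. The function names, the treatment of boundary scores (the ranges above overlap at 2.5, 5 and 7.5, so here a boundary value goes to the higher group) and the idea of counting recipes as edge weights are assumptions. The countries and scores are placeholders, not EIU data.

```python
from collections import Counter


def democracy_group(score):
    """Map an EIU democracy index (0 to 10) to one of the four group nodes."""
    if score < 2.5:
        return "authoritarian"
    if score < 5:
        return "poor"
    if score < 7.5:
        return "good"
    return "democratic"


def ingredient_edges(recipes, eiu_scores):
    """Count (ingredient, group) links over all recipes.

    `recipes` is a list of (country, ingredients) pairs and `eiu_scores`
    maps a country to its index. The counts can serve as edge weights.
    """
    edges = Counter()
    for country, ingredients in recipes:
        group = democracy_group(eiu_scores[country])
        for ingredient in set(ingredients):  # count each ingredient once per recipe
            edges[(ingredient, group)] += 1
    return edges


# Placeholder data, for illustration only.
scores = {"Country A": 8.0, "Country B": 2.0}
recipes = [
    ("Country A", ["sugar", "flour"]),
    ("Country B", ["sugar", "chili"]),
]
for (ingredient, group), weight in sorted(ingredient_edges(recipes, scores).items()):
    print(ingredient, "->", group, weight)
```

An edge list like this can be exported to a format that Gephi reads, where a layout algorithm then pulls each ingredient toward the groups it is most strongly linked to.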

Tools:
Java for crawling the net and original graph construction, Gephi for graph processing and original visualization, sigma.js for web presentation.

Authors:
Miha Mazzini (concept), Marko Plahuta (concept and programming / visualization / web presentation)

What we really found:
That the freshest, most unprocessed food is apparently very undemocratic, which is a side effect of poor countries usually not having a democratic form of government.

by virostatiq - April 9, 2013 (updated April 11, 2014)

K-means clustering with Processing.js

K-means clustering is an algorithm to quickly group a large quantity of data. It’s used in a variety of ways, from statistical analysis to improving the usability of user interfaces. If you read Google News, you’re probably familiar with the way they group similar news items together. When I first saw that, I thought there must be some serious language processing and semantics behind that – that they somehow extract meaning from the articles and group them together accordingly.

Turns out that it’s a little easier than that. Article abstracts are split into words, and for each article, a multidimensional vector consisting of these words is constructed. These vectors are then put into n-dimensional space, in which the number of dimensions corresponds to the total number of different words in all articles analyzed. Then a clustering algorithm is run on the articles in this space. It yields groups of correlated articles based on their word vectors.
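As a rough illustration of the word vectors, here is a minimal Python sketch. It assumes the simplest possible representation: whitespace tokenization and raw word counts, with no stemming or weighting. The function names and the two sample abstracts are made up.

```python
def build_vocabulary(texts):
    """Assign one dimension (index) to every distinct word in all texts."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in `text`."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        vector[vocabulary[word]] += 1
    return vector


# Made-up abstracts, for illustration only.
abstracts = ["markets fall", "markets rise again"]
vocabulary = build_vocabulary(abstracts)
vectors = [to_vector(text, vocabulary) for text in abstracts]
print(vocabulary)  # four distinct words, so four dimensions
print(vectors)
```

Every article becomes a point in a space with one axis per distinct word, and articles that share many words end up close together.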

One clustering algorithm to do that is k-means. It works like this (I cite Manning’s excellent book “Algorithms of the Intelligent Web” under fair use):

The k-means algorithm randomly picks k points that represent the initial centroids of the candidate clusters. Subsequently the distances between these centroids and each point of the set are calculated, and each point is assigned to the cluster with the minimum distance between the cluster centroid and the point. As a result of these assignments, the locations of the centroids for each cluster have now changed, so we reevaluate the new centroids until their locations stop changing. This particular algorithm for k-means is attributed to E.W. Forgy and to S.P. Lloyd, and has the following advantages:

It works well with many metrics.

It’s easy to derive versions of the algorithm that are executed in parallel, when the data are divided into, say, N sets and each separate data set is clustered, in parallel, on N different computational units.

It’s insensitive with respect to data ordering.

At this point you may wonder, what happens if the algorithm doesn’t stop? Don’t worry! It’s guaranteed that the iterations will stop in a finite number of steps. In practice, the algorithm converges quickly (that’s the mathematical jargon).
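The loop described in the quote can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of the Processing sketch: the function names, the `seed` parameter and the `max_iter` safety cap are mine, the distance is plain Euclidean, and a cluster that ends up empty simply keeps its previous centroid.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) for `points`, a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k data points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved between clusters, so the centroids are stable
        assignment = new_assignment
        # Move every centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean(members)
    return centroids, assignment


# Two small, well-separated groups of 3D points.
points = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (10, 10, 10), (10, 11, 10), (11, 10, 10)]
centroids, assignment = kmeans(points, k=2)
print(centroids)
print(assignment)
```

The result depends on the random initial centroids, so different seeds can give different clusterings on less clearly separated data.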

Of course, to do that on text in a production environment you would use something like Apache Mahout, preferably in combination with Solr. But it’s also possible to experiment in Processing, although for this experiment I won’t use text, but points in space. You can see this in the Processing.js applet below. It fills 3D space with random points, adds cluster nodes and then optimizes their positions so that every point belongs to a cluster, coloring the points belonging to each cluster in the same color. The resulting configuration is in fact a Voronoi diagram in 3D, which is related to Delaunay triangulation.

Click anywhere on the screen to restart, use the controls to refresh or stop, or select the numbers of clusters and points. I suggest downloading the original sketch here, as it runs much faster.
