Tagged: gephi

Social network diagrams of Slovenian governments between 1991 and 2013


Such a young country, but already so messed up. One is inclined to think that all is lost, and one would not be far from the truth. Much ink has already been spilled on the sad state of affairs in Slovenia, its fall from grace in the European Union, the precipitous decline in the living standard of its citizenry and its bleak outlook for the future. Did I mention the rampant corruption of its ruling class and top managers? Best not. This was, after all, supposed to be the next Switzerland.

Blaming the ruling class in mere abstract terms may give one a fleeting satisfaction, but who were the people who led us off the cliff? Someone did govern here, or at least gave the appearance of governing. The prime ministers are known: Lojze Peterle, Janez Drnovšek, Anton Rop, Andrej Bajuk, Janez Janša, Borut Pahor and currently Alenka Bratušek. These are the main culprits for the downward spiral, of which one can only hope we have already passed the first half. The names of their accomplices – the ministers, secretaries, etc. – have a tendency to drift into oblivion, as the majority of people are preoccupied with the daily grind.

So who were they and how are they connected? Here’s a diagram showing all the government members from 2001 on. I call it a “loyalty diagram”, since it was constructed to show who is close to whom, and who is hardly loyal to any alliance. The rationale, in short, is:

  • Ministers are considered to be very loyal to the prime minister (although I know they are not).
  • Secretaries are a lot less loyal to the prime minister, since they are essentially experts and not politicians.
  • Secretaries are less loyal to ministers than ministers are to the prime minister, but still a lot, since it’s the ministers who appoint them.
  • Secretaries are loyal to each other, since they are bureaucrats who like their positions and will in theory support each other, although in practice there exist many party rivalries.
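
One way to read this rationale is as a small table of edge weights between roles. The sketch below is only an illustration: the `LoyaltyWeights` class, the `Role` enum and every numeric value are assumptions, not the weights used for the diagram. It only preserves the ordering described in the list above.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical loyalty weights between government roles; all numbers are made up. */
final class LoyaltyWeights {
    enum Role { PRIME_MINISTER, MINISTER, SECRETARY }

    // Symmetric lookup table: the weight for (a, b) equals the weight for (b, a).
    private static final Map<Role, Map<Role, Double>> WEIGHTS = new EnumMap<>(Role.class);

    private static void put(Role a, Role b, double w) {
        WEIGHTS.computeIfAbsent(a, k -> new EnumMap<Role, Double>(Role.class)).put(b, w);
        WEIGHTS.computeIfAbsent(b, k -> new EnumMap<Role, Double>(Role.class)).put(a, w);
    }

    static {
        put(Role.PRIME_MINISTER, Role.MINISTER, 1.0);  // ministers: very loyal to the PM
        put(Role.PRIME_MINISTER, Role.SECRETARY, 0.2); // secretaries: a lot less
        put(Role.MINISTER, Role.SECRETARY, 0.6);       // less than minister to PM, but still a lot
        put(Role.SECRETARY, Role.SECRETARY, 0.4);      // fellow bureaucrats support each other
    }

    /** Edge weight for two people in the same government; 0 if the list has no rule for the pair. */
    static double weight(Role a, Role b) {
        return WEIGHTS.getOrDefault(a, Map.of()).getOrDefault(b, 0.0);
    }

    public static void main(String[] args) {
        System.out.println(weight(Role.MINISTER, Role.SECRETARY));
    }
}
```

A layout algorithm that pulls strongly weighted pairs together would then place a prime minister's ministers close to the red dot and the secretaries further out.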

Click the link or image below to launch the interactive diagram, which can be searched, panned, and zoomed, and which shows details for every staff member in the government. Red dots are prime ministers, bright blue dots are ministers, dark blue dots are secretaries. Every person is marked with the color of the highest position they held.

Launch the interactive loyalty diagram

Social network of staff in Slovenian governments 2001-2013

A few lines of commentary:

  • There are a select few loyal party cadres that every prime minister carries with him or her, and who very rarely, if at all, work with anyone else. These are the dark blue and bright blue dots in close proximity to the red dots (prime ministers).
  • Node radius is proportional to how many times the individual sat in a government over the years. For example, Janez Janša was not only prime minister twice, he also served in other capacities, most notably as Minister of Defense in 1994, and took on more and more departmental duties as his government slowly disintegrated in 2012.
  • There is a big cluster of common cadres between Janez Drnovšek’s and Anton Rop’s governments. It seems that a lot of secretaries are passed on into the next mandate, except when power shifts between left- and right-wing governments, which perform a purge on inauguration.
  • Anton Rop had the most secretaries and the biggest government. If anything, the governments are getting slimmer with time.
  • People in the middle of the diagram are generally dragged there by their many ties with different prime ministers and ministers, so they are either the most politically promiscuous or (theoretically) the best experts in their fields, a theory swiftly disproven by the fact that they took on ministerial duties in vastly different departments. These are the most die-hard bureaucrats, who mostly didn’t do much else in life except being politicians. For the sake of argument, let’s suppose there are exceptions even among them.

Here is how the social network of government actors evolved over time:

Growth of social network of Slovenian government members 2001-2013 from Marko O’Hara on Vimeo.


The next diagram shows the connections of the same cadres to their respective fields of work. Green dots are government offices, other colors are the same as in the diagram above. Here one can see, for example:

  • Who is walking in the corridors of true power: prime ministers like to keep the Department of Defense, the Department of Finance and the Department of Internal Affairs close. People close to these offices are the movers and shakers.
  • How different the governments of Slovenia truly were: departments were clumped together with other departments over time, split, and clumped again with yet other departments. There’s hardly a department which survived this period without being split or clumped, most notably the Department of Defense.
  • Who held which functions, and how different departments are connected with various people.

Launch the interactive diagram of employment by government office

Social network of members and government offices in Slovenian governments 2001-2013

Here is a short video of how all this evolved over time:

Growth of Slovenian government members and ministries 2001-2013 from Marko O’Hara on Vimeo.


Data sources

All data was kindly provided by the Government of the Republic of Slovenia. Download the CSV version here. If anyone wants the original documents, e-mail me. My address is on the About page.


Graphs were constructed in Java and exported to Gephi for visualization, then exported again to the web-friendly sigma.js format.

Correction: it’s actually from 1991 to 2013.

Data-driven drinking: ingredients and brands in the cocktail shaker


A better title for this post would probably be “What kinds of booze to drink together to get drunk in style, according to those who write, compile, publish, test and enjoy cocktail recipes”. Continuing from previous posts, I wanted to see what a network of the ingredients of all possible cocktail recipes looks like, and whether it’s possible to divide them into sensible groups, so that they would be instantly recognizable and even helpful to experienced and casual drinkers alike.

To do this, more than 25,000 recipes from Drinksmixer.com and Drinksnation.com were scraped, and a network was constructed with Gephi and visualized below. Dot size reflects how often that particular ingredient appears across all analyzed recipes. Dots of the same color frequently appear together in recipes. One could say that one can hardly make a mistake if one combines three ingredients of the same color and drinks the concoction.
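
For the curious, here is a minimal sketch of how such a network could be counted from recipes before handing it to Gephi. It is not the code used for this post: the `IngredientNetwork` class, its method names and the three sample recipes are made up for illustration. Ingredient counts would drive dot size, and pair counts would become edge weights; the coloring into groups is a separate step that is not covered here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class IngredientNetwork {
    /** Counts in how many recipes each ingredient appears (node size). */
    static Map<String, Integer> ingredientCounts(List<Set<String>> recipes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<String> recipe : recipes) {
            for (String ingredient : recipe) {
                counts.merge(ingredient, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Counts how often each unordered pair shares a recipe (edge weight). Keys look like "gin|lime juice". */
    static Map<String, Integer> pairCounts(List<Set<String>> recipes) {
        Map<String, Integer> pairs = new TreeMap<>();
        for (Set<String> recipe : recipes) {
            List<String> sorted = new ArrayList<>(recipe);
            Collections.sort(sorted); // sorting makes "a|b" and "b|a" the same key
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    pairs.merge(sorted.get(i) + "|" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Invented sample recipes, not taken from the scraped data.
        List<Set<String>> recipes = List.of(
                Set.of("vodka", "orange juice", "ice"),
                Set.of("vodka", "cranberry juice", "ice"),
                Set.of("gin", "lime juice"));
        System.out.println(ingredientCounts(recipes));
        System.out.println(pairCounts(recipes));
    }
}
```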

The map below is interactive; try panning and zooming with the mouse, or use the control in the upper left-hand corner.

I see five major groups of ingredients, but your alcohol proof may vary. I actually expected something like this:

  • ice is in its own group. For some reason it also contains tequila,
  • milky drinks are in their own group (gray-blue),
  • salty and spicy drinks are also in an easily recognizable group (pink),
  • blue group is dominated by vodka and rum,
  • green group mostly has gin and tangy juices, and
  • red group mostly contains fruit schnappses and liqueurs.

You can download hi-res static images here: black background | white background.

For a more mobile-friendly, searchable map with advanced interactivity, click here (Sigma.js). Clicking on an ingredient on this map will show a list of all connected ingredients. Clicking on an element in the list will show a subgraph.

Most recipes contained preferred brands for spirits and fruit juices, so I constructed another diagram. It shows which brands are usually grouped together in drinks.

Here is the interactive map:

Download hi-res static images here: black background | white background.
For a searchable map with advanced interactivity, click here. Clicking on a brand on this map will show a list of all connected brands.

I find it funny that Everclear, Kool-Aid and Mountain Dew are so close. Does that mean that people just pour 100% ethanol and caffeinated water in a jug and drink that? Possibly.


Coming up next: data-driven cooking.

Visualizing drug talk on bluelight.ru


In mainstream media, there’s not a lot to be found about recreational drugs except horror stories and arguments for prohibition. From time to time we also hear that Steve Jobs liked to drop acid when he was young, that countless Vietnam vets easily kicked their heroin habit upon coming home, and, as a US federally sponsored study found, that psychedelic mushrooms can bring a lasting and positive personality change in more than half of those who take them.

Where to find good information? There are internet communities, so-called harm-reduction forums, where one can spend a few hours to discover that the truth is not black and white. Surely junkies exist, and using meth daily is not a life strategy anyone could recommend, but not all drugs were created equal. There are many classes of recreational drugs, each acting on specific chemical pathways in the body – uppers on dopamine, hallucinogens on serotonin, downers on GABA, etc.

Mapping drugs

I thought it would be nice to visualize these drug groups based on what users of harm-reduction forums say, so I analyzed around 1.2 million posts on bluelight.ru and constructed a simple diagram that tells a lot. It was constructed in such a way that drugs that are frequently mentioned together appear together. Circle radii are proportional to how frequently each drug appears in the posts. The methodology is explained at the bottom of the post.

Here’s the diagram, pan and zoom at will:

Click here to peruse a clickable, searchable version of the same diagram (give it a second to load). To download a high-resolution image (8000 x 6000), click here (black) or here (white).

The drug groups are color coded for better readability. Starting from the top:

  • light blue group: mostly antidepressants – SSRIs such as Prozac (fluoxetine), Zoloft and such.
  • violet group: mainly contains benzodiazepines such as Xanax, Valium, and lorazepam, which are commonly abused, but there are a lot of other downers there.
  • orange group: opiates and opioids, such as heroin, OxyContin and the like. There were so many mentions of “opiates” without referring to a specific chemical that I considered it would be a pity to leave the word out.
  • dark yellow group on the right: mostly dissociatives such as ketamine and DXM, but there’s also a subgroup on the right side. It forms a larger group, mixed with differently colored drugs, that could be called “shamanic corner”, as it mostly contains so-called entheogens and natural concoctions such as ayahuasca.
  • light orange group: mainly nootropics such as Piracetam. Some use them to enhance a psychedelic or MDMA experience, but they have a more general use as memory, intelligence and sensory enhancers.
  • red group: I don’t know what to call this, but these are “working man’s drugs”. The common drugs that we hear about in the media. Some of these drugs are not considered drugs at all, for example alcohol and tobacco, but the Bluelight discussions show that they are very common. Thinking about it, one must have something to drink while one insufflates synthetic powders, and a cigarette is also a good thing to have while waiting for something stronger to take hold.
  • green group: psychedelic drugs such as shrooms, LSD, DMT and mescaline, along with many newer variations and analogs, such as the 2C-X family, the DMT analogs and the whole TiHKAL inventory.
  • blue group: Ecstasy (MDMA) and newer stimulants and entactogens, such as methylone, mephedrone, etc. “Plant foods” and “bath salts” are in this category.

Mapping effects

Simply mapping out the drugs is nice, but an additional step seemed in order: mapping the co-occurrence of the various effects the drugs have on users. Again, posts were analyzed, but in addition to drugs, some (not all!) common effects were extracted and mapped in a network. The result is in the diagram below. Darker dots are effects, lighter ones are drugs. Size is again proportional to the number of mentions in all posts.

Click here to peruse a clickable, searchable version of the same diagram. To download a high-resolution image, click here (black) or here (white).

Note that the above diagram does not indicate semantic relationships between drugs and their effects. For example, why is “marijuana” close to “death”? Maybe there was a lot of talk about fear of death that the marijuana experience helps to resolve, or maybe people like to describe how they are dying of laughter while smoking weed. I honestly don’t know. I suspect it’s because of the close relationship between mentions (not necessarily use!) of marijuana and those of alcohol, cocaine and methamphetamine, which could have a more significant relation with death or dying.
What’s really notable is the heavy clustering of adverse effects around opiates, and the relative absence of the same around psychedelics. Based on Bluelight data, I can safely conclude that psychedelic drugs do not cause users to complain a lot, except maybe mentioning hallucinations and visuals, but, well …

Drug use over the years

My whole database contains posts from 2010 until March 2013. Here’s an analytical tool to better understand what’s going on in the recreational drug community. Time is on the horizontal axis, while the proportion of posts mentioning a specific drug relative to all posts in that month is on the vertical axis.
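
The vertical-axis value is simple arithmetic: posts mentioning the drug in a month, divided by all posts in that month. Here is a minimal sketch of that calculation; the `MentionShare` class, the `Post` shape and the plain substring match are assumptions for illustration and stand in for the real index queries.

```java
import java.time.YearMonth;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class MentionShare {
    /** A forum post reduced to its month and text; this shape is an assumption. */
    static final class Post {
        final YearMonth month;
        final String text;

        Post(YearMonth month, String text) {
            this.month = month;
            this.text = text;
        }
    }

    /** For every month, the fraction of that month's posts whose text contains the term. */
    static SortedMap<YearMonth, Double> monthlyShare(List<Post> posts, String term) {
        String needle = term.toLowerCase();
        SortedMap<YearMonth, int[]> tally = new TreeMap<>(); // month -> {mentions, total}
        for (Post post : posts) {
            int[] t = tally.computeIfAbsent(post.month, m -> new int[2]);
            if (post.text.toLowerCase().contains(needle)) {
                t[0]++;
            }
            t[1]++;
        }
        SortedMap<YearMonth, Double> share = new TreeMap<>();
        for (Map.Entry<YearMonth, int[]> e : tally.entrySet()) {
            share.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return share;
    }
}
```

Dividing by the monthly total is what makes months with very different posting volumes comparable.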

Play around with the interactive chart to discover emerging trends, or simply to behold the wax and wane of specific chemicals as they compete for users’ neurological apparatuses, while their manufacturers temporarily evade ever stricter analog laws:

Commentary: Bluelight is a harm reduction forum, historically established so that users could tell a good Ecstasy pill from a bad one, so MDMA is the most mentioned drug. Use of “classic” drugs doesn’t change much, but it’s interesting to note the rise of new “research chemicals” such as the NBOMe family, new cathinones (3-MMC), new synthetic cannabinoids (STS-135) and different amphetamines, predominantly methamphetamine. You can also see how newly banned drugs, for example mephedrone, go out of use, and their analogs, in this case 3-MMC, replace them.

Methodology and tools

First, all the Bluelight forums were crawled, and the contents, dates and other metadata of all posts were put into a SOLR index. That took approximately two days of not too aggressive load on their server (thanks Bluelight for not banning my IP).
To make the first two network diagrams, undirected graphs were constructed with the JGraphT library so that all extracted entities – drugs and effects – in every post were connected as nodes. Mentions of all extracted entities were counted so that dot sizes show frequencies, not network degrees. That yielded complete graphs to be visualized with Gephi. Gephi files were exported to a TileMill-friendly format to render map tiles. Tiles are displayed on the site using Leaflet.
To make the interactive chart, SOLR was used to produce time series. Data was then packed into a suitable format for the Flot library to display.
To extract entities, two dictionaries were used – one for drugs, one for effects. You can download them here: drugs / effects.
If anyone is interested in the SOLR core, I can put it on Dropbox. Send me a note, my email is on the About page.
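
As an illustration of the dictionary-based extraction described above, here is a minimal sketch in plain Java. It is not the code used for the post: the `EntityExtractor` class and the idea of mapping surface forms to one canonical name are assumptions about how such a dictionary could look, and the sketch only matches single-word entries.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class EntityExtractor {
    // Lower-case surface form -> canonical entity name (hypothetical dictionary layout).
    private final Map<String, String> dictionary;

    EntityExtractor(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Returns the canonical names of all dictionary entries that occur as whole words in the post. */
    Set<String> extract(String post) {
        Set<String> found = new TreeSet<>();
        // Split on anything that is not a letter, a digit or a hyphen, so "2c-b" stays one token.
        for (String token : post.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}-]+")) {
            String canonical = dictionary.get(token);
            if (canonical != null) {
                found.add(canonical);
            }
        }
        return found;
    }
}
```

Each post then yields a set of entities: every entity in the set adds one to that entity's mention count, and every pair within the set becomes (or strengthens) an edge.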

What is not here, but could be

  • analysis of effects that specific drugs have over time
  • a chart of effects only
  • some different visualization that could help to establish relationships between specific drugs and effects they have. For example, it’s been known for some time that mephedrone and various dragonflies have vasoconstrictive effects. Maybe some other relationship could be inferred that way.
  • first map should be clickable to search on Wikipedia, I’ll add that as soon as I figure out the Wax lib.

I may revisit this theme in the future.

Some pics:

Drug talk visualizations

How the social network of Hollywood actors evolved over time


This is a short update on the previous post, which is a static visualization of relatedness between movies, genres and tags, as seen on IMDB. It’s produced from the same dataset of around 15,000 films, grabbed from IMDB.

actors at 2010

It shows the social network of actors and its growth and changes over time. Every movie was analyzed, and a network constructed such that actors who worked together in a movie are connected. This connection also has a time dimension: from the first time two actors appear in the same movie, the network considers them friends. Of course the actors have their own creative preferences and work with other people over time, so they move around the space – at first they are associated with one group, then with another.

The video below shows the animated network, as it evolved from 1960 to 2013. Only actors with more than 100 connections are shown.
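
The time dimension can be sketched as edges stamped with the year two actors first shared a film. The code below is an illustration under assumptions, not the processing used for the video: the `ActorNetwork` class and its methods are invented, and actor names must not contain the `|` character used in the pair keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class ActorNetwork {
    // "a|b" (names sorted) -> first year the two actors appeared in the same film
    private final Map<String, Integer> firstMet = new HashMap<>();

    /** Connects every pair in the cast, keeping the earliest year for each pair. */
    void addFilm(int year, List<String> cast) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(cast));
        for (int i = 0; i < sorted.size(); i++) {
            for (int j = i + 1; j < sorted.size(); j++) {
                firstMet.merge(sorted.get(i) + "|" + sorted.get(j), year, Integer::min);
            }
        }
    }

    /** Number of distinct co-stars each actor has accumulated up to and including the given year. */
    Map<String, Integer> degreesAt(int year) {
        Map<String, Integer> degree = new TreeMap<>();
        for (Map.Entry<String, Integer> e : firstMet.entrySet()) {
            if (e.getValue() > year) {
                continue; // this pair has not met yet
            }
            for (String actor : e.getKey().split("\\|")) {
                degree.merge(actor, 1, Integer::sum);
            }
        }
        return degree;
    }

    /** Actors with more than minConnections co-stars at the given year (100 in the video). */
    Set<String> visibleAt(int year, int minConnections) {
        Set<String> visible = new TreeSet<>();
        degreesAt(year).forEach((actor, d) -> {
            if (d > minConnections) {
                visible.add(actor);
            }
        });
        return visible;
    }
}
```

Rendering one frame per year from `visibleAt(year, 100)` and the edges that exist by then would give the kind of growth animation shown in the video.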

It’s interesting to follow Robert De Niro’s associations. He starts off center, moves to center and then separates himself from the majority along with select other actors, which might be some kind of Hollywood elite.

Also note how Christopher’s career flowers and then wanes, as the century comes to a close.

Drop me a note in the comments if you want a bigger or more detailed visualization. This one was made with Gephi and some heavy processing and filtering before that.


How to publish Gephi graphs on MapBox with TileMill


This is a simple how-to on publishing Gephi graphs as a tile-based, zoomable map suitable for online presentation. It’s intended for Gephi users who find other solutions lacking, or who would simply like to learn to publish with a free, cloud-based service.

If you have a Gephi graph and would like to publish it, you can export node and edge data in one of the standard formats and use it to render tiles for Leaflet, Google Maps or other suitable APIs. This requires a server to store the tiles and some knowledge to render them and use the API.

There’s another possibility – you can use TileMill to render the tiles and the free online service MapBox to host and display them. So here’s how to do it. I used this procedure to render the map here.

Exporting the graph data to TileMill

There are several ways to do that. This one is relatively easy, but it requires some programming knowledge, as you have to use the Gephi Toolkit.

First make the graph. You can use Gephi or Gephi Toolkit. I use Gephi, since it lets you visually inspect the graph, run additional corrective layout algorithms and so on. Save the graph as a .gephi file.


The graph is already spatialized and analyzed for modularity classes, so the node information in the .gephi file contains at least coordinates (x, y), modularity classes, labels and sizes. TileMill can import CSV, so that is what we are going to produce using Gephi Toolkit.

Here is the Java code:

import au.com.bytecode.opencsv.CSVWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import org.gephi.graph.api.DirectedGraph;
import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.gephi.graph.api.Node;
import org.gephi.io.exporter.api.ExportController;
import org.gephi.project.api.ProjectController;
import org.gephi.statistics.plugin.Modularity;
import org.openide.util.Lookup;

public class RGraph {
    private static final String root = "C:\\Users\\solipsy\\Documents\\!Data\\Gephi\\";

    public void openGephi(File file) {
        // Work on the current project and its workspace. Loading the .gephi file
        // into that workspace is not shown in this listing and is assumed done.
        ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
        if (pc.getCurrentProject() == null) {
            return;
        }
        GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getModel();

        // See if graph is well imported
        DirectedGraph graph = graphModel.getDirectedGraph();
        System.out.println("Nodes: " + graph.getNodeCount());
        System.out.println("Edges: " + graph.getEdgeCount());

        // Export to CSV (tab-separated); try-with-resources closes the writer
        try (CSVWriter writer = new CSVWriter(new FileWriter(root + file.getName() + ".csv"), '\t')) {
            String[] header = "Latitude#Longitude#Modularity#Size#Label".split("#");
            writer.writeNext(header);
            for (Node n : graphModel.getGraph().getNodes().toArray()) {
                double x = n.getNodeData().x();
                double y = n.getNodeData().y();
                // Filter kept from the original listing: only nodes farther
                // than 3500 units from the origin are exported.
                if (Math.sqrt(x * x + y * y) > 3500) {
                    Object modClass = n.getAttributes().getValue(Modularity.MODULARITY_CLASS);
                    String[] entry = new String[5];
                    entry[0] = String.valueOf(n.getNodeData().x());
                    entry[1] = String.valueOf(n.getNodeData().y());
                    entry[2] = String.valueOf(modClass);
                    entry[3] = String.valueOf(n.getNodeData().getRadius());
                    entry[4] = String.valueOf(n.getNodeData().getLabel());
                    writer.writeNext(entry);
                    System.out.println("modclass: " + modClass
                            + "\tx: " + n.getNodeData().x()
                            + "\ty: " + n.getNodeData().y()
                            + "\tsize: " + n.getNodeData().getRadius()
                            + "\tlabel: " + n.getNodeData().getLabel());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Also export a PNG snapshot of the graph
        ExportController ec = Lookup.getDefault().lookup(ExportController.class);
        try {
            ec.exportFile(new File(root + file.getName() + System.currentTimeMillis() + ".png"));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
You’ll need to add Gephi Toolkit and OpenCSV to your project. Include the export class and call the “openGephi” method, correcting the paths to reflect the directory structure on your computer. The method should produce a CSV file with the following columns: latitude (x), longitude (y), modularity class, size and label. Open the file in Notepad and remove the quotes. Now it’s ready to import into TileMill.
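
If you would rather not open Notepad, the quote removal can be scripted. This is a minimal sketch using only the Java standard library; the `QuoteStripper` class is made up for illustration, and it blindly removes every double quote, so it would also strip quotes that are part of a label.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class QuoteStripper {
    /** Rewrites the file in place with every double quote removed. */
    static void stripQuotes(Path csv) throws IOException {
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(csv, StandardCharsets.UTF_8)) {
            cleaned.add(line.replace("\"", ""));
        }
        Files.write(csv, cleaned, StandardCharsets.UTF_8);
    }
}
```

Call it with the path of the exported file, for example `QuoteStripper.stripQuotes(Path.of("graph.csv"))`, where the file name is a placeholder.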

Importing data into TileMill and setting up a project


This is not what you’ll get when you import the data; it’s how a finished project looks. First, open TileMill and add a new layer:


Give it a name and choose “900913” (Google) in the SRS dropdown. That’ll place your graph right in the center of the map. You’ll notice that it’s just a tiny dot on the first zoom level. Zoom in until you can clearly see the distinct dots. Zoom some more to decide the zoom bracket for your map, then set it using the slider. For the map above, I used zooms from 14 to 19. You should really use this option, or else the map will be huge, take up thousands of GBs of data and render for a year. You should also mark the part of the map to export later: shift and drag around your graph to select the smallest possible area.
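
To see why restricting both the zoom bracket and the area matters, it helps to count tiles. In the usual XYZ tiling scheme the whole world is covered by 2^z by 2^z tiles at zoom level z, so every extra level quadruples the count. The `TileCount` class below is just a sketch of that arithmetic; it counts tiles for the whole world, while a small selected area only needs a tiny fraction of them.

```java
final class TileCount {
    /** Tiles covering the whole world at one zoom level: 2^zoom columns times 2^zoom rows. */
    static long tilesAtZoom(int zoom) {
        return 1L << (2 * zoom);
    }

    /** Total tiles for the whole world across an inclusive zoom range. */
    static long tilesInRange(int minZoom, int maxZoom) {
        long total = 0;
        for (int z = minZoom; z <= maxZoom; z++) {
            total += tilesAtZoom(z);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int z = 14; z <= 19; z++) {
            System.out.println("zoom " + z + ": " + tilesAtZoom(z) + " tiles");
        }
        System.out.println("zooms 14 to 19: " + tilesInRange(14, 19) + " tiles");
    }
}
```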


The metatile setting is important, but leave it at 1 for now. It’s used to prevent marker and label clipping at closer zoom levels. A larger value means less clipping, but also a less responsive map during editing.

Now it’s time to style your map so the nodes and labels are displayed in correct sizes and colors.

Styling the map

TileMill uses something called CartoCSS for styling labels, lines, markers, etc. It’s a simple conditional CSS. You can adjust values for each zoom level, and that’s what we are going to do. We’ll use markers to display the nodes, and set marker sizes so that they reflect the values in the “Size” column of your CSV file.


We’ll have to set the marker size to read the data in the column. This is for the highest zoom level. Marker sizes halve with each lower zoom level, so for zoom levels 19 and 18 the marker size is specified like this:

[zoom = 19] {
marker-width: [Size] * 8;
}

[zoom = 18] {
marker-width: [Size] * 4;
}

You can guess the rest: you just keep halving the marker size. Unfortunately, it’s impossible to do something like that for labels. So we have to generate a list of node size brackets and corresponding text sizes for each zoom level separately. I use Excel to do this, but maybe it’d be better to just write another method to generate all that during export. So, for zoom 19 we have:

[zoom = 19] {
marker-width: [Size] * 8;
[Size >0][Size <= 5] {text-size:8 }
[Size >5][Size <= 10] {text-size:13 }
[Size >10][Size <= 20] {text-size:20 }
[Size >20][Size <= 30] {text-size:40 }
[Size >30][Size <= 40] {text-size:60 }
[Size >40][Size <= 50] {text-size:80 }
[Size >50][Size <= 60] {text-size:100 }
[Size >60][Size <= 70] {text-size:120 }
[Size >70][Size <= 80] {text-size:140 }
[Size >80][Size <= 90] {text-size:160 }
[Size >90][Size <= 100] {text-size:180 }
[Size >100][Size <= 110] {text-size:200 }
[Size >110][Size <= 120] {text-size:220 }
[Size >120][Size <= 130] {text-size:240 }
[Size >130][Size <= 140] {text-size:260 }
[Size >140][Size <= 150] {text-size:280 }
[Size >150][Size <= 160] {text-size:300 }
[Size >160][Size <= 170] {text-size:320 }
[Size >170][Size <= 180] {text-size:340 }
[Size >180][Size <= 190] {text-size:360 }
[Size >190][Size <= 200] {text-size:380 }
[Size >200][Size <= 210] {text-size:400 }
[Size >210][Size <= 220] {text-size:420 }
[Size >220][Size <= 230] {text-size:440 }
[Size >230][Size <= 240] {text-size:460 }
[Size >240][Size <= 250] {text-size:480 }
[Size >250][Size <= 260] {text-size:500 }
[Size >260][Size <= 270] {text-size:520 }
[Size >270][Size <= 280] {text-size:540 }
[Size >280][Size <= 290] {text-size:560 }
[Size >290][Size <= 300] {text-size:580 }
[Size >300][Size <= 310] {text-size:600 }
}

and for 18:

[zoom=18] {
marker-width: [Size] * 4;
[Size >0][Size <= 5] {text-size:2 }
[Size >5][Size <= 10] {text-size:5 }
[Size >10][Size <= 20] {text-size:10 }
[Size >20][Size <= 30] {text-size:20 }
[Size >30][Size <= 40] {text-size:30 }
[Size >40][Size <= 50] {text-size:40 }
[Size >50][Size <= 60] {text-size:50 }
[Size >60][Size <= 70] {text-size:60 }
[Size >70][Size <= 80] {text-size:70 }
[Size >80][Size <= 90] {text-size:80 }
[Size >90][Size <= 100] {text-size:90 }
[Size >100][Size <= 110] {text-size:100 }
[Size >110][Size <= 120] {text-size:110 }
[Size >120][Size <= 130] {text-size:120 }
[Size >130][Size <= 140] {text-size:130 }
[Size >140][Size <= 150] {text-size:140 }
[Size >150][Size <= 160] {text-size:150 }
[Size >160][Size <= 170] {text-size:160 }
[Size >170][Size <= 180] {text-size:170 }
[Size >180][Size <= 190] {text-size:180 }
[Size >190][Size <= 200] {text-size:190 }
[Size >200][Size <= 210] {text-size:200 }
[Size >210][Size <= 220] {text-size:210 }
[Size >220][Size <= 230] {text-size:220 }
[Size >230][Size <= 240] {text-size:230 }
[Size >240][Size <= 250] {text-size:240 }
[Size >250][Size <= 260] {text-size:250 }
[Size >260][Size <= 270] {text-size:260 }
[Size >270][Size <= 280] {text-size:270 }
[Size >280][Size <= 290] {text-size:280 }
[Size >290][Size <= 300] {text-size:290 }
[Size >300][Size <= 310] {text-size:300 }
}

Then just continue dividing, until you reach your last zoom level. It’s important not to use too many intervals in a zoom level, or TileMill will crash, at least on Windows.
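
The bracket lists above can also be generated instead of built in Excel. The following is a minimal sketch under assumptions: the `CartoCssBrackets` class is invented, it uses even brackets of one width, and it derives text-size as a quarter of the marker multiplier times the bracket's lower bound, with a floor of 2. That rule reproduces the hand-made values above from Size 10 upwards, but not the two smallest brackets.

```java
final class CartoCssBrackets {
    /**
     * Builds one "[zoom = z] { ... }" block. The marker multiplier is 8 at maxZoom
     * and halves with every level below it (assumed rule, see the lead-in).
     */
    static String zoomBlock(int zoom, int maxZoom, int bracketWidth, int maxSize) {
        final int minTextSize = 2;
        double scale = 8.0 / (1 << (maxZoom - zoom));
        StringBuilder css = new StringBuilder();
        css.append("[zoom = ").append(zoom).append("] {\n");
        css.append("  marker-width: [Size] * ").append(number(scale)).append(";\n");
        for (int lower = 0; lower < maxSize; lower += bracketWidth) {
            long textSize = Math.max(minTextSize, Math.round(lower * scale / 4.0));
            css.append("  [Size > ").append(lower).append("][Size <= ")
               .append(lower + bracketWidth).append("] { text-size: ")
               .append(textSize).append("; }\n");
        }
        return css.append("}\n").toString();
    }

    /** Prints whole numbers without a trailing ".0" and fractions as they are. */
    private static String number(double value) {
        return value == Math.rint(value) ? String.valueOf((long) value) : String.valueOf(value);
    }

    public static void main(String[] args) {
        // One block per zoom level of the 14 to 19 bracket used in this how-to.
        for (int zoom = 19; zoom >= 14; zoom--) {
            System.out.print(zoomBlock(zoom, 19, 10, 310));
        }
    }
}
```

Widening `bracketWidth` is the easy way to keep the number of intervals per zoom level low.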

Now set up colors. If you used modularity, look up the numbers for the modularity classes in Gephi and use them in the CSS. In my example, I use the Modularity column to determine node colors:

[Modularity = 43]  {marker-fill:#0000FF;}
[Modularity = 72]  {marker-fill:#008B00;}
[Modularity = 5]  {marker-fill:#EEB422;}
[Modularity = 3]  {marker-fill:#8E388E;}
[Modularity = 9]  {marker-fill:#FF1493;}
[Modularity = 0]  {marker-fill:#CD0000;}
[Modularity = 25]  {marker-fill:#388E8E;}
[Modularity = 38]  {marker-fill:#8E8E38;}
[Modularity = 6]  {marker-fill:#1E90FF;}
[Modularity = 10]  {marker-fill:#000080;}
[Modularity = 7]  {marker-fill:#00EE00;}
[Modularity = 21]  {marker-fill:#B8860B;}

Before exporting, don’t forget to set the metatile size to at least 7 to prevent clipping.

Exporting to MapBox

Create an account on MapBox. It’s free for up to 50 MB of data and 5000 views in a rolling month.

In TileMill, select “Export / MBTiles”. Name your map, review the settings and click “Export”, if you want to save tiles to disk, or “Upload” if you want them to immediately appear in your MapBox account. Then make your map visible and share!