Tagged: network

Voting and attendance in Slovenian Parliament from 2004 to current term

In Slovenia, we have a love/hate relationship with our politicians. We hate them because, at almost every step they take, they let us know that they are corrupt and can easily get away with it. But in each new election new faces appear, promptly get elected and are hailed as saviors who will finally clean the Augean stables of greed and corruption that have been accumulating for too long.

Most emotions are reserved for those in the front row, mainly government members. Members of parliament are somehow exempt, as they are not so widely known. They are not monitored properly, at least in my book. There is a site that contains session records per member and per session, but it’s not widely known. It was the inspiration for this attempt to present members’ activity in an easily understandable, graphic way for the current term and a few past terms.

See the interactive version: Slo | Eng

Interest groups

The main idea was to group the members of parliament by the similarity of their voting records. Most members are bound by strict voting discipline, imposed by the parties they belong to. This way the parties can guarantee that a given act will pass and become law. But is this really so? I tried to use a simple machine learning technique to answer that question. First I collected all the voting results from a parliamentary term and sorted them in chronological order, then applied the technique (k-means clustering, for the technically minded). The number of groups was set to ten, but I could increase it to see smaller groups – maybe factions inside parties, or cross-party interest groups.
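For readers who want to try the grouping step themselves, here is a minimal sketch of k-means on voting records. It is not the code behind the charts. The vote encoding (yea = 1, nay = -1, absent or abstained = 0), the member names and the deterministic farthest-point initialization are all assumptions made for illustration; the real analysis used ten groups, while the toy data below only needs two.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vote vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iterations=100):
    """Group vote vectors into k clusters; returns one cluster index per row.

    Initialization is deterministic (an assumption of this sketch): the first
    row is the first centroid, and each further centroid is the row farthest
    from the centroids chosen so far.
    """
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: every member joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # nothing moved, so we are done
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical members and votes, one column per vote in chronological order.
votes = {
    "member_a": [1, 1, 1, -1, 1],
    "member_b": [1, 1, 1, 0, 1],
    "member_c": [1, 1, 0, -1, 1],
    "member_d": [-1, -1, 0, 1, -1],
    "member_e": [-1, -1, -1, 1, -1],
    "member_f": [-1, 0, -1, 1, -1],
}
names = list(votes)
labels = kmeans([votes[name] for name in names], k=2)
for name, label in zip(names, labels):
    print(label, name)
```

Changing `k` and rerunning is the tinkering step: a larger `k` splits the big blocks into smaller groups.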

Below you can see an example of two groups from a recent term.

Here is the first:

And here is another:

It’s apparent that groups do not contain representatives from one party only, and the visual representation imparts a feel for the differences in voting. As I mentioned above, I arbitrarily constructed ten groups, but a serious researcher would play and tinker with the number, as every clustering technique is an exploratory process and must be iterated upon for best results. It’s interesting that the results also show other parliamentary tactics. This one below could be interpreted as obstruction, or simply passivity or indifference. So what is it? To ask this question is to answer it, I guess.

To put it in context, this is a group of left-wing opposition representatives during a period when they were in a heavy minority.

Indifference or obstruction?

In contrast, this is the right-wing voting machine that prevailed:

A disciplined voting machine

The contrast between these two groups is so dramatic that it would be funny, if these were funny affairs. While the opposition was idling away, the majority voted into existence law after law that, together, still influence the lives of Slovenian citizens. In the interactive version (English) you can explore what the votes were about by simply moving the mouse over the horizontal stripes.

See the interactive version: Slo | Eng

Attendance record

Session attendance is another telling indicator of a representative’s zeal in upholding democracy and serving the interests of his or her constituency. It’s already apparent from the charts above, but I still constructed a separate graphic for it. It’s sorted by presence and more easily readable.

It has to be noted that some representatives were excused from voting sessions for various periods of time. Among them are those who became ministers and those who replaced them in the parliamentary seat, not being there before.

Here’s an example from the recent term. At the bottom, you can see two blocks with alternating presence. That’s because there were two governments. When the first one fell, the ministers returned to their seats; those who had originally replaced them returned to the party’s roster; new ministers were sworn in and vacated their seats; and new replacements came in from the opposite camp.

See the interactive version: Slo | Eng

Yes-men and rebels

Another interesting statistic is which representatives cast the most yea or nay votes. I don’t really know how to interpret this, but I did it nevertheless. One could say that in terms with only one government, the members of the ruling majority with the most yea votes are those who unquestioningly toe the party line. Conversely, those with the most nay votes are the most fervent members of the opposition. In terms with two governments, this is a little less clear-cut: one would have to separate the timelines and run the statistics on the subperiod of each government. I didn’t do this, but a serious researcher would. I made this report to let the representatives know that they are being monitored, but it’s the task of an investigative journalist to delve into the data and interpret it in a meaningful way. I don’t have time for this, and I don’t know the particulars of daily politics here well enough to do it.

But I’m offering the database to anyone who would like to do that. Send me an email for details, I’ll gladly oblige.

Here are a few simple pie charts that illustrate what I just wrote:

Yes men and rebels

See the interactive version: Slo | Eng

Unity index

While programming, it struck me that I could calculate a synthetic measure that would show the unity of the parliament. The reasoning goes: if a vote was unanimous, the parliament as a whole was united in the cause at hand. But if half of the representatives voted yea and the other half nay, the parliament was divided. So I constructed a timeline of all voting sessions and colored every session according to this measure: blue for a unanimous vote, red for an evenly split vote, and violet hues for the nuances of disharmony in between.

Additionally, the bar heights indicate the presence ratio. Lower heights obviously mean lower presence.
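One way to put numbers on this is sketched below. The exact formula behind the charts is not given above, so the definition used here, |yea − nay| / (yea + nay), is an assumption: it yields 1 for a unanimous vote and 0 for an even split. The linear red-to-blue blend and the function names are likewise only illustrative. The 90 in the example is the number of seats in the Slovenian National Assembly.

```python
def unity_index(yea, nay):
    """1.0 for a unanimous vote, 0.0 for an even split, None if nobody voted."""
    cast = yea + nay
    if cast == 0:
        return None
    return abs(yea - nay) / cast


def unity_color(unity):
    """Blend linearly from red (unity 0) to blue (unity 1) as an RGB tuple.

    Values in between come out as violet hues.
    """
    red = round(255 * (1 - unity))
    blue = round(255 * unity)
    return (red, 0, blue)


def presence_ratio(present, seats):
    """Bar height: the share of seats whose holders took part in the vote."""
    return present / seats


# Hypothetical vote: 60 yea, 20 nay, in a 90-seat chamber.
unity = unity_index(60, 20)
print(unity, unity_color(unity), presence_ratio(60 + 20, 90))
```

Abstentions are ignored in this sketch; counting them toward presence but not toward unity would be a reasonable refinement.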

In some terms, the presence falls toward the end, and the proportion of red bars increases. This means that the representatives lost heart and abandoned their posts, and those who stayed quarreled bitterly.

Here are these graphics for various terms. They are stretched to the same length. Perhaps a more correct, but less visually appealing, approach would be not to stretch them, so that the length of each term would be apparent.

IV (2004 – 2008) – PM Janez Janša
V (2008 – 2011) – PM Borut Pahor – ended prematurely
VI (2011 – 2014) – PM Janez Janša, PM Alenka Bratušek – ended prematurely
VII (present) – probable PM Miro Cerar

See the interactive version: Slo | Eng

Session timelines and voting networks

The drive behind this section was to find out whether attendance falls as a session progresses into the small hours. I found that not to be so, which is encouraging in a way. These charts at least show which sessions were bitterly contested, and which were almost unanimous. You can see examples of both behaviors in the graphic below.

Going one step further, I constructed a separate network for each session, in which a representative is connected to a proposition if he or she voted for it, and not connected otherwise.

Networks are a little bit messy, and people tend not to understand them well. The network below shows three groups of representatives (you can zoom in and out in the interactive version). They are grouped close to the propositions they voted for. So this is another opportunity to find interest groups at the micro level, for each proposition. Some propositions don’t have a name, just a date. That’s not my fault, but the parliament’s, as they didn’t bother to publish the names on the web.
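The construction described above is a two-mode (bipartite) network: representatives on one side, propositions on the other, with a link for every yea vote. A minimal sketch follows. The ballot format, the names and the grouping helper are hypothetical, and the layout that pulls representatives toward their propositions is left to a tool such as Gephi.

```python
from collections import defaultdict


def session_network(ballots):
    """Map each proposition to the set of representatives who voted for it.

    `ballots` is an iterable of (representative, proposition, vote) tuples;
    only "yea" votes create a link.
    """
    supporters = defaultdict(set)
    for representative, proposition, vote in ballots:
        if vote == "yea":
            supporters[proposition].add(representative)
    return dict(supporters)


def interest_groups(supporters):
    """Group representatives who backed exactly the same set of propositions.

    Representatives who backed nothing do not appear in the network at all.
    """
    backed = defaultdict(set)
    for proposition, representatives in supporters.items():
        for representative in representatives:
            backed[representative].add(proposition)
    groups = defaultdict(set)
    for representative, propositions in backed.items():
        groups[frozenset(propositions)].add(representative)
    return dict(groups)


# Hypothetical ballots from a single session.
ballots = [
    ("rep_1", "Proposition A", "yea"),
    ("rep_2", "Proposition A", "yea"),
    ("rep_3", "Proposition A", "nay"),
    ("rep_3", "Proposition B", "yea"),
    ("rep_1", "Proposition B", "nay"),
]
network = session_network(ballots)
groups = interest_groups(network)
for propositions, members in groups.items():
    print(sorted(propositions), sorted(members))
```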

See the interactive version: Slo | Eng

Seating order

Finally, here are some heatmaps for various variables, mapped onto the seating order. The first is partitioned according to the representatives’ parties. Sorry, no legend here. You can mouse over in the interactive version to show details.

The second is an attendance heatmap. Green is full attendance, red is total absence, and there’s a linear color scale between them. This one provides an at-a-glance overview of the attendance of entire party blocks.

The next two are yea and nay heatmaps, so you can see which party blocks mostly voted yea, and which nay. They are normalized to their local maxima for visual appeal, but a more correct approach would be not to normalize them, so that it would be apparent that a nay vote is much less frequent than a yea. Why, I have no idea, but I imagine there must be a lot of technical votes, for example to establish presence.

These seating orders are approximate, as I couldn’t get them for past terms from the parliament. They asserted that they didn’t have them, and claimed they don’t even have the current one, even though it’s published on their own website. There were more lies, but I won’t go into that here. They are, after all, in power, and I’m just a blogger.

Why they should engage in such behaviour is beyond me. Maybe they think that the information is theirs and should be kept from the public.

Again, if anyone needs the MongoDB database, drop me a note. My email address is on the About page.

See the interactive version: Slo | Eng

Data-driven drinking: ingredients and brands in the cocktail shaker

A better title for this post would probably be “What kinds of booze to drink together to get drunk in style, according to those who write, compile, publish, test and enjoy cocktail recipes”. Continuing from previous posts, I wanted to see what a network of the ingredients of all possible cocktail recipes looks like, and whether it’s possible to divide them into sensible groups that would be instantly recognizable and even helpful to experienced and casual drinkers alike.

To do this, more than 25,000 recipes from Drinksmixer.com and Drinksnation.com were scraped, and a network was constructed with Gephi and visualized below. Dot size reflects the count of that particular ingredient in all analyzed recipes. Dots of the same color frequently appear together in recipes. One could say that one can hardly make a mistake if one combines three ingredients of the same color and drinks the concoction.
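The counting behind such a map can be sketched in a few lines. This is only an illustration with made-up recipes, not the scraper or the Gephi workflow itself: each ingredient gets a count (the dot size), and each pair of ingredients sharing a recipe gets a weight (the strength of the link). The coloring into groups is a separate step done in the graph tool and is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(recipes):
    """Count ingredients and ingredient pairs across a list of recipes.

    Returns (node_counts, edge_weights). Pairs are stored as alphabetically
    sorted tuples so that (a, b) and (b, a) count as the same link.
    """
    node_counts = Counter()
    edge_weights = Counter()
    for ingredients in recipes:
        unique = sorted(set(ingredients))  # ignore duplicates within a recipe
        node_counts.update(unique)
        edge_weights.update(combinations(unique, 2))
    return node_counts, edge_weights


# Hypothetical recipes, each a plain list of ingredient names.
recipes = [
    ["vodka", "orange juice"],
    ["vodka", "orange juice", "ice"],
    ["gin", "tonic water", "ice"],
]
nodes, edges = cooccurrence(recipes)
print(nodes.most_common(3))
print(edges.most_common(3))
```

The two counters can be written out as node and edge tables and imported into a graph tool for layout and coloring.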

The map below is interactive; try panning and zooming with the mouse, or use the control in the upper left-hand corner.

I see five major groups of ingredients, but your alcohol proof may vary. Actually I suspected something like that:

  • ice is in its own group. For some reason it also contains tequila,
  • milky drinks are in their own group (gray-blue),
  • salty and spicy drinks are also in an easily recognizable group (pink),
  • blue group is dominated by vodka and rum,
  • green group mostly has gin and tangy juices, and
  • red group mostly contains fruit schnapps and liqueurs.

You can download hi-res static images here: black background | white background.

For a more mobile-friendly, searchable map with advanced interactivity, click here (Sigma.js). Clicking on an ingredient on this map will show a list of all connected ingredients. Clicking on an element in the list will show a subgraph.

Most recipes contained preferred brands for spirits and fruit juices, so I constructed another diagram. It shows which brands are usually grouped together in drinks.

Here is the interactive map:

Download hi-res static images here: black background | white background.
For a searchable map with advanced interactivity, click here. Clicking on a brand on this map will show a list of all connected brands.

I find it funny that Everclear, Kool-Aid and Mountain Dew are so close. Does that mean that people just pour 100% ethanol and caffeinated water in a jug and drink that? Possibly.

 

Coming up next: data-driven cooking.

Visualizing drug talk on bluelight.ru

In mainstream media, there’s not a lot to be found about recreational drugs except horror stories and arguments for prohibition. From time to time we also hear that Steve Jobs liked to drop acid when he was young, that countless Vietnam vets easily kicked their heroin habit upon coming home, and, as a US federally sponsored study found, that psychedelic mushrooms can bring a lasting and positive personality change in more than half of those who take them.

Where to find good information? There are internet communities, so-called harm-reduction forums, where one can spend a few hours and discover that the truth is not black and white. Surely junkies exist, and using meth daily is not a life strategy anyone could recommend, but not all drugs were created equal. There are many classes of recreational drugs, each acting on specific chemical pathways in the body – uppers on dopamine, hallucinogens on serotonin, downers on GABA, etc.

Mapping drugs

I thought it would be nice to visualize these drug groups based on what users of harm-reduction forums say, so I analyzed around 1.2 million posts on bluelight.ru and constructed a simple diagram that tells a lot. It was constructed in such a way that drugs that are frequently mentioned together appear close together. Circle radii are proportional to how often each drug is mentioned in the posts. The methodology is explained at the bottom of the post.

Here’s the diagram, pan and zoom at will:

Click here to peruse a clickable, searchable version of the same diagram (give it a second to load). To download a high-resolution image (8000 x 6000), click here (black) or here (white).

The drug groups are color coded for better readability. Starting from the top:

  • light blue group: mostly antidepressants – SSRIs such as Prozac (fluoxetine), Zoloft and such.
  • violet group: mainly contains benzodiazepines such as Xanax, Valium, and Lorazepam, which are commonly abused, but there are a lot of other downers there.
  • orange group: opiates and opioids, such as heroin, OxyContin and the like. There were so many mentions of “opiates” without referring to a specific chemical that I considered it would be a pity to leave the word out.
  • dark yellow group on the right: mostly dissociatives such as ketamine and DXM, but there’s also a subgroup on the right side. It forms a larger group, mixed with differently colored drugs, that could be called “shamanic corner”, as it mostly contains so-called entheogens and natural concoctions such as ayahuasca.
  • light orange group: mainly nootropics such as Piracetam. Some use them to enhance a psychedelic or MDMA experience, but they have a more general use as memory, intelligence and sensory enhancers.
  • red group: I don’t know what to call this, but these are “working man’s drugs”. The common drugs that we hear about in the media. Some of these drugs are not considered drugs at all, for example alcohol and tobacco, but the Bluelight discussions show that they are very common. Thinking about it, one must have something to drink while one insufflates synthetic powders, and a cigarette is also a good thing to have while waiting for something stronger to take hold.
  • green group: psychedelic drugs such as shrooms, LSD, DMT and mescaline, along with many newer variations and analogs, such as the 2C-x family, the DMT analogs and the whole TiHKAL inventory.
  • blue group: Ecstasy (MDMA) and newer stimulants and entactogens, such as methylone, mephedrone, etc. “Plant foods” and “bath salts” are in this category.

Mapping effects

Simply mapping out the drugs is nice, but an additional step seemed in order: mapping the co-occurrence of the various effects the drugs have on users. Again, posts were analyzed, but in addition to drugs, some (not all!) common effects were extracted and mapped in a network. The result is in the diagram below. Darker dots are effects, lighter ones are drugs. Size is again proportional to the number of mentions in all posts.

Click here to peruse a clickable, searchable version of the same diagram. To download a high-resolution image, click here (black) or here (white).

Note that the above diagram does not indicate semantic relationships between drugs and their effects. For example, why is “marijuana” close to “death”? Maybe there was a lot of talk about fear of death that the marijuana experience helps to resolve, or maybe people like to describe how they are dying of laughter while smoking weed. I honestly don’t know. I suspect it’s because of the close relationship between mentions (not necessarily use!) of marijuana and those of alcohol, cocaine and methamphetamine, which could have a more significant relation with death or dying.
What’s really notable is the heavy clustering of adverse effects around opiates, and the relative absence of the same around psychedelics. Based on Bluelight data, I can safely conclude that psychedelic drugs do not cause users to complain a lot, except maybe mentioning hallucinations and visuals, but, well …

Drug use over the years

My whole database contains posts from 2010 until March 2013. Here’s an analytical tool to better understand what’s going on in the recreational drug community and market. Time is on the horizontal axis, while the vertical axis shows the proportion of posts in a given month that mention a specific drug.
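The measure on the vertical axis is easy to reproduce outside a search index. The sketch below is a plain Python stand-in, not the SOLR setup used for the chart. The post format (an ISO date string plus the text) and the naive case-insensitive substring match are assumptions made for illustration.

```python
from collections import Counter


def monthly_share(posts, term):
    """Share of each month's posts that mention `term`.

    `posts` is an iterable of (date, text) pairs with dates as "YYYY-MM-DD".
    Returns {"YYYY-MM": share} in chronological order.
    """
    totals = Counter()
    hits = Counter()
    needle = term.lower()
    for date, text in posts:
        month = date[:7]  # "YYYY-MM"
        totals[month] += 1
        if needle in text.lower():  # naive substring match
            hits[month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Hypothetical posts.
posts = [
    ("2012-01-03", "Is this MDMA pill any good?"),
    ("2012-01-17", "Trip report, nothing else to add."),
    ("2012-02-02", "mdma dosage question"),
]
shares = monthly_share(posts, "mdma")
print(shares)
```

Dividing by the monthly total is what keeps a busy month from looking like a surge in interest for every drug at once.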

Play around with the interactive chart to discover emerging trends, or simply to behold the wax and wane of specific chemicals as they compete for users’ neurological apparatuses, while their manufacturers temporarily evade ever stricter analog laws:

Commentary: Bluelight is a harm reduction forum, historically established so that users could tell a good Ecstasy pill from a bad one, so MDMA is the most mentioned drug. Use of “classic” drugs doesn’t change much, but it’s interesting to note the rise of new “research chemicals” such as the NBOMe family, new cathinones (3-MMC), new synthetic cannabinoids (STS-135) and different amphetamines, predominantly methamphetamine. You can also see how newly banned drugs, for example mephedrone, go out of use, and their analogs, in this case 3-MMC, replace them.

Methodology and tools

First, all the Bluelight forums were crawled, and the contents, dates and other metadata of all posts were put into a SOLR index. That took approximately two days of not too aggressive load on their server (thanks Bluelight for not banning my IP).
To make the first two network diagrams, undirected graphs were constructed with the JGraphT library so that all extracted entities – drugs and effects – in every post were connected as nodes. Mentions of all extracted entities were counted so that dot size shows frequencies, not network degrees. That yielded complete graphs to be visualized with Gephi. Gephi files were exported to a TileMill-friendly format to render map tiles. Tiles are displayed on the site using Leaflet.
To make the interactive chart, SOLR was used to produce time series. The data was then packed into a suitable format for the Flot library to display.
To extract entities, two dictionaries were used – one for drugs, one for effects. You can download them here: drugs / effects.
If anyone is interested in the SOLR core, I can put it on Dropbox. Send me a note, my email is on the About page.
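For anyone who wants to experiment without the full toolchain, the dictionary-based extraction step can be sketched in plain Python. Everything specific here is hypothetical: the tiny dictionaries, the alias-to-canonical-name mapping and the simple word tokenizer are stand-ins, not the downloadable dictionaries or the original Java code.

```python
import re
from collections import Counter

# Hypothetical dictionaries: lowercase alias -> canonical entity name.
DRUGS = {"mdma": "MDMA", "ecstasy": "MDMA", "lsd": "LSD", "acid": "LSD"}
EFFECTS = {"nausea": "nausea", "euphoria": "euphoria"}


def extract_entities(text, dictionary):
    """Return the set of canonical entities mentioned in one post."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    return {dictionary[word] for word in words if word in dictionary}


def count_mentions(posts, dictionary):
    """Count in how many posts each entity appears; this drives dot size."""
    mentions = Counter()
    for text in posts:
        mentions.update(extract_entities(text, dictionary))
    return mentions


# Hypothetical posts.
posts = [
    "Took some Ecstasy, felt euphoria.",
    "Acid and MDMA together: any nausea?",
    "LSD only this time.",
]
drug_mentions = count_mentions(posts, DRUGS)
effect_mentions = count_mentions(posts, EFFECTS)
print(drug_mentions, effect_mentions)
```

Connecting every pair of entities found in the same post then gives the graph edges, while these mention counts set the dot sizes independently of how many links a node has.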

What is not here, but could be

  • analysis of effects that specific drugs have over time
  • a chart of effects only
  • some different visualization that could help to establish relationships between specific drugs and effects they have. For example, it’s been known for some time that mephedrone and various dragonflies have vasoconstrictive effects. Maybe some other relationship could be inferred that way.
  • first map should be clickable to search on Wikipedia, I’ll add that as soon as I figure out the Wax lib.

I may revisit this theme in the future.

How the social network of Hollywood actors evolved over time

This is a short update on the previous post, which is a static visualization of relatedness between movies, genres and tags, as seen on IMDB. It’s produced from the same dataset of around 15,000 films, grabbed from IMDB.

It shows the social network of actors and its growth and changes over time. Every movie was analyzed, and a network was constructed such that actors who worked together in a movie are connected. This connection also has a time dimension, so from the first time two actors appear in a movie together, the network considers them friends. Of course the actors have their own creative preferences and work with other people over time, so they move around the space – at first they are associated with one group, then with another.

The video below shows the animated network, as it evolved from 1960 to 2013. Only actors with more than 100 connections are shown.
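The time dimension can be sketched as follows: each pair of actors gets the year of their first shared movie, and a frame of the animation for a given year keeps only the links that already exist by then. This is an illustration rather than the actual processing. The movie format and actor names are made up, and applying the connection threshold per frame (rather than on the final network) is an assumption.

```python
from itertools import combinations


def first_collaborations(movies):
    """Map each actor pair to the year of their first shared movie.

    `movies` is an iterable of (year, cast) pairs; pairs are stored as
    alphabetically sorted tuples.
    """
    first_year = {}
    for year, cast in movies:
        for pair in combinations(sorted(set(cast)), 2):
            if pair not in first_year or year < first_year[pair]:
                first_year[pair] = year
    return first_year


def snapshot(first_year, year, more_than=0):
    """Network as of `year`: actor -> set of co-stars met so far.

    Only actors with more than `more_than` connections are kept.
    """
    neighbours = {}
    for (a, b), met in first_year.items():
        if met <= year:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    return {
        actor: costars
        for actor, costars in neighbours.items()
        if len(costars) > more_than
    }


# Hypothetical filmography.
movies = [
    (1970, ["actor_a", "actor_b"]),
    (1980, ["actor_a", "actor_c"]),
    (1990, ["actor_b", "actor_c", "actor_d"]),
]
first = first_collaborations(movies)
print(snapshot(first, 1985, more_than=1))
print(sorted(snapshot(first, 2000, more_than=2)))
```

With the real dataset, `more_than=100` would correspond to the threshold used in the video.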

It’s interesting to follow Robert De Niro’s associations. He starts off center, moves to center and then separates himself from the majority along with select other actors, which might be some kind of Hollywood elite.

Also note how Christopher’s career flowers and then wanes, as the century comes to a close.

Drop me a note in comments, if you want a bigger or more detailed visualization. This one was made with Gephi and some heavy processing and filtering before that.

 

Exploring Hollywood values through IMDB genres and tags

A typical Hollywood story always portrays life in a twisted way. Movies are infused with values. There are typical stories: justice always prevails in the end, even if it means the death of a good guy; the coming-of-age story, in which the hero becomes a man; the revenge story, in which the hero is wronged in the beginning and must regain his life and justice in the course of the film. In American movies, family values are all-important, and so on.

These values are interrelated in the movie world, but what is their importance relative to other values? Is war a good or a bad thing, as portrayed in the movies? Is friendship close to romance, and is marriage close to love? What is science fiction – action, adventure or fantasy?

There happens to be a treasure trove of useful information on IMDB to visualize these relations. Each movie belongs to one or more genres, and on every movie page, there are tags for themes that occur in it. One could construct a network of movies that are interrelated through genres and tags they share. If two films share a tag, they must be closer than films that don’t share it. But there are many tags and over ten genres, so how does it look?

It looks like this (click image to launch interactive page):

 
Roughly 15,000 movies, as presented on IMDB. Full map in a bigger window. There’s also a post showing how a social network of actors evolves over time from 1960 to 2013.

If a circle is bigger, it means it has more connections (movies associated with it). For example, the “Drama” circle seems to be the biggest, because apparently a large share of the movies are dramas.

Same-colored circles belong to common categories, so for example “Drama”, “Romance”, “Love”, “Friendship”, “Marriage” and surprisingly “History” and “Biography” belong to the same group. Romance and drama are actually genres, and “love”, “history” and “biography” are tags. If you zoom in, you can see the movies associated with each tag and category.

It seems that most of Hollywood romance takes place in New York City, and that there’s a lot of sex going on there at the same time. There is some friendship involved, but not much. It’s interesting that marriage is on the opposite side of romance in relation to sex. It also seems that there is a lot of romantic activity on the set, as actors and actresses are closely related to it. This must be an artifact of Hollywood self-reflection.

On the other hand, California, as represented in movies, seems much more family-oriented. There’s a lot of boys, children, girls and babies around it. There’s also a lot of dreams and female nudity.

It’s also fun to construct sentences containing words of closely positioned tags. Drugs and money lead to suicide? Death by doctor in a hospital? Murder someone, get apprehended by police and go to prison?

It’s also apparent that sci-fi is nothing but a sub-genre of adventure. I always thought there are more brains to it. And fantasy seems nothing more than adventure for family audiences.

Have fun browsing the map, and let me know if you discover more fun facts.

Recent experiments with Gephi led me to speculate that it’s possible to extract meaning from a large volume of data with network analysis. This is the first post on this blog, and also the first in a series dealing with data and visualization.

If you are interested in making your own diagrams like that, here’s a how-to.

Edit: after being mentioned in Canadian Business magazine (thanks Matthew McClearn!), I should maybe add an explanation of my interpretation above. I’m actually interpreting relationships as portrayed in the movies, as inserted in the IMDB database. So what I wrote was actually an interpretation of an aggregation of simplifications of interpretations. I still think that my methodology is sound – after all, the diagram looks OK, and actually makes sense. It’s just curious to interpret.