These two projects are the result of a recent collaboration with Transparency International Slovenia. The datasets were provided by the state, and I was asked to develop visualizations that would structure the information in an accessible way. Members of the Jožef Stefan Institute also provided a great deal of help.
State project browser
The first project is a browser of all projects initiated by state institutions from 1991 on. The idea was to let users discover where, and for what purposes, the money goes in their county. The dataset and visualization allow exploration by various categories, as well as by time.
The dataset also includes projects that are still in the planning phase and won’t be completed until 2025. With this tool, citizens can hopefully inspect the planned expenditures for roads, water sources and other categories of infrastructure, culture and other fields of development, and compare them with their own expectations.
It allows browsing and filtering of projects by statistical regions and counties, as well as displaying the timeline of all projects, which is basically an expandable version of a Gantt chart.
To see the interactive project website, click here, or click the image below.
The original data is provided on the project’s “About” page.
County budget browser
The new project is a straightforward visualization of county budgets. The budgets are displayed as dynamic, zoomable hierarchical (“sunburst”) diagrams. They react to each other, allowing a side-by-side comparison of budgets of two user-selected counties.
The visualization enables users to delve into expenses and incomes of all Slovenian counties on separate tabs.
To see the interactive project website, click here, or click the image below.
Technology and design
The data cleanup and preparation were done with a few Python scripts. The sunburst diagram accepts hierarchical data in a tree format, so this was an interesting exercise in converting a tabular dataset into a nested dictionary of arbitrary depth.
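Here is a minimal sketch of that kind of conversion. It is not the actual script used for the project. The row layout (a category path plus an amount), the sample figures and the name/children/value keys are assumptions, modeled on the tree format that d3 hierarchy examples commonly use.

```python
def rows_to_tree(rows, root_name="budget"):
    """Convert flat (path, value) rows into a nested name/children tree."""
    root = {"name": root_name, "children": []}
    for path, value in rows:
        node = root
        for label in path:
            children = node.setdefault("children", [])
            # Reuse the child with this label if it exists, else create it.
            child = next((c for c in children if c["name"] == label), None)
            if child is None:
                child = {"name": label}
                children.append(child)
            node = child
        # Leaves carry the amount; repeated rows for the same leaf are summed.
        node["value"] = node.get("value", 0) + value
    return root


# Hypothetical sample rows: paths may have any depth.
rows = [
    (("Infrastructure", "Roads", "Maintenance"), 120),
    (("Infrastructure", "Water"), 80),
    (("Culture",), 40),
]
tree = rows_to_tree(rows)
```

Because each row simply walks down from the root and creates missing levels on the way, the same function handles paths of one level or ten.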
The visualizations were done in d3, which is really an indispensable tool for any serious work in online visualization.
Both projects were minimalistically, yet expertly designed by Tomaž Plahuta (Bitnik, Eno).
Check out the projects and let me know your opinion in the comments!
Haha, what a funny question. Of course they can’t. How can one teach a computer all the intricacies of lawmaking process, and trust it well enough to let it vote? This must surely be a recipe for disaster.
Yet, as I realized in previous research, the parties mostly demand ruthless discipline from their parliamentary representatives at voting time, simply to be able to govern at all in the Slovenian multiparty democracy, where there’s never an absolute winner. This leads to coalition governments, where every vote counts towards a majority.
That means that in a polarized parliament, one could theoretically predict a representative’s vote by examining the votes cast by all other representatives. If an opposition party proposes to vote on an act, it’s very likely that members of the government bloc will uniformly, or at least predominantly, vote against it, and vice versa. There are exceptions to that rule, namely some profoundly ethical decisions, in which majority parties will let their members vote by conscience. But they are few and far between.
Fun with neural networks
I decided to test this out by modeling some representatives with neural networks, and training the networks on a few voting sessions and their outcomes from the beginning of the parliamentary term.
The model for each representative was fed the votes of every other rep as input, and his or her own vote as the desired output. This was repeated over all hundred training sessions until the model converged (loss fell under 0.05).
It was then shown voting sessions it hadn’t seen yet, and tasked with predicting the outcomes.
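The data preparation described above can be sketched roughly as follows. This is an illustration under assumptions, not the original code: the vote encoding (0 against, 1 in favor, 2 absent), the tiny sample matrix and the function names are all made up for the example.

```python
def leave_one_out(votes, rep):
    """Split a sessions x reps vote matrix into inputs and targets for one rep.

    Inputs are the votes of everyone except `rep`; targets are `rep`'s own votes.
    """
    inputs = [row[:rep] + row[rep + 1:] for row in votes]
    targets = [row[rep] for row in votes]
    return inputs, targets


def split_sessions(inputs, targets, n_train):
    """Use the first n_train sessions for training and the rest for testing."""
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test


# Hypothetical toy data: 5 sessions (rows) x 4 representatives (columns).
# Assumed encoding: 0 = against, 1 = in favor, 2 = absent.
votes = [
    [1, 1, 0, 2],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 2],
]
inputs, targets = leave_one_out(votes, rep=1)
train, test = split_sessions(inputs, targets, n_train=3)
```

In the real experiment the split would be the first 100 sessions for training and the remaining ones for prediction.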
The results are shown in images below. For each representative, the image contains:
name and party,
training vector (the votes he/she cast in first 100 voting sessions – red for “against”, blue for “in favor”, yellow for absence for whatever reason),
actual votes (400 votes the network hasn’t seen and was trying to predict),
predicted votes (how the neural network thought the representative would vote), and
difference indicator (with red rectangles for wrong prediction, green rectangles for correct prediction, and yellow rectangles for absence)
I didn’t bother much with statistics to see who was the most predictable, nor did I try to predict votes for every rep.
In short, those with the mainly green bottom strip were the most predictable.
A cursory examination of results yields several realizations:
even in the best predictions with the lowest error rate, the model doesn’t predict absences well, especially for representatives with a low incidence of absence in the training data. This is intuitively understandable on two levels: first, it’s hard for the network to generalize something it didn’t observe, and second, absences can happen on a human whim, which is out of reach for a mathematical model. For representatives of opposition parties, who frequently engage in obstruction as a valid tactic, the model fares a little better.
the model is best at predicting the voting behavior of majority party (SMC) members.
the model utterly fails to predict anything for representatives who were absent in the training period (duh).
So, could we substitute the actual representatives with simple neural networks? Not with this methodology. The problem is that we need votes of everyone else in the same session to predict the vote of modeled rep, so at the time of prediction, we already have their vote. We don’t have a way of inferring votes from scratch, or from previous votes.
We could, in theory, try to predict each rep’s vote independently of the others by training the network on the texts of proposed acts. I speculate that a deeper network could correlate vectorized keywords in training texts with voting outcomes, and then be able to predict votes for each rep independently, based on previously unseen texts. Maybe I’ll do that when I get the texts and learn a bit more. It’s still ANN 101 period for me.
I used a simple multilayer perceptron with 98 inputs (there have been 99 representatives in this term, counting also current ministers and substitutes), a hidden layer of 60 neurons, and a softmax classifier at the end.
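For readers who want to see the shape of such a network, here is a rough sketch of its forward pass in plain Python. Only the layer sizes come from the description above. The three output classes (against, in favor, absent), the tanh activation, the input encoding and the random initial weights are assumptions, and training is left out entirely.

```python
import math
import random

N_INPUTS, N_HIDDEN = 98, 60
N_CLASSES = 3  # assumed classes: against, in favor, absent


def init_layer(n_in, n_out, rng):
    """Small random weights plus zero biases for one dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Weighted sum of the inputs plus bias, for every neuron in the layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    h = [math.tanh(v) for v in dense(x, hidden)]  # tanh is an assumed activation
    return softmax(dense(h, output))


rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_CLASSES, rng)
# Hypothetical input: one session's votes by the 98 other reps,
# encoded here as -1 (against), 0 (absent), 1 (in favor).
x = [rng.choice([-1.0, 0.0, 1.0]) for _ in range(N_INPUTS)]
probs = forward(x, hidden, output)
```

The predicted vote is simply the class with the highest probability. With untrained weights the output is close to uniform, which is why the training step matters.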
On the basis of the previous post, Transparency International Slovenia asked me to collaborate on some projects. This is one of them, and it was launched today on a separate site: kdovpliva.si (English: whoinfluences.si).
It’s an attempt to visualize several networks of lobbyists, their companies, politicians and state institutions. Perhaps the most interesting part is the network of lobbying contacts, which was constructed with data containing around 700 reported contacts between 2011 and late 2014.
As you may imagine, not every lobbying contact is reported. For those that are, records are kept at the Komisija za preprečevanje korupcije (Commission for the Prevention of Corruption, a state institution). Transparency International Slovenia obtained those records as PDF files, since the institution refused to provide them in a machine-readable format. They recruited a few volunteers to copy and paste the information into spreadsheets, then handed the spreadsheets to me to visualize.
You can see the results below. Click here or the image to open the site in a new window. It’s in Slovenian. For methodology, continue reading below the image.
The meaning of every network is determined by the nature of its nodes and connections. Here, we have four node types:
lobbyists
those who were lobbied – state officials
organizations on whose behalf lobbying was performed
state institutions at which the abovementioned officials work
A lobbying contact is initiated by a company or an organization, which employs a lobbyist to do the work. The lobbyist then contacts state officials of sufficient influence, who work at the appropriate state institution.
So an organization is connected to the lobbyist with a weight of 2, the lobbyist to a state official with a weight of 1, and state official to her institution with a weight of 2. The weights signify the approximate loyalty between these entities. We presupposed that lobbyists are more loyal to their clients than they are to the state officials, with which they must be in a promiscuous relationship. Furthermore, the state officials are also supposed to be more loyal to their employers than to the lobbyists, although this is a daring supposition. But let’s say they are, or at least that they should be.
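To make the weighting concrete, here is a small sketch of how such a network could be assembled from contact records. Only the weights 2, 1 and 2 come from the text. The record fields, the sample names and the extra contact counter are hypothetical.

```python
# Weights from the text: organization-lobbyist 2, lobbyist-official 1,
# official-institution 2.
EDGE_WEIGHTS = {
    ("organization", "lobbyist"): 2,
    ("lobbyist", "official"): 1,
    ("official", "institution"): 2,
}
CHAIN = ("organization", "lobbyist", "official", "institution")


def build_network(contacts):
    """Turn contact records into weighted edges between typed nodes."""
    edges = {}
    for contact in contacts:
        nodes = [(kind, contact[kind]) for kind in CHAIN]
        # Connect each node to the next one in the chain.
        for source, target in zip(nodes, nodes[1:]):
            weight = EDGE_WEIGHTS[(source[0], target[0])]
            edge = edges.setdefault((source, target), {"weight": weight, "contacts": 0})
            edge["contacts"] += 1  # assumed extra: how often this link was reported
    return edges


# Hypothetical records; the field names are made up for this example.
contacts = [
    {"organization": "Pharma Co", "lobbyist": "Lobbyist 1",
     "official": "Official X", "institution": "Agency Y"},
    {"organization": "Pharma Co", "lobbyist": "Lobbyist 1",
     "official": "Official Z", "institution": "Agency Y"},
]
edges = build_network(contacts)
```

A layout algorithm that pulls strongly connected nodes together will then keep lobbyists close to their clients and officials close to their institutions.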
After some processing, the network emerged. Immediately apparent are the interest groups, centered around seats of power. Here’s an image of the pharmaceutical lobby. It’s centered on the Public Agency for Pharmaceuticals and Medicine. Main actors of influence are companies such as Merck, Novartis, Eli Lilly, Aventis, etc.
A click on the agency node brings up a panel with some details, such as a list of companies (font size indicates the frequency of contact), lobbying purposes and a timeline of lobbying contacts. Here we can see that Novartis and Krka were the most active companies, and that they lobbied on pricing and to limit potential competition from producers of generic drugs.
You can explore the network by yourself to see the other interest groups.
Some advice from Information Commissioner
Unfortunately, we had to omit lobbyists’ names for reasons of supposed privacy. The Information Commissioner strongly advised us not to display them on the basis of some EU ruling. I’m not an expert in EU law, and perhaps there are good reasons for this. On the other hand, there may not be. I fail to see why this information would not be in public interest, since these decisions have an impact on a significant number of taxpayers, if not all of them.
Anyway, we have the names. After all, we had to use them to connect the network. They are present in raw data, just not displayed.
We are probably going to continue developing this project, as new information comes to light and new rulings regarding privacy are issued.
In Slovenia, we have a love/hate relationship with our politicians. We hate them, because at almost every single step they make, they let us know they are corrupt and they can easily get away with it. But in each new election new faces appear, promptly get elected and are hailed as saviors, who will finally clean the Augean stables of greed and corruption that have been accumulating for too long.
Most emotions are reserved for those in the front row, mainly government members. Members of parliament are somehow exempted, as they are not so widely known. Somehow, they are not monitored properly, at least in my book. There is a site that contains session records per member and per session, but it’s not widely known. It was an inspiration for this attempt to present members’ activity in an easily understandable and graphic way for current term and a few terms in the past.
The main idea was to group the parliamentary members by similarity of their voting record. Most parliamentary members are bound by strict voting discipline, imposed by the parties they belong to. This way the parties can guarantee that one act or another will pass and become law. But is this really so? I tried to use a simple machine learning technique to answer that question. First I collected all the voting results from a parliamentary term and sorted them in chronological order, then applied the technique (k-means clustering, for the technologically minded). The number of groups was set to ten, but I could increase it to see smaller groups, maybe factions inside parties, or cross-party interest groups.
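For the curious, this is roughly what k-means does with voting records, shown on a toy example in plain Python. It is a sketch, not the code behind the charts. The vote encoding (1 for yea, -1 for nay, 0 for absent), the sample records and the deterministic farthest-point seeding are assumptions; real implementations usually seed randomly.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two voting records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic farthest-point seeding (an assumption made for this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign every representative to the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean record of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical records: one row per representative, one column per session.
records = [
    [1, 1, -1, -1, 1],
    [1, 1, -1, -1, 1],
    [1, 1, -1, 0, 1],
    [-1, -1, 1, 1, -1],
    [-1, -1, 1, 1, -1],
    [-1, 0, 1, 1, -1],
]
labels = kmeans(records, k=2)
```

Here the first three representatives end up in one group and the last three in the other, even though an absence makes two of the records slightly different from their neighbors.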
Below you can see an example of two groups from recent term.
Here is the first:
And here another:
It’s apparent that groups do not contain representatives from one party only, and the visual representation imparts a feel for the differences in voting. As I mentioned above, I arbitrarily constructed ten groups, but a serious researcher would play and tinker with the number, as every clustering technique is an exploratory process and must be iterated upon for best results. It’s interesting that the results also show other parliamentary tactics. This one below could be interpreted as obstruction, or simply passivity or indifference. So what is it? To ask this question is to answer it, I guess.
To put it in context, this is a group of left-wing opposition representatives during a period when they were in heavy minority.
In contrast, this is the right-wing voting machine that prevailed:
The contrast between these two groups is so dramatic that it would be funny, if these were funny affairs. While the opposition was idling away, the majority voted into existence law after law that, together, still influence the lives of the Slovenian citizenry. In the interactive version (English) you can explore what the votes were about by simply moving the mouse over the horizontal stripes.
Session attendance is another telling indicator of a particular representative’s zeal in upholding democracy and fulfilling the interests of his constituency. It’s already apparent from the charts above, but I still constructed a separate graphic for that. It’s sorted by presence and more easily readable.
It has to be noted that some representatives were excused from voting sessions for various periods of time. Among them are those who became ministers and those who replaced them in the parliamentary seat, not being there before.
Here’s an example from the recent term. At the bottom, you can see two blocks with alternating presence. That’s because there were two governments. When the first one fell, the ministers returned to their seats; those who originally replaced them, returned to the party’s roster; new ministers were sworn in and abandoned their seats; and new replacements came from opposite camp.
Another interesting statistic is the list of representatives with the most yea or nay votes. I don’t really know how to interpret this, but I did it nevertheless. One could say that in terms with only one government, members of the ruling majority with the most yea votes are those who unquestioningly toe the party line. Conversely, those with the most nay votes are the most fervent members of the opposition. In terms with two governments, this is a little less clear-cut: one would have to separate the timelines and run the statistics on subperiods for each government. I didn’t do this, but a serious researcher would. I made this report to let them know that they are being monitored, but it’s the task of an investigative journalist to delve into the data and interpret it in a meaningful way. I don’t have time for this, and I don’t know the particulars of daily politics here well enough to be able to do that.
But I’m offering the database to anyone who would like to do that. Send me a mail for details, I’ll gladly oblige.
Here are a few simple pie charts that illustrate what I just wrote:
While programming, it struck me that I could calculate a synthetic measure that would show the unity in the parliament. The reasoning goes: if the vote was unanimous, the parliament as a whole was united in the cause at hand. But if half of the representatives voted yea, and the other half nay, the parliament was divided. So I constructed a timeline of all voting sessions and colored every session according to this measure: blue for a unanimous vote, red for an evenly split vote, and violet hues as nuances of disharmony.
Additionally, the bar heights indicate the presence ratio. Lower heights obviously mean lower presence.
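One way to express such a measure in code is sketched below. The exact formula is not given here, so this one (absolute difference between yea and nay votes, divided by votes cast), the linear red-to-blue color blend and the sample seat count are assumptions that merely match the description.

```python
def unity(yea, nay):
    """1.0 for a unanimous vote, 0.0 for an evenly split one."""
    cast = yea + nay
    if cast == 0:
        return None  # nobody voted, so the measure is undefined
    return abs(yea - nay) / cast


def presence(yea, nay, seats):
    """Share of seats that cast a vote; drives the bar height."""
    return (yea + nay) / seats


def unity_color(u):
    """Blend from red (split) through violet to blue (unanimous), as RGB."""
    return (round(255 * (1 - u)), 0, round(255 * u))


# Hypothetical session: 60 yea and 30 nay votes in a 90-seat chamber.
u = unity(60, 30)
color = unity_color(u)
height = presence(60, 30, seats=90)
```

With this formula, a 60 to 30 vote gets a unity of one third, which lands on the reddish side of violet.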
In some terms, the presence falls toward the end, and the proportion of red bars increases. This means that the representatives lost heart and abandoned their posts, and those who stayed quarreled bitterly.
Here are these graphics for various terms. They are stretched to same length. Perhaps a more correct, but less visually appealing approach would be not to stretch them, so the length of particular term would be apparent.
The drive behind this section was to find out whether the attendance is falling, as the session progresses into small hours. I found that not to be so, which is encouraging in a way. These charts at least show which sessions were bitterly contested, and which were almost unanimous. You can see examples of both behaviors in the graphic below.
Going one step further, I constructed a separate network for each session, in which a representative is connected to a proposition if he or she voted for it, and otherwise not.
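In code, building such a network is short. The sketch below assumes a list of (representative, proposition, vote) records with made-up names and vote labels; it only illustrates the linking rule, not the layout.

```python
from collections import defaultdict


def session_network(ballots):
    """Link each representative to every proposition he or she voted for."""
    edges = set()
    for rep, proposition, vote in ballots:
        if vote == "for":  # votes against and absences create no link
            edges.add((rep, proposition))
    return edges


def supporters(edges):
    """Group representatives by the proposition they are linked to."""
    groups = defaultdict(set)
    for rep, proposition in edges:
        groups[proposition].add(rep)
    return dict(groups)


# Hypothetical ballots from one session.
ballots = [
    ("Rep A", "Proposition 1", "for"),
    ("Rep B", "Proposition 1", "for"),
    ("Rep C", "Proposition 1", "against"),
    ("Rep C", "Proposition 2", "for"),
    ("Rep A", "Proposition 2", "absent"),
]
edges = session_network(ballots)
groups = supporters(edges)
```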
Networks are a little bit messy, and people tend to not understand them well. This network below shows three groups of representatives (you can zoom in and out in the interactive version). They are grouped close to the propositions they voted for. So this is another opportunity to find out the interest groups on the micro level, for each proposition. Some propositions don’t have a name, just a date. That’s not my fault, but the parliament’s, as they didn’t bother to publish it on the web.
Finally, here are some heatmaps for various variables, mapped on to seating orders. The first is partitioned according to representatives’ party. Sorry, no legend here. You can mouse over in the interactive version to show details.
The second is attendance heatmap. Green is full attendance, red is total absence, and there’s a linear color scale between them. This one provides at-a-glance overview of attendance of entire party blocks.
The next two are yea and nay heatmaps, so you can see which party blocks mostly voted yea, and which nay. They are normalized to their local maxima for visual appeal, but a more correct approach would be not to normalize them, so it would be apparent that a nay vote is much less frequent than a yea. Why, I have no idea, but I imagine there must be a lot of technical votes, for example establishing presence and so on.
These seating orders are approximate, as I couldn’t get them for past terms from the parliament. They asserted that they didn’t have them, and claimed they don’t even have the current one, even if it’s published on their own website. There were more lies, but I won’t go into that here. They are, after all, in power, and I’m just a blogger.
Why they should engage in such behaviour is beyond me. Maybe they think that the information is theirs and should be kept from the public.
Again, if anyone needs the MongoDB database, drop me a note. My email address is on the About page.
This is an attempt at visualizing different conspiracy theories. The visualization tries to show the interconnectedness of actors, organizations and concepts in each one, so a network graph was chosen as the mode of presentation. The presented theories are: The Antigravity Drive, Chemtrails, The Cabal (American deep state from JFK assassination to 9/11), The Illuminati/New World Order, and the most recent, the Malaysia Airlines Flight MH370 disappearance. In a way, it’s a progression from the previous network visualization about the PRISM scandal, which was once also considered a conspiracy theory.
I chose this topic because those theories always attracted me as a means of alternative explanation of things that I couldn’t understand in official versions of events. That is not to say that I necessarily believe in any of them. For example, I’d be hard pressed to believe in the Moon Landing Hoax theory, which I first included here because of relative ease of gathering source material, but later discarded because of its relatively low value. The Flt 370 theory has extremely low credibility too, and I wonder what I’ll think when this post is a year old.
A conspiracy, according to Wikipedia, “… may also refer to a group of people who make an agreement to form a partnership in which each member becomes the agent or partner of every other member and engage in planning or agreeing to commit some act.” This is a pretty broad definition. It can apply to a government, a company, or every group of people who are trying to further an agenda, be it good or bad for their natural or social environment. But anything labelled as a conspiracy almost always has an evil association, for example “A civil conspiracy or collusion is an agreement between two or more parties to deprive a third party of legal rights or deceive a third party to obtain an illegal objective.” (Wikipedia – civil conspiracy), or “In criminal law, a conspiracy is an agreement between two or more persons to commit a crime at some time in the future.” (Wikipedia – criminal conspiracy).
A conspiracy theory is therefore an attempt at explaining a real or imagined conspiracy. In this sense, even official stories of various incidents are conspiracy theories, unless they are well founded in evidence and irrefutable facts. In a free society, a kind of market of conspiracy theories then forms, in which those with better means, but also more vested interests, compete for the public’s attention with other bodies of citizenry, whose interests and aims can differ significantly. For example, a government can execute a false flag attack, as the Nazis did in Poland at the beginning of WW2, and spin a theory that the other party did it, in order to go to war and grab land. The public may then be motivated to concoct a variety of counter theories with various motives – simply seeking the truth, overthrowing the government by exposing the lies it tells, furthering some commercial agenda, for example selling books, or purely personal paranoid agendas, which serve no one but the authors and their need to sustain their delusions.
Let me briefly explain the theories I used in this visualization. The first two are quite believable.
The Cabal: the story of American deep state and events from JFK assassination to 9/11 attacks
The Antigravity Drive
A story of how the Nazi regime allegedly developed a form of antigravity propulsion in total secrecy, made possible by a strictly compartmentalized environment, imposed on the German war production efforts by the SS. The technology was then seized by the US military and other allies after the war and developed further in utmost secrecy. The first such machines ever seen were the so-called foo fighters. These balls of light, sighted and documented by various US Air Force pilots, flew in parallel with bombers and fighter planes, and frequently executed seemingly impossible air maneuvers. Also mentioned are a mythical machine, Die Glocke (The Bell), which ran on red mercury and was responsible for the death of several scientists due to the extreme radiation it produced, and the discoveries of Viktor Schauberger. His implosion engine, which drew heavily on vortex physics, was allegedly successful, and produced two flying prototypes. The US military immediately grabbed and classified much of this work, and it remains secret to this day. It’s said to be employed in the B-2 bomber and various flying craft sighted around Area 51 in Nevada. The story also goes on to mention modern experiments in antigravity physics, notably those performed by Evgeniy Podkletnov, who allegedly succeeded in reducing gravity over a spinning superconducting electromagnet by two percent.
The Illuminati/New World Order
How a handful of secret societies dominate the world. The plot allegedly has its roots in the Bavarian Illuminati society, started in the eighteenth century by Adam Weishaupt. They were eradicated, but some claim they survived in a covert form, forging an alliance with international bankers. Most big world events since then were planned in advance, among them the advent of Communism, Nazism and Zionism, both World Wars, and a third one too. Says Pike: “The Third World War must be fomented by taking advantage of the differences caused by the “agentur” of the “Illuminati” between the political Zionists and the leaders of Islamic World. The war must be conducted in such a way that Islam (the Moslem Arabic World) and political Zionism (the State of Israel) mutually destroy each other. Meanwhile the other nations, once more divided on this issue will be constrained to fight to the point of complete physical, moral, spiritual and economical exhaustion…We shall unleash the Nihilists and the atheists, and we shall provoke a formidable social cataclysm which in all its horror will show clearly to the nations the effect of absolute atheism, origin of savagery and of the most bloody turmoil.”
In recent times, the organizations said to further Illuminati goals are the Council on Foreign Relations, the Trilateral Commission and the Bilderbergers. Here are some books: The Illuminati: Facts & Fiction by Mark Dice, and The Illuminati original by Adam Weishaupt.
Malaysia Airlines Flight MH370 disappearance
A recent theory about the whereabouts of the missing plane. On it, there seemed to be an awful lot of technical personnel involved in developing military hardware. They supposedly worked for a company named Freescale Semiconductor, which was in a patent wrestle with the Rothschild family. According to the story, Israeli agents and elements of the US military hijacked the plane and secretly flew it to the Diego Garcia military base in the Indian Ocean to debrief the experts and possibly use the plane in another 9/11-style attack in the future.
Construction and visualization of networks
A few words for the technologically minded. The networks were constructed by text-mining the source material, isolating known entities in sentences by means of massive dictionaries, connecting them in subnetworks (each sentence – one subnetwork), and finally adding them to the master network for that topic. Only sentence-length subnetworks were constructed, although it would probably be more fruitful to connect entities within paragraphs too. That would yield an overly convoluted master network, so I stayed with sentences for clarity.
The dictionaries were automatically generated from source texts, then edited. Many synonyms had to be added, since my dictionary-generating technique relies more on brute force than on semantic aspects of text. Again, the connections are not semantic, which means that if there was a sentence “The Illuminati are NOT connected with the CFR”, Illuminati and CFR would still be connected. Here I’m relying on the power of statistics: in the majority of sentences, the entities that appear together are in fact connected. For the minority in which they are not, the bonds between them are too weak to influence the big picture.
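The sentence-level linking described above can be sketched in a few lines. This is a simplified illustration, not the project’s source code: the naive sentence splitter, the tiny dictionary with its synonyms and the sample text are all assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence, dictionary):
    """Return canonical names of all dictionary entities found in a sentence."""
    lowered = sentence.lower()
    found = set()
    for canonical, synonyms in dictionary.items():
        for synonym in synonyms:
            # Match whole words only, ignoring case.
            if re.search(r"\b" + re.escape(synonym.lower()) + r"\b", lowered):
                found.add(canonical)
                break
    return found


def build_master_network(text, dictionary):
    """Connect every pair of entities that share a sentence; count co-occurrences."""
    edges = Counter()
    for sentence in split_sentences(text):
        entities = sorted(find_entities(sentence, dictionary))
        for a, b in combinations(entities, 2):  # one subnetwork per sentence
            edges[(a, b)] += 1
    return edges


# Hypothetical dictionary: canonical name -> list of synonyms.
dictionary = {
    "Illuminati": ["Illuminati"],
    "CFR": ["CFR", "Council on Foreign Relations"],
    "Bilderberg": ["Bilderberg", "Bilderbergers"],
}
text = (
    "The Illuminati are NOT connected with the CFR. "
    "The Council on Foreign Relations and the Bilderbergers met. "
    "The Illuminati founded the CFR."
)
edges = build_master_network(text, dictionary)
```

Note that the first sample sentence still links Illuminati and CFR despite the negation, which is exactly the limitation mentioned above. The edge counts are what lets frequent connections outweigh such accidents.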
I did try to process volumes of texts with a natural language processing framework, namely Apache OpenNLP, but got frustrated with the amount of work that would be needed for this little hobby project. I’d need to train the classifiers to extract named entities, which is no small feat, and I’d probably not use them again. To gain some insight into the types of connections between these entities, I tried parsing the sentences into parse trees and then extracting relationships, but parsing tech is not very accurate. It would probably do, again relying on the power of statistics, but the sheer number of relationship types would add little to the visual value of the graphs, so I decided that I’d do this with a simpler project first. The logic I wrote is still in the project source code, so if anyone is interested, mail me (About page) and I’ll send it your way. Same goes for the graph files and the categorized dictionaries.
Finally, the topic networks were exported as subgraphs, so that every node in the network is represented by a subgraph. These subgraphs are added into – or removed from – the master graph by the client. The networks in the browser are managed by sigma.js. Preliminary analysis was done in Gephi; I recommend Network Graph Analysis and Visualization with Gephi by Ken Cherven.
Additionally, geographic entities were extracted for each node. These are represented on a small map at the bottom of the screen. The map is managed by d3.js.
Interacting with visualization
There are two modes – reading the story or exploring on your own. Switch between them by clicking the button at the top right of the graph. While you read the story, the graph will change in real time as you scroll down the text. If you choose to explore, you can click on terms, and their subgraphs will be interactively added to the master graph.
Clicking on a graph node will expand it (load its associated nodes and display them, if previously not loaded), or delete it, if it was already loaded, at the same time showing the text from which its existence was text-mined.
There’s no way for the user to control the map. It’s there for informative and decorative purposes.