This is my first attempt to use open data for data visualization, both in a web presentation and in a mobile app. The idea was to cross-pollinate promotion between the two, but it didn’t go so well – more on this later.
According to data provided by the state police, the highway authority and local traffic wardens, a little under a million traffic violations were recorded between the start of 2012 and September 2014. Given that there are 1,300,000 registered vehicles and 1,400,000 active driving licenses in the country, this is a lot. The large majority of them are parking and toll tickets.
In the main article, there are a lot of images and charts. For example, I analyzed data for major towns in Slovenia to get the streets with the highest number of issued traffic tickets. Here’s an example for Ljubljana:
I had temporal data for each issued ticket, so I could also show on which streets you are more likely to be ticketed in the morning, at midday or in the evening. In the image below, morning is blue, midday is yellow, and evening is red.
This is, however, only the beginning. Here are questions I tried to answer:
Are traffic wardens and traffic police just another type of tax collectors for the state and counties?
Do traffic wardens really issue more tickets now than in the past, or is that just my perception?
Which zones in bigger towns are especially risky, should you forget to pay for parking?
Are traffic wardens more active in specific time intervals?
Do the police set speed traps in the locations with the most traffic accidents? What about DUI checks?
How does temperature influence the number of issued traffic tickets?
Does the moon influence the number of issued traffic tickets? If so, which types?
Where and when are drivers most at risk of encountering other drunk drivers?
Where does the highway authority check for toll, and when to hit the road if one does not want to pay it?
How can we drive safer using open data?
Be sure to read the main article to see all the visualizations and interactive maps. There are also videos, for example this one, showing how the ticketing territory expanded through time in Ljubljana:
The big finding was a sharp increase in the number of parking tickets issued in Ljubljana by the end of 2013, which coincides with the publication of the debt that the county had run into:
There’s an interactive map showing the quadrants with the most DUI tickets and their distribution by day of the week and month of the year:
Mobile app for Android
I also wrote an Android mobile app (get it on Google Play if you are interested) that locates the user and shows locations of violations of selected type on the map, as well as a threat assessment, should she want to break the law. Here’s the description on Google Play:
The app helps the user find out where and when were traffic tickets issued in Slovenia, thus facilitating safer driving.
Ticket database is limited to territory of Republic of Slovenia.
Choose between these issued citations to show in app:
– driving while using a cellphone
– ignoring safety belt laws
– unpaid toll
and traffic accidents.
The app will locate you, fetch data about traffic citations issued in your vicinity, and show them on map. To see citations, that were issued somewhere else, click on map. Additionally available is summary of threat level, derived from statistical data, collected by government agencies.
Locating the user and showing dots on the map wasn’t really a challenge, but I wanted to show a realistic threat assessment based on location and time. To do that, I wrote an API method that counts the tickets issued on the same day of the week in the same hour interval, and the app then draws a simple gauge.
Let’s say, for example, that you find yourself in the center of Ljubljana on Monday at noon, don’t have the money for the parking fee, and you really only want to take a box to a friend who lives there. You’ll be gone for only ten minutes, so should you risk not paying the parking fee?
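The counting step could be sketched roughly as follows. This is only an illustration in Python, not the API code itself: the names `slot` and `threat_level`, the three-hour slot width, the normalization against the busiest slot, and the sample timestamps are all assumptions, and the location filter is left out.

```python
from collections import Counter
from datetime import datetime

INTERVAL_HOURS = 3  # assumed slot width, matching the noon-to-3-PM example


def slot(moment):
    """Map a timestamp to (weekday, first hour of its interval); Monday is 0."""
    return moment.weekday(), (moment.hour // INTERVAL_HOURS) * INTERVAL_HOURS


def threat_level(ticket_times, now):
    """Return the ticket count for the current slot and its share of the busiest slot."""
    per_slot = Counter(slot(t) for t in ticket_times)
    if not per_slot:
        return 0, 0.0
    current = per_slot[slot(now)]
    return current, current / max(per_slot.values())


# Hypothetical sample: three Monday tickets between noon and 3 PM, one on a Tuesday morning.
tickets = [
    datetime(2014, 1, 6, 12, 10),
    datetime(2014, 1, 6, 14, 59),
    datetime(2014, 1, 13, 13, 30),
    datetime(2014, 1, 7, 9, 0),
]
count, gauge = threat_level(tickets, datetime(2014, 1, 20, 12, 0))
```

The second value is a number between 0 and 1 that a gauge could display directly. Dividing by the busiest slot is just one possible way to scale it.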
The app finds out the total number of tickets issued on Mondays in the three-hour period between noon and 3 PM, then graphically shows the threat level along with some distributions, something like this:
It works pretty well, and I use it sometimes, although I admit that its use cases may be marginal for the majority of the population. It does get ten new installs a day, although I don’t know how long this trend will continue.
I did send out press releases and mounted a moderate campaign on Twitter (here’s the app’s account), but it amounted to precious little. Maybe the timing was bad – I launched it during the Christmas holidays, when Internet usage is low. Or this type of app just isn’t so interesting.
I’m currently working on an analysis of parking tickets for New York City; maybe that will be more interesting. There were, after all, more than nine million tickets issued there, and the data is much richer.
I often wondered what the average lifetime of a pop song on the charts is. If one follows music, it becomes intuitively apparent that there are in fact several types of hits. Some stay on the charts for many weeks, and others barely make it, then immediately slip out.
So I set about discovering groups of songs with similar trends as they moved through the weekly British Top 40 chart from 1990 to 2014. A total of 1284 different songs appeared on the charts in that period. After a series of experiments, I settled on 100 groups, an admittedly arbitrary number. Position data for each song was collected across the weeks, then the songs were grouped using k-means clustering.
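For readers who haven’t met k-means, here is a minimal pure-Python sketch of the idea on four made-up songs. It is not the code used for the visualization. The fixed vector length, the `OFF_CHART` padding value of 41 for weeks outside the Top 40, and all function names are assumptions made for the example.

```python
import random

OFF_CHART = 41  # assumed placeholder for weeks when a song is outside the Top 40


def to_vector(positions, length):
    """Pad or truncate a song's weekly chart positions to a fixed length."""
    return (list(positions) + [OFF_CHART] * length)[:length]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a group went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical songs: two long-running hits and two that barely charted.
songs = {
    "long_a": [5, 3, 1, 1, 2, 4, 8, 15],
    "long_b": [6, 2, 1, 2, 3, 5, 9, 14],
    "brief_a": [38, 40],
    "brief_b": [35, 39],
}
names = list(songs)
vectors = [to_vector(songs[name], 8) for name in names]
labels, centroids = kmeans(vectors, k=2)
groups = dict(zip(names, labels))
```

With real data, each centroid is the average trajectory of its group, which is what a small chart per group would show.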
The result is part interactive, part static visualization, consisting of an exploratory chart and 100 small charts showing each separate group.
This is an attempt at visualizing different conspiracy theories. The visualization tries to show the interconnectedness of actors, organizations and concepts in each one, so a network graph was chosen as the mode of presentation. The presented theories are: The Antigravity Drive, Chemtrails, The Cabal (American deep state from the JFK assassination to 9/11), The Illuminati/New World Order, and the most recent, the Malaysian Airlines Flight MH370 disappearance. In a way, it’s a progression from the previous network visualization about the PRISM scandal, which was once also considered a conspiracy theory.
I chose this topic because those theories always attracted me as a means of alternative explanation of things that I couldn’t understand in official versions of events. That is not to say that I necessarily believe in any of them. For example, I’d be hard pressed to believe in the Moon Landing Hoax theory, which I first included here because of the relative ease of gathering source material, but later discarded because of its relatively low value. The Flight MH370 theory has extremely low credibility too, and I wonder what I’ll think when this post is a year old.
A conspiracy, according to Wikipedia, “… may also refer to a group of people who make an agreement to form a partnership in which each member becomes the agent or partner of every other member and engage in planning or agreeing to commit some act.” This is a pretty broad definition. It can apply to a government, a company, or any group of people who are trying to further an agenda, be it good or bad for their natural or social environment. But anything labelled as a conspiracy almost always has an evil association, for example “A civil conspiracy or collusion is an agreement between two or more parties to deprive a third party of legal rights or deceive a third party to obtain an illegal objective.” (Wikipedia – civil conspiracy), or “In criminal law, a conspiracy is an agreement between two or more persons to commit a crime at some time in the future.” (Wikipedia – criminal conspiracy).
A conspiracy theory is therefore an attempt at explaining a real or imagined conspiracy. In this sense, even official stories of various incidents are conspiracy theories, unless they are well founded in evidence and irrefutable facts. In a free society, a kind of market of conspiracy theories then forms, in which those with better means, but also more vested interests, compete for the public’s attention with other bodies of citizenry, whose interests and aims can differ significantly. For example, a government can execute a false flag attack, as the Nazis did in Poland at the beginning of WW2, and spin a theory that the other party did it, in order to go to war and grab land. The public may then be motivated to concoct a variety of counter-theories with various motives – simply seeking the truth, overthrowing the government by exposing the lies it tells, furthering some commercial agenda, for example selling books, or purely personal paranoid agendas, which serve no one but the authors and their need to sustain their delusions.
Let me briefly explain the theories I used in this visualization. The first two are quite believable.
The Cabal: the story of American deep state and events from JFK assassination to 9/11 attacks
A story of how the Nazi regime allegedly developed a form of antigravity propulsion in total secrecy, made possible by a strictly compartmentalized environment imposed on the German war production efforts by the SS. The technology was then seized by the US military and other allies after the war and developed further in utmost secrecy. The first such machines ever seen were the so-called foo fighters. These balls of light, sighted and documented by various US Air Force pilots, flew in parallel with bombers and fighter planes, and frequently executed seemingly impossible air maneuvers. Also mentioned are a mythical machine, The Glocke (The Bell), which ran on red mercury and was responsible for the death of several scientists due to the extreme radiation it produced, and the discoveries of Viktor Schauberger. His implosion engine, which drew heavily on vortex physics, was allegedly successful and produced two flying prototypes. The US military immediately grabbed and classified much of this work, and it remains secret to this day. It’s said to be employed in the B-2 bomber and various flying craft sighted around Area 51 in Nevada. The story also goes on to mention modern experiments in antigravity physics, notably those performed by Evgeniy Podkletnov, who allegedly succeeded in reducing gravity over a spinning superconducting electromagnet by two percent.
How a handful of secret societies dominate the world. The plot allegedly has its roots in the Bavarian Illuminati society, started in the eighteenth century by Adam Weishaupt. They were eradicated, but some claim they survived in a covert form, forging an alliance with international bankers. According to the theory, most big world events since then were planned in advance, among them the advent of Communism, Nazism and Zionism, both World Wars, and a third one still to come. Says Pike: “The Third World War must be fomented by taking advantage of the differences caused by the “agentur” of the “Illuminati” between the political Zionists and the leaders of Islamic World. The war must be conducted in such a way that Islam (the Moslem Arabic World) and political Zionism (the State of Israel) mutually destroy each other. Meanwhile the other nations, once more divided on this issue will be constrained to fight to the point of complete physical, moral, spiritual and economical exhaustion…We shall unleash the Nihilists and the atheists, and we shall provoke a formidable social cataclysm which in all its horror will show clearly to the nations the effect of absolute atheism, origin of savagery and of the most bloody turmoil.”
In recent times, the organizations said to further Illuminati goals are the Council on Foreign Relations, the Trilateral Commission and the Bilderbergers. Here are some books: The Illuminati: Facts & Fiction by Mark Dice, and The Illuminati original by Adam Weishaupt.
Malaysian Airlines Flight MH370 disappearance
A recent theory about the whereabouts of the missing plane. On board, there seemed to be an awful lot of technical personnel involved in developing military hardware. They supposedly worked for a company named Freescale Semiconductor, which was in a patent dispute with the Rothschild family. According to the story, Israeli agents and elements of the US military hijacked the plane and secretly flew it to the Diego Garcia military base in the Indian Ocean to debrief the experts and possibly use the plane in another 9/11-style attack in the future.
Construction and visualization of networks
A few words for the technologically minded. The networks were constructed by text-mining the source material: isolating known entities in sentences by means of massive dictionaries, connecting them into subnetworks (one subnetwork per sentence), and finally adding them to the master network for that topic. Only sentence-length subnetworks were constructed. It would probably be more fruitful to connect entities within paragraphs too, but that would yield too convoluted a master network, so I stayed with sentences for clarity.
The dictionaries were automatically generated from the source texts, then edited. Many synonyms had to be added, since my dictionary-generating technique relies more on brute force than on the semantic aspects of the text. Again, the connections are not semantic, which means that if there was a sentence “The Illuminati are NOT connected with the CFR”, Illuminati and CFR would still be connected. Here I’m relying on the power of statistics: in the majority of sentences, the entities that appear together really are connected. For the minority in which they are not, the bonds between them are too weak to influence the big picture.
I did try to process volumes of text with a natural language processing framework, namely Apache OpenNLP, but got frustrated with the amount of work that would be needed for this little hobby project. I’d need to train the classifiers to extract named entities, which is no small feat, and I’d probably not use them again. To gain some insight into the types of connections between these entities, I tried parsing the sentences into parse trees and then extracting relationships, but parsing tech is not very accurate. It would probably do, again relying on the power of statistics, but the sheer number of relationship types would add little to the visual value of the graphs, so I decided that I’d do this with a simpler project first. The logic I wrote is still in the project source code, so if anyone is interested, mail me (About page) and I’ll send it your way. The same goes for the graph files and the categorized dictionaries.
Finally, the topic networks were exported as subgraphs, so that every node in the network is represented by a subgraph. These subgraphs are added into – or removed from – the master graph by the client. In the browser, the networks are managed by sigma.js. Preliminary analysis was done in Gephi; I recommend Network Graph Analysis and Visualization with Gephi by Ken Cherven.
Additionally, geographic entities were extracted for each node. These are represented on a small map at the bottom of the screen. The map is managed by d3.js.
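To make the sentence-level approach concrete, here is a rough Python sketch. It is not the project’s code: the tiny `DICTIONARY`, the naive sentence splitter, and the function names are all assumptions for illustration. Synonyms are handled by mapping several surface forms to one canonical entity.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary: lowercase surface form -> canonical entity name.
DICTIONARY = {
    "illuminati": "Illuminati",
    "cfr": "CFR",
    "council on foreign relations": "CFR",  # synonym of the entry above
    "bilderbergers": "Bilderberg Group",
}


def entities_in(sentence, dictionary):
    """Return the canonical entities whose surface forms occur in the sentence."""
    text = sentence.lower()
    return {
        name
        for form, name in dictionary.items()
        if re.search(r"\b" + re.escape(form) + r"\b", text)
    }


def build_network(text, dictionary):
    """Connect every pair of entities that share a sentence; weight = co-occurrences."""
    edges = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence split
        found = sorted(entities_in(sentence, dictionary))
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


sample = (
    "The Illuminati are NOT connected with the CFR. "
    "The Council on Foreign Relations and the Bilderbergers met. "
    "The Illuminati were founded in Bavaria."
)
network = build_network(sample, DICTIONARY)
```

Note that the first sample sentence still produces an Illuminati–CFR edge despite the negation, which is exactly the non-semantic behavior described above.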
Interacting with visualization
There are two modes – reading the story or exploring on your own. Switch between them by clicking the button at the top right of the graph. While you read the story, the graph will change in real time as you scroll down the text. If you choose to explore, you can click on terms, and their subgraphs will be interactively added to the master graph.
Clicking on a graph node will expand it (load its associated nodes and display them, if previously not loaded), or delete it, if it was already loaded, at the same time showing the text from which its existence was text-mined.
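The expand-or-delete behavior can be sketched as a small toggle. The real client runs in the browser on top of sigma.js; the Python below only illustrates the bookkeeping, and the data layout (edges as frozensets, a `master` dictionary that remembers which expanded nodes contributed each edge) is an assumption, not the actual implementation.

```python
def toggle_node(master, expanded, subgraphs, node):
    """Expand `node` if it is collapsed, collapse it otherwise.

    master:    dict mapping an edge to the set of expanded nodes that contributed it
    expanded:  set of currently expanded nodes
    subgraphs: dict mapping a node to the set of edges in its exported subgraph
    Returns True if the node ends up expanded.
    """
    edges = subgraphs.get(node, set())
    if node in expanded:
        expanded.discard(node)
        for edge in edges:
            owners = master.get(edge, set())
            owners.discard(node)
            if not owners:  # drop the edge only when no other expanded node needs it
                master.pop(edge, None)
        return False
    expanded.add(node)
    for edge in edges:
        master.setdefault(edge, set()).add(node)
    return True


# Hypothetical per-node subgraphs.
subgraphs = {
    "Illuminati": {
        frozenset({"Illuminati", "CFR"}),
        frozenset({"Illuminati", "Bilderberg Group"}),
    },
    "CFR": {
        frozenset({"Illuminati", "CFR"}),
        frozenset({"CFR", "Trilateral Commission"}),
    },
}
master, expanded = {}, set()
toggle_node(master, expanded, subgraphs, "Illuminati")  # expands: 2 edges
toggle_node(master, expanded, subgraphs, "CFR")         # expands: 3 edges in total
```

Tracking owners per edge is one way to make sure that collapsing a node doesn’t remove connections that another expanded node still shows.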
There’s no way for the user to control the map. It’s there for informative and decorative purposes.
An article appeared that attempted to expose the questionable practices of some Slovenian entrepreneurs. The scheme goes like this: establish a company, perform some work, bleed it dry, then establish a new one and move all the workers into it, at the same time avoiding paying benefits and a sizable portion of salaries. When the new company has served its purpose, establish another one, and so on, for as long as it works. These companies are frequently registered at the same address.
The article says that there are as many as 120 companies registered in one residential building. But because of a weakness in the law, state inspectors can’t put an end to the practice.
I wanted to see these addresses on the map, so here’s an attempt. For every address with more than five companies, there’s a dot, with color and radius proportional to the number of companies registered there. The biggest dots represent business buildings, in which predominantly legitimate businesses reside. My data sources didn’t allow for filtering out just residential buildings.
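The filtering and sizing step amounts to a count per address. Here is a minimal Python sketch under stated assumptions: the function name, the `max_radius` value, and the sample addresses are hypothetical, and the real map was of course built from the company register data rather than a list like this.

```python
from collections import Counter


def address_markers(company_addresses, threshold=5, max_radius=30.0):
    """Keep addresses with more than `threshold` companies; scale radius by count."""
    counts = Counter(company_addresses)
    kept = {addr: n for addr, n in counts.items() if n > threshold}
    if not kept:
        return {}
    largest = max(kept.values())
    return {
        addr: {"count": n, "radius": max_radius * n / largest}
        for addr, n in kept.items()
    }


# Hypothetical input: one address string per registered company.
addresses = ["Street A 1"] * 120 + ["Street B 2"] * 6 + ["Street C 3"] * 5
markers = address_markers(addresses)
```

The same ratio (`n / largest`) could drive a color scale as well as the radius.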
You can see the standalone map here. (In Slovene.)
Clicking on a marker displays a popup with a list of companies, sorted by date of establishment – youngest first. There’s also a chart of the predominant business categories at that address. The categories that the article mentions as most prone to the scheme in question are Construction and Retail. So even if this map can’t really show the locations of these questionable companies, it can maybe help with their discovery. If there’s a big dot with predominantly these categories, there’s a certain possibility that some of these fraudulent companies are there.
Most addresses shown here of course don’t have anything to do with any illegal activity.
The Global Gender Gap Index examines the gap between men and women in four fundamental categories (subindexes): Economic Participation and Opportunity, Educational Attainment, Health and Survival and Political Empowerment. Table 1 displays all four of these subindexes and the 14 different indicators that compose them, along with the sources of data used for each.
I thought it would be nice to try to visualize the data and make it as interactive as I could, and learn d3.js in the process. I actually tried to mobilize all the data in the report, which one can see in graphical form by clicking on countries on the world map, or by selecting the categories in the dropdown.
There are several categories:
In addition to that, I calculated the differences between 2013 and previous years. These maps are also accessible through the dropdown menu, or simply by scrolling up and down.
This subindex is captured through three concepts: the participation gap, the remuneration gap and the advancement gap. The participation gap is captured using the difference in labour force participation rates. The remuneration gap is captured through a hard data indicator (ratio of estimated female-to-male earned income) and a qualitative variable calculated through the World Economic Forum’s Executive Opinion Survey (wage equality for similar work). Finally, the gap between the advancement of women and men is captured through two hard data statistics (the ratio of women to men among legislators, senior officials and managers, and the ratio of women to men among technical and professional workers).
In this subindex, the gap between women’s and men’s current access to education is captured through ratios of women to men in primary-, secondary- and tertiary-level education. A longer-term view of the country’s ability to educate women and men in equal numbers is captured through the ratio of the female literacy rate to the male literacy rate.
Health and Survival
This subindex provides an overview of the differences between women’s and men’s health. To do this, we use two indicators. The first is the sex ratio at birth, which aims specifically to capture the phenomenon of “missing women” prevalent in many countries with a strong son preference. Second, we use the gap between women’s and men’s healthy life expectancy, calculated by the World Health Organization. This measure provides an estimate of the number of years that women and men can expect to live in good health by taking into account the years lost to violence, disease, malnutrition or other relevant factors.
This subindex measures the gap between men and women at the highest level of political decision-making, through the ratio of women to men in minister-level positions and the ratio of women to men in parliamentary positions. In addition, we include the ratio of women to men in terms of years in executive office (prime minister or president) for the last 50 years. A clear drawback in this category is the absence of any indicators capturing differences between the participation of women and men at local levels of government. Should such data become available at a global level in future years, they will be considered for inclusion in the Global Gender Gap Index.
Out of the 110 countries that have been involved every year since 2006, 95 (86%) have improved their performance over the last four years, while 15 (14%) have shown widening gaps. Ten countries have closed the gap on both the Health and Survival and Educational Attainment subindexes. No country has closed the economic participation gap or the political empowerment gap. On the Economic Participation and Opportunity subindex, the highest-ranking country (Norway) has closed over 84% of its gender gap, while the lowest ranking country (Syria) has closed only 25% of its economic gender gap. There is similar variation in the Political Empowerment subindex. The highest-ranking country (Iceland) has closed almost 75% of its gender gap whereas the two lowest-ranking countries (Brunei Darussalam and Qatar) have closed none of the political empowerment gap according to this measure.