
by virostatiq - September 9, 2013 (updated April 11, 2014)

Interactive timeline of the PRISM scandal

Purpose of this visualization

This is an interactive timeline of events in the PRISM scandal, as chronicled by selected media in online news articles, giving a summarized view of events as they unfolded. It’s intended as a parody of NSA software for tracking people and analyzing their metadata. It consists of these parts:

the chronological order of articles, visualized as a timeline,
a network of people, places and organizations that appear in the articles,
geographic information that the articles refer to, and
a bar graph showing word counts of interesting words associated with the main theme.

In the bottom part there is a timeline displaying the published articles in chronological order. Articles are accessible by clicking on a title and then using the Go to article link in the popup bubble.

In the center background there is a rotating globe displaying the major cities the articles refer to. The label next to each city lists the titles of all articles currently visible in the timeline view that refer to that city.

The center foreground contains a network showing the interconnectedness of the various entities recognized in the article text. Entities appearing in the same paragraph are connected. The network is additive, which means that the more frequently entities appear in the same paragraphs, the stronger the bond between them.

In the right corner there is a small bar graph showing the frequencies of selected words, giving an idea of Snowden’s options at the time. It’s just a count of those words in the visible articles, not a semantic analysis.

As the timeline is moved, new articles appear and the network is updated with new data, giving a quick overview of who was involved in the discussion, how frequently, and who was related to them.

Hovering the mouse over a network node shows only the portion of the network that the node is directly connected to, making this useful for detailed exploration of relationships between those entities.

Launch the PRISM Scandal visualization! Use Chrome if possible. It won’t work in IE.

Interacting with the visualization

Use the mouse to drag the timeline left or right, or rotate the mouse wheel for the same effect. For quicker navigation use the Quick jump menu. The subnetworks will load and unload automatically, and the whole network will try to stabilize so that it accurately reflects the frequencies of terms and the bond strengths between them. If it doesn’t stabilize well, click the Reorder network button or double-click in the middle of the timeline.

It’s possible to zoom the network in and out to get a better view of the names and connections shown. To do that, position the mouse pointer over the network area and drag, or zoom with the wheel. Node size corresponds to term frequency in the visible articles, while bond thickness corresponds to the bond’s weight, that is, how often the two terms appear together.

Click the article titles in the timeline to display more information and links. To change the publisher, click the Publisher menu and select the desired one. This will load a new set of events into the timeline. To move the timeline automatically, use the Play button.

Data sources

News articles containing the keywords Snowden, Prism, NSA, Wikileaks and Julian Assange were scraped from selected media and stored locally for processing. The articles themselves are linked from the timeline. Their content, apart from titles, is not accessible in this visualization for copyright reasons. A geographical database with city names and the corresponding latitudes and longitudes was obtained as a free download from GeoNames. The media and/or publishing houses were selected to give a balanced set of worldviews. These are, in alphabetical order:

Processing the articles

First, a dictionary of all capitalized word sequences and their permutations was constructed by processing all articles in the database. This is essentially a dictionary of all people, states, cities and organizations appearing in the whole database. Then the title and body text of each article were scanned for these dictionary entries and city names, so that an article abstract was constructed, consisting of the title, publishing date, a link to the article, a word count of selected words, a subnetwork of connected entities, and a list of cities along with their latitudes and longitudes.
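The original Java code is not shown here, so the following is only a rough Python sketch of the dictionary-building step. It assumes that a simple regular expression is enough to find capitalized word sequences; the names `capitalized_sequences` and `build_dictionary`, the `min_count` filter and the sample sentences are all hypothetical, and the "permutations" mentioned above are not modeled.

```python
import re
from collections import Counter

# One or more capitalized words in a row, e.g. "Edward Snowden" or "Hong Kong".
# Assumption: plain Latin capitalization; the original matching rules are unknown.
CAP_SEQ = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b")

def capitalized_sequences(text):
    """Return every run of consecutive capitalized words found in text."""
    return CAP_SEQ.findall(text)

def build_dictionary(articles, min_count=2):
    """Count sequences over all articles; keep those seen at least min_count times.

    The min_count filter is an assumption, used here to discard one-off
    sentence-initial words that are capitalized but are not entities.
    """
    counts = Counter()
    for body in articles:
        counts.update(capitalized_sequences(body))
    return {term: n for term, n in counts.items() if n >= min_count}

# Made-up sample data, for illustration only.
articles = [
    "Edward Snowden flew from Hong Kong to Moscow.",
    "Officials said Edward Snowden remained in Moscow.",
]
dictionary = build_dictionary(articles)
```

With the two sample sentences, only "Edward Snowden" and "Moscow" survive the filter. A real corpus would need more care with sentence-initial words and non-ASCII names.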

Constructing the network

The article subnetwork was constructed so that entities in the same sentence (connecting in a paragraph shown in the picture) are connected with a set weight. Nodes not connected with any other nodes are dropped at this point, since their inclusion would lead to a largely unconnected network, which is visually unappealing and cumbersome to navigate.
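A minimal Python sketch of this construction, under stated assumptions: sentences are split naively on end punctuation, entities are matched as plain substrings, and each shared sentence adds the same fixed weight. The function name `article_subnetwork` and the sample text are hypothetical; the original Java implementation may differ in all of these details.

```python
import re
from collections import defaultdict
from itertools import combinations

def article_subnetwork(text, entities, weight=1):
    """Connect entities that occur in the same sentence; return (edges, nodes)."""
    edges = defaultdict(int)
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Substring matching is a simplification of real dictionary lookup.
        present = sorted(e for e in entities if e in sentence)
        for a, b in combinations(present, 2):
            edges[(a, b)] += weight
    # Entities without any edge never enter the node set, so they are dropped.
    nodes = {n for pair in edges for n in pair}
    return dict(edges), nodes

# Made-up sample data, for illustration only.
text = ("Snowden left Hong Kong for Moscow. "
        "The NSA declined to comment. "
        "Snowden later thanked Moscow.")
entities = {"Snowden", "Hong Kong", "Moscow", "NSA", "Wikileaks"}
edges, nodes = article_subnetwork(text, entities)
```

Here "NSA" appears alone in its sentence, so it ends up with no edges and is left out of the node set, which mirrors the dropping of unconnected nodes.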

All web scraping and text processing was done locally in Java; around 10,000 articles had been processed at the latest count. See picture below.

There is no live server database behind this visualization for it to query. The entity dictionaries are here (names) and here (selected words).

Constructing the visualization

The visualization was constructed entirely in HTML5 and JavaScript. Four major libraries were used:

Sigma.js for displaying the networks. The latest version lacks some key functionality for dynamically and additively loading and unloading subgraphs into the main graph, so the source code was extended with the required methods. A separate article on that topic is upcoming.
Three.js for the rotating Earth and all geography-related work.
Simile Timeline for the timeline.
Flot for the bar graph.
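The additive loading and unloading of subgraphs comes down to simple bookkeeping, which can be sketched independently of any library. The Python class below is an illustration only: it is not Sigma.js code and does not reflect the methods actually added to it. The class name `AdditiveGraph`, its methods and the sample data are hypothetical.

```python
from collections import Counter

class AdditiveGraph:
    """Main graph whose edge weights are the sum over all loaded subgraphs."""

    def __init__(self):
        self.edge_weights = Counter()

    def load(self, subgraph):
        """Add a subgraph given as {(node_a, node_b): weight}."""
        for edge, weight in subgraph.items():
            self.edge_weights[edge] += weight

    def unload(self, subgraph):
        """Subtract a previously loaded subgraph; drop edges that reach zero."""
        for edge, weight in subgraph.items():
            self.edge_weights[edge] -= weight
            if self.edge_weights[edge] <= 0:
                del self.edge_weights[edge]

    def nodes(self):
        """Nodes that still take part in at least one edge."""
        return {n for edge in self.edge_weights for n in edge}

# Made-up subgraphs for two articles, for illustration only.
article_1 = {("Moscow", "Snowden"): 2, ("Hong Kong", "Snowden"): 1}
article_2 = {("Moscow", "Snowden"): 1, ("NSA", "Snowden"): 1}

graph = AdditiveGraph()
graph.load(article_1)      # article scrolls into view
graph.load(article_2)
weight_when_both_visible = graph.edge_weights[("Moscow", "Snowden")]
graph.unload(article_1)    # article scrolls out of view
```

While both articles are visible, the shared edge carries the sum of their weights; once the first article is unloaded, its contribution is removed and the edge that only it supplied disappears.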

If anyone is interested in Java code for web scraping / networking / constructing timeline input files, drop me a note. My email is on the About page.

by virostatiq - August 13, 2013 (updated April 11, 2014)

Building ages in Ljubljana, Slovenia

Such is the beauty of open data that when I saw the excellent Portland: The Age of a City by Justin Palmer, I immediately wanted to do something similar, but for my town. The people at the government office (GURS) were kind enough to provide me with the files, and after some coding, here it is.

It’s an exploration of how the city grew through the last century. Blue is old, violet younger, red still younger, bright red the youngest.

Launch the interactive map showing structure ages in Ljubljana

Here’s the number of structures built per year. I was able to identify causes for some spikes in building activity, but not all:

1899: four years after the big earthquake,
1919: rebuilding after WW1? I’m not sure there was much destruction here,
1929: more building – in 1929 Ljubljana became the capital of Dravska banovina,
1949: rebuilding after WW2,
1959, 1969, 1979, 1989: might be effects of Yugoslav loans, but I suspect it’s more an effect of administrative laziness, with new buildings being entered into the records at the end of each decade,
2004: the last surge of prosperity in independent Slovenia.

Generally, it’s been going downhill from 1969 on. The best spots were probably taken by then.

Here’s an animation of the whole thing. It shows the city’s evolution between the years 1500 and 2013, since there’s not much happening before that.

City of Ljubljana – growth between years 1500 – 2013 from Marko O’Hara on Vimeo.

The map was made with TileMill, the animation in Processing.

Corruption visualized: Global Corruption Barometer 2013 on world map

Interactive map of data from Global Corruption Barometer 2013 (Transparency report), showing corruption levels per country for political parties, educational sector, private companies, media, civil servants, judicial and medical institutions, military, NGOs, parliament, police and religious institutions.

Launch interactive map

Global Corruption Barometer 2013 on interactive world map

Made with TileMill, data: Transparency International.

by virostatiq - May 16, 2013 (updated April 11, 2014)

Data-driven drinking: ingredients and brands in the cocktail shaker

A better title for this post would probably be “What kinds of booze to drink together to get drunk in style, according to those who write, compile, publish, test and enjoy cocktail recipes”. Continuing from previous posts, I wanted to see what a network of ingredients of all possible cocktail recipes looks like, and whether it’s possible to divide them into sensible groups, so that they would be instantly recognizable and even helpful to experienced and casual drinkers alike.

To do this, more than 25,000 recipes from Drinksmixer.com and Drinksnation.com were scraped, and a network was constructed with Gephi and visualized below. Dot size reflects the count of that particular ingredient in all analyzed recipes. Dots of the same color frequently appear together in recipes. One could say that one can hardly make a mistake if one combines three ingredients of the same color and drinks the concoction.

The map below is interactive; try panning and zooming with the mouse, or use the control in the upper left-hand corner.

I see five major groups of ingredients, but your alcohol proof may vary. Actually I suspected something like that:

ice is in its own group. For some reason it also contains tequila,
milky drinks are in their own group (gray-blue),
salty and spicy drinks are also in an easily recognizable group (pink),
blue group is dominated by vodka and rum,
green group mostly has gin and tangy juices, and
red group mostly contains fruit schnapps and liqueurs.

You can download hi-res static images here: black background | white background.

For a more mobile-friendly, searchable map with advanced interactivity, click here (Sigma.js). Clicking on an ingredient on this map will show a list of all connected ingredients. Clicking on an element in the list will show a subgraph.

Most recipes contained preferred brands for spirits and fruit juices, so I constructed another diagram. It shows which brands are usually grouped together in drinks.

Here is the interactive map:

Download hi-res static images here: black background | white background.
For a searchable map with advanced interactivity, click here. Clicking on a brand on this map will show a list of all connected brands.

I find it funny that Everclear, Kool-Aid and Mountain Dew are so close. Does that mean that people just pour 100% ethanol and caffeinated water in a jug and drink that? Possibly.

Coming up next: data-driven cooking.

by virostatiq - May 7, 2013 (updated April 11, 2014)

Visualizing drug talk on bluelight.ru

In mainstream media, there’s not a lot to be found about recreational drugs except horror stories and arguments for prohibition. From time to time we also hear that Steve Jobs liked to drop acid when he was young, that countless Vietnam vets easily kicked their heroin habit upon coming home, and, as a US federally sponsored study found, that psychedelic mushrooms can bring a lasting and positive personality change in more than half of those who take them.

Where to find good information? There are internet communities, so-called harm-reduction forums, where one can spend a few hours and discover that the truth is not black and white. Surely junkies exist, and using meth daily is not a life strategy anyone could recommend, but not all drugs were created equal. There are many classes of recreational drugs, each acting on specific chemical pathways in the body – uppers on dopamine, hallucinogens on serotonin, downers on GABA, etc.

Mapping drugs

I thought it would be nice to visualize these drug groups based on what users of harm-reduction forums say, so I analyzed around 1.2 million posts on bluelight.ru and constructed a simple diagram that tells a lot. It was constructed in such a way that drugs that are frequently mentioned together appear together. Circle radii are proportional to how frequently each drug appears in the posts. The methodology is explained at the bottom of the post.

Here’s the diagram, pan and zoom at will:

Click here to peruse a clickable, searchable version of the same diagram (give it a second to load). To download a high-resolution image (8000 x 6000), click here (black) or here (white).

The drug groups are color coded for better readability. Starting from the top:

light blue group: mostly antidepressants – SSRIs such as Prozac (fluoxetine), Zoloft and such.
violet group: mainly contains benzodiazepines such as Xanax, Valium, and Lorazepam, which are commonly abused, but there are a lot of other downers there.
orange group: opiates and opioids, such as heroin, OxyContin and the like. There were so many mentions of “opiates” without referring to a specific chemical that I considered it would be a pity to leave the word out.
dark yellow group on the right: mostly dissociatives such as ketamine and DXM, but there’s also a subgroup on the right side. It forms a larger group, mixed with differently colored drugs, that could be called the “shamanic corner”, as it mostly contains so-called entheogens and natural concoctions such as ayahuasca.
light orange group: mainly nootropics such as Piracetam. Some use them to enhance a psychedelic or MDMA experience, but they have a more general use as memory, intelligence and sensory enhancers.
red group: I don’t know what to call this, but these are “working man’s drugs”. The common drugs that we hear about in the media. Some of these drugs are not considered drugs at all, for example alcohol and tobacco, but the Bluelight discussions show that they are very common. Thinking about it, one must have something to drink while one insufflates synthetic powders, and a cigarette is also a good thing to have while waiting for something stronger to take hold.
green group: psychedelic drugs such as shrooms, LSD, DMT and mescaline, along with many newer variations and analogs, such as the 2C-X family, the DMT analogs and the whole TiHKAL inventory.
blue group: Ecstasy (MDMA) and newer stimulants and entactogens, such as methylone, mephedrone, etc. “Plant foods” and “bath salts” are in this category.

Mapping effects

Simply mapping out the drugs is nice, but an additional step seemed in order: mapping the co-occurrence of the various effects the drugs have on users. Again, posts were analyzed, but in addition to drugs, some (not all!) common effects were extracted and mapped in a network. The result is in the diagram below. Darker dots are effects, lighter ones are drugs. Size is again proportional to the number of mentions in all posts.

Click here to peruse a clickable, searchable version of the same diagram. To download a high-resolution image, click here (black) or here (white).

Note that the above diagram does not indicate semantic relationships between drugs and their effects. For example, why is “marijuana” close to “death”? Maybe there was a lot of talk about fear of death that the marijuana experience helps to resolve, or maybe people like to describe how they are dying of laughter while smoking weed. I honestly don’t know. I suspect it’s because of the close relationship between mentions (not necessarily use!) of marijuana and those of alcohol, cocaine and methamphetamine, which could have a more significant relation with death or dying.
What’s really notable is the heavy clustering of adverse effects around opiates, and the relative absence of the same around psychedelics. Based on Bluelight data, I can safely conclude that psychedelic drugs do not cause users to complain a lot, except maybe mentioning hallucinations and visuals, but, well …

Drug use over the years

My whole database contains posts from 2010 until March 2013. Here’s an analytical tool to better understand what’s going on in the recreational drug ~~market~~ community. Time is on the horizontal axis, while the proportion of posts mentioning a specific drug, relative to all posts in that month, is on the vertical axis.
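The quantity on the vertical axis can be sketched in a few lines of Python. This is an illustration only: the actual time series came from SOLR, whereas this version counts in memory, matches the drug name as a case-insensitive substring, and uses made-up posts. The function name `monthly_share` is hypothetical.

```python
from collections import Counter

def monthly_share(posts, term):
    """posts: iterable of (month, text) pairs, with month like '2013-01'.

    Returns {month: fraction of that month's posts that mention term}.
    """
    total = Counter()
    hits = Counter()
    for month, text in posts:
        total[month] += 1
        # Case-insensitive substring match (a simplification).
        if term.lower() in text.lower():
            hits[month] += 1
    return {month: hits[month] / total[month] for month in sorted(total)}

# Made-up sample data, for illustration only.
posts = [
    ("2013-01", "Tested my MDMA pill today"),
    ("2013-01", "Question about caffeine"),
    ("2013-02", "mdma and music"),
    ("2013-02", "More on MDMA dosing"),
]
share = monthly_share(posts, "MDMA")
```

Dividing by the number of posts in each month keeps a busy month from looking like a surge in interest for one particular drug.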

Play around with the interactive chart to discover emerging trends, or simply to behold the wax and wane of specific chemicals as they compete for users’ neurological apparatuses, while their manufacturers temporarily evade ever stricter analog laws:

Commentary: Bluelight is a harm reduction forum, historically established so that users could tell a good Ecstasy pill from a bad one, so MDMA is the most mentioned drug. Use of “classic” drugs doesn’t change much, but it’s interesting to note the rise of new “research chemicals” such as the NBOMe family, new cathinones (3-MMC), new synthetic cannabinoids (STS-135) and different amphetamines, predominantly methamphetamine. You can also see how newly banned drugs, for example mephedrone, go out of use, and their analogs, in this case 3-MMC, replace them.

Methodology and tools

First, all the Bluelight forums were crawled, and the contents, dates and other metadata of all posts were put into a SOLR index. That took approximately two days of not too aggressive load on their server (thanks Bluelight for not banning my IP).
To make the first two network diagrams, undirected graphs were constructed with the JGraphT library so that all extracted entities – drugs and effects – in every post were connected as nodes. Mentions of all extracted entities were counted so that dot size shows frequencies, not network degrees. That yielded complete graphs to be visualized with Gephi. Gephi files were exported to a TileMill-friendly format to render map tiles. Tiles are displayed on the site using Leaflet.
To make the interactive chart, SOLR was used to produce time series. The data was then packed into a suitable format for the Flot library to display.
To extract entities, two dictionaries were used – one for drugs, one for effects. You can download them here: drugs / effects.
If anyone is interested in the SOLR core, I can put it on Dropbox. Send me a note, my email is on the About page.
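The difference between mention counts and network degree can be shown with a small Python sketch. It is not the JGraphT code used for the diagrams. The dictionary format (surface form mapped to a canonical name), the choice to count an entity once per post, the function names and the sample posts are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary: surface form -> canonical entity name.
DICTIONARY = {"mdma": "MDMA", "ecstasy": "MDMA", "lsd": "LSD", "nausea": "nausea"}

def extract(post):
    """Return the set of canonical entities mentioned in one post."""
    words = re.findall(r"[a-z0-9-]+", post.lower())
    return {DICTIONARY[w] for w in words if w in DICTIONARY}

def build_graph(posts):
    mentions = Counter()  # posts mentioning each entity -> dot size
    edges = Counter()     # co-mention counts -> edge weight
    for post in posts:
        found = sorted(extract(post))
        mentions.update(found)
        # All entities in one post are connected pairwise (a complete graph).
        edges.update(combinations(found, 2))
    return mentions, edges

# Made-up sample data, for illustration only.
posts = [
    "Ecstasy gave me nausea",
    "MDMA again last night",
    "mdma, no issues",
    "LSD and nausea",
]
mentions, edges = build_graph(posts)
# Degree = number of distinct neighbours, computed only for comparison.
degree = Counter(n for edge in edges for n in edge)
```

In this toy data MDMA is mentioned in three posts but has only one neighbour, so sizing dots by degree would understate it. That is the reason for counting mentions separately.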

What is not here, but could be

analysis of effects that specific drugs have over time
a chart of effects only
some different visualization that could help to establish relationships between specific drugs and effects they have. For example, it’s been known for some time that mephedrone and various dragonflies have vasoconstrictive effects. Maybe some other relationship could be inferred that way.
first map should be clickable to search on Wikipedia, I’ll add that as soon as I figure out the Wax lib.

I may revisit this theme in the future.

