Interactive timeline of the PRISM scandal

Share Button

中国翻译此页

Purpose of this visualization

This is an interactive timeline of events about the Prism scandal, chronicled by selected media in online news articles, giving a summarized view of events as they unfolded. It’s intended as a parody of a NSA software to track people and analyze their metadata. It consists of these parts:

  • the chronological order of articles, visualized as a timeline,
  • a network of people, places and organizations that appear in the articles,
  • geographic information that the articles refer to, and
  • a bar graph showing wordcounts of interesting words, associated with the main theme.

In the bottom part there is a timeline displaying the published articles in chronological order. Articles are accessible by clicking on a title and then using the Go to article link in the popup bubble.

In the center background there is a rotating globe, displaying major cities referred by articles. Labels next to cities contain titles of all articles visible in timeline view that refer to specific city.

Center foreground contains a network showing interconnectedness of various entities recognized in the article text. Entities appearing in the same paragraph are connected. The network is additive, which means that the more frequently entities appear in same paragraphs, the stronger is the bond between them.

In the right corner, there is a small bar graph containing frequencies of selected words, giving an idea of Snowden’s options at the time. It’s just  a word count of shown words in visible articles, not a semantic analysis.

As the timeline is moved, new articles appear, and network is updated with new data, giving a quick overview of who and how frequently was involved in discussion, and who was related to them.

Hovering the mouse over a network node shows only the portion of the network that the node is directly connected to, making this useful for detailed exploration of relationships between those entities.

Launch the PRISM Scandal visualization!  Use Chrome if possible. It won’t work in IE.

PRISM Scandal Visualization Window
PRISM Scandal Visualization Window

Interacting with the visualization

Use the mouse to drag timeline left or right, or rotate the middle wheel for the same effect. For quicker navigation use the Quick jump menu. The subnetworks will load and unload automatically, and the whole network will try to stabilize so that it accurately reflects frequencies of terms and bond stregths between them. If it doesn’t stabilize well, click the Reorder network button or double-click in the middle of the timeline.

It’s possible to zoom the network in and out to get a better idea of shown names and connections. To do that, position the mouse pointer over the network area and drag or zoom with the wheel. Node size corresponds to term frequency in visible articles, while the bond thickness corresponds to its weight, that is to say frequency of said bond.

Click the article titles in the timeline to display more information and links. To change the publisher, click the Publisher menu and select a desired one. This will load new set of events into the timeline. To automatically move the timeline, use the Play button.

Data sources

News articles containing keywords Snowden, Prism, NSA, Wikileaks and Julian Assange were scraped from selected media and stored locally for processing. The articles themselves are linked from the timeline. Their content, apart from titles, is not accesible in this visualization for copyright issues. Geographical database with city names and corresponding latitudes and longitudes was obtained as a free download at GeoNames. The media an/or publishing houses were selected to give a balanced set of worldviews. These are, in alphabetical order:

Processing the articles

First a dictionary of all capitalized word sequences and their permutations was constructed by processing all articles in the database. This is essentially a dictionary of all people, states, cities and organizations appearing in the whole database. Then, the title and body text of each article was scanned for these dictionary entries and city names, so that an article abstract was constructed, containing of title, publishing date, a link to article, a wordcount of selected words, a subnetwork of connected entities, and a list of cities along with latitudes and longitudes.

Constructing the network

Subgraphs in an article

The article subnetwork was constructed so that entities in the same sentence (connecting in a paragraph shown in the picture) are connected with a set weight. Nodes not connected with any other nodes are dropped at this point, since their inclusion would lead to a largely unconnected network, which is visually unappealing and cumbersome to navigate.

All web scraping and text processing was done in Java locally, there were around 10,000 articles processed in the latest count. See picture below.

 

prism-solr-admin

There does not exist a live server database that this visualization would query.  The entity dictionaries are here (names) and here (selected words).

Constructing the visualization

The visualization was constructed entirely in HTML5 and JavaScript. Four major libraries were used:

  • Sigma.js for displaying the networks. The latest version does not contain some key functionality for dynamically and additively loading and unloading of subgraphs into the main graph, so the source code was updated with required methods. Separate article on that topic is upcoming.
  • Three.js for rotating Earth and all geographically-related work.
  • Simile Timeline for the timeline.
  • Flot for the bar graph.

If anyone is interested in Java code for web scraping / networking / constructing timeline input files, drop me a note. My email is on the About page.

Enhanced by Zemanta