How to publish Gephi graphs on MapBox with TileMill

This is a simple how-to on publishing Gephi graphs as a tile-based, zoomable map suitable for online presentation. It’s intended for Gephi users who find other solutions lacking, or who would simply like to learn to publish on a free, cloud-based service.

If you have a Gephi graph and would like to publish it, you can export the node and edge data in one of the standard formats and use it to render tiles for Leaflet, Google Maps or another suitable API. This requires a server to store the tiles and some knowledge of how to render them and use the API.

There’s another possibility: you can use TileMill to render the tiles and the free online service MapBox to host and display them. So here’s how to do it. I used this procedure to render the map here.

Exporting the graph data to TileMill

There are several ways to do that. This one is relatively easy, but it requires some programming knowledge, as you have to use the Gephi Toolkit.

First, make the graph. You can use Gephi or the Gephi Toolkit. I use Gephi, since it lets me visually inspect the graph, run additional corrective layout algorithms and so on. Save the graph as a .gephi file.

[Image: the graph in Gephi]

The graph is already spatialized and analyzed for modularity classes, so the node information in the .gephi file contains at least coordinates (x, y), modularity classes, labels and sizes. TileMill can import CSV, so that is the format we are going to export using the Gephi Toolkit.

Here is the Java code:

// Package names as in Gephi Toolkit 0.8.x and OpenCSV 2.x; adjust them for your versions.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.gephi.data.attributes.api.AttributeController;
import org.gephi.data.attributes.api.AttributeModel;
import org.gephi.graph.api.DirectedGraph;
import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.gephi.graph.api.Node;
import org.gephi.io.exporter.api.ExportController;
import org.gephi.io.importer.api.ImportController;
import org.gephi.project.api.ProjectController;
import org.gephi.project.api.Workspace;
import org.gephi.statistics.plugin.Modularity;
import org.openide.util.Lookup;

import au.com.bytecode.opencsv.CSVWriter;

public class RGraph {
    private static final String root = "C:\\Users\\solipsy\\Documents\\!Data\\Gephi\\";

    public void openGephi(File file) {
        // Init a project, and therefore a workspace
        ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
        if (pc.getCurrentProject() != null) {
            pc.closeCurrentProject();
        }
        pc.openProject(file).run();
        Workspace workspace = pc.getCurrentWorkspace();

        ImportController importController = Lookup.getDefault().lookup(ImportController.class);
        GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getModel();
        AttributeModel attributeModel = Lookup.getDefault().lookup(AttributeController.class).getModel();

        // Check that the graph loaded correctly
        DirectedGraph graph = graphModel.getDirectedGraph();
        System.out.println("Nodes: " + graph.getNodeCount());
        System.out.println("Edges: " + graph.getEdgeCount());

        // Export node data to a tab-separated CSV file
        try {
            CSVWriter writer = new CSVWriter(new FileWriter(root + file.getName() + ".csv"), '\t');
            String[] header = "Latitude#Longitude#Modularity#Size#Label".split("#");
            writer.writeNext(header);
            for (Node n : graphModel.getGraph().getNodes().toArray()) {
                // Drop outliers more than 3500 units from the origin
                if (Math.hypot(n.getNodeData().x(), n.getNodeData().y()) > 3500) {
                    graph.removeNode(n);
                    continue; // skip the row too, so the CSV matches the graph
                }
                String[] entry = new String[5];
                entry[0] = String.valueOf(n.getNodeData().x());
                entry[1] = String.valueOf(n.getNodeData().y());
                entry[2] = String.valueOf(n.getAttributes().getValue(Modularity.MODULARITY_CLASS));
                entry[3] = String.valueOf(n.getNodeData().getRadius());
                entry[4] = String.valueOf(n.getNodeData().getLabel());
                writer.writeNext(entry);
                System.out.println("modclass: " + n.getAttributes().getValue(Modularity.MODULARITY_CLASS) +
                        "\tx: " + n.getNodeData().x() +
                        "\ty: " + n.getNodeData().y() +
                        "\tsize: " + n.getNodeData().getRadius() +
                        "\tlabel: " + n.getNodeData().getLabel());
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Export a PNG preview of the graph
        ExportController ec = Lookup.getDefault().lookup(ExportController.class);
        try {
            ec.exportFile(new File(root + file.getName() + System.currentTimeMillis() + ".png"));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

You’ll need to add Gephi Toolkit and OpenCSV to your project. Incorporate the export class in your project and call the “openGephi” method. Adjust the paths to reflect the directory structure on your computer. The method should produce a tab-separated CSV file with the following columns: Latitude (x), Longitude (y), Modularity, Size and Label. Open the file in Notepad and strip out the double quotes. Now it’s ready to import into TileMill.
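If you’d rather not clean the file by hand, the quote-stripping step can be scripted. The following is only a minimal sketch: the class and method names are made up for the example, and it assumes the file was written with the platform default encoding (which is what FileWriter uses). Like the Notepad approach, it also removes any quotes that are part of a label.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class QuoteStripper {
    /** Removes every double quote from one line of the exported file. */
    static String stripQuotes(String line) {
        return line.replace("\"", "");
    }

    /** Reads the exported file and writes a copy without quotes. */
    static void stripFile(Path in, Path out) throws IOException {
        // FileWriter wrote the file in the platform default charset, so read it the same way.
        Charset cs = Charset.defaultCharset();
        List<String> cleaned = new ArrayList<>();
        for (String line : Files.readAllLines(in, cs)) {
            cleaned.add(stripQuotes(line));
        }
        Files.write(out, cleaned, cs);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java QuoteStripper <exported.csv> <cleaned.csv>
        stripFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```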

Importing data into TileMill and setting up a project

[Image: the graph in TileMill]

This is not what you’ll get when you import the data. It’s what a finished project looks like. First, open TileMill and add a new layer:

[Image: adding a layer in TileMill]

Give it a name and choose “900913” (Google) in the SRS dropdown. That’ll place your graph right in the center of the map. You’ll notice that it’s just a tiny dot at the first zoom level. Zoom in until you can clearly see the distinct dots. Zoom some more to decide on the zoom bracket for your map, then set it using the slider. For the map above, I used zooms from 14 to 19. You should really use this option, or else the map will be huge, take up thousands of GB and render for a year. You should also mark the part of the map you want to export later: shift-drag around your graph to select the smallest possible area.
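A rough way to see why the zoom bracket and the selected area matter is to count tiles. The sketch below is only an illustration, not something TileMill provides: it assumes the graph’s x and y values are read as Web Mercator meters and that the graph fits within 3500 units of the origin (the cutoff used in the export code above). The class and method names are made up for the example.

```java
class TileEstimate {
    // Half the width of the Web Mercator (EPSG:900913) world, in meters.
    static final double HALF_WORLD = 20037508.342789244;

    /** Index of the tile column (or row) that contains coordinate v at the given zoom. */
    static long tileIndex(double v, int zoom) {
        double tileWidth = 2 * HALF_WORLD / (1L << zoom);
        return (long) Math.floor((v + HALF_WORLD) / tileWidth);
    }

    /** Number of tiles along one axis needed to cover the range min..max. */
    static long tilesPerSide(double min, double max, int zoom) {
        return tileIndex(max, zoom) - tileIndex(min, zoom) + 1;
    }

    /** Total tiles for a square extent over a range of zoom levels. */
    static long totalTiles(double min, double max, int minZoom, int maxZoom) {
        long total = 0;
        for (int z = minZoom; z <= maxZoom; z++) {
            long side = tilesPerSide(min, max, z);
            total += side * side;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int z = 14; z <= 19; z++) {
            long side = tilesPerSide(-3500, 3500, z);
            System.out.println("zoom " + z + ": " + side + " x " + side + " tiles");
        }
        System.out.println("total: " + totalTiles(-3500, 3500, 14, 19));
    }
}
```

Each extra zoom level roughly quadruples the number of tiles for the same area, so both a tight selection and a sensible maximum zoom keep the export manageable.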

[Image: TileMill project settings]

The metatile setting is important, but leave it at 1 for now. It’s used to prevent marker and label clipping at closer zoom levels. A larger value means less clipping, but also a less responsive map during editing.

Now it’s time to style your map so the nodes and labels are displayed in correct sizes and colors.

Styling the map

TileMill uses something called CartoCSS for styling labels, lines, markers, etc. It’s a simple conditional CSS. You can adjust values for each zoom level, and that’s what we are going to do. We’ll use markers to display the nodes, and set marker sizes so that they reflect the values in the “Size” column of your CSV file.

[Image: TileMill style sheet]

We’ll have to set the marker size to read the data in the column. This is for the biggest zoom level. Marker sizes halve with each lower zoom level, so for zoom levels 19 and 18 the marker size is specified like this:

[zoom = 19] {
marker-width: [Size] * 8;

}

[zoom = 18] {
marker-width: [Size] * 4;

}

You can guess the rest: just keep halving the marker size. Unfortunately, it’s impossible to do something like that for labels, so we have to generate a list of node size brackets and corresponding text sizes for each zoom level separately. I use Excel to do this, but maybe it’d be better to just write another method to generate all that during export. So, for zoom 19 we have:

[zoom = 19] {
marker-width: [Size] * 8;
[Size >0][Size <= 5] {text-size:8 }
[Size >5][Size <= 10] {text-size:13 }
[Size >10][Size <= 20] {text-size:20 }
[Size >20][Size <= 30] {text-size:40 }
[Size >30][Size <= 40] {text-size:60 }
[Size >40][Size <= 50] {text-size:80 }
[Size >50][Size <= 60] {text-size:100 }
[Size >60][Size <= 70] {text-size:120 }
[Size >70][Size <= 80] {text-size:140 }
[Size >80][Size <= 90] {text-size:160 }
[Size >90][Size <= 100] {text-size:180 }
[Size >100][Size <= 110] {text-size:200 }
[Size >110][Size <= 120] {text-size:220 }
[Size >120][Size <= 130] {text-size:240 }
[Size >130][Size <= 140] {text-size:260 }
[Size >140][Size <= 150] {text-size:280 }
[Size >150][Size <= 160] {text-size:300 }
[Size >160][Size <= 170] {text-size:320 }
[Size >170][Size <= 180] {text-size:340 }
[Size >180][Size <= 190] {text-size:360 }
[Size >190][Size <= 200] {text-size:380 }
[Size >200][Size <= 210] {text-size:400 }
[Size >210][Size <= 220] {text-size:420 }
[Size >220][Size <= 230] {text-size:440 }
[Size >230][Size <= 240] {text-size:460 }
[Size >240][Size <= 250] {text-size:480 }
[Size >250][Size <= 260] {text-size:500 }
[Size >260][Size <= 270] {text-size:520 }
[Size >270][Size <= 280] {text-size:540 }
[Size >280][Size <= 290] {text-size:560 }
[Size >290][Size <= 300] {text-size:580 }
[Size >300][Size <= 310] {text-size:600 }
}

and for 18:

[zoom=18] {
marker-width: [Size] * 4;
[Size >0][Size <= 5] {text-size:2 }
[Size >5][Size <= 10] {text-size:5 }
[Size >10][Size <= 20] {text-size:10 }
[Size >20][Size <= 30] {text-size:20 }
[Size >30][Size <= 40] {text-size:30 }
[Size >40][Size <= 50] {text-size:40 }
[Size >50][Size <= 60] {text-size:50 }
[Size >60][Size <= 70] {text-size:60 }
[Size >70][Size <= 80] {text-size:70 }
[Size >80][Size <= 90] {text-size:80 }
[Size >90][Size <= 100] {text-size:90 }
[Size >100][Size <= 110] {text-size:100 }
[Size >110][Size <= 120] {text-size:110 }
[Size >120][Size <= 130] {text-size:120 }
[Size >130][Size <= 140] {text-size:130 }
[Size >140][Size <= 150] {text-size:140 }
[Size >150][Size <= 160] {text-size:150 }
[Size >160][Size <= 170] {text-size:160 }
[Size >170][Size <= 180] {text-size:170 }
[Size >180][Size <= 190] {text-size:180 }
[Size >190][Size <= 200] {text-size:190 }
[Size >200][Size <= 210] {text-size:200 }
[Size >210][Size <= 220] {text-size:210 }
[Size >220][Size <= 230] {text-size:220 }
[Size >230][Size <= 240] {text-size:230 }
[Size >240][Size <= 250] {text-size:240 }
[Size >250][Size <= 260] {text-size:250 }
[Size >260][Size <= 270] {text-size:260 }
[Size >270][Size <= 280] {text-size:270 }
[Size >280][Size <= 290] {text-size:280 }
[Size >290][Size <= 300] {text-size:290 }
[Size >300][Size <= 310] {text-size:300 }
}

Then just continue dividing until you reach your last zoom level. It’s important not to use too many intervals in a zoom level, or TileMill will crash, at least on Windows.
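The bracket lists above could also be generated instead of built in a spreadsheet. Here is a minimal sketch of such a generator. It is not the method I actually used: the class and method names are invented for the example, and it simply halves every text size per zoom level, so the two smallest brackets come out slightly different from the hand-tuned values above.

```java
import java.util.ArrayList;
import java.util.List;

class CartoCssBrackets {
    /** Bracket boundaries: 0, 5, 10, then steps of 10 up to maxSize. */
    static List<Integer> bounds(int maxSize) {
        List<Integer> b = new ArrayList<>();
        b.add(0);
        b.add(5);
        for (int v = 10; v <= maxSize; v += 10) {
            b.add(v);
        }
        return b;
    }

    /** Prints whole numbers without a trailing ".0". */
    static String fmt(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    /** Builds the CartoCSS block for one zoom level; maxZoom is the closest zoom of the map. */
    static String zoomBlock(int zoom, int maxZoom, int maxSize) {
        double scale = 1.0 / (1 << (maxZoom - zoom)); // halve once per level below maxZoom
        StringBuilder sb = new StringBuilder();
        sb.append("[zoom = ").append(zoom).append("] {\n");
        sb.append("marker-width: [Size] * ").append(fmt(8 * scale)).append(";\n");
        List<Integer> b = bounds(maxSize);
        for (int i = 0; i + 1 < b.size(); i++) {
            int lo = b.get(i);
            int hi = b.get(i + 1);
            // Text size at the closest zoom: 8 and 13 for the two smallest brackets, else 2 * lo.
            int base = lo == 0 ? 8 : lo == 5 ? 13 : 2 * lo;
            long text = Math.max(1, Math.round(base * scale));
            sb.append("[Size >").append(lo).append("][Size <= ").append(hi)
              .append("] {text-size:").append(text).append(" }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Zoom range 14 to 19 and sizes up to 310, as in the map described here.
        for (int z = 19; z >= 14; z--) {
            System.out.println(zoomBlock(z, 19, 310));
        }
    }
}
```

Paste the output into the style sheet and trim the brackets you don’t need, keeping in mind the warning about too many intervals.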

Now set up the colors. If you used modularity, look up the numbers for the modularity classes in Gephi and use them in the CSS. In my example, I use the Modularity column to determine node colors:

[Modularity = 43]  {marker-fill:#0000FF;}
[Modularity = 72]  {marker-fill:#008B00;}
[Modularity = 5]  {marker-fill:#EEB422;}
[Modularity = 3]  {marker-fill:#8E388E;}
[Modularity = 9]  {marker-fill:#FF1493;}
[Modularity = 0]  {marker-fill:#CD0000;}
[Modularity = 25]  {marker-fill:#388E8E;}
[Modularity = 38]  {marker-fill:#8E8E38;}
[Modularity = 6]  {marker-fill:#1E90FF;}
[Modularity = 10]  {marker-fill:#000080;}
[Modularity = 7]  {marker-fill:#00EE00;}
[Modularity = 21]  {marker-fill:#B8860B;}

Before exporting, don’t forget to set the metatile size to at least 7 to prevent clipping.

Exporting to MapBox

Create an account on MapBox. It’s free for up to 50 MB of data and 5000 views in a rolling month.

In TileMill, select “Export / MBTiles”. Name your map, review the settings and click “Export” if you want to save the tiles to disk, or “Upload” if you want them to appear immediately in your MapBox account. Then make your map visible and share!

Exploring Hollywood values through IMDB genres and tags

A typical Hollywood story always portrays life in a twisted way. Movies are infused with values. There are typical stories: justice always prevails in the end, even if it means the death of a good guy; the coming-of-age story, in which the hero becomes a man; the revenge story, in which the hero is wronged in the beginning and must regain his life and justice in the course of the film. In American movies, family values are all-important, and so on.

These values are interrelated in the movie world, but what is their importance relative to other values? Is war a good or a bad thing, as portrayed in the movies? Is friendship close to romance, and is marriage close to love? What is science fiction – action, adventure or fantasy?

There happens to be a treasure trove of useful information on IMDB to visualize these relations. Each movie belongs to one or more genres, and on every movie page, there are tags for themes that occur in it. One could construct a network of movies that are interrelated through genres and tags they share. If two films share a tag, they must be closer than films that don’t share it. But there are many tags and over ten genres, so how does it look?

It looks like this (click image to launch interactive page):

 
[Image: network graph]

Roughly 15,000 movies, as presented on IMDB. Full map in a bigger window. There’s also a post showing how a social network of actors evolves over time from 1960 to 2013.

If a circle is bigger, it means it has more connections (movies associated with it). For example, the “Drama” circle seems to be the biggest, because apparently a large share of the movies are dramas.
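The circle sizes described above amount to a degree count in a network that links movies to their genres and tags. Here is a minimal sketch of that count. It is only an illustration of the idea, not the code behind the map, and the movie names and tags in it are placeholders rather than real IMDB data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TagNetwork {
    /** Counts, for every genre or tag, how many movies carry it (the node's degree). */
    static Map<String, Integer> degrees(Map<String, List<String>> movieTags) {
        Map<String, Integer> degree = new TreeMap<>();
        for (List<String> tags : movieTags.values()) {
            // A set makes sure a tag listed twice for one movie is counted once.
            for (String tag : new HashSet<>(tags)) {
                degree.merge(tag, 1, Integer::sum);
            }
        }
        return degree;
    }

    public static void main(String[] args) {
        // Placeholder data: each movie maps to its genres and tags.
        Map<String, List<String>> movies = new LinkedHashMap<>();
        movies.put("Movie A", Arrays.asList("Drama", "Love"));
        movies.put("Movie B", Arrays.asList("Drama", "Friendship"));
        movies.put("Movie C", Arrays.asList("Adventure"));
        System.out.println(degrees(movies));
    }
}
```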

Same-colored circles belong to common categories, so for example “Drama”, “Romance”, “Love”, “Friendship”, “Marriage” and surprisingly “History” and “Biography” belong to the same group. Romance and drama are actually genres, and “love”, “history” and “biography” are tags. If you zoom in, you can see the movies associated with each tag and category.

It seems that most of Hollywood romance takes place in New York City, and that there’s a lot of sex going on there at the same time. There is some friendship involved, but not much. It’s interesting that marriage is on the opposite side of romance in relation to sex. It also seems that there is a lot of romantic activity on the set, as actors and actresses are closely related to it. This must be an artifact of Hollywood self-reflection.

On the other hand, California, as represented in movies, seems much more family-oriented. There are a lot of boys, children, girls and babies around it. There are also a lot of dreams and female nudity.

It’s also fun to construct sentences containing words of closely positioned tags. Drugs and money lead to suicide? Death by doctor in a hospital? Murder someone, get apprehended by police and go to prison?

It’s also apparent that sci-fi is nothing but a sub-genre of adventure. I always thought there was more brains to it. And fantasy seems nothing more than adventure for family audiences.

Have fun browsing the map, and let me know if you discover more fun facts.

Recent experiments with Gephi led me to speculate that it’s possible to extract meaning from a large volume of data with network analysis. This is the first post on this blog, and also the first in a series dealing with data and visualization.

If you are interested in making your own diagrams like that, here’s a how-to.

Edit: after being mentioned in Canadian Business magazine (thanks Matthew McClearn!), I should maybe add an explanation of my interpretation above. I’m actually interpreting relationships as portrayed in the movies, as inserted in the IMDB database. So what I wrote was actually an interpretation of an aggregation of simplifications of interpretations. I still think that my methodology is sound – after all, the diagram looks OK, and actually makes sense. It’s just curious to interpret.