gephi toolkit Archives

by virostatiq - May 16, 2013 (updated April 11, 2014)

Data-driven drinking: ingredients and brands in the cocktail shaker

A better title for this post would probably be “What kinds of booze to drink together to get drunk in style, according to those who write, compile, publish, test and enjoy cocktail recipes”. Continuing from previous posts, I wanted to see what a network of the ingredients of all possible cocktail recipes looks like, and whether it’s possible to divide them into sensible groups that would be instantly recognizable and even helpful to experienced and casual drinkers alike.

To do this, I scraped more than 25,000 recipes from Drinksmixer.com and Drinksnation.com, constructed a network with Gephi and visualized it below. Dot size reflects the number of times that ingredient appears in all analyzed recipes. Dots of the same color frequently appear together in recipes. One could say that you can hardly go wrong if you combine three ingredients of the same color and drink the concoction.
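The post doesn’t spell out exactly how the ingredient network was built from the scraped recipes. One common way to do it, sketched here purely as an assumption, is a co-occurrence network: each ingredient is a node weighted by the number of recipes it appears in, and two ingredients are linked with a weight equal to the number of recipes they share. The class name and sample recipes are made up; the output is an edge list with Source, Target and Weight columns, which Gephi’s spreadsheet (CSV) import recognizes for edge tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: builds an ingredient co-occurrence network from recipes.
public class CooccurrenceNetwork {

    // Node weight: number of recipes each ingredient appears in (drives dot size).
    final Map<String, Integer> nodeCounts = new TreeMap<String, Integer>();
    // Edge weight: number of recipes two ingredients share.
    // Key is "a\tb" with a < b, so each undirected pair is stored once.
    final Map<String, Integer> edgeCounts = new TreeMap<String, Integer>();

    void addRecipe(List<String> ingredients) {
        // De-duplicate and sort so each pair key is canonical.
        List<String> items = new ArrayList<String>(new TreeSet<String>(ingredients));
        for (int i = 0; i < items.size(); i++) {
            increment(nodeCounts, items.get(i));
            for (int j = i + 1; j < items.size(); j++) {
                increment(edgeCounts, items.get(i) + "\t" + items.get(j));
            }
        }
    }

    private static void increment(Map<String, Integer> m, String key) {
        Integer c = m.get(key);
        m.put(key, c == null ? 1 : c + 1);
    }

    // Edge list in CSV form with Source, Target and Weight columns.
    String edgeCsv() {
        StringBuilder sb = new StringBuilder("Source,Target,Weight\n");
        for (Map.Entry<String, Integer> e : edgeCounts.entrySet()) {
            String[] pair = e.getKey().split("\t");
            sb.append(pair[0]).append(',').append(pair[1]).append(',')
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CooccurrenceNetwork net = new CooccurrenceNetwork();
        // Made-up recipes, for illustration only.
        net.addRecipe(Arrays.asList("vodka", "orange juice", "ice"));
        net.addRecipe(Arrays.asList("vodka", "cranberry juice", "ice"));
        net.addRecipe(Arrays.asList("gin", "tonic water", "ice"));
        System.out.println(net.nodeCounts);
        System.out.print(net.edgeCsv());
    }
}
```

Ingredient names that contain commas would need quoting before the edge list is imported.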

The map below is interactive: try panning and zooming with the mouse, or use the controls in the upper left-hand corner.

I see five major groups of ingredients, but your alcohol proof may vary. Actually, I suspected something like this:

ice is in its own group, which for some reason also contains tequila,
milky drinks are in their own group (gray-blue),
salty and spicy drinks are also in an easily recognizable group (pink),
blue group is dominated by vodka and rum,
green group mostly has gin and tangy juices, and
red group mostly contains fruit schnapps and liqueurs.

You can download hi-res static images here: black background | white background.

For a more mobile-friendly, searchable map with advanced interactivity, click here (Sigma.js). Clicking on an ingredient on this map will show a list of all connected ingredients. Clicking on an element in the list will show a subgraph.

Most recipes also named preferred brands of spirits and fruit juices, so I constructed another diagram. It shows which brands are usually combined in drinks.

Here is the interactive map:

Download hi-res static images here: black background | white background.
For a searchable map with advanced interactivity, click here. Clicking on a brand on this map will show a list of all connected brands.

I find it funny that Everclear, Kool-Aid and Mountain Dew are so close. Does that mean that people just pour near-pure ethanol and caffeinated water into a jug and drink it? Possibly.

Coming up next: data-driven cooking.

by virostatiq - March 31, 2013 (updated April 11, 2014)

How to publish Gephi graphs on MapBox with TileMill

This is a simple how-to on publishing Gephi graphs as a tile-based, zoomable map suitable for online presentation. It’s intended for Gephi users who find other solutions lacking, or who would simply like to learn how to publish in a free, cloud-based service.

If you have a Gephi graph and would like to publish it, you can export the node and edge data in one of the standard formats and use it to render tiles for Leaflet, Google Maps or other suitable APIs. This requires a server to store the tiles and some knowledge of how to render them and use the API.

There’s another possibility: you can use TileMill to render the tiles and the free online service MapBox to host and display them. Here’s how to do it. I used this procedure to render the map here.

Exporting the graph data to TileMill

There are several ways to do that. This one is relatively easy, but it requires some programming knowledge, since you have to use the Gephi Toolkit.

First, make the graph. You can use Gephi or the Gephi Toolkit. I use Gephi, since it lets me inspect the graph visually, run additional corrective layout algorithms and so on. Save the graph as a .gephi file.

At this point the graph is spatialized and analyzed for modularity classes, so the node information in the .gephi file contains at least the coordinates (x, y), modularity classes, labels and sizes. TileMill can import CSV, so we’ll use the Gephi Toolkit to export the node data to a CSV file.

Here is the Java code:

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.gephi.graph.api.DirectedGraph;
import org.gephi.graph.api.GraphController;
import org.gephi.graph.api.GraphModel;
import org.gephi.graph.api.Node;
import org.gephi.io.exporter.api.ExportController;
import org.gephi.project.api.ProjectController;
import org.gephi.statistics.plugin.Modularity;
import org.openide.util.Lookup;

import au.com.bytecode.opencsv.CSVWriter;

public class RGraph {
    // Output directory: change it to match the directory structure on your computer
    private static final String root = "C:\\Users\\solipsy\\Documents\\!Data\\Gephi\\";

    public void openGephi(File file) {
        // Open the saved .gephi project (this also creates its workspace)
        ProjectController pc = Lookup.getDefault().lookup(ProjectController.class);
        if (pc.getCurrentProject() != null) {
            pc.closeCurrentProject();
        }
        pc.openProject(file).run();

        GraphModel graphModel = Lookup.getDefault().lookup(GraphController.class).getModel();

        // Check that the graph was loaded
        DirectedGraph graph = graphModel.getDirectedGraph();
        System.out.println("Nodes: " + graph.getNodeCount());
        System.out.println("Edges: " + graph.getEdgeCount());

        // Export node positions, modularity classes, sizes and labels as tab-separated values
        try {
            CSVWriter writer = new CSVWriter(new FileWriter(root + file.getName() + ".csv"), '\t');
            String[] header = "Latitude#Longitude#Modularity#Size#Label".split("#");
            writer.writeNext(header);
            // toArray() returns a copy, so removing nodes inside the loop is safe
            for (Node n : graph.getNodes().toArray()) {
                float x = n.getNodeData().x();
                float y = n.getNodeData().y();
                // Drop outliers farther than 3500 units from the origin and leave them out of the CSV
                if (Math.sqrt(x * x + y * y) > 3500) {
                    graph.removeNode(n);
                    continue;
                }
                Object modClass = n.getAttributes().getValue(Modularity.MODULARITY_CLASS);
                String[] entry = new String[5];
                entry[0] = String.valueOf(x);
                entry[1] = String.valueOf(y);
                entry[2] = String.valueOf(modClass);
                entry[3] = String.valueOf(n.getNodeData().getRadius());
                entry[4] = String.valueOf(n.getNodeData().getLabel());
                writer.writeNext(entry);
                System.out.println("modclass: " + modClass
                        + "\tx: " + x
                        + "\ty: " + y
                        + "\tsize: " + n.getNodeData().getRadius()
                        + "\tlabel: " + n.getNodeData().getLabel());
            }
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Also export a PNG image of the (filtered) graph; the exporter is chosen by file extension
        ExportController ec = Lookup.getDefault().lookup(ExportController.class);
        try {
            ec.exportFile(new File(root + file.getName() + System.currentTimeMillis() + ".png"));
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}

You’ll need the Gephi Toolkit and OpenCSV libraries in your project. Add the export class to your project and call the “openGephi” method with your .gephi file. Correct the paths to reflect the directory structure on your computer. The method should produce a tab-separated file (despite the .csv extension) with the following columns: latitude (x), longitude (y), modularity class, size and label. Open the file in Notepad and remove the quotes. Now it’s ready to import into TileMill.
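The quote removal can also be scripted instead of done by hand. This minimal sketch (the class name StripQuotes is hypothetical) reads the exported file, deletes every double-quote character and writes the result to a new file. It uses the platform’s default charset, because the FileWriter in the export code writes with that charset. Alternatively, if your OpenCSV version provides the CSVWriter(Writer, char, char) constructor, passing CSVWriter.NO_QUOTE_CHARACTER as the quote character should avoid the quotes in the first place.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class StripQuotes {

    // Removes every double quote from each line, like the manual Notepad replace.
    // Caveat: quotes that are genuinely part of a label are removed too.
    static List<String> strip(List<String> lines) {
        List<String> out = new ArrayList<String>(lines.size());
        for (String line : lines) {
            out.add(line.replace("\"", ""));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripQuotes input.csv output.csv
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        // FileWriter writes with the default charset, so read and write with it too.
        Charset cs = Charset.defaultCharset();
        List<String> lines = Files.readAllLines(in, cs);
        Files.write(out, strip(lines), cs);
    }
}
```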

Importing data into TileMill and setting up a project

This is not what you’ll get when you import the data; it’s what the finished project looks like. First, open TileMill and add a new layer:

Give it a name and choose “900913” (Google) in the SRS dropdown. That’ll place your graph right in the center of the map. You’ll notice that it’s just a tiny dot at the first zoom level. Zoom in until you can clearly see the distinct dots. Zoom in some more to decide on the zoom range for your map, then set it using the slider. For the map above, I used zoom levels 14 to 19. You should really set this range, or else the map will be huge, amount to thousands of GBs of data and take a year to render. You should also mark the part of the map you want to export: Shift-drag around your graph to select the smallest possible area.
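To get a feel for why the zoom range matters, here is a back-of-the-envelope sketch. It assumes that, with the layer SRS set to 900913, TileMill treats the exported coordinates as spherical Mercator meters, that tiles are the standard 256 by 256 pixels, and that the graph fits in a 7000 by 7000 box around the origin (the export code drops nodes farther than 3500 units from it). The class name TileBudget is made up.

```java
public class TileBudget {

    // Half the width of the EPSG:900913 (spherical Mercator) world, in meters.
    static final double ORIGIN_SHIFT = 20037508.342789244;
    static final int TILE_SIZE = 256;

    // Meters per pixel at a zoom level, measured at the equator,
    // which is where a graph centered on 0,0 ends up.
    static double metersPerPixel(int zoom) {
        return 2 * ORIGIN_SHIFT / (TILE_SIZE * Math.pow(2, zoom));
    }

    // Tile column (or row) index of a coordinate along one axis, clamped to the grid.
    static long tileIndex(double meters, int zoom) {
        long n = 1L << zoom;
        long i = (long) Math.floor((meters + ORIGIN_SHIFT) / (2 * ORIGIN_SHIFT) * n);
        return Math.max(0, Math.min(n - 1, i));
    }

    // Number of tiles covering the bounding box at one zoom level.
    // Counting rows from the bottom or the top gives the same number.
    static long tilesAt(int zoom, double minX, double minY, double maxX, double maxY) {
        long cols = tileIndex(maxX, zoom) - tileIndex(minX, zoom) + 1;
        long rows = tileIndex(maxY, zoom) - tileIndex(minY, zoom) + 1;
        return cols * rows;
    }

    public static void main(String[] args) {
        // Assumed extent: a 7000 x 7000 box around the origin.
        double min = -3500, max = 3500;
        long total = 0;
        for (int zoom = 14; zoom <= 19; zoom++) {
            long tiles = tilesAt(zoom, min, min, max, max);
            total += tiles;
            System.out.printf("zoom %d: graph is about %.0f px wide, %d tiles%n",
                    zoom, (max - min) / metersPerPixel(zoom), tiles);
        }
        System.out.println("total tiles for zooms 14-19: " + total);
    }
}
```

Each additional zoom level roughly quadruples the number of tiles covering the same area, so one more level at the top of the range costs more than all the levels below it combined.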

The metatile setting is important, but leave it at 1 for now. It’s used to prevent marker and label clipping at closer zoom levels. A larger value means less clipping, but also a less responsive map while editing.

Now it’s time to style your map so the nodes and labels are displayed in correct sizes and colors.

Styling the map

TileMill uses CartoCSS for styling labels, lines, markers, etc. It’s a simple CSS-like language with conditional selectors, so you can adjust values for each zoom level, and that’s what we are going to do. We’ll use markers to display the nodes and set the marker sizes so that they reflect the values in the “Size” column of your CSV file.

The marker width is calculated from the value in the Size column. Zoom 19 is the closest zoom level; on each level below it the marker size is halved, so for zoom levels 19 and 18 the marker width is specified like this:

[zoom = 19] {
marker-width: [Size] * 8;

}

[zoom = 18] {
marker-width: [Size] * 4;

}

You can guess the rest: it’s just halving the marker size again for each lower level. Unfortunately, it’s impossible to do something like that for labels, so we have to generate a list of node size brackets and corresponding text sizes for each zoom level separately. I use Excel to do this, but it would probably be better to write another method that generates all of it during export. So, for zoom 19 we have:

[zoom = 19] {
marker-width: [Size] * 8;
[Size >0][Size <= 5] {text-size:8 }
[Size >5][Size <= 10] {text-size:13 }
[Size >10][Size <= 20] {text-size:20 }
[Size >20][Size <= 30] {text-size:40 }
[Size >30][Size <= 40] {text-size:60 }
[Size >40][Size <= 50] {text-size:80 }
[Size >50][Size <= 60] {text-size:100 }
[Size >60][Size <= 70] {text-size:120 }
[Size >70][Size <= 80] {text-size:140 }
[Size >80][Size <= 90] {text-size:160 }
[Size >90][Size <= 100] {text-size:180 }
[Size >100][Size <= 110] {text-size:200 }
[Size >110][Size <= 120] {text-size:220 }
[Size >120][Size <= 130] {text-size:240 }
[Size >130][Size <= 140] {text-size:260 }
[Size >140][Size <= 150] {text-size:280 }
[Size >150][Size <= 160] {text-size:300 }
[Size >160][Size <= 170] {text-size:320 }
[Size >170][Size <= 180] {text-size:340 }
[Size >180][Size <= 190] {text-size:360 }
[Size >190][Size <= 200] {text-size:380 }
[Size >200][Size <= 210] {text-size:400 }
[Size >210][Size <= 220] {text-size:420 }
[Size >220][Size <= 230] {text-size:440 }
[Size >230][Size <= 240] {text-size:460 }
[Size >240][Size <= 250] {text-size:480 }
[Size >250][Size <= 260] {text-size:500 }
[Size >260][Size <= 270] {text-size:520 }
[Size >270][Size <= 280] {text-size:540 }
[Size >280][Size <= 290] {text-size:560 }
[Size >290][Size <= 300] {text-size:580 }
[Size >300][Size <= 310] {text-size:600 }
}

and for 18:

[zoom=18] {
marker-width: [Size] * 4;
[Size >0][Size <= 5] {text-size:2 }
[Size >5][Size <= 10] {text-size:5 }
[Size >10][Size <= 20] {text-size:10 }
[Size >20][Size <= 30] {text-size:20 }
[Size >30][Size <= 40] {text-size:30 }
[Size >40][Size <= 50] {text-size:40 }
[Size >50][Size <= 60] {text-size:50 }
[Size >60][Size <= 70] {text-size:60 }
[Size >70][Size <= 80] {text-size:70 }
[Size >80][Size <= 90] {text-size:80 }
[Size >90][Size <= 100] {text-size:90 }
[Size >100][Size <= 110] {text-size:100 }
[Size >110][Size <= 120] {text-size:110 }
[Size >120][Size <= 130] {text-size:120 }
[Size >130][Size <= 140] {text-size:130 }
[Size >140][Size <= 150] {text-size:140 }
[Size >150][Size <= 160] {text-size:150 }
[Size >160][Size <= 170] {text-size:160 }
[Size >170][Size <= 180] {text-size:170 }
[Size >180][Size <= 190] {text-size:180 }
[Size >190][Size <= 200] {text-size:190 }
[Size >200][Size <= 210] {text-size:200 }
[Size >210][Size <= 220] {text-size:210 }
[Size >220][Size <= 230] {text-size:220 }
[Size >230][Size <= 240] {text-size:230 }
[Size >240][Size <= 250] {text-size:240 }
[Size >250][Size <= 260] {text-size:250 }
[Size >260][Size <= 270] {text-size:260 }
[Size >270][Size <= 280] {text-size:270 }
[Size >280][Size <= 290] {text-size:280 }
[Size >290][Size <= 300] {text-size:290 }
[Size >300][Size <= 310] {text-size:300 }
}

Then just continue dividing until you reach your lowest zoom level. It’s important not to use too many intervals in a zoom level, or TileMill will crash, at least on Windows.
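As suggested above, the bracket lists could be generated by a method instead of in Excel. The sketch below is one way to do it, under stated assumptions: brackets at 0, 5, 10 and then every 10, a marker factor of 8 at the top zoom level that halves per level (as in the snippets above), and a text size equal to the bracket’s lower bound at zoom 18 that doubles or halves per level and never drops below a minimum. With a minimum of 2 this reproduces the zoom 18 table and all but the two smallest brackets of the zoom 19 table, which were hand-tuned. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CartoBrackets {

    // Bracket edges as in the tables above: 0, 5, 10, then steps of 10 up to maxSize.
    static List<Integer> edges(int maxSize) {
        List<Integer> e = new ArrayList<Integer>();
        e.add(0);
        e.add(5);
        for (int v = 10; v <= maxSize; v += 10) {
            e.add(v);
        }
        return e;
    }

    // Print 10.0 as "10" and 0.25 as "0.25", like the hand-written CartoCSS.
    static String fmt(double v) {
        return v == Math.rint(v) ? String.valueOf((long) v) : String.valueOf(v);
    }

    // Assumption: marker factor is 8 at maxZoom and halves per level below it.
    // Assumption: text size equals the bracket's lower bound at baseZoom,
    // doubles per level above it, halves per level below it, minimum minTextSize.
    static String zoomBlock(int zoom, int maxZoom, int baseZoom, int maxSize, double minTextSize) {
        StringBuilder sb = new StringBuilder();
        sb.append("[zoom = ").append(zoom).append("] {\n");
        double markerFactor = 8 * Math.pow(2, zoom - maxZoom);
        sb.append("  marker-width: [Size] * ").append(fmt(markerFactor)).append(";\n");
        List<Integer> e = edges(maxSize);
        for (int i = 0; i + 1 < e.size(); i++) {
            int lower = e.get(i);
            int upper = e.get(i + 1);
            double text = Math.max(lower * Math.pow(2, zoom - baseZoom), minTextSize);
            sb.append("  [Size >").append(lower).append("][Size <= ").append(upper)
              .append("] {text-size:").append(fmt(text)).append(" }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Zoom range 14 to 19 as in the example map; 310 is the largest size bracket.
        for (int zoom = 19; zoom >= 14; zoom--) {
            System.out.print(zoomBlock(zoom, 19, 18, 310, 2));
        }
    }
}
```

Fewer edges in edges() mean fewer rules per zoom level, which may help with the crashes mentioned above.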

Now set up the colors. If you used modularity, look up the numbers of the modularity classes in Gephi and use them in CartoCSS. In my example, I use the Modularity column to determine node colors:

[Modularity = 43] {marker-fill:#0000FF;}
[Modularity = 72] {marker-fill:#008B00;}
[Modularity = 5] {marker-fill:#EEB422;}
[Modularity = 3] {marker-fill:#8E388E;}
[Modularity = 9] {marker-fill:#FF1493;}
[Modularity = 0] {marker-fill:#CD0000;}
[Modularity = 25] {marker-fill:#388E8E;}
[Modularity = 38] {marker-fill:#8E8E38;}
[Modularity = 6] {marker-fill:#1E90FF;}
[Modularity = 10] {marker-fill:#000080;}
[Modularity = 7] {marker-fill:#00EE00;}
[Modularity = 21] {marker-fill:#B8860B;}
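If you’d rather not pick the modularity class numbers by hand, a small helper can count how many nodes fall into each class (for example, from the Modularity column of the exported file) and give the palette colors to the largest classes first. This is just an alternative to looking the numbers up in Gephi; the class name, the sample data and the palette order are assumptions. Classes beyond the length of the palette are left without a rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ModularityColors {

    // One CartoCSS marker-fill rule per modularity class, most populous class first.
    static List<String> rules(List<Integer> nodeClasses, String[] palette) {
        final Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
        for (Integer c : nodeClasses) {
            Integer n = counts.get(c);
            counts.put(c, n == null ? 1 : n + 1);
        }
        List<Integer> classes = new ArrayList<Integer>(counts.keySet());
        // Largest class first; ties broken by class number for stable output.
        Collections.sort(classes, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                int byCount = counts.get(b).compareTo(counts.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < classes.size() && i < palette.length; i++) {
            out.add("[Modularity = " + classes.get(i) + "] {marker-fill:" + palette[i] + ";}");
        }
        return out;
    }

    public static void main(String[] args) {
        // Modularity value of each node; made-up sample data.
        List<Integer> classes = Arrays.asList(43, 43, 43, 72, 72, 5);
        String[] palette = {"#0000FF", "#008B00", "#EEB422"};
        for (String rule : rules(classes, palette)) {
            System.out.println(rule);
        }
    }
}
```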

Before exporting, don’t forget to set the metatile size to at least 7 to prevent clipping.

Exporting to MapBox

Create an account on MapBox. It’s free for up to 50 MB of data and 5000 views in a rolling month.

In TileMill, select “Export / MBTiles”. Name your map, review the settings and click “Export” if you want to save the tiles to disk, or “Upload” if you want them to appear immediately in your MapBox account. Then make your map visible and share it!