Grouping entries in Slovenian Wikipedia by contributors

Some time ago, I helped Miha Mazzini extract some data from Slovenian Wikipedia. For that, I needed to write a comprehensive parser that extracted not only titles and text, but also the overall and per-contributor revision counts, along with contributor usernames. So, for each entry, I got a list of contributing accounts and the number of edits performed by each account. I wondered: how are the areas of expertise distributed
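The per-entry extraction described above can be sketched in a few lines of Python. This is not the parser from the post. It is a minimal sketch that assumes the standard MediaWiki XML export layout with full revision history (one <page> per entry, each <revision> holding a <contributor> with either a <username> or, for anonymous edits, an <ip>). The function names and the sample data are hypothetical. It streams the file with ElementTree's iterparse and, for every page, yields the title, the overall revision count, and a per-contributor edit count.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, Iterator, Tuple, Union


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def contributors_per_page(
    source: Union[str, BinaryIO],
) -> Iterator[Tuple[str, int, Counter]]:
    """Yield (title, revision_count, edits_per_contributor) for each <page>.

    Assumes the MediaWiki export layout (hypothetical helper, not the original
    parser): <page><title/>...<revision><contributor>...</contributor></revision>
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = ""
        revisions = 0
        edits: Counter = Counter()
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "revision":
                revisions += 1
            elif name == "contributor":
                # Registered users have <username>; anonymous edits have <ip>.
                for field in child:
                    if local_name(field.tag) in ("username", "ip") and field.text:
                        edits[field.text] += 1
                        break
        yield title, revisions, edits
        # Drop the finished page's children and text to limit memory use.
        elem.clear()


# Made-up sample in the assumed export format.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Ljubljana</title>
    <revision><contributor><username>Alice</username></contributor><text>a</text></revision>
    <revision><contributor><username>Bob</username></contributor><text>b</text></revision>
    <revision><contributor><username>Alice</username></contributor><text>c</text></revision>
  </page>
  <page>
    <title>Maribor</title>
    <revision><contributor><ip>192.0.2.1</ip></contributor><text>d</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, revs, edits in contributors_per_page(io.BytesIO(SAMPLE)):
        print(title, revs, dict(edits))
```

Streaming with iterparse rather than loading the whole tree matters for full-history dumps, which can be much larger than the current-revision text alone.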

K-means clustering with Processing.js

K-means clustering is an algorithm for quickly grouping large quantities of data into a chosen number (k) of clusters. It's used in a variety of ways, from statistical analysis to improving the usability of user interfaces. If you read Google News, you're probably familiar with the way they group similar news items together. When I first saw that, I thought there must be some serious language processing and semantics behind it: that they somehow extract meaning from
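The core of k-means is a short loop, sketched here in plain Python rather than Processing.js. This is a minimal illustration of the standard iteration (Lloyd's algorithm), not the code from the post; the names kmeans, max_iter and seed are illustrative. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of the points assigned to it, stopping once the assignments no longer change.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: return (centroids, cluster label of each point)."""
    rng = random.Random(seed)
    # Naive seeding: start from k input points chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]

    def nearest(p: Point) -> int:
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, groups = kmeans(pts, k=2)
    print(centers, groups)
```

Picking random input points as starting centroids keeps the sketch short, but it can settle into a poor local optimum on less tidy data; k-means++ seeding is a common improvement.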

