Ours and theirs – an endless duel between computer-generated Slovenian news commenters


I’ve been following comments under Slovenian web news items for quite some time. The commenters there are well known for their animosity towards anyone who disagrees with their political worldview. Reading the comment section usually means immersing yourself in verbal filth, depravity and all imaginable kinds of hate speech.

Some time ago I wrote software for scraping comments off these websites, and I have been storing them in a database ever since. They are useful for a number of things. For example, I’ve had some success with stylometry (identifying commenters by their writing style, even when they post under different names), but that is a matter for another post. I also helped SAZU compile a list of new slang words for the new Dictionary of Slovene Language.

So here’s a lighter project for people who don’t read these comments and don’t want to. If you want to see it all, just click the image below and behold the auto-generated stream for a minute or two.

Note that these are not real comments. The text is generated from two Markov chains, one initialized with texts by left-wing commenters and the other with texts by right-wing commenters. The comments used are approximately a year old, lest someone accuse me of participating in an election campaign of some kind. The web page simply generates a few sentences from one model, then a few from the other, and so it continues ad nauseam infinitum.

I think it’s a fitting commentary on the Slovenian mentality. Slovenian-speaking visitors will notice that, even though the texts are probabilistically generated by a computer, there’s still ample hurling of insults based on the outcome of the last World War. There’s quite a lot of that.

Also, even though both sides pack serious vitriol, the right-wingers use more classic hate speech, and their writing is comparatively worse.

See for yourself!

Naši in vaši

Technically, it was a breeze to make. First I pulled the entire corpora of selected commenters from the database in text form, then I used RiTA, a generative text tool, to initialize the two models and generate sentences. The code is very short, and most of it has to do with displaying and scrolling.

Initializing a model from the loaded text, by contrast, is a single call:

model.loadText(text);

And generating sentences takes just:

var sentences = model.generateSentences(nr);

Such is the beauty of RiTA.
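
For readers curious about what happens under the hood, here is a minimal sketch of the same idea in plain JavaScript, without RiTA. It is not the code behind the page and not RiTA's implementation: the function names (buildModel, generateSentence, duel), the order of 2 and the two tiny English corpora are all made up for illustration. Each model maps the last two words to the words that followed them in the source text, and a generator alternates between the two models forever.

```javascript
// Minimal word-level Markov chain. A stand-in for illustration only;
// this is NOT how RiTA is implemented, and all names are hypothetical.
function buildModel(text, order = 2) {
  const words = text.split(/\s+/).filter(Boolean);
  const transitions = new Map(); // "w1 w2" -> words observed right after it
  const starts = [];             // states that open a sentence
  for (let i = 0; i + order < words.length; i++) {
    const state = words.slice(i, i + order).join(" ");
    // A state opens a sentence if it is first, or follows ., ! or ?
    if (i === 0 || /[.!?]$/.test(words[i - 1])) starts.push(state);
    if (!transitions.has(state)) transitions.set(state, []);
    transitions.get(state).push(words[i + order]);
  }
  return { order, transitions, starts };
}

// Pick a random element; rng must return a number in [0, 1).
function pick(items, rng) {
  return items[Math.floor(rng() * items.length)];
}

// Walk the chain until a word ends in ., ! or ?, a dead end is hit,
// or maxWords is reached.
function generateSentence(model, rng = Math.random, maxWords = 40) {
  if (model.starts.length === 0) return ""; // corpus too short
  const out = pick(model.starts, rng).split(" ");
  while (out.length < maxWords && !/[.!?]$/.test(out[out.length - 1])) {
    const state = out.slice(-model.order).join(" ");
    const next = model.transitions.get(state);
    if (!next) break; // this state was never followed by anything
    out.push(pick(next, rng));
  }
  return out.join(" ");
}

// Endless duel: a few sentences from one side, then from the other.
function* duel(left, right, sentencesPerTurn = 2, rng = Math.random) {
  for (let turn = 0; ; turn++) {
    const side = turn % 2 === 0 ? "left" : "right";
    const model = side === "left" ? left : right;
    const sentences = [];
    for (let i = 0; i < sentencesPerTurn; i++) {
      sentences.push(generateSentence(model, rng));
    }
    yield { side, text: sentences.join(" ") };
  }
}

// Invented placeholder corpora, not real comments.
const leftModel = buildModel(
  "Our side is always right. Our side never forgets. Their side is always wrong."
);
const rightModel = buildModel(
  "Their side started it all. Our side is the victim here. Their side never learns."
);

const stream = duel(leftModel, rightModel);
for (let i = 0; i < 4; i++) {
  const { side, text } = stream.next().value;
  console.log(`[${side}] ${text}`);
}
```

With corpora this small the output mostly repeats the input sentences; the recombination only becomes interesting with a large body of text per side, which is where a library like RiTA saves the work.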