Exploring Hollywood values through IMDB genres and tags

A typical Hollywood story always portrays life in a twisted way. Movies are infused with values. There are typical stories: justice always prevails in the end, even if it means the death of a good guy; the coming-of-age story, in which the hero becomes a man; the revenge story, in which the hero is wronged in the beginning and must regain his life and justice in the course of the film. In American movies, family values are all-important, and so on.

These values are interrelated in the movie world, but what is their importance relative to other values? Is war a good or a bad thing, as portrayed in the movies? Is friendship close to romance, and is marriage close to love? What is science fiction – action, adventure or fantasy?

There happens to be a treasure trove of useful information on IMDB to visualize these relations. Each movie belongs to one or more genres, and on every movie page, there are tags for themes that occur in it. One could construct a network of movies that are interrelated through genres and tags they share. If two films share a tag, they must be closer than films that don’t share it. But there are many tags and over ten genres, so how does it look?
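
To make the idea concrete, here is a minimal sketch in Python of how such a network could be assembled. The movie titles and labels are made-up sample data, and the function names build_edges and shared_labels are hypothetical; this is not the exact procedure used for the map, just one plausible way to link movies to their genres and tags and to count what two movies have in common.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data; real titles, genres and tags would come from IMDB.
movies = {
    "Movie A": {"Drama", "Romance", "love", "new-york-city"},
    "Movie B": {"Drama", "love", "marriage"},
    "Movie C": {"Adventure", "Sci-Fi", "alien"},
}

def build_edges(movies):
    """Return (movie, label) pairs: one edge per genre or tag a movie carries."""
    return sorted(
        (title, label)
        for title, labels in movies.items()
        for label in labels
    )

def shared_labels(movies):
    """Count how many genres and tags each pair of movies has in common."""
    shared = Counter()
    for (a, labels_a), (b, labels_b) in combinations(sorted(movies.items()), 2):
        common = len(labels_a & labels_b)
        if common:
            shared[(a, b)] = common
    return shared

edges = build_edges(movies)
shared = shared_labels(movies)
```

The edge list is the kind of table a tool like Gephi can import, and the shared-label count is one simple measure of how close two films are: the more labels they share, the closer they sit.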

It looks like this (click image to launch interactive page):

Roughly 15,000 movies, as presented on IMDB. Full map in a bigger window. There’s also a post showing how a social network of actors evolves over time from 1960 to 2013.

If a circle is bigger, it means it has more connections (movies associated with it). For example, the “Drama” circle seems to be the biggest, because apparently a large share of movies are dramas.
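
The circle size boils down to counting, for each genre or tag, how many movies carry it. A small sketch in Python, again on made-up sample data and with a hypothetical function name label_degrees, assuming that a plain count of movies per label is what drives the size:

```python
from collections import Counter

# Hypothetical sample data; real titles, genres and tags would come from IMDB.
movies = {
    "Movie A": {"Drama", "Romance", "love"},
    "Movie B": {"Drama", "love", "marriage"},
    "Movie C": {"Drama", "History"},
    "Movie D": {"Adventure", "Sci-Fi"},
}

def label_degrees(movies):
    """Count how many movies are connected to each genre or tag."""
    return Counter(label for labels in movies.values() for label in labels)

degrees = label_degrees(movies)

# The label with the most movies attached would get the biggest circle.
biggest, count = degrees.most_common(1)[0]
```

In this toy data set "Drama" is attached to three of the four movies, so it would be drawn largest.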

Same-colored circles belong to common categories, so for example “Drama”, “Romance”, “Love”, “Friendship”, “Marriage” and surprisingly “History” and “Biography” belong to the same group. Romance and drama are actually genres, and “love”, “history” and “biography” are tags. If you zoom in, you can see the movies associated with each tag and category.

It seems that most of Hollywood romance takes place in New York City, and that there’s a lot of sex going on there at the same time. There is some friendship involved, but not much. It’s interesting that marriage is on the opposite side of romance in relation to sex. It also seems that there is a lot of romantic activity on the set, as actors and actresses are closely related to it. This must be an artifact of Hollywood self-reflection.

On the other hand, California, as represented in movies, seems much more family-oriented. There are a lot of boys, children, girls and babies around it. There are also a lot of dreams and female nudity.

It’s also fun to construct sentences containing words of closely positioned tags. Drugs and money lead to suicide? Death by doctor in a hospital? Murder someone, get apprehended by police and go to prison?

It’s also apparent that sci-fi is nothing but a sub-genre of adventure. I always thought there was more brains to it. And fantasy seems nothing more than adventure for family audiences.

Have fun browsing the map, and let me know if you discover more fun facts.

Recent experiments with Gephi led me to speculate that it’s possible to extract meaning from a large volume of data with network analysis. This is the first post on this blog, and also the first in a series dealing with data and visualization.

If you are interested in making your own diagrams like this, here’s a how-to.

Edit: after being mentioned in Canadian Business magazine (thanks Matthew McClearn!), I should maybe add an explanation of my interpretation above. I’m actually interpreting relationships as portrayed in the movies, as inserted in the IMDB database. So what I wrote was actually an interpretation of an aggregation of simplifications of interpretations. I still think that my methodology is sound – after all, the diagram looks OK, and actually makes sense. It’s just curious to interpret.