I was wondering if presence of faces in video content was an indicator of anything, and if so, of what. So I decided to scan episodes of a popular TV series and analyze them, second by second, for number of faces in video frames, and then compare charts of various episodes. Here is the result of this research.
I decided to analyze House Of Cards, partly because it’s a great series, but also because it’s character focused, so there are many scenes with a lot of people. I built an interactive viewer, which allows to see which faces were recognized at a particular point in time in Episode 3, which contains a variety of scenes with many people in them.
Launch the viewer, or continue reading for short description of technology.
Technology
To pull this off, I used the OpenCV computer vision library, which has a good capability to recognize faces. As the computer watches TV, this tool scans every frame for faces, and, if it finds any, communicates the relevant rectangles, so they can be drawn or extracted and saved.
Here’s a screenshot of a scene in church. It’s immediately apparent that the tool does not do such a good job, for many faces remain unrecognized. Still, many are recognized.
In this frame below, more faces are recognized.
There are also many false positives. The computer sometimes thinks that something is a face, where it most certainly it’s not, as in this picture below. If one looks carefully, one can sometimes see something face-like in these rectangles.
To construct the viewer, I extracted individual faces from frames so I could display them on the page. They are of various sizes and look like this:
To construct the charts, I just counted the faces in each seconds, then displayed the time series for each episode.
Results
This is the final chart. It’s a series of timelines that show how many faces were recognized per second. Why are some lines orange, and some yellow?
As video frames scanning progressed, some faces were recognized in only one frame in entire second – there are 23 of them. Some other faces were recognized in more frames, ans others in yet more frames. I thought this to be a good indicator of face detection reliability, but that’s not so. If it tells anything, it’s how steady the camera was in that section.
My inspiration was small multiples, a visualization technique which allows for easier comparison of several datasets from the same domain. Wikipedia says:
A small multiple (sometimes called trellis chart, lattice chart, grid chart, or panel chart) is a series or grid of small similar graphics or charts, allowing them to be easily compared. The term was popularized by Edward Tufte.
According to Tufte (Envisioning Information, p. 67):
- At the heart of quantitative reasoning is a single question: Compared to what? Small multiple designs, multivariate and data bountiful, answer directly by visually enforcing comparisons of changes, of the differences among objects, of the scope of alternatives. For a wide range of problems in data presentation, small multiples are the best design solution.
As always, if anyone is interested in code, mail me. My address is on About page.