Difficulty Level = 8
One of the coolest things you can do with the nootropic design Video Experimenter shield for Arduino is decode the closed caption data embedded in NTSC (North American) television broadcasts. I figured out how to do this and documented it in another project, so if you want to understand all the details of how to capture and decode closed captions, refer to that project. With this project, I take it a step further and show how the spoken dialog embedded in a television show can be visualized on a computer as a "cloud" of words. This is the same type of cloud (often called a "tag cloud") that you see on blogs, where the frequency of a particular word is reflected in the size of the word. More frequent == larger word.
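The size mapping behind a tag cloud fits in a few lines. The Java below scales each word's count linearly between a minimum and a maximum font size. This is an illustrative sketch under assumptions: the class name `CloudSizing`, the font-size bounds, and the linear scaling are all hypothetical. The OpenCloud library used by the actual Processing sketch may weight words differently. The counts in `main` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CloudSizing {
    // Hypothetical font-size bounds in points; not taken from the original sketch.
    static final float MIN_SIZE = 12f;
    static final float MAX_SIZE = 64f;

    /** Linearly map a word's count from [minCount, maxCount] onto [MIN_SIZE, MAX_SIZE]. */
    static float fontSize(int count, int minCount, int maxCount) {
        if (maxCount == minCount) {
            return MAX_SIZE; // every word is equally frequent, so draw them all large
        }
        float t = (float) (count - minCount) / (maxCount - minCount);
        return MIN_SIZE + t * (MAX_SIZE - MIN_SIZE);
    }

    public static void main(String[] args) {
        // Made-up counts, purely for illustration.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("baseball", 1);
        counts.put("twins", 5);
        counts.put("royals", 3);

        int min = counts.values().stream().min(Integer::compare).get();
        int max = counts.values().stream().max(Integer::compare).get();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.printf("%-10s %5.1f%n", e.getKey(), fontSize(e.getValue(), min, max));
        }
    }
}
```

With these counts, the least frequent word gets 12.0, the most frequent gets 64.0, and the word halfway between gets 38.0. A logarithmic scale is a common alternative when a handful of words dominate the counts.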
First, here’s how the hardware is set up. It’s really simple. The Video Experimenter needs a composite video feed from a source with a TV tuner, such as a DVR (e.g. TiVo) or VCR. You can also use a DVD player, because DVDs usually carry closed captioning data. The Arduino's USB cable connects to your computer, where you run a Processing sketch (program) to visualize the words as the Arduino decodes them. The Processing sketch dynamically builds the TV cloud as the words are extracted from the closed caption stream!
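On the computer side, the decoded caption text arrives as a stream of characters. Those characters must be split into words before they can be counted. The sketch below illustrates one way to do this, under several assumptions. It supposes the Arduino sends plain caption text over the serial line. A `StringReader` stands in for the USB serial connection; in a real Processing sketch, the characters would come from Processing's serial library instead. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CaptionTokenizer {
    /**
     * Reads characters until end of stream and emits each run of letters
     * (apostrophes included, so "DON'T" stays one word) as a lowercase word.
     */
    static void readWords(Reader in, Consumer<String> onWord) throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetter(ch) || ch == '\'') {
                word.append(Character.toLowerCase(ch));
            } else if (word.length() > 0) {
                onWord.accept(word.toString()); // any other character ends the current word
                word.setLength(0);
            }
        }
        if (word.length() > 0) {
            onWord.accept(word.toString()); // flush the last word at end of stream
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical caption text; a StringReader stands in for the serial port.
        Reader fakeSerial = new StringReader(">> TWINS WIN IT\nIN THE NINTH!");
        List<String> words = new ArrayList<>();
        readWords(fakeSerial, words::add);
        System.out.println(words);
    }
}
```

Running `main` prints `[twins, win, it, in, the, ninth]`. At this stage, short and common words are still present, so filtering can be handled as a separate step.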
Here’s a video where I show a TV cloud of spoken dialog being created dynamically. I superimposed a video of the television broadcast so you can correlate the broadcast with the Processing application, but note that the Processing application doesn’t actually display the video. Words spoken with higher frequency are larger.
Example TV Clouds
I have always noticed that whenever I happen to see a US national news broadcast, all the commercials are for drugs. I guess only old people watch the news on TV anymore. Here’s a TV cloud of the commercials shown during NBC Nightly News. Can you guess which drugs are being advertised? Can you guess which maladies they claim to cure? Look at all those nasty side effects!
Here is a TV cloud made while watching a baseball game. For US readers familiar with baseball, can you guess which teams were playing? Answer is at the end of this post.
The Processing sketch reads words from the serial line and filters out any word shorter than 3 letters, along with some very common words such as “the”, “and”, and “for”. This application relies on the very nice OpenCloud Java library, so you’ll need to download it and add it to your Processing environment. To do that, create this structure in the libraries directory of your Processing sketchbook: opencloud/library/opencloud.jar
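The filtering rule above could look like this in Java: drop words shorter than 3 letters and words on a list of common words, then count what remains. This is a sketch, not the contents of the downloadable sketch. Only “the”, “and”, and “for” come from the text. The other stop-list entries and the `WordFilter` name are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class WordFilter {
    // Illustrative stop list: "the", "and", "for" are from the text; the rest are assumed.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "the", "and", "for", "you", "that", "this", "with", "are", "was"));

    /** True if the word is at least 3 letters long and not a stop word. */
    static boolean keep(String word) {
        return word.length() >= 3 && !STOP_WORDS.contains(word.toLowerCase());
    }

    /** Counts the words that survive the filter, keyed by lowercase word. */
    static Map<String, Integer> count(Iterable<String> words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give deterministic output
        for (String w : words) {
            if (keep(w)) {
                counts.merge(w.toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "twins", "and", "royals", "go", "twins", "for", "it");
        System.out.println(count(words));
    }
}
```

Running `main` prints `{royals=1, twins=2}`. These counts are the kind of per-word weight a cloud library uses to size each word. Exactly how the real sketch passes them to OpenCloud is not shown here.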
Download the Processing sketch
Answer to the baseball broadcast question: Kansas City Royals vs. Minnesota Twins (go Twins!)