Visualizing TV Dialog Using Closed Caption Data

Difficulty Level = 8

One of the coolest things you can do with the nootropic design Video Experimenter shield for Arduino is decode the closed caption data embedded in NTSC (North American) television broadcasts. I figured out how to do this and documented it in another project, so if you want to understand all the details of how to capture and decode closed captions, refer to that project. With this project, I take it a step further and show how the spoken dialog embedded in a television show can be visualized on a computer in a “cloud” of words. This is the same type of cloud (often called a “tag cloud”) that you see on blogs, where the frequency of a particular word is reflected in the size of the word. More frequent == larger word.

Hardware Setup

First, here’s how the hardware is set up. It’s really simple. The Video Experimenter needs a composite video feed from a TV tuner like a DVR (e.g. Tivo) or VCR. You can also use a DVD player because DVDs usually have closed captioning data. The USB cable connects to your computer where you run a Processing sketch (program) to visualize the words as they are decoded by the Arduino. The Processing sketch dynamically builds the TV cloud as the words are extracted from the closed caption stream!

Hardware setup

Demo Video

Here’s a video where I show a TV cloud of spoken dialog being created dynamically. I superimposed a video of the television broadcast so you can correlate the broadcast with the Processing application, but note that the Processing application doesn’t actually display the video. Words spoken with higher frequency are larger.

Example TV Clouds

I have always noticed that whenever I happen to see a US national news broadcast, all the commercials are for drugs. I guess only old people watch the news on TV anymore. Here’s a TV cloud of the commercials shown during NBC Nightly News. Can you guess which drugs are being advertised? Can you guess which maladies they claim to cure? Look at all those nasty side effects!

TV cloud made from drug commercials.


Here is a TV cloud made while watching a baseball game. For US readers familiar with baseball, can you guess which teams were playing? Answer is at the end of this post.

TV cloud built from part of a baseball game broadcast.

The Software

The Arduino sketch is fairly simple, and for details on how it works, please see the in-depth article about decoding closed captions.
This project is the “ExtractCaptionWords” example in the TVout library for Video Experimenter.

The Processing sketch reads words from the serial line and filters out any word shorter than 3 letters, along with some very common words like “the”, “and”, “for”, etc. This application relies on the very nice OpenCloud Java library, so you’ll need to download that and use it in your Processing environment. Create this structure in your Processing sketchbook libraries directory: opencloud/library/opencloud.jar
Download the Processing sketch
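
To give a feel for the filtering and sizing described above, here is a minimal sketch in plain Java. It is not the downloadable Processing sketch and it does not use OpenCloud. The class name CaptionWordCounter, the stop-word list (only the three words named above), the lowercasing and punctuation stripping, and the linear font-size mapping are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts caption words and maps counts to font sizes.
class CaptionWordCounter {
    // Assumed stop-word list; the real sketch filters more common words than these.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for"));
    private static final int MIN_LENGTH = 3;

    private final Map<String, Integer> counts = new HashMap<>();

    // Assumption: lowercase the word and strip anything that is not a letter or apostrophe.
    static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^a-z']", "");
    }

    // Keep words of at least 3 characters that are not stop words.
    static boolean keep(String word) {
        return word.length() >= MIN_LENGTH && !STOP_WORDS.contains(word);
    }

    // Add one word as it arrives from the serial line.
    void add(String raw) {
        String word = normalize(raw);
        if (keep(word)) {
            counts.merge(word, 1, Integer::sum);
        }
    }

    int count(String word) {
        return counts.getOrDefault(word, 0);
    }

    // Assumed linear scaling: the most frequent word gets maxSize.
    float fontSize(String word, float minSize, float maxSize) {
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        if (max == 0) {
            return minSize;
        }
        return minSize + (maxSize - minSize) * count(word) / max;
    }
}
```

In a Processing sketch you would call add() for each word read from the serial port and use fontSize() when drawing. OpenCloud handles the weighting for you in the real application, so treat this only as an outline of the idea.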


Answer to the baseball broadcast question: Kansas City Royals vs. Minnesota Twins (go Twins!)

Published by Michael on July 18th, 2011 at 7:17 pm. Filed under: Arduino, Level 8, Processing, Video.

13 Responses to “Visualizing TV Dialog Using Closed Caption Data”

  1. Wow. I’m speechless. Will this work if I am using the Motorola cable box?

    Comment by abeish on July 19, 2011 at 11:00 PM

  2. Any TV source with a composite video output. Single RCA jack, usually yellow.

    Comment by Michael on July 19, 2011 at 11:26 PM

  3. I’m having a problem getting this to work with a PAL source. I’m only getting a white line in the picture. Do you have any tips for me?

    Comment by gustaf on July 31, 2011 at 2:33 PM

  4. The closed captioning standards are only on NTSC broadcasts. I believe PAL has a completely different standard, and I don’t have access to PAL signals so I am afraid I never was able to experiment with it.

    Comment by Michael on July 31, 2011 at 7:46 PM

  5. Would this project allow the ability for Video mashups?

    In that I mean, being able to take two separate video streams, and pull pieces from each to over lay onto one final screen.

    It’s a similar idea to DJ’ing and doing music mash-ups, however, I’d like to add another dimension of complexity by adding in the visual mixing.

    thought? ideas? comments? know anywhere I could find what I’m looking for?

    Comment by Cam on August 19, 2011 at 2:08 PM

  6. No, that’s not possible with this hardware. The Video Experimenter is just a sync separator, and we bit-bang out monochrome composite video from the Arduino. It doesn’t actually do video processing like you describe. You need much more sophisticated hardware (which I don’t really know about). Anyone else know?

    Comment by Michael on August 20, 2011 at 5:19 PM

  7. I don’t know if gustaf ever figured out the PAL closed captioning data but according to the wikipedia article on closed captioning PAL uses (or used to use?) a “teletext” system which sounds infinitely more interesting than the NTSC CC system.

    The history at least makes it sound really fun:

    Anyway, as best I can understand (without dedicating a lot of time to something I don’t have) PAL sends its CC data on “page 888”, though I’m not sure what this means. Also, I’ve read that they refer to “closed captioning” as “subtitles”; there isn’t the distinction as in the US.
    You might check out the following pdf if you want to try and pull the data.

    Comment by Another Michael on September 16, 2011 at 1:00 AM

  8. Does this work with digital TV, or is it only supported with an analogue television signal? If you have a normal analogue TV, then a set-top box is needed to decode the TV program (as the program is encoded in MPEG format).

    Comment by Jack on January 12, 2012 at 5:00 PM

  9. This works with an analog composite signal. If your decoder can output composite, then you can do it.

    Comment by Michael on January 12, 2012 at 7:39 PM

  10. I am not really making a comment but seeking your help. I am a 73-year-old man with a severe hearing problem living in Nigeria. As we do not have closed captioning in our TV programs, I was wondering if there is a gadget that will enable me to generate closed captions on my TV. I am hoping that I can do it with CNN and BBC World TV broadcasts.
    Please let me know what I can do to enable me get cc on my tv.
    Thanks and sorry to trouble you
    Sam Sikpojie

    Comment by Sam Sikpojie on July 18, 2012 at 4:21 PM

  11. broken Links, not read:

    Comment by celso on February 22, 2013 at 9:14 AM

  12. @celso, thanks. Links fixed.

    Comment by Michael on February 22, 2013 at 9:24 AM

  13. I had the Arduino sketch running and it was printing CC to the serial monitor fine. Then when I ran the Processing sketch, the text turned into gibberish in the Processing sketch and the Arduino serial monitor. I switched the port to 57600 and it mostly fixed the problem, but when I opened it a second time it stopped reading anything in both the serial monitor and the Processing sketch. I’ve tried reloading the sketch and resetting both the Arduino and the Video Experimenter shield, but nothing is working. The white CC bars are still showing up as normal on the video overlay on the TV, by the way.

    Not sure if this is relevant but it’s giving me a warning:

    RXTX Warning: Removing stale lock file. /var/lock/LK.031.018.006

    Comment by Will on March 24, 2013 at 9:33 PM
