Decoding Closed Captioning Data

Difficulty Level = 8

How Does Closed Captioning Work?

Closed captioning is the technology used to embed text or other information in an NTSC television broadcast (used in North America, Japan, and parts of South America). It is typically a transcription of the broadcast audio for the benefit of hearing-impaired viewers. You have almost certainly seen closed captions displayed on a TV, but how does it work? This project explains how closed captioning technology works and then shows how you can decode and display the data using your Arduino and a Video Experimenter shield. There is a lot to learn here, so be patient! First, take a look at a video showing this capability, then keep reading to learn how it works.

The data that your TV displays is embedded in the broadcast itself, in a special format and in a special location of the video image. When you activate the closed captioning feature on your TV, the TV decodes this information and displays it on the screen. Whether or not you display it, the data is present in the broadcast, encoded on line 21 of the video frame. The format is defined by the EIA-608 standard. Here is what the line 21 signal looks like:

Waveform for the closed caption data on line 21 of a TV frame

 
This shows the voltage of a composite video signal for line 21. The horizontal sync and color burst are just like on any other video line, but the section called “clock run-in” is a special sine wave that lets the TV synchronize with the closed captioning data that is about to start. The 7-cycle run-in is followed by 3 start bits with values of 001; you can see the voltage rise for the third bit, S3. The next 16 bits represent two 8-bit characters of text, each sent least-significant bit first (b0 through b6, then b7). That’s right, there are only two characters per video frame, but at roughly 30 frames per second that is about 60 characters per second, which is plenty of bandwidth for captions. The last bit of each byte, b7, is an odd parity bit. Parity bits are an error detection mechanism: the parity bit is set or cleared so that the total number of 1 bits in the byte is odd. So, if bits b0-b6 contain four 1 bits, the parity bit is on, bringing the total to an odd number (5).
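To make the parity rule concrete, here is a small C++ sketch (my own illustration, not taken from the project code) that checks odd parity on a received byte and strips the parity bit to recover the 7-bit character. The names `hasOddParity` and `stripParity` are hypothetical helpers.

```cpp
#include <cstdint>

// Returns true if the byte contains an odd number of 1 bits,
// which is what a correctly received EIA-608 byte should have.
bool hasOddParity(uint8_t b) {
  uint8_t ones = 0;
  for (uint8_t i = 0; i < 8; i++) {
    ones += (b >> i) & 1;
  }
  return (ones & 1) == 1;
}

// Clears the parity bit (b7), leaving the 7-bit character code in b0-b6.
uint8_t stripParity(uint8_t b) {
  return b & 0x7F;
}
```

For example, 'A' is 0x41 (binary 1000001, two bits set), so it is transmitted with the parity bit on as 0xC1: `hasOddParity(0xC1)` is true and `stripParity(0xC1)` returns 0x41. The "nothing to say" byte described later (all data bits off, parity on) arrives as 0x80 and strips to 0.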

Capturing and Decoding the Data

So, how do we capture and decode this data using the Video Experimenter? We need the enhanced TVout library used with all Video Experimenter projects. You may already know from other Video Experimenter projects that we can capture a video image in the TVout frame buffer. For this project, we only want to capture the line 21 data so we can decode it. This is done with the API method tv.setDataCapture(int line, int dataCaptureStart, char *ccdata), where ‘line’ is the TVout scan line to capture, ‘dataCaptureStart’ is the number of clock cycles to wait on that line before starting to capture, and ‘ccdata’ is a buffer to store the bits in. Typically, we do something like this:

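  unsigned char ccdata[16]; // one captured line: 128 pixels at 1 bit per pixel = 16 bytes
  ...
  // capture TVout scan line 13, starting 310 clock cycles into the line, into ccdata
  tv.setDataCapture(13, 310, ccdata);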

Even though the data is on line 21 of the broadcast, I have found it on line 13 or 14 as far as TVout is concerned. The value of 310 for dataCaptureStart is the one I have found to work best for fitting both characters of data within the width of the TVout frame buffer. This will make more sense later when we look at the captured pixels. It may take a while to “find” the data by trying different lines and different values for dataCaptureStart to get the right alignment, so just keep trying values. I also needed to turn the small potentiometer near the reset button up a bit: a resistance of around 710K worked, instead of the standard 680K required by the LM1881 chip on the Video Experimenter. You’ll know you’ve found the data when you see a data line like the ones in the images below. Sometimes you might find data that is not closed captions but information about the program, such as its title. This is called XDS, or Extended Data Services, and it can be interesting to decode too!
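Finding the right line and dataCaptureStart is trial and error, but a quick software check can tell you whether a captured line even looks like caption data. The C++ sketch below is my own heuristic, not part of the enhanced TVout library: it scans the 128 captured pixels for the start-bit pattern (a long dark run covering S1 and S2, then a bright run for S3) and returns the pixel where S3 begins, similar to the edge-alignment idea raised in the first comment. The thresholds `minZeros` and `minOnes` are assumptions based on bits being about 6 pixels wide at this resolution, and the pixel packing (leftmost pixel in the most significant bit of each byte) is assumed to match TVout's.

```cpp
#include <cstdint>

const int kLineWidth = 128;  // captured pixels per line (16 bytes x 8 bits)

// Reads pixel x from a 1-bit-per-pixel line buffer, assuming the leftmost
// pixel of each byte is stored in its most significant bit (TVout-style).
bool pixelAt(const uint8_t *line, int x) {
  return (line[x >> 3] >> (7 - (x & 7))) & 1;
}

// Heuristic search for the start bits 0,0,1: at least minZeros dark pixels
// followed by at least minOnes bright pixels. Returns the index of the first
// bright pixel (the start of S3), or -1 if no such pattern is found.
int findStartBit(const uint8_t *line, int minZeros = 9, int minOnes = 4) {
  int zeros = 0;
  for (int x = 0; x < kLineWidth; x++) {
    if (!pixelAt(line, x)) {
      zeros++;
      continue;
    }
    if (zeros >= minZeros) {
      // Count how many bright pixels follow this rising edge.
      int ones = 0;
      while (x + ones < kLineWidth && pixelAt(line, x + ones)) ones++;
      if (ones >= minOnes) return x;
    }
    zeros = 0;  // a bright pixel ends the current dark run
  }
  return -1;
}
```

If it returns -1 on every frame, try the other line or another dataCaptureStart value. When it does find S3, the first data bit b0 should begin roughly one bit width later (the hand-tuned positions in this article start b0 at pixel 26), so the return value could also be used to shift the bit positions instead of tuning them by hand.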

Once we tell enhanced TVout where to find the data, the buffer ccdata always contains the pixels of the specified line of the current frame. If we display the captured pixels on the screen, we can see how they match up with the line 21 waveform. To produce the picture below, I copied the contents of ccdata to the first line of the TVout frame buffer so we can see the data with our own eyes. The data appears as white pixels at the top of the image. Displaying the data is not required in order to decode it and write it to the serial port, but it makes the data easier to find and shows what’s going on.

Closed captioning data line displayed at top of image

 
On the left side we can see the last 2 peaks of the clock run-in sine wave. Then we clearly see the start bits 001. Each bit is about 5 or 6 pixels wide. Then come 7 zero bits (pixels off) and the parity bit (on). When this picture was taken, no dialog was being spoken, so each character is all zero bits except for the parity bit. When text data is being broadcast, the bits flash very quickly:

When data is present, the bits flash quickly.
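Why does each bit come out about 6 pixels wide? A back-of-the-envelope estimate, assuming the EIA-608 data rate of 32 times the NTSC line rate and roughly 5 CPU cycles per captured pixel at 16 MHz (the capture-loop figure Michael gives in the comments):

```latex
f_{\text{bit}} = 32 \times f_H \approx 32 \times 15{,}734\ \text{Hz} \approx 503.5\ \text{kHz},
\qquad T_{\text{bit}} = \frac{1}{f_{\text{bit}}} \approx 1.99\ \mu\text{s}

T_{\text{pixel}} \approx \frac{5}{16\ \text{MHz}} = 0.3125\ \mu\text{s},
\qquad \frac{T_{\text{bit}}}{T_{\text{pixel}}} \approx \frac{1.99\ \mu\text{s}}{0.3125\ \mu\text{s}} \approx 6.4\ \text{pixels per bit}
```

The last pixel of each captured byte takes a few extra cycles, so the real figure is somewhat lower, which is consistent with the 5 to 7 pixel spacing between neighboring entries of the `bpos` array.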

 
Now that we have found the data in the broadcast and can display it for inspection, we need to decode this 128-pixel-wide line into the two text characters. To do that, we need to note where each bit of the characters starts. Each bit is 5 or 6 pixels wide. The next step I took in my program was to define an array of bit positions that gives the starting pixel of each bit:

// starting pixel of bits b0..b7 (b7 = parity) for the first and second character
byte bpos[][8]={{26, 32, 38, 45, 51, 58, 64, 70}, {78, 83, 89, 96, 102, 109, 115, 121}};

These are the bit positions for the two bytes in the data line. By displaying these bit positions just below the data line, we can adjust them by trial and error if needed. Here’s an image with the bit positions displayed below the data line. Since each data bit is nice and wide, the positions don’t have to line up perfectly to get reliable decoding. These positions have worked well for me with a variety of video sources.

Values of the bit position array displayed to show alignment with data.
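Both display tricks used for these screenshots, copying the captured line into the top row and drawing the bit positions in the row beneath it, amount to a few lines of buffer manipulation. The sketch below is hedged: it assumes a 128-pixel-wide, 1-bit-per-pixel frame buffer stored row by row (16 bytes per row) with the leftmost pixel in the most significant bit of each byte, which is how TVout packs its screen as far as I know. `frameBuffer`, `showCapturedLine`, `setPixel`, and `markBitPositions` are hypothetical names; on the Arduino you would point `frameBuffer` at TVout's screen buffer or use TVout's own `set_pixel` instead.

```cpp
#include <cstdint>
#include <cstring>

const int kBytesPerRow = 16;  // 128 pixels / 8 pixels per byte

// Copies the 16-byte captured line into row `row` of the frame buffer.
void showCapturedLine(uint8_t *frameBuffer, const uint8_t *ccdata, int row) {
  std::memcpy(frameBuffer + row * kBytesPerRow, ccdata, kBytesPerRow);
}

// Turns on pixel (x, y), leftmost pixel in the most significant bit of each byte.
void setPixel(uint8_t *frameBuffer, int x, int y) {
  frameBuffer[y * kBytesPerRow + (x >> 3)] |= (uint8_t)(0x80 >> (x & 7));
}

// Draws one marker pixel for each bit position of both caption bytes
// on row `row`, so misaligned positions are easy to spot.
void markBitPositions(uint8_t *frameBuffer, const uint8_t bpos[2][8], int row) {
  for (int ch = 0; ch < 2; ch++) {
    for (int bit = 0; bit < 8; bit++) {
      setPixel(frameBuffer, bpos[ch][bit], row);
    }
  }
}
```

Calling `showCapturedLine(frameBuffer, ccdata, 0)` and then `markBitPositions(frameBuffer, bpos, 1)` once per frame reproduces the layout in the image: raw bits on top, markers underneath.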

 
OK, we are almost done. Now that we have found the closed caption data line and established the starting point of each bit, we can easily decode the bits into characters and write them to the serial port for display on a computer, or print them to the screen if we prefer. I have taken care of all this code for you, and you can download the complete project code here.
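For readers who want to see the idea without downloading the project, here is a minimal decoding sketch. It is my own illustration, not the project code: it samples the pixel at each entry of `bpos`, assembles the bits least-significant first (b0 is transmitted first), checks odd parity, and returns the 7-bit character, or 0 when parity fails. It assumes the captured line is packed with the leftmost pixel in the most significant bit of each byte; `pixelAt` and `decodeCaptionByte` are hypothetical names.

```cpp
#include <cstdint>

// Bit start positions from the article: b0..b6 then parity b7, for each byte.
const uint8_t bpos[2][8] = {{26, 32, 38, 45, 51, 58, 64, 70},
                            {78, 83, 89, 96, 102, 109, 115, 121}};

// Reads pixel x from a 1-bit-per-pixel line buffer (leftmost pixel = MSB).
bool pixelAt(const uint8_t *line, int x) {
  return (line[x >> 3] >> (7 - (x & 7))) & 1;
}

// Decodes one caption byte. EIA-608 sends b0 first, so the i-th sampled
// bit becomes bit i of the result. Returns the 7-bit character, or 0 if the
// byte fails the odd-parity check (a null byte, 0x80 raw, also yields 0).
uint8_t decodeCaptionByte(const uint8_t *line, const uint8_t pos[8]) {
  uint8_t raw = 0;
  int ones = 0;
  for (int i = 0; i < 8; i++) {
    if (pixelAt(line, pos[i])) {
      raw |= (uint8_t)(1u << i);
      ones++;
    }
  }
  if ((ones & 1) == 0) return 0;  // even number of 1 bits: parity error
  return raw & 0x7F;              // drop the parity bit
}
```

In a sketch, one might call `decodeCaptionByte(ccdata, bpos[0])` and `decodeCaptionByte(ccdata, bpos[1])` once per new frame (for example after TVout's `delay_frame(1)`) and write any nonzero result to the serial port. Values 0x10 to 0x1F are EIA-608 control codes (the first byte of a two-byte command), and a few codes in the printable range map to accented letters rather than ASCII (0x2A, for example, is á rather than an asterisk), so a fuller decoder would handle those separately.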

If you have problems finding the data, try different lines for the data (13 or 14), different values for dataCaptureStart, and adjust both potentiometers on the Video Experimenter. Try slowly turning the small pot near the reset button clockwise. If you are patient, you’ll find the data and decode it!

Other project ideas

  • Instead of writing the data to the serial port, write it to the screen itself with tv.print(s)
  • Search for keywords in the closed captions and light an LED when the word is found.
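For the keyword idea, one approach is to keep the last few decoded characters in a small rolling window and compare its tail against the keyword each time a character arrives. The class below is a hypothetical, hardware-independent sketch: on an Arduino you would feed it each decoded character and switch an LED pin when `push` returns true. Case-insensitive matching and the 16-character keyword limit are assumptions of this sketch (captions are often in upper case).

```cpp
#include <cctype>
#include <cstring>

// Rolling matcher: remembers the most recent characters and reports
// when they end with the keyword (case-insensitive).
class KeywordWatcher {
 public:
  static constexpr int kMaxLen = 16;

  explicit KeywordWatcher(const char *keyword) : len_(0), count_(0) {
    std::strncpy(keyword_, keyword, kMaxLen);
    keyword_[kMaxLen] = '\0';  // keywords longer than kMaxLen are truncated
    len_ = (int)std::strlen(keyword_);
    std::memset(recent_, 0, sizeof(recent_));
  }

  // Feed one decoded character; returns true when the most recent
  // characters end with the keyword.
  bool push(char c) {
    // Shift the window left by one and append the new character.
    std::memmove(recent_, recent_ + 1, kMaxLen - 1);
    recent_[kMaxLen - 1] = c;
    if (count_ < kMaxLen) count_++;
    if (len_ == 0 || count_ < len_) return false;
    const char *tail = recent_ + kMaxLen - len_;
    for (int i = 0; i < len_; i++) {
      if (std::toupper((unsigned char)tail[i]) !=
          std::toupper((unsigned char)keyword_[i])) {
        return false;
      }
    }
    return true;
  }

 private:
  char keyword_[kMaxLen + 1];
  char recent_[kMaxLen];
  int len_;
  int count_;
};
```

A fixed-size window keeps memory use constant, which matters on an ATmega328 with only 2 KB of RAM.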



Published by Michael on March 20th, 2011 at 4:04 pm. Filed under: Arduino, Level 8, Video.





28 Responses to “Decoding Closed Captioning Data”

  1. Why do you only get 5 or 6 pixels per bit? I assume that the video data is standard 640 pixels across, so the two 8-bit bytes, 3 start bits, and a couple of cycles of the sine wave should work out to around 30 pixels per bit.

    I mention this because many asynchronous serial receivers use 32x oversampling, and then they look for the rising or falling edge of the start bit to determine alignment. Once you know where the start bit lies, you should be able to predict where each bit lies. The ideal would be to look in the middle of each bit area. With ~30 pixels per bit, you could scan for the three start bits (001) and align on the rising edge of the third bit (1). From there, skip 15 pixels ahead to find the middle of the area, then jump forward 30 pixels to read each subsequent bit.

    Oops, now I realize that the LM1881 is not a video frame grabber, but just provides sync. So the A/D is not fast enough to capture more than 128 pixels per line. A faster A/D would allow 640 pixels or 720 pixels, or I suppose even 1920 pixels per line. With a faster A/D, your CC decoding would probably work better. Also, an FPGA or similar could be programmed like an async serial receiver that is specific to CC patterns, and then you could just load the 2 bytes per line directly into the CPU. You’d still need the ability to select between line 21 or the other ones (13, 14).

    Comment by rsdio on July 22, 2011 at 2:58 AM



  2. rsdio: You are correct that the limitation is speed. I’m not using the ADC; I’m using the analog comparator, which is faster than a true ADC. The real speed limitation is the clock speed of the MCU. If you look at the assembly code where I capture image data, you can see that it takes 5 clock cycles to do the work (store the analog comparator result in frame buffer, increment stuff) for each pixel. It actually takes 3 cycles and there’s a 2 cycle NOP, but the last pixel of the byte takes more time, so I’m bound by that.
    At 16MHz, this means I can only capture 128 pixels across. In short, there isn’t enough time to capture a higher resolution. There isn’t enough memory either, but that’s a whole other constraint. See my article about the Seeeduino Mega for doing higher resolution overlay.

    Comment by Michael on July 22, 2011 at 9:08 AM



  3. Hi. I was wondering is there anyway of encoding Closed Captioning Data using an Arduino? Thanks.

    Comment by Tristan on July 26, 2011 at 12:53 PM



  4. The hard part of generating a closed caption signal for display on a TV is that you need to generate a sine wave with a particular frequency, and the timing needs to be just right.

    Comment by Michael on July 27, 2011 at 11:55 AM



  5. Isn’t CEA-708 the relevant spec?

    Comment by Glen on August 17, 2011 at 12:13 AM



  6. CEA-708 seems to be the spec for digital TV signals. For analog, it’s EIA-608.
    http://en.wikipedia.org/wiki/EIA-608

    Comment by Michael on August 17, 2011 at 6:59 AM



  7. Both 608 and 708 may be in use. Sometimes one is primary language and the other is secondary language.
    And sometimes for broadcast they have to be tweaked to get them in the right place…

    Comment by kf2qd on September 15, 2011 at 7:19 PM



  8. Hi Michael,
    Was very happy to come across your page and this particular project.
    I am very interested in this process of pulling the words out of the caption,
    and would love to chat to see if we have the same ideas regarding this
    technology.
    Thanks for your time,
    Bryan Amburgey
    513.293.6788
    Los Angeles, CA

    Comment by Bryan Amburgey on October 28, 2011 at 1:24 AM



  9. Thanks for creating this! i’m using it for a kinetic sound installation using ‘f’ words.
    so far the text is really jumbled, barely coherent, but i will persevere.
    I’ll send you a video link when the project is done :)

    Comment by MMCIII on November 8, 2011 at 11:23 PM



  10. For analog, it’s 608. And 608 holds CC1, CC2, CC3, and CC4. CC3 and CC4 are in field 2, while the other two are in field 1. I think Michael is decoding only field 1. I don’t know if he’s separating out CC1 from CC2, although it is rare for a show to mix these (but the Oprah Show used to do that before it went off the air).

    Field 2 also has other potentially interesting things, like the name of the show, the name of the next show to air, station id, etc, in addition to CC3/CC4 (in what is called an XDS stream).

    For digital, the standard is 708, which has an embedded 608 stream. Few programs really fully use the 708 standard, though. You usually get away with just decoding the 608 stream.

    Comment by TVR on December 21, 2011 at 9:03 AM



  11. Hi Michael,

    I’ve found out how PAL captions work: http://www-user.tu-chemnitz.de/~heha/vt25/tele1.pdf. Most PAL countries use the standard in the PDF. The main differences are: PAL uses line 21; PAL uses 8 clock run-in cycles instead of 7; and the start bits are 11100100 rather than 001. So what changes would we have to make to your library to make it work with PAL?

    Thanks

    Comment by Jack on January 12, 2012 at 7:18 PM



  12. So... can this be used the way a regular CC decoder can? By that I mean to add subtitles to a film on laserdisc or something, right on the TV, without any modification to the code?

    Comment by retrorepair on January 15, 2012 at 8:48 PM



  13. Yes, you can use TV.print(c) to write the characters to the screen instead of writing them to the Serial line. I assume your TV doesn’t decode closed captions (that would be rare).

    Comment by Michael on January 16, 2012 at 8:23 AM



  14. WOW cool project.
    Is it also possible to generate Closed Captioning with an arduino and decode it with a second unit?

    Thx
    Andy

    Comment by Andreas Kraeuchi on March 3, 2012 at 4:05 PM



  15. No, it’s not possible to generate the required analog waveforms to create a closed captioning signal.

    Comment by Michael on March 4, 2012 at 7:17 AM



  16. Hi Michael,

    Great work! Very interesting project you’ve done.
    I got the video experimenter shield and have been testing it.

    Lately, I’m playing around with closed captions. I got it working and decoding, but I’m not able to decode latin characters like “ç” or “é”. Am I missing something? I tried reading the serial port using UTF-8 but no luck.
    Maybe the output is single byte encoding and i’m trying to read multibyte chars, not sure what could be the problem. Could you help me on this?

    Thanks
    Vítor

    Comment by Vítor on March 16, 2012 at 12:08 PM



  17. Hi there,

    I’ve just got a VE and it’s a great piece of kit.

    Does anyone know how to get Closed Caption working with PAL (UK)? At the moment I’m just getting a solid white line at the top of the screen where the CC pixels should be.

    Thanks,

    Angus

    Comment by Angus on April 2, 2012 at 5:40 PM



  18. Vitor
    Change the baud rate on the serial monitor to 57600 and it will work.

    Comment by Brad on September 23, 2012 at 5:24 PM



  19. Very nice!

    Comment by Tony on December 27, 2012 at 12:49 PM



  20. Hi, I cannot synchronize the text, so I get “Y # Y # Y $” over serial. Any tips?

    Comment by celso on February 15, 2013 at 11:46 AM



  21. Angus: I don’t know if you are still around here but… Most of the rest of the world outside North America uses Teletext (a.k.a. BBC Ceefax), which encodes far more bits into a line than EIA-608. It transmits 45 bytes of data (41 7-bit bytes after run-in and framing) in each of 17 lines from line 6 through 22. There’s probably no way to decode this at reduced resolution.

    TVR: aside from CC1/CC2 on Field 1 and CC1(3)/CC2(4) on Field 2, there’s also T1/T2 on Field 1 and T1(3)/T2(4) on Field 2. So, 4 total data channels on Field 1 and 5 on Field 2.

    Comment by Jason on March 26, 2013 at 1:32 PM



  22. Please do not comment to ask for help. See full product details or use the support forum to get help.

    Comment by Michael on April 7, 2013 at 8:09 AM



  23. Has anyone been able to find CC data in a commercially produced DVD? I am using a DVD that has the “CC” logo on it, but I can’t seem to find the signal.

    Comment by Michael on August 29, 2013 at 1:52 PM



  24. Yes, I have. You may need to try looking on different lines.

    Comment by Michael on August 29, 2013 at 4:09 PM



  25. error ,class TVout’ has no member named ‘setDataCature’
    plz halp

    Comment by zach on April 7, 2014 at 11:33 PM



  26. You need to install the TVout library properly.

    Comment by Michael on April 8, 2014 at 8:09 PM



  27. Great, I really liked it! I am looking for a way to convert a string following the EIA-608 pattern to send that signal to an encoder. Have you seen anything about it? Congratulations on your project!

    Comment by Rafael on July 15, 2014 at 7:25 AM



  28. Hello, With this is there anyway to determine when a tv show switches to commercials? It would be great to make some smart remote to skip commercials.

    Comment by Richard on August 26, 2014 at 10:52 PM


