Third Eye: API for Internet Archive TV News chyrons

How we turn TV News chyrons into data

Cable TV news channels display chyrons in the "lower third" of the screen to show breaking news and other highlights. Using the Internet Archive TV News collection, TV architect Tracey Jaquith built the Third Eye to scan the lower part of the screen and apply optical character recognition (OCR) to turn the words into text.

Third Eye captures four TV cable news channels: BBC News, CNN, Fox News, and MSNBC.
The project launched with four million chyrons captured in just over two weeks.

TV chyron "lower third" and OCR example:

| V | V AFTER WH MEETING, SCHUMER DISHES WHEN HE THOUGHT NIC WAS OFF

Filtered chyrons turned into toots

Because chyrons are created in near real time, they can sometimes include misspellings; in addition, the OCR process can return some messy text. Jaquith has adapted algorithms to find the most representative and clearest chyrons for every 60-second period. This cleaned-up feed fuels the bots that post to Mastodon, in near real time, the chyrons that are appearing on TV news screens.
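The selection algorithm itself is not described on this page. As a rough illustration only, and not the method Third Eye actually uses, one way to pick a representative chyron per 60-second window is to keep the string that is most similar to the others in that window. The function names, the (seconds, text) input shape, and the use of Python's difflib similarity score are all assumptions made for this sketch.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def most_representative(texts):
    """Return the text with the highest total similarity to all texts in the group."""
    def total_similarity(candidate):
        return sum(SequenceMatcher(None, candidate, other).ratio() for other in texts)

    # max() keeps the first candidate when several have the same score.
    return max(texts, key=total_similarity)


def filter_by_minute(entries):
    """Group (seconds, text) pairs into 60-second buckets, one text per bucket.

    Hypothetical input shape: `seconds` is an integer timestamp and `text`
    is the raw OCR string captured at that moment.
    """
    buckets = defaultdict(list)
    for seconds, text in entries:
        buckets[seconds - seconds % 60].append(text)
    return {start: most_representative(texts) for start, texts in sorted(buckets.items())}


# Made-up sample data, for illustration only.
sample = [
    (0, "SCHUMER DISHES"),
    (1, "SCHUM3R D1SHES"),
    (2, "SCHUMER DISHES"),
    (61, "WH MEETING"),
]
print(filter_by_minute(sample))
```

A garbled OCR reading tends to resemble the clean readings less than they resemble each other, so it scores lower and is dropped. Real filtering would likely also need to handle windows where every reading is noisy.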

Request TSV format here.

Most recent: hours
Most recent: days
Defaults to last 3 hours
Day value can be a float, e.g. '1.5' for the last 1.5 days



Date range:
from:
to:
Days are in GMT timezone. The default is prior full day plus today's current data.



Save results to file:
Raw format:

Two sets of data are available here. The default is the filtered toots data; its dates and times are in Pacific Standard Time. The alternative is to check the box next to 'Raw format', which returns results up to once per second with minimal filtering (and more OCR errors).

Image lookups and Video links

Image lookup

You can copy a row from our API and do an image lookup to see what we OCR-ed. We have approximately the prior six months available. Paste and press the [show image] button below.

Video links

You can use the [show video] button below to see the clip around the OCR-ed region.
Paste a line from TV Third Eye TSV API output here to return the image used for its OCR:

  • Chyrons are derived in near real time from the Internet Archive TV News collection. The constantly updating public collection contains 1.4 million TV news shows, some dating back to 2009.
  • Result times involve some approximation, built in for storage and retrieval efficiency. A request for the "most recent 3 hours" will typically return some extra data.
  • Alternatively, you can find and use the daily raw .tsv files directly (from this item).
  • Data can be affected by temporary collection outages, which typically can last minutes or hours, but rarely more. If you are concerned about a specific time gap in a feed and would like to know if it's the result of an outage, please inquire at tvnews@archive.org.
  • The "raw feed" option provides all of the OCR'ed text from chyrons at the rate of approximately one entry per second. The "filtered toots" download provides the data feed that fuels our Mastodon bots; this has been filtered to find the most representative, clearest chyrons from a 60-second period. The filtered feed relies on algorithms that are a work in progress; we invite you to share your ideas on how to effectively filter the noise from the raw data.
  • Dates/times are in UTC (Coordinated Universal Time) in API feeds, and in PST (Pacific Standard Time) in toots.
  • Because the size of the raw data is so large (about 20 megabytes per day), we limit results to seven days per request.
  • We began collecting raw data on August 25, 2017; the filtered feed begins on September 7, 2017.
  • To open a TSV file in a program such as Google Sheets or Excel, first download the results as a text file: check the box next to "Save results to file" to save the file on your computer. Then use the program's import function to pull in the text file, using "tab" as your delimiter.
  • The "Duration" column is in seconds: the amount of time that particular chyron appeared on the screen.
  • To view clips in context on the Internet Archive TV News, paste
    https://archive.org/details/
    before the field that begins with a channel name.
    For example:
    https://archive.org/details/
       +
    MSNBCW_20170914_220000_The_Beat_With_Ari_Melber/start/3385
       =
    https://archive.org/details/MSNBCW_20170914_220000_The_Beat_With_Ari_Melber/start/3385
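Two of the notes above, the clip link construction and the seven-day limit on raw requests, can be sketched as small Python helpers. This is a minimal sketch under stated assumptions: the function names are made up for illustration, the pattern used to recognize the clip field is inferred from the single example above, the sample row's column order and values are hypothetical rather than real API output, and treating the limit as seven calendar days inclusive is a guess.

```python
import re
from datetime import date, timedelta

DETAILS_PREFIX = "https://archive.org/details/"
MAX_DAYS_PER_REQUEST = 7  # limit stated in the notes above

# Assumption: the clip field looks like CHANNEL_YYYYMMDD_HHMMSS_Title/start/SECONDS,
# as in the MSNBCW example above.
CLIP_FIELD = re.compile(r"[A-Z0-9]+_\d{8}_\d{6}_\S*/start/\d+")


def clip_url_from_row(tsv_line):
    """Return the archive.org clip URL for one tab-separated row, or None."""
    for field in tsv_line.rstrip("\n").split("\t"):
        if CLIP_FIELD.fullmatch(field.strip()):
            return DETAILS_PREFIX + field.strip()
    return None


def date_windows(first_day, last_day, max_days=MAX_DAYS_PER_REQUEST):
    """Split an inclusive date range into consecutive windows of at most max_days days."""
    windows = []
    start = first_day
    while start <= last_day:
        end = min(start + timedelta(days=max_days - 1), last_day)
        windows.append((start, end))
        start = end + timedelta(days=1)
    return windows


# Hypothetical row: only the clip field is taken from the example above.
row = "2017-09-14 22:56:25\tMSNBCW\t15\tMSNBCW_20170914_220000_The_Beat_With_Ari_Melber/start/3385\tSAMPLE TEXT"
print(clip_url_from_row(row))

# Two weeks of raw data would need two requests under the seven-day limit.
for start, end in date_windows(date(2017, 9, 7), date(2017, 9, 20)):
    print(start.isoformat(), "to", end.isoformat())
```

Matching the clip field by pattern rather than by column position avoids depending on a column order that this page does not spell out.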