July 7, 2013

New insights into the PRISM program

(Updated: January 21, 2016)

Last Saturday, June 29, The Washington Post unexpectedly disclosed four new slides from the PowerPoint presentation about the PRISM data collection program.

This disclosure came as a surprise: Guardian journalist Glenn Greenwald had earlier said that no more slides would be published, because they contain very specific technical NSA collection methods, for which The Guardian would probably be prosecuted.

That The Washington Post has now disclosed them is even more surprising, not only because it's an American paper, but also because Edward Snowden reportedly first went to The Post, asking it to publish all 41 slides of the PRISM presentation. The Washington Post refused to do so, and Snowden therefore gave the scoop to The Guardian, which published the first four slides.

It's not clear who exactly released the four new slides, whether it was Snowden himself or editors of The Washington Post, or what the reason was for doing so. Although these new slides show some of the same oddities we already saw in the first series, they have very specific and detailed content. This makes them look far more genuine and, more importantly, shows much better how PRISM actually works.

We now learn that PRISM is not one single technical system or computer application, but a data collection effort that combines a number of different tools, computer systems and databases, some existing, some perhaps new. This also means that the PRISM program is not the same thing as the Planning tool for Resource Integration, Synchronization and Management (PRISM), a theory that was examined in our previous posting.

> The latest information: What is known about NSA's PRISM program

The PRISM tasking process

In this first new slide (below) we see details of the PRISM Tasking Process, which is how instructions for gathering the requested data are sent and reviewed. This process starts with an NSA analyst typing one or more search terms, or "selectors" as NSA calls them, into the Unified Targeting Tool (UTT). Selectors may refer to people (by name, e-mail address, phone number or some other digital signature), organizations or subjects such as terrorism or uranium related terms.

Along with the selectors, the analyst must fill out an electronic form that specifies the foreign-intelligence purpose of the search and the basis for the analyst’s reasonable belief that the search will not return results for US citizens or foreign nationals who are within the US at the time of data collection.

The slide shows that it's possible to search existing communications that are already stored ("Stored Comms") and also to initiate a search for new, future communications of selected targets. The latter option is called "Surveillance", which a number of media erroneously interpreted as the possibility of real-time monitoring of, for example, an internet chat.

Every request made by a target analyst must be approved twice. For new surveillance requests, an FAA Adjudicator (S2) does the first review and validation of the target. The slide says that there are such adjudicators in every so-called Product Line, the NSA departments for specific issues like counterterrorism and counter-proliferation. A second and final review of the analyst's determination is done by NSA unit S343 for Targeting and Mission Management, which then releases the tasking request through the Unified Targeting Tool. A computer system called PRINTAURA then apparently distributes the requests to the different collection sites.

For searching stored communications, the first check is done by the Special FISA Oversight and Processing unit (SV4). According to The Washington Post this seems to refer to the federal judges of the secret Foreign Intelligence Surveillance Court (FISC), but according to national security reporter Marc Ambinder, the "FISA Oversight and Processing" is an internal NSA unit. The second and final review is once again done by unit S343 for Targeting and Mission Management. After the request is released to PRINTAURA, the Electronic Communications Surveillance Unit (ECSU) of the FBI checks against its own database to filter out known Americans.
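The two review paths described above can be summarized in a small sketch. This is a toy Python model for illustration only: the unit names come from the slide, but the function and the request-type labels are my own assumptions, not actual NSA software.

```python
# Toy model of the two-stage PRISM tasking review described above.
# Unit names are taken from the slide; everything else is illustrative.
def review_chain(request_type: str) -> list[str]:
    """Return the sequence of reviewers/systems a tasking request passes."""
    if request_type == "surveillance":
        first = "FAA Adjudicator (S2, within the Product Line)"
    elif request_type == "stored_comms":
        first = "Special FISA Oversight and Processing (SV4)"
    else:
        raise ValueError("unknown request type: " + request_type)
    chain = [first, "Targeting and Mission Management (S343)", "PRINTAURA"]
    if request_type == "stored_comms":
        # After release, the FBI's ECSU filters out known Americans
        chain.append("FBI ECSU database check")
    return chain
```

Note how both paths converge on S343 and PRINTAURA; only the first reviewer and the final FBI check differ.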

Different tasking tools

In another source the Unified Targeting Tool (UTT) is described as a DNR tasking tool, which means it's a software program used to send so-called tasking instructions to dedicated devices, telling them which data should be collected. As DNR stands for Dial Number Recognition, this sounds as if the targeting tool is aimed at finding out who is behind a certain phone number, but as NSA sources often mention DNR alongside DNI (Digital Network Intelligence, or internet content), it seems DNR stands for information derived from telephone networks in general.

According to one of the earlier slides, NSA analysts should also use other sources, like data that can be gathered through access points that tap into the internet's main gateway switches ("Upstream collection"). This is done through collection programs codenamed FAIRVIEW, STORMBREW, BLARNEY and OAKSTAR. Although by its name the Unified Targeting Tool (UTT) seems to be of a generic nature, it's not clear whether it can also be used for tasking these other sources, or whether they need other tasking tools. According to the book "Der NSA Komplex", the UTT replaced the OCTAVE telephony tasking system in 2011.*

Screenshot of the Unified Targeting Tool (UTT) showing how to select a "Foreigness Factor"
(note the URL in the address bar starting with "gamut")

From a number of job descriptions we learn that the Unified Targeting Tool is often mentioned in connection with GAMUT and sometimes also CADENCE. We see this written as "GAMUT-UNIFIED TARGETING TOOL", "GAMUT/UTT" or "CADENCE/UTT". Both GAMUT and CADENCE are nicknames for what is said to be a "collection mission system for tasking" and probably refer to databases that store the tasking requests from the Unified Targeting Tool.

An interesting coincidence is that the word gamut means the range of colors that can be reproduced by a certain technique, just as a prism can break light up into its constituent spectral colors.

More important is that the new slide shows that for PRISM the Unified Targeting Tool (UTT) is used for tasking, which means that the PRISM program is different from the Planning tool for Resource Integration, Synchronization and Management (PRISM), which is itself a tasking tool. Before the new slides were released, The Guardian and The Washington Post failed to explain whether PRISM was a single application or a project-like program.

Infographic comparing the PRISM data collection program and the PRISM planning tool
(click for a bigger picture)

Now that we know the PRISM planning tool isn't the application used for tasking the data collection from the internet companies, it's also clear that the PRISM planning tool is used primarily for requesting information needed for military operations, and therefore tasks various intelligence sources deployed to those operations. By contrast, the Unified Targeting Tool used under the PRISM program is for requesting information at the national level.

The actual data collection

The actual collecting of the internet data under the PRISM program is not done by the NSA, but by the Data Intercept Technology Unit (DITU) of the FBI. This makes sense, as the FBI is the agency which is primarily responsible for investigating US companies and citizens.

One source suggests that the Data Intercept Technology Unit was set up in 2011 or 2012 to monitor new and emerging technology with court-authorized intercepts, but this source (pdf) says that it already existed in 1997. There's a challenge coin of DITU (right) dating from after 9/11, as it shows pictures of the World Trade Center and the Pentagon.

In its comments on this slide, The Washington Post says this FBI "interception unit [is] on the premises of private companies", which isn't the case, as DITU is an FBI unit based at Quantico, Virginia. It may have equipment installed at the sites of the internet companies, but no evidence of that is presented, leading one author to question whether such equipment exists at all.

Initially the DITU managed the FBI's internet monitoring programs Omnivore and Carnivore, tapping into the internet at ISP locations. The raw data were decoded using the Packeteer and Coolminer tools, as can be read in this document (pdf) from 2010, but according to the PRISM reporting, the unit can now also order data from companies like Google, Yahoo, Microsoft, Apple and others directly.

A new report by Declan McCullagh says that internet companies don't want the FBI to install listening devices on their networks. To prevent that, they are willing to cooperate with the FBI by adding their own monitoring capabilities to their network and server equipment, which makes it easier for them to comply with government information requests. This would mean that there's no need for dedicated FBI data collection devices at the companies' premises.

Earlier, Google said that when it receives a valid FISA court order, it delivers the information to the US government through secure FTP transfers or in person. Another option is doing this by using an encrypted dropbox, where an internet company can drop the requested data. Facebook and Microsoft said they will only hand over data or information about specific individuals upon receiving a legally binding order or subpoena.

Depending on the company, a PRISM tasking may return e-mails, attachments, address books, calendars, files stored in the cloud, text or audio or video chats, and metadata that identify the locations, devices used and other information about a target. After collection, the FBI's Data Intercept Technology Unit passes this information to one or more so-called customers at the NSA, the CIA or the FBI itself.

Storage of collected PRISM data

A second slide (below) shows how collected data flow into the various NSA servers. It's the Data Intercept Technology Unit (DITU) of the FBI which collects raw data from the internet companies and sends them to the NSA. At the NSA, the data first go to a system called PRINTAURA, which, according to The Washington Post, automates the traffic flow.

As PRINTAURA also distributes the tasking requests, it seems this system is the technical heart of the PRISM program, which may also be indicated by the fact that both nicknames/codewords start with the same three letters. As we learn from the slide, PRINTAURA is managed by NSA unit S3532.

All NSA offices, operations, units and cells have their own designation, consisting of a letter, followed by some numbers. We remember that the first slide of the PRISM presentation has a line which says "[...] PRISM Collection Manager, S35333". This means the author of the slides was a collection manager attached to unit S35333, which, just like the PRINTAURA unit S3532, is part of the S35 or Special Source Operations (SSO) division according to this NSA orgchart.

From PRINTAURA, data go to a database called TRAFFICTHIEF, which according to this article was set up as part of the TURBULENCE program to detect threats in cyberspace. From a slide about the XKeyscore tool, published by The Guardian on July 29, we learn that TRAFFICTHIEF is a database for metadata about specifically selected e-mail addresses.

Data to be processed are sent to a system called SCISSORS, which is managed by unit T132, and from there on to unit S3132 for "Protocol Exploitation". This unit processes something that is blacked out - probably the specific classified codeword used for these internet data.

This processing sorts the data into different types and protocols and dispatches them to the various NSA databases for storage. But before that, metadata and voice content have to pass through FALLOUT and CONVEYANCE. According to The Washington Post, these systems appear to be a final layer of filtering to reduce the intake of information about Americans, but an internal NSA document describes FALLOUT as a "DNI ingest processor". All other data pass through the SCISSORS system once again.

Finally, the collected data are stored in the following databases:
- MARINA: for internet metadata
- MAINWAY: for telephone and internet metadata contact chaining
- NUCLEON: for voice content
- PINWALE: contrary to what many other media say, this database is not only for video content, but also for "FAA partitions" and "DNI content". DNI stands for Digital Network Intelligence, which is intelligence derived from digital networks, or simply: internet content, like forum postings and e-mail and chat messages. The word PINWALE is often combined with the abbreviation UIS, which stands for User Interface Services, apparently an interface tool for accessing and searching databases.
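This final storage step can be summarized as a simple routing table. The mapping below restates the list above; the Python representation itself is purely illustrative and not actual NSA code.

```python
# Illustrative routing table: which NSA database stores which kind of
# processed PRISM data, per the dataflow slide as described in the text.
PRISM_STORAGE = {
    "internet metadata": "MARINA",
    "telephone/internet contact chaining metadata": "MAINWAY",
    "voice content": "NUCLEON",
    "DNI content (e-mail, chat, forum postings)": "PINWALE",
}

def storage_database(data_type: str) -> str:
    """Look up the database a given type of processed data is routed to."""
    return PRISM_STORAGE[data_type]
```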

Analysing collected data

There are no slides available saying what happens with these data after being stored, but The Washington Post says that "After processing, [collected data] are automatically sent to the analyst who made the original tasking. The time elapsed from tasking to response is thought to range from minutes to hours." A senior intelligence official would say only: "Much though we might wish otherwise, the latency is not zero."

At the moment it's not clear which tool or application is used to analyse the data gathered from the US internet companies. National security reporter Marc Ambinder says that PRISM itself might be "a kick-ass GUI [graphic user interface] that allows an analyst to look at, collate, monitor, and cross-check different data types". However, until now there's no evidence for PRISM being such a tool for analysis.

Most tools used by NSA employees are listed in job descriptions and the PRISM we see there is always the Planning tool for Resource Integration, Synchronization and Management, that we talked about in our previous posting.

Therefore, it's likely that data gathered under the PRISM program are analysed using other common NSA analysing tools, like the XKEYSCORE indexing and analysing tool, which The Guardian erroneously presented as a collection program, or a more specific tool called DNI Presenter, which is used to read the content of stored e-mails and chats or private messages from Facebook and other social networks.

Based upon what such analysis presents, NSA analysts use other tools, like CPE (Content Preparation Environment), to write a report. Such reports are then stored in databases for finished NSA intelligence products, like ANCHORY. Finally, these intelligence reports are available to end users through the Top Secret section of INTELINK, which is the intranet of the US intelligence community.

PRISM case notations

A third slide (below) shows how each target gets a unique PRISM case notation and what the components of these notations are.

Abbreviations: IM = Instant Messaging; RTN-EDC = Real Time Notification-Electronic Data Communication(?);
RTN-IM = Real Time Notification-Instant Messaging; OSN = Online Social Networking; CASN = Case Notation

The first position is the designation for each of the providers from which internet data are collected. Some people noticed that the numbers jump from P8 for AOL to PA for Apple, but it has been suggested that P9 was perhaps assigned to a company that dropped out, and that the numbers may be hexadecimal, so the next provider would be PB, followed by PC, etc., as B = 11, C = 12, etc.
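If the hexadecimal reading is correct, the sequence of designators can be generated with a one-liner. A minimal sketch, assuming the designator is simply "P" followed by the provider number in uppercase hexadecimal (the function name and numbering scheme are mine, not from the slides):

```python
def provider_designator(n: int) -> str:
    """Return the hypothetical case-notation prefix for provider n (1-based)."""
    return "P" + format(n, "X")  # uppercase hexadecimal digit(s)

# Under this assumption, provider 8 (AOL) is P8 and provider 10 (Apple)
# is PA, with PB and PC available for future providers:
print([provider_designator(n) for n in (8, 9, 10, 11, 12)])
# → ['P8', 'P9', 'PA', 'PB', 'PC']
```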

The next position of the case notation is a single letter designating the content type, like e-mail and chat messages or social network postings, but also so-called real-time notifications (RTN) for e-mail and chat events. The Washington Post and other media apparently misinterpreted this by saying that NSA officials "may receive live notifications when a target logs on or sends an e-mail, or may monitor a voice, text or voice chat as it happens".

(Update: compare this to the data analysing tool TAC, which is used by the Defense Intelligence Agency and offers "real-time analysis of data" by alerting "analysts immediately when fresh intelligence is detected".)

In the slide, the real-time notifications are clearly listed under "Content Type", and most of us will know them as the messages you get when someone logs in to an internet chatroom or instant messenger, or when you receive an e-mail through an e-mail client. These notification messages are also available to NSA analysts, but only after being collected and stored, just like all other types of internet content.
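Putting the two described components together, a case notation could be decomposed as follows. This is a hedged sketch: only the provider designator and the content-type letter are documented in the slide, so the example string and the treatment of any remaining characters are assumptions.

```python
import re

def parse_case_notation(casn: str) -> tuple[str, str, str]:
    """Split a case notation into (provider, content type, remainder).

    Assumes the layout described in the slide: 'P' plus one hexadecimal
    digit for the provider, then a single letter for the content type.
    Whatever follows is returned unparsed, since the slides don't detail
    the remaining fields of the notation.
    """
    m = re.fullmatch(r"(P[0-9A-F])([A-Z])(.*)", casn)
    if m is None:
        raise ValueError("not a recognizable case notation: " + casn)
    return m.group(1), m.group(2), m.group(3)

# 'PA' would be Apple; the content-type letter and trailing digits here
# are made up for the example:
print(parse_case_notation("PAC123456"))  # → ('PA', 'C', '123456')
```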

Searching the collected data

The fourth new slide (below) is presented by The Washington Post as being about "Searching the PRISM database", but as we just learned from the dataflow slide, there is no single PRISM-database. Data collected from the internet companies go into separate databases, according to the type of data. Some of these databases already existed before the PRISM program was started in 2007.

The content of the slide shows a screenshot of a web based application called REPRISMFISA, which is probably accessible through the web address which is blacked out by the Post. Unfortunately there's no further explanation of what application we see here, but if we look at the word REPRISMFISA we can imagine the application is for going "back to data collected under the PRISM program according to the Foreign Intelligence Surveillance Act (FISA)". Remember also that in one of the earlier slides it's said: "Complete list and details on PRISM web page: Go PRISMFAA".

Above the olive green bar, there is a line saying: "DYNAMIC PAGE - HIGHEST POSSIBLE CLASSIFICATION IS TOP SECRET // [blacked out] / SI / TK // ORCON // NOFORN". This means that, depending on the generated content of the page, it has to be classified TOP SECRET, with additionally one or more of the following Sensitive Compartmented Information control systems:
- TALENT KEYHOLE (TK - for data collected by space-based collection platforms)
- Special Intelligence (SI - for data from communications intercepts)
- an undisclosed control system marked by a classified codeword, which is blacked out by The Washington Post. Probably this is the codeword used for information which is based upon data derived from the internet companies. As said earlier, "PRISM" is not a codeword used for content, but rather the (unclassified) nickname of the program for collecting certain internet data.
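The structure of such a banner line can be illustrated with a small parser. A minimal sketch, assuming the simplified canonical form "LEVEL//SCI1/SCI2//DISSEM1/DISSEM2"; note that the banner on the actual page uses slightly different spacing and separators, and that one control system is blacked out:

```python
def split_banner(banner: str) -> tuple[str, list[str], list[str]]:
    """Split a classification banner into its three segments:
    classification level, SCI control systems, dissemination controls."""
    parts = [p.strip() for p in banner.split("//")]
    level = parts[0]
    sci = [s.strip() for s in parts[1].split("/")] if len(parts) > 1 else []
    dissem = [d.strip() for d in parts[2].split("/")] if len(parts) > 2 else []
    return level, sci, dissem

print(split_banner("TOP SECRET//SI/TK//ORCON/NOFORN"))
# → ('TOP SECRET', ['SI', 'TK'], ['ORCON', 'NOFORN'])
```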

In the center of the page there are three icons, which can be clicked: PRISM, FBI FISA and DOJ FISA. This seems to confirm that this application is used to search data collected under the Foreign Intelligence Surveillance Act (FISA), specified for use by NSA, FBI and the Department of Justice (DOJ).

Below these icons there is a search field for retrieving a partial list of records. The search options seem rather limited, as only two keywords can be entered, with an additional "and/or" option. At the left there's a column presenting a number of options for showing totals of PRISM entries. For checking the record status, one can click the following options:
- See Entire List (Current)
- See Entire List (Expired)
- See Entire List (Current and Expired)
- See NSA List
- See New Records
- Ownership count

Below this list, the text says: "If the total count is much less than this, REPRISMFISA is having issues, E-MAIL the REPRISMFISA HELP DESK AT [address blacked out] AND INFORM THEM"

The numbers below that text are hardly readable, but The Washington Post says that on "April 5, according to this slide, there were 117,675 active surveillance targets in PRISM's counterterrorism database". This sounds like a huge number, but without further details about these targets it's almost impossible to form a meaningful opinion about it.

(Updated with minor additions and corrections based upon recently disclosed documents)

Links and Sources

- ForeignPolicy.com: Meet the Spies Doing the NSA's Dirty Work
- TheWeek.com: Solving the mystery of PRISM
- ForeignPolicy.com: Evil in a Haystack
- WashingtonPost.com: Inner workings of a top-secret spy program
- TechDirt.com: Newly Leaked NSA Slides On PRISM Add To Confusion, Rather Than Clear It Up
- Technovia.co.uk: Something doesn’t add up in the latest Washington Post PRISM story
- VanityFair.com: PRISM Isn’t Data Mining and Other Falsehoods in the N.S.A. “Scandal”
- CNet.com: FBI: We need wiretap-ready Web sites - now (2012)
- CNet.com: How the U.S. forces Net firms to cooperate on surveillance


barton gellman said...

Hi, thanks for the honor of a careful reading. You have a few things mixed up, and in general, if I may say so, you're mistaken when you say my descriptions in the Washington Post are incorrect. Keep in mind that I have many interviews and much supporting information behind the summary captions I wrote for the slides. You do raise one specific question I wanted to answer: it was the executive editor, at my recommendation, who decided to publish the additional slides. Snowden made no request on that, one way or the other, after our first story. I suggested posting new slides because I had had additional time to understand their nuances and security implications, and I thought they would help answer outstanding questions. --Barton Gellman

P/K said...

Thank you for your response and also for recommending to publish the additional slides. They are very helpful to get a better idea of what PRISM actually is. Of course I am limited to a critical view at what is published, therefore I hope that more slides or more detailed information about the PRISM program can be published in the future, so hopefully the still many questions about it can be answered and we can better see what NSA is actually doing. - Peter Koop

Anonymous said...

Thanks for the blog
