Electrospaces.net: Metadata


July 14, 2017

Dutch report provides metadata numbers to compare with Snowden documents

(Updated: September 30, 2019)

Since the Snowden revelations, we know that signals intelligence agencies are trying to acquire large sets of telephone metadata in order to analyse them in support of protecting their national security.

Less known is that commercial companies also analyse similar big data sets, albeit for research purposes and with personal information being anonymized.

Now, a research report from the Netherlands provides us with actual numbers of mobile telephone metadata, which can be used to compare with the numbers that NSA and GCHQ collected according to the Snowden documents.

Tourist movements

Recently published was a report about visitor movements in and around the Dutch capital of Amsterdam. It was prepared by the economic research company Decisio on behalf of the province of Noord-Holland and the municipalities of Amsterdam and Zandvoort.

For a few years now, Amsterdam has been struggling with a huge increase in tourists, but it was difficult to get detailed insight into where they come from, where they stay, and which areas of the city and which surrounding towns are most popular.

Now, these insights became available by using information from the "tracking device" carried by almost every individual: the mobile phone.

Anonymized data sets

Decisio acquired a huge set of mobile telephone metadata from Vodafone, the second-largest provider in the Netherlands, with over 5 million customers. When these customers use their mobile phones, the phones connect to one of 32,000 cell towers or base stations, which associate the phone number with a location.

Each month, Vodafone provides these data to another research company called Mezuro, which processes and analyses them to map the movements using a grid of 1250 regions containing multiple base station cells. The results were then analysed by Decisio and compared with other information sources.

But before that, the Vodafone metadata were anonymized by replacing every phone number with a random number that changes every month. Foreign phone numbers were replaced daily. Also, only the movements of groups of more than 15 numbers were reported, so it's impossible to track the movements of individual phone users.
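The anonymization steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Mezuro's actual method: instead of a literal random number, it uses a keyed hash (HMAC-SHA256) with a random secret key that is generated per period (per month for Dutch numbers, per day for foreign ones), it assumes Dutch numbers can be recognized by the `+31` prefix, and it suppresses every movement group of 15 phones or fewer. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-period secret keys. A fresh random key per period means
# pseudonyms from different periods cannot be linked once the key is discarded.
_period_keys: dict[str, bytes] = {}


def _key_for(period: str) -> bytes:
    if period not in _period_keys:
        _period_keys[period] = secrets.token_bytes(32)
    return _period_keys[period]


def pseudonymize(number: str, day: str) -> str:
    """Replace a phone number by a pseudonym that changes monthly for Dutch
    numbers and daily for foreign ones. `day` is an ISO date 'YYYY-MM-DD'."""
    is_foreign = not number.startswith("+31")  # assumption: Dutch numbers start with +31
    period = day if is_foreign else day[:7]    # 'YYYY-MM-DD' vs. 'YYYY-MM'
    digest = hmac.new(_key_for(period), number.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


MIN_GROUP_SIZE = 15  # only groups of MORE than 15 phones are reported


def report_flows(moves: list[tuple[str, str, str, str]]) -> dict[tuple[str, str], int]:
    """Count distinct pseudonymous phones per (origin, destination) region pair.
    `moves` holds (number, day, origin_region, destination_region) tuples."""
    phones_per_flow: dict[tuple[str, str], set[str]] = {}
    for number, day, origin, dest in moves:
        phones_per_flow.setdefault((origin, dest), set()).add(pseudonymize(number, day))
    # Suppress small groups so that individual users cannot be singled out
    return {flow: len(p) for flow, p in phones_per_flow.items() if len(p) > MIN_GROUP_SIZE}
```

A keyed hash is used here only because it gives a stable pseudonym within a period without storing a lookup table; a per-period random mapping table would serve the same purpose.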

Development of the average number of daily mobile phone transactions
from Vodafone users between January 2013 and September 2015
(source: Decisio research report)

Mobile phone metadata

The most interesting part of the report is the detail about the telephone metadata: Mezuro periodically receives information about some 3 million Vodafone phone numbers that are active on a daily basis. These phones generated 400 million "transactions", or Call Detail Records (CDRs), a day.

These transactions are the moments when a mobile phone connects to a cell tower, not only for a phone call or a text message (SMS), but increasingly often for a social media post, sending or receiving an e-mail, a Google search or checking a website; for Dutch users, this happens on average 100 times a day. Besides these communication transactions, mobile phones also connect to cell towers without transmitting any kind of message; these non-communication transactions are only stored for up to a few hours.

An article from October 2013 about Mezuro says that the company analyzed some 150 million data points daily and that an average smartphone connects 150 to 200 times a day with a cell tower.

This number was confirmed during a parliamentary hearing in Germany, when someone from the BND explained that one cell phone generates between 100 and 200 metadata and business records a day.

Update: similar numbers come from the Russian billing (and interception) company Peter-Service, which supplies billing systems capable of handling 18 million subscribers generating 2.851 billion transactions a day, which equals 158 transactions per subscriber per day.
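As a quick consistency check, the per-device rates implied by these totals can be computed directly. Note that the Vodafone figure divides the daily records by the roughly 3 million phones active each day, whereas the Peter-Service figure describes system capacity rather than measured traffic.

```python
# Reported totals, taken from the sources cited above: (transactions per day, devices)
sources = {
    "Vodafone NL (Mezuro)": (400_000_000, 3_000_000),
    "Peter-Service capacity": (2_851_000_000, 18_000_000),
}


def per_device_per_day(transactions_per_day: int, devices: int) -> float:
    """Average number of metadata records one device generates per day."""
    return transactions_per_day / devices


for name, (tx, devices) in sources.items():
    print(f"{name}: {per_device_per_day(tx, devices):.0f} transactions per device per day")
# Vodafone NL (Mezuro): 133 transactions per device per day
# Peter-Service capacity: 158 transactions per device per day
```

Both results fall within the range of 100 to 200 records per phone per day mentioned by Mezuro and the BND.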

If we treat these metadata as the rows of a (database) table, each row contains multiple fields, corresponding to columns for pieces of information like the calling number, the called number, the date, the time, the cell tower location, and the information needed to transfer various types of messages.
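Such a table row could be modelled as follows. The field names and types are assumptions for illustration only; real CDR formats differ per operator and contain many more technical fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CallDetailRecord:
    """One row of a hypothetical CDR table (illustrative field names)."""
    calling_number: str
    called_number: Optional[str]  # None for data sessions without a called party
    timestamp: datetime           # date and time of the transaction
    cell_id: str                  # cell tower / base station identifier
    event_type: str               # e.g. "voice", "sms" or "data"
    duration_s: int = 0           # seconds; 0 for events without a duration


row = CallDetailRecord("+31611111111", "+31622222222",
                       datetime(2013, 1, 1, 12, 30), "cell-0042", "voice", 95)
print(row.timestamp.date(), row.cell_id)  # 2013-01-01 cell-0042
```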

Multiplying

For the tourism research, the Vodafone data were scaled up in order to get the numbers for the full population. The multiplier changes daily depending on the day of the week, holidays, etc., but is roughly 5 (for foreign visitors this number is much more difficult to calculate).

As this total also includes people who don't use a mobile phone, the multiplier for the total number of metadata must be lower. According to the report, the users of the Vodafone network account for 1/3 of all mobile phone users in the Netherlands, so here we can use 3 as multiplier.

This means that in 2016, all Dutch mobile phones generated some 1.2 billion transactions a day. In a month that's over 36 billion, and in a year over 430 billion telephone metadata records.

For a comparison with the numbers from the Snowden documents, we have to look at the numbers from early 2013. The chart from the report shows that in January 2013, there were ca. 85 million transactions by Vodafone users a day, which makes 255 million for all Dutch users. In a month that's 7.65 billion.
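The scaling steps above amount to a few multiplications. This sketch assumes, like the text, a flat multiplier of 3 (Vodafone users being about one third of all Dutch mobile users) and a 30-day month; the report's own daily multipliers vary.

```python
VODAFONE_MULTIPLIER = 3  # Vodafone users are ~1/3 of all Dutch mobile phone users


def national_daily(vodafone_daily: int, multiplier: int = VODAFONE_MULTIPLIER) -> int:
    """Scale Vodafone's daily transaction count up to all Dutch mobile phones."""
    return vodafone_daily * multiplier


def monthly(daily: int, days: int = 30) -> int:
    """Approximate a monthly total from a daily average."""
    return daily * days


jan_2013 = national_daily(85_000_000)
print(jan_2013, monthly(jan_2013))  # 255000000 7650000000
y_2016 = national_daily(400_000_000)
print(y_2016, monthly(y_2016))      # 1200000000 36000000000
```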

Numbers from BOUNDLESSINFORMANT

Now, let's take a look at some of the numbers from the Snowden revelations. For the Netherlands there was a chart from the NSA tool BOUNDLESSINFORMANT, which shows 1.8 million telephone metadata records for 30 days around January 1, 2013.

Initially it was thought that these were Dutch data sucked up by NSA, but later it came out that they had actually been collected by Dutch military intelligence, ~~most likely in Afghanistan~~, and subsequently shared with the Americans.

Now that we know that in the same period of time, Dutch mobile phone users alone accounted for over 7 billion metadata records, 1.8 million is a tiny number, perhaps generated by no more than 2,500 smartphones. In Afghanistan, old-fashioned cell phones may have created fewer transactions, so the 1.8 million metadata records could have been the traffic captured from a small town.
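The estimate of a few thousand phones can be reproduced by dividing the record count by a per-phone rate. This sketch assumes an early-2013 rate of about 28 records per phone per day (85 million Vodafone transactions divided by roughly 3 million active phones, assuming the number of active phones in January 2013 was similar to the figure in the report) and, for comparison, the later rate of about 133 per day.

```python
def implied_devices(records: int, per_device_per_day: float, days: int = 30) -> float:
    """Number of devices that would generate `records` records over `days` days."""
    return records / (per_device_per_day * days)


rate_2013 = 85_000_000 / 3_000_000   # ~28 records per phone per day (assumed 3M phones)
rate_2016 = 400_000_000 / 3_000_000  # ~133 records per phone per day

print(round(implied_devices(1_800_000, rate_2013)))  # roughly 2,100 phones
print(round(implied_devices(1_800_000, rate_2016)))  # roughly 450 phones
```

With the early-2013 rate, the result stays below the 2,500 smartphones mentioned above; at the higher later rate, even fewer phones would suffice.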

Update: On Twitter, a Dutch journalist involved with the Snowden revelations said that the 1.8 million records represent some 12 million pieces of metadata (which means one record consists of at least 6 fields) and that the Dutch Ministry of Defence had confirmed that they were collected from Somalia.

The BOUNDLESSINFORMANT chart for the Netherlands with data
collected from December 10, 2012 to January 8, 2013

In late 2013, major European newspapers published similar charts for other countries too, again claiming that they showed how many phone calls NSA was intercepting. But even if those claims were true, the 70 million records that the BOUNDLESSINFORMANT charts presented for France, 60 million for Spain, 45 million for Italy and 33 million for Norway are tiny numbers compared to the actual 7.65 billion metadata records a month for a small country like the Netherlands.

Even the 552 million metadata records in the chart for Germany don't come close. If the Netherlands, with some 16 million people, generated 7.65 billion mobile phone metadata records a month, then for 80 million German citizens that number would be over 38 billion.
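The German estimate is a simple per-capita scaling. The sketch below rests on the rough assumption that people in Germany generate mobile phone metadata at the same per-capita rate as in the Netherlands, and then expresses the 552 million records from the chart as a share of that estimate.

```python
NL_MONTHLY = 7_650_000_000  # Dutch mobile phone metadata records per month (Jan 2013)
NL_POPULATION = 16_000_000


def expected_monthly(population: int) -> float:
    """Scale the Dutch monthly total to another country by population."""
    return NL_MONTHLY / NL_POPULATION * population


germany = expected_monthly(80_000_000)
share = 552_000_000 / germany  # chart figure as a share of the estimate
print(f"{germany / 1e9:.2f} billion; chart shows {share:.1%}")
# 38.25 billion; chart shows 1.4%
```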

And to be clear: the data represented in these specific BOUNDLESSINFORMANT charts were not collected by NSA in Europe, but shared with NSA by European intelligence agencies, as part of their military cooperation in various crisis zones.

NSA and GCHQ totals

Finally, we can look at how much telephony metadata NSA and GCHQ collect in total and compare that with the numbers from the Netherlands. In 2012, the British GCHQ "was handling 600m "telephone events" each day", according to Snowden documents seen by The Guardian.

This seems a surprisingly small number compared to the 255 million daily transactions generated by Dutch users, but it's possible that the 600 million applies only to traditional telephone and SMS metadata, excluding the internet data from smartphones.

The NSA collected a total of 135 billion telephone metadata records a month during the first half of 2012. This is some 17 times the amount for the Netherlands as a whole, which again is not a very excessive number: it would roughly equal the telephone metadata of around 300 million people, which is more or less the population of the Middle East.

Update: It should be noted that the NSA's total of 135 billion telephone metadata records includes both landline and cell phone metadata, whereas the 7.65 billion records for the Netherlands are only those generated by mobile phones.

The volumes of NSA metadata collection between January and June 2012
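The comparison with the NSA total works the same way. Because the NSA figure includes landline records while the Dutch figure covers mobile phones only, the equivalent population computed in this sketch tends to overstate the number of people involved and should be read as a rough upper bound.

```python
NSA_MONTHLY = 135_000_000_000  # first half of 2012, landline and mobile
NL_MONTHLY = 7_650_000_000     # Dutch mobile phones only, January 2013
NL_POPULATION = 16_000_000

ratio = NSA_MONTHLY / NL_MONTHLY                # how many "Netherlands" the total equals
equivalent_population = ratio * NL_POPULATION   # people at the Dutch per-capita rate
print(f"{ratio:.1f}x, about {equivalent_population / 1e6:.0f} million people")
# 17.6x, about 282 million people
```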

Conclusion

During the Snowden revelations, the press was eager to present numbers about NSA and GCHQ data collection that seemed impressively high. But not a single media outlet took the time or effort to come up with the total numbers of telephone and internet communications needed to put them in the right perspective.

From the research report about Amsterdam tourism we finally learned what the actual volume of mobile telephone metadata for a Western country looks like. Although we still don't know exactly how NSA and GCHQ count their metadata, comparing their numbers to those from the Netherlands shows that their collection efforts may not be as excessive as initially thought.

Links and sources
- Decisio: Bezoekersstromen Amsterdam - Zandvoort (2017)
- Autoriteit Consument & Markt: Telecommonitor eerste kwartaal 2016 (2016)
- ITU: Innovation of tourism statistics through the use of new big data sources (2014)
- CBS: Rapportage project impact ICT; Mobiele telefonie (2013)
- Data Management: Building a Data Warehouse of Call Data Records (CDR): Solution Considerations (2012)

One interesting result from the tourism report is that measuring the number of visitors to Amsterdam's annual Gay Pride showed that instead of the 560,000 visitors claimed by the organisers, only 115,000 visitors came from outside the city center, in addition to the 235,000 people who are present every Saturday and may or may not have watched the event. This confirms that visitor numbers for free public events are often significantly exaggerated.

February 13, 2016

How NSA contact chaining combines domestic and foreign phone records

(Updated: September 18, 2017)

In the previous posting we saw that the domestic telephone records, which NSA collected under authority of Section 215 of the USA PATRIOT Act (internally referred to as BR-FISA), were stored in the centralized contact chaining system MAINWAY, which also contains all kinds of metadata collected overseas.

Here we will take a step-by-step look at what NSA analysts do with these data in order to find as yet unknown conspirators of foreign terrorist organisations.

It becomes clear that the initial contact chaining is followed by various analysis methods, and that the domestic metadata are largely integrated with the foreign ones, something NSA never talked about and which only very few observers noticed.

What is described here is the situation until the end of 2015. The current practice under the USA FREEDOM Act differs in various ways. The information in this article is almost completely derived from documents declassified by the US government, but these have various parts redacted.

- RAS-approval - Different kinds of queries - Contact chaining queries -

- Analysing the contact chains - Conclusion -

- . - . - . - . -

RAS-approval

As a seed for starting a contact chain, NSA analysts can take a telephone identifier like a phone number (also called a selector), based upon:

- their own ongoing analysis on an existing target set;
- a Request for Information (RFI) from another government agency;
- a notification of a match between a known counterterrorism-related selector and an identifier among newly ingested phone metadata.

Access to the domestic phone records was granted to about 125 intelligence analysts from the Homeland Security Analysis Center (HSAC, or S2I4) of the NSA's Signals Intelligence Directorate. There were also up to 22 specially trained officials called Homeland Mission Coordinators or HMCs (initially shift coordinators).

As required by the FISA Court orders, only these HMCs, the chief and the deputy chief of the HSAC are allowed to determine that there is a Reasonable, Articulable Suspicion (RAS) that a certain selector is associated with a designated foreign terrorism group and/or Iran. Such a RAS-approval is only needed for the domestic phone records, not the ones collected overseas.

NSA has a special RAS Identifier Management System to streamline the adjudication of requests for RAS approval and their documentation. The codename of this system is IRONMAN, as we learn from a declassified 2011 training presentation (.pdf) in which this codeword was left unredacted in two places:

A RAS-approval is effective for one year, meaning that during the next year, repeated queries using the approved seed selector can be made. If the selector is reasonably believed to be used by a US person, the approval period is 6 months.
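The validity periods described above can be expressed as a small check. This is a hypothetical sketch: the class, field and function names are invented, revalidation is not modelled, and how NSA's systems (such as IRONMAN) actually stored approvals is not public.

```python
import calendar
from dataclasses import dataclass
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. Aug 31 + 6 months -> Feb 28/29)."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass(frozen=True)
class RasApproval:
    """Hypothetical record of a RAS approval for one seed selector."""
    selector: str
    approved_on: date
    us_person: bool  # selector reasonably believed to be used by a US person

    def expires_on(self) -> date:
        # One year in general, six months for (presumed) US-person selectors
        return add_months(self.approved_on, 6 if self.us_person else 12)

    def is_valid(self, on: date) -> bool:
        """True if repeated queries with this seed are still allowed on `on`."""
        return self.approved_on <= on < self.expires_on()
```

Calendar months are used rather than a fixed 365 days, so that an approval granted in a leap year still lasts exactly one year.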

The number of RAS-approved identifiers varied substantially over the years; in 2012 there were fewer than 300. According to the annual Transparency Report from the Director of National Intelligence (DNI), there were 423 such selectors in 2013, but just 161 in 2014. It's not known how many of these belonged to Americans.

Different kinds of queries

From various declassified documents analysed in an article on the weblog EmptyWheel, it becomes clear that there are three different kinds of queries that NSA analysts conducted on the domestic phone records database:

1. Queries for data integrity purposes
2. Queries for "Ident lookups"
3. Queries for contact chaining

In the EmptyWheel article it's assumed that besides these queries, NSA also conducted some kind of pattern analysis: in many declassified documents a redaction appears right after the term "contact chaining", which according to EmptyWheel could hide something like "pattern analysis".

Given that in these documents the targets are also redacted, there's also the possibility that the redaction hides a description of the target, like "contact chaining al-Qaida affiliates".

At least one NSA memorandum from 2009 indeed speaks about "chaining and analysis", but there can be two kinds of analysis: one conducted on the bulk of raw metadata records, and another one on selected results of contact chaining.

NSA has always denied that it conducts pattern analysis on the bulk metadata themselves, stating that every search begins with a specific telephone number or other specific selection term. So far, there are no indications to the contrary, so the analysis apparently refers to the results of contact chaining queries, which is confirmed by the 2014 report (.pdf) about the Section 215 program by the Privacy and Civil Liberties Oversight Board (PCLOB).

As we will see later on, this second type of analysis is indispensable for making the contact chaining queries useful for foreign intelligence purposes.

(1) Data integrity queries

The first way the domestic phone records were queried was for data integrity purposes. This was done by some 25 specialized Data Integrity Analysts (DIAs). They didn't conduct target analysis, but helped intelligence analysts with questions on a target. For those cases, a DIA could use a standard login (with appropriate controls) to query the phone records for foreign intelligence purposes.

However, when they queried for data integrity purposes, DIAs used a special login that bypassed the normal controls (like EAR) as well as the auditing. This was because for this task, they were allowed to use identifiers that were not RAS-approved (though not selectors whose approval had expired because they had not been revalidated).

One goal of these data integrity queries was to discover selectors that, for reasons that were redacted in the review report, should not become part of analysis, both for BR FISA and other purposes. These selectors could then be added to a defeat list of identifiers that were deemed to be of little analytic value, and/or to a database holding those that should not be tasked onto the collection system.

There was of course a risk of mixing up these tasks, and after an expired identifier had been queried in March 2010, the NSA Inspector General recommended that the duties of DIAs and foreign intelligence analysts should be clearly separated.

(2) Ident lookup queries

A second kind of query was for so-called "ident lookup". According to an NSA Inspector General test report (.pdf) from April 2010, this refers to:

"querying a selector using [tool name redacted] to determine the approval status of a selector. In such cases, the Emphatic Access Restriction controls will prevent chaining of a selector that is not marked as approved for querying, and return an error message to the analyst. Because the selector was not actually chained, there is no violation of the Order"

Emphatic Access Restriction (EAR, pronounced as "ear") is a tool that was installed on the MAINWAY database in February 2009. It automatically prevents the use of a selector that is not RAS-approved. It therefore seems that when an analyst started a query and the seed selector turned out not to be approved, that query was called an "ident lookup" (although EmptyWheel has a different interpretation).

This could be the way it worked before the IRONMAN system was established, as in a training module from 2011, it is said that by then, analysts just had to "use [tool name redacted] to determine the identifier’s approval status".
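Under the interpretation given above, an EAR-style control sits in front of the chaining function: a non-approved seed is never chained, and the attempt only reveals the seed's approval status. The following is a hypothetical sketch of that behaviour; the names `ear_gated_chain` and `NotApprovedError` and the audit log format are invented for illustration.

```python
from typing import Callable


class NotApprovedError(Exception):
    """Raised instead of chaining when a seed is not RAS-approved."""


def ear_gated_chain(seed: str, approved: set[str],
                    chain_fn: Callable[[str], list[str]],
                    audit_log: list[tuple[str, str]]) -> list[str]:
    """Hypothetical EAR-style gate in front of a chaining function."""
    if seed not in approved:
        # The seed is never chained; the attempt only reveals its approval status
        audit_log.append(("ident lookup", seed))
        raise NotApprovedError(f"{seed} is not approved for querying")
    audit_log.append(("chain", seed))
    return chain_fn(seed)
```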

(3) Contact chaining queries

The most important queries on the domestic phone records were of course those conducted by intelligence analysts in order to "identify unknown terrorist operatives through their contacts with known suspects, discover links between known suspects, and monitor the pattern of communications among suspects".

For this, an analyst took a RAS-approved selector (often a telephone number) and entered it into a specialized metadata tool, which searched the telephone metadata in the MAINWAY contact chaining system. To limit the number of results, the analyst could set a certain timeframe for the query.

The metadata tool then returns "a .cml file, usually referred to as a chain, which is made up of the individual first hop contacts of the seed". Usually, the analyst will also be interested in the second-hop contacts, and then the tool will retrieve the batches of one-hop chains for the identifiers that had been in direct contact with those from the first hop series.

Number of hops

Based upon the FISA Court orders, NSA analysts were also allowed to retrieve the numbers in contact with all the numbers from the second hop, which would make a third hop. The software tools are said to prevent looking beyond the third hop, or performing a query of a selection term that has not been RAS-approved.

The initial authorizations under the President's Surveillance Program (PSP) did not prohibit chaining more than two degrees of separation from the target, but "NSA analysts determined that it was not analytically useful to do so".* When this collection was brought under supervision of the FISA Court, the court limited contact chaining to 3 hops.

But despite that authorization, the policy of NSA's Counter Terrorism branch restricted chaining to 2 hops, as can be seen in an NSA training presentation (.pdf) from 2007:

A 2011 training module says that chaining to a third hop is possible, but only after prior approval by the analyst's division management (for example when a contact that comes up with the first hop appears to be an already known suspect).
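Hop-limited contact chaining of this kind is essentially a breadth-first search over a contact graph. The sketch below is a minimal illustration, not NSA's software: an in-memory dictionary stands in for MAINWAY, whose internals are not public; the default of 2 hops mirrors the Counter Terrorism policy described above, with 3 as the hard maximum; and the optional defeat list (see the data integrity queries above) is an assumption about how such a list might be applied.

```python
from collections import deque


def contact_chain(seed: str, contacts: dict[str, set[str]], max_hops: int = 2,
                  defeat_list: frozenset[str] = frozenset()) -> dict[str, int]:
    """Breadth-first contact chaining: return {selector: hop} for every selector
    reachable from `seed` within `max_hops` hops. Selectors on the defeat list
    are neither returned nor expanded."""
    if not 1 <= max_hops <= 3:
        raise ValueError("max_hops must be between 1 and 3")
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # reached the hop limit: do not expand further
        for other in contacts.get(current, set()):
            if other not in hops and other not in defeat_list:
                hops[other] = hops[current] + 1
                queue.append(other)
    del hops[seed]  # the seed itself is not part of its own chain
    return hops
```

Breadth-first order guarantees that each selector is recorded with its shortest hop distance from the seed.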

Strangely enough, neither a government white paper nor the PCLOB report mentions this policy restriction, and the latter even assumes that chaining 3 hops was regular practice:

"If a seed number has seventy-five direct contacts, for instance, and each of these first-hop contact has seventy-five new contacts of its own, then each query would provide the government with the complete calling records of 5,625 telephone numbers. And if each of those second-hop numbers has seventy-five new contacts of its own, a single query would result in a batch of calling records involving over 420,000 telephone numbers"

As of 2012, the FISA Court also allowed an automated chaining process in which "the NSA's database periodically performs queries on all RAS-approved seed terms, up to three hops away from the approved seeds. The database places the results of these queries together in a repository called the 'corporate store'". However, NSA was never able to get this working (although the PCLOB report, again, describes it as if it had actually been implemented).
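The figures in the PCLOB quotation above follow from exponential growth. The sketch reproduces them under PCLOB's own simplifying assumption that every number has the same count of new, non-overlapping contacts; in real call graphs contacts overlap heavily, so these are upper bounds.

```python
def numbers_at_hop(contacts_per_number: int, hop: int) -> int:
    """Numbers reached at exactly `hop` hops if every number has the same
    count of *new*, non-overlapping contacts (PCLOB's simplifying assumption)."""
    return contacts_per_number ** hop


for hop in (1, 2, 3):
    print(hop, numbers_at_hop(75, hop))
# 1 75
# 2 5625
# 3 421875
```

Summed over all three hops, a single seed would then touch 427,575 numbers.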

Visualization

The results from a contact chaining query can be visualized by a contact graph. An example was published by the German magazine Der Spiegel, showing a slide from an NSA presentation with a 2-hop contact graph for the e-mail addresses of the CEO and the chairwoman of the Chinese telecommunications company Huawei:

> See also: An NSA eavesdropping case study

Domestic and foreign results

Generally, it is said that analysts query the "Section 215 calling records", the "BR metadata" or something similar. This sounds like they only access the domestic telephone records and that therefore the resulting contact chains would fully consist of American phone numbers.

The initial seed number, however, will often be a foreign number, as the whole purpose of the Section 215 program is to discover connections between foreign terrorists and potential conspirators inside the US. Analysts will therefore choose seeds that they expect have a good chance of a domestic nexus, which probably explains the low numbers of RAS-approved identifiers.

But as we have seen in the previous article, NSA stored the domestic phone records in MAINWAY, which also contains the foreign telephone and internet metadata collected overseas. That means that a contact chaining query will not only return identifiers from the domestic, but also from the NSA's worldwide metadata collection.

Federated queries

Queries that return results from multiple sources are called federated queries. According to a 2011 training module, BR FISA queries initially could only be run as federated queries, but in later versions of the query tool, the analyst could also check boxes to conduct an "unfederated" query and choose individual collection sources.

These options can be seen in the following screenshot from the user interface (the codename of which is redacted) used to conduct the contact chaining:

Selecting the "FISABR Mode" causes an additional checkbox for the EO12333 source to appear. An NSA memorandum explains that when this BR FISA option is chosen, the analyst will be provided not only with the domestic telephone metadata, but also with those from the SIGINT realm (i.e. collection overseas under EO 12333 authority), dating back to late 1998.

When the analyst used a RAS-approved selector, he could also check the box for PENREGISTRY, or PR/TT, which refers to the domestic internet metadata, although that collection ended by the end of 2011. Normal mode covers all other metadata collected abroad.

Analysts can determine the collection sources of each result by examining the Producer Designator Digraph (PDDG) and/or SIGINT Activity Designator (SIGAD) from each line of the contact chain file. BR FISA metadata can be identified by specific SIGADs.

SPCMA

There's also a fourth box for SPCMA mode, which stands for the "Special Procedures governing Communications Metadata Analysis" from January 2011. These allow contact chaining and other types of analysis on metadata that have already been collected under EO 12333, regardless of nationality and location (because metadata aren't constitutionally protected).

This means that US person identifiers that were in contact with valid foreign intelligence targets may be used for searching these foreign metadata too.

NSA isn't allowed to collect US data overseas, but such data do come in "incidentally" when, for example, foreigners communicate with Americans: precisely the kind of communications that could reveal conspirators inside the US. Many international phone calls from or to the US will likely be intercepted by NSA collection facilities abroad too.

In other words:

- By default, any contact chaining query will use the foreign metadata collected overseas. For these, any useful selector may be used as a seed, and, under SPCMA, even one that belongs to an American.

- If the seed selector is RAS-approved, then the domestic phone records will be used too, which could lead to the discovery of additional contacts within the US.
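The two rules above can be modelled as a simple source-selection step in front of a one-hop query. This is a simplified, hypothetical sketch: the source labels `"EO12333"` and `"BRFISA"` are invented stand-ins for the checkboxes in the query tool, and the `source` field on each result merely imitates how analysts can recognize a result's origin from its PDDG or SIGAD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainResult:
    party_a: str
    party_b: str
    source: str  # stands in for the PDDG/SIGAD tag on each result line


def federated_query(seed: str, sources: dict[str, list[tuple[str, str]]],
                    ras_approved: bool, us_person: bool = False,
                    spcma: bool = False) -> list[ChainResult]:
    """One-hop federated query over several (hypothetical) metadata sources."""
    selected = []
    # Foreign metadata collected overseas: any seed, but a US-person seed only under SPCMA
    if not us_person or spcma:
        selected.append("EO12333")
    # Domestic Section 215 phone records: only for RAS-approved seeds
    if ras_approved:
        selected.append("BRFISA")
    results = []
    for name in selected:
        for a, b in sources.get(name, []):
            if seed in (a, b):
                results.append(ChainResult(a, b, source=name))
    return results
```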

The fact that most contact chains will consist of both foreign and domestic identifiers means that they contain far fewer American numbers than suggested by calculations like the one from PCLOB, which give the impression that queries resulted in up to 3 hops of domestic numbers.

Federated contact chaining queries including domestic and foreign phone call records

Analysing the contact chains

It should be noted that the phone numbers (or other selectors) which are returned after an initial contact chaining query are anonymous and therefore meaningless. They're just numbers which could belong to anyone: from a pizza delivery to a dangerous conspirator.

So, in order to identify which numbers are of interest for finding unknown suspects, additional analysis is needed - a comprehensive GCHQ book (.pdf) disclosed last week calls contact chaining the start of a "painstaking process of assembling information about a terrorist cell or network".

Analytic tools

In the early years of the President's Surveillance Program (PSP), only the SIGINT Navigator (SIGNAV) tool was available to view the output of the MAINWAY contact chaining system. Later, new tools were created to improve efficiency; to obtain the most complete results, they were designed to use phone records collected both domestically and overseas.

According to the 2009 BR FISA review, there were 19 different analytic tools used for analysing both the raw metadata and the results of contact chaining. The glossary of the review lists the following tools, unfortunately with their codenames redacted:

S................?
"This tool is used by HMCs to conduct contact chaining against BR FISA metadata and provide the results to the [...]team. HMCs only used RAS-approced selectors when using this tool. The [...] team ultimately provided the results to NSA's [....]"

S.........?
"The primary desktop graphical user interface (GUI) for access to [....] data and services"

S....?
"An analytic query tool used to seek out additional information on telephony selectors from [MAINWAY?] and other knowledge bases and reporting repositories"

[SYNAPSE Workbench?]
"A next generation metadata analysis graphical user interface (GUI) which is the replacement for [......]"

W......?
"The query tool, which indicates whether a telephony selector is present in NSA data repositories, the total number of unique contacts, total number of calls, and "first heard" and "last heard" information for the selector"

The 2009 PR/TT review also mentions the following tool, which could have been redacted in the BR FISA review:

M.....?
"A database analytic system and user interface tool for integrated analysis of multiple types of metadata, facilitating more comprehensive target activity tracking"
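The kind of overview that the redacted query tool starting with "W" is said to provide (presence in the data, number of unique contacts, total number of calls, and "first heard" and "last heard") can be computed from simple call tuples. The sketch below is illustrative only and has no relation to NSA's actual implementation; the function name and the `(caller, callee, timestamp)` input format are assumptions.

```python
from datetime import datetime


def selector_summary(selector: str,
                     calls: list[tuple[str, str, datetime]]) -> dict[str, object]:
    """Summarize a selector from (caller, callee, timestamp) tuples."""
    involved = [(a, b, t) for a, b, t in calls if selector in (a, b)]
    if not involved:
        return {"present": False}
    # The contact is whichever party of the call is not the selector itself
    contacts = {b if a == selector else a for a, b, _ in involved}
    times = [t for _, _, t in involved]
    return {
        "present": True,
        "unique_contacts": len(contacts),
        "total_calls": len(involved),
        "first_heard": min(times),
        "last_heard": max(times),
    }
```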

Update:
According to the internal NSA newsletter SIDToday from March 4, 2005, which was published by The Intercept in September 2017, MAINWAY's Sigint Navigator (SigNav) version 4.0 became the vehicle for the new single sign-on tool GLOBALVISION, which gave analysts access to 11 databases.

Combining multiple contact chains

In 2006, a "high-level Bush Administration intelligence official" told Seymour Hersh that analysts could, for example, check whether any number that is two or three hops away from the seed number is also in direct contact with that original suspect number. That sounds smart, but in that case, the number that is two or three hops away is simply a first-hop contact.

Finding suspects just by looking at connections between anonymous numbers could work, however, when several contact chains (from related suspect seed numbers, for example) are combined: a number that appears to be in contact with both seed #1 and seed #2 would then be suspicious, as it apparently belongs to someone known to both initial suspects.
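Combining chains in this way comes down to counting in how many different chains each number occurs. A minimal sketch, assuming each chain is available as a set of contact numbers per seed; the function name is hypothetical.

```python
from collections import Counter


def shared_contacts(chains: dict[str, set[str]], min_seeds: int = 2) -> dict[str, int]:
    """Given {seed: contacts found by chaining that seed}, return contacts that
    occur in the chains of at least `min_seeds` different seeds."""
    counts = Counter(c for contacts in chains.values() for c in contacts)
    # The seeds themselves are already known, so leave them out
    return {c: n for c, n in counts.items() if n >= min_seeds and c not in chains}
```

For example, with chains `{"S1": {"X", "Y", "S2"}, "S2": {"X", "Z"}, "S3": {"X", "Y"}}`, the numbers `X` (in three chains) and `Y` (in two) would stand out, while `Z` would not.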

This approach was seen in the CBS television program 60 Minutes from December 15, 2013, in which an NSA employee gave a demonstration of how metadata contact chaining works. He used a tool for foreign collection under EO 12333, resulting in some contact chains of almost fully masked phone numbers from Somalia. Clearly visible are numbers that different targets had in common:

Detailed call record analysis

Besides analysing the breadth of the contact chains, each contact between two phone numbers can also be analysed in depth. For this, the analytic software provides analysts access to the complete calling records associated with all the phone calls from a contact chain.

Such a record, as provided by the telecoms, includes the calling and the called number, a calling-card number, the IMEI number of a mobile handset and the IMSI number of a SIM card, as well as the date and time of the call, its duration and technical information about how the call was routed through the telephone networks.

This provides analysts with information like which number initiated the call, the day and time the call was made, and how long it lasted. And although the domestic phone records may not contain cell phone location data, the area code and prefix of a landline telephone number, as well as the trunk identifier for mobile networks, still indicate the area where a particular phone was located.

As described in the previous article, these data weren't derived from the MAINWAY system, but from a second database which holds "individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events".

Searching the second database

This database of calling records also enables analysts to subject these records "to other analytic methods or techniques besides querying", like for example searching them "using numbers, words, or symbols that uniquely identify a particular caller or device", or using "selection terms that are not uniquely associated with any particular caller or device" - according to the PCLOB report.

So, when analysing one or more contact chains has turned up several suspicious phone numbers, analysts can use those numbers to query the second database in order to see whether these numbers also appear in phone records that were not included in their initial contact chains.

It also seems possible to query, for example, a trunk identifier to discover other phones from the same region. These kinds of searches can therefore reveal potential connections that could not have been found by conducting a direct contact chaining query.
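Such follow-up searches can be pictured as a filter over individual call records that matches any of several terms (phone numbers or a trunk identifier) and skips records already seen in the contact chains. This is an illustrative in-memory sketch: the `CallRecord` fields and function name are hypothetical, and the actual schema of NSA's second database is not public.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    record_id: int
    calling: str
    called: str
    trunk_id: str  # hypothetical field for the trunk identifier


def search_call_records(records: list[CallRecord], terms: set[str],
                        already_seen: frozenset[int] = frozenset()) -> list[CallRecord]:
    """Return records in which any field matches one of `terms`,
    excluding records that were already part of the contact chains."""
    return [r for r in records
            if r.record_id not in already_seen
            and terms & {r.calling, r.called, r.trunk_id}]
```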

Update:
An NSA slide that was already published in December 2013 shows that MAINWAY can indeed be used for queries with cell tower identifiers, in order to find selectors in certain geographical areas:

Some numbers

In a Department of Justice report (.pdf) from 2006 it's said that NSA "estimated that only a tiny fraction (0.000025% or one in four million) of the call-detail records [...] were expected to be analyzed". This would mean that of the 1.8 billion domestic phone records provided daily by AT&T, just 450 would be used for analysis.

So in a year, the records (not the content) of roughly 230.000 individual calls from the domestic metadata collection could have been used for analysis in addition to contact chaining.
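The estimate can be checked with exact fractions. This short Python sketch only multiplies out the figures quoted from the 2006 report (one in four million) and the AT&T volume of 1.8 billion records a day, and it assumes a constant daily volume over a 365-day year.

```python
from fractions import Fraction

# 0.000025 percent, as quoted from the 2006 Department of Justice report
fraction_analysed = Fraction("0.000025") / 100
daily_records = 1_800_000_000            # AT&T records per day, as of 2011

per_day = daily_records * fraction_analysed
per_year = per_day * 365                  # assumes the same volume every day

print(f"one in {int(1 / fraction_analysed):,}")  # one in 4,000,000
print(f"{int(per_day):,} records per day")       # 450 records per day
print(f"{int(per_year):,} records per year")     # 164,250 records per year
```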

Foreign call records

As we have seen, a contact chaining query on Section 215 telephone metadata will generally return both foreign and domestic numbers. Analysts will therefore want to analyze not only the associated call records from the domestic collection, but also those from the foreign collection conducted abroad.

These foreign phone records could be retrieved from the known metadata repositories like ASSOCIATION (for mobile calls) and BANYAN (for landline calls), or from a single foreign "SIGINT" database, as is suggested by an NSA memorandum from 2009.

Enrichment

Analyzing the detailed call records will still not provide names or other information that allows the identification of the people to whom the numbers from a contact chain belong. For that, the phone numbers have to be correlated ("enriched") with other kinds of information.

The easiest way is probably to compare them with target watch lists to see whether the contact chains contain phone numbers that belong to already known targets. This is demonstrated in the following video, which shows contact chain analysis using Sentinel Visualizer, a commercially available program for this purpose:

Telephone identifiers found through contact chaining and subsequent analysis can of course also be correlated with internet metadata. NSA no longer collects domestic internet metadata, but its collection abroad results in over 10 billion internet metadata records a day being stored in the MARINA database.

The metadata from contact chains can also be enriched with data from, for example, GPS and TomTom, billing records and bank transactions, passenger manifests, voter registration rolls, property records and unspecified tax data, for both Americans and foreigners, according to a New York Times report in which NSA denied using such data for the domestic metadata collected under Section 215.
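Comparing a contact chain with a watch list, the simplest form of enrichment mentioned above, comes down to a set lookup once phone numbers are written in the same format. The following Python sketch assumes a hypothetical watch list keyed by digits-only numbers; real enrichment against billing, travel or property data would need far more careful matching than this.

```python
import re

def normalize_number(number):
    """Reduce a phone number to its digits so that differently formatted numbers compare equal."""
    return re.sub(r"\D", "", number)

def enrich_chain(chain, watch_list):
    """Map each chain identifier that appears on the watch list to its watch-list annotation."""
    hits = {}
    for identifier in chain:
        key = normalize_number(identifier)
        if key in watch_list:
            hits[identifier] = watch_list[key]
    return hits

# Hypothetical data: the chain uses display formatting, the watch list digits only.
chain = {"+31 20 555 0101", "+31 20 555 0102", "+1 (202) 555-0199"}
watch_list = {
    "31205550102": "known target",
    "12025550199": "known facilitator",
}
print(enrich_chain(chain, watch_list))
```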

SYNAPSE Data Model

With all this, analysts can build extensive social network graphs (or "community of interest" profiles) using 164 different relationship types like "travelsWith, hasFather, sentForumMessage, employs". It seems that this refers to the SYNAPSE Data Model, for which internal NSA relationships are shown in the following diagram that was published by The New York Times too:

Apparently also based upon this data model is SYNAPSE Workbench, which seems to be the "next generation metadata analysis graphical user interface (GUI)" described in the 2009 BR FISA review. SYNAPSE Workbench is apparently capable of fusing metadata from multiple sources and is also enabled for SPCMA searches.
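To illustrate what typed relationships add to a plain contact graph, here is a small Python sketch of an edge store in which every link carries a relationship name. The relationship names are the examples quoted above; the class, its methods and the sample entities are hypothetical and do not reproduce the SYNAPSE Data Model itself.

```python
from collections import defaultdict

class RelationshipGraph:
    """Tiny typed-edge store: each fact is a (subject, relation, object) triple."""

    def __init__(self):
        self._out = defaultdict(set)   # subject -> {(relation, object)}
        self._in = defaultdict(set)    # object -> {(relation, subject)}

    def add(self, subject, relation, obj):
        self._out[subject].add((relation, obj))
        self._in[obj].add((relation, subject))

    def related(self, entity, relation=None):
        """Entities linked to `entity` in either direction, optionally by one relation type only."""
        links = self._out.get(entity, set()) | self._in.get(entity, set())
        return {other for rel, other in links if relation is None or rel == relation}

g = RelationshipGraph()
g.add("person:A", "travelsWith", "person:B")
g.add("person:A", "hasFather", "person:C")
g.add("company:X", "employs", "person:A")
g.add("person:B", "sentForumMessage", "forum:Z")

print(sorted(g.related("person:A")))             # ['company:X', 'person:B', 'person:C']
print(sorted(g.related("person:A", "employs")))  # ['company:X']
```

Because each edge keeps its type, the same store can answer both "everyone connected to A" and narrower questions such as "who employs A", which is the kind of distinction a plain caller/callee graph cannot make.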

Further action

When all this leads an analyst to believe that a certain telephone identifier belongs to someone of interest who was not yet known or identified, the following actions can be taken:

⇨ If the identifier is American and of counterterrorism value, it can be passed on to the FBI for further intelligence or criminal investigation. From 2006 to 2009, NSA provided the FBI (and other intelligence agencies) with a total of 277 reports containing 2883 telephone identifiers.

⇨ If the identifier is foreign, NSA can use it as a selector to retrieve the content of associated communications that might already be in its databases. It can also be entered into the NSA collection system in order to systematically pull in the content of any future communications of the target.

If the identifier of the yet unknown suspect is foreign, the analyst may already have found a name through the various enrichment correlations; if not, a name can also be established by listening to the content of associated phone calls or through additional Human Intelligence (HUMINT) methods.

Conclusion

As we have seen, the domestic phone records collected by NSA under Section 215 are used for contact chaining that combines both domestic and foreign identifiers. NSA never explicitly explained this, probably because it didn't want to draw attention to its foreign metadata collection and analysis efforts. But it did become clear from the many documents about the Section 215 program that were declassified by the US government.

These documents made clear that NSA rarely went to 3 hops of contact chaining, contrary to what most people, including the Privacy and Civil Liberties Oversight Board (PCLOB), assumed. Because of the federated queries, the resulting contact chains were made up of both domestic and foreign identifiers, which means that contact chaining under the Section 215 program involved far fewer American phone numbers than often presumed.

The documents also show that contact chaining to find yet unknown conspirators isn't as easy as it may appear. It's not that one enters a phone number and the software provides a list of suspects. Data retrieved through the contact chains have to be analysed and correlated with other data sets in order to find out which numbers could matter. It still depends on experience, analysis and eventually even guesswork to decide which data and which numbers are worth a closer investigation.

It is difficult to say how successful this contact chaining and subsequent analysis has been. The PCLOB report judged that there was "no instance in which the [Section 215] program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack", but it's also possible that there simply were no such conspirators.

The PCLOB report noted that analysing the domestic telephone metadata did provide some value "by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus". Although useful, this seems a rather meager result for what surely required a lot of work.

> Next: Collection of domestic phone records under the USA FREEDOM Act

Links and Sources
- Lawfare Blog: Understanding Footnote 14: NSA Lawyering, Oversight, and Compliance (2016)
- EmptyWheel.net: Federated Queries and EO 12333 FISC Workaround (2013)
- EmptyWheel.net: What We Know about the Section 215 Phone Dragnet and Location Data (2016)
- PCLOB: Report on the Telephone Records Program Conducted under Section 215 of the USA PATRIOT Act (.pdf) (2014)
- Cryptome.org: NSA FISA Business Records Offer a Lot to Learn (2013)
- Huffingtonpost.com: The NSA's Telephone Meta-data Program: Part I (2013)
- US Administration White Paper: Bulk Collection of Telephony Metadata under Section 215 of the USA PATRIOT Act (.pdf) (2013)
- The New Yorker: What the N.S.A. Wants to Know About Your Phone Calls (2013)
- NSA: Business Records FISA NSA Review (.pdf) (2009)

January 20, 2016

Section 215 bulk telephone records and the MAINWAY database

(Updated: November 23, 2016)

One of the most controversial NSA programs was the bulk collection of domestic telephone records (metadata) under authority of Section 215 of the USA PATRIOT Act.

The Snowden revelations provided hardly any information about this program, but many details became available from documents that were declassified by the US Director of National Intelligence (DNI).

Because in these declassified documents all codenames are redacted, it was largely a mystery which NSA systems were used to store and analyse these metadata.

By combining many separate pieces from both the Snowden documents and those declassified by the government, it has now become clear that NSA put the domestic phone records in its central contact chaining system MAINWAY, which also contains all sorts of metadata collected overseas.

Reconstruction of the MAINWAY dataflow

MAINWAY versus MARINA

Initially it was thought that MAINWAY was a repository just for telephone metadata. This goes back to a report by USA Today from May 10, 2006, which revealed that the NSA created a database containing "the phone call records of tens of millions of Americans" obtained from AT&T, Verizon and BellSouth (the latter merged with AT&T as of 2007).

As such, MAINWAY was seen as the equivalent of MARINA, which is NSA's storage for internet metadata. But meanwhile, various documents from the Snowden revelations have made clear that the actual repositories for telephone metadata are ASSOCIATION (for metadata from mobile calls) and BANYAN (for metadata from landline calls).

MAINWAY itself isn't just a database that stores raw metadata, but a system that also "performs data quality, preparation and sorting functions, and then summarizes contacts represented in the processed data". Afterwards, MAINWAY stores the "resulting contact chains and provides analysts with access to these contact chains".
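How MAINWAY computes its chains has not been disclosed, but the basic idea of contact chaining is a breadth-first search with a hop limit. The Python sketch below assumes undirected contacts between identifiers and returns each reachable identifier with its hop distance from the seed; the function names and toy identifiers are hypothetical.

```python
from collections import deque

def build_contacts(pairs):
    """Turn (a, b) contact pairs into an undirected adjacency map."""
    contacts = {}
    for a, b in pairs:
        contacts.setdefault(a, set()).add(b)
        contacts.setdefault(b, set()).add(a)
    return contacts

def contact_chain(contacts, seed, max_hops):
    """Breadth-first contact chaining: map each reachable identifier to its hop distance."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in contacts.get(current, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return hops

pairs = [("S", "A"), ("A", "B"), ("B", "C"), ("C", "D"), ("S", "E")]
contacts = build_contacts(pairs)
print(sorted(contact_chain(contacts, "S", 2).items()))
# [('A', 1), ('B', 2), ('E', 1), ('S', 0)]
```

Raising `max_hops` from 2 to 3 adds "C" to the chain; every additional hop widens the circle of identifiers that end up in the result.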

New documents have also shown that MAINWAY contains metadata from internet communications as well. For example, in the following diagram about the FAIRVIEW collection program, we see that internet metadata from the Upstream collection first flow into MAINWAY before ending up in MARINA:

Dataflow for internet metadata collected under the
FAIRVIEW program under Transit Authority

It seems likely that metadata are stored in MAINWAY more or less temporarily, for the purpose of contact chaining and analysis. Metadata that NSA wants to keep for a longer period of time, or even indefinitely, are then stored in repositories like MARINA, ASSOCIATION and BANYAN.

(However, a report by The Guardian from September 30, 2013 says that MARINA "has the ability to look back on the last 365 days' worth of DNI metadata seen by the Sigint collection system")

While the domestic metadata collected in bulk have to be destroyed after 5 years, the calling records that are the result of a query can be stored by the analyst. According to the PCLOB-report (.pdf), they may then be "subjected to other analytic methods or techniques besides querying, or integrated with records obtained by the NSA under other authorities", as well as shared with others inside and outside NSA.

MAINWAY, SIGINT Navigator (SIGNAV), ASSOCIATION and BANYAN
mentioned in a presentation about DEMONSPIT, under which call
records were obtained from major Pakistan telecom providers(!)

MAINWAY receiving domestic phone records

Based upon Snowden documents, The New York Times reported on September 28, 2013, that MAINWAY is used for chaining both phone numbers and e-mail addresses and that it is fed with data from tapping "fiber-optic cables, corporate partners and foreign computer networks that have been hacked".

The report also says that as of August 2011, MAINWAY was fed with "1.1 billion cellular records a day in addition to the 700M records delivered currently". However, The New York Times erroneously attributed these numbers to collection under authority of section 702 FAA and was therefore not able to identify that MAINWAY was also fed with the bulk phone records of Americans (which happens under section 215 Patriot Act).

The latter only became clear after The New York Times and ProPublica published some NSA documents about the FAIRVIEW program on August 15, 2015. One of these documents confirms that it was AT&T that provided the aforementioned number of records, and also that this happened under BR FISA (= Section 215) authority.

(A report by the Washington Post from June 15, 2013 also identified MAINWAY as the database in which the phone records from the Section 215 program were stored)

So as of 2011, at least 1.8 billion domestic phone records a day were coming in, which makes 54 billion a month and about 650 billion a year. Before handing these records over to NSA, AT&T stripped off the location data in order to comply with the FISA Court orders, which did not allow such data to be collected.
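The following Python sketch illustrates both points of the paragraph above: the volume arithmetic, and removing a location field from a call detail record before hand-over. The record layout and the `cell_id` field are hypothetical; the actual AT&T record formats are not described in the documents.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class CallDetailRecord:
    # Hypothetical layout; real provider formats differ.
    caller: str
    callee: str
    start: str
    duration_s: int
    cell_id: Optional[str] = None   # location-revealing field

def strip_location(cdr):
    """Return a copy of the record without its location field; the original is left untouched."""
    return replace(cdr, cell_id=None)

daily = 1_100_000_000 + 700_000_000   # cellular plus other records per day, as of 2011
print(f"{daily * 30:,} a month")       # 54,000,000,000 a month
print(f"{daily * 365:,} a year")       # 657,000,000,000 a year

raw = CallDetailRecord("15550001", "15550002", "2011-08-01T09:00:00Z", 75, cell_id="cell-1234")
print(strip_location(raw))
```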

Apparently Verizon Wireless and T-Mobile US saw no obligation to remove these location data, so their cell phone records couldn't be collected by NSA, which therefore obtained less than 30% of the domestic telephone metadata.

According to NSA, one of the advantages of putting phone records from multiple American telecommunication companies in one big repository was that this allowed analysts "to identify chains of communications that cross different telecommunications networks".

Under the President's Surveillance Program (2001 - 2004/2006)

NSA started collecting telephone and internet metadata from US telecommunication providers shortly after the attacks of September 11, 2001. This was part of the President's Surveillance Program (PSP, protected under the STELLARWIND classification compartment), which was based upon what would eventually be 43 successive secret authorizations by President George W. Bush.

The goals of collecting these metadata were identifying unknown terrorist operatives through their contacts with known suspects, discovering links between known suspects, and monitoring the pattern of communications among suspects.

At first, only metadata were collected from communications in which at least one party was outside the US. AT&T (identified as Company A, codenamed LITHIUM, with collection under FAIRVIEW) started to provide both phone and internet metadata from international channels as early as November 2001. For Verizon (Company B, with collection under STORMBREW) the automated transfer of such data started in February 2002. Qwest refused to hand over its records because the government couldn't present a warrant.

Allegedly, raw metadata were transferred in real time through a high-speed data link from the main computer centers of the telecoms to a government facility in Quantico, Virginia. Although Quantico is an FBI compound, the BR FISA review says that it was an NSA mission element, the name of which was redacted, that obtained the records from the providers.

Then, parsers were used to filter unwanted information (like credit card numbers) out of the metadata, and the records were put in a standard format compatible with NSA databases.
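A parser of this kind can be sketched as a per-provider field mapping onto one standard schema, in which anything not explicitly mapped (such as a credit card number) is simply dropped. The provider names, field names and schema in this Python example are hypothetical; the actual NSA parsers and formats are not public.

```python
# Hypothetical per-provider field mappings onto one standard schema.
FIELD_MAPS = {
    "provider_a": {"orig_num": "caller", "term_num": "callee", "ts": "start", "dur": "duration_s"},
    "provider_b": {"from": "caller", "to": "callee", "time": "start", "seconds": "duration_s"},
}

def normalize(provider, raw):
    """Map a raw provider record onto the standard schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[provider]
    record = {std: raw[src] for src, std in mapping.items() if src in raw}
    record["duration_s"] = int(record.get("duration_s", 0))
    return record

raw = {
    "from": "15550001",
    "to": "15550002",
    "time": "2003-09-01T10:00:00Z",
    "seconds": "95",
    "card_number": "4111111111111111",   # unwanted: not in the mapping, so it is discarded
}
print(normalize("provider_b", raw))
# {'caller': '15550001', 'callee': '15550002', 'start': '2003-09-01T10:00:00Z', 'duration_s': 95}
```

Whitelisting the wanted fields, rather than trying to recognize every kind of unwanted data, is a simple design choice that fails safe when a provider adds new fields.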

For example, in September 2003, AT&T "captured" several trillion internet metadata, of which some 400 billion records (apparently those with a high probability of containing terrorist communications) were selected for processing. These flowed into the MAINWAY contact chaining database, which also contains metadata from collection abroad. The 2009 report about the STELLARWIND program says:

"NSA's primary tool for conducting metadata analysis, for PSP and traditional SIGINT collection, was MAINWAY. MAINWAY was used for storage, contact chaining, and for analyzing large volumes of global communications metadata."

(Interestingly, in some documents MAIN WAY seems to be written as two separate words, which makes it resemble MAIN CORE, a central database containing essential intelligence information on Americans produced by the FBI and other US intelligence agencies.)

Under FISA Court orders (2004/2006 - 2011/2015)

In July 2004, the collection of domestic internet metadata was moved from the President’s Surveillance Program to the FISA Court, which authorized this effort based upon section 402 FISA, or as it is called by NSA: PR/TT (short for Pen Register/Trap and Trace).

In May 2006, the same happened with the bulk telephone records, for which the FISA Court allowed continuation under authority of section 215 USA PATRIOT Act, or as NSA calls it: BR FISA (short for Business Records FISA).

Under the FISA Court orders, bulk telephone collection eventually came to include "all call detail records or 'telephony metadata' created [...] for communications between the United States and abroad" or "wholly within the United States, including local telephone calls". Only metadata of fully foreign communications were excluded, as were, for technical reasons, most mobile phone calls.

Because right from the beginning NSA stored these domestic phone and internet metadata in the same database (MAINWAY) that contains metadata from traditional collection efforts abroad, queries could result in contact chains made up of identifiers from both foreign and domestic sources. The query tool simply didn't distinguish between them.

Analysts could also start a query with selectors that were not BR FISA-approved, and in some cases this, too, returned results from both the foreign and the domestic collection. This was not in accordance with the FISA Court orders, and after NSA informed the court about it, the agency had to stop accessing the telephone metadata in 2009 until these issues had been solved.*

An internal NSA training module from 2011 shows that, at least by then, NSA had tagged the metadata records with XML tags to identify not only the legal authority under which the metadata were collected, but also the SIGAD of the intercept facility where the collection took place.
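Such tags make it possible to separate results by legal authority after a query. The Python sketch below uses the standard `xml.etree.ElementTree` module; the element names, attribute names and values (`authority`, `sigad`, `BR_FISA`, `SIGAD_A`) are invented for illustration, since NSA's actual tagging schema is not public.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging scheme: element names, attribute names and values are invented.
SAMPLE = """<records>
  <record authority="BR_FISA" sigad="SIGAD_A"><a>15550001</a><b>15550002</b></record>
  <record authority="EO12333" sigad="SIGAD_B"><a>15550002</a><b>442070000000</b></record>
  <record authority="BR_FISA" sigad="SIGAD_A"><a>15550003</a><b>15550001</b></record>
</records>"""

def records_by_authority(xml_text, authority):
    """Return (a, b, sigad) for every record tagged with the given legal authority."""
    root = ET.fromstring(xml_text.strip())
    return [
        (rec.findtext("a"), rec.findtext("b"), rec.get("sigad"))
        for rec in root.iter("record")
        if rec.get("authority") == authority
    ]

for row in records_by_authority(SAMPLE, "BR_FISA"):
    print(row)
```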

A rare diagram about the BR FISA metadata collection:
the decision process as it was from 2006 - 2009
(an explanation of this process can be read here)
(Source)

Other databases for domestic call records

The domestic call records were not only stored in MAINWAY, but also in another database, one that was apparently dedicated to US phone metadata. An NSA training presentation (.pdf) from 2007 confirms that BR FISA data were stored in two NSA repositories, although both names had been redacted.

An NSA review from June 2009 describes this second database as a "repository for individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events".

This seems to refer to the complete calling records, and the PCLOB report (.pdf) about the BR FISA program also says there's analysis software that "provides the associated information about the telephone calls involved, such as their date, time of day, and duration".

So probably the second database gave access to these additional details, whereas MAINWAY only contains or provides "summaries of one-hop chains", i.e. selector #1 was in contact with selector #2 and the number of times this happened within a specific timeframe.
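The difference between the two stores can be illustrated by collapsing detailed call records into one-hop summaries. In this Python sketch, which assumes hypothetical record tuples, the summary keeps only who was in contact with whom and how often within a time window, while date, time and duration would remain available only in the detailed records.

```python
from collections import Counter
from datetime import datetime

def one_hop_summary(records, start, end):
    """Count contacts per identifier pair (direction ignored) for calls starting in [start, end)."""
    counts = Counter()
    for caller, callee, when in records:
        if start <= when < end:
            counts[tuple(sorted((caller, callee)))] += 1
    return counts

# Hypothetical detail records: (caller, callee, start time).
records = [
    ("A", "B", datetime(2009, 6, 1, 9, 0)),
    ("B", "A", datetime(2009, 6, 1, 18, 30)),
    ("A", "C", datetime(2009, 6, 2, 12, 0)),
    ("A", "B", datetime(2009, 7, 1, 9, 0)),   # falls outside the June window
]
june = one_hop_summary(records, datetime(2009, 6, 1), datetime(2009, 7, 1))
print(dict(june))   # {('A', 'B'): 2, ('A', 'C'): 1}
```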

In the glossary of the 2009 NSA Review, the second repository is listed with a remarkably long name, which, according to its position, has to start with an M, N or O:

This exceptionally long name of the second database could indicate that it was some kind of provisional repository, because on page 23 of the 2009 BR FISA review it is said:

"NSA is preparing to incorporate the [second database] into the NSA corporate architecture. This transition to the corporate engineering framework will maximize use of the latest technologies and proven configuration management to minimize any security and compliance risks"

And indeed, in appendix B of a report (.pdf) by the NSA's Inspector General from August 1, 2012, we see that the second database now has a shorter name, and that it had replaced a "Transaction Database" with a much longer name in January 2011:

Transaction is another term that NSA uses for metadata, so "transaction database" probably just means that it contains the (full) metadata records. This 2012 Inspector General report lists three additional storage systems for BR FISA data, making a total of five being involved here:

1. Contact chaining database that accepts metadata from multiple sources (= MAINWAY)
2. Database repository that stores detailed metadata information, which supports the contact chaining summaries in [MAINWAY]. Replaced an earlier database in January 2011.
3. Contingency database for the time the aforementioned database was being rebuilt
4. System backup that stores an exact copy of the raw metadata from the providers
5. Backup tapes on which the raw metadata were periodically saved off-line

So if NSA needs large data centers, that is partly because the same sets of data are stored multiple times. Besides backups, there are often separate databases dedicated to a specific purpose or analysis method.

Bulk internet metadata (PR/TT)

As mentioned before, MAINWAY was not only fed with telephone metadata, but also with metadata from domestic internet communications. These metadata include the "to", "from", and "cc" lines of an e-mail, as well as the e-mail’s time and date. It seems that no metadata from other kinds of internet communications, like messengers, were used for contact chaining.
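To show which parts of an e-mail these metadata correspond to, the following Python sketch uses the standard `email` package to extract only the From, To and Cc addresses and the date from a raw message. How NSA actually extracted these fields from intercepted traffic is not known; the sample message and the `email_metadata` function are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

def email_metadata(raw_message):
    """Extract only the header fields named in the text: From, To, Cc and the date."""
    msg = message_from_string(raw_message)
    return {
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "cc": [addr for _, addr in getaddresses(msg.get_all("Cc", []))],
        "date": parsedate_to_datetime(msg["Date"]) if msg["Date"] else None,
    }

RAW = (
    "From: Alice <alice@example.org>\n"
    "To: bob@example.net, Carol <carol@example.com>\n"
    "Cc: dave@example.org\n"
    "Date: Tue, 01 Nov 2011 10:15:00 +0000\n"
    "Subject: not part of the extracted metadata\n"
    "\n"
    "The body is ignored as well.\n"
)
print(email_metadata(RAW))
```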

On August 11, 2014, an internal NSA Review (.pdf) about this PR/TT program was declassified, which shows similar storage systems as for the phone records: full copies of the internet metadata were also stored in the MAINWAY contact chaining database, as well as in a dedicated second repository:

The PR/TT bulk internet metadata program was shut down in December 2011 for "operational and resource reasons" and all data were deleted. Based upon declassified NSA reports, The New York Times reported on November 19, 2015, that this "internet dragnet" was ended because, among other reasons, similar results could be achieved under other authorities:

- Section 702 FAA, which allows access to internet communications between foreigners and Americans from the "PRISM-providers" and "Upstream collection".

- The SPCMA regulation, which allows using US person identifiers for querying metadata that have been collected abroad.

With internet metadata being collected both overseas (under EO 12333 authority) and at the physical and virtual borders of the US (under Section 702 FAA), NSA probably no longer needed the purely domestic metadata to capture those of interest.

Also, querying the metadata collected overseas appeared more attractive, because abroad NSA is allowed to collect many more types of metadata than inside the US, where collection was heavily restricted by the FISA Court.

In a declaration for the FISA Court from February 13, 2009, then-NSA director Keith Alexander explained that multi-tiered chaining of phone calls is more efficient and useful, "because unlike e-mail, which involves the heavy use of spam, a telephonic device does not lend itself to simultaneous contact with large numbers of individuals".
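Alexander's argument can be illustrated with a rough upper bound on the size of a contact chain. The Python sketch below assumes, unrealistically, that every identifier adds the same number of new contacts per hop with no overlap; the fan-out values of 40 for a phone and 1000 for a spam-heavy e-mail account are hypothetical.

```python
def chain_size_upper_bound(fan_out, hops):
    """Identifiers reached within `hops` hops if each identifier adds `fan_out` new contacts (no overlap)."""
    return sum(fan_out ** h for h in range(1, hops + 1))

for label, fan_out in (("phone", 40), ("spam-heavy e-mail", 1000)):
    print(label, [chain_size_upper_bound(fan_out, h) for h in (1, 2, 3)])
# phone [40, 1640, 65640]
# spam-heavy e-mail [1000, 1001000, 1001001000]
```

Even with these crude numbers, a third hop on a high-fan-out e-mail account reaches about a billion identifiers, which makes multi-hop chaining of e-mail far less useful than of phone calls.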

Update:
An SIDtoday newsletter from August 25, 2003, that was published in August 2016, revealed that "MAINWAY, a system that uses phone call contact chaining to identify targets of interest, was provided to each of our [Five Eyes] partners. The partners now supply additional contact information to the database to enhance the joint ability to identify targets". We have no documents that show that Second Party analysts are restricted from access to Section 215 metadata, but recently published dataflow diagrams show that MAINWAY has separate BRF(ISA) partitions.

Replacement?

According to the secret Budget Request to Congress for 2013, NSA wanted to create (or maybe expand MAINWAY into) a metadata repository capable of taking in 20 billion metadata records a day and making these available to analysts within 60 minutes.

But after Snowden disclosed the Verizon bulk phone records order in June 2013, the American public became aware of the actual scope of this program and it became the most controversial part of NSA's activities.

In January 2014, the Privacy and Civil Liberties Oversight Board (PCLOB) judged that Section 215 collection was actually of "minimal value in safeguarding the nation from terrorism" and that there was "no instance in which the program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack".

According to PCLOB, the bulk phone records did provide some value "by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus". This, however, was not seen as a sufficient justification for the large-scale collection of domestic phone records.

In the course of 2015, US Congress eventually enacted the USA FREEDOM Act, which prohibits NSA from collecting and storing domestic call records in bulk as of November 29, 2015. Instead, the agency now has to apply to the FISA Court for approval of specific selectors, which are then provided to telecommunication providers, who use them to query their own databases; only the results are handed over to NSA.
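The shift in where the query runs can be sketched in a few lines of Python. The provider names, record tuples and functions are hypothetical; the point is only that the full record sets stay with the providers, and NSA receives just the records that match the approved selectors.

```python
def provider_query(provider_records, approved_selectors):
    """Runs at the provider: return only records that involve an approved selector."""
    return [
        (caller, callee)
        for caller, callee in provider_records
        if caller in approved_selectors or callee in approved_selectors
    ]

def request_records(providers, approved_selectors):
    """Send the approved selectors to each provider and collect only the query results."""
    results = {}
    for name, records in providers.items():
        hits = provider_query(records, approved_selectors)
        if hits:
            results[name] = hits
    return results

providers = {
    "provider_1": [("A", "B"), ("C", "D")],
    "provider_2": [("B", "E"), ("F", "G")],
}
print(request_records(providers, {"B"}))
# {'provider_1': [('A', 'B')], 'provider_2': [('B', 'E')]}
```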

> See: Collection of domestic phone records under the USA FREEDOM Act

> Next: How NSA contact chaining combines domestic and foreign phone records

Links and Sources
- EmptyWheel.net: At the moment NSA shut down the PRTT metadata dragnet, FISC permitted it to query Upstream metadata (2017)
- Lawfare Blog: Understanding Footnote 14: NSA Lawyering, Oversight, and Compliance (2016)
- EmptyWheel.net: What We Know about the Section 215 Phone Dragnet and Location Data (2016)
- PCLOB: Report on the Telephone Records Program Conducted under Section 215 of the USA PATRIOT Act (.pdf) (2014)
- Cryptome.org: NSA FISA Business Records Offer a Lot to Learn (2013)
- US Administration White Paper: Bulk Collection of Telephony Metadata under Section 215 of the USA PATRIOT Act (.pdf) (2013)
- NSA: Business Records FISA NSA Review (.pdf) (2009)
- NSA: Pen Register/Trap and Trace FISA NSA Review (.pdf) (2009)
- Andrew P. MacArthur: The NSA Phone Call Database: The Problematic Acquisition and Mining of Call Records in the United States, Canada, the United Kingdom, and Australia (2007)

March 23, 2014

Video demonstration of two intelligence analysis tools

(Updated: May 9, 2015)

In a previous article we provided a very extensive description of a communications analysis tool used by the Canadian agency CSEC. Here we will show two video demonstrations of analysis tools which are used by intelligence and law enforcement agencies all over the world: Sentinel Visualizer and Analyst's Notebook.

Sentinel Visualizer

The first intelligence analysis program is Sentinel Visualizer, which was developed by FMS Advanced Systems Group. This is a 'minority-owned' small business founded in 1986 and based in Vienna, Virginia, which provides custom software solutions to customers in over 100 countries.

This video shows a demonstration of how the Sentinel Visualizer software program can be used to analyse telephony metadata in order to discover new targets:

FMS claims that In-Q-Tel, the CIA's venture capital arm, is an investor in FMS, apparently in order to improve its products so they fit the needs of the CIA. FMS also claims that its product is much cheaper than the alternative, with the price of a single-computer license for Sentinel Visualizer starting at 2,699 USD, while IBM's Analyst's Notebook starts at 7,160 USD.

Analyst's Notebook

Very similar to the Sentinel Visualizer is Analyst's Notebook, which was developed in the early 1990s by i2, a UK-based arm of software company i2 Group, which produced visual intelligence and investigative analysis software. After a number of acquisitions, it became part of IBM in 2011.

Both programs offer similar functions, like metadata/link analysis, call chaining, timeline views, social network analysis, geospatial visualizations, and the import of data from knowledge bases and other data sets.

For analysing telephony metadata, Analyst's Notebook has an extension called Pattern Tracer, which enables rapid pattern analysis for "quickly identifying potential targets and predict future incidents more accurately".
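Pattern Tracer's own algorithms are proprietary, but the basic idea behind a pattern-of-life view can be sketched generically: count a target's call activity per hour of the day and flag the hours that account for a large share of it. The timestamps and the 20 percent threshold in this Python example are hypothetical.

```python
from collections import Counter
from datetime import datetime

def activity_by_hour(timestamps):
    """Number of calls per hour of the day (0 to 23)."""
    return Counter(ts.hour for ts in timestamps)

def habitual_hours(timestamps, min_share=0.2):
    """Hours that account for at least `min_share` of all observed calls."""
    counts = activity_by_hour(timestamps)
    total = sum(counts.values())
    return sorted(h for h, n in counts.items() if n / total >= min_share)

calls = [
    datetime(2014, 3, 3, 8, 5), datetime(2014, 3, 4, 8, 10), datetime(2014, 3, 5, 8, 2),
    datetime(2014, 3, 5, 13, 30), datetime(2014, 3, 6, 22, 45), datetime(2014, 3, 7, 22, 15),
]
print(habitual_hours(calls))   # [8, 22]
```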

This video demonstrates how a "Pattern-of-Life Analysis" can be conducted by using Analyst's Notebook - Esri Edition:

Analyst's Notebook is said to be used by about 2500 intelligence, security and law enforcement agencies, as well as police forces (for example the Dutch police, the German Federal Criminal Police Office and the London Metropolitan Police) and investigative organizations and companies in over 150 countries. According to a range of job descriptions, Analyst's Notebook is also used by analysts at NSA.

Usage

As can be seen in the second video, these intelligence analysis tools are quite powerful and able to provide deep insight into the life of a targeted person. But the presentation also shows that this kind of surveillance consumes too much time and too many resources to be used against millions of innocent civilians.

Like the example in the second video, these tools are mainly used for operations against known and potential terrorists and a number of other people of interest, like drug and weapons traffickers, and also some high-level foreign government and military officials.

Regarding the intrusiveness of these tools, we should also keep in mind that they are used by law enforcement and police forces too. Whereas intelligence agencies generally use these tools to prepare reports for political and military decision makers, their use by the police in numerous criminal investigations can affect ordinary citizens much more directly.

Examples

On December 15, 2013, the CBS television program 60 Minutes provided some hitherto unseen views from inside the NSA headquarters. One of them showed an NSA employee demonstrating how the metadata contact chaining method works. The following screenshots show a tool very similar to the ones in the videos above:

Today, the German magazine Der Spiegel published in its print edition a slide from an NSA presentation that shows a contact graph based upon a social network analysis for the CEO and the Chairwoman of the Chinese telecommunications company Huawei:

(image provided by @koenrh)

See our previous article about the Canadian OLYMPIA tool for how intelligence agencies can map such a social communications network by using just one or two e-mail addresses to start with. See also an earlier article about how NSA used similar techniques to create contact graphs about the Mexican and the Brazilian president.

Update:
The presentation below shows how Analyst's Notebook was used in an operation in which Italian law enforcement tracked CIA operatives who were involved in kidnapping a Muslim cleric in Milan in 2003:

Links and Sources
- FMSASG.com: How Sentinel Visualizer is a Superior Alternative to IBM's i2 Analyst's Notebook

Welcome!

Here you can read about:
- Signals Intelligence,
- Communications Security,
- Top Level Telecommunications,
which means the equipment, past and present, that enables civilian and military leaders to communicate safely.


The main focus will be on the United States and its National Security Agency (NSA), but attention will also be paid to other countries, like Germany and the Netherlands.

Any comments, additions, corrections, questions or suggestions are very much appreciated! No login or registration is required for commenting.

twitter.com/electrospaces

mastodon.social/@Electrospaces

electrospaces.bsky.social

info (at) electrospaces.net


The header photo of this weblog shows the watch floor of the NSA/CSS Threat Operations Center (NTOC) in 2006. The URL of this weblog recalls Electrospace Systems Inc., the company which made most of the top level communications equipment for the US Government. All information on this weblog is obtained from unclassified or publicly available sources.

QW5kIGZpbmFsbHksIHRoaXMgaXMgd2hhdCBhIHRleHQgbG9va3MgbGlrZSwgd2hlbiBpdCdzIG9ubHkgZW5jb2RlZCB3aXRoIHRoZSBzdGFuZGFyZCBCYXNlNjQgc3lzdGVtLiBHdWVzcyBob3cgY29tcGxpY2F0ZWQgaXQgbXVzdCBiZSB3aGVuIGEgcmVhbCBzdHJvbmcgYWxnb3JpdGhtIHdhcyB1c2VkLg==