July 14, 2017

Dutch report provides metadata numbers to compare with Snowden documents

(Updated: September 30, 2019)

Since the Snowden revelations, we know that signals intelligence agencies are trying to acquire large sets of telephone metadata in order to analyse them in support of protecting their national security.

Less known is that commercial companies also analyse similar big data sets, albeit for research purposes and with personal information being anonymized.

Now, a research report from the Netherlands provides us with actual numbers of mobile telephone metadata, which can be used to compare with the numbers that NSA and GCHQ collected according to the Snowden documents.


Tourist movements

A recently published report examines visitor movements in and around the Dutch capital of Amsterdam. It was prepared by the economic research company Decisio on behalf of the province of Noord-Holland and the municipalities of Amsterdam and Zandvoort.

For a few years now, Amsterdam has been struggling with a huge increase in tourists, but it was difficult to get detailed insight into where they come from, where they stay, and which areas of the city and which surrounding towns are most popular.

Now, these insights became available by using information from the "tracking device" carried by almost every individual: the mobile phone.


Anonymized data sets

Decisio acquired a huge set of mobile telephone metadata from Vodafone, which is the second largest provider in the Netherlands, with over 5 million customers. When they use their mobile phones, they connect to one of the 32.000 cell towers or base stations, which associate the phone number with a location.

Each month, Vodafone provides these data to another research company called Mezuro, which processes and analyses them to map the movements using a grid of 1250 regions containing multiple base station cells. The results were then analysed by Decisio and compared with other information sources.

But before that, the Vodafone metadata were anonymized by replacing every phone number with a random number that changes every month. Foreign phone numbers were replaced daily. Also, only the movements of groups of more than 15 numbers were reported, so it's impossible to track the movements of individual phone users.
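As a rough illustration of these two safeguards, the following Python sketch replaces each phone number with a random token that is regenerated for every period (a month for Dutch numbers, a day for foreign ones) and only reports regions in which more than 15 distinct numbers were seen. The class and function names, the token format and the data layout are assumptions made for this example; the report does not describe how Vodafone or Mezuro actually implement this.

```python
import secrets
from collections import Counter

# Per the report, only groups of more than 15 numbers are reported.
MIN_GROUP_SIZE = 15


class Pseudonymizer:
    """Hypothetical helper: one random token per phone number per period."""

    def __init__(self):
        self._tokens = {}  # (period, phone_number) -> random token

    def token(self, phone_number, period):
        key = (period, phone_number)
        if key not in self._tokens:
            # A fresh random value, unrelated to the phone number itself.
            self._tokens[key] = secrets.token_hex(8)
        return self._tokens[key]


def reportable_counts(observations, pseudonymizer, period):
    """Count distinct pseudonyms per region and drop groups of 15 or fewer.

    observations: iterable of (phone_number, region) pairs for one period.
    """
    seen = {
        (pseudonymizer.token(number, period), region)
        for number, region in observations
    }
    counts = Counter(region for _, region in seen)
    return {region: n for region, n in counts.items() if n > MIN_GROUP_SIZE}
```

Because the token changes with the period, the same phone cannot be followed from one month (or, for foreign numbers, one day) to the next, and the threshold keeps small groups out of the output altogether.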



Development of the average number of daily mobile phone transactions
from Vodafone users between January 2013 and September 2015
(source: Decisio research report)


Mobile phone metadata

The most interesting parts of the report are the details about the telephone metadata: Mezuro periodically receives information about some 3 million Vodafone phone numbers that are active on a daily basis. These phones generate 400 million "transactions", or Call Detail Records (CDRs), a day.

These transactions are the moments that a mobile phone connects to a cell tower, not only for a phone call or a text message (SMS), but increasingly often for a social media posting, sending or receiving an e-mail, a Google search or checking a website - for Dutch users, this is on average about 133 times a day (400 million transactions divided by 3 million phones). Besides these communication transactions, mobile phones also connect to cell towers without transmitting any kind of message; these non-communication transactions are only stored for up to a few hours.

An article from October 2013 about Mezuro says that the company analyzed some 150 million data points daily and that an average smartphone connects 150 to 200 times a day with a cell tower.

This number was confirmed during a parliamentary hearing in Germany, when someone from BND explained that one cell phone generates between 100 and 200 metadata and business records a day.

Update: similar numbers come from the Russian billing (and interception) company Peter-Service, which installs billing systems capable of handling 18 million subscribers creating 2,851 billion transactions a day, which equals 158 transactions per subscriber per day.

If we take these metadata as the rows of a (database) table, each of them contains multiple fields, corresponding to columns for pieces of information such as the number calling, the number called, date, time, cell tower location, and information needed to transfer various types of messages.
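To make the row-and-column picture concrete, here is a minimal Python sketch of what one such record could look like. The field names and the example values are invented for illustration; real CDR layouts differ per operator and contain many more fields.

```python
from dataclasses import dataclass, fields
from datetime import date, time


@dataclass(frozen=True)
class CallDetailRecord:
    """Hypothetical layout of a single metadata record (one table row)."""

    calling_number: str
    called_number: str
    event_date: date
    event_time: time
    cell_tower_location: str
    transfer_info: str  # whatever is needed to route the message type


# A made-up example row; the numbers and tower id are fictitious.
record = CallDetailRecord(
    calling_number="0600000001",
    called_number="0600000002",
    event_date=date(2013, 1, 1),
    event_time=time(12, 30),
    cell_tower_location="tower-0001",
    transfer_info="SMS",
)

# The column names of the table are simply the field names of the record.
field_names = [f.name for f in fields(CallDetailRecord)]
```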


Multiplying

For the tourism research, the Vodafone data were multiplied in order to get the numbers for the full population. The multiplier changes daily depending on the day of the week, holidays, etc, but lies roughly around 5 (for foreign visitors it's much more difficult to calculate this number).

As this total also includes people who don't use a mobile phone, the multiplier for the total number of metadata must be lower. According to the report, the users of the Vodafone network account for 1/3 of all mobile phone users in the Netherlands, so here we can use 3 as multiplier.

That means that in 2016, all Dutch mobile phones generated some 1200 million transactions a day. In a month that's over 36 billion and in a year 432 billion telephone metadata records.

For comparison with the numbers from the Snowden documents, we have to look for the numbers from early 2013. The chart from the report shows that in January 2013, there were ca. 85 million transactions by Vodafone users a day, which makes 255 million for all Dutch users. In a month that's 7,65 billion.
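The extrapolation used here is simple enough to check in a few lines of Python. The sketch below uses the multiplier of 3 from the report and assumes a 30-day month, which is the simplification behind the monthly figures above; the function and constant names are just for this example.

```python
MARKET_MULTIPLIER = 3   # Vodafone users are about 1/3 of Dutch mobile users
DAYS_PER_MONTH = 30     # simplifying assumption for the monthly totals


def national_estimate(vodafone_daily):
    """Scale daily Vodafone transactions to all Dutch users, per day and per month."""
    daily = vodafone_daily * MARKET_MULTIPLIER
    monthly = daily * DAYS_PER_MONTH
    return daily, monthly


# January 2013: ca. 85 million Vodafone transactions a day
daily_2013, monthly_2013 = national_estimate(85_000_000)
# -> 255 million a day, 7.65 billion a month

# 400 million Vodafone transactions a day
daily_2016, monthly_2016 = national_estimate(400_000_000)
# -> 1200 million a day, 36 billion a month
```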


Numbers from BOUNDLESSINFORMANT

Now, let's take a look at some of the numbers from the Snowden revelations. For the Netherlands there was a chart from the NSA tool BOUNDLESSINFORMANT, which shows 1,8 million telephone metadata records for 30 days around January 1, 2013.

Initially it was thought that these were Dutch data sucked up by NSA, but later it came out that they were actually collected by Dutch military intelligence, most likely in Afghanistan, and subsequently shared with the Americans.


Now that we know that in the same period of time, the Dutch mobile phone users alone already accounted for over 7 billion metadata, 1,8 million is a tiny number, maybe generated by not more than 2500 smartphones. In Afghanistan, old-fashioned cell phones may have created fewer transactions, so the 1,8 million metadata could have been the traffic captured from a small town.

Update: On Twitter, a Dutch journalist involved with the Snowden revelations said that the 1,8 million records represent some 12 million pieces of metadata (which means one record consists of at least 6 fields) and that the Dutch Ministry of Defence had confirmed that they were collected from Somalia.


The BOUNDLESSINFORMANT chart for the Netherlands with data
collected from December 10, 2012 to January 8, 2013


In late 2013, major European newspapers published similar charts for other countries too, again claiming that they showed how many phone calls NSA was intercepting. But even if those claims were true, the 70 million that the BOUNDLESSINFORMANT chart presented for France, 60 million for Spain, 45 million for Italy and 33 million for Norway are tiny numbers given the actual 7,65 billion metadata for a small country like the Netherlands.

Even the 552 million metadata in the chart for Germany doesn't come close. If the Netherlands with some 16 million people generated 7,65 billion mobile phone metadata a month, then for 80 million German citizens that number would be over 38 billion.
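The per-capita scaling behind this estimate can be sketched in Python as follows. It assumes, as the text does, that German users generate metadata at the same rate per inhabitant as Dutch users; the variable names are only for this example.

```python
NL_MONTHLY = 7_650_000_000   # Dutch mobile metadata per month, early 2013
NL_POPULATION = 16_000_000
DE_POPULATION = 80_000_000

# Same metadata rate per inhabitant, scaled to the German population.
germany_estimate = NL_MONTHLY * DE_POPULATION // NL_POPULATION
# -> 38.25 billion a month

# Monthly numbers from the BOUNDLESSINFORMANT charts mentioned above.
chart_numbers = {
    "France": 70_000_000,
    "Spain": 60_000_000,
    "Italy": 45_000_000,
    "Norway": 33_000_000,
    "Germany": 552_000_000,
}

# The German chart number as a share of the estimated German total (about 1.4%).
germany_share = chart_numbers["Germany"] / germany_estimate

# Every chart number is smaller than the Dutch monthly total alone.
all_below_dutch_total = all(n < NL_MONTHLY for n in chart_numbers.values())
```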

And to be clear: the data represented in these specific BOUNDLESSINFORMANT charts were not collected by NSA in Europe, but shared with NSA by European intelligence agencies, as part of their military cooperation in various crisis zones.


NSA and GCHQ totals

Finally, we can look at how many telephony metadata NSA and GCHQ collect in total and compare that with the numbers from the Netherlands. In 2012, the British GCHQ "was handling 600m 'telephone events' each day" - according to Snowden documents seen by The Guardian.

This seems a surprisingly small number compared to the 255 million transactions generated each day by Dutch users, but it's possible that the 600 million only apply to traditional telephone and SMS metadata, excluding the internet data from smartphones.

The NSA collected a total of 135 billion telephone metadata a month during the first half of 2012. This is some 17 times the amount for the Netherlands as a whole - again not a very excessive number, as it would roughly equal the telephone metadata of around 300 million people, which is more or less the population in the Middle East.

Update: It should be noted that the NSA's total of 135 billion telephone metadata includes both landline and cellphone metadata, where the 7,65 billion metadata for the Netherlands are only those generated by mobile phones.
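For those who want to verify the comparison, here is a small Python sketch of the arithmetic. It takes the Dutch per-capita rate of early 2013 as a yardstick, which is a strong simplification: the NSA total includes landlines and the Dutch number does not, and usage patterns differ per region.

```python
NSA_MONTHLY = 135_000_000_000   # NSA telephone metadata per month, first half of 2012
NL_MONTHLY = 7_650_000_000      # Dutch mobile metadata per month, early 2013
NL_POPULATION = 16_000_000

# How many times the Dutch monthly total fits into the NSA monthly total.
ratio = NSA_MONTHLY / NL_MONTHLY

# Number of people who would generate the NSA total at the Dutch rate.
equivalent_population = ratio * NL_POPULATION

print(f"{ratio:.1f} times the Dutch total")                          # 17.6
print(f"about {equivalent_population / 1e6:.0f} million people")     # 282
```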



The volumes of NSA metadata collection between January and June 2012


Conclusion

During the Snowden revelations, the press was eager to present numbers about NSA and GCHQ data collection that seemed impressively high. But not a single media outlet took the time or effort to come up with the total numbers of telephone and internet communications needed to put them in the right perspective.

From the research report about Amsterdam tourism we finally learned what the actual numbers of mobile telephone metadata for a western country look like. Although we still don't know how exactly NSA and GCHQ are counting their metadata, comparing them to the numbers from the Netherlands shows that their collection efforts may not be as excessive as initially thought.



Links and sources
- Decisio: Bezoekersstromen Amsterdam - Zandvoort (2017)
- Autoriteit Consument & Markt: Telecommonitor eerste kwartaal 2016 (2016)
- ITU: Innovation of tourism statistics through the use of new big data sources (2014)
- CBS: Rapportage project impact ICT; Mobiele telefonie (2013)
- Data Management: Building a Data Warehouse of Call Data Records (CDR): Solution Considerations (2012)


One interesting result from the tourism report is that measuring the number of visitors of Amsterdam's annual Gay Pride showed that instead of the 560.000 visitors claimed by the organisation, only 115.000 visitors came from outside the city center, in addition to the 235.000 people who are present on every Saturday and may or may not have watched the event. This confirms that visitor numbers for free public events are often significantly exaggerated.
