GDPR and Fingerprinting - how are you being tracked?

As the GDPR approaches, promising to reform our thinking on internet privacy, there seems to be a general scurrying among a whole host of digital marketing platforms to alter their approaches to tracking personally identifiable customer data. Never have the terms ‘1st party’ and ‘3rd party’ been so over-used! ‘Legitimate interest’ has now usurped ‘the dog ate my homework’ as the world’s favourite excuse for getting away with something you shouldn’t be doing!

But is GDPR really going to change the impact of digital tracking on consumers? Or will it simply push an already black-box art deeper into the digital underground?

The average internet user would be stunned at how invasive online tracking has become. And this has very little to do with the much-maligned cookie, which has been at the heart of so many misguided attempts to deliver greater internet privacy. Digital marketing is one of the most aggressive industries when it comes to tracking people online, because it is obsessed with the art of tracking a user’s online ‘fingerprint’. Sometimes users will be fully aware that this is happening. But, even when GDPR comes into force, many users will continue to be fingerprinted with increasing impunity by a host of online tracking companies who rely on the principle of uniquely identifying a user to power their business models.

What is Fingerprinting?

Once a term only heard if you were watching The Bill or having your collar felt, online fingerprinting is the art of persistently tracking and positively identifying a unique internet user. Tracking technologies attempt to continuously identify online users by collecting a combination of properties, some based on personal data and some based on device settings, to build a unique profile of a user.

Some fingerprinting methods rely on the collection of personal data to fingerprint. This data might include:

  • Email address
  • IP address
  • Name
  • Job/Company
  • Social media logins

Other fingerprinting methods use a collection of non-personal data, normally related to a user’s device:  

  • Browser user agent (a combination of browser version, operating system and installed items)
  • Location and time settings
  • Fonts
  • Audio settings
  • Battery Status

Tying information about a user to information about a device creates a highly distinctive identifier.

Impact of Fingerprinting

Online Fingerprinting has developed significantly over the last few years in direct response to the growing demands for internet privacy and the war this has waged on cookie-based internet tracking.

Google’s fingerprinting technology is the most widely propagated. In 2016, a Princeton University study by Englehardt and Narayanan showed that Google’s third-party HTTP requests were present on more than 80% of the top 1 million sites as listed by Alexa.

However, fingerprint tracking extends well beyond Google. It is performed by a whole host of online marketing companies for the purposes of delivering advertising and tracking its results. The same study found 81,000 different third-party HTTP requests used specifically for tracking on Alexa’s top 1 million sites.

The most definitive way to set a fingerprint against an individual user is to collect some form of first-party data (submitted with the user’s consent to the company that is collecting it). An email address is the most common way of doing this. For example, collecting and matching encrypted email addresses is how the vast majority of cross-device matching solutions work, because users will often access the same platform, using the same email address on more than one device.
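As a rough illustration of how email-based cross-device matching might work, the sketch below normalises each email address, hashes it, and groups devices by the resulting value. It assumes SHA-256 hashing stands in for what is described above as encryption, and the function names, device IDs and addresses are all hypothetical.

```python
import hashlib
from collections import defaultdict


def hash_email(email: str) -> str:
    """Normalise an email address and return its SHA-256 hex digest."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def match_devices(logins: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group device IDs by hashed email; each group is one presumed user."""
    devices_by_user: dict[str, set[str]] = defaultdict(set)
    for device_id, email in logins:
        devices_by_user[hash_email(email)].add(device_id)
    return dict(devices_by_user)


# Hypothetical login events: (device_id, email typed on that device).
logins = [
    ("laptop-1", "Jane.Doe@example.com"),
    ("phone-7", " jane.doe@example.com"),
    ("tablet-3", "sam@example.org"),
]

matched = match_devices(logins)
for hashed, devices in matched.items():
    # Only a hash prefix is printed; the raw address never leaves hash_email().
    print(hashed[:12], sorted(devices))
```

The laptop and the phone end up in the same group because the two spellings of the address normalise to the same string, which is why a persistent login is such a reliable fingerprint.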

Google and Facebook, with a combined share of digital advertising spend approaching 70%, have a significant advantage when it comes to collecting first-party fingerprints, because users are often persistently logged in to these platforms with a common email address. These platforms even allow the user to log in to a whole host of other websites, apps and online services with the same email address. Of all the online tracking domains being used today to collect the identity of internet users, Google owns 12 of the top 20.

Outside of Google and Facebook, Appnexus is the only other company with a greater than 10% coverage of their tracking domains on Alexa’s top 1 million sites. And it is at a company like this that the impact of GDPR will be most keenly felt. Without the natural consent that comes from being actively logged in to your Gmail or Facebook accounts, GDPR will force a company like Appnexus to seek affirmative consent from the user to have their online fingerprint taken using personal data. This will be totally impractical for companies used to harvesting this information in the background, and is likely to result in significant user opt-out that will negatively impact the advertising powered by these platforms. In December, Criteo forecast a 22% negative impact on revenue due to ongoing privacy upgrades to Apple’s Safari, such as ITP and the blocking of an HSTS browser cache fingerprint.

Many online tracking companies will rely on the GDPR provision known as ‘legitimate interest’ (Recital 47) to preserve the status quo of their business models. But most impartial experts believe that legitimate interest cannot be reliably and consistently established for collecting personal data without an internet user’s knowledge for the purpose of advertising, even if it is pseudonymised. The GDPR states very clearly that the rights of the user must always be maintained when establishing legitimate interest. The dilemma is obvious. How can a user retain the right to be forgotten if that user did not consent to, or even know about, being tracked in the first place?

These issues will throw users and online tracking companies on the mercy of more furtive, aggressive forms of online fingerprinting, already in use today and likely to become increasingly common in a post-GDPR world. These fingerprinting techniques will be used to protect the reliability of digital marketing’s online tracking systems, while floating under the radar of regulators. These forms of fingerprinting do not rely on personal data like email addresses or unique customer identifiers, but aim to make a ‘best guess’ at identifying a user based on a collection of data points that when amalgamated pass as unique identifiers. This form of online fingerprinting is sometimes referred to as probabilistic tracking.

Some of these fingerprinting techniques are very basic. The most common form of probabilistic fingerprinting is to match a user’s IP address with their browser’s user agent (a string identifying device, operating system, browser version and plugins), in theory creating a unique identification of the user. Unfortunately, 3G, 4G and 5G internet connections now mean only about 20% of IP addresses are unique to a user.

Also, the advent of smartphones (yes, that did happen in 2007) now means internet browsers cannot be customised like they used to be. Two people using an iPhone 7 and browsing the internet in London on Vodafone could therefore look like the same person to a tracking technology, even though they are two separate people who have never met.
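To make the weakness concrete, here is a minimal sketch of IP-plus-user-agent matching, assuming the tracker simply hashes the two values together. The function name, the IP addresses (taken from reserved documentation ranges) and the user agent string are illustrative assumptions, not any vendor’s actual method.

```python
import hashlib


def probabilistic_id(ip_address: str, user_agent: str) -> str:
    """Derive an identifier from IP address plus user agent string."""
    material = f"{ip_address}|{user_agent}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


# Hypothetical values: a shared mobile-carrier IP and a stock iPhone user agent.
SHARED_IP = "203.0.113.25"  # documentation-range address, not a real carrier IP
IPHONE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 11_2 like Mac OS X) "
    "AppleWebKit/604.4.7 (KHTML, like Gecko) Version/11.0 Mobile/15C114 Safari/604.1"
)

alice = probabilistic_id(SHARED_IP, IPHONE_UA)
bob = probabilistic_id(SHARED_IP, IPHONE_UA)
carol = probabilistic_id("198.51.100.4", IPHONE_UA)

print(alice == bob)    # True: two different people, one identifier
print(alice == carol)  # False: a different IP yields a different identifier
```

Nothing in the inputs distinguishes the first two visitors, so no amount of clever hashing can tell them apart.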

This flawed method of tracking is great for the privacy of the user because it has very little chance of accurately identifying anybody. But it’s very bad for the digital marketing vendors and advertisers that are spending millions of pounds based on the results of these spurious fingerprinting efforts. It is estimated that 19% of fingerprint matches made by online tracking systems are incorrect.

However, digital marketing will never abandon the need to track users accurately, and it will use almost any data point possible, consenting or not, to fingerprint a user accurately.

Canvas Fingerprinting has become increasingly popular in recent years for both its accuracy and the difficulty of blocking it. It works because a browser’s HTML Canvas allows a web application (like a piece of JavaScript placed on a website) to draw graphics in real time on the canvas, with functions to support drawing shapes, arcs, and text to a custom canvas element. The different ways a browser renders each font cause devices to draw the image differently. This allows the resulting pixels to be used as part of a fingerprint. Put another way, your computer draws a picture online without you knowing it, with all the unique properties and flaws it would have if you had drawn it yourself.
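The final step, turning rendered pixels into an identifier, can be sketched outside the browser. The example below is a simulation only: in a real page the pixels would be read back from the canvas element by JavaScript, whereas here a made-up `gamma` parameter stands in for device-specific rendering differences, and all function names and values are hypothetical.

```python
import hashlib


def render_glyph_row(coverage: list[float], gamma: float) -> bytes:
    """Simulate one row of anti-aliased text pixels.

    `coverage` is how much of each pixel the glyph covers (0.0 to 1.0);
    `gamma` is an assumed stand-in for device-specific rendering differences.
    """
    return bytes(round(255 * (c ** gamma)) for c in coverage)


def canvas_hash(pixels: bytes) -> str:
    """Hash the pixel buffer into a short identifier."""
    return hashlib.sha256(pixels).hexdigest()[:16]


# Same drawing instructions on every device (hypothetical edge coverage values).
coverage = [0.0, 0.25, 0.5, 0.75, 1.0]

device_a = canvas_hash(render_glyph_row(coverage, gamma=1.0))
device_a_again = canvas_hash(render_glyph_row(coverage, gamma=1.0))
device_b = canvas_hash(render_glyph_row(coverage, gamma=1.1))

print(device_a == device_a_again)  # True: the same device is stable across visits
print(device_a == device_b)        # False: a slightly different renderer stands out
```

The identifier is stable for one device and different for another, yet no cookie is stored and no personal data is requested.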

Englehardt and Narayanan identified Canvas Fingerprinting on 1.46% of Alexa’s top 1 million sites. It sounds small, but that’s still 14,371 of the world’s most heavily trafficked websites.

Canvas Font Fingerprinting is the beefed-up version of Canvas Fingerprinting, and adds matching of a browser’s font list to provide an even more unique fingerprint. An intricate form of fingerprinting, this was found to be present on less than 1% of the world’s top 1 million sites.

Audio Context and Battery API are two more complex fingerprinting techniques used to uniquely identify a user’s device without them knowing they are being tracked. Audio Context uses a browser’s Audio Context API to record the audio signature of the device, rather than the actual audio or video being played, while Battery API allows a tracking script to harvest information about the device’s battery settings. Again, combining these data points with other non-personal information can create a reliable, unique fingerprint of a user.
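One common way to reason about why combining weak signals works is to add up the identifying information each one carries, measured in bits. The sketch below assumes the signals are statistically independent, which real signals rarely are, and every share in it is an invented figure for illustration rather than a measurement.

```python
import math


def surprisal_bits(share_of_users: float) -> float:
    """Bits of identifying information in a value shared by this fraction of users."""
    return -math.log2(share_of_users)


# Hypothetical shares: the fraction of users who have the same value as our visitor.
signal_shares = {
    "user_agent": 1 / 500,
    "timezone": 1 / 20,
    "font_list": 1 / 2000,
    "audio_signature": 1 / 50,
}

# Assuming the signals are independent, their bits simply add up.
total_bits = sum(surprisal_bits(share) for share in signal_shares.values())
one_in = 2 ** total_bits
print(f"{total_bits:.1f} bits, roughly 1 in {one_in:,.0f} users")
```

Each signal on its own is shared by many people, but under these assumptions the combination narrows the field to about one user in a billion.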

And while these techniques might be relatively small scale at the moment, the key point is that they offer digital tracking providers a way to easily circumvent the restrictive consent requirements imposed by GDPR, which means their use is likely to become increasingly widespread post-GDPR.

For those of a more legal persuasion, Recital 29 of the GDPR deals with pseudonymisation (the data law makers even invented a new word for it!). The processing of pseudonymous data without affirmative user consent is allowed under GDPR, although the definition of pseudonymous is still being hotly debated. The GDPR uses the term pseudonymisation largely to classify the one-way transformation of personal data using a cryptographic hash function like SHA-256. However, leniency around pseudonymous profiling will almost certainly see digital tracking companies reach for these fringe fingerprinting techniques to replace the more common tracking techniques, like storing personal data in a first-party cookie.

Fingerprinting using some of the techniques outlined here is also notoriously difficult to detect. The most popular ad-blocker databases, the EasyPrivacyList and EasyList, detect less than 25% of Canvas Fingerprinting and, incredibly, less than 5% of AudioContext and Battery API fingerprinting.
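Part of the reason is that these lists match on where a script comes from rather than on what it does. The sketch below is a heavily simplified model of list-based blocking: it handles only the basic `||domain^` rule form and ignores everything else a real filter list contains. The filter entries and script URLs are hypothetical.

```python
from urllib.parse import urlparse


def blocked_domains(filter_lines: list[str]) -> set[str]:
    """Extract domains from simple '||domain^' rules, ignoring everything else."""
    domains = set()
    for line in filter_lines:
        rule = line.strip()
        if rule.startswith("||") and rule.endswith("^"):
            domains.add(rule[2:-1].lower())
    return domains


def is_blocked(script_url: str, domains: set[str]) -> bool:
    """True if the script's host is a listed domain or one of its subdomains."""
    host = (urlparse(script_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


# Hypothetical filter list and fingerprinting script URLs.
filters = ["! comment line", "||tracker.example^", "||ads.example^"]
scripts = [
    "https://cdn.tracker.example/fp.js",
    "https://ads.example/canvas.js",
    "https://static.unlisted.example/audio.js",
    "https://widgets.unknown.example/battery.js",
]

domains = blocked_domains(filters)
caught = [url for url in scripts if is_blocked(url, domains)]
print(f"{len(caught)} of {len(scripts)} scripts blocked")  # 2 of 4 scripts blocked
```

A fingerprinting script served from a domain nobody has reported yet passes straight through, whatever it does once it runs.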

There is a fine balance between disrupting internet journeys with increased consent notices and protecting a user’s personal data. GDPR will impact the digital marketing industry’s ability to deliver and track online advertising. Expect fingerprinting that does not rely on collecting personal data to become an increasingly used, and rarely discussed, alternative for the online tracking providers to maintain their careful, scrupulous watch on your internet behaviour.