In the wake of the revelations of data capture by Apple and Android phones, there is an amazing article in the WSJ about the enormous databases of information collected by tracking cellphone usage. The article is eye-opening about both the types of data that cellphones can track (locations, calling data, messaging activity, search requests, online activities, movements, proximity to other people, light levels, rotation, direction) and the range of questions that the data can be applied to (sickness, stock movements, political trends, drug marketing, health care decisions). The number of records collected in these databases is also staggering. The article refers to a database of 100,000 European mobile users with 16 million records of cell data, time, and position, available to researchers at Northeastern; records of hundreds of thousands of AT&T subscribers in NY and LA, which AT&T’s researchers are using to study commuting; a database of location and billing data from a billion calls in Belgium; and “anonymous” location data that Verizon and Sprint (at least) provide to a company called AirSage so that it can track millions of cellphones and billions of data points to generate live traffic reports.
The power and utility of these data sets are fascinating and, when one thinks about it, not so surprising. What is surprising is how this data collection appears to have fallen through the cracks of privacy concerns, particularly given the huge uproar over behavioral advertising and proposals such as Do Not Track. In comparison, the wireless databases seem much more threatening to privacy, and the fact that companies have been mining the data and sharing it with others without oversight or debate is mind-boggling.
To summarize and simplify what I know about privacy law, the fractured regime in the US focuses on a concept called personally identifiable information (“PII”). PII is data such as names, phone numbers, email addresses, Social Security numbers, or other information that directly links to someone’s identity. If something is PII, then there may be restrictions or duties related to its collection (by an authorized party from users), use (by the authorized collector), disclosure (sharing the PII with a third party), and access (by the user). If something is not PII, there are probably no such duties right now, though this is being intensely debated. The problem is that the more data you collect and the more powerful the tools for analyzing it, the easier it becomes to take a collection of non-PII data and link it back to an individual user. This has been shown again and again in the behavioral advertising context and, according to this article, it has also been shown in the wireless context (unsurprising given the robustness of the databases).
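To make the linking point concrete, here is a minimal sketch in Python of how records with no names attached might be tied back to people. Everything in it is hypothetical: the pseudonyms, cell IDs, names, the 9-to-18 "daytime" window, and the helper names `home_work_pairs` and `reidentify` are my own illustrations, not anything described in the article or used by the carriers or researchers it mentions. The assumption is simply that an outsider knows roughly where a few people live and work.

```python
from collections import Counter

# Hypothetical "anonymous" records: (pseudonym, hour_of_day, cell_id).
records = [
    ("u1", 2, "A"), ("u1", 3, "A"), ("u1", 14, "B"), ("u1", 15, "B"),
    ("u2", 1, "C"), ("u2", 4, "C"), ("u2", 13, "B"), ("u2", 16, "B"),
    ("u3", 2, "A"), ("u3", 23, "A"), ("u3", 11, "D"), ("u3", 15, "D"),
]

# Hypothetical outside knowledge: name -> (home cell, work cell).
known_places = {
    "Alice": ("A", "B"),
    "Bob": ("C", "B"),
    "Carol": ("A", "D"),
}

def home_work_pairs(records):
    """Guess each pseudonym's (home, work) cells from its most frequent
    night-time and daytime locations (daytime assumed to be 9:00-18:00)."""
    night, day = {}, {}
    for pseudonym, hour, cell in records:
        bucket = day if 9 <= hour < 18 else night
        bucket.setdefault(pseudonym, Counter())[cell] += 1
    pairs = {}
    for pseudonym in night.keys() & day.keys():
        home = night[pseudonym].most_common(1)[0][0]
        work = day[pseudonym].most_common(1)[0][0]
        pairs[pseudonym] = (home, work)
    return pairs

def reidentify(records, known_places):
    """Link a pseudonym to a name when exactly one known person
    shares its inferred (home, work) pair."""
    by_pair = {}
    for name, pair in known_places.items():
        by_pair.setdefault(pair, []).append(name)
    matches = {}
    for pseudonym, pair in sorted(home_work_pairs(records).items()):
        candidates = by_pair.get(pair, [])
        if len(candidates) == 1:
            matches[pseudonym] = candidates[0]
    return matches

matches = reidentify(records, known_places)
print(matches)  # {'u1': 'Alice', 'u2': 'Bob', 'u3': 'Carol'}
```

Note that no single field identifies anyone in this toy data: two pseudonyms share home cell A and two share work cell B. It is the combination of the two that singles each person out, which is the sense in which a pile of non-PII can end up behaving like PII.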
There are a lot of good reasons for creating systems that encourage users to consent to sharing their info, including the potentially beneficial research that can be done and, as in the behavioral advertising context, the great services that are supported by ad revenue and are otherwise “free” to users. That said, society is still struggling through these issues on the behavioral advertising front, and it is amazing that this data on the wireless front has been collected and shared with researchers and companies with so little scrutiny to date. This will, as it should, spark a serious debate.