Profiling moves beyond the cookie

With the e-Privacy Directive now entering into force across Europe with its upgraded cookie rules, it was disappointing to read this new academic article about “browser profiling”, a practice that achieves profiling without using cookies.

The e-Privacy Directive does not mention cookies by name in its Article 5.3, although they are referred to in the recitals. Instead the Article addresses “the storing of information, or the gaining of access to information already stored, in the terminal equipment of a subscriber or user”. The notion was of a unique identifier (UID) stored on the user’s computer, enabling successive visits to a website to be matched. The provision is beginning to fail one of the EU’s stalwart regulatory principles – technology neutrality.

Browser profiling involves a webpage loading a script that combines a set of relatively unique (‘high entropy’) features of the browser’s environment, in particular the list of fonts available, into a UID sent back to the website. The resulting profile is unique in seemingly 95%+ of cases, and does not require the local storing of information.
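
To make the mechanism concrete, here is a minimal sketch (not the paper’s actual code – the attribute names and values are invented) of how such high-entropy features can be combined into a UID without storing anything on the user’s machine:

```python
import hashlib

def browser_fingerprint(attributes: dict) -> str:
    """Hash a set of high-entropy browser attributes into a stable UID."""
    # Canonicalise key order so the same environment always yields the same hash.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values of the kind a fingerprinting script can read out.
visitor = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.1; rv:12.0)",
    "screen": "1920x1080x24",
    "timezone": "UTC+01:00",
    "fonts": "Arial,Calibri,Comic Sans MS,Garamond,Verdana",
    "plugins": "Flash 10.1,Java 1.6,QuickTime 7",
}
uid = browser_fingerprint(visitor)
```

Nothing needs to be written to the terminal equipment: a revisit from the same environment simply reproduces the same hash, which is why Article 5.3’s storage-based wording misses the practice.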

But while the Directive does not apply, the industry responses to policy-maker anxiety about cookies do. The IAB Europe self-regulation rules and the Do Not Track (DNT) provisions refer to the practice of ‘tracking’ or ‘profiling’, rather than to particular technologies. While the main online ad companies and publishers are complying with these rules, the article’s authors lament the fact that the sites they found using fingerprinting (404 of the top 1 million sites, see chart) were also not respecting the DNT signal. That can hardly be a surprise, as few were providing details of their practices in their privacy notices either.

[Chart: sites in the top 1 million found using fingerprinting]

So the real issue is enforceability against bad actors. This is a function of the enforcement agents’ (data protection agencies’) determination, and of the assistance the law provides in proving a transgression. I believe that privacy-based ‘collection & processing of personally identifiable information’ approaches are poorly suited to these instances. Recall that browser profiling is based on looking at installed fonts!

Are there other approaches possible? The draft EU Privacy Regulation promises a ban on profiling, which passes the technological neutrality test, but in the process has been criticised for over-broadly inhibiting all manner of legitimate business activity. Another approach is to look at this kind of practice from a cybercrime perspective – for example Article 13.4 of the e-Privacy Directive provides that “the practice of sending electronic mail for the purposes of direct marketing which disguise or conceal the identity of the sender … shall be prohibited”. Interestingly, the article refers to the practice of some browser profiling code deleting itself after having sent the fingerprint back to the site owner (presumably to make itself harder to detect).

Learning to love losers

The economy is not a zero-sum game, but productivity gains do require inefficient firms to cede markets to efficient ones. Where there are winners, there must also be losers, and losing may ultimately mean bankruptcy and job losses. But, and this is the key point, productivity gains lead to higher average living standards, and fast-growing companies create the majority of the new jobs in an economy.

One of the most frequently cited differences between the US and the EU is the lack of an entrepreneurial, risk-taking culture on the eastern side of the Atlantic. But do we only have ourselves to blame?

  • We hardly cherish wealth generators; and
  • we ‘punish’ (both socially and more structurally) those that go bankrupt.

The result – as elegantly shown in the chart from the Lisbon Council’s Plan I, published today – is a corporate environment where a ‘comfortable’ life is too often preferred over the excitement of mega growth and the tension of rapid reversal. Here lie, however, the roots of the transatlantic productivity gap and of Europe’s slower economic recovery.

[Chart: from the Lisbon Council’s Plan I]

Net neutrality: unneeded regulatory complexity

I was invited to provide a ‘firestarter’ contribution at a WEF / BCG event about telecoms today. The conversation was held under the Chatham House Rule, so I won’t report on the details, but I can flesh out my contribution.

Call me cynical, but the debate supposedly about ‘net neutrality’ is really more about who gets rents in the burgeoning digital economy. It is therefore rife with spin.

In the traditional telecoms world, there has long been a refrain of “investors in infrastructure” versus “[mere] providers of services”. I always feel that the reference to “OTT (over-the-top)” players by telcos is little more than an attempt to recast that prejudice in a modern light.

It made little sense before, and makes even less now. A Hollywood studio makes several $100m films each year, each of which risks a panning by critics (or, more importantly nowadays, on social networks). And if we think of the main software platforms as “R&D” based, then their risks become more discernible. Startups take this to the extreme, committing an even higher proportion of their budget to R&D, often with even greater uncertainty.

In a final straw poll of participants, about half the room thought the demand side for telcos was uncertain in the <5 year horizon, but few (if any) had doubts about the longer term. Returns have always been linked to risks, and we shouldn’t lose sight of that when thinking about regulating the interaction between telcos and online players.

And that’s where I have issues with net neutrality, as I don’t think telco regulators are well placed to compare risk profiles. Indeed, I’m not sure anyone can and consequently believe in leaving it to the market.

So I also diverge from the protagonists of net neutrality, who are effectively calling for regulatory oversight.

It was interesting to note that in the recently published Startup Manifesto, 9 of Europe’s leading entrepreneurs did not include the issue in their prioritised set of policy prescriptions. It might have made the long list, but the key issues to unlock jobs and growth from Europe’s startups lie in other policy areas, such as education/training and privacy. The net neutrality ‘debate’ is consequently being conducted by large companies with sufficient means to defend their interests without a regulator’s support.

My focus would be on the (large) corporate protagonists of net neutrality. If they changed their mind and purchased special access to telco networks, that would create additional barriers for their startup competitors. Moreover, as the deals would probably be limited to the larger ISPs, it would accelerate ISP consolidation too. That’s not a reason to expand the powers of regulators, but it does require vigilance by the competition authorities.

Some lessons for government from social networking

(An essay I wrote in 2008)

Introduction

“Web 2.0” is hardly common parlance, but it is also not mere hype.  The basic notions are well understood, not least in contrast to Web 1.0 – the bi-directionality of traffic represents a fundamental evolution of the Internet as a medium, and the Internet relative to previous media.  The consequences for business and citizens are the subject of many books.  I want to explore some issues for public policy, and in particular enforcement.

The Internet raises many public policy concerns that are not unfamiliar.  For example, promoting economic growth while safeguarding the interests of children and other vulnerable groups; balancing privacy against commercial and public security needs; and asking whether all these extra electrical gadgets can ultimately be good for the environment, even if they promote more distance working, learning and shopping.  However, the Internet puts all of these issues in a new context, with scale and global reach being probably the two most common and most challenging aspects.

The Internet may be virtual, but it is a medium – in other words it is about content, and content has repercussions in the real world – emotional, political, sometimes criminal.  Content can therefore be defined as “harmful” or “illegal”, but there are large differences in these definitions between regions and between countries.  Yet nearly all Internet content is available to everyone.  (Although vastly important, I will return to the global aspect only in section 3.)

And because everyone is connected to every website, the first indications of the scale of content available become clear.  And to that need to be added the rapidly declining costs for consumers of making, replicating and publishing the 1s and 0s that make up online content, which are fuelling user generated content in a manner that has no real prior analogue.

This paper is divided into three parts.  The first discusses some of the current policy concerns about online content.  The following section looks at the types of regulation being applied and, more importantly, how they are being enforced.  It also introduces some of the mechanisms of mass collaboration that characterise Web 2.0.  The final section considers how these mechanisms, which are today not exploited for the benefit of public policy, could be employed for the public good.

Public policy concerns about content

A first important distinction to make is between illegal content and harmful content, where the latter means content that is inappropriate for some audiences (typically children).  Harmful content is a concern, but policy towards the protection of minors has, however, tended to emphasise child abuse images – i.e. illegal content – above all else.

An industry has built up to provide software tools for parents to monitor/ control the online activities of their children – the French and Australian governments have gone as far as mandating their Internet Access Providers (IAPs) to provide these technologies as part of their service.

Meanwhile, the Commission provided early backing for the ICRA tagging technology.  This ought to have been the sort of open, flexible, de-centralised system that could grow with the Internet, but it has largely failed to make an impact on the ground.  ICRA’s failure is telling.  The core feature is a meta language that website creators can use to “rate” their site.  Meanwhile, parents can adjust the filter settings of their children’s browser to block content deemed unsuitable.  The system needed a number of things to go right to succeed:

• A substantial number of websites needed to self-rate (accurately).
• The major browsers needed to incorporate the technology.  Internet Explorer did, as did (more reluctantly) the then market leader, Netscape.
• Parents needed to adopt the technology.

Success would have been easy to measure – what default setting (i.e. what the browser should do with an unrated site) did parents choose in practice?  At the outset, when few sites would have been rated, parents probably felt they had no choice but to let unrated sites through, as otherwise they would have spent all their time unlocking sites that their children wanted to visit.  ICRA would demonstrably have succeeded once most parents switched the default setting to “block unrated” sites.
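
The blocking decision just described is simple enough to sketch; the function below is illustrative only (the parameter names are mine, not ICRA’s), but it shows why the default for unrated sites is the crux:

```python
def should_block(site_rating, max_allowed: int, block_unrated: bool) -> bool:
    """ICRA-style filtering decision for one page.

    site_rating   -- the site's self-declared rating level, or None if unrated
    max_allowed   -- the highest rating level the parent permits
    block_unrated -- the default setting: what to do with unrated sites
    """
    if site_rating is None:
        # With almost no sites rated, block_unrated=True blocks nearly
        # everything, so parents left it False -- and the scheme lost its teeth.
        return block_unrated
    return site_rating > max_allowed
```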

In practice, the switch never took place.  Only the pornography industry felt the need to “show willing” and rated their sites.  Meanwhile, neither Firefox nor Opera felt there was sufficient consumer demand to incorporate ICRA in their modern browsers, and it is not present on the browsers available on GSMs and other mobile devices.  Most parents arguably still remain unaware of the technology (Germans may be the exception), and the challenge of persuading the now millions of e.g. MySpace users to rate their own content (often containing posts from third parties anyway) is of course only greater.

The more recent attempt to control harmful content has been the revision of the TV without Frontiers Directive, now called the Audiovisual Media Services Directive.  The revisions work by analogy: in the TV world it was possible to identify the “broadcaster” as the central actor to whom regulation could, in general, be applied regarding the scheduling of harmful content (the “watershed”), European content (quotas) and advertising (time limits).  In the recent revisions, new definitions were introduced to identify two types of operator: linear and non-linear audiovisual media service providers.  The former essentially means broadcasters, while the latter covers some part of the emerging video-on-demand services.  The Directive is not believed to cover services such as YouTube – the notion being that such video “platforms” are more akin to the mere conduits and webhosters that are not expected to be aware of everything available (as a result of the E-Commerce Directive).  At the same time, however, the confidence that the “no monitoring” obligation makes political sense is now subject to considerable reflection: to the extent that it was grounded in a technical assessment of what ISPs could realistically do, Moore’s law is progressively shifting expectations.  What member states will make of the Directive when they transpose it therefore remains to be seen.

Illegal content is, at first sight, the work of law enforcement authorities.  Nonetheless, many ISPs have offered their pro-active support, in particular in the fight against child abuse images.  Faced with spiralling costs of handling spam (and related cyber nasties) and, often even more costly, of fielding complaints from customers, ISPs are also active in the fight against those that send bulk emails.  This is especially true in the US.

However, with the internet becoming ever more important in daily life, the enforcement community has upped its asks of the ISPs.  The police now seek far greater retention of data, while rights holders seek cooperation to discourage, or even to prevent, use of the Internet for copyright infringement.  Add to that the traditional hesitations about being involved in other civil wrongs (such as defamation), and ISPs have been left rather on the defensive.  This seems likely to continue, as the trend towards “everything in the cloud” is going to leave many policy makers wanting something national and specific to fix on to in order to show that they have taken action to protect the public interest.

Alongside the challenge of the exponential growth in the amount of content online, a similarly explosive growth in the number of content hosts is taking place.  Spam provides the clearest example, with the vast majority of unwanted email now being channelled through compromised end user computers (so-called ‘botnets’).  Keeping a ‘blacklist’ of spam mail servers is progressively harder (most ISPs provide Internet access to some users with compromised computers), and beyond technical measures to filter outgoing emails the other option for an ISP is to inform their customer of the compromised nature of their computer – a costly call centre activity at the very least!  But with botnets now being used to mount denial of service attacks – cyber warfare – and to provide a sort of dynamic hosting environment that is near impossible to disrupt (e.g. fast-flux hosting of terrorist information), the public policy interest in addressing botnets should be far higher.

Rights holders also claim to suffer from the spiralling increase in individual users hosting and sharing copyrighted content.  As with spam, their focus is on the ISPs that (they claim) should do more to staunch outflows of copyrighted content.

Last, and almost certainly not least, are the as-yet unclear consequences of growth in other connected devices, notably mobile phones.

Mass collaboration

I have taken the term mass collaboration from the subtitle of Wikinomics, by Don Tapscott & Anthony D. Williams.  The quintessential example is Wikipedia – the user-generated encyclopaedia that now rivals its commercial counterparts.  Wikinomics, however, provides insight into how the same concepts are revolutionising business life far more generally.  Our interest here is in how mass collaboration relates to public policy.

The Internet continues to grow daily at a tremendous rate.  Rapid development by commercial enterprises is being dwarfed by the growth of user created content.  This ranges from photo and video sharing, to blogging and contributions to discussion fora.  Although some prophesied chaos, the reality is very different – quality materials still rise readily to the top.  This is largely due to the role of ‘meta-data’: links, ratings & tags.

For example, many consumers rate online sellers.  This can be retailers (e.g. those in the Amazon market place), peer sellers (e.g. on eBay) or services (e.g. Trip Advisor for hotels).

Editorial content is subject to a more complex process.  Bloggers link to interesting content, and sometimes tag their postings; readers can rate content that they see, hear or read.  Google’s search algorithms put huge stress on the mesh of links that its spiders unearth; Digg’s algorithms process the positive and negative ratings to determine what is the story of the moment.  The cumulative reports from users that a particular email is spam help email service providers refine their spam filters.  YouTube has an “inappropriate content” button that appears to lead quickly and effectively to the removal of, for example, pornography.  Yes, the systems can be gamed, but the reality is that the major sites process so much meta-data so quickly that this has not proved to be a major distraction for most users.
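
The spam-reporting feedback loop, for instance, can be reduced to a very small model (the class name and threshold here are invented for illustration, not any provider’s actual system):

```python
from collections import defaultdict

class SpamReportAggregator:
    """Accumulate users' 'this is spam' reports and flag repeat offenders."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold       # reports needed before filtering kicks in
        self.reports = defaultdict(int)  # sender -> number of user reports

    def report(self, sender: str) -> None:
        self.reports[sender] += 1

    def is_spammer(self, sender: str) -> bool:
        return self.reports[sender] >= self.threshold

agg = SpamReportAggregator()
for _ in range(3):
    agg.report("bulk@example.com")
```

Real providers weight reports by reporter reliability and decay them over time, but the principle – user meta-data feeding an automated filter – is the same.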

Rating is straightforward compared to tagging content.  Traditionally, taxonomists developed organised classifications.  On the Internet, no classification system is imposed – anyone can choose any tag they like.  It is simply the force of large numbers that leads a work to stand out as the prime example of a particular tag.  Note also that this system is language neutral – for example, a popular Dutch work could be posted on YouTube and emerge as the top hit for a search for ‘grappig’, while it would be entirely invisible to someone searching for ‘funny’.
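
That ‘force of large numbers’ can be sketched assuming nothing beyond (item, tag) pairs contributed by many users; the sample data below is invented:

```python
from collections import Counter

def top_item_for_tag(tagged_items, tag):
    """Return the item most often labelled with `tag` by the crowd.

    No taxonomy is imposed: 'grappig' and 'funny' are simply different
    keys, so the same work can rank differently per language.
    """
    counts = Counter(item for item, t in tagged_items if t == tag)
    return counts.most_common(1)[0][0] if counts else None

posts = [
    ("clip-42", "grappig"), ("clip-42", "grappig"), ("clip-7", "grappig"),
    ("clip-7", "funny"), ("clip-7", "funny"),
]
```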

Mass collaboration is arriving in policy circles, led by activists.  Petitions and Amnesty International letter-writing represented early examples, with the orchestrated campaign against the Software Patents Directive representing the pinnacle to date of what is possible.  (The Swiss system of frequent referenda might also be characterised as a formal version of mass collaboration, but there seems to be little pressure to abandon parliamentary democracies elsewhere.)

Policy enforcement, however, seems to be a different matter.  Section 1 identified a number of policy concerns, and a number of challenges for traditional enforcement methods.  Sifting through the content available online, however, would be nearly impossible.  Companies can usually be relied upon to report transgressions by their competitors, but this is clearly not enough for today’s Internet.  Some anti-spam enforcers have created spam mailboxes to which users can forward unsolicited emails, but regulators have very few links to the world of user generated content.

Some thoughts for policy makers

Governments risk being the last ones to embrace mass collaboration.  As noted above, it clashes with politicians’ notions of parliamentary democracy, and politicians may have led many people to expect them to be able to solve too many problems.  And will public sector workforces, locked into rigid hierarchies and subject to strong unionisation, be able to embrace this openness?  Yet, if section 1 is believed, existing policy prescriptions are hardly working.  The need to adopt mass collaboration has not been proved here, but to ignore this predominant Internet trend would be foolhardy.  What might be entailed?

Under a heading of “helping citizens”, media literacy is supreme, and it clearly needs to be linked to lifelong learning rather than seen merely as children’s education.  Understanding meta-data (and the societal value of contributing it) is going to be as important as learning how to understand content itself.  Encouraging users to rate and tag government websites would immediately render them more useful (to other citizens and to government web developers alike).

This could perhaps be linked to a revised implementation of ICRA too.  The spamboxes described above are a first step.  But why not a Firefox extension that enables citizens to report inappropriate content whenever they find it, with no need to resort to a call centre helpline?  The larger communities include such feedback systems by default, but Government could play a role in providing these same features (which all e.g. YouTube users now understand) to help ‘police’ the broader Internet.  (I suspect there are some civil liberty concerns here that also need to be taken into account though.)

If the idea of a ‘driving licence’ for computer users is contemplated, a contrôle technique (roadworthiness test) for connected devices is also worth consideration.  Indeed botnets exhibit clear externalities – my infected computer is probably causing me less harm than it is causing to others (e.g. through the spam, or the denial of service attack, coming from my computer).

Although the internet is uniquely adept at supporting the creation of communities, Government clearly has a role in supporting the emergence of those that are socially beneficial.  In practice this may mean more in terms of ensuring access to the Internet (e.g. for the homeless) than providing community software, though the latter may also be necessary for groups that cannot take advantage of existing web technologies (perhaps due to disability or use of a minority language) and need additional support.

With such fundamental shifts occurring in the business world, it should be no surprise that economic regulation also needs review.  The massive interweaving and sharing of information between companies does not, at first glance, sound very attractive to an anti-trust regulator.  Hopefully the Technology Transfer rules already provide a framework for this, but I have not had time to verify.  Secondly, these new services often exhibit network effects.  Whether dominance can be abused, however, is unclear as dominance often seems to be transient (an example might be Facebook’s emergence to challenge MySpace).  Abuse will of course not take the form of excessive pricing (so many services are free), but perhaps in contractual terms (e.g. assignment of IP rights).  (It would be tempting to add privacy to this list of potential abuses, but as the Google/Doubleclick case has shown DG COMP clearly understands that the privacy acquis already prohibits abuses with personal data.)

However, the bigger challenge clearly lies in the area of intellectual property protection.  Creative Commons has emerged as an efficient framework to govern mass collaboration by individual content creators, but its principles are only slowly being adopted by traditional content owners.  Here the preference currently seems to be to reserve all rights but not to pursue legal redress against (some) amateur reworkings of their content, such as fansites or the editing of content.  This hardly constitutes legal certainty for creative users, and the content industry’s calls for far more education about IPR surely necessitate a clear and fair framework for “fair use” at the same time.

Some thoughts for industry

Although this paper has argued in favour of mass collaboration, the traditional approach to regulation is to impose rules on one market player or another.  Internet access providers (IAPs) seem invariably to be the top preference, with large online services a close second.  In order to avoid burdensome regulation, it clearly falls to these players to help instigate the more contemporary approaches (which may well also prove more effective) to protecting users described above.  Using their ‘control’ over Internet users to educate would be more valuable than trying to turn control into ‘policing’.  IAPs are also well placed to identify computers that have been infected and absorbed into a botnet.  IAPs can also help in respect of privacy (for example, if a user uses an ISP’s cache to access the Internet then it will be the cache’s IP address that gets registered with websites, rather than the individual’s).  Governments should encourage this activity by guaranteeing to IAPs that such intervention would not subsequently be used to promote the (false) idea that ISPs could in fact be effective ‘policers’.