Category: Google and Search Engines


The Memory Hole

On RocketLawyer’s Legally Easy podcast, I talk with Charley Moore and Eva Arevuo about the EU’s proposed “right to be forgotten” and privacy as censorship. I was inspired by Jeff Rosen and Jane Yakowitz’s critiques of the approach, which actually appears to be a “right to lie effectively.” If you can disappear unflattering – and truthful – information, it lets you deceive others – in other words, you benefit and they are harmed. The EU’s approach is a blunderbuss where a scalpel is needed.

Cross-posted at Info/Law.


Autonomous Agents and Extension of Law: Policymakers Should be Aware of Technical Nuances

This post expands upon a theme from Samir Chopra and Lawrence White’s excellent and thought-provoking book, A Legal Theory for Autonomous Artificial Agents. One question pervades the text: to what extent should lawmakers import or extend existing legal frameworks, which were originally created to regulate human actors, to cover the activities of autonomous (or partially autonomous) computer systems and machines? For example, the authors ask whether the doctrines and principles of agency law can be mapped onto actions carried out by automated systems on behalf of their users. As the book notes, autonomous systems are already an integral part of existing commercial areas (e.g. finance) and may be poised to emerge in others over the next few decades (e.g. autonomous, self-driving automobiles). However, it is helpful to expand upon one dimension raised by the text: the relationship between the technology underlying autonomous agents and the activity or results produced by that technology.

Two Views of Artificial Intelligence

The emergence of partially autonomous systems – computer programs (or machines) carrying out activities at least partially in a self-directed way, on behalf of their users – is closely aligned with the field of Artificial Intelligence (AI) and developments therein. (AI is a sub-discipline of computer science.) What is the goal of AI research? There is probably no universally agreed-upon answer to this question, as there has been a range of approaches and criteria for systems considered to be successful advances in the field. However, some AI researchers have helpfully clarified two dimensions along which we can think about AI developments. Consider a spectrum of possible criteria under which one might label a system a “successful” AI product:

View 1) We might consider a system to be artificially intelligent only if it produces “intelligent” results based upon processes that model, approach or replicate the high-level cognitive abilities or abstract reasoning skills of humans; or

View 2) We might evaluate a system primarily based upon the quality of the output it produces – whether it produces results that humans would consider accurate and helpful – even if the results or output came about through processes that do not necessarily model, approach, or resemble actual human cognition, understanding, or reasoning.

We can understand the first view as being concerned with creating systems that replicate, to some degree, something approaching human thinking and understanding, whereas the second is more concerned with producing results or output from computer agents that would be considered “intelligent” and useful, even if produced by systems that likely do not approach human cognitive processes. (Russell and Norvig, Artificial Intelligence: A Modern Approach, 3rd ed., 2009, 1-5). These views represent poles on a spectrum, and many actual positions fall in between. However, this distinction is more than philosophical. It has implications for the sensibility of extending existing legal doctrines to cover the activities of artificial agents. Let us consider each view briefly in turn, and some possible implications for law.

View 1 – Artificial Intelligence as Replicating Some or All Human Cognition

The first characterization – that computer systems will be successful within AI when they produce activities resulting from processes approaching the high-level cognitive abilities of humans – is considered an expansive and perhaps more ambitious characterization of the goals of AI. It also seems to be the one most closely associated with AI research in the public imagination. In popular culture, artificially intelligent systems replicate and instantiate – to varying degrees – the thinking faculties of humans (e.g. the ability to engage in abstract thought, carry on an intelligent conversation, or understand or philosophize about concepts at a depth associated with intelligence). I raise this variant primarily to note that, despite what I believe is a common lay view of the state of the research, this “strong” vision of AI is not something that has been realized (or is necessarily near realization) within the existing state-of-the-art systems that are considered successful products of AI research. As I will suggest shortly, this nuance may not be within the awareness of the lawmakers and judges who will be the arbiters of decisions concerning systems that are labeled artificially intelligent. Although AI research has not yet produced artificial human-level cognition, that does not mean that AI research has been unsuccessful. Quite to the contrary – over the last 20 years AI research has produced a series of more limited, but spectacularly successful, systems as judged by the second view.

View 2 – “Intelligent” Results (Even if Produced by Non-Cognitive Processes)

The second characterization of AI is perhaps more modest, and can be considered more “results oriented.” This view considers a computer system (or machine) to be a success within artificial intelligence based upon whether it produces output or activities that people would agree (colloquially speaking) are “good” and “accurate” and “look intelligent.” In other words, a useful AI system in this view is characterized by results or output that are likely to approach or exceed what would have been produced by a human performing the same task. Under this view, if the system or machine produces useful, human-like results, it is a successful AI machine – irrespective of whether these results were produced by a computer-based process instantiating or resembling human cognition, intelligence or abstract reasoning.

In this second view, AI “success” is measured based upon whether the autonomous system produces “intelligent” (or useful) output or results. We can use what would be considered “intelligent” conduct of a similarly situated human as a comparator. If a modern auto-pilot system is capable of landing airplanes in difficult conditions (such as thick fog) at a success rate that meets or exceeds that of human pilots under similar conditions, we might label it a successful AI system under this second approach. This would be the case even if we all agreed that the autopilot system did not have a meaningful understanding of the concepts of “airplanes”, “runways”, or “airports.” Similarly, we might label IBM’s Jeopardy-playing “Watson” computer system a successful AI system since it was capable of producing highly accurate answers to a surprisingly wide and difficult range of questions – the same answers that strong human Jeopardy champions would have produced. However, there is no suggestion that Watson’s results came from the same high-level cognitive understanding and processes that likely animated the answers of human champions like Ken Jennings. Rather, Watson’s accurate output came from techniques such as highly sophisticated statistical machine-learning algorithms that were able to quickly rank possible candidate answers through immense parallel processing of large amounts of existing written documents that happened to contain a great deal of knowledge about the world.

Machine-Translation: Automated Translation as an Example

To understand this distinction between AI views rooted in computer-based cognition and those rooted in “intelligent” or accurate results, it is helpful to examine the history of computer-based language translation (e.g. English to French). Translation (at least superficially) appears to be a task deeply connected to the human understanding of the meaning of language, and the conscious replication of that meaning in the target language. Early approaches to machine translation followed this cue, and sought to convey to computer systems those aspects of language – like the rules of grammar in both languages, and the pairing of words with the same meanings in both languages – that might mimic the internal structures undergirding human cognition and translation. However, this meaning- and rules-based approach to translation proved limited, and surprised researchers by producing somewhat poor results based upon the rules of matching and syntactical construction. Such systems had difficulty determining whether the word “plant” in English should be translated to the equivalent of “houseplant” or “manufacturing plant” in French. Further efforts attempted to “teach” the computer rules about how to understand and make more accurate distinctions for ambiguously situated words, but still did not produce marked improvements in translation quality.

Machine Learning Algorithms: Using Statistics to Produce Surprisingly Good Translations

However, over the last 10-15 years, a markedly different approach to computer translation emerged – made famous by Google and others. This approach was not primarily based upon top-down communication of the basics of constructing and conveying knowledge to a computer system (e.g. language pairing and rules of meaning). Rather, many of the successful translation techniques developed were largely statistical in nature, relying on machine-learning algorithms to scour large amounts of data and create a complex representation of correlations between languages. Google Translate – and other similar statistical approaches – work in part by leveraging vast amounts of data that have previously been translated by humans. For example, the United Nations and the European Union frequently translate official documents into multiple languages using professional translators. This “corpus” of millions of paired and translated documents became publicly available to researchers in electronic form over the last 20 years. Systems such as Google Translate are able to process vast numbers of documents and leverage these paired translations to create statistical models that are able to produce surprisingly accurate translation results – using probabilities – for arbitrary new texts.
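To make the statistical idea concrete, here is a deliberately tiny sketch in Python. It is not how Google Translate or any production system works; the four-sentence corpus, the candidate list, and the function names `train` and `translate_word` are all made up for illustration. It only shows the general flavor: the program counts which French word human translators chose for “plant” alongside which English context words, and then picks the statistically likelier choice for a new sentence, without any notion of what a plant is.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus": English sentences paired with
# human-made French translations. Real corpora hold millions of pairs.
PARALLEL_CORPUS = [
    ("water the plant", "arrosez la plante"),
    ("the plant needs light", "la plante a besoin de lumière"),
    ("the plant employs workers", "l'usine emploie des ouvriers"),
    ("workers left the plant", "les ouvriers ont quitté l'usine"),
]


def train(corpus, source_word, candidates):
    """Count how often each candidate translation co-occurs with each
    English context word in sentences containing source_word."""
    counts = defaultdict(Counter)
    for english, french in corpus:
        en_tokens = english.split()
        if source_word not in en_tokens:
            continue
        for candidate in candidates:
            # Crude substring match; enough for this toy corpus.
            if candidate in french:
                for token in en_tokens:
                    if token != source_word:
                        counts[candidate][token] += 1
    return counts


def translate_word(sentence, source_word, counts):
    """Pick the candidate whose observed contexts best overlap the
    context words of the new sentence (highest summed count)."""
    context = [t for t in sentence.split() if t != source_word]
    scores = {
        candidate: sum(ctx_counts[t] for t in context)
        for candidate, ctx_counts in counts.items()
    }
    return max(scores, key=scores.get)


counts = train(PARALLEL_CORPUS, "plant", ["plante", "usine"])
print(translate_word("the workers at the plant", "plant", counts))  # usine
print(translate_word("water the plant daily", "plant", counts))     # plante
```

The choice between “plante” and “usine” falls out of counting alone. Whatever knowledge the program appears to have was supplied by the human translators who produced the paired sentences.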

Machine-Learning Models: Producing “intelligent”, highly useful results 

The important point is that these statistical and probability-based machine-learning models (often combined with logical, knowledge-based rules about the world) often produce high-quality and effective results (though not quite on par with nuanced human translators at this point), without any assertion that the computers are engaging in a profound understanding of the underlying “meaning” of the translated sentences or employing processes whose analytical abilities approach human-level cognition (e.g. view 1). (It is important to note that the machine-learning translation approach does not achieve translation on its own but “leverages” previous human cognition through the efforts of the original UN translators who made the paired translations.) Thus, for certain limited tasks, these systems have shown that it is possible for contemporary autonomous agents to produce “intelligent” results without relying upon what we would consider processes approaching human-level cognition.

Distinguishing “intelligent results” and actions produced via cognitive intelligence

The reason to flag this distinction is that such successful AI systems (as judged by their results) will pose a challenge to the task of importing and extending existing legal doctrinal frameworks (which were mostly designed to regulate people) into the domain of autonomous computer agents. Existing “type 2” systems that produce surprisingly sophisticated, useful, and accurate results without approaching human cognition are the basis of many products now emerging from earlier AI research, and are becoming (or are poised to become) integrated into daily life. These include IBM’s Watson, Apple’s Siri, Google Search and – in perhaps the next decade or two – Stanford’s and Google’s autonomous self-driving cars, and autonomous music-composing software. These systems often use statistics to leverage existing, implicit human knowledge. Because they produce output or activities that in some cases appear to approach or exceed human performance in particular tasks, and because the results they autonomously produce are often surprisingly sophisticated and seemingly intelligent, such “results-oriented”, task-specific systems (e.g. driving, answering questions, landing planes) seem to be the near-term path of much AI research.

However, the fact that these intelligent-seeming results do not come from systems approaching human cognition is a nuance that should not be lost on policymakers (and judges) seeking to develop doctrine in the area of autonomous agents. Much of law – perhaps most – is designed and intended to regulate the behavior of humans (or organizations run by humans). Thus embedded in many existing legal doctrines are underlying assumptions about cognition and intentionality that are implicit, and so basic that they are often not articulated. Their very implicitness may make these assumptions easy to overlook.

Given current trends, many contemporary (and likely future) AI systems that will be integrated into society (and are therefore more likely to be the subject of legal regulation) will use algorithmic techniques focused upon producing “useful results” (view 2), rather than being aimed at replicating human-level cognition, self-reflection, and abstraction (view 1). If lawmakers merely follow the verbiage (e.g. a system that has been labeled “artificially intelligent” did X or resulted in Y) and employ only a superficial understanding of AI research, without more closely understanding these technical nuances, there is the possibility of conflation in extending existing legal doctrines to circumstances based upon “intelligent-seeming” autonomous results. For example, the book’s authors explore the concept of imposing fiduciary duties on autonomous systems in some circumstances. But it will take a careful judge or lawmaker to distinguish existing fiduciary/agency doctrines with embedded (and often unarticulated) assumptions of human-level intentionality among agents (e.g. self-dealing) from those that may be more functional in nature (e.g. duties to invest trust funds). In other words, an in-depth understanding of the technology underlying particular autonomous agents should not be viewed as merely a technical issue. Rather, it is a serious consideration that should be understood in some detail by lawmakers in any decision to extend existing legal doctrine, or create new doctrine from our existing framework, to cover situations involving autonomous agents.


The E.U. Data Protection Directive and Robot Chicken

The European Commission released a draft of its revised Data Protection Directive this morning, and Jane Yakowitz has a trenchant critique up. In addition to the sharp legal analysis, her article has both a Star Wars and a Robot Chicken reference, which makes it basically the perfect information law piece…


Censorship on the March

Today, you can’t get to The Oatmeal, or Dinosaur Comics, or XKCD, or (less importantly) Wikipedia. The sites have gone dark to protest the Stop Online Piracy Act (SOPA) and the PROTECT IP Act, America’s attempt to censor the Internet to reduce copyright infringement. This is part of a remarkable, distributed, coordinated protest effort, both online and in realspace (I saw my colleague and friend Jonathan Askin headed to protest outside the offices of Senators Charles Schumer and Kirsten Gillibrand). Many of the protesters argue that America is headed in the direction of authoritarian states such as China, Iran, and Bahrain in censoring the Net. The problem, though, is that America is not alone: most Western democracies are censoring the Internet. Britain does it for child pornography. France: hate speech. The EU is debating a proposal to allow “flagging” of objectionable content for ISPs to ban. Australia’s ISPs are engaging in pre-emptive censorship to prevent even worse legislation from passing. India wants Facebook, Google, and other online platforms to remove any content the government finds problematic.

Censorship is on the march, in democracies as well as dictatorships. With this movement we see, finally, the death of the American myth of free speech exceptionalism. We have viewed ourselves as qualitatively different – as defenders of unfettered expression. We are not. Even without SOPA and PROTECT IP, we are seizing domain names, filtering municipal wi-fi, and using funding to leverage colleges and universities to filter P2P. The reasons for American Internet censorship differ from those of France, South Korea, or China. The mechanism of restriction does not. It is time for us to be honest: America, too, censors. I think we can, and should, defend the legitimacy of our restrictions – the fight on-line and in Congress and in the media shows how we differ from China – but we need to stop pretending there is an easy line to be drawn between blocking human rights sites and blocking Rojadirecta or Dajaz1.

Cross-posted at Info/Law.


The Fight For Internet Censorship

Thanks to Danielle and the CoOp crew for having me! I’m excited.

Speaking of exciting developments, it appears that the Stop Online Piracy Act (SOPA) is dead, at least for now. House Majority Leader Eric Cantor has said that the bill will not move forward until there is a consensus position on it, which is to say, never. Media sources credit the Obama administration’s opposition to some of the more noxious parts of SOPA, such as its DNSSEC-killing filtering provisions, and also the tech community’s efforts to raise awareness. (Techdirt’s Mike Masnick has been working overtime in reporting on SOPA; Wikipedia and Reddit are adopting a blackout to draw attention; even the New York City techies are holding a demonstration in front of the offices of Senators Kirsten Gillibrand and Charles Schumer. Schumer has been bailing water on the SOPA front after one of his staffers told a local entrepreneur that the senator supports Internet censorship. Props for candor.) I think the Obama administration’s lack of enthusiasm for the bill is important, but I suspect that a crowded legislative calendar is also playing a significant role.

Of course, the PROTECT IP Act is still floating around the Senate. It’s less worse than SOPA, in the same way that Transformers 2 is less worse than Transformers 3. (You still might want to see what else Netflix has available.) And sponsor Senator Patrick Leahy has suggested that the DNS filtering provisions of the bill be studied – after the legislation is passed. It’s much more efficient, legislatively, to regulate first and then see if it will be effective. A more cynical view is that Senator Leahy’s move is a public relations tactic designed to undercut the opposition, but no one wants to say so to his face.

I am not opposed to Internet censorship in all situations, which means I am often lonely at tech-related events. But these bills have significant flaws. They threaten to badly weaken cybersecurity, an area that is purportedly a national priority (and has been for 15 years). They claim to address a major threat to IP rightsholders despite the complete lack of data that the threat is anything other than chimerical. They provide scant procedural protections for accused infringers, and confer extraordinary power on private rightsholders – power that will, inevitably, be abused. And they reflect a significant public choice imbalance in how IP and Internet policy is made in the United States.

Surprisingly, the Obama administration has it about right: we shouldn’t reject Internet censorship as a regulatory mechanism out of hand, but we should be wary of it. This isn’t the last stage of this debate – like Westley in The Princess Bride, SOPA-like legislation is only mostly dead. (And, if you don’t like the Obama administration’s position today, just wait a day or two.)

Cross-posted at Info/Law.


Stanford Law Review Online: Don’t Break the Internet

The Stanford Law Review Online has just published a piece by Mark Lemley, David S. Levine, and David G. Post on the PROTECT IP Act and the Stop Online Piracy Act. In Don’t Break the Internet, they argue that the two bills — intended to counter online copyright and trademark infringement — “share an underlying approach and an enforcement philosophy that pose grave constitutional problems and that could have potentially disastrous consequences for the stability and security of the Internet’s addressing system, for the principle of interconnectivity that has helped drive the Internet’s extraordinary growth, and for free expression.”

They write:

These bills, and the enforcement philosophy that underlies them, represent a dramatic retreat from this country’s tradition of leadership in supporting the free exchange of information and ideas on the Internet. At a time when many foreign governments have dramatically stepped up their efforts to censor Internet communications, these bills would incorporate into U.S. law a principle more closely associated with those repressive regimes: a right to insist on the removal of content from the global Internet, regardless of where it may have originated or be located, in service of the exigencies of domestic law.

Read the full article, Don’t Break the Internet by Mark Lemley, David S. Levine, and David G. Post, at the Stanford Law Review Online.

Note: Corrected typo in first paragraph.


The Pluses of Google+

I love shiny new toys. Sometimes, it’s a crisp new book (Pauline Maier, for one… thanks Gerard!); other times, it’s something plush and adorable, like the yellow Angry Birds doll my 5-year-old nephew “bought” for me last month. Last week, it was Google+.

Google+ is social networking done the Google way. The soft launch is part of Google’s long-running master plan to enter the social networking market and try to do it better than the basically moribund MySpace and the supposedly plateauing Facebook. We are told that Google+’s chief asset is its ability to simulate real relationships, and our different interactions with different types of friends, on the Internet.

Google+ introduces us to circles, where you can take the 800 or so “friends” you would have on Facebook and break them down on your own terms. You have friends, acquaintances, co-workers, well-wishers, frenemies, those-guys-you-met-at-that-terrible-bar, whatever. And, you can use these classifications to tailor your interactions, thus avoiding the problem of your mother, sister or child accessing a picture meant for your pals.

There are also sparks, which are news and video aggregators. It is easy enough to tell a spark what you enjoy doing when you’re not working on important affairs of state, thus allowing you to spend “more time wasting time without wasting your time looking how to waste time.”

And, hangouts are Google+’s attempt to recreate chance encounters. I’m not sure these are completely functioning yet, though. Remember when you used to visit the mall or walk through the West Village and run into someone you hadn’t seen in years? Hangouts attempt to turn an online social network into a place where anything social can happen, only with Google+, you “bump” into someone through a video message.

Let’s assume for the moment that all this works as well as we hope and that Google+ allows us to recreate real life in the virtual realm. Facebook is not really trying to recreate real life and simulate precisely how we interact with one another in the physical world. It is trying to supplement it, foster new interactions in new ways. At times, we don’t like that. Facebook’s forced socialization and privacy issues give many social networkers pause. There are many other digital technologies that seek to supplement our physical social world. Grindr, a geolocating social networking service for gay men, is one such example. Grindr allows its members to be out and about, smartphone in hand and find other gay men in the vicinity. Its purpose is to eschew traditional social networking that keeps you saddled to your computer and to let you physically meet people you have something in common with who may be living across the street or down the block. It is interactive, mobile and a multi-purpose tool.

So, Google+ is trying to forge a different path, i.e., using the Internet as an extension of our physical social circles and to keep those circles the way they are now. Of course, that is not to say that Google+ will not bring us closer to new friends — we can still interact with friends of friends, let people we barely know into our network and share content with whomever we please. But, Google+’s chief draw appears to be its greater fidelity to real life. If that is true in the long run, as Google works out the kinks and listens to its users, is that what we want in our online social networks?

The benefits are clear — we can avoid the grandmother-seeing-you-at-the-bar problem. But there are also disadvantages — we lose the liberating potential of reaching new people. What do you think?

Beyond Cyber-Utopianism

What encapsulates the ethos of Silicon Valley? Promoting his company’s prowess at personalization, Mark Zuckerberg once said that, “A squirrel dying in front of your house may be more relevant to your interests right now than people dying in Africa.” Scott Cleland argues that “you can’t trust Google, Inc.,” compiling a critical mass of dubious practices that might seem quite understandable each taken alone. Apple’s “reality distortion field” is the topic of numerous satires. As the internet increasingly converges through these three companies, what are the values driving their decisionmaking?

For some boosters, these are not terribly important questions: the logic of the net itself assures progress. But for Chris Lehmann, the highflying internet-academic-industrial complex has failed to think critically about a consolidating, commercialized cyberspace. Previously featured on this blog for his book, Lehmann’s review of Clay Shirky’s Cognitive Surplus is fairly scathing:

With the emergence of Web 2.0–style social media (things like Facebook, Twitter and text messaging), Shirky writes, we inhabit an unprecedented social reality, “a world where public and private media blend together, where professional and amateur production blur, and where voluntary public participation has moved from nonexistent to fundamental.” This Valhalla of voluntary intellectual labor represents a stupendous crowdsourcing, or pooling, of the planet’s mental resources, hence the idea of the “cognitive surplus.” . . .

[But why] assign any special value to an hour spent online in the first place? Given the proven models of revenue on the web, it’s reasonable to assume that a good chunk of those trillion-plus online hours are devoted to gambling and downloading porn. Yes, the networked web world does produce some appreciable social goods, such as the YouTubed “It Gets Better” appeals to bullied gay teens contemplating suicide. But there’s nothing innate in the character of digital communication that favors feats of compassion and creativity; for every “It Gets Better” video that goes viral, there’s an equally robust traffic in white nationalist, birther and jihadist content online. . . .
