Category: Google and Search Engines

Amazon’s Pawns

I sometimes speculate at the end of my copyright class that, years hence, we’ll stop using a statutory supplement and just refer to the Amazon, YouTube, Facebook, etc. service agreements to find sources of legal authority. The cultural power of Google & Facebook gets a lot of media attention, and now Amazon is under renewed scrutiny. Wired highlights the business acumen of Jeff Bezos; Mac McClelland has told the story of the sweat it’s based on. Now The Nation is featuring an intriguing series on the company, with pieces by Robert Darnton, Michael Naumann, and Steve Wasserman (along with the slide show on 10 reasons to avoid Amazon). A few reflections on the series below:

1) Wasserman compiles an array of stats: according to the revised 2012 edition of Merchants of Culture, in 2011 e-book sales for most publishers were “between 18 and 22 percent.” “Two decades ago, there were about 4,000 independent bookstores in the United States; only about 1,900 remain.” Publishers stand to be disintermediated, since too many have been “complacent, allergic to new ideas, even incompetent.” Amazon stands triumphant:

[By 2011], it had $48 billion in revenue, more than all six of the major American publishing conglomerates combined, with a cash reserve of $5 billion. The company is valued at nearly $100 billion and employs more than 65,000 workers (all nonunion); Bezos, according to Forbes, is the thirtieth wealthiest man in America.

The aggregator has triumphed over the aggregated, and over its own workers. As exposés revealed, “in one of Amazon’s main fulfillment warehouses in Allentown, Pennsylvania . . . employees risked stroke and heat exhaustion while running themselves ragged [and] [a]mbulances were routinely stationed in the facility’s giant parking lot to rush stricken workers to nearby hospitals.”


Social Search; It Might Be Around for a Bit

Hey! Bing is innovating! It has added social to search based on its relationship with Facebook. Oh wait, Google did that with Google+. So is this innovation or keeping up with the Joneses, err Pages and Brins? I thought this move by MS would happen faster given that FB and MS have been in bed together for some time. So did Google innovate while Microsoft and Facebook imitated? Maybe. Google certainly plays catch-up too. The real questions may turn on who executes and/or can execute better. That seems to be part of the innovation game too.

Facebook is top dog in social; Google in search. What they both are doing (with MS lurking in the wings to make a big comeback, an odd thing given how well MS does as it is) is taking recommendations to a new level (with ads thrown in, of course). I have tried logged-in search. I must say I was surprised. To be clear, I find there is mainly rot in social network data just as there is in search. Whether I would have used Google+ had I not been at Google is unclear. Probably not. But I did. Then I searched for some law review articles and some basic technology information. WOW. The personal results at the top had links to blog posts by people whom I followed on Google+ AND THEY WERE…RELEVANT. Blew my mind. My search time went down and I found credible sources faster. Will that last? Who knows? Someone may find ways to game the system, but the small experiences make me hopeful. Now to Facebook and Bing.

If Google can do well with a much smaller set of users for Google+, Facebook and Bing might do really well. After all, Facebook has the social piece and MS has some search computer science types. Whoever wins here may offer the next thing in search. I like conducting both logged-out and logged-in searches. When logged in, I like the potential for seeing things from friends and people I trust. For example, if I start to be interested in cameras and search gives me posts by friends I’d ask anyway, that is a pretty cool result. I can read the post and call the friend for deeper advice or just use what they posted.

Everyone in this space will, of course, have to cope with privacy concerns and the like. But I think that this new level of relevance has the chance to co-exist with those concerns, and users may flock to one of these services to get results well beyond the current ones in search without social. In other words, let the games continue.


If Infrastructure, then Commons: an analytical framework, not a rule

It is probably worth making it clear that, as I state multiple times in the book, my argument is not “if infrastructure, then commons.” Rather, I argue that if a resource is infrastructure (defined according to functional economic criteria I set forth in the book), then there is a series of considerations one must evaluate in deciding whether or not to manage the resource as a commons. Chapter four provides a detailed analysis of what resources are infrastructure, and chapter five provides a detailed analysis of the advantages and disadvantages of commons management from the perspective of a private infrastructure owner (private strategy) and from the perspective of the public (public strategy). Chapters six, seven and eight examine significant complicating factors/costs and arguments against commons management.

After reviewing the excellent posts, it occurred to me that blog readers might come away with the mistaken impression that in the book I argue that the demand side always trumps the supply side or that classifying a resource as infrastructure automatically leads to commons management. That is certainly not the case. I do argue that the demand-side analysis of infrastructure identifies and helps us to better appreciate and understand a significant weight on one side of the scale, and frankly, a weight that is often completely ignored.  Ultimately, the magnitude of the weight and relevant counterweights will vary with the infrastructure under analysis and the context.

In chapter thirteen, I argue that the case for network neutrality regulation—commons management as a public strategy applied in the context of Internet infrastructure—would remain strong even if markets were competitive. In his post, Tim disagreed with this position.  In Tim’s view, competition should be enough to sustain an open Internet, for a few reasons, but mainly because consumers will appreciate (some of) the spillovers that are produced online and will be willing to pay for (and switch to) an open infrastructure, provided that competition supplies options. I replied to his post with some reasons why I disagree. In essence, I pointed out that consumers would not appreciate all of the relevant spillovers because many spillovers spill off-network and thus private demand would still fall short of social demand, and I also noted that I was less confident about his predictions about what consumers would want and how they would react. (My disagreement with Tim about the relevance of competition in the network neutrality context should not be read to mean that competition is unimportant. The point is that the demand-side market failures are not cured by competition, just as the market failures associated with environmental pollution are not cured by competition.)

In my view, the demand side case for an open, nondiscriminatory Internet infrastructure as a matter of public strategy/regulation is strong, and would remain strong even if infrastructure markets were competitive. But as I say at the end of chapter thirteen, it is not dispositive. Here is how I conclude that chapter:

 My objective in this chapter has not been to make a dispositive case for network neutrality regulation. My objective has been to demonstrate how the infrastructure analysis, with its focus on demand-side issues and the function of commons management, reframes the debate, weights the scale in favor of sustaining end-to-end architecture and an open infrastructure, points toward a particular rule, and encourages a comparative analysis of various solutions to congestion and supply-side problems. I acknowledge that there are competing considerations and interests to balance, and I acknowledge that quantifying the weight on the scale is difficult, if not impossible. Nonetheless, I maintain that the weight is substantial. The social value attributable to a mixed Internet infrastructure is immense even if immeasurable. The basic capabilities the infrastructure provides, the public and social goods produced by users, and the transformations occurring on and off the meta-network are all indicative of such value.



Pakistan Scrubs the Net

Pakistan, which has long censored the Internet, has decided to upgrade its cybersieves. And, like all good bureaucracies, the government has put the initiative out for bid. According to the New York Times, Pakistan wants to spend $10 million on a system that can block up to 50 million URLs concurrently, with minimal effect on network speed. (That’s a lot of Web pages.) Internet censorship is on the march worldwide (and the U.S. is no exception). There are at least three interesting things about Pakistan’s move:

First, the country’s openness about its censorial goals is admirable. Pakistan is informing its citizens, along with the rest of us, that it wants to bowdlerize the Net. And, it is attempting to do so in a way that is more uniform than under its current system, where filtering varies by ISP. I don’t necessarily agree with Pakistan’s choice, but I do like that the country is straightforward with its citizens, who have begun to respond.

Second, the California-based filtering company Websense announced that it will not bid on the contract. That’s fascinating – a tech firm has decided that the public relations damage from helping Pakistan censor the Net is greater than the $10M in revenue it could gain. (Websense argues, of course, that its decision is a principled one. If you believe that, you are probably a member of the Ryan Braun Clean Competition fan club.)

Finally, the state is somewhat vague about what it will censor: it points to pornography, blasphemy, and material that affects national security. The last part is particularly worrisome: the national security trump card is a potent force after 9/11 and its concomitant fallout in Pakistan’s neighborhood, and censorship based on it tends to be secret. There is also a real risk that national security interests = interests of the current government. America has an unpleasant history of censoring political dissent based on security worries, and Pakistan is no different.

I’ll be fascinated to see which companies take up Pakistan’s offer to propose…

Cross-posted at Info/Law.

Symposium on Configuring the Networked Self: Cohen’s Methodological Contributions

Julie Cohen’s extraordinarily illuminating book Configuring the Networked Self makes fundamental contributions to the field of law and technology. In this post, I’d like to focus on methodology and theory (a central concern of Chapters 1 to 4). In another post, I hope to turn to the question of realizing Cohen’s vision of human flourishing (a topic Chapters 9 and 10 address most directly).

Discussions of rights and utility dominate the intellectual property and privacy literatures.* Cohen argues that their appeal can be more rhetorical than substantive. As she has stated:

[T]he purported advantage of rights theories and economic theories is neither precisely that they are normative nor precisely that they are scientific, but that they do normative work in a scientific way. Their normative heft derives from a small number of formal principles and purports to concern questions that are a step or two removed from the particular question of policy to be decided. . . . These theories manifest a quasi-scientific neutrality as to copyright law that consists precisely in the high degree of abstraction with which they facilitate thinking about processes of cultural transmission.

Cohen notes “copyright scholars’ aversion to the complexities of cultural theory, which persistently violates those principles.” But she feels they should embrace it, given that it offers “account[s] of the nature and development of knowledge that [are] both far more robust and far more nuanced than anything that liberal political philosophy has to offer. . . . [particularly in understanding] how existing knowledge systems have evolved, and how they are encoded and enforced.”

A term like “knowledge system” may itself seem very abstract and formal. But Cohen’s work insists on a capacious view of network-enabled forms of knowing. Rather than naturalizing and accepting as given the limits of copyright and privacy law on the dissemination of knowledge, she can subsume them into a much broader framework of understanding where “knowing” is going. That framework includes cultural practices, norms, economics, and bureaucratic processes, as well as law.


Santorum: Please Don’t Google

If you Google “Santorum,” you’ll find that two of the top three search results take an unusual angle on the Republican candidate, thanks to sex columnist Dan Savage. (I very nearly used “Santorum” as a Google example in class last semester, and only just thought better of it.) Santorum’s supporters want Google to push the, er, less conventional site further down the rankings, and allege that Google’s failure to do so is politically biased. That claim is obviously a load of Santorum, but the situation has drawn more thoughtful responses. Danny Sullivan argues that Google should implement a disclaimer, because kids may search on “Santorum” and be disturbed by what they find, or because they may think Google has a political agenda. (The site has one for “jew,” for example. For a long time, the first result for that search term was to the odious and anti-Semitic JewWatch site.)

This suggestion is well-intentioned but flatly wrong. I’m not an absolutist: I like how Google handled the problem of having a bunch of skinheads show up as a top result for “jew.” But I don’t want Google as the Web police, though many disagree. Should the site implement a disclaimer if you search for “Tommy Lee Pamela Anderson”? (Warning: sex tape.) If you search for “flat earth theory,” should Google tell you that you are potentially a moron? I don’t think so. Disclaimers should be the nuclear option for Google – partly so they continue to attract attention, and partly because they move Google from a primarily passive role as filter to a more active one as commentator. I generally like my Web results without knowing what Google thinks about them.

Evgeny Morozov has made a similar suggestion, though along different lines: he wants Google to put up a banner or signal when someone searches for links between vaccines and autism, or proof that the Pentagon / Israelis / Santa Claus was behind the 9/11 attacks. I’m more sympathetic to Evgeny’s idea, but I would limit banners or disclaimers to situations that meet two criteria. First, the facts of the issue must be clear-cut: pi is not equal to three (and no one really thinks so), and the planet is indisputably getting warmer. And second, the issue must be one that is both currently relevant and of significant consequence. The flat earthers don’t count; the anti-vaccine nuts do. (People who fail to immunize their children not only put them at risk; they put their classmates and friends at risk, too.) Lastly, I think it is important to have both a sense of humor and a respect for discordant, even false speech. The Santorum thing is darn funny. And, in the political realm, we have a laudable history of tolerating false or inflammatory speech, because we know the perils of censorship. So, keep spreading Santorum!

Danielle, Frank, and the other CoOp folks have kindly let me hang around their blog like a slovenly houseguest, and I’d like to thank them for it. See you soon!

Cross-posted at Info/Law.


Ubiquitous Infringement

Lifehacker’s Adam Dachis has a great article on how users can deal with a world in which they infringe copyright constantly, both deliberately and inadvertently. (Disclaimer alert: I talked with Adam about the piece.) It’s a practical guide to a strict liability regime – no intent / knowledge requirement for direct infringement – that operates not as a coherent body of law, but as a series of reified bargains among stakeholders. And props to Adam for the Downfall reference! I couldn’t get by without the mockery of the iPhone or SOPA that it makes possible…

Cross-posted to Info/Law.


The Memory Hole

On RocketLawyer’s Legally Easy podcast, I talk with Charley Moore and Eva Arevuo about the EU’s proposed “right to be forgotten” and privacy as censorship. I was inspired by Jeff Rosen and Jane Yakowitz’s critiques of the approach, which actually appears to be a “right to lie effectively.” If you can disappear unflattering – and truthful – information, it lets you deceive others – in other words, you benefit and they are harmed. The EU’s approach is a blunderbuss where a scalpel is needed.

Cross-posted at Info/Law.


Autonomous Agents and Extension of Law: Policymakers Should be Aware of Technical Nuances

This post expands upon a theme from Samir Chopra and Lawrence White’s excellent and thought-provoking book, A Legal Theory for Autonomous Artificial Agents. One question pervades the text: to what extent should lawmakers import or extend existing legal frameworks to cover the activities of autonomous (or partially autonomous) computer systems and machines? These are legal frameworks that were originally created to regulate human actors. For example, the authors query whether the doctrines and principles of agency law can be mapped onto actions carried out by automated systems on behalf of their users. As the book notes, autonomous systems are already an integral part of existing commercial areas (e.g. finance) and may be poised to emerge in others over the next few decades (e.g. autonomous, self-driving automobiles). However, it is helpful to further expand upon one dimension raised by the text: the relationship between the technology underlying autonomous agents, and the activity or results produced by the technology.

Two Views of Artificial Intelligence

The emergence of partially autonomous systems (computer programs or machines carrying out activities at least partially in a self-directed way, on behalf of their users) is closely aligned with the field of Artificial Intelligence (AI) and developments therein. (AI is a sub-discipline of computer science.) What is the goal of AI research? There is probably no universally agreed upon answer to this question, as there have been a range of approaches and criteria for systems considered to be successful advances in the field. However, some AI researchers have helpfully clarified two dimensions along which we can think about AI developments. Consider a spectrum of possible criteria under which one might label a system to be a “successful” AI product:

View 1) We might consider a system to be artificially intelligent only if it produces “intelligent” results based upon processes that model, approach or replicate the high-level cognitive abilities or abstract reasoning skills of humans; or

View 2) We might evaluate a system primarily based upon the quality of the output it produces – if it produces results that humans would consider accurate and helpful – even if the results or output came about through processes that do not necessarily model, approach, or resemble actual human cognition, understanding, or reasoning.

We can understand the first view as being concerned with creating systems that replicate to some degree something approaching human thinking and understanding, whereas the second is more concerned with producing results or output from computer agents that would be considered “intelligent” and useful, even if produced from systems which likely do not approach human cognitive processes. (Russell and Norvig, Artificial Intelligence: A Modern Approach, 3d ed., 2009, 1-5). These views represent poles on a spectrum, and many actual positions fall in between. However, this distinction is more than philosophical. It has implications for the sensibility of extending existing legal doctrines to cover the activities of artificial agents. Let us consider each view briefly in turn, and some possible implications for law.

View 1 – Artificial Intelligence as Replicating Some or All Human Cognition

The first characterization, that computer systems will be successful within AI when they produce activities resulting from processes approaching the high-level cognitive abilities of humans, is considered an expansive and perhaps more ambitious characterization of the goals of AI. It also seems to be the one most closely associated with the view of AI research in the public imagination. In popular culture, artificially intelligent systems replicate and instantiate – to varying degrees – the thinking faculties of humans (e.g. the ability to engage in abstract thought, carry on an intelligent conversation, or to understand or philosophize concerning concepts at a depth associated with intelligence). I raise this variant primarily to note that, despite what I believe is a common lay view of the state of the research, this “strong” vision of AI is not something that has been realized (or is necessarily near realization) within the existing state-of-the-art systems that are considered successful products of AI research. As I will suggest shortly, this nuance may not be something within the awareness of lawmakers and judges who will be the arbiters of such decisions concerning systems that are labeled artificially intelligent. Although AI research has not yet produced artificial human-level cognition, that does not mean that AI research has been unsuccessful. Quite to the contrary: over the last 20 years AI research has produced a series of more limited, but spectacularly successful systems as judged by the second view.

View 2 – “Intelligent” Results (Even if Produced by Non-Cognitive Processes)

The second characterization of AI is perhaps more modest, and can be considered more “results oriented.” This view considers a computer system (or machine) to be a success within artificial intelligence based upon whether it produces output or activities that people would agree (colloquially speaking) are “good” and “accurate” and “look intelligent.” In other words, a useful AI system in this view is characterized by results or output that are likely to approach or exceed that which would have been produced by a human performing the same task. Under this view, if the system or machine produces useful, human-like results, this is a successful AI machine – irrespective of whether these results were produced from a computer-based process instantiating or resembling human cognition, intelligence or abstract reasoning.

In this second view, AI “success” is measured based upon whether the autonomous system produces “intelligent” (or useful) output or results. We can use what would be considered “intelligent” conduct of a similarly situated human as a comparator. If a modern auto-pilot system is capable of landing airplanes in difficult conditions (such as thick fog) at a success rate that meets or exceeds that of human pilots under similar conditions, we might label it a successful AI system under this second approach. This would be the case even if we all agreed that the autopilot system did not have a meaningful understanding of the concepts of “airplanes”, “runways”, or “airports.” Similarly, we might label IBM’s Jeopardy-playing “Watson” computer system to be a successful AI system since it was capable of producing highly accurate answers to a surprisingly wide and difficult range of questions – the same answers that strong human Jeopardy champions would have produced. However, there is no suggestion that Watson’s results came from the same high-level cognitive understanding and processes that likely animated the answers of human champions like Ken Jennings. Rather, Watson’s accurate output came from techniques such as highly sophisticated statistical machine-learning algorithms that were able to quickly rank possible candidate answers through immense parallel processing of large amounts of existing written documents that happened to contain a great deal of knowledge about the world.

Machine-Translation: Automated Translation as an Example

To understand this distinction between AI views rooted in computer-based cognition and those in “intelligent” or accurate results, it is helpful to examine the history of computer-based language translation (e.g. English to French). Translation (at least superficially) appears to be a task deeply connected to the human understanding of the meaning of language, and the conscious replication of that meaning in the target language. Early approaches to machine translation followed this cue, and sought to convey to the computer system aspects of language – like the rules of grammar in both languages, and the pairing of words with the same meanings in both languages – that might mimic the internal structures undergirding human cognition and translation. However, this meaning and rules-based approach to translation proved limited and surprised researchers by producing somewhat poor results based upon the rules of matching and syntactical construction. Such systems had difficulty in determining whether the word “plant” in English should be translated to the equivalent of “houseplant” or “manufacturing plant” in French. Further efforts attempted to “teach” the computer rules about how to understand and make more accurate distinctions for ambiguously situated words but still did not produce marked improvements in translation quality.

Machine Learning Algorithms: Using Statistics to Produce Surprisingly Good Translations

However, over the last 10-15 years, a markedly different approach to computer translation emerged – made famous by Google and others. This approach was not primarily based upon top-down communication of the basics of constructing and conveying knowledge to a computer system (e.g. language pairing and rules of meaning). Rather, many of the successful translation techniques developed were largely statistical in nature, relying on machine-learning algorithms to scour large amounts of data and create a complex representation of correlations between languages. Google Translate – and other similar statistical approaches – work in part by leveraging vast amounts of data that has previously been translated by humans. For example, the United Nations and the European Union frequently translate official documents into multiple languages using professional translators. This “corpus” of millions of paired and translated documents became publicly available electronically to researchers over the last 20 years. Systems such as Google Translate are able to process vast numbers of documents and leverage these paired translations to create statistical models which are able to produce surprisingly accurate translation results, using probabilities, for arbitrary new texts.
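To make the statistical idea concrete, here is a minimal sketch in Python of how word pairings can be estimated from nothing but paired sentences. It is an illustration only, not a description of how Google Translate or any production system works: it uses a simplified, textbook-style expectation-maximization procedure in the spirit of IBM Model 1, and the three-sentence corpus and the function names (`train_word_translation`, `best_translation`) are hypothetical, invented for this example. Note that the program is never told any grammar or dictionary entries; the pairings fall out of the co-occurrence statistics.

```python
from collections import defaultdict

# Hypothetical toy "parallel corpus": English sentences paired with
# human-made French translations (standing in for UN/EU documents).
PARALLEL_CORPUS = [
    ("the house", "la maison"),
    ("the flower", "la fleur"),
    ("a flower", "une fleur"),
]


def train_word_translation(pairs, iterations=10):
    """Estimate t[(french_word, english_word)], the probability that an
    English word is translated by a given French word, using a simplified
    expectation-maximization loop (IBM Model 1 style, without a NULL word)."""
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    # Start with equal scores for every word pair that co-occurs in a sentence pair.
    t = {(f, e): 1.0 for src, tgt in corpus for e in src for f in tgt}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned with e
        total = defaultdict(float)  # expected count of e aligned with anything
        for src, tgt in corpus:
            for f in tgt:
                # Split one unit of "credit" for f among the English words
                # in the same sentence, in proportion to the current scores.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # Re-estimate the probabilities from the expected counts.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_translation(t, english_word):
    """Return the French word with the highest estimated probability."""
    candidates = {f: p for (f, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_word_translation(PARALLEL_CORPUS)
    for word in ["the", "house", "flower", "a"]:
        print(word, "->", best_translation(table, word))
```

Even on this tiny corpus the procedure separates “la” from “maison” purely because “la” also appears alongside “the” in a second sentence pair. The model has no notion of what a house is, which is the sense in which such systems are “results oriented.”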

Machine-Learning Models: Producing “intelligent”, highly useful results 

The important point is that these statistical and probability-based machine-learning models (often combined with logical-knowledge based rules about the world) often produce high-quality and effective results (not quite up to the par of nuanced human translators at this point), without any assertion that the computers are engaging in profound understanding of the underlying “meaning” of the translated sentences or employing processes whose analytical abilities approach human-level cognition (e.g. view 1). (It is important to note that the machine-learning translation approach does not achieve translation on its own but “leverages” previous human cognition through the efforts of the original UN translators who made the paired translations.) Thus, for certain, limited tasks, these systems have shown that it is possible for contemporary autonomous agents to produce “intelligent” results without relying upon what we would consider processes approaching human-level cognition.

Distinguishing “intelligent results” and actions produced via cognitive intelligence

The reason to flag this distinction is that such successful AI systems (as judged by their results) will pose a challenge to the task of importing and extending existing legal doctrinal frameworks (which were mostly designed to regulate people) into the domain of autonomous computer agents. Existing “type 2” systems that produce surprisingly sophisticated, useful, and accurate results without approaching human cognition are the basis of many products now emerging from earlier AI research and are becoming integrated (or are poised to become integrated) into life. These include IBM’s Watson, Apple’s Siri, Google Search – and in perhaps the next decade or two – Stanford’s/Google’s autonomous self-driving cars, and autonomous music composing software. These systems often use statistics to leverage existing, implicit human knowledge. Since these systems produce output or activities that in some cases appear to approach or exceed humans in particular tasks, and the results that are autonomously produced are often surprisingly sophisticated and seemingly intelligent, such “results-oriented”, task-specific (e.g. driving, answering questions, landing planes) systems seem to be the near path of much AI research.

However, the fact that these intelligent-seeming results do not result from systems approaching human cognition is a nuance that should not be lost on policymakers (and judges) seeking to develop doctrine in the area of autonomous agents. Much, perhaps most, of law is designed and intended to regulate the behavior of humans (or organizations run by humans). Thus embedded in many existing legal doctrines are underlying assumptions about cognition and intentionality that are implicit and so basic that they are often not articulated. Their very implicitness may make these assumptions easy to overlook.

Given current trends, many contemporary (and likely future) AI systems that will be integrated into society (and therefore more likely the subject of legal regulation) will use algorithmic techniques focused upon producing “useful results” (view 2), rather than focusing on systems aimed at replicating human-level cognition, self-reflection, and abstraction (view 1). If lawmakers merely follow the verbiage (e.g. a system that has been labeled “artificially intelligent” did X or resulted in Y) and employ only a superficial understanding of AI research, without more closely understanding these technical nuances, there is the possibility of conflation in extending existing legal doctrines to circumstances based upon “intelligent seeming” autonomous results. For example, the book’s authors explore the concept of imposing fiduciary duties on autonomous systems in some circumstances. But it will take a careful judge or lawmaker to distinguish existing fiduciary/agency doctrines with embedded (and often unarticulated) assumptions of human-level intentionality among agents (e.g. self-dealing) from those that may be more functional in nature (e.g. duties to invest trust funds). In other words, an in-depth understanding of the technology underlying particular autonomous agents should not be viewed as a mere technical issue. Rather, it is a serious consideration which should be understood in some detail by lawmakers in any decisions to extend or create new legal doctrine from our existing framework to cover situations involving autonomous agents.


The E.U. Data Protection Directive and Robot Chicken

The European Commission released a draft of its revised Data Protection Directive this morning, and Jane Yakowitz has a trenchant critique up. In addition to the sharp legal analysis, her article has both a Star Wars and Robot Chicken reference, which makes it basically the perfect information law piece…