    About the Blog

    Concurring Opinions is a multiple authored, general interest legal blog.

    (Image: Wikicommons)

Amazon’s Text Stats and a Little Orwell

posted by Deven Desai


Watching changes on Amazon.com is a good way to see how much one’s information can be stretched. The continual refinement of suggested books and other items is a little disturbing, but it often yields titles that I find useful. The Gold Box, with its game-show approach to sales, is an example of this information mining. To use the Gold Box, one clicks on the box and is offered an item that usually relates to something one has purchased before or at least looked at. When the item is on screen, one must choose between accepting the sale offer or passing on it to see the next offer, with no chance to go back to the previous one. All decisions must be made within one hour of opening the box. I have opened the box a few times and am often surprised by some of the items that show up in there. Given how often Amazon seems to correlate interests, when what seems to be an aberration pops up, I wonder whether it is a random shot to see if it will stick or whether in some deep way Amazon has discerned that I have a hidden desire for vitamins, herbal remedies, or hairdryers. So when I saw that Amazon had added Text Stats, I had to poke around. After all, who knows what information would come my way by seeing the statistics (whatever they may be) on a book?

I found that not all books have this information, but it seems that when publishers play along, Amazon will give a book’s statistics, including syllables per word; words per sentence; total number of characters, words, and sentences; and my favorites, the “Fun Stats,” words per dollar and words per ounce. Amazon takes this information and gives scores for Readability (explained below). Apparently the Bible, depending on the edition, requires either a twelfth-grade or a tenth-grade reading level. Yet one study of government Web sites states that “half of Americans read[] at no higher than the 8th grade level.” You may draw your own conclusions.

Text Stats also shows where the book stands in relation to all other books (and in some cases one can compare within classes of texts). So I started to poke around, and it seems (if we take the numbers seriously, and there is reason not to when one examines exactly what readability means) that perhaps the best writing correlates with simpler writing. That reminded me of Orwell’s “Politics and the English Language,” but I’ll get to that later. To play with that idea, I looked at the Modern Library’s list of 100 best novels to see how they compared with all texts in the Amazon set and then with literature alone.


The Text Stats for Ulysses show that a ninth-grade reading level is required under the Fog test and that 80 percent of all books are more difficult. Furthermore, 10 percent of its words are complex, yet 71 percent of all books have a higher share of complex words. At 1.5 syllables per word, 73 percent of books have more syllables per word; and with only 12.1 words per sentence, 76 percent of literature books have more words per sentence. Oh yes, one obtains 16,776 words per dollar and 9,516 words per ounce for the Modern Library edition.
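As a rough check on that Fog number, the Gunning Fog formula described at the end of this post, ((words/sentence) + 100 * (complex words/words)) * 0.4, can be applied directly to the Ulysses averages quoted above. This is a minimal Python sketch; it assumes Amazon's "complex words" figure is the share of words with three or more syllables, the helper name is hypothetical, and Amazon does not say how it rounds.

```python
def fog_from_rates(words_per_sentence: float, complex_share: float) -> float:
    """Gunning Fog index from aggregate rates (hypothetical helper).

    words_per_sentence -- average number of words per sentence
    complex_share      -- fraction of words with three or more syllables
    """
    # Fog = 0.4 * ((words/sentence) + 100 * (complex words/words))
    return 0.4 * (words_per_sentence + 100 * complex_share)


# Averages reported by Text Stats for the Modern Library Ulysses:
# 12.1 words per sentence and 10 percent complex words.
ulysses_fog = fog_from_rates(12.1, 0.10)
print(f"{ulysses_fog:.2f}")  # 8.84
```

The result, 8.84, rounds to 9, which is consistent with the ninth-grade level Amazon reports for Ulysses.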

How do other books do? Only 16 percent of texts are more difficult than Heidegger’s Being and Time (not on the list). Here are the numbers for the top ten novels on the Modern Library’s list, showing the share of all books that are more difficult: Ulysses, 80 percent; The Great Gatsby, 79 percent; A Portrait of the Artist as a Young Man, 62 percent; Lolita, 49 percent; Brave New World, 77 percent; The Sound and the Fury, 96 percent; Catch-22, 75 percent; Darkness at Noon, (unavailable); Sons and Lovers, 93 percent; The Grapes of Wrath, 90 percent.

If we compare the same list only with literature authors A to Z (Amazon’s classification), the shares of more difficult books are:

Ulysses, 69 percent; The Great Gatsby, 65 percent; A Portrait of the Artist as a Young Man, 33 percent; Lolita, 21 percent; Brave New World, N/A; The Sound and the Fury, 93 percent; Catch-22, 56 percent; Darkness at Noon, (unavailable); Sons and Lovers, 96 percent; The Grapes of Wrath, 92 percent.

What does all this information mean? Again, draw your own conclusions. It may be a matter of style. Although The Sound and the Fury is technically simple, I doubt anyone would suggest that a reader below the sixth-grade level could read and understand it. Still, I wonder if someone could cull the data and find that better writing correlates with simpler words and sentences. That idea reminds me of Orwell’s “Politics and the English Language,” where he is clear that he does not prescribe rote rules of language or the “setting up of a ‘standard English’” but still offers some rules to guide writers:

(i) Never use a metaphor, simile, or other figure of speech which you are used to seeing in print.

(ii) Never use a long word where a short one will do.

(iii) If it is possible to cut a word out, always cut it out.

(iv) Never use the passive where you can use the active.

(v) Never use a foreign phrase, a scientific word, or a jargon word if you can think of an everyday English equivalent.

(vi) Break any of these rules sooner than say anything outright barbarous.

Sounds like marching orders. Marching orders from Orwell. Hmmm.

For those curious about readability, Amazon gives three scores based on the Fog Index, the Flesch Index, and the Flesch-Kincaid Index. The site had explanations of what readability meant, but the link has been dead for a few weeks now. It seems that the Fog Index is the Gunning Fog Index, which uses the formula ((words/sentence) + 100 * (complex words/words)) * 0.4, where complex words are words with three or more syllables. Scores run from the single digits on up, and the score equals the grade reading level: a score of 8 means an eighth-grade reading level is required, a score of 12 a twelfth-grade level, and so on. The Flesch and Flesch-Kincaid formulas, which rely on average syllables per word rather than a count of complex words, purport to measure readability as well.
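For the curious, here is a minimal Python sketch of the three scores. The Fog calculation follows the formula given above; the Flesch Reading Ease and Flesch-Kincaid grade-level formulas are the standard published ones, since Amazon's own explanation page is unavailable. The word and sentence splitting and the vowel-counting syllable rule are crude assumptions for illustration, and the helper names are hypothetical, so these numbers will not reproduce Amazon's.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, minus a trailing silent 'e'.

    An assumed heuristic, not Amazon's method; it misjudges many words.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # treat a final "e" as silent ("made", "note")
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Counts the three readability formulas need."""
    # Naive sentence split on runs of ., !, or ?; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": sum(syllables),
        # "Complex" words in the Fog sense: three or more syllables.
        "complex_words": sum(1 for s in syllables if s >= 3),
    }


def gunning_fog(st: dict) -> float:
    # ((words/sentence) + 100 * (complex words/words)) * 0.4
    return 0.4 * (st["words"] / st["sentences"]
                  + 100 * st["complex_words"] / st["words"])


def flesch_reading_ease(st: dict) -> float:
    # Higher means easier; this score is not a grade level.
    return (206.835
            - 1.015 * st["words"] / st["sentences"]
            - 84.6 * st["syllables"] / st["words"])


def flesch_kincaid_grade(st: dict) -> float:
    # Expressed as a U.S. school grade level.
    return (0.39 * st["words"] / st["sentences"]
            + 11.8 * st["syllables"] / st["words"]
            - 15.59)


sample = "The cat sat on the mat. It was a very comfortable animal."
st = text_stats(sample)
print(st)
print(gunning_fog(st), flesch_reading_ease(st), flesch_kincaid_grade(st))
```

On this two-sentence sample the sketch gives a Fog score of about 9 and a Flesch-Kincaid grade of about four and a half. The gap arises because the Fog formula counts "comfortable" and "animal" as complex words while Flesch-Kincaid only averages syllables, a small illustration of how mechanical these measures are.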


July 22, 2006 at 7:11 pm   Posted in: Culture

Responses (2)

  1. John Armstrong - July 22, 2006 at 9:14 pm

    All these examples show is just how horribly reductive these indices are. The readability of a text simply cannot be reduced to a statistical analysis of its syntax with total disregard for its semantics.

    Incidentally the Declaration of Independence has a Gunning-Fog index of 15.7. Evidently one can’t understand the founding precepts of our nation until one is almost through college.

  2. Deven Desai - July 23, 2006 at 2:40 am

    John, I agree and thanks for finding the Declaration number as an example of the index’s oddity. I’m curious, where did you find it or did you calculate it on your own?



Ownership

Concurring Opinions is a general-interest legal blog operated by Concurring Opinions LLC, a Pennsylvania Limited Liability Corporation.

© Concurring Opinions
