Wednesday, September 07, 2005

The Supreme Swiss Army Knife of the English Word Kit

If you plan to use "colubriform" in public you'd best devote fifteen minutes to making sure it really means what you want it to.

Hugh Kenner

In a 1986 essay* about his messy desk and G. K. Zipf's principle of least effort, Hugh Kenner made some interesting observations about the "carpentry" of texts. Chapter X of Henry James' The Ambassadors consists of 2339 individual words, but only 665 different ones. "The 665 words," he notes, "seem a meager resource. Spread evenly over the pages like soft margarine, they'd turn up with dull uniformity, each one just three or four times." But this, of course, is not how a text is made. Indeed, Kenner reports that "fully 75 percent of the chapter's carpentry is done with a mere 176 of the different words that went into it--only 27 percent of its vocabulary." That is, of the 2339 words that make up the chapter, 1754 are uses of only 176 different words (86 "the"s, 72 "you"s, 62 "to"s, etc.). Once this core part of his vocabulary has been used, James has 585 words left in his chapter to write but fully 489 words left in his vocabulary, which are "available for special effects." Needless to say, he will use most of these words only once, as when the fire has "burnt down to the silver ashes of light wood", where four words are used that do not appear in the chapter again.
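The arithmetic behind these figures is easy to check. Here is a small sketch in Python that uses only the four counts quoted above; the variable names are my own labels, not Kenner's. Note that 176 out of 665 comes to roughly 26.5 percent, which Kenner gives as 27.

```python
# Figures quoted from Kenner's count of Chapter X of The Ambassadors.
# The variable names are illustrative labels, not Kenner's terminology.
total_tokens = 2339   # running words in the chapter
total_types = 665     # different words in the chapter
core_types = 176      # the most frequently used different words
core_tokens = 1754    # uses accounted for by those 176 words

even_spread = total_tokens / total_types          # uses per word if spread evenly
core_share_of_text = core_tokens / total_tokens   # share of the chapter's words
core_share_of_vocab = core_types / total_types    # share of the vocabulary
remaining_tokens = total_tokens - core_tokens     # words still to be written
remaining_types = total_types - core_types        # vocabulary not yet used

print(f"even spread: {even_spread:.1f} uses per word")
print(f"core words do {core_share_of_text:.1%} of the work")
print(f"core words are {core_share_of_vocab:.1%} of the vocabulary")
print(f"left over: {remaining_tokens} words to write, {remaining_types} words to choose from")
```

The last line is the telling one: with 585 slots to fill and 489 fresh words to fill them, the average leftover word gets barely more than one use.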

Kenner draws our attention to a number of seemingly lawlike regularities of this kind in the texts we write. As a rough approximation, we can expect 80 percent of the work in a text to be done by 20 percent of its vocabulary. (This 80-20 rule has been claimed to account for everything, including the kitchen sink. "The one-time words resemble those kitchen gadgets you must rummage for because you want them so seldom." 80 percent of the cooking gets done using 20 percent of the kitchen: a reasonable hypothesis.) Kenner contributes his own law, derived from a study of Shakespeare: 40 percent of the text of the plays consists of just 40 different words. This, he says, will be true of "any extensive text sample".

So we can expect it to be true of academic texts as well. Indeed, I expect to do some statistics of my own in the weeks to come on the texts I read and edit. I imagine that we can learn something important about those first 40 words (that do 40 percent of the work) and that first 20 percent of a text's vocabulary (that does 80 percent of the work).
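For anyone who wants to try this on their own texts, here is a minimal sketch in Python of how such a count might go. The function names and the crude tokenizer (lowercase letters, with internal apostrophes allowed) are my own assumptions, not Kenner's method; a serious count would need to decide how to treat hyphens, numbers and inflected forms.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (a deliberately crude rule)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def coverage(text, top_n):
    """Share of the running words accounted for by the top_n most frequent words."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total

def coverage_of_top_fraction(text, fraction):
    """Share of the running words done by the given fraction of the vocabulary."""
    vocabulary_size = len(Counter(tokenize(text)))
    top_n = max(1, round(vocabulary_size * fraction))
    return coverage(text, top_n)

if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the log"
    print(f"top 1 word: {coverage(sample, 1):.0%} of the text")
    print(f"top 20% of vocabulary: {coverage_of_top_fraction(sample, 0.2):.0%} of the text")
```

On a real text, `coverage(text, 40)` would test the 40-word claim and `coverage_of_top_fraction(text, 0.2)` the 80-20 rule. A toy sentence like the one above is far too short to show either.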

But the thing I want to emphasize is that, of the different words you use when writing, 80 percent will be used very rarely and over half will be used only once. These rare words are cognitively more expensive to use because they are more difficult to find (fifty-dollar words, Kenner calls them). It is therefore well worth the effort to make a list of words that you regularly use only once or twice in a text (there will be hundreds of them). I suspect that these words are the ones that define your discipline and your area of specialization. Without them your text would have none of the "special effects" that display your knowledge.
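Making such a list by hand would be tedious, but a few lines of Python can produce a first draft. This is only a sketch under my own assumptions: `rare_words` and its `max_uses` parameter are hypothetical names, and the word-splitting rule is the simplest one that could work.

```python
import re
from collections import Counter

def rare_words(text, max_uses=2):
    """Alphabetical list of the words used at most max_uses times in the text."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n <= max_uses)

if __name__ == "__main__":
    sample = "set the table and set the alarm and set out early"
    print(rare_words(sample, max_uses=1))  # the words used exactly once
    print(rare_words(sample))              # the words used once or twice
```

Run over a whole article, the list will mix genuine terms of art with ordinary words that simply happened to come up once, so it still needs to be read through by someone who knows the field.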

Kenner liked to draw attention to the ease with which dictionaries define difficult words and the difficulties they face in defining easy ones. His favourite example may have been the word "set",

the supreme Swiss Army knife of the English word kit, handy in any thinkable context--get set to set the table with the dinner set, set the alarm so we can set out early, and set things up so we'll not be upset by a prowler but can set our teeth and set a dog on him--the Oxford entry was thirty years in the pondering, forty days in the writing, and ran to two-thirds the length of Milton's Paradise Lost.

By contrast, James Murray, the chief editor of the original OED, wrote the entry for "Dziggetai" on Christmas Eve, 1896, "while his wife watched."**

*Kenner, Hugh. "The Untidy Desk and the Larger Order of Things", originally published in Discover magazine, 1986, and reprinted in Mazes (North Point Press, San Francisco, 1989).
**Reported by Kenner in his review of the Oxford American Dictionary for Harper's in 1981, reprinted in Mazes, p. 83.

1 comment:

Andrew Shields said...

Kenner also reviewed the second edition of the OED in the NY Times Book Review sometime in the 80s or 90s, and he repeated the point about "set" (I know he did, because I've been quoting the point for years!). But in the second edition, "set" had become four-fifths as long as PL!