Critical Mass

Questions for Wordnik’s Erin McKean

By Jane Ciabattari

Wordnik is a combination dictionary, thesaurus, encyclopedia, and OED. It describes itself as “an ongoing project devoted to discovering all the words and everything about them. More than 1.7 million words, and more than 130 million examples.” It’s interactive: readers can upload pronunciations, definitions, and words.

As founder and CEO Erin McKean puts it, “We hope Wordnik answers the question ‘How is this word used?’ That’s what we’re trying to show—real information about real words and how they are used.” Wordnik includes contextual sentences from Twitter and Flickr, as well as Scrabble points for each word. Wordnik also gives a statistical bubble chart telling how many times a word has been used over the past year. (Here’s how that complicated maneuver gets explained on Wordnik’s blog.)

McKean says the genesis of Wordnik goes back to the 2007 TED conference in Monterey, where she gave a talk. “After meeting Roger and Ann McNamee at TED, we talked for the better part of a year before the idea ‘expand the dictionary’ turned into ‘create a start-up to make a better online dictionary experience.’ We incorporated on Leap Day, 2008, had an alpha release that fall, a closed beta a bit after that, and an open beta this past June! We’ll be in beta for a bit longer.”

Wordnik got a shout-out from the TED blog shortly after going open beta: “Wittgenstein would love this…”

McKean and several members of her Wordnik team cut their teeth at Oxford University Press. She has 17 years of experience as a dictionary editor; she was editor in chief of OUP’s American Dictionaries for a while, and then a Consulting Editor. Grant Barrett was Project Editor for the Historical Dictionary of American Slang (he’s also co-host of the radio show A Way With Words), and Orion Montoya, Wordnik’s chief computational lexicographer, worked on the Oxford English Corpus. “It’s a bit funny to go from a place with 400+ years of tradition to a brand-new company where you have to invent the way you do things,” McKean notes, with a tip of the hat to her tech team: Tony Tam, David Wu, Heather Rivers, and Andy Stanberry. “Plus we have a Corpus Librarian, Mary Mark Ockerbloom, and a Corpus Technician, Tim Allen, who help us find our text sources for the examples.”

How does Wordnik “vet” entries? “All the definitions now on Wordnik are from established dictionaries: The American Heritage 4E, the ten-volume Century Dictionary (which we had digitized & tagged), the GNU Cide Dictionary of English (Webster’s 1913), and WordNet. We also have two thesauruses, the Roget’s II International and Allen’s Synonyms and Antonyms. The example sentences are pulled from everywhere, and sorted so that the most useful sentence we have comes to the top. We’re constantly refining our sorting techniques and adding new data.”

And what kind of interaction are they expecting?

“We hope users will add helpful notes (e.g. ‘I hear this word used mostly in Northern England,’ or ‘This word sounds disgusting!’) that we’ll then rank and display according to their usefulness, too. But the fun and easy things for users to do are many: mark a word as a favorite, tag a word (someone’s tagged a bunch of words with ‘theprincessbride’!), and record your own pronunciation for a word.”

Where is Wordnik headed? “My dream is pretty much what it was when I was eight: to have as much information about as many words as possible available to as many people as possible. That’s the perfect dictionary. The best thing is that we’ll never be done: there will always be new words to discover and describe.”