Everything is Miscellaneous

At the Open Media Camp in Denver yesterday, Kevin Reynen (kreynen) got things off to a great start with "Overview of Metadata Standards for Video - Why doesn't it work like a Library?" We discussed the difficulties of even agreeing on a standard set of genres for tagging video, from deciding which genres to include in a taxonomy to designing the user interface that presents those options to editors. (For instance, he said that his experience with the standards used at PEGMedia.org is that editors often use the Action genre by default, since it appears first in the list, rather than scrolling through the hierarchy of available choices.)

I was reminded during this discussion of a book I read last summer, Everything is Miscellaneous: The Power of the New Digital Disorder, by David Weinberger. I mentioned the book, and got a few nods and a few shakes of the head. Seems quite a few people have looked at these issues from a lot of directions...

Kevin mentioned a meeting he facilitated with a group of librarians, who he hoped would have a magic bullet for the question. After presenting the dilemma, he turned it over to them, but got answers ranging from maintaining a simple list to "use the Library of Congress's classification".

I liked David Weinberger's approach in Everything is Miscellaneous. He discusses the inherent difficulties in classifying information, and the centuries-long struggles to achieve the perfect classification. This history has ranged from the revolutionary idea in the Middle Ages of ordering books alphabetically, to the Library of Congress's flexible classification, to the Encyclopedia Britannica's approach of using editors to decide what information is worthy enough to even classify, to Melvil Dewey's entrepreneurial and astoundingly successful attempt to force public libraries to adopt his Christian-biased taxonomy (which gives 9% of the entire taxonomy to Christianity, 0.1% to Islam, and a mere 0.01%, at the austere classification of 294.3, to Buddhist literature representing over 2000 years of a large portion of the world's population).

Weinberger goes on to show that there are basically three orders of information. The first order is atoms, the physical world. This is the challenge faced by brick-and-mortar storefronts, such as Best Buy needing to determine what items to place at the front of the store, and to arrange things in such a way as to make it relatively easy to find that particular ink cartridge for your printer (but still be tempted to buy that worthless Microsoft software placed on the way back to the cash register).

The second order of information is likened to library card catalogs. This is also Drupal's basic core taxonomy, where you create and populate a vocabulary. This is the focal point of Kevin's discussion, where video librarians struggle to agree on a set of genres to offer their editors. This is merely the latest battle in an old political war, one that has its roots in medieval universities and has had no good resolution. Whoever can force their system of classification on the world can control the information, and that's a dangerous power, whether it's a mainstream newspaper deciding to highlight a slanted story on its front page while not even publishing an article that might be detrimental to one of its advertisers, or Blockbuster fooling the public into believing that all movies can be neatly classified into Action, Comedy, Drama, Family, Horror, and Foreign.

The third order of information is what the Internet has brought into being, exemplified by Drupal's folksonomy. This allows everyone to classify their own information, and compare that with others'. Free-tagging works this way, whether it's a comma-separated list or YouTube's foolhardy space-separated list (what's the point of a full taxonomy term for "the"?). This idea is also at the root of Amazon.com's recommended buys, and of the Google Summer of Code's "Making Drupal Smart: The Recommender Bundle".
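
To make the complaint about space-separated tags concrete, here is a minimal Python sketch of the two free-tagging styles. The function names and sample strings are made up for illustration, and this is not how Drupal or YouTube actually parse tags; it only shows why splitting on spaces turns a phrase into a pile of single-word terms, "the" included.

```python
def parse_comma_tags(raw):
    """Split on commas, so a multi-word phrase survives as one tag."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def parse_space_tags(raw):
    """Split on whitespace, so every word becomes its own tag."""
    return raw.split()


# Hypothetical editor input for the same video, entered both ways.
raw_comma = "the power of disorder, metadata, open media"
raw_space = "the power of disorder metadata open media"

comma_tags = parse_comma_tags(raw_comma)
space_tags = parse_space_tags(raw_space)

print(comma_tags)  # three tags, phrases kept whole
print(space_tags)  # seven tags, one per word
```

With commas the editor gets three meaningful terms. With spaces the same input yields seven, and "the" and "of" each become terms in their own right.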

I've decided that I'll remain neutral in this particular battle for control of video genres. However, I also recognize the desperate need for a sensible and consistent system of storing file metadata. This morning, during the Media Sprint, one of our first orders of business will be to implement a Meta-Data Plan and Structure for Drupal's Media module. This will create a simple table of key/value pairs by fid, and allow implementing modules to decide how to fill those pairs. This will allow any metadata to be easily exposed, such as with Views or in an XSPF playlist, and also allow it to be consistently presented to editors, so that any video, whether a local video from a cell phone, a flash video, or a stream from YouTube, can be tagged meaningfully, regardless of the final editorial decisions.
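
As a rough illustration of what a table of key/value pairs by fid could look like, here is a small Python sketch using an in-memory SQLite database. Everything in it is an assumption made for the example: the table name `file_metadata`, the column names, and the helper functions are hypothetical, and the Media module's real schema (which would live in Drupal and PHP) may well look different.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (file id, key) pair.
conn.execute(
    """
    CREATE TABLE file_metadata (
        fid   INTEGER NOT NULL,
        name  TEXT    NOT NULL,
        value TEXT,
        PRIMARY KEY (fid, name)
    )
    """
)


def set_metadata(conn, fid, name, value):
    """Store one key/value pair for a file, overwriting any earlier value."""
    conn.execute(
        "INSERT OR REPLACE INTO file_metadata (fid, name, value) VALUES (?, ?, ?)",
        (fid, name, value),
    )


def get_metadata(conn, fid):
    """Return all key/value pairs for a file as a dict."""
    rows = conn.execute(
        "SELECT name, value FROM file_metadata WHERE fid = ?", (fid,)
    )
    return dict(rows.fetchall())


# Each implementing module decides which keys it fills in.
set_metadata(conn, 1, "genre", "Documentary")
set_metadata(conn, 1, "source", "youtube")
set_metadata(conn, 2, "duration", "94")
set_metadata(conn, 1, "genre", "Comedy")  # a later editorial change

print(get_metadata(conn, 1))
```

The point of the design is that the table itself takes no position on which keys exist. A module handling YouTube streams and a module handling cell phone uploads can each write whatever pairs they know about, and anything reading the table sees them the same way.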

Fred Gooltz wrote 46 weeks 4 days ago

This taxonomy strife between library scientists reminds me of the David Foster Wallace essay “Authority and American Usage.”

It's a fifty-plus page treatise on grammar and the 'Usage Wars.' In reviewing Bryan Garner’s 'Dictionary of Modern American Usage', Wallace traces the history of American linguistics and patiently delineates the relationship between our politics and language, without ever belaboring the point (as Orwell occasionally did), to show how the language we speak informs the things we believe, and vice versa.

It begins:

Did you know that probing the seamy underbelly of US lexicography reveals ideological strife and controversy and intrigue and nastiness and fervor on a near-Lewinskian scale?

For instance, did you know that some modern dictionaries are notoriously liberal and others notoriously conservative, and that certain conservative dictionaries were actually conceived and designed as corrective responses to the "corruption" and "permissiveness" of certain liberal dictionaries? That the oligarchic device of having a special "Distinguished Usage Panel ... of outstanding professional speakers and writers" is an attempted compromise between the forces of egalitarianism and traditionalism in English, but that most linguistic liberals dismiss the Usage Panel as mere sham-populism? Did you know that U.S. lexicography even had a seamy underbelly?

Originally published in Harper's in 2001, the essay was titled "Democracy, English, and the Wars over Usage", but when it was reprinted in his essay collection, it became "Authority and American Usage" -- cutting to the heart of it all. Indeed, the sliding scale of authoritarianism is the main variable in American party politics. But that's a post for a different blog...


About Aaron Winborn

Aaron Winborn was Advomatic's first full-time hire in 2006, and is a very active leader in the Drupal community. His first book, Drupal Multimedia, is now available from Packt Publishing.
