Everything is Miscellaneous

At the Open Media Camp in Denver yesterday, Kevin Reynen (kreynen) got things off to a great start with “Overview of Metadata Standards for Video – Why doesn’t it work like a Library?” We discussed how hard it is even to agree on a standard set of genres for tagging video, from deciding which genres to include in a taxonomy to designing the user interface that presents those options to editors. (For instance, he said that in his experience with the standards used at PEGMedia.org, editors often pick the Action genre by default because it appears first in the list, rather than scrolling through the hierarchy of available choices.)

I was reminded during this discussion of a book I read last summer, Everything is Miscellaneous: The Power of the New Digital Disorder, by David Weinberger. I mentioned the book, and got a few nods and a few shakes of the head. Seems quite a few people have looked at these issues from a lot of directions…

Kevin mentioned a meeting he facilitated with a group of librarians, who he hoped would have a magic bullet for the question. After presenting the dilemma, he turned it over to them, but got answers ranging from maintaining a simple list to “use the Library of Congress’s classification”.

I liked David Weinberger’s approach in Everything is Miscellaneous. He discusses the inherent difficulties in classifying information, and the centuries-long struggle to achieve the perfect classification. This history ranges from the revolutionary idea in the Middle Ages of ordering books alphabetically, to the Library of Congress’s flexible classification, to the Encyclopedia Britannica’s approach of using editors to decide what information is worthy enough to even classify, to Melvil Dewey’s entrepreneurial and astoundingly successful attempt to force public libraries to adopt his Christian-biased taxonomy (which gives 9% of the entire taxonomy to Christianity, 0.1% to Islam, and a mere 0.01%, at the austere classification of 294.3, to Buddhist literature produced over more than 2000 years by a large portion of the world’s population).

Weinberger goes on to show how there are basically three orders of information. The first order is atoms, the physical world. This is the challenge faced by brick-and-mortar storefronts: Best Buy, for example, has to decide which items to place at the front of the store, and how to arrange things so that it is relatively easy to find that particular ink cartridge for your printer (while you are still tempted to buy that worthless Microsoft software placed on the way back to the cash register).

The second order of information is likened to library card catalogs. This is also Drupal’s basic core taxonomy, where you create and populate a vocabulary. This is the focal point of Kevin’s discussion, where video librarians struggle to agree on a set of genres to offer their editors. It is merely the latest battle in an old political war that has its roots in medieval universities and has never had a good resolution. Whoever can force their system of classification on the world can control the information, and that’s a dangerous power, whether it’s a mainstream newspaper deciding to highlight a slanted story on its front page while not even publishing an article that might be detrimental to one of its advertisers, or Blockbuster fooling the public into believing that all movies can be neatly classified into Action, Comedy, Drama, Family, Horror, and Foreign.

The third order of information is what the Internet has brought into being, exemplified by Drupal’s folksonomy. This allows everyone to classify their own information, and compare that with others’. This is free-tagging, whether it’s a comma-separated list or YouTube’s foolhardy space-separated list (what’s the point of a full taxonomy term for “the”?). This idea is also at the root of Amazon.com’s recommended buys, and of the Google Summer of Code’s Making Drupal Smart: The Recommender Bundle.
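
To make the complaint about space-separated tags concrete, here is a minimal sketch in Python. The function names and the sample tag string are my own inventions for illustration; this is not how Drupal or YouTube actually parse tags, just the simplest possible version of each approach.

```python
def parse_comma_tags(raw):
    """Split on commas, so a multi-word phrase stays together as one tag."""
    return [tag.strip().lower() for tag in raw.split(",") if tag.strip()]


def parse_space_tags(raw):
    """Split on whitespace, so every single word becomes its own tag."""
    return [tag.lower() for tag in raw.split()]


# Hypothetical input an editor might type into a free-tagging field.
comma_input = "The Power of the New Digital Disorder, metadata, Drupal"
space_input = "The Power of the New Digital Disorder metadata Drupal"

comma_tags = parse_comma_tags(comma_input)
space_tags = parse_space_tags(space_input)

print(comma_tags)
print(space_tags)
```

With the comma-separated version the book's subtitle survives as a single tag, while the space-separated version produces a separate term for every word, including two copies of "the".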

I’ve decided that I’ll remain neutral in this particular battle for control of video genres. However, I also recognize the desperate need for a sensible and consistent system of storing file metadata. This morning, during the Media Sprint, one of our first orders of business will be to implement a Meta-Data Plan and Structure for Drupal’s Media module. This will create a simple table of key/value pairs by fid, and allow implementing modules to decide how to fill those pairs. That way any metadata can be easily exposed, such as with Views or in an XSPF playlist, and consistently presented to editors, so that any video, whether a local video from a cell phone, a Flash video, or a stream from YouTube, can be tagged meaningfully, regardless of the final editorial decisions.
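
As a rough illustration of the key/value idea, here is a sketch in Python using SQLite as a stand-in for Drupal's database layer. The table name `media_metadata`, the column names, and the helper functions are all assumptions made up for this example; the actual schema is still to be worked out at the sprint.

```python
import sqlite3


def create_metadata_table(conn):
    # Hypothetical schema: one row per (file id, key) pair.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS media_metadata (
            fid   INTEGER NOT NULL,
            name  TEXT    NOT NULL,
            value TEXT,
            PRIMARY KEY (fid, name)
        )
        """
    )


def set_metadata(conn, fid, name, value):
    # Insert the pair, or overwrite the value if the key already exists.
    conn.execute(
        "INSERT OR REPLACE INTO media_metadata (fid, name, value) "
        "VALUES (?, ?, ?)",
        (fid, name, value),
    )


def get_metadata(conn, fid):
    # Return every key/value pair stored for one file as a dict.
    rows = conn.execute(
        "SELECT name, value FROM media_metadata WHERE fid = ? ORDER BY name",
        (fid,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
create_metadata_table(conn)

# Different "modules" can fill in whatever keys they understand.
set_metadata(conn, 1, "genre", "Documentary")
set_metadata(conn, 1, "source", "youtube")
set_metadata(conn, 2, "genre", "Comedy")
set_metadata(conn, 1, "genre", "Drama")  # a later editorial change

file_one = get_metadata(conn, 1)
file_two = get_metadata(conn, 2)
print(file_one)
print(file_two)
```

The point of the design is that the table itself takes no position on which keys or genres are valid. Each implementing module decides what to store, and anything that reads the table sees the same shape for every file.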