As I sit here writing up my final post on media metadata, it hits me… one of my least favorite parts of writing a new blog post: tagging.
Why does search have to be such an afterthought? Why does the operating system need such deep insight into binary file types? Because nobody ever had the foresight to predict that people would want to index everything, and that only oddball content would go unindexed.
A few months ago, not long before I started writing this series of posts, I had lunch at Terra Burger. I bumped into a parent who, like me, was trying to get used to editing video and photos on his new Mac. He had been using the Faces feature of iPhoto and was disappointed at how it confused photos of his 1-year-old son with photos of his 4-year-old son taken three years earlier. Funny, isn’t it, how a “search” index can be thrown off by failing to take just one metric (the age of the photo) into account? The other problem, as I have recently discovered for myself, is that while iPhoto is handy, Faces, like all other “postscript” indexing (indexing done after the content has been created), is a pain in the butt and consumes time that I sure as heck don’t have.
I think the future lies in devices that are more indexing-aware: GPS-savvy cameras, camera phones (the iPhone did a great job of schooling the industry here), video cameras, and more. Content itself will become more natively indexable. A few months ago MIT Technology Review ran an interesting, but fundamentally flawed, article about “open video”. I will be discussing video technology more in the future (given my new job), but the article is worth bringing up now: although the technology just doesn’t work the way the author described it, the piece highlights something that does need to happen. Video content needs to become innately searchable. Not just by Google, but on your local computer, on Facebook, everywhere.
Because content is stored in a compressed, binary, streamed form (which makes it inherently hard to decompile), the content itself should provide a first-tier indexing experience by publishing its own manifest of what it contains. That manifest would include inherent indexes (name, date, generator), explicitly defined indexes (creator, location, participants), and content indexes (objects, places, definitions, categories and tags, a storyline or scene flow, and of course a good transcript of any spoken words, which speech recognition can often produce relatively easily, albeit with some errors).
Just as interesting is how this snippet of content relates to other pieces of content created in the same medium, at the same time, and so on. The genetics of content matter too. Knowing that the tiny 250KB JPG named f34af3.jpg came from the original image D642242.jpeg, offloaded from your camera on 12/4/2005, is useful information. Just as crucial? Finding duplicates in a useful way and sharing that data across content consumers (spouses, grandparents, aunts and uncles, etc.).
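To make the idea of a self-describing manifest a little more concrete, here is a minimal Python sketch. Everything in it is my own hypothetical illustration rather than any existing metadata standard (such as EXIF or XMP): the `ContentManifest` fields, the `make_manifest`, `lineage`, and `find_duplicates` helpers, and the choice of a SHA-256 hash of the file’s bytes to spot duplicates. It groups the fields the same way as above (inherent, explicit, content) and adds a `derived_from` link for the content’s “genetics”.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ContentManifest:
    """Hypothetical manifest a media file could publish about itself."""
    # Inherent indexes: known the moment the file is produced.
    name: str
    created: str       # date string as recorded by the device
    generator: str     # device or software that produced the file
    sha256: str        # digest of the file's bytes, used to spot exact duplicates
    # Explicitly defined indexes: supplied by the device or the person.
    creator: Optional[str] = None
    location: Optional[str] = None
    participants: list[str] = field(default_factory=list)
    # Content indexes: what is actually in the file.
    tags: list[str] = field(default_factory=list)
    transcript: Optional[str] = None
    # "Genetics": name of the file this one was derived from, if any.
    derived_from: Optional[str] = None


def make_manifest(name: str, data: bytes, created: str, generator: str, **extra) -> ContentManifest:
    """Build a manifest, computing the content hash from the raw bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return ContentManifest(name=name, created=created, generator=generator, sha256=digest, **extra)


def lineage(name: str, manifests: dict[str, ContentManifest]) -> list[str]:
    """Follow derived_from links back toward the original file."""
    chain = [name]
    parent = manifests[name].derived_from
    # Stop at the original, at an unknown parent, or if the links loop.
    while parent is not None and parent not in chain:
        chain.append(parent)
        parent = manifests[parent].derived_from if parent in manifests else None
    return chain


def find_duplicates(manifests: Iterable[ContentManifest]) -> list[list[str]]:
    """Group files whose bytes hash identically (exact copies only)."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for m in manifests:
        by_hash[m.sha256].append(m.name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Usage with the file names from the post (file bytes and thumbnail details are placeholders).
original = make_manifest("D642242.jpeg", b"raw camera bytes", "12/4/2005", "camera")
thumb = make_manifest("f34af3.jpg", b"resized bytes", "12/4/2005", "photo app",
                      derived_from="D642242.jpeg")
copy = make_manifest("copy_of_D642242.jpeg", b"raw camera bytes", "12/4/2005", "camera")
manifests = {m.name: m for m in (original, thumb, copy)}

print(lineage("f34af3.jpg", manifests))     # ['f34af3.jpg', 'D642242.jpeg']
print(find_duplicates(manifests.values()))  # [['D642242.jpeg', 'copy_of_D642242.jpeg']]
```

A byte-level hash only catches exact copies; the resized f34af3.jpg will never match its original, which is exactly why an explicit `derived_from` link is worth recording at the moment the derivative is created.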
Think about it: today, search indexers must go out of their way to squeeze those pieces of data out of a chunk of content. If all content exposed its metadata in a uniform, easily consumed way, anyone could index it: Google, Spotlight, Microsoft Desktop Search… anyone.
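Here is a rough sketch of why a uniform format helps: if every file published its metadata as a small JSON document with the same field names, a generic indexer could build a searchable index without decoding a single frame of video. The field names, the `build_index` and `search` helpers, and the simple AND-style matching are all hypothetical choices for illustration; this is not how Spotlight, Google, or any other real indexer actually works.

```python
import json
import re
from collections import defaultdict

# Hypothetical shared field names every manifest would use.
INDEXED_FIELDS = ("name", "creator", "location", "participants", "tags", "transcript")


def tokens(value):
    """Yield lowercase word tokens from a string or a (nested) list of strings."""
    if isinstance(value, list):
        for item in value:
            yield from tokens(item)
    elif isinstance(value, str):
        yield from re.findall(r"\w+", value.lower())


def build_index(manifest_docs):
    """Map each token to the set of file names whose metadata contains it."""
    index = defaultdict(set)
    for doc in manifest_docs:
        meta = json.loads(doc)
        for key in INDEXED_FIELDS:
            for tok in tokens(meta.get(key)):
                index[tok].add(meta["name"])
    return index


def search(index, query):
    """Return files matching every term in the query (AND semantics)."""
    terms = list(tokens(query))
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Usage with made-up manifests; names and places are placeholders.
docs = [
    json.dumps({"name": "beach.mp4", "creator": "Dad", "location": "the beach",
                "participants": ["Sam", "Alex"], "tags": ["birthday"],
                "transcript": "happy birthday Sam"}),
    json.dumps({"name": "park.jpg", "creator": "Mom",
                "participants": ["Alex"], "tags": ["picnic"]}),
]
index = build_index(docs)
print(sorted(search(index, "Alex")))          # ['beach.mp4', 'park.jpg']
print(sorted(search(index, "birthday sam")))  # ['beach.mp4']
```

The indexer never looks at the media itself; all of the hard work of describing the content happened when the manifest was written, which is the shift in responsibility this post argues for.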
Indeed, the future of content search lies not in better search, but in better metadata exposure, with more of that metadata populated and published as automatically as possible.
People want to find things. People don’t want to make them searchable.