Emerging Technologies: #5 - The Big Data Problem…. haven't we been here before?

This post is not so much about a particular technology or invention as it is about a concept and a problem, one that we have been trying to solve for the past 5,000 years. Enjoy.

I hear a lot about big data. I'm sure everyone has. My customers bring it up in meetings, co-workers and classmates have discussions about the topic, and soon the news will cover it as the next biggest threat to society. IBM even had a commercial on the topic two years ago during prime-time television hours. So when one of my favorite magazines, Popular Science, did a weeklong series on the topic, I thought... maybe it’s time I investigated the subject. After reading all of the articles for the week, I began to realize a few things:

· Data management has always been an issue

· Our current data issues are not as bad as we think; the sky is not falling

· We will always have a big data problem (unless we reach the maximum – yes there is a maximum)

I say data management has always been an issue because if you look at the history of library science, you will realize that our ancestors faced and tackled similar problems. One of the articles from Popular Science gave an excellent timeline of the various data management tools and techniques that humans used to address the problems of their day. Whether it was the creation of written language to capture ideas, the Dewey Decimal System to make large public libraries searchable during the library explosion of the late 19th century, or the creation of a species classification system to manage the naming structure and lineage of organic life, there has always been a need for data management and a system built to meet it.

So when we think of today and look at the massive amounts of computer-generated information being captured, we're faced with the same problem as our ancestors: how do we organize this material for easier consumption? Currently, there are many ways to do it, and the one that has the most traction at this moment is actually over 2,000 years old: data sharing. The concept is simple: rather than create a new system or structure to house all the information, why not just combine the information through a shared network? Several great examples of data sharing that have addressed the big data problem for their respective industries and fields are listed below:

· The Combined DNA Index System

· The Encyclopedia of Life

· The Food and Agriculture Organization Database

· The Genographic Project

· The Intergovernmental Panel on Climate Change's Data Distribution Centre

· The MD:Pro

· OKCupid's OKTrends

· Sloan Digital Sky Survey Database

· The Wayback Machine

· WorldCat

Looking at other methods currently being used to solve the big data “problem,” the unstructured approach to data management also appears promising. Because we’re generating so much data today, maybe it has become impossible (and not worth our time?) for us to classify every bit of information and data. I know this point will probably go against the thinking of data architects and data-minded individuals, but the trend appears to be shifting away from the traditional structured data environments whose genesis was in the 1970s and 1980s.

In reviewing the various technologies and trends for data management, one should not forget that there really isn’t the “problem” or “crisis” in data management that you may have been led to believe. If you really want to know more about the big data problem, look deeper into the timing of the IBM commercial announcing the “problem”; you’ll notice that it coincided with the release of their big data product called InfoSphere... interesting.

Lastly, we need not worry or fret about the big data problem because, as history has taught us, data management is an iterative process and one which never stops... or does it? An interesting question and answer: When will we ever catch up to our big data problem? When we run out of space. Yes, there is a maximum amount of data space in the universe, and it’s estimated to be 10^90 bits. If you’re interested in why, then I would suggest looking down the data rabbit hole of information theory. Be careful, there’s a lot of data to process.
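
For the curious, here is a rough back-of-the-envelope sketch of how far away that ceiling is. It only counts how many times the world's stored data would have to double before hitting the 10^90 bit figure quoted above. The starting volume (one zettabyte) and the function name are my own assumptions for illustration, not measurements, so treat the result as a thought experiment rather than a forecast.

```python
# Figure quoted in the post: estimated maximum data capacity of the universe.
UNIVERSE_LIMIT_BITS = 10 ** 90

# ASSUMPTION for illustration only: one zettabyte (10^21 bytes) of stored data.
ASSUMED_CURRENT_BITS = 8 * 10 ** 21


def doublings_until_full(current_bits: int, limit_bits: int = UNIVERSE_LIMIT_BITS) -> int:
    """Count how many doublings take current_bits to at least limit_bits.

    Uses exact integer arithmetic, so there is no floating-point rounding.
    """
    if current_bits <= 0:
        raise ValueError("current_bits must be positive")
    doublings = 0
    while current_bits < limit_bits:
        current_bits *= 2
        doublings += 1
    return doublings


if __name__ == "__main__":
    n = doublings_until_full(ASSUMED_CURRENT_BITS)
    print(f"Doublings needed from the assumed starting point: {n}")
```

Whatever doubling period you pick, multiply it by the number of doublings to get a time estimate. Even with a very aggressive pace, a couple of hundred doublings is a long way off, which fits the point that the sky is not falling.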

Emerging Technologies

Saturday, November 5, 2011
