Monthly Archives: February 2012

What is Big Data? Another sterile BI debate?

There is a lot of talk about “Big Data” at the moment, but, as always, it is difficult to really get to the bottom of what it is all about. This was demonstrated on Twitter this morning during a conversation between Steve Lucas and Timo Elliott from SAP.

Steve Lucas: Wondering if people agree on the definition of the term “big data”. If anyone is willing can you direct message me your definition?

Timo Elliott: @nstevenlucas reality is that #bigdata term was created to talk about #Hadoop, etc. But everybody has interpreted largely, no “true” def.

In my mind, the issue is that the term “Big Data” is most widely used (and most abused) in the marketing of products and services. So, as Timo hints, every “definition” is slanted towards the marketing whims of a particular vendor. A good example is the recent press release from Rearden Commerce, which describes its analysis of punctuality data from 67M US airline flights as “Big Data”.

I am not sure that this data really counts as “Big Data”. I have not thought about this deeply, but I am fairly sure that with a copy of SQL Server Express and a moderately powered laptop you could reach all the same headline conclusions highlighted in the press release (e.g. this is historically the worst day to travel, this day is not as bad as you would expect, …) in a matter of hours or days.

That is not to say that this analysis is not useful. Significant insight can come from analysing small amounts of data. My point is that labelling this as “Big Data” is probably not that helpful in understanding what the term means.

In fact, with “Big Data”, I think we have quickly reached the point we often reach with technology definitions (as Timo and I have discussed here and here), where the term does more harm than good, because the sterile debate over what it means wastes time and distracts from what really matters.

As always, the key is to approach things from a business angle rather than a technology one. All you need to be aware of is that there are new, maturing technologies that allow you to analyse unprecedented volumes of data. If you have a lot of raw data and a business problem you think it could help solve, then these technologies might be for you. But ALWAYS work down from the business requirements, and NEVER ask (or worry about) whether a particular problem is an example of “Big Data” or not.