Creating Value from Big, Messy Data
Information is power, so the saying goes. Decades into the digital age, a transition that enabled the collection and creation of data on a previously unknown scale, we have enormous data sets. These sets are growing at a monstrous pace even by digital-age standards, fed by social networks, blogs and self-publishing, and the data collected by mobile phones. For the first time, we can also cheaply (“bootstrapped startup” levels of cheap) use technology to search these datasets and find patterns in them, opening opportunities to create new markets by disrupting existing ones. An article from the FT, excerpted below, provides an overview of these issues. I plan to dive much deeper into this, as it has personal relevance for my current project.
If information is power, then harnessing this flood of newly available information offers opportunities for unprecedented value creation, disruption, and redistribution.
Excerpt below:
“While “big data” has become the buzzword, a better description would be “messy data”, says Roger Ehrenberg of IA Ventures, an early-stage investor. Harvesting, cleaning up and organising raw data in a way that it can be processed is a large part of the battle, he says.
This has been complicated further by the big growth in unstructured data – information, such as text, that is not organised in a way that a computer can easily process. With the volume of user-generated text and video growing rapidly, this has become one of the main focuses of technological development.
Chief among the new tools are natural language processing, which enables a computer to extract meaning from text, and machine learning, the feedback loops through which computers can test their conclusions on large amounts of data in order progressively to refine their results.
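To make the "feedback loop" idea concrete, here is a minimal sketch of my own (not from the article): a tiny perceptron-style text classifier that tests its predictions against labelled examples and adjusts its word weights every time it gets one wrong. The data and function names are invented for illustration.

```python
from collections import defaultdict

def train(examples, passes=10):
    """examples: list of (text, label) pairs, where label is +1 or -1."""
    weights = defaultdict(float)
    for _ in range(passes):
        for text, label in examples:
            words = text.lower().split()
            score = sum(weights[w] for w in words)
            prediction = 1 if score >= 0 else -1
            if prediction != label:      # the feedback step: learn from the mistake
                for w in words:
                    weights[w] += label  # nudge each word's weight toward the truth
    return weights

def predict(weights, text):
    score = sum(weights[w] for w in text.lower().split())
    return 1 if score >= 0 else -1

# Toy labelled data: +1 for positive reviews, -1 for negative ones.
examples = [
    ("great product love it", 1),
    ("terrible waste of money", -1),
    ("love the great design", 1),
    ("terrible product hate it", -1),
]
w = train(examples)
```

Real systems use far richer features and models, but the loop is the same: predict, compare against known answers, refine, repeat over large amounts of data.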
Subjecting large data sets to analysis has also been made easier by two of the forces that have reshaped information technology more widely: the spread of low-cost, standardised computer hardware and the emergence of open-source software.
This has created a cheap computing platform for new technologies such as Hadoop – a piece of software architecture that is designed to handle massive amounts of data. The idea was based on breakthroughs at Google, which needed to find ways to conduct large volumes of intensive web searches simultaneously. It has since been taken up by companies including Facebook and Yahoo.
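The Google breakthrough the article alludes to is the MapReduce model, which Hadoop implements. A single-machine sketch in plain Python shows the three phases; in real Hadoop, the map and reduce steps run in parallel across a cluster, which is what makes "massive amounts of data" tractable.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each document, independently."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values into a final result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "messy data needs cleaning"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] == 2, counts["data"] == 2
```

Because each map call touches only one document and each reduce call only one key, the work splits cleanly across the cheap, standardised hardware the article describes.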
The rise of cloud computing – which centralises storage and processing power in larger data centres – has also brought big data within the reach of more companies. By tapping into the cloud computing services offered by Amazon, say, a company such as Color can get instant access to all the analytical power it needs without needing to take on the fixed costs of buying its own servers, says D.J. Patil, chief product officer at the IT start-up.
For business leaders, “the big skill in future will be to ask the right question”, says Tim O’Reilly, a technology commentator and publisher.
Besides smartphones, new sources of data include social networks, blogs and other sources of user-generated content; sensors collecting everything from traffic patterns to a user’s heart rhythm; and click streams generated by people spending an increasing amount of their lives online.
Much of the information is in unstructured form. It has never been collated in a traditional relational database, where it could be queried at will. Without techniques to harvest, verify and analyse it – often in real time – valuable commercial signals are lost in the noise.
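What "harvest and verify" might look like at the smallest possible scale: a toy sketch (my own illustration, not from the article) that turns free-form review snippets into structured, queryable records by extracting a price and a product name with a regular expression, and skips lines that don't match rather than letting noise pollute the results.

```python
import re

# Invented pattern for illustration: "paid $<price> for a/an/the <item>"
PATTERN = re.compile(r"paid \$(?P<price>\d+(?:\.\d{2})?) for (?:a|an|the) (?P<item>\w+)")

def harvest(lines):
    records = []
    for line in lines:
        match = PATTERN.search(line.lower())
        if match:  # verify: keep only lines that yield a complete record
            records.append({"item": match.group("item"),
                            "price": float(match.group("price"))})
    return records

lines = [
    "I paid $40 for a blender and it broke in a week!!",
    "loved the service, will come back",
    "we paid $12.50 for the sandwich combo",
]
rows = harvest(lines)
# rows == [{"item": "blender", "price": 40.0}, {"item": "sandwich", "price": 12.5}]
```

Once the text is reduced to rows like these, it can finally live in the kind of relational store the article mentions and be queried at will; the hard part, as Ehrenberg says, is everything before that point.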
It sometimes takes the analysis of massive data sets to detect useful patterns, says Michael Olson. His California start-up, Cloudera, is commercialising the type of technology used by companies such as Facebook and Yahoo to crunch through vast bodies of information. Retailers, for instance, might learn far more from the 10 years’ worth of customer data they can now analyse in one go than from the more limited runs to which they were once restricted, he says.”