Is the secret to (big) data success collect once and use many?

Many moons ago, I used to write code, badly. I learnt to program in COBOL, PL/I and JCL (Job Control Language). I then moved on to Java, which promised portability, reuse and “write once, run anywhere”. I’ve spent the last few weeks talking to a lot of Splunk customers, and it struck me that the companies having the most success, and making the best case for the value of big data, are the ones using the same data for multiple purposes. It got me wondering whether there is something in that old Java promise that we can learn from with big data. Is the secret to big data success “collect once and use many”?


I was at the Gartner IT Operations Management event in Berlin this week, and we were lucky to have a customer, Datev, present. They spoke about how they are capturing, indexing and storing 400GB a day and plan to be at a terabyte by the end of the year. One highlight of their presentation was the multiple uses they get from the data: IT incident investigation, improving customer service and, potentially, security in the near future. We’ve seen this a lot with other customers all over the world.

If “collect once and use many” is the secret to getting value from big data, then it raises some interesting things to consider:


Avoiding silos of big data

If you are healthily sceptical and it seems like everything is becoming a big data problem, how do we make sure we don’t create a big data silo for security, another for IT ops, another for customer information, and so on? All we’d have done is replicate the old problem of a different database for each department, only at a much larger scale, and suddenly we’d need a big data MDM solution. I haven’t decided if I like the term “data lake” yet – it implies something a bit out of control; “data reservoir” is a term I’ve heard that implies something a little more managed. From what I saw from Coca-Cola at last year’s .conf, there is real benefit in having all your data in one place, removing the silos and allowing the data to be put to many different uses. There is also a change in perception that needs to accompany this. If you can get all your data in one place, it becomes your multi-faceted system of record, and the right platform acts as a kind of prism, letting you use the same data for many purposes.



Getting value for many use cases from the same data

So if we’re trying to create a data lake or reservoir, we need to understand how to get value from it for a wide range of different things. The demands that different use cases place on data can vary a lot, but much is common. As an example, using data for security, IT ops and customer experience shares some common requirements: searching through data, spotting patterns, getting notified, and creating data visualizations and analytics. The “business outcome” for each use case might be very different, but the techniques and methodology are similar. The best analogy I can come up with is Thunderbird 2 and its “pods”. Sometimes you’re going to need “The Mole” for drilling, sometimes Thunderbird 4 for underwater rescues, sometimes “Firefly” for dealing with towering infernos. If your data and a decent data platform are Thunderbird 2, then sometimes you need a different “pod” depending on the situation you face.

Giving many different people access

They say one of the keys to a good presentation is knowing your audience, and the hardest presentations are the ones with a mixed audience to cater for. It is the same with your data. If the goal is to collect once and use many, then giving the right information, and the right way of engaging with the data, to a wide range of people is hard but very important. Some of this is the technical way you make data available: APIs for developers, a search language or query tool for ops teams to investigate with, or self-service analytics for a business user. Think of giving your data a Babelfish so everyone can understand it, whatever “language” they speak. Another consideration is making the case for why data should be shared inside an organisation. Even if sharing is technically possible, the communication and explanation of why people should share data is often overlooked and so becomes an obstacle. One of the best ways of overcoming this is to explain the value everyone gets back by sharing the data, and technology is often a great enabler of spreading that value-added data across an organisation. Hopefully sharing your data, and giving everyone the ability to understand it, isn’t too painful.

From what we’ve seen at Splunk, one of the keys to big data success is finding the value in the data by “collecting once and using many”. Think of Thunderbird 2, a prism and a Babelfish and you can’t go too wrong…


(Note: this post originally appeared on