Analytics Staffing for Big Data: A Perspective

A couple of weeks ago, we talked about the need to invest appropriately in people when you invest in technology. I wanted to continue the discussion and focus on the new area of “Big Data”, and more specifically on the analysts who work on big data: the “Data Scientist” and the data analyst.

I love the term “data scientist”. It has finally made the data junkie’s job title more glamorous. It has given both name and fame to the role. Well, everyone is talking about “big data”. Many organizations think hiring a data scientist is a requirement for solving all “big data” problems, and that the only analysts a big data problem requires are data scientists. If you have invested in…

Forecasting Cloud Analytics

I am looking forward to being on a panel at the upcoming Cloud Analytics Conference on April 25 to represent Splunk and the opportunity of mining big data for the enterprise. I will be contrasting Business Intelligence with Operational Intelligence.

During my career I’ve been around for the dramatic growth of the market for BI tools and now BI services. In the beginning of the BI era, large capital projects were necessary to deliver the needed functionality, as the BI industry was still maturing, and it would be some time before these processes were streamlined and the data democratized. At this point, in the new millennium, the majority of CIOs I know embrace BI solutions that…

Some BIG DATA this way comes… (or is already here)

Some time ago, in a company not too far away, I woke up with unstructured data on my mind, thinking about ways to correlate real-time clickstream information with my existing customer base. The internal IT folks said that this was not possible, at least in a timeframe that would help my decision-making. This merely increased my fascination with people interacting with all these layers of technology in unpredictable ways, defying the cry of the old-school database administrator: “that which exists must fit into my predefined schema!” The marketer in me just wanted to make sense of it all, but I was at a loss as to how to get around that schema requirement and let my data crayon wander outside the…

Hong Kong Chief Executive Election 香港行政長官選舉

An election will be held on 25 March 2012 to select the Chief Executive of Hong Kong. Three nominees, Albert Ho (何俊仁), Henry Tang (唐英年) and Leung Chun Ying (梁振英), are competing to become the next Chief Executive of Hong Kong.

Online, there is also a large amount of discussion on social networks such as Facebook, Twitter and Weibo. We can use Splunk to do some interesting analytics: 1) calculating which nominee has the most tweets and retweets; 2) analyzing how all tweets are distributed across the 24 hours of the day; 3) the top 20 unique tweets; 4) the top retweeters; 5) the top topics.
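
As a rough illustration of three of these analyses (tweets per nominee, hourly distribution, and top retweeters), here is a minimal Python sketch. It is not the Splunk search used for the post. The record fields (`user`, `text`, `created_at`, `retweet_of`), the sample tweets, and the function names are all hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; a real Twitter or Weibo feed has many more fields.
TWEETS = [
    {"user": "alice", "text": "Henry Tang speaks at forum",
     "created_at": "2012-03-20T09:15:00", "retweet_of": None},
    {"user": "bob", "text": "RT 梁振英 leads the poll",
     "created_at": "2012-03-20T09:40:00", "retweet_of": 10},
    {"user": "carol", "text": "Albert Ho and Henry Tang debate tonight",
     "created_at": "2012-03-20T21:05:00", "retweet_of": None},
    {"user": "bob", "text": "RT Albert Ho and Henry Tang debate tonight",
     "created_at": "2012-03-20T21:30:00", "retweet_of": 3},
    {"user": "dave", "text": "RT Leung Chun Ying leads the poll",
     "created_at": "2012-03-20T22:00:00", "retweet_of": 10},
]

# Each nominee is matched by English or Chinese name (assumed matching rule).
NOMINEES = {
    "Albert Ho": ["Albert Ho", "何俊仁"],
    "Henry Tang": ["Henry Tang", "唐英年"],
    "Leung Chun Ying": ["Leung Chun Ying", "梁振英"],
}


def mentions_per_nominee(tweets):
    """Count tweets and retweets whose text mentions each nominee."""
    counts = Counter()
    for tweet in tweets:
        text = tweet["text"].lower()
        for nominee, aliases in NOMINEES.items():
            if any(alias.lower() in text for alias in aliases):
                counts[nominee] += 1
    return counts


def hourly_distribution(tweets):
    """Count tweets by hour of day (0-23)."""
    return Counter(datetime.fromisoformat(t["created_at"]).hour for t in tweets)


def top_retweeters(tweets, n=20):
    """Return the n users who posted the most retweets."""
    retweets = Counter(t["user"] for t in tweets if t["retweet_of"] is not None)
    return retweets.most_common(n)


if __name__ == "__main__":
    print(mentions_per_nominee(TWEETS))
    print(sorted(hourly_distribution(TWEETS).items()))
    print(top_retweeters(TWEETS))
```

Simple substring matching is used here for brevity; a real analysis would also need to handle hashtags, nicknames, and spelling variants.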

You can drill down to see any interesting Tweet by just clicking the…

Big Data Thoughts…

It happens to me quite a bit that I hear a song and then it keeps playing in my head. My 4-year-old is notorious for singing the same song over and over, and then I find myself humming it during my long train ride to work.

Sometimes, it happens at work – you hear a thing and you keep hearing about the same thing in almost every conversation.  I am sure you have had those times too.  A number of you will have had days or weeks when you have had some discussion on “big data”.

For the last three weeks, I have had a number of conversations on the topic of big data. Strata,

Simple Splunking of HDFS Files

There’s something to be said about the power of command line interfaces. For simple things, they are simple. For complex things–well, maybe not so simple. Fortunately, I have a simple problem: I want to index a single file from a Hadoop Distributed File System, HDFS. To do this, I’ll use the CLI for both Splunk and Hadoop.

There are a few things we want to take into account when we index a file. Normally, indexing a log file in Splunk means creating an input to “monitor” that file. This enables you to not only index the file’s current contents, but also index subsequent appends. However, the contents of an HDFS are typically historical files, so in this case, I don’t…

Splunk and the Cybersecurity Act of 2012

“The United States confronts a dangerous combination of known and unknown vulnerabilities in the cyber domain, strong and rapidly expanding adversary capabilities, and limited threat and vulnerability awareness.”[1]

I recently listened to the final set of hearings on the Cybersecurity Act of 2012. The bill was developed, “…in response to the ever-increasing number of cyber attacks on both private companies and the United States government.” The bill is really about protecting critical infrastructure, whether it is managed, owned or operated by the government or the private sector. It’s a bipartisan bill and combines efforts from past sessions of the Senate Committees on Commerce, Homeland Security and Governmental Affairs, and Intelligence. The bill would empower the Department…

Big Data and Cross-Channel Data Integration

I have been an avid follower of eMarketer research articles for many years. eMarketer does a great job of presenting research and marketing trends for the evolving area of digital marketing. Its recent research on cross-channel data integration highlights some of the pervasive challenges. Many of us talk about the challenges of multi-channel or cross-channel work: the need for effective analysis, improving targeting, and the difficulty of bringing data from various sources together. Well, how hard it is depends on who you ask. Thanks to Acxiom and Digiday’s Nov 2011 survey and the analysis by eMarketer, the insights provide better visibility into how successful organizations are with cross-channel data integration…

Data, Best Used By…

To state the obvious, “Big Data” is big. The deluge of data has people talking about the volume of data, which is understandable, but not as much attention has been paid to how the value of data can age. Value is not just about volume; data can also be thought of as perishable.

Perishability

When we think about the perishability of data, we find all sorts of everyday examples around us. When we pick up a daily newspaper, the headlines catch our attention. The value is in the recentness of the data. That is why we call it “news.” A year-old newspaper, in comparison, is usually useful only for starting a fire or lining a bird cage. The…

Introducing Shep

These are exciting times at Splunk, and for Big Data. During the 2011 Hadoop World, we announced our initiative to combine Splunk and Hadoop in a new offering. The heart of this new offering is an open source component called Shep. Shep is what will enable seamless two-way data flow across the systems, as well as open up two-way compute operations across data residing in both systems.

Use Cases

The thing that intrigues us most is the synergy between Splunk and Hadoop. The ways to integrate are numerous, and as the field evolves and the project progresses, we can see more and more opportunities to provide powerful solutions to common problems.

Many of our customers are…