Hadoop’s rise to fame is based on a fundamental optimization principle in computer science: data locality. Translated into Hadoop speak, that means: move computation to the data, not the other way around.
In this post I will rant about one core Hadoop area where this principle is broken (or at least not implemented yet). But before that, I will outline the submission process of a MapReduce job that processes data residing in HDFS:
On the client:
1. gather all the correct confs, user input etc ...
2. contact the NameNode to get a list of files that need to be processed
3. generate a list of splits to run Map tasks on, by:
3.1 for each file returned in
The Story of Buttercup, the Splunk Pwny
You may have noticed that we’re quite fond of ponies here at Splunk. Many have asked what the connection is, so I sent around the story below a while back. Enough people keep asking that we decided to share with a wider audience… Enjoy:
Back around the middle of 2006, engineering already had a large backlog of fixes that needed to be made to the codebase – removing the use of various open source projects, writing our own libraries that would run on more platforms, etc. It was well understood that some of these projects would be pretty nightmarish – someone would have to be dedicated to them full time…
Splunk Joins Public-Private Partnership to Improve Cybersecurity
Last week Splunk joined several other companies at U.S. NIST’s signing ceremony symbolizing our participation and partnership in the National Cybersecurity Center of Excellence (NCCoE).
There’s no doubt that there is a critical need to protect private-sector intellectual property and other valuable business data from a growing number of cyber threats. This partnership illustrates our commitment to the spirit of collaboration while providing real-world cybersecurity capabilities that address business needs.
The NCCoE has three key goals:
- Provide practical cybersecurity – Help people secure their data and digital infrastructure by equipping them with practical ways to implement cost-effective, repeatable and scalable cybersecurity solutions.
- Increase rate of adoption – Enable companies to rapidly adopt commercially available cybersecurity technologies by reducing their total
The Splunk SDKs for Ruby and C# are now in Beta
The Splunk SDKs for Ruby and C# have reached Beta! Developers familiar with Ruby and C#/.NET can now easily leverage their existing skills to integrate data and functionality from Splunk with other applications across the enterprise, letting the entire organization get more value out of Splunk. Do you have an existing reporting app or customer support system that would benefit from being able to search and display data from Splunk? Want to build a .NET or Ruby app powered by Splunk data? Then these SDKs are for you. As Beta releases, these SDKs are now fully supported: customers with support contracts are covered for any questions about the Splunk SDKs for C# and Ruby.
- Download the
Modular Inputs Tools
And so it is with software. Languages, libraries, frameworks are just tools that make it easier for us to accomplish some task.
With the release of Splunk 5 came a great new feature called Modular Inputs.
Modular Inputs extend the Splunk framework to define a custom input capability. In many respects you can think of them as your old friend the “scripted input”, but elevated to first class citizen status in the Splunk Manager. Splunk treats your…
Enabling Splunk as a Windows Domain User with Group Policy
Many times, we develop Windows-based apps (for example, the Splunk App for Exchange or the Splunk App for Active Directory) without special privileges. We recommend installing the Universal Forwarder on the target system with system-level privileges, which has all the necessary rights we need. Sometimes, we come across situations where we need to install Splunk with domain privileges. If you have set up WMI-based remote audit log collection, then this applies to you. Recently, we found that some of the upcoming apps needed domain privileges, so we set about researching exactly how this could be accomplished through the application of group policy in an Active Directory server. We learned that, although the process is long-winded,…
A macro to give a human-readable time to each event, like “earlier today” or “last month”.
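The idea behind such a macro can be sketched outside of Splunk. The Python below is only an illustration, not the macro itself: the function name `humanize_event_time` and every label other than “earlier today” and “last month” are assumptions made for this example.

```python
from datetime import datetime


def humanize_event_time(event_time: datetime, now: datetime) -> str:
    """Return a coarse, human-readable label for event_time relative to now.

    Hypothetical helper: the bucket boundaries and most labels are assumptions.
    """
    if event_time > now:
        return "in the future"

    # Compare calendar days, not elapsed hours, so 23:00 yesterday is "yesterday".
    day_diff = (now.date() - event_time.date()).days
    if day_diff == 0:
        return "earlier today"
    if day_diff == 1:
        return "yesterday"

    # Count calendar months between the two timestamps.
    month_diff = (now.year - event_time.year) * 12 + (now.month - event_time.month)
    if month_diff == 0:
        return "earlier this month"
    if month_diff == 1:
        return "last month"

    if now.year == event_time.year:
        return "earlier this year"
    if now.year - event_time.year == 1:
        return "last year"
    return "more than a year ago"
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the labels reproducible, which makes the bucketing easy to check against known timestamps.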
Using Splunk for Computer Forensics
I was talking to one of our Sales Engineers, Bert Hayes, the other day about using Splunk for computer forensics. Bert was formerly a Splunk customer at a large university in the southern U.S., where he used Splunk for security. He really knows his stuff in this area. Anyhow, Bert mentioned to me how he used to use Splunk for computer forensics and pointed me to a great blog that he found helpful on the topic. I found the blog post to be a great read and wanted to share it.
The blog is courtesy of Klein & Co, experts in computer forensics. In the posting they detail how to use Splunk to build a computer forensic timeline for analysis. The…
Splunking Websphere MQ Queues and Topics
What is Websphere MQ
IBM Websphere MQ, formerly known as MQSeries, is IBM’s Message Oriented Middleware offering and has been the most widely implemented system for messaging across multiple platforms over the last couple of decades.
What is Message Oriented Middleware
From Wikipedia:
“Message-oriented middleware (MOM) is software or hardware infrastructure supporting sending and receiving messages between distributed systems. MOM allows application modules to be distributed over heterogeneous platforms and reduces the complexity of developing applications that span multiple operating systems and network protocols. The middleware creates a distributed communications layer that insulates the application developer from the details of the various operating system and network interfaces. APIs that extend across diverse platforms and networks are typically
Splunk with PowerShell? Yes, Please
Do you manage Windows servers? If the answer is yes, then the likelihood is that you utilize PowerShell in your daily operations. As many know, PowerShell is an extraordinarily powerful shell command language that Microsoft invented to manage their most complex server applications. Exchange, SharePoint, Lync, SQL Server and Active Directory can all be managed through PowerShell; and that’s just the start. The Splunk App for Exchange and the Splunk App for Active Directory both use this facility to get inventory and usage information from the depths of the systems.
But it isn’t easy. Scripted inputs are, well, expensive. Firstly, you have to wrap the PowerShell executable inside a CMD batch file. When it executes, you are…