Custom Message Handling and HEC Timestamps with the Kafka Modular Input
Custom Message Handling
If you are a follower of any of my Modular Inputs on Splunkbase , you may see that I employ a similar design pattern across all of my offerings. That being the ability to declaratively plug in your own parameterizable custom message handler to act upon the raw received data in some manner before it gets output to Splunk for indexing. This affords many benefits :
- Many of my Modular Inputs are very cross cutting in terms of the numerous potential types and formats of data they will encounter once they are let loose in the wild. I can’t think of every data scenario. An extensibility design allows the user and community to be able to customize the
Turbo charging Modular Inputs with the HEC (HTTP Event Collector) Input
HTTP Event Collector (HEC)
Splunk 6.3 introduces a new high performance data input option for developers to send event data directly to Splunk over HTTP(s). This is called the HTTP Event Collector (HEC).
In a nutshell , the key features of HEC are :
- Send data to Splunk via HTTP/HTTPS
- Token based authentication
- JSON payload grammar
- Acknowledgment of sent events
- Support for sending batches of events
- Keep alive connections
A typical use case for HEC would be a developer wanting to send application events to Splunk directly from their code in a manner that is highly performant and scalable and alleviates having to write to a file that is monitored by a Universal Forwarder.
But I have another use case …
Protocol Data Inputs
It must have been about a year ago now that I was talking with a Data Scientist at a Splunk Live event about some of the quite advanced use cases he was trying to achieve with Splunk. That conversation seeded some ideas in my mind , they fermented for a while as I toyed with designs , and over the last couple of months I’ve chipped away at creating a new Splunk App , Protocol Data Inputs (PDI).
So what is this all about ? Well to put it quite simply , it is a Modular Input for receiving data via a number of different protocols, with some pretty cool bells and whistles.
So let’s break down some of …
Decoding IIS Logs
Everyone (just about) knows that there is a table of status codes that HTTP/1.1 defines. However, IIS gives you two more status codes in the log files. The HTTP/1.1 status is stored in sc_status (and it is automagically decoded for you in Splunk 6). There is also an extended code called sc_substatus and a Win32 error code. How can you really decode these, especially since the sc_win32_status seems to have really large numbers?
Let’s start with the sc_status and sc_substatus codes. These are normally written together as a decimal number. So, for instance, 401.1 means an sc_status of 401 and an sc_substatus of 1. The sc_status codes follow a pattern: 1xx are informational, 2xx indicate success, 3xx indicate redirection, and …
Getting data from your REST APIs into Splunk
More and more products,services and platforms these days are exposing their data and functionality via RESTful APIs.
REST really has emerged over previous architectural approaches as the defacto standard for building and exposing web APIs to enable third partys to hook into your data and functionality. It is simple , lightweight , platform independent,language interoperable and re-uses HTTP constructs. All good gravy. And of course , Splunk has it’s own REST API also.
The Data Potential
I see a world of data out there available via REST that can be brought into Splunk, correlated and enriched against your existing data, or used for entirely new uses cases that you might conceive of once you see what is available and …
Indexing data into Splunk Remotely
Data can reside anywhere and Splunk recognizes that fact by providing the concept of forwarders. The Splunk Forwarder will collect data locally and send it to a central Splunk indexer which may reside in a remote location. One of the great advantages of this approach is that forwarders maintain an internal index for where they left off when sending data. If for some reason the Splunk Indexer has to be taken offline, the forwarder can resume its task after the indexer is brought back up. Another advantage to forwarders is that they can load balance delivery to multiple indexers. Even a Splunk Light Forwarder (a forwarder that consumes minimal CPU resources and network bandwidth) can participate in an auto …