What’s next? Next-level Splunk sysadmin tasks, part 1

splunktrust

(Hi all, welcome to the latest installment in the series of technical blog posts from members of the SplunkTrust, our Community MVP program. We’re very proud to have such a fantastic group of community MVPs, and are excited to see what you’ll do with what you learn from them over the coming months and years.
- Rachel Perkins, Sr. Director, Splunk Community)

Hi, I’m Mark Runals, Lead Security Engineer at The Ohio State University, and a member of the SplunkTrust.

While deployed to Bosnia years ago, I latched onto something I heard in a briefing. The speaker was loosely describing when particular roadmap items would take place, and said some things would be done Now, some Next, and some After Next. That fit the way I think to a tee.

In this three-part series, I’m going to talk about a few things Splunk administrators should do after data starts coming in; in other words, a few ‘Next’ activities. They are:

  1. Making sure the host field really contains the name of the server
  2. Making sure the local time of the server is set to the correct time
  3. Evaluating the inbound data for indexing latency

While you can get data in a variety of ways, I’m focusing here on data coming in from Splunk forwarders installed on servers. (Syslog-based data comes with its own set of fun challenges local to your environment and is beyond the scope of this post.)

I put checking host field values first because it is the start of making sure the data in your Splunk instance accurately reflects your environment. It’s a data integrity thing, really. The most frequent case I’ve come across where a forwarder is ingesting data but the host field value is wrong goes like this. A virtual server is built, and a Splunk forwarder is installed and started on it. The image is then copied multiple times. This is a problem because the forwarder checks the local system only once to get and set the host value, so every copy of the image keeps reporting the original server’s name.

With all of that as a backdrop, let’s tackle possible solutions, or at least the one we’ve come to use:

Windows

On Windows systems, we could leverage the ComputerName field, which appears in a number of events such as 4624. However, we want to be able to count on the data being there, at a cadence we control. To achieve this, we turned to wmic and use the following script:

@echo off
REM Dump OS details, including CSName (the computer name), as field=value lines for Splunk
wmic /node:"%COMPUTERNAME%" os get bootdevice, caption, csname, description, installdate, lastbootuptime, localdatetime, organization, registereduser, serialnumber, servicepackmajorversion, status, systemdrive, version /format:list

Getting data via wmic is quick and easy. We are bringing in more fields than we need, but the data is valuable in its own right, so we might as well. We bring this in only once a day and it is small, so we aren’t worried about license impact.

The /format:list option is handy because the output comes out in Splunk-friendly field=value format. Drop that into a .bat file and use an inputs.conf script stanza like this:

[script://.\bin\wmic_os.bat]
disabled = 0
## Run once per day
interval = 86400
sourcetype = Windows:OS
source = wmic_os
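To show why /format:list is Splunk-friendly, here is a minimal Python sketch that splits that kind of output into field/value pairs. This loosely mirrors what Splunk's automatic key=value extraction gives you at search time. The helper name `parse_wmic_list` and the sample text are hypothetical, and the sketch assumes the output has already been decoded to text. It also skips the blank lines and extra carriage returns wmic tends to emit.

```python
from typing import Dict


def parse_wmic_list(text: str) -> Dict[str, str]:
    """Turn `wmic ... /format:list` output into a field -> value dict."""
    fields: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # wmic pads its output with blank lines and extra carriage returns
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so values containing '=' stay intact
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


# Made-up sample in the shape of /format:list output
sample = "\r\r\nCSName=WEB01\r\r\nStatus=OK\r\r\nSystemDrive=C:\r\r\n\r\r\n"
print(parse_wmic_list(sample))  # {'CSName': 'WEB01', 'Status': 'OK', 'SystemDrive': 'C:'}
```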

Linux

For the Linux portion of this effort, we modified the script that generates the Unix:Version data (version.sh), which comes with the Linux TA. That script uses just about every uname switch except -n. We simply added uname -n, an easy modification, and called the resulting field ‘hostname’.
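The change itself is one extra uname -n call in version.sh. As a rough Python equivalent, the same field could be produced as shown below. This assumes a system Python is available on the host and a key=value output line; the TA script's own output layout may differ. `hostname_field` is a hypothetical helper, not part of the Linux TA.

```python
import platform


def hostname_field() -> str:
    """Emit the node name the way `uname -n` reports it, as hostname=<name>."""
    # On Linux, platform.node() returns the same network node name as `uname -n`;
    # it returns an empty string if the name cannot be determined.
    return f"hostname={platform.node()}"


if __name__ == "__main__":
    print(hostname_field())
```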

Bringing the data together

The first portion of the query brings the data together and normalizes the key fields. You could (and probably should) adjust either the data generation components or the knowledge objects so that the data and query conform more closely to the CIM. However, there is no telling whether you are using the CIM, so I’ll show you this method =)

At any rate, that query might look like this:

sourcetype=windows:os OR sourcetype=unix:version
| eval host_name = lower(coalesce(CSName, hostname))
| where isnotnull(host_name)
| eval host = lower(host)
| eval host_matches = if(match(host,host_name), "true", "false")
| where host_matches = "false"
| rex field=host "(?<first_name1>[^\.]+)"
| rex field=host_name "(?<first_name2>[^\.]+)"
| where first_name1!=first_name2
| eval os_type = case(isnotnull(CSName), "Windows", isnotnull(hostname), "Linux", 1=1, "fixme")
| table index host host_name os_type
| rename host AS "Reporting in Splunk As" host_name AS "OS Logged Host As" os_type AS "Server Type"

Once we’ve brought both sourcetypes together, the rex commands extract the short name (everything before the first dot) from each field. That lets us still compare the two when one of them is in fully qualified form. After that, we create a field showing whether the system is Windows- or Linux-based. Depending on your environment, this can help shape the conversation with whichever teams you need to reach out to.
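To make the filtering steps of that search easier to follow, here is a Python sketch of the same per-event logic. The function names are hypothetical. Each event is reduced to its host value plus whichever of CSName (Windows) or hostname (Linux) it carries. Like the SPL match() function, the sketch treats the OS-reported name as an unanchored regular expression, which is fine for ordinary hostnames.

```python
import re
from typing import Dict, Optional


def short_name(name: str) -> Optional[str]:
    """First run of non-dot characters, like the rex "[^.]+" extraction."""
    found = re.search(r"[^.]+", name)
    return found.group(0) if found else None


def check_host(host: str, csname: Optional[str],
               hostname: Optional[str]) -> Optional[Dict[str, str]]:
    """Return a report row if the Splunk host disagrees with the OS-reported name."""
    # coalesce(CSName, hostname): Windows events carry CSName, Linux events hostname
    reported = csname if csname is not None else hostname
    if reported is None:
        return None  # where isnotnull(host_name)
    reported = reported.lower()
    host = host.lower()
    # match(host, host_name): the OS name is used as an unanchored regex
    if re.search(reported, host):
        return None
    # Compare short names so FQDN vs. short-name differences are not flagged;
    # a missing short name drops the event, as the SPL where would
    first1, first2 = short_name(host), short_name(reported)
    if first1 is None or first2 is None or first1 == first2:
        return None
    os_type = "Windows" if csname is not None else "Linux"
    return {
        "Reporting in Splunk As": host,
        "OS Logged Host As": reported,
        "Server Type": os_type,
    }


# A clone of a Windows golden image still reporting the image's name
print(check_host("GOLDIMAGE", "WEB07", None))
```

One side effect of the unanchored match is worth knowing. A host of web010 counts as matching an OS-reported name of WEB01, so near-identical names like that will not show up in the report.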

With this data in hand, it’s now just a matter of reviewing it and talking to whoever can make the change on the forwarder. There are several ways to make the change. We generally just ask the server admin to adjust the host line in $SPLUNK_HOME/etc/system/local/inputs.conf and restart the forwarder.
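Before and after asking for that change, it can help to confirm what the forwarder is actually configured with. The minimal Python sketch below reads the host line from the [default] stanza of inputs.conf and compares its short name with the machine's own hostname. It assumes a system Python on the host and an inputs.conf that parses as INI. The function names are hypothetical, and /opt/splunkforwarder is only the usual Linux install location.

```python
import configparser
import os
import socket
from typing import Optional


def configured_host(inputs_conf_text: str) -> Optional[str]:
    """Return `host` from the [default] stanza of an inputs.conf, if set."""
    # Assumption: the file parses as INI. Splunk .conf files are close to INI,
    # but settings placed before any [stanza] would make configparser fail.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.read_string(inputs_conf_text)
    # "[default]" is an ordinary section here; configparser's own default
    # section is the upper-case "DEFAULT".
    return parser.get("default", "host", fallback=None)


def host_is_stale(inputs_conf_text: str, actual_hostname: str) -> bool:
    """True if the configured host's short name differs from the machine's."""
    host = configured_host(inputs_conf_text)
    if host is None:
        return False
    return host.split(".")[0].lower() != actual_hostname.split(".")[0].lower()


if __name__ == "__main__":
    # Assumed location; SPLUNK_HOME may not be set in an ordinary shell
    splunk_home = os.environ.get("SPLUNK_HOME", "/opt/splunkforwarder")
    path = os.path.join(splunk_home, "etc", "system", "local", "inputs.conf")
    if os.path.exists(path):
        with open(path) as handle:
            stale = host_is_stale(handle.read(), socket.gethostname())
        print("host line looks stale" if stale else "host line matches")
```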

Are you doing anything similar to this? If so, let’s hear what it is in the comments, so that we can build up several options for other Splunk admins out there.

See you next week for Part 2: http://blogs.splunk.com/2016/02/16/whats-next-next-level-splunk-sysadmin-tasks-part-2/

I use a simple search on _internal to find universal forwarders on systems that have been provisioned by cloning existing production VMs.

index=_internal sourcetype=splunkd component=Metrics sourceHost="*" hostname="*" | stats dc(sourceHost) AS duplicates by hostname | search duplicates>1

This doesn’t provide the actual name of the running host, but only requires a working forwarder to implement.
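The core of that search is a distinct count of connecting source hosts per reported hostname. Outside Splunk, the same aggregation can be sketched in Python. The function name and the sample (hostname, sourceHost) pairs are made up. In practice, the pairs would come from the Metrics events the search reads.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def duplicate_hostnames(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct sourceHost values per hostname; keep those above one.

    Mirrors: stats dc(sourceHost) AS duplicates by hostname | search duplicates>1
    """
    sources: Dict[str, Set[str]] = defaultdict(set)
    for hostname, source_host in events:
        sources[hostname].add(source_host)
    return {name: len(hosts) for name, hosts in sources.items() if len(hosts) > 1}


# Made-up (hostname, sourceHost) pairs standing in for metrics events
pairs = [
    ("goldimage", "10.1.1.20"),
    ("goldimage", "10.1.1.21"),
    ("goldimage", "10.1.1.21"),
    ("web01", "10.1.1.30"),
]
print(duplicate_hostnames(pairs))  # {'goldimage': 2}
```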

Jason Spears
February 12, 2016

Thanks for sharing that little trick there, Mark. I’ve seen this problem in Citrix environments where the server admins fail to run “splunk clone-prep-clear-config” on their golden image. Since I’m such a cowboy and don’t have time to wait eight weeks for some admin to change local/inputs.conf, I usually just deploy an app to the forwarder that deletes or modifies that file. Five minutes later, I undeploy it.

February 12, 2016
