Developing Correlation Searches Using Guided Search

Guided Search was released in Splunk Enterprise Security 3.1, nearly two years ago, but it is often overlooked. In reality, it is an excellent tool for streamlining the development of correlation searches. The goal of this post is to show how you can use it to build correlation searches beyond those that ship with Enterprise Security, so that you can meet your own security requirements.

So what is Guided Search?

It’s a “wizard”-like process to gather the key attributes that make up a correlation search. Essentially, there are five elements to Guided Search:

  • Identify the data set to search
  • Apply a time boundary
  • Filter the data set (optional)
  • Apply statistics (optional)
  • Establish thresholds (optional)

Along the way, Guided Search provides search syntax to validate the filters and thresholds to ensure the output meets your needs.

How does this work?

Let’s take the following example:

We have recently deployed an intrusion detection system (IDS) and we would like to tune its signatures so that the IDS is not overly “chatty” as it pertains to a set of systems within a specific part of our network. Our analysts need to be notified when more than 20 IDS events are triggered by the same signature against any host in that network range within an hour.

Splunk uses data models to create search-time mappings of datasets to specific security domains. A data model does not group events by vendor or by network but by the type of event. Examples of data models provided in the Splunk Common Information Model (CIM) include authentication, malware, intrusion detection, and network traffic.

Because we want to identify IDS events, we’ll choose the Intrusion_Detection data model and its IDS_Attacks object. Click Next.

We will need to define a time range for the correlation search to run against. This range will bound the events. Because we are focusing on the previous hour, we select Last 60 minutes.

Now that we have a data set and a time frame for our data, we can use Guided Search to filter our search results. If we don’t need all of the IDS data, then why search all of it? Because we are focused on a specific network range, we could use the LIKE operator and the % wildcard to match any destination IP address that starts with 192.168.1, like this: dest_ip LIKE "192.168.1.%". Note that the filter is implemented with Splunk’s where command, so the asterisk (*) wildcard character that you may be familiar with won’t work here. Alternatively, we could use CIDR notation in this way: cidrmatch("192.168.1.0/24", dest_ip).
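
To make the two filter styles concrete outside of Splunk, here is a minimal Python sketch of what each one tests. The like and cidrmatch functions below are stand-ins written for this post, not Splunk code, and the sample addresses are hypothetical.

```python
import ipaddress
import re


def like(value: str, pattern: str) -> bool:
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value) is not None


def cidrmatch(cidr: str, ip: str) -> bool:
    """True when ip falls inside the network described by cidr."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)


# Hypothetical destination addresses, not taken from any real data set.
sample_ips = ["192.168.1.7", "192.168.1.254", "192.168.10.7", "10.0.0.5"]

for ip in sample_ips:
    # Print the address, the LIKE result, and the CIDR result side by side.
    print(ip, like(ip, "192.168.1.%"), cidrmatch("192.168.1.0/24", ip))
```

The two filters agree on these addresses because a /24 boundary lines up with the third dot in the address. LIKE is a text comparison while the CIDR check is numeric, so for ranges that do not fall on an octet boundary the CIDR form is the more natural choice.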

At this point, our search is created and the syntax is displayed. We could click Run search to test our search to ensure that it is collecting the data we are expecting. Once we are satisfied with our data set, we can proceed to applying statistics to the data. Click Next.

While we can create multiple aggregates for our search, we only need a single aggregate and that is a count. Click Add a new aggregate.

The Function drop-down has a number of arithmetic functions including count, distinct count, average, standard deviation and more. These functions can be applied to any of the fields that are available in the Attribute drop-down. Because we want a count of events, we are going to count the values in the _raw field and then assign an alias of count to the statistic.

Click Next. If we needed additional aggregates, we would add them now. Click Next.

In most cases, we want to use Split-By to select fields to group our data. If we asked for a count as our aggregate but did not apply at least one field in the Split-By, we would end up with a single number representing the total count of IDS alerts during the past 60 minutes, which would make the search fairly useless to an analyst. In our case, we want to know the count of alerts by dest (destination) and by signature, so our Split-By will be both of those fields. Click Next.
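
The difference between counting with and without a Split-By can be sketched in a few lines of Python. This only illustrates the grouping idea and is not what Splunk runs; the events, hosts, and signature names are made up.

```python
from collections import Counter

# Hypothetical IDS events; hosts and signature names are made up for illustration.
events = [
    {"dest": "192.168.1.10", "signature": "sig_a"},
    {"dest": "192.168.1.10", "signature": "sig_a"},
    {"dest": "192.168.1.10", "signature": "sig_b"},
    {"dest": "192.168.1.20", "signature": "sig_a"},
]

# No Split-By: a single number for the whole time range.
total = len(events)

# Split-By dest and signature: one count per (dest, signature) pair.
by_dest_and_signature = Counter((e["dest"], e["signature"]) for e in events)

print(total)
for (dest, signature), count in sorted(by_dest_and_signature.items()):
    print(dest, signature, count)
```

The single total says nothing about which host or signature is noisy, while the grouped counts give an analyst one row per host and signature to act on.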

We can create aliases for these fields if we want. If we do not create an alias, it will default to the field name, in this case dest and signature.

The last step before we are finished is to determine a threshold for the search. If you recall, our requirement was to trigger this correlation search when a host saw the same signature more than 20 times in one hour, so that our analysts could review and tune the signature. Based on that, we will set the threshold on our count Attribute by selecting the Greater than Operation and a Value of 20. Click Next.
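
Putting the pieces together, the logic of the finished search (filter, count by dest and signature, then apply the threshold) can be sketched in Python as follows. This is an illustration of the logic under stated assumptions, not the SPL that Guided Search generates: the noisy_signatures function, the host names, and the events are all hypothetical, and the events are assumed to be already limited to the last 60 minutes.

```python
import ipaddress
from collections import Counter


def noisy_signatures(events, cidr="192.168.1.0/24", threshold=20):
    """Return (dest, signature) pairs seen more than `threshold` times inside `cidr`."""
    network = ipaddress.ip_network(cidr)
    # Filter step: keep only events whose destination IP is in the network range.
    counts = Counter(
        (e["dest"], e["signature"])
        for e in events
        if ipaddress.ip_address(e["dest_ip"]) in network
    )
    # Threshold step: "Greater than" is strict, so exactly 20 does not qualify.
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical events, assumed to be already bounded to the last 60 minutes.
events = (
    [{"dest": "web01", "dest_ip": "192.168.1.10", "signature": "sig_a"}] * 21
    + [{"dest": "web02", "dest_ip": "192.168.1.20", "signature": "sig_a"}] * 20
    + [{"dest": "db01", "dest_ip": "10.0.0.5", "signature": "sig_a"}] * 30
)

print(noisy_signatures(events))
```

With this sample data only web01 qualifies: web02 sits exactly at 20, which is not greater than the threshold, and db01 is outside the network range even though it has the most events.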

At this point our search syntax is available for review. We can click Run search to review the output and to ensure the results are what we would expect.

Once we are happy with the results, we can click Save to write the generated search processing language (SPL) into the Correlation Search. From there, we can configure the rest of our correlation search, but we will save that for another post!

John Stoner
Federal Security Strategist
Splunk Inc.