Enriching threat feeds with WHOIS information

It’s been almost 2 years since I spent a summer in Seattle interning with the Splunk Security Practice (SecPrax) team. Damn, time flies! The Splunk security community is growing every day, thanks to the unbelievable amount of flexibility, visibility, and insight Splunk Enterprise offers for all data, and as I have learned, all data is security relevant. Now that I am back at Splunk to work with the Security Research team, this is my first blog post. I would like to hear what you have to say about it, so please leave feedback or a comment.

What am I missing while doing threat intelligence?

While researching threat intelligence data sets to ingest into Splunk, I realized there can be an operational gap between the attributes offered by threat feeds (which are boring lists of publicly known bad IPs, domains, etc.) and how I can effectively leverage those lists to improve my security.

The questions to answer are: Do we have any additional context regarding a bad IP address? Do we care if the owner of a bad domain is sending us emails? Do we know if those malicious domains are registered with our company’s details? How relevant is the information from the threat feed while performing an incident investigation? The answers to these questions are what is missing while doing threat intel. A good way to start doing threat intelligence efficiently is to enrich the list of raw IP addresses with the required external context, such as…

  1. Domain/IP to URL: to check what each bad IP resolves to.
  2. Passive DNS and WHOIS information, the AS routes, and the score of its BGP routing.
  3. Reputation score, IP address resolution, and website category on VirusTotal.
  4. The malware campaign a bad IP is associated with, other associated IOCs, etc.

…and to integrate them seamlessly into the detection process, not only to make the threat intel more actionable, but also to assist in triaging and investigating incidents. So my aim is to generate and enrich threat intel data that is essentially of Splunk, by Splunk and for Splunk-ing. Did I just alter the quote by Mr. Lincoln?

I will walk through the process I learned to generate your own threat data set with WHOIS information, and provide you with use cases on how to best leverage this custom command in your Splunk instance.

So, how do I generate context around an IP address in Splunk?

  1. Enrich the IP address with WHOIS information
  2. In Splunk, you are only limited by your creativity. Use other sources like VirusTotal, Passive DNS, IOC Bucket, etc to gather context and enrich your threat data.

Step 1: Create an app skeleton for custom search commands
(Download the code from the Git repository: mysplunk_csc)

Refer to the blog if you are new to custom search commands. Alternatively, you can copy and paste the generateblocklist_app into the $SPLUNK_HOME/etc/apps directory. As with any Splunk app, there is a specific file layout and some configuration files that are required. Use the searchcommands_template in the Splunk SDK for Python, which can be found at:

/splunk-sdk-python-1.5.0/examples/searchcommands_template
  • Create a folder generateblocklist_app in $SPLUNK_HOME/etc/apps/ and copy the contents of searchcommands_template to the new folder.
  • In the bin folder of the generateblocklist_app, delete filter.py, report.py and stream.py, as they are not going to be used in our application.
  • Rename generate.py to generateblocklist.py
  • Copy the /splunk-sdk-python/splunklib folder into the $SPLUNK_HOME/etc/apps/generateblocklist_app/bin folder
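If you would rather script the bullets above than do them by hand, here is a minimal sketch. The function name create_app_skeleton and its arguments are my own invention for illustration; you pass in wherever the SDK template, the splunklib folder and your apps directory actually live on your machine.

```python
import shutil
from pathlib import Path

# Template scripts that this app does not use.
UNUSED_SCRIPTS = ("filter.py", "report.py", "stream.py")


def create_app_skeleton(template_dir, splunklib_dir, apps_dir,
                        app_name="generateblocklist_app"):
    """Copy the SDK template into apps_dir/app_name and tidy up its bin folder.

    Hypothetical helper: it only automates the manual steps listed above.
    """
    app_dir = Path(apps_dir) / app_name
    # copytree creates app_dir itself, so it must not exist yet.
    shutil.copytree(template_dir, app_dir)

    bin_dir = app_dir / "bin"
    for name in UNUSED_SCRIPTS:
        # missing_ok (Python 3.8+) tolerates templates that lack a script.
        (bin_dir / name).unlink(missing_ok=True)

    # The generating command lives in generateblocklist.py.
    (bin_dir / "generate.py").rename(bin_dir / "generateblocklist.py")

    # Bundle the SDK library next to the command script.
    shutil.copytree(splunklib_dir, bin_dir / "splunklib")
    return app_dir
```

Run it once against a fresh apps directory, since it refuses to overwrite an existing generateblocklist_app folder.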

Step 2: Search and Replace in generateblocklist_app.

Edit bin/generateblocklist.py, and app.conf, commands.conf and logging.conf in the default folder.

  • Replace each instance of %(command.title()) with GenerateBlocklist in bin/generateblocklist.py
  • Replace each instance of %(command.lower()) with generateblocklist as this is the name of the command.
  • Replace each of the remaining %(…) values in app.conf with the appropriate information based on the name. The specific values in this case don’t really matter, but you must put something.
  • Create a file collections.conf in the generateblocklist_app/default directory to point the application to the KV store collection in use. In our case the name is kvwhois:
[kvwhois]
  • Create a file transforms.conf in generateblocklist_app/default to enable a lookup named emergingthreats, and include the following configuration:
[emergingthreats]
external_type = kvstore
collection = kvwhois
fields_list = _key, _user, asn_registry, asn_country_code, nets, raw, asn_cidr, raw_referral, asn_date, query, referral

By this step, your generateblocklist_app should have the file structure set up and the configuration required to generate data.
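The search and replace in Step 2 can also be scripted. The sketch below is only an illustration: fill_placeholders is a made-up helper, and the sample line passed to it is invented for the demo rather than copied from the real template.

```python
def fill_placeholders(text, class_name="GenerateBlocklist"):
    """Replace the two command placeholders described in Step 2.

    Hypothetical helper: the class-style name replaces %(command.title())
    and its lowercase form replaces %(command.lower()).
    """
    replacements = {
        "%(command.title())": class_name,
        "%(command.lower())": class_name.lower(),
    }
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    return text


# Invented sample line, used only to show the substitution.
sample = "class %(command.title())Command: handles | %(command.lower())"
print(fill_placeholders(sample))
```

The remaining %(…) values in app.conf are left alone on purpose, since those depend on what you want to call your app.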

Step 3: Python to fetch raw threat feeds and enrich them

 Edit the generateblocklist.py to include two main code snippets:

  • Specify parameters for the search command
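from splunklib.searchcommands import Configuration, GeneratingCommand, Option, validators

@Configuration()
class GenerateBlocklistCommand(GeneratingCommand):
    # Optional arguments accepted by the | generateblocklist search command
    url = Option(require=False)
    delete = Option(require=False, validate=validators.Boolean())
    whois = Option(require=False, validate=validators.Boolean())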
  • Implement the following functions, with the logic to generate the threat feeds. I have done the following:
    – generate(): This function must exist in the GenerateBlocklistCommand class. It initializes and validates the parameters and gets a raw list of bad IPs (the raw threat feed).
    – add_kvstore(): Given the appropriate parameters, it queries the WHOIS server for each IP address in the raw threat feed to enrich the threat list, and stores the results in the Splunk KV store.
  • One great thing about Splunk is the ability to add a plethora of third-party libraries to supplement the development of custom solutions. Copy and paste the library dependencies into $SPLUNK_HOME/etc/apps/generateblocklist_app/bin.
    Note: Alternatively, you can copy and paste the generateblocklist_app into the $SPLUNK_HOME/etc/apps directory. Refer to the Readme file in the app folder to see the command manual. You can download the code from my Git repository: mysplunk_csc
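To make the two functions above a bit more concrete, here is a rough sketch of the kind of logic they need. It is not the code from the repository: parse_feed and build_records are hypothetical names, the WHOIS query is passed in as a plain callable (lookup) so you can plug in whichever WHOIS library you copied into bin, and using the IP address as the KV store _key is my assumption.

```python
import ipaddress

# Field names taken from fields_list in transforms.conf.
KV_FIELDS = ("asn_registry", "asn_country_code", "nets", "raw", "asn_cidr",
             "raw_referral", "asn_date", "query", "referral")


def parse_feed(text):
    """Return the valid IP addresses in a raw feed, one candidate per line."""
    ips = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        candidate = line.split("#", 1)[0].strip()
        if not candidate:
            continue
        try:
            ips.append(str(ipaddress.ip_address(candidate)))
        except ValueError:
            # Not a single IP address (a CIDR block, a header line, ...).
            continue
    return ips


def build_records(ips, lookup):
    """Call lookup(ip) for each address and keep only the KV store fields.

    lookup is any callable that returns a dict of WHOIS results (assumption).
    """
    records = []
    for ip in ips:
        try:
            result = lookup(ip)
        except Exception:
            # One failed WHOIS query should not abort the whole run.
            continue
        record = {field: result.get(field) for field in KV_FIELDS}
        record["query"] = record["query"] or ip
        record["_key"] = ip  # assumption: one KV store record per IP
        records.append(record)
    return records
```

Keeping the WHOIS call behind a callable also makes it easy to test the rest of the logic with a fake lookup before you hit a real WHOIS server, which is the slow part.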

Well, if you are following the above steps, you should be ready to create a KV store enhanced threat list.

  1. To verify the custom command, run this query to generate the raw threat feed and get fast results:
    | generateblocklist url=default whois=False
  2. To generate and store the enhanced threat list, run the following query:
    | generateblocklist url=default whois=True
    Note: This step is relatively slow and it takes some time to query and push data to the KV store.

    Generate all the IP addresses and add them to the KV store
    Picture2

Step 4: Using Lookup to verify threat intel enrichment

Once we have a KV store of enriched threat intel, we can use lookups to check what new information we now have for each bad IP:

| inputlookup emergingthreats

Picture1

Oh Splunk! ‘whois’ this new data?

We now have additional information about the raw IP addresses: useful information like registrant details, the email address used for domain registration, and the IP range that a given bad IP belongs to. Using this additional information, we can determine whether hosts in our network are interacting with these bad domains, and then create an alert or a workflow action in Splunk.

How to apply this in our infrastructure, and use cases

Now that we have this additional information about the IP address, like the CIDR range, the name, address and email of the registrant, and the domain names associated with an IP, we can make an event or indicator more actionable. This gives Level 1 SOC analysts and hunters more context around an incident.
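As a small illustration of how the CIDR range adds context, the sketch below checks whether IPs seen in your logs fall inside any range recorded in the asn_cidr field. The function names are hypothetical, and how you export the enriched records and your log IPs out of Splunk is left to you.

```python
import ipaddress


def build_bad_networks(records):
    """Collect the network ranges found in the asn_cidr field of each record."""
    networks = []
    for record in records:
        cidr_field = record.get("asn_cidr")
        if not cidr_field:
            continue
        # Assumption: a field may hold several comma-separated ranges.
        for cidr in str(cidr_field).split(","):
            try:
                networks.append(ipaddress.ip_network(cidr.strip(), strict=False))
            except ValueError:
                # Skip values that are not valid CIDR notation.
                continue
    return networks


def flag_hosts(log_ips, networks):
    """Return the log IPs that fall inside any of the given networks."""
    flagged = []
    for ip in log_ips:
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            continue
        if any(address in network for network in networks):
            flagged.append(ip)
    return flagged
```

Keep in mind that a whole CIDR range can contain plenty of harmless hosts, so a hit here is a reason to look closer rather than proof of anything malicious.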

Use this command to:

  • Generate a list of publicly known bad IPs from all relevant sources, along with the corresponding WHOIS information, and store it in a KV store.
  • Use lookups to access the KV store information and compare it against your network logs and email logs to detect malicious communication with bad actors, find infected hosts, or monitor the interaction of the company’s critical assets with these bad domains.
  • Leverage the WHOIS information (emails, asn_registry) to identify and report fraud and other abusive behaviors. You may be surprised by the number of registered domains that look similar to your company domain, or to discover that your hosts are resolving these domain names.
  • Develop TI signatures that can be introduced into other security tools like NIDS, HIDS, Snort and SIEM event correlation.
  • An interesting use case would be to compare the WHOIS and DNS information of the IP addresses and monitor the internal DNS lookup queries within the company.
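For the look-alike domain idea in the list above, a very simple starting point is a string similarity check. This is just a sketch using Python's difflib: the function name, the 0.8 threshold and the assumption that you have already pulled a list of candidate domains out of the WHOIS data are all mine.

```python
from difflib import SequenceMatcher


def lookalike_domains(candidates, company_domain, threshold=0.8):
    """Return (domain, score) pairs that resemble company_domain.

    The score is difflib's similarity ratio between 0 and 1; the
    threshold is an arbitrary starting value to tune (assumption).
    """
    target = company_domain.lower().strip(".")
    matches = []
    for domain in candidates:
        name = domain.lower().strip(".")
        if name == target:
            # The company's own domain is not a look-alike.
            continue
        score = SequenceMatcher(None, name, target).ratio()
        if score >= threshold:
            matches.append((domain, round(score, 2)))
    # Most similar domains first.
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

A plain similarity ratio will miss some tricks (different TLDs, homoglyphs), so treat it as a first filter for an analyst to review.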

This blog post gives you a narrow example of how to get WHOIS data around a threat feed. However, feel free to extend the code to interact with other third-party sources and tools like VirusTotal, DNSWL.org and PassiveDNS to generate custom threat intelligence data sets and create specific use cases (malware communication, phishing and spam email, data exfiltration) depending on the threat feed used.

Generating threat intelligence is not only about blocking all the bad IPs. It is more effective when you know the characteristics of the bad actors and how they interact with the company’s assets, and when that intelligence is seamlessly integrated into your security.

Also, I suggest you read this blog post regarding the IP Reputation app developed by fellow Splunker Matthias Maier, to get more ideas about threat intel using Splunk Enterprise.

Beware of the threats and happy splunking!

Screenshots:

Well, if you’re working with IP data, HEAT MAPS are awesome
| inputlookup emergingthreats  | iplocation query | geostats count 

Picture3

This map shows where these bad IPs are located.

 

Bhavin,

Awesome post, and I greatly appreciate you putting your thoughts to paper (screen) on this subject, as this is something I’m sure everyone who’s used Splunk before has thought about many times.

Jeff Walzer
May 4, 2016

Hi Jeff

Thank you for reading and for the feedback. You can refer to the code to extend the capability to interact with other enrichment resources: VirusTotal, IP scanner, etc.

I have updated the code on the Git repo: https://github.com/patel-bhavin/mysplunk_csc

Bhavin Patel
May 4, 2016