Enriching threat feeds with WHOIS information
It has been almost two years since I spent a summer in Seattle interning with the Splunk Security Practice (SecPrax) team. Damn, time flies! The Splunk security community is growing every day, thanks to the flexibility, visibility, and insight Splunk Enterprise offers for all data and, as I have learned, all data is security relevant. Now I am back at Splunk to work with the Security Research team. This is my first blog post and I would like to hear what you have to say about it, so please leave feedback or a comment.
What am I missing while doing threat intelligence?
While researching threat intelligence data sets to ingest into Splunk, I realized there can be an operational gap between the attributes offered by threat feeds (typically a boring list of publicly known bad IPs, domains, etc.) and how I can effectively leverage that list to improve my security.
The questions to answer are: Do we have any additional context regarding a bad IP address? Do we care if the owner of a bad domain is sending us emails? Do we know if those malicious domains are registered with our company’s details? How relevant is the information from the threat feed during an incident investigation? The answers to these questions are what is often missing from threat intel. A good way to start doing efficient threat intelligence is to enrich the list of raw IP addresses with the required external context, such as…
- Domain/IP to URL: to check what each of the bad IPs resolves to.
- Passive DNS and WHOIS information, the AS routes, and the score of its BGP routing
- Reputation score, IP Address resolution, website category on VirusTotal
- The malware campaign a bad IP is associated with, other associated IOCs, etc.
…and to integrate them seamlessly into the detection process, not only to make the threat intel more actionable but also to assist in triaging and investigating incidents. So my aim is to generate and enrich threat intel data that is essentially of Splunk, by Splunk and for Splunk-ing. Did I just alter the quote by Mr. Lincoln?
I will walk through the process I used to generate your own threat dataset with WHOIS information and provide use cases showing how best to leverage this custom command in your Splunk instance.
So, how do I generate context around an IP address in Splunk?
- Enrich the IP address with WHOIS information
- In Splunk, you are only limited by your creativity. Use other sources like VirusTotal, Passive DNS, IOC Bucket, etc to gather context and enrich your threat data.
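To make the WHOIS enrichment idea concrete, here is a minimal standard-library sketch. It is not the code from the app: the function names `query_whois` and `parse_whois` and the default server `whois.arin.net` are assumptions for illustration, and the app itself may rely on a third-party library instead. It sends a plain WHOIS query over TCP port 43 and turns the "key: value" lines of the response into a dictionary.

```python
import socket
from collections import defaultdict


def query_whois(ip, server="whois.arin.net", port=43, timeout=10.0):
    """Send one WHOIS query over TCP and return the raw text response.

    The server name is an assumption; other registries run their own servers.
    """
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall((ip + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(raw):
    """Collect 'key: value' lines into a dict of lists, skipping comments."""
    fields = defaultdict(list)
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "%")):
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()].append(value.strip())
    return dict(fields)
```

Calling `parse_whois(query_whois("192.0.2.1"))` would return whatever fields the registry publishes for that address. Values are kept as lists because WHOIS responses often repeat a key.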
Step 1: Create an app skeleton for custom search commands
(download the code from the git repository :mysplunk_csc)
Refer to the blog if you are new to custom search commands. Alternatively, you can copy the generateblocklist_app into the $SPLUNK_HOME/etc/apps directory. As with any Splunk app, there is a specific file layout and some configuration files are required. Use the searchcommands_template in the Splunk SDK for Python, which can be found at :
- Create a folder generateblocklist_app in $SPLUNK_HOME/etc/apps/ and copy the contents of searchcommands_template to the new folder.
- In the bin folder of generateblocklist_app, delete filter.py, report.py and stream.py, as they are not used in our application.
- Rename generate.py to generateblocklist.py.
- Copy the /splunk-sdk-python/splunklib folder into the $SPLUNK_HOME/etc/apps/generateblocklist_app/bin folder
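If you prefer to script these steps rather than do them by hand, they can be sketched roughly as follows. The helper name `create_app_skeleton` and its arguments are hypothetical; you pass in the paths to your copy of searchcommands_template, your checkout of the SDK, and your $SPLUNK_HOME/etc/apps directory.

```python
import shutil
from pathlib import Path

# Template scripts that this app does not use.
UNUSED_SCRIPTS = ("filter.py", "report.py", "stream.py")


def create_app_skeleton(template_dir, sdk_dir, apps_dir,
                        app_name="generateblocklist_app"):
    """Copy the template into apps_dir and apply the Step 1 changes."""
    app_dir = Path(apps_dir) / app_name
    shutil.copytree(template_dir, app_dir)  # fails if the app already exists

    bin_dir = app_dir / "bin"
    for name in UNUSED_SCRIPTS:
        (bin_dir / name).unlink(missing_ok=True)  # needs Python 3.8+

    # Rename the generating command script.
    (bin_dir / "generate.py").rename(bin_dir / "generateblocklist.py")

    # Bundle the SDK library next to the command script.
    shutil.copytree(Path(sdk_dir) / "splunklib", bin_dir / "splunklib")
    return app_dir
```

Because `copytree` refuses to overwrite an existing folder, running this twice will not silently clobber an app you have already edited.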
Step 2: Search and Replace in generateblocklist_app.
Edit bin/generateblocklist.py, and app.conf, commands.conf and logging.conf in the default folder.
- Replace each instance of %(command.title()) with GenerateBlocklist in bin/generateblocklist.py
- Replace each instance of %(command.lower()) with generateblocklist as this is the name of the command.
- Replace each of the remaining %(…) values in app.conf with the appropriate information based on the name. The specific values in this case don’t really matter, but you must put something.
- Create a file collections.conf in the generateblocklist_app/default directory to point the application to the KV store collection in use. In our case the name is kvwhois.
- Create a file transforms.conf in generateblocklist_app/default to enable a lookup named emergingthreats and include the following code:
[emergingthreats]
external_type = kvstore
collection = kvwhois
fields_list = _key, _user, asn_registry, asn_country_code, nets, raw, asn_cidr, raw_referral, asn_date, query, referral
By this step, your generateblocklist_app should have the file structure set up and the configuration required to be able to generate data.
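The search-and-replace part of this step can also be scripted. The sketch below is only an illustration: the helper name `fill_placeholders` is hypothetical, and it handles just the two command placeholders named above, so the remaining %(…) values in app.conf still need to be filled in by hand.

```python
from pathlib import Path

# Placeholder strings from the template and the values used in this post.
REPLACEMENTS = {
    "%(command.title())": "GenerateBlocklist",
    "%(command.lower())": "generateblocklist",
}


def fill_placeholders(path, replacements=REPLACEMENTS):
    """Replace template placeholders in one file and return the new text."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    for placeholder, value in replacements.items():
        text = text.replace(placeholder, value)
    path.write_text(text, encoding="utf-8")
    return text
```

You would call it once per file, for example on bin/generateblocklist.py and on each of the .conf files in the default folder.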
Step 3: Python to fetch raw threat feeds and enrich them
Edit the generateblocklist.py to include two main code snippets:
- Specify parameters for the search command
@Configuration()
class GenerateBlocklistCommand(GeneratingCommand):
    url = Option(require=False)
    delete = Option(require=False, validate=validators.Boolean())
    whois = Option(require=False, validate=validators.Boolean())
- Implement the logic to generate threat feeds in the following functions. I have done the following:
– generate(): This function must exist in the GenerateBlocklistCommand class. It initializes and validates the parameters and fetches a raw list of bad IPs (the raw threat feed).
– add_kvstore(): Given appropriate parameters, it queries each IP address from the raw threat feed against the WHOIS server for threat list enrichment and stores the results in the Splunk KV store.
- One great thing about Splunk is the ability to add a plethora of third-party libraries to supplement the development of custom solutions. Copy and paste the library dependencies into $SPLUNK_HOME/etc/apps/generateblocklist_app/bin.
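The two functions described above might look roughly like the sketch below. This is not the code from the repository: `parse_feed` is a hypothetical helper, this `add_kvstore` is a simplified stand-in, `lookup` is any callable that returns a dict of WHOIS fields for an IP, and `collection_data` is assumed to be an object with a `batch_save(*documents)` method, such as `service.kvstore["kvwhois"].data` in the Splunk SDK for Python.

```python
import ipaddress


def parse_feed(text):
    """Yield unique, valid IP addresses from a one-entry-per-line feed."""
    seen = set()
    for line in text.splitlines():
        candidate = line.split("#", 1)[0].strip()  # drop trailing comments
        if not candidate:
            continue
        try:
            ip = str(ipaddress.ip_address(candidate))
        except ValueError:
            continue  # skip CIDR ranges, headers and malformed entries
        if ip not in seen:
            seen.add(ip)
            yield ip


def add_kvstore(ips, lookup, collection_data, batch_size=100):
    """Enrich each IP via lookup(ip) and save the records in batches."""
    batch, stored = [], 0
    for ip in ips:
        record = dict(lookup(ip))
        record["_key"] = ip   # one KV store record per IP address
        record["query"] = ip
        batch.append(record)
        if len(batch) >= batch_size:
            collection_data.batch_save(*batch)
            stored += len(batch)
            batch = []
    if batch:
        collection_data.batch_save(*batch)
        stored += len(batch)
    return stored
```

Batching the writes keeps the number of KV store requests down. The WHOIS lookups themselves are still made one IP at a time, which is why enrichment is the slow part.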
Note: Alternatively, you can copy the generateblocklist_app into the $SPLUNK_HOME/etc/apps directory. Refer to the Readme file in the app folder to see the command manual. You can download the code from my git repository :mysplunk_csc
Well, if you have followed the above steps, you should be ready to create a KV store enhanced threat list.
- To verify the custom command, run this query, which generates the raw threat feed and returns results quickly:
| generateblocklist url=default whois=False
- To generate and store the enhanced threat list, run the following query:
| generateblocklist url=default whois=True
Note: This step is relatively slow, as it takes some time to run the WHOIS queries and push the data to the KV store.
Generate all the IP addresses and add them to the KV store
Step 4: Using Lookup to verify threat intel enrichment
Once we have a KV store of enriched threat intel, we can use lookups to check what new information we now have for each bad IP:
| inputlookup emergingthreats
Oh Splunk! ‘whois’ this new data?
We now have additional information about the raw IP address: useful details like registrant information, the email address used for domain registration, and the IP range that a certain bad IP belongs to. Using this additional information, we can now determine whether hosts in our network are interacting with these bad domains, and create an alert or a workflow action in Splunk.
How to apply this in our infrastructure, and use cases
Now that we have this additional information about the IP address, like the CIDR range, the name, address and email of the registrant, and the domain names associated with an IP, we can make an event or indicator more actionable. This gives Level 1 SOC analysts and hunters more context around an incident.
Use this command to:
- Generate a list of publicly known bad IPs from all relevant sources and the corresponding whois information to store in a KV store.
- Use lookups to access the KV store information and compare it against your network and email logs to detect malicious communication with bad actors, find infected hosts, or monitor the interaction of the company’s critical assets with these bad domains.
- Leverage the WHOIS information (emails, asn_registry) to identify and report fraud and other abusive behavior. You may be surprised by the number of registered domains that look similar to your company domain, or to discover your own hosts resolving these domain names.
- Develop TI signatures which can be introduced into other security tools like NIDS, HIDS, Snort and SIEM event correlation.
- An interesting use case would be to compare the WHOIS and DNS information of the IP addresses and monitor the internal DNS lookup queries within the company.
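Inside Splunk you would do this comparison with lookups, but the idea behind matching log traffic against the enriched data can be sketched in plain Python. The helper names `build_cidr_index` and `match_ip` are hypothetical, and the records are assumed to be dicts carrying the asn_cidr field from the lookup definition.

```python
import ipaddress


def build_cidr_index(records):
    """Pair each enriched record with its parsed asn_cidr network."""
    index = []
    for record in records:
        cidr = record.get("asn_cidr")
        if not cidr:
            continue
        try:
            network = ipaddress.ip_network(cidr, strict=False)
        except ValueError:
            continue  # ignore records without a usable CIDR value
        index.append((network, record))
    return index


def match_ip(ip, index):
    """Return the enriched records whose CIDR range contains ip."""
    address = ipaddress.ip_address(ip)
    return [
        record
        for network, record in index
        if address.version == network.version and address in network
    ]
```

Matching on the range rather than the single listed address also flags traffic to neighbouring addresses in the same allocation. That can be useful context for an analyst, though it will also produce more noise.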
This blog post gives you a narrow example of how to get WHOIS data around a threat feed. However, feel free to extend the code to interact with other third-party sources and tools like VirusTotal, DNSWL.org and PassiveDNS to generate custom datasets of threat intelligence and create specific use cases (malware communication, phishing and spam email, data exfiltration) depending on the threat feed used.
Generating threat intelligence is not only about blocking all the bad IPs. It is more effective when one knows the characteristics of the bad actors, their interaction with the company’s assets, and how to integrate that intelligence seamlessly into the company’s security.
Beware of the threats and happy splunking!
Well, if you’re working with IP data, HEAT MAPS are awesome
| inputlookup emergingthreats | iplocation query | geostats count