Random Words on Entropy and DNS
In my last blog post, I mentioned that I would delve further into detecting subdomains with relatively high entropy. But first I think it is important to discuss WHAT entropy is, WHY I care whether a domain or subdomain has high entropy, and finally HOW you can use entropy in Splunk to find potentially bad things.
So, what does entropy mean? For the purposes of computer science, I tend to use the definition of entropy as “… a measure of uncertainty in a random variable”. For most things in computer science, entropy is calculated with the Shannon entropy formula, devised by Claude Shannon.
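For reference, this is the standard textbook form of the formula. The notation is the usual one and is an assumption on my part about how it was originally presented: X is the string being scored, x_1 … x_n are its distinct characters, and p(x_i) is the fraction of the string made up of character x_i.

```latex
H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)
```

With a base-2 logarithm the result is measured in bits per character. A string made of a single repeated character scores 0, and a string in which every character is different scores the maximum for its length.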
In other words (since if you are still reading this section, that formula meant as much to you as it did to me), the more random a string is, the higher its measure of randomness (or rather “entropy”). This measure is often referred to as an entropy “score”. To illustrate what this “measure of uncertainty” looks like in the real (ha!) cyber world, let's calculate the Shannon entropy of the domains listed below:
- The domain aaaaa.com has a Shannon Entropy score of 1.8 (very low)
- The domain google.com has a Shannon Entropy score of 2.6 (rather low)
- A00wlkj—(-a.aslkn-C.a.2.sk.esasdfasf1111)-890209uC.4.com has a Shannon Entropy score of 3 (rather high)
As seen in the examples above, domains with lower levels of randomness (aaaaa.com and google.com) have correspondingly lower entropy scores than the long random domain in the third example.
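To make the calculation concrete, here is a minimal Python sketch of character-level Shannon entropy. It assumes the score is computed over every character of the string, dots included, and the function name shannon_entropy is my own choice for illustration. Under that assumption it lands close to the scores quoted for the first two domains.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)   # how often each distinct character occurs
    total = len(text)
    # Sum -p * log2(p) over the relative frequency p of each character.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


for name in ("aaaaa.com", "google.com"):
    print(f"{name}: {shannon_entropy(name):.2f}")
# aaaaa.com: 1.88
# google.com: 2.65
```

Note that the score depends on the mix of characters, not on whether the string is a real word, which is why long legitimate hostnames can also score high.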
Why should you care about entropy? One good reason is that it can help you detect malware and web exploits that make use of domains (and subdomains) created with a domain generation algorithm (DGA). As I discussed in my previous blog post, malicious actors use a domain generation algorithm to create random-looking domains and subdomains from some sort of “key” or “salt” that only they can decode. These new DGA domains can then be used for future malicious campaigns. Many different varieties of malware and other threats to your network use DGA domains, but some of the most famous include worms like Conficker and web exploits like the Blackhole Exploit Kit. Because these domains are randomly generated (and may only be up for a short amount of time), they are extremely difficult for network defenders to block using traditional methods like blacklists.
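As a rough illustration of the idea, here is a toy generator in Python. It is not the algorithm of Conficker, Blackhole, or any other real threat. The seed string, the date-based input, and the function name toy_dga are all assumptions made up for this sketch. The point is only that anyone who knows the seed can recompute the same list of domains for a given day, while a defender sees names that change daily and look random.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 3, length: int = 12) -> List[str]:
    """Derive `count` random-looking .com domains from a shared seed and a date."""
    domains = []
    for index in range(count):
        material = f"{seed}|{day.isoformat()}|{index}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hex digit (0-15) onto a letter a-p so the label looks like text.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(f"{label}.com")
    return domains


# Same seed, consecutive days: two different, reproducible lists.
print(toy_dga("example-seed", date(2015, 9, 28)))
print(toy_dga("example-seed", date(2015, 9, 29)))
```

A blacklist built from the first day's list is of little help on the second day, which is why a property of the name itself, such as entropy, is a more useful signal.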
When Anton Chekhov, the famous Russian playwright, said “only entropy comes easy”, I believe it is safe to assume he was an avid user of the Splunk app “URL Toolbox”*. URL Toolbox can be used to split a URL or DNS query apart and calculate Shannon entropy on one of its corresponding fields in Splunk. Since you can’t use traditional block lists to detect DGA domains (the domains are constantly changing), calculating entropy on those fields helps you detect possibly malicious domains that would otherwise get lost in the data. It should be noted that this is not a perfect method. Some legitimate domains (especially content delivery network (CDN) domains, news sites, streaming video sites, and Facebook) will be extremely long and have high entropy. When you start running these searches, spend some time manually reviewing your data and add those legitimate domains to a whitelist lookup table so that you can filter them out of your results.
*Please note I am not a literature professor and this may not be an accurate statement.
Now let's look at a couple of example queries to see how to look for domains and subdomains with URL Toolbox:
tag=dns | `ut_parse(query)` | lookup FP_entropy_domains domain AS ut_domain | search NOT FP_entropy=* | `ut_shannon(ut_domain)` | search ut_shannon > 4.0 | stats count by query ut_shannon
With this search we are looking at Common Information Model (CIM) compliant DNS queries via the “tag” field, but you could run it against Stream, Bro, or host DNS sourcetypes. You could even run it against HTTP requests from a proxy log if you wanted. Then (as discussed above) we remove known false positives by dropping any domain that appears in the “FP_entropy_domains” lookup table. Next, we calculate the entropy of the field “ut_domain” (the base domain of the query, which ut_parse extracted earlier). Finally, we tell Splunk to display only domains (ut_domain) with an entropy score higher than 4.0. This is an arbitrary threshold that I chose for this data set, and it may need to be adjusted for your environment (lower means more false positives and higher means more false negatives). Try to find the crossover error rate (CER) that works best for your network!
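If it helps to see the same steps outside of Splunk, here is a rough Python sketch of the pipeline: split the query, skip whitelisted domains, score the domain, keep anything above the threshold, and count. It is not how URL Toolbox is implemented. The helper names are hypothetical, the sample hostnames are made up, and split_host naively treats the last two labels as the domain, so it will mishandle suffixes like .co.uk that a proper parser deals with.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def split_host(host: str) -> Tuple[str, str]:
    """Naively split a hostname into (subdomain, domain) on the last two labels."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def high_entropy_queries(
    queries: Iterable[str], known_good: Set[str], threshold: float = 4.0
) -> Dict[Tuple[str, float], int]:
    """Count queries whose base domain is not whitelisted and scores above threshold."""
    hits: Dict[Tuple[str, float], int] = {}
    for query in queries:
        _, domain = split_host(query)
        if domain in known_good:          # plays the role of the lookup table filter
            continue
        score = shannon_entropy(domain)
        if score > threshold:
            key = (query, round(score, 2))
            hits[key] = hits.get(key, 0) + 1
    return hits


# Made-up sample data: one ordinary domain, one random-looking domain seen twice,
# and one random-looking domain that has been reviewed and whitelisted.
sample = [
    "www.google.com",
    "q7x2kz9vbw4mtj8rlfp3.com",
    "q7x2kz9vbw4mtj8rlfp3.com",
    "img.a1b2c3d4e5f6g7h8i9j0.net",
]
for (query, score), count in high_entropy_queries(sample, {"a1b2c3d4e5f6g7h8i9j0.net"}).items():
    print(query, score, count)
```

Scoring the first element returned by split_host instead of the second gives the subdomain variant of the same idea.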
tag=dns | `ut_parse(query)` | lookup FP_entropy_domains domain AS ut_domain | search NOT FP_entropy=* | `ut_shannon(ut_subdomain)` | search ut_shannon > 4.5 | stats count by query ut_shannon
This is identical to the search above, but instead of looking for domains with high entropy, we are looking for subdomains with high entropy. You could also have some fun by combining the dynamic DNS lookup table from my last blog post with this search! That would be especially good at finding APT malware that is beaconing home to dynamic DNS providers.
Domains and subdomains with relatively high entropy are great indicators of malicious behavior on your network. Take these searches and start hunting! I can almost certainly guarantee you will find something. Happy hunting!
Jayasree, N., and P. P. Amritha. “A Model for the Effective Steganalysis of VoIP.” In Artificial Intelligence and Evolutionary Algorithms in Engineering Systems, Advances in Intelligent Systems and Computing, 2014, 379-87.
“Domain Generation Algorithms (DGA) in Stealthy Malware – Damballa.” Damballa. March 5, 2012. Accessed September 28, 2015. https://www.damballa.com/domain-generation-algorithms-dga-in-stealthy-malware/.
“Introduction.” Conficker Working Group. Accessed September 28, 2015. http://www.confickerworkinggroup.org/wiki/pmwiki.php/ANY/Introduction.
“OpenDNS Security Research: Blackhole Exploit Kit DGA Analysis.” OpenDNS Blog. July 10, 2012. Accessed September 28, 2015. https://blog.opendns.com/2012/07/10/opendns-security-team-blackhole-exploit/.
Bilge, Leyla, Sevil Sen, Davide Balzarotti, Engin Kirda, and Christopher Kruegel. “Exposure.” ACM Transactions on Information and System Security (TISSEC), 2014, 1-28.
Le Roux, Cedric. “Documentation.” URL Toolbox. Accessed September 28, 2015. https://splunkbase.splunk.com/app/2734/.