Tips & Tricks

April 21, 2016

When entropy meets Shannon

By Splunk

This is the third post on URL analysis. Please have a look at the two other posts for more context about what can be done with Splunk to analyze URLs.

This article shows how one can detect DNS tunnels. While you can find lots of very useful apps on Splunkbase to help you analyze DNS data, it is always good for curious individuals to discover some of the techniques used underneath.

A lot of captive portals are bypassed every day by anyone able to make a DNS request. If someone can run the following command on their machine without being authenticated on the captive portal:

$ host splunk.com
splunk.com has address 54.69.58.243
...

then they can reach any service on the internet through a DNS tunnel. There are a lot of tools out there to create those tunnels. For a great paper on the topic, I encourage you to read Detecting DNS Tunneling from the SANS Institute.

Claude Shannon to the rescue!

Photo: Claude Shannon. By Jacobs, Konrad - https://opc.mfo.de/detail?photo_id=3807, CC BY-SA 2.0 de

A long time ago, the venerable Claude E. Shannon wrote the paper “A Mathematical Theory of Communication”, which I strongly encourage you to read for its clarity and wealth of information.

He introduced a measure known as Shannon entropy, which is useful for characterizing the statistical structure of a word or message.

Consider a word as the output of a discrete source that draws from a finite set of characters. Each character occurs in the word with some probability p, and observing it conveys -log2(p) bits of information. The entropy of the word is the average of that quantity, weighted by the probability of occurrence of each character: H = -sum(p * log2(p)) over the distinct characters.

The previous paragraph can easily be translated into the following Python code (taken from the excellent URL Toolbox on Splunkbase):

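import math

def shannon(word):
    # Shannon entropy of `word`, in bits per character
    entropy = 0.0
    length = len(word)

    # Count the occurrences of each character
    occ = {}
    for c in word:
        if c not in occ:
            occ[c] = 0
        occ[c] += 1

    # Sum -p * log2(p) over the distinct characters
    for v in occ.values():
        p = float(v) / float(length)
        entropy -= p * math.log(p, 2)  # Log base 2

    return entropy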

This can be run directly against any word you have in Splunk:

As you can see, the score is pretty high, which makes sense since the characters in those data are highly varied. If we click on the ut_shannon field to sort in reverse order, this is what we get:

As one can see, words with little variety in their characters get a low score.
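To make the link between character variety and score concrete, here is a minimal Python 3 sketch. It re-implements the entropy calculation with collections.Counter rather than calling the URL Toolbox macro, and the sample words are made up purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Return the Shannon entropy of `word`, in bits per character."""
    if not word:
        return 0.0
    length = len(word)
    # log2(length / count) equals -log2(p): the information carried by a
    # character of probability p. Weight it by p and sum over characters.
    return sum(
        (count / length) * math.log2(length / count)
        for count in Counter(word).values()
    )


# Hypothetical sample words, chosen only to show the trend.
for sample in ["aaaa", "abab", "abcd", "splunk"]:
    print(sample, round(shannon_entropy(sample), 3))
```

A word made of a single repeated character scores 0, two equally frequent characters give 1 bit, and four equally frequent characters give 2 bits. The more evenly spread and varied the characters, the higher the score.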

Catching DNS tunnels from subdomains in URLs

If we run the following query, interesting results are shown:
sourcetype="isc:bind:query" | eval list="mozilla" | `ut_parse(query, list)` | `ut_shannon(ut_subdomain)` | table ut_shannon, query | sort ut_shannon desc

As you can see in the results here, the high scores come from tunnels made to the domain ip-dns.info, as well as from something unknown that could also be a tunnel: traffic towards greencompute.org.
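For readers who want to try the idea outside Splunk, the ranking performed by the query can be sketched in plain Python 3. This is a simplified illustration and not how URL Toolbox works internally: the subdomain is approximated as everything before the last two labels instead of being derived from the Mozilla public suffix list, the helper names are hypothetical, and the query names are invented examples.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Return the Shannon entropy of `word`, in bits per character."""
    if not word:
        return 0.0
    length = len(word)
    return sum(
        (count / length) * math.log2(length / count)
        for count in Counter(word).values()
    )


def naive_subdomain(hostname):
    """Return everything left of the last two labels.

    Assumption: the registered domain is exactly two labels long
    (e.g. example.com), which is not true for every suffix.
    """
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[:-2])


def rank_by_entropy(hostnames):
    """Return (score, hostname) pairs, highest subdomain entropy first."""
    scored = [(shannon_entropy(naive_subdomain(h)), h) for h in hostnames]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented query names; the long random-looking label mimics tunnel traffic.
queries = [
    "www.example.com",
    "mail.example.org",
    "a1b2c3d4e5f6g7h8i9j0kq.tunnel.example.net",
]
for score, name in rank_by_entropy(queries):
    print(f"{score:.3f}  {name}")
```

The random-looking subdomain rises to the top of the list, while ordinary names such as www or mail sink to the bottom. A high score is only a hint that deserves a closer look, since legitimate services can also use long generated hostnames.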

I hope this post helps you see the tools and methodologies one can use to spot unusual activity based strictly on DNS traffic. More to come…

----------------------------------------------------
Thanks!
Sebastien Tricaud
