Tracing your TCP IPv4 connections with eBPF and BCC from the Linux kernel JIT-VM to Splunk

Starting with Linux kernel 4.1, an interesting feature got merged: eBPF. For anyone playing with networks, BPF should sound familiar: it is a filtering system used by user-space tools such as tcpdump or Wireshark to filter and display only the wanted packets. The “e” in eBPF stands for extended: it takes BPF beyond network traffic and makes it possible to trace various things from within the kernel, such as syscalls, kprobes, tracepoints, etc.

eBPF runs a piece of C code compiled into BPF bytecode, which the kernel either interprets or translates into native machine code with its Just-In-Time (JIT) compiler. In short, eBPF is a small virtual machine running inside the Linux kernel. In the current git tree, BPF offers 89 instructions, which make up the eBPF bytecode.
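To make “bytecode” a bit more concrete: every eBPF instruction is a fixed 64-bit word laid out like the kernel's struct bpf_insn, with an 8-bit opcode, 4-bit destination and source registers, a 16-bit signed offset and a 32-bit signed immediate. The Python 3 sketch below decodes that layout, assuming a little-endian host such as x86-64. decode_insn and BPF_CLASSES are names made up for this illustration, and only a few instruction classes are listed.

```python
import struct

# A few instruction classes, encoded in the low 3 bits of the opcode.
BPF_CLASSES = {0x00: "LD", 0x01: "LDX", 0x02: "ST", 0x03: "STX",
               0x04: "ALU", 0x05: "JMP", 0x07: "ALU64"}

def decode_insn(raw):
    """Split one 8-byte eBPF instruction into its fields (little-endian host assumed)."""
    code, regs, off, imm = struct.unpack("<BBhi", raw)
    return {
        "code": code,
        "class": BPF_CLASSES.get(code & 0x07, "?"),
        "dst_reg": regs & 0x0F,  # low nibble holds the destination register
        "src_reg": regs >> 4,    # high nibble holds the source register
        "off": off,
        "imm": imm,
    }

# "mov64 r0, 42" followed by "exit": a program that returns 42.
program = bytes.fromhex("b70000002a000000" "9500000000000000")
for i in range(0, len(program), 8):
    print(decode_insn(program[i:i + 8]))
```

These two instructions make about the smallest valid program. The first sets r0, the register holding the return value, to 42, and the second exits.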

It is an amazing tool for tracing, but in this post I would like to share how we can list TCP IPv4 connections, captured on the kernel side, and send them to Splunk using the HTTP Event Collector (HEC)!

We will cover the Linux kernel configuration that you need, as well as the Splunk setup that receives those events.

Step 1: Getting the latest Linux Kernel

These steps were done on a Debian distribution and should also work on Ubuntu. If you use another distribution, adapt them or find another way to get a kernel 4.1 or later.
We first grab the freshest Linux source code from Linus's tree by running git clone:
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Because we are on a Debian distribution, we would like to use the standard tools provided by the Debian kernel-package (more information here: https://wiki.debian.org/BuildADebianKernelPackage).
We need to install the following packages to automate building and packaging this kernel:
$ sudo apt-get install kernel-package build-essential libncurses5-dev fakeroot
Now we can configure the options we need for our kernel by running menuconfig, the ncurses frontend:
$ make ARCH=x86_64 menuconfig
To play with the new bpf() syscall, enable the item “Enable bpf() system call” under “General Setup”.
We save the configuration to the “.config” file and make sure it enables BPF:
$ grep BPF .config
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m
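Instead of eyeballing the grep output, a short Python 3 sketch can parse .config and list the options that are not built in. The REQUIRED list below is an assumption based on the options above (bpf() syscall, JIT and BPF events), so adjust it to your needs. Lines such as “# CONFIG_NET_ACT_BPF is not set” are treated as disabled.

```python
import re

# Options this post relies on (assumption: adjust to your needs).
REQUIRED = ["CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT", "CONFIG_BPF_EVENTS"]

def parse_kconfig(text):
    """Map CONFIG_* names to their values; '# ... is not set' lines map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.match(r"# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def missing_options(text, required=REQUIRED):
    """Return the required options that are not built in (=y)."""
    options = parse_kconfig(text)
    return [name for name in required if options.get(name) != "y"]
```

From the kernel source tree, an empty list returned by missing_options(open(".config").read()) means the configuration has everything this post needs.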
Now we can use the Debian kernel package builder, make-kpkg:
$ make-kpkg --initrd --rootcmd fakeroot kernel_image
exec make kpkg_version=12.036+nmu3 -f /usr/share/kernel-package/ruleset/minimal.mk debian ROOT_CMD=fakeroot
====== making target debian/stamp/conf/minimal_debian [new prereqs: ]======
...
dpkg --build                   ~/git/linux/debian/linux-image-4.6.0-rc6+ ..
dpkg-deb: building package `linux-image-4.6.0-rc6+' in `../linux-image-4.6.0-rc6+_4.6.0-rc6+-10.00.Custom_amd64.deb'.
make[1]: Leaving directory `~/git/linux'
This builds the kernel bzImage as well as the modules, and packages them into a .deb file.
We install the package like this:
$ sudo dpkg -i ../linux-image-4.6.0-rc6+_4.6.0-rc6+-10.00.Custom_amd64.deb
Now it is time to reboot into your new kernel. You can then check the version by typing:
$ uname -r | grep 4.6.0-rc6
4.6.0-rc6+
$ echo $?
0
If the echo command prints 1 instead, grep found no match and you have booted the wrong kernel; check in GRUB that the new kernel was selected at boot.
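If you prefer to script this check, the Python 3 sketch below reads the same string as uname -r through platform.release() and compares it numerically with the 4.1 minimum, instead of grepping for one exact version. kernel_version and is_recent_enough are hypothetical helper names.

```python
import platform
import re

def kernel_version(release):
    """Extract (major, minor) from a release string such as '4.6.0-rc6+'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))

def is_recent_enough(release=None, minimum=(4, 1)):
    """True when the running (or given) kernel is at least `minimum`."""
    if release is None:
        release = platform.release()  # same string as `uname -r`
    return kernel_version(release) >= minimum
```

Comparing tuples rather than strings keeps 4.10 correctly above 4.9.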
That is all for the Linux kernel side; we can now move on to the user-space tools with BCC.

Step 2: Building BCC

Once our kernel is set up, we install and use BCC (BPF Compiler Collection). BCC offers a Python API: you embed the C code that gets compiled into BPF bytecode and get the results directly from the Linux kernel… in Python!
You can get BCC from the latest git repository:
$ git clone http://github.com/iovisor/bcc
Simply follow the building instructions provided in the BCC repository.
We also install the tools iperf and netperf:
$ sudo apt-get install iperf netperf
To check that BCC built correctly, you can run the provided hello_world.py program:
$ sudo python /usr/share/bcc/examples/hello_world.py
          tpvmlp-1636  [000] d...  2633.342396: : Hello, World!
          tpvmlp-1636  [000] d...  2648.547213: : Hello, World!
You can also run trace_fields.py, which is only 4 lines longer:
$ sudo python /usr/share/bcc/examples/tracing/trace_fields.py
PID MESSAGE
1636 Hello, World!
1636 Hello, World!
3182 Hello, World!
3182 Hello, World!
1636 Hello, World!
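Both examples read the lines the kernel writes to its trace pipe. These lines all share the same shape: task-pid, [cpu], flags, timestamp, then the message, as in the hello_world.py output above. The hypothetical parser below illustrates that format in Python 3. It is a sketch under that assumption, not BCC's own trace_fields() implementation.

```python
import re

# task-pid [cpu] flags timestamp: symbol: message
TRACE_LINE = re.compile(
    r"^\s*(?P<task>.+?)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+(?P<flags>\S+)\s+"
    r"(?P<ts>\d+\.\d+):\s*[^:]*:\s?(?P<msg>.*)$"
)

def parse_trace_line(line):
    """Return (task, pid, cpu, flags, ts, msg), or None for lines that do not match."""
    match = TRACE_LINE.match(line)
    if match is None:
        return None
    return (match.group("task"), int(match.group("pid")), int(match.group("cpu")),
            match.group("flags"), float(match.group("ts")), match.group("msg"))
```

The lazy task group lets task names that contain a dash still split correctly before the PID.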
Working? Now let’s go to the next step, setting up the Splunk HTTP Event Collector!

Step 3: Splunk HTTP Event Collector

Recently, Splunk introduced the notion of an HTTP Event Collector, which allows us to craft any type of event to be ingested by Splunk. The event must be formatted in JSON and sent to the listening socket on the Splunk side.
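The Python 3 helpers below sketch that JSON format. They assume the envelope keys documented for HEC: event, plus optional metadata such as sourcetype, host and time, and they wrap any value into an event. Splunk's documentation also describes batching several event objects back to back in a single request body, which hec_batch illustrates. Both helper names are our own.

```python
import json

def hec_event(event, sourcetype=None, host=None, timestamp=None):
    """Wrap any JSON-serializable value in the HEC event envelope."""
    envelope = {"event": event}
    # Optional metadata; keys left out fall back to defaults on the Splunk side.
    if sourcetype is not None:
        envelope["sourcetype"] = sourcetype
    if host is not None:
        envelope["host"] = host
    if timestamp is not None:
        envelope["time"] = timestamp  # epoch seconds
    return json.dumps(envelope)

def hec_batch(events):
    """Concatenate several envelopes into one request body."""
    return "".join(hec_event(e) for e in events)
```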
I recommend you go and read “Set up and use HTTP Event Collector”, before continuing.
To create a new HEC service, go to Settings > Data inputs.
Now select HTTP Event Collector on the left side.
In the upper-right corner, click Global Settings.
This opens the Global Settings window. We click “Enabled” for All Tokens and deactivate SSL, since we want to avoid adding SSL handling code and keep this blog article simple (outside of such an experiment, deactivating it is obviously strongly discouraged!). We leave the port number at its default and click Save.
Back on the previous page, click “New Token” in the upper-right corner.
We name this token “bcc”, give it a brief description and click “Next”.
We leave the input settings at their defaults and click “Review”.
We can now click Submit.
Upon completion, the token is created successfully and its value is shown.
Copy the token value; you will need it in your Python code!
We test whether events can be sent using curl:
$ curl -k  http://localhost:8088/services/collector/event -H "Authorization: Splunk 652AE968-58E4-4304-A1FE-C4AB7A5CF327" -d '{"event": "hello world"}'
{"text":"Success","code":0}
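Since the rest of this post uses Python, here is a Python 3 equivalent of that curl test built on the standard library only. hec_request and send are hypothetical helper names. The URL and token are the example values from the curl command, and the URL uses plain HTTP because SSL is off.

```python
import json
import urllib.request

def hec_request(base_url, token, event):
    """Build the same POST as the curl command: JSON body plus the Splunk token header."""
    body = json.dumps({"event": event}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={"Authorization": "Splunk " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )

def send(request, timeout=5):
    """Send the request and return Splunk's decoded JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running Splunk, send(hec_request("http://localhost:8088", "<your token>", "hello world")) should return the same Success reply as curl, decoded into a dict.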
We can then check in Splunk that the event was received.

Step 4: BCC + HEC = \m/

We are going to modify an example provided by the BCC project, which simply lists the TCP connections initiated over IPv4:
$ wget https://raw.githubusercontent.com/iovisor/bcc/master/examples/tracing/tcpv4connect.py
We can test the tool by running it:
$ sudo python tcpv4connect.py
PID    COMM         SADDR            DADDR            DPORT
In another terminal, open an outgoing connection with wget:
$ wget google.com/index.html
Back in the terminal where we started the program:
$ sudo python tcpv4connect.py
PID    COMM         SADDR            DADDR            DPORT
4367   wget         172.16.99.163    216.58.194.73    80
4367   wget         172.16.99.163    74.125.21.105    80
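The SADDR and DADDR columns are decoded in Python from hex strings. The script calls inet_ntoa(int(saddr_hs, 16)) on them, which is presumably a helper defined in the script, since socket.inet_ntoa() expects bytes rather than an int. That call suggests the BPF program prints each address as a hexadecimal 32-bit value kept in network byte order on a little-endian (x86) machine. Under that assumption, the conversion can be sketched with the Python 3 standard library. hex_to_ipv4 is a hypothetical name.

```python
import socket
import struct

def hex_to_ipv4(hex_string):
    """Convert a hex-printed, network-byte-order IPv4 address to dotted-quad form.

    The little-endian CPU stores the first octet in the lowest byte of the
    32-bit value, so packing it back little-endian restores the wire order.
    """
    return socket.inet_ntoa(struct.pack("<I", int(hex_string, 16)))
```

For example, 172.16.99.163 is printed as a36310ac, and hex_to_ipv4("a36310ac") gives back "172.16.99.163".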
We can now send a Splunk event every time there is a new connection. This only requires a small change: the C part stays untouched and we only modify the Python part.
Copy tcpv4connect.py to tcp2splunk.py:
$ cp tcpv4connect.py tcp2splunk.py
Now edit tcp2splunk.py with your favorite editor (emacs!) and go to line 20 to import the httplib, os and json modules:
from bcc import BPF
import os
import httplib
import json
# define BPF program
Now go to line 92 and initialize everything before the while loop starts:
headers = {"Authorization": "Splunk 652AE968-58E4-4304-A1FE-C4AB7A5CF327", "Content-Type": "application/json"}
conn = httplib.HTTPConnection("172.16.99.1:8088")
# filter and format output
while 1:
Finally, inside the loop, we post the received data to Splunk. We add a pid check so that the connection this process itself opens to Splunk is not reported; otherwise we would end up in a nice infinite loop!
    # Ignore messages from other tracers
    if _tag != "trace_tcp4connect":
        continue
    # Do not report our own connection to Splunk, or we would loop forever
    if os.getpid() != pid:
        message = {"event": {"pid": pid, "task": task,
                             "saddr": inet_ntoa(int(saddr_hs, 16)),
                             "daddr": inet_ntoa(int(daddr_hs, 16)),
                             "dport": dport_s}}
        conn.request("POST", "/services/collector/event", json.dumps(message), headers)
        res = conn.getresponse()
        # Read the whole response, otherwise httplib cannot reuse the connection
        res.read()
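The two checks in this loop are easier to reason about, and to test, once pulled out of it. The Python 3 sketch below uses a hypothetical build_hec_message() function. It returns the JSON body to POST, or None when the line must be skipped. It expects the addresses already decoded, and own_pid defaults to os.getpid() but is a parameter so the logic can be tested.

```python
import json
import os

TRACE_TAG = "trace_tcp4connect"  # tag the loop above filters on

def build_hec_message(tag, pid, task, saddr, daddr, dport, own_pid=None):
    """Return the HEC JSON body for one connection, or None if it must be skipped."""
    if own_pid is None:
        own_pid = os.getpid()
    # Lines from other tracers sharing the trace pipe are ignored.
    if tag != TRACE_TAG:
        return None
    # Our own connection to Splunk is traced too: skipping it avoids a feedback loop.
    if pid == own_pid:
        return None
    return json.dumps({"event": {"pid": pid, "task": task, "saddr": saddr,
                                 "daddr": daddr, "dport": dport}})
```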
We can now see our wget connections in Splunk, as well as those of other Python processes.

Conclusion

As you have seen, the latest features of the Linux kernel let us send events captured on the kernel side to Splunk. The glue offered by BCC means we can simply write and prototype the code in Python. I hope you will find creative ways to use the new eBPF feature, and I would be more than happy to hear about the amazing things you are doing with it and Splunk!