git commit -a -m “Splunking Github Blog”

I <3 Github. Splunk <3’s Github (check out our repos here). I am told it is just a coincidence our HQ is opposite theirs.

One of the neat things about Github I am just starting to explore is their API. You can use it to do loads of things, from interrogating user activity to searching for keywords within code. I recently saw this analysis of the most popular programming languages hosted on Github and I was inspired to recreate it within Splunk.

Indexing Github data into Splunk makes it super-simple to start exploring it. In this post I want to show you some of my first experiments connecting Splunk to the Github API.

The Prep Work

First, download and install the Github Modular Input. This will enable us to make the API calls to Github.

Now you’ll need to grab a Github token. This avoids the tighter rate limiting imposed on unauthenticated requests to the API. To do this, log in to github.com and head to: settings > applications > generate new token

Store this somewhere safe.
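If you want to sanity-check the token outside Splunk, here is a rough Python sketch. It assumes the token sits in an environment variable I have called GITHUB_TOKEN (my name, nothing official), and the function name is made up for illustration. It only builds a request for Github's /rate_limit endpoint; it does not send it.

```python
import os
import urllib.request


def build_rate_limit_request(token=None):
    """Build a request for Github's /rate_limit endpoint.

    The Authorization header is only added when a token is supplied,
    so the same helper can compare authenticated and anonymous limits.
    """
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request("https://api.github.com/rate_limit", headers=headers)


# GITHUB_TOKEN is an assumed variable name; use whatever you stored the token under.
request = build_rate_limit_request(os.environ.get("GITHUB_TOKEN"))
```

Passing `request` to `urllib.request.urlopen` should return JSON whose `resources.core` entry reports the limit and remaining calls for your token.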

And that’s it :)

Add an Input

In the Splunk GUI head to: settings > data inputs > github commits > add new

In this example I am going to query a repo from our own Splunk org. I’ll use our JavaScript SDK repo.

owner: splunk
repository: splunk-sdk-javascript
token: <YOUR_TOKEN>

And that’s it :)
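I have not dug into the modular input's internals, so treat this as a guess at the kind of call it makes rather than its actual code. The sketch below builds a request against Github's commits endpoint for the owner and repository above, with per_page set to 100. The function name is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode


def build_commits_request(owner, repository, token, per_page=100):
    """Build a request for the commits of one repository.

    Mirrors the three input fields (owner, repository, token); per_page
    controls how many commits Github returns in a single page.
    """
    query = urlencode({"per_page": per_page})
    url = f"https://api.github.com/repos/{owner}/{repository}/commits?{query}"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {token}",
    }
    return urllib.request.Request(url, headers=headers)


# "<YOUR_TOKEN>" is a placeholder, exactly as in the input form above.
request = build_commits_request("splunk", "splunk-sdk-javascript", "<YOUR_TOKEN>")
```

Sending this with `urllib.request.urlopen(request)` should come back with a JSON array of commits, one page at a time.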

Start searching

If you use the example above, a basic search should return 100 results, as per the per_page value set in the call.

source="github_commits://github-commits"

Here are some other simple searches we can immediately run on this dataset:

The most active users in the repository:

source="github_commits://github-commits" | stats count(type) as count by author | sort - count

… or least:

source="github_commits://github-commits" | stats count(type) as count by author | sort count
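If the stats-then-sort pattern is new to you, here is roughly what those two searches do, sketched in Python. The sample events and author names are invented for illustration, and I am assuming each event carries a flat author field like the one used in the searches.

```python
from collections import Counter


def commits_by_author(events):
    """Count events per author, like `stats count by author`."""
    return Counter(event["author"] for event in events)


# Hypothetical events; real ones would come from the github_commits input.
sample_events = [
    {"author": "alice"},
    {"author": "bob"},
    {"author": "alice"},
]

counts = commits_by_author(sample_events)

# Descending by count, like `sort - count`.
most_active = counts.most_common()

# Ascending by count, like `sort count`.
least_active = sorted(counts.items(), key=lambda item: item[1])
```

The only difference between the two searches is the sort direction, which is why the counting step is shared here.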

Repository activity over time:

source="github_commits://github-commits" | timechart count(_raw) as activity

Repository activity over time by user:

source="github_commits://github-commits" | timechart count(_raw) as activity by author
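Under the hood, timechart is bucketing events by time and counting each bucket. Here is a minimal Python sketch of that idea. It assumes daily buckets (Splunk picks the span for you based on the time range) and ISO 8601 timestamps ending in Z; the timestamps in the usage line are made up.

```python
from collections import Counter
from datetime import datetime


def activity_per_day(timestamps):
    """Count events per calendar day, returned in chronological order."""
    days = Counter()
    for timestamp in timestamps:
        # Parse strings shaped like 2014-05-01T10:00:00Z and keep only the date.
        day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date()
        days[day.isoformat()] += 1
    # ISO dates sort correctly as plain strings.
    return dict(sorted(days.items()))


# Hypothetical commit times, purely for illustration.
activity = activity_per_day([
    "2014-05-02T09:00:00Z",
    "2014-05-01T10:00:00Z",
    "2014-05-01T23:59:59Z",
])
```

Splitting by author, as in the second search, would just mean keeping one of these counters per author.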

You get the idea.

Now you’ve got the basics nailed, go away and show me some cool stuff :)