High Performance syslogging for Splunk using syslog-ng – Part 1
Today I am going to discuss a subject that I consider critical to any successful Splunk deployment: what is the best method of capturing syslog events into Splunk? As you probably already know, there is no lack of articles about syslog on the Internet, which is fantastic because it enriches the knowledge of our community. This blog is broken into two parts. In part one, I will cover three scenarios for implementing syslog with Splunk. In part two, I will share my own experience running a large Splunk/syslog environment and what you can do to increase performance and ease management.
When given the choice between using a syslog agent (e.g., http://sflanders.net/2013/10/25/syslog-agents-windows/ ) and the UF (Universal Forwarder), the UF should always win. The UF and the indexer are designed from the ground up to work with each other. There are many advantages to using the Splunk Universal Forwarder (aka the Splunk agent) to push events to Splunk indexers. Here are a few:
- Ease of management.
- Better traffic throttling and buffering.
- Ability to drop events at the source (new in 6.x).
- In-transit encryption (SSL).
- Intelligent event distribution across the indexers.
- Automatic indexer discovery (in clustered environments).
Getting back to syslogging, I have observed three scenarios utilized by Splunk’s customers for capturing syslog events:
- Scenario 1: Using network inputs on the Indexer(s).
- Scenario 2: Running syslog & IDX on the same server.
- Scenario 3: Separate server(s) running syslog & HF/UF.
Scenario #1: Using network inputs on the Indexer(s)
As a Splunk ninja you already know that it is possible to configure inputs.conf to accept traffic on any TCP or UDP port (http://docs.splunk.com/Documentation/Splunk/6.3.3/Data/Monitornetworkports). While this mechanism is a workable solution, it is not ideal in high-volume environments. The indexer’s main job is to write ingested events to disk and to answer incoming queries from the Search Heads. So yes, you can enable network inputs, and yes, it will work, but if your indexer is not sized appropriately, it may not be able to keep up with the volume of incoming events. Here is a list of challenges with this approach:
- There is no application-level error checking, because the indexers and the clients are using a generic network connection. With TCP inputs you get error checking at the transport layer but not at the application layer. Unlike Universal Forwarders, network inputs do not have full awareness of the clients.
- In large implementations, restarting splunkd is slower than restarting the syslog daemon, so you risk longer periods of service interruption. This issue may not be a big deal in load-balanced environments.
- Indexers normally get restarted more frequently than syslog engines, which results in more frequent service interruptions.
- Setting source types on a network input is less efficient than setting a source type on a file input. In some cases it can also be complicated.
- If you use port numbers under 1024 (e.g., TCP/UDP 514) you will need elevated privileges, which means you may need to run splunkd as root. This goes against security best practices.
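To make the first challenge concrete, here is a minimal Python sketch (not Splunk code) of what a plain UDP syslog sender experiences. The helper names `send_syslog_udp` and `unused_udp_port` and the sample message are hypothetical and exist only for this illustration. The point is that the send succeeds even when nothing is listening, so the client has no way of knowing the event was lost.

```python
import socket

def send_syslog_udp(message: str, host: str, port: int) -> int:
    """Send one syslog-style line over UDP and return the bytes handed to the OS.

    Hypothetical helper for illustration. A successful return only means the
    datagram left the socket; it says nothing about whether anyone received it.
    """
    payload = message.encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))

def unused_udp_port() -> int:
    """Ask the OS for a free UDP port, then release it so nothing is listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]

if __name__ == "__main__":
    msg = "<134>Jan  1 00:00:00 host1 app: test event"  # made-up sample line
    sent = send_syslog_udp(msg, "127.0.0.1", unused_udp_port())
    print(f"sent {sent} bytes, with no acknowledgement of any kind")
```

A forwarder-to-indexer connection, by contrast, is aware of its peer, which is one reason the list above favors it over generic network inputs.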
Scenario #2: Running syslog & IDX on the same server
Next, a Splunk ninja may investigate running syslog alongside splunkd (on the same server). This solution is also not a good fit for high volume environments. Here is why:
- Syslog daemons and indexers are both I/O-intensive applications, and they will compete for valuable resources. Syslog will capture, filter and write to disk; then splunkd comes along and repeats a “similar” process consuming “similar” resources. Effectively you are doubling your hard drive reads/writes and using more CPU cycles with all the compression/decompression activity.
- Some Splunk ninjas may choose to limit where syslog is installed (maybe on one or two indexers), thinking that will reduce the negative impact on Splunk performance. However, all it takes to slow down your overall search speed is to drag down a single indexer in the indexing tier. As a rule of thumb, a search (especially one that depends on aggregation) will only be as fast as the slowest indexer. This architectural “flaw” is more prevalent than I would like to see.
- A side effect of scenario #2 is that the indexer running syslog will hold more data than its peers, which means it will need to work harder than its peers to answer queries for the syslog data it’s holding. Data imbalance has a negative impact on storage utilization (if using built-in HD) and on search performance.
- The performance of this design gets worse in virtual environments, where overprovisioning is common.
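The “slowest indexer” rule of thumb above can be sketched as a toy model in Python. The function name `search_time` and the response times are hypothetical numbers chosen for illustration, not measurements, and the model deliberately ignores everything except the wait for the last indexer to answer.

```python
def search_time(per_indexer_seconds):
    """Toy model: a distributed search completes when the slowest indexer answers."""
    return max(per_indexer_seconds)

# Hypothetical response times in seconds, one entry per indexer.
balanced = [4.0, 4.1, 3.9, 4.0]
one_shared_with_syslog = [4.0, 4.1, 3.9, 9.5]  # last indexer also runs syslog

print(search_time(balanced))
print(search_time(one_shared_with_syslog))
```

In this model the three healthy indexers do not help: the whole search takes as long as the one overloaded peer.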
Scenario #3: Separate server(s) running syslog & HF/UF
A better design is to run the syslog engine(s) on their own hardware and use Universal Forwarders (or Heavy Forwarders) to pick up the events and forward them to the indexing tier. In this configuration, syslog acts as a file-based queuing mechanism, which gives splunkd some “breathing room” to process events whenever it has the cycles to do so. Customers who have made the transition from scenario one or two to scenario three have noticed a significant improvement in search speed and fewer UDP/514 packet drops.
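The file-based queuing idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how syslog-ng or the Universal Forwarder is actually implemented: `append_event` stands in for the syslog daemon, `read_new_events` stands in for the forwarder, and both names, the file name and the byte-offset bookkeeping are hypothetical.

```python
import os
import tempfile

def append_event(spool_path, line):
    """Stand-in for the syslog daemon: append one event to the spool file."""
    with open(spool_path, "ab") as f:
        f.write(line.rstrip("\n").encode("utf-8") + b"\n")

def read_new_events(spool_path, offset):
    """Stand-in for the forwarder: return the events written since `offset`
    and the new offset to resume from on the next pass."""
    with open(spool_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        return data.decode("utf-8").splitlines(), f.tell()

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spool = os.path.join(tmp, "firewall.log")  # hypothetical file name
        append_event(spool, "event 1")
        append_event(spool, "event 2")
        first_batch, pos = read_new_events(spool, 0)
        append_event(spool, "event 3")  # arrives while the reader is busy
        second_batch, pos = read_new_events(spool, pos)
        print(first_batch, second_batch)
```

The writer never waits for the reader. Events that arrive while the reader is busy simply sit in the file until the next pass, which is the “breathing room” described above.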
In part two of this blog I will focus the discussion on syslog-ng because it is a tool I am very familiar with. The stock, generic syslogd SHOULD NOT be used. It’s old and lacks the flexibility and the speed of modern syslog engines like syslog-ng or rsyslog. I will cover some performance tuning tips and some management tips. The goal is to help you better manage and improve the capture of syslog events.