High Performance syslogging for Splunk using syslog-ng – Part 2

As I mentioned in part one of this blog, I managed a sizable deployment of Splunk/syslog servers (2.5TB/day). I had 8 syslog-ng engines in 3 geographically separate data centers: Hong Kong, London and St. Louis. Each group of syslog-ng servers was load balanced with F5 and sent traffic to its own regional indexers. Some of the syslog servers processed upward of 40,000 EPS during traffic bursts. The recommendations I am about to describe are what worked for me; your mileage may vary, of course. I tried to optimize the syslog-ng engines to get as much performance as possible out of them. If you feel, however, that this is overkill, or if you don’t have the manpower to go through the tuning process, it may be easier to just add hardware and use the default settings.

 

SYSLOG-NG MANAGING TIPS

Modular configurations:

With syslog-ng release 3.x a new feature was introduced that allows you to dynamically include configuration files in the body of the main syslog-ng.conf. This is similar to the C language “#include” or the Python “import”.

To use this feature, just add a line like this to syslog-ng.conf:

@include "/etc/syslog-ng/buckets.d"

This feature enables you to create a main syslog-ng.conf file and then move all source-related configuration to a directory (let’s call it buckets.d). By doing so you have effectively split your syslog-ng configuration into two parts: the static part, which contains the server-specific configuration (IP address, listening ports, socket tuning, etc.), and the dynamic part, which relates to the source devices (hostnames, permissions, locations, filter rules, etc.). The static part does not change from server to server; once configured, you probably don’t need to touch it again. The dynamic part (the buckets.d files) changes every time you add or remove a source host.

Sample buckets.d filter-file:

destination d_firewalls { file("/syslog/FIREWALLS/$SOURCEIP/$SOURCEIP.log"
          owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes));
};
# Parentheses make it explicit that the host exclusion applies to every match() in the group
filter f_firewalls { (   match("%ASA-"  value("MSG"))
                      or match("%ASA-"  value("MSGHDR"))
                      or match("%FWSM-" value("MSG"))
                      or match("%FWSM-" value("MSGHDR"))
                      or match("%PIX"   value("MSG"))
                      or match("%PIX-"  value("MSGHDR")) )
                     and not netmask("10.96.50.13/32");
};
log { source(s_network); filter(f_firewalls); destination(d_firewalls); };

 

To make your syslog-ng configuration modular, create as many filter-files as you want. Each filter-file should cover one group of sources. Then periodically sync the “buckets.d” directory across all of your syslog-ng servers.

My source devices (the dynamic part) are not the same across all these data centers. So why am I syncing filter-files I wouldn’t use, you ask? Good question. The answer is ease of administration. By synchronizing buckets.d you don’t need to worry about which source lives where. My intention was to create a universal set of filter-files that would work in any data center. In this case, simplicity of management outweighed the clutter. From that point on, every time you restart syslog-ng, the entire contents of buckets.d along with the main syslog-ng.conf will appear to the syslog-ng daemon as one single configuration file.

 

Keywords naming convention:

As with Hungarian notation https://en.wikipedia.org/wiki/Hungarian_notation , I strongly recommend using the following naming convention to make your configuration easy to read and follow:

d_      for destinations
f_      for filters
s_      for sources

destination     d_damballa {file ("/syslog/DAMBALLA/$SOURCEIP/$SOURCEIP.log" ); };
 
filter  f_damballa { netmask ("10.63.1.1/32") ;}; 

log {source(s_network); filter(f_damballa); destination(d_damballa); };

 

Turn on statistical gathering:

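Turn on statistics gathering in syslog-ng to gain visibility into the engine’s operation. You will be able to see how many events are being collected per source. This information is critical for capacity planning and performance tuning. syslog-ng emits its periodic “Log statistics;” message through the internal() source (in 3.x the interval is set with the stats_freq() global option), so the source used in the log path below (s_local) must include internal().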

destination d_logstats { file("/home/syslog-ng/logstats/logstats.log"
          owner(syslog-ng) group(splunk) perm(0644) dir_perm(0750) create_dirs(yes));};

filter f_logstats { match("Log statistics;" value ("MSGHDR"))
                 and match("d_windows" value ("MSGHDR")); };

log { source(s_local ); filter (f_logstats); destination(d_logstats); };
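
Once the statistics land in logstats.log, you still need to pull the counters out of each line. Here is a minimal sketch of one way to do that in Python. It assumes the 3.x statistics line is a comma-separated list of entries shaped like processed='destination(d_windows)=1234'; the sample line is made up for illustration, and parse_log_statistics is a hypothetical helper name, so check the pattern against your own output before relying on it.

```python
import re

# Matches one entry such as: processed='destination(d_windows)=1234'
STAT_ENTRY = re.compile(r"(\w+)='([^']+?)=(\d+)'")

def parse_log_statistics(line):
    """Return {(counter_type, object_name): value} for one 'Log statistics;' line."""
    if "Log statistics;" not in line:
        return {}
    return {(kind, name): int(value)
            for kind, name, value in STAT_ENTRY.findall(line)}

# Illustrative line only; real output depends on your syslog-ng version and config.
sample = ("Feb 29 10:00:00 syslog01 syslog-ng[1234]: Log statistics; "
          "processed='destination(d_windows)=48211', "
          "dropped='destination(d_windows)=0', "
          "processed='source(s_network)=50390'")

stats = parse_log_statistics(sample)
print(stats[("processed", "destination(d_windows)")])  # prints 48211
```

Comparing the processed counter of the same object between two consecutive statistics lines, divided by the interval between them, gives you a rough EPS figure per destination.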

 

Watch for file permissions:

Make sure the syslog-ng process can READ the buckets.d directory and can READ/WRITE the log directories. Make sure the splunkd daemon has full READ access to the log files (and their parent directories).

file("/syslog/MSSQL/$SOURCEIP/$SOURCEIP.log"
     owner(syslog-ng) group(splunk) perm(0755) dir_perm(0755) create_dirs(yes));

 

Watch for UDP packet drops on the syslog server:

Many new syslog-ng admins don’t pay attention to this item. They simply assume things will work. For the most part that is true, but in a high-volume environment UDP traffic drops are unavoidable. You will start to hear some users complaining about “missing events”, and they will probably blame Splunk for it. So do yourself a favor and monitor UDP packet drops on the interface. Use whatever tool you are comfortable with. And yes, there is a Splunk app for that: https://splunkbase.splunk.com/app/2975/#/overview
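
If you want a quick check without installing anything, the Linux kernel exposes its UDP counters in /proc/net/snmp. The following is a minimal sketch, assuming a Linux host; parse_udp_counters and read_udp_counters are hypothetical helper names, and the sample text is invented to show the layout (a header line of counter names followed by a line of values). RcvbufErrors is the counter that grows when datagrams are discarded because the socket receive buffer was full.

```python
def parse_udp_counters(snmp_text):
    """Parse the two 'Udp:' lines of /proc/net/snmp into a {name: int} dict."""
    # "UdpLite:" lines are skipped because they do not start with "Udp:".
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        return {}
    names, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(names, values)}

def read_udp_counters(path="/proc/net/snmp"):
    """Read the live counters (Linux only)."""
    with open(path) as fh:
        return parse_udp_counters(fh.read())

# Invented sample in the /proc/net/snmp layout, for illustration only.
sample = ("Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
          "Udp: 9182734 120 5521 4410 5498 0\n")

counters = parse_udp_counters(sample)
print(counters["RcvbufErrors"])  # prints 5498
```

Run read_udp_counters() on a schedule and alert on the difference between two readings; the counters are cumulative since boot, so the absolute value alone says little.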

 

 

SYSLOG-NG TUNING TIPS

Syslog-ng has several tuning parameters for achieving higher “ingestion” or “capture” rates. Please be aware that if you use large values, some of these settings may also require kernel adjustments (/etc/sysctl.conf). For full details on all available syslog-ng options, please consult https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/index.html?_ga=1.115117060.204896635.1456724547

Here are some configuration options that you can use:

Set the receive buffer size:

You can control the size of the receive buffer using so_rcvbuf(). Incoming events are queued in memory before they are written to disk. While a large buffer improves the capture rate, it can also have the undesirable side effect of timestamp skewing: syslog-ng timestamps events when it processes them, not when the network card receives them, so an event that waits in a deep buffer gets a late timestamp. In my environment I managed to get over 5 minutes of timestamp skew just by creating too large a buffer. If your events carry multiple timestamps (one added by syslog-ng and one added by the source device), you can probably instruct Splunk to use the second one. Again, use with caution!

udp ( ip(10.16.128.93) port(2514) so_rcvbuf (805306368) so_sndbuf(8096) time_zone(GMT) keep_timestamp(no) );
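
A large so_rcvbuf() value only helps if the kernel actually grants it. On Linux the request is silently capped at net.core.rmem_max, which is one of the /etc/sysctl.conf adjustments mentioned earlier. The sketch below is a rough way to see what a socket really gets; granted_rcvbuf is a hypothetical helper, and it opens its own throwaway UDP socket rather than inspecting the one syslog-ng owns.

```python
import socket

def granted_rcvbuf(requested_bytes):
    """Ask the kernel for a UDP receive buffer and return the size it actually granted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        except OSError:
            # Some platforms reject an oversized request instead of capping it.
            pass
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        sock.close()

requested = 805306368  # the so_rcvbuf() value used in the udp() example
granted = granted_rcvbuf(requested)
print(f"requested {requested} bytes, kernel granted {granted} bytes")
```

Linux reports twice the value it stored, to account for bookkeeping overhead, so a granted figure that is far below the request usually means net.core.rmem_max needs raising.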

 

Use multiple sockets:

The term “network socket” refers to the combination of a port number and an IP address. One way of enhancing syslog performance is to split your incoming log traffic across multiple sockets (or channels). For example, in my environment I configured syslog-ng to listen on UDP/2514 for the firewalls, on TCP/2515 for VMware logs, and so on. You can also utilize multiple IPs if you have them. The idea here is to distribute the load among multiple channels. However, before you rush into opening multiple sockets, make sure you have exhausted the existing one(s). There is no need to complicate your design just because you can. Simple is always elegant!

 

Allow TCP logging:

Many network devices can only be configured to use UDP/514 for logging, but try to enable TCP logging whenever possible. The advantage is the reliability of the transport protocol: TCP retransmits lost segments, whereas UDP drops them silently.

tcp ( ip(10.16.128.93) port(2514) ) ;

 

Set the maximum number of connections the socket can handle:

The objective here is to prevent a single source that has “gone wild” from overwhelming the channel. Start with a large number, then tune it down based on your environment’s “normal” activity.

tcp ( ip(10.16.128.93) port(514) max-connections(5000) ) ;

 

Turn off DNS name resolution:

Unless you really need it, I recommend filtering by IP address and not attempting to look up DNS hostnames. However, if you must have name resolution, then look into running cache-only DNS servers: http://www.tecmint.com/install-caching-only-dns-server-in-centos/ . Please remember that syslog-ng can do DNS caching of its own, so again, do not rush into enabling DNS caching in the OS or in syslog-ng unless you really need it. In my experience, enabling DNS caching in syslog-ng.conf is sufficient to reduce the DNS traffic leaving the server.

use_dns(no);
dns_cache(no);

 

Explicitly define logging templates:

The advantage is that you control how the log message is formatted in case you need to forward the events to other syslog servers or third-party tools. The syslog-ng engine, much like Splunk, can act as a log router (aka a syslog hub). There are a few things you need to watch when you configure syslog-ng as a hub, in particular chain_hostnames() and keep_hostname().

template t_default { template("${DATE} ${HOST} ${MSGHDR}${MSG}\n"); };

 

Fast Filtering:

Filtering by IP address is faster than filtering by keywords in the message body (MSG) or header (MSGHDR). If you’re trying to achieve a higher capture capacity, you should look into this option.

Having said this, there might still be a need for keyword filtering. In my case I had an environment with 500+ firewalls. It was very easy to identify all possible unique Cisco ASA keywords found in ASA logs. The alternative would have been listing every single IP in the configuration file. I opted for ease of management this round. Additionally, avoid using regex in syslog-ng; Splunk is better suited for this task.
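
If you do go the IP route for a large group, you don’t have to type the list by hand. Below is a minimal sketch of generating a buckets.d filter-file from a list of addresses, using the d_/f_/s_ naming convention from earlier. The render_bucket helper and its defaults are hypothetical; the /syslog root and the s_network source are simply borrowed from the samples in this post.

```python
import ipaddress

def render_bucket(name, hosts, log_root="/syslog", source="s_network"):
    """Render a destination/filter/log trio for one group of IPv4 sources."""
    # Validate every entry so a typo fails here rather than at syslog-ng reload.
    networks = [ipaddress.IPv4Network(host, strict=False) for host in hosts]
    if not networks:
        raise ValueError("at least one host is required")
    clauses = "\n    or ".join(f'netmask("{net.with_prefixlen}")' for net in networks)
    return (
        f'destination d_{name} {{ file("{log_root}/{name.upper()}/$SOURCEIP/$SOURCEIP.log"); }};\n'
        f"filter f_{name} {{\n       {clauses}\n}};\n"
        f"log {{ source({source}); filter(f_{name}); destination(d_{name}); }};\n"
    )

# Two example addresses; a bare IP becomes a /32 netmask() entry.
print(render_bucket("damballa", ["10.63.1.1", "10.63.1.2"]))
```

Feeding this from whatever inventory you already keep (a CMDB export, a text file per device group) keeps the filter-files reproducible, which fits well with syncing buckets.d across servers.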

  

TESTING YOUR SYSLOG-NG INSTANCE

There are many ways to test your configuration. The best way is to use a traffic generator like IXIA, since you can really push massive amounts of traffic. My next favorite tool is loggen by Balabit (which is part of the syslog-ng distribution). With this tool you can stress test your syslog-ng install by specifying the rate of syslog messages. You can also test using either TCP or UDP. For a more realistic simulation, you can use a sample input file (e.g. a Cisco ASA log). Using a real-world sample file is also very useful for testing your filtering rules.

https://www.balabit.com/sites/default/files/documents/syslog-ng-ose-latest-guides/en/syslog-ng-ose-guide-admin/html/loggen.1.html

From the Balabit documentation: When loggen finishes sending the messages, it displays the following statistics:

  • average rate: Average rate the messages were sent in messages/second.
  • count: The total number of messages sent.
  • time: The time required to send the messages in seconds.
  • average message size: The average size of the sent messages in bytes.
  • bandwidth: The average bandwidth used for sending the messages in kilobytes/second.

 

Example loggen commands:

You can send data to your syslog server from an input file to make the test data more realistic:

loggen --read-file cisco_asa.log 10.0.0.1 514

 

The following command generates 1000 messages per second for ten minutes, and sends them to port TCP/514 on host 10.0.0.1 . Each message is 500 bytes long.

loggen --size 500 --rate 1000 --interval 600 10.0.0.1 514
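
When loggen is not available on the machine you are testing from, a few lines of Python can push a trickle of test events at a listener. This is only a rough sketch for smoke-testing filters, not a load generator and not a replacement for loggen. The helper names are hypothetical, the message layout follows the traditional BSD-syslog style, and the target of 127.0.0.1 port 2514 is an assumption you should change to your own listener.

```python
import socket
import time

def format_message(host, program, text, facility=16, severity=6):
    """Build a BSD-syslog style line; PRI = facility * 8 + severity (local0.info = 134)."""
    pri = facility * 8 + severity
    now = time.localtime()
    # Day of month is space-padded, as in traditional syslog timestamps.
    stamp = f"{time.strftime('%b', now)} {now.tm_mday:2d} {time.strftime('%H:%M:%S', now)}"
    return f"<{pri}>{stamp} {host} {program}: {text}"

def send_udp(lines, target, port, rate):
    """Send each line as one UDP datagram at roughly `rate` messages per second."""
    delay = 1.0 / rate
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("utf-8"), (target, port))
            sent += 1
            time.sleep(delay)
    return sent

# Ten test events at about 100 messages/second to an assumed local listener.
messages = [format_message("testhost", "loadtest", f"test event {i}") for i in range(10)]
print(send_udp(messages, "127.0.0.1", 2514, rate=100), "messages sent")
```

Because the sleep-based pacing is approximate, treat the rate as a ceiling; for real throughput numbers stay with loggen or a hardware traffic generator.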

 

 

In conclusion, as you can see, syslog-ng is a very flexible and well-designed open source tool. It can be a critical part of your Splunk deployment. You need to decide how far you want to take it. Weigh all your options, as every environment is different. Seek simplicity as much as possible, but don’t shy away from being on the bleeding edge if it makes sense. And finally, don’t assume anything: test your configuration and monitor your deployment. As always, I welcome your comments and feedback!

 

Back to part 1:  http://blogs.splunk.com/2016/05/05/high-performance-syslogging-for-splunk-using-syslog-ng-part-1/