IT Search - A New Approach to Payment Card Industry (PCI) Compliance

Posted:  May 8th, 2008
Tags:  Log Analysis, Splunk

The Payment Card Industry Data Security Standard, PCI DSS for short, was developed by the credit card industry to address data theft. The standard consists of twelve security requirements. It covers everything from traffic policies to requirements around anti-virus software.

If you are a company that processes more than 20,000 transactions per year, you will have to implement the twelve requirements. If you process fewer, you will get away with a quarterly vulnerability scan.

IT search, Splunk, can directly address some of the areas and indirectly address most of the others. Specifically the areas where IT search assists are the following:

  • Log management (PCI requirement 10)
  • Secure & Central Log Collection (PCI requirement 10.5)
  • Audit Trail Retention (PCI requirement 10.7)
  • Daily Log Review (PCI requirement 10.6)
  • Secure Remote Access (PCI requirement 7.1)
  • File Integrity Monitoring (PCI requirements 10.2.2, 11.5 and 10.5.5)
  • PCI Control Reporting*

The Splunk for PCI application can be downloaded from SplunkBase. It provides a set of 91 searches and 57 reports, a dashboard, and a set of alerts that can be used to monitor the control objectives. The application uses Splunk’s IT search capabilities to address PCI. IT search is well suited to PCI compliance because it lets you:

  • satisfy ad-hoc requests from auditors
  • do large-scale reporting and investigations
  • automate control objective monitoring
  • add new control objectives and policies that require flexible monitoring and correlation capabilities
  • support ever changing data sources
  • re-use already collected data
  • incorporate file monitoring (not just traditional one-line log messages)

The Splunk for PCI application also lets you implement compensating controls for some of the PCI requirements. Also make sure to check out the daily log review process, which makes it much easier to tackle requirement 10.6.

Splunk is serious about PCI compliance: We are now part of the PCI Council. This ensures that we know about upcoming changes to the PCI standard ahead of time and that we can help influence its future direction.

Splunk Fights Phishing

Posted:  April 16th, 2008
Tags:  Log Analysis, Splunk

This morning, the New York Times reported yet another case of phishing. This phishing incident, Larger Prey Are Targets of Phishing, is interesting because of the victim demographics: executives of large companies. As I just learned, this is also referred to as whaling. We have all seen phishing emails that try to lure us into logging into our PayPal account. But an email from the United States District Court in San Diego that has a very authentic look is a different story. Would you fall for it?

The best way to address phishing is to educate users to make sure they don’t give out personal information. Have a look at the AntiPhishing Working Group’s phishing checklist that contains a lot of specific tips to prevent successful phishing attacks.

Splunk can address a couple of use-cases surrounding phishing attacks:

  • Detecting, after the fact, whether someone in your company fell victim to the scam (phishing).
  • Protecting your company from being phished. (In today’s story, the United States District Court in San Diego)

Detecting Phishing Victims

Once you know about a phishing attack, you can use Splunk to figure out whether anyone in your company has fallen victim. There are a few ways to do so, depending on the attack vector:

  1. The phish infects the victim and installs a trojan that starts leaking information.
  2. The phish uses a Web site to collect victims’ personal information (such as credit cards)

Both of these infections will start communicating with the outside. In the case of the phish reported today, the computers started communicating with machines in Singapore. By analyzing traffic patterns and determining where in the world connections are going, this kind of infection can be detected fairly easily. Splunk’s reporting is a great way to quickly generate traffic reports and isolate traffic patterns based on the geographic locations of the communicating machines. If, for example, your normal access pattern looks like the first graph and, after some time, you get the result shown in the second graph, where China suddenly shows up in second position, there might be something wrong.

Normal traffic patterns hitting Web site:

normal_web.png

Suspicious traffic pattern hitting Web site. Note China in second position:

picture-6.png
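
The comparison between the two graphs can also be automated. The following is only a minimal Python sketch, not how Splunk implements its reports: it assumes each connection has already been tagged with a country code by some geo-IP lookup (the records, addresses, and function names here are made up for illustration) and reports countries that appear in the current top list but were absent from the baseline.

```python
from collections import Counter

def top_countries(records, n=5):
    """Return the n most frequent country codes, most frequent first."""
    counts = Counter(country for _ip, country in records)
    return [country for country, _count in counts.most_common(n)]

def new_in_top(baseline, current, n=5):
    """Countries in the current top-n that were not in the baseline top-n."""
    before = set(top_countries(baseline, n))
    return [c for c in top_countries(current, n) if c not in before]

# Hypothetical (ip, country) records; a real setup would derive the
# country from a geo-IP database, which is not shown here.
baseline = ([("10.0.0.1", "US")] * 50
            + [("10.0.0.2", "DE")] * 20
            + [("10.0.0.3", "GB")] * 10)
current = ([("10.0.0.1", "US")] * 50
           + [("10.0.0.9", "CN")] * 30
           + [("10.0.0.2", "DE")] * 20)

print(new_in_top(baseline, current, n=3))  # ['CN']
```

A newcomer in the top list is only a hint, not proof of an infection, so the result is a starting point for an investigation.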

Protecting Your Company From Being Phished

If you are operating a Web site, you should try to make sure that there is nobody trying to phish it. There are a couple of ways that IT Search can help you with this:

  • Monitor your Web server logs for incomplete session requests. A lot of phishers request images from your site, but not the original page itself (the HTML page).
  • Monitor Web server logs for sessions that directly send a login, without ever requesting the login page itself. This happens when the victim has logged into the phishing site and the credentials are passed on to the real site, making everything look normal to the victim.
  • Check DNS lookups and see whether you get a lot of lookups from one single machine. This is tricky and you need to know the baseline of lookups, but spikes might turn out interesting to investigate.

Here is a search in Splunk that you can use to determine whether someone posted credentials without ever requesting the login page:

sourcetype=access_combined (login_form.php OR sales.php) | stats count by clientip | search count=1

This assumes you have a page, sales.php, which you can only access once you have logged in via login_form.php. For more complicated Web site architectures, you will have to build a more sophisticated search that uses transactions, but more on that another time.
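
To make the idea concrete outside of Splunk, here is a rough Python sketch of the same check. It is not a translation of the search above: instead of counting hits per client, it explicitly looks for clients that requested sales.php without ever requesting login_form.php. The log lines, the regular expression, and the function name are assumptions for illustration, and the parsing only covers the common/combined access log layout.

```python
import re
from collections import defaultdict

# Matches the start of a common/combined access log line:
# clientip ident user [timestamp] "METHOD path ...
LINE_RE = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)

def clients_skipping_login_form(lines, form="/login_form.php", target="/sales.php"):
    """Return client IPs that requested `target` but never requested `form`."""
    pages = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            path = match.group("path").split("?", 1)[0]  # drop the query string
            pages[match.group("clientip")].add(path)
    return sorted(ip for ip, seen in pages.items()
                  if target in seen and form not in seen)

# Made-up sample lines using documentation address ranges.
sample = [
    '192.0.2.10 - - [16/Apr/2008:09:00:01 -0700] "GET /login_form.php HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [16/Apr/2008:09:00:09 -0700] "POST /sales.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [16/Apr/2008:09:01:30 -0700] "POST /sales.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(clients_skipping_login_form(sample))  # ['198.51.100.7']
```

Clients behind a shared proxy or with a cached login page can show up here as well, so treat the output as candidates for a closer look.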

All the Data That’s Fit to Visualize - SOURCE Boston 2008

Posted:  March 27th, 2008
Tags:  Log Analysis, Splunk, Visualization

I gave a talk at SOURCE Boston 2008. The topic this time was general visualization and what has gone wrong in security visualization in the past. I showed how we can learn and steal from other disciplines, in this case, the New York Times. The NYT has done some pretty fantastic work in the area of data visualization. Their interactive market map, for example, is a great way of exploring stock data. During the talk, I outlined some of the design principles that the NYT graphics department uses when designing their graphs: Show - Don’t Tell.


To start my presentation, I showed a little video about security visualization (see below).

At conferences lately, I find that I am no longer the only one talking about security visualization. More and more presentations are showing visualizations. A lot of projects are using visualization to help them analyze all the data at hand. At SOURCE, Dave Dittrich from the University of Washington talked about BotNet analysis and visualizing network traffic captured from BotNets. He definitely has the challenge of displaying large amounts of data. We discussed some approaches, and parallel coordinates could possibly work for his data. Parallel coordinates are what I used in my book for some BotNet traffic analysis.

Common Event Syntax

Posted:  March 6th, 2008
Tags:  Log Analysis, Splunk

As part of the common event expression (CEE) effort, a list of field names has been published.

If log records from different log sources have to be correlated or reports have to be generated across different log sources, a common set of field names is needed. Take a firewall log example. Assume that you have two types of firewalls in your environment: NetScreen and PIX. The two devices write different types of log entries. Assume you have a parser for each that extracts fields from the logs. Each parser might name the fields differently, making it either impossible or really hard to correlate the two log files. Just think about reporting. How do you find the top source addresses across both logs? These are logs from each of the firewalls:

NetScreen:

May  5 17:01:40 45.2.0.1 NOC-FWa: NetScreen device_id=NOC-FWa [Root]
system-notification-00257(traffic): start_time="2006-05-05 17:01:40"
duration=0 policy_id=52 service=tcp/port:26212 proto=6 src zone=backbone
dst zone=noc-mgt action=Deny sent=0 rcvd=0 src=222.81.119.59 dst=45.2.121.102
src_port=7000 dst_port=26212

PIX:

Jan 18 12:43:50 192.168.1.1 %PIX-6-106015: Deny TCP (no connection)
from 208.58.193.69/1062 to a.b.c.d/443 flags ACK

If you report on “src”, you won’t get the “from” from the PIX log. We need unified names.
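
As a rough illustration of what unified names buy you, the following Python sketch maps both record types onto one set of field names and then counts source addresses across them. The common names used here (src, src_port, dst, dst_port, action) are simply borrowed from the NetScreen record for the example and are not taken from the CEE field list; the regular expressions and function names are likewise assumptions, and the NetScreen sample is a shortened, whitespace-separated version of the record above.

```python
import re
from collections import Counter

COMMON_FIELDS = ("src", "src_port", "dst", "dst_port", "action")

# key=value pairs, with optional double quotes around the value
KV_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')

# PIX deny message: "<action> <proto> ... from ip/port to ip/port"
PIX_RE = re.compile(
    r'%PIX-\d-\d+: (?P<action>\w+) (?P<proto>\w+) .*?'
    r'from (?P<src>\S+)/(?P<src_port>\d+) to (?P<dst>\S+)/(?P<dst_port>\d+)'
)

def normalize_netscreen(line):
    """Pick the common fields out of a NetScreen key=value record."""
    fields = {key: value.strip('"') for key, value in KV_RE.findall(line)}
    return {name: fields.get(name) for name in COMMON_FIELDS}

def normalize_pix(line):
    """Map the positional PIX message onto the common field names."""
    match = PIX_RE.search(line)
    return {name: match.group(name) for name in COMMON_FIELDS} if match else {}

netscreen = ('NetScreen device_id=NOC-FWa action=Deny src=222.81.119.59 '
             'dst=45.2.121.102 src_port=7000 dst_port=26212')
pix = ('Jan 18 12:43:50 192.168.1.1 %PIX-6-106015: Deny TCP (no connection) '
       'from 208.58.193.69/1062 to a.b.c.d/443 flags ACK')

events = [normalize_netscreen(netscreen), normalize_pix(pix)]
top_sources = Counter(event["src"] for event in events if event.get("src"))
print(top_sources.most_common())
```

Once both records expose the same "src" key, the "top source addresses" report becomes a single count over all events, regardless of which firewall wrote them.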

It is important to have not just a common set of names, but also a common understanding of what individual fields mean. What are the semantics of a field? For example, how do you measure a duration? In seconds? Hours? Days? What is a destination host? Is it fully qualified or just the host name itself? The field list, which can be found in this post: CEE Fields List, is a first step towards standardizing this.

Note that, for example, ArcSight’s CEF publishes a dictionary along with their log syntax. The CEE field list can be used to standardize the names across various log formats and can hopefully substitute and expand ArcSight’s dictionary.

Common Event Expression (CEE) - Email Archives

Posted:  February 7th, 2008
Tags:  Log Analysis, Splunk

The common event expression (CEE) effort is moving along. If you haven’t seen much coming out of CEE, it is not because we are not working on it. We have been busy defining and hashing out various aspects of the CEE standard. I am getting ready to release a list of fields for the syntax part of CEE. The taxonomy is moving along as well and I am compiling the final pieces to release for discussion.

If you are interested in the public discussions around CEE, the mailing list archives are now online.

Applied Security Visualization

Posted:  January 25th, 2008
Tags:  Log Analysis, Splunk, Visualization

For the past year I have been working on a book about visualization. It will be called “Applied Security Visualization”. The book is going to talk about all the aspects of visualizing security data, from important data sources and graphs to use-cases and open source tools for visualization. The main use-cases I write about revolve around Perimeter Threat, Compliance, and Insider Threat.

Last year during RSA, Addison-Wesley (my publisher) recorded some videos, where I talk about the book and some of its contents. Here are the links to the videocasts:

At this point, I have one more chapter to write before the book is done. A rough-cut version should be available by RSA this year and the book should be out by BlackHat (August). Keep your fingers crossed!

IT Search vs. SIEM - Data Collection - Feedback

Posted:  January 8th, 2008
Tags:  Log Analysis, Splunk

Steve posted a commentary to my blog post about IT Search vs. SIEM - Data Collection. I want to address some of his comments here, showing that IT search is more than a lot of people think!

  • Steve writes: “Raffy mentions a small change in the syslog format causing the connector to break. Well syslog is a standard so if it would not break any standard syslog receiver, what it actually meant is that the syslog message has not changed but the content had.” - If I say that the “syslog format” changed, I mean the syslog message, the text. And yes, a changed message will break the specific syslog parser/connector. If you write a parser for, let’s say, sendmail, you are able to extract all the fields that sendmail logs. If I change the sendmail message, your parser won’t work anymore. Hence, your connector will break and not parse the message. In the worst case, it will not even collect that message at all.
  • “Log Management vendors provide “knowledge” about the logs beyond simple collection.” - Agreed. The parsers or field extractions are definitely knowledge. Same with IT search. There are field extractions (see for example splunkbase.com) that you can use to extract individual fields to report on them. It’s about the way you approach data collection. If you need a parser to start with, you won’t be able to collect the data that you don’t have a connector for. That was my whole point. Nothing else. There are other differences between search and SIEM, but that’s a topic for a future blog entry (which is overdue, I know).
  • “What Log Management vendors do is to help you (as the user) out with the knowledge – rules that categorize important event logs from unimportant ones, alerts, reports that are configured to look for key words in the different log streams.” - Yes. I was not talking about reports, searches, dashboards, etc. in my blog post at all. However, IT search is no different. It has reports, searches, alerts, tags, classifications, etc.
  • “In IT Search, there is no possibility for anything to get out of date mainly because there is no knowledge, only the ability to search the log in its native format.” - Not true at all. The question is where you impose the log format. If it’s at collection time, you run into all the problems that I talked about in my previous post. If you impose the schema at search time, as IT search does, you get pretty much the same benefits, plus a few more (dynamic schemas, multiple name-spaces, etc.). And yes, this information is prone to getting out of date, hence the dynamic approach!
  • “Finally, if a Log Management vendor is storing the original log and you can search on it, your Log Management application gives you all the capability of IT Search.” - Well, sure. But would you say that searching your documents with grep is better than using a search engine like Google? I guess not. Same with IT search, which is built for quickly and efficiently searching logs, versus storing log files and grepping through them. The search language that you can use is another factor. You are not limited to simple searches; you can run all kinds of operations on the data - statistics, conversions, comparisons, etc. You are not comparing apples with apples.
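
The difference between imposing the schema at collection time and at search time can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Splunk’s implementation: the raw events are kept untouched, and field extractions are plain regular expressions that get applied when a search runs. Both sendmail-style lines below are invented, the second one standing in for a “changed” message format.

```python
import re

# Raw events are stored exactly as they arrived; nothing is parsed up front.
RAW_EVENTS = [
    "Jan  8 10:00:01 mail sendmail[123]: k08A01: from=<alice@example.com>, size=2048",
    "Jan  8 10:00:05 mail sendmail[124]: k08A05: sender=<bob@example.com>, bytes=512",
]

# Search-time extractions: each field can have several patterns, so a new
# message variant only needs one more pattern, not a re-collection of data.
EXTRACTIONS = {
    "sender": [re.compile(r"from=<([^>]+)>"), re.compile(r"sender=<([^>]+)>")],
}

def search(events, keyword, extractions):
    """Yield (raw, fields) for events containing keyword, extracting fields lazily."""
    for raw in events:
        if keyword not in raw:
            continue
        fields = {}
        for name, patterns in extractions.items():
            for pattern in patterns:
                match = pattern.search(raw)
                if match:
                    fields[name] = match.group(1)
                    break
        yield raw, fields

for raw, fields in search(RAW_EVENTS, "sendmail", EXTRACTIONS):
    print(fields.get("sender"), "<-", raw)
```

If the second pattern were missing, the second event would still be collected and searchable by keyword; only the "sender" field would be empty until the extraction is updated.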

Note also that Steve only addressed a subset of my issues. I hope you realize that IT search is more than just searching your log files!

Common Event Expression

Posted:  December 7th, 2007
Tags:  Log Analysis, Splunk

Common Event Expression (CEE) standardizes the way computer events are described, logged, and exchanged. It is an effort hosted by Mitre, like so many other computer security standards such as CVE and OVAL. The CEE effort is subdivided into four sub-efforts. Each of them will publish its own set of requirements to guarantee seamless future interoperability of devices and applications:

  • Event Syntax
  • Event Taxonomy
  • Event Transport
  • Event Logging Recommendations

The order in which I listed these efforts is most likely the order in which CEE is going to address and standardize them. There is a real need to standardize all of these items if we want companies (mainly vendors) to focus on building meaningful and interesting analysis capabilities, instead of spending all their time on normalizing log files, building connectors, and trying to interpret the meaning of log messages.

I am posting this in light of the official launch of the CEE Web site!

Common Event Format - Add-on

Posted:  December 6th, 2007
Tags:  Log Analysis

The common event format (CEF) is a standard for the interoperability of event- or log-generating devices and applications. The standard defines a syntax for log records. It consists of a standard prefix and a variable extension that is formatted as key-value pairs. The standards document is unfortunately only available if you register on the Web site. I wish ArcSight would post a link to the standards document, instead of making you register to download it. If you want more detailed information about CEF, check out an older post that I wrote when I was still working on CEF.

I just wrote a CEF add-on for Splunk. It defines field extractions for CEF formatted messages. Just install the add-on, set your source type to cef and you will be able to use the extracted fields from your CEF messages. Note that because CEF has an extension that is all key-value pairs, I did not have to write any special extractions for that part. I only had to implement extractions for the prefix. Very slick!
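
To show why the extension needs so little work, here is a small Python sketch of a CEF parser. It is not the code of the add-on: the dictionary keys for the seven prefix fields are my own labels, the sample message is made up, and the sketch deliberately ignores CEF’s escaping rules (escaped pipes in the prefix, escaped equal signs in the extension).

```python
import re

# Labels chosen for this example for the seven pipe-delimited prefix fields.
PREFIX_FIELDS = ("version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity")

# key=value pairs; a value runs until the next " key=" or the end of the string,
# which lets values contain spaces.
EXT_RE = re.compile(r'(\w+)=(.*?)(?=\s+\w+=|$)')

def parse_cef(message):
    """Split a CEF message into prefix fields plus extension key-value pairs."""
    start = message.find("CEF:")
    if start < 0:
        raise ValueError("not a CEF message")
    parts = message[start + len("CEF:"):].split("|", 7)
    if len(parts) < 7:
        raise ValueError("incomplete CEF prefix")
    event = dict(zip(PREFIX_FIELDS, parts[:7]))
    extension = parts[7] if len(parts) > 7 else ""
    event.update(EXT_RE.findall(extension))  # extension keys need no per-field code
    return event

sample = ("Dec  6 08:26:10 host CEF:0|Security|threatmanager|1.0|100|"
          "worm stopped|10|src=10.0.0.1 dst=2.1.2.2 msg=Worm successfully stopped spt=1232")

event = parse_cef(sample)
print(event["device_vendor"], event["src"], event["msg"])
```

Only the prefix needs positional handling; everything after the seventh pipe falls out of one generic key-value pattern.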

IT Search vs. SIEM - Data Collection

Posted:  December 3rd, 2007
Tags:  Log Analysis, Splunk

I have had a lot of conversations lately about IT search versus SIEM (security information and event management), the more traditional way of doing security event management. People are asking me how Splunk’s technology is different from all the log management tools. With ArcSight (my former employer) going public, LogLogic going through some turmoil in their executive management, and Splunk having just received an amazing round of investment, people are very interested in understanding what the deal is.
The topics of SIEM and IT search are fairly similar. However, there are some very important differences that I want to start pointing out in a series of blog posts.

Let me start with the topic of data collection. In a SIEM system, you use a collector, a connector, or an agent (I don’t really care what you call it, but it’s some piece of code which reads the data and feeds it into the system) to process the data before you can use it in your SIEM for correlation, reporting, or forensic purposes. If you do not have a connector specifically written for your data source, you are out of luck. Just to be clear, I am not talking about having a connector for files, for ODBC, for SNMP traps, or for syslog over UDP/514. I am talking about a connector for each specific data source: Snort syslog, Snort database, CheckPoint OPSEC, CheckPoint syslog (do they have a syslog output?), PIX over syslog, Cisco router over syslog, etc.

What this means is that the SIEM has to either already support your data source or you need the vendor to build you a connector; or you build it yourself. Most of the SIEM tools have some sort of an SDK that you can use for this purpose. However, do you have the manpower and the skills in-house to do so? If not, does the SIEM company have the bandwidth to build your connector in an acceptable amount of time?

What happens if the source data format changes? For example, Snort might slightly change its syslog format. Guess what has to happen. Yes! The connector needs to be updated to support the new format. This could mean a few days of down-time for your data source if you don’t plan accordingly and get an updated connector right away.

No connector - No data

Now, what is the deal in the IT search world? Well, you need some sort of connector as well. However, you only need one to transport the data from the data source into the search system. In other words, you need about a handful of connectors: One for ODBC, one for receiving syslog on UDP/514, one for text-files, and one for databases other than ODBC. (Okay, okay, I will add one for CheckPoint’s OPSEC). That’s it. You don’t need a specific connector for each data source. You also don’t have to update every time the data source decides to slightly change the logging format. [And if you think that never happens, have a look at SiteProtector.]

What does this mean? It means that from the day you install your IT search technology, you are able to work with your logs. You don’t have to wait until the right connector is available.
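
A toy version of this idea, written in Python under my own assumptions and in no way representative of how a real search engine stores data, is a keyword index over raw lines. The class name, the tokenization rule, and the second sample line are invented for the example. The point is that nothing in it knows about PIX or any other specific log format, yet both events are searchable as soon as they are added.

```python
import re
from collections import defaultdict

class RawLogIndex:
    """Keeps raw events as-is and indexes their lowercase tokens for keyword search."""

    def __init__(self):
        self.events = []               # list of (source, raw_line)
        self.index = defaultdict(set)  # token -> ids of events containing it

    def add(self, source, raw):
        event_id = len(self.events)
        self.events.append((source, raw))
        # Simplistic tokenization: runs of letters, digits, underscores, dots.
        for token in set(re.findall(r"[A-Za-z0-9_.]+", raw.lower())):
            self.index[token].add(event_id)

    def search(self, *terms):
        """Return events containing all of the given terms (exact tokens)."""
        ids = None
        for term in terms:
            hits = self.index.get(term.lower(), set())
            ids = hits if ids is None else ids & hits
        return [self.events[i] for i in sorted(ids or ())]

idx = RawLogIndex()
idx.add("pix", "Jan 18 12:43:50 192.168.1.1 %PIX-6-106015: Deny TCP (no connection) "
               "from 208.58.193.69/1062 to a.b.c.d/443 flags ACK")
idx.add("app", "Jan 18 12:44:02 app01 billing: payment denied for order 4711")

for source, raw in idx.search("deny", "tcp"):
    print(source, "|", raw)
```

Field-level reporting still needs extractions on top of this, but collection and basic search do not depend on them.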

So much for now. In my next post I will talk about structured data.
