Steps for Implementing Fraud Detection

A couple of years ago, I wrote a blog article about how easy it is to detect fraud, mostly in the financial services industry, using Splunk Enterprise. What I provided were the last steps: using the Splunk Search Processing Language to accomplish the task. However, for most people who are new to Splunk, that doesn’t really help, as it only gives you a prescription after you’ve uncovered the symptoms and, should I say, the possible disease.

Today, I’d like to step back a little and give you the full set of high-level steps for implementing fraud detection for your needs. This may make the previous article a little clearer.

Understand Your Use Cases

Before you do anything, first understand your use cases. What problems are being seen, and to whom does it matter if they are solved? For instance, you may have public Wi-Fi that clearly states it should be used for web surfing and email, but not for streaming media. A few enterprising users figure out a way to circumvent the controls and watch YouTube videos. This happens a few times a month. Solving this problem is probably not the highest priority unless the frequency of abuse goes up. You can actually use Splunk to show a timechart of how many times the policy has been violated using your web logs.

On the other hand, a conservative user’s credit card suddenly going on an international buying spree is clearly a red flag and needs to be detected and mitigated as fast as possible. Understand your use cases, enumerate them, categorize them, and prioritize them before doing anything else. Then, it becomes the three-step approach listed below.

Define Your Rules

Every business will have its own set of rules for thresholds, and possibly even secret algorithms, to monitor behavior. Independent of product choice, spend some time defining rules for what constitutes fraud. You may start with your own knowledge of the business and then move on to your industry’s best practices, which can be found through online references or outside consultants. These rules are the trigger for fraud detection, but not an end in themselves. They may start off as plain English definitions of the conditions that lead to suspicious activity.
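To make this concrete, here is a minimal sketch of how one English rule might later be written down as a testable condition. It is plain Python rather than Splunk search language, and the rule itself, the field names, and the 500-dollar limit are all hypothetical illustrations, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One purchase event. All field names here are hypothetical."""
    account: str
    amount: float
    country: str       # where the purchase happened
    home_country: str  # where the account holder normally transacts


# Hypothetical threshold; each business would choose its own.
FOREIGN_AMOUNT_LIMIT = 500.0


def is_suspicious(tx: Transaction) -> bool:
    """English rule: a purchase made outside the account's home country
    for more than the limit is suspicious."""
    return tx.country != tx.home_country and tx.amount > FOREIGN_AMOUNT_LIMIT
```

Writing the rule down this way forces you to name the fields it depends on, which leads directly into the next step.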

Define the Data Behind the Rules

Rules are algorithms, but they do nothing without data. Next, define the data that supports the rules. Where does the data come from, and can it be accessed via computer systems to be processed by the rules? Transaction logs, either buried in log files or hidden in databases, would be a good start. With this type of time series data in mind, you’ll also want to enrich the data with more meaning to make it easier to understand. For instance, if the data says account number xxxx-9999 purchased this item in city X for Y dollars, then you may want to have your system enrich this data at search time with the user’s name, home address, and phone number to provide some context, both for geographic patterns and for contacting the user directly from the event data. The enrichment data may come from a database or a configuration management database (CMDB).
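The enrichment idea can be sketched in a few lines. This is an illustration in plain Python, not how Splunk performs lookups; the customer table, its field names, and the sample values are made up for the example and stand in for whatever database holds your reference data.

```python
# Hypothetical reference data, standing in for a database or CMDB.
CUSTOMERS = {
    "xxxx-9999": {
        "name": "Pat Example",
        "home_city": "Denver",
        "phone": "555-0100",
    },
}


def enrich(event: dict, customers: dict) -> dict:
    """Return a copy of the event with the customer's fields merged in.

    Events for unknown accounts are returned unchanged (as a copy).
    """
    extra = customers.get(event["account"], {})
    return {**event, **extra}


# A raw transaction event with only the account number and purchase details.
raw_event = {"account": "xxxx-9999", "city": "Paris", "amount": 1200.0}
enriched = enrich(raw_event, CUSTOMERS)

# With the home city attached, a geographic check becomes a simple comparison.
away_from_home = enriched["city"] != enriched["home_city"]
```

The raw event is left untouched, so the same log data can be enriched differently as your rules change.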

Ingest the Data and Search, Report, and Alert on it

Using something like Splunk, you can now easily ingest this machine-generated data into a system that indexes all data in real time, and use the Splunk Search Processing Language to analyze the data using your rules. This allows you to create reports for trending, anomalies, and pattern detection. Furthermore, since it’s impossible for humans to continuously watch dashboards, you can schedule your searches in Splunk and have them deliver results as actionable alerts. You now have fraud detection.
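As a rough picture of what a scheduled search does, the sketch below checks a time window of events against one rule and returns the accounts that should raise an alert. It is plain Python for illustration only, not Splunk’s search language or API; the rule (one account seen in more than one country within an hour), the function name, and the event fields are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_alerts(events, now, window=timedelta(hours=1), max_countries=1):
    """Return the accounts seen in more than `max_countries` distinct
    countries during the window that ends at `now`."""
    countries_by_account = defaultdict(set)
    for event in events:
        # Only consider events inside the search window.
        if now - window <= event["time"] <= now:
            countries_by_account[event["account"]].add(event["country"])
    return sorted(
        account
        for account, countries in countries_by_account.items()
        if len(countries) > max_countries
    )
```

A scheduler would call something like this at a fixed interval and send any non-empty result to whoever needs to act on it.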


I have tried to show you, as a three-step process, the high-level steps for using a data analysis engine for fraud detection. The advantages of doing this with Splunk include universal indexing of machine-generated data; real-time ingestion and search; the ability to add knowledge or schema to the data after it has been ingested, so that it can adapt to your needs; a powerful search language that lets you state your rules without always having to write a computer program; and out-of-the-box reporting and alerting. With this approach, you’ll have a flexible and scalable platform on which to build your solution.