Data Mining Techniques & Tools for Fraud Detection

    Use of Data Mining in Fraud Detection

    Data mining offers a wide variety of techniques for extracting useful information from large data sets.

    Because it can surface knowledge hidden in the data, it is a potent way to identify abnormal patterns and any unwanted activity underlying them.

    Industries such as insurance, banking, credit cards, and telecom handle large sets of data and are among the most vulnerable to financial fraud.

    Before we delve into data mining techniques for fraud detection, let’s look at some of the already-developed fraud detection systems.

    1. A fuzzy logic system that identifies fraudulent cases using optimum threshold values.
    2. A credit fraud model that applies classification when the data carries fraud/legal values and, when it does not, clustering followed by classification.
    3. Kohonen’s Self-Organizing Feature Map, used to evaluate auto injury claims by degree of fraud suspicion.

    Now, let’s look at some of the data mining techniques that are helpful in fraud detection.

    Two Most Prominent Data Mining Techniques that Help with Fraud Detection

    1. Bayesian Belief Networks

    Bayesian Belief Networks model the causal relationships among variables, use that model to predict probabilities, and on that basis classify instances as legal or illegal.

    For fraud detection in auto insurance, two Bayesian networks are built to describe the behavior of drivers.

    Each rests on a different assumption: one models behavior assuming the driver is fraudulent, and the other assuming the driver is legitimate.

    The first is the fraud net; the second, the user net, is built from the behavior of genuine users.

    During operation, the user net is adapted to a specific user based on the incoming data, and that user’s behavior is then observed for any deviations.
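
    To make the two-net idea concrete, here is a minimal sketch in Python. It assumes each net can be reduced to a table of conditional probabilities over independent features (the simplest possible Bayesian network), and every feature name, probability, and the prior are invented for illustration. It only shows how evidence is scored against both nets; adapting the user net to incoming data is not shown.

```python
from math import prod

# Hypothetical tables of P(feature value | net). All numbers are invented;
# in practice the fraud net and the user net would be built separately.
FRAUD_NET = {
    "claim_size": {"low": 0.2, "high": 0.8},
    "police_report": {"yes": 0.3, "no": 0.7},
}
USER_NET = {
    "claim_size": {"low": 0.7, "high": 0.3},
    "police_report": {"yes": 0.8, "no": 0.2},
}


def likelihood(net, evidence):
    """P(evidence | net), treating the features as conditionally independent."""
    return prod(net[name][value] for name, value in evidence.items())


def fraud_posterior(evidence, prior_fraud=0.05):
    """Apply Bayes' rule to the two competing hypotheses: fraud vs. legitimate."""
    p_fraud = likelihood(FRAUD_NET, evidence) * prior_fraud
    p_legit = likelihood(USER_NET, evidence) * (1.0 - prior_fraud)
    return p_fraud / (p_fraud + p_legit)


suspicious = fraud_posterior({"claim_size": "high", "police_report": "no"})
ordinary = fraud_posterior({"claim_size": "low", "police_report": "yes"})
print(f"suspicious claim: {suspicious:.3f}, ordinary claim: {ordinary:.3f}")
```

    A claim that fits the fraud net better than the user net gets a higher fraud probability, which can then be compared against whatever cutoff the business chooses.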

    2. Decision Trees

    Decision trees are a machine learning technique that predicts a dependent attribute (the class) from a set of independent attributes. The basic algorithm for the decision tree is as follows:

    We begin by assuming that there are two classes: legal and illegal. The tree begins with a single node containing all the training samples.

    If all the samples belong to the same class, the node becomes a leaf and is labeled with that class.

    Otherwise, the algorithm uses an entropy-based measure to choose the attribute that best separates the samples into individual classes.
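
    The steps above can be sketched in a few lines of Python. This is an illustrative ID3-style version that assumes information gain as the entropy-based measure and categorical attributes only; the attribute names and the four training samples are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    # All samples share one class: the node becomes a leaf with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in pairs], [lab for _, lab in pairs], remaining
        )
    return {"attribute": best, "branches": branches}


def classify(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Invented training samples with two hypothetical attributes.
rows = [
    {"amount": "high", "foreign": "yes"},
    {"amount": "high", "foreign": "no"},
    {"amount": "low", "foreign": "yes"},
    {"amount": "low", "foreign": "no"},
]
labels = ["illegal", "illegal", "legal", "legal"]

tree = build_tree(rows, labels, ["amount", "foreign"])
print(classify(tree, {"amount": "high", "foreign": "no"}))
```

    In this toy data the class depends only on the amount, so the tree splits once on that attribute and both children become leaves. The sketch does not handle attribute values that never appeared in training, which a production implementation would need to.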

    What are the Best Data Mining Tools for Fraud Detection?

    Some of the best data mining tools for fraud detection include:

    1. Clementine 4.0 from Integral Solutions Ltd.
    2. Darwin 3.0.1 from Thinking Machines Corp.
    3. Enterprise Miner from SAS Institute
    4. Intelligent Miner for Data from IBM
    5. Pattern Recognition Workbench from Unica Technologies Inc.
