As shopping has moved online, businesses operating in this field increasingly need fast and reliable ways to detect fraudulent activity.
According to Ecommerce Fraud Statistics [1], $24 billion was lost worldwide to fraudulent activities in 2018. This figure highlights the importance of fighting fraud and the strong need for fraud detection systems.
Before machine learning was applied to fraud detection, businesses relied on rule-based systems or even on manual review by teams of dozens of people. This approach was error-prone, because a reviewer could make the wrong decision or a needed rule could be missing. It also could not detect new types of fraud, and its low accuracy and slow decision-making caused inefficiencies.
Machine learning based detectors can analyze hundreds of transactions per second with high accuracy, so businesses save money and genuine customers do not have their orders cancelled.
AWS announced the general availability of the Amazon Fraud Detector service in late July 2020, making its more than 20 years of fraud detection experience available to customers. Amazon Fraud Detector uses your own historical data as the dataset, so the ML model that you create is specialized to catch fraudulent activities in your own business model. The best thing about this service is that it requires no prior machine learning experience.
For the dataset, AWS recommends using 3 to 6 months of historical data with a minimum of 10,000 rows and 2 variables. The model supports up to 100 variables, depending on your needs. The dataset must also contain more than 400 fraudulent rows and more than 400 legitimate rows; otherwise model training will fail.
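These size requirements can be checked locally before uploading anything. The sketch below is only an illustration: it takes the thresholds from the paragraph above as parameters, and the function name and the label values "fraud" and "legit" are assumptions for the example, not part of the service.

```python
from collections import Counter


def check_training_labels(labels, min_rows=10_000, min_per_class=400):
    """Return a list of problems found; an empty list means the checks passed."""
    labels = list(labels)
    counts = Counter(labels)
    problems = []
    if len(labels) < min_rows:
        problems.append(f"only {len(labels)} rows, need at least {min_rows}")
    if len(counts) < 2:
        problems.append("need both fraudulent and legitimate rows")
    for label, count in counts.items():
        # The text says each class must exceed the limit, so equal is not enough.
        if count <= min_per_class:
            problems.append(
                f"label {label!r} has {count} rows, need more than {min_per_class}"
            )
    return problems


# Illustrative label column: 20,000 rows, 1,000 of them fraudulent.
sample = ["fraud"] * 1_000 + ["legit"] * 19_000
print(check_training_labels(sample))  # → []
```

Returning a list of problems rather than raising on the first one lets you see every issue with a dataset in a single pass.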
Amazon Fraud Detector integrates with Amazon SageMaker, so if you have already built a fraud detection model there, you can bring it to Amazon Fraud Detector and use it.
When it comes to pricing, you pay hourly fees for model training and hosting, along with a fee per prediction. If you use Amazon Fraud Detector with rules only, without machine learning models, costs will be lower. For further details, see the Amazon Fraud Detector pricing page.
Before we start working on our example, let’s go over some of the concepts of Amazon Fraud Detector.
Event: Events are the activities that are evaluated for fraud.
Entity: Entities represent who triggered the event, e.g. customers or merchants.
Label: A label classifies an event as fraudulent or legitimate.
Detector: Detectors contain the rules and/or models defined for a particular event that is evaluated for fraud. Although a detector can have multiple versions, only one of them can be in ‘Active’ status at a time.
Variable: Variables are the data elements that you want to use to evaluate events for fraud. They should be matched with the columns of the provided dataset and with the variable types defined by Amazon.
Rule: Rules specify how variable values affect the fraud decision. A rule can use multiple variables and return multiple outcomes. A detector must have at least one rule.
Outcome: Outcomes are the actions to be taken at the end of a fraud prediction, based on the score produced by the machine learning model. For example, if the score is higher than 700, you can mark the activity as suspicious and require verification from the customer.
In this tutorial, we are going to use Amazon Fraud Detector with the sample dataset (registration_data_20K_full.csv) created by Amazon. It contains more than enough columns for detecting registration fraud, and 5% of its rows are fraudulent events. Although we will be using Amazon Fraud Detector from the console in this tutorial, it can also be used via the Python SDK. Let’s get to work.
First of all, we are going to create an S3 bucket and call it “fraud-detector-test-bucket”. Then, we will upload our training dataset into this bucket. If you are going to use your own dataset, note that the timestamp and event label fields are mandatory, and their header names must be set to “EVENT_TIMESTAMP” and “EVENT_LABEL”. The maximum size of the dataset is 5GB. In the columns other than these two, null and missing values can be handled up to a certain limit. The supported timestamp formats are listed in the Amazon Fraud Detector documentation.
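If you prefer to script this step, a minimal sketch could look like the following. It checks a CSV for the two mandatory headers and then uploads the file with boto3. The function names are made up for this example, boto3 is imported lazily so the header check runs without it, and the upload assumes that the bucket already exists and that AWS credentials are configured.

```python
import csv
import io
import os

REQUIRED_HEADERS = {"EVENT_TIMESTAMP", "EVENT_LABEL"}


def missing_required_headers(csv_text):
    """Return the mandatory header names that are absent from the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row; empty list if the file is empty
    return sorted(REQUIRED_HEADERS - {name.strip() for name in header})


def upload_training_file(path, bucket="fraud-detector-test-bucket", key=None):
    """Upload a local dataset file to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the header check works without boto3

    s3 = boto3.client("s3")
    s3.upload_file(path, bucket, key or os.path.basename(path))


# Illustrative CSV snippet that lacks the label column.
snippet = "ip_address,EVENT_TIMESTAMP\n1.2.3.4,2020-01-01T00:00:00Z\n"
print(missing_required_headers(snippet))  # → ['EVENT_LABEL']

# Upload call, left commented out because it talks to AWS:
# upload_training_file("registration_data_20K_full.csv")
```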
When the upload is done, navigate to the Amazon Fraud Detector console. There, we are going to create an event type from the “Events” tab. After entering the event type name, we create an entity by entering the entity name.
To select event variables, we choose “Select variables from the training dataset”, but you can also choose “Select variables from your variable list” and continue with the variables you’ve defined earlier.
After that, create an IAM role to allow Amazon Fraud Detector access to your training dataset stored in the S3 bucket. Then enter the location of the S3 bucket and press “Upload”. When the upload is done, match the dataset variables with the variables that are defined by Amazon and move forward.
Finally, create two labels, one for legitimate and one for fraudulent events, and we are all set to train our model.
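The same event type setup can be sketched with the Python SDK. The example below assumes the boto3 "frauddetector" client and its put_entity_type, put_label and put_event_type calls, and it assumes that the listed variables have already been created in Amazon Fraud Detector. All names used here ("registration", "customer", "ip_address", "email_address", "fraud", "legit") are placeholders for illustration only.

```python
def build_event_type_requests(event_type, entity_type, variables,
                              fraud_label, legit_label):
    """Build the keyword arguments for each API call without contacting AWS."""
    return {
        "entity_type": {"name": entity_type},
        "labels": [{"name": fraud_label}, {"name": legit_label}],
        "event_type": {
            "name": event_type,
            "eventVariables": list(variables),
            "labels": [fraud_label, legit_label],
            "entityTypes": [entity_type],
        },
    }


def create_event_type(requests):
    """Send the prepared requests (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    client = boto3.client("frauddetector")
    client.put_entity_type(**requests["entity_type"])
    for label in requests["labels"]:
        client.put_label(**label)
    # The event type refers to the entity type and labels created above.
    client.put_event_type(**requests["event_type"])


# Placeholder names; replace them with your own before calling AWS.
requests = build_event_type_requests(
    event_type="registration",
    entity_type="customer",
    variables=["ip_address", "email_address"],
    fraud_label="fraud",
    legit_label="legit",
)
print(requests["event_type"]["labels"])  # → ['fraud', 'legit']
```

Separating the request building from the actual calls makes it possible to review what will be sent before anything is created in your account.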
So far, we have created the fraudulent event type and now we are going to create and train our model using it. Let’s continue.
From the “Models” page, create a new model and choose the model type and event type. Then select the IAM role, enter the location of the training data and move on.
For the model type, “Online Fraud Insights” is currently available. As you can see from the image below, with the Online Fraud Insights model type, models can be trained more accurately to detect online payment, new account and fake review fraud when the dataset includes the recommended variables.
In the “Model input” tab, we will leave all the variable boxes checked. Proceeding to the “Label classification” tab, select the label names corresponding to the fraud and legitimate events you’ve defined. Remember that the label column of the dataset must not contain any values other than the defined labels. Otherwise, model training will fail.
Now we are finally ready to train our model. It takes about an hour to train the model with our dataset. Amazon Fraud Detector holds back 15% of the data from training and uses it to measure model performance. When the training is completed, score and accuracy details can be found on the model detail page.
These details are quite useful when it comes to creating detectors and rules. As mentioned before, the model produces a score as its output, and fraud detection is done based on that score (0: the lowest risk, 1000: the highest risk). As you can see from the confusion matrix, at a score threshold of 500, 95.3% of fraudulent events are caught, but at the same time 13.3% of legitimate events are marked as fraud.
You can check the accuracy for the other threshold values from the image below.
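To make the two percentages concrete: the share of fraud that is caught is the true positive rate, and the share of legitimate events marked as fraud is the false positive rate. The sketch below computes both from confusion matrix counts. The counts are invented so that they reproduce the rates quoted for the 500 threshold; they are not the actual numbers from the trained model.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from the four counts."""
    tpr = tp / (tp + fn)  # fraudulent events that were caught
    fpr = fp / (fp + tn)  # legitimate events wrongly marked as fraud
    return tpr, fpr


# Invented counts: 1,000 fraudulent and 1,000 legitimate held-out events.
tpr, fpr = rates(tp=953, fn=47, fp=133, tn=867)
print(f"TPR={tpr:.1%} FPR={fpr:.1%}")  # → TPR=95.3% FPR=13.3%
```

Raising the threshold lowers both rates, so the right value depends on how costly a missed fraud is compared with a wrongly challenged customer.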
Models can be easily deployed by selecting the model version from the Deploy tab. In a few minutes, the model will be deployed and its status will change to “Active”. When the deployment is done, you will be charged per hour the model is hosted.
Models with active status can be added to detectors. You can add more than one model to a detector to improve its accuracy.
Now let's create rules that interpret the score returned by the model after a prediction. We have created three rules named “high_fraud_risk”, “medium_fraud_risk” and “low_fraud_risk” by evaluating the threshold table produced by model training. We also created “outcomes” that describe what will happen when the condition of a rule is met. While defining a rule, let's not forget that besides the score given by the model, we can also use the variables we’ve defined earlier. For more information about rule expressions, see the Amazon Fraud Detector documentation.
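The logic behind the three rules can be illustrated in plain Python. Note that this is not the rule expression language of Amazon Fraud Detector; it only mimics how an ordered rule list maps a score to an outcome when the first matching rule wins. The thresholds (900 and 700) and the outcome names are hypothetical and should be chosen from your own threshold table.

```python
# (rule name, condition on the model score, outcome name)
# Thresholds and outcome names are assumptions made for this example.
RULES = [
    ("high_fraud_risk", lambda score: score > 900, "block_registration"),
    ("medium_fraud_risk", lambda score: score > 700, "verify_customer"),
    ("low_fraud_risk", lambda score: True, "approve_registration"),
]


def first_matching_outcome(score):
    """Return (rule name, outcome) of the first rule whose condition holds."""
    for name, condition, outcome in RULES:
        if condition(score):
            return name, outcome
    return None  # unreachable here, because the last rule always matches


for score in (950, 750, 120):
    print(score, first_matching_outcome(score))
```

Because the rules are checked in order, the medium rule does not need an upper bound: any score above 900 has already been handled by the high risk rule.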
Now, we are only one step away from getting real-time fraud predictions from Amazon Fraud Detector. Select “Run tests” from the Detector version details page. Enter the values of the variables, then run the test. Amazon Fraud Detector will return the outcome(s) based on the rules you have set. If the test results meet your needs, you can change the status of the detector from “Draft” to “Active” and make your detector available for real-time predictions. The “GetEventPrediction” API is described in the Amazon Fraud Detector API reference, and with it you can start using Amazon Fraud Detector from your own applications.
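As a rough sketch, a real-time prediction request with the Python SDK could be prepared as shown below, assuming the boto3 "frauddetector" client and its get_event_prediction call. The detector, event type, entity and variable names are placeholders, and the call itself is left commented out because it needs an active detector and AWS credentials.

```python
import uuid
from datetime import datetime, timezone


def build_prediction_request(detector_id, event_type, entity_type, entity_id,
                             variables, detector_version="1"):
    """Build the keyword arguments for get_event_prediction."""
    now = datetime.now(timezone.utc)
    return {
        "detectorId": detector_id,
        "detectorVersionId": detector_version,
        "eventId": str(uuid.uuid4()),  # any unique id for this event
        "eventTypeName": event_type,
        "eventTimestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "entities": [{"entityType": entity_type, "entityId": entity_id}],
        # Variable values are passed as strings.
        "eventVariables": {key: str(value) for key, value in variables.items()},
    }


def get_outcomes(request):
    """Call the service and collect the outcomes of all matched rules."""
    import boto3  # imported lazily; needs AWS credentials at call time

    client = boto3.client("frauddetector")
    response = client.get_event_prediction(**request)
    return [outcome
            for result in response.get("ruleResults", [])
            for outcome in result.get("outcomes", [])]


# Placeholder names and values for illustration only.
request = build_prediction_request(
    detector_id="registration_detector",
    event_type="registration",
    entity_type="customer",
    entity_id="customer-123",
    variables={"ip_address": "1.2.3.4", "email_address": "user@example.com"},
)
print(sorted(request))
# outcomes = get_outcomes(request)
```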
That is it! We have created our event types for fraudulent activity detection and trained our model with the historical data. Now, we are able to get an accurate prediction on fraud patterns and take appropriate actions when new accounts are created, without prior machine learning expertise.
An enthusiastic developer, Beyza is eager to follow and leverage next-gen technologies for modern application development. Her interest in adopting transforming practices for excellent applications follows her passion for learning and sharing skills and emerging technologies.