Improving Probabilistic Fraud Risk Analysis with ML


With the rise of more sophisticated fraud tactics, it’s become even more essential for businesses to find better ways to predict fraud risk.

Catching and stopping fraud has been a business necessity since the advent of eCommerce. While the earliest fraudsters were typically unsophisticated individual actors, the potential to make real money targeting goods and services sold online pushed cyber fraud into the domain of organized crime. Driven by a dramatic rise in computing power and a boom in eCommerce and mobile payments, fraud has steadily increased in scale and sophistication, presenting greater challenges for the businesses trying to stop it. These trends are driving the need for more comprehensive fraud risk analysis.

Some businesses tackle identity verification and fraud detection with deterministic risk assessment, which uses highly verifiable data (such as date of birth or social security number) to make a definitive determination of a person’s identity. The goal is to conclusively say, “Yes, the person behind this transaction is who they say they are.”

This approach closely resembles how credit bureaus verify identities. When a person applies for credit, they provide identity information such as their name, contact information, date of birth, and social security number. The credit bureau communicates directly with the person via email, physical mail, and phone calls. It also contacts the applicant’s bank and collects pay stubs. Through this process, the creditor gains very strong confidence that the borrower applying for credit is real.

The myth of a definite answer

While deterministic risk assessment is effective for verifying identities, it has some drawbacks when it comes to fraud. For starters, deterministic analysis requires near-perfect data quality and often leaves out good customers whose identities aren’t well established, including young people, recent immigrants, and economically disadvantaged people. Further, it doesn’t take the likelihood of fraud into account: verifying an identity does not provide a complete picture of fraud risk. That is why there is a need for improved fraud risk analysis techniques.

For instance, two common fraud tactics, synthetic identities and account takeover, can often be difficult to detect using deterministic identity data and verification methods. With synthetic identities, fraudsters combine real and fake data to create a “new” person, while in the case of account takeover, a bad actor poses as a real person using stolen account credentials. Because they often pass through fraud systems undetected, these methods of fraud are growing in popularity. In fact, synthetic identity fraud is one of the fastest-growing financial crimes in the United States. The digital trust and safety firm Sift found that account takeover fraud increased by 282% between Q2 2019 and Q2 2020 due to the rise in digital business and online shopping following the COVID-19 outbreak.

Making better fraud risk predictions

To combat increasingly sophisticated fraud methods and make better predictions of fraud risk, a new type of probabilistic assessment has emerged that uses a wider range of identity data (such as telco and utility accounts, IP addresses, or online behavior data). By focusing on a broader variety of sources, types of data, and metadata, businesses can probabilistically assess online interactions to get a more complete picture of fraud risk. For example, a business might verify that a customer’s name matches a date of birth (which wouldn’t raise any red flags in a deterministic assessment) but also see that the email address provided was used on 45 different websites in the last 30 days or that the phone number entered is a non-fixed VoIP. Suddenly the transaction starts to look like a much higher fraud risk.
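The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a production model: the signal names, the weights, and the bias are hypothetical values chosen by hand, whereas a real system would learn them from labeled transaction data. The sketch combines the signals through a logistic function, so a matching name and date of birth lowers the score while heavy email reuse and a non-fixed VoIP number raise it.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    name_matches_dob: bool        # the deterministic check passed
    email_sites_last_30d: int     # websites where the email was seen recently
    phone_is_nonfixed_voip: bool  # phone line type


# Hypothetical weights for illustration only; a trained model would learn these.
BIAS = -3.0
W_NAME_DOB_MATCH = -1.0
W_EMAIL_VELOCITY = 0.08
W_NONFIXED_VOIP = 1.5
EMAIL_VELOCITY_CAP = 50  # cap extreme counts so one signal cannot dominate


def fraud_probability(s: Signals) -> float:
    """Combine several signals into one probability with a logistic function."""
    z = (
        BIAS
        + W_NAME_DOB_MATCH * s.name_matches_dob
        + W_EMAIL_VELOCITY * min(s.email_sites_last_30d, EMAIL_VELOCITY_CAP)
        + W_NONFIXED_VOIP * s.phone_is_nonfixed_voip
    )
    return 1.0 / (1.0 + math.exp(-z))


# Both customers pass the deterministic name/date-of-birth check.
quiet = Signals(name_matches_dob=True, email_sites_last_30d=2,
                phone_is_nonfixed_voip=False)
risky = Signals(name_matches_dob=True, email_sites_last_30d=45,
                phone_is_nonfixed_voip=True)

print(f"quiet customer: {fraud_probability(quiet):.2f}")
print(f"risky customer: {fraud_probability(risky):.2f}")
```

With these made-up weights, the two customers look identical to a deterministic check but receive very different scores, which is the point of the probabilistic view.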

While probabilistic assessments provide businesses with a better prediction of fraud risk, this sort of analysis is far beyond the capability of humans. Finding patterns that indicate fraud requires huge computing power to perform sophisticated analytics on massive data sets to make near real-time predictions.

Benefits of leveraging machine learning for probabilistic fraud risk analysis

Machine learning has become a powerful way for businesses to quickly make sense of the vast amount of identity data required for strong probabilistic fraud risk assessments. Advantages of leveraging machine learning for this analysis include:

1) Uncovering actionable insights: Probabilistic data often involve a multitude of signals (such as site behavior, device IDs, digital signatures like IP addresses, and more) that together provide a strong indication of fraud likelihood. Traditional rules-based approaches to onboarding customers are quickly overwhelmed by this number of data points. Machine learning, on the other hand, offers a way to take advantage of all known data points available on a customer transaction or account opening in order to reduce the noise and translate the multitude of signals into a single actionable score.

2) Holistically understanding identity: A rules-based approach uses a series of if/then statements to evaluate binary data points (if true, X action is taken, and if false, Y action is taken). Teams can buy or build platforms that combine rules using multiple data points or move from binary data to field arrays. The most complex rules systems leverage scorecards where any number of criteria might be met in order to trigger a specific action. However, these systems only leverage a small portion of available data to come to a decision point about whether or not something is fraud. On the other hand, a machine learning approach to probabilistic fraud analysis can help businesses understand the linkages between all data points. This provides a holistic understanding of an identity, creates profiles of both fraudsters and good customers, and allows businesses to make decisions across the spectrum of risk.

3) Improving customer experiences: With machine learning, businesses can more easily focus on good customers as well as on preventing fraud. In rules-based systems, fraud is usually isolated through binary signals with low recall and high precision. These signals are effective at flagging the cases they catch, but they leave the majority of customers in an ambiguous “low risk” bucket. Businesses are then unable to treat their best customers better and instead force many good customers through unnecessary friction. By approaching fraud risk on a spectrum based on probabilistic data and machine learning, businesses can reduce the amount of friction good customers experience, thus improving the overall customer journey.
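The contrast between binary rules and a risk spectrum can be made concrete with a short sketch. Everything here is assumed for illustration: the rule inputs, the score cutoffs, and the action names are hypothetical, and the fraud scores are hard-coded stand-ins for the output of a trained model. The first function mimics if/then rules, which send every customer who passes into the same bucket; the second maps a score between 0 and 1 to graded levels of friction.

```python
def rules_decision(name_matches_dob: bool, email_on_blocklist: bool) -> str:
    """Binary if/then rules: each check either declines or lets the customer pass."""
    if email_on_blocklist:
        return "decline"
    if not name_matches_dob:
        return "decline"
    return "approve"  # everyone else lands in one undifferentiated bucket


# Hypothetical cutoffs; in practice they would be tuned to a business's risk appetite.
LOW_RISK_MAX = 0.05
MEDIUM_RISK_MAX = 0.50


def spectrum_decision(fraud_score: float) -> str:
    """Map a model score in [0, 1] to a graded action instead of a yes/no."""
    if not 0.0 <= fraud_score <= 1.0:
        raise ValueError("fraud_score must be between 0 and 1")
    if fraud_score <= LOW_RISK_MAX:
        return "fast_track"            # least friction for the best customers
    if fraud_score <= MEDIUM_RISK_MAX:
        return "step_up_verification"  # ask for one extra check
    return "manual_review"


# Stand-in scores; all three customers pass the same binary rules.
customers = {
    "long-time customer": 0.01,
    "new account": 0.20,
    "suspicious signup": 0.85,
}

for label, score in customers.items():
    print(label, "|", rules_decision(True, False), "|", spectrum_decision(score))
```

All three customers are approved by the rules, yet the score-based policy treats each one differently, which is how a spectrum of risk lets a business remove friction for good customers without lowering its guard.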

Last word

With the rise of more sophisticated fraud tactics, it’s become even more essential for businesses to find better ways to predict fraud risk. Leveraging probabilistic identity data for more accurate fraud assessment is a crucial first step. However, just as important is building machine learning models that are equipped to handle the influx of data points that probabilistic assessments create. Done well, this approach lets businesses reduce fraud while also minimizing friction points for good customers.

Trevor Anderson

About Trevor Anderson

Trevor Anderson joined Ekata in 2012 as the company's first sales engineer. Today he leads Ekata's global Field Data Science team, helping clients get the best value from Ekata's products within their models and rule sets. Before joining Ekata, Trevor worked at QL2 Software as a software engineer, sales engineer, and account manager.
