Reducing false positives using contextual AI
In the current world of sanction screening, financial institutions are damned if they do and damned if they don’t comply with the ever-evolving regulatory landscape. The consequences of non-compliance are punitive: the annual penalties issued by regulators more than doubled between 2018 and 2020, going from $4.5bn to $10.4bn [1]. Full compliance does not come cheap either. Compliance departments within banks have been expanding at an alarming rate to tackle the avalanche of alerts coming out of sanction screening systems, and operational costs have been doubling every four years as a result [2].
Scaling up resources is by no means a long-term solution to the false positive problem. More efficient controls are required. Most banks still use legacy rule-based screening technology, and it shows. False positive rates produced by legacy screening systems can reach upward of 95% [3]. As a result, investigators spend most of their time laboriously investigating false alarms instead of thoroughly inspecting the genuinely suspicious ones.
The emergence of Artificial Intelligence (AI) has been seen as a potential light at the end of the tunnel, having shown that it can reduce false positives by upwards of 70%. Its power lies in its ability to mimic human decision-making when matching entities. It does so by learning from past decisions and using the available context as part of the prediction process. This article explores the challenges of sanction screening, the shortcomings of traditional screening systems, and how AI transforms the process using contextual matching.
What makes sanction screening so complex?
Sanctions screening is a process whereby banks check their customers and transactions against lists of sanctioned entities such as individuals, businesses, and vessels. This is done in order to avoid doing business with sanctioned parties and comply with sanctions laws and regulations.
While comparing information may seem trivial for a human, automating it with a computer is far from it. The complexity of sanction screening comes down to a combination of factors, which can be broken down as follows:
Data is unstructured
Transaction messages, such as payment or trade messages, do not always define key details about an entity in a structured way, making it difficult to extract the right information and perform like-for-like comparisons.
Names are varied
Names can be written in a variety of ways and still refer to the same person. This is even more pronounced when languages such as Arabic or Mandarin are transliterated into Latin-based characters. Abbreviations, initials, aliases and even spelling mistakes add to the challenge.
Information is limited
The amount of data available varies. In many cases, the name is the only available information. Occasionally, address details such as street name, city, and country will also be present.
Watchlists are large and evolving
Watchlists can contain millions of records and be updated daily. Real-time entity matching against lists of this size makes sanction screening a big data problem.
The limitations of legacy sanction screening
Entity names tend to be the only consistently available information to screen against. For that reason, the foundation of sanction screening is the name-matching component. Legacy screening systems typically utilize fuzzy and phonetic name-matching techniques, supplemented by dictionaries, as part of a wider rule-based algorithm.
Fuzzy matching is an algorithm designed to measure the similarity of two names based on the number of character changes required for one name to match the other. For example, John and Jon have an edit distance of 1, since the insertion of an ‘h’ is required. The more edits required, the lower the fuzzy match score. This approach is used to handle inconsistencies in spelling. On its own, fuzzy matching can easily fall short. Take ‘Abdul Rasheed’ and ‘Abd Arrachide’: the same name, spelled in vastly different ways. More sophisticated approaches supplement fuzzy matching with phonetic matching, which accounts for the similarity in how words sound and allows for more robust matching.
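To make the edit-distance idea concrete, here is a minimal Python sketch of a Levenshtein-based fuzzy score. The function names and the normalisation by the longer name's length are illustrative assumptions, not the scoring used by any particular screening product, and phonetic matching is not shown.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; assumed normalisation by the longer name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(edit_distance("John", "Jon"))  # 1
print(fuzzy_score("John", "Jon"))    # 0.75
print(fuzzy_score("Abdul Rasheed", "Abd Arrachide"))
```

The last call scores noticeably lower than the John/Jon pair even though both spellings refer to the same name, which is the weakness that phonetic matching is meant to cover.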
Fuzzy and phonetic matching are effective at text matching but not necessarily at entity matching, which is why they are orchestrated as part of a larger rule-based approach. These rules are hand-crafted checks, each weighted according to its importance to the final similarity score. An advanced rule-based screening algorithm can have more than 30 rules that need to be carefully coordinated and weighted to produce meaningful scores. A major limitation of this approach is the rigidity and fragility of the scoring system. Removing or adding a rule requires re-tuning the entire weighting system. Similarly, changing the weighting of one rule can have major consequences for the final score.
On a more fundamental level, the underlying approach of legacy systems is incongruent with the way investigators perform entity matching. Investigators gather all the available information, such as names, addresses, and past behaviour, for context before making any decisions. More generally, humans also subconsciously infer additional details even from limited information. For example, to humans, the name Leonardo Mancini is not just a collection of letters. Most people will automatically infer that it belongs to a person, most likely a man of Italian origin. The ability to extract data from data is an invaluable asset, one that can transform the way entity matching is performed.
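The fragility of hand-weighted rules can be illustrated with a small Python sketch. The three rules, their weights, and the two records below are hypothetical and chosen only for illustration; a real rule set would be far larger.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Rule:
    name: str
    weight: float                             # hand-tuned importance
    check: Callable[[Record, Record], float]  # returns a value in [0, 1]


def same_field(field: str) -> Callable[[Record, Record], float]:
    """Build a check that returns 1.0 when both records share a non-empty value."""
    def check(a: Record, b: Record) -> float:
        va = a.get(field, "").strip().lower()
        vb = b.get(field, "").strip().lower()
        return 1.0 if va and va == vb else 0.0
    return check


def rule_score(rules: List[Rule], a: Record, b: Record) -> float:
    """Weighted average of all rule outputs, normalised to [0, 1]."""
    total_weight = sum(rule.weight for rule in rules)
    return sum(rule.weight * rule.check(a, b) for rule in rules) / total_weight


# Hypothetical records: same name and city, different country.
payment_party = {"name": "Jon Smith", "country": "GB", "city": "Richmond"}
watchlist_entry = {"name": "jon smith", "country": "US", "city": "Richmond"}

rules = [
    Rule("name", 0.6, same_field("name")),
    Rule("country", 0.3, same_field("country")),
    Rule("city", 0.1, same_field("city")),
]

baseline = rule_score(rules, payment_party, watchlist_entry)    # about 0.70
rules[1].weight = 0.6                                           # re-weight one rule
reweighted = rule_score(rules, payment_party, watchlist_entry)  # about 0.54
print(baseline, reweighted)
```

Changing a single weight moves the final score from roughly 0.70 to roughly 0.54, which could flip the outcome against a fixed alert threshold without any change in the underlying data.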
Contextual matching using SafeWatch screening’s AI
False positives are a by-product of rigid, imprecise algorithms that lack the full context of the data available. This is why Eastnets has built its own AI-based sanction screening engine with investigators in mind. Building on the strengths of tried-and-tested methods such as fuzzy and phonetic matching, the SafeWatch Screening (SWS) AI uses machine learning, word embeddings, and knowledge bases to enrich the existing data with more context.
There are three core ingredients that ensure SWS’s AI achieves laser-level precision with full transparency.
1. Context Extraction
As mentioned earlier, names are not just a set of characters. Just like humans, AI can be used as a tool to extract even more information from the available names, such as:
- Gender: Jenny and Jonny have similar fuzzy and phonetic scores, but the difference in gender suggests they may be two different people.
- Name origin: Some names are associated with particular regions and countries. Mathew is English and Matthieu is French, another indication that they may refer to different people.
- Entity type: AI combined with knowledge bases can help identify whether a name belongs to a person or a business, reducing the chances of matching a company with a person.
- Company semantic similarity: ChemTech Drugs is more likely to be related to ChemTech Pharmaceutical than to ChemTech Media Company. AI can infer semantic similarities between company types.
The aim here is information gathering and context building to construct a more complete picture when performing entity matching. In addition to this, more details can also be incorporated as part of the overall context wherever possible. This can include the address, IBAN, entity’s past detections, and adverse media.
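As a rough illustration of context extraction, the Python sketch below derives entity type, gender, and name origin from a name and reports where two names disagree. The lookup table and legal-suffix list are tiny hypothetical stand-ins for a real knowledge base or trained model, and the function names are assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical toy knowledge base; a real system would use a far larger
# resource or a trained model instead of a hand-written table.
FIRST_NAME_INFO = {
    "jenny": {"gender": "female", "origin": "English"},
    "jonny": {"gender": "male", "origin": "English"},
    "mathew": {"gender": "male", "origin": "English"},
    "matthieu": {"gender": "male", "origin": "French"},
}
LEGAL_SUFFIXES = {"ltd", "llc", "inc", "gmbh", "plc", "company"}


def extract_context(name: str) -> Dict[str, Optional[str]]:
    """Infer entity type, gender, and name origin; None means unknown."""
    context: Dict[str, Optional[str]] = {
        "entity_type": None, "gender": None, "origin": None,
    }
    tokens = name.lower().replace(".", "").split()
    if any(token in LEGAL_SUFFIXES for token in tokens):
        context["entity_type"] = "business"
    elif tokens and tokens[0] in FIRST_NAME_INFO:
        context["entity_type"] = "person"
        context.update(FIRST_NAME_INFO[tokens[0]])
    return context


def context_conflicts(name_a: str, name_b: str) -> List[str]:
    """List the attributes that are known for both names and disagree."""
    ctx_a, ctx_b = extract_context(name_a), extract_context(name_b)
    return [
        key for key in ctx_a
        if ctx_a[key] is not None and ctx_b[key] is not None
        and ctx_a[key] != ctx_b[key]
    ]


print(context_conflicts("Jenny Smith", "Jonny Smith"))        # ['gender']
print(context_conflicts("Mathew Martin", "Matthieu Martin"))  # ['origin']
print(context_conflicts("Jenny Smith", "Jenny Smith Ltd"))    # ['entity_type']
```

Each conflict is a signal that a high fuzzy or phonetic score may still be a false positive; unknown attributes are treated as neutral rather than as evidence either way.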
2. Automatic contextual matching using historical decisions
Assembling rules for sanction screening can get cumbersome, particularly as the rule set becomes more complex. This is where AI has a significant advantage. Rules are implicitly defined within the model itself, and their weights are calibrated automatically by learning from historical data. Using past decisions, the model adjusts the importance of certain features and context based on how investigators have ruled on similar alerts, tailoring the model to the investigators’ own decision-making.
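The idea of calibrating weights from past decisions can be sketched with a small logistic regression trained by gradient descent. This is a generic illustration, not a description of the SWS model: the three features, the synthetic training rows, and the hyperparameters are all made up for the example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(features: List[List[float]], labels: List[int],
          lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic regression weights with full-batch gradient descent."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Probability that an alert is a true match."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Synthetic, illustrative history of investigator decisions.
# Columns: [name_similarity, country_match, gender_match]
history = [
    [0.95, 1, 1], [0.90, 1, 1], [0.85, 1, 1], [0.92, 1, 1],  # confirmed matches
    [0.90, 0, 0], [0.88, 0, 1], [0.80, 1, 0], [0.93, 0, 0],  # false positives
]
decisions = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train(history, decisions)
print(predict(weights, bias, [0.90, 1, 1]))  # high: context agrees
print(predict(weights, bias, [0.90, 0, 0]))  # low: same name score, context disagrees
```

In this toy history the name similarity is high for every alert, so the model learns to lean on the contextual features that actually separated true matches from false positives, with no weights set by hand.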
3. Explainability
The transition from rule-based to AI-based approaches has usually been met with some hesitation due to the black-box nature of machine learning models. Ironically, many legacy screening solutions can be difficult to decipher as well. Ad hoc enhancements and updates to handle edge cases culminate in overly complex rule structures which are far from interpretable. A key part of entity matching and resolution is understanding which factors contributed to the match, which is why SWS’s AI matching solution provides full explainability. Investigators can view exactly how and why the AI classified the result as a match.
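One simple way to surface the factors behind a score, shown here only as a sketch, is to report each feature's contribution in a linear model. The weights, bias, and feature names below are hypothetical, and this is not a description of how SWS produces its explanations.

```python
import math
from typing import Dict, List, Tuple


def explain(weights: Dict[str, float], bias: float,
            features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the match probability and per-feature contributions,
    ranked by absolute impact on the score."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    logit = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


# Hypothetical model parameters and alert features.
weights = {"name_similarity": 4.0, "country_match": 2.0, "gender_match": 3.0}
bias = -6.0
alert = {"name_similarity": 0.9, "country_match": 1.0, "gender_match": 0.0}

probability, ranked = explain(weights, bias, alert)
print(round(probability, 3))
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Here the name and country push the score up, but the absent gender match leaves the alert below an even-odds probability, and an investigator can see which factor was decisive.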
In conclusion, the true cost of compliance using legacy sanction screening technologies is quickly becoming unsustainable due to the overwhelming number of alerts they generate and the resources required to handle them. AI’s ability to extract hidden context from names and combine it with tried-and-trusted algorithms such as fuzzy matching allows it to screen entities in a contextual manner, just like humans. And with their self-learning capacity, AI models can be trained to make decisions just as an investigator would, reducing false positives dramatically and allowing investigators to focus on the cases that matter most.
About the author
Daoud Abdel Hadi
Lead Data Scientist - Eastnets
"Solving problems using data has been a part of my life for the past six years. I have the privilege of using my data science skills to tackle real-world issues such as fraud, money laundering, and terrorist financing by any means necessary, whether it’s machine learning, graph theory, or simple rules."
Eastnets is a global provider of compliance and payment solutions for the financial services sector. Our experience and expertise help ensure trust at 800 financial institutions across the world, including 11 of the top 50 banks. For more than 35 years, we’ve worked to keep the world safe and secure from financial crime. We do it by helping our partners manage risk through screening, monitoring, analysis, and reporting, plus state-of-the-art consultancy, and customer support.