
- Original case data annotation source
- In-depth mining and application of historical fraud cases
- Self-learning mechanism of expert experience rules
- Explainability of risk assessment results
- Real-time model computation and prediction services
- Closed-loop case approval for operations and maintenance
1. Original case data annotation source
- Advantages: relatively simple to implement, highly explainable, and able to respond quickly to financial-industry regulatory documents.
- Defects: it lags badly and cannot respond in time to new fraud methods; new rules are usually summarized and extracted only after substantial losses have already been incurred.
- As a post-event analysis technique, it is not real-time and cannot detect fraud while an event is in progress.
- As analysis and reasoning based on prior business experience, it lacks adversarial robustness; the model stays effective only for a short time and is easily bypassed by experienced, organized black-industry gangs.
- Because it is built on an assumed data distribution, the model usually loses its original predictive power once the business evolves and scenarios change; it has neither cross-scenario universality nor stability over time.
- Advantages: Fast development and deployment, strong explainability.
- Defects: the false positive rate is high, and it does not handle sudden changes in normal user behavior well. Over time the feature engineering grows increasingly complex, and prediction-time computational cost rises accordingly.
- Advantages: it supports an end-to-end (End to End) training process and effectively fits the regularities of user behavior.
- Defects: model maintenance costs are relatively high, and it is not well suited to large-scale time-series anomaly detection. It works well on periodic curves but may not fit spiky (burr-type) data; a minimal time-series anomaly-detection sketch follows this list.
- Advantages: a single deployed model can handle gang-fraud mining across multiple scenarios, keeping deployment and maintenance costs low.
- Defects: compared with scenario-specific gang-fraud detection, the model's recall suffers some loss, though within an acceptable range.
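As a simple illustration of the time-series approach mentioned in this list, the sketch below flags bursts in per-account activity with a rolling z-score; the data, window size, and threshold are hypothetical and only meant to show the idea, not any production model.

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series: pd.Series, window: int = 60, threshold: float = 3.0) -> pd.Series:
    """Flag points that deviate from the rolling mean by more than `threshold` sigmas."""
    mean = series.rolling(window, min_periods=window).mean()
    std = series.rolling(window, min_periods=window).std()
    zscore = (series - mean) / std
    return zscore.abs() > threshold

# Hypothetical per-minute login counts for one account, with an injected burst
idx = pd.date_range("2023-01-01", periods=1440, freq="min")
counts = pd.Series(np.random.poisson(5, size=1440), index=idx)
counts.iloc[900] = 80  # injected anomaly
print(counts[rolling_zscore_anomalies(counts)])
```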
2. In-depth mining and application of historical fraud cases
- It discovers deep, complex, implicit fraud features and is more targeted than expert rules;
- The discovered rules are nonlinear and offer strong adversarial resistance compared with post-event analysis;
- It acts in both positive and negative directions; compared with the one-directional action of expert rules, this bidirectional nature helps hedge the model's overfitting risk;
- Through multi-layer cross-combination in a deep network, it further strengthens the adversarial resistance and durability of the rule set on the one hand, and improves the model's generalization ability and cross-scenario applicability on the other;
- It enables real-time online risk prevention and control, improving the timeliness of risk control by shifting from post-event analysis to in-event prevention, and a single model can cover fraud-type detection across all channels and scenarios, truly realizing "One Model Rules All". The drawbacks are that individual rules are less interpretable than in an expert rule model, and the algorithmic complexity is high.
3. Self-learning mechanism of expert experience rules
Against the background of the big data era, new means of financial crime and financial fraud emerge in an endless stream; they are increasingly technological, professional, large-scale, and concealed. Fraud has evolved from the individual operations of the past into large-scale, organized gang fraud, which seriously limits how long traditional expert knowledge bases and expert experience rules stay effective. Two problems urgently need to be solved:
- Driven by data and algorithms, automatically adjust the thresholds and weights in the scenario rule set to keep the rules continuously effective;
- Explore new, highly timely risk-control rules from different dimensions to continuously enrich and improve the anti-fraud rule set.
Rule thresholds and weights
Learning rule thresholds and weights involves feature discretization, feature selection, feature dimensionality reduction, weight parameter regression, and other steps.
- Information entropy: a measure of the purity of a sample set; it quantifies the uncertainty and randomness of a system.
- Chi-square binning: a feature discretization algorithm; it tests whether the observed distribution of the sample data differs significantly from the expected distribution, or whether two categorical variables are related or independent.
- Minimum entropy binning: a feature discretization algorithm and a typical top-down binning method; it starts by placing all values of the feature into a single bin and then splits that bin according to the minimum-entropy principle (a minimal sketch of entropy, Gini impurity, and one minimum-entropy split follows this list).
- Gini coefficient: a feature discretization criterion; the more mixed a set is internally, the larger the Gini coefficient, and the purer it is, the smaller the coefficient.
- Feature selection: choosing the features that carry classification power for the training data in order to improve the model's overall predictive ability; commonly used techniques include Lasso regression, feature importance, and feature collinearity checks.
- Lasso regression: it can in principle recover the non-zero coefficients from relatively few observations, but the number of samples still needs to be "sufficiently large"; otherwise the behavior of the L1 model is highly random.
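As referenced in the list above, here is a minimal sketch of information entropy, Gini impurity, and one top-down minimum-entropy split; the transaction amounts and fraud labels are hypothetical.

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Information entropy of a label array: higher means more uncertainty."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini(labels: np.ndarray) -> float:
    """Gini impurity: larger when the set is mixed, smaller when it is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def best_entropy_split(values: np.ndarray, labels: np.ndarray):
    """One step of top-down minimum-entropy binning: find the cut point that
    minimises the weighted entropy of the two resulting bins."""
    best_cut, best_score = None, float("inf")
    for cut in np.unique(values)[1:]:
        left, right = labels[values < cut], labels[values >= cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Hypothetical feature (transaction amount) and fraud labels
amounts = np.array([10, 12, 15, 200, 220, 250, 30, 35, 400, 420])
is_fraud = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1])
print(entropy(is_fraud), gini(is_fraud))       # 1.0 and 0.5 on this balanced sample
print(best_entropy_split(amounts, is_fraud))   # a cut at 200 separates the classes here
```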
New risk control rules
New fraud rule mining is divided into two categories according to the type of values a rule handles:
- If the rule treats the association as the presence or absence of items, it is a Boolean association rule;
- If the rule describes associations between quantified items or attributes, it is a quantitative association rule.
The essence is frequent itemset mining; commonly used algorithms include Apriori and FP-Growth.
- The Apriori algorithm exploits the two properties of frequent itemsets to prune many irrelevant candidate sets, greatly improving efficiency. However, it is a candidate-elimination algorithm, and each elimination round requires a full scan of the data records, so it struggles on large data sets.
- The FP-Growth algorithm compresses the data records into a tree structure, so mining frequent itemsets needs only two scans of the data; it does not generate candidate sets at all, so it is more efficient.
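A minimal, self-contained Apriori sketch follows; the "transactions" are hypothetical sets of rule hits observed on confirmed fraud cases, and the code illustrates candidate generation with the Apriori property rather than a production implementation.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Minimal Apriori: return frequent itemsets and their support.
    `transactions` is a list of sets of items (e.g. hit rule IDs per case)."""
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}  # 1-itemset candidates
    frequent, k_sets = {}, items
    while k_sets:
        # Count support for this level: one scan over all transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        current = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(current)
        # Generate (k+1)-candidates by joining frequent k-itemsets, keeping only
        # those whose every k-subset is frequent (the Apriori property)
        prev = list(current)
        k = len(prev[0]) + 1 if prev else 0
        k_sets = {
            a | b
            for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(s) in current for s in combinations(a | b, k - 1))
        }
    return frequent

# Hypothetical rule hits observed on confirmed fraud cases
cases = [
    {"new_device", "night_login", "high_amount"},
    {"new_device", "night_login"},
    {"night_login", "high_amount"},
    {"new_device", "night_login", "high_amount"},
]
for itemset, support in apriori(cases, min_support=0.5).items():
    print(set(itemset), round(support, 2))
```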
4. The problem of model interpretability in credit risk evaluation models
Artificial intelligence algorithm models are having a significant impact on the financial industry, provided we understand the advantages and limitations of AI algorithms and the scenarios in finance where they are applicable. When we use AI algorithms in financial business, we expect to analyze and explain the model's results through measurable methods in order to optimize the algorithm, guide the business, and keep the model aligned with the business problem it is meant to solve; this is the problem of model interpretability analysis. Interpretability analysis helps model developers and business operations staff understand and explain which feature factors are and are not included in the model, and how features affect the model's results.
Broadly speaking, the purpose of model interpretability analysis is to characterize both the global impact of features on the model and the individualized impact of single features, so as to better understand model behavior and make corresponding business decisions. Machine learning models are usually divided into "white-box" models and "black-box" models.
White-box models
White-box models such as linear regression and Logistic regression have relatively weak predictive power but are simple and easy to understand; they capture linear relationships, such as the correlation between credit fraud and demographic characteristics or economic income. Tree models (RandomForest, XGBoost, LightGBM) are currently recognized in the industry as the most interpretable non-linear machine learning algorithms: each decision the model makes is presented as a decision sequence showing the basis for that decision, and the information-theoretic variable selection criteria built into tree models also help us understand which variables played a significant role in the decision process. White-box models have clear features, simple logic, and inherent interpretability.
For "white-box" models, we can provide analysis techniques along dimensions such as data distribution plots, feature importance, Embedding visualization, individual feature analysis (PDP), global analysis (SHAP), local feature information (LIME), and feature correlation (correlation coefficients).
- Feature importance: analyzed from two perspectives, gain importance and split importance. Gain importance is the reduction in average training loss when a feature is used for classification, indicating the feature's relative contribution to the model; split importance is the number of times a feature is used to split the training samples during decision-making. The two measure, from different angles, how important each feature is to model training.
- Individual feature analysis: feature importance tells us how strongly a feature influences the model, and SHAP can be used to assess global feature importance; to evaluate a single feature, PDP shows the marginal effect of that feature on the model's prediction and how it affects the output. By plotting a one- or two-dimensional relationship between the feature and the prediction target, we can understand the relationship between them, and single-feature analysis further verifies, beyond the interactions among many features, how strongly an individual feature influences the model (a minimal scikit-learn sketch follows this list).
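As referenced above, here is a minimal sketch of gain-based importance, permutation importance, and a PDP, assuming a recent scikit-learn (and matplotlib for the plot) is available; the dataset is synthetic and the feature index is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Synthetic stand-in for a tabular fraud dataset: 5 behavioural features, binary label
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity/gain-based importance built into the tree model
print("impurity-based importance:", np.round(model.feature_importances_, 3))

# Model-agnostic permutation importance: score drop when one feature is shuffled
perm = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print("permutation importance:", np.round(perm.importances_mean, 3))

# Partial dependence of the prediction on a single feature (index 0)
PartialDependenceDisplay.from_estimator(model, X, features=[0])
```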
Black-box models
"Black-box" models train on data with complex machine learning algorithms (such as deep neural networks); the input features pass through feature-engineering steps such as combination and dimension transformation, and the data and model must be analyzed and adjusted to ensure the results stay aligned with the problem being solved. However, the internal mechanism of such models is hard to understand: we cannot easily estimate how important each feature is to the prediction, explain the potential association between the model's results and the input feature attributes, or directly understand how different features interact with each other (and correlation in the data is not causation).
For "black-box" models, we can explain the model's effectiveness from the perspectives of deep network structure analysis and of how risk assessment results hit on the gang association graph.
The association graph provides rich capabilities for graph construction, graph query, graph analysis, graph embedding, and graph mining. Through scenario-oriented algorithms and models such as community detection, key-node analysis, association analysis, and abnormal network-structure discovery, it provides scenario-specific decision support for risk control, marketing, and other uses. For example, in credit card fraud identification, card-fraud gangs frequently transact with complicit "black" merchants, whereas during the normal card-maintenance stage the merchants a card transacts with are relatively fixed. By traversing the association graph, a fraud association graph built on these behavioral characteristics can be established to identify card-fraud gangs.
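A minimal sketch of this idea with networkx: shared merchants link cards into connected components that approximate candidate gangs; the card and merchant identifiers are hypothetical.

```python
import networkx as nx

# Hypothetical card-to-merchant transaction edges
edges = [
    ("card_A", "merchant_1"), ("card_B", "merchant_1"),
    ("card_B", "merchant_2"), ("card_C", "merchant_2"),
    ("card_D", "merchant_9"),  # isolated, normal-looking behaviour
]
G = nx.Graph(edges)

# Connected components approximate candidate gangs that share merchants
for component in nx.connected_components(G):
    print("community:", component)

# PageRank highlights the most central nodes within the graph
print(nx.pagerank(G))
```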
Risk control scenarios place key requirements on the explainability and stability of data, models, and features; explainability and stability still need to be judged in combination with business knowledge and expert experience.
5. Real-time model computation and prediction services
The risk control system has two technical routes: rules and models. Rules are simple, intuitive, highly explainable, and flexible to configure; their disadvantage is that they are easily probed, and once the black industry has bypassed them after repeated attempts, the pre-set rules become ineffective. In practice, rule-based risk control is therefore combined with model-based components to increase the robustness of the system.
Traditional machine learning models: the siloed development, testing, and deployment model has the following drawbacks:
Training data preparation is manual. Data cleaning and feature mining are the preliminary work of model training, covering both extracting feature data from raw data and processing the cleaned features. Because there is no unified feature management platform, algorithm engineers still have to collect, organize, and clean the raw data required for training themselves.
Model services remain in the offline prediction stage. Compared with offline prediction, real-time prediction can make fine-grained predictions by incorporating users' real-time behavioral data, effectively improving accuracy. However, real-time data is complex and changeable, real-time prediction imposes stricter performance requirements, and the engineering threshold is high, so many teams choose the easier offline prediction approach.
The development cycle of real-time model services is long. Real-time model services depend on steps such as real-time feature preparation, business logic development, and model invocation development. Real-time features are typically maintained by the developers of each individual project, which inevitably leads to duplicated feature development and wasted development and storage resources. In addition, a prediction scenario is usually backed by one model service, and every new model-service requirement is developed from scratch, so the development cycle is long.
Stream-batch integrated architecture: a relatively complete implementation adopts a dual-engine design of stream computing plus interactive analysis, with the stream computing engine handling the base data and the interactive analysis engine at the center. The stream computing engine performs real-time ETL on the data, reducing ETL latency compared with offline processing. The interactive analysis engine has its own storage; through co-optimization of computing and storage it achieves high write TPS, high query QPS, and low query latency, making the whole pipeline real-time and SQL-based. Real-time analysis and on-demand analysis can then respond quickly to business changes, and the combination of the two achieves a 1+1>2 effect.
In view of the industry status quo and drawbacks described above, it is imperative to provide stream-batch integrated real-time model computation and prediction services that fully consider actual business scenarios and technical indicators, offering an efficient and reliable infrastructure platform for scenario computing and model services.
Dynamically parse and load model files in a message-driven way, accurately restore the model's runtime pipeline, and build real-time model prediction services with high concurrency, low latency, and high availability.
Support mainstream AI algorithms, including feature engineering algorithms, classifier algorithms, regression algorithms, unsupervised algorithms, and so on.
Provide millisecond-level real-time computing services to efficiently meet the feature computation requirements of the online real-time model engine.
In terms of design, architecture, functionality, and performance, the specific capabilities should include the following:
Model performance: provide real-time prediction services with high concurrency, low latency, and high availability; 99.99% of requests complete within one hundred milliseconds, and TPS reaches over ten thousand;
High availability guarantee: ensure that the computing engine can continue to provide computing services when some nodes fail;
Model process replay: support dynamic, message-driven updates of model algorithm pipelines and efficiently instantiate the related algorithm components at runtime to provide real-time feature transformation and computation (a minimal hot-swap sketch follows this list);
Feature algorithm types: support common feature engineering algorithms, including feature encoding, feature transformation, feature dimensionality reduction, feature fusion, and so on;
AI algorithm types: support common classification, regression, and unsupervised algorithms, including but not limited to linear regression, Logistic regression, tree models, graph algorithms, SVM, neural networks, k-means, DBSCAN, and so on;
File types: support various model file formats, such as PMML, Python files, Parquet, and so on;
System monitoring: support monitoring of data-stream link performance, service components, client calls, system resource utilization, and so on;
Dynamic scaling: scale the system dynamically without interrupting service;
Data preheating: provide event-time-based data processing to preheat historical data.
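As referenced above, here is a minimal sketch of a message-driven model hot-swap; it assumes models arrive as pickled scikit-learn-style estimators, and the message format and file paths are hypothetical.

```python
import pickle
import threading

class ModelServer:
    """Sketch of a hot-swappable prediction service: a control message carries
    the path of a new model file, which is loaded and swapped in atomically
    without interrupting prediction requests."""

    def __init__(self, model_path: str):
        self._lock = threading.Lock()
        self._model = self._load(model_path)

    @staticmethod
    def _load(path: str):
        with open(path, "rb") as f:
            return pickle.load(f)  # e.g. a fitted scikit-learn estimator

    def on_model_update(self, message: dict) -> None:
        """Handler for a hypothetical 'model_updated' control message."""
        new_model = self._load(message["model_path"])  # parse and load outside the lock
        with self._lock:
            self._model = new_model  # atomic swap; in-flight requests finish on the old model

    def predict(self, features):
        with self._lock:
            model = self._model
        return model.predict([features])[0]
```

In a real deployment the update message would come from a message queue, and the model file could equally be a PMML document loaded by a PMML evaluator; the swap-under-lock pattern stays the same.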
6. Closed-loop case approval for operations and maintenance
The second law of thermodynamics in physics tells us that in an isolated system, if no external work is done, total disorder (entropy) keeps increasing. Matter always evolves toward higher entropy: a house gets messy if it is not cleaned, a phone gets slower and slower, earphone wires become tangled, hot water cools down, and the sun keeps burning and decaying... until the end of the universe: heat death.
In the anti-fraud field, any excellent model will see its limitations amplified as the business platform develops, behavior paths change, and fraud patterns evolve, and its predictive performance will inevitably regress toward mediocrity. Periodically assigning algorithm researchers to re-optimize existing models is neither an elegant nor an economical operation. Providing suspected-sample approval technology and mechanisms at the platform level, and feeding the approved case samples back into the model training set through machine-learning sampling techniques, can effectively solve the problem of stale samples, update fraud samples in the training set in a timely manner, and continuously maintain the model's training and prediction performance.
The suspected-sample approval support system includes, but is not limited to, OLAP online analysis, real-time reports, user profile tags, criminal fraud graphs, sample edge sampling, sample staining, and other technologies, which can effectively reduce the cost of manual review, relieve the pressure of tedious work, and improve labeling efficiency and effectiveness.
User profile tags form a multi-level tag system, including basic tags, entity association tags, sliding window tags, and so on.
Basic feature tags: include device type, device brand, device network mode, phone number, ID document type, ID document number, product, channel, geographic location, and so on;
Entity association feature tags: spatial network tags and session tags, including associated devices, associated accounts, account-operating devices, account-device network topology degree, and so on;
Sliding window feature tags: time-series tags, including commonly used IP addresses, commonly used devices, commonly used accounts, usual locations, commonly used channels per scenario, preferred times per scenario, behavior sequence frequency, attribute preferences, and so on (a minimal sliding-window sketch follows this list).
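As referenced above, a minimal sketch of one sliding-window tag with pandas: the number of distinct devices an account used in the last 7 days. The event data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical login events for one account: device used at each timestamp
events = pd.DataFrame({
    "device": ["d1", "d1", "d2", "d4", "d5"],
    "ts": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-02 09:30", "2023-01-05 23:10",
        "2023-01-06 01:00", "2023-01-06 02:15",
    ]),
}).set_index("ts")

# Encode devices numerically so the time-based rolling window can aggregate them
events["device_code"] = events["device"].astype("category").cat.codes.astype(float)

# 7-day sliding-window tag: number of distinct devices seen in the last 7 days
events["devices_7d"] = (
    events["device_code"].rolling("7D").apply(lambda s: s.nunique(), raw=False)
)
print(events[["device", "devices_7d"]])
```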
Criminal fraud graph community: cohesion indicators include association degree, centrality, network clustering, and PageRank.
Association indicators: one-degree association is the number of other nodes a node is adjacent to; in a directed network this further splits into in-degree and out-degree, corresponding to the numbers of incoming and outgoing links. Two-degree association is the number of nodes reachable through one intermediate node. One-degree and two-degree association are widely used in anti-fraud scenarios;
Centrality (node importance) indicators: for example, betweenness centrality and eigenvector centrality. Betweenness centrality measures how many shortest paths between other nodes must pass through node X; eigenvector centrality can be seen as an extension of one-degree association, since it considers not only how many neighbors a node has but also how important those neighbors are;
Network clustering: as "birds of a feather flock together", it examines what the nodes tightly connected to a given node have in common, which is often highly instructive for studying that node's characteristics. Algorithms that build on a small amount of labeled data include label propagation, triangle counting, connected components, and so on.
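A minimal sketch of these graph indicators with networkx; the edge list stands in for a hypothetical account association network.

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Hypothetical account-to-account association network
G = nx.Graph([
    ("a1", "a2"), ("a2", "a3"), ("a3", "a1"),   # tight triangle: candidate gang core
    ("a3", "a4"), ("a4", "a5"),                 # chain hanging off the triangle
    ("b1", "b2"),                               # unrelated pair
])

print("one-degree association (degree):", dict(G.degree()))
print("betweenness centrality:", nx.betweenness_centrality(G))
print("triangles per node:", nx.triangles(G))
print("connected components:", list(nx.connected_components(G)))
print("label propagation communities:", [set(c) for c in label_propagation_communities(G)])
```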
Xindun era machine learning engine
The Xindun machine learning engine provides an integrated solution combining systematic risk control models, whole-lifecycle management, and real-time model services.
Systematic risk control model
Anomaly detection model: uses unsupervised methods to effectively detect both group and individual fraud, capture new types of fraud samples, and supply them to the case platform.
Deep fraud detection model: comprehensively learns from incremental and historical fraud cases, deeply mining new nonlinear, high-dimensional combined fraud features to improve the effective fraud detection rate and reduce the false alarm rate.
Threshold-weight dynamic learning model: By incrementally learning new types of fraud cases, dynamically adjusting the threshold and weight ratio of specific rules in the rule set, and continuously optimizing the accuracy and timeliness of the risk evaluation engine results.
User profiling and fraud association graph: Provides a visual analysis tool for the aggregation of indicators from multiple business dimensions and the perspective of gang network fraud, offering scientific decision-making support for the judgment of model prediction results.
Whole lifecycle management
Relying on case approval technology and operational mechanisms, the training set is continuously and iteratively updated. Through lifecycle management functions such as model monitoring, model training, model selection, model release, and model optimization, operation and maintenance is elevated from micro-level rule adjustments to a macro-level management mechanism based on the model lifecycle.
Model real-time service
The Xindun online model serving engine, based on the Kappa architecture design philosophy, provides real-time feature computation and model prediction services that combine high concurrency, low latency, high availability, easy scaling, and hot deployment, efficiently evaluating and feeding back event risk in real time.
