
- Original case data annotation source
- In-depth mining and application of historical fraud cases
- Self-learning mechanism of expert experience rules
- Explainability of risk assessment results
- Real-time model computation and prediction services
- Closed-loop case approval for operations and maintenance
1. Original case data annotation source
- Advantages: relatively simple to implement, highly explainable, and able to respond quickly to financial-industry regulatory documents.
- Defects: it lags badly and cannot respond in time to new fraud methods; new rules are usually summarized and extracted only after substantial losses have already been incurred.
- As a post-event analysis technique, it is not real-time and cannot detect fraud while an event is in progress.
- As analysis and reasoning based on prior business experience, it lacks adversarial robustness; the model stays effective only for a short time and is easily bypassed by experienced, organized black-industry gangs.
- Because it is built on an assumed data distribution, the model usually loses its original predictive power once the business evolves and scenarios change; it has neither cross-scenario universality nor stability over time.
- Advantages: Fast development and deployment, strong explainability.
- Defects: the false positive rate is high, and it does not handle sudden changes in normal user behavior well. Over time the feature engineering grows increasingly complex, and prediction-time computational cost rises accordingly.
- Advantages: it supports an end-to-end (End to End) training process and effectively fits the regularities of user behavior.
- Defects: model maintenance costs are relatively high, and it is not well suited to large-scale time-series anomaly detection. It works well on periodic curves but may not fit spiky (burr-type) data; a minimal time-series anomaly-detection sketch follows this list.
- Advantages: a single deployed model can handle gang-fraud mining across multiple scenarios, keeping deployment and maintenance costs low.
- Defects: compared with scenario-specific gang-fraud detection, the model's recall suffers some loss, though within an acceptable range.
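As a simple illustration of the time-series approach mentioned in this list, the sketch below flags bursts in per-account activity with a rolling z-score; the data, window size, and threshold are hypothetical and only meant to show the idea, not any production model.

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(series: pd.Series, window: int = 60, threshold: float = 3.0) -> pd.Series:
    """Flag points that deviate from the rolling mean by more than `threshold` sigmas."""
    mean = series.rolling(window, min_periods=window).mean()
    std = series.rolling(window, min_periods=window).std()
    zscore = (series - mean) / std
    return zscore.abs() > threshold

# Hypothetical per-minute login counts for one account, with an injected burst
idx = pd.date_range("2023-01-01", periods=1440, freq="min")
counts = pd.Series(np.random.poisson(5, size=1440), index=idx)
counts.iloc[900] = 80  # injected anomaly
print(counts[rolling_zscore_anomalies(counts)])
```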
2. In-depth mining and application of historical fraud cases
- It discovers deep, complex, implicit fraud features and is more targeted than expert rules;
- The discovered rules are nonlinear and offer strong adversarial resistance compared with post-event analysis;
- It acts in both positive and negative directions; compared with the one-directional action of expert rules, this bidirectional nature helps hedge the model's overfitting risk;
- Through multi-layer cross-combination in a deep network, it further strengthens the adversarial resistance and durability of the rule set on the one hand, and improves the model's generalization ability and cross-scenario applicability on the other;
- It enables real-time online risk prevention and control, improving the timeliness of risk control by shifting from post-event analysis to in-event prevention, and a single model can cover fraud-type detection across all channels and scenarios, truly realizing "One Model Rules All". The drawbacks are that individual rules are less interpretable than in an expert rule model, and the algorithmic complexity is high.
3. Self-learning mechanism of expert experience rules
Against the background of the big data era, new means of financial crime and financial fraud emerge in an endless stream; they are increasingly technological, professional, large-scale, and concealed. Fraud has evolved from the individual operations of the past into large-scale, organized gang fraud, which seriously limits how long traditional expert knowledge bases and expert experience rules stay effective. Two problems urgently need to be solved:
- Driven by data and algorithms, automatically adjust the thresholds and weights in the scenario rule set to keep the rules continuously effective;
- Explore new, highly timely risk-control rules from different dimensions to continuously enrich and improve the anti-fraud rule set.
Rule thresholds and weights
Learning rule thresholds and weights involves feature discretization, feature selection, feature dimensionality reduction, weight parameter regression, and other steps.
- Information entropy: a measure of the purity of a sample set; it quantifies the uncertainty and randomness of a system.
- Chi-square binning: a feature discretization algorithm; it tests whether the observed distribution of the sample data differs significantly from the expected distribution, or whether two categorical variables are related or independent.
- Minimum entropy binning: a feature discretization algorithm and a typical top-down binning method; it starts by placing all values of the feature into a single bin and then splits that bin according to the minimum-entropy principle (a minimal sketch of entropy, Gini impurity, and one minimum-entropy split follows this list).
- Gini coefficient: a feature discretization criterion; the more mixed a set is internally, the larger the Gini coefficient, and the purer it is, the smaller the coefficient.
- Feature selection: choosing the features that carry classification power for the training data in order to improve the model's overall predictive ability; commonly used techniques include Lasso regression, feature importance, and feature collinearity checks.
- Lasso regression: it can in principle recover the non-zero coefficients from relatively few observations, but the number of samples still needs to be "sufficiently large"; otherwise the behavior of the L1 model is highly random.
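As referenced in the list above, here is a minimal sketch of information entropy, Gini impurity, and one top-down minimum-entropy split; the transaction amounts and fraud labels are hypothetical.

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    """Information entropy of a label array: higher means more uncertainty."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini(labels: np.ndarray) -> float:
    """Gini impurity: larger when the set is mixed, smaller when it is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def best_entropy_split(values: np.ndarray, labels: np.ndarray):
    """One step of top-down minimum-entropy binning: find the cut point that
    minimises the weighted entropy of the two resulting bins."""
    best_cut, best_score = None, float("inf")
    for cut in np.unique(values)[1:]:
        left, right = labels[values < cut], labels[values >= cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score

# Hypothetical feature (transaction amount) and fraud labels
amounts = np.array([10, 12, 15, 200, 220, 250, 30, 35, 400, 420])
is_fraud = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1])
print(entropy(is_fraud), gini(is_fraud))       # 1.0 and 0.5 on this balanced sample
print(best_entropy_split(amounts, is_fraud))   # a cut at 200 separates the classes here
```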
New risk control rules
New fraud rule mining is divided into two categories according to the type of values a rule handles:
- If the rule treats the association as the presence or absence of items, it is a Boolean association rule;
- If the rule describes associations between quantified items or attributes, it is a quantitative association rule.
The essence is frequent itemset mining; commonly used algorithms include Apriori and FP-Growth.
- The Apriori algorithm exploits the two properties of frequent itemsets to prune many irrelevant candidate sets, greatly improving efficiency. However, it is a candidate-elimination algorithm, and each elimination round requires a full scan of the data records, so it struggles on large data sets.
- The FP-Growth algorithm compresses the data records into a tree structure, so mining frequent itemsets needs only two scans of the data; it does not generate candidate sets at all, so it is more efficient.
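A minimal, self-contained Apriori sketch follows; the "transactions" are hypothetical sets of rule hits observed on confirmed fraud cases, and the code illustrates candidate generation with the Apriori property rather than a production implementation.

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Minimal Apriori: return frequent itemsets and their support.
    `transactions` is a list of sets of items (e.g. hit rule IDs per case)."""
    n = len(transactions)
    items = {frozenset([i]) for t in transactions for i in t}  # 1-itemset candidates
    frequent, k_sets = {}, items
    while k_sets:
        # Count support for this level: one scan over all transactions
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        current = {c: v / n for c, v in counts.items() if v / n >= min_support}
        frequent.update(current)
        # Generate (k+1)-candidates by joining frequent k-itemsets, keeping only
        # those whose every k-subset is frequent (the Apriori property)
        prev = list(current)
        k = len(prev[0]) + 1 if prev else 0
        k_sets = {
            a | b
            for a in prev for b in prev
            if len(a | b) == k
            and all(frozenset(s) in current for s in combinations(a | b, k - 1))
        }
    return frequent

# Hypothetical rule hits observed on confirmed fraud cases
cases = [
    {"new_device", "night_login", "high_amount"},
    {"new_device", "night_login"},
    {"night_login", "high_amount"},
    {"new_device", "night_login", "high_amount"},
]
for itemset, support in apriori(cases, min_support=0.5).items():
    print(set(itemset), round(support, 2))
```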
4. The problem of model interpretability in credit risk evaluation models
Artificial intelligence algorithm models are having a significant impact on the financial industry, provided we understand the advantages and limitations of AI algorithms and the scenarios in finance where they are applicable. When we use AI algorithms in financial business, we expect to analyze and explain the model's results through measurable methods in order to optimize the algorithm, guide the business, and keep the model aligned with the business problem it is meant to solve; this is the problem of model interpretability analysis. Interpretability analysis helps model developers and business operations staff understand and explain which feature factors are and are not included in the model, and how features affect the model's results.
Broadly speaking, the purpose of model interpretability analysis is to characterize both the global impact of features on the model and the individualized impact of single features, so as to better understand model behavior and make corresponding business decisions. Machine learning models are usually divided into "white-box" models and "black-box" models.
White-box models
White-box models such as linear regression and Logistic regression have relatively weak predictive power but are simple and easy to understand; they capture linear relationships, such as the correlation between credit fraud and demographic characteristics or economic income. Tree models (RandomForest, XGBoost, LightGBM) are currently recognized in the industry as the most interpretable non-linear machine learning algorithms: each decision the model makes is presented as a decision sequence showing the basis for that decision, and the information-theoretic variable selection criteria built into tree models also help us understand which variables played a significant role in the decision process. White-box models have clear features, simple logic, and inherent interpretability.
For "white-box" models, we can provide analysis techniques along dimensions such as data distribution plots, feature importance, Embedding visualization, individual feature analysis (PDP), global analysis (SHAP), local feature information (LIME), and feature correlation (correlation coefficients).
- Feature importance: analyzed from two perspectives, gain importance and split importance. Gain importance is the reduction in average training loss when a feature is used for classification, indicating the feature's relative contribution to the model; split importance is the number of times a feature is used to split the training samples during decision-making. The two measure, from different angles, how important each feature is to model training.
- Individual feature analysis: feature importance tells us how strongly a feature influences the model, and SHAP can be used to assess global feature importance; to evaluate a single feature, PDP shows the marginal effect of that feature on the model's prediction and how it affects the output. By plotting a one- or two-dimensional relationship between the feature and the prediction target, we can understand the relationship between them, and single-feature analysis further verifies, beyond the interactions among many features, how strongly an individual feature influences the model (a minimal scikit-learn sketch follows this list).
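As referenced above, here is a minimal sketch of gain-based importance, permutation importance, and a PDP, assuming a recent scikit-learn (and matplotlib for the plot) is available; the dataset is synthetic and the feature index is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Synthetic stand-in for a tabular fraud dataset: 5 behavioural features, binary label
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity/gain-based importance built into the tree model
print("impurity-based importance:", np.round(model.feature_importances_, 3))

# Model-agnostic permutation importance: score drop when one feature is shuffled
perm = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print("permutation importance:", np.round(perm.importances_mean, 3))

# Partial dependence of the prediction on a single feature (index 0)
PartialDependenceDisplay.from_estimator(model, X, features=[0])
```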
Black-box models
"Black-box" models train on data with complex machine learning algorithms (such as deep neural networks); the input features pass through feature-engineering steps such as combination and dimension transformation, and the data and model must be analyzed and adjusted to ensure the results stay aligned with the problem being solved. However, the internal mechanism of such models is hard to understand: we cannot easily estimate how important each feature is to the prediction, explain the potential association between the model's results and the input feature attributes, or directly understand how different features interact with each other (and correlation in the data is not causation).
For "black-box" models, we can explain the model's effectiveness from the perspectives of deep network structure analysis and of how risk assessment results hit on the gang association graph.
The association graph provides rich capabilities for graph construction, graph query, graph analysis, graph embedding, and graph mining. Through scenario-oriented algorithms and models such as community detection, key-node analysis, association analysis, and abnormal network-structure discovery, it provides scenario-specific decision support for risk control, marketing, and other uses. For example, in credit card fraud identification, card-fraud gangs frequently transact with complicit "black" merchants, whereas during the normal card-maintenance stage the merchants a card transacts with are relatively fixed. By traversing the association graph, a fraud association graph built on these behavioral characteristics can be established to identify card-fraud gangs.
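A minimal sketch of this idea with networkx: shared merchants link cards into connected components that approximate candidate gangs; the card and merchant identifiers are hypothetical.

```python
import networkx as nx

# Hypothetical card-to-merchant transaction edges
edges = [
    ("card_A", "merchant_1"), ("card_B", "merchant_1"),
    ("card_B", "merchant_2"), ("card_C", "merchant_2"),
    ("card_D", "merchant_9"),  # isolated, normal-looking behaviour
]
G = nx.Graph(edges)

# Connected components approximate candidate gangs that share merchants
for component in nx.connected_components(G):
    print("community:", component)

# PageRank highlights the most central nodes within the graph
print(nx.pagerank(G))
```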
Risk control scenarios place key requirements on the explainability and stability of data, models, and features; explainability and stability still need to be judged in combination with business knowledge and expert experience.
5. Real-time model computation and prediction services
The risk control system has two technical routes: rules and models. Rules are simple, intuitive, highly explainable, and flexible to configure; their disadvantage is that they are easily probed, and once the black industry has bypassed them after repeated attempts, the pre-set rules become ineffective. In practice, rule-based risk control is therefore combined with model-based components to increase the robustness of the system.
Traditional machine learning models: the siloed development, testing, and deployment model has the following drawbacks:
Training data preparation is manual. Data cleaning and feature mining are the preliminary work of model training, covering both extracting feature data from raw data and processing the cleaned features. Because there is no unified feature management platform, algorithm engineers still have to collect, organize, and clean the raw data required for training themselves.
Model services remain in the offline prediction stage. Compared with offline prediction, real-time prediction can make fine-grained predictions by incorporating users' real-time behavioral data, effectively improving accuracy. However, real-time data is complex and changeable, real-time prediction imposes stricter performance requirements, and the engineering threshold is high, so many teams choose the easier offline prediction approach.
The development cycle of real-time model services is long. Real-time model services depend on steps such as real-time feature preparation, business logic development, and model invocation development. Real-time features are typically maintained by the developers of each individual project, which inevitably leads to duplicated feature development and wasted development and storage resources. In addition, a prediction scenario is usually backed by one model service, and every new model-service requirement is developed from scratch, so the development cycle is long.
Stream-batch integrated architecture: a relatively complete implementation adopts a dual-engine design of stream computing plus interactive analysis, with the stream computing engine handling the base data and the interactive analysis engine at the center. The stream computing engine performs real-time ETL on the data, reducing ETL latency compared with offline processing. The interactive analysis engine has its own storage; through co-optimization of computing and storage it achieves high write TPS, high query QPS, and low query latency, making the whole pipeline real-time and SQL-based. Real-time analysis and on-demand analysis can then respond quickly to business changes, and the combination of the two achieves a 1+1>2 effect.
In view of the industry status quo and drawbacks described above, it is imperative to provide stream-batch integrated real-time model computation and prediction services that fully consider actual business scenarios and technical indicators, offering an efficient and reliable infrastructure platform for scenario computing and model services.
Dynamically parse and load model files in a message-driven way, accurately restore the model's runtime pipeline, and build real-time model prediction services with high concurrency, low latency, and high availability.
Support mainstream AI algorithms, including feature engineering algorithms, classifier algorithms, regression algorithms, unsupervised algorithms, and so on.
Provide millisecond-level real-time computing services to efficiently meet the feature computation requirements of the online real-time model engine.
In terms of design, architecture, functionality, and performance, the specific capabilities should include the following:
Model performance: provide real-time prediction services with high concurrency, low latency, and high availability; 99.99% of requests complete within one hundred milliseconds, and TPS reaches over ten thousand;
High availability guarantee: ensure that the computing engine can continue to provide computing services when some nodes fail;
Model process replay: support dynamic, message-driven updates of model algorithm pipelines and efficiently instantiate the related algorithm components at runtime to provide real-time feature transformation and computation (a minimal hot-swap sketch follows this list);
Feature algorithm types: support common feature engineering algorithms, including feature encoding, feature transformation, feature dimensionality reduction, feature fusion, and so on;
AI algorithm types: support common classification, regression, and unsupervised algorithms, including but not limited to linear regression, Logistic regression, tree models, graph algorithms, SVM, neural networks, k-means, DBSCAN, and so on;
File types: support various model file formats, such as PMML, Python files, Parquet, and so on;
System monitoring: support monitoring of data-stream link performance, service components, client calls, system resource utilization, and so on;
Dynamic scaling: scale the system dynamically without interrupting service;
Data preheating: provide event-time-based data processing to preheat historical data.
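As referenced above, here is a minimal sketch of a message-driven model hot-swap; it assumes models arrive as pickled scikit-learn-style estimators, and the message format and file paths are hypothetical.

```python
import pickle
import threading

class ModelServer:
    """Sketch of a hot-swappable prediction service: a control message carries
    the path of a new model file, which is loaded and swapped in atomically
    without interrupting prediction requests."""

    def __init__(self, model_path: str):
        self._lock = threading.Lock()
        self._model = self._load(model_path)

    @staticmethod
    def _load(path: str):
        with open(path, "rb") as f:
            return pickle.load(f)  # e.g. a fitted scikit-learn estimator

    def on_model_update(self, message: dict) -> None:
        """Handler for a hypothetical 'model_updated' control message."""
        new_model = self._load(message["model_path"])  # parse and load outside the lock
        with self._lock:
            self._model = new_model  # atomic swap; in-flight requests finish on the old model

    def predict(self, features):
        with self._lock:
            model = self._model
        return model.predict([features])[0]
```

In a real deployment the update message would come from a message queue, and the model file could equally be a PMML document loaded by a PMML evaluator; the swap-under-lock pattern stays the same.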
6. Closed-loop case approval for operations and maintenance
The second law of thermodynamics in physics tells us that in an isolated system, if no external work is done, total disorder (entropy) keeps increasing. Matter always evolves toward higher entropy: a house gets messy if it is not cleaned, a phone gets slower and slower, earphone wires become tangled, hot water cools down, and the sun keeps burning and decaying... until the end of the universe: heat death.
In the anti-fraud field, any excellent model will see its limitations amplified as the business platform develops, behavior paths change, and fraud patterns evolve, and its predictive performance will inevitably regress toward mediocrity. Periodically assigning algorithm researchers to re-optimize existing models is neither an elegant nor an economical operation. Providing suspected-sample approval technology and mechanisms at the platform level, and feeding the approved case samples back into the model training set through machine-learning sampling techniques, can effectively solve the problem of stale samples, update fraud samples in the training set in a timely manner, and continuously maintain the model's training and prediction performance.
The suspected-sample approval support system includes, but is not limited to, OLAP online analysis, real-time reports, user profile tags, criminal fraud graphs, sample edge sampling, sample staining, and other technologies, which can effectively reduce the cost of manual review, relieve the pressure of tedious work, and improve labeling efficiency and effectiveness.
User profile tags form a multi-level tag system, including basic tags, entity association tags, sliding window tags, and so on.
Basic feature tags: include device type, device brand, device network mode, phone number, ID document type, ID document number, product, channel, geographic location, and so on;
Entity association feature tags: spatial network tags and session tags, including associated devices, associated accounts, account-operating devices, account-device network topology degree, and so on;
Sliding window feature tags: time-series tags, including commonly used IP addresses, commonly used devices, commonly used accounts, usual locations, commonly used channels per scenario, preferred times per scenario, behavior sequence frequency, attribute preferences, and so on (a minimal sliding-window sketch follows this list).
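As referenced above, a minimal sketch of one sliding-window tag with pandas: the number of distinct devices an account used in the last 7 days. The event data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical login events for one account: device used at each timestamp
events = pd.DataFrame({
    "device": ["d1", "d1", "d2", "d4", "d5"],
    "ts": pd.to_datetime([
        "2023-01-01 10:00", "2023-01-02 09:30", "2023-01-05 23:10",
        "2023-01-06 01:00", "2023-01-06 02:15",
    ]),
}).set_index("ts")

# Encode devices numerically so the time-based rolling window can aggregate them
events["device_code"] = events["device"].astype("category").cat.codes.astype(float)

# 7-day sliding-window tag: number of distinct devices seen in the last 7 days
events["devices_7d"] = (
    events["device_code"].rolling("7D").apply(lambda s: s.nunique(), raw=False)
)
print(events[["device", "devices_7d"]])
```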
Criminal fraud graph community: cohesion indicators include association degree, centrality, network clustering, and PageRank.
Association indicators: one-degree association is the number of other nodes a node is adjacent to; in a directed network this further splits into in-degree and out-degree, corresponding to the numbers of incoming and outgoing links. Two-degree association is the number of nodes reachable through one intermediate node. One-degree and two-degree association are widely used in anti-fraud scenarios;
Centrality (node importance) indicators: for example, betweenness centrality and eigenvector centrality. Betweenness centrality measures how many shortest paths between other nodes must pass through node X; eigenvector centrality can be seen as an extension of one-degree association, since it considers not only how many neighbors a node has but also how important those neighbors are;
Network clustering: as "birds of a feather flock together", it examines what the nodes tightly connected to a given node have in common, which is often highly instructive for studying that node's characteristics. Algorithms that build on a small amount of labeled data include label propagation, triangle counting, connected components, and so on.
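A minimal sketch of these graph indicators with networkx; the edge list stands in for a hypothetical account association network.

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Hypothetical account-to-account association network
G = nx.Graph([
    ("a1", "a2"), ("a2", "a3"), ("a3", "a1"),   # tight triangle: candidate gang core
    ("a3", "a4"), ("a4", "a5"),                 # chain hanging off the triangle
    ("b1", "b2"),                               # unrelated pair
])

print("one-degree association (degree):", dict(G.degree()))
print("betweenness centrality:", nx.betweenness_centrality(G))
print("triangles per node:", nx.triangles(G))
print("connected components:", list(nx.connected_components(G)))
print("label propagation communities:", [set(c) for c in label_propagation_communities(G)])
```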
Xindun era machine learning engine
The Xindun machine learning engine provides an integrated solution combining systematic risk control models, whole-lifecycle management, and real-time model services.
Systematic risk control model
Anomaly detection model: uses unsupervised methods to effectively detect both group and individual fraud, capture new types of fraud samples, and supply them to the case platform.
Deep fraud detection model: comprehensively learns from incremental and historical fraud cases, deeply mining new nonlinear, high-dimensional combined fraud features to improve the effective fraud detection rate and reduce the false alarm rate.
Threshold-weight dynamic learning model: By incrementally learning new types of fraud cases, dynamically adjusting the threshold and weight ratio of specific rules in the rule set, and continuously optimizing the accuracy and timeliness of the risk evaluation engine results.
User profiling and fraud association graph: Provides a visual analysis tool for the aggregation of indicators from multiple business dimensions and the perspective of gang network fraud, offering scientific decision-making support for the judgment of model prediction results.
Whole lifecycle management
Relying on case approval technology and operational mechanisms, the training set is continuously and iteratively updated. Through lifecycle management functions such as model monitoring, model training, model selection, model release, and model optimization, operation and maintenance is elevated from micro-level rule adjustments to a macro-level management mechanism based on the model lifecycle.
Model real-time service
The Xindun online model serving engine, based on the Kappa architecture design philosophy, provides real-time feature computation and model prediction services that combine high concurrency, low latency, high availability, easy scaling, and hot deployment, efficiently evaluating and feeding back event risk in real time.
