1. Background
The search and recommendation architecture serves all search and recommendation businesses of the JD Group, returning processed results to upstream callers in real time. The department's subsystems already implement CPU-based adaptive throttling, but calls from the Client side to the Server side still use round-robin (RR) polling, which ignores performance differences among downstream machines. As a result, the cluster's overall CPU cannot be fully utilized, and CPU usage across Server-side machines is unbalanced.
The load balancing method developed by the JD advertising department for its own business scenario is instructive here. They proposed the RALB (Remote Aware Load Balance) algorithm, which improves the CPU resource efficiency of downstream service clusters, avoids the CPU weakest-link effect, and lets better-performing machines handle more traffic. We applied its core idea to our system and obtained good results.

The structure of this article is as follows:
1. Introduction to RALB
◦ A brief introduction to the principle of the algorithm.
2. Function Verification
◦ The RALB load balancing technique is applied to the search and recommendation architecture system for functional verification.
3. Throughput Testing
◦ This section compares the RALB and RR load balancing techniques. It verifies that their throughput shows no significant difference when the cluster is either not throttled at all or fully throttled. While RR is only partially throttled, the two differ in throughput, with a point of maximum difference. Under RALB, the Server side passes from no throttling to full throttling at a single turning point, with almost no partial-throttling stage.
4. Boundary Testing
◦ Various boundary conditions are simulated to verify the stability and reliability of RALB.
5. Function Launch
◦ The RALB load balancing mode is fully enabled on all Server-side clusters. Before and after launch, the Server-side QPS gradually becomes stratified while the Server-side CPU usage gradually converges.
2. Introduction to RALB
RALB is a high-performance load balancing algorithm that takes balanced CPU usage as its goal.
2.1 Algorithm Objective
1. Adjust Server-side CPU usage so that CPU is relatively balanced across nodes, preventing excessively high CPU usage from triggering cluster flow control
2. QPS and CPU usage are linearly related, so adjusting a node's QPS adjusts its CPU usage; for example, under this linear relation, a node running at 60% CPU while the cluster averages 50% should have its share of traffic cut by roughly one sixth
2.2 Algorithm Principle
2.2.1 Algorithm Steps
1. Traffic distribution: requests are distributed according to node weights (weighted random, wr); see the sketch below
2. CPU collection: the Server side feeds its CPU usage (1 s average) back to the Client side through RPC responses
3. Weight adjustment: every 3 s, each node's weight is adjusted according to the windowed average CPU of the cluster and of each node, so that node CPUs stay balanced
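A minimal weighted-random selection sketch in Python for step 1; the node IPs, weights, and the pick_node helper are illustrative assumptions, not the article's implementation:

import random

def pick_node(nodes, weights):
    # Weighted random (wr): a node's chance of being picked is
    # proportional to its current weight.
    return random.choices(nodes, weights=weights, k=1)[0]

# Example: three downstream nodes with unequal weights.
nodes = ["11.17.80.238", "11.18.159.191", "11.17.191.137"]
weights = [0.2, 0.4, 0.4]
print(pick_node(nodes, weights))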
2.2.2 Indicator Dependency
Number | Indicator | Function | Source |
---|---|---|---|
1 | IP | Available IP list | Maintained by the service registration/discovery and fault-shielding modules |
2 | Real-time health index | Reflects real-time changes in IP availability and provides boundary conditions for the algorithm | Maintained by the RPC framework's health-check function |
3 | Historical health index | Used to judge boundary conditions such as IP failure and recovery | Historical values of indicator 2 |
4 | Dynamic target (CPU usage) | Provides the most direct basis for the load balancing objective | Collected periodically on the Server side and fed back through RPC |
5 | Weight | Real-time basis for traffic distribution | Updated by the algorithm |
2.2.3 Weight Adjustment Algorithm
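One plausible form of the weight-adjustment rule, consistent with the steps in Section 2.2.1 (an illustrative assumption, not necessarily the article's exact formula), scales each node's weight by the ratio of the cluster-average CPU to the node's own CPU over the feedback window:

w_i(t+1) = clamp(w_i(t) × C_avg / C_i, w_min, w_max), normalized so that Σ w_i = 1

where C_i is node i's windowed average CPU and C_avg is the cluster average: a node running hotter than average loses weight, and a cooler one gains it.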
2.2.4 Boundary Handling
Boundary 1: if a downstream IP receives no requests within the feedback window (3 s), its average CPU reading is 0, and the weighting algorithm would mistake the node for an extremely well-performing one and raise its weight
Boundary 2: when a network failure occurs, the RPC framework marks the faulty node unavailable and sets its CPU and weight to 0; after the network recovers, the framework marks the IP available again, but a node with weight 0 receives no traffic, so it would stay effectively unavailable
Handling: weight updates are driven by a timer; the updater records each node's availability status, assigns a low weight when a node transitions from unavailable back to available, and lets the weight recover gradually
2.3 Key to Implementation
The implementation must be both fast and stable, never getting stuck or collapsing in any situation; boundary conditions in particular must be handled well.
Algorithm Key Points:
1. Keep the meaning and update mechanism of each factor in the formula independent, to keep the algorithm reliable and simple
◦ The IP list is kept up to date jointly by service registration/discovery and the RPC framework
◦ CPU values are updated through RPC responses
2. Pay attention to the meaning of boundary values and distinguish them from ordinary continuous values
◦ CPU = 0 means "unknown"; it does not mean the node's CPU performance is good
◦ w = 0 means the node receives no traffic and should occur only when the node is unavailable; an available node must keep at least a small positive weight so that RPC calls can still reach it and its weight can be updated
3. Weight updates must not rely on RPC triggers; they should run on a regular timer, as in the sketch below
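The points above can be combined into a minimal timer-driven weight-update sketch in Python. The Node structure, the constants, and the cluster-average/node-CPU scaling rule are illustrative assumptions based on Sections 2.2.1-2.2.4, not the article's implementation:

import threading
from dataclasses import dataclass

@dataclass
class Node:
    ip: str
    healthy: bool = True
    cpu: float = 0.0          # 0 means "unknown", per key point 2
    weight: float = 0.0
    was_down: bool = False    # availability recorded at the previous tick

W_MIN, W_RECOVER = 0.01, 0.05  # assumed floor and recovery weights

def update_weights(nodes, interval=3.0):
    # Timer-driven update (key point 3): runs every 3 s regardless of RPC traffic.
    known = [n for n in nodes if n.healthy and n.cpu > 0]
    cpu_avg = sum(n.cpu for n in known) / len(known) if known else 0.0
    for n in nodes:
        if not n.healthy:
            n.weight = 0.0                    # w = 0 only when unavailable
        elif n.was_down:
            n.weight = W_RECOVER              # boundary 2: recovered node starts low
        elif n.cpu <= 0:
            n.weight = max(n.weight, W_MIN)   # boundary 1: unknown CPU is not "good" CPU
        else:
            # Assumed rule from 2.2.3: scale weight by cluster-average CPU / node CPU,
            # keeping at least the floor while the node is available.
            n.weight = max(W_MIN, n.weight * cpu_avg / n.cpu)
        n.was_down = not n.healthy
    total = sum(n.weight for n in nodes)
    if total > 0:
        for n in nodes:
            n.weight /= total                 # normalize for weighted-random selection
    threading.Timer(interval, update_weights, args=(nodes, interval)).start()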
3. Function Verification
3.1 Load Testing Preparation
Module | IP | CPU Cores |
---|---|---|
Client Side | 10.173.102.36 | 8 |
Server Side | 11.17.80.238 | 8 |
Server Side | 11.18.159.191 | 8 |
Server Side | 11.17.191.137 | 8 |
3.2 Load Testing Data
Since the performance gap between the machines is small, the CPU effect of the load test is not obvious. To make it more visible, an initial load is applied to node 11.17.80.238 (i.e., even with no traffic, its CPU utilization is 12.5%).
3.3 Load Testing Conclusion
Load testing shows that both RR and LA suffer from CPU imbalance: differences in machine resources create a weakest-link effect, so resources cannot be fully utilized.
RALB takes CPU as its balancing target, adjusting in real time the QPS each node handles according to that node's CPU, and thereby balancing CPU across nodes. The function is verified to work, and the CPU behavior meets expectations.
4. Throughput Testing
4.1 Load Testing Objective
RALB is a load balancing algorithm that uses CPU utilization as its dynamic indicator. It effectively solves CPU imbalance, avoids the CPU weakest-link effect, and lets better-performing machines handle more traffic. We therefore expect the RALB strategy to deliver some throughput improvement over the RR round-robin strategy.
4.2 Load Testing Preparation
The test uses 100 Server-side machines. The Server side applies pure CPU-based adaptive throttling, with the throttling threshold configured at 55% CPU utilization.
4.3 Load Testing Data
The load test compares the trend of Server-side throughput under the RALB and RR load balancing modes and analyzes the impact of each strategy on cluster throughput.
4.3.1 RALB
4.3.1.1 Throughput Data
The table below shows Server-side throughput data under the RALB load balancing mode, collected from the load-testing Client side. At 18:17 the Server side is just at the edge of throttling. Across the whole test, three cases were covered: no throttling, partial throttling, and full throttling.
Time | 17:40 | 17:45 | 17:52 | 18:17 | 18:22 |
---|---|---|---|---|---|
Total Traffic | 2270 | 1715 | 1152 | 1096 | 973 |
Processed Traffic | 982 | 1010 | 1049 | 1061 | 973 |
Throttled Traffic | 1288 | 705 | 103 | 35 | 0 |
Throttling Ratio | 56.74% | 41% | 8.9% | 3.2% | 0% |
Average CPU Usage | 55% | 55% | 54% | 54% | 49% |
4.3.1.2 Indicator Monitoring
The traffic received by the Server side machines is distributed according to performance, and the CPU remains balanced.
4.3.2 RR
4.3.2.1 Throughput Data
The table below shows Server-side throughput data under the RR load balancing mode, collected from the load-testing Client side. The total traffic at 18:46 is close to the total traffic at 18:17 in the RALB test; these two moments are compared in detail below.
Time | 18:40 | 18:46 | 19:57 | 20:02 | 20:04 | 20:09 |
---|---|---|---|---|---|---|
Total Traffic | 967 | 1082 | 1149 | 1172 | 1263 | 1314 |
Processed Traffic | 927 | 991 | 1024 | 1036 | 1048 | 1047 |
Throttled Traffic | 40 | 91 | 125 | 136 | 216 | 267 |
Throttling Ratio | 4.18% | 8.4% | 10.92% | 11.6% | 17.1% | 20.32% |
Average CPU Usage | 45% (partial throttling) | 51% (partial throttling) | 53% (partial throttling) | 54% (close to full throttling) | 55% (full throttling) | 55% (full throttling) |
4.3.2.2 Indicator Monitoring
The traffic received by the Server side is balanced, but there is a difference in CPU.
4.4 Load Testing Analysis
4.4.1 Throughput Curve
Based on the load testing data in Section 4.3, the Server-side throughput curves are plotted below to compare the throughput trends under the RALB and RR load balancing modes.
import matplotlib.pyplot as plt

# Throughput curves from the tables in Section 4.3, in units of 100 QPS
# (e.g. 22.7 corresponds to total traffic 2270 at 17:40). The leading integer
# points fill in the no-throttling stage, where processed QPS equals received QPS.
ralb_total = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.73, 10.958, 11.52, 17.15, 22.7]
ralb_processed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 9.73, 10.61, 10.49, 10.10, 9.82]
rr_total = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9.674, 10.823, 11.496, 11.723, 12.639, 13.141, 17.15, 22.7]
rr_processed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9.27, 9.91, 10.24, 10.36, 10.48, 10.47, 10.10, 9.82]

plt.plot(ralb_total, ralb_processed, 'r-o', label='RALB')
plt.plot(rr_total, rr_processed, 'g-o', label='RR')
plt.xlabel('Received QPS (x100)')
plt.ylabel('Processed QPS (x100)')
plt.legend()
plt.show()
4.4.2 Curve Analysis
Load Balancing Strategy | RALB | RR |
---|---|---|
Stage 1: All Machines Not Throttled | Received QPS = Processed QPS, represented by the straight line y = x | Received QPS = Processed QPS, represented by the straight line y = x |
Stage 2: Partial Machine Throttling | RALB allocates traffic according to downstream CPU, and the downstream throttles according to CPU, so in theory the downstream CPUs stay consistent and all machines reach the throttling threshold at the same time; a state where only some machines are throttled barely occurs. In the figure, the transition from no throttling to full throttling is therefore a single turning point with no smooth transition stage. | Under RR, the QPS allocated to each downstream machine is the same, but since machines throttle according to their own CPU, they start throttling at different times. RR therefore begins throttling earlier than RALB, and until full throttling is reached its throughput is always lower than RALB's. |
Stage 3: Full Machine Throttling | After all machines reach the 55% throttling threshold, the processed QPS should in theory stay constant no matter how much more traffic arrives. The figure shows a slight decline in processed QPS, because rejecting throttled requests also consumes some CPU. | RR reaches full throttling later than RALB; once both are fully throttled, their processed QPS is the same. |
4.5 Load Testing Conclusion
Critical point: the point of maximum throughput difference, i.e., the turning point at which RALB goes from no throttling to full throttling.
The analysis above shows that the throughput difference between RR and RALB is largest at this critical point.
At this point, calculation shows that RALB improves Server cluster throughput by 7.06%.
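The 7.06% figure follows from the two tables above: at RALB's turning point (18:17) the cluster's processed traffic is 1061, while RR at comparable total traffic (18:46) processes 991. A quick check in Python:

# Processed traffic at the two comparable moments (Sections 4.3.1 and 4.3.2).
ralb, rr = 1061, 991
print(f"RALB throughput gain over RR: {ralb / rr - 1:.2%}")  # -> 7.06%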
5. Boundary Testing
Various boundary conditions are simulated to verify the stability of the system at the boundaries.
Boundary Condition | Load Testing Scenario | Load Testing Conclusion |
---|---|---|
Downstream node throttling | CPU throttling | Adjustment of the penalty factor has an important impact on traffic distribution |
Downstream node throttling | QPS throttling | Meets expectations |
Downstream node timeout | Each request on the Server side sleeps a fixed 1 s to simulate timeouts | Traffic allocated to the node while requests continuously time out is essentially 0 |
Abnormal exit of a downstream node | The Server-side process is killed directly with kill -9 pid | After the process is killed and automatically restarted, traffic distribution recovers quickly |
Addition and removal of downstream nodes | The Server side is manually taken offline and brought back online via jsf | The node receives no traffic while it is offline in jsf |
Server-side restart (stop + start) | Normal unregistration and registration of the Server-side process | Traffic distribution meets expectations |
6. Function Launch
The Client side at the Suqian data center was configured for launch, with the RALB load balancing mode fully enabled on all Server-side clusters. Before and after launch, the Server-side QPS gradually becomes stratified while the Server-side CPU usage gradually converges.