I. Background and system security requirement analysis

1. Importance of the system

As shown in the figure above, this interface sits at a very important point for an e-commerce platform or online store: product categorization and search. Clear product categories help users quickly find the products they need, save browsing time, and improve shopping efficiency, which makes this a core link in generating GMV at checkout. Why, then, is product information such a popular target for crawlers on e-commerce platforms?

a. Data collection and analysis: crawled data is valuable for market research, competitive analysis, price comparison, and more. It reveals product trends, consumer preferences, and price fluctuations, which helps businesses make decisions and shape marketing strategies.

b. Price monitoring and dynamic adjustment: crawling makes it possible to track competitors' price changes in real time, so a business can promptly adjust its own pricing to market conditions, stay competitive, and better meet consumer needs.

2. Risk assessment

a. System security, including the various alarms that crawling triggers

b. Data security

c. Bandwidth and server resource consumption

d. Unfair competition, etc.

3. Crawling issues during the 618 promotion

This interface is somewhat special: it was migrated to the Color gateway only three client versions ago, while older versions still use a different physical gateway, which we will call the B gateway. In addition, for historical reasons, the B gateway exposes two separate interfaces to clients, one for GET and one for POST. So there are three interfaces in total.

Clients span multiple platforms: H5, WeChat mini program, Alipay mini program, Android, iOS, and RN (React Native).

a. The crawling curve is clearly visible in monitoring and performance alarms have fired, but it is not clear whether the crawler is logged in or not.

b. Backend service monitoring of these three interfaces does not distinguish between the Color gateway and the B gateway, so where the crawler traffic comes from is still to be determined.

c. Crawlers appear on the clients of every platform, and Android performance is the most affected.

d. Migration to the Color gateway is inconsistent across clients: some clients ran into problems and were downgraded to the B gateway. H5 and RN have no versioning issue and have fully switched over, while iOS, Android, and the mini programs still have old versions calling the old interfaces.

II. Anti-crawling strategy selection

1. Selectable anti-crawling strategies

Strategies available on the B gateway

a. Login state: intercept non-logged-in users

b. Anti-abuse strategy: frequency control per user pin

c. Security staff perform behavioral analysis and add suspicious pins to a blacklist

Shendun strategies available on the Color gateway

a. Login state: intercept non-logged-in users

b. Shendun anti-fraud capability: reinforcement

c. Shendun anti-fraud capability: anti-crawling

d. Shendun anti-fraud capability: small accounts

e. Shendun anti-fraud capability: risk control

2. How to choose: balancing user experience against anti-crawling, given the current state of the system

a. Since the login state had never been required historically, the first preference is not to enable it.

b. Check the version distribution and deal with old versions first (the old B gateway) to see the effect.

c. If a and b have no clear effect, consider enabling the login state on old versions.

d. If enabling the login state still does not solve the problem, assume the crawler is logged in and apply frequency control.

e. If it is still not solved, assume the crawler is targeting newer versions and put more effort into Color.

f. Contact the experienced Color team about the crawling issue; they suggest what can be done, and we evaluate their suggestions and apply for execution.

g. If nothing else works, consider enabling the login state on Color as well.

h. Throughout, the two sides are locked in a back-and-forth: when a strategy takes effect, the other side tries to crack it; we keep tightening, and they keep trying to crack it.
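The order in steps a to g works like a ladder that is climbed only as far as needed, starting with the measures least visible to users. The sketch below is a hypothetical illustration: the measure names in `LADDER` and the `is_effective` callback are placeholders for "apply the measure, watch monitoring, judge the result", not settings of the B or Color gateways.

```python
from typing import Callable, Optional

# Hypothetical labels for steps a-g, cheapest and least user-visible first.
LADDER: list[str] = [
    "blacklist_and_old_versions_first",  # a/b: keep login optional, handle the B gateway first
    "login_required_old_versions",       # c: enable login state on old versions
    "pin_frequency_control",             # d: assume logged-in crawlers, limit per pin
    "shendun_rules_on_color",            # e/f: shift effort to Color with the Shendun team
    "login_required_on_color",           # g: last resort
]


def escalate(is_effective: Callable[[str], bool]) -> Optional[str]:
    """Try measures in ladder order and stop at the first one judged effective.

    is_effective(measure) is a hypothetical hook standing in for applying the
    measure and checking monitoring. Returns the measure that worked, or None
    if the ladder is exhausted.
    """
    for measure in LADDER:
        if is_effective(measure):
            return measure
    return None


# Example: pretend only per-pin frequency control brings crawler traffic down.
print(escalate(lambda m: m == "pin_frequency_control"))  # pin_frequency_control
```

As step h notes, this is not a one-shot process: once the other side cracks a measure, climbing resumes from the rung that stopped working.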

III. Implement anti-crawling measures

Note: the monitoring referred to below comes in three kinds:

a. Backend service monitoring, broken down by client: covers all sources (the B gateway's GET and POST interfaces and Color)

b. The B gateway's own monitoring keys, broken down by client: cover only the B gateway (GET and POST are monitored separately)

c. The Color gateway's monitoring, broken down by client: covers only the Color gateway

1. We asked the B gateway's security staff to analyze the traffic. They identified some pins for the blacklist, but adding them had little visible effect, so they recommended checking the version distribution and enabling the login state.

2. We asked the event-tracking analysts for data, in which isColor=1 means the Color gateway and 0 the B gateway, and isLogin=1 means logged in and 0 not logged in.
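With the `isColor` and `isLogin` fields, tracking events can be split into four buckets (gateway × login state) to see where crawling concentrates. A minimal sketch, assuming each tracking event is a dict carrying those two fields as 0/1 integers; only the field names come from the tracking data, the event shape is an assumption.

```python
from collections import Counter


def segment(events: list[dict]) -> Counter:
    """Count tracking events per (gateway, login state) bucket.

    isColor: 1 = Color gateway, 0 = B gateway; isLogin: 1 = logged in, 0 = not.
    """
    counts: Counter = Counter()
    for event in events:
        gateway = "Color" if event["isColor"] == 1 else "B"
        login = "logged_in" if event["isLogin"] == 1 else "anonymous"
        counts[(gateway, login)] += 1
    return counts


sample = [
    {"isColor": 1, "isLogin": 0},
    {"isColor": 1, "isLogin": 0},
    {"isColor": 0, "isLogin": 1},
]
print(segment(sample).most_common(1))  # [(('Color', 'anonymous'), 2)]
```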

3. Looking at B gateway and Color gateway monitoring separately, we found crawlers on both. After confirming with the product team, we enabled login-state verification on the old platform, trying the Android client first to see the effect. In fact, some competitors also require login for this kind of interface.

Server-side monitoring for the Android client

4. After launch the effect was significant, so the login state was also enabled on the B gateway interface for all clients. H5 and the mini programs showed little effect, so we suspected logged-in crawlers, and the security staff continued their analysis.

Server-side monitoring for the H5 client

5. Suspecting logged-in crawlers, the security staff tried to identify risky pins. The plan was to analyze event-tracking behavior data under the conditions below, which yielded 35 pins; these were added to the blacklist after full exposure.

Data extraction strategy one

a. Event-tracking time range: 2023.05.30 2:00 - 2023.05.30 4:00

b. platcode = H5

c. The pin's requests carry no latitude/longitude parameters

d. The pin accessed more than 5 distinct store ids

Data extraction strategy two

a. Event-tracking time range: 2023.05.30 2:00 - 2023.05.30 4:00

b. platcode = H5

c. The pin made more than 100 requests

d. The pin accessed more than 10 distinct store ids
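Both extraction strategies are per-pin aggregations over the tracking events. A minimal sketch, assuming hypothetical event fields `pin`, `platcode`, `ts`, `store_id`, `lat`, and `lng`. It also assumes the window is 2:00 inclusive to 4:00 exclusive and reads "no latitude and longitude parameters" as "none of the pin's requests carried them"; both readings are assumptions, not details from the analysis.

```python
from collections import defaultdict
from datetime import datetime

# Window from the article, treated here as [02:00, 04:00) (assumption).
WINDOW_START = datetime(2023, 5, 30, 2, 0)
WINDOW_END = datetime(2023, 5, 30, 4, 0)


def per_pin_stats(events: list[dict]) -> dict:
    """Aggregate H5 events inside the window by pin (field names are hypothetical)."""
    stats: dict = defaultdict(lambda: {"count": 0, "stores": set(), "has_location": False})
    for e in events:
        if e["platcode"] != "H5" or not (WINDOW_START <= e["ts"] < WINDOW_END):
            continue
        s = stats[e["pin"]]
        s["count"] += 1
        s["stores"].add(e["store_id"])
        if e.get("lat") is not None and e.get("lng") is not None:
            s["has_location"] = True
    return stats


def strategy_one(events: list[dict]) -> set[str]:
    """No lat/lng on any request, and more than 5 distinct store ids."""
    return {pin for pin, s in per_pin_stats(events).items()
            if not s["has_location"] and len(s["stores"]) > 5}


def strategy_two(events: list[dict]) -> set[str]:
    """More than 100 requests, and more than 10 distinct store ids."""
    return {pin for pin, s in per_pin_stats(events).items()
            if s["count"] > 100 and len(s["stores"]) > 10}


ts = datetime(2023, 5, 30, 3, 0)
crawler = [{"pin": "p1", "platcode": "H5", "ts": ts, "store_id": i} for i in range(6)]
shopper = [{"pin": "p2", "platcode": "H5", "ts": ts, "store_id": 1, "lat": 39.9, "lng": 116.4}]
print(strategy_one(crawler + shopper))  # {'p1'}
```

The blacklist candidates would then be the union of both result sets, reviewed by the security staff before blocking.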

B gateway POST interface monitoring for the H5 client

6. At the same time, we asked why the B gateway's GET interface still had traffic: was it normal users, or entirely crawlers? We checked how each client calls the interface and confirmed that no client calls the old GET interface. To be rigorous, we also checked which platform the traffic came from: the logs showed H5, and H5 has no concept of versions. We therefore concluded that all of this traffic was from crawlers and decided to disable the GET interface outright.

B gateway GET interface monitoring
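The reasoning behind disabling the GET interface generalizes: if no current client version calls an interface, any traffic it still receives cannot come from your own clients. A minimal sketch; the interface names and the per-client usage map are hypothetical inventory data, not the actual call graph.

```python
def orphaned_interfaces(all_interfaces: set[str], client_usage: dict[str, set[str]]) -> set[str]:
    """Return interfaces that no client in client_usage calls."""
    used: set[str] = set().union(*client_usage.values())
    return all_interfaces - used


# Hypothetical inventory: which interfaces each client platform currently calls.
usage = {
    "H5": {"color"},
    "android": {"color", "b_post"},
    "ios": {"color", "b_post"},
}
print(orphaned_interfaces({"b_get", "b_post", "color"}, usage))  # {'b_get'}
```

The check is only as good as the inventory, which is why confirming it against the logs (as was done for H5) matters before blocking.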

7. The mini programs still showed no effect, and we confirmed that their traffic came through Color. By now everything the B gateway could do had been done, so the battle moved to the Color platform, and we contacted the Shendun team.

To start, we set aside enabling the login state and looked at which Shendun strategies could be adjusted and whether they would help.

8. The Shendun staff analyzed and verified the traffic, with these results:

a. Shendun identified requests without a uuid. Since every one of our clients always sends a uuid, these requests must come from crawlers. Intercepting requests with an empty uuid forces the black-market operators to send one, which lets Shendun's uuid-based anti-crawling model take effect.

b. Enabled stricter rules, gray-released to 10% of traffic

c. Added more detection strategies after anti-crawling was enabled
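Shendun's internals are not public, so the following is only a hypothetical sketch of the two ideas in items a and b: intercepting requests whose uuid is missing or blank, and applying stricter rules to a deterministic 10% slice of traffic. Hashing the uuid with SHA-256 to pick a bucket is an assumed gray-release technique, not necessarily what Shendun does; `decide` and its return labels are illustrative names.

```python
import hashlib


def missing_uuid(request: dict) -> bool:
    """True if the request has no uuid or only whitespace, i.e. it should be intercepted."""
    uuid = request.get("uuid")
    return uuid is None or not str(uuid).strip()


def in_gray_release(key: str, percent: int) -> bool:
    """Deterministically map `key` to a bucket 0-99; buckets below `percent` get the new rules."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent


def decide(request: dict) -> str:
    """Hypothetical pipeline: block missing uuids, then apply strict rules to a 10% slice."""
    if missing_uuid(request):
        return "intercept"
    return "strict_rules" if in_gray_release(request["uuid"], 10) else "normal_rules"


print(decide({}))  # intercept
```

Keying the bucket on the uuid keeps each device consistently inside or outside the gray slice, so the effect of the stricter rules can be compared between the two groups.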

Color gateway security monitoring results

9. Up to this point things were relatively under control, but the crawler side kept fighting back, cracking our defenses and changing tactics to keep crawling. After a few quiet days, a large burst of abnormal traffic hit the Android platform at 6:00 AM on June 6, followed by a gradually regular abnormal traffic curve, showing that the other side had cracked the defenses and resumed crawling. Moreover, Android tp99 reached twice its daily level and triggered an alarm; we suspected this was related to merchants with large amounts of product data.

Color gateway, Android client: abnormal traffic increase on June 6

Server-side monitoring, Android client: daily crawling resumed on June 7, indicating the defenses had been cracked
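tp99 is the 99th-percentile latency: 99% of requests complete at or below it. The Android alarm fired when tp99 reached about twice its daily level. A minimal sketch of such a check using the nearest-rank percentile; the `factor=2.0` default mirrors the incident, and the function names are hypothetical, not the monitoring platform's API.

```python
def tp99(latencies_ms: list[float]) -> float:
    """99th-percentile latency by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def tp99_alarm(latencies_ms: list[float], baseline_tp99_ms: float, factor: float = 2.0) -> bool:
    """Alarm when current tp99 reaches `factor` times the usual daily tp99."""
    return tp99(latencies_ms) >= factor * baseline_tp99_ms


print(tp99(list(range(1, 101))))  # 99
```

Integer arithmetic for the rank avoids floating-point surprises such as `0.99 * n` landing just below a whole number.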

After we reported this, the detection strategies were supplemented and tightened further. Separately, small-account detection had been identifying accounts without intercepting them, because our earlier application to enable small-account interception had not been approved. After further communication it was enabled and proved effective, as shown below:

Color gateway security monitoring: effect of small-account interception

10. However, crawlers remained on Android and performance had not recovered. We therefore tried frequency control: a limit of 10 requests per pin had no effect, and neither did 5 per pin. Finally, in the middle of the night, we decided to try enabling the login state on the Color gateway; the effect was immediate and performance recovered. This shows the crawlers were not logged in, which also explains why pin-based frequency control had little effect. We enabled it for jddjAPP (Android and iOS), the application whose performance was most affected; because Shendun operates at the interface + application level, both platforms were enabled at once.

Server-side monitoring for the Android client
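Why per-pin frequency control barely helped against these crawlers follows from how such a limiter is keyed: a request without a logged-in pin has nothing to count against. A minimal sliding-window sketch; the article gives the limits (10 and 5 per pin) but not the time window, so `window_s` is an assumption, and the class name is hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class PinRateLimiter:
    """Sliding-window limiter: at most `limit` requests per pin within `window_s` seconds."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits: dict = defaultdict(deque)  # pin -> timestamps of allowed requests

    def allow(self, pin: Optional[str], now: Optional[float] = None) -> bool:
        if not pin:
            # Not logged in: there is no pin to count against, so the request passes.
            return True
        now = time.monotonic() if now is None else now
        hits = self._hits[pin]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()  # drop timestamps that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = PinRateLimiter(limit=5, window_s=60.0)
print([limiter.allow("pin_a", now=0.0) for _ in range(6)])  # [True, True, True, True, True, False]
print(limiter.allow(None, now=0.0))  # True
```

The anonymous pass-through branch is exactly the gap that requiring login closed: once every request must carry a pin, per-pin limits and blacklists have something to act on.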

11. However, a problem appeared: when the clients migrated to the Color gateway, they had not added handling for the Color gateway's default interception response, so they displayed errors. Fortunately, the Color gateway can mock response parameters, and it occurred to me to mock the B gateway's login-interception output so the clients could handle it. The B gateway returns the following JSON:

{"type":"1","code":"201","msg":"For the safety of your account, please log in again"}

After these parameters were mocked, all clients recognized the response correctly, but a new problem emerged: the clients handled it inconsistently. On iOS the login-state interception worked poorly, and users could not be redirected to log in. Because Android and iOS share one application name on the Color gateway and the login state is configured per application, we had to turn the login state back off.
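The Color gateway's mock feature is configuration rather than code, but the contract it had to satisfy is simple: when a request is intercepted for login, return exactly the body clients already understand from the B gateway. A hypothetical sketch of that contract; `build_response` is an illustrative name, and only the JSON body is taken from the article.

```python
import json

# Body the B gateway returns when it intercepts a non-logged-in request (from the article).
B_GATEWAY_LOGIN_REQUIRED = {
    "type": "1",
    "code": "201",
    "msg": "For the safety of your account, please log in again",
}


def build_response(intercepted_for_login: bool, payload: dict) -> str:
    """Return the B-gateway login body when intercepted, otherwise the normal payload."""
    body = B_GATEWAY_LOGIN_REQUIRED if intercepted_for_login else payload
    return json.dumps(body, ensure_ascii=False)


print(build_response(True, {"items": []}))
```

Matching the body only makes the response recognizable; each client still has to act on it correctly, which is where iOS fell short.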

12. We tested and confirmed with the client teams whether any issues remained on the front ends, and how the login state should be handled going forward. The backend also discussed alternatives with the front ends. One option was to mock the anti-crawling response code 605, but the oldest of the three client versions that integrate Color does not include the anti-crawling SDK, and the gateway cannot restrict the login state by client version.

Given these issues, the feasible solution was to downgrade the iOS versions that had not yet integrated the anti-crawling SDK to the B gateway, so that Color could enable the login state.

This solution was not adopted, however: downgrading to the B gateway requires layer-by-layer approval, and the big promotion was nearly over, so we were concerned that approval would be too slow or that other unknown issues might arise.
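The downgrade option that was considered, but not adopted, boils down to version-based routing. A hypothetical sketch: the article does not say which iOS version first shipped the anti-crawling SDK, so the `"10.2.0"` threshold in the example is a placeholder, and the function names are illustrative.

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "10.2.1" into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def choose_gateway(platform: str, version: str, min_sdk_version: str) -> str:
    """Route iOS versions older than min_sdk_version to the B gateway, everything else to Color."""
    if platform == "ios" and parse_version(version) < parse_version(min_sdk_version):
        return "B"
    return "Color"


# Hypothetical: suppose the anti-crawling SDK first shipped in iOS 10.2.0.
print(choose_gateway("ios", "10.1.5", "10.2.0"))  # B
print(choose_gateway("ios", "10.2.0", "10.2.0"))  # Color
```

Comparing integer tuples avoids the string-comparison trap where "10.10" sorts before "10.9"; it does assume both versions have the same number of components, so real version strings should be normalized first.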

13. We continued consulting the Shendun team on the Color platform. Their proposal was to keep tightening the anti-crawling, reinforcement, and small-account rules while watching for customer complaints, and to pause tightening once the results were roughly acceptable.

14. For anti-crawling on the service side, we can set separate throttling thresholds for each client, so fixing a problem on one client does not affect the others and the impact is minimized. We can also scale out quickly.
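Per-client throttling thresholds can be sketched as one token bucket per client, so exhausting Android's budget never consumes iOS's or H5's. A minimal sketch; the client names and rates are hypothetical, and the article does not say which throttling algorithm the service actually uses.

```python
import time
from typing import Optional


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request spends one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientThrottle:
    """One independent bucket per client, so one client's limit never affects another."""

    def __init__(self, limits: dict[str, tuple[float, float]]):
        self.buckets = {client: TokenBucket(rate, cap) for client, (rate, cap) in limits.items()}

    def allow(self, client: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client)
        # Clients without a configured threshold are not throttled here (an assumption).
        return True if bucket is None else bucket.allow(now)


throttle = PerClientThrottle({"android": (1.0, 2), "ios": (100.0, 100)})
print([throttle.allow("android", now=0.0) for _ in range(3)])  # [True, True, False]
print(throttle.allow("ios", now=0.0))  # True
```

Keeping the thresholds in a per-client map means tightening Android during an attack is a one-entry change that leaves every other client untouched.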

IV. Anti-crawling effect of the Color gateway

Overall, the anti-crawling capabilities of Color's Shendun are very strong and clearly effective: for every round of attack and defense, they had a strategy to respond. A big thumbs up and thanks to the relevant staff, and to all the partners who worked together with us on anti-crawling.

V. Summary

This anti-crawling experience leaves many points to reflect on, summarize, and improve. Several are listed below for your reference; I hope this article is helpful.

1. Prevent problems before they happen: put basic anti-crawling measures in place in advance for core system interfaces so you are not caught off guard.

2. When multiple gateways are involved, work out the sources and characteristics of the crawlers, then handle them in a targeted way using each physical gateway's anti-crawling capabilities.

3. Test in advance to determine the impact of enabling the login state: whether each client supports it and whether the user experience suffers. The same applies to other anti-crawling measures.

4. WeChat Mini Programs cannot enforce mandatory login; doing so may be deemed a violation and penalized.

5. Previously we knew little about Color's anti-crawling capabilities and underestimated them, assuming that once they were enabled there was nothing more they could do against crawlers. In fact, Color has many strategies that are not publicly exposed.

6. Do not assume crawlers must be eliminated completely: the stricter the rules, the more likely legitimate users are blocked by mistake. Strike a balance between user experience and anti-crawling, raising the crawler's cost as much as possible while preserving the user experience, to minimize the crawler's impact on the system.

7. The backend needs the ability to set separate throttling thresholds for each client, so that fixing a problem on one client does not affect the others and the impact is minimized.

Author: JD Retail, Lian Yingying
Source: JD Cloud Developer Community. Please indicate the source of the reprint.
