API Attack Analysis: Hackers exploit a large number of IPs for low-frequency crawling of sensitive data.


Recently, an Internet company captured a data-leakage risk event while using the Yongan Online API Security Control Platform. Attackers used a large number of dynamic dial-up proxy IP addresses to carry out a low-frequency crawling attack against sensitive data exposed by the company's user-information query API.

The Yongan Online API security team responded promptly, located the vulnerable API, extracted the attack's characteristic signatures, and helped the company adjust its security response strategy in time, averting a large-scale leak of user information.

Event Analysis


According to available information, the company's website provides a public API for querying candidate information. By entering a candidate's name and identity information, users can obtain sensitive data including personal photos, exam scores, admission ticket numbers, certificate numbers, and school affiliations. Candidate information is extremely valuable both to fraud gangs running telecom scams and to recruitment agencies doing targeted marketing.

In 2016, a fraudster who had obtained a student's candidate information carried out a scam under the pretext of distributing a scholarship, and the student unfortunately died. In 2019, a provincial Putonghua score-query website exposed candidate information due to improper configuration, leaking the records of 405 candidates. The Yongan Online Data Asset Leakage Risk Monitoring Platform also regularly detects candidate information being sold on the dark web and in QQ groups.

Given the sensitivity of the data, the website's query API does use an image captcha for human-machine verification and applies per-IP rate limiting, but this is far from enough. Tempted by high-value data, attackers bypassed these protections using captcha-solving platforms and a large pool of dynamic dial-up proxy IPs, then launched a low-frequency crawling attack on the query API to harvest candidate information.

Below, we reconstruct the attack from several perspectives:

1. From the perspective of attack time

The attack traffic closely tracks working hours and is concentrated in the daytime.

To make the attack traffic look more 'normal', attackers typically operate during working hours (i.e., the daytime) and throttle their requests at night, so that traffic in the abnormal nighttime window does not stand out to the enterprise as a data anomaly; this lets the crawl run undetected for longer.

[Figure: hourly distribution of attack requests]
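As a minimal sketch of how this signal could be checked, the snippet below buckets request timestamps by hour of day and computes the share of traffic inside working hours. The function names and the 9:00–18:00 window are illustrative assumptions, not part of the platform described above.

```python
from collections import Counter
from datetime import datetime

def hourly_request_profile(timestamps):
    """Bucket request timestamps by hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)

def daytime_ratio(profile, start=9, end=18):
    """Fraction of requests falling inside working hours [start, end)."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    day = sum(c for h, c in profile.items() if start <= h < end)
    return day / total

# Synthetic example: requests concentrated in working hours, one at night
reqs = [datetime(2022, 4, 27, h) for h in [9, 10, 10, 11, 14, 15, 16, 2]]
profile = hourly_request_profile(reqs)
print(round(daytime_ratio(profile), 3))  # 7 of 8 requests fall in 9:00-18:00
```

A crawl whose daytime ratio matches legitimate traffic, as in this incident, will not be caught by time-of-day checks alone and needs the additional signals discussed below.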

2. From the perspective of the ID numbers and names used by the attacker

The ID-number/name pairs used in this attack correspond strictly one-to-one: no ID number maps to multiple names, and no name maps to multiple ID numbers. The query success rate was 97%, which rules out brute-force guessing.

Analysis shows the attacker's average crawling success rate was 17.8%. It is therefore inferred that this batch of ID-number/name pairs was obtained from other platforms and replayed against this one.

Query success rate: the ID number and name match, so a query can be successfully initiated on the website.

Crawling success rate: after a query is successfully initiated, the website responds with the corresponding candidate's information.
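The two rates above can be computed directly from request logs. The sketch below assumes a hypothetical log format with two boolean fields per request; the field names and the toy numbers (chosen to mirror the 97% / 17.8% figures in this incident) are illustrative.

```python
def success_rates(records):
    """records: dicts with 'query_ok' (name/ID pair accepted by the site)
    and 'data_returned' (candidate info actually returned)."""
    total = len(records)
    query_ok = sum(r["query_ok"] for r in records)
    crawled = sum(r["data_returned"] for r in records)
    return query_ok / total, crawled / total

# Toy log: 97% of pairs are valid, but only some candidates exist on this site
log = [{"query_ok": True, "data_returned": i < 178} for i in range(970)]
log += [{"query_ok": False, "data_returned": False} for _ in range(30)]
q, c = success_rates(log)
print(q, round(c, 3))  # 0.97 query success, 0.178 crawling success
```

A high query success rate combined with a much lower crawling success rate is exactly the signature of credentials harvested elsewhere and replayed here, rather than brute force.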

3. From the perspective of the source IP information used by the attacker

To bypass the per-IP rate limit, the attackers relied on captcha-solving platforms and dynamic proxy services, using 12,790 dynamic dial-up proxy IPs to crawl the query API. Each IP issued 1.96 requests on average, and no single IP issued more than 10.

[Figure: distribution of request counts per attacking IP]
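Because every individual IP stays under any reasonable per-IP threshold, detection has to aggregate at the endpoint level: count distinct source IPs and the per-IP request distribution for the API as a whole. A minimal sketch, with an assumed list-of-IPs log format:

```python
from collections import Counter

def per_ip_stats(ip_log):
    """Summarise a request log (one source IP per request)."""
    counts = Counter(ip_log)
    total = len(ip_log)
    avg = total / len(counts)
    return len(counts), avg, max(counts.values())

# A distributed low-frequency pattern: many IPs, ~2 requests each
ip_log = [f"10.0.{i // 250}.{i % 250}" for i in range(1000)] * 2
n_ips, avg, peak = per_ip_stats(ip_log)
print(n_ips, avg, peak)  # 1000 distinct IPs, average 2.0, peak 2
```

A sudden jump in distinct IPs hitting one API, while the per-IP average stays near 2, is the pattern seen in this incident and would never trip a per-IP limit.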

Tracing the source IPs of this attack against Yongan Online's risk intelligence shows that 93.79% of them had been active on the Abuyun platform within the past month, so the attackers most likely launched the attack using IP resources provided by the Abuyun proxy service.

[Figure: overlap between attacking IPs and Abuyun proxy IPs]

In addition, looking at the geographic distribution of the IPs, nearly half came from Anhui and Jiangsu Provinces, with the rest scattered across many regions.

[Figure: geographic distribution of attacking IPs]

However, checking the geographic consistency between each ID number and its source IP shows that, of 25,146 requests, 23,227 (92.37%) had an ID-number home region that did not match the IP's location, indicating that the attackers did not bother to match source IPs to the home regions of the ID numbers they queried.

Such a high proportion of mismatches between ID home regions and IP locations is a signal that security operations staff should watch closely in daily operations.
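This consistency check can be sketched as follows. In Chinese ID numbers the leading digits encode the issuing region (GB/T 2260 codes, e.g. 34 for Anhui, 32 for Jiangsu); the tiny lookup tables below are hypothetical stand-ins for a full administrative-division table and a GeoIP database.

```python
# Hypothetical lookup tables for illustration only; a real system would use
# a GeoIP database and the full GB/T 2260 administrative-division codes.
ID_PROVINCE = {"32": "Jiangsu", "34": "Anhui", "44": "Guangdong"}
IP_PROVINCE = {"1.2.3.4": "Anhui", "5.6.7.8": "Beijing"}

def is_consistent(id_number, ip):
    """True if the ID's home province matches the source IP's province."""
    id_prov = ID_PROVINCE.get(id_number[:2])
    ip_prov = IP_PROVINCE.get(ip)
    return id_prov is not None and id_prov == ip_prov

# Masked, synthetic ID numbers; one request matches, one does not
requests = [("34xxxxxxxxxxxxxxxx", "1.2.3.4"), ("32xxxxxxxxxxxxxxxx", "5.6.7.8")]
mismatches = sum(not is_consistent(i, ip) for i, ip in requests)
print(mismatches / len(requests))  # 0.5 of requests are inconsistent
```

A mismatch ratio above 90%, as in this incident, is far outside what ordinary users (occasionally travelling or on mobile networks) would produce.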

4. From the perspective of attacker behavior

Compared with normal user queries, the attackers never requested the query page itself. Moreover, in normal use, calls to the captcha API and the candidate-query API carry the query page's URL in the Referer header, whereas the requests constructed by the attackers carry no Referer at all. Platform operators can restrict requests that exhibit such logically abnormal behavior.

[Figure: Referer comparison between normal and attack requests]
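A minimal sketch of such a check, assuming the API sits behind middleware that can inspect headers; the function name and the example URL are illustrative. Note that the Referer header is trivially forgeable, so this is one weak signal to combine with others, not a defense on its own.

```python
def looks_like_crawler(request_headers, expected_referer_prefix):
    """Flag API calls whose Referer does not point back at the query page.
    Referer can be forged, so treat this as one weak signal among many."""
    referer = request_headers.get("Referer", "")
    return not referer.startswith(expected_referer_prefix)

normal = {"Referer": "https://example.com/query", "User-Agent": "Mozilla/5.0"}
attack = {"User-Agent": "Mozilla/5.0"}  # constructed request, no Referer
print(looks_like_crawler(normal, "https://example.com/query"),
      looks_like_crawler(attack, "https://example.com/query"))
```

In this incident the attackers did not bother to forge the header at all, so even this simple check would have separated their traffic from normal users.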

5. From the perspective of the UA used by the attacker

We speculate there were two groups of attackers: one attacked using recent browser versions, while the other forged an old version of the Chrome browser in its User-Agent. This shows that, in daily operations, enterprises should pay attention to request traffic originating from outdated client versions.

Summary and suggestions for handling

1. About data leakage

Data leakage is an old topic in cybersecurity. According to the latest report released by Imperva, since 2017, the number of data records leaked in global network attacks has increased by an average of 224% per year. In January 2021 alone, the reported leakage records (8.78 billion) exceeded the total for the whole year of 2017 (8.26 billion).

According to Yongan Online's data-leakage intelligence monitoring, leaks have evolved from the system intrusions and database dumps of earlier years to leakage through SDK reporting, carrier DPI, and SMS channels. As enterprises harden their defenses and ever more data moves online and flows through APIs, the share of leaks caused by crawling data through enterprise-provided APIs keeps growing.

Data assets have gradually become a basic production factor for enterprises: data circulates and is exchanged through APIs to create value, and protecting data in circulation is an urgent, unsolved data-security problem. Against low-frequency crawling attacks launched from massive numbers of IPs, defenders need not only a multi-dimensional defense system but also attack-detection models built on risk intelligence, so that attacks are perceived early and blocked in time, safeguarding the data of enterprises and their users.


2. Defense and Detection

For enterprises, protecting user information is a basic responsibility: leaking user data erodes trust and damages the brand. Enterprises are advised to start from each of the details below and continuously improve their data-security protection capabilities.

The following are suggestions for defense and detection against low-frequency crawling attacks on APIs:

  • Strengthen the basic protection measures of APIs

For APIs involving sensitive data, use behavior captchas that are harder to crack, such as slider captchas and spatial-reasoning captchas. Although these can still be bypassed via captcha-solving platforms, they raise the attacker's cost and slow down black-market crawling;


Add access control and authorization management, so that sensitive data can be queried only after login, and only within the scope of the account's permissions;

For special scenarios, additionally verify the inquirer's identity through multi-factor authentication such as SMS verification, fingerprint authentication, or facial recognition;

  • For APIs that must remain open, collect terminal environment and behavior information, and build crawling-detection models from signals such as clustering of access environments, historical access frequency, and access-behavior sequences to identify abnormal crawling. Raise the bar further by returning decoy data and issuing dynamic captchas.

  • Use dynamic verification

Dynamic signatures and other information can be added to the API request parameters. For example, the server randomly issues one of several prepared encryption algorithms to the client; the client dynamically encrypts and signs the request parameters, and the server recomputes the signature and compares it with the one in the request to determine whether it comes from an attacker's automated tooling. Because the algorithm changes dynamically and is obfuscated, it raises the bar for attackers trying to crawl sensitive data automatically.

  • Build an API attack detection system based on risk intelligence

Against low-frequency crawling from massive numbers of IPs, enterprises also need API attack detection and perception tools built on IP risk intelligence and API attack intelligence, to ensure timely early warning of data-crawling attacks and to act quickly enough to avoid large-scale leakage.
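The dynamic-signature idea above can be sketched as follows. This is a toy illustration, not the platform's actual scheme: the recipe names, the HMAC-based "algorithms", and the session-issuance flow are all assumptions standing in for server-issued, obfuscated signing code.

```python
import hashlib
import hmac
import secrets

# Sketch of dynamic verification: the server picks one of several prepared
# signing recipes per session; the client signs request parameters with it;
# the server recomputes the signature and compares. Names are illustrative.
RECIPES = {
    "v1": lambda key, msg: hmac.new(key, msg, hashlib.sha256).hexdigest(),
    "v2": lambda key, msg: hmac.new(key, msg, hashlib.sha1).hexdigest(),
}

def issue_session():
    """Server side: choose a random recipe and a fresh per-session key."""
    return secrets.choice(list(RECIPES)), secrets.token_bytes(16)

def sign(recipe, key, params):
    """Canonicalise params and sign them with the issued recipe."""
    msg = "&".join(f"{k}={v}" for k, v in sorted(params.items())).encode()
    return RECIPES[recipe](key, msg)

recipe, key = issue_session()
params = {"name": "Zhang San", "id": "34xxxxxxxxxxxxxxxx"}  # synthetic data
client_sig = sign(recipe, key, params)              # computed client-side
server_sig = sign(recipe, key, params)              # recomputed server-side
print(hmac.compare_digest(client_sig, server_sig))  # request verifies
```

An automated crawler that replays raw parameters without the session's recipe and key produces a signature that fails this comparison, which is what raises the crawling threshold.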

A final word
