Lessons from Akamai: how to recover quickly from a large-scale cloud service outage

Last Thursday, Akamai, the world's largest CDN service provider, suffered a major network incident: a 'formatting error' in its DNS service once again disrupted large parts of the global Internet. In fact, there have been at least three large-scale outages in the past month or so, and what they have in common is that all of them involved major international CDN suppliers.

The first was the Fastly outage in early June, the second was the problem with Akamai's DDoS service, and the third was last week's Akamai DNS incident. In between, there was a smaller episode in which a major vulnerability in AWS DNS was disclosed. Although it has since been fixed, the researcher indicated that two other large cloud service providers still have the same vulnerability.

Some specialist Internet communities have already begun to discuss the risks created by heavy reliance on CDN/Cloud services. Because the topic is so specialized, however, there has been little comparable discussion in mainstream IT communities. Earlier discussions have mostly focused on whether a multi-cloud architecture should be used to improve overall network security and stability. Given the realities in China, we believe the risk can also be reduced from another angle.

The core issue here is that the large-scale CDN/Cloud outages that have occurred in the past have all highlighted the risk of using a single CDN/Cloud. Google even praised the

After last week's Akamai incident, closer analysis found that Amazon.com was also affected, yet it recovered so quickly that it largely escaped the outage. How did they do it?

The role of DNS in connecting CDN/Cloud

DNS is usually thought of as nothing more than a service that translates human-readable domain names into IP addresses, and a DNS outage is assumed to affect only websites. When a company's website is not business-critical, the importance of DNS is naturally overlooked. In reality, the role of DNS in modern networks goes far beyond this. DNS is one of the most fundamental pieces of Internet infrastructure, and it has become one of the most significant Single Points of Failure (SPOFs) on the Internet. Almost any service that ultimately needs an IP address depends on DNS: apps, email, APIs, various kinds of authentication and validation (such as SPF, SSL certificate applications, and MS365 login), and connections to every kind of CDN/Cloud.

Consider how your customers reach your cloud services (such as apps). The answer is through domain names and DNS. Typically, a domain name is pointed at a CDN through a CNAME or ALIAS record, and any request that looks up that domain name is then handled by the CDN/Cloud. Seen this way, without DNS there is no way to reach the CDN/Cloud at all.
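To make this dependency concrete, here is a minimal Python sketch of how a lookup follows a CNAME record to a CDN. It is a toy model rather than a real resolver: the `RECORDS` table, the `resolve` function, and every hostname and address in it are hypothetical and chosen only for illustration.

```python
# Toy zone data. All names and addresses are hypothetical
# (reserved example domains and documentation IP ranges).
RECORDS = {
    "www.example.com": ("CNAME", "example.cdn-provider.example"),
    "example.cdn-provider.example": ("A", "203.0.113.10"),
}


def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is found.

    Returns (chain, ip), where chain lists every name visited and
    ip is None if the lookup could not be completed.
    """
    chain = [name]
    for _ in range(max_hops):
        record = records.get(name)
        if record is None:
            # No answer from DNS: the CDN cannot be reached at all.
            return chain, None
        rtype, value = record
        if rtype == "A":
            return chain, value
        # CNAME: continue the lookup at the target name.
        name = value
        chain.append(name)
    return chain, None


if __name__ == "__main__":
    print(resolve("www.example.com", RECORDS))
    # With no DNS data, the same request has nowhere to go.
    print(resolve("www.example.com", {}))
```

In this model, the CDN is only ever reached through the first record in the table, which is why an outage at the DNS layer cuts off the CDN even when the CDN itself is healthy.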

Amazon was also affected by this incident, but it was able to restore all services within a short time. Its setup shows some good ways to avoid being cut off from cloud services. That setup has three parts:

1. Although Amazon has its own CDN/Cloud services, Amazon.com deliberately separates DNS from CDN, and it uses two enterprise-level suppliers to provide redundancy at the DNS level.

2. At the CDN level, it also uses several different CDN services, so that traffic can be moved quickly to another CDN when one of them fails.

3. It appears to have IP failover capability at the DNS level, so that its disaster recovery can be automated.
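The failover idea in points 2 and 3 can be sketched in a few lines of Python. This is only an illustration of priority-ordered failover, not a description of Amazon's actual configuration: the function `choose_target`, the CDN hostnames, and the simulated health check are all assumptions.

```python
from typing import Callable, Optional, Sequence


def choose_target(
    targets: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy target in priority order, or None."""
    for target in targets:
        if is_healthy(target):
            return target
    return None


if __name__ == "__main__":
    # Hypothetical CDN endpoints, listed in order of preference.
    cdn_targets = ["site.cdn-a.example", "site.cdn-b.example"]

    # Simulated health check: pretend the primary CDN is down.
    down = {"site.cdn-a.example"}
    answer = choose_target(cdn_targets, lambda t: t not in down)
    print(answer)
```

In a real deployment the health check would be run by the DNS provider's monitoring, and the chosen target would be published as the DNS answer, so the switch happens without anyone editing records by hand.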

Of course, not every company can fully replicate Amazon's approach. Using it as a reference, we have the following suggestions for Chinese enterprises:

1. Remember that the CDN/Cloud is reached through DNS, so to guard against CDN/Cloud downtime, make sure the CDN/Cloud and DNS come from different suppliers. If resources do not allow for two DNS suppliers at the same time, make sure the single DNS service runs on a dedicated DNS network (that is, not a large hosting network that offers DNS as a side product) and is enterprise-grade.

2. Keep core IT infrastructure components independent of one another, as many overseas experts recommend. For example, when a CDN fails, recovery generally depends on DNS, so separating CDN from DNS is very important.

3. IP failover is needed at the DNS level. But what if we do not use multiple cloud services? This can also be solved at the DNS level: for example, IP failover can be set up to point directly at the origin server, so that traffic bypasses the CDN/Cloud whenever the CDN/Cloud is offline.

4. Essentially all systems rely on DNS. Enterprises should evaluate, monitor, and invest in DNS according to the same principles they apply to other high-risk systems, in order to ensure its robustness.
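As a rough illustration, the first three suggestions can be turned into a simple self-check. The Python sketch below is hypothetical throughout: the `Setup` class, the `audit` function, the supplier names, and the warning texts are all invented for this example, and a real review would look at far more than three conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Setup:
    """A simplified description of one domain's DNS and CDN arrangement."""
    dns_providers: List[str]
    cdn_providers: List[str]
    origin_failover_ip: Optional[str] = None  # DNS failover target, if any


def audit(setup: Setup) -> List[str]:
    """Return warnings for the first three suggestions above."""
    warnings = []
    # Suggestions 1 and 2: DNS and CDN should not share a supplier.
    shared = set(setup.dns_providers) & set(setup.cdn_providers)
    if shared:
        warnings.append(
            "DNS and CDN share a supplier: " + ", ".join(sorted(shared))
        )
    # Suggestion 1: a single DNS supplier deserves extra scrutiny.
    if len(setup.dns_providers) < 2:
        warnings.append(
            "only one DNS supplier: confirm it is a dedicated, "
            "enterprise-grade DNS network"
        )
    # Suggestion 3: with one CDN, DNS failover to the origin is the fallback.
    if len(setup.cdn_providers) < 2 and setup.origin_failover_ip is None:
        warnings.append("single CDN and no DNS failover to the origin server")
    return warnings


if __name__ == "__main__":
    risky = Setup(dns_providers=["provider-x"], cdn_providers=["provider-x"])
    for warning in audit(risky):
        print("WARNING:", warning)
```

A setup with two DNS suppliers, a separate CDN supplier, and an origin failover address produces no warnings in this sketch.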

We hope this case helps Chinese enterprises recognize how important domain names and DNS are to Internet architecture. In today's Internet era, they are indispensable.

For more information on DNS security risks and global best practices, please download here.
