1. Horizontal splitting of the database, setting different initial values and the same step size

0 22
PrefaceIn business development, a large number of scenarios require unique IDs f...

Preface

In business development, a large number of scenarios require unique IDs for identification: users need unique identity identifiers; products need unique identifiers; messages need unique identifiers; events need unique identifiers... etc., all need globally unique IDs, especially in distributed scenarios. Resource IDs are usually associated with horizontal scaling, and one solution to horizontal scaling is to generate unordered and unpredictable IDs.

This article will take a look at the common methods of generating unique IDs and their characteristics.

Characteristics of unique ID

The unique ID should have the following characteristics:

1) Uniqueness: The generated ID is globally unique, with an extremely low probability of conflict in a specific range

2) Availability: It can ensure high availability under high concurrency

3) Security: The generated ID cannot be predicted and will not expose system and business information

Common methods

Currently commonly used methods for generating unique IDs include: UUID, database auto-increment ID, Snowflake algorithm, and hash algorithm

Solution

Uniqueness

Orderliness

Availability

Snowflake algorithm

Strong uniqueness

Approximately ordered

High availability

UUID

Strong uniqueness

Unordered

High availability

Database auto-increment ID

Strong uniqueness

Ordered

High availability

Hash algorithms

Strong uniqueness

Not required

High availability

Snowflake algorithm

‌The ID generated by the Snowflake algorithm can be predicted to a certain extent through its components.

The ID generated by the Snowflake algorithm consists of the following parts:

1) Sign bit: Always 0, indicating a positive number.

‌2) Timestamp: It occupies 41 digits, representing the time difference from a fixed point in time (such as 00:00:00 on January 1, 2010).

‌3) Machine Identifier: It includes the data center ID and machine ID, each occupying 5 digits, used to distinguish different server nodes.

‌4) Sequence number: occupies 12 bits, used for different IDs generated within the same millisecond‌

1734936697_67690879858d4941863f0.png!small?1734936698092


Is the Snowflake algorithm guessable?

During the testing process, we may encounter some resource IDs generated by the Snowflake algorithm. Can we guess other resource IDs?

Theoretically, it is possible, but it is麻烦.According to the algorithm, the timestamp uses milliseconds. Assuming that there are 5 machine IDs in total, one resource ID is generated every millisecond, then there are 60*60*1000*5=18,000,000 in an hour.

When the amount of data is too large, it is difficult to obtain information through brute force, and it is also easy to be discovered. When the cost of the attack exceeds the benefits, there is no point. Therefore, the Snowflake algorithm also guarantees security to some extent.

Snowflake algorithm demo

import time
import threading

class SnowflakeGenerator:
    def __init__(self, datacenter_id, worker_id):
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_timestamp = -1

        # Bit lengths for different parts
        self.datacenter_id_bits = 5
        self.worker_id_bits = 5
        self.sequence_bits = 12

        # Maximum values
        self.max_datacenter_id = -1 ^ (-1 << self.datacenter_id_bits)
        self.max_worker_id = -1 ^ (-1 << self.worker_id_bits)
        self.max_sequence = -1 ^ (-1 << self.sequence_bits)

        # Shift amounts
        self.worker_id_shift = self.sequence_bits
        self.datacenter_id_shift = self.sequence_bits + self.worker_id_bits
        self.timestamp_shift = self.sequence_bits + self.worker_id_bits + self.datacenter_id_bits

        self.lock = threading.Lock()

    def _current_milliseconds(self):
        return int(time.time() * 1000)

    def _til_next_millis(self, last_timestamp):
        timestamp = self._current_milliseconds()
        while timestamp <= last_timestamp:
            timestamp = self._current_milliseconds()
        return timestamp

    def generate_id(self):
        with self.lock:
            timestamp = self._current_milliseconds()

            if timestamp < self.last_timestamp:
                raise ValueError("Clock moved backwards. Refusing to generate id.")

            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & self.max_sequence
                if self.sequence == 0:
                    timestamp = self._til_next_millis(self.last_timestamp)
            else:
                self.sequence = 0

            self.last_timestamp = timestamp

            return ((timestamp - 1288834974657) << self.timestamp_shift) | \
                   (self.datacenter_id << self.datacenter_id_shift) | \
                   (self.worker_id << self.worker_id_shift) | \
                   self.sequence

def generate_unique_id(prefix: str, datacenter_id: int, worker_id: int) -> str:
    generator = SnowflakeGenerator(datacenter_id, worker_id)
    snowflake_id = generator.generate_id()
    return f"{prefix}{snowflake_id}"
















# Create a generator, specifying the data center ID and worker machine ID
datacenter_id = 0
worker_id = 1

# Generate user ID
user_id = generate_unique_id("USER_", datacenter_id, worker_id)
print(f"Generated user ID: {user_id}")


Database auto-increment

The database auto-increment ID may be the most familiar way to generate unique IDs for everyone. It has the advantages of simple use, meeting basic needs, and being naturally ordered. However, there are also defects:

1) Poor concurrency

2) The database write pressure is high

3) The database cannot be used after a failure

4) There is a risk of quantity leakage

The database auto-increment ID is the most common way to generate IDs. By using the database itself for settings, it can maintain uniqueness within the entire database. The advantages are simple to use, meet basic business needs, and are naturally ordered; the disadvantages are heavily dependent on the DB, and there may be single-point failures, data consistency issues, etc. due to some characteristics of database deployment. In view of the defects of the database auto-increment ID introduced above, there will be the following two optimization solutions:

1. Horizontal splitting of the database, setting different initial values and the same step size

1734936765_676908bd98d46840c6d94.png!small?1734936766018

As shown in the figure, it can ensure that the IDs generated by each database are not conflicting, but this fixed step method will also bring expansion problems. It is easy to think that when expanding, there will be a dilemma of no initial ID value to be divided. Solutions include:

  • Determine the step size based on expansion considerations
  • Increase other bit markers to distinguish expansion

This is actually a balance between demand and solution, and the most suitable method is chosen according to the demand.

2. Generate a batch of IDs

If you want to use a single machine to generate IDs, to avoid the expansion problem caused by fixed steps, you can generate a batch of IDs each time and give them to different machines to consume slowly. This will also reduce the database pressure to one-Nth, and can withstand for a period of time after a failure.

1734937320_67690ae85e1ea48b7a79c.png!small?1734937320780

As shown in the figure, but the disadvantage of this approach is that server restarts and single-point failures can cause ID discontinuity. As always, there is no best solution, only the most suitable one.

UUID

UUID is a set of standards for generating globally unique identifiers, also known as GUID (Globally Unique Identifier). By using UUID, unique IDs can be generated in distributed systems. There are many ways to generate UUIDs.

UUID Format

The standard format of UUID is a string composed of 32 hexadecimal numbers, separated into five parts, such as:

467e8542-2275-4163-95d6-7adc205580a9

The number of digits in each part is: 8-4-4-4-12

When used in the system, the hyphen (-) is generally removed. Therefore, the format is as follows: 467e85422275416395d67adc205580a9

Characteristics

Uniqueness: Based on the combination of random numbers and timestamps, a globally unique ID is generated

Unordered: UUID is randomly generated and does not have time sorting

Performance: The generation speed of UUID is fast, suitable for high-concurrency environments

Can UUID resist unauthorized access?

UUID is completely unordered and cannot be guessed, which can prevent horizontal unauthorized access attacks

Hash algorithms

Hash algorithms can be used to generate unique IDs, but it is necessary to understand the characteristics and limitations of hash algorithms. Hash algorithms convert input data of any length into fixed-length hash values, which are usually fixed-length binary strings and are typically represented in hexadecimal.

Hash principle

  • Hash algorithms convert input data into hash values deterministically. This means that the same input data will always generate the same hash value.
  • Hash algorithms are typically one-way, meaning that the original data cannot be还原 from the hash value. This is because the hash value is usually shorter than the original data, resulting in the loss of some information.

Uniqueness

The goal of hash algorithms is to ensure that different input data generates different hash values. However, since the length of hash values is finite, hash collisions are inevitable. Hash collisions refer to different input data mapping to the same hash value.

Well-designed hash algorithms should minimize the probability of hash collisions, but they may not be able to eliminate them completely.

Methods to resolve hash collisions

For the purpose of generating unique IDs, it is necessary to consider how to handle hash collisions. One method is to add randomness to reduce the probability of collisions. Another method is to use a uniqueness index to verify whether the generated ID already exists.

Common hash algorithms

  • Hash algorithms are widely used in fields such as cryptography, data integrity verification, data retrieval, data comparison, digital signatures, data structures (such as hash tables), and information security.
  • In the context of generating unique IDs, hash algorithms can be used to create identifiers to ensure their uniqueness on given data.

Summary

To test for improper authorization, it is necessary to fully understand the generation method of resource IDs before determining whether it is possible to perform unauthorized test. The article introduces several common methods of unique ID generation, and it is necessary to flexibly judge the algorithm used by resource IDs during the testing process.


你可能想看:

Ensure that the ID can be accessed even if it is guessed or cannot be tampered with; the scenario is common in resource convenience and unauthorized vulnerability scenarios. I have found many vulnerab

2.8 Continue to click the getTomcatWebServer method, find the initialize () method, and you can see the tomcat.start () method to start the Tomcat service.

Different SRC vulnerability discovery approach: Practical case of HTTP request splitting vulnerability

Announcement regarding the addition of 7 units as technical support units for the Ministry of Industry and Information Technology's mobile Internet APP product security vulnerability database

Distributed Storage Technology (Part 2): Analysis of the architecture, principles, characteristics, and advantages and disadvantages of wide-column storage and full-text search engines

4.5 Main person in charge reviews the simulation results, sorts out the separated simulation issues, and allows the red and blue teams to improve as soon as possible. The main issues are as follows

Data security can be said to be a hot topic in recent years, especially with the rapid development of information security technologies such as big data and artificial intelligence, the situation of d

As announced today, Glupteba is a multi-component botnet targeting Windows computers. Google has taken action to disrupt the operation of Glupteba, and we believe this action will have a significant i

It is possible to perform credible verification on the system boot program, system program, important configuration parameters, and application programs of computing devices based on a credible root,

HTTP data packets & request methods & status code judgment & brute force encryption password & exploiting data packets

最后修改时间:
admin
上一篇 2025年03月25日 08:01
下一篇 2025年03月25日 08:23

评论已关闭