Double-write database backup scheme for MySQL to TiDB migration


Author: JD Retail, Shi Lei

As a representative open-source NewSQL database, TiDB supports SQL and provides ACID transactions. At the protocol level, TiDB is fully compatible with MySQL's wire protocol and aims to stay compatible with MySQL syntax. Systems built on MySQL can therefore usually be migrated to TiDB with minimal code changes; for users, the migration cost is low and the transition is smooth.


However, some MySQL features and behaviors are still unsupported in TiDB or differ from MySQL. In addition, TiDB provides some extended syntax and features for user convenience.

TiDB is still developing rapidly, and its support for MySQL functionality and behavior is advancing according to its roadmap.

Compatibility strategy

First, the compatibility strategy of TiDB with respect to MySQL is summarized in the following table:

| Communication protocol | SQL syntax | Functionality and behavior |
| --- | --- | --- |
| Fully compatible | Compatible with the vast majority | Compatible with most |

As of version 4.0, the differences between TiDB and MySQL are summarized in the following table:

| Feature | MySQL | TiDB |
| --- | --- | --- |
| Isolation level | Supports read uncommitted, read committed, repeatable read, and serializable; repeatable read is the default | Optimistic transactions support snapshot isolation; pessimistic transactions support snapshot isolation and read committed |
| Locking mechanism | Pessimistic locking | Optimistic locking and pessimistic locking |
| Stored procedures | Supported | Not supported |
| Triggers | Supported | Not supported |
| Events | Supported | Not supported |
| Custom functions | Supported | Not supported |
| Window functions | Supported | Partially supported |
| JSON | Supported | Some functions added in MySQL 8.0 are not supported |
| Foreign key constraints | Supported | Foreign key constraints are ignored |
| Character sets | Supports a wide range of character sets | Only supports ascii, latin1, binary, utf8, utf8mb4 |
| Adding/dropping a primary key | Supported | Controlled by the alter-primary-key configuration switch |
| CREATE TABLE tblName AS SELECT stmt | Supported | Not supported |
| CREATE TEMPORARY TABLE | Supported | The TEMPORARY keyword is ignored and a regular table is created |
| DML affected rows | Supported | Not supported |
| AutoRandom column attribute | Not supported | Supported |
| Sequence generator | Not supported | Supported |
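A practical way to act on the table above is to scan an existing schema dump for constructs TiDB does not support before migrating. A minimal sketch (the patterns and function name here are illustrative, not an official checklist):

```python
import re

# Hypothetical pre-migration check: flag MySQL features that TiDB
# (as of 4.0) does not support, per the compatibility table above.
UNSUPPORTED_PATTERNS = [
    r"CREATE\s+(DEFINER\S*\s+)?PROCEDURE",  # stored procedures
    r"CREATE\s+(DEFINER\S*\s+)?TRIGGER",    # triggers
    r"CREATE\s+(DEFINER\S*\s+)?EVENT",      # events
    r"CREATE\s+(DEFINER\S*\s+)?FUNCTION",   # custom functions
    r"FOREIGN\s+KEY",                       # ignored by TiDB
]

def scan_schema(sql_text):
    """Return the list of unsupported-feature patterns found in a dump."""
    hits = []
    for pattern in UNSUPPORTED_PATTERNS:
        if re.search(pattern, sql_text, re.IGNORECASE):
            hits.append(pattern)
    return hits
```

Running this over `mysqldump --no-data` output before the migration gives an early warning about schema objects that would silently disappear or break on TiDB.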

Comparison of three solutions

Double-write solution: write data to MySQL and TiDB simultaneously, keeping the data in both databases fully synchronized

• Advantage: This is the safest solution. As a fallback, there is no need to worry about rollback issues, because the data in both databases is fully consistent and the system can be switched back to MySQL seamlessly

• Disadvantage: The solution is new, so researching and implementing it is costly

Read-write splitting: data is written to MySQL and read from TiDB. Concretely, read-write splitting is kept for about a week after going live; that week is used to confirm that the TiDB database has no problems, after which write operations are also switched to TiDB

• Advantage: During the switch, MySQL and TiDB data stay synchronized, so rolling data back to MySQL remains possible

• Disadvantage: Replication from MySQL to TiDB has some delay, so freshly written data that must be queried in real time may not be found; and once writes are fully switched to TiDB, it is no longer possible to switch back to MySQL

Direct switch: switch to TiDB in a single step

• Advantage: The switching process is the simplest and the cost is the lowest

• Disadvantage: There is no fallback plan. After switching to TiDB it is impossible to switch back to MySQL or synchronize the data back, which is high-risk, and data usability cannot be guaranteed
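The safety of the double-write approach comes from its write ordering: the primary database is written first, and the backup write is only attempted afterwards, with failures logged and re-raised, the same structure the middleware below uses. A minimal, framework-free sketch of that idea (the function and parameter names are illustrative):

```python
import logging

def double_write(primary_write, backup_write, backup_enabled=True):
    """Run the primary write first; only touch the backup if it succeeds.

    Failures are logged and re-raised so the caller sees a consistent
    error instead of silently diverging databases.
    """
    try:
        result = primary_write()   # must succeed before touching the backup
        if backup_enabled:
            backup_write()         # keep the backup database in sync
    except Exception:
        logging.error("double write failed", exc_info=True)
        raise
    return result
```

Because the primary write happens first, a backup failure leaves the primary consistent and surfaces an error to investigate, which is what makes the scheme a safe fallback path.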

Django double-write strategy for MySQL and TiDB

New configuration added in settings.py:
# Dev database settings
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'name',
        'USER': 'root',
        'PASSWORD': '123456',
        'HOST': 'db',
    },
    'replica': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'name',
        'USER': 'root',
        'PASSWORD': '123456',
        'HOST': 'db',
    },
    # Backup alias: in this scheme it points at the TiDB instance
    'bak': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'name',
        'USER': 'root',
        'PASSWORD': '123456',
        'HOST': 'db',
    },
}
# Configuration for multiple database writes
MULTI_WRITE_DB = "bak"
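The middleware below gates every backup write on a `multi_write_db in DATABASES` membership check, so double-writing can be switched off through configuration alone, by removing the alias, with no code change. A standalone illustration of that gate (the dict literals are stand-ins for the real settings):

```python
DATABASES = {
    "default": {"HOST": "db"},
    "bak": {"HOST": "tidb"},   # stand-in for the TiDB backup instance
}
MULTI_WRITE_DB = "bak"

def backup_enabled(alias=MULTI_WRITE_DB, databases=DATABASES):
    # Every write path in the middleware performs this membership
    # check before issuing the second write.
    return alias in databases
```

Dropping the `bak` entry from `DATABASES` disables all backup writes at once, which is convenient both for local development and for decommissioning the old database after the migration.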

Double-write middleware basemodel.py

import copy
import logging
import traceback
from django.db import models, transaction, router
from django.db.models.deletion import Collector
from django.db.models import sql
from django.db.models.sql.constants import CURSOR
from jcdp.settings import MULTI_WRITE_DB, DATABASES

multi_write_db = MULTI_WRITE_DB


# Override QuerySet
class BaseQuerySet(models.QuerySet):

    def create(self, **kwargs):
        # create() goes through Model.save(), which is double-written in
        # BaseModel.save_base(), so no extra handling is needed here.
        return super().create(**kwargs)

    def update(self, **kwargs):
        try:
            rows = super().update(**kwargs)
            if multi_write_db in DATABASES:
                self._for_write = True
                query = self.query.chain(sql.UpdateQuery)
                query.add_update_values(kwargs)
                with transaction.mark_for_rollback_on_error(using=multi_write_db):
                    query.get_compiler(multi_write_db).execute_sql(CURSOR)
        except Exception:
            logging.error(traceback.format_exc())
            raise
        return rows

    def delete(self):
        try:
            deleted, _rows_count = super().delete()
            if multi_write_db in DATABASES:
                del_query = self._chain()
                del_query._for_write = True
                del_query.query.select_for_update = False
                del_query.query.select_related = False
                collector = Collector(using=multi_write_db)
                collector.collect(del_query)
                collector.delete()
        except Exception:
            logging.error(traceback.format_exc())
            raise
        return deleted, _rows_count

    def raw(self, raw_query, params=None, translations=None, using=None):
        try:
            qs = super().raw(raw_query, params=params, translations=translations, using=using)
            if multi_write_db in DATABASES:
                # RawQuerySet is lazy: force evaluation so the statement is
                # actually executed against the backup database as well.
                list(super().raw(raw_query, params=params, translations=translations, using=multi_write_db))
        except Exception:
            logging.error(traceback.format_exc())
            raise
        return qs

    def bulk_create(self, objs, batch_size=None, ignore_conflicts=False):
        # Fall back to per-object save() so that every insert goes through
        # BaseModel.save_base() and is therefore written to both databases.
        # (A plain super().bulk_create() bypasses save() and would miss the
        # backup write.)
        try:
            for obj in objs:
                obj.save()
        except Exception:
            logging.error(traceback.format_exc())
            raise
        return objs

    def bulk_update(self, objs, fields, batch_size=None):
        try:
            super().bulk_update(objs, fields, batch_size=batch_size)
            if multi_write_db in DATABASES:
                original_db = self._db
                try:
                    self._db = multi_write_db
                    super().bulk_update(objs, fields, batch_size=batch_size)
                finally:
                    # Restore the alias so later reuse of this queryset
                    # does not silently target the backup database.
                    self._db = original_db
        except Exception:
            logging.error(traceback.format_exc())
            raise


class BaseManager(models.Manager):
    _queryset_class = BaseQuerySet


class BaseModel(models.Model):
    objects = BaseManager()

    class Meta:
        abstract = True

    def delete(self, using=None, *args, **kwargs):
        try:
            instance = copy.deepcopy(self)
            super().delete(using=using, *args, **kwargs)
            if multi_write_db in DATABASES:
                super(BaseModel, instance).delete(using=multi_write_db, *args, **kwargs)
        except Exception:
            logging.error(traceback.format_exc())
            raise

    def save_base(self, raw=False, force_insert=False,
                  force_update=False, using=None, update_fields=None):
        try:
            using = using or router.db_for_write(self.__class__, instance=self)
            assert not (force_insert and (force_update or update_fields))
            assert update_fields is None or update_fields
            cls = self.__class__
            # Skip proxies, but keep the origin as the proxy model.
            if cls._meta.proxy:
                cls = cls._meta.concrete_model
            meta = cls._meta
            # A transaction isn't needed if only one query is issued.
            if meta.parents:
                context_manager = transaction.atomic(using=using, savepoint=False)
            else:
                context_manager = transaction.mark_for_rollback_on_error(using=using)
            with context_manager:
                parent_inserted = False
                if not raw:
                    parent_inserted = self._save_parents(cls, using, update_fields)
                self._save_table(
                    raw, cls, force_insert or parent_inserted,
                    force_update, using, update_fields,
                )
            if multi_write_db in DATABASES:
                super().save_base(raw=raw,
                                  force_insert=force_insert,
                                  force_update=force_update,
                                  using=multi_write_db,
                                  update_fields=update_fields)
            # Store the database on which the object was saved
            self._state.db = using
            # Once saved, this is no longer a to-be-added instance.
            self._state.adding = False
        except Exception:
            logging.error(traceback.format_exc())
            raise

Once the configuration above is in place, double-writing is enabled by using the new BaseModel class as the model base class in each application's models.py:

class DirectoryStructure(BaseModel):
    """
    Directory structure
    """
    view = models.CharField(max_length=128, db_index=True)  # View name eg: Department View Project View
    sub_view = models.CharField(max_length=128, unique=True, db_index=True)  # Sub-view name
    sub_view_num = models.IntegerField()  # Sub-view sequence number

Note: this approach does not currently cover double-writing for many-to-many relations. If the business needs it, the ManyToManyField class has to be extended as well, for example via monkey patching.
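As a rough illustration of the monkey-patching idea (the class here is a simplified stand-in; in a real project the patch would target Django's related manager, e.g. `ManyRelatedManager.add`, which you should verify against your Django version):

```python
class RelatedManager:
    """Simplified stand-in for a Django many-to-many manager."""
    def __init__(self):
        self.rows = []              # rows in the "primary" through table
    def add(self, *objs):
        self.rows.extend(objs)

backup_rows = []                    # stand-in for the backup database

_original_add = RelatedManager.add

def _double_write_add(self, *objs):
    _original_add(self, *objs)      # primary write first
    backup_rows.extend(objs)        # then mirror to the backup

# Monkey patch applied once at import time; all managers pick it up.
RelatedManager.add = _double_write_add
```

The same pattern (save the original bound method, wrap it, reassign on the class) would apply to `remove()`, `clear()`, and `set()` on the real related manager.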

Record of pitfalls encountered during the database migration process

TiDB configuration differences: confirm the database configuration, e.g. that the ONLY_FULL_GROUP_BY SQL mode is disabled (it is disabled by default in older MySQL versions).

TiDB does not support transaction savepoints, so savepoint=False must be set explicitly in the code (as done in save_base above).

As a distributed database, TiDB differs from MySQL in its allocation strategy for auto-increment primary key fields: values are unique but not guaranteed to be consecutive or monotonically increasing across nodes. Business code that depends on primary key id behavior needs careful review.
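The auto-increment difference can be illustrated without a database: each tidb-server caches its own range of auto-increment values, so IDs from inserts routed through different servers are unique but neither consecutive nor ordered by insert time. A simplified simulation (the range size of 30000 matches TiDB's documented default cache size, which is worth verifying for your version):

```python
def make_allocator(range_start, range_size=30000):
    """Simulate one tidb-server handing out IDs from its cached range."""
    state = {"next": range_start}
    def allocate():
        value = state["next"]
        state["next"] += 1
        return value
    return allocate

server_a = make_allocator(1)
server_b = make_allocator(30001)

# Inserts alternate between two servers, as a load balancer might route them.
ids = [alloc() for alloc in [server_a, server_b, server_a, server_b]]
# ids are unique, but neither consecutive nor ordered by insert time.
```

Any business logic that assumes `max(id) + 1` is the next row, or that sorting by id reproduces insert order, breaks under this allocation scheme; sort by an explicit timestamp column instead.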
