1 Background
The billing data volume has increased dramatically, requiring the old database to be split into multiple sharded databases, data sharding;
The sharding rule is for the payment and receipt object (or ID) field, perform HASH, take modulo (32), and divide into 32 databases
2 Target
Implement data migration from the old database to the sharded databases according to the sharding rules
Ensure smooth data migration, minimize the downtime as much as possible
Support rollback, synchronization failure, support rollback for single database
4 Solution
3.1 Implemented based on the Hive middleware
3.2 Semi-self-developed synchronous data processing program
- Develop data processing programs, consume historical data MQ; consume incremental data MQ
- Based on dts synchronization of historical data (specify time point, synchronize historical data)
- Based on JDQ synchronization of real-time data (specify time point, restore real-time synchronization)
3.3 Strategy based on open-source middleware
3.4 Completely self-developed data processing tool
- Develop a data query program, send historical data query to MQ for writing
- Real-time data dual writing
- Unified sending to MQ, and asynchronous processing of writing by MQ
3.5 Solution comparison
After comprehensive evaluation, we finally select the overall data migration solution based on sharding-proxy
4 Introduction and construction of Proxy
4.1 Introduction
4.1.1 Design significance
Positioned as a transparent database proxy end, it provides a server-side version encapsulating the database binary protocol for supporting heterogeneous languages. Currently, it provides MySQL and PostgreSQL (compatible with openGauss and other databases based on PostgreSQL) versions, which can use any client compatible with MySQL/PostgreSQL protocols (such as MySQL Command Client, MySQL Workbench, Navicat, etc.) to operate data, making it more user-friendly for DBAs.
It is completely transparent to the application and can be used directly as MySQL/PostgreSQL;
It is suitable for any client compatible with MySQL/PostgreSQL protocols.
4.1.2 Overall architecture
The entire architecture can be divided into three parts: front-end, back-end, and core component.
The front-end is responsible for network communication with the client, using a client/server framework based on NIO. It uses the NIO model under Windows and Mac operating systems, and automatically adapts to the Epoll model under Linux systems. During the communication process, the encoding and decoding of the MySQL protocol are completed. After the core component decodes the MySQL command, it starts to call Sharding-Core for SQL parsing, routing, rewriting, result merging, and other core functions. The back-end interacts with the real database with the help of the Hikari connection pool.
4.2 Construction
4.2.1 Keyword interpretation
Download, unzip, install mysql driver, start, done
4.2.2 Install shareding-proxy
Download the installation package, select a suitable version (this article uses 4.1.1), and download it from the official website, the address of which ishttps://shardingsphere.apache.org/document/current/cn/downloads
Unzip the installation package (automatically unzip or command unzip), and specify the unzip directory as you wish (any directory with permission is okay)
The shareding-proxy directory after extraction
The shareding-proxy configuration directory conf contains all configuration data
4.2.3 Install mysql driver
Place the mysql driver jar package (mysql-connector-java-5.1.44.jar) in the lib directory of shareding-proxy. ShardingSphere-Proxy does not come with the mysql driver jar package, so it needs to be downloaded manually
Download addresshttps://dev.mysql.com/downloads/connector/j/
4.2.4 Proxy startup
The start.sh in the bin directory of shareding-proxy, which can be started through https://www.freebuf.com/articles/database/start.sh
4.2.5 Remark
The default startup port of Sharding-Proxy is 3307
4.3 Configuration
4.3.1 Key points interpretation
Six major configurations - log configuration (logback.xml), basic service configuration (server.yaml), logical configuration (four conf configuration files, sharding (core)/shadow/read-write separation/encryption configuration)
This example is based on the server.yaml and config-sharding.yaml configuration for sharding strategy
4.3.2 server.yaml
Basic service configuration, consisting of three parts
1) Configuration of shareding-jdbc orchestration governance, providing data governance functions, including the following:
Configure centralization and dynamism. (Supports dynamic switching of data source, table, and sharding read-write separation strategies)
Data governance. Provides the ability to break the access to the database and disable the access to the standby database
Supports the registration center of Zookeeper and etcd;
2) Permission configuration, configure username and password as well as authorized database
The following configuration includes two users: root/root and sharding/sharding. The root user is authorized to access all databases by default, while the sharding user is authorized to access sharding_db database. Here, the database (schema) is a logical database, and the corresponding distributed database mapping is configured in config-*.yaml
Proxy data source parameter configuration
Configure data links, threads, core numbers, etc.
4.3.3 config-sharding.yaml
shareding-proxy core configuration, related configuration of sharding rules, including schemaName, dataSources, and shardingRule
1) The mapping configuration of the logical database corresponding to the distributed data source is shown in the figure below
schemaName logical database name, the schema declared as authorized in server.yaml is the schemaName here
dataSources is the data source configuration, this example maps two shard databases (ds0, ds_1), ds${0..1} corresponds to the logical database name, the url fills in the actual database
4.3.4 logback.xml
Log configuration based on logback
4.3.5 Remaining three configurations
config-shadow.yaml/config-master_slave.yaml/config-encrypt.yaml
Configure for shadow database, master-slave configuration, data field encryption configuration, you can check the following link if you wish
5 Debugging
Based on the ShardingSphere-Proxy proxy, choose the direct connection tool client
- Use Navicat or directly connect with mysql command
- Manual mysql command link as follows
Query without split key defaults to search the entire database, new ones default to route to the corresponding real database based on split key
6 Data migration
Three steps of migration
1) Online installation of sharding-proxy
2) Data synchronization: create migration task, start synchronization, the principle is to create DTS task
3) Data integrity verification
Full comparison, overall synchronization progress query
Time segment comparison, sample according to each time period for new and old database total comparison, manually verify
Random sampling comparison: compare the data of a certain time period in the new database one by one, and manually verify with the tool
Manually sample query according to the development tool and compare the queried data with the old database
Full data verification: compare synchronized data for full data verification, based on DTS tool for verification, it takes a long time
7 Configure query machine
Configure logistics query machine based on easyops or myops, and query proxy through the query machine
8 Questions and Summary
The biggest problem encountered in the overall data migration process is that the data is not measurable. For various historical data problems that cause data migration to break, rework, clean up garbage data, and re-migrate
8.1 Empty split key
Empty split key is not supported by default
8.2 Update split key
The update statement does not support updating the split key by default (actually, 4.x does not support updating with split key, and 5.x supports updating with split key without modification)
Unknown exception: [INSERT INTO …. ON DUPLICATE KEY UPDATE does not support update for sharding column.]
8.3 Solutions for the above two exceptions
The split key cannot be empty, set a default split key
Update with split key, upgrade sharding-proxy to 5.x or configure DTS without split key update
Author: Ren Hongbo

评论已关闭