1. Introduction
Hello everyone. ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP); many of our internal reports and data dashboards are built on it. Today I will walk through the complete process of migrating ClickHouse data tables using the remote method. If there are any mistakes, please feel free to point them out.
The SQL statements below are for testing purposes; adapt them to your actual environment as needed.
2. Background

We use JCHDB, the distributed ClickHouse database service provided by JD Cloud. The original ClickHouse cluster was shared by two departments; for business, management, and cost-allocation reasons, it needed to be split. The original cluster contains tables such as business A's order table and business B's dashboard (big-screen) data table; after the split, business B's dashboard data table must be migrated to a new ClickHouse cluster.
3. Migration method
After investigation, there are the following migration methods:
1. Perform data migration through the remote function
2. Perform data migration through file export and import methods
3. Export and import through CSV files
4. Stream export and import through Linux pipe
For the detailed plan of JD Cloud JCHDB, please refer to the document:
https://docs.jdcloud.com/cn/jchdb/local-clickhouse-clickhouse
After communicating with the operations colleagues responsible for JCHDB on the cloud side and doing some research, we chose the remote method because the current data volume is small. Pay attention to the prerequisites for using remote; if the data volume is too large, refer to one of the other migration methods instead.
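For reference, the remote() table function accepts either a db.table identifier or separate database and table arguments, and can list multiple replica addresses. A minimal sketch (host names and credentials below are placeholders, not values from this migration):

```sql
-- Single host, db.table form (host and credentials are placeholders)
SELECT * FROM remote('ch-old-host:9000', old_database.some_table, 'user', 'password');

-- Multiple replicas, comma-separated; database and table as separate arguments
SELECT * FROM remote('ch-old-1:9000,ch-old-2:9000', old_database, some_table, 'user', 'password');
```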
Before using the remote method, increase the value of the max_partitions_per_insert_block parameter to avoid statement execution failures such as:
Error message:
Too many partitions for a single INSERT block (more than 100). The limit is controlled by the 'max_partitions_per_insert_block' setting
Reason:
ClickHouse requires that a single write does not span too many partitions; the limit is controlled by the max_partitions_per_insert_block setting (default 100). Raise this parameter before running the migration INSERT.
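The limit can be raised for the current session before running the migration INSERT; the value 1000 below is only an example, size it to the number of partitions your table actually spans:

```sql
-- Raise the per-INSERT partition limit for the current session
-- (1000 is an example value, not a recommendation from the original migration)
SET max_partitions_per_insert_block = 1000;
```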
4. Steps
1. Creating a new ClickHouse cluster: Please apply for it on the cloud management platform. First, estimate the future data volume of the business, and then fill in the capacity configuration of the application.
2. Creating a database:
CREATE DATABASE IF NOT EXISTS new_database on cluster default;
Note that 'ON CLUSTER default' must be included at the end so the database is created on every node of the cluster.
3. Creating a table:
Write SQL based on the actual table fields and table engine. Refer to:
https://clickhouse.com/docs/en/sql-reference/statements/create/table
4. Establishing the test table
CREATE TABLE IF NOT EXISTS new_database.test_ck_01_local ON CLUSTER default
(
    id String COMMENT 'Random primary key',
    dt Date COMMENT 'Partition field'
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/new_database/tables/{shard}/test_ck_01_local', '{replica}')
PARTITION BY toYYYYMMDD(dt)
ORDER BY id;

CREATE TABLE IF NOT EXISTS new_database.test_ck_01 ON CLUSTER default AS new_database.test_ck_01_local
ENGINE = Distributed(default, new_database, test_ck_01_local, rand());
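To confirm that both tables were created on every node, a query against system.tables through clusterAllReplicas can help. A sketch, assuming the cluster is named default as above:

```sql
-- List which hosts have the new tables (sketch; cluster name assumed to be 'default')
SELECT hostName() AS host, name
FROM clusterAllReplicas('default', system.tables)
WHERE database = 'new_database' AND name LIKE 'test_ck_01%';
```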
5. Write test data:
Execute the insert statements in the original ClickHouse cluster:
INSERT INTO old_database.test_ck_01 values('1', NOW());
INSERT INTO old_database.test_ck_01 values('2', NOW());
Add more data according to the actual situation.
Execute the following query from the new ClickHouse cluster's client (if it fails, the network between the two clusters is not connected):
SELECT * FROM remote('old cluster address', old_database.test_ck_01, 'user', 'password');
6. Test migration command:
INSERT INTO new_database.test_ck_01
SELECT * FROM remote('old cluster address', old_database.test_ck_01, 'user', 'password');
7. The formal migration steps are as follows:
- Update the ClickHouse address in the application code in advance, replacing it with the new cluster's address;
- Notify the big data real-time owner to stop Flink and other write tasks;
- Migrate the data to the new ClickHouse cluster (refer to the migration statement above);
- Notify the big data real-time owner to restart Flink and other write tasks;
- Verify that data is being synchronized to the new ClickHouse cluster;
- Deploy or restart in the canary (gray-release) or pre-release environment, route queries to the new ClickHouse cluster through the code, and check that everything works.
8. Migration statement (execute in the target ClickHouse cluster's client):
INSERT INTO new_database.data_table_to_be_migrated
SELECT * FROM remote('old cluster address', old_database.old_data_table, 'user', 'password');
9. Verify that the table row counts are consistent:
SELECT COUNT(1) FROM new_database.data_table_to_be_migrated FINAL;
Note: the row counts may appear inconsistent right after migration; use a FINAL query so that ReplacingMergeTree merges duplicate entries before counting.
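To compare the two clusters directly, the old cluster's count can be fetched through remote() from the same client (placeholders as above):

```sql
-- Row count in the new cluster; FINAL collapses ReplacingMergeTree duplicates
SELECT count() FROM new_database.data_table_to_be_migrated FINAL;

-- Row count in the old cluster, queried through remote() for comparison
SELECT count() FROM remote('old cluster address', old_database.old_data_table, 'user', 'password');
```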
5. References
Official documentation:
https://clickhouse.com/docs/zh
JD Cloud clickhouse learning:
https://docs.jdcloud.com/cn/jchdb/product-overview
remote usage:
https://blog.csdn.net/u010180815/article/details/115070235
Author: JD Logistics Liu Dengzhong
Content source: JD Cloud Developer Community
