ClickHouse Data Table Migration in Practice
1. Introduction

Hello everyone. ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP), and many of our internal reports and data dashboards are built on it. This article walks through the complete process of migrating ClickHouse data tables using the remote function. If there are any mistakes, corrections from experienced readers are welcome.

The following SQL statements are for testing purposes. Please modify them according to the actual situation if needed.

2. Background


We are using the distributed database JCHDB provided by JD Cloud. The original ClickHouse was shared by two departments. Due to issues related to business, management, and cost allocation, it is necessary to separate the ClickHouse cluster. The original ClickHouse contains tables such as business A order table and business B screen data table; after splitting, the business B screen data table needs to be migrated to the new ClickHouse cluster.

3. Migration methods

After investigation, there are the following migration methods:

1. Perform data migration through the remote function

2. Perform data migration through file export and import methods

3. Export and import through CSV files

4. Stream export and import through Linux pipe

For the detailed plan of JD Cloud JCHDB, please refer to the document:
https://docs.jdcloud.com/cn/jchdb/local-clickhouse-clickhouse

After discussing with the operations colleagues responsible for JCHDB on the cloud and doing some research, we chose the remote method, since our data volume is currently small. Pay attention to the prerequisites for using remote; if the data volume is too large, consider one of the other migration methods instead.

Before using the remote method, increase the value of the max_partitions_per_insert_block parameter to avoid statement execution failures such as the following:

Error message:
Too many partitions for a single INSERT block (more than 100). The limit is controlled by the 'max_partitions_per_insert_block' setting

Reason:
ClickHouse requires that a single write does not span too many partitions (100 by default); this limit is controlled by the max_partitions_per_insert_block parameter. Raise this value before running the migration INSERT.
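As a sketch of how the limit can be raised, the setting can be applied to the current session or attached to the migration INSERT itself (the value 5000 is an arbitrary example; choose one larger than the number of partitions in the table being migrated):

```sql
-- Raise the limit for the current session only (example value; adjust to your partition count)
SET max_partitions_per_insert_block = 5000;

-- Alternatively, attach the setting to the migration INSERT itself
INSERT INTO new_database.data_table_to_be_migrated
SELECT * FROM remote('old cluster address', old_database.old_data_table, 'user', 'password')
SETTINGS max_partitions_per_insert_block = 5000;
```

The session-level SET only lasts for the current connection, which is usually enough for a one-off migration and avoids changing the cluster-wide default.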

4. Steps

1. Creating a new ClickHouse cluster: apply for it on the cloud management platform. First estimate the business's future data volume, then fill in the capacity configuration in the application.

2. Creating a database:

CREATE DATABASE IF NOT EXISTS new_database on cluster default;

Note that 'on cluster default' must be included at the end so the database is created on every node of the cluster.

3. Creating a table:

Write SQL based on the actual table fields and table engine. Refer to:
https://clickhouse.com/docs/en/sql-reference/statements/create/table

4. Establishing the test table

CREATE TABLE IF NOT EXISTS new_database.test_ck_01_local ON CLUSTER default
(
    id String COMMENT 'Random primary key',
    dt Date COMMENT 'Partition field'
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/new_database/tables/{shard}/test_ck_01_local', '{replica}')
PARTITION BY toYYYYMMDD(dt)
ORDER BY id;

CREATE TABLE
IF NOT EXISTS new_database.test_ck_01 ON CLUSTER default AS new_database.test_ck_01_local
ENGINE=Distributed(default, new_database, test_ck_01_local, rand());
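The {shard} and {replica} placeholders in the ReplicatedReplacingMergeTree path are substituted from each node's macro configuration. As a quick sanity check before creating the table (assuming macros are configured on the cluster, which managed offerings like JCHDB normally do), you can inspect them:

```sql
-- Shows the macro values (e.g. shard, replica) defined on the node you are connected to
SELECT * FROM system.macros;
```

If this returns no shard/replica rows, the replicated table DDL will fail to expand the placeholders.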

5. Write test data:

Execute the data-writing statements in the original ClickHouse cluster:
INSERT INTO old_database.test_ck_01 VALUES ('1', NOW());
INSERT INTO old_database.test_ck_01 VALUES ('2', NOW());
Add more rows as needed.

Then execute the following query from the new ClickHouse cluster's client (if it fails, the two clusters cannot reach each other over the network):
SELECT * FROM
remote('old cluster address', old_database.test_ck_01, 'user', 'password')

6. Test migration command:

INSERT INTO new_database.test_ck_01
SELECT * from 
remote('old cluster address', old_database.test_ck_01, 'user', 'password')
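For larger tables, a variant of the statement above is to migrate one partition or date range at a time, which keeps each INSERT block small. This is a sketch reusing the dt partition field from the test table; the date range is an arbitrary example:

```sql
-- Migrate one date range at a time (dt is the partition field from the example table)
INSERT INTO new_database.test_ck_01
SELECT * FROM remote('old cluster address', old_database.test_ck_01, 'user', 'password')
WHERE dt >= '2025-03-01' AND dt < '2025-04-01';
```

Running one range at a time also makes it easier to retry a single failed chunk instead of the whole table.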

7. The formal migration steps are as follows:

  • Modify the ClickHouse address in the code in advance and replace it with the new address;
  • Notify the owner of the real-time big data pipeline to stop Flink and the other write tasks;
  • Migrate the data to the new ClickHouse cluster (see the migration statement above);
  • Notify the owner of the real-time big data pipeline to restart Flink and the other write tasks;
  • Verify that the data has been synchronized to the new ClickHouse cluster;
  • Deploy or restart in the gray-release (canary) or pre-release environment, and route queries to the new ClickHouse cluster through the code to check that everything works.

8. Migration statement: (execute in the target clickhouse cluster client)

INSERT INTO new_database.data_table_to_be_migrated
SELECT * from 
remote('old cluster address', old_database.old_data_table, 'user', 'password')

9. Verify that the table row counts are consistent:

SELECT COUNT(1) FROM new_database.data_table_to_be_migrated FINAL;

Note: the row counts may appear inconsistent right after migration because the ReplacingMergeTree engine has not yet merged duplicates in the background; use a FINAL query so duplicate entries are merged at query time before comparing.
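To compare both sides from one place, the old cluster's count can be fetched through remote from the new cluster's client. A sketch reusing the placeholder names from the statements above:

```sql
-- Row count in the old cluster, queried via remote from the new cluster's client
SELECT COUNT(1)
FROM remote('old cluster address', old_database.old_data_table, 'user', 'password');

-- Row count in the new cluster, with FINAL so duplicates are merged before counting
SELECT COUNT(1) FROM new_database.data_table_to_be_migrated FINAL;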

5. References

Official documentation:
https://clickhouse.com/docs/zh

JD Cloud clickhouse learning:
https://docs.jdcloud.com/cn/jchdb/product-overview

remote usage:
https://blog.csdn.net/u010180815/article/details/115070235

Author: JD Logistics Liu Dengzhong

Content source: JD Cloud Developer Community
