In enterprise data security governance, knowing the relevant laws and regulations, minimizing data collection, and clearly stating privacy agreements at service entry points are necessary. More importantly, an enterprise needs to build internal foundational capabilities such as data recognition, classification and grading, data encryption, and permission control. These are the basic capabilities of data security.
This article takes a data-centric approach, covering data recognition, classification and grading, and basic protection. It uses open-source software to organize and demonstrate the relevant functions, in the hope of giving interested readers a direct, hands-on understanding of data security.

Based on data recognition, build a data asset dashboard to manage the identification, monitoring, and operation of data asset risks across the full lifecycle;
Based on data classification and grading, classify and grade the different data assets, and concentrate the best resources on protecting key assets;
For basic data security protection, beyond a secure and stable infrastructure and architecture, use the results of data recognition and classification to encrypt sensitive data in storage and in transit, control account permissions, desensitize data, and manage data distribution. Taking changes in internal and external risks into account, this ultimately moves toward data security risk control.
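As a small illustration of the desensitization step mentioned above, here is a minimal Python sketch that masks an 11-digit mobile number and the local part of an email address before the data is displayed or distributed. The function names and masking rules (keep the first 3 and last 4 digits of a phone number; keep the first character of an email's local part) are illustrative assumptions, not a standard.

```python
import re

# Hypothetical masking rules for illustration; real rules come from the
# enterprise's own classification and grading policy.
PHONE_RE = re.compile(r"(\d{3})\d{4}(\d{4})")


def mask_phone(phone: str) -> str:
    """Keep the first 3 and last 4 digits of an 11-digit number, mask the rest."""
    match = PHONE_RE.fullmatch(phone)
    if not match:
        # Unexpected format: mask everything rather than risk leaking it.
        return "*" * len(phone)
    return f"{match.group(1)}****{match.group(2)}"


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the full domain."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain


print(mask_phone("18516014911"))        # 185****4911
print(mask_email("alice@example.com"))  # a****@example.com
```

Masking of this kind is irreversible and suits display or test-data distribution; data that must be recovered later needs encryption instead.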
1. Data recognition and classification
In the era of big data, doing data recognition and classification well, and building full lifecycle management of data assets on that basis, is a challenge for many enterprises. For example: how many databases in the enterprise store phone numbers in plaintext, how many interfaces expose phone number fields externally, what risks do these databases and interfaces face, and how can those risks be controlled across the full lifecycle? Whether the data is structured (fields in database tables), semi-structured (records in logs), or unstructured (images, audio, and video files), recognition remains a considerable challenge for many enterprises in terms of coverage, accuracy, and performance impact.
1.1. Content recognition example
Data recognition can be implemented with keywords, regular expressions, algorithms, and similar techniques. Many articles online describe these, and some large companies already have mature recognition technologies and solutions. In practice, implementation is driven mainly by business scenarios. By data type, recognition is mainly divided into structured, semi-structured, and unstructured data recognition:
Structured: Relational databases
Semi-structured: Log data, JSON data, XML documents, etc.
Unstructured: HTML web pages, office documents, images, audio and video files, etc.
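A minimal sketch of regular-expression-based recognition for the structured and semi-structured cases listed above: it checks sampled column values from a table, and the string values of a JSON log record, for mainland China mobile numbers, email addresses, and 18-character resident ID numbers. The patterns, the 0.8 hit-ratio threshold, and all function names are assumptions for illustration. Production rules typically add checksum validation, keyword dictionaries on column names, and sampling strategies to balance accuracy against performance.

```python
from __future__ import annotations

import json
import re

# Illustrative detection rules (assumptions, not a complete rule set).
RULES = {
    "phone": re.compile(r"1[3-9]\d{9}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
    "id_card": re.compile(r"\d{17}[\dXx]"),
}


def classify_value(value: str) -> set[str]:
    """Return the names of all rules that match the whole value."""
    return {name for name, rx in RULES.items() if rx.fullmatch(value)}


def classify_column(samples: list[str], threshold: float = 0.8) -> set[str]:
    """Structured data: label a column if enough sampled values match a rule."""
    if not samples:
        return set()
    counts: dict[str, int] = {}
    for value in samples:
        for name in classify_value(value):
            counts[name] = counts.get(name, 0) + 1
    return {name for name, n in counts.items() if n / len(samples) >= threshold}


def scan_json_log(line: str) -> dict[str, set[str]]:
    """Semi-structured data: find sensitive string values in a JSON record, keyed by path."""
    found: dict[str, set[str]] = {}

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, f"{path}[{i}]")
        elif isinstance(node, str):
            hits = classify_value(node)
            if hits:
                found[path] = hits

    walk(json.loads(line), "")
    return found
```

For example, `scan_json_log('{"user": {"phone": "18516014911"}}')` reports `{"user.phone": {"phone"}}`. The threshold in `classify_column` trades recall for precision: a column where only a few sampled values look like phone numbers is not labeled.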
1.2. Classification and grading management display
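One way to back such a display is sketched below in Python: each recognized data type maps to a category (classification) and a sensitivity level (grading), each table is graded at the highest level among its columns, and a summary counts tables per level and tables holding each data type, for example how many tables store phone numbers. The categories, the 1 to 4 level scale, and the `crm.customer` table are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

# Hypothetical policy: data type -> (category, level); 4 is the most sensitive.
POLICY = {
    "id_card": ("personal identity", 4),
    "phone": ("personal contact", 3),
    "email": ("personal contact", 3),
    "order_id": ("business data", 2),
}
DEFAULT_LEVEL = 1  # tables with no recognized sensitive column


@dataclass
class TableAsset:
    database: str
    table: str
    # Column name -> data type found by recognition (unrecognized columns omitted).
    column_types: dict[str, str] = field(default_factory=dict)


def table_categories(asset: TableAsset) -> set[str]:
    """Classification: the categories of data a table contains."""
    return {POLICY[t][0] for t in asset.column_types.values() if t in POLICY}


def table_level(asset: TableAsset) -> int:
    """Grading: a table takes the highest level among its recognized columns."""
    levels = [POLICY[t][1] for t in asset.column_types.values() if t in POLICY]
    return max(levels, default=DEFAULT_LEVEL)


def summarize(assets: list[TableAsset]) -> tuple[Counter, Counter]:
    """Dashboard counts: tables per level, and tables containing each data type."""
    by_level = Counter(table_level(a) for a in assets)
    by_type = Counter(t for a in assets for t in set(a.column_types.values()))
    return by_level, by_type


inventory = [
    TableAsset("encrypt_db", "t_encrypt", {"order_id": "order_id", "phone": "phone"}),
    TableAsset("mysql_test", "account"),
    TableAsset("crm", "customer", {"id_no": "id_card", "mobile": "phone"}),  # hypothetical
]
levels, types = summarize(inventory)
print(sorted(levels.items()))  # [(1, 1), (3, 1), (4, 1)]
print(types["phone"])          # 2
```

Grading a table by its most sensitive column is a conservative choice: protection effort is then directed at every table holding level-4 data, even if most of its columns are harmless.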
2. Using ShardingSphere-Proxy to encrypt sensitive data
ShardingSphere is a top-level Apache open-source project that aims to build standards and an ecosystem on top of heterogeneous databases. Rather than implementing a new database, it focuses on making full and sensible use of the computing and storage capabilities of existing databases. ShardingSphere sits above the databases and is concerned with how they work together rather than with the databases themselves.
Connect, Enhance, and Pluggable are the core concepts of Apache ShardingSphere.
- Connect: Quickly connect applications to multi-model heterogeneous databases through flexible adaptation of database protocols, SQL dialects, and database storage.
- Enhance: Capture database access traffic and transparently provide additional functions such as traffic redirection (data sharding, read-write splitting, shadow databases), traffic transformation (data encryption, data masking), traffic authentication (security, audit, permissions), traffic governance (circuit breaking, rate limiting), and traffic analysis (quality of service analysis, observability).
- Pluggable: The project uses a microkernel plus a three-layer pluggable model, so the kernel, functional components, and ecosystem integrations can all be extended flexibly in a plug-and-play manner. Developers can assemble a customized system as if using building blocks.
ShardingSphere-Proxy is positioned as a transparent database proxy. It is a server that wraps the database binary protocol, so clients written in any language can use it. MySQL and PostgreSQL are currently supported.
Installation
Download the latest release of ShardingSphere-Proxy and unpack it. Then edit conf/server.yaml and the files prefixed with config-; field encryption, for example, is configured in conf/config-encrypt.yaml. Other configurations, such as sharding and read-write splitting rules, are not covered here. On Linux, start the proxy with bin/start.sh; the proxy port can be passed as an argument, e.g. bin/start.sh 3308.
Sensitive field configuration
(base) gengdeMacBook-Pro:conf js2thon$ mysql -h127.0.0.1 -uroot -P3308
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 13
Server version: 8.0.20-Sharding-Proxy 4.1.0
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show tables;
+----------------------+------------+
| Tables_in_encrypt_db | Table_type |
+----------------------+------------+
| t_encrypt            | BASE TABLE |
+----------------------+------------+
1 row in set (0.03 sec)
mysql> select * from t_encrypt;
+----------+---------+-------------+
| order_id | user_id | phone       |
+----------+---------+-------------+
|       10 |       0 | 18516014911 |
|       11 |       1 | 18516014922 |
|       12 |       2 | 18516014933 |
|       13 |       3 | 18516014944 |
|       14 |       4 | 18516014955 |
+----------+---------+-------------+
5 rows in set (0.09 sec)
mysql> select * from t_encrypt;
+----------+---------+--------------------------+-------------+
| order_id | user_id | phone_cipher             | phone       |
+----------+---------+--------------------------+-------------+
|       10 |       0 | uFZ1RGQfxsUM+GUJqI5rlQ== | 18516014911 |
|       11 |       1 | SGxnMaUHY/HR50hJcYp6Vg== | 18516014922 |
|       12 |       2 | Z5NBefdS9WN3Bl6p45R1Dw== | 18516014933 |
|       13 |       3 | SKqYOUF4dxloUH5M9t/wEg== | NULL        |
|       14 |       4 | 4q+dOa+bxUTFSzX6AOjvUg== | NULL        |
+----------+---------+--------------------------+-------------+
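The phone_cipher values in the second result are 24-character Base64 strings ending in "==", which decode to exactly 16 bytes, one AES block. Assuming an AES encryptor with PKCS#5/PKCS#7 padding (the encryptor settings in config-encrypt.yaml are not shown, so this is an inference), an 11-byte phone number is padded to 16 bytes before encryption. The short Python check below works through that arithmetic only; it does not decrypt anything, since the key is not part of the article.

```python
import base64

AES_BLOCK = 16  # AES block size in bytes


def pkcs7_padded_length(plain_len: int, block: int = AES_BLOCK) -> int:
    """PKCS#7 always adds 1..block bytes, so the result is the next full block."""
    return (plain_len // block + 1) * block


def base64_length(n_bytes: int) -> int:
    """Base64 turns every 3 input bytes (rounded up) into 4 characters."""
    return 4 * ((n_bytes + 2) // 3)


cipher_text = "uFZ1RGQfxsUM+GUJqI5rlQ=="  # phone_cipher of order_id 10
raw = base64.b64decode(cipher_text)
phone = "18516014911"

print(len(phone), pkcs7_padded_length(len(phone)), len(raw))  # 11 16 16
print(base64_length(len(raw)), len(cipher_text))              # 24 24
```

Under this assumption, any plaintext of up to 15 bytes yields a 24-character cipher column value, which helps when sizing cipher columns.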
3. Using Vault for dynamic database credentials and data encryption/decryption API calls
HashiCorp Vault addresses the management of sensitive information such as database credentials and API keys, which must be stored securely and delivered securely to applications. Vault supports many secrets engines, including:
Key-Value: Simple static key-value pairs
Dynamic credentials: Generated by Vault on demand when a client requests them
Encryption keys: Used to perform cryptographic operations on client data
3.1. Dynamic database credentials
Installing and configuring Vault is relatively simple, so here we mainly list the relevant configuration:
Database connection configuration
Role configuration
Get database credentials
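The three steps above (connection configuration, role configuration, fetching credentials) can be expressed as calls to Vault's HTTP API. The minimal sketch below uses only the Python standard library and assumes the database secrets engine is enabled at the default `database/` path. The Vault address, token, connection name `my-mysql`, role name `my-role`, admin account, and TTLs are hypothetical. Because the target server is MySQL 5.6, which limits user names to 16 characters, it also assumes the `mysql-legacy-database-plugin`. The role grants only SELECT, matching the behavior verified in the session that follows.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, Callable

VAULT_ADDR = "http://127.0.0.1:8200"  # hypothetical Vault address
VAULT_TOKEN = "s.example-token"       # hypothetical token with access to database/

# A transport sends the request and returns the decoded JSON body (or None).
Transport = Callable[[urllib.request.Request], Any]


def http_transport(req: urllib.request.Request) -> Any:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else None  # config writes reply 204, no body


def vault_call(method: str, path: str, payload: dict | None = None,
               transport: Transport = http_transport) -> Any:
    """Call /v1/<path> on Vault with the token header."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{VAULT_ADDR}/v1/{path}", data=data, method=method,
        headers={"X-Vault-Token": VAULT_TOKEN, "Content-Type": "application/json"},
    )
    return transport(req)


def configure_and_get_creds(transport: Transport = http_transport) -> tuple[str, str]:
    # 1. Connection configuration: how Vault reaches MySQL as an admin user.
    vault_call("POST", "database/config/my-mysql", {
        "plugin_name": "mysql-legacy-database-plugin",
        "connection_url": "{{username}}:{{password}}@tcp(127.0.0.1:3306)/",
        "allowed_roles": "my-role",
        "username": "vault_admin",     # hypothetical admin account
        "password": "admin-password",  # hypothetical
    }, transport)
    # 2. Role configuration: every generated user gets SELECT only.
    vault_call("POST", "database/roles/my-role", {
        "db_name": "my-mysql",
        "creation_statements": [
            "CREATE USER '{{name}}'@'%' IDENTIFIED BY '{{password}}';",
            "GRANT SELECT ON *.* TO '{{name}}'@'%';",
        ],
        "default_ttl": "1h",
        "max_ttl": "24h",
    }, transport)
    # 3. Get credentials: each read makes Vault create a new, short-lived MySQL user.
    reply = vault_call("GET", "database/creds/my-role", transport=transport)
    return reply["data"]["username"], reply["data"]["password"]
```

The `transport` parameter exists so the flow can be exercised without a running Vault; in production the default `http_transport` is used, and the returned user name and password are what the `mysql -u ... -p...` login uses.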
Logging in with the dynamic credentials
(base) js2thondeMacBook-Pro:Downloads js2thon$ mysql -u v-my-r-owFmZ3LFu -pM8DdaYZXYRU-rNIm2CbQ
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1379
Server version: 5.6.41-log MySQL Community Server (GPL)
Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
The user was granted only the SELECT privilege in the Vault role configuration, and queries work as expected:
mysql> use mysql_test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+----------------------+
| Tables_in_mysql_test |
+----------------------+
| account              |
+----------------------+
mysql> select * from account;
+------+------+
| id   | name |
+------+------+
|  100 | abc  |
+------+------+
1 row in set (0.00 sec)
Inserting data fails: MySQL rejects the statement because the Vault-generated user has no INSERT privilege.
mysql> insert into account values(101,'def');
ERROR 1142 (42000): INSERT command denied to user 'v-my-r-owFmZ3LFu'@'localhost' for table 'account'
3.2. Data encryption and decryption interface calls
Acting as a KMS, Vault provides encryption and decryption APIs, and data is encrypted and decrypted by calling them. For creating Vault data keys, refer to the official documentation; it is not covered here.
Creating a key in the Vault management UI
Calling the encryption and decryption APIs from Python
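A minimal standard-library sketch of such calls against Vault's Transit secrets engine follows. It assumes Transit is enabled at the default `transit/` path and that a key already exists; the address, token, and key name `data-key` are hypothetical. Transit expects Base64-encoded plaintext, returns ciphertext prefixed with `vault:v1:` (the key version), and returns Base64 plaintext on decryption.

```python
from __future__ import annotations

import base64
import json
import urllib.request
from typing import Any, Callable

VAULT_ADDR = "http://127.0.0.1:8200"  # hypothetical Vault address
VAULT_TOKEN = "s.example-token"       # hypothetical token allowed to use transit/
KEY_NAME = "data-key"                 # hypothetical Transit key name

# A transport sends the request and returns the decoded JSON reply.
Transport = Callable[[urllib.request.Request], Any]


def http_transport(req: urllib.request.Request) -> Any:
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def _post(path: str, payload: dict, transport: Transport) -> dict:
    req = urllib.request.Request(
        f"{VAULT_ADDR}/v1/{path}",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"X-Vault-Token": VAULT_TOKEN, "Content-Type": "application/json"},
    )
    return transport(req)


def encrypt(plaintext: str, transport: Transport = http_transport) -> str:
    """Encrypt a string; Transit requires the plaintext to be Base64-encoded."""
    b64 = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")
    reply = _post(f"transit/encrypt/{KEY_NAME}", {"plaintext": b64}, transport)
    return reply["data"]["ciphertext"]  # e.g. "vault:v1:..."


def decrypt(ciphertext: str, transport: Transport = http_transport) -> str:
    """Decrypt a vault:vN:... ciphertext and decode the Base64 plaintext."""
    reply = _post(f"transit/decrypt/{KEY_NAME}", {"ciphertext": ciphertext}, transport)
    return base64.b64decode(reply["data"]["plaintext"]).decode("utf-8")
```

A typical flow stores `encrypt("18516014911")` in the database and calls `decrypt` only where the plaintext is needed. The key material never leaves Vault, so the application only holds a token that can be scoped and revoked.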
4. Issues and Thoughts
This article has walked through data recognition, classification and grading, configuration management, and encryption of stored fields, but each of these areas needs considerable further expansion in both breadth and depth. Unified key management, account permissions, asset dashboards, risk monitoring, and operational management together make up the basic capabilities of data security. The examples above cover only some capabilities of a few open-source tools, as simple single-point functional demonstrations, without a detailed look at their implementation mechanisms, strengths, and weaknesses. Follow-up work will go deeper. I also hope to exchange ideas with peers who have hands-on implementation experience, for example on architecture design, heterogeneous adaptation, and performance and stability.
