Ant OceanBase became commercially available on Alibaba Cloud in March 2020, when it was officially opened to external customers on the public cloud. A set of related ecosystem products is also available online, including OCP (OceanBase Cloud Platform), OTA (OceanBase Tuning Advisor), OMS (OceanBase Migration Service), and ODC (OceanBase Developer Center).
First, deploying OceanBase servers on the public cloud
OceanBase, the distributed relational database developed in-house by Ant Financial, is a natively distributed relational database whose code is fully under Ant's own control. The Ant OceanBase product is deployed with three replicas on a 3AZ (three availability zone) architecture and keeps data consistent across nodes through the Paxos protocol. Even if a single node or a single AZ fails, business continuity is preserved, with RPO=0 and RTO.

In terms of security and availability, Ant OceanBase is well suited to financial business scenarios. Because of regulatory requirements, some financial workloads (such as banking) cannot be placed on the public cloud. However, this does not affect other financial businesses, such as insurance and funds.
Second, the architecture principle of Ant OceanBase
Distributed systems typically contain a central control process for global management, load balancing, and similar duties. Unlike most of them, Ant OceanBase does not have a separate master server or master process. Its master is a service called RootService that is integrated into ObServer. OceanBase dynamically selects one ObServer from all working machines to run this control service, and when the ObServer hosting it fails, the system automatically selects a new one to take over. The benefit of this approach is simpler deployment: although the implementation is complex, it greatly reduces the cost of use.
OceanBase scales out horizontally through its partitioning capability. In a traditional database, all partitions of a table are located on a single server; in OceanBase, each partition can be placed on a different server, and each partition has three replicas. From the perspective of the data model, OceanBase can be regarded as a multi-machine implementation of the traditional partitioned table. It can bring the data generated by different users together into one unified table. Regardless of how the partitions are distributed across servers, the system presents a single table to the user, and the backend implementation is completely transparent. At the user entry point, OceanBase uses OBProxy, an access proxy that forwards each request to the appropriate server based on the data it touches. The biggest highlight of OBProxy is its performance, which can reach one million transactions per second on a very ordinary server.
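The routing idea in the paragraph above can be illustrated with a small sketch. It assumes hash partitioning and a static map from partition to server; the class `ToyProxy` and its methods are hypothetical names for illustration and do not reflect how OBProxy is actually implemented.

```python
import zlib


class ToyProxy:
    """Toy access proxy: maps a partition key to the server holding that partition."""

    def __init__(self, partition_leaders):
        # partition_leaders[i] is the server that holds partition i
        # (a hypothetical, static location map).
        self.partition_leaders = list(partition_leaders)

    def partition_of(self, key):
        # Stable hash partitioning; crc32 is only a convenient deterministic hash.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.partition_leaders)

    def route(self, key):
        # The client sees one logical table; the proxy picks the server.
        return self.partition_leaders[self.partition_of(key)]
```

The same key always routes to the same server, so the client never needs to know where a partition lives.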
Because the partitions of a table are spread across multiple ObServers, a transaction that touches several partitions is a distributed transaction, which OceanBase implements internally with two-phase commit. The two-phase commit protocol is relatively slow, so Ant OceanBase has made many internal optimizations. First, it introduces the concept of partition groups: partitions of different tables that are frequently accessed together, or that have similar access patterns, are placed in one partition group. In the background, OceanBase schedules the partitions of a group onto a single server as far as possible to avoid distributed transactions. Second, the implementation of two-phase commit itself is optimized. The protocol involves a coordinator and multiple participants. Each participant maintains the local state on its own server, while the coordinator maintains the global state of the distributed transaction. The usual practice is to write a coordinator log that persists this global state. OceanBase instead recovers a distributed transaction after a failure by querying the state of all participants. This eliminates the coordinator log: as soon as all participants have prepared successfully, the transaction as a whole has succeeded, and the client can be answered without waiting for the coordinator to write a log.
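The recovery rule described above can be sketched as follows. This is a minimal model under stated assumptions, not OceanBase code: each participant is assumed to persist its own state durably, and `ParticipantState` and `resolve_transaction` are hypothetical names.

```python
from enum import Enum


class ParticipantState(Enum):
    INIT = "init"          # has not voted yet
    PREPARED = "prepared"  # prepare record is durable on this participant
    COMMITTED = "committed"
    ABORTED = "aborted"


def resolve_transaction(states):
    """Derive the global outcome from participant states alone.

    Assumption: a participant only reports PREPARED once its prepare
    record is durable, so no separate coordinator log is consulted.
    """
    states = list(states)
    if any(s is ParticipantState.ABORTED for s in states):
        return "abort"
    if any(s is ParticipantState.COMMITTED for s in states):
        return "commit"
    if all(s is ParticipantState.PREPARED for s in states):
        return "commit"
    # At least one participant never prepared, so the transaction cannot
    # have been acknowledged to the client; aborting is safe.
    return "abort"
```

For example, `resolve_transaction([ParticipantState.PREPARED, ParticipantState.PREPARED])` yields a commit decision, while a single participant still in `INIT` leads to an abort.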
Third, OceanBase storage architecture
OceanBase uses a shared-nothing architecture. Each ObServer has an independent storage engine and stores data locally, which allows service to continue in disaster recovery scenarios. Ant OceanBase adopts the LSM tree architecture for its cache and data storage design. Data is written first to the MemTable in memory, so the most active and frequently accessed data can be served from memory, which greatly improves access efficiency for hot data. When the MemTable reaches a certain size threshold, its data is merged and dumped to SSTables on disk. Many LSM-tree storage systems organize SSTables into multiple levels to keep write performance under control: when the number or size of tables in a level reaches a threshold, they are merged into the next level.
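The write path described above (MemTable first, then a flush to an SSTable once a threshold is reached) can be illustrated with a toy store. This is a generic LSM sketch, not OceanBase's storage engine; `TinyLSM` and `memtable_limit` are hypothetical names, and leveled merging is left out for brevity.

```python
import bisect


class TinyLSM:
    """Toy LSM store: one in-memory MemTable plus immutable sorted SSTables."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.sstables = []  # newest last; each is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the MemTable into a sorted, immutable run (the "SSTable").
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Hot data is served from memory first.
        if key in self.memtable:
            return self.memtable[key]
        # Then search SSTables from newest to oldest.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Reads check the MemTable before any SSTable, which is why recently written data is the cheapest to access.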
Within OceanBase there are many different types of cache, similar to the buffers in Oracle and MySQL. They include the block cache for table data, as well as the row cache, log cache, location cache, and others. Baseline data is cached in memory to improve query performance. Each tenant has its own independent caches, and upper and lower memory limits can be configured per tenant, so that tenants can either be isolated from one another or allowed to preempt oversold resources, depending on the scenario.
In terms of storage cost, Ant OceanBase supports several data compression algorithms, such as lz4 and zstd, and shrinks a dataset in two layers. The first layer is encoding, which uses dictionary encoding, RLE, and other algorithms to make the data more compact. The second layer is general-purpose compression, which applies lz4 or a similar algorithm to the encoded data. Compared with the traditional compression of MySQL InnoDB, OceanBase with zstd needs only 1/3 of the storage space for the same dataset, which greatly reduces storage cost for users. More importantly, the fixed-length page design of traditional databases inevitably leaves storage holes after compression, which hurts compression efficiency. In a storage system built on the LSM tree architecture like OceanBase, compression has no impact on data write performance.
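The two layers can be illustrated with a short sketch: run-length encoding as the first layer, followed by a general-purpose compressor as the second. It assumes a column of short strings without newlines, and it uses `zlib` from the Python standard library purely as a stand-in for lz4 or zstd; all function names are hypothetical.

```python
import zlib


def rle_encode(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def compress_column(values):
    # Layer 1: encoding (here, RLE over a column of short strings).
    runs = rle_encode(values)
    encoded = "\n".join(f"{v}\t{n}" for v, n in runs).encode("utf-8")
    # Layer 2: general-purpose compression; zlib stands in for lz4/zstd.
    return zlib.compress(encoded)


def decompress_column(blob):
    text = zlib.decompress(blob).decode("utf-8")
    if not text:
        return []
    runs = []
    for line in text.split("\n"):
        value, count = line.rsplit("\t", 1)
        runs.append([value, int(count)])
    return rle_decode(runs)
```

Columns with long runs of repeated values benefit most from the first layer, and the second layer then works on an already compact byte stream.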
Fourth, Ant's OceanBase SQL engine
OceanBase tenants can run in Oracle-compatible or MySQL-compatible mode. Compared with traditional MySQL, OB supports not only hard parsing but also soft parsing, as Oracle does. The parser also supports SQL parameterization and bind variables. It puts the parsed SQL template and its execution plan into the plan cache, so later executions of the same template skip the overhead of a hard parse, which improves SQL execution efficiency.
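A rough sketch of parameterization plus a plan cache is shown below. It assumes a deliberately crude rule for detecting literals (quoted strings and standalone numbers), and the "plan" is just a placeholder string; `parameterize` and `PlanCache` are hypothetical names, not OceanBase interfaces.

```python
import re

# Quoted strings first, then standalone numbers (a deliberately crude rule).
_LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace literals with '?' and return (template, params)."""
    params = []

    def repl(match):
        params.append(match.group(0))
        return "?"

    return _LITERAL.sub(repl, sql), params


class PlanCache:
    def __init__(self):
        self.plans = {}
        self.hard_parses = 0

    def get_plan(self, sql):
        template, params = parameterize(sql)
        plan = self.plans.get(template)
        if plan is None:
            # Cache miss: a hard parse and optimization would happen here.
            self.hard_parses += 1
            plan = f"PLAN<{template}>"
            self.plans[template] = plan
        return plan, params
```

Two statements that differ only in their literal values map to the same template, so the second one reuses the cached plan instead of triggering another hard parse.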
Building on the LSM tree storage architecture, OB has designed its own cost model, introduced statistics, and built a cost-based optimizer. This means OB can use statistics to estimate the best access path for each SQL statement and produce the best execution plan it can find. OB can also bind a fixed execution plan to a statement online when the user requires it, which is convenient in emergencies and in performance-critical scenarios. On the executor side, OB supports not only nested loop join but also hash join and merge join, which improves the efficiency of joins between large tables. It also supports parallel execution, distributed SQL, and more.
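To show why hash join helps with large tables, here is a textbook comparison of the two join methods over lists of dictionaries. It illustrates the general algorithms only and says nothing about how OceanBase's executor implements them; the function names are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it with the other."""
    buckets = defaultdict(list)
    for r in right:  # build phase
        buckets[r[key]].append(r)
    out = []
    for l in left:  # probe phase
        for r in buckets.get(l[key], []):
            out.append((l, r))
    return out
```

Both functions return the same row pairs, but the hash join touches each input row once plus its matches, instead of comparing every pair of rows.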
Fifth, the AACID feature of Ant's OceanBase
OceanBase is a distributed relational database that conforms to the ACID principles. On top of traditional ACID, OceanBase places particular emphasis on an additional A, availability. Multi-replica log replication based on the Paxos protocol provides business continuity without data loss when a single point fails. For consistency, OB uses MVCC multi-version consistent reads. When a data block is updated, OB opens a new data block and tags the new data version with the transaction id. Only SQL inside that transaction can see the new version; uncommitted data is not visible to other sessions. For isolation, OceanBase supports two transaction isolation levels, read committed and serializable, as in Oracle, and is well compatible with Oracle. As for durability, like most traditional databases, OceanBase writes the log first: when a transaction commits, the redo log must be written successfully before the data is written. If an exception occurs, there is no ambiguity about the state of the data.
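The visibility rule behind multi-version reads can be sketched with a toy store. It assumes a single global commit counter and keeps uncommitted writes in the caller's own dictionary until commit; `MVCCStore` is a hypothetical name, and the sketch ignores write conflicts entirely.

```python
class MVCCStore:
    """Toy multi-version store: readers see only versions committed
    at or before their snapshot."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.clock = 0      # hypothetical global commit counter

    def snapshot(self):
        return self.clock

    def commit(self, writes):
        # Uncommitted writes live in the caller's dict until this point,
        # so no other session can observe them.
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Walk versions from newest to oldest and return the first visible one.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None
```

A reader that took its snapshot before a later commit keeps seeing the older version, which is the behavior that consistent reads rely on.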
For data security, OceanBase has also taken various protective measures. For example, the recycle bin mechanism is switched on or off at the tenant level. When the recycle bin is enabled, a drop table or truncate does not delete the data immediately; the data goes into the recycle bin instead. While a table remains in the recycle bin, it can be restored to its original state with the flashback command, which greatly reduces the risk of operational mistakes.
For data modification operations such as delete and update, OB supports flashback queries based on location, which can restore data to a certain point in time, so data can be retrieved after an SQL statement was executed by mistake in the business or in operations. Under Oracle tenants, queries starting from a timestamp or SCN are also supported.
In October 2019, OceanBase topped the TPC-C performance benchmark with a world record of 60.88 million tpmC, twice the result of the previous leader, Oracle. In November of the same year, it set another world record during the Alipay Double 11 promotion, with a payment peak of 61 million transactions per second. After repeated tests under extreme business load, OceanBase has shown that a distributed database can match centralized databases in performance, reliability, and availability. Traditional commercial databases such as Oracle, SQL Server, and DB2 all rely on high-end hardware (minicomputers, dedicated storage, and fiber optic networks), while OceanBase needs only ordinary PC servers, SSDs, and 10-gigabit networks. It also achieves a high storage compression ratio.
Now that Ant OceanBase is on the cloud, the database itself is billed according to the published rules and the migration service is billed by the hour, while the other management platforms (OCP, ODC, OTA) are currently free. OCP makes it convenient to manage clusters, tenants, and databases, and lets users monitor the performance of tenants and nodes. ODC makes it convenient to manage and maintain database objects (tables, views, functions, stored procedures, and so on), and its SQLConsole can be used to operate the database directly. OTA can quickly identify problematic SQL in the current workload, provide optimization suggestions, and bind execution plans. Using these platforms makes operations easier and lowers the difficulty of operation and maintenance.
