Elasticsearch has risen again in the db_ranking chart, as shown in Figure 1-1; ES has clearly become popular in the storage field and occupies a very important position.
As Elasticsearch becomes more and more popular, the cost enterprises spend building on ES is naturally considerable. So how can ES costs be reduced? Today we look at the common methods for reducing costs and increasing efficiency with ES:
Elastic scaling
Tiered storage
Other: (1) data compression; (2) off-heap
Figure 1-1 Elasticsearch db_ranking
1 Elastic scaling
So-called elastic scaling means, in plain language, that the cluster can quickly slim down or bulk up at any time, treating each bottleneck where it appears and adjusting resources dynamically as needed. When computing power is insufficient, we can quickly scale out compute resources; when storage is insufficient, we can quickly expand the disks.
1-1 Compute-storage separation
After adopting a compute-storage separation architecture, ES no longer wastes resources on up-front reservation. The early implementation everyone thought of was to replace local disks with cloud disks, which improves data reliability and allows disk and compute resources to be expanded quickly; however, it does not solve the elasticity requirements of ES itself, namely shard migration and replica expansion at the level of seconds.
So how can the elasticity of ES itself be addressed? This part of the article provides the answer.
Shared storage version of ES
This part of the article introduces the shared-storage version of ES developed by the JD Cloud middleware search team; its compute-storage separation architecture is shown in Figure 1-2.
Figure 1-2 Compute-storage separation architecture (shared)
As shown in Figure 1-2, only one copy of the data is stored: the primary shard handles reads and writes, while replicas serve reads only. When a replica needs to be added, no data migration is required; the two phases of native ES peer recovery are skipped entirely, and replica expansion completes in seconds.
When a primary shard relocates, the first phase of native ES peer recovery (the most time-consuming one) can likewise be skipped, and the second phase, sending the translog, is not needed either.
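For reference, replica expansion is triggered the same way as in native ES, by raising the replica count in the index settings (the index name below is illustrative); in the shared-storage version the new replica becomes searchable in seconds because no segment copy takes place:

PUT /my-index/_settings
{
  "index.number_of_replicas": 2
}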
Compared with native ES and the generic compute-storage separation approach, the shared-storage version of ES has the following outstanding advantages:
Data is saved only once, cutting storage costs severalfold
Storage capacity is automatically expanded on demand, almost no space waste
Billed based on actual usage, no need for capacity planning
Performance Testing
The dataset is http_logs provided by esrally
Shared version ES7.10.2: 3 data nodes (16C64GB)
Native ES7.10.2: 3 data nodes (16C64GB)
Table 1-1 Comparison of replica performance testing
Our preliminary performance test results are shown in Table 1-1; the more replicas there are, the greater the advantage of the shared version of ES.
From Table 1-1 it can be seen that the performance improvement is not yet ideal; we are currently optimizing along two lines:
The underlying storage depends on Cloud Sea storage, for which performance improvements are actively being planned
On the source-code side, optimization work is also in progress
During the development of ES compute-storage separation we overcame many problems, and detailed articles introducing them will be published later, covering topics such as the concrete implementation of primary-write/replica-read, near-real-time visibility of data on replicas, and primary shard switchover and dirty writes in ES.
1-2 External Segment Building
In heavy-write scenarios there is usually not a continuously high write load, but rather a flood peak of write traffic lasting one to two hours. The most time-consuming step in the write path is not writing to disk but building segments. Since segment building is so expensive, can we split this function out into a resource that can be scaled out rapidly (thereby avoiding direct modification of the ES source code and the problems that would introduce)?
The industry already has some good examples of external segment building, which is simpler to implement than the shared-storage version of ES. The core approach is to use a batch engine such as Spark or MapReduce for the bulk computation, then move the built segments into the corresponding index shards. External segment building is also on our roadmap.
2 Tiered Storage
Another direction for reducing ES cost and increasing efficiency is tiered storage. This approach mainly targets businesses with large data volumes, few queries, and little sensitivity to query latency.
2-1 Hot-Cold Architecture
Typical scenarios for the hot-cold architecture: time-series data, or clusters in which both kinds of index coexist (some holding hot data, others cold data).
The hot-cold architecture feature has been live on JD Cloud for some time, and everyone is welcome to try it as fits their business. Enabling cold data nodes is shown in Figure 2-1.
Figure 2-1 Enable Cold Data Nodes
If indexes are created on a daily or hourly basis and queries show hot/cold characteristics, enabling cold nodes is recommended; doing so can bring the following benefits:
Enabling cold nodes can reduce your storage costs, because the number of replicas on cold indexes can be lowered and the storage medium of cold nodes is cheaper.
The cluster can store more data.
Cold data can be force-merged to improve its query performance (see the example after this list).
Once cold data is migrated off the hot nodes, their resource usage drops, making hot queries faster.
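For instance, once an index has gone cold and no longer receives writes, its data can be merged down to a single segment per shard (index name illustrative):

POST /logs-2023.01.01/_forcemerge?max_num_segments=1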
The core technology behind the hot-cold architecture is shard allocation filtering.
The principle of the hot-cold architecture is as follows:
Add the following configuration to the elasticsearch.yml of the hot nodes:
node.attr.box_type: hot
Add the following configuration to the elasticsearch.yml of the warm nodes:
node.attr.box_type: warm
To confine the shards of a hot-data index to the hot nodes, add the following to the index settings:
"index.routing.allocation.require.box_type": "hot"
When queries against the data taper off, it can be migrated from the hot nodes to the warm nodes by updating the setting:
"index.routing.allocation.require.box_type": "warm"
2-2 Searchable Snapshots
Searchable snapshots are a more advanced form of tiered storage built on top of the hot-cold architecture. Previously, data in a snapshot could not be searched directly: the snapshot first had to be restored (a potentially lengthy process) before the data became searchable.
With searchable snapshots, data inside a snapshot can be searched directly, greatly reducing unnecessary resource usage.
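In upstream Elasticsearch (7.10+, where this is a licensed feature), a snapshotted index is made searchable by mounting it straight from the repository instead of restoring it; the repository, snapshot, and index names below are illustrative:

POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true
{
  "index": "logs-2023.01.01"
}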
3 Other
3-1 Data compression
Besides cutting storage costs at the resource level, exploiting the characteristics of the data itself with a strong compression algorithm is an equally indispensable line of exploration. For time-series data, Facebook has open-sourced the excellent compression algorithm zstd, which is now widely used in industry.
Table 3-1 Comparison test results of three compression algorithms
Open-source contributors have also submitted a custom codec providing Zstandard compression/decompression (the zstd PR) to the Lucene code base.
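For context, ES already exposes the stored-fields codec as an index setting, trading CPU for disk space; a zstd-backed codec would plug in at the same point once merged. The example below uses the built-in DEFLATE-based option, which must be set at index creation time or on a closed index (index name illustrative):

PUT /logs-2023.01.01
{
  "settings": {
    "index.codec": "best_compression"
  }
}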
3-2 Off-heap
The storage capacity of a single ES node is limited by JVM heap memory; for a single node to store more data, the amount of data held on the heap must be reduced.
The largest share of resident memory in the Elasticsearch heap is the FST, i.e. the space occupied by the .tip (terms index) files; a 1 TB index occupies roughly 2 GB or more of memory, so to keep nodes stable the industry generally holds that the open indexes on a single node should not exceed 5 TB. Starting with ES 7.3, the .tip files are loaded via mmap, shifting the memory occupied by the FST from the heap to the operating system's page cache (the off-heap technique) [6].
We wrote a 1 TB index using the official esrally dataset geonames and used the _cat/segments API to view segments.memory consumption; the comparison after off-heap is shown in Table 3-2: JVM memory usage dropped by about 78%.
Table 3-2 Comparison of segments.memory consumption
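The per-segment heap footprint can be inspected with the same API used in this test; the h= column list is optional:

GET /_cat/segments?v&h=index,segment,size,size.memory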
4 References
[1] Indexing Service
[2] ES-Fastloader
[3] Large-scale testing of the new Elasticsearch cold layer searchable snapshots
[4] Introducing Elasticsearch searchable snapshots
[5] New improvements in version 7.7: Significantly reduce Elasticsearch heap memory usage
[6] Off-heap principle of Elasticsearch 7.3
Author: Yang Songbai
