Elasticsearch has risen again in the db_ranking chart, as shown in Figure 1-1; ES has clearly become popular in the storage field and occupies a very important position.
As Elasticsearch becomes more and more popular, the cost enterprises spend building on ES is naturally considerable. So how can ES costs be reduced? Today we look at the common methods for reducing costs and increasing efficiency with ES:
Elastic scaling
Tiered storage
Other: (1) data compression; (2) off-heap
Figure 1-1 Elasticsearch db_ranking
1 Elastic scaling
So-called elastic scaling means, in plain language, that the cluster can quickly slim down or bulk up at any time, treating each bottleneck where it appears and adjusting resources dynamically as needed. When computing power is insufficient, we can quickly scale out compute resources; when storage is insufficient, we can quickly expand the disks.
1-1 Compute-storage separation
After adopting a compute-storage separation architecture, ES no longer wastes resources on up-front reservation. The early implementation everyone thought of was to replace local disks with cloud disks, which improves data reliability and allows disk and compute resources to be expanded quickly; however, it does not solve the elasticity requirements of ES itself, namely shard migration and replica expansion at the level of seconds.
So how can the elasticity of ES itself be addressed? This part of the article provides the answer.
Shared storage version of ES
This part of the article introduces the shared-storage version of ES developed by the JD Cloud middleware search team; its compute-storage separation architecture is shown in Figure 1-2.
Figure 1-2 Compute-storage separation architecture (shared)
As shown in Figure 1-2, only one copy of the data is stored: the primary shard handles reads and writes, while replicas serve reads only. When a replica needs to be added, no data migration is required; the two phases of native ES peer recovery are skipped entirely, and replica expansion completes in seconds.
When a primary shard relocates, the first phase of native ES peer recovery (the most time-consuming one) can likewise be skipped, and the second phase, sending the translog, is not needed either.
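For reference, replica expansion is triggered the same way as in native ES, by raising the replica count in the index settings (the index name below is illustrative); in the shared-storage version the new replica becomes searchable in seconds because no segment copy takes place:

PUT /my-index/_settings
{
  "index.number_of_replicas": 2
}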
Compared with native ES and the generic compute-storage separation approach, the shared-storage version of ES has the following outstanding advantages:
Data is saved only once, cutting storage costs severalfold
Storage capacity is automatically expanded on demand, almost no space waste
Billed based on actual usage, no need for capacity planning
Performance Testing
The dataset is http_logs provided by esrally
Shared version ES7.10.2: 3 data nodes (16C64GB)
Native ES7.10.2: 3 data nodes (16C64GB)
Table 1-1 Comparison of replica performance testing
Our preliminary performance test results are shown in Table 1-1; the more replicas there are, the greater the advantage of the shared version of ES.
From Table 1-1 it can be seen that the performance improvement is not yet ideal; we are currently optimizing along two lines:
The underlying storage depends on Cloud Sea storage, for which performance improvements are actively being planned
On the source-code side, optimization work is also in progress
During the development of ES compute-storage separation we overcame many problems, and detailed articles introducing them will be published later, covering topics such as the concrete implementation of primary-write/replica-read, near-real-time visibility of data on replicas, and primary shard switchover and dirty writes in ES.
1-2 External Segment Building
In heavy-write scenarios there is usually not a continuously high write load, but rather a flood peak of write traffic lasting one to two hours. The most time-consuming step in the write path is not writing to disk but building segments. Since segment building is so expensive, can we split this function out into a resource that can be scaled out rapidly (thereby avoiding direct modification of the ES source code and the problems that would introduce)?
The industry already has some good examples of external segment building, which is simpler to implement than the shared-storage version of ES. The core approach is to use a batch engine such as Spark or MapReduce for the bulk computation, then move the built segments into the corresponding index shards. External segment building is also on our roadmap.
2 Tiered Storage
Another direction for reducing ES cost and increasing efficiency is tiered storage. This approach mainly targets businesses with large data volumes, few queries, and little sensitivity to query latency.
2-1 Hot-Cold Architecture
Typical scenarios for the hot-cold architecture: time-series data, or clusters in which both kinds of index coexist (some holding hot data, others cold data).
The hot-cold architecture feature has been live on JD Cloud for some time, and everyone is welcome to try it as fits their business. Enabling cold data nodes is shown in Figure 2-1.
Figure 2-1 Enable Cold Data Nodes
If indexes are created on a daily or hourly basis and queries show hot/cold characteristics, enabling cold nodes is recommended; doing so can bring the following benefits:
Enabling cold nodes can reduce your storage costs, because the number of replicas on cold indexes can be lowered and the storage medium of cold nodes is cheaper.
The cluster can store more data.
Cold data can be force-merged to improve its query performance (see the example after this list).
Once cold data is migrated off the hot nodes, their resource usage drops, making hot queries faster.
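For instance, once an index has gone cold and no longer receives writes, its data can be merged down to a single segment per shard (index name illustrative):

POST /logs-2023.01.01/_forcemerge?max_num_segments=1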
The core technology behind the hot-cold architecture is shard allocation filtering.
The principle of the hot-cold architecture is as follows:
Add the following configuration to the elasticsearch.yml of the hot nodes:
node.attr.box_type: hot
Add the following configuration to the elasticsearch.yml of the warm nodes:
node.attr.box_type: warm
To confine the shards of a hot-data index to the hot nodes, add the following to the index settings:
"index.routing.allocation.require.box_type": "hot"
When queries against the data taper off, it can be migrated from the hot nodes to the warm nodes by updating the setting:
"index.routing.allocation.require.box_type": "warm"
2-2 Searchable Snapshots
Searchable snapshots are a more advanced form of tiered storage built on top of the hot-cold architecture. Previously, data in a snapshot could not be searched directly: the snapshot first had to be restored (a potentially lengthy process) before the data became searchable.
With searchable snapshots, data inside a snapshot can be searched directly, greatly reducing unnecessary resource usage.
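In upstream Elasticsearch (7.10+, where this is a licensed feature), a snapshotted index is made searchable by mounting it straight from the repository instead of restoring it; the repository, snapshot, and index names below are illustrative:

POST /_snapshot/my_repository/my_snapshot/_mount?wait_for_completion=true
{
  "index": "logs-2023.01.01"
}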
3 Other
3-1 Data compression
Besides cutting storage costs at the resource level, exploiting the characteristics of the data itself with a strong compression algorithm is an equally indispensable line of exploration. For time-series data, Facebook has open-sourced the excellent compression algorithm zstd, which is now widely used in industry.
Table 3-1 Comparison test results of three compression algorithms
Open-source contributors have also submitted a custom codec providing Zstandard compression/decompression (the zstd PR) to the Lucene code base.
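For context, ES already exposes the stored-fields codec as an index setting, trading CPU for disk space; a zstd-backed codec would plug in at the same point once merged. The example below uses the built-in DEFLATE-based option, which must be set at index creation time or on a closed index (index name illustrative):

PUT /logs-2023.01.01
{
  "settings": {
    "index.codec": "best_compression"
  }
}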
3-2 Off-heap
The storage capacity of a single ES node is limited by JVM heap memory; for a single node to store more data, the amount of data held on the heap must be reduced.
The largest share of resident memory in the Elasticsearch heap is the FST, i.e. the space occupied by the .tip (terms index) files; a 1 TB index occupies roughly 2 GB or more of memory, so to keep nodes stable the industry generally holds that the open indexes on a single node should not exceed 5 TB. Starting with ES 7.3, the .tip files are loaded via mmap, shifting the memory occupied by the FST from the heap to the operating system's page cache (the off-heap technique) [6].
We wrote a 1 TB index using the official esrally dataset geonames and used the _cat/segments API to view segments.memory consumption; the comparison after off-heap is shown in Table 3-2: JVM memory usage dropped by about 78%.
Table 3-2 Comparison of segments.memory consumption
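The per-segment heap footprint can be inspected with the same API used in this test; the h= column list is optional:

GET /_cat/segments?v&h=index,segment,size,size.memory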
4 References
[1] Indexing Service
[2] ES-Fastloader
[3] Large-scale testing of the new Elasticsearch cold layer searchable snapshots
[4] Introducing Elasticsearch searchable snapshots
[5] New improvements in version 7.7: Significantly reduce Elasticsearch heap memory usage
[6] Off-heap principle of Elasticsearch 7.3
Author: Yang Songbai
