
ElasticSearch

category software engineering/backend 2022. 8. 29. 17:51

Scale-out architecture

Physical

An Elasticsearch (ES) cluster is made up of multiple instances (nodes). Each instance has its own computing power and storage, so the cluster can distribute write and read load across them.

We can think of an index as roughly the equivalent of a table in an RDBMS. An index can have several shards, and the number of shards can differ from one index to another. We use multiple shards to spread an index's load across multiple instances, so an index should have at least as many shards as there are instances; otherwise some instances hold none of its shards and take none of its load.
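A minimal sketch of why the shard count matters, assuming a simple round-robin placement of primary shards. The real ES allocator also weighs disk usage, replicas, and allocation rules; `allocate_shards` and `idle_nodes` are hypothetical helpers for illustration only.

```python
def allocate_shards(num_shards: int, nodes: list[str]) -> dict[str, list[int]]:
    """Assign primary shards to nodes round-robin (a toy stand-in for ES's allocator)."""
    allocation: dict[str, list[int]] = {node: [] for node in nodes}
    for shard in range(num_shards):
        # Shard i goes to node (i mod number_of_nodes), spreading shards evenly.
        allocation[nodes[shard % len(nodes)]].append(shard)
    return allocation


def idle_nodes(allocation: dict[str, list[int]]) -> list[str]:
    """Nodes that hold no shard of this index, and therefore do no work for it."""
    return [node for node, shards in allocation.items() if not shards]


nodes = ["node-a", "node-b", "node-c"]
print(allocate_shards(2, nodes))              # {'node-a': [0], 'node-b': [1], 'node-c': []}
print(idle_nodes(allocate_shards(2, nodes)))  # ['node-c']
print(idle_nodes(allocate_shards(6, nodes)))  # []
```

With two shards on three nodes, one node sits idle for this index; with six shards, every node carries two.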

A document, which can be thought of as a record (row) in an RDBMS, is stored in exactly one shard. By default, ES picks that shard by hashing the document's _id. The _id must be unique within an index so that it identifies exactly one document, and therefore exactly one shard, when we fetch it. Because the shard is derived from the _id, a search for arbitrary documents without an _id has to query every shard of the index.
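A toy model of _id-based routing, contrasting a lookup by _id with a search that has no _id. Assumptions: `zlib.crc32` stands in for the murmur3 hash ES actually uses, `shard = hash(_id) % number_of_shards` is a simplification of ES's real routing formula, and `ToyIndex` is a hypothetical class, not an ES client API.

```python
import zlib
from typing import Callable, Optional


class ToyIndex:
    """Toy index that routes each document to one shard by its _id.

    Assumptions: zlib.crc32 stands in for the murmur3 hash ES really uses, and
    shard = hash(_id) % number_of_shards simplifies ES's actual routing formula.
    """

    def __init__(self, number_of_shards: int) -> None:
        # Each shard is modeled as a dict of _id -> document.
        self.shards: list[dict[str, dict]] = [{} for _ in range(number_of_shards)]

    def shard_for(self, doc_id: str) -> int:
        # Deterministic: the same _id always maps to the same shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id: str, doc: dict) -> None:
        # _id is unique within the index, so indexing the same _id again overwrites it.
        self.shards[self.shard_for(doc_id)][doc_id] = doc

    def get(self, doc_id: str) -> tuple[Optional[dict], int]:
        # A lookup by _id touches exactly one shard; returns (doc, shards_queried).
        return self.shards[self.shard_for(doc_id)].get(doc_id), 1

    def search(self, predicate: Callable[[dict], bool]) -> tuple[list[dict], int]:
        # Without an _id we cannot tell which shard holds a match, so query them all.
        hits = [doc for shard in self.shards for doc in shard.values() if predicate(doc)]
        return hits, len(self.shards)


idx = ToyIndex(number_of_shards=4)
idx.index("user-1", {"name": "kim", "city": "Seoul"})
idx.index("user-2", {"name": "lee", "city": "Busan"})
print(idx.get("user-1"))                             # ({'name': 'kim', 'city': 'Seoul'}, 1)
print(idx.search(lambda d: d["city"] == "Busan"))    # ([{'name': 'lee', 'city': 'Busan'}], 4)
```

The second number in each result is how many shards were queried: one for the _id lookup, all four for the search.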

Logical

 

Because people often use ES to store time-series data, it is very common to have many indices split by date. Time-series data such as logs or telemetry has characteristics that set it apart from other data: documents are mostly appended and very rarely updated or deleted, and reads usually target recent data. So if we keep a separate index per date, we can move old indices to colder instances that have slower storage and less computing power.
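A minimal sketch of the date-based layout, written as plain functions rather than ES configuration (in ES itself this is usually handled with data tiers and index lifecycle management). The `logs-YYYY.MM.DD` naming convention, the `hot`/`cold` labels, and the `hot_days` threshold are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Name a daily index, e.g. logs-2022.08.29 (a common, but not mandatory, convention)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def tier_for(day: date, today: date, hot_days: int = 7) -> str:
    """Recent indices stay on hot nodes; older ones move to cold nodes.

    hot_days is a hypothetical threshold; choose it from how far back reads usually go.
    """
    return "hot" if (today - day).days < hot_days else "cold"


today = date(2022, 8, 29)
for age in (0, 6, 7, 30):
    day = today - timedelta(days=age)
    print(daily_index_name("logs", day), tier_for(day, today))
# logs-2022.08.29 hot
# logs-2022.08.23 hot
# logs-2022.08.22 cold
# logs-2022.07.30 cold
```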

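To make this work, we need a way to search multiple indices at once, because we don't want to pick the indices to search manually. That's what aliases are for. We want to write into a single index but read from multiple indices, so we create two (or more) aliases: for example, a write alias that points only at the current index and a read alias that covers all of the date-based indices.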
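A sketch of the request body for the `POST /_aliases` API that moves a write alias to a new daily index while adding that index to the read alias. The `remove`/`add` actions are part of the real API, and the actions in one request are applied atomically; the alias names `logs-write` and `logs-read` and the helper `switch_write_alias` are hypothetical. The code only builds the JSON and does not call a cluster.

```python
import json


def switch_write_alias(old_index: str, new_index: str,
                       write_alias: str = "logs-write",
                       read_alias: str = "logs-read") -> dict:
    """Build a POST /_aliases body that moves the write alias to new_index.

    All actions in one _aliases request are applied atomically, so the write
    alias never points at zero or two indices in between.
    """
    return {
        "actions": [
            # Stop writing into the previous day's index...
            {"remove": {"index": old_index, "alias": write_alias}},
            # ...start writing into the new one...
            {"add": {"index": new_index, "alias": write_alias}},
            # ...and make the new index searchable through the read alias too.
            # The old index stays in the read alias, so it remains searchable.
            {"add": {"index": new_index, "alias": read_alias}},
        ]
    }


body = switch_write_alias("logs-2022.08.28", "logs-2022.08.29")
print(json.dumps(body, indent=2))
```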

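Sounds familiar? It's similar to the relationship between an index and its shards described in the Physical section above. Since the indices behind an alias behave much like the shards of an index, we manage date-based indices with techniques such as rollover, shrink, and _forcemerge: rollover starts a new index and moves the write alias to it once the current one gets too old or too big, shrink reduces the number of primary shards of an index that no longer receives writes, and _forcemerge merges the segments of such an index to reduce its overhead.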
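A minimal sketch of the bookkeeping behind rollover and shrink, written as plain functions rather than ES API calls. The naming rule follows ES's default for indices ending in `-<number>` (increment and zero-pad to six digits), rollover fires as soon as any `max_*` condition is met, and a shrink target must be a factor of the source shard count that is smaller than it. The thresholds in `should_rollover` and all function names are hypothetical.

```python
import re


def next_rollover_name(index: str) -> str:
    """Increment the trailing counter, e.g. logs-000001 -> logs-000002.

    Mirrors ES's default rollover naming, which zero-pads the counter to six digits.
    """
    match = re.fullmatch(r"(.*-)(\d+)", index)
    if match is None:
        raise ValueError(f"{index!r} does not end in -<number>")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


def should_rollover(age_days: float, docs: int, size_gb: float,
                    max_age_days: float = 1, max_docs: int = 50_000_000,
                    max_size_gb: float = 50) -> bool:
    """Roll over when ANY max_* condition is met (thresholds here are hypothetical)."""
    return age_days >= max_age_days or docs >= max_docs or size_gb >= max_size_gb


def shrink_targets(source_shards: int) -> list[int]:
    """Valid primary-shard counts for shrink: factors of the source count, below it."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(next_rollover_name("logs-000001"))                       # logs-000002
print(should_rollover(age_days=0.5, docs=1_000, size_gb=60))   # True
print(shrink_targets(8))                                       # [1, 2, 4]
```

Checking the conditions independently mirrors the "any condition" rule; a real deployment would let ES evaluate them through the rollover API or an index lifecycle policy rather than a client-side function.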
