A Deep Dive into Elasticsearch Storage

Background

In this article we’ll investigate the files written to the data directory by various parts of Elasticsearch. We will look at node, index and shard level files and give a short explanation of their contents in order to establish an understanding of the data written to disk by Elasticsearch.

This article applies to Elasticsearch 7.10.
Elasticsearch 7.10 corresponds to Lucene 8.7.
Official Lucene 8.7 documentation on file extensions: https://lucene.apache.org/cor…

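Part 1: Paths

Elasticsearch needs several file system paths to be configured before it runs. They are set either as JVM startup parameters or in the configuration file config/elasticsearch.yml.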

1.1 Path parameters passed to the `JVM`

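path.home: the home directory of the Elasticsearch installation. The launch script sets it explicitly rather than relying on the Java system property user.dir (which is the working directory of the process, not a user's home directory). In the bin/elasticsearch launch script we find the following Java startup parameter:
-Des.path.home="$ES_HOME"
The ES_HOME variable is defined in bin/elasticsearch-env (which the launch script sources), starting from the location of the script itself:
ES_HOME=`dirname "$SCRIPT"`
ES_HOME=`cd "$ES_HOME"; pwd`
These two lines resolve the absolute path of the directory that contains the script, that is, bin. The script then moves up to the parent of bin, so path.home is the installation root. In our test environment, for example, path.home is /usr/share/elasticsearch.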
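path.conf: the directory that holds the service's configuration files. It is set the same way as path.home; in bin/elasticsearch:
-Des.path.conf="$ES_PATH_CONF"
The value of the variable is defined in bin/elasticsearch-env as follows:
if [ -z "$ES_PATH_CONF" ]; then ES_PATH_CONF="$ES_HOME"/config; fi
So unless ES_PATH_CONF is already set in the environment, it defaults to $ES_HOME/config.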
path.plugins: the Elasticsearch plugin directory, where installed plugins are stored. Its location is $ES_HOME/plugins.

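1.2 Paths set in the configuration file

path.logs: the directory where Elasticsearch writes its log files. It can be set in config/elasticsearch.yml.
path.data: the directory where Elasticsearch stores its data.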
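
To make these defaults concrete, here is a minimal Python sketch of how the five paths could be resolved from an installation directory. The function name default_paths is made up for illustration, and the fallbacks for path.data and path.logs ($ES_HOME/data and $ES_HOME/logs) are an assumption based on the layout of an archive installation; package installations may use other locations.

```python
from pathlib import PurePosixPath


def default_paths(es_home, env=None, settings=None):
    """Resolve the paths discussed above (illustrative only).

    es_home  -- the installation directory (value of $ES_HOME)
    env      -- environment variables; only ES_PATH_CONF is consulted
    settings -- values read from config/elasticsearch.yml
    """
    env = env or {}
    settings = settings or {}
    home = PurePosixPath(es_home)

    # Mirrors `if [ -z "$ES_PATH_CONF" ]`: unset or empty falls back to $ES_HOME/config.
    conf = PurePosixPath(env.get("ES_PATH_CONF") or home / "config")

    return {
        "path.home": home,
        "path.conf": conf,
        "path.plugins": home / "plugins",
        # Assumed defaults; both can be overridden in elasticsearch.yml.
        "path.data": PurePosixPath(settings.get("path.data") or home / "data"),
        "path.logs": PurePosixPath(settings.get("path.logs") or home / "logs"),
    }


paths = default_paths(
    "/usr/share/elasticsearch",
    settings={"path.data": "/var/lib/elasticsearch"},
)
for name, value in paths.items():
    print(f"{name:13} {value}")
```

Only path.data is overridden in this call, so every other entry falls back to a location under the installation directory.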

1.3 Full directory tree

# tree -L 1 elasticsearch/
elasticsearch/
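|-- LICENSE.txt       # license
|-- NOTICE.txt        # notices
|-- README.asciidoc   # readme
|-- bin               # executables
|-- config            # configuration path
|-- data              # data path
|-- jdk               # bundled JDK; recent versions ship their own to avoid JDK compatibility problems and spare users the related setup
|-- lib               # dependency libraries
|-- logs              # log path
|-- modules           # modules
|-- plugins           # plugin path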

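In this article we look in detail at what is stored in the data path (path.data) and what it is used for.

Part 2: The data path

When we talk about full-text search today, we usually mean Elasticsearch or Solr. Both are in fact wrappers around the Lucene search engine library, which they include as a dependency.

We compared the search terms Elasticsearch and solr on Google Trends (2004-2022): Elasticsearch is clearly ahead (Elasticsearch is the blue curve, solr the red curve).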

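2.1 Storage architecture

Under the hood, Elasticsearch uses Lucene to handle indexing and querying at the shard level, so the files in the data directory are written by both Elasticsearch and Lucene. Lucene is responsible for reading, writing and maintaining the Lucene index files, while Elasticsearch reads, writes and manages the related metadata on top of Lucene, such as field mappings and index settings.

In Elasticsearch, each shard is a Lucene index, and each segment within it is the smallest unit of storage that Lucene manages. The figure below illustrates this using shard I1P2 as an example.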

[Figure: lucene (shard I1P2 example)]

Let's first look at the outer levels of the data that Elasticsearch writes.

2.2 Data path layout

Data node directory: data/nodes/0

# tree -L 3 data/
data/
`-- nodes
    `-- 0
        |-- _state
        |-- indices
        `-- node.lock

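node.lock: a file that ensures only one Elasticsearch installation reads from or writes to a given data directory at a time.
_state: a directory holding the cluster metadata.
indices: a directory holding the data of the cluster's indices.

2.3 Cluster metadata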

# tree _state/ 
_state/
|-- _1q.fdt
|-- _1q.fdx
|-- _1q.fnm
|-- _1q.si
|-- _1q_2.liv
|-- _1q_Lucene84_0.doc
|-- _1q_Lucene84_0.tim
|-- _1q_Lucene84_0.tip
|-- _1u.cfe
|-- _1u.cfs
|-- _1u.si
|-- _1x.cfe
|-- _1x.cfs
|-- _1x.si
|-- _1z.cfe
|-- _1z.cfs
|-- _1z.si
|-- manifest-2.st
|-- node-2.st
|-- segments_27
`-- write.lock

Name	Extension	Brief Description
Segments File	segments_N	Stores information about a commit point
Lock File	write.lock	The write lock prevents multiple IndexWriters from writing to the same file.
Segment Info	.si	Stores metadata about a segment, including which files the segment consists of
Compound File	.cfs, .cfe	An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles.
Fields	.fnm	Stores information about the fields
Field Index	.fdx	Contains pointers to field data
Field Data	.fdt	The stored fields for documents
Term Dictionary	.tim	The term dictionary, stores term info
Term Index	.tip	The index into the Term Dictionary
Frequencies	.doc	Contains the list of docs which contain each term along with frequency
Positions	.pos	Stores position information about where a term occurs in the index
Payloads	.pay	Stores additional per-position metadata information such as character offsets and user payloads
Norms	.nvd, .nvm	Encodes length and boost factors for docs and fields
Per-Document Values	.dvd, .dvm	Encodes additional scoring factors or other per-document information.
Term Vector Index	.tvx	Stores offset into the document data file
Term Vector Documents	.tvd	Contains information about each document that has term vectors
Term Vector Fields	.tvf	The field level info about term vectors
Live Documents	.liv	Info about which documents are live
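
As a small illustration of the table, the following Python sketch maps the file names from the _state listing to their segment and file type. The regular expression is an assumption inferred from the names shown in this article (a segment name such as _1q, an optional suffix such as _Lucene84_0, then the extension); it is not taken from Lucene's source code.

```python
import re
from collections import defaultdict

# Extension -> name, copied from the table above.
FILE_TYPES = {
    "si": "Segment Info",
    "cfs": "Compound File", "cfe": "Compound File",
    "fnm": "Fields",
    "fdx": "Field Index",
    "fdt": "Field Data",
    "tim": "Term Dictionary",
    "tip": "Term Index",
    "doc": "Frequencies",
    "pos": "Positions",
    "pay": "Payloads",
    "nvd": "Norms", "nvm": "Norms",
    "dvd": "Per-Document Values", "dvm": "Per-Document Values",
    "tvx": "Term Vector Index",
    "tvd": "Term Vector Documents",
    "tvf": "Term Vector Fields",
    "liv": "Live Documents",
}

# Assumed shape: <segment>[_<suffix>].<extension>, e.g. _1q_Lucene84_0.doc
SEGMENT_FILE = re.compile(r"(_[0-9a-z]+)(?:_.*)?\.([a-z]+)")


def classify(file_name):
    """Return (segment name or None, file type) for one directory entry."""
    if file_name == "write.lock":
        return None, "Lock File"
    if file_name.startswith("segments_"):
        return None, "Segments File"
    match = SEGMENT_FILE.fullmatch(file_name)
    if match is None:
        return None, "unknown"
    segment, extension = match.groups()
    return segment, FILE_TYPES.get(extension, "unknown")


listing = [
    "_1q.fdt", "_1q.si", "_1q_2.liv", "_1q_Lucene84_0.doc",
    "_1u.cfe", "_1u.cfs", "_1u.si",
    "manifest-2.st", "node-2.st", "segments_27", "write.lock",
]

by_segment = defaultdict(list)
for name in listing:
    segment, kind = classify(name)
    by_segment[segment].append((name, kind))

for segment, files in by_segment.items():
    print(segment, files)
```

Files that do not belong to a segment, such as the .st files written by Elasticsearch itself, end up under the key None.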

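In older Elasticsearch versions, the more interesting file in this directory was global-0.st. The global- prefix indicates that this is a global state file, and the .st extension indicates that it is a state file containing metadata. As you might have guessed, this binary file contains global metadata about the cluster, and the number after the prefix is the cluster metadata version, a strictly increasing versioning scheme that follows your cluster. The 7.10 listing above contains no global- file: it shows node-2.st and manifest-2.st, which use the same {prefix}-{number}.st naming pattern, alongside a set of Lucene files.

While it is technically possible to edit these files with a hex editor in an emergency, this is strongly discouraged, because it can quickly lead to data loss.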
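
The naming scheme of the .st files can be illustrated with a short Python sketch that splits a state file name into its prefix and number and picks the highest-numbered file for a given prefix. The pattern is inferred from the file names that appear in this article, and the helper names are hypothetical. The sketch only looks at names; it never opens or modifies the binary files.

```python
import re

# Assumed pattern: <prefix>-<number>.st, e.g. global-0.st or retention-leases-9.st
STATE_FILE = re.compile(r"(?P<prefix>[a-z-]+)-(?P<number>\d+)\.st")


def parse_state_file(name):
    """Return (prefix, number) for a state file name, or None if it is not one."""
    match = STATE_FILE.fullmatch(name)
    if match is None:
        return None
    return match.group("prefix"), int(match.group("number"))


def latest_state_file(names, prefix):
    """Return the name with the highest number for the given prefix, or None."""
    best = None
    for name in names:
        parsed = parse_state_file(name)
        if parsed is None or parsed[0] != prefix:
            continue
        if best is None or parsed[1] > best[1]:
            best = (name, parsed[1])
    return None if best is None else best[0]


names = [
    "global-0.st", "manifest-2.st", "node-2.st",
    "state-0.st", "state-2.st", "retention-leases-9.st", "segments_27",
]
for name in names:
    print(name, "->", parse_state_file(name))
print("latest state file:", latest_state_file(names, "state"))
```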

https://www.shenyanchao.cn/blog/2018/12/04/lucene-index-files/

2.4 Index data

[root@f1f4420ca021 0]# tree -L 1 indices/
indices/
|-- Ce1hPxBFTc6xLH9zyzWRog
|-- LFTAEix1Q6u79iWX5wFdAg
|-- cNEZTRZGTtWn2f5r7zjveQ
|-- pg9zI9o1QNKFav7KsxVQ-Q
|-- r0lRikPvTUOWZaloPg7QXw
`-- wHQ5gnEESM2GWgQsybPTKA

Let's create a single-shard index and look at the files Elasticsearch changes:

# tree cNEZTRZGTtWn2f5r7zjveQ
cNEZTRZGTtWn2f5r7zjveQ
|-- 0
|   |-- _state
|   |   |-- retention-leases-9.st
|   |   `-- state-0.st
|   |-- index
|   |   |-- _6.cfe
|   |   |-- _6.cfs
|   |   |-- _6.si
|   |   |-- _6_1.fnm
|   |   |-- _6_1_Lucene80_0.dvd
|   |   |-- _6_1_Lucene80_0.dvm
|   |   |-- _7.cfe
|   |   |-- _7.cfs
|   |   |-- _7.si
|   |   |-- segments_6
|   |   `-- write.lock
|   `-- translog
|       |-- translog-4.tlog
|       `-- translog.ckp
`-- _state
    `-- state-2.st

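We can see that a new directory has been created for the index. In the listing above it is named after the index's unique identifier (cNEZTRZGTtWn2f5r7zjveQ) rather than the index name. The directory has two subfolders: _state and 0. The former contains what is called the index state file (indices/{index-uuid}/_state/state-{version}.st), which holds metadata about the index, such as its creation timestamp. It also contains a unique identifier as well as the settings and mappings of the index. The latter contains the data of the first (and only) shard of the index (shard 0). Next, we take a closer look at it.

Shard data

The shard data directory contains a state file for the shard, which includes versioning as well as information about whether the shard is considered a primary or a replica. (The listing below comes from an older Elasticsearch version, which is why its paths and file names differ from the 7.10 listings above.)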
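
The node, index and shard levels described so far can be walked with a few lines of Python. This is a sketch under the assumption that the directory follows the nodes/{node}/indices/{index}/{shard} layout shown in the listings; the function name describe_data_dir is made up for illustration, and the demo runs against a throwaway directory rather than a real data path.

```python
import tempfile
from pathlib import Path


def describe_data_dir(data_dir):
    """Summarise nodes, indices and shards found under a data directory."""
    summary = {}
    nodes_dir = Path(data_dir) / "nodes"
    if not nodes_dir.is_dir():
        return summary
    for node_dir in sorted(p for p in nodes_dir.iterdir() if p.is_dir()):
        indices = {}
        indices_dir = node_dir / "indices"
        if indices_dir.is_dir():
            for index_dir in sorted(p for p in indices_dir.iterdir() if p.is_dir()):
                # Shard directories are named by number; _state holds index metadata.
                indices[index_dir.name] = sorted(
                    int(p.name)
                    for p in index_dir.iterdir()
                    if p.is_dir() and p.name.isdigit()
                )
        summary[node_dir.name] = {
            "has_node_lock": (node_dir / "node.lock").is_file(),
            "indices": indices,
        }
    return summary


# Build a throwaway directory tree shaped like the listing above and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "nodes" / "0" / "indices" / "cNEZTRZGTtWn2f5r7zjveQ"
    for sub in ("_state", "0/_state", "0/index", "0/translog"):
        (index_dir / sub).mkdir(parents=True)
    (Path(tmp) / "nodes" / "0" / "node.lock").touch()
    layout = describe_data_dir(tmp)

print(layout)
```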

$ tree -h data/elasticsearch/nodes/0/indices/foo/0
data/elasticsearch/nodes/0/indices/foo/0
├── [ 102]  _state
│   └── [  81]  state-0.st
├── [ 170]  index
│   ├── [  36]  segments.gen
│   ├── [  79]  segments_1
│   └── [   0]  write.lock
└── [ 102]  translog
    └── [  17]  translog-1429697028120

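In earlier Elasticsearch versions, separate {shard_id}/index/_checksums- files (and .cks files) could also be found in the shard data directory. In current versions these checksums live in the footers of the Lucene files instead, since Lucene has added end-to-end checksumming to all of its index files.

The {shard_id}/index directory contains files owned by Lucene. Elasticsearch generally does not write directly to this folder (except for the older checksum implementation found in earlier versions). The files in these directories account for most of the size of any Elasticsearch data directory.

Before we enter the world of Lucene, we take a look at the Elasticsearch transaction log, which unsurprisingly lives in the per-shard translog directory with the prefix translog-. The transaction log is very important for the functionality and performance of Elasticsearch, so the next section explains its use in more detail.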

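Per-shard transaction log

The Elasticsearch transaction log makes sure that data can safely be indexed into Elasticsearch without having to perform a low-level Lucene commit for every document. Committing a Lucene index creates a new segment at the Lucene level, which is fsync()-ed and causes a significant amount of disk I/O that affects performance.

In order to accept a document for indexing and make it searchable without requiring a full Lucene commit, Elasticsearch adds it to the Lucene IndexWriter and appends it to the transaction log. After each refresh_interval it calls reopen() on the Lucene index, which makes the data searchable without requiring a commit. This is part of the Lucene Near Real Time API. When the IndexWriter eventually commits, due to either an automatic flush of the transaction log or an explicit flush operation, the previous transaction log is discarded and a new one takes its place.

Should recovery be required, the segments written to disk by Lucene are recovered first, and then the transaction log is replayed to prevent the loss of operations that had not yet been fully committed to disk.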
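
The interplay of indexing, refresh, flush and recovery can be modelled in a few lines of Python. This is a deliberately simplified toy, not Elasticsearch or Lucene code: the class ToyShard and its attributes are invented for illustration, documents are plain strings, and durability is simulated by choosing which lists survive the crash() call.

```python
class ToyShard:
    """A toy model of the commit / refresh / translog interplay."""

    def __init__(self):
        self.committed = []   # documents in fsync()-ed segments; survive a crash
        self.buffer = []      # documents held by the IndexWriter; lost on a crash
        self.translog = []    # operations since the last commit; survive a crash
        self.searchable = []  # what a search would currently see

    def index(self, doc):
        # Add to the IndexWriter and append to the transaction log.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Reopen the index: buffered documents become searchable, no commit needed.
        self.searchable = self.committed + self.buffer

    def flush(self):
        # Lucene commit: buffered documents become durable, the translog starts over.
        self.committed = self.committed + self.buffer
        self.buffer = []
        self.translog = []

    def crash(self):
        # Everything that only lived in memory is gone.
        self.buffer = []
        self.searchable = []

    def recover(self):
        # Committed segments come back first, then the translog is replayed.
        self.buffer = list(self.translog)
        self.refresh()


shard = ToyShard()
shard.index("doc-1")
shard.index("doc-2")
shard.flush()            # doc-1 and doc-2 are now committed
shard.index("doc-3")     # only in memory and in the translog
shard.crash()
shard.recover()
print(shard.searchable)  # ['doc-1', 'doc-2', 'doc-3']
```

Without the translog, doc-3 would have been lost in the crash, because it had been indexed but not yet committed.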

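Lucene index files

Lucene does a good job of documenting the files in a Lucene index directory. The overview is reproduced here for convenience (the linked Lucene documentation also details the changes these files have gone through since Lucene 2.1, so check it out):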

Name	Extension	Brief Description
Segments File	segments_N	Stores information about a commit point
Lock File	write.lock	The write lock prevents multiple IndexWriters from writing to the same file.
Segment Info	.si	Stores metadata about a segment
Compound File	.cfs, .cfe	An optional “virtual” file consisting of all the other index files for systems that frequently run out of file handles.
Fields	.fnm	Stores information about the fields
Field Index	.fdx	Contains pointers to field data
Field Data	.fdt	The stored fields for documents
Term Dictionary	.tim	The term dictionary, stores term info
Term Index	.tip	The index into the Term Dictionary
Frequencies	.doc	Contains the list of docs which contain each term along with frequency
Positions	.pos	Stores position information about where a term occurs in the index
Payloads	.pay	Stores additional per-position metadata information such as character offsets and user payloads
Norms	.nvd, .nvm	Encodes length and boost factors for docs and fields
Per-Document Values	.dvd, .dvm	Encodes additional scoring factors or other per-document information.
Term Vector Index	.tvx	Stores offset into the document data file
Term Vector Documents	.tvd	Contains information about each document that has term vectors
Term Vector Fields	.tvf	The field level info about term vectors
Live Documents	.liv	Info about which documents are live

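In older versions you will often also see a segments.gen file in the Lucene index directory (it does not appear in the 7.10 listings above). It is a helper file that contains information about the current/latest segments_N file, and is used for file systems that might not return enough information from a directory listing to determine the latest generation segments file.

In older Lucene versions you will also find files with the .del suffix. They serve the same purpose as the Live Documents (.liv) files; in other words, they are the deletion lists. If you are wondering what all this talk about live documents and deletion lists is about, you might want to read the section on building indexes in the Elasticsearch from the Bottom Up article.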
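
The N in segments_N is the commit generation, which Lucene writes in base 36, so the names cannot simply be compared as strings. The following Python sketch determines the latest segments file from a directory listing; the helper names are made up for illustration, and in practice a shard directory usually contains only one such file.

```python
def segments_generation(file_name):
    """Return the generation encoded in a segments_N name, or None."""
    prefix = "segments_"
    if not file_name.startswith(prefix):
        return None  # also skips the segments.gen helper file
    try:
        return int(file_name[len(prefix):], 36)  # N is written in base 36
    except ValueError:
        return None


def latest_segments_file(file_names):
    """Return the segments_N file with the highest generation, or None."""
    candidates = []
    for name in file_names:
        generation = segments_generation(name)
        if generation is not None:
            candidates.append((generation, name))
    return max(candidates)[1] if candidates else None


print(segments_generation("segments_27"))   # 79, since 2 * 36 + 7 = 79
print(latest_segments_file(["segments_z", "segments_10", "segments.gen", "write.lock"]))
```

In the second call a plain string comparison would pick segments_z, while the numeric comparison correctly picks segments_10 (generation 36 comes after generation 35).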

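Fixing problematic shards

Since an Elasticsearch shard contains a Lucene index, we can use Lucene's excellent CheckIndex tool, which lets us scan and fix problematic segments, usually with minimal data loss. We would generally recommend that Elasticsearch users simply reindex the data, but if for some reason that is not possible and the data is very valuable, this is a route that can be taken, even though it requires quite a bit of manual work and time, depending on the number of shards and their sizes.

The Lucene CheckIndex tool is included in the default Elasticsearch distribution and requires no additional download.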

# Change this to reflect your shard path; the format is
# {path.data}/{cluster_name}/nodes/{node_id}/indices/{index_name}/{shard_id}/index/

$ export SHARD_PATH=data/elasticsearch/nodes/0/indices/foo/0/index/
$ java -cp lib/elasticsearch-*.jar:lib/*:lib/sigar/* -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex $SHARD_PATH

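If CheckIndex detects a problem and its suggested fix looks sensible, you can tell CheckIndex to apply the fix by adding the -fix command-line parameter (newer Lucene releases name this option -exorcise).

Snapshot storage

You may be wondering how all these files translate into the storage used by snapshot repositories. Wonder no more: taking this cluster, snapshotting it as my-snapshot to a file-system-based gateway, and inspecting the files in the repository, we find these files (some omitted for brevity):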

$ tree -h snapshots
snapshots
├── [  31]  index
├── [ 102]  indices
│   └── [ 136]  foo
│       ├── [1.2K]  0
│       │   ├── [ 350]  __0
│       │   ├── [1.8K]  __1
...
│       │   ├── [ 350]  __w
│       │   ├── [ 380]  __x
│       │   └── [8.2K]  snapshot-my-snapshot
│       └── [ 249]  snapshot-my-snapshot
├── [  79]  metadata-my-snapshot
└── [ 171]  snapshot-my-snapshot

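At the root we have an index file that contains information about all the snapshots in this repository, and each snapshot has an associated snapshot- file and a metadata- file. The snapshot- file at the root contains information about the state of the snapshot, which indices it contains and so on. The metadata- file at the root contains the cluster metadata at the time of the snapshot.

When compress: true is set, the metadata- and snapshot- files are compressed using LZF, which focuses on compression and decompression speed, making it a good fit for Elasticsearch. The data is stored with a header: ZV + 1 byte indicating whether the data is compressed. After the header there are one or more compressed 64K blocks in the format: 2 byte block length + 2 byte uncompressed size + compressed data. With this information you can use any LibLZF-compatible decompressor. If you want to learn more about LZF, look up a description of the format.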
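
The block layout described in this paragraph can be walked with a short Python sketch. It follows the description here literally and rests on several assumptions that the text does not state: a non-zero flag byte means compressed, and both 2 byte fields are unsigned big-endian integers. It does not decompress anything, the input is synthetic, and it has not been checked against real repository files.

```python
import struct


def read_block_layout(blob):
    """Return (is_compressed, [(block_length, uncompressed_size), ...])."""
    if len(blob) < 3 or blob[:2] != b"ZV":
        raise ValueError("missing ZV header")
    is_compressed = blob[2] != 0  # assumption: non-zero flag means compressed
    blocks = []
    offset = 3
    while is_compressed and offset < len(blob):
        if offset + 4 > len(blob):
            raise ValueError("truncated block header")
        # Assumption: both 2 byte fields are unsigned big-endian integers.
        block_length, uncompressed_size = struct.unpack_from(">HH", blob, offset)
        offset += 4
        if offset + block_length > len(blob):
            raise ValueError("truncated block data")
        blocks.append((block_length, uncompressed_size))
        offset += block_length  # skip the compressed bytes; no decompression here
    return is_compressed, blocks


# Synthetic input built to match the description; the payload bytes are placeholders.
sample = (
    b"ZV" + bytes([1])
    + struct.pack(">HH", 3, 10) + b"abc"
    + struct.pack(">HH", 2, 5) + b"xy"
)
print(read_block_layout(sample))
```

Because each length field is two bytes wide, a block can never describe more than 65535 bytes, which matches the 64K block size mentioned above.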

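At the index level there is another file, indices/{index_name}/snapshot-{snapshot_name}, which contains the index metadata, such as the settings and mappings of the index at the time of the snapshot.

At the shard level you will find two kinds of files: renamed Lucene index files and the shard snapshot file, indices/{index_name}/{shard_id}/snapshot-{snapshot_name}. This file contains information about which of the files in the shard directory are used in the snapshot, and a mapping from the logical file names in the snapshot to the concrete file names they should be stored as on disk when restored. It also contains checksums, Lucene versioning and size information for all relevant files, which can be used to detect and prevent data corruption.

You may wonder why these files have been renamed instead of keeping their original file names, which would arguably have been easier to work with directly on disk. The reason is simple: it is possible to snapshot an index, delete it, re-create it and then snapshot it again. In that case, several files would end up having the same names but different contents.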
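
The renaming argument can be illustrated with a toy repository in Python. Everything here is hypothetical: the class ToyRepository, the base 36 counter behind the __0, __1, ... names (a guess based on the listing above), and SHA-256 as a stand-in checksum. It is not how Elasticsearch implements snapshots; it only shows why a mapping from generated names to original names avoids collisions.

```python
import hashlib
import string

DIGITS = string.digits + string.ascii_lowercase  # 36 symbols


def blob_name(counter):
    """Generate names like __0, __1, ... __z, __10 (base 36 is an assumption)."""
    digits = ""
    while True:
        counter, remainder = divmod(counter, 36)
        digits = DIGITS[remainder] + digits
        if counter == 0:
            return "__" + digits


class ToyRepository:
    def __init__(self):
        self.blobs = {}            # generated blob name -> file content
        self.shard_snapshots = {}  # snapshot name -> list of file entries
        self._counter = 0

    def snapshot(self, snapshot_name, shard_files):
        entries = []
        for original_name in sorted(shard_files):
            content = shard_files[original_name]
            name = blob_name(self._counter)
            self._counter += 1
            self.blobs[name] = content
            entries.append({
                "blob": name,
                "name": original_name,
                "size": len(content),
                "checksum": hashlib.sha256(content).hexdigest(),
            })
        self.shard_snapshots[snapshot_name] = entries

    def restore(self, snapshot_name):
        restored = {}
        for entry in self.shard_snapshots[snapshot_name]:
            content = self.blobs[entry["blob"]]
            if hashlib.sha256(content).hexdigest() != entry["checksum"]:
                raise ValueError("corrupt blob " + entry["blob"])
            restored[entry["name"]] = content
        return restored


repo = ToyRepository()
repo.snapshot("before", {"_0.cfs": b"first incarnation", "segments_1": b"a"})
# The index is deleted and re-created: same file names, different contents.
repo.snapshot("after", {"_0.cfs": b"second incarnation", "segments_1": b"b"})

print(sorted(repo.blobs))
print(repo.restore("before")["_0.cfs"], repo.restore("after")["_0.cfs"])
```

Both snapshots contain a file called _0.cfs, yet each restores its own content, because the repository stores them under different generated names.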

References

1. Official website: https://www.elastic.co/cn/

2. https://www.shenyanchao.cn/blog/2018/12/04/lucene-index-files/

3. https://www.elastic.co/cn/blog/found-dive-into-elasticsearch-storage

4. https://elasticsearch.cn/article/6178
