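A data stream can be thought of as a group of automatically created indices (its backing indices).
A data stream lets you store append-only time series data across multiple indices while giving requests a single named resource to target, much like an alias. Data streams are well suited to logs, events, metrics, and other continuously generated data.
You can submit indexing and search requests directly to a data stream. The stream automatically routes each request to the backing indices that store its data. You can use index lifecycle management (ILM) to automate the management of these backing indices.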
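Reading data
A search or other read request sent to a data stream is routed to all of its backing indices.
Writing data
The most recently created backing index is the stream's write index, and new documents are added only to it. You cannot add documents to the other backing indices, even by addressing them by their full names. Operations that could hinder indexing, such as clone, delete, shrink, or split, are not allowed on the current write index.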
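Generation
Each data stream tracks its generation: a six-digit, zero-padded integer that acts as a cumulative count of the stream's rollovers, starting at 000001.
The full name of a backing index has the form
.ds-<data-stream>-<yyyy.MM.dd>-<generation>
For example: .ds-my-data-stream-2021.10.27-000001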
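Append-only
Update or delete requests for existing documents cannot be sent directly to a data stream. Use the update by query and delete by query APIs instead.
If necessary, you can update or delete a document by sending the request to the backing index that contains it, addressed by its full name.
If you need to update or delete documents frequently, use an index template with an index alias instead of a data stream. See "Manage time series data without data streams" in the Elasticsearch documentation.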
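For comparison, here is a minimal sketch of the alias-based alternative mentioned above. The names my-alias and my-index-000001 are hypothetical, and the index template that would supply mappings and ILM settings for these indices is omitted.

```console
# Bootstrap the first index and mark it as the write index of the alias
PUT /my-index-000001
{
  "aliases": {
    "my-alias": { "is_write_index": true }
  }
}

# Through the alias, an index request with an explicit ID is accepted
# and overwrites any document with that ID in the write index
PUT /my-alias/_doc/1
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "updated message"
}
```

The second request is exactly the kind that a data stream rejects, which is why this layout suits data that changes often.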
Creating a data stream
The usual steps:
- Create an index lifecycle policy
- Create component templates (optional)
- Create an index template
- Create the data stream
- Secure the data stream (access control, optional)
Create the ILM policy
PUT _ilm/policy/my-lifecycle-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_primary_shard_size": "50gb" } }
      },
      "warm": {
        "min_age": "30d",
        "actions": { "shrink": { "number_of_shards": 1 }, "forcemerge": { "max_num_segments": 1 } }
      },
      "cold": {
        "min_age": "60d",
        "actions": { "searchable_snapshot": { "snapshot_repository": "found-snapshots" } }
      },
      "frozen": {
        "min_age": "90d",
        "actions": { "searchable_snapshot": { "snapshot_repository": "found-snapshots" } }
      },
      "delete": {
        "min_age": "735d",
        "actions": { "delete": {} }
      }
    }
  }
}
Create two component templates for the index template to use
PUT _component_template/my-mappings
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date", "format": "date_optional_time||epoch_millis" },
        "message": { "type": "wildcard" }
      }
    }
  },
  "_meta": {
    "description": "Mappings for @timestamp and message fields",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}

PUT _component_template/my-settings
{
  "template": {
    "settings": { "index.lifecycle.name": "my-lifecycle-policy" }
  },
  "_meta": {
    "description": "Settings for ILM",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
Create the index template
PUT _index_template/my-index-template
{
  "index_patterns": ["my-data-stream*"],
  "data_stream": { },
  "composed_of": [ "my-mappings", "my-settings" ],
  "priority": 500,
  "_meta": {
    "description": "Template for my time series data",
    "my-custom-meta-field": "More arbitrary metadata"
  }
}
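Once the template exists, the data stream is created automatically by the first indexing request whose target name matches the template's index pattern: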
PUT my-data-stream/_bulk
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:21:15.000Z", "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736" }
{ "create":{ } }
{ "@timestamp": "2099-05-06T16:25:42.000Z", "message": "192.0.2.255 - - [06/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" }

POST my-data-stream/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "message": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
}
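Alternatively, create the stream explicitly with PUT _data_stream/my-data-stream
Get information about the data stream: GET _data_stream/my-data-stream
Delete the data stream (this also deletes its backing indices): DELETE _data_stream/my-data-stream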
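The optional "Secure the data stream" step has no example in these notes. The sketch below assumes the Elasticsearch security features are enabled. The role name my-data-stream-user is hypothetical, and the privilege list is only one plausible choice for a client that reads from the stream and appends to it.

```console
# Grant read and append-only write access to streams matching the pattern.
# Privileges granted on a data stream also apply to its backing indices.
POST /_security/role/my-data-stream-user
{
  "indices": [
    {
      "names": [ "my-data-stream*" ],
      "privileges": [ "read", "create_doc", "view_index_metadata" ]
    }
  ]
}
```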
Common operations on a data stream:
- Add documents to a data stream
- Search a data stream
- Get statistics for a data stream
- Manually roll over a data stream
- Open closed backing indices
- Reindex with a data stream
- Update documents in a data stream by query
- Delete documents in a data stream by query
- Update or delete documents in a backing index
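Adding documents
POST /my-data-stream/_doc/
{
  "@timestamp": "2099-03-08T11:06:07.000Z",
  "user": { "id": "8a4f500d" },
  "message": "Login successful"
}
To index a document with a specific ID, you cannot use PUT /<target>/_doc/<_id>; use PUT /<target>/_create/<_id> instead.
In _bulk requests sent to a data stream, only the create action is supported.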
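As an illustration of the ID rule above, the request below creates a document with an explicit ID. The ID value is made up for the example, and the request fails if a document with that ID already exists in the write index.

```console
PUT /my-data-stream/_create/bfspvnIBr7VVZlfp2lqX
{
  "@timestamp": "2099-03-08T11:06:07.000Z",
  "user": { "id": "8a4f500d" },
  "message": "Login successful"
}
```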
Searching documents
Searching a data stream works the same way as searching an index.
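A minimal sketch of a search against the stream. The time range and sort order are arbitrary choices for illustration; the request is routed to all backing indices.

```console
GET /my-data-stream/_search
{
  "query": {
    "range": {
      "@timestamp": { "gte": "now-1d/d", "lt": "now/d" }
    }
  },
  "sort": [ { "@timestamp": "desc" } ]
}
```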
Getting statistics for a data stream
GET /_data_stream/my-data-stream/_stats?human=true
Manual rollover
POST /my-data-stream/_rollover/
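Opening closed backing indices
Closed backing indices cannot be searched, and documents in them cannot be updated or deleted.
To reopen a single closed backing index, use POST /.ds-my-data-stream-2099.03.07-000001/_open/ . To reopen all closed backing indices of a stream at once, use POST /my-data-stream/_open/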
Reindexing into a data stream
When the destination is a data stream, the reindex request must set op_type to create.
POST /_reindex
{
  "source": { "index": "archive" },
  "dest": { "index": "my-data-stream", "op_type": "create" }
}
Updating documents by query
POST /my-data-stream/_update_by_query
{
  "query": { "match": { "user.id": "l7gk7f82" } },
  "script": {
    "source": "ctx._source.user.id = params.new_id",
    "params": { "new_id": "XgdX0NoX" }
  }
}
Deleting documents by query
POST /my-data-stream/_delete_by_query
{
  "query": { "match": { "user.id": "vlb44hny" } }
}
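Updating or deleting documents in a backing index
First run a search to find the name of the backing index that holds the document and the document's ID, then send the update or delete request to that backing index.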
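The sequence can be sketched as follows. The backing index name, document ID, and the seq_no and primary_term values are hypothetical; in practice they come from the _index, _id, _seq_no, and _primary_term fields of the search hit.

```console
# 1. Find the document; seq_no_primary_term adds the values needed for a safe update
GET /my-data-stream/_search
{
  "seq_no_primary_term": true,
  "query": { "match": { "user.id": "yWIumJd7" } }
}

# 2. Replace the document in its backing index, guarded against concurrent changes
PUT /.ds-my-data-stream-2099.03.08-000003/_doc/bfspvnIBr7VVZlfp2lqX?if_seq_no=0&if_primary_term=1
{
  "@timestamp": "2099-03-08T11:06:07.000Z",
  "user": { "id": "8a4f500d" },
  "message": "Login successful"
}

# 3. Or delete it from the backing index
DELETE /.ds-my-data-stream-2099.03.08-000003/_doc/bfspvnIBr7VVZlfp2lqX
```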
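Changing mappings and settings
A data stream's backing indices take their mappings and settings from the stream's matching index template, so plan the mappings and settings carefully up front.
Changes you may need to make later include: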
- Add a new field mapping to a data stream
- Change an existing field mapping in a data stream
- Change a dynamic index setting for a data stream
- Change a static index setting for a data stream
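Adding a field
First add the field to the index template, so that backing indices created in the future include it. The requests in this section name the template my-data-stream-template; use the name of the template that actually matches your stream, and keep in mind that PUT replaces the whole template, so carry over any composed_of entries, settings, or mappings it already has.
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "mappings": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}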
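Then add the same field to the existing backing indices. The request below applies to every backing index of the stream, including the write index.
PUT /my-data-stream/_mapping
{
  "properties": {
    "message": { "type": "text" }
  }
}
To add the field to the write index only, set write_index_only=true:
PUT /my-data-stream/_mapping?write_index_only=true
{
  "properties": {
    "message": { "type": "text" }
  }
}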
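Changing an existing field
A field's type cannot be changed in place, but some of its other mapping parameters can.
First update the index template:
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "mappings": {
      "properties": {
        "host": {
          "properties": {
            "ip": { "type": "ip", "ignore_malformed": true }
          }
        }
      }
    }
  }
}
The request above sets "ignore_malformed": true on the host.ip field.
Then apply the same change to the existing backing indices with the mapping API, as when adding a field.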
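A sketch of that second step, applying the same ignore_malformed change to all existing backing indices through the stream name:

```console
PUT /my-data-stream/_mapping
{
  "properties": {
    "host": {
      "properties": {
        "ip": { "type": "ip", "ignore_malformed": true }
      }
    }
  }
}
```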
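Changing a dynamic index setting
The steps are the same: update the index template, then apply the setting to the existing backing indices with the update index settings API.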
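As a sketch of the two steps, the requests below change index.refresh_interval, picked here only as an example of a dynamic setting. The template name follows the other requests in this section.

```console
# 1. Future backing indices pick the setting up from the template
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "settings": { "index.refresh_interval": "30s" }
  }
}

# 2. Existing backing indices, including the write index, are updated directly
PUT /my-data-stream/_settings
{
  "index": { "refresh_interval": "30s" }
}
```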
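Changing a static index setting
Update the settings in the index template. Unlike dynamic settings, a static setting changed this way only applies to backing indices created afterwards. To have it take effect right away, roll the stream over manually so that a new write index is created with the new setting.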
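A sketch using index sorting, chosen here only as an example of settings that can be set solely at index creation. The template name follows the other requests in this section.

```console
# 1. Add the static settings to the template
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "settings": {
      "index.sort.field": [ "@timestamp" ],
      "index.sort.order": [ "desc" ]
    }
  }
}

# 2. Roll over so that a new write index is created from the updated template
POST /my-data-stream/_rollover/
```

Backing indices that already exist keep their old settings.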
Changing a field type with reindex
A data stream can be reindexed much like an index, which makes it possible to, for example, change @timestamp from the date type to date_nanos.
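One way to sketch this is to reindex into a second stream whose template carries the new mapping. The names new-data-stream-template and new-data-stream are hypothetical, and any settings or other mappings the original template provides would need to be repeated in the new one.

```console
# 1. Template for the new stream, mapping @timestamp as date_nanos
PUT /_index_template/new-data-stream-template
{
  "index_patterns": [ "new-data-stream*" ],
  "data_stream": { },
  "priority": 500,
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date_nanos" }
      }
    }
  }
}

# 2. Create the new stream explicitly
PUT /_data_stream/new-data-stream

# 3. Copy the documents; op_type must be create for a data stream destination
POST /_reindex
{
  "source": { "index": "my-data-stream" },
  "dest": { "index": "new-data-stream", "op_type": "create" }
}
```

After verifying the copy, clients can be pointed at the new stream and the old one deleted.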