curl:
http://keenwon.com/1393.html
During snapshot initialization, information about all previous snapshots is loaded into the memory, which means that in large repositories it may take several seconds (or even minutes) for this command to return even if the wait_for_completion parameter is set to false.
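This means that creating a snapshot can use a lot of memory (and, because of the computation involved, a lot of CPU as well). The reason is described in the next quoted passage: creating a snapshot requires analyzing the index files already stored in the repository.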
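As a rough illustration of the wait_for_completion parameter mentioned above, the sketch below builds (without sending) the HTTP request that starts a snapshot. The base URL http://localhost:9200 is an assumption; the repository and snapshot names are the ones used in the experiment later in these notes.

```python
import urllib.parse
import urllib.request

# Assumption: a local single-node cluster; adjust to your environment.
BASE_URL = "http://localhost:9200"


def create_snapshot_request(repo, snapshot, wait_for_completion=False):
    """Build (but do not send) the PUT request that starts a snapshot."""
    query = urllib.parse.urlencode(
        {"wait_for_completion": str(wait_for_completion).lower()}
    )
    path = "/_snapshot/{}/{}".format(
        urllib.parse.quote(repo, safe=""),
        urllib.parse.quote(snapshot, safe=""),
    )
    return urllib.request.Request(BASE_URL + path + "?" + query, method="PUT")


req = create_snapshot_request("testrepo", "snapshot_test5")
print(req.get_method(), req.full_url)
# To send it to a live cluster: urllib.request.urlopen(req)
```

Even with wait_for_completion=false, the call may take a while to return on a large repository, for the reason given in the quoted passage.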
The index snapshot process is incremental. In the process of making the index snapshot Elasticsearch analyses the list of the index files that are already stored in the repository and copies only files that were created or changed since the last snapshot. That allows multiple snapshots to be preserved in the repository in a compact form. Snapshotting process is executed in non-blocking fashion.
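A snapshot is essentially a copy of the indexes (plus some cluster information). "Incremental" means that only the index files added or changed since the last snapshot are copied.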
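The incremental idea can be sketched in a few lines. This is only a conceptual illustration, not Elasticsearch's actual implementation: it assumes each file is identified by a name and a checksum, and the file names and checksums are made up for the example.

```python
def files_to_copy(repo_files, shard_files):
    """Return the names of shard files that are new or changed.

    Both arguments map file name -> checksum. A file is copied when the
    repository has no entry for it or holds a different checksum.
    """
    return sorted(
        name
        for name, checksum in shard_files.items()
        if repo_files.get(name) != checksum
    )


# Hypothetical state: the repository already holds _0.cfs from an earlier snapshot.
already_in_repo = {"_0.cfs": "a1"}
current_shard = {"_0.cfs": "a1", "_1.cfs": "b2"}
print(files_to_copy(already_in_repo, current_shard))
```

Only _1.cfs needs copying here, which is why several snapshots can share one repository compactly.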
All indexing and searching operation can continue to be executed against the index that is being snapshotted. However, a snapshot represents the point-in-time view of the index at the moment when snapshot was created, so no records that were added to the index after the snapshot process was started will be present in the snapshot.
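Creating a snapshot does not affect indexing or search operations. A snapshot is a point-in-time view of the index, so documents added to the index after the snapshot process starts do not appear in it.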
The snapshot process starts immediately for the primary shards that has been started and are not relocating at the moment. Elasticsearch waits for relocation or initialization of shards to complete before snapshotting them.
Besides creating a copy of each index the snapshot process can also store global cluster metadata, which includes persistent cluster settings and templates. The transient settings and registered snapshot repositories are not stored as part of the snapshot.
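Besides copying the indexes, the snapshot process also stores the cluster metadata, including persistent cluster settings and templates. Transient settings and registered snapshot repositories are not stored as part of the snapshot.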
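Whether the global cluster metadata goes into a snapshot is controlled by the include_global_state field of the create-snapshot request body. A minimal sketch of building that body follows; the helper name snapshot_body is made up for illustration, and the index name is the one from the experiment below.

```python
import json


def snapshot_body(indices, include_global_state=True):
    """Build the JSON body for a create-snapshot request."""
    return json.dumps(
        {
            "indices": ",".join(indices),  # comma-separated index names
            "include_global_state": include_global_state,
        },
        sort_keys=True,
    )


# Snapshot only the test5 index and leave out the global cluster metadata.
body = snapshot_body(["test5"], include_global_state=False)
print(body)
```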
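I doubt the claim that registered snapshot repositories are not stored as part of the snapshot (the part highlighted in blue in the original post).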
[root@datanode3 elasticsearch]# cd test/
[root@datanode3 test]# ll
total 174092
-rw-r--r-- 1 root root    183796 03-04 21:48 123.tgz
-rw-r--r-- 1 root root 177894546 03-04 17:15 1.tgz
drwxr-xr-x 3 root root      4096 03-04 21:45 repo
[root@datanode3 test]# cd repo/
[root@datanode3 repo]# ll
total 16
-rw-r--r-- 1 root root   32 03-04 21:45 index
drwxr-xr-x 3 root root 4096 03-04 21:45 indices
-rw-r--r-- 1 root root  252 03-04 21:45 metadata-snapshot_test5
-rw-r--r-- 1 root root  202 03-04 21:45 snapshot-snapshot_test5
[root@datanode3 repo]# cat metadata-snapshot_test5
{"meta-data":{"version":435,"uuid":"IQXbkDFASIu_BlyerMYJcQ","templates":{},"repositories":{"my_backup":{"type":"fs","settings":{"compress":"true","location":"./mount/backups/my_backup"}},"testrepo":{"type":"fs","settings":{"location":"./test/repo"}}}}}
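As shown above, the metadata-snapshot_test5 file does contain snapshot repository information.
However, I ran the following experiment:
1. Deploy two Elasticsearch clusters (two single-node machines) with identical configuration. Machine A has an index (test5), a registered repository (testrepo), and one snapshot (snapshot_test5). Machine B has no data at all.
2. Copy all files under A's snapshot repository into the corresponding folder on B, then run a restore directly through the API. The restore fails with a missing-repository error, which suggests that the snapshot really does not store the repository registration (or at least not all of it).
3. On B, register a snapshot repository with the same name and path, copy all files under A's repository path (repo) into the same path (repo) on B, then run the restore through the API. The restore succeeds, so this approach can be used for data migration.
I do not know whether this procedure is safe and reliable, but at least it worked.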
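Step 3 of the experiment can be sketched as two HTTP requests against machine B. The sketch only builds the requests and does not send them. The address in TARGET is an assumption; the repository name, location, and snapshot name are the ones shown in the notes above. Copying the repository files from A to B still has to happen separately, before the restore.

```python
import json
import urllib.request

# Assumption: the HTTP address of cluster B; adjust to your environment.
TARGET = "http://localhost:9200"


def register_repo_request(name, location):
    """PUT request that registers a shared-filesystem ("fs") repository."""
    body = json.dumps({"type": "fs", "settings": {"location": location}})
    return urllib.request.Request(
        "{}/_snapshot/{}".format(TARGET, name),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def restore_request(repo, snapshot):
    """POST request that restores a snapshot from a registered repository."""
    return urllib.request.Request(
        "{}/_snapshot/{}/{}/_restore".format(TARGET, repo, snapshot),
        method="POST",
    )


steps = [
    register_repo_request("testrepo", "./test/repo"),
    restore_request("testrepo", "snapshot_test5"),
]
for step in steps:
    print(step.get_method(), step.full_url)
# To run a step against a live cluster: urllib.request.urlopen(step)
```

Registering the repository first is what step 2 of the experiment was missing: the files were present on disk, but B had no repository pointing at them.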
Only one snapshot process can be executed in the cluster at any time. While snapshot of a particular shard is being created this shard cannot be moved to another node, which can interfere with rebalancing process and allocation filtering. Elasticsearch will only be able to move a shard to another node (according to the current allocation filtering settings and rebalancing algorithm) once the snapshot is finished.
Only one snapshot process can run in a cluster at any given time.
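Since only one snapshot can run at a time, it can be useful to check for a running snapshot before starting a new one. The sketch below assumes the response of GET /_snapshot/_status has already been fetched and parsed, and that it is a JSON object whose "snapshots" list holds the currently running snapshots. That shape is an assumption to verify against your Elasticsearch version, and the sample responses are made up.

```python
def snapshot_in_progress(status_response):
    """Return True if the parsed _status response lists a running snapshot.

    Assumed shape: {"snapshots": [ ...one entry per running snapshot... ]}.
    """
    return bool(status_response.get("snapshots"))


# Hypothetical parsed responses, for illustration only.
idle = {"snapshots": []}
busy = {"snapshots": [{"snapshot": "snapshot_test5", "state": "STARTED"}]}
print(snapshot_in_progress(idle), snapshot_in_progress(busy))
```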
http://www.elasticsearch.org/blog/introducing-snapshot-restore/
However, while replication can protect a cluster from hardware failures, it doesn’t help when someone accidentally deletes an index. Anyone that relies on an Elasticsearch cluster needs to perform regular backups.
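Replicas and backups serve different purposes: replication guards against hardware failures such as a failed disk, while backups guard against accidentally deleted indexes.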
The snapshot/restore mechanism can be also used to synchronize data between a “hot” cluster and a remote, “cold” backup cluster in a different geographic region for fast disaster recovery.
The snapshot and restore mechanism can also be used to synchronize a "hot" cluster with a remote "cold" backup cluster.
Java API
http://amsterdam.luminis.eu/2014/12/15/creating-elasticsearch-backups-with-snapshotrestore/