1. Installing repository-hdfs
(1) Download the repository-hdfs plugin package from the Elasticsearch website
(the version matching elasticsearch-5.4.0 is repository-hdfs-5.4.0).
Download page:
https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/repository-hdfs.html
(2) Copy the package onto the cluster, change into the elasticsearch directory, and run the install:
sudo bin/elasticsearch-plugin install file:///home/huangyan/repository-hdfs-5.4.0.zip
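As a quick sanity check after the install (assuming the same elasticsearch directory), the plugin CLI can list what is installed:

```shell
# List installed plugins; repository-hdfs should appear in the output
bin/elasticsearch-plugin list
```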
2. Creating the repository on the source cluster
Create the repository:
curl -XPUT 'http://host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://host:8020",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'
If conf.dfs.client.read.shortcircuit is set to true, HDFS needs some extra configuration. Short-circuit reads cut down on communication and speed things up, but if you would rather avoid the extra setup, leaving it set to false is recommended.
View the repository you just created:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'
Delete the repository:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'
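After registering a repository, Elasticsearch also exposes a verification API that asks every node to confirm it can write to the repository; a quick check (host is a placeholder, as in the commands above):

```shell
# Verify that all nodes can access the HDFS repository
curl -XPOST 'http://host:9200/_snapshot/my_hdfs_repository/_verify?pretty'
```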
3. Backing up an index
This backs up the history_data_index-00002 index:
curl -XPUT 'http://10.45.157.*:9200/_snapshot/my_hdfs_repository/snapshot_2?wait_for_completion=false&pretty' -d '{
  "indices": "history_data_index-00002",
  "ignore_unavailable": true,
  "include_global_state": false
}'
Parameter notes:
wait_for_completion=true blocks until the backup finishes.
wait_for_completion=false returns immediately and the backup runs in the background; its progress can be checked with the following API:
curl -XGET '10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_status?pretty'
"ignore_unavailable": true skips indices that are missing or unavailable.
"include_global_state": false keeps the cluster's global state out of the snapshot.
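With wait_for_completion=false, the status call above can be wrapped in a small polling loop, for example (a sketch only; the host and snapshot name are the placeholders used in the examples above):

```shell
# Poll the snapshot status every 10 seconds until it leaves IN_PROGRESS
while curl -s 'http://host:9200/_snapshot/my_hdfs_repository/snapshot_2/_status' \
    | grep -q '"IN_PROGRESS"'; do
  echo "snapshot still running..."
  sleep 10
done
echo "snapshot finished"
```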
Note:
If the command above throws a "could not read repository data from index blob" exception, it is a Java security permission problem.
Fix the configuration as follows:
(1) Edit the plugin-security.policy file and add the following:
permission javax.security.auth.AuthPermission "getSubject";
permission javax.security.auth.AuthPermission "doAs";
permission javax.security.auth.AuthPermission "modifyPrivateCredentials";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.RuntimePermission "getClassLoader";
permission java.lang.RuntimePermission "shutdownHooks";
permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
permission java.security.AllPermission;
permission java.util.PropertyPermission "*", "read,write";
permission javax.security.auth.PrivateCredentialPermission "org.apache.hadoop.security.Credentials * \"*\"", "read";
(2) Also edit /usr/elk/elasticsearch/config/jvm.options once, adding the following line:
-Djava.security.policy=/usr/elk/elasticsearch/plugins/repository-hdfs/plugin-security.policy
(3) Restart ES and rerun the index backup above; it should now succeed.
View a snapshot's information:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3?pretty'
View information for all snapshots:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/_all?pretty'
Delete a snapshot:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_1_restore?pretty'
4. Restoring a snapshot
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": { "index.number_of_replicas": 1 },
  "ignore_index_settings": [ "index.refresh_interval" ]
}'
When restoring a snapshot, the number of primary shards cannot be changed (the only way to change the shard count is to reindex). The number of replicas, however, can be overridden via index.number_of_replicas.
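As a sketch of that reindex route (index names here are hypothetical): create a new index with the desired primary shard count first, then copy the documents into it with the _reindex API:

```shell
# Create the target index with a new primary shard count
curl -XPUT 'http://host:9200/history_data_index-resharded?pretty' -d '{
  "settings": { "index.number_of_shards": 10 }
}'
# Copy all documents from the old index into the new one
curl -XPOST 'http://host:9200/_reindex?pretty' -d '{
  "source": { "index": "history_data_index-00002" },
  "dest":   { "index": "history_data_index-resharded" }
}'
```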
If the cluster already contains an index with the same name as the one being restored, the "rename_pattern" and "rename_replacement" parameters can rename the index during the restore. The following command renames the person_list_data_index_yinchuan index to restored_index_yinchuan:
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3/_restore?pretty' -d '{
  "indices": "person_list_data_index_yinchuan",
  "ignore_unavailable": "true",
  "include_global_state": false,
  "rename_pattern": "person_list_data_index_(.+)",
  "rename_replacement": "restored_index_$1"
}'
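The rename is an ordinary capture-group substitution, so the same pattern can be sanity-checked locally with sed before running the restore (just an illustration, not part of the restore itself):

```shell
# The captured suffix "yinchuan" is substituted into the replacement
echo "person_list_data_index_yinchuan" \
  | sed -E 's/person_list_data_index_(.+)/restored_index_\1/'
# prints: restored_index_yinchuan
```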
Check the recovery status:
curl -XGET 'http://10.45.*:9200/_recovery/'
To restore a snapshot on a different cluster, first register the repository on the target cluster:
curl -XPUT 'http://target-host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://source-host:8020",
    "path": "/user/master/elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}'
Then restore:
curl -XPOST 'http://target-host:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": { "index.number_of_replicas": 1 },
  "ignore_index_settings": [ "index.refresh_interval" ]
}'
If the snapshot was created against an index alias, you can simply restore everything in it:
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_4/_restore?pretty'
5. Additional notes
Swapping jar versions:
Some jars under /usr/elk/elasticsearch/plugins/repository-hdfs need to be replaced with versions matching the HDFS cluster. In my case the plugin ships 2.7.1 jars but the cluster runs Hadoop 2.6.0, so the following must be swapped for the 2.6.0 versions (available under /usr/cdh/phoenix/lib): hadoop-annotations-2.7.1.jar, hadoop-auth-2.7.1.jar, hadoop-client-2.7.1.jar,
hadoop-common-2.7.1.jar, hadoop-hdfs-2.7.1.jar.
In addition, htrace-core-3.1.0-incubating.jar must be replaced with htrace-core4-4.0.1-incubating.jar before ES will restart successfully.
List all available jars:
cd /opt/cloudera/parcels/CDH/jars/
ls
Copy htrace-core4-4.0.1-incubating.jar into /usr/elk/elasticsearch/plugins/repository-hdfs/:
cp htrace-core4-4.0.1-incubating.jar /usr/elk/elasticsearch/plugins/repository-hdfs/
Inspecting HDFS paths:
List the directories under the root: sudo -u hdfs hadoop fs -ls /
List the directories under /user: sudo -u hdfs hadoop fs -ls /user
When creating the repository, if path is set to "path": "elasticsearch/repositories/my_hdfs_repository",
the data is actually stored at /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository, because a relative path resolves under the elasticsearch user's HDFS home directory.
List the snapshots in the repository: sudo -u hdfs hadoop fs -ls /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository
6. Test results
1. Backing up 532,391 documents, 1.52 GB (3.03 GB), took 208,541 ms, about three and a half minutes.
Restoring the same 532,391 documents took roughly 6.5 s.
2. Backing up 1,578,227 documents, 9.09 GB (18.1 GB), took 1,510,737 ms, about 25 minutes.
Restoring the 1,578,227 documents took roughly 105 s.
Overall, snapshot backup is not particularly fast, so using reindex to migrate indices directly is recommended. Note, however, that ES 5.4.0 does not support cross-cluster reindex.