ElasticSearch Snapshot Backup and Restore


1. Installing repository-hdfs

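1) Download the repository-hdfs plugin package from the Elasticsearch website.

The plugin version must match the Elasticsearch version exactly: for elasticsearch-5.4.0 the matching package is repository-hdfs-5.4.0.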

Download page:

https://www.elastic.co/guide/en/elasticsearch/plugins/5.4/repository-hdfs.html

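2) Copy the zip file to each node of the cluster (the plugin must be installed on every node) and change into the Elasticsearch directory.

Run the installation (restart each node afterwards so the plugin is loaded):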

sudo bin/elasticsearch-plugin install file:///home/huangyan/repository-hdfs-5.4.0.zip
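Because the installer rejects a plugin whose version differs from the node's Elasticsearch version, it can help to check the zip name before installing. This is a minimal sketch; the function name check_plugin_zip is hypothetical, and it assumes the zip keeps the official file name repository-hdfs-<version>.zip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the zip file name carries the same
# version as the target Elasticsearch (e.g. 5.4.0 -> repository-hdfs-5.4.0.zip).
check_plugin_zip() {
  local es_version="$1" zip_path="$2"
  local expected="repository-hdfs-${es_version}.zip"
  if [ "$(basename "$zip_path")" = "$expected" ]; then
    return 0
  fi
  echo "expected ${expected}, got $(basename "$zip_path")" >&2
  return 1
}

# Usage (from the Elasticsearch directory on every node):
#   zip=/home/huangyan/repository-hdfs-5.4.0.zip
#   check_plugin_zip 5.4.0 "$zip" && sudo bin/elasticsearch-plugin install "file://${zip}"
```

With an absolute path, "file://${zip}" expands to the same file:/// URL used in the install command.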

2. Creating the repository on the source cluster

On the source cluster, register the repository:

curl -XPUT 'http://host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
    "type": "hdfs",
    "settings": {
        "uri": "hdfs://host:8020",
        "path": "elasticsearch/repositories/my_hdfs_repository",
        "conf.dfs.client.read.shortcircuit": "false"
    }
}'

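conf.dfs.client.read.shortcircuit controls HDFS short-circuit local reads. Setting it to true lets the client read block files directly instead of going through the DataNode, which reduces communication and speeds things up, but it requires extra configuration on the HDFS side. If you don't want to deal with that, set it to false.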
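To keep the uri, path and short-circuit switch in one place, the repository body can be generated by a small function. This is a sketch under assumptions: hdfs_repo_body and register_repo are hypothetical names, and the second curl call uses the verify-repository API (_verify), which asks every node to confirm that it can access the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for an HDFS snapshot repository.
#   $1 namenode URI   $2 repository path   $3 short-circuit reads (default false)
hdfs_repo_body() {
  local uri="$1" path="$2" shortcircuit="${3:-false}"
  printf '{"type":"hdfs","settings":{"uri":"%s","path":"%s","conf.dfs.client.read.shortcircuit":"%s"}}\n' \
    "$uri" "$path" "$shortcircuit"
}

# Hypothetical helper: register the repository, then verify it on all nodes.
#   $1 host:port   $2 repository name   remaining args go to hdfs_repo_body
register_repo() {
  local es="$1" repo="$2"
  shift 2
  curl -XPUT "http://${es}/_snapshot/${repo}?pretty" \
       -H 'Content-Type: application/json' -d "$(hdfs_repo_body "$@")"
  curl -XPOST "http://${es}/_snapshot/${repo}/_verify?pretty"
}

# Usage:
#   register_repo host:9200 my_hdfs_repository hdfs://host:8020 elasticsearch/repositories/my_hdfs_repository
```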

View the repository you just created:

curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'

 

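Delete the repository (this only removes the repository registration from the cluster; the snapshot files in HDFS are left in place):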
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository?pretty'

 

3. Backing up an index

Here the history_data_index-00002 index is backed up:

curl -XPUT 'http://10.45.157.*:9200/_snapshot/my_hdfs_repository/snapshot_2?wait_for_completion=false&pretty' -d '{
  "indices": "history_data_index-00002",
  "ignore_unavailable": true,
  "include_global_state": false
}'
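Since this command uses wait_for_completion=false, it returns before the snapshot is done; a script can poll until it finishes. A minimal sketch, assuming only grep and sed are available (no JSON tool); the function names are hypothetical. It polls GET /_snapshot/<repo>/<snapshot>, whose state field reads IN_PROGRESS while the snapshot runs and a final value such as SUCCESS, FAILED or PARTIAL afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the JSON of GET /_snapshot/<repo>/<snapshot> on
# stdin and print the first "state" value (IN_PROGRESS, SUCCESS, FAILED, ...).
snapshot_state() {
  grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' | head -n 1 |
    sed -E 's/.*"([A-Z_]*)"$/\1/'
}

# Hypothetical helper: poll every 10 seconds until the snapshot is no longer
# IN_PROGRESS, then print its final state (empty if the request failed).
#   $1 host:port   $2 repository   $3 snapshot
wait_for_snapshot() {
  local state
  while :; do
    state="$(curl -s "http://$1/_snapshot/$2/$3" | snapshot_state)"
    [ "$state" != "IN_PROGRESS" ] && break
    sleep 10
  done
  printf '%s\n' "$state"
}

# Usage:
#   wait_for_snapshot host:9200 my_hdfs_repository snapshot_2
```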

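Parameters:

wait_for_completion=true: the request blocks until the snapshot has finished.

wait_for_completion=false: the request returns immediately and the snapshot runs in the background. Its progress can be checked with:

curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_status?pretty'

"ignore_unavailable": true: indices listed in "indices" that do not exist are skipped instead of making the snapshot fail.

"include_global_state": false: the cluster's global state (persistent cluster settings, index templates, etc.) is not stored in the snapshot.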

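Note:

If the command above fails with a "could not read repository data from index blob" exception, the cause is Java security permissions, and the configuration has to be changed as follows:

1) Edit plugin-security.policy in the repository-hdfs plugin directory and add the following permissions to its grant block: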

  permission javax.security.auth.AuthPermission "getSubject";
  permission javax.security.auth.AuthPermission "doAs";
  permission javax.security.auth.AuthPermission "modifyPrivateCredentials";
  permission java.lang.RuntimePermission "accessDeclaredMembers";
  permission java.lang.RuntimePermission "getClassLoader";
  permission java.lang.RuntimePermission "shutdownHooks";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
  permission java.security.AllPermission;
  permission java.util.PropertyPermission "*", "read,write";
  permission javax.security.auth.PrivateCredentialPermission "org.apache.hadoop.security.Credentials * \"*\"", "read";

java.security.AllPermission already implies every other permission in this list, so the other entries are redundant while it is present. Granting it weakens the Java security sandbox, so consider removing it once the narrower permissions prove sufficient.

2) Also edit /usr/elk/elasticsearch/config/jvm.options by hand and add the following line:

-Djava.security.policy=/usr/elk/elasticsearch/plugins/repository-hdfs/plugin-security.policy

 

3) Restart Elasticsearch and run the backup command above again; it should now succeed.
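Step 2) can be made safe to rerun by appending the -Djava.security.policy line only when it is not already there. A sketch; append_once is a hypothetical helper name, and the paths in the usage comment are the ones used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is not
# already present, so rerunning the setup does not duplicate JVM flags.
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as a user allowed to write jvm.options):
#   append_once /usr/elk/elasticsearch/config/jvm.options \
#     -Djava.security.policy=/usr/elk/elasticsearch/plugins/repository-hdfs/plugin-security.policy
```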

View a snapshot:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3?pretty'
View all snapshots in the repository:
curl -XGET 'http://10.45.*:9200/_snapshot/my_hdfs_repository/_all?pretty'

Delete a snapshot:
curl -XDELETE 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_1_restore?pretty'
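The _all response is verbose; to get just the snapshot names (for example to choose which ones to delete), the output can be filtered. A sketch assuming each "snapshot" key and its value sit on one line, as in both compact and ?pretty output; list_snapshot_names is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the JSON of GET /_snapshot/<repo>/_all on stdin
# and print one snapshot name per line.
list_snapshot_names() {
  grep -o '"snapshot"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed -E 's/.*:[[:space:]]*"([^"]*)"$/\1/'
}

# Usage: list the names, then delete chosen snapshots one request at a time.
#   curl -s 'http://host:9200/_snapshot/my_hdfs_repository/_all?pretty' | list_snapshot_names
#   for s in snapshot_1 snapshot_2; do
#     curl -XDELETE "http://host:9200/_snapshot/my_hdfs_repository/${s}?pretty"
#   done
```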

4. Restoring a snapshot

curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'

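The number of shards cannot be changed during a restore (changing it requires a reindex), but the number of replicas can be set anew via index.number_of_replicas. Settings listed in ignore_index_settings (here index.refresh_interval) are not taken from the snapshot, so the restored index uses the default value instead.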

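A restore cannot overwrite an open index. If the cluster already has an index with the same name as the one being restored, either close it first (it must then have the same number of shards as the snapshotted index) or rename the restored index with the "rename_pattern" and "rename_replacement" parameters. The following command restores person_list_data_index_yinchuan under the name restored_index_yinchuan: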

curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_3/_restore?pretty' -d '{
  "indices": "person_list_data_index_yinchuan",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": "person_list_data_index_(.+)",
  "rename_replacement": "restored_index_$1"
}'
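Before a restore with renaming, the resulting index name can be previewed locally. Elasticsearch applies rename_pattern as a Java regular expression and expands $1, $2, ... in rename_replacement; the sketch below (preview_rename is a hypothetical name) only approximates that with sed -E, so it is reliable just for simple patterns that mean the same in POSIX extended regex and contain no '/'.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate how rename_pattern / rename_replacement
# would rename an index. $N in the replacement is converted to sed's \N.
preview_rename() {
  local name="$1" pattern="$2" replacement="$3" sed_repl
  sed_repl="$(printf '%s' "$replacement" | sed -E 's/\$([0-9])/\\\1/g')"
  printf '%s\n' "$name" | sed -E "s/${pattern}/${sed_repl}/g"
}

# Usage (prints restored_index_yinchuan):
#   preview_rename person_list_data_index_yinchuan 'person_list_data_index_(.+)' 'restored_index_$1'
```

An index name that does not match the pattern comes back unchanged.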

Check the recovery status:
curl -XGET 'http://10.45.*:9200/_recovery/'

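To restore the snapshot on a different cluster, first register a repository on the target cluster that points to the same HDFS directory the source cluster wrote to: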

curl -XPUT 'http://target-host:9200/_snapshot/my_hdfs_repository?pretty' -d '{
    "type": "hdfs",
    "settings": {
        "uri": "hdfs://source-hdfs-host:8020",
        "path": "/user/master/elasticsearch/repositories/my_hdfs_repository",
        "conf.dfs.client.read.shortcircuit": "false"
    }
}'

Then restore:

curl -XPOST 'http://target-host:9200/_snapshot/my_hdfs_repository/snapshot_2/_restore?pretty' -d '{
  "indices": "history_data_index-00002",
  "index_settings": {
    "index.number_of_replicas": 1
  },
  "ignore_index_settings": [
    "index.refresh_interval"
  ]
}'
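The same restore body is used on both clusters, so it can come from one helper and be sent to whichever cluster is the target. A sketch; restore_body and restore_snapshot are hypothetical names, and the replica count defaults to 1 as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the restore body for the given indices,
# overriding the replica count and ignoring index.refresh_interval.
#   $1 indices   $2 replica count (default 1)
restore_body() {
  local indices="$1" replicas="${2:-1}"
  printf '{"indices":"%s","index_settings":{"index.number_of_replicas":%d},"ignore_index_settings":["index.refresh_interval"]}\n' \
    "$indices" "$replicas"
}

# Hypothetical helper: send the restore request.
#   $1 host:port   $2 repository   $3 snapshot   $4 indices   $5 replica count
restore_snapshot() {
  curl -XPOST "http://$1/_snapshot/$2/$3/_restore?pretty" \
       -H 'Content-Type: application/json' -d "$(restore_body "$4" "$5")"
}

# Usage:
#   restore_snapshot target-host:9200 my_hdfs_repository snapshot_2 history_data_index-00002 1
```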

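If the snapshot was created using an index alias, restore it without a request body, which restores every index in the snapshot: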
curl -XPOST 'http://10.45.*:9200/_snapshot/my_hdfs_repository/snapshot_4/_restore?pretty'

5. Supplementary notes

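Replacing jars:
Some jars under /usr/elk/elasticsearch/plugins/repository-hdfs must be replaced with the Hadoop version used by the HDFS cluster. In my case the plugin ships 2.7.1 and they had to be changed to 2.6.0.
The 2.6.0 jars are available under /usr/cdh/phoenix/lib. The jars to replace are: hadoop-annotations-2.7.1.jar, hadoop-auth-2.7.1.jar, hadoop-client-2.7.1.jar,
hadoop-common-2.7.1.jar, hadoop-hdfs-2.7.1.jar
htrace-core-3.1.0-incubating.jar also has to be replaced by htrace-core4-4.0.1-incubating.jar; otherwise Elasticsearch fails to restart.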

List all the jars shipped with CDH:
cd /opt/cloudera/parcels/CDH/jars/
ls
Copy htrace-core4-4.0.1-incubating.jar into /usr/elk/elasticsearch/plugins/repository-hdfs/ (in place of htrace-core-3.1.0-incubating.jar):
cp htrace-core4-4.0.1-incubating.jar /usr/elk/elasticsearch/plugins/repository-hdfs/
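The jar replacement can be scripted so the old jars are kept as a backup rather than deleted. This is a sketch under assumptions: swap_jars is a hypothetical helper; it expects exactly one <name>-<new version>*.jar per name in the source directory (the glob also tolerates vendor suffixes), and the backup directory must lie outside plugins/, because Elasticsearch treats every directory there as a plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap the Hadoop jars bundled with repository-hdfs for
# the versions used by the HDFS cluster.
#   $1 plugin dir   $2 dir with the new jars   $3 backup dir (outside plugins/)
#   $4 old version  $5 new version             $6... jar base names
swap_jars() {
  local plugin_dir="$1" src_dir="$2" backup_dir="$3" old="$4" new="$5"
  shift 5
  local base
  local -a matches
  mkdir -p "$backup_dir" || return 1
  for base in "$@"; do
    # Assumption: the new jar is named <base>-<new>*.jar (e.g. a -cdh suffix).
    matches=( "$src_dir/${base}-${new}"*.jar )
    if [ "${#matches[@]}" -ne 1 ] || [ ! -e "${matches[0]}" ]; then
      echo "expected exactly one ${base}-${new}*.jar in ${src_dir}" >&2
      return 1
    fi
    # Move the old jar aside (if present) instead of deleting it.
    if [ -e "$plugin_dir/${base}-${old}.jar" ]; then
      mv "$plugin_dir/${base}-${old}.jar" "$backup_dir/" || return 1
    fi
    cp "${matches[0]}" "$plugin_dir/" || return 1
  done
}

# Usage with the versions and paths from this article:
#   swap_jars /usr/elk/elasticsearch/plugins/repository-hdfs /usr/cdh/phoenix/lib \
#     /tmp/repository-hdfs-jars.bak 2.7.1 2.6.0 \
#     hadoop-annotations hadoop-auth hadoop-client hadoop-common hadoop-hdfs
#   mv /usr/elk/elasticsearch/plugins/repository-hdfs/htrace-core-3.1.0-incubating.jar /tmp/repository-hdfs-jars.bak/
#   cp /opt/cloudera/parcels/CDH/jars/htrace-core4-4.0.1-incubating.jar /usr/elk/elasticsearch/plugins/repository-hdfs/
```

Requiring exactly one match makes the script stop, rather than guess, if a directory also holds something like a -tests jar for the same version.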

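Browsing the repository in HDFS:
List the subdirectories of the root directory: sudo -u hdfs hadoop fs -ls /
List the subdirectories of /user: sudo -u hdfs hadoop fs -ls /user
If the repository is created with "path": "elasticsearch/repositories/my_hdfs_repository" (a relative path),
the path is resolved under the HDFS home directory of the user Elasticsearch runs as (here elasticsearch), so the data is stored in /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository
List the snapshots in the repository: sudo -u hdfs hadoop fs -ls /user/elasticsearch/elasticsearch/repositories/my_hdfs_repository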
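The path rule above can be written as a small helper, which is handy when checking that a repository on another cluster points at the same HDFS directory. A sketch assuming relative paths resolve under /user/<user> as observed here; resolve_hdfs_path is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show where a repository "path" ends up in HDFS.
# Relative paths go under /user/<user>; absolute paths are kept as they are.
resolve_hdfs_path() {
  local user="$1" path="$2"
  case "$path" in
    /*) printf '%s\n' "$path" ;;
    *)  printf '/user/%s/%s\n' "$user" "$path" ;;
  esac
}

# Usage:
#   sudo -u hdfs hadoop fs -ls "$(resolve_hdfs_path elasticsearch elasticsearch/repositories/my_hdfs_repository)"
```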

6. Tests

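1) Backing up 532,391 documents, 1.52G (3.03G), took 208,541 ms in total, roughly 3.5 minutes.

    Restoring the 532,391 documents took about 6.5 s.

2) Backing up 1,578,227 documents, 9.09G (18.1G), took 1,510,737 ms in total, roughly 25 minutes.

    Restoring the 1,578,227 documents took about 105 s.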
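To compare the two runs, the figures above can be turned into throughput. A worked calculation, taking the first size figure of each test (1.52G, 9.09G) as the data volume, 1 GB = 1024 MB, and the approximate restore times as given; throughput_mb_s is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: throughput in MB/s from a size in GB and a time in ms,
# taking 1 GB = 1024 MB and printing two decimals.
throughput_mb_s() {
  awk -v gb="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", gb * 1024 / (ms / 1000) }'
}

throughput_mb_s 1.52 208541    # backup, test 1: prints 7.46
throughput_mb_s 9.09 1510737   # backup, test 2: prints 6.16
throughput_mb_s 1.52 6500      # restore, test 1 (about 6.5 s): prints 239.46
throughput_mb_s 9.09 105000    # restore, test 2 (about 105 s): prints 88.65
```

Under these assumptions the backup stayed at roughly 6 to 7.5 MB/s in both tests.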

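Overall, snapshot backup is not very fast, so for migrating indices I recommend using reindex directly. Note that reindexing across clusters on 5.4.0 means reindex from remote, which only works if the source cluster's host is listed in reindex.remote.whitelist in elasticsearch.yml on the target cluster.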

 

