EsRejectedExecutionException during Elasticsearch bulk writes

Reposted article · 2020-09-13 12:05 · Elasticsearch

The Alibaba Cloud ARMS console showed the production search service's bulk-write method throwing a large number of exceptions:

 [DUBBO] Got unchecked and undeclared exception which called by 192.168.x.x. service: xxx.IProductSearchService, method: saveProductEntitys, 
exception: org.springframework.data.elasticsearch.ElasticsearchException: Bulk indexing has failures. Use ElasticsearchException.getFailedDocuments() for detailed messages 
[{346833406144942081PY331010069=RemoteTransportException[[node-100][192.168.x.x:9300][indices:data/write/bulk[s]]]; nested: EsRejectedExecutionException[rejected 
execution of org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase$1@c1f6e62 on EsThreadPoolExecutor[bulk, queue capacity = 200, 
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@66f7b4ec[Running, pool size = 16, active threads = 16, queued tasks = 200, completed tasks = 332599]]];}], dubbo 
version: search, current host: 192.168.x.x

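The likely cause is a change made to the product service just before the release: the method that updates the incremental product index was switched to an asynchronous call with no return value. After building its data, that method calls the search service's interface for bulk writes to ES.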

<dubbo:reference id="productIdxService" interface="com.xxx.ProductIdxService" lazy="true" timeout="10000">
    <!-- asynchronous call with no return value -->
    <dubbo:method name="buildIncrementalProductIndex" async="true" return="false" />
</dubbo:reference>

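Because the call is asynchronous, the caller returns almost immediately. That can raise concurrency on the provider side and, in turn, the concurrency of the bulk-write-to-ES interface.
The EsRejectedExecutionException in the log points the same way: the ES thread pool (EsThreadPoolExecutor) for bulk reports queue capacity = 200, pool size = 16, active threads = 16, queued tasks = 200.
This behaves like a fixed thread pool in JUC (java.util.concurrent): all 16 threads were busy and the 200-slot queue was full, so ES rejected the new task.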
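The JUC analogy can be reproduced in a few lines. The following is a minimal sketch using a plain java.util.concurrent.ThreadPoolExecutor, not ES's own EsThreadPoolExecutor; the class and method names are made up for illustration. It keeps every worker busy, fills the bounded queue, and counts how many further submissions are rejected.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {

    /**
     * Submits `tasks` blocking tasks to a fixed pool of `threads` workers backed by
     * a queue with `queueCapacity` slots, and returns how many were rejected.
     */
    static int countRejections(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default handler is AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // hold the worker until the test is over
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and the queue is full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Same shape as the log: 16 threads plus 200 queue slots accept 216 tasks.
        System.out.println(countRejections(16, 200, 220)); // prints 4
    }
}
```

With 16 threads and 200 queue slots, 216 tasks are accepted and the rest are rejected, which matches the state shown in the exception message (active threads = 16, queued tasks = 200).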

The ES version running in production is very old: 2.2.

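The thread pools can be inspected and tuned through the HTTP interface that ES provides:

View thread pool status
GET <ES address>/_cat/thread_pool?v
// Note: ?v prints the column headers; h= selects which columns to show
// Example: /_cat/thread_pool?v&h=host,bulk.active,bulk.rejected,bulk.queue,bulk.queueSize,bulk.size,bulk.min,bulk.max,search.active,search.rejected,search.queue,search.queueSize
View each node's thread pool configuration
GET <ES address>/_nodes/thread_pool/
View cluster settings
GET <ES address>/_cluster/settings
Update cluster settings
PUT <ES address>/_cluster/settings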

{
  "transient": {
    "threadpool.bulk.type": "fixed",
    "threadpool.bulk.queue_size": 1000,
    "threadpool.bulk.size": 16,
    "threadpool.bulk.min": 16,
    "threadpool.bulk.max": 16
  }
}

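// This raises the bulk queue size to 1000. size, min, and max can be set as well, but in testing the cluster settings showed the new values while the thread pool still reported the old values for those three; only the queue size actually changed.
Because this is a production cluster, and considering that the thread count of 16 is a protective limit in ES, as well as the machine's configuration and load, only the queue size was changed.
After the queue size was raised to 1000, the number of EsRejectedExecutionException errors dropped sharply, although the exception still appears when concurrency is very high.
The queue capacity in the exception message now reads 1000, which confirms that the setting took effect.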
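To keep an eye on rejections per node after such a change, the output of /_cat/thread_pool?v can be checked programmatically. The following is a minimal sketch, assuming the response is the plain whitespace-separated table with a header row that ?v produces and that it includes the host and bulk.rejected columns listed earlier. The class name, method names, and sample rows are made up for illustration; the sample is not real cluster output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CatThreadPoolParser {

    /** Parses a _cat table (header row first) into one column-to-value map per row. */
    static List<Map<String, String>> parse(String catOutput) {
        List<Map<String, String>> rows = new ArrayList<>();
        String[] lines = catOutput.trim().split("\\R");
        String[] header = lines[0].trim().split("\\s+");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].trim().split("\\s+");
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length && c < cells.length; c++) {
                row.put(header[c], cells[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    /** Returns the hosts whose bulk.rejected counter is greater than zero. */
    static List<String> hostsWithBulkRejections(String catOutput) {
        List<String> hosts = new ArrayList<>();
        for (Map<String, String> row : parse(catOutput)) {
            String rejected = row.get("bulk.rejected");
            if (rejected != null && Long.parseLong(rejected) > 0) {
                hosts.add(row.get("host"));
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hypothetical sample, shaped like ?v&h=host,bulk.active,bulk.queue,bulk.rejected
        String sample = "host bulk.active bulk.queue bulk.rejected\n"
                + "10.0.0.1 16 200 37\n"
                + "10.0.0.2 3 0 0\n";
        System.out.println(hostsWithBulkRejections(sample)); // prints [10.0.0.1]
    }
}
```

The counter is cumulative, so comparing two readings taken some time apart shows whether rejections are still occurring.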

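This is a stopgap. A small number of failed writes is acceptable from a business standpoint: the impact is minor and hard to even notice, and the old ES cluster is already under high load.
The plan is to upgrade ES to version 7.4 on Alibaba Cloud and to upgrade the old search service along with it, moving to a newer spring-data-elasticsearch or to rest-high-level-client.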
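Since the remaining rejections show up only under high concurrency, another stopgap would be to cap how many bulk calls are in flight on the calling side. This was not part of the fix described here. The following is a minimal sketch of that idea using a java.util.concurrent.Semaphore; the BulkThrottle class, its methods, and the limit of 8 are hypothetical, and the Supplier stands in for whatever actually performs the bulk write.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class BulkThrottle {
    private final Semaphore permits;

    BulkThrottle(int maxConcurrentBulks) {
        this.permits = new Semaphore(maxConcurrentBulks);
    }

    /**
     * Runs bulkCall once a permit is available. If none frees up within the
     * timeout, throws IllegalStateException so the caller can retry later.
     */
    <T> T call(Supplier<T> bulkCall, long timeout, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a bulk permit", e);
        }
        if (!acquired) {
            throw new IllegalStateException("too many bulk requests in flight");
        }
        try {
            return bulkCall.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        BulkThrottle throttle = new BulkThrottle(8); // hypothetical limit
        // The lambda is a placeholder for the real bulk-write call.
        String result = throttle.call(() -> "bulk ok", 100, TimeUnit.MILLISECONDS);
        System.out.println(result);
    }
}
```

The limit would need to be chosen with the number of calling instances in mind, because each instance holds its own semaphore while the ES bulk queue is shared by all of them.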

References:
Viewing ES thread pools https://blog.csdn.net/yun0000000/article/details/106327838/
ES write error, EsRejectedExecutionException https://elasticsearch.cn/question/4647
Querying thread pool status with /_cat/thread_pool https://www.letianbiji.com/elasticsearch/es7-cat-thread-pool.html
