ES distributed search

For the diagram, see the end of https://blog.csdn.net/thomas0yang/article/details/78572596?utm_source=copy

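I am somewhat skeptical of what the article says: if the master did all of the merging, its load would inevitably be very high whenever query traffic is heavy. My own feeling is that the client node receives the query, looks up the shards of each index in the master node's metadata, sends the search request to all of those shards, merges the per-shard results itself, and finally returns the response.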

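The whole Elasticsearch query follows the scatter/gather idea, which is also the usual pattern for distributed queries, namely:
1. The master server (configured with node.master: true) receives the client request, looks up the corresponding index and shards, and dispatches the data request to the corresponding node servers (node.data: true).
2. The node side runs the data query and returns its results to the master side.
3. The master side merges the query results.
The flow above is a logical flow. In the actual query process ES distinguishes different search types, QUERY_THEN_FETCH and QUERY_AND_FETCH (deprecated), each with its own query actions.
Because QUERY_AND_FETCH was deprecated in 5.x (QUERY_THEN_FETCH is used instead), only the QUERY_THEN_FETCH flow is described here.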
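
The scatter/gather flow above can be sketched as a small in-memory simulation. This is only a toy model under stated assumptions, not Elasticsearch code: the class ScatterGatherSketch, the Hit type, and the methods queryShard, merge, and search are hypothetical names, and each shard is just a list of pre-scored hits. It shows the query phase returning a local top-k per shard, and the receiving node merging those lists into a global top-k whose document ids would then be fetched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of a query-then-fetch style flow; not Elasticsearch code. */
class ScatterGatherSketch {

    /** What the query phase returns per hit: shard number, doc id, and score only. */
    static final class Hit {
        final int shard;
        final String docId;
        final double score;

        Hit(int shard, String docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Sorts a copy of the hits by descending score and keeps at most k of them. */
    private static List<Hit> topK(List<Hit> hits, int k) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    /** Query phase on one shard: return that shard's local top-k. */
    static List<Hit> queryShard(List<Hit> shardHits, int k) {
        return topK(shardHits, k);
    }

    /** Gather step on the receiving node: merge per-shard lists into a global top-k. */
    static List<Hit> merge(List<List<Hit>> perShard, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShard) {
            all.addAll(hits);
        }
        return topK(all, k);
    }

    /** Scatter to every shard, gather, and return the doc ids a fetch phase would load. */
    static List<String> search(List<List<Hit>> shards, int k) {
        List<List<Hit>> perShard = new ArrayList<>();
        for (List<Hit> shard : shards) {
            perShard.add(queryShard(shard, k));
        }
        List<String> ids = new ArrayList<>();
        for (Hit h : merge(perShard, k)) {
            ids.add(h.docId);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
                Arrays.asList(new Hit(0, "a", 1.0), new Hit(0, "b", 3.5)),
                Arrays.asList(new Hit(1, "c", 2.2), new Hit(1, "d", 0.4)));
        System.out.println(search(shards, 2));
    }
}
```

In this model the merge cost falls on whichever node runs search, which is why it matters whether that node is the master or simply the node that received the request.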

A clearer introduction is at: https://blog.csdn.net/qqqq0199181/article/details/82702557

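Master server side
1. Receive the query request and run the read-block check. Build the corresponding ShardsIterator from the index in the request; shardIterators is the combination of localShardsIterator and remoteShardIterators and is used to iterate over all shards. Generating the shard iterators involves some query strategies that control the order of preference and the conditions under which each shard is queried.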

// Excerpt: choose a shard iterator according to the parsed preference type.
preferenceType = Preference.parse(preference);
switch (preferenceType) {
    case PREFER_NODES:
        // The node ids follow the type prefix, separated by commas.
        final Set<String> nodesIds =
            Arrays.stream(
                preference.substring(Preference.PREFER_NODES.type().length() + 1).split(",")
            ).collect(Collectors.toSet());
        return indexShard.preferNodeActiveInitializingShardsIt(nodesIds);
    case LOCAL:
        return indexShard.preferNodeActiveInitializingShardsIt(Collections.singleton(localNodeId));
    case PRIMARY:
        return indexShard.primaryActiveInitializingShardIt();
    case REPLICA:
        return indexShard.replicaActiveInitializingShardIt();
    case PRIMARY_FIRST:
        return indexShard.primaryFirstActiveInitializingShardsIt();
    case REPLICA_FIRST:
        return indexShard.replicaFirstActiveInitializingShardsIt();
    case ONLY_LOCAL:
        return indexShard.onlyNodeActiveInitializingShardsIt(localNodeId);
    case ONLY_NODES:
        // The node attributes follow the type prefix, separated by commas.
        String nodeAttributes = preference.substring(Preference.ONLY_NODES.type().length() + 1);
        return indexShard.onlyNodeSelectorActiveInitializingShardsIt(nodeAttributes.split(","), nodes);
    default:
        throw new IllegalArgumentException("unknown preference [" + preferenceType + "]");
}
From: https://blog.csdn.net/thomas0yang/article/details/78572596?utm_source=copy
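
The excerpt above depends on Elasticsearch internals, so it cannot be run on its own. As a rough, self-contained illustration of the parsing step only, the sketch below splits a preference string into a type and its comma-separated arguments. The class PreferenceSketch, its Type enum, and the methods parseType and parseArgs are hypothetical names, and the choice to treat any string that does not start with an underscore as a custom value is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Simplified sketch of splitting a preference string; not the Elasticsearch parser. */
class PreferenceSketch {

    /** Only the preference types discussed in these notes are modelled. */
    enum Type {
        PRIMARY("_primary"),
        PRIMARY_FIRST("_primary_first"),
        LOCAL("_local"),
        ONLY_NODE("_only_node"),
        CUSTOM("");

        final String prefix;

        Type(String prefix) {
            this.prefix = prefix;
        }
    }

    /** The part before ':' selects the type; unknown '_' types are rejected. */
    static Type parseType(String preference) {
        int colon = preference.indexOf(':');
        String head = colon < 0 ? preference : preference.substring(0, colon);
        if (!head.startsWith("_")) {
            return Type.CUSTOM; // assumption of this sketch
        }
        for (Type t : Type.values()) {
            if (t != Type.CUSTOM && t.prefix.equals(head)) {
                return t;
            }
        }
        throw new IllegalArgumentException("unknown preference [" + preference + "]");
    }

    /** Everything after ':' is a comma-separated argument list (for example node ids). */
    static Set<String> parseArgs(String preference) {
        int colon = preference.indexOf(':');
        if (colon < 0 || colon == preference.length() - 1) {
            return new LinkedHashSet<>();
        }
        return new LinkedHashSet<>(Arrays.asList(preference.substring(colon + 1).split(",")));
    }

    public static void main(String[] args) {
        System.out.println(parseType("_only_node:123") + " " + parseArgs("_only_node:123"));
        System.out.println(parseType("zone") + " " + parseArgs("zone"));
    }
}
```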

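Notes on the master node

The master node is mainly responsible for cluster-level operations, such as creating or deleting indices, tracking which nodes are part of the cluster, and deciding which shards are allocated to which nodes.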

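Elasticsearch lets you use the preference parameter to specify which shard copies a query prefers. To use it, add the preference parameter to the request URL, for example: http://ip:port/index/_search?preference=_primary

The equivalent call in the Java API is: client.prepareSearch("index").setPreference("_primary").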

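By default ES has 5 kinds of query preference:

_primary: the query runs only on primary shards.

_primary_first: the query runs on the primary shards first; if a primary is not available (it is down), the query runs on a replica.

_local: the query prefers shards held on the local node, and falls back to other nodes for shards the local node does not hold.

_only_node: the query runs only on the node with the given id. If that node holds only some of the shards of the queried index, only those shards are searched, so the results may be incomplete. For example, _only_node:123 queries on the node whose id is 123.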

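Custom (string) value: a user-defined value, taken to be a value of the attribute named in the cluster.routing.allocation.awareness.attributes setting. For example, if that setting is zone, then preference=zone searches on nodes whose awareness attribute matches zone*, such as zone1 and zone2. Note that the Elasticsearch reference documentation describes a custom string differently: requests that carry the same custom value are routed to the same shard copies.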
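
To make the first four preferences above concrete, here is a minimal sketch of how each one could order the copies of a single shard. It is a toy model under stated assumptions, not Elasticsearch code: ShardCopySelectionSketch, Copy, and candidates are hypothetical names, every copy is assumed to be active, and custom string values are left out on purpose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of ordering the copies of one shard by preference; not Elasticsearch code. */
class ShardCopySelectionSketch {

    /** One copy of a shard: the node holding it and whether it is the primary. */
    static final class Copy {
        final String nodeId;
        final boolean primary;

        Copy(String nodeId, boolean primary) {
            this.nodeId = nodeId;
            this.primary = primary;
        }

        @Override
        public String toString() {
            return nodeId + (primary ? "(P)" : "(R)");
        }
    }

    /** Returns the copies to try, in order; an empty list means the shard is not searched. */
    static List<Copy> candidates(String preference, List<Copy> copies, String localNodeId) {
        List<Copy> first = new ArrayList<>();
        List<Copy> rest = new ArrayList<>();
        if (preference.equals("_primary")) {
            for (Copy c : copies) {
                if (c.primary) first.add(c);
            }
        } else if (preference.equals("_primary_first")) {
            for (Copy c : copies) {
                (c.primary ? first : rest).add(c);
            }
        } else if (preference.equals("_local")) {
            for (Copy c : copies) {
                (c.nodeId.equals(localNodeId) ? first : rest).add(c);
            }
        } else if (preference.startsWith("_only_node:")) {
            String nodeId = preference.substring("_only_node:".length());
            for (Copy c : copies) {
                if (c.nodeId.equals(nodeId)) first.add(c);
            }
        } else {
            throw new IllegalArgumentException("unsupported in this sketch: " + preference);
        }
        first.addAll(rest); // preferred copies first, fallbacks after them
        return first;
    }

    public static void main(String[] args) {
        List<Copy> copies = Arrays.asList(
                new Copy("n1", false), new Copy("n2", true), new Copy("n3", false));
        for (String p : new String[] {"_primary", "_primary_first", "_local", "_only_node:n1"}) {
            System.out.println(p + " -> " + candidates(p, copies, "n3"));
        }
    }
}
```

With _only_node, a shard that has no copy on the named node yields an empty candidate list in this model, which corresponds to the incomplete results mentioned above.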
