Elasticsearch Parent-Child Relationships

Reposted article. 2018-08-04 10:15. Category: ELK

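In Elasticsearch, the parent-child relationship is similar to the nested model: both can be used to model complex data structures. The difference is that nested objects are all stored inside a single document, whereas with parent-child each entity is a completely separate document.
Advantages of Parent-Child
1. The parent document can be updated without reindexing its children.
2. Child documents can be added, changed, or deleted without affecting the parent or the other children. This is especially useful when there are many child documents and they are updated frequently.
3. Child documents can be returned as search results.
Elasticsearch maintains an internal map of which parents are associated with which children. This map is what makes joins at query time fast, but it imposes a limitation: the parent document and all of its children must be stored on the same shard.
The parent-child ID map is stored in doc values, so lookups are fast when the map fits in memory. When the map is very large, part of it is kept on disk.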

Parent-Child Mapping

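To set up a parent-child model, we must tell Elasticsearch which type is the parent of which child type. This is configured in the mapping when the index is created, or with the update-mapping API before the child type has been created.
For example, suppose a company has branches in many locations and we want to analyze the relationship between employees and the branches they work in.
We use a parent-child structure for this.
We need to create an employee type and a branch type, and declare branch as the _parent of employee: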

PUT /company
{
  "mappings": {
    "branch": {},
    "employee": {
      "_parent": {
        "type": "branch"
      }
    }
  }
}

Indexing Parents and Children

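Indexing a parent document is no different from indexing any other document. Parents don't need to know anything about their children.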

POST /company/branch/_bulk
{ "index": { "_id": "london" }}
{ "name": "London Westminster", "city": "London", "country": "UK" }
{ "index": { "_id": "liverpool" }}
{ "name": "Liverpool Central", "city": "Liverpool", "country": "UK" }
{ "index": { "_id": "paris" }}
{ "name": "Champs Élysées", "city": "Paris", "country": "France" }

When indexing a child document, you must specify the ID of its parent document:

PUT /company/employee/1?parent=london 
{
  "name":  "Alice Smith",
  "dob":   "1970-10-24",
  "hobby": "hiking"
}

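Specifying the parent ID serves two purposes: it creates the link between the parent and the child, and it ensures that the child document is stored on the same shard as the parent.
In the discussion of routing, we explained how Elasticsearch uses a routing value to decide which shard a document belongs to. If no routing value is specified for a document, it defaults to the document's _id. The formula is: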

shard = hash(routing) % number_of_primary_shards
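
The formula can be tried out with a small Python sketch. The function name `shard_for` is made up for this illustration, and `zlib.crc32` is only a deterministic stand-in for the hash that Elasticsearch uses internally, so the actual shard numbers will not match a real cluster. The point is only that equal routing values always map to the same shard.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Apply shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is an assumption made for this sketch, not the hash
    function Elasticsearch really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The branch document is routed by its own _id, "london".
branch_shard = shard_for("london", 5)
# An employee indexed with parent=london is routed by "london" as well.
employee_shard = shard_for("london", 5)

print(branch_shard == employee_shard)  # True
```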

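However, if a parent ID is specified, it is used as the routing value instead of the _id. In other words, the parent and the child share the same routing value, which ensures that they are allocated to the same shard.
The parent ID must be specified when retrieving a child document with a GET request, and also when indexing, updating, or deleting a child document. Unlike a search request, which is forwarded to all shards of the index, these single-document requests are sent only to the shard that holds the document. If the parent ID is not specified, the request will probably be sent to the wrong shard.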
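
One way to avoid forgetting the parent ID is to build every single-document path through one helper. The following is a minimal sketch; `child_doc_path` is a hypothetical helper written for this article, not part of any Elasticsearch client.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def child_doc_path(index: str, doc_type: str, doc_id: str,
                   parent: Optional[str] = None) -> str:
    """Build the request path for a single-document GET, index, update, or delete."""
    # Percent-encode each path segment so unusual IDs cannot break the URL.
    path = "/" + "/".join(quote(part, safe="") for part in (index, doc_type, doc_id))
    if parent is not None:
        path += "?" + urlencode({"parent": parent})
    return path


print(child_doc_path("company", "employee", "1", parent="london"))
# /company/employee/1?parent=london
```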
The parent ID must also be specified when using the bulk API:

POST /company/employee/_bulk
{ "index": { "_id": 2, "parent": "london" }}
{ "name": "Mark Thomas", "dob": "1982-05-16", "hobby": "diving" }
{ "index": { "_id": 3, "parent": "liverpool" }}
{ "name": "Barry Smith", "dob": "1979-04-01", "hobby": "hiking" }
{ "index": { "_id": 4, "parent": "paris" }}
{ "name": "Adrien Grand", "dob": "1987-05-11", "hobby": "horses" }
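
If the employee records come from application code, the bulk body can be generated rather than written by hand. This is a sketch only: `bulk_index_body` is a name invented here, and sending the body to the cluster is left out.

```python
import json


def bulk_index_body(docs):
    """Build a newline-delimited bulk body from (doc_id, parent_id, source) tuples."""
    lines = []
    for doc_id, parent_id, source in docs:
        # Action line: carries the parent ID so the child is routed correctly.
        lines.append(json.dumps({"index": {"_id": doc_id, "parent": parent_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last one, to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_index_body([
    (2, "london", {"name": "Mark Thomas", "dob": "1982-05-16", "hobby": "diving"}),
    (3, "liverpool", {"name": "Barry Smith", "dob": "1979-04-01", "hobby": "hiking"}),
])
print(body, end="")
```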

Finding Parents by Their Children

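The has_child query and filter find parent documents based on the contents of their children. For example, the following query finds all branches that have employees born on or after 1980-01-01: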

GET /company/branch/_search
{
  "query": {
    "has_child": {
      "type": "employee",
      "query": {
        "range": {
          "dob": {
            "gte": "1980-01-01"
          }
        }
      }
    }
  }
}

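The has_child query may match several child documents, each with a different relevance score. How these scores are reduced to a single score for the parent depends on the score_mode parameter. The default is none, which ignores the child scores and assigns a score of 1.0 to every matching parent. It also accepts avg, min, max, and sum.
The following query returns both london and liverpool, but london gets the higher score because Alice Smith is a better match for the query than Barry Smith: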

GET /company/branch/_search
{
  "query": {
    "has_child": {
      "type":       "employee",
      "score_mode": "max",
      "query": {
        "match": {
          "name": "Alice Smith"
        }
      }
    }
  }
}
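
The effect of score_mode can be illustrated outside Elasticsearch. The sketch below reduces a list of child scores to one parent score; the function `parent_score` and the score values are made up for the illustration and are not real relevance scores.

```python
def parent_score(child_scores, score_mode="none"):
    """Reduce the scores of the matching children to a single parent score."""
    if not child_scores:
        raise ValueError("a parent only matches when at least one child matches")
    if score_mode == "none":
        return 1.0  # child scores are ignored
    if score_mode == "min":
        return min(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    raise ValueError("unknown score_mode: " + score_mode)


# Hypothetical scores for children matching "Alice Smith" under one parent.
scores = [1.5, 0.5]
print(parent_score(scores, "none"))  # 1.0
print(parent_score(scores, "max"))   # 1.5
print(parent_score(scores, "sum"))   # 2.0
```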

min_children and max_children

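The has_child query and filter both accept the min_children and max_children parameters, which return a parent document only if the number of its matching children falls within the specified range.
The following query returns only branches that have at least two employees: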

GET /company/branch/_search
{
  "query": {
    "has_child": {
      "type":         "employee",
      "min_children": 2, 
      "query": {
        "match_all": {}
      }
    }
  }
}
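
As a rough in-memory illustration of what this filter does, the sketch below counts children per parent using the sample employees indexed earlier. The names `EMPLOYEE_PARENTS` and `branches_with_children` are assumptions made for this example, and every employee is treated as a match, as with match_all.

```python
from collections import Counter

# Employee _id -> parent branch _id, taken from the sample data above.
EMPLOYEE_PARENTS = {1: "london", 2: "london", 3: "liverpool", 4: "paris"}


def branches_with_children(parents_by_child, min_children=1, max_children=None):
    """Return the parents whose number of children lies in the given range."""
    counts = Counter(parents_by_child.values())
    return sorted(
        branch
        for branch, n in counts.items()
        if n >= min_children and (max_children is None or n <= max_children)
    )


print(branches_with_children(EMPLOYEE_PARENTS, min_children=2))  # ['london']
```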

Finding Children by Their Parents

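Unlike a nested query, which can only ever return the root document, parent and child documents are independent and each can be queried on its own. Where the has_child query returns parents based on their children, the has_parent query returns children based on their parents.
It looks very similar to the has_child query. The following query returns the employees who work in the UK: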

GET /company/employee/_search
{
  "query": {
    "has_parent": {
      "type": "branch", 
      "query": {
        "match": {
          "country": "UK"
        }
      }
    }
  }
}
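
Conceptually, has_parent is a join from each child to its single parent. The following sketch reproduces that join in memory with the sample data. `employees_with_parent` is a hypothetical helper, and the exact string comparison is a simplification of what a match query does on an analyzed field.

```python
BRANCHES = {
    "london": {"country": "UK"},
    "liverpool": {"country": "UK"},
    "paris": {"country": "France"},
}
EMPLOYEES = [
    {"name": "Alice Smith", "parent": "london"},
    {"name": "Mark Thomas", "parent": "london"},
    {"name": "Barry Smith", "parent": "liverpool"},
    {"name": "Adrien Grand", "parent": "paris"},
]


def employees_with_parent(employees, branches, parent_matches):
    """Return the names of the children whose parent satisfies parent_matches."""
    return [e["name"] for e in employees if parent_matches(branches[e["parent"]])]


uk_staff = employees_with_parent(EMPLOYEES, BRANCHES, lambda b: b["country"] == "UK")
print(uk_staff)  # ['Alice Smith', 'Mark Thomas', 'Barry Smith']
```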

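The has_parent query also supports score_mode, but it accepts only two settings: none (the default) and score. Each child can have only one parent, so there is no need to reduce multiple scores into a single score for the child. The choice is simply between using the parent's score (score) or not (none).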

Children Aggregation

Parent-child supports a children aggregation. A parent aggregation is not supported.
The following example breaks down employee hobbies by country:

GET /company/branch/_search
{
  "size" : 0,
  "aggs": {
    "country": {
      "terms": { 
        "field": "country"
      },
      "aggs": {
        "employees": {
          "children": { 
            "type": "employee"
          },
          "aggs": {
            "hobby": {
              "terms": { 
                "field": "hobby"
              }
            }
          }
        }
      }
    }
  }
}
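
The shape of the result can be previewed with an in-memory sketch over the sample data: group branches by country, step down to their employees, then count hobbies. The function `hobbies_by_country` is invented for this illustration. A real terms aggregation works on indexed terms, so its bucket keys may differ from these raw field values.

```python
from collections import Counter, defaultdict

BRANCHES = {
    "london": {"country": "UK"},
    "liverpool": {"country": "UK"},
    "paris": {"country": "France"},
}
EMPLOYEES = [
    {"parent": "london", "hobby": "hiking"},
    {"parent": "london", "hobby": "diving"},
    {"parent": "liverpool", "hobby": "hiking"},
    {"parent": "paris", "hobby": "horses"},
]


def hobbies_by_country(branches, employees):
    """Count employee hobbies per country of the employee's parent branch."""
    buckets = defaultdict(Counter)
    for employee in employees:
        country = branches[employee["parent"]]["country"]
        buckets[country][employee["hobby"]] += 1
    return {country: dict(hobbies) for country, hobbies in buckets.items()}


print(hobbies_by_country(BRANCHES, EMPLOYEES))
# {'UK': {'hiking': 2, 'diving': 1}, 'France': {'horses': 1}}
```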

Grandparents and Grandchildren

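The parent-child relationship can span more than two generations, but all related documents must still be allocated to the same shard.
Let's change the previous example slightly and make country the parent of branch: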

PUT /company
{
  "mappings": {
    "country": {},
    "branch": {
      "_parent": {
        "type": "country" 
      }
    },
    "employee": {
      "_parent": {
        "type": "branch" 
      }
    }
  }
}

Countries and branches have a simple parent-child relationship, so we index them the same way as before:

POST /company/country/_bulk
{ "index": { "_id": "uk" }}
{ "name": "UK" }
{ "index": { "_id": "france" }}
{ "name": "France" }

POST /company/branch/_bulk
{ "index": { "_id": "london", "parent": "uk" }}
{ "name": "London Westminster" }
{ "index": { "_id": "liverpool", "parent": "uk" }}
{ "name": "Liverpool Central" }
{ "index": { "_id": "paris", "parent": "france" }}
{ "name": "Champs Élysées" }

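The parent ID ensures that each branch document is routed to the same shard as its parent country document.
But what happens if we index the employee documents the same way as before?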

PUT /company/employee/1?parent=london
{
  "name":  "Alice Smith",
  "dob":   "1970-10-24",
  "hobby": "hiking"
}

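The shard for the employee document is chosen by its parent ID, london. The london document, however, was routed by its own parent ID, uk. So the employee document may well end up on a different shard from its country and branch.
We therefore need an extra routing parameter, set to the ID of the grandparent, to ensure that all three generations are allocated to the same shard: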

PUT /company/employee/1?parent=london&routing=uk 
{
  "name":  "Alice Smith",
  "dob":   "1970-10-24",
  "hobby": "hiking"
}

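The parent parameter is still used to link the employee document to its parent, while the routing parameter ensures that it is stored on the same shard as its parent and grandparent.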
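
The interplay of the two parameters can be summarized in a few lines of Python. `effective_routing` is a hypothetical function that mirrors the rule described in this article: an explicit routing value takes precedence, then the parent ID, then the document's own _id.

```python
def effective_routing(doc_id, parent=None, routing=None):
    """Return the value that decides which shard a document is stored on."""
    if routing is not None:
        return routing  # explicit routing takes precedence
    if parent is not None:
        return parent   # otherwise the parent ID is used
    return doc_id       # otherwise the document's own _id


country = effective_routing("uk")
branch = effective_routing("london", parent="uk")
employee_default = effective_routing("1", parent="london")
employee_fixed = effective_routing("1", parent="london", routing="uk")

print(country, branch, employee_default, employee_fixed)  # uk uk london uk
```

Only the last call gives the employee the same routing value as its country and branch, and therefore the same shard.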
Queries and aggregations work across generations as well. For example, to find the countries where employees enjoy hiking:

GET /company/country/_search
{
  "query": {
    "has_child": {
      "type": "branch",
      "query": {
        "has_child": {
          "type": "employee",
          "query": {
            "match": {
              "hobby": "hiking"
            }
          }
        }
      }
    }
  }
}
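
With more generations, the nesting becomes tedious to write by hand. The following sketch generates the nested has_child clauses from a list of types; `nested_has_child` is a helper invented for this example and only builds the request body.

```python
import json


def nested_has_child(child_types, leaf_query):
    """Wrap leaf_query in one has_child clause per type, outermost type first."""
    query = leaf_query
    # Build from the innermost generation outwards.
    for child_type in reversed(child_types):
        query = {"has_child": {"type": child_type, "query": query}}
    return query


body = {
    "query": nested_has_child(["branch", "employee"], {"match": {"hobby": "hiking"}})
}
print(json.dumps(body, indent=2))
```

The printed body has the same structure as the query above and would be sent to /company/country/_search.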
