Improving Performance with JanusGraph Indexes

Reposted article. Published 2017-07-07 16:43. Tags: JanusGraph, index, bigtable

Translated and compiled by: 紀玉奇


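JanusGraph supports two kinds of index: graph indexes and vertex-centric indexes. Graph indexes are typically used to look up vertices or edges by their properties. Vertex-centric indexes make graph traversals efficient, especially through vertices that have many incident edges.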

Graph Index

A graph index is a global index structure over the entire graph that lets users efficiently retrieve vertices or edges by property. For example:

g.V().has('name','hercules')
g.E().has('reason', textContains('loves'))

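Both queries above look up vertices or edges by property. Without an index, each one triggers a full scan of the graph, which is unacceptable for large graphs.

JanusGraph distinguishes two types of graph index: composite indexes and mixed indexes. Composite indexes are very fast and efficient, but they are limited to equality lookups on a particular, previously defined combination of property keys. Mixed indexes can be used for lookups on any combination of the indexed keys and, depending on the backing index store, support multiple condition predicates in addition to equality.

Both types of index are created through the JanusGraph management system, using the index builder returned by:

JanusGraphManagement.buildIndex(String, Class)

The first argument is the name of the index and the second is the type of element to index (for example Vertex.class). The index name must be unique. If the index is built on property keys newly defined in the same management transaction, it is available immediately. Otherwise a reindex procedure must be run to synchronize the index with the existing data, and the index cannot be used until that procedure completes. For this reason, it is recommended to define indexes together with the initial schema.

Note: in the absence of an index, JanusGraph falls back to a full graph scan, which performs very poorly. Full scans can be disabled by setting the force-index configuration option (query.force-index).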

Composite Index

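A composite index retrieves vertices or edges by one fixed composition of one or more keys. In other words, the set of keys a query must constrain is fixed when the index is defined.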

// Never create new indexes while a transaction is active on the graph (this may lead to deadlock)
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
// Composite index for looking up vertices by name
mgmt.buildIndex('byNameComposite',Vertex.class).addKey(name).buildCompositeIndex()
// Composite index for looking up vertices by name and age
mgmt.buildIndex('byNameAndAgeComposite',Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
// Wait for the indexes to become available
ManagementSystem.awaitGraphIndexStatus(graph,'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph,'byNameAndAgeComposite').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"),SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"),SchemaAction.REINDEX).get()
mgmt.commit()

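Note that a composite index is used only when every one of its keys appears in the query as an equality condition. With the indexes above, g.V().has('name', 'hercules') is answered by byNameComposite and g.V().has('age',30).has('name','hercules') by byNameAndAgeComposite. g.V().has('age',30) cannot use either index, because no index covers age alone. g.V().has('name','hercules').has('age',inside(20,50)) cannot use byNameAndAgeComposite either: composite indexes support only exact matches, not range conditions, so at most the name condition can be served by byNameComposite.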
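The matching rule can be illustrated with a toy model. This is a minimal Python sketch, not JanusGraph code: the function pick_composite_index is hypothetical, and it only assumes that an index is eligible when all of its keys appear as equality conditions and that the widest eligible index is preferred.

```python
from typing import Optional

# Index name -> the fixed tuple of property keys it covers
# (names taken from the example in the text).
COMPOSITE_INDEXES = {
    "byNameComposite": ("name",),
    "byNameAndAgeComposite": ("name", "age"),
}


def pick_composite_index(equality_keys: set) -> Optional[str]:
    """Return the widest index whose keys all appear as equality conditions.

    Hypothetical helper: returns None when no composite index applies,
    which in this model means a full scan.
    """
    best = None
    for index_name, keys in COMPOSITE_INDEXES.items():
        if set(keys) <= equality_keys:
            if best is None or len(keys) > len(COMPOSITE_INDEXES[best]):
                best = index_name
    return best


# has('name', 'hercules')
by_name = pick_composite_index({"name"})
# has('age', 30).has('name', 'hercules')
by_name_and_age = pick_composite_index({"name", "age"})
# has('age', 30): no index covers age alone
by_age = pick_composite_index({"age"})
# has('name', 'hercules').has('age', inside(20, 50)):
# the range on age is not an equality condition, so only "name" counts
by_name_with_range = pick_composite_index({"name"})
```

In this model a range condition simply never enters the set of equality keys, which is why the last query falls back to the name-only index.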

Index Uniqueness

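A composite index can also be used to enforce property uniqueness in the graph. If a composite graph index is defined as unique(), at most one vertex or edge can exist for any given combination of the index's property values.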

graph.tx().rollback()//Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameUnique',Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph,'byNameUnique').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameUnique"),SchemaAction.REINDEX).get()
mgmt.commit()

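Note: to enforce uniqueness against an eventually consistent storage backend, the consistency of the index must be explicitly set to enforce locking.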
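Why locking matters for uniqueness can be seen in a small model. The following Python sketch is an illustration only, not how JanusGraph implements unique indexes. The class UniqueIndex is hypothetical, and a single in-process lock stands in for the locking a distributed backend would need.

```python
import threading


class UniqueIndex:
    """Toy unique index: at most one element id per indexed value."""

    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()

    def add(self, value, element_id):
        # The check and the write happen under one lock, so two concurrent
        # writers cannot both claim the same value.
        with self._lock:
            owner = self._entries.get(value)
            if owner is not None and owner != element_id:
                raise ValueError(f"value {value!r} is already taken")
            self._entries[value] = element_id

    def lookup(self, value):
        return self._entries.get(value)


index = UniqueIndex()
index.add("hercules", 1)
try:
    index.add("hercules", 2)  # a second vertex with the same name
    rejected = False
except ValueError:
    rejected = True
```

Without the lock, two writers could both pass the check before either one writes, and the uniqueness guarantee would be lost. That is the situation an eventually consistent backend creates unless locking is enabled for the index.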

Mixed Index

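A mixed index retrieves vertices or edges by any combination of the property keys that were added to it. Mixed indexes are more flexible than composite indexes and support additional condition predicates beyond equality, such as range queries. On the other hand, mixed indexes are slower than composite indexes for most equality queries.

Unlike composite indexes, mixed indexes require an indexing backend to be configured. JanusGraph can support multiple indexing backends in a single installation, and each must be identified by a unique name in the JanusGraph configuration, called the indexing backend name.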

graph.tx().rollback()//Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name).addKey(age).buildMixedIndex("search")
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph,'nameAndAge').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"),SchemaAction.REINDEX).get()
mgmt.commit()

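The code above defines a mixed index named nameAndAge over the property keys name and age, and assigns it to the indexing backend named "search". That name corresponds to the index.search.backend setting in the configuration file. If the backend were named solrsearch instead, the configuration would need an index.solrsearch.backend setting.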
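As a rough sketch of how the backend name maps to configuration keys, a JanusGraph properties file might contain entries like the following. The backend type (elasticsearch) and the host address are assumed values for illustration; the source does not say which backend is in use.

```properties
# Indexing backend registered under the name "search"
# (backend type and hostname are assumptions for this sketch)
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

# A second backend named "solrsearch" would use its own prefix, for example:
# index.solrsearch.backend=solr
```

The name after "index." is the string passed to buildMixedIndex().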

The following shows how to specify text search explicitly as the search behavior:

mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name,Mapping.TEXT.getParameter()).addKey(age,Mapping.TEXT.getParameter()).buildMixedIndex("search")

For more details, see Chapter 21, Index Parameters and Full-Text Search.

Mixed indexes support range conditions and queries over any combination of the indexed keys, not only equality:

g.V().has('name', textContains('hercules')).has('age', inside(20,50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))

Mixed indexes support full-text search, range search, geo search, and other predicates. See Chapter 20, Search Predicates and Data Types.

Note: unlike composite indexes, mixed indexes do not support uniqueness.

Adding Property Keys

Property keys can be added to an existing mixed index, after which they can be used in query conditions.

//Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
//Create a new property key
location = mgmt.makePropertyKey('location').dataType(Geoshape.class).make()
nameAndAge = mgmt.getGraphIndex('nameAndAge')
//Add the new key to the existing index
mgmt.addIndexKey(nameAndAge, location)
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph,'nameAndAge').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"),SchemaAction.REINDEX).get()
mgmt.commit()

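If the added key is defined in the same management transaction, it can be queried immediately. If the property key is already in use, a reindex procedure must be run to ensure the index contains all existing data, and the key cannot be used in index queries until that procedure completes.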

Mapping Parameters

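When a property key is added to a mixed index (whether through the index builder or through addIndexKey), a set of parameters can be specified to control how the property value is stored in the indexing backend. See the mapping parameters overview section.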

Ordering

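The order in which the results of a graph query are returned can be specified with the order().by() step, which takes two arguments:

The name of the property key to order by
The sort order: ascending (incr) or descending (decr)

For example: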

g.V().has('name', textContains('hercules')).order().by('age', decr).limit(10)

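This returns the 10 vertices whose name contains 'hercules', sorted by age in descending order.

Keep the following in mind when using order():

Composite graph indexes do not natively support ordering results. All results are loaded into memory and sorted there, which is very expensive for large result sets.
Mixed graph indexes support ordering natively, but the property key used in order().by() must have been added to the mixed index beforehand. If the ordering key is not part of the index, the entire result set is loaded into memory for sorting.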

Label Constraint

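In many cases it is undesirable to index every vertex or edge that has a given property. For example, we may want to index by name only the vertices that have the label god. The indexOnly method restricts an index to vertices or edges with a particular label: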

//Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
god = mgmt.getVertexLabel('god')
//Only index vertices that have the label god
mgmt.buildIndex('byNameAndLabel',Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph,'byNameAndLabel').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndLabel"),SchemaAction.REINDEX).get()
mgmt.commit()

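Label restrictions apply to mixed indexes in the same way. When a composite index with a label restriction is defined as unique, the uniqueness constraint applies only to properties on vertices or edges with that label.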

Composite versus Mixed Indexes

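1. Use a composite index for exact-match lookups. Composite indexes do not require an external indexing system and usually perform better.

As an exception, use a mixed index for exact matches when the number of distinct values is small (for example, the 12 months of the year) or when a single value is expected to be associated with many elements in the graph, that is, when selectivity is low.

2. Use a mixed index for range, full-text, or geo queries. A mixed index can also speed up order().by() queries.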

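Vertex-centric Indexes

Vertex-centric indexes are local index structures built individually per vertex. In large graphs a vertex can have thousands of incident edges, and traversing through such a vertex is very slow, because all of those edges must be retrieved and then filtered in memory to find the ones that match the traversal's conditions. Vertex-centric indexes speed up these traversals by using a local index structure to retrieve only the matching edges.

For example: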

h = g.V().has('name','hercules').next()
g.V(h).outE('battled').has('time', inside(10,20)).inV()

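Without a vertex-centric index, this query has to retrieve all battled edges of the vertex and filter them to find the matching ones, which is very inefficient when the vertex has many such edges.

Building a vertex-centric index speeds up the query: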

//Never create new indexes while a transaction is active
graph.tx().rollback()
mgmt = graph.openManagement()
//Look up the property key
time = mgmt.getPropertyKey('time')
//Look up the edge label
battled = mgmt.getEdgeLabel('battled')
//Build the vertex-centric index
mgmt.buildEdgeIndex(battled,'battlesByTime',Direction.BOTH,Order.decr, time)
mgmt.commit()
//Wait for the index to become available (a relation index, not a graph index)
ManagementSystem.awaitRelationIndexStatus(graph,'battlesByTime','battled').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getRelationIndex(battled,'battlesByTime'),SchemaAction.REINDEX).get()
mgmt.commit()

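The code above builds a vertex-centric index on battled edges in both directions, sorted by time in descending order. The first argument to buildEdgeIndex() is the label of the edges to index, the second is the name of the index, and the third is the edge direction. BOTH means the index serves traversals along both IN and OUT edges. Restricting the index to a single direction roughly halves its storage and maintenance cost. The last two arguments are the sort order of the index and the property keys to index by. Multiple property keys may be given, and the order defaults to ascending (Order.ASC).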

graph.tx().rollback()//Never create new indexes while a transaction is active
mgmt = graph.openManagement()
time = mgmt.getPropertyKey('time')
rating = mgmt.makePropertyKey('rating').dataType(Double.class).make()
battled = mgmt.getEdgeLabel('battled')
mgmt.buildEdgeIndex(battled,'battlesByRatingAndTime',Direction.OUT,Order.decr, rating, time)
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitRelationIndexStatus(graph,'battlesByRatingAndTime','battled').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getRelationIndex(battled,'battlesByRatingAndTime'),SchemaAction.REINDEX).get()
mgmt.commit()

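The code above builds the index battlesByRatingAndTime over the keys rating and time. The order of the property keys in the index definition matters: a query can use the index only if its constraints follow that order, starting with the first key.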

h = g.V().has('name','hercules').next()
g.V(h).outE('battled').property('rating',5.0)//Add some rating properties
g.V(h).outE('battled').has('rating', gt(3.0)).inV()
g.V(h).outE('battled').has('rating',5.0).has('time', inside(10,50)).inV()
g.V(h).outE('battled').has('time', inside(10,50)).inV()

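Of the three has() traversals above, only the first two can use the index. The third constrains only time, which does not match the index's key order of rating first, then time. Multiple vertex-centric indexes can be built for the same label to support different traversals, and JanusGraph automatically chooses the most efficient one. Vertex-centric indexes support only equality and range/interval constraints.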
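The key-order rule can be expressed as a small check. The following Python sketch is a simplified model, not JanusGraph's actual index selection. The function can_use_index is hypothetical and assumes an index is usable when the query has equality conditions on a leading prefix of the index keys, optionally followed by one range condition on the next key.

```python
def can_use_index(index_keys, equality_keys, range_key=None):
    """Return True if the constraints match a leading prefix of index_keys.

    index_keys:    keys in the order given to buildEdgeIndex, e.g. ("rating", "time")
    equality_keys: keys constrained by equality in the query
    range_key:     at most one key constrained by a range, or None
    """
    n = len(equality_keys)
    # Equality conditions must cover exactly the first n index keys.
    if set(index_keys[:n]) != set(equality_keys):
        return False
    if range_key is None:
        return n > 0
    # A range condition is only allowed on the key right after the prefix.
    return n < len(index_keys) and index_keys[n] == range_key


INDEX = ("rating", "time")  # battlesByRatingAndTime

# has('rating', gt(3.0))
query_1 = can_use_index(INDEX, [], range_key="rating")
# has('rating', 5.0).has('time', inside(10, 50))
query_2 = can_use_index(INDEX, ["rating"], range_key="time")
# has('time', inside(10, 50))
query_3 = can_use_index(INDEX, [], range_key="time")
```

The rule mirrors how a list sorted by (rating, time) can be searched: matching entries are contiguous only when the leading key is pinned down first.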

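Note: the property keys used in a vertex-centric index must have an explicitly defined data type (that is, not Object.class) that supports a native sort order. If the data type is a floating-point number, JanusGraph's custom Decimal or Precision data types must be used.

If the vertex-centric index is built on an edge label newly defined in the same management transaction, it is available immediately. If the edge label is already in use, a reindex procedure must be run, and the index cannot be used until that procedure completes.

Note: JanusGraph automatically builds vertex-centric indexes per edge label and property key, so lookups by label or key stay efficient even on a vertex with thousands of incident edges.

Vertex-centric indexes cannot speed up unconstrained traversals, which must visit all incident edges of a given label. Such traversals become slower as the number of edges grows. They can often be rewritten as constrained traversals that can use a vertex-centric index.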

Ordering Traversals

The following queries use local() and limit() to retrieve an ordered subset of the edges during the traversal.

h = g.V().has('name','hercules').next()
g.V(h).local(outE('battled').order().by('time', decr).limit(10)).inV().values('name')
g.V(h).local(outE('battled').has('rating',5.0).order().by('time', decr).limit(10)).values('place')

These queries are very efficient if the sort key and sort order match a vertex-centric index.
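The reason a matching index helps can be shown with plain lists. This Python sketch is a simplified illustration, not JanusGraph internals: it assumes the index keeps a vertex's edges already sorted by time in descending order, so a limit(10) query only reads the first entries. Both helper functions are hypothetical.

```python
def newest_with_index(times_desc, k):
    """Edges are already stored newest-first: the first k entries are the answer."""
    return times_desc[:k]


def newest_without_index(times, k):
    """No matching index: every edge is loaded and sorted before taking k."""
    return sorted(times, reverse=True)[:k]


# Made-up 'time' values on one vertex's battled edges.
times = [3, 17, 8, 42, 25]
# With an index, this order is maintained when edges are written.
times_desc = sorted(times, reverse=True)

with_index = newest_with_index(times_desc, 2)
without_index = newest_without_index(times, 2)
```

Both calls return the same edges, but the indexed version touches only k entries, while the other does work proportional to the total number of edges on the vertex.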

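Note: ordered vertex queries are a JanusGraph extension to Gremlin, which is why they need this verbose syntax and require the _() step to convert the JanusGraph result back into a Gremlin pipeline.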
