Creating a JanusGraph Index (Composite Index): Steps and Pitfalls


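Preface

  JanusGraph is a graph database engine. For installation and a first walkthrough, see the post "Notes on Installing the JanusGraph Graph Database". To speed up queries, you normally create indexes on some properties. This post records the pitfalls I ran into while creating an index.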

 

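Index Overview

  Unlike creating an index in MySQL, a JanusGraph index goes through a lifecycle, as shown in the figure below:

  [Figure: JanusGraph index lifecycle diagram]

  Our goal is to start from <create> and, through a series of actions, bring the index into the ENABLED state.

  The states and actions are briefly described below: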

  States (SchemaStatus)

  •   INSTALLED: The index is installed in the system but not yet registered with all instances in the cluster
  •   REGISTERED: The index is registered with all instances in the cluster but not (yet) enabled
  •   ENABLED: The index is enabled and in use (once it reaches this state, the index is usable)
  •   DISABLED: The index is disabled and no longer in use (used when deleting an index)

  

  Actions (SchemaAction)

  •   REGISTER_INDEX: Registers the index with all instances in the graph cluster. After an index is installed, it must be registered with all graph instances
  •   REINDEX: Re-builds the index from the graph (required if data already exists when the index is created)
  •   ENABLE_INDEX: Enables the index so that it can be used by the query processing engine. An index must be registered before it can be enabled
  •   DISABLE_INDEX: Disables the index in the graph so that it is no longer used
  •   REMOVE_INDEX: Removes the index from the graph (optional operation). Only supported on composite indexes
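
  The states and actions listed above can be read as a small state machine. The following is a minimal sketch in plain Java, assuming a simplified transition table derived from the descriptions above. The class IndexLifecycle and its enums are hypothetical stand-ins for illustration, not JanusGraph's own SchemaStatus and SchemaAction classes, and REMOVE_INDEX is left out because it deletes the index instead of producing a new status.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class IndexLifecycle {
    // Hypothetical mirrors of the states and actions described in the text.
    enum Status { INSTALLED, REGISTERED, ENABLED, DISABLED }
    enum Action { REGISTER_INDEX, REINDEX, ENABLE_INDEX, DISABLE_INDEX }

    // Assumption: statuses from which each action may be applied.
    private static final Map<Action, Set<Status>> ALLOWED_FROM = new EnumMap<>(Action.class);
    // Assumption: status the index ends up in once the action has completed.
    private static final Map<Action, Status> RESULT = new EnumMap<>(Action.class);

    static {
        ALLOWED_FROM.put(Action.REGISTER_INDEX, EnumSet.of(Status.INSTALLED));
        ALLOWED_FROM.put(Action.REINDEX, EnumSet.of(Status.REGISTERED, Status.ENABLED));
        ALLOWED_FROM.put(Action.ENABLE_INDEX, EnumSet.of(Status.REGISTERED));
        ALLOWED_FROM.put(Action.DISABLE_INDEX,
                EnumSet.of(Status.INSTALLED, Status.REGISTERED, Status.ENABLED));

        RESULT.put(Action.REGISTER_INDEX, Status.REGISTERED);
        RESULT.put(Action.REINDEX, Status.ENABLED);
        RESULT.put(Action.ENABLE_INDEX, Status.ENABLED);
        RESULT.put(Action.DISABLE_INDEX, Status.DISABLED);
    }

    /** Returns the status after applying the action, or throws if the action is not allowed. */
    static Status apply(Status current, Action action) {
        if (!ALLOWED_FROM.get(action).contains(current)) {
            throw new IllegalStateException(action + " is not allowed from " + current);
        }
        return RESULT.get(action);
    }

    public static void main(String[] args) {
        Status s = Status.INSTALLED;                 // state right after <create>
        s = apply(s, Action.REGISTER_INDEX);         // INSTALLED -> REGISTERED
        s = apply(s, Action.ENABLE_INDEX);           // REGISTERED -> ENABLED
        System.out.println(s);                       // prints ENABLED
    }
}
```

  The useful point of the sketch is that ENABLE_INDEX is rejected while the index is still INSTALLED, which matches the requirement above that an index must be registered before it can be enabled.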

 

Creating the Index

  The method described in the official JanusGraph tutorial covers an ideal case. The commands are:

graph.tx().rollback()
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.commit() // commit the index definition before waiting on it
// Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
// Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.commit()

  In practice you run into many problems. The most painful one is that the awaitGraphIndexStatus() call fails with the error "Script evaluation exceeded the configured 'scriptEvaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()]".

  The commands above actually leave out several key steps, which are explained in detail below.

 

  1. Before creating the index, make sure JanusGraph has no other transactions running

  The official documentation says:

  The name of a graph index must be unique. Graph indexes built against newly defined property keys, i.e. property keys that are defined in the same management transaction as the index, are immediately available. Graph indexes built against property keys that are already in use require the execution of a reindex procedure to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the index will not be available. It is encouraged to define graph indexes in the same transaction as the initial schema.

  Command to list the open transactions:

graph.getOpenTransactions()

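  Suppose there are 3 other open transactions. The official graph.tx().rollback() command cannot close all of them. This is what actually happens:

gremlin> graph.getOpenTransactions()
==>standardtitantx[0x1e14c346]
==>standardtitantx[0x7a0067f2]
==>standardtitantx[0x0de3ee40]
gremlin> graph.tx().rollback()
==>null
gremlin> graph.getOpenTransactions()
==>standardtitantx[0x1e14c346]
==>standardtitantx[0x7a0067f2]
==>standardtitantx[0x0de3ee40]

  The correct way to close them:

size = graph.getOpenTransactions().size()  // number of open transactions
for(i=0;i<size;i++) {graph.getOpenTransactions().getAt(0).rollback()}  // roll back the first remaining transaction each time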

 

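  2. Run the REGISTER_INDEX action to move the index from INSTALLED to REGISTERED

  The official documentation does not include this key step. After creating the index, run the following commands: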

 

m = graph.openManagement()
m.updateIndex(m.getGraphIndex('byNameComposite'), SchemaAction.REGISTER_INDEX).get()  // use your own index name here
m.commit()

 

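  After the third command is issued, the work actually continues in the background. If you run ManagementSystem.awaitGraphIndexStatus(graph,"byNameComposite").status(SchemaStatus.REGISTERED).call() at this point, it will quite likely still return a timeout error after waiting 30 s. You need to wait patiently. In the meantime, you can watch the CPU usage of the backend Cassandra process to judge whether the work has finished, or check the index status directly: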

mgmt = graph.openManagement()
index = mgmt.getGraphIndex('byNameComposite')
index.getIndexStatus(mgmt.getPropertyKey('name'))

  After a while the index status eventually becomes REGISTERED. Running awaitGraphIndexStatus() again at that point returns

GraphIndexStatusReport[success=true, indexName='byTitleLowercaseComposite', targetStatus=[REGISTERED], notConverged={}, converged={title_lowercase=REGISTERED}, elapsed=PT0.001S]

  Note: if the index still has not become REGISTERED after a long wait, you can also try moving on to the next step and updating it to ENABLED.
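
  The "wait patiently and keep checking the status" advice in this step amounts to polling with a deadline longer than the 30 s script timeout. The following is a minimal sketch of that pattern in plain Java, independent of JanusGraph. The class IndexStatusPoller, the method awaitStatus and the Supplier that stands in for a real status lookup are all hypothetical names introduced for illustration.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class IndexStatusPoller {
    /**
     * Polls statusSource until it reports the target status or the timeout elapses.
     * Returns true if the target was observed, false on timeout or interruption.
     */
    static boolean awaitStatus(Supplier<String> statusSource, String target,
                               Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (target.equals(statusSource.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline reached without seeing the target status
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real lookup: reports INSTALLED twice, then REGISTERED.
        Iterator<String> fake = Arrays.asList("INSTALLED", "INSTALLED", "REGISTERED").iterator();
        boolean ok = awaitStatus(fake::next, "REGISTERED",
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(ok); // prints true
    }
}
```

  In a real setup the Supplier would wrap a status check such as the getIndexStatus() call shown above. How that check is wired in depends on your deployment and is not covered by this sketch.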

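  3. Run REINDEX or ENABLE_INDEX to finish the index

  As in the previous step, use updateIndex() to change the index state. If no data has been loaded yet for the property being indexed, REINDEX is not needed. Choose one of the two command sets below:

  REINDEX action: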

m = graph.openManagement()
m.updateIndex(m.getGraphIndex('byNameComposite'), SchemaAction.REINDEX).get()
m.commit()

ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.ENABLED).call()

  ENABLE_INDEX action:

m = graph.openManagement()
m.updateIndex(m.getGraphIndex('byNameComposite'), SchemaAction.ENABLE_INDEX).get()
m.commit()

ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.ENABLED).call()

Incorrect example:
i = m.getGraphIndex('byNameComposite')
m.updateIndex(i, SchemaAction.ENABLE_INDEX)
m.commit()

The call to get() must be added.
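
  updateIndex() hands back a Future-like job handle, which is why get() is available on its result. The sketch below illustrates, in plain Java and without JanusGraph, the general difference between only starting an asynchronous job and blocking on it with get(). The class FutureGetDemo and the stand-in job are hypothetical and make no claim about JanusGraph's internals.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

public class FutureGetDemo {
    /** Starts a stand-in "index job" that finishes only after the latch is released. */
    static CompletableFuture<String> startJob(CountDownLatch release) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                release.await(); // simulates work that is still in progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "ENABLED";
        });
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        CompletableFuture<String> job = startJob(release);

        // Without get(), the caller moves on while the job is still running.
        System.out.println("done before get(): " + job.isDone()); // prints false

        release.countDown();                    // let the job finish
        String result = job.get();              // blocks until the job has completed
        System.out.println("result: " + result);                  // prints ENABLED
        System.out.println("done after get(): " + job.isDone());  // prints true
    }
}
```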

 

  Finally, running awaitGraphIndexStatus() returns a success report:

GraphIndexStatusReport[success=true, indexName='byTitleLowercaseComposite', targetStatus=[ENABLED], notConverged={}, converged={title_lowercase=ENABLED}, elapsed=PT0.001S]

 

 

At this point the index has been created. If you have further questions, leave a comment to discuss them, or look for more material online.

 

