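Introduction to HugeGraph
The following is quoted from the official documentation:
HugeGraph is an easy-to-use, efficient, general-purpose open-source graph database system (Graph Database, GitHub project page). It implements the Apache TinkerPop3 framework and is fully compatible with the Gremlin query language, and it ships with a complete toolchain that helps users build applications and products on top of a graph database. HugeGraph supports fast import of more than ten billion vertices and edges, provides millisecond-level relationship queries (OLTP), and can be integrated with big-data platforms such as Hadoop and Spark for offline analysis (OLAP).
Typical HugeGraph use cases include deep relationship exploration, association analysis, path search, feature extraction, data clustering, community detection, and knowledge graphs. It applies to business domains such as network security, telecom fraud, financial risk control, ad recommendation, social networks, and intelligent robots.
Key points:
- Built on the TinkerPop3 framework and compatible with the Gremlin query language
- OLTP (open source) and OLAP (commercial edition)
- Support for common graph applications such as path search and recommendation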
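Architecture
Architecture diagram
HugeGraph consists of three functional layers: the storage layer, the computing layer, and the user interface layer. It supports two types of graph computing, OLTP and OLAP.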

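Components
HugeGraph's main functionality is split into components such as HugeCore, ApiServer, HugeGraph-Client, HugeGraph-Loader, and HugeGraph-Studio. The communication between the components is shown in the figure below.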

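Core components:
- HugeCore: the core module of HugeGraph; the TinkerPop interfaces are mainly implemented here.
- ApiServer: provides the RESTful API and exposes the Graph API, Schema API, Gremlin API, and other services.
- HugeGraph-Client: a Java-based client driver.
Ecosystem components:
- HugeGraph-Loader: the data import module. HugeGraph-Loader can scan and analyze existing data, automatically generate the Graph Schema creation statements, and import data quickly in batches.
- HugeGraph-Studio: a web-based visual IDE. It records Gremlin queries in notebook style and visualizes the relationships in the graph. HugeGraph-Studio is also the tool recommended by the system.
HugeGraph-Studio appears to have been abandoned; the development team is working on a new project named 'hugegraph-hubble':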
hugegraph-hubble is a graph management and analysis platform that provides features: graph data load, schema management, graph relationship analysis and graphical display.
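According to the official description, hubble is positioned as a graph management and analysis platform that provides graph data loading, schema management, graph analysis, and visualization. At the time of writing it is under development, and the first release is expected in September 2020.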
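Design philosophy
There are two common models for representing graph data:
- The RDF (Resource Description Framework) model: the choice of academia, queried with SPARQL; examples include Jena and gStore.
- The Property Graph model: the choice of industry; Neo4j and JanusGraph both take this approach.
RDF is a W3C standard, while Property Graph is an industry standard that is widely supported by graph database vendors. HugeGraph adopts Property Graph and follows the industry standard.
The conceptual storage model of HugeGraph is shown in the figure below: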

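It consists of the following parts:
- Vertex: corresponds to an entity
- Vertex Label (the type of a vertex): corresponds to a concept
- Property (name and age in the figure): a PropertyKey
- Edge (lives in the figure): corresponds to a relation in RDF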
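To make the four concepts above concrete, here is a minimal in-memory sketch of a property graph. The class names (PropertyGraphSketch, Vertex, Edge) and the sample data are hypothetical and are not HugeGraph's storage classes; the sketch only illustrates how labels, properties, and edges relate to each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; not HugeGraph's real classes.
class PropertyGraphSketch {

    /** A vertex is an entity; its label names the concept it belongs to. */
    static final class Vertex {
        final String label;
        final Map<String, Object> properties = new LinkedHashMap<>();
        final List<Edge> outEdges = new ArrayList<>();

        Vertex(String label) {
            this.label = label;
        }

        /** Sets one property (a key such as "name" or "age") and returns this vertex. */
        Vertex property(String key, Object value) {
            this.properties.put(key, value);
            return this;
        }

        /** Creates a directed, labeled edge from this vertex to the target. */
        Edge addEdge(String edgeLabel, Vertex target) {
            Edge edge = new Edge(edgeLabel, this, target);
            this.outEdges.add(edge);
            return edge;
        }
    }

    /** An edge is a labeled relation between a source and a target vertex. */
    static final class Edge {
        final String label;
        final Vertex source;
        final Vertex target;

        Edge(String label, Vertex source, Vertex target) {
            this.label = label;
            this.source = source;
            this.target = target;
        }
    }

    /** Builds a two-vertex graph and renders its single edge as text. */
    static String demo() {
        Vertex person = new Vertex("person").property("name", "Tom").property("age", 30);
        Vertex city = new Vertex("city").property("name", "Beijing");
        Edge lives = person.addEdge("lives", city);
        return lives.source.properties.get("name") + " -" + lives.label + "-> "
               + lives.target.properties.get("name");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Tom -lives-> Beijing"
    }
}
```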
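Extensibility
HugeGraph provides a rich plugin extension mechanism with several extension points:
- Backend storage
- Serializers
- Custom configuration options
- Tokenizers (analyzers)
Plugin mechanism
- HugeGraph provides the plugin interface HugeGraphPlugin and supports plugins through the Java SPI mechanism
- HugeGraph provides four registration functions for extension points: registerOptions(), registerBackend(), registerSerializer(), and registerAnalyzer()
- The plugin author implements the corresponding Options, Backend, Serializer, or Analyzer interface
- The plugin author implements the register() method of the HugeGraphPlugin interface, registers the concrete implementation classes from the previous item in that method, and packages everything as a jar
- The plugin user places the jar in the plugins directory under the HugeGraph Server installation directory, sets the relevant configuration options to the plugin's custom values, and restarts the server for the change to take effect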
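The registration flow described above can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: the Registry and Plugin types, the DemoStorePlugin, and the com.example class names are hypothetical and do not reproduce the real HugeGraphPlugin signatures. With real Java SPI the plugins would be discovered through java.util.ServiceLoader; here they are passed in as a plain list so the example runs without a META-INF/services file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin model; the real HugeGraphPlugin interface may differ.
class PluginSketch {

    /** Stand-in for the tables that the register*() functions write to. */
    static final class Registry {
        final Map<String, String> backends = new LinkedHashMap<>();
        final Map<String, String> serializers = new LinkedHashMap<>();

        void registerBackend(String name, String className) {
            this.backends.put(name, className);
        }

        void registerSerializer(String name, String className) {
            this.serializers.put(name, className);
        }
    }

    /** Each plugin registers its own implementations in register(). */
    interface Plugin {
        String name();
        void register(Registry registry);
    }

    /** Example plugin contributing one backend and one serializer. */
    static final class DemoStorePlugin implements Plugin {
        @Override
        public String name() {
            return "demo-store";
        }

        @Override
        public void register(Registry registry) {
            // The class names are placeholders, not real classes
            registry.registerBackend("demo", "com.example.DemoStoreProvider");
            registry.registerSerializer("demo", "com.example.DemoSerializer");
        }
    }

    /** Calls register() on every plugin and returns the filled registry. */
    static Registry loadAll(List<Plugin> plugins) {
        Registry registry = new Registry();
        for (Plugin plugin : plugins) {
            plugin.register(registry);
        }
        return registry;
    }

    static Registry demo() {
        List<Plugin> plugins = new ArrayList<>();
        plugins.add(new DemoStorePlugin());
        return loadAll(plugins);
    }

    public static void main(String[] args) {
        System.out.println(demo().backends);
    }
}
```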
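From an example into the source code
To understand a system's source code in depth, it helps to start from a concrete application. Start with the example code: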
https://github.com/hugegraph/hugegraph/blob/master/hugegraph-example/src/main/java/com/baidu/hugegraph/example/Example1.java
public static void main(String[] args) throws Exception {
    LOG.info("Example1 start!");
    HugeGraph graph = ExampleUtil.loadGraph();
    Example1.showFeatures(graph);
    Example1.loadSchema(graph);
    Example1.loadData(graph);
    Example1.testQuery(graph);
    Example1.testRemove(graph);
    Example1.testVariables(graph);
    Example1.testLeftIndexProcess(graph);
    Example1.thread(graph);
    graph.close();
    HugeFactory.shutdown(30L);
}
1. loadGraph
Before using HugeGraph, a HugeGraph object must be initialized; that is what loadGraph does.

public static HugeGraph loadGraph(boolean needClear, boolean needProfile) {
    if (needProfile) {
        profile();
    }
    registerPlugins();
    String conf = "hugegraph.properties";
    try {
        String path = ExampleUtil.class.getClassLoader()
                                       .getResource(conf).getPath();
        File file = new File(path);
        if (file.exists() && file.isFile()) {
            conf = path;
        }
    } catch (Exception ignored) {
    }
    HugeGraph graph = HugeFactory.open(conf);
    if (needClear) {
        graph.clearBackend();
    }
    graph.initBackend();
    return graph;
}
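The try block in loadGraph looks the configuration file up on the classpath and falls back to the bare name when nothing is found (a missing resource makes getResource return null, and the resulting exception is swallowed). The following sketch isolates that lookup with an explicit null check instead of a caught exception. ConfigPathSketch and resolve are hypothetical names, and like the original code it uses URL.getPath(), which may not decode special characters such as spaces.

```java
import java.io.File;
import java.net.URL;

// Hypothetical helper illustrating the classpath lookup; not part of HugeGraph.
class ConfigPathSketch {

    /**
     * Returns the classpath location of the given resource if it is a
     * regular file, otherwise returns the name unchanged.
     */
    static String resolve(String name) {
        URL url = ConfigPathSketch.class.getClassLoader().getResource(name);
        if (url == null) {
            // Not on the classpath: let the caller treat the name as a plain path
            return name;
        }
        File file = new File(url.getPath());
        return (file.exists() && file.isFile()) ? url.getPath() : name;
    }

    public static void main(String[] args) {
        System.out.println(resolve("hugegraph.properties"));
    }
}
```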
1.1 registerPlugins
registerPlugins registers the plugins; recall the extension mechanism described above. All backend stores in HugeGraph must be registered as plugins.
public static void registerPlugins() {
    if (registered) {
        return;
    }
    registered = true;
    RegisterUtil.registerCassandra();
    RegisterUtil.registerScyllaDB();
    RegisterUtil.registerHBase();
    RegisterUtil.registerRocksDB();
    RegisterUtil.registerMysql();
    RegisterUtil.registerPalo();
}
Registration mainly covers the configuration options, the serializer, and the backend. For example, here is the MySQL registration:
public static void registerMysql() {
    // Register config
    OptionSpace.register("mysql",
            "com.baidu.hugegraph.backend.store.mysql.MysqlOptions");
    // Register serializer
    SerializerFactory.register("mysql",
            "com.baidu.hugegraph.backend.store.mysql.MysqlSerializer");
    // Register backend
    BackendProviderFactory.register("mysql",
            "com.baidu.hugegraph.backend.store.mysql.MysqlStoreProvider");
}
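Note that the register calls above pass class names as strings rather than class objects. One plausible reason, sketched below, is that the class can then be loaded lazily by reflection only when its backend is actually selected. This is an assumption about the design, not something the source states; NamedClassRegistry is a hypothetical class, and java.util.ArrayList merely stands in for a provider class so that the example is runnable.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical name-to-class-name registry; not HugeGraph's factory code.
class NamedClassRegistry {

    private static final Map<String, String> PROVIDERS = new ConcurrentHashMap<>();

    /** Remembers which class implements the given name; nothing is loaded yet. */
    static void register(String name, String className) {
        PROVIDERS.put(name, className);
    }

    /** Loads the registered class and creates it via its no-arg constructor. */
    static Object create(String name) {
        String className = PROVIDERS.get(name);
        if (className == null) {
            throw new IllegalArgumentException("Not registered: " + name);
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        register("list", "java.util.ArrayList");
        System.out.println(create("list").getClass().getName());
    }
}
```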
1.2 HugeFactory.open
HugeFactory is the factory class of HugeGraph. It accepts a Configuration object and builds a HugeGraph instance. Note that the method is declared synchronized for thread safety.
public static synchronized HugeGraph open(Configuration config) {
    HugeConfig conf = config instanceof HugeConfig ?
                      (HugeConfig) config : new HugeConfig(config);
    String name = conf.get(CoreOptions.STORE);
    checkGraphName(name, "graph config(like hugegraph.properties)");
    name = name.toLowerCase();
    HugeGraph graph = graphs.get(name);
    if (graph == null || graph.closed()) {
        graph = new StandardHugeGraph(conf);
        graphs.put(name, graph);
    } else {
        String backend = conf.get(CoreOptions.BACKEND);
        E.checkState(backend.equalsIgnoreCase(graph.backend()),
                     "Graph name '%s' has been used by backend '%s'",
                     name, graph.backend());
    }
    return graph;
}
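The caching behavior of open() can be reduced to a few lines: graphs are cached by lower-cased name, a closed or missing graph is recreated, and reusing a name with a different backend is rejected. The sketch below mirrors only that logic; GraphFactorySketch and its Graph class are hypothetical simplifications that take a name and a backend string instead of a configuration object.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical reduction of the factory logic; not the real HugeFactory.
class GraphFactorySketch {

    /** Minimal graph handle that only remembers its name and backend. */
    static final class Graph {
        final String name;
        final String backend;
        boolean closed;

        Graph(String name, String backend) {
            this.name = name;
            this.backend = backend;
        }
    }

    private static final Map<String, Graph> GRAPHS = new HashMap<>();

    /** synchronized so that two threads cannot create the same graph twice. */
    static synchronized Graph open(String name, String backend) {
        String key = name.toLowerCase(Locale.ROOT);
        Graph graph = GRAPHS.get(key);
        if (graph == null || graph.closed) {
            graph = new Graph(key, backend);
            GRAPHS.put(key, graph);
        } else if (!backend.equalsIgnoreCase(graph.backend)) {
            throw new IllegalStateException(String.format(
                      "Graph name '%s' has been used by backend '%s'",
                      key, graph.backend));
        }
        return graph;
    }

    public static void main(String[] args) {
        Graph first = open("Demo", "rocksdb");
        Graph second = open("demo", "RocksDB");
        System.out.println(first == second); // prints "true"
    }
}
```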
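As a side note on the configuration file: as the code shows, hugegraph.properties is read by default.
1.3 The HugeGraph object
HugeGraph is an interface that extends the TinkerPop (Gremlin) Graph interface and defines API methods for schema definition, data storage, queries, and so on. As 1.2 shows, the default implementation is StandardHugeGraph.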
public interface HugeGraph extends Graph {
    public HugeGraph hugegraph();
    public SchemaManager schema();
    public Id getNextId(HugeType type);
    public void addPropertyKey(PropertyKey key);
    public void removePropertyKey(Id key);
    public Collection<PropertyKey> propertyKeys();
    public PropertyKey propertyKey(String key);
    public PropertyKey propertyKey(Id key);
    public boolean existsPropertyKey(String key);
    ...
1.4 graph.clearBackend and initBackend
clearBackend clears the data in the backend, and initBackend initializes the basic data structures.
2. loadSchema
This method defines the schema:
public static void loadSchema(final HugeGraph graph) {
    SchemaManager schema = graph.schema();
    // Schema changes will be commit directly into the back-end
    LOG.info("=============== propertyKey ================");
    schema.propertyKey("id").asInt().create();
    schema.propertyKey("name").asText().create();
    schema.propertyKey("gender").asText().create();
    schema.propertyKey("instructions").asText().create();
    schema.propertyKey("category").asText().create();
    schema.propertyKey("year").asInt().create();
    schema.propertyKey("time").asText().create();
    schema.propertyKey("timestamp").asDate().create();
    schema.propertyKey("ISBN").asText().create();
    schema.propertyKey("calories").asInt().create();
    schema.propertyKey("amount").asText().create();
    schema.propertyKey("stars").asInt().create();
    schema.propertyKey("age").asInt().valueSingle().create();
    schema.propertyKey("comment").asText().valueSet().create();
    schema.propertyKey("contribution").asText().valueSet().create();
    schema.propertyKey("nickname").asText().valueList().create();
    schema.propertyKey("lived").asText().create();
    schema.propertyKey("country").asText().valueSet().create();
    schema.propertyKey("city").asText().create();
    schema.propertyKey("sensor_id").asUUID().create();
    schema.propertyKey("versions").asInt().valueList().create();
    LOG.info("=============== vertexLabel ================");
    schema.vertexLabel("person")
          .properties("name", "age", "city")
          .primaryKeys("name")
          .create();
    schema.vertexLabel("author")
          .properties("id", "name", "age", "lived")
          .primaryKeys("id").create();
    schema.vertexLabel("language").properties("name", "versions")
          .primaryKeys("name").create();
    schema.vertexLabel("recipe").properties("name", "instructions")
          .primaryKeys("name").create();
    schema.vertexLabel("book").properties("name")
          .primaryKeys("name").create();
    schema.vertexLabel("reviewer").properties("name", "timestamp")
          .primaryKeys("name").create();
    // vertex label must have the properties that specified in primary key
    schema.vertexLabel("FridgeSensor").properties("city")
          .primaryKeys("city").create();
    LOG.info("=============== vertexLabel & index ================");
    schema.indexLabel("personByCity")
          .onV("person").secondary().by("city").create();
    schema.indexLabel("personByAge")
          .onV("person").range().by("age").create();
    schema.indexLabel("authorByLived")
          .onV("author").search().by("lived").create();
    // schemaManager.getVertexLabel("author").index("byName").secondary().by("name").add();
    // schemaManager.getVertexLabel("recipe").index("byRecipe").materialized().by("name").add();
    // schemaManager.getVertexLabel("meal").index("byMeal").materialized().by("name").add();
    // schemaManager.getVertexLabel("ingredient").index("byIngredient").materialized().by("name").add();
    // schemaManager.getVertexLabel("reviewer").index("byReviewer").materialized().by("name").add();
    LOG.info("=============== edgeLabel ================");
    schema.edgeLabel("authored").singleTime()
          .sourceLabel("author").targetLabel("book")
          .properties("contribution", "comment")
          .nullableKeys("comment")
          .create();
    schema.edgeLabel("write").multiTimes().properties("time")
          .sourceLabel("author").targetLabel("book")
          .sortKeys("time")
          .create();
    schema.edgeLabel("look").multiTimes().properties("timestamp")
          .sourceLabel("person").targetLabel("book")
          .sortKeys("timestamp")
          .create();
    schema.edgeLabel("created").singleTime()
          .sourceLabel("author").targetLabel("language")
          .create();
    schema.edgeLabel("rated")
          .sourceLabel("reviewer").targetLabel("recipe")
          .create();
}
Key points:
- SchemaManager schema = graph.schema() obtains the SchemaManager
- schema.propertyKey(NAME).asXXType().create() creates a property key
- schema.vertexLabel("person") // define the concept
      .properties("name", "age", "city") // define the properties of the concept
      .primaryKeys("name") // define the primary keys; together they uniquely identify an entity
      .create();
- schema.indexLabel("personByCity").onV("person").secondary().by("city").create(); defines an index
- schema.edgeLabel("authored").singleTime()
      .sourceLabel("author").targetLabel("book")
      .properties("contribution", "comment")
      .nullableKeys("comment")
      .create(); // define the relation
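Two rules from the schema code can be illustrated in isolation: a vertex label must list every primary key among its properties, and the primary-key values identify the entity. The query example later in the article builds vertex IDs with String.format("%s:%s", labelId, value), and the sketch below follows that pattern. VertexLabelSketch is a hypothetical class, the numeric label ID is made up, and joining several primary-key values with "!" is an assumption, since the source only shows the single-key case.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical vertex label model; not HugeGraph's VertexLabel class.
class VertexLabelSketch {

    final int id;
    final String name;
    final List<String> properties;
    final List<String> primaryKeys;

    VertexLabelSketch(int id, String name,
                      List<String> properties, List<String> primaryKeys) {
        // A vertex label must contain the properties named as primary keys
        if (!properties.containsAll(primaryKeys)) {
            throw new IllegalArgumentException(
                      "Primary keys must be among the properties: " + primaryKeys);
        }
        this.id = id;
        this.name = name;
        this.properties = properties;
        this.primaryKeys = primaryKeys;
    }

    /** Builds "<labelId>:<primary values>"; the "!" separator is an assumption. */
    String vertexId(Map<String, Object> values) {
        StringBuilder sb = new StringBuilder().append(this.id).append(':');
        for (int i = 0; i < this.primaryKeys.size(); i++) {
            Object value = values.get(this.primaryKeys.get(i));
            if (value == null) {
                throw new IllegalArgumentException(
                          "Missing primary key value: " + this.primaryKeys.get(i));
            }
            if (i > 0) {
                sb.append('!');
            }
            sb.append(value);
        }
        return sb.toString();
    }

    static String demo() {
        VertexLabelSketch person = new VertexLabelSketch(
                1, "person",
                Arrays.asList("name", "age", "city"),
                Arrays.asList("name"));
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("name", "marko");
        values.put("age", 29);
        return person.vertexId(values);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1:marko"
    }
}
```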
3. loadData
Create an entity. Note the format: keys and values must appear in pairs:
graph.addVertex(T.label, "book", "name", "java-3");
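The key-value pairing convention of addVertex can be sketched as a small varargs parser. KeyValueSketch and toProperties are hypothetical names, and the plain string "label" stands in for TinkerPop's T.label token so that the example has no external dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical varargs parser; TinkerPop's own validation is more thorough.
class KeyValueSketch {

    /** Turns key1, value1, key2, value2, ... into an ordered map. */
    static Map<String, Object> toProperties(Object... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException(
                      "Keys and values must come in pairs");
        }
        Map<String, Object> properties = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            properties.put(String.valueOf(keyValues[i]), keyValues[i + 1]);
        }
        return properties;
    }

    public static void main(String[] args) {
        // prints "{label=book, name=java-3}"
        System.out.println(toProperties("label", "book", "name", "java-3"));
    }
}
```

An odd number of arguments is rejected up front, which is the failure mode the pairing rule warns about.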
Create relations with the addEdge method of Vertex:
Vertex james = tx.addVertex(T.label, "author", "id", 1,
                            "name", "James Gosling", "age", 62,
                            "lived", "San Francisco Bay Area");
Vertex java = tx.addVertex(T.label, "language", "name", "java",
                           "versions", Arrays.asList(6, 7, 8));
Vertex book1 = tx.addVertex(T.label, "book", "name", "java-1");
Vertex book2 = tx.addVertex(T.label, "book", "name", "java-2");
Vertex book3 = tx.addVertex(T.label, "book", "name", "java-3");
james.addEdge("created", java);
james.addEdge("authored", book1,
              "contribution", "1990-1-1",
              "comment", "it's a good book",
              "comment", "it's a good book",
              "comment", "it's a good book too");
james.addEdge("authored", book2, "contribution", "2017-4-28");
james.addEdge("write", book2, "time", "2017-4-28");
james.addEdge("write", book3, "time", "2016-1-1");
james.addEdge("write", book3, "time", "2017-4-28");
After adding the data, a commit is required.
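The need for a commit can be pictured as buffered writes: additions are held in a pending list and become visible only when commit() moves them into the committed state. This is a generic sketch of that idea and an assumption about the behavior, not a description of how HugeGraph implements transactions; TxSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical write buffer; not HugeGraph's transaction implementation.
class TxSketch {

    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    /** Buffers the vertex; it is not part of the committed data yet. */
    void addVertex(String name) {
        this.pending.add(name);
    }

    /** Makes all buffered additions part of the committed data. */
    void commit() {
        this.committed.addAll(this.pending);
        this.pending.clear();
    }

    /** Discards all buffered additions. */
    void rollback() {
        this.pending.clear();
    }

    int committedSize() {
        return this.committed.size();
    }

    public static void main(String[] args) {
        TxSketch tx = new TxSketch();
        tx.addVertex("java-1");
        System.out.println(tx.committedSize()); // prints "0"
        tx.commit();
        System.out.println(tx.committedSize()); // prints "1"
    }
}
```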
4. testQuery
Queries mainly go through GraphTraversal, which can be obtained via graph.traversal():
public static void testQuery(final HugeGraph graph) {
    // query all
    GraphTraversal<Vertex, Vertex> vertices = graph.traversal().V();
    int size = vertices.toList().size();
    assert size == 12;
    System.out.println(">>>> query all vertices: size=" + size);
    // query by label
    vertices = graph.traversal().V().hasLabel("person");
    size = vertices.toList().size();
    assert size == 5;
    System.out.println(">>>> query all persons: size=" + size);
    // query vertex by primary-values
    vertices = graph.traversal().V().hasLabel("author").has("id", 1);
    List<Vertex> vertexList = vertices.toList();
    assert vertexList.size() == 1;
    System.out.println(">>>> query vertices by primary-values: " +
                       vertexList);
    VertexLabel author = graph.schema().getVertexLabel("author");
    String authorId = String.format("%s:%s", author.id().asString(), "11");
    // query vertex by id and query out edges
    vertices = graph.traversal().V(authorId);
    GraphTraversal<Vertex, Edge> edgesOfVertex = vertices.outE("created");
    List<Edge> edgeList = edgesOfVertex.toList();
    assert edgeList.size() == 1;
    System.out.println(">>>> query edges of vertex: " + edgeList);
    vertices = graph.traversal().V(authorId);
    vertexList = vertices.out("created").toList();
    assert vertexList.size() == 1;
    System.out.println(">>>> query vertices of vertex: " + vertexList);
    // query edge by sort-values
    vertices = graph.traversal().V(authorId);
    edgesOfVertex = vertices.outE("write").has("time", "2017-4-28");
    edgeList = edgesOfVertex.toList();
    assert edgeList.size() == 2;
    System.out.println(">>>> query edges of vertex by sort-values: " +
                       edgeList);
    // query vertex by condition (filter by property name)
    ConditionQuery q = new ConditionQuery(HugeType.VERTEX);
    PropertyKey age = graph.propertyKey("age");
    q.key(HugeKeys.PROPERTIES, age.id());
    if (graph.backendStoreFeatures()
             .supportsQueryWithContainsKey()) {
        Iterator<Vertex> iter = graph.vertices(q);
        assert iter.hasNext();
        System.out.println(">>>> queryVertices(age): " + iter.hasNext());
        while (iter.hasNext()) {
            System.out.println(">>>> queryVertices(age): " + iter.next());
        }
    }
    // query all edges
    GraphTraversal<Edge, Edge> edges = graph.traversal().E().limit(2);
    size = edges.toList().size();
    assert size == 2;
    System.out.println(">>>> query all edges with limit 2: size=" + size);
    // query edge by id
    EdgeLabel authored = graph.edgeLabel("authored");
    VertexLabel book = graph.schema().getVertexLabel("book");
    String book1Id = String.format("%s:%s", book.id().asString(), "java-1");
    String book2Id = String.format("%s:%s", book.id().asString(), "java-2");
    String edgeId = String.format("S%s>%s>%s>S%s",
                                  authorId, authored.id(), "", book2Id);
    edges = graph.traversal().E(edgeId);
    edgeList = edges.toList();
    assert edgeList.size() == 1;
    System.out.println(">>>> query edge by id: " + edgeList);
    Edge edge = edgeList.get(0);
    edges = graph.traversal().E(edge.id());
    edgeList = edges.toList();
    assert edgeList.size() == 1;
    System.out.println(">>>> query edge by id: " + edgeList);
    // query edge by condition
    q = new ConditionQuery(HugeType.EDGE);
    q.eq(HugeKeys.OWNER_VERTEX, IdGenerator.of(authorId));
    q.eq(HugeKeys.DIRECTION, Directions.OUT);
    q.eq(HugeKeys.LABEL, authored.id());
    q.eq(HugeKeys.SORT_VALUES, "");
    q.eq(HugeKeys.OTHER_VERTEX, IdGenerator.of(book1Id));
    Iterator<Edge> edges2 = graph.edges(q);
    assert edges2.hasNext();
    System.out.println(">>>> queryEdges(id-condition): " +
                       edges2.hasNext());
    while (edges2.hasNext()) {
        System.out.println(">>>> queryEdges(id-condition): " +
                           edges2.next());
    }
    // NOTE: query edge by has-key just supported by Cassandra
    if (graph.backendStoreFeatures().supportsQueryWithContainsKey()) {
        PropertyKey contribution = graph.propertyKey("contribution");
        q.key(HugeKeys.PROPERTIES, contribution.id());
        Iterator<Edge> edges3 = graph.edges(q);
        assert edges3.hasNext();
        System.out.println(">>>> queryEdges(contribution): " +
                           edges3.hasNext());
        while (edges3.hasNext()) {
            System.out.println(">>>> queryEdges(contribution): " +
                               edges3.next());
        }
    }
    // query by vertex label
    vertices = graph.traversal().V().hasLabel("book");
    size = vertices.toList().size();
    assert size == 5;
    System.out.println(">>>> query all books: size=" + size);
    // query by vertex label and key-name
    vertices = graph.traversal().V().hasLabel("person").has("age");
    size = vertices.toList().size();
    assert size == 5;
    System.out.println(">>>> query all persons with age: size=" + size);
    // query by vertex props
    vertices = graph.traversal().V().hasLabel("person")
                    .has("city", "Taipei");
    vertexList = vertices.toList();
    assert vertexList.size() == 1;
    System.out.println(">>>> query all persons in Taipei: " + vertexList);
    vertices = graph.traversal().V().hasLabel("person").has("age", 19);
    vertexList = vertices.toList();
    assert vertexList.size() == 1;
    System.out.println(">>>> query all persons age==19: " + vertexList);
    vertices = graph.traversal().V().hasLabel("person")
                    .has("age", P.lt(19));
    vertexList = vertices.toList();
    assert vertexList.size() == 1;
    assert vertexList.get(0).property("age").value().equals(3);
    System.out.println(">>>> query all persons age<19: " + vertexList);
    String addr = "Bay Area";
    vertices = graph.traversal().V().hasLabel("author")
                    .has("lived", Text.contains(addr));
    vertexList = vertices.toList();
    assert vertexList.size() == 1;
    System.out.println(String.format(">>>> query all authors lived %s: %s",
                                     addr, vertexList));
}
Key points
Query entities with a given label:
vertices = graph.traversal().V().hasLabel("person");
size = vertices.toList().size();
Query an entity by its primary-key values:
vertices = graph.traversal().V().hasLabel("author").has("id", 1);
List<Vertex> vertexList = vertices.toList();
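Conceptually, hasLabel and has are successive filters over a stream of vertices. The sketch below imitates that with Java streams over an in-memory list; TraversalSketch and its Vertex class are hypothetical, and a real backend would typically answer such a query through primary keys or indexes rather than by scanning.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stream-based imitation of hasLabel()/has(); not Gremlin itself.
class TraversalSketch {

    static final class Vertex {
        final String label;
        final Map<String, Object> properties = new HashMap<>();

        Vertex(String label, String key, Object value) {
            this.label = label;
            this.properties.put(key, value);
        }
    }

    /** Keeps only the vertices with the given label. */
    static List<Vertex> hasLabel(List<Vertex> vertices, String label) {
        return vertices.stream()
                       .filter(v -> v.label.equals(label))
                       .collect(Collectors.toList());
    }

    /** Keeps only the vertices whose property equals the given value. */
    static List<Vertex> has(List<Vertex> vertices, String key, Object value) {
        return vertices.stream()
                       .filter(v -> value.equals(v.properties.get(key)))
                       .collect(Collectors.toList());
    }

    static int demo() {
        List<Vertex> all = Arrays.asList(
                new Vertex("author", "id", 1),
                new Vertex("author", "id", 2),
                new Vertex("person", "name", "Tom"));
        return has(hasLabel(all, "author"), "id", 1).size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1"
    }
}
```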
Query edges:
Query all edges (limited to 2 here):
GraphTraversal<Edge, Edge> edges = graph.traversal().E().limit(2);
Query an edge by ID:
EdgeLabel authored = graph.edgeLabel("authored");
VertexLabel book = graph.schema().getVertexLabel("book");
String book1Id = String.format("%s:%s", book.id().asString(), "java-1");
String book2Id = String.format("%s:%s", book.id().asString(), "java-2");
String edgeId = String.format("S%s>%s>%s>S%s",
                              authorId, authored.id(), "", book2Id);
edges = graph.traversal().E(edgeId);
Note that an edge ID is the concatenation of several fields: String.format("S%s>%s>%s>S%s", authorId, authored.id(), "", book2Id)
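The concatenation can be wrapped in a small helper to make the four fields explicit: source vertex ID, edge label ID, sort values, and target vertex ID. The third field is read as the sort values because the condition query in the example sets HugeKeys.SORT_VALUES to the same empty string. EdgeIdSketch is a hypothetical helper, and the vertex and label IDs used here are invented values, not IDs from a real graph.

```java
// Hypothetical helper around the edge ID format string used in the example.
class EdgeIdSketch {

    /** Joins the four parts with the same pattern as in the example code. */
    static String edgeId(String sourceVertexId, String edgeLabelId,
                         String sortValues, String targetVertexId) {
        return String.format("S%s>%s>%s>S%s",
                             sourceVertexId, edgeLabelId,
                             sortValues, targetVertexId);
    }

    public static void main(String[] args) {
        // No sort keys: the third field is empty
        System.out.println(edgeId("2:11", "1", "", "5:java-2"));
        // prints "S2:11>1>>S5:java-2"

        // With a sort value, as for the "write" edges sorted by "time"
        System.out.println(edgeId("2:11", "3", "2017-4-28", "5:java-3"));
        // prints "S2:11>3>2017-4-28>S5:java-3"
    }
}
```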
Query edges by condition:
q = new ConditionQuery(HugeType.EDGE);
q.eq(HugeKeys.OWNER_VERTEX, IdGenerator.of(authorId));
q.eq(HugeKeys.DIRECTION, Directions.OUT);
q.eq(HugeKeys.LABEL, authored.id());
q.eq(HugeKeys.SORT_VALUES, "");
q.eq(HugeKeys.OTHER_VERTEX, IdGenerator.of(book1Id));
Iterator<Edge> edges2 = graph.edges(q);
assert edges2.hasNext();
System.out.println(">>>> queryEdges(id-condition): " +
                   edges2.hasNext());
while (edges2.hasNext()) {
    System.out.println(">>>> queryEdges(id-condition): " +
                       edges2.next());
}
The direction of the edge can be specified through HugeKeys.DIRECTION.
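A condition query of this kind is essentially a set of equality constraints that every matching record must satisfy. The following sketch models it as a map of expected values checked against in-memory records. ConditionQuerySketch, the string keys, and the sample edges are all hypothetical; the real ConditionQuery is translated into backend queries rather than evaluated like this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical equality-only query; not HugeGraph's ConditionQuery.
class ConditionQuerySketch {

    private final Map<String, Object> conditions = new LinkedHashMap<>();

    /** Adds the constraint "record[key] equals value". */
    ConditionQuerySketch eq(String key, Object value) {
        this.conditions.put(key, value);
        return this;
    }

    /** A record matches only if it satisfies every constraint. */
    boolean matches(Map<String, Object> record) {
        for (Map.Entry<String, Object> c : this.conditions.entrySet()) {
            if (!Objects.equals(record.get(c.getKey()), c.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Builds one edge record with the fields used by the query. */
    static Map<String, Object> edge(String owner, String direction,
                                    String label, String other) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("OWNER_VERTEX", owner);
        record.put("DIRECTION", direction);
        record.put("LABEL", label);
        record.put("OTHER_VERTEX", other);
        return record;
    }

    static int demo() {
        List<Map<String, Object>> edges = new ArrayList<>();
        edges.add(edge("author-1", "OUT", "authored", "book-1"));
        edges.add(edge("author-1", "OUT", "write", "book-2"));
        edges.add(edge("book-1", "IN", "authored", "author-1"));

        ConditionQuerySketch q = new ConditionQuerySketch()
                .eq("OWNER_VERTEX", "author-1")
                .eq("DIRECTION", "OUT")
                .eq("LABEL", "authored");
        int hits = 0;
        for (Map<String, Object> e : edges) {
            if (q.matches(e)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "1"
    }
}
```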
5. Removal
To remove a Vertex, call the vertex's own remove method:
// remove vertex (and its edges)
List<Vertex> vertices = graph.traversal().V().hasLabel("person")
                             .has("age", 19).toList();
assert vertices.size() == 1;
Vertex james = vertices.get(0);
Vertex book6 = graph.addVertex(T.label, "book", "name", "java-6");
james.addEdge("look", book6, "timestamp", "2017-5-2 12:00:08.0");
james.addEdge("look", book6, "timestamp", "2017-5-3 12:00:08.0");
graph.tx().commit();
assert graph.traversal().V(book6.id()).bothE().hasNext();
System.out.println(">>>> removing vertex: " + james);
james.remove();
graph.tx().commit();
assert !graph.traversal().V(james.id()).hasNext();
assert !graph.traversal().V(book6.id()).bothE().hasNext();
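The assertions above show that removing a vertex also removes the edges attached to it: after james is removed, book6 has no edges left. A minimal in-memory sketch of that cascade follows; CascadeRemoveSketch and its methods are hypothetical and only illustrate the observable behavior.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory graph showing cascade removal of incident edges.
class CascadeRemoveSketch {

    static final class Edge {
        final String source;
        final String target;

        Edge(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    private final Set<String> vertices = new HashSet<>();
    private final List<Edge> edges = new ArrayList<>();

    void addVertex(String id) {
        this.vertices.add(id);
    }

    void addEdge(String source, String target) {
        this.edges.add(new Edge(source, target));
    }

    /** Removes the vertex together with every edge that touches it. */
    void removeVertex(String id) {
        this.vertices.remove(id);
        this.edges.removeIf(e -> e.source.equals(id) || e.target.equals(id));
    }

    boolean hasVertex(String id) {
        return this.vertices.contains(id);
    }

    /** Counts the edges going into or out of the vertex. */
    long degree(String id) {
        return this.edges.stream()
                         .filter(e -> e.source.equals(id) || e.target.equals(id))
                         .count();
    }

    public static void main(String[] args) {
        CascadeRemoveSketch graph = new CascadeRemoveSketch();
        graph.addVertex("james");
        graph.addVertex("book6");
        graph.addEdge("james", "book6");
        graph.addEdge("james", "book6");
        graph.removeVertex("james");
        System.out.println(graph.degree("book6")); // prints "0"
    }
}
```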
Removing a relation works similarly:
// remove edge
VertexLabel author = graph.schema().getVertexLabel("author");
String authorId = String.format("%s:%s", author.id().asString(), "11");
EdgeLabel authored = graph.edgeLabel("authored");
VertexLabel book = graph.schema().getVertexLabel("book");
String book2Id = String.format("%s:%s", book.id().asString(), "java-2");
String edgeId = String.format("S%s>%s>%s>S%s",
                              authorId, authored.id(), "", book2Id);
List<Edge> edges = graph.traversal().E(edgeId).toList();
assert edges.size() == 1;
Edge edge = edges.get(0);
System.out.println(">>>> removing edge: " + edge);
edge.remove();
graph.tx().commit();
assert !graph.traversal().E(edgeId).hasNext();
Summary
This article gave a first introduction to HugeGraph's design philosophy and basic usage.
Author: Jadepeng
Source: jqpeng的技術記事本--http://www.cnblogs.com/xiaoqi
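Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed prominently on the page; otherwise the author reserves the right to pursue legal liability.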
