Using GeoTrellis (5): Working with Accumulo in Scala

Reposted article. Published 2016-05-10 17:00. Tags: HADOOP / SCALA / Geographic Information

To understand how GeoTrellis processes data, you first need to know where the data lives. GeoTrellis, as set up in this series, stores its data in Accumulo.

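Accumulo is a distributed key-value NoSQL database; its official site is https://accumulo.apache.org/. The earlier article on installing a Hadoop cluster with Ambari already covered how to install both the Hadoop cluster and Accumulo.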

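Accumulo partitions data into tables, and every entry is a Key-Value pair. The Key consists of a RowID and a Column, and the Column in turn consists of a Family, a Qualifier, and a Visibility.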
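To make the key layout concrete, here is a minimal sketch that models it with plain Scala case classes. The names `Column`, `Key`, `Entry`, and `KeyModelDemo` are hypothetical and are not Accumulo's own classes. The ordering is a simplification that assumes entries sort by row ID, then family, then qualifier, then visibility; it leaves out the timestamp that real Accumulo keys also carry.

```scala
object KeyModelDemo {
  // Illustrative model only; these are NOT Accumulo's own classes.
  final case class Column(family: String, qualifier: String, visibility: String)
  final case class Key(rowId: String, column: Column)
  final case class Entry(key: Key, value: String)

  // Assumed sort order: row ID, then family, then qualifier, then visibility.
  implicit val keyOrdering: Ordering[Key] =
    Ordering.by((k: Key) => (k.rowId, k.column.family, k.column.qualifier, k.column.visibility))

  // Entries deliberately listed out of order.
  val entries: List[Entry] = List(
    Entry(Key("row2", Column("colf", "colq", "")), "value3"),
    Entry(Key("row1", Column("colf", "colq2", "")), "value2"),
    Entry(Key("row1", Column("colf", "colq1", "")), "value1")
  )

  // Sorting by key yields the order in which a scan would visit the entries.
  val sorted: List[Entry] = entries.sortBy(_.key)

  def main(args: Array[String]): Unit =
    sorted.foreach { e =>
      println(s"${e.key.rowId} ${e.key.column.family}:${e.key.column.qualifier} -> ${e.value}")
    }
}
```

Because the row ID is the leading part of the sort key, all entries of one row sit next to each other, which is what makes row-range scans cheap.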

Enough preamble. First, here is how to work with Accumulo from the accumulo shell.

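1. Enter the accumulo shell console

accumulo shell -u [username]

username is a user that has permission to operate on Accumulo.

2. List all tables

tables

3. Create a table

createtable mytable

4. Delete a table

deletetable mytable

5. Scan a table to view its data

scan

6. Insert data

When inserting, you must be in the context of the table you want to write to.

insert row1 colf colq value1

A new entry is created as long as at least one of rowID, family, and qualifier differs from the existing entries. If all three match an existing entry, the new value replaces the original one.

7. Switch to a table

table mytable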
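The overwrite rule from step 6 can be sketched with an ordinary immutable Map. This is a hypothetical in-memory stand-in, not the Accumulo API: the names `InsertDemo`, `CellKey`, and `insert` are made up for illustration, and the sketch ignores timestamps and visibility.

```scala
object InsertDemo {
  // Hypothetical stand-in for a table: (rowID, family, qualifier) -> value
  type CellKey = (String, String, String)

  // Mimics the shell's `insert row fam qual value` on the in-memory map.
  def insert(table: Map[CellKey, String],
             row: String, fam: String, qual: String, value: String): Map[CellKey, String] =
    table.updated((row, fam, qual), value)

  val t1: Map[CellKey, String] = insert(Map.empty, "row1", "colf", "colq", "value1")
  // The qualifier differs, so this adds a second entry.
  val t2: Map[CellKey, String] = insert(t1, "row1", "colf", "colq2", "value2")
  // rowID, family, and qualifier all match the first entry, so its value is replaced.
  val t3: Map[CellKey, String] = insert(t2, "row1", "colf", "colq", "value9")

  def main(args: Array[String]): Unit =
    t3.foreach { case ((row, fam, qual), value) => println(s"$row $fam:$qual -> $value") }
}
```

After the three inserts the map holds two entries, and the one at `row1 colf:colq` carries the later value.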

Next, here is how to work with Accumulo from Scala. It is fairly simple, so the full code comes first.

import org.apache.accumulo.core.client.{BatchWriter, BatchWriterConfig, Connector, ZooKeeperInstance}
import org.apache.accumulo.core.client.security.tokens.PasswordToken
import org.apache.accumulo.core.data.{Mutation, Value}
import org.apache.accumulo.core.security.Authorizations
import org.apache.hadoop.io.Text

object Main {

  val token = new PasswordToken("pass")
  val user = "root"
  val instanceName = "hdp-accumulo-instance"
  val zooServers = "zooserver"
  val table = "table"

  def main(args: Array[String]): Unit = {
    // write()
    read()
  }

  // Scan rows "row1" through "row2" and print every entry found.
  def read(): Unit = {
    val conn = getConn
    val auths = Authorizations.EMPTY // or: new Authorizations("Valid")
    val scanner = conn.createScanner(table, auths)

    // Start row and end row, i.e. row IDs. Fully qualified to avoid a clash with scala.Range.
    val range = new org.apache.accumulo.core.data.Range("row1", "row2")
    scanner.setRange(range)
    // scanner.fetchColumnFamily(new Text("myColFam")) // optionally restrict the scan to one family
    try {
      val iter = scanner.iterator()
      while (iter.hasNext) {
        val item = iter.next()
        // Data lives in a table as Key-Value pairs; the Key holds the RowID and the Column,
        // and the Column holds Family, Qualifier, and Visibility.
        println(s"key  row:${item.getKey.getRow} fam:${item.getKey.getColumnFamily} qua:${item.getKey.getColumnQualifier} value:${item.getValue}")
      }
    } finally scanner.close()
  }

  // Write a single mutation to the table.
  def write(): Unit = {
    val writer = getWriter
    try writer.addMutation(createMutation)
    finally writer.close() // close() also flushes pending mutations
  }

  // Build one mutation: row "row2", column myColFam:myColQual, value "myValue".
  def createMutation: Mutation = {
    val rowID = new Text("row2")
    val colFam = new Text("myColFam")
    val colQual = new Text("myColQual")
    // val colVis = new ColumnVisibility("public") // visibility is not needed here
    val timestamp = System.currentTimeMillis
    val value = new Value("myValue".getBytes)
    val mutation = new Mutation(rowID)
    mutation.put(colFam, colQual, timestamp, value)
    mutation
  }

  // Connect through ZooKeeper using the user and token defined above.
  def getConn: Connector =
    new ZooKeeperInstance(instanceName, zooServers).getConnector(user, token)

  def getWriter: BatchWriter = {
    val config = new BatchWriterConfig
    config.setMaxMemory(10000000L) // maximum memory, in bytes, used to buffer mutations
    getConn.createBatchWriter(table, config)
  }
}

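The code above implements reading from and writing to Accumulo. zooServers is the address of the master node of the installed ZooKeeper, and instanceName is the name of the Accumulo instance. The Range in read performs a range lookup; its arguments are the start and end RowIDs. Because Accumulo keeps its data sorted automatically, every entry whose RowID falls within that range is returned. The rest of the code should be self-explanatory (in my opinion, haha), so I won't go through it here.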
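Why sorting makes the range lookup work can be sketched with a sorted map from the Scala standard library. This is a hypothetical in-memory stand-in rather than Accumulo itself: `RangeScanDemo`, `table`, and `scan` are made-up names, and the sketch assumes both bounds are inclusive, which is how the two-argument `Range(startRow, endRow)` constructor behaves.

```scala
import scala.collection.immutable.TreeMap

object RangeScanDemo {
  // Hypothetical stand-in for a table: row ID -> value, kept sorted by row ID.
  val table: TreeMap[String, String] = TreeMap(
    "row3" -> "c",
    "row1" -> "a",
    "row2" -> "b",
    "row0" -> "z"
  )

  // Seek to startRow, then read forward until a row ID sorts after endRow.
  // Both bounds are treated as inclusive.
  def scan(startRow: String, endRow: String): List[(String, String)] =
    table.iteratorFrom(startRow).takeWhile(_._1.compareTo(endRow) <= 0).toList

  def main(args: Array[String]): Unit =
    scan("row1", "row2").foreach { case (row, value) => println(s"$row -> $value") }
}
```

Since the rows are already in order, the scan only needs the two bounds: it jumps to the start row and stops at the first row past the end row, without looking at the rest of the table.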

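This article gave a brief introduction to working with Accumulo, purely as groundwork for understanding how GeoTrellis works and for reading the GeoTrellis source code. If you happen to need to store data on a cluster, Accumulo may be worth a try.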

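Reference links

1. Using GeoTrellis: a first look
2. Using GeoTrellis (2): geotrellis-chatta-demo and a first look at how the GeoTrellis framework reads data
3. Using GeoTrellis (3): analysis of the GeoTrellis data processing workflow
4. Using GeoTrellis (4): some details of GeoTrellis data processing
5. Using GeoTrellis (5): working with Accumulo in Scala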
