Zeppelin is a web-based notebook editor for Spark, comparable to the IPython notebook.
I. Installing Zeppelin
(This assumes Spark is already installed.)
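1 Download Zeppelin from https://zeppelin.apache.org/download.html (choose the prebuilt binary package).
2 Extract the archive and start Zeppelin: sh bin/zeppelin-daemon.sh start
3 Permission issues: make the user that runs Zeppelin (mapr here) the owner of the directory: chown -R -v mapr:mapr zeppelin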
4 Exception: Jackson version conflict
4.1 Error:
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"5","name":"textFile"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
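4.2 Cause: conflicting Jackson versions. Check the pom file of your Spark build to see which Jackson version it depends on, and download jars of that version: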
<fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
So Spark depends on Jackson 2.4.4, while Zeppelin loads the 2.5.x jars from its own lib directory:
[mapr@apm1 zeppelin-0.6.0-bin-netinst]$ find . | grep jackson
./lib/jackson-annotations-2.5.0.jar
./lib/jackson-core-2.5.3.jar
./lib/jackson-databind-2.5.3.jar
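To spot such mismatches without reading the listing by eye, the Scala sketch below compares jar file names against the version Spark expects (2.4.4 from the pom above). The JacksonCheck object, the mismatched helper and the file-name pattern are hypothetical illustrations, not part of Zeppelin or Spark. The helper only inspects names and does not open the jars.

```scala
// Hypothetical helper, not part of Zeppelin or Spark: compares Jackson jar
// file names against the version Spark was built with. It only looks at the
// names; it does not open the jars.
object JacksonCheck {
  // Matches e.g. "./lib/jackson-databind-2.5.3.jar" -> ("jackson-databind", "2.5.3")
  private val JarName = """.*?(jackson-[a-z]+)-(\d+\.\d+\.\d+)\.jar""".r

  /** Returns (artifact, version) for every Jackson jar whose version is not `expected`. */
  def mismatched(paths: Seq[String], expected: String): Seq[(String, String)] =
    paths.collect {
      case JarName(artifact, version) if version != expected => (artifact, version)
    }

  def main(args: Array[String]): Unit = {
    val expected = "2.4.4" // taken from Spark's pom: fasterxml.jackson.version
    // The `find . | grep jackson` listing shown above
    val found = Seq(
      "./lib/jackson-annotations-2.5.0.jar",
      "./lib/jackson-core-2.5.3.jar",
      "./lib/jackson-databind-2.5.3.jar"
    )
    mismatched(found, expected).foreach { case (artifact, version) =>
      println(s"$artifact: found $version, expected $expected -> use $artifact-$expected.jar")
    }
  }
}
```

Run against the listing above, all three jars are reported, including jackson-annotations, which is at 2.5.0 rather than 2.5.3.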
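4.3 Solution:
Replace the three jars above with their 2.4.4 versions. Find the following three files among the Maven dependencies and copy them into Zeppelin's lib directory in place of the 2.5.x jars: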
/lib/jackson-annotations-2.4.4.jar
/lib/jackson-databind-2.4.4.jar
/lib/jackson-core-2.4.4.jar
Then restart Zeppelin (sh bin/zeppelin-daemon.sh restart).
5 Open http://localhost:8080/ in a browser, set the default interpreter, and click Save.
II. Using Zeppelin
1 Load the bank.csv dataset
// Read the raw file as an RDD of lines
val bankText = sc.textFile("bank.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

// Split on ';', skip the header row, strip quotes; field 5 is balance
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
).toDF()

// Register the DataFrame so SQL paragraphs can query it as "bank"
bank.registerTempTable("bank")
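The per-line parsing above can be tried without Spark. This is a minimal sketch that assumes the layout implied by that code: a semicolon-separated file, quoted text fields, a header row whose first field is "age", and balance in the sixth column. The BankParse object, the unquote helper and the sample rows are illustrative only. The fields use Int rather than Integer, which is more idiomatic outside Spark.

```scala
// Spark-free sketch of the parsing step; sample rows are illustrative only.
object BankParse {
  case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

  // Remove the double quotes that wrap the text fields
  private def unquote(field: String): String = field.replaceAll("\"", "")

  /** Split on ';', drop the header row, keep age, job, marital, education and balance. */
  def parse(lines: Seq[String]): Seq[Bank] =
    lines
      .map(_.split(";"))
      .filter(s => s(0) != "\"age\"") // the header's first field is "age" in quotes
      .map(s => Bank(s(0).toInt, unquote(s(1)), unquote(s(2)), unquote(s(3)), unquote(s(5)).toInt))

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"",
      "30;\"unemployed\";\"married\";\"primary\";\"no\";1787"
    )
    parse(sample).foreach(println)
  }
}
```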
2 SQL statistics
3 SQL statistics
4 SQL statistics
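The queries for these steps are not preserved here. As an assumed example only, a typical aggregation over the bank table in a %sql paragraph might be: select age, count(1) from bank where age < 30 group by age order by age. The sketch below computes the same result on a plain Scala collection to show what such a query returns. The AgeCounts object, the countByAge helper and the sample rows are hypothetical.

```scala
// Hypothetical example: mirrors
//   SELECT age, count(1) FROM bank WHERE age < 30 GROUP BY age ORDER BY age
// on an in-memory collection instead of the Spark "bank" table.
object AgeCounts {
  case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

  /** Count rows per age for ages below `maxAge`, sorted by age. */
  def countByAge(rows: Seq[Bank], maxAge: Int): Seq[(Int, Int)] =
    rows
      .filter(_.age < maxAge)                          // WHERE age < maxAge
      .groupBy(_.age)                                  // GROUP BY age
      .map { case (age, group) => (age, group.size) }  // count(1)
      .toSeq
      .sortBy(_._1)                                    // ORDER BY age

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Bank(25, "admin.", "single", "secondary", 100),
      Bank(25, "services", "married", "secondary", 200),
      Bank(28, "technician", "single", "tertiary", 300),
      Bank(40, "management", "married", "tertiary", 400)
    )
    countByAge(rows, 30).foreach { case (age, n) => println(s"$age\t$n") }
  }
}
```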