Working with Kudu from Spark


 

The Spark integration with Kudu supports:

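  • DDL operations (create/delete tables)

  • Native Kudu RDD

  • Native Kudu data source, for DataFrame integration

  • Reading data from Kudu

  • Performing insert/update/upsert/delete operations against Kudu

  • Predicate pushdown

  • Schema mapping between Kudu and Spark SQL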

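So far we have come across several contexts, such as SparkContext, SQLContext, HiveContext and SparkSession. Working with Kudu introduces one more: the KuduContext. It is the main serializable object that can be broadcast in a Spark application, and it handles the interaction with the Kudu Java client inside the Spark executors.

KuduContext provides the methods needed to perform DDL operations, an interface to the native Kudu RDD, update/insert/delete operations on data, conversion of data types from Kudu to Spark, and so on.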
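The DDL methods mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the master address `kudu-master-1:7051`, the table name `customers_sketch`, its two columns, and the partitioning and replica settings are all assumptions made for the example.

```scala
import org.apache.kudu.client.CreateTableOptions
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import scala.collection.JavaConverters._

object KuduDdlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-ddl-sketch").getOrCreate()

    // Hypothetical master address; replace with your own master list.
    val kuduContext = new KuduContext("kudu-master-1:7051", spark.sparkContext)

    // Hypothetical table name used only for this sketch.
    val tableName = "customers_sketch"

    // The table is described with a Spark SQL schema; KuduContext maps it
    // to Kudu column types. Primary key columns must be non-nullable.
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = true)))

    // Drop the table if it is left over from a previous run.
    if (kuduContext.tableExists(tableName)) {
      kuduContext.deleteTable(tableName)
    }

    // Assumed settings: 3 hash buckets on the key, a single replica
    // (convenient for a test cluster, not a production recommendation).
    val options = new CreateTableOptions()
      .addHashPartitions(List("id").asJava, 3)
      .setNumReplicas(1)

    // Create the table with "id" as the primary key.
    kuduContext.createTable(tableName, schema, Seq("id"), options)

    spark.stop()
  }
}
```

Running it requires a reachable Kudu cluster, for example by packaging the object and launching it with spark-submit.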

A common starting point is creating the KuduContext:

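import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Create a Spark and SQL context (the application name is a placeholder)
val sparkConf = new SparkConf().setAppName("spark-kudu-example")
val sc = new SparkContext(sparkConf)
val sqlContext = new SQLContext(sc)
 
// Comma-separated list of Kudu masters with port numbers
val master1 = "ip-10-13-4-249.ec2.internal:7051"
val master2 = "ip-10-13-5-150.ec2.internal:7051"
val master3 = "ip-10-13-5-56.ec2.internal:7051"
val kuduMasters = Seq(master1, master2, master3).mkString(",")
 
// Create an instance of a KuduContext. With the kudu-spark2 1.6.0 dependency
// used here, pass the existing SparkContext as the second argument.
val kuduContext = new KuduContext(kuduMasters, sc)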
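Once a KuduContext exists, reading and writing can be sketched roughly as below. The example assumes a Kudu table named `customers_sketch` already exists with an INT32 key column `id` and a STRING column `name`, and that none of the inserted keys are present yet. The table name, master address and sample rows are hypothetical.

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object KuduReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-read-write-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical master address and table name.
    val kuduMasters = "kudu-master-1:7051"
    val tableName = "customers_sketch"
    val kuduContext = new KuduContext(kuduMasters, spark.sparkContext)

    // Insert new rows; this assumes the keys do not exist in the table yet.
    val newRows = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
    kuduContext.insertRows(newRows, tableName)

    // Upsert: updates the row with key 2 and inserts the row with key 3.
    val upserts = Seq((2, "robert"), (3, "carol")).toDF("id", "name")
    kuduContext.upsertRows(upserts, tableName)

    // Update an existing row by key.
    val updates = Seq((1, "alicia")).toDF("id", "name")
    kuduContext.updateRows(updates, tableName)

    // Delete by key: the DataFrame carries only the key column.
    val keysToDelete = Seq(3).toDF("id")
    kuduContext.deleteRows(keysToDelete, tableName)

    // Read the table back through the Kudu data source as a DataFrame.
    val df = spark.read
      .format("org.apache.kudu.spark.kudu")
      .options(Map("kudu.master" -> kuduMasters, "kudu.table" -> tableName))
      .load()

    // Column selection and simple comparison filters like this one are
    // candidates for pushdown to Kudu, so less data reaches Spark.
    df.select("id", "name").filter($"id" >= 2).show()

    // The same table exposed as an RDD of Rows, projecting only "id".
    val idRdd = kuduContext.kuduRDD(spark.sparkContext, tableName, Seq("id"))
    println(s"Row count via RDD: ${idRdd.count()}")

    spark.stop()
  }
}
```

The write methods take an ordinary DataFrame whose column names match the Kudu table, so data prepared with Spark SQL can be written without any manual type conversion.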

Maven dependencies

<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>


<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client -->
    <dependency>
        <groupId>org.apache.kudu</groupId>
        <artifactId>kudu-client</artifactId>
        <version>1.6.0-cdh5.14.0</version>
    </dependency>


    <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client-tools -->
    <dependency>
        <groupId>org.apache.kudu</groupId>
        <artifactId>kudu-client-tools</artifactId>
        <version>1.6.0-cdh5.14.0</version>
    </dependency>


    <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-spark2 -->
    <dependency>
        <groupId>org.apache.kudu</groupId>
        <artifactId>kudu-spark2_2.11</artifactId>
        <version>1.6.0-cdh5.14.0</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.0</version>
    </dependency>
</dependencies>

The detailed code is covered in the next chapter.

 

