flink 系列13 - Flink 集成 CDH 源码编译


由于 Apache Flink 的开源二进制包未提供 HDP、MapR 和 CDH 的下载,所以,所以要兼容基于这些厂商的库编译 Apache Flink。

1、环境

Jdk 1.8、Maven 3.6.2、Scala-2.12

2、源码和 CDH 版本

Flink 1.10.0、CDH 6.3.1(Hadoop 3.0.0、Spark 2.4.0、Kafka-2.2.1 、Hive-2.1.1 和 HBase-2.1.0)

3、配置maven源

<mirrors>
	<mirror>
            <id>alimaven</id>
            <mirrorOf>central</mirrorOf>
            <name>aliyun maven</name>
            <url>http://maven.aliyun.com/nexus/content/repositories/central/</url>
    </mirror>
    <mirror>
            <id>alimaven</id>
            <name>aliyun maven</name>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
            <id>central</id>
            <name>Maven Repository Switchboard</name>
            <url>http://repo1.maven.org/maven2/</url>
            <mirrorOf>central</mirrorOf>
    </mirror>
    <mirror>
            <id>repo2</id>
            <mirrorOf>central</mirrorOf>
            <name>Human Readable Name for this Mirror.</name>
            <url>http://repo2.maven.org/maven2/</url>
    </mirror>
    <mirror>
            <id>ibiblio</id>
            <mirrorOf>central</mirrorOf>
            <name>Human Readable Name for this Mirror.</name>
            <url>http://mirrors.ibiblio.org/pub/mirrors/maven2/</url>
    </mirror>
    <mirror>
            <id>jboss-public-repository-group</id>
            <mirrorOf>central</mirrorOf>
            <name>JBoss Public Repository Group</name>
            <url>http://repository.jboss.org/nexus/content/groups/public</url>
    </mirror>
    <mirror>
            <id>google-maven-central</id>
            <name>Google Maven Central</name>
            <url>https://maven-central.storage.googleapis.com
            </url>
            <mirrorOf>central</mirrorOf>
    </mirror>
    <!-- 中央仓库在中国的镜像 -->
    <mirror>
            <id>maven.net.cn</id>
            <name>oneof the central mirrors in china</name>
            <url>http://maven.net.cn/content/groups/public/</url>
            <mirrorOf>central</mirrorOf>
    </mirror>

</mirrors>

4、步骤

不同的 Flink 版本使用的 Flink-shaded不同,1.10 版本使用 10.0
https://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-shaded-10.0/flink-shaded-10.0-src.tgz
解压后,在 pom.xml 中,添加如下内容到

<profile>
	<id>vendor-repos</id>
	<activation>
		<property>
			<name>vendor-repos</name>
		</property>
	</activation>
	<!-- Add vendor maven repositories -->
	<repositories>
		<!-- Cloudera -->
		<repository>
			<id>cloudera-releases</id>
			<url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
			<releases>
				<enabled>true</enabled>
			</releases>
			<snapshots>
				<enabled>false</enabled>
			</snapshots>
		</repository>
		<!-- Hortonworks -->
		<repository>
			<id>HDPReleases</id>
			<name>HDP Releases</name>
			<url>https://repo.hortonworks.com/content/repositories/releases/</url>
			<snapshots><enabled>false</enabled></snapshots>
			<releases><enabled>true</enabled></releases>
		</repository>
		<repository>
			<id>HortonworksJettyHadoop</id>
			<name>HDP Jetty</name>
			<url>https://repo.hortonworks.com/content/repositories/jetty-hadoop</url>
			<snapshots><enabled>false</enabled></snapshots>
			<releases><enabled>true</enabled></releases>
		</repository>
		<!-- MapR -->
		<repository>
			<id>mapr-releases</id>
			<url>https://repository.mapr.com/maven/</url>
			<snapshots><enabled>false</enabled></snapshots>
			<releases><enabled>true</enabled></releases>
		</repository>
	</repositories>
</profile>

编译对应的 flink-shaded 版本
$ mvn -T2C clean install -DskipTests -Pvendor-repos -Dhadoop.version=3.0.0-cdh6.3.1 -Dscala-2.12 -Drat.skip=true

https://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.10.0/flink-1.10.0-src.tgz
编译 Flink 源码
$ mvn clean install -DskipTests -Dfast -Drat.skip=true -Dhaoop.version=3.0.0-cdh6.3.1 -Pvendor-repos -Dinclude-hadoop -Dscala-2.12 -T2C

目录地址:
flink-1.10.0/flink-dist/target/flink-1.10.0-bin

注意如果需要与 Hadoop 整合使用,需要提供 hadoop classes 依赖。
将 flink-shaded 编译的 hadoop 相关依赖jar 包(flink-shaded-hadoop-2-uber-2.4.1-9.0.jar)移动至二进制包 lib/ 目录下
参考链接

4.4、测试

./bin/flink run -m yarn-cluster -yjm 1024 -ytm 1024 -ys 2 -p 2 -ynm test_wordcount ./examples/batch/WordCount.jar --input hdfs://cluster_name/tmp/a.txt -yn 2


免责声明!

本站转载的文章为个人学习借鉴使用,本站对版权不负任何法律责任。如果侵犯了您的隐私权益,请联系本站邮箱yoyou2525@163.com删除。



 
粤ICP备18138465号  © 2018-2025 CODEPRJ.COM