A failed attempt at writing an HDFS Java client (with Eclipse on Linux)


Installing a graphical desktop on CentOS

GNOME desktop environment:
https://blog.csdn.net/wh211212/article/details/52937299


Setting up Eclipse on Linux for Hadoop development

File -> Properties -> Java Build Path -> Add Library -> User Library

Click the User Libraries button -> New, and name the new library hdfslib.

With hdfslib selected, click Add External JARs and, under the Hadoop installation directory, locate:

  1. The core HDFS jar: hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-3.0.0.jar, then OK. Plus its dependencies: everything under hadoop-3.0.0/share/hadoop/hdfs/lib, select all, then OK.
  2. The core Commons jar: hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar, then OK. Plus its dependencies: everything under hadoop-3.0.0/share/hadoop/common/lib, select all, then OK.

Then create a project in Eclipse, create a new class, and copy the two configuration files core-site.xml and hdfs-site.xml into src (a minimal sketch of the former follows below).
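
(For reference, a minimal sketch of what that core-site.xml might contain. The fs.defaultFS value is an assumption taken from the hdfs://node01:9000 URI that appears in the code and logs later in this post:)

<configuration>
  <property>
    <!-- assumed NameNode address, copied from the hdfs://node01:9000 URI used below -->
    <name>fs.defaultFS</name>
    <value>hdfs://node01:9000</value>
  </property>
</configuration>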


The code

Write a piece of code that performs a download: open an HDFS input stream, open a local output stream, and copy the input stream into the output stream; that completes the download.

FileSystem (org.apache.hadoop.fs): the core entry class for accessing HDFS. It is abstract; call the static get() method to obtain an instance, and that instance is an HDFS client.
Configuration (org.apache.hadoop.conf): reads the configuration files and holds their key-value pairs.

package cn.hadoop.hdfs;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUtil {
	
	public static void main(String[] args) throws IOException {
		
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(conf);
		Path src = new Path("hdfs://node01:9000/jdk-8u161-linux-x64.tar.gz");
		// open the HDFS input stream and the local output stream;
		// try-with-resources closes both once the copy completes
		try (FSDataInputStream inputStream = fs.open(src);
				FileOutputStream outputStream = new FileOutputStream("/home/app/download/jdk.tgz")) {
			// copy the input stream into the output stream to perform the download
			IOUtils.copy(inputStream, outputStream);
		}
	}
	
}

Runtime error 1

hadoop 3.0.0 No FileSystem for scheme "hdfs"

Following: https://blog.csdn.net/u013281331/article/details/17992077

Add the following to core-site.xml:

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>

(Or, equivalently, add one line in code: conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());)

Runtime error 2

Class org.apache.hadoop.hdfs.DistributedFileSystem not found

Following: https://stackoverflow.com/questions/34410376/no-filesystem-for-schemehdfs-and-class-org-apache-hadoop-distributedfilesystem

After downloading hadoop-core-1.2.1.jar from the official site and importing it, this error went away. (In hindsight this was probably a bad move: hadoop-core-1.2.1 is a Hadoop 1.x artifact, and mixing it into a 3.0.0 classpath looks like a plausible source of the NoSuchMethodError that shows up below.)

Runtime error 3

The previous errors were gone, but then it died again:

Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(Ljava/lang/String;)Ljava/net/InetSocketAddress;

Some posts claim the bundled native Hadoop library is 32-bit and therefore fails on a 64-bit server (probably an issue only in old releases). But checking the native library:

cd hadoop-3.0.0/lib/native
file libhadoop.so.1.0.0

shows it is already 64-bit, so that is not the problem.

Run the following on the command line to print debug-level logging (following: https://blog.csdn.net/znb769525443/article/details/51507283):

export HADOOP_ROOT_LOGGER=DEBUG,console  
hadoop fs -text /test/data/origz/access.log.gz  

It printed a huge pile of output:

2018-03-25 15:35:15,338 DEBUG util.Shell: setsid exited with exit code 0
2018-03-25 15:35:15,379 DEBUG conf.Configuration: parsing URL jar:file:/home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar!/core-default.xml
2018-03-25 15:35:15,393 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@26653222
2018-03-25 15:35:16,054 DEBUG conf.Configuration: parsing URL file:/home/thousfeet/app/hadoop-3.0.0/etc/hadoop/core-site.xml
2018-03-25 15:35:16,054 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@5a8e6209
2018-03-25 15:35:17,071 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
2018-03-25 15:35:17,173 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
2018-03-25 15:35:18,565 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
2018-03-25 15:35:18,570 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
2018-03-25 15:35:18,570 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[GetGroups])
2018-03-25 15:35:18,571 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeLong org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailuresTotal with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Renewal failures since startup])
2018-03-25 15:35:18,571 DEBUG lib.MutableMetricsFactory: field private org.apache.hadoop.metrics2.lib.MutableGaugeInt org.apache.hadoop.security.UserGroupInformation$UgiMetrics.renewalFailures with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Renewal failures since last successful login])
2018-03-25 15:35:18,572 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
2018-03-25 15:35:18,651 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
2018-03-25 15:35:18,738 DEBUG security.Groups: Creating new Groups object
2018-03-25 15:35:18,740 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
2018-03-25 15:35:18,868 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
2018-03-25 15:35:18,869 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
2018-03-25 15:35:18,869 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
2018-03-25 15:35:19,088 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
2018-03-25 15:35:19,157 DEBUG security.UserGroupInformation: hadoop login
2018-03-25 15:35:19,157 DEBUG security.UserGroupInformation: hadoop login commit
2018-03-25 15:35:19,256 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: thousfeet
2018-03-25 15:35:19,256 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: thousfeet" with name thousfeet
2018-03-25 15:35:19,257 DEBUG security.UserGroupInformation: User entry: "thousfeet"
2018-03-25 15:35:19,257 DEBUG security.UserGroupInformation: UGI loginUser:thousfeet (auth:SIMPLE)
2018-03-25 15:35:19,258 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
2018-03-25 15:35:19,258 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
2018-03-25 15:35:19,258 DEBUG fs.FileSystem: Loading filesystems
2018-03-25 15:35:19,304 DEBUG fs.FileSystem: file:// = class org.apache.hadoop.fs.LocalFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,339 DEBUG fs.FileSystem: viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,343 DEBUG fs.FileSystem: ftp:// = class org.apache.hadoop.fs.ftp.FTPFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,349 DEBUG fs.FileSystem: har:// = class org.apache.hadoop.fs.HarFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,368 DEBUG fs.FileSystem: http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,395 DEBUG fs.FileSystem: https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar
2018-03-25 15:35:19,450 DEBUG fs.FileSystem: hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-client-3.0.0.jar
2018-03-25 15:35:20,568 DEBUG fs.FileSystem: webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-client-3.0.0.jar
2018-03-25 15:35:20,569 DEBUG fs.FileSystem: swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /home/thousfeet/app/hadoop-3.0.0/share/hadoop/hdfs/hadoop-hdfs-client-3.0.0.jar
2018-03-25 15:35:20,569 DEBUG fs.FileSystem: Looking for FS supporting hdfs
2018-03-25 15:35:20,569 DEBUG fs.FileSystem: looking for configuration option fs.hdfs.impl
2018-03-25 15:35:20,654 DEBUG fs.FileSystem: Looking in service filesystems for implementation class
2018-03-25 15:35:20,655 DEBUG fs.FileSystem: FS for hdfs is class org.apache.hadoop.hdfs.DistributedFileSystem
2018-03-25 15:35:20,760 DEBUG impl.DfsClientConf: dfs.client.use.legacy.blockreader.local = false
2018-03-25 15:35:20,761 DEBUG impl.DfsClientConf: dfs.client.read.shortcircuit = false
2018-03-25 15:35:20,761 DEBUG impl.DfsClientConf: dfs.client.domain.socket.data.traffic = false
2018-03-25 15:35:20,761 DEBUG impl.DfsClientConf: dfs.domain.socket.path =
2018-03-25 15:35:20,823 DEBUG hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.min-replication to 0
2018-03-25 15:35:20,928 DEBUG retry.RetryUtils: multipleLinearRandomRetry = null
2018-03-25 15:35:20,989 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcProtobufRequest, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@272113c4
2018-03-25 15:35:21,060 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@2794eab6
2018-03-25 15:35:23,045 DEBUG unix.DomainSocketWatcher: org.apache.hadoop.net.unix.DomainSocketWatcher$2@4b4bda67: starting with interruptCheckPeriodMs = 60000
2018-03-25 15:35:23,115 DEBUG util.PerformanceAdvisory: Both short-circuit local reads and UNIX domain socket are disabled.
2018-03-25 15:35:23,123 DEBUG sasl.DataTransferSaslUtil: DataTransferProtocol not using SaslPropertiesResolver, no QOP found in configuration for dfs.data.transfer.protection
2018-03-25 15:35:23,329 DEBUG ipc.Client: The ping interval is 60000 ms.
2018-03-25 15:35:23,331 DEBUG ipc.Client: Connecting to node01/192.168.216.100:9000
2018-03-25 15:35:24,267 DEBUG ipc.Client: IPC Client (2146338580) connection to node01/192.168.216.100:9000 from thousfeet: starting, having connections 1
2018-03-25 15:35:24,381 DEBUG ipc.Client: IPC Client (2146338580) connection to node01/192.168.216.100:9000 from thousfeet sending #0 org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo
2018-03-25 15:35:24,427 DEBUG ipc.Client: IPC Client (2146338580) connection to node01/192.168.216.100:9000 from thousfeet got value #0
2018-03-25 15:35:24,427 DEBUG ipc.ProtobufRpcEngine: Call: getFileInfo took 1193ms
text: `/text/data/origz/access.log.gz': No such file or directory
2018-03-25 15:35:24,575 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@2794eab6
2018-03-25 15:35:24,576 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@2794eab6
2018-03-25 15:35:24,576 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@2794eab6
2018-03-25 15:35:24,576 DEBUG ipc.Client: Stopping client
2018-03-25 15:35:24,598 DEBUG ipc.Client: IPC Client (2146338580) connection to node01/192.168.216.100:9000 from thousfeet: closed
2018-03-25 15:35:24,598 DEBUG ipc.Client: IPC Client (2146338580) connection to node01/192.168.216.100:9000 from thousfeet: stopped, remaining connections 0
2018-03-25 15:35:24,699 DEBUG util.ShutdownHookManager: ShutdownHookManger complete shutdown.

The problem seemed to be this line:

Both short-circuit local reads and UNIX domain socket are disabled.

I went through a pile of blog tutorials, all wrong, and finally went straight to the official docs: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html

Add the following to the hdfs-site.xml configuration:

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.use.legacy.blockreader.local</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>750</value>
  </property>
  <property>
    <name>dfs.block.local-path-access.user</name>
    <value>foo,bar</value>
  </property>
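
(An aside, in hindsight: the official short-circuit-reads page also calls for dfs.domain.socket.path to be set and for libhadoop to be loadable, and the debug log above shows "dfs.domain.socket.path =" empty. Presumably something like the following, using the example path from the official docs, would be needed as well:)

  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>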

Still no luck.

Following https://www.cnblogs.com/dandingyy/archive/2013/03/08/2950696.html, I added -Djava.library.path=/home/thousfeet/app/hadoop-3.0.0/lib/native under Run Configurations -> Arguments -> VM arguments.

Runtime error 4

The first line of the earlier error (the "Unable to load native-hadoop library" warning) is gone... but the second line is still there:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(Ljava/lang/String;)Ljava/net/InetSocketAddress;

Following https://blog.csdn.net/yzl_8877/article/details/53216923, I added:

		// check where the class loader finds this NameNode class
		java.net.URL urlOfClass = NameNode.class.getClassLoader().getResource("com/sun/jndi/dns/NameNode.class");
		System.out.println(urlOfClass);

which printed:

jar:file:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/lib/rt.jar!/com/sun/jndi/dns/NameNode.class

But I can hardly remove rt.jar! I was even half tempted to patch rt.jar and strip the getAddress() method out of that class, but touching the standard JRE API would blow up in unknowable places later.
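
(In hindsight, that check may be misleading: the resource path com/sun/jndi/dns/NameNode.class is hard-coded, so it resolves to rt.jar no matter which NameNode class the code actually compiled against. A sketch of a more direct check, a hypothetical I did not run at the time, asks the Hadoop class itself where it was loaded from:)

		// hypothetical check: which jar did the Hadoop NameNode class come from?
		Class<?> nn = org.apache.hadoop.hdfs.server.namenode.NameNode.class;
		System.out.println(nn.getProtectionDomain().getCodeSource().getLocation());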

This is where I hit a wall; it is still unsolved.

My guess is that it would be best to just configure the dependencies directly with Maven (?)... I plan to wrestle with it for another half day; if it still isn't solved, so be it, since nothing in my current work needs this yet. (A rough sketch of the Maven route is below.)
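
(For the record, a minimal, hedged sketch of that Maven route: declaring the single official hadoop-client artifact at the cluster's version should pull in a mutually consistent set of client jars and avoid hand-mixing jars like hadoop-core-1.2.1. The version below simply matches the 3.0.0 install used throughout this post:)

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>3.0.0</version>
</dependency>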

