HDFS, short for Hadoop Distributed File System, is one implementation of Hadoop's abstract file system. The abstract file system can also be backed by the local file system, Amazon S3, and others, and can even be driven over HTTP via WebHDFS. HDFS spreads a file's blocks across the machines of a cluster and keeps replicas of each block for fault tolerance and reliability. Client reads and writes go directly to the machines holding the blocks, so no single node becomes a performance bottleneck.
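As an aside, the WebHDFS interface mentioned above exposes the same file system operations over plain HTTP. A minimal sketch, assuming a NameNode web UI reachable at the hypothetical host `hadoop000` on the Hadoop 2.x default port 50070:

```shell
# list the root directory over WebHDFS (equivalent to: hadoop fs -ls /)
curl -i "http://hadoop000:50070/webhdfs/v1/?op=LISTSTATUS"
```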
Setting up HDFS itself is covered in an earlier post; today we focus on operating HDFS through the Java API and the command line.
To use the HDFS API from Java, first configure the Maven repository and pull in the client dependency:
```xml
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </repository>
</repositories>

<!-- the client dependency -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```
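`${hadoop.version}` is a Maven property the POM must define somewhere. A minimal sketch, assuming the CDH 5.15.1 release that matches the tarball used later in this post:

```xml
<properties>
  <!-- assumed version; match whatever your cluster runs -->
  <hadoop.version>2.6.0-cdh5.15.1</hadoop.version>
</properties>
```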
Example: create a directory through the API
```java
this.configuration = new Configuration();
// HDFS_PATH is the NameNode URI, e.g. "hdfs://host:8020"; "hadoop" is the user to act as
this.fileSystem = FileSystem.get(new URI(this.HDFS_PATH), configuration, "hadoop");

Path path = new Path("/hdfsapi/test");
boolean result = fileSystem.mkdirs(path);
```
Read a file through the API and write it back to a local file
```java
Path path = new Path("/gwyy.txt");
FSDataInputStream fsDataInputStream = fileSystem.open(path);
FileOutputStream fileOutputStream = new FileOutputStream(new File("a.txt"));

byte[] buffer = new byte[1024];
int length;
StringBuilder sb = new StringBuilder();
// use the number of bytes actually read, not buffer.length,
// otherwise the last chunk carries stale bytes
while ((length = fsDataInputStream.read(buffer)) != -1) {
    sb.append(new String(buffer, 0, length));
    fileOutputStream.write(buffer, 0, length);
}
System.out.println(sb.toString());
```
Create a file on HDFS and write content to it
```java
FSDataOutputStream out = fileSystem.create(new Path("/fuck.txt"));
out.writeUTF("aaabbb");
out.flush();
out.close();
```
Rename an HDFS file
```java
boolean a = fileSystem.rename(new Path("/fuck.txt"), new Path("/fuck.aaa"));
System.out.println(a);
```
Copy a local file to HDFS
```java
fileSystem.copyFromLocalFile(new Path("a.txt"), new Path("/copy_a.txt"));
```
Upload a large file to HDFS with a progress indicator
```java
InputStream in = new BufferedInputStream(
        new FileInputStream(new File("hive-1.1.0-cdh5.15.1.tar.gz")));
Path dst = new Path("/hive.tar.gz");

// print a dot on each progress callback, as a simple progress indicator
FSDataOutputStream out = fileSystem.create(dst, new Progressable() {
    @Override
    public void progress() {
        System.out.print('.');
        System.out.flush();
    }
});

byte[] buffer = new byte[4096];
int length;
// copy to HDFS, writing only the bytes actually read
while ((length = in.read(buffer, 0, buffer.length)) != -1) {
    out.write(buffer, 0, length);
}
// close the streams, or the last block may never be flushed
out.close();
in.close();
```
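Hadoop also ships a helper that replaces the manual copy loop. A minimal sketch, assuming the same `in` stream and `out` stream opened above (the trailing `true` closes both streams when the copy finishes):

```java
// copy in 4096-byte chunks and close both streams when done
org.apache.hadoop.io.IOUtils.copyBytes(in, out, 4096, true);
```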
Download a file from HDFS
```java
fileSystem.copyToLocalFile(new Path("/fuck.aaa"), new Path("./"));
```
List the files in a directory
```java
FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));
for (FileStatus f : fileStatuses) {
    System.out.println(f.getPath());
}
```
List files recursively
```java
RemoteIterator<LocatedFileStatus> remoteIterator = fileSystem.listFiles(new Path("/"), true);
while (remoteIterator.hasNext()) {
    LocatedFileStatus file = remoteIterator.next();
    System.out.println(file.getPath());
}
```
Inspect a file's block locations
```java
FileStatus fileStatus = fileSystem.getFileStatus(new Path("/jdk-8u221-linux-x64.tar.gz"));
BlockLocation[] blockLocations =
        fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
// for each block, print the DataNode holding each replica plus the block's offset and length
for (BlockLocation b : blockLocations) {
    for (String name : b.getNames()) {
        System.out.println(name + " " + b.getOffset() + " " + b.getLength());
    }
}
```
Delete a file or directory
If the path is a directory, the `recursive` flag must be true or the call throws an exception; for a file, `recursive` can be either true or false.

```java
boolean a = fileSystem.delete(new Path("/gwyy.txt"), true);
System.out.println(a);
```
Next, the HDFS command-line operations.
List the HDFS root directory:

```shell
hadoop fs -ls /
```
Upload a file to the HDFS root:

```shell
hadoop fs -put gwyy.txt /
```
Copy a local file to HDFS (the original post abbreviated this as `hf`, presumably a shell alias for `hadoop fs`; it is spelled out here):

```shell
hadoop fs -copyFromLocal xhc.txt /
```
Move a local file to HDFS (the local copy is deleted):

```shell
hadoop fs -moveFromLocal a.txt /
```
View a file's contents:

```shell
hadoop fs -cat /gwyy.txt
hadoop fs -text /gwyy.txt
```
Fetch a file from HDFS to the local working directory:

```shell
hadoop fs -get /a.txt ./
```
Create a directory on HDFS:

```shell
hadoop fs -mkdir /hdfs-test
```
Move a file from one directory to another:

```shell
hadoop fs -mv /a.txt /hdfs-test/a.txt
```
Copy a file:

```shell
hadoop fs -cp /hdfs-test/a.txt /hdfs-test/a.txt.back
```
Merge all files under a directory and download the result as one local file:

```shell
hadoop fs -getmerge /hdfs-test ./t.txt
```
Delete a file:

```shell
hadoop fs -rm /hdfs-test/a.txt.back
```
Delete a directory:

```shell
# only removes an empty directory
hadoop fs -rmdir /hdfs-test
# removes the directory and everything in it
hadoop fs -rm -r /hdfs-test
```