The fsck Command in HDFS (Checking Whether Data Blocks Are Healthy)


HDFS provides the fsck command for checking the health of files and directories on HDFS and for retrieving a file's block information and block locations.

Running hdfs fsck on the master machine with no arguments prints the usage:

[hadoop-twq@master ~]$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
	<path>	start checking from this path
	-move	move corrupted files to /lost+found
	-delete	delete corrupted files
	-files	print out files being checked
	-openforwrite	print out files opened for write
	-includeSnapshots	include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
	-list-corruptfileblocks	print out list of missing blocks and files they belong to
	-blocks	print out block report
	-locations	print out locations for every block
	-racks	print out network topology for data-node locations
	-storagepolicies	print out storage policy summary for the blocks
	-blockId	print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

 

Viewing the health of files and directories

Run the following command:

hdfs fsck /user/hadoop-twq/cmd

This prints the health report for the /user/hadoop-twq/cmd directory (shown as a screenshot in the original post).

One of the most important fields in that report is Corrupt blocks, the number of corrupted data blocks.
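If this check runs from a script (say, a nightly cron job), that count can be pulled out of the report with a small helper. This is only a sketch: it assumes the "Corrupt blocks:" summary-line format that appears in the fsck reports quoted later in this article.

```shell
# corrupt_blocks: read an `hdfs fsck` report on stdin and print the value
# of its "Corrupt blocks" summary line (assumes the Hadoop 2.x report
# format quoted later in this article).
corrupt_blocks() {
  awk -F':' '/Corrupt blocks/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Intended usage against a live cluster:
#   n=$(hdfs fsck /user/hadoop-twq/cmd 2>/dev/null | corrupt_blocks)
#   [ "$n" -eq 0 ] || echo "ALERT: $n corrupt blocks under /user/hadoop-twq/cmd"
```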

Listing a file's corrupt blocks (-list-corruptfileblocks)

[hadoop-twq@master ~]$ hdfs fsck /user/hadoop-twq/cmd -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2Fuser%2Fhadoop-twq%2Fcmd
The filesystem under path '/user/hadoop-twq/cmd' has 0 CORRUPT files

This command finds the corrupt blocks under a given directory; here the output shows that there are none.
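When fsck does report corrupt files, the output pairs each block id with its file path (as in scenario two later in this article). To feed those paths into another command, a small helper can strip off the block-id column. A sketch assuming that `blk_<id><TAB><path>` line format:

```shell
# corrupt_paths: read `hdfs fsck -list-corruptfileblocks` output on stdin
# and print only the file paths, dropping the leading block ids.
corrupt_paths() {
  awk -F'\t' '/^blk_/ { print $2 }'
}

# Intended usage: hdfs fsck / -list-corruptfileblocks | corrupt_paths
```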

Handling corrupt files

Moving corrupt files to /lost+found (-move)

hdfs fsck /user/hadoop-twq/cmd -move

Deleting files that have corrupt blocks (-delete)

hdfs fsck /user/hadoop-twq/cmd -delete

Checking and listing the status of all files (-files)

Run the following command:

hdfs fsck /user/hadoop-twq/cmd -files

The output (a screenshot in the original post) lists every file under the given path, including each file's block count and how well each block is replicated.

Printing files that are open for write (-openforwrite)

The following command checks which files under the given path are currently being written:

hdfs fsck /user/hadoop-twq/cmd -openforwrite
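Files that are open for write are tagged in the report with an OPENFORWRITE marker. As a sketch (the exact line layout varies between Hadoop versions, and the path /user/a.log below is purely hypothetical), the open files can be pulled out like this:

```shell
# open_files: read `hdfs fsck ... -openforwrite -files` output on stdin and
# print the paths of files still open for write. Assumes each such line
# starts with the file path and contains the OPENFORWRITE marker.
open_files() {
  grep 'OPENFORWRITE' | awk '{ print $1 }'
}

# Intended usage: hdfs fsck /user/hadoop-twq/cmd -openforwrite -files | open_files
```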

Printing a file's block report (-blocks)

The following command prints detailed information about every block of a given file; -blocks needs to be combined with -files:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks

The output (a screenshot in the original post) shows each block of the file.

Adding -locations to the command also prints the location of every data block:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations

The output (a screenshot in the original post) now includes the DataNodes that hold each replica.

Adding -racks as well also prints the rack of each block location:

hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations -racks

The output (a screenshot in the original post) now shows the network topology, i.e. which rack each replica sits on.

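The block reports above all embed ids of the form blk_<digits> (for example blk_1073744153 in scenario two below). If you only need the ids themselves, say to cross-reference with -blockId, they can be extracted with a one-liner. A sketch that assumes nothing beyond that naming pattern:

```shell
# block_ids: read any `hdfs fsck -files -blocks` output on stdin and print
# the distinct block ids it mentions, one per line.
block_ids() {
  grep -o 'blk_[0-9]*' | sort -u
}

# Intended usage: hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks | block_ids
```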
Use cases for hdfs fsck

Scenario one

Suppose we run the following command:

hdfs fsck /user/hadoop-twq/cmd

The health report for /user/hadoop-twq/cmd (a screenshot in the original post) shows that two files have under-replicated blocks. We can reset the replication factor of those two files with the following commands:

## set the replication factor of the blocks of big_file.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/big_file.txt
## set the replication factor of the blocks of parameter_test.txt to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/parameter_test.txt

The -w flag makes the command wait until every block actually reaches the target replication factor, so with it the command can take quite a long time to finish.
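Running setrep file by file does not scale when many files are under-replicated. The fsck -files report marks such files on the line that starts with the file path, so the paths can be filtered out and piped into setrep. A sketch only: the "Under replicated" marker wording is an assumption based on Hadoop 2.x output, so verify it against your own cluster before scripting around it.

```shell
# under_replicated: read `hdfs fsck <path> -files` output on stdin and print
# the paths of files flagged as under-replicated. Assumes each such line
# begins with the path and contains an "Under replicated" marker.
under_replicated() {
  grep 'Under replicated' | awk '{ print $1 }'
}

# Intended usage: fix every under-replicated file in one pass:
#   hdfs fsck /user/hadoop-twq/cmd -files | under_replicated \
#     | xargs -n1 hadoop fs -setrep -w 1
```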

After resetting the replication factors, run fsck again:

hdfs fsck /user/hadoop-twq/cmd

The report (a screenshot in the original post) now shows no under-replicated blocks.

Scenario two

Suppose the HDFS web UI shows a warning (a screenshot in the original post) that a block is missing. We can run the following command to determine which file the missing block belongs to:

[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744153	/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files

  

So the missing block is blk_1073744153, and it belongs to the file /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml.

This situation arises when the block data no longer exists on any DataNode but the NameNode's metadata still references it. We can run the following command to delete the files whose block data is gone:

[hadoop-twq@master ~]$ hdfs fsck /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/ -delete
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&delete=1&path=%2Ftmp%2Fhadoop-yarn%2Fstaging%2Fhistory%2Fdone_intermediate%2Fhadoop-twq
FSCK started by hadoop-twq (auth:SIMPLE) from /192.168.126.130 for path /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq at Tue Mar 05 19:18:00 EST 2019
....................................................................................................
..
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: CORRUPT blockpool BP-1639452328-192.168.126.130-1525478508894 block blk_1073744153

/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: MISSING 1 blocks of total size 220262 B...................................................................................................
....................................................................................................
........................Status: CORRUPT
 Total size:	28418833 B
 Total dirs:	1
 Total files:	324
 Total symlinks:		0
 Total blocks (validated):	324 (avg. block size 87712 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:	1 (0.30864197 %)
  dfs.namenode.replication.min:	1
  CORRUPT FILES:	1
  MISSING BLOCKS:	1
  MISSING SIZE:		220262 B
  CORRUPT BLOCKS: 	1
  ********************************
 Minimally replicated blocks:	323 (99.69136 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	0.99691355
 Corrupt blocks:		1
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		2
 Number of racks:		1
FSCK ended at Tue Mar 05 19:18:01 EST 2019 in 215 milliseconds

Then run:

[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files

  

The missing block is no longer reported, because its file has been deleted. Refreshing the web UI shows that the warning is gone as well.

 

