HDFS provides the fsck command for checking the health of files and directories on HDFS and for retrieving a file's block information and block locations. Running `hdfs fsck` with no arguments on the master machine prints its usage:
```
[hadoop-twq@master ~]$ hdfs fsck
Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-storagepolicies] [-blockId <blk_Id>]
	<path>	start checking from this path
	-move	move corrupted files to /lost+found
	-delete	delete corrupted files
	-files	print out files being checked
	-openforwrite	print out files opened for write
	-includeSnapshots	include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
	-list-corruptfileblocks	print out list of missing blocks and files they belong to
	-blocks	print out block report
	-locations	print out locations for every block
	-racks	print out network topology for data-node locations
	-storagepolicies	print out storage policy summary for the blocks
	-blockId	print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
```
Checking the health of files and directories

Run the following command:

```
hdfs fsck /user/hadoop-twq/cmd
```

to see the health information for the `/user/hadoop-twq/cmd` directory:
One important field in the report is `Corrupt blocks`, the number of corrupted blocks.
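When automating health checks, it helps to pull that number out of the report. Below is a minimal sketch, assuming the summary format fsck prints; the content of `fsck.out` here is a fabricated sample standing in for real output (in practice you would capture it with `hdfs fsck /user/hadoop-twq/cmd > fsck.out`):

```shell
# Fabricated sample of an fsck summary; in practice, capture real output with:
#   hdfs fsck /user/hadoop-twq/cmd > fsck.out
cat > fsck.out <<'EOF'
Status: HEALTHY
 Total blocks (validated):	5 (avg. block size 87712 B)
 Corrupt blocks:		0
 Missing replicas:		0 (0.0 %)
EOF

# Extract the "Corrupt blocks" count; anything non-zero needs attention.
corrupt=$(awk -F':' '/Corrupt blocks/ {gsub(/[ \t]/, "", $2); print $2}' fsck.out)
echo "Corrupt blocks: $corrupt"
```

A monitoring script could alert whenever this value is non-zero.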
Listing the corrupt blocks in files (-list-corruptfileblocks)
```
[hadoop-twq@master ~]$ hdfs fsck /user/hadoop-twq/cmd -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2Fuser%2Fhadoop-twq%2Fcmd
The filesystem under path '/user/hadoop-twq/cmd' has 0 CORRUPT files
```
This command lists the corrupted blocks under a given directory; here it reports that no blocks are corrupted.
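When a cluster does have corrupt blocks, the interesting part of this output is the `blk_<id> <path>` lines. Here is a sketch of extracting the affected file paths from a saved listing; the list content is a fabricated sample in the same format as the output above:

```shell
# Fabricated sample of `hdfs fsck / -list-corruptfileblocks` output.
cat > corrupt.list <<'EOF'
The list of corrupt files under path '/' are:
blk_1073744153 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files
EOF

# Keep only the "blk_<id> <path>" lines and print the affected file paths.
files=$(awk '/^blk_/ {print $2}' corrupt.list)
echo "$files"
```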
Handling corrupted files

Move corrupted files to /lost+found (-move):

```
hdfs fsck /user/hadoop-twq/cmd -move
```

Delete files that have corrupted blocks (-delete):

```
hdfs fsck /user/hadoop-twq/cmd -delete
```
Checking and listing the status of all files (-files)

Run the following command:

```
hdfs fsck /user/hadoop-twq/cmd -files
```

The output looks like this:

This command checks every file under the given path and reports, among other things, each file's block count and block replication status.
Printing files that are open for write (-openforwrite)

The following command checks which files under the given path are currently open for writing:

```
hdfs fsck /user/hadoop-twq/cmd -openforwrite
```
Printing a file's block report (-blocks)

The following command prints the details of every block of a given file; -blocks must be used together with -files:

```
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks
```

The output looks like this:

If we also add `-locations`, the location of every block is printed as well:

```
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations
```

The output looks like this:

If we further add `-racks`, the rack of every block location is printed too:

```
hdfs fsck /user/hadoop-twq/cmd/big_file.txt -files -blocks -locations -racks
```

The output looks like this:
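Each block line in a `-locations -racks` report carries the block ID, its length and replication, and a `[/rack/host:port]` location. A sketch of pulling the block ID and rack out of one such line; the line itself is a fabricated sample of that format:

```shell
# Fabricated sample of one block line from -files -blocks -locations -racks output.
line='0. BP-1639452328-192.168.126.130-1525478508894:blk_1073744152_3328 len=134217728 repl=1 [/default-rack/192.168.126.131:50010]'

# Extract the block ID (between the ':' before "blk_" and the generation stamp).
block=$(echo "$line" | sed -n 's/.*:\(blk_[0-9]*\)_.*/\1/p')
# Extract the rack, i.e. the first path component inside the brackets.
rack=$(echo "$line" | sed -n 's/.*\[\(\/[^/]*\)\/.*/\1/p')
echo "$block is on rack $rack"
```

Grouping blocks by rack this way can quickly show whether a file's replicas are spread across racks.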
Use cases for hdfs fsck

Scenario 1

When we run the following command:

```
hdfs fsck /user/hadoop-twq/cmd
```

we can see the health information for the `/user/hadoop-twq/cmd` directory:
The output shows that the blocks of two files are under-replicated. We can reset the replication factor of both files with the following commands:

```
## Set the replication factor of big_file.txt's blocks to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/big_file.txt
## Set the replication factor of parameter_test.txt's blocks to 1
hadoop fs -setrep -w 1 /user/hadoop-twq/cmd/parameter_test.txt
```

The `-w` flag makes the command wait until every block reaches the target replication factor, so it can take quite a while to finish.
After those commands finish, run the check again:

```
hdfs fsck /user/hadoop-twq/cmd
```

The result looks like this:
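When many files are under-replicated, typing one setrep command per file gets tedious. Here is a sketch that generates the commands from a file list instead of executing anything; the list content is a fabricated sample (in practice you would collect the paths from the fsck report):

```shell
# Fabricated list of under-replicated files; in practice, collect these
# paths from the "Under replicated" lines of the fsck report.
cat > under_replicated.txt <<'EOF'
/user/hadoop-twq/cmd/big_file.txt
/user/hadoop-twq/cmd/parameter_test.txt
EOF

# Emit one setrep command per file (printed for review, not executed).
cmds=$(awk '{print "hadoop fs -setrep -w 1 " $0}' under_replicated.txt)
first=$(printf '%s\n' "$cmds" | head -n 1)
echo "$cmds"
```

Reviewing the generated commands before piping them to a shell is safer than running them blindly.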
Scenario 2

When we open the HDFS web UI, the following warning appears:

It indicates that one block is missing. Run the following command to find out which file the missing block belongs to:
```
[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The list of corrupt files under path '/' are:
blk_1073744153	/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
The filesystem under path '/' has 1 CORRUPT files
```
The output shows that block blk_1073744153, which belongs to the file /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml, is missing. This situation arises when the DataNodes no longer hold the block but the NameNode's metadata still references it. We can remove the stale block information by deleting the affected file:
```
[hadoop-twq@master ~]$ hdfs fsck /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/ -delete
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&delete=1&path=%2Ftmp%2Fhadoop-yarn%2Fstaging%2Fhistory%2Fdone_intermediate%2Fhadoop-twq
FSCK started by hadoop-twq (auth:SIMPLE) from /192.168.126.130 for path /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq at Tue Mar 05 19:18:00 EST 2019
......................................................................................................
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: CORRUPT blockpool BP-1639452328-192.168.126.130-1525478508894 block blk_1073744153
/tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml: MISSING 1 blocks of total size 220262 B
.......................................................................................................
........................Status: CORRUPT
 Total size:	28418833 B
 Total dirs:	1
 Total files:	324
 Total symlinks:	0
 Total blocks (validated):	324 (avg. block size 87712 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:	1 (0.30864197 %)
  dfs.namenode.replication.min:	1
  CORRUPT FILES:	1
  MISSING BLOCKS:	1
  MISSING SIZE:		220262 B
  CORRUPT BLOCKS:	1
  ********************************
 Minimally replicated blocks:	323 (99.69136 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	1
 Average block replication:	0.99691355
 Corrupt blocks:		1
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		2
 Number of racks:		1
FSCK ended at Tue Mar 05 19:18:01 EST 2019 in 215 milliseconds
```
Then run:
```
[hadoop-twq@master ~]$ hdfs fsck / -list-corruptfileblocks
Connecting to namenode via http://master:50070/fsck?ugi=hadoop-twq&listcorruptfileblocks=1&path=%2F
The filesystem under path '/' has 0 CORRUPT files
```
The missing block is no longer reported; the affected file has been deleted. Refreshing the web UI shows that the warning is gone as well:
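The whole scenario above — list the corrupt files, then run fsck -delete on their parent directory — can be sketched as a script that turns the corrupt-file list into delete commands for review. The list content below is a fabricated sample of the `-list-corruptfileblocks` format; the generated commands are printed rather than executed, since -delete is destructive:

```shell
# Fabricated sample of a corrupt-file listing (block ID, then file path).
cat > corrupt.list <<'EOF'
blk_1073744153 /tmp/hadoop-yarn/staging/history/done_intermediate/hadoop-twq/job_1528682852398_0015_conf.xml
EOF

# For each corrupt file, print an fsck -delete command scoped to its parent directory.
cmd=$(awk '/^blk_/ {path=$2; sub(/\/[^\/]*$/, "", path); print "hdfs fsck " path "/ -delete"}' corrupt.list)
echo "$cmd"
```

Inspecting the printed commands before running them helps avoid deleting files whose blocks could still be recovered (for example, by restarting a DataNode that holds them).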