Hadoop Study Notes (2): Experimenting with HDFS Block Replication and Deletion to Appreciate Its Fault Tolerance


First, a quick look at some basic characteristics of HDFS.

HDFS design assumptions and goals

  • Hardware failure is the norm, so redundancy is required
  • Streaming data access: data is read in bulk rather than randomly; Hadoop is good at data analysis, not transaction processing
  • Large data sets
  • Simple coherency model: to keep the system simple, files follow a write-once, read-many design; once a file has been written and closed it can no longer be modified
  • Computation is scheduled on nodes close to the data ("move computation to the data")

HDFS architecture

  • NameNode
  • DataNode
  • Transaction log (edit log)
  • Image file (fsimage)
  • SecondaryNameNode

Namenode

  • Manages the file system namespace
  • Records the location and replica information of each file's blocks on the various Datanodes
  • Coordinates client access to files
  • Records changes to the namespace and to the namespace's own properties
  • The Namenode uses a transaction log (edit log) to record changes to HDFS metadata, and an image file (fsimage) to store the file system namespace, including the file-to-block mapping, file attributes and so on (see the listing sketch after this list)
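A quick way to see those two files on disk is to list the namenode's storage directory. This is only a sketch: the directory depends on your dfs.name.dir setting, and /hadoop_run/name is a made-up example modelled on the data directory used later in this post.

# on the namenode host; ${dfs.name.dir}/current holds the persistent metadata
ls /hadoop_run/name/current
# typical contents in 0.20: fsimage (namespace image), edits (transaction log), fstime, VERSION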

Datanode

  • Manages the storage on its physical node
  • Write once, read many (no in-place modification)
  • Files are made up of blocks; the typical block size is 64 MB (configurable, see the sketch after this list)
  • Blocks are spread across the nodes as evenly as possible
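As a hedged sketch of how the block size and replication factor can be overridden per file at write time (the property names are the 0.20-era ones; the paths here are placeholders, not ones used in the experiment):

# write a file with a 128 MB block size and 2 replicas instead of the cluster defaults
hadoop fs -D dfs.block.size=134217728 -D dfs.replication=2 -put ~/Documents/somefile /user/hadoop_admin/in/test
# change the replication factor of an existing file and wait for it to take effect
hadoop fs -setrep -w 2 /user/hadoop_admin/in/test/somefile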

Data read flow

  1. A client wants to read a file stored in HDFS
  2. It first obtains from the namenode the list of block locations that make up the file
  3. From that list it knows which datanodes hold each block
  4. It contacts those datanodes directly to fetch the data
  5. The namenode does not take part in the actual data transfer (see the command-line sketch below)
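A rough command-line illustration of that flow, using the file path from later in this post: fsck (answered by the namenode) shows the block locations a client would be handed, and a plain get then streams the block data from the datanodes themselves.

# ask the namenode where the blocks of a file live
hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
# read the file; the block data comes from the datanodes, not through the namenode
hadoop fs -get /user/hadoop_admin/in/bigfile/USWX201304 ~/Documents/USWX201304.copy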

HDFS reliability

  • Replica redundancy
  • Rack-aware placement
  • Heartbeats
  • Safe mode
  • Block checksums to verify file integrity
  • Trash (recycle bin), see the sketch after this list
  • Metadata protection
  • Snapshots
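For the trash feature, a minimal sketch, assuming fs.trash.interval has been set to a non-zero value in core-site.xml (it is off by default); the paths reuse the ones from this experiment:

# with trash enabled, rm moves the file into the user's .Trash instead of deleting it outright
hadoop fs -rm /user/hadoop_admin/in/bigfile/USWX201304
hadoop fs -ls /user/hadoop_admin/.Trash/Current/user/hadoop_admin/in/bigfile
# restore it by moving it back out, or purge old trash checkpoints explicitly
hadoop fs -mv /user/hadoop_admin/.Trash/Current/user/hadoop_admin/in/bigfile/USWX201304 /user/hadoop_admin/in/bigfile/
hadoop fs -expunge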

I tried out the replica redundancy, heartbeat, safe mode and trash features separately. The experiment below is about replica redundancy.

Environment:

  • Namenode/Master/jobtracker: h1/192.168.221.130
  • SecondaryNameNode: h1s/192.168.221.131
  • Four Datanodes: h1s and h2~h4 (IPs ending in 131 and 142~144)

Since a file that is too small would occupy only a single block (block/blk), we prepare a somewhat larger file (about 600 MB) so that its blocks are spread across several datanodes; then we can stop one of them and see whether anything breaks.
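If you do not have a large file handy, one convenient way to generate a dummy one on a Linux box is sketched below (the output path is arbitrary; the experiment itself used a real 600 MB data file):

# create a ~600 MB file of random data so it spans multiple 64 MB blocks
dd if=/dev/urandom of=~/Documents/bigtestfile bs=1M count=600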
Now put the file into HDFS (for convenience, it helps to append hadoop/bin to your $PATH so the commands can be run without full paths):
:hadoop fs -put ~/Documents/IMMAUSWX201304 /user/hadoop_admin/in/bigfile
When that finishes, we can inspect the block layout, either in the web UI or with the fsck command on the namenode:
:bin/hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations > ~/hadoopfiles/log1.txt
The output below shows that the 600 MB file was split into 9 blocks of up to 64 MB each and spread across all of my current datanodes (4 of them), fairly evenly:

/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  OK
0. blk_-4541681964616523124_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010]
7. blk_226084029380457017_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010]

Status: HEALTHY
Total size:    597639882 B
Total dirs:    0
Total files:   1
Total blocks (validated):      9 (avg. block size 66404431 B)
Minimally replicated blocks:   9 (100.0 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       0 (0.0 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     3.0
Corrupt blocks:                0
Missing replicas:              0 (0.0 %)
Number of data-nodes:          4
Number of racks:               1

All four datanodes (h1s, h2, h3, h4) hold replicas. I then went to h2 (142) and h3 (143) and stopped their datanodes, and did a get from h4; surprisingly the file came back, and at first glance the size was correct. Looking at the block list above, even with those two nodes dead every block still has at least one surviving replica to read from, so the data retrieved by get is still complete. From this alone Hadoop already looks impressive: the load balancing is done well, the data is resilient, and the fault tolerance holds up.
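Roughly what I ran, as a sketch (hadoop-daemon.sh is the per-node start/stop script shipped with 0.20; the local output path is arbitrary):

# on h2 and on h3: stop the local datanode
bin/hadoop-daemon.sh stop datanode
# from h4 (or any client): the file can still be retrieved in full
hadoop fs -get /user/hadoop_admin/in/bigfile/USWX201304 ~/Documents/USWX201304.get
ls -l ~/Documents/USWX201304.get    # size matches the 597639882 bytes reported by fsck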


Checking again a little later (I had actually intended to test safe mode), I found that after a refresh the blocks that had been down to one live replica had already been re-replicated, so each of them had two copies again:

hadoop_admin@h1:~/hadoop-0.20.2$ hadoop fsck /user/hadoop_admin/in/bigfile  -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s): 
Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s).
0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]

I decided to shut down one more datanode, but waited quite a while without the namenode noticing it was dead. This is the heartbeat mechanism: every 3 seconds each datanode sends a heartbeat to the namenode to show it is alive, and only when the namenode has heard nothing for a long time (5~10 minutes, depending on configuration) does it declare the node dead and start copying blocks to bring the replica count back up for fault tolerance.
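While waiting, a convenient way to watch the namenode's view of which datanodes are alive is sketched below (dfsadmin -report summarizes live and dead nodes; the heartbeat interval and the dead-node timeout are cluster configuration, e.g. dfs.heartbeat.interval controls how often datanodes send heartbeats):

# summary of live/dead datanodes and per-node capacity, as seen by the namenode
hadoop dfsadmin -report
# the namenode web UI (http://h1:50070 by default) shows the same live/dead lists

Once the node was finally marked dead, and with only one live datanode left, fsck shows every block with exactly one replica: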

hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s).

Now I move one block file off this last surviving datanode so that the block goes missing and the file becomes corrupt; I want to see whether it recovers after I start another datanode.
hadoop_admin@h4:/hadoop_run/data/current$ mv blk_4347039731705448097_1011* ~/Documents/
然后為了不必要等8分鍾DN發block report,我手動修改了h4的dfs.blockreport.intervalMsec值為30000,stop datanode,再start (另外,你應該把hadoop/bin也加入到Path變量后面,這樣你可以不帶全路徑執行hadoop命令,結果,檢測它已被損壞
hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations

/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s).

/user/hadoop_admin/in/bigfile/USWX201304: CORRUPT block blk_4347039731705448097
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s).
MISSING 1 blocks of total size 67108864 B
0. blk_-4541681964616523124_1011 len=67108864 repl=1 [192.168.221.144:50010]
1. blk_4347039731705448097_1011 len=67108864 MISSING!
2. blk_-4962604929782655181_1011 len=67108864 repl=1 [192.168.221.144:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=1 [192.168.221.144:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=1 [192.168.221.144:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=1 [192.168.221.144:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=1 [192.168.221.144:50010]
7. blk_226084029380457017_1011 len=67108864 repl=1 [192.168.221.144:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=1 [192.168.221.144:50010]

Status: CORRUPT
Total size:    597639882 B
Total dirs:    0
Total files:   1
Total blocks (validated):      9 (avg. block size 66404431 B)
   ********************************
   CORRUPT FILES:        1
   MISSING BLOCKS:       1
   MISSING SIZE:         67108864 B
   CORRUPT BLOCKS:       1
   ********************************
Minimally replicated blocks:   8 (88.888885 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       8 (88.888885 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     0.8888889
Corrupt blocks:                1
Missing replicas:              16 (200.0 %)
Number of data-nodes:          1
Number of racks:               1


The filesystem under path '/user/hadoop_admin/in/bigfile' is CORRUPT

Now I start the datanode on h1s (131). Very quickly, within 30 seconds, Hadoop revived the file on the spot at full health: every block has two replicas again.
hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations
/user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s).
Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s).
0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]

The missing block was successfully copied from 131 back onto 144 (h4).

Conclusion: Hadoop's fault tolerance is rock solid; I am now fully convinced!

One more thing not shown in the pasted output: the h4 datanode still had quite a few badLinkBlock files left over from an earlier re-format. When I put the same file again, Hadoop deleted all of those stale leftover block files, so it does have the ability to clean up invalid bad blocks.

In short, the experiment went like this:

1. Put a 600 MB file; it was split into 9 blocks x 3 replicas (27 block copies in total) spread over the 4 datanodes.

2. I shut down two datanodes, leaving most blocks with only one live replica, but because the 9 blocks were well spread out the file could still be retrieved correctly (integrity is verified with block checksums).

3. The hadoop namenode very quickly re-replicated the single-replica blocks so that each had two copies again ("Target Replicas is 3 but found 2").

4. I shut down one more datanode. Because the blocks had been spread evenly and brought back up to 2 replicas beforehand, even with a single datanode left every block still had a copy and the file stayed healthy.

5. I removed one blk file from that last surviving datanode, and the namenode reported the file as corrupt. (I kept hoping it would enter safe mode, but -safemode get stayed OFF; the safe-mode commands are sketched after this list.)

6. Then I started another datanode; in under 30 seconds the missing block was quickly "expanded" from that newly started node back to 2 replicas.
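For reference, the safe mode checks mentioned in step 5 are just dfsadmin subcommands (a sketch; the enter/leave/wait forms are listed for completeness and were not needed in this experiment):

# query whether the namenode is currently in safe mode
hadoop dfsadmin -safemode get
# safe mode can also be entered or left manually, or waited on from a script
hadoop dfsadmin -safemode enter
hadoop dfsadmin -safemode leave
hadoop dfsadmin -safemode wait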

 

The fault tolerance is extremely reliable; with at least three racks the data would be very robust indeed. My trust in Hadoop just leveled up!

