Test environment and system information:
$uname -a
Linux 10.**.**.15 2.6.32-220.17.1.tb619.el6.x86_64 #1 SMP Fri Jun 8 13:48:13CST 2012 x86_64 x86_64 x86_64 GNU/Linux
Hadoop and HBase versions:
hadoop-0.20.2-cdh3u4
hbase-0.90-adh1u7.1
10.**.**.12: NFS server, providing the NFS service.
10.**.**.15: HDFS NameNode, which mounts the NFS share exported by 10.**.**.12.
The file used for the put tests is ganglia-5.rpm, roughly 3 MB in size.
The NFS-related configuration in hadoop/conf/hdfs-site.xml is as follows:
<property>
  <name>dfs.name.dir</name>
  <value>/u01/hbase/nndata/local,/u01/hbase/nndata/nfs</value>
</property>
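The hangs described in the tests below are characteristic of NFS "hard" mounts (the default): I/O against the share blocks until the server comes back, so whether the NameNode hangs or instead sees an I/O error is decided largely by the mount options, not by Hadoop. A sketch of the two variants for reference; the export path /export/nndata and the option values are illustrative assumptions, not taken from the test setup:

```shell
# Default "hard" mount: I/O against the share blocks indefinitely
# while the server is down (this matches the hangs observed here).
sudo mount -t nfs -o hard,intr 10.**.**.12:/export/nndata /u01/hbase/nndata/nfs

# "soft" mount: I/O fails with an error after retrans retries of
# timeo (tenths of a second) each, so callers get EIO instead of hanging.
sudo mount -t nfs -o soft,timeo=30,retrans=3 10.**.**.12:/export/nndata /u01/hbase/nndata/nfs
```

Note that soft-mounting a NameNode storage directory trades the hang for possible write errors against the edit log, which carries its own risk for metadata durability.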
Case 1: NFS service stopped on the server
Stop the NFS service on the server, then run:
$sudo service nfs status
rpc.svcgssd is stopped
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped
At this point an HDFS put that is already in progress hangs and never exits.
After the NFS service is restarted, the hung put is still stuck. Re-running the put, the session hangs, then continues after a timeout interval and reports that the file already exists. Running:
$sh hadoop/bin/hadoop fs -ls hdfs://10.**.**.15:9516/ shows an empty file with the same name in the directory.
While watching $tail -f hadoop-**-namenode-10.**.**.15.log during the hang, the log produces no output; only once the put resumes is everything from that window flushed at once, including the exception for the failed put:
2012-10-23 11:22:38,956 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call create(/ganglia-4.rpm, rwxr-xr-x, DFSClient_-621134164, false, 3, 67108864) from 10.**.**.15:47771: output error
2012-10-23 11:22:38,957 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9516 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1763)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:95)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:773)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:837)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1462)
……
2012-10-23 11:22:38,963 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:** (auth:SIMPLE) cause:org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /ganglia-5.rpm for DFSClient_382171631 on client 10.**.**.15, because this file is already being created by DFSClient_-1964937422 on 10.**.**.15
......
2012-10-23 14:40:11,672 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getDatanodeReport(LIVE) from 10.**.**.15:54929: output error
2012-10-23 14:40:11,672 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.**.**.12
2012-10-23 14:40:11,672 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9516 caught: java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:1763)
at org.apache.hadoop.ipc.Server.access$2000(Server.java:95)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:773)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:837)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1462)
……
2012-10-23 14:40:11,672 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 8 Total time for transactions(ms): 1Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 4 1007521
2012-10-23 14:40:12,152 INFO org.apache.hadoop.hdfs.server.namenode.GetImageServlet: Downloaded new fsimage with checksum: 444a843721bd52a951673a1ba7aecb37
2012-10-23 14:40:12,154 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll FSImage from 10.**.**.12
2012-10-23 14:40:12,154 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 4 16
At this point, the modification time of hbase_home/nndata/share/current/edits on the NFS server is only updated again after the NFS service recovers.
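The mtime observation above can be made precise with stat; a small helper sketch (GNU coreutils, helper name is this example's choice) that prints a file's modification time as epoch seconds so before/after values can be compared:

```shell
# mtime_of FILE — print the file's mtime as seconds since the epoch
# (GNU stat). On the test box this would be pointed at the NFS-side
# edits file under the current/ directory mentioned above.
mtime_of() {
  stat -c %Y "$1"
}
```

Sampling this value while NFS is down, and again after recovery, confirms that the edits file is only touched once the mount is healthy again.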
After NFS recovers, re-running a complete put:
$sh hadoop/bin/hadoop fs -put ~/dba-ganglia-gmetad-3.1.7-2.x86_64.rpm hdfs://10.**.**.15:9516/ganglia-5.rpm
produces the following log output:
2012-10-23 11:31:08,794 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 25 Total time for transactions(ms): 3Number of transactions batched in Syncs: 2 Number of syncs: 15 SyncTimes(ms): 10 676853
2012-10-23 11:31:08,804 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /ganglia-5.rpm. blk_2675602071792190621_3890
2012-10-23 11:31:08,855 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.**.**.13:50010 is added to blk_2675602071792190621_3890 size 38020
……
2012-10-23 11:31:08,860 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /ganglia-5.rpm from client DFSClient_-19034129
2012-10-23 11:31:08,861 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /ganglia-5.rpm is closed by DFSClient_-19034129
When the NFS service is stopped with $sudo service nfs stop, the NameNode prints the following; it comes from the periodic sync, not from any notification that NFS has stopped:
2012-10-23 11:33:54,815 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 2 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0
Check the HDFS safemode status:
$sh hadoop/bin/hadoop dfsadmin -safemode get
Safe mode is OFF
So HDFS did not switch to SafeMode automatically.
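Since the NameNode stays out of SafeMode on its own during the outage, any monitoring has to poll it explicitly. A minimal sketch (the helper name is hypothetical) that maps the output of dfsadmin -safemode get to a single token:

```shell
# safemode_state TEXT — reduce the output of
# `hadoop dfsadmin -safemode get` to ON / OFF / UNKNOWN.
safemode_state() {
  case "$1" in
    "Safe mode is ON"*)  echo "ON" ;;
    "Safe mode is OFF"*) echo "OFF" ;;
    *)                   echo "UNKNOWN" ;;
  esac
}

# Typical use (requires a running cluster):
#   state=$(safemode_state "$(sh hadoop/bin/hadoop dfsadmin -safemode get)")
```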
Case 2: NFS daemon killed on the server
Kill the NFS daemon on the server:
$sudo killall -9 nfsd
Check the NFS status:
$sudo service nfs status
rpc.svcgssd is stopped
rpc.mountd (pid 10677) is running...
nfsd is stopped
rpc.rquotad (pid 10645) is running...
Run
$sh hadoop/bin/hadoop dfsadmin -report
and
$sh hadoop/bin/hadoop fs -put ~/dba-ganglia.rpm hdfs://10.**.**.15:9516/ganglia-13.rpm
to test the put path. Both operations hang, exactly as in case 1: the report and put sessions stay hung indefinitely and never time out.
Restarting the NFS service at this point lets them recover automatically after a timeout interval.
Conclusions
1. After NFS goes down, client reads are unaffected as long as every HDFS file to be read is on the local datanode (e.g. $sh hadoop/bin/hadoop fs -cat hdfs://10.**.**.15:9516/11.txt still returns the file content).
2. After NFS goes down, client HDFS write operations hang indefinitely and never time out.
3. After NFS goes down (whether via service nfs stop or killall nfsd), HDFS writes hang until the NFS service recovers; they then resume and complete normally, and the detailed logs for the whole window are flushed in a batch to hadoop_namenode.log once NFS is back. A follow-up test will look at configuring this timeout.
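Until that timeout is configurable, the hang can at least be detected from outside: a stat on the NFS-backed name directory wrapped in timeout(1) cannot itself get stuck on a hard mount. A watchdog sketch; the function name and the 5-second budget are choices of this example:

```shell
# probe_nfs_dir DIR — print "nfs-ok" if DIR answers a stat within
# 5 seconds, "nfs-stale" otherwise. On a hard NFS mount a plain stat
# would hang with the server down; timeout(1) bounds the wait.
probe_nfs_dir() {
  if timeout 5 stat "$1" >/dev/null 2>&1; then
    echo "nfs-ok"
  else
    echo "nfs-stale"
  fi
}

# e.g. run probe_nfs_dir /u01/hbase/nndata/nfs from cron and alert on "nfs-stale"
```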
Reposted from http://hi.baidu.com/richarwu/item/0c900469d48e9f2069105b9f