[Test] Trying out HDFS NFS in Hadoop 2.2

Reposted article. 2013-11-27 14:36. Tags: BigData / NFS / hadoop

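Hadoop 2.2 officially enables the HDFS NFS feature, a big step forward for the general usability of HDFS. I had a junior colleague at work set it up, then ran a few simple experiments myself. I learned a few things and am recording them here.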

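Theory

With the HDFS NFS feature, the data access path is as shown in the figure above: users or programs access the HDFS NFS service through the NFS client that ships with Linux, and the NFS gateway then accesses HDFS as an HDFS client.

In this figure, the middle node is the NFS proxy server (hdfs nfs proxy), also called the NFS gateway (hdfs nfs gateway). Blue marks a module that is a process or service; green marks a module that is a library. The figure also has two dashed lines: the lower and upper lines mark the boundary between kernel space and user space at the operating-system level and at the distributed-operating-system (Hadoop) level, respectively.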

Deployment

Deploy the packages required by the HDFS NFS service on the NFS gateway. With a standard Hadoop 2.2 layout, these two files should be present:

share/hadoop/common/hadoop-nfs-2.2.0.jar

share/hadoop/hdfs/hadoop-hdfs-nfs-2.2.0.jar

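The configuration files need no changes; the defaults work. The main defaults are the NFS service port (the standard 2049), the mount daemon's listening port (4242), and a dump directory (/tmp/.hdfs-nfs) that is involved in the write path; I do not yet understand exactly how it is used.

After deployment, bring the service up by starting the portmap and nfs services, in that order: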

$ hadoop-daemon.sh start portmap

$ hadoop-daemon.sh start nfs3

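Note that portmap must be started as root (its standard port, 111, is below 1024 and is therefore a privileged resource), while the NFS service should be started as the HDFS superuser. If there is a conflict, stop the operating system's own NFS service.

Once both are running, check that the service is available, where nfs_server_ip is the address of the NFS gateway: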

$ rpcinfo -p $nfs_server_ip

program vers proto   port
100005    1   tcp   4242  mountd
100000    2   udp    111  portmapper
100005    3   tcp   4242  mountd
100005    2   udp   4242  mountd
100003    3   tcp   2049  nfs
100000    2   tcp    111  portmapper
100005    3   udp   4242  mountd
100005    1   udp   4242  mountd
100005    2   tcp   4242  mountd

$ showmount -e $nfs_server_ip

Export list for SY-0245:
/ *
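The rpcinfo check above can also be done programmatically. The following is a minimal sketch, assuming the output has already been captured as text lines; the struct and the function names parse_rpcinfo_line and has_service are hypothetical helpers introduced here for illustration, not part of Hadoop or of rpcinfo.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One row of `rpcinfo -p` output: program, version, protocol, port, name. */
struct rpc_entry {
  unsigned long program;
  unsigned version;
  char proto[8];
  unsigned port;
  char name[32];
};

/* Parses one output line (hypothetical helper).
 * Returns 1 on success, 0 for the header row or a malformed line. */
int parse_rpcinfo_line(const char* line, struct rpc_entry* out) {
  assert(line != NULL && out != NULL);
  return sscanf(line, "%lu %u %7s %u %31s", &out->program, &out->version,
                out->proto, &out->port, out->name) == 5;
}

/* Returns 1 if any line registers `name` with the given version,
 * protocol and port, otherwise 0. */
int has_service(const char* const* lines, int count, const char* name,
                unsigned version, const char* proto, unsigned port) {
  for (int i = 0; i < count; i++) {
    struct rpc_entry e;
    if (parse_rpcinfo_line(lines[i], &e) && strcmp(e.name, name) == 0 &&
        e.version == version && strcmp(e.proto, proto) == 0 &&
        e.port == port)
      return 1;
  }
  return 0;
}
```

For the listing shown above, has_service(lines, count, "nfs", 3, "tcp", 2049) would be the key check: the gateway registers NFS version 3 over TCP on port 2049, and mountd on port 4242.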

Mounting the NFS service

Create the mount directory

$ mkdir /mnt/hdfs

Install mount.nfs

$ sudo apt-get install nfs-common

Mount the export

$ mount.nfs $nfs_server_ip:/ /mnt/hdfs
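Before running any test against /mnt/hdfs it is worth confirming that the mount actually succeeded, since otherwise the files land on the local disk. The following is a rough sketch, assuming a Linux system where the mount table is readable at /proc/mounts in the usual "device mount-point fs-type options" layout; is_nfs_mounted is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if `mount_table` (a file in /proc/mounts format) lists
 * `mount_point` with a filesystem type starting with "nfs", 0 if it
 * does not, and -1 if the table cannot be opened.
 */
int is_nfs_mounted(const char* mount_table, const char* mount_point) {
  assert(mount_table != NULL && mount_point != NULL);
  FILE* fp = fopen(mount_table, "r");
  if (fp == NULL)
    return -1;

  char line[1024], device[256], dir[256], type[64];
  int found = 0;
  while (fgets(line, sizeof line, fp) != NULL) {
    /* Each row is: device mount-point fs-type options dump pass */
    if (sscanf(line, "%255s %255s %63s", device, dir, type) != 3)
      continue;
    if (strcmp(dir, mount_point) == 0 && strncmp(type, "nfs", 3) == 0) {
      found = 1;
      break;
    }
  }
  fclose(fp);
  return found;
}
```

After the mount command above, is_nfs_mounted("/proc/mounts", "/mnt/hdfs") would be expected to return 1.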

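Trial and analysis

I accessed /mnt/hdfs and tried simple operations such as ls, cp, and rm, and also ran md5sum. All of them worked normally, and responses were noticeably faster than going through FsShell. This is probably thanks to NFS's wcc (weak cache consistency) caching and the connection caching in the HDFS NFS implementation.

But is HDFS NFS a fully compatible implementation of the standard filesystem interface? To find out, I tested the hardest cases to handle, random writes and overwrites, with the code below. In short, it performs three writes: the first at the head of the file (character 1), the second at the tail (character 2), and the third in the middle (character 3):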

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

void usage(char* argv[]) {
  fprintf(stdout, "%s <file_length>\n", argv[0]);
  fprintf(stdout, "NOTE:\n");
  fprintf(stdout, "    file_length >= 3\n");
}

bool open_and_check(FILE** fpp, int op_seq) {
  (*fpp) = fopen("testfile", "r+");

  if ((*fpp) == NULL) {
    fprintf(stderr, "%d.Can not open test file.\n", op_seq);
    return false;
  }

  return true;
}

int main(int argc, char* argv[]) {
  if (argc != 2) {
    usage(argv);
    return -1;
  }

  int length = atoi(argv[1]);
  if (length < 3) {
    fprintf(stdout, "file_length must be at least 3\n");
    return -1;
  }

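  // Create the test file, or truncate it if it already exists. The fopen
  // result is not checked, so a failed open passes NULL to fclose.
  fclose(fopen("testfile", "w+"));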

  FILE* fp;
  int op_seq = 1;

  if (!open_and_check(&fp, op_seq))
    return op_seq;
  putc('0'+op_seq, fp); // '1'
  fclose(fp);
  op_seq++;

  if (!open_and_check(&fp, op_seq))
    return op_seq;
  fseek(fp, length, SEEK_SET);
  putc('0'+op_seq, fp); // '2'
  fclose(fp);
  op_seq++;

  if (!open_and_check(&fp, op_seq))
    return op_seq;
  fseek(fp, length/2, SEEK_SET);
  putc('0'+op_seq, fp); // '3'
  fclose(fp);
  //op_seq++;

  return 0;
}

test_rrw.c

Note: the argument n is the offset seeked to before the second write, so the actual file length will be n+1.
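The expected result of the three writes can be checked mechanically instead of by eye. The following is a minimal sketch of such a check, assuming the layout described above ('1' at offset 0, '3' at offset n/2, '2' at offset n, total length n+1); verify_pattern is a hypothetical helper and was not part of the original test.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Checks that `path` holds the pattern written by the test program for
 * argument n. Returns 0 on success, or the number of the failed check:
 * 1 = cannot open, 2 = wrong size, 3 = wrong byte at a probe offset.
 */
int verify_pattern(const char* path, long n) {
  assert(path != NULL && n >= 3);
  FILE* fp = fopen(path, "rb");
  if (fp == NULL)
    return 1;

  int result = 0;
  /* The file must be exactly n + 1 bytes long. */
  if (fseek(fp, 0, SEEK_END) != 0 || ftell(fp) != n + 1)
    result = 2;

  /* Probe the three offsets the test program wrote to. */
  const long offsets[] = {0, n / 2, n};
  const int expected[] = {'1', '3', '2'};
  for (int i = 0; result == 0 && i < 3; i++) {
    if (fseek(fp, offsets[i], SEEK_SET) != 0 || getc(fp) != expected[i])
      result = 3;
  }
  fclose(fp);
  return result;
}
```

Running verify_pattern("testfile", n) right after the test program, and again a few seconds later, would flag both a wrong length and misplaced bytes.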

1. First, test with a small file:

root@xxx:/mnt/hdfs/tmp# ./a.out 3
root@xxx:/mnt/hdfs/tmp# ls -l testfile 
-rw-r--r-- 1 root root 4 Nov 27 18:04 testfile
root@xxx:/mnt/hdfs/tmp# cat testfile 
132

The results are all as expected.

2. What happens if we run it a second time?

root@xxx:/mnt/hdfs/tmp# ./a.out 3
Segmentation fault (core dumped)

The cause of the failure can be found in the HDFS NFS gateway log:

2013-11-27 18:11:53,695 ERROR org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3: Setting file size is not supported when setattr, fileId: 20779

Resetting the file size is not supported, which means truncate is not supported; at least it "correctly" returned a failure.
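One plausible reading of the crash, which the original test did not confirm, is that reopening the existing file with mode "w+" requests a truncation, the gateway rejects it, fopen returns NULL, and the unchecked fclose then dereferences it. The sketch below shows the same create-or-truncate step with the error reported instead; create_or_truncate is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Creates `path`, or truncates it to zero length if it already exists,
 * which is what fopen(path, "w+") asks the filesystem to do.
 * Returns 0 on success, or the errno value reported by open() on failure.
 */
int create_or_truncate(const char* path) {
  assert(path != NULL);
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) {
    int err = errno; /* save errno before any other library call */
    fprintf(stderr, "cannot create or truncate %s: %s\n", path,
            strerror(err));
    return err;
  }
  close(fd);
  return 0;
}
```

On a filesystem that refuses truncation, a helper like this would return a nonzero error code on the second run rather than crashing, which makes the limitation visible to the caller.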

3. Test with different file sizes

root@SY-0266:/mnt/hdfs/tmp# ./a.out 4096 && ls -lh --full-time testfile && sleep 5 && ls -lh --full-time testfile
-rw-r--r-- 1 root root 2.1K 2013-11-27 22:22:40.572000000 +0800 testfile
-rw-r--r-- 1 root root 1 2013-11-27 22:22:40.572000000 +0800 testfile
root@SY-0266:/mnt/hdfs/tmp# rm testfile 
root@SY-0266:/mnt/hdfs/tmp# ./a.out 4095 && ls -lh --full-time testfile && sleep 5 && ls -lh --full-time testfile
-rw-r--r-- 1 root root 4.0K 2013-11-27 22:25:17.606000000 +0800 testfile
-rw-r--r-- 1 root root 4.0K 2013-11-27 22:25:17.606000000 +0800 testfile

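With argument 4095 (a 4096-byte file) the size stays at 4.0K, but with argument 4096 (an expected 4097 bytes) ls first reports 2.1K and, five seconds later, only 1 byte. In other words, once the file grows past 4K the test no longer completes correctly, and data is silently lost. This is probably related to how HDFS NFS implements random writes and overwrites; I have not studied the code in detail.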
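The ls, sleep, ls sequence used above can be folded into a single check. The following is a small sketch, assuming a POSIX system; size_is_stable is a hypothetical helper, and the delay and expected size are supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Reads the size of `path` now and again after `delay_seconds`.
 * Returns 1 if both readings equal `expected_size`, 0 if either
 * differs, and -1 if the file cannot be stat'ed.
 */
int size_is_stable(const char* path, long expected_size,
                   unsigned delay_seconds) {
  assert(path != NULL);
  struct stat before, after;
  if (stat(path, &before) != 0)
    return -1;
  sleep(delay_seconds);
  if (stat(path, &after) != 0)
    return -1;
  if (before.st_size != expected_size || after.st_size != expected_size) {
    fprintf(stderr, "%s: expected %ld bytes, saw %ld then %ld\n", path,
            expected_size, (long)before.st_size, (long)after.st_size);
    return 0;
  }
  return 1;
}
```

For the run with argument 4096, size_is_stable("testfile", 4097, 5) would be expected to return 0 given the sizes reported above.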

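The conclusion from this simple test is that HDFS NFS handles simple file reads and writes and the common shell commands, but it must never be treated as a local filesystem and accessed directly by programs.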
