

           HDFS Metadata Management in Practice

                                   Author: 尹正傑 (Yin Zhengjie)

Copyright notice: This is an original work; reproduction is not permitted, and violations will be pursued legally.

 

 

 

I. HDFS Metadata Overview

1>. What is HDFS metadata?

  The NameNode's primary job is to store the HDFS namespace. HDFS metadata (the HDFS namespace) is the hierarchy of files and directories represented by inodes, which store attributes such as permissions, modification and access times, and disk-space quotas. The namespace also includes the mapping from files to block IDs.

  The NameNode stores the HDFS metadata, while the DataNodes store the actual HDFS data. When clients connect to Hadoop to read or write data, they first contact the NameNode to learn where the data blocks are actually stored, or which DataNodes to write their data to.

  HDFS metadata includes the following information:
    (1) HDFS file locations (persisted);
    (2) HDFS file ownership and permissions (persisted);
    (3) Names of the HDFS data blocks (persisted);
    (4) Locations of the HDFS data blocks (not persisted; held only in memory and reported to the NameNode by all of the cluster's DataNodes).

  Note:
    The fsimage metadata file contains all of the metadata listed above except item 4.

2>. Checkpoints

  The NameNode maintains the namespace tree, as well as the mapping of data blocks to DataNodes in the cluster. The inodes together with the block lists define the namespace metadata, called the image (fsimage).

  The NameNode keeps the entire image in memory and also stores a record of the image on the NameNode's file system. This persistent record of the namespace is called a checkpoint.

  The NameNode writes changes to the HDFS file system to a log called the edit log. Importantly, a checkpoint is only ever changed when the NameNode starts up, when a user requests one, or when the Secondary NameNode or Standby NameNode creates a new one; otherwise the NameNode does not modify its checkpoint while it is running.

  When the NameNode starts, it initializes the namespace image from the checkpoint on disk and replays all of the changes recorded in the edit log. Before it begins serving clients, it creates a new checkpoint (fsimage file) and an empty edit log file.

  Note:
    The fsimage file contains the mapping between HDFS files and the data blocks stored on the DataNodes. If this file is lost or corrupted, the HDFS data stored on the DataNodes becomes inaccessible, as if all of it had vanished!

3>. fsimage and the edit log

  When a client writes data to HDFS, the write operation changes the HDFS metadata, and these changes are recorded by the NameNode in the edit log. At the same time, the NameNode also updates its in-memory metadata.

  Each client transaction is recorded by the NameNode in a write-ahead log, and the NameNode flushes and syncs the edit log before sending an acknowledgment to the client.

  The NameNode handles requests from many clients across the cluster, so to optimize the process of persisting these transactions to disk, it batches multiple client transactions together.

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/13364108.html

 

II. Downloading the Latest fsimage File

1>. Where the fsimage and edit log are stored

  The fsimage and the edit log are the two key structures associated with HDFS metadata. The NameNode stores these two structures under the paths specified by the hdfs-site.xml configuration parameters "dfs.namenode.name.dir" (image files) and "dfs.namenode.edits.dir" (edit files).

  The sketch below shows how to locate the directories specified by these two parameters.
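  A minimal sketch for finding and inspecting these directories; the getconf subcommands are standard Hadoop, but the path printed depends entirely on your own hdfs-site.xml, so the ls target below is a placeholder:

hdfs getconf -confKey dfs.namenode.name.dir      # print the configured image directory
hdfs getconf -confKey dfs.namenode.edits.dir     # print the configured edit-log directory
ls <value-of-dfs.namenode.name.dir>/current      # fsimage_*, edits_*, seen_txid and VERSION live here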

  Note:
    The Secondary NameNode (or Standby NameNode) has the same file structure.
    Also note that the edit log is made up of multiple edit segments, each a file whose name starts with "edits_*"; fsimage files, naturally, start with "fsimage_*".

2>. Downloading the latest image (fsimage) file

[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -help fetchImage 
-fetchImage <local directory>:
    Downloads the most recent fsimage from the Name Node and saves it in    the specified local directory.

[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 0
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -fetchImage ./          # download the latest image file into the current directory
20/09/02 00:54:34 INFO namenode.TransferFsImage: Opening connection to http://hadoop101.yinzhengjie.com:50070/imagetransfer?getimage=1&txid=latest
20/09/02 00:54:34 INFO common.Util: Combined time for fsimage download and fsync to all disks took 0.00s. The fsimage download took 0.00s at 3000.00 KB/s. Synchronous (fsync) write to disk of /root/./fsimage_0000000000000004419 took 0.00s.
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# ll
total 4
-rw-r--r-- 1 root root 3731 Sep  2 00:54 fsimage_0000000000000004419
[root@hadoop105.yinzhengjie.com ~]# 

3>. Notes

  The NameNode stores only file-system metadata, such as the file, block, directory, and permission information in the on-disk fsimage file; it keeps the actual block location information in memory.

  When a client reads data, the NameNode tells the client where the file's blocks are located. From that point on, the client needs no further communication with the NameNode for the data transfer itself.

  Because of the critical nature of NameNode metadata, you should configure multiple directories as the value of the dfs.namenode.name.dir parameter. Ideally, one of them should be an NFS mount point, which guarantees redundancy of the metadata.
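  For illustration, a hedged hdfs-site.xml sketch of redundant name directories; both paths below are assumptions made up for this example:

<!-- hdfs-site.xml: the local disk path and the NFS mount point are illustrative only. -->
<property>
  <name>dfs.namenode.name.dir</name>
  <!-- A comma-separated list; the NameNode writes identical copies of its metadata to every directory listed. -->
  <value>/data/hadoop/hdfs/nn,/mnt/nfs/hadoop/hdfs/nn</value>
</property>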

 

III. The Offline Image Viewer ("oiv") and the Offline Edits Viewer ("oev")

1>. How do you view the contents of an image file?

  After downloading an image file, how do you view its contents? The downloaded image file is a binary file, so you cannot inspect it with a text tool (if you try anyway, you will not get any meaningful information out of it).

  You can use the Offline Image Viewer ("oiv") to view the contents of an fsimage file and thus understand the cluster's namespace. The tool converts the contents of the fsimage file into a human-readable format, and it also allows you to inspect the HDFS namespace through a read-only WebHDFS API.

  In production, fsimage files are usually quite large (once your cluster's data volume reaches the petabyte level, the image file can easily reach gigabytes), and oiv helps you process their contents quickly. As the name suggests, it lets you examine image files offline.

  Note:
    We showed above how to download the image file, but observant readers will have noticed that if you can log in to the NameNode host directly, there is no need to download it at all; just copy it from the corresponding storage directory. Whatever you do, do not try to modify it! (If you insist on doing so, make a backup first.)
 

2>. Overview of the common oiv image processors

  The oiv utility will attempt to parse correctly formed image files and will abort if the image file is malformed.

  The tool works offline and does not require a running cluster in order to process an image file.

  The following five common image processors are provided:
    XML:
      This processor creates an XML document with the contents of the fsimage enumerated, suitable for further analysis with XML tools.

    ReverseXML:
      This processor takes an XML file and creates a binary fsimage containing the same elements; in plain terms, the reverse of the XML processor. It is rarely needed in practice (when I fed the xml file generated by the XML processor above back in as input to convert it into an image file, I saw signs of failure).

    FileDistribution:
      This processor analyzes the file size distribution in the image file, and can be combined with the following options:
        -maxSize:
          Specifies the range [0, maxSize] of file sizes to analyze (128GB by default).
        -step:
          Defines the granularity of the distribution (2MB by default).
        -format:
          Formats the output in a human-readable fashion rather than as a number of bytes (false by default).

    Web:
      Runs a viewer that exposes a read-only WebHDFS API. Use "-addr" to specify the address to listen on (the default is "localhost:5978").

    Delimited (experimental):
      Generates a text file with all of the elements common to both inodes and inodes-under-construction, separated by a delimiter (\t by default, changeable via the -delimiter argument).

3>. Using oiv to inspect a Hadoop image file (with the "XML" processor)

[root@hadoop101.yinzhengjie.com ~]# hdfs oiv --help
Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE
Offline Image Viewer
View a Hadoop fsimage INPUTFILE using the specified PROCESSOR,
saving the results in OUTPUTFILE.

The oiv utility will attempt to parse correctly formed image files
and will abort fail with mal-formed image files.

The tool works offline and does not require a running cluster in
order to process an image file.

The following image processors are available:
  * XML: This processor creates an XML document with all elements of
    the fsimage enumerated, suitable for further analysis by XML
    tools.
  * ReverseXML: This processor takes an XML file and creates a
    binary fsimage containing the same elements.
  * FileDistribution: This processor analyzes the file size
    distribution in the image.
    -maxSize specifies the range [0, maxSize] of file sizes to be
     analyzed (128GB by default).
    -step defines the granularity of the distribution. (2MB by default)
    -format formats the output result in a human-readable fashion
     rather than a number of bytes. (false by default)
  * Web: Run a viewer to expose read-only WebHDFS API.
    -addr specifies the address to listen. (localhost:5978 by default)
  * Delimited (experimental): Generate a text file with all of the elements common
    to both inodes and inodes-under-construction, separated by a
    delimiter. The default delimiter is \t, though this may be
    changed via the -delimiter argument.

Required command line arguments:
-i,--inputFile <arg>   FSImage or XML file to process.

Optional command line arguments:
-o,--outputFile <arg>  Name of output file. If the specified
                       file exists, it will be overwritten.
                       (output to stdout by default)
                       If the input file was an XML file, we
                       will also create an <outputFile>.md5 file.
-p,--processor <arg>   Select which type of processor to apply
                       against image file. (XML|FileDistribution|
                       ReverseXML|Web|Delimited)
                       The default is Web.
-delimiter <arg>       Delimiting string to use with Delimited processor.  
-t,--temp <arg>        Use temporary dir to cache intermediate result to generate
                       Delimited outputs. If not set, Delimited processor constructs
                       the namespace in memory before outputting text.
-h,--help              Display usage information and exit

[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 4
-rw-r--r-- 1 root root 3731 Sep  2 00:52 fsimage_0000000000000004419
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs oiv -i fsimage_0000000000000004419 -o yinzhengjie-fsimage.xml -p XML
20/09/15 13:50:16 INFO offlineImageViewer.FSImageHandler: Loading 3 strings
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 24
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# head yinzhengjie-fsimage.xml 
<?xml version="1.0"?>
<fsimage><version><layoutVersion>-63</layoutVersion><onDiskVersion>1</onDiskVersion><oivRevision>e2f1f118e465e787d8567dfa6e2f3b72a0eb9194</oivRevision></version>
<NameSection><namespaceId>1933296438</namespaceId><genstampV1>1000</genstampV1><genstampV2>1932</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1073741902</lastAllocat
edBlockId><txid>4419</txid></NameSection><INodeSection><lastInodeId>16506</lastInodeId><numInodes>34</numInodes><inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1598003240154</mtime><permission>root:admingroup:0755</
permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode><inode><id>16390</id><type>DIRECTORY</type><name>yinzhengjie</name><mtime>1598917806068</mtime><permission>root:admingroup:0755</permission><nsquota>50</nsquota><dsquota>10737418240</dsquot
a></inode><inode><id>16410</id><type>DIRECTORY</type><name>user</name><mtime>1597403986865</mtime><permission>root:admingroup:0700</permission><nsquota>-1</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16411</id><type>DIRECTORY</type><name>root</name><mtime>1598916134456</mtime><permission>root:admingroup:0700</permission><nsquota>50</nsquota><dsquota>10737418240</dsquota></ino
de><inode><id>16412</id><type>DIRECTORY</type><name>.Trash</name><mtime>1598916600051</mtime><permission>root:admingroup:0700</permission><nsquota>-1</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16423</id><type>DIRECTORY</type><name>yum.repos.d</name><mtime>1597417995606</mtime><permission>root:admingroup:0755</permission><nsquota>-1</nsquota><dsquota>-1</dsquota></inode
><inode><id>16424</id><type>FILE</type><name>CentOS-Base.repo</name><replication>2</replication><mtime>1597417995351</mtime><atime>1598855537752</atime><preferredBlockSize>536870912</preferr
edBlockSize><permission>root:admingroup:0644</permission><blocks><block><id>1073741839</id><genstamp>1015</genstamp><numBytes>1664</numBytes></block>[root@hadoop101.yinzhengjie.com ~]# 
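  Since the XML processor emits a handful of very long lines (as the head output above shows), pretty-printing makes the file easier to read. A quick sketch, assuming the xmllint tool from libxml2 is installed, which is not guaranteed on every host:

xmllint --format yinzhengjie-fsimage.xml | head -20              # pretty-print the fsimage XML
grep -o '<name>[^<]*</name>' yinzhengjie-fsimage.xml | head      # quick look at the inode names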

4>. Using oiv to inspect a Hadoop image file (with the "Delimited" processor)

[root@hadoop101.yinzhengjie.com ~]# ll
total 24
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs oiv -i fsimage_0000000000000004419 -o yinzhengjie-fsimage.txt -p Delimited
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Loading string table
20/09/15 14:19:34 INFO offlineImageViewer.FSImageHandler: Loading 3 strings
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Loading inode references
20/09/15 14:19:34 INFO offlineImageViewer.FSImageHandler: Loading inode references
20/09/15 14:19:34 INFO offlineImageViewer.FSImageHandler: Loaded 4 inode references
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Loading directories
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Loading directories in INode section.
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Found 19 directories in INode section.
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Finished loading directories in 14ms
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Loading INode directory section.
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Scanned 12 INode directories to build namespace.
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Finished loading INode directory section in 2ms
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Found 34 INodes in the INode section
20/09/15 14:19:34 WARN offlineImageViewer.PBImageTextWriter: Ignored 1 nodes, including 1 in snapshots. Please turn on debug log for details
20/09/15 14:19:34 INFO offlineImageViewer.PBImageTextWriter: Outputted 34 INodes.
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 32
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root  4104 Sep 15 14:19 yinzhengjie-fsimage.txt
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# head -5 yinzhengjie-fsimage.txt
Path    Replication    ModificationTime    AccessTime    PreferredBlockSize    BlocksCount    FileSize    NSQUOTA    DSQUOTA    Permission    UserName    GroupName
/    0    2020-08-21 17:47    1970-01-01 08:00    0    0    0    9223372036854775807    -1    drwxr-xr-x    root    admingroup
/yinzhengjie    0    2020-09-01 07:50    1970-01-01 08:00    0    0    0    50    10737418240    drwxr-xr-x    root    admingroup
/user    0    2020-08-14 19:19    1970-01-01 08:00    0    0    0    -1    -1    drwx------    root    admingroup
/user/root    0    2020-09-01 07:22    1970-01-01 08:00    0    0    0    50    10737418240    drwx------    root    admingroup
[root@hadoop101.yinzhengjie.com ~]# 
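  The tab-separated output lends itself to ad-hoc analysis with ordinary text tools. A small sketch (field 7 is FileSize, per the header above):

awk -F'\t' 'NR>1 {sum += $7} END {print sum, "bytes of file data in the namespace"}' yinzhengjie-fsimage.txt
tail -n +2 yinzhengjie-fsimage.txt | sort -t$'\t' -k7,7nr | head   # the ten largest entries (skipping the header)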

5>. Using oiv to inspect a Hadoop image file (with the "Web" processor)

[root@hadoop101.yinzhengjie.com ~]# hdfs dfs -ls /yinzhengjie/        # first confirm that the cluster really has this file, then download the latest image file locally and proceed with the steps below
Found 1 items
-rw-r--r--   3 root admingroup        371 2020-08-31 18:07 /yinzhengjie/hosts
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hostname -i
172.200.6.101
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 32
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root  4104 Sep 15 14:19 yinzhengjie-fsimage.txt
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs oiv -i fsimage_0000000000000004419 -p Web -addr 172.200.6.101:5978
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loading 3 strings
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loading 34 inodes.
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loading inode references
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loaded 4 inode references
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loading inode directory section
20/09/15 14:59:52 INFO offlineImageViewer.FSImageHandler: Loaded 12 directories
20/09/15 14:59:52 INFO offlineImageViewer.WebImageViewer: WebImageViewer started. Listening on /172.200.6.101:5978. Press Ctrl+C to stop the viewer.
20/09/15 14:59:57 INFO offlineImageViewer.FSImageHandler: op=GETFILESTATUS target=/yinzhengjie/hosts
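  While the viewer is running (note the GETFILESTATUS entry in its log above), you can query the namespace from another terminal through the read-only WebHDFS API; both forms below are standard, though the address matches this particular run:

hdfs dfs -ls webhdfs://172.200.6.101:5978/yinzhengjie             # list against the viewer, not the live NameNode
curl -s "http://172.200.6.101:5978/webhdfs/v1/yinzhengjie/hosts?op=GETFILESTATUS"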

6>. Using oev to inspect a Hadoop edit log file

[root@hadoop101.yinzhengjie.com ~]# hdfs oev --help
Usage: bin/hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Offline edits viewer
Parse a Hadoop edits log file INPUT_FILE and save results
in OUTPUT_FILE.
Required command line arguments:
-i,--inputFile <arg>   edits file to process, xml (case
                       insensitive) extension means XML format,
                       any other filename means binary format.
                       XML/Binary format input file is not allowed
                       to be processed by the same type processor.
-o,--outputFile <arg>  Name of output file. If the specified
                       file exists, it will be overwritten,
                       format of the file is determined
                       by -p option

Optional command line arguments:
-p,--processor <arg>   Select which type of processor to apply
                       against image file, currently supported
                       processors are: binary (native binary format
                       that Hadoop uses), xml (default, XML
                       format), stats (prints statistics about
                       edits file)
-h,--help              Display usage information and exit
-f,--fix-txids         Renumber the transaction IDs in the input,
                       so that there are no gaps or invalid
                       transaction IDs.
-r,--recover           When reading binary edit logs, use recovery 
                       mode.  This will give you the chance to skip 
                       corrupt parts of the edit log.
-v,--verbose           More verbose output, prints the input and
                       output filenames, for processors that write
                       to a file, also output to screen. On large
                       image files this will dramatically increase
                       processing time (default is false).


Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll /yinzhengjie/data/hadoop/fully-mode/hdfs/nn1/edits/current/ | tail
-rw-r--r-- 1 root root     193 Sep  2 01:58 edits_0000000000000004422-0000000000000004425
-rw-r--r-- 1 root root      42 Sep  2 02:58 edits_0000000000000004426-0000000000000004427
-rw-r--r-- 1 root root      42 Sep  2 03:58 edits_0000000000000004428-0000000000000004429
-rw-r--r-- 1 root root      42 Sep  2 04:59 edits_0000000000000004430-0000000000000004431
-rw-r--r-- 1 root root      42 Sep  2 05:59 edits_0000000000000004432-0000000000000004433
-rw-r--r-- 1 root root 1048576 Sep  2 05:59 edits_0000000000000004434-0000000000000004434
-rw-r--r-- 1 root root     279 Sep 15 14:44 edits_0000000000000004435-0000000000000004439
-rw-r--r-- 1 root root 1048576 Sep 15 14:44 edits_inprogress_0000000000000004440
-rw-r--r-- 1 root root       5 Sep 15 14:44 seen_txid
-rw-r--r-- 1 root root     217 Sep 15 13:45 VERSION
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 32
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root  4104 Sep 15 14:19 yinzhengjie-fsimage.txt
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# hdfs oev -i /yinzhengjie/data/hadoop/fully-mode/hdfs/nn1/edits/current/edits_inprogress_0000000000000004440 -o yinzhengjie-edits.xml -p XML
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# ll
total 36
-rw-r--r-- 1 root root  3731 Sep  2 00:52 fsimage_0000000000000004419
-rw-r--r-- 1 root root   205 Sep 15 15:13 yinzhengjie-edits.xml
-rw-r--r-- 1 root root  4104 Sep 15 14:19 yinzhengjie-fsimage.txt
-rw-r--r-- 1 root root 18402 Sep 15 13:50 yinzhengjie-fsimage.xml
[root@hadoop101.yinzhengjie.com ~]# 
[root@hadoop101.yinzhengjie.com ~]# sz yinzhengjie-edits.xml 

[root@hadoop101.yinzhengjie.com ~]# 
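  Besides the XML processor, oev's stats processor (listed in the help output above) prints a per-opcode summary of an edits file, which is handy for a quick sanity check. A sketch against one of the finalized segments listed earlier; writing to /dev/stdout is a convenience that assumes a Linux shell:

hdfs oev -i /yinzhengjie/data/hadoop/fully-mode/hdfs/nn1/edits/current/edits_0000000000000004435-0000000000000004439 -o /dev/stdout -p stats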

 

IV. Backing Up and Restoring NameNode Metadata

  Because NameNode metadata is critical to cluster operation, it should be backed up regularly. Let's walk through how to back it up.

1>. Put the cluster into safe mode

  For details on HDFS safe mode, see my earlier notes; I won't repeat them here.

  Recommended reading:
    https://www.cnblogs.com/yinzhengjie2020/p/13378967.html
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode get
Safe mode is OFF
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode enter
Safe mode is ON
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode get
Safe mode is ON
[root@hadoop105.yinzhengjie.com ~]# 

2>. Backing up metadata with the "-saveNamespace" command (manually generating a new checkpoint, i.e., a new fsimage file)

  Note:
    The "-saveNamespace" command saves the current namespace (the HDFS metadata held in memory) to disk and resets the edit log.

    Because this command requires HDFS to be in safe mode, enter safe mode before running "-saveNamespace", and remember to leave safe mode again once it has finished so the cluster can resume serving writes.
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -help saveNamespace
-saveNamespace:    Save current namespace into storage directories and reset edits log.
        Requires safe mode.

[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode get
Safe mode is ON
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -saveNamespace
Save namespace successful
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode get
Safe mode is ON
[root@hadoop105.yinzhengjie.com ~]# 
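  Once the checkpoint has been written, a new fsimage_<txid> file appears under the name directory's current/ subdirectory and the cluster can be taken out of safe mode. A sketch; substitute your own dfs.namenode.name.dir value for the placeholder:

ls <value-of-dfs.namenode.name.dir>/current | grep fsimage        # the freshly saved checkpoint
hdfs dfsadmin -safemode leave                                     # resume normal operation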

3>. Saving the NameNode's primary data structures to a text file with the "-metasave" command

  Note:
    In this example, the dfsadmin -metasave command saves the data structures listed in the help text below to a file named "yinzhengjie-meta.log" under the NameNode's "${HADOOP_HOME}/logs/" directory.

    Editing metadata files by hand is a bad habit, because it can cost you data. For example, the VERSION file stores the namespace information under which the DataNodes registered with the NameNode; this is a built-in safety mechanism that matches DataNodes to the NameNode by namespace ID, and editing the VERSION file can corrupt your data.
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -help metasave
-metasave <filename>:     Save Namenode's primary data structures
        to <filename> in the directory specified by hadoop.log.dir property.
        <filename> is overwritten if it exists.
        <filename> will contain one line for each of the following
            1. Datanodes heart beating with Namenode
            2. Blocks waiting to be replicated
            3. Blocks currrently being replicated
            4. Blocks waiting to be deleted

[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -safemode get
Safe mode is ON
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# hdfs dfsadmin -metasave yinzhengjie-meta.log
Created metasave file yinzhengjie-meta.log in the log directory of namenode hdfs://hadoop101.yinzhengjie.com:9000
[root@hadoop105.yinzhengjie.com ~]# 
[root@hadoop105.yinzhengjie.com ~]# 
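  Note that the file is created on the NameNode host (hadoop101 here, as the confirmation message says), not on the machine that issued the command. A sketch for inspecting it there; the exact log directory is an assumption based on the hadoop.log.dir default:

head ${HADOOP_HOME}/logs/yinzhengjie-meta.log                     # run this on the NameNode host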

 

V. Using the getconf Command to Retrieve NameNode Configuration

  You can use the hdfs getconf utility to retrieve NameNode configuration information, as sketched below.
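  A few standard getconf invocations; the subcommands are part of stock Hadoop, and which keys you query is up to you:

hdfs getconf -namenodes                                           # list the NameNodes in the cluster
hdfs getconf -secondaryNameNodes                                  # list the Secondary NameNodes
hdfs getconf -confKey dfs.namenode.name.dir                       # print the value of any configuration key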

 

