HDFS Operations Manual


The hdfscli command line

# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be
                                specified as LOCAL_PATH to stream it to
                                standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv <weights.tsv

HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.

  

To use hdfscli, you first need to set up its default configuration file:

# cat ~/.hdfscli.cfg 
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
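
The configuration file can hold several aliases; default.alias selects the one used when -a/--alias is not given. A sketch with a second, assumed cluster (the prod alias, its URL, and its user are made-up values for illustration):

```
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root

[prod.alias]
url = http://prod-namenode:50070
user = hdfs
```

With such a file in place, hdfscli -a prod would connect to the second cluster instead of the default.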

  Client classes available from Python:

    InsecureClient (default)

    TokenClient

 

 Uploading and downloading files

Upload a file or folder with hdfscli (here, the hadoop directory is uploaded to /hdfs):

  # hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs

Download the HDFS /logs directory into the local /root/test directory:

  # hdfscli download /logs /root/test/

 

hdfscli interactive mode

[root@hadoop ~]# hdfscli --alias=dev

Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.

>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")  
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0,
 u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L, 
 u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root', 
 u'type': u'DIRECTORY', u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")         
True
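
Note that the first delete call returned False: a path without a leading slash is resolved relative to the client's root (typically the connecting user's home directory), so logs/install.log did not exist there. Also, status() returns a plain WebHDFS FileStatus dictionary; as a cluster-free illustration of its fields, the snippet below formats the exact dict from the session above (fmt_status is a hypothetical helper, not part of the library):

```python
# FileStatus dict copied verbatim from the CLIENT.status("/Demo") call above.
status = {
    "group": "supergroup", "permission": "755", "blockSize": 0,
    "accessTime": 0, "pathSuffix": "", "modificationTime": 1495123035501,
    "replication": 0, "length": 0, "childrenNum": 1, "owner": "root",
    "type": "DIRECTORY", "fileId": 16389,
}

def fmt_status(s):
    # Render a FileStatus dict as a one-line ls-style summary.
    return "{type} {permission} {owner}:{group} len={length}".format(**s)

print(fmt_status(status))  # DIRECTORY 755 root:supergroup len=0
```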

  

Python API bindings

  Initializing a client

  1. Import a client class and call its constructor:

>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070",user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']

  2. Import the Config class to load an existing configuration file and create a client from a defined alias; by default the configuration is read from ~/.hdfscli.cfg:

>>> from hdfs import Config
>>> client=Config().get_client("dev")
>>> client.list("/")   
[u'Demo', u'hdfs', u'logs', u'logss']

  

  Reading files

  The read() method reads a file from HDFS; it must be used inside a with block to ensure the connection is closed properly every time:

>>> with client.read("/logs/yarn-env.sh",encoding="utf-8") as reader:
...   features=reader.read()
... 
>>> print features

  The chunk_size parameter makes read() return a generator that streams the file's contents in chunks:

>>> with client.read("/logs/yarn-env.sh",chunk_size=1024) as reader:
...   for chunk in reader:
...      print chunk
... 

  The delimiter parameter also returns a generator; the file's contents are split on the given separator:

>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...   for line in reader:
...     time.sleep(1)
...     print line
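
To get a feel for what delimiter-based splitting does without a cluster, the same kind of generator can be mimicked on an in-memory stream. Here io.StringIO stands in for the HDFS reader, and split_on is an illustrative stand-in, not the library's implementation:

```python
import io

def split_on(reader, delimiter):
    # Yield chunks of the stream separated by delimiter,
    # roughly like read(..., delimiter=...).
    buf = ""
    for block in iter(lambda: reader.read(1024), ""):
        buf += block
        while delimiter in buf:
            piece, buf = buf.split(delimiter, 1)
            yield piece
    if buf:
        yield buf

reader = io.StringIO("export JAVA_HOME=/usr\nexport YARN_LOG_DIR=/logs\n")
lines = list(split_on(reader, "\n"))
print(lines)  # ['export JAVA_HOME=/usr', 'export YARN_LOG_DIR=/logs']
```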

  Writing files

The write() method writes data to HDFS (here, lines of the local file kong.txt that start with "-" are written to /logs/kongtest.txt on HDFS):

>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...   for line in reader:
...     if line.startswith("-"):
...       writer.write(line)
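
The same filtering logic can be tried without a cluster by swapping io.StringIO in for both the local file and the HDFS writer (an illustration only; against a real cluster the client.write call above is what opens the HDFS side):

```python
import io

# io.StringIO stands in for open(...) and client.write(...) here.
reader = io.StringIO("-a keep\nskip this\n-b keep too\n")
writer = io.StringIO()

for line in reader:
    # Copy only the lines that start with "-", as in the session above.
    if line.startswith("-"):
        writer.write(line)

print(writer.getvalue())  # prints "-a keep" and "-b keep too"
```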

  

 

