The hdfscli command line
# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be
                                specified as LOCAL_PATH to stream it to
                                standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv <weights.tsv

HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.
To use hdfscli, you first need to set up its default configuration file:
# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
Client classes available from the Python API:
InsecureClient (the default)
TokenClient
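An alias section can select one of these classes with the client option; any other options in the section are passed on to that class's constructor. A minimal sketch of a TokenClient alias (the token value is a placeholder):

[token.alias]
url = http://hadoop:50070
client = TokenClient
token = <delegation-token>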
Uploading and downloading files
Upload a file or folder with hdfscli (here, uploading the hadoop configuration folder to /hdfs):
# hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs

Download the /logs directory from HDFS into the local /root/test directory:
# hdfscli download /logs /root/test/
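For directories containing many files, the transfer can be parallelized with the -t flag described in the help text above (the thread count of 4 here is arbitrary):

# hdfscli download -t 4 /logs /root/test/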
hdfscli interactive mode
[root@hadoop ~]# hdfscli --alias=dev
Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.
>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0,
u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L,
u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root',
u'type': u'DIRECTORY', u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")
True
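The first delete() returns False because relative paths are resolved against the connecting user's home directory (/user/root here), where logs/install.log does not exist, and delete() reports a missing target by returning False rather than raising an error. To check for existence explicitly, status() takes a strict flag; a small sketch, where strict=False returns None for missing paths instead of raising:

>>> CLIENT.status("logs/install.log", strict=False)  # resolved as /user/root/logs/install.log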
Bindings to the Python API
Initializing a client
1. Import a client class and call its constructor directly:
>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070", user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
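InsecureClient forwards extra keyword arguments to the underlying Client constructor, so options such as root (the path against which relative HDFS paths are resolved; it defaults to the user's home directory) can also be set here. A minimal sketch against the same namenode:

>>> client = InsecureClient("http://172.10.236.21:50070", user='ann', root='/user/ann')
>>> client.list(".")  # relative paths now resolve under /user/ann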
2. Import the Config class, load an existing configuration file, and create a client from one of its aliases; by default the configuration is read from ~/.hdfscli.cfg (the file set up above):
>>> from hdfs import Config
>>> client=Config().get_client("dev")
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
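Config also accepts an explicit path (and honors the HDFSCLI_CONFIG environment variable), and calling get_client() with no argument falls back to the default.alias entry. A minimal sketch, assuming a configuration file at the hypothetical location /etc/hdfscli.cfg:

>>> from hdfs import Config
>>> client = Config("/etc/hdfscli.cfg").get_client()  # uses default.alias, i.e. "dev"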
Reading files
The read() method reads a file from HDFS, but it must be used inside a with block to ensure the connection is properly closed every time:
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:
...     features = reader.read()
...
>>> print features
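read() also accepts offset and length parameters for retrieving just part of a file; a minimal sketch reading only the first kilobyte:

>>> with client.read("/logs/yarn-env.sh", offset=0, length=1024, encoding="utf-8") as reader:
...     head = reader.read()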
The chunk_size parameter makes read() return a generator instead, streaming the file's contents as successive chunks:
>>> with client.read("/logs/yarn-env.sh", chunk_size=1024) as reader:
...     for chunk in reader:
...         print chunk
...
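With chunk_size set, a progress callback may also be supplied; it is called with the file path and the cumulative byte count after each chunk, and once with -1 as the second argument when the transfer completes. A minimal sketch:

>>> def report(path, nbytes):
...     print path, nbytes  # nbytes is -1 once the transfer finishes
...
>>> with client.read("/logs/yarn-env.sh", chunk_size=1024, progress=report) as reader:
...     for chunk in reader:
...         pass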
The delimiter parameter likewise returns a generator, yielding the file's contents split on the specified delimiter (this requires encoding to be set as well):
>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...     for line in reader:
...         time.sleep(1)
...         print line
Writing files
The write() method writes data to HDFS (here, writing the lines of the local file kong.txt that start with "-" into /logs/kongtest.txt on HDFS):
>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...     for line in reader:
...         if line.startswith("-"):
...             writer.write(line)
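write() can also be handed the data directly (a string, generator, or file object) and supports overwrite and append flags; appending requires the target file to already exist. A minimal sketch appending one more line to the file written above:

>>> client.write("/logs/kongtest.txt", "-- one more line\n", encoding="utf-8", append=True)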
