The hdfscli command line
# hdfscli --help
HdfsCLI: a command line interface for HDFS.

Usage:
  hdfscli [interactive] [-a ALIAS] [-v...]
  hdfscli download [-fsa ALIAS] [-v...] [-t THREADS] HDFS_PATH LOCAL_PATH
  hdfscli upload [-sa ALIAS] [-v...] [-A | -f] [-t THREADS] LOCAL_PATH HDFS_PATH
  hdfscli -L | -V | -h

Commands:
  download                      Download a file or folder from HDFS. If a
                                single file is downloaded, - can be specified
                                as LOCAL_PATH to stream it to standard out.
  interactive                   Start the client and expose it via the python
                                interpreter (using iPython if available).
  upload                        Upload a file or folder to HDFS. - can be
                                specified as LOCAL_PATH to read from standard
                                in.

Arguments:
  HDFS_PATH                     Remote HDFS path.
  LOCAL_PATH                    Path to local file or directory.

Options:
  -A --append                   Append data to an existing file. Only supported
                                if uploading a single file or from standard in.
  -L --log                      Show path to current log file and exit.
  -V --version                  Show version and exit.
  -a ALIAS --alias=ALIAS        Alias of namenode to connect to.
  -f --force                    Allow overwriting any existing files.
  -s --silent                   Don't display progress status.
  -t THREADS --threads=THREADS  Number of threads to use for parallelization.
                                0 allocates a thread per file. [default: 0]
  -v --verbose                  Enable log output. Can be specified up to three
                                times (increasing verbosity each time).

Examples:
  hdfscli -a prod /user/foo
  hdfscli download features.avro dat/
  hdfscli download logs/1987-03-23 - >>logs
  hdfscli upload -f - data/weights.tsv <weights.tsv

HdfsCLI exits with return status 1 if an error occurred and 0 otherwise.
To use hdfscli, first set up its default configuration file:
# cat ~/.hdfscli.cfg
[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root
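Several clusters can be registered in the same file by adding one <name>.alias section per namenode; the prod section below is a hypothetical second cluster, included only to show the layout:

[global]
default.alias = dev

[dev.alias]
url = http://hadoop:50070
user = root

[prod.alias]
url = http://hadoop-prod:50070
user = root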
Client classes available from Python (see the TokenClient sketch after this list):
InsecureClient (the default)
TokenClient
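The examples below all use InsecureClient; a TokenClient is constructed the same way, authenticating with a Hadoop delegation token instead of a user name. A minimal sketch (the token value is a placeholder):

>>> from hdfs import TokenClient
>>> # "<delegation-token>" stands in for a real HDFS delegation token
>>> client = TokenClient("http://hadoop:50070", token="<delegation-token>")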
Uploading and downloading files
Upload a file or folder with hdfscli (here, uploading the hadoop folder to /hdfs):
# hdfscli upload --alias=dev -f /hadoop-2.4.1/etc/hadoop/ /hdfs
Download the HDFS directory /logs to the local directory /root/test with hdfscli:
# hdfscli download /logs /root/test/
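The options documented in the help text can be combined; for example, a forced, multi-threaded download might look like this (the thread count is arbitrary):

# hdfscli download --alias=dev --force --threads=4 /logs /root/test/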
hdfscli interactive mode
[root@hadoop ~]# hdfscli --alias=dev

Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.

>>> CLIENT.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
>>> CLIENT.status("/Demo")
{u'group': u'supergroup', u'permission': u'755', u'blockSize': 0, u'accessTime': 0, u'pathSuffix': u'', u'modificationTime': 1495123035501L, u'replication': 0, u'length': 0, u'childrenNum': 1, u'owner': u'root', u'type': u'DIRECTORY', u'fileId': 16389}
>>> CLIENT.delete("logs/install.log")
False
>>> CLIENT.delete("/logs/install.log")
True
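CLIENT is an ordinary hdfs library client, so the rest of its API is available from the same shell. (Note that the first delete() above returned False because the relative path logs/install.log was resolved against the connecting user's home directory, where no such file existed; the absolute path succeeded.) A short sketch of a few other calls, with illustrative paths:

>>> CLIENT.makedirs("/tmp/demo")              # create a directory, like mkdir -p
>>> CLIENT.rename("/tmp/demo", "/tmp/demo2")  # move or rename a path
>>> CLIENT.content("/Demo")                   # space consumed, file and directory counts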
Bindings to the Python API
Initializing a client
1. Import the client class and call its constructor directly:
>>> from hdfs import InsecureClient
>>> client = InsecureClient("http://172.10.236.21:50070", user='ann')
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
2. Import the Config class to load an existing configuration file and create a client from one of its aliases; by default the configuration is read from ~/.hdfscli.cfg (the file created above):
>>> from hdfs import Config
>>> client = Config().get_client("dev")
>>> client.list("/")
[u'Demo', u'hdfs', u'logs', u'logss']
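Since the configuration file above sets default.alias = dev, the alias argument can also be omitted, in which case the default alias is used:

>>> from hdfs import Config
>>> client = Config().get_client()  # falls back to [global] default.alias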
Reading files
The read() method reads a file from HDFS. It must be used inside a with block, which guarantees that the connection is closed properly every time:
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8") as reader:
...     features = reader.read()
...
>>> print features
The chunk_size parameter makes read() return a generator instead, streaming the file's contents in chunks:
>>> with client.read("/logs/yarn-env.sh", chunk_size=1024) as reader:
...     for chunk in reader:
...         print chunk
...
The delimiter parameter also returns a generator, yielding the file's contents split on the given delimiter:
>>> import time
>>> with client.read("/logs/yarn-env.sh", encoding="utf-8", delimiter="\n") as reader:
...     for line in reader:
...         time.sleep(1)
...         print line
...
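read() also accepts offset and length arguments for fetching just part of a file; a minimal sketch (the byte counts are arbitrary):

>>> # read only the first 100 bytes of the file
>>> with client.read("/logs/yarn-env.sh", offset=0, length=100) as reader:
...     head = reader.read()
...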
Writing files
The write() method writes to a file on HDFS (here, writing the local file kong.txt to /logs/kongtest.txt on HDFS, keeping only the lines that start with "-"):
>>> with open("/root/test/kong.txt") as reader, client.write("/logs/kongtest.txt") as writer:
...     for line in reader:
...         if line.startswith("-"):
...             writer.write(line)
...
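write() can also be passed the data directly; with append=True it adds to an existing file instead of creating a new one. A minimal sketch, assuming the target file already exists and the cluster has append support enabled:

>>> # append one line to the file written above (requires dfs.support.append)
>>> client.write("/logs/kongtest.txt", data="-- one more line\n", append=True, encoding="utf-8")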