Basic HDFS Operations: The Python Interface


Install the hdfs package

  pip install hdfs

 

List the HDFS directory tree

[root@hadoop hadoop]# hdfs dfs -ls -R /
drwxr-xr-x - root supergroup 0 2017-05-18 23:57 /Demo
-rw-r--r-- 1 root supergroup 3494 2017-05-18 23:57 /Demo/hadoop-env.sh
drwxr-xr-x - root supergroup 0 2017-05-18 19:01 /logs
-rw-r--r-- 1 root supergroup 2223 2017-05-18 19:01 /logs/anaconda-ks.cfg
-rw-r--r-- 1 root supergroup 57162 2017-05-18 18:32 /logs/install.log

  

Create an HDFS client instance

#!/usr/bin/env python
# -*- coding:utf-8 -*-
__Author__ = 'kongZhaGen'

import hdfs
client = hdfs.Client("http://172.10.236.21:50070")

  

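list: returns the names of the files and directories contained in a remote folder. Raises an HdfsError if the path does not exist or points to a regular file.

  hdfs_path: path to the remote folder

  status: if True, also return the status information of each entry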

def list(self, hdfs_path, status=False):
    """Return names of files contained in a remote folder.

    :param hdfs_path: Remote path to a directory. If `hdfs_path` doesn't exist
      or points to a normal file, an :class:`HdfsError` will be raised.
    :param status: Also return each file's corresponding FileStatus_.

    """

  Example:

print client.list("/",status=False)
Result:
[u'Demo', u'logs']
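
With status=True, list returns (name, status) pairs instead of bare names. Below is a minimal sketch of using that to separate directories from files. The helper name split_entries is made up for this example, and it assumes the "type" field of the WebHDFS FileStatus object is either "FILE" or "DIRECTORY".

```python
def split_entries(client, hdfs_path):
    """Return (directories, files) found directly under hdfs_path."""
    dirs, files = [], []
    # With status=True each entry is a (name, FileStatus dict) tuple.
    for name, info in client.list(hdfs_path, status=True):
        if info["type"] == "DIRECTORY":
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files
```

It would be called with the client created above, for example split_entries(client, "/").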

  

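status: gets the status information of a file or folder on HDFS

  hdfs_path: remote path

  strict:

    False: return None if the remote path does not exist

    True: raise an exception if the remote path does not exist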

def status(self, hdfs_path, strict=True):
    """Get FileStatus_ for a file or folder on HDFS.

    :param hdfs_path: Remote path.
    :param strict: If `False`, return `None` rather than raise an exception if
      the path doesn't exist.

    .. _FileStatus: FS_
    .. _FS: http://hadoop.apache.org/docs/r1.0.4/webhdfs.html#FileStatus

    """

  Example:

print client.status(hdfs_path="/Demoo",strict=False)
Result:
None
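
Because strict=False turns a missing path into None, status can double as an existence check. A small sketch of that idea follows; the helper name path_exists is hypothetical and not part of the hdfs package.

```python
def path_exists(client, hdfs_path):
    """Return True if hdfs_path exists on HDFS, False otherwise."""
    # strict=False makes status() return None instead of raising an error.
    return client.status(hdfs_path, strict=False) is not None
```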

  

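makedirs: creates a directory on HDFS, creating intermediate directories as needed

  hdfs_path: remote directory path

  permission: permission to set on the newly created directories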

def makedirs(self, hdfs_path, permission=None):
    """Create a remote directory, recursively if necessary.

    :param hdfs_path: Remote path. Intermediate directories will be created
      appropriately.
    :param permission: Octal permission to set on the newly created directory.
      These permissions will only be set on directories that do not already
      exist.

    This function currently has no return value as WebHDFS doesn't return a
    meaningful flag.

    """

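  Example:

  A script running on a remote client may be refused permission to create directories on HDFS. One way to allow it is to turn off permission checking in hdfs-site.xml (this disables the checks for every user, so it is best kept to test clusters):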

  <property>
  <name>dfs.permissions</name>
  <value>false</value>
  </property>

  Restart HDFS

stop-dfs.sh
start-dfs.sh

  Create the directories recursively

client.makedirs("/data/rar/tmp",permission=755)
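
An alternative to switching off permission checks, offered here as an assumption rather than as part of the original setup: the hdfs package also provides InsecureClient, which sends a user name with every request, so directories can be created as a user who already has write access. The helper name connect_as and its client_factory parameter are hypothetical; the parameter exists only so the sketch can be exercised without a cluster.

```python
def connect_as(url, user, client_factory=None):
    """Return a client that issues WebHDFS requests as the given user."""
    if client_factory is None:
        # Imported lazily so the helper can be tested without the package.
        from hdfs import InsecureClient
        client_factory = InsecureClient
    return client_factory(url, user=user)
```

For instance, connect_as("http://172.10.236.21:50070", "root").makedirs("/data/rar/tmp", permission=755) would create the directories as root, assuming root owns the parent path.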

  

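rename: moves a file or folder

  hdfs_src_path: source path

  hdfs_dst_path: destination path. If the path already exists and is a directory, the source is moved into it. If the path exists and is a file, or if a parent directory of the destination is missing, an HdfsError is raised.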

def rename(self, hdfs_src_path, hdfs_dst_path):
    """Move a file or folder.

    :param hdfs_src_path: Source path.
    :param hdfs_dst_path: Destination path. If the path already exists and is
      a directory, the source will be moved into it. If the path exists and is
      a file, or if a parent destination directory is missing, this method will
      raise an :class:`HdfsError`.

    """

  Example:

client.rename("/SRC_DATA","/dest_data")
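
Since rename fails when the destination is an existing file, a script may prefer to check the destination first and report the problem in its own words. The following is only a sketch of that check: move_into_place is a made-up helper name, and it assumes the "type" field of the status dictionary is "FILE" for regular files.

```python
def move_into_place(client, src, dst):
    """Move src to dst, refusing to proceed if dst is an existing file."""
    info = client.status(dst, strict=False)
    if info is not None and info["type"] == "FILE":
        raise ValueError("destination is an existing file: %s" % dst)
    # dst is either missing or a directory, the two cases rename() accepts.
    client.rename(src, dst)
```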

  

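delete: removes a file or directory from HDFS

  hdfs_path: path on HDFS

  recursive: how to handle a non-empty directory. True: delete it and its contents recursively. False: raise an exception.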

def delete(self, hdfs_path, recursive=False):
    """Remove a file or directory from HDFS.

    :param hdfs_path: HDFS path.
    :param recursive: Recursively delete files and directories. By default,
      this method will raise an :class:`HdfsError` if trying to delete a
      non-empty directory.

    This function returns `True` if the deletion was successful and `False` if
    no file or directory previously existed at `hdfs_path`.

    """

  Example:

client.delete("/dest_data",recursive=True)
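
The return value of delete tells whether anything was actually removed, which is handy in cleanup scripts. A minimal sketch, with remove_tree as a hypothetical helper name:

```python
def remove_tree(client, hdfs_path):
    """Delete hdfs_path recursively and report what happened."""
    # delete() returns False when nothing existed at hdfs_path.
    if client.delete(hdfs_path, recursive=True):
        return "deleted %s" % hdfs_path
    return "nothing to delete at %s" % hdfs_path
```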

  

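upload: uploads a file or directory to HDFS. If the target path already exists and is a directory, the files are uploaded inside it; otherwise they are uploaded to the target path itself.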

def upload(self, hdfs_path, local_path, overwrite=False, n_threads=1,
    temp_dir=None, chunk_size=2 ** 16, progress=None, cleanup=True, **kwargs):
    """Upload a file or directory to HDFS.

    :param hdfs_path: Target HDFS path. If it already exists and is a
      directory, files will be uploaded inside.
    :param local_path: Local path to file or folder. If a folder, all the files
      inside of it will be uploaded (note that this implies that folders empty
      of files will not be created remotely).
    :param overwrite: Overwrite any existing file or directory.
    :param n_threads: Number of threads to use for parallelization. A value of
      `0` (or negative) uses as many threads as there are files.
    :param temp_dir: Directory under which the files will first be uploaded
      when `overwrite=True` and the final remote path already exists. Once the
      upload successfully completes, it will be swapped in.
    :param chunk_size: Interval in bytes by which the files will be uploaded.
    :param progress: Callback function to track progress, called every
      `chunk_size` bytes. It will be passed two arguments, the path to the
      file being uploaded and the number of bytes transferred so far. On
      completion, it will be called once with `-1` as second argument.
    :param cleanup: Delete any uploaded files if an error occurs during the
      upload.
    :param \*\*kwargs: Keyword arguments forwarded to :meth:`write`.

    On success, this method returns the remote upload path.

    """

  Example:

>>> import hdfs
>>> client=hdfs.Client("http://172.10.236.21:50070")
>>> client.upload("/logs","/root/training/jdk-7u75-linux-i586.tar.gz")
'/logs/jdk-7u75-linux-i586.tar.gz'
>>> client.list("/logs")
[u'anaconda-ks.cfg', u'install.log', u'jdk-7u75-linux-i586.tar.gz']
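
The progress parameter described in the docstring takes a callable that receives the file path and the number of bytes sent so far, then -1 once the file is done. Here is one possible shape for such a callback; the class name UploadProgress and its attributes are invented for this sketch.

```python
class UploadProgress(object):
    """Collects the progress reports passed to upload(progress=...)."""

    def __init__(self):
        self.transferred = {}   # path -> bytes sent so far
        self.finished = set()   # paths whose upload has completed

    def __call__(self, path, nbytes):
        if nbytes == -1:
            # -1 signals that this file has been fully uploaded.
            self.finished.add(path)
        else:
            self.transferred[path] = nbytes
```

An instance would be passed as client.upload("/logs", "/root/training/jdk-7u75-linux-i586.tar.gz", progress=UploadProgress()).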

  

content: gets the content summary of a file or directory on HDFS

print client.content("/logs/install.log")
Result:
{u'spaceConsumed': 57162, u'quota': -1, u'spaceQuota': -1, u'length': 57162, u'directoryCount': 0, u'fileCount': 1}
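
The summary is a plain dictionary, so it is easy to turn into a readable line. A small sketch using only the keys visible in the result above; describe_content is a hypothetical helper name.

```python
def describe_content(summary):
    """Format a content summary dict as a one-line description."""
    # spaceConsumed counts the space taken by all replicas of the data.
    return "%d file(s), %d dir(s), %d bytes (%d bytes including replicas)" % (
        summary["fileCount"],
        summary["directoryCount"],
        summary["length"],
        summary["spaceConsumed"],
    )
```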

  

write: creates a file on HDFS. The data can be a string, a generator, or a file object.

def write(self, hdfs_path, data=None, overwrite=False, permission=None,
    blocksize=None, replication=None, buffersize=None, append=False,
    encoding=None):
    """Create a file on HDFS.

    :param hdfs_path: Path where to create file. The necessary directories will
      be created appropriately.
    :param data: Contents of file to write. Can be a string, a generator or a
      file object. The last two options will allow streaming upload (i.e.
      without having to load the entire contents into memory). If `None`, this
      method will return a file-like object and should be called using a `with`
      block (see below for examples).
    :param overwrite: Overwrite any existing file or directory.
    :param permission: Octal permission to set on the newly created file.
      Leading zeros may be omitted.
    :param blocksize: Block size of the file.
    :param replication: Number of replications of the file.
    :param buffersize: Size of upload buffer.
    :param append: Append to a file rather than create a new one.
    :param encoding: Encoding used to serialize data written.
    """
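
As the docstring notes, leaving data as None makes write return a file-like object meant to be used in a with block. The sketch below streams text lines that way; write_lines is a made-up helper name, and it assumes the caller wants any existing file replaced.

```python
def write_lines(client, hdfs_path, lines):
    """Write an iterable of text lines to hdfs_path, replacing any old file."""
    # With data=None, write() returns a file-like object for use in `with`.
    with client.write(hdfs_path, overwrite=True, encoding="utf-8") as writer:
        for line in lines:
            writer.write(line + u"\n")
```

For content that is already in memory, passing it directly also works, as in client.write("/data/hello.txt", data="hello", overwrite=True).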

  

