Reading training-set files from Hadoop (HDFS) with TensorFlow


1. Code configuration

To read from HDFS, point tf.train.string_input_producer at hdfs:// URIs instead of local paths. For CSV files:

filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])

The same works for TFRecord files:

filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:9000/path/to/file1.tfrecord",
    "hdfs://namenode:9000/path/to/file2.tfrecord",
])
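These strings follow the generic URI form hdfs://&lt;namenode-host&gt;:&lt;port&gt;/&lt;path&gt; (8020 and 9000 are both common namenode ports, depending on the Hadoop distribution and configuration). A small stdlib-only sketch of how such a URI decomposes — the helper name is made up for illustration; TensorFlow parses these URIs itself through its HDFS filesystem support, and nothing in the pipeline calls this:

```python
from urllib.parse import urlparse

def split_hdfs_uri(uri):
    """Split an hdfs:// URI into its namenode authority and file path.

    Illustrative helper only; not part of TensorFlow or Hadoop.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got: " + uri)
    return parsed.netloc, parsed.path

# e.g. ("namenode:8020", "/path/to/file1.csv")
print(split_hdfs_uri("hdfs://namenode:8020/path/to/file1.csv"))
```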

def read_tfrecords(filename_queue):
    # label_dims, data_type, steps, width, height and channels are
    # assumed to be defined elsewhere in the script.
    reader = tf.TFRecordReader()
    key, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature(shape=[label_dims], dtype=data_type),
            'image': tf.FixedLenFeature(shape=[steps * width * height * channels],
                                        dtype=tf.float32)
        })
    label = features['label']
    image = features['image']
    return image, label
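Each reader.read() call returns the next serialized record from the current file. On disk, a TFRecord file is simply a stream of length-prefixed records: a little-endian uint64 payload length, a uint32 CRC of that length, the payload bytes, then a uint32 CRC of the payload. A minimal stdlib sketch of this framing — the CRC fields are written as zero here purely for illustration; real TensorFlow readers compute and verify masked CRC32C checksums, so files produced this way would be rejected:

```python
import io
import struct

def write_record(stream, payload):
    """Append one TFRecord-framed payload (CRC fields zeroed; illustration only)."""
    stream.write(struct.pack("<Q", len(payload)))  # uint64 payload length
    stream.write(struct.pack("<I", 0))             # length CRC (zeroed here)
    stream.write(payload)
    stream.write(struct.pack("<I", 0))             # payload CRC (zeroed here)

def read_records(stream):
    """Read framed payloads back until end of stream."""
    records = []
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break
        (length,) = struct.unpack("<Q", header)
        stream.read(4)                   # skip length CRC
        records.append(stream.read(length))
        stream.read(4)                   # skip payload CRC
    return records

buf = io.BytesIO()
write_record(buf, b"example-1")
write_record(buf, b"example-2")
buf.seek(0)
print(read_records(buf))  # [b'example-1', b'example-2']
```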

 

2. Environment configuration

Set the following environment variables:

  JAVA_HOME

  HADOOP_HDFS_HOME

  LD_LIBRARY_PATH

  CLASSPATH

 

For example:

  vi  ~/.bashrc

export JAVA_HOME=/home/user/java/jdk1.8.0_05
export HADOOP_HDFS_HOME=/home/user/software/hadoop-2.7.6/
source $HADOOP_HDFS_HOME/libexec/hadoop-config.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
export CLASSPATH="$(hadoop classpath --glob)"

  source ~/.bashrc
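TensorFlow's HDFS support tends to fail at runtime with fairly opaque errors when any of these variables is missing, so it can be worth checking them before launching training. A small pre-flight sketch — the helper name and the idea of checking up front are this article's suggestion, not part of TensorFlow or Hadoop:

```python
import os

# Variables the HDFS integration needs, per the setup above.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")

def missing_hdfs_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

problems = missing_hdfs_env()
if problems:
    print("HDFS environment incomplete, missing: " + ", ".join(problems))
```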

 

3. Usage

  Files on HDFS can now be accessed from TensorFlow directly, e.g. file = "hdfs://namenode:8020/path/to/file1.tfrecords". Launch the training script as usual:

  python your_script.py

 

 

References

https://medium.com/@matthewyeung/hadoop-file-system-with-tensorflow-dataset-api-13ce9aeaa107

https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md

