TensorFlow: reading training data from Hadoop (HDFS)


1、Code configuration

TensorFlow accepts hdfs:// URIs wherever it accepts a file path. For CSV files:
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])
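The queued CSV filenames still need a reader op to produce tensors. A minimal sketch using the same TF 1.x queue API (the two float columns in record_defaults are illustrative assumptions, not from the original post):

```python
import tensorflow as tf  # TF 1.x API, matching the snippets in this post

def read_csv(filename_queue):
    # TextLineReader yields one CSV line per read from the queued files.
    reader = tf.TextLineReader(skip_header_lines=1)
    key, line = reader.read(filename_queue)
    # record_defaults fixes the per-column dtypes; these two float columns
    # are placeholders to adapt to your own schema.
    record_defaults = [[0.0], [0.0]]
    feature, label = tf.decode_csv(line, record_defaults=record_defaults)
    return feature, label
```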

For TFRecord files:

filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:9000/path/to/file1.tfrecord",
    "hdfs://namenode:9000/path/to/file2.tfrecord",
])

def read_tfrecords(filename_queue):
    # The reader was implied but missing in the original snippet.
    reader = tf.TFRecordReader()
    key, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            # label_dims, data_type, steps, width, height and channels are
            # placeholders to be defined for your own data.
            'label': tf.FixedLenFeature(shape=[label_dims], dtype=data_type),
            'image': tf.FixedLenFeature(shape=[steps * width * height * channels],
                                        dtype=tf.float32),
        })
    label = features['label']
    image = features['image']
    return image, label
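To actually pull tensors out of read_tfrecords, the queue runners must be started inside a session. A hedged sketch of such a driver (batch size, capacity and the commented-out session loop are illustrative choices, and read_tfrecords is assumed from above):

```python
import tensorflow as tf  # TF 1.x queue-runner API, as used in this post

def input_pipeline(filenames, batch_size=32):
    # filenames may be hdfs:// URIs, exactly as in the snippets above.
    filename_queue = tf.train.string_input_producer(filenames)
    image, label = read_tfrecords(filename_queue)  # defined above
    # shuffle_batch assembles mini-batches in background threads.
    return tf.train.shuffle_batch([image, label], batch_size=batch_size,
                                  capacity=1000, min_after_dequeue=100)

# Typical driver loop (requires a reachable HDFS cluster):
# images, labels = input_pipeline(["hdfs://namenode:9000/path/to/file1.tfrecord"])
# with tf.Session() as sess:
#     sess.run(tf.global_variables_initializer())
#     coord = tf.train.Coordinator()
#     threads = tf.train.start_queue_runners(sess=sess, coord=coord)
#     try:
#         batch_images, batch_labels = sess.run([images, labels])
#     finally:
#         coord.request_stop()
#         coord.join(threads)
```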

 

2、Environment configuration

The following environment variables must be set before TensorFlow can talk to HDFS:

  JAVA_HOME

  HADOOP_HDFS_HOME

  LD_LIBRARY_PATH

  CLASSPATH

  

 

For example:

  vi  ~/.bashrc

export JAVA_HOME=/home/user/java/jdk1.8.0_05
export HADOOP_HDFS_HOME=/home/user/software/hadoop-2.7.6/
source $HADOOP_HDFS_HOME/libexec/hadoop-config.sh  # the original appended this script to PATH, which has no effect; it must be sourced
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
export CLASSPATH="$(hadoop classpath --glob)"

  source ~/.bashrc
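After re-sourcing the shell, a quick sanity check (cluster-dependent; the namenode address and paths are examples) can confirm the environment before launching training:

```shell
# The JVM shared library TensorFlow will dlopen must exist on LD_LIBRARY_PATH
ls "$JAVA_HOME/jre/lib/amd64/server/libjvm.so"

# CLASSPATH must have expanded to actual jar paths, not remain a glob
echo "$CLASSPATH" | head -c 200

# Confirm HDFS itself is reachable (example path)
hdfs dfs -ls hdfs://namenode:8020/path/to/
```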

 

3、Usage

  Files on the Hadoop system can now be accessed directly, e.g.  file = "hdfs://namenode:8020/path/to/file1.tfrecords",

  python your_script.py
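Since every snippet above repeats fully qualified hdfs:// URIs, a small helper (hypothetical, not part of TensorFlow) can build them consistently from one namenode address:

```python
def hdfs_uris(namenode, port, paths):
    """Build hdfs:// URIs for paths on the given namenode (illustrative helper)."""
    return ["hdfs://%s:%d/%s" % (namenode, port, p.lstrip("/")) for p in paths]

files = hdfs_uris("namenode", 8020,
                  ["/path/to/file1.tfrecord", "path/to/file2.tfrecord"])
# The resulting list can be fed straight to tf.train.string_input_producer(files).
```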

 

 

References

https://medium.com/@matthewyeung/hadoop-file-system-with-tensorflow-dataset-api-13ce9aeaa107

https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md

