1. Code configuration
# Queue of CSV input files on HDFS:
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:8020/path/to/file1.csv",
    "hdfs://namenode:8020/path/to/file2.csv",
])

# Queue of TFRecord input files (note the different NameNode port):
filename_queue = tf.train.string_input_producer([
    "hdfs://namenode:9000/path/to/file1.tfrecord",
    "hdfs://namenode:9000/path/to/file2.tfrecord",
])
def read_tfrecords(filename_queue):
    reader = tf.TFRecordReader()  # the reader must be created before use
    key, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            # label_dims, data_type, steps, width, height and channels
            # must be defined to match how the records were written
            'label': tf.FixedLenFeature(shape=[label_dims], dtype=data_type),
            'image': tf.FixedLenFeature(shape=[steps * width * height * channels],
                                        dtype=tf.float32)
        })
    label = features['label']
    image = features['image']
    return image, label
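As a sketch of how `read_tfrecords` might be wired into a TF 1.x input pipeline, assuming a reachable HDFS NameNode; the file path, `num_epochs`, and `batch_size` here are placeholder values, not part of the original setup:

```python
filename_queue = tf.train.string_input_producer(
    ["hdfs://namenode:9000/path/to/file1.tfrecord"], num_epochs=1)
image, label = read_tfrecords(filename_queue)
image_batch, label_batch = tf.train.batch([image, label], batch_size=32)

with tf.Session() as sess:
    # local_variables_initializer is needed for the num_epochs counter
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    try:
        while not coord.should_stop():
            images, labels = sess.run([image_batch, label_batch])
    except tf.errors.OutOfRangeError:
        pass  # input queue exhausted after one epoch
    finally:
        coord.request_stop()
        coord.join(threads)
```

The `Coordinator`/`start_queue_runners` pair is required with queue-based input in TF 1.x; without it the `string_input_producer` queue is never filled and `sess.run` blocks.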
2. Environment configuration
The following environment variables must be set so that TensorFlow can locate the JVM and the HDFS client libraries:
JAVA_HOME
HADOOP_HDFS_HOME
LD_LIBRARY_PATH
CLASSPATH
e.g.:
vi ~/.bashrc
export JAVA_HOME=/home/user/java/jdk1.8.0_05
export HADOOP_HDFS_HOME=/home/user/software/hadoop-2.7.6/
export PATH=$PATH:$HADOOP_HDFS_HOME/libexec/hadoop-config.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$JAVA_HOME/jre/lib/amd64/server
export PATH=$PATH:$HADOOP_HDFS_HOME/bin:$HADOOP_HDFS_HOME/sbin
export CLASSPATH="$(hadoop classpath --glob)"
source ~/.bashrc
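When one of the variables above is missing, the failure usually appears later as an opaque JNI error. A minimal preflight check can fail fast instead; the helper name `missing_hdfs_env` is my own, not from the original post:

```python
import os

# The four variables the section above exports in ~/.bashrc
REQUIRED_VARS = ["JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH"]

def missing_hdfs_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Calling `missing_hdfs_env()` at the top of the training script and aborting when the list is non-empty gives a clearer error message than the runtime failure that otherwise appears when the graph first touches HDFS.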
3. Usage
Files on the Hadoop file system can now be accessed directly, for example:
file = "hdfs://namenode:8020/path/to/file1.tfrecords"
python your_script.py
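A malformed path (a missing `hdfs://` scheme or NameNode address) only surfaces once the graph runs, so it can be worth validating paths up front. A small hypothetical helper using only the standard library:

```python
from urllib.parse import urlparse

def is_hdfs_uri(uri):
    """True for URIs of the form hdfs://host[:port]/absolute/path."""
    parsed = urlparse(uri)
    return (parsed.scheme == "hdfs"
            and bool(parsed.netloc)
            and parsed.path.startswith("/"))
```

For instance, `is_hdfs_uri("hdfs://namenode:8020/path/to/file1.tfrecords")` is true, while a bare local path or a URI with no file path is rejected.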
References
https://medium.com/@matthewyeung/hadoop-file-system-with-tensorflow-dataset-api-13ce9aeaa107
https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md