0. Overview
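After Google open-sourced the framework, we immediately tested it on real K40m GPUs. Our production servers run CentOS 6.6, but the framework requires Python 2.7 (CentOS 6 ships Python 2.6). After a full day we still could not get it working on CentOS 6.6, so we switched to CentOS 7, where everything ran without problems.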
1. System environment
- python >=2.7
- numpy >=1.9
- gcc >=4.8.2
- cuda 7.0
- java >=1.8
- cudnn 6.5 v2
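Before installing, a small script can confirm the Python and numpy requirements on a host. This is a sketch, and its function names are hypothetical. It only covers Python and numpy, so gcc, CUDA, Java and cuDNN still need to be checked by hand. It also treats the "python >=2.7" requirement as "exactly 2.7", because the TensorFlow wheel installed in section 2 carries the cp27 tag, which means CPython 2.7 only.

```python
from __future__ import print_function
import sys


def parse_version(text):
    """Turn a version string such as '1.9.2' or '1.11.0rc1' into a tuple of ints.

    Parsing stops after the first component that carries a non-digit suffix
    (e.g. '0rc1' contributes 0 and ends the tuple).
    """
    parts = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # suffixed component such as '0rc1'
    return tuple(parts)


def meets_minimum(version, minimum):
    """True if `version` is at least `minimum`, comparing numeric components."""
    return parse_version(version) >= parse_version(minimum)


def check_environment():
    """Report (version, ok) for Python and numpy against the list above."""
    report = {}
    python_version = '%d.%d.%d' % sys.version_info[:3]
    # The tensorflow-0.5.0 wheel is tagged cp27: it needs CPython 2.7 specifically.
    report['python'] = (python_version, tuple(sys.version_info[:2]) == (2, 7))
    try:
        import numpy
        report['numpy'] = (numpy.__version__, meets_minimum(numpy.__version__, '1.9'))
    except ImportError:
        report['numpy'] = (None, False)
    return report


if __name__ == '__main__':
    for name, (version, ok) in sorted(check_environment().items()):
        print('%-7s %-10s %s' % (name, version, 'OK' if ok else 'CHECK'))
```

Comparing tuples of integers avoids the classic string-comparison mistake where "1.10" sorts before "1.9".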
2. Installation and deployment
#Install dependencies; kernel-devel is needed to install CUDA
yum -y install gcc python-devel kernel-devel
#Install pip
wget --no-check-certificate https://github.com/pypa/pip/archive/1.5.5.tar.gz
tar zvxf 1.5.5.tar.gz
cd pip-1.5.5/
python setup.py install
#Install TensorFlow. The server must have Internet access, because pip automatically downloads and installs numpy and six
pip install http://dlp.iflytek.com/soft/tensorflow-0.5.0-cp27-none-linux_x86_64.whl
3. Test case: CNN
- Download the training dataset
wget http://dlp.iflytek.com/soft/cifar-10-binary.tar.gz
mkdir -p /tmp/cifar10_data
tar -zxvf cifar-10-binary.tar.gz -C /tmp/cifar10_data
- Run the training script (uses the CPU by default)
cd /root/tensorflow-master/tensorflow/models/image/cifar10
python cifar10_train.py
- Run on GPUs (4 cards here)
python cifar10_multi_gpu_train.py --num_gpus=4
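To confirm that the archive unpacked where the training scripts look for it, and to see what the data actually contains, the CIFAR-10 binary files can be read with plain Python. This sketch follows the published CIFAR-10 binary format: each record is 1 label byte followed by 3072 pixel bytes, and each batch file holds 10,000 records. The helper names are hypothetical. It assumes the tarball unpacks to a cifar-10-batches-bin directory under /tmp/cifar10_data, which is the --data_dir default.

```python
from __future__ import print_function
import os

# CIFAR-10 binary layout: 1 label byte, then a 32x32 RGB image stored channel by
# channel (1024 red bytes, 1024 green, 1024 blue), each channel in row-major order.
IMAGE_SIZE = 32
CHANNELS = 3
LABEL_BYTES = 1
IMAGE_BYTES = IMAGE_SIZE * IMAGE_SIZE * CHANNELS   # 3072
RECORD_BYTES = LABEL_BYTES + IMAGE_BYTES           # 3073

EXPECTED_FILES = ['data_batch_%d.bin' % i for i in range(1, 6)] + ['test_batch.bin']


def missing_files(data_dir):
    """List expected batch files absent from <data_dir>/cifar-10-batches-bin."""
    batch_dir = os.path.join(data_dir, 'cifar-10-batches-bin')
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(batch_dir, name))]


def parse_record(record):
    """Split one record into (label, pixels), with pixels[channel][row][col] as ints."""
    if len(record) != RECORD_BYTES:
        raise ValueError('expected %d bytes, got %d' % (RECORD_BYTES, len(record)))
    data = bytearray(record)  # indexing a bytearray yields ints on Python 2 and 3
    label = data[0]
    pixels = []
    for c in range(CHANNELS):
        plane = []
        for row in range(IMAGE_SIZE):
            start = LABEL_BYTES + (c * IMAGE_SIZE + row) * IMAGE_SIZE
            plane.append(list(data[start:start + IMAGE_SIZE]))
        pixels.append(plane)
    return label, pixels


def iter_records(path):
    """Yield (label, pixels) for every complete record in one .bin file."""
    with open(path, 'rb') as f:
        while True:
            record = f.read(RECORD_BYTES)
            if len(record) < RECORD_BYTES:
                break
            yield parse_record(record)


if __name__ == '__main__':
    data_dir = '/tmp/cifar10_data'
    print('missing:', missing_files(data_dir) or 'none')
    first = os.path.join(data_dir, 'cifar-10-batches-bin', 'data_batch_1.bin')
    if os.path.isfile(first):
        label, pixels = next(iter_records(first))
        print('first label:', label, 'top-left RGB:', [pixels[c][0][0] for c in range(CHANNELS)])
```

Given this layout, each data_batch_N.bin should be exactly 10,000 × 3073 = 30,730,000 bytes. A different size usually points to a truncated download.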
4. Script options
#Show help
python cifar10_train.py --help
--batch_size BATCH_SIZE #number of images per batch; default 128 examples
Number of images to process in a batch.
--data_dir DATA_DIR #training dataset directory; default /tmp/cifar10_data
Path to the CIFAR-10 data directory.
--train_dir TRAIN_DIR #training output directory (event logs and checkpoints)
Directory where to write event logs and checkpoint.
--max_steps MAX_STEPS #maximum number of steps (batches); default 1000000
Number of batches to run.
--log_device_placement LOG_DEVICE_PLACEMENT
Whether to log device placement.
--nolog_device_placement
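The options above follow a common pattern: typed values with defaults, plus a paired --flag / --noflag switch for booleans. The sketch below re-creates them with Python's standard argparse module. It is an illustration only, not the script's own flag code. Two defaults are assumptions: /tmp/cifar10_train for --train_dir and False for --log_device_placement. The boolean here is also a plain switch, not a flag that accepts a value. The sketch additionally works out how many epochs --max_steps covers, using the 50,000 images in the CIFAR-10 training set.

```python
from __future__ import division, print_function
import argparse

CIFAR10_TRAIN_EXAMPLES = 50000  # images in the five CIFAR-10 training batches


def build_parser():
    """Hypothetical argparse mirror of the cifar10_train.py options listed above."""
    parser = argparse.ArgumentParser(description='CIFAR-10 training options (sketch)')
    parser.add_argument('--batch_size', type=int, default=128,
                        help='Number of images to process in a batch.')
    parser.add_argument('--data_dir', default='/tmp/cifar10_data',
                        help='Path to the CIFAR-10 data directory.')
    parser.add_argument('--train_dir', default='/tmp/cifar10_train',  # assumed default
                        help='Directory where to write event logs and checkpoint.')
    parser.add_argument('--max_steps', type=int, default=1000000,
                        help='Number of batches to run.')
    # Paired on/off switches writing to the same destination, like
    # --log_device_placement / --nolog_device_placement in the help text.
    parser.add_argument('--log_device_placement', dest='log_device_placement',
                        action='store_true', help='Whether to log device placement.')
    parser.add_argument('--nolog_device_placement', dest='log_device_placement',
                        action='store_false')
    parser.set_defaults(log_device_placement=False)  # assumed default
    return parser


def epochs_covered(max_steps, batch_size, train_examples=CIFAR10_TRAIN_EXAMPLES):
    """How many passes over the training set max_steps batches amount to."""
    return max_steps * batch_size / train_examples


if __name__ == '__main__':
    args = build_parser().parse_args()
    print(args)
    print('epochs: %.1f' % epochs_covered(args.max_steps, args.batch_size))
```

With the defaults, 1,000,000 steps × 128 images / 50,000 images = 2,560 passes over the training set. That is far longer than a quick functional test needs, so passing a smaller --max_steps is the simplest way to shorten a trial run.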
Reference: Tutorials and Machine Learning Examples, TensorFlow
http://tensorflow.org/tutorials/deep_cnn/index.md