Chinese text classification with a convolutional neural network, implemented in TensorFlow
Project page:
https://github.com/fendouai/Chinese-Text-Classification
Questions are welcome: http://tensorflow123.com/
This project is adapted from the following project:
cnn-text-classification-tf
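Main changes:
- Compatible with TensorFlow 1.2 and later
- Added a Chinese dataset
- Added a Chinese text processing pipeline
Features:
- Compatible with recent TensorFlow releases (1.2 and later)
- Chinese dataset
- Chinese text processing based on jieba
- Complete implementation of model training, model saving, and model evaluation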
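The jieba-based processing step can be sketched as follows. This is a minimal sketch rather than the project's actual code: the helper name `segment` is hypothetical, and the character-level fallback is an assumption added so the example still runs where jieba is not installed. It splits a Chinese sentence into tokens and joins them with spaces, so that a whitespace-based vocabulary builder written for English text can consume the result.

```python
def segment(text):
    """Return `text` as a string of space-separated tokens (hypothetical helper)."""
    try:
        import jieba  # third-party package: pip install jieba
        tokens = jieba.lcut(text)  # word-level segmentation, returns a list
    except ImportError:
        # Assumption: fall back to one token per character when jieba is missing.
        tokens = list(text)
    # Drop whitespace-only tokens so that joining does not produce double spaces.
    tokens = [t for t in tokens if t.strip()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(segment("我爱自然语言处理"))
```

Removing the spaces from the output gives back the original sentence, which is a quick sanity check that segmentation did not drop any characters.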
Training results
Model evaluation
The README of the original project follows.
This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.
It is a slightly simplified implementation of Kim's "Convolutional Neural Networks for Sentence Classification" paper in TensorFlow.
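The core operation of this kind of model, one convolution filter followed by max-over-time pooling, can be illustrated in plain Python. This is a toy sketch under stated assumptions, not the repository's TensorFlow code: the function name `conv_max_over_time` is hypothetical, the bias term is omitted, ReLU is assumed as the nonlinearity, and the sentence must be at least as long as the filter.

```python
def conv_max_over_time(embedded, filt):
    """Slide one filter over a sentence and max-pool the responses.

    embedded: list of n word vectors, each of dimension d
    filt:     list of h weight vectors, each of dimension d (h <= n)
    """
    n, h = len(embedded), len(filt)
    responses = []
    for start in range(n - h + 1):  # only positions where the filter fits
        window = embedded[start:start + h]
        # Dot product between the filter and the current window of words.
        s = sum(w * x
                for row_w, row_x in zip(filt, window)
                for w, x in zip(row_w, row_x))
        responses.append(max(0.0, s))  # ReLU (assumed nonlinearity)
    return max(responses)  # max-over-time pooling: one number per filter


if __name__ == "__main__":
    sentence = [[1, 0], [0, 1], [1, 1], [0, 0]]  # 4 words, 2-dim embeddings
    bigram_filter = [[1, 0], [0, 1]]             # filter spanning 2 words
    print(conv_max_over_time(sentence, bigram_filter))  # 2
```

Each filter therefore contributes a single feature regardless of sentence length, which is what lets the model feed variable-length text into a fixed-size classifier layer.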
Requirements
- Python 3
- Tensorflow > 1.2
- Numpy
Training
Print parameters:
./train.py --help
optional arguments:
  -h, --help            show this help message and exit
  --embedding_dim EMBEDDING_DIM
                        Dimensionality of character embedding (default: 128)
  --filter_sizes FILTER_SIZES
                        Comma-separated filter sizes (default: '3,4,5')
  --num_filters NUM_FILTERS
                        Number of filters per filter size (default: 128)
  --l2_reg_lambda L2_REG_LAMBDA
                        L2 regularization lambda (default: 0.0)
  --dropout_keep_prob DROPOUT_KEEP_PROB
                        Dropout keep probability (default: 0.5)
  --batch_size BATCH_SIZE
                        Batch Size (default: 64)
  --num_epochs NUM_EPOCHS
                        Number of training epochs (default: 100)
  --evaluate_every EVALUATE_EVERY
                        Evaluate model on dev set after this many steps
                        (default: 100)
  --checkpoint_every CHECKPOINT_EVERY
                        Save model after this many steps (default: 100)
  --allow_soft_placement ALLOW_SOFT_PLACEMENT
                        Allow device soft device placement
  --noallow_soft_placement
  --log_device_placement LOG_DEVICE_PLACEMENT
                        Log placement of ops on devices
  --nolog_device_placement
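To see how `--filter_sizes` and `--num_filters` determine the size of the model's pooled feature vector, the arithmetic can be sketched in a few lines. This is a sketch under assumptions: it supposes convolutions without padding and max pooling over all positions, as in Kim's architecture, and the function names and the sequence length of 56 are hypothetical values chosen for illustration.

```python
def parse_filter_sizes(spec):
    """Turn a comma-separated string such as '3,4,5' into a list of ints."""
    return [int(s) for s in spec.split(",")]


def feature_sizes(sequence_length, filter_sizes, num_filters):
    """Return (positions per filter size, length of the pooled feature vector).

    Assumes unpadded convolution: a filter spanning h tokens fits at
    sequence_length - h + 1 positions. Max pooling keeps one value per
    filter, so the concatenated vector has num_filters * len(filter_sizes)
    entries.
    """
    positions = {h: sequence_length - h + 1 for h in filter_sizes}
    pooled = num_filters * len(filter_sizes)
    return positions, pooled


if __name__ == "__main__":
    sizes = parse_filter_sizes("3,4,5")          # default --filter_sizes
    positions, pooled = feature_sizes(56, sizes, 128)
    print(positions)  # {3: 54, 4: 53, 5: 52}
    print(pooled)     # 384
```

With the defaults, the classifier layer sees 384 features per example, independent of the sentence length.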
Train:
./train.py
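How `--batch_size`, `--num_epochs`, `--evaluate_every`, and `--checkpoint_every` interact during training can be sketched without TensorFlow. The names `batch_iter` and `run_schedule` below are illustrative and are not claimed to match the repository's code. The loop only records when evaluation and checkpointing would happen, with the actual optimizer update left as a comment.

```python
import random


def batch_iter(data, batch_size, num_epochs, shuffle=True, seed=0):
    """Yield mini-batches that cover `data` once per epoch."""
    data = list(data)
    rng = random.Random(seed)
    for _ in range(num_epochs):
        order = list(range(len(data)))
        if shuffle:
            rng.shuffle(order)  # new example order every epoch
        for start in range(0, len(data), batch_size):
            yield [data[i] for i in order[start:start + batch_size]]


def run_schedule(data, batch_size=64, num_epochs=1,
                 evaluate_every=100, checkpoint_every=100):
    """Return the (step, action) events a training loop would trigger."""
    events = []
    batches = batch_iter(data, batch_size, num_epochs)
    for step, batch in enumerate(batches, start=1):
        # A real loop would run one optimizer update on `batch` here.
        if step % evaluate_every == 0:
            events.append((step, "evaluate"))
        if step % checkpoint_every == 0:
            events.append((step, "checkpoint"))
    return events


if __name__ == "__main__":
    # 1000 examples, batch size 64: 16 steps per epoch, 32 steps in total.
    print(run_schedule(range(1000), 64, 2, evaluate_every=10, checkpoint_every=16))
```

The last batch of each epoch may be smaller than `batch_size`; in this sketch it is kept rather than dropped.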
Evaluating
./eval.py --eval_train --checkpoint_dir="./runs/1459637919/checkpoints/"
Replace the checkpoint directory with the one produced by your training run. To use your own data, change the eval.py script to load it.
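Finding the checkpoint directory of the most recent run can be automated with a small helper. This is a sketch under an assumption taken from the example path above: each run is stored in `./runs/<integer timestamp>/checkpoints/`. The function name `latest_checkpoint_dir` is hypothetical and not part of the project.

```python
import os


def latest_checkpoint_dir(runs_dir="./runs"):
    """Return the checkpoints directory of the newest run, or None.

    Assumes run directories are named with integer timestamps, so the
    largest number is the most recent run. Raises FileNotFoundError if
    `runs_dir` itself does not exist.
    """
    candidates = []
    for name in os.listdir(runs_dir):
        ckpt = os.path.join(runs_dir, name, "checkpoints")
        if name.isdigit() and os.path.isdir(ckpt):
            candidates.append((int(name), ckpt))
    if not candidates:
        return None
    return max(candidates)[1]  # path belonging to the largest timestamp


if __name__ == "__main__":
    if os.path.isdir("./runs"):
        print(latest_checkpoint_dir("./runs"))
```

The returned path can then be passed to `./eval.py` through `--checkpoint_dir`.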
References
- Convolutional Neural Networks for Sentence Classification
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification
TensorFlow Q&A: http://tensorflow123.com/