In the multiclass case, the training algorithm uses the one-vs-rest (OvR)
scheme if the 'multi_class' option is set to 'ovr', and uses the cross-
entropy loss if the 'multi_class' option is set to 'multinomial'.
(Currently the 'multinomial' option is supported only by the 'lbfgs',
'sag' and 'newton-cg' solvers.)
This class implements regularized logistic regression using the
'liblinear' library, 'newton-cg', 'sag' and 'lbfgs' solvers. It can handle
both dense and sparse input. Use C-ordered arrays or CSR matrices
containing 64-bit floats for optimal performance; any other input format
will be converted (and copied).
The 'newton-cg', 'sag', and 'lbfgs' solvers support only L2 regularization
with primal formulation. The 'liblinear' solver supports both L1 and L2
regularization, with a dual formulation only for the L2 penalty.
Read more in the :ref:`User Guide <logistic_regression>`.
Parameters
----------
penalty : str, 'l1' or 'l2', default: 'l2'
Used to specify the norm used in the penalization. The 'newton-cg',
'sag' and 'lbfgs' solvers support only l2 penalties.
.. versionadded:: 0.19
l1 penalty with SAGA solver (allowing 'multinomial' + L1)
dual : bool, default: False
Dual or primal formulation. Dual formulation is only implemented for
l2 penalty with liblinear solver. Prefer dual=False when
n_samples > n_features.
tol : float, default: 1e-4
Tolerance for stopping criteria.
C : float, default: 1.0
Inverse of regularization strength; must be a positive float.
Like in support vector machines, smaller values specify stronger
regularization.
fit_intercept : bool, default: True
Specifies if a constant (a.k.a. bias or intercept) should be
added to the decision function.
solver : {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default: 'liblinear'
Algorithm to use in the optimization problem.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and
'saga' are faster for large ones.
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs'
handle multinomial loss; 'liblinear' is limited to one-versus-rest
schemes.
- 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas
'liblinear' and 'saga' handle L1 penalty.
Note that fast convergence of 'sag' and 'saga' is only guaranteed on
features with approximately the same scale. You can
preprocess the data with a scaler from sklearn.preprocessing.
.. versionadded:: 0.17
Stochastic Average Gradient descent solver.
.. versionadded:: 0.19
SAGA solver.
multi_class : str, {'ovr', 'multinomial'}, default: 'ovr'
Multiclass option can be either 'ovr' or 'multinomial'. If the option
chosen is 'ovr', then a binary problem is fit for each label. Else
the loss minimised is the multinomial loss fit across
the entire probability distribution. The 'multinomial' option does not
work with the liblinear solver.
.. versionadded:: 0.18
Stochastic Average Gradient descent solver for 'multinomial' case.
Attributes
----------
coef_ : array, shape (1, n_features) or (n_classes, n_features)
Coefficient of the features in the decision function.
`coef_` is of shape (1, n_features) when the given problem
is binary.
intercept_ : array, shape (1,) or (n_classes,)
Intercept (a.k.a. bias) added to the decision function.
If `fit_intercept` is set to False, the intercept is set to zero.
`intercept_` is of shape (1,) when the problem is binary.
n_iter_ : array, shape (n_classes,) or (1, )
Actual number of iterations for all classes. If binary or multinomial,
it returns only 1 element. For the liblinear solver, only the maximum
number of iterations across all classes is given.
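To illustrate how the parameters and attributes above fit together, here is a minimal sketch (it uses sklearn's small bundled digits dataset rather than full MNIST, and the parameter values are just illustrative):
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
X, y = load_digits(return_X_y=True)  # 10-class digits, 8x8 images flattened to 64 features
X = X / 16.0  # rescale features so the solver converges quickly
clf = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', multi_class='multinomial')
clf.fit(X, y)
print(clf.coef_.shape)       # (n_classes, n_features) -> (10, 64)
print(clf.intercept_.shape)  # (n_classes,)
print(clf.n_iter_)           # number of iterations actually run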
Softmax-based MNIST regression
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 7 10:47:18 2017
@author: Administrator
"""
import gzip
import struct
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.metrics import accuracy_score
import tensorflow as tf
# MNIST data is stored in binary format;
# the following two utility functions convert it into numpy ndarray objects
def read_image(file_name):
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
    index = 0
    magic, images, rows, columns = struct.unpack_from('>IIII', buf, index)
    index += struct.calcsize('>IIII')
    image_size = '>' + str(images * rows * columns) + 'B'
    ims = struct.unpack_from(image_size, buf, index)
    im_array = np.array(ims).reshape(images, rows, columns)
    return im_array
def read_label(file_name):
    with gzip.open(file_name, 'rb') as f:
        buf = f.read()
    index = 0
    magic, labels = struct.unpack_from('>II', buf, index)
    index += struct.calcsize('>II')
    label_size = '>' + str(labels) + 'B'
    labels = struct.unpack_from(label_size, buf, index)
    label_array = np.array(labels)
    return label_array
print ("Start processing MNIST handwritten digits data...")
train_x_data = read_image("MNIST_data/train-images-idx3-ubyte.gz")
train_x_data = train_x_data.reshape(train_x_data.shape[0], -1).astype(np.float32)
train_y_data = read_label("MNIST_data/train-labels-idx1-ubyte.gz")
test_x_data = read_image("MNIST_data/t10k-images-idx3-ubyte.gz")
test_x_data = test_x_data.reshape(test_x_data.shape[0], -1).astype(np.float32)
test_y_data = read_label("MNIST_data/t10k-labels-idx1-ubyte.gz")
train_x_minmax = train_x_data / 255.0
test_x_minmax = test_x_data / 255.0
# Of course you can also use the MNIST loading utility provided by tensorflow
# from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
# train_x_minmax = mnist.train.images
# train_y_data = mnist.train.labels
# test_x_minmax = mnist.test.images
# test_y_data = mnist.test.labels
# We evaluate the softmax regression model by sklearn first
eval_sklearn = False
if eval_sklearn:
    print ("Start evaluating softmax regression model by sklearn...")
    reg = LogisticRegression(solver="lbfgs", multi_class="multinomial")
    reg.fit(train_x_minmax, train_y_data)
    np.savetxt('coef_softmax_sklearn.txt', reg.coef_, fmt='%.6f')  # Save coefficients to a text file
    test_y_predict = reg.predict(test_x_minmax)
    print ("Accuracy of test set: %f" % accuracy_score(test_y_data, test_y_predict))
eval_tensorflow = True
batch_gradient = False
if eval_tensorflow:
    print ("Start evaluating softmax regression model by tensorflow...")
    # reformat y into one-hot encoding style
    lb = preprocessing.LabelBinarizer()
    lb.fit(train_y_data)
    train_y_data_trans = lb.transform(train_y_data)
    test_y_data_trans = lb.transform(test_y_data)
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    V = tf.matmul(x, W) + b
    y = tf.nn.softmax(V)
    y_ = tf.placeholder(tf.float32, [None, 10])
    loss = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
    optimizer = tf.train.GradientDescentOptimizer(0.5)
    train = optimizer.minimize(loss)
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(init)
    if batch_gradient:
        for step in range(300):
            sess.run(train, feed_dict={x: train_x_minmax, y_: train_y_data_trans})
            if step % 10 == 0:
                print ("Batch Gradient Descent processing step %d" % step)
        print ("Finally we got the estimated results, take such a long time...")
    else:
        for step in range(1000):
            sample_index = np.random.choice(train_x_minmax.shape[0], 100)
            batch_xs = train_x_minmax[sample_index, :]
            batch_ys = train_y_data_trans[sample_index, :]
            sess.run(train, feed_dict={x: batch_xs, y_: batch_ys})
            if step % 100 == 0:
                print ("Stochastic Gradient Descent processing step %d" % step)
    np.savetxt('coef_softmax_tf.txt', np.transpose(sess.run(W)), fmt='%.6f')  # Save coefficients to a text file
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print ("Accuracy of test set: %f" % sess.run(accuracy, feed_dict={x: test_x_minmax, y_: test_y_data_trans}))
- Notes:
- A Variable is a modifiable tensor that lives in TensorFlow's graph of interacting operations. It can be used and even modified by the computation. For machine learning applications, one generally has the model parameters be Variables.
- Judging by test-set accuracy, both approaches land around 92%, with sklearn slightly ahead. Note that 92% looks respectable but is actually quite low; in the words of the official tutorial, it is a result to be ashamed of.
- sklearn takes fairly long to fit, because each parameter update computes the loss over the full training set, then the gradient, and only then updates the estimate.
- tensorflow with batch gradient descent is also slow, for the same reason.
- tensorflow with stochastic gradient descent is fast and its final estimate is still good: each iteration computes the loss and gradient on only a subset of the data, which speeds things up but may increase bias; increasing the number of iterations then reduces the variance, so the overall error is not much worse than with batch gradient descent (see the sketch after this list).
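The batch-versus-stochastic trade-off can be summarized in a small sketch; grad_fn is a hypothetical function that returns the gradient of the loss on the samples it is given:
import numpy as np

def batch_step(W, X, Y, grad_fn, lr=0.5):
    # Full-batch gradient descent: every update touches the whole training set.
    return W - lr * grad_fn(W, X, Y)

def sgd_step(W, X, Y, grad_fn, lr=0.5, batch_size=100):
    # Stochastic (mini-batch) gradient descent: each update uses a random subset,
    # so a single step is cheap but noisier; running more steps averages the noise out.
    idx = np.random.choice(X.shape[0], batch_size, replace=False)
    return W - lr * grad_fn(W, X[idx], Y[idx])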
Demo from the official TensorFlow tutorial
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A very simple MNIST classifier.
See extensive documentation at
https://www.tensorflow.org/get_started/mnist/beginners
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import sys
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
FLAGS = None
def main(_):
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
    # Create the model
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.matmul(x, W) + b
    # Define loss and optimizer
    y_ = tf.placeholder(tf.float32, [None, 10])
    # The raw formulation of cross-entropy,
    #
    #   tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.nn.softmax(y)),
    #                                 reduction_indices=[1]))
    #
    # can be numerically unstable.
    #
    # So here we use tf.nn.softmax_cross_entropy_with_logits on the raw
    # outputs of 'y', and then average across the batch.
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
    sess = tf.InteractiveSession()
    tf.global_variables_initializer().run()
    # Train
    # In each step of this loop we randomly grab a batch of 100 training data points
    # and feed them in place of the placeholders to run train_step.
    # Training on small random subsets of the data is called stochastic training;
    # here, more precisely, stochastic gradient descent. Ideally we would use all of
    # the data for every training step, which would give better results, but that is
    # computationally expensive. Using a different subset each time keeps the cost
    # low while still learning the overall properties of the dataset.
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    # Test trained model
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                        y_: mnist.test.labels}))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_dir', type=str, default='/tmp/tensorflow/mnist/input_data',
                        help='Directory for storing input data')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
Start processing MNIST handwritten digits data...
Start evaluating softmax regression model by tensorflow...
WARNING:tensorflow:From D:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\util\tf_should_use.py:175: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
2017-09-08 16:47:36.504803: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-08 16:47:36.504803: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
Stochastic Gradient Descent processing step 0
Stochastic Gradient Descent processing step 100
Stochastic Gradient Descent processing step 200
Stochastic Gradient Descent processing step 300
Stochastic Gradient Descent processing step 400
Stochastic Gradient Descent processing step 500
Stochastic Gradient Descent processing step 600
Stochastic Gradient Descent processing step 700
Stochastic Gradient Descent processing step 800
Stochastic Gradient Descent processing step 900
Accuracy of test set: 0.915600
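The deprecation warning in the log comes from tf.initialize_all_variables() in the first script; as the message itself suggests, it can be silenced with a one-line change:
# init = tf.initialize_all_variables()    # deprecated
init = tf.global_variables_initializer()  # recommended replacement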