[Troubleshooting] Keras handwritten digit recognition - ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host


References:

Programming environment:

  • Operating system: Windows 7, CPU only

  • Anaconda, Python 3, Jupyter Notebook

  • TensorFlow: 1.10.0

  • Keras: 2.2.4

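Background

  • While implementing handwritten digit recognition with Keras, the data-loading stage raised an error:

    • ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host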


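Steps taken to solve the problem:

1-Download the dataset from the official site
2-Write a standalone data-loading module for the main program to import
3-Modify the main program accordingly
4-Test whether it runs correctly
5-A new "array is too big" problem, and the VPN workaround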

1-Download the dataset from the official site (http://yann.lecun.com/exdb/mnist):
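The files on that site are distributed gzip-compressed (for example train-images-idx3-ubyte.gz), so they have to be unpacked before a parser that expects raw IDX bytes can read them. A minimal sketch, assuming the four .gz archives were saved by hand into a ./data folder; the function name gunzip_all and the folder layout are illustrative choices, not something taken from the original code:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(data_dir="./data"):
    """Decompress every *.gz file in data_dir, writing the result next to it.

    Returns the list of decompressed file paths. The default folder name is
    an assumption; point it at wherever the downloads were saved.
    """
    written = []
    for gz_path in sorted(Path(data_dir).glob("*.gz")):
        out_path = gz_path.with_suffix("")  # drop the trailing ".gz"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling gunzip_all() once after the download should leave files such as ./data/train-images-idx3-ubyte beside the archives.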

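2-Write a standalone data-loading module for the main program to import

  • Save the code below as a file named load_data.py and import it directly later on (the code comes from reference blog post 1)

  • Put the dataset under the directory that contains the code file (the paths in the code expect a data subfolder there)

  • Mind the file path format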
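On the path format: a relative path such as './data/...' is resolved against the current working directory, not against load_data.py, and on Windows a path written with backslashes needs a raw string (r'C:\...') or forward slashes. One way to make the paths independent of where the program is launched is to build them from a known base directory. A sketch under those assumptions; the helper name data_path is hypothetical:

```python
from pathlib import Path


def data_path(filename, base_dir, subdir="data"):
    """Return the absolute path of a dataset file stored under base_dir/subdir.

    Raises FileNotFoundError with the full path in the message, which is
    easier to debug than a bare relative path.
    """
    path = Path(base_dir).resolve() / subdir / filename
    if not path.is_file():
        raise FileNotFoundError("MNIST file not found: %s" % path)
    return str(path)

# Inside load_data.py the base directory could be taken from the module itself:
#   BASE_DIR = Path(__file__).resolve().parent
#   train_images_idx3_ubyte_file = data_path('train-images-idx3-ubyte', BASE_DIR)
```

With this layout the module finds its files whether it is imported from a notebook or run from another folder.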


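# encoding: utf-8
"""
Parse the MNIST handwritten digit data files into NumPy arrays.
The dataset can be downloaded from http://yann.lecun.com/exdb/mnist.
For the file format, see the official site and the code comments.

========================
Rules for parsing the IDX file format:
========================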
THE IDX FILE FORMAT

the IDX file format is a simple format for vectors and multidimensional matrices of various numerical types.
The basic format is

magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)

The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....

The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
"""

import numpy as np
import struct
import matplotlib.pyplot as plt

# Training set image file
train_images_idx3_ubyte_file = './data/train-images-idx3-ubyte'
# Training set label file
train_labels_idx1_ubyte_file = './data/train-labels-idx1-ubyte'

# Test set image file
test_images_idx3_ubyte_file = './data/t10k-images-idx3-ubyte'
# Test set label file
test_labels_idx1_ubyte_file = './data/t10k-labels-idx1-ubyte'


def decode_idx3_ubyte(idx3_ubyte_file):
    """
    Generic function for parsing an idx3 file
    :param idx3_ubyte_file: path to the idx3 file
    :return: the dataset
    """
    # Read the binary data; the with-block closes the file handle afterwards
    with open(idx3_ubyte_file, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number, number of images, image height, image width
    offset = 0
    fmt_header = '>iiii'
    magic_number, num_images, num_rows, num_cols = struct.unpack_from(fmt_header, bin_data, offset)
    #print('magic number: %d, number of images: %d, image size: %d*%d' % (magic_number, num_images, num_rows, num_cols))

    # Parse the image data
    image_size = num_rows * num_cols
    offset += struct.calcsize(fmt_header)
    fmt_image = '>' + str(image_size) + 'B'
    images = np.empty((num_images, num_rows, num_cols))
    for i in range(num_images):
        #if (i + 1) % 10000 == 0:
            #print('parsed %d' % (i + 1) + ' images')
        images[i] = np.array(struct.unpack_from(fmt_image, bin_data, offset)).reshape((num_rows, num_cols))
        offset += struct.calcsize(fmt_image)
    return images


def decode_idx1_ubyte(idx1_ubyte_file):
    """
    Generic function for parsing an idx1 file
    :param idx1_ubyte_file: path to the idx1 file
    :return: the dataset
    """
    # Read the binary data; the with-block closes the file handle afterwards
    with open(idx1_ubyte_file, 'rb') as f:
        bin_data = f.read()

    # Parse the header: magic number and number of labels
    offset = 0
    fmt_header = '>ii'
    magic_number, num_images = struct.unpack_from(fmt_header, bin_data, offset)
    #print('magic number: %d, number of labels: %d' % (magic_number, num_images))

    # Parse the label data
    offset += struct.calcsize(fmt_header)
    fmt_image = '>B'
    labels = np.empty(num_images)
    for i in range(num_images):
        #if (i + 1) % 10000 == 0:
        #    print('parsed %d' % (i + 1) + ' labels')
        labels[i] = struct.unpack_from(fmt_image, bin_data, offset)[0]
        offset += struct.calcsize(fmt_image)
    return labels


def load_train_images(idx_ubyte_file=train_images_idx3_ubyte_file):
    """
    TRAINING SET IMAGE FILE (train-images-idx3-ubyte):
    [offset] [type]          [value]          [description]
    0000     32 bit integer  0x00000803(2051) magic number
    0004     32 bit integer  60000            number of images
    0008     32 bit integer  28               number of rows
    0012     32 bit integer  28               number of columns
    0016     unsigned byte   ??               pixel
    0017     unsigned byte   ??               pixel
    ........
    xxxx     unsigned byte   ??               pixel
    Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

    :param idx_ubyte_file: path to the idx file
    :return: np.array of shape n*row*col, where n is the number of images
    """
    return decode_idx3_ubyte(idx_ubyte_file)


def load_train_labels(idx_ubyte_file=train_labels_idx1_ubyte_file):
    """
    TRAINING SET LABEL FILE (train-labels-idx1-ubyte):
    [offset] [type]          [value]          [description]
    0000     32 bit integer  0x00000801(2049) magic number (MSB first)
    0004     32 bit integer  60000            number of items
    0008     unsigned byte   ??               label
    0009     unsigned byte   ??               label
    ........
    xxxx     unsigned byte   ??               label
    The labels values are 0 to 9.

    :param idx_ubyte_file: path to the idx file
    :return: np.array of length n (one label per image), where n is the number of images
    """
    return decode_idx1_ubyte(idx_ubyte_file)


def load_test_images(idx_ubyte_file=test_images_idx3_ubyte_file):
    """
    TEST SET IMAGE FILE (t10k-images-idx3-ubyte):
    [offset] [type]          [value]          [description]
    0000     32 bit integer  0x00000803(2051) magic number
    0004     32 bit integer  10000            number of images
    0008     32 bit integer  28               number of rows
    0012     32 bit integer  28               number of columns
    0016     unsigned byte   ??               pixel
    0017     unsigned byte   ??               pixel
    ........
    xxxx     unsigned byte   ??               pixel
    Pixels are organized row-wise. Pixel values are 0 to 255. 0 means background (white), 255 means foreground (black).

    :param idx_ubyte_file: path to the idx file
    :return: np.array of shape n*row*col, where n is the number of images
    """
    return decode_idx3_ubyte(idx_ubyte_file)


def load_test_labels(idx_ubyte_file=test_labels_idx1_ubyte_file):
    """
    TEST SET LABEL FILE (t10k-labels-idx1-ubyte):
    [offset] [type]          [value]          [description]
    0000     32 bit integer  0x00000801(2049) magic number (MSB first)
    0004     32 bit integer  10000            number of items
    0008     unsigned byte   ??               label
    0009     unsigned byte   ??               label
    ........
    xxxx     unsigned byte   ??               label
    The labels values are 0 to 9.

    :param idx_ubyte_file: path to the idx file
    :return: np.array of length n (one label per image), where n is the number of images
    """
    return decode_idx1_ubyte(idx_ubyte_file)

def run():
    train_images = load_train_images()
    train_labels = load_train_labels()
    test_images = load_test_images()
    test_labels = load_test_labels()

    # Show the first ten samples and their labels to check the data was read correctly
    for i in range(10):
        print(train_labels[i])
        plt.imshow(train_images[i], cmap='gray')
        plt.show()
    print('done')

if __name__ == '__main__':
    run()
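The magic-number rule quoted in the module docstring (two zero bytes, a type code, a dimension count, then one 4-byte size per dimension) can be read back with a few lines of struct code, which is a quick way to confirm that a downloaded file really is raw IDX before parsing it. A small sketch; the names IDX_TYPES and read_idx_header are made up for illustration:

```python
import struct

# Type codes from the IDX description, mapped to struct format characters.
IDX_TYPES = {0x08: 'B', 0x09: 'b', 0x0B: 'h', 0x0C: 'i', 0x0D: 'f', 0x0E: 'd'}


def read_idx_header(raw):
    """Split the start of an IDX byte string into (type_code, dims, data_offset)."""
    # Bytes 0-1 must be zero, byte 2 is the type code, byte 3 the dimension count.
    zeros, type_code, ndim = struct.unpack_from('>HBB', raw, 0)
    if zeros != 0 or type_code not in IDX_TYPES:
        raise ValueError('not an IDX file (first bytes: %r)' % raw[:4])
    # Each dimension size is a 4-byte big-endian integer.
    dims = struct.unpack_from('>' + 'I' * ndim, raw, 4)
    return type_code, dims, 4 + 4 * ndim
```

For the training image file this would be expected to report type code 0x08, dimensions (60000, 28, 28), and a data offset of 16, matching the offset table in load_train_images.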


3-Modify the main program accordingly

  • Change the original from keras.datasets.. to from load_data import *

  • Adjust the data preprocessing section accordingly:
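The preprocessing changes themselves are not shown here. As a rough sketch of what typically has to change: load_data.py returns float64 image arrays of shape (n, 28, 28) and float label vectors, so flattening, scaling, and one-hot encoding could look like the following. The function name preprocess and the flattened 784-feature input are assumptions (they suit a dense network; a convolutional model would reshape to (n, 28, 28, 1) instead). The one-hot step uses plain NumPy so the sketch runs without Keras.

```python
import numpy as np


def preprocess(images, labels, num_classes=10):
    """Flatten and scale images to [0, 1]; one-hot encode the labels."""
    # (n, 28, 28) -> (n, 784), pixel values 0..255 -> 0.0..1.0
    x = images.reshape(len(images), -1).astype('float32') / 255.0
    # Row k of the identity matrix is the one-hot vector for class k.
    y = np.eye(num_classes, dtype='float32')[labels.astype('int64')]
    return x, y
```

In the main program this might be used as x_train, y_train = preprocess(load_train_images(), load_train_labels()); keras.utils.to_categorical would be the usual Keras way to do the label step.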

4-Test whether it runs correctly

  • Error: file path not found

  • Then another error: ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
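The cause of the second error was not tracked down, but one possibility worth checking is the header: if the file is not raw IDX (for instance it is still gzip-compressed), the image count and dimensions read from it are meaningless and can multiply out to an impossible array size. Separately, np.empty defaults to float64, so even with a correct header the training images take 60000*28*28*8 = 376,320,000 bytes, whereas uint8 needs one eighth of that. The sketch below, under those assumptions, validates the magic number and reads the pixels without a per-image loop; load_idx3_uint8 is a hypothetical name, not part of load_data.py.

```python
import struct

import numpy as np


def load_idx3_uint8(path, expected_magic=2051):
    """Read an idx3 image file into a uint8 array of shape (n, rows, cols)."""
    with open(path, 'rb') as f:
        raw = f.read()
    magic, n, rows, cols = struct.unpack_from('>iiii', raw, 0)
    if magic != expected_magic:
        # A wrong magic number means the sizes that follow cannot be trusted,
        # e.g. when the file is still gzip-compressed.
        raise ValueError('unexpected magic number %d in %s' % (magic, path))
    # View the pixel bytes directly; the 16-byte header is skipped via offset.
    pixels = np.frombuffer(raw, dtype=np.uint8, count=n * rows * cols, offset=16)
    return pixels.reshape(n, rows, cols)
```

The returned array is a read-only view of the file bytes; calling .astype('float32') on it produces a writable copy for training.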

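5-After much searching I still could not solve the array-too-big problem, so I turned to a VPN (a "ladder") and loaded the data with the original method

  • The route above was a dead end: with the array-size problem unsolved I could not go any further, so I hoped the data could be loaded this way instead...

  • It worked! Thank goodness! Long live the VPN!

  • VPN tip: as long as the VPN itself is working, retry a few times; in my experience the download eventually succeeds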
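The "retry a few times" advice can be automated with a small wrapper. This is only a sketch: the helper name retry and its parameters are hypothetical, and fetch stands for whatever call performs the download (for the original method that would be keras.datasets.mnist.load_data). It retries only connection-level errors such as the ConnectionResetError from the title; other exception types raised by a download would have to be added.

```python
import time


def retry(fetch, attempts=5, delay_seconds=3.0):
    """Call fetch() until it succeeds or the attempts run out.

    Only connection-level errors are retried; the last one is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ConnectionError as err:  # includes ConnectionResetError
            if attempt == attempts:
                raise
            print('attempt %d failed (%s), retrying' % (attempt, err))
            time.sleep(delay_seconds)
```

With Keras this might be called as (x_train, y_train), (x_test, y_test) = retry(mnist.load_data).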

Summary:

  • At bottom this is a Great Firewall problem: with a sufficiently reliable VPN, none of the trouble above is needed

END

