Open-source standard dataset: MNIST (handwritten digit recognition)

Download: mnist.pkl.gz

1. Reading and parsing mnist.pkl.gz with Python

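import gzip
import pickle

from PIL import Image

def load_data(path='./mnist.pkl.gz'):
    # The archive holds a 3-tuple: (training, validation, test).
    # encoding='latin1' lets Python 3 read a pickle written by Python 2,
    # which is how this file is commonly distributed.
    with gzip.open(path, 'rb') as fp:
        training_data, valid_data, test_data = pickle.load(fp, encoding='latin1')
    return training_data, valid_data, test_data

training_data, valid_data, test_data = load_data()
print(len(training_data[0]))     # number of training images
print(len(valid_data[0]))        # number of validation images
print(len(test_data[0]))         # number of test images
print(len(training_data[0][0]))  # pixels per image (one flattened 28x28 image)

# Each image is stored as a flat vector; reshape it to 28x28 before display.
I = training_data[0][0].reshape(28, 28)
# Scale by 255 rather than 256: 1.0 * 256 would wrap around to 0 as uint8.
im = Image.fromarray((I * 255).astype('uint8'))
im.show()
im.save('5.png')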

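As the output shows, mnist.pkl.gz is split into a training set, a validation set and a test set.

With the image API in PIL, any of these images can be displayed and saved to disk.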
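
To make the file layout concrete without downloading anything, here is a minimal sketch that writes and re-reads a tiny synthetic archive with the same shape of data. It assumes each split is an (images, labels) pair, with images as rows of 784 floats; the helper make_fake_split and the file name fake_mnist.pkl.gz are hypothetical, introduced only for illustration.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

def make_fake_split(n_samples, rng):
    # One split is assumed to be a pair: (images, labels).
    # images: float32 array of shape (n_samples, 784), values in [0, 1)
    # labels: integer array of shape (n_samples,), values in 0..9
    images = rng.random((n_samples, 28 * 28)).astype('float32')
    labels = rng.integers(0, 10, size=n_samples)
    return images, labels

rng = np.random.default_rng(0)
fake = (make_fake_split(5, rng), make_fake_split(2, rng), make_fake_split(3, rng))

# Write the 3-tuple as a gzip-compressed pickle (assumed layout of the real file).
path = os.path.join(tempfile.mkdtemp(), 'fake_mnist.pkl.gz')
with gzip.open(path, 'wb') as fp:
    pickle.dump(fake, fp)

# Read it back and inspect each split.
with gzip.open(path, 'rb') as fp:
    training_data, valid_data, test_data = pickle.load(fp)

for name, (images, labels) in [('train', training_data),
                               ('valid', valid_data),
                               ('test', test_data)]:
    print(name, images.shape, labels.shape)

# A single image is one row, reshaped back to 28x28.
first_image = training_data[0][0].reshape(28, 28)
```

Since this file is written and read by the same Python version, no encoding argument is needed when loading it.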

2. Data visualization: tiling

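In much of the literature on data processing and computer vision, datasets are visualized as in the figure below (parameters can be visualized the same way, since an image is essentially a two-dimensional array). The code that follows shows that, with some care over layout and spacing, a large collection of images can be arranged like tiles on a floor. Presenting data or results this way is very intuitive. The figure below is such a tiled visualization of the weight matrix of a deep neural network.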

import numpy
from PIL import Image

def normalize(darr, eps=1e-8):
    # normalize(x) = (x - min) / (max - min); eps guards against division by zero.
    # Works on a copy so the caller's array (a row of X) is left unchanged.
    darr = darr - darr.min()
    return darr / (darr.max() + eps)

def tile_raster_images(X, image_shape, tile_shape,
                       tile_spacing=(0, 0), normalize_rows=True,
                       output_pixel_vals=True):
    # X: each row is one 2-D image flattened into a 1-D row vector
    # image_shape: (height, width) of each tile
    # tile_shape: number of tiles along the vertical and horizontal directions
    # tile_spacing: gap in pixels between adjacent tiles
    # normalize_rows: whether to scale each tile to [0, 1]
    # output_pixel_vals: whether to output uint8 pixel values (0..255)

    # All three shape arguments must be 2-tuples.
    assert len(image_shape) == 2
    assert len(tile_shape) == 2
    assert len(tile_spacing) == 2

    # Example: image_shape == (28, 28) (MNIST), tile_shape == (10, 10),
    # tile_spacing == (1, 1) gives a canvas of
    # [(28+1)*10-1] x [(28+1)*10-1] = 289 x 289.
    output_shape = [
        (ishp + tsp) * tshp - tsp
        for ishp, tshp, tsp in zip(image_shape, tile_shape, tile_spacing)
    ]

    H, W = image_shape
    Hs, Ws = tile_spacing
    # Python's conditional expression (its ternary operator)
    dt = 'uint8' if output_pixel_vals else X.dtype
    output_array = numpy.zeros(output_shape, dtype=dt)
    c = 255 if output_pixel_vals else 1

    # Lay the tiles row by row.
    for i in range(tile_shape[0]):
        for j in range(tile_shape[1]):
            idx = i * tile_shape[1] + j
            if idx < X.shape[0]:
                this_image = X[idx].reshape(image_shape)
                if normalize_rows:
                    this_image = normalize(this_image)
                output_array[
                    i*(H+Hs):i*(H+Hs)+H, j*(W+Ws):j*(W+Ws)+W
                ] = this_image * c
    return output_array

X = numpy.random.randn(500, 28*28)
arr = tile_raster_images(X, image_shape=(28, 28),
                         tile_shape=(12, 12), tile_spacing=(1, 1))
img = Image.fromarray(arr)
img.show()
img.save('./tile_visualization.png')
# matplotlib can be used for display instead:
# import matplotlib.pyplot as plt
# plt.imshow(img, cmap='gray')
# plt.show()

Visualization makes the data easier to inspect at a glance and makes the work more efficient.
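
The layout arithmetic inside tile_raster_images can be checked on its own. The sketch below isolates the two formulas it relies on: the canvas size per axis, and the slice of the canvas that the tile at grid cell (i, j) occupies. The helper names tiled_canvas_shape and tile_slices are hypothetical and are not part of the code above.

```python
def tiled_canvas_shape(image_shape, tile_shape, tile_spacing):
    # Per axis: n tiles of size s with a gap g between neighbours
    # occupy (s + g) * n - g pixels (there is no gap after the last tile).
    return tuple((s + g) * n - g
                 for s, n, g in zip(image_shape, tile_shape, tile_spacing))

def tile_slices(i, j, image_shape, tile_spacing):
    # Row and column slices of the canvas covered by the tile at cell (i, j).
    H, W = image_shape
    Hs, Ws = tile_spacing
    return (slice(i * (H + Hs), i * (H + Hs) + H),
            slice(j * (W + Ws), j * (W + Ws) + W))

# 10 x 10 MNIST-sized tiles with a 1-pixel gap.
shape = tiled_canvas_shape((28, 28), (10, 10), (1, 1))
print(shape)  # (289, 289)

# The bottom-right tile ends exactly at the canvas edge.
rows, cols = tile_slices(9, 9, (28, 28), (1, 1))
print(rows.start, rows.stop)  # 261 289
```

With zero spacing the canvas is simply image size times tile count on each axis.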

3. Single-line, multi-line, and Chinese comments in Python

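(1) Single-line comments in Python (#)

Example: # this is a comment

(2) Block (multi-line) comments

Multi-line comments are written between triple quotes, ''' ... ''' or """ ... """; either single or double quotes can be used. Strictly speaking, such a block is a string literal that Python evaluates and discards, which is why it can serve as a comment.

For example: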

'''
ABC
ABC
ABC
'''
"""
ABC
ABC
ABC
"""

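The difference between a triple-quoted block and a real # comment shows up inside a function. In the sketch below (the function area is a hypothetical example), the first triple-quoted string becomes the docstring, the second is an ordinary string expression that is evaluated and thrown away, and the # line never reaches the interpreter at all.

```python
def area(width, height):
    """Return the area of a rectangle."""
    '''
    This triple-quoted block is not the first statement in the function,
    so it is only a string expression that is evaluated and discarded.
    '''
    # A real comment: removed by the tokenizer, never evaluated.
    return width * height

print(area.__doc__)  # Return the area of a rectangle.
print(area(3, 4))    # 12
```

This is why triple quotes work for commenting out text, but # is the safer choice where a stray string would be picked up as a docstring.
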
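(3) Chinese comments in Python

If a file contains non-ASCII characters, an encoding declaration is needed on the first or second line (Python 2 assumes ASCII without one; Python 3 defaults to UTF-8). For example, after re-saving ChineseTest.py (the original example uses ANSI), add an encoding declaration; the declared name has to match the encoding the file is actually saved in.

The declaration must appear on the first or second line:

#coding=utf-8
or
# -*- coding: utf-8 -*-

At first I still got an error after adding it: the first three lines of my .py file were comments and I had put the declaration on the fourth line, so the error persisted.

The first two lines of a Python script are usually: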

#!/usr/bin/python
# -*- coding: utf-8 -*-
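
The first-or-second-line rule can be observed with the standard library's tokenize.detect_encoding, which inspects at most two lines of a source file. The sketch below is only an illustration: the helper declared_encoding is hypothetical, and gbk is used as the declared encoding purely so that it differs from Python 3's default of utf-8.

```python
import io
import tokenize

def declared_encoding(source_bytes):
    # detect_encoding calls readline at most twice, so only the first
    # two lines of the source are ever examined for a declaration.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# Declaration on line 2, after a shebang: it is honoured.
on_line_2 = b"#!/usr/bin/python\n# -*- coding: gbk -*-\nprint('hi')\n"

# Declaration on line 4, after three comment lines: it is ignored.
on_line_4 = b"# a\n# b\n# c\n# -*- coding: gbk -*-\nprint('hi')\n"

print(declared_encoding(on_line_2))  # gbk
print(declared_encoding(on_line_4))  # utf-8
```

In the second case the default encoding is reported, which matches the situation described above where a declaration placed on the fourth line had no effect.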
