Kaggle competition: classifying 104 flower species on a TPU, attempt 18, 99.9% validation accuracy, with annotated code [Deep Learning: TPU + Keras + TensorFlow + EfficientNetB7]

Reposted article (see the original post). 2020-05-07 18:11

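Leaderboard scores

The leaderboard is calculated on approximately 70% of the test data. The final results will be based on the other 30%, so the final standings may differ. (In other words, the leaderboard score will not necessarily match your validation accuracy.)
Attempt 18 scored 95.7% on the leaderboard. I was pretty happy at the time; ignorance is bliss, I suppose, and luckily I didn't realize how much of a beginner I was, haha. Then some experienced people criticized the result, so I went back to studying other models and tuning parameters.
By request, here is the ranking for attempt 19 (version 19) first; I'm slowly improving. Don't ask why the account names differ: each account is limited to 5 submissions and to 30 hours of TPU time per week, so I registered four accounts, 張志浩1 through 張志浩4.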

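Attempt 20
Attempt 21

These are the other submission scores for version 21. This is deep learning, after all: every training run gives a slightly different result.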

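Final ranking

Version 21 above is my final version. After reaching 64th place I did no further training or submissions; the score behind my 37th place in the final ranking is the same score that earlier placed me 64th.
Translation: The competition has ended. This leaderboard reflects the preliminary final standings. The results will become final once the competition organizers have verified them.

I can think of two reasons why the final ranking differs from what we saw earlier:
1. Many people trained on datasets related to the test set and were judged to have cheated.
2. The final score is determined by 70% of the given test set (the test data we can access) and 30% other test data; our models may do well on the 70% and worse on the other 30%.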

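Some takeaways after the competition

This competition took me nearly a week, and that week was basically one late night after another, hahaha 😂😂😂.
I really learned a lot from it, perhaps because I knew so little before. I studied two major model families, DenseNet and EfficientNet, and I learned about input pipelining, parallel data reading, and caching as performance optimizations. Before this, with small datasets and simple networks, I had never thought about optimization.
I probably won't try this competition again in the next few days. I entered it as the final project for my university's "Introduction to Deep Learning" course, and I've already started the report (40-plus pages so far). Message me if you need it 😁😁😁, QQ 3382885270. I'm a beginner who loves asking other people questions, and I've had help from many experts (especially my teacher, whom I kept pestering and who asked why I always obsess over small details, haha, which was fun. Thank you, Teacher Xu 😎😎😎). So I'd really like to help others and get better together with them. ✨✨✨
Takeaways:

This competition requires a VPN: using the TPU requires phone verification, which needs a VPN; using Kaggle Kernels needs a VPN; and reading Kaggle data needs a VPN too.

One account gets only 30 hours of TPU time and only 5 submissions per week, so I suggest registering several accounts for the competition. Each training run takes 2 hours or more, so I also suggest opening several browsers, logging in to a different account in each, and training in parallel; that way you can train many models in the same 2 hours.

Keep at it, friend, and you'll get stronger. You're not the only one who's tired. There's no such thing as unrecognized talent; you're just not good enough yet. If you really, really work hard, it will pay off.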

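Preface

Hi everyone, I'm 愛做夢的魚 ("the fish who loves to dream", because I like to daydream and always imagine all kinds of wonderful things). I'm a third-year student at 東北大學 (Northeastern University), eager to excel and full of admiration for people who do. I already have two summer internship offers (in big data development, because data analysis internships aren't open to undergraduates; I still love data analysis and think of it as my last bit of romance).
I have only been studying deep learning systematically for two weeks (from the books 《Python深度學習》 and 《神經網絡和深度學習》). Feel free to get in touch 😂😂😂
My blog: 子浩的博客 https://blog.csdn.net/weixin_43124279

Competition page: https://www.kaggle.com/c/flower-classification-with-tpus/overview
Other posts:
[Deep Learning: TPU, TensorFlow] Kaggle competition: classifying 104 flower species on a TPU, attempt 1, 40% accuracy
[Deep Learning: TPU + Keras + TensorFlow + EfficientNetB7] Kaggle competition: classifying 104 flower species on a TPU, attempt 18, 99.9% accuracy (English version)
Column:
Deep Learning

Full English name of the competition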
Flower Classification with TPUs
Use TPUs to classify 104 types of flowers

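The competition description follows:
In this competition, you're challenged to build a machine learning model that identifies the type of flowers in a dataset of images (for simplicity, we're sticking to just over 100 types).

Dataset:
12753 training images, 3712 validation images, 7382 unlabeled test images
About the data:
In this competition we classify 104 types of flowers based on images drawn from five different public datasets. Some classes are very narrow, containing only a particular sub-type of flower (e.g. pink primroses), while other classes contain many sub-types (e.g. wild roses).
This competition is different in that images are provided in TFRecord format. TFRecord is a container format frequently used in TensorFlow to group and shard data files for optimal training performance. Each file contains the id, label (the class of the sample, for training data), and img (the actual pixels in array form) information for many images.

train/*.tfrec - training samples, including labels.
val/*.tfrec - validation set: pre-split training samples with labels, intended to help check your model's performance on TPU. The split was stratified across labels.
test/*.tfrec - test set: samples without labels; you'll be predicting which class of flower they belong to.
sample_submission.csv - a sample submission file in the correct format
- id - a unique ID for each sample.
- label - (in training data) the class of flower represented by the sample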

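Version history

All accuracies below are validation accuracies. They differ somewhat from the score obtained after submission, because the leaderboard score is computed differently.

V1: the official starter code, using a VGG model; accuracy 40%
V2-V8: repeatedly added and removed layers, tuned hyperparameters, and swapped loss functions and optimizers; accuracy rose to 60% and then plateaued
V9: tried warming up by training only the softmax layer for 5 minutes before unfreezing all the weights; accuracy dropped to 50%
V10: more data augmentation; accuracy 55%
V11: used an LR scheduler; accuracy 62%
V12: trained the model on both the training and validation data; accuracy 68%
V13: used EfficientNetB7, a new model open-sourced by Google; accuracy 91%, scary
V14: trained longer (25 epochs); accuracy 82%, a drop, probably due to overfitting
V15: back to 20 epochs; global max pooling instead of average; accuracy 67%, not a good fit
V16: rolled back to global average pooling; accuracy 81%
V18: rolled back to V13 and tuned some parameters; accuracy 99.9%, terrifying, I feel invincible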

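1. Install efficientnet

!pip install -q efficientnet # we want to use the EfficientNet model, so install the efficientnet package first
# The leading exclamation mark runs the command in the shell; this line is equivalent to typing pip install -q efficientnet in a terminal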

2. Import the required packages

# Import the required packages
import math, re, os # math: common mathematical functions; re: regular-expression matching; os: operating-system interface
import tensorflow as tf # TensorFlow
import numpy as np # NumPy, for array operations
from matplotlib import pyplot as plt   # matplotlib, for plotting
from kaggle_datasets import KaggleDatasets # Kaggle dataset access
import efficientnet.tfkeras as efn    # the EfficientNet models
# Import the F1 score, precision, recall, and confusion matrix from scikit-learn
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix  

print("Tensorflow version " + tf.__version__) # check the TensorFlow version

Tensorflow version 2.1.0

3. Detect the TPU and GPU

This cell is commented out because we already know that a TPU is available, and we intend to use only the TPU, not a GPU.

# Detect hardware, return appropriate distribution strategy
# try:
#     # TPU detection. No parameters are needed if the TPU_NAME environment variable is set, which is always the case on Kaggle.
#     tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
#     print('Running on TPU ', tpu.master())
# except ValueError:
#     tpu = None

# if tpu:
#     tf.config.experimental_connect_to_cluster(tpu)
#     tf.tpu.experimental.initialize_tpu_system(tpu)
#     strategy = tf.distribute.experimental.TPUStrategy(tpu)
# else:
#     strategy = tf.distribute.get_strategy() # default distribution strategy in Tensorflow. Works on CPU and single GPU.

# print("REPLICAS: ", strategy.num_replicas_in_sync) # print the number of replicas

4. Configure the TPU, data paths, and so on

AUTO = tf.data.experimental.AUTOTUNE # let tf.data pick the degree of parallelism automatically

# Create strategy from tpu
tpu = tf.distribute.cluster_resolver.TPUClusterResolver() # no arguments are needed if the TPU_NAME environment variable is already set
tf.config.experimental_connect_to_cluster(tpu) # connect to the TPU cluster
tf.tpu.experimental.initialize_tpu_system(tpu) # initialize the TPU system
strategy = tf.distribute.experimental.TPUStrategy(tpu) # create the TPU distribution strategy


# The official comments on competition data access:
# Competition data access
# TPUs read data directly from Google Cloud Storage (GCS). 
# This Kaggle utility will copy the dataset to a GCS bucket co-located with the TPU. 
# If you have multiple datasets attached to the notebook, 
# you can pass the name of a specific dataset to the get_gcs_path function. 
# The name of the dataset is the name of the directory it is mounted in. 
# Use !ls /kaggle/input/ to list attached datasets.

GCS_DS_PATH = KaggleDatasets().get_gcs_path() # GCS path of the competition data

# Configuration

IMAGE_SIZE = [512, 512] # image size in pixels
EPOCHS = 20 # number of training epochs
BATCH_SIZE = 16 * strategy.num_replicas_in_sync # global batch size (16 samples per replica)

# Paths for the available image sizes
GCS_PATH_SELECT = { # available image sizes
    192: GCS_DS_PATH + '/tfrecords-jpeg-192x192',
    224: GCS_DS_PATH + '/tfrecords-jpeg-224x224',
    331: GCS_DS_PATH + '/tfrecords-jpeg-331x331',
    512: GCS_DS_PATH + '/tfrecords-jpeg-512x512'
}
GCS_PATH = GCS_PATH_SELECT[IMAGE_SIZE[0]]

TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/train/*.tfrec') # training files
VALIDATION_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/val/*.tfrec') # validation files
TEST_FILENAMES = tf.io.gfile.glob(GCS_PATH + '/test/*.tfrec') # test files; predictions on this dataset should be submitted for the competition

# Names of the 104 flower classes
CLASSES = ['pink primrose',    'hard-leaved pocket orchid', 'canterbury bells', 'sweet pea',     'wild geranium',     'tiger lily',           'moon orchid',              'bird of paradise', 'monkshood',        'globe thistle',         # 00 - 09
           'snapdragon',       "colt's foot",               'king protea',      'spear thistle', 'yellow iris',       'globe-flower',         'purple coneflower',        'peruvian lily',    'balloon flower',   'giant white arum lily', # 10 - 19
           'fire lily',        'pincushion flower',         'fritillary',       'red ginger',    'grape hyacinth',    'corn poppy',           'prince of wales feathers', 'stemless gentian', 'artichoke',        'sweet william',         # 20 - 29
           'carnation',        'garden phlox',              'love in the mist', 'cosmos',        'alpine sea holly',  'ruby-lipped cattleya', 'cape flower',              'great masterwort', 'siam tulip',       'lenten rose',           # 30 - 39
           'barberton daisy',  'daffodil',                  'sword lily',       'poinsettia',    'bolero deep blue',  'wallflower',           'marigold',                 'buttercup',        'daisy',            'common dandelion',      # 40 - 49
           'petunia',          'wild pansy',                'primula',          'sunflower',     'lilac hibiscus',    'bishop of llandaff',   'gaura',                    'geranium',         'orange dahlia',    'pink-yellow dahlia',    # 50 - 59
           'cautleya spicata', 'japanese anemone',          'black-eyed susan', 'silverbush',    'californian poppy', 'osteospermum',         'spring crocus',            'iris',             'windflower',       'tree poppy',            # 60 - 69
           'gazania',          'azalea',                    'water lily',       'rose',          'thorn apple',       'morning glory',        'passion flower',           'lotus',            'toad lily',        'anthurium',             # 70 - 79
           'frangipani',       'clematis',                  'hibiscus',         'columbine',     'desert-rose',       'tree mallow',          'magnolia',                 'cyclamen ',        'watercress',       'canna lily',            # 80 - 89
           'hippeastrum ',     'bee balm',                  'pink quill',       'foxglove',      'bougainvillea',     'camellia',             'mallow',                   'mexican petunia',  'bromelia',         'blanket flower',        # 90 - 99
           'trumpet creeper',  'blackberry lily',           'common tulip',     'wild rose']

5. Helper functions

5.1. Visualization functions

# Plot training and validation curves, i.e. loss and accuracy per epoch
def display_training_curves(training, validation, title, subplot):
    if subplot%10==1: # set up the subplots on the first call
        plt.subplots(figsize=(10,10), facecolor='#F0F0F0')
        plt.tight_layout()
    ax = plt.subplot(subplot) # select the subplot
    ax.set_facecolor('#F8F8F8') # background color
    ax.plot(training) # training curve
    ax.plot(validation) # validation curve
    ax.set_title('model '+ title)
    ax.set_ylabel(title) # y-axis label
    #ax.set_ylim(0.28,1.05)
    ax.set_xlabel('epoch') # x-axis label
    ax.legend(['train', 'valid.']) # legend
    
# Plot the confusion matrix
def display_confusion_matrix(cmat, score, precision, recall):
    plt.figure(figsize=(15,15))  # figure size
    ax = plt.gca() # get the current axes (matplotlib.axes.Axes)
    ax.matshow(cmat, cmap='Reds') # draw the matrix
    ax.set_xticks(range(len(CLASSES)))  # one x tick per flower class (104 in total)
    ax.set_xticklabels(CLASSES, fontdict={'fontsize': 7}) # x tick labels and their font size
    plt.setp(ax.get_xticklabels(), rotation=45, ha="left", rotation_mode="anchor") # rotate the x tick labels
    ax.set_yticks(range(len(CLASSES)))  # one y tick per flower class (104 in total)
    ax.set_yticklabels(CLASSES, fontdict={'fontsize': 7}) # y tick labels and their font size
    plt.setp(ax.get_yticklabels(), rotation=45, ha="right", rotation_mode="anchor") # rotate the y tick labels
    titlestring = ""
    if score is not None:
        titlestring += 'f1 = {:.3f} '.format(score) # format as a float with 3 decimal places
    if precision is not None:
        titlestring += '\nprecision = {:.3f} '.format(precision) # format as a float with 3 decimal places
    if recall is not None:
        titlestring += '\nrecall = {:.3f} '.format(recall) # format as a float with 3 decimal places
    if len(titlestring) > 0:
        ax.text(101, 1, titlestring, fontdict={'fontsize': 18, 'horizontalalignment':'right', 'verticalalignment':'top', 'color':'#804040'}) # add the text annotation
    plt.show()

# NumPy print options: summarize arrays with more than 15 elements and wrap lines at 80 characters.
# threshold : int, optional, Total number of array elements which trigger summarization rather than full repr (default 1000).
# linewidth : int, optional, The number of characters per line for the purpose of inserting line breaks (default 75).
np.set_printoptions(threshold=15, linewidth=80)

# Convert a batch of images and labels to NumPy arrays
def batch_to_numpy_images_and_labels(data):
    images, labels = data 
    numpy_images = images.numpy() # images as a NumPy array
    numpy_labels = labels.numpy() # labels as a NumPy array
    if numpy_labels.dtype == object: # binary strings in this case: these are image ID strings
        numpy_labels = [None for _ in enumerate(numpy_images)]
    # If there are no labels, only image IDs, return None for each label (this is the case for test data)
    return numpy_images, numpy_labels

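# Build the title shown above an image: the predicted class and, when the prediction is wrong, the actual class.
# This is meant for the validation set, where predictions can be compared with the true labels.
# label: the class predicted by the model
# correct_label: the actual class of the flower in the image
def title_from_label_and_target(label, correct_label):
    # With no actual class to compare against, just return the class name
    if correct_label is None:
        return CLASSES[label], True
    correct = (label == correct_label) # does the prediction match the actual class?
    # If they match, append OK; otherwise append NO, an arrow, and the actual class
    return "{} [{}{}{}]".format(CLASSES[label], 'OK' if correct else 'NO', u"\u2192" if not correct else '',
                                CLASSES[correct_label] if not correct else ''), correct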

# Draw a single flower
def display_one_flower(image, title, subplot, red=False, titlesize=16):
    plt.subplot(*subplot)
    plt.axis('off') # hide the axes
    plt.imshow(image) # draw the image on the current axes; plt.show() displays the finished figure later
    if len(title) > 0:
        # draw the image title
        plt.title(title, fontsize=int(titlesize) if not red else int(titlesize/1.2), color='red' if red else 'black', 
                  fontdict={'verticalalignment':'center'}, pad=int(titlesize/1.5))
    return (subplot[0], subplot[1], subplot[2]+1)
    
# Display a batch of images; the cells below usually show 20 at a time
def display_batch_of_images(databatch, predictions=None):
    """This will work with:
    display_batch_of_images(images) # images only (test set)
    display_batch_of_images(images, predictions) # images plus predicted classes (test set)
    display_batch_of_images((images, labels)) # images plus actual labels (training set)
    display_batch_of_images((images, labels), predictions) # images, actual classes, and predicted classes (validation set, which has labels and also gets predictions)
    """
    # Read the images and actual labels, converted to NumPy arrays
    images, labels = batch_to_numpy_images_and_labels(databatch)
    # If there are no actual labels at all (labels is None), as for the test set, use a list of None instead
    if labels is None:
        labels = [None for _ in enumerate(images)]
        
    # auto-squaring: this will drop data that does not fit into a square or square-ish rectangle
    rows = int(math.sqrt(len(images)))
    cols = len(images)//rows  # "//" is integer (floor) division
        
    # size and spacing
    FIGSIZE = 13.0  # figure size
    SPACING = 0.1
    subplot=(rows,cols,1)
    if rows < cols:
        # fewer rows than columns: shrink the figure height
        plt.figure(figsize=(FIGSIZE,FIGSIZE/cols*rows))
    else:
        plt.figure(figsize=(FIGSIZE/rows*cols,FIGSIZE))
    
    # display
    for i, (image, label) in enumerate(zip(images[:rows*cols], labels[:rows*cols])):
        title = '' if label is None else CLASSES[label]
        correct = True
        if predictions is not None:
            title, correct = title_from_label_and_target(predictions[i], label)
        dynamic_titlesize = FIGSIZE*SPACING/max(rows,cols)*40+3 # magic formula tested to work from 1x1 to 10x10 images
        subplot = display_one_flower(image, title, subplot, not correct, titlesize=dynamic_titlesize)
    
    #layout
    plt.tight_layout()
    if label is None and predictions is None:
        plt.subplots_adjust(wspace=0, hspace=0)
    else:
        plt.subplots_adjust(wspace=SPACING, hspace=SPACING)
    plt.show()
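To make the "auto-squaring" step in display_batch_of_images concrete, here is a minimal sketch in plain Python of the same rows/cols arithmetic. The helper name grid_shape is made up for this illustration and is not part of the notebook.

```python
import math

def grid_shape(n_images):
    """Return (rows, cols) computed the same way as in display_batch_of_images."""
    rows = int(math.sqrt(n_images))  # side of the largest square that fits
    cols = n_images // rows          # integer division drops the remainder
    return rows, cols

for n in (20, 16, 23):
    rows, cols = grid_shape(n)
    print(n, "images ->", rows, "x", cols, "grid,", n - rows * cols, "image(s) not shown")
```

A batch of 20 fills a 4 x 5 grid exactly, which is presumably why the cells below rebatch to 20 before displaying.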

5.2. Dataset functions

# Prepare the image data
def decode_image(image_data):
    image = tf.image.decode_jpeg(image_data, channels=3) # decode the JPEG
    # The decoded image is a uint8 array with values in [0, 255]; convert it to a float32 array with values in [0, 1].
    image = tf.cast(image, tf.float32) / 255.0  
    image = tf.reshape(image, [*IMAGE_SIZE, 3]) # explicit size needed for TPU
    return image

# Parse a labeled TFRecord example
def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "class": tf.io.FixedLenFeature([], tf.int64),  # shape [] means single element
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    label = tf.cast(example['class'], tf.int32)
    return image, label # returns a dataset of (image, label) pairs

# Parse an unlabeled TFRecord example
def read_unlabeled_tfrecord(example):
    UNLABELED_TFREC_FORMAT = {
        "image": tf.io.FixedLenFeature([], tf.string), # tf.string means bytestring
        "id": tf.io.FixedLenFeature([], tf.string),  # shape [] means single element
        # class is missing, this competitions's challenge is to predict flower classes for the test dataset
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = decode_image(example['image'])
    idnum = example['id']
    return image, idnum # returns a dataset of image(s)

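# Load a dataset
# The three parameters are: the file names, whether the data is labeled, and whether to keep the original order
def load_dataset(filenames, labeled=True, ordered=False):
    # Read from TFRecords. For optimal performance, read from multiple files at once and disregard data order. Order does not matter since we will be shuffling the data anyway.
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False # disable order, increase speed

    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)  # automatically interleave reads from multiple files
    dataset = dataset.with_options(ignore_order) # use data as soon as it streams in, rather than in its original order
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, num_parallel_calls=AUTO)
    # Returns a dataset of (image, label) pairs if labeled=True, or (image, id) pairs if labeled=False
    return dataset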

# Randomly flip the image horizontally (left to right). Returns the image and its label.
def data_augment(image, label, seed=2020):
    # TensorFlow function: tf.image.random_flip_left_right
    # Randomly flips an image horizontally (left to right).
    # With a 1 in 2 chance, outputs the contents of image flipped along the second dimension (width); otherwise outputs the image as-is.
    # Args:
    # image: a 3-D tensor of shape [height, width, channels].
    # seed: a Python integer used to create a random seed; see tf.random.set_seed (tf.set_random_seed in TF 1.x) for behavior.
    # Returns: a 3-D tensor of the same type and shape as image.
    image = tf.image.random_flip_left_right(image, seed=seed)
    
    # image = tf.image.random_flip_up_down(image, seed=seed)
    # image = tf.image.random_brightness(image, 0.1, seed=seed)
    # image = tf.image.random_jpeg_quality(image, 85, 100, seed=seed)
    # image = tf.image.resize(image, [530, 530])
    # image = tf.image.random_crop(image, [512, 512], seed=seed)
    #image = tf.image.random_saturation(image, 0, 2)
    return image, label   

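# Get the training dataset
def get_training_dataset():
    # Load the training set: the first argument is the list of training files, the second says the data is labeled
    dataset = load_dataset(TRAINING_FILENAMES, labeled=True)
    # Parallelize the data transformation.
    # The best value for num_parallel_calls depends on your hardware, the characteristics of the training data (such as size and shape), the cost of the map function, and what else is running on the CPU at the same time; AUTO lets tf.data decide.
    dataset = dataset.map(data_augment, num_parallel_calls=AUTO)
    # repeat(count=None): count (optional) is the number of times the dataset should be repeated.
    # The default (count is None or -1) repeats the dataset indefinitely.
    dataset = dataset.repeat() # the dataset must repeat for several epochs
    dataset = dataset.shuffle(2048) # shuffle with a 2048-element buffer; a larger buffer shuffles more thoroughly
    dataset = dataset.batch(BATCH_SIZE) # group consecutive samples into batches of BATCH_SIZE
    # Pipelining: prefetch the next batch while training (the prefetch buffer size is tuned automatically)
    dataset = dataset.prefetch(AUTO) 
    return dataset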
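Why a larger shuffle buffer mixes the data more thoroughly can be seen with a small simulation. The sketch below is a simplified plain-Python model of a buffer-based shuffle, written for illustration only; it is not TensorFlow's implementation, and the function name buffered_shuffle is hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Simplified model of a shuffle buffer: fill the buffer, then for each new
    item emit one random buffered element and put the new item in its place."""
    rng = random.Random(seed)
    buffer, out = [], []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        out.append(buffer[idx])          # emit a random buffered element
        buffer[idx] = item               # the new item takes its slot
    rng.shuffle(buffer)                  # drain whatever is left at the end
    out.extend(buffer)
    return out

data = list(range(100))
for size in (1, 10, 100):
    shuffled = buffered_shuffle(data, size)
    # How far towards the front did any element manage to move?
    max_forward = max(x - shuffled.index(x) for x in data)
    print("buffer", size, "-> max forward move:", max_forward)
```

In this model an element can move towards the front by at most buffer_size - 1 positions, so a buffer of 1 leaves the order unchanged, while a buffer as large as the dataset allows a full shuffle.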

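# Get the validation dataset
def get_validation_dataset(ordered=False):
    # Load the validation set: the first argument is the list of validation files, the second says the data is labeled, the third whether to keep the original order
    dataset = load_dataset(VALIDATION_FILENAMES, labeled=True, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE) # group consecutive samples into batches of BATCH_SIZE; the last batch may be smaller
    dataset = dataset.cache() # .cache(): when there is enough memory, keeping the preprocessed data in the cache speeds things up considerably
    # Pipelining: prefetch the next batch while the current one is being processed (the prefetch buffer size is tuned automatically)
    dataset = dataset.prefetch(AUTO)  
    return dataset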

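# Combine the training and validation sets into one training dataset
def get_train_valid_datasets():
    # Load the training and validation files together; the second argument says the data is labeled
    dataset = load_dataset(TRAINING_FILENAMES + VALIDATION_FILENAMES, labeled=True)
    # Parallelize the data transformation
    dataset = dataset.map(data_augment, num_parallel_calls=AUTO)
    # repeat(count=None): count (optional) is the number of times the dataset should be repeated.
    # The default (count is None or -1) repeats the dataset indefinitely.
    dataset = dataset.repeat() # the dataset must repeat for several epochs
    dataset = dataset.shuffle(2048) # shuffle with a 2048-element buffer; a larger buffer shuffles more thoroughly
    dataset = dataset.batch(BATCH_SIZE)
    # Pipelining: prefetch the next batch while training (the prefetch buffer size is tuned automatically)
    dataset = dataset.prefetch(AUTO)
    return dataset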

# Get the test dataset
def get_test_dataset(ordered=False):
    dataset = load_dataset(TEST_FILENAMES, labeled=False, ordered=ordered)
    dataset = dataset.batch(BATCH_SIZE)
    # Pipelining: prefetch the next batch while the current one is being processed (the prefetch buffer size is tuned automatically)
    dataset = dataset.prefetch(AUTO)
    return dataset

# Count the number of samples in a dataset
def count_data_items(filenames):
    # The number of data items is written in the name of each .tfrec file, i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)
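The filename convention that count_data_items relies on can be tried out without TensorFlow. The sketch below uses only the standard library; the file names in example_files are hypothetical (apart from flowers00-230.tfrec, which is the example from the comment above) and only illustrate the "-<count>." pattern.

```python
import re

def count_data_items(filenames):
    """Sum the item counts encoded in names such as 'flowers00-230.tfrec'."""
    pattern = re.compile(r"-([0-9]*)\.")  # digits between the last '-' and the '.'
    return sum(int(pattern.search(name).group(1)) for name in filenames)

# Hypothetical file names, for illustration only.
example_files = [
    "train/00-512x512-798.tfrec",
    "train/01-512x512-798.tfrec",
    "flowers00-230.tfrec",
]
print(count_data_items(example_files))
```

Counting this way avoids reading every record just to learn the dataset size, at the cost of trusting the file names.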

5.3. Model functions

# Custom learning-rate schedule
# Returns the learning rate for a given epoch
def lrfn(epoch):
    LR_START = 0.00001 # initial learning rate
    LR_MAX = 0.00005 * strategy.num_replicas_in_sync # peak learning rate
    LR_MIN = 0.00001 # minimum learning rate
    LR_RAMPUP_EPOCHS = 5
    LR_SUSTAIN_EPOCHS = 0
    LR_EXP_DECAY = .8
    
    if epoch < LR_RAMPUP_EPOCHS:
        lr = (LR_MAX - LR_START) / LR_RAMPUP_EPOCHS * epoch + LR_START
    elif epoch < LR_RAMPUP_EPOCHS + LR_SUSTAIN_EPOCHS:
        lr = LR_MAX
    else:
        lr = (LR_MAX - LR_MIN) * LR_EXP_DECAY**(epoch - LR_RAMPUP_EPOCHS - LR_SUSTAIN_EPOCHS) + LR_MIN
    return lr
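To see the shape of this schedule without a TPU, the sketch below reimplements lrfn in plain Python. It assumes 8 replicas (an assumption, chosen because it matches the batches of 128 = 16 * 8 printed later in this post); the num_replicas parameter is added here for illustration and is not in the notebook.

```python
def lrfn(epoch, num_replicas=8):
    """Linear ramp-up for 5 epochs, then exponential decay towards lr_min."""
    lr_start = 0.00001
    lr_max = 0.00005 * num_replicas   # 0.0004 with 8 replicas (assumed)
    lr_min = 0.00001
    rampup_epochs = 5
    sustain_epochs = 0
    exp_decay = 0.8

    if epoch < rampup_epochs:
        return (lr_max - lr_start) / rampup_epochs * epoch + lr_start
    if epoch < rampup_epochs + sustain_epochs:
        return lr_max
    return (lr_max - lr_min) * exp_decay ** (epoch - rampup_epochs - sustain_epochs) + lr_min

for epoch in range(20):
    print("epoch {:2d}: lr = {:.6g}".format(epoch + 1, lrfn(epoch)))
```

Under that assumption the values agree with the learning rates that LearningRateScheduler prints in the training log in section 7.2 (1e-05, 8.8e-05, ..., peaking at 0.0004 in epoch 6).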

6. Visualizing the datasets

# Inspect the data
print("Training data shapes:")
# Print the image and label shapes of the first 3 training batches
for image, label in get_training_dataset().take(3):
    print(image.numpy().shape, label.numpy().shape)
# Example training labels
print("Training data label examples:", label.numpy())

print("Validation data shapes:")
# Print the image and label shapes of the first 3 validation batches
for image, label in get_validation_dataset().take(3):
    print(image.numpy().shape, label.numpy().shape)
# Example validation labels
print("Validation data label examples:", label.numpy())

print("Test data shapes:")
# Print the image and ID shapes of the first 3 test batches
for image, idnum in get_test_dataset().take(3):
    print(image.numpy().shape, idnum.numpy().shape)
# Example test IDs
print("Test data IDs:", idnum.numpy().astype('U')) # U=unicode string

Training data shapes:
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
Training data label examples: [ 1  7 49 ... 77 53 67]
Validation data shapes:
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
Validation data label examples: [49  4 91 ... 66 93 21]
Test data shapes:
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
(128, 512, 512, 3) (128,)
Test data IDs: ['75d255458' '8d1bc9b54' 'ff30e8b96' ... '256e89fc6' 'f6482ab55' '82f95de55']

# Peek at the training data
training_dataset = get_training_dataset() # get the training dataset
training_dataset = training_dataset.unbatch().batch(20) # rebatch the training data into batches of 20
train_batch = iter(training_dataset) # create an iterator

# Run this cell again to get the next set of images
display_batch_of_images(next(train_batch))

# Peek at the test data
test_dataset = get_test_dataset() # get the test dataset
test_dataset = test_dataset.unbatch().batch(20) # rebatch the test data into batches of 20
test_batch = iter(test_dataset) # create an iterator

# Run this cell again to get the next set of images
display_batch_of_images(next(test_batch))

7. Training the model

NUM_TRAINING_IMAGES = count_data_items(TRAINING_FILENAMES) # number of training samples
NUM_VALIDATION_IMAGES = count_data_items(VALIDATION_FILENAMES) # number of validation samples
NUM_TEST_IMAGES = count_data_items(TEST_FILENAMES) # number of test samples
STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // BATCH_SIZE # steps per epoch = training samples divided by batch size (integer division)
# Print the sizes of the training, validation, and test sets
print('Dataset: {} training images, {} validation images, {} unlabeled test images'.format(NUM_TRAINING_IMAGES, NUM_VALIDATION_IMAGES, NUM_TEST_IMAGES))

Dataset: 12753 training images, 3712 validation images, 7382 unlabeled test images
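The step count that appears in the training log follows directly from these numbers. The short calculation below assumes 8 TPU replicas (an assumption, consistent with the batches of 128 printed in section 6) and also shows what the step count would be if the validation images were included, since the training call later uses the combined dataset.

```python
NUM_TRAINING_IMAGES = 12753
NUM_VALIDATION_IMAGES = 3712
NUM_REPLICAS = 8                      # assumption: 8 TPU cores
BATCH_SIZE = 16 * NUM_REPLICAS        # 128 samples per global batch

# What the notebook computes: training images only.
steps_train_only = NUM_TRAINING_IMAGES // BATCH_SIZE
# What one full pass over training + validation images would need.
steps_train_plus_val = (NUM_TRAINING_IMAGES + NUM_VALIDATION_IMAGES) // BATCH_SIZE

print("steps per epoch (train only):", steps_train_only)
print("steps for one pass over train + val:", steps_train_plus_val)
```

Because the dataset repeats indefinitely, using the smaller step count is harmless: each "epoch" simply covers a bit less than one full pass over the combined data.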

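7.1. Create the model and load it onto the TPU

# Create the model and load it onto the TPU
with strategy.scope():
    # Create the EfficientNetB7 backbone
    enet = efn.EfficientNetB7( # the B7 variant of EfficientNet
        input_shape=(512, 512, 3), # shape of the input images
        weights='imagenet', # initialize with weights pretrained on ImageNet; to start from random weights instead, pass None (the Python value, not the string 'None')
        include_top=False # include_top: whether to keep the classification head on top of the network; False drops it
    )
    
    # Build the model
    model = tf.keras.Sequential([ # Sequential: a plain linear stack of layers, the most common architecture
        enet, # EfficientNetB7 backbone
        tf.keras.layers.GlobalAveragePooling2D(), # global average pooling
        # len(CLASSES): this layer outputs a tensor with one entry per class (104)
        # activation='softmax': the outputs are the probabilities of the image belonging to each of the 104 classes; the largest one gives the predicted class
        # Softmax maps an arbitrary K-dimensional real vector to another K-dimensional real vector whose entries all lie in (0, 1) and sum to 1.
        # For single-label multi-class problems, softmax can be used as the final activation, taking the most probable class as the result
        tf.keras.layers.Dense(len(CLASSES), activation='softmax')
    ])
    
    # Compile the model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(), # optimizer: Adam, a first-order method that can replace plain stochastic gradient descent (SGD) and iteratively updates the network weights from the training data
        # Loss function:
        # for multi-class problems, use categorical crossentropy or sparse categorical crossentropy.
        # sparse_categorical_crossentropy is mathematically identical to categorical_crossentropy;
        # use categorical_crossentropy when the targets are one-hot encoded,
        # and sparse_categorical_crossentropy when the targets are integers.
        loss = 'sparse_categorical_crossentropy', 
        metrics=['sparse_categorical_accuracy'] # metric to monitor: classification accuracy
    )
    
    # Print a summary of the model
    model.summary()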
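The comments above make two claims that are easy to check numerically: softmax outputs lie in (0, 1) and sum to 1, and the sparse and one-hot forms of crossentropy give the same value. The sketch below checks both in plain Python for a toy 3-class case; the function names are written for this illustration and are not the Keras implementations.

```python
import math

def softmax(logits):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(logits)                               # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(label, probs):
    """Loss when the target is an integer class index."""
    return -math.log(probs[label])

def categorical_crossentropy(one_hot, probs):
    """Loss when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

logits = [2.0, 1.0, 0.1]      # toy scores for 3 classes
probs = softmax(logits)
label = 0                     # integer target
one_hot = [1.0, 0.0, 0.0]     # the same target, one-hot encoded

print("probabilities:", probs, "sum:", sum(probs))
print("predicted class:", probs.index(max(probs)))
print("sparse CE:", sparse_categorical_crossentropy(label, probs))
print("one-hot CE:", categorical_crossentropy(one_hot, probs))
```

The integer form is the convenient one here because the TFRecord labels are stored as integers, so no one-hot conversion is needed.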

Downloading data from https://github.com/Callidior/keras-applications/releases/download/efficientnet/efficientnet-b7_weights_tf_dim_ordering_tf_kernels_autoaugment_notop.h5
258441216/258434480 [==============================] - 4s 0us/step
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
efficientnet-b7 (Model)      (None, 16, 16, 2560)      64097680  
_________________________________________________________________
global_average_pooling2d (Gl (None, 2560)              0         
_________________________________________________________________
dense (Dense)                (None, 104)               266344    
=================================================================
Total params: 64,364,024
Trainable params: 64,053,304
Non-trainable params: 310,720
_________________________________________________________________
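The parameter counts in this summary can be reproduced by hand. The short calculation below takes the backbone's parameter count and the 2560-feature output from the summary above and works out the Dense layer and the total.

```python
features = 2560            # output size of global average pooling (from the summary)
classes = 104              # number of flower classes
backbone_params = 64_097_680   # efficientnet-b7 row of the summary

# A Dense layer has one weight per (input, output) pair plus one bias per output.
dense_params = features * classes + classes
total_params = backbone_params + dense_params

print("dense layer parameters:", dense_params)
print("total parameters:", total_params)
```

Both results match the "dense" row and the "Total params" line of the summary, and the trainable plus non-trainable counts add up to the same total.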

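Saving the full model

The whole model can be saved. What gets saved includes:

The model's architecture
The model's weights (learned during training)
The model's training configuration (what you passed to compile), if any
The optimizer and its state, if any (this lets you restart training where you left off)

model.save('the_save_model.h5') # save the full model; as placed here, before model.fit, this saves the model before it has been trained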

7.2. Train the model

# scheduler = tf.keras.callbacks.ReduceLROnPlateau(patience=3, verbose=1)
# LearningRateScheduler is a callback that sets the learning rate at the start of each epoch.
# Parameters:
# schedule: a function that takes the epoch index (an integer, starting from 0) and returns the new learning rate (a float).
# Here we use lrfn(epoch).
# verbose: int; 0 is quiet, 1 prints a message on each update.
lr_schedule = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose=1) 

# Train the model
history = model.fit(
    get_train_valid_datasets(),  # training data (training and validation files combined)
    steps_per_epoch=STEPS_PER_EPOCH, # steps per epoch
    epochs=EPOCHS,  # number of epochs
    callbacks=[lr_schedule], # callbacks
    validation_data=get_validation_dataset() # validation data
)

Train for 99 steps

Epoch 00001: LearningRateScheduler reducing learning rate to 1e-05.
Epoch 1/20
99/99 [==============================] - 412s 4s/step - loss: 4.5641 - sparse_categorical_accuracy: 0.0624 - val_loss: 4.4639 - val_sparse_categorical_accuracy: 0.1339

Epoch 00002: LearningRateScheduler reducing learning rate to 8.8e-05.
Epoch 2/20
99/99 [==============================] - 100s 1s/step - loss: 3.0131 - sparse_categorical_accuracy: 0.4089 - val_loss: 1.6291 - val_sparse_categorical_accuracy: 0.6549

Epoch 00003: LearningRateScheduler reducing learning rate to 0.000166.
Epoch 3/20
99/99 [==============================] - 100s 1s/step - loss: 1.0785 - sparse_categorical_accuracy: 0.7629 - val_loss: 0.4187 - val_sparse_categorical_accuracy: 0.9119

Epoch 00004: LearningRateScheduler reducing learning rate to 0.000244.
Epoch 4/20
99/99 [==============================] - 100s 1s/step - loss: 0.5098 - sparse_categorical_accuracy: 0.8813 - val_loss: 0.1893 - val_sparse_categorical_accuracy: 0.9577

Epoch 00005: LearningRateScheduler reducing learning rate to 0.000322.
Epoch 5/20
99/99 [==============================] - 100s 1s/step - loss: 0.3387 - sparse_categorical_accuracy: 0.9171 - val_loss: 0.0990 - val_sparse_categorical_accuracy: 0.9706

Epoch 00006: LearningRateScheduler reducing learning rate to 0.0004.
Epoch 6/20
99/99 [==============================] - 100s 1s/step - loss: 0.2712 - sparse_categorical_accuracy: 0.9316 - val_loss: 0.0653 - val_sparse_categorical_accuracy: 0.9811

Epoch 00007: LearningRateScheduler reducing learning rate to 0.000322.
Epoch 7/20
99/99 [==============================] - 100s 1s/step - loss: 0.1728 - sparse_categorical_accuracy: 0.9566 - val_loss: 0.0263 - val_sparse_categorical_accuracy: 0.9935

Epoch 00008: LearningRateScheduler reducing learning rate to 0.0002596000000000001.
Epoch 8/20
99/99 [==============================] - 100s 1s/step - loss: 0.1122 - sparse_categorical_accuracy: 0.9716 - val_loss: 0.0147 - val_sparse_categorical_accuracy: 0.9954

Epoch 00009: LearningRateScheduler reducing learning rate to 0.00020968000000000004.
Epoch 9/20
99/99 [==============================] - 100s 1s/step - loss: 0.0762 - sparse_categorical_accuracy: 0.9815 - val_loss: 0.0073 - val_sparse_categorical_accuracy: 0.9976

Epoch 00010: LearningRateScheduler reducing learning rate to 0.00016974400000000002.
Epoch 10/20
99/99 [==============================] - 100s 1s/step - loss: 0.0535 - sparse_categorical_accuracy: 0.9878 - val_loss: 0.0039 - val_sparse_categorical_accuracy: 0.9987

Epoch 00011: LearningRateScheduler reducing learning rate to 0.00013779520000000003.
Epoch 11/20
99/99 [==============================] - 100s 1s/step - loss: 0.0404 - sparse_categorical_accuracy: 0.9907 - val_loss: 0.0026 - val_sparse_categorical_accuracy: 0.9995

Epoch 00012: LearningRateScheduler reducing learning rate to 0.00011223616000000004.
Epoch 12/20
99/99 [==============================] - 101s 1s/step - loss: 0.0355 - sparse_categorical_accuracy: 0.9912 - val_loss: 0.0024 - val_sparse_categorical_accuracy: 0.9995

Epoch 00013: LearningRateScheduler reducing learning rate to 9.178892800000003e-05.
Epoch 13/20
99/99 [==============================] - 100s 1s/step - loss: 0.0292 - sparse_categorical_accuracy: 0.9936 - val_loss: 0.0023 - val_sparse_categorical_accuracy: 0.9992

Epoch 00014: LearningRateScheduler reducing learning rate to 7.543114240000003e-05.
Epoch 14/20
99/99 [==============================] - 100s 1s/step - loss: 0.0241 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.0020 - val_sparse_categorical_accuracy: 0.9997

Epoch 00015: LearningRateScheduler reducing learning rate to 6.234491392000002e-05.
Epoch 15/20
99/99 [==============================] - 100s 1s/step - loss: 0.0231 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.0012 - val_sparse_categorical_accuracy: 1.0000

Epoch 00016: LearningRateScheduler reducing learning rate to 5.1875931136000024e-05.
Epoch 16/20
99/99 [==============================] - 100s 1s/step - loss: 0.0182 - sparse_categorical_accuracy: 0.9965 - val_loss: 0.0011 - val_sparse_categorical_accuracy: 1.0000

Epoch 00017: LearningRateScheduler reducing learning rate to 4.3500744908800015e-05.
Epoch 17/20
99/99 [==============================] - 100s 1s/step - loss: 0.0182 - sparse_categorical_accuracy: 0.9959 - val_loss: 9.8715e-04 - val_sparse_categorical_accuracy: 1.0000

Epoch 00018: LearningRateScheduler reducing learning rate to 3.6800595927040014e-05.
Epoch 18/20
99/99 [==============================] - 100s 1s/step - loss: 0.0169 - sparse_categorical_accuracy: 0.9972 - val_loss: 9.7219e-04 - val_sparse_categorical_accuracy: 1.0000

Epoch 00019: LearningRateScheduler reducing learning rate to 3.1440476741632015e-05.
Epoch 19/20
99/99 [==============================] - 101s 1s/step - loss: 0.0160 - sparse_categorical_accuracy: 0.9973 - val_loss: 8.9415e-04 - val_sparse_categorical_accuracy: 1.0000

Epoch 00020: LearningRateScheduler reducing learning rate to 2.7152381393305616e-05.
Epoch 20/20
99/99 [==============================] - 100s 1s/step - loss: 0.0170 - sparse_categorical_accuracy: 0.9965 - val_loss: 8.7359e-04 - val_sparse_categorical_accuracy: 1.0000

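Epochs 1-5: the LearningRateScheduler callback adjusts the learning rate each epoch, and validation accuracy peaks at 0.9706.
The last five epochs, 16-20: the LearningRateScheduler callback keeps adjusting the learning rate, and validation accuracy stays at 1. Keep in mind that get_train_valid_datasets() also trains on the validation files, so this validation accuracy is not an independent estimate, which is consistent with the lower leaderboard score reported at the top of this post.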

7.3. Plot the loss and accuracy curves

# Plot loss and accuracy per epoch for the training and validation sets
display_training_curves(history.history['loss'], history.history['val_loss'], 'loss', 211) # loss curves
display_training_curves(history.history['sparse_categorical_accuracy'], history.history['val_sparse_categorical_accuracy'], 'accuracy', 212) # accuracy curves
# display_training_curves(history.history['loss'], history.history['loss'], 'loss', 211)
# display_training_curves(history.history['sparse_categorical_accuracy'], history.history['sparse_categorical_accuracy'], 'accuracy', 212)

7.4. Plot the confusion matrix

# Since we are splitting the dataset and iterating separately on images and labels, order matters.
cmdataset = get_validation_dataset(ordered=True)  # validation set
images_ds = cmdataset.map(lambda image, label: image)  # images only
labels_ds = cmdataset.map(lambda image, label: label).unbatch() # labels only
cm_correct_labels = next(iter(labels_ds.batch(NUM_VALIDATION_IMAGES))).numpy() # get everything as one batch
cm_probabilities = model.predict(images_ds) # probabilities of each image over the 104 classes
cm_predictions = np.argmax(cm_probabilities, axis=-1) # the class with the highest probability is the prediction
print("Correct labels: ", cm_correct_labels.shape, cm_correct_labels) # shape and values of the correct (actual) labels
print("Predicted labels: ", cm_predictions.shape, cm_predictions) # shape and values of the predicted labels

Correct   labels:  (3712,) [ 50  13  74 ... 102  48  67]
Predicted labels:  (3712,) [ 50  13  74 ... 102  48  67]

# Compute the confusion matrix
# The arguments are the actual labels and the predicted labels
cmat = confusion_matrix(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)))
# Compute the F1 score
score = f1_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average='macro')
# Compute the precision
precision = precision_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average='macro')
# Compute the recall
recall = recall_score(cm_correct_labels, cm_predictions, labels=range(len(CLASSES)), average='macro')
# Normalize each row (each actual class) so that it sums to 1
cmat = (cmat.T / cmat.sum(axis=1)).T # normalized
# Plot the confusion matrix
display_confusion_matrix(cmat, score, precision, recall)
# Print the F1 score, precision, and recall
print('f1 score: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(score, precision, recall))

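Figure 1: Not the confusion matrix from this run. This is the confusion matrix from version V1. It is included only because the final accuracy (V18) is so high that Figure 2 cannot show what a confusion matrix is good for, so here is one from a lower-accuracy run.
After predicting on the validation set:
accuracy = 40%
F1 score = 0.246
precision = 0.419
recall = 0.226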

Figure 2: The confusion matrix from this run (version V18).
After predicting on the validation set:
accuracy = 99.9%
F1 score = 1.000
precision = 1.000
recall = 1.000

f1 score: 1.000, precision: 1.000, recall: 1.000
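The V1 numbers above show accuracy (40%) well above macro F1 (0.246). That gap is expected when a model does well on a few frequent classes and poorly on many rare ones, because macro averaging weights every class equally. Below is a minimal pure-Python sketch of macro-averaged precision, recall and F1. The helper `macro_scores` and the toy labels are hypothetical, and classes with no samples are scored as 0, which to my understanding matches scikit-learn's default behaviour.

```python
def macro_scores(y_true, y_pred, num_classes):
    """Return macro-averaged (precision, recall, f1) over classes 0..num_classes-1."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(num_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    # Every class gets the same weight, regardless of how many samples it has.
    return (sum(precisions) / num_classes,
            sum(recalls) / num_classes,
            sum(f1s) / num_classes)


# Toy data: the model always predicts the majority class 0.
toy_true = [0, 0, 0, 0, 1, 2]
toy_pred = [0, 0, 0, 0, 0, 0]
toy_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
toy_precision, toy_recall, toy_f1 = macro_scores(toy_true, toy_pred, 3)
print('accuracy: {:.3f}'.format(toy_accuracy))
print('f1 score: {:.3f}, precision: {:.3f}, recall: {:.3f}'.format(
    toy_f1, toy_precision, toy_recall))
```

In this toy case accuracy is about 0.67 while macro F1 is about 0.27, the same pattern as in the V1 figures.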

8. Prediction

# Order matters here because we split the dataset and iterate over images and ids separately.
test_ds = get_test_dataset(ordered=True)  # test set

# Predict on the test set
print('Computing predictions...')
test_images_ds = test_ds.map(lambda image, idnum: image)  # test images only
probabilities = model.predict(test_images_ds)  # per-image probabilities over the 104 classes
predictions = np.argmax(probabilities, axis=-1)  # the class with the highest probability is the prediction
print(predictions)  # print the predicted classes

# Build the submission file
print('Generating submission.csv file...')
test_ids_ds = test_ds.map(lambda image, idnum: idnum).unbatch()  # test ids only
test_ids = next(iter(test_ids_ds.batch(NUM_TEST_IMAGES))).numpy().astype('U')  # all ids in one batch, converted to unicode strings

# Option 1: write the file without pandas
# np.savetxt('submission.csv', np.rec.fromarrays([test_ids, predictions]), fmt=['%s', '%d'], delimiter=',', header='id,label', comments='')
# Option 2: write the file with pandas
import pandas as pd
test = pd.DataFrame({"id":test_ids,"label":predictions})  # DataFrame with an id column and a label column
print(test.head)  # without parentheses this prints the bound method, whose repr shows the (truncated) whole frame; test.head() would print only the first rows
test.to_csv("submission.csv",index = False)  # write submission.csv without the index column, ready to submit

Computing predictions...
[ 67  28  83 ...  86 102  62]
Generating submission.csv file...
<bound method NDFrame.head of              id  label
0     252d840db     67
1     1c4736dea     28
2     c37a6f3e9     83
3     00e4f514e    103
4     59d1b6146     70
...         ...    ...
7377  c785abe6f      7
7378  9b9c0e574     68
7379  e46998f4d     86
7380  523df966b    102
7381  e86e2a592     62

[7382 rows x 2 columns]>
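Before uploading, it can help to check the submission for the things that most often go wrong: mismatched lengths, duplicate ids, and labels outside the 104 classes. Here is a minimal sketch using only the standard library. The helper `write_submission` is hypothetical, and the sample rows are the first five rows of the output printed above.

```python
import csv
import io

NUM_CLASSES = 104  # number of flower classes in this competition


def write_submission(ids, labels, out):
    """Validate ids and labels, then write them as CSV with an id,label header."""
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate ids found")
    for label in labels:
        if not 0 <= int(label) < NUM_CLASSES:
            raise ValueError("label out of range: {}".format(label))
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "label"])
    for image_id, label in zip(ids, labels):
        writer.writerow([image_id, int(label)])


# First five rows from the printed output above.
sample_ids = ["252d840db", "1c4736dea", "c37a6f3e9", "00e4f514e", "59d1b6146"]
sample_labels = [67, 28, 83, 103, 70]

buffer = io.StringIO()
write_submission(sample_ids, sample_labels, buffer)
print(buffer.getvalue())
```

To write a real file, pass an open file object instead of the `StringIO` buffer, for example `open("submission.csv", "w", newline="")`.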

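9. A visual check of the predictions

Why use the validation set for the visual check?

The model was trained on the training set, and the validation and test sets are both separate from it. Of the two, only the validation set comes with ground-truth labels, which makes it easy to see whether each prediction is right.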

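dataset = get_validation_dataset()  # load the validation set
dataset = dataset.unbatch().batch(20)  # regroup the validation set into batches of 20
batch = iter(dataset)  # turn the dataset into an iterator

# Run this cell again to get the next set of images
images, labels = next(batch)  # next batch of the validation set
probabilities = model.predict(images)  # per-image probabilities over the 104 classes
predictions = np.argmax(probabilities, axis=-1)  # the class with the highest probability is the prediction
display_batch_of_images((images, labels), predictions)  # show one batch of images; each title is the predicted label plus whether it is correct (OK or NO)
# Example: the title "wild rose (NO->watercress)" means the image is actually watercress but was predicted as wild rose, so the prediction is wrong.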
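When a batch is mostly correct, it is quicker to list the wrong predictions than to scan every image title. Here is a minimal sketch that reports mismatches in the same "predicted (NO->actual)" form as the titles. The helper `list_mismatches` and the three-class toy list are hypothetical; in the notebook you would pass `labels.numpy()`, `predictions` and the real `CLASSES` list.

```python
def list_mismatches(labels, predictions, class_names):
    """Return (index, predicted_name, actual_name) for every wrong prediction."""
    mismatches = []
    for index, (label, prediction) in enumerate(zip(labels, predictions)):
        if label != prediction:
            mismatches.append((index, class_names[prediction], class_names[label]))
    return mismatches


# Toy stand-in for CLASSES; the real list has 104 entries in a different order.
toy_classes = ["watercress", "wild rose", "daisy"]
toy_labels = [0, 1, 2, 0]        # ground-truth class indices
toy_predictions = [1, 1, 2, 0]   # predicted class indices

toy_mismatches = list_mismatches(toy_labels, toy_predictions, toy_classes)
for index, predicted, actual in toy_mismatches:
    print("image {}: {} (NO->{})".format(index, predicted, actual))
print("{} of {} correct".format(len(toy_labels) - len(toy_mismatches), len(toy_labels)))
```

Here only image 0 is reported: it is watercress but was predicted as wild rose.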

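Figure 1: A sample of predicted validation images, not from this run. These come from version V1 and are included only because the final accuracy (V18) is so high that Figure 2 shows no failed predictions.
After predicting on the validation set:
accuracy = 40%
F1 score = 0.246
precision = 0.419
recall = 0.226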

Figure 2: A sample of predicted validation images from this run (version V18).
After predicting on the validation set:
accuracy = 99.9%
F1 score = 1.000
precision = 1.000
recall = 1.000

