Python processes: debugging the error "you are not using fork to start your child processes"

Reposted article. 2021-07-30 00:24. Category: Programmer Life / Python

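I hit an error while running the code below. This is a record of how I debugged it; I found it helpful to have a line of reasoning to follow.

1. The error and the fix

File name: ClassifierTest.py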

import torch
import torchvision
import torchvision.transforms as transforms
from torchTest import imgShow

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='Resources/CIFAR10',  # storage path (relative); '/Resources/CIFAR10' would be absolute, i.e. C:\Resources\CIFAR10
                                        train=True, download=True,  # use the training split; download it if missing
                                        transform=transform)  # image transform
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='Resources/CIFAR10', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

dataIter = iter(trainloader)
images, labels = next(dataIter)  # built-in next() works on any iterator; newer PyTorch releases no longer provide .next()
imgShow.imshow(torchvision.utils.make_grid(images))
print(' '.join(classes[labels[j]] for j in range(4)))

The error:

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

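This error comes from the way the DataLoader starts its workers. Changing to num_workers=0 makes it disappear, of course, but should a good programmer stop there? No.

I searched Baidu for the error and found the following fix, which works: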

def main():
    transform = transforms.Compose([transforms.ToTensor(),
                                    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    trainset = torchvision.datasets.CIFAR10(root='Resources/CIFAR10',  # storage path (relative); '/Resources/CIFAR10' would be absolute, i.e. C:\Resources\CIFAR10
                                            train=True, download=True,  # use the training split; download it if missing
                                            transform=transform)  # image transform
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
    testset = torchvision.datasets.CIFAR10(root='Resources/CIFAR10', train=False, download=True, transform=transform)
    testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

    dataIter = iter(trainloader)
    images, labels = next(dataIter)  # built-in next() works on any iterator; newer PyTorch releases no longer provide .next()
    imgShow.imshow(torchvision.utils.make_grid(images))
    print(' '.join(classes[labels[j]] for j in range(4)))

if __name__ == '__main__':  # without this guard the error is raised
    main()

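2. Why main?

With everything moved into main(), the code is safe. Why?

I am still new to Python, and I could not see what was so special about the __name__ == '__main__' check.

About the __name__ attribute:

When a file is run as the startup script, its module's __name__ is '__main__'.

The check is mainly useful when a module is imported and you do not want some of its statements to run: the module's name tells the startup script apart from an imported module.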
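To make the __name__ behaviour concrete, here is a minimal sketch that loads the same file twice: once as if it were the startup script and once as an ordinary import. The file name demo_module.py and its one-line content are hypothetical, chosen only for this illustration.

```python
import importlib.util
import pathlib
import runpy
import tempfile

# Hypothetical one-line module that records the name it was loaded under.
SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "demo_module.py"
    path.write_text(SOURCE)

    # Case 1: execute the file the way a startup script is executed.
    name_as_script = runpy.run_path(str(path), run_name="__main__")["seen_name"]

    # Case 2: import the very same file as an ordinary module.
    spec = importlib.util.spec_from_file_location("demo_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    name_as_import = module.seen_name

print("run as script:", name_as_script)   # run as script: __main__
print("imported:", name_as_import)        # imported: demo_module
```

Anything placed under if __name__ == '__main__': would therefore run in the first case only.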

The error is raised here:

C:\Users\13723\AppData\Local\Programs\Python\Python39\Lib\multiprocessing\spawn.py

def _check_not_importing_main():
    if getattr(process.current_process(), '_inheriting', False):
        raise RuntimeError('''
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase...''')

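getattr(object, attribute_name, default)

If the object has the attribute, getattr returns its value; otherwise it returns the default. With no default given, a missing attribute raises AttributeError.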
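The three cases of getattr can be checked with a small sketch. FakeProcess is a hypothetical stand-in invented for this example; it is not the real multiprocessing process class.

```python
class FakeProcess:
    """Hypothetical stand-in for a process object (not the real class)."""


proc = FakeProcess()

# Attribute missing, default supplied: the default comes back.
before = getattr(proc, "_inheriting", False)

# Attribute present: its actual value comes back, whatever that value is.
proc._inheriting = True
after = getattr(proc, "_inheriting", False)

# Attribute missing, no default supplied: AttributeError is raised.
try:
    getattr(proc, "no_such_attribute")
    raised = False
except AttributeError:
    raised = True

print(before, after, raised)  # False True True
```

So the check in _check_not_importing_main only fires when _inheriting exists on the current process and is truthy.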

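_inheriting: does it look up whether the current process can be inherited? I have never used it, so I cannot say.

I could not make sense of it yet (it is explained further down), so I went by the function name instead. What is this check for?

It checks that the main module is not in the middle of being imported: a process that is still importing the startup script is not allowed to start new processes.

I ran ClassifierTest.py (process pid1). When it reached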

dataIter = iter(trainloader)

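another module imported ClassifierTest.py a second time (this time in process pid2).

Once the __name__ == '__main__' check is added, the module no longer falls into an endless loop of re-execution.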
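The re-import can be observed without PyTorch. The sketch below writes a tiny, hypothetical script (spawn_demo.py) that prints its __name__ at top level and starts one child with the "spawn" start method, which is the method Windows uses. It then runs that script in a separate interpreter and collects the printed names. On a typical CPython install the top-level line should appear twice, because the child executes the script's top level again under a different module name.

```python
import pathlib
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical startup script: one top-level print plus a guarded spawn.
SCRIPT = textwrap.dedent('''
    import multiprocessing as mp

    print("top level ran as", __name__, flush=True)

    def work():
        pass

    if __name__ == "__main__":
        mp.set_start_method("spawn")  # the start method Windows uses
        child = mp.Process(target=work)
        child.start()
        child.join()
''')

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "spawn_demo.py"
    path.write_text(SCRIPT)
    # Run the script in its own interpreter and capture everything it prints.
    result = subprocess.run([sys.executable, str(path)],
                            capture_output=True, text=True, check=True)

# Keep only the module names reported by the top-level print.
names = [line.split()[-1] for line in result.stdout.splitlines()
         if line.startswith("top level ran as")]
print(names)
```

The guard is what keeps the child from reaching Process(...).start() during that second pass; without it, the child would try to start a process of its own while still bootstrapping, which is exactly what the RuntimeError reports.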

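3. Why is there an extra process?

3.1 What I observed

Why is there an extra process? With num_workers=2, do we get one process with two worker threads, or two worker processes?

It struck me as odd: why start something as heavyweight as a process rather than a thread?

It may be called a process, but each one presumably does just one job; a process is, after all, heavier than a thread.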

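3.2 Threads and processes

At this point I got stuck on the difference between threads and processes.

(Reference: https://www.zhihu.com/question/25532384)

A thread is the unit of CPU execution: the slice of time in which the CPU actually runs code. A process holds the context, and the CPU switches into the process and loads that context (registers, instructions, and so on).

Seen this way, if a process is a warehouse, the threads are robots inside it, waiting for the CPU to bring them to life. But working inside one warehouse is surely less trouble than working across several.

So why start multiple processes?

On a sudden hunch, I looked up fork().

(Reference: https://www.cnblogs.com/liyuan989/p/4279210.html)

The process/thread split is how Windows is usually described, whereas traditional Unix was built around processes alone (threads were added later).

In Windows, the process is the smallest unit of resource management and the thread is the smallest unit of program execution.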

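A child process created by fork is almost, but not exactly, identical to its parent.

The child gets an identical (but separate) copy of the parent's user-level virtual address space, including the text, data, and bss segments, the heap, and the user stack.

The child also gets identical copies of any file descriptors the parent has open, which means the child can read and write any file that was open in the parent. The most visible difference between parent and child is that they have different PIDs.

In short, fork means to branch: it branches off a child process that is almost the same as the parent. The two have different pids, and their references hold the same virtual addresses, each resolving within that process's own copy of the memory.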
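The "identical but separate copy" can be seen in a short sketch, assuming a POSIX system (os.fork does not exist on Windows, which is the whole reason for the error in this article). The names data and fork_demo are made up for the example. The child appends to its copy of a list and reports back through a pipe; the parent's list stays as it was.

```python
import os

data = ["written by parent"]


def fork_demo():
    """Fork once; return (parent_pid, child_pid, list_length_seen_by_child)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: starts with a copy of `data` and changes only that copy.
        try:
            os.close(read_fd)
            data.append("written by child")
            os.write(write_fd, f"{os.getpid()} {len(data)}".encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # leave at once, skipping the rest of the script
    # Parent: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_pid, child_len = map(int, reader.read().split())
    os.waitpid(pid, 0)
    return os.getpid(), child_pid, child_len


# os.fork is available on POSIX systems only.
result = fork_demo() if hasattr(os, "fork") else None
if result is not None:
    parent_pid, child_pid, child_len = result
    print("different pids:", parent_pid != child_pid)
    print("list length in child:", child_len, "| in parent:", len(data))
```

Because a forked child already holds a copy of everything the parent had loaded, it never needs to import the startup script again; a spawned child starts from a fresh interpreter and does.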

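That said, Python does ship both a threading module and a multiprocessing module, so why are processes the better choice here?

(Reference: https://zhuanlan.zhihu.com/p/20953544)

Each CPython process has one global lock, the GIL (Global Interpreter Lock), which exists to keep the interpreter's data safe.

A thread must acquire the GIL before it runs; it executes code until it sleeps, blocks, or is made to yield, and then releases the GIL.

So multithreading only handles multiple tasks at the macroscopic timescale; at the microscopic timescale the Python code still runs one thread at a time.

Each process has its own independent GIL and they do not interfere with one another, so only multiple processes achieve true parallel execution (a multi-core CPU doing several tasks at once, with the program running simultaneously at the microscopic timescale).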
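One way to see why threads end up sharing a single GIL is that they all live inside the same process. The sketch below (the helper name record_pid is made up) starts four threads and has each record the pid it runs under; every one of them reports the pid of the parent interpreter.

```python
import os
import threading

seen_pids = []


def record_pid():
    # Each thread reports the process it is running in.
    seen_pids.append(os.getpid())


threads = [threading.Thread(target=record_pid) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Four threads, one process, hence one interpreter and one GIL.
print(set(seen_pids) == {os.getpid()})  # True
```

Worker processes, by contrast, each get their own pid and their own interpreter, which is what lets them run Python code at the same time.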

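3.3 In Python, the workers are processes

Why is ClassifierTest.py read a second time? According to the call stack, it happens here:

(Note: the screenshots below may come from different debugging sessions, so the parent pid varies.)

The call exec(code, run_globals) is what imports ClassifierTest.py again.

One step further back the IDE reports frame not available, so spawn_main is the earliest function it can show.

(spawn presumably means to hatch, as in hatching processes; fitting for a snake laying eggs.)

The earlier calls are gone. My guess was that the new process calls spawn_main directly, so I looked for references to spawn_main.

(Maybe I have not yet found the right way to use PyCharm; I located the references to spawn_main with a global search in Notepad++.)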

Python39\Lib\multiprocessing\popen_spawn_win32.py
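The guess that the new process calls spawn_main directly can be checked from Python itself. The sketch below uses get_command_line from multiprocessing.spawn, an internal CPython helper that may change between versions, to print the command a spawned child is launched with. The values 1234 and 5678 are made up for illustration; the real ones are chosen at run time.

```python
from multiprocessing import spawn

# Made-up values for illustration only; real values are picked at run time.
command = spawn.get_command_line(parent_pid=1234, pipe_handle=5678)

# Print one command-line element per line.
for part in command:
    print(part)
```

One element of the printed command is a short program passed with -c that imports spawn_main and calls it, which is why the debugger has no earlier frame to show: spawn_main is the first thing the new interpreter runs.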

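Tying this back to the earlier question:

While looking through the stack, I happened to see where _inheriting is assigned:

The stack shows the assignment to _inheriting, and its meaning becomes clear: it marks whether this is a child process that is still starting up. Here it is set to True.

Also, inheriting ends in -ing, which signals an ongoing state; if it were about inheritability it would be called inherited. Whoever wrote this was careful with naming, and I should pay the same attention in my own code.

3.4 The result of num_workers=2

With num_workers = 2 as set earlier, the parent process has two child processes.

The __name__ == '__main__' check stops each child, as it re-imports ClassifierTest.py, from spawning children of its own, and those from spawning theirs, without end.

The main process, 12012, has two workers: 15480 and 7036.

(These numbers are pids assigned by the system to tell processes apart; they differ on every run.)

15480 and 7036 each come with their own Queue; dataloader.py sets this up.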

dataIter = iter(trainloader)
images, labels = next(dataIter)  # built-in next() works on any iterator; newer PyTorch releases no longer provide .next()

When next() runs, the program reads from the _data_queue held by the dataIter object; the data in it is put there by the two child processes.

data = self._data_queue.get(timeout=timeout)

For the actual implementation, see this class:

C:\Users\13723\PycharmProjects\pythonProject\venv\Lib\site-packages\torch\utils\data\dataloader.py

class _MultiProcessingDataLoaderIter(_BaseDataLoaderIter):
    pass
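The hand-off between the workers and next() can be sketched in a few lines. This is not PyTorch's code: to keep it runnable anywhere it uses threads and queue.Queue as a stand-in for worker processes and multiprocessing queues, and every name in it (worker_loop, index_queues, data_queue, the fake sample values) is an assumption made for illustration. It mirrors the shape described above: each worker has its own queue of work to pick up, and both put finished batches on one shared queue that the consumer reads with a timeout.

```python
import queue
import threading


def worker_loop(worker_id, index_queue, data_queue):
    """Take batch indices from this worker's own queue, put batches on the shared one."""
    while True:
        batch_index = index_queue.get()
        if batch_index is None:  # sentinel: no more work
            break
        samples = [batch_index * 4 + i for i in range(4)]  # fake batch of 4 samples
        data_queue.put((batch_index, worker_id, samples))


data_queue = queue.Queue()                          # shared by both workers
index_queues = [queue.Queue() for _ in range(2)]    # one per worker
workers = [threading.Thread(target=worker_loop, args=(i, q, data_queue))
           for i, q in enumerate(index_queues)]
for worker in workers:
    worker.start()

# Hand out four batches, alternating between the two workers.
for batch_index in range(4):
    index_queues[batch_index % 2].put(batch_index)

# The consumer side: the same call shape as _data_queue.get(timeout=timeout).
received = [data_queue.get(timeout=5) for _ in range(4)]

# Tell both workers to stop, then wait for them.
for q in index_queues:
    q.put(None)
for worker in workers:
    worker.join()

batches = {batch_index: samples for batch_index, _, samples in received}
print(sorted(batches))
```

Batches may arrive out of order, since whichever worker finishes first puts its result first; the real iterator has to deal with that as well.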

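4. Closing remarks

A small error turned out to be a good way to find and fill gaps in my knowledge, and good exercise for my reasoning too. Let's keep learning together.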
