subprocess.Popen and the problem of child process exit

Reposted article. 2020-07-30 16:20

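Recently a project of mine had this requirement: the front end sends a command that executes a script passed along with it (shell or Python), and the back end returns the script's standard output and standard error. If the script has not finished within a configured time, it counts as timed out and its execution is terminated. The natural first choice for implementing this is the subprocess library.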

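So a back-end script uses Python's subprocess to execute the script it receives. Under normal circumstances subprocess works well: it runs the script and returns. The initial implementation looked like this:
run_cmd.py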

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import subprocess
    from threading import Timer
    import os


    class test(object):
        def __init__(self):
            self.stdout = []
            self.stderr = []
            self.timeout = 10
            self.is_timeout = False

        def timeout_callback(self, p):
            print 'exe time out call back'
            print p.pid
            try:
                p.kill()
            except Exception as error:
                print error

        def run(self):
            cmd = ['bash', '/home/XXXX/test.sh']
            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            my_timer = Timer(self.timeout, self.timeout_callback, [p])
            my_timer.start()
            try:
                print "start to count timeout; timeout set to be %d \n" % (self.timeout,)
                stdout, stderr = p.communicate()
                exit_code = p.returncode
                print exit_code
                print type(stdout), type(stderr)
                print stdout
                print stderr
            finally:
                my_timer.cancel()

But then I happened to test a shell script containing the line ping www.baidu.com &. The script is:
test.sh

    #!/bin/bash
    ping www.baidu.com &   # the result is the same with or without the trailing &
    echo $$

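Python (the parent process) uses subprocess.Popen to create a new process (the child) that runs a shell, and the shell starts another process (the grandchild) to run ping www.baidu.com. The grandchild never exits on its own, so it behaves like a daemon. When the timeout fires, the parent kills the shell child, yet the parent stays blocked inside p.communicate(), at a point before wait() is called.

If you are interested, read the source of _communicate; on Linux the relevant functions are _communicate_with_poll and _communicate_with_select. The parent is stuck in their while loop, which keeps reading until every pipe reaches end-of-file. The child's file descriptors are inherited from the parent, and the grandchild in turn inherits them from the child, so the grandchild holds its own copy of the write end of the stdout and stderr pipes and keeps writing to them. Killing the shell child therefore does not close the pipes: as long as the grandchild is alive, the parent never sees end-of-file and the loop condition stays true.

Because the loop never ends, wait() is never called, the killed child is never reaped, and it becomes a zombie process.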
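The effect can be reproduced without ping. The following is a minimal Python 3 sketch, not the code from this project: a made-up helper, time_communicate, runs a shell that puts sleep 2 in the background and exits at once. The shell is gone almost immediately, but communicate() should only return once sleep, which inherited the pipes, has exited.

```python
import subprocess
import time


def time_communicate(shell_command):
    """Run shell_command via sh; return (stdout, seconds spent in communicate)."""
    proc = subprocess.Popen(
        ["sh", "-c", shell_command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    start = time.monotonic()
    stdout, _ = proc.communicate()
    return stdout, time.monotonic() - start


if __name__ == "__main__":
    # The shell exits right after echo, but the background sleep keeps the
    # write ends of both pipes open for about two seconds.
    out, elapsed = time_communicate("sleep 2 & echo shell done")
    print(out, round(elapsed, 1))
```

Replacing sleep 2 with a command that never exits gives the hang described above.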

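A complete solution has to do both things: capture the output of the process started by subprocess.Popen, and on timeout kill the child processes so that the main process is no longer blocked.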

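At first I was in a hurry and had never used subprocess.Popen in any depth, so I tried a crude approach: instead of calling subprocess.Popen.communicate() to collect the output, read the pipes directly and stop reading once the timeout has fired. The code:
run_cmd.py, first revision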

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import subprocess
    from threading import Timer
    import os


    class test(object):
        def __init__(self):
            self.stdout = []
            self.stderr = []
            self.timeout = 10
            self.is_timeout = False

        def timeout_callback(self, p):
            self.is_timeout = True
            print "time out"

        def run(self):
            cmd = ['bash', '/home/zhangxin/work/baofabu/while_test.sh']
            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            my_timer = Timer(self.timeout, self.timeout_callback, [p])
            my_timer.start()
            try:
                print "start to count timeout; timeout set to be %d \n" % (self.timeout,)
                # readline() blocks, so the flag is only checked after each line arrives
                for line in iter(p.stdout.readline, b''):
                    print line
                    if self.is_timeout:
                        break
                for line in iter(p.stderr.readline, b''):
                    print line
                    if self.is_timeout:
                        break
            finally:
                my_timer.cancel()
                p.stdout.close()
                p.stderr.close()
                p.kill()
                p.wait()

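This does capture the output and no longer blocks after the timeout. Looking back after writing it, though, I realized that in the very first version it would have been enough to add p.stdout.close() and p.stderr.close() to the timeout callback: p.communicate() then stops blocking, and the immediate problem is solved. A latent problem remains, however. Nothing reads from the pipes any more, while the daemon-like grandchild keeps running and writing. Depending on whether a read end is still open somewhere, its writes either fill the pipe and block, or fail with SIGPIPE.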

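So this approach does not really achieve anything: the grandchild is never terminated, the pipes are merely closed. During the holiday I searched around and read the subprocess documentation carefully, and found that subprocess.Popen takes a preexec_fn parameter. Pass preexec_fn=os.setsid or preexec_fn=os.setpgrp when calling subprocess.Popen, and on timeout call os.killpg(p.pid, signal.SIGKILL); this kills the child together with every other process in the same process group. So change the subprocess.Popen call in run to: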

    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, preexec_fn=os.setsid)

and change timeout_callback as follows (this also needs import signal at the top of the file):

    def timeout_callback(self, p):
        self.is_timeout = True
        print 'exe time out call back'
        print p.pid
        try:
            os.killpg(p.pid, signal.SIGKILL)
        except Exception as error:
            print error
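For readers on Python 3, the same idea can be sketched more compactly. This is an illustration under assumptions, not the project's code: run_with_timeout is a made-up name, start_new_session=True is used as the Python 3 counterpart of preexec_fn=os.setsid, and communicate(timeout=...) replaces the Timer thread.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd in its own session; on timeout kill its whole process group.

    Returns (returncode, stdout, stderr, timed_out).
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # the child calls setsid(), so pgid == proc.pid
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
        return proc.returncode, stdout, stderr, False
    except subprocess.TimeoutExpired:
        try:
            # Kill the child and every descendant still in its process group.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the whole group is already gone
        # With all writers dead the pipes reach EOF, so this returns promptly.
        stdout, stderr = proc.communicate()
        return proc.returncode, stdout, stderr, True


if __name__ == "__main__":
    print(run_with_timeout(["sh", "-c", "sleep 30 & echo started; wait"], timeout=1))
```

As with the original approach, a descendant that calls setsid itself leaves the group and is not reached by killpg.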

On what preexec_fn=os.setsid does, the following is quoted from https://blog.tonyseek.com/post/kill-the-descendants-of-subprocess/

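After the program forks, the resulting child has its own process space and pid, which puts it out of our reach. Fortunately subprocess.Popen has a preexec_fn parameter: it accepts a callback and runs it in the gap after fork and before exec. We can use this to modify the child before it starts, for example by calling setsid() to establish an independent process group.

A Linux process group is a collection of processes. A process can call the setsid system call to create a new process group and make itself the leader. All descendants of the leader become members of that group, unless they call setsid themselves to form a group of their own. As long as one process in the group is still alive, the group continues to exist, even if the leader has already died.

What makes this useful is that once we know the leader's pid (which is also the group's pgid), we can send a signal to the whole group and every process in it receives it. So we can use preexec_fn to have Popen set up its own process group, then send SIGTERM or SIGKILL to that group to terminate all descendants of the process started by subprocess.Popen. The precondition, of course, is that none of those descendants has called setsid to break away and set up on its own.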

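As for the difference between setsid and setpgrp: I read both man pages and still do not fully understand it. If anyone knows and is willing to share in a comment, I would be very grateful!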
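For what it is worth, the man pages describe setsid() as creating a new session and a new process group, while setpgrp() creates only a new process group inside the current session. One way to observe this is the Python 3 sketch below; the helper name ids_after and the child one-liner are made up for illustration.

```python
import os
import subprocess
import sys

# The child prints its own pid, process group id and session id.
CHILD = "import os; print(os.getpid(), os.getpgrp(), os.getsid(0))"


def ids_after(preexec_fn):
    """Start a child that runs preexec_fn before exec; return (pid, pgid, sid)."""
    out = subprocess.check_output([sys.executable, "-c", CHILD], preexec_fn=preexec_fn)
    pid, pgid, sid = (int(field) for field in out.split())
    return pid, pgid, sid


if __name__ == "__main__":
    print("parent session:", os.getsid(0))
    print("after setpgrp :", ids_after(os.setpgrp))
    print("after setsid  :", ids_after(os.setsid))
```

In both cases the child becomes the leader of a new process group, which is all that os.killpg needs. Only setsid also moves it into a new session, detached from the controlling terminal.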

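subprocess.Popen can only run a command or a script; unlike threading.Thread, it cannot run a function. So how can subprocess.Popen be used in a thread-like way when all you have is a single .py file? In the .py file that calls subprocess.Popen, write the script to be executed into a temporary file (the counterpart of a thread's target function) and run that temporary script with subprocess.Popen. That way no additional script files have to exist in advance, as in the following example: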

    import os
    import signal
    import subprocess
    import tempfile
    import time
    import sys


    def show_setting_prgrp():
        print('Calling os.setpgrp() from {}'.format(os.getpid()))
        os.setpgrp()
        print('Process {} is now in process group {}'.format(
            os.getpid(), os.getpgrp()))
        sys.stdout.flush()


    # This is the part to focus on: the script lives in a temporary file.
    # signal_child.py is a separate script that is not shown in this article.
    script = '''#!/bin/sh
    echo "Shell script in process $$"
    set -x
    python3 signal_child.py
    '''
    script_file = tempfile.NamedTemporaryFile('wt')
    script_file.write(script)
    script_file.flush()

    proc = subprocess.Popen(
        ['sh', script_file.name],
        preexec_fn=show_setting_prgrp,
    )
    print('PARENT : Pausing before signaling {}...'.format(proc.pid))
    sys.stdout.flush()
    time.sleep(1)
    print('PARENT : Signaling process group {}'.format(proc.pid))
    sys.stdout.flush()
    os.killpg(proc.pid, signal.SIGUSR1)
    time.sleep(3)

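Alternatively, use exec inside the shell script to run the command. The command then replaces the shell process, so there is only a parent and a child, and no grandchild.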
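A small Python 3 sketch of the exec variant, with a made-up helper kill_after: because the shell execs sleep, p.pid refers to sleep itself, so a plain kill() closes the pipe and communicate() returns straight away.

```python
import subprocess
import time


def kill_after(shell_command, delay):
    """Start shell_command via sh, kill the child after delay seconds.

    Returns (stdout, seconds communicate() needed after the kill).
    """
    proc = subprocess.Popen(["sh", "-c", shell_command], stdout=subprocess.PIPE)
    time.sleep(delay)  # give the shell time to reach the exec
    proc.kill()
    start = time.monotonic()
    stdout, _ = proc.communicate()
    return stdout, time.monotonic() - start


if __name__ == "__main__":
    # exec replaces the shell with sleep: same pid, no grandchild.
    print(kill_after("echo start; exec sleep 30", delay=0.5))
```

This only helps when the script ends in a single long-running command; a script that starts several background jobs still needs the process-group approach.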

As for the blocking problem, another option is to redirect the output of subprocess.Popen to files.

    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    import subprocess
    from threading import Timer
    import os
    import time
    import signal


    class test(object):
        def __init__(self):
            self.stdout = []
            self.stderr = []
            self.timeout = 6
            self.is_timeout = False

        def timeout_callback(self, p):
            print 'exe time out call back'
            try:
                p.kill()
                # os.killpg(p.pid, signal.SIGKILL)
            except Exception as error:
                print error

        def run(self):
            stdout = open('/tmp/subprocess_stdout', 'wb')
            stderr = open('/tmp/subprocess_stderr', 'wb')
            cmd = ['bash', '/home/xxx/while_test.sh']
            p = subprocess.Popen(cmd, stdout=stdout.fileno(), stderr=stderr.fileno())
            my_timer = Timer(self.timeout, self.timeout_callback, [p])
            my_timer.start()
            print p.pid
            try:
                print "start to count timeout; timeout set to be %d \n" % (self.timeout,)
                p.wait()
            finally:
                my_timer.cancel()
                stdout.flush()
                stderr.flush()
                stdout.close()
                stderr.close()

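A final note on p = subprocess.Popen(...): when the output goes to a PIPE, prefer p.communicate() to calling p.wait() directly. With p.wait(), the child may fill the pipe before it has finished; the child then blocks on its write while the parent is blocked in wait(), and neither makes progress. There is also no need to call p.wait() alongside p.communicate(), because p.communicate() calls wait() internally.
On Linux, p.wait() ultimately calls os.waitpid(). In our own code it is likewise better to use os.waitpid() than os.wait(): waitpid() targets one specific pid, whereas os.wait() waits for any child, so calling it several times in a program can easily block on a child that is still running. See the documentation of the wait function for details.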
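The wait() deadlock is easy to provoke. In this Python 3 sketch (the helper name and the 4 MiB payload are arbitrary choices for illustration) the child writes far more than a pipe buffer holds, so wait() cannot return until someone reads. The timeout exists only so that the demonstration does not hang; communicate() then drains the pipe and lets the child finish.

```python
import subprocess
import sys

PAYLOAD_SIZE = 4 * 1024 * 1024  # far larger than a typical pipe buffer
CHILD = "import sys; sys.stdout.write('x' * %d)" % PAYLOAD_SIZE


def wait_then_communicate(timeout):
    """Return (did wait() time out, number of bytes finally read)."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    try:
        # Nobody reads the pipe here, so the child blocks in write().
        proc.wait(timeout=timeout)
        wait_blocked = False
    except subprocess.TimeoutExpired:
        wait_blocked = True
    # communicate() reads until EOF and then reaps the child.
    stdout, _ = proc.communicate()
    return wait_blocked, len(stdout)


if __name__ == "__main__":
    print(wait_then_communicate(timeout=2))
```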

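The root cause of the blocking is still a full pipe, so when using PIPE it is best to combine it with select or poll to avoid blocking on reads and writes. A pipe is an object created by the pipe system call (os.pipe in Python). It behaves like a file, except that it has two file descriptors, one for reading and one for writing; once both ends have been closed, the kernel reclaims it automatically. You can think of it as a queue that the kernel keeps in memory, written at one end and read at the other.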
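As a rough Python 3 sketch of the select/poll idea, the standard selectors module can watch stdout and stderr together, so neither pipe fills up while the other is being read. The function name read_both_pipes is made up for illustration; this is roughly what communicate() does internally, not a replacement for it.

```python
import os
import selectors
import subprocess
import sys


def read_both_pipes(cmd):
    """Run cmd and collect stdout and stderr as the data becomes available."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {"stdout": [], "stderr": []}
    with selectors.DefaultSelector() as sel:
        sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
        sel.register(proc.stderr, selectors.EVENT_READ, "stderr")
        while sel.get_map():  # loop until both pipes have reached EOF
            for key, _events in sel.select():
                data = os.read(key.fd, 65536)
                if data:
                    chunks[key.data].append(data)
                else:
                    # An empty read means every write end has been closed.
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
    returncode = proc.wait()
    return returncode, b"".join(chunks["stdout"]), b"".join(chunks["stderr"])


if __name__ == "__main__":
    child = "import sys; sys.stdout.write('o' * 200000); sys.stderr.write('e' * 200000)"
    rc, out, err = read_both_pipes([sys.executable, "-c", child])
    print(rc, len(out), len(err))
```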

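Pipes are widely used for inter-process communication (IPC), and shell commands rely on them heavily. For example:
ps -aux | grep mysqld
This command fetches information about the mysqld process; ps and grep communicate through a pipe. Pipes have several characteristics: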

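1. A pipe is half-duplex: data flows in one direction only. Here the output of ps is the input of grep.
2. An anonymous pipe can only be used between related processes, such as parent and child or siblings. Here ps and grep can both be regarded as children of the shell (bash/pdksh/ash/dash), so the two are siblings.
3. To the processes at its two ends, a pipe is simply a file, one that exists only in memory.
4. The writer keeps appending to the tail of the pipe, and the reader keeps consuming from its head.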
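At this point you may wonder: with the writing process writing continuously and the reading process reading continuously, when does it all end? The command above finished quickly, for instance, so how does that work? Two basic rules apply to pipes:
1. When reading from a pipe whose write end has been closed, read returns 0 once all data has been consumed, indicating end-of-file.
2. When writing to a pipe whose read end has been closed, the writer receives a SIGPIPE signal.
Applied to the example: when ps finishes writing and exits, its end of the pipe is closed automatically; grep's read then returns 0 and grep exits as well.
For details on pipes see http://man7.org/linux/man-pages/man7/pipe.7.html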
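Both rules can be checked directly with os.pipe. A minimal Python 3 sketch with made-up function names follows. One assumption to note: Python ignores SIGPIPE by default, so the second rule shows up as a BrokenPipeError (EPIPE) instead of the process being killed by the signal.

```python
import os


def read_after_writer_closed():
    """Rule 1: after the write end is closed, read drains the data, then hits EOF."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"data")
    os.close(write_fd)
    first = os.read(read_fd, 1024)   # the buffered data is still delivered
    second = os.read(read_fd, 1024)  # empty bytes: end-of-file
    os.close(read_fd)
    return first, second


def write_after_reader_closed():
    """Rule 2: writing to a pipe with no reader fails."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)
    try:
        os.write(write_fd, b"data")
        return "written"
    except BrokenPipeError:
        return "BrokenPipeError"
    finally:
        os.close(write_fd)


if __name__ == "__main__":
    print(read_after_writer_closed())
    print(write_after_reader_closed())
```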

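Recently I also came across a handy shell command, timeout. Combined with subprocess.Popen in Python 2.7, it can terminate a process after a time limit (in Python 3, communicate(), wait() and subprocess.run() accept a timeout argument themselves).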

    cmd = ['timeout', '10', 'bash', 'xxxxx']  # timeout needs a duration first; 10 seconds is an example value
    subprocess.Popen(cmd)
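A short Python 3 sketch of how the result might be inspected. It assumes a timeout binary is on PATH; GNU coreutils documents exit status 124 for a command that timed out, and other implementations may differ. The helper name run_with_timeout_command is made up for illustration.

```python
import shutil
import subprocess


def run_with_timeout_command(seconds, command):
    """Run command under the external timeout utility.

    Returns the exit status, or None when no timeout binary is installed.
    """
    if shutil.which("timeout") is None:
        return None
    completed = subprocess.run(["timeout", str(seconds)] + list(command))
    return completed.returncode


if __name__ == "__main__":
    # With GNU coreutils this is expected to print 124 after about a second.
    print(run_with_timeout_command(1, ["sleep", "5"]))
```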
