Problems Encountered with paramiko, Python's Remote-Control Module, and Their Solutions


Reposted from https://zhang.ge/5122.html

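I have been developing an automated operations and release platform lately. Its underlying command-line and file-transfer channels are built mainly on the paramiko module, and I have run into all sorts of problems along the way. This post collects those problems and their solutions for future reference.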

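I. "Error reading SSH protocol banner" connection error

Searching Baidu or Google for this phrase turns up plenty of questions and a few proposed solutions, but none of them worked for me. After repeated experiments and reading the paramiko source, I finally solved it, and I am recording the fix here.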

1. The exact error message:

 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/paramiko/client.py", line 307, in connect
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 465, in start_client
paramiko.SSHException: Error reading SSH protocol banner

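2. Solution:

Download the paramiko source again, unpack it, and edit the transport.py file in the unpacked tree:

vim build/lib/paramiko/transport.py

Search for self.banner_timeout and increase its value, for example to 300 seconds:

self.banner_timeout = 300

Finally, reinstall paramiko.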
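Patching the library is not the only option where a newer paramiko is available. The sketch below assumes the installed paramiko's SSHClient.connect accepts a banner_timeout keyword, which older releases such as the one patched above may not, so check your version first. The host and credentials are placeholders, and connect_kwargs and open_client are hypothetical helper names, not part of paramiko.

```python
def connect_kwargs(hostname, username, password, banner_timeout=300):
    """Build keyword arguments for paramiko.SSHClient.connect()."""
    return {
        "hostname": hostname,
        "port": 22,
        "username": username,
        "password": password,
        "timeout": 300,                    # TCP connect timeout in seconds
        "banner_timeout": banner_timeout,  # seconds to wait for the SSH banner
        "allow_agent": False,
        "look_for_keys": False,
    }


def open_client(**kwargs):
    """Connect with the given arguments; paramiko is imported only here."""
    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(**kwargs)
    return client
```

Calling open_client(**connect_kwargs("192.168.1.10", "root", "123456")) would then wait up to 300 seconds for the banner without any change to transport.py.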

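3. The long and winding road to this fix (skip it if you are not interested):

A Google search turned up a related question on Stack Overflow. It is about pysftp, which is itself built on paramiko:

https://stackoverflow.com/questions/34288526/pysft-paramiko-grequests-error-reading-ssh-protocol-banner/44493465#44493465

At the end he gave his own solution: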

UPDATE:

It seems the problem is caused by importing the package grequests. If I do not import grequests, pysftp works as expected. The issue was raised before but has not been solved

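In other words, his problem was caused by importing grequests and went away once that import was removed. I tried following this, but it made no difference in my production environment, so the cause there was probably different.

However, his description of the problem pointed me to the fix. He wrote: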

I have already tried changing the banner timeout from 15 seconds to 60 secs in the transport.py, but it did not solve the problem.

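Seeing "timeout" and transport.py reminded me that the production machines reporting Error reading SSH protocol banner were all very sluggish, and that the time from starting the paramiko connection to the error looked roughly the same each time.

So I searched the system and found this transport.py file: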

/usr/lib/python2.7/site-packages/paramiko/transport.py

Searching it for "banner", I found that there was indeed such a setting, and its value roughly matched the timeout I had observed.


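So I changed it to 300 seconds and tested again, with no effect at all: the connection still timed out after 15 seconds. I then added breakpoints and even moved the file away, and nothing changed. Evidently this file was not the one being loaded.

Going back to the original error message, the path it shows is:

build/bdist.linux-x86_64/egg/paramiko/transport.py

No such file could be found on the system. Then it dawned on me: after installation, Python modules are mostly stored as egg files, so the source had to be modified and the module reinstalled.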
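A quicker way to avoid editing a file that is never loaded is to ask the interpreter where it would import a module from. The following is a minimal Python 3 sketch using only the standard library; module_origin is a hypothetical helper name, and "json" is included only so that something is printed on machines without paramiko.

```python
import importlib.util


def module_origin(name):
    """Return the path Python would load `name` from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "paramiko" is the module from this post; "json" is only a fallback example.
for name in ("paramiko", "json"):
    print(name, "->", module_origin(name))
```

On the Python 2.7 interpreter used in this post, importing paramiko and printing paramiko.__file__ gives the same information.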

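So I changed into the paramiko source directory, ran a search, and found two transport.py files:

[root@localhost :/data/software/paramiko-1.9]# find . -name transport.py
./paramiko/transport.py
./build/lib/paramiko/transport.py

I changed self.banner_timeout to 300 in the file and reinstalled paramiko, and the very first test succeeded.

Afterwards I also posted an answer to that Stack Overflow question (please excuse my clumsy English) as a small way of giving back.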

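II. paramiko "blocks" when the remote script starts a background script

After my remote command channel went live, I found that when a remote script launches another script in the background, the channel waits until the background script finishes before returning, and sometimes hangs altogether.

1. Steps to reproduce:

① Write the test scripts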

Script 1: test.sh

#!/bin/bash
sleep 30
echo test end
exit 0

Script 2: run.sh

#!/bin/bash
bash /tmp/test.sh &
echo run ok!
exit 0

Script 3: test.py

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname='192.168.1.10', port=22, username='root',
               password='123456', timeout=300,
               allow_agent=False, look_for_keys=False)

# Run the remote script and collect everything it writes to stdout
stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh")
result_info = ""
for line in stdout.readlines():
    result_info += line
print(result_info)
client.close()

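Upload test.sh and run.sh to the remote server, for example to 192.168.1.10:/tmp/.

② Run it remotely

Running python test.py locally does not print run ok right away. Instead, the script waits 30 seconds and then prints all the output, including that of test.sh.

2. Solution

Redirect the remote script's standard output (stdout) to standard error (stderr). The modified test.py: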

 
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname='192.168.1.10', port=22, username='root',
               password='123456', timeout=300,
               allow_agent=False, look_for_keys=False)

# Send the remote script's stdout to stderr (1>&2) and read stderr instead
stdin, stdout, stderr = client.exec_command("bash /tmp/run.sh 1>&2")
result_info = ""
for line in stderr.readlines():
    result_info += line
print(result_info)
client.close()

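Run it now and the result comes back immediately. My explanation is simple: stdout is line-buffered, so characters written to it sit in a buffer and the actual I/O happens only when a newline arrives, which makes the paramiko remote command wait; stderr is unbuffered so that error messages appear as soon as possible. Redirecting the script's standard output to standard error (1>&2) therefore lets paramiko read the remote output quickly through stderr. Note also that stdout.readlines() returns only at end-of-file, and the backgrounded test.sh inherits the stdout of run.sh, so that stream cannot reach end-of-file until test.sh exits, which may be the more direct cause of the 30-second wait.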
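The waiting behaviour can be observed locally without paramiko. The sketch below is only an analogue built on the standard subprocess module, not the author's setup: it assumes a POSIX sh and sleep are available, uses a 2-second sleep in place of test.sh, and timed_read is a hypothetical helper name. It shows that reading a command's stdout to end-of-file takes as long as any background job that still holds that stdout.

```python
import subprocess
import time


def timed_read(shell_command):
    """Run a shell command, read its stdout to EOF, return (output, seconds)."""
    start = time.monotonic()
    done = subprocess.run(["sh", "-c", shell_command],
                          stdout=subprocess.PIPE, universal_newlines=True)
    return done.stdout, time.monotonic() - start


# The background sleep inherits stdout, so EOF arrives only after about 2 s.
out_shared, secs_shared = timed_read("sleep 2 & echo run ok")

# With the background job's output redirected, EOF arrives right away.
out_detached, secs_detached = timed_read("sleep 2 >/dev/null 2>&1 & echo run ok")

print("shared stdout:   %.1fs %r" % (secs_shared, out_shared))
print("detached stdout: %.1fs %r" % (secs_detached, out_detached))
```

Applied to the scripts above, the equivalent change would be to start the background job in run.sh as bash /tmp/test.sh >/dev/null 2>&1 &, at the cost of discarding the output of test.sh.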

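III. Fixing the "This operation would block forever" error

While scaling out a paramiko-based automation apiserver, I found that running remote commands or file transfers in the new environment raised the following error: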

 
2017-08-04 12:38:31,243 [ERROR] Exception: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
2017-08-04 12:38:31,244 [ERROR] Traceback (most recent call last):
2017-08-04 12:38:31,244 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
2017-08-04 12:38:31,245 [ERROR] self._check_banner()
2017-08-04 12:38:31,245 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1681, in _check_banner
2017-08-04 12:38:31,245 [ERROR] raise SSHException('Error reading SSH protocol banner' + str(x))
2017-08-04 12:38:31,245 [ERROR] SSHException: Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)
2017-08-04 12:38:31,245 [ERROR]
2017-08-04 12:38:31,247 [INFO] Error reading SSH protocol banner('This operation would block forever', <Hub at 0x38b02d0 epoll pending=0 ref=0 fileno=28>)

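I assumed all along that a Python component had been installed incorrectly and checked everything over and over, only to find that the cause was one extra package that had been installed!

Solution:

Delete the installed greenlet package (the reason is explained further below):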

 
rm -r /usr/local/python2.7.5/lib/python2.7/site-packages/greenlet*

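Below is the arduous troubleshooting process; skip it if you are not interested:

1. Being lazy, the first thing I did on seeing the error was to search for the phrase [This operation would block forever', <Hub], but I found no solution.

2. From experience, I next looked up the _check_banner function named in the traceback: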

 
def _check_banner(self):
    # this is slow, but we only have to do it once
    for i in range(100):
        # give them 15 seconds for the first line, then just 2 seconds
        # each additional line. (some sites have very high latency.)
        if i == 0:
            timeout = self.banner_timeout
        else:
            timeout = 2
        try:
            buf = self.packetizer.readline(timeout)
        except ProxyCommandFailure:
            raise
        except Exception, x:
            raise SSHException('Error reading SSH protocol banner' + str(x))
        if buf[:4] == 'SSH-':
            break
        self._log(DEBUG, 'Banner: ' + buf)
    if buf[:4] != 'SSH-':
        raise SSHException('Indecipherable protocol version "' + buf + '"')
    # save this server version string for later
    self.remote_version = buf
    # pull off any attached comment
    comment = ''
    i = string.find(buf, ' ')
    if i >= 0:
        comment = buf[i+1:]
        buf = buf[:i]
    # parse out version string and make sure it matches
    segs = buf.split('-', 2)
    if len(segs) < 3:
        raise SSHException('Invalid SSH banner')
    version = segs[1]
    client = segs[2]
    if version != '1.99' and version != '2.0':
        raise SSHException('Incompatible version (%s instead of 2.0)' % (version,))
    self._log(INFO, 'Connected (version %s, client %s)' % (version, client))

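3. The exception is clearly raised by the statement buf = self.packetizer.readline(timeout). The crude way to locate the real source, as I remembered it, is to run that statement outside the try block and see what happens: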

 
def _check_banner(self):
    # this is slow, but we only have to do it once
    for i in range(100):
        # give them 15 seconds for the first line, then just 2 seconds
        # each additional line. (some sites have very high latency.)
        if i == 0:
            timeout = self.banner_timeout
        else:
            timeout = 2
        buf = self.packetizer.readline(timeout)  # added outside the try, to see where the exception really comes from
        try:
            buf = self.packetizer.readline(timeout)
        except ProxyCommandFailure:
            raise
        except Exception, x:
            raise SSHException('Error reading SSH protocol banner' + str(x))
        if buf[:4] == 'SSH-':
            break
        self._log(DEBUG, 'Banner: ' + buf)
    .....

This made the error message much more specific:

 
2017-08-04 13:23:26,085 [ERROR] Unknown exception: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
2017-08-04 13:23:26,087 [ERROR] Traceback (most recent call last):
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1555, in run
2017-08-04 13:23:26,088 [ERROR] self._check_banner()
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1676, in _check_banner
2017-08-04 13:23:26,088 [ERROR] buf = self.packetizer.readline(timeout)
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 280, in readline
2017-08-04 13:23:26,088 [ERROR] buf += self._read_timeout(timeout)
2017-08-04 13:23:26,088 [ERROR] File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 468, in _read_timeout
2017-08-04 13:23:26,089 [ERROR] x = self.__socket.recv(128)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 280, in recv
2017-08-04 13:23:26,089 [ERROR] self._wait(self._read_event)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/_socket2.py", line 179, in _wait
2017-08-04 13:23:26,089 [ERROR] self.hub.wait(watcher)
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 630, in wait
2017-08-04 13:23:26,089 [ERROR] result = waiter.get()
2017-08-04 13:23:26,089 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 878, in get
2017-08-04 13:23:26,090 [ERROR] return self.hub.switch()
2017-08-04 13:23:26,090 [ERROR] File "/usr/local/python2.7.5/lib/python2.7/site-packages/gevent-1.1.2-py2.7-linux-x86_64.egg/gevent/hub.py", line 609, in switch
2017-08-04 13:23:26,090 [ERROR] return greenlet.switch(self)
2017-08-04 13:23:26,090 [ERROR] LoopExit: ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
2017-08-04 13:23:26,090 [ERROR]
2017-08-04 13:23:26,093 [INFO] ('This operation would block forever', <Hub at 0x20390f0 epoll pending=0 ref=0 fileno=27>)
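Running the statement outside the try block works, but the inner frames can also be kept without editing the control flow. The following is a minimal Python 3 sketch of exception chaining, not paramiko's actual code: BannerError stands in for SSHException, and failing_readline and check_banner are hypothetical names used only for illustration.

```python
import traceback


class BannerError(Exception):
    """Stand-in for paramiko's SSHException, to keep the sketch self-contained."""


def failing_readline(timeout):
    # Hypothetical low-level read that fails, like packetizer.readline() in the log.
    raise OSError("This operation would block forever")


def check_banner(readline, timeout=15):
    try:
        return readline(timeout)
    except Exception as exc:
        # "from exc" keeps the original exception attached as __cause__.
        raise BannerError("Error reading SSH protocol banner: %s" % exc) from exc


try:
    check_banner(failing_readline)
except BannerError as err:
    caught = err
    report = "".join(traceback.format_exception(type(err), err, err.__traceback__))

print(report)
```

The printed report contains both tracebacks, including the frame inside failing_readline. On Python 2, where "raise ... from" is not available, logging traceback.format_exc() inside the except block before re-raising serves a similar purpose.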

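This pretty much pinned the blame on gevent and greenlet. At first I assumed my apiserver was calling gevent, but after a long investigation I confirmed that it was not. As far as I knew, paramiko does not use gevent either, so where did this exception come from?

It was only after searching Google again, this time for [LoopExit: ('This operation would block forever', <Hub at], that I found a blog post, http://www.hongquan.me/?p=178, and finally understood the cause.

The cause, according to that post: greenlet has a run function that overrides the function of the same name in paramiko's transport.py, so when paramiko executes _check_banner it actually ends up calling greenlet's run function, hence the error. I was speechless once again!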
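The detailed traceback passes through gevent/_socket2.py, which suggests that the socket class used in that process had been replaced by gevent's version. A quick heuristic for checking this in a running process is to look at which module the socket class is defined in. This is only a sketch under that assumption; is_gevent_class is a hypothetical helper name, and the check relies on nothing beyond the class's __module__ attribute.

```python
import socket


def is_gevent_class(cls):
    """True if `cls` is defined in a gevent module (e.g. gevent._socket2)."""
    return cls.__module__.split(".")[0] == "gevent"


print("socket.socket comes from:", socket.socket.__module__)
print("patched by gevent:", is_gevent_class(socket.socket))
```

Running this inside the affected service, right before the paramiko connection is made, shows whether gevent is involved even when the application itself never imports it directly.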

