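Python's ftplib module handles FTP operations such as downloading and uploading. Downloading a large file from an FTP server with Python is often slow, so how can we speed it up? Multithreading takes the stage. In this post I share my multithreaded download code; the main Python modules it needs are ftplib and threading.
Let's first go over the download approach, outlined as follows:
1. Split the file into blocks. For example, if we plan to use 20 threads to download the same file, the file is transferred in binary mode and divided evenly into 20 blocks (the last block also takes any leftover bytes), and one thread is started for each block: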
    import threading

    # Method of the FTP helper class; self.ftp is an already connected
    # ftplib.FTP instance and self.recordLog is the class's own logger.
    def setupThreads(self, filePath, localFilePath, threadNumber=20):
        """
        Set up the threads that will be used to download the file.
        The list of threads is returned on success, otherwise None.
        """
        try:
            # The reply looks like "213 <size>"; the size is the second field.
            temp = self.ftp.sendcmd('SIZE ' + filePath)
            remoteFileSize = int(temp.split()[1])
            # Integer division: every block except the last has this size.
            blockSize = remoteFileSize // threadNumber
            rest = None
            threads = []
            for i in range(0, threadNumber - 1):
                beginPoint = blockSize * i
                subThread = threading.Thread(
                    target=self.downloadFileMultiThreads,
                    args=(i, filePath, localFilePath, beginPoint, blockSize, rest))
                threads.append(subThread)

            # The last block also takes the bytes left over by the division.
            assigned = blockSize * threadNumber
            unassigned = remoteFileSize - assigned
            lastBlockSize = blockSize + unassigned
            beginPoint = blockSize * (threadNumber - 1)
            subThread = threading.Thread(
                target=self.downloadFileMultiThreads,
                args=(threadNumber - 1, filePath, localFilePath, beginPoint, lastBlockSize, rest))
            threads.append(subThread)
            return threads
        except Exception as diag:
            self.recordLog(str(diag), 'error')
            return None
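The block arithmetic in setupThreads can be checked on its own, without an FTP server. The following is a minimal sketch of that calculation; the function name `plan_blocks` is hypothetical and is not part of the original class.

```python
def plan_blocks(file_size, thread_count):
    """Return a list of (begin_offset, block_size) pairs covering file_size bytes.

    Hypothetical helper that mirrors the arithmetic used in setupThreads.
    """
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    block_size = file_size // thread_count
    # Every block except the last has the same size.
    blocks = [(i * block_size, block_size) for i in range(thread_count - 1)]
    # The last block starts where the others end and absorbs the remainder.
    last_begin = block_size * (thread_count - 1)
    blocks.append((last_begin, file_size - last_begin))
    return blocks


if __name__ == "__main__":
    for begin, size in plan_blocks(1005, 20):
        print(begin, size)
```

Keeping the remainder in the last block means the block sizes always add up to the exact file size, even when the size is not divisible by the number of threads.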
The downloadFileMultiThreads function used above is as follows:
    import os
    import ftplib
    from ftplib import FTP

    # Method of the same class; self.host, self.user, self.passwd and
    # self.fixBlockSize (bytes requested per recv call) are set elsewhere.
    def downloadFileMultiThreads(self, threadIndex, remoteFilePath, localFilePath,
                                 beginPoint, blockSize, rest=None):
        """
        Worker run by one thread: download blockSize bytes starting at beginPoint.
        The rest argument is kept so that existing callers keep working.
        """
        try:
            # Another connection to the FTP server: change to the path, set binary mode.
            myFtp = FTP(self.host, self.user, self.passwd)
            myFtp.cwd(os.path.dirname(remoteFilePath))
            myFtp.voidcmd('TYPE I')

            # Passing rest= makes ftplib send REST right before RETR, after the
            # data connection has been set up, so the offset is not lost.
            beginToDownload = 'RETR ' + os.path.basename(remoteFilePath)
            connection = myFtp.transfercmd(beginToDownload, rest=beginPoint)

            finishedSize = 0
            # Temporary local file that holds this thread's block.
            with open(localFilePath + '.part.' + str(threadIndex), 'wb') as fp:
                while finishedSize < blockSize:
                    # Never read past the end of this block.
                    readSize = min(self.fixBlockSize, blockSize - finishedSize)
                    data = connection.recv(readSize)
                    if not data:
                        break
                    fp.write(data)
                    finishedSize += len(data)
            connection.close()
            try:
                myFtp.quit()
            except ftplib.all_errors:
                # Closing the data connection before the end of the file can
                # leave an error reply pending, which quit() would raise.
                myFtp.close()
            return finishedSize == blockSize
        except Exception:
            return False
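The threads returned by setupThreads still have to be started and waited on before the blocks can be merged. A minimal sketch of that step follows; `run_threads` and `fake_download` are hypothetical names, and `fake_download` merely stands in for downloadFileMultiThreads so the example runs without a server. Because `threading.Thread` discards the target's return value, the sketch records each worker's result in a shared dictionary instead.

```python
import threading


def run_threads(threads):
    """Start every thread, then block until all of them have finished."""
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def fake_download(index, results):
    # Hypothetical stand-in for downloadFileMultiThreads: a real worker would
    # download its block here and store True or False as the outcome.
    results[index] = True


if __name__ == "__main__":
    results = {}
    threads = [threading.Thread(target=fake_download, args=(i, results))
               for i in range(4)]
    run_threads(threads)
    print(sorted(results.items()))
```

Starting all threads before joining any of them is what lets the blocks download in parallel; joining inside the first loop would run them one after another.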
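2. Once all the downloads have finished, the individual blocks have to be merged into one file. The merging process is covered in part two of this series: "Python FTP multithreaded file download: merging the blocks downloaded by multiple threads".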
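For orientation only, merging could look roughly like the sketch below. It is not the code from part two of the series; it simply assumes the `.part.N` naming used by downloadFileMultiThreads, and `merge_parts` is a hypothetical name.

```python
import os
import shutil


def merge_parts(local_file_path, thread_count, remove_parts=True):
    """Concatenate <local_file_path>.part.0 .. .part.(N-1) into local_file_path.

    Hypothetical helper; assumes every part file exists and is complete.
    """
    with open(local_file_path, "wb") as out:
        for i in range(thread_count):
            part_path = local_file_path + ".part." + str(i)
            with open(part_path, "rb") as part:
                # Stream the part in chunks so large files never sit in memory.
                shutil.copyfileobj(part, out)
            if remove_parts:
                os.remove(part_path)
```

The parts must be appended in index order, since block i holds the bytes that come directly before block i + 1 in the remote file.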
Thanks for reading, and I hope this helps!