[python] Memory usage problem with threadpool

Reposted article, 2018-01-16 19:03 (tag: python)

Conclusion first:

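When writing multithreaded code, don't use threadpool; use threading instead, especially when processing large amounts of data. threadpool can cause serious memory usage problems!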

Comparing the memory usage of threading and threadpool

# coding=utf-8

import os
import psutil
import threadpool
import threading


class TEST(object):
    # Generate data with yield: each step yields a list of 10 items, one item of data per thread
    def get_data(self):
        multi_list = list()
        for i in range(100):
            data = "abcdefg" * 100000
            multi_list.append(data)
            if len(multi_list) % 10 == 0:
                yield multi_list
                multi_list = list()

    # Test function: print memory usage before processing each batch
    def test(self):
        for data in self.get_data():
            mem = psutil.Process(os.getpid()).memory_info().rss
            print("[test] mem %s" % mem)    # print current memory usage (RSS, bytes)
            self.deal_threadpool(data)      # use threadpool
            # self.deal_multi_thread(data)  # use threading

    # Approach under comparison: threadpool
    def deal_threadpool(self, data_list):
        pool = threadpool.ThreadPool(10)
        requests = threadpool.makeRequests(self.sub_task, data_list)
        for req in requests:
            pool.putRequest(req)
        pool.wait()

    # Approach under comparison: threading
    def deal_multi_thread(self, data_list):
        threads = list()
        for data in data_list:
            threads.append(threading.Thread(target=self.sub_task, args=(data,)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def sub_task(self, data):
        # Does nothing: the test only measures the memory held by the data passed in
        return


if __name__ == "__main__":
    mem = psutil.Process(os.getpid()).memory_info().rss
    print("[main] mem %s" % mem)
    obj = TEST()
    obj.test()
    mem = psutil.Process(os.getpid()).memory_info().rss
    print("[main] mem %s" % mem)

Results:

1. With threadpool (RSS in bytes)

[main] mem 9760768
[test] mem 16764928
[test] mem 23924736
[test] mem 26820608
[test] mem 29720576
[test] mem 31911936
[test] mem 34795520
[test] mem 36978688
[test] mem 39161856
[test] mem 41340928
[test] mem 43524096
[main] mem 43606016

2. With threading (RSS in bytes)

[main] mem 9760768
[test] mem 16764928
[test] mem 23838720
[test] mem 16969728
[test] mem 23838720
[test] mem 16969728
[test] mem 23838720
[test] mem 16973824
[test] mem 23842816
[test] mem 16973824
[test] mem 23842816
[main] mem 16973824

From the comparison we can see:

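With threading, memory is released correctly each time the threads exit, and peak memory usage stays stable: RSS alternates between about 17 MB and 24 MB.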

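With threadpool, memory is not released after each batch finishes; it keeps accumulating, and RSS grows with every batch until it ends at about 43.6 MB (versus about 17 MB with threading). In my real workload, which processed a large amount of data fetched from MongoDB, threadpool used as much as 50 GB of memory, while after switching to threading memory usage stayed stable at about 1 GB.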
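The post does not explain why threadpool behaves this way. One plausible explanation, offered as an assumption rather than a confirmed diagnosis: `deal_threadpool` builds a new `threadpool.ThreadPool(10)` for every batch, and `pool.wait()` only waits for the queued requests to finish; it does not stop the worker threads (as far as I know the library has a separate `dismissWorkers()` method for that, which the test never calls). After 10 batches about 100 idle workers would still be alive, and if each one keeps a reference to the last request it handled, that request's data can never be freed. The sketch below illustrates this effect with the standard library only. `PersistentPool`, `Payload`, `make_batch`, `batch_with_threads`, `batch_with_pool` and `alive` are hypothetical names, and `PersistentPool` is not threadpool's real code. For contrast, in CPython `threading.Thread.run()` deletes its `target` and `args` when it finishes, so after `join()` nothing keeps the data alive. `weakref` is used to count which payloads are still reachable.

```python
import gc
import queue
import threading
import weakref


class Payload:
    """Hypothetical stand-in for one large chunk of data (supports weakref)."""
    def __init__(self, size):
        self.blob = b"x" * size


def sub_task(data):
    return None  # no-op, like TEST.sub_task in the test above


class PersistentPool:
    """Hypothetical pool whose workers keep running until shutdown().

    Illustrates the assumed failure mode; it is NOT threadpool's source code.
    """

    def __init__(self, num_workers):
        self._tasks = queue.Queue()
        self._stop = threading.Event()
        self._workers = [threading.Thread(target=self._run, daemon=True)
                         for _ in range(num_workers)]
        for w in self._workers:
            w.start()

    def _run(self):
        item = None
        while not self._stop.is_set():
            try:
                item = self._tasks.get(timeout=0.05)
            except queue.Empty:
                # The locals 'item' and 'arg' still reference the last task run
                continue
            func, arg = item
            func(arg)
            self._tasks.task_done()

    def submit(self, func, arg):
        self._tasks.put((func, arg))

    def wait(self):
        self._tasks.join()  # all tasks are done, but the workers stay alive

    def shutdown(self):
        self._stop.set()
        for w in self._workers:
            w.join()  # worker frames (and their locals) are released here


def make_batch(n=10, size=700_000):
    # 700_000 bytes per item, matching len("abcdefg" * 100000)
    payloads = [Payload(size) for _ in range(n)]
    return payloads, [weakref.ref(p) for p in payloads]


def batch_with_threads():
    payloads, refs = make_batch()
    threads = [threading.Thread(target=sub_task, args=(p,)) for p in payloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # Thread.run() deletes its target/args when it finishes
    return refs


def batch_with_pool():
    payloads, refs = make_batch()
    pool = PersistentPool(10)
    for p in payloads:
        pool.submit(sub_task, p)
    pool.wait()
    return refs, pool


def alive(refs):
    # Number of payloads that are still reachable from somewhere
    gc.collect()
    return sum(r() is not None for r in refs)


if __name__ == "__main__":
    print("threading, alive after join:", alive(batch_with_threads()))
    refs, pool = batch_with_pool()
    print("pool, alive after wait:", alive(refs))
    pool.shutdown()
    print("pool, alive after shutdown:", alive(refs))
```

In this sketch at least one payload survives `wait()` (one per worker that ran a task), and all of them are freed once `shutdown()` stops the workers. If this is indeed the cause, creating the pool once and reusing it across batches, or dismissing its workers after each batch, should also stop the growth. I have not verified that this accounts for the exact numbers above.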
