The "killed" problem, part 1: analyzing why a process's physical memory far exceeds Xmx



See also — The "killed" problem, part 2: JVM settings for Java applications under Docker (how to safely limit JVM resources inside a container)

Problem description

Recently I have often been asked: "Why does our process's physical memory (Res/RSS) far exceed the Xmx we configured?" For example, with Xmx set to 1.7G, top shows a Res of 3.0G, and as the process keeps running Res keeps climbing, until at some point the OS decides it is a bad process and simply kills it.

top - 16:57:47 up 73 days, 4:12, 8 users,  load average: 6.78, 9.68, 13.31
Tasks: 130 total, 1 running, 123 sleeping, 6 stopped, 0 zombie
Cpu(s): 89.9%us, 5.6%sy, 0.0%ni, 2.0%id, 0.7%wa, 0.7%hi, 1.2%si, 0.0%st
...
  PID USER   PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
22753 admin  20  0 4252m 3.0g  17m S 192.8 52.7 151:47.59 /opt/taobao/java/bin/java -server -Xms1700m -Xmx1700m -Xmn680m -Xss256k -XX:PermSize=128m -XX:MaxPermSize=128m -XX:+UseStringCache -XX:+
   40 root   20  0     0    0    0 D   0.3  0.0   5:53.07 [kswapd0]

Can physical memory exceed Xmx?

First, a word about Xmx: this JVM option only bounds the familiar young and old generations. It does not include the permanent generation, nor the code cache, nor (as the name itself suggests) the off-heap memory we often hear about, and there are still other memory regions that are not counted either. So in theory it is entirely possible for a process's physical memory to exceed Xmx; it is just that when it exceeds Xmx by a wide margin, something is probably wrong.
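To make this concrete, here is a minimal sketch (the class name, flag values and sizes are invented for illustration) that keeps its Java heap well under a small Xmx while pushing RSS far beyond it, simply by allocating direct buffers, which live outside the heap:

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Run with something like: java -Xmx64m -XX:MaxDirectMemorySize=1g RssBeyondXmx
// The Java heap stays within 64m, but RES in top grows by roughly 512m,
// because direct buffers are backed by native memory outside the heap.
public class RssBeyondXmx {
    public static void main(String[] args) throws Exception {
        List<ByteBuffer> keep = new ArrayList<ByteBuffer>();
        for (int i = 0; i < 512; i++) {
            keep.add(ByteBuffer.allocateDirect(1024 * 1024)); // 1 MB of off-heap memory each
        }
        System.out.println("allocated " + keep.size() + " MB of direct memory");
        Thread.sleep(600000); // keep the process alive so it can be inspected with top / smaps
    }
}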

The mapping between virtual and physical memory

The OS puts real effort into memory management. To make the best use of resources it layers virtual addresses on top of physical memory, and every process on the same machine sees a virtual address space of the same size. The OS hides the complicated mapping to physical memory: when code touches a virtual address that is not yet backed by physical memory, a page fault occurs and the kernel prepares a physical page for it. So even if we allocate a 1G block of virtual memory, there is not necessarily a 1G block of physical memory behind it. How much physical memory a given virtual region actually maps can be seen on Linux in /proc/<pid>/smaps, where Size is the virtual size and Rss is the physical memory. In that sense, the physical memory backing a virtual region should never exceed the region's virtual size. For example:

8dc00000-100000000 rwxp 00000000 00:00 0
Size:            1871872 kB
Rss:             1798444 kB
Pss:             1798444 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   1798444 kB
Referenced:      1798392 kB
Anonymous:       1798444 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
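As a quick cross-check (this is an illustration, not the author's tool), a few lines of Java are enough to add up the Rss of every mapping in smaps; the total should roughly match the RES column that top reports for the process:

import java.io.BufferedReader;
import java.io.FileReader;

// Sums the Rss of every mapping in /proc/<pid>/smaps.
public class SmapsRssTotal {
    public static void main(String[] args) throws Exception {
        long totalKb = 0;
        BufferedReader r = new BufferedReader(new FileReader("/proc/" + args[0] + "/smaps"));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.startsWith("Rss:")) {
                    // e.g. "Rss:             1798444 kB"
                    totalKb += Long.parseLong(line.replaceAll("[^0-9]", ""));
                }
            }
        } finally {
            r.close();
        }
        System.out.println("total Rss: " + totalKb + " kB");
    }
}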

To investigate this problem I wrote a small analysis tool that merges contiguous virtual memory regions and reports statistics on the merged blocks. Contiguously allocated regions usually bear some relationship to each other, although that relationship cannot be taken for granted. The output looks roughly like this:

from->to                             vsize(kB)  rss(kB)  rss_percentage(rss/total_rss)  merge_block_count

0x8dc00000->0x30c9a20000             1871872    1487480  53.77%   1
0x7faf7a4c5000->0x7fffa7dd9000       1069464    735996   26.60%   440
0x7faf50c75000->0x7faf6c02a000       445996     226860   8.20%    418
0x7faf6c027000->0x7faf78010000       196452     140640   5.08%    492
0x418e8000->0x100000000              90968      90904    3.29%    1
0x7faf48000000->0x7faf50c78000       131072     35120    1.27%    4
0x7faf28000000->0x7faf3905e000       196608     20708    0.75%    6
0x7faf38000000->0x7faf4ad83000       196608     17036    0.62%    6
0x7faf78009000->0x7faf7a4c6000       37612      10440    0.38%    465
0x30c9e00000->0x30ca202000           3656       716      0.03%    5
0x7faf20000000->0x7faf289c7000       65536      132      0.00%    2
0x30c9a00000->0x30c9c20000           128        108      0.00%    1
0x30ca600000->0x30cae83000           2164       76       0.00%    5
0x30cbe00000->0x30cca16000           2152       68       0.00%    5
0x7fffa7dc3000->0x7fffa7e00000       92         48       0.00%    1
0x30cca00000->0x7faf21dba000         2148       32       0.00%    5
0x30cb200000->0x30cbe16000           2080       28       0.00%    4
0x30cae00000->0x30cb207000           2576       20       0.00%    4
0x30ca200000->0x30ca617000           2064       16       0.00%    4
0x40000000->0x4010a000               36         12       0.00%    2
0x30c9c1f000->0x30c9f89000           12         12       0.00%    3
0x40108000->0x471be000               8          8        0.00%    1
0x7fffa7dff000->0x0                  4          4        0.00%    0

Of course this is only a rough analysis. To get more value out of it we would need to dig out more details, for example which memory pool each block belongs to and where exactly it was allocated, which requires support from the JVM. (Note: the first row above is in fact the virtual memory of the new + old + perm generations and its physical memory mapping.) A rough sketch of such a merging pass is shown below.
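The sketch that follows is not the author's actual tool, and it only merges mappings whose address ranges touch exactly, whereas the real output above clearly tolerates gaps; it only shows the idea of folding adjacent mappings and sorting the resulting blocks by Rss:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Folds adjacent /proc/<pid>/smaps mappings into blocks and sorts them by Rss,
// similar in spirit to the per-block summary shown above.
public class SmapsMerge {
    static class Block { long start, end, sizeKb, rssKb; int mappings; }

    public static void main(String[] args) throws Exception {
        List<Block> blocks = new ArrayList<Block>();
        Block cur = null;
        BufferedReader r = new BufferedReader(new FileReader("/proc/" + args[0] + "/smaps"));
        String line;
        while ((line = r.readLine()) != null) {
            if (line.matches("^[0-9a-f]+-[0-9a-f]+ .*")) {             // start of a new mapping
                String[] range = line.split(" ")[0].split("-");
                long start = Long.parseLong(range[0], 16);
                long end = Long.parseLong(range[1], 16);
                if (cur == null || start != cur.end) {                 // gap: open a new block
                    cur = new Block();
                    cur.start = start;
                    blocks.add(cur);
                }
                cur.end = end;
                cur.mappings++;
            } else if (line.startsWith("Size:")) {
                cur.sizeKb += Long.parseLong(line.replaceAll("[^0-9]", ""));
            } else if (line.startsWith("Rss:")) {
                cur.rssKb += Long.parseLong(line.replaceAll("[^0-9]", ""));
            }
        }
        r.close();
        long totalRss = 0;
        for (Block b : blocks) totalRss += b.rssKb;
        Collections.sort(blocks, new Comparator<Block>() {
            public int compare(Block a, Block b) { return Long.compare(b.rssKb, a.rssKb); }
        });
        for (Block b : blocks) {
            System.out.printf("0x%x->0x%x %d %d %.2f%% %d%n",
                    b.start, b.end, b.sizeKb, b.rssKb, 100.0 * b.rssKb / totalRss, b.mappings);
        }
    }
}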

Under what conditions will the OS kill a process for OOM?

When a process disappears for no apparent reason, we generally check /var/log/messages for the keywords Out of memory: Kill process (for a Java process, first check whether there is a crash log). If they are there, the process was killed by the OS because of OOM:

Aug 19 08:32:38 mybank-ant kernel: : [6176841.238016] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238022] java cpuset=/ mems_allowed=0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238024] Pid: 25371, comm: java Not tainted 2.6.32-220.23.2.ali878.el6.x86_64 #1
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238026] Call Trace:
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238039] [<ffffffff810c35e1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238068] [<ffffffff81114d70>] ? dump_header+0x90/0x1b0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238074] [<ffffffff810e1b2e>] ? __delayacct_freepages_end+0x2e/0x30
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238079] [<ffffffff81213ffc>] ? security_real_capable_noaudit+0x3c/0x70
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238082] [<ffffffff811151fa>] ? oom_kill_process+0x8a/0x2c0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238084] [<ffffffff81115131>] ? select_bad_process+0xe1/0x120
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238087] [<ffffffff81115650>] ? out_of_memory+0x220/0x3c0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238093] [<ffffffff81125929>] ? __alloc_pages_nodemask+0x899/0x930
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238099] [<ffffffff81159b6a>] ? alloc_pages_current+0xaa/0x110
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238102] [<ffffffff81111ea7>] ? __page_cache_alloc+0x87/0x90
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238105] [<ffffffff81127f4b>] ? __do_page_cache_readahead+0xdb/0x270
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238108] [<ffffffff81128101>] ? ra_submit+0x21/0x30
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238110] [<ffffffff81113e17>] ? filemap_fault+0x5b7/0x600
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238113] [<ffffffff8113ca64>] ? __do_fault+0x54/0x510
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238116] [<ffffffff811140a0>] ? __generic_file_aio_write+0x240/0x470
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238118] [<ffffffff8113d017>] ? handle_pte_fault+0xf7/0xb50
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238121] [<ffffffff8111438e>] ? generic_file_aio_write+0xbe/0xe0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238133] [<ffffffffa008a171>] ? ext4_file_write+0x61/0x1e0 [ext4]
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238135] [<ffffffff8113dc54>] ? handle_mm_fault+0x1e4/0x2b0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238138] [<ffffffff81177c7a>] ? do_sync_write+0xfa/0x140
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238143] [<ffffffff81042c69>] ? __do_page_fault+0x139/0x480
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238147] [<ffffffff8118ad22>] ? vfs_ioctl+0x22/0xa0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238151] [<ffffffff814e4f8e>] ? do_page_fault+0x3e/0xa0
Aug 19 08:32:38 mybank-ant kernel: : [6176841.238154] [<ffffffff814e2345>] ? page_fault+0x25/0x30
...
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247969] [24673] 1801 24673 1280126 926068 1 0 0 java
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247971] [25084] 1801 25084 3756 101 0 0 0 top
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247973] [25094] 1801 25094 25233 30 1 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247975] [25098] 1801 25098 25233 31 0 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247977] [25100] 1801 25100 25233 30 1 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247979] [25485] 1801 25485 25233 30 1 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247981] [26055] 1801 26055 25233 30 0 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247984] [26069] 1801 26069 25233 30 0 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247986] [26081] 1801 26081 25233 30 0 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247988] [26147] 1801 26147 25233 32 0 0 0 tail
Aug 19 08:32:38 mybank-ant kernel: : [6176841.247990] Out of memory: Kill process 24673 (java) score 946 or sacrifice child
Aug 19 08:32:38 mybank-ant kernel: : [6176841.249016] Killed process 24673, UID 1801, (java) total-vm:5120504kB, anon-rss:3703788kB, file-rss:484kB

The stack above shows the kernel's path for choosing which process to kill. Along the way it computes a score for every process and records it in /proc/<pid>/oom_score; roughly speaking, the higher the score, the more at risk the process is of being killed. The relevant kernel code is pasted below for anyone interested; its comments already explain a good deal:

/*
 * Simple selection loop. We chose the process with the highest
 * number of 'points'. We expect the caller will lock the tasklist.
 *
 * (not docbooked, we don't want this one cluttering up the manual)
 */
static struct task_struct *select_bad_process(unsigned long *ppoints,
                                              struct mem_cgroup *mem)
{
    struct task_struct *p;
    struct task_struct *chosen = NULL;
    struct timespec uptime;
    *ppoints = 0;

    do_posix_clock_monotonic_gettime(&uptime);
    for_each_process(p) {
        unsigned long points;

        /*
         * skip kernel threads and tasks which have already released
         * their mm.
         */
        if (!p->mm)
            continue;
        /* skip the init task */
        if (is_global_init(p))
            continue;
        if (mem && !task_in_mem_cgroup(p, mem))
            continue;

        /*
         * This task already has access to memory reserves and is
         * being killed. Don't allow any other task access to the
         * memory reserve.
         *
         * Note: this may have a chance of deadlock if it gets
         * blocked waiting for another task which itself is waiting
         * for memory. Is there a better alternative?
         */
        if (test_tsk_thread_flag(p, TIF_MEMDIE))
            return ERR_PTR(-1UL);

        /*
         * This is in the process of releasing memory so wait for it
         * to finish before killing some other task by mistake.
         *
         * However, if p is the current task, we allow the 'kill' to
         * go ahead if it is exiting: this will simply set TIF_MEMDIE,
         * which will allow it to gain access to memory reserves in
         * the process of exiting and releasing its resources.
         * Otherwise we could get an easy OOM deadlock.
         */
        if (p->flags & PF_EXITING) {
            if (p != current)
                return ERR_PTR(-1UL);

            chosen = p;
            *ppoints = ULONG_MAX;
        }

        if (p->signal->oom_adj == OOM_DISABLE)
            continue;

        points = badness(p, uptime.tv_sec);
        if (points > *ppoints || !chosen) {
            chosen = p;
            *ppoints = points;
        }
    }

    return chosen;
}

/**
 * badness - calculate a numeric value for how bad this task has been
 * @p: task struct of which task we should calculate
 * @uptime: current uptime in seconds
 *
 * The formula used is relatively simple and documented inline in the
 * function. The main rationale is that we want to select a good task
 * to kill when we run out of memory.
 *
 * Good in this context means that:
 * 1) we lose the minimum amount of work done
 * 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
 * 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the principle
 *    of least surprise ... (be careful when you change it)
 */
unsigned long badness(struct task_struct *p, unsigned long uptime)
{
    unsigned long points, cpu_time, run_time;
    struct mm_struct *mm;
    struct task_struct *child;
    int oom_adj = p->signal->oom_adj;
    struct task_cputime task_time;
    unsigned long utime;
    unsigned long stime;

    if (oom_adj == OOM_DISABLE)
        return 0;

    task_lock(p);
    mm = p->mm;
    if (!mm) {
        task_unlock(p);
        return 0;
    }

    /*
     * The memory size of the process is the basis for the badness.
     */
    points = mm->total_vm;

    /*
     * After this unlock we can no longer dereference local variable `mm'
     */
    task_unlock(p);

    /*
     * swapoff can easily use up all memory, so kill those first.
     */
    if (p->flags & PF_OOM_ORIGIN)
        return ULONG_MAX;

    /*
     * Processes which fork a lot of child processes are likely
     * a good choice. We add half the vmsize of the children if they
     * have an own mm. This prevents forking servers to flood the
     * machine with an endless amount of children. In case a single
     * child is eating the vast majority of memory, adding only half
     * to the parents will make the child our kill candidate of choice.
     */
    list_for_each_entry(child, &p->children, sibling) {
        task_lock(child);
        if (child->mm != mm && child->mm)
            points += child->mm->total_vm/2 + 1;
        task_unlock(child);
    }

    /*
     * CPU time is in tens of seconds and run time is in thousands
     * of seconds. There is no particular reason for this other than
     * that it turned out to work very well in practice.
     */
    thread_group_cputime(p, &task_time);
    utime = cputime_to_jiffies(task_time.utime);
    stime = cputime_to_jiffies(task_time.stime);
    cpu_time = (utime + stime) >> (SHIFT_HZ + 3);

    if (uptime >= p->start_time.tv_sec)
        run_time = (uptime - p->start_time.tv_sec) >> 10;
    else
        run_time = 0;

    if (cpu_time)
        points /= int_sqrt(cpu_time);
    if (run_time)
        points /= int_sqrt(int_sqrt(run_time));

    /*
     * Niced processes are most likely less important, so double
     * their badness points.
     */
    if (task_nice(p) > 0)
        points *= 2;

    /*
     * Superuser processes are usually more important, so we make it
     * less likely that we kill those.
     */
    if (has_capability_noaudit(p, CAP_SYS_ADMIN) ||
        has_capability_noaudit(p, CAP_SYS_RESOURCE))
        points /= 4;

    /*
     * We don't want to kill a process with direct hardware access.
     * Not only could that mess up the hardware, but usually users
     * tend to only have this flag set on applications they think
     * of as important.
     */
    if (has_capability_noaudit(p, CAP_SYS_RAWIO))
        points /= 4;

    /*
     * If p's nodes don't overlap ours, it may still help to kill p
     * because p may have allocated or otherwise mapped memory on
     * this node before. However it will be less likely.
     */
    if (!has_intersects_mems_allowed(p))
        points /= 8;

    /*
     * Adjust the score by oom_adj.
     */
    if (oom_adj) {
        if (oom_adj > 0) {
            if (!points)
                points = 1;
            points <<= oom_adj;
        } else
            points >>= -(oom_adj);
    }

#ifdef DEBUG
    printk(KERN_DEBUG "OOMkill: task %d (%s) got %lu points\n",
           p->pid, p->comm, points);
#endif
    return points;
}
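The resulting score can be read back from /proc without any special tooling. Here is a small sketch (not part of the original article) that dumps oom_score for every process; piping its output through sort -rn shows which pid the OOM killer would favor first:

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

// Prints "<oom_score> <pid>" for every live process by reading /proc/<pid>/oom_score.
public class OomScores {
    public static void main(String[] args) throws Exception {
        for (File dir : new File("/proc").listFiles()) {
            if (!dir.getName().matches("\\d+")) {
                continue; // only numeric entries in /proc are processes
            }
            try {
                String score = new String(
                        Files.readAllBytes(Paths.get(dir.getPath(), "oom_score"))).trim();
                System.out.println(score + "\t" + dir.getName());
            } catch (Exception ignored) {
                // the process may have exited between listing /proc and reading the file
            }
        }
    }
}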

So where did the physical memory go?

DirectByteBuffer "iceberg" objects?

This is the first thing to check for this kind of problem: is some piece of code constantly creating DirectByteBuffer objects that never get reclaimed, leaking memory? An earlier article, JVM源碼分析之堆外內存完全解讀, covered this special kind of object in detail. Colleagues inside Alibaba can use the off-heap analysis in zprofiler's heap view to get the statistics directly and see how much off-heap memory is still bound to buffers that have not been reclaimed:

object   position    limit   capacity
java.nio.DirectByteBuffer @ 0x760afaed0    133      133      6380562
java.nio.DirectByteBuffer @ 0x790d51ae0    0        262144   262144
java.nio.DirectByteBuffer @ 0x790d20b80    133934   133934   262144
java.nio.DirectByteBuffer @ 0x790d20b40    0        262144   262144
java.nio.DirectByteBuffer @ 0x790d20b00    133934   133934   262144
java.nio.DirectByteBuffer @ 0x771ba3608    0        262144   262144
java.nio.DirectByteBuffer @ 0x771ba35c8    133934   133934   262144
java.nio.DirectByteBuffer @ 0x7c5c9e250    0        131072   131072
java.nio.DirectByteBuffer @ 0x7c5c9e210    74670    74670    131072
java.nio.DirectByteBuffer @ 0x7c185cd10    0        131072   131072
java.nio.DirectByteBuffer @ 0x7c185ccd0    98965    98965    131072
java.nio.DirectByteBuffer @ 0x7b181c980    65627    65627    131072
java.nio.DirectByteBuffer @ 0x7a40d6e40    0        131072   131072
java.nio.DirectByteBuffer @ 0x794ac3320    0        131072   131072
java.nio.DirectByteBuffer @ 0x794a7a418    80490    80490    131072
java.nio.DirectByteBuffer @ 0x77279e1d8    0        131072   131072
java.nio.DirectByteBuffer @ 0x77279dde8    65627    65627    131072
java.nio.DirectByteBuffer @ 0x76ea84000    0        131072   131072
java.nio.DirectByteBuffer @ 0x76ea83fc0    82549    82549    131072
java.nio.DirectByteBuffer @ 0x764d8d678    0        0        131072
java.nio.DirectByteBuffer @ 0x764d8d638    0        0        131072
java.nio.DirectByteBuffer @ 0x764d8d5f8    0        0        131072
java.nio.DirectByteBuffer @ 0x761a76340    0        131072   131072
java.nio.DirectByteBuffer @ 0x761a76300    74369    74369    131072
java.nio.DirectByteBuffer @ 0x7607423d0    0        131072   131072
Total: 25 / 875 entries (850 more not shown)    1267762    3826551    12083282
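The reason these are called "iceberg" objects is that the Java object on the heap is tiny while the memory it pins lives outside the heap. The following toy example (purely illustrative) makes that visible: heap usage barely moves, yet RES grows, and the native memory only comes back when the buffers are garbage collected and their Cleaners run:

import java.nio.ByteBuffer;

// Each DirectByteBuffer is a small heap object backed by a large chunk of
// native memory; dropping the reference is not enough to free that memory,
// it is only released when a GC collects the buffer and runs its Cleaner.
public class IcebergDemo {
    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 200; i++) {
            ByteBuffer buf = ByteBuffer.allocateDirect(4 * 1024 * 1024); // 4 MB off-heap
            // the tiny buffer object adds almost no heap pressure, so with a large
            // Xmx a GC may not happen for a long time and RES keeps climbing
        }
        System.gc(); // only now (and only if explicit GC is not disabled) is the native memory returned
        Thread.sleep(60000);
    }
}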

Frequent allocation inside some native library?

For allocations made inside native libraries, the main tool is Google's perftools. It is well documented online, so I will not go into its usage in detail; with it we can see how native code allocates memory. The tool relies on the Unix environment variable LD_PRELOAD, which lets a chosen shared library be loaded ahead of the others, effectively acting as a hook so that the same function can resolve to a different library's implementation. gperftools, for example, replaces malloc with tcmalloc's implementation, and that makes it possible to trace memory allocation paths. The output looks something like this:

Total: 1670.0 MB
  1616.3  96.8%  96.8%   1616.3  96.8% zcalloc
    40.3   2.4%  99.2%     40.3   2.4% os::malloc
     9.4   0.6%  99.8%      9.4   0.6% init
     1.6   0.1%  99.9%      1.7   0.1% readCEN
     1.3   0.1%  99.9%      1.3   0.1% ObjectSynchronizer::omAlloc
     0.5   0.0% 100.0%   1591.0  95.3% Java_java_util_zip_Deflater_init
     0.1   0.0% 100.0%      0.1   0.0% _dl_allocate_tls
     0.1   0.0% 100.0%      0.2   0.0% addMetaName
     0.1   0.0% 100.0%      0.2   0.0% allocZip
     0.1   0.0% 100.0%      0.1   0.0% instanceKlass::add_dependent_nmethod
     0.1   0.0% 100.0%      0.1   0.0% newEntry
     0.0   0.0% 100.0%      0.0   0.0% strdup
     0.0   0.0% 100.0%     25.8   1.5% Java_java_util_zip_Inflater_init
     0.0   0.0% 100.0%      0.0   0.0% growMetaNames
     0.0   0.0% 100.0%      0.0   0.0% _dl_new_object
     0.0   0.0% 100.0%      0.0   0.0% pthread_cond_wait@GLIBC_2.2.5
     0.0   0.0% 100.0%      1.4   0.1% Thread::Thread
     0.0   0.0% 100.0%      0.0   0.0% pthread_cond_timedwait@GLIBC_2.2.5
     0.0   0.0% 100.0%      0.0   0.0% JLI_MemAlloc
     0.0   0.0% 100.0%      0.0   0.0% read_alias_file
     0.0   0.0% 100.0%      0.0   0.0% _nl_intern_locale_data
     0.0   0.0% 100.0%      0.0   0.0% nss_parse_service_list
     0.0   0.0% 100.0%      0.0   0.0% getprotobyname
     0.0   0.0% 100.0%      0.0   0.0% getpwuid
     0.0   0.0% 100.0%      0.0   0.0% _dl_check_map_versions
     0.0   0.0% 100.0%   1590.5  95.2% deflateInit2_

From this output we can see that zcalloc allocated 1616.3 MB in total, Java_java_util_zip_Deflater_init accounts for 1591.0 MB and deflateInit2_ for 1590.5 MB, while the overall total is only 1670.0 MB, so these functions must be in a caller/callee relationship:

JNIEXPORT jlong JNICALL
Java_java_util_zip_Deflater_init(JNIEnv *env, jclass cls, jint level,
                                 jint strategy, jboolean nowrap)
{
    z_stream *strm = calloc(1, sizeof(z_stream));

    if (strm == 0) {
        JNU_ThrowOutOfMemoryError(env, 0);
        return jlong_zero;
    } else {
        char *msg;
        switch (deflateInit2(strm, level, Z_DEFLATED,
                             nowrap ? -MAX_WBITS : MAX_WBITS,
                             DEF_MEM_LEVEL, strategy)) {
          case Z_OK:
            return ptr_to_jlong(strm);
          case Z_MEM_ERROR:
            free(strm);
            JNU_ThrowOutOfMemoryError(env, 0);
            return jlong_zero;
          case Z_STREAM_ERROR:
            free(strm);
            JNU_ThrowIllegalArgumentException(env, 0);
            return jlong_zero;
          default:
            msg = strm->msg;
            free(strm);
            JNU_ThrowInternalError(env, msg);
            return jlong_zero;
        }
    }
}

int ZEXPORT deflateInit2_(strm, level, method, windowBits, memLevel, strategy,
                          version, stream_size)
    z_streamp strm;
    int level;
    int method;
    int windowBits;
    int memLevel;
    int strategy;
    const char *version;
    int stream_size;
{
    deflate_state *s;
    int wrap = 1;
    static const char my_version[] = ZLIB_VERSION;

    ushf *overlay;
    /* We overlay pending_buf and d_buf+l_buf. This works since the average
     * output size for (length,distance) codes is <= 24 bits.
     */

    if (version == Z_NULL || version[0] != my_version[0] ||
        stream_size != sizeof(z_stream)) {
        return Z_VERSION_ERROR;
    }
    if (strm == Z_NULL) return Z_STREAM_ERROR;

    strm->msg = Z_NULL;
    if (strm->zalloc == (alloc_func)0) {
        strm->zalloc = zcalloc;
        strm->opaque = (voidpf)0;
    }
    if (strm->zfree == (free_func)0) strm->zfree = zcfree;

#ifdef FASTEST
    if (level != 0) level = 1;
#else
    if (level == Z_DEFAULT_COMPRESSION) level = 6;
#endif

    if (windowBits < 0) { /* suppress zlib wrapper */
        wrap = 0;
        windowBits = -windowBits;
    }
#ifdef GZIP
    else if (windowBits > 15) {
        wrap = 2;       /* write gzip wrapper instead */
        windowBits -= 16;
    }
#endif
    if (memLevel < 1 || memLevel > MAX_MEM_LEVEL || method != Z_DEFLATED ||
        windowBits < 8 || windowBits > 15 || level < 0 || level > 9 ||
        strategy < 0 || strategy > Z_FIXED) {
        return Z_STREAM_ERROR;
    }
    if (windowBits == 8) windowBits = 9;  /* until 256-byte window bug fixed */
    s = (deflate_state *) ZALLOC(strm, 1, sizeof(deflate_state));
    if (s == Z_NULL) return Z_MEM_ERROR;
    strm->state = (struct internal_state FAR *)s;
    s->strm = strm;

    s->wrap = wrap;
    s->gzhead = Z_NULL;
    s->w_bits = windowBits;
    s->w_size = 1 << s->w_bits;
    s->w_mask = s->w_size - 1;

    s->hash_bits = memLevel + 7;
    s->hash_size = 1 << s->hash_bits;
    s->hash_mask = s->hash_size - 1;
    s->hash_shift = ((s->hash_bits+MIN_MATCH-1)/MIN_MATCH);

    s->window = (Bytef *) ZALLOC(strm, s->w_size, 2*sizeof(Byte));
    s->prev   = (Posf *)  ZALLOC(strm, s->w_size, sizeof(Pos));
    s->head   = (Posf *)  ZALLOC(strm, s->hash_size, sizeof(Pos));

    s->lit_bufsize = 1 << (memLevel + 6); /* 16K elements by default */

    overlay = (ushf *) ZALLOC(strm, s->lit_bufsize, sizeof(ush)+2);
    s->pending_buf = (uchf *) overlay;
    s->pending_buf_size = (ulg)s->lit_bufsize * (sizeof(ush)+2L);

    if (s->window == Z_NULL || s->prev == Z_NULL || s->head == Z_NULL ||
        s->pending_buf == Z_NULL) {
        s->status = FINISH_STATE;
        strm->msg = (char*)ERR_MSG(Z_MEM_ERROR);
        deflateEnd (strm);
        return Z_MEM_ERROR;
    }
    s->d_buf = overlay + s->lit_bufsize/sizeof(ush);
    s->l_buf = s->pending_buf + (1+sizeof(ush))*s->lit_bufsize;

    s->level = level;
    s->strategy = strategy;
    s->method = (Byte)method;

    return deflateReset(strm);
}

The code above confirms exactly that relationship.

The question now is where Java_java_util_zip_Deflater_init is being called from. The name tells us it is the native implementation behind the init method of java.util.zip.Deflater, so we need to find out where init gets invoked. To trace call stacks we would normally reach for BTrace, but BTrace works by bytecode instrumentation and cannot instrument a native method, so instead we look at what calls the native method, find the corresponding Java method (the Deflater constructor), and write a BTrace script against that:

import com.sun.btrace.annotations.*;
import static com.sun.btrace.BTraceUtils.*;

@BTrace
public class Test {
    @OnMethod(
        clazz = "java.util.zip.Deflater",
        method = "<init>"
    )
    public static void onnewThread(int i, boolean b) {
        jstack();
    }
}

Attaching this to the target process, we captured the stack that invokes the Deflater constructor:

org.apache.commons.compress.compressors.deflate.DeflateCompressorOutputStream.<init>(DeflateCompressorOutputStream.java:47)
com.xxx.unimsg.parse.util.CompressUtil.deflateCompressAndEncode(CompressUtil.java:199)
com.xxx.unimsg.parse.util.CompressUtil.compress(CompressUtil.java:80)
com.xxx.unimsg.UnifyMessageHelper.compressXml(UnifyMessageHelper.java:65)
com.xxx.core.model.utils.UnifyMessageUtil.compressXml(UnifyMessageUtil.java:56)
com.xxx.repository.convert.BatchInDetailConvert.convertDO(BatchInDetailConvert.java:57)
com.xxx.repository.impl.IncomingDetailRepositoryImpl$1.store(IncomingDetailRepositoryImpl.java:43)
com.xxx.repository.helper.IdempotenceHelper.store(IdempotenceHelper.java:27)
com.xxx.repository.impl.IncomingDetailRepositoryImpl.store(IncomingDetailRepositoryImpl.java:40)
sun.reflect.GeneratedMethodAccessor274.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
java.lang.reflect.Method.invoke(Method.java:597)
org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:309)
org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
com.alipay.finsupport.component.monitor.MethodMonitorInterceptor.invoke(MethodMonitorInterceptor.java:45)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
...

From this stack we located the code that calls java.util.zip.Deflater.init().

Resolving the problem

With the code pinpointed, a closer look showed that the implementation itself was not really at fault; the design simply never considered a high-traffic scenario. When traffic is heavy, the code accepts everything thrown at it regardless of whether the system can take the pressure, and deflates each piece of data as it arrives. That deflate step allocates off-heap memory, and once the volume reaches a certain level the oom killer steps in. During the analysis we also noticed that physical memory does drop back from time to time:

30071.txt: 0.0 0.0% 100.0%   96.7 57.0% Java_java_util_zip_Deflater_init
30071.txt: 0.1 0.0%  99.9%  196.0 72.6% Java_java_util_zip_Deflater_init
30071.txt: 0.1 0.0%  99.9%  290.3 78.5% Java_java_util_zip_Deflater_init
30071.txt: 0.1 0.0%  99.9%  392.7 83.6% Java_java_util_zip_Deflater_init
30071.txt: 0.2 0.0%  99.9%  592.8 88.5% Java_java_util_zip_Deflater_init
30071.txt: 0.2 0.0%  99.9%  700.7 91.0% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  799.1 91.9% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  893.9 92.2% Java_java_util_zip_Deflater_init
30071.txt: 0.0 0.0%  99.9%  114.2 63.7% Java_java_util_zip_Deflater_init
30071.txt: 0.0 0.0% 100.0%  105.1 52.1% Java_java_util_zip_Deflater_init
30071.txt: 0.2 0.0%  99.9%  479.7 87.4% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  782.2 90.1% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  986.9 92.3% Java_java_util_zip_Deflater_init
30071.txt: 0.4 0.0%  99.9% 1086.3 92.9% Java_java_util_zip_Deflater_init
30071.txt: 0.4 0.0%  99.9% 1185.1 93.3% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  941.5 92.1% Java_java_util_zip_Deflater_init
30071.txt: 0.4 0.0% 100.0% 1288.8 94.1% Java_java_util_zip_Deflater_init
30071.txt: 0.5 0.0% 100.0% 1394.8 94.9% Java_java_util_zip_Deflater_init
30071.txt: 0.5 0.0% 100.0% 1492.5 95.1% Java_java_util_zip_Deflater_init
30071.txt: 0.5 0.0% 100.0% 1591.0 95.3% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  874.6 90.0% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  950.7 92.8% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  858.4 92.3% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  818.4 91.9% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  858.7 91.2% Java_java_util_zip_Deflater_init
30071.txt: 0.1 0.0%  99.9%  271.5 77.9% Java_java_util_zip_Deflater_init
30071.txt: 0.4 0.0%  99.9% 1260.4 93.1% Java_java_util_zip_Deflater_init
30071.txt: 0.3 0.0%  99.9%  976.4 90.6% Java_java_util_zip_Deflater_init

This confirms that the code is not actually leaking. The recommendation was therefore to push the deflate work through a bounded queue, for example capping it at 100 so that at most 100 items are being deflated at any moment and a new item is admitted only as another finishes, so the process cannot be swamped to death. A sketch of that idea is shown below.
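Here is a minimal sketch of that kind of throttling (the class and method names are invented, and the pool/queue sizes are just the figures from the example above): a bounded work queue with caller-runs back-pressure, plus an explicit Deflater.end() so the native z_stream is released immediately instead of waiting for GC:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.zip.Deflater;

// Bounds the number of in-flight deflate tasks so that off-heap z_stream
// allocations cannot grow without limit under heavy traffic.
public class BoundedDeflateService {
    // at most 4 workers deflating, at most 100 items queued; when the queue is
    // full, CallerRunsPolicy makes the submitting thread do the work itself,
    // which naturally applies back-pressure to the caller.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            4, 4, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(100),
            new ThreadPoolExecutor.CallerRunsPolicy());

    public void submit(final byte[] data) {
        pool.execute(new Runnable() {
            public void run() {
                Deflater deflater = new Deflater();
                try {
                    deflater.setInput(data);
                    deflater.finish();
                    byte[] buf = new byte[8192];
                    while (!deflater.finished()) {
                        deflater.deflate(buf); // compressed output would be written out here
                    }
                } finally {
                    deflater.end(); // frees the native z_stream right away, without waiting for GC
                }
            }
        });
    }
}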

