0x0
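I have recently been working on JVM startup acceleration. I wrote an agent with Instrumentation that calls retransformClasses and similar APIs to modify bytecode. That part mainly removes the cost of class lookup; the cost of the JVM parsing, verifying, and linking classes is still there.
So I planned to combine it with JEP 310 AppCDS, because CDS can remove the JVM's parsing cost. (In practice AppCDS cannot cover this scenario and I used an internal EagerCDS, which I cannot go into here; if it is open-sourced later I can write about it. Describing things in terms of AppCDS does not affect the explanation.)
The result was a JVM crash when running AppCDS + a JVMTI agent 🤔. The JVM definitely does support AppCDS + JVMTI.
Combining the two has had some known problems, and there are patches for them. Those patches address the case where JVMTI dynamically modifies the bootclasspath/classpath so that dump time and run time differ. My scenario is different, so the crash is unrelated to those patches.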
0x1
The relevant part of the hs_err file after the crash:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ffff558cfbc, pid=7854, tid=7880
#
# JRE version: OpenJDK Runtime Environment (11.0.7) (slowdebug build 11.0.7-internal+0-adhoc.qingfengyy.ajdk)
# Java VM: OpenJDK 64-Bit Server VM (slowdebug 11.0.7-internal+0-adhoc.qingfengyy.ajdk, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0xb55fbc] PackageEntry::module() const+0xc
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
--------------- S U M M A R Y ------------
Command Line: --add-opens=java.base/jdk.internal.loader=ALL-UNNAMED --add-opens=java.base/jdk.internal.util.jar=ALL-UNNAMED -javaagent:/home/qingfeng.yy/jar_index/jarindexer/jarindexer.jar=use -XX:+UnlockExperimentalVMOptions -XX:+EagerAppCDS -XX:+EagerAppCDSLegacyVerisonSupport -Xshare:on -XX:SharedArchiveFile=my.jsa -Dcom.alibaba.cds.listPath=my.lst Buy2ByURLClassLoader
Host: e69e13043.et15sqa, Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 96 cores, 503G, Alibaba Group Enterprise Linux Server release 7.2 (Paladin)
Time: Wed Aug 19 11:16:24 2020 CST elapsed time: 589 seconds (0d 0h 9m 49s)
--------------- T H R E A D ---------------
Current thread (0x00007ffff001f800): JavaThread "main" [_thread_in_vm, id=7880, stack(0x00007ffff7ee6000,0x00007ffff7fe7000)]
Stack: [0x00007ffff7ee6000,0x00007ffff7fe7000], sp=0x00007ffff7fe2070, free space=1008k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0xb55fbc] PackageEntry::module() const+0xc
V [libjvm.so+0xfa005a] InstanceKlass::module() const+0x32
V [libjvm.so+0x125132b] KlassFactory::check_shared_class_file_load_hook(InstanceKlass*, Symbol*, Handle, Handle, Thread*)+0x2eb
V [libjvm.so+0x165f208] SystemDictionary::load_shared_class(InstanceKlass*, Handle, Handle, Thread*)+0x35a
V [libjvm.so+0x166abbf] SystemDictionaryShared::acquire_class_for_current_thread(InstanceKlass*, Handle, Handle, Thread*)+0xcf
V [libjvm.so+0x166a960] SystemDictionaryShared::define_class_from_cds(InstanceKlass*, Handle, Handle, Thread*)+0x40
V [libjvm.so+0x10f5489] JVM_DefineClassFromCDS+0x2b0
j java.lang.ClassLoader.defineClassFromCDS0(Ljava/lang/ClassLoader;Ljava/security/ProtectionDomain;J)Ljava/lang/Class;+0 java.base@11.0.7-internal
j java.lang.ClassLoader.defineClassFromCDS(Ljava/lang/String;JLjava/security/ProtectionDomain;)Ljava/lang/Class;+17 java.base@11.0.7-internal
j java.security.SecureClassLoader.defineClassFromCDS(Ljava/lang/String;JLjava/security/CodeSource;)Ljava/lang/Class;+9 java.base@11.0.7-internal
j java.net.URLClassLoader.defineClassInternal(Ljava/lang/String;Ljdk/internal/loader/Resource;ZLjava/lang/String;Ljava/lang/String;J)Ljava/lang/Class;+346 java.base@11.0.7-internal
j java.net.URLClassLoader.defineClassFromCDS(Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;J)Ljava/lang/Class;+8 java.base@11.0.7-internal
j java.net.URLClassLoader$1.run()Ljava/lang/Class;+41 java.base@11.0.7-internal
j java.net.URLClassLoader$1.run()Ljava/lang/Object;+1 java.base@11.0.7-internal
v ~StubRoutines::call_stub
V [libjvm.so+0xfd1127] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x689
V [libjvm.so+0x1498618] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*), JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x32
V [libjvm.so+0xfd0a9b] JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x14b
V [libjvm.so+0x10eafbe] JVM_DoPrivileged+0x769
C [libjava.so+0xe8f8] Java_java_security_AccessController_doPrivileged__Ljava_security_PrivilegedExceptionAction_2Ljava_security_AccessControlContext_2+0x46
j java.security.AccessController.doPrivileged(Ljava/security/PrivilegedExceptionAction;Ljava/security/AccessControlContext;)Ljava/lang/Object;+0 java.base@11.0.7-internal
j java.net.URLClassLoader.findClassInternal(Ljava/lang/String;ZLjava/lang/String;J)Ljava/lang/Class;+17 java.base@11.0.7-internal
j java.net.URLClassLoader.findClassFromCDS(Ljava/lang/String;Ljava/lang/String;J)Ljava/lang/Class;+5 java.base@11.0.7-internal
j java.lang.ClassLoader.loadClassFromCDS(Ljava/lang/String;Ljava/lang/String;JI)Ljava/lang/Class;+147 java.base@11.0.7-internal
v ~StubRoutines::call_stub
V [libjvm.so+0xfd1127] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x689
V [libjvm.so+0x1498618] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*), JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x32
V [libjvm.so+0xfd0a9b] JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x14b
V [libjvm.so+0xfcf9d0] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x1a2
V [libjvm.so+0x166a52c] SystemDictionaryShared::load_class_from_cds(Symbol const*, Handle, InstanceKlass*, int, Thread*)+0x204
V [libjvm.so+0x166a716] SystemDictionaryShared::lookup_shared(Symbol*, Handle, bool&, bool, Thread*)+0x164
V [libjvm.so+0x165ff63] SystemDictionary::load_instance_class(Symbol*, Handle, Thread*)+0x631
V [libjvm.so+0x165d202] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, Thread*)+0x92a
V [libjvm.so+0x165b8a8] SystemDictionary::resolve_or_null(Symbol*, Handle, Handle, Thread*)+0x11e
V [libjvm.so+0x165b4ab] SystemDictionary::resolve_or_fail(Symbol*, Handle, Handle, bool, Thread*)+0x35
V [libjvm.so+0x11015d8] find_class_from_class_loader(JNIEnv_*, Symbol*, unsigned char, Handle, Handle, unsigned char, Thread*)+0x45
V [libjvm.so+0x10e704b] JVM_FindClassFromCaller+0x36f
C [libjava.so+0xf555] Java_java_lang_Class_forName0+0x22b
j java.lang.Class.forName0(Ljava/lang/String;ZLjava/lang/ClassLoader;Ljava/lang/Class;)Ljava/lang/Class;+0 java.base@11.0.7-internal
j java.lang.Class.forName(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class;+43 java.base@11.0.7-internal
j Buy2ByURLClassLoader.lambda$main$1(Ljava/net/URLClassLoader;[ILjava/lang/String;)V+12
j Buy2ByURLClassLoader$$Lambda$25.accept(Ljava/lang/Object;)V+12
j java.util.stream.ForEachOps$ForEachOp$OfRef.accept(Ljava/lang/Object;)V+5 java.base@11.0.7-internal
J 1021 c1 java.util.stream.ReferencePipeline$2$1.accept(Ljava/lang/Object;)V java.base@11.0.7-internal (27 bytes) @ 0x00007fffd87f7864 [0x00007fffd87f7480+0x00000000000003e4]
j java.util.Iterator.forEachRemaining(Ljava/util/function/Consumer;)V+21 java.base@11.0.7-internal
j java.util.Spliterators$IteratorSpliterator.forEachRemaining(Ljava/util/function/Consumer;)V+52 java.base@11.0.7-internal
j java.util.stream.AbstractPipeline.copyInto(Ljava/util/stream/Sink;Ljava/util/Spliterator;)V+32 java.base@11.0.7-internal
j java.util.stream.AbstractPipeline.wrapAndCopyInto(Ljava/util/stream/Sink;Ljava/util/Spliterator;)Ljava/util/stream/Sink;+13 java.base@11.0.7-internal
j java.util.stream.ForEachOps$ForEachOp.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Void;+3 java.base@11.0.7-internal
j java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(Ljava/util/stream/PipelineHelper;Ljava/util/Spliterator;)Ljava/lang/Object;+3 java.base@11.0.7-internal
j java.util.stream.AbstractPipeline.evaluate(Ljava/util/stream/TerminalOp;)Ljava/lang/Object;+88 java.base@11.0.7-internal
j java.util.stream.ReferencePipeline.forEach(Ljava/util/function/Consumer;)V+6 java.base@11.0.7-internal
j Buy2ByURLClassLoader.main([Ljava/lang/String;)V+89
v ~StubRoutines::call_stub
V [libjvm.so+0xfd1127] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x689
V [libjvm.so+0x1498618] os::os_exception_wrapper(void (*)(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*), JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x32
V [libjvm.so+0xfd0a9b] JavaCalls::call(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x14b
V [libjvm.so+0x107c15e] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x1f0
V [libjvm.so+0x1093240] jni_CallStaticVoidMethod+0x36a
C [libjli.so+0x4f3e] JavaMain+0xcd7
Start from the topmost frames. The first two do not reveal much, so look at the third:
// called during initial loading of a shared class
InstanceKlass* KlassFactory::check_shared_class_file_load_hook(...) {
#if INCLUDE_CDS && INCLUDE_JVMTI
  assert(ik != NULL, "sanity");
  assert(ik->is_shared(), "expecting a shared class");
  if (JvmtiExport::should_post_class_file_load_hook()) {
    assert(THREAD->is_Java_thread(), "must be JavaThread");
    // Post the CFLH
    JvmtiCachedClassFileData* cached_class_file = NULL;
    JvmtiCachedClassFileData* archived_class_data = ik->get_archived_class_data();
    assert(archived_class_data != NULL, "shared class has no archived class data");
    unsigned char* ptr =
        VM_RedefineClasses::get_cached_class_file_bytes(archived_class_data);
    unsigned char* end_ptr =
        ptr + VM_RedefineClasses::get_cached_class_file_len(archived_class_data);
    unsigned char* old_ptr = ptr;
    JvmtiExport::post_class_file_load_hook(class_name,
                                           class_loader,
                                           protection_domain,
                                           &ptr,
                                           &end_ptr,
                                           &cached_class_file);
    if (old_ptr != ptr) {
      // JVMTI agent has modified class file data.
      // Set new class file stream using JVMTI agent modified class file data.
      ClassLoaderData* loader_data =
          ClassLoaderData::class_loader_data(class_loader());
      int path_index = ik->shared_classpath_index();
      const char* pathname;
      if (path_index < 0) {
        ModuleEntry* mod_entry = ik->module();
        if (mod_entry != NULL && (mod_entry->location() != NULL)) {
          ResourceMark rm;
          pathname = (const char*)(mod_entry->location()->as_C_string());
        } else {
          pathname = "";
        }
      }
      ...
    }
  }
#endif
  return NULL;
}
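Here ik is org/apache/xerces/jaxp/JAXPConstants, which is puzzling. The crash happens in ik->module(), and reaching that call requires old_ptr != ptr. Looking back at where these two pointers come from: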
unsigned char* ptr =
    VM_RedefineClasses::get_cached_class_file_bytes(archived_class_data);
unsigned char* end_ptr =
    ptr + VM_RedefineClasses::get_cached_class_file_len(archived_class_data);
unsigned char* old_ptr = ptr;
JvmtiExport::post_class_file_load_hook(class_name,
                                       class_loader,
                                       protection_domain,
                                       &ptr,
                                       &end_ptr,
                                       &cached_class_file);
if (old_ptr != ptr){...}
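Before the call to JvmtiExport::post_class_file_load_hook, old_ptr == ptr necessarily holds, so the suspect is JvmtiExport::post_class_file_load_hook itself. The class data is handed to it through ptr and end_ptr, both passed by address, so it must be the hook that changed ptr and made old_ptr differ from ptr.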
0x2
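To confirm this before digging further, and to avoid wasted effort, first check whether this really is the cause. [1]
Look at the caller of KlassFactory::check_shared_class_file_load_hook, namely SystemDictionary::load_shared_class: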
InstanceKlass* SystemDictionary::load_shared_class(InstanceKlass* ik,
                                                   Handle class_loader,
                                                   Handle protection_domain, TRAPS) {
  ...
  InstanceKlass* new_ik = KlassFactory::check_shared_class_file_load_hook(
      ik, class_name, class_loader, protection_domain, CHECK_NULL);
  if (new_ik != NULL) {
    // The class is changed by CFLH. Return the new class. The shared class is
    // not used.
    return new_ik;
  }
  ...
  return ik;
}
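CFLH stands for Class File Load Hook. This code says that if an agent has modified a class's bytecode, the ik from the CDS archive is not used and the modified class is used instead. But the agent never modifies org/apache/xerces/jaxp/JAXPConstants, so I commented this logic out and ran AppCDS + agent again: no problems at all. That confirms the culprit is JvmtiExport::post_class_file_load_hook, so back to [1] and continue from there.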
0x3
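JvmtiExport::post_class_file_load_hook goes through several layers of calls and ends up in JvmtiClassFileLoadHookPoster::post_to_env. The earlier ptr and end_ptr now correspond to _data_ptr and _end_ptr, and _curr_data holds the buffer that _data_ptr points to: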
class JvmtiClassFileLoadHookPoster : public StackObj {
  ...
 public:
  inline JvmtiClassFileLoadHookPoster(Symbol* h_name, Handle class_loader,
                                      Handle h_protection_domain,
                                      unsigned char **data_ptr, unsigned char **end_ptr,
                                      JvmtiCachedClassFileData **cache_ptr) {
    _h_name = h_name;
    _class_loader = class_loader;
    _h_protection_domain = h_protection_domain;
    _data_ptr = data_ptr;
    _end_ptr = end_ptr;
    _thread = JavaThread::current();
    _curr_len = *end_ptr - *data_ptr;
    _curr_data = *data_ptr;
    _curr_env = NULL;
    _cached_class_file_ptr = cache_ptr;
    ...
  }
  ...
  void post_to_env(JvmtiEnv* env, bool caching_needed) {
    ...
    unsigned char *new_data = NULL;
    if (callback != NULL) {
      (*callback)(env->jvmti_external(), jem.jni_env(),
                  jem.class_being_redefined(),
                  jem.jloader(), jem.class_name(),
                  jem.protection_domain(),
                  _curr_len, _curr_data,
                  &new_len, &new_data);
    }
    if (new_data != NULL) {
      ...
      _curr_data = new_data;
      _curr_len = new_len;
      // Save the current agent env we need this to deallocate the
      // memory allocated by this agent.
      _curr_env = env;
    }
  }
  ...
};
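Here new_data starts out as NULL. After the callback, if new_data is no longer NULL, _curr_data is replaced, which in turn changes the buffer behind _data_ptr, that is, ptr.
So the real question is why this callback leaves new_data non-NULL.
The callback crosses from libjvm.so into libinstrument.so and ends up in transformClassFile: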
void
transformClassFile( JPLISAgent * agent,
                    JNIEnv * jnienv,
                    jobject loaderObject,
                    const char* name,
                    jclass classBeingRedefined,
                    jobject protectionDomain,
                    jint class_data_len,
                    const unsigned char* class_data,
                    jint* new_class_data_len,
                    unsigned char** new_class_data,
                    jboolean is_retransformer) {
    jboolean errorOutstanding = JNI_FALSE;
    jstring classNameStringObject = NULL;
    jarray classFileBufferObject = NULL;
    jarray transformedBufferObject = NULL;
    jsize transformedBufferSize = 0;
    unsigned char * resultBuffer = NULL;
    jboolean shouldRun = JNI_FALSE;
    ...
        /* Finally, unmarshall the parameters (if someone touched the buffer, tell the JVM) */
        if ( !errorOutstanding ) {
            if ( transformedBufferObject != NULL ) {
                transformedBufferSize = (*jnienv)->GetArrayLength( jnienv,
                                                                   transformedBufferObject);
                errorOutstanding = checkForAndClearThrowable(jnienv);
                jplis_assert_msg(!errorOutstanding, "can't get array length");
                if ( !errorOutstanding ) {
                    /* allocate the response buffer with the JVMTI allocate call.
                     * This is what the JVMTI spec says to do for Class File Load hook responses
                     */
                    jvmtiError allocError = (*(jvmti(agent)))->Allocate(jvmti(agent),
                                                                         transformedBufferSize,
                                                                         &resultBuffer);
                    errorOutstanding = (allocError != JVMTI_ERROR_NONE);
                    jplis_assert_msg(!errorOutstanding, "can't allocate result buffer");
                }
                if ( !errorOutstanding ) {
                    (*jnienv)->GetByteArrayRegion( jnienv,
                                                   transformedBufferObject,
                                                   0,
                                                   transformedBufferSize,
                                                   (jbyte *) resultBuffer);
                    errorOutstanding = checkForAndClearThrowable(jnienv);
                    jplis_assert_msg(!errorOutstanding, "can't get byte array region");
                    /* in this case, we will not return the buffer to the JVMTI,
                     * so we need to deallocate it ourselves
                     */
                    if ( errorOutstanding ) {
                        deallocate( jvmti(agent),
                                    (void*)resultBuffer);
                    }
                }
                if ( !errorOutstanding ) {
                    *new_class_data_len = (transformedBufferSize);
                    *new_class_data = resultBuffer;
                }
            }
        }
        ...
    }
    return;
}
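This is the transform path, and the cause is now easy to guess: the agent's transform never returns null. Even when it has not modified the class, it returns the original byte[]: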
@Override
public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                        ProtectionDomain protectionDomain, byte[] classfileBuffer) {
    if (classBeingRedefined != null && classBeingRedefined == URLClassLoader.class) {
        try {
            ClassPool cp = new ClassPool();
            cp.appendSystemPath();
            CtClass ctClass = cp.get(URLClassLoader.class.getName());
            dumpCtor1(cp, ctClass);
            dumpCtor2(cp, ctClass);
            dumpCtor3(cp, ctClass);
            dumpCtor4(cp, ctClass);
            dumpCtor5(cp, ctClass);
            dumpCtor6(cp, ctClass);
            dumpCtor7(cp, ctClass);
            byte[] classData = ctClass.toBytecode();
            ctClass.detach();
            return classData;
        } catch (Exception e) {
            try {
                Files.writeString(Paths.get(ERROR_LOG_FILE), e.toString());
            } catch (java.io.IOException e1) {
                e.printStackTrace();
            }
        }
    }
    return classfileBuffer;
}
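So new_data starts out as NULL and then becomes a freshly allocated buffer (transformClassFile copies any non-null byte[] into memory obtained from the JVMTI Allocate call), which is what triggers the problem. After changing the Java transform to return null wherever it does not modify the bytecode, the crash went away.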
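The chain above can be mimicked in a few lines of plain Java. This is only a toy model, not HotSpot or libinstrument code: the class name `HookModel` and the method `postHook` are made up for illustration, and `Arrays.copyOf` stands in for the Allocate + GetByteArrayRegion sequence in transformClassFile. It shows why a transformer that merely echoes its input is still seen as "modified" by an identity check like old_ptr != ptr.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

public class HookModel {
    // Hypothetical stand-in for the native side of the hook: any non-null
    // result is copied into a freshly allocated buffer, so the caller ends
    // up holding a different array even if the contents are identical.
    public static byte[] postHook(byte[] archived, UnaryOperator<byte[]> transformer) {
        byte[] result = transformer.apply(archived);
        if (result == null) {
            return archived; // models new_data staying NULL: ptr is unchanged
        }
        return Arrays.copyOf(result, result.length); // models the new buffer
    }

    public static void main(String[] args) {
        byte[] archived = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

        byte[] afterEcho = postHook(archived, buf -> buf);  // returns its input
        byte[] afterNull = postHook(archived, buf -> null); // signals "no change"

        // Reference comparison plays the role of the old_ptr != ptr check.
        System.out.println("echo treated as modified: " + (afterEcho != archived));
        System.out.println("null treated as modified: " + (afterNull != archived));
    }
}
```

Running `main` reports the echoing transformer as modified and the null-returning one as unmodified, which matches the behavior traced through the native code.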
0x4
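Now look at the documentation again:
Embarrassingly, the official documentation does say that if a transform implementation has not modified the class bytes, it should return null.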
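A minimal sketch of a transformer that follows this rule is shown here. It is not the agent from this post: the class name `SelectiveTransformer`, the `TARGET` constant, and the `rewrite` helper are hypothetical, and matching on the internal class name (rather than on `classBeingRedefined`, as the original agent does) is an assumption made to keep the example self-contained. The key point is that every path that leaves the class untouched returns null.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

public class SelectiveTransformer implements ClassFileTransformer {
    // Internal (slash-separated) name of the only class this sketch touches.
    private static final String TARGET = "java/net/URLClassLoader";

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (!TARGET.equals(className)) {
            return null; // class not modified: tell the JVM to keep its own bytes
        }
        try {
            return rewrite(classfileBuffer);
        } catch (RuntimeException e) {
            return null; // rewriting failed: also report "no transformation"
        }
    }

    // Hypothetical placeholder for the real bytecode rewriting (for example
    // with Javassist). Here it only returns a copy of the input.
    static byte[] rewrite(byte[] original) {
        return original.clone();
    }
}
```

An agent would register it with `Instrumentation.addTransformer` in its `premain`, exactly as for any other `ClassFileTransformer`.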
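Furthermore, if a transform implementation does modify the class, it should create a new byte[], copy the contents of classfileBuffer into it, and then modify the new byte[] rather than editing classfileBuffer in place. Many tutorials and examples online overlook this.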
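The copy-then-modify pattern looks roughly like the sketch below. The class `CopyBeforePatch` and its `patch` method are hypothetical, and overwriting a single byte is only a stand-in for real bytecode rewriting, which would normally go through a library such as Javassist or ASM.

```java
import java.util.Arrays;

public class CopyBeforePatch {
    // Hypothetical edit: overwrite one byte at the given offset, but only in
    // a private copy, so the buffer handed in by the JVM is never changed.
    public static byte[] patch(byte[] classfileBuffer, int offset, byte value) {
        byte[] copy = Arrays.copyOf(classfileBuffer, classfileBuffer.length);
        copy[offset] = value;
        return copy;
    }

    public static void main(String[] args) {
        // Class file header: magic, minor version 0, major version 55.
        byte[] input = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 55};
        byte[] output = patch(input, 7, (byte) 52);

        System.out.println("input untouched: " + (input[7] == 55));
        System.out.println("output patched:  " + (output[7] == 52));
    }
}
```

Returning the copy also means the transformer hands back an array it owns, so later transformers in the chain and the JVM itself never observe a half-edited input buffer.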
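Is that the end of it? For the work at hand, yes. But for the problem itself there is more to the story. Even when transform returns the original classfileBuffer instead of the recommended null, the expected behavior is for the JVM to use classfileBuffer as the class's new bytecode in place of the CDS archive data, not to crash. The JVM implementation itself has a problem here.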