libprotobuf FATAL [xxx.pb.cc:59] CHECK failed: file != NULL


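Background

The exact error message:

[libprotobuf FATAL bc_out/baidu/aicd/bvs-algo/common/proto/processor.pb.cc:59] CHECK failed: file != NULL:

Symptom:

The program fails at runtime. The application code itself was not changed; only the SDK shared library files were replaced with new ones.

In the .cc file generated by the protocol compiler, common/proto/processor.pb.cc, the failing code is: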

  const ::google::protobuf::FileDescriptor* file =
      ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName(
          "xxxx.proto");
  GOOGLE_CHECK(file != NULL);

 

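How the program is linked:

The process loads libtest1.so with dlopen at startup;

libtest1.so depends on libcommon.so; the missing xxxx.proto is compiled into this common library, which also statically links libprotobuf;

libtest1 then loads another shared library, libtest222.so, also with dlopen, using the following call: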

handle = dlopen(sopath.c_str(), RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND);

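Analysis result

The error occurs because processor.proto cannot be found in the global ::google::protobuf::DescriptorPool::generated_pool(), even though processor.proto lives in the common module and its code was not modified.

Judging from the error, the content registered in the global generated_pool was probably either overwritten by another copy or wiped by a re-initialization.

Some reading turned up the following:

Protocol Buffers holds a global registry based on the .proto filename. When 2 pieces of software try to add the same PB message to this registry, you get a name conflict.

Within a single process, libprotobuf maintains one global descriptor database.

C++ global objects are initialized before main() is entered (or, for a library loaded with dlopen, while it is being loaded), so the metadata from the .proto files is registered into that database as soon as the code containing it starts up.

Inspecting the dependencies of the shared libraries showed that one newly introduced so depends on libprotobuf.so. The likely cause is therefore that the global descriptor database was initialized more than once, similar to the following case: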

  1. InitGeneratedPoolOnce() is called before generated_pool_init_ is dynamically initialized. It will call InitGeneratedPool() and create EncodedDescriptorDatabase/DescriptorPool for the first time.
  2. Dynamic initialization of generated_pool_init_ happens and it simply zeros the variable's content.
  3. InitGeneratedPoolOnce() is called again and it calls InitGeneratedPool() again which creates new EncodedDescriptorDatabase/DescriptorPool for the second time.
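
The effect of ending up with a second pool can be pictured with a toy model. The sketch below is not protobuf code: ToyPool, StaticCopyPool, SharedCopyPool, Registrar and LookupSucceeds are hypothetical names, and it simply assumes that the process contains two independent pools. A file registered during static initialization in one pool is invisible to a lookup that lands in the other one, which is the same shape as FindFileByName() returning NULL.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a descriptor pool: maps a .proto file name to a dummy entry.
class ToyPool {
 public:
  void Add(const std::string& name) { files_[name] = 1; }

  // Returns nullptr when the file was never registered in *this* pool.
  const int* FindFileByName(const std::string& name) const {
    auto it = files_.find(name);
    return it == files_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::string, int> files_;
};

// Assumption for illustration: each copy of the library owns its own pool.
ToyPool& StaticCopyPool() { static ToyPool pool; return pool; }
ToyPool& SharedCopyPool() { static ToyPool pool; return pool; }

// Mimics generated code: a global object whose constructor registers a file
// during static initialization, before main() runs.
struct Registrar {
  Registrar(ToyPool& pool, const char* name) { pool.Add(name); }
};
static Registrar register_processor(StaticCopyPool(), "processor.proto");

// True when "processor.proto" is visible in the given pool.
bool LookupSucceeds(const ToyPool& pool) {
  return pool.FindFileByName("processor.proto") != nullptr;
}
```

LookupSucceeds(StaticCopyPool()) is true, while LookupSucceeds(SharedCopyPool()) is false: nothing was ever registered in the second pool.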

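Fix:

Removing the new so's dependency on the libprotobuf shared library and restarting the program made the error above disappear.

Use patchelf to remove the dependency on the libprotobuf shared library: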

$ patchelf --print-needed libtest222.so
libpthread.so.0
libdl.so.2
libprotobuf.so.11
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
$ patchelf --remove-needed libprotobuf.so.11 libtest222.so
$ patchelf --print-needed libtest222.so
libpthread.so.0
libdl.so.2
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
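
patchelf --print-needed shows what a library declares on disk. As a complementary runtime check, the sketch below lists the shared objects actually mapped into the current process whose path contains a given substring, using glibc's dl_iterate_phdr. The names Search, Collect and LoadedObjectsMatching are made up for this illustration, and the code assumes Linux with glibc. Note that a statically linked copy of libprotobuf has no file of its own and will not appear in this list.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc / Linux)

#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// State passed through dl_iterate_phdr to the callback.
struct Search {
  const char* needle;
  std::vector<std::string> hits;
};

// Called once per loaded object; records paths that contain the needle.
static int Collect(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
  Search* search = static_cast<Search*>(data);
  if (info->dlpi_name != nullptr &&
      std::strstr(info->dlpi_name, search->needle) != nullptr) {
    search->hits.emplace_back(info->dlpi_name);
  }
  return 0;  // returning 0 keeps the iteration going
}

// Returns the paths of all currently loaded objects whose name contains needle.
std::vector<std::string> LoadedObjectsMatching(const char* needle) {
  Search search{needle, {}};
  dl_iterate_phdr(Collect, &search);
  return search.hits;
}
```

Calling LoadedObjectsMatching("libprotobuf") from init() before and after the dlopen of libtest222.so would show whether libprotobuf.so.11 was pulled into the process.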

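Analysis process

1. Wrap the failing code in a simple function, testproto(), and call it early, from init(). The goal is to check whether the call also fails from a different place and to rule out effects of the rest of the program's execution. The algorithm shared libraries are also loaded in init().

2. With that change, calling testproto() before the algorithm so is loaded does not trigger CHECK failed: file != NULL; however, a second call to testproto() after the algorithm so is loaded crashes while the log message is being printed, with the stack inside protobuf access code.

      If the first call to testproto() happens after the algorithm so is loaded, it fails with CHECK failed: file != NULL.

3. init() loads four third-party shared libraries. Removing them one at a time showed that only one of them causes the crash.

4. Focusing on that so: unlike the other algorithm libraries, it depends on the libprotobuf.so.11 shared library. This suggested a link to global object initialization.

     Some reading shows that libprotobuf maintains a global descriptor database and that C++ global objects are initialized before main() is entered.

      Since the program already has libprotobuf statically linked in, additionally loading a so that links against the libprotobuf shared library very likely causes a repeated initialization.

To verify this, the so's dependencies were edited by hand to drop the libprotobuf shared library. Removing the libprotobuf.so.11 dependency avoids the error.

5. Print the address returned by generated_pool() to verify: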

Run 1, without loading the so in question:

test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x366cf20
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x366cf20

Run 2, loading the so in question:

test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x283d310, xxxx.proto file = 0x7f6c8c00edc0

init libtest222.so   // load the shared library
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x283d310, xxxx.proto file = 0x7f6c8c00edc0

test--- test-proto
test--- in log process
signal 11 received              // If ::google::protobuf::DescriptorPool::generated_pool() is called before the new so is loaded, the xxxx.proto file it returns is non-null; but calling testproto() again after the so is loaded still crashes
deregister service for signal 11
stop node successfully!
Segmentation fault (core dumped)

Run 3, loading the new so without calling generated_pool() beforehand:

test--- init-testproto

init libtest222.so      // load the shared library

test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x7f7d600545f0, xxxx.proto file = (nil)
test--- test-proto
test--- in log process
[libprotobuf FATAL bc_out/baidu/aicd/bvs-algo/common/proto/processor.pb.cc:59] CHECK failed: file != NULL:     // If the first call to ::google::protobuf::DescriptorPool::generated_pool() happens after the so is loaded, it fails with CHECK failed: file != NULL
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: file != NULL:
signal 11 received
deregister service for signal 11
Aborted (core dumped)

Run 4, with the so's dependency on the libprotobuf shared library removed, then loading the so:

test--- init-testproto
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x39ce350

init libtest222.so   // load the shared library
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x39ce350

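Run 5, keeping the so's dependency on the libprotobuf shared library and loading the tracking so, but with RTLD_GLOBAL removed from the dlopen call:

handle = dlopen(sopath.c_str(), RTLD_NOW | RTLD_GLOBAL | RTLD_DEEPBIND);   // the flags currently used to load the so; RTLD_GLOBAL was dropped for this run

init libtest222.so
test--- init-testproto 222
test--- ::google::protobuf::DescriptorPool::generated_pool() = 0x3b12260, xxxx.proto file = 0x7fb79d5cf200   // with RTLD_GLOBAL removed, the FileDescriptor for xxxx.proto is found even after the so is loaded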
test--- test-proto
test--- in log process
test--- test-proto

RTLD_GLOBAL causes symbols from shared libraries to be made public and available for relocation. This is needed when you import several separate libraries via dlopen(), that use each other's symbols.

Relevant protobuf source

1. Relevant definitions in protobuf/include/google/protobuf/descriptor.h:

namespace google {
namespace protobuf {

// Used to construct descriptors.
//
// Normally you won't want to build your own descriptors. Message classes
// constructed by the protocol compiler will provide them for you. However,
// if you are implementing Message on your own, or if you are writing a
// program which can operate on totally arbitrary types and needs to load
// them from some sort of database, you might need to.
//
// Since Descriptors are composed of a whole lot of cross-linked bits of
// data that would be a pain to put together manually, the
// DescriptorPool class is provided to make the process easier. It can
// take a FileDescriptorProto (defined in descriptor.proto), validate it,
// and convert it to a set of nicely cross-linked Descriptors.
//
// DescriptorPool also helps with memory management. Descriptors are
// composed of many objects containing static data and pointers to each
// other. In all likelihood, when it comes time to delete this data,
// you'll want to delete it all at once. In fact, it is not uncommon to
// have a whole pool of descriptors all cross-linked with each other which
// you wish to delete all at once. This class represents such a pool, and
// handles the memory management for you.
//
// You can also search for descriptors within a DescriptorPool by name, and
// extensions by number.
class LIBPROTOBUF_EXPORT DescriptorPool {

// Get a pointer to the generated pool. Generated protocol message classes
// which are compiled into the binary will allocate their descriptors in
// this pool. Do not add your own descriptors to this pool.
static const DescriptorPool* generated_pool();

...

};

2. descriptor.cc

https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.cc

// generated_pool ====================================================

namespace {

EncodedDescriptorDatabase* GeneratedDatabase() {
  static auto generated_database =
  internal::OnShutdownDelete(new EncodedDescriptorDatabase());
  return generated_database;
}

DescriptorPool* NewGeneratedPool() {
  auto generated_pool = new DescriptorPool(GeneratedDatabase());
  generated_pool->InternalSetLazilyBuildDependencies();
  return generated_pool;
}

} // anonymous namespace

DescriptorDatabase* DescriptorPool::internal_generated_database() {
  return GeneratedDatabase();
}

DescriptorPool* DescriptorPool::internal_generated_pool() {
  static DescriptorPool* generated_pool =
  internal::OnShutdownDelete(NewGeneratedPool());
  return generated_pool;
}

const DescriptorPool* DescriptorPool::generated_pool() {
  const DescriptorPool* pool = internal_generated_pool();
  // Ensure that descriptor.proto has been registered in the generated pool.
  DescriptorProto::descriptor();
  return pool;
}
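
The function-local static in internal_generated_pool() above is the "construct on first use" pattern: the pool is built the first time the function runs and the same pointer is returned afterwards. The sketch below reproduces only that pattern with hypothetical names (ToyPool, NewToyPool, ToyGeneratedPool, g_constructions); it is an illustration, not protobuf's implementation. It shows why, within a single copy of the code, repeated calls print the same address, as in runs 1 and 4.

```cpp
#include <cassert>

// Hypothetical stand-in for DescriptorPool.
struct ToyPool {
  int files = 0;
};

// Counts how many times a pool has been constructed.
int g_constructions = 0;

ToyPool* NewToyPool() {
  ++g_constructions;
  return new ToyPool();  // intentionally never deleted in this sketch
}

// Same shape as internal_generated_pool(): the static is initialized on the
// first call only, and every later call returns the same pointer.
ToyPool* ToyGeneratedPool() {
  static ToyPool* pool = NewToyPool();
  return pool;
}
```

Each copy of this function carries its own static variable, so a second copy of the library in the same process would build and return a different pool.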

 


 

