Part 9: Definition of the Bytecode Instructions

Reposted article, 2021-08-24 10:13

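Earlier articles covered how Java stack frames are created under interpreted execution and how bytecode dispatch works, but never explained how the virtual machine actually executes the bytecode in a Java method. Before turning to bytecode execution, we need to know how the bytecode instructions are defined. The Bytecodes::initialize() function defines a set of attributes for each bytecode instruction. Its call chain is: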

init_globals()
bytecodes_init() 
Bytecodes::initialize()

Bytecodes::initialize() contains definitions like these:

//  bytecode               bytecode name           format   wide f.   result tp  stk traps
def(_nop                 , "nop"                 , "b"    , NULL    , T_VOID   ,  0, false);
def(_aconst_null         , "aconst_null"         , "b"    , NULL    , T_OBJECT ,  1, false);
def(_iconst_m1           , "iconst_m1"           , "b"    , NULL    , T_INT    ,  1, false);
def(_iconst_0            , "iconst_0"            , "b"    , NULL    , T_INT    ,  1, false);
def(_iconst_1            , "iconst_1"            , "b"    , NULL    , T_INT    ,  1, false);
// ...

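Each of the 202 bytecode instructions currently defined by the Java Virtual Machine Specification is defined by a def() call like those in the listing above. The arguments to focus on are bytecode name, format and the rest. They are explained one by one below: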

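bytecode name is the name of the bytecode;
wide says whether the bytecode may be prefixed with wide; if it may, the value is a wide format string such as "wbii";
result tp is the result type after the instruction executes. T_ILLEGAL means the result type cannot be determined from the bytecode alone. For a method call instruction such as _invokevirtual, the result type should be the return type of the method, but that cannot be determined from the invoking bytecode by itself;
stk is the effect on the depth of the expression stack. _nop does nothing, so it leaves the depth unchanged and its stk is 0. _iconst_0 pushes 0 onto the stack, so the depth grows by 1 and its stk is 1. _lconst_0 increases the depth by 2, and _lstore_0 decreases it by 2;
traps is can_trap. It is important and is covered in detail later;
format expresses two things: the layout of the bytecode and its length.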
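To make the stk column concrete, here is a minimal sketch that adds up the depth effects of a short instruction sequence. The table holds only the four values quoted above; DepthEntry, depth_of and net_stack_effect are hypothetical names for illustration and are not part of HotSpot.

```cpp
#include <cassert>
#include <cstring>
#include <initializer_list>

// Hypothetical miniature of the def() table: only the name and the
// "stk" column, using the values quoted in the text above.
struct DepthEntry {
  const char* name;
  int depth;  // change in expression stack depth, in slots
};

static const DepthEntry kDepthTable[] = {
  {"nop",      0},
  {"iconst_0", 1},
  {"lconst_0", 2},   // a long occupies two slots
  {"lstore_0", -2},
};

// Look up the depth effect of one instruction by name.
int depth_of(const char* name) {
  for (const DepthEntry& e : kDepthTable) {
    if (std::strcmp(e.name, name) == 0) return e.depth;
  }
  assert(false && "instruction not in the illustrative table");
  return 0;
}

// Net effect of executing a straight-line sequence of instructions.
int net_stack_effect(std::initializer_list<const char*> seq) {
  int depth = 0;
  for (const char* name : seq) depth += depth_of(name);
  return depth;
}
```

For instance, lconst_0 followed by lstore_0 pushes two slots and pops two, so the net effect of that pair is zero.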

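Now let us look more closely at format. Its length gives the length of the instruction: a one-character string means a one-byte instruction, a two-character string a two-byte instruction, and so on. For example, _iconst_0 is one byte wide, while _istore has the format "bi" and is therefore two bytes wide. format can also be the empty string, which marks a variable-length instruction such as tableswitch or lookupswitch (compute_flags() below handles this case). Bytecodes that are not defined in the Java Virtual Machine Specification, such as _fast_agetfield and _fast_bgetfield, which exist to speed up interpretation, are defined inside the VM by passing def() the extra java_code argument. format also expresses the layout of the instruction. The characters in the string have the following meanings: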

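b: the instruction is not of variable length, so the format strings of variable-length instructions such as tableswitch and lookupswitch contain no b;

c: the operand is a signed constant. For example, bipush sign-extends a byte to an int and pushes that value onto the operand stack;

i: the operand is an unsigned index into the local variable table. For example, iload loads an int from the local variable table onto the operand stack;

j: the operand is a constant pool cache index. Note that a constant pool cache index is not the same as a constant pool index. Constant pool indexes are covered in detail in the fundamentals volume of 《深入剖析Java虛擬機：源碼剖析與實例詳解》 and are not repeated here;

k: the operand is an unsigned constant pool index. For example, ldc fetches a value from the run-time constant pool and pushes it onto the operand stack, so its format is "bk";

o: the operand is a branch offset. For example, ifeq compares an int with zero; if the int is 0 the comparison is true and the operand is used as the branch offset for the jump, so its format is "boo";

_: can be ignored;

w: marks a bytecode whose local variable index can be widened. These include iload, fload and others, which is why their wide value is "wbii".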
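The way a format string doubles as a length and a layout description can be sketched as follows. The two helpers, format_length and operand_bytes, are hypothetical names used only for this illustration; they assume a non-empty format string and do not cover the variable-length case.

```cpp
#include <cassert>
#include <cstring>

// Instruction length in bytes, read directly off the format string.
// Assumption: the format is non-NULL; the empty string (variable-length
// instructions) is outside the scope of this sketch.
int format_length(const char* format) {
  assert(format != nullptr);
  return static_cast<int>(std::strlen(format));
}

// Number of bytes the format assigns to one operand kind,
// for example 'o' in "boo" or 'i' in "wbii".
int operand_bytes(const char* format, char kind) {
  assert(format != nullptr);
  int n = 0;
  for (const char* p = format; *p != '\0'; ++p) {
    if (*p == kind) ++n;
  }
  return n;
}
```

With these, "bi" is a two-byte instruction with a one-byte local index, "boo" is a three-byte instruction with a two-byte branch offset, and the wide form "wbii" is four bytes with a two-byte local index.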

The def() function being called is implemented as follows:

void Bytecodes::def(
  Code          code,
  const char*   name,
  const char*   format,
  const char*   wide_format,
  BasicType     result_type,
  int           depth,
  bool          can_trap,
  Code          java_code
) {
  int len  = (format      != NULL ? (int) strlen(format)      : 0);
  int wlen = (wide_format != NULL ? (int) strlen(wide_format) : 0);

  _name          [code] = name;
  _result_type   [code] = result_type;
  _depth         [code] = depth;
  _lengths       [code] = (wlen << 4) | (len & 0xF); // 0xF is 1111 in binary
  _java_code     [code] = java_code;

  int bc_flags = 0;
  if (can_trap){
    // Instructions such as ldc, ldc_w, ldc2_w, _aload_0, iaload, iastore,
    // idiv, ldiv and ireturn all carry _bc_can_trap
    bc_flags |= _bc_can_trap; 
  }
  if (java_code != code){
    // set when java_code differs from code, as for VM-internal
    // instructions that stand in for a standard bytecode
    bc_flags |= _bc_can_rewrite;
  }

  // _flags is assigned here: one entry for format, one for wide_format
  _flags[(u1)code+0*(1<<BitsPerByte)] = compute_flags(format,      bc_flags);
  _flags[(u1)code+1*(1<<BitsPerByte)] = compute_flags(wide_format, bc_flags);
}
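The _lengths entry packs two lengths into a single byte: the wide length in the high 4 bits and the normal length in the low 4 bits. A minimal sketch of packing and unpacking, assuming each length fits in 4 bits; the helper names pack_lengths, length_of and wide_length_of are hypothetical and chosen only for this illustration:

```cpp
#include <cassert>
#include <cstring>

typedef unsigned char u_char;

// Pack the two format lengths the same way def() does:
// high nibble = wide length, low nibble = normal length.
u_char pack_lengths(const char* format, const char* wide_format) {
  int len  = (format      != nullptr ? (int) std::strlen(format)      : 0);
  int wlen = (wide_format != nullptr ? (int) std::strlen(wide_format) : 0);
  assert(len <= 0xF && wlen <= 0xF);  // each length must fit in 4 bits
  return (u_char) ((wlen << 4) | (len & 0xF));
}

// Recover the normal length from the low nibble.
int length_of(u_char packed) {
  return packed & 0xF;
}

// Recover the wide length from the high nibble.
int wide_length_of(u_char packed) {
  return packed >> 4;
}
```

For a bytecode with format "bi" and wide format "wbii", the packed byte is 0x42: a wide length of 4 and a normal length of 2. A bytecode with no wide form, such as one defined with "b" and NULL, packs to 0x01.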

_name, _result_type and the others are static arrays defined in the Bytecodes class. They are indexed by opcode value and hold the name, result_type and so on. They are defined as follows:

const char*     Bytecodes::_name          [Bytecodes::number_of_codes];
BasicType       Bytecodes::_result_type   [Bytecodes::number_of_codes];
s_char          Bytecodes::_depth         [Bytecodes::number_of_codes];
u_char          Bytecodes::_lengths       [Bytecodes::number_of_codes];
Bytecodes::Code Bytecodes::_java_code     [Bytecodes::number_of_codes];
u_short         Bytecodes::_flags         [(1<<BitsPerByte)*2];

Bytecodes::number_of_codes is 234, which is enough to hold every bytecode instruction, including the VM's internal extensions.
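Unlike the other arrays, _flags has (1<<BitsPerByte)*2 = 512 entries, because def() stores one entry for the normal format at index code and one for the wide format at index code + 256. The sketch below mirrors that indexing with a stand-alone table; flags_table, flags_index, store_flags and load_flags are hypothetical names, and opcode 21 (iload) is used only as a sample value.

```cpp
#include <cassert>
#include <cstdint>

const int BitsPerByte = 8;

// Stand-in for Bytecodes::_flags: 256 normal entries followed by 256 wide entries.
static std::uint16_t flags_table[(1 << BitsPerByte) * 2];

// Index of the flags entry for an opcode, in its normal or wide form.
int flags_index(int code, bool is_wide) {
  assert(code >= 0 && code < (1 << BitsPerByte));
  return (std::uint8_t) code + (is_wide ? 1 : 0) * (1 << BitsPerByte);
}

// Store both entries for one opcode, as def() does.
void store_flags(int code, std::uint16_t normal, std::uint16_t wide) {
  flags_table[flags_index(code, false)] = normal;
  flags_table[flags_index(code, true)]  = wide;
}

// Read one entry back.
std::uint16_t load_flags(int code, bool is_wide) {
  return flags_table[flags_index(code, is_wide)];
}
```

With opcode 21, the normal entry lives at index 21 and the wide entry at index 277.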

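Back in Bytecodes::def(), compute_flags() is called on format and on wide_format to compute a set of attributes for the bytecode. The result for format is stored in the lower 256 entries of _flags (at index code) and the result for wide_format in the upper 256 entries (at index code + 256). compute_flags() is implemented as follows: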

int Bytecodes::compute_flags(const char* format, int more_flags) {
  if (format == NULL) {
    return 0;  // not even more_flags
  }

  int flags = more_flags;
  const char* fp = format;
  switch (*fp) {
  case '\0':
    flags |= _fmt_not_simple; // but variable
    break;
  case 'b':
    flags |= _fmt_not_variable;  // but simple
    ++fp;  // skip 'b'
    break;
  case 'w':
    flags |= _fmt_not_variable | _fmt_not_simple;
    ++fp;  // skip 'w'
    guarantee(*fp == 'b', "wide format must start with 'wb'");
    ++fp;  // skip 'b'
    break;
  }

  int has_nbo = 0, has_jbo = 0, has_size = 0;
  for (;;) {
    int this_flag = 0;
    char fc = *fp++;
    switch (fc) {
    case '\0':  // end of string
      assert(flags == (jchar)flags, "change _format_flags");
      return flags;

    case '_': continue;         // ignore these

    case 'j': this_flag = _fmt_has_j; has_jbo = 1; break;
    case 'k': this_flag = _fmt_has_k; has_jbo = 1; break;
    case 'i': this_flag = _fmt_has_i; has_jbo = 1; break;
    case 'c': this_flag = _fmt_has_c; has_jbo = 1; break;
    case 'o': this_flag = _fmt_has_o; has_jbo = 1; break;

    case 'J': this_flag = _fmt_has_j; has_nbo = 1; break;
    ...
    default:  guarantee(false, "bad char in format");
    } // end of switch

    flags |= this_flag;

    guarantee(!(has_jbo && has_nbo), "mixed byte orders in format");
    if (has_nbo){
      flags |= _fmt_has_nbo;
    }

    int this_size = 1;
    if (*fp == fc) {
      // advance beyond run of the same characters
      this_size = 2;
      while (*++fp == fc){
        this_size++;
      }
      switch (this_size) {
      case 2: flags |= _fmt_has_u2; break; // e.g. sipush, ldc_w, ldc2_w, wide iload
      case 4: flags |= _fmt_has_u4; break; // e.g. goto_w and invokedynamic
      default:
        guarantee(false, "bad rep count in format");
      }
    }

    has_size = this_size;
  }
}
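To experiment with the format parsing outside the VM, here is a simplified stand-alone sketch of the same idea. It is not HotSpot's implementation: it handles only the lowercase characters, leaves out the native-byte-order (uppercase) cases, and replaces guarantee() with exceptions. The flag values are copied from the Flags enum in Bytecodes; the namespace sketch and the name compute_flags_sketch are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {

// Flag values copied from Bytecodes::Flags (format bits only).
enum Flags {
  _fmt_has_c        = 1 << 2,
  _fmt_has_j        = 1 << 3,
  _fmt_has_k        = 1 << 4,
  _fmt_has_i        = 1 << 5,
  _fmt_has_o        = 1 << 6,
  _fmt_has_u2       = 1 << 8,
  _fmt_has_u4       = 1 << 9,
  _fmt_not_variable = 1 << 10,
  _fmt_not_simple   = 1 << 11
};

// Simplified variant of compute_flags(): lowercase format characters only.
int compute_flags_sketch(const char* format, int more_flags) {
  if (format == nullptr) return 0;  // not even more_flags

  int flags = more_flags;
  const char* fp = format;
  switch (*fp) {
    case '\0': flags |= _fmt_not_simple; break;          // variable length
    case 'b':  flags |= _fmt_not_variable; ++fp; break;  // simple
    case 'w':
      flags |= _fmt_not_variable | _fmt_not_simple;
      ++fp;
      if (*fp != 'b') throw std::invalid_argument("wide format must start with 'wb'");
      ++fp;
      break;
  }

  while (*fp != '\0') {
    char fc = *fp++;
    switch (fc) {
      case '_': continue;  // padding, ignored
      case 'j': flags |= _fmt_has_j; break;
      case 'k': flags |= _fmt_has_k; break;
      case 'i': flags |= _fmt_has_i; break;
      case 'c': flags |= _fmt_has_c; break;
      case 'o': flags |= _fmt_has_o; break;
      default:  throw std::invalid_argument("bad char in format");
    }
    // Count the run of identical characters: 2 means u2, 4 means u4.
    int run = 1;
    while (*fp == fc) { ++fp; ++run; }
    if (run == 2)      flags |= _fmt_has_u2;
    else if (run == 4) flags |= _fmt_has_u4;
    else if (run != 1) throw std::invalid_argument("bad rep count in format");
  }

  assert(flags == static_cast<unsigned short>(flags));  // must fit in 16 bits
  return flags;
}

}  // namespace sketch
```

Called with "bkk" and no extra flags, the sketch returns _fmt_not_variable | _fmt_has_k | _fmt_has_u2; called with "wbii" it returns _fmt_not_variable | _fmt_not_simple | _fmt_has_i | _fmt_has_u2.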

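The function computes flags from format or wide_format. The resulting bits record which of the format characters b, c, i, j, k, o and w (described earlier) apply to the bytecode, and the size of the operand (2 bytes or 4 bytes). The constants beginning with _fmt are defined in an enum: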

// Flag bits derived from format strings, can_trap, can_rewrite, etc.:
enum Flags {
// semantic flags:
_bc_can_trap      = 1<<0,     // bytecode execution can trap or block
// set when java_code differs from code, as for VM-internal rewritten forms
_bc_can_rewrite   = 1<<1,     // bytecode execution has an alternate form

// format bits (determined only by the format string):
_fmt_has_c        = 1<<2,     // constant, such as sipush "bcc"
_fmt_has_j        = 1<<3,     // constant pool cache index, such as getfield "bjj"
_fmt_has_k        = 1<<4,     // constant pool index, such as ldc "bk"
_fmt_has_i        = 1<<5,     // local index, such as iload
_fmt_has_o        = 1<<6,     // offset, such as ifeq

_fmt_has_nbo      = 1<<7,     // contains native-order field(s)
_fmt_has_u2       = 1<<8,     // contains double-byte field(s)
_fmt_has_u4       = 1<<9,     // contains quad-byte field
_fmt_not_variable = 1<<10,    // not of variable length (simple or wide)
_fmt_not_simple   = 1<<11,    // either wide or variable length
_all_fmt_bits     = (_fmt_not_simple*2 - _fmt_has_c),

// ...
};

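The flags correspond to the format characters as follows: a leading b sets _fmt_not_variable, a leading wb sets _fmt_not_variable | _fmt_not_simple, and an empty string sets _fmt_not_simple; c, j, k, i and o set _fmt_has_c, _fmt_has_j, _fmt_has_k, _fmt_has_i and _fmt_has_o; a character repeated twice adds _fmt_has_u2 and one repeated four times adds _fmt_has_u4; an uppercase character such as J adds _fmt_has_nbo.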

Combining these bits yields the different values. The enum also defines the commonly used combinations:

_fmt_b      = _fmt_not_variable,
_fmt_bc     = _fmt_b | _fmt_has_c,
_fmt_bi     = _fmt_b | _fmt_has_i,
_fmt_bkk    = _fmt_b | _fmt_has_k | _fmt_has_u2,
_fmt_bJJ    = _fmt_b | _fmt_has_j | _fmt_has_u2 | _fmt_has_nbo,
_fmt_bo2    = _fmt_b | _fmt_has_o | _fmt_has_u2,
_fmt_bo4    = _fmt_b | _fmt_has_o | _fmt_has_u4

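For example, bipush has the format "bc", so the format bits of its flags are _fmt_b | _fmt_has_c. ldc has the format "bk", so its format bits are _fmt_b | _fmt_has_k; since ldc is defined with can_trap set, its flags also contain _bc_can_trap.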
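Because a flags value can also carry _bc_can_trap and _bc_can_rewrite, comparing it against one of the _fmt_ combinations only works after masking with _all_fmt_bits. A minimal sketch of that check, using a subset of the enum values shown above; the namespace sketch and the helper has_format are hypothetical and not part of HotSpot:

```cpp
#include <cassert>

namespace sketch {

// Subset of Bytecodes::Flags, with the values shown above.
enum Flags {
  _bc_can_trap      = 1 << 0,
  _bc_can_rewrite   = 1 << 1,
  _fmt_has_c        = 1 << 2,
  _fmt_has_k        = 1 << 4,
  _fmt_not_variable = 1 << 10,
  _fmt_not_simple   = 1 << 11,
  _all_fmt_bits     = (_fmt_not_simple * 2 - _fmt_has_c),  // bits 2..11
  _fmt_b            = _fmt_not_variable,
  _fmt_bc           = _fmt_b | _fmt_has_c
};

// Hypothetical helper: true when the format bits of flags equal combo,
// regardless of the semantic bits (_bc_can_trap, _bc_can_rewrite).
bool has_format(int flags, int combo) {
  assert((combo & ~_all_fmt_bits) == 0);  // combo must contain format bits only
  return (flags & _all_fmt_bits) == combo;
}

}  // namespace sketch
```

_all_fmt_bits works out to 0xFFC, which covers bits 2 through 11, so an ldc-style value of _bc_can_trap | _fmt_b | _fmt_has_k still matches _fmt_b | _fmt_has_k once the low two bits are masked off.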
