[JVM] Template Interpreter: How Is Assembly Code Generated from Bytecode?

Reposted article. 2015-08-03 01:33. Tags: jvm, bytecode, template interpreter, assembly, java, interpreter, x86, HotSpot VM, addressing

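1. Background##

This article covers only the JVM's template interpreter:

how assembly code is generated from bytecode, based on the opcode and the addressing mode.

For the bytecode and assembly used in the examples here, see the previous post: "Pass by Value or by Reference?"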

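2. Addressing Modes##

This article does not go into addressing modes in depth. Instead, we focus on the instruction format of the Intel 64 and IA-32 architectures:
[Figure: Intel 64 and IA-32 instruction format]

A brief summary follows; see the Intel manual for details: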

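-- Prefixes: modify the opcode, giving it semantics such as lock and repeat.
-- REX Prefix:
---- Specify GPRs and SSE registers.
---- Specify 64-bit operand size.
---- Specify extended control registers.
-- Opcode: the operation code, which identifies the instruction, such as mov or push.
-- Mod R/M: addressing-related; see the manual for details.
-- SIB: combined with Mod R/M to specify the addressing form.
-- Displacement: used together with Mod R/M and SIB to specify the address.
-- Immediate: an immediate value.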
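
To make the REX prefix a little more concrete, here is a minimal sketch that assembles the byte from its four flag bits. The 0100WRXB layout is the one documented in the Intel manual; the helper name `make_rex` is hypothetical and is not taken from HotSpot.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not HotSpot code).
// REX prefix layout: 0100 W R X B
//   W = 1 selects a 64-bit operand size
//   R extends the ModR/M reg field
//   X extends the SIB index field
//   B extends the ModR/M r/m field or the SIB base field
inline std::uint8_t make_rex(bool w, bool r, bool x, bool b) {
  int bits = 0x40;        // fixed high nibble 0100
  if (w) bits |= 0x08;
  if (r) bits |= 0x04;
  if (x) bits |= 0x02;
  if (b) bits |= 0x01;
  return static_cast<std::uint8_t>(bits);
}
```

With only W set, `make_rex(true, false, false, false)` gives 0x48, the prefix byte commonly seen in front of 64-bit operand instructions.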

If Opcode, Mod R/M, SIB, disp, and imm are still unclear, a line of assembly gives the idea:

mov %eax,-0x18(%rcx,%rbx,4)

If this line is not clear either, read it together with the following:

-- Base + (Index ∗ Scale) + Displacement -- Using all the addressing components together allows efficient
indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.

3. Legal Values (64-bit)##

Note the legal values of these four components:

• Displacement — An 8-bit, 16-bit, or 32-bit value.
• Base — The value in a 64-bit general-purpose register.
• Index — The value in a 64-bit general-purpose register.
• Scale factor — A value of 2, 4, or 8 that is multiplied by the index value.
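
As a worked example of Base + (Index * Scale) + Displacement, the sketch below computes the effective address for an operand such as -0x18(%rcx,%rbx,4). The function name `effective_address` and the register values used in the note afterwards are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only.
// Computes base + index * scale + disp, the way the address of an operand
// like -0x18(%rcx,%rbx,4) is formed.
inline std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                       unsigned scale, std::int32_t disp) {
  // The displacement is sign-extended to 64 bits before it is added.
  std::uint64_t disp64 = static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
  return base + index * scale + disp64;
}
```

Assuming rcx = 0x1000 and rbx = 3, `effective_address(0x1000, 3, 4, -0x18)` evaluates to 0x1000 + 12 - 24 = 0xFF4. A scale of 1 simply means the index is used unscaled, which is the case in the example of section 6.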

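4. Mod R/M (32-bit Addressing)##

We will use the Mod R/M byte later, so the 32-bit addressing table is reproduced here:

[Figure: 32-bit addressing forms with the ModR/M byte]

Of the notes to the table above, the first is used in our example, so keep it in mind: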

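1. The [--][--] nomenclature means a SIB follows the ModR/M byte.
2. The disp32 nomenclature denotes a 32-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is added to the index.
3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is sign-extended and added to the index.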

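5. SIB (32-bit Addressing)##

Likewise, since the Mod R/M byte is used, the SIB byte may be needed as well:

[Figure: 32-bit addressing forms with the SIB byte]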

6. Example##

6.1 Preparation###

Let's look at a real example.

The following code generates the machine code for mov:

void Assembler::movl(Address dst, Register src) {
  InstructionMark im(this);
  prefix(dst, src);
  emit_int8((unsigned char)0x89);
  emit_operand(src, dst);
}

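prefix(dst, src) handles the legacy prefixes and the REX prefix; we do not focus on it here.

emit_int8((unsigned char) 0x89), as the name suggests, emits one byte. But what does the value 0x89 stand for?

Hold that thought. There is one more statement, emit_operand(src, dst), which leads to a long piece of code. Here is a rough look: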

void Assembler::emit_operand(Register reg, Register base, Register index,
                 Address::ScaleFactor scale, int disp,
                 RelocationHolder const& rspec,
                 int rip_relative_correction) {
  relocInfo::relocType rtype = (relocInfo::relocType) rspec.type();

  // Encode the registers as needed in the fields they are used in

  int regenc = encode(reg) << 3;
  int indexenc = index->is_valid() ? encode(index) << 3 : 0;
  int baseenc = base->is_valid() ? encode(base) : 0;

  if (base->is_valid()) {
    if (index->is_valid()) {
      assert(scale != Address::no_scale, "inconsistent address");
      // [base + index*scale + disp]
      if (disp == 0 && rtype == relocInfo::none  &&
          base != rbp LP64_ONLY(&& base != r13)) {
        // [base + index*scale]
        // [00 reg 100][ss index base]

        /**************************
         * Key point: focus here
         **************************/

        assert(index != rsp, "illegal addressing mode");
        emit_int8(0x04 | regenc);
        emit_int8(scale << 6 | indexenc | baseenc);
      } else if (is8bit(disp) && rtype == relocInfo::none) {
        // ...
      } else {
        // [base + index*scale + disp32]
        // [10 reg 100][ss index base] disp32
        assert(index != rsp, "illegal addressing mode");
        emit_int8(0x84 | regenc);
        emit_int8(scale << 6 | indexenc | baseenc);
        emit_data(disp, rspec, disp32_operand);
      }
    } else if (base == rsp LP64_ONLY(|| base == r12)) {
      // ... 
    } else {
      
      // ... 
    }
  } else {
    // ... 
  }
}

The key part of the code above is marked. We extract it here and combine it with the earlier emit_int8((unsigned char) 0x89):

emit_int8((unsigned char) 0x89);
emit_int8(0x04 | regenc);
emit_int8(scale << 6 | indexenc | baseenc);

In the end it produces the following assembly (on a 64-bit machine):

mov    %eax,(%rcx,%rbx,1)

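So here is the question:

How is this line of assembly derived?

6.2 Calculation###

Suppose we are given the following values:

regenc = 0x0, scale << 6 | indexenc | baseenc = 25 (0x19)

A little arithmetic then gives: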

emit_int8((unsigned char) 0x89); // emits 0x89
emit_int8(0x04 | regenc); // emits 0x04
emit_int8(scale << 6 | indexenc | baseenc); // emits 0x19

Together they make three bytes:

0x89 0x04 0x19
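
The arithmetic behind these three bytes can be sketched as follows. This is a simplified stand-in for the [base + index*scale] branch of emit_operand, not the HotSpot implementation: the `Reg` enum and the function `encode_mov_store` are hypothetical, and the register numbers (rax = 0, rcx = 1, rbx = 3) are the standard x86 hardware encodings.

```cpp
#include <array>
#include <cstdint>

// Hardware register numbers for the registers used in this article.
enum Reg : int { rax = 0, rcx = 1, rbx = 3 };

// Hypothetical helper for illustration only (not HotSpot code).
// Produces opcode + ModR/M + SIB for "mov src, (base, index, scale)".
// scale_log2 is 0, 1, 2 or 3 for scale factors 1, 2, 4 or 8.
inline std::array<std::uint8_t, 3> encode_mov_store(Reg src, Reg base, Reg index,
                                                    int scale_log2) {
  int regenc = src << 3;      // goes into ModR/M bits 5..3
  int indexenc = index << 3;  // goes into SIB bits 5..3
  int baseenc = base;         // goes into SIB bits 2..0
  return {
    static_cast<std::uint8_t>(0x89),                                // MOV r/m, r
    static_cast<std::uint8_t>(0x04 | regenc),                       // [00 reg 100]
    static_cast<std::uint8_t>(scale_log2 << 6 | indexenc | baseenc) // [ss index base]
  };
}
```

Calling `encode_mov_store(rax, rcx, rbx, 0)` gives 0x89 0x04 0x19: regenc is 0, and the SIB byte is 0 << 6 | 3 << 3 | 1 = 25.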

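1. What does 0x89 correspond to?

[Figure: MOV opcode table]

The table shows that a 64-bit operand requires REX.W in front of the opcode. In our example no REX prefix is emitted (REX.W is effectively 0), so 89 /r moves a 32-bit operand, which matches both movl and the %eax in the generated assembly.

The row to look at is 89 /r:

MOV r/m32,r32 // stores the register's value into a register or memory location; with REX.W the same opcode is MOV r/m64,r64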

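2. What does 0x04 stand for?

Now we need the Mod R/M and SIB tables above.

Looking up the second byte, 0x04, in the Mod R/M table shows that the source operand is the register EAX and that the addressing form is [--][--], which means: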

The [--][--] nomenclature means a SIB follows the ModR/M byte.
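
The table lookup can also be done by splitting the byte into its bit fields. The sketch below assumes the standard ModR/M layout (mod in bits 7..6, reg in bits 5..3, r/m in bits 2..0); the `ModRM` struct and both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical types and helpers for illustration only.
struct ModRM {
  int mod;  // bits 7..6: addressing form (3 means register-direct)
  int reg;  // bits 5..3: register operand
  int rm;   // bits 2..0: register or memory operand
};

inline ModRM decode_modrm(std::uint8_t b) {
  return ModRM{ (b >> 6) & 0x3, (b >> 3) & 0x7, b & 0x7 };
}

// With 32-bit or 64-bit addressing, r/m == 100b together with mod != 11b
// is the [--][--] case: a SIB byte follows the ModR/M byte.
inline bool has_sib(const ModRM& m) {
  return m.mod != 3 && m.rm == 4;
}
```

For 0x04 this gives mod = 0, reg = 0 (EAX), and r/m = 4, so `has_sib` returns true, in agreement with the table.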

3. What does 0x19 stand for?

Next, look up the SIB table. The entry for byte 0x19 is:

base = ECX
scaled index = EBX
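
The same result falls out of the bit fields of the SIB byte. The sketch below assumes the standard layout (scale in bits 7..6, index in bits 5..3, base in bits 2..0); the `SIB` struct and `decode_sib` are hypothetical names. It deliberately ignores the special cases (index 100b meaning "no index", and base 101b with mod 00b meaning "disp32 without a base"), which is also why the HotSpot code above checks for rsp and rbp.

```cpp
#include <cstdint>

// Hypothetical type and helper for illustration only.
struct SIB {
  int scale;  // scale factor: 1, 2, 4 or 8
  int index;  // hardware number of the index register
  int base;   // hardware number of the base register
};

inline SIB decode_sib(std::uint8_t b) {
  int ss = (b >> 6) & 0x3;  // two scale bits, stored as log2 of the factor
  return SIB{ 1 << ss, (b >> 3) & 0x7, b & 0x7 };
}
```

For 0x19 (binary 00 011 001) this yields scale = 1, index = 3 (EBX), and base = 1 (ECX).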

4. The assembly code:

// 32-bit
mov %eax,(%ecx,%ebx,1)

// 64-bit
mov %eax,(%rcx,%rbx,1)

7. Conclusion##

This article briefly explored:

how assembly code is generated from bytecode, based on the opcode and the addressing mode.

The end.
