LLVM Notes (12) - Instruction Selection (4): legalize


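This section introduces legalization in instruction selection. Matching mid-end IR to machine instructions exactly requires checking both the operator and its operands; legalize is the process of converting illegal operators or operands before instruction selection proper begins.
In general, for a given backend architecture, the instruction set it supports: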

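  1. Cannot necessarily express every mid-end IR operation. An obvious example is floating-point code on an architecture without floating-point support: the compiler replaces those operations with calls to soft-float functions, whereas on an architecture with floating-point support it emits the corresponding floating-point instructions. Converting unsupported mid-end IR semantics into behavior the architecture supports is called legalize operation.
  2. Usually does not accept every data type as an operand of a given instruction. Take ARM: its add instruction adds 32-bit integers. If the mid-end IR adds two 8-bit integers, the compiler must first zero-extend both operands to 32 bits, add them with the add instruction, then truncate the result back to 8 bits. Changing data types so that they match an instruction's operand types is called legalize type (for vector data, legalize vector).

So unlike combine, legalize is tightly coupled to the architecture. During legalization, generic SDNodes are gradually replaced by target-specific SDNodes, and in some cases converted directly into machine instructions. We first introduce some basic legalize concepts, explain how a target sets up callbacks to specify how things are legalized, and then analyze the implementation.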
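As a rough illustration of the 8-bit add example above, the sketch below models what "legalize type" amounts to at the value level. It is plain C++ rather than LLVM code, and the function name addI8ViaI32 is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an i8 add on a target whose only add instruction
// works on 32-bit integers.
std::uint8_t addI8ViaI32(std::uint8_t a, std::uint8_t b) {
  std::uint32_t wideA = a;                     // zero-extend i8 -> i32
  std::uint32_t wideB = b;                     // zero-extend i8 -> i32
  std::uint32_t wideSum = wideA + wideB;       // the 32-bit add the target has
  return static_cast<std::uint8_t>(wideSum);   // truncate i32 -> i8
}
```

The truncation at the end is what preserves the wrap-around behavior of the original 8-bit operation: 200 + 100 yields 44, not 300.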

Configuring how a target legalizes

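To state how an operation should be legalized, the TargetLoweringBase class (defined in include/llvm/CodeGen/TargetLowering.h) defines a LegalizeAction enumeration with five values, one per legalization strategy, alongside a LegalizeTypeAction enumeration for data types.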

enum LegalizeAction : uint8_t {
  Legal,      // The target natively supports this operation.
  Promote,    // This operation should be executed in a larger type.
  Expand,     // Try to expand this to other ops, otherwise use a libcall.
  LibCall,    // Don't try to expand this to other ops, always use a libcall.
  Custom      // Use the LowerOperation hook to implement custom lowering.
};

enum LegalizeTypeAction : uint8_t {
  TypeLegal,           // The target natively supports this type.
  TypePromoteInteger,  // Replace this integer with a larger one.
  TypeExpandInteger,   // Split this integer into two of half the size.
  TypeSoftenFloat,     // Convert this float to a same size integer type.
  TypeExpandFloat,     // Split this float into two of half the size.
  TypeScalarizeVector, // Replace this one-element vector with its element.
  TypeSplitVector,     // Split this vector into two of half the size.
  TypeWidenVector,     // This vector should be widened into a larger vector.
  TypePromoteFloat     // Replace this float with a larger one.
};

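The LegalizeAction values mean:

  1. legal - the target supports the operation / operand type.
  2. promote - the target supports the operation, but it must be carried out in a larger data type (for example, the 8-bit add on ARM mentioned above becomes a 32-bit add).
  3. expand - the target does not support the operation; it can be expanded into other instructions or into a library call (the opposite of promote; for example, a 64-bit add on ARM is split into a combination of two 32-bit adds).
  4. libcall - the target does not support the operation; it is replaced directly by a runtime library call (for example, the soft-float arithmetic mentioned above).
  5. custom - target-defined handling, which must be implemented in LowerOperation.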
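The expand case can be sketched the same way as promote. The block below is a standalone C++ model, not LLVM code, of a 64-bit add carried out with two 32-bit adds plus a carry; the names Halves, split, join and addI64ViaI32 are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of 32-bit halves standing in for an expanded i64 value.
struct Halves {
  std::uint32_t lo;
  std::uint32_t hi;
};

Halves split(std::uint64_t v) {
  return Halves{static_cast<std::uint32_t>(v),
                static_cast<std::uint32_t>(v >> 32)};
}

std::uint64_t join(Halves h) {
  return (static_cast<std::uint64_t>(h.hi) << 32) | h.lo;
}

// i64 add built from two i32 adds; the carry out of the low half is
// recovered by checking whether the low sum wrapped around.
std::uint64_t addI64ViaI32(std::uint64_t a, std::uint64_t b) {
  Halves x = split(a);
  Halves y = split(b);
  std::uint32_t lo = x.lo + y.lo;              // low add, may wrap
  std::uint32_t carry = (lo < x.lo) ? 1u : 0u; // wrapped => carry into high half
  std::uint32_t hi = x.hi + y.hi + carry;      // high add with carry
  return join(Halves{lo, hi});
}
```

Real targets usually have an add-with-carry instruction for the high half; the comparison used here is only a portable way to express the same idea.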

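If an operation is legal, the legalize pass leaves the node alone; otherwise it legalizes the node according to its action. For the custom action, the compiler developer must implement the callback that makes the node legal.
The LegalizeTypeAction values are analogous, but are subdivided further by kind of data type.
Each target must specify the legalize action for every data type and operation; this information is stored in the TargetLoweringBase class.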

class TargetLoweringBase {
  /// This indicates the default register class to use for each ValueType the
  /// target supports natively.
  const TargetRegisterClass *RegClassForVT[MVT::LAST_VALUETYPE];

  /// For any value types we are promoting or expanding, this contains the value
  /// type that we are changing to.  For Expanded types, this contains one step
  /// of the expand (e.g. i64 -> i32), even if there are multiple steps required
  /// (e.g. i64 -> i16).  For types natively supported by the system, this holds
  /// the same type (e.g. i32 -> i32).
  MVT TransformToType[MVT::LAST_VALUETYPE];

  /// For each operation and each value type, keep a LegalizeAction that
  /// indicates how instruction selection should deal with the operation.  Most
  /// operations are Legal (aka, supported natively by the target), but
  /// operations that are not should be described.  Note that operations on
  /// non-legal value types are not described here.
  LegalizeAction OpActions[MVT::LAST_VALUETYPE][ISD::BUILTIN_OP_END];
};

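The register classes determine which data types the target operates on, so the RegClassForVT array records the register class each data type maps to.
When the target does not support a data type, the type must be converted to a legal one; the TransformToType array records the type to convert to.
Finally, for an operation on a given data type, the legalize action is stored in the two-dimensional array OpActions. When indexing this array, note that if the target does not support a data type it necessarily does not support any operation on that type. In that case the action should not be looked up under the illegal type; the type is first converted to a legal one through TransformToType, and the operation is then looked up under that type.
TargetLoweringBase also stores the legalize actions for nodes such as extending loads, truncating stores and condition codes; for lack of space they are not covered one by one here.
Let us first look at how OpActions is modified. TargetLoweringBase provides two interfaces for setting and reading the array.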

class TargetLoweringBase {
public:
  /// Indicate that the specified operation does not work with the specified
  /// type and indicate what to do about it. Note that VT may refer to either
  /// the type of a result or that of an operand of Op.
  void setOperationAction(unsigned Op, MVT VT,
                          LegalizeAction Action) {
    assert(Op < array_lengthof(OpActions[0]) && "Table isn't big enough!");
    OpActions[(unsigned)VT.SimpleTy][Op] = Action;
  }

  /// Return how this operation should be treated: either it is legal, needs to
  /// be promoted to a larger size, needs to be expanded to some other code
  /// sequence, or the target has a custom expander for it.
  LegalizeAction getOperationAction(unsigned Op, EVT VT) const {
    if (VT.isExtended()) return Expand;
    // If a target-specific SDNode requires legalization, require the target
    // to provide custom legalization for it.
    if (Op >= array_lengthof(OpActions[0])) return Custom;
    return OpActions[(unsigned)VT.getSimpleVT().SimpleTy][Op];
  }
};
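The lookup rule described above (convert an illegal type through TransformToType first, then index OpActions with the legal type) can be modelled with a small table. This is a simplified sketch, not the LLVM implementation: ToyLowering, its members, and the tiny Type/Op enums are all hypothetical, and the real classes handle many more cases.

```cpp
#include <array>
#include <cassert>

enum Action { Legal, Promote, Expand, LibCall, Custom };
enum Type { I8, I16, I32, I64, NumTypes };
enum Op { Add, Mul, SDiv, NumOps };

// Hypothetical miniature of the tables kept by TargetLoweringBase.
struct ToyLowering {
  std::array<bool, NumTypes> typeLegal{};      // all false by default
  std::array<Type, NumTypes> transformTo{};    // one step towards a legal type
  std::array<std::array<Action, NumOps>, NumTypes> opActions{};  // all Legal

  ToyLowering() {
    // By default every type maps to itself.
    for (int t = 0; t < NumTypes; ++t)
      transformTo[t] = static_cast<Type>(t);
  }

  void setOperationAction(Op op, Type t, Action a) { opActions[t][op] = a; }

  // Follow transformTo until a legal type is reached. Assumes the table
  // was set up so that every illegal type eventually reaches a legal one.
  Type legalTypeFor(Type t) const {
    while (!typeLegal[t])
      t = transformTo[t];
    return t;
  }

  // Actions are only meaningful for legal types, so convert first.
  Action actionFor(Op op, Type t) const {
    return opActions[legalTypeFor(t)][op];
  }
};

// A made-up 32-bit target: only i32 is legal and division needs a libcall.
ToyLowering makeToy32BitTarget() {
  ToyLowering tl;
  tl.typeLegal[I32] = true;
  tl.transformTo[I8] = I32;
  tl.transformTo[I16] = I32;
  tl.transformTo[I64] = I32;
  tl.setOperationAction(SDiv, I32, LibCall);
  return tl;
}
```

With this toy target, asking for the action of SDiv on I64 first lands on I32 and then returns LibCall, while Add on I16 comes back as Legal.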

Each target has an [arch]TargetLowering class derived from TargetLoweringBase, whose constructor calls setOperationAction() to initialize OpActions. Taking RISCV as an example, here is part of RISCVTargetLowering::RISCVTargetLowering() (defined in lib/Target/RISCV/RISCVISelLowering.cpp).

RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
                                         const RISCVSubtarget &STI)
    : TargetLowering(TM), Subtarget(STI) {
  MVT XLenVT = Subtarget.getXLenVT();

  // Set up the register classes.
  addRegisterClass(XLenVT, &RISCV::GPRRegClass);

  // Compute derived properties from the register classes.
  computeRegisterProperties(STI.getRegisterInfo());

  setOperationAction(ISD::BR_JT, MVT::Other, Expand);
  setOperationAction(ISD::BR_CC, XLenVT, Expand);
  setOperationAction(ISD::SELECT, XLenVT, Custom);
  setOperationAction(ISD::SELECT_CC, XLenVT, Expand);

  setOperationAction(ISD::GlobalAddress, XLenVT, Custom);
  setOperationAction(ISD::BlockAddress, XLenVT, Custom);
  setOperationAction(ISD::ConstantPool, XLenVT, Custom);
  setOperationAction(ISD::GlobalTLSAddress, XLenVT, Custom);

  ......
}

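Note that the code above also calls TargetLoweringBase::addRegisterClass() (defined in include/llvm/CodeGen/TargetLowering.h) and TargetLoweringBase::computeRegisterProperties() (implemented in lib/CodeGen/TargetLoweringBase.cpp). The former fills in RegClassForVT. The latter must be called after all register classes supported by the target have been added; it computes TransformToType from the register classes that were added.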

class TargetLoweringBase {
public:
  /// Add the specified register class as an available regclass for the
  /// specified value type. This indicates the selector can handle values of
  /// that class natively.
  void addRegisterClass(MVT VT, const TargetRegisterClass *RC) {
    assert((unsigned)VT.SimpleTy < array_lengthof(RegClassForVT));
    RegClassForVT[VT.SimpleTy] = RC;
  }
};

/// computeRegisterProperties - Once all of the register classes are added,
/// this allows us to compute derived properties we expose.
void TargetLoweringBase::computeRegisterProperties(
    const TargetRegisterInfo *TRI) {
  // Everything defaults to needing one register.
  for (unsigned i = 0; i != MVT::LAST_VALUETYPE; ++i) {
    NumRegistersForVT[i] = 1;
    RegisterTypeForVT[i] = TransformToType[i] = (MVT::SimpleValueType)i;
  }
  // ...except isVoid, which doesn't need any registers.
  NumRegistersForVT[MVT::isVoid] = 0;

  // Find the largest integer register class.
  unsigned LargestIntReg = MVT::LAST_INTEGER_VALUETYPE;
  for (; RegClassForVT[LargestIntReg] == nullptr; --LargestIntReg)
    assert(LargestIntReg != MVT::i1 && "No integer registers defined!");

  // Every integer value type larger than this largest register takes twice as
  // many registers to represent as the previous ValueType.
  for (unsigned ExpandedReg = LargestIntReg + 1;
       ExpandedReg <= MVT::LAST_INTEGER_VALUETYPE; ++ExpandedReg) {
    NumRegistersForVT[ExpandedReg] = 2*NumRegistersForVT[ExpandedReg-1];
    RegisterTypeForVT[ExpandedReg] = (MVT::SimpleValueType)LargestIntReg;
    TransformToType[ExpandedReg] = (MVT::SimpleValueType)(ExpandedReg - 1);
    ValueTypeActions.setTypeAction((MVT::SimpleValueType)ExpandedReg,
                                   TypeExpandInteger);
  }

  // Inspect all of the ValueType's smaller than the largest integer
  // register to see which ones need promotion.
  unsigned LegalIntReg = LargestIntReg;
  for (unsigned IntReg = LargestIntReg - 1;
       IntReg >= (unsigned)MVT::i1; --IntReg) {
    MVT IVT = (MVT::SimpleValueType)IntReg;
    if (isTypeLegal(IVT)) {
      LegalIntReg = IntReg;
    } else {
      RegisterTypeForVT[IntReg] = TransformToType[IntReg] =
        (MVT::SimpleValueType)LegalIntReg;
      ValueTypeActions.setTypeAction(IVT, TypePromoteInteger);
    }
  }

  ......
}

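computeRegisterProperties() is fairly involved, so only the part that computes TransformToType is shown here. It first finds the data type of the largest integer register class the target supports. Every integer type larger than that is split into two values of the next smaller type (expand), and every smaller type that is illegal is converted to the nearest legal type above it (promote). Smaller types that are already legal are left alone: X86_64, for example, supports 16-bit, 32-bit and 64-bit integers, so those need no promotion.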
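The integer part of this computation can be reproduced in a few lines. The sketch below is a simplified standalone model, not the LLVM code: integer types are represented by indices in increasing width, hasRegClass[i] plays the role of RegClassForVT[i] != nullptr, and the names IntTypeInfo and computeIntTypeActions are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum TypeAction { TypeLegal, TypePromoteInteger, TypeExpandInteger };

struct IntTypeInfo {
  TypeAction action;
  int transformTo;  // index of the type one step closer to a legal type
};

// Index i stands for an integer type; a larger index means a wider type.
std::vector<IntTypeInfo> computeIntTypeActions(
    const std::vector<bool>& hasRegClass) {
  const int n = static_cast<int>(hasRegClass.size());
  std::vector<IntTypeInfo> info(n);
  for (int i = 0; i < n; ++i)
    info[i] = {TypeLegal, i};  // default: every type maps to itself

  // Find the largest integer type that has a register class.
  int largest = n - 1;
  while (largest >= 0 && !hasRegClass[largest])
    --largest;
  assert(largest >= 0 && "No integer registers defined!");

  // Wider types are expanded one step at a time (e.g. i128 -> i64 -> i32).
  for (int i = largest + 1; i < n; ++i)
    info[i] = {TypeExpandInteger, i - 1};

  // Narrower illegal types are promoted to the nearest legal type above.
  int legal = largest;
  for (int i = largest - 1; i >= 0; --i) {
    if (hasRegClass[i])
      legal = i;
    else
      info[i] = {TypePromoteInteger, legal};
  }
  return info;
}
```

Taking the indices 0..5 to mean i1, i8, i16, i32, i64, i128, a target with only an i32 register class promotes i8 to i32 and expands i64 to i32, while a target with i8 through i64 only promotes i1 (to i8) and expands i128.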

legalize type

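SelectionDAG's strategy is to legalize types first and operations second. The benefit is that after type legalization every value in the DAG has a type the target supports (even though some operations may still be illegal for some of those types).
Could it be done the other way round, legalizing operations first? My understanding is that it could not: legalizing an operation requires knowing its legalize action, and that action itself depends on the data type (expand / promote). Legalizing operations first would mean handling type legalization in the middle of operation legalization.
The same consideration applies when implementing target-specific custom legalization: legalize data types during legalize type, and legalize operations during legalize operation.
Let us look at the implementation of legalize type first. Its entry point is SelectionDAG::LegalizeTypes() (defined in lib/CodeGen/SelectionDAG/LegalizeTypes.cpp), which calls DAGTypeLegalizer::run(); the latter examines every SDValue in the DAG and legalizes illegal value types.
DAGTypeLegalizer::run() is somewhat similar to DAGCombiner::Run(). The difference is that combine does not care about node order as long as every node is visited, whereas legalize type requires a node's operands to be legalized before the node's own results.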

bool DAGTypeLegalizer::run() {
  bool Changed = false;

  // Create a dummy node (which is not added to allnodes), that adds a reference
  // to the root node, preventing it from being deleted, and tracking any
  // changes of the root.
  HandleSDNode Dummy(DAG.getRoot());
  Dummy.setNodeId(Unanalyzed);

  // The root of the dag may dangle to deleted nodes until the type legalizer is
  // done.  Set it to null to avoid confusion.
  DAG.setRoot(SDValue());

  // Walk all nodes in the graph, assigning them a NodeId of 'ReadyToProcess'
  // (and remembering them) if they are leaves and assigning 'Unanalyzed' if
  // non-leaves.
  for (SDNode &Node : DAG.allnodes()) {
    if (Node.getNumOperands() == 0) {
      AddToWorklist(&Node);
    } else {
      Node.setNodeId(Unanalyzed);
    }
  }

  ......
}

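run() first creates a dummy node that references the root so that the root is not deleted, and sets the DAG root to null so that it cannot dangle on a deleted node. It then walks the DAG, adding every node with no operands to the worklist and marking the remaining nodes as Unanalyzed. As mentioned when SDNode was introduced, the meaning of SDNode.NodeId depends on the pass being run; in legalize the NodeId holds the node's legalization state, defined in lib/CodeGen/SelectionDAG/LegalizeTypes.h: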

class LLVM_LIBRARY_VISIBILITY DAGTypeLegalizer {
  const TargetLowering &TLI;
  SelectionDAG &DAG;
public:
  /// This pass uses the NodeId on the SDNodes to hold information about the
  /// state of the node. The enum has all the values.
  enum NodeIdFlags {
    /// All operands have been processed, so this node is ready to be handled.
    ReadyToProcess = 0,

    /// This is a new node, not before seen, that was created in the process of
    /// legalizing some other node.
    NewNode = -1,

    /// This node's ID needs to be set to the number of its unprocessed
    /// operands.
    Unanalyzed = -2,

    /// This is a node that has already been processed.
    Processed = -3

    // 1+ - This is a node which has this many unprocessed operands.
  };
};

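Note that a value greater than 0 means the node still has that many operands that have not been legalized. Each time a node is legalized, the count of each of its users is decremented by 1, and a user whose count reaches 0 is added to the worklist. A value below 0 means the node is new, not yet analyzed, or already processed.
Back in run(): as in combine, each iteration takes one node from the worklist and legalizes it. DAGTypeLegalizer::getTypeAction() calls TargetLowering::getTypeAction() to look up how the type should be legalized; the latter relies on the ValueTypeActions filled in by computeRegisterProperties() shown above. Once a node is legalized it is marked Processed, and its users are visited to decrement their count of unprocessed operands; a user whose count drops to 0 is added to the worklist.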

bool DAGTypeLegalizer::run() {
  ......
  // Now that we have a set of nodes to process, handle them all.
  while (!Worklist.empty()) {
#ifndef EXPENSIVE_CHECKS
    if (EnableExpensiveChecks)
#endif
      PerformExpensiveChecks();

    SDNode *N = Worklist.back();
    Worklist.pop_back();
    assert(N->getNodeId() == ReadyToProcess &&
           "Node should be ready if on worklist!");

    LLVM_DEBUG(dbgs() << "Legalizing node: "; N->dump(&DAG));
    if (IgnoreNodeResults(N)) {
      LLVM_DEBUG(dbgs() << "Ignoring node results\n");
      goto ScanOperands;
    }

    // Scan the values produced by the node, checking to see if any result
    // types are illegal.
    for (unsigned i = 0, NumResults = N->getNumValues(); i < NumResults; ++i) {
      EVT ResultVT = N->getValueType(i);
      LLVM_DEBUG(dbgs() << "Analyzing result type: " << ResultVT.getEVTString()
                        << "\n");
      switch (getTypeAction(ResultVT)) {
      case TargetLowering::TypeLegal:
        LLVM_DEBUG(dbgs() << "Legal result type\n");
        break;
      // The following calls must take care of *all* of the node's results,
      // not just the illegal result they were passed (this includes results
      // with a legal type).  Results can be remapped using ReplaceValueWith,
      // or their promoted/expanded/etc values registered in PromotedIntegers,
      // ExpandedIntegers etc.
      case TargetLowering::TypePromoteInteger:
        PromoteIntegerResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeExpandInteger:
        ExpandIntegerResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeSoftenFloat:
        SoftenFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeExpandFloat:
        ExpandFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeScalarizeVector:
        ScalarizeVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeSplitVector:
        SplitVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypeWidenVector:
        WidenVectorResult(N, i);
        Changed = true;
        goto NodeDone;
      case TargetLowering::TypePromoteFloat:
        PromoteFloatResult(N, i);
        Changed = true;
        goto NodeDone;
      }
    }

    ......

NodeDone:

    // If we reach here, the node was processed, potentially creating new nodes.
    // Mark it as processed and add its users to the worklist as appropriate.
    assert(N->getNodeId() == ReadyToProcess && "Node ID recalculated?");
    N->setNodeId(Processed);

    for (SDNode::use_iterator UI = N->use_begin(), E = N->use_end();
         UI != E; ++UI) {
      SDNode *User = *UI;
      int NodeId = User->getNodeId();

      // This node has two options: it can either be a new node or its Node ID
      // may be a count of the number of operands it has that are not ready.
      if (NodeId > 0) {
        User->setNodeId(NodeId-1);

        // If this was the last use it was waiting on, add it to the ready list.
        if (NodeId-1 == ReadyToProcess)
          Worklist.push_back(User);
        continue;
      }

      // If this is an unreachable new node, then ignore it.  If it ever becomes
      // reachable by being used by a newly created node then it will be handled
      // by AnalyzeNewNode.
      if (NodeId == NewNode)
        continue;

      // Otherwise, this node is new: this is the first operand of it that
      // became ready.  Its new NodeId is the number of operands it has minus 1
      // (as this node is now processed).
      assert(NodeId == Unanalyzed && "Unknown node ID!");
      User->setNodeId(User->getNumOperands() - 1);

      // If the node only has a single operand, it is now ready.
      if (User->getNumOperands() == 1)
        Worklist.push_back(User);
    }
  }

  ......
}
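The NodeId bookkeeping in run() can be isolated into a short standalone model. The sketch below is not LLVM code: ToyNode and processOrder are hypothetical names, nodes are plain indices, and the actual legalization of each node is left out so that only the ordering logic remains.

```cpp
#include <cassert>
#include <vector>

struct ToyNode {
  std::vector<int> operands;  // indices of the nodes this node uses
};

// Returns the order in which nodes would be marked Processed.
std::vector<int> processOrder(const std::vector<ToyNode>& dag) {
  const int ReadyToProcess = 0, Unanalyzed = -2, Processed = -3;
  const int n = static_cast<int>(dag.size());

  // Build the user lists (the reverse of the operand edges).
  std::vector<std::vector<int>> users(n);
  for (int i = 0; i < n; ++i)
    for (int op : dag[i].operands)
      users[op].push_back(i);

  // Leaves are ready immediately; everything else starts as Unanalyzed.
  std::vector<int> nodeId(n, Unanalyzed);
  std::vector<int> worklist, order;
  for (int i = 0; i < n; ++i) {
    if (dag[i].operands.empty()) {
      nodeId[i] = ReadyToProcess;
      worklist.push_back(i);
    }
  }

  while (!worklist.empty()) {
    int cur = worklist.back();  // taken from the back, like run()
    worklist.pop_back();
    nodeId[cur] = Processed;
    order.push_back(cur);

    for (int user : users[cur]) {
      // First time this user is reached: start counting its operands.
      if (nodeId[user] == Unanalyzed)
        nodeId[user] = static_cast<int>(dag[user].operands.size());
      // One more operand is done; queue the user once none are left.
      if (--nodeId[user] == ReadyToProcess)
        worklist.push_back(user);
    }
  }
  return order;
}
```

For a DAG with two leaves 0 and 1, node 2 using both, and node 3 using node 2, the order is 1, 0, 2, 3: no node is processed before all of its operands.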

legalize vector

TODO

legalize operation

The SelectionDAG class provides two entry points for legalize operation. LegalizeOp() legalizes a single operation, which we already saw in combine; legalizing the whole DAG goes through Legalize(). Both call SelectionDAGLegalize::LegalizeOp() to do the actual work.

void SelectionDAG::Legalize() {
  AssignTopologicalOrder();

  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  // Use a delete listener to remove nodes which were deleted during
  // legalization from LegalizeNodes. This is needed to handle the situation
  // where a new node is allocated by the object pool to the same address of a
  // previously deleted node.
  DAGNodeDeletedListener DeleteListener(
      *this,
      [&LegalizedNodes](SDNode *N, SDNode *E) { LegalizedNodes.erase(N); });

  SelectionDAGLegalize Legalizer(*this, LegalizedNodes);

  // Visit all the nodes. We start in topological order, so that we see
  // nodes with their original operands intact. Legalization can produce
  // new nodes which may themselves need to be legalized. Iterate until all
  // nodes have been legalized.
  while (true) {
    bool AnyLegalized = false;
    for (auto NI = allnodes_end(); NI != allnodes_begin();) {
      --NI;

      SDNode *N = &*NI;
      if (N->use_empty() && N != getRoot().getNode()) {
        ++NI;
        DeleteNode(N);
        continue;
      }

      if (LegalizedNodes.insert(N).second) {
        AnyLegalized = true;
        Legalizer.LegalizeOp(N);

        if (N->use_empty() && N != getRoot().getNode()) {
          ++NI;
          DeleteNode(N);
        }
      }
    }
    if (!AnyLegalized)
      break;

  }

  // Remove dead nodes now.
  RemoveDeadNodes();
}

bool SelectionDAG::LegalizeOp(SDNode *N,
                              SmallSetVector<SDNode *, 16> &UpdatedNodes) {
  SmallPtrSet<SDNode *, 16> LegalizedNodes;
  SelectionDAGLegalize Legalizer(*this, LegalizedNodes, &UpdatedNodes);

  // Directly insert the node in question, and legalize it. This will recurse
  // as needed through operands.
  LegalizedNodes.insert(N);
  Legalizer.LegalizeOp(N);

  return LegalizedNodes.count(N);
}

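Legalize operation also processes nodes in a particular order, but unlike combine and legalize type it first sorts the nodes of the DAG topologically.
SelectionDAG::AssignTopologicalOrder() (defined in lib/CodeGen/SelectionDAG/SelectionDAG.cpp) sorts the nodes by their operands: a node is placed after all of its operands, nodes that are otherwise unordered keep their previous relative order, and after sorting a node's NodeId equals its position in the DAG's node list.
In the code, SortedPos marks the end of the sorted part of the list. The first pass sets each node's NodeId to its number of operands (nodes with no operands are placed in sorted position right away). The second pass walks the sorted nodes and decrements the outstanding operand count of each of their users; when a count reaches 0, all of that node's predecessors are sorted and the node itself joins the sorted part. When the walk reaches the end, the whole list is in order.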

unsigned SelectionDAG::AssignTopologicalOrder() {
  unsigned DAGSize = 0;

  // SortedPos tracks the progress of the algorithm. Nodes before it are
  // sorted, nodes after it are unsorted. When the algorithm completes
  // it is at the end of the list.
  allnodes_iterator SortedPos = allnodes_begin();

  // Visit all the nodes. Move nodes with no operands to the front of
  // the list immediately. Annotate nodes that do have operands with their
  // operand count. Before we do this, the Node Id fields of the nodes
  // may contain arbitrary values. After, the Node Id fields for nodes
  // before SortedPos will contain the topological sort index, and the
  // Node Id fields for nodes At SortedPos and after will contain the
  // count of outstanding operands.
  for (allnodes_iterator I = allnodes_begin(),E = allnodes_end(); I != E; ) {
    SDNode *N = &*I++;
    checkForCycles(N, this);
    unsigned Degree = N->getNumOperands();
    if (Degree == 0) {
      // A node with no uses, add it to the result array immediately.
      N->setNodeId(DAGSize++);
      allnodes_iterator Q(N);
      if (Q != SortedPos)
        SortedPos = AllNodes.insert(SortedPos, AllNodes.remove(Q));
      assert(SortedPos != AllNodes.end() && "Overran node list");
      ++SortedPos;
    } else {
      // Temporarily use the Node Id as scratch space for the degree count.
      N->setNodeId(Degree);
    }
  }

  // Visit all the nodes. As we iterate, move nodes into sorted order,
  // such that by the time the end is reached all nodes will be sorted.
  for (SDNode &Node : allnodes()) {
    SDNode *N = &Node;
    checkForCycles(N, this);
    // N is in sorted position, so all its uses have one less operand
    // that needs to be sorted.
    for (SDNode::use_iterator UI = N->use_begin(), UE = N->use_end();
         UI != UE; ++UI) {
      SDNode *P = *UI;
      unsigned Degree = P->getNodeId();
      assert(Degree != 0 && "Invalid node degree");
      --Degree;
      if (Degree == 0) {
        // All of P's operands are sorted, so P may sorted now.
        P->setNodeId(DAGSize++);
        if (P->getIterator() != SortedPos)
          SortedPos = AllNodes.insert(SortedPos, AllNodes.remove(P));
        assert(SortedPos != AllNodes.end() && "Overran node list");
        ++SortedPos;
      } else {
        // Update P's outstanding operand count.
        P->setNodeId(Degree);
      }
    }
    if (Node.getIterator() == SortedPos) {
#ifndef NDEBUG
      allnodes_iterator I(N);
      SDNode *S = &*++I;
      dbgs() << "Overran sorted position:\n";
      S->dumprFull(this); dbgs() << "\n";
      dbgs() << "Checking if this is due to cycles\n";
      checkForCycles(this, true);
#endif
      llvm_unreachable(nullptr);
    }
  }

  return DAGSize;
}
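The two-pass scheme of AssignTopologicalOrder() is essentially a topological sort driven by operand counts. The following sketch is a simplified standalone model, not the LLVM code: it keeps the sorted nodes in a separate vector instead of reordering an intrusive list, and the name assignTopologicalOrder and the index-based DAG representation are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// operandsOf[i] lists the operand nodes of node i.
// Returns nodeId, where nodeId[i] is the position of node i after sorting.
std::vector<int> assignTopologicalOrder(
    const std::vector<std::vector<int>>& operandsOf) {
  const int n = static_cast<int>(operandsOf.size());

  std::vector<std::vector<int>> users(n);
  for (int i = 0; i < n; ++i)
    for (int op : operandsOf[i])
      users[op].push_back(i);

  // Pass 1: nodes without operands are sorted immediately; the others
  // temporarily store their operand count.
  std::vector<int> pending(n);
  std::vector<int> sorted;  // the sorted prefix; its end acts as SortedPos
  for (int i = 0; i < n; ++i) {
    pending[i] = static_cast<int>(operandsOf[i].size());
    if (pending[i] == 0)
      sorted.push_back(i);
  }

  // Pass 2: walk the sorted prefix while it grows. Each sorted node
  // releases one outstanding operand of each of its users.
  for (std::size_t pos = 0; pos < sorted.size(); ++pos)
    for (int user : users[sorted[pos]])
      if (--pending[user] == 0)
        sorted.push_back(user);

  assert(static_cast<int>(sorted.size()) == n && "cycle in the graph");

  std::vector<int> nodeId(n);
  for (int pos = 0; pos < n; ++pos)
    nodeId[sorted[pos]] = pos;
  return nodeId;
}
```

Unlike the Worklist in DAGTypeLegalizer::run(), which is popped from the back, the sorted prefix here is consumed front to back, so nodes are numbered in the order in which they became ready.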

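Note that legalize type starts from the nodes with no operands and always takes the next node from the back of the worklist, so it proceeds in a DFS-like manner. Legalize operation sorts first and then walks the sorted list node by node (the loop in Legalize() above starts from the end of the list), which is closer to BFS. Why do the two use different algorithms?
The excerpt of SelectionDAGLegalize::LegalizeOp() below shows that legalizing a single node is largely similar to legalize type; one difference is that legalize operation has an additional custom action.
The custom action calls TargetLowering::LowerOperation() to run target-defined legalization, and the returned value replaces the original node. Note that in the excerpt below an empty SDValue returned by the hook is not used as a replacement: the code falls through to the Expand path, which may end in an error if no default expansion exists for that node. A target that marks an operation Custom should therefore return a valid node for it.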

void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
  ......

  // Figure out the correct action; the way to query this varies by opcode
  TargetLowering::LegalizeAction Action = TargetLowering::Legal;
  bool SimpleFinishLegalizing = true;
  switch (Node->getOpcode()) {
  ......
  default:
    if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
      Action = TargetLowering::Legal;
    } else {
      Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
    }
    break;
  }

  if (SimpleFinishLegalizing) {
    SDNode *NewNode = Node;

    switch (Action) {
    case TargetLowering::Legal:
      LLVM_DEBUG(dbgs() << "Legal node: nothing to do\n");
      return;
    case TargetLowering::Custom:
      LLVM_DEBUG(dbgs() << "Trying custom legalization\n");
      // FIXME: The handling for custom lowering with multiple results is
      // a complete mess.
      if (SDValue Res = TLI.LowerOperation(SDValue(Node, 0), DAG)) {
        if (!(Res.getNode() != Node || Res.getResNo() != 0))
          return;

        if (Node->getNumValues() == 1) {
          LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
          // We can just directly replace this node with the lowered value.
          ReplaceNode(SDValue(Node, 0), Res);
          return;
        }

        SmallVector<SDValue, 8> ResultVals;
        for (unsigned i = 0, e = Node->getNumValues(); i != e; ++i)
          ResultVals.push_back(Res.getValue(i));
        LLVM_DEBUG(dbgs() << "Successfully custom legalized node\n");
        ReplaceNode(Node, ResultVals.data());
        return;
      }
      LLVM_DEBUG(dbgs() << "Could not custom legalize node\n");
      LLVM_FALLTHROUGH;
    case TargetLowering::Expand:
      if (ExpandNode(Node))
        return;
      LLVM_FALLTHROUGH;
    case TargetLowering::LibCall:
      ConvertNodeToLibcall(Node);
      return;
    case TargetLowering::Promote:
      PromoteNode(Node);
      return;
    }
  }

  switch (Node->getOpcode()) {
  default:
    llvm_unreachable("Do not know how to legalize this operator!");

  case ISD::CALLSEQ_START:
  case ISD::CALLSEQ_END:
    break;
  case ISD::LOAD:
    return LegalizeLoadOps(Node);
  case ISD::STORE:
    return LegalizeStoreOps(Node);
  }
}
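The control flow of the switch on Action, in particular the fall-through from Custom to Expand to LibCall, can be summarized with a toy dispatcher. This is a standalone sketch and not LLVM code: ToyHooks and legalizeOne are hypothetical, and the two booleans stand in for "LowerOperation returned a non-empty SDValue" and "ExpandNode returned true".

```cpp
#include <cassert>
#include <string>

enum class Action { Legal, Promote, Expand, LibCall, Custom };

// Hypothetical stand-ins for the outcome of the target and generic hooks.
struct ToyHooks {
  bool customSucceeds;  // LowerOperation produced a replacement
  bool expandSucceeds;  // ExpandNode handled the node
};

// Returns a label describing which path ended up legalizing the node.
std::string legalizeOne(Action action, const ToyHooks& hooks) {
  switch (action) {
  case Action::Legal:
    return "unchanged";
  case Action::Custom:
    if (hooks.customSucceeds)
      return "custom";
    // fall through: no custom result, try the generic expansion
  case Action::Expand:
    if (hooks.expandSucceeds)
      return "expanded";
    // fall through: expansion failed, use a library call
  case Action::LibCall:
    return "libcall";
  case Action::Promote:
    return "promoted";
  }
  return "unreachable";
}
```

A Custom node whose hook produces nothing is therefore treated like an Expand node, and an Expand node that cannot be expanded is treated like a LibCall node.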

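Take RISCV as an example. Because of instruction-set limits (a 32-bit immediate cannot be encoded in a single instruction), moving a global address into a register takes two instructions (lui+addi or auipc+addi), so RISCV provides its own legalization for it.
Earlier we saw RISCV mark the GlobalAddress node as Custom; the corresponding lowering is implemented in lib/Target/RISCV/RISCVISelLowering.cpp. Note that RISCV replaces the original node directly with machine-instruction nodes, which means lowering can interfere with instruction selection (some nodes have their instructions chosen early, during lowering).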

SDValue RISCVTargetLowering::LowerOperation(SDValue Op,
                                            SelectionDAG &DAG) const {
  switch (Op.getOpcode()) {
  default:
    report_fatal_error("unimplemented operand");
  case ISD::GlobalAddress:
    return lowerGlobalAddress(Op, DAG);
  ......
  }
}

SDValue RISCVTargetLowering::lowerGlobalAddress(SDValue Op,
                                                SelectionDAG &DAG) const {
  SDLoc DL(Op);
  EVT Ty = Op.getValueType();
  GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
  int64_t Offset = N->getOffset();
  MVT XLenVT = Subtarget.getXLenVT();

  const GlobalValue *GV = N->getGlobal();
  bool IsLocal = getTargetMachine().shouldAssumeDSOLocal(*GV->getParent(), GV);
  SDValue Addr = getAddr(N, DAG, IsLocal);

  // In order to maximise the opportunity for common subexpression elimination,
  // emit a separate ADD node for the global address offset instead of folding
  // it in the global address node. Later peephole optimisations may choose to
  // fold it back in when profitable.
  if (Offset != 0)
    return DAG.getNode(ISD::ADD, DL, Ty, Addr,
                       DAG.getConstant(Offset, DL, XLenVT));
  return Addr;
}

template <class NodeTy>
SDValue RISCVTargetLowering::getAddr(NodeTy *N, SelectionDAG &DAG,
                                     bool IsLocal) const {
  SDLoc DL(N);
  EVT Ty = getPointerTy(DAG.getDataLayout());

  if (isPositionIndependent()) {
    SDValue Addr = getTargetNode(N, DL, Ty, DAG, 0);
    if (IsLocal)
      // Use PC-relative addressing to access the symbol. This generates the
      // pattern (PseudoLLA sym), which expands to (addi (auipc %pcrel_hi(sym))
      // %pcrel_lo(auipc)).
      return SDValue(DAG.getMachineNode(RISCV::PseudoLLA, DL, Ty, Addr), 0);

    // Use PC-relative addressing to access the GOT for this symbol, then load
    // the address from the GOT. This generates the pattern (PseudoLA sym),
    // which expands to (ld (addi (auipc %got_pcrel_hi(sym)) %pcrel_lo(auipc))).
    return SDValue(DAG.getMachineNode(RISCV::PseudoLA, DL, Ty, Addr), 0);
  }

  switch (getTargetMachine().getCodeModel()) {
  default:
    report_fatal_error("Unsupported code model for lowering");
  case CodeModel::Small: {
    // Generate a sequence for accessing addresses within the first 2 GiB of
    // address space. This generates the pattern (addi (lui %hi(sym)) %lo(sym)).
    SDValue AddrHi = getTargetNode(N, DL, Ty, DAG, RISCVII::MO_HI);
    SDValue AddrLo = getTargetNode(N, DL, Ty, DAG, RISCVII::MO_LO);
    SDValue MNHi = SDValue(DAG.getMachineNode(RISCV::LUI, DL, Ty, AddrHi), 0);
    return SDValue(DAG.getMachineNode(RISCV::ADDI, DL, Ty, MNHi, AddrLo), 0);
  }
  case CodeModel::Medium: {
    // Generate a sequence for accessing addresses within any 2GiB range within
    // the address space. This generates the pattern (PseudoLLA sym), which
    // expands to (addi (auipc %pcrel_hi(sym)) %pcrel_lo(auipc)).
    SDValue Addr = getTargetNode(N, DL, Ty, DAG, 0);
    return SDValue(DAG.getMachineNode(RISCV::PseudoLLA, DL, Ty, Addr), 0);
  }
  }
}
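For the small code model, the (addi (lui %hi(sym)) %lo(sym)) pattern splits a 32-bit address into a 20-bit upper part and a signed 12-bit lower part. The sketch below works through that arithmetic in plain C++. It assumes the usual RISC-V convention that addi sign-extends its 12-bit immediate, so the upper part is rounded by adding 0x800; the names HiLo, splitAddress and materialize are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two immediates of a lui+addi pair.
struct HiLo {
  std::uint32_t hi20;  // immediate for lui (upper 20 bits, rounded)
  std::int32_t lo12;   // immediate for addi (signed 12-bit)
};

HiLo splitAddress(std::uint32_t addr) {
  // Adding 0x800 compensates for addi sign-extending the low 12 bits.
  std::uint32_t hi20 = (addr + 0x800u) >> 12;
  std::int32_t lo12 = static_cast<std::int32_t>(addr & 0xFFFu);
  if (lo12 >= 0x800)
    lo12 -= 0x1000;  // reinterpret the 12-bit field as signed
  return HiLo{hi20, lo12};
}

// Models what the two instructions compute in a 32-bit register.
std::uint32_t materialize(HiLo parts) {
  std::uint32_t reg = parts.hi20 << 12;                // lui rd, hi20
  return reg + static_cast<std::uint32_t>(parts.lo12); // addi rd, rd, lo12
}
```

For the address 0x12345FFF the low part is -1, so the upper part becomes 0x12346 rather than 0x12345, and adding the two back together restores the original address.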

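Summary:

  1. What is legalize for? It converts data types and operations the target does not support into ones it does, guaranteeing that instruction selection always has a pattern to match.
  2. What does legalize cover? Type legalize and operation legalize, which legalize data types (which kind of register a value maps to) and node types (which instruction a node maps to) respectively.
  3. Which backend interfaces affect legalize? The callbacks in TargetLoweringBase.
  4. How to track down a legalize problem? Print the DAG at each stage and check whether the DAG after each phase is legal for the target. Note that each kind of legalization must be implemented in its own phase; the order must not be mixed up.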

