clang9 Adaptation: Phase 1 Summary


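1. Overview

As of November 25, 2021, the sdk, gtest, and dsopt modules compile with clang9.

Using the script below, I downloaded every change related to [TR-16607] "clang9 cross-compilation toolchain creation and verification" (Enflame Company JIRA), including both merged changes and changes that are still open:

How to batch-export detailed patches from gerrit - 周榮華_Ronghua - enflame wiki

Note that the gerrit query command cannot contain parentheses. When several conditions are combined, they are ANDed by default; to get OR semantics, join the alternative expressions explicitly with OR.

A rough count shows 3924 lines added and 4164 lines deleted: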

PS D:\code> grep "^+[^+]" .\diffrecord.txt |wc
   3924   24785  152346
PS D:\code> grep "^-[^-]" .\diffrecord.txt |wc
   4164   23159  147430

 

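The early changes were made with -Werror enabled, so some of them address fairly minor warnings. Because there were simply too many warnings, -Werror was later turned off temporarily, and only a selected set of -Werror=<warning> options was kept.

In addition, because the code under tops contains a great many implicit conversions from wider to narrower integer types, -Wno-c++11-narrowing was later added to suppress the related diagnostics temporarily.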

 

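2. Methods for finding and fixing problems

Running a full build after every individual fix is usually very time-consuming. The methods below retrieve a single compile or link command so that a fix can be verified in isolation.

2.1. Getting compile commands from cmake

cmake can emit a compilation database: a "compile_commands.json" file is generated in the cmake_build directory (the directory where the cmake command was run, which may be named differently), provided CMAKE_EXPORT_COMPILE_COMMANDS is enabled. It records the working directory and the full command used to build every .c/.cc/.cpp file into a .o file. For example, the compile command for hlir_utils_test.cc can be found like this: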
grep hlir_utils_test.cc compile_commands.json
  "command": "/opt/efb/clang9/bin/clang++  -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING -D_GLIBCXX_USE_CXX11_ABI=0 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include/dtu -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib/umd/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/ef_log/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/sdk -I/home/ronghua.zhou/clang1_build/tops/sdk/lib -I/home/ronghua.zhou/clang1_build/tops/sdk/lib/cpu_ops -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/mlir/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/eigen_archive -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_absl -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_protobuf/src -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/dtu_sdk/bazel-bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/llvm-project/llvm/utils/unittest/googlemock/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/com_googlesource_code_re2 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib 
-I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/mlir/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest  -O3 -g0 -DNDEBUG -fPIE   -m64 -march=x86-64 -mtune=generic -Werror=array-bounds -Werror=empty-body -Werror=format-extra-args -Werror=incompatible-pointer-types -Werror=array-bounds-pointer-arithmetic -Werror=c++-compat -Werror=shift-count-overflow -Werror=sizeof-pointer-memaccess -Werror=for-loop-analysis -Werror=unused-label -Werror=delete-incomplete -Werror=empty-translation-unit -Werror=unused-local-typedef -Werror=gnu-case-range -Werror=mismatched-new-delete -Werror=infinite-recursion -Werror=unreachable-code -Werror=sometimes-uninitialized -Werror=c++14-binary-literal -Werror=implicit-fallthrough -Werror=constant-logical-operand -Werror=exceptions -fcxx-exceptions -Werror=extra-tokens -Werror=format -Werror=format-security -Werror=header-guard -Werror=literal-conversion -Werror=null-conversion -Werror=pointer-bool-conversion -Werror=shift-overflow -Werror=tautological-constant-out-of-range-compare -Werror=tautological-pointer-compare -Werror=varargs -Wdouble-promotion -Wno-error=extern-c-compat -Wall -Wno-c++11-narrowing -Wextra -fsanitize=address -fno-omit-frame-pointer -std=gnu++14 -std=gnu++14 -o sdk/tests/hlir/cc_tests/CMakeFiles/hlir_utils_test.dir
hlir_utils_test.cc.o -c /home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc",
  "file": "/home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc"

 

 

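2.2. Getting compile commands from bazel

https://github.com/vincent-picaud/Bazel_and_CompileCommands

The open-source project above suggests adding --experimental_action_listener=//tools/actions:generate_compile_commands_listener to the bazel command to capture compile commands, but I tried it several times without success. I ended up using plain ps during the build instead. For example, to get the compile command for hlir_utils_test.cc: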

ps -elf |grep hlir_utils_test.cc

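Alternatively, appending -s to the bazel command also prints the commands it runs from then on.

2.3. Getting link commands

If the specific link target is known, the ps approach from 2.2 works as well. For example, to get the command that links libdtu_sdk.so:

ps -elf |grep libdtu_sdk.so

If the exact target is not known and there are not many link jobs, run "ps -elf" to get the full process list. It will show many "ld @/tmp/response-xxx.txt" processes. Copy all current /tmp/response* files to another directory and inspect them to work out which target each one links. These files contain the complete link command and arguments, so the link command can be recovered from them.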

 

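3. Categories of actual changes

3.1. Compiler option changes

3.1.1. Options added

-fcxx-exceptions: dsopt uses exceptions, and exception handling is off by default in the clang setup used here, so it has to be turned on.

-Wno-c++11-narrowing: the code under tops contains a great many implicit conversions from wider to narrower integer types. clang reports such narrowing conversions (inside braced initializers) as errors by default, so the check is disabled for now and will be re-enabled once each component has fixed its occurrences.

3.1.2. Options removed

-Werror: there are far too many warnings for eliminating all of them to be realistic, so this option is removed for now.

3.1.3. Options changed

set (CMAKE_CXX_STANDARD 14): the default standard used to be 17, which conflicts with TensorFlow's default of 14 and with gcc's default of 14, so it was changed to C++14.

-fno-canonical-system-headers: only gcc supports this flag and clang does not, so it is now enabled for gcc only instead of for all compilers.

3.1.4. Notes on bazel options

bazel compile options come in three kinds: copt, cxxopt, and conlyopt. copt applies to both C and C++, cxxopt applies to C++ only, and conlyopt applies to C only. Using the wrong one produces many warnings.

3.1.5. On a cmake rerun, CMAKE_TOOLCHAIN_FILE is sometimes expanded to the full path of the toolchain file found on the search path, so a direct STREQUAL check fails

The fix is to replace STREQUAL with MATCHES, so the check passes whether or not the full path has been prepended: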

CMakeLists.txt

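3.2. Template-related errors

3.2.1. use 'template' keyword to treat 'cast' as a dependent template name

In clang, when a member function template is called on an object whose type depends on a template parameter, the call must be marked explicitly with the template keyword; otherwise compilation fails. See ISO C++03 14.2/4: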

When the name of a member template specialization appears after . or -> in a postfix-expression, or after nested-name-specifier in a qualified-id, and the postfix-expression or qualified-id explicitly depends on a template-parameter (14.6.2), the member template name must be prefixed by the keyword template. Otherwise the name is assumed to name a non-template.

 

For example, the SinkTransposeWithScalarBroadcast class in hlir calls cast<mlir::RankedTensorType>() and cast<mlir::ShapedType>():

 
diff --git a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
index c82fa217a21..9952ddbc470 100644
--- a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
+++ b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
@@ -237,11 +237,14 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
     }
     llvm::SmallVector<mlir::Value, 4> new_operands(root->getNumOperands(), {});
     for (auto& it : broadcast_ops) {
-      auto transposedTy = getTransposedType(std::get<1>(it)
-                                                ->getResult(0)
-                                                .getType()
-                                                .cast<mlir::RankedTensorType>(),
-                                            prePermutation);
+      // fix error:
+      // use 'template' keyword to treat 'cast' as a dependent template name
+      auto transposedTy =
+          getTransposedType(std::get<1>(it)
+                                ->getResult(0)
+                                .getType()
+                                .template cast<mlir::RankedTensorType>(),
+                            prePermutation);
       auto new_attr = llvm::cast<HlirOp::BroadcastInDimOp>(std::get<1>(it))
                           .broadcast_dimensionsAttr();
       if (new_attr) {
@@ -251,7 +254,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
           new_data[i] = layout[data[i]];
         }
         new_attr = mlir::DenseIntElementsAttr::get(
-            new_attr.getType().cast<mlir::RankedTensorType>(),
+            new_attr.getType().template cast<mlir::RankedTensorType>(),
             llvm::makeArrayRef(new_data));
       }
       mlir::Operation* transpose_bs_op =
@@ -274,7 +277,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
     mlir::Operation* ret_transpose = rewriter.create<HlirOp::TransposeOp>(
         root->getLoc(), root->getResult(0).getType(), new_root->getResult(0),
         mlir::DenseIntElementsAttr::get(
-            permutation.getType().cast<mlir::ShapedType>(), layout));
+            permutation.getType().template cast<mlir::ShapedType>(), layout));
     root->replaceAllUsesWith(ret_transpose);
   }

 

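Note that template is not needed when the expression does not depend on a template parameter, and the same class contains calls that need no change. For example, the ss object in the same file has the fixed, non-dependent type mlir::Operation*, so calling the cast member template through ss does not require the template prefix: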

 
mlir::Operation* ss = op.getOperation();
auto new_operand_ty = getTransposedType(operand_ty, prePermutation);
auto new_source_ty = getTransposedType(source_ty, prePermutation);
auto new_result_ty = getTransposedType(
    ss->getResult(0).getType().cast<mlir::RankedTensorType>(),
    prePermutation);
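
The rule can be reproduced without MLIR. The following is a minimal sketch, not the project's code: Value, Holder, dependentCall, and nonDependentCall are hypothetical names chosen for illustration, and Value::cast only loosely imitates the cast<U>() member template from the diff.

```cpp
// Hypothetical stand-in for a type that has a member function template.
struct Value {
  int raw;

  template <typename U>
  U cast() const {
    return static_cast<U>(raw);
  }
};

// Hypothetical wrapper whose member type depends on T.
template <typename T>
struct Holder {
  T value;
};

// h.value has the dependent type T, so while parsing this template the
// compiler cannot know that `cast` names a template. Without `.template`
// clang reports: use 'template' keyword to treat 'cast' as a dependent
// template name.
template <typename T>
long dependentCall(const Holder<T>& h) {
  return h.value.template cast<long>();
}

// v has the fixed type Value, so no `template` keyword is needed here.
inline long nonDependentCall(const Value& v) {
  return v.cast<long>();
}
```

Calling dependentCall(Holder<Value>{Value{42}}) and nonDependentCall(Value{42}) both yield 42; only the first spelling needs the keyword.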

 

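A related problem exists in factor_profiler_pass.cc of the factor module, in the opposite direction: there the template keyword had been placed before member functions that are not templates, so the diff removes it: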

diff --git a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
index 43419fd305a..ad23a709f20 100644
--- a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
+++ b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
@@ -55,11 +55,11 @@ mlir::Value getFirstOperand<mlir::Value>(mlir::Value op) {
  
 template <typename T>
 int getSrcCompressed(T op) {
-  return op.template dma_src_compressedAttr().getInt();
+  return op.dma_src_compressedAttr().getInt();
 }
 template <typename T>
 int getDstDecompressed(T op) {
-  return op.template dma_dst_decompressAttr().getInt();
+  return op.dma_dst_decompressAttr().getInt();
 }
  
 #define DISABLE_DMA_COMPRESS_ATTR_GETTER(OP) \
@@ -84,11 +84,11 @@ DISABLE_DMA_COMPRESS_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template <typename T>
 int getReverseLr(T op) {
-  return op.template dma_reverse_lrAttr().getInt();
+  return op.dma_reverse_lrAttr().getInt();
 }
 template <typename T>
 int getReverseTb(T op) {
-  return op.template dma_reverse_tbAttr().getInt();
+  return op.dma_reverse_tbAttr().getInt();
 }
  
 #define DISABLE_REVERSE_ATTR_GETTER(OP) \
@@ -114,7 +114,7 @@ DISABLE_REVERSE_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template <typename T>
 int getDmaType(T op) {
-  return op.template dma_typeAttr().getInt();
+  return op.dma_typeAttr().getInt();
 }
  
 #define DISABLE_DMA_TYPE_GETTER(OP) \
@@ -142,8 +142,8 @@ std::string formatDmaAttrs(int direction, int src_compressed,
 template <typename T>
 void extractDmaMetaInfoTo(T op, dtu_activity_data &data) {
   auto &args = data.args;
-  mlir::Value from = getFirstOperand(op.template from());
-  mlir::Value to = getFirstOperand(op.template to());
+  mlir::Value from = getFirstOperand(op.from());
+  mlir::Value to = getFirstOperand(op.to());
   auto engine_type = getDmaType(op);
   auto direction = op.dma_directionAttr().getInt();

 

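3.2.2. Ambiguity

When instantiating some templates, if a call matches both function template A and function template B, clang reports an ambiguity error while gcc does not.

Take EraseHelp below. The original version declared two overloads. When the second TypeSequence argument is matched against them, the compiler cannot tell whether it should peel off Last or peel off Inner first. The two overloads implement different logic, yet gcc surprisingly reports no error; it is unclear whether it calls whichever matching overload it finds, or always the first or the last one.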

constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>);

constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>);
diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 3cf2bc7994a..0e645fd1e7e 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -157,12 +157,9 @@ struct EraseSeqIf {
     using type = decltype(EraseHelp(LeftSeq(), TypeSequence<Right...>()));
     return type();
   }
-  template <typename... Left, typename Last>
-  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>) {
-    using type = typename std::conditional<!Pred<Last>::value,
-                                           TypeSequence<Left..., Last>,
-                                           TypeSequence<Left...>>::type;
-    return type();
+  template <typename... Left>
+  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<>) {
+    return TypeSequence<Left...>();
   }
   using type = decltype(EraseHelp(TypeSequence<>(), TypeSequence<T...>()));
 };
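
The fixed shape, with an empty TypeSequence<> as the only base case, can be sketched as free functions. This is a simplified re-creation under assumptions, not the contents of type_utils.h: eraseHelp, IsInt, and Filtered are hypothetical names, and C++14 return type deduction is assumed.

```cpp
#include <type_traits>

template <typename... T>
struct TypeSequence {};

// Hypothetical predicate used only for this illustration.
template <typename T>
struct IsInt : std::is_same<T, int> {};

// Base case: the right-hand sequence is empty, so Left... is the result.
// An empty sequence can never match the recursive overload below, which
// needs at least one element, so the two overloads never compete.
template <template <typename> class Pred, typename... Left>
constexpr auto eraseHelp(TypeSequence<Left...>, TypeSequence<>) {
  return TypeSequence<Left...>();
}

// Recursive case: peel Inner off the front and keep it unless Pred matches.
template <template <typename> class Pred, typename... Left, typename Inner,
          typename... Right>
constexpr auto eraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>) {
  using LeftSeq = typename std::conditional<!Pred<Inner>::value,
                                            TypeSequence<Left..., Inner>,
                                            TypeSequence<Left...>>::type;
  return eraseHelp<Pred>(LeftSeq(), TypeSequence<Right...>());
}

// Erasing every int from <int, float, int, char> leaves <float, char>.
using Filtered = decltype(eraseHelp<IsInt>(
    TypeSequence<>(), TypeSequence<int, float, int, char>()));
static_assert(std::is_same<Filtered, TypeSequence<float, char>>::value,
              "all int entries should be erased");
```

With a single-element base case, a one-element sequence matches both overloads (Right... can be empty), which is the situation clang rejected; the empty-sequence base case removes that overlap.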

 

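3.3. Type mismatches

3.3.1. Implicit conversion from a wider to a narrower integer type

For example, func_entry in sdk/tests/llir/dataflow1_pingpang_buffer_test.cc is declared as int64_t, but the function it is passed to takes a uint32_t parameter, which triggers an implicit int64_t → uint32_t conversion: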

diff --git a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
index fa824f03d9a..70298b1fb59 100644
--- a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
+++ b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
@@ -522,7 +522,7 @@ TEST(Pavo2xCDMAPattern1Test, Pavo2xCDMAPattern1WithPingpangTest) {
                              {{0}, {1}, {2}, {3}, {4}, {5}}, 1, 1, 1, -1, -1,
                              output_queues_l1);
  
-    int64_t func_entry = 0;
+    uint32_t func_entry = 0;
     // trigger sip
     for (uint64_t idx = 0; idx < SIP_COUNT; ++idx) {
       std::string sip_name = std::string("sip") + std::to_string(idx);

 

Other similar cases:

sdk/tests/llir/dataflow1_test.cc

sdk/tests/llir/dataflow2_test.cc

sdk/tests/llir/dataflow3_test.cc

sdk/tests/llir/dataflow5_test.cc

sdk/tests/llir/dataflow5_test_1xcdma.cc

sdk/tests/llir/dataflow7_test.cc

sdk/tests/llir/llir2assembler_leo_test.cc

sdk/tests/llir/utils/llir_test_util.cc

sdk/tests/llir/utils/llir_test_util.h
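
The diff above fixes the declared type of the variable. Where the type cannot change, an explicit cast is another option. A minimal sketch, assuming the value ends up inside a braced initializer (which is where clang's narrowing check applies); LaunchArgs and makeArgs are hypothetical names, not project code:

```cpp
#include <cstdint>

// Hypothetical aggregate with a 32-bit field.
struct LaunchArgs {
  std::uint32_t func_entry;
};

constexpr LaunchArgs makeArgs(std::int64_t func_entry) {
  // LaunchArgs{func_entry} would narrow int64_t to uint32_t inside braces,
  // which clang rejects unless -Wno-c++11-narrowing is passed.
  // The static_cast states the intent explicitly.
  return LaunchArgs{static_cast<std::uint32_t>(func_entry)};
}
```

The cast silences the diagnostic but does not check the range, so changing the declared type, as the diff does, is preferable when it is possible.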

 

3.3.2. Implicit signed-to-unsigned conversion

-1 converted to an unsigned integer type:

diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 0e645fd1e7e..f84360269f3 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -122,10 +122,9 @@ struct FindIf<Pred, T, R...> {
  
 template <template <typename N> typename Pred, typename T>
 struct FindIf<Pred, T> {
-  using type =
-      typename std::conditional<Pred<T>::value,
-                                std::integral_constant<size_t, 0>,
-                                std::integral_constant<size_t, -1>>::type;
+  using type = typename std::conditional<
+      Pred<T>::value, std::integral_constant<size_t, 0>,
+      std::integral_constant<size_t, static_cast<size_t>(-1)>>::type;
 };
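
The effect of the cast can be checked in isolation. A small sketch; NotFound is a hypothetical alias name:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// A non-type template argument must be a converted constant expression,
// and converting -1 to an unsigned type there is a narrowing conversion.
// The static_cast performs the wrap-around explicitly.
using NotFound =
    std::integral_constant<std::size_t, static_cast<std::size_t>(-1)>;

// The result is the largest size_t value, the usual "not found" marker.
static_assert(NotFound::value == SIZE_MAX, "-1 wraps to the maximum size_t");
```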

 

Most other cases involve a loop counter declared as int that is compared against uint32_t values, which causes an implicit int → uint32_t conversion:

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
   dtu_mem_handle param = cluster_mem[cid];
   u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[8];
-  for (int i = 0; i < run_sip_count; i++) {
+  for (u32 i = 0; i < run_sip_count; i++) {
     parameter[i] =
         LaunchKernelParameter(sip[i], param, param_off + i * ONE_PARAM_SIZE,
                               param_size, 0, mode, 0, false, false, "op_0");

 

Other files:

sdk/lib/spm/src/buddy_policy.c

system_test/tools/vpd_cycle/vpd_cycle.c

sdk/lib/spm/include/spm.h

sdk/tests/llir/llir2assembler_leo_test.cc

sdk/tests/llir/dataflow5_test_1xcdma.cc

sdk/tests/llir/dataflow5_test.cc

sdk/tests/llir/utils/llir_test_util.cc

sdk/tests/llir/utils/llir_test_util.h

 

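The change to sdk/lib/umd/tools/kernel_code_processor/dturt.inc is a bit more involved. ___leo_runtime___ and ___x_runtime___ were defined as char[], but some initializer values can exceed 127 and overflow. The functions that use these variables, and the functions those call in turn, all require char[]. The final fix changes the definitions to unsigned char[] and adds one cast in the function that references them directly.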

diff --git a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
index 1f22b52d8af..d1ed30a049d 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
+++ b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
@@ -1,4 +1,4 @@
-static const char ___leo_runtime___[] = {
+static const unsigned char ___leo_runtime___[] = {
     0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
     0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
     0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
@@ -2885,7 +2885,7 @@ static const char ___leo_runtime___[] = {
     0x00, 0x00,
 };
 static const int ___leo_runtime_size___ = sizeof(___leo_runtime___);
-static const char ___x_runtime___[] = {
+static const unsigned char ___x_runtime___[] = {
     0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
     0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
     0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,

 

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
index d193a8823ac..f61d048cfd6 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
@@ -128,7 +128,10 @@ class Kernel {
   struct __target__ : public KernelCode<__target__>, public Kernel {         \
     using KernelCode<__target__>::KernelCode;                                \
     static const llvm::StringRef GetArch() { return #__arch__; }             \
-    static const char* GetRTBuffer() { return ___##__arch__##_runtime___; }  \
+    static const char* GetRTBuffer() {                                       \
+      return static_cast<char*>(static_cast<void*>(                          \
+          const_cast<unsigned char*>(___##__arch__##_runtime___)));          \
+    }                                                                        \
     static int GetRTBufferSize() { return ___##__arch__##_runtime_size___; } \
   };                                                                         \
   template class KernelCode<__target__>
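
As an aside, the three-step cast in GetRTBuffer could probably be written as a single reinterpret_cast that also keeps the const qualifier. This is only a sketch of that alternative, not what was merged; kBlob, blobAsChars, and blobSize are hypothetical names:

```cpp
#include <cstddef>

// Hypothetical blob containing a byte above 127, standing in for the
// ___leo_runtime___ array.
static const unsigned char kBlob[] = {0x21, 0x3C, 0xE9, 0x00};

// Viewing the bytes of an object through a char pointer is permitted, and
// going from const unsigned char* to const char* drops no qualifier, so no
// const_cast is involved.
inline const char* blobAsChars() {
  return reinterpret_cast<const char*>(kBlob);
}

inline std::size_t blobSize() { return sizeof(kBlob); }
```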

 

 

 

 

3.3.3. Implicit floating-point-to-integer conversion

The fractional part is simply dropped, so a non-zero value immediately becomes 0:

diff --git a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
index 73712aba4ad..df82dadfa65 100644
--- a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
+++ b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
@@ -1890,7 +1890,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_UV) {
   int k_c = 1;
   int k_h = 3;
   int k_w = 3;
-  int epsilon = 0.01;
+  float epsilon = 0.01;
  
   int input_size = n * c * h * w;
   int kernel_size = k_n * k_c * k_h * k_w;
@@ -2181,7 +2181,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_SV) {
   int k_c = 1;
   int k_h = 3;
   int k_w = 3;
-  int epsilon = 0.01;
+  float epsilon = 0.01;
  
   int input_size = n * c * h * w;
   int kernel_size = k_n * k_c * k_h * k_w;

 

Other similar changes:

sdk/tests/op/hlir/pavo/bert/hlir_div_test.cc
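
The truncation is easy to confirm at compile time. A tiny sketch with hypothetical constant names:

```cpp
// What `int epsilon = 0.01;` effectively stored: the fraction is dropped.
constexpr int kTruncatedEpsilon = static_cast<int>(0.01);

// What the fixed code stores.
constexpr float kEpsilon = 0.01f;

static_assert(kTruncatedEpsilon == 0, "int epsilon silently became 0");
static_assert(kEpsilon > 0.0f, "float epsilon keeps its value");
```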

 

3.3.4. Implicit double-to-float conversion

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -783,8 +779,8 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
     float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
     for (u32 i = 0; i < (run_sip_count * DTU_ALIGN(DATA_BUFF_SIZE, 128)) / 4;
          i++) {
-      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
-          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
+      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
+          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
         dtu_command_queue_destroy(queue);
         dtu_mem_free_hbm(hbm_mem);
         dtu_mem_free_host(host_mem);
@@ -425,7 +425,7 @@ static void launch_code_for_one_sip(void) {
  
   float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
   for (int i = 0; i < DATA_BUFF_SIZE / 4; i++) {
-    if (result[i] - (2 * i) > 0.01 || (2 * i) - result[i] > 0.01) {
+    if (result[i] - (2 * i) > 0.01f || (2 * i) - result[i] > 0.01f) {
       dtu_command_queue_destroy(queue);
       dtu_mem_free_hbm(hbm_mem);
       dtu_mem_free_host(host_mem);
@@ -605,8 +605,8 @@ static void launch_one_sip_twice(void) {
  
   float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
   for (int i = 0; i < 2 * DATA_BUFF_SIZE / 4; i++) {
-    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
-        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
+    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
+        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
       dtu_command_queue_destroy(queue);
       dtu_mem_free_hbm(hbm_mem);
       dtu_mem_free_host(host_mem);

 

Other similar changes:

sdk/tests/op/hlir/pavo/resnet50/hlir_general_resize_test.cc
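
The repeated two-sided comparison in the diffs above could also be folded into one helper that stays in float throughout. A sketch only; withinTolerance is a hypothetical name and the 0.01f default simply mirrors the literal used in the tests:

```cpp
#include <cmath>

// 0.01 is a double literal, so comparing a float against it promotes the
// float operand to double; 0.01f keeps the whole comparison in float.
static_assert(static_cast<double>(0.01f) != 0.01,
              "the float and double literals are different values");

// Hypothetical helper replacing `a - b > tol || b - a > tol`.
inline bool withinTolerance(float actual, float expected, float tol = 0.01f) {
  return std::fabs(actual - expected) <= tol;
}
```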

3.3.5. Implicit pointer-to-bool conversion

diff --git a/system_test/tools/vpd_cycle/vpd_cycle.c b/system_test/tools/vpd_cycle/vpd_cycle.c
index 31d57fa0f9c..ccc9f71b827 100644
--- a/system_test/tools/vpd_cycle/vpd_cycle.c
+++ b/system_test/tools/vpd_cycle/vpd_cycle.c
@@ -75,14 +83,14 @@ static int ProcessDB(const char *path) {
   char *name = strdup(path);
   char *base = basename(name);
   char *p;
-  if (p = strrchr(base, '.')) *p = '\0';
+  if ((p = strrchr(base, '.')) != NULL) *p = '\0';
   fprintf(output_fp, "%s,%lu\n", base, end - start);
   free(name);
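
The same pattern in a self-contained form, as a sketch; stripExtension is a hypothetical helper, not a function from vpd_cycle.c:

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: cut the name at its last '.' if there is one.
inline std::string stripExtension(std::string base) {
  char* p = nullptr;
  // The assignment is parenthesised and compared with nullptr explicitly,
  // so the condition no longer relies on an implicit pointer-to-bool
  // conversion of the assignment result.
  if ((p = std::strrchr(&base[0], '.')) != nullptr) {
    *p = '\0';
  }
  // Rebuild the string up to the terminator written above.
  return std::string(base.c_str());
}
```

For example, stripExtension("a.b.db") gives "a.b", and a name without a dot is returned unchanged.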

 

 

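3.3.6. Implicit conversion between different types

fixed_size_mem_pool.h assigns between dtu_status and int directly. Although dtu_status is an enum and closely resembles int, clang checks types strictly and reports an error.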
diff --git a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
index 73d02f3b1f4..b7be6ee39c4 100644
--- a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
+++ b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
@@ -118,7 +118,7 @@ class DeviceFixedSizeMemPool final
   ~DeviceFixedSizeMemPool() {}
  
   Status Init(dtu_umd::MemoryMgr *mgr, uint32_t mc, uint32_t flags) override {
-    dtu_status status = 0;
+    dtu_status status = DTU_SUCCESS;
     status =
         mgr->AllocDevice(NODE_NUMBER * NODE_SIZE, mc, flags, &(this->mem_));
     if (status) {
--

 

Definition of dtu_status:

 
typedef enum dtu_status_code {
  DTU_SUCCESS = 0,
  DTU_ERROR_INVALID_PARAMETER = -100,
  DTU_ERROR_INVALID_MEM_TYPE = -101,
  DTU_ERROR_OUT_OF_MEMORY = -102,
  DTU_ERROR_OUT_OF_RESOURCES = -103,
  DTU_ERROR_NOT_INITIALIZED = -104,
  DTU_ERROR_INVALID_CTX_OBJ = -105,
  DTU_ERROR_INVALID_CLUSTER_OBJ = -106,
  DTU_ERROR_INVALID_SIP_OBJ = -107,
  DTU_ERROR_INVALID_MEM_OBJ = -108,
  DTU_ERROR_INVALID_CMD_QUEUE_OBJ = -109,
  DTU_ERROR_INVALID_CMD_DESC_OBJ = -110,
  DTU_ERROR_INVALID_PROGRAM_OBJ = -111,
  DTU_ERROR_INVALID_FUNCTION_OBJ = -112,
  DTU_ERROR_INVALID_EVENT_OBJ = -113,
  DTU_ERROR_CLUSTER_BUSY = -114,
  DTU_ERROR_SIP_BUSY = -115,
  DTU_ERROR_IN_DRM = -116,
  DTU_ERROR_IN_IOCTRL = -117,
  DTU_ERROR_GEM_CREATE = -118,
  DTU_ERROR_GEM_CLOSE = -119,
  DTU_ERROR_GEM_MMAP = -120,
  DTU_ERROR_GEM_UNMMAP = -121,
  DTU_ERROR_CMD_QUEUE_SYNC = -122,
  DTU_ERROR_CMD_QUEUE_EMIT = -123,
  DTU_ERROR_CLUSTER_ACQUIRE = -124,
  DTU_ERROR_CLUSTER_RELEASE = -125,
  DTU_ERROR_NOT_MATCH = -126,
  DTU_ERROR_NOT_RELEASE_REF = -127,
  DTU_ERROR_GET_DEVICE_HDL = -128,
  DTU_ERROR_ALLOC_HOST = -129,
  DTU_ERROR_ALLOC_HBM = -130,
  DTU_ERROR_ALLOC_CLUSTER = -131,
  DTU_ERROR_FREE_HOST = -132,
  DTU_ERROR_FREE_HBM = -133,
  DTU_ERROR_FREE_CLUSTER = -134,
  DTU_ERROR_CMD_QUEUE_EMITED = -135,
  DTU_ERROR_OPEN_FILE = -136,
  DTU_ERROR_READ_FILE = -137,
  DTU_ERROR_WRITE_FILE = -138,
  DTU_ERROR_INVALID_BIN_TYPE = -139,
  DTU_ERROR_LOAD_BIN_FILE = -140,
  DTU_ERROR_LOAD_BIN_IMAGE = -141,
  DTU_ERROR_FUNCTION_NOT_FOUND = -142,
  DTU_ERROR_INVALID_OPERATION = -143,
  DTU_ERROR_EVENT_GET_ID = -144,
  DTU_ERROR_EVENT_WAIT_STATUS = -145,
  DTU_ERROR_EVENT_SIGNAL_STATUS = -146,
  DTU_ERROR_EVENT_TYPE = -147,
  DTU_ERROR_EVENT_NOT_SUBMIT = -148,
  DTU_ERROR_EVENT_DESTROYED = -149,
  DTU_ERROR_EVENT_SIGNAL_TWICE = -150,
  DTU_ERROR_MEMORY_OVERLAP = -151,
  DTU_ERROR_THREAD_POOL_QUEUE_OVERFLOW = -152,
  DTU_ERROR_PCI_BUS_SCAN = -153,
  DTU_ERROR_ALLOC_USERPTR = -154,
  DTU_ERROR_FREE_USERPTR = -155,
  DTU_ERROR_DUMP_CMEM = -156,
  DTU_ERROR_LOAD_CMEM = -157,
  DTU_ERROR_DUMP_SMEM = -158,
  DTU_ERROR_LOAD_SMEM = -159,
  DTU_ERROR_READ_REGISTERS = -160,
  DTU_ERROR_WRITE_REGISTERS = -161,
  DTU_ERROR_ALLOC_SIP = -162,
  DTU_ERROR_FREE_SIP = -163,
  DTU_ERROR_UNKNOWN = -164,
  DTU_ERROR_ALLOC_HUGE = -165,
  DTU_ERROR_INVALID_USR_IRQ_OBJ = -166,
  DTU_ERROR_LINK_CCIX_IO = -167,
  DTU_ERROR_PLACEHOLDER_NOT_FEED = -168,
  DTU_ERROR_LAUNCH_DMA = -169,
  DTU_ERROR_INVALID_PROFILE_MAGIC = -170,
  DTU_ERROR_INVALID_TIMESTAMP = -180,
  DTU_ERROR_INVALID_CONFIG = -181,
  DTU_ERROR_CHILD_NOT_SUBMIT = -182,
  DTU_ERROR_ALREADY_FORKED = -183,
  DTU_ERROR_LABEL_USED = -184,
  DTU_ERROR_LABEL_NOT_VALIDATED = -185,
  DTU_ERROR_COMMAND_TYPE_MISMATCH = -186,
  DTU_ERROR_VECTOR_NUMBER = -187,
  DTU_ERROR_VECTOR_FLAG_MISMATCH = -188,
  DTU_ERROR_DEVICE_RESET = -189,
  DTU_ERROR_EXECUTABLE_CRC_VERIFY = -190,
  DTU_ERROR_EXECUTABLE_DEVICE_VERIFY = -191,
  DTU_ERROR_INVALID_TS_OBJ = -192,
  DTU_ERROR_ALLOC_VDEV = -193,
  DTU_ERROR_FREE_VDEV = -194,
  DTU_ERROR_VDEV_BUSY = -195,
} dtu_status;
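
A reduced sketch of the rule, using a hypothetical two-value enum (status_t) rather than the full dtu_status:

```cpp
// Hypothetical, trimmed-down enum in the same style as dtu_status.
typedef enum status_code {
  STATUS_SUCCESS = 0,
  STATUS_ERROR_INVALID_PARAMETER = -100,
} status_t;

// In C++ an int does not convert implicitly to an enum, so
// `status_t s = 0;` is ill-formed. Use the enumerator, or cast explicitly
// when a raw integer really has to be converted.
constexpr status_t fromRaw(int raw) { return static_cast<status_t>(raw); }

// The other direction (enum to int or bool) is still implicit, which is
// why `if (status)` in the original code keeps compiling.
constexpr bool failed(status_t s) { return s != STATUS_SUCCESS; }
```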

 

 

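NULL and 0 have the same value, but the former denotes a null pointer (void* in C) while the latter is an int, which is a significant difference.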

diff --git a/sdk/lib/umd/tests/sample/sample_run.cc b/sdk/lib/umd/tests/sample/sample_run.cc
index c5a3557c2a5..23e9563859b 100644
--- a/sdk/lib/umd/tests/sample/sample_run.cc
+++ b/sdk/lib/umd/tests/sample/sample_run.cc
@@ -35,7 +35,7 @@ void usage() {
  
 dtu_context ctx;
 dtu_cluster cluster[4] = {NULL};
-u32 cluster_id[4] = {NULL};
+u32 cluster_id[4] = {0};
 dtu_mem_handle cluster_mem[4] = {NULL};
 dtu_sip sip[32] = {NULL};

 

3.3.7. Implicit const conversion in function prototypes

diff --git a/sdk/lib/cpu/cpu_func_manager.cc b/sdk/lib/cpu/cpu_func_manager.cc
index 940bde5d91a..ec8967203c2 100644
--- a/sdk/lib/cpu/cpu_func_manager.cc
+++ b/sdk/lib/cpu/cpu_func_manager.cc
@@ -31,7 +31,7 @@ struct FunctionInvoker {
   }
   template <size_t... idx>
   void unpack(std::index_sequence<idx...> seq, const void* func, char** argvs) {
-    (*reinterpret_cast<void (*)(...)>(func))(argvs[idx]...);
+    (*reinterpret_cast<void (*)(...)>(const_cast<void*>(func)))(argvs[idx]...);
   }
 };

 

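3.3.8. Implicit void*-to-char* conversion

Many modules do arithmetic directly on void* pointers. The size of the object a void* points to is unknown, so adding to or subtracting from it as an address effectively relies on an implicit void* → char* conversion (gcc accepts this as an extension), which clang does not allow: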

diff --git a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
index 5b9c2dcc98f..7b568f934bb 100644
--- a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
@@ -66,8 +66,8 @@ static void Add4CTest(SimpleModuleOpBuilder::ShapeType &shape,
   executor.run(false);
  
   auto output_hanlde = executor.get_output(0);
-  T* result =
-      static_cast<T*>(output_hanlde->CPUPtr() + output_hanlde->offset());
+  T* result = static_cast<T*>(static_cast<void*>(
+      static_cast<char*>(output_hanlde->CPUPtr()) + output_hanlde->offset()));
   for (size_t i = 0; i < l_data.size(); ++i) {
     EXPECT_EQ(result[i], out_data[i]);
   }

 

Other similar changes:

sdk/tests/hlir/cc_tests/hlir_corner_test.cc

sdk/tests/hlir/cc_tests/hlir_press_test.cc

sdk/tests/tops/tops_dot_test.cc

sdk/tests/op/hlir/pavo/bert/hlir_broadcast_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_transpose_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_test_header.h

sdk/tests/op/hlir/pavo/resnet50/hlir_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_non4c_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_pad_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_update_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_concat_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_broadcast_test.cc

sdk/tests/op/hlir/pavo/dnn/hlir_test_header.h

sdk/tests/op/hlir/hlir_test_header.h

sdk/tests/runtime/executable_test.cc
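
Since the same cast chain is repeated across many test files, it could be wrapped in a small helper. A sketch under that assumption; offsetAs is a hypothetical name and not an existing utility in the tree:

```cpp
#include <cstddef>

// Hypothetical helper: advance a void* by a byte offset and view the
// result as T*. The arithmetic happens on char*, where the element size
// is well defined (1 byte).
template <typename T>
T* offsetAs(void* base, std::size_t offset_bytes) {
  return reinterpret_cast<T*>(static_cast<char*>(base) + offset_bytes);
}
```

With such a helper, the test line from the diff would read roughly `T* result = offsetAs<T>(output_hanlde->CPUPtr(), output_hanlde->offset());`, assuming CPUPtr() returns void* and offset() returns a byte count as the original expression suggests.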

3.3.9. Implicit string-to-char* conversion

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
index 7e366337561..41fb573a562 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
@@ -23,7 +23,7 @@ KernelCode<T>::KernelCode(StringRef file)
     : compiled_(false), name_(file), module_("KernelModule", context_) {
   auto mb_or_err = MemoryBuffer::getFile(file);
   if (auto ec = mb_or_err.getError()) {
-    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message());
+    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message().c_str());
     EF_THROW_WITH << -1 << std::endl;
   }
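
The underlying rule is that a std::string cannot be passed through a C-style variadic argument list for %s; the function needs a char pointer. A standalone sketch with a hypothetical formatOpenError helper and message text:

```cpp
#include <cstdio>
#include <string>

// Hypothetical formatter showing the same fix as the diff above.
inline std::string formatOpenError(const char* file,
                                   const std::string& reason) {
  char buf[256];
  // %s expects a const char*, so the std::string is passed via c_str().
  // Passing `reason` itself would hand a class object to a variadic
  // function, which clang rejects.
  std::snprintf(buf, sizeof(buf), "cannot open %s: %s", file, reason.c_str());
  return buf;
}
```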

 

 

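3.4. Missing break in switch

3.4.1. Where a break is semantically required, add it

For example, parser.hpp has no break before the final default branch. Because the default branch is currently empty this does not affect behavior, but it would go wrong as soon as any handling is added to default: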

diff --git a/3rdparty/inja/include/inja/parser.hpp b/3rdparty/inja/include/inja/parser.hpp
index 6266c4a0f74..466499ecc8b 100644
--- a/3rdparty/inja/include/inja/parser.hpp
+++ b/3rdparty/inja/include/inja/parser.hpp
@@ -296,7 +296,7 @@ class Parser {
           operator_stack.pop();
           function_stack.pop();
         }
-      }
+      } break;
       default:
         break;
       }

 

Other similar changes:

sdk/sdk.bzl
sdk/third_party/inja.patch

 

3.4.2. Where fallthrough is intentional, add a pragma so the compiler skips the check

This kind of issue is fairly common.

diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
index e63568ddc63..78896778a21 100644
--- a/sdk/tests/runtime/chunk_allocator_test.cc
+++ b/sdk/tests/runtime/chunk_allocator_test.cc
@@ -552,6 +552,10 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
     uint64_t offset0 = 0;
     uint64_t offset1 = 0;
  
+#if defined(__clang__)
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wimplicit-fallthrough"
+#endif
     switch (op) {
       case TestOpAllocTopDown:
       case TestOpAllocDownTop: {
@@ -590,6 +594,9 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
         }
       } break;
     }
+#if defined(__clang__)
+#pragma clang diagnostic pop
+#endif
   }
 }
 }  // namespace

 

 

Other similar changes:

sdk/tests/runtime/mem_manager_test.cc
sdk/tests/runtime/mem_pool_test.cc
sdk/tools/dtu_compiler/dtu_compiler.cc
sdk/lib/umd/tests/sample/tinyxmlparser.cc

In addition, C++17 introduced the fallthrough attribute, which is a simpler way to tell the compiler that a fallthrough is intended: C++ attribute: fallthrough (since C++17) - cppreference.com
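
A minimal sketch of the attribute next to an explicit break. It assumes a C++17 build, so it would not apply as-is while the project stays on -std=gnu++14; weight is a hypothetical function:

```cpp
// Hypothetical example: level 2 intentionally runs the level 1 code too.
constexpr int weight(int level) {
  int w = 0;
  switch (level) {
    case 2:
      w += 10;
      [[fallthrough]];  // intentional: continue into case 1
    case 1:
      w += 1;
      break;  // explicit break, so later additions to default are safe
    default:
      break;
  }
  return w;
}
```

Compared with the pragma approach, the attribute marks one specific fallthrough instead of disabling the check for the whole switch.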

 
        

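3.5. Format string mismatches

3.5.1. Mismatched, but with no practical effect

A mismatch between the format string and the arguments actually passed can cause serious problems. In much of the code under tops, however, the mismatch is an ll length modifier used with 64-bit data, which has little practical effect. If a future 128-bit processor made ll 128 bits wide, though, it could corrupt the stack.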

 

diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
index e63568ddc63..78896778a21 100644
--- a/sdk/tests/runtime/chunk_allocator_test.cc
+++ b/sdk/tests/runtime/chunk_allocator_test.cc
@@ -426,7 +426,7 @@ TEST_F(ChunkAllocatorTest, basic_stress_test) {
           if (allocated_size < allocated_size_pass) {
             char str_buf[256];
             snprintf(str_buf, sizeof(str_buf),
-                     "allocated_size: %llx, allocated_chunks.size(): %lu",
+                     "allocated_size: %lx, allocated_chunks.size(): %lu",
                      allocated_size, allocated_chunks.size());
             EXPECT_TRUE(false) << str_buf;
             break;

 

Other files:

sdk/lib/umd/tests/sample/mm_test.cc

sdk/include/driver/mem_handle.h

sdk/include/runtime/command_packet.h

sdk/include/driver/mem_handle.h

sdk/tests/runtime/mem_pool_test.cc

sdk/lib/umd/tests/sample/performance_test.cc

sdk/tests/profile/test_zebu.cc

sdk/runtime/tests/top_scheduler/loop_task_utils.h
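
One way to avoid choosing between %lx and %llx by hand is the fixed-width format macros from <cinttypes>. The sketch below is illustrative only; FormatAllocInfo is a hypothetical helper and not part of the tops code:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit size and a container size without depending on whether
// uint64_t is `unsigned long` or `unsigned long long` on the target.
std::string FormatAllocInfo(std::uint64_t allocated_size,
                            std::size_t chunk_count) {
  char buf[128];
  // PRIx64 expands to the right length modifier for uint64_t;
  // %zu is the standard specifier for size_t.
  std::snprintf(buf, sizeof(buf), "allocated_size: %" PRIx64 ", chunks: %zu",
                allocated_size, chunk_count);
  return buf;
}
```

With this form the same source line stays warning-free under both gcc and clang, whichever underlying type the platform picks for uint64_t.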

 

 

3.5.2. Mismatched, with functional impact

 

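The intent below was to print the value pointed to by a uint16_t*, but the pointer itself was passed, so an address is printed instead of the value. Luckily this is only a log statement. Even so, %hu consumes an argument promoted to a 32-bit int, whereas the pointer is 64 bits on a 64-bit machine, so the stack can still be corrupted: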

diff --git a/sdk/include/runtime/command_packet.h b/sdk/include/runtime/command_packet.h
index a2d061e9117..5006a601cc0 100644
--- a/sdk/include/runtime/command_packet.h
+++ b/sdk/include/runtime/command_packet.h
@@ -362,7 +362,7 @@ struct CommandPacket {
    */
   static std::string MemberToString(uint16_t* p, std::string tab = "    ") {
     char buf[256];
-    snprintf(buf, sizeof(buf), "%hu", p);
+    snprintf(buf, sizeof(buf), "%hu", *p);
     return buf;
   }
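
Where a printf-style format is not essential, a type-checked conversion rules out this class of mistake. A minimal sketch, using a simplified stand-in for the MemberToString overload shown in the diff (the tab parameter is omitted here as an assumption that it is not needed for the illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for the MemberToString overload in the diff above.
// std::to_string is selected by argument type, so passing the pointer
// itself instead of *p fails to compile rather than printing an address.
std::string MemberToString(const std::uint16_t* p) {
  return std::to_string(*p);
}
```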

 

3.6. Defined but unused

3.6.1. Unused variables

 

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -180,7 +180,6 @@ static void launch_code_with_cluster_check(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + ONE_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
@@ -363,7 +362,6 @@ static void launch_code_for_one_sip(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + ONE_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
@@ -537,7 +535,6 @@ static void launch_one_sip_twice(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + TWO_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[2];
@@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
   dtu_mem_handle param = cluster_mem[cid];
   u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[8];

 

Other files:

sdk/tests/spm/basic.cc

sdk/lib/spm/src/best_fit_policy.c

 

 

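3.6.2. Unused parameters

There are a great many of these, especially in third-party components, which additionally have to be fixed through dedicated patches. This warning is the main reason -Werror was eventually switched off: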

 
diff --git a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
index abd50ad4e81..1b39f594406 100644
--- a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
+++ b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
@@ -16,6 +16,7 @@
 #include "dtu/umd/dtu.h"
 #include "dtu/umd/dtu_base_obj.h"
 #include "dtu/umd/dtu_log.h"
+#include "dtu/umd/dtu_utils.h"
 #include "lib/umd/src/dtu_memory.h"
 #include "lib/umd/tests/sample/sample.h"
 #include "lib/umd/tests/sample/sample_assert.h"
@@ -26,6 +27,7 @@ std::mutex mtx;
  
 void event_callback_func_1(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_1 call[%d]\n", engine_id);
@@ -33,6 +35,7 @@ void event_callback_func_1(dtu_callback callback, void *user_data,
  
 void event_callback_func_2(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_2 call[%d]\n", engine_id);

 

Other files:

tools/logging/lib/logging/log.cc

tools/logging/lib/logging/to/file.cc

tools/logging/lib/logging/to/std_err.cc

tools/logging/lib/util/signal_handler.cc

tools/logging/tests/logging/log_to_test.h

sdk/lib/umd/include/dtu_utils.h

sdk/lib/umd/include/reference_obj.h

3rdparty/protobuf-3.8.0/src/google/protobuf/arena.h

sdk/lib/umd/tests/sample/device_reset.cc

sdk/lib/umd/tests/sample/usr_irq.cc

sdk/lib/umd/tests/sample/callback_test.cc

3rdparty/protobuf-3.8.0/src/google/protobuf/map_type_handler.h

3rdparty/protobuf-3.8.0/src/google/protobuf/parse_context.h

kmd/utils/ktest/kmd-test.cpp

sdk/lib/spm/src/buddy_policy.c

sdk/lib/umd/include/dtu_command_obj.h

sdk/lib/umd/include/dtu_context_obj.h

sdk/lib/umd/include/dtu_dqm_obj.h

sdk/lib/umd/include/dtu_driver.h

system_test/tools/vpd_cycle/vpd_cycle.c

sdk/lib/spm/src/buddy_policy.c

sdk/lib/spm/src/interface.c

sdk/lib/spm/src/rbtree.c

sdk/lib/umd/include/dtu_device.h

sdk/lib/umd/include/dtu_driver.h
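
The definition of MAYBE_UNUSED in dtu_utils.h is not shown on this page. The sketch below assumes it is a plain void cast (named DEMO_MAYBE_UNUSED here to make clear it is hypothetical) and puts it next to two standard alternatives:

```cpp
#include <cassert>

// Assumed equivalent of a MAYBE_UNUSED-style macro: a cast to void.
#define DEMO_MAYBE_UNUSED(x) (void)(x)

// Option 1: keep the parameter name and mark it used inside the body.
int CallbackWithMacro(int callback, int engine_id) {
  DEMO_MAYBE_UNUSED(callback);
  return engine_id;
}

// Option 2: leave the parameter unnamed (works in every C++ standard).
int CallbackUnnamed(int /*callback*/, int engine_id) { return engine_id; }

// Option 3 (C++17): annotate the parameter itself.
int CallbackAttribute([[maybe_unused]] int callback, int engine_id) {
  return engine_id;
}
```

Option 2 is the least intrusive for third-party headers because it needs no extra include, although it is only possible when the signature itself may be edited.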

 

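The unused parameters (file and line) in tools/logging/include/logging/check.h are a special case: they are meant to be used, but the wrong interface was called, so the information was lost along the way: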

diff --git a/tools/logging/include/logging/check.h b/tools/logging/include/logging/check.h
index eb856b7df85..67a667477f1 100644
--- a/tools/logging/include/logging/check.h
+++ b/tools/logging/include/logging/check.h
@@ -47,16 +47,16 @@
 #define EFCHECK_STRCASENE(s1, s2) EF_DTU_CHECK_STROP(strcasecmp, !=, false, s1, s2)
  
 #undef EFCHECK_NOTNULL
-#define EFCHECK_NOTNULL(val) \
-  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", (val))
-
+#define EFCHECK_NOTNULL(val)                                                \
+  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", \
+                         (val))
  
 namespace ef_log {
  
 template <typename T>
 T&& CheckNotNull(const char* file, int line, const char* exprtext, T&& t) {
   if (t == nullptr) {
-    EFLOG(FATAL) << std::string(exprtext);
+    ::ef_log::FatalLog(file, line) << std::string(exprtext);
   }
   return std::forward<T>(t);
 }

 

 

 

3.6.3. Unused labels

 

diff --git a/sdk/include/scheduler/cmd_packet_pass_util.h b/sdk/include/scheduler/cmd_packet_pass_util.h
index d56b5f362e8..1cc18ddc603 100644
--- a/sdk/include/scheduler/cmd_packet_pass_util.h
+++ b/sdk/include/scheduler/cmd_packet_pass_util.h
@@ -457,7 +457,6 @@ void MultiThreadDo(PacketGraph* graph, InitFuncS initf, ThreadFunc f,
   uninif(core_count);
  
   delete[] ptl;
-Exit0:
   return;
 }

 

 

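3.6.4. Unreachable code

According to the developers, the code below is not supported yet, but they do not want to delete it, so it is commented out for now: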

diff --git a/sdk/runtime/tests/top_scheduler/TimerTest.cc b/sdk/runtime/tests/top_scheduler/TimerTest.cc
index cb1e2269dd4..5aea6c1f956 100644
--- a/sdk/runtime/tests/top_scheduler/TimerTest.cc
+++ b/sdk/runtime/tests/top_scheduler/TimerTest.cc
@@ -127,12 +127,12 @@ TEST_F(TimerTest, Timer) {
     L3DMA = EngineType::Type::ODMA;
   } else if (IsPavoT20() || IsPavoT21()) {
     return;  // Need TS FW;
-    assembler = new ExecutableAssembler(TargetType::PAVO);
-    L3DMA = EngineType::Type::CDMA_LITE;
+    // assembler = new ExecutableAssembler(TargetType::PAVO);
+    // L3DMA = EngineType::Type::CDMA_LITE;
   } else if (IsDoradoI20() || IsDoradoI21()) {
     return;  // Need TS FW;
-    assembler = new ExecutableAssembler(TargetType::DORADO);
-    L3DMA = EngineType::Type::CDMA;
+    // assembler = new ExecutableAssembler(TargetType::DORADO);
+    // L3DMA = EngineType::Type::CDMA;
   } else {
     return;
   }

 

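sdk/tests/tops/tops_customop_upsample_nearest_test.cc also reports unreachable code. Co is currently a fixed value, so the outer if condition is always false. The loop in the else branch already covers the Co == 1 case, so the if branch can simply be removed: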

diff --git a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
index 2b59c3fc0fc..cf19502b9b4 100644
--- a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
+++ b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
@@ -175,32 +175,17 @@ TEST_F(TopsTest, CustomCall_UpSample_Nearest_1) {
   int n_offset = Ho * Wo * Co;
   int h_offset = Wo * Co;
  
-  if (Co == 1) {
-    for (int n = 0; n < N; ++n) {
-      int n_offset = n * n_offset;
-      for (int h = 0; h < Ho; ++h) {
-        int h_index = h / scale_H;
-        for (int w = 0; w < Wo; ++w) {
-          int w_index = w / scale_W;
-          output_ref[n_offset + h * h_offset + w] =
-              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci];
-        }
-      }
-    }
-
-  } else {
-    for (int n = 0; n < N; ++n) {
-      int n_offset = n * Ho * Wo * Co;
-      for (int h = 0; h < Ho; ++h) {
-        int h_index = h / scale_H;
-        for (int w = 0; w < Wo; ++w) {
-          int w_index = w / scale_W;
-          for (int c = 0; c < Co; ++c) {
-            int c_index = c / scale_C;
-            output_ref[n_offset + h * h_offset + w * Co + c] =
-                image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
-                           c_index];
-          }
+  for (int n = 0; n < N; ++n) {
+    int n_offset = n * Ho * Wo * Co;
+    for (int h = 0; h < Ho; ++h) {
+      int h_index = h / scale_H;
+      for (int w = 0; w < Wo; ++w) {
+        int w_index = w / scale_W;
+        for (int c = 0; c < Co; ++c) {
+          int c_index = c / scale_C;
+          output_ref[n_offset + h * h_offset + w * Co + c] =
+              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
+                         c_index];
         }
       }
     }

 

 

3.6.5. Uncalled inline functions

diff --git a/sdk/lib/umd/tests/sample/memcpy_odma.cc b/sdk/lib/umd/tests/sample/memcpy_odma.cc
index 3cfc9777934..2b565c5e55e 100644
--- a/sdk/lib/umd/tests/sample/memcpy_odma.cc
+++ b/sdk/lib/umd/tests/sample/memcpy_odma.cc
@@ -9,6 +9,7 @@
  
 #include "dtu/umd/dtu.h"
 #include "dtu/umd/dtu_interface.h"
+#include "dtu/umd/dtu_utils.h"
 #include "lib/umd/tests/sample/sample.h"
 #include "lib/umd/tests/sample/sample_assert.h"
  
@@ -991,6 +992,7 @@ static void memcpy_host_to_hbm_mc_scan_sync(void) {
 }
 MAKE_SAMPLE_FROM_FUNCTION(memcpy_host_to_hbm_mc_scan_sync);
  
+#if 0
 static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
                      dtu_mem_handle src_hdl, u64 src_offset, u64 size,
                      u32 engine_id) {
@@ -1034,6 +1036,7 @@ static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
   dtu_command_queue_destroy(queue);
   return 0;
 }
+#endif
  
 #define MB (1 * 1024 * 1024)
 #if 0

 

 

Other files:

sdk/lib/spm/src/buddy_policy.c

sdk/lib/umd/tests/sample/mm_test.cc

 

3.6.6. Unused class declarations

diff --git a/dtu_backend/dtu_executor.h b/dtu_backend/dtu_executor.h
index 5149656537f..c361bb15e9d 100644
--- a/dtu_backend/dtu_executor.h
+++ b/dtu_backend/dtu_executor.h
@@ -50,7 +50,6 @@ class ClusterAllocation;
 }
  
 class DTUObject;
-class sr::TaskContext;
 class DTUExecutor : public ::xla::dtu::DTUExecutorInterface {
  public:
   typedef typename sr::TaskContext context_type;

 

3.6.7. Unused type definitions

diff --git a/sdk/tests/tops/tops_transform_parameter_test.cc b/sdk/tests/tops/tops_transform_parameter_test.cc
index ee7718e40f1..c7b99fb323d 100644
--- a/sdk/tests/tops/tops_transform_parameter_test.cc
+++ b/sdk/tests/tops/tops_transform_parameter_test.cc
@@ -483,7 +483,6 @@ TEST_P(TopsGraphTransformParameterTest, TopsConv) {
       break;
   }
  
-  typedef float D_TYPE;
   int inputdata_size = input_length * (sizeof(input_data[0]));
  
   topsMemory_t output_mem;

 

 

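3.7. Duplicate definitions

Each module in the tops code stack defines a large number of its own macros, so once the modules start including one another there are many redefinition problems. The real fix is to factor the shared macros out into common headers, but the modules currently do not want to depend on each other, so for now the definitions are wrapped in #ifndef as a temporary workaround: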

diff --git a/sdk/lib/umd/tests/sample/loop_task.cc b/sdk/lib/umd/tests/sample/loop_task.cc
index 2869f571029..099dcaf53f1 100644
--- a/sdk/lib/umd/tests/sample/loop_task.cc
+++ b/sdk/lib/umd/tests/sample/loop_task.cc
@@ -22,12 +22,14 @@
  
 using namespace std;
  
+#ifndef EFCHECK
 #define EFCHECK(__statement__)                                       \
   do {                                                               \
     sts = __statement__;                                             \
     if (sts != DTU_SUCCESS)                                          \
       failed_assertion("Failed:", __FILE__, __FUNCTION__, __LINE__); \
   } while (0)
+#endif
  
 template <int N>
 struct DataLayout {

 

Other files:

sdk/lib/umd/tests/sample/sample_assert.h

system_test/tools/vpd_cycle/vpd_cycle.c

 

3.8. Wrong order in the member initializer list

This occurred only once:

diff --git a/sdk/include/factor/func.h b/sdk/include/factor/func.h
index af24b782ed1..50137f7c5dd 100644
--- a/sdk/include/factor/func.h
+++ b/sdk/include/factor/func.h
@@ -4163,10 +4163,10 @@ struct FACTOR_EXPORT ConvGenDescParams {
                     int64_t Co, int64_t R, int64_t S)
       : conv_type(conv_type),
         data_format(data_format),
-        stride(stride),
-        dailations(dailations),
         opt_level(opt_level),
         padding(padding),
+        stride(stride),
+        dailations(dailations),
         N(N),
         Hi(Hi),
         Wi(Wi),

 

Other modified files:

sdk/tests/tops/tops_convert_parameter_test.cc
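
The reason the order matters is that members are always initialized in declaration order, whatever order the initializer list is written in. The sketch below makes that visible; DemoParams and Record are hypothetical names, and the warning is suppressed on purpose because the mismatch is deliberate here:

```cpp
#include <cassert>
#include <string>

// Records the order in which member initializers actually run.
std::string g_init_order;

int Record(char tag, int value) {
  g_init_order.push_back(tag);
  return value;
}

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreorder"  // the mismatch is deliberate here
struct DemoParams {
  int stride;   // declared first
  int padding;  // declared second
  // The list names padding first, but stride is still initialized first.
  DemoParams() : padding(Record('p', 2)), stride(Record('s', 1)) {}
};
#pragma GCC diagnostic pop
```

If one initializer read another member, a list written in the wrong order could silently read an uninitialized value, which is why the patch reorders the list to match the declarations.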

 

 

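3.9. Incomplete type declarations

clang reports an error when a class is only forward-declared and no complete definition can be found in the included headers.

Working out the correct include order of the TensorFlow headers is very tedious. Fortunately clang searches the header paths automatically, so the include is wrapped in a clang-only macro guard.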

diff --git a/sdk/lib/cpu/cpu_func_runtime_context.h b/sdk/lib/cpu/cpu_func_runtime_context.h
index 530b4def8ad..62ba4099e6e 100644
--- a/sdk/lib/cpu/cpu_func_runtime_context.h
+++ b/sdk/lib/cpu/cpu_func_runtime_context.h
@@ -23,6 +23,10 @@
 #include <tuple>
 #include <vector>
  
+#if defined(__clang__)
+#include "tensorflow/compiler/xla/service/cpu/simple_orc_jit.h"
+#endif
+
 namespace xla {
 namespace cpu {
 class SimpleOrcJIT;

 

3.10. Array initialization

3.10.1. Where a variable-length array is genuinely required, use new[]() and delete[] to allocate and free the memory

diff --git a/sdk/lib/cpu_ops/naive/dot.cc b/sdk/lib/cpu_ops/naive/dot.cc
index f4bb6b7d877..be7ddb0ab23 100755
--- a/sdk/lib/cpu_ops/naive/dot.cc
+++ b/sdk/lib/cpu_ops/naive/dot.cc
@@ -31,9 +31,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
     int64_t m_stride = (M - m) >= stride ? stride : (M - m);
     for (int64_t n = 0; n < N;) {
       int64_t n_stride = (N - n) >= stride ? stride : (N - n);
-      register outT out_reg[m_stride * n_stride] = {0};
-      register lhsT lhs_reg[m_stride];
-      register rhsT rhs_reg[n_stride];
+      register outT* out_reg = new outT[m_stride * n_stride]();
+      register outT* lhs_reg = new outT[m_stride]();
+      register outT* rhs_reg = new outT[n_stride]();
       for (int64_t i = 0; i < K; i++) {
         for (auto idx = 0; idx < m_stride; idx++) {
           lhs_reg[idx] = ELEMENT(lhs, m + idx, i, K);
@@ -53,6 +53,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
         }
       }
       n += n_stride;
+      delete[] rhs_reg;
+      delete[] lhs_reg;
+      delete[] out_reg;
     }
     m += m_stride;
   }

 

Other similar changes:

sdk/lib/umd/tests/sample/mm_test.cc

sdk/lib/cpu_ops/naive/dot.cc

sdk/lib/factor/codegen/macro_instruction/minst_conv2d_bpi.cc
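
As an alternative to paired new[]/delete[], std::vector gives run-time-sized scratch buffers that are released on every exit path and keep each buffer's own element type. This is only a sketch: DotWithScratch is a hypothetical helper and not the vectorMul_4_4 from the diff, and whether the extra allocation cost is acceptable in that hot loop has not been measured here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Multiplies a lhs row by a rhs column using run-time-sized scratch buffers.
template <typename outT, typename lhsT, typename rhsT>
outT DotWithScratch(const lhsT* lhs, const rhsT* rhs, std::int64_t k) {
  // Run-time sizes are fine for std::vector; the buffers copy the inputs
  // and keep the lhs and rhs element types separate.
  std::vector<lhsT> lhs_reg(lhs, lhs + k);
  std::vector<rhsT> rhs_reg(rhs, rhs + k);
  outT acc = outT();
  for (std::int64_t i = 0; i < k; ++i) {
    acc += static_cast<outT>(lhs_reg[i]) * static_cast<outT>(rhs_reg[i]);
  }
  return acc;  // both buffers are freed automatically here
}
```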

 

 

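3.10.2. Where the array is semantically fixed-size, fix it by adding const

This pattern is very common in the tests. People are not in the habit of declaring array lengths const. Doing so not only improves run-time efficiency but also makes mistakes less likely.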

 

diff --git a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
index 8df67784dec..1db6440e22c 100644
--- a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
+++ b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
@@ -67,16 +67,16 @@ void topsBatchNormalInferenceNHWC() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -90,7 +90,7 @@ void topsBatchNormalInferenceNHWC() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
@@ -170,16 +170,16 @@ void topsBatchNormalInferenceCHNW() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -193,7 +193,7 @@ void topsBatchNormalInferenceCHNW() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
@@ -275,16 +275,16 @@ void topsBatchNormalInferenceBoundary() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 50;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 50;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -298,7 +298,7 @@ void topsBatchNormalInferenceBoundary() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num];
   for (int i = 0; i < inputdata_num; i++) {
@@ -380,16 +380,16 @@ void topsBatchNormalInferenceScaleOffset() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -403,7 +403,7 @@ void topsBatchNormalInferenceScaleOffset() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

 

There are many more; only the file names are listed:

sdk/sample/batchnormalTraining/tops_batchnormalTraining.cc

sdk/sample/broadcast/tops_broadcast.cc

sdk/sample/resnet50/TopsOpApi.cc

sdk/tests/tops/tops_batchnormalBackward_test.cc

sdk/tests/tops/tops_batchnormalTraining_test.cc

sdk/tests/tops/tops_concat_test.cc

sdk/tests/tops/tops_convert_test.cc

sdk/tests/tops/tops_customop_test.cc

sdk/tests/tops/tops_scatter_test.cc

sdk/tests/tops/tops_bnForwardTrainingEx_unit_test.cc (this file had 1800+ changed lines, which forced me to split it into a separate patch)

sdk/tests/tops/tops_broadcast_test.cc

sdk/tests/tops/tops_concat_test.cc

sdk/tests/tops/tops_convert_test.cc

sdk/tests/tops/tops_descriptor_test.cc

sdk/tests/tops/tops_pad_test.cc

sdk/tests/tops/tops_scatter_test.cc
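
The same idea can be stated even more explicitly with constexpr, which guarantees that the array bound is a constant expression. A minimal sketch with hypothetical names modelled on the sample above:

```cpp
#include <cassert>
#include <cstddef>

// Dimensions known at compile time: constexpr makes the intent explicit.
constexpr int kXn = 3;
constexpr int kXc = 4;
constexpr int kXh = 2;
constexpr int kXw = 2;
constexpr int kInputNum = kXn * kXc * kXh * kXw;

// Returns the element count of a fixed-size array whose bound is a constant
// expression, so no variable-length-array extension is involved.
std::size_t InputElementCount() {
  float input_data[kInputNum] = {2.0f};  // remaining elements are zeroed
  return sizeof(input_data) / sizeof(input_data[0]);
}

static_assert(kInputNum == 48, "3 * 4 * 2 * 2 elements expected");
```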

 

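3.11. auto in function prototypes

clang rejects auto parameters in function prototypes (before C++20 this form is only a GCC extension). As I understand it, the main considerations are:

1. If the function is exposed as an interface, what argument types should callers use?

2. If several call sites pass different argument types, will the function body trigger implicit conversions when it processes the parameter? clang strictly forbids implicit conversions that lose information.

3. If call sites pass arguments of different storage sizes, could the stack be corrupted? For example, if some callers use int and others long, should the compiler instantiate two entities or a single one?

4. When the function is translated into a C function, how should its name be generated? The rules for converting C++ function names into C function names do not cover auto parameters.

The auto parameter problem shows up mainly in the sdk/lib/tuner/pavo/ and sdk/tests/factor/targets/pavo/dnn/conv/ directories: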

diff --git a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
index dba48418fc2..2c59bc4eda6 100644
--- a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
+++ b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
@@ -31,8 +31,8 @@ namespace factor {
 using namespace hlir;
  
 static std::vector<std::vector<int64_t>> build_dim(
-    std::vector<int64_t> dim_count, auto cores_on_dim, auto sip_cord,
-    int64_t sip_num) {
+    std::vector<int64_t> dim_count, std::vector<int64_t> cores_on_dim,
+    std::vector<std::vector<int64_t>> sip_cord, int sip_num) {
   std::vector<int64_t> dim_count1 = {
       dim_count[0] / cores_on_dim[0], dim_count[1] / cores_on_dim[1],
       dim_count[2] / cores_on_dim[2], dim_count[3] / cores_on_dim[3]};

 

The other functions were changed in the same way; only the file names are listed:

sdk/lib/tuner/pavo/pavo_conv_dataflow5_bpi_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow7_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow1_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow2_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow3_1_forward_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow5_1_forward_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow6_bpk_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow7_1_forward_non4c_impl.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c6s_bpi_dataflow1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c4s_dataflow7_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c6s_dataflow6_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow3_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow5_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow7_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow2_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow3_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow5_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow7_template_test.cc

sdk/lib/ops/common/dtu_elementwise_fusion_impl.cc

sdk/tests/llir/dma_test/slice_dma_test.cc

sdk/tests/llir/dma_test/broadcast_dma_test.cc

sdk/tests/llir/dma_test/deslice_dma_test.cc

sdk/tests/llir/dma_test/mirror_dma_test.cc

sdk/tests/llir/dma_test/padding_dma_test.cc

sdk/tests/llir/dma_test/subsampling_dma_test.cc

sdk/tests/llir/dma_test/transpose_dma_test.cc
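
The patches above spell out concrete parameter types. Where the generic behaviour of an auto parameter is actually wanted, a function template expresses it in standard C++. A minimal sketch; DivideDims is a hypothetical function and not one of the build_dim helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Non-standard before C++20 when written as an ordinary function:
//   std::vector<int64_t> DivideDims(std::vector<int64_t> dims, auto cores);
// A template states the same "any indexable type" intent explicitly, and
// each distinct argument type gets its own instantiation.
template <typename Cores>
std::vector<std::int64_t> DivideDims(const std::vector<std::int64_t>& dims,
                                     const Cores& cores) {
  std::vector<std::int64_t> out;
  out.reserve(dims.size());
  for (std::size_t i = 0; i < dims.size(); ++i) {
    out.push_back(dims[i] / cores[i]);
  }
  return out;
}
```

Passing the containers by const reference also avoids the copies that the by-value parameters in the original signatures incur.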

 

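3.12. strlen return value is not treated as a constant

clang treats the return value of strlen as a run-time value. To use it in a constant expression, you have to define your own constexpr function: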

diff --git a/sdk/lib/profile/topspti/reader/helper.h b/sdk/lib/profile/topspti/reader/helper.h
index c56502bef86..1d642e2431e 100644
--- a/sdk/lib/profile/topspti/reader/helper.h
+++ b/sdk/lib/profile/topspti/reader/helper.h
@@ -28,7 +28,6 @@
  
 #include <cstring>
 #include <string>
-
 #include "utils/utils.h"
  
 namespace topspti2 {
@@ -36,6 +35,10 @@ namespace topspti2 {
 #define TENSOR_MARK "!dtu.tensor<"
 #define TENSOR_MARK_SZ (sizeof(TENSOR_MARK) - 1)
  
+int constexpr CONSTEXPR_STRLEN(const char *str) {
+  return *str ? 1 + CONSTEXPR_STRLEN(str + 1) : 0;
+}
+
 static inline bool HasDPF(const std::string &product) {
   return (product != "" && product != "unknown" && product != "T10" &&
           product != "T11" && product != "T10s" && product != "I10");
@@ -123,7 +126,7 @@ static inline bool FastParseSizeFromTensor(const std::string &tensor,
   if (std::string::npos == pos) {
     return false;
   }
-  constexpr int tensor_mark_sz = strlen(TENSOR_MARK);
+  constexpr int tensor_mark_sz = CONSTEXPR_STRLEN(TENSOR_MARK);
   const char *data = tensor.c_str();
   while (pos != std::string::npos) {
     pos += tensor_mark_sz;
@@ -202,7 +205,7 @@ static inline bool FastParseSizeFromMemref(const std::string &memref,
   if (0 != pos) {
     return false;
   }
-  constexpr auto memref_mark_sz = strlen(MEMREF_MARK);
+  constexpr auto memref_mark_sz = CONSTEXPR_STRLEN(MEMREF_MARK);
   pos += memref_mark_sz;
   int64_t prod = 1;
   size_t lz = memref.size();
@@ -261,7 +264,7 @@ static inline bool ParseTensorInfoFromString(const std::string &input,
                                              TensorInfoValue &tiv) {
   tiv = TensorInfoValue();
   constexpr const char *const szstr = "size:";
-  constexpr int sz = strlen(szstr);
+  constexpr int sz = CONSTEXPR_STRLEN(szstr);
  
   if (input.size() > sz && !strncmp(input.c_str(), szstr, sz)) {
     tiv.size = stoll(input.substr(sz));

 

Other similar changes:

sdk/lib/profile/libprofile_defs.h
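
A self-contained sketch of the two compile-time options: a recursive constexpr function like the one in the patch, and sizeof on a string literal (the approach TENSOR_MARK_SZ already uses). ConstexprStrlen and DEMO_MARK are illustrative names; the return type is std::size_t here rather than the int used in the patch.

```cpp
#include <cassert>
#include <cstddef>

// C++11-compatible recursive length, usable in constant expressions.
constexpr std::size_t ConstexprStrlen(const char* s) {
  return *s ? 1 + ConstexprStrlen(s + 1) : 0;
}

// For a string literal, sizeof already yields the length at compile time
// (minus one for the terminating null character).
#define DEMO_MARK "!dtu.tensor<"
constexpr std::size_t kMarkSize = sizeof(DEMO_MARK) - 1;

static_assert(ConstexprStrlen(DEMO_MARK) == kMarkSize,
              "both approaches must agree");
static_assert(ConstexprStrlen("size:") == 5, "length of the size prefix");
```

The sizeof form only works on literals and arrays, while the constexpr function also accepts a constexpr const char* such as szstr in the diff.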

3.13. Other syntax issues

3.13.1. Lambda syntax issues

See Lambda expressions (since C++11) - cppreference.com. The captures of a lambda expression are described as follows:

a comma-separated list of zero or more captures, optionally beginning with a capture-default.

See below for the detailed description of captures.

A lambda expression can use a variable without capturing it if the variable

  • is a non-local variable or has static or thread local storage duration (in which case the variable cannot be captured), or
  • is a reference that has been initialized with a constant expression.

A lambda expression can read the value of a variable without capturing it if the variable

  • has const non-volatile integral or enumeration type and has been initialized with a constant expression, or
  • is constexpr and has no mutable members.

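In other words, no capture needs to be specified in the following cases:

1) non-local (global) variables

2) static variables

3) thread-local variables (here it is not that a capture is unnecessary; a capture cannot be used at all)

4) references initialized with a constant expression

5) non-volatile const integral or enumeration variables initialized with a constant expression (read-only access)

6) constexpr variables with no mutable members (read-only access)

module_str used in sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc is a global variable and needs no capture. The original code compiles with gcc5, but gcc7 and clang reject it outright: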
diff --git a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
index 6f17f27ca4d..70506e38506 100644
--- a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
@@ -38,7 +38,7 @@ TEST(MTTest, PassMgr) {
   std::vector<std::thread> th_vec;
   th_vec.reserve(thread_count);
   for (size_t i = 0; i < thread_count; ++i) {
-    th_vec.emplace_back([&module_str]() {
+    th_vec.emplace_back([]() {
       mlir::MLIRContext context;
       mlir::OwningModuleRef module =
           mlir::parseSourceString(module_str, &context);

 

In the code below, this is captured but never used, which produces the warning "expression result unused [-Wunused-value]". A capture acts like a declaration inside the lambda, so an unused one is reported:

diff --git a/tools/logging/tests/logging/test_log_old_api.cc b/tools/logging/tests/logging/test_log_old_api.cc
index 638a84cac6b..91cbf6b2676 100644
--- a/tools/logging/tests/logging/test_log_old_api.cc
+++ b/tools/logging/tests/logging/test_log_old_api.cc
@@ -18,7 +18,7 @@ class OldLogTest : public testing::Test {
     Test::SetUp();
     RegisterLogTo(this->pLog);
     pLog->setCallback(
-        [this](const std::string &msg) { std::cerr << msg << std::endl; });
+        [](const std::string &msg) { std::cerr << msg << std::endl; });
     pLog->SetAutoClear(true);
   }

 

Similarly, naming the constant alignment in the capture list in sdk/lib/ops/common/dtu_scatter_impl.cc is also an error:

diff --git a/sdk/lib/ops/common/dtu_scatter_impl.cc b/sdk/lib/ops/common/dtu_scatter_impl.cc
index 032c19e6ef9..99058631d4c 100644
--- a/sdk/lib/ops/common/dtu_scatter_impl.cc
+++ b/sdk/lib/ops/common/dtu_scatter_impl.cc
@@ -92,7 +92,7 @@ bool predicate_func(int64_t i) {
  
 // alloc_ memory with alignment of 128 byte.
 const uint32_t alignment = 128;
-auto GetAlignedSize = [alignment](uint64_t size) {
+auto GetAlignedSize = [](uint64_t size) {
   return (size + alignment - 1) / alignment * alignment;
 };
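
The contrast between the two storage durations can be sketched as follows. kAlignment and AlignWithLocal are illustrative names; only GetAlignedSize mirrors the lambda in the diff:

```cpp
#include <cassert>
#include <cstdint>

// Namespace-scope constant: it has static storage duration, so a lambda
// uses it directly and it must not appear in the capture list.
const std::uint32_t kAlignment = 128;

auto GetAlignedSize = [](std::uint64_t size) {
  return (size + kAlignment - 1) / kAlignment * kAlignment;
};

// A function parameter has automatic storage duration, so by contrast it
// has to be captured before the lambda body can use it.
std::uint64_t AlignWithLocal(std::uint64_t size, std::uint64_t alignment) {
  auto align = [alignment](std::uint64_t s) {
    return (s + alignment - 1) / alignment * alignment;
  };
  return align(size);
}
```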

 

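3.13.2. std::move in return statements

Using std::move in a return statement disables the compiler's copy elision. For the pre-change code below, clang reports the warning "moving a local object in a return statement prevents copy elision [-Wpessimizing-move]". So what is copy elision?

Copy elision - cppreference.com defines it as follows: Omits copy and move (since C++11) constructors, resulting in zero-copy pass-by-value semantics.

In other words, without std::move the compiler tries to omit the copy or move of the returned object altogether, which gives a zero-copy return. With std::move the compiler is forced to call the move constructor. The latter is clearly more expensive.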

 
diff --git a/tools/logging/tests/logging/log_to_test.h b/tools/logging/tests/logging/log_to_test.h
index de91f49b34d..0c92e2acd8a 100644
--- a/tools/logging/tests/logging/log_to_test.h
+++ b/tools/logging/tests/logging/log_to_test.h
@@ -21,7 +21,7 @@ class LogToString : public LogDestination {
     if (autoClear_) {
       Clear();
     }
-    return std::move(ret);
+    return ret;
   }
   void SetAutoClear(bool autoClear) { autoClear_ = autoClear; }
   void Clear() { str_.clear(); }
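
The difference can be observed by counting move constructions. In the sketch below Tracked, ReturnByName, ReturnWithMove and CountMoves are all hypothetical names; the warning is suppressed because the pessimizing form is written deliberately. Since eliding a named local is permitted but not mandatory, the sketch does not assume an exact count for the plain form.

```cpp
#include <cassert>
#include <utility>

// Counts move constructions so the two return styles can be compared.
struct Tracked {
  static int moves;
  Tracked() = default;
  Tracked(const Tracked&) = default;
  Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

Tracked ReturnByName() {
  Tracked ret;
  return ret;  // eligible for copy elision (NRVO)
}

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpessimizing-move"  // deliberate here
Tracked ReturnWithMove() {
  Tracked ret;
  return std::move(ret);  // always runs the move constructor
}
#pragma GCC diagnostic pop

// Returns the number of move constructions performed by calling f once.
template <typename F>
int CountMoves(F f) {
  Tracked::moves = 0;
  Tracked t = f();
  (void)t;
  return Tracked::moves;
}
```

Comparing CountMoves(ReturnByName) with CountMoves(ReturnWithMove) shows that the plain form never needs more move constructions than the std::move form.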

 

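3.13.3. Use of an uninitialized object

In the pre-change version of sdk/tests/runtime/device_manager_test.cc, if result.ok() is false, cluster is passed to device->ClusterMemoryHandle() without ever being initialized, which can have very nasty consequences: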
diff --git a/sdk/tests/runtime/device_manager_test.cc b/sdk/tests/runtime/device_manager_test.cc
index cf9075367a7..0adf469da8b 100644
--- a/sdk/tests/runtime/device_manager_test.cc
+++ b/sdk/tests/runtime/device_manager_test.cc
@@ -109,14 +109,13 @@ TEST_F(DeviceManagerTest, ClusterMemoryHandle_SuccessFail) {
   dtu::driver::DeviceManager* device = dtu::driver::DeviceManager::instance();
   device->AcquireDevice(0);
   dtu::StatusOr<dtu_cluster> result = device->Cluster(0, 0);
-  dtu_cluster cluster;
   if (result.ok()) {
-    cluster = std::move(result.ValueOrDie());
-    EXPECT_NE(cluster, nullptr);
+    dtu_cluster cluster = std::move(result.ValueOrDie());
+    dtu::StatusOr<dtu_mem_handle> result1 =
+        device->ClusterMemoryHandle(cluster);
   } else {
     EFLOG(FATAL) << "Get ClusterIds error: " << result.status();
   }
-  dtu::StatusOr<dtu_mem_handle> result1 = device->ClusterMemoryHandle(cluster);
   EXPECT_EQ(result.ok(), true);
   EXPECT_NE(result.ValueOrDie(), nullptr);
   device->ReleaseCluster(0, 0);

 

3.13.4. clang forbids parenthesized initialization of arrays

For the pre-change code below, clang reports the error "parenthesized initialization of a member array is a GNU extension [-Wgnu-array-member-paren-init]", and gcc reports the warning "list-initializer for non-class type must not be parenthesized":
diff --git a/sdk/tests/tops/tops_broadcast_parameter_test.cc b/sdk/tests/tops/tops_broadcast_parameter_test.cc
index 831d9d23791..321c56171f3 100644
--- a/sdk/tests/tops/tops_broadcast_parameter_test.cc
+++ b/sdk/tests/tops/tops_broadcast_parameter_test.cc
@@ -139,14 +139,18 @@ class TopsBroadcastParameterTest
 };
  
 TopsBroadcastParameterTest::TopsBroadcastParameterTest()
-    : x_desc_dim({GetParam().x.h, GetParam().x.w}),
-      y_desc_dim(
-          {GetParam().y.n, GetParam().y.c, GetParam().y.h, GetParam().y.w}),
-      broadcast_dims(
-          {GetParam().broadcast_dim.dim_1, GetParam().broadcast_dim.dim_2}),
-      input_length(GetParam().x.h * GetParam().x.w),
+    : input_length(GetParam().x.h * GetParam().x.w),
       output_length(GetParam().y.n * GetParam().y.c * GetParam().y.h *
-                    GetParam().y.w) {}
+                    GetParam().y.w) {
+  x_desc_dim[0] = GetParam().x.h;
+  x_desc_dim[1] = GetParam().x.w;
+  y_desc_dim[0] = GetParam().y.n;
+  y_desc_dim[1] = GetParam().y.c;
+  y_desc_dim[2] = GetParam().y.h;
+  y_desc_dim[3] = GetParam().y.w;
+  broadcast_dims[0] = GetParam().broadcast_dim.dim_1;
+  broadcast_dims[1] = GetParam().broadcast_dim.dim_2;
+}
  
 void TopsBroadcastParameterTest::freeDebugInfo() {
   if (input_mem == nullptr) {

 

Similar changes were also made in:

sdk/tests/tops/tops_dot_parameter_test.cc

sdk/tests/tops/tops_pad_parameter_test.cc
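
Besides assigning the elements in the constructor body, a member array can be list-initialized with braces directly in the initializer list, which is standard since C++11. A minimal sketch with a hypothetical DemoDims class:

```cpp
#include <cassert>

// Hypothetical fixture-like class with a fixed-size member array.
class DemoDims {
 public:
  DemoDims(int h, int w)
      // Braces (list-initialization) are standard C++11; "x_dim({h, w})",
      // with parentheses around the braces, is the GNU extension that
      // clang complains about.
      : x_dim{h, w}, length(h * w) {}

  int x_dim[2];
  int length;
};
```

This keeps the members initialized rather than assigned, so it also works for const member arrays.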

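3.13.5. clang instantiates a template function only when something actually uses it

The constructor is never called inside the sdk's own code, so libdtu_sdk.so contains no symbol for it, yet the test functions need it. As a last resort, a stub function was added to force the constructor to be instantiated.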

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
index 7e366337561..41fb573a562 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
@@ -248,4 +248,13 @@ vector<string> KernelCode<T>::LinkArgs() {
   return args;
 }
  
+// stab function for undefined reference to
+// 'dtu_umd::KernelCode<dtu_umd::PavoKernel>::KernelCode(llvm::StringRef)'
+void kernel_code_stab() {
+  StringRef file_name = "stab_file";
+  KernelCode<PavoKernel> k_stab1(file_name);
+  KernelCode<DoradoKernel> k_stab2(file_name);
+  KernelCode<LeoKernel> k_stab3(file_name);
+}
+
 }  // namespace dtu_umd
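
An explicit instantiation definition may be a tidier alternative to a stub function: it asks the compiler to emit every member of a template specialization in the current translation unit, whether or not anything there calls them. The sketch below illustrates the idea with hypothetical names (Box, KindA, KindB) rather than the real KernelCode classes. It is not the change that was merged, and it assumes the member definitions are visible in the same .cc file.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical tag types standing in for PavoKernel, DoradoKernel, etc.
struct KindA { static const char *Name() { return "A"; } };
struct KindB { static const char *Name() { return "B"; } };

// In a real project the class declaration would live in a header and the
// member definitions below in a .cc file.
template <typename T>
class Box {
 public:
  explicit Box(std::string file_name);
  std::string Describe() const;

 private:
  std::string file_name_;
};

template <typename T>
Box<T>::Box(std::string file_name) : file_name_(std::move(file_name)) {
  assert(!file_name_.empty());  // precondition of this sketch only
}

template <typename T>
std::string Box<T>::Describe() const {
  return std::string(T::Name()) + ":" + file_name_;
}

// Explicit instantiation definitions: the constructor and Describe() are
// emitted for these two specializations even though nothing in this file
// calls them, so other translation units can link against the symbols.
template class Box<KindA>;
template class Box<KindB>;
```

Compared with a stub, this constructs no throwaway objects at all and states the intent directly in the source.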

 

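3.13.6. clang does not allow objects that need memory management to be defined in a constexpr function

The template below creates a new vector object, whose constructor has to allocate memory. Left unchanged, clang reports “variable of non-literal type 'std::vector<size_t>' (aka 'vector<unsigned long>') cannot be defined in a constexpr function”. Removing the constexpr specifier from the template fixes the build.

Section 3.9/10 of the C++ standard defines a literal type (roughly, a constant or a simple variable). The vector in unpack_seq_to_vector is neither a simple variable nor an array of simple variables. Switching to array should compile, but every caller of this function would then have to be changed as well: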

A type is a literal type if it is:

  • void; or

  • a scalar type; or

  • a reference type; or

  • an array of literal type; or

  • a class type (Clause 9) that has all of the following properties:

    • it has a trivial destructor,

    • it is an aggregate type (8.5.1) or has at least one constexpr constructor or constructor template that is not a copy or move constructor, and

    • all of its non-static data members and base classes are of non-volatile literal types

diff --git a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
index bdc21e3f317..24e37191fc4 100644
--- a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
@@ -152,7 +152,7 @@ TEST(HlirUtilTest, ConstSplatValue) {
 }
  
 template <size_t... Idx>
-constexpr static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
+static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
   std::vector<size_t> ret = {Idx...};
   return ret;
 }
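
The std::array alternative mentioned above could look roughly like the sketch below. It is an illustration only: std::index_sequence stands in for hlir::IndexSeq (an assumption, since the project type is not shown here), the function name unpack_seq_to_array is hypothetical, and C++14 or later is assumed.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// std::array of a scalar type is a literal type, so it may be created and
// returned inside a constexpr function, unlike std::vector.
template <std::size_t... Idx>
constexpr std::array<std::size_t, sizeof...(Idx)> unpack_seq_to_array(
    std::index_sequence<Idx...>) {
  return {{Idx...}};
}

// The result is usable in constant expressions.
constexpr auto kSeq = unpack_seq_to_array(std::make_index_sequence<4>{});
static_assert(kSeq.size() == 4, "four indices expected");
static_assert(kSeq[0] == 0 && kSeq[3] == 3, "indices run from 0 to 3");
```

The cost is the one already noted: the return type changes, so callers that expect a std::vector would need to be adapted.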

 

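3.13.7. clang expects an explicit override keyword on overriding virtual functions

In the class below Flush() was already marked override while Message() was not. clang reports this mix with -Winconsistent-missing-override, which becomes an error when warnings are treated as errors.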

diff --git a/tools/logging/include/logging/to/file.h b/tools/logging/include/logging/to/file.h
index bdda687afdc..c6d39779bde 100644
--- a/tools/logging/include/logging/to/file.h
+++ b/tools/logging/include/logging/to/file.h
@@ -18,7 +18,7 @@ class LogToFile : public LogDestination {
   DISALLOW_COPY_AND_ASSIGN(LogToFile);
  
   static pointer Create(const std::string &file_name);
-  void Message(int level, const std::string &message);
+  void Message(int level, const std::string &message) override;
   void Flush() override;
  
  private:
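
A minimal sketch of why the keyword is worth adding everywhere: with override, the compiler rejects the declaration if it stops matching a virtual function in the base class. The class names below (Destination, MemoryDestination) are hypothetical stand-ins for LogDestination and LogToFile.

```cpp
#include <cassert>
#include <string>

class Destination {
 public:
  virtual ~Destination() = default;
  virtual void Message(int level, const std::string &message) = 0;
  virtual void Flush() = 0;
};

class MemoryDestination : public Destination {
 public:
  // Both overriding functions carry the keyword, so the class is consistent.
  // If a signature drifted (say, int became long), this would fail to
  // compile instead of silently declaring a new, unrelated function.
  void Message(int level, const std::string &message) override {
    last_ = std::to_string(level) + ":" + message;
  }
  void Flush() override { flushed_ = true; }

  const std::string &last() const { return last_; }
  bool flushed() const { return flushed_; }

 private:
  std::string last_;
  bool flushed_ = false;
};

// Usage: calls made through the base class reach the overriding functions.
inline std::string DispatchThroughBase() {
  MemoryDestination dest;
  Destination &base = dest;
  base.Message(2, "hello");
  base.Flush();
  assert(dest.flushed());
  return dest.last();
}
```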

 

Other similar changes:

tools/logging/include/logging/to/std_err.h

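3.13.8. Incorrect use of alignas

alignas is meant to improve access efficiency by placing a struct on a larger address boundary when it is defined. It is not the same concept as pack in C. pack(1) can be forced onto any object to remove padding, whereas the value given to alignas must not be smaller than the natural alignment of the type, that is, the strictest alignment among its members: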

The object or the type declared by such a declaration will have its alignment requirement equal to the strictest (largest) non-zero expression of all alignas specifiers used in the declaration, unless it would weaken the natural alignment of the type.

The structs defined below each contain a uint16_t member, so in principle the smallest valid alignas value is 2, and alignas(1) cannot be used:

diff --git a/sdk/lib/hlir/utils/types.h b/sdk/lib/hlir/utils/types.h
index 87aee25fe31..90cabe7bdb6 100644
--- a/sdk/lib/hlir/utils/types.h
+++ b/sdk/lib/hlir/utils/types.h
@@ -151,13 +151,13 @@ enum class CompareType {
  
 // define raw data type
 // lower to factor need raw data
-struct alignas(1) raw_bf16_ty {
+struct alignas(2) raw_bf16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_bf16_ty) == 2, "");
  
 // half
-struct alignas(1) raw_fp16_ty {
+struct alignas(2) raw_fp16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_fp16_ty) == 2, "");
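
The difference between alignas and packing can be checked at compile time. The sketch below uses hypothetical struct names and assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do), since packing is an extension rather than standard C++.

```cpp
#include <cstddef>
#include <cstdint>

// alignas equal to the natural alignment is accepted and changes nothing.
struct alignas(alignof(std::uint16_t)) Raw16 {
  std::uint16_t data;
};
static_assert(sizeof(Raw16) == sizeof(std::uint16_t), "no padding added");
static_assert(alignof(Raw16) == alignof(std::uint16_t), "natural alignment");

// alignas stricter than the natural alignment raises both alignment and size.
struct alignas(8) Wide16 {
  std::uint16_t data;
};
static_assert(alignof(Wide16) == 8, "alignment raised to 8");
static_assert(sizeof(Wide16) == 8, "size rounded up to the alignment");

// Removing padding is the job of packing, not of alignas.
#pragma pack(push, 1)
struct Packed {
  std::uint8_t tag;
  std::uint16_t data;
};
#pragma pack(pop)
static_assert(sizeof(Packed) == 3, "no padding between tag and data");
```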

 

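3.14. Optimizations made along the way while fixing warnings

3.14.1. Redundant computation

The change to tools/logging/lib/logging/log_message.cc was originally meant to fix the initialization of a variable-length array. While reading the code, however, I noticed that the seconds and microseconds of the timeval were first combined into a total microsecond count that was never used as such; it was immediately split back into seconds and microseconds. The conversion served no purpose, so after confirming with the code owner the redundant computation was removed.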

diff --git a/tools/logging/lib/logging/log_message.cc b/tools/logging/lib/logging/log_message.cc
index 77fa33fe129..8d4cd48b348 100644
--- a/tools/logging/lib/logging/log_message.cc
+++ b/tools/logging/lib/logging/log_message.cc
@@ -25,18 +25,15 @@ std::string LogMessage::GenerateMessage() {
   std::stringstream os;
   struct timeval tv;
   gettimeofday(&tv, nullptr);
-  uint64_t now_micros = static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
-  time_t now_seconds = static_cast<time_t>(now_micros / 1000000);
-  int32_t micros_remainder = static_cast<int32_t>(now_micros % 1000000);
   const size_t time_buffer_size = 50;
-  struct tm now_time = {0};
-  char time_buffer[time_buffer_size];
-  localtime_r(&now_seconds, &now_time);
+  struct tm now_time = tm();
+  char time_buffer[time_buffer_size]={0};
+  localtime_r(&tv.tv_sec, &now_time);
   strftime(time_buffer, time_buffer_size, "%Y-%m-%d %H:%M:%S", &now_time);
  
   os << time_buffer << ".";
   os.width(6);
-  os << micros_remainder << ": ";
+  os << tv.tv_usec << ": ";
   os << "DIWEF"[severity_];
   if(msg_code_) {
     os << msg_code_;
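
The simplified flow, using the timeval fields directly, can be sketched as a standalone helper. FormatTimestamp is a hypothetical name, not a function in the project, and a POSIX system is assumed. One deliberate difference from the diff above: the sketch sets a '0' fill character, because width(6) alone pads with spaces.

```cpp
#include <sys/time.h>  // POSIX: struct timeval
#include <time.h>      // localtime_r, strftime

#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a timeval as "YYYY-MM-DD HH:MM:SS.uuuuuu"
// in local time, without recombining seconds and microseconds first.
std::string FormatTimestamp(const timeval &tv) {
  assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);

  const time_t seconds = tv.tv_sec;
  struct tm now_time = tm();
  localtime_r(&seconds, &now_time);

  char time_buffer[50] = {0};
  strftime(time_buffer, sizeof(time_buffer), "%Y-%m-%d %H:%M:%S", &now_time);

  std::ostringstream os;
  // Zero-fill the microseconds so that 42 us prints as "000042".
  os << time_buffer << "." << std::setfill('0') << std::setw(6) << tv.tv_usec;
  return os.str();
}
```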

 

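3.14.2. Redundant comparison of a reference's address with a null pointer

A reference always refers to an existing object, so in well-defined code its address can never be null. Comparing that address with nullptr is therefore meaningless (clang reports it under -Wtautological-undefined-compare):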

diff --git a/tools/logging/lib/logging/log_module.cc b/tools/logging/lib/logging/log_module.cc
index f40e13d6fea..3ea150b37a0 100644
--- a/tools/logging/lib/logging/log_module.cc
+++ b/tools/logging/lib/logging/log_module.cc
@@ -27,10 +27,6 @@ LogModuleMgr &LogModuleMgr::Instance() {
 }
  
 void LogModuleMgr::UpdateModuleMaskFromEnv(const std::string &env) {
-  if (&env == nullptr) {
-    return;
-  }
-
   EFLOG(DBG) << "Init Logging Module" << std::endl;
   EFLOG(DBG) << "ENFLAME_LOG_DEBUG_MOD = " << env << std::endl;
   auto tokens = strutil::split(env, ',');
@@ -91,4 +87,4 @@ void LogModuleMgr::SetModuleOff(EF_LOG_MOD module) {
   mod_status_[static_cast<int>(module)] = false;
 }
  
-} // namespace dtu
\ No newline at end of file
+} // namespace dtu
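
If the value really can be absent, as with an environment variable that is not set, the check belongs on the pointer before a reference is ever bound to it. A minimal sketch under that assumption follows; CountModules and CountModulesFromEnv are hypothetical helpers, not part of the logging library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Takes a reference, so only "empty" needs handling here; "null" cannot occur.
std::size_t CountModules(const std::string &env) {
  if (env.empty()) {
    return 0;
  }
  std::size_t count = 1;
  for (char c : env) {
    if (c == ',') {
      ++count;
    }
  }
  return count;
}

// std::getenv returns a null pointer for an unset variable. This is the
// place for the null check, before a std::string is constructed from it.
std::size_t CountModulesFromEnv(const char *name) {
  assert(name != nullptr);
  const char *value = std::getenv(name);
  if (value == nullptr) {
    return 0;
  }
  return CountModules(value);
}
```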

 

 

