Summary of Phase 1 of the clang9 adaptation


1. Overview

As of November 25, 2021, the sdk, gtest, and dsopt modules compile with clang9.

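Using the script from the wiki page below, I downloaded every change related to [TR-16607] clang9 cross-compilation toolchain creation and verification - Enflame Company JIRA, including both merged changes and changes that are still open: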

How to batch-export detailed patches from gerrit - 周荣华_Ronghua - enflame wiki

 

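One note: the gerrit query command used here cannot contain parentheses. When several conditions are combined they are ANDed by default; to get OR semantics, the alternative expressions have to be joined explicitly with OR.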

 

A rough count gives 3924 lines added and 4164 lines deleted:

PS D:\code> grep "^+[^+]" .\diffrecord.txt |wc
   3924   24785  152346
PS D:\code> grep "^-[^-]" .\diffrecord.txt |wc
   4164   23159  147430

 

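The early changes were made with -Werror enabled, so some of them address fairly minor warnings. Because there were simply too many warnings, -Werror was later switched off temporarily and only a selected set of -Werror=<warning> options was kept.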

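In addition, the code under tops contains a great many implicit conversions from wider to narrower integer types, so -Wno-c++11-narrowing was later added to suppress the related diagnostics temporarily.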

 

2. How to find and fix problems

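Running a full build after every single fix is usually very time-consuming. The methods below retrieve an individual compile or link command so that a fix can be verified in isolation.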

2.1. Getting compile commands from cmake

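cmake can produce a compilation database. With CMAKE_EXPORT_COMPILE_COMMANDS enabled, a "compile_commands.json" file is generated in the cmake_build directory (the directory where the cmake command was run; it may have a different name). It records the working directory and the full command used to build a .o from every .c/.cc/.cpp file. For example, the compile command for hlir_utils_test.cc can be retrieved as follows: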
grep hlir_utils_test.cc compile_commands.json
  "command": "/opt/efb/clang9/bin/clang++  -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING -D_GLIBCXX_USE_CXX11_ABI=0 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include/dtu -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib/umd/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/ef_log/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/sdk -I/home/ronghua.zhou/clang1_build/tops/sdk/lib -I/home/ronghua.zhou/clang1_build/tops/sdk/lib/cpu_ops -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/mlir/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/eigen_archive -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_absl -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_protobuf/src -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/dtu_sdk/bazel-bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/llvm-project/llvm/utils/unittest/googlemock/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/com_googlesource_code_re2 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib 
-I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/mlir/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest  -O3 -g0 -DNDEBUG -fPIE   -m64 -march=x86-64 -mtune=generic -Werror=array-bounds -Werror=empty-body -Werror=format-extra-args -Werror=incompatible-pointer-types -Werror=array-bounds-pointer-arithmetic -Werror=c++-compat -Werror=shift-count-overflow -Werror=sizeof-pointer-memaccess -Werror=for-loop-analysis -Werror=unused-label -Werror=delete-incomplete -Werror=empty-translation-unit -Werror=unused-local-typedef -Werror=gnu-case-range -Werror=mismatched-new-delete -Werror=infinite-recursion -Werror=unreachable-code -Werror=sometimes-uninitialized -Werror=c++14-binary-literal -Werror=implicit-fallthrough -Werror=constant-logical-operand -Werror=exceptions -fcxx-exceptions -Werror=extra-tokens -Werror=format -Werror=format-security -Werror=header-guard -Werror=literal-conversion -Werror=null-conversion -Werror=pointer-bool-conversion -Werror=shift-overflow -Werror=tautological-constant-out-of-range-compare -Werror=tautological-pointer-compare -Werror=varargs -Wdouble-promotion -Wno-error=extern-c-compat -Wall -Wno-c++11-narrowing -Wextra -fsanitize=address -fno-omit-frame-pointer -std=gnu++14 -std=gnu++14 -o sdk/tests/hlir/cc_tests/CMakeFiles/hlir_utils_test.dir
hlir_utils_test.cc.o -c /home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc",
  "file": "/home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc"

 

 

2.2. Getting compile commands from bazel

https://github.com/vincent-picaud/Bazel_and_CompileCommands

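The open-source project above suggests adding --experimental_action_listener=//tools/actions:generate_compile_commands_listener to the bazel command to capture compile commands, but I tried it several times without success. I ended up using a plain ps command while the build was running. For example, the compile command for hlir_utils_test.cc can be captured with: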

ps -elf |grep hlir_utils_test.cc

Alternatively, adding the -s option to the bazel command also prints the commands it runs from that point on.

2.3. Getting link commands

If the specific link target is known, the ps approach from 2.2 works here as well. For example, the link command for libdtu_sdk.so can be captured with:

ps -elf |grep libdtu_sdk.so

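If the specific link target is not known and there are not many link steps, "ps -elf" gives the full process list, which contains many "ld @/tmp/response-xxx.txt" processes. Copy all current /tmp/response* files to another directory and inspect them to work out which target each one links. These files hold the complete link command and arguments, so the link command can be recovered from them.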

 

3. Categories of actual changes

3.1. Compile option changes

3.1.1. Options added

-fcxx-exceptions: dsopt uses exceptions, and exception handling was off by default in this clang setup, so it has to be turned on.

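-Wno-c++11-narrowing: the code under tops contains a great many implicit conversions from wider to narrower integer types, and clang treats such narrowing conversions as errors. The check is disabled temporarily and will be re-enabled once each component has eliminated the problem.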

3.1.2. Options removed

-Werror: there are simply too many warnings and eliminating all of them is not realistic, so this option is removed for now.

3.1.3. Options modified

set (CMAKE_CXX_STANDARD 14): the standard was previously set to 17, which conflicts with TensorFlow's default of 14 and with gcc's default of 14, so it is changed to C++14.

-fno-canonical-system-headers: only gcc supports this option and clang does not, so it is now enabled for gcc only instead of for all compilers.

3.1.4. Notes on bazel options

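bazel splits compile options into copt, cxxopt, and conlyopt: copt applies to both C and C++, cxxopt applies to C++ only, and conlyopt applies to C only. Putting an option in the wrong group produces many warnings.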

 

3.1.5. On a rerun, cmake sometimes expands CMAKE_TOOLCHAIN_FILE to the full path of the toolchain file found on the search path, so a direct STREQUAL check fails

The fix, made in CMakeLists.txt, is to use MATCHES instead of STREQUAL, so that the check passes whether or not the full path has been prepended.

3.2. Template-related errors

3.2.1. use 'template' keyword to treat 'cast' as a dependent template name

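When a member function template is called on an object whose type depends on a template parameter, clang requires the call to be marked explicitly with the template keyword and reports an error otherwise. See ISO C++03 14.2/4: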

When the name of a member template specialization appears after . or -> in a postfix-expression, or after nested-name-specifier in a qualified-id, and the postfix-expression or qualified-id explicitly depends on a template-parameter (14.6.2), the member template name must be prefixed by the keyword template. Otherwise the name is assumed to name a non-template.

 

For example, the SinkTransposeWithScalarBroadcast class in hlir calls the cast method with mlir::RankedTensorType and mlir::ShapedType:

 
diff --git a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
index c82fa217a21..9952ddbc470 100644
--- a/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
+++ b/sdk/lib/hlir/transforms/TopsInferenceHlirPass/HlirTransposeMoverExt.cc
@@ -237,11 +237,14 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
     }
     llvm::SmallVector<mlir::Value, 4> new_operands(root->getNumOperands(), {});
     for (auto& it : broadcast_ops) {
-      auto transposedTy = getTransposedType(std::get<1>(it)
-                                                ->getResult(0)
-                                                .getType()
-                                                .cast<mlir::RankedTensorType>(),
-                                            prePermutation);
+      // fix error:
+      // use 'template' keyword to treat 'cast' as a dependent template name
+      auto transposedTy =
+          getTransposedType(std::get<1>(it)
+                                ->getResult(0)
+                                .getType()
+                                .template cast<mlir::RankedTensorType>(),
+                            prePermutation);
       auto new_attr = llvm::cast<HlirOp::BroadcastInDimOp>(std::get<1>(it))
                           .broadcast_dimensionsAttr();
       if (new_attr) {
@@ -251,7 +254,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
           new_data[i] = layout[data[i]];
         }
         new_attr = mlir::DenseIntElementsAttr::get(
-            new_attr.getType().cast<mlir::RankedTensorType>(),
+            new_attr.getType().template cast<mlir::RankedTensorType>(),
             llvm::makeArrayRef(new_data));
       }
       mlir::Operation* transpose_bs_op =
@@ -274,7 +277,7 @@ struct SinkTransposeWithScalarBroadcast : public mlir::OpRewritePattern<T> {
     mlir::Operation* ret_transpose = rewriter.create<HlirOp::TransposeOp>(
         root->getLoc(), root->getResult(0).getType(), new_root->getResult(0),
         mlir::DenseIntElementsAttr::get(
-            permutation.getType().cast<mlir::ShapedType>(), layout));
+            permutation.getType().template cast<mlir::ShapedType>(), layout));
     root->replaceAllUsesWith(ret_transpose);
   }

 

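Note that the template keyword is not needed when the object's type does not depend on a template parameter, and the same class also contains calls that need no change. For example, the ss object in the same file has the fixed, non-dependent type mlir::Operation*, so calling the cast member template through ss does not need the template prefix: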

 
mlir::Operation* ss = op.getOperation();
auto new_operand_ty = getTransposedType(operand_ty, prePermutation);
auto new_source_ty = getTransposedType(source_ty, prePermutation);
auto new_result_ty = getTransposedType(
    ss->getResult(0).getType().cast<mlir::RankedTensorType>(),
    prePermutation);
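The two cases can be reproduced in a minimal, self-contained sketch. The Type, RankedTensor, and Holder names below are hypothetical stand-ins, not MLIR classes; only the placement of the template keyword mirrors the change above.

```cpp
// Hypothetical target type of the cast.
struct RankedTensor {
  int id;
};

// Hypothetical stand-in for a type that offers a member function template.
struct Type {
  int id;
  template <typename U>
  U cast() const {
    return U{id};
  }
};

// Hypothetical wrapper; the type returned by getType() depends on T.
template <typename T>
struct Holder {
  T value;
  T getType() const { return value; }
};

// Dependent case: until instantiation the compiler cannot know that cast
// names a template, so the 'template' keyword is required before it.
template <typename T>
int dependentCast(const Holder<T>& h) {
  return h.getType().template cast<RankedTensor>().id;
}

// Non-dependent case: Type is a fixed type, so no 'template' keyword is needed.
int fixedCast(const Type& t) {
  return t.cast<RankedTensor>().id;
}
```

Dropping the template keyword in dependentCast should reproduce the clang diagnostic quoted in the heading of this section, while fixedCast compiles either way.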

 

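A related problem appears in factor_profiler_pass.cc in the factor module. There the template keyword had been placed before calls to ordinary, non-template member functions, and the diff removes it: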

diff --git a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
index 43419fd305a..ad23a709f20 100644
--- a/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
+++ b/sdk/lib/factor/codegen/passes/factor_profiler_pass.cc
@@ -55,11 +55,11 @@ mlir::Value getFirstOperand<mlir::Value>(mlir::Value op) {
  
 template <typename T>
 int getSrcCompressed(T op) {
-  return op.template dma_src_compressedAttr().getInt();
+  return op.dma_src_compressedAttr().getInt();
 }
 template <typename T>
 int getDstDecompressed(T op) {
-  return op.template dma_dst_decompressAttr().getInt();
+  return op.dma_dst_decompressAttr().getInt();
 }
  
 #define DISABLE_DMA_COMPRESS_ATTR_GETTER(OP) \
@@ -84,11 +84,11 @@ DISABLE_DMA_COMPRESS_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template <typename T>
 int getReverseLr(T op) {
-  return op.template dma_reverse_lrAttr().getInt();
+  return op.dma_reverse_lrAttr().getInt();
 }
 template <typename T>
 int getReverseTb(T op) {
-  return op.template dma_reverse_tbAttr().getInt();
+  return op.dma_reverse_tbAttr().getInt();
 }
  
 #define DISABLE_REVERSE_ATTR_GETTER(OP) \
@@ -114,7 +114,7 @@ DISABLE_REVERSE_ATTR_GETTER(mlir::factor::FactorDeSliceOp)
  
 template <typename T>
 int getDmaType(T op) {
-  return op.template dma_typeAttr().getInt();
+  return op.dma_typeAttr().getInt();
 }
  
 #define DISABLE_DMA_TYPE_GETTER(OP) \
@@ -142,8 +142,8 @@ std::string formatDmaAttrs(int direction, int src_compressed,
 template <typename T>
 void extractDmaMetaInfoTo(T op, dtu_activity_data &data) {
   auto &args = data.args;
-  mlir::Value from = getFirstOperand(op.template from());
-  mlir::Value to = getFirstOperand(op.template to());
+  mlir::Value from = getFirstOperand(op.from());
+  mlir::Value to = getFirstOperand(op.to());
   auto engine_type = getDmaType(op);
   auto direction = op.dma_directionAttr().getInt();

 

3.2.2. Ambiguity

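During some template instantiations, if the same call matches both function template A and function template B, clang reports an ambiguity error while gcc does not.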

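Take EraseHelp below. The original version defined two overloads. When several template types are wrapped in a TypeSequence, the compiler cannot tell whether to peel off Last first or Inner first. If the two overloads were implemented differently, gcc surprisingly still would not complain; it is unclear whether it calls whichever matching overload it happens to find, or the first or the last one declared.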

constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>);

constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>);
diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 3cf2bc7994a..0e645fd1e7e 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -157,12 +157,9 @@ struct EraseSeqIf {
     using type = decltype(EraseHelp(LeftSeq(), TypeSequence<Right...>()));
     return type();
   }
-  template <typename... Left, typename Last>
-  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>) {
-    using type = typename std::conditional<!Pred<Last>::value,
-                                           TypeSequence<Left..., Last>,
-                                           TypeSequence<Left...>>::type;
-    return type();
+  template <typename... Left>
+  constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<>) {
+    return TypeSequence<Left...>();
   }
   using type = decltype(EraseHelp(TypeSequence<>(), TypeSequence<T...>()));
 };
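The shape of the fix can be sketched in isolation: the base case takes an empty TypeSequence<>, so every call matches exactly one overload. This is a simplified illustration that uses free functions instead of the static members in type_utils.h; the names follow the diff, but the recursive body is an assumption about how the elided part of the code works.

```cpp
#include <type_traits>

template <typename... Ts>
struct TypeSequence {};

// Base case: the right-hand sequence is empty, so the accumulated
// left-hand sequence is the result. Only TypeSequence<> matches here.
template <template <typename> class Pred, typename... Left>
constexpr auto EraseHelp(TypeSequence<Left...>, TypeSequence<>) {
  return TypeSequence<Left...>();
}

// Recursive step: matches only a non-empty right-hand sequence. Inner is
// kept when Pred<Inner>::value is false and dropped otherwise.
template <template <typename> class Pred, typename... Left, typename Inner,
          typename... Right>
constexpr auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>) {
  using LeftSeq =
      typename std::conditional<Pred<Inner>::value, TypeSequence<Left...>,
                                TypeSequence<Left..., Inner>>::type;
  return EraseHelp<Pred>(LeftSeq(), TypeSequence<Right...>());
}

// Removes every type T for which Pred<T>::value is true.
template <template <typename> class Pred, typename... T>
struct EraseSeqIf {
  using type =
      decltype(EraseHelp<Pred>(TypeSequence<>(), TypeSequence<T...>()));
};

static_assert(
    std::is_same<EraseSeqIf<std::is_pointer, int, char*, long>::type,
                 TypeSequence<int, long>>::value,
    "pointer types are erased, other types keep their order");
```

With a one-element sequence such as TypeSequence<int>, only the recursive overload is viable, and with TypeSequence<> only the base case is, so no ambiguity can arise.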

 

3.3. Type mismatches

3.3.1. Implicit conversion from a wider to a narrower integer type

For example, func_entry in sdk/tests/llir/dataflow1_pingpang_buffer_test.cc is declared as int64_t, but the function it is passed to expects a uint32_t parameter, which triggers an implicit int64_t → uint32_t conversion:

diff --git a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
index fa824f03d9a..70298b1fb59 100644
--- a/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
+++ b/sdk/tests/llir/dataflow1_pingpang_buffer_test.cc
@@ -522,7 +522,7 @@ TEST(Pavo2xCDMAPattern1Test, Pavo2xCDMAPattern1WithPingpangTest) {
                              {{0}, {1}, {2}, {3}, {4}, {5}}, 1, 1, 1, -1, -1,
                              output_queues_l1);
  
-    int64_t func_entry = 0;
+    uint32_t func_entry = 0;
     // trigger sip
     for (uint64_t idx = 0; idx < SIP_COUNT; ++idx) {
       std::string sip_name = std::string("sip") + std::to_string(idx);

 

Other files with similar changes:

sdk/tests/llir/dataflow1_test.cc

sdk/tests/llir/dataflow2_test.cc

sdk/tests/llir/dataflow3_test.cc

sdk/tests/llir/dataflow5_test.cc

sdk/tests/llir/dataflow5_test_1xcdma.cc

sdk/tests/llir/dataflow7_test.cc

sdk/tests/llir/llir2assembler_leo_test.cc

sdk/tests/llir/utils/llir_test_util.cc

sdk/tests/llir/utils/llir_test_util.h
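Where the variable's type cannot simply be changed as in the diff above, one option is an explicit, checked conversion. The helper below is a hypothetical sketch and not part of the tops code base; it rejects values that would not survive the conversion instead of truncating them silently.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts int64_t to uint32_t and throws when the
// value is negative or too large to be represented.
std::uint32_t ToU32(std::int64_t value) {
  const std::int64_t max = std::numeric_limits<std::uint32_t>::max();
  if (value < 0 || value > max) {
    throw std::out_of_range("value does not fit in uint32_t");
  }
  // The explicit cast documents that the narrowing is intentional.
  return static_cast<std::uint32_t>(value);
}
```

A call site would then pass ToU32(func_entry) where a uint32_t is expected, which makes the narrowing visible in the source.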

 

3.3.2. Implicit conversion from signed to unsigned

-1 converted to an unsigned integer type:

diff --git a/sdk/lib/hlir/ir/type_utils.h b/sdk/lib/hlir/ir/type_utils.h
index 0e645fd1e7e..f84360269f3 100644
--- a/sdk/lib/hlir/ir/type_utils.h
+++ b/sdk/lib/hlir/ir/type_utils.h
@@ -122,10 +122,9 @@ struct FindIf<Pred, T, R...> {
  
 template <template <typename N> typename Pred, typename T>
 struct FindIf<Pred, T> {
-  using type =
-      typename std::conditional<Pred<T>::value,
-                                std::integral_constant<size_t, 0>,
-                                std::integral_constant<size_t, -1>>::type;
+  using type = typename std::conditional<
+      Pred<T>::value, std::integral_constant<size_t, 0>,
+      std::integral_constant<size_t, static_cast<size_t>(-1)>>::type;
 };
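The sentinel from the diff can be checked in isolation. The sketch below compares the explicit cast with an alternative spelling based on std::numeric_limits, which avoids the signed literal entirely; the alias names are hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// "Not found" sentinel; the explicit cast makes the wrap-around intentional.
using NotFound =
    std::integral_constant<std::size_t, static_cast<std::size_t>(-1)>;

// Alternative spelling that never mentions a negative value.
using NotFoundAlt =
    std::integral_constant<std::size_t,
                           std::numeric_limits<std::size_t>::max()>;

static_assert(NotFound::value == NotFoundAlt::value,
              "both spellings denote the largest size_t value");
```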

 

The remaining cases mostly involve a loop counter declared as int that is compared against many uint32_t values, which causes an implicit int → uint32_t conversion:

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
   dtu_mem_handle param = cluster_mem[cid];
   u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[8];
-  for (int i = 0; i < run_sip_count; i++) {
+  for (u32 i = 0; i < run_sip_count; i++) {
     parameter[i] =
         LaunchKernelParameter(sip[i], param, param_off + i * ONE_PARAM_SIZE,
                               param_size, 0, mode, 0, false, false, "op_0");

 

Other files:

sdk/lib/spm/src/buddy_policy.c

system_test/tools/vpd_cycle/vpd_cycle.c

sdk/lib/spm/include/spm.h

sdk/tests/llir/llir2assembler_leo_test.cc

sdk/tests/llir/dataflow5_test_1xcdma.cc

sdk/tests/llir/dataflow5_test.cc

sdk/tests/llir/utils/llir_test_util.cc

sdk/tests/llir/utils/llir_test_util.h

 

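The change to sdk/lib/umd/tools/kernel_code_processor/dturt.inc is a bit more involved. ___leo_runtime___ and ___x_runtime___ were defined as char[], but some initializer values can exceed 127, which overflows a signed char. The functions that use these arrays, and the functions those call in turn, all require char[]. The final fix changes the definitions to unsigned char[] and adds a cast in the function that references them directly.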

diff --git a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
index 1f22b52d8af..d1ed30a049d 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
+++ b/sdk/lib/umd/tools/kernel_code_processor/dturt.inc
@@ -1,4 +1,4 @@
-static const char ___leo_runtime___[] = {
+static const unsigned char ___leo_runtime___[] = {
     0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
     0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
     0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
@@ -2885,7 +2885,7 @@ static const char ___leo_runtime___[] = {
     0x00, 0x00,
 };
 static const int ___leo_runtime_size___ = sizeof(___leo_runtime___);
-static const char ___x_runtime___[] = {
+static const unsigned char ___x_runtime___[] = {
     0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A, 0x2F, 0x20, 0x20, 0x20,
     0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,
     0x30, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20,

 

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
index d193a8823ac..f61d048cfd6 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
@@ -128,7 +128,10 @@ class Kernel {
   struct __target__ : public KernelCode<__target__>, public Kernel {         \
     using KernelCode<__target__>::KernelCode;                                \
     static const llvm::StringRef GetArch() { return #__arch__; }             \
-    static const char* GetRTBuffer() { return ___##__arch__##_runtime___; }  \
+    static const char* GetRTBuffer() {                                       \
+      return static_cast<char*>(static_cast<void*>(                          \
+          const_cast<unsigned char*>(___##__arch__##_runtime___)));          \
+    }                                                                        \
     static int GetRTBufferSize() { return ___##__arch__##_runtime_size___; } \
   };                                                                         \
   template class KernelCode<__target__>
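The same idea can be shown on a small buffer. The sketch below is an alternative spelling, not the change made in kernel_code.h: a single reinterpret_cast to const char* exposes the bytes and keeps the const qualifier. The kBlob array and both function names are hypothetical.

```cpp
#include <cstddef>

// Hypothetical embedded blob. Values above 0x7F fit in unsigned char but
// not in a signed char, which is why the element type matters.
static const unsigned char kBlob[] = {0x21, 0x3C, 0xE5, 0xFF};

// Exposes the blob to callers that expect const char*. Converting between
// unsigned char* and char* with reinterpret_cast is permitted, and the
// data stays const.
const char* BlobAsChars() {
  return reinterpret_cast<const char*>(kBlob);
}

std::size_t BlobSize() { return sizeof(kBlob); }
```

A caller that needs the numeric byte value back can cast a single element to unsigned char.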

 

 

 

 

3.3.3. Implicit conversion from floating point to integer

The fractional part is simply dropped, so a non-zero value immediately becomes 0:

diff --git a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
index 73712aba4ad..df82dadfa65 100644
--- a/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
+++ b/sdk/tests/tops/tops_bnForwardTrainingEx_integration_test.cc
@@ -1890,7 +1890,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_UV) {
   int k_c = 1;
   int k_h = 3;
   int k_w = 3;
-  int epsilon = 0.01;
+  float epsilon = 0.01;
  
   int input_size = n * c * h * w;
   int kernel_size = k_n * k_c * k_h * k_w;
@@ -2181,7 +2181,7 @@ TEST_F(TopsTest, topsConvolutionForward_BatchNorm_RELU_SV) {
   int k_c = 1;
   int k_h = 3;
   int k_w = 3;
-  int epsilon = 0.01;
+  float epsilon = 0.01;
  
   int input_size = n * c * h * w;
   int kernel_size = k_n * k_c * k_h * k_w;

 

Other similar changes:

sdk/tests/op/hlir/pavo/bert/hlir_div_test.cc

 

3.3.4. Implicit conversion from double to float

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -783,8 +779,8 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
     float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
     for (u32 i = 0; i < (run_sip_count * DTU_ALIGN(DATA_BUFF_SIZE, 128)) / 4;
          i++) {
-      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
-          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
+      if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
+          ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
         dtu_command_queue_destroy(queue);
         dtu_mem_free_hbm(hbm_mem);
         dtu_mem_free_host(host_mem);
@@ -425,7 +425,7 @@ static void launch_code_for_one_sip(void) {
  
   float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
   for (int i = 0; i < DATA_BUFF_SIZE / 4; i++) {
-    if (result[i] - (2 * i) > 0.01 || (2 * i) - result[i] > 0.01) {
+    if (result[i] - (2 * i) > 0.01f || (2 * i) - result[i] > 0.01f) {
       dtu_command_queue_destroy(queue);
       dtu_mem_free_hbm(hbm_mem);
       dtu_mem_free_host(host_mem);
@@ -605,8 +605,8 @@ static void launch_one_sip_twice(void) {
  
   float *result = (float *)((u64)dtu_mem_get_cpu_ptr(host_mem) + A_B_SIZE);
   for (int i = 0; i < 2 * DATA_BUFF_SIZE / 4; i++) {
-    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01 ||
-        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01) {
+    if (result[i] - ((2 * i) % (DATA_BUFF_SIZE / 2)) > 0.01f ||
+        ((2 * i) % (DATA_BUFF_SIZE / 2)) - result[i] > 0.01f) {
       dtu_command_queue_destroy(queue);
       dtu_mem_free_hbm(hbm_mem);
       dtu_mem_free_host(host_mem);

 

Other similar changes:

sdk/tests/op/hlir/pavo/resnet50/hlir_general_resize_test.cc

3.3.5. Implicit conversion from pointer to bool

diff --git a/system_test/tools/vpd_cycle/vpd_cycle.c b/system_test/tools/vpd_cycle/vpd_cycle.c
index 31d57fa0f9c..ccc9f71b827 100644
--- a/system_test/tools/vpd_cycle/vpd_cycle.c
+++ b/system_test/tools/vpd_cycle/vpd_cycle.c
@@ -75,14 +83,14 @@ static int ProcessDB(const char *path) {
   char *name = strdup(path);
   char *base = basename(name);
   char *p;
-  if (p = strrchr(base, '.')) *p = '\0';
+  if ((p = strrchr(base, '.')) != NULL) *p = '\0';
   fprintf(output_fp, "%s,%lu\n", base, end - start);
   free(name);

 

 

3.3.6. Implicit conversion between different types

 

fixed_size_mem_pool.h assigns between dtu_status and int directly. Although dtu_status is an enum and behaves much like an int, clang's type checking is strict here and reports an error.
diff --git a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
index 73d02f3b1f4..b7be6ee39c4 100644
--- a/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
+++ b/sdk/runtime/lib/top_scheduler/fixed_size_mem_pool.h
@@ -118,7 +118,7 @@ class DeviceFixedSizeMemPool final
   ~DeviceFixedSizeMemPool() {}
  
   Status Init(dtu_umd::MemoryMgr *mgr, uint32_t mc, uint32_t flags) override {
-    dtu_status status = 0;
+    dtu_status status = DTU_SUCCESS;
     status =
         mgr->AllocDevice(NODE_NUMBER * NODE_SIZE, mc, flags, &(this->mem_));
     if (status) {
--

 

Definition of dtu_status:

 
typedef enum dtu_status_code {
  DTU_SUCCESS = 0,
  DTU_ERROR_INVALID_PARAMETER = -100,
  DTU_ERROR_INVALID_MEM_TYPE = -101,
  DTU_ERROR_OUT_OF_MEMORY = -102,
  DTU_ERROR_OUT_OF_RESOURCES = -103,
  DTU_ERROR_NOT_INITIALIZED = -104,
  DTU_ERROR_INVALID_CTX_OBJ = -105,
  DTU_ERROR_INVALID_CLUSTER_OBJ = -106,
  DTU_ERROR_INVALID_SIP_OBJ = -107,
  DTU_ERROR_INVALID_MEM_OBJ = -108,
  DTU_ERROR_INVALID_CMD_QUEUE_OBJ = -109,
  DTU_ERROR_INVALID_CMD_DESC_OBJ = -110,
  DTU_ERROR_INVALID_PROGRAM_OBJ = -111,
  DTU_ERROR_INVALID_FUNCTION_OBJ = -112,
  DTU_ERROR_INVALID_EVENT_OBJ = -113,
  DTU_ERROR_CLUSTER_BUSY = -114,
  DTU_ERROR_SIP_BUSY = -115,
  DTU_ERROR_IN_DRM = -116,
  DTU_ERROR_IN_IOCTRL = -117,
  DTU_ERROR_GEM_CREATE = -118,
  DTU_ERROR_GEM_CLOSE = -119,
  DTU_ERROR_GEM_MMAP = -120,
  DTU_ERROR_GEM_UNMMAP = -121,
  DTU_ERROR_CMD_QUEUE_SYNC = -122,
  DTU_ERROR_CMD_QUEUE_EMIT = -123,
  DTU_ERROR_CLUSTER_ACQUIRE = -124,
  DTU_ERROR_CLUSTER_RELEASE = -125,
  DTU_ERROR_NOT_MATCH = -126,
  DTU_ERROR_NOT_RELEASE_REF = -127,
  DTU_ERROR_GET_DEVICE_HDL = -128,
  DTU_ERROR_ALLOC_HOST = -129,
  DTU_ERROR_ALLOC_HBM = -130,
  DTU_ERROR_ALLOC_CLUSTER = -131,
  DTU_ERROR_FREE_HOST = -132,
  DTU_ERROR_FREE_HBM = -133,
  DTU_ERROR_FREE_CLUSTER = -134,
  DTU_ERROR_CMD_QUEUE_EMITED = -135,
  DTU_ERROR_OPEN_FILE = -136,
  DTU_ERROR_READ_FILE = -137,
  DTU_ERROR_WRITE_FILE = -138,
  DTU_ERROR_INVALID_BIN_TYPE = -139,
  DTU_ERROR_LOAD_BIN_FILE = -140,
  DTU_ERROR_LOAD_BIN_IMAGE = -141,
  DTU_ERROR_FUNCTION_NOT_FOUND = -142,
  DTU_ERROR_INVALID_OPERATION = -143,
  DTU_ERROR_EVENT_GET_ID = -144,
  DTU_ERROR_EVENT_WAIT_STATUS = -145,
  DTU_ERROR_EVENT_SIGNAL_STATUS = -146,
  DTU_ERROR_EVENT_TYPE = -147,
  DTU_ERROR_EVENT_NOT_SUBMIT = -148,
  DTU_ERROR_EVENT_DESTROYED = -149,
  DTU_ERROR_EVENT_SIGNAL_TWICE = -150,
  DTU_ERROR_MEMORY_OVERLAP = -151,
  DTU_ERROR_THREAD_POOL_QUEUE_OVERFLOW = -152,
  DTU_ERROR_PCI_BUS_SCAN = -153,
  DTU_ERROR_ALLOC_USERPTR = -154,
  DTU_ERROR_FREE_USERPTR = -155,
  DTU_ERROR_DUMP_CMEM = -156,
  DTU_ERROR_LOAD_CMEM = -157,
  DTU_ERROR_DUMP_SMEM = -158,
  DTU_ERROR_LOAD_SMEM = -159,
  DTU_ERROR_READ_REGISTERS = -160,
  DTU_ERROR_WRITE_REGISTERS = -161,
  DTU_ERROR_ALLOC_SIP = -162,
  DTU_ERROR_FREE_SIP = -163,
  DTU_ERROR_UNKNOWN = -164,
  DTU_ERROR_ALLOC_HUGE = -165,
  DTU_ERROR_INVALID_USR_IRQ_OBJ = -166,
  DTU_ERROR_LINK_CCIX_IO = -167,
  DTU_ERROR_PLACEHOLDER_NOT_FEED = -168,
  DTU_ERROR_LAUNCH_DMA = -169,
  DTU_ERROR_INVALID_PROFILE_MAGIC = -170,
  DTU_ERROR_INVALID_TIMESTAMP = -180,
  DTU_ERROR_INVALID_CONFIG = -181,
  DTU_ERROR_CHILD_NOT_SUBMIT = -182,
  DTU_ERROR_ALREADY_FORKED = -183,
  DTU_ERROR_LABEL_USED = -184,
  DTU_ERROR_LABEL_NOT_VALIDATED = -185,
  DTU_ERROR_COMMAND_TYPE_MISMATCH = -186,
  DTU_ERROR_VECTOR_NUMBER = -187,
  DTU_ERROR_VECTOR_FLAG_MISMATCH = -188,
  DTU_ERROR_DEVICE_RESET = -189,
  DTU_ERROR_EXECUTABLE_CRC_VERIFY = -190,
  DTU_ERROR_EXECUTABLE_DEVICE_VERIFY = -191,
  DTU_ERROR_INVALID_TS_OBJ = -192,
  DTU_ERROR_ALLOC_VDEV = -193,
  DTU_ERROR_FREE_VDEV = -194,
  DTU_ERROR_VDEV_BUSY = -195,
} dtu_status;

 

 

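NULL and 0 have the same value, but the former is a null pointer constant (void* in C) while the latter is an int, and the difference matters.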

diff --git a/sdk/lib/umd/tests/sample/sample_run.cc b/sdk/lib/umd/tests/sample/sample_run.cc
index c5a3557c2a5..23e9563859b 100644
--- a/sdk/lib/umd/tests/sample/sample_run.cc
+++ b/sdk/lib/umd/tests/sample/sample_run.cc
@@ -35,7 +35,7 @@ void usage() {
  
 dtu_context ctx;
 dtu_cluster cluster[4] = {NULL};
-u32 cluster_id[4] = {NULL};
+u32 cluster_id[4] = {0};
 dtu_mem_handle cluster_mem[4] = {NULL};
 dtu_sip sip[32] = {NULL};

 

3.3.7. Implicit const conversion in a function prototype

diff --git a/sdk/lib/cpu/cpu_func_manager.cc b/sdk/lib/cpu/cpu_func_manager.cc
index 940bde5d91a..ec8967203c2 100644
--- a/sdk/lib/cpu/cpu_func_manager.cc
+++ b/sdk/lib/cpu/cpu_func_manager.cc
@@ -31,7 +31,7 @@ struct FunctionInvoker {
   }
   template <size_t... idx>
   void unpack(std::index_sequence<idx...> seq, const void* func, char** argvs) {
-    (*reinterpret_cast<void (*)(...)>(func))(argvs[idx]...);
+    (*reinterpret_cast<void (*)(...)>(const_cast<void*>(func)))(argvs[idx]...);
   }
 };

 

3.3.8. Implicit conversion from void* to char*

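Many modules do arithmetic directly on void* pointers. The size of the object a void* points to is unknown, so adding to or subtracting from it as an address really relies on an implicit void* → char* conversion (which gcc accepts as an extension). clang does not allow this: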

diff --git a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
index 5b9c2dcc98f..7b568f934bb 100644
--- a/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_4c_add_test.cc
@@ -66,8 +66,8 @@ static void Add4CTest(SimpleModuleOpBuilder::ShapeType &shape,
   executor.run(false);
  
   auto output_hanlde = executor.get_output(0);
-  T* result =
-      static_cast<T*>(output_hanlde->CPUPtr() + output_hanlde->offset());
+  T* result = static_cast<T*>(static_cast<void*>(
+      static_cast<char*>(output_hanlde->CPUPtr()) + output_hanlde->offset()));
   for (size_t i = 0; i < l_data.size(); ++i) {
     EXPECT_EQ(result[i], out_data[i]);
   }

 

Other similar changes:

sdk/tests/hlir/cc_tests/hlir_corner_test.cc

sdk/tests/hlir/cc_tests/hlir_press_test.cc

sdk/tests/tops/tops_dot_test.cc

sdk/tests/op/hlir/pavo/bert/hlir_broadcast_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_transpose_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_test_header.h

sdk/tests/op/hlir/pavo/resnet50/hlir_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_non4c_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_pad_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_update_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_slice_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_concat_test.cc

sdk/tests/op/hlir/pavo/resnet50/hlir_broadcast_test.cc

sdk/tests/op/hlir/pavo/dnn/hlir_test_header.h

sdk/tests/op/hlir/hlir_test_header.h

sdk/tests/runtime/executable_test.cc
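Since the same cast chain recurs in many test files, it could be wrapped in a small helper. The sketch below is an assumption about how such a helper might look; OffsetAs is a hypothetical name and does not exist in the tops code base.

```cpp
#include <cstddef>

// Hypothetical helper: advances a void* by byte_offset bytes and returns
// the result as T*. The arithmetic is done on char*, whose element size
// is 1, so the offset is counted in bytes.
template <typename T>
T* OffsetAs(void* base, std::size_t byte_offset) {
  char* bytes = static_cast<char*>(base) + byte_offset;
  return static_cast<T*>(static_cast<void*>(bytes));
}
```

With such a helper, the expression in the diff above would read OffsetAs<T>(output_hanlde->CPUPtr(), output_hanlde->offset()), assuming CPUPtr() returns void* and offset() returns a byte count.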

3.3.9. Implicit conversion from string to char*

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
index 7e366337561..41fb573a562 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
@@ -23,7 +23,7 @@ KernelCode<T>::KernelCode(StringRef file)
     : compiled_(false), name_(file), module_("KernelModule", context_) {
   auto mb_or_err = MemoryBuffer::getFile(file);
   if (auto ec = mb_or_err.getError()) {
-    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message());
+    EF_PRINT(UmdMsg::UMD_CANNOT_OPEN_MSG, file.data(), ec.message().c_str());
     EF_THROW_WITH << -1 << std::endl;
   }

 

 

3.4. Missing break in switch

3.4.1. Where a break is semantically required, add the break

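For example, parser.hpp has no break before the final default branch. Because the default branch is currently empty this does not affect behavior, but it will cause a bug as soon as any handling is added to the default branch: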

diff --git a/3rdparty/inja/include/inja/parser.hpp b/3rdparty/inja/include/inja/parser.hpp
index 6266c4a0f74..466499ecc8b 100644
--- a/3rdparty/inja/include/inja/parser.hpp
+++ b/3rdparty/inja/include/inja/parser.hpp
@@ -296,7 +296,7 @@ class Parser {
           operator_stack.pop();
           function_stack.pop();
         }
-      }
+      } break;
       default:
         break;
       }

 

Other similar changes:

sdk/sdk.bzl
sdk/third_party/inja.patch

 

3.4.2. Where fall-through is intended, add a pragma so that the compiler skips the check

This kind of problem is fairly common.

diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
index e63568ddc63..78896778a21 100644
--- a/sdk/tests/runtime/chunk_allocator_test.cc
+++ b/sdk/tests/runtime/chunk_allocator_test.cc
@@ -552,6 +552,10 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
     uint64_t offset0 = 0;
     uint64_t offset1 = 0;
  
+#if defined(__clang__)
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wimplicit-fallthrough"
+#endif
     switch (op) {
       case TestOpAllocTopDown:
       case TestOpAllocDownTop: {
@@ -590,6 +594,9 @@ TEST_F(ChunkAllocatorTest, copy_constructor_test) {
         }
       } break;
     }
+#if defined(__clang__)
+#pragma clang diagnostic pop
+#endif
   }
 }
 }  // namespace

 

 

Other similar changes:

sdk/tests/runtime/mem_manager_test.cc
sdk/tests/runtime/mem_pool_test.cc
sdk/tools/dtu_compiler/dtu_compiler.cc
sdk/lib/umd/tests/sample/tinyxmlparser.cc

In addition, C++17 introduced the fallthrough attribute, which is a simpler way to tell the compiler that fall-through is intended: C++ attribute: fallthrough (since C++17) - cppreference.com
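A minimal sketch of the attribute follows. The Op enum and the Cost function are hypothetical and only illustrate where the attribute goes. Since the project currently builds as C++14 (see 3.1.3), the standard spelling would need C++17; clang also accepts [[clang::fallthrough]] in earlier standards.

```cpp
// Hypothetical operation codes, used only for illustration.
enum class Op { kReset, kAlloc, kFree };

int Cost(Op op) {
  int cost = 0;
  switch (op) {
    case Op::kReset:
      cost += 10;
      [[fallthrough]];  // intentional: a reset also pays the alloc cost
    case Op::kAlloc:
      cost += 1;
      break;
    case Op::kFree:
      cost += 2;
      break;
  }
  return cost;
}
```

Unlike the pragma pair, the attribute marks one specific fall-through, so an accidental missing break elsewhere in the same switch is still diagnosed.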

 

3.5. Format string mismatches

3.5.1. Mismatch that does not affect behavior in practice

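A mismatch between the format string and the arguments actually passed can cause serious problems. Much of the code under tops, however, uses the ll length modifier for 64-bit data, which has little practical effect. If a 128-bit processor appears later, ll might actually be 128 bits wide, and the mismatch could then corrupt the stack.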

 

diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
index e63568ddc63..78896778a21 100644
--- a/sdk/tests/runtime/chunk_allocator_test.cc
+++ b/sdk/tests/runtime/chunk_allocator_test.cc
@@ -426,7 +426,7 @@ TEST_F(ChunkAllocatorTest, basic_stress_test) {
           if (allocated_size < allocated_size_pass) {
             char str_buf[256];
             snprintf(str_buf, sizeof(str_buf),
-                     "allocated_size: %llx, allocated_chunks.size(): %lu",
+                     "allocated_size: %lx, allocated_chunks.size(): %lu",
                      allocated_size, allocated_chunks.size());
             EXPECT_TRUE(false) << str_buf;
             break;

 

Other files:

sdk/lib/umd/tests/sample/mm_test.cc

sdk/include/driver/mem_handle.h

sdk/include/runtime/command_packet.h

sdk/tests/runtime/mem_pool_test.cc

sdk/lib/umd/tests/sample/performance_test.cc

sdk/tests/profile/test_zebu.cc

sdk/runtime/tests/top_scheduler/loop_task_utils.h

 

 

3.5.2. Mismatch that does affect behavior

 

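The code below was meant to print the value that a uint16_t* points to, but it passed the pointer itself, so it printed an address rather than the value. Fortunately it is only a print statement. Even so, %hu expects an int-sized (32-bit) argument while the pointer is 64 bits on a 64-bit machine, so the call is undefined behavior and the argument can be misread: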

diff --git a/sdk/include/runtime/command_packet.h b/sdk/include/runtime/command_packet.h
index a2d061e9117..5006a601cc0 100644
--- a/sdk/include/runtime/command_packet.h
+++ b/sdk/include/runtime/command_packet.h
@@ -362,7 +362,7 @@ struct CommandPacket {
    */
   static std::string MemberToString(uint16_t* p, std::string tab = "    ") {
     char buf[256];
-    snprintf(buf, sizeof(buf), "%hu", p);
+    snprintf(buf, sizeof(buf), "%hu", *p);
     return buf;
   }

 

3.6. Defined but unused

3.6.1. Unused variables

 

diff --git a/sdk/lib/umd/tests/sample/launch_code.cc b/sdk/lib/umd/tests/sample/launch_code.cc
index 1152a283052..708b1f44e7d 100644
--- a/sdk/lib/umd/tests/sample/launch_code.cc
+++ b/sdk/lib/umd/tests/sample/launch_code.cc
@@ -180,7 +180,6 @@ static void launch_code_with_cluster_check(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + ONE_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
@@ -363,7 +362,6 @@ static void launch_code_for_one_sip(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + ONE_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter(sip[0], param, param_off, param_size, 0, mode,
@@ -537,7 +535,6 @@ static void launch_one_sip_twice(void) {
   dtu_mem_handle param = cluster_mem[0];
   u64 param_off = A_B_SIZE + TWO_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[2];
@@ -719,11 +716,10 @@ static void _launch_code_for_eight_sip(int cid, bool check_result) {
   dtu_mem_handle param = cluster_mem[cid];
   u64 param_off = A_B_SIZE + EIGHT_C_SIZE;
   u64 param_size = PARAM_TRUE_SIZE;
-  u16 launch_entry = 0;
   dtu_sip_mode_cfg_st mode;
   mode.mode_dw = 0x5070f10;
   LaunchKernelParameter parameter[8];

 

Other files:

sdk/tests/spm/basic.cc

sdk/lib/spm/src/best_fit_policy.c

 

 

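3.6.2. Unused parameters

There are a great many of these, especially in third-party components, which have to be fixed through dedicated patches. This warning is the main reason -Werror was eventually switched off: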

 
diff --git a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
index abd50ad4e81..1b39f594406 100644
--- a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
+++ b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
@@ -16,6 +16,7 @@
 #include "dtu/umd/dtu.h"
 #include "dtu/umd/dtu_base_obj.h"
 #include "dtu/umd/dtu_log.h"
+#include "dtu/umd/dtu_utils.h"
 #include "lib/umd/src/dtu_memory.h"
 #include "lib/umd/tests/sample/sample.h"
 #include "lib/umd/tests/sample/sample_assert.h"
@@ -26,6 +27,7 @@ std::mutex mtx;
  
 void event_callback_func_1(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_1 call[%d]\n", engine_id);
@@ -33,6 +35,7 @@ void event_callback_func_1(dtu_callback callback, void *user_data,
  
 void event_callback_func_2(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_2 call[%d]\n", engine_id);
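
The definition of MAYBE_UNUSED in dtu_utils.h is not shown here, so the sketch below uses a stand-in macro, SKETCH_MAYBE_UNUSED, and hypothetical callback functions to show three common ways of silencing the unused-parameter warning. Treat it as an illustration rather than the project's implementation:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for MAYBE_UNUSED; the real macro may be defined differently.
#define SKETCH_MAYBE_UNUSED(x) (void)(x)

// Option 1: cast the parameter to void inside the body.
void OnEventA(void* callback, int* user_data, std::uint32_t engine_id) {
  SKETCH_MAYBE_UNUSED(callback);
  assert(user_data != nullptr);
  *user_data = static_cast<int>(engine_id);
}

// Option 2: leave the parameter unnamed (valid in every C++ standard).
void OnEventB(void* /*callback*/, int* user_data, std::uint32_t engine_id) {
  assert(user_data != nullptr);
  *user_data = static_cast<int>(engine_id);
}

// Option 3 (C++17): annotate the parameter with [[maybe_unused]].
void OnEventC([[maybe_unused]] void* callback, int* user_data,
              std::uint32_t engine_id) {
  assert(user_data != nullptr);
  *user_data = static_cast<int>(engine_id);
}
```

Option 2 needs no macro or header at all, which may be convenient for third-party code that cannot include project headers.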

 

Other files:

tools/logging/lib/logging/log.cc

tools/logging/lib/logging/to/file.cc

tools/logging/lib/logging/to/std_err.cc

tools/logging/lib/util/signal_handler.cc

tools/logging/tests/logging/log_to_test.h

sdk/lib/umd/include/dtu_utils.h

sdk/lib/umd/include/reference_obj.h

3rdparty/protobuf-3.8.0/src/google/protobuf/arena.h

sdk/lib/umd/tests/sample/device_reset.cc

sdk/lib/umd/tests/sample/usr_irq.cc

sdk/lib/umd/tests/sample/callback_test.cc

3rdparty/protobuf-3.8.0/src/google/protobuf/map_type_handler.h

3rdparty/protobuf-3.8.0/src/google/protobuf/parse_context.h

kmd/utils/ktest/kmd-test.cpp

sdk/lib/spm/src/buddy_policy.c

sdk/lib/umd/include/dtu_command_obj.h

sdk/lib/umd/include/dtu_context_obj.h

sdk/lib/umd/include/dtu_dqm_obj.h

sdk/lib/umd/include/dtu_driver.h

system_test/tools/vpd_cycle/vpd_cycle.c

sdk/lib/spm/src/interface.c

sdk/lib/spm/src/rbtree.c

sdk/lib/umd/include/dtu_device.h

 

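The unused parameters in tools/logging/include/logging/check.h are a special case: file and line are supposed to be used, but the wrong logging interface was called, so that information was lost on the way: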

diff --git a/tools/logging/include/logging/check.h b/tools/logging/include/logging/check.h
index eb856b7df85..67a667477f1 100644
--- a/tools/logging/include/logging/check.h
+++ b/tools/logging/include/logging/check.h
@@ -47,16 +47,16 @@
 #define EFCHECK_STRCASENE(s1, s2) EF_DTU_CHECK_STROP(strcasecmp, !=, false, s1, s2)
  
 #undef EFCHECK_NOTNULL
-#define EFCHECK_NOTNULL(val) \
-  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", (val))
-
+#define EFCHECK_NOTNULL(val)                                                \
+  ::ef_log::CheckNotNull(__FILE__, __LINE__, "'" #val "' Must be non NULL", \
+                         (val))
  
 namespace ef_log {
  
 template <typename T>
 T&& CheckNotNull(const char* file, int line, const char* exprtext, T&& t) {
   if (t == nullptr) {
-    EFLOG(FATAL) << std::string(exprtext);
+    ::ef_log::FatalLog(file, line) << std::string(exprtext);
   }
   return std::forward<T>(t);
 }

 

 

 

3.6.3. Unused labels

 

diff --git a/sdk/include/scheduler/cmd_packet_pass_util.h b/sdk/include/scheduler/cmd_packet_pass_util.h
index d56b5f362e8..1cc18ddc603 100644
--- a/sdk/include/scheduler/cmd_packet_pass_util.h
+++ b/sdk/include/scheduler/cmd_packet_pass_util.h
@@ -457,7 +457,6 @@ void MultiThreadDo(PacketGraph* graph, InitFuncS initf, ThreadFunc f,
   uninif(core_count);
  
   delete[] ptl;
-Exit0:
   return;
 }

 

 

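3.6.4. Unreachable code

According to the developers, the code below is not supported at present but they do not want to delete it, so it is commented out for now: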

diff --git a/sdk/runtime/tests/top_scheduler/TimerTest.cc b/sdk/runtime/tests/top_scheduler/TimerTest.cc
index cb1e2269dd4..5aea6c1f956 100644
--- a/sdk/runtime/tests/top_scheduler/TimerTest.cc
+++ b/sdk/runtime/tests/top_scheduler/TimerTest.cc
@@ -127,12 +127,12 @@ TEST_F(TimerTest, Timer) {
     L3DMA = EngineType::Type::ODMA;
   } else if (IsPavoT20() || IsPavoT21()) {
     return;  // Need TS FW;
-    assembler = new ExecutableAssembler(TargetType::PAVO);
-    L3DMA = EngineType::Type::CDMA_LITE;
+    // assembler = new ExecutableAssembler(TargetType::PAVO);
+    // L3DMA = EngineType::Type::CDMA_LITE;
   } else if (IsDoradoI20() || IsDoradoI21()) {
     return;  // Need TS FW;
-    assembler = new ExecutableAssembler(TargetType::DORADO);
-    L3DMA = EngineType::Type::CDMA;
+    // assembler = new ExecutableAssembler(TargetType::DORADO);
+    // L3DMA = EngineType::Type::CDMA;
   } else {
     return;
   }

 

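sdk/tests/tops/tops_customop_upsample_nearest_test.cc also triggers the unreachable-code warning, because Co is currently a fixed value and the first if condition is therefore always false. The loop in the else branch already covers the Co == 1 case, so the first branch can simply be removed: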

diff --git a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
index 2b59c3fc0fc..cf19502b9b4 100644
--- a/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
+++ b/sdk/tests/tops/tops_customop_upsample_nearest_test.cc
@@ -175,32 +175,17 @@ TEST_F(TopsTest, CustomCall_UpSample_Nearest_1) {
   int n_offset = Ho * Wo * Co;
   int h_offset = Wo * Co;
  
-  if (Co == 1) {
-    for (int n = 0; n < N; ++n) {
-      int n_offset = n * n_offset;
-      for (int h = 0; h < Ho; ++h) {
-        int h_index = h / scale_H;
-        for (int w = 0; w < Wo; ++w) {
-          int w_index = w / scale_W;
-          output_ref[n_offset + h * h_offset + w] =
-              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci];
-        }
-      }
-    }
-
-  } else {
-    for (int n = 0; n < N; ++n) {
-      int n_offset = n * Ho * Wo * Co;
-      for (int h = 0; h < Ho; ++h) {
-        int h_index = h / scale_H;
-        for (int w = 0; w < Wo; ++w) {
-          int w_index = w / scale_W;
-          for (int c = 0; c < Co; ++c) {
-            int c_index = c / scale_C;
-            output_ref[n_offset + h * h_offset + w * Co + c] =
-                image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
-                           c_index];
-          }
+  for (int n = 0; n < N; ++n) {
+    int n_offset = n * Ho * Wo * Co;
+    for (int h = 0; h < Ho; ++h) {
+      int h_index = h / scale_H;
+      for (int w = 0; w < Wo; ++w) {
+        int w_index = w / scale_W;
+        for (int c = 0; c < Co; ++c) {
+          int c_index = c / scale_C;
+          output_ref[n_offset + h * h_offset + w * Co + c] =
+              image_data[n * Hi * Wi * Ci + h_index * Wi * Ci + w_index * Ci +
+                         c_index];
         }
       }
     }

 

 

3.6.5. Static/inline functions that are never called

diff --git a/sdk/lib/umd/tests/sample/memcpy_odma.cc b/sdk/lib/umd/tests/sample/memcpy_odma.cc
index 3cfc9777934..2b565c5e55e 100644
--- a/sdk/lib/umd/tests/sample/memcpy_odma.cc
+++ b/sdk/lib/umd/tests/sample/memcpy_odma.cc
@@ -9,6 +9,7 @@
  
 #include "dtu/umd/dtu.h"
 #include "dtu/umd/dtu_interface.h"
+#include "dtu/umd/dtu_utils.h"
 #include "lib/umd/tests/sample/sample.h"
 #include "lib/umd/tests/sample/sample_assert.h"
  
@@ -991,6 +992,7 @@ static void memcpy_host_to_hbm_mc_scan_sync(void) {
 }
 MAKE_SAMPLE_FROM_FUNCTION(memcpy_host_to_hbm_mc_scan_sync);
  
+#if 0
 static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
                      dtu_mem_handle src_hdl, u64 src_offset, u64 size,
                      u32 engine_id) {
@@ -1034,6 +1036,7 @@ static int odma_copy(dtu_mem_handle dst_hdl, u64 dst_offset,
   dtu_command_queue_destroy(queue);
   return 0;
 }
+#endif
  
 #define MB (1 * 1024 * 1024)
 #if 0

 

 

Other files:

sdk/lib/spm/src/buddy_policy.c

sdk/lib/umd/tests/sample/mm_test.cc

 

3.6.6. Unused class declarations

diff --git a/dtu_backend/dtu_executor.h b/dtu_backend/dtu_executor.h
index 5149656537f..c361bb15e9d 100644
--- a/dtu_backend/dtu_executor.h
+++ b/dtu_backend/dtu_executor.h
@@ -50,7 +50,6 @@ class ClusterAllocation;
 }
  
 class DTUObject;
-class sr::TaskContext;
 class DTUExecutor : public ::xla::dtu::DTUExecutorInterface {
  public:
   typedef typename sr::TaskContext context_type;

 

3.6.7. Unused type definitions

diff --git a/sdk/tests/tops/tops_transform_parameter_test.cc b/sdk/tests/tops/tops_transform_parameter_test.cc
index ee7718e40f1..c7b99fb323d 100644
--- a/sdk/tests/tops/tops_transform_parameter_test.cc
+++ b/sdk/tests/tops/tops_transform_parameter_test.cc
@@ -483,7 +483,6 @@ TEST_P(TopsGraphTransformParameterTest, TopsConv) {
       break;
   }
  
-  typedef float D_TYPE;
   int inputdata_size = input_length * (sizeof(input_data[0]));
  
   topsMemory_t output_mem;

 

 

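3.7. Duplicate definitions

Many macros in the tops code stack are defined separately by each module, so once the modules start including one another there are a lot of redefinition problems. The real fix is to extract common headers, but the modules currently do not want to depend on each other, so for now the definitions are wrapped in #ifndef as a temporary workaround: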

diff --git a/sdk/lib/umd/tests/sample/loop_task.cc b/sdk/lib/umd/tests/sample/loop_task.cc
index 2869f571029..099dcaf53f1 100644
--- a/sdk/lib/umd/tests/sample/loop_task.cc
+++ b/sdk/lib/umd/tests/sample/loop_task.cc
@@ -22,12 +22,14 @@
  
 using namespace std;
  
+#ifndef EFCHECK
 #define EFCHECK(__statement__)                                       \
   do {                                                               \
     sts = __statement__;                                             \
     if (sts != DTU_SUCCESS)                                          \
       failed_assertion("Failed:", __FILE__, __FUNCTION__, __LINE__); \
   } while (0)
+#endif
  
 template <int N>
 struct DataLayout {

 

Other files:

sdk/lib/umd/tests/sample/sample_assert.h

system_test/tools/vpd_cycle/vpd_cycle.c

 

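3.8. Constructor initializer list out of order

This occurred only once. Members are always initialized in declaration order, so clang warns (-Wreorder) when the initializer list is written in a different order: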

diff --git a/sdk/include/factor/func.h b/sdk/include/factor/func.h
index af24b782ed1..50137f7c5dd 100644
--- a/sdk/include/factor/func.h
+++ b/sdk/include/factor/func.h
@@ -4163,10 +4163,10 @@ struct FACTOR_EXPORT ConvGenDescParams {
                     int64_t Co, int64_t R, int64_t S)
       : conv_type(conv_type),
         data_format(data_format),
-        stride(stride),
-        dailations(dailations),
         opt_level(opt_level),
         padding(padding),
+        stride(stride),
+        dailations(dailations),
         N(N),
         Hi(Hi),
         Wi(Wi),
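
A small sketch of why the order matters: members are initialized in the order they are declared in the class, whatever the initializer list says. Note, Params and ObservedOrder are hypothetical names introduced only for this illustration:

```cpp
#include <cassert>
#include <string>

// Appends `tag` to `log` and returns 0; lets us observe initialization order.
inline int Note(std::string& log, char tag) {
  log += tag;
  return 0;
}

struct Params {
  int a;  // declared first, so initialized first
  int b;  // declared second
  // The list is written in declaration order. Writing b(...) before a(...)
  // would not change the real order; it would only trigger -Wreorder.
  explicit Params(std::string& log) : a(Note(log, 'a')), b(Note(log, 'b')) {}
};

// Returns the order in which the members were initialized.
std::string ObservedOrder() {
  std::string log;
  Params p(log);
  assert(p.a == 0 && p.b == 0);
  return log;
}
```

Keeping the list in declaration order matters most when one member's initializer reads another member, because a misleading order can hide a read of a not-yet-initialized value.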

 

Other modified files:

sdk/tests/tops/tops_convert_parameter_test.cc

 

 

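3.9. Incomplete type declarations

clang reports an error when a class is only forward-declared and none of the included headers provides its complete definition.

Working out the right include order for the TensorFlow headers is very tedious. Fortunately clang finds the header through the include paths on its own, so the include was simply added under a clang-specific macro guard.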

diff --git a/sdk/lib/cpu/cpu_func_runtime_context.h b/sdk/lib/cpu/cpu_func_runtime_context.h
index 530b4def8ad..62ba4099e6e 100644
--- a/sdk/lib/cpu/cpu_func_runtime_context.h
+++ b/sdk/lib/cpu/cpu_func_runtime_context.h
@@ -23,6 +23,10 @@
 #include <tuple>
 #include <vector>
  
+#if defined(__clang__)
+#include "tensorflow/compiler/xla/service/cpu/simple_orc_jit.h"
+#endif
+
 namespace xla {
 namespace cpu {
 class SimpleOrcJIT;

 

3.10. Array initialization

3.10.1. Arrays that really must be variable-length: allocate with new[]() and release with delete[]

diff --git a/sdk/lib/cpu_ops/naive/dot.cc b/sdk/lib/cpu_ops/naive/dot.cc
index f4bb6b7d877..be7ddb0ab23 100755
--- a/sdk/lib/cpu_ops/naive/dot.cc
+++ b/sdk/lib/cpu_ops/naive/dot.cc
@@ -31,9 +31,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
     int64_t m_stride = (M - m) >= stride ? stride : (M - m);
     for (int64_t n = 0; n < N;) {
       int64_t n_stride = (N - n) >= stride ? stride : (N - n);
-      register outT out_reg[m_stride * n_stride] = {0};
-      register lhsT lhs_reg[m_stride];
-      register rhsT rhs_reg[n_stride];
+      register outT* out_reg = new outT[m_stride * n_stride]();
+      register outT* lhs_reg = new outT[m_stride]();
+      register outT* rhs_reg = new outT[n_stride]();
       for (int64_t i = 0; i < K; i++) {
         for (auto idx = 0; idx < m_stride; idx++) {
           lhs_reg[idx] = ELEMENT(lhs, m + idx, i, K);
@@ -53,6 +53,9 @@ void vectorMul_4_4(const int64_t M, const int64_t N, const int64_t K, outT* out,
         }
       }
       n += n_stride;
+      delete[] rhs_reg;
+      delete[] lhs_reg;
+      delete[] out_reg;
     }
     m += m_stride;
   }
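
Where early returns or exceptions are a concern, std::vector is a possible alternative to the new[]/delete[] pair, since it releases its storage automatically. A sketch under that assumption; SumTile is a hypothetical function and not part of dot.cc:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies an m_stride x n_stride tile into a scratch buffer and sums it.
template <typename T>
T SumTile(const T* data, std::int64_t m_stride, std::int64_t n_stride) {
  assert(data != nullptr && m_stride >= 0 && n_stride >= 0);
  const std::size_t count = static_cast<std::size_t>(m_stride * n_stride);
  // Runtime-sized scratch buffer, zero-initialized like `new T[count]()`.
  std::vector<T> scratch(count);
  for (std::size_t i = 0; i < count; ++i) scratch[i] = data[i];
  T sum = T();
  for (const T& v : scratch) sum += v;
  return sum;  // scratch is released here, and on any early exit
}
```

The trade-off is a heap allocation per tile, the same cost the new[] version already pays.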

 

Other similar changes:

sdk/lib/umd/tests/sample/mm_test.cc

sdk/lib/cpu_ops/naive/dot.cc

sdk/lib/factor/codegen/macro_instruction/minst_conv2d_bpi.cc

 

 

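3.10.2. Arrays that are semantically fixed-length: add the const qualifier

This is very common in the tests: people are not in the habit of declaring array lengths const. With a non-const length the array is a variable-length array, which cannot take an initializer. Adding const not only can improve efficiency but also makes mistakes less likely.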

 

diff --git a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
index 8df67784dec..1db6440e22c 100644
--- a/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
+++ b/sdk/sample/batchnormalInference/tops_batchnormalInference.cc
@@ -67,16 +67,16 @@ void topsBatchNormalInferenceNHWC() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -90,7 +90,7 @@ void topsBatchNormalInferenceNHWC() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
@@ -170,16 +170,16 @@ void topsBatchNormalInferenceCHNW() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -193,7 +193,7 @@ void topsBatchNormalInferenceCHNW() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
@@ -275,16 +275,16 @@ void topsBatchNormalInferenceBoundary() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 50;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 50;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -298,7 +298,7 @@ void topsBatchNormalInferenceBoundary() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_CHNW, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num];
   for (int i = 0; i < inputdata_num; i++) {
@@ -380,16 +380,16 @@ void topsBatchNormalInferenceScaleOffset() {
   topsTensorDescriptor_t xDesc;
   topsTensorDescriptor_t yDesc;
  
-  int x_c = 4;
-  int x_h = 2;
-  int x_n = 3;
-  int x_w = 2;
+  const int x_c = 4;
+  const int x_h = 2;
+  const int x_n = 3;
+  const int x_w = 2;
  
-  int scaleNums = x_c;
-  int y_c = x_c;
-  int y_h = x_h;
-  int y_n = x_n;
-  int y_w = x_w;
+  const int scaleNums = x_c;
+  const int y_c = x_c;
+  const int y_h = x_h;
+  const int y_n = x_n;
+  const int y_w = x_w;
  
   topsContext_t context;
   int clusters[] = {0};
@@ -403,7 +403,7 @@ void topsBatchNormalInferenceScaleOffset() {
   topsSetTensorDescriptor(yDesc, TOPS_TENSOR_NHWC, TOPS_DATA_FLOAT, y_n, y_c,
                           y_h, y_w);
  
-  int inputdata_num = x_c * x_h * x_n * x_w;
+  const int inputdata_num = x_c * x_h * x_n * x_w;
  
   D_TYPE InputData[inputdata_num] = {
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
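
The effect of the const qualifier can be checked in isolation. The sketch below reuses the dimension names from the diff inside a hypothetical function, SumOnes; the static_assert shows that the length is a genuine compile-time constant:

```cpp
#include <cassert>

int SumOnes() {
  const int x_c = 4;
  const int x_h = 2;
  const int x_n = 3;
  const int x_w = 2;
  // A const int initialized from constant expressions is itself usable as a
  // constant expression, so this is a fixed-size array and may be initialized.
  const int inputdata_num = x_c * x_h * x_n * x_w;
  static_assert(inputdata_num == 48, "length is known at compile time");

  int input_data[inputdata_num] = {};
  for (int& v : input_data) v = 1;

  int sum = 0;
  for (int v : input_data) sum += v;
  assert(sum == inputdata_num);
  return sum;
}
```

Using constexpr instead of const would state the intent even more directly, since the compiler then rejects any initializer that is not a constant expression.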

 

There are many more; only the file names are listed:

sdk/sample/batchnormalTraining/tops_batchnormalTraining.cc

sdk/sample/broadcast/tops_broadcast.cc

sdk/sample/resnet50/TopsOpApi.cc

sdk/tests/tops/tops_batchnormalBackward_test.cc

sdk/tests/tops/tops_batchnormalTraining_test.cc

sdk/tests/tops/tops_concat_test.cc

sdk/tests/tops/tops_convert_test.cc

sdk/tests/tops/tops_customop_test.cc

sdk/tests/tops/tops_scatter_test.cc

sdk/tests/tops/tops_bnForwardTrainingEx_unit_test.cc (more than 1800 lines changed in this file, which forced me to split it into a separate patch)

sdk/tests/tops/tops_broadcast_test.cc

sdk/tests/tops/tops_descriptor_test.cc

sdk/tests/tops/tops_pad_test.cc

 

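3.11. auto in function prototypes

clang 9 rejects auto parameters in function prototypes (GCC accepts them as an extension; they only became standard C++ with C++20 abbreviated function templates). As I understand it, the main concerns are:

1. If the function is exposed as an interface, which argument types should a caller use?

2. If several calls pass different argument types, will the function body trigger implicit conversions when it processes the parameters? clang is strict about implicit conversions that lose information.

3. If the arguments of different calls have different storage sizes, could the stack be corrupted? For example, when some callers pass int and others long, should compilation produce two instantiations or a single one?

4. How should the symbol name be generated when the function is lowered to a C-level symbol? The C++ name-mangling rules do not define a rule for auto parameters.

The auto parameter problem shows up mainly under sdk/lib/tuner/pavo/ and sdk/tests/factor/targets/pavo/dnn/conv/: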

diff --git a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
index dba48418fc2..2c59bc4eda6 100644
--- a/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
+++ b/sdk/lib/tuner/pavo/pavo_conv_dataflow3_bpi_non4c_impl.cc
@@ -31,8 +31,8 @@ namespace factor {
 using namespace hlir;
  
 static std::vector<std::vector<int64_t>> build_dim(
-    std::vector<int64_t> dim_count, auto cores_on_dim, auto sip_cord,
-    int64_t sip_num) {
+    std::vector<int64_t> dim_count, std::vector<int64_t> cores_on_dim,
+    std::vector<std::vector<int64_t>> sip_cord, int sip_num) {
   std::vector<int64_t> dim_count1 = {
       dim_count[0] / cores_on_dim[0], dim_count[1] / cores_on_dim[1],
       dim_count[2] / cores_on_dim[2], dim_count[3] / cores_on_dim[3]};
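
When a parameter really needs to stay generic, an explicit template is the portable spelling of what the auto parameter expressed. The actual change above chose concrete vector types instead; the sketch below only illustrates the template alternative, with DivideDims as a hypothetical helper loosely modelled on build_dim:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// `Cores` replaces the `auto` parameter: any indexable container with size().
// Each distinct argument type produces its own instantiation.
template <typename Cores>
std::vector<std::int64_t> DivideDims(const std::vector<std::int64_t>& dim_count,
                                     const Cores& cores_on_dim) {
  assert(dim_count.size() == cores_on_dim.size());
  std::vector<std::int64_t> result;
  result.reserve(dim_count.size());
  for (std::size_t i = 0; i < dim_count.size(); ++i) {
    result.push_back(dim_count[i] / cores_on_dim[i]);
  }
  return result;
}
```

Spelling the template out also makes the answer to question 3 visible in the source: a caller passing std::vector<int> and another passing std::vector<int64_t> get two separate function bodies.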

 

Other functions were changed in the same way; only the file names are listed:

sdk/lib/tuner/pavo/pavo_conv_dataflow5_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow7_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow1_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow2_bpi_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow3_1_forward_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow5_1_forward_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow6_bpk_non4c_impl.cc

sdk/lib/tuner/pavo/pavo_conv_dataflow7_1_forward_non4c_impl.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c6s_bpi_dataflow1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c4s_dataflow7_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c6s_dataflow6_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow3_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow5_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow7_1_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow2_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow3_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow5_template_test.cc

sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow7_template_test.cc

sdk/lib/ops/common/dtu_elementwise_fusion_impl.cc

sdk/tests/llir/dma_test/slice_dma_test.cc

sdk/tests/llir/dma_test/broadcast_dma_test.cc

sdk/tests/llir/dma_test/deslice_dma_test.cc

sdk/tests/llir/dma_test/mirror_dma_test.cc

sdk/tests/llir/dma_test/padding_dma_test.cc

sdk/tests/llir/dma_test/subsampling_dma_test.cc

sdk/tests/llir/dma_test/transpose_dma_test.cc

 

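3.12. strlen's return value is not a constant expression

clang treats the return value of strlen as a runtime value (GCC folds it as a builtin), so to use the length in a constexpr context you have to define your own function: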

diff --git a/sdk/lib/profile/topspti/reader/helper.h b/sdk/lib/profile/topspti/reader/helper.h
index c56502bef86..1d642e2431e 100644
--- a/sdk/lib/profile/topspti/reader/helper.h
+++ b/sdk/lib/profile/topspti/reader/helper.h
@@ -28,7 +28,6 @@
  
 #include <cstring>
 #include <string>
-
 #include "utils/utils.h"
  
 namespace topspti2 {
@@ -36,6 +35,10 @@ namespace topspti2 {
 #define TENSOR_MARK "!dtu.tensor<"
 #define TENSOR_MARK_SZ (sizeof(TENSOR_MARK) - 1)
  
+int constexpr CONSTEXPR_STRLEN(const char *str) {
+  return *str ? 1 + CONSTEXPR_STRLEN(str + 1) : 0;
+}
+
 static inline bool HasDPF(const std::string &product) {
   return (product != "" && product != "unknown" && product != "T10" &&
           product != "T11" && product != "T10s" && product != "I10");
@@ -123,7 +126,7 @@ static inline bool FastParseSizeFromTensor(const std::string &tensor,
   if (std::string::npos == pos) {
     return false;
   }
-  constexpr int tensor_mark_sz = strlen(TENSOR_MARK);
+  constexpr int tensor_mark_sz = CONSTEXPR_STRLEN(TENSOR_MARK);
   const char *data = tensor.c_str();
   while (pos != std::string::npos) {
     pos += tensor_mark_sz;
@@ -202,7 +205,7 @@ static inline bool FastParseSizeFromMemref(const std::string &memref,
   if (0 != pos) {
     return false;
   }
-  constexpr auto memref_mark_sz = strlen(MEMREF_MARK);
+  constexpr auto memref_mark_sz = CONSTEXPR_STRLEN(MEMREF_MARK);
   pos += memref_mark_sz;
   int64_t prod = 1;
   size_t lz = memref.size();
@@ -261,7 +264,7 @@ static inline bool ParseTensorInfoFromString(const std::string &input,
                                              TensorInfoValue &tiv) {
   tiv = TensorInfoValue();
   constexpr const char *const szstr = "size:";
-  constexpr int sz = strlen(szstr);
+  constexpr int sz = CONSTEXPR_STRLEN(szstr);
  
   if (input.size() > sz && !strncmp(input.c_str(), szstr, sz)) {
     tiv.size = stoll(input.substr(sz));
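
A self-contained version of the same idea, returning std::size_t rather than int (my choice for the sketch, not what the patch does). SKETCH_MARK and kMarkLen are illustrative names:

```cpp
#include <cstddef>

// Single-return recursion keeps this valid as a C++11 constexpr function.
constexpr std::size_t ConstexprStrlen(const char* str) {
  return *str ? 1 + ConstexprStrlen(str + 1) : 0;
}

#define SKETCH_MARK "!dtu.tensor<"  // same text as TENSOR_MARK above

// Evaluated entirely at compile time.
constexpr std::size_t kMarkLen = ConstexprStrlen(SKETCH_MARK);

// For string literals, sizeof gives the same answer without a helper.
static_assert(kMarkLen == sizeof(SKETCH_MARK) - 1, "lengths must agree");
```

For macros that expand to a string literal, the sizeof form (already used by TENSOR_MARK_SZ in the same header) avoids the helper entirely. The recursive function is only needed when the input is a const char* such as szstr.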

 

Other similar changes:

sdk/lib/profile/libprofile_defs.h

3.13. Other syntax problems

3.13.1. Lambda syntax problems

See Lambda expressions (since C++11) - cppreference.com. The capture list of a lambda expression is described as follows:

a comma-separated list of zero or more captures, optionally beginning with a capture-default.

See below for the detailed description of captures.

A lambda expression can use a variable without capturing it if the variable

  • is a non-local variable or has static or thread local storage duration (in which case the variable cannot be captured), or
  • is a reference that has been initialized with a constant expression.

A lambda expression can read the value of a variable without capturing it if the variable

  • has const non-volatile integral or enumeration type and has been initialized with a constant expression, or
  • is constexpr and has no mutable members.

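In other words, no capture needs to be specified in the following cases:

1) Non-local variables (globals)

2) static variables

3) thread-local variables (here it is not that a capture is unnecessary; they cannot be captured at all)

4) References initialized with a constant expression

5) const non-volatile integral or enumeration variables initialized with a constant expression (read-only access)

6) constexpr variables with no mutable members (read-only access)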
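
The cases above can be sketched in a few lines. The names g_module_str, kAlignment and LambdaDemo are illustrative only and are not taken from the tops sources:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Namespace-scope variable: visible inside any lambda, and not capturable.
const std::string g_module_str = "module";

std::size_t LambdaDemo() {
  static int call_count = 0;       // static storage duration: no capture
  constexpr int kAlignment = 128;  // constexpr: its value is readable without capture
  int local = 2;                   // automatic variable: must be captured

  auto f = [local]() {
    ++call_count;
    return g_module_str.size() + kAlignment + local;
  };
  const std::size_t result = f();
  assert(call_count > 0);
  return result;
}
```

Only local appears in the capture list. Adding g_module_str or call_count to it would be rejected, since neither has automatic storage duration.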

module_str used in sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc is a global variable and needs no capture. The original code compiles with gcc5, but gcc7 and clang reject it outright:
diff --git a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
index 6f17f27ca4d..70506e38506 100644
--- a/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc
@@ -38,7 +38,7 @@ TEST(MTTest, PassMgr) {
   std::vector<std::thread> th_vec;
   th_vec.reserve(thread_count);
   for (size_t i = 0; i < thread_count; ++i) {
-    th_vec.emplace_back([&module_str]() {
+    th_vec.emplace_back([]() {
       mlir::MLIRContext context;
       mlir::OwningModuleRef module =
           mlir::parseSourceString(module_str, &context);

 

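In the code below the this pointer is captured but never used, which produces the warning “expression result unused [-Wunused-value]”. Naming a capture acts like a declaration inside the lambda, so an unused one is reported: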

diff --git a/tools/logging/tests/logging/test_log_old_api.cc b/tools/logging/tests/logging/test_log_old_api.cc
index 638a84cac6b..91cbf6b2676 100644
--- a/tools/logging/tests/logging/test_log_old_api.cc
+++ b/tools/logging/tests/logging/test_log_old_api.cc
@@ -18,7 +18,7 @@ class OldLogTest : public testing::Test {
     Test::SetUp();
     RegisterLogTo(this->pLog);
     pLog->setCallback(
-        [this](const std::string &msg) { std::cerr << msg << std::endl; });
+        [](const std::string &msg) { std::cerr << msg << std::endl; });
     pLog->SetAutoClear(true);
   }

 

Similarly, listing the constant alignment in the capture list in sdk/lib/ops/common/dtu_scatter_impl.cc is also an error:

diff --git a/sdk/lib/ops/common/dtu_scatter_impl.cc b/sdk/lib/ops/common/dtu_scatter_impl.cc
index 032c19e6ef9..99058631d4c 100644
--- a/sdk/lib/ops/common/dtu_scatter_impl.cc
+++ b/sdk/lib/ops/common/dtu_scatter_impl.cc
@@ -92,7 +92,7 @@ bool predicate_func(int64_t i) {
  
 // alloc_ memory with alignment of 128 byte.
 const uint32_t alignment = 128;
-auto GetAlignedSize = [alignment](uint64_t size) {
+auto GetAlignedSize = [](uint64_t size) {
   return (size + alignment - 1) / alignment * alignment;
 };

 

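3.13.2. std::move in return statements

Using std::move in a return statement defeats the compiler's copy elision. For the pre-change code below clang reports “moving a local object in a return statement prevents copy elision [-Wpessimizing-move]”. What is copy elision?

Copy elision - cppreference.com defines it as: Omits copy and move (since C++11) constructors, resulting in zero-copy pass-by-value semantics.

That is, without std::move the compiler tries to omit the copy or move of the returned object altogether and construct it directly in the caller. With std::move the compiler is forced to call the move constructor, which is clearly the more expensive of the two.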

 
diff --git a/tools/logging/tests/logging/log_to_test.h b/tools/logging/tests/logging/log_to_test.h
index de91f49b34d..0c92e2acd8a 100644
--- a/tools/logging/tests/logging/log_to_test.h
+++ b/tools/logging/tests/logging/log_to_test.h
@@ -21,7 +21,7 @@ class LogToString : public LogDestination {
     if (autoClear_) {
       Clear();
     }
-    return std::move(ret);
+    return ret;
   }
   void SetAutoClear(bool autoClear) { autoClear_ = autoClear; }
   void Clear() { str_.clear(); }
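
The difference can be observed by counting move constructions. This is a sketch with a hypothetical Tracked type. The exact count for the plain return depends on whether the compiler applies the (optional) named return value optimization, so the code only checks the relative ordering:

```cpp
#include <cassert>
#include <utility>

struct Tracked {
  static int moves;  // number of move constructions so far
  Tracked() = default;
  Tracked(const Tracked&) = default;
  Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

Tracked MakePlain() {
  Tracked t;
  return t;  // NRVO candidate: usually built directly in the caller's object
}

Tracked MakeWithMove() {
  Tracked t;
  return std::move(t);  // always runs the move constructor (-Wpessimizing-move)
}

// Returns {moves in MakePlain, moves in MakeWithMove}.
std::pair<int, int> CountMoves() {
  Tracked::moves = 0;
  Tracked a = MakePlain();
  const int plain = Tracked::moves;

  Tracked::moves = 0;
  Tracked b = MakeWithMove();
  const int with_move = Tracked::moves;

  (void)a;
  (void)b;
  assert(with_move >= 1 && plain <= with_move);
  return {plain, with_move};
}
```

MakeWithMove is deliberately the pattern the warning is about, so building this sketch with -Wall is expected to reproduce the diagnostic.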

 

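3.13.3. Use of an uninitialized object

In the pre-change version of sdk/tests/runtime/device_manager_test.cc, if result.ok() is false, cluster is passed to device->ClusterMemoryHandle() without ever being initialized, which can have very nasty consequences: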
diff --git a/sdk/tests/runtime/device_manager_test.cc b/sdk/tests/runtime/device_manager_test.cc
index cf9075367a7..0adf469da8b 100644
--- a/sdk/tests/runtime/device_manager_test.cc
+++ b/sdk/tests/runtime/device_manager_test.cc
@@ -109,14 +109,13 @@ TEST_F(DeviceManagerTest, ClusterMemoryHandle_SuccessFail) {
   dtu::driver::DeviceManager* device = dtu::driver::DeviceManager::instance();
   device->AcquireDevice(0);
   dtu::StatusOr<dtu_cluster> result = device->Cluster(0, 0);
-  dtu_cluster cluster;
   if (result.ok()) {
-    cluster = std::move(result.ValueOrDie());
-    EXPECT_NE(cluster, nullptr);
+    dtu_cluster cluster = std::move(result.ValueOrDie());
+    dtu::StatusOr<dtu_mem_handle> result1 =
+        device->ClusterMemoryHandle(cluster);
   } else {
     EFLOG(FATAL) << "Get ClusterIds error: " << result.status();
   }
-  dtu::StatusOr<dtu_mem_handle> result1 = device->ClusterMemoryHandle(cluster);
   EXPECT_EQ(result.ok(), true);
   EXPECT_NE(result.ValueOrDie(), nullptr);
   device->ReleaseCluster(0, 0);

 

3.13.4. clang forbids parenthesized initialization of member arrays

For the pre-change code below clang reports the error "parenthesized initialization of a member array is a GNU extension [-Wgnu-array-member-paren-init]", while gcc reports the warning "list-initializer for non-class type must not be parenthesized":
diff --git a/sdk/tests/tops/tops_broadcast_parameter_test.cc b/sdk/tests/tops/tops_broadcast_parameter_test.cc
index 831d9d23791..321c56171f3 100644
--- a/sdk/tests/tops/tops_broadcast_parameter_test.cc
+++ b/sdk/tests/tops/tops_broadcast_parameter_test.cc
@@ -139,14 +139,18 @@ class TopsBroadcastParameterTest
 };
  
 TopsBroadcastParameterTest::TopsBroadcastParameterTest()
-    : x_desc_dim({GetParam().x.h, GetParam().x.w}),
-      y_desc_dim(
-          {GetParam().y.n, GetParam().y.c, GetParam().y.h, GetParam().y.w}),
-      broadcast_dims(
-          {GetParam().broadcast_dim.dim_1, GetParam().broadcast_dim.dim_2}),
-      input_length(GetParam().x.h * GetParam().x.w),
+    : input_length(GetParam().x.h * GetParam().x.w),
       output_length(GetParam().y.n * GetParam().y.c * GetParam().y.h *
-                    GetParam().y.w) {}
+                    GetParam().y.w) {
+  x_desc_dim[0] = GetParam().x.h;
+  x_desc_dim[1] = GetParam().x.w;
+  y_desc_dim[0] = GetParam().y.n;
+  y_desc_dim[1] = GetParam().y.c;
+  y_desc_dim[2] = GetParam().y.h;
+  y_desc_dim[3] = GetParam().y.w;
+  broadcast_dims[0] = GetParam().broadcast_dim.dim_1;
+  broadcast_dims[1] = GetParam().broadcast_dim.dim_2;
+}
  
 void TopsBroadcastParameterTest::freeDebugInfo() {
   if (input_mem == nullptr) {
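
Besides assigning the elements in the constructor body, the arrays can stay in the initializer list if the parentheses are dropped. A sketch with a hypothetical Dims struct. Whether this applies directly to the real test depends on the types of the GetParam() fields, because brace initialization rejects narrowing conversions:

```cpp
#include <cassert>

struct Dims {
  int x_desc_dim[2];
  int y_desc_dim[4];

  // Brace (list) initialization of array members is standard since C++11.
  // The parenthesized form x_desc_dim({h, w}) is the GNU extension.
  Dims(int n, int c, int h, int w) : x_desc_dim{h, w}, y_desc_dim{n, c, h, w} {
    assert(n > 0 && c > 0 && h > 0 && w > 0);
  }
};
```

This keeps the members initialized rather than assigned, which matters if they are ever declared const.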

 

Similar changes were made in:

sdk/tests/tops/tops_dot_parameter_test.cc

sdk/tests/tops/tops_pad_parameter_test.cc

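3.13.5. clang instantiates template functions only when something calls them

Because the constructor is never called inside the sdk's own code, libdtu_sdk.so contains no symbol for it. The test code needs it, however, so a stub function had to be added to force the constructor to be instantiated.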

diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
index 7e366337561..41fb573a562 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.cc
@@ -248,4 +248,13 @@ vector<string> KernelCode<T>::LinkArgs() {
   return args;
 }
  
+// stab function for undefined reference to
+// 'dtu_umd::KernelCode<dtu_umd::PavoKernel>::KernelCode(llvm::StringRef)'
+void kernel_code_stab() {
+  StringRef file_name = "stab_file";
+  KernelCode<PavoKernel> k_stab1(file_name);
+  KernelCode<DoradoKernel> k_stab2(file_name);
+  KernelCode<LeoKernel> k_stab3(file_name);
+}
+
 }  // namespace dtu_umd
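
An alternative to a stub function is an explicit instantiation definition, which asks the compiler to emit every member of the listed specializations in the current translation unit. The sketch below is only an illustration under that assumption: `Widget`, `KindA`, `KindB` and `ExampleWidget` are hypothetical stand-ins, not the real `KernelCode` classes, and the split between header and .cc file is indicated by comments only.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types standing in for the per-chip kernel types.
struct KindA { static const char *Name() { return "A"; } };
struct KindB { static const char *Name() { return "B"; } };

// Would normally live in the header: declarations only.
template <typename T>
class Widget {
 public:
  explicit Widget(const std::string &file_name);
  std::string Describe() const;

 private:
  std::string file_name_;
};

// Would normally live in the .cc file: member definitions.
template <typename T>
Widget<T>::Widget(const std::string &file_name) : file_name_(file_name) {}

template <typename T>
std::string Widget<T>::Describe() const {
  return std::string(T::Name()) + ":" + file_name_;
}

// Explicit instantiation definitions: every member of these specializations
// is emitted here, even if nothing in this file calls it.
template class Widget<KindA>;
template class Widget<KindB>;

// Usage as seen from another translation unit (for example a test).
void ExampleWidget() {
  Widget<KindA> w("stub_file");
  assert(w.Describe() == "A:stub_file");
}
```

Compared with a stub function, this form does not construct any objects at run time and covers all member functions, not only the constructor.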

 

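3.13.6. clang does not allow a constexpr function to define a complex object that requires memory management

The template below creates a new vector object, whose constructor has to manage memory. Without a change, clang reports “variable of non-literal type 'std::vector<size_t>' (aka 'vector<unsigned long>') cannot be defined in a constexpr function”. Deleting the constexpr specifier from the template makes the code compile.

Section 3.9/10 of the C++ standard defines a literal type (roughly, a constant or a simple variable). The vector inside unpack_seq_to_vector is neither a simple variable nor an array of simple variables. Switching to array should probably work, but every caller of this function would then have to change: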

A type is a literal type if it is:

  • void; or

  • a scalar type; or

  • a reference type; or

  • an array of literal type; or

  • a class type (Clause 9) that has all of the following properties:

    • it has a trivial destructor,

    • it is an aggregate type (8.5.1) or has at least one constexpr constructor or constructor template that is not a copy or move constructor, and

    • all of its non-static data members and base classes are of non-volatile literal types

diff --git a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
index bdc21e3f317..24e37191fc4 100644
--- a/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
+++ b/sdk/tests/hlir/cc_tests/hlir_utils_test.cc
@@ -152,7 +152,7 @@ TEST(HlirUtilTest, ConstSplatValue) {
 }
  
 template <size_t... Idx>
-constexpr static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
+static auto unpack_seq_to_vector(hlir::IndexSeq<Idx...>) {
   std::vector<size_t> ret = {Idx...};
   return ret;
 }
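
The array-based variant mentioned above could look roughly like the following sketch. It assumes C++14 or later and uses `std::index_sequence` as a stand-in for `hlir::IndexSeq`, whose exact definition is not shown in this document. The names `unpack_seq_to_array` and `ExampleUnpack` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// std::array of a literal type is itself a literal type, so this function
// can stay constexpr, unlike the std::vector version.
template <std::size_t... Idx>
constexpr std::array<std::size_t, sizeof...(Idx)> unpack_seq_to_array(
    std::index_sequence<Idx...>) {
  return {{Idx...}};
}

void ExampleUnpack() {
  // The result is usable in constant expressions.
  constexpr auto dims = unpack_seq_to_array(std::index_sequence<4, 5, 6>{});
  static_assert(dims.size() == 3, "size is known at compile time");
  static_assert(dims[1] == 5, "elements are known at compile time");
  assert(dims[2] == 6);
}
```

The cost is that the return type changes from `std::vector<size_t>` to a fixed-size array, which is why every call site would need to be adapted.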

 

3.13.7. clang requires an explicit override keyword when overriding a virtual function

 

diff --git a/tools/logging/include/logging/to/file.h b/tools/logging/include/logging/to/file.h
index bdda687afdc..c6d39779bde 100644
--- a/tools/logging/include/logging/to/file.h
+++ b/tools/logging/include/logging/to/file.h
@@ -18,7 +18,7 @@ class LogToFile : public LogDestination {
   DISALLOW_COPY_AND_ASSIGN(LogToFile);
  
   static pointer Create(const std::string &file_name);
-  void Message(int level, const std::string &message);
+  void Message(int level, const std::string &message) override;
   void Flush() override;
  
  private:

 

Other similar changes:

tools/logging/include/logging/to/std_err.h
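
In the diff above, Flush() already carried override while Message() did not. This mixed state is most likely what clang's -Winconsistent-missing-override diagnostic reported, and with -Werror it stops the build. The sketch below shows the consistent form. The `LogDestination` interface here is an assumption reconstructed from the two methods visible in the diff, and `LogToMemory` is a hypothetical class used only for illustration.

```cpp
#include <cassert>
#include <string>

// Assumed shape of the base class; only the two virtual methods that
// appear in the diff are modelled.
class LogDestination {
 public:
  virtual ~LogDestination() = default;
  virtual void Message(int level, const std::string &message) = 0;
  virtual void Flush() = 0;
};

// Hypothetical destination that marks every overriding function with
// override, so the compiler can verify each signature against the base.
class LogToMemory : public LogDestination {
 public:
  void Message(int level, const std::string &message) override {
    buffer_ += std::to_string(level) + ":" + message + "\n";
  }
  void Flush() override { flushed_ = true; }

  const std::string &buffer() const { return buffer_; }
  bool flushed() const { return flushed_; }

 private:
  std::string buffer_;
  bool flushed_ = false;
};

void ExampleOverride() {
  LogToMemory sink;
  LogDestination &dest = sink;  // calls are dispatched virtually
  dest.Message(1, "hello");
  dest.Flush();
  assert(sink.buffer() == "1:hello\n");
  assert(sink.flushed());
}
```

Besides silencing the warning, override turns a signature mismatch with the base class into a compile error instead of silently declaring a new function.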

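3.13.8. Misuse of alignas

alignas is intended to improve access efficiency by placing a struct on a larger address boundary; it is not the same concept as pack in C. pack(1) can therefore be forced on any object to guarantee a layout without padding, whereas the value given to alignas must be at least as large as the strictest alignment required by the struct's members: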

The object or the type declared by such a declaration will have its alignment requirement equal to the strictest (largest) non-zero expression of all alignas specifiers used in the declaration, unless it would weaken the natural alignment of the type.

The structs below have a uint16_t member, so the smallest valid alignas value is in principle 2, and alignas(1) cannot be used:

diff --git a/sdk/lib/hlir/utils/types.h b/sdk/lib/hlir/utils/types.h
index 87aee25fe31..90cabe7bdb6 100644
--- a/sdk/lib/hlir/utils/types.h
+++ b/sdk/lib/hlir/utils/types.h
@@ -151,13 +151,13 @@ enum class CompareType {
  
 // define raw data type
 // lower to factor need raw data
-struct alignas(1) raw_bf16_ty {
+struct alignas(2) raw_bf16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_bf16_ty) == 2, "");
  
 // half
-struct alignas(1) raw_fp16_ty {
+struct alignas(2) raw_fp16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_fp16_ty) == 2, "");
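
The difference between alignas and pack can be checked at compile time. The following is a minimal sketch with hypothetical struct names (`RawHalf`, `WideHalf`, `PackedRecord`). It assumes a mainstream platform where `alignof(std::uint16_t)` is 2 and a compiler that supports `#pragma pack`, as GCC, clang and MSVC do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignas may keep or raise the natural alignment, never lower it.
struct alignas(2) RawHalf {
  std::uint16_t data;
};
static_assert(alignof(RawHalf) == 2, "matches the uint16_t member");
static_assert(sizeof(RawHalf) == 2, "no padding is needed");

// Raising the alignment also rounds the size up to a multiple of it.
struct alignas(8) WideHalf {
  std::uint16_t data;
};
static_assert(alignof(WideHalf) == 8, "alignment was raised");
static_assert(sizeof(WideHalf) == 8, "size is padded to the alignment");

// pack(1) works in the other direction: it removes padding between
// members, so data sits at offset 1 instead of offset 2.
#pragma pack(push, 1)
struct PackedRecord {
  std::uint8_t tag;
  std::uint16_t data;
};
#pragma pack(pop)
static_assert(sizeof(PackedRecord) == 3, "packed without padding");
static_assert(offsetof(PackedRecord, data) == 1, "member is unaligned");

void ExampleAlign() {
  WideHalf w{};
  // The address of an over-aligned object is a multiple of its alignment.
  assert(reinterpret_cast<std::uintptr_t>(&w) % alignof(WideHalf) == 0);
}
```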

 

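3.14. Cleanups made along the way while fixing warnings

3.14.1. Redundant computation

The change in tools/logging/lib/logging/log_message.cc was originally meant to fix the initialization of a variable-length array. Reading the code showed, however, that the seconds and microseconds of the timeval were first combined into a total number of microseconds, and that this total was never used as such: it was immediately split back into seconds and microseconds. The conversion therefore served no purpose, and the redundant computation was removed after confirming with the code owner.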

diff --git a/tools/logging/lib/logging/log_message.cc b/tools/logging/lib/logging/log_message.cc
index 77fa33fe129..8d4cd48b348 100644
--- a/tools/logging/lib/logging/log_message.cc
+++ b/tools/logging/lib/logging/log_message.cc
@@ -25,18 +25,15 @@ std::string LogMessage::GenerateMessage() {
   std::stringstream os;
   struct timeval tv;
   gettimeofday(&tv, nullptr);
-  uint64_t now_micros = static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
-  time_t now_seconds = static_cast<time_t>(now_micros / 1000000);
-  int32_t micros_remainder = static_cast<int32_t>(now_micros % 1000000);
   const size_t time_buffer_size = 50;
-  struct tm now_time = {0};
-  char time_buffer[time_buffer_size];
-  localtime_r(&now_seconds, &now_time);
+  struct tm now_time = tm();
+  char time_buffer[time_buffer_size]={0};
+  localtime_r(&tv.tv_sec, &now_time);
   strftime(time_buffer, time_buffer_size, "%Y-%m-%d %H:%M:%S", &now_time);
  
   os << time_buffer << ".";
   os.width(6);
-  os << micros_remainder << ": ";
+  os << tv.tv_usec << ": ";
   os << "DIWEF"[severity_];
   if(msg_code_) {
     os << msg_code_;
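
The timestamp part of the simplified code can be isolated as in the sketch below. It is only an illustration: `FormatTimestamp` and `ExampleTimestamp` are hypothetical names, a POSIX system is assumed for `timeval` and `localtime_r`, and the zero fill character is an addition made here (the patched code only sets the width to 6, and a stream pads with spaces by default).

```cpp
#include <sys/time.h>

#include <cassert>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Formats a timeval as "YYYY-MM-DD HH:MM:SS.uuuuuu" in local time, using
// tv_sec and tv_usec directly without an intermediate microsecond total.
std::string FormatTimestamp(const timeval &tv) {
  const time_t seconds = tv.tv_sec;
  struct tm now_time = tm();
  localtime_r(&seconds, &now_time);

  char time_buffer[50] = {0};
  strftime(time_buffer, sizeof(time_buffer), "%Y-%m-%d %H:%M:%S", &now_time);

  std::ostringstream os;
  // setfill('0') is an assumption of this sketch, not part of the patch.
  os << time_buffer << "." << std::setfill('0') << std::setw(6)
     << static_cast<long>(tv.tv_usec);
  return os.str();
}

void ExampleTimestamp() {
  timeval tv;
  tv.tv_sec = 0;
  tv.tv_usec = 42;
  const std::string text = FormatTimestamp(tv);
  assert(text.size() == 26);           // 19 date/time chars + '.' + 6 digits
  assert(text.substr(20) == "000042"); // zero-padded microseconds
}
```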

 

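3.14.2. Redundant comparison between the address of a reference and a null pointer

A reference is bound to an existing object, so in well-defined code its address is never null, and comparing it with nullptr is pointless: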

diff --git a/tools/logging/lib/logging/log_module.cc b/tools/logging/lib/logging/log_module.cc
index f40e13d6fea..3ea150b37a0 100644
--- a/tools/logging/lib/logging/log_module.cc
+++ b/tools/logging/lib/logging/log_module.cc
@@ -27,10 +27,6 @@ LogModuleMgr &LogModuleMgr::Instance() {
 }
  
 void LogModuleMgr::UpdateModuleMaskFromEnv(const std::string &env) {
-  if (&env == nullptr) {
-    return;
-  }
-
   EFLOG(DBG) << "Init Logging Module" << std::endl;
   EFLOG(DBG) << "ENFLAME_LOG_DEBUG_MOD = " << env << std::endl;
   auto tokens = strutil::split(env, ',');
@@ -91,4 +87,4 @@ void LogModuleMgr::SetModuleOff(EF_LOG_MOD module) {
   mod_status_[static_cast<int>(module)] = false;
 }
  
-} // namespace dtu
\ No newline at end of file
+} // namespace dtu
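
If a caller really needs to express "no value", the check has to happen at the pointer level before anything is bound to a reference. One possible shape is sketched below with hypothetical function names. It mirrors the common case of an environment variable, where `std::getenv` returns a null pointer when the variable is not set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Reference parameter: the caller must pass a real object, so there is
// nothing to compare with nullptr inside the function.
std::size_t CountTokens(const std::string &env) {
  if (env.empty()) {
    return 0;
  }
  // Number of comma-separated tokens is the number of commas plus one.
  return static_cast<std::size_t>(std::count(env.begin(), env.end(), ',')) + 1;
}

// Pointer parameter: nullptr is a legitimate "not set" value and is
// handled before a std::string is ever created from it.
std::size_t CountTokensOrZero(const char *env) {
  if (env == nullptr) {
    return 0;
  }
  return CountTokens(env);
}

void ExampleNullCheck() {
  assert(CountTokensOrZero(nullptr) == 0);
  assert(CountTokensOrZero("sdk,umd,hlir") == 3);
}
```

A typical call site would be `CountTokensOrZero(std::getenv("ENFLAME_LOG_DEBUG_MOD"))`, which keeps the null check in the one place where a null value can actually occur.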

 

 

