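1. Overview
As of November 25, 2021, the sdk, gtest, and dsopt modules compile with clang9.
Using the script below, I downloaded every change related to "[TR-16607] clang9 cross-compilation toolchain creation and verification - Enflame Company JIRA", including both merged changes and changes that are still open:
How to bulk-export detailed patches from gerrit - 周榮華_Ronghua - enflame wiki
One caveat: a gerrit query command cannot contain parentheses. When several conditions are combined, the default operator is AND; to use OR, the alternative expressions must be joined explicitly with OR.
A quick count shows 3924 lines added and 4164 lines deleted:
PS D:\code> grep "^+[^+]" .\diffrecord.txt |wc
3924 24785 152346
PS D:\code> grep "^-[^-]" .\diffrecord.txt |wc
4164 23159 147430
Early in the work -Werror was enabled, so some of the changes address warnings that are not very important. Because there were simply too many warnings, -Werror was later turned off temporarily and only a specific set of -Werror=... options was kept.
In addition, the code under tops contains a great many implicit conversions from larger integer types to smaller ones, so -Wno-c++11-narrowing was later added to suppress the related diagnostics temporarily.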
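2. Finding and fixing problems
Running a full build after every single fix is usually very time-consuming. The methods below extract an individual compile or link command so that a fix can be verified in isolation.
2.1. Getting the cmake compile command
cmake can produce a compilation database: when CMAKE_EXPORT_COMPILE_COMMANDS is enabled, it generates a file named "compile_commands.json" in the cmake_build directory (the directory where the cmake command was run, which may be a different directory). The file records, for every .c/.cc/.cpp file, the working directory and the full command used to produce the .o file. For example, the compile command for hlir_utils_test.cc can be obtained as follows: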
grep hlir_utils_test.cc compile_commands.json "command": "/opt/efb/clang9/bin/clang++ -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING -D_GLIBCXX_USE_CXX11_ABI=0 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/sdksrc/include/_virtual_includes/include/dtu -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib/umd/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/ef_log/include/_virtual_includes/include -I/home/ronghua.zhou/clang1_build/tops/sdk -I/home/ronghua.zhou/clang1_build/tops/sdk/lib -I/home/ronghua.zhou/clang1_build/tops/sdk/lib/cpu_ops -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/llvm-project/mlir/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/eigen_archive -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_absl -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/external/com_google_protobuf/src -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/dtu_sdk/bazel-bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/llvm-project/llvm/utils/unittest/googlemock/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/external/com_googlesource_code_re2 -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/lib 
-I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/org_tensorflow -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/llvm/include -I/home/ronghua.zhou/clang1_build/tops/../test_sdk_build/execroot/dtu_sdk/bazel-out/k8-opt/bin/external/llvm-project/mlir/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest/include -isystem /home/ronghua.zhou/clang1_build/tops/3rdparty/googletest -O3 -g0 -DNDEBUG -fPIE -m64 -march=x86-64 -mtune=generic -Werror=array-bounds -Werror=empty-body -Werror=format-extra-args -Werror=incompatible-pointer-types -Werror=array-bounds-pointer-arithmetic -Werror=c++-compat -Werror=shift-count-overflow -Werror=sizeof-pointer-memaccess -Werror=for-loop-analysis -Werror=unused-label -Werror=delete-incomplete -Werror=empty-translation-unit -Werror=unused-local-typedef -Werror=gnu-case-range -Werror=mismatched-new-delete -Werror=infinite-recursion -Werror=unreachable-code -Werror=sometimes-uninitialized -Werror=c++14-binary-literal -Werror=implicit-fallthrough -Werror=constant-logical-operand -Werror=exceptions -fcxx-exceptions -Werror=extra-tokens -Werror=format -Werror=format-security -Werror=header-guard -Werror=literal-conversion -Werror=null-conversion -Werror=pointer-bool-conversion -Werror=shift-overflow -Werror=tautological-constant-out-of-range-compare -Werror=tautological-pointer-compare -Werror=varargs -Wdouble-promotion -Wno-error=extern-c-compat -Wall -Wno-c++11-narrowing -Wextra -fsanitize=address -fno-omit-frame-pointer -std=gnu++14 -std=gnu++14 -o sdk/tests/hlir/cc_tests/CMakeFiles/hlir_utils_test.dir hlir_utils_test.cc.o -c /home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc", "file": "/home/ronghua.zhou/clang1_build/tops/sdk/tests/hlir/cc_tests/hlir_utils_test.cc"
2.2. Getting the bazel compile command
•https://github.com/vincent-picaud/Bazel_and_CompileCommands
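The open-source project above says that compile commands can be captured by adding --experimental_action_listener=//tools/actions:generate_compile_commands_listener to the bazel command. I tried this several times without success, and ended up capturing the commands with plain ps while the build was running. For example, the compile command for hlir_utils_test.cc can be obtained with: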
ps -elf |grep hlir_utils_test.cc
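Alternatively, adding the -s option to the bazel command also prints the compile commands as they run.
2.3. Getting the link command
If the link target is known, the ps approach from 2.2 works as well. For example, the command that links libdtu_sdk.so can be obtained with: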
ps -elf |grep libdtu_sdk.so
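If the link target is not known and there are not many link jobs, "ps -elf" gives the full process list, which contains many "ld @/tmp/response-xxx.txt" processes. Copy all current /tmp/response* files to another directory and inspect them to find out which target each one links. These files contain the complete link command and arguments, so the link command can be recovered from them.
3. Categories of actual changes
3.1. Compiler option changes
3.1.1. Options added
-fcxx-exceptions: dsopt uses exceptions, and clang's exception handling is disabled by default, so it has to be turned on.
-Wno-c++11-narrowing: the code under tops contains a great many implicit conversions from larger integer types to smaller ones. The diagnostic is disabled temporarily and will be re-enabled once each component has removed these conversions; clang treats an implicit conversion from a larger to a smaller integer type as an error.
3.1.2. Options removed
-Werror: there are simply too many warnings, and eliminating all of them is not realistic, so the option is removed for now.
3.1.3. Options changed
set (CMAKE_CXX_STANDARD 14): the original default standard was 17, which conflicts with TensorFlow's default of 14 and with gcc's default of 14, so it was changed to C++14.
-fno-canonical-system-headers: only gcc supports this option; clang does not. It was changed from being enabled for all compilers to being enabled for gcc only.
3.1.4. Notes on bazel options
bazel compile options come in three kinds, copt, cxxopt, and conlyopt: copt applies to both C and C++, cxxopt applies to C++ only, and conlyopt applies to C only. Using the wrong one produces many warnings.
3.1.5. On a rerun, CMake sometimes expands CMAKE_TOOLCHAIN_FILE to the full path of the toolchain file found on the search path, so a direct STREQUAL check fails
The fix is to use MATCHES instead of STREQUAL, so that the check passes whether or not the full path has been prepended: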
3.2. Template-related errors
3.2.1. use 'template' keyword to treat 'cast' as a dependent template name
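When a member function template is called on an object whose type depends on a template parameter, clang requires the template keyword to mark the name explicitly as a template; otherwise it reports an error. See ISO C++03 14.2/4: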
When the name of a member template specialization appears after . or -> in a postfix-expression, or after nested-name-specifier in a qualified-id, and the postfix-expression or qualified-id explicitly depends on a template-parameter (14.6.2), the member template name must be prefixed by the keyword template. Otherwise the name is assumed to name a non-template.
For example, the SinkTransposeWithScalarBroadcast class in hlir calls the cast method for mlir::RankedTensorType and mlir::ShapedType.
Note that the template keyword is not needed when the object's type does not depend on a template parameter, and the same class also contains calls that need no change. For example, the ss object in the same file has the fixed, non-dependent type mlir::Operation*, so ss can call the cast member template without the template prefix:
mlir::Operation* ss = op.getOperation();
auto new_operand_ty = getTransposedType(operand_ty, prePermutation);
auto new_source_ty = getTransposedType(source_ty, prePermutation);
auto new_result_ty = getTransposedType(
    ss->getResult(0).getType().cast<mlir::RankedTensorType>(), prePermutation);
The same problem also exists in factor_profiler_pass.cc in the factor module:
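The rule can be illustrated with a minimal sketch. This is not the hlir or factor code: the Value type and both function names are hypothetical stand-ins that only mimic the shape of a cast<U>() member template.

```cpp
#include <cassert>

// Hypothetical stand-in for a type with a member template named cast,
// loosely modelled on the cast<U>() call pattern discussed above.
struct Value {
  int raw;
  template <typename U>
  U cast() const { return static_cast<U>(raw); }
};

// T is a template parameter, so `v` has a dependent type and the member
// template must be introduced with the `template` keyword.
template <typename T>
long DependentCall(const T& v) {
  return v.template cast<long>();
}

// Here the type is fixed (non-dependent), so no `template` keyword is needed.
inline long NonDependentCall(const Value& v) {
  return v.cast<long>();
}
```

Dropping the template keyword in DependentCall reproduces the "use 'template' keyword to treat 'cast' as a dependent template name" diagnostic.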
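3.2.2. Ambiguity
When some templates are instantiated, if the same call matches both function template A and function template B, clang reports an ambiguity error while gcc does not.
Take EraseHelp below, whose original version defined two prototypes. When several template types have to be expressed through TypeSequence, the compiler cannot tell whether it should peel off Last first or Inner first. Even if the two functions were implemented differently, gcc would still not report an error, and it is unclear whether it calls an arbitrary matching prototype, the first one, or the last one.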
constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Last>);
constexpr static auto EraseHelp(TypeSequence<Left...>, TypeSequence<Inner, Right...>);
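3.3. Type mismatches
3.3.1. Implicit conversion from a larger integer type to a smaller one
For example, func_entry in sdk/tests/llir/dataflow1_pingpang_buffer_test.cc is defined as int64_t, but the function it is passed to expects a uint32_t parameter, which triggers an implicit int64_t → uint32_t conversion: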
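One way to make such a conversion explicit is sketched below. It is not the patch that was merged: LaunchAt and ToU32 are hypothetical names, and the range assertion is an extra safety check assumed for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical callee that takes a 32-bit entry address.
inline uint32_t LaunchAt(uint32_t func_entry) { return func_entry; }

// Hypothetical helper: converts explicitly and asserts that the value fits,
// instead of relying on a silent int64_t -> uint32_t conversion.
inline uint32_t ToU32(int64_t value) {
  assert(value >= 0 &&
         value <= static_cast<int64_t>(std::numeric_limits<uint32_t>::max()));
  return static_cast<uint32_t>(value);
}

inline uint32_t Example() {
  int64_t func_entry = 0x1000;
  // uint32_t bad{func_entry};  // narrowing in a braced initializer: rejected by clang
  return LaunchAt(ToU32(func_entry));
}
```

Changing the declared type of the variable to uint32_t is simpler where the value never needs 64 bits.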
Other similar files:
sdk/tests/llir/dataflow1_test.cc
sdk/tests/llir/dataflow2_test.cc
sdk/tests/llir/dataflow3_test.cc
sdk/tests/llir/dataflow5_test.cc
sdk/tests/llir/dataflow5_test_1xcdma.cc
sdk/tests/llir/dataflow7_test.cc
sdk/tests/llir/llir2assembler_leo_test.cc
sdk/tests/llir/utils/llir_test_util.cc
sdk/tests/llir/utils/llir_test_util.h
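3.3.2. Implicit conversion from signed to unsigned
-1 converted to an unsigned integer:
Most of the other cases come from loop iterators declared as int that are then compared against uint32_t values, which causes an implicit int → uint32_t conversion: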
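Both patterns can be sketched as follows. The function and constant names are hypothetical and the code only illustrates the idea of matching the index type to the bound.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts elements below `limit`. The loop index uses the same unsigned type
// as the bound, so no int -> uint32_t conversion happens in the comparison.
inline uint32_t CountBelow(const std::vector<uint32_t>& values, uint32_t limit) {
  uint32_t count = 0;
  const uint32_t size = static_cast<uint32_t>(values.size());
  for (uint32_t i = 0; i < size; ++i) {
    if (values[i] < limit) ++count;
  }
  return count;
}

// -1 converted to unsigned wraps to the maximum value; a named constant
// (hypothetical name) makes that intent explicit.
constexpr uint32_t kInvalidIndex = UINT32_MAX;
static_assert(kInvalidIndex == static_cast<uint32_t>(-1), "same bit pattern");
```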
Other files:
sdk/lib/spm/src/buddy_policy.c
system_test/tools/vpd_cycle/vpd_cycle.c
sdk/lib/spm/include/spm.h
sdk/tests/llir/llir2assembler_leo_test.cc
sdk/tests/llir/dataflow5_test_1xcdma.cc
sdk/tests/llir/dataflow5_test.cc
sdk/tests/llir/llir2assembler_leo_test.cc
sdk/tests/llir/utils/llir_test_util.cc
sdk/tests/llir/utils/llir_test_util.h
The change to sdk/lib/umd/tools/kernel_code_processor/dturt.inc was a bit more involved. ___leo_runtime___ and ___x_runtime___ are defined as char[], but the initializer values can exceed 127, which overflows. However, the functions that use these variables, and the functions those call in turn, all require char[]. The final fix changes the definition to unsigned char[] and adds a cast in the function that references it directly.
diff --git a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
index d193a8823ac..f61d048cfd6 100644
--- a/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
+++ b/sdk/lib/umd/tools/kernel_code_processor/kernel_code.h
@@ -128,7 +128,10 @@ class Kernel {
   struct __target__ : public KernelCode<__target__>, public Kernel { \
     using KernelCode<__target__>::KernelCode; \
     static const llvm::StringRef GetArch() { return #__arch__; } \
-    static const char* GetRTBuffer() { return ___##__arch__##_runtime___; } \
+    static const char* GetRTBuffer() { \
+      return static_cast<char*>(static_cast<void*>( \
+          const_cast<unsigned char*>(___##__arch__##_runtime___))); \
+    } \
     static int GetRTBufferSize() { return ___##__arch__##_runtime_size___; } \
   }; \
   template class KernelCode<__target__>
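3.3.3. Implicit conversion from floating point to integer
The fractional part is simply dropped, so a non-zero value immediately becomes 0: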
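The effect is easy to reproduce. The sketch below uses hypothetical function names and shows the truncating conversion next to a rounding one, in case the nearest integer was intended.

```cpp
#include <cassert>
#include <cmath>

// Truncating conversion: 0.9f becomes 0, which is what the implicit
// float -> int conversion does silently.
inline int Truncated(float x) { return static_cast<int>(x); }

// Rounding conversion, if the nearest integer is what was intended.
inline int Rounded(float x) { return static_cast<int>(std::lround(x)); }
```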
Other similar changes:
sdk/tests/op/hlir/pavo/bert/hlir_div_test.cc
3.3.4. Implicit conversion from double to float
Other similar changes:
sdk/tests/op/hlir/pavo/resnet50/hlir_general_resize_test.cc
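3.3.5. Implicit conversion from pointer to bool
3.3.6. Implicit conversion between different types
fixed_size_mem_pool.h assigns dtu_status and int to each other directly. Although dtu_status is an enum and very similar to int, clang checks types strictly and reports an error.
Definition of dtu_status: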
typedef enum dtu_status_code {
  DTU_SUCCESS = 0,
  DTU_ERROR_INVALID_PARAMETER = -100,
  DTU_ERROR_INVALID_MEM_TYPE = -101,
  DTU_ERROR_OUT_OF_MEMORY = -102,
  DTU_ERROR_OUT_OF_RESOURCES = -103,
  DTU_ERROR_NOT_INITIALIZED = -104,
  DTU_ERROR_INVALID_CTX_OBJ = -105,
  DTU_ERROR_INVALID_CLUSTER_OBJ = -106,
  DTU_ERROR_INVALID_SIP_OBJ = -107,
  DTU_ERROR_INVALID_MEM_OBJ = -108,
  DTU_ERROR_INVALID_CMD_QUEUE_OBJ = -109,
  DTU_ERROR_INVALID_CMD_DESC_OBJ = -110,
  DTU_ERROR_INVALID_PROGRAM_OBJ = -111,
  DTU_ERROR_INVALID_FUNCTION_OBJ = -112,
  DTU_ERROR_INVALID_EVENT_OBJ = -113,
  DTU_ERROR_CLUSTER_BUSY = -114,
  DTU_ERROR_SIP_BUSY = -115,
  DTU_ERROR_IN_DRM = -116,
  DTU_ERROR_IN_IOCTRL = -117,
  DTU_ERROR_GEM_CREATE = -118,
  DTU_ERROR_GEM_CLOSE = -119,
  DTU_ERROR_GEM_MMAP = -120,
  DTU_ERROR_GEM_UNMMAP = -121,
  DTU_ERROR_CMD_QUEUE_SYNC = -122,
  DTU_ERROR_CMD_QUEUE_EMIT = -123,
  DTU_ERROR_CLUSTER_ACQUIRE = -124,
  DTU_ERROR_CLUSTER_RELEASE = -125,
  DTU_ERROR_NOT_MATCH = -126,
  DTU_ERROR_NOT_RELEASE_REF = -127,
  DTU_ERROR_GET_DEVICE_HDL = -128,
  DTU_ERROR_ALLOC_HOST = -129,
  DTU_ERROR_ALLOC_HBM = -130,
  DTU_ERROR_ALLOC_CLUSTER = -131,
  DTU_ERROR_FREE_HOST = -132,
  DTU_ERROR_FREE_HBM = -133,
  DTU_ERROR_FREE_CLUSTER = -134,
  DTU_ERROR_CMD_QUEUE_EMITED = -135,
  DTU_ERROR_OPEN_FILE = -136,
  DTU_ERROR_READ_FILE = -137,
  DTU_ERROR_WRITE_FILE = -138,
  DTU_ERROR_INVALID_BIN_TYPE = -139,
  DTU_ERROR_LOAD_BIN_FILE = -140,
  DTU_ERROR_LOAD_BIN_IMAGE = -141,
  DTU_ERROR_FUNCTION_NOT_FOUND = -142,
  DTU_ERROR_INVALID_OPERATION = -143,
  DTU_ERROR_EVENT_GET_ID = -144,
  DTU_ERROR_EVENT_WAIT_STATUS = -145,
  DTU_ERROR_EVENT_SIGNAL_STATUS = -146,
  DTU_ERROR_EVENT_TYPE = -147,
  DTU_ERROR_EVENT_NOT_SUBMIT = -148,
  DTU_ERROR_EVENT_DESTROYED = -149,
  DTU_ERROR_EVENT_SIGNAL_TWICE = -150,
  DTU_ERROR_MEMORY_OVERLAP = -151,
  DTU_ERROR_THREAD_POOL_QUEUE_OVERFLOW = -152,
  DTU_ERROR_PCI_BUS_SCAN = -153,
  DTU_ERROR_ALLOC_USERPTR = -154,
  DTU_ERROR_FREE_USERPTR = -155,
  DTU_ERROR_DUMP_CMEM = -156,
  DTU_ERROR_LOAD_CMEM = -157,
  DTU_ERROR_DUMP_SMEM = -158,
  DTU_ERROR_LOAD_SMEM = -159,
  DTU_ERROR_READ_REGISTERS = -160,
  DTU_ERROR_WRITE_REGISTERS = -161,
  DTU_ERROR_ALLOC_SIP = -162,
  DTU_ERROR_FREE_SIP = -163,
  DTU_ERROR_UNKNOWN = -164,
  DTU_ERROR_ALLOC_HUGE = -165,
  DTU_ERROR_INVALID_USR_IRQ_OBJ = -166,
  DTU_ERROR_LINK_CCIX_IO = -167,
  DTU_ERROR_PLACEHOLDER_NOT_FEED = -168,
  DTU_ERROR_LAUNCH_DMA = -169,
  DTU_ERROR_INVALID_PROFILE_MAGIC = -170,
  DTU_ERROR_INVALID_TIMESTAMP = -180,
  DTU_ERROR_INVALID_CONFIG = -181,
  DTU_ERROR_CHILD_NOT_SUBMIT = -182,
  DTU_ERROR_ALREADY_FORKED = -183,
  DTU_ERROR_LABEL_USED = -184,
  DTU_ERROR_LABEL_NOT_VALIDATED = -185,
  DTU_ERROR_COMMAND_TYPE_MISMATCH = -186,
  DTU_ERROR_VECTOR_NUMBER = -187,
  DTU_ERROR_VECTOR_FLAG_MISMATCH = -188,
  DTU_ERROR_DEVICE_RESET = -189,
  DTU_ERROR_EXECUTABLE_CRC_VERIFY = -190,
  DTU_ERROR_EXECUTABLE_DEVICE_VERIFY = -191,
  DTU_ERROR_INVALID_TS_OBJ = -192,
  DTU_ERROR_ALLOC_VDEV = -193,
  DTU_ERROR_FREE_VDEV = -194,
  DTU_ERROR_VDEV_BUSY = -195,
} dtu_status;
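Although NULL and 0 have the same value, NULL is a null pointer constant (void* in C) while 0 is an int, so the two are far from interchangeable.
3.3.7. Implicit const conversion in function prototypes
3.3.8. Implicit conversion from void* to char*
Many modules do arithmetic directly on void* pointers. The size of the object a void* points to is unknown, so using it as an address in a + or - expression effectively performs an implicit void* → char* conversion first; clang does not allow this: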
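A minimal sketch of the explicit form follows. OffsetBytes is a hypothetical helper, not a function from the tops code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advances a void* by a byte offset. Arithmetic on void* is a GNU extension,
// so the pointer is cast to a byte pointer first.
inline void* OffsetBytes(void* base, std::size_t bytes) {
  return static_cast<char*>(base) + bytes;
}

inline bool Example() {
  int32_t buffer[4] = {10, 20, 30, 40};
  void* p = OffsetBytes(buffer, 2 * sizeof(int32_t));
  return *static_cast<int32_t*>(p) == 30;
}
```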
Other similar changes:
sdk/tests/hlir/cc_tests/hlir_corner_test.cc
sdk/tests/hlir/cc_tests/hlir_press_test.cc
sdk/tests/tops/tops_dot_test.cc
sdk/tests/op/hlir/pavo/bert/hlir_broadcast_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_transpose_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_test_header.h
sdk/tests/op/hlir/pavo/resnet50/hlir_slice_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_select_and_scatter_non4c_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_pad_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_update_slice_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_dynamic_slice_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_concat_test.cc
sdk/tests/op/hlir/pavo/resnet50/hlir_broadcast_test.cc
sdk/tests/op/hlir/pavo/dnn/hlir_test_header.h
sdk/tests/op/hlir/hlir_test_header.h
sdk/tests/runtime/executable_test.cc
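3.3.9. Implicit conversion from string to char*
3.4. Missing break in switch
3.4.1. Where a break is semantically required, add it
For example, parser.hpp has no break before the final default branch. Because the default branch is currently empty this does not affect behavior, but if the default branch ever gains any handling, it will cause problems:
Other similar changes: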
sdk/sdk.bzl
sdk/third_party/inja.patch
3.4.2. Where fallthrough is intended, add an annotation so that the compiler skips the check
This kind of problem is quite common.
Other similar changes:
sdk/tests/runtime/mem_manager_test.cc
sdk/tests/runtime/mem_pool_test.cc
sdk/tools/dtu_compiler/dtu_compiler.cc
sdk/lib/umd/tests/sample/tinyxmlparser.cc
In addition, C++17 adds the fallthrough attribute, which is a simple way to tell the compiler that fallthrough is intended: C++ attribute: fallthrough (since C++17) - cppreference.com
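Since the project builds as C++14, an annotation has to work before C++17 as well. The sketch below is one possible arrangement, not the annotation actually used in the patches: SKETCH_FALLTHROUGH and Accumulate are hypothetical names.

```cpp
#include <cassert>

// Hypothetical portability macro: uses the standard attribute when compiling
// as C++17 or later, the clang-specific spelling under clang otherwise, and
// an empty expression statement for other compilers.
#if defined(__cplusplus) && __cplusplus >= 201703L
#define SKETCH_FALLTHROUGH [[fallthrough]]
#elif defined(__clang__)
#define SKETCH_FALLTHROUGH [[clang::fallthrough]]
#else
#define SKETCH_FALLTHROUGH ((void)0)
#endif

// Levels are cumulative: level 2 also performs the work of levels 1 and 0.
inline int Accumulate(int level) {
  int steps = 0;
  switch (level) {
    case 2:
      ++steps;
      SKETCH_FALLTHROUGH;
    case 1:
      ++steps;
      SKETCH_FALLTHROUGH;
    case 0:
      ++steps;
      break;
    default:
      break;
  }
  return steps;
}
```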
3.5. Format mismatches
3.5.1. Mismatched, but with no functional impact
A mismatch between the format string and the arguments actually passed can cause serious problems. In the tops code, though, most cases pass 64-bit data for an ll conversion, which has little practical effect. If a 128-bit processor comes along and ll is actually 128 bits, this could corrupt the stack.
diff --git a/sdk/tests/runtime/chunk_allocator_test.cc b/sdk/tests/runtime/chunk_allocator_test.cc
index e63568ddc63..78896778a21 100644
--- a/sdk/tests/runtime/chunk_allocator_test.cc
+++ b/sdk/tests/runtime/chunk_allocator_test.cc
@@ -426,7 +426,7 @@ TEST_F(ChunkAllocatorTest, basic_stress_test) {
     if (allocated_size < allocated_size_pass) {
       char str_buf[256];
       snprintf(str_buf, sizeof(str_buf),
-               "allocated_size: %llx, allocated_chunks.size(): %lu",
+               "allocated_size: %lx, allocated_chunks.size(): %lu",
                allocated_size, allocated_chunks.size());
       EXPECT_TRUE(false) << str_buf;
       break;
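The patch above switches to %lx. For fixed-width types, the PRI macros from <cinttypes> avoid choosing between %lx and %llx by hand. The sketch below assumes the value is a uint64_t, which the diff does not show, and FormatSize is a hypothetical name.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Formats a 64-bit size in hex. PRIx64 expands to the conversion specifier
// that matches uint64_t on the current platform.
inline int FormatSize(char* out, std::size_t out_len, uint64_t allocated_size) {
  return std::snprintf(out, out_len, "allocated_size: %" PRIx64, allocated_size);
}
```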
Other files:
sdk/lib/umd/tests/sample/mm_test.cc
sdk/include/driver/mem_handle.h
sdk/include/runtime/command_packet.h
sdk/include/driver/mem_handle.h
sdk/tests/runtime/mem_pool_test.cc
sdk/lib/umd/tests/sample/performance_test.cc
sdk/tests/profile/test_zebu.cc
sdk/runtime/tests/top_scheduler/loop_task_utils.h
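3.5.2. Mismatched, with functional impact
The intent below was to print the data that a uint16_t* points to, but the pointer itself was passed, so an address is printed instead of the value. Fortunately it is only a print statement, but %hu corresponds to 32 bits while the pointer argument is 64 bits on a 64-bit machine, so the stack can still be corrupted:
3.6. Defined but not used
3.6.1. Unused variables
Other files: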
sdk/tests/spm/basic.cc
sdk/lib/spm/src/best_fit_policy.c
3.6.2. Unused parameters
There are a great many of these, especially in third-party components, which additionally have to be fixed by producing dedicated patches. This warning is the main reason I eventually gave in and turned -Werror off:
diff --git a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
index abd50ad4e81..1b39f594406 100644
--- a/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
+++ b/sdk/lib/umd/tests/sample/callback_multi_ctx_test.cc
@@ -16,6 +16,7 @@
 #include "dtu/umd/dtu.h"
 #include "dtu/umd/dtu_base_obj.h"
 #include "dtu/umd/dtu_log.h"
+#include "dtu/umd/dtu_utils.h"
 #include "lib/umd/src/dtu_memory.h"
 #include "lib/umd/tests/sample/sample.h"
 #include "lib/umd/tests/sample/sample_assert.h"
@@ -26,6 +27,7 @@ std::mutex mtx;
 
 void event_callback_func_1(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_1 call[%d]\n", engine_id);
@@ -33,6 +35,7 @@ void event_callback_func_1(dtu_callback callback, void *user_data,
 
 void event_callback_func_2(dtu_callback callback, void *user_data,
                            u32 engine_id) {
+  MAYBE_UNUSED(callback);
   std::unique_lock<std::mutex> lock(mtx);
   *(int *)user_data = 1;
   DTU_ERROR_LOG(TEST, "event callback_2 call[%d]\n", engine_id);
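The patch does not show how MAYBE_UNUSED is defined in dtu/umd/dtu_utils.h. A common way to write such a macro is a cast to void; the sketch below assumes that form and uses a different name so that it is not mistaken for the real definition.

```cpp
#include <cassert>

// Hypothetical definition; the real MAYBE_UNUSED in dtu/umd/dtu_utils.h
// is not shown in the patch and may differ.
#define SKETCH_MAYBE_UNUSED(x) (void)(x)

// The callback signature is fixed by the caller, so the unused parameters
// cannot simply be removed.
inline void OnEvent(int callback_id, void* user_data, unsigned engine_id) {
  SKETCH_MAYBE_UNUSED(callback_id);
  SKETCH_MAYBE_UNUSED(engine_id);
  *static_cast<int*>(user_data) = 1;
}
```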
Other files:
tools/logging/lib/logging/log.cc
tools/logging/lib/logging/to/file.cc
tools/logging/lib/logging/to/std_err.cc
tools/logging/lib/util/signal_handler.cc
tools/logging/tests/logging/log_to_test.h
sdk/lib/umd/include/dtu_utils.h
sdk/lib/umd/include/reference_obj.h
3rdparty/protobuf-3.8.0/src/google/protobuf/arena.h
sdk/lib/umd/tests/sample/device_reset.cc
sdk/lib/umd/tests/sample/usr_irq.cc
sdk/lib/umd/tests/sample/callback_test.cc
3rdparty/protobuf-3.8.0/src/google/protobuf/map_type_handler.h
3rdparty/protobuf-3.8.0/src/google/protobuf/parse_context.h
kmd/utils/ktest/kmd-test.cpp
sdk/lib/spm/src/buddy_policy.c
sdk/lib/umd/include/dtu_command_obj.h
sdk/lib/umd/include/dtu_context_obj.h
sdk/lib/umd/include/dtu_dqm_obj.h
sdk/lib/umd/include/dtu_driver.h
system_test/tools/vpd_cycle/vpd_cycle.c
sdk/lib/spm/src/buddy_policy.c
sdk/lib/spm/src/interface.c
sdk/lib/spm/src/rbtree.c
sdk/lib/umd/include/dtu_device.h
sdk/lib/umd/include/dtu_driver.h
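In addition, the unused variable in tools/logging/include/logging/check.h is a special case: it is actually meant to be used, but the wrong interface was called, so the information was lost along the way:
3.6.3. Unused labels
3.6.4. Unreachable code
The developers explained that the code below is currently unsupported but they do not want to delete it, so a comment was added for now:
sdk/tests/tops/tops_customop_upsample_nearest_test.cc also reports unreachable code, mainly because Co is currently a fixed value, so the first-level if condition is always false. The loop that follows already handles the case where Co is 1, so the if can be removed entirely:
3.6.5. Inline functions that are never called
Other files: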
sdk/lib/spm/src/buddy_policy.c
sdk/lib/umd/tests/sample/mm_test.cc
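3.6.6. Unused class declarations
3.6.7. Unused type definitions
3.7. Duplicate definitions
Modules across the tops code base each define a large number of their own macros, so once they start including one another there are many duplicate definitions. The real fix is to extract common header files, but the modules currently do not want to depend on one another, so for now the definitions are wrapped in ifndef as a temporary workaround:
Other files: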
sdk/lib/umd/tests/sample/sample_assert.h
system_test/tools/vpd_cycle/vpd_cycle.c
3.8. Incorrect initialization order of input parameters
This occurred only once:
Other modified files:
sdk/tests/tops/tops_convert_parameter_test.cc
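3.9. Incomplete type declarations
clang reports an error when a class is only forward-declared and no complete definition can be found in the included headers.
Working out the definition order of the tf headers is very tedious. Fortunately clang searches the headers automatically, so the change is wrapped in a clang-specific macro.
3.10. Array initialization
3.10.1. For arrays that really must be variable-length, allocate and release the memory with new[]() and delete[]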
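A sketch of the run-time-sized case follows. It uses std::unique_ptr<T[]> so that delete[] is called automatically, which is a variation on the plain new[]/delete[] pairing described in the heading; SumIota is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Sums 0..n-1 through a buffer whose length is only known at run time.
inline long SumIota(std::size_t n) {
  // new T[n]() value-initializes the elements; unique_ptr<T[]> calls delete[].
  std::unique_ptr<long[]> buf(new long[n]());
  for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<long>(i);
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += buf[i];
  return sum;
}
```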
Other similar changes:
sdk/lib/umd/tests/sample/mm_test.cc
sdk/lib/cpu_ops/naive/dot.cc
sdk/lib/factor/codegen/macro_instruction/minst_conv2d_bpi.cc
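3.10.2. For arrays that are semantically fixed-length, add a const qualifier
This is very common in the tests: people are not in the habit of adding const to the variable that defines an array's length. Doing so not only improves execution efficiency but also makes mistakes less likely.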
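A minimal sketch of the difference, with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

inline int SumFixed() {
  // int n = 4; int data[n];     // variable-length array: a compiler extension in C++
  const std::size_t kCount = 4;  // const with a constant initializer: usable as an array bound
  int data[kCount] = {1, 2, 3, 4};
  int sum = 0;
  for (std::size_t i = 0; i < kCount; ++i) sum += data[i];
  return sum;
}
```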
There are many more; only the file names are listed:
sdk/sample/batchnormalTraining/tops_batchnormalTraining.cc
sdk/sample/broadcast/tops_broadcast.cc
sdk/sample/resnet50/TopsOpApi.cc
sdk/tests/tops/tops_batchnormalBackward_test.cc
sdk/tests/tops/tops_batchnormalTraining_test.cc
sdk/tests/tops/tops_concat_test.cc
sdk/tests/tops/tops_convert_test.cc
sdk/tests/tops/tops_customop_test.cc
sdk/tests/tops/tops_scatter_test.cc
sdk/tests/tops/tops_bnForwardTrainingEx_unit_test.cc (this file had 1800+ lines changed, which forced me to split it into its own patch)
sdk/tests/tops/tops_broadcast_test.cc
sdk/tests/tops/tops_concat_test.cc
sdk/tests/tops/tops_convert_test.cc
sdk/tests/tops/tops_descriptor_test.cc
sdk/tests/tops/tops_pad_test.cc
sdk/tests/tops/tops_scatter_test.cc
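3.11. auto in function prototypes
clang forbids auto parameters in function prototypes. As I understand it, the main considerations are:
1. If the function is exposed as an interface, what argument type should the caller use?
2. If several calls use different argument types, does the function body trigger implicit type conversions when it processes the parameter? clang strictly forbids implicit conversions that lose information.
3. If the arguments of different calls have different storage sizes, could the stack be corrupted? For example, if some calls pass int and others pass long, should compilation instantiate two entities or a single one?
4. When the function is translated to a C function, how should its name be generated? The rules for converting a C++ function name to a C function name do not cover auto parameters.
The auto parameter problem shows up mainly in the sdk/lib/tuner/pavo/ and sdk/tests/factor/targets/pavo/dnn/conv/ directories: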
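An auto function parameter is standard only since C++20 and is otherwise a compiler extension. A portable replacement is an explicit function template, sketched below with a hypothetical Scale function rather than the actual tuner code.

```cpp
#include <cassert>
#include <cstdint>

// Before (not valid standard C++14):
//   static int64_t Scale(auto value) { return value * 2; }

// After: an explicit function template states what the parameter is.
template <typename T>
int64_t Scale(T value) {
  return static_cast<int64_t>(value) * 2;
}
```

Where every caller passes the same type, replacing auto with that concrete type is simpler still.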
The other functions were changed in a similar way; only the file names are listed:
sdk/lib/tuner/pavo/pavo_conv_dataflow5_bpi_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow7_bpi_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow1_bpi_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow2_bpi_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow3_1_forward_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow5_1_forward_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow6_bpk_non4c_impl.cc
sdk/lib/tuner/pavo/pavo_conv_dataflow7_1_forward_non4c_impl.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c6s_bpi_dataflow1_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c4s_dataflow7_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_bpk_1c6s_dataflow6_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow3_1_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow5_1_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_ff_dataflow7_1_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow2_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow3_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow5_template_test.cc
sdk/tests/factor/targets/pavo/dnn/conv/dnn_conv_gen_1c4s_bpi_dataflow7_template_test.cc
sdk/lib/ops/common/dtu_elementwise_fusion_impl.cc
sdk/tests/llir/dma_test/slice_dma_test.cc
sdk/tests/llir/dma_test/broadcast_dma_test.cc
sdk/tests/llir/dma_test/deslice_dma_test.cc
sdk/tests/llir/dma_test/mirror_dma_test.cc
sdk/tests/llir/dma_test/padding_dma_test.cc
sdk/tests/llir/dma_test/subsampling_dma_test.cc
sdk/tests/llir/dma_test/transpose_dma_test.cc
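3.12. The return value of strlen is not treated as a constant
clang treats the return value of strlen as a variable. To use it as a constant, you need to define your own function: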
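One possible shape for such a function is sketched below. ConstStrLen and g_name_buffer are hypothetical names, and the helper relies on C++14 relaxed constexpr, which matches the -std=gnu++14 setting shown earlier.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper name. A C++14 constexpr function may contain a loop,
// so the length can be computed at compile time.
constexpr std::size_t ConstStrLen(const char* s) {
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

static_assert(ConstStrLen("profile") == 7, "evaluated at compile time");

// The result can size an array, which needs a constant expression.
static char g_name_buffer[ConstStrLen("profile") + 1] = {};
```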
Other similar changes:
sdk/lib/profile/libprofile_defs.h
3.13. Other syntax problems
3.13.1. lambda syntax problems
See Lambda expressions (since C++11) - cppreference.com; the capture list of a lambda expression is described as follows:
a comma-separated list of zero or more captures, optionally beginning with a capture-default.
See below for the detailed description of captures.
A lambda expression can use a variable without capturing it if the variable
- is a non-local variable or has static or thread local storage duration (in which case the variable cannot be captured), or
- is a reference that has been initialized with a constant expression.
A lambda expression can read the value of a variable without capturing it if the variable
- has const non-volatile integral or enumeration type and has been initialized with a constant expression, or
- is constexpr and has no mutable members.
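The description above means that no capture is needed in the following cases:
1) non-local (global) variables
2) static variables
3) thread local variables (in this case it is not that a capture is unnecessary; the variable cannot be captured at all)
4) references initialized with a constant expression
5) non-volatile integral or enumeration types initialized with a constant expression (read-only access)
6) constexpr objects without mutable members (read-only access)
module_str, used in sdk/tests/hlir/cc_tests/hlir_pass_manager_test.cc, is a global variable and does not need to be captured. The original code compiled with gcc5, but gcc7 and clang reject it:
In the code below, the this pointer is captured but never used, which produces the warning “expression result unused [-Wunused-value]”. A capture acts like a declaration inside the lambda, so an unused capture triggers a warning:
Similarly, listing the constant alignment in the capture list in sdk/lib/ops/common/dtu_scatter_impl.cc is also wrong: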
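The capture rules can be sketched as follows. The variable names echo the ones mentioned above, but the code is an illustration and not taken from the listed files.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Namespace-scope variable: usable inside a lambda without being captured,
// and not allowed in the capture list.
static const std::string g_module_str = "module";

inline std::size_t Example() {
  const int alignment = 64;  // const integral initialized with a constant expression
  int local = 3;

  // auto bad = [g_module_str] { return g_module_str.size(); };  // rejected: not a local variable

  // Only `local` needs a capture; `alignment` is read without one.
  auto f = [local] { return g_module_str.size() + alignment + local; };
  return f();
}
```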
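3.13.2. std::move in a return statement
Using std::move in a return statement disables the compiler's copy elision. For the pre-change code below, clang reports the warning “moving a local object in a return statement prevents copy elision [-Wpessimizing-move]”. What is copy elision?
Copy elision - cppreference.com defines it as follows: Omits copy and move (since C++11) constructors, resulting in zero-copy pass-by-value semantics.
In other words, without std::move the compiler tries to omit the copy or move of the object during return, achieving zero-copy; with std::move the compiler is forced to call the object's move constructor. The latter is clearly more expensive.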
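A minimal sketch of the two forms, using a hypothetical BuildNames function:

```cpp
#include <cassert>
#include <string>
#include <vector>

inline std::vector<std::string> BuildNames() {
  std::vector<std::string> names;
  names.push_back("leo");
  names.push_back("pavo");
  // return std::move(names);  // triggers -Wpessimizing-move and blocks copy elision
  return names;  // eligible for copy elision; otherwise moved implicitly
}
```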
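3.13.3. Use of uninitialized objects
In the pre-change version of sdk/tests/runtime/device_manager_test.cc, if result.ok() is false, cluster is never initialized before it is passed as an argument to the later device->ClusterMemoryHandle() call, which can have very nasty consequences:
3.13.4. clang forbids initializing arrays with a parenthesized expression
For the pre-change code below, clang reports "parenthesized initialization of a member array is a GNU extension [-Wgnu-array-member-paren-init]", and gcc reports the warning "list-initializer for non-class type must not be parenthesized":
Similar changes: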
sdk/tests/tops/tops_dot_parameter_test.cc
sdk/tests/tops/tops_pad_parameter_test.cc
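3.13.5. In clang, a function template is instantiated only when something calls it
Because the constructor is never called in the sdk's own code, libdtu_sdk.so contains no symbol for it, yet the test functions need it. As a last resort a stub function was added to trigger instantiation of the constructor.
3.13.6. clang does not allow complex objects that need memory handling to be defined in constexpr
The template below needs to create a new vector object, whose constructor performs memory allocation. Without the change clang reports “variable of non-literal type 'std::vector<size_t>' (aka 'vector<unsigned long>') cannot be defined in a constexpr function”; removing the constexpr specifier from the template fixes it.
Section 3.9/10 of the C++ standard defines literal type (roughly, a constant or a simple variable). The vector in unpack_seq_to_vector is neither a simple variable nor an array of simple variables. Switching to array should work, but every caller of the function would then have to change: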
A type is a literal type if it is:
- void; or
- a scalar type; or
- a reference type; or
- an array of literal type; or
- a class type (Clause 9) that has all of the following properties:
  - it has a trivial destructor,
  - it is an aggregate type (8.5.1) or has at least one constexpr constructor or constructor template that is not a copy or move constructor, and
  - all of its non-static data members and base classes are of non-volatile literal types.
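The std::array alternative mentioned above could look like the sketch below. The signature of the real unpack_seq_to_vector is not shown in this document, so the std::index_sequence parameter and the names UnpackSeqToArray and kDims are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical counterpart of unpack_seq_to_vector that returns std::array.
// std::array of a scalar type is a literal type, so the function can stay constexpr.
template <std::size_t... Is>
constexpr std::array<std::size_t, sizeof...(Is)> UnpackSeqToArray(
    std::index_sequence<Is...>) {
  return {{Is...}};
}

constexpr auto kDims = UnpackSeqToArray(std::index_sequence<2, 3, 5>{});
static_assert(kDims.size() == 3, "three elements");
static_assert(kDims[1] == 3, "second element");
```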
3.13.7. In clang, overriding a virtual function requires an explicit override keyword
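A minimal sketch of the pattern follows. Sink and StdErrSink are hypothetical classes and are not the ones in tools/logging.

```cpp
#include <cassert>

struct Sink {
  virtual ~Sink() = default;
  virtual int Write(int n) { return n; }
};

struct StdErrSink : Sink {
  // `override` makes the compiler verify that a base-class virtual function
  // with this exact signature exists.
  int Write(int n) override { return n * 2; }
};
```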
Other similar changes:
tools/logging/include/logging/to/std_err.h
3.13.8. alignas usage
alignas is meant to improve access efficiency by placing a structure on a larger alignment boundary; it is not the same concept as pack in C. pack(1) can be forced on any object to guarantee a packed memory layout, whereas the value given to alignas must be no smaller than the alignment of the largest member of the structure:
The object or the type declared by such a declaration will have its alignment requirement equal to the strictest (largest) non-zero expression of all alignas specifiers used in the declaration, unless it would weaken the natural alignment of the type.
The structures defined below have a uint16_t member, so the minimum alignas is in theory 2, and alignas(1) cannot be used:
diff --git a/sdk/lib/hlir/utils/types.h b/sdk/lib/hlir/utils/types.h
index 87aee25fe31..90cabe7bdb6 100644
--- a/sdk/lib/hlir/utils/types.h
+++ b/sdk/lib/hlir/utils/types.h
@@ -151,13 +151,13 @@ enum class CompareType {
 
 // define raw data type
 // lower to factor need raw data
-struct alignas(1) raw_bf16_ty {
+struct alignas(2) raw_bf16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_bf16_ty) == 2, "");
 
 // half
-struct alignas(1) raw_fp16_ty {
+struct alignas(2) raw_fp16_ty {
   uint16_t data;
 };
 static_assert(sizeof(raw_fp16_ty) == 2, "");
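The effect of alignas on size and alignment can be checked at compile time. The sketch below uses hypothetical type names and assumes a platform where uint16_t has a natural alignment of 2.

```cpp
#include <cassert>
#include <cstdint>

// alignas(1) would be weaker than the natural alignment of uint16_t and is
// ill-formed; alignas(2) matches it.
struct alignas(2) RawHalf {
  uint16_t data;
};
static_assert(sizeof(RawHalf) == 2, "no padding added");
static_assert(alignof(RawHalf) == 2, "alignment matches uint16_t");

// Over-aligning is allowed; the size is rounded up to the alignment.
struct alignas(8) WideHalf {
  uint16_t data;
};
static_assert(alignof(WideHalf) == 8, "stricter alignment requested");
static_assert(sizeof(WideHalf) == 8, "size padded to a multiple of the alignment");
```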
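3.14. Optimizations made along the way while fixing warnings
3.14.1. Redundant computation
tools/logging/lib/logging/log_message.cc was originally touched to fix the variable-length array initialization problem. While reading the code I found that it combined the seconds and milliseconds of a timeval into a total number of milliseconds that was never used; afterwards it converted directly to seconds and milliseconds again. The conversion was therefore useless, and after confirming with the code owner the redundant computation was deleted.
3.14.2. Redundant comparison of a reference's address with a null pointer
A reference refers to the address of an object and is never null, so comparing it with nullptr is meaningless: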
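The distinction can be sketched with two hypothetical functions: one takes a reference and needs no null check, the other takes a pointer because "no object" is a legitimate input.

```cpp
#include <cassert>

struct Device {
  int id;
};

// Takes a reference: the callee may assume a valid object, so a check such
// as `if (&dev == nullptr)` is always false in well-defined code.
inline int IdOf(const Device& dev) { return dev.id; }

// Takes a pointer when "no object" is a legitimate input.
inline int IdOrDefault(const Device* dev, int fallback) {
  return dev != nullptr ? dev->id : fallback;
}
```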