PyTorch 1.3 Source Code Walkthrough, Part 1

Reposted article; original dated 2019-10-28 11:34. Topic: PyTorch 1.3 source code.

pytorch$ tree -L 1
.
├── android
├── aten
├── benchmarks
├── binaries
├── c10
├── caffe2
├── CITATION
├── cmake
├── CMakeLists.txt
├── CODEOWNERS
├── CONTRIBUTING.md
├── docker
├── docs
├── ios
├── LICENSE
├── Makefile
├── modules
├── mypy-files.txt
├── mypy.ini
├── mypy-README.md
├── NOTICE
├── README.md
├── requirements.txt
├── scripts
├── setup.py
├── submodules
├── test
├── third_party
├── tools
├── torch
├── ubsan.supp
└── version.txt

17 directories, 15 files
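
The summary line that `tree -L 1` prints can be reproduced in a few lines of Python, which is handy on machines where `tree` is not installed. This is a minimal sketch: the function name `summarize_top_level` is made up for illustration, and it skips hidden entries because `tree` does so by default.

```python
import os
import tempfile
from pathlib import Path


def summarize_top_level(root):
    """Return (directories, files) found directly under root, sorted by name.

    Hidden entries (names starting with a dot) are skipped, mirroring the
    default behaviour of `tree`.
    """
    dirs, files = [], []
    with os.scandir(root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.name.startswith("."):
                continue
            (dirs if entry.is_dir() else files).append(entry.name)
    return dirs, files


if __name__ == "__main__":
    # Build a tiny stand-in for a checkout; the names come from the listing above.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("aten", "c10", "caffe2", "torch"):
            Path(tmp, name).mkdir()
        for name in ("setup.py", "version.txt"):
            Path(tmp, name).touch()
        dirs, files = summarize_top_level(tmp)
        print(f"{len(dirs)} directories, {len(files)} files")
```

Pointing `summarize_top_level` at a real checkout instead of the temporary directory should give the same counts as the `tree` summary above, provided the checkout matches the snapshot.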

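The top-level directories are annotated below:

├── android

├── aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library. Much of its code declares and defines the logic for Tensor operations.)

├── benchmarks (PyTorch benchmarks)

├── binaries (standalone command-line tools, mostly benchmarks and dataset/DB converters, including the PyTorch mobile benchmark that is run in PEP)

├── c10 (c10, "Caffe Tensor Library": the core Tensor implementation, shared by mobile and server builds)

├── caffe2 (The Caffe2 framework. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository so that, from the user's point of view, the two projects share code, CI, deployment, usage, and general maintenance. This directory implements Caffe2's networks, operators, and so on, and builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old Caffe2 project. This snapshot also includes TensorRT 6.0 support and a PyTorch->ONNX->TRT6 unit test.)

├── cmake (CMake build scripts, modules, and config templates; this snapshot includes the TensorRT 6.0 support changes)

├── ios (iOS packaging for LibTorch: LibTorch.h, LibTorch.podspec, and a TestApp)

├── modules (optional Caffe2 modules: detectron, module_test, observers, rocksdb)

├── scripts (build and utility scripts such as build_android.sh and build_ios.sh; benchmark code was recently added to the iOS TestApp)

├── submodules (contains only nervanagpu-rev.txt; re-synced with the internal repository)

├── third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)

├── tools (scripts used to build PyTorch)

├── torch (TH / THC provide some .hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Part of this code is generated at build time by tools/setup_helpers/generate_code.py.)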

Expanding the listing to two directory levels:

$ tree -L 2
.
├── android
│       ├── build.gradle
│       ├── gradle
│       ├── gradle.properties
│       ├── libs
│       ├── pytorch_android
│       ├── pytorch_android_torchvision
│       ├── run_tests.sh
│       └── settings.gradle
├── aten
│       ├── CMakeLists.txt
│       ├── conda
│       ├── src
│       └── tools
├── benchmarks
│       ├── fastrnns
│       ├── framework_overhead_benchmark
│       ├── operator_benchmark
│       └── README.md
├── binaries
│       ├── at_launch_benchmark.cc
│       ├── bench_gen
│       ├── benchmark_args.h
│       ├── benchmark_helper.cc
│       ├── benchmark_helper.h
│       ├── caffe2_benchmark.cc
│       ├── CMakeLists.txt
│       ├── convert_and_benchmark.cc
│       ├── convert_caffe_image_db.cc
│       ├── convert_db.cc
│       ├── convert_encoded_to_raw_leveldb.cc
│       ├── convert_image_to_tensor.cc
│       ├── core_overhead_benchmark.cc
│       ├── core_overhead_benchmark_gpu.cc
│       ├── db_throughput.cc
│       ├── inspect_gpu.cc
│       ├── intra_inter_benchmark.cc
│       ├── make_cifar_db.cc
│       ├── make_image_db.cc
│       ├── make_mnist_db.cc
│       ├── parallel_info.cc
│       ├── predictor_verifier.cc
│       ├── print_core_object_sizes_gpu.cc
│       ├── print_registered_core_operators.cc
│       ├── run_plan.cc
│       ├── run_plan_mpi.cc
│       ├── speed_benchmark.cc
│       ├── speed_benchmark_torch.cc
│       ├── split_db.cc
│       ├── tsv_2_proto.cc
│       ├── tutorial_blob.cc
│       └── zmq_feeder.cc
├── c10
│       ├── CMakeLists.txt
│       ├── core
│       ├── cuda
│       ├── hip
│       ├── macros
│       ├── test
│       └── util
├── caffe2
│       ├── c2_aten_srcs.bzl
│       ├── CMakeLists.txt
│       ├── contrib
│       ├── core
│       ├── cuda_rtc
│       ├── db
│       ├── distributed
│       ├── experiments
│       ├── ideep
│       ├── image
│       ├── __init__.py
│       ├── mobile
│       ├── mpi
│       ├── observers
│       ├── onnx
│       ├── operators
│       ├── opt
│       ├── perfkernels
│       ├── predictor
│       ├── proto
│       ├── python
│       ├── quantization
│       ├── queue
│       ├── README.md
│       ├── release-notes.md
│       ├── requirements.txt
│       ├── serialize
│       ├── sgd
│       ├── share
│       ├── test
│       ├── transforms
│       ├── utils
│       ├── VERSION_NUMBER
│       └── video
├── CITATION
├── cmake
│       ├── BuildVariables.cmake
│       ├── Caffe2Config.cmake.in
│       ├── Caffe2ConfigVersion.cmake.in
│       ├── cmake_uninstall.cmake.in
│       ├── Codegen.cmake
│       ├── Dependencies.cmake
│       ├── External
│       ├── GoogleTestPatch.cmake
│       ├── iOS.cmake
│       ├── MiscCheck.cmake
│       ├── Modules
│       ├── Modules_CUDA_fix
│       ├── ProtoBuf.cmake
│       ├── ProtoBufPatch.cmake
│       ├── public
│       ├── Summary.cmake
│       ├── TorchConfig.cmake.in
│       ├── TorchConfigVersion.cmake.in
│       ├── Utils.cmake
│       └── Whitelist.cmake
├── CMakeLists.txt
├── CODEOWNERS
├── CONTRIBUTING.md
├── docker
│       ├── caffe2
│       └── pytorch
├── docs
│       ├── caffe2
│       ├── cpp
│       ├── libtorch.rst
│       ├── make.bat
│       ├── Makefile
│       ├── requirements.txt
│       └── source
├── ios
│       ├── LibTorch.h
│       ├── LibTorch.podspec
│       ├── README.md
│       └── TestApp
├── LICENSE
├── Makefile
├── modules
│       ├── CMakeLists.txt
│       ├── detectron
│       ├── module_test
│       ├── observers
│       └── rocksdb
├── mypy-files.txt
├── mypy.ini
├── mypy-README.md
├── NOTICE
├── README.md
├── requirements.txt
├── scripts
│       ├── add_apache_header.sh
│       ├── apache_header.txt
│       ├── apache_python.txt
│       ├── appveyor
│       ├── build_android.sh
│       ├── build_host_protoc.sh
│       ├── build_ios.sh
│       ├── build_local.sh
│       ├── build_mobile.sh
│       ├── build_pytorch_android.sh
│       ├── build_raspbian.sh
│       ├── build_tegra_x1.sh
│       ├── build_tizen.sh
│       ├── build_windows.bat
│       ├── diagnose_protobuf.py
│       ├── fbcode-dev-setup
│       ├── get_python_cmake_flags.py
│       ├── model_zoo
│       ├── onnx
│       ├── proto.ps1
│       ├── read_conda_versions.sh
│       ├── README.md
│       ├── remove_apache_header.sh
│       ├── run_mobilelab.py
│       ├── temp.sh
│       └── xcode_build.rb
├── setup.py
├── submodules
│       └── nervanagpu-rev.txt
├── test
│       ├── backward_compatibility
│       ├── bottleneck
│       ├── common_cuda.py
│       ├── common_device_type.py
│       ├── common_distributed.py
│       ├── common_methods_invocations.py
│       ├── common_nn.py
│       ├── common_quantization.py
│       ├── common_quantized.py
│       ├── common_utils.py
│       ├── cpp
│       ├── cpp_api_parity
│       ├── cpp_extensions
│       ├── custom_operator
│       ├── data
│       ├── dist_autograd_test.py
│       ├── dist_utils.py
│       ├── error_messages
│       ├── expect
│       ├── expecttest.py
│       ├── HowToWriteTestsUsingFileCheck.md
│       ├── hypothesis_utils.py
│       ├── jit
│       ├── jit_utils.py
│       ├── onnx
│       ├── optim
│       ├── rpc_test.py
│       ├── run_test.py
│       ├── simulate_nccl_errors.py
│       ├── test_autograd.py
│       ├── test_c10d.py
│       ├── test_c10d_spawn.py
│       ├── test_cpp_api_parity.py
│       ├── test_cpp_extensions.py
│       ├── test_cuda_primary_ctx.py
│       ├── test_cuda.py
│       ├── test_dataloader.py
│       ├── test_data_parallel.py
│       ├── test_dist_autograd_fork.py
│       ├── test_dist_autograd_spawn.py
│       ├── test_distributed.py
│       ├── test_distributions.py
│       ├── test_docs_coverage.py
│       ├── test_expecttest.py
│       ├── test_fake_quant.py
│       ├── test_function_schema.py
│       ├── test_indexing.py
│       ├── test_jit_disabled.py
│       ├── test_jit_fuser.py
│       ├── test_jit.py
│       ├── test_jit_py3.py
│       ├── test_jit_string.py
│       ├── test_logging.py
│       ├── test_mkldnn.py
│       ├── test_module
│       ├── test_multiprocessing.py
│       ├── test_multiprocessing_spawn.py
│       ├── test_namedtensor.py
│       ├── test_namedtuple_return_api.py
│       ├── test_nccl.py
│       ├── test_nn.py
│       ├── test_numba_integration.py
│       ├── test_optim.py
│       ├── test_qat.py
│       ├── test_quantization.py
│       ├── test_quantized_models.py
│       ├── test_quantized_nn_mods.py
│       ├── test_quantized.py
│       ├── test_quantized_tensor.py
│       ├── test_quantizer.py
│       ├── test_rpc_fork.py
│       ├── test_rpc_spawn.py
│       ├── test_sparse.py
│       ├── test_tensorboard.py
│       ├── test_throughput_benchmark.py
│       ├── test_torch.py
│       ├── test_type_hints.py
│       ├── test_type_info.py
│       ├── test_type_promotion.py
│       └── test_utils.py
├── third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)
│       ├── benchmark (Google's open-source benchmark library)
│       ├── cpuinfo (Facebook's open-source cpuinfo, detects CPU information)
│       ├── cub (NVIDIA's open-source CUB, a flexible library of cooperative threadblock primitives and other utilities for CUDA kernel programming)
│       ├── eigen (linear algebra and matrix library)
│       ├── fbgemm (Facebook's open-source low-precision, high-performance matrix library; currently the backend for Caffe2's quantized operators on x86)
│       ├── foxi (ONNXIFI with Facebook Extension)
│       ├── FP16 (conversion to/from half-precision floating point formats)
│       ├── FXdiv (C99/C++ header-only library for division via fixed-point multiplication by inverse)
│       ├── gemmlowp (Google's open-source low-precision matrix multiplication library, https://github.com/google/gemmlowp)
│       ├── gloo (Facebook's open-source collective communications library with various primitives for multi-machine training)
│       ├── googletest (Google's open-source unit-testing framework)
│       ├── ideep (Intel's open-source neural network acceleration library built on MKL-DNN)
│       ├── ios-cmake (CMake toolchain file for iOS)
│       ├── miniz-2.0.8 (Miniz, a lossless, high-performance data compression library in a single source file)
│       ├── nccl (NVIDIA's open-source optimized primitives for collective multi-GPU communication)
│       ├── neon2sse (ARM-related; intended to simplify ARM->IA32 porting)
│       ├── NNPACK (acceleration package for neural networks on multi-core CPUs)
│       ├── onnx (Open Neural Network Exchange, an open model interchange format originally developed by Facebook and Microsoft; PyTorch, Caffe2, ncnn, Core ML, and others can work with it)
│       ├── onnx-tensorrt (ONNX-TensorRT: TensorRT backend for ONNX)
│       ├── protobuf (Google's open-source protobuf)
│       ├── psimd (portable 128-bit SIMD intrinsics)
│       ├── pthreadpool (pthread-based thread pool for C/C++)
│       ├── pybind11 (seamless operability between C++11 and Python)
│       ├── python-enum (mirror of the enum34 package from PyPI, a PeachPy dependency, to be used in submodules)
│       ├── python-peachpy (PeachPy, a Python framework for writing high-performance assembly kernels)
│       ├── python-six (Python 2 and 3 compatibility library)
│       ├── QNNPACK (Facebook's open-source quantized neural network acceleration library for mobile platforms)
│       ├── README.md
│       ├── sleef (SIMD Library for Evaluating Elementary Functions)
│       ├── tbb (Intel's open-source Threading Building Blocks, TBB)
│       └── zstd (Facebook's open-source Zstandard, a fast real-time compression library)
├── tools
│       ├── amd_build
│       ├── aten_mirror.sh
│       ├── autograd
│       ├── build_libtorch.py
│       ├── build_pytorch_libs.py
│       ├── build_variables.py
│       ├── clang_format.py
│       ├── clang_tidy.py
│       ├── docker
│       ├── download_mnist.py
│       ├── flake8_hook.py
│       ├── generated_dirs.txt
│       ├── git_add_generated_dirs.sh
│       ├── git-pre-commit
│       ├── git_reset_generated_dirs.sh
│       ├── __init__.py
│       ├── jit
│       ├── pyi
│       ├── pytorch.version
│       ├── README.md
│       ├── setup_helpers
│       └── shared
├── torch
│       ├── abi-check.cpp
│       ├── autograd
│       ├── backends
│       ├── _classes.py
│       ├── CMakeLists.txt
│       ├── __config__.py
│       ├── contrib
│       ├── csrc
│       ├── cuda
│       ├── custom_class.h
│       ├── distributed
│       ├── distributions
│       ├── extension.h
│       ├── for_onnx
│       ├── functional.py
│       ├── __future__.py
│       ├── hub.py
│       ├── __init__.py
│       ├── __init__.pyi.in
│       ├── jit
│       ├── _jit_internal.py
│       ├── legacy
│       ├── lib
│       ├── multiprocessing
│       ├── _namedtensor_internals.py
│       ├── nn
│       ├── onnx
│       ├── _ops.py
│       ├── optim
│       ├── py.typed
│       ├── quantization
│       ├── quasirandom.py
│       ├── random.py
│       ├── README.txt
│       ├── script.h
│       ├── serialization.py
│       ├── _six.py
│       ├── sparse
│       ├── _storage_docs.py
│       ├── storage.py
│       ├── _tensor_docs.py
│       ├── tensor.py
│       ├── _tensor_str.py
│       ├── testing
│       ├── _torch_docs.py
│       ├── utils
│       ├── _utils_internal.py
│       └── _utils.py
├── ubsan.supp
└── version.txt

148 directories, 219 files

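The PyTorch core is divided into five major parts:

1. c10 (c10, "Caffe Tensor Library": the core Tensor implementation, shared by mobile and server builds)

2. aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library; much of its code declares and defines the logic for Tensor operations)

3. caffe2 (The Caffe2 framework. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository so that, from the user's point of view, the two projects share code, CI, deployment, usage, and general maintenance. It implements Caffe2's networks, operators, and so on, and builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old Caffe2 project. This snapshot also includes TensorRT 6.0 support and a PyTorch->ONNX->TRT6 unit test.)

4. torch (TH / THC provide some .hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Part of this code is generated at build time by tools/setup_helpers/generate_code.py.)

5. third_party (open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others)

Details follow.

The core components under c10 (c10, "Caffe Tensor Library": the innermost Tensor implementation, shared by mobile and server builds). Note that the c10 library should keep its dependencies minimal: in particular, it should not depend on any implementation-specific or backend-specific library. It especially should not depend on any generated protobuf headers, because protobuf headers would transitively force users to link against a specific protobuf version. It contains: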

├── c10

│ ├── CMakeLists.txt

│ ├── core

│ ├── cuda

│ ├── hip

│ ├── macros

│ ├── test

│ └── util
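
The "no generated protobuf headers" rule for c10 can be checked mechanically. The sketch below is an illustration, not part of the PyTorch build: the function name `find_protobuf_includes` and the set of file suffixes are assumptions. It relies only on the fact that protoc's C++ output uses the `.pb.h` suffix.

```python
import re
import tempfile
from pathlib import Path

# protoc's generated C++ headers end in .pb.h, so that is what we look for.
PB_INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+\.pb\.h)[>"]')
# Assumed set of C/C++/CUDA source suffixes worth scanning.
SOURCE_SUFFIXES = {".h", ".hpp", ".cc", ".cpp", ".cu", ".cuh"}


def find_protobuf_includes(root):
    """Return sorted (relative_path, line_number, header) tuples, one for
    every #include of a generated protobuf header found under root."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = PB_INCLUDE.match(line)
            if match:
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory with one offending file.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "core").mkdir()
        Path(tmp, "core", "ok.h").write_text("#include <vector>\n")
        Path(tmp, "core", "bad.cc").write_text('#include "example.pb.h"\n')
        for rel, lineno, header in find_protobuf_includes(tmp):
            print(f"{rel}:{lineno}: includes {header}")
```

Run against a real `c10` directory, an empty result would be consistent with the dependency rule; any hit points at a file worth inspecting.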

The core components under aten (ATen, "A TENsor library for C++11": PyTorch's C++ tensor library; much of its code declares and defines the logic for Tensor operations):

$ tree -L 2
├── CMakeLists.txt
├── conda
│   ├── build.sh
│   └── meta.yaml
├── src
│   ├── ATen
│   ├── README.md
│   ├── TH
│   ├── THC
│   ├── THCUNN
│   └── THNN
└── tools
    ├── run_tests.sh
    ├── test_install.sh
    └── valgrind.sup

8 directories, 7 files

Under aten/src:

This directory contains PyTorch's low-level tensor libraries, alongside which the new C++ ATen library is being built. These low-level libraries trace their lineage back to the original Torch project. The directory contains the following libraries:

* TH = TorcH

* THC = TorcH Cuda

* THCS = TorcH Cuda Sparse (now defunct)

* THCUNN = TorcH CUda Neural Network (see cunn)

* THNN = TorcH Neural Network

* THS = TorcH Sparse (now defunct)

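The caffe2 module

Caffe2 is a lightweight, modular, and scalable deep learning framework. This snapshot includes TensorRT 6.0 support (for optimized inference) and a PyTorch->ONNX->TRT6 unit test. In April 2018 Facebook announced that the Caffe2 repository would be merged into the PyTorch repository so that, from the user's point of view, the two projects share code, CI, deployment, usage, and general maintenance. The directory implements Caffe2's networks, operators, and so on, and builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old Caffe2 project.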
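
The long suffix on the two Python binding files is not specific to Caffe2: `.cpython-37m-x86_64-linux-gnu.so` is the ABI-tagged extension suffix of the interpreter the build ran under (here CPython 3.7 on x86_64 Linux). The sketch below shows how to derive the expected filename for whatever interpreter is running; the helper name `extension_filename` is made up for illustration.

```python
import importlib.machinery


def extension_filename(module_name):
    """Filename the running interpreter would look for when importing a
    compiled extension module with the given name."""
    # EXTENSION_SUFFIXES lists every suffix the interpreter accepts; on
    # CPython the first entry is typically the most specific, ABI-tagged one.
    return module_name + importlib.machinery.EXTENSION_SUFFIXES[0]


if __name__ == "__main__":
    for name in ("caffe2_pybind11_state", "caffe2_pybind11_state_gpu"):
        print(extension_filename(name))
```

On a different Python version or platform the printed names will differ from the ones quoted above, which is why a binding built for one interpreter generally cannot be imported by another.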

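The core components under torch. TH / THC provide some .hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. Ideally, these headers would not be installed at all; instead, code should manipulate these structs through public functions (declared in headers like THTensor.h, not THTensor.hpp). However, a few places in torch/csrc violate this abstraction; they are marked with a pointer to this note. Each of those sites will have to be refactored when the guts of THTensor and related structures are refactored. Part of the code here is generated at build time by tools/setup_helpers/generate_code.py:

├── autograd (automatic differentiation and gradient handling)

├── backends (backend-specific settings; contains cuda, cudnn, mkl, mkldnn, openmp, and quantized)

├── csrc (all code related to Python integration, in contrast to lib, which holds the Torch libraries that are independent of Python. csrc depends on lib, but not the other way around. Contains api, autograd, cuda, distributed, generic, jit, multiprocessing, onnx, tensor, and utils)

├── cuda (CUDA support)

├── distributed (distributed processing, including autograd)

├── distributions (probability distributions)

├── jit (the JIT compiler, used to compile models for performance)

├── legacy (legacy code, only populated in versions before 0.5)

├── lib (Torch libraries that are independent of Python: c10d, libshm, and libshm_windows)

├── multiprocessing (multi-process support, including sharing CUDA tensors between processes)

├── nn (neural network operations and declarations: backends, intrinsic, modules, parallel, qat, quantized, and utils)

├── onnx (export to the ONNX model interchange format)

├── optim (optimizers)

├── quantization (quantization)

├── utils (contains backcompat, bottleneck, data, ffi, hipify, and tensorboard)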
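
Most of the directories listed above are importable Python subpackages of `torch`, which gives a quick way to compare the listing against an installed build. The sketch below is hedged accordingly: the helper name `probe_submodules` is made up, and the script reports everything as missing when `torch` itself is not installed rather than failing.

```python
import importlib.util


def probe_submodules(package, names):
    """Map each name to True or False depending on whether the submodule
    package.name can be located by the import system."""
    found = {}
    for name in names:
        try:
            # find_spec imports the parent package in order to search it.
            spec = importlib.util.find_spec(f"{package}.{name}")
        except ImportError:
            # Raised when the parent package is absent or fails to import.
            spec = None
        found[name] = spec is not None
    return found


if __name__ == "__main__":
    listed = ["autograd", "backends", "cuda", "distributed", "distributions",
              "jit", "multiprocessing", "nn", "onnx", "optim",
              "quantization", "utils"]
    for name, present in probe_submodules("torch", listed).items():
        print(f"torch.{name}: {'found' if present else 'missing'}")
```

`csrc` and `lib` are left out of the probe on purpose: they hold C++ sources and native libraries rather than Python subpackages.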

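The third_party module

Open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others; see the annotated third_party listing earlier in this article.

Viewed as layers:

1. Layer 1, c10: the core Tensor implementation, used by both mobile and server builds.

2. Layer 2, ATen + TH*: the implementation of Tensor algorithms, made up of ATen and TH*. This layer depends on layer 1. Some ATen core pieces are currently being moved into c10, and Torch (TH*) code is being ported to ATen.

3. Layer 3, Caffe2: a lightweight, modular, and scalable deep learning framework. This snapshot includes TensorRT 6.0 support (for optimized inference) and a PyTorch->ONNX->TRT6 unit test. The directory implements Caffe2's networks, operators, and so on, and builds libcaffe2.so, libcaffe2_gpu.so, caffe2_pybind11_state.cpython-37m-x86_64-linux-gnu.so (Caffe2 CPU Python bindings), and caffe2_pybind11_state_gpu.cpython-37m-x86_64-linux-gnu.so (Caffe2 CUDA Python bindings). Most of it comes from the old Caffe2 project. This layer depends on layer 2.

4. Layer 4, torch: the PyTorch implementation. TH / THC provide some .hpp headers, which are standard C++ headers rather than C headers. PyTorch's variable, autograd, jit, onnx, distributed, model interfaces, Python interfaces, and so on are declared and defined here. This layer builds libtorch.so and libtorch_python.so (the Python bindings). It depends on ATen + TH* (layer 2), but because the ATen + TH* logic is packaged into libcaffe2.so, it depends directly on layer 3.

5. Everything else, such as third_party: open-source third-party libraries from Google, Facebook, NVIDIA, Intel, and others, which support ATen + TH*, Caffe2, and torch.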
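
The layering above is a small dependency graph, and a topological sort recovers the order in which the layers have to be built. The sketch below is only a model of the description in this article, not of the actual CMake targets; the dictionary and function names are made up, third_party is left out because it supports several layers at once, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each key depends on the layers in its value set, as described above.
LAYER_DEPENDENCIES = {
    "c10": set(),
    "ATen + TH*": {"c10"},
    "caffe2": {"ATen + TH*"},
    "torch": {"caffe2"},
}


def build_order(dependencies):
    """Return the layers ordered so that each one comes after everything
    it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(LAYER_DEPENDENCIES)))
    # prints: c10 -> ATen + TH* -> caffe2 -> torch
```

Because the graph is a single chain, the order is unique. Adding an edge that points back down the chain (for example making c10 depend on torch) would make `static_order` raise `graphlib.CycleError`, which is the programmatic form of the rule that lower layers must not depend on higher ones.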
