[Repost] A Brief Introduction to BLAS


BLAS (Basic Linear Algebra Subprograms) is a collection of routines for the basic operations that recur throughout linear algebra computations [1]. The BLAS Technical (BLAST) Forum standardizes the BLAS interface and publishes, on the netlib website [1], a BLAS library written in Fortran. This Fortran library is commonly called the reference implementation. Its algorithms produce correct results reasonably efficiently, but leave considerable room for optimization; for higher performance, use one of the optimized BLAS libraries.

BLAS is the foundation of LAPACK, a much richer linear algebra library built on top of the BLAS routines.

Implementations of the BLAS library

Vector and matrix operations are the foundation of numerical computing, and the BLAS library is often the decisive factor in the performance of a piece of software. Besides the reference implementation there are many derived and optimized versions. Among them, some merely expose the BLAS interface to other programming languages, some are translations of the reference Fortran code into other languages, some convert the reference library into other languages at the binary level, and some start from the reference implementation and add further optimizations for specific hardware architectures (CPUs, GPUs, etc.) [4][5].

ATLAS BLAS[3]

The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.

OpenBLAS[4]

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.

Intel® Math Kernel Library[5]

Intel® Math Kernel Library (Intel® MKL) accelerates math processing and neural network routines that increase application performance and reduce development time. Intel MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Neural Network, Vector Math and Statistics functions. The easiest way to take advantage of all of that processing power is to use a carefully optimized math library. Even the best compiler can’t compete with the level of performance possible from a hand-optimized library. If your application already relies on the BLAS or LAPACK functionality, simply re-link with Intel MKL to get better performance on Intel and compatible architectures.

cuBLAS[6]

The NVIDIA CUDA Basic Linear Algebra Subroutines (cuBLAS) library is a GPU-accelerated version of the complete standard BLAS library that delivers 6x to 17x faster performance than the latest MKL BLAS.

clBLAS[7]

This repository houses the code for the OpenCL™ BLAS portion of clMath. The complete set of BLAS level 1, 2 & 3 routines is implemented.

BLIS[10]

BLIS is a portable software framework for instantiating high-performance BLAS-like dense linear algebra libraries. The framework was designed to isolate essential kernels of computation that, when optimized, enable optimized implementations of most of its commonly used and computationally intensive operations. Select kernels have been optimized for the AMD EPYC™ processor family. The optimizations are done for single and double precision routines.

BLAS functions

The routines in the BLAS library are divided into three levels according to the objects they operate on (see the sketch after the list):

  • Level 1 routines perform linear operations on a single vector and binary operations between two vectors. They first appeared in the BLAS library published in 1979.
  • Level 2 routines perform matrix-vector operations, including triangular solves. They were published in 1988.
  • Level 3 routines cover matrix-matrix operations. They were published in 1990.
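
To make the three levels concrete, here is a minimal sketch (not from the original article) that calls one routine from each level through the CBLAS C interface. It assumes a CBLAS provider such as OpenBLAS, ATLAS, or the Netlib CBLAS wrapper, linked with a command along the lines of "cc levels.c -lopenblas"; the matrices and sizes are purely illustrative.

/* One call from each BLAS level via the CBLAS interface (row-major storage). */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double x[3] = {1, 2, 3}, y[3] = {4, 5, 6};

    /* Level 1 (vector-vector): DAXPY computes y := 2*x + y */
    cblas_daxpy(3, 2.0, x, 1, y, 1);

    /* Level 2 (matrix-vector): DGEMV computes z := 1*A*x + 0*z, A is 3x3 */
    double A[9] = {1, 0, 0,
                   0, 2, 0,
                   0, 0, 3};
    double z[3] = {0, 0, 0};
    cblas_dgemv(CblasRowMajor, CblasNoTrans, 3, 3, 1.0, A, 3, x, 1, 0.0, z, 1);

    /* Level 3 (matrix-matrix): DGEMM computes C := 1*A*B + 0*C, all 3x3 */
    double B[9] = {1, 1, 1,
                   1, 1, 1,
                   1, 1, 1};
    double C[9] = {0};
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                3, 3, 3, 1.0, A, 3, B, 3, 0.0, C, 3);

    printf("y = %g %g %g\n", y[0], y[1], y[2]);        /* 6 9 12 */
    printf("z = %g %g %g\n", z[0], z[1], z[2]);        /* 1 4 9  */
    printf("C row 0 = %g %g %g\n", C[0], C[1], C[2]);  /* 1 1 1  */
    return 0;
}

Whether the Level 3 call actually runs vectorized and multi-threaded depends entirely on which BLAS implementation is linked in, which is exactly why swapping the reference library for an optimized one can change performance dramatically.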

Naming convention for the BLAS interface

Level 1 routine names are composed of "prefix + operation abbreviation".
For example, in the routine SROTG:

  • S    -- prefix indicating the data type of the matrix or vector elements;
  • ROTG -- abbreviation of the vector operation.
     
    Prefix: the data type of the matrix or vector elements, one of:
  • S - single-precision real
  • D - double-precision real
  • C - single-precision complex
  • Z - double-precision complex (complex*16)

Level 2 and Level 3 routines involve matrix operations; their names are composed of "prefix + matrix type + operation abbreviation".
For example, SGEMV:

  • S    -- prefix indicating the data type of the matrix or vector elements;
  • GE   -- matrix type;
  • MV   -- abbreviation of the operation (matrix-vector multiply).
     
    The matrix types used in BLAS are the following (a short sketch after the list shows how these names map to actual calls):

  • GE - GEneral             dense matrix
  • GB - General Band        banded matrix
  • SY - SYmmetric           symmetric matrix
  • SB - Symmetric Band      symmetric banded matrix
  • SP - Symmetric Packed    symmetric matrix in packed storage
  • HE - HErmitian           Hermitian (self-adjoint) matrix
  • HB - Hermitian Band      banded Hermitian matrix
  • HP - Hermitian Packed    Hermitian matrix in packed storage
  • TR - TRiangular          triangular matrix
  • TB - Triangular Band     banded triangular matrix
  • TP - Triangular Packed   triangular matrix in packed storage
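
The following sketch (an illustration added here, not part of the original article) shows how the naming scheme composes in practice: prefix D (double precision) + matrix type SY or SP + operation MV. DSYMV takes a full symmetric matrix, DSPMV the same matrix in packed storage; it assumes a CBLAS provider such as OpenBLAS, where for CblasRowMajor with CblasUpper the packed array lists the upper triangle row by row.

/* D + SY + MV versus D + SP + MV: same operation, different matrix storage. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* Symmetric 3x3 matrix, row-major; only the upper triangle is referenced. */
    double A[9]  = {2, 1, 0,
                    1, 2, 1,
                    0, 1, 2};
    /* Same matrix in packed form: upper triangle stored row by row. */
    double AP[6] = {2, 1, 0,
                       2, 1,
                          2};
    double x[3] = {1, 1, 1}, y1[3] = {0}, y2[3] = {0};

    /* DSYMV: y1 := A*x using the full symmetric matrix */
    cblas_dsymv(CblasRowMajor, CblasUpper, 3, 1.0, A, 3, x, 1, 0.0, y1, 1);

    /* DSPMV: y2 := A*x using the packed symmetric matrix */
    cblas_dspmv(CblasRowMajor, CblasUpper, 3, 1.0, AP, x, 1, 0.0, y2, 1);

    for (int i = 0; i < 3; ++i)
        printf("y1[%d]=%g  y2[%d]=%g\n", i, y1[i], i, y2[i]);  /* both 3 4 3 */
    return 0;
}

Packed storage roughly halves the memory needed for the matrix, but its less regular access pattern means the packed variants are often not the fastest choice in optimized libraries.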

Level 1

PROTG
 - Description: generate plane rotation
 - Syntax: PROTG( A, B, C, S)
    - P: S(single float), D(double float)

PROTMG
 - Description: generate modified plane rotation
 - Syntax: PROTMG( D1, D2, A, B, PARAM)
    - P: S(single float), D(double float)

PROT
 - Description: apply plane rotation
 - Syntax: PROT( N, X, INCX, Y, INCY, C, S)
    - P: S(single float), D(double float)

PROTM
 - Description: apply modified plane rotation
 - Syntax: PROTM( N, X, INCX, Y, INCY, PARAM)
    - P: S(single float), D(double float)

PSWAP
 - Description: swap x and y
 - Syntax: PSWAP( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSCAL
 - Description: x = ax
 - Syntax: PSCAL( N, ALPHA, X, INCX)
    - P: S(single float), D(double float), C(complex), Z(complex*16), CS, ZD

PCOPY
 - Description: copy x into y
 - Syntax: PCOPY( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PAXPY
 - Description: constant times a vector plus a vector: y = a*x + y
 - Syntax: PAXPY( N, ALPHA, X, INCX, Y, INCY)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PDOT
 - Description: dot product
 - Syntax: PDOT( N, X, INCX, Y, INCY)
    - P: S(single float), D(double float), DS

PNRM2
 - Description: Euclidean norm
 - Syntax: PNRM2( N, X, INCX)
    - P:  S(single float), D(double float), CS, ZD

PASUM
 - Description: sum of absolute values
 - Syntax: PASUM( N, X, INCX)
    - P:  S(single float), D(double float), CS, ZD

IXAMAX
 - Description: index of max absolute value
 - Syntax: IXAMAX( N, X, INCX)
    - X: S(single float), D(double float), C(complex), Z(complex*16)
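
Below is a minimal sketch (added for illustration, assuming a CBLAS provider such as OpenBLAS) exercising a few of the Level 1 routines listed above: DDOT, DNRM2, DASUM, and IDAMAX.

/* A few Level 1 routines on a short vector. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double x[4] = {3.0, -4.0, 0.0, 1.0};
    double y[4] = {1.0,  1.0, 1.0, 1.0};

    double dot  = cblas_ddot(4, x, 1, y, 1);   /* dot product: 3 - 4 + 0 + 1 = 0 */
    double nrm  = cblas_dnrm2(4, x, 1);        /* Euclidean norm: sqrt(26)       */
    double asum = cblas_dasum(4, x, 1);        /* sum of absolute values: 8      */
    int    imax = (int)cblas_idamax(4, x, 1);  /* index of max |x_i|: 1          */

    printf("dot=%g  nrm2=%g  asum=%g  iamax=%d\n", dot, nrm, asum, imax);
    return 0;
}

Note that the Fortran interface returns a 1-based index from IxAMAX, while the CBLAS interface used here returns a 0-based one.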

Level 2

PGEMV
 - Description: matrix vector multiply
 - Syntax: PGEMV( TRANS, M, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PGBMV
 - Description: banded matrix vector multiply
 - Syntax: PGBMV( TRANS, M, N, KL, KU, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PSYMV
 - Description: symmetric matrix vector multiply
 - Syntax: PSYMV( UPLO, N, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)

PSBMV
 - Description: symmetric banded matrix vector multiply
 - Syntax: PSBMV( UPLO, N, K, ALPHA, A, LDA, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)

PSPMV
 - Description: symmetric packed matrix vector multiply
 - Syntax: PSPMV( UPLO, N, ALPHA, AP, X, INCX, BETA, Y, INCY)
    - P:  S(single float), D(double float)

PTRMV
 - Description: triangular matrix vector multiply
 - Syntax: PTRMV( UPLO, TRANS, DIAG, N, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTBMV
 - Description: triangular banded matrix vector multiply
 - Syntax: PTBMV( UPLO, TRANS, DIAG, N, K, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTPMV
 - Description: triangular packed matrix vector multiply
 - Syntax: PTPMV( UPLO, TRANS, DIAG, N, AP, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTRSV
 - Description: solving triangular matrix problems
 - Syntax: PTRSV( UPLO, TRANS, DIAG, N, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTBSV
 - Description: solving triangular banded matrix problems
 - Syntax: PTBSV( UPLO, TRANS, DIAG, N, K, A, LDA, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PTPSV
 - Description: solving triangular packed matrix problems
 - Syntax: PTPSV( UPLO, TRANS, DIAG, N, AP, X, INCX)
    - P:  S(single float), D(double float), C(complex), Z(complex*16)

PGER
 - Description: performs the rank 1 operation A := alpha*x*y' + A
 - Syntax: PGER( M, N, ALPHA, X, INCX, Y, INCY, A, LDA)
    - P:  S(single float), D(double float)

PSYR
 - Description: performs the symmetric rank 1 operation A := alpha*x*x' + A
 - Syntax: PSYR( UPLO,  N, ALPHA, X, INCX, A, LDA)
    - P: S(single float), D(double float)

PSPR
 - Description: symmetric packed rank 1 operation A := alpha*x*x' + A
 - Syntax: PSPR( UPLO, N, ALPHA, X, INCX, AP)
    - P: S(single float), D(double float)

PSYR2
 - Description: performs the symmetric rank 2 operation, A := alpha*x*y' + alpha*y*x' + A
 - Syntax: PSYR2( UPLO,  N, ALPHA, X, INCX, Y, INCY, A, LDA)
    - P: S(single float), D(double float)

PSPR2
 - Description: performs the symmetric packed rank 2 operation, A := alpha*x*y' + alpha*y*x' + A
 - Syntax: PSPR2( UPLO,  N, ALPHA, X, INCX, Y, INCY, AP)
    - P: S(single float), D(double float)
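
As an illustration of the Level 2 routines above (again a sketch assuming a CBLAS provider such as OpenBLAS, not part of the original article), the following program performs a DGER rank-1 update and then solves a lower-triangular system in place with DTRSV.

/* Level 2 examples: rank-1 update and triangular solve. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* DGER: A := 1.0 * x * y' + A, where A is 2x3, row-major. */
    double A[6] = {0, 0, 0,
                   0, 0, 0};
    double x[2] = {1, 2};
    double y[3] = {1, 10, 100};
    cblas_dger(CblasRowMajor, 2, 3, 1.0, x, 1, y, 1, A, 3);
    /* A is now [[1,10,100],[2,20,200]] */

    /* DTRSV: overwrite b with the solution of L*b = b,
     * L lower triangular 3x3, row-major. */
    double L[9] = {2, 0, 0,
                   1, 2, 0,
                   1, 1, 2};
    double b[3] = {2, 3, 4};
    cblas_dtrsv(CblasRowMajor, CblasLower, CblasNoTrans, CblasNonUnit,
                3, L, 3, b, 1);
    printf("solution: %g %g %g\n", b[0], b[1], b[2]);  /* 1 1 1 */
    return 0;
}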

Level 3

PGEMM
 - Description: matrix matrix multiply
 - Syntax: PGEMM( TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSYMM
 - Description: symmetric matrix matrix multiply
 - Syntax: PSYMM( SIDE, UPLO, M, N, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSYRK
 - Description: symmetric rank-k update to a matrix
 - Syntax: PSYRK( UPLO, TRANS, N, K, ALPHA, A, LDA, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PSYR2K
 - Description: symmetric rank-2k update to a matrix
 - Syntax: PSYR2K( UPLO, TRANS, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PTRMM
 - Description: triangular matrix matrix multiply
 - Syntax: PTRMM( SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
    - P: S(single float), D(double float), C(complex), Z(complex*16)

PTRSM
 - Description: solving triangular matrix with multiple right hand sides
 - Syntax: PTRSM( SIDE, UPLO, TRANSA, DIAG, M, N, ALPHA, A, LDA, B, LDB)
    - P: S(single float), D(double float), C(complex), Z(complex*16)
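
Finally, a sketch of two Level 3 routines under the same assumptions (CBLAS interface, e.g. OpenBLAS; not part of the original article): DSYRK forms the symmetric product C := A*A' and DTRSM solves a triangular system with several right-hand sides at once.

/* Level 3 examples: symmetric rank-k update and triangular solve with multiple RHS. */
#include <stdio.h>
#include <cblas.h>

int main(void) {
    /* DSYRK: C := 1.0*A*A' + 0.0*C, A is 2x3, C is 2x2 symmetric (upper part). */
    double A[6] = {1, 2, 3,
                   4, 5, 6};
    double C[4] = {0, 0,
                   0, 0};
    cblas_dsyrk(CblasRowMajor, CblasUpper, CblasNoTrans,
                2, 3, 1.0, A, 3, 0.0, C, 2);
    /* Upper part of C is now [[14, 32], [ -, 77]] */

    /* DTRSM: solve L*X = B in place, L lower triangular 2x2, B is 2x2. */
    double L[4] = {2, 0,
                   1, 1};
    double B[4] = {2, 4,
                   2, 3};
    cblas_dtrsm(CblasRowMajor, CblasLeft, CblasLower, CblasNoTrans, CblasNonUnit,
                2, 2, 1.0, L, 2, B, 2);
    printf("X = [%g %g; %g %g]\n", B[0], B[1], B[2], B[3]);  /* [1 2; 1 1] */
    return 0;
}

Pairs like DSYRK + DTRSM are the building blocks from which LAPACK's blocked factorizations are assembled, which is why Level 3 performance tends to dominate overall LAPACK performance.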

Other matrix computation libraries

SparseLib++ --- Numerical Sparse Matrix Classes in C++

http://math.nist.gov/sparselib

SparseLib++ is a C++ class library for efficient sparse matrix computations across various computational platforms. The software package consists of matrix objects representing several sparse storage formats currently in use (in this release: compressed row, compressed column and coordinate formats), providing basic functionality for managing sparse matrices, together with efficient kernel mathematical operations (e.g. sparse matrix-vector multiply). Routines based on the Sparse BLAS are used to enhance portability and performance. Included in the package are various preconditioners commonly used in iterative solvers for linear systems of equations. The focus is on computational support for iterative methods, but the sparse matrix objects presented here can be used on their own.

The latest SparseLib++ release is v1.7, last updated in 2008 (it has not been updated in a long time).

SparseLib++ 1.7 uses C99 features such as complex.h. It can be compiled with g++ 4.0.1 or later. Visual Studio does not support all C99 features, so SparseLib++ 1.7 cannot be built directly with VS (it can be built via MinGW).

PETSc

https://www.mcs.anl.gov/petsc/

PETSc, pronounced PET-see (the S is silent), is a suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations. It supports MPI, and GPUs through CUDA or OpenCL, as well as hybrid MPI-GPU parallelism. PETSc (sometimes called PETSc/Tao) also contains the Tao optimization software library.

SuiteSparse

http://faculty.cse.tamu.edu/davis/suitesparse.html
SuiteSparse is a suite of sparse matrix algorithms.
In addition, the page in [8] lists many more matrix computation libraries.

References

[1] BLAS official site: http://www.netlib.org/blas/
[2] https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
[3] http://math-atlas.sourceforge.net/
[4] http://www.openblas.net/ 
[5] https://software.intel.com/en-us/intel-mkl/
[6] https://developer.nvidia.com/cublas 
[7] https://github.com/clMathLibraries/clBLAS   
[8] https://scicomp.stackexchange.com/questions/351/recommendations-for-a-usable-fast-c-matrix-library
[9] https://martin-thoma.com/solving-linear-equations-with-gaussian-elimination/
[10] https://developer.amd.com/amd-cpu-libraries/blas-library/

Original article: CSDN cocoonyang

Appendix: related links

OpenBLAS on GitHub: https://github.com/xianyi/OpenBLAS/wiki/User-Manual
A talk by the OpenBLAS author: https://www.leiphone.com/news/201704/Puevv3ZWxn0heoEv.html

