GEMM: The Heart of Deep Learning


GEMM is one of the routines in BLAS: it implements multiplication of large matrices, which inevitably raises questions of how the data are read and stored.
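
As a concrete illustration of such a BLAS call, here is a minimal sketch (my addition, not from the original article) using SciPy's binding to the single-precision routine scipy.linalg.blas.sgemm, which computes C = alpha * A @ B + beta * C:

```python
import numpy as np
from scipy.linalg import blas

# Illustrative sketch only; the article does not prescribe a particular library.
# GEMM in BLAS computes C = alpha * A @ B + beta * C for dense matrices.
A = np.random.rand(4, 6).astype(np.float32)
B = np.random.rand(6, 3).astype(np.float32)

C = blas.sgemm(alpha=1.0, a=A, b=B)      # single-precision GEMM (sgemm)
assert np.allclose(C, A @ B, atol=1e-4)  # matches a plain matrix product
```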

Reference blog: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

The time-breakdown chart in that post may surprise you: to cut a neural network's computation time, the real lever is making the convolution layers' computation more efficient.

So what is GEMM?  It stands for General Matrix to Matrix Multiplication, and it essentially does exactly what it says on the tin, multiplies two input matrices together to get an output one. The difference between it and the kind of matrix operations I was used to in the 3D graphics world is that the matrices it works on are often very big. For example, a single layer in a typical network may require the multiplication of a 256 row, 1,152 column matrix by an 1,152 row, 192 column matrix to produce a 256 row, 192 column result. Naively, that requires 57 million (256 x 1,152 x 192) floating point operations and there can be dozens of these layers in a modern architecture, so I often see networks that need several billion FLOPs to calculate a single frame. Here’s a diagram that I sketched to help me visualize how it works:

[Diagram from the referenced post: the two input matrices multiplied to produce the output matrix]
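
To make that operation count concrete, here is a second sketch (again my own addition, in plain NumPy rather than anything from the post) of the textbook triple-loop GEMM; its inner multiply-add runs M * N * K times, which for the 256 x 1,152 x 192 layer above is roughly 57 million:

```python
import numpy as np

def naive_gemm(A, B):
    """Textbook triple-loop multiply: C[i, j] = sum over k of A[i, k] * B[k, j].
    Illustrative sketch only, not how an optimized BLAS implements GEMM."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):               # this body executes M * N * K times
                C[i, j] += A[i, k] * B[k, j]
    return C

# Sanity check against NumPy on a small case.
A = np.random.rand(4, 5)
B = np.random.rand(5, 3)
assert np.allclose(naive_gemm(A, B), A @ B)

# Multiply-add count for the layer sizes quoted above.
M, K, N = 256, 1152, 192
print(f"{M * K * N:,} multiply-adds")        # 56,623,104, i.e. roughly 57 million
```

An optimized BLAS does not run this loop as written; it blocks the computation for caches and vector units, which is exactly why handing the work to GEMM pays off.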