GEMM: The Heart of Deep Learning


GEMM is one of the routines in BLAS: it implements multiplication of large matrices, which inevitably raises questions of how the data are read and stored.
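
As a concrete illustration of such a BLAS call, here is a minimal sketch (my addition, not from the original article) using SciPy's binding to the single-precision routine scipy.linalg.blas.sgemm, which computes C = alpha * A @ B + beta * C:

```python
import numpy as np
from scipy.linalg import blas

# Illustrative sketch only; the article does not prescribe a particular library.
# GEMM in BLAS computes C = alpha * A @ B + beta * C for dense matrices.
A = np.random.rand(4, 6).astype(np.float32)
B = np.random.rand(6, 3).astype(np.float32)

C = blas.sgemm(alpha=1.0, a=A, b=B)      # single-precision GEMM (sgemm)
assert np.allclose(C, A @ B, atol=1e-4)  # matches a plain matrix product
```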

Reference blog: https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/

The time-breakdown chart in that post may surprise you: to cut a neural network's computation time, the real lever is making the convolution layers' computation more efficient.

So what is GEMM?  It stands for General Matrix to Matrix Multiplication, and it essentially does exactly what it says on the tin, multiplies two input matrices together to get an output one. The difference between it and the kind of matrix operations I was used to in the 3D graphics world is that the matrices it works on are often very big. For example, a single layer in a typical network may require the multiplication of a 256 row, 1,152 column matrix by an 1,152 row, 192 column matrix to produce a 256 row, 192 column result. Naively, that requires 57 million (256 x 1,152 x 192) floating point operations and there can be dozens of these layers in a modern architecture, so I often see networks that need several billion FLOPs to calculate a single frame. Here’s a diagram that I sketched to help me visualize how it works:

[Diagram from the referenced post: the two input matrices multiplied to produce the output matrix]
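
To make that operation count concrete, here is a second sketch (again my own addition, in plain NumPy rather than anything from the post) of the textbook triple-loop GEMM; its inner multiply-add runs M * N * K times, which for the 256 x 1,152 x 192 layer above is roughly 57 million:

```python
import numpy as np

def naive_gemm(A, B):
    """Textbook triple-loop multiply: C[i, j] = sum over k of A[i, k] * B[k, j].
    Illustrative sketch only, not how an optimized BLAS implements GEMM."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):
        for j in range(N):
            for k in range(K):               # this body executes M * N * K times
                C[i, j] += A[i, k] * B[k, j]
    return C

# Sanity check against NumPy on a small case.
A = np.random.rand(4, 5)
B = np.random.rand(5, 3)
assert np.allclose(naive_gemm(A, B), A @ B)

# Multiply-add count for the layer sizes quoted above.
M, K, N = 256, 1152, 192
print(f"{M * K * N:,} multiply-adds")        # 56,623,104, i.e. roughly 57 million
```

An optimized BLAS does not run this loop as written; it blocks the computation for caches and vector units, which is exactly why handing the work to GEMM pays off.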