Highly recommended: https://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX
1. Check which instruction sets your CPU supports:
Look it up directly on Intel's official product database (Intel ARK):
https://ark.intel.com/content/www/cn/zh/ark.html#@Processors
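For example, search for your processor there: its specification page lists the supported instruction set extensions (such as AVX2).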
2. A test example:
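```c
#include <immintrin.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    // _mm256_set_epi64x lists elements from the highest (3) down to element 0
    __m256i first = _mm256_set_epi64x(10, 20, 30, 40);
    __m256i second = _mm256_set_epi64x(5, 5, 5, 5);
    __m256i result = _mm256_add_epi64(first, second);  // lane-wise 64-bit add (AVX2)

    // Copy the vector into a plain array instead of casting &result to long int*:
    // long is only 32 bits on Windows, so long long holds the 64-bit lanes
    long long values[4];
    _mm256_storeu_si256((__m256i*)values, result);
    printf("==%zu \n", sizeof(long long));  // sizeof yields size_t, printed with %zu
    for (int i = 0; i < 4; i++) {
        printf("%lld ", values[i]);  // prints 45 35 25 15
    }
    printf("\n");
    return 0;
}
```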
For the meaning and usage of intrinsics such as _mm256_set_epi64x() and _mm256_add_epi64(), see the Intel Intrinsics Guide:
https://software.intel.com/sites/landingpage/IntrinsicsGuide
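Note: after you tick instruction sets in the left sidebar, the results on the right are not always accurate. For example, SSE's addss instruction is emitted as the VEX-encoded vaddss when compiling for AVX, but vaddss only shows up in the search once AVX-512 is ticked.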
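Compile commands:

```shell
gcc -mavx2 -S -fverbose-asm fun.c   # write annotated assembly to fun.s to inspect the generated instructions
gcc -mavx2 fun.c                    # build the executable
```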
One more example, this time adding eight single-precision floats:
```c
#include <immintrin.h>
#include <stdio.h>

float aa[] = {10, 20, 30, 40, 50, 60, 70, 80};
float bb[] = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
float cc[8];  // output buffer, filled by _mm256_storeu_ps

int main(int argc, char* argv[]) {
    __m256 first = _mm256_loadu_ps(aa);   // unaligned load of 8 floats
    __m256 second = _mm256_loadu_ps(bb);
    __m256 result = _mm256_add_ps(first, second);
    _mm256_storeu_ps(cc, result);         // unaligned store back to memory
    printf("==%zu \n", sizeof(float));    // sizeof yields size_t, printed with %zu
    for (int i = 0; i < 8; i++) {
        printf("%f\n", cc[i]);            // 10.5, 20.5, ..., 80.5
    }
    return 0;
}
```
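Troubleshooting:
AVX vector return without AVX enabled changes the ABI: a function returns a 256-bit vector type (such as __m256i) but the file was compiled without AVX enabled. Add -mavx2 (or at least -mavx).
inlining failed in call to always_inline 'xxx': target specific option mismatch: the intrinsic 'xxx' needs an instruction set that is not enabled for the compiler target, usually because -mavx2 was not passed. Add the flag, and also check that your CPU supports AVX2; otherwise the program will crash with an illegal instruction at run time.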
References:
https://zhuanlan.zhihu.com/p/94649418
https://www.codeproject.com/Articles/874396/Crunching-Numbers-with-AVX-and-AVX
https://software.intel.com/sites/landingpage/IntrinsicsGuide