C++ Optimization Notes: -O2/-O3/-ffast-math/SIMD


1. References

GCC optimization options: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Optimizing C++: https://pashminacameron.github.io/cpp/
What the gcc/g++ optimization flags -O1 -O2 -O3 -Os -Ofast -Og do: https://blog.csdn.net/liang_baikai/article/details/110137374
The floating-point optimization option -ffast-math: greatly speeds up floating-point arithmetic: https://www.cnblogs.com/sky-heaven/p/6742610.html

2. Some questions

Would it make sense to enable ffast-math for simd types? https://github.com/hsivonen/simd/issues/19
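-ffast-math is very useful for speeding up floating-point arithmetic, in particular because it makes vectorization easier. In this benchmark I saw the run time of matrix multiplication in clang drop by about 30% with -ffast-math.
As mentioned in the Rust issue, intrinsics already allow some of this, and wrapper types for f32/f64 can already be implemented. Since SIMD types are already aimed at vectorization and the wrapping/unwrapping cost is already being paid, would it make sense to enable -ffast-math for them anyway? Or, if that does not make sense in some cases, would it be useful to provide both slow and fast versions of every type for convenience?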

Pre-RFC: What’s the best way to implement -ffast-math? https://internals.rust-lang.org/t/pre-rfc-whats-the-best-way-to-implement-ffast-math/5740
gcc, simd intrinsics and fast-math concepts https://stackoverflow.com/questions/4966489/gcc-simd-intrinsics-and-fast-math-concepts
What does gcc's ffast-math actually do? https://stackoverflow.com/questions/7420665/what-does-gccs-ffast-math-actually-do?noredirect=1&lq=1
Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)? https://stackoverflow.com/questions/6430448/why-doesnt-gcc-optimize-aaaaaa-to-aaaaaa
What kind of optimizations are included in -funsafe-math-optimizations? https://stackoverflow.com/questions/28134064/what-kind-of-optimizations-are-included-in-funsafe-math-optimizations

3. Notes on Optimizing C++:

Compiler options can make quite a difference in the speed (as well as size and behaviour) of the code.
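-O2 turns on nearly all supported optimizations that do not involve a space-speed tradeoff and is the usual safe default. -O3 is also standards-conforming; the genuinely unsafe floating-point transformations only begin with -ffast-math or -Ofast.
In that page's benchmark, going from -O2 to -O3 gives very little gain in speed, but adding -ffast-math (either on its own or via -Ofast, which is -O3 plus -ffast-math) does improve the speed noticeably.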
However, this comes at a cost.
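A strict left-to-right float reduction is the smallest case where -ffast-math matters for vectorization. The sketch below is illustrative only (the function names are hypothetical, not from the original page): compiling it with `g++ -O3 -fopt-info-vec` and again with `g++ -O3 -ffast-math -fopt-info-vec` typically shows whether GCC reports the first loop as vectorized. The second function reorders the sum by hand into four partial sums, which is the kind of rewrite -ffast-math lets the compiler do on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain float reduction. Under strict IEEE semantics the additions must happen
// in exactly this order, which forms one long dependency chain and usually
// prevents vectorization. -ffast-math (via -fassociative-math) allows the
// compiler to split it into several partial sums held in SIMD registers.
float sum_floats(const std::vector<float>& v) {
    float total = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// The same idea done by hand: four independent partial sums. This deliberately
// changes the summation order (and possibly the rounding) without any unsafe
// compiler flag, and gives the optimizer independent chains to work with.
float sum_floats_4way(const std::vector<float>& v) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        acc[0] += v[i];
        acc[1] += v[i + 1];
        acc[2] += v[i + 2];
        acc[3] += v[i + 3];
    }
    for (; i < v.size(); ++i) acc[0] += v[i];  // leftover tail elements
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

For integer-valued inputs both functions return the same value; for general inputs the 4-way version may differ in the last bits, which is exactly the trade-off -ffast-math makes implicitly.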

-ffast-math essentially turns on unsafe math optimizations, and its effects can propagate to code that links against yours in the future, for example through startup code that changes the floating-point control word for the whole process (see the note on -ffast-math in that page's references).
While this option does make your code faster, it is very important that you understand the implications of turning it on and, if possible, mitigate them.
A safer option that gives similar performance is to use either function-specific optimization (see the Selective optimizations section of that page) or to write some intrinsics or assembly that optimize just the bottlenecks, rather than letting the compiler wreak havoc on all of your code (and your downstream dependencies).
The benchmark graph on that page shows the AVX code outperforming the -ffast-math build while also being safer.
This is definitely a case in which the effort of writing SIMD intrinsics is worth it.
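A minimal sketch of the targeted-intrinsics approach, assuming GCC or Clang on x86-64: only the hot function is compiled for AVX via `__attribute__((target("avx")))`, and `__builtin_cpu_supports("avx")` selects it at run time, with a plain scalar fallback on other CPUs and compilers. The function names are hypothetical. The AVX version also sums in a different order than the scalar loop (eight lane-wise partial sums), so general inputs can differ in the last bits; the point is that this reordering is confined to one function you chose instead of being applied program-wide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define DOT_HAVE_AVX_DISPATCH 1
#endif

// Portable scalar version: strict left-to-right order, no special flags.
float dot_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#ifdef DOT_HAVE_AVX_DISPATCH
// Only this function is compiled for AVX; the rest of the program keeps the
// default target and the default (strict) floating-point semantics.
__attribute__((target("avx")))
float dot_avx(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();  // eight partial sums
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // unaligned loads
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    // Reduce the eight lanes, then handle the leftover tail.
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, acc);
    float sum = 0.0f;
    for (int k = 0; k < 8; ++k) sum += lanes[k];
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
#endif

// Runtime dispatch: use the AVX path only if the CPU reports AVX support.
float dot(const float* a, const float* b, std::size_t n) {
#ifdef DOT_HAVE_AVX_DISPATCH
    if (__builtin_cpu_supports("avx")) return dot_avx(a, b, n);
#endif
    return dot_scalar(a, b, n);
}

// Convenience overload; both vectors must have the same length.
float dot(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    return dot(a.data(), b.data(), a.size());
}
```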

4. How the GCC optimization options relate

-Ofast = -O3 + -ffast-math + -fallow-store-data-races

-ffast-math
Sets the options -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans, -fcx-limited-range and -fexcess-precision=fast
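Whether these flags are active in a given translation unit can be checked at compile time: GCC and Clang define `__FAST_MATH__` under -ffast-math and set `__FINITE_MATH_ONLY__` to 1 under -ffinite-math-only. The sketch below (the `MathFlags` helper is hypothetical) only reports what the preprocessor sees, which can help when a build system mixes flags across files.

```cpp
#include <cassert>

// What the preprocessor reports about this translation unit's flags.
struct MathFlags {
    bool optimize;          // __OPTIMIZE__: any optimization level other than -O0
    bool fast_math;         // __FAST_MATH__: -ffast-math or -Ofast
    bool finite_math_only;  // __FINITE_MATH_ONLY__ == 1: -ffinite-math-only
};

MathFlags compiled_math_flags() {
    MathFlags f{false, false, false};
#ifdef __OPTIMIZE__
    f.optimize = true;
#endif
#ifdef __FAST_MATH__
    f.fast_math = true;
#endif
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    f.finite_math_only = true;
#endif
    return f;
}
```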

-funsafe-math-optimizations
Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
Of these, -fassociative-math only takes effect when both -fno-signed-zeros and -fno-trapping-math are also in effect.
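-freciprocal-math, the last option in that list, allows `x / y` to be replaced by `x * (1.0 / y)`. The two are not equivalent in IEEE double arithmetic; a classic case is 49: `49.0 / 49.0` is exactly 1.0, while `49.0 * (1.0 / 49.0)` rounds to the largest double below 1.0. The sketch writes both forms out explicitly so the difference is visible without any special flags (the function names are illustrative).

```cpp
#include <cassert>

// True division: the exact quotient is rounded once.
double divide(double x, double y) { return x / y; }

// What -freciprocal-math permits instead: the reciprocal is rounded, then the
// product is rounded again, so the result can differ from x / y.
double multiply_by_reciprocal(double x, double y) { return x * (1.0 / y); }
```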

Explanation
-funsafe-math-optimizations
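Allows optimizations for floating-point arithmetic that
(a) assume that arguments and results are valid, and
(b) may violate IEEE or ANSI standards.
When used at link time, it may include libraries or startup files that change the default FPU control word or perform other similar optimizations.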
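One way to observe the "change the default FPU control word" effect is to test whether subnormal (denormal) results survive at run time. This is a sketch under assumptions: it relies on the fact that GCC on x86 has historically linked `crtfastmath.o`, which enables flush-to-zero mode, when -ffast-math is passed at link time; the function name is hypothetical. In a normally built program it should return true; if fast-math startup code has switched on flush-to-zero, it returns false.

```cpp
#include <cassert>
#include <limits>

// Returns true if subnormal (denormal) float results survive at run time.
// If flush-to-zero mode is enabled in the floating-point control register
// (for example by fast-math startup code), min() / 2 becomes 0.0f instead.
bool subnormals_preserved() {
    // volatile keeps the compiler from folding the division at compile time,
    // where the run-time FPU mode would not be consulted.
    volatile float smallest_normal = std::numeric_limits<float>::min();
    volatile float half = smallest_normal / 2.0f;  // subnormal under IEEE rules
    return half != 0.0f;
}
```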

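It in turn enables several options: -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math.
The options that affected run time in my tests were:
-fno-signed-zeros, -fno-trapping-math, -fassociative-math
Testing showed that all three are required, mainly because -fassociative-math depends on -fno-signed-zeros and -fno-trapping-math.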
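The reason -fassociative-math can change results at all is that floating-point addition is not associative. A minimal illustration, assuming default IEEE double semantics (the function names are hypothetical): with a = 1e20, b = -1e20, c = 1.0, grouping as (a + b) + c gives 1.0, while a + (b + c) gives 0.0, because 1.0 is far smaller than the spacing between doubles near 1e20 and is lost when added to b.

```cpp
#include <cassert>

// Same three operands, two groupings. Under strict IEEE rules the compiler must
// keep the parentheses as written; -fassociative-math lets it regroup them.
double sum_left(double a, double b, double c) { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```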
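Explanation: -fassociative-math
Allows re-association of operands in a series of floating-point operations. This violates the ISO C and C++ language standards by possibly changing the computation result. Re-ordering may change the sign of zero (IEEE arithmetic specifies distinct behavior for +0.0 and -0.0, which prohibits simplifying expressions such as x+0.0 or 0.0*x), ignore NaNs, and inhibit or create underflow or overflow, so it cannot be used on code that relies on rounding behavior such as (x + 2**52) - 2**52. It may also reorder floating-point comparisons, so it should not be used when ordered comparisons are required.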
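The rounding idiom (x + 2^52) - 2^52 and the signed-zero rule can both be written out directly. This sketch assumes IEEE double arithmetic in the default round-to-nearest mode without excess precision (for example SSE2 on x86-64, not legacy x87); the function names are illustrative. For 0 <= x < 2^52, adding 2^52 moves x into a range where doubles are exactly 1.0 apart, so it gets rounded to an integer; -fassociative-math allows the compiler to simplify the expression back to plain x.

```cpp
#include <cassert>
#include <cmath>

// Rounds 0 <= x < 2^52 to the nearest integer (ties to even) using the FPU's
// rounding. -fassociative-math may fold (x + C) - C into x, breaking this.
double round_via_magic(double x) {
    const double two52 = 4503599627370496.0;  // 2^52
    return (x + two52) - two52;
}

// Why x + 0.0 cannot be folded to x under strict IEEE rules:
// (-0.0) + 0.0 is +0.0, so the sign of zero would change.
bool adding_zero_keeps_sign(double x) {
    return std::signbit(x + 0.0) == std::signbit(x);
}
```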

The official documentation says of -fno-trapping-math: This option should never be turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions.
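A small sketch of the IEEE status flags that the default -ftrapping-math keeps meaningful, using the standard `<cfenv>` API. Strictly, ISO C expects `#pragma STDC FENV_ACCESS ON` for reliable flag access; GCC does not implement that pragma, so the sketch uses `volatile` to keep the division at run time. The function name is hypothetical, and the sketch only shows that such flags are observable; it does not demonstrate how aggressively a particular compiler drops or moves flag-raising operations under -fno-trapping-math.

```cpp
#include <cassert>
#include <cfenv>

// Returns true if dividing by zero at run time raised the FE_DIVBYZERO flag.
bool division_by_zero_sets_flag() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double zero = 0.0;           // volatile: the division happens at run time
    volatile double result = 1.0 / zero;  // IEEE result is +inf and raises FE_DIVBYZERO
    (void)result;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```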

