[Cin] gcc's autovectorization

Andrew Randrianasulu randrianasulu at gmail.com
Sun Feb 10 22:40:56 CET 2019


Not sure if this article will be useful for anyone, but I was trying to 
understand why some color effects are, well, slow. I saw all four of my cores 
(AMD FX(tm)-4300 Quad-Core Processor) in use, but the resulting framerate for a 
single track of 1280x720, 30 fps h264 video with the "color 3 way" effect was 
just 7 fps?!

I found this article:
https://locklessinc.com/articles/vectorize/
"Auto-vectorization with gcc 4.7"

It ends with an interesting conclusion:

---------------
Summary
gcc is very good, and can auto-vectorize many inner loops. However, if the 
expressions get too complex, vectorization will fail. gcc also may not be able 
to get the most optimal form of the loop kernel. In general, the simpler the 
code, the more likely gcc is to give good results.
However, you cannot expect gcc to give what you expect without a few tweaks. You 
may need to add the "-ffast-math" to turn on associativity. You will definitely 
need to tell the compiler about alignment and array-overlap considerations to 
get good code.
On the other hand, gcc will still attempt to vectorize code which hasn't had 
changes done to it at all. It just won't be able to get nearly as much of a 
performance improvement as you might hope.
However, as time passes, more inner loop patterns will be added to the 
vectorizable list. Thus if you are using later versions of gcc, don't take the 
above results for granted. Check the output of the compiler yourself to see if 
it is behaving as you might expect. You might be pleasantly surprised by what 
it can do.

--------
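
To make the "alignment and array-overlap" point from the summary concrete, here 
is a minimal sketch of the kind of simple inner loop gcc can usually 
vectorize. The function name scale_pixels and the per-sample gain operation are 
made up for illustration and are not taken from the CinGG sources. It assumes 
the caller really passes 16-byte aligned, non-overlapping buffers; 
__builtin_assume_aligned is a GCC extension.

```c
#include <stddef.h>

/* Hypothetical per-sample gain loop, NOT CinGG code.
 * "restrict" promises the compiler that dst and src never overlap.
 * __builtin_assume_aligned promises 16-byte alignment; passing a
 * misaligned pointer would break that promise (undefined behaviour). */
void scale_pixels(float *restrict dst, const float *restrict src,
                  float gain, size_t n)
{
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    /* One multiply per element, no branches, no cross-iteration
     * dependency: the simple shape the article recommends. */
    for (size_t i = 0; i < n; i++)
        d[i] = s[i] * gain;
}
```

Compiling with something like "gcc -O3 -fopt-info-vec -c file.c" should make 
gcc report which loops it vectorized, so the result can be checked rather than 
assumed.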


I thought CinGG as of today might not use any of those nice SIMD extensions, 
but right now I don't have a freshly compiled cin-gg whose object files I could 
check. I am currently compiling the latest git (a 32-bit build with gcc 5.5, 
but without -march=native or anything similar. I should try this, with a 
package version bump!)
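
One quick way to see what a given set of compiler flags allows, before digging 
through object files, is to look at the macros gcc predefines for each enabled 
x86 SIMD extension. The sketch below is only an illustration (the function name 
simd_level is made up). As far as I know, a 32-bit x86 gcc often has no SSE 
enabled by default unless it was configured that way or flags such as -msse2 or 
-march=native are given.

```c
/* Returns the name of the widest x86 SIMD extension that the compiler
 * was allowed to use for THIS translation unit. It reflects the compile
 * flags only, not what the CPU running the program supports. */
const char *simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```

Calling this from a small main() and printing the result, once with the normal 
build flags and once with -march=native added, shows whether the flags make a 
difference. For the object files themselves, "objdump -d" and a search for 
packed instructions such as mulps or addps is a rough check. xmm registers 
alone prove little, because scalar SSE math uses them too.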


