[Cin] gcc's autovectorization
randrianasulu at gmail.com
Sun Feb 10 22:40:56 CET 2019
Not sure if this article will be useful for anyone, but I was trying to
understand why some color effects are.. well, slow. I saw all four of my cores
(AMD FX(tm)-4300 Quad-Core Processor) in use, but the resulting framerate for a
single track of 1280x720, 30 fps h264 video + the "color 3 way" effect was just.. 7 fps?!
I found this article,
"Auto-vectorization with gcc 4.7",
with an interesting conclusion:
gcc is very good, and can auto-vectorize many inner loops. However, if the
expressions get too complex, vectorization will fail. gcc also may not be able
to get the most optimal form of the loop kernel. In general, the simpler the
code, the more likely gcc is to give good results.
However, you cannot expect gcc to give what you expect without a few tweaks. You
may need to add the "-ffast-math" to turn on associativity. You will definitely
need to tell the compiler about alignment and array-overlap considerations to
get good code.
On the other hand, gcc will still attempt to vectorize code which hasn't had
changes done to it at all. It just won't be able to get nearly as much of a
performance improvement as you might hope.
However, as time passes, more inner loop patterns will be added to the
vectorizable list. Thus if you are using later versions of gcc, don't take the
above results for granted. Check the output of the compiler yourself to see if
it is behaving as you might expect. You might be pleasantly surprised by what
it can do.
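
To make the quoted advice a bit more concrete, here is a minimal sketch in C of the two tweaks it mentions (alignment and array-overlap hints) plus a reduction loop of the kind that usually needs "-ffast-math". The function names scale_add and sum are made up for illustration and are not taken from the article or from CinGG; the 16-byte alignment is an assumption the caller has to guarantee.

```c
/* Possible compile line (newer gcc releases accept -fopt-info-vec, which
 * reports the loops that were vectorized):
 *   gcc -O3 -march=native -fopt-info-vec -c example.c
 * On a 32-bit x86 build, SSE is not enabled by default, so something like
 * -msse2 or -march=native is needed before any SIMD code can be emitted.
 */
#include <stddef.h>

/* Hypothetical helper: dst[i] = dst[i] * gain + src[i].
 * "restrict" promises the compiler that dst and src do not overlap. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, size_t n)
{
    /* Assumption: the caller passes 16-byte aligned buffers.
     * __builtin_assume_aligned is a gcc builtin, not standard C. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    for (size_t i = 0; i < n; i++)
        d[i] = d[i] * gain + s[i];
}

/* Hypothetical helper: plain float sum. Vectorizing this loop reorders the
 * additions, so gcc normally does it only when floating-point associativity
 * is allowed, for example with -ffast-math. */
float sum(const float *restrict src, size_t n)
{
    float acc = 0.0f;

    for (size_t i = 0; i < n; i++)
        acc += src[i];
    return acc;
}
```

Whether gcc really vectorizes these loops depends on the gcc version and flags, so the -fopt-info-vec report (or a look at the disassembly) is the thing to trust, not this sketch.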
I thought CinGG as of today might not use any of those nice SIMD extensions, but
right now I don't have a freshly compiled cin-gg for checking the object
files.. currently compiling latest git. (32-bit build with gcc 5.5, but
without -march=native or something.) I should try this ...with package version