Not sure if this article will be useful for anyone, but I was trying to understand why some color effects are ... well, slow. I saw all four of my cores (AMD FX(tm)-4300 Quad-Core Processor) in use, but the resulting framerate for a single track of 1280x720, 30 fps h264 video plus the "color 3 way" effect was just ... 7 fps?!

I found this article, "Auto-vectorization with gcc 4.7" (https://locklessinc.com/articles/vectorize/), with an interesting conclusion:

---------------
Summary

gcc is very good, and can auto-vectorize many inner loops. However, if the expressions get too complex, vectorization will fail. gcc also may not be able to get the most optimal form of the loop kernel. In general, the simpler the code, the more likely gcc is to give good results.

However, you cannot expect gcc to give what you expect without a few tweaks. You may need to add the "--fast-math" to turn on associativity. You will definitely need to tell the compiler about alignment and array-overlap considerations to get good code.

On the other hand, gcc will still attempt to vectorize code which hasn't had changes done to it at all. It just won't be able to get nearly as much of a performance improvement as you might hope.

However, as time passes, more inner loop patterns will be added to the vectorizable list. Thus if you are using later versions of gcc, don't take the above results for granted. Check the output of the compiler yourself to see if it is behaving as you might expect. You might be pleasantly surprised by what it can do.
--------

I thought CinGG as of today might not use any of those nice SIMD extensions, but right now I don't have a freshly compiled cin-gg to check the object files ... currently compiling the latest git. (32-bit build with gcc 5.5, but without -march=native or anything like it. I should try this ... with a package version bump!)
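
To make the "alignment and array-overlap" point from the summary concrete, here is a minimal sketch of the kind of inner loop gcc can usually vectorize. This is not CinGG code: the function name scale_add and its parameters are made up for illustration, and the 16-byte alignment is an assumption the caller has to guarantee. The `restrict` qualifiers tell the compiler the two arrays do not overlap, and `__builtin_assume_aligned` (a GCC builtin) passes on the alignment promise.

```c
#include <stddef.h>

/* Hypothetical per-sample gain/offset kernel, for illustration only.
 * restrict: dst and src are promised not to overlap.
 * Both pointers are ASSUMED to be 16-byte aligned by the caller. */
void scale_add(float *restrict dst, const float *restrict src,
               float gain, float offset, size_t n)
{
    /* GCC builtin: lets the vectorizer use aligned loads and stores. */
    float *d = __builtin_assume_aligned(dst, 16);
    const float *s = __builtin_assume_aligned(src, 16);

    /* Simple element-wise loop with no cross-iteration dependency. */
    for (size_t i = 0; i < n; i++)
        d[i] = s[i] * gain + offset;
}
```

Building something like this with `gcc -O3 -fopt-info-vec` should make gcc report which loops it vectorized (the -fopt-info options arrived in gcc 4.9, so gcc 5.5 has them). Note that the actual gcc option for relaxed floating-point math is spelled `-ffast-math`; it matters mostly for reductions such as sums, not for an element-wise loop like this one. On a 32-bit x86 build, SSE may not be enabled by default, so `-msse2` or `-march=native` is probably needed before any packed instructions (mulps, addps and friends) show up in `objdump -d` output.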