[Cin] Autovectorization works!

Andrew Randrianasulu randrianasulu at gmail.com
Mon Feb 11 05:41:50 CET 2019

At least on my machine... not sure if it will break some plugins?



" To enable vectorization of floating point reductions use -ffast-math 
or -fassociative-math."
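
For illustration, a minimal sketch of the kind of loop that message is about. The function name sum_floats is hypothetical and is not taken from the Cinelerra sources. Each addition depends on the previous one, so without permission to reassociate floating-point additions the compiler has to keep them in order and cannot split the sum into SIMD partial sums:

```c
#include <stddef.h>

/* Hypothetical example, not Cinelerra code.
 * Floating-point reduction: every iteration adds into the same accumulator.
 * Vectorizing it means changing the order of the additions, which GCC only
 * does when reassociation is allowed (for example with -ffast-math). */
float sum_floats(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}
```

Because the order of the additions changes, the vectorized result can differ slightly from the strictly ordered one for general inputs, which is why the compiler asks for an explicit flag.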

I added -ffast-math -ftree-vectorizer-verbose=2 to CFLAGS (globally, but I
recompiled and installed only the color3way plugin)

and now we have tons of assembly!

objdump -M intel -d /dev/shm/tmp/cinelerra-goodguy-20190211/cinelerra-5.1/plugins/color3way/i686/color3way.o | ./opcodes.sh -s SSE2 -s SSE3 -s SSE -s AVX -s FMA | less


     16e:       c5 f8 77                vzeroupper
     427:       c5 f8 77                vzeroupper
     6cc:       c5 f8 77                vzeroupper
     751:       c5 f8 77                vzeroupper
     80e:       c5 fa 10 81 00 00 00    vmovss xmm0,DWORD PTR [ecx+0x0]
     816:       c5 fb 10 89 00 00 00    vmovsd xmm1,QWORD PTR [ecx+0x0]
     824:       c5 fa 10 12             vmovss xmm2,DWORD PTR [edx]
     828:       c5 ea 5c 10             vsubss xmm2,xmm2,DWORD PTR [eax]
     82c:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     830:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     834:       c5 f9 2f d1             vcomisd xmm2,xmm1
     83e:       c5 fa 10 52 0c          vmovss xmm2,DWORD PTR [edx+0xc]
     843:       c5 ea 5c 50 0c          vsubss xmm2,xmm2,DWORD PTR [eax+0xc]
     848:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     84c:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     850:       c5 f9 2f d1             vcomisd xmm2,xmm1
     85a:       c5 fa 10 52 18          vmovss xmm2,DWORD PTR [edx+0x18]
     85f:       c5 ea 5c 50 18          vsubss xmm2,xmm2,DWORD PTR [eax+0x18]
     864:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     868:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     86c:       c5 f9 2f d1             vcomisd xmm2,xmm1
     876:       c5 fa 10 52 24          vmovss xmm2,DWORD PTR [edx+0x24]
     87b:       c5 ea 5c 50 24          vsubss xmm2,xmm2,DWORD PTR [eax+0x24]
     880:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     884:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     888:       c5 f9 2f d1             vcomisd xmm2,xmm1
     892:       c5 fa 10 52 04          vmovss xmm2,DWORD PTR [edx+0x4]
     897:       c5 ea 5c 50 04          vsubss xmm2,xmm2,DWORD PTR [eax+0x4]
     89c:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     8a0:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     8a4:       c5 f9 2f d1             vcomisd xmm2,xmm1
     8ae:       c5 fa 10 52 10          vmovss xmm2,DWORD PTR [edx+0x10]
     8b3:       c5 ea 5c 50 10          vsubss xmm2,xmm2,DWORD PTR [eax+0x10]
     8b8:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     8bc:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     8c0:       c5 f9 2f d1             vcomisd xmm2,xmm1

and so on!
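
As a reading aid, here is a hedged C sketch of source code that could compile to the repeating vsubss / vandps / vcvtss2sd / vcomisd group above. This is a guess from the disassembly, not the actual color3way source: the struct layout, the names triple, absf and differs, and the eps parameter are all hypothetical. The 0xc stride between loads suggests 12-byte records such as three packed floats. Note also that the ss and sd suffixes denote scalar single and double operations; packed SIMD forms end in ps or pd.

```c
/* Hypothetical record: three packed floats, 12 bytes (matches the 0xc stride). */
struct triple { float a, b, c; };

/* Absolute value; a compiler may lower this to an AND with a sign mask
 * (the vandps in the listing). */
static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Returns 1 if the first field of any of the n records differs by more
 * than eps, else 0. The float difference is widened to double for the
 * comparison (compare vcvtss2sd followed by vcomisd). */
int differs(const struct triple *x, const struct triple *y, int n, double eps)
{
    for (int i = 0; i < n; i++)
        if ((double)absf(x[i].a - y[i].a) > eps)
            return 1;
    return 0;
}
```

With a small fixed n, the compiler can unroll the loop completely, which would give one subtract / mask / convert / compare group per element, as in the listing.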

Now I get about 15 fps for the same 1280x720x30 h264 video + color3way plugin (one

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cinelerra-autovec.png
Type: image/png
Size: 606890 bytes
Desc: not available
URL: <https://lists.cinelerra-gg.org/pipermail/cin/attachments/20190211/b43b21b8/attachment-0001.png>
