[Cin] Autovectorization works! (resend)

Andrew Randrianasulu randrianasulu at gmail.com
Tue Feb 12 03:01:03 CET 2019


resend without attached image
------------------
At least on my machine... not sure if it will break some plugins?

from

http://gcc.gnu.org/projects/tree-ssa/vectorization.html

"To enable vectorization of floating point reductions use -ffast-math
or -fassociative-math."
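
To illustrate the quoted sentence, here is a minimal sketch of a floating point reduction in C. The function name `sum_floats` is made up for this example and is not taken from the Cinelerra sources. Without one of the flags named in the quote, GCC has to keep the additions in source order, so it normally leaves a loop like this scalar; with -ffast-math it is allowed to reorder the sum and can use packed instructions.

```c
#include <stddef.h>

/* Hypothetical example: sum an array of floats.
 * The accumulator 'total' carries a dependency from one iteration to the
 * next, which makes this a floating point reduction. Vectorizing it means
 * adding the elements in a different order, which strict IEEE semantics
 * do not permit, hence the need for -ffast-math or -fassociative-math. */
float sum_floats(const float *data, size_t count)
{
    float total = 0.0f;
    for (size_t i = 0; i < count; i++)
        total += data[i];
    return total;
}
```

Compiling with something like `gcc -O3 -ffast-math -fopt-info-vec -c sum.c` should report whether the loop was vectorized, though the exact message depends on the GCC version and target flags. In newer GCC releases -fopt-info-vec takes the place of the older -ftree-vectorizer-verbose option.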

I added

-ffast-math -ftree-vectorizer-verbose=2

to CFLAGS (globally, but I recompiled and installed only the color3way plugin)

and now we have tons of assembly!

objdump -M intel -d /dev/shm/tmp/cinelerra-goodguy-20190211/cinelerra-5.1/plugins/color3way/i686/color3way.o \
  | ./opcodes.sh -s SSE2 -s SSE3 -s SSE -s AVX -s FMA | less

---------------------

    16e:       c5 f8 77                vzeroupper
--
     427:       c5 f8 77                vzeroupper
--
     6cc:       c5 f8 77                vzeroupper
--
     751:       c5 f8 77                vzeroupper
--
     80e:       c5 fa 10 81 00 00 00    vmovss xmm0,DWORD PTR [ecx+0x0]
--
     816:       c5 fb 10 89 00 00 00    vmovsd xmm1,QWORD PTR [ecx+0x0]
--
     824:       c5 fa 10 12             vmovss xmm2,DWORD PTR [edx]
     828:       c5 ea 5c 10             vsubss xmm2,xmm2,DWORD PTR [eax]
     82c:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     830:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     834:       c5 f9 2f d1             vcomisd xmm2,xmm1
--
     83e:       c5 fa 10 52 0c          vmovss xmm2,DWORD PTR [edx+0xc]
     843:       c5 ea 5c 50 0c          vsubss xmm2,xmm2,DWORD PTR [eax+0xc]
     848:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     84c:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     850:       c5 f9 2f d1             vcomisd xmm2,xmm1
--
     85a:       c5 fa 10 52 18          vmovss xmm2,DWORD PTR [edx+0x18]
     85f:       c5 ea 5c 50 18          vsubss xmm2,xmm2,DWORD PTR [eax+0x18]
     864:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     868:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     86c:       c5 f9 2f d1             vcomisd xmm2,xmm1
--
     876:       c5 fa 10 52 24          vmovss xmm2,DWORD PTR [edx+0x24]
     87b:       c5 ea 5c 50 24          vsubss xmm2,xmm2,DWORD PTR [eax+0x24]
     880:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     884:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     888:       c5 f9 2f d1             vcomisd xmm2,xmm1
--
     892:       c5 fa 10 52 04          vmovss xmm2,DWORD PTR [edx+0x4]
     897:       c5 ea 5c 50 04          vsubss xmm2,xmm2,DWORD PTR [eax+0x4]
     89c:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     8a0:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     8a4:       c5 f9 2f d1             vcomisd xmm2,xmm1
--
     8ae:       c5 fa 10 52 10          vmovss xmm2,DWORD PTR [edx+0x10]
     8b3:       c5 ea 5c 50 10          vsubss xmm2,xmm2,DWORD PTR [eax+0x10]
     8b8:       c5 e8 54 d0             vandps xmm2,xmm2,xmm0
     8bc:       c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
     8c0:       c5 f9 2f d1             vcomisd xmm2,xmm1
--


and so on!
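
For readers who want to map the listing back to source, the repeated five-instruction groups (vmovss load, vsubss, vandps with a mask, vcvtss2sd, vcomisd) are the kind of sequence a compiler typically emits for an "absolute difference greater than a threshold" test on floats. The sketch below is only a guess at that shape: the name `differs` and the 0.001 threshold are hypothetical and not taken from the color3way sources. Note that the `ss`/`sd` suffixes mean these instructions each work on a single value, using the AVX encoding.

```c
#include <math.h>

/* Hypothetical reconstruction, not the plugin's actual code.
 * a - b             -> vsubss    (single-precision subtract)
 * fabsf(...)        -> vandps    (clear the sign bit with a mask)
 * compare to 0.001  -> vcvtss2sd (widen to double, because the literal
 *                      0.001 is a double) followed by vcomisd */
int differs(float a, float b)
{
    return fabsf(a - b) > 0.001;
}
```

Called on each pair of fields in two structures, a check like this would produce one such group per field, which would match the changing offsets (0x0, 0xc, 0x18, ...) in the listing.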

Now I get about 15 fps for the same 1280x720x30 h264 video with the color3way
plugin (one track)


