[Cin] Autovectorization works!
Andrew Randrianasulu
randrianasulu at gmail.com
Mon Feb 11 05:41:50 CET 2019
At last, on my machine! Not sure whether it will break some plugins, though.
From http://gcc.gnu.org/projects/tree-ssa/vectorization.html:
"To enable vectorization of floating point reductions use -ffast-math
or -fassociative-math."
I added
-ffast-math -ftree-vectorizer-verbose=2 to CFLAGS (globally, but I recompiled and
installed only the color3way plugin), and now we have tons of SSE/AVX instructions!
objdump -M intel -d /dev/shm/tmp/cinelerra-goodguy-20190211/cinelerra-5.1/plugins/color3way/i686/color3way.o | ./opcodes.sh -s SSE2 -s SSE3 -s SSE -s AVX -s FMA | less
---------------------
16e: c5 f8 77 vzeroupper
--
427: c5 f8 77 vzeroupper
--
6cc: c5 f8 77 vzeroupper
--
751: c5 f8 77 vzeroupper
--
80e: c5 fa 10 81 00 00 00 vmovss xmm0,DWORD PTR [ecx+0x0]
--
816: c5 fb 10 89 00 00 00 vmovsd xmm1,QWORD PTR [ecx+0x0]
--
824: c5 fa 10 12 vmovss xmm2,DWORD PTR [edx]
828: c5 ea 5c 10 vsubss xmm2,xmm2,DWORD PTR [eax]
82c: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
830: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
834: c5 f9 2f d1 vcomisd xmm2,xmm1
--
83e: c5 fa 10 52 0c vmovss xmm2,DWORD PTR [edx+0xc]
843: c5 ea 5c 50 0c vsubss xmm2,xmm2,DWORD PTR [eax+0xc]
848: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
84c: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
850: c5 f9 2f d1 vcomisd xmm2,xmm1
--
85a: c5 fa 10 52 18 vmovss xmm2,DWORD PTR [edx+0x18]
85f: c5 ea 5c 50 18 vsubss xmm2,xmm2,DWORD PTR [eax+0x18]
864: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
868: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
86c: c5 f9 2f d1 vcomisd xmm2,xmm1
--
876: c5 fa 10 52 24 vmovss xmm2,DWORD PTR [edx+0x24]
87b: c5 ea 5c 50 24 vsubss xmm2,xmm2,DWORD PTR [eax+0x24]
880: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
884: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
888: c5 f9 2f d1 vcomisd xmm2,xmm1
--
892: c5 fa 10 52 04 vmovss xmm2,DWORD PTR [edx+0x4]
897: c5 ea 5c 50 04 vsubss xmm2,xmm2,DWORD PTR [eax+0x4]
89c: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
8a0: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
8a4: c5 f9 2f d1 vcomisd xmm2,xmm1
--
8ae: c5 fa 10 52 10 vmovss xmm2,DWORD PTR [edx+0x10]
8b3: c5 ea 5c 50 10 vsubss xmm2,xmm2,DWORD PTR [eax+0x10]
8b8: c5 e8 54 d0 vandps xmm2,xmm2,xmm0
8bc: c5 ea 5a d2 vcvtss2sd xmm2,xmm2,xmm2
8c0: c5 f9 2f d1 vcomisd xmm2,xmm1
--
and so on!
Now I get about 15 fps on the same 1280x720x30 H.264 video with the color3way plugin
(one track).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cinelerra-autovec.png
Type: image/png
Size: 606890 bytes
Desc: not available
URL: <https://lists.cinelerra-gg.org/pipermail/cin/attachments/20190211/b43b21b8/attachment-0001.png>