resend without attached image
------------------

At last, for my machine (not sure if it will break some plugins?)

From http://gcc.gnu.org/projects/tree-ssa/vectorization.html :

"To enable vectorization of floating point reductions use -ffast-math or -fassociative-math."

I added -ffast-math -ftree-vectorizer-verbose=2 to CFLAGS (globally, but I recompiled and installed only the color3way plugin) and now we have tons of assembly!

objdump -M intel -d /dev/shm/tmp/cinelerra-goodguy-20190211/cinelerra-5.1/plugins/color3way/i686/color3way.o | ./opcodes.sh -s SSE2 -s SSE3 -s SSE -s AVX -s FMA | less

---------------------
 16e: c5 f8 77                vzeroupper
--
 427: c5 f8 77                vzeroupper
--
 6cc: c5 f8 77                vzeroupper
--
 751: c5 f8 77                vzeroupper
--
 80e: c5 fa 10 81 00 00 00    vmovss xmm0,DWORD PTR [ecx+0x0]
--
 816: c5 fb 10 89 00 00 00    vmovsd xmm1,QWORD PTR [ecx+0x0]
--
 824: c5 fa 10 12             vmovss xmm2,DWORD PTR [edx]
 828: c5 ea 5c 10             vsubss xmm2,xmm2,DWORD PTR [eax]
 82c: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 830: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 834: c5 f9 2f d1             vcomisd xmm2,xmm1
--
 83e: c5 fa 10 52 0c          vmovss xmm2,DWORD PTR [edx+0xc]
 843: c5 ea 5c 50 0c          vsubss xmm2,xmm2,DWORD PTR [eax+0xc]
 848: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 84c: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 850: c5 f9 2f d1             vcomisd xmm2,xmm1
--
 85a: c5 fa 10 52 18          vmovss xmm2,DWORD PTR [edx+0x18]
 85f: c5 ea 5c 50 18          vsubss xmm2,xmm2,DWORD PTR [eax+0x18]
 864: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 868: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 86c: c5 f9 2f d1             vcomisd xmm2,xmm1
--
 876: c5 fa 10 52 24          vmovss xmm2,DWORD PTR [edx+0x24]
 87b: c5 ea 5c 50 24          vsubss xmm2,xmm2,DWORD PTR [eax+0x24]
 880: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 884: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 888: c5 f9 2f d1             vcomisd xmm2,xmm1
--
 892: c5 fa 10 52 04          vmovss xmm2,DWORD PTR [edx+0x4]
 897: c5 ea 5c 50 04          vsubss xmm2,xmm2,DWORD PTR [eax+0x4]
 89c: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 8a0: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 8a4: c5 f9 2f d1             vcomisd xmm2,xmm1
--
 8ae: c5 fa 10 52 10          vmovss xmm2,DWORD PTR [edx+0x10]
 8b3: c5 ea 5c 50 10          vsubss xmm2,xmm2,DWORD PTR [eax+0x10]
 8b8: c5 e8 54 d0             vandps xmm2,xmm2,xmm0
 8bc: c5 ea 5a d2             vcvtss2sd xmm2,xmm2,xmm2
 8c0: c5 f9 2f d1             vcomisd xmm2,xmm1
--

and so on!

Now I get about 15 fps for the same 1280x720x30 h264 video + color3way plugin (one track).
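For reference, a minimal sketch of the kind of loop the quoted GCC note is about: a floating point reduction, where every iteration adds into one accumulator. This is NOT taken from the color3way source; the function name sum_abs_diff and its arguments are made up for illustration. Without -ffast-math or -fassociative-math the compiler has to keep the additions in source order, so it normally cannot split the sum across vector lanes.

```c
#include <stddef.h>

/* Hypothetical example, not from the color3way plugin.
 * Sums |a[i] - b[i]| into a single float accumulator.
 * The "acc +=" chain is a floating point reduction: vectorizing it
 * means reordering the additions, which GCC only does when
 * -ffast-math or -fassociative-math allows it. */
float sum_abs_diff(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        acc += (d < 0.0f) ? -d : d;   /* absolute difference */
    }
    return acc;
}
```

One way to check it, assuming a reasonably recent GCC: build once with "-O3 -c" and once with "-O3 -ffast-math -c", then compare the objdump output of the two object files. On newer GCC versions -fopt-info-vec reports which loops were vectorized. Since the additions may be reordered, the result can differ in the last bits between the two builds.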
participants (1)
-
Andrew Randrianasulu