..after installing python3 (the latest Cin-GG uses it for some script during the build process). The default optimizations ("-O2 -fPIC") gave 22 fps playback with the color3way effect enabled over 1280x720 video (RGBA-8 color format in the project, the default). Raising the optimizations to "-O3 -ffast-math -march=native -mtune=native -fPIC" gave 30 fps playback! Switching video output to OpenGL gave lower fps. At most, 64-bit Cin-GG can play two 1280x720 h264 MOV videos stacked one on top of the other, one with the color3way effect and the other 50% transparent: 22 fps :} I'm still not sure whether the resulting Cin-GG build is actually correct (mathematically speaking). Could you compile a 'normal' and an 'optimized' version, swap some plugins between them, even one by one, and see whether they still work correctly?
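For a sense of what these flags act on, here is a hypothetical color-gain loop over an RGBA-8 frame, roughly the shape of per-pixel work an effect does. It is not taken from Cinelerra (color3way is more complex), and `apply_rgb_gain` and `clamp_to_u8` are made-up names. Whether GCC actually vectorizes this loop at -O3 depends on the compiler version and target flags; adding -fopt-info-vec makes GCC report the loops it did vectorize.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp to 0..255 BEFORE converting, because
   converting an out-of-range float to uint8_t is undefined behavior.
   Gains are assumed to be finite. */
static uint8_t clamp_to_u8(float v)
{
    if (v < 0.0f)
        v = 0.0f;
    if (v > 255.0f)
        v = 255.0f;
    return (uint8_t)v; /* truncates; the caller adds 0.5f to round */
}

/* Hypothetical per-pixel effect, NOT Cinelerra's color3way code:
   multiplies R, G and B of an RGBA-8 frame by per-channel gains,
   rounds to nearest, clamps to 0..255 and leaves alpha untouched. */
void apply_rgb_gain(uint8_t *rgba, size_t pixels,
                    float gain_r, float gain_g, float gain_b)
{
    assert(rgba != NULL || pixels == 0);
    /* A simple loop with no opaque calls: the kind of loop GCC's
       auto-vectorizer (enabled at -O3) may turn into SIMD code. */
    for (size_t i = 0; i < pixels; i++) {
        uint8_t *p = rgba + 4 * i;
        p[0] = clamp_to_u8(p[0] * gain_r + 0.5f);
        p[1] = clamp_to_u8(p[1] * gain_g + 0.5f);
        p[2] = clamp_to_u8(p[2] * gain_b + 0.5f);
        /* p[3] is alpha and is not modified */
    }
}
```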
Thank you, Adrian. You have shared a lot of news, all of it interesting. I am not competent in compiling and coding, but IMO it seems we have almost reached the intrinsic limits the Lumiera group spoke of: https://www.lumiera.org/project/background/history/CinelerraWoes.html

CFLAGS: the Arch wiki advises against the -O3 option because it can bring instability, and at times it is slower than -O2. But your results aren't bad. Can I apply these CFLAGS options on my Arch? Are they general, or do they only apply to Slackware? Can I try vectorization with -ffast-math, or would you advise someone as inexperienced as me against it?
In reply to Andrea paz (Tuesday 12 February 2019 12:17:13):
CFLAGS are compiler options, so they are distribution-agnostic. Of course, try them on Arch, but name the package something like -optimized, so that even after some time it is clear exactly what you have installed. Do your usual work, including all those effects and, most importantly, final rendering (I haven't tested this step!). If after some use the results are visually (or numerically, if you care about it enough) identical to the standard build but faster, and Cin doesn't crash on you more often or in unusual ways, you can keep those flags. You can also try enabling ONLY auto-vectorization on top of the -O2 level of generic optimizations.
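To confirm that a given set of CFLAGS really reaches the compiler, and to tell builds apart later, a tiny test file compiled with the same flags can report what the compiler saw. A minimal sketch relying on the macros GCC predefines (`__OPTIMIZE__`, `__FAST_MATH__`, `__AVX2__`; Clang defines them too); `describe_build` is a hypothetical helper, not something Cinelerra provides. For the "auto-vectorization only" variant, GCC's flag is -ftree-vectorize, so "-O2 -ftree-vectorize" should report optimize=1 fast-math=0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a one-line summary of how THIS file was
   compiled, using macros that GCC (and Clang) predefine:
     __OPTIMIZE__  - optimization is enabled (not -O0)
     __FAST_MATH__ - -ffast-math is in effect
     __AVX2__      - AVX2 instructions are allowed, e.g. by -march=native
                     on a CPU that has AVX2 */
void describe_build(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
#ifdef __OPTIMIZE__
    int optimize = 1;
#else
    int optimize = 0;
#endif
#ifdef __FAST_MATH__
    int fast_math = 1;
#else
    int fast_math = 0;
#endif
#ifdef __AVX2__
    int avx2 = 1;
#else
    int avx2 = 0;
#endif
    /* snprintf truncates if needed and always NUL-terminates when len > 0 */
    snprintf(buf, len, "optimize=%d fast-math=%d avx2=%d",
             optimize, fast_math, avx2);
}
```

Calling it from a small main() built once with the standard flags and once with the optimized ones should print two different summaries if the flags took effect.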
Update: actually, I was pointed to this article: https://blogs.msdn.microsoft.com/vcblog/2015/10/19/do-you-prefer-fast-or-pre... It is all about the Microsoft compiler, but the underlying problem should be the same for all compilers. It even says that _sometimes_ auto-vectorization can produce more correct results!

--------------------quote---------
Counter Example
The explanations so far would lead you to expect that /fp:fast will sometimes (maybe always?) produce a result that is less accurate than /fp:precise. As a simple example, let’s consider the sum of the first million reciprocals, or Sum(1/n) for n = 1..1000000. I calculated the approximate result using floats, and the correct result using Boost’s cpp_dec_float (to a precision of 100 decimal digits). With /O2 level of optimization, the results are:

float /fp:precise    14.3574
float /fp:fast       14.3929
cpp_dec_float<100>   14.39272672286

So the /fp:fast result is nearer the correct answer than the /fp:precise! How can this be? With /fp:fast the auto-vectorizer emits the SIMD RCPPS machine instruction, which is both faster and more accurate than the DIVSS emitted for /fp:precise. This is just one specific case. But the point is that even a complete error analysis won’t tell you whether /fp:fast is acceptable in your App – there’s more going on. The only way to be sure is to test your App under each regime and compare answers.
----------------------quote end--------
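The quoted experiment can be repeated with GCC. A minimal sketch: it uses a plain double-precision sum as the reference instead of Boost's cpp_dec_float, and `harmonic_float` / `harmonic_double` are made-up names. Build it once with the default flags and once with -O3 -ffast-math, then compare `harmonic_float(1000000)` with `harmonic_double(1000000)`. The article's numbers come from MSVC, so GCC's -ffast-math result may differ from theirs.

```c
#include <assert.h>

/* Sum(1/n) for n = 1..count in single precision, strictly left to right,
   as in the quoted float experiment. Without -ffast-math GCC must keep
   this order of additions; -ffast-math (through -fassociative-math) allows
   reordering them, e.g. to vectorize the loop, which can change the result. */
float harmonic_float(int count)
{
    assert(count >= 0);
    float sum = 0.0f;
    for (int n = 1; n <= count; n++)
        sum += 1.0f / (float)n;
    return sum;
}

/* The same sum in double precision: many more correct digits than the
   float version, so it serves as a practical reference here. */
double harmonic_double(int count)
{
    assert(count >= 0);
    double sum = 0.0;
    for (int n = 1; n <= count; n++)
        sum += 1.0 / (double)n;
    return sum;
}
```

In a default (non-fast-math) build the float sum lands visibly below the double reference of about 14.3927, because near the end each term is comparable to the rounding step of a float around 14.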
I did some quick tests (playback with "play every frame" unchecked, video driver X11-OpenGL; rendering 20 sec of mp4 video with the hd.youtube preset):

Standard CFLAGS (-O2):
  compile time: about 20 min
  playback, edit only: 30 fps
  playback, edit + effects: 7 fps
  rendering: 1 min 59 sec

New CFLAGS (-O3 -ffast-math):
  compile time: about 20 min (the same!)
  playback, edit only: 30 fps
  playback, edit + effects: 13 fps
  rendering: 1 min 59 sec (the same!)
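Expressed as ratios, the effects playback nearly doubled with the new flags, while rendering speed did not change at all (the 30/22 figure is from the earlier color3way test in this thread):

```latex
\frac{13~\text{fps}}{7~\text{fps}} \approx 1.86 \quad (\text{edit + effects}),
\qquad
\frac{30~\text{fps}}{22~\text{fps}} \approx 1.36 \quad (\text{color3way, 1280x720}),
\qquad
\frac{20~\text{s of video}}{119~\text{s of rendering}} \approx 0.17\times\ \text{real time}
```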
In reply to Andrea paz (Tuesday 12 February 2019 16:25:27):
Thanks for testing! It seems decoding/encoding (via the ffmpeg embedded in Cinelerra) already uses assembler optimizations, but effects can still be accelerated with this 'cheap and dirty' technique. BUT, most importantly, does the final video still contain all effects as intended in both cases? There is no point in making effects fast but incorrect...
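One way to answer that question with numbers rather than by eye: export the same frame from each build as raw RGBA-8 data and compare them. A minimal sketch, assuming both frames are already loaded in memory with identical size and layout; `compare_frames` and `struct frame_diff` are hypothetical names, not part of Cinelerra.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for comparing two RGBA-8 frames. */
struct frame_diff {
    size_t differing_pixels; /* pixels where any channel differs */
    int max_abs_diff;        /* largest per-channel difference, 0..255 */
};

/* Compares two RGBA-8 frames of `pixels` pixels each, channel by channel. */
struct frame_diff compare_frames(const uint8_t *a, const uint8_t *b,
                                 size_t pixels)
{
    struct frame_diff d = {0, 0};
    assert((a != NULL && b != NULL) || pixels == 0);
    for (size_t i = 0; i < pixels; i++) {
        int pixel_differs = 0;
        for (int c = 0; c < 4; c++) { /* R, G, B, A */
            int diff = abs((int)a[4 * i + c] - (int)b[4 * i + c]);
            if (diff > d.max_abs_diff)
                d.max_abs_diff = diff;
            if (diff != 0)
                pixel_differs = 1;
        }
        d.differing_pixels += (size_t)pixel_differs;
    }
    return d;
}
```

A `max_abs_diff` of 0 means both builds produced bit-identical pixels. Occasional differences of 1 look like rounding changes, while large differences suggest an effect is computing something else.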
If you want to do tests with my video and EDL, download the material at the link: https://www.dropbox.com/s/dmqwrh095g1pjkw/plugin.zip?dl=0
Participants (3): Andrea paz, Andrew Randrianasulu, Kovács László