<div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-size:small">Simeon,</div><div class="gmail_default" style="font-size:small"><br></div><div class="gmail_default" style="font-size:small">WOW, thank you for your detailed analysis and for taking the time to get your information down in writing.  It is very much appreciated.  I have read your email to Bill for discussion and he has some thoughts to relay.  Considerable thought has already been put into ideas like yours in the past.  (I am going to log this in our Feature/Bug Tracker because you never know if someone will want to implement some parts of your ideas.)<br></div><div class="gmail_default" style="font-size:small"><br></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

how cinelerra could improve performance considerably in cases where little<br>

(or rather: "nothing") is to be done to the input video frames.<br>

<br>"trivial editing" with long<span class="gmail_default" style="font-size:small"> </span>and/or large files:  I could completely avoid<br>

reencoding of the video stream using the command line tool ffmpeg. FFmpeg<br>

works perfectly fine for single "trivial edits", but the command (and<br>

required filters) becomes admittedly complex as soon as multiple edits have<br>

to be made.<br>

<span class="gmail_default" style="font-size:small">...</span><br>

So in my wildest dreams I dreamed of good old cinelerra learning how to do<br>

stream-copying (read up on ffmpeg's -c:v copy and -c:a copy if you are not<br>

familiar with that concept!). As stream-copying does not require to decode<br>

the input, the achievable speed is typically bound by the disk IO -- it can<br>

be as fast as your SSD-Raid at nearly negligible CPU cost.<br></blockquote><div><br></div><div style="font-size:small" class="gmail_default">Cinelerra could carry compressed AND uncompressed data in the vframe, but you want to be able to see the video in the composer, so the uncompressed data is needed.  You would need a second uncompressed channel.  If you are using an NLE, the compositor is needed to look at the video, and this obviously requires a decode operation.  It turns out that Cinelerra already has Direct Copy, BUT it is applicable and feasible only for a small set of operations. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Please note that stream-copying per definition only works if the packets<br>

from the input are not to be altered at all and the output stream has<br>

exactly the same encoding settings [1]. Only the container format would be<br>

allowed to be different, as long as it can carry the unmodified stream.<br>

<br>

Implementing this in cinelerra would definitely be a huge, nontrivial<span class="gmail_default" style="font-size:small"> </span>change. <span class="gmail_default" style="font-size:small">... </span> a *real* challenge! ;)<br></blockquote><div><br></div><div style="font-size:small" class="gmail_default">The test Bill uses in deciding whether or not to spend time on an implementation weighs "developer time needed" against "user time saved".  In this case, maybe 500 developer hours to implement stream copying has to be balanced against the 2% of the time that users would take advantage of this feature.  That 2% is also a guess, based on the fact that the majority of people using Cinelerra are actually planning on doing a lot of editing, not just a trivial amount.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I profiled cinelerra-gg using operf during rendering when using an Intel<br>

UHD Graphics 630 GPU (gen9.5) for HW decoding and a Nvidia Quadro P2000<br>

(Pascal nvenc) for encoding.<br></blockquote><div><br></div><div style="font-size:small" class="gmail_default">Again, thanks for passing along the profiling information.  (Bill loves this kind of stuff!)</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

The most time-consuming parts appear to be:<br>

<br>

When setting format to YUV 8bit:<br>

17.7664  BC_Xfer::xfer_yuv888_to_yuv420p(unsigned int, unsigned int)<br>

13.1723  BC_Xfer::xfer_yuv444p_to_yuv888(unsigned int, unsigned int)<br>

10.7678  __memmove_avx_unaligned_erms   [ in <a href="http://libc-2.31.so" rel="noreferrer" target="_blank">libc-2.31.so</a> ]<br>

10.7615  ff_hscale8to15_4_ssse3<br>

 8.5718  BC_Xfer::xfer_yuv888_to_bgr8888(unsigned int, unsigned int)<br>

 2.8518  ff_yuv2plane1_8_avx<br></blockquote><div> </div><div><span class="gmail_default" style="font-size:small">About the above, Bill says "memmove is a tough act to follow"!!</span></div><div><span class="gmail_default" style="font-size:small">And the purpose of "bgr8888" is to draw the video on the display, so we have got to have this.<br></span></div><div><span class="gmail_default" style="font-size:small"><br></span></div><div><span class="gmail_default" style="font-size:small"></span><span class="gmail_default" style="font-size:small">In any of the profile information provided, anything that has "ff" in its name is ffmpeg code.  It uses up a lot of the time, but it has most likely been fine-tuned to work as well as possible; it is very good and almost always fast.  Cinelerra-GG takes full advantage of ffmpeg, so even if it is not as fast as it could be, it is well worth it.  Earlier versions of Cinelerra had their own implementations; some could be improved upon, while others were and still are exceptional work.</span><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

During rendering, two of eight cores were at 70-85% (according to htop). As<br>

none reached 100% alone, but the sum is above 100%, I'm not really sure<br>

whether rendering is currently CPU bound or rather memory bound. If someone<br>

knows a good tool how to discriminate between these two bounds, please tell<br>

me! In case this should be CPU bound, multithreading in this part of the<br>

code might help, as I have (more than) 6 idling CPU cores left on this<br>

machine ;)<br></blockquote><div><br></div><div style="font-size:small" class="gmail_default">The contention is most likely "logic bound", that is, probably due to locks: a thread has to wait for something to happen before it can proceed.  One thing that is likely to help is "having the plugin stack use threads"; that way the plugins would be queued up in advance and ready to go, pipelined instead of demand pull.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

With RGBA 8bit transcoding (i.e. rendering a timeline consisting of a<br>

single "unmodified" input clip) of a FullHD 25p h264 video using HW<br>

accelerated cinelerra can take now only a quarter of the playback time (in<br>

ffmpeg notation: speed=4x).<br></blockquote><div><br></div><div style="font-size:small" class="gmail_default">Yes, the above really is quite impressive. <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

While this might seem impressive at first sight (this is equivalent to 4K<br>

transcoding in realtime!) this is still a fraction of the speed ffmpeg<br>

achieves for the same transcoding path <br></blockquote><div> </div><div style="font-size:small" class="gmail_default">Believe it or not, we added Settings->Transcode to Cinelerra-GG despite the fact that we informed the users that the convert/copy operation would be much faster done from the command line.</div><div style="font-size:small" class="gmail_default"><br></div><div style="font-size:small" class="gmail_default">Phyllis/Bill<br></div></div></div>