Observations using GPU on DNxHD and MPEG proxy while running CinelerraGG
This is an amalgamation of email between Pierre and me that should have been on the mailing list, but I missed seeing that.
*_Summary first_*, in case you don't want to read it all. And just FYI: Pierre tests on Mint 18 and I use Fedora 29, with totally different processors, numbers of CPUs, and even brands of graphics boards.
1) Pierre: I don't really see any gains either with X11 or X11-OpenGL; the viewing in the composer may be a little more fluid but I'm not sure ... shouldn't vdpau be able to decode these mpegs?
2) Phyllis: When you use X11-OpenGL, which was written long ago when there was mostly only 1 CPU so only 1 thread, the path from the computer to the graphics board can become bottlenecked, with Cinelerra calling for OpenGL graphics at the same time the GPU is being used with vdpau (can not confirm that this is happening).
3) Pierre: I don't have the feeling that the GPU decodes video tracks under X11-OpenGL, even in the case of mpeg proxies.
4) Phyllis: With certain hardware, I think you might be correct about the GPU not doing the decoding under X11-OpenGL, but I can not find anything that corroborates that. I do see that with "loglevel=verbose" a vdpau device is created in either the X11 or the OpenGL driver case. But I am finding CPU usage is actually higher with the X11-OpenGL driver PLUS vdpau than X11-OpenGL MINUS vdpau with my computer hardware.
5) Phyllis: On computers with lots of CPU cores it does not seem worthwhile to bother with using the graphics board GPU for decoding. And that might apply to encoding too in the case of the final render, because using the Render Farm (on a single computer with lots of cores) is pretty fast.
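For reference, the two launch modes compared throughout this thread differ only in an environment-variable prefix on the command line. A minimal sketch; the install path below is the one from Pierre's tests (substitute your own), and the `sh -c` demonstration at the end is just a stand-in showing that the prefix affects only the launched process:

```shell
# Launching CinelerraGG without hardware decoding (path is an example):
#   /home/stone/Cinelerra-GG_5.1/cin
# Launching with VDPAU hardware decoding enabled for that run only:
#   CIN_HW_DEV=vdpau /home/stone/Cinelerra-GG_5.1/cin

# The VAR=value prefix sets the variable only for the one child process;
# the surrounding shell is untouched:
CIN_HW_DEV=vdpau sh -c 'echo "child sees: $CIN_HW_DEV"'
echo "parent sees: ${CIN_HW_DEV:-unset}"
```

This is why a single terminal can be used to alternate between vdpau and non-vdpau runs without unsetting anything in between.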
*_Pierre's Test Results_* (Intel computer with Nvidia graphics board)

"DNxHD corrected.xml Test" file, X11-OpenGL driver, command: CIN_HW_DEV=vdpau ./cin
Proxys mpeg   cpu % 21.6 - 43.7   frame/sec 11.18 - 12.16
DNxHD media   cpu % 13.4 - 44.8   frame/sec 11.54 - 12.16
"DNxHD corrected.xml Test" file, X11-OpenGL driver, command: /home/stone/Cinelerra-GG_5.1/cin
Proxys mpeg   cpu % 19.1 - 44.3   frame/sec 11.39 - 12.24
DNxHD media   cpu % 19.6 - 44.5   frame/sec 11.32 - 12.16
"DNxHD corrected.xml Test" file, X11 driver, command: CIN_HW_DEV=vdpau ./cin
Proxys mpeg   cpu % 19.5 - 41.7   frame/sec 29.97 - 30.15
DNxHD media   cpu % 22.9 - 40.3   frame/sec 28.24 - 31.15
"DNxHD corrected.xml Test" file, X11 driver, command: /home/stone/Cinelerra-GG_5.1/cin
Proxys mpeg   cpu % 23.08 - 42.4   frame/sec 29.97 - 31.02
DNxHD media   cpu % 21.7 - 43.5   frame/sec 29.97 - 31.02
Interesting....
At first glance, I would say that:
X11-OpenGL, with or without vdpau:
  Does not decode DNxHD sources
  Does not decode mpeg proxies
X11, with or without vdpau:
  Decodes DNxHD sources
  Decodes mpeg proxies
I think I will now have to do some identical tests with HDV and H264.mp4 sources.
*Short Phyllis tests* (AMD computer, Radeon graphics board) using the proxy mpeg version, I see:
11% cpu usage with X11-OpenGL
13% cpu usage with X11-OpenGL + vdpau/GPU
16% cpu usage with X11 + vdpau/GPU
21% cpu usage with X11
_*Most of the rest of the email thread is below.*_
Pierre Observation:
I don't really see any gains either with X11 or X11-OpenGL; the viewing in the composer may be a little more fluid but I'm not sure.
I'm surprised that X11-OpenGL can't be fluid with mpeg.mpeg proxies, while it's much more fluid with Clowns' h264.mpeg and X11 does it all much better.
But, as it is the mpeg.mpeg proxies that I actually use, shouldn't vdpau be able to decode these mpeg?
Phyllis some response:
So I had 4 dnxhd files from previous reports and I proxied them as 1/2 mpeg-s. Although it is not obvious that they are using vdpau, they actually are. Bill reminded me to edit ffmpeg/decode.opts and change "loglevel=fatal" to "loglevel=verbose", restart Cinelerra, and then in the Cinelerra startup window you will see messages for the mpegs (you might have to also edit bin/ffmpeg/decode.opts):
[AVHWDeviceContext @ 0x7fff182c3cc0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7ffea8afc980] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7fff6c1b12c0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7fff6f223300] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
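The loglevel edit described above can be scripted. This is a sketch against a stand-in file in /tmp; in a real install the same sed applies to ffmpeg/decode.opts (and possibly bin/ffmpeg/decode.opts) under the Cinelerra directory:

```shell
# Create a stand-in decode.opts containing the default log level:
printf 'loglevel=fatal\n' > /tmp/decode.opts
# Switch it to verbose so ffmpeg reports VDPAU device creation on startup:
sed -i 's/loglevel=fatal/loglevel=verbose/' /tmp/decode.opts
cat /tmp/decode.opts    # prints: loglevel=verbose
```

After changing the real file, restart Cinelerra for the new log level to take effect.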
When you use X11-OpenGL, which was written long ago when there was mostly only 1 CPU so only 1 thread, the path from the computer to the graphics board can become bottlenecked, with Cinelerra calling for OpenGL graphics and at the same time the GPU being used with vdpau.
----------------------------------------------------------------------------------------------
Pierre Observation:
What surprises me though is that this difficulty does not exist under X11; the accumulation of video tracks from mixers does not cause the composer to slow down under this video driver. When I play DNxHD sources the CPU usage is 37-41%; if I play proxies in mpeg, the CPU usage is 10-18%. But in both cases (DNxHD and proxies in mpeg), the frame rate (in CinGG's preferences) is 11-12 frames/s (whereas the normal rate should be 29.97 frames/s).
Phyllis some response:
When I play it, X11-OpenGL slows down so that the mixers are done playing while the compositor is only on frame 106; but with X11 I still have slowdown, just not as bad. The mixers are done playing but the compositor is still playing at about frame 175 - so for me X11 does still have the difficulty.
I suspect that your computer is faster with more cores. For example, with X11-OpenGL, if I run the "top" command from another window and watch it, I see it goes to 489%, so it is using threads/multiple cores BUT it must be waiting on the single-threaded graphics board. When I just use X11, the program is not waiting on the graphics board and runs at 600%. The graphics board is likely a bottleneck.
---------------------------------------------------------------------------------------------------
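The CPU percentages quoted in this thread came from watching top interactively; they can also be sampled non-interactively. A sketch using ps, where the PID of the current shell ($$) stands in for the cin PID (which you could obtain with pgrep cin):

```shell
# One snapshot of a process's CPU share, as a percentage of one core;
# values above 100 mean more than one core is busy, as with the
# 489%/600% figures seen in top above.
ps -o pcpu= -p $$
```

Logging this once per second during playback gives a comparable trace for the X11 versus X11-OpenGL runs.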
Pierre Observation:
Under the X11 video driver: with DNxHD, the CPU is at 35% and the frame rate at 29.97-30 frames/s; with proxies in mpeg, the CPU is at 6.5-10.9% and the frame rate is also at 29.97-30 frames/s.
I don't have the feeling that the GPU decodes video tracks under X11-OpenGL even in the case of mpeg proxies.
Secondly,
Under the X11-OpenGL video driver: when I play DNxHD sources the CPU usage is 37-41%; if I play proxies in mpeg, the CPU usage is 10-18%. But in both cases (DNxHD and proxies in mpeg), the frame rate (in CinGG's preferences) is 11-12 frames/s (whereas the normal rate should be 29.97 frames/s).
Phyllis some response:
Unfortunately, I am not seeing this. For example with:
EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.mov: I see 29.97 fps in preferences and 60% CPU
EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.proxy2-mov.mpeg: here I see 29.97 fps and 70% CPU
Something else must be causing the problem you see. GG noticed that these DNxHD sources are very large in size but I do not think that the disk I/O would slow anything down.
I think you might be correct about the GPU not doing the decoding under X11-OpenGL, but I can not find anything that corroborates that. I do see that with "loglevel=verbose" a vdpau device is created in either the X11 or the OpenGL driver case. But I am finding CPU usage is actually higher with the X11-OpenGL driver PLUS vdpau than X11-OpenGL MINUS vdpau. So I thought it was a really bad idea to use OpenGL and GPU/vdpau together.
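A quick way to check the "vdpau device is created" observation on your own machine is to capture the startup messages and grep for them. A sketch using a stand-in log file; in practice you would launch cin from a terminal with its output redirected to that file (and loglevel=verbose set in decode.opts):

```shell
# Build a stand-in startup log with two of the messages shown earlier
# (shortened; real lines include the driver library and display):
printf '%s\n' \
  '[AVHWDeviceContext @ 0x1] Successfully created a VDPAU device' \
  '[AVHWDeviceContext @ 0x2] Successfully created a VDPAU device' \
  > /tmp/cin_startup.log
# Count how many decoder contexts were given a VDPAU device:
grep -c 'Successfully created a VDPAU device' /tmp/cin_startup.log   # prints: 2
```

A count of zero during playback would support the suspicion that the GPU is not actually decoding.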
However, Sam and Andrea thought they got better results using X11-OpenGL than X11 with vdpau enabled. This has been mystifying to me as I definitely only saw good improvements using X11. Since we can not figure it out, I have decided that it might be due to the actual Graphics Card being used in conjunction with the Nvidia driver and operating system.
Summary - on computers with lots of CPU cores it does not seem worthwhile to bother with using the graphics board GPU for decoding. And that probably applies to encoding too in the case of the final render because using the Render Farm (on a single computer with an Epyc chip) is so fast as to be trivial. ------------------------------------------------------------------------------------------------------------
Pierre tests on the following:
The processor of my computer is an i7-3770k, so it has 4 physical cores, 8 threads via Hyper-Threading (2 processing threads per physical core) at 3.50 GHz (turbo 3.90 GHz), 32 GB of RAM and an nVidia GTX-750ti video card.
The nVidia card, which is a few years old, is not very powerful, but it is especially useful to me because of its ability to manage 4 monitors at 1920x1080 simultaneously. In my case, it is connected to three monitors.
I therefore suspect that this video card does not offer a significantly greater performance gain than the possibilities of my CPU...
Phyllis tests on the following:
AMD 8-Core RYZEN 7 1700 Processor, 3.0 GHz; cache size: 512 KB; memory: 64 GB; Radeon graphics board: Radeon RX580 4GB. This video card is a gamer-type card (the really powerful models have become extremely expensive since the arrival of bitcoin mining...).
This time I completed my tests by adding proxies to the HDV and AVC H264.mp4 test projects I did yesterday, still in the context of use with 4 mixers. These are mpeg proxies with the same characteristics as those I had tested with DNxHD media.

The results of these tests of the mpeg proxies tell me that with both the X11-OpenGL driver and the X11 driver, using vdpau results in a very slight reduction in the use of my CPU, but this does not improve the frame rate that these video drivers can display... X11 allows, in all cases, display at 29.97 frames/sec or better for sources that were shot at that speed. X11-OpenGL is always limited to a maximum of about 12 frames/sec. These results hold approximately for all the types of media I tested, whether DNxHD.mov, HDV (MPEG-2.m2t), AVC H264.mp4 or even proxies in mpeg.mpeg.

Given these results, I don't really see the advantage of using proxies... In any case, the video driver used will determine the possible frame rate regardless of the type of media used. I'm actually wondering if the constant frame-rate limit of 12 frames/sec produced by X11-OpenGL in my tests with 4 mixers, regardless of the media type, doesn't actually indicate a bug somewhere or a limit inherent in my equipment. But then how do you explain the better throughput with X11?
"Test HDV.xml" file Proxy mpeg (with 4 mixers) X11-OpenGL driver command CIN_HW_DEV=vdpau ./cin cpu % 11.9 - 21.4 frame/sec 12.00 - 12.08 "Test HDV.xml" file Proxy mpeg (with 4 mixers) X11-OpenGL driver command /cin cpu % 14.5 - 23.8 frame/sec 11.84 - 12.14 "Test HDV.xml" file Proxy mpeg (with 4 mixers) X11 Pilot command CIN_HW_DEV=vdpau ./cin cpu % 9.6 - 18.6 frame/sec 29.97 - 30.12 "Test HDV.xml" file Proxy mpeg (with 4 mixers) X11 Pilot command /cin cpu % 12.8 - 19.2 frame/sec 29.97 - 30.09 "Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11-OpenGL driver command CIN_HW_DEV=vdpau ./cin cpu % 12.5 - 20.7 frame/sec 9.6 - 12.08 "Test Clowns .xml" file vdpau media (with 4 mixers) X11-OpenGL driver command /cin cpu % 14.3 - 22.05 frame/sec 11.97 - 12.08 "Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11 Pilot command CIN_HW_DEV=vdpau ./cin cpu % 10.5 - 17.1 frame/sec 30.00 - 30.12 "Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11 Pilot command /cin cpu % 10.8 - 17.5 frame/sec 30.00 - 30.03 Pierre On 19-05-14 18 h 12, Pierre autourduglobe wrote:
I continued my tests.
First with HDV files and then with H264.mp4 AVC files from my old cell phone.
In each of these tests I used 4 mixers.
I did them with the X11-OpenGL driver, then X11, with and without the CIN_HW_DEV=vdpau ./cin command line.
Unlike the DNxHD and mpeg proxy files that I tested in the same way, which did not seem to show any reduction in CPU usage, with these HDV and H264.mp4 AVC files the results seem to indicate that both with X11-OpenGL and under X11, using vdpau reduces the % of CPU usage.
However, for the HDV and H264.mp4 AVC files, and this is what seems essential to me, only X11 is able to sustain an acceptable frame rate (around 29.97 frames/sec) in all these tests, including with 4 mixers.
"Test HDV.xml" file HDV (MPEG-2.m2t) media (with 4 mixers) X11-OpenGL driver command CIN_HW_DEV=vdpau ./cin
cpu % 8.1 - 16.1 frame/sec 10.24 - 12.16
"Test HDV.xml" file HDV (MPEG-2.m2t) media (with 4 mixers) X11-OpenGL driver command /cin
cpu % 20.4 - 28.9 frame/sec 11.39 - 12.08
"Test HDV.xml" file HDV (MPEG-2.m2t) media (with 4 mixers) X11 Pilot command CIN_HW_DEV=vdpau ./cin
cpu % 7.9 - 12.9 frame/sec 29.97 - 30.12
"Test HDV.xml" file HDV (MPEG-2.m2t) media (with 4 mixers) X11 Pilot command /cin
cpu % 13.5 - 25.7 frame/sec 29.97 - 30.92
"Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11-OpenGL driver command CIN_HW_DEV=vdpau ./cin
cpu % 16.2 - 28.0 frame/sec 9.89 - 13.74
"Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11-OpenGL driver command /cin
cpu % 20.4 - 37.06 frame/sec 9.99 - 15.25
"Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11 Pilot command CIN_HW_DEV=vdpau ./cin
cpu % 7.5 - 10.5 frame/sec 25.79 - 30.06
"Test Clowns .xml" file AVC H264.mp4 media (with 4 mixers) X11 Pilot command /cin
cpu % 15.6 - 33.4 frame/sec 30.00 - 30.06
Pierre
On 19-05-13 14 h 02, Phyllis Smith wrote:
This is an amalgamation of email from Pierre / me that should have been in the mailing list but I missed seeing that.
*_Summary first_* in case you don't want to read it all. And just FYI, Pierre tests on Mint18 and I use Fedora29 with totally different processors, number of CPUs and even brand of graphics boards.
1) Pierre: I don't really see any gains either with X11 or X11-OpenGL, the viewing in the composer may be a little more fluid but I'm not sure ... shouldn't vdpau be able to decode these mpeg? 2) Phyllis: When you use X11-OpenGL, which was written long ago when there was mostly only 1 CPU so only 1 thread, the computer to the Graphics board can become bottlenecked with Cinelerra calling for OpenGL graphics and at the same time the GPU being used with vdpau (can not confirm that this is happening). 3) Pierre: I don't have the feeling that the GPU decodes video tracks under X11-OpenGL even in the case of mpeg proxies. 4) Phyllis: with certain hardware, I think you might be correct about GPU not doing the decoding under X11-OpenGL, but I can not find anything that corroborates that. I do see that with "loglevel=verbose" a vdpau device is created in either the X11 or the OpenGL driver case. But I am finding CPU usage is actually higher with the X11-OpenGL driver PLUS vdapu than X11-OpenGL MINUS vdpau with my computer hardware. 5) Phyllis: on computers with lots of CPU cores it does not seem worthwhile to bother with using the graphics board GPU for decoding. And that might apply to encoding too in the case of the final render because using the Render Farm (on a single computer with lots of cores) is pretty fast.
*_Pierre's Tests Results_* (Intel computer with Nvidia graphics board)*_ _* "DNxHD corrected.xml Test" file X11-OpenGL driver command CIN_HW_DEV=vdpau ./cin
Proxys mpeg cpu % 21.6 - 43.7 frame/sec 11.18 - 12.16
DNxHD media cpu % 13.4 - 44.8 frame/sec 11.54 - 12.16
"DNxHD corrected.xml Test" file X11-OpenGL driver order /home/stone/Cinelerra-GG_5.1/cin
Proxys mpeg cpu % 19.1 - 44.3 frame/sec 11.39 - 12.24
DNxHD media cpu % 19.6 - 44.5 frame/sec 11.32 - 12.16
"DNxHD corrected.xml Test" file X11 Pilot command CIN_HW_DEV=vdpau ./cin
Proxys mpeg cpu % 19.5 - 41.7 frame/sec 29.97 - 30.15
DNxHD media cpu % 22.9 - 40.3 frame/sec 28.24 - 31.15 "DNxHD corrected.xml Test" file X11 Pilot order /home/stone/Cinelerra-GG_5.1/cin
Proxys mpeg cpu % 23.08 - 42.4 frame/sec 29.97 - 31.02
DNxHD media cpu % 21.7 - 43.5 frame/sec 29.97 - 31.02
Interesting....
At first glance, I would say that:
X11-OpenGL with or without vdpau
Does not decode DNxHD sources
Do not decode Proxys mpeg
x11 with or without vdpau
Decodes DNxHD sources
Decode mpeg proxies
I think I will now have to do some identical tests with HDV and H264.mp4 sources.
*Short Phyllis tests (*AMD computer, Radeon graphics board) using the proxy Mpeg version, I see: 11% cpu usage with X11-OpenGL 13% cpu usage with X11-OpenGL + vdpau/GPU 16% cpu usage with X11 + vdpau/GPU 21% cpu usage with X11
_*Most of the rest of the email thread is below.*_
Pierre Observation:
I don't really see any gains either with X11 or X11-OpenGL, the viewing in the composer may be a lit tle more fluid but I'm not sure.
I'm surprised that X11-OpenGL can't be fluid with mpeg.mpeg proxies, while it's much more fluid with Clowns' h264.mpeg and X11 does it all much better.
But, as it is the mpeg.mpeg proxies that I actually use, shouldn't vdpau be able to decode these mpeg?
Phyllis some response:
So I had 4 dnxhd files from previous reports and I proxied them as 1/2 mpeg-s. Although it is not o bvious that they are using vdpau, they actually are. Bill reminded me to edit ffmpeg/decode.opts an d change "loglevel=fatal" to "loglevel=verbose", restart Cinelerra and then in the cinelerra startup window you will see messages for the Mpegs: (you might have to also edit bin/ffmpeg/decode.opts )
[AVHWDeviceContext @ 0x7fff182c3cc0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0 [AVHWDeviceContext @ 0x7ffea8afc980] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0 [AVHWDeviceContext @ 0x7fff6c1b12c0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0 [AVHWDeviceContext @ 0x7fff6f223300] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
When you use X11-OpenGL, which was written long ago when there was mostly only 1 CPU so only 1 threa d, the computer to the Graphics board can become bottlenecked with Cinelerra calling for OpenGL grap hics and at the same time the GPU being used with vdpau. ----------------------------------------------------------------------------------------------
Pierre Observation:
What surprises me though is that this difficulty does not exist under X11; the accumulation of video tracks from mixers does not cause the composer to slow down under this video driver.When I play DNxHD sources the CPU usage is 37-41%, If I play proxies in mpeg, the CPU usage is 10-18%. But in both cases (DNxHD and proxies in mpeg), the frame rate (in CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be 29.97 frame/s).
Phyllis some response:
When I play it, X11-OpenGL slows down so that the mixers are done playing and compositor is only on frame 106; but with X11 I still have slow down, just not as bad. The mixers are done playing but th e compositor is still playing at about frame 175 - so for me X11 does still have the difficulty.
I suspect that your computer is faster with more cores. For example, with X11-OpenGL, if I run the "top" command from another window and watch it, I see it goes to 489% so is using threads/multiple c ores BUT it must be waiting on the single threaded Graphics Board. Since when I just use X11, the p rogram is not waiting on the graphics boards and runs at 600%. The graphics board is likely a bottl eneck. ---------------------------------------------------------------------------------------------------
Pierre Observation:
Under the X11 video driver: With DNxHD, the CPU is at 35% and the frame rate at 29.97-30 frame/s With Proxys in mpeg, the CPU is at 6.5-10.9% and the frame rate is also at 29.97-30 frame/s.
I don't have the feeling that the GPU decodes video tracks under X11-OpenGL even in the case of mpeg proxies.
Secondly,
Under the X11-OpenGL video driver: When I play DNxHD sources the CPU usage is 37-41%, If I play proxies in mpeg, the CPU usage is 10-18%. But in both cases (DNxHD and proxies in mpeg), the frame rate (in CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be 29.97 frame/s).
Phyllis some response:
Unfortunately, I am not seeing this. For example with: EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.mov I see 29.97 fps in preferences and 60% CPU EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.proxy2-mov.mpeg and here I see 29.97 fps and 70% CPU
Something else must be causing the problem you see. GG noticed that these DNxHD sources are very large in size but I do not think that the disk I/O would slow anything down.
I think you might be correct about GPU not doing the decoding under X11-OpenGL, but I can not find anything that corroborates that. I do see that with "loglevel=verbose" a vdpau device is created in either the X11 or the OpenGL driver case. But I am finding CPU usage is actually higher with the X11-OpenGL driver PLUS vdapu than X11-OpenGL MINUS vdpau. So I thought it was a really bad idea to use OpenGL and GPU/vdpau together.
However, Sam and Andrea thought they got better results using X11-OpenGL than X11 with vdpau enabled. This has been mystifying to me as I definitely only saw good improvements using X11. Since we can not figure it out, I have decided that it might be due to the actual Graphics Card being used in conjunction with the Nvidia driver and operating system.
Summary - on computers with lots of CPU cores it does not seem worthwhile to bother with using the graphics board GPU for decoding. And that probably applies to encoding too in the case of the final render because using the Render Farm (on a single computer with an Epyc chip) is so fast as to be trivial. ------------------------------------------------------------------------------------------------------------
Pierre tests on the following:
The processor of my computer is an i7-3770k, so it has 4 physical cores and 8 threads via Hyper-Threading (2 processing threads per physical core) at 3.50 GHz (turbo 3.90 GHz), 32 GB of RAM and an nVidia GTX-750ti add-in video card.
The nVidia card, which is a few years old, is not very powerful, but it is especially useful to me because of its ability to manage 4 monitors at 1920x1080 simultaneously. In my case, it is connected to three monitors.
I therefore suspect that this video card does not offer a significantly greater performance gain over what my CPU can already do...
Phyllis tests on the following:
AMD 8-Core RYZEN 7 1700 Processor, 3.0 GHz; cache size: 512 KB; memory: 64 GB; Radeon Graphics Board: Radeon RX580 4GB.
This video card is a gamer type card (the really powerful models have become extremely expensive since the arrival of bitcoin mining...).
Hi, Pierre. I really appreciate your tests and thank you for your reports. A question: when you activate vdpau, do you do it from the terminal or from Settings --> Preferences --> Performance Tab: Use HW Device?
Hello Andrea Paz, I've always done it from the terminal. I haven't tried the option from Setting --> Preferences --> Performance Tab: Use HW Device Pierre On 19-05-15 16 h 17, Andrea paz wrote:
Pierre:
From your last 2 emails and tests as compared to what I see, I am thinking that the graphics board is the bottleneck. Doing similar tests with the Clowns, as compared with your observations below, I am always getting close to 29.97 fps in either X11 or X11-OpenGL. The reason I think it is probably your graphics board is because my laptop is not really a "work" computer but rather a "gaming" computer (it was an inexpensive AMD computer that has never, ever played a single game!) so I would imagine the graphics board is meant to be pretty good.
The results of these tests of the mpeg proxies tell me that with both the X11-OpenGL driver and the X11 driver, using vdpau results in a very slight reduction in the use of my CPU, but this does not improve the frame rate that these video drivers are able to display...
The above seems to indicate that the graphics board does not improve anything and you have plenty of CPU anyway, so you might as well use that.
X11 in all cases allows sources that have been shot at 29.97 frames/sec to be displayed at at least that speed.
X11-OpenGL is always limited to a maximum of about 12 frames/sec.
These results are approximately true for all the types of media I tested, whether DNxHD.mov, HDV (MPEG-2.m2t), AVC H264.mp4 or even proxies in mpeg.mpeg.
Given these results, I don't really see the advantage of using proxies... In any case, the video driver used will determine the possible frame rate regardless of the type of media used.
I'm actually wondering if the constant frame rate limit of 12 frames/sec provided by X11-OpenGL in my tests with 4 mixers, regardless of the media type, doesn't actually indicate a bug somewhere or a limit inherent in my equipment. But then how do you explain the better throughput with X11?
Instead of working with 29.97 fps media, I loaded Big Buck Bunny which is 60 frames per second. And there may be something strange going on, as Pierre indicated, that I will have to test on a faster computer. Because when I played this, like Pierre, it seems to always limit it to 30 fps whether I use X11 or OpenGL. Then when I proxied it to 1/2, I thought I should have improved the frame rate but it too was at only 30 fps. I will have to do the tests on GG's computer to eliminate the possibility of a limitation / bug. Phyllis
Yes, I am also inclined to believe that my video card is the culprit... for the lack of frame rate. It would not be able, through OpenGL, to decode simultaneously the 5 streams (composer + 4 mixers).
I've never played any games on my computers either... but "gamer" cards are much cheaper than pro cards, while being relatively powerful, and that's why I've always chosen them for video editing.
My current video card dates from 2014; it's an Nvidia GTX-750ti: https://www.gigabyte.com/Graphics-Card/GV-N75TOC-2GI#ov
It includes 2 GB of GDDR5 memory, a 128-bit memory interface and a bandwidth of 86.4 GB/s.
If it becomes clear that it is the guilty one... I'm ready to buy another more powerful one. I started looking at what could be bought, which would not be too expensive and would be compatible with my current power supply (which I don't want to change). I also don't know if Nvidia video cards or AMD cards would be the most compatible and optimized for Cinelerra-GG.
Here are the models I'm considering right now:
- Nvidia GeForce GTX 1070 (8GB, 256-Bit GDDR5, Bandwidth 256 GB/s)
- Nvidia GeForce GTX 1660 Ti (6GB, 192-Bit GDDR6, Bandwidth 288 GB/s)
- AMD Radeon RX 580 (8GB, 256-Bit GDDR5, Bandwidth 256 GB/s)
- AMD Radeon RX 570 (4GB, 256-Bit GDDR5, Bandwidth 224 GB/s)
But I'm not ready to buy right now.... Pierre
On 19-05-15 16 h 21, Phyllis Smith wrote:
wild guess:
Try to enable/disable Vsync in ... driver's control application (I assume you use proprietary drivers with the Nvidia GTX-750ti). And also the same in window manager settings.
Try to set CPU and GPU to maximum performance. (I think I observed some unusually slow playback when I tried to play av1 files with my libdav1d hack at just 1.8 GHz * 4 cores. Setting the CPU to 2.6 GHz fixed this! In both cases the CPU was not completely loaded, according to the gkrellm I have in a corner.)
Try to check how fast your PCI-E link is. (lspci -vv as root)
---------------
01:00.0 VGA compatible controller: NVIDIA Corporation G92 [GeForce 8800 GS] (rev a2) (prog-if 00 [VGA controller])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 38
    Region 0: Memory at fc000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
    Region 5: I/O ports at e000 [size=128]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee00000  Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0 <256ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nouveau
----------------
LnkSta: Speed 5GT/s, Width x16 - sounds like PCI-E 2.0
Check if VDPAU works for simple players - mpv, mplayer.
In a message of Thursday 16 May 2019 00:22:30, Pierre autourduglobe wrote:
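Andrew's link-speed check can be scripted. The LnkSta line below is copied from his lspci dump so the extraction is demonstrable without root; on a live system you would feed it `sudo lspci -vv` output instead. The mpv/vdpauinfo commands at the end are standard tools (not part of Cinelerra) for verifying VDPAU outside CinGG.

```shell
# On a real system:  sudo lspci -vv | grep -E 'VGA|LnkSta:'
# Sample LnkSta line taken from Andrew's dump above:
lnksta='LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-'
# Pull out the negotiated speed and lane width:
speed=$(printf '%s\n' "$lnksta" | sed -n 's/.*Speed \([^,]*\),.*/\1/p')
width=$(printf '%s\n' "$lnksta" | sed -n 's/.*Width \([^,]*\),.*/\1/p')
echo "PCI-E link: $speed $width"   # 5GT/s x16 corresponds to PCI-E 2.0 at full width
# Checking VDPAU outside Cinelerra with simple players (shown, not run here):
#   vdpauinfo | head
#   mpv --hwdec=vdpau some_clip.mp4
```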
I wouldn't have believed it.... But you are absolutely right!
Disabling "Sync to VBlank" (option for OpenGL) in NVIDIA X Server Settings... has solved the problem!
In my tests using 4 mixers, whether the sources are in DNxHD, HDV or mpeg proxies, all now have an image rate close to 29.97 frames/sec (corresponding to the shooting rate).
Only my sources in AVC H264.mp4 do not reach this rate and are limited to about 15 to 22 frames/sec. But the proxies do.
I think you saved me the cost of buying a new video card. Thank you. Pierre
On 19-05-15 18 h 28, Andrew Randrianasulu wrote:
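For reference, the fix Pierre applied through the NVIDIA X Server Settings GUI can also be done from a script. The attribute and environment-variable names below come from the NVIDIA proprietary driver's own tooling, not from Cinelerra, so treat this as a sketch for that driver only.

```shell
# Globally, via the NVIDIA settings tool (proprietary driver only):
#   nvidia-settings --assign SyncToVBlank=0
# Per-process, via the driver's environment variable, combined with vdpau decode;
# build the launch command here so it can be inspected:
cmd='__GL_SYNC_TO_VBLANK=0 CIN_HW_DEV=vdpau ./cin'
echo "launch with: $cmd"
```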
Not only to you :) haldun On 17/05/2019 at 04:15, Pierre autourduglobe wrote:
*On Wed, May 15, 2019 at 4:35 PM Andrew Randrianasulu <[email protected]> wrote:*
*wild guess:*
Not so "wild" after all. This was an *amazing discovery* that Andrew passed along, which is much appreciated for anyone with an Nvidia card using Cinelerra and getting low frames/sec.
Try to enable/disable Vsync in ... driver's control application (I assume you use proprietary drivers with Nvidia GTX-750ti)
Thank you and I am adding this to the manual in the GPU hardware acceleration section for others who have not seen this here. Thank you Andrew! More wild guesses needed! GoodGuy/Phyllis
Wow, the difference is clear. The activation of vdpau has brought me a lot, additionally deactivating Vsync has improved my result significantly. Thanks Andrew for this great hint. Sam On 17.05.19 22:29, Phyllis Smith wrote:
Pierre, your video card is good, for me. It has 640 CUDA cores. Unfortunately only ffmpeg-nvenc uses the cuda cores, and not for all codecs, it seems to me. If you use Blender (http://www.blender.org/) and enable ComputeDevice=YourGTX you can see that the cuda cores save a lot of your time to render a 3D model/scene and your cpu% is low. IgorBeg On 15/05/2019 23:22, Pierre autourduglobe wrote:
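To see whether a given ffmpeg build carries the NVENC encoder IgorBeg refers to, you can filter its encoder list. The sample line below stands in for real `ffmpeg -encoders` output so the filter itself is demonstrable offline, and the encode command at the end is a sketch with placeholder file names.

```shell
# On a real system:  ffmpeg -hide_banner -encoders | grep nvenc
# Sample stand-in line in the format ffmpeg prints:
sample=' V..... h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)'
# Extract the encoder name from the listing:
printf '%s\n' "$sample" | grep -o 'h264_nvenc'
# A sketch of an NVENC hardware encode (placeholder file names):
#   ffmpeg -i input.mov -c:v h264_nvenc -b:v 8M output.mp4
```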
Yes, I am also inclined to believe that my video card is the culprit... for the lack of frame rate. It may not be able, through OpenGL, to simultaneously decode the 5 streams (composer + 4 mixers).
I've never played any games on my computers either... but "gamer" cards are much cheaper than pro cards, while being relatively powerful, and that's why I've always chosen them for video editing.
My current video card dates from 2014, it's a Nvidia GTX-750ti: https://www.gigabyte.com/Graphics-Card/GV-N75TOC-2GI#ov
It includes 2 GB of GDDR5 memory, 128-bit memory interface and a Bandwidth of 86.4 GB/s
If it becomes clear that it is the guilty one... I'm ready to buy another more powerful one.
I started looking at what could be bought, which would not be too expensive and would be compatible with my current power supply (which I don't want to change).
I also don't know if Nvidia video cards or AMD cards would be the most compatible and optimized for Cinelerra-GG.
Here are the models I'm considering right now:
- Nvidia GeForce GTX 1070 (8 GB, 256-bit GDDR5, bandwidth 256 GB/s)
- Nvidia GeForce GTX 1660 Ti (6 GB, 192-bit GDDR6, bandwidth 288 GB/s)
- AMD Radeon RX 580 (8 GB, 256-bit GDDR5, bandwidth 256 GB/s)
- AMD Radeon RX 570 (4 GB, 256-bit GDDR5, bandwidth 224 GB/s)
But I'm not ready to buy right now....
Pierre
Hi IgorBeg, Until recently I also had the same feeling as you, that my little video card was powerful enough for Linux editing. In any case, few Linux programs seem to really take advantage of the potential power of really powerful external video cards. I don't use Blender; it doesn't match my type of activity. So the CUDA cores unfortunately seem quite useless to me with all the current Linux editing software. I think the only characteristics of video cards that currently count under Linux are their raw computing speed, their amount of memory, the number of bits processed (64, 128, 192, 256, 320...) and finally their bandwidth in GB/s. These features have an impact on the performance of these cards under Linux as well. Until now I have always chosen Nvidia cards because, for equal performance, I found that they consumed less electricity, so they required a less powerful power supply and, above all, allowed me to have a computer that generated less heat. If my video card is still sufficient for my needs, I will not change it. If I have to change it, I will carefully re-evaluate my choice between Nvidia and AMD. I have always preferred to buy cheaper cards that only satisfy my foreseeable medium-term needs, rather than buying in advance a very expensive card whose total power I would only risk being able to use much later, at a time when cards with identical characteristics would sell at a much lower price. Pierre (replying to Igor BEGHETTO's message above, 19-05-16 04 h 30)
Pierre: It sounds like, from what IgorB is saying, there is nothing wrong with your graphics board, and I found at least one place on the internet that indicated it has vdpau capabilities. I simply cannot understand why you are still getting only about 12 frames/sec using OpenGL, and why the mpeg proxy files are not seeing an improved frame rate either. Phyllis quote: Instead of working with 29.97 fps media, I loaded Big Buck Bunny, which is 60 frames per second. And there may be something strange going on, as Pierre indicated, so I will have to test on a faster computer. Because when I played this, like Pierre, playback seemed to be capped at 30 fps whether I use X11 or OpenGL. Then when I proxied it to 1/2, I thought I should have an improved frame rate, but it too was only 30 fps.
As far as the above, I planned on showing this to GG to get an explanation of why I too was not getting improved frames/sec with proxy, BUT he always makes me prove it to him. So I had to start Cinelerra from a clean beginning with nothing set, and then I could not reproduce the 30 frames/sec limit and already had the full 60 frames/sec in the non-proxy case. I am still trying to determine if I can find the bad case to show GG but may have to give up. Phyllis
Il 17/05/2019 0.55, Phyllis Smith ha scritto:
So I had to start Cinelerra from a clean beginning with nothing set, and then I could not reproduce the 30 frames/sec only and had the full 60 frames/sec in the non-proxy case already.
The max frame rate achieved is based on your Project fps in Settings->Format. So, if your Project fps is 30 and your video is 60 fps, then you will never reach 60 fps, I think. IgorBeg
IgorB: You are right! I kept trying to recreate the problem, and with your information I could, so it was not even a problem after all. Thanks, Phyllis (replying to Igor BEGHETTO's message above, Fri, May 17, 2019 at 3:12 AM)
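IgorBeg's point, that playback is capped by the Project fps in Settings->Format, can be checked from the command line by comparing a clip's real frame rate against the project rate. A minimal sketch, assuming ffmpeg/ffprobe are installed; the synthetic `testsrc` clip here is a stand-in for real media like the Big Buck Bunny file from the thread:

```shell
# Generate a 1-second 60 fps test clip, then read back its frame rate.
ffmpeg -v error -y -f lavfi -i testsrc=duration=1:rate=60 /tmp/clip60.mp4
# Print the stream's real frame rate, e.g. "60/1" for 60 fps:
ffprobe -v error -select_streams v:0 \
        -show_entries stream=r_frame_rate \
        -of default=noprint_wrappers=1:nokey=1 /tmp/clip60.mp4
```

If ffprobe reports a higher rate than the project is set to, raising the Project fps in Settings->Format is what lets playback reach the media's full frame rate.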
participants (7)
- Andrea paz
- Andrew Randrianasulu
- Haldun ALTAN
- Igor BEGHETTO
- Phyllis Smith
- Pierre autourduglobe
- Sam