[Cin] Observations using GPU on DNxHD and MPEG proxy while running CinelerraGG
Pierre autourduglobe
p.autourduglobe at gmail.com
Wed May 15 00:12:01 CEST 2019
I continued my tests, first with HDV files and then with H264.mp4 AVC
files from my old cell phone.
In each of these tests I used 4 mixers.
I ran them with the X11-OpenGL driver and then with the X11 driver, in
each case with and without the CIN_HW_DEV=vdpau prefix on the ./cin
command.
The DNxHD files and mpeg proxy files that I tested in the same way did
not seem to show any reduction in CPU usage.
With these HDV and H264.mp4 AVC files, on the other hand, the results
seem to indicate that under both X11-OpenGL and X11 there is a reduction
in the % of CPU usage when I use vdpau.
However, for HDV and H264.mp4 AVC files, and this is what seems
essential to me, only X11 is able to sustain an acceptable frame rate
(around 29.97 frames/sec) in all these tests with 4 mixers.
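
For anyone who wants to repeat these runs, here is a minimal sketch in
Python of how the "with vdpau" and "without vdpau" launches differ. The
only thing taken from my tests is the CIN_HW_DEV=vdpau variable in front
of ./cin; the function names and the cin_path parameter are my own
assumptions for illustration, and the video driver (X11-OpenGL or X11)
still has to be chosen inside CinGG itself.

```python
import os
import subprocess


def cin_environment(use_vdpau, base_env=None):
    """Return a copy of the environment with CIN_HW_DEV set or cleared."""
    env = dict(os.environ if base_env is None else base_env)
    if use_vdpau:
        env["CIN_HW_DEV"] = "vdpau"
    else:
        # Make sure a value inherited from the shell does not leak in.
        env.pop("CIN_HW_DEV", None)
    return env


def launch_cin(cin_path, use_vdpau):
    """Start CinGG (hypothetical helper); cin_path is wherever ./cin lives."""
    return subprocess.Popen([cin_path], env=cin_environment(use_vdpau))


# The four combinations tested for each project file.
RUNS = [(driver, use_vdpau)
        for driver in ("X11-OpenGL", "X11")
        for use_vdpau in (True, False)]
```

Clearing the variable explicitly in the "without" case avoids a stale
CIN_HW_DEV exported earlier in the same terminal skewing the comparison.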
"Test HDV.xml" file
HDV (MPEG-2.m2t) media (with 4 mixers)
X11-OpenGL driver
command CIN_HW_DEV=vdpau ./cin
cpu % 8.1 - 16.1
frame/sec 10.24 - 12.16

"Test HDV.xml" file
HDV (MPEG-2.m2t) media (with 4 mixers)
X11-OpenGL driver
command ./cin
cpu % 20.4 - 28.9
frame/sec 11.39 - 12.08

"Test HDV.xml" file
HDV (MPEG-2.m2t) media (with 4 mixers)
X11 driver
command CIN_HW_DEV=vdpau ./cin
cpu % 7.9 - 12.9
frame/sec 29.97 - 30.12

"Test HDV.xml" file
HDV (MPEG-2.m2t) media (with 4 mixers)
X11 driver
command ./cin
cpu % 13.5 - 25.7
frame/sec 29.97 - 30.92

"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11-OpenGL driver
command CIN_HW_DEV=vdpau ./cin
cpu % 16.2 - 28.0
frame/sec 9.89 - 13.74

"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11-OpenGL driver
command ./cin
cpu % 20.4 - 37.06
frame/sec 9.99 - 15.25

"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11 driver
command CIN_HW_DEV=vdpau ./cin
cpu % 7.5 - 10.5
frame/sec 25.79 - 30.06

"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11 driver
command ./cin
cpu % 15.6 - 33.4
frame/sec 30.00 - 30.06
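
To make the comparison easier to read, the figures above can be reduced
to one number per case. The following is only a rough sketch: taking the
midpoint of each observed range is my own simplification (the ranges are
what I read off during playback, not averages), and the names RESULTS,
midpoint and cpu_saving are just for illustration.

```python
# (media, driver, vdpau) -> ((cpu_min, cpu_max), (fps_min, fps_max)),
# copied from the eight test blocks above.
RESULTS = {
    ("HDV", "X11-OpenGL", True): ((8.1, 16.1), (10.24, 12.16)),
    ("HDV", "X11-OpenGL", False): ((20.4, 28.9), (11.39, 12.08)),
    ("HDV", "X11", True): ((7.9, 12.9), (29.97, 30.12)),
    ("HDV", "X11", False): ((13.5, 25.7), (29.97, 30.92)),
    ("H264", "X11-OpenGL", True): ((16.2, 28.0), (9.89, 13.74)),
    ("H264", "X11-OpenGL", False): ((20.4, 37.06), (9.99, 15.25)),
    ("H264", "X11", True): ((7.5, 10.5), (25.79, 30.06)),
    ("H264", "X11", False): ((15.6, 33.4), (30.00, 30.06)),
}


def midpoint(value_range):
    """Midpoint of an observed (low, high) range; an assumed summary."""
    low, high = value_range
    return (low + high) / 2


def cpu_saving(media, driver):
    """Midpoint CPU % without vdpau minus midpoint CPU % with vdpau."""
    without_hw = midpoint(RESULTS[(media, driver, False)][0])
    with_hw = midpoint(RESULTS[(media, driver, True)][0])
    return without_hw - with_hw


for media in ("HDV", "H264"):
    for driver in ("X11-OpenGL", "X11"):
        fps_mid = midpoint(RESULTS[(media, driver, True)][1])
        print(f"{media:5} {driver:11} "
              f"vdpau saves {cpu_saving(media, driver):5.2f} CPU points, "
              f"about {fps_mid:.1f} frame/sec with vdpau")
```

A positive saving in all four cases is consistent with what I describe
above, while the frame/sec column shows that only X11 stays near 29.97.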
Pierre
On 19-05-13 14 h 02, Phyllis Smith wrote:
> This is an amalgamation of email from Pierre / me that should have been
> in the mailing list but I missed seeing that.
>
> *_Summary first_* in case you don't want to read it all. And just FYI,
> Pierre tests on Mint18 and I use Fedora29 with totally different
> processors, number of CPUs and even brand of graphics boards.
>
> 1) Pierre: I don't really see any gains either with X11 or X11-OpenGL,
> the viewing in the composer may be a little more fluid but I'm not sure
> ... shouldn't vdpau be able to decode these mpeg?
> 2) Phyllis: When you use X11-OpenGL, which was written long ago when
> there was mostly only 1 CPU so only 1 thread, the computer to the
> Graphics board can become bottlenecked with Cinelerra calling for OpenGL
> graphics and at the same time the GPU being used with vdpau (can not
> confirm that this is happening).
> 3) Pierre: I don't have the feeling that the GPU decodes video tracks
> under X11-OpenGL even in the case of mpeg proxies.
> 4) Phyllis: with certain hardware, I think you might be correct about
> GPU not doing the decoding under X11-OpenGL, but I can not find anything
> that corroborates that. I do see that with "loglevel=verbose" a vdpau
> device is created in either the X11 or the OpenGL driver case. But I am
> finding CPU usage is actually higher with the X11-OpenGL driver PLUS
> vdpau than X11-OpenGL MINUS vdpau with my computer hardware.
> 5) Phyllis: on computers with lots of CPU cores it does not seem
> worthwhile to bother with using the graphics board GPU for decoding.
> And that might apply to encoding too in the case of the final render
> because using the Render Farm (on a single computer with lots of cores)
> is pretty fast.
>
> *_Pierre's Test Results_* (Intel computer with Nvidia graphics board)
> "DNxHD corrected.xml Test" file
> X11-OpenGL driver
> command CIN_HW_DEV=vdpau ./cin
>
> Proxys mpeg
> cpu % 21.6 - 43.7
> frame/sec 11.18 - 12.16
>
> DNxHD media
> cpu % 13.4 - 44.8
> frame/sec 11.54 - 12.16
>
>
> "DNxHD corrected.xml Test" file
> X11-OpenGL driver
> command /home/stone/Cinelerra-GG_5.1/cin
>
> Proxys mpeg
> cpu % 19.1 - 44.3
> frame/sec 11.39 - 12.24
>
> DNxHD media
> cpu % 19.6 - 44.5
> frame/sec 11.32 - 12.16
>
>
> "DNxHD corrected.xml Test" file
> X11 driver
> command CIN_HW_DEV=vdpau ./cin
>
> Proxys mpeg
> cpu % 19.5 - 41.7
> frame/sec 29.97 - 30.15
>
> DNxHD media
> cpu % 22.9 - 40.3
> frame/sec 28.24 - 31.15
>
>
> "DNxHD corrected.xml Test" file
> X11 driver
> command /home/stone/Cinelerra-GG_5.1/cin
>
> Proxys mpeg
> cpu % 23.08 - 42.4
> frame/sec 29.97 - 31.02
>
> DNxHD media
> cpu % 21.7 - 43.5
> frame/sec 29.97 - 31.02
>
>
> Interesting....
>
> At first glance, I would say that:
>
> X11-OpenGL with or without vdpau
>
> Does not decode DNxHD sources
>
> Does not decode mpeg proxies
>
> X11 with or without vdpau
>
> Decodes DNxHD sources
>
> Decodes mpeg proxies
>
>
> I think I will now have to do some identical tests with HDV and H264.mp4
> sources.
>
> *_Short Phyllis tests_* (AMD computer, Radeon graphics board)
> using the proxy Mpeg version, I see:
> 11% cpu usage with X11-OpenGL
> 13% cpu usage with X11-OpenGL + vdpau/GPU
> 16% cpu usage with X11 + vdpau/GPU
> 21% cpu usage with X11
>
> _*Most of the rest of the email thread is below.*_
>
> Pierre Observation:
>
> I don't really see any gains either with X11 or X11-OpenGL, the viewing
> in the composer may be a little more fluid but I'm not sure.
>
> I'm surprised that X11-OpenGL can't be fluid with mpeg.mpeg proxies,
> while it's much more fluid with Clowns' h264.mpeg and X11 does it all
> much better.
>
> But, as it is the mpeg.mpeg proxies that I actually use, shouldn't vdpau
> be able to decode these mpeg?
>
> Phyllis some response:
>
> So I had 4 dnxhd files from previous reports and I proxied them as 1/2
> mpeg-s. Although it is not obvious that they are using vdpau, they
> actually are. Bill reminded me to edit ffmpeg/decode.opts and change
> "loglevel=fatal" to "loglevel=verbose", restart Cinelerra and then in
> the Cinelerra startup window you will see messages for the Mpegs (you
> might have to also edit bin/ffmpeg/decode.opts):
>
> [AVHWDeviceContext @ 0x7fff182c3cc0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
> [AVHWDeviceContext @ 0x7ffea8afc980] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
> [AVHWDeviceContext @ 0x7fff6c1b12c0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
> [AVHWDeviceContext @ 0x7fff6f223300] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
>
> When you use X11-OpenGL, which was written long ago when there was
> mostly only 1 CPU so only 1 thread, the computer to the Graphics board
> can become bottlenecked with Cinelerra calling for OpenGL graphics and
> at the same time the GPU being used with vdpau.
> ----------------------------------------------------------------------------------------------
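
A note on the decode.opts change described above: it can also be done
with a few lines of Python instead of a text editor. This is only a
sketch; the two option strings and the file names come from Phyllis's
message, while the function name and the idea of passing the path as an
argument are my own assumptions (the path depends on where CinGG is
installed).

```python
from pathlib import Path


def enable_verbose_loglevel(opts_path):
    """Switch loglevel=fatal to loglevel=verbose in a decode.opts file.

    Returns True if the file was changed, False if the fatal setting
    was not found (for example because it was already edited).
    """
    path = Path(opts_path)
    text = path.read_text()
    updated = text.replace("loglevel=fatal", "loglevel=verbose")
    if updated == text:
        return False
    path.write_text(updated)
    return True
```

It would be called once per file, for example on ffmpeg/decode.opts and,
if needed, on bin/ffmpeg/decode.opts, before restarting Cinelerra.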
>
> Pierre Observation:
>
> What surprises me though is that this difficulty does not exist under
> X11; the accumulation of video tracks from mixers does not cause the
> composer to slow down under this video driver.
>
> When I play DNxHD sources the CPU usage is 37-41%,
> If I play proxies in mpeg, the CPU usage is 10-18%.
> But in both cases (DNxHD and proxies in mpeg), the frame rate (in
> CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be
> 29.97 frame/s).
>
> Phyllis some response:
>
> When I play it, X11-OpenGL slows down so that the mixers are done
> playing and compositor is only on frame 106; but with X11 I still have
> slow down, just not as bad. The mixers are done playing but the
> compositor is still playing at about frame 175 - so for me X11 does
> still have the difficulty.
>
> I suspect that your computer is faster with more cores. For example,
> with X11-OpenGL, if I run the "top" command from another window and
> watch it, I see it goes to 489% so is using threads/multiple cores BUT
> it must be waiting on the single threaded Graphics Board. When I just
> use X11, the program is not waiting on the graphics board and runs at
> 600%. The graphics board is likely a bottleneck.
> ---------------------------------------------------------------------------------------------------
>
> Pierre Observation:
>
> Under the X11 video driver:
> With DNxHD, the CPU is at 35% and the frame rate at 29.97-30 frame/s
> With Proxys in mpeg, the CPU is at 6.5-10.9% and the frame rate is also
> at 29.97-30 frame/s.
>
> I don't have the feeling that the GPU decodes video tracks under
> X11-OpenGL even in the case of mpeg proxies.
>
> Secondly,
>
> Under the X11-OpenGL video driver:
> When I play DNxHD sources the CPU usage is 37-41%,
> If I play proxies in mpeg, the CPU usage is 10-18%.
> But in both cases (DNxHD and proxies in mpeg), the frame rate (in
> CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be
> 29.97 frame/s).
>
> Phyllis some response:
>
> Unfortunately, I am not seeing this. For example with:
> EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.mov
> I see 29.97 fps in preferences and 60% CPU
> EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.proxy2-mov.mpeg
> and here I see 29.97 fps and 70% CPU
>
> Something else must be causing the problem you see. GG noticed that
> these DNxHD sources are very large in size but I do not think that the
> disk I/O would slow anything down.
>
> I think you might be correct about GPU not doing the decoding under
> X11-OpenGL, but I can not find anything that corroborates that. I do
> see that with "loglevel=verbose" a vdpau device is created in either the
> X11 or the OpenGL driver case. But I am finding CPU usage is actually
> higher with the X11-OpenGL driver PLUS vdpau than X11-OpenGL MINUS
> vdpau. So I thought it was a really bad idea to use OpenGL and
> GPU/vdpau together.
>
> However, Sam and Andrea thought they got better results using X11-OpenGL
> than X11 with vdpau enabled. This has been mystifying to me as I
> definitely only saw good improvements using X11. Since we can not
> figure it out, I have decided that it might be due to the actual
> Graphics Card being used in conjunction with the Nvidia driver and
> operating system.
>
> Summary - on computers with lots of CPU cores it does not seem
> worthwhile to bother with using the graphics board GPU for decoding.
> And that probably applies to encoding too in the case of the final
> render because using the Render Farm (on a single computer with an Epyc
> chip) is so fast as to be trivial.
> ------------------------------------------------------------------------------------------------------------
>
> Pierre tests on the following:
>
> The processor of my computer is an i7-3770k, so it has 4 physical cores,
> 8 threads by Hyper-Threading (2 processing threads per physical core) at
> 3.50 GHz (turbo 3.90 GHz). 32 GB of ram and an nVidia GTX-750ti
> extension video card.
>
> The nVidia card, which is a few years old, is not very powerful for a
> gamer type card (the really powerful models have become extremely
> expensive since the arrival of bitcoin mining...). This video card is
> especially useful to me because of its ability to manage 4 monitors
> 1920x1080 simultaneously. In my case, it is connected to three monitors.
>
> I therefore suspect that this video card does not offer a significantly
> greater performance gain than the possibilities of my CPU...
>
> Phyllis tests on the following:
>
> AMD 8-Core RYZEN 7 1700 Processor, 3.0 GHz
> cache size : 512 KB
> memory : 64 GB
> Radeon Graphics Board : Radeon RX580 4GB
>
>
>
>