[Cin] Observations using GPU on DNxHD and MPEG proxy while running CinelerraGG
Phyllis Smith
phylsmith2017 at gmail.com
Mon May 13 20:02:01 CEST 2019
This is an amalgamation of emails between Pierre and me that should have
been on the mailing list, but I missed seeing that they were not.
*Summary first* in case you don't want to read it all. And just FYI,
Pierre tests on Mint18 and I use Fedora29 with totally different
processors, number of CPUs and even brand of graphics boards.
1) Pierre: I don't really see any gains either with X11 or X11-OpenGL, the
viewing in the composer may be a little more fluid but I'm not sure ...
shouldn't vdpau be able to decode these mpeg?
2) Phyllis: When you use X11-OpenGL, which was written long ago when there
was mostly only 1 CPU and so only 1 thread, the path from the computer to
the graphics board can become a bottleneck, with Cinelerra calling for
OpenGL graphics at the same time as the GPU is being used by vdpau (I can
not confirm that this is happening).
3) Pierre: I don't have the feeling that the GPU decodes video tracks under
X11-OpenGL even in the case of mpeg proxies.
4) Phyllis: with certain hardware, I think you might be correct about GPU
not doing the decoding under X11-OpenGL, but I can not find anything that
corroborates that. I do see that with "loglevel=verbose" a vdpau device is
created in either the X11 or the OpenGL driver case. But I am finding CPU
usage is actually higher with the X11-OpenGL driver PLUS vdpau than
X11-OpenGL MINUS vdpau with my computer hardware.
5) Phyllis: on computers with lots of CPU cores it does not seem worthwhile
to bother with using the graphics board GPU for decoding. And that might
apply to encoding too in the case of the final render because using the
Render Farm (on a single computer with lots of cores) is pretty fast.
*Pierre's Test Results* (Intel computer with Nvidia graphics board)
All runs use the "DNxHD corrected.xml Test" file.

Video driver  Command                           Media         CPU %         Frames/sec
X11-OpenGL    CIN_HW_DEV=vdpau ./cin            MPEG proxies  21.6 - 43.7   11.18 - 12.16
X11-OpenGL    CIN_HW_DEV=vdpau ./cin            DNxHD media   13.4 - 44.8   11.54 - 12.16
X11-OpenGL    /home/stone/Cinelerra-GG_5.1/cin  MPEG proxies  19.1 - 44.3   11.39 - 12.24
X11-OpenGL    /home/stone/Cinelerra-GG_5.1/cin  DNxHD media   19.6 - 44.5   11.32 - 12.16
X11           CIN_HW_DEV=vdpau ./cin            MPEG proxies  19.5 - 41.7   29.97 - 30.15
X11           CIN_HW_DEV=vdpau ./cin            DNxHD media   22.9 - 40.3   28.24 - 31.15
X11           /home/stone/Cinelerra-GG_5.1/cin  MPEG proxies  23.08 - 42.4  29.97 - 31.02
X11           /home/stone/Cinelerra-GG_5.1/cin  DNxHD media   21.7 - 43.5   29.97 - 31.02
Interesting....
At first glance, I would say that:
X11-OpenGL, with or without vdpau:
  does not decode DNxHD sources
  does not decode MPEG proxies
X11, with or without vdpau:
  decodes DNxHD sources
  decodes MPEG proxies
I think I will now have to do some identical tests with HDV and H264.mp4
sources.
*Short Phyllis tests* (AMD computer, Radeon graphics board)
Using the MPEG proxy version, I see:
11% cpu usage with X11-OpenGL
13% cpu usage with X11-OpenGL + vdpau/GPU
16% cpu usage with X11 + vdpau/GPU
21% cpu usage with X11
*Most of the rest of the email thread is below.*
Pierre Observation:
I don't really see any gains either with X11 or X11-OpenGL, the viewing in
the composer may be a little more fluid but I'm not sure.
I'm surprised that X11-OpenGL can't be fluid with mpeg.mpeg proxies,
while it's much more fluid with Clowns' h264.mpeg and X11 does it all
much better.
But, as it is the mpeg.mpeg proxies that I actually use, shouldn't vdpau
be able to decode these mpeg?
Phyllis's response:
So I had 4 DNxHD files from previous reports and I proxied them as 1/2
mpegs. Although it is not obvious that they are using vdpau, they actually
are. Bill reminded me to edit ffmpeg/decode.opts and change
"loglevel=fatal" to "loglevel=verbose", restart Cinelerra, and then in the
Cinelerra startup window you will see messages for the mpegs (you might
also have to edit bin/ffmpeg/decode.opts):
[AVHWDeviceContext @ 0x7fff182c3cc0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7ffea8afc980] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7fff6c1b12c0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
[AVHWDeviceContext @ 0x7fff6f223300] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
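The logging change described above can be scripted. This is only a minimal sketch: the function name enable_verbose_decode_log is made up for illustration, and it assumes the options file contains the literal text "loglevel=fatal" as quoted in this thread. Which decode.opts applies (ffmpeg/decode.opts or bin/ffmpeg/decode.opts) depends on your install.

```shell
# Hypothetical helper (not part of CinelerraGG): switch ffmpeg decode
# logging from "fatal" to "verbose" in the given decode.opts file.
enable_verbose_decode_log() {
    opts_file="$1"   # e.g. ffmpeg/decode.opts or bin/ffmpeg/decode.opts
    # -i.bak edits in place and keeps the original as <file>.bak
    sed -i.bak 's/loglevel=fatal/loglevel=verbose/' "$opts_file"
}

# Example use, run from the CinelerraGG directory (paths are assumptions):
#   enable_verbose_decode_log bin/ffmpeg/decode.opts
#   CIN_HW_DEV=vdpau ./cin     # then watch the startup window for VDPAU lines
```

To go back to quiet logging, copy the .bak file over the edited one.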
When you use X11-OpenGL, which was written long ago when there was mostly
only 1 CPU and so only 1 thread, the path from the computer to the graphics
board can become a bottleneck, with Cinelerra calling for OpenGL graphics
at the same time as the GPU is being used by vdpau.
----------------------------------------------------------------------------------------------
Pierre Observation:
What surprises me though is that this difficulty does not exist under
X11; the accumulation of video tracks from mixers does not cause the
composer to slow down under this video driver.
When I play DNxHD sources the CPU usage is 37-41%.
If I play proxies in mpeg, the CPU usage is 10-18%.
But in both cases (DNxHD and proxies in mpeg), the frame rate (in
CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be
29.97 frame/s).
Phyllis's response:
When I play it, X11-OpenGL slows down so that the mixers are done playing
and the compositor is only on frame 106; with X11 I still have a slowdown,
just not as bad. The mixers are done playing but the compositor is still
playing at about frame 175, so for me X11 does still have the difficulty.
I suspect that your computer is faster, with more cores. For example, with
X11-OpenGL, if I run the "top" command from another window and watch it, I
see it go to 489%, so it is using threads/multiple cores BUT it must be
waiting on the single-threaded graphics board, since when I just use X11
the program is not waiting on the graphics board and runs at 600%. The
graphics board is likely a bottleneck.
---------------------------------------------------------------------------------------------------
Pierre Observation:
Under the X11 video driver:
With DNxHD, the CPU is at 35% and the frame rate at 29.97-30 frame/s
With Proxys in mpeg, the CPU is at 6.5-10.9% and the frame rate is also
at 29.97-30 frame/s.
I don't have the feeling that the GPU decodes video tracks under
X11-OpenGL even in the case of mpeg proxies.
Secondly,
Under the X11-OpenGL video driver:
When I play DNxHD sources the CPU usage is 37-41%.
If I play proxies in mpeg, the CPU usage is 10-18%.
But in both cases (DNxHD and proxies in mpeg), the frame rate (in
CinGG's preferences) is 11-12 frame/s (whereas the normal rate should be
29.97 frame/s).
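To put the 11-12 frame/s figure in perspective, the playback rate can be expressed as a percentage of the project rate. A minimal sketch using awk; fps_percent is a made-up helper name, not a CinelerraGG command:

```shell
# Hypothetical helper: print the achieved frame rate as a percentage
# of the target (project) frame rate, to one decimal place.
fps_percent() {
    achieved="$1"   # frames/sec shown in Preferences during playback
    target="$2"     # project frame rate, e.g. 29.97
    awk -v a="$achieved" -v t="$target" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

fps_percent 11 29.97   # low end reported under X11-OpenGL
fps_percent 12 29.97   # high end reported under X11-OpenGL
```

With the numbers above, X11-OpenGL playback works out to roughly 37% to 40% of real time, while the X11 figures of 29.97-30 frame/s are essentially 100%.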
Phyllis's response:
Unfortunately, I am not seeing this. For example with:
EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.mov
I see 29.97 fps in preferences and 60% CPU
EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.proxy2-mov.mpeg
and here I see 29.97 fps and 70% CPU
Something else must be causing the problem you see. GG noticed that these
DNxHD sources are very large in size but I do not think that the disk I/O
would slow anything down.
I think you might be correct about GPU not doing the decoding under
X11-OpenGL, but I can not find anything that corroborates that. I do see
that with "loglevel=verbose" a vdpau device is created in either the X11 or
the OpenGL driver case. But I am finding CPU usage is actually higher with
the X11-OpenGL driver PLUS vdpau than X11-OpenGL MINUS vdpau. So I thought
it was a really bad idea to use OpenGL and GPU/vdpau together.
However, Sam and Andrea thought they got better results using X11-OpenGL
than X11 with vdpau enabled. This has been mystifying to me as I
definitely only saw good improvements using X11. Since we can not figure
it out, I have decided that it might be due to the actual Graphics Card
being used in conjunction with the Nvidia driver and operating system.
Summary - on computers with lots of CPU cores it does not seem worthwhile
to bother with using the graphics board GPU for decoding. And that
probably applies to encoding too in the case of the final render because
using the Render Farm (on a single computer with an Epyc chip) is so fast
as to be trivial.
------------------------------------------------------------------------------------------------------------
Pierre tests on the following:
The processor of my computer is an i7-3770k, so it has 4 physical cores
and 8 threads with Hyper-Threading (2 processing threads per physical
core) at 3.50 GHz (turbo 3.90 GHz), with 32 GB of RAM and an nVidia
GTX-750ti video expansion card.
The nVidia card, which is a few years old, is not very powerful for a
gamer type card (the really powerful models have become extremely
expensive since the arrival of bitcoin mining...). This video card is
especially useful to me because of its ability to manage 4 monitors
1920x1080 simultaneously. In my case, it is connected to three monitors.
I therefore suspect that this video card does not offer a significantly
greater performance gain than the possibilities of my CPU...
Phyllis tests on the following:
AMD 8-Core RYZEN 7 1700 Processor, 3.0 GHz
cache size : 512 KB
memory : 64 GB
Radeon Graphics Board : Radeon RX580 4GB