[Cin] Observations using GPU on DNxHD and MPEG proxy while running CinelerraGG

Pierre autourduglobe p.autourduglobe at gmail.com
Wed May 15 21:30:33 CEST 2019


This time I completed my tests by adding proxies to the HDV and AVC 
H264.mp4 test projects from yesterday, again with 4 mixers.

These are MPEG proxies with the same characteristics as those I had 
tested with the DNxHD media.

The results of these MPEG proxy tests tell me that, with both the 
X11-OpenGL driver and the X11 driver, using vdpau gives a very slight 
reduction in my CPU usage, but that this does not improve the frame 
rate these video drivers are able to display...

In all cases, X11 is able to display sources shot at 29.97 frames/sec 
at a rate of at least 29.97 frames/sec.

X11-OpenGL is always limited to a maximum of about 12 frames/sec.

These results hold approximately for all the media types I tested, 
whether DNxHD.mov, HDV (MPEG-2.m2t), AVC H264.mp4 or even MPEG 
proxies (mpeg.mpeg).

Given these results, I don't really see the advantage of using 
proxies... In any case, the video driver determines the achievable 
frame rate, regardless of the type of media.

I'm actually wondering whether the constant limit of 12 frames/sec 
that X11-OpenGL gives in my tests with 4 mixers, regardless of the 
media type, indicates a bug somewhere or a limit inherent in my 
equipment. But then how do you explain the better throughput with X11?


"Test HDV.xml" file
Proxy mpeg (with 4 mixers)
X11-OpenGL driver
command CIN_HW_DEV=vdpau ./cin

cpu % 11.9 - 21.4
frame/sec 12.00 - 12.08


"Test HDV.xml" file
Proxy mpeg (with 4 mixers)
X11-OpenGL driver
command ./cin

cpu % 14.5 - 23.8
frame/sec 11.84 - 12.14


"Test HDV.xml" file
Proxy mpeg (with 4 mixers)
X11 driver
command CIN_HW_DEV=vdpau ./cin

cpu % 9.6 - 18.6
frame/sec 29.97 - 30.12


"Test HDV.xml" file
Proxy mpeg (with 4 mixers)
X11 driver
command ./cin

cpu % 12.8 - 19.2
frame/sec 29.97 - 30.09


"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11-OpenGL driver
command CIN_HW_DEV=vdpau ./cin

cpu % 12.5 - 20.7
frame/sec 9.6 - 12.08


"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11-OpenGL driver
command ./cin

cpu % 14.3 - 22.05
frame/sec 11.97 - 12.08


"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11 driver
command CIN_HW_DEV=vdpau ./cin

cpu % 10.5 - 17.1
frame/sec 30.00 - 30.12


"Test Clowns .xml" file
AVC H264.mp4 media (with 4 mixers)
X11 driver
command ./cin

cpu % 10.8 - 17.5
frame/sec 30.00 - 30.03
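The point that the driver, not vdpau or the media type, sets the frame-rate ceiling can be checked by grouping the eight runs above by driver. This is a minimal Python sketch: the numbers are copied from the results above, while the tuple layout, the `summarize` helper and the use of range midpoints as a rough CPU average are hypothetical illustrative choices, not anything from Cinelerra-GG.

```python
from collections import defaultdict
from statistics import mean

# (project, driver, vdpau on?, cpu % range, frame/sec range), copied from the runs above
RUNS = [
    ("Test HDV", "X11-OpenGL", True, (11.9, 21.4), (12.00, 12.08)),
    ("Test HDV", "X11-OpenGL", False, (14.5, 23.8), (11.84, 12.14)),
    ("Test HDV", "X11", True, (9.6, 18.6), (29.97, 30.12)),
    ("Test HDV", "X11", False, (12.8, 19.2), (29.97, 30.09)),
    ("Test Clowns", "X11-OpenGL", True, (12.5, 20.7), (9.6, 12.08)),
    ("Test Clowns", "X11-OpenGL", False, (14.3, 22.05), (11.97, 12.08)),
    ("Test Clowns", "X11", True, (10.5, 17.1), (30.00, 30.12)),
    ("Test Clowns", "X11", False, (10.8, 17.5), (30.00, 30.03)),
]

def midpoint(value_range):
    """Middle of a (low, high) range, used as a rough average."""
    low, high = value_range
    return (low + high) / 2

def summarize(runs):
    """Group runs by (driver, vdpau) and report the frame/sec span and mean CPU midpoint."""
    groups = defaultdict(list)
    for _project, driver, vdpau, cpu, fps in runs:
        groups[(driver, vdpau)].append((cpu, fps))
    return {
        key: {
            "min_fps": min(fps[0] for _, fps in rows),
            "max_fps": max(fps[1] for _, fps in rows),
            "mean_cpu": mean(midpoint(cpu) for cpu, _ in rows),
        }
        for key, rows in groups.items()
    }

if __name__ == "__main__":
    for (driver, vdpau), s in sorted(summarize(RUNS).items()):
        flag = "with vdpau" if vdpau else "no vdpau"
        print(f"{driver:<10} {flag:<10} fps {s['min_fps']:.2f}-{s['max_fps']:.2f}"
              f"  mean cpu {s['mean_cpu']:.1f}%")
```

Grouped this way, X11-OpenGL never goes above about 12.1 frames/sec and X11 never drops below 29.97, with or without vdpau, while the vdpau rows only show a slightly lower average CPU midpoint.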


Pierre



On 19-05-14 18 h 12, Pierre autourduglobe wrote:
> I continued my tests.
> 
> First with HDV files and then with H264.mp4 AVC files from my old cell 
> phone.
> 
> In each of these tests I used 4 mixers.
> 
> I ran them with the X11-OpenGL driver, then with X11, each
> with and without CIN_HW_DEV=vdpau on the ./cin command line.
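For repeatable A/B runs, the two launch variants can be scripted. This is a minimal Python sketch, assuming the Cinelerra-GG binary is `./cin` as in the commands used here; the helper names `cin_env` and `launch_cin` are hypothetical. Removing any inherited `CIN_HW_DEV` is a deliberate choice so that the "without vdpau" run really has the variable unset.

```python
import os
import subprocess

def cin_env(hw_dev=None, base=None):
    """Copy the environment, setting CIN_HW_DEV only when hw_dev is given."""
    env = dict(os.environ if base is None else base)
    # Drop any inherited value so the "without vdpau" run really has it unset.
    env.pop("CIN_HW_DEV", None)
    if hw_dev is not None:
        env["CIN_HW_DEV"] = hw_dev
    return env

def launch_cin(cin_path="./cin", hw_dev=None):
    """Start Cinelerra-GG; hw_dev="vdpau" matches 'CIN_HW_DEV=vdpau ./cin'."""
    return subprocess.Popen([cin_path], env=cin_env(hw_dev))
```

For example, `launch_cin(hw_dev="vdpau")` corresponds to `CIN_HW_DEV=vdpau ./cin`, and `launch_cin()` to a plain `./cin`.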
> 
> Unlike the DNxHD files and MPEG proxy files that I tested in the same
> way, which did not seem to show any reduction in CPU usage, the results
> with these HDV and H264.mp4 AVC files seem to indicate that with both
> X11-OpenGL and X11, using vdpau reduces the % of CPU usage.
> 
> However, and this is what seems essential to me, with the HDV and
> H264.mp4 AVC files only X11 is able to sustain an acceptable frame rate
> (around 29.97 frames/sec) in all these tests with 4 mixers.
> 
> 
> 
> "Test HDV.xml" file
> HDV (MPEG-2.m2t) media (with 4 mixers)
> X11-OpenGL driver
> command CIN_HW_DEV=vdpau ./cin
> 
> cpu % 8.1 - 16.1
> frame/sec 10.24 - 12.16
> 
> 
> "Test HDV.xml" file
> HDV (MPEG-2.m2t) media (with 4 mixers)
> X11-OpenGL driver
> command ./cin
> 
> cpu % 20.4 - 28.9
> frame/sec 11.39 - 12.08
> 
> 
> "Test HDV.xml" file
> HDV (MPEG-2.m2t) media (with 4 mixers)
> X11 driver
> command CIN_HW_DEV=vdpau ./cin
> 
> cpu % 7.9 - 12.9
> frame/sec 29.97 - 30.12
> 
> 
> "Test HDV.xml" file
> HDV (MPEG-2.m2t) media (with 4 mixers)
> X11 driver
> command ./cin
> 
> cpu % 13.5 - 25.7
> frame/sec 29.97 - 30.92
> 
> 
> 
> "Test Clowns .xml" file
> AVC H264.mp4 media (with 4 mixers)
> X11-OpenGL driver
> command CIN_HW_DEV=vdpau ./cin
> 
> cpu % 16.2 - 28.0
> frame/sec 9.89 - 13.74
> 
> 
> "Test Clowns .xml" file
> AVC H264.mp4 media (with 4 mixers)
> X11-OpenGL driver
> command ./cin
> 
> cpu % 20.4 - 37.06
> frame/sec 9.99 - 15.25
> 
> 
> "Test Clowns .xml" file
> AVC H264.mp4 media (with 4 mixers)
> X11 driver
> command CIN_HW_DEV=vdpau ./cin
> 
> cpu % 7.5 - 10.5
> frame/sec 25.79 - 30.06
> 
> 
> "Test Clowns .xml" file
> AVC H264.mp4 media (with 4 mixers)
> X11 driver
> command ./cin
> 
> cpu % 15.6 - 33.4
> frame/sec 30.00 - 30.06
> 
> 
> Pierre
> 
> 
> 
> On 19-05-13 14 h 02, Phyllis Smith wrote:
>> This is an amalgamation of emails between Pierre and me that should 
>> have gone to the mailing list, but I missed seeing that.
>>
>> *_Summary first_* in case you don't want to read it all.  And just 
>> FYI, Pierre tests on Mint18 and I use Fedora29, with totally different 
>> processors, numbers of CPUs and even brands of graphics boards.
>>
>> 1) Pierre: I don't really see any gains either with X11 or X11-OpenGL, 
>> the viewing in the composer may be a little more fluid but I'm not 
>> sure ... shouldn't vdpau be able to decode these mpeg?
>> 2) Phyllis: When you use X11-OpenGL, which was written long ago when 
>> there was mostly only 1 CPU and so only 1 thread, the path from the 
>> computer to the graphics board can become a bottleneck, with Cinelerra 
>> calling for OpenGL graphics while the GPU is also being used by vdpau 
>> (I cannot confirm that this is happening).
>> 3) Pierre: I don't have the feeling that the GPU decodes video tracks 
>> under X11-OpenGL even in the case of mpeg proxies.
>> 4) Phyllis: with certain hardware, I think you might be correct about 
>> the GPU not doing the decoding under X11-OpenGL, but I cannot find 
>> anything that corroborates that.  I do see that with 
>> "loglevel=verbose" a vdpau device is created with either the X11 or the 
>> OpenGL driver.  But I am finding CPU usage is actually higher 
>> with the X11-OpenGL driver PLUS vdpau than X11-OpenGL MINUS vdpau with 
>> my computer hardware.
>> 5) Phyllis: on computers with lots of CPU cores it does not seem 
>> worthwhile to bother with using the graphics board GPU for decoding. 
>> And that might apply to encoding too in the case of the final render 
>> because using the Render Farm (on a single computer with lots of 
>> cores) is pretty fast.
>>
>> *_Pierre's Test Results_* (Intel computer with Nvidia graphics board)
>> "DNxHD corrected.xml Test" file
>> X11-OpenGL driver
>> command CIN_HW_DEV=vdpau ./cin
>>
>> MPEG proxies
>> cpu % 21.6 - 43.7
>> frame/sec 11.18 - 12.16
>>
>> DNxHD media
>> cpu % 13.4 - 44.8
>> frame/sec 11.54 - 12.16
>>
>>
>> "DNxHD corrected.xml Test" file
>> X11-OpenGL driver
>> command /home/stone/Cinelerra-GG_5.1/cin
>>
>> MPEG proxies
>> cpu % 19.1 - 44.3
>> frame/sec 11.39 - 12.24
>>
>> DNxHD media
>> cpu % 19.6 - 44.5
>> frame/sec 11.32 - 12.16
>>
>>
>> "DNxHD corrected.xml Test" file
>> X11 driver
>> command CIN_HW_DEV=vdpau ./cin
>>
>> MPEG proxies
>> cpu % 19.5 - 41.7
>> frame/sec 29.97 - 30.15
>>
>> DNxHD media
>> cpu % 22.9 - 40.3
>> frame/sec 28.24 - 31.15
>>
>>
>> "DNxHD corrected.xml Test" file
>> X11 driver
>> command /home/stone/Cinelerra-GG_5.1/cin
>>
>> MPEG proxies
>> cpu % 23.08 - 42.4
>> frame/sec 29.97 - 31.02
>>
>> DNxHD media
>> cpu % 21.7 - 43.5
>> frame/sec 29.97 - 31.02
>>
>>
>> Interesting....
>>
>> At first glance, I would say that:
>>
>> X11-OpenGL with or without vdpau
>>
>> Does not decode DNxHD sources
>>
>> Does not decode MPEG proxies
>>
>> X11 with or without vdpau
>>
>> Decodes DNxHD sources
>>
>> Decodes MPEG proxies
>>
>>
>> I think I will now have to do some identical tests with HDV and 
>> H264.mp4 sources.
>>
>> *Short Phyllis tests* (AMD computer, Radeon graphics board)
>>   using the proxy Mpeg version, I see:
>>    11% cpu usage with X11-OpenGL
>>    13% cpu usage with X11-OpenGL + vdpau/GPU
>>    16% cpu usage with X11 + vdpau/GPU
>>    21% cpu usage with X11
>>
>> _*Most of the rest of the email thread is below.*_
>>
>> Pierre Observation:
>>
>> I don't really see any gains either with X11 or X11-OpenGL, the 
>> viewing in the composer may be a little more fluid but I'm not sure.
>>
>> I'm surprised that X11-OpenGL can't be fluid with mpeg.mpeg proxies,
>> while it's much more fluid with Clowns' h264.mpeg and X11 does it all
>> much better.
>>
>> But, as it is the mpeg.mpeg proxies that I actually use, shouldn't vdpau
>> be able to decode these mpeg?
>>
>> Phyllis's response:
>>
>> So I had 4 DNxHD files from previous reports and I proxied them as 
>> 1/2-size MPEGs.  Although it is not obvious that they are using vdpau, 
>> they actually are.  Bill reminded me to edit ffmpeg/decode.opts and 
>> change "loglevel=fatal" to "loglevel=verbose", restart Cinelerra, and 
>> then in the Cinelerra startup window you will see messages for the 
>> MPEGs (you might also have to edit bin/ffmpeg/decode.opts):
>>
>> [AVHWDeviceContext @ 0x7fff182c3cc0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
>> [AVHWDeviceContext @ 0x7ffea8afc980] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
>> [AVHWDeviceContext @ 0x7fff6c1b12c0] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
>> [AVHWDeviceContext @ 0x7fff6f223300] Successfully created a VDPAU device (G3DVL VDPAU Driver Shared Library version 1.0) on X11 display :0
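Checking whether vdpau is in use can be scripted as well. The Python sketch below assumes only what is described above: that decode.opts contains a `loglevel=fatal` line and that the startup window prints one "Successfully created a VDPAU device" message per decoder that opened the device. The function names are hypothetical, and the path you pass (ffmpeg/decode.opts or bin/ffmpeg/decode.opts) depends on your installation.

```python
import re
from pathlib import Path

VDPAU_OK = "Successfully created a VDPAU device"

def set_loglevel(opts_text, level="verbose"):
    """Rewrite any 'loglevel=...' line in decode.opts text to the given level."""
    return re.sub(r"(?m)^loglevel=\S*$", f"loglevel={level}", opts_text)

def enable_verbose(path):
    """Edit a decode.opts file in place; Cinelerra must be restarted afterwards."""
    p = Path(path)
    p.write_text(set_loglevel(p.read_text()))

def count_vdpau_devices(log_text):
    """Count FFmpeg messages reporting that a VDPAU device was created."""
    return log_text.count(VDPAU_OK)
```

Four matches, as in the output above, mean four decoders opened a VDPAU device; as the summary points out, this alone does not prove that the GPU does the decoding under X11-OpenGL.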
>>
>> When you use X11-OpenGL, which was written long ago when there was 
>> mostly only 1 CPU and so only 1 thread, the path from the computer to 
>> the graphics board can become a bottleneck, with Cinelerra calling for 
>> OpenGL graphics while the GPU is also being used by vdpau.
>> ---------------------------------------------------------------------------------------------- 
>>
>>
>> Pierre Observation:
>>
>> What surprises me though is that this difficulty does not exist under
>> X11; the accumulation of video tracks from mixers does not cause the
>> composer to slow down under this video driver. Under X11-OpenGL, when I
>> play DNxHD sources the CPU usage is 37-41%, and if I play MPEG proxies,
>> the CPU usage is 10-18%. But in both cases (DNxHD and MPEG proxies), the
>> frame rate (in CinGG's preferences) is 11-12 frame/s (whereas the normal
>> rate should be 29.97 frame/s).
>>
>> Phyllis's response:
>>
>> When I play it, X11-OpenGL slows down so that the mixers are done 
>> playing while the compositor is only on frame 106; with X11 I still 
>> have a slowdown, just not as bad: the mixers are done playing but the 
>> compositor is still playing at about frame 175.  So for me X11 does 
>> still have the difficulty.
>>
>> I suspect that your computer is faster with more cores.  For example, 
>> with X11-OpenGL, if I run the "top" command from another window and 
>> watch it, I see it go to 489%, so it is using threads/multiple cores, 
>> BUT it must be waiting on the single-threaded graphics board, since 
>> when I just use X11, the program is not waiting on the graphics board 
>> and runs at 600%.  The graphics board is likely a bottleneck.
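The percentages quoted here are top's per-process %CPU, where 100% means one fully busy core, so 489% or 600% means roughly five or six cores' worth of work. A minimal Python sketch of the same measurement on Linux, reading the process's accumulated user and system time from /proc/<pid>/stat; the helper names are hypothetical and the one-second default interval is an arbitrary choice.

```python
import os
import time

# Kernel clock ticks per second; utime/stime in /proc/<pid>/stat use this unit.
CLK_TCK = os.sysconf("SC_CLK_TCK")

def parse_cpu_ticks(stat_line):
    """Return utime + stime (fields 14 and 15 of /proc/<pid>/stat) in clock ticks."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split only after the last ')'; the remaining list starts at field 3.
    fields = stat_line[stat_line.rindex(")") + 1:].split()
    return int(fields[11]) + int(fields[12])  # field N is at index N - 3

def read_cpu_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())

def cpu_percent(pid, interval=1.0):
    """CPU use over `interval` seconds, summed over all threads (100% = one core)."""
    ticks_before = read_cpu_ticks(pid)
    t_before = time.monotonic()
    time.sleep(interval)
    ticks_after = read_cpu_ticks(pid)
    elapsed = time.monotonic() - t_before
    return 100.0 * (ticks_after - ticks_before) / CLK_TCK / elapsed
```

Called with the PID of the running cin process (found with `pgrep cin`, for instance), `cpu_percent(pid)` should give figures comparable to what top shows.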
>> --------------------------------------------------------------------------------------------------- 
>>
>>
>> Pierre Observation:
>>
>> Under the X11 video driver:
>> With DNxHD, the CPU is at 35% and the frame rate at 29.97-30 frame/s
>> With MPEG proxies, the CPU is at 6.5-10.9% and the frame rate is also
>> at 29.97-30 frame/s.
>>
>> I don't have the feeling that the GPU decodes video tracks under
>> X11-OpenGL even in the case of mpeg proxies.
>>
>>      Secondly,
>>
>>      Under the X11-OpenGL video driver:
>>      When I play DNxHD sources the CPU usage is 37-41%,
>>      If I play MPEG proxies, the CPU usage is 10-18%.
>>      But in both cases (DNxHD and MPEG proxies), the frame rate (in
>>      CinGG's preferences) is 11-12 frame/s (whereas the normal rate
>>      should be 29.97 frame/s).
>>
>> Phyllis's response:
>>
>> Unfortunately, I am not seeing this.  For example with:
>>       EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.mov
>> I see 29.97 fps in preferences and 60% CPU
>>       EX-EGO Test DNxHD/Cam-4_MVI_1321_EX-EGO_cam-D.proxy2-mov.mpeg
>> and here I see 29.97 fps and 70% CPU
>>
>> Something else must be causing the problem you see.  GG noticed that 
>> these DNxHD sources are very large in size but I do not think that the 
>> disk I/O would slow anything down.
>>
>> I think you might be correct about the GPU not doing the decoding under 
>> X11-OpenGL, but I cannot find anything that corroborates that.  I do 
>> see that with "loglevel=verbose" a vdpau device is created with either 
>> the X11 or the OpenGL driver.  But I am finding CPU usage is 
>> actually higher with the X11-OpenGL driver PLUS vdpau than X11-OpenGL 
>> MINUS vdpau.  So I thought it was a really bad idea to use OpenGL and 
>> GPU/vdpau together.
>>
>> However, Sam and Andrea thought they got better results using 
>> X11-OpenGL than X11 with vdpau enabled.  This has been mystifying to 
>> me as I definitely only saw good improvements using X11.  Since we 
>> cannot figure it out, I have decided that it might be due to the actual 
>> graphics card being used in conjunction with the Nvidia driver and 
>> operating system.
>>
>> Summary - on computers with lots of CPU cores it does not seem 
>> worthwhile to bother with using the graphics board GPU for decoding. 
>> And that probably applies to encoding too in the case of the final 
>> render because using the Render Farm (on a single computer with an 
>> Epyc chip) is so fast as to be trivial.
>> ------------------------------------------------------------------------------------------------------------ 
>>
>>
>> Pierre tests on the following:
>>
>> The processor of my computer is an i7-3770k, so it has 4 physical cores,
>> 8 threads by Hyper-Threading (2 processing threads per physical core) at
>> 3.50 GHz (turbo 3.90 GHz), 32 GB of RAM and an nVidia GTX-750ti
>> extension video card.
>>
>> The nVidia card, which is a few years old, is not very powerful for a
>> gamer type card (the really powerful models have become extremely
>> expensive since the arrival of bitcoin mining...). This video card is
>> especially useful to me because of its ability to manage 4 monitors
>> 1920x1080 simultaneously. In my case, it is connected to three monitors.
>>
>> I therefore suspect that this video card does not offer a significant
>> performance gain over what my CPU can already do...
>>
>> Phyllis tests on the following:
>>
>>   AMD 8-Core RYZEN 7 1700 Processor, 3.0 GHz
>>    cache size      : 512 KB
>>    memory          : 64 GB
>>   Radeon Graphics Board  : Radeon RX580 4GB

