[Cin] OpenCL on CPU

Andrew Randrianasulu randrianasulu at gmail.com
Fri Apr 25 23:35:03 CEST 2025


Fri, 25 Apr 2025, 22:06 Andrew Randrianasulu <randrianasulu at gmail.com>:

>
>
> Thu, 24 Apr 2025, 19:54 Andrew Randrianasulu <randrianasulu at gmail.com>:
>
>>
>>
>> Thu, 24 Apr 2025, 18:39 Andrew Randrianasulu <randrianasulu at gmail.com>:
>>
>>> Note: OpenCL is different from OpenGL; it is mostly about computation,
>>> where accuracy matters more.
>>>
>>> Tested on an AMD FX-4300 with a 32-bit userspace, but LLVM probably uses AVX?
>>>
>>>
>>> guest at slax:/dev/shm/mesa/BUILD$ RUSTICL_ENABLE=llvmpipe  clpeak
>>>
>>> Platform: rusticl
>>>   Device: llvmpipe (LLVM 20.1.3, 256 bits)
>>>     Driver version  : 25.2.0-devel (git-845611bb43) (Linux x86)
>>>     Compute units   : 8
>>>     Clock frequency : 300 MHz
>>>
>>>     Global memory bandwidth (GBPS)
>>>       float   : 3.72
>>>       float2  : 4.08
>>>       float4  : 3.59
>>>       float8  : 2.81
>>>       float16 : 2.09
>>>
>>>     Single-precision compute (GFLOPS)
>>>       float   : 14.67
>>>       float2  : 17.86
>>>       float4  : 15.99
>>>       float8  : 14.72
>>>       float16 : 14.63
>>>
>>>     No half precision support! Skipped
>>>
>>>     No double precision support! Skipped
>>>
>>>     Integer compute (GIOPS)
>>>       int   : 13.89
>>>       int2  : 13.25
>>>       int4  : 12.85
>>>       int8  : 13.04
>>>       int16 : 11.51
>>>
>>>     Integer compute Fast 24bit (GIOPS)
>>>       int   : 13.65
>>>       int2  : 13.29
>>>       int4  : 13.23
>>>       int8  : 12.90
>>>       int16 : 11.08
>>>
>>>     Transfer bandwidth (GBPS)
>>>       enqueueWriteBuffer              : 2.82
>>>       enqueueReadBuffer               : 1.08
>>>       enqueueWriteBuffer non-blocking : 2.89
>>>       enqueueReadBuffer non-blocking  : 1.02
>>>       enqueueMapBuffer(for read)      : 1.15
>>>         memcpy from mapped ptr        : 3.02
>>>       enqueueUnmap(after write)       : 2.22
>>>         memcpy to mapped ptr          : 3.01
>>>
>>>     Kernel launch latency : 21.55 us
>>>
>>> guest at slax:/dev/shm/mesa/BUILD$
>>>
>>> Command to build a somewhat minimal Mesa (llvmpipe + AMD drivers):
>>>
>>>
>>> meson ../ --prefix=/usr/X11R7 --libdir=lib --strip --buildtype
>>> debugoptimized -Degl=enabled -Dosmesa=true -Dplatforms=x11
>>> -Dgallium-drivers=r600,radeonsi,llvmpipe -Dvulkan-drivers=amd,swrast
>>> -Dgallium-nine=true -Dgallium-va=enabled  -Dgallium-xa=disabled
>>> -Dgallium-rusticl=true -Dllvm=enabled -Drust_std=2021  -Dvideo-codecs="all"
>>>
>>> Of course you can set your own prefix (I have X installed in a
>>> non-default location).
>>>
>>> The biggest obstacle for me was that Mesa git requires a fairly new LLVM,
>>> plus SPIRV-Tools-2024.4, which was released just two days ago!
>>>
>>> And the GitHub "release" is of course broken, in the sense that you need
>>> to manually fetch the headers at a specific commit.
>>>
>>> Of course a "real GPU" will reach something like >200 GFLOPS; even my puny
>>> GF710 was that fast, but the possibility of a lockup makes this option less
>>> attractive ;)
>>>
>>
>>
>>
>> But of course a real ffmpeg command fails mysteriously:
>>
>> RUSTICL_ENABLE=llvmpipe ffmpeg  -init_hw_device opencl=ocl
>> -filter_hw_device ocl  -i
>> ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -s 512:384 -r
>> 10 -vf
>> "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12"
>> -c:a copy -c:s copy -c:v libx264 -f mp4 /dev/null -debug verbose
>>
>> ffmpeg: ../src/compiler/nir/nir_metadata.c:172:
>> nir_metadata_check_validation_flag: Assertion `!(impl->valid_metadata &
>> nir_metadata_not_properly_reset)' failed.
>> Aborted
>>
>
>
> Real hardware OpenCL on the RX550 with ffmpeg 7.1.1 works, just at ~5 fps ;)
>
> ./ffmpeg  -hwaccel vaapi -init_hw_device opencl=ocl -filter_hw_device ocl
> -i ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov  -vf
> "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=p010,hwdownload,format=p010"
> -c:a copy -c:s copy -c:v rawvideo -f avi /dev/null -loglevel verbose
>
> [out#0/avi @ 0xbe7b000] Starting thread...
> frame=    1 fps=0.7 q=-0.0 size=      10KiB time=00:00:00.01
> bitrate=4750.2kbits/s speed=0.0111x
> frame=    3 fps=1.5 q=-0.0 size=   32256KiB time=00:00:00.05
> bitrate=5279543.5kbits/s speed=0.025x
> frame=    5 fps=2.0 q=-0.0 size=   80896KiB time=00:00:00.08
> bitrate=7944424.2kbits/s speed=0.0334x
> frame=    8 fps=2.7 q=-0.0 size=  113408KiB time=00:00:00.13
> bitrate=6960809.3kbits/s speed=0.0445x
> frame=   10 fps=2.9 q=-0.0 size=  161792KiB time=00:00:00.18
> bitrate=7222219.5kbits/s speed=0.0524x
> frame=   13 fps=3.2 q=-0.0 size=  194304KiB time=00:00:00.23
> bitrate=6814911.2kbits/s speed=0.0584x
> frame=   15 fps=3.3 q=-0.0 size=  242944KiB time=00:00:00.26
> bitrate=7455793.2kbits/s speed=0.0593x
> frame=   18 fps=3.6 q=-0.0 size=  275456KiB time=00:00:00.31
> bitrate=7118790.4kbits/s speed=0.0634x
> frame=   20 fps=3.6 q=-0.0 size=  323840KiB time=00:00:00.35
> bitrate=7572134.4kbits/s speed=0.0637x
> frame=   23 fps=3.8 q=-0.0 size=  372480KiB time=00:00:00.40
> bitrate=7620769.6kbits/s speed=0.0667x
> frame=   26 fps=4.0 q=-0.0 size=  404992KiB time=00:00:00.45
> bitrate=7365289.1kbits/s speed=0.0693x
> frame=   28 fps=4.0 q=-0.0 size=  453632KiB time=00:00:00.48
> bitrate=7680906.9kbits/s speed=0.0691x
> frame=   31 fps=4.1 q=-0.0 size=  485888KiB time=00:00:00.53
> bitrate=7455779.2kbits/s speed=0.0711x
> frame=   33 fps=4.1 q=-0.0 size=  534528KiB time=00:00:00.56
> bitrate=7719673.2kbits/s speed=0.0709x
> frame=   36 fps=4.2 q=-0.0 size=  567040KiB time=00:00:00.61
> bitrate=7525222.1kbits/s speed=0.0726x
> frame=   38 fps=4.2 q=-0.0 size=  615680KiB time=00:00:00.65
> bitrate=7751710.7kbits/s speed=0.0723x
> frame=   41 fps=4.3 q=-0.0 size=  664320KiB time=00:00:00.70
> bitrate=7766675.4kbits/s speed=0.0737x
> frame=   43 fps=4.3 q=-0.0 size=  680448KiB time=00:00:00.73
> bitrate=7593625.7kbits/s speed=0.0734x
> frame=   46 fps=4.4 q=-0.0 size=  745216KiB time=00:00:00.78
> bitrate=7785584.9kbits/s speed=0.0746x
> frame=   49 fps=4.5 q=-0.0 size=  777728KiB time=00:00:00.83
> bitrate=7637736.5kbits/s speed=0.0758x
> frame=   51 fps=4.4 q=-0.0 size=  826368KiB time=00:00:00.86
> bitrate=7803284.3kbits/s speed=0.0754x
> frame=   54 fps=4.5 q=-0.0 size=  858624KiB time=00:00:00.91
> bitrate=7665625.7kbits/s speed=0.0764x
> frame=   56 fps=4.5 q=-0.0 size=  891136KiB time=00:00:00.95
> bitrate=7676729.7kbits/s speed=0.076x
> frame=   59 fps=4.5 q=-0.0 size=  955904KiB time=00:00:01.00
> bitrate=7822942.6kbits/s speed=0.077x
> frame=   61 fps=4.5 q=-0.0 size=  972032KiB time=00:00:01.03
> bitrate=7698318.0kbits/s speed=0.0766x
> frame=   64 fps=4.6 q=-0.0 size= 1036800KiB time=00:00:01.08
> bitrate=7832287.4kbits/s speed=0.0774x
> frame=   67 fps=4.6 q=-0.0 size= 1069345KiB time=00:00:01.13
> bitrate=7721752.5kbits/s speed=0.0782x
> frame=   69 fps=4.6 q=-0.0 size= 1117985KiB time=00:00:01.16
> bitrate=7842330.5kbits/s speed=0.0778x
> frame=   72 fps=4.6 q=-0.0 size= 1150241KiB time=00:00:01.21
> bitrate=7737010.4kbits/s speed=0.0785x
> frame=   74 fps=4.6 q=-0.0 size= 1198881KiB time=00:00:01.25
> bitrate=7849136.7kbits/s speed=0.0782x
> frame=   77 fps=4.7 q=-0.0 size= 1231393KiB time=00:00:01.30
> bitrate=7751917.8kbits/s speed=0.0788x
> frame=   79 fps=4.6 q=-0.0 size= 1263649KiB time=00:00:01.33
> bitrate=7756100.8kbits/s speed=0.0785x
> frame=   82 fps=4.7 q=-0.0 size= 1328673KiB time=00:00:01.38
> bitrate=7860442.5kbits/s speed=0.0791x
> frame=   85 fps=4.7 q=-0.0 size= 1360929KiB time=00:00:01.43
> bitrate=7770411.2kbits/s speed=0.0797x
> frame=   87 fps=4.7 q=-0.0 size= 1409569KiB time=00:00:01.46
> bitrate=7865219.6kbits/s speed=0.0793x
> frame=   90 fps=4.7 q=-0.0 size= 1442081KiB time=00:00:01.51
> bitrate=7781358.9kbits/s speed=0.0799x
> frame=   92 fps=4.7 q=-0.0 size= 1490465KiB time=00:00:01.55
> bitrate=7869477.9kbits/s speed=0.0795x
> frame=   95 fps=4.7 q=-0.0 size= 1522977KiB time=00:00:01.60
> bitrate=7789851.9kbits/s speed=0.08x
> frame=   97 fps=4.7 q=-0.0 size= 1555489KiB time=00:00:01.63
> bitrate=7793775.1kbits/s speed=0.0797x
> frame=  100 fps=4.8 q=-0.0 size= 1620257KiB time=00:00:01.68
> bitrate=7877157.6kbits/s speed=0.0802x
> frame=  103 fps=4.8 q=-0.0 size= 1652513KiB time=00:00:01.73
> bitrate=7802226.5kbits/s speed=0.0807x
> frame=  105 fps=4.8 q=-0.0 size= 1701153KiB time=00:00:01.76
> bitrate=7880335.1kbits/s speed=0.0803x
> frame=  108 fps=4.8 q=-0.0 size= 1733665KiB time=00:00:01.81
> bitrate=7809906.9kbits/s speed=0.0808x
> frame=  110 fps=4.8 q=-0.0 size= 1782305KiB time=00:00:01.85
> bitrate=7884354.4kbits/s speed=0.0805x
> frame=  113 fps=4.8 q=-0.0 size= 1830945KiB time=00:00:01.90
> bitrate=7886377.1kbits/s speed=0.0809x
> frame=  115 fps=4.8 q=-0.0 size= 1863201KiB time=00:00:01.93
> bitrate=7886943.6kbits/s speed=0.0806x
> frame=  118 fps=4.8 q=-0.0 size= 1911841KiB time=00:00:01.98
> bitrate=7888816.1kbits/s speed=0.081x
> frame=  120 fps=4.8 q=-0.0 size= 1927969KiB time=00:00:02.01
> bitrate=7823873.9kbits/s speed=0.0807x
> frame=  123 fps=4.8 q=-0.0 size= 1992993KiB time=00:00:02.06
> bitrate=7892075.9kbits/s speed=0.0811x
> frame=  126 fps=4.8 q=-0.0 size= 2025249KiB time=00:00:02.11
> bitrate=7830362.5kbits/s speed=0.0814x
> frame=  128 fps=4.8 q=-0.0 size= 2073889KiB time=00:00:02.15
> bitrate=7894104.9kbits/s speed=0.0812x
> frame=  131 fps=4.8 q=-0.0 size= 2106401KiB time=00:00:02.20
> bitrate=7835635.4kbits/s speed=0.0815x
> frame=  133 fps=4.8 q=-0.0 size= 2154809KiB time=00:00:02.23
> bitrate=7896071.0kbits/s speed=0.0813x
> frame=  136 fps=4.9 q=-0.0 size= 2187321KiB time=00:00:02.28
> bitrate=7839692.4kbits/s speed=0.0816x
> frame=  138 fps=4.8 q=-0.0 size= 2235961KiB time=00:00:02.31
> bitrate=7898718.1kbits/s speed=0.0813x
> frame=  141 fps=4.9 q=-0.0 size= 2284601KiB time=00:00:02.36
> bitrate=7900038.5kbits/s speed=0.0816x
> frame=  144 fps=4.9 q=-0.0 size= 2316857KiB time=00:00:02.41
> bitrate=7845821.4kbits/s speed=0.082x
> frame=  146 fps=4.9 q=-0.0 size= 2365497KiB time=00:00:02.45
> bitrate=7901548.2kbits/s speed=0.0817x
> frame=  149 fps=4.9 q=-0.0 size= 2398009KiB time=00:00:02.50
> bitrate=7849946.2kbits/s speed=0.082x
> frame=  151 fps=4.9 q=-0.0 size= 2446649KiB time=00:00:02.53
> bitrate=7903785.6kbits/s speed=0.0818x
> frame=  154 fps=4.9 q=-0.0 size= 2478905KiB time=00:00:02.58
> bitrate=7852993.8kbits/s speed=0.0821x
> frame=  156 fps=4.9 q=-0.0 size= 2527545KiB time=00:00:02.61
> bitrate=7905082.9kbits/s speed=0.0818x
> frame=  159 fps=4.9 q=-0.0 size= 2576185KiB time=00:00:02.66
> bitrate=7906135.4kbits/s speed=0.0821x
> frame=  161 fps=4.9 q=-0.0 size= 2608697KiB time=00:00:02.70
> bitrate=7907073.1kbits/s speed=0.0819x
> frame=  164 fps=4.9 q=-0.0 size= 2657081KiB time=00:00:02.75
> bitrate=7907295.6kbits/s speed=0.0821x
> frame=  166 fps=4.9 q=-0.0 size= 2673465KiB time=00:00:02.78
> bitrate=7860770.3kbits/s speed=0.0819x
> frame=  169 fps=4.9 q=-0.0 size= 2738233KiB time=00:00:02.83
> bitrate=7909127.1kbits/s speed=0.0822x
> frame=  172 fps=4.9 q=-0.0 size= 2770745KiB time=00:00:02.88
> bitrate=7864254.0kbits/s speed=0.0824x
>
>
> rawvideo is used here only to test OpenCL and the decoder alone, without
> x264 overhead (I can't make the OpenCL pipeline run fully on the GPU).
>
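A rough way to sanity-check the rawvideo log above is to compute how much the output grows per frame. The sketch below parses two progress records copied from that log (the frame=3 record and the last one). The 3840x2160 resolution is an assumption taken from "4K" in the file name, and 2 bytes per pixel is only a guess that happens to fit the numbers; ffmpeg did not report either in the quoted output. The names `RECORD` and `parse_progress` are made up for this illustration.

```python
import re

# Two progress records copied from the log above (frame=3 and the final frame=172).
LOG = """frame=    3 fps=1.5 q=-0.0 size=   32256KiB time=00:00:00.05
bitrate=5279543.5kbits/s speed=0.025x
frame=  172 fps=4.9 q=-0.0 size= 2770745KiB time=00:00:02.88
bitrate=7864254.0kbits/s speed=0.0824x"""

# Matches "frame=N fps=F ... size=SKiB", tolerating the variable padding and line wraps.
RECORD = re.compile(r"frame=\s*(\d+)\s+fps=\s*([\d.]+).*?size=\s*(\d+)KiB", re.S)


def parse_progress(text):
    """Return a list of (frame, fps, size_kib) tuples found in ffmpeg progress text."""
    return [(int(f), float(fps), int(kib)) for f, fps, kib in RECORD.findall(text)]


records = parse_progress(LOG)
first_frame, _, first_kib = records[0]
last_frame, last_fps, last_kib = records[-1]

# Average output growth per frame between the two records.
kib_per_frame = (last_kib - first_kib) / (last_frame - first_frame)

# Assumption: 3840x2160 frames stored at 2 bytes per pixel.
guess_kib = 3840 * 2160 * 2 / 1024

# Data written per wall-clock second at the final reported fps.
write_mib_per_s = kib_per_frame * last_fps / 1024

print(f"measured: {kib_per_frame:.0f} KiB/frame, guess: {guess_kib:.0f} KiB/frame")
print(f"write rate: {write_mib_per_s:.1f} MiB/s")
```

The measured growth comes out at roughly 16204 KiB per frame, against 16200 KiB for the guessed layout, so the output written to /dev/null is on the order of tens of MiB per second of wall-clock time.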
> Not exactly stellar results, but maybe the "GB/s" in the PCIe bandwidth line
> reported in dmesg means gigaBITS, not gigabytes? Then it should give just 4
> gigabytes per second: this card can only do x8 and the motherboard is only
> PCIe 2.0.
>
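The PCIe estimate in the quoted paragraph can be checked with a little arithmetic. PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding, so an x8 link carries 32 Gbit/s of payload, which is the 4 gigabytes per second mentioned above (protocol overhead is ignored here). The second half of the sketch is a rough guess at how much of that the tonemap filter chain uses. It assumes 3840x2160 p010 frames (the resolution is inferred from "4K" in the file name) and counts only the hwupload and hwdownload steps, so any readback from the VAAPI decoder would come on top.

```python
# PCIe 2.0 signals at 5 GT/s per lane and uses 8b/10b encoding,
# so only 8 of every 10 transferred bits are payload.
GIGATRANSFERS_PER_LANE = 5
LANES = 8  # the card is limited to x8

link_gbit = GIGATRANSFERS_PER_LANE * LANES * 8 / 10  # payload gigabits per second
link_gbyte = link_gbit / 8                           # payload gigabytes per second

# Assumption: 3840x2160 frames; the mail only says "4K" in the file name.
WIDTH, HEIGHT = 3840, 2160
# p010 is 4:2:0 (1.5 samples per pixel) with 2 bytes per sample, i.e. 3 bytes per pixel.
p010_frame_bytes = WIDTH * HEIGHT * 3

# Assumption: one hwupload and one hwdownload per frame; decoder readback not counted.
TRANSFERS_PER_FRAME = 2
FPS = 4.9  # final fps reported in the rawvideo log

traffic_gbyte = p010_frame_bytes * TRANSFERS_PER_FRAME * FPS / 1e9
share = traffic_gbyte / link_gbyte

print(f"link: {link_gbit:.0f} Gbit/s = {link_gbyte:.0f} GB/s")
print(f"filter traffic: {traffic_gbyte:.2f} GB/s ({share:.0%} of the link)")
```

Under these assumptions the filter chain moves about 0.24 GB/s, around 6% of the link, which suggests (but does not prove) that raw PCIe bandwidth is not what limits the run to ~5 fps.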

And this line decodes on the GPU, pulls the frames to the host, scales them
to FHD size, uploads them to the GPU for OpenCL tonemapping, then downloads
them back to the host for x264 encoding:

./ffmpeg -hwaccel vaapi -init_hw_device opencl=ocl -filter_hw_device ocl
 -i ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov  -vf
"scale=1920:1080,format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12"
-c:a copy -c:s copy -c:v libx264 -f mp4 /dev/shm/fhd-sdr-ocl.mp4


It speeds up to nearly 7.0 fps! Without hardware decoding it is slower, at
maybe 5 fps. But maybe software decoding will be faster on a 64-bit userspace!
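To make the host/GPU round trips in that command easier to see, here is a small sketch that assembles the same -vf chain from a list of stages, each annotated with where it runs. The filter names and options are copied from the command above; the names `STAGES`, `build_vf` and `build_command`, and the input/output paths in the example call, are made up for illustration, and the "where it runs" notes follow the description in this mail rather than anything ffmpeg reports.

```python
# Tonemap filter string exactly as used in the command above.
TONEMAP = (
    "tonemap_opencl=tonemap=mobius:param=0.01:desat=0:"
    "r=tv:p=bt709:t=bt709:m=bt709:format=nv12"
)

# (filter, where it runs) pairs, in the order frames pass through them.
STAGES = [
    ("scale=1920:1080", "host: scale to FHD"),
    ("format=p010", "host: pixel format expected before upload"),
    ("hwupload", "host -> GPU copy"),
    (TONEMAP, "GPU: OpenCL tonemapping, output nv12"),
    ("hwdownload", "GPU -> host copy"),
    ("format=nv12", "host: pixel format handed to x264"),
]


def build_vf(stages):
    """Join the filter names into a single -vf argument."""
    return ",".join(name for name, _ in stages)


def build_command(src, dst):
    """Return the ffmpeg argument list for the decode/tonemap/encode run."""
    return [
        "./ffmpeg", "-hwaccel", "vaapi",
        "-init_hw_device", "opencl=ocl", "-filter_hw_device", "ocl",
        "-i", src,
        "-vf", build_vf(STAGES),
        "-c:a", "copy", "-c:s", "copy", "-c:v", "libx264",
        "-f", "mp4", dst,
    ]


if __name__ == "__main__":
    for name, where in STAGES:
        print(f"{where:45s} {name.split('=')[0]}")
    # Hypothetical paths, only to show the assembled command line.
    print(" ".join(build_command("input.mov", "/dev/shm/out.mp4")))
```

Listing the stages this way makes it plain that every frame crosses the bus at least twice in the filter chain alone, once at hwupload and once at hwdownload.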