[Cin] OpenCL on CPU
Andrew Randrianasulu
randrianasulu at gmail.com
Fri Apr 25 21:06:27 CEST 2025
Thu, 24 Apr 2025, 19:54 Andrew Randrianasulu <randrianasulu at gmail.com>:
>
>
> Thu, 24 Apr 2025, 18:39 Andrew Randrianasulu <randrianasulu at gmail.com>:
>
>> Note: OpenCL is different from OpenGL. It is a compute API, mostly about
>> more accurate general-purpose computation rather than rendering.
>>
>> On an AMD FX-4300 with a 32-bit userspace, but LLVM probably uses AVX?
>>
>>
>> guest@slax:/dev/shm/mesa/BUILD$ RUSTICL_ENABLE=llvmpipe clpeak
>>
>> Platform: rusticl
>> Device: llvmpipe (LLVM 20.1.3, 256 bits)
>> Driver version : 25.2.0-devel (git-845611bb43) (Linux x86)
>> Compute units : 8
>> Clock frequency : 300 MHz
>>
>> Global memory bandwidth (GBPS)
>> float : 3.72
>> float2 : 4.08
>> float4 : 3.59
>> float8 : 2.81
>> float16 : 2.09
>>
>> Single-precision compute (GFLOPS)
>> float : 14.67
>> float2 : 17.86
>> float4 : 15.99
>> float8 : 14.72
>> float16 : 14.63
>>
>> No half precision support! Skipped
>>
>> No double precision support! Skipped
>>
>> Integer compute (GIOPS)
>> int : 13.89
>> int2 : 13.25
>> int4 : 12.85
>> int8 : 13.04
>> int16 : 11.51
>>
>> Integer compute Fast 24bit (GIOPS)
>> int : 13.65
>> int2 : 13.29
>> int4 : 13.23
>> int8 : 12.90
>> int16 : 11.08
>>
>> Transfer bandwidth (GBPS)
>> enqueueWriteBuffer : 2.82
>> enqueueReadBuffer : 1.08
>> enqueueWriteBuffer non-blocking : 2.89
>> enqueueReadBuffer non-blocking : 1.02
>> enqueueMapBuffer(for read) : 1.15
>> memcpy from mapped ptr : 3.02
>> enqueueUnmap(after write) : 2.22
>> memcpy to mapped ptr : 3.01
>>
>> Kernel launch latency : 21.55 us
>>
>> guest@slax:/dev/shm/mesa/BUILD$
>>
>> Command to build a somewhat minimal Mesa (llvmpipe + AMD):
>>
>>
>> meson ../ --prefix=/usr/X11R7 --libdir=lib --strip --buildtype
>> debugoptimized -Degl=enabled -Dosmesa=true -Dplatforms=x11
>> -Dgallium-drivers=r600,radeonsi,llvmpipe -Dvulkan-drivers=amd,swrast
>> -Dgallium-nine=true -Dgallium-va=enabled -Dgallium-xa=disabled
>> -Dgallium-rusticl=true -Dllvm=enabled -Drust_std=2021 -Dvideo-codecs="all"
>>
>> Of course you can set your own prefix (I have X installed in a
>> non-default location).
>>
>> The biggest obstacle for me was that Mesa git requires a fairly new LLVM
>> and SPIRV-Tools-2024.4, which was released just two days ago!
>>
>> And the GitHub "release" is of course broken, in the sense that you need
>> to manually fetch the headers at a specific commit.
>>
>> Of course a "real GPU" will get something like >200 GFLOPS (even my puny
>> GF710 was that fast), but the possibility of a lockup makes this option
>> less attractive ;)
>>
>
>
>
> But of course a real ffmpeg command fails mysteriously:
>
> RUSTICL_ENABLE=llvmpipe ffmpeg -init_hw_device opencl=ocl
> -filter_hw_device ocl -i
> ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -s 512:384 -r
> 10 -vf
> "format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwdownload,format=nv12"
> -c:a copy -c:s copy -c:v libx264 -f mp4 /dev/null -debug verbose
>
> ffmpeg: ../src/compiler/nir/nir_metadata.c:172:
> nir_metadata_check_validation_flag: Assertion `!(impl->valid_metadata &
> nir_metadata_not_properly_reset)' failed.
> Aborted
>
Real hardware OpenCL on an RX550 with ffmpeg 7.1.1 works, just at ~5 fps ;)
./ffmpeg -hwaccel vaapi -init_hw_device opencl=ocl -filter_hw_device ocl
-i ~/K38_sdcard1/Documents/iPhone11_4K-recorder_59.940HDR10.mov -vf
"format=p010,hwupload,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=p010,hwdownload,format=p010"
-c:a copy -c:s copy -c:v rawvideo -f avi /dev/null -loglevel verbose
[out#0/avi @ 0xbe7b000] Starting thread...
frame= 1 fps=0.7 q=-0.0 size= 10KiB time=00:00:00.01
bitrate=4750.2kbits/s speed=0.0111x
frame= 3 fps=1.5 q=-0.0 size= 32256KiB time=00:00:00.05
bitrate=5279543.5kbits/s speed=0.025x
frame= 5 fps=2.0 q=-0.0 size= 80896KiB time=00:00:00.08
bitrate=7944424.2kbits/s speed=0.0334x
frame= 8 fps=2.7 q=-0.0 size= 113408KiB time=00:00:00.13
bitrate=6960809.3kbits/s speed=0.0445x
frame= 10 fps=2.9 q=-0.0 size= 161792KiB time=00:00:00.18
bitrate=7222219.5kbits/s speed=0.0524x
frame= 13 fps=3.2 q=-0.0 size= 194304KiB time=00:00:00.23
bitrate=6814911.2kbits/s speed=0.0584x
frame= 15 fps=3.3 q=-0.0 size= 242944KiB time=00:00:00.26
bitrate=7455793.2kbits/s speed=0.0593x
frame= 18 fps=3.6 q=-0.0 size= 275456KiB time=00:00:00.31
bitrate=7118790.4kbits/s speed=0.0634x
frame= 20 fps=3.6 q=-0.0 size= 323840KiB time=00:00:00.35
bitrate=7572134.4kbits/s speed=0.0637x
frame= 23 fps=3.8 q=-0.0 size= 372480KiB time=00:00:00.40
bitrate=7620769.6kbits/s speed=0.0667x
frame= 26 fps=4.0 q=-0.0 size= 404992KiB time=00:00:00.45
bitrate=7365289.1kbits/s speed=0.0693x
frame= 28 fps=4.0 q=-0.0 size= 453632KiB time=00:00:00.48
bitrate=7680906.9kbits/s speed=0.0691x
frame= 31 fps=4.1 q=-0.0 size= 485888KiB time=00:00:00.53
bitrate=7455779.2kbits/s speed=0.0711x
frame= 33 fps=4.1 q=-0.0 size= 534528KiB time=00:00:00.56
bitrate=7719673.2kbits/s speed=0.0709x
frame= 36 fps=4.2 q=-0.0 size= 567040KiB time=00:00:00.61
bitrate=7525222.1kbits/s speed=0.0726x
frame= 38 fps=4.2 q=-0.0 size= 615680KiB time=00:00:00.65
bitrate=7751710.7kbits/s speed=0.0723x
frame= 41 fps=4.3 q=-0.0 size= 664320KiB time=00:00:00.70
bitrate=7766675.4kbits/s speed=0.0737x
frame= 43 fps=4.3 q=-0.0 size= 680448KiB time=00:00:00.73
bitrate=7593625.7kbits/s speed=0.0734x
frame= 46 fps=4.4 q=-0.0 size= 745216KiB time=00:00:00.78
bitrate=7785584.9kbits/s speed=0.0746x
frame= 49 fps=4.5 q=-0.0 size= 777728KiB time=00:00:00.83
bitrate=7637736.5kbits/s speed=0.0758x
frame= 51 fps=4.4 q=-0.0 size= 826368KiB time=00:00:00.86
bitrate=7803284.3kbits/s speed=0.0754x
frame= 54 fps=4.5 q=-0.0 size= 858624KiB time=00:00:00.91
bitrate=7665625.7kbits/s speed=0.0764x
frame= 56 fps=4.5 q=-0.0 size= 891136KiB time=00:00:00.95
bitrate=7676729.7kbits/s speed=0.076x
frame= 59 fps=4.5 q=-0.0 size= 955904KiB time=00:00:01.00
bitrate=7822942.6kbits/s speed=0.077x
frame= 61 fps=4.5 q=-0.0 size= 972032KiB time=00:00:01.03
bitrate=7698318.0kbits/s speed=0.0766x
frame= 64 fps=4.6 q=-0.0 size= 1036800KiB time=00:00:01.08
bitrate=7832287.4kbits/s speed=0.0774x
frame= 67 fps=4.6 q=-0.0 size= 1069345KiB time=00:00:01.13
bitrate=7721752.5kbits/s speed=0.0782x
frame= 69 fps=4.6 q=-0.0 size= 1117985KiB time=00:00:01.16
bitrate=7842330.5kbits/s speed=0.0778x
frame= 72 fps=4.6 q=-0.0 size= 1150241KiB time=00:00:01.21
bitrate=7737010.4kbits/s speed=0.0785x
frame= 74 fps=4.6 q=-0.0 size= 1198881KiB time=00:00:01.25
bitrate=7849136.7kbits/s speed=0.0782x
frame= 77 fps=4.7 q=-0.0 size= 1231393KiB time=00:00:01.30
bitrate=7751917.8kbits/s speed=0.0788x
frame= 79 fps=4.6 q=-0.0 size= 1263649KiB time=00:00:01.33
bitrate=7756100.8kbits/s speed=0.0785x
frame= 82 fps=4.7 q=-0.0 size= 1328673KiB time=00:00:01.38
bitrate=7860442.5kbits/s speed=0.0791x
frame= 85 fps=4.7 q=-0.0 size= 1360929KiB time=00:00:01.43
bitrate=7770411.2kbits/s speed=0.0797x
frame= 87 fps=4.7 q=-0.0 size= 1409569KiB time=00:00:01.46
bitrate=7865219.6kbits/s speed=0.0793x
frame= 90 fps=4.7 q=-0.0 size= 1442081KiB time=00:00:01.51
bitrate=7781358.9kbits/s speed=0.0799x
frame= 92 fps=4.7 q=-0.0 size= 1490465KiB time=00:00:01.55
bitrate=7869477.9kbits/s speed=0.0795x
frame= 95 fps=4.7 q=-0.0 size= 1522977KiB time=00:00:01.60
bitrate=7789851.9kbits/s speed=0.08x
frame= 97 fps=4.7 q=-0.0 size= 1555489KiB time=00:00:01.63
bitrate=7793775.1kbits/s speed=0.0797x
frame= 100 fps=4.8 q=-0.0 size= 1620257KiB time=00:00:01.68
bitrate=7877157.6kbits/s speed=0.0802x
frame= 103 fps=4.8 q=-0.0 size= 1652513KiB time=00:00:01.73
bitrate=7802226.5kbits/s speed=0.0807x
frame= 105 fps=4.8 q=-0.0 size= 1701153KiB time=00:00:01.76
bitrate=7880335.1kbits/s speed=0.0803x
frame= 108 fps=4.8 q=-0.0 size= 1733665KiB time=00:00:01.81
bitrate=7809906.9kbits/s speed=0.0808x
frame= 110 fps=4.8 q=-0.0 size= 1782305KiB time=00:00:01.85
bitrate=7884354.4kbits/s speed=0.0805x
frame= 113 fps=4.8 q=-0.0 size= 1830945KiB time=00:00:01.90
bitrate=7886377.1kbits/s speed=0.0809x
frame= 115 fps=4.8 q=-0.0 size= 1863201KiB time=00:00:01.93
bitrate=7886943.6kbits/s speed=0.0806x
frame= 118 fps=4.8 q=-0.0 size= 1911841KiB time=00:00:01.98
bitrate=7888816.1kbits/s speed=0.081x
frame= 120 fps=4.8 q=-0.0 size= 1927969KiB time=00:00:02.01
bitrate=7823873.9kbits/s speed=0.0807x
frame= 123 fps=4.8 q=-0.0 size= 1992993KiB time=00:00:02.06
bitrate=7892075.9kbits/s speed=0.0811x
frame= 126 fps=4.8 q=-0.0 size= 2025249KiB time=00:00:02.11
bitrate=7830362.5kbits/s speed=0.0814x
frame= 128 fps=4.8 q=-0.0 size= 2073889KiB time=00:00:02.15
bitrate=7894104.9kbits/s speed=0.0812x
frame= 131 fps=4.8 q=-0.0 size= 2106401KiB time=00:00:02.20
bitrate=7835635.4kbits/s speed=0.0815x
frame= 133 fps=4.8 q=-0.0 size= 2154809KiB time=00:00:02.23
bitrate=7896071.0kbits/s speed=0.0813x
frame= 136 fps=4.9 q=-0.0 size= 2187321KiB time=00:00:02.28
bitrate=7839692.4kbits/s speed=0.0816x
frame= 138 fps=4.8 q=-0.0 size= 2235961KiB time=00:00:02.31
bitrate=7898718.1kbits/s speed=0.0813x
frame= 141 fps=4.9 q=-0.0 size= 2284601KiB time=00:00:02.36
bitrate=7900038.5kbits/s speed=0.0816x
frame= 144 fps=4.9 q=-0.0 size= 2316857KiB time=00:00:02.41
bitrate=7845821.4kbits/s speed=0.082x
frame= 146 fps=4.9 q=-0.0 size= 2365497KiB time=00:00:02.45
bitrate=7901548.2kbits/s speed=0.0817x
frame= 149 fps=4.9 q=-0.0 size= 2398009KiB time=00:00:02.50
bitrate=7849946.2kbits/s speed=0.082x
frame= 151 fps=4.9 q=-0.0 size= 2446649KiB time=00:00:02.53
bitrate=7903785.6kbits/s speed=0.0818x
frame= 154 fps=4.9 q=-0.0 size= 2478905KiB time=00:00:02.58
bitrate=7852993.8kbits/s speed=0.0821x
frame= 156 fps=4.9 q=-0.0 size= 2527545KiB time=00:00:02.61
bitrate=7905082.9kbits/s speed=0.0818x
frame= 159 fps=4.9 q=-0.0 size= 2576185KiB time=00:00:02.66
bitrate=7906135.4kbits/s speed=0.0821x
frame= 161 fps=4.9 q=-0.0 size= 2608697KiB time=00:00:02.70
bitrate=7907073.1kbits/s speed=0.0819x
frame= 164 fps=4.9 q=-0.0 size= 2657081KiB time=00:00:02.75
bitrate=7907295.6kbits/s speed=0.0821x
frame= 166 fps=4.9 q=-0.0 size= 2673465KiB time=00:00:02.78
bitrate=7860770.3kbits/s speed=0.0819x
frame= 169 fps=4.9 q=-0.0 size= 2738233KiB time=00:00:02.83
bitrate=7909127.1kbits/s speed=0.0822x
frame= 172 fps=4.9 q=-0.0 size= 2770745KiB time=00:00:02.88
bitrate=7864254.0kbits/s speed=0.0824x
rawvideo is used here only to test OpenCL and the decoder alone, without x264
overhead (I can't make OpenCL operate fully on the GPU).
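
As a rough cross-check of the progress log, the wall-clock rates can be recomputed from the last progress line (frame 172, size 2770745KiB, time 00:00:02.88, speed 0.0824x). This is only a sketch: the function names are made up for illustration, and the inputs are the rounded values ffmpeg printed, so the result is approximate.

```python
def elapsed_seconds(media_time_s, speed):
    """Wall-clock seconds spent so far: media time divided by ffmpeg's speed factor."""
    return media_time_s / speed


def effective_rates(frames, size_kib, media_time_s, speed):
    """Return (frames per second, MiB written per second) in wall-clock terms."""
    wall = elapsed_seconds(media_time_s, speed)
    return frames / wall, size_kib / 1024 / wall


# Values copied from the last progress line of the log.
fps, mib_s = effective_rates(frames=172, size_kib=2770745,
                             media_time_s=2.88, speed=0.0824)
print(f"{fps:.1f} frames/s, {mib_s:.0f} MiB/s written")
```

This gives about 4.9 frames/s, in line with the fps= field, and well under 100 MiB/s of rawvideo output.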
Not exactly stellar results, but maybe "GB/s" in the PCIe bandwidth line
reported in dmesg means gigaBITS, not gigabytes? Then it would give just
4 gigabytes per second: this card can only do x8 and the motherboard is only
PCIe 2.0.
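
The arithmetic behind that guess can be sketched as follows. PCIe 1.x and 2.0 use 8b/10b encoding, so a 5 GT/s PCIe 2.0 lane carries 4 Gbit/s of payload. The second half rests on assumptions that are not in the log: 3840x2160 frames (guessed from "4K" in the file name), exactly one hwupload and one hwdownload of a P010 frame per processed frame, and ~4.9 fps. The names are illustrative only.

```python
# Per-lane signalling rate in GT/s and line-code efficiency, per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth(generation, lanes):
    """Return (Gbit/s, GB/s) of usable link bandwidth in one direction."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    gbit = rate_gt * efficiency * lanes
    return gbit, gbit / 8


def p010_frame_bytes(width, height):
    """P010 is 4:2:0 with 16 bits of storage per sample: 3 bytes per pixel."""
    return width * height * 3


gbit, gbyte = pcie_bandwidth(generation=2, lanes=8)
print(f"PCIe 2.0 x8: {gbit:.0f} Gbit/s = {gbyte:.0f} GB/s per direction")

# Assumed: 3840x2160 frames, one upload plus one download per frame, ~4.9 fps.
traffic = 2 * p010_frame_bytes(3840, 2160) * 4.9 / 1e9
print(f"hwupload + hwdownload traffic: about {traffic:.2f} GB/s")
```

So 32 Gbit/s and 4 GB/s would be the same PCIe 2.0 x8 link. If the assumptions hold, the frame transfers alone need well under 1 GB/s, so link bandwidth by itself may not explain the ~5 fps. Any extra copies in the pipeline would raise that figure.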