kkoukos's Content - Affinity

AMD Radeon RX Hardware Acceleration

kkoukos replied to Mark Ingram's topic in V1 Bugs found on Windows

Hi, I just tested the latest v2 beta against the latest v2 stable release (2.0.4) on a Radeon RX 6900 XT. There is a huge difference in the benchmark result: more than 18000 for v2.1, but only marginally more than 1000 for 2.0.4. I also monitored GPU activity in parallel, and there is a noticeable difference in GPU utilization as well. With real data I don't see a difference anywhere near the 18x of the benchmark compared with 2.0.4, but the application feels smoother and seems to make better use of the GPU when applying filters (in both destructive and non-destructive modes). The biggest improvement in responsiveness was in the Lens Blur filter, which I believe is GPU-accelerated in 2.1 but was not in the stable version (2.0.4). That alone is a good indication that GPU acceleration is doing its job. Overall, I think the development team has done a great job.

March 24, 2023
301 replies
  Tagged with:
  - amd
  - radeon
  - rx
  - opencl

AMD Radeon RX Hardware Acceleration

kkoukos replied to Mark Ingram's topic in V1 Bugs found on Windows

It is really nice that Serif has finally enabled OpenCL support for AMD GPUs in v2.0. Please keep it that way! After some testing I came to the conclusion that it really works. Let's look not only at the benchmark but also at some real-world performance. I am using the latest AMD Pro driver 22.Q4, but even with the Windows 11 default driver (v30 or v31) it still works great; the Windows 11 store driver v30 might actually be slightly better. The benchmark result is really low for the GPU tested (I have an RX 6900 XT) compared with the results in the thread below: when this GPU is used with Metal it scores nearly 50000, whereas on Windows 11 I got only slightly more than 1000 (results from different versions are not directly comparable, so this is only indicative). So where does the 50x difference come from, and does it really matter?

If you look at the picture below, I ran the benchmark with the Windows Performance Monitor open in parallel to show what was going on; the graphs cover only the benchmark, for its full duration. A close look at the graphs clearly explains why the performance is so low. The CPU is utilized at almost 100% for the entire benchmark, even while GPU execution is taking place. It seems to me that GPU performance is actually throttled by how fast the CPU can JIT-compile the OpenCL kernels. So no matter which GPU you have (RX 5500, RX 5700 or RX 6900), the result will be pretty much the same, because it is limited by the CPU's ability to compile the kernels. Conversely, if the CPU could compile and launch kernels at a higher rate, GPU performance would most likely increase. Looking at the GPU itself, utilization peaks at less than 15%.
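A minimal sketch of the reasoning above, under a simplifying assumption that was not measured here: each dispatch first waits for a CPU-side kernel compile and only then runs on the GPU, with no overlap between the two. The function names and the 60 ms / 10 ms / 5 ms figures are illustrative only, not benchmark data.

```python
# Illustrative model (assumption): every filter dispatch first waits for a
# CPU-side kernel compile, then runs on the GPU; the two steps never overlap.


def dispatches_per_second(compile_ms: float, execute_ms: float) -> float:
    """Dispatch rate when every dispatch pays compile + execute time serially."""
    return 1000.0 / (compile_ms + execute_ms)


def gpu_busy_fraction(compile_ms: float, execute_ms: float) -> float:
    """Fraction of wall-clock time the GPU spends actually executing kernels."""
    return execute_ms / (compile_ms + execute_ms)


if __name__ == "__main__":
    # Hypothetical numbers: ~60 ms compile per dispatch, and 10 ms vs 5 ms of
    # GPU execution for a slower card and a card twice as fast.
    for compile_ms in (60.0, 0.0):  # 0.0 models an already-compiled kernel
        slow = dispatches_per_second(compile_ms, 10.0)
        fast = dispatches_per_second(compile_ms, 5.0)
        print(f"compile={compile_ms:4.0f} ms  slow GPU={slow:6.1f}/s  "
              f"fast GPU={fast:6.1f}/s  speed-up={fast / slow:.2f}x  "
              f"slow-GPU busy={gpu_busy_fraction(compile_ms, 10.0):.0%}")
```

Under these assumed numbers, halving GPU execution time raises the dispatch rate by only about 8% while every dispatch is recompiled, but doubles it once compilation is out of the way, which is the pattern described above.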
NVIDIA's implementation (and I believe the same applies to Apple's Metal) caches compiled OpenCL kernels, so the benchmark result is substantially higher simply because the CPU does not need to recompile them. But the most interesting question is: does it really matter? To find out whether the low benchmark result affects the user experience, I stitched a large panorama, exported it as a large 16-bit TIFF and reloaded it (without layers or anything else), then started testing the live blur filters. To my surprise, the application behaved excellently. The filters that are implemented on the GPU worked really smoothly and flawlessly. Monitoring the GPU activity, I verified that there was both copy and compute activity. Compute activity was actually higher than during the benchmark but still left the device under-utilized, which is quite normal considering that I most likely did not have enough data to fully load the GPU.

To conclude, I think it is really great that Serif has finally enabled OpenCL acceleration on AMD GPUs. The benchmark results might be on the low side, but I believe this does not necessarily translate into a bad user experience, since in a typical scenario you spend at least a few seconds adjusting an OpenCL filter until you get the right result. There is of course plenty of room for optimization by the developers (e.g. manually pre-caching kernels and compiling them at startup), although I don't think it is needed. The typical driver overhead, as I measured it using hello-world-like code, is in the range of tens of milliseconds (50 to 70 ms), so it should go completely unnoticed even if it happens when a filter is loaded. So don't rely on the benchmark alone; test it out.

November 20, 2022

AMD Radeon RX Hardware Acceleration

kkoukos replied to Mark Ingram's topic in V1 Bugs found on Windows

I have been evaluating the compile latency of AMD drivers for a few months now (since I noticed the issue myself), using the code the developers posted on GitHub: https://github.com/MarkIngramUK/ocl-compile-benchmark. My experiments on Windows 11, with drivers from Adrenalin 21.10.2 all the way up to the most recent Pro 22Q2, show very reasonable latency, the latest (22Q2) being around 62-63 ms on an RX 6900. The best result on Windows was with the default Windows 11 driver (no AMD driver installed), with a latency slightly above 53 ms. With a small modification to the time-measurement code, the same benchmark also runs on Linux; with the open-source driver (Linux kernel 5.17-5.18) I measured a latency of around 22 ms. So, no matter how hard I tried to reproduce the issue (out of my own academic curiosity), I couldn't get anywhere near the order of magnitude mentioned in the GitHub post (around 1400 ms for the Radeon 5000 series). Sadly, all recent Affinity releases still have this restriction, so there is no way to enable hardware acceleration in the application and test it unless the restriction is removed in a future release. We can only hope that this is addressed at least in the next major release (v2). Personally I don't see a reason for the restriction any longer, although I understand that there may have been an issue affecting several users at some point with a specific system configuration (OS, driver, etc.). Even if that is still the case for specific configurations, I believe it should at least be possible to enable hardware acceleration on any OpenCL-capable device (even if we have to acknowledge the risk and edit a configuration file), so that we can measure performance both in the benchmark and in real use of the application and give feedback to the developers.
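For anyone wanting to repeat this kind of measurement, the timing loop is the part that has to be portable between Windows and Linux. The sketch below is not the code from the repository above; it is a minimal Python harness built on `time.perf_counter()`, a high-resolution clock available on Windows, Linux and macOS, and `fake_compile` is a hypothetical stand-in for the real kernel build call.

```python
import statistics
import time
from typing import Callable, List


def time_runs(fn: Callable[[], None], runs: int = 10) -> List[float]:
    """Call fn `runs` times and return each wall-clock duration in milliseconds.

    time.perf_counter() is available on every platform, so no
    OS-specific timing code is needed.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def fake_compile() -> None:
    """Hypothetical stand-in for a kernel build call; it only burns some CPU."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    results = time_runs(fake_compile)
    for i, ms in enumerate(results, start=1):
        print(f"Run {i}: {ms:.4f}ms")
    print(f"Average: {statistics.mean(results):.4f}ms")
```

Swapping `fake_compile` for a function that performs an actual OpenCL build would give per-run numbers in the same "Run N: ...ms" shape as the benchmark output quoted in this thread.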

July 3, 2022

AMD Radeon RX Hardware Acceleration

kkoukos replied to Mark Ingram's topic in V1 Bugs found on Windows

Hello again, I understand the frustration, and I hope this post helps towards solving the driver issue. For users who feel that buying Affinity Photo for an AMD setup doesn't really work out because of this driver issue (including myself), I would advise contacting support and asking for a refund if you purchased it within the last 2 weeks (I tried, and Serif promptly refunded me, so I must admit they handled it excellently). However, Affinity Photo is a great photo editor and I really like its features and how it is organized, so the refund wasn't really the best outcome for me. I would prefer to hear that the issue is fixable, and I will promptly buy it back (discount or not) when the developers confirm that there is a solution to this driver issue, in this or a future version of the software.

My academic curiosity also led me one step further: I tried to reproduce the problem using the source code that @Mark Ingram posted on GitHub. The tests ran on Windows 11 with two different drivers, Adrenalin 22.5.1 and Adrenalin 22.5.2 (latest release). The results for each driver are below:

Compiling kernel for device AMD Radeon RX 6900 XT (OpenCL 2.0 AMD-APP (3380.6)):
Run 1: 342.111ms
Run 2: 71.863ms
Run 3: 69.1439ms
Run 4: 68.1999ms
Run 5: 71.0923ms
Run 6: 68.409ms
Run 7: 67.6458ms
Run 8: 67.3344ms
Run 9: 71.4381ms
Run 10: 67.9187ms
Average: 96.5156ms

Compiling kernel for device AMD Radeon RX 6900 XT (OpenCL 2.0 AMD-APP (3417.0)):
Run 1: 350.606ms
Run 2: 63.0614ms
Run 3: 62.748ms
Run 4: 65.0131ms
Run 5: 68.3654ms
Run 6: 63.8348ms
Run 7: 63.1042ms
Run 8: 64.7047ms
Run 9: 66.8371ms
Run 10: 67.9719ms
Average: 93.6247ms

They look a bit better than the Radeon Pro W6800 results, and much better than the RX 5700. Do you think these new drivers may actually solve the issue? Are these overheads acceptable for the application to work properly?

(More information on the build setup can be provided directly to the developers if needed; it is omitted from this post for brevity.) All the best
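The averages in the results above are heavily skewed by the first run. A quick check with the posted numbers (Python, standard library only) separates run 1 from runs 2-10; why run 1 is so much slower is not established here, and one-off initialization is only a plausible guess.

```python
from statistics import mean

# Per-run compile times (ms) on the RX 6900 XT, as posted in this reply.
runs_3380_6 = [342.111, 71.863, 69.1439, 68.1999, 71.0923,
               68.409, 67.6458, 67.3344, 71.4381, 67.9187]
runs_3417_0 = [350.606, 63.0614, 62.748, 65.0131, 68.3654,
               63.8348, 63.1042, 64.7047, 66.8371, 67.9719]

for label, runs in (("3380.6", runs_3380_6), ("3417.0", runs_3417_0)):
    # Run 1 is an outlier (possibly one-off start-up cost); runs 2-10 are
    # treated here as the steady-state compile latency.
    print(f"{label}: all runs {mean(runs):.2f} ms, "
          f"first run {runs[0]:.2f} ms, runs 2-10 {mean(runs[1:]):.2f} ms")
```

With the first run excluded, the averages come to roughly 69.2 ms (3380.6) and 65.1 ms (3417.0), which is the figure that matters if the compile cost is paid repeatedly during normal use.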

May 26, 2022

AMD Radeon RX Hardware Acceleration

kkoukos replied to Mark Ingram's topic in V1 Bugs found on Windows

Hi, +1, I'm suffering from this issue too. I recently upgraded my GPU from an old RX 480 to the latest and greatest RX 6900 XT and noticed that OpenCL acceleration stopped working in Affinity Photo. I tried different versions of the application and of the AMD driver, without much luck. I read the discussion and also looked at the code you published on GitHub. I understand that improving the performance of the OpenCL compiler within the driver would significantly improve the situation, and I certainly support you 100% that it needs to be fixed. However, I believe Serif's developers could work around the performance issue by moving all of the compile overhead out of the critical path and into the initialization phase (startup) of the application (if they don't already do so). Yes, the application would take a couple more seconds to load, but if the kernels are pre-compiled at startup, then when the user does something that needs an OpenCL kernel, only the kernel arguments have to be set and the kernel enqueued, so the overall user experience should be entirely unaffected by this issue. Furthermore, I believe this would significantly improve overall performance on all OpenCL-capable devices. * My suggestion assumes that the execution performance of the OpenCL kernel itself is unaffected by the driver issue. I am also aware that this might sound trivial when looking at the "HelloWorld" example, but it might be far more complicated, time-consuming and challenging in the real application. Nevertheless, I hope you take this feedback into consideration; I would be really happy to see the full potential of the latest Radeon GPUs in Affinity again.
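The workaround suggested above amounts to a compile-once kernel cache that is filled at startup. A minimal sketch, assuming a hypothetical `compile_fn` that stands in for the expensive driver step (in OpenCL terms, building the program with `clBuildProgram`); afterwards each use would only need to set arguments and enqueue (`clSetKernelArg`, `clEnqueueNDRangeKernel`). The kernel names are made up for illustration and are not Affinity's.

```python
from typing import Callable, Dict, Iterable


class KernelCache:
    """Compile each kernel at most once and reuse the result afterwards.

    `compile_fn` is a hypothetical stand-in for the expensive driver call
    that turns kernel source into an executable program.
    """

    def __init__(self, compile_fn: Callable[[str], object]) -> None:
        self._compile_fn = compile_fn
        self._programs: Dict[str, object] = {}

    def warm_up(self, names: Iterable[str]) -> None:
        """Pay the compile cost up front, e.g. during application startup."""
        for name in names:
            self.get(name)

    def get(self, name: str) -> object:
        """Return the compiled kernel, compiling only on a cache miss."""
        if name not in self._programs:
            self._programs[name] = self._compile_fn(name)
        return self._programs[name]


if __name__ == "__main__":
    compiled = []

    def fake_compile(name: str) -> str:
        compiled.append(name)  # record every real (expensive) compilation
        return f"<program {name}>"

    cache = KernelCache(fake_compile)
    cache.warm_up(["lens_blur", "gaussian_blur"])  # startup phase
    for _ in range(100):                           # interactive use
        cache.get("lens_blur")
    print(compiled)  # each kernel was compiled exactly once
```

Calling `get` without `warm_up` gives the lazy variant (compile on first use), which moves the cost to the first time a filter is opened instead of to startup. The sketch is single-threaded; a real application compiling from several threads would need to guard the dictionary.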

May 7, 2022
