OpenCL on Linux: the state of AMD drivers is now worse than it was in the days of fglrx

Over the past years, while AMD graphics cards stayed competitive, the drivers for those cards became the best graphics drivers on Linux in almost every respect (reliability, performance, stability, integration…), except for one task where the Linux drivers are a disaster: GPU compute using OpenCL.

[Figure: Testing and validating GPUs]

AMD drivers are excellent for Vulkan, OpenGL, VA-API…

On the graphics side, OpenGL support in the open radeonsi Mesa driver has been excellent for a long time. The free driver has been recommended over the proprietary one for several years. OpenGL 4.6 and most extensions have been implemented for a long time, and the driver now supports the newest OpenGL version with the compatibility profile, which was for a long time specific to the proprietary driver and served some industrial needs.

On the Vulkan side (the new low-level graphics API), the open Mesa driver RADV works very well, and so does the open AMD driver AMDVLK. It’s even possible to install both and to choose the faster one for a given use case with a simple environment variable. There is also a third driver, the amdgpu-pro variant of AMDVLK, which sometimes brings features before the others. In other words, AMD card users have a choice, and all the options are good.
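As an illustration of that environment-variable switch, here is a minimal Python sketch. It assumes the driver is selected through the Vulkan loader's VK_ICD_FILENAMES variable and that the two ICD manifests live at the paths shown. Both paths are assumptions that vary between distributions, and vulkan_env is a hypothetical helper name.

```python
import os

# Assumed manifest locations; check your distribution, they may differ.
ICD_MANIFESTS = {
    "radv": "/usr/share/vulkan/icd.d/radeon_icd.x86_64.json",
    "amdvlk": "/usr/share/vulkan/icd.d/amd_icd64.json",
}


def vulkan_env(driver, base_env=None):
    """Return a copy of the environment that restricts the Vulkan loader
    to the manifest of the chosen driver ("radv" or "amdvlk")."""
    if driver not in ICD_MANIFESTS:
        raise ValueError(f"unknown driver: {driver}")
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = ICD_MANIFESTS[driver]
    return env
```

For example, subprocess.run(["vulkaninfo"], env=vulkan_env("radv")) would start that program with only RADV visible, provided vulkaninfo is installed and the manifest path matches your system.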

For hardware-accelerated video encoding and decoding, the VDPAU and VA-API APIs are very well supported.

Thanks to the amdgpu platform, all those drivers can live together.

The Linux amdgpu driver is generally good. I know of some specific situations where it could be better, but for the vast majority of users everything works very well. Also, this kernel driver and the radeonsi, RADV, VDPAU and VA-API drivers are integrated in distributions without needing a third-party repository. All those features are available through official distribution packages, when those packages aren’t already installed by default.

But there remains one area where the quality of AMD drivers isn’t there, and is even a disaster: compute with the OpenCL API.

OpenCL can be used by applications such as Darktable (digital photo development), Blender (3D modeling and rendering), LuxRender (3D ray tracing render engine), DaVinci Resolve (video editing), Natron (video compositing), and even GIMP (2D drawing and image processing) and LibreOffice Calc (spreadsheet), among others.

Most of the time, those applications can work without OpenCL, but the performance gain brought by OpenCL is very welcome, either by distributing the compute between the CPU and the GPU instead of relying only on CPU, or by using a GPU that could be more efficient than a CPU for a given task.

AMD OpenCL drivers have never been in such a bad situation

The state of OpenCL on AMD GPUs on Linux is now worse than it was in the fglrx era (up to 2015).

There are many OpenCL drivers for AMD, each targeting this or that architecture, and making them coexist is sometimes difficult. Sometimes a driver can't work if a card from another generation is plugged in. Sometimes different drivers for different cards try to install themselves using the same file names.
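To see which OpenCL drivers are registered on a system, one option is to read the ICD files that the Khronos OpenCL loader scans on Linux. The sketch below assumes the usual /etc/OpenCL/vendors directory, where each .icd file contains the name or path of one driver library. The function names are hypothetical, and this only lists what is registered; it cannot tell which driver overwrote which file.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Map each .icd file name to the driver library it points at."""
    directory = Path(vendors_dir)
    icds = {}
    if not directory.is_dir():
        return icds
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of one driver library.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


def shared_libraries(icds):
    """Return the libraries that more than one .icd file points at."""
    by_library = {}
    for icd_name, library in icds.items():
        by_library.setdefault(library, []).append(icd_name)
    return {lib: names for lib, names in by_library.items() if len(names) > 1}
```

Calling shared_libraries(list_opencl_icds()) returns an empty dictionary when every registered file points at a different library.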

The ROCm driver can also recognize some hardware, attempt to do something with it, and wreck the kernel to the point that users are told to reboot their computer.

The ROCm driver replaced the PAL driver without being an alternative to it: not in purpose, not in hardware support, not in implementation, and not in its ability to fulfill the needs.

ROCm deprecates hardware very quickly. The GCN2 architecture only worked for a few months, years ago (and only one card was supported), and GCN4 (Polaris) is reported to have stopped working some time ago (only two were supported at the time). Currently only three chips are said to be supported by ROCm (Vega 10, Vega 7nm, MI100).

On the community side, Clover only works for some uses (for example LuxRender) but not for others (for example Darktable) because image processing isn't implemented yet. With Clover, LuxMark is twice as fast as with Orca and ROCm on GCN (and probably the same compared with PAL).

Unfortunately I noticed that Clover has become broken over the last years. The last working prebuilt Ubuntu version I found is from 2019. I wrote a script to compile the latest version of Clover with the latest version of LLVM, and I can confirm that it is broken upstream. Oh, and the latest working versions had lost between 15 and 20% of performance since 2018.

Clover is also the only existing driver that works with TeraScale 2 and 3 graphics cards (support for a couple of TeraScale 1 cards could theoretically be implemented, but this hasn't been done), because the last official AMD driver for them was Radeon Software (fglrx). It has been abandoned for years (the last version was released in 2015, and the one for TeraScale 1 dates back to 2013) and is incompatible with the amdgpu platform and recent kernels.

Speaking of fglrx, over the last years I have seen people report that they were still using fglrx (and therefore environments from 2015) to use OpenCL on GCN hardware. That’s also why I often recommended my scripts, to help those people update their systems.

My I ♥ Compute! initiative to maintain scripts, quirks, documentation and track issues

I have a strong interest in OpenCL because I use Darktable (photographic development software) a lot. For that reason I have maintained scripts and quirks for years to keep this feature working, first for my own use, then for the others I share them with.

Year after year, I have written and rewritten many scripts. Then I started to feel the need to track in one place the tickets I was opening here and there. So I opened a repository on GitLab, which I named “I ♥ Compute!”, to store my documentation and my scripts, and to get a unified issue tracker.

The latest versions of my script for Ubuntu (it may work, or partially work, on Debian) can install the following drivers side by side on Ubuntu 20.04 LTS and Ubuntu 21.20:

That last driver is only the CPU part that was provided with fglrx. It lets OpenCL software distribute tasks to both the CPU and the GPU in order to use all the available hardware. The GPU part of fglrx is unusable today anyway.

It should be noted that some drivers have bugs. For example, the last working version of Orca becomes unusable if a single card in the system uses the radeon driver (instead of amdgpu). So if you have one card driven by radeon and another driven by amdgpu, Orca will not work with the amdgpu one because the other card is there, even though it would work if that other card were physically removed from the system.
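A quick way to check whether a machine is in that mixed situation is to look at which kernel driver each card is bound to. The sketch below reads the driver symlinks that Linux exposes under /sys/class/drm. The function names are hypothetical, and the check only mirrors the radeon-plus-amdgpu case described above; it is not an official diagnostic.

```python
import os
import re
from pathlib import Path


def kernel_drivers(drm_root="/sys/class/drm"):
    """Map each DRM card (card0, card1, ...) to its kernel driver name."""
    drivers = {}
    root = Path(drm_root)
    if not root.is_dir():
        return drivers
    for entry in sorted(root.iterdir()):
        # Skip connectors (card0-DP-1) and render nodes (renderD128).
        if not re.fullmatch(r"card\d+", entry.name):
            continue
        link = entry / "device" / "driver"
        if link.is_symlink():
            # The symlink target ends with the driver name, e.g. ".../amdgpu".
            drivers[entry.name] = os.path.basename(os.readlink(link))
    return drivers


def mixes_radeon_and_amdgpu(drivers):
    """True when both a radeon-driven and an amdgpu-driven card are present."""
    used = set(drivers.values())
    return "radeon" in used and "amdgpu" in used
```

Running mixes_radeon_and_amdgpu(kernel_drivers()) on the affected setup would return True, which is a hint that the other card may be what prevents the driver from working.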

As I wrote before, ROCm can wreck the kernel if it attempts to support some GCN cards. So if your system has an affected GCN card like the R9 390X and a more recent card like a Vega one, and you install ROCm to support the Vega one, you will get problems. Oh, and I was able to verify that not all Vegas (even PCIe ones) work with ROCm (I had to use PAL once). Some are supported by PAL, but PAL has been removed. I have already said it was messy, right?

There is currently a script to install AMD OpenCL drivers on Ubuntu, a script to build Clover yourself, and a script to build clvk yourself (a project to implement OpenCL on top of Vulkan), but for now none of them work (even if Clover partially worked in the past).

My scripts come with built-in help (--help).

You’ll find a summary of my knowledge on the repository I ♥ Compute! on GitLab:

This is not only about AMD, but the AMD situation is the worst of all. The Nvidia drivers (even if proprietary and not integrated) usually support OpenCL, and the Intel drivers are starting to become usable (there is driver proliferation there too, but it’s less bad).

So you’ll also find some scripts. To be clear, I don’t have the means to maintain backward compatibility. If you use one of my scripts to install a driver, make sure you back up somewhere the script you used, so you can uninstall the version you installed. I don’t have the means to make sure new versions of the scripts uninstall what was installed by an older version.

You’ll also find an issue tracker where I track in one place the problems I face and report elsewhere (or that have no other place to be reported to).

Hardware donation? funding? professional service?

I don’t only test AMD cards, even if that’s my priority (because that’s what I use).

On the AMD side I already own this hardware: TeraScale 1 (AGP, PCI, PCIe), TeraScale 2, TeraScale 3 (PCIe), GCN 1, GCN 2 (PCIe), GCN 3, GCN 5 (integrated). I also own the related motherboards (AGP, PCIe 2, 3, 4).

I don’t have GCN4 hardware (Polaris), RDNA1/CDNA1 (Navi), CDNA1, RDNA2/CDNA2 (Big Navi), and specifically no AMD hardware supported by ROCr. Non-integrated GCN 3 and GCN 5 cards would be useful too (the ones embedded in APUs may have their own specific limitations).

More details about the hardware I own and instructions for possible hardware donations are given there.

I specifically need a GCN 4 card to test whether the latest version of Orca, which no longer supports GCN 1, 2 and 3, still supports GCN 4 (otherwise AMD would be distributing it for nothing).

This hardware is part of the hardware I use to test and validate the Unvanquished game, with this kind of table (here for OpenGL). It would be useful to have a similar table for OpenCL but this is too much work.

For all those years I have provided OpenCL scripts and given advice to those around me, but now that the situation is getting even worse, I thought it would be a good idea to offer the opportunity to make donations (hardware and money). Because I’m currently my own employer, I may be able to allocate more time to this initiative if I get some financial donations. So I added some donation links on the page.

And since I have my own company, some may be interested in specific services? For example, in the past and in a professional context, I wrote a script to install the AMD OpenCL PAL driver on Mageia to drive a Vega card and accelerate Blender. I don’t have the means to maintain this kind of script, but maybe there is a need.

So I stated on my professional website that I offer my services to get OpenCL working with AMD for professional customers.

The last word

To sum up: unless you use outdated software that AMD no longer supports and no longer includes in its driver suite, the current state of OpenCL support for AMD hardware on Linux seems to be ROCm, whose documentation lists only three chips as officially supported and which is said not to support graphical applications.

There are some attempts to make OpenCL run over Vulkan, like clvk relying on clspv, maybe that’s the future? For now this doesn’t work yet anyway.

AMD should perhaps worry about Intel's attempt to deliver PCI Express cards with OpenCL support that is fully open and integrated in distribution repositories.

If AMD needs some help, a priori I know how to make their drivers coexist…

And you, what’s your experience with OpenCL, Linux and AMD?

-- Thomas Debesse aka illwieckz for rebatir.fr.