I've added a test branch (adreno_tryout) in CLBlast to try out the Qualcomm-provided kernel from the tutorial mentioned above. This is a very hacky integration of that kernel and is by no means meant to be actually used. Its only purpose is to find out whether that kernel fixes the performance issues with CLBlast. If it does, I can work towards integrating such a kernel properly. If not, we'll have to continue investigating.

The adreno_tryout branch contains the Qualcomm-provided kernel and also a modified tuner to tune the local workgroup size. The tutorial also suggests using an OpenCL image for matrix B, but I didn't implement that.
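To illustrate what tuning the local workgroup size involves, here is a minimal sketch. It is not the tuner from the adreno_tryout branch or CLBlast's own tuner. The helpers candidate_local_sizes and pick_fastest are hypothetical, and time_kernel stands in for whatever actually launches the kernel and measures it. The sketch assumes OpenCL 1.x rules:

* each global dimension must be divisible by the matching local dimension;
* each local dimension must not exceed the device's per-dimension limit;
* the total workgroup size must not exceed the device maximum.

```python
from itertools import product


def candidate_local_sizes(global_size, max_work_group_size, max_item_sizes):
    """Enumerate 2D local workgroup sizes that a tuner could try.

    Hypothetical helper. It assumes OpenCL 1.x constraints: every local
    dimension divides its global dimension, stays within the per-dimension
    limit, and the product stays within the maximum workgroup size.
    """
    gx, gy = global_size
    divisors_x = [d for d in range(1, gx + 1) if gx % d == 0 and d <= max_item_sizes[0]]
    divisors_y = [d for d in range(1, gy + 1) if gy % d == 0 and d <= max_item_sizes[1]]
    return [(lx, ly) for lx, ly in product(divisors_x, divisors_y)
            if lx * ly <= max_work_group_size]


def pick_fastest(candidates, time_kernel, repeats=3):
    """Return (local_size, best_time) for the fastest candidate.

    time_kernel is a hypothetical callback that runs the kernel with the
    given local size and returns the elapsed time. The minimum over several
    runs is kept to reduce noise. On ties, the first candidate found wins.
    """
    best = None
    for local in candidates:
        elapsed = min(time_kernel(local) for _ in range(repeats))
        if best is None or elapsed < best[1]:
            best = (local, elapsed)
    return best
```

For example, consider a global size of (8, 4) on a device that allows at most 16 work-items per group and 8 per dimension. The function `candidate_local_sizes((8, 4), 16, (8, 8))` yields 11 candidates. It excludes (8, 4) because that group would hold 32 work-items.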