path: root/candle-core/src/cuda_backend
Commit message | Author | Date | Files | Lines (-/+)
20241118 docs (#2629) | zachcp | 2024-11-19 | 1 | -0/+2
Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 1 | -0/+21
Fix for cudnn bf16 conv2d. (#2535) | Laurent Mazare | 2024-10-02 | 2 | -10/+14
Add support for cuda streams. (#2532) | Laurent Mazare | 2024-10-02 | 1 | -0/+14
Update cudarc to 0.12. (#2451) | Laurent Mazare | 2024-08-27 | 2 | -2/+4
Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 1 | -5/+3
Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39
More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 1 | -2/+73
Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 1 | -6/+61
Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -38/+0
F16/BF16 bugfix (bis). (#2143) | Laurent Mazare | 2024-04-29 | 1 | -14/+36
Bugfix the recent f16/bf16 changes. (#2142) | Laurent Mazare | 2024-04-29 | 1 | -8/+8
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -2/+2
Add a toggle for F16/BF16 accumulation in gemm. (#2141) | Laurent Mazare | 2024-04-29 | 1 | -12/+125
Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 1 | -1/+38
Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 1 | -0/+5
Split the cuda error file. (#2003) | Laurent Mazare | 2024-04-04 | 2 | -65/+67
Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+6
Improve the handling of matmul with squeezed layouts. (#1998) | Laurent Mazare | 2024-04-02 | 1 | -0/+4
Backend refactoring. (#1966) | Laurent Mazare | 2024-03-29 | 4 | -0/+2576