index : forks/candle.git
path: root/candle-core/src/cuda_backend
| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| 20241118 docs (#2629) | zachcp | 2024-11-19 | 1 | -0/+2 |
| Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 1 | -0/+21 |
| Fix for cudnn bf16 conv2d. (#2535) | Laurent Mazare | 2024-10-02 | 2 | -10/+14 |
| Add support for cuda streams. (#2532) | Laurent Mazare | 2024-10-02 | 1 | -0/+14 |
| Update cudarc to 0.12. (#2451) | Laurent Mazare | 2024-08-27 | 2 | -2/+4 |
| Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 1 | -5/+3 |
| Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 2 | -1/+39 |
| More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 1 | -2/+73 |
| Make it possible to use TF32 accumulation in F32 matmuls. (#2178) | Laurent Mazare | 2024-05-11 | 1 | -6/+61 |
| Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -38/+0 |
| F16/BF16 bugfix (bis). (#2143) | Laurent Mazare | 2024-04-29 | 1 | -14/+36 |
| Bugfix the recent f16/bf16 changes. (#2142) | Laurent Mazare | 2024-04-29 | 1 | -8/+8 |
| Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -2/+2 |
| Add a toggle for F16/BF16 accumulation in gemm. (#2141) | Laurent Mazare | 2024-04-29 | 1 | -12/+125 |
| Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 1 | -1/+38 |
| Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 1 | -0/+5 |
| Split the cuda error file. (#2003) | Laurent Mazare | 2024-04-04 | 2 | -65/+67 |
| Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+6 |
| Improve the handling of matmul with squeezed layouts. (#1998) | Laurent Mazare | 2024-04-02 | 1 | -0/+4 |
| Backend refactoring. (#1966) | Laurent Mazare | 2024-03-29 | 4 | -0/+2576 |