index : forks/candle.git
main
path: root/candle-kernels
Commit message | Author | Age | Files | Lines
Bump the caret version to 0.8.2. (#2703) | Laurent Mazare | 2025-01-07 | 1 | -1/+1
Bump the crate version to 0.8.1. (#2662) | Laurent Mazare | 2024-12-07 | 1 | -1/+1
Import the ggml_cuda_dp4a function. (#2628) | Laurent Mazare | 2024-11-19 | 1 | -33/+44
Bump the crate version to 0.8.0. (#2612) | Laurent Mazare | 2024-11-12 | 1 | -1/+1
Improved launch config for layer-norm/rms-norm. (#2591) | Laurent Mazare | 2024-11-04 | 1 | -8/+6
Bump the crate version to 0.7.2. (#2517) | Laurent Mazare | 2024-09-29 | 1 | -1/+1
Move the candle version to 0.7.1. (#2495) | Laurent Mazare | 2024-09-22 | 1 | -1/+1
Bump the crate version. (#2491) | Laurent Mazare | 2024-09-21 | 1 | -1/+1
Bump the version to 0.6.1. (#2438) | Laurent Mazare | 2024-08-22 | 1 | -1/+1
Bump the crate version. (#2248) | Laurent Mazare | 2024-06-05 | 1 | -1/+1
Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 1 | -0/+84
More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 1 | -0/+65
Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -1/+1
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -0/+9
Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 1 | -37/+75
Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 2 | -0/+89
Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 1 | -0/+324
Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 1 | -10/+254
Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -11/+118
Add the full quantized matmul kernels for cuda. (#2057) | Laurent Mazare | 2024-04-14 | 1 | -0/+1071
Add the rope THD kernel. (#2014) | Laurent Mazare | 2024-04-05 | 1 | -5/+43
Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 1 | -0/+9
Bumping the version number to 0.5.0. (#2009) | Laurent Mazare | 2024-04-04 | 1 | -1/+1
Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+1
More ggml cuda kernels (#1977) | Laurent Mazare | 2024-04-01 | 1 | -75/+1014
Ensure that the kernels get rebuilt on cuh changes. (#1954) | Laurent Mazare | 2024-03-28 | 1 | -0/+3
Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 1 | -2/+2
Contiguous variant of the rope kernel. (#1929) | Laurent Mazare | 2024-03-25 | 1 | -6/+34
Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 1 | -0/+29
Add cast_bf16_x/cast_x_bf16 when CUDA_ARCH<800 but CUDA_VERSION >= 11000 (#1919) | yinqiwen | 2024-03-23 | 1 | -0/+12
Support scatter/index_add with i64 indices for f16 (#1915) | Daniël de Kok | 2024-03-22 | 1 | -0/+2
Custom op for RmsNorm (#1890) | Laurent Mazare | 2024-03-21 | 1 | -0/+65
Cuda backend optimization (#1886) | Laurent Mazare | 2024-03-20 | 4 | -7/+7
Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 1 | -1/+29
Bump the crate versions to 0.4.2. (#1821) | Laurent Mazare | 2024-03-08 | 1 | -1/+1
Add a cuda kernel for dequantizing q8_0. (#1804) | Laurent Mazare | 2024-03-05 | 1 | -0/+24
Handle Q5_0 and Q5_1 quants in cuda. | laurent | 2024-02-29 | 1 | -7/+9
Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -1/+1
Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -0/+35
Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 2 | -0/+1537
Fix the silu cuda kernel. (#1710) | Laurent Mazare | 2024-02-14 | 1 | -1/+1
feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 1 | -0/+9
ConvTranspose1d cuda support. (#1697) | Laurent Mazare | 2024-02-12 | 1 | -2/+77
Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -1/+1
Moving to a proper build crate `bindgen_cuda`. (#1531) | Nicolas Patry | 2024-01-07 | 2 | -242/+5
Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -1/+1
Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -1/+1
Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -2/+2
Rework the cuda casting bits. (#1112) | Laurent Mazare | 2023-10-17 | 1 | -31/+54
feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 2 | -89/+110