path: root/candle-kernels
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Bump the caret version to 0.8.2. (#2703) | Laurent Mazare | 2025-01-07 | 1 | -1/+1 |
| Bump the crate version to 0.8.1. (#2662) | Laurent Mazare | 2024-12-07 | 1 | -1/+1 |
| Import the ggml_cuda_dp4a function. (#2628) | Laurent Mazare | 2024-11-19 | 1 | -33/+44 |
| Bump the crate version to 0.8.0. (#2612) | Laurent Mazare | 2024-11-12 | 1 | -1/+1 |
| Improved launch config for layer-norm/rms-norm. (#2591) | Laurent Mazare | 2024-11-04 | 1 | -8/+6 |
| Bump the crate version to 0.7.2. (#2517) | Laurent Mazare | 2024-09-29 | 1 | -1/+1 |
| Move the candle version to 0.7.1. (#2495) | Laurent Mazare | 2024-09-22 | 1 | -1/+1 |
| Bump the crate version. (#2491) | Laurent Mazare | 2024-09-21 | 1 | -1/+1 |
| Bump the version to 0.6.1. (#2438) | Laurent Mazare | 2024-08-22 | 1 | -1/+1 |
| Bump the crate version. (#2248) | Laurent Mazare | 2024-06-05 | 1 | -1/+1 |
| Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 1 | -0/+84 |
| More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 1 | -0/+65 |
| Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -1/+1 |
| Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -0/+9 |
| Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 1 | -37/+75 |
| Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 2 | -0/+89 |
| Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 1 | -0/+324 |
| Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 1 | -10/+254 |
| Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -11/+118 |
| Add the full quantized matmul kernels for cuda. (#2057) | Laurent Mazare | 2024-04-14 | 1 | -0/+1071 |
| Add the rope THD kernel. (#2014) | Laurent Mazare | 2024-04-05 | 1 | -5/+43 |
| Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 1 | -0/+9 |
| Bumping the version number to 0.5.0. (#2009) | Laurent Mazare | 2024-04-04 | 1 | -1/+1 |
| Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+1 |
| More ggml cuda kernels (#1977) | Laurent Mazare | 2024-04-01 | 1 | -75/+1014 |
| Ensure that the kernels get rebuilt on cuh changes. (#1954) | Laurent Mazare | 2024-03-28 | 1 | -0/+3 |
| Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 1 | -2/+2 |
| Contiguous variant of the rope kernel. (#1929) | Laurent Mazare | 2024-03-25 | 1 | -6/+34 |
| Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 1 | -0/+29 |
| Add cast_bf16_x/cast_x_bf16 when CUDA_ARCH<800 but CUDA_VERSION >= 11000 (#1919) | yinqiwen | 2024-03-23 | 1 | -0/+12 |
| Support scatter/index_add with i64 indices for f16 (#1915) | Daniël de Kok | 2024-03-22 | 1 | -0/+2 |
| Custom op for RmsNorm (#1890) | Laurent Mazare | 2024-03-21 | 1 | -0/+65 |
| Cuda backend optimization (#1886) | Laurent Mazare | 2024-03-20 | 4 | -7/+7 |
| Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 1 | -1/+29 |
| Bump the crate versions to 0.4.2. (#1821) | Laurent Mazare | 2024-03-08 | 1 | -1/+1 |
| Add a cuda kernel for dequantizing q8_0. (#1804) | Laurent Mazare | 2024-03-05 | 1 | -0/+24 |
| Handle Q5_0 and Q5_1 quants in cuda. | laurent | 2024-02-29 | 1 | -7/+9 |
| Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -1/+1 |
| Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -0/+35 |
| Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 2 | -0/+1537 |
| Fix the silu cuda kernel. (#1710) | Laurent Mazare | 2024-02-14 | 1 | -1/+1 |
| feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 1 | -0/+9 |
| ConvTranspose1d cuda support. (#1697) | Laurent Mazare | 2024-02-12 | 1 | -2/+77 |
| Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -1/+1 |
| Moving to a proper build crate `bindgen_cuda`. (#1531) | Nicolas Patry | 2024-01-07 | 2 | -242/+5 |
| Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -1/+1 |
| Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -1/+1 |
| Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -2/+2 |
| Rework the cuda casting bits. (#1112) | Laurent Mazare | 2023-10-17 | 1 | -31/+54 |
| feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 2 | -89/+110 |