path: root/candle-kernels
Commit message | Author | Date | Files | Lines (-/+)
* Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -1/+1
* Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -0/+35
* Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 2 | -0/+1537
* Fix the silu cuda kernel. (#1710) | Laurent Mazare | 2024-02-14 | 1 | -1/+1
* feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 1 | -0/+9
* ConvTranspose1d cuda support. (#1697) | Laurent Mazare | 2024-02-12 | 1 | -2/+77
* Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -1/+1
* Moving to a proper build crate `bindgen_cuda`. (#1531) | Nicolas Patry | 2024-01-07 | 2 | -242/+5
* Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -1/+1
* Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -1/+1
* Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -2/+2
* Rework the cuda casting bits. (#1112) | Laurent Mazare | 2023-10-17 | 1 | -31/+54
* feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 2 | -89/+110
* fix: fix index_select cuda kernel for src target dim different than ids dim w... | Gonzalo | 2023-10-05 | 1 | -6/+8
* Add the rounding operators. (#1030) | Laurent Mazare | 2023-10-04 | 2 | -0/+24
* Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -1/+1
* fix: add missing gpu fill_* (#996) | Gonzalo | 2023-09-29 | 1 | -0/+9
* Optimize the index-select cuda kernel. (#976) | Laurent Mazare | 2023-09-28 | 1 | -14/+8
* Add the missing kernel. (#955) | Laurent Mazare | 2023-09-24 | 1 | -0/+1
* cuda cast i64 (#925) | Gonzalo | 2023-09-21 | 1 | -0/+10
* Add an erf based gelu op (#900) | Laurent Mazare | 2023-09-19 | 2 | -0/+25
* Bump the crate versions to v0.2.3. (#886) | Laurent Mazare | 2023-09-18 | 1 | -1/+1
* Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. ... | Charles Lew | 2023-09-13 | 1 | -2/+9
* Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -1/+1
* im2col version of the conv1d kernel. (#815) | Laurent Mazare | 2023-09-11 | 1 | -1/+70
* im2col based conv2d (#802) | Laurent Mazare | 2023-09-10 | 1 | -0/+89
* Add a dedicated cuda kernel for softmax. (#746) | Laurent Mazare | 2023-09-05 | 1 | -0/+55
* Add tanh. (#675) | Laurent Mazare | 2023-08-30 | 1 | -0/+4
* Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -1/+1
* Support dilation in conv-transpose2d. (#671) | Laurent Mazare | 2023-08-30 | 1 | -3/+3
* Add the powf op. (#664) | Laurent Mazare | 2023-08-29 | 1 | -0/+4
* Fix the dilated convolutions. (#659) | Laurent Mazare | 2023-08-29 | 1 | -2/+2
* Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -6/+12
* Cuda conv transpose (#645) | Laurent Mazare | 2023-08-28 | 1 | -0/+88
* Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -1/+1
* Let's keep the dirty code on its own. | Nicolas Patry | 2023-08-25 | 1 | -2/+25
* Intermediary float cast is necessary for cuda 11.8 | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* `static_cast` ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* Different casting ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2
* Repairing cast bf16/f16 | Nicolas Patry | 2023-08-25 | 1 | -4/+4
* Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 | -10/+11
* Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1
* Add support for i64 (#563) | Laurent Mazare | 2023-08-23 | 6 | -1/+65
* Add a yolo-v3 example. (#528) | Laurent Mazare | 2023-08-20 | 1 | -0/+12
* Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -1/+1
* Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -1/+1
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -0/+62
* Add a cuda kernel for avg-pool2d. (#440) | Laurent Mazare | 2023-08-14 | 1 | -3/+157
* Add a naive conv2d cuda kernel. (#438) | Laurent Mazare | 2023-08-14 | 1 | -8/+93
* Compat windows. | Nicolas Patry | 2023-08-10 | 1 | -0/+9