path: root/candle-kernels
| Commit message | Author | Date | Files changed | Lines (-/+) |
|---|---|---|---|---|
| Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -1/+1 |
| im2col version of the conv1d kernel. (#815) | Laurent Mazare | 2023-09-11 | 1 | -1/+70 |
| im2col based conv2d (#802) | Laurent Mazare | 2023-09-10 | 1 | -0/+89 |
| Add a dedicated cuda kernel for softmax. (#746) | Laurent Mazare | 2023-09-05 | 1 | -0/+55 |
| Add tanh. (#675) | Laurent Mazare | 2023-08-30 | 1 | -0/+4 |
| Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -1/+1 |
| Support dilation in conv-transpose2d. (#671) | Laurent Mazare | 2023-08-30 | 1 | -3/+3 |
| Add the powf op. (#664) | Laurent Mazare | 2023-08-29 | 1 | -0/+4 |
| Fix the dilated convolutions. (#659) | Laurent Mazare | 2023-08-29 | 1 | -2/+2 |
| Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -6/+12 |
| Cuda conv transpose (#645) | Laurent Mazare | 2023-08-28 | 1 | -0/+88 |
| Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -1/+1 |
| Let's keep the dirty code on its own. | Nicolas Patry | 2023-08-25 | 1 | -2/+25 |
| Intermediary float cast is necessary for cuda 11.8 | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| `static_cast` ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| Different casting ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| Repairing cast bf16/f16 | Nicolas Patry | 2023-08-25 | 1 | -4/+4 |
| Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 | -10/+11 |
| Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1 |
| Add support for i64 (#563) | Laurent Mazare | 2023-08-23 | 6 | -1/+65 |
| Add a yolo-v3 example. (#528) | Laurent Mazare | 2023-08-20 | 1 | -0/+12 |
| Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -1/+1 |
| Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -1/+1 |
| Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -0/+62 |
| Add a cuda kernel for avg-pool2d. (#440) | Laurent Mazare | 2023-08-14 | 1 | -3/+157 |
| Add a naive conv2d cuda kernel. (#438) | Laurent Mazare | 2023-08-14 | 1 | -8/+93 |
| Compat windows. | Nicolas Patry | 2023-08-10 | 1 | -0/+9 |
| This is duplicated code on Cuda 12.2. | Nicolas Patry | 2023-08-10 | 1 | -18/+0 |
| Add the license files. (#335) | Laurent Mazare | 2023-08-07 | 1 | -1/+1 |
| Add the recip op + use it in stable-diffusion. (#331) | Laurent Mazare | 2023-08-06 | 1 | -0/+4 |
| Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| Remove the embedding ops in favor of index-select. (#299) | Laurent Mazare | 2023-08-02 | 1 | -40/+0 |
| Cuda support for the mnist training. (#277) | Laurent Mazare | 2023-07-29 | 2 | -7/+118 |
| Support for where-cond on cuda for u8 and u32. (#274) | Laurent Mazare | 2023-07-29 | 1 | -8/+15 |
| Add some flash attn test (#253) | Laurent Mazare | 2023-07-26 | 1 | -2/+2 |
| Add a test for scatter add. (#238) | Laurent Mazare | 2023-07-25 | 1 | -5/+3 |
| Cuda kernels for IndexAdd/ScatterAdd. (#236) | Laurent Mazare | 2023-07-24 | 2 | -1/+102 |
| Indexing cuda (#235) | Laurent Mazare | 2023-07-24 | 1 | -8/+119 |
| Add some cmp tests. (#233) | Laurent Mazare | 2023-07-24 | 2 | -10/+56 |
| Cleanup some todos. (#226) | Laurent Mazare | 2023-07-23 | 1 | -109/+83 |
| Revert "Add the layer norm files. (#222)" (#223) | Laurent Mazare | 2023-07-22 | 9 | -1532/+0 |
| Add the layer norm files. (#222) | Laurent Mazare | 2023-07-22 | 9 | -0/+1532 |
| Cuda kernels for fast min/max reductions (#203) | Laurent Mazare | 2023-07-19 | 2 | -9/+117 |
| Add the elu cuda kernel. (#114) | Laurent Mazare | 2023-07-10 | 1 | -0/+38 |
| Make it easier to use whisper samples from the repo. (#112) | Laurent Mazare | 2023-07-08 | 1 | -12/+12 |
| Cuda kernel for the conv1d op (#111) | Laurent Mazare | 2023-07-08 | 2 | -0/+75 |
| Sketch a fast cuda kernel for reduce-sum. (#109) | Laurent Mazare | 2023-07-08 | 1 | -0/+67 |
| Tweak the include order to include math.h first. (#100) | Laurent Mazare | 2023-07-07 | 1 | -1/+1 |
| Include the math.h file to get access to constants. (#99) | Laurent Mazare | 2023-07-07 | 1 | -0/+2 |
| Fixing the cached build. | Nicolas Patry | 2023-07-05 | 1 | -113/+97 |