path: root/candle-kernels
Commit message | Author | Age | Files | Lines
* Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -0/+62
* Add a cuda kernel for avg-pool2d. (#440) | Laurent Mazare | 2023-08-14 | 1 | -3/+157
* Add a naive conv2d cuda kernel. (#438) | Laurent Mazare | 2023-08-14 | 1 | -8/+93
* Compat windows. | Nicolas Patry | 2023-08-10 | 1 | -0/+9
* This is duplicated code on Cuda 12.2. | Nicolas Patry | 2023-08-10 | 1 | -18/+0
* Add the license files. (#335) | Laurent Mazare | 2023-08-07 | 1 | -1/+1
* Add the recip op + use it in stable-diffusion. (#331) | Laurent Mazare | 2023-08-06 | 1 | -0/+4
* Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -1/+1
* Remove the embedding ops in favor of index-select. (#299) | Laurent Mazare | 2023-08-02 | 1 | -40/+0
* Cuda support for the mnist training. (#277) | Laurent Mazare | 2023-07-29 | 2 | -7/+118
* Support for where-cond on cuda for u8 and u32. (#274) | Laurent Mazare | 2023-07-29 | 1 | -8/+15
* Add some flash attn test (#253) | Laurent Mazare | 2023-07-26 | 1 | -2/+2
* Add a test for scatter add. (#238) | Laurent Mazare | 2023-07-25 | 1 | -5/+3
* Cuda kernels for IndexAdd/ScatterAdd. (#236) | Laurent Mazare | 2023-07-24 | 2 | -1/+102
* Indexing cuda (#235) | Laurent Mazare | 2023-07-24 | 1 | -8/+119
* Add some cmp tests. (#233) | Laurent Mazare | 2023-07-24 | 2 | -10/+56
* Cleanup some todos. (#226) | Laurent Mazare | 2023-07-23 | 1 | -109/+83
* Revert "Add the layer norm files. (#222)" (#223) | Laurent Mazare | 2023-07-22 | 9 | -1532/+0
* Add the layer norm files. (#222) | Laurent Mazare | 2023-07-22 | 9 | -0/+1532
* Cuda kernels for fast min/max reductions (#203) | Laurent Mazare | 2023-07-19 | 2 | -9/+117
* Add the elu cuda kernel. (#114) | Laurent Mazare | 2023-07-10 | 1 | -0/+38
* Make it easier to use whisper samples from the repo. (#112) | Laurent Mazare | 2023-07-08 | 1 | -12/+12
* Cuda kernel for the conv1d op (#111) | Laurent Mazare | 2023-07-08 | 2 | -0/+75
* Sketch a fast cuda kernel for reduce-sum. (#109) | Laurent Mazare | 2023-07-08 | 1 | -0/+67
* Tweak the include order to include math.h first. (#100) | Laurent Mazare | 2023-07-07 | 1 | -1/+1
* Include the math.h file to get access to constants. (#99) | Laurent Mazare | 2023-07-07 | 1 | -0/+2
* Fixing the cached build. | Nicolas Patry | 2023-07-05 | 1 | -113/+97
* Minor tweaks. | laurent | 2023-07-03 | 1 | -0/+3
* Bugfix: remove the u8/bf16 conversion kernel as it is ambiguous. | laurent | 2023-06-30 | 1 | -1/+1
* Add the kernels. | laurent | 2023-06-30 | 5 | -0/+19
* Avoid some cast kernels. | laurent | 2023-06-29 | 1 | -2/+2
* Add the bf16 cuda kernels. | laurent | 2023-06-29 | 9 | -1/+67
* Rerun on new files. | Nicolas Patry | 2023-06-29 | 1 | -0/+1
* Fixing kernel cache (a bit brutal for now, but if build triggers, | Nicolas Patry | 2023-06-29 | 1 | -0/+8
* Add the relu op. | laurent | 2023-06-28 | 1 | -4/+13
* Fix two cuda bugs (matmul and where_cond). | laurent | 2023-06-27 | 1 | -1/+1
* Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 15 | -0/+963