path: root/candle-kernels
Commit log (newest first). Each entry lists: commit message, author, date, files changed, lines -removed/+added.
* Rework the cuda casting bits. (#1112) (Laurent Mazare, 2023-10-17, 1 file, -31/+54)
* feat: parse Cuda compute cap from env (#1066) (OlivierDehaene, 2023-10-16, 2 files, -89/+110)
  - feat: add support for multiple compute caps
  - Revert to one compute cap
  - fmt
  - fix
* fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037) (Gonzalo, 2023-10-05, 1 file, -6/+8)
  - cargo fmt
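For readers unfamiliar with the op this commit fixes: index_select gathers slices of a source tensor along one dimension, and the output size along that dimension follows the ids, not the source. A minimal row-major 2D sketch in Rust (an illustration of the semantics, not the actual cuda kernel):

```rust
/// Row-major index_select on a 2D matrix: picks `ids` slices along `dim`.
/// Only the selected dimension takes the length of `ids`; the other
/// dimension keeps the source's size.
fn index_select(src: &[f32], rows: usize, cols: usize, ids: &[usize], dim: usize) -> Vec<f32> {
    let mut out = Vec::new();
    match dim {
        0 => {
            // Copy whole rows in the order given by ids.
            for &r in ids {
                out.extend_from_slice(&src[r * cols..(r + 1) * cols]);
            }
        }
        1 => {
            // For each row, pick the requested columns.
            for r in 0..rows {
                for &c in ids {
                    out.push(src[r * cols + c]);
                }
            }
        }
        _ => panic!("2d example only"),
    }
    out
}

fn main() {
    let src = [1., 2., 3., 4., 5., 6.]; // 2x3 matrix
    // Select columns 2 and 0 (dim = 1): output is 2x2.
    assert_eq!(index_select(&src, 2, 3, &[2, 0], 1), vec![3., 1., 6., 4.]);
    // Select row 1 twice (dim = 0): output is 2x3.
    assert_eq!(index_select(&src, 2, 3, &[1, 1], 0), vec![4., 5., 6., 4., 5., 6.]);
}
```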
* Add the rounding operators. (#1030) (Laurent Mazare, 2023-10-04, 2 files, -0/+24)
  - Avoid tracking gradients for the rounding operations.
  - Add some rounding tests.
* Bump the version to 0.3.0. (#1014) (Laurent Mazare, 2023-10-01, 1 file, -1/+1)
  - Changelog update.
* fix: add missing gpu fill_* (#996) (Gonzalo, 2023-09-29, 1 file, -0/+9)
* Optimize the index-select cuda kernel. (#976) (Laurent Mazare, 2023-09-28, 1 file, -14/+8)
* Add the missing kernel. (#955) (Laurent Mazare, 2023-09-24, 1 file, -0/+1)
* cuda cast i64 (#925) (Gonzalo, 2023-09-21, 1 file, -0/+10)
* Add an erf based gelu op (#900) (Laurent Mazare, 2023-09-19, 2 files, -0/+25)
  - Erf based gelu.
  - Add the erf backed gelu.
  - Test the new gelu op (which is not gelu_new).
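The erf-based gelu added here is the "exact" form, gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), as opposed to the tanh-based gelu_new approximation. A CPU sketch in Rust; since Rust's std has no erf, the sketch uses the standard Abramowitz & Stegun polynomial approximation, which is not code from this commit:

```rust
// Polynomial approximation of erf (Abramowitz & Stegun 7.1.26),
// accurate to about 1.5e-7. Rust's standard library has no erf.
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// Erf-based ("exact") GELU, as opposed to the tanh approximation.
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}

fn main() {
    // gelu(0) = 0; gelu(x) approaches x for large positive x and 0 for
    // large negative x.
    assert!(gelu_erf(0.0).abs() < 1e-12);
    assert!((gelu_erf(6.0) - 6.0).abs() < 1e-6);
    assert!(gelu_erf(-6.0).abs() < 1e-6);
}
```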
* Bump the crate versions to v0.2.3. (#886) (Laurent Mazare, 2023-09-18, 1 file, -1/+1)
  - Also update the python bindings.
* Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. (#836) (Charles Lew, 2023-09-13, 1 file, -2/+9)
* Bump the crate version + update the changelog. (#822) (Laurent Mazare, 2023-09-12, 1 file, -1/+1)
* im2col version of the conv1d kernel. (#815) (Laurent Mazare, 2023-09-11, 1 file, -1/+70)
  - im2col version of the cuda conv1d kernel.
  - im2col version of the conv1d cpu kernel.
* im2col based conv2d (#802) (Laurent Mazare, 2023-09-10, 1 file, -0/+89)
  - im2col implementation for conv2d.
  - Fix for the im2col implementation to match the current conv2d.
  - Small optimization.
  - Add a cuda kernel.
  - Handle arbitrary layouts.
  - Im2Col cuda code.
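The im2col approach used by these two commits rewrites convolution as a matrix product: each kernel position's input window is unrolled into one row of a matrix, so the convolution becomes a single (well-optimized) matmul. A minimal single-channel 1D sketch in Rust (illustrative only; the real kernels additionally handle channels, padding, and arbitrary layouts):

```rust
/// im2col for a single-channel 1D signal: each output row holds the `k`
/// input samples seen by one kernel position.
fn im2col_1d(input: &[f32], k: usize, stride: usize) -> Vec<Vec<f32>> {
    let l_out = (input.len() - k) / stride + 1;
    (0..l_out)
        .map(|i| input[i * stride..i * stride + k].to_vec())
        .collect()
}

/// With the im2col matrix in hand, the convolution is just a
/// matrix-vector product against the flattened kernel.
fn conv1d_via_im2col(input: &[f32], kernel: &[f32], stride: usize) -> Vec<f32> {
    im2col_1d(input, kernel.len(), stride)
        .iter()
        .map(|row| row.iter().zip(kernel).map(|(a, b)| a * b).sum())
        .collect()
}

fn main() {
    let input = [1., 2., 3., 4., 5.];
    let kernel = [1., 0., -1.];
    // Valid conv (cross-correlation), stride 1: [1-3, 2-4, 3-5].
    assert_eq!(conv1d_via_im2col(&input, &kernel, 1), vec![-2., -2., -2.]);
}
```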
* Add a dedicated cuda kernel for softmax. (#746) (Laurent Mazare, 2023-09-05, 1 file, -0/+55)
* Add tanh. (#675) (Laurent Mazare, 2023-08-30, 1 file, -0/+4)
  - Use tanh in the lstm block.
  - Add a test for tanh forward and backward passes.
* Add some documentation. (#673) (Laurent Mazare, 2023-08-30, 1 file, -1/+1)
  - Bump the crate version.
* Support dilation in conv-transpose2d. (#671) (Laurent Mazare, 2023-08-30, 1 file, -3/+3)
* Add the powf op. (#664) (Laurent Mazare, 2023-08-29, 1 file, -0/+4)
  - Cuda kernels and backprop.
  - Add a test.
* Fix the dilated convolutions. (#659) (Laurent Mazare, 2023-08-29, 1 file, -2/+2)
* Dilated convolutions (#657) (Laurent Mazare, 2023-08-29, 1 file, -6/+12)
  - Add the dilation parameter.
  - Restore the basic optimizer example.
  - Dilation support in cudnn.
  - Use the dilation parameter in the cpu backend.
  - More dilation support.
  - No support for dilation in transposed convolutions.
  - Add dilation to a test.
  - Remove a print.
  - Helper function.
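Dilation, as wired in above, spaces the kernel taps out by a fixed step: tap j reads input position i + j * dilation, so the receptive field grows to dilation * (k - 1) + 1 without adding weights. A minimal 1D CPU sketch in Rust (illustrative, not the repo's kernel):

```rust
/// Dilated 1D convolution (cross-correlation, as in ML frameworks).
/// Tap `j` of the kernel reads input position `i + j * dilation`.
fn conv1d_dilated(input: &[f32], kernel: &[f32], dilation: usize) -> Vec<f32> {
    // Effective receptive field of the dilated kernel.
    let span = dilation * (kernel.len() - 1) + 1;
    let l_out = input.len() + 1 - span;
    (0..l_out)
        .map(|i| {
            kernel
                .iter()
                .enumerate()
                .map(|(j, w)| w * input[i + j * dilation])
                .sum()
        })
        .collect()
}

fn main() {
    let input = [1., 2., 3., 4., 5., 6.];
    let kernel = [1., 1.];
    // dilation 2: output[i] = input[i] + input[i + 2].
    assert_eq!(conv1d_dilated(&input, &kernel, 2), vec![4., 6., 8., 10.]);
    // dilation 1 is the ordinary convolution.
    assert_eq!(conv1d_dilated(&input, &kernel, 1), vec![3., 5., 7., 9., 11.]);
}
```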
* Cuda conv transpose (#645) (Laurent Mazare, 2023-08-28, 1 file, -0/+88)
  - Cuda kernel for conv-transpose.
  - Fix the cuda kernel.
  - Fix the tests.
* Bump the crate version + update CHANGELOG. (#628) (Laurent Mazare, 2023-08-27, 1 file, -1/+1)
* Let's keep the dirty code on its own. (Nicolas Patry, 2023-08-25, 1 file, -2/+25)
* Intermediary float cast is necessary for cuda 11.8 (Nicolas Patry, 2023-08-25, 1 file, -2/+2)
* `static_cast` ? (Nicolas Patry, 2023-08-25, 1 file, -2/+2)
* Different casting ? (Nicolas Patry, 2023-08-25, 1 file, -2/+2)
* Repairing cast bf16/f16 (Nicolas Patry, 2023-08-25, 1 file, -4/+4)
* Add to the cuda example a reproduction of the issue. (#579) (Laurent Mazare, 2023-08-24, 1 file, -10/+11)
  - Tweak.
  - Add a test using non-square matrices.
  - Fix the conv2d kernel.
  - Display the error.
  - And tweak the comment.
* Add some group parameter to convolutions. (#566) (Laurent Mazare, 2023-08-23, 1 file, -1/+1)
  - Avoid some unnecessary groups checks.
  - Move the tensor convolution bits.
  - Proper handling of groups.
  - Bump the crate version.
  - And add a changelog.
* Add support for i64 (#563) (Laurent Mazare, 2023-08-23, 6 files, -1/+65)
  - Add the i64 dtype.
  - Adapt the cuda kernels.
* Add a yolo-v3 example. (#528) (Laurent Mazare, 2023-08-20, 1 file, -0/+12)
  - Add a couple functions required for yolo.
  - Add minimum and maximum.
  - Use the newly introduced maximum.
  - Cuda support for min/max + add some testing.
  - Allow for more tests to work with accelerate.
  - Fix a typo.
* Bump the crates version to 0.1.2. (#522) (Laurent Mazare, 2023-08-20, 1 file, -1/+1)
* Rename vec-dot to vec-ops. (#449) (Laurent Mazare, 2023-08-15, 1 file, -1/+1)
  - Also bump the crate version.
  - Add a currently empty readme.
* Add a cuda kernel for upsampling. (#441) (Laurent Mazare, 2023-08-14, 1 file, -0/+62)
  - Update for the latest tokenizers version.
* Add a cuda kernel for avg-pool2d. (#440) (Laurent Mazare, 2023-08-14, 1 file, -3/+157)
  - Avoid running out of bounds.
  - Finish wiring the avg pool kernel + add some testing.
  - Support for max-pool + testing.
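Average pooling reduces each k x k input window to its mean; a cuda kernel like the one added here typically computes one output element per thread. A minimal non-overlapping CPU sketch in Rust (an illustrative helper, not the commit's code):

```rust
/// Non-overlapping k x k average pooling over a row-major matrix.
/// Assumes `rows` and `cols` are multiples of `k` for simplicity.
fn avg_pool2d(src: &[f32], rows: usize, cols: usize, k: usize) -> Vec<f32> {
    let (or, oc) = (rows / k, cols / k);
    let mut out = vec![0.0; or * oc];
    for r in 0..or {
        for c in 0..oc {
            // Sum the k x k window, then divide by its area.
            let mut s = 0.0;
            for dr in 0..k {
                for dc in 0..k {
                    s += src[(r * k + dr) * cols + (c * k + dc)];
                }
            }
            out[r * oc + c] = s / (k * k) as f32;
        }
    }
    out
}

fn main() {
    // 2x4 input, 2x2 windows -> 1x2 output of window means.
    let src = [1., 2., 3., 4., 5., 6., 7., 8.];
    assert_eq!(avg_pool2d(&src, 2, 4, 2), vec![3.5, 5.5]);
}
```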
* Add a naive conv2d cuda kernel. (#438) (Laurent Mazare, 2023-08-14, 1 file, -8/+93)
  - Proper conv2d support on the rust side.
  - Conv1d testing on gpu.
  - Also use the test on gpus.
  - Fix the clean-ptx target.
* Compat windows. (Nicolas Patry, 2023-08-10, 1 file, -0/+9)
* This is duplicated code on Cuda 12.2. (Nicolas Patry, 2023-08-10, 1 file, -18/+0)
  - Without it we can compile for 52 (but I get Operation Not supported when actually trying to use those kernels).
* Add the license files. (#335) (Laurent Mazare, 2023-08-07, 1 file, -1/+1)
* Add the recip op + use it in stable-diffusion. (#331) (Laurent Mazare, 2023-08-06, 1 file, -0/+4)
  - Add the recip unary op.
  - Fix the cuda kernel.
  - Use the recip op in sigmoid.
* Update the repo location. (#305) (Laurent Mazare, 2023-08-02, 1 file, -1/+1)
* Remove the embedding ops in favor of index-select. (#299) (Laurent Mazare, 2023-08-02, 1 file, -40/+0)
  - Also remove the cuda kernels.
* Cuda support for the mnist training. (#277) (Laurent Mazare, 2023-07-29, 2 files, -7/+118)
  - min/max fix + testing.
  - Add the argmin/argmax tests.
  - More cuda support for argmin/argmax.
  - Cuda kernels for argmin and argmax.
* Support for where-cond on cuda for u8 and u32. (#274) (Laurent Mazare, 2023-07-29, 1 file, -8/+15)
* Add some flash attn test (#253) (Laurent Mazare, 2023-07-26, 1 file, -2/+2)
  - Add the cpu test.
  - Fail when the head is not a multiple of 8.
  - Polish the flash attention test.
* Add a test for scatter add. (#238) (Laurent Mazare, 2023-07-25, 1 file, -5/+3)
  - Add a test for scatter add (segfaults on gpus for now).
  - Bugfix for the scatter add cuda kernel.
* Cuda kernels for IndexAdd/ScatterAdd. (#236) (Laurent Mazare, 2023-07-24, 2 files, -1/+102)
  - Skeleton methods for IndexAdd/ScatterAdd.
  - Add a Map2InPlace trait.
  - Add the glue code for the index-add/scatter-add kernels.
  - Tweak the file name: embeddings -> indexing.
  - Add the cuda kernel for indexadd.
  - And add the scatter-add kernels.
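The scatter-add semantics behind these kernels: out[ids[i]] += src[i], with duplicate indices accumulating, which is why a parallel gpu implementation needs atomic adds. A minimal 1D CPU sketch in Rust (illustrative only, not the cuda code):

```rust
/// scatter_add along a flat vector: out[ids[i]] += src[i].
/// Duplicate ids accumulate; a gpu kernel would use atomicAdd here.
fn scatter_add(out: &mut [f32], ids: &[usize], src: &[f32]) {
    for (&i, &v) in ids.iter().zip(src) {
        out[i] += v;
    }
}

fn main() {
    let mut out = vec![0.0f32; 4];
    // Index 1 appears twice, so its contributions accumulate.
    scatter_add(&mut out, &[1, 3, 1], &[10., 20., 30.]);
    assert_eq!(out, vec![0., 40., 0., 20.]);
}
```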
* Indexing cuda (#235) (Laurent Mazare, 2023-07-24, 1 file, -8/+119)
  - Allow using uint8_t for indexing.
  - Revert the default cuda feature.
  - Add a cuda-kernel for index-select.
  - Add a test for gather.