path: root/candle-nn/tests/ops.rs
Commit history (most recent first); each entry lists the author, date, and diffstat.
* Improved launch config for layer-norm/rms-norm. (#2591) (Laurent Mazare, 2024-11-04, 1 file, +45/-0)
  - Add more testing for the fused layer/rms norm kernels.
* Add the layernorm specialized op. (#2212) (Laurent Mazare, 2024-05-24, 1 file, +27/-0)
  - Add the layernorm cuda kernels.
  - Dedicated layer norm op.
  - Add the slower variant.
  - Plug the cuda implementation.
  - Add the metal variant.
  - Add a dedicated test.
  - Bugfix.
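For reference, the computation that this specialized layer-norm op fuses into a single kernel can be sketched in plain Rust as below. This is an illustrative CPU version only (function and parameter names are mine, not candle's API):

```rust
// Layer normalization over one row: y_i = (x_i - mean) / sqrt(var + eps) * w_i + b_i.
// Illustrative sketch; the candle op fuses this on CUDA/Metal for speed.
fn layer_norm(x: &[f32], w: &[f32], b: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean) * (v - mean)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(w.iter().zip(b.iter()))
        .map(|(v, (wi, bi))| (v - mean) * inv_std * wi + bi)
        .collect()
}

fn main() {
    let y = layer_norm(&[1.0, 2.0, 3.0, 4.0], &[1.0; 4], &[0.0; 4], 1e-5);
    // With unit weight and zero bias the output is standardized.
    println!("{y:?}");
}
```

With unit weights and zero bias the result has (approximately) zero mean and unit variance, which is what the dedicated test can check against a reference implementation.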
* Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) (MilkFather, 2024-04-29, 1 file, +11/-0)
  - add sigmoid op
  - small fix
  - add as a method on `Tensor`
  - implement gradient calculation for sigmoid
  - add sigmoid tests
  - we should have a specialized op for this
  - fix clippy
  - fix clippy 2
  - Revert all previous commits in favor of a `CustomOp` based solution
  - use `CustomOp1` implementation
  - fix rustfmt
  - experimental add metal impl
  - add cuda kernel impl
  - fix fmt
  - Add a test + reduce some cuda duplication.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
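The gradient fix above relies on the standard identity that sigmoid's derivative can be written in terms of its own output. A minimal sketch of both (plain Rust, illustrative names, not the candle op itself):

```rust
// sigma(x) = 1 / (1 + e^-x)
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

// d/dx sigma(x) = sigma(x) * (1 - sigma(x)),
// which lets the backward pass reuse the forward output instead of recomputing exp.
fn sigmoid_grad(x: f32) -> f32 {
    let s = sigmoid(x);
    s * (1.0 - s)
}

fn main() {
    println!("sigmoid(0) = {}", sigmoid(0.0));
    println!("sigmoid'(0) = {}", sigmoid_grad(0.0));
}
```

Expressing the gradient through the forward output is also what makes a fused `CustomOp1`-style implementation attractive: the backward pass needs only the saved activation.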
* Add the rope THD kernel. (#2014) (Laurent Mazare, 2024-04-05, 1 file, +31/-0)
  - Cuda kernel for rope-thd.
  - Add the metal kernels.
  - Add a dedicated test.
* Contiguous variant of the rope kernel. (#1929) (Laurent Mazare, 2024-03-25, 1 file, +30/-2)
  - Add the cuda kernel.
  - Metal kernel.
* Fast kernels for rotary embeddings. (#1928) (Laurent Mazare, 2024-03-24, 1 file, +28/-0)
  - Add a test for the fast CPU kernel.
  - Rope cuda bindings.
  - Cuda kernel.
  - Metal kernel (part 1).
  - Cuda kernels.
  - Finish the metal kernel.
  - Use the new kernels in the quantized example.
  - Fix warning.
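At their core, rotary embeddings rotate feature pairs by a position-dependent angle. A minimal sketch of the per-pair rotation (this assumes an interleaved pair layout for illustration; the actual kernels support different layouts, and the names here are not candle's API):

```rust
// Rotate each (even, odd) feature pair by angle theta:
// (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos).
// `cos` and `sin` hold the precomputed values for each pair at this position.
fn rope_interleaved(x: &[f32], cos: &[f32], sin: &[f32]) -> Vec<f32> {
    let mut out = x.to_vec();
    for (i, pair) in x.chunks_exact(2).enumerate() {
        let (x1, x2) = (pair[0], pair[1]);
        out[2 * i] = x1 * cos[i] - x2 * sin[i];
        out[2 * i + 1] = x1 * sin[i] + x2 * cos[i];
    }
    out
}

fn main() {
    // A 90-degree rotation (cos=0, sin=1) maps (1, 2) to (-2, 1).
    println!("{:?}", rope_interleaved(&[1.0, 2.0], &[0.0], &[1.0]));
}
```

Because the cos/sin tables are shared across the batch, a fused kernel mainly saves the intermediate tensors that a composed implementation would materialize.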
* Custom op for RmsNorm (#1890) (Laurent Mazare, 2024-03-21, 1 file, +30/-4)
  - Trying out a custom RmsNorm cuda kernel.
  - CPU implementation for rms-norm.
  - Cuda wrappers.
  - Add some validation.
  - Add some testing.
  - More testing.
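RMS norm is layer norm without the mean subtraction or bias: each row is scaled by the reciprocal root-mean-square of its elements. A minimal CPU sketch of the math the custom op implements (illustrative names, not candle's API):

```rust
// RMS norm over one row: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
// Unlike layer norm, no mean is subtracted and there is no bias term.
fn rms_norm(x: &[f32], w: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(w).map(|(v, wi)| v * inv_rms * wi).collect()
}

fn main() {
    println!("{:?}", rms_norm(&[2.0, 2.0], &[1.0, 1.0], 0.0));
}
```

Skipping the mean pass is what makes RMS norm cheaper than layer norm and a common choice in LLaMA-style models, hence the dedicated kernel.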
* Add a custom softmax implementation. (#744) (Laurent Mazare, 2023-09-05, 1 file, +10/-0)
  - Add softmaxlastdim to the benchmarks.
  - And add a test.
  - Support more dtypes.
  - Polish the code.
  - Use the slow implementation on cuda.
  - Add a todo for the cuda kernel.
* Move the test-utils bits to a shared place. (#619) (Laurent Mazare, 2023-08-27, 1 file, +4/-7)
* Add a yolo-v3 example. (#528) (Laurent Mazare, 2023-08-20, 1 file, +3/-0)
  - Add a couple functions required for yolo.
  - Add the yolo-v3 example.
  - Add minimum and maximum.
  - Use the newly introduced maximum.
  - Cuda support for min/max + add some testing.
  - Allow for more tests to work with accelerate.
  - Fix a typo.
* Add the AdamW optimizer. (#307) (Laurent Mazare, 2023-08-02, 1 file, +6/-14)
  - Add some AdamW test validated against PyTorch.
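The AdamW update that such a test would validate against PyTorch combines Adam's bias-corrected moment estimates with decoupled weight decay. A scalar-parameter sketch (illustrative, not candle's optimizer API):

```rust
// One AdamW step for a single scalar parameter (decoupled weight decay).
struct AdamW {
    lr: f32, beta1: f32, beta2: f32, eps: f32, wd: f32,
    m: f32, v: f32, t: i32, // first/second moment estimates and step count
}

impl AdamW {
    fn step(&mut self, theta: f32, grad: f32) -> f32 {
        self.t += 1;
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grad;
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad * grad;
        // Bias correction for the zero-initialized moments.
        let m_hat = self.m / (1.0 - self.beta1.powi(self.t));
        let v_hat = self.v / (1.0 - self.beta2.powi(self.t));
        // Weight decay is applied directly to the parameter, not folded into the gradient.
        theta - self.lr * (m_hat / (v_hat.sqrt() + self.eps) + self.wd * theta)
    }
}

fn main() {
    let mut opt = AdamW { lr: 0.1, beta1: 0.9, beta2: 0.999, eps: 1e-8, wd: 0.0, m: 0.0, v: 0.0, t: 0 };
    println!("{}", opt.step(1.0, 1.0));
}
```

On the first step the bias correction makes the update magnitude approximately `lr`, independent of the gradient's scale, which gives a convenient fixed point to compare against a PyTorch reference.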
* Softmax numerical stability. (#267) (Laurent Mazare, 2023-07-28, 1 file, +62/-0)
  - Fix the flash-attn test.
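The stability fix here is the standard max-subtraction trick: a naive softmax overflows `exp` for large logits, while subtracting the row maximum leaves the result unchanged mathematically but keeps every exponent non-positive. A minimal sketch (illustrative names, not candle's API):

```rust
// Numerically stable softmax: subtract the row max before exponentiating,
// so exp never overflows even for large logits. The max cancels out in the ratio.
fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // A naive exp(1000.0) would overflow to infinity; this stays finite.
    println!("{:?}", softmax(&[1000.0, 1001.0]));
}
```

The dedicated test can probe exactly this regime: logits around 1000 produce NaNs in a naive implementation but valid probabilities here.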