path: root/candle-core/src/op.rs
Commit message (#PR) - Author, Date (files changed, lines +added/-removed)
* 20241118 docs (#2629) - zachcp, 2024-11-19 (1 file, +2/-0)
  * module docs
  * varbuilder gguf docs
  * add a link to gguf files
  * small additional mod doc titles
  * safetensor docs
  * more core docs
  * more module docs in candle_core
  * 2 more link fixes
* Add support for "sign" on tensors (#2012)Thomas Santerre2024-04-041-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | * add the sign unary operator * remove uneeded import * remove uneeded import * undo formatting * undo formatting * remove unnecessary redefintion * allow gradient to flow through for sign and round * fix cpu ops to ensure that negzero and positive zero are handled properly * clippy fixes * Properly avoid gradient tracking. * Use a branchless version. --------- Co-authored-by: laurent <laurent.mazare@gmail.com>
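As a usage sketch for the sign op above (not from the commit itself; assumes the current `candle_core` API on CPU):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[-2.5f32, -0.0, 0.0, 3.0], &Device::Cpu)?;
    // sign maps negatives to -1, both zeros to 0, positives to 1,
    // matching the negative-zero handling mentioned in the commit.
    let s = t.sign()?;
    println!("{s}"); // [-1, 0, 0, 1]
    Ok(())
}
```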
* Optimize the gelu f16 op. (#2008) - Laurent Mazare, 2024-04-04 (1 file, +11/-8)
  * Optimize the gelu f16 op.
  * And add a test.
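A minimal sketch of exercising the f16 gelu path (assumes `Tensor::gelu` and `DType::F16` as in current candle):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::new(&[-1.0f32, 0.0, 1.0, 2.0], &Device::Cpu)?.to_dtype(DType::F16)?;
    // gelu uses the tanh-based approximation; see gelu_erf for the exact variant.
    let y = x.gelu()?;
    println!("{y}");
    Ok(())
}
```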
* Prepare for the custom-op extension. (#1892) - Laurent Mazare, 2024-03-21 (1 file, +7/-152)
* Add grads for interpolate1d (#1742) - Kirpal Grewal, 2024-02-22 (1 file, +4/-1)
  * add backprop for interpolate1d
  * fix clippy lint
  * correctly fix clippy lint
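A sketch of gradients flowing through 1d interpolation (hypothetical example; assumes `Tensor::interpolate1d(target_size)` does nearest-neighbor upsampling, per the surrounding commits):

```rust
use candle_core::{Device, Result, Var};

fn main() -> Result<()> {
    let x = Var::new(&[[[1f32, 2., 3.]]], &Device::Cpu)?; // (batch, channels, length)
    let y = x.interpolate1d(6)?; // nearest-neighbor upsample to length 6
    // With the grad added in this commit, backward() now reaches x.
    let grads = y.sum_all()?.backward()?;
    println!("{:?}", grads.get(&x).map(|g| g.shape()));
    Ok(())
}
```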
* feat: add silu activation function (#1706) - OlivierDehaene, 2024-02-14 (1 file, +73/-0)
  * feat: add silu activation function
  * use silu/arg in grad
  * update candle-nn
  * use node
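Usage sketch for the new op (silu(x) = x * sigmoid(x), also known as swish; assumes current `candle_core`):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::new(&[-2.0f32, 0.0, 2.0], &Device::Cpu)?;
    let y = x.silu()?; // x * sigmoid(x)
    println!("{y}");
    Ok(())
}
```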
* Upsample grad (#1420) - KGrewal1, 2023-12-10 (1 file, +5/-1)
  * encode size of upsample in enum
  * working convolution method for limited 2d kernels
  * add test for sf 3 interpolation
  * add higher dimensional tests, fix to work with multichannel input
  * Remove commented out line.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Metal part 1 - Scaffolding for metal. (#1308) - Nicolas Patry, 2023-11-10 (1 file, +43/-1)
  * Metal part 1 - Scaffolding for metal.
  * Remove tracing.
* Add support to UL2 model family (#1300) - Juarez Bochi, 2023-11-09 (1 file, +2/-1)
  * Add support to UL2 model family
  * Update docs with UL2
  * Create ActivationWithOptionalGating to avoid polluting activations
  * Also refactor quantized t5
  * Remove useless conversion
  * Revert Activation::NewGelu name change
  * Remove useless return
  * Apply rustfmt and clippy recommendations
  * Reuse t5::ActivationWithOptionalGating in quantized version
  * (cosmetic change) use a match rather than ifs + avoid early returns
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
* feat: impl backprop for erf and gelu-erf (#1258) - drbh, 2023-11-03 (1 file, +2/-0)
  * impl backprop for erf and gelu-erf
  * feat: unary tests added for erf and gelu-erf
  * fix: (clippy) remove immediately dereferenced ref
  * fix: improve comments with pytorch code snippet
  * fix: adjust comment typo in backprop impl
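A sketch of what this backprop enables (assumes `candle_core::Var` and the erf/gelu_erf unary ops):

```rust
use candle_core::{Device, Result, Var};

fn main() -> Result<()> {
    let x = Var::new(&[0.5f32, 1.0, 1.5], &Device::Cpu)?;
    let y = x.erf()?; // or x.gelu_erf()? for the erf-based gelu
    // d/dx erf(x) = 2/sqrt(pi) * exp(-x^2), now wired into backward().
    let grads = y.sum_all()?.backward()?;
    println!("{:?}", grads.get(&x));
    Ok(())
}
```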
* Add the conv-transpose1d op. (#1251) - Laurent Mazare, 2023-11-03 (1 file, +10/-0)
  * Skeleton structure for conv-transpose1d.
  * CPU implementation for conv-transpose1d.
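A hypothetical call sketch (argument order follows recent candle, which takes padding, output_padding, stride, dilation and, in later versions, a groups count; the original commit predates the groups argument):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Tensor::randn(0f32, 1., (1, 4, 10), &dev)?; // (batch, c_in, length)
    let k = Tensor::randn(0f32, 1., (4, 2, 3), &dev)?; // (c_in, c_out, kernel)
    let y = x.conv_transpose1d(&k, 0, 0, 2, 1, 1)?; // stride 2 stretches the output
    println!("{:?}", y.shape()); // (1, 2, 21): (10 - 1) * 2 + 3
    Ok(())
}
```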
* Lazy detach. (#1242) - Laurent Mazare, 2023-11-02 (1 file, +4/-0)
* Add i64-abs. (#1216) - Laurent Mazare, 2023-10-29 (1 file, +34/-1)
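Sketch combining the i64 dtype (#563, further down) with abs:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[-3i64, 0, 7], &Device::Cpu)?;
    println!("{}", t.abs()?); // [3, 0, 7]
    Ok(())
}
```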
* Add the rounding operators. (#1030) - Laurent Mazare, 2023-10-04 (1 file, +108/-0)
  * Add the rounding operators.
  * Avoid tracking gradients for the rounding operations.
  * Add some rounding tests.
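Usage sketch for the rounding ops (which, per the commit, do not track gradients):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[-1.7f32, -0.5, 0.5, 1.2], &Device::Cpu)?;
    println!("{}", t.round()?); // round to the nearest integer
    println!("{}", t.floor()?);
    println!("{}", t.ceil()?);
    Ok(())
}
```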
* Add slice-scatter. (#927) - Laurent Mazare, 2023-09-22 (1 file, +1/-0)
  * Add slice-scatter.
  * Add the op.
  * Make transpose be a no-op when the dimensions are identical.
  * Add the backprop.
  * And add some gradient tests.
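A sketch of the op (assuming the `slice_scatter(src, dim, start)` signature found in current candle):

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let dst = Tensor::zeros((4, 3), DType::F32, &dev)?;
    let src = Tensor::ones((2, 3), DType::F32, &dev)?;
    // Copy src into dst along dim 0, starting at row 1.
    let out = dst.slice_scatter(&src, 0, 1)?;
    println!("{out}");
    Ok(())
}
```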
* Add the erf function. (#917) - Laurent Mazare, 2023-09-21 (1 file, +36/-0)
* Add an erf based gelu op (#900) - Laurent Mazare, 2023-09-19 (1 file, +36/-0)
  * Erf based gelu.
  * Add the erf backed gelu.
  * Test the new gelu op (which is not gelu_new).
* Add 1d upsampling. (#839) - Laurent Mazare, 2023-09-13 (1 file, +1/-0)
  * Add 1d upsampling.
  * Add the interpolate functions.
* Accelerate support for gelu. (#782) - Laurent Mazare, 2023-09-08 (1 file, +18/-0)
* Add tanh. (#675) - Laurent Mazare, 2023-08-30 (1 file, +3/-0)
  * Add tanh.
  * Use tanh in the lstm block.
  * Add a test for tanh forward and backward passes.
* Add the powf op. (#664) - Laurent Mazare, 2023-08-29 (1 file, +1/-0)
  * Add the powf op.
  * Cuda kernels and backprop.
  * Add a test.
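Quick sketch of the two unary ops from these commits:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::new(&[1.0f32, 4.0, 9.0], &Device::Cpu)?;
    let y = x.powf(0.5)?; // f64 exponent; a square root here
    let z = y.tanh()?; // the tanh op used in the lstm block
    println!("{y}\n{z}");
    Ok(())
}
```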
* Dilated convolutions (#657) - Laurent Mazare, 2023-08-29 (1 file, +3/-0)
  * Add the dilation parameter.
  * Restore the basic optimizer example.
  * Dilation support in cudnn.
  * Use the dilation parameter in the cpu backend.
  * More dilation support.
  * No support for dilation in transposed convolutions.
  * Add dilation to a test.
  * Remove a print.
  * Helper function.
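A call sketch showing where dilation slots into conv2d (candle's conv2d takes padding, stride, dilation, groups):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Tensor::randn(0f32, 1., (1, 3, 16, 16), &dev)?; // (b, c_in, h, w)
    let k = Tensor::randn(0f32, 1., (8, 3, 3, 3), &dev)?; // (c_out, c_in, kh, kw)
    // dilation = 2 samples every other input pixel, widening the receptive field.
    let y = x.conv2d(&k, 1, 1, 2, 1)?; // padding, stride, dilation, groups
    println!("{:?}", y.shape()); // (1, 8, 14, 14)
    Ok(())
}
```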
* Add conv-transpose. (#635) - Laurent Mazare, 2023-08-28 (1 file, +9/-0)
  * Add conv-transpose.
  * Return zeros for now.
  * Naive CPU implementation.
  * Add a conv-transpose test + fix the cpu implementation.
  * Add a second test.
* Fixes for clippy 1.72. (#587) - Laurent Mazare, 2023-08-24 (1 file, +2/-0)
* Add support for i64 (#563) - Laurent Mazare, 2023-08-23 (1 file, +24/-0)
  * Add the i64 dtype.
  * Adapt the cuda kernels.
* Add a yolo-v3 example. (#528) - Laurent Mazare, 2023-08-20 (1 file, +18/-0)
  * Add a couple functions required for yolo.
  * Add the yolo-v3 example.
  * Add minimum and maximum.
  * Use the newly introduced maximum.
  * Cuda support for min/max + add some testing.
  * Allow for more tests to work with accelerate.
  * Fix a typo.
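The minimum/maximum ops introduced here, sketched:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let a = Tensor::new(&[1f32, 5., 3.], &Device::Cpu)?;
    let b = Tensor::new(&[2f32, 2., 2.], &Device::Cpu)?;
    println!("{}", a.maximum(&b)?); // [2, 5, 3]
    println!("{}", a.minimum(&b)?); // [1, 2, 2]
    Ok(())
}
```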
* Add the permute op (similar to pytorch). (#504) - Laurent Mazare, 2023-08-18 (1 file, +1/-0)
  * Add the permute op (similar to pytorch).
  * Add the backprop for dimension permutation.
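Usage sketch:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::zeros((2, 3, 4), DType::F32, &Device::Cpu)?;
    // Reorder dimensions, as in pytorch's Tensor.permute.
    let p = t.permute((2, 0, 1))?;
    println!("{:?}", p.shape()); // (4, 2, 3)
    Ok(())
}
```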
* Relax the requirements on CustomOp. (#486) - Laurent Mazare, 2023-08-17 (1 file, +15/-6)
  * Relax the requirements on CustomOp.
  * Simplify the custom-ops when no backward is required.
* More accelerate optimizations (#427) - Laurent Mazare, 2023-08-13 (1 file, +30/-0)
  * Add more tracing to the whisper example.
  * Support accelerate in more examples.
  * Use accelerate for pointwise functions.
  * Use accelerate for binary operations too.
  * Bugfix for binary operation: use the rhs before the lhs.
* add max_pool2d (#371) - LeeeSe, 2023-08-09 (1 file, +7/-0)
  Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
* Skeleton for the avg-pool2d and upsample-nearest2d ops. (#337) - Laurent Mazare, 2023-08-07 (1 file, +15/-0)
  * Skeleton for the avg-pool2d and upsample-nearest2d ops.
  * Preliminary conv2d support.
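A sketch covering max_pool2d, avg_pool2d and upsample_nearest2d together (a bare kernel size implies a matching stride):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::randn(0f32, 1., (1, 1, 8, 8), &Device::Cpu)?;
    let max = x.max_pool2d(2)?; // 2x2 kernel, stride 2 -> (1, 1, 4, 4)
    let avg = x.avg_pool2d(2)?;
    let up = avg.upsample_nearest2d(8, 8)?; // back to (1, 1, 8, 8)
    println!("{:?} {:?} {:?}", max.shape(), avg.shape(), up.shape());
    Ok(())
}
```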
* Add the recip op + use it in stable-diffusion. (#331) - Laurent Mazare, 2023-08-06 (1 file, +3/-0)
  * Add the recip unary op.
  * Fix the cuda kernel.
  * Use the recip op in sigmoid.
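The sigmoid rewrite mentioned above, spelled out by hand with the new op (a sketch, not the library's internal code):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let x = Tensor::new(&[-2f32, 0., 2.], &Device::Cpu)?;
    // sigmoid(x) = 1 / (1 + exp(-x)), with the division done via recip.
    let sig = (x.neg()?.exp()? + 1.0)?.recip()?;
    println!("{sig}");
    Ok(())
}
```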
* Remove the embedding ops in favor of index-select. (#299) - Laurent Mazare, 2023-08-02 (1 file, +0/-1)
  * Remove the embedding ops in favor of index-select.
  * Also remove the cuda kernels.
* Softmax numerical stability. (#267) - Laurent Mazare, 2023-07-28 (1 file, +0/-1)
  * Softmax numerical stability.
  * Fix the flash-attn test.
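The standard stability trick behind this commit: subtract the per-row max before exponentiating so exp never overflows. A sketch with core ops (candle-nn ships a ready-made softmax):

```rust
use candle_core::{D, Device, Result, Tensor};

fn softmax_stable(x: &Tensor) -> Result<Tensor> {
    // exp(x - max) / sum(exp(x - max)) equals softmax(x) mathematically,
    // but stays finite even for large logits.
    let m = x.max_keepdim(D::Minus1)?;
    let e = x.broadcast_sub(&m)?.exp()?;
    let s = e.sum_keepdim(D::Minus1)?;
    e.broadcast_div(&s)
}

fn main() -> Result<()> {
    let x = Tensor::new(&[[1000f32, 1001., 1002.]], &Device::Cpu)?;
    println!("{}", softmax_stable(&x)?);
    Ok(())
}
```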
* Rename exposed ops. - Nicolas Patry, 2023-07-26 (1 file, +2/-2)
* Add an abstract backprop op type (#240) - Laurent Mazare, 2023-07-25 (1 file, +61/-1)
  * Start adding the backprop op type.
  * More backprop ops.
  * Finish the backprop op.
* Add the copy op. (#227) - Laurent Mazare, 2023-07-23 (1 file, +1/-0)
  * Add the copy op.
  * Tweak some cat error messages.
  * Handle the contiguous case in to_vec1.
  * Fast variant for to_vec2.
  * Add a faster to_vec3 variant.
* Add the gather op. (#219) - Laurent Mazare, 2023-07-22 (1 file, +2/-0)
  * Start adding gather.
  * Gather cpu implementation + use in simple training.
  * Add scatter_add for the gradient of gather.
  * Simple cpu implementation of scatter_add.
  * Use gather in the simple-training backprop.
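A sketch contrasting gather with index-select (both appear in these commits; gather's gradient is the scatter_add mentioned above):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let t = Tensor::new(&[[1f32, 2.], [3., 4.], [5., 6.]], &dev)?;
    // index_select picks whole rows along a dim.
    let rows = t.index_select(&Tensor::new(&[2u32, 0], &dev)?, 0)?;
    // gather picks one element per position; indexes have the same rank as t.
    let ids = Tensor::new(&[[0u32], [1], [0]], &dev)?;
    let picked = t.gather(&ids, 1)?;
    println!("{rows}\n{picked}");
    Ok(())
}
```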
* Start adding index-add. - laurent, 2023-07-21 (1 file, +1/-0)
* Add binary and ternary custom ops. (#217) - Laurent Mazare, 2023-07-21 (1 file, +84/-1)
* Custom ops with a single argument (#214) - Laurent Mazare, 2023-07-21 (1 file, +29/-4)
  * Add the CustomOp1 trait.
  * Add an example of custom op.
  * Polish the custom op example.
  * Add some backward pass tests for custom ops.
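A minimal CustomOp1 sketch (a hypothetical `Clamp01` op; it handles only the contiguous f32 CPU case and skips the optional backward/cuda methods):

```rust
use candle_core::{CpuStorage, CustomOp1, Device, Layout, Result, Shape, Tensor};

struct Clamp01;

impl CustomOp1 for Clamp01 {
    fn name(&self) -> &'static str {
        "clamp01"
    }

    fn cpu_fwd(&self, storage: &CpuStorage, layout: &Layout) -> Result<(CpuStorage, Shape)> {
        let data = storage.as_slice::<f32>()?;
        // Only the contiguous case is handled in this sketch.
        let data = match layout.contiguous_offsets() {
            Some((o1, o2)) => &data[o1..o2],
            None => candle_core::bail!("clamp01 requires a contiguous input"),
        };
        let out: Vec<f32> = data.iter().map(|v| v.clamp(0.0, 1.0)).collect();
        Ok((CpuStorage::F32(out), layout.shape().clone()))
    }
}

fn main() -> Result<()> {
    let t = Tensor::new(&[-0.5f32, 0.3, 1.7], &Device::Cpu)?;
    println!("{}", t.apply_op1(Clamp01)?); // [0, 0.3, 1]
    Ok(())
}
```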
* Refactor the reduce ops in order to introduce argmin/argmax. (#212) - Laurent Mazare, 2023-07-21 (1 file, +14/-0)
  * Refactor the reduce ops in order to introduce argmin/argmax.
  * Clippy fixes.
  * Use the newly introduced argmax.
  * Fix the strided case.
  * Handle the non-contiguous case.
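Usage sketch for the new reduce ops:

```rust
use candle_core::{D, Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[[1f32, 9., 3.], [7., 2., 5.]], &Device::Cpu)?;
    println!("{}", t.argmax(D::Minus1)?); // [1, 0]
    println!("{}", t.argmin(D::Minus1)?); // [0, 1]
    Ok(())
}
```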
* More realistic training setup. (#210) - Laurent Mazare, 2023-07-20 (1 file, +1/-0)
  * More realistic training setup.
  * Compute the model accuracy.
  * Very inefficient backprop for index select.
  * More backprop.
  * Fix some backprop issues.
  * Backprop fix.
  * Another broadcasting backprop fix.
  * Better backprop for reducing ops.
  * Training again.
  * Add some gradient tests.
  * Get the training to work.
* Add the index-select op. (#209) - Laurent Mazare, 2023-07-20 (1 file, +1/-0)
  * Add the index-select op.
  * Cpu implementation of index-select.
  * Add the cpu implementation for index-select.
* Op refactor (#208) - Laurent Mazare, 2023-07-20 (1 file, +33/-21)
  * Add the binary and unary op enums to factorize some code.
  * Bugfix.
* Add the comparison operations. (#207) - Laurent Mazare, 2023-07-20 (1 file, +19/-10)
  * Add the comparison operations.
  * Add the helper functions on the tensor side.
  * More cmp operations.
  * Cpu implementation for the comparison operations.
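Sketch of the tensor-side helpers (comparison results come back as u8 masks):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let a = Tensor::new(&[1f32, 2., 3.], &Device::Cpu)?;
    let b = Tensor::new(&[3f32, 2., 1.], &Device::Cpu)?;
    println!("{}", a.eq(&b)?); // [0, 1, 0]
    println!("{}", a.lt(&b)?); // [1, 0, 0]
    Ok(())
}
```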
* Add some more developed training examples. (#199) - Laurent Mazare, 2023-07-19 (1 file, +9/-0)
  * Use contiguous tensors for variables.
  * Sketch the mnist example.
  * Start adding the reduce ops.
  * Renaming.
  * Refactor the reduce operations.
  * Bugfix for the broadcasting vectorization.
* Mklize more unary ops. (#191) - Laurent Mazare, 2023-07-18 (1 file, +53/-6)
  * Mklize more unary ops.
  * Even more unary ops.
* Use mkl to accelerate binary ops. (#190) - Laurent Mazare, 2023-07-18 (1 file, +33/-5)
  * Vectorized binary ops with mkl.
  * Improve the binary op mkl support.
  * Push the support for mkl binary ops.
  * Proper vectorization of binary ops.
  * Proper mkl'isation when broadcasting binary ops.
* Preliminary support for mkl based gelu. (#187) - Laurent Mazare, 2023-07-18 (1 file, +29/-0)
  * Preliminary support for mkl based gelu.
  * Add the vectorized function for unary ops.
  * Get the mkl specialized gelu to work.