| Commit message | Author | Age | Files | Lines |
| |
* module docs
* varbuilder gguf docs
* add a link to gguf files
* small additional mod doc titles
* safetensor docs
* more core docs
* more module docs in candle_core
* 2 more link fixes
| |
* add the sign unary operator
* remove unneeded import
* remove unneeded import
* undo formatting
* undo formatting
* remove unnecessary redefinition
* allow gradient to flow through for sign and round
* fix cpu ops to ensure that negative zero and positive zero are handled properly
* clippy fixes
* Properly avoid gradient tracking.
* Use a branchless version.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
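The branchless version mentioned above can be sketched as follows; this is an illustration of the idea, not the actual candle kernel. Comparisons against `0.0` treat `+0.0` and `-0.0` identically, which is exactly why the ±0 handling falls out for free:

```rust
// Branchless sign: (x > 0) - (x < 0).
// +0.0 and -0.0 both compare equal to 0.0, so both map to 0.0;
// NaN fails both comparisons and also maps to 0.0.
fn sign(x: f64) -> f64 {
    ((x > 0.0) as i8 - (x < 0.0) as i8) as f64
}
```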
| |
* Optimize the gelu f16 opt.
* And add a test.
| |
* add backprop for interpolate1d
* fix clippy lint
* correct fix clippy lint
| |
* feat: add silu activation function
* use silu/arg in grad
* update candle-nn
* use node
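For reference, SiLU (a.k.a. swish) is x·sigmoid(x); a scalar sketch together with the gradient expressed through the sigmoid value, in the spirit of the "use silu/arg in grad" bullet (names here are illustrative, not candle's API):

```rust
// silu(x) = x * sigmoid(x) = x / (1 + e^{-x}).
fn silu(x: f64) -> f64 {
    x / (1.0 + (-x).exp())
}

// d/dx silu(x) = s(x) * (1 + x * (1 - s(x))) where s is the sigmoid;
// the derivative reuses the sigmoid value instead of recomputing from scratch.
fn silu_grad(x: f64) -> f64 {
    let s = 1.0 / (1.0 + (-x).exp());
    s * (1.0 + x * (1.0 - s))
}
```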
| |
* encode size of upsample in enum
* working convolution method for limited 2d kernels
* add test for sf 3 interpolation
* add higher dimensional tests, fix to work with multichannel input
* Remove commented out line.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
| |
* Metal part 1 - Scaffolding for metal.
* Remove tracing.
| |
* Add support to UL2 model family
* Update docs with UL2
* Create ActivationWithOptionalGating to avoid polluting activations
* Also refactor quantized t5
* Remove useless conversion
* Revert Activation::NewGelu name change
* Remove useless return
* Apply rustfmt and clippy recommendations
* Reuse t5::ActivationWithOptionalGating in quantized version
* (cosmetic change) use a match rather than ifs + avoid early returns.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
| |
* impl backprop for erf and gelu-erf
* feat: unary tests added for erf and gelu-erf
* fix: (clippy) remove immediately dereferenced ref
* fix: improve comments with pytorch code snippet
* fix: adjust comment typo in backprop impl
| |
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
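A naive scatter-style conv-transpose1d can be sketched as below (single channel, stride and padding only, no dilation or output padding; an illustration of the idea rather than the candle implementation): each input element adds a scaled copy of the kernel into the output.

```rust
// conv-transpose1d, single channel: the output length is
// (l_in - 1) * stride + k - 2 * padding.
fn conv_transpose1d(input: &[f64], kernel: &[f64], stride: usize, padding: usize) -> Vec<f64> {
    let l_out = (input.len() - 1) * stride + kernel.len() - 2 * padding;
    let mut out = vec![0.0; l_out];
    for (i, &x) in input.iter().enumerate() {
        for (j, &w) in kernel.iter().enumerate() {
            let pos = i * stride + j;
            // Positions clipped away by the padding are dropped.
            if pos >= padding && pos - padding < l_out {
                out[pos - padding] += x * w;
            }
        }
    }
    out
}
```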
| |
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
| |
* Add slice-scatter.
* Add the op.
* Make transpose be a no-op when the dimensions are identical.
* Add the backprop.
* And add some gradient tests.
| |
* Erf based gelu.
* Add the erf backed gelu.
* Test the new gelu op (which is not gelu_new).
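The erf-backed gelu is 0.5·x·(1 + erf(x/√2)), as opposed to the tanh-based gelu_new approximation. A scalar sketch, using the Abramowitz–Stegun 7.1.26 polynomial for erf since Rust's std has none (this stands in for whatever erf the real kernels use):

```rust
// erf via the Abramowitz-Stegun 7.1.26 polynomial (|error| < 1.5e-7).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let y = 1.0
        - (((((1.061405429 * t - 1.453152027) * t) + 1.421413741) * t - 0.284496736) * t
            + 0.254829592)
            * t
            * (-x * x).exp();
    sign * y
}

// "Exact" gelu: 0.5 * x * (1 + erf(x / sqrt(2))) -- not gelu_new.
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}
```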
| |
* Add 1d upsampling.
* Add the interpolate functions.
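Nearest-neighbor 1d upsampling reduces to an index map; a sketch of the idea, not candle's exact kernel:

```rust
// upsample-nearest1d: out[i] takes the source element whose index
// scales down from i, i.e. in[i * l_in / l_out].
fn upsample_nearest1d(xs: &[f64], l_out: usize) -> Vec<f64> {
    let l_in = xs.len();
    (0..l_out).map(|i| xs[i * l_in / l_out]).collect()
}
```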
| |
* Add tanh.
* Use tanh in the lstm block.
* Add a test for tanh forward and backward passes.
| |
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
| |
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
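For reference, a dilated kernel of size k covers d·(k-1)+1 input positions, which gives the usual output-length formula (a sketch; the parameter names are illustrative, not candle's):

```rust
// Output length of a (non-transposed) convolution with dilation:
// the effective kernel extent is dilation * (k - 1) + 1.
fn conv_out_len(l_in: usize, k: usize, stride: usize, padding: usize, dilation: usize) -> usize {
    (l_in + 2 * padding - dilation * (k - 1) - 1) / stride + 1
}
```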
| |
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
| |
* Add the i64 dtype.
* Adapt the cuda kernels.
| |
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
| |
* Add the permute op (similar to pytorch).
* Add the backprop for dimension permutation.
| |
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
| |
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
| |
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
| |
* Skeleton for the avg-pool2d and upsample-nearest2d ops.
* Preliminary conv2d support.
| |
* Add the recip unary op.
* Fix the cuda kernel.
* Use the recip op in sigmoid.
| |
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
| |
* Softmax numerical stability.
* Fix the flash-attn test.
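The standard stabilization: subtract the row max before exponentiating, so `exp` never overflows; the shift cancels in the normalization. A 1-D sketch of the technique (not candle's kernel):

```rust
fn softmax(xs: &[f64]) -> Vec<f64> {
    // Subtracting the max makes every exponent <= 0, so exp() stays finite
    // even for large logits; the result is mathematically unchanged.
    let max = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Without the shift, logits around 1000 would overflow to infinity and the result would be NaN.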
| |
* Start adding the backprop op type.
* More backprop ops.
* Finish the backprop op.
| |
* Add the copy op.
* Tweak some cat error messages.
* Handle the contiguous case in to_vec1.
* Fast variant for to_vec2.
* Add a faster to_vec3 variant.
| |
* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
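Why scatter-add is gather's gradient: each gathered output reads one source slot, so its upstream gradient flows back into that slot, and repeated indices must accumulate. A 1-D sketch with illustrative names (not candle's API):

```rust
// 1-D gather along the only axis: out[i] = src[idx[i]].
fn gather(src: &[f64], idx: &[usize]) -> Vec<f64> {
    idx.iter().map(|&i| src[i]).collect()
}

// Its vjp: scatter-add the upstream gradient back to the gathered
// positions. Indices may repeat, hence `+=` rather than `=`.
fn scatter_add(grad_out: &[f64], idx: &[usize], src_len: usize) -> Vec<f64> {
    let mut grad_src = vec![0.0; src_len];
    for (g, &i) in grad_out.iter().zip(idx) {
        grad_src[i] += g;
    }
    grad_src
}
```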
| |
* Add the CustomOp1 trait.
* Add an example of custom op.
* Polish the custom op example.
* Add some backward pass test for custom ops.
| |
* Refactor the reduce ops in order to introduce argmin/argmax.
* Clippy fixes.
* Use the newly introduced argmax.
* Fix the strided case.
* Handle the non-contiguous case.
| |
* More realistic training setup.
* Compute the model accuracy.
* Very inefficient backprop for index select.
* More backprop.
* Fix some backprop issues.
* Backprop fix.
* Another broadcasting backprop fix.
* Better backprop for reducing ops.
* Training again.
* Add some gradient tests.
* Get the training to work.
| |
* Add the index-select op.
* Cpu implementation of index-select.
* Add the cpu implementation for index-select.
| |
* Add the binary and unary op enums to factorize some code.
* Bugfix.
| |
* Add the comparison operations.
* Add the helper functions on the tensor side.
* More cmp operations.
* Cpu implementation for the comparison operations.
| |
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
| |
* Mklize more unary ops.
* Even more unary ops.
| |
* Vectorized binary ops with mkl.
* Improve the binary op mkl support.
* Push the support for mkl binary ops.
* Proper vectorization of binary ops.
* Proper mkl'isation when broadcasting binary ops.
| |
* Preliminary support for mkl based gelu.
* Add the vectorized function for unary ops.
* Get the mkl specialized gelu to work.