* feat: add support for multiple compute caps
* Revert to one compute cap
* fmt
* fix
when selecting dim > 0 (#1037)
* fix: fix index_select cuda kernel for src target dim different from ids dim when selecting dim > 0
* cargo fmt
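To make this class of bug concrete, here is a plain-Rust CPU sketch of `index_select` (a hypothetical function, not candle's kernel or API). It views the source as `(left, src_dim, right)`, where `left` is the product of the dims before the selected one and `right` the product of the dims after it, so selecting a dim > 0 simply means `left > 1`. The selected dim of the output has `ids.len()` entries, which can differ from `src_dim`, so source and destination offsets must use different sizes.

```rust
/// Hypothetical CPU reference for `index_select` on a row-major buffer viewed
/// as (left, src_dim, right), selecting along the middle axis.
/// The output is (left, ids.len(), right).
fn index_select(src: &[f32], left: usize, src_dim: usize, right: usize, ids: &[usize]) -> Vec<f32> {
    assert_eq!(src.len(), left * src_dim * right);
    let ids_dim = ids.len();
    let mut out = vec![0.0f32; left * ids_dim * right];
    for l in 0..left {
        for (j, &id) in ids.iter().enumerate() {
            assert!(id < src_dim, "index out of range");
            for r in 0..right {
                // Destination strides by ids_dim, source strides by src_dim.
                out[(l * ids_dim + j) * right + r] = src[(l * src_dim + id) * right + r];
            }
        }
    }
    out
}
```

For a `2x4` source holding `0..8`, selecting ids `[3, 0, 3]` along dim 1 yields `[[3, 0, 3], [7, 4, 7]]`.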
* Add the rounding operators.
* Avoid tracking gradients for the rounding operations.
* Add some rounding tests.
* Bump the version to 0.3.0.
* Changelog update.
* Erf based gelu.
* Add the erf backed gelu.
* Test the new gelu op (which is not gelu_new).
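For reference, the erf-based GELU is `0.5 * x * (1 + erf(x / sqrt(2)))`, whereas `gelu_new` usually denotes the tanh approximation. A minimal Rust sketch of both follows; the function names are hypothetical, and since stable Rust's `std` has no `erf`, it assumes the Abramowitz & Stegun 7.1.26 polynomial approximation (absolute error around 1.5e-7) rather than whatever candle's kernels use.

```rust
/// Abramowitz & Stegun 7.1.26 approximation of erf (assumption of this sketch).
fn erf(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5.
    let poly = t
        * (0.254829592
            + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    // erf is odd, so negative inputs reuse the positive branch.
    sign * (1.0 - poly * (-x * x).exp())
}

/// Exact ("erf") GELU: 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf(x / std::f64::consts::SQRT_2))
}

/// Tanh approximation (commonly called gelu_new):
/// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 x^3))).
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x * x * x)).tanh())
}
```

The two variants agree to within about 5e-4 on `[-5, 5]`, which is why a test must compare against the erf form specifically to tell them apart.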
* Bump the crate version.
* Also update the python bindings.
(#836)
* im2col version of the cuda conv1d kernel.
* im2col version of the conv1d cpu kernel.
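The im2col idea is to unfold every input window into a row of a matrix so that the whole convolution becomes one matrix multiplication. A plain-Rust CPU sketch under simplifying assumptions (no padding, no dilation, row-major `(c_in, l)` input and `(c_out, c_in, k)` kernel; all function names hypothetical, not candle's kernels), with a direct loop-nest version as a reference:

```rust
/// Unfold a (c_in, l) input into an (l_out, c_in * k) matrix:
/// row t holds every input value that output position t reads.
fn im2col_1d(x: &[f32], c_in: usize, l: usize, k: usize, stride: usize) -> Vec<f32> {
    let l_out = (l - k) / stride + 1;
    let ck = c_in * k;
    let mut cols = vec![0.0f32; l_out * ck];
    for t in 0..l_out {
        for c in 0..c_in {
            for j in 0..k {
                cols[t * ck + c * k + j] = x[c * l + t * stride + j];
            }
        }
    }
    cols
}

/// conv1d as a matmul: (c_out, c_in * k) times the transposed cols matrix.
fn conv1d_im2col(x: &[f32], c_in: usize, l: usize, w: &[f32], c_out: usize, k: usize, stride: usize) -> Vec<f32> {
    let l_out = (l - k) / stride + 1;
    let ck = c_in * k;
    let cols = im2col_1d(x, c_in, l, k, stride);
    let mut out = vec![0.0f32; c_out * l_out];
    for o in 0..c_out {
        for t in 0..l_out {
            out[o * l_out + t] = (0..ck).map(|i| w[o * ck + i] * cols[t * ck + i]).sum::<f32>();
        }
    }
    out
}

/// Direct (loop-nest) conv1d, i.e. cross-correlation, used as a reference.
fn conv1d_direct(x: &[f32], c_in: usize, l: usize, w: &[f32], c_out: usize, k: usize, stride: usize) -> Vec<f32> {
    let l_out = (l - k) / stride + 1;
    let mut out = vec![0.0f32; c_out * l_out];
    for o in 0..c_out {
        for t in 0..l_out {
            let mut acc = 0.0f32;
            for c in 0..c_in {
                for j in 0..k {
                    acc += w[(o * c_in + c) * k + j] * x[c * l + t * stride + j];
                }
            }
            out[o * l_out + t] = acc;
        }
    }
    out
}
```

The trade-off is memory: the cols matrix holds `k` copies of each input element (for stride 1), in exchange for handing the heavy lifting to a well-optimized matmul.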
* im2col implementation for conv2d.
* Fix for the im2col implementation to match the current conv2d.
* Small optimization.
* Add a cuda kernel.
* Handle arbitrary layouts.
* Im2Col cuda code.
* Add tanh.
* Use tanh in the lstm block.
* Add a test for tanh forward and backward passes.
* Add some documentation.
* Bump the crate version.
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
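The backprop rule for an elementwise power with a constant exponent `e` is `d(x^e)/dx = e * x^(e - 1)`, multiplied by the incoming gradient. A scalar Rust sketch of forward and backward (hypothetical helper names, not candle's API); like `f64::powf`, it returns NaN for negative `x` with a non-integer exponent.

```rust
/// Forward pass: y = x^e, elementwise.
fn powf_forward(xs: &[f64], e: f64) -> Vec<f64> {
    xs.iter().map(|&x| x.powf(e)).collect()
}

/// Backward pass: grad_x = grad_y * e * x^(e - 1), elementwise.
fn powf_backward(xs: &[f64], e: f64, grad_y: &[f64]) -> Vec<f64> {
    assert_eq!(xs.len(), grad_y.len());
    xs.iter()
        .zip(grad_y)
        .map(|(&x, &g)| g * e * x.powf(e - 1.0))
        .collect()
}
```

A central finite difference, `((x + h)^e - (x - h)^e) / (2h)`, is a cheap way to check the analytic gradient in a test.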
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
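Dilation spaces the kernel taps `dilation` elements apart, so a kernel of size `k` spans `dilation * (k - 1) + 1` inputs and the output length becomes `(l + 2 * padding - dilation * (k - 1) - 1) / stride + 1`. A single-channel Rust sketch of that formula and the corresponding direct convolution (hypothetical names, not candle's helper function):

```rust
/// Output length of a 1-D convolution with padding, stride and dilation.
fn conv1d_out_len(l: usize, k: usize, padding: usize, stride: usize, dilation: usize) -> usize {
    (l + 2 * padding - dilation * (k - 1) - 1) / stride + 1
}

/// Single-channel conv1d (cross-correlation) with zero padding and dilation:
/// out[t] = sum_j w[j] * x[t * stride + j * dilation - padding].
fn conv1d_dilated(x: &[f32], w: &[f32], padding: usize, stride: usize, dilation: usize) -> Vec<f32> {
    let (l, k) = (x.len(), w.len());
    let l_out = conv1d_out_len(l, k, padding, stride, dilation);
    let mut out = vec![0.0f32; l_out];
    for (t, o) in out.iter_mut().enumerate() {
        let mut acc = 0.0f32;
        for (j, &wj) in w.iter().enumerate() {
            // Position in the zero-padded input; taps in the padding contribute 0.
            let pos = t * stride + j * dilation;
            if pos < padding || pos - padding >= l {
                continue;
            }
            acc += wj * x[pos - padding];
        }
        *o = acc;
    }
    out
}
```

With `dilation = 1` both functions reduce to the usual undilated convolution.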
* Cuda kernel for conv-transpose.
* Fix the cuda kernel.
* Fix the tests.
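A transposed convolution is the adjoint of a strided convolution: instead of each output gathering a window of inputs, each input scatters `x[i] * w[j]` to output position `i * stride + j - padding`. A single-channel Rust sketch with dilation 1 and no output padding (hypothetical names, not candle's CUDA kernel):

```rust
/// Output length of a 1-D transposed convolution (dilation 1, no output padding).
fn conv_transpose1d_out_len(l: usize, k: usize, stride: usize, padding: usize) -> usize {
    (l - 1) * stride + k - 2 * padding
}

/// Single-channel transposed conv1d written as a scatter.
fn conv_transpose1d(x: &[f32], w: &[f32], stride: usize, padding: usize) -> Vec<f32> {
    let l_out = conv_transpose1d_out_len(x.len(), w.len(), stride, padding);
    let mut out = vec![0.0f32; l_out];
    for (i, &xi) in x.iter().enumerate() {
        for (j, &wj) in w.iter().enumerate() {
            let pos = i * stride + j;
            // Contributions that land in the cropped padding are dropped.
            if pos >= padding && pos - padding < l_out {
                out[pos - padding] += xi * wj;
            }
        }
    }
    out
}
```

On a GPU the scatter form needs care because neighbouring inputs write to overlapping outputs; an output-centric formulation (each thread computing one output) is one way to avoid that.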
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrices.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
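With `groups = g`, the input and output channels are split into `g` equal slices and output channel `o` only sees the input channels of its own slice, so the kernel has shape `(c_out, c_in / g, k)`. A plain-Rust sketch (hypothetical function, no padding, stride 1; not candle's implementation):

```rust
/// Grouped conv1d on a (c_in, l) row-major input with a (c_out, c_in / groups, k) kernel.
fn conv1d_grouped(x: &[f32], c_in: usize, l: usize, w: &[f32], c_out: usize, k: usize, groups: usize) -> Vec<f32> {
    assert!(c_in % groups == 0 && c_out % groups == 0);
    let cin_g = c_in / groups; // input channels per group
    let cout_g = c_out / groups; // output channels per group
    assert_eq!(w.len(), c_out * cin_g * k);
    let l_out = l - k + 1;
    let mut out = vec![0.0f32; c_out * l_out];
    for o in 0..c_out {
        let g = o / cout_g; // group that output channel o belongs to
        for t in 0..l_out {
            let mut acc = 0.0f32;
            for ci in 0..cin_g {
                let c = g * cin_g + ci; // absolute input channel
                for j in 0..k {
                    acc += w[(o * cin_g + ci) * k + j] * x[c * l + t + j];
                }
            }
            out[o * l_out + t] = acc;
        }
    }
    out
}
```

`groups = 1` is the ordinary convolution, and `groups = c_in = c_out` is the depthwise case where every channel is filtered independently.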
* Add the i64 dtype.
* Adapt the cuda kernels.
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
* Add a cuda kernel for upsampling.
* Update for the latest tokenizers version.
* Add a cuda kernel for avg-pool2d.
* Avoid running out of bounds.
* Finish wiring the avg pool kernel + add some testing.
* Support for max-pool + testing.
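A CPU reference for both pooling kinds is handy when testing such kernels. This plain-Rust sketch (hypothetical function, single `(h, w)` plane, no padding; not candle's kernel) computes the output size as `(h - kh) / sh + 1` by `(w - kw) / sw + 1`, which guarantees every window stays in bounds:

```rust
/// 2-D average or max pooling over one row-major (h, w) plane.
#[allow(clippy::too_many_arguments)]
fn pool2d(x: &[f32], h: usize, w: usize, kh: usize, kw: usize, sh: usize, sw: usize, max: bool) -> Vec<f32> {
    assert_eq!(x.len(), h * w);
    let (ho, wo) = ((h - kh) / sh + 1, (w - kw) / sw + 1);
    let mut out = Vec::with_capacity(ho * wo);
    for i in 0..ho {
        for j in 0..wo {
            let mut acc = if max { f32::NEG_INFINITY } else { 0.0 };
            for di in 0..kh {
                for dj in 0..kw {
                    // The last window starts at (ho - 1) * sh <= h - kh, so this is in range.
                    let v = x[(i * sh + di) * w + j * sw + dj];
                    acc = if max { acc.max(v) } else { acc + v };
                }
            }
            out.push(if max { acc } else { acc / (kh * kw) as f32 });
        }
    }
    out
}
```

On a `4x4` plane holding `0..16` with `2x2` windows and stride 2, average pooling gives `[2.5, 4.5, 10.5, 12.5]` and max pooling gives `[5, 7, 13, 15]`.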
* Add a naive conv2d cuda kernel.
* Proper conv2d support on the rust side.
* Conv1d testing on gpu.
* Also use the test on gpus.
* Fix the clean-ptx target.
Without it we can compile for 52, but I get "Operation Not supported" when actually trying to use those kernels.
* Add the recip unary op.
* Fix the cuda kernel.
* Use the recip op in sigmoid.
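Expressing sigmoid as `recip(1 + exp(-x))` means its gradient falls out of the chain rule over existing ops, using `d(1/u)/du = -1/u^2`. A scalar Rust sketch (hypothetical helper names, not candle's API):

```rust
/// Elementwise reciprocal.
fn recip(x: f32) -> f32 {
    1.0 / x
}

/// Sigmoid composed from simpler ops: recip(1 + exp(-x)).
fn sigmoid(x: f32) -> f32 {
    recip(1.0 + (-x).exp())
}

/// Derivative of recip: d(1/u)/du = -1/u^2.
fn recip_grad(u: f32) -> f32 {
    -recip(u * u)
}
```

Chaining `recip_grad(1 + exp(-x))` with the derivative of `1 + exp(-x)`, which is `-exp(-x)`, reproduces the familiar `sigmoid(x) * (1 - sigmoid(x))`.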
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
* Cuda support for the mnist training.
* min/max fix + testing.
* Add the argmin/argmax tests.
* More cuda support for argmin/argmax.
* Cuda kernels for argmin and argmax.
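A CPU reference for argmin/argmax over the last dimension, useful for checking the CUDA kernels. This plain-Rust sketch (hypothetical names, not candle's API) returns `u32` indices and, as an assumption, resolves ties to the first index; GPU reductions may break ties differently.

```rust
/// Shared arg-reduction over the last dim of a row-major (rows, cols) buffer.
/// `better(a, b)` returns true when `a` should replace the current best `b`.
fn arg_reduce_last_dim(x: &[f32], rows: usize, cols: usize, better: impl Fn(f32, f32) -> bool) -> Vec<u32> {
    assert!(cols > 0 && x.len() == rows * cols);
    (0..rows)
        .map(|r| {
            let row = &x[r * cols..(r + 1) * cols];
            let mut best = 0usize;
            for (i, &v) in row.iter().enumerate().skip(1) {
                // Strict comparison keeps the first index on ties.
                if better(v, row[best]) {
                    best = i;
                }
            }
            best as u32
        })
        .collect()
}

fn argmax_last_dim(x: &[f32], rows: usize, cols: usize) -> Vec<u32> {
    arg_reduce_last_dim(x, rows, cols, |a, b| a > b)
}

fn argmin_last_dim(x: &[f32], rows: usize, cols: usize) -> Vec<u32> {
    arg_reduce_last_dim(x, rows, cols, |a, b| a < b)
}
```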
* Add some flash-attn test.
* Add the cpu test.
* Fail when the head is not a multiple of 8.
* Polish the flash attention test.
* Add a test for scatter add (segfaults on gpus for now).
* Bugfix for the scatter add cuda kernel.
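Scatter-add along dim 0 takes an index per element: for every `(i, j)`, `out[ids[i][j]][j] += src[i][j]`. A sequential plain-Rust reference (hypothetical function, not candle's kernel) is a convenient oracle for such tests. Several source rows can target the same destination, so a parallel version must avoid racing updates; one common approach is to give each thread a whole column.

```rust
/// Scatter-add along dim 0: `ids` and `src` share the shape (n, cols),
/// `out` has shape (m, cols), all row-major.
fn scatter_add_dim0(out: &mut [f32], m: usize, ids: &[usize], src: &[f32], cols: usize) {
    assert_eq!(ids.len(), src.len());
    assert_eq!(ids.len() % cols, 0);
    assert_eq!(out.len(), m * cols);
    for (idx, (&id, &v)) in ids.iter().zip(src).enumerate() {
        let j = idx % cols; // column of this element
        assert!(id < m, "scatter index out of range");
        out[id * cols + j] += v;
    }
}
```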
* Skeleton methods for IndexAdd/ScatterAdd.
* Add a Map2InPlace trait.
* Add the glue code for the index-add/scatter-add kernels.
* Tweak the file name: embeddings -> indexing.
* Add the cuda kernel for indexadd.
* And add the scatter-add kernels.
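Index-add differs from scatter-add in the shape of its ids: it takes one index per source row (a 1-D `ids`) and adds the whole row, `out[ids[i]] += src[i]`. Both ops update the destination in place. A plain-Rust reference along dim 0 (hypothetical function, not candle's kernel or trait):

```rust
/// Index-add along dim 0: `src` is (ids.len(), cols), `out` is (m, cols), row-major.
fn index_add_dim0(out: &mut [f32], m: usize, cols: usize, ids: &[usize], src: &[f32]) {
    assert_eq!(src.len(), ids.len() * cols);
    assert_eq!(out.len(), m * cols);
    for (i, &id) in ids.iter().enumerate() {
        assert!(id < m, "index out of range");
        // Repeated ids accumulate into the same destination row.
        for j in 0..cols {
            out[id * cols + j] += src[i * cols + j];
        }
    }
}
```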
* Allow using uint8_t for indexing.
* Revert the default cuda feature.
* Add a cuda-kernel for index-select.
* Add a test for gather.
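Gather along dim 1 reads `out[i][j] = src[i][ids[i][j]]`. This plain-Rust sketch (hypothetical function, not candle's kernel) is generic over the index type through `Into<usize>`, which in `std` covers `u8` and `u16` but not `u32` (that conversion is not infallible on every platform), echoing the move to accept `uint8_t` indices.

```rust
/// Gather along dim 1 of a row-major (rows, src_cols) buffer with (rows, k) ids.
fn gather_dim1<I: Copy + Into<usize>>(src: &[f32], rows: usize, src_cols: usize, ids: &[I], k: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * src_cols);
    assert_eq!(ids.len(), rows * k);
    let mut out = Vec::with_capacity(rows * k);
    for i in 0..rows {
        for j in 0..k {
            let id: usize = ids[i * k + j].into();
            assert!(id < src_cols, "gather index out of range");
            out.push(src[i * src_cols + id]);
        }
    }
    out
}
```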