Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -1/+1 |
| | |||||
* | Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -1/+1 |
| | |||||
* | Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -2/+2 |
| | |||||
* | Rework the cuda casting bits. (#1112) | Laurent Mazare | 2023-10-17 | 1 | -31/+54 |
| | |||||
* | feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 2 | -89/+110 |
| | | | | | | | | | * feat: add support for multiple compute caps * Revert to one compute cap * fmt * fix | ||||
* | fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 (#1037) | Gonzalo | 2023-10-05 | 1 | -6/+8 |
| | | | | | | | * fix: fix index_select cuda kernel for src target dim different than ids dim when selecting dim > 0 * cargo fmt | ||||
* | Add the rounding operators. (#1030) | Laurent Mazare | 2023-10-04 | 2 | -0/+24 |
| | | | | | | | * Add the rounding operators. * Avoid tracking gradients for the rounding operations. * Add some rounding tests. | ||||
* | Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -1/+1 |
| | | | | | * Bump the version to 0.3.0. * Changelog update. | ||||
* | fix: add missing gpu fill_* (#996) | Gonzalo | 2023-09-29 | 1 | -0/+9 |
| | |||||
* | Optimize the index-select cuda kernel. (#976) | Laurent Mazare | 2023-09-28 | 1 | -14/+8 |
| | |||||
* | Add the missing kernel. (#955) | Laurent Mazare | 2023-09-24 | 1 | -0/+1 |
| | |||||
* | cuda cast i64 (#925) | Gonzalo | 2023-09-21 | 1 | -0/+10 |
| | |||||
* | Add an erf based gelu op (#900) | Laurent Mazare | 2023-09-19 | 2 | -0/+25 |
| | | | | | | | * Erf based gelu. * Add the erf backed gelu. * Test the new gelu op (which is not gelu_new). | ||||
* | Bump the crate versions to v0.2.3. (#886) | Laurent Mazare | 2023-09-18 | 1 | -1/+1 |
| | | | | | * Bump the crate version. * Also update the python bindings. | ||||
* | Add `CANDLE_NVCC_CCBIN` support for `candle-kernels`, and eliminate warning. (#836) | Charles Lew | 2023-09-13 | 1 | -2/+9 |
| | |||||
* | Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -1/+1 |
| | |||||
* | im2col version of the conv1d kernel. (#815) | Laurent Mazare | 2023-09-11 | 1 | -1/+70 |
| | | | | | * im2col version of the cuda conv1d kernel. * im2col version of the conv1d cpu kernel. | ||||
* | im2col based conv2d (#802) | Laurent Mazare | 2023-09-10 | 1 | -0/+89 |
| | | | | | | | | | | | | | * im2col implementation for conv2d. * Fix for the im2col implementation to match the current conv2d. * Small optimization. * Add a cuda kernel. * Handle arbitrary layouts. * Im2Col cuda code. | ||||
* | Add a dedicated cuda kernel for softmax. (#746) | Laurent Mazare | 2023-09-05 | 1 | -0/+55 |
| | |||||
* | Add tanh. (#675) | Laurent Mazare | 2023-08-30 | 1 | -0/+4 |
| | | | | | | | * Add tanh. * Use tanh in the lstm block. * Add a test for tanh forward and backward passes. | ||||
* | Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -1/+1 |
| | | | | | * Add some documentation. * Bump the crate version. | ||||
* | Support dilation in conv-transpose2d. (#671) | Laurent Mazare | 2023-08-30 | 1 | -3/+3 |
| | |||||
* | Add the powf op. (#664) | Laurent Mazare | 2023-08-29 | 1 | -0/+4 |
| | | | | | | | * Add the powf op. * Cuda kernels and backprop. * Add a test. | ||||
* | Fix the dilated convolutions. (#659) | Laurent Mazare | 2023-08-29 | 1 | -2/+2 |
| | |||||
* | Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 1 | -6/+12 |
| | | | | | | | | | | | | | | | | | | | * Add the dilation parameter. * Restore the basic optimizer example. * Dilation support in cudnn. * Use the dilation parameter in the cpu backend. * More dilation support. * No support for dilation in transposed convolutions. * Add dilation to a test. * Remove a print. * Helper function. | ||||
* | Cuda conv transpose (#645) | Laurent Mazare | 2023-08-28 | 1 | -0/+88 |
| | | | | | | | * Cuda kernel for conv-transpose. * Fix the cuda kernel. * Fix the tests. | ||||
* | Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -1/+1 |
| | |||||
* | Let's keep the dirty code on its own. | Nicolas Patry | 2023-08-25 | 1 | -2/+25 |
| | |||||
* | Intermediary float cast is necessary for cuda 11.8 | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| | |||||
* | `static_cast` ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| | |||||
* | Different casting ? | Nicolas Patry | 2023-08-25 | 1 | -2/+2 |
| | |||||
* | Repairing cast bf16/f16 | Nicolas Patry | 2023-08-25 | 1 | -4/+4 |
| | |||||
* | Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 | -10/+11 |
| | | | | | | | | | | | | | * Add to the cuda example a reproduction of the issue. * Tweak. * Add a test using non-square matrixes. * Fix the conv2d kernel. * Display the error. * And tweak the comment. | ||||
* | Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1 |
| | | | | | | | | | | | | | * Add some group parameter to convolutions. * Avoid some unnecessary groups checks. * Move the tensor convolution bits. * Proper handling of groups. * Bump the crate version. * And add a changelog. | ||||
* | Add support for i64 (#563) | Laurent Mazare | 2023-08-23 | 6 | -1/+65 |
| | | | | | * Add the i64 dtype. * Adapt the cuda kernels. | ||||
* | Add a yolo-v3 example. (#528) | Laurent Mazare | 2023-08-20 | 1 | -0/+12 |
| | | | | | | | | | | | | | | | * Add a couple functions required for yolo. * Add the yolo-v3 example. * Add minimum and maximum. * Use the newly introduced maximum. * Cuda support for min/max + add some testing. * Allow for more tests to work with accelerate. * Fix a typo. | ||||
* | Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -1/+1 |
| | |||||
* | Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -1/+1 |
| | | | | | | | * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme. | ||||
* | Add a cuda kernel for upsampling. (#441) | Laurent Mazare | 2023-08-14 | 1 | -0/+62 |
| | | | | | * Add a cuda kernel for upsampling. * Update for the latest tokenizers version. | ||||
* | Add a cuda kernel for avg-pool2d. (#440) | Laurent Mazare | 2023-08-14 | 1 | -3/+157 |
| | | | | | | | | | * Add a cuda kernel for avg-pool2d. * Avoid running out of bounds. * Finish wiring the avg pool kernel + add some testing. * Support for max-pool + testing. | ||||
* | Add a naive conv2d cuda kernel. (#438) | Laurent Mazare | 2023-08-14 | 1 | -8/+93 |
| | | | | | | | | | | | * Add a naive conv2d cuda kernel. * Proper conv2d support on the rust side. * Conv1d testing on gpu. * Also use the test on gpus. * Fix the clean-ptx target. | ||||
* | Compat windows. | Nicolas Patry | 2023-08-10 | 1 | -0/+9 |
| | |||||
* | This is duplicated code on Cuda 12.2. | Nicolas Patry | 2023-08-10 | 1 | -18/+0 |
| | | | | | Without it we can compile for 52 (but I get Operation Not supported when actually trying to use those kernels). | ||||
* | Add the license files. (#335) | Laurent Mazare | 2023-08-07 | 1 | -1/+1 |
| | |||||
* | Add the recip op + use it in stable-diffusion. (#331) | Laurent Mazare | 2023-08-06 | 1 | -0/+4 |
| | | | | | | | * Add the recip unary op. * Fix the cuda kernel. * Use the recip op in sigmoid. | ||||
* | Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| | |||||
* | Remove the embedding ops in favor of index-select. (#299) | Laurent Mazare | 2023-08-02 | 1 | -40/+0 |
| | | | | | * Remove the embedding ops in favor of index-select. * Also remove the cuda kernels. | ||||
* | Cuda support for the mnist training. (#277) | Laurent Mazare | 2023-07-29 | 2 | -7/+118 |
| | | | | | | | | | | | * Cuda support for the mnist training. * min/max fix + testing. * Add the argmin/argmax tests. * More cuda support for argmin/argmax. * Cuda kernels for argmin and argmax. | ||||
* | Support for where-cond on cuda for u8 and u32. (#274) | Laurent Mazare | 2023-07-29 | 1 | -8/+15 |
| | |||||
* | Add some flash attn test (#253) | Laurent Mazare | 2023-07-26 | 1 | -2/+2 |
| | | | | | | | | | * Add some flash-attn test. * Add the cpu test. * Fail when the head is not a multiple of 8. * Polish the flash attention test. |