path: root/candle-core/src/backend.rs
Commit message | Author | Age | Files | Lines
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 1 | -0/+2
    * module docs
    * varbuilder gguf docs
    * add a link to gguf files
    * small additional mod doc titles
    * safetensor docs
    * more core docs
    * more module docs in candle_core
    * 2 more link fixes
* Add StorageRef. (#2113) | Laurent Mazare | 2024-04-23 | 1 | -0/+2
    * Add the storage-ref bits.
    * Add the metal implementation.
* Add a synchronize method to devices. (#2055) | Laurent Mazare | 2024-04-14 | 1 | -0/+3
    * Add a synchronize method to devices.
    * Metal version.
* Add the alloc_uninit function. (#1901) | Laurent Mazare | 2024-03-22 | 1 | -0/+6
    * Add the alloc_uninit function.
    * Dummy metal fix.
    * Lazy initialization.
* Async tensor copying. (#1900) | Laurent Mazare | 2024-03-21 | 1 | -0/+2
* Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 1 | -0/+13
    * Add a specialized kernel for copy2d.
    * Move the cat operations.
    * Avoid transpositions in cat.
    * Bugfix.
    * Bugfix for the cuda kernel.
    * Add a benchmark.
    * Add more testing.
    * Test fix.
    * Faster kernel.
    * Add the missing kernel.
    * Tweak the test.
    * Add a metal kernel.
    * Fix for the metal kernel.
    * Get the tests to pass on metal.
    * Also use this opportunity to fix the metal kernel for ELU.
    * Add some bf16 kernels.
    * Clippy fixes.
* Add the conv-transpose1d op. (#1251) | Laurent Mazare | 2023-11-03 | 1 | -0/+8
    * Skeleton structure for conv-transpose1d.
    * CPU implementation for conv-transpose1d.
* Make the cuda rng seedable. (#1056) | Laurent Mazare | 2023-10-08 | 1 | -0/+2
* Add 1d upsampling. (#839) | Laurent Mazare | 2023-09-13 | 1 | -0/+1
    * Add 1d upsampling.
    * Add the interpolate functions.
* Add the powf op. (#664) | Laurent Mazare | 2023-08-29 | 1 | -0/+2
    * Add the powf op.
    * Cuda kernels and backprop.
    * Add a test.
* Add conv-transpose. (#635) | Laurent Mazare | 2023-08-28 | 1 | -0/+8
    * Add conv-transpose.
    * Return zeros for now.
    * Naive CPU implementation.
    * Add a conv-transpose test + fix the cpu implementation.
    * Add a second test.
* add max_pool2d (#371) | LeeeSe | 2023-08-09 | 1 | -0/+1
    Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
* Add more conv2d support. (#340) | Laurent Mazare | 2023-08-08 | 1 | -0/+8
    * Add more conv2d support.
    * Conv2d cpu work.
    * Conv2d output shape.
* CPU implementation for upsample-nearest2d. (#339) | Laurent Mazare | 2023-08-07 | 1 | -0/+1
* Some CLIP fixes for stable diffusion. (#338) | Laurent Mazare | 2023-08-07 | 1 | -0/+2
    * Some CLIP fixes for stable diffusion.
    * Add the avg-pool2d operation on cpu.
* Remove the embedding ops in favor of index-select. (#299) | Laurent Mazare | 2023-08-02 | 1 | -1/+0
    * Remove the embedding ops in favor of index-select.
    * Also remove the cuda kernels.
* Softmax numerical stability. (#267) | Laurent Mazare | 2023-07-28 | 1 | -2/+0
    * Softmax numerical stability.
    * Fix the flash-attn test.
* Add the gather op. (#219) | Laurent Mazare | 2023-07-22 | 1 | -0/+10
    * Start adding gather.
    * Gather cpu implementation + use in simple training.
    * Add scatter_add for the gradient of gather.
    * Simple cpu implementation of scatter_add.
    * Use gather in the simple-training backprop.
* Start adding index-add. | laurent | 2023-07-21 | 1 | -0/+9
* Custom ops with a single argument (#214) | Laurent Mazare | 2023-07-21 | 1 | -2/+2
    * Add the CustomOp1 trait.
    * Add an example of custom op.
    * Polish the custom op example.
    * Add some backward pass test for custom ops.
* Add the index-select op. (#209) | Laurent Mazare | 2023-07-20 | 1 | -0/+1
    * Add the index-select op.
    * Cpu implementation of index-select.
    * Add the cpu implementation for index-select.
* Op refactor (#208) | Laurent Mazare | 2023-07-20 | 1 | -4/+3
    * Add the binary and unary op enums to factorize some code.
    * Bugfix.
* Add the comparison operations. (#207) | Laurent Mazare | 2023-07-20 | 1 | -1/+4
    * Add the comparison operations.
    * Add the helper functions on the tensor side.
    * More cmp operations.
    * Cpu implementation for the comparison operations.
* Add some more developed training examples. (#199) | Laurent Mazare | 2023-07-19 | 1 | -1/+1
    * Use contiguous tensors for variables.
    * Sketch the mnist example.
    * Start adding the reduce ops.
    * Renaming.
    * Refactor the reduce operations.
    * Bugfix for the broadcasting vectorization.
* Implement the backend trait for the cpu backend. (#143) | Laurent Mazare | 2023-07-12 | 1 | -0/+1
* Modular backends (#138) | Laurent Mazare | 2023-07-11 | 1 | -0/+71
    * Add some trait to formalize backends.
    * Use the generic backend trait.