* module docs
* varbuilder gguf docs
* add a link to gguf files
* small additional mod doc titles
* safetensor docs
* more core docs
* more module docs in candle_core
* 2 more link fixes
* Add the storage-ref bits.
* Add the metal implementation.
* Add a synchronize method to devices.
* Metal version.
* Add the alloc_uninit function.
* Dummy metal fix.
* Lazy initialization.
* Add a specialized kernel for copy2d.
* Move the cat operations.
* Avoid transpositions in cat.
* Bugfix.
* Bugfix for the cuda kernel.
* Add a benchmark.
* Add more testing.
* Test fix.
* Faster kernel.
* Add the missing kernel.
* Tweak the test.
* Add a metal kernel.
* Fix for the metal kernel.
* Get the tests to pass on metal.
* Also use this opportunity to fix the metal kernel for ELU.
* Add some bf16 kernels.
* Clippy fixes.
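Judging from these bullets, copy2d copies a strided 2D block. This lets cat along an inner dimension write each input straight into the output without transposing. The sketch below is a plain-slice CPU illustration under that assumption. The `copy2d` helper, its argument order, and `cat_cols` are hypothetical and are not candle's actual signatures.

```rust
/// Hypothetical helper: copy `d1` rows of `d2` contiguous elements.
/// Row `i` is read from `src[src_offset + i * src_stride..]` and
/// written to `dst[dst_offset + i * dst_stride..]`.
#[allow(clippy::too_many_arguments)]
fn copy2d<T: Copy>(
    src: &[T],
    dst: &mut [T],
    d1: usize,
    d2: usize,
    src_stride: usize,
    dst_stride: usize,
    src_offset: usize,
    dst_offset: usize,
) {
    for i in 0..d1 {
        let s = src_offset + i * src_stride;
        let d = dst_offset + i * dst_stride;
        dst[d..d + d2].copy_from_slice(&src[s..s + d2]);
    }
}

/// Hypothetical cat along the column axis of row-major matrices that share
/// `rows`: one copy2d per input, using the output row length as dst stride.
fn cat_cols<T: Copy + Default>(inputs: &[(&[T], usize)], rows: usize) -> Vec<T> {
    let total_cols: usize = inputs.iter().map(|(_, c)| *c).sum();
    let mut out = vec![T::default(); rows * total_cols];
    let mut col_offset = 0;
    for (data, cols) in inputs {
        copy2d(data, &mut out, rows, *cols, *cols, total_cols, 0, col_offset);
        col_offset += *cols;
    }
    out
}
```

For two matrices with the same row count, `cat_cols` issues one strided copy per input, so no intermediate transposed tensor is materialized.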
* Skeleton structure for conv-transpose1d.
* CPU implementation for conv-transpose1d.
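A naive CPU transposed convolution scatters each input element into the output, scaled by the kernel. The output length is `(l_in - 1) * stride - 2 * padding + k`. The sketch below assumes a single batch element, a `(c_in, c_out, k)` weight layout (the PyTorch convention), dilation 1, and no output padding. The function name and parameter list are illustrative, not candle's API.

```rust
/// Naive 1D transposed convolution for one batch element.
/// `input` is [c_in][l_in] and `weight` is [c_in][c_out][k], both row-major.
/// Returns the [c_out][l_out] output and l_out.
#[allow(clippy::too_many_arguments)]
fn conv_transpose1d(
    input: &[f32],
    c_in: usize,
    l_in: usize,
    weight: &[f32],
    c_out: usize,
    k: usize,
    stride: usize,
    padding: usize,
) -> (Vec<f32>, usize) {
    // (l_in - 1) * stride - 2 * padding + k, ordered to avoid early underflow.
    let l_out = (l_in - 1) * stride + k - 2 * padding;
    let mut out = vec![0f32; c_out * l_out];
    for ci in 0..c_in {
        for i in 0..l_in {
            let x = input[ci * l_in + i];
            for co in 0..c_out {
                for j in 0..k {
                    // Input position i lands at i * stride + j, shifted left by padding.
                    let pos = (i * stride + j) as isize - padding as isize;
                    if pos >= 0 && (pos as usize) < l_out {
                        out[co * l_out + pos as usize] += x * weight[(ci * c_out + co) * k + j];
                    }
                }
            }
        }
    }
    (out, l_out)
}
```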
* Add 1d upsampling.
* Add the interpolate functions.
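Nearest-neighbour upsampling maps output index `i` back to input index `floor(i * l_in / l_out)`. The helper below is a hypothetical one-channel sketch of that rule. Whether candle's interpolate functions use exactly this index mapping is an assumption.

```rust
/// Hypothetical nearest-neighbour resize of one channel to `l_out` samples.
/// Output i copies input floor(i * l_in / l_out) (integer division).
fn upsample_nearest1d(input: &[f32], l_out: usize) -> Vec<f32> {
    let l_in = input.len();
    (0..l_out).map(|i| input[i * l_in / l_out]).collect()
}
```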
* Add the powf op.
* Cuda kernels and backprop.
* Add a test.
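For the backward pass of `powf`, the power rule gives d(x^e)/dx = e * x^(e - 1), which is then multiplied by the incoming gradient. The two functions below are a hypothetical elementwise sketch of that rule on `f64` slices, not candle's implementation.

```rust
/// Forward: y = x^e elementwise.
fn powf_forward(xs: &[f64], e: f64) -> Vec<f64> {
    xs.iter().map(|x| x.powf(e)).collect()
}

/// Backward via the chain rule: grad_in = grad_out * e * x^(e - 1).
fn powf_backward(xs: &[f64], e: f64, grad_out: &[f64]) -> Vec<f64> {
    xs.iter()
        .zip(grad_out)
        .map(|(x, g)| g * e * x.powf(e - 1.0))
        .collect()
}
```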
* Add conv-transpose.
* Return zeros for now.
* Naive CPU implementation.
* Add a conv-transpose test + fix the cpu implementation.
* Add a second test.
Co-authored-by: 赵理山 <ls@zhaolishandeMacBook-Air.local>
* Add more conv2d support.
* Conv2d cpu work.
* Conv2d output shape.
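The conv2d output shape follows the usual convolution size formula on each spatial axis: out = floor((in + 2 * padding - dilation * (k - 1) - 1) / stride) + 1. The sketch assumes an NCHW input, a `(c_out, c_in, kh, kw)` kernel, and the same stride, padding, and dilation on both axes. The helper names are hypothetical.

```rust
/// Output size along one spatial axis (integer division gives the floor).
fn conv_out_dim(l_in: usize, k: usize, stride: usize, padding: usize, dilation: usize) -> usize {
    (l_in + 2 * padding - dilation * (k - 1) - 1) / stride + 1
}

/// (b, c_in, h, w) input and (c_out, c_in, kh, kw) kernel -> (b, c_out, h_out, w_out).
fn conv2d_out_shape(
    input: [usize; 4],
    kernel: [usize; 4],
    stride: usize,
    padding: usize,
    dilation: usize,
) -> [usize; 4] {
    let [b, _c_in, h, w] = input;
    let [c_out, _, kh, kw] = kernel;
    [
        b,
        c_out,
        conv_out_dim(h, kh, stride, padding, dilation),
        conv_out_dim(w, kw, stride, padding, dilation),
    ]
}
```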
* Some CLIP fixes for stable diffusion.
* Add the avg-pool2d operation on cpu.
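Average pooling replaces each kernel window with its mean. This is a minimal sketch for one channel, assuming no padding and that every window fits inside the input. `avg_pool2d` here is a hypothetical plain-slice helper.

```rust
/// Average-pool one row-major (h x w) channel with a (kh x kw) window and
/// strides (sh, sw). Returns the pooled values and the output height and width.
fn avg_pool2d(
    input: &[f32],
    h: usize,
    w: usize,
    kh: usize,
    kw: usize,
    sh: usize,
    sw: usize,
) -> (Vec<f32>, usize, usize) {
    let out_h = (h - kh) / sh + 1;
    let out_w = (w - kw) / sw + 1;
    let mut out = Vec::with_capacity(out_h * out_w);
    for oy in 0..out_h {
        for ox in 0..out_w {
            let mut sum = 0f32;
            for dy in 0..kh {
                for dx in 0..kw {
                    sum += input[(oy * sh + dy) * w + ox * sw + dx];
                }
            }
            out.push(sum / (kh * kw) as f32);
        }
    }
    (out, out_h, out_w)
}
```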
* Remove the embedding ops in favor of index-select.
* Also remove the cuda kernels.
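The embedding ops can be replaced by index-select because an embedding lookup is exactly `index_select` along dim 0 of the `(vocab, dim)` table: output row `i` is table row `ids[i]`. Below is a minimal sketch on a row-major slice. `index_select_rows` is a hypothetical name.

```rust
/// Embedding lookup as index-select on dim 0 of a row-major (vocab x dim) table.
fn index_select_rows(table: &[f32], dim: usize, ids: &[u32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * dim);
    for &id in ids {
        let start = id as usize * dim;
        // Copy the whole row for this token id.
        out.extend_from_slice(&table[start..start + dim]);
    }
    out
}
```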
* Softmax numerical stability.
* Fix the flash-attn test.
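The standard fix for softmax stability is to subtract the maximum before exponentiating. Because softmax(x) = softmax(x - max(x)), the result is unchanged, but every exponent is then at most 0, so `exp` cannot overflow. Whether this is exactly the change in that commit is an assumption. The sketch only shows the technique.

```rust
/// Numerically stable softmax over a slice.
/// Without subtracting the max, exp(1000.0f32) is inf and inf / inf is NaN.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```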
* Start adding gather.
* Gather cpu implementation + use in simple training.
* Add scatter_add for the gradient of gather.
* Simple cpu implementation of scatter_add.
* Use gather in the simple-training backprop.
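`gather` picks one element per position along a dimension. Its gradient is the matching `scatter_add`: each output gradient is added back at the index it was gathered from, and repeated indices accumulate. Below is a minimal sketch along dim 1 of a row-major matrix. A typical training use is picking the target-class log-probability for each row. Both functions are hypothetical helpers.

```rust
/// gather on dim 1: out[i][j] = x[i][idx[i][j]], where x is (rows x cols)
/// and idx is (rows x n), both row-major.
fn gather_dim1(x: &[f32], cols: usize, idx: &[usize], n: usize) -> Vec<f32> {
    let rows = idx.len() / n;
    let mut out = Vec::with_capacity(rows * n);
    for i in 0..rows {
        for j in 0..n {
            out.push(x[i * cols + idx[i * n + j]]);
        }
    }
    out
}

/// Gradient of gather_dim1: scatter_add grad_out into zeros shaped like x.
fn scatter_add_dim1(grad_out: &[f32], idx: &[usize], n: usize, rows: usize, cols: usize) -> Vec<f32> {
    let mut grad_x = vec![0f32; rows * cols];
    for i in 0..rows {
        for j in 0..n {
            // Accumulate, since the same index may be gathered more than once.
            grad_x[i * cols + idx[i * n + j]] += grad_out[i * n + j];
        }
    }
    grad_x
}
```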
* Add the CustomOp1 trait.
* Add an example of custom op.
* Polish the custom op example.
* Add some backward-pass tests for custom ops.
* Add the index-select op.
* Cpu implementation of index-select.
* Add the cpu implementation for index-select.
* Add the binary and unary op enums to factorize some code.
* Bugfix.
* Add the comparison operations.
* Add the helper functions on the tensor side.
* More cmp operations.
* Cpu implementation for the comparison operations.
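Factoring the comparisons into one enum lets a single kernel serve all six operators. The sketch below uses a hypothetical `CmpOp` enum and `cmp` function and returns 0/1 masks as `u8`. That mask type is an assumption here.

```rust
/// Hypothetical enum covering the six comparison operators.
#[derive(Clone, Copy, Debug)]
enum CmpOp {
    Eq,
    Ne,
    Lt,
    Le,
    Gt,
    Ge,
}

/// Elementwise comparison producing a 0/1 mask.
fn cmp<T: PartialOrd>(op: CmpOp, lhs: &[T], rhs: &[T]) -> Vec<u8> {
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| {
            let r = match op {
                CmpOp::Eq => a == b,
                CmpOp::Ne => a != b,
                CmpOp::Lt => a < b,
                CmpOp::Le => a <= b,
                CmpOp::Gt => a > b,
                CmpOp::Ge => a >= b,
            };
            u8::from(r)
        })
        .collect()
}
```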
* Use contiguous tensors for variables.
* Sketch the mnist example.
* Start adding the reduce ops.
* Renaming.
* Refactor the reduce operations.
* Bugfix for the broadcasting vectorization.
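The reduce operations collapse one dimension. This sketch shows `sum` and `max` over either axis of a row-major 2D matrix. The function names and the restriction to 2D are illustrative assumptions.

```rust
/// Sum over one axis of a row-major (rows x cols) matrix.
/// dim 0 reduces over rows (one result per column);
/// dim 1 reduces over columns (one result per row).
fn reduce_sum(x: &[f32], rows: usize, cols: usize, dim: usize) -> Vec<f32> {
    match dim {
        0 => (0..cols)
            .map(|c| (0..rows).map(|r| x[r * cols + c]).sum::<f32>())
            .collect(),
        1 => x.chunks(cols).take(rows).map(|row| row.iter().sum::<f32>()).collect(),
        _ => panic!("dim must be 0 or 1 for a 2D matrix"),
    }
}

/// Max over one axis, with the same dim convention as reduce_sum.
fn reduce_max(x: &[f32], rows: usize, cols: usize, dim: usize) -> Vec<f32> {
    match dim {
        0 => (0..cols)
            .map(|c| (0..rows).map(|r| x[r * cols + c]).fold(f32::NEG_INFINITY, f32::max))
            .collect(),
        1 => x
            .chunks(cols)
            .take(rows)
            .map(|row| row.iter().copied().fold(f32::NEG_INFINITY, f32::max))
            .collect(),
        _ => panic!("dim must be 0 or 1 for a 2D matrix"),
    }
}
```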
* Add a trait to formalize backends.
* Use the generic backend trait.
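A backend trait lets tensor code be written once against an interface that each device implements. The trait below is a deliberately tiny, hypothetical sketch with a single CPU backend. It is not candle's actual backend trait.

```rust
/// Hypothetical backend interface: each device supplies its own storage type.
trait Backend {
    type Storage;
    fn from_vec(&self, data: Vec<f32>) -> Self::Storage;
    fn add(&self, lhs: &Self::Storage, rhs: &Self::Storage) -> Self::Storage;
    fn to_vec(&self, s: &Self::Storage) -> Vec<f32>;
}

/// A CPU backend whose storage is a plain Vec<f32>.
struct Cpu;

impl Backend for Cpu {
    type Storage = Vec<f32>;
    fn from_vec(&self, data: Vec<f32>) -> Vec<f32> {
        data
    }
    fn add(&self, lhs: &Vec<f32>, rhs: &Vec<f32>) -> Vec<f32> {
        lhs.iter().zip(rhs).map(|(a, b)| a + b).collect()
    }
    fn to_vec(&self, s: &Vec<f32>) -> Vec<f32> {
        s.clone()
    }
}

/// Generic code written once against the trait.
fn double<B: Backend>(backend: &B, data: Vec<f32>) -> Vec<f32> {
    let s = backend.from_vec(data);
    let out = backend.add(&s, &s);
    backend.to_vec(&out)
}
```

A second device would implement the same trait with its own `Storage` type, and `double` would work unchanged.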