Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Bump the ug dependency. (#2720) | Laurent Mazare | 2025-01-16 | 1 | -1/+1 |
| | * Bump the ug dependency. * Fix some test. * Fix the ug test. | ||||
* | Clippy fixes for 1.84. (#2710) | Laurent Mazare | 2025-01-10 | 1 | -4/+1 |
| | |||||
* | Fix a cuda warning. (#2693) | Laurent Mazare | 2024-12-31 | 1 | -39/+44 |
| | |||||
* | Add a Context trait similar to anyhow::Context. (#2676) | Laurent Mazare | 2024-12-22 | 6 | -16/+76 |
| | * Add a Context trait similar to anyhow::Context. * Switch two unwrap to context. | ||||
* | add scatter add (#2656) | zachcp | 2024-12-01 | 1 | -0/+1 |
| | |||||
* | add u32 - U32 gather (#2653) | zachcp | 2024-11-30 | 1 | -0/+1 |
| | |||||
* | Clippy fixes for the cuda feature. (#2650) | Laurent Mazare | 2024-11-29 | 2 | -11/+11 |
| | |||||
* | Lint fixes introduced with Rust 1.83 (#2646) | Anubhab Bandyopadhyay | 2024-11-28 | 5 | -16/+16 |
| | * Fixes for lint errors introduced with Rust 1.83 * rustfmt * Fix more lints. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> | ||||
* | fix typo (#2606) | Andrei Fajardo | 2024-11-23 | 1 | -1/+1 |
| | |||||
* | 20241118 docs (#2629) | zachcp | 2024-11-19 | 22 | -12/+47 |
| | * module docs * varbuilder gguf docs * add a link to gguf files * small additional mod doc titles * safetensor docs * more core docs * more module docs in candle_core * 2 more link fixes | ||||
* | Add max-all/min-all. (#2616) | Laurent Mazare | 2024-11-14 | 1 | -0/+36 |
| | |||||
* | Add some missing index-select metal kernels. (#2613) | Laurent Mazare | 2024-11-12 | 1 | -1/+10 |
| | * Add some missing index-select metal kernels. * Make some matrix contiguous pre-matmul. | ||||
* | Update docs (#2553) | zachcp | 2024-11-11 | 1 | -0/+14 |
| | * add module docs for candle-core * doc each of the candle-nn modules and add the links to the doc page | ||||
* | Add some fast Metal MLX SDPA kernels (#2584) | Eric Buehler | 2024-11-05 | 2 | -0/+12 |
| | * Add some fast Metal MLX SDPA kernels (#32) * Sketch the sdpa kernel * Add full sdpa kernel, * Add test * Add vectorized kernel for decoding * Update tests * Add some docs * Fix sdpa_vector names * Add softcapping for vectorized sdpa * Add softcapping for full sdpa * Add support for head dim 32, 96, 256 * Add support for head dim 32, 96, 256 * Update docs * Add update notice * Clippy and format * Conditional compilation for bf16 * Use it in quantized llama * Some review comments * Use set_params! * Remove unused * Remove feature * Fix metal sdpa for v stride * Remove comma * Add the dim method to layout and shape. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> | ||||
* | UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 5 | -10/+87 |
| | |||||
* | Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 7 | -2/+137 |
| | * Support for UG kernels. * Add a dedicated test. | ||||
* | Testcases (#2567) | Anubhab Bandyopadhyay | 2024-10-17 | 2 | -3/+278 |
| | |||||
* | Switch to using the MLX matmul by default. (#2547) | Laurent Mazare | 2024-10-06 | 1 | -3/+3 |
| | |||||
* | Fix for cudnn bf16 conv2d. (#2535) | Laurent Mazare | 2024-10-02 | 2 | -10/+14 |
| | |||||
* | Add support for cuda streams. (#2532) | Laurent Mazare | 2024-10-02 | 3 | -0/+24 |
| | |||||
* | Efficient implementation of `Tensor::ones()` for `metal` (#2512) | Anubhab Bandyopadhyay | 2024-10-01 | 2 | -4/+62 |
| | * WIP: hopefully better const impl * with GPU * More tests on * Reverting primitive for * Incorporating review changes - added check elem count check in kernel, using for call strategy * rustfmt ran | ||||
* | Cuda quantized mmv bugfix. (#2526) | Laurent Mazare | 2024-10-01 | 1 | -1/+25 |
| | |||||
* | Yet another cuda qmm padding fix. (#2509) | Laurent Mazare | 2024-09-30 | 1 | -25/+55 |
| | |||||
* | Bugfix for the metal elu kernel. (#2490) | Laurent Mazare | 2024-09-21 | 1 | -0/+13 |
| | * Bugfix for the metal elu kernel. * Add a test. | ||||
* | Metal commands refactoring (#2489) | Laurent Mazare | 2024-09-21 | 2 | -99/+113 |
| | * Split out the commands part of the metal device. * Make most fields private. * Move the allocator back. * Rework the encoder provider type. | ||||
* | Improve error message (#2485) | ivnsch | 2024-09-20 | 1 | -1/+5 |
| | |||||
* | Add a couple cast metal kernels. (#2479) | Laurent Mazare | 2024-09-15 | 1 | -8/+31 |
| | |||||
* | Export TensorIndexer public to candle users (#2477) | Shengtuo Hu | 2024-09-13 | 1 | -1/+1 |
| | |||||
* | Missing metal kernels. (#2474) | Laurent Mazare | 2024-09-12 | 1 | -0/+2 |
| | |||||
* | Hook the MLX matmul kernels in candle-core. (#2473) | Laurent Mazare | 2024-09-12 | 2 | -0/+38 |
| | |||||
* | Use the new MLX kernels to handle the BF16 matmul. (#2470) | Laurent Mazare | 2024-09-11 | 2 | -26/+46 |
| | |||||
* | Complete the missing backticks in the comments (#2469) | hongmengning | 2024-09-11 | 1 | -0/+3 |
| | |||||
* | Update cudarc to 0.12. (#2451) | Laurent Mazare | 2024-08-27 | 2 | -2/+4 |
| | * Update cudarc to 0.12. * Some cudnn tweaks. | ||||
* | Stream tensor (#2429) | Laurent Mazare | 2024-08-17 | 2 | -0/+208 |
| | * Support Minus(u) for arbitrary values of u, e.g. Minus(3). * Forces u to be strictly positive. * Add StreamTensor. | ||||
* | Support Minus(u) for arbitrary values of u, e.g. Minus(3). (#2428) | Laurent Mazare | 2024-08-17 | 1 | -0/+4 |
| | * Support Minus(u) for arbitrary values of u, e.g. Minus(3). * Forces u to be strictly positive. | ||||
* | Add documentation examples for `Tensor::i` and `Tensor::narrow` methods (#2308) | Carsten Csiky | 2024-08-10 | 2 | -8/+169 |
| | * Add documentation examples for `Tensor` methods * Apply fmt. * Cosmetic tweaks. --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> | ||||
* | optimize gradient for silu a bit (#2393) | MilkFather | 2024-08-04 | 1 | -2/+2 |
| | |||||
* | Revert the bf16 gemm metal changes for now. (#2386) | Laurent Mazare | 2024-08-01 | 1 | -2/+2 |
| | |||||
* | Add a minimal test for the metal bf16 matmul. (#2381) | Laurent Mazare | 2024-08-01 | 1 | -0/+20 |
| | |||||
* | Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1 |
| | |||||
* | Add get_ids to GradStore (#2379) | Takanori MAEHARA | 2024-08-01 | 1 | -0/+5 |
| | |||||
* | Use BF16 on metal when possible. (#2378) | Laurent Mazare | 2024-08-01 | 1 | -0/+16 |
| | |||||
* | Fix log_sum_exp to handle large positive/negative inputs (#2367) | Yun-Jhong Wu | 2024-08-01 | 2 | -6/+34 |
| | |||||
* | Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2 |
| | |||||
* | Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 5 | -10/+10 |
| | * Add Llama 3.1 rope * Clippy * Format * Clippy * Add support for multiple eos tokens: * Untagged either * Remove either dep and fix settings.json * Make the max positional embeddings configurable | ||||
* | Fix for backprop in ConvTranspose2D with stride of 2 (#2337) | Ivor Wanders | 2024-07-17 | 2 | -2/+99 |
| | * Add gradient test for conv_transpose2d with stride of 2. * Swap dilation and stride in ConvTranspose2D backpropagation. Without this, a shape mismatch occurs with a stride of 2 and dilation of 1. * Add further tests of the ConvTranspose2D gradient. Values calculated with torch, minor numerical errors adjusted and commented. | ||||
* | Fix Elu gradient NaN on large input (#2328) | Alexey Gerasev | 2024-07-16 | 1 | -1/+2 |
| | * Fix Elu gradient NaN on large input * Reuse previously computed exp in Elu | ||||
* | Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 3 | -1/+39 |
| | * Add some tracing. * Get the trace to work. | ||||
* | Fix a bug in the metal implementation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6 |
| | |||||
* | Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 4 | -12/+24 |
| | * Use flash-attn in gemma. * Fix for the fast bf16 cublas gemm. * Fix some clippy lints. * Fix another lint. * Proper clippy fix. |