path: root/candle-core
Commit message | Author | Age | Files | Lines
* Bump the ug dependency. (#2720) | Laurent Mazare | 2025-01-16 | 1 | -1/+1
    * Bump the ug dependency.
    * Fix some test.
    * Fix the ug test.
* Clippy fixes for 1.84. (#2710) | Laurent Mazare | 2025-01-10 | 1 | -4/+1
* Fix a cuda warning. (#2693) | Laurent Mazare | 2024-12-31 | 1 | -39/+44
* Add a Context trait similar to anyhow::Context. (#2676) | Laurent Mazare | 2024-12-22 | 6 | -16/+76
    * Add a Context trait similar to anyhow::Context.
    * Switch two unwrap to context.
* add scatter add (#2656) | zachcp | 2024-12-01 | 1 | -0/+1
* add u32 - U32 gather (#2653) | zachcp | 2024-11-30 | 1 | -0/+1
* Clippy fixes for the cuda feature. (#2650) | Laurent Mazare | 2024-11-29 | 2 | -11/+11
* Lint fixes introduced with Rust 1.83 (#2646) | Anubhab Bandyopadhyay | 2024-11-28 | 5 | -16/+16
    * Fixes for lint errors introduced with Rust 1.83
    * rustfmt
    * Fix more lints.
    ---------
    Co-authored-by: Laurent <laurent.mazare@gmail.com>
* fix typo (#2606) | Andrei Fajardo | 2024-11-23 | 1 | -1/+1
* 20241118 docs (#2629) | zachcp | 2024-11-19 | 22 | -12/+47
    * module docs
    * varbuilder gguf docs
    * add a link to gguf files
    * small additonal mod doc titles
    * safetensor docs
    * more core docs
    * more module docs in canlde_core
    * 2 more link fixes
* Add max-all/min-all. (#2616) | Laurent Mazare | 2024-11-14 | 1 | -0/+36
* Add some missing index-select metal kernels. (#2613) | Laurent Mazare | 2024-11-12 | 1 | -1/+10
    * Add some missing index-select metal kernels.
    * Make some matrix contiguous pre-matmul.
* Update docs (#2553) | zachcp | 2024-11-11 | 1 | -0/+14
    * add module docs for candle-core
    * doc each of the candle-nn modules and add the links to the doc page
* Add some fast Metal MLX SDPA kernels (#2584) | Eric Buehler | 2024-11-05 | 2 | -0/+12
    * Add some fast Metal MLX SDPA kernels (#32)
    * Sketch the sdpa kernel
    * Add full sdpa kernel,
    * Add test
    * Add vectorized kernel for decoding
    * Update tests
    * Add some docs
    * Fix sdpa_vector names
    * Add softcapping for vectorized sdpa
    * Add softcapping for full sdpa
    * Add support for head dim 32, 96, 256
    * Add support for head dim 32, 96, 256
    * Update docs
    * Add update notice
    * Clippy and format
    * Conditional compilation for bf16
    * Use it in quantized llama
    * Some review comments
    * Use set_params!
    * Remove unused
    * Remove feature
    * Fix metal sdpa for v stride
    * Remove comma
    * Add the dim method to layout and shape.
    ---------
    Co-authored-by: Laurent <laurent.mazare@gmail.com>
* UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 5 | -10/+87
* Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 7 | -2/+137
    * Support for UG kernels.
    * Add a dedicated test.
* Testcases (#2567) | Anubhab Bandyopadhyay | 2024-10-17 | 2 | -3/+278
* Switch to using the MLX matmul by default. (#2547) | Laurent Mazare | 2024-10-06 | 1 | -3/+3
* Fix for cudnn bf16 conv2d. (#2535) | Laurent Mazare | 2024-10-02 | 2 | -10/+14
* Add support for cuda streams. (#2532) | Laurent Mazare | 2024-10-02 | 3 | -0/+24
* Efficient implementation of `Tensor::ones()` for `metal` (#2512) | Anubhab Bandyopadhyay | 2024-10-01 | 2 | -4/+62
    * WIP: hopefully better const impl
    * with GPU
    * More tests on
    * Reverting primitive for
    * Incorporating review changes - added check elem count check in kerner, using for call strategy
    * rustfmt ran
* Cuda quantized mmv bugfix. (#2526) | Laurent Mazare | 2024-10-01 | 1 | -1/+25
* Yet another cuda qmm padding fix. (#2509) | Laurent Mazare | 2024-09-30 | 1 | -25/+55
* Bugfix for the metal elu kernel. (#2490) | Laurent Mazare | 2024-09-21 | 1 | -0/+13
    * Bugfix for the metal elu kernel.
    * Add a test.
* Metal commands refactoring (#2489) | Laurent Mazare | 2024-09-21 | 2 | -99/+113
    * Split out the commands part of the metal device.
    * Make most fields private.
    * Move the allocator back.
    * Rework the encoder provider type.
* Improve error message (#2485) | ivnsch | 2024-09-20 | 1 | -1/+5
* Add a couple cast metal kernels. (#2479) | Laurent Mazare | 2024-09-15 | 1 | -8/+31
* Export TensorIndexer public to candle users (#2477) | Shengtuo Hu | 2024-09-13 | 1 | -1/+1
* Missing metal kernels. (#2474) | Laurent Mazare | 2024-09-12 | 1 | -0/+2
* Hook the MLX matmul kernels in candle-core. (#2473) | Laurent Mazare | 2024-09-12 | 2 | -0/+38
* Use the new MLX kernels to handle the BF16 matmul. (#2470) | Laurent Mazare | 2024-09-11 | 2 | -26/+46
* Complete the missing backticks in the comments (#2469) | hongmengning | 2024-09-11 | 1 | -0/+3
* Update cudarc to 0.12. (#2451) | Laurent Mazare | 2024-08-27 | 2 | -2/+4
    * Update cudarc to 0.12.
    * Some cudnn tweaks.
* Stream tensor (#2429) | Laurent Mazare | 2024-08-17 | 2 | -0/+208
    * Support Minus(u) for arbitrary values of u, e.g. Minus(3).
    * Forces u to be strictly positive.
    * Add StreamTensor.
* Support Minus(u) for arbitrary values of u, e.g. Minus(3). (#2428) | Laurent Mazare | 2024-08-17 | 1 | -0/+4
    * Support Minus(u) for arbitrary values of u, e.g. Minus(3).
    * Forces u to be strictly positive.
* Add documentation examples for `Tensor::i` and `Tensor::narrow` methods (#2308) | Carsten Csiky | 2024-08-10 | 2 | -8/+169
    * Add documentation examples for `Tensor` methods
    * Apply fmt.
    * Cosmetic tweaks.
    ---------
    Co-authored-by: Laurent <laurent.mazare@gmail.com>
* optimize gradient for silu a bit (#2393) | MilkFather | 2024-08-04 | 1 | -2/+2
* Revert the bf16 gemm metal changes for now. (#2386) | Laurent Mazare | 2024-08-01 | 1 | -2/+2
* Add a minimal test for the metal bf16 matmul. (#2381) | Laurent Mazare | 2024-08-01 | 1 | -0/+20
* Enable BF16 on metal. (#2380) | Laurent Mazare | 2024-08-01 | 1 | -0/+1
* Add get_ids to GradStore (#2379) | Takanori MAEHARA | 2024-08-01 | 1 | -0/+5
* Use BF16 on metal when possible. (#2378) | Laurent Mazare | 2024-08-01 | 1 | -0/+16
* Fix log_sum_exp to handle large positive/negative inputs (#2367) | Yun-Jhong Wu | 2024-08-01 | 2 | -6/+34
* Enable the affine kernel for u8/u32. (#2376) | Laurent Mazare | 2024-08-01 | 1 | -0/+2
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 5 | -10/+10
    * Add Llama 3.1 rope
    * Clippy
    * Format
    * Clippy
    * Add support for multiple eos tokens:
    * Untagged either
    * Remove either dep and fix settings.json
    * Make the max positional embeddings configurable
* Fix for backprop in ConvTranspose2D with stride of 2 (#2337) | Ivor Wanders | 2024-07-17 | 2 | -2/+99
    * Add gradient test for conv_transpose2d with stride of 2.
    * Swap dilation and stride in ConvTranspose2D backpropagation.
      Without this, a shape mismatch occurs with a stride of 2 and dilation of 1.
    * Add further tests of the ConvTranspose2D gradient.
      Values calculated with torch, minor numerical errors adjusted and commented.
* Fix Elu gradient NaN on large input (#2328) | Alexey Gerasev | 2024-07-16 | 1 | -1/+2
    * Fix Elu gradient NaN on large input
    * Reuse previously computed exp in Elu
* Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 3 | -1/+39
    * Add some tracing.
    * Get the trace to work.
* Fix a bug in the metal implemtation of col2im1d. (#2284) | Laurent Mazare | 2024-06-22 | 1 | -1/+6
* Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 4 | -12/+24
    * Use flash-attn in gemma.
    * Fix for the fast bf16 cublas gemm.
    * Fix some clippy lints.
    * Fix another lint.
    * Proper clippy fix.