path: root/candle-metal-kernels
Commit message | Author | Age | Files | Lines
...
* More flexible matmul contiguity checks. (#1949) | Laurent Mazare | 2024-03-27 | 1 | -4/+8
|     * More flexible matmul contiguity checks.
|     * Also relax the checks on the metal side.
* Extend supported dtypes for metal (im2col & upsample_2d) (#1938) | Thomas Santerre | 2024-03-26 | 1 | -0/+8
|     * update im2col dtype implementations
|     * update dtypes for upsample
* Contiguous variant of the rope kernel. (#1929) | Laurent Mazare | 2024-03-25 | 2 | -5/+73
|     * Contiguous variant of the rope kernel.
|     * Add the cuda kernel.
|     * Metal kernel.
* Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 2 | -0/+64
|     * Fast kernels for rotary embeddings.
|     * Add a test for the fast CPU kernel.
|     * Rope cuda bindings.
|     * Cuda kernel.
|     * Metal kernel (part 1).
|     * Cuda kernels.
|     * Finish the metal kernel.
|     * Use the new kernels in the quantized example.
|     * Fix warning.
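Rotary embeddings rotate each pair of features in an attention head by an angle that depends on the token's position. The following is a minimal CPU sketch of that arithmetic, assuming an interleaved pairing (x[2i], x[2i+1]), a contiguous (batch * heads, seq_len, head_dim) layout, and precomputed cos/sin tables of shape (seq_len, head_dim / 2). The function name and signature are hypothetical, not candle's API, and the CUDA/Metal kernels may pair features differently.

```rust
/// Rotary position embedding over a contiguous buffer laid out as
/// (bh = batch * heads, t = seq_len, d = head_dim), interleaved pairing.
/// `cos` and `sin` each hold t * d / 2 values. Hypothetical signature.
pub fn rope_interleaved(
    x: &[f32],
    cos: &[f32],
    sin: &[f32],
    bh: usize,
    t: usize,
    d: usize,
) -> Vec<f32> {
    assert_eq!(d % 2, 0, "head_dim must be even");
    assert_eq!(x.len(), bh * t * d);
    assert_eq!(cos.len(), t * d / 2);
    assert_eq!(sin.len(), t * d / 2);
    let half = d / 2;
    let mut out = vec![0f32; x.len()];
    // Each row is one head_dim vector; its sequence position is row % t.
    for row in 0..bh * t {
        let pos = row % t;
        for i in 0..half {
            let (c, s) = (cos[pos * half + i], sin[pos * half + i]);
            let base = row * d + 2 * i;
            let (x0, x1) = (x[base], x[base + 1]);
            // 2-D rotation of the pair by the angle whose cos/sin are (c, s).
            out[base] = x0 * c - x1 * s;
            out[base + 1] = x0 * s + x1 * c;
        }
    }
    out
}
```

Because the input is assumed contiguous, each row's position is simply `row % seq_len`, with no stride lookups.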
* Add support for strided index-select on Metal (#1909) | Thomas Santerre | 2024-03-22 | 3 | -15/+119
|     * initial implementation
|     * use correct index, but still not breaking like it should have...
|     * fix test
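Supporting a strided source means the kernel cannot assume that rows are adjacent in memory; each element has to be addressed through the view's offset and strides. A minimal sketch of selecting rows from a 2-D strided view, assuming f32 data and a hypothetical function name:

```rust
/// Index-select along dim 0 of a 2-D view described by (offset, strides)
/// into a flat storage buffer. The output is contiguous (ids.len(), cols).
pub fn index_select_dim0(
    storage: &[f32],
    offset: usize,
    strides: [usize; 2],
    cols: usize,
    ids: &[usize],
) -> Vec<f32> {
    let mut out = Vec::with_capacity(ids.len() * cols);
    for &row in ids {
        for col in 0..cols {
            // Strided addressing: the source view need not be contiguous.
            out.push(storage[offset + row * strides[0] + col * strides[1]]);
        }
    }
    out
}
```

With strides `[1, 3]`, a 2 x 3 row-major buffer is read as its 3 x 2 transpose without copying it first.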
* Add support for conv_transpose2d on Metal backend (#1903) | Thomas Santerre | 2024-03-21 | 2 | -0/+144
|     * add support for conv transpose 2d and add benchmark for float types
|     * update bench calculation
|     * enable testing all conv operations on metal
* RmsNorm kernel for metal. (#1895) | Laurent Mazare | 2024-03-21 | 2 | -0/+114
|     * RmsNorm kernel for metal.
|     * Wrapper for the metal kernel.
|     * Get the ops to actually work.
|     * Fix, get the tests to pass.
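RMS normalization rescales each row by the reciprocal of its root mean square, y = x / sqrt(mean(x^2) + eps) * alpha, where alpha is a learned per-feature scale. A plain-Rust reference over a contiguous (rows, dim) buffer, assuming f32 data; the name and signature are hypothetical:

```rust
/// RMS normalization over the last dimension of a contiguous (rows, dim) buffer.
pub fn rms_norm(x: &[f32], alpha: &[f32], dim: usize, eps: f32) -> Vec<f32> {
    assert_eq!(alpha.len(), dim);
    assert_eq!(x.len() % dim, 0);
    let mut out = Vec::with_capacity(x.len());
    for row in x.chunks(dim) {
        // Mean of squares over the row, then one reciprocal sqrt per row.
        let mean_sq = row.iter().map(|v| v * v).sum::<f32>() / dim as f32;
        let scale = 1.0 / (mean_sq + eps).sqrt();
        // Normalize and apply the per-feature scale.
        out.extend(row.iter().zip(alpha).map(|(v, a)| v * scale * a));
    }
    out
}
```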
* Add support for conv_transpose1d for metal backend (#1874) | Thomas Santerre | 2024-03-19 | 3 | -0/+347
|     * first attempt
|     * progress
|     * integrate into metal backend
|     * finish and get test passing
|     * add other dtype support
|     * update transpose1d dtypes supported
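A transposed convolution can be computed by scattering: each input element adds `x[i] * w[j]` into output position `i * stride + j * dilation - padding`, and the output length is `(l_in - 1) * stride - 2 * padding + dilation * (k - 1) + output_padding + 1` (the formula PyTorch documents for `ConvTranspose1d`). A single-channel sketch with a hypothetical name, ignoring batches and channel mixing:

```rust
/// Single-channel transposed 1-D convolution, scatter formulation.
pub fn conv_transpose1d(
    x: &[f32],
    w: &[f32],
    stride: usize,
    padding: usize,
    dilation: usize,
    output_padding: usize,
) -> Vec<f32> {
    assert!(!x.is_empty() && !w.is_empty());
    let (l_in, k) = (x.len(), w.len());
    // Output length before cropping `padding` elements from each side.
    let full = (l_in - 1) * stride + dilation * (k - 1) + output_padding + 1;
    let l_out = full
        .checked_sub(2 * padding)
        .expect("padding too large for this input");
    let mut out = vec![0f32; l_out];
    for (i, &xi) in x.iter().enumerate() {
        for (j, &wj) in w.iter().enumerate() {
            let pos = (i * stride + j * dilation) as isize - padding as isize;
            // Contributions that land in the cropped border are dropped.
            if pos >= 0 && (pos as usize) < l_out {
                out[pos as usize] += xi * wj;
            }
        }
    }
    out
}
```

On a GPU, a gather formulation (one thread per output element, summing its contributing inputs) avoids concurrent writes to the same output; the scatter form is simply easier to read.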
* Add avg_pool2d metal implementation for the metal backend (#1869) | Thomas Santerre | 2024-03-18 | 3 | -13/+194
|     * implement metal avg pool 2d
|     * fix
|     * add suggested precision workaround for the accumulator
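The avg_pool2d commit mentions a precision workaround for the accumulator. A common form of that idea is to sum the window in a wider type than the input and only round back at the end. This sketch assumes a single (h, w) plane with no padding and uses f64 accumulation for f32 data as a stand-in (stable Rust has no built-in f16/bf16); it is not the Metal kernel's code, and the name is hypothetical.

```rust
/// 2-D average pooling over one (h, w) plane, no padding.
/// kh/kw: kernel size, sh/sw: stride.
pub fn avg_pool2d(
    x: &[f32],
    h: usize,
    w: usize,
    kh: usize,
    kw: usize,
    sh: usize,
    sw: usize,
) -> Vec<f32> {
    assert_eq!(x.len(), h * w);
    assert!(kh >= 1 && kw >= 1 && kh <= h && kw <= w);
    let (oh, ow) = ((h - kh) / sh + 1, (w - kw) / sw + 1);
    let mut out = Vec::with_capacity(oh * ow);
    for oy in 0..oh {
        for ox in 0..ow {
            // Accumulator wider than the element type limits rounding error.
            let mut acc = 0f64;
            for dy in 0..kh {
                for dx in 0..kw {
                    acc += x[(oy * sh + dy) * w + ox * sw + dx] as f64;
                }
            }
            // Round back to the element type only once, after dividing.
            out.push((acc / (kh * kw) as f64) as f32);
        }
    }
    out
}
```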
* Add support for max_pool2d for Metal backend (#1863) | Thomas Santerre | 2024-03-18 | 3 | -1/+353
|     * first pass at implementation of maxpool2d
|     * Add definitions for other dtypes
|     * add tests for other dtypes
|     * Cosmetic tweaks + re-enable maxpool2d tests for metal.
|     ---------
|     Co-authored-by: Laurent <laurent.mazare@gmail.com>
* add test for index add and add missing match statements (#1862) | Thomas Santerre | 2024-03-17 | 2 | -15/+139
|
* add support for casting between all datatypes (#1860) | Thomas Santerre | 2024-03-17 | 2 | -96/+211
|
* Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 3 | -1/+78
|     * Add a specialized kernel for copy2d.
|     * Move the cat operations.
|     * Avoid transpositions in cat.
|     * Bugfix.
|     * Bugfix for the cuda kernel.
|     * Add a benchmark.
|     * Add more testing.
|     * Test fix.
|     * Faster kernel.
|     * Add the missing kernel.
|     * Tweak the test.
|     * Add a metal kernel.
|     * Fix for the metal kernel.
|     * Get the tests to pass on metal.
|     * Also use this opportunity to fix the metal kernel for ELU.
|     * Add some bf16 kernels.
|     * Clippy fixes.
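For contiguous row-major tensors, concatenation along `dim` reduces to one strided 2-D block copy per input: view each tensor as (outer, inner), where outer is the product of the dimensions before `dim`, and copy its rows into a column slice of the output. A minimal CPU sketch of that idea, assuming f32 data and matching non-`dim` dimensions; `copy2d` and `cat_contiguous` here are hypothetical helpers, not candle's kernel signatures.

```rust
/// Copy a (d1, d2) block: row i of the source starts at src_o + i * src_s,
/// row i of the destination at dst_o + i * dst_s.
#[allow(clippy::too_many_arguments)]
pub fn copy2d(
    src: &[f32],
    dst: &mut [f32],
    d1: usize,
    d2: usize,
    src_s: usize,
    dst_s: usize,
    src_o: usize,
    dst_o: usize,
) {
    for i in 0..d1 {
        let (s, d) = (src_o + i * src_s, dst_o + i * dst_s);
        dst[d..d + d2].copy_from_slice(&src[s..s + d2]);
    }
}

/// Concatenate contiguous tensors along `dim` with one copy2d per input.
pub fn cat_contiguous(parts: &[(&[f32], &[usize])], dim: usize) -> (Vec<f32>, Vec<usize>) {
    let first = parts[0].1;
    let outer: usize = first[..dim].iter().product();
    let mut out_shape = first.to_vec();
    out_shape[dim] = parts.iter().map(|(_, s)| s[dim]).sum::<usize>();
    // Elements per outer row in the output and the running column offset.
    let dst_row: usize = out_shape[dim..].iter().product();
    let mut dst = vec![0f32; outer * dst_row];
    let mut col = 0;
    for (data, shape) in parts {
        let src_row: usize = shape[dim..].iter().product();
        copy2d(data, &mut dst, outer, src_row, src_row, dst_row, 0, col);
        col += src_row;
    }
    (dst, out_shape)
}
```

No transposition is needed: each input is read in order, and only the destination row stride differs from the source.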
* Add support for index u8/i64 and input f16/bf16 scatter-add on metal (#1849) | Thomas Santerre | 2024-03-17 | 2 | -2/+115
|     * add support and tests for scatter add on metal
|     * add support for all datatypes
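Scatter-add accumulates source values into a destination at the positions given by an index tensor, so repeated indices sum. The PR widens the accepted index types (u8, i64); the sketch below makes the index type generic to mirror that, assuming a 1-D destination and f32 values. The function name is hypothetical.

```rust
use std::convert::TryInto;

/// 1-D scatter-add: dst[ids[i]] += src[i]. The index type is generic
/// (e.g. u8, u32, i64); negative or oversized indices panic.
pub fn scatter_add_1d<I>(dst: &mut [f32], ids: &[I], src: &[f32])
where
    I: Copy + TryInto<usize>,
    I::Error: std::fmt::Debug,
{
    assert_eq!(ids.len(), src.len());
    for (&id, &v) in ids.iter().zip(src) {
        let i: usize = id
            .try_into()
            .expect("index must be non-negative and fit in usize");
        dst[i] += v;
    }
}
```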
* Bump the crate versions to 0.4.2. (#1821) | Laurent Mazare | 2024-03-08 | 1 | -1/+1
|
* Metal random-generation bug fixes (#1811) | Niklas Hallqvist | 2024-03-08 | 2 | -12/+24
|     * use_resource API misunderstood. It is not additive. Several usages must be bit-ORed together.
|     * The seeding was incorrect and used the address instead of the value of the passed in seed.
|     * Add a check that likely exhibits failure to update the seed between generation of random tensors.
|     * Buffer overrun, the length given to the std::ptr::copy call was in bytes, and not 32-bit units.
|     * By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted.
|     * Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted."
|       This reverts commit d7302de9
|       Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119
|     * The Metal random kernel failed to set element N/2 of tensors with N elements, N being even. The reason was that all threads but thread 0 all created 2 random samples, but thread 0 only one, i.e. an odd number. In order to produce an even number of samples, the early termination of thread 0 should only ever occur for odd sized tensors.
|     * Add a test catching any deterministic tensor element in rand and randn output.
|     ---------
|     Co-authored-by: niklas <niklas@appli.se>
|     Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
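One of the fixes above concerns the length argument of `std::ptr::copy`, whose `count` is measured in elements of `T`, not bytes. The snippet isolates that unit mismatch; the buffers and the function name are hypothetical and are not candle's seed-handling code.

```rust
use std::ptr;

/// Copy `src` into the front of `dst`. `ptr::copy`'s `count` is a number
/// of `u32` elements here, not a number of bytes.
pub fn copy_words(dst: &mut [u32], src: &[u32]) {
    assert!(src.len() <= dst.len(), "destination too small");
    // Correct: count = src.len() elements. Passing
    // src.len() * std::mem::size_of::<u32>() (a byte count) would copy four
    // times too many elements and run past the end of both buffers.
    unsafe { ptr::copy(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
}
```

In safe code, `dst[..src.len()].copy_from_slice(src)` expresses the same copy with bounds checking.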
* Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -1/+1
|     * Fix the block size for some cuda kernels.
|     * Bump the version number to 0.4.1.
* feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 3 | -1/+25
|     * feat: add silu activation function
|     * use silu/arg in grad
|     * update candle-nn
|     * use node
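SiLU (also called swish) is silu(x) = x * sigmoid(x) = x / (1 + exp(-x)). Its derivative, sigmoid(x) * (1 + x * (1 - sigmoid(x))), is one common way to write the backward pass; candle's gradient code may compute it differently. A scalar f32 sketch with hypothetical names:

```rust
/// Logistic sigmoid.
pub fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// SiLU / swish activation: x * sigmoid(x).
pub fn silu(x: f32) -> f32 {
    x * sigmoid(x)
}

/// d/dx silu(x) = sigmoid(x) * (1 + x * (1 - sigmoid(x))).
pub fn silu_grad(x: f32) -> f32 {
    let s = sigmoid(x);
    s * (1.0 + x * (1.0 - s))
}
```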
* Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -1/+1
|
* Merge pull request #1606 from FL33TW00D/feature/larger-batches | Christopher Fleetwood | 2024-01-29 | 2 | -7/+6
|\
| |     fix: larger batches
| * chore: final | FL33TW00D | 2024-01-22 | 2 | -15/+10
| |
| * chore: actual fix | FL33TW00D | 2024-01-19 | 2 | -2/+3
| |
| * chore: switch to buffer | FL33TW00D | 2024-01-19 | 2 | -10/+14
| |
| * fix: larger batches | FL33TW00D | 2024-01-18 | 2 | -7/+6
| |
* | Merge pull request #1533 from huggingface/ivarflakstad/metal-prng | ivarflakstad | 2024-01-22 | 3 | -4/+402
|\ \
| |/
|/|
| * Revert public EncoderParam | Ivar Flakstad | 2024-01-17 | 1 | -1/+1
| |
| * Merge branch 'main' into ivarflakstad/metal-prng | Ivar Flakstad | 2024-01-17 | 4 | -84/+5300
| |\
| * | Update metal random kernel and set_seed method | Ivar Flakstad | 2024-01-17 | 1 | -8/+10
| | |     * set_seed via buffer content pointer copy + did_modify_range
| | |     * ensure random.metal kernel does not write outside of buffer range when tid==0
| * | Seed should be updated by random kernel result. | Ivar Flakstad | 2024-01-15 | 3 | -20/+48
| | |
| * | Merge branch 'main' into ivarflakstad/metal-prng | Ivar Flakstad | 2024-01-14 | 2 | -30/+50
| |\ \
| * | | fmt | Ivar Flakstad | 2024-01-12 | 1 | -9/+29
| | | |
| * | | Merge branch 'main' into ivarflakstad/metal-prng | Ivar Flakstad | 2024-01-12 | 9 | -24/+206
| |\ \ \
| * \ \ \ Merge branch 'main' into ivarflakstad/metal-prng | Ivar Flakstad | 2024-01-07 | 6 | -8/+77
| |\ \ \ \
| * | | | | Gaussian normal distribution of PRNG via Box-Muller transform | Ivar Flakstad | 2024-01-07 | 3 | -86/+178
| | | | | |
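The Box-Muller transform turns two independent uniform samples, u1 in (0, 1] and u2 in [0, 1), into two independent standard-normal samples: r = sqrt(-2 ln u1), z0 = r cos(2 pi u2), z1 = r sin(2 pi u2). Because it yields samples in pairs, a kernel built on it naturally produces two values per evaluation. A scalar f32 sketch with hypothetical names; the Metal kernel's handling of u1 = 0 and its output layout may differ.

```rust
use std::f32::consts::PI;

/// Box-Muller: (u1, u2) uniform -> (z0, z1) independent N(0, 1).
pub fn box_muller(u1: f32, u2: f32) -> (f32, f32) {
    // u1 = 0 would make ln(u1) = -inf, so it is excluded.
    assert!(u1 > 0.0 && u1 <= 1.0, "u1 must be in (0, 1]");
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * PI * u2;
    (r * theta.cos(), r * theta.sin())
}

/// Shift and scale a standard-normal pair to mean `mu`, std `sigma`.
pub fn normal_pair(u1: f32, u2: f32, mu: f32, sigma: f32) -> (f32, f32) {
    let (z0, z1) = box_muller(u1, u2);
    (mu + sigma * z0, mu + sigma * z1)
}
```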
| * | | | | Implement hybrid Tausworthe + LCG pseudo random number generator in metal | Ivar Flakstad | 2024-01-05 | 3 | -4/+264
| | | | | |
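A hybrid Tausworthe + LCG generator XORs several Tausworthe (shift-register) components with a linear congruential generator. The sketch follows the widely cited formulation from GPU Gems 3, chapter 37, which combines L'Ecuyer's taus88 components with a 32-bit LCG; it is a CPU illustration under that assumption, and the Metal kernel's constants, seeding, and float conversion may differ. All names are hypothetical.

```rust
/// One step of a Tausworthe component with shifts (s1, s2, s3) and mask m.
fn taus_step(z: u32, s1: u32, s2: u32, s3: u32, m: u32) -> u32 {
    let b = ((z << s1) ^ z) >> s2;
    ((z & m) << s3) ^ b
}

/// One step of a 32-bit LCG; arithmetic wraps modulo 2^32.
fn lcg_step(z: u32, a: u32, c: u32) -> u32 {
    a.wrapping_mul(z).wrapping_add(c)
}

/// Three taus88 components XOR-combined with an LCG.
pub struct HybridTaus {
    z: [u32; 4],
}

impl HybridTaus {
    pub fn new(seed: [u32; 4]) -> Self {
        // taus88 needs z0 > 1, z1 > 7, z2 > 15; smaller seeds can make a
        // component collapse to zero.
        assert!(seed[0] > 1 && seed[1] > 7 && seed[2] > 15, "seed too small");
        HybridTaus { z: seed }
    }

    pub fn next_u32(&mut self) -> u32 {
        self.z[0] = taus_step(self.z[0], 13, 19, 12, 0xFFFF_FFFE);
        self.z[1] = taus_step(self.z[1], 2, 25, 4, 0xFFFF_FFF8);
        self.z[2] = taus_step(self.z[2], 3, 11, 17, 0xFFFF_FFF0);
        self.z[3] = lcg_step(self.z[3], 1_664_525, 1_013_904_223);
        self.z[0] ^ self.z[1] ^ self.z[2] ^ self.z[3]
    }

    /// Uniform sample in [0, 1): divide by 2^32 in f64 so the result stays below 1.
    pub fn next_f64(&mut self) -> f64 {
        self.next_u32() as f64 / 4_294_967_296.0
    }
}
```

The LCG component is cheap and has full period modulo 2^32, while the XOR with the Tausworthe components masks the weak low-order bits an LCG alone would expose.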
* | | | | | Merge pull request #1602 from mimiquate/fix-metal-kernel-type | ivarflakstad | 2024-01-18 | 1 | -1/+1
|\ \ \ \ \ \
| |_|_|_|_|/
|/| | | | |
| | | | | |     Metal: Use uint8_t as output type in int64_t binary op kernel
| * | | | | Fixes metal kernel u8 type | Gonzalo | 2024-01-17 | 1 | -1/+1
| | |_|_|/
| |/| | |
* / | | | Quantized GGUF style (#1523) | Nicolas Patry | 2024-01-17 | 4 | -75/+5295
|/ / / /
| | | |     * Metal quantized modifications proposal.
| | | |       - Add a device param, wherever needed.
| | | |       - Create new QMetal storage thing that implements QuantizedType.
| | | |       - Update everywhere needed.
| | | |       Fix Python.
| | | |       Fixing examples.
| | | |       Fix: fmt + clippy + stub.
| | | |       Moving everything around.
| | | |       Only missing the actual implems.
| | | |       Fixing everything + adding dequantized kernels.
| | | |       More work.
| | | |       Fixing matmul.
| | | |       Fmt + Clippy
| | | |       Some clippy fixes.
| | | |       Working state.
| | | |       Q2K Metal -> Bugged (also present in GGML).
| | | |       Q4K CPU -> Bugged (present previously, new test catch it).
| | | |       Q5K CPU -> Bugged (present previously).
| | | |       Q8_1 Both -> Never really implemented it seems
| | | |       Q8K metal -> Never implemented in metal
| | | |       Fixing Q2K bug (present in ggml).
| | | |     * Cleanup.
| | | |     * Fix the rebase.
| | | |     * Removing the fences speeds everything up and *is* correct this time...
| | | |     * Cleanup the fence.
| | | |     * After rebase.
| | | |     * Bad code removal.
| | | |     * Rebase after phi2 merge + fix replit default to CPU.
| | | |     * Making the CI happy.
| | | |     * More happy tests.
| | | |     ---------
| | | |     Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
* | | | Metal: Activate bfloat affine and add benchmark (#1543) | ivarflakstad | 2024-01-12 | 1 | -7/+7
| | | |     * Use cfg to separate benchmark results based on features
| | | |     * Add bfloat affine and benchmarks
| | | |     * Fix flops calculation
| | | |     * Remove allow pragma
| | | |     * Avoid some unnecessary returns.
| | | |     * Improve benchmarks layout
| | | |     ---------
| | | |     Co-authored-by: Laurent <laurent.mazare@gmail.com>
| | | |     Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* | | | Metal: f16 and bf16 where_cond + benchmark (#1545) | ivarflakstad | 2024-01-12 | 1 | -23/+43
| |_|/
|/| |
| | |     * Use cfg to separate benchmark results based on features
| | |     * Add metal where_cond for f16 and bf16. Add benchmark
| | |     * Remove allow pragma
| | |     * Avoid some unnecessary returns.
| | |     * Improve benchmarks layout
| | |     * Updated feature separated benchmarks
| | |     ---------
| | |     Co-authored-by: Laurent <laurent.mazare@gmail.com>
* | | remove metal version check | Baye Dieng | 2024-01-11 | 1 | -2/+0
| | |
* | | close ifdef | Baye Dieng | 2024-01-11 | 1 | -1/+1
| | |
* | | feat(bf16): add cast support + tests for cast + bin ops (#1524) | Kyle McCarthy | 2024-01-11 | 4 | -15/+191
| | |
* | | Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540) | ivarflakstad | 2024-01-10 | 6 | -6/+6
| | |
* | | Add relu kernel for metal (#1488) | Juarez Bochi | 2024-01-10 | 2 | -2/+10
| |/
|/|
| |     * Add relu kernel for metal
| |     * Copy error messages proposed in #1491
| |     * Revert non relu changes
| |     * Fix name changes
| |     * Fix the last of us (:
| |     * Fix copy and paste mistakes
| |     * Fix typo
| |     * Revert order changes
| |     * Revert order change
| |     * Add deleted functions back
| |     * Run rustfmt
* | Adding bfloat16 support for the cast kernels. (#1520) | Nicolas Patry | 2024-01-04 | 1 | -0/+2
| |
* | Metal: support unary abs (#1503) | Gonzalo | 2023-12-30 | 2 | -1/+5
| |     * Metal: support unary abs
| |     * cargo fmt
* | Metal: more u8/u32 (#1502) | Gonzalo | 2023-12-29 | 4 | -4/+17
| |     * Adds more metal u8
| |     * Metal: more u32
* | Metal: i64 basic support (#1495) | Gonzalo | 2023-12-29 | 6 | -1/+48
| |     * Adds basic metal i64 support
| |     * metal copy i64
* | fix bad pattern matching and function name | Baye Dieng | 2023-12-29 | 2 | -4/+4
| |