path: root/candle-metal-kernels/src/unary.metal
Commit message | Author | Age | Files | Lines
* Fix for metal tanh. (#2475) | Laurent Mazare | 2024-09-13 | 1 | -3/+8
* Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -0/+5
    * add sigmoid op
    * small fix
    * add as a method on `Tensor`
    * implement gradient calculation for sigmoid
    * add sigmoid tests
    * we should have a specialized op for this
    * fix clippy
    * fix clippy 2
    * Revert all previous commits in favor of a `CustomOp` based solution
    * use `CustomOp1` implementation
    * fix rustfmt
    * experimental add metal impl
    * add cuda kernel impl
    * fix fmt
    * Add a test + reduce some cuda duplication.
    ---------
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* Metal Unary: Add benchmarks and process kernels in a tile based fashion (#2056) | Thomas Santerre | 2024-04-21 | 1 | -2/+15
    * add basic unary bench for sqrt
    * process unary commands in tiles of 4
    * re-enable all benchmarks
    * rename helper to unary
    * modify approach to split up tiled and non-tiled operations
    * undo bench ignore for other tests
    * update tile size to 2
    * only perform the optimization on the contiguous even numbered element case
* Add missing bfloat unary strided kernels and fix typo (#2058) | ivarflakstad | 2024-04-14 | 1 | -1/+1
* Optimize copy-2d for metal. (#2024) | Laurent Mazare | 2024-04-07 | 1 | -12/+8
    * Optimize copy-2d for metal.
    * Add a hacky stopping rule for moondream.
* Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 1 | -0/+2
    * add the sign unary operator
    * remove uneeded import
    * remove uneeded import
    * undo formatting
    * undo formatting
    * remove unnecessary redefintion
    * allow gradient to flow through for sign and round
    * fix cpu ops to ensure that negzero and positive zero are handled properly
    * clippy fixes
    * Properly avoid gradient tracking.
    * Use a branchless version.
    ---------
    Co-authored-by: laurent <laurent.mazare@gmail.com>
* Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 1 | -0/+27
    * Add a specialized kernel for copy2d.
    * Move the cat operations.
    * Avoid transpositions in cat.
    * Bugfix.
    * Bugfix for the cuda kernel.
    * Add a benchmark.
    * Add more testing.
    * Test fix.
    * Faster kernel.
    * Add the missing kernel.
    * Tweak the test.
    * Add a metal kernel.
    * Fix for the metal kernel.
    * Get the tests to pass on metal.
    * Also use this opportunity to fix the metal kernel for ELU.
    * Add some bf16 kernels.
    * Clippy fixes.
* feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 1 | -0/+5
    * feat: add silu activation function
    * use silu/arg in grad
    * update candle-nn
    * use node
* Quantized GGUF style (#1523) | Nicolas Patry | 2024-01-17 | 1 | -1/+1
    * Metal quantized modifications proposal.
      - Add a device param, wherever needed.
      - Create new QMetal storage thing that implements QuantizedType.
      - Update everywhere needed.
      Fix Python.
      Fixing examples.
      Fix: fmt + clippy + stub.
      Moving everything around.
      Only missing the actual implems.
      Fixing everything + adding dequantized kernels.
      More work.
      Fixing matmul.
      Fmt + Clippy
      Some clippy fixes.
      Working state.
      Q2K Metal -> Bugged (also present in GGML).
      Q4K CPU -> Bugged (present previously, new test catch it).
      Q5K CPU -> Bugged (present previously).
      Q8_1 Both -> Never really implemented it seems
      Q8K metal -> Never implemented in metal
      Fixing Q2K bug (present in ggml).
    * Cleanup.
    * Fix the rebase.
    * Removing the fences speeds everything up and *is* correct this time...
    * Cleanup the fence.
    * After rebase.
    * Bad code removal.
    * Rebase after phi2 merge + fix replit default to CPU.
    * Making the CI happy.
    * More happy tests.
    ---------
    Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
* Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540) | ivarflakstad | 2024-01-10 | 1 | -1/+1
* Add relu kernel for metal (#1488) | Juarez Bochi | 2024-01-10 | 1 | -0/+8
    * Add relu kernel for metal
    * Copy error messages proposed in #1491
    * Revert non relu changes
    * Fix name changes
    * Fix the last of us (:
    * Fix copy and paste mistakes
    * Fix typo
    * Revert order changes
    * Revert order change
    * Add deleted functions back
    * Run rustfmt
* Metal: support unary abs (#1503) | Gonzalo | 2023-12-30 | 1 | -0/+1
    * Metal: support unary abs
    * cargo fmt
* Metal: i64 basic support (#1495) | Gonzalo | 2023-12-29 | 1 | -0/+4
    * Adds basic metal i64 support
    * metal copy i64
* fix bad pattern matching and function name | Baye Dieng | 2023-12-29 | 1 | -3/+3
* add urecip op to metal backend | Baye Dieng | 2023-12-28 | 1 | -2/+5
* Adding the convolutions (1d + 2d) to candle on metal. | Nicolas Patry | 2023-12-21 | 1 | -6/+6
* Renamed all kernel names. | Nicolas Patry | 2023-12-15 | 1 | -6/+6
* Lots of updates including some stack of command buffers. | nicolas | 2023-12-12 | 1 | -2/+4
* Fix gelu for large x | Juarez Bochi | 2023-12-06 | 1 | -3/+8
* Starting to fix some tests. | Nicolas Patry | 2023-11-30 | 1 | -1/+47
    Few fixes.
    Going back on remote metal-rs.
    Reusing a single buffer (for now) to speed things up.
    Adding some half kernels.
    All tests are panicking instead of random failure.
    Putting back f16 index select.
    Add erf.
    Working version for llama2-c.
    Fixes + cache compute_pipeline_state.
    BF16 metal fix.
    Remove some prints.
    new_owned -> new()..to_owned().
    Better batched matmul.
    Metal operational.
    Reuse buffers on our own reference counts.
    Tmp gemm.
    Revert "Tmp gemm."
    This reverts commit c65f68e98814b65daa596696bda076a73303dd82.
    Interleave committing.
    Speeding up copies using blit.
    Fmt.
    Fmt.
    Remove the assert!
    Fmt all.
    Fixes after big rebase.
    Add softmax for half and bfloat + tests
    Fixing Llama example + accumulate softmax in float.
* Cleanup fixed a few ops removed debugging scaffolding. | Nicolas Patry | 2023-11-20 | 1 | -0/+2
* Fixing the kernels + launches to make them faster. | Nicolas Patry | 2023-11-20 | 1 | -14/+10
    Cool work by @ivarflakstad
    Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
* Adding the actual backend | Nicolas Patry | 2023-11-20 | 1 | -0/+82