path: root/candle-metal-kernels/src/reduce.metal
* Add the layernorm specialized op. (#2212) [Laurent Mazare, 2024-05-24, 1 file, -0/+81]
    * Add the layernorm cuda kernels.
    * Dedicated layer norm op.
    * Add the slower variant.
    * Plug the cuda implementation.
    * Add the metal variant.
    * Add a dedicated test.
    * Bugfix.
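  The commit only records that a dedicated Metal layer norm kernel was added. As a rough illustration of what such a kernel computes, here is a deliberately simplified sketch (one thread per row, no threadgroup reduction, hypothetical buffer layout and argument names), not the actual candle kernel:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  // Simplified layer norm sketch: each thread normalizes one full row.
  // Assumes the grid is dispatched with exactly one thread per row.
  kernel void layernorm_f32_sketch(
      device const float *src   [[buffer(0)]],
      device const float *gamma [[buffer(1)]],
      device const float *beta  [[buffer(2)]],
      device float       *dst   [[buffer(3)]],
      constant uint      &ncols [[buffer(4)]],
      constant float     &eps   [[buffer(5)]],
      uint row [[thread_position_in_grid]]
  ) {
      device const float *x = src + row * ncols;
      device float       *y = dst + row * ncols;

      // Mean of the row.
      float mean = 0.0f;
      for (uint i = 0; i < ncols; i++) mean += x[i];
      mean /= float(ncols);

      // Variance of the row.
      float var = 0.0f;
      for (uint i = 0; i < ncols; i++) {
          float d = x[i] - mean;
          var += d * d;
      }
      var /= float(ncols);

      // Normalize, then scale and shift.
      float inv_std = 1.0f / sqrt(var + eps);
      for (uint i = 0; i < ncols; i++) {
          y[i] = (x[i] - mean) * inv_std * gamma[i] + beta[i];
      }
  }
  ```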
* Add the rope THD kernel. (#2014) [Laurent Mazare, 2024-04-05, 1 file, -4/+44]
    * Add the rope THD kernel.
    * Cuda kernel for rope-thd.
    * Add the metal kernels.
    * Add a dedicated test.
* update dtypes checks for several metal operations (#2010) [Thomas Santerre, 2024-04-04, 1 file, -0/+4]
* Minor cleanups in reduce.metal. (#2004) [Laurent Mazare, 2024-04-04, 1 file, -23/+1]
* refactor to reduce the amount of code wrapped in template syntax (#2002) [Thomas Santerre, 2024-04-04, 1 file, -261/+368]
* Contiguous variant of the rope kernel. (#1929) [Laurent Mazare, 2024-03-25, 1 file, -5/+30]
    * Contiguous variant of the rope kernel.
    * Add the cuda kernel.
    * Metal kernel.
* Fast kernels for rotary embeddings. (#1928) [Laurent Mazare, 2024-03-24, 1 file, -0/+23]
    * Fast kernels for rotary embeddings.
    * Add a test for the fast CPU kernel.
    * Rope cuda bindings.
    * Cuda kernel.
    * Metal kernel (part 1).
    * Cuda kernels.
    * Finish the metal kernel.
    * Use the new kernels in the quantized example.
    * Fix warning.
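  As context for the rotary-embedding entries, the sketch below shows the core rotation an interleaved rope kernel applies to each (x0, x1) pair. The buffer layout, argument names, and per-pair cos/sin indexing are illustrative assumptions, not candle's actual kernel interface:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  // Hypothetical interleaved rope sketch: one thread per (x0, x1) pair.
  // Assumes cos_buf/sin_buf are already expanded to one value per pair;
  // real kernels typically index them by (position, dim/2) instead.
  kernel void rope_interleaved_sketch(
      device const float *src     [[buffer(0)]],
      device const float *cos_buf [[buffer(1)]],
      device const float *sin_buf [[buffer(2)]],
      device float       *dst     [[buffer(3)]],
      constant uint      &n_pairs [[buffer(4)]],
      uint pair [[thread_position_in_grid]]
  ) {
      if (pair >= n_pairs) {
          return;
      }
      float c  = cos_buf[pair];
      float s  = sin_buf[pair];
      float x0 = src[2 * pair];
      float x1 = src[2 * pair + 1];
      // Rotate the pair by the angle encoded in (c, s).
      dst[2 * pair]     = x0 * c - x1 * s;
      dst[2 * pair + 1] = x0 * s + x1 * c;
  }
  ```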
* RmsNorm kernel for metal. (#1895) [Laurent Mazare, 2024-03-21, 1 file, -0/+56]
    * RmsNorm kernel for metal.
    * Wrapper for the metal kernel.
    * Get the ops to actually work.
    * Fix, get the tests to pass.
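  For reference, RMS norm scales each row by the reciprocal root-mean-square of its elements. The following is a minimal one-thread-per-row sketch with hypothetical argument names, not the parallel kernel this commit added:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  // Simplified RMS norm sketch: y = x * alpha / sqrt(mean(x^2) + eps).
  // One thread handles one full row; the real kernel parallelizes the
  // row reduction across a threadgroup.
  kernel void rmsnorm_f32_sketch(
      device const float *src   [[buffer(0)]],
      device const float *alpha [[buffer(1)]],
      device float       *dst   [[buffer(2)]],
      constant uint      &ncols [[buffer(3)]],
      constant float     &eps   [[buffer(4)]],
      uint row [[thread_position_in_grid]]
  ) {
      device const float *x = src + row * ncols;
      device float       *y = dst + row * ncols;

      float sum_sq = 0.0f;
      for (uint i = 0; i < ncols; i++) sum_sq += x[i] * x[i];
      float inv_rms = 1.0f / sqrt(sum_sq / float(ncols) + eps);

      for (uint i = 0; i < ncols; i++) y[i] = x[i] * inv_rms * alpha[i];
  }
  ```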
* Use __HAVE_BFLOAT__ to check for bfloat support instead of metal version check (#1540) [ivarflakstad, 2024-01-10, 1 file, -1/+1]
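  A minimal sketch of what gating on compiler support (rather than on the Metal language version) can look like; the kernel body and names are illustrative only:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  #if defined(__HAVE_BFLOAT__)
  // The compiler exposes a native bfloat type, so bfloat kernel variants
  // can be instantiated here without checking __METAL_VERSION__.
  kernel void copy_bf16_sketch(
      device const bfloat *src [[buffer(0)]],
      device bfloat       *dst [[buffer(1)]],
      uint id [[thread_position_in_grid]]
  ) {
      dst[id] = src[id];
  }
  #endif
  ```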
* Metal: more u8/u32 (#1502) [Gonzalo, 2023-12-29, 1 file, -0/+5]
    * Adds more metal u8
    * Metal: more u32
* Metal: i64 basic support (#1495) [Gonzalo, 2023-12-29, 1 file, -0/+9]
    * Adds basic metal i64 support
    * metal copy i64
* Finish reduce kernels. [Nicolas Patry, 2023-12-17, 1 file, -10/+153]
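  As a sketch of the general technique behind such reduce kernels, the following hypothetical row-sum kernel accumulates a strided slice per thread and then combines the partials with a threadgroup tree reduction. Names, layout, and block size are assumptions, not the kernels added here:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  #define BLOCK_SIZE 256

  // Hypothetical row-sum reduction: one threadgroup of BLOCK_SIZE threads
  // per row. Assumes the dispatch uses exactly BLOCK_SIZE threads per
  // threadgroup and one threadgroup per row.
  kernel void sum_rows_sketch(
      device const float *src   [[buffer(0)]],
      device float       *dst   [[buffer(1)]],
      constant uint      &ncols [[buffer(2)]],
      uint row [[threadgroup_position_in_grid]],
      uint tid [[thread_index_in_threadgroup]]
  ) {
      threadgroup float shared[BLOCK_SIZE];

      // Each thread sums a strided slice of the row.
      float acc = 0.0f;
      for (uint i = tid; i < ncols; i += BLOCK_SIZE) {
          acc += src[row * ncols + i];
      }
      shared[tid] = acc;
      threadgroup_barrier(mem_flags::mem_threadgroup);

      // Tree reduction over the threadgroup partials.
      for (uint s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
          if (tid < s) shared[tid] += shared[tid + s];
          threadgroup_barrier(mem_flags::mem_threadgroup);
      }
      if (tid == 0) dst[row] = shared[0];
  }
  ```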
* Renamed all kernel names. [Nicolas Patry, 2023-12-15, 1 file, -6/+6]
* Fixing softmax. [Nicolas Patry, 2023-12-15, 1 file, -4/+7]
* Fix softmax for long sequences (missing barrier). [Nicolas Patry, 2023-12-14, 1 file, -6/+9]
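  The fix concerns threadgroup synchronization. The hypothetical softmax sketch below illustrates why barriers are needed around each shared-memory reduction step before any thread uses the shared result; it is not the candle kernel itself, and all names and the block size are assumptions:

  ```metal
  #include <metal_stdlib>
  using namespace metal;

  #define BLOCK_SIZE 256

  // Hypothetical row softmax: one threadgroup of BLOCK_SIZE threads per row.
  // Every thread must observe the shared max and shared sum before using
  // them, hence the barriers after each reduction and before reusing
  // `shared` for a different purpose.
  kernel void softmax_row_sketch(
      device const float *src   [[buffer(0)]],
      device float       *dst   [[buffer(1)]],
      constant uint      &ncols [[buffer(2)]],
      uint row [[threadgroup_position_in_grid]],
      uint tid [[thread_index_in_threadgroup]]
  ) {
      threadgroup float shared[BLOCK_SIZE];
      device const float *x = src + row * ncols;
      device float       *y = dst + row * ncols;

      // 1. Per-thread max over a strided slice, then a tree reduction.
      float m = -INFINITY;
      for (uint i = tid; i < ncols; i += BLOCK_SIZE) m = max(m, x[i]);
      shared[tid] = m;
      threadgroup_barrier(mem_flags::mem_threadgroup);
      for (uint s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
          if (tid < s) shared[tid] = max(shared[tid], shared[tid + s]);
          threadgroup_barrier(mem_flags::mem_threadgroup);
      }
      float row_max = shared[0];
      // Barrier before reusing `shared` for the sum, so no thread
      // overwrites it while another is still reading the max.
      threadgroup_barrier(mem_flags::mem_threadgroup);

      // 2. exp(x - max) and per-thread partial sums, then a tree reduction.
      float acc = 0.0f;
      for (uint i = tid; i < ncols; i += BLOCK_SIZE) {
          float e = exp(x[i] - row_max);
          y[i] = e;
          acc += e;
      }
      shared[tid] = acc;
      threadgroup_barrier(mem_flags::mem_threadgroup);
      for (uint s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
          if (tid < s) shared[tid] += shared[tid + s];
          threadgroup_barrier(mem_flags::mem_threadgroup);
      }
      float row_sum = shared[0];

      // 3. Normalize (each thread only touches the elements it wrote).
      for (uint i = tid; i < ncols; i += BLOCK_SIZE) y[i] /= row_sum;
  }
  ```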
* Lots of updates including some stack of command buffers. [nicolas, 2023-12-12, 1 file, -1/+1]
* Starting to fix some tests. [Nicolas Patry, 2023-11-30, 1 file, -76/+80]
    Few fixes.
    Going back on remote metal-rs.
    Reusing a single buffer (for now) to speed things up.
    Adding some half kernels.
    All tests are panicking instead of random failure.
    Putting back f16 index select.
    Add erf.
    Working version for llama2-c.
    Fixes + cache compute_pipeline_state.
    BF16 metal fix.
    Remove some prints.
    new_owned -> new()..to_owned().
    Better batched matmul.
    Metal operational.
    Reuse buffers on our own reference counts.
    Tmp gemm.
    Revert "Tmp gemm." This reverts commit c65f68e98814b65daa596696bda076a73303dd82.
    Interleave committing.
    Speeding up copies using blit.
    Fmt.
    Fmt.
    Remove the assert!
    Fmt all.
    Fixes after big rebase.
    Add softmax for half and bfloat + tests.
    Fixing Llama example + accumulate softmax in float.
* Adding indexing. [Nicolas Patry, 2023-11-20, 1 file, -39/+54]
    Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
* Adding the actual backend [Nicolas Patry, 2023-11-20, 1 file, -0/+124]