| Commit message | Author | Age | Files | Lines |
* module docs
* varbuilder gguf docs
* add a link to gguf files
* small additional mod doc titles
* safetensor docs
* more core docs
* more module docs in candle_core
* 2 more link fixes
* Add some missing index-select metal kernels.
* Make some matrix contiguous pre-matmul.
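The pre-matmul contiguity fix can be illustrated with a small check: matmul kernels typically assume row-major inputs, so strided views (e.g. transposes) must be copied to a contiguous layout first. A minimal sketch of that check, assuming element strides; the function name is illustrative, not candle's API:

```rust
// Row-major contiguity check from shape and strides (in elements).
// Matmul kernels usually require contiguous operands, so anything
// failing this check gets copied before the kernel launch.
fn is_contiguous(shape: &[usize], strides: &[usize]) -> bool {
    let mut expected = 1;
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        // dims of size 1 can carry any stride without affecting layout
        if dim > 1 && stride != expected {
            return false;
        }
        expected *= dim;
    }
    true
}
```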
* WIP: hopefully better const impl
* with GPU
* More tests on
* Reverting primitive for
* Incorporating review changes - added elem count check in kernel, using for call strategy
* rustfmt ran
* Split out the commands part of the metal device.
* Make most fields private.
* Move the allocator back.
* Rework the encoder provider type.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
* Add some tracing.
* Get the trace to work.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
* Add a metal kernel for col2im1d.
* Enable the col2im variant.
* Bugfix.
* Revert the quantized tweak.
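For reference, col2im1d is the scatter-add inverse of im2col: each column element is accumulated back into its position in the 1d output, which is what makes it useful for transposed convolutions. A single-channel CPU sketch, assuming a `[l_in][k]` layout; the real kernel also handles batch and channel dims:

```rust
// col2im1d: scatter-add columns back into a 1d signal.
// cols is laid out as [l_in][k]; the output length is
// (l_in - 1) * stride + k, matching conv-transpose1d.
fn col2im1d(cols: &[f32], l_in: usize, k: usize, stride: usize) -> Vec<f32> {
    let l_out = (l_in - 1) * stride + k;
    let mut out = vec![0f32; l_out];
    for i in 0..l_in {
        for j in 0..k {
            // overlapping windows accumulate rather than overwrite
            out[i * stride + j] += cols[i * k + j];
        }
    }
    out
}
```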
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
* Add the argsort cuda kernels.
* CPU version of arg-sort.
* Hook the cuda kernel + rework the cpu bits.
* Add some dedicated test.
* Working cuda kernel.
* Metal kernel.
* Metal adjustments.
* Bugfix.
* Use the fast rope in qwen.
* Rework the expert selection in qwen.
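The CPU version of arg-sort reduces to sorting an index vector by the values it points at, which then doubles as the reference the cuda and metal kernels are tested against. A minimal sketch; the function name is an assumption:

```rust
// Return the permutation that sorts `values` ascending.
fn arg_sort_ascending(values: &[f32]) -> Vec<u32> {
    let mut idx: Vec<u32> = (0..values.len() as u32).collect();
    // total_cmp gives a deterministic total order, including NaNs
    idx.sort_by(|&a, &b| values[a as usize].total_cmp(&values[b as usize]));
    idx
}
```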
* Add the storage-ref bits.
* Add the metal implementation.
* add basic unary bench for sqrt
* process unary commands in tiles of 4
* re-enable all benchmarks
* rename helper to unary
* modify approach to split up tiled and non-tiled operations
* undo bench ignore for other tests
* update tile size to 2
* only perform the optimization on the contiguous even numbered element case
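The tiling approach above can be sketched on the CPU side: full tiles of 2 are unrolled, and a scalar tail covers the leftover element, which is why the optimized path only kicks in for the contiguous, even-element case. This is an illustrative stand-in for the metal kernel, not its actual code:

```rust
// Apply sqrt over a contiguous buffer in tiles of 2, with a scalar
// tail for odd lengths. The tiled body lets both lanes be in flight.
fn unary_sqrt_tiled(src: &[f32], dst: &mut [f32]) {
    const TILE: usize = 2;
    let n = src.len();
    let tiled = n - n % TILE;
    for i in (0..tiled).step_by(TILE) {
        dst[i] = src[i].sqrt();
        dst[i + 1] = src[i + 1].sqrt();
    }
    // leftover element when n is odd
    for i in tiled..n {
        dst[i] = src[i].sqrt();
    }
}
```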
* Fix for the batch dim in the quantized matmul example.
* Enable more tests on cuda.
* Add a test for qmm with a batch.
* Fix the zeros-dim test on metal.
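The batch-dim contract being fixed and tested here can be pinned down with a naive reference: a lhs of shape (b, m, k) against a weight of shape (k, n), with the batch dim folded into extra rows. A float stand-in for the quantized path; names and layout are assumptions:

```rust
// Reference batched matmul: lhs (b, m, k) x rhs (k, n) -> (b, m, n),
// all row-major. The batch dim of lhs just extends the row count.
fn batched_matmul(lhs: &[f32], rhs: &[f32], b: usize, m: usize, k: usize, n: usize) -> Vec<f32> {
    let mut out = vec![0f32; b * m * n];
    for bi in 0..b {
        for mi in 0..m {
            for ni in 0..n {
                let mut acc = 0f32;
                for ki in 0..k {
                    acc += lhs[(bi * m + mi) * k + ki] * rhs[ki * n + ni];
                }
                out[(bi * m + mi) * n + ni] = acc;
            }
        }
    }
    out
}
```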
|
| |
|
|
|
|
|
| |
* Add a synchronize method to devices.
* Metal version.
|
| |
|
|
|
|
|
|
|
| |
* Use BufferOffset in the metal backend.
* More BufferOffset usage.
* Use in where-cond.
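The idea behind a BufferOffset is a (buffer, byte offset) pair, so kernels can address a sub-range of an allocation without copying it out first. A stdlib sketch of the shape of such a type; the field and method names are assumptions, and the real type would wrap a Metal buffer rather than a byte slice:

```rust
// A view into a buffer starting at a byte offset. Kernels receive the
// base buffer plus the offset instead of a freshly copied sub-buffer.
struct BufferOffset<'a> {
    buffer: &'a [u8],
    offset_in_bytes: usize,
}

impl<'a> BufferOffset<'a> {
    // The addressable region from the offset to the end of the buffer.
    fn slice(&self) -> &'a [u8] {
        &self.buffer[self.offset_in_bytes..]
    }
}
```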
* Move the metal kernels utils in a separate module.
* Use the BufferOffset for unary ops.
* Fix clippy lints.
* Use the new BufferOffset.
* Adapt the binary ops.
* Affine.
* More ops (powf, elu, cast).
* add the sign unary operator
* remove unneeded import
* remove unneeded import
* undo formatting
* undo formatting
* remove unnecessary redefinition
* allow gradient to flow through for sign and round
* fix cpu ops to ensure that negzero and positive zero are handled properly
* clippy fixes
* Properly avoid gradient tracking.
* Use a branchless version.
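The negzero fix and the branchless version fit in one expression: a comparison-based sign maps both +0.0 and -0.0 (and NaN, where both comparisons are false) to 0.0 without branching, unlike `f64::signum`, which returns ±1.0 for signed zeros. A minimal sketch:

```rust
// Branchless sign: -1.0, 0.0, or 1.0.
// Both +0.0 and -0.0 compare equal to zero, so they yield 0.0.
fn sign(x: f64) -> f64 {
    ((x > 0.) as i8 - (x < 0.) as i8) as f64
}
```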
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Backend refactoring.
* Metal tweaks.
* Move the cudnn module.