path: root/candle-kernels/src/lib.rs
Commit message [Author, Age, Files, Lines +/-]
* Add argsort. (#2132) [Laurent Mazare, 2024-04-27, 1 file, +1/-0]
  - Add the argsort cuda kernels.
  - CPU version of arg-sort.
  - Hook the cuda kernel + rework the cpu bits.
  - Add some dedicated tests.
  - Working cuda kernel.
  - Metal kernel.
  - Metal adjustments.
  - Bugfix.
  - Use the fast rope in qwen.
  - Rework the expert selection in qwen.
* Cuda acceleration for quantized model. (#1754) [Laurent Mazare, 2024-02-25, 1 file, +1/-0]
  - Boilerplate for the quantized cuda support.
  - More basic cuda support.
  - More cuda quantization (quantize on cpu for now).
  - Add the dequantization bit.
  - Start adding some dedicated cuda kernels from llama.cpp.
  - Move the kernel code.
  - Start interfacing with the kernel.
  - Tweak the kernel launch params.
  - Bugfix for quantized metal.
  - Fix some clippy lints.
  - Tweak the launch parameters.
  - Tweak cuda basics to perform a quantized matmul.
  - Perform the dequantization on the cpu + use cublas for matmul.
  - Add the dequantization kernel.
  - Test the qmatmul.
  - More kernels.
  - Matmul-vec kernel.
  - Add a couple kernels.
  - More dequantization kernels.
* Cuda kernels for IndexAdd/ScatterAdd. (#236) [Laurent Mazare, 2023-07-24, 1 file, +1/-1]
  - Skeleton methods for IndexAdd/ScatterAdd.
  - Add a Map2InPlace trait.
  - Add the glue code for the index-add/scatter-add kernels.
  - Tweak the file name: embeddings -> indexing.
  - Add the cuda kernel for indexadd.
  - And add the scatter-add kernels.
* Revert "Add the layer norm files. (#222)" (#223) [Laurent Mazare, 2023-07-22, 1 file, +0/-1]
  - This reverts commit c8459d199ddcea909f6ccd18ae4945cb19d3eb9e.
* Add the layer norm files. (#222) [Laurent Mazare, 2023-07-22, 1 file, +1/-0]
* Cuda kernel for the conv1d op (#111) [Laurent Mazare, 2023-07-08, 1 file, +1/-0]
  - Boilerplate code for conv1d.
  - Boilerplate code for conv1d.
  - More boilerplate for conv1d.
  - Conv1d work.
  - Get the conv1d cuda kernel to work.
  - Conv1d support when no batch dim.
* Refactor the hierarchy. [Nicolas Patry, 2023-06-27, 1 file, +8/-0]
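
Each "+1 line" diff above adds a single constant to `lib.rs`: the crate's convention is to expose every kernel family as a `&'static str` containing its compiled PTX, embedded at build time via `include_str!(concat!(env!("OUT_DIR"), "/<name>.ptx"))`. The sketch below illustrates that pattern in a standalone form; the dummy PTX strings and the `ptx_for` lookup helper are hypothetical stand-ins (the real constants are generated by the crate's build script, and backends reference the constants directly).

```rust
// Minimal sketch of the candle-kernels lib.rs pattern: one &'static str
// of PTX per kernel family. In the real crate these are
//   pub const SORT: &str = include_str!(concat!(env!("OUT_DIR"), "/sort.ptx"));
// here, dummy strings stand in so the example runs without the build script.
pub const AFFINE: &str = "// PTX for affine kernels";
pub const INDEXING: &str = "// PTX for index-add/scatter-add kernels";
pub const SORT: &str = "// PTX for argsort kernels";
pub const QUANTIZED: &str = "// PTX for quantized matmul kernels";

/// Hypothetical helper (not in the crate) mapping a module name to its PTX.
pub fn ptx_for(module: &str) -> Option<&'static str> {
    match module {
        "affine" => Some(AFFINE),
        "indexing" => Some(INDEXING),
        "sort" => Some(SORT),
        "quantized" => Some(QUANTIZED),
        _ => None,
    }
}

fn main() {
    // A CUDA backend would hand this PTX to the driver at runtime
    // (e.g. via cuModuleLoadData) and launch kernels from the module.
    println!("{}", ptx_for("sort").unwrap());
}
```

Embedding PTX as strings keeps the crate free of a hard nvcc dependency for downstream users: the device code is compiled once when `candle-kernels` builds, and consumers only need the CUDA driver at runtime to load the modules.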