* Use flash-attn in gemma.
* Fix for the fast bf16 cublas gemm.
* Fix some clippy lints.
* Fix another lint.
* Proper clippy fix.
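
The flash-attn switch follows the cfg-gated pattern used by candle's other transformer examples: a real binding when the feature is compiled in, a loud stub otherwise. A minimal sketch, assuming the candle_flash_attn crate and its (q, k, v, softmax_scale, causal) entry point:

```rust
use candle_core::{Result, Tensor};

// Compiled in with `--features flash-attn`: dispatch to the fused kernel.
#[cfg(feature = "flash-attn")]
fn flash_attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32, causal: bool) -> Result<Tensor> {
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, causal)
}

// Without the feature, fail loudly rather than silently falling back.
#[cfg(not(feature = "flash-attn"))]
fn flash_attn(_: &Tensor, _: &Tensor, _: &Tensor, _: f32, _: bool) -> Result<Tensor> {
    unimplemented!("compile with '--features flash-attn'")
}
```
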
* Allow the use of tf32 accumulation in matmul.
* Better timings.
* Dummy versions for use when cuda is not enabled.
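
TF32 trades a few f32 mantissa bits for much faster matmuls on Ampere-class GPUs, so it has to be opt-in. A minimal sketch of such a toggle, with a dummy version for non-cuda builds as the last commit describes; the function names here are illustrative, not candle's actual API:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global flag consulted by the cublas gemm path.
static TF32_MATMUL: AtomicBool = AtomicBool::new(false);

#[cfg(feature = "cuda")]
pub fn set_tf32_matmul(enabled: bool) {
    // The cuda backend would read this flag and select
    // CUBLAS_COMPUTE_32F_FAST_TF32 instead of CUBLAS_COMPUTE_32F.
    TF32_MATMUL.store(enabled, Ordering::Relaxed);
}

// Dummy version for builds without cuda, so callers compile unchanged.
#[cfg(not(feature = "cuda"))]
pub fn set_tf32_matmul(_enabled: bool) {}

pub fn tf32_matmul_enabled() -> bool {
    TF32_MATMUL.load(Ordering::Relaxed)
}
```
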
* Cuda kernel for dequantizing q8k.
* Clippy lints.
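
For reference, Q8_K in the llama.cpp scheme stores one f32 scale per 256-value block alongside 256 signed 8-bit quants (plus per-16 partial sums used elsewhere), so dequantization is just a scale-and-widen. A CPU sketch of what the cuda kernel computes:

```rust
// CPU reference for the q8k dequantization, assuming the usual llama.cpp
// Q8_K layout: one f32 scale `d` per 256-value block plus 256 signed 8-bit
// quants (the block's i16 partial sums are not needed here).
const QK_K: usize = 256;

pub struct BlockQ8K {
    pub d: f32,
    pub qs: [i8; QK_K],
}

pub fn dequantize_q8k(blocks: &[BlockQ8K], out: &mut Vec<f32>) {
    for block in blocks {
        for &q in block.qs.iter() {
            out.push(block.d * q as f32);
        }
    }
}
```
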
* Boilerplate for the quantized cuda support.
* More basic cuda support.
* More cuda quantization (quantize on cpu for now).
* Add the dequantization bit.
* Start adding some dedicated cuda kernels from llama.cpp.
* Move the kernel code.
* Start interfacing with the kernel.
* Tweak the kernel launch params.
* Bugfix for quantized metal.
* Fix some clippy lints.
* Tweak the launch parameters.
* Tweak cuda basics to perform a quantized matmul.
* Perform the dequantization on the cpu + use cublas for matmul.
* Add the dequantization kernel.
* Test the qmatmul.
* More kernels.
* Matmul-vec kernel.
* Add a couple kernels.
* More dequantization kernels.
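
Before the dedicated kernels land, the log describes a stop-gap path: dequantize the weights on the cpu, upload the f32 copy, and let the regular cublas-backed matmul run. A rough sketch of that interim shape, reusing the BlockQ8K helper sketched above (names and layout are illustrative):

```rust
use candle_core::{Device, Result, Tensor};

// Stop-gap quantized matmul: dequantize host-side, then use the regular
// (cublas-backed) matmul. A fused cuda kernel replaces this later.
pub fn qmatmul_via_dequant(
    xs: &Tensor,             // activations, already on the target device
    w_blocks: &[BlockQ8K],   // quantized weights, host side
    w_shape: (usize, usize), // (out_features, in_features)
    dev: &Device,
) -> Result<Tensor> {
    let mut w_f32 = Vec::with_capacity(w_shape.0 * w_shape.1);
    dequantize_q8k(w_blocks, &mut w_f32);
    let w = Tensor::from_vec(w_f32, w_shape, dev)?;
    xs.matmul(&w.t()?)
}
```
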
* Add the dilation parameter.
* Restore the basic optimizer example.
* Dilation support in cudnn.
* Use the dilation parameter in the cpu backend.
* More dilation support.
* No support for dilation in transposed convolutions.
* Add dilation to a test.
* Remove a print.
* Helper function.
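
The helper-function commit presumably centralizes the output-size arithmetic: with dilation d, a kernel of size k spans an effective window of d*(k-1)+1 input positions. A sketch of that standard formula (candle's actual helper may be shaped differently):

```rust
// Output length of a (possibly dilated) convolution along one dimension.
pub fn conv_out_len(in_len: usize, k: usize, padding: usize, stride: usize, dilation: usize) -> usize {
    let effective_k = dilation * (k - 1) + 1;
    (in_len + 2 * padding - effective_k) / stride + 1
}

#[test]
fn dilated_window() {
    // k = 3 with dilation = 2 behaves like a 5-wide window.
    assert_eq!(conv_out_len(10, 3, 0, 1, 2), 6);
    assert_eq!(conv_out_len(10, 5, 0, 1, 1), 6);
}
```
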
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrices.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
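
Square inputs can mask transposed height/width indexing, which is why the test above uses non-square matrices: any h/w mix-up then changes the output shape. A sketch of such a check, assuming candle's (padding, stride, dilation, groups) conv2d signature:

```rust
use candle_core::{Device, Result, Tensor};

// Regression-style check with distinct height and width.
fn non_square_conv2d() -> Result<()> {
    let dev = Device::Cpu;
    let xs = Tensor::randn(0f32, 1., (1, 2, 7, 5), &dev)?; // (b, c_in, h, w), h != w
    let ws = Tensor::randn(0f32, 1., (3, 2, 3, 3), &dev)?; // (c_out, c_in, kh, kw)
    let out = xs.conv2d(&ws, 0, 1, 1, 1)?;
    assert_eq!(out.dims(), &[1, 3, 5, 3]); // h: 7-3+1, w: 5-3+1
    Ok(())
}
```
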
cuda. (#578)
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
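
A broken rand setup typically shows up as out-of-range or suspiciously constant samples, so a cheap property test covers this kind of bugfix. A sketch (Device::Cpu serves as the reference backend, a cuda device exercises the fixed path):

```rust
use candle_core::{Device, Result, Tensor};

// Sanity check for the uniform rand path: samples must stay in [0, 1)
// and must not collapse to a constant.
fn check_rand(dev: &Device) -> Result<()> {
    let t = Tensor::rand(0f32, 1f32, 1024, dev)?;
    let v = t.to_vec1::<f32>()?;
    assert!(v.iter().all(|&x| (0f32..1f32).contains(&x)));
    let (min, max) = v
        .iter()
        .fold((f32::MAX, f32::MIN), |(lo, hi), &x| (lo.min(x), hi.max(x)));
    assert!(max - min > 1e-3, "suspiciously constant samples");
    Ok(())
}
```
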
* Add a groups parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
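
Grouped convolutions split the input channels into `groups` independent slices, each convolved with c_out / groups filters, so the kernel carries only c_in / groups input channels and both channel counts must divide evenly. A sketch of that shape bookkeeping (not candle's actual error handling):

```rust
// Validate the channel/groups relationship for a grouped convolution.
fn check_conv_groups(c_in: usize, c_out: usize, kernel_c_in: usize, groups: usize) -> Result<(), String> {
    if c_in % groups != 0 || c_out % groups != 0 {
        return Err(format!("c_in {c_in} and c_out {c_out} must be divisible by groups {groups}"));
    }
    if kernel_c_in != c_in / groups {
        return Err(format!("kernel has {kernel_c_in} input channels, want {}", c_in / groups));
    }
    Ok(())
}

// groups == c_in gives a depthwise convolution: one filter per channel.
```
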
* Add a cudnn feature to be used for conv2d.
* Allocate the proper workspace.
* Only create a single cudnn handle per cuda device.
* Proper cudnn usage.
* Bugfix.
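
Creating a cudnn handle is expensive, hence one handle per cuda device. The usual Rust shape for that is a lazily initialized, mutex-guarded map keyed by device ordinal; a sketch with a stand-in handle type:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Stand-in for the real cudnn handle type.
struct CudnnHandle {}

fn create_handle(_ordinal: usize) -> Arc<CudnnHandle> {
    Arc::new(CudnnHandle {})
}

// One lazily created handle per device ordinal, shared between conv2d calls.
fn cudnn_handle(ordinal: usize) -> Arc<CudnnHandle> {
    static HANDLES: OnceLock<Mutex<HashMap<usize, Arc<CudnnHandle>>>> = OnceLock::new();
    let handles = HANDLES.get_or_init(|| Mutex::new(HashMap::new()));
    let mut handles = handles.lock().unwrap();
    handles.entry(ordinal).or_insert_with(|| create_handle(ordinal)).clone()
}
```
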
* Add more tracing to the whisper example.
* Support accelerate in more examples.
* Use accelerate for pointwise functions.
* Use accelerate for binary operations too.
* Bugfix for binary operation: use the rhs before the lhs.
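
The "rhs before the lhs" bugfix matches a well-known Accelerate quirk: in vDSP_vsub the first vector argument is the subtrahend, so computing lhs - rhs means passing rhs first. A sketch of a wrapper that encodes the order once (macOS only; the extern declaration mirrors the vDSP signature):

```rust
#[allow(non_snake_case)]
#[link(name = "Accelerate", kind = "framework")]
extern "C" {
    // Apple's argument order: vDSP_vsub(b, 1, a, 1, c, 1, n) computes c = a - b.
    fn vDSP_vsub(b: *const f32, ib: isize, a: *const f32, ia: isize,
                 c: *mut f32, ic: isize, n: usize);
}

// Encode the reversed operand order in one place: out = lhs - rhs.
pub fn vs_sub(lhs: &[f32], rhs: &[f32], out: &mut [f32]) {
    assert!(lhs.len() == rhs.len() && lhs.len() == out.len());
    unsafe { vDSP_vsub(rhs.as_ptr(), 1, lhs.as_ptr(), 1, out.as_mut_ptr(), 1, out.len()) }
}
```
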
* Rename to candle-core.
* More candle-core renaming.

* Sketch a fast cuda kernel for reduce-sum.
* Sketch the rust support code for the fast sum kernel.
* More work on the fast kernel.
* Add some testing ground.
* A couple fixes for the fast sum kernel.
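
The usual fast reduce-sum runs in two phases: each thread first accumulates a strided slice of the input, then the per-thread partials are combined by a tree reduction that halves the active threads each step. A CPU model of that shape (power-of-two thread count assumed), not the kernel itself:

```rust
// CPU model of the two-phase reduce-sum the cuda kernel implements.
fn block_sum(xs: &[f32], n_threads: usize) -> f32 {
    // Phase 1: each "thread" sums a strided slice of the input.
    let mut partial: Vec<f32> = (0..n_threads)
        .map(|t| xs.iter().skip(t).step_by(n_threads).sum())
        .collect();
    // Phase 2: tree reduction over the partials, halving the stride each
    // step; in cuda a __syncthreads() sits between the steps.
    let mut s = n_threads / 2;
    while s > 0 {
        for t in 0..s {
            partial[t] += partial[t + s];
        }
        s /= 2;
    }
    partial[0]
}
```
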
* Add some very simple sum benchmark.
* Rename the file.
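
A sketch of what such a benchmark can look like; note that with a cuda device the result has to be copied back to the host (or the device synchronized) so the timing covers the kernel rather than just the launch:

```rust
use candle_core::{Device, Result, Tensor};
use std::time::Instant;

// Minimal wall-clock benchmark: time repeated sums over a large tensor.
fn bench_sum(dev: &Device) -> Result<()> {
    let xs = Tensor::rand(0f32, 1f32, (1024, 1024), dev)?;
    let start = Instant::now();
    let iters = 100u32;
    for _ in 0..iters {
        // to_scalar forces the (possibly async) computation to finish.
        let _sum = xs.sum_all()?.to_scalar::<f32>()?;
    }
    println!("{:?} per sum", start.elapsed() / iters);
    Ok(())
}
```
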
* Fix some rebase issues.
* Use mkl instead.
* Use mkl in bert.
* Add the optional mkl feature.
* Conditional compilation based on the mkl feature.
* Add more mkl support.
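
The optional mkl feature follows the standard Cargo conditional-compilation pattern: the mkl-backed path only compiles when the feature is enabled (`--features mkl`), with a plain Rust fallback otherwise. A sketch using mkl's vsExp vector-math entry point (VML functions allow in-place use when input and output coincide); linkage details are elided:

```rust
#[allow(non_snake_case)]
#[cfg(feature = "mkl")]
extern "C" {
    // mkl vml entry point; linking is set up by the mkl feature's build config.
    fn vsExp(n: i32, a: *const f32, y: *mut f32);
}

// Fast path: compiled in only when the mkl feature is on.
#[cfg(feature = "mkl")]
fn exp_inplace(xs: &mut [f32]) {
    unsafe { vsExp(xs.len() as i32, xs.as_ptr(), xs.as_mut_ptr()) }
}

// Portable fallback for builds without mkl.
#[cfg(not(feature = "mkl"))]
fn exp_inplace(xs: &mut [f32]) {
    for x in xs.iter_mut() {
        *x = x.exp();
    }
}
```
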