Commit message
* Add the cuda dequantize f16 kernels.
* Expose the cuda kernels.
* Add some testing + fix.
* Test the other cases too.
* A few more tests.
* Add an environment variable to enable the dequantize f16 + matmul behavior.
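The dequantize-f16 path can be sanity-checked from user code by comparing a quantized matmul against an explicit dequantize-to-f16 + matmul reference. The sketch below is illustrative only: shapes, the Q4_0 format and the tolerance are arbitrary, and it assumes the current `QTensor::quantize(&tensor, GgmlDType::…)` / `QMatMul::from_qtensor` API. The environment variable that toggles the dequantize f16 + matmul behavior is defined in the quantized cuda code and is not reproduced here.

```rust
// Illustrative sketch, not the repo's actual test code.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{DType, Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (n, k, m) = (512usize, 1024, 8);
    let w = Tensor::randn(0f32, 1f32, (n, k), &dev)?;
    let x = Tensor::randn(0f32, 1f32, (m, k), &dev)?;

    // Quantized path: this is where the dequantize-f16 + matmul kernels are
    // used once the corresponding environment variable is set.
    let qw = QTensor::quantize(&w, GgmlDType::Q4_0)?;
    let w_deq = qw.dequantize(&dev)?;
    let y_q = QMatMul::from_qtensor(qw)?.forward(&x)?;

    // Reference path: dequantize to f16 and run a plain f16 matmul.
    let w_f16 = w_deq.to_dtype(DType::F16)?;
    let y_ref = x
        .to_dtype(DType::F16)?
        .matmul(&w_f16.t()?)?
        .to_dtype(DType::F32)?;

    let diff = (y_q - y_ref)?.abs()?.flatten_all()?.max(0)?.to_scalar::<f32>()?;
    println!("max abs difference: {diff}");
    Ok(())
}
```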
* Add more QMMV cuda kernels.
* Enable the new kernels.
* Adapt the testing.
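QMMV here refers to the quantized matmul-vector kernels, i.e. the single-row inputs that dominate token-by-token decoding. One way to exercise that path, roughly in the spirit of the adapted tests, is to check that a single-row result agrees with the same row computed through the regular quantized matmul kernels. Shapes and the Q8_0 format below are arbitrary and the `QMatMul` API usage is an assumption about the current interface.

```rust
// Illustrative sketch: the matmul-vec kernels (single-row input) should agree
// with the same row going through the regular quantized matmul path.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (n, k) = (512usize, 1024);
    let w = Tensor::randn(0f32, 1f32, (n, k), &dev)?;
    let qmm = QMatMul::from_qtensor(QTensor::quantize(&w, GgmlDType::Q8_0)?)?;

    let x = Tensor::randn(0f32, 1f32, (8usize, k), &dev)?;
    let y_mm = qmm.forward(&x)?; // 8 rows: regular matmul kernels
    let y_mv = qmm.forward(&x.narrow(0, 0, 1)?)?; // 1 row: matmul-vec kernels

    let diff = (y_mm.narrow(0, 0, 1)? - y_mv)?
        .abs()?
        .flatten_all()?
        .max(0)?
        .to_scalar::<f32>()?;
    println!("row 0, mm vs mmv: {diff}");
    Ok(())
}
```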
* Add the mmv kernels for smaller sizes.
* Support more mmv kernels.
* Use the new kernels.
* Fix the call.
* Silly fix.
* Improve the testing.
* Fix for dmmv.
* Add another dedicated test for the batching mmv.
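For the batching case the input keeps a leading batch dimension while each batch element still contributes a single row. A minimal shape check, under the same API assumptions as above, could look like this:

```rust
// Illustrative sketch: batched matmul-vec, one row per batch element.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (b, n, k) = (4usize, 512, 1024);
    let w = Tensor::randn(0f32, 1f32, (n, k), &dev)?;
    let qmm = QMatMul::from_qtensor(QTensor::quantize(&w, GgmlDType::Q4_0)?)?;

    let x = Tensor::randn(0f32, 1f32, (b, 1usize, k), &dev)?;
    let y = qmm.forward(&x)?;
    assert_eq!(y.dims3()?, (b, 1, n));
    Ok(())
}
```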
* Fix for the batch dim in the quantized matmul example.
* Enable more tests on cuda.
* Add a test for qmm with a batch.
* Fix the zeros-dim test on metal.
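The batch-dimension behaviour can also be pinned down by checking that a batched quantized matmul matches running every batch element on its own. This is a sketch of such a test, with arbitrary shapes and the current `QMatMul` API assumed:

```rust
// Illustrative sketch: a batched quantized matmul should match per-sample calls.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (b, m, n, k) = (3usize, 5, 256, 512);
    let w = Tensor::randn(0f32, 1f32, (n, k), &dev)?;
    let qmm = QMatMul::from_qtensor(QTensor::quantize(&w, GgmlDType::Q4_0)?)?;

    let x = Tensor::randn(0f32, 1f32, (b, m, k), &dev)?;
    let y = qmm.forward(&x)?; // (b, m, n)

    for i in 0..b {
        let yi = qmm.forward(&x.get(i)?)?; // (m, n)
        let diff = (y.get(i)? - yi)?.abs()?.flatten_all()?.max(0)?.to_scalar::<f32>()?;
        assert!(diff < 1e-4, "batch element {i} disagrees: {diff}");
    }
    Ok(())
}
```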
* Add a function to clear the KV cache in falcon.
* Clippy.
* Hook the quantized matmul cuda kernels.
* Add a (currently broken) test.
* Kernel fixes.
* Fix by transposing the rhs matrix.
* Add the q4-1 kernels.
* Proper block sizes.
* More details in the tests.
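The transpose fix reflects how `QMatMul` is laid out: the quantized tensor is stored as an `(out, in)` weight and `forward` multiplies by its transpose, so the cuda kernel has to treat the rhs accordingly. A hedged equivalence check on the dequantized weight, with arbitrary shapes and q4_1 picked because of the new kernels:

```rust
// Illustrative sketch: forward(x) should match x · wᵀ on the dequantized weight.
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Module, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (n, k, m) = (256usize, 512, 4);
    let w = Tensor::randn(0f32, 1f32, (n, k), &dev)?; // stored as (out, in)
    let qw = QTensor::quantize(&w, GgmlDType::Q4_1)?;
    let w_deq = qw.dequantize(&dev)?;
    let qmm = QMatMul::from_qtensor(qw)?;

    let x = Tensor::randn(0f32, 1f32, (m, k), &dev)?;
    let y = qmm.forward(&x)?; // (m, n)
    let y_ref = x.matmul(&w_deq.t()?)?; // explicit transpose of the rhs
    let diff = (y - y_ref)?.abs()?.flatten_all()?.max(0)?.to_scalar::<f32>()?;
    println!("max abs difference: {diff}");
    Ok(())
}
```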
* Quantized cuda tweaks.
* Add some safety checks.
* Factorize the dequantization bits.
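The safety checks added here live in the cuda glue code. A related constraint that is visible from user code is the block layout: the classic formats pack 32 values per block (256 for the k-quants), so quantization rejects tensors whose last dimension does not fit the block size. A small sketch of that behaviour, based on my reading of the public API rather than on the commit's own checks:

```rust
// Illustrative sketch: quantization only accepts shapes compatible with the
// block layout (32 values per block for q4_0/q8_0, 256 for the k-quants).
use candle_core::quantized::{GgmlDType, QTensor};
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu; // the shape constraint is format-level, not device-level
    let ok = Tensor::randn(0f32, 1f32, (7usize, 64), &dev)?;
    assert!(QTensor::quantize(&ok, GgmlDType::Q8_0).is_ok());

    // 30 is not a multiple of the 32-wide block, so quantization should fail.
    let bad = Tensor::randn(0f32, 1f32, (7usize, 30), &dev)?;
    assert!(QTensor::quantize(&bad, GgmlDType::Q8_0).is_err());
    Ok(())
}
```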
* Switch the default to using the faster kernels.
* Add the force-dmmv flag.
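With the faster kernels made the default, the older dequantize + matmul-vec (dmmv) path stays reachable through a toggle. The snippet below shows how that toggle would be flipped from application code; the `candle_core::quantized::cuda::set_force_dmmv` path is my best recollection of the setter and should be checked against the source rather than treated as a confirmed API.

```rust
// Hedged sketch: the exact path of the setter is assumed from memory
// (candle_core::quantized::cuda::set_force_dmmv) and may differ in the source.
fn main() {
    // Opt back into the older dequantize + matmul-vec kernels; the newer
    // quantized matmul-vec kernels are the default.
    candle_core::quantized::cuda::set_force_dmmv(true);
}
```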
* Add more cuda kernels for quantized matmul.
* Add the vec-dot bits.
* Expose the quantized matmul-vec kernels.
* Also include the quantize-q8-1 kernel.
* Glue code for the q8-1 quantization.
* mm-vec product via q8-1 quantization.
* Add a test.
* Add a mm test.
* Get the test to return some sensible results.
* Also test dmmv.
* Fix the launch params.
* Allow for tweaking the force_dmmv parameter while it's experimental.
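The mm-vec product works by quantizing the activation vector on the fly into 8-bit blocks (the q8-1 format: 32 values sharing one scale, plus a per-block sum used with the affine weight formats), so the per-block dot products against the quantized weights run on integers. The snippet below is a plain-Rust illustration of that blockwise idea with the block sum left out; it is not the kernel code and the names are made up for the example.

```rust
// Plain-Rust illustration of q8-1 style block quantization for a dot product:
// 32 floats share one scale, the integer dot product is rescaled at the end.
const BLOCK: usize = 32;

struct BlockQ8 {
    scale: f32,
    qs: [i8; BLOCK],
}

fn quantize_q8(xs: &[f32; BLOCK]) -> BlockQ8 {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut qs = [0i8; BLOCK];
    for (q, &x) in qs.iter_mut().zip(xs.iter()) {
        *q = (x * inv).round() as i8;
    }
    BlockQ8 { scale, qs }
}

fn vec_dot(a: &BlockQ8, b: &BlockQ8) -> f32 {
    // Integer dot product per block, rescaled by the two block scales.
    let acc: i32 = a.qs.iter().zip(b.qs.iter()).map(|(&x, &y)| x as i32 * y as i32).sum();
    acc as f32 * a.scale * b.scale
}

fn main() {
    let xs: [f32; BLOCK] = core::array::from_fn(|i| (i as f32 * 0.1).sin());
    let ys: [f32; BLOCK] = core::array::from_fn(|i| (i as f32 * 0.2).cos());
    let exact: f32 = xs.iter().zip(ys.iter()).map(|(x, y)| x * y).sum();
    let approx = vec_dot(&quantize_q8(&xs), &quantize_q8(&ys));
    println!("exact {exact:.4} vs quantized {approx:.4}");
}
```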
* Cuda kernel for dequantizing q8k.
* Clippy lints.
* Boilerplate for the quantized cuda support.
* More basic cuda support.
* More cuda quantization (quantize on cpu for now).
* Add the dequantization bit.
* Start adding some dedicated cuda kernels from llama.cpp.
* Move the kernel code.
* Start interfacing with the kernel.
* Tweak the kernel launch params.
* Bugfix for quantized metal.
* Fix some clippy lints.
* Tweak the launch parameters.
* Tweak cuda basics to perform a quantized matmul.
* Perform the dequantization on the cpu + use cublas for matmul.
* Add the dequantization kernel.
* Test the qmatmul.
* More kernels.
* Matmul-vec kernel.
* Add a couple kernels.
* More dequantization kernels.
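This is the commit that bootstraps the cuda side: quantization initially runs on the cpu, dedicated kernels handle dequantization, and cublas performs the matmul on the dequantized data. A minimal way to poke at the dequantization part from user code, assuming today's `QTensor::quantize` / `dequantize` API, is a quantize/dequantize round trip:

```rust
// Illustrative sketch: quantize / dequantize round trip, reporting the error
// each format introduces. Shapes and formats are arbitrary.
use candle_core::quantized::{GgmlDType, QTensor};
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let xs = Tensor::randn(0f32, 1f32, (1024usize, 1024), &dev)?;
    for dtype in [GgmlDType::Q4_0, GgmlDType::Q4_1, GgmlDType::Q8_0] {
        let q = QTensor::quantize(&xs, dtype)?;
        let roundtrip = q.dequantize(&dev)?; // exercises the dequantize kernels
        let rmse = (xs.clone() - roundtrip)?
            .sqr()?
            .mean_all()?
            .sqrt()?
            .to_scalar::<f32>()?;
        println!("{dtype:?}: rmse {rmse:.5}");
    }
    Ok(())
}
```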