path: root/candle-kernels
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Bump the caret version to 0.8.2. (#2703) | Laurent Mazare | 2025-01-07 | 1 | -1/+1 |
| Bump the crate version to 0.8.1. (#2662) | Laurent Mazare | 2024-12-07 | 1 | -1/+1 |
| Import the ggml_cuda_dp4a function. (#2628) | Laurent Mazare | 2024-11-19 | 1 | -33/+44 |
| Bump the crate version to 0.8.0. (#2612) | Laurent Mazare | 2024-11-12 | 1 | -1/+1 |
| Improved launch config for layer-norm/rms-norm. (#2591) | Laurent Mazare | 2024-11-04 | 1 | -8/+6 |
| Bump the crate version to 0.7.2. (#2517) | Laurent Mazare | 2024-09-29 | 1 | -1/+1 |
| Move the candle version to 0.7.1. (#2495) | Laurent Mazare | 2024-09-22 | 1 | -1/+1 |
| Bump the crate version. (#2491) | Laurent Mazare | 2024-09-21 | 1 | -1/+1 |
| Bump the version to 0.6.1. (#2438) | Laurent Mazare | 2024-08-22 | 1 | -1/+1 |
| Bump the crate version. (#2248) | Laurent Mazare | 2024-06-05 | 1 | -1/+1 |
| Add the layernorm specialized op. (#2212) | Laurent Mazare | 2024-05-24 | 1 | -0/+84 |
| More efficient cuda implementation for ConvTranspose1d. (#2211) | Laurent Mazare | 2024-05-24 | 1 | -0/+65 |
| Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -1/+1 |
| Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114) | MilkFather | 2024-04-29 | 1 | -0/+9 |
| Add the cuda dequantize f16 kernels. (#2137) | Laurent Mazare | 2024-04-28 | 1 | -37/+75 |
| Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 2 | -0/+89 |
| Add more QMMV cuda kernels. (#2077) | Laurent Mazare | 2024-04-18 | 1 | -0/+324 |
| Add the mmv kernels for small batch sizes. (#2075) | Laurent Mazare | 2024-04-16 | 1 | -10/+254 |
| Faster kernels for quantized matmul on cuda (#2060) | Laurent Mazare | 2024-04-15 | 1 | -11/+118 |
| Add the full quantized matmul kernels for cuda. (#2057) | Laurent Mazare | 2024-04-14 | 1 | -0/+1071 |
| Add the rope THD kernel. (#2014) | Laurent Mazare | 2024-04-05 | 1 | -5/+43 |
| Add support for "sign" on tensors (#2012) | Thomas Santerre | 2024-04-04 | 1 | -0/+9 |
| Bumping the version number to 0.5.0. (#2009) | Laurent Mazare | 2024-04-04 | 1 | -1/+1 |
| Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+1 |
| More ggml cuda kernels (#1977) | Laurent Mazare | 2024-04-01 | 1 | -75/+1014 |
| Ensure that the kernels get rebuilt on cuh changes. (#1954) | Laurent Mazare | 2024-03-28 | 1 | -0/+3 |
| Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 1 | -2/+2 |
| Contiguous variant of the rope kernel. (#1929) | Laurent Mazare | 2024-03-25 | 1 | -6/+34 |
| Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 1 | -0/+29 |
| Add cast_bf16_x/cast_x_bf16 when CUDA_ARCH<800 but CUDA_VERSION >= 11000 (#1919) | yinqiwen | 2024-03-23 | 1 | -0/+12 |
| Support scatter/index_add with i64 indices for f16 (#1915) | Daniël de Kok | 2024-03-22 | 1 | -0/+2 |
| Custom op for RmsNorm (#1890) | Laurent Mazare | 2024-03-21 | 1 | -0/+65 |
| Cuda backend optimization (#1886) | Laurent Mazare | 2024-03-20 | 4 | -7/+7 |
| Optimize the cat operation on contiguous tensors (#1855) | Laurent Mazare | 2024-03-17 | 1 | -1/+29 |
| Bump the crate versions to 0.4.2. (#1821) | Laurent Mazare | 2024-03-08 | 1 | -1/+1 |
| Add a cuda kernel for dequantizing q8_0. (#1804) | Laurent Mazare | 2024-03-05 | 1 | -0/+24 |
| Handle Q5_0 and Q5_1 quants in cuda. | laurent | 2024-02-29 | 1 | -7/+9 |
| Bump the version number to 0.4.1. (#1768) | Laurent Mazare | 2024-02-27 | 1 | -1/+1 |
| Cuda kernel for dequantizing q8k. (#1760) | Laurent Mazare | 2024-02-26 | 1 | -0/+35 |
| Cuda acceleration for quantized model. (#1754) | Laurent Mazare | 2024-02-25 | 2 | -0/+1537 |
| Fix the silu cuda kernel. (#1710) | Laurent Mazare | 2024-02-14 | 1 | -1/+1 |
| feat: add silu activation function (#1706) | OlivierDehaene | 2024-02-14 | 1 | -0/+9 |
| ConvTranspose1d cuda support. (#1697) | Laurent Mazare | 2024-02-12 | 1 | -2/+77 |
| Bump the crate version to 0.4.0. (#1658) | Laurent Mazare | 2024-02-04 | 1 | -1/+1 |
| Moving to a proper build crate `bindgen_cuda`. (#1531) | Nicolas Patry | 2024-01-07 | 2 | -242/+5 |
| Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -1/+1 |
| Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -1/+1 |
| Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -2/+2 |
| Rework the cuda casting bits. (#1112) | Laurent Mazare | 2023-10-17 | 1 | -31/+54 |
| feat: parse Cuda compute cap from env (#1066) | OlivierDehaene | 2023-10-16 | 2 | -89/+110 |