forks/candle.git

	Commit message	Author	Age	Files	Lines
*	Rework the cuda casting bits. (#1112)	Laurent Mazare	2023-10-17	1	-31/+54
*	fix: fix index_select cuda kernel for src target dim different than ids dim w...	Gonzalo	2023-10-05	1	-6/+8
*	Add the rounding operators. (#1030)	Laurent Mazare	2023-10-04	2	-0/+24
*	fix: add missing gpu fill_* (#996)	Gonzalo	2023-09-29	1	-0/+9
*	Optimize the index-select cuda kernel. (#976)	Laurent Mazare	2023-09-28	1	-14/+8
*	Add the missing kernel. (#955)	Laurent Mazare	2023-09-24	1	-0/+1
*	cuda cast i64 (#925)	Gonzalo	2023-09-21	1	-0/+10
*	Add an erf based gelu op (#900)	Laurent Mazare	2023-09-19	2	-0/+25
*	im2col version of the conv1d kernel. (#815)	Laurent Mazare	2023-09-11	1	-1/+70
*	im2col based conv2d (#802)	Laurent Mazare	2023-09-10	1	-0/+89
*	Add a dedicated cuda kernel for softmax. (#746)	Laurent Mazare	2023-09-05	1	-0/+55
*	Add tanh. (#675)	Laurent Mazare	2023-08-30	1	-0/+4
*	Support dilation in conv-transpose2d. (#671)	Laurent Mazare	2023-08-30	1	-3/+3
*	Add the powf op. (#664)	Laurent Mazare	2023-08-29	1	-0/+4
*	Fix the dilated convolutions. (#659)	Laurent Mazare	2023-08-29	1	-2/+2
*	Dilated convolutions (#657)	Laurent Mazare	2023-08-29	1	-6/+12
*	Cuda conv transpose (#645)	Laurent Mazare	2023-08-28	1	-0/+88
*	Let's keep the dirty code on its own.	Nicolas Patry	2023-08-25	1	-2/+25
*	Intermediary float cast is necessary for cuda 11.8	Nicolas Patry	2023-08-25	1	-2/+2
*	`static_cast` ?	Nicolas Patry	2023-08-25	1	-2/+2
*	Different casting ?	Nicolas Patry	2023-08-25	1	-2/+2
*	Repairing cast bf16/f16	Nicolas Patry	2023-08-25	1	-4/+4
*	Add to the cuda example a reproduction of the issue. (#579)	Laurent Mazare	2023-08-24	1	-10/+11
*	Add support for i64 (#563)	Laurent Mazare	2023-08-23	6	-1/+65
*	Add a yolo-v3 example. (#528)	Laurent Mazare	2023-08-20	1	-0/+12
*	Add a cuda kernel for upsampling. (#441)	Laurent Mazare	2023-08-14	1	-0/+62
*	Add a cuda kernel for avg-pool2d. (#440)	Laurent Mazare	2023-08-14	1	-3/+157
*	Add a naive conv2d cuda kernel. (#438)	Laurent Mazare	2023-08-14	1	-8/+93
*	Compat windows.	Nicolas Patry	2023-08-10	1	-0/+9
*	This is duplicated code on Cuda 12.2.	Nicolas Patry	2023-08-10	1	-18/+0
*	Add the recip op + use it in stable-diffusion. (#331)	Laurent Mazare	2023-08-06	1	-0/+4
*	Remove the embedding ops in favor of index-select. (#299)	Laurent Mazare	2023-08-02	1	-40/+0
*	Cuda support for the mnist training. (#277)	Laurent Mazare	2023-07-29	2	-7/+118
*	Support for where-cond on cuda for u8 and u32. (#274)	Laurent Mazare	2023-07-29	1	-8/+15
*	Add some flash attn test (#253)	Laurent Mazare	2023-07-26	1	-2/+2
*	Add a test for scatter add. (#238)	Laurent Mazare	2023-07-25	1	-5/+3
*	Cuda kernels for IndexAdd/ScatterAdd. (#236)	Laurent Mazare	2023-07-24	2	-1/+102
*	Indexing cuda (#235)	Laurent Mazare	2023-07-24	1	-8/+119
*	Add some cmp tests. (#233)	Laurent Mazare	2023-07-24	2	-10/+56
*	Cleanup some todos. (#226)	Laurent Mazare	2023-07-23	1	-109/+83
*	Revert "Add the layer norm files. (#222)" (#223)	Laurent Mazare	2023-07-22	7	-1527/+0
*	Add the layer norm files. (#222)	Laurent Mazare	2023-07-22	7	-0/+1527
*	Cuda kernels for fast min/max reductions (#203)	Laurent Mazare	2023-07-19	2	-9/+117
*	Add the elu cuda kernel. (#114)	Laurent Mazare	2023-07-10	1	-0/+38
*	Make it easier to use whisper samples from the repo. (#112)	Laurent Mazare	2023-07-08	1	-12/+12
*	Cuda kernel for the conv1d op (#111)	Laurent Mazare	2023-07-08	2	-0/+75
*	Sketch a fast cuda kernel for reduce-sum. (#109)	Laurent Mazare	2023-07-08	1	-0/+67
*	Tweak the include order to include math.h first. (#100)	Laurent Mazare	2023-07-07	1	-1/+1
*	Include the math.h file to get access to constants. (#99)	Laurent Mazare	2023-07-07	1	-0/+2
*	Minor tweaks.	laurent	2023-07-03	1	-0/+3