path: root/candle-core
| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Get the cuda tests to pass. | laurent | 2023-06-28 | 1 | -20/+29 |
| Adapt the cuda bits. | laurent | 2023-06-28 | 5 | -87/+109 |
| Fix some cpu issue. | laurent | 2023-06-28 | 1 | -27/+30 |
| Remove some todos. | laurent | 2023-06-28 | 1 | -2/+0 |
| Get the cpu tests to run. | laurent | 2023-06-28 | 5 | -19/+6 |
| Get the cpu backend to compile. | laurent | 2023-06-28 | 5 | -59/+44 |
| Propagate the changes on the cpu backend. | laurent | 2023-06-28 | 2 | -81/+89 |
| Propagate the layout refactoring. | laurent | 2023-06-28 | 5 | -129/+130 |
| Simplify the narrow implementation. | laurent | 2023-06-28 | 3 | -34/+36 |
| Start refactoring the stride. | laurent | 2023-06-28 | 5 | -108/+124 |
| Add the grad for narrow. | laurent | 2023-06-28 | 1 | -2/+25 |
| Add more gradients. | laurent | 2023-06-28 | 2 | -6/+17 |
| Add the relu op. | laurent | 2023-06-28 | 3 | -5/+34 |
| Factor out the gemm bits. | laurent | 2023-06-28 | 1 | -180/+74 |
| Add more cuda testing again. | laurent | 2023-06-28 | 2 | -42/+59 |
| Also run the backprop tests on cuda. | laurent | 2023-06-28 | 3 | -8/+10 |
| Add some display tests + bugfixes. | laurent | 2023-06-27 | 3 | -14/+102 |
| PyTorch like display implementation. | laurent | 2023-06-27 | 3 | -196/+203 |
| Add squeeze/unsqueeze/stack. | laurent | 2023-06-27 | 1 | -0/+30 |
| Rework the debug trait. | laurent | 2023-06-27 | 3 | -6/+449 |
| Add the get method. | laurent | 2023-06-27 | 1 | -0/+9 |
| Add some helper functions. | laurent | 2023-06-27 | 2 | -6/+26 |
| Add some test utils module. | laurent | 2023-06-27 | 2 | -41/+54 |
| Factor the slicing code in cuda. | laurent | 2023-06-27 | 1 | -19/+23 |
| Run the tensor tests for the cuda backend too. | laurent | 2023-06-27 | 2 | -41/+74 |
| Use num-cpus to enable parallelism. | laurent | 2023-06-27 | 4 | -4/+17 |
| Cache the causal mask in llama. | laurent | 2023-06-27 | 1 | -17/+51 |
| Fix two cuda bugs (matmul and where_cond). | laurent | 2023-06-27 | 2 | -14/+2 |
| Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 25 | -0/+6053 |