path: root/candle-kernels/src/lib.rs
Commit message [Author, Age, Files, Lines +/-]
* Add argsort. (#2132) [Laurent Mazare, 2024-04-27, 1 file, +1/-0]
  - Add the argsort cuda kernels.
  - CPU version of arg-sort.
  - Hook the cuda kernel + rework the cpu bits.
  - Add some dedicated tests.
  - Working cuda kernel.
  - Metal kernel.
  - Metal adjustments.
  - Bugfix.
  - Use the fast rope in qwen.
  - Rework the expert selection in qwen.
* Cuda acceleration for quantized model. (#1754) [Laurent Mazare, 2024-02-25, 1 file, +1/-0]
  - Boilerplate for the quantized cuda support.
  - More basic cuda support.
  - More cuda quantization (quantize on cpu for now).
  - Add the dequantization bit.
  - Start adding some dedicated cuda kernels from llama.cpp.
  - Move the kernel code.
  - Start interfacing with the kernel.
  - Tweak the kernel launch params.
  - Bugfix for quantized metal.
  - Fix some clippy lints.
  - Tweak the launch parameters.
  - Tweak cuda basics to perform a quantized matmul.
  - Perform the dequantization on the cpu + use cublas for matmul.
  - Add the dequantization kernel.
  - Test the qmatmul.
  - More kernels.
  - Matmul-vec kernel.
  - Add a couple kernels.
  - More dequantization kernels.
* Cuda kernels for IndexAdd/ScatterAdd. (#236) [Laurent Mazare, 2023-07-24, 1 file, +1/-1]
  - Skeleton methods for IndexAdd/ScatterAdd.
  - Add a Map2InPlace trait.
  - Add the glue code for the index-add/scatter-add kernels.
  - Tweak the file name: embeddings -> indexing.
  - Add the cuda kernel for indexadd.
  - And add the scatter-add kernels.
* Revert "Add the layer norm files. (#222)" (#223) [Laurent Mazare, 2023-07-22, 1 file, +0/-1]
  - This reverts commit c8459d199ddcea909f6ccd18ae4945cb19d3eb9e.
* Add the layer norm files. (#222) [Laurent Mazare, 2023-07-22, 1 file, +1/-0]
* Cuda kernel for the conv1d op (#111) [Laurent Mazare, 2023-07-08, 1 file, +1/-0]
  - Boilerplate code for conv1d.
  - Boilerplate code for conv1d.
  - More boilerplate for conv1d.
  - Conv1d work.
  - Get the conv1d cuda kernel to work.
  - Conv1d support when no batch dim.
* Refactor the hierarchy. [Nicolas Patry, 2023-06-27, 1 file, +8/-0]
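
Each "+1 line" diff above adds a single constant to `lib.rs`: the crate's convention is to expose every kernel family as a `&'static str` containing its compiled PTX, embedded at build time via `include_str!(concat!(env!("OUT_DIR"), "/<name>.ptx"))`. The sketch below illustrates that pattern in a standalone form; the dummy PTX strings and the `ptx_for` lookup helper are hypothetical stand-ins (the real constants are generated by the crate's build script, and backends reference the constants directly).

```rust
// Minimal sketch of the candle-kernels lib.rs pattern: one &'static str
// of PTX per kernel family. In the real crate these are
//   pub const SORT: &str = include_str!(concat!(env!("OUT_DIR"), "/sort.ptx"));
// here, dummy strings stand in so the example runs without the build script.
pub const AFFINE: &str = "// PTX for affine kernels";
pub const INDEXING: &str = "// PTX for index-add/scatter-add kernels";
pub const SORT: &str = "// PTX for argsort kernels";
pub const QUANTIZED: &str = "// PTX for quantized matmul kernels";

/// Hypothetical helper (not in the crate) mapping a module name to its PTX.
pub fn ptx_for(module: &str) -> Option<&'static str> {
    match module {
        "affine" => Some(AFFINE),
        "indexing" => Some(INDEXING),
        "sort" => Some(SORT),
        "quantized" => Some(QUANTIZED),
        _ => None,
    }
}

fn main() {
    // A CUDA backend would hand this PTX to the driver at runtime
    // (e.g. via cuModuleLoadData) and launch kernels from the module.
    println!("{}", ptx_for("sort").unwrap());
}
```

Embedding PTX as strings keeps the crate free of a hard nvcc dependency for downstream users: the device code is compiled once when `candle-kernels` builds, and consumers only need the CUDA driver at runtime to load the modules.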