path: root/candle-transformers
Commit message | Author | Age | Files | Lines
...
* Make the falcon model cloneable. (#2067) | Laurent Mazare | 2024-04-15 | 1 | -5/+5
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+14
* Add a quantized version of recurrent-gemma. (#2054) | Laurent Mazare | 2024-04-13 | 4 | -61/+477
* Avoid crashes when running T5 models with F16 tensors on CPU (#2047) | Victor-Mihaila | 2024-04-13 | 1 | -1/+1
* Change for the encoder-only ProstT5 model (#2045) | Victor-Mihaila | 2024-04-13 | 1 | -1/+3
* Add the recurrent-gemma model. (#2039) | Laurent Mazare | 2024-04-13 | 2 | -0/+641
* Use cat for faster MQA computation. (#2043) | Laurent Mazare | 2024-04-12 | 16 | -195/+47
* Add the code-gemma models. (#2038) | Laurent Mazare | 2024-04-10 | 1 | -4/+15
* Support alternative dtypes for mamba (#2036) | Laurent Mazare | 2024-04-10 | 3 | -8/+15
* Add the new gemma models. (#2023) | Laurent Mazare | 2024-04-06 | 1 | -0/+1
* Fix the final rmsnorm for quantized-metavoice. (#2021) | Laurent Mazare | 2024-04-06 | 1 | -0/+1
* Faster mask implementation for mixformers. (#2017) | Laurent Mazare | 2024-04-05 | 1 | -21/+6
* Moondream tracing. (#2016) | Laurent Mazare | 2024-04-05 | 2 | -13/+48
* Add the rope THD kernel. (#2014) | Laurent Mazare | 2024-04-05 | 1 | -22/+6
* Use F16 for moondream on cuda. (#2013) | Laurent Mazare | 2024-04-04 | 1 | -5/+8
* Include topk sampling in the quantized example. (#2005) | Laurent Mazare | 2024-04-04 | 1 | -1/+25
* Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+2
* Improve the handling of matmul with squeezed layouts. (#1998) | Laurent Mazare | 2024-04-02 | 1 | -1/+1
* Match Moondream's latest release (#1997) | Santiago Medina | 2024-04-02 | 1 | -1/+1
* first commit (#1994) | Jorge António | 2024-04-02 | 1 | -1/+2
* Stable diffusion fix. (#1993) | Laurent Mazare | 2024-04-02 | 1 | -1/+3
* Expose the t5 config fields + allow t5-large. (#1987) | Laurent Mazare | 2024-04-01 | 1 | -16/+16
* Quantized moondream implementation and BOS token (#1980) | Santiago Medina | 2024-04-01 | 5 | -16/+316
* Add options to use local files + specify a custom repo or branch. (#1973) | Laurent Mazare | 2024-03-31 | 1 | -13/+15
* Add Moondream transformer implementation and example (#1970) | Santiago Medina | 2024-03-31 | 3 | -0/+329
* Remove some unnecessary calls to contiguous. (#1968) | Laurent Mazare | 2024-03-30 | 1 | -4/+10
* Qwen MoE model. (#1960) | Laurent Mazare | 2024-03-28 | 2 | -0/+489
* Fix clippy lints + minor cleanups. (#1957) | Laurent Mazare | 2024-03-28 | 4 | -100/+41
* CLIP model implementation with example (#1950) | Tigran Zhampeissov | 2024-03-28 | 4 | -0/+694
* add send and sync trait bounds for scheduler config in stable diffusion model... | Jorge António | 2024-03-28 | 1 | -1/+1
* add config for mamba 2.8b model parameter (#1946) | Jorge António | 2024-03-27 | 1 | -4/+4
* Another fix for squeezing. (#1943) | Laurent Mazare | 2024-03-26 | 1 | -2/+2
* Faster repeat penalty (#1940) | Laurent Mazare | 2024-03-26 | 1 | -3/+7
* Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 2 | -28/+12
* Avoid the attention mask where possible. (#1933) | Laurent Mazare | 2024-03-25 | 3 | -16/+32
* Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 1 | -26/+5
* Also avoid the mask in the llama example. | laurent | 2024-03-24 | 1 | -2/+6
* Avoid using the attn mask when not necessary. | laurent | 2024-03-24 | 1 | -5/+19
* Support more mistral models. (#1927) | Laurent Mazare | 2024-03-24 | 2 | -24/+31
* Allow for arbitrary temperature modifications. | laurent | 2024-03-23 | 1 | -1/+7
* Add topk sampling. (#1923) | Laurent Mazare | 2024-03-23 | 2 | -24/+88
* Avoid broadcasting on the batch dimension for the attention mask. (#1920) | Laurent Mazare | 2024-03-23 | 2 | -8/+6
* Fix loading the gguf files. (#1913) | Laurent Mazare | 2024-03-22 | 1 | -1/+1
* Fix for the llama model. (#1906) | Laurent Mazare | 2024-03-21 | 1 | -1/+1
* Use the fast RmsNorm in the quantized model. (#1904) | Laurent Mazare | 2024-03-21 | 3 | -35/+21
* Avoid copying the data on squeeze and unsqueeze. (#1884) | Laurent Mazare | 2024-03-20 | 2 | -2/+2
* Use a common with_tracing::RmsNorm in a few models. (#1871) | Jani Monoses | 2024-03-18 | 6 | -111/+29
* Expose some helper functions to create quantized models. (#1837) | Laurent Mazare | 2024-03-12 | 3 | -0/+15
* Add some tracing to metavoice. (#1826) | Laurent Mazare | 2024-03-09 | 2 | -8/+82
* Quantized version of the metavoice model. (#1824) | Laurent Mazare | 2024-03-09 | 4 | -4/+241