path: root/candle-transformers
Commit message | Author | Date | Files | Lines (-/+)
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 | -1/+1
* Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 2 | -43/+21
* Add Olmo models (#2127) | Isotr0py | 2024-04-26 | 2 | -0/+338
* Add the phi-3 model. (#2120) | Laurent Mazare | 2024-04-24 | 2 | -0/+330
* Use the faster rms-norm kernel for llama. (#2107) | Laurent Mazare | 2024-04-22 | 1 | -0/+5
* Updated quantized phi model (#2099) | Laurent Mazare | 2024-04-21 | 2 | -0/+289
* Derive clone and debug traits for Moondream model (#2100) | Santiago Medina | 2024-04-21 | 1 | -0/+1
* Small cleanups to the llama multi-process example. (#2098) | Laurent Mazare | 2024-04-20 | 1 | -1/+7
* Fix for gemma MQA. (#2091) | Laurent Mazare | 2024-04-19 | 1 | -2/+3
* Use faster rotary embeddings for llama like models. (#2087) | Laurent Mazare | 2024-04-18 | 1 | -11/+6
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 | -0/+10
* Make the falcon model cloneable. (#2067) | Laurent Mazare | 2024-04-15 | 1 | -5/+5
* Add a function to clear the KV cache in falcon. (#2066) | Laurent Mazare | 2024-04-15 | 1 | -0/+14
* Add a quantized version of recurrent-gemma. (#2054) | Laurent Mazare | 2024-04-13 | 4 | -61/+477
* Avoid crashes when running T5 models with F16 tensors on CPU (#2047) | Victor-Mihaila | 2024-04-13 | 1 | -1/+1
* Change for the encoder-only ProstT5 model (#2045) | Victor-Mihaila | 2024-04-13 | 1 | -1/+3
* Add the recurrent-gemma model. (#2039) | Laurent Mazare | 2024-04-13 | 2 | -0/+641
* Use cat for faster MQA computation. (#2043) | Laurent Mazare | 2024-04-12 | 16 | -195/+47
* Add the code-gemma models. (#2038) | Laurent Mazare | 2024-04-10 | 1 | -4/+15
* Support alternative dtypes for mamba (#2036) | Laurent Mazare | 2024-04-10 | 3 | -8/+15
* Add the new gemma models. (#2023) | Laurent Mazare | 2024-04-06 | 1 | -0/+1
* Fix the final rmsnorm for quantized-metavoice. (#2021) | Laurent Mazare | 2024-04-06 | 1 | -0/+1
* Faster mask implementation for mixformers. (#2017) | Laurent Mazare | 2024-04-05 | 1 | -21/+6
* Moondream tracing. (#2016) | Laurent Mazare | 2024-04-05 | 2 | -13/+48
* Add the rope THD kernel. (#2014) | Laurent Mazare | 2024-04-05 | 1 | -22/+6
* Use F16 for moondream on cuda. (#2013) | Laurent Mazare | 2024-04-04 | 1 | -5/+8
* Include topk sampling in the quantized example. (#2005) | Laurent Mazare | 2024-04-04 | 1 | -1/+25
* Relax the contiguous check for cuda kernels. (#2000) | Laurent Mazare | 2024-04-03 | 1 | -1/+2
* Improve the handling of matmul with squeezed layouts. (#1998) | Laurent Mazare | 2024-04-02 | 1 | -1/+1
* Match Moondream's latest release (#1997) | Santiago Medina | 2024-04-02 | 1 | -1/+1
* first commit (#1994) | Jorge António | 2024-04-02 | 1 | -1/+2
* Stable diffusion fix. (#1993) | Laurent Mazare | 2024-04-02 | 1 | -1/+3
* Expose the t5 config fields + allow t5-large. (#1987) | Laurent Mazare | 2024-04-01 | 1 | -16/+16
* Quantized moondream implementation and BOS token (#1980) | Santiago Medina | 2024-04-01 | 5 | -16/+316
* Add options to use local files + specify a custom repo or branch. (#1973) | Laurent Mazare | 2024-03-31 | 1 | -13/+15
* Add Moondream transformer implementation and example (#1970) | Santiago Medina | 2024-03-31 | 3 | -0/+329
* Remove some unnecessary calls to contiguous. (#1968) | Laurent Mazare | 2024-03-30 | 1 | -4/+10
* Qwen MoE model. (#1960) | Laurent Mazare | 2024-03-28 | 2 | -0/+489
* Fix clippy lints + minor cleanups. (#1957) | Laurent Mazare | 2024-03-28 | 4 | -100/+41
* CLIP model implementation with example (#1950) | Tigran Zhampeissov | 2024-03-28 | 4 | -0/+694
* add send and sync trait bounds for scheduler config in stable diffusion model... | Jorge António | 2024-03-28 | 1 | -1/+1
* add config for mamba 2.8b model parameter (#1946) | Jorge António | 2024-03-27 | 1 | -4/+4
* Another fix for squeezing. (#1943) | Laurent Mazare | 2024-03-26 | 1 | -2/+2
* Faster repeat penalty (#1940) | Laurent Mazare | 2024-03-26 | 1 | -3/+7
* Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 2 | -28/+12
* Avoid the attention mask where possible. (#1933) | Laurent Mazare | 2024-03-25 | 3 | -16/+32
* Fast kernels for rotary embeddings. (#1928) | Laurent Mazare | 2024-03-24 | 1 | -26/+5
* Also avoid the mask in the llama example. | laurent | 2024-03-24 | 1 | -2/+6
* Avoid using the attn mask when not necessary. | laurent | 2024-03-24 | 1 | -5/+19
* Support more mistral models. (#1927) | Laurent Mazare | 2024-03-24 | 2 | -24/+31