path: root/candle-transformers
Commit message | Author | Date | Files | Lines (-/+)
...
* fix: qwen2 lm_head loading #2443 (#2445) | ilookee | 2024-08-23 | 1 file | -1/+1
  Co-authored-by: Yi Xu <xuyi@me.com>
* Add FastViT model. (#2444) | Jani Monoses | 2024-08-23 | 2 files | -0/+513
* Fix for parler-tts, do not add the last slice of padding tokens. (#2442) | Laurent Mazare | 2024-08-22 | 1 file | -1/+0
  - Fix for parler-tts, do not add the last slice of padding tokens.
  - Support for the mini model.
* Add the DAC model. (#2433) | Laurent Mazare | 2024-08-19 | 4 files | -1/+383
  - Add the DAC model.
  - More quantization support.
  - Handle DAC decoding.
  - Plug the DAC decoding in parler-tts.
* parler-tts support (#2431) | Laurent Mazare | 2024-08-18 | 2 files | -0/+453
  - Start sketching parler-tts support.
  - Implement the attention.
  - Add the example code.
  - Fix the example.
  - Add the description + t5 encode it.
  - More of the parler forward pass.
  - Fix the positional embeddings.
  - Support random sampling in generation.
  - Handle EOS.
  - Add the python decoder.
  - Proper causality mask.
* Add support for gemma-2. (#2425) | Laurent Mazare | 2024-08-17 | 2 files | -0/+450
  - Add gemma-2.
  - Support a couple more models.
  - Sliding window support (sketched below).
  - Example + readme updates.
  - Update the main readme.
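Gemma-2 layers alternate between global and sliding-window attention. As a rough sketch of what a sliding-window mask does (illustrative function, not the model's exact code, which builds the mask from the config's sliding_window value):

```rust
use candle_core::{Device, Result, Tensor};

/// Causal mask that also hides keys more than `window` positions behind the
/// query, so each token attends to at most the last `window` tokens.
fn sliding_window_mask(seq_len: usize, window: usize, device: &Device) -> Result<Tensor> {
    let mask: Vec<f32> = (0..seq_len)
        .flat_map(|q| {
            (0..seq_len).map(move |k| {
                if k > q || q - k >= window {
                    f32::NEG_INFINITY // future position, or outside the window
                } else {
                    0.0
                }
            })
        })
        .collect();
    // Added to the attention scores; broadcasts over (batch, heads).
    Tensor::from_slice(&mask, (seq_len, seq_len), device)
}
```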
* Fix the device for the bert attention mask. (#2414) | Laurent Mazare | 2024-08-14 | 1 file | -1/+2
* Add Based LLM from Hazy Research. (#2411) | Jani Monoses | 2024-08-12 | 2 files | -0/+590
* Soft Non-Maximum Suppression (#2400) | Matthew O'Malley-Nichols | 2024-08-10 | 2 files | -0/+280
  - Soft NMS with thresholds
  - NMS Test
  - Soft nms w/ boxes removed below threshold
  - Soft nms test
  - No longer removing bounding boxes to fit Soft-NMS focus
  - Initialize confidence
  - Added comments
  - Refactored out updating based on IOU/sigma
  - Score_threshold -> confidence_threshold for clarity
  - Remove bboxes below confidence threshold
  - Softnms basic functionality test
  - Softnms confidence decay test
  - Softnms confidence threshold test
  - Softnms no overlapping bbox test
  - Testing confidence after no overlap test
  - Single bbox and no bbox tests
  - Signify test completion
  - Handling result of test functions
  - Checking all pairs of bboxes instead of a forward pass
  - Equal confidence overlap test
  - Clarified tests for implementation
  - No longer dropping boxes, just setting to 0.0
  - Formatted w/ cargo
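A sketch of the algorithm this entry describes: confidences decay by a Gaussian of the overlap instead of boxes being dropped, and anything falling below the confidence threshold is zeroed but kept. The real helper lives in the crate's object-detection utilities and its exact signature may differ.

```rust
/// Gaussian Soft-NMS over (confidence, [x1, y1, x2, y2]) boxes.
fn soft_nms(boxes: &mut [(f32, [f32; 4])], sigma: f32, confidence_threshold: f32) {
    boxes.sort_by(|a, b| b.0.total_cmp(&a.0)); // highest confidence first
    for i in 0..boxes.len() {
        if boxes[i].0 < confidence_threshold {
            continue; // already suppressed
        }
        for j in (i + 1)..boxes.len() {
            let iou = iou(&boxes[i].1, &boxes[j].1);
            boxes[j].0 *= (-iou * iou / sigma).exp(); // decay, do not drop
            if boxes[j].0 < confidence_threshold {
                boxes[j].0 = 0.0; // suppressed, but stays in the output
            }
        }
    }
}

fn iou(a: &[f32; 4], b: &[f32; 4]) -> f32 {
    let ix = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0);
    let iy = (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let inter = ix * iy;
    let union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
    inter / (union + 1e-6)
}
```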
* Add the MMDiT model of Stable Diffusion 3 (#2397) | Czxck001 | 2024-08-05 | 6 files | -0/+763
  - add mmdit of stable diffusion 3
  - lint, add comments
  - correct a misplaced comment
  - fix cargo fmt
  - fix clippy error
  - use bail! instead of assert!
  - use get_on_dim in splitting qkv
* add models support and example for THUDM/glm-4 (#2362) | 唐璜 | 2024-08-05 | 2 files | -0/+596
  - add models support and example for THUDM/glm-4
  - fix the ci report
  - fmt
  - fix
  - Update README.org
  - Update README.org
  - fmt
  - Update README.org
  - README.md add codegeex4
  - README.md add glm4
  - Typo.
  - change expect into ?
  Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
* Support for mistral-nemo. (#2396) | Laurent Mazare | 2024-08-04 | 1 file | -5/+12
* Simplify handling of flux modulations. (#2394) | Laurent Mazare | 2024-08-04 | 1 file | -46/+88
* Add the flux model for image generation. (#2390) | Laurent Mazare | 2024-08-04 | 5 files | -0/+1145
  - Add the flux autoencoder.
  - Add the encoder down-blocks.
  - Upsampling in the decoder.
  - Sketch the flow matching model.
  - More flux model.
  - Add some of the positional embeddings.
  - Add the rope embeddings.
  - Add the sampling functions.
  - Add the flux example.
  - Fix the T5 bits.
  - Proper T5 tokenizer.
  - Clip encoder path fix.
  - Get the clip embeddings.
  - No configurable weights in layer norm.
  - More weights related fixes.
  - Yet another shape fix.
  - DType fix.
  - Fix a couple more shape issues.
  - DType fixes.
  - Fix the latent dims.
  - Fix more shape issues.
  - Autoencoder fixes.
  - Get some generations out.
  - Bugfix.
  - T5 padding.
  - Clippy fix.
  - Add the decode only mode.
  - Fix.
  - More fixes.
  - Finally get some generations to work.
  - Add readme.
* Fix cargo fmt. (#2383) | Laurent Mazare | 2024-08-01 | 1 file | -0/+1
  - Fix cargo fmt.
  - Clippy fix.
  - Cosmetic tweaks.
* Jina Bert Example fix and more configuration (#2191) | Joan Fontanals | 2024-08-01 | 1 file | -0/+30
  - fix: fix jina bert example logic
  - feat: enable jina embeddings de
  - feat: allow more flexibility on Jina Bert
* Add Hiera vision model. (#2382) | Jani Monoses | 2024-08-01 | 2 files | -0/+303
* bert attention mask (#1934) | Zheng Li | 2024-08-01 | 1 file | -17/+32
  - bert attention mask
  - Allow for using None as a mask.
  - Revert part of the changes so that the proper default mask applies.
  - Cosmetic change.
  - Another cosmetic tweak.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
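Taken together with the device fix in #2414 above, the BERT mask handling boils down to the following pattern (names are illustrative, not the model's exact code): an optional [batch, seq_len] 0/1 padding mask becomes an additive bias on the model's device, and None falls back to the all-visible default.

```rust
use candle_core::{DType, Device, Result, Tensor};

/// Optional 0/1 padding mask -> additive bias broadcast over attention scores.
fn attention_bias(mask: Option<&Tensor>, dims: (usize, usize), device: &Device) -> Result<Tensor> {
    let (batch, seq_len) = dims;
    let mask = match mask {
        // Keep the mask on the device the model runs on (the #2414 fix).
        Some(m) => m.to_device(device)?.to_dtype(DType::F32)?,
        // None means "attend everywhere", the proper default mask.
        None => Tensor::ones((batch, seq_len), DType::F32, device)?,
    };
    // 1 -> 0.0 (visible), 0 -> f32::MIN (masked): bias = (1 - mask) * f32::MIN
    let bias = mask.affine(-1.0, 1.0)?.affine(f64::from(f32::MIN), 0.0)?;
    bias.reshape((batch, 1, 1, seq_len))
}
```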
* Add support for Llama 3.1 (#2359) | Eric Buehler | 2024-07-26 | 14 files | -50/+125
  - Add Llama 3.1 rope
  - Clippy
  - Format
  - Clippy
  - Add support for multiple eos tokens (sketched below)
  - Untagged either
  - Remove either dep and fix settings.json
  - Make the max positional embeddings configurable
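Multiple EOS tokens just turn the stop check into a set lookup. A trivial sketch; the ids shown are placeholders and the real ones come from the tokenizer / generation config.

```rust
use std::collections::HashSet;

/// Stop generation when the sampled token is any of the configured EOS ids.
fn is_eos(next_token: u32, eos_tokens: &HashSet<u32>) -> bool {
    eos_tokens.contains(&next_token)
}

fn main() {
    // Placeholder ids; load them from the model's generation config.
    let eos: HashSet<u32> = [128001, 128008, 128009].into_iter().collect();
    assert!(is_eos(128009, &eos));
    assert!(!is_eos(42, &eos));
}
```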
* feat(candle-transformers/models/codegeex4-9b): add codegeex4-9b (#2334) | donjuanplatinum | 2024-07-21 | 2 files | -0/+597
  - feat(candle-transformers/models/codegeex4-9b): add codegeex4-9b transformers
  - change mod.rs
  - feat(candle-examples/codegeex4-9b)
  - Update codegeex4_9b.rs
  - Update main.rs
  - Update codegeex4_9b.rs
  - Update main.rs
  - fmt
  - fix
  - fmt
  - Clippy fix.
  - Remove some print statements.
  - Avoid using unwrap.
  - 1. add README 2. change the print fmt
  - Another clippy fix.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
* add quantized qwen2 (#2329) | Zhuo Jinggang | 2024-07-12 | 2 files | -0/+324
  - add quantized version of qwen2 and corresponding example for qwen2-instruct
  - fix quantized qwen2 clippy error
* Add Mobilenet v4 (#2325) | Jani Monoses | 2024-07-09 | 2 files | -0/+801
  - Support different resolutions in load_image()
  - Added MobilenetV4 model.
  - Add MobileNetv4 to README
* Add EVA-02 model (https://arxiv.org/abs/2303.11331) (#2311) | v-espitalier | 2024-07-07 | 2 files | -0/+419
  - Add EVA-02 model (https://arxiv.org/abs/2303.11331)
  - Clippy fix.
  - And apply fmt.
  Co-authored-by: v-espitalier <>
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Beit: Add the gen_relative_position_index() function (#2306) | v-espitalier | 2024-07-04 | 1 file | -26/+63
  Co-authored-by: v-espitalier <>
* Add Beit model (https://arxiv.org/abs/2106.08254) (#2305) | v-espitalier | 2024-07-01 | 2 files | -0/+368
  Co-authored-by: v-espitalier <>
* Add DINOv2Reg4 + PlantCLEF2024 (#2293) | v-espitalier | 2024-06-29 | 2 files | -0/+282
  - Add DINOv2Reg4 with PlantCLEF2024 weights and example (see https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263)
  - Remove extra files + update README to download them + remove extra lines
  - Minor fix (README: remove extra spaces)
  - Minor fix (README: fix image url)
  - Add back interpolate_pos_encoding() + fix when no interpolation + remove extra comments + update README (source image changed and so the predictions)
  - Fix: improve code readability with '$ cargo clippy' and '$ cargo fmt'
  - Another clippy fix.
  Co-authored-by: x-VEspit <vincent.espitalier@cirad.fr>
  Co-authored-by: laurent <laurent.mazare@gmail.com>
* Depth Anything v2 (#2279) | Jeroen Vlek | 2024-06-24 | 3 files | -0/+632
  - define structs
  - construct ResidualConvUnit
  - forward() for ResidualConvUnit
  - implement FeatureFusionBlock
  - implement Scratch
  - implement DPTHead
  - add identity module
  - implement forward for DPTHead
  - add get_intermediate_layers to DinoVisionTransformer
  - implement DepthAnythingV2
  - some minor tweaks
  - fix compile errors
  - fix var builder prefixes
  - setup initial example
  - use fixed patch size of 37 (518 / 14)
  - debugged until output
  - print min and max values
  - add some dynamism to the output location
  - scale input image
  - extract prep function
  - extract output path function
  - normalize image with magic mean and std
  - add spectral coloring
  - squeeze in the right place
  - make interpolation optional
  - use bail instead of panic
  - omit unnecessary Shape call
  - remove empty curly braces
  - use bail instead of assert
  - use vb and pp
  - remove closures
  - extract config object
  - Apply rustfmt.
  - Fix some clippy lints.
  - More lints.
  - Use the array methods.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
* Fix the fast bf16 gemm cublas kernels. (#2274) | Laurent Mazare | 2024-06-18 | 1 file | -2/+1
  - Use flash-attn in gemma.
  - Fix for the fast bf16 cublas gemm.
  - Fix some clippy lints.
  - Fix another lint.
  - Proper clippy fix.
* Support for the new Qwen2 models. (#2257) | Laurent Mazare | 2024-06-07 | 1 file | -2/+6
  - Support for the new Qwen2 models.
  - Add more models.
* Add LLaVA support (#2234) | chenwanqq | 2024-06-03 | 7 files | -0/+776
  - first commit
  - llava
  - clippy and fmt
  - some fixes
  - minor fixes
  - remove useless file
  - refactor: Remove llava/constants.rs and update llava/mod.rs
  - modify variable name
  - modify code after clippy
  - Minor tweaks.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add Debug, Clone, Deserialize to moondream config (#2222) | Dave Lage | 2024-05-28 | 1 file | -0/+1
* Enable the new layer-norm. (#2213) | Laurent Mazare | 2024-05-24 | 1 file | -8/+4
  - Enable the new layer-norm.
  - Shape fixes.
* Avoid a contiguous call in the quantized phi 3 model. (#2209) | Laurent Mazare | 2024-05-23 | 1 file | -1/+1
  - Simplify the KvCache api.
  - Avoid a contiguous call in the quantized phi3 model.
* Simplify the KvCache api. (#2207) | Laurent Mazare | 2024-05-23 | 1 file | -7/+1
* Use flash-attn in gemma. (#2195) | Laurent Mazare | 2024-05-18 | 1 file | -18/+44
  - Use flash-attn in gemma.
  - Fix flash-attn for head dim 256.
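The usual candle pattern for this is a feature-gated attention helper: the flash-attn kernel when the crate is built with the flash-attn feature, a plain matmul-softmax path otherwise. A sketch only; layout transposes and the causal mask in the fallback are elided, so this is not the exact gemma code.

```rust
use candle_core::{Result, Tensor};

#[cfg(feature = "flash-attn")]
fn attn(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32, causal: bool) -> Result<Tensor> {
    // flash-attn expects (batch, seq, heads, head_dim) f16/bf16 tensors;
    // the caller is assumed to have put q/k/v into that layout.
    candle_flash_attn::flash_attn(q, k, v, scale, causal)
}

#[cfg(not(feature = "flash-attn"))]
fn attn(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32, _causal: bool) -> Result<Tensor> {
    // Naive path over (batch, heads, seq, head_dim); causal masking elided.
    let scores = (q.matmul(&k.t()?)? * f64::from(scale))?;
    let probs = candle_nn::ops::softmax_last_dim(&scores)?;
    probs.matmul(v)
}
```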
* Support flash-attn in quantized phi3. (#2194) | Laurent Mazare | 2024-05-18 | 1 file | -10/+40
* Add a slice_set op. (#2193) | Laurent Mazare | 2024-05-18 | 1 file | -22/+19
  - Add a slice_set op.
  - Add some testing.
  - Add the dedicated kv-cache module (sketched below).
  - Derive debug and clone.
  - Expose more kv-cache functions.
  - Return the current data when appending.
  - Use the new cache in the quantized phi3 model.
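A sketch of the new cache, assuming the candle-nn kv_cache API introduced here (KvCache::new(dim, max_seq_len) and append returning everything accumulated so far). The point of slice_set is that the buffer is allocated once and each step's keys/values are copied in at the current offset, instead of growing the tensors with cat on every token.

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn demo() -> Result<()> {
    let dev = Device::Cpu;
    // Cache concatenating along dim 2 (the sequence axis), up to 512 positions.
    let mut cache = KvCache::new(2, 512);
    for step in 0..3 {
        // One new position per decoding step: (batch, heads, seq = 1, head_dim).
        let k = Tensor::zeros((1, 8, 1, 64), DType::F32, &dev)?;
        let v = Tensor::zeros((1, 8, 1, 64), DType::F32, &dev)?;
        // append slice_sets into the buffer and returns the current data.
        let (k_all, _v_all) = cache.append(&k, &v)?;
        assert_eq!(k_all.dim(2)?, step + 1);
    }
    Ok(())
}
```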
* Support embedding model gte-Qwen1.5-7B-instruct (#2190) | Yin Guobing | 2024-05-16 | 1 file | -15/+62
  - Support embedding model gte-Qwen1.5-7B-instruct. This is a text embedding model based on Qwen2; the two share the same architecture except for the last MLP module. This commit brings in a minimal modification of the old Qwen2 implementation to support both models. An example is provided and has been verified against the official PyTorch implementation.
  - Avoid doing the 'last-token filtering' based on the absence of an attention mask.
  Co-authored-by: Laurent <laurent.mazare@gmail.com>
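For decoder-style embedding models like this one, the sentence embedding is pooled from the hidden state of the last real token. A sketch assuming left padding (so the last position is always valid); the actual example derives the index from the attention mask rather than assuming it.

```rust
use candle_core::{Result, Tensor};

/// hidden: (batch, seq_len, hidden_size) -> (batch, hidden_size)
fn last_token_pool(hidden: &Tensor) -> Result<Tensor> {
    let (_batch, seq_len, _hidden) = hidden.dims3()?;
    hidden.narrow(1, seq_len - 1, 1)?.squeeze(1)
}
```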
* Separate quantized phi-3 implementation. (#2157) | Laurent Mazare | 2024-05-04 | 3 files | -4/+306
  - Separate quantized phi-3 implementation.
  - Integrate the quantized phi3 model.
  - Small fixes, get the generation to work properly.
  - Keep the old llama implementation around.
  - Change the default.
* Bump the version number to 0.5.1. (#2155) | Laurent Mazare | 2024-05-03 | 1 file | -1/+1
  - Bump the version number to 0.5.1.
  - Fix clippy lints for 1.78.
  - More clippy fixes.
* Add argsort. (#2132) | Laurent Mazare | 2024-04-27 | 2 files | -43/+21
  - Add the argsort cuda kernels.
  - CPU version of arg-sort.
  - Hook the cuda kernel + rework the cpu bits.
  - Add some dedicated test.
  - Working cuda kernel.
  - Metal kernel.
  - Metal adjustments.
  - Bugfix.
  - Use the fast rope in qwen.
  - Rework the expert selection in qwen.
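The new op sorts along the last dimension and returns the sorting indices as u32, which is presumably what the qwen expert-selection rework ranks router scores with. A small CPU check of those semantics, assuming the arg_sort_last_dim name this change introduces:

```rust
use candle_core::{Device, Result, Tensor};

fn demo() -> Result<()> {
    let dev = Device::Cpu;
    let t = Tensor::new(&[[3f32, 1., 2.], [0.5, 2., 1.]], &dev)?;
    // Indices that would sort each row in ascending order.
    let idx = t.arg_sort_last_dim(true)?;
    assert_eq!(idx.to_vec2::<u32>()?, vec![vec![1, 2, 0], vec![0, 2, 1]]);
    Ok(())
}
```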
* Add Olmo models (#2127) | Isotr0py | 2024-04-26 | 2 files | -0/+338
  - add olmo support
  - add olmo readme
  - Fix fmt.
  - Fix clippy.
  - Get olmo to work on cuda.
  Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add the phi-3 model. (#2120) | Laurent Mazare | 2024-04-24 | 2 files | -0/+330
  - Add the phi-3 model.
  - Faster rope.
  - Bugfix.
  - Fix the detokenization.
* Use the faster rms-norm kernel for llama. (#2107) | Laurent Mazare | 2024-04-22 | 1 file | -0/+5
  - Use the faster rms-norm kernel for llama.
  - Use the fast variant by default.
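For reference, the computation the fused rms-norm kernel speeds up (candle-nn exposes it as RmsNorm; this is just the slow path spelled out):

```rust
use candle_core::{Result, Tensor};

/// y = x / sqrt(mean(x^2) + eps) * weight, over the last dimension.
fn rms_norm_ref(x: &Tensor, weight: &Tensor, eps: f64) -> Result<Tensor> {
    let last = x.rank() - 1;
    let mean_sq = x.sqr()?.mean_keepdim(last)?;
    let normed = x.broadcast_div(&(mean_sq + eps)?.sqrt()?)?;
    normed.broadcast_mul(weight)
}
```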
* Updated quantized phi model (#2099) | Laurent Mazare | 2024-04-21 | 2 files | -0/+289
  - Quantized phi in a separate file.
  - Add the quantized phi example + rework the model code.
  - Improve the phi model.
  - Get some generation out.
  - Use the appropriate rope shape.
  - Tweak the default prompt.
  Co-authored-by: Jane Doe <jane.doe@example.org>
* Derive clone and debug traits for Moondream model (#2100) | Santiago Medina | 2024-04-21 | 1 file | -0/+1
  - moondream implementation
  - add moondream example
  - change config default activation
  - Add assets and integrate phi mixformer with example
  - Make use of kv cache and fix seq_len bug; Clean up example code
  - Add README link to example
  - Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
  - Delete image
  - Use apply instead of forward
  - Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
  - Derive debug and clone traits for Moondream model.
* Small cleanups to the llama multi-process example. (#2098) | Laurent Mazare | 2024-04-20 | 1 file | -1/+7
* Fix for gemma MQA. (#2091) | Laurent Mazare | 2024-04-19 | 1 file | -2/+3
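Gemma uses multi-query attention: a handful of kv heads serve all query heads, so the kv tensors get expanded before the score matmul. A sketch of the standard expansion step such models rely on (candle's models keep a repeat_kv helper along these lines; illustrative, not the exact fix):

```rust
use candle_core::{Result, Tensor};

/// Repeat each kv head n_rep times so kv matches the number of query heads.
fn repeat_kv(xs: Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        return Ok(xs);
    }
    let (b, n_kv_head, seq_len, head_dim) = xs.dims4()?;
    xs.unsqueeze(2)?
        .broadcast_as((b, n_kv_head, n_rep, seq_len, head_dim))?
        .contiguous()?
        .reshape((b, n_kv_head * n_rep, seq_len, head_dim))
}
```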
* Use faster rotary embeddings for llama-like models. (#2087) | Laurent Mazare | 2024-04-18 | 1 file | -11/+6
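The faster path is the fused rope kernel in candle-nn (candle_nn::rotary_emb::rope). Written out with plain tensor ops it is the half-rotation below, assuming the non-interleaved layout and cos/sin tables already broadcastable to the full head dimension:

```rust
use candle_core::{Result, Tensor};

/// xs: (batch, heads, seq, head_dim); cos/sin broadcastable to the same shape.
fn rope_ref(xs: &Tensor, cos: &Tensor, sin: &Tensor) -> Result<Tensor> {
    let (_b, _h, _t, d) = xs.dims4()?;
    let x1 = xs.narrow(3, 0, d / 2)?;
    let x2 = xs.narrow(3, d / 2, d / 2)?;
    // rotate_half: (x1, x2) -> (-x2, x1)
    let rotated = Tensor::cat(&[&x2.neg()?, &x1], 3)?;
    xs.broadcast_mul(cos)?.broadcast_add(&rotated.broadcast_mul(sin)?)
}
```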
* Llama v3. (#2085) | Laurent Mazare | 2024-04-18 | 1 file | -0/+10
  - Llama v3.
  - Tweak the default params + handle special tokens.
  - Small tweak.