* Fix cargo fmt.
* Clippy fix.
* Cosmetic tweaks.
* fix: fix jina bert example logic
* feat: enable jina embeddings de
* feat: allow more flexibility on Jina Bert
* bert attention mask
* Allow for using None as a mask.
* Revert part of the changes so that the proper default mask applies.
* Cosmetic change.
* Another cosmetic tweak.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
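
A minimal sketch of the call site this enables, assuming the updated forward signature in candle-transformers takes the mask as an Option: pass None to get the default all-ones mask, or Some(mask) for padded batches.

```rust
use candle::{Result, Tensor};
use candle_transformers::models::bert::BertModel;

// Hypothetical helper; in the mask, 1 = attend, 0 = padding to ignore.
fn embed(model: &BertModel, ids: &Tensor, type_ids: &Tensor, mask: Option<&Tensor>) -> Result<Tensor> {
    model.forward(ids, type_ids, mask)
}
```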
* Add Llama 3.1 rope
* Clippy
* Format
* Clippy
* Add support for multiple eos tokens.
* Untagged either
* Remove either dep and fix settings.json
* Make the max positional embeddings configurable
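
The Llama 3.1 rope differs from the vanilla variant only in how the inverse frequencies are scaled. A sketch of that scaling, assuming the parameters from the released 3.1 config (scale factor 8, low/high frequency factors 1 and 4, original context 8192), which are not spelled out in this log:

```rust
// Low-frequency components are divided by the scale factor, high-frequency
// components are kept as-is, and the band in between interpolates smoothly.
fn scale_inv_freqs(inv_freqs: &[f32]) -> Vec<f32> {
    let (factor, low_f, high_f, orig_ctx) = (8f32, 1f32, 4f32, 8192f32);
    let low_wavelen = orig_ctx / low_f;
    let high_wavelen = orig_ctx / high_f;
    inv_freqs
        .iter()
        .map(|&f| {
            let wavelen = 2. * std::f32::consts::PI / f;
            if wavelen < high_wavelen {
                f
            } else if wavelen > low_wavelen {
                f / factor
            } else {
                let smooth = (orig_ctx / wavelen - low_f) / (high_f - low_f);
                (1. - smooth) * f / factor + smooth * f
            }
        })
        .collect()
}
```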
* feat(candle-transformers/models/codegeex4-9b): add codegeex4-9b transformers
* change mod.rs
* feat(candle-examples/codegeex4-9b)
* Update codegeex4_9b.rs
* Update main.rs
* Update codegeex4_9b.rs
* Update main.rs
* fmt
* fix
* fmt
* Clippy fix.
* Remove some print statements.
* Avoid using unwrap.
* 1. add README
2. change the print fmt
* Another clippy fix.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
* add quantized version of qwen2 and corresponding example for qwen2-instruct
* fix quantized qwen2 clippy error
* Support different resolutions in load_image()
* Added MobileNetV4 model.
* Add MobileNetV4 to README
* Add EVA-02 model (https://arxiv.org/abs/2303.11331)
* Clippy fix.
* And apply fmt.
---------
Co-authored-by: v-espitalier <>
Co-authored-by: Laurent <laurent.mazare@gmail.com>
Co-authored-by: v-espitalier <>
Co-authored-by: v-espitalier <>
* Add: DINOv2Reg4 with PlantCLEF2024 weights and example (see https://arxiv.org/abs/2309.16588 and https://zenodo.org/records/10848263)
* Remove extra files + update README to download them + remove extra lines
* minor fix (README remove extra spaces)
* minor fix (README: Fix image url)
* Change: Add back interpolate_pos_encoding() + fix when no interpolation + remove extra comments + Update README (the source image changed, and so did the predictions)
* Fix: Improve code readability with '$ cargo clippy' and '$ cargo fmt'
* Another clippy fix.
---------
Co-authored-by: x-VEspit <vincent.espitalier@cirad.fr>
Co-authored-by: laurent <laurent.mazare@gmail.com>
* define structs
* construct ResidualConvUnit
* forward() for ResidualConvUnit
* implement FeatureFusionBlock
* implement Scratch
* implement DPTHead
* add identity module
* implement forward for DPTHead
* add get_intermediate_layers to DinoVisionTransformer
* implement DepthAnythingV2
* some minor tweaks
* fix compile errors
* fix var builder prefixes
* setup initial example
* use fixed patch size of 37 (518 / 14)
* debugged until output
* print min and max values
* add some dynamism to the output location
* scale input image
* extract prep function
* extract output path function
* normalize image with magic mean and std
* add spectral coloring
* squeeze in the right place
* make interpolation optional
* use bail instead of panic
* omit unnecessary Shape call
* remove empty curly braces
* use bail instead of assert
* use vb and pp
* remove closures
* extract config object
* Apply rustfmt.
* Fix some clippy lints.
* More lints.
* Use the array methods.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
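
The "normalize image with magic mean and std" step above presumably refers to the usual ImageNet statistics; a minimal sketch for an f32 image of shape (3, H, W) already scaled to [0, 1]:

```rust
use candle::{Result, Tensor};

fn normalize(img: &Tensor) -> Result<Tensor> {
    let dev = img.device();
    // ImageNet channel-wise mean and std; an assumption, not read off the diff.
    let mean = Tensor::new(&[0.485f32, 0.456, 0.406], dev)?.reshape((3, 1, 1))?;
    let std = Tensor::new(&[0.229f32, 0.224, 0.225], dev)?.reshape((3, 1, 1))?;
    img.broadcast_sub(&mean)?.broadcast_div(&std)
}
```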
* Use flash-attn in gemma.
* Fix for the fast bf16 cublas gemm.
* Fix some clippy lints.
* Fix another lint.
* Proper clippy fix.
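
For context, a hedged sketch of the swapped-in attention call: candle_flash_attn::flash_attn takes (batch, seq, heads, head_dim) tensors in f16/bf16 on CUDA, with the usual 1/sqrt(head_dim) scale.

```rust
use candle::{Result, Tensor};

fn attention(q: &Tensor, k: &Tensor, v: &Tensor, head_dim: usize) -> Result<Tensor> {
    let scale = 1f32 / (head_dim as f32).sqrt();
    // causal = true for autoregressive decoding.
    candle_flash_attn::flash_attn(q, k, v, scale, true)
}
```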
* Support for the new Qwen2 models.
* Add more models.
* first commit
* llava
* clippy and fmt
* some fixes
* minor fixes
* remove useless file
* refactor: Remove llava/constants.rs and update llava/mod.rs
* modify variable name
* modify code after clippy
* Minor tweaks.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Enable the new layer-norm.
* Shape fixes.
* Simplify the KvCache api.
* Avoid a contiguous call in the quantized phi3 model.
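
A minimal sketch of the simplified api, assuming the cache is created with the concat dimension and a maximum sequence length (dim 2 for (batch, heads, seq, head_dim) tensors):

```rust
use candle::{Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn make_cache(max_seq_len: usize) -> KvCache {
    // 2 is the dimension that keys/values are concatenated along.
    KvCache::new(2, max_seq_len)
}

fn decode_step(cache: &mut KvCache, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
    // append stores the new slice and returns the full keys/values so far.
    cache.append(k, v)
}
```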
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
* Add a slice_set op.
* Add some testing.
* Add the dedicated kv-cache module.
* Derive debug and clone.
* Expose more kv-cache functions.
* Return the current data when appending.
* Use the new cache in the quantized phi3 model.
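
The kv-cache is built on the new op: slice_set copies a source tensor into a pre-allocated buffer at a given offset along one dimension, so appending to the cache no longer re-concatenates the whole history. A hedged sketch:

```rust
use candle::{Result, Tensor};

// Write `src` into `buf` starting at `offset` along the sequence dimension
// (assumed to be dim 2 here); the buffer itself is not reallocated.
fn append_to_cache(buf: &Tensor, src: &Tensor, offset: usize) -> Result<()> {
    buf.slice_set(src, 2, offset)
}
```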
* Support embedding model gte-Qwen1.5-7B-instruct
This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit brings in
a minimal modification of the old Qwen2 implementation to support both
models.
An example is provided and has been verified against the official
PyTorch implementation.
* Avoid doing the 'last-token filtering' based on the absence of attention mask.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
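
The embedding itself is read from the last token's hidden state; a simplified sketch of that pooling step, assuming no right-padding so the final position is always valid:

```rust
use candle::{IndexOp, Result, Tensor};

// (batch, seq, hidden) -> (batch, hidden): select the last position.
fn last_token_pool(hidden: &Tensor) -> Result<Tensor> {
    let (_b, seq_len, _h) = hidden.dims3()?;
    hidden.i((.., seq_len - 1, ..))
}
```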
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
* Bump the version number to 0.5.1.
* Fix clippy lints for 1.78.
* More clippy fixes.
* Add the argsort cuda kernels.
* CPU version of arg-sort.
* Hook the cuda kernel + rework the cpu bits.
* Add some dedicated test.
* Working cuda kernel.
* Metal kernel.
* Metal adjustments.
* Bugfix.
* Use the fast rope in qwen.
* Rework the expert selection in qwen.
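
The reworked expert selection can sit directly on the new op; a sketch of a top-k over router logits, sorting the last dimension in descending order and keeping the first k indices:

```rust
use candle::{D, Result, Tensor};

fn topk_experts(router_logits: &Tensor, k: usize) -> Result<Tensor> {
    // arg_sort_last_dim(false) returns the descending permutation indices.
    let sorted_idx = router_logits.arg_sort_last_dim(false)?;
    sorted_idx.narrow(D::Minus1, 0, k)
}
```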
* add olmo support
* add olmo readme
* Fix fmt.
* Fix clippy.
* Get olmo to work on cuda.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Add the phi-3 model.
* Faster rope.
* Bugfix.
* Fix the detokenization.
* Use the faster rms-norm kernel for llama.
* Use the fast variant by default.
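
At the call site this is mostly a one-liner: candle_nn::ops::rms_norm dispatches to the dedicated kernel where one is available instead of composing the normalization from primitive ops. A sketch:

```rust
use candle::{Result, Tensor};

// rms_norm(x) = weight * x / sqrt(mean(x^2) + eps)
fn norm(xs: &Tensor, weight: &Tensor, eps: f32) -> Result<Tensor> {
    candle_nn::ops::rms_norm(xs, weight, eps)
}
```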
* Quantized phi in a separate file.
* Add the quantized phi example + rework the model code.
* Improve the phi model.
* Get some generation out.
* Use the appropriate rope shape.
* Tweak the default prompt.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
* moondream implementation
* add moondream example
* change config default activation
* Add assets and integrate phi mixformer with example
* Make use of kv cache and fix seq_len bug; Clean up example code
* Add README link to example
* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig
* Delete image
* Use apply instead of forward
* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
* Derive debug and clone traits for Moondream model.
* Llama v3.
* Tweak the default params + handle special tokens.
* Small tweak.
* Add a function to clear the KV cache in falcon.
* Clippy.
* Add a quantized version of recurrent-gemma.
* Share the rglru part.
* Get the quantized gemma model to work.
* This change avoids crashes when running T5 models with F16 tensors on CPU.
* This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point.
* Revert "This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point."
This reverts commit d886d3ce5e3f1504934f4f6f7cf86108b7efd191.
* This change avoids crashes when running T5 models with F16 tensors on CPU.
* This enables running ProstT5's (https://huggingface.co/Rostlab/ProstT5) encoder-only mode in Candle. This ProstT5 mode stores its embed_tokens weights within the encoder, as its decoding stage was replaced with a CNN. This alone is not sufficient to run ProstT5 within Candle examples. We will develop a ProstT5 runner outside candle for now, but would be willing to upstream it to candle-examples at a later point.
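
A hedged sketch of the encoder-only path this enables, assuming the usual candle t5 api with the weights already loaded into a VarBuilder:

```rust
use candle::{Result, Tensor};
use candle_nn::VarBuilder;
use candle_transformers::models::t5;

fn encode(vb: VarBuilder, cfg: &t5::Config, input_ids: &Tensor) -> Result<Tensor> {
    let mut model = t5::T5EncoderModel::load(vb, cfg)?;
    // Returns the (batch, seq_len, d_model) encoder hidden states.
    model.forward(input_ids)
}
```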
* Start adding the recurrent-gemma model.
* More griffin.
* Add the example + get the weights to load from the HF version.
* More inference code.
* Rope + kv-cache on the attention side.
* Add to the inference code.
* Add more to the recurrent gemma inference.
* Get some first inference to run.
* Add the softcap on logits.
* Fixes.
* Use partial rotary embeddings.
* Get inference to work.
* Add a comment.
* And add a readme.
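
The softcap mentioned above squashes logits smoothly into (-cap, cap) instead of clipping them hard; a minimal sketch:

```rust
use candle::{Result, Tensor};

// softcap(x) = cap * tanh(x / cap)
fn softcap(logits: Tensor, cap: f64) -> Result<Tensor> {
    (logits / cap)?.tanh()? * cap
}
```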
* Use cat for faster MQA computation.
* Move the function to utils + use it in mistral.
* Use the shared repeat-kv in a few more models.
* Fix.
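
Roughly the shared helper: each kv head is repeated n_rep times so grouped-query keys/values line up with the attention heads. Concatenating along the sequence dimension and reshaping yields adjacent copies per head without a dedicated repeat kernel:

```rust
use candle::{Result, Tensor};

fn repeat_kv(xs: Tensor, n_rep: usize) -> Result<Tensor> {
    if n_rep == 1 {
        return Ok(xs);
    }
    let (b, n_kv, seq, hd) = xs.dims4()?;
    // (b, n_kv, n_rep * seq, hd) reinterpreted as (b, n_kv * n_rep, seq, hd).
    Tensor::cat(&vec![&xs; n_rep], 2)?.reshape((b, n_kv * n_rep, seq, hd))
}
```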
* Add the code-gemma models.
* Tweak to the gemma config.
* Allow different dtypes in mamba.
* Add a dtype flag.
* Add the new gemma models.
* Revert the lightning changes.
* Support for the 1.1 models.
* Faster mask implementation for mixformers.
* Clippy.
* Moondream tracing.
* A bit more tracing.
* Add the rope THD kernel.
* Cuda kernel for rope-thd.
* Add the metal kernels.
* Add a dedicated test.
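
A sketch of calling the new kernel. Two assumptions here: "thd" refers to a (batch, time, heads, head_dim) layout, and the cos/sin tables are (time, head_dim / 2) as with the other rope variants.

```rust
use candle::{DType, Result, Tensor};

fn apply_rope_thd(xs: &Tensor, theta: f32) -> Result<Tensor> {
    let (_b, t, _h, d) = xs.dims4()?;
    let dev = xs.device();
    let inv_freq: Vec<f32> = (0..d / 2)
        .map(|i| 1f32 / theta.powf(2. * i as f32 / d as f32))
        .collect();
    let inv_freq = Tensor::from_vec(inv_freq, d / 2, dev)?; // (d / 2)
    let pos = Tensor::arange(0u32, t as u32, dev)?.to_dtype(DType::F32)?; // (t)
    let freqs = pos.reshape((t, 1))?.broadcast_mul(&inv_freq.reshape((1, d / 2))?)?;
    candle_nn::rotary_emb::rope_thd(xs, &freqs.cos()?, &freqs.sin()?)
}
```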