Commit log
* Support more mistral models.
* Use the appropriate rope parameter.
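The "rope parameter" here is the rotary-embedding base (`rope_theta` in the model config), which differs between Mistral variants and so must be read per model rather than hard-coded. A minimal plain-Rust sketch of how that base turns into per-dimension inverse frequencies (names are illustrative, not candle's API):

```rust
// Sketch: computing RoPE inverse frequencies from a model's rope_theta.
// A larger base makes the higher dimensions rotate more slowly, which is
// why using the wrong value degrades long-context behavior.
fn rope_inv_freqs(head_dim: usize, rope_theta: f32) -> Vec<f32> {
    (0..head_dim / 2)
        .map(|i| 1.0 / rope_theta.powf(2.0 * i as f32 / head_dim as f32))
        .collect()
}
```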
* Update the image crate + use the re-exported version.
* Update to using ab_glyph.
* Avoid copying the data on squeeze and unsqueeze.
* Fix the quantized llama example.
* Unrelated fix for the quantized stable-lm example on cuda.
* Fix for mamba on cuda (unrelated to the PR).
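Squeeze and unsqueeze only add or drop a size-1 dimension, so they can be implemented as pure metadata changes over a shared buffer instead of a copy. A minimal sketch of that idea (not candle's actual types):

```rust
// Sketch: squeeze/unsqueeze as shape-only operations. The underlying data
// buffer is untouched; only the shape metadata changes, so no copy occurs.
#[derive(Debug, Clone)]
struct View {
    shape: Vec<usize>,
}

fn squeeze(v: &View, dim: usize) -> View {
    assert_eq!(v.shape[dim], 1, "can only squeeze a size-1 dimension");
    let mut shape = v.shape.clone();
    shape.remove(dim);
    View { shape }
}

fn unsqueeze(v: &View, dim: usize) -> View {
    let mut shape = v.shape.clone();
    shape.insert(dim, 1);
    View { shape }
}
```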
* Improve the encodec example: handle resampling.
* Play the audio directly.
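Resampling is needed because input audio rarely matches the model's expected sample rate (e.g. 24 kHz for the encodec checkpoints). As an illustration of the operation only, a naive linear-interpolation resampler in plain Rust (the example itself may use a more careful routine):

```rust
// Sketch: naive linear-interpolation sample-rate conversion.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if from_hz == to_hz {
        return input.to_vec();
    }
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = ((input.len() as f64) / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(input.len() - 1)];
            // Linear interpolation between the two nearest input samples.
            a + (b - a) * frac
        })
        .collect()
}
```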
* Update the gemma README.
* Fix it.
* Quantized version of the metavoice model.
* Integrate the quantized version of metavoice.
* Fast CPU kernel for transposed 1d convolutions.
* Bugfix.
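For reference, the operation being accelerated: a transposed 1d convolution scatters each input element into the output, weighted by the kernel. A naive single-channel version (no padding or dilation) that pins down the semantics the fast kernel must reproduce:

```rust
// Sketch: naive transposed 1d convolution, single channel, stride only.
// Each input element contributes kernel-weighted values to a window of
// the output starting at i * stride.
fn conv_transpose1d(input: &[f32], kernel: &[f32], stride: usize) -> Vec<f32> {
    let out_len = (input.len() - 1) * stride + kernel.len();
    let mut out = vec![0.0; out_len];
    for (i, &x) in input.iter().enumerate() {
        for (k, &w) in kernel.iter().enumerate() {
            out[i * stride + k] += x * w;
        }
    }
    out
}
```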
* Add a --seed argument to the stable-diffusion example.
* When no seed is specified, do not set one and use the engine's default instead. This makes the CPU engine work again when no --seed is given, and bails out when a seed is provided, as the engine does not currently support it.
---------
Co-authored-by: niklas <niklas@appli.se>
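The behavior above hinges on keeping the seed an `Option`, so that "flag absent" stays distinguishable from "seed 0". A sketch of that control flow (the function name and error message are illustrative, not the example's actual code):

```rust
// Sketch of optional-seed handling: None leaves the engine's default RNG
// state untouched; Some(_) either seeds the engine or bails out when the
// engine does not support seeding.
fn apply_seed(seed: Option<u64>, engine_supports_seeding: bool) -> Result<(), String> {
    match seed {
        None => Ok(()), // no --seed flag: keep the engine default
        Some(s) if engine_supports_seeding => {
            // The real code would seed the device RNG with `s` here.
            let _ = s;
            Ok(())
        }
        Some(_) => Err("this engine does not support seeding".to_string()),
    }
}
```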
* Add segformer.
* Make the id2label field optional.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Normalize loudness of the generated audio.
* Lints.
* One more lint.
* Avoid running the bs1770 tests.
* Another attempt at discarding doc comments.
* Also normalize the loudness in the encodec example.
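The bs1770 crate implements ITU-R BS.1770 gated loudness measurement. As a rough illustration of the normalization step only, here is a plain-RMS version that measures the level in dB and applies a uniform gain — much simpler than what BS.1770 actually specifies:

```rust
// Simplified sketch of loudness normalization: measure mean-square energy,
// convert to dB, and scale the samples toward a target level. The real
// code uses BS.1770 gated loudness instead of plain RMS.
fn normalize_loudness(samples: &mut [f32], target_db: f32) {
    let energy: f32 = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
    if energy <= 0.0 {
        return; // silence: nothing to normalize
    }
    let current_db = 10.0 * energy.log10();
    let gain = 10f32.powf((target_db - current_db) / 20.0);
    for s in samples.iter_mut() {
        *s *= gain;
    }
}
```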
* Enable tanh + tweak conv-transpose.
* Run the encodec decoding on cpu.
* Clippy fixes.
* Add the metavoice transformer.
* Sketch the speaker-encoder module.
* Adding to the metavoice model.
* Start adding the metavoice example.
* Get some logits out.
* Load the second stage model.
* Get the second step to run.
* Tweak the example.
* Add encodec tilting.
* Glue the different bits together.
* Fix a shape issue.
* Use a constant.
* BPE tokenization.
* Fix the position index in metavoice.
* Add the metavoice transformer.
* Sketch the speaker-encoder module.
* Adding to the metavoice model.
* Start adding the metavoice example.
* Get some logits out.
* Load the second stage model.
* Get the second step to run.
* Tweak the example.
* Add encodec tilting.
* Glue the different bits together.
* Fix a shape issue.
* Use a constant.
* BPE tokenization.
* Add a warning.
* Add EfficientVit (Microsoft Research Asia) model.
* Mention models in README
* Add models for RWKV v6 and quantized RWKV v6.
* Fix the CI clippy failure.
* Add the StarCoder2 model.
* Add the example code and get things to work.
* And also tweak the readme.
* Add a flag to force running the quantized model on CPUs.
* Add encodec to the readme.
* Support more modes in the encodec example.
* Remove the old encodec model from the musicgen bits.
* Encodec model.
* Fixes.
* Add the padding functions.
* Get the LSTM bit to work.
* Get the encodec model to generate some tokens (decoder only for now).
* Minor tweak.
* Minor tweak.
* Add the quantized rwkv v5 model.
* Integrate the quantized rwkv model in the initial example.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
* Boilerplate for the quantized cuda support.
* More basic cuda support.
* More cuda quantization (quantize on cpu for now).
* Add the dequantization bit.
* Start adding some dedicated cuda kernels from llama.cpp.
* Move the kernel code.
* Start interfacing with the kernel.
* Tweak the kernel launch params.
* Bugfix for quantized metal.
* Fix some clippy lints.
* Tweak the launch parameters.
* Tweak cuda basics to perform a quantized matmul.
* Perform the dequantization on the cpu + use cublas for matmul.
* Add the dequantization kernel.
* Test the qmatmul.
* More kernels.
* Matmul-vec kernel.
* Add a couple kernels.
* More dequantization kernels.
|
| |
* Add the Gemma models.
* Add the gemma example.
* Adapt the RmsNorm.
* Get the 2b model to work.
* 7b support.
* Use the config head dim.
* Yet another fix.
* Make the matrices contiguous.
* Also get the 7b model to work.
* And add to the readme.
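"Adapt the RmsNorm" refers to Gemma scaling the normalized activations by `1 + weight` rather than `weight`, so a zero-initialized weight leaves them unscaled. A plain-Rust sketch of that variant:

```rust
// Sketch of Gemma-style RMS normalization: divide by the root-mean-square
// of the activations, then scale each channel by (1 + weight).
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    xs.iter()
        .zip(weight)
        .map(|(x, w)| x * inv_rms * (1.0 + w))
        .collect()
}
```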
* Use the tokenizer-output-stream in the llama example.
* Also use tokenizer-output-stream for llama2-c.
|
| |
|
|
|
|
|
|
|
| |
* Custom tokenizer for rwkv.
* Custom tokenizer.
* Getting the tokenizer to work.
* Start adding the RWKV model.
* More of the forward step.
* Handle rescaling.
* FeedForward.
* More work on RWKV.
* Better state tracking.
* Finish a first pass on forward.
* Fix the shape mismatches.
* Do not rescale in f32.
* Rename to rwkv-v5.
* Add the new models to the readme.
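The "better state tracking" above refers to the recurrent state RWKV carries between tokens, which is what lets it run inference with constant memory per step. A heavily simplified scalar sketch of the time-mixing recurrence (the real v5 model keeps a matrix-valued state per head and an extra bonus term for the current token, both omitted here):

```rust
// Heavily simplified sketch of an RWKV-style linear recurrence: the state
// decays each step and accumulates key*value; the receptance gates what
// is read out.
fn rwkv_step(state: &mut f32, decay: f32, r: f32, k: f32, v: f32) -> f32 {
    let out = r * (*state + k * v);
    *state = decay * *state + k * v;
    out
}
```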