| Commit message (Collapse) | Author | Age | Files | Lines |
|
* Add the StarCoder2 model.
* Add the example code and get things to work.
* And also tweak the readme.
|
Co-authored-by: Guoqing Bao <guoqing.bao@enflame-tech.com>
|
* Quantized version of mistral.
* Integrate the quantized mistral variant.
* Use the quantized weight files.
* Tweak the quantization command.
* Fix the dtype when computing the rotary embeddings.
* Update the readme with the quantized version.
* Fix the decoding of the remaining tokens.
|
* Token streaming.
* Use the token output stream.
* Flush the output.
* Ensure that the last characters get reported.
|