 | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Use cat for faster MQA computation. (#2043) | Laurent Mazare | 2024-04-12 | 1 | -14/+2 |
* | Use the new rope kernel in mistral. (#1937) | Laurent Mazare | 2024-03-25 | 1 | -14/+6 |
* | Support more mistral models. (#1927) | Laurent Mazare | 2024-03-24 | 1 | -4/+5 |
* | Avoid broadcasting on the batch dimension for the attention mask. (#1920) | Laurent Mazare | 2024-03-23 | 1 | -4/+3 |
* | Use the fast RmsNorm in the quantized model. (#1904) | Laurent Mazare | 2024-03-21 | 1 | -0/+1 |
* | feat: add clear_kv_cache to mistral and qmistral models (#1464) | drbh | 2023-12-21 | 1 | -0/+14 |
* | More model cloning. (#1126) | Laurent Mazare | 2023-10-18 | 1 | -5/+5 |
* | Move the common quantized-nn code to a shared module. (#1063) | Laurent Mazare | 2023-10-09 | 1 | -40/+1 |
* | Quantized version of mistral. (#1009) | Laurent Mazare | 2023-09-30 | 1 | -0/+364 |