Commit message

* Support for the new Qwen2 models.
* Add more models.

* Support embedding model gte-Qwen1.5-7B-instruct
This is a text embedding model based on Qwen2. They share the same
model architecture except for the last MLP module. This commit brings in
a minimal modification of the old Qwen2 implementation to support both
models.
An example is provided and has been verified against the official
PyTorch implementation.
* Avoid doing the 'last-token filtering' based on the absence of an attention mask.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
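The 'last-token filtering' mentioned in the commit above is last-token pooling: the sentence embedding is read from the hidden state of the last real (non-padded) token, which requires an attention mask to locate; when no mask is passed, the filtering is skipped and the final position is used directly. Below is a minimal sketch of that idea in plain Rust; it is not candle's implementation, and the function name and data layout are illustrative only.

```rust
/// Minimal sketch of last-token pooling for an embedding model.
/// `hidden`: per-token hidden states, one Vec<f32> per sequence position.
/// `attention_mask`: Some(mask) with 1 for real tokens and 0 for padding,
/// or None when every position is a real token.
fn last_token_embedding(hidden: &[Vec<f32>], attention_mask: Option<&[u8]>) -> Vec<f32> {
    let last = match attention_mask {
        // With a mask, pick the last position that is actually attended to.
        Some(mask) => mask
            .iter()
            .rposition(|&m| m == 1)
            .unwrap_or(hidden.len() - 1),
        // Without a mask, skip the filtering and just take the final position.
        None => hidden.len() - 1,
    };
    hidden[last].clone()
}

fn main() {
    // Two real tokens followed by two padding positions.
    let hidden = vec![
        vec![0.1, 0.2],
        vec![0.3, 0.4],
        vec![0.0, 0.0],
        vec![0.0, 0.0],
    ];
    let mask = [1u8, 1, 0, 0];
    println!("{:?}", last_token_embedding(&hidden, Some(&mask))); // [0.3, 0.4]
    println!("{:?}", last_token_embedding(&hidden, None)); // [0.0, 0.0]
}
```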
* Qwen MoE model.
* Add the MoE model to the example.
* Fix the scaling.
* Readme updates.
* Readme tweaks.

Using the chatglm one causes a bug where the "<|endoftext|>" is not
found.
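For context on the entry above (presumably about which tokenizer the example loads): Qwen2 uses "<|endoftext|>" as its end-of-sequence token, and if the loaded tokenizer does not contain that token the lookup yields nothing and generation never stops. Below is a small sketch using the Hugging Face tokenizers crate; the file path is a placeholder, and whether the example resolves the token exactly this way is an assumption.

```rust
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The path is a placeholder; a real example would load the tokenizer.json
    // that ships with the Qwen2 checkpoint.
    let tokenizer = Tokenizer::from_file("tokenizer.json").map_err(|e| e.to_string())?;

    // Qwen2 uses "<|endoftext|>" as its end-of-sequence token. A tokenizer
    // taken from a different model may not contain it, in which case the
    // lookup returns None and generation would never terminate.
    let eos_token_id = tokenizer
        .token_to_id("<|endoftext|>")
        .ok_or("the tokenizer has no <|endoftext|> token")?;
    println!("eos token id: {eos_token_id}");
    Ok(())
}
```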
* Initial check-in for the qwen2 model.
* More qwen2 inference.
* Polish the qwen example.
* Fix the rope basis.
* Get the inference to work.
* Support different model sizes.
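On 'Fix the rope basis' in the list above: rotary position embeddings rotate each pair of head dimensions at a frequency derived from a configurable base (rope_theta), so the basis has to be computed from the model configuration rather than hard-coded. Below is a minimal sketch of the standard computation, not candle's code; the head_dim and rope_theta values are illustrative only.

```rust
/// Minimal sketch of the rotary embedding basis.
/// Frequency pair i rotates at theta^(-2i / head_dim); the angle applied at
/// position p on that pair is p * inv_freq[i].
fn rope_inv_freq(head_dim: usize, rope_theta: f64) -> Vec<f64> {
    (0..head_dim / 2)
        .map(|i| 1.0 / rope_theta.powf(2.0 * i as f64 / head_dim as f64))
        .collect()
}

fn main() {
    // head_dim and rope_theta would normally come from the model config;
    // these values are only illustrative.
    let inv_freq = rope_inv_freq(128, 1_000_000.0);
    println!("first frequencies: {:?}", &inv_freq[..4]);
}
```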