| Commit message | Author | Age | Files | Lines |
* Use the tokenizer-output-stream in the llama example.
* Also use tokenizer-output-stream for llama2-c.
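The point of a tokenizer output stream is to print generated text incrementally, emitting only the newly decoded suffix after each token rather than re-printing everything. A minimal sketch of that idea, with a toy closure standing in for a real tokenizer's decode (the struct and method names here are illustrative, not the candle API):

```rust
// Minimal sketch of a token-output-stream: keep the tokens seen so far,
// re-decode, and emit only the part of the text that is new since the last
// call. A real implementation also has to be careful about incomplete UTF-8
// sequences at the end of the decoded text.
struct TokenOutputStream {
    tokens: Vec<u32>,
    prev_len: usize, // length (in bytes) of the text already emitted
}

impl TokenOutputStream {
    fn new() -> Self {
        Self { tokens: Vec::new(), prev_len: 0 }
    }

    /// Push a token and return the freshly decoded text, if any.
    fn next_token(&mut self, token: u32, decode: impl Fn(&[u32]) -> String) -> Option<String> {
        self.tokens.push(token);
        let text = decode(&self.tokens);
        if text.len() > self.prev_len {
            let new = text[self.prev_len..].to_string();
            self.prev_len = text.len();
            Some(new)
        } else {
            None
        }
    }
}

fn main() {
    // Toy vocabulary: token id -> string piece.
    let vocab = ["Hello", ",", " world", "!"];
    let decode = |ids: &[u32]| ids.iter().map(|&i| vocab[i as usize]).collect::<String>();
    let mut stream = TokenOutputStream::new();
    let mut out = String::new();
    for &t in &[0u32, 1, 2, 3] {
        if let Some(piece) = stream.next_token(t, &decode) {
            out.push_str(&piece); // in an example binary this would be printed
        }
    }
    assert_eq!(out, "Hello, world!");
    println!("{out}");
}
```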
* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create a new QMetal storage type that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implementations.
Fixing everything + adding dequantize kernels.
More work.
Fixing matmul.
Fmt + clippy.
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, the new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.
Fixing the Q2K bug (present in GGML).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Clean up the fence.
* After rebase.
* Bad code removal.
* Rebase after the phi2 merge + fix the replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
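For context on what the dequantize kernels above compute, here is a sketch of one of the simpler formats in the GGML family, Q8_0-style: 32-element blocks, one f32 scale per block, one i8 per value. This illustrates the quantize/dequantize round trip only; the candle types mentioned in the commit (QMetal, QuantizedType) are not reproduced here.

```rust
// Sketch of Q8_0-style block quantization from the GGML family of formats:
// split the tensor into blocks of 32 values, store one f32 scale per block
// (max-abs / 127) and one i8 per value. Dequantize is value * scale.
const BLOCK: usize = 32;

fn quantize_q8_0(xs: &[f32]) -> Vec<(f32, [i8; BLOCK])> {
    xs.chunks(BLOCK)
        .map(|chunk| {
            let amax = chunk.iter().fold(0f32, |m, x| m.max(x.abs()));
            let scale = amax / 127.0;
            let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
            let mut qs = [0i8; BLOCK];
            for (q, x) in qs.iter_mut().zip(chunk) {
                *q = (x * inv).round() as i8;
            }
            (scale, qs)
        })
        .collect()
}

fn dequantize_q8_0(blocks: &[(f32, [i8; BLOCK])]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|(scale, qs)| qs.iter().map(move |&q| q as f32 * scale))
        .collect()
}

fn main() {
    let xs: Vec<f32> = (0..64).map(|i| (i as f32 - 32.0) / 8.0).collect();
    let ys = dequantize_q8_0(&quantize_q8_0(&xs));
    // Round-tripping loses at most half a quantization step per value.
    for (x, y) in xs.iter().zip(&ys) {
        assert!((x - y).abs() <= 0.5 * 4.0 / 127.0 + 1e-6);
    }
    println!("round-trip ok over {} values", ys.len());
}
```

The per-block scale is what makes the format robust to outliers in one part of a tensor; it is also why a matmul kernel has to carry the scale alongside the int8 payload.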
* Add a quantized variant of llama2.c
* Clippy fixes.
* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
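Top-p (nucleus) sampling keeps the smallest set of tokens whose cumulative probability reaches `top_p`, renormalizes over that set, and samples from it. A self-contained sketch of the algorithm (not the candle `LogitsProcessor` code); the uniform draw `u` is passed in as a parameter so the example stays deterministic:

```rust
// Sketch of top-p (nucleus) sampling. `probs` must sum to ~1; `u` is a
// uniform draw in [0, 1), supplied by the caller instead of an RNG here.
fn sample_top_p(probs: &[f32], top_p: f32, u: f32) -> usize {
    // Sort token indices by probability, highest first.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // Truncate to the nucleus: the smallest prefix with cumulative mass >= top_p.
    let mut cum = 0.0;
    let mut keep = idx.len();
    for (i, &t) in idx.iter().enumerate() {
        cum += probs[t];
        if cum >= top_p {
            keep = i + 1;
            break;
        }
    }
    let nucleus = &idx[..keep];

    // Renormalize over the nucleus and invert the CDF at `u`.
    let total: f32 = nucleus.iter().map(|&t| probs[t]).sum();
    let mut acc = 0.0;
    for &t in nucleus {
        acc += probs[t] / total;
        if u < acc {
            return t;
        }
    }
    nucleus[nucleus.len() - 1]
}

fn main() {
    let probs = [0.1, 0.5, 0.3, 0.1];
    // With top_p = 0.7 the nucleus is tokens {1, 2} (0.5 + 0.3 >= 0.7).
    assert_eq!(sample_top_p(&probs, 0.7, 0.0), 1);
    // u = 0.7 falls past 0.5 / 0.8 = 0.625, so token 2 is drawn.
    assert_eq!(sample_top_p(&probs, 0.7, 0.7), 2);
    println!("nucleus sampling ok");
}
```

Low-probability tail tokens can never be drawn, which is the point: it cuts off the degenerate samples that plain temperature sampling occasionally produces.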
* Add a repeat penalty to the llama2-c command-line example.
* Another fix attempt.
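The usual form of a repeat penalty (the CTRL-style repetition penalty common in llama2.c-style samplers) divides positive logits and multiplies negative ones by a factor > 1 for every token already in the recent context. A sketch of that rule, not the exact candle code:

```rust
// Sketch of a CTRL-style repeat penalty: tokens already present in the
// context window get their logit pushed toward "less likely". Dividing a
// positive logit and multiplying a negative one both shrink its softmax
// probability when penalty > 1.0.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    for &tok in context {
        if let Some(l) = logits.get_mut(tok as usize) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5];
    // Tokens 0 and 1 already appeared; token 2 is untouched.
    apply_repeat_penalty(&mut logits, 2.0, &[0, 1]);
    assert_eq!(logits, vec![1.0, -2.0, 0.5]);
    println!("{logits:?}");
}
```

A real implementation would deduplicate the context first, so a token that appears several times in the window is not penalized more than once per step.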
* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
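The value of a `Module` trait is a single `forward` entry point that every layer implements, so layers compose uniformly (which is what makes implementing it for `QMatMul` useful). A toy sketch of the pattern, with `Vec<f32>` standing in for a tensor and no error handling; the layer names below are illustrative:

```rust
// Sketch of a Module trait: one `forward` method that all layers share.
trait Module {
    fn forward(&self, xs: &[f32]) -> Vec<f32>;
}

/// Elementwise scale, a stand-in for a layer with parameters.
struct Scale(f32);

impl Module for Scale {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x * self.0).collect()
    }
}

/// ReLU, a stand-in for a parameter-free activation.
struct Relu;

impl Module for Relu {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        xs.iter().map(|x| x.max(0.0)).collect()
    }
}

/// Because every layer is a `Module`, a sequential container is trivial.
struct Sequential(Vec<Box<dyn Module>>);

impl Module for Sequential {
    fn forward(&self, xs: &[f32]) -> Vec<f32> {
        self.0.iter().fold(xs.to_vec(), |acc, m| m.forward(&acc))
    }
}

fn main() {
    let model = Sequential(vec![Box::new(Scale(2.0)), Box::new(Relu)]);
    assert_eq!(model.forward(&[-1.0, 3.0]), vec![0.0, 6.0]);
    println!("{:?}", model.forward(&[-1.0, 3.0]));
}
```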
* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
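RmsNorm is layer-norm without the mean-centering and without a bias: it normalizes by the root-mean-square alone and applies a learned per-element weight. That is why the two can share most of their code behind a couple of configuration flags. A std-only sketch of the RmsNorm computation:

```rust
// Sketch of RmsNorm: y = x / sqrt(mean(x^2) + eps) * weight.
// Unlike LayerNorm there is no mean subtraction and no bias term.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(x, w)| x * inv_rms * w).collect()
}

fn main() {
    let xs = [3.0, -4.0]; // rms = sqrt((9 + 16) / 2) = sqrt(12.5)
    let ys = rms_norm(&xs, &[1.0, 1.0], 1e-5);
    // With unit weights the output has (approximately) unit RMS.
    let rms_out = (ys.iter().map(|y| y * y).sum::<f32>() / 2.0).sqrt();
    assert!((rms_out - 1.0).abs() < 1e-3);
    println!("{ys:?}");
}
```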
* Add the accelerate feature.
* Ffi tweaks.
* Move the vision datasets to a separate crate.
* Move the batcher bits.
* Update the readme.
* Move the tiny-stories bits.
---------
Co-authored-by: Jane Doe <jane.doe@example.org>
* Rework the var-builder to handle initializations.
* Add some helper functions for layer creation.
* Improve the layer initializations.
* Get initialized variables.
* Precompute the rotary embeddings when training llamas.
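The idea behind a var-builder that handles initialization is that the same `get` interface serves both inference (return the stored weight) and training (create the weight from an init spec when it is missing). A toy sketch of that idea; every name here is illustrative, not the candle API:

```rust
use std::collections::HashMap;

// Sketch of a var-builder with initialization support: `get_or_init`
// returns the stored variable if present, otherwise creates it from an
// `Init` spec. A real builder would hold tensors on a device and support
// random inits (e.g. a normal distribution for Kaiming-style schemes).
enum Init {
    Const(f32),
}

struct VarBuilder {
    vars: HashMap<String, Vec<f32>>,
}

impl VarBuilder {
    fn new() -> Self {
        Self { vars: HashMap::new() }
    }

    /// Get a variable of `len` elements, creating it from `init` if missing.
    fn get_or_init(&mut self, name: &str, len: usize, init: Init) -> &Vec<f32> {
        self.vars.entry(name.to_string()).or_insert_with(|| match init {
            Init::Const(c) => vec![c; len],
        })
    }
}

fn main() {
    let mut vb = VarBuilder::new();
    // First access initializes; later accesses return the stored values.
    let w = vb.get_or_init("layer1.weight", 4, Init::Const(0.5)).clone();
    assert_eq!(w, vec![0.5; 4]);
    let again = vb.get_or_init("layer1.weight", 4, Init::Const(9.0)).clone();
    assert_eq!(again, vec![0.5; 4]); // not re-initialized
    println!("{again:?}");
}
```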
* Rework the commands and run inference by default.
* Add the training module and load the training dataset.
* Random dataset iterator.
* Proper valid-loss computation.
* Compute the evaluation loss.
* Add more substance to the training loop.
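The valid-loss computation above is an average cross-entropy over held-out batches. A minimal sketch of that computation with a numerically stable log-softmax (a generic formulation, not the candle training-loop code):

```rust
// Sketch of an evaluation loss: mean cross-entropy over positions, using
// log-sum-exp with max subtraction so large logits do not overflow.
// `logits` is one row of scores per position, `targets` the next-token ids.
fn cross_entropy(logits: &[Vec<f32>], targets: &[usize]) -> f32 {
    let mut total = 0.0;
    for (row, &t) in logits.iter().zip(targets) {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let log_sum_exp = max + row.iter().map(|x| (x - max).exp()).sum::<f32>().ln();
        total += log_sum_exp - row[t]; // -log p(target)
    }
    total / targets.len() as f32
}

fn main() {
    // Uniform logits over 4 classes -> loss = ln(4) regardless of target.
    let logits = vec![vec![0.0; 4], vec![1.0; 4]];
    let loss = cross_entropy(&logits, &[2, 0]);
    assert!((loss - 4f32.ln()).abs() < 1e-5);
    println!("valid loss: {loss}");
}
```

Running this over batches drawn from the validation split, with gradients disabled, gives the evaluation loss tracked during training.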
* Add an eval mode to llama2-c.
* Encode line by line.
* Get the eval to run.
* Support more models in llama2-c.
* Add a prompt.
* Softmax numerical stability.
* Fix the flash-attn test.
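The standard numerical-stability fix for softmax is to subtract the row maximum before exponentiating: softmax is invariant to that shift, and the largest term becomes exp(0) = 1, so nothing overflows. A sketch:

```rust
// Numerically stable softmax: shift by the max before exponentiating.
// Without the shift, exp(1000.0) overflows f32 to infinity and the
// division then yields NaNs.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // Logits far outside exp's safe range still produce finite outputs.
    let ys = softmax(&[1000.0, 1000.0]);
    assert!((ys[0] - 0.5).abs() < 1e-6 && ys[0].is_finite());
    println!("{ys:?}");
}
```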
* Use the binary decoder for llama2.c.
* Add the temperature.
* Formatting tweak.
* Fix the rotary embeddings.
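The binary decoding here targets the llama2.c checkpoint layout: a header of seven little-endian i32 config fields (dim, hidden_dim, n_layers, n_heads, n_kv_heads, vocab_size, seq_len) followed by raw little-endian f32 weights. A sketch of the decoding against an in-memory buffer; helper names are illustrative:

```rust
// Sketch of decoding a llama2.c-style checkpoint from a byte buffer:
// fixed-width little-endian integers for the config, then raw f32 weights.
fn read_i32(buf: &[u8], pos: &mut usize) -> i32 {
    let v = i32::from_le_bytes(buf[*pos..*pos + 4].try_into().unwrap());
    *pos += 4;
    v
}

fn read_f32s(buf: &[u8], pos: &mut usize, n: usize) -> Vec<f32> {
    (0..n)
        .map(|_| {
            let v = f32::from_le_bytes(buf[*pos..*pos + 4].try_into().unwrap());
            *pos += 4;
            v
        })
        .collect()
}

fn main() {
    // Build a tiny fake checkpoint: a 7-field header, then 3 f32 weights.
    let mut bytes = Vec::new();
    for v in [8i32, 32, 2, 4, 4, 256, 128] {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    for w in [0.5f32, -1.0, 2.0] {
        bytes.extend_from_slice(&w.to_le_bytes());
    }

    let mut pos = 0;
    let header: Vec<i32> = (0..7).map(|_| read_i32(&bytes, &mut pos)).collect();
    let weights = read_f32s(&bytes, &mut pos, 3);
    assert_eq!(header, vec![8, 32, 2, 4, 4, 256, 128]);
    assert_eq!(weights, vec![0.5, -1.0, 2.0]);
    println!("dim={} vocab={}", header[0], header[5]);
}
```

In practice the weight sections are sized from the header fields (e.g. the token embedding is vocab_size * dim floats) and read in the order the exporter wrote them.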
* Start adding llama2.c.
* Model loading.
* Add the llama-v2 model.
* Start converting the weights.
* Rotary embedding tweaks.
* Get the model to generate some tokens.
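The rotary embedding mentioned in these commits rotates consecutive pairs of the head dimension by a position-dependent angle, theta_i = pos * 10000^(-2i/d), so relative position shows up as a phase difference in attention dot products. A self-contained sketch of applying it to one head vector:

```rust
// Sketch of rotary position embeddings (RoPE): each consecutive pair
// (x[i], x[i+1]) is rotated by pos * 10000^(-i/d) radians. Rotation
// preserves each pair's norm; only the phase encodes the position.
fn apply_rope(x: &mut [f32], pos: usize, head_dim: usize) {
    for i in (0..head_dim).step_by(2) {
        let freq = 1.0 / 10000f32.powf(i as f32 / head_dim as f32);
        let theta = pos as f32 * freq;
        let (sin, cos) = theta.sin_cos();
        let (x0, x1) = (x[i], x[i + 1]);
        x[i] = x0 * cos - x1 * sin;
        x[i + 1] = x0 * sin + x1 * cos;
    }
}

fn main() {
    // At position 0 the rotation angle is zero: the input is unchanged.
    let mut x = vec![1.0, 2.0, 3.0, 4.0];
    apply_rope(&mut x, 0, 4);
    assert_eq!(x, vec![1.0, 2.0, 3.0, 4.0]);

    // At position > 0 each pair keeps its norm while its phase changes.
    let mut y = vec![1.0, 0.0, 1.0, 0.0];
    apply_rope(&mut y, 3, 4);
    let norm = (y[0] * y[0] + y[1] * y[1]).sqrt();
    assert!((norm - 1.0).abs() < 1e-5);
    println!("{y:?}");
}
```

Because the angles depend only on position and dimension, they can be precomputed once into cos/sin tables, which is what "precompute the rotary embeddings" refers to in the training commit above.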