Commit log for candle-examples/examples/llama2-c
Each entry lists the commit message, author, date, and files changed (-lines removed/+lines added).
* Explicit caching in llama2.c. (laurent, 2024-02-22; 2 files changed, -20/+21)
* Use the tokenizer-output-stream in the llama example. (#1715) (Laurent Mazare, 2024-02-15; 1 file changed, -7/+6)
  - Also use tokenizer-output-stream for llama2-c.
* Quantized GGUF style (#1523) (Nicolas Patry, 2024-01-17; 1 file changed, -4/+4)
  - Metal quantized modifications proposal:
    - Add a device param wherever needed.
    - Create a new QMetal storage type that implements QuantizedType.
    - Update everywhere needed; fix Python, the examples, fmt, clippy, and the stubs.
    - Add dequantized kernels; fix matmul.
    - Kernel status found along the way: Q2K Metal -> bugged (also present in GGML); Q4K CPU -> bugged (present previously, a new test catches it); Q5K CPU -> bugged (present previously); Q8_1 both -> never really implemented it seems; Q8K Metal -> never implemented. Fixed the Q2K bug (present in ggml).
  - Cleanup.
  - Fix the rebase.
  - Removing the fences speeds everything up and *is* correct this time...
  - Cleanup the fence.
  - After rebase.
  - Bad code removal.
  - Rebase after the phi2 merge; fix the replit default to CPU.
  - Make the CI happy; more happy tests.
  - Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
* Infer the config for llama2-c. (#1208) (Laurent Mazare, 2023-10-28; 2 files changed, -3/+13)
* Move the llama2-c model into transformers. (#1205) (Laurent Mazare, 2023-10-28; 4 files changed, -712/+3)
* Add a quantized variant of llama2.c (#1197) (Laurent Mazare, 2023-10-27; 3 files changed, -10/+285)
  - Clippy fixes.
* Implement top_p / nucleus sampling (#819) (Juarez Bochi, 2023-09-12; 1 file changed, -1/+6)
  - Update the changelog.
  - rustfmt.
  - Add tests.
  - Fix a clippy warning.
  - Fix another clippy error.
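The #819 entry adds nucleus (top-p) sampling: restrict sampling to the smallest set of tokens whose cumulative probability exceeds a threshold p, then renormalize. A minimal standalone sketch of the filtering step over plain slices — `nucleus_filter` is a hypothetical helper name, not candle's actual sampling API:

```rust
/// Keep the smallest set of tokens whose cumulative probability exceeds
/// `top_p`, zero out the rest, and renormalize. Sampling from the pruned
/// distribution is left to the caller.
fn nucleus_filter(probs: &[f32], top_p: f32) -> Vec<f32> {
    // Sort token indices by descending probability.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());

    // Accumulate mass until it exceeds top_p, keeping those tokens.
    let mut kept = vec![0f32; probs.len()];
    let mut cum = 0f32;
    for &i in &idx {
        kept[i] = probs[i];
        cum += probs[i];
        if cum > top_p {
            break;
        }
    }
    // Renormalize the surviving mass so it sums to one.
    let total: f32 = kept.iter().sum();
    kept.iter_mut().for_each(|p| *p /= total);
    kept
}

fn main() {
    // With p = 0.7 only the two most likely tokens survive.
    println!("{:?}", nucleus_filter(&[0.5, 0.3, 0.15, 0.05], 0.7));
}
```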
* Add a repeat penalty to the llama2-c command line example. (#713) (Laurent Mazare, 2023-09-01; 1 file changed, -0/+18)
  - Another fix attempt.
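The repeat penalty from #713 discourages the model from looping on tokens it has already emitted. A sketch of the llama.cpp-style rule under the assumption that positive logits are divided by the penalty and negative ones multiplied — `apply_repeat_penalty` is an illustrative name, not necessarily the example's exact implementation:

```rust
use std::collections::HashSet;

/// Penalize tokens that already occurred in the recent context:
/// a positive logit is divided by `penalty`, a negative one multiplied,
/// so a repeated token becomes less likely either way.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    // Deduplicate so a token seen twice is not penalized twice.
    let seen: HashSet<u32> = context.iter().copied().collect();
    for tok in seen {
        if let Some(logit) = logits.get_mut(tok as usize) {
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0f32, -1.0, 0.5];
    apply_repeat_penalty(&mut logits, 2.0, &[0, 1, 0]);
    // Token 2 was not in the context, so its logit is untouched.
    println!("{logits:?}");
}
```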
* Add the optimizer trait. (#702) (Laurent Mazare, 2023-09-01; 1 file changed, -0/+1)
* Add a simple Module trait and implement it for the various nn layers (#500) (Laurent Mazare, 2023-08-18; 1 file changed, -1/+1)
  - Start adding the module trait.
  - Use the module trait.
  - Implement Module for qmatmul.
* Add an abstract type for RmsNorm. (#499) (Laurent Mazare, 2023-08-18; 1 file changed, -5/+5)
* Layer norm tweaks (#482) (Laurent Mazare, 2023-08-17; 1 file changed, -34/+8)
  - Add some options to make layer-norm more configurable.
  - Add the rms-norm variant.
  - Replace the RmsNorm with the shared bits.
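The rms-norm variant mentioned above differs from layer-norm in that it only rescales by the root-mean-square (no mean subtraction, no bias). A minimal standalone sketch over plain slices — `rms_norm` here is an illustrative free function, not candle's `RmsNorm` type:

```rust
/// RMS normalization: scale each element by the reciprocal
/// root-mean-square of the vector (stabilized by a small epsilon),
/// then by a learned per-element weight.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(x, w)| x * scale * w).collect()
}

fn main() {
    // With unit weights the output has a root-mean-square of ~1.
    println!("{:?}", rms_norm(&[3.0, 4.0], &[1.0, 1.0], 1e-5));
}
```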
* Support the Accelerate BLAS on macOS. (#325) (Laurent Mazare, 2023-08-05; 1 file changed, -0/+3)
  - Add the accelerate feature.
  - FFI tweaks.
* Add the candle-datasets crate (#322) (Laurent Mazare, 2023-08-05; 2 files changed, -119/+7)
  - Move the vision datasets to a separate crate.
  - Move the batcher bits.
  - Update the readme.
  - Move the tiny-stories bits.
  - Co-authored-by: Jane Doe <jane.doe@example.org>
* Transpose the weight matrices for llama2.c. (#321) (Laurent Mazare, 2023-08-04; 1 file changed, -8/+15)
* Support safetensors weights in llama2.c inference. (#317) (Laurent Mazare, 2023-08-03; 2 files changed, -7/+18)
* Use AdamW in the llama2 training. (#308) (Laurent Mazare, 2023-08-02; 1 file changed, -2/+9)
* Llama more training (#297) (Laurent Mazare, 2023-08-01; 2 files changed, -18/+26)
  - Rework the var-builder to handle initializations.
  - Add some helper functions for layer creation.
  - Improve the layer initializations.
  - Get initialized variables.
  - Precompute the rotary embeddings when training llamas.
* Add training for the llama2.c example (#296) (Laurent Mazare, 2023-08-01; 3 files changed, -7/+216)
  - Rework the commands and run inference by default.
  - Add the training module and load the training dataset.
  - Random dataset iterator.
  - Proper valid-loss computation.
  - Compute the evaluation loss.
  - Add more substance to the training loop.
* Move the weight bits into a separate module. (#295) (Laurent Mazare, 2023-08-01; 3 files changed, -164/+168)
* Add some batcher variants that handle errors. (#294) (Laurent Mazare, 2023-08-01; 1 file changed, -4/+4)
* Add the batcher. (#293) (Laurent Mazare, 2023-08-01; 1 file changed, -18/+14)
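The batcher groups a stream of training items into fixed-size batches. A minimal sketch of the idea as an iterator adapter over plain values — this `Batcher` struct is an illustrative stand-in, not candle's actual `Batcher` (which works over tensors and has error-handling variants, per #294):

```rust
/// A minimal batcher: wraps any iterator and yields fixed-size batches,
/// with a final short batch when the item count is not a multiple.
struct Batcher<I: Iterator> {
    inner: I,
    batch_size: usize,
}

impl<I: Iterator> Iterator for Batcher<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull up to batch_size items; stop once the inner iterator is dry.
        let batch: Vec<_> = self.inner.by_ref().take(self.batch_size).collect();
        if batch.is_empty() { None } else { Some(batch) }
    }
}

fn main() {
    let batches: Vec<Vec<i32>> = Batcher { inner: 1..=7, batch_size: 3 }.collect();
    println!("{batches:?}"); // [[1, 2, 3], [4, 5, 6], [7]]
}
```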
* Use subcommands in llama2. (#292) (Laurent Mazare, 2023-08-01; 1 file changed, -100/+90)
* Pre-tokenized evaluation mode for llama2.c. (#291) (Laurent Mazare, 2023-08-01; 1 file changed, -30/+51)
* Evaluate on the pre-tokenized file. (#290) (Laurent Mazare, 2023-07-31; 1 file changed, -1/+58)
* Remove the end-of-text tokens. (#289) (Laurent Mazare, 2023-07-31; 1 file changed, -1/+2)
* Add an eval mode to llama2-c (#288) (Laurent Mazare, 2023-07-31; 2 files changed, -35/+87)
  - Encode line by line.
  - Get the eval to run.
* Add a prompt and support more models in llama2-c. (#285) (Laurent Mazare, 2023-07-31; 2 files changed, -6/+26)
* Use the hub models for llama2.c (#284) (Laurent Mazare, 2023-07-31; 1 file changed, -25/+37)
* Use u8 tensors for masks. (#273) (Laurent Mazare, 2023-07-29; 1 file changed, -2/+1)
* Softmax numerical stability. (#267) (Laurent Mazare, 2023-07-28; 1 file changed, -1/+1)
  - Fix the flash-attn test.
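The stability fix referenced in #267 is the standard max-subtraction trick: since softmax is invariant to adding a constant to every logit, subtracting the maximum keeps `exp()` from overflowing on large inputs. A standalone sketch over plain slices (illustrative, not candle's tensor-level softmax):

```rust
/// Numerically stable softmax: subtract the maximum logit before
/// exponentiating so exp() never overflows, without changing the result
/// (softmax is shift-invariant).
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // A naive exp(l) / sum(exp(l)) would overflow to inf/inf = NaN here.
    println!("{:?}", softmax(&[1000.0, 1000.0]));
}
```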
* Use the binary decoder for llama2.c. (#230) (Laurent Mazare, 2023-07-24; 2 files changed, -65/+85)
  - Add the temperature.
  - Formatting tweak.
  - Fix the rotary embeddings.
* Add llama2.c as an example. (#229) (Laurent Mazare, 2023-07-24; 2 files changed, -0/+558)
  - Start adding llama2.c.
  - Model loading.
  - Add the llama-v2 model.
  - Start converting the weights.
  - Rotary embedding tweaks.
  - Get the model to generate some tokens.