author | Laurent Mazare <laurent.mazare@gmail.com> | 2023-09-12 10:17:31 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-09-12 10:17:31 +0100 |
commit | bb23b90b1df684471e21b9133a1008c2604e1738 (patch) | |
tree | 124682710e6c133bb478bbdacad95279b747f0d8 | |
parent | 2257f4d475c676e33dbd8dbafabf95e821d27f62 (diff) | |
Add a small readme for the quantized example. (#823)
-rw-r--r-- | candle-examples/examples/quantized/README.md | 35 |
1 file changed, 35 insertions, 0 deletions
diff --git a/candle-examples/examples/quantized/README.md b/candle-examples/examples/quantized/README.md
new file mode 100644
index 00000000..f3159493
--- /dev/null
+++ b/candle-examples/examples/quantized/README.md
@@ -0,0 +1,35 @@
+# candle-quantized-llama: Fast inference of quantized LLaMA models
+
+This example runs a quantized LLaMA model, similar to
+[llama.cpp](https://github.com/ggerganov/llama.cpp), using candle's built-in
+quantization methods. Supported features include:
+
+- 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization.
+- SIMD optimizations on Apple Silicon and x86.
+- Support for the `gguf` and `ggml` file formats.
+
+The weights are automatically downloaded from the [HuggingFace
+Hub](https://huggingface.co/) on the first run. There are various command-line
+flags for using local files instead; run with `--help` to learn about them.
+
+## Running an example
+
+```bash
+cargo run --example quantized --release -- --prompt "The best thing about coding in rust is "
+
+> avx: true, neon: false, simd128: false, f16c: true
+> temp: 0.80 repeat-penalty: 1.10 repeat-last-n: 64
+> loaded 291 tensors (3.79GB) in 2.17s
+> params: HParams { n_vocab: 32000, n_embd: 4096, n_mult: 256, n_head: 32, n_layer: 32, n_rot: 128, ftype: 2 }
+> The best thing about coding in rust is 1.) that I don’t need to worry about memory leaks, 2.) speed and 3.) my program will compile even on old machines.
+```
+
+### Command-line flags
+
+Run with `--help` to see all options.
+
+- `--which`: specify the model to use, e.g. `7b`, `13b-chat`, `7b-code`.
+- `--prompt interactive`: launch interactive mode, where multiple prompts can
+  be entered.
+- `--model mymodelfile.gguf`: use a local model file rather than downloading
+  one from the hub; see the combined usage sketch after this diff.
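
The flags listed above can be combined in a single invocation. As a usage
sketch: the `gguf` file name and the prompts below are placeholders for
illustration, not assets shipped with the example.

```bash
# Run against a local quantized weights file instead of fetching from the hub.
# "llama-2-7b.Q4_K_M.gguf" is a placeholder path; substitute your own file.
cargo run --example quantized --release -- \
  --model ./llama-2-7b.Q4_K_M.gguf \
  --prompt "The best thing about coding in rust is "

# Select a different pre-quantized model from the hub by name.
cargo run --example quantized --release -- \
  --which 7b-code \
  --prompt "fn fib(n: u64) -> u64 {"
```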