author    Nicolas Patry <patry.nicolas@protonmail.com>  2024-01-17 10:27:58 +0100
committer GitHub <noreply@github.com>                   2024-01-17 10:27:58 +0100
commit    403680f17ddc086295fbaee316cbed22d97a519b (patch)
tree      80dcffe6e929640e7f0ebfff3ba90410fd58992e /candle-examples/examples/replit-code
parent    5270224f407502b82fe90bc2622894ce3871b002 (diff)
Quantized GGUF style (#1523)
* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implems.
Fixing everything + adding dequantized kernels.
More work.
Fixing matmul.
Fmt + Clippy
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal
Fixing Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after phi2 merge + fix replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
Diffstat (limited to 'candle-examples/examples/replit-code')
 candle-examples/examples/replit-code/main.rs | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/candle-examples/examples/replit-code/main.rs b/candle-examples/examples/replit-code/main.rs
index 0f72b862..b7f767b9 100644
--- a/candle-examples/examples/replit-code/main.rs
+++ b/candle-examples/examples/replit-code/main.rs
@@ -236,16 +236,15 @@ fn main() -> Result<()> {
     let tokenizer = Tokenizer::from_file(tokenizer_filename).map_err(E::msg)?;
 
     let start = std::time::Instant::now();
+    let device = candle_examples::device(args.cpu)?;
     let config = Config::replit_code_v1_5_3b();
-    let (model, device) = if args.quantized {
-        let vb = candle_transformers::quantized_var_builder::VarBuilder::from_gguf(&filename)?;
-        let model = Model::Q(Q::new(&config, vb.pp("transformer"))?);
-        (model, Device::Cpu)
+    let model = if args.quantized {
+        let vb =
+            candle_transformers::quantized_var_builder::VarBuilder::from_gguf(&filename, &device)?;
+        Model::Q(Q::new(&config, vb.pp("transformer"))?)
     } else {
-        let device = candle_examples::device(args.cpu)?;
         let vb = unsafe { VarBuilder::from_mmaped_safetensors(&[filename], DType::F32, &device)? };
-        let model = Model::M(M::new(&config, vb.pp("transformer"))?);
-        (model, device)
+        Model::M(M::new(&config, vb.pp("transformer"))?)
     };
     println!("loaded the model in {:?}", start.elapsed());
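The core of the change in this hunk is hoisting device selection out of the quantized/non-quantized branch: previously the quantized path was pinned to `Device::Cpu`, while after the patch a single `device` is chosen up front and passed to the GGUF loader as well. A minimal self-contained sketch of that pattern, using stub types that stand in for candle's `Device`/`Model` (hypothetical, for illustration only — not candle's actual API):

```rust
// Hypothetical stand-ins for candle's types, to illustrate the refactor only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Cpu,
    Metal,
}

enum Model {
    Q(Device), // quantized model, now carrying whichever device was selected
    M(Device), // safetensors model
}

fn pick_device(force_cpu: bool) -> Device {
    if force_cpu { Device::Cpu } else { Device::Metal }
}

// After the patch: the device is chosen once, before the branch, and the
// quantized loader receives it instead of hardcoding Device::Cpu.
fn load(quantized: bool, force_cpu: bool) -> (Model, Device) {
    let device = pick_device(force_cpu);
    let model = if quantized {
        Model::Q(device)
    } else {
        Model::M(device)
    };
    (model, device)
}

fn main() {
    // Before this commit the quantized path always ran on CPU; now it
    // follows the same device selection as the safetensors path.
    let (model, device) = load(true, false);
    assert_eq!(device, Device::Metal);
    match model {
        Model::Q(d) => assert_eq!(d, Device::Metal),
        Model::M(_) => unreachable!(),
    }
    println!("quantized model on {:?}", device);
}
```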