path: root/candle-core/examples/cpu_benchmarks.rs
author    Laurent Mazare <laurent.mazare@gmail.com>  2023-08-16 12:41:07 +0100
committer GitHub <noreply@github.com>  2023-08-16 12:41:07 +0100
commit    3071134788334c972d9e356f53887d2b2ff026b7 (patch)
tree      adbb58e3babee5d62fa6150bde4f9bb03770607c /candle-core/examples/cpu_benchmarks.rs
parent    fec87e86f50da78656a0fb28fc254390435fb3fd (diff)
Get the ggml-based llama to generate some text. (#464)
* Add more stats to the ggml example.
* Build a quantized model from the file content.
* Move the tensor retrieval in the main crate.
* Start adding the forward pass.
* Add more to the forward pass of the quantized llama.
* Apply the attention layers.
* Add the sampling loop.
* Get the sampling loop to work.
* Minor tweak.
* Add a quantize/dequantize test.
* Bugfix.
* Add a comment + swap the order.
* Bugfixes.
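The quantize/dequantize test mentioned above can be illustrated with a minimal, self-contained sketch of per-block 8-bit quantization in the style of ggml's Q8_0 format (a per-block f32 scale plus i8 values). This is not candle's actual implementation; the block size, function names, and error bound are illustrative assumptions.

```rust
// Hypothetical sketch of Q8_0-style block quantization; not candle's real code.
const BLOCK_SIZE: usize = 32;

/// Quantize a block of f32 values to i8 plus a per-block scale,
/// so that x ≈ q * scale with q in [-127, 127].
fn quantize_block(xs: &[f32]) -> (f32, Vec<i8>) {
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let qs = xs.iter().map(|&x| (x * inv).round() as i8).collect();
    (scale, qs)
}

/// Dequantize back to f32 by multiplying each i8 value by the block scale.
fn dequantize_block(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let xs: Vec<f32> = (0..BLOCK_SIZE).map(|i| (i as f32 - 16.0) / 8.0).collect();
    let (scale, qs) = quantize_block(&xs);
    let ys = dequantize_block(scale, &qs);
    // Round-trip error is bounded by half a quantization step.
    let max_err = xs
        .iter()
        .zip(&ys)
        .map(|(a, b)| (a - b).abs())
        .fold(0f32, f32::max);
    assert!(max_err <= scale * 0.5 + 1e-6);
    println!("scale = {scale}, max round-trip error = {max_err}");
}
```

A round-trip test of this shape (quantize, dequantize, bound the reconstruction error) is a common way to validate a quantization kernel before wiring it into a forward pass.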
Diffstat (limited to 'candle-core/examples/cpu_benchmarks.rs')
0 files changed, 0 insertions, 0 deletions