| Commit message | Author | Age | Files | Lines |

* Add the SmolLM2 models.
* More SmolLM2 support.

* Add a toggle to control f16/bf16 gemm precision.
* Use the faster variant in the quantized example.
* Bugfix.

* Add the phi-v3 quantized model.
* Also include phi-3 in the main phi example.

* add support for l3b, new tokenizer
* add todo
* Add todo and use k_s model
* Use the official tokenizers.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>

* Include topk sampling in the quantized example.
* Also sample with top-k on the mistral side.
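The two items above only note that top-k was wired into the quantized and mistral examples; the mechanism is the standard one: keep the k most likely tokens, renormalize, and sample among them. A minimal sketch in plain Rust (not the crate's sampler; the caller supplies the uniform random draw so no RNG crate is needed):

```rust
/// Illustrative top-k sampling over raw, non-empty logits.
/// `uniform` is a random draw in [0, 1) supplied by the caller.
fn sample_top_k(logits: &[f32], k: usize, uniform: f32) -> usize {
    // Rank token ids by logit, highest first, and keep only the top k.
    let mut indices: Vec<usize> = (0..logits.len()).collect();
    indices.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    indices.truncate(k.max(1));

    // Softmax restricted to the surviving candidates.
    let max_logit = logits[indices[0]];
    let weights: Vec<f32> = indices.iter().map(|&i| (logits[i] - max_logit).exp()).collect();
    let total: f32 = weights.iter().sum();

    // Walk the cumulative distribution until the draw is covered.
    let mut acc = 0.0;
    for (&idx, &w) in indices.iter().zip(&weights) {
        acc += w / total;
        if uniform < acc {
            return idx;
        }
    }
    indices[indices.len() - 1]
}
```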
* Switch the default to using the faster kernels.
* Add the force-dmmv flag.

* Add more cuda kernels for quantized matmul.
* Add the vec-dot bits.
* Expose the quantized matmul-vec kernels.
* Also include the quantize-q8-1 kernel.
* Glue code for the q8-1 quantization.
* mm-vec product via q8-1 quantization.
* Add a test.
* Add a mm test.
* Get the test to return some sensible results.
* Also test dmmv.
* Fix the launch params.
* Allow for tweaking the force_dmmv parameter while it's experimental.
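For context on the q8-1 pieces above: in the GGML-style scheme, as I understand it, the activation vector is quantized on the fly into blocks of 32 signed bytes, each block carrying a scale and the precomputed sum of its quantized values, which the matmul-vec kernel later combines with the weight blocks. A scalar sketch of that quantization step (block size and field names are assumptions taken from the GGML layout, not from the CUDA code):

```rust
/// Assumed q8_1-style block: 32 values, a scale `d`, and `s = d * sum(qs)`.
/// This follows my reading of the GGML layout; the real kernels run on the GPU.
#[allow(non_camel_case_types)]
struct BlockQ8_1 {
    d: f32,
    s: f32,
    qs: [i8; 32],
}

fn quantize_q8_1(xs: &[f32; 32]) -> BlockQ8_1 {
    // Scale so that the largest magnitude maps to 127.
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let d = if amax == 0.0 { 0.0 } else { amax / 127.0 };
    let inv_d = if d == 0.0 { 0.0 } else { 1.0 / d };

    let mut qs = [0i8; 32];
    let mut sum = 0i32;
    for (q, &x) in qs.iter_mut().zip(xs.iter()) {
        let v = (x * inv_d).round() as i32;
        *q = v.clamp(-127, 127) as i8;
        sum += *q as i32;
    }
    BlockQ8_1 { d, s: d * sum as f32, qs }
}
```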
* Add a flag to force running the quantized model on CPUs.
* Add encodec to the readme.

* Metal quantized modifications proposal.
- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.
Fix Python.
Fixing examples.
Fix: fmt + clippy + stub.
Moving everything around.
Only missing the actual implems.
Fixing everything + adding dequantized kernels.
More work.
Fixing matmul.
Fmt + Clippy
Some clippy fixes.
Working state.
Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, a new test catches it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented, it seems.
Q8K Metal -> Never implemented in Metal.
Fixing Q2K bug (present in ggml).
* Cleanup.
* Fix the rebase.
* Removing the fences speeds everything up and *is* correct this time...
* Cleanup the fence.
* After rebase.
* Bad code removal.
* Rebase after phi2 merge + fix replit default to CPU.
* Making the CI happy.
* More happy tests.
---------
Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>

* Support mistral instruct v0.2.
* Use the safetensors model now that they are available.

* Add the Mixtral model.
* Add more of the mixtral layers.
* Add the final layers for mixtral.
* Sketch the expert selection.
* Add some expert routing logic.
* Hopefully finish the routing logic for mixtral.
* Add the mixtral example.
* Fix the weight filenames.
* Bugfix.
* Another fix.
* Yet another fix + remove the unused pragma.
* Shape fix.
* Support for quantized mixtral.
* Support mixtral in the quantized example.
* Mlp or moe type.
* Fix the expert field namings.
* Refactor the mlp bit.
* More MoE logic.
* Add the MoE quantized logic.
* Fix the experts length.
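The routing items above reduce to a small computation per token: the router emits one logit per expert, the strongest experts are kept (two per token in Mixtral), their softmax weights are renormalized, and the token's output is the weighted sum of those experts' outputs. An illustrative sketch of that logic in plain Rust (vector types and the top-2 choice are assumptions based on the Mixtral setup, not the crate's tensor code):

```rust
/// Pick the `top_k` experts for one token from the router logits, returning
/// (expert index, renormalized weight) pairs. Illustrative only.
fn route_experts(router_logits: &[f32], top_k: usize) -> Vec<(usize, f32)> {
    // Softmax over all experts.
    let max = router_logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = router_logits.iter().map(|&l| (l - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Keep the `top_k` most likely experts.
    let mut scored: Vec<(usize, f32)> = exps.iter().map(|&e| e / total).enumerate().collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);

    // Renormalize the surviving weights so they sum to one again.
    let kept: f32 = scored.iter().map(|(_, w)| *w).sum();
    scored.into_iter().map(|(i, w)| (i, w / kept)).collect()
}

/// The token output is the weighted sum of the selected experts' outputs.
fn moe_output(expert_outputs: &[Vec<f32>], routes: &[(usize, f32)]) -> Vec<f32> {
    let mut out = vec![0.0f32; expert_outputs[0].len()];
    for &(idx, w) in routes {
        for (o, x) in out.iter_mut().zip(&expert_outputs[idx]) {
            *o += w * *x;
        }
    }
    out
}
```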
* Add quantized Starling, fix open-chat prompt
* Fix open-chat and starling prompts

* Add OpenChat to quantized examples
* Add chat prompt
* Make the openchat example more in line with the other models.
* Fix a typo.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>

* Fix quantized zephyr chat prompt (#1314)
* Avoid using a mutable variable.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>

* Support the shape op in ONNX.
* Share the axis normalization bits.
* Add some limited support for gather.
* Unsqueeze.
* Comparison with broadcasting.
* Add Not + handle i32.
* Tweaks for the quantized model.
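On the "share the axis normalization bits" item: ONNX allows negative axes that count from the end of the shape, so gather, unsqueeze and the other ops can all funnel through one small helper. A sketch of what such a helper typically looks like (the name and error type are mine, not the crate's):

```rust
/// Map an ONNX-style axis (possibly negative) onto a concrete dimension index.
/// For a rank-4 tensor, axis -1 becomes 3, axis -4 becomes 0, and anything
/// outside [-rank, rank) is rejected.
fn normalize_axis(axis: i64, rank: usize) -> Result<usize, String> {
    let rank_i = rank as i64;
    let adjusted = if axis < 0 { axis + rank_i } else { axis };
    if adjusted < 0 || adjusted >= rank_i {
        Err(format!("axis {axis} is out of range for rank {rank}"))
    } else {
        Ok(adjusted as usize)
    }
}
```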
* Adds check for 7b-zephyr and uses correct template
* Handle zephyr as mistral.
* Disable the protoc bits of the CI.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>

* Add a gif to the quantized readme.
* gif update.

* Add more readmes.
* Add a readme for dinov2.
* Add some skeleton files for a couple more examples.
* More whisper details.

* Implement top_p / nucleus sampling
* Update changelog
* rustfmt
* Add tests
* Fix clippy warning
* Fix another clippy error
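Nucleus (top-p) sampling keeps the smallest set of tokens whose cumulative probability reaches `p` and samples only inside that set, trimming the unreliable tail without fixing the candidate count the way top-k does. The commit body above does not show the algorithm, so here is an illustrative sketch (plain Rust; the caller supplies the uniform draw):

```rust
/// Illustrative top-p / nucleus sampling over a non-empty probability vector.
/// `probs` should sum to roughly 1; `uniform` is a random draw in [0, 1).
fn sample_top_p(probs: &[f32], top_p: f32, uniform: f32) -> usize {
    // Rank token ids by probability, highest first.
    let mut indices: Vec<usize> = (0..probs.len()).collect();
    indices.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));

    // Keep the smallest prefix whose cumulative mass reaches top_p.
    let mut cumulative = 0.0;
    let mut cutoff = indices.len();
    for (rank, &idx) in indices.iter().enumerate() {
        cumulative += probs[idx];
        if cumulative >= top_p {
            cutoff = rank + 1;
            break;
        }
    }
    let nucleus = &indices[..cutoff];

    // Renormalize within the nucleus and sample with the uniform draw.
    let mass: f32 = nucleus.iter().map(|&i| probs[i]).sum();
    let mut acc = 0.0;
    for &idx in nucleus {
        acc += probs[idx] / mass;
        if uniform < acc {
            return idx;
        }
    }
    nucleus[nucleus.len() - 1]
}
```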
* Move dinov2.
* Move efficientnet.
* Move the quantized llama model.
* Move segment-anything.

* Print the args + change the default temp/repeat penalty.
* Minor formatting tweak.

* Q5k vecdot.
* Add the q3k vecdot.
* Q2k vecdot.
* Move the quantized model to its own file.

* Remove some dead-code annotations.
* More dead code removal.
* One more.
* CI fix.

* Add some optional repeat penalty.
* Add the missing files.
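A repeat penalty lowers the probability of tokens that already appeared in the recent context before sampling. The sketch below uses the common llama.cpp-style rule (positive logits are divided by the penalty, negative ones multiplied); it illustrates the idea rather than reproducing the crate's helper:

```rust
use std::collections::HashSet;

/// Penalize every distinct token from `context` in the logits, with `penalty` > 1.
/// Dividing positive logits and multiplying negative ones lowers the probability
/// in both cases. Illustrative sketch only.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    let seen: HashSet<u32> = context.iter().copied().collect();
    for token in seen {
        if let Some(logit) = logits.get_mut(token as usize) {
            *logit = if *logit >= 0.0 { *logit / penalty } else { *logit * penalty };
        }
    }
}
```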
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
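On the first item above: a generic vecdot for q8_0 boils down to iterating over pairs of blocks (32 signed bytes plus a scale each), doing the integer dot product inside each block and rescaling it by the product of the two scales. A scalar sketch of that idea, with the block layout assumed from the GGML q8_0 format rather than copied from the crate (the real code is specialised and vectorised):

```rust
/// Assumed q8_0-style block: one f32 scale and 32 signed 8-bit quants.
struct BlockQ80 {
    d: f32,
    qs: [i8; 32],
}

/// Scalar dot product between two quantized vectors stored as q8_0 blocks.
fn vec_dot_q8_0(lhs: &[BlockQ80], rhs: &[BlockQ80]) -> f32 {
    assert_eq!(lhs.len(), rhs.len(), "both sides need the same number of blocks");
    let mut acc = 0.0f32;
    for (a, b) in lhs.iter().zip(rhs.iter()) {
        // Integer dot product inside the block, rescaled once per block.
        let mut isum = 0i32;
        for (&qa, &qb) in a.qs.iter().zip(b.qs.iter()) {
            isum += qa as i32 * qb as i32;
        }
        acc += a.d * b.d * isum as f32;
    }
    acc
}
```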
* add chat models in quantized example
* cargo fmt

* GGUF support in the quantized model.
* Get the GGUF support to work on llama.
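GGUF is the container format that superseded the older GGML files: a small header (magic bytes, a version, then tensor and metadata-entry counts) followed by metadata key/values and tensor descriptors. A minimal header reader based on my reading of the GGUF spec (64-bit counts as in version 2 and later are assumed); the crate's actual loader also handles the metadata and tensor tables:

```rust
use std::io::Read;

/// The fixed-size part of a GGUF file as I understand the spec (v2+ layout).
#[derive(Debug)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_gguf_header<R: Read>(reader: &mut R) -> std::io::Result<GgufHeader> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != b"GGUF" {
        return Err(std::io::Error::new(
            std::io::ErrorKind::InvalidData,
            "not a GGUF file (bad magic)",
        ));
    }
    // All integers in GGUF are little-endian.
    let mut buf4 = [0u8; 4];
    let mut buf8 = [0u8; 8];
    reader.read_exact(&mut buf4)?;
    let version = u32::from_le_bytes(buf4);
    reader.read_exact(&mut buf8)?;
    let tensor_count = u64::from_le_bytes(buf8);
    reader.read_exact(&mut buf8)?;
    let metadata_kv_count = u64::from_le_bytes(buf8);
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}
```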
* GQA support in the quantized model.
* Fix the reshaping.
* Fix the main llama model.
* Infer the proper gqa from the model kind.
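Grouped-query attention is what the gqa factor refers to: several query heads share a single key/value head, so the attention code needs the number of KV heads (or the group size) in addition to the number of query heads. The mapping itself is tiny; an illustrative version (not the crate's tensor code):

```rust
/// With grouped-query attention, `n_head` query heads share `n_kv_head`
/// key/value heads; consecutive query heads fall into the same group.
/// E.g. n_head = 32 and n_kv_head = 8 gives 4 query heads per KV head.
fn kv_head_for_query_head(q_head: usize, n_head: usize, n_kv_head: usize) -> usize {
    assert!(n_head % n_kv_head == 0, "query heads must split evenly across KV heads");
    q_head / (n_head / n_kv_head)
}
```

Inferring "the proper gqa from the model kind" then amounts to picking this head grouping per model family instead of asking for it explicitly.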
* Add a couple functions required for yolo.
* Add the yolo-v3 example.
* Add minimum and maximum.
* Use the newly introduced maximum.
* Cuda support for min/max + add some testing.
* Allow for more tests to work with accelerate.
* Fix a typo.

* Separate the prompt stats from the post-prompt ones in the quantized example.
* Slightly nicer output printing.
* Line up with the llama.cpp implementation.

* Start adding the module trait.
* Use the module trait.
* Implement module for qmatmul.
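The module trait referred to above is the usual "maps a tensor to a tensor" abstraction, so plain layers, QMatMul and whole models can be driven through one interface. A schematic version with a stand-in tensor type (the real trait is defined over the crate's own tensor and error types):

```rust
/// Stand-in tensor type so the sketch is self-contained; the real trait works
/// on the crate's own Tensor.
#[derive(Clone, Debug)]
struct Tensor(Vec<f32>);

type Result<T> = std::result::Result<T, String>;

/// A module is anything that can run a forward pass on a tensor.
trait Module {
    fn forward(&self, xs: &Tensor) -> Result<Tensor>;
}

/// Example implementor: an elementwise scale, standing in for QMatMul or a layer.
struct Scale(f32);

impl Module for Scale {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        Ok(Tensor(xs.0.iter().map(|v| v * self.0).collect()))
    }
}

/// Because everything shares the trait, a stack of layers is just a loop.
fn run_all(modules: &[Box<dyn Module>], input: Tensor) -> Result<Tensor> {
    modules.iter().try_fold(input, |xs, m| m.forward(&xs))
}
```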
* Print the detected arch options.
* Add the q6k quantization.
* Add a currently broken test.
* Bugfix.
* Bugfix.
* Another bugfix.
* Another bugfix + get the test to work.

* Add some options to make layer-norm more configurable.
* Add the rms-norm variant.
* Replace the RmsNorm with the shared bits.
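RmsNorm differs from classic layer-norm in that it skips the mean subtraction (and usually the bias) and only rescales by the root mean square of the activations, which is why it can reuse most of the layer-norm code once those parts are made optional. A small numerical sketch (the eps value is a typical default, not taken from the crate):

```rust
/// y[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]
/// Unlike layer-norm there is no mean subtraction and, typically, no bias.
fn rms_norm(xs: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(xs.len(), weight.len());
    let mean_sq = xs.iter().map(|&x| x * x).sum::<f32>() / xs.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    xs.iter().zip(weight).map(|(&x, &w)| x * scale * w).collect()
}

fn main() {
    let xs = [1.0f32, -2.0, 3.0, -4.0];
    let weight = [1.0f32; 4];
    // With eps = 1e-5 the scale is 1 / sqrt((1 + 4 + 9 + 16) / 4), roughly 0.365.
    println!("{:?}", rms_norm(&xs, &weight, 1e-5));
}
```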