path: root/candle-core/examples
Commit message | Author | Date | Files | Lines
* Implement the module trait directly for QMatMul. (#1372) | Laurent Mazare | 2023-11-25 | 1 file | -6/+5
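candle's Module trait reduces to a single forward method, so a type like QMatMul can be used wherever a layer is expected by implementing the trait directly instead of going through a wrapper. A minimal sketch of the pattern, using a hypothetical stand-in type (candle's real QMatMul holds quantized weights, not a plain Tensor):

```rust
use candle_core::{DType, Device, Module, Result, Tensor};

// Hypothetical stand-in for QMatMul; the real type wraps quantized weights.
struct MyMatMul {
    weight: Tensor,
}

// Implementing Module directly means the type can be used anywhere a
// forward-pass abstraction is expected, without an adapter layer.
impl Module for MyMatMul {
    fn forward(&self, xs: &Tensor) -> Result<Tensor> {
        xs.matmul(&self.weight)
    }
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let layer = MyMatMul { weight: Tensor::ones((4, 2), DType::F32, &dev)? };
    let xs = Tensor::ones((3, 4), DType::F32, &dev)?;
    let ys = layer.forward(&xs)?; // (3, 4) x (4, 2) -> (3, 2)
    println!("{ys}");
    Ok(())
}
```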
* Quantized version of mistral. (#1009) | Laurent Mazare | 2023-09-30 | 1 file | -9/+27
  - Quantized version of mistral.
  - Integrate the quantized mistral variant.
  - Use the quantized weight files.
  - Tweak the quantization command.
  - Fix the dtype when computing the rotary embeddings.
  - Update the readme with the quantized version.
  - Fix the decoding of the remaining tokens.
* Use yoke to provide a self-referential container for mmaped safetensor files. (#939) | Laurent Mazare | 2023-09-23 | 1 file | -2/+1
  - Use yoke to provide a self-referential container for mmaped safetensor files.
  - Add the new self-owned type for safetensor files without removing the previous version.
  - Add routing.
  - Add an initializer for the case of multiple files.
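The yoke crate bundles borrowed data with the buffer it borrows from, which matches the mmaped-safetensors problem: the parsed SafeTensors view borrows from the mmaped bytes, and the two need to travel together as one self-contained value. A minimal sketch of the pattern on a plain byte buffer rather than an mmap (the actual safetensors wiring in candle is more involved):

```rust
use std::borrow::Cow;
use yoke::Yoke;

fn main() {
    // The "cart" owns the backing bytes; in candle this would be the
    // memory-mapped file. Box<[u8]> has a stable address, as required.
    let cart: Box<[u8]> = b"hello world".to_vec().into_boxed_slice();

    // The yoke borrows from the cart but moves around together with it,
    // so the pair can be returned or stored without lifetime trouble.
    let yoked: Yoke<Cow<'static, str>, Box<[u8]>> =
        Yoke::attach_to_cart(cart, |bytes: &[u8]| String::from_utf8_lossy(bytes));

    // Access the borrowed view through `get`.
    assert_eq!(yoked.get(), "hello world");
}
```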
* Use the proper block size for quantizing models. (#933) | Laurent Mazare | 2023-09-22 | 1 file | -2/+17
  - Use the proper block size for quantizing models.
  - Use the proper dimension.
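ggml-style formats quantize in fixed-size blocks (32 elements per block for q4_0), so a dimension can only be quantized when it is a multiple of the block size; presumably that is the constraint this commit gets right. A trivial sketch of the check (names hypothetical):

```rust
// Hypothetical check: a dimension can be quantized to a ggml-style block
// format only if it is a multiple of the per-block element count
// (32 for q4_0).
fn can_quantize(dim: usize, block_size: usize) -> bool {
    dim % block_size == 0
}

fn main() {
    assert!(can_quantize(4096, 32));
    assert!(!can_quantize(100, 32));
}
```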
* T5 quantized example (#922) | Laurent Mazare | 2023-09-21 | 1 file | -0/+53
  - Load gguf files for the quantized t5.
  - Add the quantized t5 example.
  - Allow for loading local files.
  - Add some support for quantizing safetensor files.
  - Transpose before quantizing.
  - Quantized t5.
  - Retrieve the weights from the hub.
* Add a custom softmax implementation. (#744) | Laurent Mazare | 2023-09-05 | 1 file | -166/+0
  - Add a custom softmax implementation.
  - Add softmaxlastdim to the benchmarks.
  - And add a test.
  - Support more dtypes.
  - Polish the code.
  - Use the slow implementation on cuda.
  - Add a todo for the cuda kernel.
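A softmax over the last dimension is typically made numerically stable by subtracting the row max before exponentiating; a scalar sketch of that algorithm on a flat buffer (illustrative only, not candle's actual implementation):

```rust
// Numerically stable softmax over each row of a flattened (rows, dim)
// buffer: subtract the row max so exp() cannot overflow, then normalize.
fn softmax_last_dim(data: &mut [f32], dim: usize) {
    for row in data.chunks_mut(dim) {
        let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0f32;
        for v in row.iter_mut() {
            *v = (*v - max).exp();
            sum += *v;
        }
        for v in row.iter_mut() {
            *v /= sum;
        }
    }
}

fn main() {
    let mut xs = vec![1f32, 2., 3., 1., 2., 3.];
    softmax_last_dim(&mut xs, 3);
    println!("{xs:?}"); // each row of 3 now sums to ~1
}
```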
* Dilated convolutions (#657) | Laurent Mazare | 2023-08-29 | 3 files | -6/+6
  - Add the dilation parameter.
  - Restore the basic optimizer example.
  - Dilation support in cudnn.
  - Use the dilation parameter in the cpu backend.
  - More dilation support.
  - No support for dilation in transposed convolutions.
  - Add dilation to a test.
  - Remove a print.
  - Helper function.
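Dilation spaces the kernel taps `dilation` elements apart, growing the effective kernel to dilation * (k - 1) + 1 without adding parameters; the output length then follows the usual PyTorch-style formula. A small sketch:

```rust
// Output length of a 1d convolution, PyTorch-style:
//   out = (in + 2*pad - dilation*(k - 1) - 1) / stride + 1
fn conv_out_len(input: usize, k: usize, pad: usize, stride: usize, dilation: usize) -> usize {
    (input + 2 * pad - dilation * (k - 1) - 1) / stride + 1
}

fn main() {
    // dilation = 1 is a plain convolution...
    assert_eq!(conv_out_len(10, 3, 0, 1, 1), 8);
    // ...dilation = 2 widens the receptive field (effective kernel 5).
    assert_eq!(conv_out_len(10, 3, 0, 1, 2), 6);
}
```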
* Llama quantization. (#625) | Laurent Mazare | 2023-08-27 | 1 file | -15/+75
* Add the quantize command. (#624) | Laurent Mazare | 2023-08-27 | 1 file | -1/+75
  - Add the quantize command.
  - Bugfix for writing gguf files.
  - And add a comment.
* More pickle support. (#588) | Laurent Mazare | 2023-08-24 | 1 file | -1/+1
  - More pickle support.
  - Be more verbose.
* Add to the cuda example a reproduction of the issue. (#579) | Laurent Mazare | 2023-08-24 | 1 file | -2/+11
  - Add to the cuda example a reproduction of the issue.
  - Tweak.
  - Add a test using non-square matrices.
  - Fix the conv2d kernel.
  - Display the error.
  - And tweak the comment.
* Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578) | Laurent Mazare | 2023-08-24 | 1 file | -0/+3
  - Add a test for conv2d with padding.
  - Cosmetic changes.
  - Bugfix the rand function on the cuda backend.
* Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 3 files | -4/+4
  - Add some group parameter to convolutions.
  - Avoid some unnecessary groups checks.
  - Move the tensor convolution bits.
  - Proper handling of groups.
  - Bump the crate version.
  - And add a changelog.
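With groups, the input channels are split into `groups` independent slices and each output channel only sees its own slice, so the weight tensor shrinks to (c_out, c_in / groups, kh, kw). A sketch of the shape arithmetic (illustrative, not candle's API):

```rust
// Weight shape for a grouped 2d convolution. Both channel counts must be
// divisible by the group count; each group maps c_in/groups inputs to
// c_out/groups outputs.
fn grouped_weight_shape(
    c_in: usize,
    c_out: usize,
    groups: usize,
    kh: usize,
    kw: usize,
) -> Option<(usize, usize, usize, usize)> {
    if c_in % groups != 0 || c_out % groups != 0 {
        return None;
    }
    Some((c_out, c_in / groups, kh, kw))
}

fn main() {
    // groups = 1 is a regular convolution; groups = c_in is depthwise.
    assert_eq!(grouped_weight_shape(64, 128, 1, 3, 3), Some((128, 64, 3, 3)));
    assert_eq!(grouped_weight_shape(64, 64, 64, 3, 3), Some((64, 1, 3, 3)));
}
```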
* Handle GGUF files in tensor-tools. (#558) | Laurent Mazare | 2023-08-23 | 1 file | -1/+20
* Small tweaks to tensor-tools. (#517) | Laurent Mazare | 2023-08-19 | 1 file | -9/+15
* Retrieve tensor data from PyTorch files. (#516) | Laurent Mazare | 2023-08-19 | 1 file | -5/+7
* Retrieve more information from PyTorch checkpoints. (#515) | Laurent Mazare | 2023-08-19 | 1 file | -3/+9
  - Retrieve more information from PyTorch checkpoints.
  - Add enough support to load dino-v2 backbone weights.
* Add ggml support to tensor-tools (#512) | Laurent Mazare | 2023-08-19 | 1 file | -15/+59
  - Pickle work-in-progress.
  - More unpickling.
  - More pickling.
  - Proper handling of setitems.
  - Clippy.
  - Again more pickling.
  - Restore the example.
  - Add enough pickle support to get the list of tensors.
  - Read the data from zip files.
  - Retrieve the tensor shape.
  - Extract the size and dtype.
  - More storage types.
  - Improve the destructuring.
  - Also support ggml files.
* Preliminary support for importing PyTorch weights. (#511) | Laurent Mazare | 2023-08-19 | 1 file | -0/+16
  - Pickle work-in-progress.
  - More unpickling.
  - More pickling.
  - Proper handling of setitems.
  - Clippy.
  - Again more pickling.
  - Restore the example.
  - Add enough pickle support to get the list of tensors.
  - Read the data from zip files.
  - Retrieve the tensor shape.
  - Extract the size and dtype.
  - More storage types.
  - Improve the destructuring.
* Add the tensor-tools binary. (#510) | Laurent Mazare | 2023-08-19 | 1 file | -0/+72
* Tensor -> QTensor conversion (#496) | Laurent Mazare | 2023-08-18 | 1 file | -1/+1
  - Sketch some qmatmul test.
  - Add the quantization function.
  - More testing.
  - Make the test smaller and faster.
  - Add some shape checking.
* AVX version of the vecdot for q4_0. (#474) | Laurent Mazare | 2023-08-17 | 1 file | -0/+24
  - AVX version of the vecdot for q4_0.
  - Tweak the avx bits.
  - Add a qmatmul benchmark.
  - Fix the quantized test.
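q4_0 packs 32 weights per block as 4-bit values offset by 8 plus one scale, so the dot product of two quantized rows works block by block; the AVX version vectorizes exactly this loop. A scalar reference sketch (simplified: an f32 scale instead of the f16 used on disk):

```rust
// One q4_0-style block: 32 quants stored as nibbles (low nibble first),
// each representing (q - 8) * scale.
struct BlockQ4 {
    scale: f32,
    qs: [u8; 16],
}

// Scalar dot product: sum over blocks of s0 * s1 * sum_i (q0-8)*(q1-8).
fn vec_dot_q4(a: &[BlockQ4], b: &[BlockQ4]) -> f32 {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let mut acc = 0i32;
            for i in 0..16 {
                let x0 = (x.qs[i] & 0x0f) as i32 - 8;
                let x1 = (x.qs[i] >> 4) as i32 - 8;
                let y0 = (y.qs[i] & 0x0f) as i32 - 8;
                let y1 = (y.qs[i] >> 4) as i32 - 8;
                acc += x0 * y0 + x1 * y1;
            }
            x.scale * y.scale * acc as f32
        })
        .sum()
}

fn main() {
    // 0x98 packs quants 8 (low) and 9 (high) -> values 0 and 1 after the
    // -8 offset; the dot of two identical blocks is then 16 * s0 * s1.
    let blk = |scale| BlockQ4 { scale, qs: [0x98; 16] };
    let (a, b) = (vec![blk(0.5)], vec![blk(1.0)]);
    assert_eq!(vec_dot_q4(&a, &b), 8.0);
}
```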
* Cudnn support (#445) | Laurent Mazare | 2023-08-14 | 1 file | -5/+4
  - Add a cudnn feature to be used for conv2d.
  - Allocate the proper workspace.
  - Only create a single cudnn handle per cuda device.
  - Proper cudnn usage.
  - Bugfix.
* Add a softmax bench. (#433) | Laurent Mazare | 2023-08-13 | 1 file | -1/+29
  - Add a softmax bench.
  - Add the vectorized sum reduce.
* Add a matmul benchmark. (#429) | Laurent Mazare | 2023-08-13 | 1 file | -0/+19
* More accelerate optimizations (#427) | Laurent Mazare | 2023-08-13 | 2 files | -0/+6
  - Add more tracing to the whisper example.
  - Support accelerate in more examples.
  - Use accelerate for pointwise functions.
  - Use accelerate for binary operations too.
  - Bugfix for binary operation: use the rhs before the lhs.
* Small example for benchmarking some cpu ops (#394) | Laurent Mazare | 2023-08-10 | 2 files | -24/+95
  - Refactor the benchmark example.
  - Rename the example.
  - Add some comments.
* Add a conv1d benchmark based on the whisper sizes. (#377) | Laurent Mazare | 2023-08-09 | 1 file | -0/+24
  - Add a conv1d benchmark based on the whisper sizes.
  - Enforce the batch-dim in conv1d.
* Add some conv1d test + bugfix using padding. (#349) | Laurent Mazare | 2023-08-08 | 1 file | -20/+6
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 file | -0/+3
  - Add the accelerate feature.
  - Ffi tweaks.
* Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 3 files | -3/+3
  - Rename to candle-core.
  - More candle-core renaming.
* Simplify Tensor::randn. (#255) | Laurent Mazare | 2023-07-27 | 1 file | -0/+5
  - Simplify Tensor::randn.
  - Also switch Tensor::rand to use a generic dtype.
  - Support sampling for f16.
  - Cleanup.
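After this change the dtype of the sampled tensor is inferred from the generic mean/std arguments rather than passed explicitly; usage looks roughly like:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // f32 mean/std produce an f32 tensor; f64 arguments would give f64.
    let t = Tensor::randn(0f32, 1f32, (2, 3), &Device::Cpu)?;
    println!("{t}");
    Ok(())
}
```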
* Simplify the parameters used by sum and sum_keepdim. (#165) | Laurent Mazare | 2023-07-14 | 2 files | -6/+6
* Use the same default as pytorch for sum. (#164) | Laurent Mazare | 2023-07-13 | 2 files | -10/+10
* Sketch a fast cuda kernel for reduce-sum. (#109) | Laurent Mazare | 2023-07-08 | 1 file | -0/+15
  - Sketch a fast cuda kernel for reduce-sum.
  - Sketch the rust support code for the fast sum kernel.
  - More work on the fast kernel.
  - Add some testing ground.
  - A couple fixes for the fast sum kernel.
* Add some very simple sum benchmark. (#108) | Laurent Mazare | 2023-07-08 | 2 files | -34/+51
  - Add some very simple sum benchmark.
  - Rename the file.
* Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 2 files | -0/+6
  - Fix some rebase issues.
  - Use mkl instead.
  - Use mkl in bert.
  - Add the optional mkl feature.
  - Conditional compilation based on the mkl feature.
  - Add more mkl support.
* Move llama into a cargo-examples directory. | laurent | 2023-07-03 | 4 files | -912/+0
* Adding a bit more docs around safety. | Nicolas Patry | 2023-07-03 | 1 file | -1/+1
* Move more safetensors bits to the shared module. | laurent | 2023-07-03 | 1 file | -16/+8
* Move some safetensors bits in the candle-core crate. | laurent | 2023-07-03 | 1 file | -31/+2
* Add a flag for custom prompt. | laurent | 2023-07-01 | 1 file | -2/+7
* Early conversion for the llama weights. | laurent | 2023-06-30 | 2 files | -45/+19
* Add a const to easily tweak the dtype used for llama internal computations. | laurent | 2023-06-30 | 1 file | -4/+8
* Tweak the kv-cache flag. | laurent | 2023-06-29 | 1 file | -4/+4
* Add a flag. | laurent | 2023-06-29 | 1 file | -6/+11
* Enable the KV cache after fixing the caching length and the rope bits. | laurent | 2023-06-29 | 1 file | -14/+21
* Only narrow when needed + deactivate the kv cache. | laurent | 2023-06-29 | 1 file | -2/+6
* Add some KV cache to llama. | laurent | 2023-06-29 | 1 file | -36/+72
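The KV cache avoids recomputing attention keys/values for the whole prefix at every decoding step: cache them once and append only the new token's entries. A minimal sketch of the idea using candle's Tensor::cat (field and method names here are illustrative, not the llama example's actual code):

```rust
use candle_core::{DType, Device, Result, Tensor};

#[derive(Default)]
struct KvCache {
    k: Option<Tensor>,
    v: Option<Tensor>,
}

impl KvCache {
    // Append this step's keys/values along the sequence axis (assumed to
    // be dimension 1) and return the full cached tensors for attention.
    fn append(&mut self, k: &Tensor, v: &Tensor) -> Result<(Tensor, Tensor)> {
        let k = match &self.k {
            Some(prev) => Tensor::cat(&[prev, k], 1)?,
            None => k.clone(),
        };
        let v = match &self.v {
            Some(prev) => Tensor::cat(&[prev, v], 1)?,
            None => v.clone(),
        };
        self.k = Some(k.clone());
        self.v = Some(v.clone());
        Ok((k, v))
    }
}

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let mut cache = KvCache::default();
    // Two decoding steps of one token each: the cached sequence grows.
    for step in 1..=2usize {
        let k = Tensor::zeros((1, 1, 8), DType::F32, &dev)?;
        let v = Tensor::zeros((1, 1, 8), DType::F32, &dev)?;
        let (k_all, _v_all) = cache.append(&k, &v)?;
        assert_eq!(k_all.dims(), &[1, step, 8]);
    }
    Ok(())
}
```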
* Typo. | Nicolas Patry | 2023-06-29 | 1 file | -1/+1