path: root/candle-core/Cargo.toml
Commit message | Author | Date | Files | Lines
* UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 1 | -1/+2
* Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 1 | -1/+3
    * Support for UG kernels.
    * Add a dedicated test.
* Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 1 | -0/+4
    * Add some tracing.
    * Get the trace to work.
* feat(bf16): add cast support + tests for cast + bin ops (#1524) | Kyle McCarthy | 2024-01-11 | 1 | -1/+0
* Seperate benchmarks by enabled features (#1538) | ivarflakstad | 2024-01-11 | 1 | -1/+1
    * Use cfg to seperate benchmark results based on features
    * Remove allow pragma
    * Avoid some unnecessary returns.
    * Improve benchmarks layout
    * Derive bench_name from actual device
    * Run CPU benchmarks even when GPU feature is enabled
    ---------
    Co-authored-by: Laurent <laurent.mazare@gmail.com>
* Simplifying our internal cargo dependencies. (#1529) | Nicolas Patry | 2024-01-07 | 1 | -2/+2
* Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -2/+2
* Merge pull request #1318 from huggingface/metal4 | Nicolas Patry | 2023-12-20 | 1 | -0/+7
    Starting to fix some tests.
  * Optimizing decode matmul (Phi at 28tok/s on M3). | Nicolas Patry | 2023-12-20 | 1 | -0/+7
      Adding some benchmark in order to help checking out matmul performance.
* Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -2/+2
* Fix comments. | Nicolas Patry | 2023-11-20 | 1 | -1/+1
* Adding the actual backend | Nicolas Patry | 2023-11-20 | 1 | -1/+2
* Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -1/+1
* Metal part 1 - Scaffolding for metal. (#1308) | Nicolas Patry | 2023-11-10 | 1 | -0/+2
    * Metal part 1 - Scaffolding for metal.
    * Remove tracing.
* Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -1/+1
    * Bump the version to 0.3.0.
    * Changelog update.
* Use yoke to provide a self-referential container for mmaped safetenso… (#939) | Laurent Mazare | 2023-09-23 | 1 | -0/+1
    * Use yoke to provide a self-referential container for mmaped safetensor files.
    * Add the new self-owned type for safetensor files without removing the previous version.
    * Add routing.
    * Add an initializer for the case of multiple files.
* Bump the crate versions to v0.2.3. (#886) | Laurent Mazare | 2023-09-18 | 1 | -1/+1
    * Bump the crate version.
    * Also update the python bindings.
* Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -1/+1
* Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -1/+1
    * Add some documentation.
    * Bump the crate version.
* Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -1/+1
* Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1
    * Add some group parameter to convolutions.
    * Avoid some unnecessary groups checks.
    * Move the tensor convolution bits.
    * Properh handling of groups.
    * Bump the crate version.
    * And add a changelog.
* Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -1/+1
* Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -1/+1
    * Rename vec-dot to vec-ops.
    * Also bump the crate version.
    * Add a currently empty readme.
* Simd support (#448) | Laurent Mazare | 2023-08-15 | 1 | -1/+0
    * Import the simd intrinsics in candle-core.
    * simd version of reduce-sum.
    * Bugfix.
    * Fix some clippy lints.
* Cudnn support (#445) | Laurent Mazare | 2023-08-14 | 1 | -1/+2
    * Add a cudnn feature to be used for conv2d.
    * Allocate the proper workspace.
    * Only create a single cudnn handle per cuda device.
    * Proper cudnn usage.
    * Bugfix.
* Parallelise the CPU kernels for the conv ops. (#401) | Laurent Mazare | 2023-08-11 | 1 | -0/+1
    * Parallelise the conv2d op.
    * Tighter control on threading.
    * Also parallelise conv1d.
    * Add some safety comment.
* Small example for benchmarking some cpu ops (#394) | Laurent Mazare | 2023-08-10 | 1 | -0/+1
    * Refactor the benchmark example.
    * Rename the example.
    * Add some comments.
* Conv1d optimize (#392) | Laurent Mazare | 2023-08-10 | 1 | -0/+1
    * Reorder the conv1d loops in the cpu backend.
    * Optimize the 1d convolution.
    * Conv1D optimize.
    * Fix some clippy lints.
* Fix randn cpu (#382) | Lei | 2023-08-10 | 1 | -0/+1
    * Change distributions
      Standard generates in [0, 1), Normal is correct.
    * Add test
      Not sure if this is the best place to put the test
    * Remove unnecessary use
* Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 | -0/+2
    * Add the accelerate feature.
    * Ffi tweaks.
* Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -8/+7
* Add version numbers for all the candle crates (#303) | Laurent Mazare | 2023-08-02 | 1 | -1/+1
    * Switch to candle-gemm for the time being.
    * Add the missing versions.
* Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 1 | -1/+1
    * Rename to candle-core.
    * More candle-core renaming.
* Centralize the dependency versions and inherit them. (#177) | Laurent Mazare | 2023-07-16 | 1 | -18/+14
* Removing cuda default. | Nicolas Patry | 2023-07-14 | 1 | -1/+1
    Seems very important for a lot of exploring users usually on laptop without GPUs.
    Adding more README instructions in a follow up.
* Random initializers. (#128) | Laurent Mazare | 2023-07-10 | 1 | -0/+1
    * Random initialization.
    * CPU rng generation.
* Remove the dependency to blas and use mkl directly. (#125) | Laurent Mazare | 2023-07-10 | 1 | -2/+2
* Sketch the candle-nn crate. (#115) | Laurent Mazare | 2023-07-10 | 1 | -1/+2
    * Sketch the candle-nn crate.
    * Tweak the cuda dependencies.
    * More cuda tweaks.
* Use cublas bf16. (#101) | Laurent Mazare | 2023-07-07 | 1 | -1/+2
* Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -0/+3
    * Fix some rebase issues.
    * Use mkl instead.
    * Use mkl in bert.
    * Add the optional mkl feature.
    * Conditional compilation based on the mkl feature.
    * Add more mkl support.
* Move llama in a cargo-examples directory. | laurent | 2023-07-03 | 1 | -5/+0
* Use the patched gemm for the time being. | laurent | 2023-07-03 | 1 | -1/+3
* Move more safetensors bits to the shared module. | laurent | 2023-07-03 | 1 | -7/+7
* Add backtraces. | laurent | 2023-06-29 | 1 | -1/+1
* Tmp. | Ubuntu | 2023-06-28 | 1 | -0/+3
* Use num-cpus to enable parallelism. | laurent | 2023-06-27 | 1 | -0/+1
* Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 1 | -0/+32