Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | UG metal integration. (#2580) | Laurent Mazare | 2024-10-27 | 1 | -1/+2 |
| | |||||
* | Support for UG kernels. (#2579) | Laurent Mazare | 2024-10-27 | 1 | -1/+3 |
| | | | | | * Support for UG kernels. * Add a dedicated test. | ||||
* | Add a basic metal example with capture (#2324) | Laurent Mazare | 2024-07-09 | 1 | -0/+4 |
| | | | | | * Add some tracing. * Get the trace to work. | ||||
* | feat(bf16): add cast support + tests for cast + bin ops (#1524) | Kyle McCarthy | 2024-01-11 | 1 | -1/+0 |
| | |||||
* | Separate benchmarks by enabled features (#1538) | ivarflakstad | 2024-01-11 | 1 | -1/+1 |
| | | | | | | | | | | | | | | | | | * Use cfg to separate benchmark results based on features * Remove allow pragma * Avoid some unnecessary returns. * Improve benchmarks layout * Derive bench_name from actual device * Run CPU benchmarks even when GPU feature is enabled --------- Co-authored-by: Laurent <laurent.mazare@gmail.com> | ||||
* | Simplifying our internal cargo dependencies. (#1529) | Nicolas Patry | 2024-01-07 | 1 | -2/+2 |
| | |||||
* | Bump the crate version to 0.3.3. (#1490) | Laurent Mazare | 2023-12-28 | 1 | -2/+2 |
| | |||||
* | Merge pull request #1318 from huggingface/metal4 | Nicolas Patry | 2023-12-20 | 1 | -0/+7 |
|\ | | | | | Starting to fix some tests. | ||||
| * | Optimizing decode matmul (Phi at 28tok/s on M3). | Nicolas Patry | 2023-12-20 | 1 | -0/+7 |
| | | | | | | | | Adding some benchmark in order to help checking out matmul performance. | ||||
* | | Bump the crate version to 0.3.2. (#1452) | Laurent Mazare | 2023-12-17 | 1 | -2/+2 |
|/ | |||||
* | Fix comments. | Nicolas Patry | 2023-11-20 | 1 | -1/+1 |
| | |||||
* | Adding the actual backend | Nicolas Patry | 2023-11-20 | 1 | -1/+2 |
| | |||||
* | Update for 0.3.1. (#1324) | Laurent Mazare | 2023-11-11 | 1 | -1/+1 |
| | |||||
* | Metal part 1 - Scaffolding for metal. (#1308) | Nicolas Patry | 2023-11-10 | 1 | -0/+2 |
| | | | | | * Metal part 1 - Scaffolding for metal. * Remove tracing. | ||||
* | Bump the version to 0.3.0. (#1014) | Laurent Mazare | 2023-10-01 | 1 | -1/+1 |
| | | | | | * Bump the version to 0.3.0. * Changelog update. | ||||
* | Use yoke to provide a self-referential container for mmaped safetenso… (#939) | Laurent Mazare | 2023-09-23 | 1 | -0/+1 |
| | | | | | | | | | * Use yoke to provide a self-referential container for mmaped safetensor files. * Add the new self-owned type for safetensor files without removing the previous version. * Add routing. * Add an initializer for the case of multiple files. | ||||
* | Bump the crate versions to v0.2.3. (#886) | Laurent Mazare | 2023-09-18 | 1 | -1/+1 |
| | | | | | * Bump the crate version. * Also update the python bindings. | ||||
* | Bump the crate version + update the changelog. (#822) | Laurent Mazare | 2023-09-12 | 1 | -1/+1 |
| | |||||
* | Add some documentation. (#673) | Laurent Mazare | 2023-08-30 | 1 | -1/+1 |
| | | | | | * Add some documentation. * Bump the crate version. | ||||
* | Bump the crate version + update CHANGELOG. (#628) | Laurent Mazare | 2023-08-27 | 1 | -1/+1 |
| | |||||
* | Add some group parameter to convolutions. (#566) | Laurent Mazare | 2023-08-23 | 1 | -1/+1 |
| | | | | | | | | | | | | | * Add some group parameter to convolutions. * Avoid some unnecessary groups checks. * Move the tensor convolution bits. * Proper handling of groups. * Bump the crate version. * And add a changelog. | ||||
* | Bump the crates version to 0.1.2. (#522) | Laurent Mazare | 2023-08-20 | 1 | -1/+1 |
| | |||||
* | Rename vec-dot to vec-ops. (#449) | Laurent Mazare | 2023-08-15 | 1 | -1/+1 |
| | | | | | | | * Rename vec-dot to vec-ops. * Also bump the crate version. * Add a currently empty readme. | ||||
* | Simd support (#448) | Laurent Mazare | 2023-08-15 | 1 | -1/+0 |
| | | | | | | | | | * Import the simd intrinsics in candle-core. * simd version of reduce-sum. * Bugfix. * Fix some clippy lints. | ||||
* | Cudnn support (#445) | Laurent Mazare | 2023-08-14 | 1 | -1/+2 |
| | | | | | | | | | | | * Add a cudnn feature to be used for conv2d. * Allocate the proper workspace. * Only create a single cudnn handle per cuda device. * Proper cudnn usage. * Bugfix. | ||||
* | Parallelise the CPU kernels for the conv ops. (#401) | Laurent Mazare | 2023-08-11 | 1 | -0/+1 |
| | | | | | | | | | * Parallelise the conv2d op. * Tighter control on threading. * Also parallelise conv1d. * Add some safety comment. | ||||
* | Small example for benchmarking some cpu ops (#394) | Laurent Mazare | 2023-08-10 | 1 | -0/+1 |
| | | | | | | | * Refactor the benchmark example. * Rename the example. * Add some comments. | ||||
* | Conv1d optimize (#392) | Laurent Mazare | 2023-08-10 | 1 | -0/+1 |
| | | | | | | | | | * Reorder the conv1d loops in the cpu backend. * Optimize the 1d convolution. * Conv1D optimize. * Fix some clippy lints. | ||||
* | Fix randn cpu (#382) | Lei | 2023-08-10 | 1 | -0/+1 |
| | | | | | | | | | | | * Change distributions Standard generates in [0, 1), Normal is correct. * Add test Not sure if this is the best place to put the test * Remove unnecessary use | ||||
* | Support the Accelerate BLAS on macOS. (#325) | Laurent Mazare | 2023-08-05 | 1 | -0/+2 |
| | | | | | * Add the accelerate feature. * Ffi tweaks. | ||||
* | Update the repo location. (#305) | Laurent Mazare | 2023-08-02 | 1 | -8/+7 |
| | |||||
* | Add version numbers for all the candle crates (#303) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| | | | | | * Switch to candle-gemm for the time being. * Add the missing versions. | ||||
* | Rename the candle crate to candle-core (#301) | Laurent Mazare | 2023-08-02 | 1 | -1/+1 |
| | | | | | * Rename to candle-core. * More candle-core renaming. | ||||
* | Centralize the dependency versions and inherit them. (#177) | Laurent Mazare | 2023-07-16 | 1 | -18/+14 |
| | |||||
* | Removing cuda default. | Nicolas Patry | 2023-07-14 | 1 | -1/+1 |
| | | | | | | | Seems very important for a lot of exploring users usually on laptop without GPUs. Adding more README instructions in a follow up. | ||||
* | Random initializers. (#128) | Laurent Mazare | 2023-07-10 | 1 | -0/+1 |
| | | | | | * Random initialization. * CPU rng generation. | ||||
* | Remove the dependency to blas and use mkl directly. (#125) | Laurent Mazare | 2023-07-10 | 1 | -2/+2 |
| | |||||
* | Sketch the candle-nn crate. (#115) | Laurent Mazare | 2023-07-10 | 1 | -1/+2 |
| | | | | | | | * Sketch the candle-nn crate. * Tweak the cuda dependencies. * More cuda tweaks. | ||||
* | Use cublas bf16. (#101) | Laurent Mazare | 2023-07-07 | 1 | -1/+2 |
| | |||||
* | Add mkl support for matrix multiply. (#86) | Laurent Mazare | 2023-07-06 | 1 | -0/+3 |
| | | | | | | | | | | | | | * Fix some rebase issues. * Use mkl instead. * Use mkl in bert. * Add the optional mkl feature. * Conditional compilation based on the mkl feature. * Add more mkl support. | ||||
* | Move llama in a cargo-examples directory. | laurent | 2023-07-03 | 1 | -5/+0 |
| | |||||
* | Use the patched gemm for the time being. | laurent | 2023-07-03 | 1 | -1/+3 |
| | |||||
* | Move more safetensors bits to the shared module. | laurent | 2023-07-03 | 1 | -7/+7 |
| | |||||
* | Add backtraces. | laurent | 2023-06-29 | 1 | -1/+1 |
| | |||||
* | Tmp. | Ubuntu | 2023-06-28 | 1 | -0/+3 |
| | |||||
* | Use num-cpus to enable parallelism. | laurent | 2023-06-27 | 1 | -0/+1 |
| | |||||
* | Refactor the hierarchy. | Nicolas Patry | 2023-06-27 | 1 | -0/+32 |