| Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|
| ... | | | | |
| Add the AdamW optimizer. (#307) (see the sketch after the table)<br>• Add an AdamW test validated against PyTorch. | Laurent Mazare | 2023-08-02 | 2 | -3/+115 |
| Use index-select for the embeddings as it supports backprop. (#298) | Laurent Mazare | 2023-08-01 | 1 | -1/+1 |
| Llama more training (#297)<br>• Rework the var-builder to handle initializations.<br>• Add some helper functions for layer creation.<br>• Improve the layer initializations.<br>• Get initialized variables.<br>• Precompute the rotary embeddings when training llamas. | Laurent Mazare | 2023-08-01 | 6 | -19/+195 |
| Add some batcher variants that handle errors. (#294) | Laurent Mazare | 2023-08-01 | 1 | -0/+75 |
| Add the batcher. (#293) | Laurent Mazare | 2023-08-01 | 2 | -0/+97 |
| Add the cross-entropy loss. (#287) (see the sketch after the table) | Laurent Mazare | 2023-07-31 | 1 | -0/+17 |
| Make the nll op closer to the PyTorch version + add a test. (#286) | Laurent Mazare | 2023-07-31 | 1 | -2/+22 |
| Improve the mnist training example. (#276)<br>• Add some initialization routine that can be used for nn.<br>• Proper initialization in the mnist example. | Laurent Mazare | 2023-07-29 | 2 | -4/+37 |
| More mnist training. (#275) | Laurent Mazare | 2023-07-29 | 1 | -0/+1 |
| Softmax numerical stability. (#267) (see the sketch after the table)<br>• Fix the flash-attn test. | Laurent Mazare | 2023-07-28 | 1 | -0/+24 |
| Added comment about offsets. | Nicolas Patry | 2023-07-27 | 1 | -0/+3 |
| Fixing slice errors + comments. | Nicolas Patry | 2023-07-27 | 1 | -3/+22 |
| Removing inner dependency on safetensors. | Nicolas Patry | 2023-07-27 | 1 | -4/+6 |
| TP sharding v2 | Nicolas Patry | 2023-07-27 | 1 | -5/+53 |
| Move some shared functions to the nn module. (#221) | Laurent Mazare | 2023-07-22 | 3 | -0/+20 |
| Rename the .r functions to .dims so as to be a bit more explicit. (#220) | Laurent Mazare | 2023-07-22 | 2 | -2/+2 |
| [Proposal] Remove SafeTensor wrapper (allows finer control for users). | Nicolas Patry | 2023-07-19 | 1 | -2/+6 |
| Vision dataset (#179)<br>• Add some readers for the mnist dataset.<br>• Import the cifar and mnist datasets. | Laurent Mazare | 2023-07-16 | 4 | -0/+140 |
| Add backtrace information to errors where relevant. (#166)<br>• More backtrace information.<br>• Add to the FAQ. | Laurent Mazare | 2023-07-14 | 1 | -7/+17 |
| Simplify the parameters used by sum and sum_keepdim. (#165) | Laurent Mazare | 2023-07-14 | 1 | -2/+2 |
| Use the same default as PyTorch for sum. (#164) | Laurent Mazare | 2023-07-13 | 1 | -2/+2 |
| Add the gradient for reduce-sum. (#162)<br>• Add the gradient for the broadcast ops.<br>• Add some backprop tests.<br>• Add a linear regression example. | Laurent Mazare | 2023-07-13 | 1 | -1/+1 |
| Add the SGD optimizer (#160) (see the sketch after the table)<br>• Add the nn::optim and some conversion traits.<br>• Add the backward_step function for SGD.<br>• Get the SGD optimizer to work and add a test.<br>• Make the test slightly simpler. | Laurent Mazare | 2023-07-13 | 2 | -0/+49 |
| Add some documentation and test to the linear layer. (#151)<br>• Layer norm doc.<br>• Minor tweaks. | Laurent Mazare | 2023-07-12 | 4 | -0/+51 |
| Cleanup the main crate error and add a couple of dedicated ones (#142)<br>• Cosmetic cleanups to the error enum.<br>• More error cleanup.<br>• Proper error handling rather than panicking.<br>• Add a dedicated conv1d error. | Laurent Mazare | 2023-07-12 | 1 | -2/+3 |
| Allow for lazy loading of npz files, use it in llama to reduce memory usage in the CPU version. (#141) | Laurent Mazare | 2023-07-11 | 1 | -2/+27 |
| Resurrect the llama npy support. (#140) | Laurent Mazare | 2023-07-11 | 1 | -28/+55 |
| Sketch the tensor initialization module. (#134) | Laurent Mazare | 2023-07-11 | 2 | -6/+116 |
| VarBuilder path creation (#131) (see the sketch after the table)<br>• Use a struct for the safetensor + routing.<br>• Group the path and the var-builder together.<br>• Fix for the empty path case. | Laurent Mazare | 2023-07-10 | 1 | -19/+84 |
| Move the var-builder in a central place. (#130) | Laurent Mazare | 2023-07-10 | 2 | -0/+61 |
| Move the conv1d layer to candle_nn. (#117) | Laurent Mazare | 2023-07-10 | 2 | -0/+51 |
| [nn] Move the Embedding and Activation parts. (#116)<br>• Share the Embedding and Activation parts.<br>• Tweak some activations. | Laurent Mazare | 2023-07-10 | 3 | -0/+53 |
| Sketch the candle-nn crate. (#115)<br>• Tweak the cuda dependencies.<br>• More cuda tweaks. | Laurent Mazare | 2023-07-10 | 3 | -0/+64 |
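
The sketches below illustrate a few of the APIs this history introduces. They are minimal, hedged examples written against the present-day `candle_core`/`candle_nn` surface; the crate as of these commits may have differed in detail.

First, the AdamW optimizer from #307, driven through `backward_step` (the loss must reduce to a scalar); the toy data and learning rate are illustrative:

```rust
use candle_core::{DType, Device, Result, Tensor, Var};
use candle_nn::{AdamW, Optimizer, ParamsAdamW};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Trainable parameters are wrapped in `Var` so gradients are tracked.
    let w = Var::zeros((1, 1), DType::F32, &dev)?;
    let params = ParamsAdamW { lr: 0.1, ..Default::default() };
    let mut opt = AdamW::new(vec![w.clone()], params)?;
    let x = Tensor::new(&[[2f32]], &dev)?;
    let target = Tensor::new(&[[4f32]], &dev)?;
    for _step in 0..200 {
        let pred = x.matmul(&w)?;
        let loss = pred.sub(&target)?.sqr()?.sum_all()?;
        // Computes gradients and applies one AdamW update in a single call.
        opt.backward_step(&loss)?;
    }
    // w should move toward 2 (since 2 * 2 = 4), modulo weight decay.
    println!("w = {}", w.to_vec2::<f32>()?[0][0]);
    Ok(())
}
```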
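Next, the loss helpers from #287 and #286. Assuming today's `candle_nn::loss` module and PyTorch-like conventions, `cross_entropy` takes raw logits of shape `(N, C)` plus `u32` class indices, while `nll` expects log-probabilities:

```rust
use candle_core::{D, Device, Result, Tensor};
use candle_nn::loss;
use candle_nn::ops::log_softmax;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Two samples, three classes: unnormalized logits and class indices.
    let logits = Tensor::new(&[[1.0f32, 2.0, 0.5], [0.1, 0.2, 3.0]], &dev)?;
    let targets = Tensor::new(&[1u32, 2], &dev)?;
    // cross_entropy applies log-softmax internally...
    let ce = loss::cross_entropy(&logits, &targets)?;
    // ...while nll, as in PyTorch, is fed log-probabilities directly.
    let log_probs = log_softmax(&logits, D::Minus1)?;
    let nll = loss::nll(&log_probs, &targets)?;
    println!("ce = {}, nll = {}", ce.to_scalar::<f32>()?, nll.to_scalar::<f32>()?);
    Ok(())
}
```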
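The numerical-stability fix from #267 is the standard max-subtraction trick: softmax is invariant to shifting its input, so subtracting the row-wise max keeps `exp` from overflowing. The function below illustrates the trick with plain tensor ops; it is not a copy of the in-crate implementation:

```rust
use candle_core::{D, Device, Result, Tensor};

fn stable_softmax(xs: &Tensor) -> Result<Tensor> {
    // Shift by the per-row max: the result is unchanged mathematically,
    // but exp() now sees values <= 0 and cannot overflow to inf.
    let max = xs.max_keepdim(D::Minus1)?;
    let exps = xs.broadcast_sub(&max)?.exp()?;
    let sums = exps.sum_keepdim(D::Minus1)?;
    exps.broadcast_div(&sums)
}

fn main() -> Result<()> {
    // A naive softmax would compute exp(1000) = inf here and return NaNs.
    let xs = Tensor::new(&[[1000.0f32, 1001.0, 1002.0]], &Device::Cpu)?;
    println!("{:?}", stable_softmax(&xs)?.to_vec2::<f32>()?);
    Ok(())
}
```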
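The SGD optimizer (#160) and the reduce-sum gradient (#162) come together in the linear regression example the commit body mentions. A hedged reconstruction, again against the current API, with illustrative data and hyperparameters:

```rust
use candle_core::{DType, Device, Result, Tensor, Var};
use candle_nn::{Optimizer, SGD};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Points generated from y = 3x + 1; the fit should approach w = 3, b = 1.
    let xs = Tensor::new(&[[1f32], [2.], [3.], [4.]], &dev)?;
    let ys = Tensor::new(&[[4f32], [7.], [10.], [13.]], &dev)?;
    let w = Var::zeros((1, 1), DType::F32, &dev)?;
    let b = Var::zeros(1, DType::F32, &dev)?;
    let mut sgd = SGD::new(vec![w.clone(), b.clone()], 0.01)?;
    for _step in 0..1000 {
        let pred = xs.matmul(&w)?.broadcast_add(&b)?;
        // sum_all reduces the squared errors to a scalar; its backward pass
        // broadcasts the gradient over every element, which is what #162 adds.
        let loss = pred.sub(&ys)?.sqr()?.sum_all()?;
        sgd.backward_step(&loss)?;
    }
    println!("w = {:?}, b = {:?}", w.to_vec2::<f32>()?, b.to_vec1::<f32>()?);
    Ok(())
}
```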
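Finally, VarBuilder path creation from #131, combined with the linear layer documented in #151. One assumption to flag: this sketch uses the `VarMap`-backed constructor, which postdates these commits, so treat `VarBuilder::from_varmap` as an anachronism here. The idea is that `pp` pushes a path prefix, so nested layers map onto dotted checkpoint names such as `encoder.layer0.weight`:

```rust
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_nn::{linear, Linear, VarBuilder, VarMap};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, DType::F32, &dev);
    // Each pp() call appends a path segment, so this layer's variables are
    // registered as "encoder.layer0.weight" and "encoder.layer0.bias".
    let layer: Linear = linear(4, 8, vb.pp("encoder").pp("layer0"))?;
    let xs = Tensor::zeros((2, 4), DType::F32, &dev)?;
    let ys = layer.forward(&xs)?;
    println!("{:?}", ys.shape()); // [2, 8]
    Ok(())
}
```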