* First `Module` implementations
* Add `state_dict` and `load_state_dict` functionality
* Move modules around and create `candle.nn.Linear`
* Add `nn.Embedding` and `nn.LayerNorm`
* Add BERT implementation
* Batch q-matmul
* Automatically dequantize `QTensors` if a `Tensor` is expected
* Add Module `.to()`, `.cuda()`, `.cpu()` and `.type()` functionality
* Unit tests for `Module`, `Tensor` and `candle.utils`
* Add `pytorch`-like slicing to `Tensor`
* Cleanup and BERT fixes
* `black` formatting + unit-test for `nn.Linear`
* Refactor slicing implementation
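The `state_dict` / `load_state_dict` bullets above adopt the PyTorch convention of flattening a module tree into a dict of dotted parameter names. A minimal sketch of that pattern, with hypothetical `Linear` and `Model` classes standing in for the real candle modules (an illustration, not candle's implementation):

```python
# Minimal sketch of the state_dict / load_state_dict pattern described above.
# `Linear` and `Model` are illustrative stand-ins, not candle's classes, and
# parameters are plain lists instead of tensors.

class Module:
    """Base class: walks instance attributes, recursing into sub-modules."""

    def state_dict(self, prefix=""):
        out = {}
        for name, value in vars(self).items():
            if isinstance(value, Module):
                out.update(value.state_dict(prefix + name + "."))
            else:
                out[prefix + name] = value
        return out

    def load_state_dict(self, state, prefix=""):
        for name, value in vars(self).items():
            if isinstance(value, Module):
                value.load_state_dict(state, prefix + name + ".")
            else:
                setattr(self, name, state[prefix + name])


class Linear(Module):
    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias


class Model(Module):
    def __init__(self):
        self.proj = Linear(weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])


model = Model()
sd = model.state_dict()          # {"proj.weight": ..., "proj.bias": ...}
fresh = Model()
fresh.load_state_dict(sd)        # restores parameters by dotted name
```

The dotted-prefix recursion is what makes saved dicts portable between an in-memory module tree and a flat on-disk format.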
* Start generating return types
* Finish tensor type hinting
* Add `save_gguf` to `utils`
* Typehint `quant-llama.py`
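The type-hinting work above produces `.pyi`-style stub entries for the native module. A hedged sketch of what such stubs look like; the method names echo commits in this log, but the exact signatures in candle-pyo3 are assumptions here:

```python
# Illustrative stub-style declarations, as they might appear in a generated
# .pyi file. Signatures are assumptions, not candle's actual API.
from typing import Dict, List

class Tensor:
    def matmul(self, rhs: "Tensor") -> "Tensor": ...
    def reshape(self, shape: List[int]) -> "Tensor": ...
    def get(self, index: int) -> "Tensor": ...

def save_gguf(path: str, tensors: Dict[str, Tensor], metadata: dict) -> None: ...
```

Stubs like these are what let editors and type checkers see return types for functions implemented in Rust, which plain pyo3 extension modules do not expose.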
* Begin to generate type hints.
* Generate correct stubs
* Correctly include stubs
* Add comments and type hints to static functions
* Ensure candle-pyo3 directory
* Make `llama.rope.freq_base` optional
* `fmt`
* Return the metadata in the gguf pyo3 bindings.
* Read the metadata in the quantized llama example.
* Get inference to work on gguf files.
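The gguf metadata returned by the bindings is a flat, string-keyed dict, and "read the metadata in the quantized llama example" amounts to mapping those keys onto a model config. A sketch of the pattern; the `llama.*` key names follow the gguf convention but should be treated as assumptions:

```python
# Sketch: turning flat gguf metadata into a model config. Key names are
# gguf-style assumptions, not guaranteed to match the real files.
from dataclasses import dataclass

@dataclass
class Config:
    n_layer: int
    n_head: int
    rope_freq_base: float

def config_from_metadata(md: dict) -> Config:
    return Config(
        n_layer=md["llama.block_count"],
        n_head=md["llama.attention.head_count"],
        # Default covers files that omit the key (cf. the
        # "Make `llama.rope.freq_base` optional" commit in this log).
        rope_freq_base=md.get("llama.rope.freq_base", 10000.0),
    )

md = {"llama.block_count": 32, "llama.attention.head_count": 32}
cfg = config_from_metadata(md)
```

Keeping the hyperparameters in the file itself is what lets inference run on a gguf file with no side-channel config.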
* More quantized llama in python.
* Expose a couple more functions.
* Apply the last layer.
* Use the vocab from the ggml files.
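"Use the vocab from the ggml files" means detokenizing through the vocabulary stored in the model file rather than a separate tokenizer asset. A toy sketch of the decode step (real ggml vocabs also carry scores and byte-fallback entries):

```python
# Toy detokenization through a file-provided vocab. SentencePiece-style
# vocabs mark word boundaries with "\u2581"; this mirrors that convention.
vocab = ["<s>", "\u2581Hello", "\u2581world", "!"]

def decode(token_ids):
    pieces = [vocab[i] for i in token_ids if vocab[i] != "<s>"]
    return "".join(pieces).replace("\u2581", " ").lstrip()

text = decode([0, 1, 2, 3])  # "Hello world!"
```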
* Sketch a quantized llama using the pyo3 api.
* Add more ops.
* Expose a few more functions to use in the quantized model.
* Rope embeddings.
* Get the forward pass to work.
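The "Rope embeddings" bullet refers to rotary position embeddings: consecutive pairs of feature dimensions are rotated by a position-dependent angle. A dependency-free sketch of the math on plain lists (candle's actual op works on tensors and is organized differently):

```python
import math

def rope(x, pos, theta=10000.0):
    """Rotate consecutive pairs of the even-length vector `x` by
    position-dependent angles, as in rotary position embeddings."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        freq = theta ** (-i / d)          # per-pair rotation frequency
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]
    return out

# Position 0 rotates by zero, leaving the vector unchanged.
assert rope([1.0, 0.0, 1.0, 0.0], pos=0) == [1.0, 0.0, 1.0, 0.0]
```

Because each rotation is length-preserving, RoPE encodes position while keeping the dot products the attention layer relies on well behaved.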