path: root/candle-pyo3/quant-llama.py
Commit log for this file, newest first. Each entry: commit message (PR), author, date, files changed, lines (-/+).
* Make the Python Wrapper more Hackable and simplify Quantization (#1010)
  Lukas Kreussel, 2023-10-06, 1 file changed, -159/+38
    * Some first `Module` implementations
    * Add `state_dict` and `load_state_dict` functionality
    * Move modules around and create `candle.nn.Linear`
    * Add `nn.Embedding` and `nn.LayerNorm`
    * Add BERT implementation
    * Batch q-matmul
    * Automatically dequantize `QTensors` if a `Tensor` is expected
    * Add `Module` `.to()`, `.cuda()`, `.cpu()` and `.type()` functionality
    * Unit tests for `Module`, `Tensor` and `candle.utils`
    * Add `pytorch`-like slicing to `Tensor`
    * Cleanup and BERT fixes
    * `black` formatting + a unit test for `nn.Linear`
    * Refactor the slicing implementation
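For context, a minimal sketch of what this module API enables, assuming PyTorch-style behavior. The `Module`, `Linear`, `state_dict`, `load_state_dict` and `.cpu()` names come from the commit message; the constructor signatures and the callable-module convention are assumptions:

```python
# Minimal sketch of the post-refactor module API. The `Module`,
# `Linear`, `state_dict`, `load_state_dict` and `.cpu()` names are
# taken from the commit message; constructor signatures and the
# PyTorch-style callable convention are assumptions.
import candle
from candle import nn

class TinyMlp(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)  # assumed (in_features, out_features)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x: candle.Tensor) -> candle.Tensor:
        return self.fc2(self.fc1(x))

model = TinyMlp(16, 64).cpu()       # device movement, per the commit
weights = model.state_dict()        # parameter name -> Tensor mapping
model.load_state_dict(weights)      # round-trips the parameters
```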
* Add return types to `*.pyi` stubs (#880)
  Lukas Kreussel, 2023-09-17, 1 file changed, -15/+16
    * Start generating return types
    * Finish tensor type hinting
    * Add `save_gguf` to `utils`
    * Typehint `quant-llama.py`
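A hedged sketch of a gguf round-trip using the `save_gguf` helper added here. Only the `save_gguf`/`load_gguf` names come from these commits; the `(path, tensors, metadata)` argument order and the `quantize("q8_0")` step are assumptions:

```python
# Hedged sketch of a gguf round-trip via candle.utils. Only the
# `save_gguf`/`load_gguf` names come from the commits; the
# (path, tensors, metadata) argument order and the `quantize("q8_0")`
# call are assumptions (gguf files usually hold quantized tensors,
# and q8_0 needs the last dimension to be a multiple of 32).
import candle

data = [[float(i) for i in range(32)] for _ in range(4)]
qtensors = {"dummy.weight": candle.Tensor(data).quantize("q8_0")}
metadata = {"general.name": "tiny-example"}
candle.utils.save_gguf("tiny.gguf", qtensors, metadata)
tensors2, metadata2 = candle.utils.load_gguf("tiny.gguf")
```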
* Generate `*.pyi` stubs for PyO3 wrapper (#870)
  Lukas Kreussel, 2023-09-16, 1 file changed, -3/+4
    * Begin to generate typehints
    * Generate correct stubs
    * Correctly include stubs
    * Add comments and typehints to static functions
    * Ensure the candle-pyo3 directory
    * Make `llama.rope.freq_base` optional
    * `fmt`
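A hand-written approximation of what the generated stubs look like (not the actual generated file); the `freq_base` reader at the end is hypothetical and only illustrates the `llama.rope.freq_base` field becoming optional:

```python
# Hand-written approximation of a generated candle .pyi stub; the real
# generated file differs, this only illustrates the shape of the output.
from typing import Optional

class Tensor:
    def matmul(self, rhs: "Tensor") -> "Tensor": ...

class QTensor:
    def dequantize(self) -> Tensor: ...

# Hypothetical reader illustrating the `llama.rope.freq_base` item:
# since the field became optional, its type is Optional[float].
def freq_base(metadata: dict) -> Optional[float]: ...
```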
* Return the metadata in the gguf pyo3 bindings. (#729)
  Laurent Mazare, 2023-09-04, 1 file changed, -4/+35
    * Return the metadata in the gguf pyo3 bindings.
    * Read the metadata in the quantized llama example.
    * Get inference to work on gguf files.
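A sketch of consuming the metadata that `load_gguf` now returns alongside the tensors, roughly what the quantized llama example does; the metadata key names follow common gguf conventions and are assumptions rather than values taken from the diff:

```python
# Sketch of consuming the metadata now returned alongside the tensors.
# The key names follow common gguf conventions and are assumptions,
# not values taken from the diff.
import candle

tensors, metadata = candle.utils.load_gguf("llama.gguf")
print(f"loaded {len(tensors)} tensors")
print("attention heads:", metadata.get("llama.attention.head_count"))
print("rope freq base:", metadata.get("llama.rope.freq_base"))  # may be absent
```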
* Recommend using maturin. (#717)
  Laurent Mazare, 2023-09-02, 1 file changed, -14/+0
* More quantized llama in python. (#716)
  Laurent Mazare, 2023-09-02, 1 file changed, -6/+13
    * More quantized llama in python.
    * Expose a couple more functions.
    * Apply the last layer.
    * Use the vocab from the ggml files.
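A rough sketch of the last two items, applying the output head and mapping ids back to text with the vocab embedded in the ggml file; the `output.weight` tensor name, the `tokenizer.ggml.tokens` key and the `matmul_t` signature are assumptions based on common ggml/gguf layouts:

```python
# Sketch of "apply the last layer" and "use the vocab from the ggml
# files". The `output.weight` name, `tokenizer.ggml.tokens` key and
# the QTensor.matmul_t signature are assumptions based on common
# ggml/gguf layouts.
import candle

tensors, metadata = candle.utils.load_gguf("llama.gguf")
vocab = metadata["tokenizer.ggml.tokens"]  # list of token strings
head = tensors["output.weight"]            # quantized output projection

def next_token_logits(hidden: candle.Tensor) -> candle.Tensor:
    # hidden: (seq_len, dim); quantized matmul against the transposed head
    return head.matmul_t(hidden)

# a sampled token id then indexes into `vocab` to recover its text
```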
* Sketch a quantized llama using the pyo3 api. (#715)
  Laurent Mazare, 2023-09-02, 1 file changed, -0/+171
    * Sketch a quantized llama using the pyo3 api.
    * Add more ops.
    * Expose a few more functions to use in the quantized model.
    * Rope embeddings.
    * Get the forward pass to work.
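For the "Rope embeddings" item, a minimal numpy sketch of the standard (interleaved) rotary position embedding formula; this is the textbook computation, not the exact code from the PR:

```python
# Minimal sketch of rotary position embeddings (RoPE), standard
# interleaved formulation; not the exact code from PR #715.
import numpy as np

def rope(x: np.ndarray, freq_base: float = 10000.0) -> np.ndarray:
    # x: (seq_len, n_head, head_dim), head_dim even
    seq_len, _, head_dim = x.shape
    inv_freq = 1.0 / freq_base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, head_dim/2)
    cos = np.cos(angles)[:, None, :]                 # broadcast over heads
    sin = np.sin(angles)[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

In a llama forward pass this is typically applied to the query and key tensors of each attention head before the attention product.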