author | zachcp <zachcp@users.noreply.github.com> | 2024-11-15 02:30:15 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-11-15 08:30:15 +0100 |
commit | f689ce5d39c6f1475dfc71503288ea2905c8f685 (patch) | |
tree | 10b35ae68f1f5683edfebdcf92970de78ba05283 /candle-transformers/src/models/recurrent_gemma.rs | |
parent | 0ed24b9852ccc7dfb92d555afba3d56c2a3f3224 (diff) | |
download | candle-f689ce5d39c6f1475dfc71503288ea2905c8f685.tar.gz candle-f689ce5d39c6f1475dfc71503288ea2905c8f685.tar.bz2 candle-f689ce5d39c6f1475dfc71503288ea2905c8f685.zip |
Documentation Pass for Models (#2617)
* links in chinese_clip
* links for clip model
* add mod docs for flux and llava
* module doc for MMDIT and MIMI
* add docs for a few more models
* mod docs for bert naser and beit
* add module docs for convmixer colpali codegeex and chatglm
* add another series of moddocs
* add fastvit-llama2_c
* module docs mamba -> mobileone
* module docs from moondream-phi3
* mod docs for quantized and qwen
* update to yi
* fix long names
* Update llama2_c.rs
* Update llama2_c_weights.rs
* Fix the link for mimi + tweaks
---------
Co-authored-by: Laurent Mazare <laurent.mazare@gmail.com>
Diffstat (limited to 'candle-transformers/src/models/recurrent_gemma.rs')
-rw-r--r-- | candle-transformers/src/models/recurrent_gemma.rs | 21 |
1 file changed, 19 insertions, 2 deletions
diff --git a/candle-transformers/src/models/recurrent_gemma.rs b/candle-transformers/src/models/recurrent_gemma.rs
index 24d2b7e3..d6a029ba 100644
--- a/candle-transformers/src/models/recurrent_gemma.rs
+++ b/candle-transformers/src/models/recurrent_gemma.rs
@@ -1,5 +1,22 @@
-// This implementation is based on the python version from huggingface/transformers.
-// https://github.com/huggingface/transformers/blob/b109257f4fb8b1166e7c53cc5418632014ed53a5/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L2
+//! Recurrent Gemma model implementation
+//!
+//! Recurrent Gemma is a version of the Gemma language model that incorporates recurrent memory.
+//! This allows the model to maintain state between predictions and have longer-range memory.
+//!
+//! Key characteristics:
+//! - Real-gated linear recurrent units (RGLRU)
+//! - 1D convolution for local context
+//! - RMSNorm for layer normalization
+//! - Rotary positional embeddings (RoPE)
+//! - Grouped query attention
+//!
+//! References:
+//! - [Gemma: Open Models Based on Gemini Technology](https://blog.google/technology/developers/gemma-open-models/)
+//! - [Recurrent Memory model architecture](https://arxiv.org/abs/2402.00441)
+//!
+//! This implementation is based on the python version from huggingface/transformers.
+//! https://github.com/huggingface/transformers/blob/b109257f4fb8b1166e7c53cc5418632014ed53a5/src/transformers/models/recurrent_gemma/modeling_recurrent_gemma.py#L2
+//!
 use candle::{DType, Device, IndexOp, Module, Result, Tensor, D};
 use candle_nn::{linear_b as linear, Linear, VarBuilder};
 use std::sync::Arc;
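
The new module doc names the real-gated linear recurrent unit (RG-LRU) as the core recurrence of the model. For orientation, below is a minimal scalar sketch of one RG-LRU step following the Griffin formulation; the names (rglru_step, gate_a, gate_x, lambda) and the plain-f32 layout are illustrative assumptions, not the candle implementation, which operates on Tensors with learned per-channel projections.

// A minimal scalar sketch of one RG-LRU step, following the Griffin paper.
// The names and plain-f32 formulation here are illustrative assumptions;
// the actual layer in this file works on candle Tensors.

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

fn softplus(x: f32) -> f32 {
    x.exp().ln_1p() // ln(1 + e^x)
}

/// One recurrence step for a single channel.
/// `x_t`: current input, `h_prev`: previous hidden state,
/// `gate_a`, `gate_x`: pre-activations of the recurrence and input gates
/// (in the real model these come from linear projections of the input),
/// `lambda`: the learnable per-channel decay parameter.
fn rglru_step(x_t: f32, h_prev: f32, gate_a: f32, gate_x: f32, lambda: f32) -> f32 {
    const C: f32 = 8.0; // fixed temperature constant from the paper
    let r_t = sigmoid(gate_a); // recurrence gate in (0, 1)
    let i_t = sigmoid(gate_x); // input gate in (0, 1)
    // Decay a_t = a^(c*r_t) with a = sigmoid(lambda), computed in log space:
    // log a_t = c * r_t * log sigmoid(lambda) = -c * r_t * softplus(-lambda)
    let a_t = (-C * r_t * softplus(-lambda)).exp();
    // Normalized update: the sqrt(1 - a_t^2) factor keeps state variance stable.
    a_t * h_prev + (1.0 - a_t * a_t).sqrt() * (i_t * x_t)
}

fn main() {
    // Toy usage: run the recurrence over a short sequence for one channel.
    let xs = [0.5_f32, -1.0, 0.25, 2.0];
    let mut h = 0.0_f32;
    for &x in &xs {
        // For illustration only, derive the gate pre-activations from x itself.
        h = rglru_step(x, h, 0.3 * x, 0.7 * x, -1.5);
    }
    println!("final hidden state: {h}");
}

Computing the decay in log space avoids underflow when c * r_t is large, and since the exponent is never positive, a_t stays in (0, 1], so the sqrt(1 - a_t^2) input scaling is always well defined.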