Class: Secryst::TransformerDecoderLayer
- Inherits: Torch::NN::Module
  - Object
  - Torch::NN::Module
  - Secryst::TransformerDecoderLayer
- Defined in: lib/secryst/transformer.rb
Instance Method Summary
- #forward(tgt, memory, tgt_mask: nil, memory_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil) ⇒ Object
  Pass the inputs (and mask) through the decoder layer.
- #initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu") ⇒ TransformerDecoderLayer (constructor)
  TransformerDecoderLayer is made up of self-attention, multi-head attention and a feedforward network.
Constructor Details
#initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu") ⇒ TransformerDecoderLayer
TransformerDecoderLayer is made up of self-attention, multi-head attention and a feedforward network. This standard decoder layer is based on the paper "Attention Is All You Need". Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify or implement it in a different way for their application.
Args:
d_model: the number of expected features in the input (required).
nhead: the number of heads in the multiheadattention models (required).
dim_feedforward: the dimension of the feedforward network model (default=2048).
dropout: the dropout value (default=0.1).
activation: the activation function of intermediate layer, relu or gelu (default=relu).
- Examples

  decoder_layer = Secryst::TransformerDecoderLayer.new(512, 8)
  memory = Torch.rand(10, 32, 512)
  tgt = Torch.rand(20, 32, 512)
  out = decoder_layer.call(tgt, memory)
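The remaining hyperparameters can be passed as keyword arguments; a small sketch with illustrative (non-default) values:

  decoder_layer = Secryst::TransformerDecoderLayer.new(
    512, 8,
    dim_feedforward: 1024,
    dropout: 0.2,
    activation: "gelu"
  )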
# File 'lib/secryst/transformer.rb', line 298

def initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu")
  super()
  @self_attn = MultiheadAttention.new(d_model, nhead, dropout: dropout)
  @multihead_attn = MultiheadAttention.new(d_model, nhead, dropout: dropout)

  # Implementation of Feedforward model
  @linear1 = Torch::NN::Linear.new(d_model, dim_feedforward)
  @dropout = Torch::NN::Dropout.new(p: dropout)
  @linear2 = Torch::NN::Linear.new(dim_feedforward, d_model)

  @norm1 = Torch::NN::LayerNorm.new(d_model)
  @norm2 = Torch::NN::LayerNorm.new(d_model)
  @norm3 = Torch::NN::LayerNorm.new(d_model)
  @dropout1 = Torch::NN::Dropout.new(p: dropout)
  @dropout2 = Torch::NN::Dropout.new(p: dropout)
  @dropout3 = Torch::NN::Dropout.new(p: dropout)

  @activation = _get_activation_fn(activation)
end
Instance Method Details
#forward(tgt, memory, tgt_mask: nil, memory_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil) ⇒ Object
Pass the inputs (and mask) through the decoder layer.
Args:
tgt: the sequence to the decoder layer (required).
memory: the sequence from the last layer of the encoder (required).
tgt_mask: the mask for the tgt sequence (optional).
memory_mask: the mask for the memory sequence (optional).
tgt_key_padding_mask: the mask for the tgt keys per batch (optional).
memory_key_padding_mask: the mask for the memory keys per batch (optional).
Shape:
see the docs in the Transformer class.
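For autoregressive decoding, the target self-attention is typically restricted to earlier positions via an additive tgt_mask (0.0 where attention is allowed, -Float::INFINITY where it is blocked). A minimal sketch of building such a mask inline and passing it to the layer; it assumes Torch.triu, Tensor#eq, Tensor#transpose and Tensor#masked_fill are exposed by torch.rb:

  t, n, e = 20, 32, 512
  decoder_layer = Secryst::TransformerDecoderLayer.new(e, 8)
  tgt = Torch.rand(t, n, e)      # (T, N, E) target sequence
  memory = Torch.rand(10, n, e)  # (S, N, E) encoder output

  # Lower-triangular "allowed" pattern, converted to an additive float mask
  allowed = Torch.triu(Torch.ones(t, t)).eq(1).transpose(0, 1)
  tgt_mask = Torch.zeros(t, t).masked_fill(allowed.eq(0), -Float::INFINITY)

  out = decoder_layer.call(tgt, memory, tgt_mask: tgt_mask)  # => (T, N, E)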
# File 'lib/secryst/transformer.rb', line 327

def forward(tgt, memory, tgt_mask: nil, memory_mask: nil, tgt_key_padding_mask: nil, memory_key_padding_mask: nil)
  # Self-attention over the target sequence, then dropout, residual add and layer norm
  tgt2 = @self_attn.call(tgt, tgt, tgt, attn_mask: tgt_mask, key_padding_mask: tgt_key_padding_mask)[0]
  tgt = tgt + @dropout1.call(tgt2)
  tgt = @norm1.call(tgt)

  # Cross-attention: target queries attend over the encoder memory
  tgt2 = @multihead_attn.call(tgt, memory, memory, attn_mask: memory_mask, key_padding_mask: memory_key_padding_mask)[0]
  tgt = tgt + @dropout2.call(tgt2)
  tgt = @norm2.call(tgt)

  # Position-wise feedforward network
  tgt2 = @linear2.call(@dropout.call(@activation.call(@linear1.call(tgt))))
  tgt = tgt + @dropout3.call(tgt2)
  tgt = @norm3.call(tgt)

  return tgt
end