Class: Secryst::TransformerEncoderLayer

- Inherits: Torch::NN::Module
  - Object > Torch::NN::Module > Secryst::TransformerEncoderLayer
- Defined in: lib/secryst/transformer.rb
Instance Method Summary

- #forward(src, src_mask: nil, src_key_padding_mask: nil) ⇒ Object
  Pass the input through the encoder layer.

- #initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu") ⇒ TransformerEncoderLayer (constructor)
  TransformerEncoderLayer is made up of self-attention and a feedforward network.
Constructor Details
#initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu") ⇒ TransformerEncoderLayer
TransformerEncoderLayer is made up of self-attention and a feedforward network. This standard encoder layer is based on the paper "Attention Is All You Need": Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify the layer or implement it differently for their application.

Args:
- d_model: the number of expected features in the input (required).
- nhead: the number of heads in the multi-head attention model (required).
- dim_feedforward: the dimension of the feedforward network model (default=2048).
- dropout: the dropout value (default=0.1).
- activation: the activation function of the intermediate layer, "relu" or "gelu" (default="relu").
Examples:

encoder_layer = Secryst::TransformerEncoderLayer.new(512, 8)
src = Torch.rand(10, 32, 512)
out = encoder_layer.call(src)
# File 'lib/secryst/transformer.rb', line 185

def initialize(d_model, nhead, dim_feedforward: 2048, dropout: 0.1, activation: "relu")
  super()
  @self_attn = MultiheadAttention.new(d_model, nhead, dropout: dropout)
  # Implementation of Feedforward model
  @linear1 = Torch::NN::Linear.new(d_model, dim_feedforward)
  @dropout = Torch::NN::Dropout.new(p: dropout)
  @linear2 = Torch::NN::Linear.new(dim_feedforward, d_model)
  @norm1 = Torch::NN::LayerNorm.new(d_model)
  @norm2 = Torch::NN::LayerNorm.new(d_model)
  @dropout1 = Torch::NN::Dropout.new(p: dropout)
  @dropout2 = Torch::NN::Dropout.new(p: dropout)
  @activation = _get_activation_fn(activation)
end
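As a brief illustrative sketch (the values here are arbitrary, not recommendations), the keyword defaults above can be overridden at construction time, for example to use gelu instead of relu:

# Hypothetical configuration: smaller model width, gelu activation.
layer = Secryst::TransformerEncoderLayer.new(
  256, 4,
  dim_feedforward: 1024,
  dropout: 0.2,
  activation: "gelu"
)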
Instance Method Details
#forward(src, src_mask: nil, src_key_padding_mask: nil) ⇒ Object
Pass the input through the encoder layer.

Args:
- src: the sequence to the encoder layer (required).
- src_mask: the mask for the src sequence (optional).
- src_key_padding_mask: the mask for the src keys per batch (optional).

Shape: see the docs in the Transformer class.
# File 'lib/secryst/transformer.rb', line 208

def forward(src, src_mask: nil, src_key_padding_mask: nil)
  # Self-attention sublayer, followed by residual connection and layer norm
  src2 = @self_attn.call(src, src, src, attn_mask: src_mask, key_padding_mask: src_key_padding_mask)[0]
  src = src + @dropout1.call(src2)
  src = @norm1.call(src)
  # Position-wise feedforward sublayer, followed by residual connection and layer norm
  src2 = @linear2.call(@dropout.call(@activation.call(@linear1.call(src))))
  src = src + @dropout2.call(src2)
  src = @norm2.call(src)
  return src
end
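A minimal usage sketch follows, showing the layer called with both optional masks. It assumes torch.rb exposes Torch.rand and Torch.zeros with PyTorch-like semantics (including the dtype: keyword) and that Module#call forwards keyword arguments to #forward; the shapes and values are illustrative only.

encoder_layer = Secryst::TransformerEncoderLayer.new(512, 8)
src = Torch.rand(10, 32, 512) # (seq_len, batch, d_model)

# Additive attention mask of shape (seq_len, seq_len);
# zeros mean "attend everywhere", -Inf would block a position.
src_mask = Torch.zeros(10, 10)

# Boolean padding mask of shape (batch, seq_len);
# true would mark a padded position to be ignored.
src_key_padding_mask = Torch.zeros(32, 10, dtype: :bool)

out = encoder_layer.call(src, src_mask: src_mask, src_key_padding_mask: src_key_padding_mask)
# out should have the same shape as src: (10, 32, 512)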