torch.nn.functional.embedding_bag¶
- torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None, include_last_offset=False, padding_idx=None)[source]¶
Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.
See
torch.nn.EmbeddingBag
for more details.Note
This operation may produce nondeterministic gradients when given tensors on a CUDA device. See Reproducibility for more information.
- Parameters:
input (LongTensor) – Tensor containing bags of indices into the embedding matrix
weight (Tensor) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size
offsets (LongTensor, optional) – Only used when
input
is 1D.offsets
determines the starting index position of each bag (sequence) ininput
.max_norm (float, optional) – If given, each embedding vector with norm larger than
max_norm
is renormalized to have normmax_norm
. Note: this will modifyweight
in-place.norm_type (float, optional) – The
p
in thep
-norm to compute for themax_norm
option. Default2
.scale_grad_by_freq (bool, optional) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default
False
. Note: this option is not supported whenmode="max"
.mode (str, optional) –
"sum"
,"mean"
or"max"
. Specifies the way to reduce the bag. Default:"mean"
sparse (bool, optional) – if
True
, gradient w.r.t.weight
will be a sparse tensor. See Notes undertorch.nn.Embedding
for more details regarding sparse gradients. Note: this option is not supported whenmode="max"
.per_sample_weights (Tensor, optional) – a tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified,
per_sample_weights
must have exactly the same shape as input and is treated as having the sameoffsets
, if those are not None.include_last_offset (bool, optional) – if
True
, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence).padding_idx (int, optional) – If specified, the entries at
padding_idx
do not contribute to the gradient; therefore, the embedding vector atpadding_idx
is not updated during training, i.e. it remains as a fixed “pad”. Note that the embedding vector atpadding_idx
is excluded from the reduction.
- Return type:
- Shape:
input
(LongTensor) andoffsets
(LongTensor, optional)If
input
is 2D of shape (B, N), it will be treated asB
bags (sequences) each of fixed lengthN
, and this will returnB
values aggregated in a way depending on themode
.offsets
is ignored and required to beNone
in this case.If
input
is 1D of shape (N), it will be treated as a concatenation of multiple bags (sequences).offsets
is required to be a 1D tensor containing the starting index positions of each bag ininput
. Therefore, foroffsets
of shape (B),input
will be viewed as havingB
bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
weight
(Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)per_sample_weights
(Tensor, optional). Has the same shape asinput
.output
: aggregated embedding values of shape (B, embedding_dim)
Examples:
>>> # an Embedding module containing 10 tensors of size 3 >>> embedding_matrix = torch.rand(10, 3) >>> # a batch of 2 samples of 4 indices each >>> input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9]) >>> offsets = torch.tensor([0, 4]) >>> F.embedding_bag(input, embedding_matrix, offsets) tensor([[ 0.3397, 0.3552, 0.5545], [ 0.5893, 0.4386, 0.5882]]) >>> # example with padding_idx >>> embedding_matrix = torch.rand(10, 3) >>> input = torch.tensor([2, 2, 2, 2, 4, 3, 2, 9]) >>> offsets = torch.tensor([0, 4]) >>> F.embedding_bag(input, embedding_matrix, offsets, padding_idx=2, mode='sum') tensor([[ 0.0000, 0.0000, 0.0000], [-0.7082, 3.2145, -2.6251]])