Embedding layer - torch.nn.Embedding in PyTorch
I'm quite new to neural networks, so sorry if my question is a dumb one. I was reading code on GitHub and noticed that experienced people use embeddings (in that case not word embeddings), so may I ask in general:
- Does an embedding layer have trainable variables that learn over time to improve the embedding?
- Can you provide some intuition about it and about when to use it, e.g. would a house price regression benefit from it?
- If so (if it learns), what is the difference from just using linear layers?
>>> import torch
>>> from torch import nn
>>> embedding = nn.Embedding(10, 3)
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> input
tensor([[1, 2, 4, 5],
        [4, 3, 2, 9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
Comments (1)
In short, the embedding layer has learnable parameters, and the usefulness of the layer depends on what inductive bias you want to impose on the data.
Yes, as stated in the docs under the Variables section, it has an embedding weight that is altered during the training process.
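As a quick check (a minimal sketch of my own, not part of the original answer; the variable names are just illustrative), the weight is a regular nn.Parameter, so it is returned by parameters() and updated by the optimizer like any other weight:

import torch
from torch import nn

emb = nn.Embedding(10, 3)
print(emb.weight.requires_grad)   # True - the lookup table is trainable
print(emb.weight.shape)           # torch.Size([10, 3])

opt = torch.optim.SGD(emb.parameters(), lr=0.1)
idx = torch.tensor([1, 2, 4])
loss = emb(idx).pow(2).sum()      # dummy loss on the looked-up rows
loss.backward()
opt.step()                        # with plain SGD, only rows 1, 2 and 4 change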
An embedding layer is commonly used in NLP tasks where the input is tokenized. This means the input is discrete in a sense and can be used to index the weight (which is basically what the embedding layer does in forward mode). This discrete treatment implies that inputs like 1, 2 and 42 are entirely different (until a semantic correlation has been learnt). House price regression has a continuous input space, and values such as 1.0 and 1.1 are probably more correlated than the values 1.0 and 42.0. This kind of assumption about the hypothesis space is called an inductive bias, and pretty much every machine learning architecture conforms to some sort of inductive bias. I believe it is possible to use embedding layers for regression problems, which would require some kind of discretization, but they would not benefit from it.

There is a big difference: a linear layer performs matrix multiplication with its weight, as opposed to using it as a lookup table. During backpropagation for the embedding layer, the gradients only propagate through the corresponding indices used in the lookup, and gradients for duplicate indices are accumulated.
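To make the lookup-versus-matrix-multiplication point concrete, here is a small sketch (my own illustration, assuming nothing beyond standard PyTorch): an embedding lookup gives the same result as multiplying a one-hot encoding by the same weight matrix, and its gradient is zero except at the rows that were actually looked up, with repeated indices accumulating:

import torch
from torch import nn
import torch.nn.functional as F

emb = nn.Embedding(10, 3)
idx = torch.tensor([2, 2, 7])     # index 2 appears twice

# Equivalent "linear layer" view: one-hot input times the weight matrix
one_hot = F.one_hot(idx, num_classes=10).float()
print(torch.allclose(emb(idx), one_hot @ emb.weight))   # True

# Gradients only reach the rows used in the lookup; duplicates accumulate
emb(idx).sum().backward()
print(emb.weight.grad)            # zeros everywhere except row 7 (ones) and row 2 (twos)

A linear layer fed those one-hot vectors would compute exactly this matrix product, so the forward results coincide; the embedding layer just skips materializing the one-hot matrix and indexes the weight directly.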