Embedding layer - torch.nn.Embedding in PyTorch
I'm quite new to neural networks, so sorry if my question is a dumb one. I was reading code on GitHub and noticed that experienced people use embeddings (in that case not word embeddings), so may I ask in general:
- Does an embedding layer have trainable variables that learn over time to improve the embedding?
- Can you provide some intuition about it and about when to use it, e.g. would a house price regression benefit from it?
- If so (if it learns), what is the difference from just using linear layers?
>>> import torch
>>> from torch import nn
>>> embedding = nn.Embedding(10, 3)
>>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
>>> input
tensor([[1, 2, 4, 5],
        [4, 3, 2, 9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
Comments (1)
In short, the embedding layer has learnable parameters, and the usefulness of the layer depends on what inductive bias you want to impose on the data.
Yes, as stated in the docs under the Variables section, it has an embedding weight that is altered during the training process.
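As a quick check (a minimal sketch of my own, not part of the original answer; the variable names are just illustrative), the weight is a regular nn.Parameter, so it is returned by parameters() and updated by the optimizer like any other weight:

import torch
from torch import nn

emb = nn.Embedding(10, 3)
print(emb.weight.requires_grad)   # True - the lookup table is trainable
print(emb.weight.shape)           # torch.Size([10, 3])

opt = torch.optim.SGD(emb.parameters(), lr=0.1)
idx = torch.tensor([1, 2, 4])
loss = emb(idx).pow(2).sum()      # dummy loss on the looked-up rows
loss.backward()
opt.step()                        # with plain SGD, only rows 1, 2 and 4 change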
An embedding layer is commonly used in NLP tasks where the input is tokenized. This means the input is discrete in a sense and can be used to index the weight (which is basically what the embedding layer does in forward mode). This discrete treatment implies that inputs like 1, 2 and 42 are entirely different (until a semantic correlation has been learnt). House price regression has a continuous input space, and values such as 1.0 and 1.1 are probably more correlated than the values 1.0 and 42.0. This kind of assumption about the hypothesis space is called an inductive bias, and pretty much every machine learning architecture conforms to some sort of inductive bias. I believe it is possible to use embedding layers for regression problems, which would require some kind of discretization, but they would not benefit from it.

There is a big difference: a linear layer performs matrix multiplication with its weight, as opposed to using it as a lookup table. During backpropagation for the embedding layer, the gradients only propagate through the corresponding indices used in the lookup, and gradients for duplicate indices are accumulated.
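To make the lookup-versus-matrix-multiplication point concrete, here is a small sketch (my own illustration, assuming nothing beyond standard PyTorch): an embedding lookup gives the same result as multiplying a one-hot encoding by the same weight matrix, and its gradient is zero except at the rows that were actually looked up, with repeated indices accumulating:

import torch
from torch import nn
import torch.nn.functional as F

emb = nn.Embedding(10, 3)
idx = torch.tensor([2, 2, 7])     # index 2 appears twice

# Equivalent "linear layer" view: one-hot input times the weight matrix
one_hot = F.one_hot(idx, num_classes=10).float()
print(torch.allclose(emb(idx), one_hot @ emb.weight))   # True

# Gradients only reach the rows used in the lookup; duplicates accumulate
emb(idx).sum().backward()
print(emb.weight.grad)            # zeros everywhere except row 7 (ones) and row 2 (twos)

A linear layer fed those one-hot vectors would compute exactly this matrix product, so the forward results coincide; the embedding layer just skips materializing the one-hot matrix and indexes the weight directly.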