Temporal Fusion Transformer (PyTorch Forecasting): the "hidden_size" parameter
The Temporal-Fusion-Transformer (TFT) model in the PytorchForecasting package has several parameters (see: https://pytorch-forecasting.readthedocs.io/en/latest/_modules/pytorch_forecasting/models/temporal_fusion_transformer.html#TemporalFusionTransformer).

What exactly does the hidden_size parameter refer to? My best guess is that it refers to the number of neurons contained in the GRN (Gated Residual Network) component of the TFT. If so, which layer contains these neurons?

I did not find the documentation very helpful here, since it describes the hidden_size parameter only as: "hidden size of network which is its main hyperparameter and can range from 8 to 512".

Side note: part of my ignorance might be due to the fact that I am not fully familiar with the individual components of the TFT model.
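For context, here is a minimal sketch of where hidden_size enters when constructing the model via pytorch-forecasting's from_dataset method. The tiny synthetic dataset and the other hyperparameter values are made up purely for illustration:

```python
import pandas as pd
from pytorch_forecasting import TemporalFusionTransformer, TimeSeriesDataSet
from pytorch_forecasting.metrics import QuantileLoss

# Tiny synthetic dataset: one series, 100 time steps (illustration only).
data = pd.DataFrame({
    "time_idx": range(100),
    "value": [float(i % 10) for i in range(100)],
    "group": "a",
})

training = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="value",
    group_ids=["group"],
    max_encoder_length=24,
    max_prediction_length=6,
    time_varying_unknown_reals=["value"],
)

tft = TemporalFusionTransformer.from_dataset(
    training,
    hidden_size=16,            # the hyperparameter in question
    attention_head_size=4,
    dropout=0.1,
    hidden_continuous_size=8,  # set to <= hidden_size
    loss=QuantileLoss(),
)
print(tft.size())  # total parameter count grows with hidden_size
```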
1 Answer
After a bit of research on the source code linked above, I was able to figure out how hidden_size is the main hyperparameter of the model. Here it is: hidden_size does indeed describe the number of neurons in each dense layer of the GRN. You can check the structure of the GRN in the TFT paper at https://arxiv.org/pdf/1912.09363.pdf (page 6, Figure 2). Note that since the final layer of the GRN is just a normalization layer, the output of the GRN also has dimension hidden_size.

How is this the main hyperparameter of the model? Looking at the structure of the TFT model (also on page 6), the GRN unit appears in the Variable Selection process, in the Static Enrichment section, and in the Position-wise Feed-Forward section, so basically in every step of the learning process. Each of these GRNs is built in the same way (only the input size varies).
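To make the layer structure concrete, here is a minimal PyTorch sketch of a GRN following Figure 2 of the paper. It is a simplified illustration (the optional static context input is omitted, and names like fc1/fc2 are my own), not the exact pytorch-forecasting implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedResidualNetwork(nn.Module):
    """Simplified GRN (Lim et al. 2019, Fig. 2): every dense layer
    is hidden_size wide, so the output also has width hidden_size."""

    def __init__(self, input_size: int, hidden_size: int, dropout: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)        # W2 in the paper
        self.fc2 = nn.Linear(hidden_size, hidden_size)       # W1 in the paper
        self.dropout = nn.Dropout(dropout)
        self.gate = nn.Linear(hidden_size, 2 * hidden_size)  # feeds the GLU
        self.glu = nn.GLU(dim=-1)
        # project the residual if the input width differs from hidden_size
        self.skip = (nn.Linear(input_size, hidden_size)
                     if input_size != hidden_size else nn.Identity())
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.elu(self.fc1(x))          # eta_2 = ELU(W2 x + b2)
        h = self.dropout(self.fc2(h))   # eta_1 = W1 eta_2 + b1
        h = self.glu(self.gate(h))      # gated linear unit halves width back
        return self.norm(self.skip(x) + h)  # residual add + LayerNorm

grn = GatedResidualNetwork(input_size=7, hidden_size=16)
print(grn(torch.randn(32, 7)).shape)  # torch.Size([32, 16])
```

Since the same hidden_size is shared by every GRN in variable selection, static enrichment, and the position-wise feed-forward block, it effectively sets the width of the whole network, which is why the documentation calls it the main hyperparameter.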