Matrix operations in the attention mechanism

Posted on 2025-02-10 02:36:27

I am reading an article about Transformer machine-learning models applied to finance. I am trying to understand the math behind the architecture, but I cannot follow this part:

[Images: the excerpt from the paper in question]

In particular, I don't understand why the dimensions do not match between the operations.
As I understand it:

  1. Step (8): u should be an M(d_model, 1) matrix, i.e. d_model × 1.
  2. Step (9): this should not be possible, because the inner dimensions (K and 1) do not match for the matrix multiplication:
    M(d_model, K) · M(1, d_model)
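The mismatch described in step (9) can be reproduced with a tiny shape checker. This is a minimal pure-Python sketch: the helper names `matmul_shape` and `transpose_shape` and the sizes `d_model = 4`, `K = 3` are hypothetical, and the shapes of M and u come from the reading above, not from the paper itself.

```python
def matmul_shape(a, b):
    """Return the shape of A @ B, or raise ValueError if the inner dimensions differ."""
    (rows_a, cols_a), (rows_b, cols_b) = a, b
    if cols_a != rows_b:
        raise ValueError(f"cannot multiply {a} by {b}: inner dimensions {cols_a} != {rows_b}")
    return (rows_a, cols_b)


def transpose_shape(s):
    """Shape of the transpose of a matrix with shape s."""
    return (s[1], s[0])


# Hypothetical sizes: d_model = 4 features, K = 3 columns.
d_model, K = 4, 3
M = (d_model, K)   # shape of M as stated in the question
u = (d_model, 1)   # shape of u as derived for step (8)

# Step (9) read as M · u^T: (4, 3) times (1, 4) is undefined.
try:
    matmul_shape(M, transpose_shape(u))
except ValueError as err:
    print(err)  # cannot multiply (4, 3) by (1, 4): inner dimensions 3 != 1
```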

Here is the full section of the paper:

[Image: the full section of the paper]

I suspect I am misunderstanding something about this notation [image of the notation],
or about the sentence "non-linearly project the matrix M to u".
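One common way to "non-linearly project" a matrix to attention weights is to apply tanh, score each column with a learned vector, and normalize the scores with softmax. The sketch below assumes that form, with a hypothetical projection vector `w` of length d_model and M stored as d_model rows by K columns; it is not necessarily the paper's eq. (8), which appears only in the image. Under this assumption u is a 1 × K row of weights, so M u^T is (d_model × K)(K × 1) = d_model × 1.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(M, w):
    """Assumed attention pooling: u = softmax(w^T tanh(M)), r = M u^T.

    M is a d_model x K matrix stored as a list of d_model rows; w has length d_model.
    Returns u (K weights, a 1 x K row) and r (d_model values, a d_model x 1 column).
    """
    d_model, K = len(M), len(M[0])
    if len(w) != d_model:
        raise ValueError("w must have length d_model")
    # Non-linear projection: one scalar score per column k of M.
    scores = [sum(w[i] * math.tanh(M[i][k]) for i in range(d_model)) for k in range(K)]
    u = softmax(scores)  # 1 x K attention weights
    # r = M u^T: (d_model x K)(K x 1) -> d_model x 1, a weighted sum of M's columns.
    r = [sum(M[i][k] * u[k] for k in range(K)) for i in range(d_model)]
    return u, r


# Hypothetical example: d_model = 2 features, K = 3 columns.
M = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
w = [1.0, 1.0]
u, r = attention_pool(M, w)
print(len(u), len(r))  # 3 2
```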

Can someone enlighten me about this, please?

Transformer-based attention network for stock movement prediction, 2022, Qiuyue Zhang, Chao Qin, Yunfeng Zhang, Fangxun Bao, Caiming Zhang, Peide Liu

Answer by 倾城花音, 2025-02-17 02:36:27

Unless you have the code or can reach the authors, with only this piece of text we have to guess where the error is.

My guess

  1. $W_m^M$ in the equation, is $W_m^M$
  2. The softmax output is actually computing $u^T$, not $u$
  3. eq. 9 should be $M^T u^T$, instead of $M u^T$
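Guesses 2 and 3 can be checked the same way. This minimal pure-Python sketch assumes, as in the question, that M is d_model × K and that the softmax in eq. (8) yields a d_model × 1 column, which guess 2 treats as u^T. The helpers `matmul_shape` and `transpose_shape` and the sizes are hypothetical.

```python
def matmul_shape(a, b):
    """Return the shape of A @ B, or raise ValueError if the inner dimensions differ."""
    (rows_a, cols_a), (rows_b, cols_b) = a, b
    if cols_a != rows_b:
        raise ValueError(f"cannot multiply {a} by {b}: inner dimensions {cols_a} != {rows_b}")
    return (rows_a, cols_b)


def transpose_shape(s):
    """Shape of the transpose of a matrix with shape s."""
    return (s[1], s[0])


d_model, K = 4, 3    # hypothetical sizes
M = (d_model, K)     # shape of M as stated in the question
u_T = (d_model, 1)   # guess 2: the softmax output is u^T, shape (d_model, 1)

# Guess 3: eq. 9 read as M^T u^T -> (K, d_model)(d_model, 1) = (K, 1).
print(matmul_shape(transpose_shape(M), u_T))  # (3, 1)

# The printed form M u^T still fails unless K == d_model.
try:
    matmul_shape(M, u_T)
except ValueError as err:
    print(err)  # cannot multiply (4, 3) by (4, 1): inner dimensions 3 != 4
```

Under this reading the result has K rows, one per column of M, rather than d_model rows.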

Another hypothesis is that their model follows the equations as described and their code works, but when writing the paper they worked out the dimensions of the matrices incorrectly.

I don't know if I would trust a paper with a publication date in the future and zero citations.

https://www.sciencedirect.com/science/article/abs/pii/S0957417422006170#!