I am reading an article about Transformer machine learning models applied to finance. I am trying to understand the math behind the architecture, but I can't make sense of this part:


In particular, I don't understand why the dimensions don't match between the operations.
As I understand it:
- step (8): u should have shape M(d_model, 1)
- step (9): this should not be possible, because the dimensions do not match for the matrix multiplication:
M(d_model, K) . M(1, d_model)
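To make the mismatch concrete, here is a minimal shape check in plain Python. It is only a sketch of my reading of step (9): the sizes `d_model = 8` and `K = 5` are arbitrary values I picked for illustration, and `matmul_shape` and `try_product` are hypothetical helpers, not anything from the paper. The swapped order is shown for comparison only; nothing in the excerpt confirms that it is what the authors intended.

```python
def matmul_shape(left, right):
    """Return the shape of left @ right, or raise ValueError if not conformable."""
    (rows_l, cols_l), (rows_r, cols_r) = left, right
    if cols_l != rows_r:
        raise ValueError(
            f"cannot multiply {left} by {right}: inner sizes {cols_l} != {rows_r}"
        )
    return (rows_l, cols_r)


def try_product(left, right):
    """Return the product shape, or None when the shapes are incompatible."""
    try:
        return matmul_shape(left, right)
    except ValueError:
        return None


# Arbitrary illustrative sizes (assumption, not taken from the paper).
d_model, K = 8, 5

# Step (9) as I read it: M(d_model, K) . M(1, d_model)
as_read = try_product((d_model, K), (1, d_model))

# Same two factors in the opposite order, for comparison only.
swapped = try_product((1, d_model), (d_model, K))

print("as read:", as_read)
print("swapped:", swapped)
```

With these sizes the product as I read it is rejected, because the inner dimensions are K and 1, while the swapped order is conformable and gives a row of K values.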
Here is the full section of the study:

I guess I am misunderstanding something about this notation
or about the sentence "non-linearly project the matrix M to u".
Can someone enlighten me, please?
Transformer-based attention network for stock movement prediction, 2022,
Qiuyue Zhang, Chao Qin, Yunfeng Zhang, Fangxun Bao, Caiming Zhang, Peide Liu
Comments (1)
Unless you have the code or can reach the authors, we can only guess from this piece of text where the error is.
My guess
Another hypothesis is that the equations were implemented as described in working code, and that the authors worked out the matrix dimensions incorrectly when writing the paper.
I'm not sure I would trust a paper with a publication date in the future and zero citations.
https://www.sciencedirect.com/science/article/abs/pii/S0957417422006170#!
