I read the paper 'Rethinking Attention with Performers'. It is a seminal contribution that addresses the quadratic time complexity of self-attention in Transformers while providing strong theoretical guarantees.
However, I am stuck on the following equation (equation 5 in the paper), which is used to approximate non-linear shift-invariant kernels.

I am unable to prove the above equation, and in particular I do not understand the role of the deterministic function h(x) that appears in it. The literature (both the works cited in the paper and others) gives the Random Fourier Feature map in the following form, and this \phi function does not contain the deterministic function h(x).
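For concreteness, the two forms being compared can be written side by side. This is a sketch reconstructed from the usual notation rather than a verbatim copy of either source. The symbols m (number of random features), l (number of functions f_i), d (input dimension) and the sampling distribution D are assumptions introduced here for illustration.

```latex
% Generalized feature map (the form equation 5 is assumed to take):
\phi(x) = \frac{h(x)}{\sqrt{m}}
  \bigl( f_1(\omega_1^\top x), \dots, f_1(\omega_m^\top x),
         \dots,
         f_l(\omega_1^\top x), \dots, f_l(\omega_m^\top x) \bigr),
\qquad \omega_1, \dots, \omega_m \overset{\text{iid}}{\sim} \mathcal{D}

% Random Fourier Features for the Gaussian kernel:
% the special case h(x) = 1, l = 2, f_1 = \sin, f_2 = \cos, \mathcal{D} = \mathcal{N}(0, I_d)
\phi_{\mathrm{RFF}}(x) = \frac{1}{\sqrt{m}}
  \bigl( \sin(\omega_1^\top x), \dots, \sin(\omega_m^\top x),
         \cos(\omega_1^\top x), \dots, \cos(\omega_m^\top x) \bigr)

% Inner product of two such maps, using \sin a \sin b + \cos a \cos b = \cos(a - b):
\phi_{\mathrm{RFF}}(x)^\top \phi_{\mathrm{RFF}}(y)
  = \frac{1}{m} \sum_{i=1}^{m} \cos\bigl(\omega_i^\top (x - y)\bigr),
\qquad
\mathbb{E}\bigl[\phi_{\mathrm{RFF}}(x)^\top \phi_{\mathrm{RFF}}(y)\bigr]
  = \exp\!\Bigl(-\tfrac{1}{2}\lVert x - y \rVert^2\Bigr)
```

Under this reading, the only structural difference between the two maps is the scalar prefactor h(x), which multiplies every coordinate and therefore factors out of the inner product as h(x) h(y).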

Find a few references for the above equation
It seems that equation 5 is a generalization of the above equation. However, I am unable to derive equation 5 from it. Could someone please help me derive equation 5 of 'Rethinking Attention with Performers'?
For a better understanding of low-dimensional approximations of kernel functions, there is also a blog post on the work titled 'Random Features for Large-Scale Kernel Machines'.
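One possible reading, offered as a sketch and not as a proof, is that h(x) is a per-input rescaling that turns a shift-invariant kernel into one that is not. The softmax kernel is an example, since exp(x^T y) = exp(||x||^2 / 2) * exp(-||x - y||^2 / 2) * exp(||y||^2 / 2), which suggests h(x) = exp(||x||^2 / 2) on top of the trigonometric features. The NumPy check below tests this numerically. The helper name `feature_map`, the test vectors, and the choice m = 20000 are all hypothetical and chosen only for illustration.

```python
import numpy as np


def feature_map(x, omega, h, fs):
    """Assumed generalized map: h(x)/sqrt(m) * (f_1(omega x), ..., f_l(omega x))."""
    m = omega.shape[0]
    proj = omega @ x  # projections omega_i^T x, shape (m,)
    return h(x) / np.sqrt(m) * np.concatenate([f(proj) for f in fs])


rng = np.random.default_rng(0)
d, m = 4, 20000
omega = rng.standard_normal((m, d))  # omega_i ~ N(0, I_d)

x = np.array([0.3, -0.2, 0.5, 0.1])
y = np.array([0.1, 0.4, -0.3, 0.2])

trig = (np.sin, np.cos)  # f_1 = sin, f_2 = cos


def h_one(v):
    # h(x) = 1 gives the classical Random Fourier Features
    return 1.0


def h_softmax(v):
    # h(x) = exp(||x||^2 / 2), the assumed rescaling for the softmax kernel
    return np.exp(v @ v / 2.0)


# Gaussian kernel exp(-||x - y||^2 / 2) from the h = 1 map
gauss_est = feature_map(x, omega, h_one, trig) @ feature_map(y, omega, h_one, trig)
gauss_true = np.exp(-np.sum((x - y) ** 2) / 2.0)

# Softmax kernel exp(x^T y) from the rescaled map
sm_est = feature_map(x, omega, h_softmax, trig) @ feature_map(y, omega, h_softmax, trig)
sm_true = np.exp(x @ y)

print("Gaussian kernel: estimate", gauss_est, "exact", gauss_true)
print("Softmax kernel:  estimate", sm_est, "exact", sm_true)
```

Both estimates are Monte Carlo averages, so they match the exact values only up to sampling error that shrinks roughly as 1/sqrt(m).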