Parallel Cholesky decomposition for training machine learning algorithms
I am trying to work out if I can parallelise the training aspect of a machine learning algorithm. The computationally expensive part of the training involves Cholesky decomposing a positive-definite matrix (covariance matrix). I'll try and frame the question purely in terms of the matrix algebra. Let me know if you need any more info.
Let's say we have a block matrix (a covariance matrix, but that's not relevant to the problem)

M = [ A   B ]
    [ B*  C ]

where A and C relate to training data from two different sets. Both A and C are positive definite. Let's also assume for simplicity that A and C have size n×n.
There is a formula for carrying out block Cholesky decomposition. See http://en.wikipedia.org/wiki/Block_LU_decomposition. Summarising, we have the following result.
M = LU
where (* indicates transpose)
L = [ A^{1/2}      0       ]
    [ B* A^{-*/2}  Q^{1/2} ]
where
Q = C - B*A^{-1}B
Now let's say the training related to matrices A and C has already been carried out, so we have the Cholesky decompositions of A and C, giving A^{1/2} and C^{1/2}. (It is therefore straightforward to calculate the inverses A^{-1/2} and C^{-1/2} using forward substitution.)
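As a concrete sketch of that step (using NumPy/SciPy, my choice; the question doesn't name a library), the triangular inverse comes from forward substitution against the identity:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, n))
A = X @ X.T + n * np.eye(n)        # a positive-definite "covariance" matrix

A_half = cholesky(A, lower=True)   # A = A^{1/2} A^{*/2}
# Forward substitution against the identity yields A^{-1/2}
A_inv_half = solve_triangular(A_half, np.eye(n), lower=True)
```

The same two lines applied to C give C^{1/2} and C^{-1/2}.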
Rewriting Q in terms of these quantities, we now have

Q = Q^{1/2} Q^{*/2} = C^{1/2} C^{*/2} - B* A^{-*/2} A^{-1/2} B
My question is this: given this setup, is it possible to calculate Q^{1/2} algebraically, without having to apply a Cholesky decomposition to Q? In other words, can I use C^{1/2} to help me in the calculation of Q^{1/2}? If this were possible, it would be easy to parallelise the training.
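For concreteness, the whole block factorisation above can be checked numerically (a NumPy/SciPy sketch; variable names like `A_half` are my own):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)
n = 4
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * n * np.eye(2 * n)      # positive-definite block matrix
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

A_half = cholesky(A, lower=True)
A_inv_half = solve_triangular(A_half, np.eye(n), lower=True)

# Schur complement Q = C - B* A^{-1} B, using A^{-1} = A^{-*/2} A^{-1/2}
W = A_inv_half @ B                        # W = A^{-1/2} B
Q = C - W.T @ W
Q_half = cholesky(Q, lower=True)

# L = [[A^{1/2}, 0], [B* A^{-*/2}, Q^{1/2}]] reproduces M as L L*
L = np.block([[A_half, np.zeros((n, n))],
              [W.T, Q_half]])
```

Here W.T equals B* A^{-*/2}, and L @ L.T recovers M, matching the block formula quoted from the wiki page.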
Thanks in advance for any replies. Sorry about the matrix typesetting. Is there a sensible way to typeset maths, and matrices in particular?
Matt.
You can do this with a sequence of Cholesky downdates:
(Below I use ' for transpose, to avoid confusion with multiplication.)
If the Cholesky factor of A is a, and that of C is c, then the equation
for Q can be written
Q = c*c' - beta'*beta
where beta = inverse(a)*B (i.e. solve a*beta = B for beta)
If we write b[i] for the i'th row of beta, taken as a column vector, then
Q = c*c' - Sum b[i]*b[i]'
Finding the Cholesky decomposition of
c*c' - x*x' (where x is a vector and c is lower triangular)
is known as a rank-1 Cholesky downdate. There is a stable algorithm for this in Golub and Van Loan.
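A minimal sketch of such a rank-1 downdate, in the hyperbolic-rotation style of the LINPACK/Golub–Van Loan algorithms (`chol_downdate` is my own name, and this version omits the extra safeguards a production routine would carry):

```python
import numpy as np

def chol_downdate(c, x):
    """Return lower-triangular L with L L' = c c' - x x'.

    Sketch of a hyperbolic-rotation rank-1 Cholesky downdate."""
    L = c.astype(float).copy()
    x = x.astype(float).copy()
    n = x.size
    for k in range(n):
        r2 = L[k, k] ** 2 - x[k] ** 2
        if r2 <= 0.0:
            raise np.linalg.LinAlgError("downdated matrix is not positive definite")
        r = np.sqrt(r2)
        co, si = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        # Rotate the remainder of column k, then carry x forward
        L[k + 1:, k] = (L[k + 1:, k] - si * x[k + 1:]) / co
        x[k + 1:] = co * x[k + 1:] - si * L[k + 1:, k]
    return L

# Demo: downdate the factor of C by a small vector x
rng = np.random.default_rng(2)
n = 4
G = rng.standard_normal((n, n))
C = G @ G.T + n * np.eye(n)
c = np.linalg.cholesky(C)
x = 0.1 * rng.standard_normal(n)
L_down = chol_downdate(c, x)
```

Applying this once per b[i] turns the factor of c*c' into the factor of Q without ever forming Q explicitly.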
I think I've come to an answer, although it is not exactly what I'd hoped for.
Removing the machine learning context, my question boiled down to whether knowing C^{1/2} would help in the calculation of Q^{-1/2}. I'll go into more detail below, but to cut to the chase: the answer is yes, but only with respect to stability, not computation (I can't currently prove this to be the case, but I'm fairly certain).
To see why the answer is yes with respect to stability, note that the definition of Q from the original question can be rearranged as follows.
Q = C - B* A^{-1} B = (C^{1/2} + B*A^{-*/2})(C^{1/2} - B*A^{-*/2})*
By knowing C^{1/2} beforehand, we can calculate Q without having to invert A directly. Direct inversion is not numerically stable.
Sadly, although I have done a fair amount of research on the subject, it does not appear that C^{1/2} helps with respect to computation in the exact calculation of Q^{-1/2}. The best approach appears to be to calculate Q using C^{1/2} as above, then use Cholesky to decompose Q into Q^{1/2}, and then forward substitution to calculate Q^{-1/2}.
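That recipe can be sketched as follows (NumPy/SciPy, with my own variable names, building Q from the known factors rather than from A^{-1}):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)
n = 4
G = rng.standard_normal((2 * n, 2 * n))
M = G @ G.T + 2 * n * np.eye(2 * n)      # positive-definite block matrix
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]

A_half = cholesky(A, lower=True)
C_half = cholesky(C, lower=True)
A_inv_half = solve_triangular(A_half, np.eye(n), lower=True)

# Q assembled from the known Cholesky factors, no direct inversion of A
W = A_inv_half @ B                        # A^{-1/2} B
Q = C_half @ C_half.T - W.T @ W

# Q^{1/2} by Cholesky, then Q^{-1/2} by forward substitution
Q_half = cholesky(Q, lower=True)
Q_inv_half = solve_triangular(Q_half, np.eye(n), lower=True)
```

The two `cholesky` calls for A and C can run in parallel; only the final factorisation of Q remains serial.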
Further Research
One area I did not look into in much detail was whether it was possible to use C^{1/2} to approximate Q^{-1/2}. Something along the lines of an iterative method using C^{1/2} as a starting point. I do not know of any such iterative approximation process, but I'll keep searching. I may even start a new question with that as the focus.
I'll update you all if I have any major breakthroughs.