Automatic CudaMat conversion in Python
I'm looking into speeding up my Python code, which is all matrix math, using some form of CUDA. Currently my code uses Python and NumPy, so it seems like it shouldn't be too difficult to rewrite it using something like PyCUDA or CudaMat.
However, on my first attempt using CudaMat, I realized I had to rearrange a lot of the equations in order to keep the operations all on the GPU. This included the creation of many temporary variables so I could store the results of the operations.
I understand why this is necessary, but it turns what were once easy-to-read equations into somewhat of a mess that is difficult to inspect for correctness. Additionally, I would like to be able to easily modify the equations later on, which isn't easy in their converted form.
The package Theano manages to do this by first creating a symbolic representation of the operations, then compiling them to CUDA. However, after trying Theano out for a bit, I was frustrated by how opaque everything was. For example, just getting the actual value of myvar.shape[0] is made difficult since the tree doesn't get evaluated until much later. I would also much prefer less of a framework to which my code must conform, and more of a library that acts invisibly in place of NumPy.
Thus, what I would really like is something much simpler. I don't want automatic differentiation (there are other packages like OpenOpt that can do that if I require it), or optimization of the tree, but just a conversion from standard Numpy notation to CudaMat/PyCUDA/somethingCUDA. In fact, I want to be able to have it evaluate to just Numpy without any CUDA code for testing.
I'm currently considering writing this myself, but before I even consider such a venture, I wanted to see if anyone else knows of similar projects or a good starting place. The only other project I know of that might be close to this is SymPy, but I don't know how easy it would be to adapt to this purpose.
My current idea is to create an array class that looks like the NumPy array class. Its only function would be to build a tree. At any time, that symbolic array could be converted to a NumPy array and evaluated (there would be one-to-one parity between the two). Alternatively, the tree could be traversed to generate CudaMat commands. If optimizations are required, they can be done at that stage (e.g. re-ordering of operations, creation of temporary variables, etc.) without getting in the way of inspecting what's going on.
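To make the idea concrete, here is a minimal sketch of such a tree-building class that can be evaluated with plain NumPy. Everything in it (Sym, wrap, array, dot, asNumpy, the op names) is a hypothetical name chosen for illustration, not an existing library. It assumes NumPy 1.13 or later, where setting `__array_ufunc__ = None` makes `ndarray` operators defer to the reflected methods of the other operand.

```python
import numpy as np


class Sym:
    """Hypothetical symbolic array: records operations instead of running them."""

    # Ask NumPy to defer to our reflected operators (e.g. ndarray * Sym).
    __array_ufunc__ = None

    def __init__(self, op, args=(), value=None):
        self.op = op        # "leaf", "add", "sub", "mul", "neg", "dot" or "T"
        self.args = args    # child Sym nodes
        self.value = value  # NumPy array, only set on leaves

    # Operator overloads only build tree nodes; nothing is computed here.
    def __add__(self, other):
        return Sym("add", (self, wrap(other)))

    def __radd__(self, other):
        return Sym("add", (wrap(other), self))

    def __sub__(self, other):
        return Sym("sub", (self, wrap(other)))

    def __rsub__(self, other):
        return Sym("sub", (wrap(other), self))

    def __mul__(self, other):
        return Sym("mul", (self, wrap(other)))

    def __rmul__(self, other):
        return Sym("mul", (wrap(other), self))

    def __neg__(self):
        return Sym("neg", (self,))

    @property
    def T(self):
        return Sym("T", (self,))

    def asNumpy(self):
        """Evaluate the tree recursively with plain NumPy."""
        if self.op == "leaf":
            return self.value
        values = [arg.asNumpy() for arg in self.args]
        return _NUMPY_OPS[self.op](*values)


# One NumPy function per node type.
_NUMPY_OPS = {
    "add": np.add,
    "sub": np.subtract,
    "mul": np.multiply,
    "neg": np.negative,
    "dot": np.dot,
    "T": np.transpose,
}


def wrap(x):
    """Turn scalars and NumPy arrays into leaves; pass Sym nodes through."""
    if isinstance(x, Sym):
        return x
    return Sym("leaf", value=np.asarray(x, dtype=float))


def array(x):
    return wrap(x)


def dot(a, b):
    return Sym("dot", (wrap(a), wrap(b)))
```

With this, an expression such as `dot(array(a), array(b)) * 2.0 - 1.0` is only a tree until `.asNumpy()` is called on it. A second method walking the same tree could emit GPU calls instead.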
Any thoughts/comments/etc. on this would be greatly appreciated!
Update
A use case might look something like this (where sym is the theoretical module), for example when calculating a gradient:
W = sym.array(np.random.random(size=(numVisible, numHidden)))
delta_o = -(x - z)
delta_h = sym.dot(delta_o, W)*h*(1.0-h)
grad_W = sym.dot(x.T, delta_h)
In this case, grad_W would actually just be a tree containing the operations that need to be done. If you wanted to evaluate the expression normally (i.e. via NumPy), you could do:
npGrad_W = grad_W.asNumpy()
which would just execute the NumPy commands that the tree represents. If, on the other hand, you wanted to use CUDA, you would do:
cudaGrad_W = grad_W.asCUDA()
which would convert the tree into expressions that can be executed via CUDA (this could happen in a couple of different ways).
That way it should be trivial to: (1) test grad_W.asNumpy() == grad_W.asCUDA(), and (2) convert your pre-existing code to use CUDA.
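One way the CUDA side could work, sketched under assumptions: walk the tree in post-order and emit a flat list of single-operation steps, each writing into a named temporary. That is the same rearrangement I had to do by hand for CudaMat. The names below (Node, leaf, node, linearize, run_steps, NUMPY_BACKEND) are all hypothetical, and NumPy stands in for the GPU backend so the sketch runs anywhere. A real backend would supply its own table of operations. The comparison uses np.allclose rather than ==, since == on arrays is elementwise and GPU floating-point results may differ in the last bits.

```python
import numpy as np
from collections import namedtuple

# Hypothetical tree node: op name plus child nodes; leaves carry an array.
Node = namedtuple("Node", ["op", "args", "value"])


def leaf(value):
    return Node("leaf", (), np.asarray(value, dtype=float))


def node(op, *args):
    return Node(op, args, None)


def linearize(root):
    """Post-order walk returning (steps, inputs, name of the result).

    Each step is (target, op, operand_names): one operation writing
    into one named temporary.
    """
    steps, inputs = [], {}

    def visit(n):
        if n.op == "leaf":
            name = "in%d" % len(inputs)
            inputs[name] = n.value
            return name
        operands = [visit(arg) for arg in n.args]
        name = "t%d" % len(steps)
        steps.append((name, n.op, operands))
        return name

    result = visit(root)
    return steps, inputs, result


# Stand-in backend: NumPy. A GPU backend would provide its own table.
NUMPY_BACKEND = {
    "sub": np.subtract,
    "mul": np.multiply,
    "neg": np.negative,
    "dot": np.dot,
    "T": np.transpose,
}


def run_steps(steps, inputs, result, backend):
    """Execute the flat step list, keeping temporaries in a dict."""
    env = dict(inputs)
    for target, op, operands in steps:
        env[target] = backend[op](*[env[name] for name in operands])
    return env[result]


# The gradient example, written directly as a tree.
rng = np.random.default_rng(0)
x, z = rng.random((5, 4)), rng.random((5, 4))
h = rng.random((5, 3))
W = rng.random((4, 3))

delta_o = node("neg", node("sub", leaf(x), leaf(z)))
one_minus_h = node("sub", leaf(1.0), leaf(h))
delta_h = node("mul", node("mul", node("dot", delta_o, leaf(W)), leaf(h)),
               one_minus_h)
grad_W = node("dot", node("T", leaf(x)), delta_h)

steps, inputs, result = linearize(grad_W)
for target, op, operands in steps:
    print(target, "=", op, operands)  # the readable "temporary variable" form

expected = np.dot(x.T, np.dot(-(x - z), W) * h * (1.0 - h))
print(np.allclose(run_steps(steps, inputs, result, NUMPY_BACKEND), expected))
```

Because the step list is plain data, it can be printed and inspected before anything touches the GPU, and any optimization pass (merging or reordering steps) would operate on that list rather than on the original equations.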
Comments (1)
Have you looked at the GPUArray portion of PyCUDA?
http://documen.tician.de/pycuda/array.html
While I haven't used it myself, it seems like it would be what you're looking for. In particular, check out the "Single-pass Custom Expression Evaluation" section near the bottom of that page.