Fast conversion of C/C++ vector to Numpy array
I'm using SWIG to glue together some C++ code to Python (2.6), and part of that glue includes a piece of code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?
4 Answers
You will want to define __array_interface__ instead. This will let you pass back the pointer and the shape information directly.
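A minimal sketch of that idea, assuming the SWIG layer can hand you the vector's raw address and element count (the data_ptr and size arguments below are hypothetical, and float64 is assumed as the element type):

import numpy as np

class VectorView(object):
    # Hypothetical wrapper: data_ptr is the integer address of
    # internal_data_[0] and size is internal_data_.size(), both
    # assumed to be obtainable from the SWIG layer.
    def __init__(self, data_ptr, size):
        self._ptr = data_ptr
        self._size = size

    @property
    def __array_interface__(self):
        return {
            'shape': (self._size,),       # one dimension, size elements
            'typestr': '<f8',             # little-endian float64
            'data': (self._ptr, False),   # (address, read-only flag)
            'version': 3,
        }

# np.asarray(VectorView(ptr, n)) then builds an array that reads the
# memory directly, with no per-element Python calls.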
Maybe it would be possible to use f2py instead of swig. Despite its name, it is capable of interfacing python with C as well as Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy
The advantage is that it handles the conversion to numpy arrays automatically.
Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.
If you wrap your vector in an object that implements Python's Buffer Interface, you can pass that to the numpy array constructor for initialization (see docs, third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
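A small illustration of that constructor path (np.ndarray's third argument, buffer), using a plain bytearray as a stand-in for a SWIG-wrapped object that exports the buffer interface:

import numpy as np

# Any object exporting the buffer protocol can be passed as the ndarray
# constructor's 'buffer' argument; the resulting array uses that memory
# directly, so no per-element Python calls are involved.
raw = bytearray(np.arange(5, dtype=np.float64).tobytes())
arr = np.ndarray(shape=(5,), dtype=np.float64, buffer=raw)
# arr is now [0., 1., 2., 3., 4.]; call arr.copy() if an independent
# copy of the data is wanted.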
So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
then you can make a container "Numpy"-able with:
Then in Python, just do:
This has only the overhead of a single Python <--> C++ translation call, not the N that would result from a typical length-N array.
A slightly more complete version of this code is part of my PyTRT project on GitHub.
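For comparison, a rough sketch of what the Python-facing pattern looks like, assuming the SWIG typemap exposes a hypothetical copy_into() method that performs the single copy from the std::vector into a caller-supplied NumPy buffer:

import numpy as np

def to_numpy(cpp_container):
    # Preallocate the destination array, then let the C++ side fill it
    # with one memcpy-style copy; only one Python <--> C++ crossing.
    out = np.empty(cpp_container.size(), dtype=np.float64)
    cpp_container.copy_into(out)   # hypothetical method added via the SWIG typemap
    return out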