Fast conversion of a C/C++ vector to a Numpy array

I'm using SWIG to glue together some C++ code to Python (2.6), and part of that glue includes a piece of code that converts large fields of data (millions of values) from the C++ side to a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:

def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())

The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.

Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?

Comments (4)

甜是你 2024-11-02 12:00:10

You will want to define __array_interface__() instead. This will let you pass back the pointer and the shape information directly.
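
A minimal, self-contained sketch of what that can look like on the Python side. The class below, its size()/data_ptr() methods, and the ctypes-backed storage are stand-ins for whatever the real SWIG proxy exposes; only the __array_interface__ dict itself is the NumPy protocol (typestr '<f8' assumes little-endian doubles):

import ctypes
import numpy as np

class FakeDoubleVector(object):
    """Stand-in for the SWIG proxy; here the 'C++' storage is simulated
    with a ctypes array so the example runs on its own."""

    def __init__(self, values):
        self._buf = (ctypes.c_double * len(values))(*values)

    def size(self):
        return len(self._buf)

    def data_ptr(self):
        # Address of the first element, as the real wrapper would return
        # &internal_data_[0] cast to an integer.
        return ctypes.addressof(self._buf)

    @property
    def __array_interface__(self):
        return {
            'shape': (self.size(),),          # 1-D, size() elements
            'typestr': '<f8',                 # little-endian float64
            'data': (self.data_ptr(), False), # (pointer, read-only flag)
            'version': 3,
        }

vec = FakeDoubleVector([1.0, 2.0, 3.0])
a = np.asarray(vec)   # wraps the memory directly, no per-element iteration
a[0] = 42.0           # writes straight into the underlying storage

Keep the wrapped object alive for as long as the array is in use, since the array does not own the memory.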

羁〃客ぐ 2024-11-02 12:00:10

Maybe it would be possible to use f2py instead of SWIG. Despite its name, it is capable of interfacing Python with C as well as with Fortran. See http://www.scipy.org/Cookbook/f2py_and_NumPy

The advantage is that it handles the conversion to numpy arrays automatically.

Two caveats: if you don't already know Fortran, you may find f2py a bit strange; and I don't know how well it works with C++.
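
For what it's worth, a minimal sketch of that automatic conversion with a plain Fortran routine; the module name demo and the subroutine fill are made up for illustration, and building requires a Fortran compiler on the path:

import numpy.f2py

# Hypothetical Fortran source: x is intent(out), so the generated wrapper
# allocates it and returns it to Python as a NumPy array automatically.
src = """
subroutine fill(n, x)
  integer, intent(in) :: n
  double precision, intent(out) :: x(n)
  integer :: i
  do i = 1, n
     x(i) = i
  end do
end subroutine fill
"""

# Compile the source into an extension module named 'demo'.
numpy.f2py.compile(src, modulename='demo', extension='.f90', verbose=False)

import demo
x = demo.fill(5)
print(type(x), x)   # <class 'numpy.ndarray'> [1. 2. 3. 4. 5.]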

旧伤慢歌 2024-11-02 12:00:10

If you wrap your vector in an object that implements Python's buffer interface, you can pass that to the numpy array constructor for initialization (see the np.ndarray docs, third argument). I would bet that this initialization is much faster, since it can just use memcpy to copy the data.
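
A rough illustration of the principle in pure Python: the bytearray below stands in for the buffer-protocol object that the C++ wrapper would expose over the vector's storage, and the buffer argument of np.ndarray (the "third argument" mentioned above) wraps it without per-element iteration:

import numpy as np

# Stand-in for the C++-backed buffer object: the real SWIG wrapper would
# implement the C-level buffer protocol over &internal_data_[0] and
# internal_data_.size(); a bytearray plays that role here so the example
# runs on its own.
raw = bytearray(np.arange(5, dtype=np.float64).tobytes())

# ndarray's buffer argument wraps existing memory directly.
a = np.ndarray(shape=(5,), dtype=np.float64, buffer=memoryview(raw))
print(a)        # [0. 1. 2. 3. 4.]

a[0] = 99.0     # writes go straight through to the underlying bytearray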

╰つ倒转 2024-11-02 12:00:10

So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:

%insert("python") %{
import numpy as np
%}

/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>

template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
//    ValidateUserInput( length == field.size(),
//            "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN

    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====

%}

%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
(int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {

    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_/sizeof($*1_type));
}
%enddef


%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}

%enddef

then you can make a container "Numpy"-able with

%template(DumbVectorFloat) DumbVector<double>;
ADD_NUMPY_ARRAY_INTERFACE(float, DumbVectorFloat, DumbVector<double>);

Then in Python, just do:

# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )

This has only the overhead of a single Python <--> C++ translation call, not the N that would result from a typical length-N array.

A slightly more complete version of this code is part of my PyTRT project at github.
