A fast way to pass millions of items from Python to a C program many times in quick succession

Posted on 2024-10-17 22:07:58

I've written a Python script that needs to pass millions of items to a C program and receive its output many times in a short period (from 1 up to 10 million vertices, each an integer index and 2 float coordinates, passed rapidly 500 times; each time the Python script calls the C program, I need to store the returned values in variables). I already implemented a way of reading and writing text and/or binary files, but it's slow and not smart (why write files to the hard disk when you don't need to keep the data after the Python script terminates?). I tried to use pipes, but for large data they gave me errors...
So, for now I think the best way may be to use the ability of ctypes to load functions from a .dll.
Since I've never created a DLL, I would like to know how to set one up (I know many IDEs have a template for this, but my wxDev-C++ crashes when I try to open it. Right now I'm downloading Code::Blocks).

Can you tell me if the solution I'm starting to implement is right, or if there is a better solution?
The 2 functions I need to call from Python are these:

void find_vertex(vertex *list, int len, vertex* lower, vertex* highter)
{
    int i;
    *lower=list[0];
    *highter=list[1];
    for(i=0;i<len;i++)
    {
        if ((list[i].x<=lower->x) && (list[i].y<=lower->y))
            *lower=list[i];
        else
        {
            if ((list[i].x>=highter->x) && (list[i].y>=highter->y))
                *highter=list[i];
        }
    }
}

and

vertex *square_list_of_vertex(vertex *list,int len,vertex start, float size)
{
    int i=0,a=0;
    unsigned int *num;
    num=(unsigned int*)malloc(sizeof(unsigned int)*len);
    if (num==NULL)
    {
        printf("Can't allocate the memory");
        return 0;
    }
    //checks which points are in the right position and stores their index from the main list in another list
    for(i=0;i<len;i++)
    {
        if ((list[i].x-start.x)<size && (list[i].y-start.y<size))
        {
            if (list[i].y-start.y>-size/100)
            {
                num[a]=i;
                a++;//len of the list to return
            }
        }
    }

    //create the list with the right vertices
    vertex *retlist;
    retlist=(vertex*)malloc(sizeof(vertex)*(a+1));
    if (retlist==NULL)
    {
        free(num);//don't leak the index list on failure
        printf("Can't allocate the memory");
        return 0;
    }
    //the first element is used only as an info container
    vertex infos;
    infos.index=a+1;
    infos.x=0;//initialize the unused fields
    infos.y=0;
    retlist[0]=infos;

    //set the value for the return pointer
    for(i=1;i<=a;i++)
    {
        retlist[i]=list[num[i-1]];
    }

    free(num);//the index list is no longer needed
    return retlist;
}

EDIT:
Forgot to post the type definition of vertex:

typedef struct{
    int index;
    float x,y;
} vertex;

EDIT2:
I'll redistribute the code, so I prefer not to use external modules in Python or external programs in C. Also, I want to try to keep the code cross-platform. The script is an add-on for a 3D app, so the less external "stuff" it uses, the better.

Comments (7)

懷念過去 2024-10-24 22:07:58

Using ctypes or Cython to wrap your C functions is definitely the way to go. That way, you won't even need to copy the data between the C and Python code -- both the C and the Python part run within the same process and access the same data. Let's stick with ctypes, since this is what you suggested. Additionally, using NumPy will make this a lot more comfortable.

I infer your vertex type looks like this:

typedef struct
{
    int index;
    float x, y;
} vertex;

To have these vertices in a NumPy array, you can define a record "dtype" for it:

vertex_dtype = [('index', 'i'), ('x', 'f'), ('y', 'f')]

Also define this type as a ctypes structure:

import ctypes

class Vertex(ctypes.Structure):
    _fields_ = [("index", ctypes.c_int),
                ("x", ctypes.c_float),
                ("y", ctypes.c_float)]

Now, the ctypes prototype for your function find_vertex() would look like this:

from numpy.ctypeslib import ndpointer
lib = ctypes.CDLL(...)
lib.find_vertex.argtypes = [ndpointer(dtype=vertex_dtype, flags="C_CONTIGUOUS"),
                            ctypes.c_int,
                            ctypes.POINTER(Vertex),
                            ctypes.POINTER(Vertex)]
lib.find_vertex.restype = None  # the attribute is `restype`; None means the function returns void

To call this function, create a NumPy array of vertices

import numpy
vertices = numpy.empty(1000, dtype=vertex_dtype)

and two structures for the return values

lower = Vertex()
higher = Vertex()

and finally call your function:

lib.find_vertex(vertices, len(vertices), lower, higher)

NumPy and ctypes will take care of passing the pointer to the beginning of the data of vertices to your C function -- no copying required.

Probably, you will have to read a bit of documentation on ctypes and NumPy, but I hope this answer helps you to get started with it.
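
The second function from the question, square_list_of_vertex(), returns a pointer rather than filling in arguments, so it would presumably be declared with restype = ctypes.POINTER(Vertex). As a minimal sketch of reading such a result, the snippet below uses only the standard library and a stand-in array in place of the real C call; the helper name unpack_vertex_list is hypothetical. It relies on the convention in the question's code that element 0 is a header whose index field holds the total element count.

```python
import ctypes

class Vertex(ctypes.Structure):
    _fields_ = [("index", ctypes.c_int),
                ("x", ctypes.c_float),
                ("y", ctypes.c_float)]

def unpack_vertex_list(ptr):
    """Copy vertices out of a buffer laid out like square_list_of_vertex's result.

    Element 0 is a header: its `index` field holds the total number of
    elements (matches + 1). The matching vertices follow it.
    (Hypothetical helper, not part of the original answer.)
    """
    if not ptr:  # a NULL pointer is falsy: the C side failed to allocate
        return []
    total = ptr[0].index
    return [(ptr[i].index, ptr[i].x, ptr[i].y) for i in range(1, total)]

# Stand-in for the pointer the C function would return:
# one header element followed by two vertices.
fake = (Vertex * 3)(Vertex(3, 0.0, 0.0), Vertex(7, 1.5, 2.5), Vertex(9, 3.0, 4.0))
result = unpack_vertex_list(ctypes.cast(fake, ctypes.POINTER(Vertex)))
print(result)
```

With the real library, the buffer is allocated by malloc() on the C side, so the library would also need to export a function that frees it; Python will not release that memory on its own.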

巴黎夜雨 2024-10-24 22:07:58

It seems like what you really want is to turn your C program into a Python module. Here is a tutorial that will get you started.

零度℉ 2024-10-24 22:07:58

If you want to pass data between two programs, and you already have the code to use a file, why not just use a RAMdisk? For Windows, you can use something like http://www.ltr-data.se/opencode.html/#ImDisk to create the RAMdisk and you can use the commands listed here for Linux. For smallish amounts of data (anything that will fit in RAM without requiring to be constantly paged out), this should outperform disk-based operations by a couple of orders of magnitude.

ζ澈沫 2024-10-24 22:07:58

Iterating over millions of items is the worst possible operation you could do in Python... If at all possible, write this portion of the program in C or C++; it will be hundreds of times faster and use hundreds of times less memory...

I love Python, but it's not the best solution for this type of operation.

国粹 2024-10-24 22:07:58

If you can, make the Python program buffer the data that it is sending so that it does not send every vertex one by one. Save them up until there are 100 or 500 or 1000 and that way you will make fewer calls. Do some timing tests to determine optimal buffer size.
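
A minimal sketch of that buffering idea, assuming the sender can iterate over its vertices: the generator below (the name batches is hypothetical) groups items into fixed-size chunks so that each chunk can go out in a single call. The chunk size of 1000 is only a starting point for the timing tests mentioned above.

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 2500 items sent in chunks of 1000 take three calls instead of 2500.
sizes = [len(chunk) for chunk in batches(range(2500), 1000)]
print(sizes)
```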

悲念泪 2024-10-24 22:07:58

I think I would use a library like sysv ipc for this job and simply map the data to a shared memory segment.
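
As a rough illustration of the shared-memory approach, the sketch below swaps in the standard library's multiprocessing.shared_memory module (Python 3.8+) instead of a System V IPC library, so it is not the exact technique named above. It packs vertices with the same layout as the C struct (int, float, float) and reads them back through a second handle attached by name, which is what a separate process would do.

```python
import struct
from multiprocessing import shared_memory

# Same layout as the C struct: int index, float x, float y (12 bytes).
VERTEX = struct.Struct("iff")

vertices = [(0, 0.0, 0.0), (1, 1.5, 2.5), (2, 3.0, 4.0)]

shm = shared_memory.SharedMemory(create=True, size=VERTEX.size * len(vertices))
try:
    # Writer side: pack each vertex straight into the shared segment.
    for i, v in enumerate(vertices):
        VERTEX.pack_into(shm.buf, i * VERTEX.size, *v)

    # Reader side: another process would attach using the segment's name.
    reader = shared_memory.SharedMemory(name=shm.name)
    try:
        read_back = [VERTEX.unpack_from(reader.buf, i * VERTEX.size)
                     for i in range(len(vertices))]
    finally:
        reader.close()
finally:
    shm.close()
    shm.unlink()  # release the segment once nobody needs it

print(read_back)
```

The reader has to know the vertex count separately, because the segment may be rounded up to a larger size than requested.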

醉南桥 2024-10-24 22:07:58

Here's a variant that uses Cython to write an extension module for CPython.

C declarations to be used in Cython:

# file: cvertex.pxd
cdef extern from "vertex.h":
    ctypedef struct vertex:
        int index
        float x,y
    void find_vertex(vertex *list, int len, vertex* lower, vertex* highter)

Where vertex.h is:

typedef struct{
    int index;
    float x,y;
} vertex;

void find_vertex(vertex *list, int len, vertex* lower, vertex* highter);

Cython implementation to be used in Python:

# file: pyvertex.pyx
cimport numpy
cimport cvertex # use declarations from cvertex.pxd

def find_vertex(numpy.ndarray[cvertex.vertex,ndim=1,mode="c"] vertices):
    if len(vertices) < 2:
        raise ValueError('provide at least 2 vertices')

    cdef cvertex.vertex lower, highter
    cvertex.find_vertex(<cvertex.vertex*>vertices.data, len(vertices),
                        &lower, &highter)
    return lower, highter # implicitly convert to dicts

To compile the extension, run:

$ python setup.py build_ext -i

Where setup.py is:

import numpy
from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Distutils import build_ext

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("vertex", ["pyvertex.pyx", "vertex.c"],
                             include_dirs=[numpy.get_include()])]  # headers needed by `cimport numpy`
)

Now the extension can be used from Python:

import numpy
import vertex # import the extension

n = 10000000
vertex_list = numpy.zeros(n, dtype=[('index', 'i'), ('x', 'f'), ('y', 'f')])
i = n//2
vertex_list[i] = i, 1, 1
v1, v2 = vertex.find_vertex(vertex_list)
print(v2['index'])
print(v1, v2)

Output

5000000
{'y': 0.0, 'index': 0, 'x': 0.0} {'y': 1.0, 'index': 5000000, 'x': 1.0}