A way to pass millions of items from Python to a C program many times in rapid succession
I've written a Python script that needs to pass millions of items to a C program and receive its output many times in a short period: anywhere from 1 to 10 million vertices (an integer index and 2 float coordinates each), passed rapidly 500 times. Each time the Python script calls the C program, I need to store the returned values in variables. I have already implemented this by reading and writing text and/or binary files, but it's slow and not smart (why write files to the hard disk when the data isn't needed after the Python script terminates?). I tried using pipes, but for large data they gave me errors...
So for now I think the best approach is to use ctypes to load functions from a .dll.
Since I've never created a DLL, I would like to know how to set one up (I know many IDEs have a template for this, but my wxDev-C++ crashes when I try to open it. Right now I'm downloading Code::Blocks).
Can you tell me whether the solution I'm starting to implement is right, or whether there is a better one?
The 2 functions I need to call from Python are these:
void find_vertex(vertex *list, int len, vertex* lower, vertex* highter)
{
    int i;
    *lower = list[0];
    *highter = list[1];
    for (i = 0; i < len; i++)
    {
        if ((list[i].x <= lower->x) && (list[i].y <= lower->y))
            *lower = list[i];
        else
        {
            if ((list[i].x >= highter->x) && (list[i].y >= highter->y))
                *highter = list[i];
        }
    }
}
and
vertex *square_list_of_vertex(vertex *list, int len, vertex start, float size)
{
    int i = 0, a = 0;
    unsigned int *num;
    num = (unsigned int*)malloc(sizeof(unsigned int) * len);
    if (num == NULL)
    {
        printf("Can't allocate the memory");
        return 0;
    }
    // check which points are in the right position and store their index from the main list in another list
    for (i = 0; i < len; i++)
    {
        if ((list[i].x - start.x) < size && (list[i].y - start.y < size))
        {
            if (list[i].y - start.y > -size / 100)
            {
                num[a] = i;
                a++; // len of the list to return
            }
        }
    }
    // create the list with the right vertices
    vertex *retlist;
    retlist = (vertex*)malloc(sizeof(vertex) * (a + 1));
    if (retlist == NULL)
    {
        printf("Can't allocate the memory");
        free(num); // release the index list before bailing out
        return 0;
    }
    // the first element is used only as an info container
    vertex infos;
    infos.index = a + 1;
    infos.x = infos.y = 0.0f; // unused, but avoids copying uninitialized floats
    retlist[0] = infos;
    // copy the selected vertices into the returned list
    for (i = 1; i <= a; i++)
    {
        retlist[i] = list[num[i - 1]];
    }
    free(num); // the index list is no longer needed
    return retlist;
}
EDIT:
I forgot to post the type definition of vertex:
typedef struct {
    int index;
    float x, y;
} vertex;
EDIT2:
I'll be redistributing the code, so I'd prefer not to use external modules in Python or external programs in C. I also want to try to keep the code cross-platform. The script is an add-on for a 3D app, so the less external "stuff" it uses, the better.
Using ctypes or Cython to wrap your C functions is definitely the way to go. That way, you won't even need to copy the data between the C and Python code: both the C and the Python part run within the same process and access the same data. Let's stick with ctypes, since this is what you suggested. Additionally, using NumPy will make this a lot more comfortable.
I infer your vertex type looks like this:
To have these vertices in a NumPy array, you can define a record "dtype" for it:
Also define this type as a ctypes structure:
Now, the ctypes prototype for your function find_vertex() would look like this:
To call this function, create a NumPy array of vertices
and two structures for the return values
and finally call your function:
NumPy and ctypes will take care of passing the pointer to the beginning of the data of vertices to your C function, with no copying required.
You will probably have to read a bit of documentation on ctypes and NumPy, but I hope this answer helps you get started.
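As a rough sketch of the steps the ctypes answer describes, here is a version that uses only the standard library, so a plain ctypes array stands in for the NumPy record array. The names Vertex, bind_find_vertex and make_vertices, and the library file name in the comments, are assumptions made for illustration and do not come from the original answer. With NumPy, the array argument could instead be declared with numpy.ctypeslib.ndpointer.

```python
import ctypes


class Vertex(ctypes.Structure):
    # Mirrors the C typedef from the question: int index; float x, y;
    _fields_ = [("index", ctypes.c_int),
                ("x", ctypes.c_float),
                ("y", ctypes.c_float)]


def bind_find_vertex(lib):
    """Attach the prototype of find_vertex() to an already loaded library."""
    lib.find_vertex.argtypes = [ctypes.POINTER(Vertex), ctypes.c_int,
                                ctypes.POINTER(Vertex), ctypes.POINTER(Vertex)]
    lib.find_vertex.restype = None  # the C function returns void
    return lib.find_vertex


def make_vertices(coords):
    """Build one contiguous C array of Vertex from (x, y) pairs."""
    arr = (Vertex * len(coords))()
    for i, (x, y) in enumerate(coords):
        arr[i].index = i
        arr[i].x = x
        arr[i].y = y
    return arr


vertices = make_vertices([(0.0, 0.0), (2.5, 1.0), (-1.0, -3.0)])

# With a compiled library (hypothetical file name), the call would look like:
#   lib = ctypes.CDLL("./libvertex.so")   # e.g. "vertex.dll" on Windows
#   find_vertex = bind_find_vertex(lib)
#   lower, higher = Vertex(), Vertex()
#   find_vertex(vertices, len(vertices),
#               ctypes.byref(lower), ctypes.byref(higher))
```

The C side receives a pointer to the same memory that Python filled in, so nothing is copied at call time. Filling millions of elements in a Python loop is itself slow, which is one reason the answer suggests NumPy for building the array.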
It seems like what you really want is to turn your C program into a Python module. Here is a tutorial that will get you started.
If you want to pass data between two programs, and you already have the code to use a file, why not just use a RAMdisk? For Windows, you can use something like http://www.ltr-data.se/opencode.html/#ImDisk to create the RAMdisk and you can use the commands listed here for Linux. For smallish amounts of data (anything that will fit in RAM without requiring to be constantly paged out), this should outperform disk-based operations by a couple of orders of magnitude.
Iterating over millions of items is the worst possible operation you could do in Python... If at all possible, write this portion of the program in C or C++; it will be hundreds of times faster and use hundreds of times less memory...
I love Python, but it's not the best solution for this type of operation.
If you can, make the Python program buffer the data that it is sending so that it does not send every vertex one by one. Save them up until there are 100 or 500 or 1000 and that way you will make fewer calls. Do some timing tests to determine optimal buffer size.
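A minimal sketch of the buffering idea, assuming each vertex is an (index, x, y) tuple and that the C side expects packed 12-byte records; the RECORD layout and the pack_batches helper are illustrative names, not part of the original answer.

```python
import struct

# One record: int index + two floats, little-endian, 12 bytes (assumed layout).
RECORD = struct.Struct("<iff")


def pack_batches(vertices, batch_size):
    """Yield one bytes object per batch of (index, x, y) tuples."""
    for start in range(0, len(vertices), batch_size):
        chunk = vertices[start:start + batch_size]
        yield b"".join(RECORD.pack(i, x, y) for i, x, y in chunk)


vertices = [(i, float(i), float(-i)) for i in range(2500)]
batches = list(pack_batches(vertices, 1000))
```

Each batch can then be written to the pipe or file in a single call. Trying a few values of batch_size with the timeit module is one way to run the timing tests the answer mentions.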
I think I would use a library like sysv ipc for this job and simply map the data to a shared memory segment.
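To illustrate the shared-memory idea without a third-party package, the sketch below swaps in the standard library's multiprocessing.shared_memory module (Python 3.8+) for the sysv ipc library the answer names. The record layout and the write_vertices and read_vertices helpers are assumptions for illustration, and the reader here runs in the same process only to keep the example self-contained.

```python
import struct
from multiprocessing import shared_memory

# One record: int index + two floats, little-endian, 12 bytes (assumed layout).
RECORD = struct.Struct("<iff")


def write_vertices(vertices):
    """Create a shared segment and pack (index, x, y) tuples into it."""
    shm = shared_memory.SharedMemory(create=True,
                                     size=RECORD.size * len(vertices))
    for n, (i, x, y) in enumerate(vertices):
        RECORD.pack_into(shm.buf, n * RECORD.size, i, x, y)
    return shm


def read_vertices(name, count):
    """Attach to an existing segment by name and unpack `count` records."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return [RECORD.unpack_from(shm.buf, n * RECORD.size)
                for n in range(count)]
    finally:
        shm.close()


data = [(0, 1.0, 2.0), (1, -0.5, 4.25)]
segment = write_vertices(data)
try:
    result = read_vertices(segment.name, len(data))
finally:
    segment.close()
    segment.unlink()  # free the segment once nobody needs it
```

The record count is passed explicitly because the operating system may round the segment size up, so the buffer length is not a reliable count of records.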
Here's a variant that uses Cython to write an extension module for CPython.
C declarations to be used in Cython:
Where vertex.h is:
Cython implementation to be used in Python:
To compile the extension, run:
Where setup.py is:
Now the extension can be used from Python:
Output: