What is the fastest way in Python to build a C array from a list of tuples of floats?

Posted on 2024-10-01 10:38:52

The context: my Python code passes arrays of 2D vertices to OpenGL.

I tested two approaches, one with ctypes and the other with struct; the latter is more than twice as fast.

from random import random
points = [(random(), random()) for _ in range(1000)]

from ctypes import c_float
def array_ctypes(points):
    n = len(points)
    return n, (c_float*(2*n))(*[u for point in points for u in point])

from struct import pack
def array_struct(points):
    n = len(points)
    return n, pack("f"*2*n, *[u for point in points for u in point])

Are there any other alternatives?
Any hints on how to speed up such code (and yes, this is one bottleneck of my code)?

Comments (5)

扛起拖把扫天下 2024-10-08 10:38:52

You can pass numpy arrays to PyOpenGL without incurring any overhead. (The data attribute of the numpy array is a buffer that points to the underlying C data structure that contains the same information as the array you're building)

import numpy as np  
def array_numpy(points):
    n = len(points)
    return n, np.array(points, dtype=np.float32)

On my computer, this is about 40% faster than the struct-based approach.

司马昭之心 2024-10-08 10:38:52

You could try Cython. For me, this gives:

function       usec per loop:
               Python  Cython
array_ctypes   1370    1220
array_struct    384     249
array_numpy     336     339

So NumPy gives only a 15% benefit on my hardware (an old laptop running Windows XP), whereas Cython gives about 35% (without any extra dependency in your distributed code).

If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:

def array_struct_flat(points):
    n = len(points)
    return pack(
        "f"*n,
        *[
            coord
            for coord in points
        ]
    )

points = [random() for _ in range(1000 * 2)]

then the resulting output is the same, but the timing goes down further:

function            usec per loop:
                    Python  Cython
array_struct_flat           157
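
The claim that the output is the same can be checked directly. The following is a minimal sketch for Python 3 using only the standard library; the helper names pack_tuples and pack_flat are illustrative and do not come from the posts above. It packs the same coordinates once from a list of tuples and once from a flattened list, then compares the resulting bytes.

```python
from random import random
from struct import pack

def pack_tuples(points):
    # points: list of (x, y) tuples; flatten first, then pack
    flat = [u for point in points for u in point]
    return pack("%df" % len(flat), *flat)

def pack_flat(coords):
    # coords: already-flattened list [x0, y0, x1, y1, ...]
    return pack("%df" % len(coords), *coords)

points = [(random(), random()) for _ in range(1000)]
coords = [u for point in points for u in point]

same = pack_tuples(points) == pack_flat(coords)
size = len(pack_flat(coords))  # 2000 floats * 4 bytes each
print(same, size)
```

The only difference between the two helpers is the flattening step, so any timing gap between them is the cost of flattening the tuples.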

Cython might be capable of substantially better than this too, if someone smarter than me added static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this: it produces an HTML file showing where the slowest plain Python (yellow) is in your code, versus Python that has been converted to pure C (white). That's why I spread the code above over so many lines: the coloring is done per line, so spreading it out helps.)

Full Cython instructions are here:
http://docs.cython.org/src/quickstart/build.html

Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.

汐鸠 2024-10-08 10:38:52

If performance is an issue, you do not want to use ctypes arrays with the star operation (e.g., (ctypes.c_float * size)(*t)).

In my test, pack is fastest, followed by the array module combined with either a cast of the buffer address or the from_buffer function.

import timeit
repeat = 100
setup="from struct import pack; from random import random; import numpy;  from array import array; import ctypes; t = [random() for _ in range(2* 1000)];"
print(timeit.timeit(stmt="v = array('f',t); addr, count = v.buffer_info();x = ctypes.cast(addr,ctypes.POINTER(ctypes.c_float))",setup=setup,number=repeat))
print(timeit.timeit(stmt="v = array('f',t);a = (ctypes.c_float * len(v)).from_buffer(v)",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(*t)',setup=setup,number=repeat))
print(timeit.timeit(stmt="x = pack('f'*len(t), *t);",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(); x[:] = t',setup=setup,number=repeat))
print(timeit.timeit(stmt='x = numpy.array(t,numpy.float32).data',setup=setup,number=repeat))

The array.array approach is slightly faster than Jonathan Hartley's approach in my test while the numpy approach has about half the speed:

python3 convert.py
0.004665990360081196
0.004661010578274727
0.026358536444604397
0.0028003649786114693
0.005843495950102806
0.009067213162779808

The net winner is pack.
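
For readers who want a ctypes array rather than a bytes object, the from_buffer variant timed above can be sketched as follows (Python 3, standard library only; the variable names are illustrative). It shows that the ctypes array is a view onto the memory of the array.array object, not a copy.

```python
import ctypes
from array import array
from random import random

coords = [random() for _ in range(2 * 1000)]

# array('f', ...) converts the Python floats to C floats in one call.
buf = array("f", coords)

# from_buffer wraps the array's existing memory; no data is copied.
c_view = (ctypes.c_float * len(buf)).from_buffer(buf)

# A write through one object is visible through the other.
buf[0] = 0.5
shared = c_view[0] == 0.5
print(len(c_view), shared)
```

Because the two objects share memory, updating elements of buf in place also updates what the ctypes array exposes. The array should not be resized while the view exists.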

遗失的美好 2024-10-08 10:38:52

There's another idea I stumbled across. I don't have time to profile it right now, but in case someone else does:

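 # untested, but I'm fairly confident it runs
 # using 'flattened points' list, i.e. a list of n*2 floats
 from ctypes import c_float
 from random import random
 points = [random() for _ in range(1000 * 2)]
 # instantiate the array type with (); the list already holds n*2 floats
 c_array = (c_float * len(points))()
 c_array[:] = points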

That is, first we create the ctypes array but don't populate it. Then we populate it using slice notation. People smarter than I tell me that assigning to a slice like this may help performance. It allows us to pass a list (or another sequence of the same length) directly on the RHS of the assignment, without having to use the *iterable syntax, which would perform some intermediate wrangling of the iterable. I suspect that this is what happens in the depths of creating pyglet's Batches.

Presumably you could just create c_array once, then just reassign to it (the final line in the above code) every time the points list changes.
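
A minimal sketch of that reuse idea, assuming Python 3 and a point count that stays fixed between updates (the names N, vertex_array and update are illustrative, not from the original post):

```python
from ctypes import c_float
from random import random

N = 1000  # number of 2D points, assumed constant between updates

# Allocate once. The trailing () instantiates the array type.
vertex_array = (c_float * (2 * N))()

def update(coords):
    # Slice assignment requires a sequence of exactly 2 * N floats.
    vertex_array[:] = coords

first = [random() for _ in range(2 * N)]
update(first)

second = [0.25] * (2 * N)
update(second)
print(len(vertex_array), vertex_array[0])
```

If the number of points changes, the slice assignment raises an error, so a new array would have to be allocated in that case.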

There is probably an alternative formulation which accepts the original definition of points (a list of (x,y) tuples.) Something like this:

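 # very untested, likely contains errors
 # using a list of n tuples of two floats
 from itertools import chain
 points = [(random(), random()) for _ in range(1000)]
 c_array = (c_float * (len(points) * 2))()
 # slice assignment needs a sequence with a length, so build a flat list first
 c_array[:] = list(chain.from_iterable(points))

心舞飞扬 2024-10-08 10:38:52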

You can use array (notice also the generator expression instead of the list comprehension):

array("f", (u for point in points for u in point)).tostring()

Another optimization would be to keep the points flattened from the beginning.
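
On current Python 3 the same idea needs tobytes(), because array.tostring() was deprecated and then removed in Python 3.9. A minimal sketch, with illustrative variable names, that also shows one way to keep the coordinates flat from the beginning:

```python
from array import array
from random import random

points = [(random(), random()) for _ in range(1000)]

# Generator expression: no intermediate flattened list is built.
data = array("f", (u for point in points for u in point)).tobytes()

# Keeping the data flat from the start: extend an array('f') directly
# instead of collecting (x, y) tuples in a list.
flat = array("f")
for x, y in points:
    flat.extend((x, y))

print(len(data), flat.tobytes() == data)
```

With the flat array kept around, producing the bytes for OpenGL is a single tobytes() call with no per-element Python work.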
