Building an interleaved buffer for pyopengl and numpy
I'm trying to batch up a bunch of vertices and texture coords in an interleaved array before sending it to pyOpengl's glInterleavedArrays/glDrawArrays. The only problem is that I'm unable to find a suitably fast way to append data into a numpy array.
Is there a better way to do this? I would have thought it would be quicker to preallocate the array and then fill it with data, but instead, generating a Python list and converting it to a numpy array is "faster". Although 15 ms for 4096 quads seems slow.
I have included some example code and its timings.
#!/usr/bin/python
import timeit
import numpy
import ctypes
import random

USE_RANDOM=True
USE_STATIC_BUFFER=True

STATIC_BUFFER = numpy.empty(4096*20, dtype=numpy.float32)

def render(i):
    # pretend these are different each time
    if USE_RANDOM:
        tex_left, tex_right, tex_top, tex_bottom = random.random(), random.random(), random.random(), random.random()
        left, right, top, bottom = random.random(), random.random(), random.random(), random.random()
    else:
        tex_left, tex_right, tex_top, tex_bottom = 0.0, 1.0, 1.0, 0.0
        left, right, top, bottom = -1.0, 1.0, 1.0, -1.0

    ibuffer = (
        tex_left, tex_bottom, left, bottom, 0.0,    # Lower left corner
        tex_right, tex_bottom, right, bottom, 0.0,  # Lower right corner
        tex_right, tex_top, right, top, 0.0,        # Upper right corner
        tex_left, tex_top, left, top, 0.0,          # upper left
    )
    return ibuffer

# create python list.. convert to numpy array at end
def create_array_1():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data
    ibuffer = numpy.array(ibuffer, dtype=numpy.float32)
    return ibuffer

# numpy.array, placing individually by index
def create_array_2():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        for v in data:
            ibuffer[index] = v
            index += 1
    return ibuffer

# using slicing
def create_array_3():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer[index:index+20] = data
        index += 20
    return ibuffer

# using numpy.concatenate on a list of ibuffers
def create_array_4():
    ibuffer_concat = []
    for x in xrange(4096):
        data = render(x)
        # converting makes a diff!
        data = numpy.array(data, dtype=numpy.float32)
        ibuffer_concat.append(data)
    return numpy.concatenate(ibuffer_concat)

# using numpy array.put
def create_array_5():
    if USE_STATIC_BUFFER:
        ibuffer = STATIC_BUFFER
    else:
        ibuffer = numpy.empty(4096*20, dtype=numpy.float32)
    index = 0
    for x in xrange(4096):
        data = render(x)
        ibuffer.put( xrange(index, index+20), data )
        index += 20
    return ibuffer

# using ctypes array
CTYPES_ARRAY = ctypes.c_float*(4096*20)

def create_array_6():
    ibuffer = []
    for x in xrange(4096):
        data = render(x)
        ibuffer += data
    ibuffer = CTYPES_ARRAY(*ibuffer)
    return ibuffer

def equals(a, b):
    for i, v in enumerate(a):
        if b[i] != v:
            return False
    return True

if __name__ == "__main__":
    number = 100
    # if random, don't try and compare arrays
    if not USE_RANDOM and not USE_STATIC_BUFFER:
        a = create_array_1()
        assert equals( a, create_array_2() )
        assert equals( a, create_array_3() )
        assert equals( a, create_array_4() )
        assert equals( a, create_array_5() )
        assert equals( a, create_array_6() )

    t = timeit.Timer( "testing2.create_array_1()", "import testing2" )
    print 'from list:', t.timeit(number)/number*1000.0, 'ms'
    t = timeit.Timer( "testing2.create_array_2()", "import testing2" )
    print 'array: indexed:', t.timeit(number)/number*1000.0, 'ms'
    t = timeit.Timer( "testing2.create_array_3()", "import testing2" )
    print 'array: slicing:', t.timeit(number)/number*1000.0, 'ms'
    t = timeit.Timer( "testing2.create_array_4()", "import testing2" )
    print 'array: concat:', t.timeit(number)/number*1000.0, 'ms'
    t = timeit.Timer( "testing2.create_array_5()", "import testing2" )
    print 'array: put:', t.timeit(number)/number*1000.0, 'ms'
    t = timeit.Timer( "testing2.create_array_6()", "import testing2" )
    print 'ctypes float array:', t.timeit(number)/number*1000.0, 'ms'
Timings using random numbers:
$ python testing2.py
from list: 15.0486779213 ms
array: indexed: 24.8184704781 ms
array: slicing: 50.2214789391 ms
array: concat: 44.1691994667 ms
array: put: 73.5879898071 ms
ctypes float array: 20.6674289703 ms
edit note: changed code to produce random numbers for each render to reduce object reuse and to simulate different vertices each time.
edit note 2: added a static buffer and forced all numpy.empty() calls to use dtype=float32
note 1/Apr/2010: still no progress and I don't really feel that any of the answers have solved the problem yet.
3 Answers
The reason that create_array_1 is so much faster seems to be that the items in the (Python) list all point to the same object. You can see this if you test object identity inside the subroutines.
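A minimal sketch of such a check (an assumption on my part: it relies on the constant USE_RANDOM = False case, where every call to render() returns the same literal floats, which CPython shares as one object):
    # Hypothetical identity check: drop it into create_array_1 right after the
    # loop (before the numpy.array conversion) and into create_array_2 just
    # before its return statement.
    print ibuffer[0] is ibuffer[4]   # compares object identity, not value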
In create_array_1 this is true (before you create the numpy array), while in create_array_2 it is always going to be false. I guess this means that the data conversion step only has to happen once in create_array_1, while it happens 4096 times in create_array_2.
If this is the reason, I guess the timings will be different if you make render generate random data. create_array_5 is slowest as it makes a new array each time you add data to the end.
The benefit of numpy is not realized by simply storing the data in an array; it is achieved by performing operations across many elements of an array instead of one element at a time. Your example can be boiled down and optimized to this trivial solution, with orders of magnitude speedup:
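A minimal sketch of that boiled-down version (assuming the constant USE_RANDOM = False data, where every quad is identical):
def create_array_tiled():
    # one quad's 20 floats, repeated 4096 times in a single C-level copy
    quad = numpy.array(render(0), dtype=numpy.float32)
    return numpy.tile(quad, 4096)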
...that's not very helpful, but it does kind of hint at where the costs are.
Here is an incremental improvement that beats the list append solution (but only slightly) by eliminating the iteration over 4096 elements.
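One possible shape for such a step (a sketch, not the only way to write it): keep calling render() per quad but let numpy.fromiter consume the flattened value stream directly, skipping the intermediate Python list.
import itertools

def create_array_fromiter():
    # render() is still called 4096 times, but the values go straight into a
    # float32 array whose size is known up front
    values = itertools.chain.from_iterable(render(x) for x in xrange(4096))
    return numpy.fromiter(values, dtype=numpy.float32, count=4096*20)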
... but not the speedup we are looking for.
The real savings will be realized by recasting the render routine so that you don't have to create a Python object for every value that ends up in the buffer. Where do tex_left, tex_right, etc. come from? Are they calculated or read?
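To illustrate the idea, a sketch of such a recast under the assumption that those values can be computed with array operations (here numpy.random merely stands in for whatever really produces them):
def create_array_vectorized(n=4096):
    # hypothetical recast of render(): produce all per-quad values as arrays,
    # then fill the interleaved (tex_u, tex_v, x, y, z) layout per corner
    tex_left, tex_right, tex_top, tex_bottom = numpy.random.random((4, n)).astype(numpy.float32)
    left, right, top, bottom = numpy.random.random((4, n)).astype(numpy.float32)

    buf = numpy.zeros((n, 4, 5), dtype=numpy.float32)  # n quads, 4 corners, 5 floats; z stays 0.0
    buf[:, 0, 0], buf[:, 0, 1], buf[:, 0, 2], buf[:, 0, 3] = tex_left,  tex_bottom, left,  bottom
    buf[:, 1, 0], buf[:, 1, 1], buf[:, 1, 2], buf[:, 1, 3] = tex_right, tex_bottom, right, bottom
    buf[:, 2, 0], buf[:, 2, 1], buf[:, 2, 2], buf[:, 2, 3] = tex_right, tex_top,    right, top
    buf[:, 3, 0], buf[:, 3, 1], buf[:, 3, 2], buf[:, 3, 3] = tex_left,  tex_top,    left,  top
    return buf.reshape(n*20)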
I know it seems strange, but have you tried fromfile?