Win32 event loop appears to be the program bottleneck
I am making a game in Python with Pyglet. I have just finished the display part and am running into speed problems. Like a good person, I profiled it and got the following (uninteresting bits excluded; currently it just redraws the screen in random magenta and white whenever I push an arrow key):
15085326 function calls (15085306 primitive calls) in 32.166 seconds

Ordered by: standard name

  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       1    0.000    0.000   32.168   32.168 <string>:1(<module>)
  120139    0.499    0.000    0.686    0.000 allocation.py:132(alloc)
  120121    0.563    0.000    0.844    0.000 allocation.py:268(dealloc)
      99    0.743    0.008   20.531    0.207 engine.py:58(Update)
  237600    0.796    0.000   11.995    0.000 sprite.py:349(_set_texture)
  120121    0.677    0.000    9.062    0.000 sprite.py:365(_create_vertex_list)
  357721    1.487    0.000    3.478    0.000 sprite.py:377(_update_position)
  420767    0.786    0.000    2.054    0.000 vertexbuffer.py:421(get_region)
  715442    0.859    0.000    1.280    0.000 vertexbuffer.py:467(invalidate)
       1    9.674    9.674   32.168   32.168 win32.py:46(run)
     180    0.007    0.000    1.771    0.010 win32.py:83(_timer_func)
  237600    0.416    0.000   17.069    0.000 window.py:60(SetTile)
  237600    0.646    0.000    2.174    0.000 window.py:72(GetTileTexture)
Pretty much everything that took less than 0.5 seconds of total time has been removed; it was mostly stuff that couldn't be a problem.
This is the result of me hitting the keyboard for half a minute. For the most part, I got 2 or 3 screen changes per second. I would like it to be as fast as I can hit the keyboard. Heck, my aim is a good 50-60 fps.
What worries me is the roughly 10 seconds that the win32 run spent in itself rather than in subfunctions. It could be idle time (even though there is a pyglet idle), but wouldn't that be spent drawing?
The part I thought was slow, the window SetTile part, was actually fast. To deal with the tiles, I have a 2D list of sprites that represent them on screen, and I simply alter their images. I don't think that's an issue.
The other potential problem I saw was my Update: I have to iterate across ~2400 tiles each time it is called. However, it doesn't seem all that bad: only 0.7 seconds for 90 keypresses.
I'm starting to wonder whether this is a sign that Python is too slow for my needs. Then again, it shouldn't be. What I'm doing is not very computationally heavy.
tl;dr Is the win32 event loop in Python my bottleneck, and what does that mean? If not, where may I have lost speed?
Code available if needed. I assume it's Pywin32 used by pyglet.
REVISED Answer:
I deleted the columns that carry worthless information, namely self time, call count, and per-call time.
Then I sorted the rows in descending order of cumtime and discarded the small ones.
Cumtime means the total amount of time that particular routine was on the call stack.
So naturally some high-level routines were on the stack for all 32 seconds.
Others were on the stack a smaller fraction of the time.
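As a rough illustration of that re-sorting, here is a minimal sketch using Python's standard cProfile and pstats modules. The functions top, middle and leaf are hypothetical stand-ins, not code from the question; the point is only that sorting by cumulative time puts the routines that were on the stack longest at the top, with the top-level routine covering nearly the whole run.

```python
import cProfile
import io
import pstats


def leaf(n):
    # Does the actual work; its time is included in every caller's cumtime.
    return sum(i * i for i in range(n))


def middle():
    # Hypothetical mid-level routine that calls leaf repeatedly.
    return [leaf(20_000) for _ in range(20)]


def top():
    # Hypothetical top-level routine: on the stack for the whole run.
    middle()
    return leaf(1_000)


def cumulative_report(func, limit=5):
    """Profile func() and return (text report, {function name: cumtime})."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time, descending, and print only the top rows.
    stats.sort_stats("cumulative").print_stats(limit)

    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    cumtimes = {}
    for (_file, _line, name), (_cc, _nc, _tt, ct, _callers) in stats.stats.items():
        cumtimes[name] = max(ct, cumtimes.get(name, 0.0))
    return out.getvalue(), cumtimes


if __name__ == "__main__":
    report, _ = cumulative_report(top)
    print(report)
```

In the printed report, top should appear with the largest cumtime, and middle close behind it, even though neither does much work itself.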
For example, _set_texture was active about 1/3 of the time, and _create_vertex_list was also active about 1/3 of the time. That suggests vertices are being created a lot rather than being re-used, so maybe you could save about 30% of the time by not recreating them.
But that's just a guess. There is no need to guess.
What you need to know is the fraction of time that statements (not just functions) in your code were active on the stack.
You need to know that because if there is a performance problem, it is such a line of code.
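One way to get a line-level estimate of that fraction is to sample the call stack at intervals and count how often each line appears. The following is only a sketch under assumptions, not necessarily the method the answer refers to: it relies on sys._current_frames(), which is a CPython-specific function meant for specialized uses, and the functions rebuild and update are hypothetical stand-ins for expensive game code.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stack(target, interval=0.005):
    """Run target() while a helper thread samples this thread's call stack.

    Returns (number of samples, Counter mapping (filename, lineno, function)
    to the number of samples in which that line was on the stack).
    """
    main_id = threading.get_ident()
    counts = collections.Counter()
    samples = 0
    done = threading.Event()

    def sampler():
        nonlocal samples
        # wait() returns False on timeout, True once done is set.
        while not done.wait(interval):
            frame = sys._current_frames().get(main_id)
            if frame is None:
                continue
            samples += 1
            # A set, so a recursive line is counted once per sample.
            lines = {(f.f_code.co_filename, lineno, f.f_code.co_name)
                     for f, lineno in traceback.walk_stack(frame)}
            counts.update(lines)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        done.set()
        thread.join()
    return samples, counts


def rebuild():
    # Hypothetical expensive call: busy-waits for about 3 ms.
    deadline = time.perf_counter() + 0.003
    while time.perf_counter() < deadline:
        pass


def update():
    # Hypothetical per-frame routine that calls rebuild many times.
    for _ in range(100):
        rebuild()


if __name__ == "__main__":
    total, line_counts = sample_stack(update)
    for (filename, lineno, func), n in line_counts.most_common(5):
        print(f"{n / max(total, 1):6.1%}  {filename}:{lineno} ({func})")
```

A line that shows up in a large fraction of the samples, such as the call to rebuild inside update here, is the kind of statement worth looking at first; the percentages are estimates and get steadier with more samples.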
Here's how the problem can be found if you have one.
The profiler seems based on gprof, and here are some comments about that.