Generator function performance
I'm trying to understand the performance of a generator function. I've used cProfile and the pstats module to collect and inspect profiling data. The function in question is this:
def __iter__(self):
    delimiter = None
    inData = self.inData
    lenData = len(inData)
    cursor = 0
    while cursor < lenData:
        if delimiter:
            mo = self.stringEnd[delimiter].search(inData[cursor:])
        else:
            mo = self.patt.match(inData[cursor:])
        if mo:
            mo_lastgroup = mo.lastgroup
            mstart = cursor
            mend = mo.end()
            cursor += mend
            delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, mend))
        else:
            raise SyntaxError("Unable to tokenize text starting with: \"%s\"" % inData[cursor:cursor+200])
self.inData is a unicode text string, self.stringEnd is a dict with 4 simple regexes, and self.patt is one big regex. The whole thing splits the big string into smaller strings, one by one.
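For illustration, a class along the following lines would fit that description; the group names and patterns here are assumptions made for the sketch, not the original code:

import re

class Scanner(object):
    # Hypothetical setup matching the description above: self.patt is one big
    # alternation of named groups, and self.stringEnd maps a delimiter name
    # (sent back by the consumer) to a small regex that finds the string end.
    # The question mentions four such regexes; only two are sketched here.
    def __init__(self, inData):
        self.inData = inData
        self.patt = re.compile(
            r"(?P<number>\d+)"
            r"|(?P<name>[A-Za-z_]\w*)"
            r"|(?P<dquote>\")"
            r"|(?P<squote>')"
            r"|(?P<space>\s+)"
        )
        self.stringEnd = {
            "dquote": re.compile(r'(?P<string>[^"]*)"'),
            "squote": re.compile(r"(?P<string>[^']*)'"),
        }

    def __iter__(self):
        # the generator shown in the question goes here
        ...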
Profiling a program that uses it, I found that the biggest part of the program's run time is spent in this function:
In [800]: st.print_stats("Scanner.py:124")
463263 function calls (448688 primitive calls) in 13.091 CPU seconds
Ordered by: cumulative time
List reduced from 231 to 1 due to restriction <'Scanner.py:124'>
ncalls tottime percall cumtime percall filename:lineno(function)
10835 11.465 0.001 11.534 0.001 Scanner.py:124(__iter__)
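For reference, output like the above would come from a cProfile/pstats session roughly like the following; the profile file name and the main() entry point are placeholders, and 124 is simply the source line of __iter__ in Scanner.py:

import cProfile
import pstats

# Run the program under the profiler and dump the raw stats to a file,
# then load them with pstats and restrict the report to the generator's line.
cProfile.run("main()", "scanner.prof")   # "main()" is a placeholder entry point
st = pstats.Stats("scanner.prof")
st.sort_stats("cumulative")
st.print_stats("Scanner.py:124")
st.print_callees("Scanner.py:124")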
But looking at the profile of the function itself, not much time is spent in its sub-calls:
In [799]: st.print_callees("Scanner.py:124")
Ordered by: cumulative time
List reduced from 231 to 1 due to restriction <'Scanner.py:124'>
Function called...
ncalls tottime cumtime
Scanner.py:124(__iter__) -> 10834 0.006 0.006 {built-in method end}
10834 0.009 0.009 {built-in method group}
8028 0.030 0.030 {built-in method match}
2806 0.025 0.025 {built-in method search}
1 0.000 0.000 {len}
The rest of the function is not much besides the while loop, assignments and if-else. Even the send method on the generator, which I use, is fast:
ncalls tottime percall cumtime percall filename:lineno(function)
13643/10835 0.007 0.000 11.552 0.001 {method 'send' of 'generator' objects}
Is it possible that the yield, passing a value back to the consumer, is taking the majority of the time?! Anything else that I'm not aware of?
EDIT:
I probably should have mentioned that the generator function __iter__ is a method of a small class, so self refers to an instance of this class.
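The driving code is not shown in the question; a minimal sketch of a consumer that would match the send profile above might look like this ("dquote" and "squote" are the hypothetical group names from the earlier sketch):

def consume(scanner):
    # Hypothetical consumer, for illustration only: prime the generator with
    # next(), then answer each token with send(), passing back either None or
    # a delimiter key that switches the scanner into string-end mode.
    tokens = []
    gen = iter(scanner)        # calls Scanner.__iter__ and returns the generator
    try:
        tok = next(gen)        # run up to the first yield
        while True:
            group, text, start, end = tok
            tokens.append((group, text))
            delimiter = group if group in ("dquote", "squote") else None
            tok = gen.send(delimiter)
    except StopIteration:
        pass
    return tokens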
Answers (2)
This is actually the answer of Dunes, who unfortunately only gave it as a comment and doesn't seem inclined to put it in a proper answer.
The main performance culprit was the string slicing. Some timing measurements showed that slicing performance degrades noticeably with big slices (meaning taking a big slice from an already big string). To work around that, I now use the pos parameter of the regex object methods instead of slicing the input.
Thanks to all who helped.
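The answer's own code snippet is not preserved in this copy; a minimal sketch of the pos-based rewrite, assuming the same generator as in the question, would look roughly like this:

def __iter__(self):
    # Reconstruction, not the original code from the answer: match() and
    # search() are anchored at `cursor` via their pos argument, so no new
    # string object is created on every iteration.
    delimiter = None
    inData = self.inData
    lenData = len(inData)
    cursor = 0
    while cursor < lenData:
        if delimiter:
            mo = self.stringEnd[delimiter].search(inData, cursor)
        else:
            mo = self.patt.match(inData, cursor)
        if mo:
            mo_lastgroup = mo.lastgroup
            mstart = cursor
            cursor = mo.end()   # end() is now an absolute index into inData
            delimiter = (yield (mo_lastgroup, mo.group(mo_lastgroup), mstart, cursor))
        else:
            raise SyntaxError("Unable to tokenize text starting with: \"%s\""
                              % inData[cursor:cursor + 200])

Because the regex engine starts matching at cursor inside the original string, the per-iteration cost no longer grows with the amount of text left to scan, whereas inData[cursor:] copies the whole remainder on every token.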
If reading your sample correctly, you are taking a generator object, putting it into delimiter, and using it for an array lookup. That may not be your speed issue, but I'm pretty sure that's a bug.