re.compile(pattern, file) call crashes the system
I have a file I need to parse. The parsing is built incrementally, so that on each iteration the expressions become more case-specific.
The code segment that overloads the system looks roughly like this:
for item in ret:
    pat = r'a\sstyle=".+class="VEAPI_Pushpin"\sid="msftve(.+?)".+>%s<' % item[1]
    r = re.compile(pat, re.DOTALL)
    match = r.findall(f)
The file is a rather large HTML file (parsed from Bing Maps), and each answer must match its exact id.
Before applying this change the workflow was very good. Is there anything I can do to avoid this? Or to optimize the code?
Comments (1)
My only guess is that you are getting too many matches and running out of memory. Though this doesn't seem very reasonable, it might be the case. Try using finditer instead of findall to get one match at a time without creating a monster list of matches. If that doesn't fix your problem, you might have stumbled on a more serious bug in the re module.
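A minimal sketch of the finditer suggestion, assuming the page contents are already loaded into a string. The helper name iter_ids, the sample HTML, and the simplified pattern are hypothetical and only for illustration; the pattern is deliberately narrower than the one in the question and is not a drop-in replacement for it.

```python
import re

def iter_ids(html, label):
    """Yield msftve ids lazily, one match at a time (hypothetical helper)."""
    # re.escape guards against regex metacharacters inside the label text.
    # Assumption: the label directly follows the opening tag; the real
    # Bing Maps markup may differ.
    pat = re.compile(
        r'class="VEAPI_Pushpin"\sid="msftve([^"]+)"[^>]*>%s<' % re.escape(label)
    )
    # finditer returns an iterator of match objects, so no full list of
    # matches is built in memory the way findall would build one.
    for m in pat.finditer(html):
        yield m.group(1)

# Made-up sample input standing in for the real page.
sample = (
    '<a style="x" class="VEAPI_Pushpin" id="msftve_1">Cafe</a>'
    '<a style="y" class="VEAPI_Pushpin" id="msftve_2">Bank</a>'
)

for found_id in iter_ids(sample, "Bank"):
    print(found_id)
```

Because the generator yields one id at a time, the caller can stop after the first hit (for example with next(iter_ids(html, label), None)) instead of collecting every match.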