Creating an interruptible process in Python
I'm creating a Python script that parses a large (but simple) CSV file.
It will take some time to process, so I would like to be able to interrupt the parsing and continue at a later stage.
Currently I have this, which lives in a larger class (unfinished):
Edit:
I have changed the code somewhat, but the system will have to parse over 3 million rows.
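import csv

def parseData(self):
    # Stream the CSV row by row; each row is (id, title, disc).
    with open(self.file) as f:
        for id, title, disc in csv.reader(f):
            print("%-5s %-50s %s" % (id, title, disc))
            l = LegacyData()
            l.old_id = int(id)
            l.name = title
            l.disc_number = disc
            l.parsed = False  # not yet handled by the later processing step
            l.save()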
This is the old code.
def parseData(self):
    # The first line holds the field names.
    fields = next(self.data)
    for row in self.data:
        # Pair each field name with its value, stripping whitespace.
        item = {}
        for name, value in zip(fields, row):
            item[name] = value.strip()
        self.save(item)
Thanks guys.
Comments (4)
If you are on Linux, press Ctrl-Z to suspend the running process. Type "fg" to bring it back to the foreground and continue from where it stopped.
You can use the signal module to catch the event. A parser can install a handler that catches CTRL-C on Windows and stops parsing. I couldn't test this, though, as I'm on Linux at the moment.
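A minimal sketch of the signal-based idea, written for Python 3. The class name InterruptibleParser and the handle_row placeholder are assumptions made for illustration, not code from the answer. The handler only sets a flag, so the row being processed always finishes before the loop stops.

```python
import signal


class InterruptibleParser:
    """Processes rows until they run out or Ctrl-C (SIGINT) is received."""

    def __init__(self, rows):
        self.rows = rows              # any iterator of rows, e.g. a csv.reader
        self.stop_requested = False
        self.processed = 0

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only set a flag so the current row finishes cleanly.
        self.stop_requested = True

    def handle_row(self, row):
        # Hypothetical placeholder for the real per-row work.
        self.processed += 1

    def run(self):
        # signal.signal() must be called from the main thread.
        previous = signal.signal(signal.SIGINT, self.request_stop)
        try:
            for row in self.rows:
                self.handle_row(row)
                if self.stop_requested:   # checked after each finished row
                    break
        finally:
            signal.signal(signal.SIGINT, previous)  # restore the old handler
        return self.processed
```

The returned count says how many rows were handled, so a later run could skip that many rows before continuing.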
The way I'd do it:
Put the actual processing code in a class, and on that class implement the pickle protocol (http://docs.python.org/library/pickle.html), which basically means writing proper __getstate__ and __setstate__ methods.
This class would accept the filename and keep the open file and the CSV reader instance as instance members. The __getstate__ method would save the current file position, and __setstate__ would reopen the file, seek to the saved position, and create a new reader.
I'd perform the actual work in an __iter__ method that yields to an external function after each line is processed.
This external function would run a "main loop" monitoring input for interrupts (sockets, keyboard, the state of a specific file on the filesystem, etc.). If everything is quiet, it just calls for the next iteration of the processor. If an interrupt happens, it pickles the processor state to a specific file on disk.
When starting, the program just has to check whether there is a saved execution; if so, it uses pickle to retrieve the executor object and resumes the main loop.
Here goes some (untested) code - the idea is simple enough:
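A minimal sketch of the approach described above, written for Python 3. The names CsvProcessor, main_loop, load_or_create and the should_stop callback are assumptions chosen for illustration, not the answer's original code. Lines are fed to the CSV reader through readline() because in Python 3 tell() is disabled on a text file while it is being iterated directly.

```python
import csv
import os
import pickle


class CsvProcessor:
    """Iterates over a CSV file; can be pickled mid-way and resumed later."""

    def __init__(self, filename, position=0):
        self.filename = filename
        self.position = position
        self._open()

    def _open(self):
        self._file = open(self.filename, newline="")
        self._file.seek(self.position)
        # Using readline() instead of iterating the file keeps tell() usable.
        self._reader = csv.reader(iter(self._file.readline, ""))

    def __getstate__(self):
        # Save only what is needed to reopen the file at the same place.
        return {"filename": self.filename, "position": self.position}

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._open()

    def __iter__(self):
        for row in self._reader:
            self.process(row)
            self.position = self._file.tell()
            yield row  # hand control back to the main loop

    def process(self, row):
        # Hypothetical placeholder for the real per-row work.
        pass

    def close(self):
        self._file.close()


def main_loop(processor, state_file, should_stop):
    """Run until done or until should_stop() is true; return True when done."""
    for _ in processor:
        if should_stop():
            with open(state_file, "wb") as fh:
                pickle.dump(processor, fh)   # calls __getstate__
            processor.close()
            return False
    processor.close()
    if os.path.exists(state_file):
        os.remove(state_file)                # finished: drop the saved state
    return True


def load_or_create(filename, state_file):
    """Resume a saved run if one exists, otherwise start from the beginning."""
    if os.path.exists(state_file):
        with open(state_file, "rb") as fh:
            return pickle.load(fh)           # calls __setstate__
    return CsvProcessor(filename)
```

Here should_stop can be any zero-argument function, for example one that checks whether a particular file exists on disk or whether a flag was set by a signal handler.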
My Complete Code
I followed the advice of @jsbueno and used a flag, but instead of keeping it in another file I kept it within the class as a variable.
I create a class. When I call it, it asks for ANY input and then begins another process that does my work. While it loops, if I press a key the flag is set, and the flag is only checked when the loop comes around for my next parse. Thus I don't kill the current action.
Adding a process flag in the database for each object from the data I'm calling means I can start this at any time and resume where I left off. Thanks for all your help, guys (especially yours, @jsbueno) - if it wasn't for you I wouldn't have got this class idea.
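A rough sketch of the flag idea, written for Python 3. It is not the poster's actual code: the ResumableJob class, the in-memory list of dicts standing in for database rows, and the use of a thread (rather than a separate process) are all assumptions made to keep the example self-contained. The stop flag is only checked between records, and each record carries its own "processed" marker so a later run skips what is already done.

```python
import threading


class ResumableJob:
    def __init__(self, records):
        self.records = records               # hypothetical stand-in for database rows
        self.stop_flag = threading.Event()   # the flag kept inside the class

    def handle(self, record):
        # Hypothetical placeholder for the real work on one record.
        record["processed"] = True

    def work(self):
        for record in self.records:
            if self.stop_flag.is_set():      # checked only between records
                break
            if record["processed"]:          # already done in an earlier run
                continue
            self.handle(record)

    def run(self, wait_for_key=input):
        worker = threading.Thread(target=self.work)
        worker.start()
        wait_for_key()                       # blocks until Enter is pressed
        self.stop_flag.set()                 # stop after the current record
        worker.join()
```

With this simple layout the main thread keeps waiting for Enter even if the worker finishes first; that is a limitation of the sketch, not of the idea.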