How do I read a CSV file from a stream and process each line as it is written?
I would like to read a CSV file from standard input and process each row as it arrives. My CSV-writing code writes rows one by one, but my reader waits for the stream to be terminated before iterating over the rows. Is this a limitation of the csv module? Am I doing something wrong?

My reader code:

    import csv
    import sys
    import time

    reader = csv.reader(sys.stdin)
    for row in reader:
        print "Read: (%s) %r" % (time.time(), row)

My writer code:

    import csv
    import sys
    import time

    writer = csv.writer(sys.stdout)
    for i in range(8):
        writer.writerow(["R%d" % i, "$" * (i+1)])
        sys.stdout.flush()
        time.sleep(0.5)

Output of python test_writer.py | python test_reader.py:

    Read: (1309597426.3) ['R0', '$']
    Read: (1309597426.3) ['R1', '$$']
    Read: (1309597426.3) ['R2', '$$$']
    Read: (1309597426.3) ['R3', '$$$$']
    Read: (1309597426.3) ['R4', '$$$$$']
    Read: (1309597426.3) ['R5', '$$$$$$']
    Read: (1309597426.3) ['R6', '$$$$$$$']
    Read: (1309597426.3) ['R7', '$$$$$$$$']

As you can see, all the print statements are executed at the same time, but I expect a 500 ms gap between them.
Comments (3)
As it says in the documentation, the file object's next() method uses a hidden read-ahead buffer. And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlying iterator (via PyIter_Next).

So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function. So change the code in test_reader.py to iterate over csv.reader(iter(sys.stdin.readline, '')) instead of csv.reader(sys.stdin).

Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.

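The approach described in this answer can be sketched as below. This is a minimal illustration under stated assumptions, not necessarily the answerer's exact code: the helper name unbuffered_rows and the in-memory sample are hypothetical, and print() is used so that the sketch also runs under Python 3.

```python
import csv
import io


def unbuffered_rows(stream):
    """Return a csv reader that pulls lines from stream via readline().

    iter(stream.readline, '') calls readline() repeatedly and stops when it
    returns the empty string (end of file), so the csv reader does not use
    the file object's own iteration protocol.
    """
    return csv.reader(iter(stream.readline, ''))


# In-memory stand-in for sys.stdin (hypothetical sample data).
sample = io.StringIO(u"R0,$\r\nR1,$$\r\n")
rows = list(unbuffered_rows(sample))
print(rows)
```

In test_reader.py the same helper would be called as unbuffered_rows(sys.stdin), so that each row should be handed to the loop as soon as the writer flushes it.
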
Maybe it's a limitation. Read this: http://docs.python.org/using/cmdline.html#cmdoption-unittest-discover-u
I modified test_reader.py as follows:
Output:

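The Python 2 documentation for the -u option notes that the internal buffering behind "for line in sys.stdin" is not affected by that option, and suggests calling readline() inside a loop instead. A minimal sketch of that workaround follows. The helper name iter_rows is hypothetical, and the sketch assumes each CSV record occupies a single physical line (a quoted field containing a newline would be split incorrectly).

```python
import csv
import io


def iter_rows(stream):
    """Read one line at a time with readline() and parse it as CSV."""
    while True:
        line = stream.readline()
        if not line:  # readline() returns '' only at end of file
            break
        # Parse just this one line; the inner loop yields the row it contains.
        for row in csv.reader([line]):
            yield row


# In-memory stand-in for sys.stdin (hypothetical sample data).
for row in iter_rows(io.StringIO(u"R0,$\r\nR1,$$\r\n")):
    print(row)
```

Passing sys.stdin instead of the sample stream would apply the same loop to the piped input.
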
You are flushing stdout, but not stdin. sys.stdin also has a flush() method; try calling it after each line read if you really want to disable the buffering.