How do I read a CSV file from a stream and process each row as it is written?

Posted on 2024-11-18 15:59:43

I would like to read a CSV file from standard input and process each row as it arrives. My CSV-writing code emits rows one at a time, but my reader waits for the stream to be terminated before iterating over the rows. Is this a limitation of the csv module? Am I doing something wrong?

My reader code:

import csv
import sys
import time


reader = csv.reader(sys.stdin)
for row in reader:
    print "Read: (%s) %r" % (time.time(), row)

My writer code:

import csv
import sys
import time


writer = csv.writer(sys.stdout)
for i in range(8):
    writer.writerow(["R%d" % i, "$" * (i+1)])
    sys.stdout.flush()
    time.sleep(0.5)

Output of python test_writer.py | python test_reader.py:

Read: (1309597426.3) ['R0', '$']
Read: (1309597426.3) ['R1', '$$']
Read: (1309597426.3) ['R2', '$$$']
Read: (1309597426.3) ['R3', '$$$$']
Read: (1309597426.3) ['R4', '$$$$$']
Read: (1309597426.3) ['R5', '$$$$$$']
Read: (1309597426.3) ['R6', '$$$$$$$']
Read: (1309597426.3) ['R7', '$$$$$$$$']

As you can see all print statements are executed at the same time, but I expect there to be a 500ms gap.

Comments (3)

空城缀染半城烟沙 2024-11-25 15:59:43

As it says in the documentation,

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.

And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlying iterator (via PyIter_Next).
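The point that csv.reader simply pulls lines from whatever iterator it is given can be checked with a small sketch. It is written for Python 3, and the helper `logged_lines` and the list `pulled` are hypothetical names introduced only for this illustration; the sample lines mimic what test_writer.py emits.

```python
import csv

pulled = []  # hypothetical log of every line the reader has requested so far


def logged_lines(lines):
    """Yield lines unchanged, recording each one as csv.reader asks for it."""
    for line in lines:
        pulled.append(line)
        yield line


sample = ["R0,$\r\n", "R1,$$\r\n", "R2,$$$\r\n"]
reader = csv.reader(logged_lines(sample))

# Requesting one row makes the reader ask the underlying iterator for lines
first = next(reader)
print(first, len(pulled))
```

In this sketch the reader requests only as many lines as one row needs, which suggests that any delay comes from the iterator underneath it rather than from csv.reader itself.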

So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function. So change the code in test_reader.py to something like this:

for row in csv.reader(iter(sys.stdin.readline, '')):
    print("Read: ({}) {!r}".format(time.time(), row))

For example,

$ python test_writer.py | python test_reader.py
Read: (1388776652.964925) ['R0', '$']
Read: (1388776653.466134) ['R1', '$$']
Read: (1388776653.967327) ['R2', '$$$']
Read: (1388776654.468532) ['R3', '$$$$']
[etc]

Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.
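For anyone who wants to try the two-argument iter approach without setting up a pipe, here is a minimal Python 3 sketch. The function name `read_rows_unbuffered` and the `fake_stdin` stand-in are assumptions made for this illustration; in a real script you would pass sys.stdin instead.

```python
import csv
import io


def read_rows_unbuffered(stream):
    """Yield CSV rows from stream, fetching one line per readline() call."""
    # iter(callable, sentinel) keeps calling stream.readline() until it
    # returns the sentinel '' (which readline() does at end of file)
    lines = iter(stream.readline, "")
    for row in csv.reader(lines):
        yield row


# Hypothetical stand-in for sys.stdin so the sketch runs on its own
fake_stdin = io.StringIO("R0,$\r\nR1,$$\r\n", newline="")
rows = list(read_rows_unbuffered(fake_stdin))
print(rows)
```

Wrapping the loop in a generator keeps the row-by-row behavior while letting the caller decide what to do with each row as it arrives.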

幸福%小乖 2024-11-25 15:59:43

Maybe it's a limitation. Read this http://docs.python.org/using/cmdline.html#cmdoption-unittest-discover-u

Note that there is internal buffering
in file.readlines() and File Objects
(for line in sys.stdin) which is not
influenced by this option. To work
around this, you will want to use
file.readline() inside a while 1:
loop.

I modified test_reader.py as follows:

import csv, sys, time

while True:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' at EOF; stop instead of looping forever
        break
    print "Read: (%s) %r" % (time.time(), line)

Output

python test_writer.py | python test_reader.py
Read: (1309600865.84) 'R0,$\r\n'
Read: (1309600865.84) 'R1,$$\r\n'
Read: (1309600866.34) 'R2,$$$\r\n'
Read: (1309600866.84) 'R3,$$$$\r\n'
Read: (1309600867.34) 'R4,$$$$$\r\n'
Read: (1309600867.84) 'R5,$$$$$$\r\n'
Read: (1309600868.34) 'R6,$$$$$$$\r\n'
Read: (1309600868.84) 'R7,$$$$$$$$\r\n'
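The readline() loop in this answer prints raw lines rather than parsed rows. One possible way to get rows back, sketched here for Python 3, is to hand each line to csv.reader on its own. The function name `parse_each_line` is hypothetical, and the approach assumes that no quoted field contains an embedded newline, since such a record would span several lines.

```python
import csv
import io


def parse_each_line(stream):
    """Read stream line by line and parse every line as one CSV record."""
    while True:
        line = stream.readline()
        if not line:  # '' signals end of file
            break
        # csv.reader accepts any iterable of strings, so a one-item list works
        yield next(csv.reader([line]))


# Hypothetical stand-in for sys.stdin so the sketch runs on its own
fake_stdin = io.StringIO("R0,$\r\nR1,$$\r\n", newline="")
parsed = list(parse_each_line(fake_stdin))
print(parsed)
```

This keeps the explicit loop from the answer while still producing lists of fields, at the cost of not supporting multi-line records.
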
寒江雪… 2024-11-25 15:59:43

You are flushing stdout, but not stdin.

sys.stdin also has a flush() method; try calling it after each line read if you really want to disable the buffering.
