Python: itertools.islice not working in a loop

Posted 2024-10-17 14:48:04

I have code like this:

#opened file f
goto_line = num_lines #Total number of lines
while not found:
   line_str = next(itertools.islice(f, goto_line - 1, goto_line))
   goto_line = goto_line // 2
   #checks for data, sets found to True if needed

line_str is correct on the first pass, but every pass after that reads a different line than it should.

So for example, goto_line starts off as 1000. It reads line 1000 just fine. Then the next loop, goto_line is 500 but it doesn't read line 500. It reads some line closer to 1000.

I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.

I did try linecache, but I typically don't run this code more than once on the same file.

Comments (2)

青丝拂面 2024-10-24 14:48:04

Python iterators can be consumed only once. This is easiest to see by example. The following code

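from itertools import islice

a = range(10)
i = iter(a)  # one iterator, shared by every islice() call below
print(list(islice(i, 1, 3)))  # consumes 0, 1, 2
print(list(islice(i, 1, 3)))  # consumes 3, 4, 5
print(list(islice(i, 1, 3)))  # consumes 6, 7, 8
print(list(islice(i, 1, 3)))  # only 9 is left, and it is skipped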

prints

[1, 2]
[4, 5]
[7, 8]
[]

The slicing always starts where we stopped last time.

The easiest way to make your code work is to use f.readlines() to get a list of the lines in the file and then use normal Python list indexing or slicing ([i] or [i:j]). If you really want to use islice(), you could start reading the file from the beginning each time by calling f.seek(0), but this will be very inefficient, because every lookup re-reads all the lines before the one you want.
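Both options can be sketched roughly as follows. The helper name read_line_rewinding is made up for illustration, and io.StringIO stands in for a real file opened in text mode. Treat it as a minimal sketch under those assumptions, not a tuned solution.

```python
import io
from itertools import islice


def read_line_rewinding(f, line_no):
    """Return line `line_no` (1-based) by rewinding and skipping forward.

    Hypothetical helper: every call re-reads the file from the start,
    which is why this gets slow for large files.
    """
    f.seek(0)  # reset the position so islice counts from line 1 again
    return next(islice(f, line_no - 1, line_no))


# Stand-in for a real file: 1000 lines, "line 1" .. "line 1000"
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))

first = read_line_rewinding(f, 1000)
second = read_line_rewinding(f, 500)

# Alternative: load every line once, then index the list (costs memory)
f.seek(0)
lines = f.readlines()
from_list = lines[500 - 1]  # lists are 0-based, line numbers are 1-based
```

The list version pays the memory cost once and makes each later lookup cheap. The seek(0) version keeps memory flat but re-reads up to the requested line on every call.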

掩饰不了的爱 2024-10-24 14:48:04

You cannot (this way - perhaps there is some way depending on how the file is opened) go back in the file. The standard file iterator (in fact, most iterators - Python's iterator protocol only supports forward iterators) moves only forward. So after reading k lines, reading another k/2 lines actually gives you line k + k/2.
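The drift is easy to reproduce with an in-memory stand-in for the file. The sketch below assumes a 2000-line file (so the second read has lines left to consume) and uses io.StringIO in place of a real file object.

```python
import io
from itertools import islice

# Stand-in for a real file: 2000 lines, "line 1" .. "line 2000"
f = io.StringIO("".join("line %d\n" % n for n in range(1, 2001)))

goto_line = 1000
# Skips lines 1..999 and returns line 1000; the file is now positioned after it
first = next(islice(f, goto_line - 1, goto_line))

goto_line = goto_line // 2  # 500
# Counts from the current position: skips 499 more lines, then reads one
second = next(islice(f, goto_line - 1, goto_line))

print(first.strip())   # line 1000
print(second.strip())  # line 1500, not line 500
```

If the file had only 1000 lines, the second call would find nothing left to read and next() would raise StopIteration instead.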

You could try reading the whole file into memory, but you have a lot of data, so memory consumption probably becomes an issue. You could use file.seek to move around in the file, but that's still a lot of work. Perhaps you could use a memory-mapped file? Jumping straight to a given line that way is only possible if lines are fixed-size, though. If necessary, you could pre-calculate the line numbers you'd like to check and save all those lines (shouldn't be too many, roughly int(log_2(line_count)) + 1 if I'm not mistaken) in one iteration, so you don't have to scroll back after reading the whole file.
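The pre-calculation idea could look something like this. The names halving_line_numbers and collect_lines are invented for this sketch, and it assumes the search simply halves the line number each time, as in the question. A search that picks its next line based on what it just read would need a different approach.

```python
import io


def halving_line_numbers(start):
    """Line numbers visited when halving repeatedly: start, start // 2, ..., 1."""
    numbers = []
    n = start
    while n >= 1:
        numbers.append(n)
        n //= 2
    return numbers


def collect_lines(f, wanted):
    """Read f once, front to back, and return {line_no: text} for wanted lines."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    for line_no, text in enumerate(f, start=1):
        if line_no in wanted:
            found[line_no] = text
        if line_no >= last:
            break  # nothing of interest past the highest wanted line
    return found


# Stand-in for a real file: 1000 lines, "line 1" .. "line 1000"
f = io.StringIO("".join("line %d\n" % n for n in range(1, 1001)))

numbers = halving_line_numbers(1000)
lines = collect_lines(f, numbers)

for n in numbers:
    print(n, lines[n].strip())
```

For 1000 lines this keeps 10 lines in memory, which matches int(log_2(1000)) + 1, and the file is read exactly once.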
