Python: itertools.islice not working in a loop

Posted on 2024-10-17 14:48:04

I have code like this:

# opened file f
goto_line = num_lines  # total number of lines
while not found:
    line_str = next(itertools.islice(f, goto_line - 1, goto_line))
    goto_line = goto_line/2
    # checks for data, sets found to True if needed

line_str is correct on the first pass, but every pass after that reads a different line than it should.

So for example, goto_line starts off as 1000. It reads line 1000 just fine. Then the next loop, goto_line is 500 but it doesn't read line 500. It reads some line closer to 1000.

I'm trying to read specific lines in a large file without reading more than necessary. Sometimes it jumps backwards to a line and sometimes forward.

I did try linecache, but I typically don't run this code more than once on the same file.

青丝拂面 2024-10-24 14:48:04

Python iterators can be consumed only once. This is easiest seen by example. The following code

from itertools import islice
a = range(10)
i = iter(a)
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))
print(list(islice(i, 1, 3)))

prints

[1, 2]
[4, 5]
[7, 8]
[]

The slicing always starts where we stopped last time.

The easiest way to make your code work is to call f.readlines() to get a list of the lines in the file, and then use normal Python list indexing or slicing ([i:j]). If you really want to use islice(), you could call f.seek(0) to start reading the file from the beginning each time, but this will be very inefficient.
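Here is a minimal sketch of both suggestions in Python 3. An io.StringIO with 1000 lines stands in for the opened file f, so the example runs without a file on disk; that stand-in is an assumption. The helper names read_line_via_list and read_line_via_seek are hypothetical. Line numbers are 1-based, as in the question, and the halving uses integer division so goto_line stays a valid index.

```python
import io
from itertools import islice


def read_line_via_list(lines, lineno):
    """Return 1-based line `lineno` from a list built once with f.readlines()."""
    return lines[lineno - 1]


def read_line_via_seek(f, lineno):
    """Rewind to the start, then skip lineno - 1 lines with islice.

    Every call rereads the file from the beginning, which is why this
    approach is inefficient for large files.
    """
    f.seek(0)
    return next(islice(f, lineno - 1, lineno))


# Hypothetical stand-in for the question's opened file f: "line 1\n" ... "line 1000\n".
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 1001)))

lines = f.readlines()  # reads the whole file into memory once

results = []
goto_line = 1000
while goto_line >= 1:
    results.append((read_line_via_list(lines, goto_line), read_line_via_seek(f, goto_line)))
    goto_line //= 2  # integer division keeps goto_line usable as a line number

print([line.strip() for line, _ in results[:3]])  # ['line 1000', 'line 500', 'line 250']
```

The list version costs memory proportional to the file size. The seek version costs one extra partial read of the file per lookup.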

掩饰不了的爱 2024-10-24 14:48:04

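You can't go back in a file (at least not this way; there may be some way depending on how the file was opened). The standard file iterator only moves forward, as do most iterators, because Python's iterator protocol only supports forward iteration. So after reading k lines, reading another k/2 lines actually gives you line k + k/2.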
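The k + k/2 arithmetic can be made concrete with a small Python 3 sketch. It reuses one iterator over a hypothetical 20-line io.StringIO that stands in for a real file. After reading line 8, asking islice for "line 4" actually returns line 12.

```python
import io
from itertools import islice

# Hypothetical 20-line file: "line 1\n" ... "line 20\n".
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 21)))

k = 8
first = next(islice(f, k - 1, k))             # consumes lines 1..8, returns line 8
second = next(islice(f, k // 2 - 1, k // 2))  # skips lines 9..11, returns line 12

print(first.strip(), second.strip())  # line 8 line 12
```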

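You could try reading the whole file into memory, but you have a lot of data, so memory consumption could become a problem. You could use file.seek to move around in the file, but that is still a lot of work. Maybe you could use a memory-mapped file? That only works if the lines have a fixed size, though. If necessary, you can precompute the line numbers you want to check and save all of those lines in a single pass. There shouldn't be too many of them: roughly int(log_2(line_count)) + 1, if I'm not mistaken. That way you don't have to scroll backwards after reading the whole file.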
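Here is a minimal Python 3 sketch of the precompute-then-single-pass idea. It assumes the search visits the halving sequence num_lines, num_lines // 2, ..., 1, as in the question's loop. A search whose next target depends on the data it just read would not know its targets in advance, so this only fits when they can be listed up front. The helper names halving_targets and collect_lines are hypothetical, and io.StringIO again stands in for the real file.

```python
import io


def halving_targets(num_lines):
    """Line numbers visited by repeated halving: num_lines, num_lines // 2, ..., 1."""
    targets = []
    n = num_lines
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def collect_lines(f, wanted):
    """Read f once, front to back, keeping only the 1-based line numbers in `wanted`."""
    wanted = set(wanted)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    for lineno, line in enumerate(f, start=1):
        if lineno in wanted:
            found[lineno] = line
        if lineno >= last:
            break  # no need to read past the largest target
    return found


# Hypothetical stand-in for a 1000-line file.
f = io.StringIO("".join(f"line {n}\n" for n in range(1, 1001)))
targets = halving_targets(1000)
lines = collect_lines(f, targets)

print(len(targets))  # 10, i.e. int(log2(1000)) + 1
```

After this single pass, the search loop can look up lines[goto_line] in any order without touching the file again.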