Iterating over the words of a file in Python
I need to iterate over the words of a large file that consists of a single, very long line. I am aware of methods for iterating over a file line by line, but they do not apply in my case because of its single-line structure.
Are there any alternatives?
Comments (8)
It really depends on your definition of a word. But try this:
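A minimal sketch of this approach, assuming the whole file fits in memory and is UTF-8 text. The function name `read_words` is illustrative and not taken from the original answer:

```python
def read_words(path):
    """Return every whitespace-separated word in the file as a list."""
    # The with block closes the file even if reading fails.
    with open(path, encoding="utf-8") as f:
        text = f.read()  # loads the entire file into memory
    # split() with no arguments splits on any run of whitespace
    # and never produces empty strings.
    return text.split()
```

A caller would then loop over the result, for example `for word in read_words("words.txt"): ...`, where `words.txt` is a placeholder file name.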
This uses whitespace characters as word boundaries.
Of course, remember to open and close the file properly; this is just a quick example.
A long, long line? I assume the line is too big to fit reasonably in memory, so you want some kind of buffering.
First of all, this is a bad format; if you have any control over the file, make it one word per line.
If not, use something like:
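One way to buffer, sketched under the assumption that the file is UTF-8 text and that words are separated by whitespace. The name `iter_words` and the default `chunk_size` are illustrative choices, not the original answer's code:

```python
def iter_words(path, chunk_size=64 * 1024):
    """Yield whitespace-separated words while reading fixed-size chunks."""
    leftover = ""  # partial word carried over from the previous chunk
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # end of file
            chunk = leftover + chunk
            words = chunk.split()
            # If the chunk does not end in whitespace, its last word may
            # continue in the next chunk, so hold it back for now.
            if words and not chunk[-1].isspace():
                leftover = words.pop()
            else:
                leftover = ""
            yield from words
    # Whatever is still held back at end of file is a complete word.
    if leftover:
        yield leftover
```

Only one chunk plus one partial word is held in memory at a time, so memory use stays bounded as long as no single word is enormous.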
You really should consider using a generator.
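A minimal generator sketch; the name `word_gen` is illustrative. It accepts any iterable of text lines, such as an open file object. Note the assumption: with a single-line file, the one "line" is still read in full, so this mainly tidies the consuming code rather than reducing memory use.

```python
def word_gen(lines):
    """Yield words one at a time from any iterable of text lines."""
    for line in lines:
        # split() with no arguments splits on runs of whitespace.
        for word in line.split():
            yield word
```

With an open file `f`, the caller would write `for word in word_gen(f): ...`.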
There are more efficient ways of doing this, but syntactically, this might be the shortest:
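A sketch of a very short form, assuming UTF-8 text; it uses `pathlib` rather than whatever the original one-liner was, and `all_words` is an illustrative name:

```python
from pathlib import Path


def all_words(path):
    # read_text() opens, reads, and closes the file in one call;
    # split() then breaks the text on runs of whitespace.
    return Path(path).read_text(encoding="utf-8").split()
```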
If memory is a concern, you won't want to do this, because it loads the entire file into memory instead of iterating over it.
I've answered a similar question before, but I have since refined the method used in that answer. Here is the updated version (copied from a recent answer):
Read in the line as normal, then split it on whitespace to break it down into words?
Something like:
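A sketch of that idea, assuming the single line fits in memory and the file is UTF-8 text; `words_in_first_line` is an illustrative name:

```python
def words_in_first_line(path):
    """Read the first line of the file and split it into words."""
    with open(path, encoding="utf-8") as f:
        # For a single-line file, this one call reads the whole content.
        line = f.readline()
    return line.split()
```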
After reading the line you could do:
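One possibility, which is an assumption and not necessarily what this answer showed: once the line is in memory, `re.finditer` can walk it word by word without first building a full list. The name `iter_words_in_line` is illustrative.

```python
import re


def iter_words_in_line(line):
    """Lazily yield each run of non-whitespace characters in line."""
    for match in re.finditer(r"\S+", line):
        yield match.group()
```

This still needs the whole line in memory, but it avoids holding a second copy of the data as a list of words.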
Alex.
What Donald Miner suggested looks good: simple and short. I used the code below in something I wrote some time ago:
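A sketch of a longer, explicit form, assuming UTF-8 text; the name `collect_words` is illustrative and this is not the answer's original code:

```python
def collect_words(path):
    """Collect every word in the file into a list, line by line."""
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Each line is split on whitespace and its words appended.
            for word in line.split():
                words.append(word)
    return words
```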
It is a longer version of what Donald Miner suggested.