Downloading files into memory

Posted on 2024-12-07 14:39:30 | 593 characters | 1 view | 0 comments

I am writing a Python script, and I only need the second line of each of a series of very small text files. I would like to extract that line without saving each file to my hard drive, as I currently do.

I have found a few threads that reference the tempfile and StringIO modules, but I was unable to make much sense of them.

Currently I download all of the files and name them sequentially (1.txt, 2.txt, etc.), then go through all of them and extract the second line. Instead, I would like to open a file, grab the line, and then move on to finding, opening, and reading the next file.

Here is what I currently do, with the files written to my HDD:

while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = string.strip(linecache.getline(file_path, 2))
    linkFile = open('Summary.txt', 'a')
    linkFile.write(cand_summary)
    linkFile.write("\n")
    count4 = count4 + 1
    linkFile.close()

Comments (2)

平定天下 2024-12-14 14:39:30

Just replace the file writing with a call to append() on a list. For example:

summary = []
while (count4 <= num_files):
    file_p = [directory,str(count4),'.txt']
    file_path = ''.join(file_p)        
    cand_summary = string.strip(linecache.getline(file_path, 2))
    summary.append(cand_summary)
    count4 = count4 + 1

As an aside, you would normally write count4 += 1. It also looks like count4 uses 1-based indexing, which is fairly unusual for Python.

帅哥哥的热头脑 2024-12-14 14:39:30

You open and close the output file in every iteration.

Why not simply do

with open("Summary.txt", "w") as linkFile:  # opened once, closed automatically
    while (count4 <= num_files):
        file_p = [directory,str(count4),'.txt']
        file_path = ''.join(file_p)
        # the string module's functions are deprecated; use the str method instead
        cand_summary = linecache.getline(file_path, 2).strip()
        linkFile.write(cand_summary)
        linkFile.write("\n")
        count4 = count4 + 1

Also, linecache is probably not the right tool here since it's optimized for reading multiple lines from the same file, not the same line from multiple files.

Instead, better do

with open(file_path, "r") as infile:
    dummy = infile.readline()  # skip the first line
    cand_summary = infile.readline().strip()  # second line, whitespace removed

Also, if you drop the strip() call, you don't have to re-add the \n, but who knows why you have it in there. Perhaps .lstrip() would be better, since it removes only leading whitespace and leaves the trailing newline in place?
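
To make the difference between these string methods concrete, here is a small sketch. The sample string is made up for illustration and does not come from the asker's files.

```python
# A made-up line as readline() might return it, with surrounding whitespace.
raw = "  second line  \n"

stripped = raw.strip()          # removes whitespace (including "\n") on both ends
left_only = raw.lstrip()        # removes leading whitespace only; "\n" survives
no_newline = raw.rstrip("\n")   # removes only the trailing newline

print(repr(stripped))    # 'second line'
print(repr(left_only))   # 'second line  \n'
print(repr(no_newline))  # '  second line  '
```

With .lstrip() the line can be written out as-is, with no extra write("\n") call, provided the second line of the file actually ends with a newline.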

Finally, what's with the manual while loop? Why not use a for loop?

Lastly, after your comment, I understand you want to put the result in a list instead of a file. OK.

All in all:

summary = []
for count in range(num_files):  # range works in Python 2 and 3 (xrange is Python 2 only)
    file_p = [directory,str(count),'.txt'] # or count+1, if you start at 1
    file_path = ''.join(file_p)
    with open(file_path, "r") as infile:
        dummy = infile.readline()  # skip the first line
        cand_summary = infile.readline().strip()
        summary.append(cand_summary)
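
The loop above still reads files that are already on disk. If the goal is to skip the disk entirely, one possible approach is to keep the downloaded text in memory and wrap it in io.StringIO, which behaves like an open text file. The following is only a sketch for Python 3: the helper names second_line and second_line_from_url are hypothetical, and since the question does not say where the files come from, the URL and the UTF-8 encoding are assumptions.

```python
import io
from itertools import islice
from urllib.request import urlopen


def second_line(stream):
    """Return the second line of a text stream, stripped, or '' if there is none."""
    # islice(stream, 1, 2) skips the first line and yields at most one more.
    return next(islice(stream, 1, 2), "").strip()


def second_line_from_url(url, encoding="utf-8"):
    """Download a small text file into memory and return its second line.

    The URL and encoding are assumptions; nothing is written to disk.
    """
    with urlopen(url) as response:
        text = response.read().decode(encoding)
    # io.StringIO turns the in-memory string into a file-like object.
    return second_line(io.StringIO(text))


# Usage with in-memory text only (no network access needed):
sample = io.StringIO("header\n  candidate summary  \nfooter\n")
print(second_line(sample))
```

Because second_line accepts any iterable of lines, the same helper also works on a file opened with open(file_path), so the disk-based and in-memory variants can share it.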