Read a file line by line and save the lines with a matching column to a new document (Python, beginner)

Posted 2025-01-30 11:26:44 · 919 words · 5 views · 0 comments

I should say up front that I am a beginner at coding. This may be easy for someone more experienced, but I couldn't find a question similar to mine, so I am posting here.

I have a huge document (~40 GB) containing a table. Its columns are separated by many blank spaces. I also have a list in another, very small document with some names that also appear in the huge document. If a line of the huge document has, in one particular column, a name that matches one of the names on my list, I want to append that line to another list so that I can work with only those lines:

So I wrote the following code in Spyder:

import pandas as pd

station_list = pd.read_fwf(r"D:\Não apagar arquivos importantes\Desktop\Para dados grandes\Lisn.txt", header=None, skiprows=1)

saved = []
with open(r"D:\Não apagar arquivos importantes\Downloads\Out 2016\los_20161015.001.h5.txt") as f:
    for line in f:
        lst = line.split()
        sline = " ".join(lst)
        tag = lst[12]
        if tag in station_list:
            saved.append(sline)

When I run it, it reads line by line very quickly. However, even though most of the names in station_list appear in that column of f, it does not save any of the matching lines (sline) to my new saved list.

What could be the cause? Does anybody have a suggestion? Is there a more efficient way to work with big files?
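
One possible cause, offered as a guess because the contents of Lisn.txt are not shown: `pd.read_fwf` returns a DataFrame, and `in` applied to a DataFrame tests the column labels (the integers 0, 1, ... when `header=None`), not the cell values. The sketch below illustrates this with made-up station names. The names, and the assumption that they sit in the first column, are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the small station file: one name per row,
# no header, so the single column is labelled 0.
station_list = pd.DataFrame({0: ["abcd", "efgh", "ijkl"]})

tag = "efgh"

# `in` on a DataFrame checks the column labels, not the values.
found_in_frame = tag in station_list   # "efgh" is not a column label
label_found = 0 in station_list        # 0 is a column label

# Turning the column into a set of strings gives membership by value,
# with constant-time lookups.
station_names = set(station_list[0].astype(str).str.strip())
found_in_set = tag in station_names

print(found_in_frame, label_found, found_in_set)
```

If this is what is happening, testing `tag in station_names` instead of `tag in station_list` would be the corresponding change in the loop.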

Image of the variables I get
Thank you
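
On the question of handling big files more efficiently, one option is to keep reading line by line but write each matching line straight to an output file, so the matches never have to fit in memory. This is only a sketch under assumptions: `filter_lines` is a hypothetical helper name, the UTF-8 encoding is a guess, `station_names` is assumed to be a set of name strings, and column index 12 is taken from the code in the question.

```python
def filter_lines(big_path, out_path, station_names, column=12, encoding="utf-8"):
    """Copy lines whose whitespace-separated field `column` is in `station_names`.

    Returns the number of lines written to `out_path`.
    """
    kept = 0
    with open(big_path, encoding=encoding) as src, \
         open(out_path, "w", encoding=encoding) as dst:
        for line in src:
            fields = line.split()
            # Skip blank or short lines instead of raising IndexError.
            if len(fields) <= column:
                continue
            if fields[column] in station_names:
                # Same normalisation as the question: single spaces between fields.
                dst.write(" ".join(fields) + "\n")
                kept += 1
    return kept
```

Only one line is held in memory at a time, and the set lookup keeps the per-line cost roughly constant regardless of how many station names there are.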
