Extracting specific lines of text from a larger file using the csv module

Posted 2024-09-10 12:13:02 · 458 words · 3 views · 0 comments

So I'm extracting the lines that I want from this larger file using this program:

import csv

name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
data = csv.reader(open('C:\\bigfile.csv'))

with open('C:\\smalldataset.xcl','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in name)

The program runs. However, I am only getting the line of data from NAMETHEFIRST and I get no data from NAMETHEOTHERNAME written to my small dataset file. The output is exactly what I want for NAMETHEFIRST, with all the relevant information for that row from the large dataset, but nothing for the second name is written to the smaller file. Why isn't this working?

Comments (2)

疯了 2024-09-17 12:13:02

This is a list with one string:

['NAMETHEFIRST,' 'NAMEANOTHERNAME ']

This is a list with two strings:

['NAMETHEFIRST', 'NAMEANOTHERNAME ']

Note the placement of the comma.

Also note that your second string has a space at the end.
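
A quick way to see the difference is to print the length of each list and the `repr` of its elements. This is only an illustrative sketch; the variable names `one_string` and `two_strings` are made up for the example.

```python
# Adjacent string literals with no comma between them are joined into one string.
one_string = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
two_strings = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']

print(len(one_string), one_string)    # a single element
print(len(two_strings), two_strings)  # two elements

# repr() makes the trailing space in the second name visible.
print(repr(two_strings[1]))
```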

后知后觉 2024-09-17 12:13:02

This line of code

name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']

is equivalent to

name = ['NAMETHEFIRST,NAMEANOTHERNAME ']

because Python follows C in concatenating adjacent string constants at compile time.

You say "I am only getting the line of data from NAMETHEFIRST and I get no data from NAMETHEOTHERNAME written to my small dataset file". However, the code that you show will NOT produce that result; it will select only rows whose first field is exactly

"NAMETHEFIRST,NAMEANOTHERNAME "

You will get the stated result only if that line is actually:

name = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']

and that is presumably because the second name in the file doesn't have a trailing space, so it never equals the 'NAMEANOTHERNAME ' entry in your list.
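
The three cases can be checked without touching the real files. The sketch below runs the same `row[0] in name` test against a small in-memory CSV; the sample rows and the helper name `select` are invented for illustration and are not taken from the original data.

```python
import csv
import io

# Hypothetical sample data standing in for bigfile.csv.
SAMPLE = "NAMETHEFIRST,1\nNAMEANOTHERNAME,2\nSOMEONEELSE,3\n"

def select(names):
    """Return the first field of every row whose first field is in names."""
    reader = csv.reader(io.StringIO(SAMPLE, newline=''))
    return [row[0] for row in reader if row and row[0] in names]

# One string (comma inside the quotes): nothing matches.
print(select(['NAMETHEFIRST,' 'NAMEANOTHERNAME ']))
# Two strings, but the second has a trailing space: only the first name matches.
print(select(['NAMETHEFIRST', 'NAMEANOTHERNAME ']))
# Two clean strings: both names match.
print(select(['NAMETHEFIRST', 'NAMEANOTHERNAME']))
```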

Other problems:

csv.writer(outf).writerows(l for l in data if l[0] in name) is trying to be a bit too clever. If you break it down into bite-size chunks, you can much more easily use a debugger or just print statements to show you what is actually happening.

Try this:

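import csv

name = ['NAMETHEFIRST', 'NAMEANOTHERNAME']  # two strings, no trailing space
print(len(name), name)  # confirm the list really holds two names
# Python 3: open csv files in text mode with newline='' (Python 2 needed binary mode, 'rb'/'wb')
with open('C:\\bigfile.csv', newline='') as inf, open('C:\\smalldataset.xcl', 'w', newline='') as outf:
    writer = csv.writer(outf)
    for row_index, row in enumerate(csv.reader(inf), 1):  # don't use 'l' as a variable name
        print(row_index, row)  # show every row so you can see what is being compared
        if row and row[0] in name:  # skip blank rows, keep rows whose first field matches
            writer.writerow(row)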
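
If stray whitespace in either the name list or the file is a possibility, one option is to strip both sides before comparing. The following is a minimal sketch under that assumption; `filter_rows` and its parameters are hypothetical names, and it accepts any text-mode file objects so it can be tried on in-memory text first.

```python
import csv
import io

def filter_rows(src, dst, wanted):
    """Copy rows from src to dst when the stripped first field is in wanted.

    src and dst are open text-mode file objects; returns the number of rows written.
    """
    wanted = {w.strip() for w in wanted}  # normalize the names once
    writer = csv.writer(dst, lineterminator='\n')
    count = 0
    for row in csv.reader(src):
        if row and row[0].strip() in wanted:
            writer.writerow(row)
            count += 1
    return count

# Try it on in-memory text before pointing it at real files.
src = io.StringIO("NAMETHEFIRST,1\nNAMEANOTHERNAME,2\nSOMEONEELSE,3\n", newline='')
dst = io.StringIO(newline='')
written = filter_rows(src, dst, ['NAMETHEFIRST', 'NAMEANOTHERNAME '])
print(written)
print(dst.getvalue())
```

Using a set for the wanted names keeps each membership test cheap, which can matter when the input file is large.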