Python: Split files using multiple delimiters

Posted on 2024-09-04 23:29:54

I have multiple CSV files which I need to parse in a loop to gather information.
The problem is that while they are the same format, some are delimited by '\t' and others by ','.
After splitting, I want to remove the double quotes from around each string.

Can python split via multiple possible delimiters?

At the minute, I can split each line on a single delimiter by using:

with open(filename, "r") as f:
    for fs in f:
        # Drop the trailing newline first, otherwise the last field keeps its closing quote
        sf = fs.rstrip("\n").split('\t')
        tf = [fi.strip('"') for fi in sf]

乖乖兔^ω^ 2024-09-11 23:29:54

Splitting the file like that is not a good idea: it fails whenever a field contains one of the delimiters. For example, if you split on both tabs and commas, the tab-delimited line "field1"\t"Hello, world"\t"field3" will be split into 4 fields instead of 3.
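The failure described above is easy to reproduce. As a minimal sketch (the helper name `naive_split` is purely illustrative), the following splits on either a tab or a comma using `re.split` with a character class, then strips the quotes the way the question's code does:

```python
import re

def naive_split(line):
    # Split on either delimiter (tab or comma), then strip surrounding quotes.
    fields = re.split(r"[\t,]", line.rstrip("\n"))
    return [f.strip('"') for f in fields]

line = '"field1"\t"Hello, world"\t"field3"\n'
print(naive_split(line))
# ['field1', 'Hello', ' world', 'field3']  (4 fields instead of 3)
```

The comma inside the quoted field is treated as a delimiter because `re.split` knows nothing about CSV quoting.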

Instead, you should use the csv module. Its Sniffer class can detect which delimiter a file uses from a sample of its contents, and csv.reader will also remove the double quotes around fields for you.

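import csv

# newline="" lets the csv module handle line endings itself
with open("example.csv", newline="") as csvfile:
    # Guess the dialect (delimiter, quote character, ...) from the first 1024 characters
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)  # rewind before reading the whole file
    reader = csv.reader(csvfile, dialect)

    for line in reader:
        # process line here; the surrounding quotes are already removed
        print(line)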
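For the question's actual use case (several files, some tab-delimited and some comma-delimited), the same idea can be wrapped in a helper. This is a sketch, not part of the answer above: the names `read_rows` and `parse_files` and the 1024-character sample size are assumptions. Passing `delimiters="\t,"` to `Sniffer.sniff()` restricts the guess to the two delimiters the question mentions.

```python
import csv
import io

def read_rows(f, sample_size=1024):
    """Detect whether f is tab- or comma-delimited and return its rows.

    f is a text file object (opened with newline="") positioned at the start.
    """
    sample = f.read(sample_size)
    # Only consider the two delimiters mentioned in the question.
    dialect = csv.Sniffer().sniff(sample, delimiters="\t,")
    f.seek(0)  # rewind so the reader sees the whole file
    return list(csv.reader(f, dialect))

def parse_files(paths):
    """Hypothetical loop over real files: yields (path, rows) per file."""
    for path in paths:
        with open(path, newline="") as f:
            yield path, read_rows(f)

# In-memory stand-ins for one tab-delimited and one comma-delimited file.
tab_rows = read_rows(io.StringIO('"a"\t"Hello, world"\t"c"\n"d"\t"e"\t"f"\n'))
comma_rows = read_rows(io.StringIO('"a","Hello\tworld","c"\n"d","e","f"\n'))
print(tab_rows)    # [['a', 'Hello, world', 'c'], ['d', 'e', 'f']]
print(comma_rows)  # [['a', 'Hello\tworld', 'c'], ['d', 'e', 'f']]
```

In both cases the delimiter inside the quoted field is kept as data and the quotes are removed. Note that `sniff()` raises `csv.Error` when it cannot determine a delimiter, so very short or unusual files may need a fallback.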
