Python: Split a file using multiple delimiters
I have multiple CSV files which I need to parse in a loop to gather information.
The problem is that while they are the same format, some are delimited by '\t' and others by ','.
After this, I want to remove the double-quote from around the string.
Can python split via multiple possible delimiters?
At the moment, I can split each line on a single delimiter using:
with open(filename, "r") as f:
    for line in f:
        # Split on tabs only, then strip the surrounding double quotes
        fields = line.rstrip("\n").split('\t')
        cleaned = [field.strip('"') for field in fields]
Comments (2)
Splitting the file like that is not a good idea: it will fail if there is a comma within one of the fields. For example, in a tab-delimited file, the line
"field1"\t"Hello, world"\t"field3"
will be split into 4 fields instead of 3 once commas are also treated as delimiters.
Instead, you should use the csv module. It contains the helpful Sniffer class, which can detect which delimiter is used in the file. The csv module will also remove the double quotes for you.
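A minimal sketch of the csv.Sniffer approach described in this answer. The function name read_rows, the 4096-character sample size, and the candidate delimiter string are illustrative assumptions, not part of the original answer; Sniffer relies on heuristics, so it may raise csv.Error on input it cannot classify.

```python
import csv
import io


def read_rows(stream, candidates=",\t"):
    """Guess the delimiter from a sample of the stream, then parse every row.

    `stream` is assumed to be a seekable text stream (an open file or StringIO).
    """
    sample = stream.read(4096)  # assumed sample size; adjust as needed
    stream.seek(0)              # rewind so the reader sees the whole file
    # Restrict the guess to the delimiters we expect (comma or tab).
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    # csv.reader handles quoted fields and drops the surrounding quotes.
    return list(csv.reader(stream, dialect))


if __name__ == "__main__":
    tab_data = '"field1"\t"Hello, world"\t"field3"\n"a"\t"b, c"\t"d"\n'
    for row in read_rows(io.StringIO(tab_data)):
        print(row)
```

For a real file, the stream would typically come from open(filename, newline=""), which is how the csv documentation recommends opening files for reading.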
You can do this with a regex (optionally compiled) that splits on either delimiter.
This doesn't account for, e.g., commas inside tab-delimited fields, so I would see if the csv module is helpful.
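A minimal sketch of the regex approach mentioned in this answer, assuming a character class that matches either a tab or a comma. The pattern and the names DELIMS and split_line are illustrative and are not the answerer's original code. The last call shows the limitation noted above: a comma inside a quoted field is still treated as a delimiter.

```python
import re

# Precompiled pattern matching a single tab or comma (assumed delimiters).
DELIMS = re.compile(r"[\t,]")


def split_line(line):
    """Split a line on tabs or commas and strip surrounding double quotes."""
    fields = DELIMS.split(line.rstrip("\r\n"))
    return [field.strip('"') for field in fields]


if __name__ == "__main__":
    print(split_line('"a"\t"b"\t"c"\n'))
    print(split_line('"a","b","c"\n'))
    # The embedded comma splits "Hello, world" into two fields.
    print(split_line('"field1"\t"Hello, world"\t"field3"\n'))
```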