python中如何合并两个文件

发布于 2025-01-03 15:25:38 字数 876 浏览 0 评论 0原文

我有两个制表符分隔的 csv 文件(带标题),需要在 python 中合并。

另外,在合并的文件中,我想在末尾添加一列来标识文件,因为虽然它们具有相同的格式,但它们具有不同的数据,我稍后需要将其分开。 因此,我想在输出的每一行添加一个名为“source”的列,其中 file1 为 0,file2 为 1。

我已经使用了 csv 模块,但 writerow 在它写入的每一行之间添加了一个附加的换行符,并且此代码不会从 file2 中写入任何内容。我在这里做错了什么?另外,如何在 line 对象中添加额外的列“source”?

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

# merge the two files for further processing
merged_file = csv.writer(open(merged_path, 'a'), delimiter = '\t')

#file1
fg = csv.reader(open(path1, 'r'), delimiter = '\t')

for line in fg:
    if line[7] != '\N':
        merged_file.writerow(line) 

#file2
bg = csv.reader(open(path2, 'r'), delimiter = '\t')

for line in bg:
    if line[16] != '\N':
        merged_file.writerow(line) 

I have two tab delimited csv files (with headers) that I need to merge in python.

Also, in the merged file I want to add a column in the end to identify the files because though they have same format, they have different data that I need to separate later on.
So, I want to add a column called 'source' on each line of output which is 0 for file1 and 1 for file2.

I have gone far as using the csv module but the writerow adds an additioal newline character between each line it writes and this code doesn't write anything from file2. What am I doing wrong here? Also, how do I add the extra column 'source' in the line object?

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

# merge the two files for further processing
merged_file = csv.writer(open(merged_path, 'a'), delimiter = '\t')

#file1
fg = csv.reader(open(path1, 'r'), delimiter = '\t')

for line in fg:
    if line[7] != '\N':
        merged_file.writerow(line) 

#file2
bg = csv.reader(open(path2, 'r'), delimiter = '\t')

for line in bg:
    if line[16] != '\N':
        merged_file.writerow(line) 

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

花开半夏魅人心 2025-01-10 15:25:38

我更喜欢使用 dictWriter 来实现此目的。此外,您的代码不起作用,因为 csv 库需要以 binary 模式打开文件。

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

#file1
fg = csv.DictReader(open(path1, 'rb'), delimiter = '\t')

fieldnames = fg.fieldnames
fieldnames.append('source')
# merge the two files for further processing
merged_file = csv.DictWriter(open(merged_path, 'ab'), delimiter = '\t', fieldnames=fieldnames)
merged_file.writeheader()

for row in fg:
    row['source'] = os.path.basename(path1)
    merged_file.writerow(row)

#file2
bg = csv.DictReader(open(path2, 'rb'), delimiter = '\t')

for row in bg:
    row['source'] = os.path.basename(path1)
    merged_file.writerow(row)

I prefer to use the dictWriter for this. Also, your code doesn't work because the csv library requires opening files in binary mode.

import os, csv

path1 = os.path.abspath("../data/file1.txt")
path2 = os.path.abspath("../data/file2.txt")
merged_path = os.path.abspath('../data/output.txt')

#file1
fg = csv.DictReader(open(path1, 'rb'), delimiter = '\t')

fieldnames = fg.fieldnames
fieldnames.append('source')
# merge the two files for further processing
merged_file = csv.DictWriter(open(merged_path, 'ab'), delimiter = '\t', fieldnames=fieldnames)
merged_file.writeheader()

for row in fg:
    row['source'] = os.path.basename(path1)
    merged_file.writerow(row)

#file2
bg = csv.DictReader(open(path2, 'rb'), delimiter = '\t')

for row in bg:
    row['source'] = os.path.basename(path1)
    merged_file.writerow(row)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文