
How do I merge multiple .csv files horizontally with Python?

Posted on 2024-09-28 14:18:48

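I have several .csv files (~10) that I need to merge horizontally into a single file. Each file has the same number of rows (~300) and 4 header rows. The header rows are not necessarily identical across files, but they should not be merged: only the header rows of the first .csv file should be kept. The tokens in each row are separated by commas, with no spaces in between.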

As a Python newbie I haven't come up with a solution yet, although I'm sure there is a simple one. Any help is welcome.


你没皮卡萌 2024-10-05 14:18:49

You can load the CSV files with Python's csv module. Loading takes only a few lines:

import csv
with open("some.csv", newline="") as f:
    csvContent = list(csv.reader(f))

Once the files are loaded in this form (a list of rows, each row a list of strings):

[ ["header1", "header2", "header3", "header4"],
  ["value01", "value02", "value03", "value04"],
  ["value11", "value12", "value13", "value14"],
  ...
]

you can merge two such lists row by row (csvList1 and csvList2 are two files loaded as above):

result = [a + b for (a, b) in zip(csvList1, csvList2)]

To save the result, write it to a new file:

with open("merged.csv", "w", newline="") as f:
    csv.writer(f).writerows(result)
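
The snippets above merge two files and treat header rows like any other row. A minimal sketch of how the same idea might extend to any number of files while keeping only the first file's four header rows is shown below. The names merge_rows, merge_csv_files and HEADER_ROWS are made up for illustration, and the code assumes every file has the same number of rows, as the question states.

```python
import csv

HEADER_ROWS = 4  # assumption taken from the question: 4 header rows per file


def merge_rows(tables, header_rows=HEADER_ROWS):
    """Merge loaded CSV tables (lists of rows) side by side.

    Header rows are copied from the first table only; the data rows of
    all tables are concatenated row by row.
    """
    merged = [list(row) for row in tables[0][:header_rows]]
    bodies = [table[header_rows:] for table in tables]
    # zip stops at the shortest table, so equal row counts are assumed.
    for rows in zip(*bodies):
        merged.append([cell for row in rows for cell in row])
    return merged


def merge_csv_files(paths, out_path, header_rows=HEADER_ROWS):
    """Load every file in paths, merge them, and write out_path."""
    tables = []
    for path in paths:
        with open(path, newline="") as f:
            tables.append(list(csv.reader(f)))
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(merge_rows(tables, header_rows))
```

Calling merge_csv_files with a list of input paths and an output path would then write the combined file in one step.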


我们的影子 2024-10-05 14:18:49

The csv module is your friend.


倾城泪 2024-10-05 14:18:49

If you don't necessarily have to use Python, you can use shell tools such as paste/gawk. List as many files as you need:

$ paste -d, file1 file2 file3 file4 | awk 'NR>4'

This places the files side by side and drops the four header lines. The -d, option makes paste join the columns with a comma instead of its default tab. If you want the headers, just take them from file1:

$ ( head -4 file1 ; paste -d, file[1-4] | awk 'NR>4' ) > output


脱离于你 2024-10-05 14:18:49

You don't need the csv module for this. You can simply open each file:

file1 = open("file1.csv")

After opening all your files you can do this:

from itertools import zip_longest  # named izip_longest in Python 2

foo = []
# pass every open file to zip_longest
for new_line in zip_longest(file1, file2, file3, fillvalue=''):
    # each piece still ends with a newline, so strip it before storing
    foo.append(tuple(piece.rstrip('\n') for piece in new_line))

This gives you the structure below, the same shape as in the csv answer above. It also works if the files have different numbers of lines: once a shorter file runs out, it contributes an empty string.

[ ("line10", "line20", "line30", "line40"),
  ("line11", "line21", "line31", "line41"),
  ...
]

After this you can write it to a new file, one tuple at a time:

for listx in foo:
    new_file.write(','.join(listx) + '\n')

PS: the itertools documentation has more about zip_longest.
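
One caveat with padding by a single empty string is that a file that ran out early contributes just one empty cell, so later columns shift to the left. A rough sketch of one way around that is shown below. It assumes the rows have already been split into cells (for example by csv.reader), and merge_uneven is a hypothetical helper name, not something from the answer.

```python
from itertools import zip_longest


def merge_uneven(tables):
    """Place tables side by side, padding short ones with empty cells."""
    # Width of each table, taken from its widest row (0 for an empty table).
    widths = [max((len(row) for row in table), default=0) for table in tables]
    merged = []
    for rows in zip_longest(*tables, fillvalue=None):
        out = []
        for row, width in zip(rows, widths):
            cells = list(row) if row is not None else []
            # Pad so each table always occupies the same columns.
            out.extend(cells + [""] * (width - len(cells)))
        merged.append(out)
    return merged
```

With this padding, every output row has the same number of cells no matter which file ended first.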


分開簡單 2024-10-05 14:18:49

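You learn by doing (or even by trying), so I'll only give you a few hints. Use the following functions:

To open a file: open()
To read all the lines of a file: IOBase.readlines()
To split a string on a separator: str.split()

If you really don't know where to start, I recommend reading the tutorial and Dive Into Python 3. (Depending on how much Python you already know, you will either have to read the first few chapters or can jump straight to the file IO chapter.)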
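
For readers who want to check their own attempt, here is one possible way the three hinted functions could fit together. It is only a sketch: split_lines, read_cells and join_rows are made-up names, and plain str.split(",") assumes that no cell contains a quoted comma.

```python
def split_lines(lines):
    """Turn raw text lines into rows of cells using str.split()."""
    # Drop the line ending first, then split on the comma separator.
    return [line.rstrip("\r\n").split(",") for line in lines]


def read_cells(path):
    """Read a whole file with open() and readlines(), then split it."""
    with open(path) as f:
        return split_lines(f.readlines())


def join_rows(left, right):
    """Glue two tables together row by row and return text lines."""
    return [",".join(a + b) for a, b in zip(left, right)]
```

Writing the lines returned by join_rows to a file, each followed by a newline, would complete the merge for two files.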


飘落散花 2024-10-05 14:18:49

Purely for learning purposes

A simple approach that does not take advantage of the csv module. Note that it appends the data lines of each file below those of the previous file, so the files end up stacked rather than side by side:

HEADER_LINES = 4  # number of header lines in each file

# your list of csv files
csv_files = [file1, file2, ...]

# open the file to write
with open(filename, 'w') as file_to_write:
    # iterate through your list
    for index, filex in enumerate(csv_files):
        # open the csv file and read it line by line
        with open(filex, 'r') as filex_f:
            for line_number, line in enumerate(filex_f):
                # write the header only once, from the first file
                if line_number < HEADER_LINES and index > 0:
                    continue
                # each line already ends with a newline
                file_to_write.write(line)
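
Since the question asks for a horizontal merge, a side-by-side variant of the same plain-text approach might look like the sketch below. It assumes all files have the same number of lines and that the header should come from the first file only. merge_lines and merge_files are hypothetical names used for illustration.

```python
from contextlib import ExitStack


def merge_lines(line_iters, header_lines=4):
    """Yield merged lines from several iterables of text lines."""
    for number, lines in enumerate(zip(*line_iters)):
        pieces = [line.rstrip("\r\n") for line in lines]
        if number < header_lines:
            # Header lines are taken from the first file only.
            yield pieces[0]
        else:
            yield ",".join(pieces)


def merge_files(paths, out_path, header_lines=4):
    """Open all inputs at once and stream the merged lines to out_path."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(path)) for path in paths]
        out = stack.enter_context(open(out_path, "w"))
        for merged in merge_lines(files, header_lines):
            out.write(merged + "\n")
```

Because the lines are streamed, only one line per file is held in memory at a time, which matters little for ~300 rows but keeps the approach usable for larger files.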

