How to merge multiple .csv files horizontally with Python?

Posted on 2024-09-28 14:18:48

I've several .csv files (~10) and need to merge them together into a single file horizontally. Each file has the same number of rows (~300) and 4 header lines which are not necessarily identical, but should not be merged (only take the header lines from the first .csv file). The tokens in the lines are comma separated with no spaces in between.

As a Python noob, I haven't come up with a solution, though I'm sure there's a simple one. Any help is welcome.

你没皮卡萌 2024-10-05 14:18:49

You can load the CSV files using the csv module in Python. See the module's documentation for the details; it is really easy. Something like:

import csv
with open("some.csv", newline="") as f:  # Python 3: text mode with newline=""
    csvContent = list(csv.reader(f))

After that, when you have the CSV files loaded in this form (a list of lists, one list per row):

[ ["header1", "header2", "header3", "header4"],
  ["value01", "value02", "value03", "value04"],
  ["value11", "value12", "value13", "value14"],
  ... 
]

You can merge two such lists line-by-line:

result = [a+b for (a,b) in zip(csvList1, csvList2)]
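The `zip` line above only combines two files, and it also concatenates the header rows. Below is a minimal sketch of the same idea for any number of already-loaded tables. It assumes each table is a list of rows as returned by `list(csv.reader(...))` and that, as in the question, the first 4 rows are headers that should come from the first file only. `merge_tables` and `header_rows` are hypothetical names.

```python
from itertools import chain


def merge_tables(tables, header_rows=4):
    """Merge row lists side by side, keeping only the first table's headers.

    Assumes every table has the same number of rows; zip() silently
    stops at the shortest table otherwise.
    """
    # Header rows are copied from the first table only
    merged = [list(row) for row in tables[0][:header_rows]]
    # Drop each table's header rows, then concatenate row i of every table
    bodies = [table[header_rows:] for table in tables]
    for rows in zip(*bodies):
        merged.append(list(chain.from_iterable(rows)))
    return merged


a = [["h1"], ["h2"], ["h3"], ["h4"], ["1", "2"], ["3", "4"]]
b = [["x1"], ["x2"], ["x3"], ["x4"], ["5"], ["6"]]
print(merge_tables([a, b]))
# [['h1'], ['h2'], ['h3'], ['h4'], ['1', '2', '5'], ['3', '4', '6']]
```

`chain.from_iterable` flattens the per-file rows into one row without building intermediate lists the way repeated `a + b` would.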

To save such a result (to a new file, so the inputs are not overwritten), you can use:

with open("merged.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(result)
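Putting the loading, merging and saving steps together for all ~10 files, here is a minimal end-to-end sketch. `merge_csv_files` and its parameters are hypothetical names. It assumes every file has the same number of rows and 4 header rows, of which only the first file's are kept.

```python
import csv


def merge_csv_files(input_paths, output_path, header_rows=4):
    """Write the CSV files in input_paths side by side into output_path.

    Header rows come from the first file only; every file is assumed
    to have the same number of rows.
    """
    tables = []
    for path in input_paths:
        # newline="" is what the csv module expects in Python 3
        with open(path, newline="") as f:
            tables.append(list(csv.reader(f)))

    # Keep the first file's header rows as they are
    merged = tables[0][:header_rows]
    for rows in zip(*(t[header_rows:] for t in tables)):
        # Concatenate row i of every file into one output row
        merged.append([cell for row in rows for cell in row])

    with open(output_path, "w", newline="") as f:
        csv.writer(f).writerows(merged)
```

Usage would look like `merge_csv_files(["file1.csv", "file2.csv", "file3.csv"], "merged.csv")`, where the file names are placeholders.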
我们的影子 2024-10-05 14:18:49

The csv module is your friend.

倾城泪 2024-10-05 14:18:49

If you don't have to use Python, you can use shell tools such as paste and (g)awk.

$ paste -d, file1 file2 file3 file4 ... | awk 'NR>4'

The above puts the files side by side without the headers (`-d,` makes paste join the lines with commas instead of its default tab). If you want the headers, just take them from file1:

$ ( head -4 file1 ; paste -d, file[1-4] | awk 'NR>4' ) > output
脱离于你 2024-10-05 14:18:49

You don't need to use the csv module for this. You can just use

file1 = open(file1)

After opening all your files, you can do this:

from itertools import zip_longest  # called izip_longest in Python 2

foo = []
for new_line in zip_longest(file1, file2, file3, fillvalue=''):  # ...plus the rest of your files
    foo.append(new_line)

This gives you the structure below (which kon has already described). It also works if the files have different numbers of lines, because zip_longest pads the shorter ones with fillvalue:

[ ("line10", "line20", "line30", "line40"),
  ("line11", "line21", "line31", "line41"),
  ... 
]
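To see what `fillvalue=''` does when files have different lengths, here is a small sketch that uses in-memory `io.StringIO` objects in place of real files (an assumption for illustration; an open file object iterates over its lines the same way).

```python
import io
from itertools import zip_longest

# Stand-ins for two open files; the second one is one line shorter
f1 = io.StringIO("a1\na2\na3\n")
f2 = io.StringIO("b1\nb2\n")

rows = list(zip_longest(f1, f2, fillvalue=''))
print(rows)
# [('a1\n', 'b1\n'), ('a2\n', 'b2\n'), ('a3\n', '')]
```

Note that every element still carries its trailing newline, which has to be stripped before the pieces are joined with commas.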

After this, you can write it to a new file one tuple at a time (strip each line's newline first, and end every merged row with one):

for listx in foo:
    new_file.write(','.join(j.rstrip('\n') for j in listx) + '\n')
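The loop above also merges the 4 header lines of every file. A minimal sketch that keeps only the first file's header lines, assuming the files are already open in text mode as in this answer; `merge_lines` and `header_lines` are hypothetical names.

```python
from itertools import zip_longest


def merge_lines(files, header_lines=4):
    """Merge already-open text files side by side without the csv module.

    Header lines are taken from the first file only. A line missing from
    a shorter file becomes a single empty field.
    """
    merged = []
    for i, parts in enumerate(zip_longest(*files, fillvalue='')):
        # Drop the trailing newline each line still carries
        parts = [p.rstrip('\n') for p in parts]
        if i < header_lines:
            merged.append(parts[0] + '\n')  # header: first file only
        else:
            merged.append(','.join(parts) + '\n')
    return merged
```

The result can be written with `new_file.writelines(merge_lines([file1, file2, file3]))`.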

PS: see the Python itertools documentation for more about zip_longest.

分開簡單 2024-10-05 14:18:49

You learn by doing (and trying, even). So, I'll just give you a few hints. Use the following functions:

If you really don't know what to do, I recommend you read the tutorial and Dive Into Python 3. (Depending on how much Python you know, you'll either have to read through the first few chapters or cut straight to the file IO chapters.)

飘落散花 2024-10-05 14:18:49

Purely for learning purposes

A simple approach that does not use the csv module:

# your list of csv files
csv_files = [file1, file2, ...]
HEADER_LINES = 4

# open every input file at once so they can be read in parallel
handles = [open(filex, 'r') for filex in csv_files]
with open(filename, 'w') as file_to_write:
    # read the header lines from every file, but write only the first file's
    for _ in range(HEADER_LINES):
        header_lines = [next(f) for f in handles]
        file_to_write.write(header_lines[0])
    # zip() yields one line from each file per iteration
    for lines in zip(*handles):
        # lines keep their trailing '\n', so strip them before joining with commas
        file_to_write.write(','.join(line.rstrip('\n') for line in lines) + '\n')
# close the input files
for f in handles:
    f.close()