Add the filename as the last column of a CSV file

Posted on 2024-11-02 04:23:33

I have a Python script which modifies a CSV file to add the filename as the last column:

import sys
import glob

for filename in glob.glob(sys.argv[1]):
    file = open(filename)
    data = [line.rstrip() + "," + filename for line in file]
    file.close()

    file = open(filename, "w")
    file.write("\n".join(data))
    file.close()

Unfortunately, it also adds the filename to the header (first) row of the file. I would like the string "ID" added to the header instead. Can anybody suggest how I could do this?

不羁少年 2024-11-09 04:23:33

Have a look at the official csv module.

我还不会笑 2024-11-09 04:23:33

Here are a few minor notes on your current code:

It's a bad idea to use file as a variable name, since that shadows the built-in type.
You can close the file objects automatically by using the with syntax.
Don't you want to add an extra column in the header line, called something like Filename, rather than just omitting a column in the first row?
If your filenames have commas (or, less probably, newlines) in them, you'll need to make sure that the filename is quoted - just appending it won't do.
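To make the quoting point concrete, here is a minimal sketch comparing plain string appending with csv.writer. The helper names append_naive and append_with_csv and the sample filename are made up for illustration; they are not part of the original script.

```python
import csv
import io

def append_naive(line, filename):
    # What the original script does: plain string concatenation.
    return line.rstrip() + "," + filename

def append_with_csv(line, filename):
    # Parse the line, add the field, and let csv.writer decide on quoting.
    row = next(csv.reader([line]))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(row + [filename])
    return out.getvalue().rstrip("\n")

name = "sales,2024.csv"  # a filename that contains a comma
naive = append_naive("1,apple\n", name)
quoted = append_with_csv("1,apple\n", name)

# Parsing the results back shows the difference in field counts.
print(len(next(csv.reader([naive]))))   # 4 fields: the filename was split
print(len(next(csv.reader([quoted]))))  # 3 fields: the filename stayed whole
```

The naive version produces a row that a CSV parser reads as four fields, while the csv.writer version wraps the filename in quotes so it remains a single field.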

That last consideration would incline me to use the csv module instead, which will deal with the quoting and unquoting for you. For example, you could try something like the following code:

import glob
import csv
import sys

for filename in glob.glob(sys.argv[1]):
    data = []
    with open(filename, newline='') as finput:
        for i, row in enumerate(csv.reader(finput)):
            to_append = "Filename" if i == 0 else filename
            data.append(row+[to_append])
    # In Python 3 the csv module needs text mode with newline=''.
    with open(filename, 'w', newline='') as foutput:
        writer = csv.writer(foutput)
        for row in data:
            writer.writerow(row)

That may quote the data slightly differently from your input file, so you might want to play with the quoting options for csv.reader and csv.writer described in the documentation for the csv module.
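As a rough illustration of those quoting options, the sketch below writes the same row with two of the quoting constants from the csv module. The render helper and the sample row are assumptions made up for this example.

```python
import csv
import io

def render(row, quoting):
    # Write a single row to an in-memory buffer with the given quoting rule.
    out = io.StringIO()
    csv.writer(out, quoting=quoting, lineterminator="\n").writerow(row)
    return out.getvalue()

row = ["1", "apple pie", "a,b.csv"]

# QUOTE_MINIMAL (the default) quotes only fields that need it.
minimal = render(row, csv.QUOTE_MINIMAL)
# QUOTE_ALL quotes every field.
everything = render(row, csv.QUOTE_ALL)

print(minimal, end="")
print(everything, end="")
```

If the input files quote every field, passing quoting=csv.QUOTE_ALL to the writer should keep the output closer to the input style.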

As a further point, you might have good reasons for taking a glob as a parameter rather than just the files on the command line, but it's a bit surprising - you'll have to call your script as ./whatever.py '*.csv' rather than just ./whatever.py *.csv. Instead, you could just do:

for filename in sys.argv[1:]:

... and let the shell expand your glob before the script knows anything about it.
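If the script should work both ways (for example on a shell that does not expand wildcards), one option is to expand any remaining patterns yourself. This is only a sketch; expand_args is a hypothetical helper, not something from the original script.

```python
import glob

def expand_args(args):
    """Hypothetical helper: expand any pattern the shell left unexpanded."""
    filenames = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Keep the argument as-is when nothing matches, so a later open()
        # reports the missing file instead of it being silently skipped.
        filenames.extend(matches or [arg])
    return filenames
```

You would then loop over expand_args(sys.argv[1:]) instead of sys.argv[1:].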

One last thing - the current approach you're taking is slightly dangerous, in that if anything fails when writing back to the same filename, you'll lose data. The standard way of avoiding this is to instead write to a temporary file, and, if that was successful, rename the temporary file over the original. So, you might rewrite the whole thing as:

import csv
import sys
import tempfile
import shutil

for filename in sys.argv[1:]:
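    # In Python 3 the csv module needs text mode with newline='' on both files.
    with open(filename, newline='') as finput, \
         tempfile.NamedTemporaryFile('w', newline='', delete=False) as ftmp:
        writer = csv.writer(ftmp)
        for i, row in enumerate(csv.reader(finput)):
            to_append = "Filename" if i == 0 else filename
            writer.writerow(row+[to_append])
    # Only replace the original once the temporary file is fully written.
    shutil.move(ftmp.name, filename)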
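One possible refinement of the temporary-file approach, offered as an assumption rather than as part of the answer: create the temporary file in the same directory as the original, so the final rename stays on one filesystem and can be done with os.replace. The function name add_filename_column and its header parameter are made up for this sketch.

```python
import csv
import os
import tempfile

def add_filename_column(path, header="Filename"):
    """Append `path` to every row of the CSV file, and `header` to the first row."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as ftmp, \
             open(path, newline="") as finput:
            writer = csv.writer(ftmp)
            for i, row in enumerate(csv.reader(finput)):
                writer.writerow(row + [header if i == 0 else path])
        # Swap the finished file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: drop the partial file, keep the original.
        os.remove(tmp_path)
        raise
```

Calling add_filename_column(filename) for each entry in sys.argv[1:] would replace the loop body above. Note that mkstemp creates the file with owner-only permissions on POSIX systems, so the original file mode is not preserved by this sketch.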

淡淡的优雅 2024-11-09 04:23:33

You can try:

data = [file.readline().rstrip() + ",ID"]
data += [line.rstrip() + "," + filename for line in file]

独﹏钓一江月 2024-11-09 04:23:33

You can try changing your code, but using the csv module is recommended. This should give you the result you want:

import sys
import glob
import csv

filename = glob.glob(sys.argv[1])[0]

# Read all rows first, since the same file is rewritten below.
with open(filename, newline='') as infile:
    csv_output = list(csv.reader(infile))

# The header row gets "ID"; every other row gets the filename.
for i, row in enumerate(csv_output):
    row.append('ID' if i == 0 else filename)

with open(filename, 'w', newline='') as outfile:
    csv.writer(outfile, delimiter=',').writerows(csv_output)

泅人 2024-11-09 04:23:33

Use the CSV module that comes with Python.

import csv
import sys

def process_file(filename):
    # Read the contents of the file into a list of lines.
    with open(filename, 'r', newline='') as f:
        contents = f.readlines()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)

        # Process the header.
        header = next(reader)
        header.append('ID')
        writer.writerow(header)

        # Process each row of the body.
        for row in reader:
            row.append(filename)
            writer.writerow(row)

# Run the function on all command-line arguments. Note that this does no
# checking for things such as file existence or permissions.
for filename in sys.argv[1:]:
    process_file(filename)

You can run this as follows:

blair@blair-eeepc:~$ python csv_add_filename.py file1.csv file2.csv

岛徒 2024-11-09 04:23:33

You can use fileinput to do in-place editing:

import sys
import glob
import fileinput

for filename in glob.glob(sys.argv[1]):
    # With inplace=True, anything printed is written back into the file.
    with fileinput.FileInput(filename, inplace=True) as f:
        for line in f:
            if f.lineno() == 1:
                print(line.rstrip() + ",ID")
            else:
                print(line.rstrip() + "," + filename)
