需要在Python中对多个CSV文件中的每一行进行数学运算

发布于 2024-12-03 17:55:48 字数 576 浏览 2 评论 0原文

我有大约 100 个 CSV 文件,每个月都要处理一次,我试图解决这个问题,但遇到了困难。我开始了解有关 Python 的一些事情,但是将几件事结合起来仍然给我带来问题,所以我无法弄清楚这一点。

这是我的问题:

我有很多 CSV 文件,这就是我需要做的:

在每行的前面添加一个“列”(或后面,实际上并不重要,但前面是理想的)。另外,每行有 5 行(不包括将要添加的文件名),格式如下:

6 位 ID 号,YYYY-MM-DD(1),YYYY-MM-DD(2),YYYY- MM-DD(3),1-2 位数字

我需要为文件中的每一行从 YYYY-MM-DD(2) 中减去 YYYY-MM-DD(3)(没有标题)行),对于给定目录中的每个 CSV。

我需要行内的文件名,因为我将合并这些文件(如果包含在脚本中,那就太棒了,但我想我可以弄清楚该部分),并且我需要知道记录来自哪个文件。文件名的格式始终为“4-5-digit-number.csv”

我希望这是有意义的,如果没有,请告诉我。我有点不知道从哪里开始,所以我没有任何真正开始为我工作的示例代码。真的很沮丧,所以我感谢你们提供的任何帮助,这个网站很棒!

迈兰

I have about 100 CSV files I have to operate on once a month and I was trying to wrap my head around this but I'm running into a wall. I'm starting to understand some things about Python, but combining several things is still giving me issues, so I can't figure this out.

Here's my problem:

I have many CSV files, and here's what I need done:

add a "column" to the front of each row (or the back, doesn't matter really, but front is ideal). In addition, each line has 5 rows (not counting the filename that will be added), and here's the format:

6-digit ID number,YYYY-MM-DD(1),YYYY-MM-DD(2),YYYY-MM-DD(3),1-2-digit number

I need to subtract YYYY-MM-DD(3) from YYYY-MM-DD(2) for every line in the file (there is no header row), for every CSV in a given directory.

I need the filename inside the row because I will combine the files (which, if is included in the script would be awesome, but I think I can figure that part out), and I need to know what file the records came from. Format of filename is always '4-5-digit-number.csv'

I hope this makes sense, if it does not, please let me know. I'm kind of stumped as to where to even begin, so I don't have any sample code that even really began to work for me. Really frustrated, so I appreciate any help you guys may provide, this site rocks!

Mylan

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

南渊 2024-12-10 17:55:48

标准库中有一个工具可以完成这些任务:

要迭代目录中的所有 CSV 文件,请使用 glob 模块

import glob
for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
    #do_something

要解析 CSV 文件,请使用 csv模块

import csv
with open(csvfilename, "rb") as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    for row in reader:
        # row is a list of all the entries in the current row

要解析日期并计算差异,请使用 datetime module

from datetime import datetime
startdate = datetime.strptime("1999-10-20", "%Y-%m-%d")
enddate = datetime.strptime("2003-02-28", "%Y-%m-%d")
delta = enddate - startdate # difference in days

将值添加到行的开头:

row[0:0] = [str(delta)]

将文件名附加到行的末尾:

row.append(csvfilename)

将一行写入新的 CSV 文件:

with open(csvfilename, "wb") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(row)

总而言之,您将得到:

import glob
import csv
from datetime import datetime

with open("combined_files_csv", "wb") as outfile:
    writer = csv.writer(outfile, delimiter=",")
    for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
        with open(csvfilename, "rb") as infile:
            reader = csv.reader(infile, delimiter=",")
            for row in reader:
                startdate = datetime.strptime(row[3], "%Y-%m-%d")
                enddate = datetime.strptime(row[2], "%Y-%m-%d")
                delta = enddate - startdate # difference in days
                row[0:0] = [str(delta)]
                row.append(csvfilename)
                writer.writerow(row)

There's a tool in the standard library for each of these tasks:

To iterate over all CSV files in a directory, use the glob module:

import glob
for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
    #do_something

To parse a CSV file, use the csv module:

import csv
with open(csvfilename, "rb") as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    for row in reader:
        # row is a list of all the entries in the current row

To parse a date and calculate a difference, use the datetime module:

from datetime import datetime
startdate = datetime.strptime("1999-10-20", "%Y-%m-%d")
enddate = datetime.strptime("2003-02-28", "%Y-%m-%d")
delta = enddate - startdate # difference in days

To add a value to the beginning of a row:

row[0:0] = [str(delta)]

To append the filename to the end of a row:

row.append(csvfilename)

And to write a row to a new CSV file:

with open(csvfilename, "wb") as csvfile:
    writer = csv.writer(csvfile, delimiter=",")
    writer.writerow(row)

Taken all together, you get:

import glob
import csv
from datetime import datetime

with open("combined_files_csv", "wb") as outfile:
    writer = csv.writer(outfile, delimiter=",")
    for csvfilename in glob.glob(r"C:\mydirectory\*.csv"):
        with open(csvfilename, "rb") as infile:
            reader = csv.reader(infile, delimiter=",")
            for row in reader:
                startdate = datetime.strptime(row[3], "%Y-%m-%d")
                enddate = datetime.strptime(row[2], "%Y-%m-%d")
                delta = enddate - startdate # difference in days
                row[0:0] = [str(delta)]
                row.append(csvfilename)
                writer.writerow(row)
笑饮青盏花 2024-12-10 17:55:48

该程序的基本轮廓如下:

  1. 使用 os 模块从感兴趣的目录中获取文件名
  2. 一次读入每个文件
  3. 对于文件中的每一行,将其分成带有 < 的列code>columns = line.split(",")
  4. 使用 datetime.date 将“2011-05-03”等字符串转换为 datetime.dates。
  5. 从第二个日期中减去第三个日期,得到一个 datetime.timedelta。
  6. 将所有信息放入您想要的格式(提示:str(foo) 生成 foo 的字符串表示形式,适用于任何类型)并记住它以供稍后使用
  7. 关闭您的文件,重新打开它以进行写入,然后将您的新内容写入

The basic outline of the program is going to be like this:

  1. Use the os module to get the filenames out of the directory/directories of interest
  2. Read in each file one at a time
  3. For each line in the file, split it into columns with columns = line.split(",")
  4. Use datetime.date to convert strings like "2011-05-03" to datetime.dates.
  5. Subtract the third date from the second, which yields a datetime.timedelta.
  6. Put all your information in the format you want (hint: str(foo) yields a string representation of foo, for just about any type) and remember it for later
  7. Close your file, reopen it for writing, and write your new stuff in
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文