Delete or ignore a line from a CSV - Python

Posted on 2025-01-19 03:41:47

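I'm trying to find a way to make my script ignore or delete the first line of my CSV files. I know this can be done with pandas, but is it possible without it?

Many thanks for your help.

Here is my code: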

from os import mkdir
from os.path import join, splitext, isdir
from glob import iglob
from csv import DictReader
from collections import defaultdict
from urllib.request import urlopen
from shutil import copyfileobj

csv_folder = r"/Users/folder/PycharmProjects/pythonProject/CSVfiles/"
glob_pattern = "*.csv"
for file in iglob(join(csv_folder, glob_pattern)):
    with open(file) as csv_file:
        reader = DictReader(csv_file)
        save_folder, _ = splitext(file)
        if not isdir(save_folder):
            mkdir(save_folder)
        title_counter = defaultdict(int)
        for row in reader:
            url = row["link"]
            title = row["title"]
            title_counter[title] += 1
            _, ext = splitext(url)
            save_filename = join(save_folder, f"{title}_{title_counter[title]}{ext}".replace('/', '-'))
            print(f"'{save_filename}'")
            with urlopen(url) as req, open(save_filename, "wb") as save_file:
                copyfileobj(req, save_file)

Comments (2)

您的好友蓝忘机已上羡 2025-01-26 03:41:47

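Use the next() function to skip the first row of your CSV.

with open(file) as csv_file:
    reader = DictReader(csv_file)

    # Skip the first row. DictReader takes its field names from the header
    # line, so this discards the first data row after the header.
    next(reader)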
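
Which "first line" gets skipped depends on where next() is called. The sketch below contrasts the two cases on in-memory text. The sample strings and both function names are made up for illustration, and it assumes the unwanted line is either a stray line above the header or the first data row.

```python
import csv
import io

# Hypothetical sample: a stray line sits above the real header
SAMPLE_WITH_JUNK = (
    "exported by some tool\n"
    "title,link\n"
    "a,http://example.com/a.png\n"
    "b,http://example.com/b.png\n"
)

# Hypothetical sample: header on line 1, data from line 2
SAMPLE_CLEAN = (
    "title,link\n"
    "a,http://example.com/a.png\n"
    "b,http://example.com/b.png\n"
)


def rows_skipping_first_line(text_file):
    """Discard the first physical line, then parse the rest with DictReader."""
    next(text_file, None)  # the None default avoids StopIteration on an empty file
    return list(csv.DictReader(text_file))


def rows_skipping_first_record(text_file):
    """Let DictReader take the header from line 1, then discard the first data row."""
    reader = csv.DictReader(text_file)
    next(reader, None)
    return list(reader)


junk_skipped = rows_skipping_first_line(io.StringIO(SAMPLE_WITH_JUNK))
record_skipped = rows_skipping_first_record(io.StringIO(SAMPLE_CLEAN))
```

In the question's script, the first variant would correspond to calling next(csv_file, None) before DictReader(csv_file) is created, and the second to calling next(reader, None) afterwards.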

笑,眼淚并存 2025-01-26 03:41:47

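You could read the raw text from the file as normal, split it on newlines, and delete the first line:

with open(filename, 'r') as file:  # Open the file (closed automatically on exit)
    content = file.read()          # Read the whole file
lines = content.split("\n")        # Split the text on the newline character
del lines[0]                       # Delete index 0 of the resulting list, i.e. the first line

This reads the whole file into memory, so it may be slow for larger CSV files and is probably not the best solution.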
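
For larger files, one possible alternative is to stream the lines instead of reading everything at once. This is only a sketch: copy_without_first_line is a hypothetical helper, and the demo uses in-memory streams with made-up content in place of real files.

```python
import io


def copy_without_first_line(src, dst):
    """Copy text stream `src` to `dst`, dropping the first line.

    Lines are streamed across rather than collected into a list first.
    """
    next(src, None)      # discard line 1; the None default tolerates an empty stream
    dst.writelines(src)  # write the remaining lines as they are read


# Demo on in-memory streams; with real files, pass two handles from open()
source = io.StringIO("junk line\ntitle,link\na,http://example.com/a.png\n")
target = io.StringIO()
copy_without_first_line(source, target)
```

The same function should work with file handles, for example one opened for reading and one opened with mode "w".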

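Alternatively, you could simply skip the first row in your for loop.
Instead of:

...
for row in reader:
...

you could use:

...
for row_num, row in enumerate(reader):
    if row_num == 0:
        continue
    ...

That skips the first row the reader yields (with DictReader, the first data row after the header).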
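
The same loop can also be written without a counter by using itertools.islice from the standard library. This is a sketch on made-up sample data. The function name rows_after_first is hypothetical.

```python
import csv
import io
from itertools import islice


def rows_after_first(text_file):
    """Return an iterator over DictReader rows, skipping the first row it yields."""
    reader = csv.DictReader(text_file)
    return islice(reader, 1, None)  # start at index 1, no upper bound


# Hypothetical sample with a header and three data rows
sample = io.StringIO("title,link\na,u1\nb,u2\nc,u3\n")
titles = [row["title"] for row in rows_after_first(sample)]
```

Like enumerate(reader), islice consumes the reader lazily, so rows are not loaded into a list up front.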
