Multiple downloads - CSV files

Posted on 01-17 23:35

I have a script, shown below, that downloads the files linked in the rows of a single CSV file. It works well, and all files are downloaded into the root of my 'Python Project' folder.

But I would like to add two features. First, I want to process not just 1 but multiple (20 or more) CSV files, so that I don't have to change the name in open('name1.csv') manually every time the script finishes. Second, the downloads need to be placed in a folder with the same name as the CSV file they come from. Hopefully I'm clear enough :)

Then I would have:

  • name1.csv -> name1 folder -> download from name1 csv
  • name2.csv -> name2 folder -> download from name2 csv
  • name3.csv -> name3 folder -> download from name3 csv
  • ...

Any help or suggestions would be much appreciated :) Many thanks!

from collections import Counter
import urllib.request
import csv
import os

with open('name1.csv') as csvfile:  #need to add multiple .csv files here.
    reader = csv.DictReader(csvfile)
    title_counts = Counter()
    
    for row in reader:
        name, ext = os.path.splitext(row['link'])
        title = row['title']
        title_counts[title] += 1
        title_filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-') #need to create a folder for each CSV file with the download inside.
        urllib.request.urlretrieve(row['link'], title_filename)


Comments (2)

迷乱花海 2025-01-24 23:35:42

You need to add an outer loop that iterates over the files in a specific folder. You can use either os.listdir(), which returns a list of all entries in the folder, or glob.iglob() with a *.csv pattern to get only the files with a .csv extension.
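As a rough illustration of the glob.iglob() option, here is a minimal sketch. The helper name find_csv_files is hypothetical and not part of the original answer, and sorting is an added assumption to make the processing order predictable.

```python
from glob import iglob
from os.path import join


def find_csv_files(folder):
    """Return the paths of all *.csv files directly inside `folder`, sorted.

    Hypothetical helper for illustration only.
    """
    # iglob yields matching paths lazily, already joined with `folder`;
    # sorted() collects them into a list with a predictable order.
    return sorted(iglob(join(folder, "*.csv")))
```

Each returned path can be passed straight to open(), because the folder is already part of it.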

There are also some minor improvements you can make to your code. You're using Counter in a way that lets it be replaced with defaultdict or even a plain dict. Also, urllib.request.urlretrieve() is part of a legacy interface that might get deprecated, so you can replace it with a combination of urllib.request.urlopen() and shutil.copyfileobj().
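Taken in isolation, the two replacements might look like the sketch below. The function names numbered_names and download are hypothetical, chosen only for this example; the library calls are the ones named in the paragraph above.

```python
from collections import defaultdict
from shutil import copyfileobj
from urllib.request import urlopen


def numbered_names(titles):
    """Append a running count to each title: a, a, b -> a_1, a_2, b_1."""
    counts = defaultdict(int)  # a missing title starts at 0
    names = []
    for title in titles:
        counts[title] += 1
        names.append(f"{title}_{counts[title]}")
    return names


def download(url, destination):
    """Stream the response body for `url` into the file `destination`."""
    with urlopen(url) as response, open(destination, "wb") as out_file:
        # copyfileobj copies in chunks, so the whole body is never held in memory.
        copyfileobj(response, out_file)
```

The counter is what keeps two rows with the same title from overwriting each other's download.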

Finally, to create a folder you can use os.mkdir(), but first you need to check whether the folder already exists using os.path.isdir(); otherwise os.mkdir() raises FileExistsError.
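An alternative to the isdir() check, not used in this answer's code, is os.makedirs() with exist_ok=True, which does nothing if the folder is already there. A minimal sketch, with ensure_folder_for as a hypothetical helper name:

```python
from os import makedirs
from os.path import splitext


def ensure_folder_for(csv_path):
    """Create (if needed) and return the folder named after `csv_path`.

    Hypothetical helper for illustration only.
    """
    folder, _ = splitext(csv_path)  # "/data/name1.csv" -> "/data/name1"
    makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return folder
```

Calling it twice for the same CSV file is harmless, which is convenient when the script is re-run.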

Full code:

from os import mkdir
from os.path import join, splitext, isdir
from glob import iglob
from csv import DictReader
from collections import defaultdict
from urllib.request import urlopen
from shutil import copyfileobj

csv_folder = r"/some/path"
glob_pattern = "*.csv"
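for file in iglob(join(csv_folder, glob_pattern)):
    # newline="" is what the csv module recommends when opening files
    with open(file, newline="") as csv_file:
        reader = DictReader(csv_file)
        # name1.csv -> folder name1, created next to the CSV file
        save_folder, _ = splitext(file)
        if not isdir(save_folder):
            mkdir(save_folder)
        # restart the numbering of repeated titles for every CSV file
        title_counter = defaultdict(int)
        for row in reader:
            url = row["link"]
            # "/" would act as a path separator, so replace it as the question's script does
            title = row["title"].replace("/", "-")
            title_counter[title] += 1
            _, ext = splitext(url)
            save_filename = join(save_folder, f"{title}_{title_counter[title]}{ext}")
            # stream the response straight into the target file
            with urlopen(url) as req, open(save_filename, "wb") as save_file:
                copyfileobj(req, save_file)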
祁梦 2025-01-24 23:35:42

For 1: Just loop over a list containing the names of the files you want.
You can get that list with os.listdir(path), which returns the names of the entries inside "path" (in your case, the folder containing the CSV files).
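A minimal sketch of that loop follows. The helper name list_csv_paths is hypothetical, and the extension filter is an added assumption, since os.listdir() returns every entry in the folder, not only CSV files.

```python
import os


def list_csv_paths(path):
    """Return full paths of the .csv files that os.listdir() finds in `path`.

    Hypothetical helper for illustration only.
    """
    csv_paths = []
    # listdir gives bare names in arbitrary order, so sort them first.
    for entry in sorted(os.listdir(path)):
        full_path = os.path.join(path, entry)
        # Skip sub-folders and anything that is not a .csv file.
        if entry.lower().endswith(".csv") and os.path.isfile(full_path):
            csv_paths.append(full_path)
    return csv_paths
```

Because os.listdir() returns names without the folder, each name has to be joined with "path" before it is opened.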
