Multiple downloads - CSV files
I have a script, below, that downloads the files linked from a single CSV file. It works well, and all files are downloaded into the root of my 'Python Project' folder.
But I would like to add two features. First, it should process not just 1 but multiple (20 or more) CSV files, so that I don't have to change the name in open('name1.csv') by hand every time the script finishes. Second, the downloads need to be placed in a folder with the same name as the CSV file they come from. Hopefully I'm clear enough :)
Then I would have:
- name1.csv -> name1 folder -> downloads from name1.csv
- name2.csv -> name2 folder -> downloads from name2.csv
- name3.csv -> name3 folder -> downloads from name3.csv
- ...
Any help or suggestions would be much appreciated :) Many thanks!
from collections import Counter
import urllib.request
import csv
import os

with open('name1.csv') as csvfile:  # need to add multiple .csv files here.
    reader = csv.DictReader(csvfile)
    title_counts = Counter()
    for row in reader:
        name, ext = os.path.splitext(row['link'])
        title = row['title']
        title_counts[title] += 1
        title_filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-')  # need to create a folder for each CSV file with the download inside.
        urllib.request.urlretrieve(row['link'], title_filename)
Comments (2)
You need to add an outer loop that iterates over the files in a specific folder. You can use either os.listdir(), which returns a list of all entries, or glob.iglob() with the *.csv pattern to get only files with the .csv extension.
There are also some minor improvements you can make to your code. You're using Counter in a way that could be replaced with defaultdict or even a plain dict. Also, urllib.request.urlretrieve() is part of a legacy interface that might become deprecated, so you can replace it with a combination of urllib.request.urlopen() and shutil.copyfileobj().
Finally, to create a folder you can use os.mkdir(), but first you need to check whether the folder already exists using os.path.isdir(); otherwise os.mkdir() raises a FileExistsError exception.
Full code:
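A minimal sketch of how these suggestions might fit together. This is an illustration, not necessarily the answerer's exact code. It assumes every CSV has 'title' and 'link' columns as in the question, and the function names download_from_csv and download_all are made up for this example.

```python
import csv
import glob
import os
import shutil
import urllib.request
from collections import defaultdict


def download_from_csv(csv_path):
    """Download every 'link' in csv_path into a folder named after the CSV."""
    # name1.csv -> name1 (the folder is created next to the CSV file)
    folder = os.path.splitext(csv_path)[0]
    if not os.path.isdir(folder):
        os.mkdir(folder)

    # Counts how often each title has been seen, to keep file names unique
    title_counts = defaultdict(int)
    with open(csv_path, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            _, ext = os.path.splitext(row['link'])
            title = row['title']
            title_counts[title] += 1
            filename = f"{title}_{title_counts[title]}{ext}".replace('/', '-')
            target = os.path.join(folder, filename)
            # urlopen() + copyfileobj() instead of the legacy urlretrieve()
            with urllib.request.urlopen(row['link']) as response, \
                    open(target, 'wb') as out:
                shutil.copyfileobj(response, out)


def download_all(directory='.'):
    """Outer loop: process every .csv file found directly in directory."""
    for csv_path in glob.iglob(os.path.join(directory, '*.csv')):
        download_from_csv(csv_path)
```

Calling download_all('.') from the folder that holds name1.csv, name2.csv, ... should then produce name1/, name2/, ... with the downloads inside. If you prefer, os.makedirs(folder, exist_ok=True) does the existence check and the creation in one call.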
For 1: just loop over a list containing the names of the files you want.
You can get that list with "os.listdir(path)", which returns a list of the entries inside "path" (in your case, the folder containing the CSV files).
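As a rough sketch of that idea: os.listdir() returns every entry, including subfolders and non-CSV files, so you would probably want to filter the result. The helper name list_csv_files is just an assumption for this example.

```python
import os


def list_csv_files(path):
    """Return the sorted names of the .csv files directly inside path."""
    return sorted(
        name for name in os.listdir(path)
        if name.lower().endswith('.csv')
        and os.path.isfile(os.path.join(path, name))
    )
```

You could then loop with "for name in list_csv_files('.')" and use os.path.splitext(name)[0] to get the folder name (name1.csv -> name1) for each file.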