How do I save images from web scraping to a folder? (Python)

Posted on 2025-01-30 17:24:43

How do I make it so that each image I garnered from web scraping is then stored to a folder? I use Google Colab currently since I am just practicing stuff. I want to store them in my Google Drive folder.

This is my code for web scraping:

import requests 
from bs4 import BeautifulSoup 

def getdata(url):
  r = requests.get(url)
  return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')

imgdata = []
for i in soup.find_all('img'):
  imgdata = i['src']
  print(imgdata)


最终幸福 2025-02-06 17:24:43

I created a pics folder manually in the folder where the script runs, to store the pictures in. Then I changed the for loop in your code so that it appends each URL to the imgdata list instead of overwriting it. The try/except block is there because not every URL in the list is valid.

import requests 
from bs4 import BeautifulSoup 

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata('https://www.yahoo.com/')
soup = BeautifulSoup(htmldata, 'html.parser')

imgdata = []
for i in soup.find_all('img'):
    src = i.get('src')  # some <img> tags have no src attribute
    if src:
        imgdata.append(src)  # append to the list instead of overwriting it

filename = "pics/picture{}.jpg"
for i, url in enumerate(imgdata):
    print(f"img {i+1} / {len(imgdata)}")
    # try block because not everything in the imgdata list is a valid url
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()  # treat HTTP error responses as failures too
        with open(filename.format(i), "wb") as f:
            f.write(r.content)
    except requests.RequestException:
        print("URL is not valid")
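
Many of the "invalid" entries are probably relative or protocol-relative paths, or inline data: URIs, rather than broken links. A minimal sketch of cleaning the list before downloading, using only the standard library, could look like this. The helper name resolve_image_urls and the example.com paths are made up for illustration and are not part of the original answer.

```python
from urllib.parse import urljoin, urlparse


def resolve_image_urls(page_url, srcs):
    """Turn raw src values into absolute http(s) URLs and drop the rest.

    Hypothetical helper: page_url is the page that was scraped,
    srcs is the list of src attribute values collected from it.
    """
    resolved = []
    for src in srcs:
        if not src:
            continue  # missing or empty src attribute
        # urljoin resolves "/path" and "//host/path" against the page URL
        # and leaves already-absolute URLs unchanged
        absolute = urljoin(page_url, src.strip())
        # keep only URLs that can be fetched over HTTP(S); this skips data: URIs
        if urlparse(absolute).scheme in ("http", "https"):
            resolved.append(absolute)
    return resolved


# Example with made-up src values
example = resolve_image_urls(
    "https://www.yahoo.com/",
    ["//cdn.example.com/a.png", "/img/logo.png", "data:image/gif;base64,R0lG"],
)
print(example)
```

In the script above, this would be called as imgdata = resolve_image_urls('https://www.yahoo.com/', imgdata) just before the download loop, so that fewer entries end up in the except branch.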

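
The question also asks about storing the images in a Google Drive folder from Colab, which the answer does not cover. One possible approach, sketched here under some assumptions, is to mount Drive with google.colab's drive.mount and point the filename template at a folder inside it. The helper get_output_dir is hypothetical, and the "MyDrive" subfolder is an assumption about how Colab lays out a mounted Drive; check the file browser in your own session if the path differs.

```python
import os


def get_output_dir(subdir="pics", mount_point="/content/drive"):
    """Return a folder for downloads: on Google Drive in Colab, local otherwise.

    Hypothetical helper; subdir and mount_point are illustrative defaults.
    """
    try:
        from google.colab import drive  # importable only inside Colab
    except ImportError:
        # Not running in Colab: fall back to the current working directory
        base = os.getcwd()
    else:
        drive.mount(mount_point)  # asks for authorization on first use
        # Assumption: the user's Drive root appears as "MyDrive" under the mount
        base = os.path.join(mount_point, "MyDrive")
    out_dir = os.path.join(base, subdir)
    os.makedirs(out_dir, exist_ok=True)  # no need to create the folder by hand
    return out_dir


out_dir = get_output_dir()
# Same template as in the answer, but rooted in the chosen folder
filename = os.path.join(out_dir, "picture{}.jpg")
print(filename.format(0))
```

With this in place, the download loop can keep calling filename.format(i) unchanged, and the files should show up in the pics folder of the mounted Drive.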