My Python code for web scraping and downloading five images doesn't work. Using Blender (3D) as the IDE
I am running this code to web scrape and download 5 images from Google. What is supposed to happen is that after I run the code, a Chrome browser window comes up, the code clicks on an image, downloads it, scrolls down to another image, clicks and downloads that one, and so on, up to five times. What actually happens is that the Chrome browser comes up for a second, then closes, and nothing else happens... though the code doesn't throw any Python errors.
I am using the Blender 3D modeling software as my IDE because I hope to use this Python code in a Blender add-on in the future (a Blender add-on is a bit like an extension in Google Chrome: a small piece of software you install into Blender to extend its functionality). That is the reason for the extra import lines at the top of my code...
One relevant item is that I get this warning:
E:\GLOBAL ASSETS\SCRIPTING\Web Scraping Images\web-scraper.blend\web-scraper.py:21: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
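From the warning text, my understanding is that newer Selenium versions want the chromedriver path wrapped in a Service object, roughly like this (I have not actually switched my script over yet, so this is untested on my side):

from selenium.webdriver.chrome.service import Service

service = Service("E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe")
wd = webdriver.Chrome(service=service)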
This is the only other piece of information in my console after I run the code:
DevTools listening on ws://127.0.0.1:52643/devtools/browser/ea448f70-0066-4d50-bfb8-8671528789b8
Any help on this would be appreciated.
import bpy
import subprocess
import sys
import os
import cv2
import random
from random import randrange
from PIL import Image #make sure both pil from c:\users\mjoe6\appdata\local\packages\pythonsoftwarefoundation.python.3.10_qbz5n2kfra8p0\localcache\local-packages\python310\site-packages is in blender pip3.exe folder
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import io
import time
# path to python.exe
python_exe = os.path.join(sys.prefix, 'bin', 'python.exe')
py_lib = os.path.join(sys.prefix, 'lib', 'site-packages','pip')
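# Note: in Blender, python_exe above is typically used to pip-install third-party
# modules (selenium, Pillow, opencv-python, requests) into Blender's bundled Python.
# A common pattern is shown below, illustrative only and commented out so it does
# not run every time the script executes:
# subprocess.call([python_exe, "-m", "pip", "install", "selenium"])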
PATH = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\chromedriver.exe"
wd = webdriver.Chrome(PATH)
def get_images_from_google(wd, delay, max_images):
    def scroll_down(wd):
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)

    url = "https://www.google.com/search?q=cats+2019+IMDb&sxsrf=ALiCzsZmBIp-JZmZv23v6ORoc0VL2NRuxg:1654543304286&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiRjquPxpn4AhUymo4IHbC6Cx0Q_AUoAnoECAEQBA&biw=1536&bih=714&dpr=1.25#imgrc=ixH-uoDQFN_gpM"
    wd.get(url)

    image_urls = set()
    while len(image_urls) < max_images:
        scroll_down(wd)
        thumbnails = wd.find_elements(By.CLASS_NAME, "Q4LuWd")

        for img in thumbnails[len(image_urls):max_images]:
            try:
                img.click()
                time.sleep.delay
            except:
                continue

            images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
            for image in images:
                if image.get_attribute('src') and 'http' in image.get_attribute('src'):
                    image_urls.add(image.get_attribute('src'))
                    print(f"Found {len(image_urls)}")

    return image_urls
def download_image(download_path, url, file_name):
    try:
        image_content = requests.get(url).content
        image_file = io.BytesIO(image_content)
        image = Image.open(image_file)
        file_path = download_path + file_name

        with open(file_path, "wb") as f:
            image.save(f, "PNG")

        print("Success")
    except Exception as e:
        print('FAILED -', e)
urls = get_images_from_google(wd, 1, 5)
print(urls)
wd.quit()
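Note that download_image is defined above but never called yet; the plan is to eventually feed it the returned URLs, roughly like this (the destination folder below is just a placeholder, not part of the script yet):

download_folder = "E:\\GLOBAL ASSETS\\SCRIPTING\\Web Scraping Images\\"  # hypothetical destination folder
for i, url in enumerate(urls):
    download_image(download_folder, url, str(i) + ".png")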
Comments (1)
The error could be in these lines. In the click loop,
time.sleep.delay
should change to something like
time.sleep(delay)
Also, in the line
if image.get_attribute('src') and 'http' in image.get_attribute('src'):
you should change http to https.
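In context, the inner loop with both of those changes applied would look roughly like this (a sketch of the suggestion above, not a guaranteed fix for the whole script):

for img in thumbnails[len(image_urls):max_images]:
    try:
        img.click()
        time.sleep(delay)  # actually pause, instead of the no-op time.sleep.delay
    except:
        continue

    images = wd.find_elements(By.CLASS_NAME, "n3VNCb")
    for image in images:
        # only keep full-resolution https URLs
        if image.get_attribute('src') and 'https' in image.get_attribute('src'):
            image_urls.add(image.get_attribute('src'))
            print(f"Found {len(image_urls)}")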