Google captcha obstacle in a website scraper

Posted on 2025-02-10 14:25:16

I am currently working on a scraper for aniworld.to.
My goal is to enter an anime name and have all of its episodes downloaded.
I have everything working except one thing...
The website has a Watch button. That button redirects you to https://aniworld.to/redirect/SOMETHING, and that page has a captcha, which means the link is not in the HTML...
Is there a way to bypass this and get the link in Python? Or a way to display the captcha so I can solve it myself?
The captcha only appears once in a long while, so solving it by hand would be fine.
The only thing I need from that page is the redirect link. It looks like this:
https://vidoza.net/embed-something.html
My very, very WIP code is here if it helps: https://github.com/wolfswolke/aniworld_scraper


Comments (1)

无边思念无边月 2025-02-17 14:25:16

Mitchdu showed me how to do it.
If anyone else needs help, here is my code: https://github.com/wolfswolke/aniworld_scraper/blob/main/src/logic/captcha.py

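import os
from threading import Thread

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager


def open_captcha_window(full_url):
    """Open the redirect page in a small Chrome window and return the URL it lands on."""
    # Optional uBlock extension, looked up in ./extensions/ublock under the working directory.
    # os.path.join keeps the path valid on non-Windows systems too.
    path_to_ublock = os.path.join(os.getcwd(), "extensions", "ublock")

    options = webdriver.ChromeOptions()
    options.add_argument("--app=" + full_url)  # app mode: no tabs or address bar
    options.add_argument("--window-size=423,705")
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    if os.path.exists(path_to_ublock):
        options.add_argument("--load-extension=" + path_to_ublock)

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    driver.get(full_url)

    # Poll every 0.3 s, for up to 100 s, until the captcha has been solved and
    # the browser has been redirected away from the original URL.
    wait = WebDriverWait(driver, timeout=100, poll_frequency=0.3)
    wait.until(lambda drv: drv.current_url != full_url)

    new_page = driver.current_url
    # Shut the browser down in a background thread so the caller is not blocked.
    Thread(target=threaded_driver_close, args=(driver,)).start()
    return new_page


def threaded_driver_close(driver):
    # quit() ends the whole session, including the chromedriver process;
    # close() would only close the current window.
    driver.quit()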
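
The wait condition in the code above fires as soon as `current_url` differs from `full_url` in any way. A slightly stricter check, sketched here as an assumption and not taken from the original answer, waits until the browser has actually left the starting host (for example, gone from aniworld.to to vidoza.net). The helper names `has_left_host` and `redirect_finished` are hypothetical.

```python
from urllib.parse import urlsplit


def has_left_host(start_url: str, current_url: str) -> bool:
    """Return True once current_url points at a different host than start_url."""
    start_host = urlsplit(start_url).hostname
    current_host = urlsplit(current_url).hostname
    # No host at all (e.g. "about:blank") means no real page has loaded yet.
    if not current_host:
        return False
    return current_host != start_host


def redirect_finished(start_url: str):
    """Build a predicate that can be passed to WebDriverWait.until()."""
    # The predicate receives the driver and only reads its current_url attribute.
    return lambda driver: has_left_host(start_url, driver.current_url)
```

With this in place, the wait line could read `wait.until(redirect_finished(full_url))`. Whether the stricter check is needed depends on how the redirect page behaves, which the post does not describe.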