
Python  Scrapy  Python web crawling

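Scrapy: I want to simulate logging in to the Tianyancha (天眼查) website, but it uses a slide-to-align verification. What can I do to get the simulated login to succeed?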

Posted 2022-09-11 17:43:27

Here is the core of my simulated-login code:

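import time

import scrapy
# Selenium 3.x API: webdriver.PhantomJS and find_element_by_xpath() no longer exist in current Selenium 4 releases.
from selenium import webdriver

# PHONE and PASSWORD are defined elsewhere in the original project.


class LoginSpider(scrapy.Spider):  # illustrative class name; the post shows only these methods
    # name, login_url, set_sleep_time() and sub_parse() are defined elsewhere in the original spider.
    # Let Scrapy space out the follow-up requests instead of calling time.sleep() in a callback,
    # which would block Scrapy's event loop.
    custom_settings = {"DOWNLOAD_DELAY": 1}

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        dcap = dict(webdriver.DesiredCapabilities.PHANTOMJS)  # used to set the User-Agent
        # dcap[
        #     "phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
        self.driver = webdriver.PhantomJS(
            executable_path='C:\\Users\\gt\\Desktop\\tutorial\\phantomjs.exe',
            desired_capabilities=dcap)

        self.driver.maximize_window()

    def start_requests(self):
        print("start request!!!")
        yield scrapy.Request(self.login_url, callback=self.parse)

    def parse(self, response):
        print("parse!!!")

        self.driver.get(response.url)
        self.set_sleep_time()
        # print(self.driver.page_source)
        # Note: absolute XPaths like these break as soon as the page layout changes.
        self.driver.find_element_by_xpath('//*[@id="web-content"]/div/div[2]/div/div[2]/div/div[3]/div[1]/div[1]').click()
        print("CLICK LEFT")
        time.sleep(1)
        temp = self.driver.find_element_by_xpath('//*[@id="web-content"]/div/div[2]/div/div[2]/div/div[3]/div[3]/div[2]/input')
        temp.click()
        temp.send_keys(PHONE)
        print("PHONE SENT")
        self.driver.find_element_by_xpath('//*[@id="web-content"]/div/div[2]/div/div[2]/div/div[3]/div[1]/div[2]').click()
        print("CLICK RIGHT")
        time.sleep(5)
        temp2 = self.driver.find_element_by_xpath('//*[@id="web-content"]/div/div[2]/div/div[2]/div/div[3]/div[2]/div[3]/input')
        temp2.click()
        temp2.send_keys(PASSWORD)
        print("PASSWORD SENT")
        self.driver.find_element_by_xpath('//*[@id="web-content"]/div/div[2]/div/div[2]/div/div[3]/div[2]/div[5]').click()
        self.set_sleep_time()
        time.sleep(3)
        # print(self.driver.page_source)
        print("About to start parsing.....")
        # A list of cookie dicts; scrapy.Request accepts cookies in this list form.
        cookies = self.driver.get_cookies()
        # print(cookies)

        with open('data/url_list.txt', mode='r', encoding='utf-8') as f:
            for line in f:
                # As in the original: drop the line ending and remove every '=' character.
                url = line.strip().replace('=', '')
                if not url:
                    continue  # skip blank lines
                print(url)
                yield scrapy.Request(url, cookies=cookies, callback=self.sub_parse)

    def closed(self, reason):
        # Scrapy calls this when the spider finishes; shut down the PhantomJS process.
        self.driver.quit()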
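The hand-off at the end of `parse` works because Selenium's `driver.get_cookies()` returns a list of cookie dicts. Below is a minimal sketch, assuming the cookies are only needed for the same site, that flattens them into the `{name: value}` dict form that `scrapy.Request(cookies=...)` also accepts, and caches them in a JSON file so a later run could reuse the session instead of logging in (and facing the slider) again. The helper names and the `cookies.json` file name are hypothetical, and a cached session only helps until the site expires it.

```python
import json
import tempfile
from pathlib import Path


def selenium_cookies_to_dict(selenium_cookies):
    """Convert the list returned by driver.get_cookies() to {name: value}.

    Each Selenium cookie is a dict with keys such as "name", "value",
    "domain", "path", "secure", "httpOnly" and (optionally) "expiry";
    only the name/value pair is kept here.
    """
    return {
        c["name"]: c["value"]
        for c in selenium_cookies
        if "name" in c and "value" in c
    }


def save_cookies(cookies, path):
    """Write a {name: value} cookie dict to a JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps(cookies, ensure_ascii=False), encoding="utf-8")


def load_cookies(path):
    """Read the cookie dict back; return {} if no file has been saved yet."""
    p = Path(path)
    if not p.exists():
        return {}
    return json.loads(p.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Made-up data shaped like driver.get_cookies() output.
    sample = [
        {"name": "session", "value": "abc123", "domain": ".example.com", "path": "/"},
        {"name": "token", "value": "xyz", "domain": ".example.com",
         "path": "/", "httpOnly": True},
    ]
    cookies = selenium_cookies_to_dict(sample)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cookies.json"
        save_cookies(cookies, path)
        print(load_cookies(path))  # {'session': 'abc123', 'token': 'xyz'}
    # In the spider, the dict can be passed straight to Scrapy:
    # scrapy.Request(url, cookies=cookies, callback=self.sub_parse)
```

Passing the plain dict drops the domain and path information, which is fine when every follow-up URL is on the same site as the login.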

Comments (1)

温凉少女 · 2022-09-18 17:43:27

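You could solve the slider verification by hand: pause the crawler at that point,
and let it continue once a person has completed the check.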
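A minimal sketch of that idea, assuming a visible (non-headless) browser so a person can drag the slider: the script opens the login page, then polls until a caller-supplied check reports that the login went through (or a timeout expires), and only then collects the cookies. The `wait_for_manual_login` helper, the `is_logged_in` check and the placeholder URL are hypothetical; a real check might test whether the browser has left the login page or whether an element that only appears after login is present.

```python
import time


def wait_for_manual_login(driver, is_logged_in, timeout=300.0, poll=2.0):
    """Pause until is_logged_in(driver) is True or `timeout` seconds pass.

    While this loop runs the script sends no requests of its own, so a person
    can drag the slider in the visible browser window. Returns
    driver.get_cookies() on success, or None if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_logged_in(driver):
            return driver.get_cookies()
        time.sleep(poll)
    return None


if __name__ == "__main__":
    # Needs Selenium and a local Chrome; the window is deliberately not headless.
    from selenium import webdriver

    login_url = "https://example.com/login"  # placeholder: use the spider's login_url
    driver = webdriver.Chrome()
    driver.get(login_url)
    print("Please finish the slider check in the browser window...")
    cookies = wait_for_manual_login(
        driver,
        # Hypothetical success check: the browser has navigated away from the login URL.
        lambda d: d.current_url != login_url,
    )
    driver.quit()
    print("logged in" if cookies is not None else "timed out")
```

In a Scrapy spider this wait would go in `start_requests` (or `__init__`) before any page requests are yielded. It blocks Scrapy's event loop while it waits, which is acceptable here only because nothing else should be crawled until the login is done. A simpler variant replaces the polling loop with `input("Press Enter after completing the slider...")`.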