Selecting radio buttons with Scrapy

Posted 2024-10-18 21:46:34, word count 241, views 4, comments 0

How would I go about selecting radio buttons with Scrapy?

I am trying to select the first of the following radio buttons, but passing formdata={'rd1':'E'} does not work:

<input type="radio" name="rd1" value="E" checked="checked" />Employee
<input type="radio" name="rd2" value="o" />Other

Comments (2)

不再让梦枯萎 2024-10-25 21:46:34

You could use lxml.cssselect to select the radio buttons.

>>> import lxml.html
>>> from lxml.cssselect import CSSSelector
>>> html = """
... <input type="radio" name="rd1" value="E" checked="checked" />Employee
... <input type="radio" name="rd2" value="o" />Other
... """
>>> input_sel = CSSSelector('input[name="rd1"]')
>>> lx = lxml.html.fromstring(html)
>>> input_sel(lx)
[<InputElement b7e7665c name='rd1' type='radio'>]

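If lxml is not available, a similar lookup can be sketched with the standard library's html.parser. This is a swapped-in technique, not the lxml approach, and the class name CheckedRadioCollector is made up for illustration. It collects the name and value of every radio input marked as checked, which are the pairs a browser would submit by default for this markup.

```python
from html.parser import HTMLParser


class CheckedRadioCollector(HTMLParser):
    """Collect {name: value} for every radio input marked as checked."""

    def __init__(self):
        super().__init__()
        self.checked = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> are routed here as well.
        attrs = dict(attrs)
        is_radio = tag == "input" and (attrs.get("type") or "").lower() == "radio"
        if is_radio and "checked" in attrs:
            self.checked[attrs.get("name")] = attrs.get("value")


# The markup from the question.
html = (
    '<input type="radio" name="rd1" value="E" checked="checked" />Employee\n'
    '<input type="radio" name="rd2" value="o" />Other'
)

collector = CheckedRadioCollector()
collector.feed(html)
checked_radios = collector.checked
print(checked_radios)  # {'rd1': 'E'}
```

The resulting dictionary has the same shape as the formdata argument, so it can serve as a starting point before overriding individual fields.
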
心碎无痕… 2024-10-25 21:46:34

I've just bumped into a similar problem (that's why I'm here, of course). This wonderful site of the City of Chicago (https://webapps1.chicago.gov/buildingrecords/home) requires your bot to 'Agree' to their 'liability disclaimer' (which is very funny indeed!) by selecting a radio button and clicking a submit button. I solved the problem with the help of scrapy.FormRequest.from_response:

import scrapy


def agreement_failed(response):
    # Inspect the response to the agreement POST here.
    # Return something truthy on failure, or nothing on success.
    return


class InspectionsListSpider(scrapy.Spider):
    name = 'inspections_list'
    start_urls = ['https://webapps1.chicago.gov/buildingrecords/home']

    def parse(self, response):
        # Pre-fill the form with id "agreement" and override two fields.
        return scrapy.FormRequest.from_response(
            response,
            formid='agreement',
            formdata={"agreement": "Y",
                      "submit": "submit"},
            callback=self.after_agreement,
        )

    def after_agreement(self, response):
        if agreement_failed(response):
            self.logger.error("agreement failed!")
            return
        ...  # whatever you are going to do after

Read alongside the page's source, the code is fairly self-explanatory. You may also need some of the other from_response parameters described here: https://docs.scrapy.org/en/latest/topics/request-response.html?highlight=FormRequest()#scrapy.http.FormRequest.from_response

P.S. The riddle on the next page can be solved the same way. :)
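As the Scrapy documentation describes it, from_response pre-populates the request with the values found in the HTML form, and formdata overrides individual fields. The sketch below mimics only that merge-and-encode step with the standard library. It is not Scrapy's implementation; the helper name build_form_body and the default value "N" are assumptions made for illustration, not taken from the Chicago page.

```python
from urllib.parse import urlencode


def build_form_body(defaults, overrides):
    """Merge pre-filled form values with overrides and URL-encode the result."""
    merged = dict(defaults)   # values read from the form (hypothetical here)
    merged.update(overrides)  # formdata-style overrides win on conflicts
    return urlencode(merged)


# Hypothetical pre-filled value: assume the form starts with "N" selected.
defaults = {"agreement": "N"}
# The overrides used in the spider.
overrides = {"agreement": "Y", "submit": "submit"}

body = build_form_body(defaults, overrides)
print(body)  # agreement=Y&submit=submit
```

If a submitted value does not show up as expected, comparing a body like this against what the browser sends in its developer tools is a quick way to spot a missing or misnamed field.
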
