htmlsession：cssselect.xpath.expressionerror：不支持伪元素

发布于 2025-02-07 02:03:52 字数 2647 浏览 0 评论 0原文

我正在使用htmlsession进行Web Scraper项目，我计划使用一组用户定义的关键字来刮擦搜索引擎结果。我已经开始为刮板编写代码，这是：

from requests_html import HTMLSession

class Scraper():
    def scrapedata(self,tag):
        url = f'https://www.ask.com/web?q={tag}'
        s = HTMLSession()
        r = s.get(url)
        print(r.status_code)

        qlist = []

        ask = r.html.find('div.PartialSearchResults-item')

        for a in ask:
            print(a.find('a.PartialSearchResults-item-title-link.result-link::text', first = True ).text.strip())


ask = Scraper()
ask.scrapedata('ferrari')

但是，当我运行此代码时，而不是获得与终端中搜索的关键字相关的所有网页标题的列表，我得到了以下错误：

[Running] python -u "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py"
200
Traceback (most recent call last):
  File "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py", line 19, in <module>
    ask.scrapedata('ferrari')
  File "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py", line 15, in scrapedata
    print(a.find('a.PartialSearchResults-item-title-link.result-link::text', first = True ).text.strip())
  File "C:\Python310\lib\site-packages\requests_html.py", line 212, in find
    for found in self.pq(selector)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 261, in __call__
    result = self._copy(*args, parent=self, **kwargs)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 247, in _copy
    return self.__class__(*args, **kwargs)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 232, in __init__
    xpath = self._css_to_xpath(selector)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 243, in _css_to_xpath
    return self._translator.css_to_xpath(selector, prefix)
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 190, in css_to_xpath
    return ' | '.join(self.selector_to_xpath(selector, prefix,
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 190, in <genexpr>
    return ' | '.join(self.selector_to_xpath(selector, prefix,
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 222, in selector_to_xpath
    xpath = self.xpath_pseudo_element(xpath, selector.pseudo_element)
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 232, in xpath_pseudo_element
    raise ExpressionError('Pseudo-elements are not supported.')
cssselect.xpath.ExpressionError: Pseudo-elements are not supported.

[Done] exited with code=1 in 17.566 seconds

我什至不知道这意味着什么，我搜索了互联网，而是遇到了与IE7有关的问题，并且我不知道与我的问题有什么关系，尤其是因为我将Microsoft Edge用作默认值Web浏览器。另外，我希望依靠社区中经验丰富的成员的帮助来帮助我解决问题。谢谢喀麦隆。

原文

I'm working on a web scraper project with HTMLSession, I plan to scrape Ask search engine results using a set of user-defined keywords. I have already started with writing the code for my scraper, here it is:

from requests_html import HTMLSession

class Scraper():
    def scrapedata(self,tag):
        url = f'https://www.ask.com/web?q={tag}'
        s = HTMLSession()
        r = s.get(url)
        print(r.status_code)

        qlist = []

        ask = r.html.find('div.PartialSearchResults-item')

        for a in ask:
            print(a.find('a.PartialSearchResults-item-title-link.result-link::text', first = True ).text.strip())


ask = Scraper()
ask.scrapedata('ferrari')

However when I run this code, instead of getting the list of all the web page titles related to the keywords searched in my terminal as it should have, I get the following errors:

[Running] python -u "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py"
200
Traceback (most recent call last):
  File "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py", line 19, in <module>
    ask.scrapedata('ferrari')
  File "c:\Users\user\Documents\AAprojects\Whelpsgroups1\Beauty\scraper.py", line 15, in scrapedata
    print(a.find('a.PartialSearchResults-item-title-link.result-link::text', first = True ).text.strip())
  File "C:\Python310\lib\site-packages\requests_html.py", line 212, in find
    for found in self.pq(selector)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 261, in __call__
    result = self._copy(*args, parent=self, **kwargs)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 247, in _copy
    return self.__class__(*args, **kwargs)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 232, in __init__
    xpath = self._css_to_xpath(selector)
  File "C:\Python310\lib\site-packages\pyquery\pyquery.py", line 243, in _css_to_xpath
    return self._translator.css_to_xpath(selector, prefix)
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 190, in css_to_xpath
    return ' | '.join(self.selector_to_xpath(selector, prefix,
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 190, in <genexpr>
    return ' | '.join(self.selector_to_xpath(selector, prefix,
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 222, in selector_to_xpath
    xpath = self.xpath_pseudo_element(xpath, selector.pseudo_element)
  File "C:\Python310\lib\site-packages\cssselect\xpath.py", line 232, in xpath_pseudo_element
    raise ExpressionError('Pseudo-elements are not supported.')
cssselect.xpath.ExpressionError: Pseudo-elements are not supported.

[Done] exited with code=1 in 17.566 seconds

I don't even know what this means, I searched the Internet but instead came across problems related to IE7 and I don't see what has to do with my problem, especially since I'm using Microsoft Edge as my default web browser. Also, I hope to count on the help of more experienced members of the community to help me solve the problem. Thank you from Cameroon.

分享到QQ

分享到微博