3.4 CSS Selectors

Published 2024-02-05 21:13:20

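CSS stands for Cascading Style Sheets. Its selector syntax is a language for locating parts of an HTML document.

CSS selectors have a simpler syntax than XPath but are less powerful. In fact, when we call the css method of a Selector object, it internally uses the Python library cssselect to translate the CSS selector expression into an XPath expression and then calls the Selector object's xpath method.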
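
The translation step can be pictured with a toy translator. The sketch below is an illustration under stated assumptions, not cssselect's code: toy_css_to_xpath is a hypothetical helper that handles only E, E1 E2, E1>E2, [ATTR] and [ATTR=VALUE], and it emits the limited XPath dialect understood by the standard library's xml.etree.ElementTree (for example .//div//img) rather than the descendant-or-self:: expressions that cssselect produces. The small page used here is a cut-down stand-in for a real document.

```python
import re
import xml.etree.ElementTree as ET

# One compound selector: an optional tag name plus an optional [ATTR] or [ATTR=VALUE].
_COMPOUND = re.compile(r"^([\w-]*)(?:\[([\w-]+)(?:=([\w-]+))?\])?$")


def toy_css_to_xpath(css):
    """Translate a tiny CSS subset into an XPath string that ElementTree accepts."""
    steps = []
    axis = ".//"  # the first step searches every descendant of the context node
    for token in css.replace(">", " > ").split():
        if token == ">":
            axis = "/"  # child combinator (E1>E2)
            continue
        match = _COMPOUND.match(token)
        if match is None:
            raise ValueError(f"unsupported selector part: {token!r}")
        tag, attr, value = match.groups()
        step = tag or "*"
        if attr and value is not None:
            step += f"[@{attr}='{value}']"
        elif attr:
            step += f"[@{attr}]"
        steps.append(axis + step)
        axis = "//"  # whitespace is the descendant combinator (E1 E2)
    return "".join(steps)


# Hypothetical sample page, reduced to what the demonstration needs.
page = ET.fromstring(
    "<html><body>"
    "<div id='images-1'><a href='image1.html'><img src='image1.jpg'/></a></div>"
    "<div id='images-2' class='small'><a href='image4.html'><img src='image4.jpg'/></a></div>"
    "</body></html>"
)

for css in ("img", "div img", "body>div", "[class]", "[id=images-1]"):
    xpath = toy_css_to_xpath(css)
    print(f"{css!r} -> {xpath!r} ({len(page.findall(xpath))} match(es))")
```

The design mirrors the idea described above: the CSS expression is never evaluated directly, it is only rewritten into XPath and then handed to an XPath engine. Anything outside the tiny subset (pseudo-classes, comma-separated groups) raises ValueError here, whereas cssselect covers far more of the CSS syntax.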

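Table 3-2 lists some of the basic CSS selector syntax.

Table 3-2  CSS selectors

As with XPath, we will demonstrate CSS selectors through a few examples.

First, create an HTML document and build an HtmlResponse object from it: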

>>> from scrapy.selector import Selector
>>> from scrapy.http import HtmlResponse
>>> body = '''
... <html>
...  <head>
...    <base href='http://example.com/' />
...    <title>Example website</title>
...  </head>
...  <body>
...    <div id='images-1' style="width: 1230px;">
...     <a href='image1.html'>Name: Image 1 <br/><img src='image1.jpg' /></a>
...     <a href='image2.html'>Name: Image 2 <br/><img src='image2.jpg' /></a>
...     <a href='image3.html'>Name: Image 3 <br/><img src='image3.jpg' /></a>
...    </div>
...
...    <div id='images-2' class='small'>
...     <a href='image4.html'>Name: Image 4 <br/><img src='image4.jpg' /></a>
...     <a href='image5.html'>Name: Image 5 <br/><img src='image5.jpg' /></a>
...    </div>
...  </body>
... </html>
... '''
>>> response = HtmlResponse(url='http://www.example.com', body=body, encoding='utf8')

E: selects E elements.

# Select all img elements
>>> response.css('img')
[<Selector xpath='descendant-or-self::img' data='<img src="image1.jpg">'>,
 <Selector xpath='descendant-or-self::img' data='<img src="image2.jpg">'>,
 <Selector xpath='descendant-or-self::img' data='<img src="image3.jpg">'>,
 <Selector xpath='descendant-or-self::img' data='<img src="image4.jpg">'>,
 <Selector xpath='descendant-or-self::img' data='<img src="image5.jpg">'>]

E1,E2: selects both E1 and E2 elements.

# Select all base and title elements
>>> response.css('base,title')
[<Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<base href="http://example.com/">'>,
 <Selector xpath='descendant-or-self::base | descendant-or-self::title' data='<title>Example website</title>'>]

E1 E2: selects E2 elements that are descendants of an E1 element.

# img elements that are descendants of a div
>>> response.css('div img')
[<Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image1.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image2.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image3.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image4.jpg">'>,
 <Selector xpath='descendant-or-self::div/descendant-or-self::*/img' data='<img src="image5.jpg">'>]

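E1>E2: selects E2 elements that are children of an E1 element.

# div elements that are children of body
>>> response.css('body>div')
[<Selector xpath='descendant-or-self::body/div' data='<div id="images-1" style="width: 1230px;'>,
 <Selector xpath='descendant-or-self::body/div' data='<div id="images-2" class="small">\n '>]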

[ATTR]: selects elements that have an ATTR attribute.

# Select elements that have a style attribute
>>> response.css('[style]')
[<Selector xpath='descendant-or-self::*[@style]' data='<div id="images-1" style="width: 1230px;'>]

[ATTR=VALUE]: selects elements that have an ATTR attribute whose value is VALUE.

# Select elements whose id attribute is images-1
>>> response.css('[id=images-1]')
[<Selector xpath="descendant-or-self::*[@id = 'images-1']" data='<div id="images-1" style="width: 1230px;'>]

E:nth-child(n): selects E elements that are the nth child of their parent.

# Select the a that is the first child of each div
>>> response.css('div>a:nth-child(1)')
[<Selector xpath="descendant-or-self::div/*[name() = 'a' and (position() = 1)]" data='<a href="image1.html">Name: Image 1 <br>'>,
 <Selector xpath="descendant-or-self::div/*[name() = 'a' and (position() = 1)]" data='<a href="image4.html">Name: Image 4 <br>'>]

# Select the first a in the second div
>>> response.css('div:nth-child(2)>a:nth-child(1)')
[<Selector xpath="descendant-or-self::*/*[name() = 'div' and (position() = 2)]/*[name() = 'a' and (position() = 1)]" data='<a href="image4.html">Name: Image 4 <br>'>]
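
One consequence of the definition is easy to miss: n counts every child of the parent, not only the E children. The sketch below illustrates this with the standard library's xml.etree.ElementTree rather than Scrapy. select_nth_child is a hypothetical helper written for this illustration, and the p element at the start of the second div is an assumption added to show the effect; it is not part of the sample document.

```python
import xml.etree.ElementTree as ET


def select_nth_child(root, tag, n):
    """Return every <tag> element that is the n-th child (1-based) of its parent."""
    if n < 1:
        raise ValueError("n is 1-based and must be at least 1")
    matches = []
    for parent in root.iter():   # visit every element as a potential parent
        children = list(parent)  # all child elements, whatever their tag
        if len(children) >= n and children[n - 1].tag == tag:
            matches.append(children[n - 1])
    return matches


# Hypothetical variant of the sample page: the second div starts with a <p>.
html = ET.fromstring(
    "<body>"
    "<div id='images-1'><a href='image1.html'/><a href='image2.html'/><a href='image3.html'/></div>"
    "<div id='images-2'><p>caption</p><a href='image4.html'/><a href='image5.html'/></div>"
    "</body>"
)

first_links = [a.get("href") for a in select_nth_child(html, "a", 1)]
second_links = [a.get("href") for a in select_nth_child(html, "a", 2)]
print(first_links)
print(second_links)
```

In this variant the second div contributes nothing for n = 1, because its first child is a p rather than an a, while image4.html is picked up for n = 2 even though it is the first a in that div.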

E:first-child: selects E elements that are the first child of their parent.

E:last-child: selects E elements that are the last child of their parent.

# Select the last a in the first div
>>> response.css('div:first-child>a:last-child')
[<Selector xpath="descendant-or-self::*/*[name() = 'div' and (position() = 1)]/*[name() = 'a' and (position() = last())]" data='<a href="image3.html">Name: Image 3 <br>'>]

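E::text: selects the text nodes of E elements. (::text is an extension provided by Scrapy's selectors, not part of standard CSS.)

# Select the text of all a elements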
>>> sel = response.css('a::text')
>>> sel
[<Selector xpath='descendant-or-self::a/text()' data='Name: Image 1 '>,
 <Selector xpath='descendant-or-self::a/text()' data='Name: Image 2 '>,
 <Selector xpath='descendant-or-self::a/text()' data='Name: Image 3 '>,
 <Selector xpath='descendant-or-self::a/text()' data='Name: Image 4 '>,
 <Selector xpath='descendant-or-self::a/text()' data='Name: Image 5 '>]
>>> sel.extract()
['Name: Image 1 ',
 'Name: Image 2 ',
 'Name: Image 3 ',
 'Name: Image 4 ',
 'Name: Image 5 ']
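
The translated XPath, descendant-or-self::a/text(), selects only text nodes that are direct children of each a, so text nested inside child elements is not included. The sketch below illustrates that distinction with the standard library's xml.etree.ElementTree rather than Scrapy. direct_text_nodes is a hypothetical helper, and the extra " (thumbnail)" text and b element are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET


def direct_text_nodes(elem):
    """Collect the text nodes that are direct children of elem, in document order."""
    # ElementTree keeps the text before the first child in .text and the text
    # that follows each child element in that child's .tail.
    parts = [elem.text] + [child.tail for child in elem]
    return [part for part in parts if part]


# Hypothetical link: one of the sample <a> elements with extra content appended.
link = ET.fromstring(
    "<a href='image1.html'>Name: Image 1 <br/><img src='image1.jpg'/> (thumbnail)<b>new</b></a>"
)

texts = direct_text_nodes(link)
print(texts)
```

The word "new" is left out because it is a text node of b, not of a. The trailing space in 'Name: Image 1 ' is preserved, just as it is in the extract() output above.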

That concludes this introduction to CSS selectors. For more detail, see the CSS selectors specification: https://www.w3.org/TR/css3-selectors/
