How can I select links only from "#0-9 and A-Z" in Beautiful Soup?

Posted 2025-02-08 10:44:27 · 859 characters · 1 view · 0 comments

My URL is this:

https://en.wikipedia.org/wiki/List_of_South_Korean_dramas

This works well for selecting all links from A to Z:

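import requests
from bs4 import BeautifulSoup

s = requests.Session()  # assumption: s is a requests session (not shown in the original post)
url = 'https://en.wikipedia.org/wiki/List_of_South_Korean_dramas'

link = s.get(url)
link_soup = BeautifulSoup(link.text, 'lxml')
links = (
    link_soup
    .select_one('#A')          # anchor element for section "A"
    .parent                    # the heading that wraps it
    .find_next_sibling("ul")   # the list that follows the heading
    .find_all("a", href=True)
)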

But when I try select_one('#0-9'):

....

    link_soup
    .select_one('#0-9')
    .parent
    .find_next_sibling("ul")
    .find_all("a", href=True)
)

I get this error:

SelectorSyntaxError: Malformed id selector at position 0
  line 1:
#0-9
^

How can I select only the links under "#0-9 and A-Z"?
I know I could use a for loop with re to change the ending of the URL
and scrape the links from there manually, but is there a way to get the same result using select or bs4?

Thanks again for the help.

冬天旳寂寞 2025-02-15 10:44:27

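To answer the direct question: '#0-9' fails because an unescaped CSS identifier cannot begin with a digit. You can instead use an attribute = value CSS selector that names the id attribute and its value. The digits then sit inside a quoted string, so they pose no problem for the selector parser.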

link_soup.select('[id="0-9"]')

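Or escape the leading digit using its Unicode code point. For the character 0 that is \000030, which can be abbreviated to \30; no trailing space is needed here because the next character, -, is not a hex digit. The backslash is doubled in the Python string literal so that the selector parser receives a single one.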

link_soup.select('#\\30-9')
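
To see that the two selectors above pick out the same element, here is a minimal sketch that runs without a network request. The HTML fragment and its /wiki/Example_... links are made up for illustration; they only imitate the heading, anchor, and list layout that the question's code relies on, and html.parser is used instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (not copied from Wikipedia): each heading wraps an
# element carrying the section id and is followed by a sibling <ul> of links.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/Example_0">Example 0</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/Example_A">Example A</a></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Attribute selector: the id value is a quoted string, so digits are fine.
by_attribute = soup.select_one('[id="0-9"]')

# Escaped id selector: the Python literal "#\\30-9" is the CSS text #\30-9.
by_escape = soup.select_one("#\\30-9")

# Both selectors should resolve to the very same element in the tree.
same_element = by_attribute is by_escape

# Same DOM walk as in the question, starting from the matched element.
digit_links = [
    a["href"]
    for a in by_attribute.parent.find_next_sibling("ul").find_all("a", href=True)
]
print(same_element, digit_links)
```

Either form can replace '#0-9' in the question's chain; the attribute form is arguably easier to read because it needs no escaping.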

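However, you could specify a single selector that extracts all the links in one go, without walking up and down the DOM: match every h2 that does not contain #See_also, step to the ul immediately after it, and collect the a elements inside.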

links = ['https://en.wikipedia.org' + i['href'] for i in link_soup.select('h2:not(:has(#See_also)) + ul a')]
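
As a rough illustration of how that single selector behaves, the sketch below applies it to a small made-up fragment rather than the live page. The markup and the /wiki/Example_... paths are assumptions for the demo, and the real page's structure may differ from it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two letter sections plus a "See also" section whose
# links should be excluded from the result.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/Example_0">Example 0</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/Example_A">Example A</a></li></ul>
<h2><span id="See_also">See also</span></h2>
<ul><li><a href="/wiki/Example_other">Other list</a></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# h2:not(:has(#See_also)) keeps headings that do not contain #See_also,
# "+ ul" steps to the list right after each heading, "a" collects its links.
links = [
    "https://en.wikipedia.org" + a["href"]
    for a in soup.select("h2:not(:has(#See_also)) + ul a")
]
print(links)
```

In this fragment the "See also" links are skipped because their heading contains #See_also, while the 0-9 and A lists are both collected in document order.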
