How can I select only the links from "#0-9 and A-Z" in BeautifulSoup?
My URL is this:
https://en.wikipedia.org/wiki/List_of_South_Korean_dramas
This works well for selecting all links from A to Z:
link = s.get(url)
link_soup = BeautifulSoup(link.text, 'lxml')
links = (
link_soup
.select_one('#A')
.parent
.find_next_sibling("ul")
.find_all("a", href=True)
)
But when I try select_one with #0-9:
....
link_soup
.select_one('#0-9')
.parent
.find_next_sibling("ul")
.find_all("a", href=True)
)
I get this error:
SelectorSyntaxError: Malformed id selector at position 0
line 1:
#0-9
^
How can I select only the links from "#0-9 and A-Z"?
I know I can just use a for loop and use re to change the ending of the URL
and manually scrape the links from there, but is there a way to get the same results using select or bs4?
Thanks again for the help.
Comments (1)
To answer the direct question, you can use an attribute = value CSS selector to specify the id attribute and its value. The digits are inside quotes, so they do not pose a problem for the parser.
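As a minimal sketch of the attribute selector, the snippet below runs against a small inline HTML sample that only imitates the heading-then-list layout the question's code relies on. The sample markup, the html.parser backend and the variable names are assumptions for illustration, not the live Wikipedia page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: an element carrying the id sits inside
# a heading, and the heading is followed by the list of links.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul>
  <li><a href="/wiki/12_Signs_of_Love">12 Signs of Love</a></li>
  <li><a href="/wiki/49_Days">49 Days</a></li>
</ul>
<h2><span id="A">A</span></h2>
<ul>
  <li><a href="/wiki/About_Time">About Time</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# The id value is a quoted string here, so the leading digit is not a problem.
anchor = soup.select_one('[id="0-9"]')

# Same walk as in the question: up to the heading, then on to the next <ul>.
links = anchor.parent.find_next_sibling("ul").find_all("a", href=True)
hrefs = [a["href"] for a in links]
print(hrefs)
```

The rest of the chain is unchanged from the question; only the selector string differs.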
Or escape the leading digit using its Unicode code point. No following space is needed in this case, and the escape can be abbreviated to \30.
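A hedged sketch of the escaped form follows, again on an invented inline sample rather than the real page. It uses the variant with a single space after the hex escape, which CSS always allows as a terminator for the escape; the shorter form without the space depends on the next character (here "-") not being a hex digit. A raw string is used so that Python does not interpret \30 as its own octal escape.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the id value matters for this example.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul>
  <li><a href="/wiki/12_Signs_of_Love">12 Signs of Love</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# \30 is the CSS escape for the character "0" (code point U+0030).
# The space after it terminates the escape and is not part of the id.
escaped = soup.select_one(r"#\30 -9")

# The attribute selector is used here only to cross-check the result.
by_attribute = soup.select_one('[id="0-9"]')
print(escaped is by_attribute)
```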
However, you could specify a single pattern to extract all the links in one go, without walking up and down the DOM.
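The exact pattern is not given above, so the following is only one possible sketch and not necessarily what the answerer had in mind. It assumes the same layout as the question's code (the element with the id is a direct child of a heading whose adjacent sibling is the ul) and combines :has() with the adjacent sibling combinator. The sample HTML, section_ids and selector are illustrative names, and the live page's markup may differ.

```python
from string import ascii_uppercase

from bs4 import BeautifulSoup

# Hypothetical sample: two wanted sections and one unrelated section.
html = """
<h2><span id="0-9">0-9</span></h2>
<ul><li><a href="/wiki/12_Signs_of_Love">12 Signs of Love</a></li></ul>
<h2><span id="A">A</span></h2>
<ul><li><a href="/wiki/About_Time">About Time</a></li><li>No link here</li></ul>
<h2><span id="See_also">See also</span></h2>
<ul><li><a href="/wiki/Other">Other</a></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Section ids to collect: "0-9" plus the letters A to Z.
section_ids = ["0-9", *ascii_uppercase]

# One selector per section, joined into a single comma-separated selector list:
# an element with a direct child carrying the id, then the adjacent <ul>,
# then every <a> inside it that has an href.
selector = ", ".join(
    f':has(> [id="{section_id}"]) + ul a[href]' for section_id in section_ids
)

hrefs = [a["href"] for a in soup.select(selector)]
print(hrefs)
```

Note that "+" only matches a ul that immediately follows the heading, whereas find_next_sibling("ul") in the question would also skip over intervening elements, so the two approaches can differ on pages with extra markup between heading and list.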