BeautifulSoup bs.find_all('a') not working on webpage
Could someone please explain whether there is a way to scrape links from this webpage https://hackmd.io/@nearly-learning/near-201 using BeautifulSoup, or is it only possible with Selenium?
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://hackmd.io/@nearly-learning/near-201'
html = urlopen(url)
bs = BeautifulSoup(html.read(), 'lxml')  # also tried all other parsers
links = bs.find_all('a')  # only obtains 23 links, when there are actually loads more
for link in links:
    if 'href' in link.attrs:
        print(link.attrs['href'])
This only obtains a few links, and none in the actual body of the article.
I am, however, able to do it with Selenium:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://hackmd.io/@nearly-learning/near-201")
elems = driver.find_elements_by_xpath("//a[@href]")
for elem in elems:
    print(elem.get_attribute("href"))
But I would like to use BeautifulSoup if possible! Does anyone know whether it is?
2 Answers
If you don't want to use selenium, you can use the Markdown package to render the Markdown text to HTML and parse it with BeautifulSoup.
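The original code for this answer was lost in extraction; below is a minimal sketch of the approach it describes. It assumes HackMD serves the raw Markdown of a note at the `/download` suffix of its URL, which is an assumption about HackMD's URL scheme, and it uses the third-party `markdown` and `beautifulsoup4` packages.

```python
from urllib.request import urlopen

import markdown
from bs4 import BeautifulSoup

# Assumption: appending "/download" to a HackMD note URL returns its raw Markdown.
url = 'https://hackmd.io/@nearly-learning/near-201/download'
md_text = urlopen(url).read().decode('utf-8')

# Render the Markdown to HTML, then extract the links with BeautifulSoup.
html = markdown.markdown(md_text)
bs = BeautifulSoup(html, 'html.parser')
for link in bs.find_all('a'):
    if 'href' in link.attrs:
        print(link.attrs['href'])
```

This sidesteps the JavaScript-rendering problem entirely: the links live in the Markdown source, so no browser is needed.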
As mentioned, it needs selenium or something similar to render all of the content, and you can of course use selenium and BeautifulSoup in combination if you prefer to select your elements that way. Just push driver.page_source into BeautifulSoup().