Retrieving a link from a web page
I am using bs4 to run through a bunch of websites and grab a specific link off each page, but I am having an issue grabbing that link.
I have tried getting all the links using:
from bs4 import BeautifulSoup

soup = BeautifulSoup(browser.page_source, "lxml")  # browser is a Selenium WebDriver instance
print(soup.find_all('a'))
I have tried many other ways, including telling it the exact address of one site,
but every time it seems to return everything except the link I want.
For context, my code goes to pages of this site:
https://ce.naco.org/?find=true
These are two of the many pages in which I am searching for the link:
https://ce.naco.org/?county_info=06019
https://ce.naco.org/?county_info=08045
Under "COUNTY CONTACT" there is a link in most of these pages and that is the link I want to grab but I just can't find a way to make it return only that link it just seems to be invisible to bs4.
I think it has something to do with how the page loads data based on what the user clicks and since bs4 isn't interacting with the site it doesn't load the data??? but this is just a guess.
Comments (1)
Instead of scraping the page, just use this endpoint to grab the data:
https://ce.naco.org/get/county?fips=06019
Here's how:
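Below is a minimal sketch of that approach using the requests library; it assumes the endpoint returns JSON, and the exact fields of the response aren't reproduced here.

import requests

# Query the JSON endpoint directly instead of parsing the rendered page.
# The fips parameter is the county FIPS code that appears in the page URL.
url = "https://ce.naco.org/get/county?fips=06019"
response = requests.get(url)
response.raise_for_status()

data = response.json()  # assumes the endpoint returns JSON
print(data)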
This works for both county codes (06019 and 08045).
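For example, the same request can be repeated for each county code from the question (again assuming a JSON response):

import requests

# Fetch the data for each county FIPS code from the question.
for fips in ("06019", "08045"):
    resp = requests.get("https://ce.naco.org/get/county", params={"fips": fips})
    resp.raise_for_status()
    print(fips, resp.json())  # prints whatever JSON the endpoint returns for this county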