Retrieving a link from a webpage

Posted on 2025-02-13 11:24:42 · 821 words · 1 view · 0 comments


I am using bs4 to run through a bunch of websites and grab a specific link off each page, but I am having an issue grabbing that link.

I have tried getting all the links using:

 soup = BeautifulSoup(browser.page_source, "lxml")
 print(soup.find_all('a'))

I have tried many other ways, including pointing it at the exact address of one site,

but every attempt seems to return everything except the link I want.

For context, my code goes to pages of this site:
https://ce.naco.org/?find=true

These are two of the many pages I am searching for the link on:
https://ce.naco.org/?county_info=06019
https://ce.naco.org/?county_info=08045

Under "COUNTY CONTACT" there is a link on most of these pages, and that is the link I want to grab, but I can't find a way to make it return only that link; it just seems to be invisible to bs4.

I think it has something to do with how the page loads data based on what the user clicks, and since bs4 isn't interacting with the site, the data never loads, but this is just a guess.
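That guess is essentially correct: BeautifulSoup only parses the HTML string it is handed, so anything the page injects with JavaScript after a click never appears in `find_all('a')`. A minimal sketch of the effect (the HTML snippet below is invented for illustration, not taken from the real site):

```python
from bs4 import BeautifulSoup

# Static HTML as the browser first receives it. The county-website link is
# the kind of element the real site inserts later with JavaScript, so it
# does not exist in this initial source at all -- only the comment is here.
initial_html = """
<div id="county_info">
  <a href="/?find=true">Search</a>
  <!-- the COUNTY CONTACT link is inserted here by JavaScript on click -->
</div>
"""

soup = BeautifulSoup(initial_html, "html.parser")
links = [a["href"] for a in soup.find_all("a")]
print(links)  # only links present in the raw HTML are found
```

Re-reading `browser.page_source` after Selenium has performed the click (or waiting for the element to appear) would make such links visible to bs4.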


Comments (1)

一杯敬自由 2025-02-20 11:24:42


Instead of scraping the page, just use this endpoint to grab the data:

https://ce.naco.org/get/county?fips=06019

Here's how:

import requests

data = requests.get("https://ce.naco.org/get/county?fips=06019").json()
print(f'{data["county"]["Full_Address"]}\n{data["county"]["County_Website"]}')

Output:

2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105
http://www.co.fresno.ca.us

This works for both county codes:

import requests

county_codes = ["06019", "08045"]

with requests.Session() as s:
    for county_code in county_codes:
        # use the session (s.get) so the TCP connection is reused across requests
        data = s.get(f"https://ce.naco.org/get/county?fips={county_code}").json()
        print(f'{data["county"]["Full_Address"]}\n{data["county"]["County_Website"]}')

Output:

2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105
http://www.co.fresno.ca.us
108 8Th St<br>Glenwood Springs, CO 81601-3355
http://www.garfield-county.com/
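One follow-up detail: as the output above shows, the Full_Address field comes back with literal `<br>` tags as line separators. A small sketch of normalizing it into plain lines (the sample string is copied from the output above):

```python
# Full_Address from the endpoint embeds "<br>" as a line separator;
# replacing it with a newline yields a printable multi-line postal address.
raw_address = "2281 Tulare St<br>Hall Of Records<br>Fresno, CA 93721-2105"
address = raw_address.replace("<br>", "\n")
print(address)
# 2281 Tulare St
# Hall Of Records
# Fresno, CA 93721-2105
```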