How do I scrape _ngcontent-c0?

Posted on 2025-02-08 12:24:09


I'm trying to write my first ever scraper and I'm facing a problem. All of the tutorials I've watched use tags to select the part of the page you want to scrape. This is my code so far; I'm trying to scrape the title, date, and country of each story:

import requests
import csv
from bs4 import BeautifulSoup
from itertools import zip_longest

result = requests.get(
    "https://www.cdc.gov/globalhealth/healthprotection/stories-from-the-field/"
    "stories-by-country.html?Sort=Date%3A%3Adesc"
)
source = result.content
soup = BeautifulSoup(source, "lxml")

Now comes my problem. When I inspect the page to find a title, for example "CDC Vietnam uses Technology Innovations to Improve COVID-19 Response", it sits in a span marked with _ngcontent-c0.

When I try the code I learned:

title = soup.find_all("span__ngcontent-c0", {"class": ...})  # I don't know what goes here!

Of course it doesn't work. I have searched and found that _ngcontent-c0 comes from Angular, but I don't know how to scrape it. Any help?


小猫一只 2025-02-15 12:24:09


This site needs JavaScript to render the content you want to scrape, so the HTML that requests downloads does not contain it.
The page calls a JSON API to load all of its content, so you can request that API directly.
You need to do something like this:

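import requests

# JSON endpoint the page calls to load the story list
url = "https://www.cdc.gov/globalhealth/healthprotection/stories-from-the-field/dghp-stories-country.json"
result = requests.get(url)
result.raise_for_status()  # stop early on an HTTP error

for item in result.json()["items"]:
    print("Title: " + item["Title"])
    print("Date: " + item["Date"][0:10])  # keep only the YYYY-MM-DD part
    print("Country: " + ",".join(item["Country"]))  # Country is a list
    print()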

Output:

Title: System Strengthening – A One Health Approach
Date: 2016-12-12
Country: Kenya,Multiple

Title: Early Warning Alert and Response Network Put the Brakes on Deadly Diseases
Date: 2016-12-12
Country: Somalia,Syria
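
Since the question imports csv, the same three fields can also be saved to a file. This is only a minimal sketch, assuming every item carries the Title, Date, and Country keys used above. The helper names story_rows and write_stories are made up for illustration, and the time part of the sample dates is an assumption about how the API formats them:

```python
import csv
import io


def story_rows(items):
    """Turn API items into (title, date, country) rows."""
    rows = []
    for item in items:
        rows.append((
            item["Title"],
            item["Date"][0:10],         # keep only the YYYY-MM-DD part
            ",".join(item["Country"]),  # Country is a list of names
        ))
    return rows


def write_stories(items, handle):
    """Write a header line plus one CSV row per story to an open file."""
    writer = csv.writer(handle)
    writer.writerow(["Title", "Date", "Country"])
    writer.writerows(story_rows(items))


# Sample items shaped like the output above (time part is an assumption).
sample_items = [
    {
        "Title": "System Strengthening – A One Health Approach",
        "Date": "2016-12-12T00:00:00",
        "Country": ["Kenya", "Multiple"],
    },
    {
        "Title": "Early Warning Alert and Response Network Put the Brakes on Deadly Diseases",
        "Date": "2016-12-12T00:00:00",
        "Country": ["Somalia", "Syria"],
    },
]

# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_stories(sample_items, buffer)
print(buffer.getvalue())
```

With real data you would pass result.json()["items"] and a file opened with open("stories.csv", "w", newline="", encoding="utf-8") instead of the buffer.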

I hope I have been able to help you.
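
As for _ngcontent-c0 itself: it is an attribute that Angular adds to elements to scope component styles, not part of the tag name and not a class. If you ever do have the rendered HTML (for example, saved from a browser), something like the sketch below should match on that attribute. The HTML snippet is a made-up sample for illustration, not the real CDC markup:

```python
from bs4 import BeautifulSoup

# Hypothetical rendered markup; the real page structure may differ.
rendered_html = """
<div>
  <span _ngcontent-c0="">CDC Vietnam uses Technology Innovations to Improve COVID-19 Response</span>
  <span class="other">not a title</span>
</div>
"""

soup = BeautifulSoup(rendered_html, "html.parser")

# attrs={name: True} matches any tag that has the attribute, whatever its value.
spans = soup.find_all("span", attrs={"_ngcontent-c0": True})
titles = [span.get_text(strip=True) for span in spans]
print(titles)
```

The suffix (c0 here) is generated by Angular and can change between builds, so the JSON API above is the more stable route.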
