How do I select multiple elements when scraping?

Posted on 2025-02-07 12:55:38 · 1001 characters · 1 view · 0 comments

I am researching the headlines and content of New York Times news coverage.

This is what I have written:

import requests 
from bs4 import BeautifulSoup
import urllib.request as req 
import bs4
import pandas as pd
import numpy as np
import time
import warnings

url = 'https://www.nytimes.com/search?query=ukraine+war'

headers = {
   'User-Agent':
       'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
       ' AppleWebKit/537.36 (KHTML, like Gecko)'
       ' Chrome/102.0.5005.63 Safari/537.36 Edg/102.0.1245.33'
}

r = requests.get(url, headers=headers)

soup = BeautifulSoup(r.text, 'html5lib')

title = soup.find_all("h4", "p.css-16nhkrn")

for title in titles:
    title = title.text.strip()
    print(title)

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

text = title

wordcloud = WordCloud().generate(text)

plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

I am not sure why the p elements with class css-16nhkrn do not appear in the results. I have tried other selectors, but they failed as well.

Comments (1)

万人眼中万个我 2025-02-14 12:55:38

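Try changing title = soup.find_all("h4", "p.css-16nhkrn") to titles = soup.find_all(["h4", "p"]). In the original call, the second positional argument is treated as a CSS class name, so it searches for h4 tags whose class is literally "p.css-16nhkrn" and finds nothing. Passing a list of tag names matches both h4 and p tags; you can then filter the p tags by the class css-16nhkrn, for example with a regular expression. Also assign the result to titles rather than title, since titles is the name the for loop iterates over.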
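A minimal sketch of that suggestion, run against a small inline HTML sample rather than the live site. The sample markup, the helper names extract_texts and extract_texts_css, and the css-other class are assumptions made for illustration; the real page structure may differ. If the elements are missing from r.text itself, no selector will find them. The second helper shows the same selection with a CSS selector through soup.select, as an alternative to the regular expression.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the search results page; the real markup may differ.
SAMPLE_HTML = """
<ol>
  <li><h4>Ukraine War: Latest Updates</h4>
      <p class="css-16nhkrn">A summary of the first article.</p></li>
  <li><h4>Grain Exports Resume</h4>
      <p class="css-16nhkrn">A summary of the second article.</p>
      <p class="css-other">By A Reporter</p></li>
</ol>
"""


def extract_texts(html):
    """Collect h4 text plus p text whose class matches css-16nhkrn."""
    soup = BeautifulSoup(html, "html.parser")
    # A list of tag names matches any of them, in document order.
    candidates = soup.find_all(["h4", "p"])
    pattern = re.compile(r"^css-16nhkrn$")
    texts = []
    for tag in candidates:
        classes = tag.get("class") or []  # a list of class names, or empty
        if tag.name == "h4" or any(pattern.search(c) for c in classes):
            texts.append(tag.get_text(strip=True))
    return texts


def extract_texts_css(html):
    """Same selection expressed as a CSS selector group."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select("h4, p.css-16nhkrn")]


titles = extract_texts(SAMPLE_HTML)
for title in titles:
    print(title)
```

With the real page, the same helpers could be called as extract_texts(r.text). The CSS selector version is shorter when the class name is known exactly; the regular expression version is more flexible if only part of the class name is stable.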

Let me know if this works, or if you need help with regular expressions!
