How do I select multiple elements when scraping?
I am researching news headlines and article content from The New York Times.
This is what I have written:
import requests
from bs4 import BeautifulSoup
import urllib.request as req
import bs4
import pandas as pd
import numpy as np
import time
import warnings
url = 'https://www.nytimes.com/search?query=ukraine+war'
headers = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
        ' AppleWebKit/537.36 (KHTML, like Gecko)'
        ' Chrome/102.0.5005.63 Safari/537.36 Edg/102.0.1245.33'
}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html5lib')
title = soup.find_all("h4", "p.css-16nhkrn")
for title in titles:
    title = title.text.strip()
    print(title)
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
text = title
wordcloud = WordCloud().generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
I am not sure why the p elements with class css-16nhkrn didn't appear. I have tried other selectors, but they failed too.
Comments (1)
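Try changing
title = soup.find_all("h4", "p.css-16nhkrn")
to
title = soup.find_all(["h4", "p"])
A string passed as the second argument of find_all is treated as a CSS class name, not as a selector, so the original call looks for h4 tags whose class is literally "p.css-16nhkrn", which is unlikely to match anything. Passing a list of tag names returns both the h4 and the p tags; you can then filter the p tags by the class css-16nhkrn, for example with regular expressions. Let me know if this works, or if you need help with regular expressions!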
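As a rough illustration of the suggestion above, here is a minimal sketch that runs against a small inline HTML snippet instead of the live site. The markup in SAMPLE_HTML is hypothetical and only stands in for the search results page; the real structure and class names may differ or change over time, and if the page fills in its results with JavaScript, the elements may not be present in r.text at all. It shows two ways to pick up both tag types: a CSS selector list with select, and find_all with a list of names followed by a regular-expression filter on the class.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the search results page; the real
# page structure and class names may differ.
SAMPLE_HTML = """
<ol>
  <li><h4>First headline</h4><p class="css-16nhkrn">First summary</p></li>
  <li><h4>Second headline</h4><p class="css-16nhkrn">Second summary</p>
      <p class="css-other">Byline</p></li>
</ol>
"""

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: a CSS selector list matches every h4 and every p.css-16nhkrn.
selected = [tag.get_text(strip=True) for tag in soup.select("h4, p.css-16nhkrn")]

# Option 2: find_all with a list of tag names, then keep all h4 tags and
# only those p tags whose class matches the regular expression.
class_pattern = re.compile(r"^css-16nhkrn$")
filtered = [
    tag.get_text(strip=True)
    for tag in soup.find_all(["h4", "p"])
    if tag.name == "h4"
    or any(class_pattern.search(cls) for cls in tag.get("class", []))
]

# Join every collected string so that all of them, not just the last one,
# can be passed on to later processing.
text = " ".join(selected)
print(selected)
```

Note that in the question's code, text = title keeps only the last title from the loop; a joined string like text above could be passed to WordCloud().generate(text) instead if the word cloud is meant to cover all results.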