How do I scrape the word "Kentucky" from an entire page?

Posted on 2025-02-10 04:22:33


Every time I run this code it returns "Kentucky" only three times, but I know the word appears many more times on the page.

   from bs4 import BeautifulSoup
   import requests

   url = 'https://www.nba.com/players'
   result = requests.get(url)
   doc = BeautifulSoup(result.text, 'lxml')
   college = doc.find_all(text='Kentucky')
   print(college)


孤独患者 2025-02-17 04:22:33

This happens because find_all with a plain string for text matches only text nodes whose entire content equals that string.

college = doc.find_all(text='Kentucky')

Matches: <option value="Kentucky">Kentucky</option>.
Does not match: <option value="Western Kentucky">Western Kentucky</option>.
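
The exact-match behavior can be reproduced without touching the network. This is a minimal sketch, assuming a made-up HTML snippet (the markup and the variable names are illustrative and not taken from the NBA page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three options whose text contains "Kentucky"
html = (
    "<select>"
    "<option>Kentucky</option>"
    "<option>Western Kentucky</option>"
    "<option> Kentucky </option>"
    "</select>"
)
doc = BeautifulSoup(html, "html.parser")

# A plain string is compared against the whole text node, so the
# "Western Kentucky" node and the space-padded node are both skipped
exact = doc.find_all(string="Kentucky")
print(len(exact))
```

Only the first option is returned here, which is the same effect that keeps the count low in the question.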

To find every occurrence, pass a compiled regular expression instead of a plain string; see the Beautiful Soup documentation.
Also use the string argument instead of text.

From the Beautiful Soup documentation: "The string argument is new in Beautiful Soup 4.4.0. In earlier versions it was called text."
With both changes, the script becomes:

import re

import requests
from bs4 import BeautifulSoup

url = 'https://www.nba.com/players'
result = requests.get(url)
doc = BeautifulSoup(result.text, 'lxml')
# A compiled pattern matches every text node that contains "Kentucky"
college = doc.find_all(string=re.compile("Kentucky"))
print(f"Total elements: {len(college)}")
print(college)

OUTPUT:

Total elements: 6
['Kentucky', 'Western Kentucky', 'Kentucky', 'Western Kentucky', 'Kentucky', '{"props":{"pageProps":{"la
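
The last element in that output begins with {"props":..., which suggests the pattern also matched a block of JSON embedded in the page, most likely inside a script tag. The following is a rough sketch for separating such matches from ordinary text and for counting individual occurrences. It assumes a made-up HTML snippet, and the names visible_hits and total_occurrences are illustrative:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<html><body>
<select>
  <option value="Kentucky">Kentucky</option>
  <option value="Western Kentucky">Western Kentucky</option>
</select>
<script id="data" type="application/json">{"college": "Kentucky"}</script>
</body></html>
"""
doc = BeautifulSoup(html, "html.parser")
pattern = re.compile("Kentucky")

# Every text node that contains the pattern, wherever it lives
all_hits = doc.find_all(string=pattern)

# Keep only nodes whose parent tag is not a script or style element
visible_hits = [s for s in all_hits if s.parent.name not in ("script", "style")]

# One text node can contain the word several times, so count per node
total_occurrences = sum(len(pattern.findall(s)) for s in visible_hits)

print([str(s) for s in visible_hits])
print(total_occurrences)
```

Whether the embedded JSON should be counted depends on what the scrape is for. If the page fills in its content from that JSON, parsing it may be more reliable than matching text nodes.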


About the author

述情
