How do I scrape the word "Kentucky" from an entire page?

Posted on 2025-02-10 04:22:33

Every time I run this code, it returns only three matches, but I know the word appears many more times on the page.

   from bs4 import BeautifulSoup
   import requests

   url = 'https://www.nba.com/players'
   result = requests.get(url)
   doc = BeautifulSoup(result.text, 'lxml')
   college = doc.find_all(text='Kentucky')
   print(college)

Comments (1)

孤独患者 2025-02-17 04:22:33

This happens because find_all, when given a plain string for text, only returns text nodes that match that string exactly.

college = doc.find_all(text='Kentucky')

Matches: <option value="Kentucky">Kentucky</option>
Does not match: <option value="Western Kentucky">Western Kentucky</option>
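
The exact-match behaviour can be reproduced offline. The following is a minimal sketch, not the real page: the `html` snippet is hypothetical markup modelled on the two option elements above, and it uses the built-in "html.parser" instead of 'lxml' only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption).
html = """
<select>
  <option value="Kentucky">Kentucky</option>
  <option value="Western Kentucky">Western Kentucky</option>
</select>
"""

doc = BeautifulSoup(html, "html.parser")

# A plain string is compared against each text node as a whole,
# so "Western Kentucky" is not returned.
exact = doc.find_all(string="Kentucky")
print(exact)  # ['Kentucky']
```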

To find every occurrence, including text that merely contains "Kentucky", pass a regular expression instead.
See the Beautiful Soup documentation. Also use string instead of text; as the documentation puts it:

The string argument is new in Beautiful Soup 4.4.0.
In earlier versions it was called text.

import re
from bs4 import BeautifulSoup
import requests

url = 'https://www.nba.com/players'
result = requests.get(url)
doc = BeautifulSoup(result.text, 'lxml')
college = doc.find_all(string=re.compile("Kentucky"))
print("Total elements: " + str(len(college)))
print(college)

Output:

Total elements: 6
['Kentucky', 'Western Kentucky', 'Kentucky', 'Western Kentucky', 'Kentucky', '{"props":{"pageProps":{"la
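
The sixth element in that output is a JSON string, not a college name, because the regular expression matches any text node that contains "Kentucky". If only the option elements are wanted, one possible approach is to pass a tag name together with string. The sketch below assumes the names live in option elements, as in the examples above; the `html` snippet and the `colleges` variable are hypothetical, and "html.parser" is used instead of 'lxml' so the example runs without extra dependencies.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two matching options, one non-matching option,
# and a script tag whose JSON text also contains the word (assumption).
html = """
<select>
  <option value="Kentucky">Kentucky</option>
  <option value="Western Kentucky">Western Kentucky</option>
  <option value="Duke">Duke</option>
</select>
<script type="application/json">{"college": "Kentucky"}</script>
"""

doc = BeautifulSoup(html, "html.parser")

# string alone searches every text node in the document.
all_strings = doc.find_all(string=re.compile("Kentucky"))

# A tag name plus string returns only <option> tags whose text matches.
options = doc.find_all("option", string=re.compile("Kentucky"))
colleges = [opt.get_text(strip=True) for opt in options]
print(colleges)  # ['Kentucky', 'Western Kentucky']
```

With a tag name supplied, find_all returns the matching Tag objects instead of bare strings, which is why the names are read back with get_text.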