如何抓取以“0”结尾的数字从网站上？

发布于 2025-01-17 02:45:44 字数 1770 浏览 1 评论 0原文

我使用 BeaustifulSoup 抓取网址“https://nature.altmetric.com/details/114136890”上的一些文本，并得到这样的响应

# The table is called twitterGeographical_TableChoice
<table>
<tr>
<th>Country</th>
<th class="num">Count</th>
<th class="num percent">As %</th>
</tr>
<tr>
<td>Japan</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Poland</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Spain</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>El Salvador</td>
<td class="num">2</td>
<td class="num">8%</td>
</tr>
<tr>
<td>Ecuador</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Mexico</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Chile</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>India</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr class="meta">
<td>Unknown</td>
<td class="num">10</td>
<td class="num">40%</td>
</tr>
</table>

然后我想从中获取数字。我使用正则表达式来获取它。我的格式是

twitterGeographical_Table_Num_pattern = re.compile('<td class=\"num\">(\d*%)</td>',re.S)
twitterGeographical_Table_Num = twitterGeographical_Table_Num_pattern.findall(twitterGeographical_TableChoice)

但我只能得到 4% 而不是 40%。我很困惑。感谢您的帮助！

原文

I use BeaustifulSoup to grab some texts on the url"https://nature.altmetric.com/details/114136890",and get such response

# The table is called twitterGeographical_TableChoice
<table>
<tr>
<th>Country</th>
<th class="num">Count</th>
<th class="num percent">As %</th>
</tr>
<tr>
<td>Japan</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Poland</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>Spain</td>
<td class="num">3</td>
<td class="num">12%</td>
</tr>
<tr>
<td>El Salvador</td>
<td class="num">2</td>
<td class="num">8%</td>
</tr>
<tr>
<td>Ecuador</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Mexico</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>Chile</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr>
<td>India</td>
<td class="num">1</td>
<td class="num">4%</td>
</tr>
<tr class="meta">
<td>Unknown</td>
<td class="num">10</td>
<td class="num">40%</td>
</tr>
</table>

Then I want to get the number from it.I use regular expression to get it.
My format is

twitterGeographical_Table_Num_pattern = re.compile('<td class=\"num\">(\d*%)</td>',re.S)
twitterGeographical_Table_Num = twitterGeographical_Table_Num_pattern.findall(twitterGeographical_TableChoice)

But I can only get 4% instead of 40%.I am puzzled.Thanks for your help!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

计㈡愣 2025-01-24 02:45:44

我不知道为什么你要使用正则表达式模块来获取数字，而 BeautifulSoup 已经有很多方法可以实现这一点。无论如何，如果您对正则表达式感兴趣，您可以使用此模式：

<td class=\"num\">((\d+)(%)?)</td>

然后您可以使用下面的代码获取数字（百分比，如果是的话）：

[x[0] for x in twitterGeographical_Table_Num]

输出

['10', '40%']

侧注：我请求您考虑将变量命名得更短、更清晰！ :)

I am not sure why you are going to get the numbers with the regex module when BeautifulSoup has already a lot of approaches for this. Anyway, if you are interested in regex you can use this pattern instead:

<td class=\"num\">((\d+)(%)?)</td>

Then you can get the numbers (percentages, if they are) using the code below:

[x[0] for x in twitterGeographical_Table_Num]

Output

['10', '40%']

Side note: I beg you to consider naming the variables shorter and more clear!:)

回复收藏 0 原文

~没有更多了~

关于作者

孤独岁月

暂无简介

文章

29 人气

关注发私信

alipaysp_snBf0MSZIv

文章 0 评论 0

关注

梦断已成空

文章 0 评论 0

关注

瞎闹

文章 0 评论 0

关注

凯凯我们等你回来

文章 0 评论 0

关注

寄意

文章 0 评论 0

关注

似梦非梦

文章 0 评论 0

友情链接

文江博客

如何抓取以“0”结尾的数字从网站上？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

输出

Output

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如何抓取以“0”结尾的数字从网站上？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

输出

Output

关于作者

相关话题

热门标签

推荐作者

alipaysp_snBf0MSZIv

梦断已成空

瞎闹

凯凯我们等你回来

寄意

似梦非梦

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。