Extracting numbers with a decimal point from text extracted from a PDF file

Posted on 2025-01-10 19:58:57, word count 583, 0 views, 0 comments

I need to extract only the numbers with a decimal point from the following string. I used the re module but ran into a problem with the commas: a number may contain no commas at all, or more than one. Another problem is that the decimal numbers are immediately followed by words (e.g. 1,513,971.63Savings). Because the string was extracted from PDF files, I can't change its format.

Sample string:

Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy

Expected output:

19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15

Can anyone help?

Comments (1)

秋凉 2025-01-17 19:58:57

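I guess you missed 174,381.98. If so, use the pattern (\d+(?:[,.]\d+)+) to get the expected result. It matches one or more digits followed by one or more groups consisting of a comma or a period and more digits.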

import re

# Sample text as extracted from the PDF
string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""

# Print every match on its own line
print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")
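Note that the pattern above also accepts digit groups joined only by commas, such as 1,234, which has no decimal point. If that matters for your data, a stricter variant might look like the sketch below. The pattern, the names DECIMAL_RE, extract_decimals and to_decimal, and the assumption that commas only ever separate groups of three digits are mine, not part of the question, so treat it as a starting point rather than a definitive solution.

```python
import re
from decimal import Decimal

# Assumed pattern: either comma-separated thousands groups or a plain run
# of digits, followed by a mandatory decimal point and fractional digits.
# The lookbehind stops a match from starting in the middle of a number.
DECIMAL_RE = re.compile(r"(?<![\d.,])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d+")


def extract_decimals(text):
    """Return every number in text that contains a decimal point."""
    return DECIMAL_RE.findall(text)


def to_decimal(number):
    """Convert a matched string such as '1,125,990.66' to a Decimal."""
    return Decimal(number.replace(",", ""))


sample = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 350,745,799.38Saving "
    "Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"
)

for number in extract_decimals(sample):
    print(number, to_decimal(number))
```

Converting with Decimal rather than float keeps the amounts exact, which is usually what you want for money values.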