
Python regex python-re

Extracting numbers with a decimal point from text extracted from a PDF file

Posted on 2025-01-10 19:58:57


I need to extract only the numbers with a decimal point from the following string. I used the re module but ran into trouble with the varying number of commas (a number may contain none, or more than one). Another problem is that the decimal numbers are immediately followed by words (e.g. 1,513,971.63Savings). Because the string was extracted from PDF files, I can't change its format.

sample string:

Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy

output:

19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15

Can anyone help?


秋凉 2025-01-17 19:58:57


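I guess your attempt missed 174,381.98. If so, use the pattern (\d+(?:[,.]\d+)+) to get the expected result. It matches a run of digits followed by one or more groups that each start with a comma or a dot, so the trailing words are left out.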

import re

string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""

print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")
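The pattern above also accepts comma-only numbers such as 1,234, because it treats commas and dots alike. If only numbers with a decimal part are wanted, a stricter variant might look like the sketch below. The names DECIMAL_RE and extract_decimals are hypothetical, and the pattern assumes that commas, when present, separate groups of exactly three digits and that the decimal part is followed by a non-digit, as in the sample string.

```python
import re

# Hypothetical stricter pattern (not from the original answer):
#   \d+          leading digits
#   (?:,\d{3})*  zero or more ",ddd" thousands groups (assumed grouping)
#   \.\d+        a mandatory decimal part
DECIMAL_RE = re.compile(r"\d+(?:,\d{3})*\.\d+")


def extract_decimals(text):
    """Return every number in text that has a decimal part, as strings."""
    return DECIMAL_RE.findall(text)


sample = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 "
    "350,745,799.38Saving Deposits12102010050170 "
    "174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 "
    "131,647.15SB Pratibandhy"
)

# Print one extracted number per line.
print(*extract_decimals(sample), sep="\n")
```

On the sample string this should give the same five numbers as the original pattern. The difference shows up only on input that contains integers with commas or digit runs with several dots, which the stricter pattern skips.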


About the author

小…楫夜泊
