Regex, how to find duplicate groups of random numbers?

Posted 2025-02-08 15:15:11 · 953 words · 1 view · 0 comments

I'm currently parsing data from PDFs and I'd like to get the name and amount in a simple format: [NAME] [AMOUNT]

 NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54

This data means the following:
[NAME] [M?] [LAST]
[TOTAL WAGES] [PIT WAGES] [PIT WITHHELD]
NAME LAST $7,494.25 $7,494.25 $199.44
NAME LAST $4,488.00 $4,488.00 $109.07
NAME MIDDLE LAST $7,854.00 $7,854.00 $298.25
NAME LAST $494.23 $494.23 $12.01
NAME MIDDLE LAST $4,301.56 $4,301.56 $112.61
NAME M LAST $13,359.25 $13,359.25 $130.54

I'd like a regex to detect the duplicate group of numbers so that it parses to this:
NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25

Hopefully, that makes sense. Thanks

彩扇题诗 2025-02-15 15:15:11

Assuming that no-one in your organisation is making more than $1M or less than $1, this regex will do what you want:

 *([a-z][a-z ]+)\R+((\d+)(?: (\d+))? (\d+)) (?=\2).*

It looks for

some number of spaces
names (simplistically) with [a-z][a-z ]+ (captured in group 1); this relies on case-insensitive matching, since the sample names are upper case
newline characters (\R+)
2 or 3 sets of digits separated by spaces ((\d+)(?: (\d+))? (\d+)) (captured overall in group 2, with individual groups of digits captured in groups 3, 4 and 5)
a space, followed by an assertion that group 2 is repeated (?=\2)
characters to match the rest of the string to end of line (may not be required, dependent on your application) (.*)
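
The pieces listed above can be tried out with a short Python sketch. Python's `re` module is an assumption here (the answer does not name a language); like JavaScript it does not recognise `\R`, so `[\r\n]+` stands in for it, and `re.IGNORECASE` is passed so that `[a-z]` matches the upper-case sample names. `SAMPLE`, `PATTERN` and `extract` are illustrative names, not part of the original answer.

```python
import re

# Three records from the question (name lines start with a space).
SAMPLE = (
    " NAME LAST\n"
    "7 494 25 7 494 25 199 44\n"
    " NAME MIDDLE LAST\n"
    "7 854 00 7 854 00 298 25\n"
    " NAME LAST\n"
    "494 23 494 23 12 01\n"
)

# Same pattern as the answer, with [\r\n]+ in place of \R+.
PATTERN = re.compile(
    r" *([a-z][a-z ]+)[\r\n]+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)


def extract(text):
    """Return the five capture groups for every record found in text."""
    return [m.groups() for m in PATTERN.finditer(text)]


for groups in extract(SAMPLE):
    print(groups)
```

For the `494 23` record group 4 comes back as `None`, because the optional middle set of digits did not take part in the match.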

You can replace that with

$1 \$3$4.$5

to get the following output for your sample data:

NAME LAST $7494.25
NAME LAST $4488.00
NAME MIDDLE LAST $7854.00
NAME LAST $494.23
NAME MIDDLE LAST $4301.56
NAME M LAST $13359.25
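
As a rough equivalent in Python (an assumption, since the answer targets a PCRE-style engine), the same substitution can be written with `re.sub`. Python templates use `\1` instead of `$1`, a literal `$` needs no escaping, and since Python 3.5 an unmatched group such as `\4` is replaced with an empty string. `SAMPLE`, `PATTERN` and `result` are illustrative names.

```python
import re

# Sample data from the question.
SAMPLE = """\
 NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54
"""

# [\r\n]+ replaces \R+, which Python's re module does not support.
PATTERN = re.compile(
    r" *([a-z][a-z ]+)[\r\n]+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)

# \1 = name, \3 and \4 = dollar digits, \5 = cents.
result = PATTERN.sub(r"\1 $\3\4.\5", SAMPLE)
print(result, end="")
```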

If you're using JavaScript, you need a couple of minor changes. In the regex, replace \R with [\r\n] as JavaScript doesn't recognise \R. In the substitution, replace \$ with $$.

If your regex flavour supports conditional replacements, you can add a comma between the thousands and hundreds by checking whether group 4 was part of the match:

$1 \$3${4:+,}$4.$5

In this case the output is:

NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25
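
Python's `re` module has no conditional replacement syntax, so a comparable effect there needs a replacement function instead. The following is a minimal sketch under that assumption; `with_comma`, `SAMPLE`, `PATTERN` and `result` are hypothetical names chosen for illustration.

```python
import re

# Sample data from the question.
SAMPLE = """\
 NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54
"""

# [\r\n]+ replaces \R+, which Python's re module does not support.
PATTERN = re.compile(
    r" *([a-z][a-z ]+)[\r\n]+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)


def with_comma(match):
    """Build 'NAME $d,ddd.cc', adding the comma only if group 4 matched."""
    name, _, first, second, cents = match.groups()
    # Group 4 is None when the amount has no thousands part.
    dollars = first if second is None else f"{first},{second}"
    return f"{name} ${dollars}.{cents}"


result = PATTERN.sub(with_comma, SAMPLE)
print(result, end="")
```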
