Regex: how do I find a repeated group of random numbers?

Posted 2025-02-08 15:15:11 · 953 characters · 1 view · 0 comments

I'm currently parsing data from PDFs and I'd like to get the name and amount in a simple format: [NAME] [AMOUNT]

 NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54

This data means the following:
[NAME] [M?] [LAST]
[TOTAL WAGES] [PIT WAGES] [PIT WITHHELD]
NAME LAST $7,494.25 $7,494.25 $199.44
NAME LAST $4,488.00 $4,488.00 $109.07
NAME MIDDLE LAST $7,854.00 $7,854.00 $298.25
NAME LAST $494.23 $494.23 $12.01
NAME MIDDLE LAST $4,301.56 $4,301.56 $112.61
NAME M LAST $13,359.25 $13,359.25 $130.54

I'd like a regex to detect the duplicate group of numbers so that it parses to this:
NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25

Hopefully, that makes sense. Thanks

Comments (1)

彩扇题诗 2025-02-15 15:15:11

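Assuming that no one in your organisation is making more than $1M or less than $1, this regex will do what you want (use it with the case-insensitive flag, since [a-z] has to match the uppercase names in your sample):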

 *([a-z][a-z ]+)\R+((\d+)(?: (\d+))? (\d+)) (?=\2).*

It looks for

  • some number of spaces
  • names (simplistically) with [a-z][a-z ]+ (captured in group 1)
  • newline characters (\R+)
  • 2 or 3 sets of digits separated by spaces ((\d+)(?: (\d+))? (\d+)) (captured overall in group 2, with individual groups of digits captured in groups 3, 4 and 5)
  • a space, followed by an assertion that group 2 is repeated (?=\2)
  • characters to match the rest of the string to end of line (may not be required, dependent on your application) (.*)
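
To make the group numbering concrete, here is a minimal Python sketch that prints what each group captures. It rests on a few assumptions that are not part of the answer above: Python's built-in re module has no \R, so (?:\r?\n)+ stands in for \R+; re.IGNORECASE is passed so that [a-z] matches the uppercase names; and SAMPLE and PATTERN are names made up for this example.

```python
import re

# Two records taken from the question's sample data (illustrative subset).
SAMPLE = (
    " NAME LAST\n"
    "7 494 25 7 494 25 199 44\n"
    " NAME LAST\n"
    "494 23 494 23 12 01\n"
)

# Same pattern as above, with (?:\r?\n)+ substituted for \R+ because
# Python's re module does not support \R.
PATTERN = re.compile(
    r" *([a-z][a-z ]+)(?:\r?\n)+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)

# groups() returns groups 1..5 in order: name, whole amount, digit sets.
rows = [m.groups() for m in PATTERN.finditer(SAMPLE)]
for row in rows:
    print(row)
```

In the second record the amount has only two sets of digits, so group 4 takes no part in the match and Python reports it as None.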

You can replace that with

$1 \$3$4.$5

to get the following output for your sample data:

NAME LAST $7494.25
NAME LAST $4488.00
NAME MIDDLE LAST $7854.00
NAME LAST $494.23
NAME MIDDLE LAST $4301.56
NAME M LAST $13359.25

Demo on regex101
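
If you are working in Python rather than a PCRE-style tool, the same substitution could be sketched as follows. This is an illustration under assumptions, not the method used in the demo: Python writes backreferences in the replacement as \1 rather than $1, a literal $ needs no escaping there, (?:\r?\n)+ replaces \R+, and Python 3.5 or later is assumed so that the unmatched group 4 is substituted as an empty string. SAMPLE, PATTERN and result are hypothetical names.

```python
import re

# The sample data from the question.
SAMPLE = """ NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54
"""

PATTERN = re.compile(
    r" *([a-z][a-z ]+)(?:\r?\n)+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)

# \1 = name, \3 and \4 = digit sets before the decimal point, \5 = cents.
result = PATTERN.sub(r"\1 $\3\4.\5", SAMPLE)
print(result)
```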

If you're using JavaScript, you need a couple of minor changes. In the regex, replace \R with [\r\n], as JavaScript doesn't recognise \R. In the substitution, replace \$ with $$, which is how JavaScript writes a literal dollar sign in a replacement string.

Demo on regex101

If your regex flavour supports conditional replacements, you can add a , between the thousands and hundreds by checking if group 4 was part of the match:

$1 \$3${4:+,}$4.$5

In this case the output is:

NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25

Demo on regex101
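
Where the flavour has no conditional replacement syntax, a replacement callback can do the same job. The sketch below uses Python's re.sub with a function argument; this is a swapped-in technique rather than the ${4:+,} syntax shown above. It assumes (?:\r?\n)+ in place of \R+ and re.IGNORECASE for the uppercase names, and with_comma, SAMPLE, PATTERN and result are hypothetical names chosen for illustration.

```python
import re

# The sample data from the question.
SAMPLE = """ NAME LAST
7 494 25 7 494 25 199 44
 NAME LAST
4 488 00 4 488 00 109 07
 NAME MIDDLE LAST
7 854 00 7 854 00 298 25
 NAME LAST
494 23 494 23 12 01
 NAME MIDDLE LAST
4 301 56 4 301 56 112 61
 NAME M LAST
13 359 25 13 359 25 130 54
"""

PATTERN = re.compile(
    r" *([a-z][a-z ]+)(?:\r?\n)+((\d+)(?: (\d+))? (\d+)) (?=\2).*",
    re.IGNORECASE,
)


def with_comma(match: re.Match) -> str:
    """Build 'NAME $d,ddd.cc', adding the comma only when group 4 matched."""
    name, first, second, cents = match.group(1, 3, 4, 5)
    # Group 4 is None when the amount has only two sets of digits.
    whole = f"{first},{second}" if second is not None else first
    return f"{name} ${whole}.{cents}"


result = PATTERN.sub(with_comma, SAMPLE)
print(result)
```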
