Regex: how to find a duplicated group of random numbers?
I'm currently parsing data from PDFs and I'd like to get the name and amount in a simple format: [NAME] [AMOUNT]
NAME LAST
7 494 25 7 494 25 199 44
NAME LAST
4 488 00 4 488 00 109 07
NAME MIDDLE LAST
7 854 00 7 854 00 298 25
NAME LAST
494 23 494 23 12 01
NAME MIDDLE LAST
4 301 56 4 301 56 112 61
NAME M LAST
13 359 25 13 359 25 130 54
This data means the following:
[NAME] [M?] [LAST]
[TOTAL WAGES] [PIT WAGES] [PIT WITHHELD]
NAME LAST $7,494.25 $7,494.25 $199.44
NAME LAST $4,488.00 $4,488.00 $109.07
NAME MIDDLE LAST $7,854.00 $7,854.00 $298.25
NAME LAST $494.23 $494.23 $12.01
NAME MIDDLE LAST $4,301.56 $4,301.56 $112.61
NAME M LAST $13,359.25 $13,359.25 $130.54
I'd like a regex to detect the duplicate group of numbers so that it parses to this:
NAME LAST $7,494.25
NAME LAST $4,488.00
NAME MIDDLE LAST $7,854.00
NAME LAST $494.23
NAME MIDDLE LAST $4,301.56
NAME M LAST $13,359.25
Hopefully, that makes sense. Thanks
Comments (1)
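Assuming that no one in your organisation is making more than $1M or less than $1, a regex assembled from the following parts will do what you want (with case-insensitive matching, since the sample names are upper case). It looks for:
- a name (simplistically), [a-z][a-z ]+ (captured in group 1)
- one or more line breaks, \R+
- two or three groups of digits separated by spaces, ((\d+)(?: (\d+))? (\d+)) (captured overall in group 2, with the individual groups of digits captured in groups 3, 4 and 5)
- a lookahead asserting that the same digits are repeated, (?= \2) (the space before \2 matches the separator between the two copies)
- the rest of the line, .*
You can replace each match with group 1 followed by a dollar sign and the digits from groups 3, 4 and 5, with a decimal point before the cents, to get the name and amount for each entry in your sample data.
Demo on regex101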
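The parts listed in the answer can be assembled into a single pattern. The following is a minimal Python sketch, not the answer's exact regex or substitution. It assumes that the lookahead includes the separating space, that matching is case-insensitive, and that `[\r\n]+` can stand in for `\R+`, which Python's `re` module does not support. The names `WAGE_PATTERN` and `extract_wages` are hypothetical.

```python
import re

# Assembled from the listed parts. Assumptions: the lookahead contains the
# space between the two copies of the amount, matching is case-insensitive,
# and [\r\n]+ replaces \R+ because Python's re module has no \R.
WAGE_PATTERN = re.compile(
    r"([a-z][a-z ]+)[\r\n]+((\d+)(?: (\d+))? (\d+))(?= \2).*",
    re.IGNORECASE,
)


def extract_wages(text):
    """Collapse each name line plus amounts line into 'NAME $dollars.cents'."""
    # \4 expands to an empty string when the optional middle group did not
    # take part in the match (Python 3.5+). '$' is literal in the template.
    return WAGE_PATTERN.sub(r"\1 $\3\4.\5", text)


print(extract_wages("NAME LAST\n494 23 494 23 12 01"))  # NAME LAST $494.23
```

Amounts of $1,000 or more come out without a thousands separator here (for example `$7494.25`), because a plain replacement template cannot add the comma conditionally.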
If you're using JavaScript, you need a couple of minor changes. In the regex, replace \R with [\r\n], as JavaScript doesn't recognise \R. In the substitution, replace \$ with $$.
Demo on regex101
If your regex flavour supports conditional replacements, you can add a comma between the thousands and the hundreds by checking whether group 4 was part of the match. In this case the output uses the format you asked for, e.g. NAME LAST $7,494.25.
Demo on regex101
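Where the flavour has no conditional replacement syntax, the same check on group 4 can be done in a replacement callback. The Python sketch below is one way to do that, under the same assumptions about the pattern (space inside the lookahead, case-insensitive matching, `[\r\n]+` in place of `\R+`). The names `format_wage` and `extract_wages_with_commas` are hypothetical.

```python
import re

# Same assumed pattern as assembled from the answer's parts; [\r\n]+ stands
# in for \R+, which Python's re module does not support.
WAGE_PATTERN = re.compile(
    r"([a-z][a-z ]+)[\r\n]+((\d+)(?: (\d+))? (\d+))(?= \2).*",
    re.IGNORECASE,
)


def format_wage(match):
    """Build 'NAME $d,ddd.cc', adding the comma only if group 4 matched."""
    name, leading, hundreds, cents = match.group(1, 3, 4, 5)
    # Group 4 is None when the amount is below $1,000.
    dollars = leading if hundreds is None else f"{leading},{hundreds}"
    return f"{name} ${dollars}.{cents}"


def extract_wages_with_commas(text):
    return WAGE_PATTERN.sub(format_wage, text)


print(extract_wages_with_commas("NAME M LAST\n13 359 25 13 359 25 130 54"))
# NAME M LAST $13,359.25
```

A callback keeps the conditional logic in ordinary code, at the cost of not being portable to tools that only accept a replacement string.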