Regex to remove duplicate numbers from a string
I have produced a data set with codes separated by pipe symbols. I realized there are many duplicates in each row.
Here are four example rows (the regex is applied to each row individually in KNIME):
0612|0613|061|0612|0612
0211|0612|021|0212|0211|0211
0111|0111
0511|0512|0511|0511|0521|0512|0511
I am trying to build a regex that removes the duplicate code numbers from each row.
I tested \b(\d+)\b.*\b\1\b
from a different thread here, but the expression does not keep the other codes. The desired outputs for the example rows above would be:
0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521
Appreciate your help
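To illustrate why the tested pattern does not keep the other codes, here is a small Java sketch (Java because the thread later establishes that KNIME uses Java's regex engine). The class and method names are made up for this illustration. The greedy .* stretches from the first code to its last repetition, so a single match swallows every code in between, whatever replacement is used.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class GreedyBackrefDemo {
    // The pattern tested in the question: \b(\d+)\b.*\b\1\b
    private static final Pattern TESTED =
            Pattern.compile("\\b(\\d+)\\b.*\\b\\1\\b");

    // Applies the tested pattern to one row with the given replacement.
    static String applyTested(String row, String replacement) {
        return TESTED.matcher(row).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String row = "0612|0613|061|0612|0612";
        // One match spans the whole row, from the first 0612 to the last.
        System.out.println("[" + applyTested(row, "") + "]");   // prints []
        System.out.println("[" + applyTested(row, "$1") + "]"); // prints [0612]
    }
}
```

In both cases 0613 and 061 are lost, because they sit inside the single match between the first and the last 0612.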
Comments (2)
No idea which regex engine KNIME uses.
You probably need one that supports variable-length lookbehind to do it in one pass, e.g. .NET.
See .NET regex demo at Regexstorm (check [•] replace matches with, click on "context")
Update: It turns out KNIME uses Java's Pattern implementation...
In Java regex, variable-width lookbehind is actually implemented, but only with finite (bounded) repetition. The second issue is that the backreference \1 can't be used directly inside a lookbehind. So we need some trickery: put it into a lookahead, and put that lookahead inside the lookbehind. Let's assume a maximum potential distance of 999 characters between duplicates and that each field contains at most 9 digits (adjust these values to your needs).
Java regex demo at Regex101 (explanation on right side)
0612|0613|061
0211|0612|021|0212
0111
0511|0512|0521
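The pattern from the linked demo is not reproduced in this text, so the following is only a sketch of how the described trick could look in Java, not necessarily the exact expression from the demo. The class name, method name, and the pattern itself are assumptions. The limits 999 and 9 are the ones assumed above. The lookbehind steps back to an earlier field boundary, and the lookahead nested inside it checks that the captured code occurs there as a whole field.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumed reconstruction of the idea.
final class KeepFirstOccurrence {
    // \|(\d+)\b             a pipe plus a complete code (group 1)
    // (?<= ... )            bounded lookbehind ending right after that code
    //   \b(?=\1\b)          an earlier field start where the same code begins
    //   .{1,999}\|\d{1,9}   the text between that start and the end of the match
    private static final Pattern LATER_DUPLICATE = Pattern.compile(
            "\\|(\\d+)\\b(?<=\\b(?=\\1\\b).{1,999}\\|\\d{1,9})");

    // Removes every code that already appeared earlier in the same row.
    static String dedupe(String row) {
        return LATER_DUPLICATE.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        String[] rows = {
            "0612|0613|061|0612|0612",
            "0211|0612|021|0212|0211|0211",
            "0111|0111",
            "0511|0512|0511|0511|0521|0512|0511"
        };
        for (String row : rows) {
            System.out.println(dedupe(row));
        }
    }
}
```

Codes longer than 9 digits, or duplicates more than roughly 999 characters apart, would not be caught by this sketch. The bounds exist only because Java requires the lookbehind to have a known maximum length.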
With only a lookahead you can get unique rows too, but the other way round: the last occurrence of each code is kept instead of the first (not like your desired results).
Another demo on Regex101
0613|061|0612
0612|021|0212|0211
0111
0521|0512|0511
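A lookahead-only variant could be sketched in Java as follows. Again the pattern and the names are assumptions rather than the expression from the linked demo. It deletes a code together with its trailing pipe whenever the same code shows up again later in the row, which is why the last occurrence survives.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumed lookahead-only variant.
final class KeepLastOccurrence {
    // \b(\d+)\|        a complete code (group 1) followed by a pipe
    // (?=.*\b\1\b)     the same code occurs again later as a whole field
    private static final Pattern EARLIER_DUPLICATE =
            Pattern.compile("\\b(\\d+)\\|(?=.*\\b\\1\\b)");

    // Removes every code that is repeated further to the right.
    static String dedupe(String row) {
        return EARLIER_DUPLICATE.matcher(row).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("0612|0613|061|0612|0612"));
        // prints 0613|061|0612
        System.out.println(dedupe("0511|0512|0511|0511|0521|0512|0511"));
        // prints 0521|0512|0511
    }
}
```

No lookbehind is involved, so no length limits are needed, but the order of the surviving codes differs from the desired output.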
For further information, have a look at the Stack Overflow Regex FAQ.
Based on the expected output shown, you can use this regex:
Replacement string is:
$2
RegEx Demo