Regex to remove duplicate numbers on different lines
It's perhaps quite simple, but I can't figure it out:
I have a random number (it can be 1, 2, 3 or 4 digits long).
It is repeated on a second line:
2131
2131
How can I remove the first number?
EDIT: Sorry I didn't explain it better. These lines are in a plain text file, and I'm using BBEdit as my editor. The actual file looks like this (only with approx. 10,000 lines):
336
336
rinde
337
337
diving
338
338
graffiti
339
339
forest
340
340
mountain
If possible the result should look like this:
336 - rinde
337 - diving
338 - graffiti
339 - forest
340 - mountain
Comments (5)
Search:
Replace:
I don't have access to BBEdit, but apparently you have to check the "Grep" option to enable regex search-and-replace. (I don't know why they call it that, since it seems to be powered by the PCRE library, which is much more powerful than grep.)
Since you didn't mention any programming language or tool, I assume the numbers are in a file, one per line, and that any repeated numbers are on adjacent lines. The uniq command can solve your problem:
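As a rough sketch of what collapsing adjacent duplicate lines amounts to (the behaviour uniq provides), here is a small Python version. The function name `collapse_adjacent_duplicates` and the sample data are assumptions made for illustration; they are not part of the original answer.

```python
from itertools import groupby


def collapse_adjacent_duplicates(lines):
    """Keep one copy of each run of identical neighbouring lines."""
    # groupby groups consecutive equal items, so one key per run is enough.
    return [key for key, _run in groupby(lines)]


if __name__ == "__main__":
    # Hypothetical input modelled on the sample in the question.
    sample = ["336", "336", "rinde", "337", "337", "diving"]
    for line in collapse_adjacent_duplicates(sample):
        print(line)
```

Note that this only removes the repeated number; it does not join the number and the following word into the "336 - rinde" form.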
Another way:
Find:
/^(\d{1,4})\n(?=\1$)/
Replace: ""
Modifiers: mg (multi-line and global)
Output:
1234
431
222
234
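To try the lookahead pattern outside an editor, it can be sketched in Python roughly as follows. This assumes Python's `re` module, where `re.MULTILINE` plays the role of the `m` modifier and `re.sub` replaces every match (the `g` modifier). The function name and the sample input are made up for illustration.

```python
import re

# Same pattern as in the answer: a line of 1-4 digits plus its newline,
# matched only when the next line holds exactly the same number.
DUPLICATE_LINE = re.compile(r"^(\d{1,4})\n(?=\1$)", re.MULTILINE)


def drop_repeated_numbers(text: str) -> str:
    """Delete a number line when the following line repeats it."""
    return DUPLICATE_LINE.sub("", text)


if __name__ == "__main__":
    sample = "1234\n1234\n431\n222\n222\n234\n"  # hypothetical input
    print(drop_repeated_numbers(sample), end="")
```

The `$` inside the lookahead matters: without it, a line such as 12 followed by 123 would be treated as a repeat.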
Added: On the revised sample, you could do something like this:
Find:
/(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/
Replace:
$1 - $2
Mods: /mg (multi-line, global)
Test:
Output:
336 - rinde
337 - diving
338 - graffiti
339
337
339 - forest
340 - mountain
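The same find/replace pair can be checked in Python with a sketch like the one below. It assumes Python's `re` module, so the replacement is written `\1 - \2` instead of `$1 - $2`; the function name and the sample input are illustrative assumptions.

```python
import re

# Pattern from the answer: capture the number in a lookahead, consume every
# repeated copy of it, then capture the digit-free text on the next line.
NUMBER_THEN_WORD = re.compile(r"(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)", re.MULTILINE)


def join_number_and_word(text: str) -> str:
    """Turn 'N', 'N', 'word' line groups into a single 'N - word' line."""
    return NUMBER_THEN_WORD.sub(r"\1 - \2", text)


if __name__ == "__main__":
    sample = "336\n336\nrinde\n337\n337\ndiving\n"  # hypothetical input
    print(join_number_and_word(sample), end="")
```

A number that is not followed by a word line is left untouched, which is why lone numbers survive in the output above.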
Added 2: I was more impressed with the OP's later desired output format than with the original question. It has many elements to it, so, unable to control myself, I generated a way-too-complicated regex.
Search:
/^(\d{1,4})\n+(?:\1\n+)*\s*(?:((?:(?:\w|[^\S\n])*[a-zA-Z](?:\w|[^\S\n])*))\s*(?:\n|$)|)/
Replace:
$1 - $2\n
Modifiers: mg (multi-line, global)
Expanded:
find:
replace:
You should be able to clean it up from there quite easily!
Detecting such a pattern is not possible with a strictly regular expression (one without backreferences).
Alternatively, you can split the string on "\n" and then compare adjacent lines.
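A minimal sketch of the split-and-compare idea in Python might look like this. The function name `merge_lines` is hypothetical, and the handling of numbers without a following word (they are kept as they are) is an assumption rather than something the answer specifies.

```python
def merge_lines(text: str) -> str:
    """Collapse repeated number lines and join each number with the word after it."""
    result = []
    current = None  # most recent number line that has not been written out yet
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.isdigit():
            if current is not None and line != current:
                result.append(current)  # previous number had no word after it
            current = line  # a repeated number simply overwrites itself
        elif current is not None:
            result.append(f"{current} - {line}")
            current = None
        else:
            result.append(line)  # a word with no number in front of it
    if current is not None:
        result.append(current)
    return "\n".join(result)


if __name__ == "__main__":
    sample = "336\n336\nrinde\n337\n337\ndiving"  # hypothetical input
    print(merge_lines(sample))
```

This trades the compactness of a single regex for explicit control over the edge cases.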