Regex to remove duplicate numbers on different lines

Posted on 2024-12-07 08:55:14

It's perhaps quite simple, but I can't figure it out:

I have a random number (1, 2, 3, or 4 digits long) that is repeated on a second line:

2131
2131

How can I remove the first number?

EDIT: Sorry, I didn't explain it well. These lines are in a plain text file. I'm using BBEdit as my editor. The actual file looks like this (except that it has approx. 10,000 lines):

336
336
rinde
337
337
diving
338
338
graffiti
339
339
forest
340
340
mountain

If possible the result should look like this:

336 - rinde
337 - diving
338 - graffiti
339 - forest
340 - mountain

如梦初醒的夏天 2024-12-14 08:55:14

Search:

^(\d{1,4})\n(?:\1\n)+([a-z]+$)

Replace:

\1 - \2

I don't have access to BBEdit, but apparently you have to check the "Grep" option to enable regex search-n-replace. (I don't know why they call it that, since it seems to be powered by the PCRE library, which is much more powerful than grep.)
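
For readers who want to try the pattern outside an editor, here is a minimal Python sketch of the same search and replace. The function name `collapse` and the sample string are illustrative assumptions, not part of the answer; `re.MULTILINE` is set so that `^` and `$` match at every line boundary.

```python
import re

# Search pattern copied from the answer above.
PATTERN = re.compile(r"^(\d{1,4})\n(?:\1\n)+([a-z]+$)", re.MULTILINE)


def collapse(text: str) -> str:
    """Rewrite each 'N / N / word' group of lines as 'N - word'."""
    return PATTERN.sub(r"\1 - \2", text)


sample = "336\n336\nrinde\n337\n337\ndiving\n338\n338\ngraffiti\n"
print(collapse(sample), end="")
```

Because `(?:\1\n)+` requires at least one repeat, a number that appears only once is left untouched, and `[a-z]+` only accepts lowercase ASCII words.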

原来分手还会想你 2024-12-14 08:55:14

Since you didn't mention any programming language or tool, I assume the numbers are in a file, one per line, and that any repeated numbers are on adjacent lines. The uniq command can solve your problem:

kent$ echo "1234
1234
431
431
222
222
234" | uniq

1234
431
222
234
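
Note that uniq only collapses adjacent duplicate lines; it does not by itself produce the "336 - rinde" layout from the edited question. As a rough sketch of how both steps might look in Python (the helper names `uniq` and `number_word_pairs` are made up for illustration, and the second one assumes the deduplicated lines strictly alternate number, word):

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of identical adjacent lines, like the uniq command."""
    return [key for key, _run in groupby(lines)]


def number_word_pairs(lines):
    """Join each number with the word that follows it.

    Assumption: after deduplication the lines alternate number, word.
    """
    deduped = uniq(lines)
    return [f"{num} - {word}" for num, word in zip(deduped[0::2], deduped[1::2])]


print(uniq(["1234", "1234", "431", "431", "222", "222", "234"]))
print(number_word_pairs(["336", "336", "rinde", "337", "337", "diving"]))
```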

帝王念 2024-12-14 08:55:14

Another way. Find: /^(\d{1,4})\n(?=\1$)/ Replace: "" (empty string)
Modifiers: mg (multi-line and global)

$str =
'1234
1234
431
431
222
222
222
234
234';

$str =~ s/^(\d{1,4})\n(?=\1$)//mg;
print $str;

Output:
1234
431
222
234
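
For anyone without Perl at hand, the same lookahead idea can be sketched in Python. The function name `drop_repeats` is an illustrative assumption; the pattern is the one shown above, and `re.sub` already replaces every match, so only the multi-line flag needs to be set.

```python
import re


def drop_repeats(text: str) -> str:
    """Delete a number line when the next line holds the same number."""
    # The lookahead (?=\1$) checks the following line without consuming it,
    # so a run of three identical numbers is reduced to one.
    return re.sub(r"^(\d{1,4})\n(?=\1$)", "", text, flags=re.MULTILINE)


print(drop_repeats("1234\n1234\n431\n431\n222\n222\n222\n234\n234"))
```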

Added: for the revised sample, you could do something like this:

Find: /(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/
Replace: $1 - $2
Mods: /mg (multi-line, global)

Test:

$str =
'
336
336
rinde
337
337
337
diving
338
338
graffiti
339
337
339
forest
340
340
mountain
';

$str =~ s/(?=^(\d{1,4}))(?:\1\n)+\s*([^\n\d]*$)/$1 - $2/mg;

print $str;

Output:
336 - rinde
337 - diving
338 - graffiti
339
337
339 - forest
340 - mountain

Added 2: I was more impressed with the OP's later desired output format than with the original question. It has many elements to it so, unable to control myself, I generated a way-too-complicated regex.

Search: /^(\d{1,4})\n+(?:\1\n+)*\s*(?:((?:(?:\w|[^\S\n])*[a-zA-Z](?:\w|[^\S\n])*))\s*(?:\n|$)|)/
Replace: $1 - $2\n
Modifiers: mg (multi-line, global)

Expanded-

# Find:
s{ # Find a single unique digit pattern on a line (group 1)

   ^(\d{1,4})\n+   # Grp 1, capture a digit sequence

   (?:\1\n+)*      # Optionally consume the sequence many times,
   \s*             # and whitespaces (cleanup)

   # Get the next word (group 2)
   (?:
     # Either find a valid word
       (                      # Grp2 
          (?:
             (?:\w|[^\S\n])*     # Optional \w or non-newline whitespaces
             [a-zA-Z]            # with at least one alpha character
             (?:\w|[^\S\n])*
          )
       )
       \s*                    # Consume whitespaces (cleanup),
       (?:\n|$)               # a newline
                              # or, end of string
     |
     # OR, don't find anything (leaves group 2 unset)
   )
 }

# Replace (rewrite the new block)
 {$1 - $2\n}xmg;  # modifiers expanded, multi-line, global
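
The expanded pattern comes without a test run, so here is a hedged Python port using `re.VERBOSE` to try it on sample data. The names `PATTERN` and `rewrite` are illustrative assumptions; a replacement function is used so that an unset group 2 becomes an empty string.

```python
import re

PATTERN = re.compile(
    r"""
    ^(\d{1,4})\n+              # group 1: a number on its own line
    (?:\1\n+)*                 # any further repeats of that number
    \s*                        # stray whitespace
    (?:
        (                      # group 2: a label with at least one letter
            (?:\w|[^\S\n])*
            [a-zA-Z]
            (?:\w|[^\S\n])*
        )
        \s*(?:\n|$)
      |                        # or no label at all (group 2 stays unset)
    )
    """,
    re.MULTILINE | re.VERBOSE,
)


def rewrite(text: str) -> str:
    """Rewrite each number block as 'N - label' followed by a newline."""
    return PATTERN.sub(lambda m: f"{m.group(1)} - {m.group(2) or ''}\n", text)


print(rewrite("336\n336\nrinde\n337\n337\n337\ndiving\n340\n340\nmountain"), end="")
```

Unlike the previous pattern, which leaves a stray number untouched, this one rewrites a number with no label after it as "339 - " with an empty label.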
