Regex that ignores a certain number of repeated characters

Posted on 2024-08-23 07:56:10

I'm trying to write a parser that uses two characters as token boundaries, but I can't figure out the regular expression that will allow me to ignore them when I'm regex-escaping the whole string.

Given a string like:

This | is || token || some ||| text

I would like to end up with:

This \| is || token || some \|\|\| text

where all of the | are escaped unless there are two of them together.

Is there a regular expression that will allow me to escape every | that isn't in a pair?

Comments (3)

只是一片海 2024-08-30 07:56:10

No need for a regex. You are using Python after all. :)

>>> s="This | is || token || some ||| text"
>>> items=s.split()
>>> items
['This', '|', 'is', '||', 'token', '||', 'some', '|||', 'text']
>>> for n,i in enumerate(items):
...     # escape every pipe-only item except an exact pair
...     if "|" in i and i.count("|")!=2:
...          items[n]=i.replace("|","\\|")
...
>>> print(' '.join(items))
This \| is || token || some \|\|\| text

╄→承喏 2024-08-30 07:56:10

I don't see why you would need to regex-escape the tokens, but why not split up the string first and then escape them? This regex splits on two pipes that aren't preceded or followed by more pipes:

>>> import re
>>> re.split(r'(?<!\|)\|\|(?!\|)', 'This | is || token || some ||| text')
['This | is ', ' token ', ' some ||| text']

By the way, there are testers for all of the more common regex flavors out there for the Googling. Here's one for Python: http://re.dabase.com/
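
The split-then-escape idea above can be sketched end to end as follows. This is only an illustration: the helper name `escape_pipes_outside_boundaries` is hypothetical, and it assumes that "escaping" means putting a backslash in front of each remaining pipe, as in the expected output from the question.

```python
import re

# Boundary: exactly two pipes, with no further pipe directly before or after.
BOUNDARY = re.compile(r'(?<!\|)\|\|(?!\|)')


def escape_pipes_outside_boundaries(text):
    """Split on `||` boundaries, escape the pipes left in each piece, rejoin."""
    pieces = BOUNDARY.split(text)
    # Assumption: escaping only needs to touch the pipe character.
    escaped = [piece.replace('|', r'\|') for piece in pieces]
    return '||'.join(escaped)


print(escape_pipes_outside_boundaries('This | is || token || some ||| text'))
```

Because the split consumes the `||` boundaries, the escaping step never sees them, so they can be put back unchanged when the pieces are rejoined.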

梦初启 2024-08-30 07:56:10

Here's a way to do it with regular expressions in Perl, if anyone's interested. I used two separate regular expressions, one for a single pipe and one for runs of three or more. I'm sure it's possible to combine them, but regular expressions are already difficult enough to read without adding needless complexity.

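#!/usr/bin/perl
use strict;
use warnings;

#my $s = "This | is || token || some ||| text";
my $s = "| This |||| is || more | evil |";

# Escape each lone pipe (no pipe directly before or after it).
# Lookarounds consume no neighbours, so "a|b|c" is handled too.
$s =~ s/(?<!\|)\|(?!\|)/\\|/g;
# Escape every pipe in a run of three or more.
$s =~ s{(\|{3,})}
{
   my $run = $1;
   $run =~ s{\|}{\\|}g;
   $run;
}eg;

print $s . "\n";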

Outputs:

\| This \|\|\|\| is || more \| evil \|
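
For comparison, the two substitutions can be folded into a single pass in Python, the language the question is about. This is a minimal sketch, not the method used in the answer above: the function name `escape_unpaired_pipes` is hypothetical, and it assumes that only a run of exactly two pipes counts as a token boundary.

```python
import re


def escape_unpaired_pipes(text):
    """Escape every pipe except those in a run of exactly two."""

    def fix_run(match):
        run = match.group(0)
        if len(run) == 2:
            # A pair is a token boundary: leave it untouched.
            return run
        # A lone pipe or a run of three or more: escape each pipe.
        return run.replace('|', r'\|')

    # `\|+` grabs each maximal run of pipes, so runs never overlap.
    return re.sub(r'\|+', fix_run, text)


print(escape_unpaired_pipes('| This |||| is || more | evil |'))
```

Matching whole runs and deciding in the callback keeps the pattern itself trivial, at the cost of moving the length check into Python code.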