Splitting the first item from the rest of a list using regex substitution
I need to split a list between its first item and the rest of its items using regex substitution only.
The lists of items are input as strings using '##' as a separator, e.g.:
''
'one'
'one##two'
'one##two##three'
'one##two words##three'
My Perl attempt doesn't really work:
my $sampleText = 'one##two words##three';
my $first = $sampleText;
my $rest = $sampleText;
$first =~ s/(.+?)(##.*)?/$1/g;
$rest =~ s/(.?+)(##)?(.*)/$3/g;
print "sampleText = '$sampleText', first = '$first', rest = '$rest'\n";
sampleText = 'one##two words##three', first = 'one', rest = 'ne##two words##three'
Please note the constraints:
- the separator is a multi-character string
- only regex substitutions are allowed (1)
- I could "chain" regex substitutions if necessary
- The expected end result is two strings: the first element, and the initial string with the first element cut off (2)
- the list may have from 0 to n items, each being any string not containing the separator.
(1) I work with a rather large Perl system where, at some point, lists of items are processed using a set of provided operations. One of them is regex substitution. None of the others is applicable. Solving the problem using full Perl code is easy, but that would mean modifying the system, which is not an option at this time.
(2) The context is the Unimarc bibliographic format, where the authors of a publication are to be split into the standard Unimarc fields: 700$a for the first author, and 701$a for any remaining authors.
Comments (5)
I assume point (1) means you cannot use the split builtin? It would be easy with split's optional third parameter, which lets you specify the maximum number of items. But if it has to be a regex replace, then yours is almost right; using .+? won't work when there are no separators, because it will just take the first character. You can fix this by anchoring the end. Instead, something like:
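A minimal sketch of the anchoring idea, not the answerer's original code: the helper name first_item is hypothetical, and \z is used as the end anchor. The loop cross-checks the substitution against split with a limit of 2, which is the builtin approach mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep only the first item,
# using a single substitution anchored at both ends of the string.
sub first_item {
    my ($list) = @_;
    # ^(.*?)      lazily capture everything before the first separator
    # (?:##.*)?   optionally consume the separator and all that follows
    # \z          force the match to reach the very end of the string
    (my $first = $list) =~ s/^(.*?)(?:##.*)?\z/$1/s;
    return $first;
}

for my $list ('', 'one', 'one##two', 'one##two words##three') {
    # Cross-check: split with a limit of 2 yields (first item, remainder).
    my ($expected) = split /##/, $list, 2;
    $expected = '' unless defined $expected;    # split on '' returns nothing
    my $first = first_item($list);
    printf "list = '%s', first = '%s' (split agrees: %s)\n",
        $list, $first, ($first eq $expected ? 'yes' : 'no');
}
```

Because the pattern must match from the start of the string to its end, the lazy group can no longer stop after a single character when no separator is present.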
Whatever is the matter with:
?
Try matching instead. // (or m//) performs a match; you don't need to use s/// substitution. In list context it returns the captured groups (here, into $first and $rest), or you can read them afterwards from $1, $2, etc.
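As an illustration of matching in list context, here is a hedged sketch; the pattern and the helper name first_and_rest are assumptions, not the answerer's code. Note that it uses a plain match, so it only applies where the "substitution only" constraint from the question can be relaxed.

```perl
use strict;
use warnings;

# Hypothetical helper: split a '##'-separated list into (first, rest)
# with a single match instead of a substitution.
sub first_and_rest {
    my ($list) = @_;
    # In list context a successful match returns its capture groups.
    my ($first, $rest) = $list =~ /^(.*?)(?:##(.*))?\z/s;
    # Group 2 is undef when the list has no separator; normalise to ''.
    $rest = '' unless defined $rest;
    return ($first, $rest);
}

my ($first, $rest) = first_and_rest('one##two words##three');
print "first = '$first', rest = '$rest'\n";
# prints: first = 'one', rest = 'two words##three'
```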
You have reversed the quantifiers ? and + in the second regex; it should be:
or, more concisely:
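For context: in Perl 5.10 and later, .?+ parses as a possessive .? and so matches at most one character, which is why only the leading 'o' was dropped in the question's output. The sketch below is one possible substitution for the remainder, under the assumption that anchoring is acceptable; it is not the answerer's pattern, and the helper name rest_of_list is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the first item together with its separator.
sub rest_of_list {
    my ($list) = @_;
    # ^.*?        lazily skip the first item
    # (?:##|\z)   stop at the first separator, or at the end if there is none
    (my $rest = $list) =~ s/^.*?(?:##|\z)//s;
    return $rest;
}

for my $list ('', 'one', 'one##two', 'one##two words##three') {
    printf "list = '%s', rest = '%s'\n", $list, rest_of_list($list);
}
```

With no separator present, the whole string is consumed, so an empty list or a single-item list gives an empty remainder.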
I'd match, not substitute: