Vim regex: overwriting back references?
Project:
Take Wikipedia's list of Roman consuls and put the data in a CSV, so I can make a graph of the rise and fall of the various gentes in terms of consulships.
Example data source:
509,L. Iunius Brutus,L. Tarquinius Collatinus
suff.,Sp. Lucretius Tricipitinus,P. Valerius Poplicola
suff.,M. Horatius Pulvillus,
508,P. Valerius Poplicola II,T. Lucretius Tricipitinus
507,P. Valerius Poplicola III,M. Horatius Pulvillus II
Vim search:
/\v(\d+|suff\.),((\w+\.=) (\w+)(\s\w+)=(\s\w+)=(\s[iv]+)=(\s\(.{-}\))=,=){,2}
So essentially:
- Find the year at the beginning (or indication of suffect consul):
  (\d+|suff\.)
- The next grouping (let's call it the outer group) needs to be found up to two times:
  (outer group){,2}
- For each of these two outer groups, find:
  - Praenomen, with optional period (sometimes this isn't present):
    (\w+\.=)
  - Nomen:
    (\w+)
  - Optional cognomen (includes space, as do all below):
    (\s\w+)=
  - Optional agnomen:
    (\s\w+)=
  - Optional iteration (indicates the nth time he's been consul). Data source does not have more than 8 iterations (so I and V will suffice):
    (\s[iv]+)=
  - Optional explanatory note like "Sicinius (Sabinus?)":
    (\s\(.{-}\))=
  - Optional trailing comma:
    ,=
    (The last comma is optional since it's the end of the row.)
So the back references turn out to be:
\1: year or suffect
\2: the entire second outer group
\3: Praenomen of second outer group (same with all below)
\4: Nomen
\5: Cognomen
\6: Agnomen
\7: Iteration
\8: Explanatory note
The problem is I can't figure out how to capture that first outer group. It's like the \2 and \3-\8 references get overwritten when it sees that second outer group.
Using this replace:
:%s//1:{\1}^I2:{\2}^I3:{\3}^I4:{\4}^I5:{\5}^I6:{\6}^I7:{\7}^I8:{\8}^I9:{\9}
I get this output:
1:{509} 2:{L. Tarquinius Collatinus} 3:{L.} 4:{Tarquinius} 5:{ Collatinus} 6:{} 7:{} 8:{} 9:{}
1:{suff.} 2:{P. Valerius Poplicola} 3:{P.} 4:{Valerius} 5:{ Poplicola} 6:{} 7:{} 8:{} 9:{}
1:{suff.} 2:{M. Horatius Pulvillus,} 3:{M.} 4:{Horatius} 5:{ Pulvillus} 6:{} 7:{} 8:{} 9:{}
1:{508} 2:{T. Lucretius Tricipitinus} 3:{T.} 4:{Lucretius} 5:{ Tricipitinus} 6:{ II} 7:{} 8:{} 9:{}
1:{507} 2:{M. Horatius Pulvillus II} 3:{M.} 4:{Horatius} 5:{ Pulvillus} 6:{ II} 7:{} 8:{} 9:{}
I can't access those groups within the first outer group. I think they're being overwritten: are they being overwritten? If so, is there a way around this?
Edit: the original title was "Vim regex (or any compatible regex): how to reference a group (within a group) if the outer group is iterated?"
Comments (2)
I'd break it down into substeps, employing Vim functions instead of doing it all the normal (pun intended) way:
See what I did? Made that much simpler and clearer.
Edit: Getting slightly less lazy, let's define a helper function to split each line into a minimum of 3 substrings and tab-separate them:
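A minimal sketch of what such a helper might look like. This is not the answerer's original code: the function name SplitConsuls and the choice to pad with empty strings are assumptions made for illustration.

```vim
" Hypothetical helper: the name SplitConsuls is an assumption.
" Splits a line on commas, pads the result to at least three fields,
" and joins the fields with tabs.
function! SplitConsuls(line) abort
  " The third argument (keepempty) preserves the empty field that
  " follows a trailing comma, as in 'suff.,M. Horatius Pulvillus,'.
  let l:fields = split(a:line, ',', 1)
  while len(l:fields) < 3
    call add(l:fields, '')
  endwhile
  return join(l:fields, "\t")
endfunction
```

With the function defined, something like :%s/^.\+$/\=SplitConsuls(submatch(0))/ should rewrite every non-empty line in the buffer. Because each consul then sits in its own column, a pattern for the name parts only has to match one name at a time, so no capture group is repeated.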
Now reduce the substitution to (line breaks for SO only)
Running that beauty on your input yields
I'm pretty sure it will be a very easy step to further decorate the now neatly tab-separated columns to your liking. I might add it, but for now, here's the simplest thing I can think of:
Result:
Yes, capturing groups within repetitions get overwritten to the most recent matched values. According to the Repetition and Backreferences section near the bottom of the linked page:
You'll have to explicitly write out a certain number of capturing groups.
I'm not specifically familiar with vim's regex engine, so here's a simple example.
Let's say your text is:
abc 12 345 6789 xyz
Note that for a repetition range of {1,3}, I made the second and third ( \d+) optional with ?.
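As a rough illustration of the difference, here is a sketch in Vim script using the sample text above. It is an assumption about how the idea carries over to Vim, not the answerer's own pattern. It uses very magic mode (\v) and writes the optional marker as =, matching the question's style, instead of ?. The variable names are made up for the example.

```vim
let text = 'abc 12 345 6789 xyz'

" Repeated group: there is only one capture slot, so after three
" iterations it holds just the last one (' 6789').
let repeated = matchlist(text, '\v( \d+){1,3}')
echo repeated[1]

" Written out: each number gets its own group, and the second and
" third are made optional with = so one to three numbers still match.
let explicit = matchlist(text, '\v(\d+)( \d+)=( \d+)=')
echo explicit[1:3]
```

Vim allows at most nine capturing groups per pattern, so writing out both consuls of the question's pattern in full would not fit. Groups that only need grouping can be written as %( ... ) in very magic mode, which does not use up a back reference.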