Remove comments from text, except comment characters between quotes
I'm trying to build a regexp for removing comments from a configuration file. Comments are marked with the ; character. For example:
; This is a comment line
keyword1 keyword2 ; comment
keyword3 "key ; word 4" ; comment
The difficulty I have is ignoring the comment character when it's placed between quotes.
Any ideas?
Comments (5)
You could try matching a semicolon only if it's followed by an even number of quotes:
Be sure to use this regex with the Singleline option turned off and the Multiline option turned on.
In Python:
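A minimal sketch of this idea in Python, assuming only double quotes delimit strings and that quotes are balanced on each line. The pattern and the name strip_comments are illustrative assumptions, not necessarily the answer's exact code. re.MULTILINE is on and re.DOTALL (the Singleline option) is left off:

```python
import re

# A ';' counts as a comment start only if the rest of the line contains
# an even number of double quotes (i.e. the ';' is not inside a string).
# [^"\n] keeps the lookahead from running into the following lines.
COMMENT = re.compile(
    r'[ \t]*;(?=(?:[^"\n]*"[^"\n]*")*[^"\n]*$).*',
    re.MULTILINE,
)

def strip_comments(text: str) -> str:
    """Remove ';' comments that are not inside double-quoted strings."""
    return COMMENT.sub("", text)

sample = (
    "; This is a comment line\n"
    "keyword1 keyword2 ; comment\n"
    'keyword3 "key ; word 4" ; comment'
)
print(strip_comments(sample))
```

A comment that itself contains an odd number of quotes would defeat the even-count test, so this is only suitable when comments are known to be quote-free or balanced.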
No regex :)
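A minimal sketch of a regex-free approach, assuming double-quoted strings with no escape sequences. The function name strip_comment is hypothetical and the code is only one way to do it:

```python
def strip_comment(line: str, marker: str = ";") -> str:
    """Cut a line at the first comment marker that is outside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Toggle the state on every double quote (no escapes handled).
            in_quotes = not in_quotes
        elif ch == marker and not in_quotes:
            # First marker outside a string: drop it and everything after it.
            return line[:i].rstrip()
    return line

lines = [
    "; This is a comment line",
    "keyword1 keyword2 ; comment",
    'keyword3 "key ; word 4" ; comment',
]
cleaned = [strip_comment(line) for line in lines]
print(cleaned)
```

Scanning character by character makes the quote state explicit, which is easier to extend (for example to single quotes or escapes) than a single pattern.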
You may use a regexp to pull all the strings out first, replace them with some placeholder, then simply cut off everything matching
\$.*
and finally put the strings back :)
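A rough sketch of the placeholder idea in Python. It assumes double-quoted strings, the ; marker from the question, and that the input never contains NUL characters (used here as the placeholder delimiter). All names are hypothetical:

```python
import re

def strip_comments(text: str) -> str:
    """Mask quoted strings, cut comments, then restore the strings."""
    strings = []

    def stash(match):
        # Remember the string and leave an indexed placeholder behind.
        strings.append(match.group(0))
        return "\x00%d\x00" % (len(strings) - 1)

    masked = re.sub(r'"[^"\n]*"', stash, text)
    # With the strings masked, every remaining ';' starts a comment.
    masked = re.sub(r"[ \t]*;.*", "", masked)
    # Put back the strings that survived the cut.
    return re.sub(r"\x00(\d+)\x00", lambda m: strings[int(m.group(1))], masked)

sample = (
    "; This is a comment line\n"
    "keyword1 keyword2 ; comment\n"
    'keyword3 "key ; word 4" ; comment'
)
print(strip_comments(sample))
```

Strings that appear inside a comment are masked too, but their placeholders are removed together with the comment, so they are simply never restored.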
Something like this:
First, match any amount of text between quotes, then match a ;. If the ; is between quotes, it will be matched by the first group, not by the second group.
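One way this two-group idea could look in Python, as a sketch under the assumption that strings are double-quoted. The pattern and names are illustrative, not the answer's original code. A replacement function keeps the first alternative (a quoted string) and drops the second (a comment):

```python
import re

# Alternative 1 (group 1): a complete double-quoted string.
# Alternative 2: optional whitespace, a ';', and the rest of the line.
# The regex engine tries the quoted string first, so a ';' inside quotes
# is consumed by group 1 and never seen by the comment alternative.
TOKEN = re.compile(r'("[^"\n]*")|[ \t]*;.*')

def strip_comments(text: str) -> str:
    # Keep quoted strings unchanged; replace comments with nothing.
    return TOKEN.sub(lambda m: m.group(1) or "", text)

sample = (
    "; This is a comment line\n"
    "keyword1 keyword2 ; comment\n"
    'keyword3 "key ; word 4" ; comment'
)
print(strip_comments(sample))
```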
I (somewhat accidentally) came up with a working regex:
replace(/^((?:[^'";]*(?:'[^']*'|"[^"]*")?)*)[ \t]*;.*$/gm, '$1')
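For experimenting outside the browser, here is a rough Python port of the same pattern, with re.MULTILINE standing in for the m flag and re.sub replacing every match as the g flag does. The names PATTERN and strip_comments are assumptions for illustration:

```python
import re

# Same pattern as the JavaScript literal above; group 1 captures the data
# that precedes the comment, and the whole match is replaced by group 1.
PATTERN = re.compile(
    r"""^((?:[^'";]*(?:'[^']*'|"[^"]*")?)*)[ \t]*;.*$""",
    re.MULTILINE,
)

def strip_comments(text: str) -> str:
    return PATTERN.sub(r"\1", text)

sample = (
    "; This is a comment line\n"
    "keyword1 keyword2 ; comment\n"
    'keyword3 "key ; word 4" ; comment'
)
print(repr(strip_comments(sample)))

# The line quoted later in this answer has no comment, so it stays as is.
tricky = """db 'WDVPIVAlQEFQ;WzRcU',"hi;hi",0xfe,"'as"""
print(strip_comments(tricky) == tricky)
```

In this port the greedy [^'";]* also consumes the whitespace just before the ;, so group 1 keeps a trailing space; call rstrip() on each line if that matters.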
I wanted:
- ' quotes to be usable (but " accepted as well), so matching on a balanced (even) number of quotes after a comment delimiter, as in Tim Pietzcker's answer, was not suitable,
- ; left alone inside correctly (closed) quoted 'strings'.
Lacking lookbehind in JavaScript, I thought it might be an idea not to match comments (and replace them with ''), but to match the data preceding the comment and then replace the full match with the sub-match.
One could envision this concept on a line-by-line basis (so replace the full line with the match, thereby 'losing' the comment), BUT the multiline parameter doesn't seem to work exactly that way (at least in the browser).
[^'";]*
starts eating any characters from the 'start' that are not ', " or ;. (Completely counter-intuitively (to me),
[^'";\r\n]*
will not work.)

(?:'[^']*'|"[^"]*")?
is a non-capturing group matching zero or one set of: quote, any chars, quote. The back-reference variants
(?:(['"])[^\2]*\2)?
in
/^((?:[^'";]*(?:(['"])[^\2]*\2)?)*)[ \t]*;.*$/gm
or
(?:(['"])[^\2\r\n]*\2)?
in
/^((?:[^'";]*(?:(['"])[^\2\r\n]*\2)?)*)[ \t]*;.*$/gm
(although mysteriously better) do not work; they broke on
db 'WDVPIVAlQEFQ;WzRcU',"hi;hi",0xfe,"'as
Not adding another capturing group for re-use in the match is a good thing anyway, as capturing groups come with penalties.

The above combo is placed in a non-capturing group which may repeat zero or more times, and its result is placed in capturing group 1 to pass along.

That leaves us with
[ \t]*;.*
which 'simply' matches zero or more spaces and tabs followed by a semicolon, followed by zero or more chars that are not a newline. Note how ; is NOT optional!

To get a better idea of how this (multi-line parameter) works, hit the exp button in the demo below.
Hope this helps.
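A likely reason the back-reference variants misbehave is that a numeric escape inside a character class is not a back-reference. In Python's re module, for instance, a numeric escape inside [...] is treated as a character code, so [^\1] means "anything except the character with code 1". The sketch below illustrates this with standalone patterns (group 1 instead of group 2). The names are hypothetical, and the lookahead variant is a commonly used alternative offered as a suggestion, not part of the answer:

```python
import re

text = "'a;b' ; it's"

# Inside [...] the \1 is a character code, not "the quote captured by
# group 1", so the class happily runs across the closing quote.
naive = re.compile(r"""(['"])[^\1]*\1""")

# Explicit alternation, as used in the answer's working pattern.
explicit = re.compile(r"""'[^']*'|"[^"]*\"""")

# Suggested alternative: a negative lookahead can use the back-reference.
tempered = re.compile(r"""(['"])(?:(?!\1).)*\1""")

print(naive.match(text).group(0))     # overruns the first closing quote
print(explicit.match(text).group(0))  # stops at the first closing quote
print(tempered.match(text).group(0))  # stops at the first closing quote
```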
PS: Please comment with valid examples if and where this might break! Since I generally agree (from extensive personal experience) that it is impossible to reliably remove comments using regex (especially for higher-level programming languages), my gut is still saying this can't be fool-proof. However, I've been throwing existing data and crafted 'what-ifs' at it for over 2 hours and couldn't get it to break (which I'm usually very good at).