Simple regex for matching up to an optional character?
I'm sure this is a simple question for someone at ease with regular expressions:
I need to match everything up until the character #
I don't want the string following the # character, just the stuff before it, and the character itself should not be matched. This is the most important part, and what I'm mainly asking. As a second question, I would also like to know how to match the rest, after the # character. But not in the same expression, because I will need that in another context.
Here's an example string:
topics/install.xml#id_install
I want only topics/install.xml. And for the second question (separate expression) I want id_install
Comments (6)
First expression:
Second expression:
If your string contains any other special characters, you need to add them, escaped, inside the first square bracket.
I don't use C#, but I will assume that it uses PCRE... if so,
with a call to 'match'. A call to 'search' does not need the trailing ".*".
The parens define the 'keep group'; the [^#] means any character that is not a '#'.
You probably tried something like
and found that it fails when multiple '#' signs are present (keeping the leading '#'s)?
That is because ".*" is greedy, and will match as much as it can.
Your matcher should have a method that looks something like 'group(...)'. Most matchers
return the entire matched sequence as group(0), the first paren-matched group as group(1),
and so forth.
PCRE is so important that I strongly encourage you to search for it on Google, learn it, and always have it in your programming toolkit.
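The patterns from this answer did not survive on this page, so the following is only a sketch of the behaviour it describes, written in Python because its `re` module has the same `match`, `search` and `group(...)` vocabulary. The greedy pattern `(.*)#` and the test string `a#b#c` are assumptions chosen for illustration, not the answerer's original expressions.

```python
import re

s = "topics/install.xml#id_install"

# match() anchors at the start of the string; the parentheses capture
# the longest run of characters that are not '#'.
before = re.match(r"([^#]*)", s).group(1)

# Assumed illustration of the greedy pitfall: ".*" runs to the LAST '#',
# so earlier '#' characters end up inside the captured group.
multi = "a#b#c"
greedy = re.match(r"(.*)#", multi).group(1)
negated = re.match(r"([^#]*)", multi).group(1)

print(before)   # topics/install.xml
print(greedy)   # a#b
print(negated)  # a
```

The negated character class stops at the first '#' without any backtracking, which is why it behaves predictably when several '#' characters are present.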
Use look ahead and look behind:
.*?(?=\#)
(?<=\#).*
If you don't mind using groups, you can do it all in one shot:
(.*?)\#(.*)
Your answers will be in group(1) and group(2). Notice the non-greedy construct, *?, which will attempt to match as little as possible instead of as much as possible.
If the # part may be missing, use ([^\#]*)(?:\#(.*))?. It uses a non-capturing group to test the second half and, if it finds it, returns everything after the pound sign.
Honestly though, for your situation, it is probably easier to use the Split method provided in String.
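As a quick check of the patterns in this answer, here is a minimal sketch that runs them against the example string from the question. It uses Python's `re` module rather than C#, on the assumption that the lookaround and group syntax shown here behaves the same way in both engines; the variable names are illustrative only.

```python
import re

s = "topics/install.xml#id_install"

# Lookahead: everything up to, but not including, the first '#'.
before = re.search(r".*?(?=\#)", s).group(0)

# Lookbehind: everything after the first '#'.
after = re.search(r"(?<=\#).*", s).group(0)

# Both halves in one expression, read back through groups 1 and 2.
both = re.match(r"(.*?)\#(.*)", s)

# Variant that still matches when the '#' part is missing;
# the second group is then None.
optional = re.compile(r"([^\#]*)(?:\#(.*))?")
with_hash = optional.match(s).groups()
without_hash = optional.match("topics/install.xml").groups()

print(before, after)               # topics/install.xml id_install
print(both.group(1), both.group(2))
print(with_hash)                   # ('topics/install.xml', 'id_install')
print(without_hash)                # ('topics/install.xml', None)
```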
first:
/[^\#]*(?=\#)/
edit: this is faster than /.*?(?=\#)/
second:
/(?<=\#).*/
For something like this in C# I would usually skip the regular expressions stuff altogether and do something like:
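The snippet that followed this sentence is not preserved on this page. As a rough sketch of the regex-free idea, and not the answerer's original C# code, the same split can be done with plain string operations. The example below uses Python's `str.split`, and the helper name `split_fragment` is hypothetical.

```python
def split_fragment(s):
    """Split s at the first '#'; the second item is None if there is no '#'."""
    parts = s.split("#", 1)  # maxsplit=1 keeps any later '#' in the tail
    before = parts[0]
    after = parts[1] if len(parts) > 1 else None
    return before, after


print(split_fragment("topics/install.xml#id_install"))
# ('topics/install.xml', 'id_install')
print(split_fragment("topics/install.xml"))
# ('topics/install.xml', None)
```

Limiting the split to one cut means a string with several '#' characters still yields exactly two parts.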