Regex to parse #@user mentions
I have been using the following regex to parse #username from posts in my application:
/(^|\s)#(\w*[a-zA-Z_]+\w*)/
Can somebody explain the purpose of (^|\s)? What happens if I omit that part?
Comments (3)
(^|\s) matches either the beginning of the string (^) or a whitespace character (\s). This prevents hallo#world from matching as a mention.
An alternative is a boundary assertion. Note that \b (a word boundary) placed directly before # would require a word character in front of the #, which is the opposite of what is wanted here; \B (a non-word boundary) is the closer fit. It has slightly different semantics (for example, it also allows punctuation before the #), but it should work in this case.
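
To make the effect of the prefix concrete, here is a minimal Python sketch that runs the question's pattern with and without (^|\s) on a made-up sample string. The third pattern, using \B, is an assumption added for comparison and is not taken from the question; the helper name `mentions` and the sample text are likewise hypothetical.

```python
import re

# The pattern from the question, and a variant with the (^|\s) prefix removed.
ANCHORED = re.compile(r"(^|\s)#(\w*[a-zA-Z_]+\w*)")
UNANCHORED = re.compile(r"#(\w*[a-zA-Z_]+\w*)")
# Assumption for comparison: \B (non-word boundary) in place of (^|\s).
NON_WORD_BOUNDARY = re.compile(r"\B#(\w*[a-zA-Z_]+\w*)")


def mentions(pattern, text):
    """Return the username captured by the last group of each match."""
    return [m.groups()[-1] for m in pattern.finditer(text)]


sample = "#alice said hallo#world to #bob (#carol)"

anchored = mentions(ANCHORED, sample)             # start-of-string or whitespace only
unanchored = mentions(UNANCHORED, sample)         # also picks up "world" from hallo#world
boundary = mentions(NON_WORD_BOUNDARY, sample)    # rejects hallo#world, accepts (#carol

print(anchored)
print(unanchored)
print(boundary)
```

Without the prefix, the `world` in `hallo#world` is reported as a mention. The \B variant rejects it as well, but it additionally accepts `#carol` after the opening parenthesis, which is the difference in semantics to keep in mind.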
(^|\s) is either the start of the line or string (^), or (|) a whitespace character (\s).
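
As a small illustration of the two branches of the alternation, the Python sketch below records what group 1 matched for each mention in a hypothetical multi-line post. The sample text is made up for the example.

```python
import re

PATTERN = re.compile(r"(^|\s)#(\w*[a-zA-Z_]+\w*)")

# Hypothetical post: one mention at the start, one after a newline, one after a space.
post = "#alice wrote:\n#bob\tand #carol"

# For each match, record (what group 1 matched, the username in group 2).
pairs = [(m.group(1), m.group(2)) for m in PATTERN.finditer(post)]

for prefix, name in pairs:
    print(repr(prefix), name)
```

Group 1 is the empty string when the ^ branch matched, and the whitespace character itself otherwise. A mention at the start of a later line is found even without a multi-line flag, because the preceding newline counts as \s.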
A potential problem with the capture group at the start of the pattern, (^|\s), is that it matches and consumes zero or one whitespace character as part of the match. To avoid this, use a positive lookbehind to check for the start of the string or a whitespace character.
Also, I don't think you meant to match @_ as a mention, so I'd adjust the character class between the \w* checks. This will require that the mention contains at least one letter (there is no benefit in matching one or more letters there when the following \w* will also match any additional letters).
Code: (Demo)
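
The answer's own code is not reproduced here, so the following is only a rough Python sketch of the idea, not the answerer's pattern. One substitution is made: Python's `re` module only accepts fixed-width lookbehinds, so instead of a positive lookbehind for "start of string or whitespace" the sketch uses the negative lookbehind (?<!\S), meaning "not preceded by a non-whitespace character", which covers the same two cases. The character class [a-zA-Z] (no underscore, no +) is an assumption about how the class would be adjusted.

```python
import re

# Pattern from the question: group 1 consumes the leading whitespace, if any.
ORIGINAL = re.compile(r"(^|\s)#(\w*[a-zA-Z_]+\w*)")

# Sketch of the suggested changes (assumptions, see the text above):
#   (?<!\S)   asserts start of string or preceding whitespace without consuming it
#   [a-zA-Z]  requires one real letter somewhere in the name
REVISED = re.compile(r"(?<!\S)#(\w*[a-zA-Z]\w*)")

text = "hi #bob #_ #__x1 a#b"

# Full match text (group 0) shows what each pattern actually consumed.
original_full = [m.group(0) for m in ORIGINAL.finditer(text)]
revised_full = [m.group(0) for m in REVISED.finditer(text)]

print(original_full)
print(revised_full)
```

With the original pattern every full match carries its leading space and `#_` counts as a mention. With the revised pattern the full match starts at the `#`, and `#_` is rejected because it contains no letter.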