How to ignore whitespace in a regular expression subject string?

Posted on 2024-10-10 16:08:22

Is there a simple way to ignore the white space in a target string when searching for matches using a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match and any whitespace needs to be there for formatting purposes.

Comments (7)

强者自强 2024-10-17 16:08:22

You can insert optional whitespace, \s*, between each pair of adjacent characters in your regex. Granted, the pattern gets a bit lengthy.

/cats/ -> /c\s*a\s*t\s*s/
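
As a rough illustration of why this suits the highlighting use case from the question, here is a minimal Python sketch. The sample text and the bracket-style "highlight" are assumptions made up for the example; the pattern is the one from this answer. Because the search runs on the original string, the reported spans already include the inner whitespace.

```python
import re

# Hand-expanded pattern for the literal "cats", with \s* between the letters.
pattern = re.compile(r"c\s*a\s*t\s*s")

text = "my c ats and ca ts"

# Each span is (start, exclusive end) in the ORIGINAL string.
spans = [m.span() for m in pattern.finditer(text)]

# Wrap every match in brackets as a stand-in for real highlighting.
highlighted = pattern.sub(lambda m: "[" + m.group(0) + "]", text)

print(spans)        # [(3, 8), (13, 18)]
print(highlighted)  # my [c ats] and [ca ts]
```

No position mapping is needed afterwards, which is the main advantage over stripping the whitespace first.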

心清如水 2024-10-17 16:08:22

While the accepted answer is technically correct, a more practical approach, if possible, is to just strip whitespace out of both the regular expression and the search string.

If you want to search for "my cats", instead of:

myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)

Just do:

myString.replace(/\s+/g, "").match(/mycats/g)

Warning: You can't automate this on the regular expression by just replacing all spaces with empty strings because they may occur in a negation or otherwise make your regular expression invalid.
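
To make the warning concrete, here is a small Python sketch (Python rather than JavaScript, and the pattern `[^ ]+` is just an assumed example). Naively deleting whitespace from a pattern that contains a negated character class can leave an invalid regex behind.

```python
import re

original = r"[^ ]+"                      # one or more non-space characters
stripped = re.sub(r"\s+", "", original)  # naive whitespace removal -> "[^]+"

re.compile(original)  # the original pattern compiles fine

try:
    re.compile(stripped)
    stripped_is_valid = True
except re.error:
    # In Python, "[^]+" is an unterminated character set.
    stripped_is_valid = False

print(stripped, stripped_is_valid)
```

Other regex flavors may react differently to the stripped pattern, but in each case its meaning is no longer what the author wrote.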

束缚m 2024-10-17 16:08:22

Addressing Steven's comment to Sam Dufel's answer

Thanks, sounds like that's the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?

This should do the trick:

/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/

See this page for all the different variations of 'cats' that this matches.

You can also solve this using conditionals, but the JavaScript flavor of regex does not support them.
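
A quick way to check which variants the newline-only pattern accepts is sketched below in Python (the sample strings are assumptions chosen to mirror the comment quoted above). Whitespace is tolerated only when a newline comes first.

```python
import re

# Optional group between letters: a newline, then any amount of whitespace.
pattern = re.compile(r"c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s")

samples = ["cats", "c\n ats", "ca\n  ts", "c ats"]
results = {s: bool(pattern.search(s)) for s in samples}

for sample, matched in results.items():
    print(repr(sample), matched)
```

The first three samples match; "c ats" does not, because the space is not preceded by a newline.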

一萌ing 2024-10-17 16:08:22

You could put \s* in between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s

It's long but you could build the string dynamically of course.

You can see it working here: http://www.rubular.com/r/zzWwvppSpE
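
Building the pattern dynamically could look something like the following Python sketch. The helper name `spaced_pattern` is hypothetical, and the sketch assumes the search term is a literal string rather than a regex, so each character is escaped before joining.

```python
import re

def spaced_pattern(literal):
    """Build a regex matching `literal` with optional whitespace between characters.

    Hypothetical helper: whitespace in `literal` itself is dropped, and every
    remaining character is escaped so it is matched literally.
    """
    chars = [re.escape(ch) for ch in literal if not ch.isspace()]
    return re.compile(r"\s*".join(chars))

pattern = spaced_pattern("my cats")
m = pattern.search("I love m y ca ts!")
print(m.span(), repr(m.group(0)))  # (7, 16) 'm y ca ts'
```

Escaping matters for search terms such as "a.b": without `re.escape`, the dot would match any character instead of a literal period.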

不气馁 2024-10-17 16:08:22

If you only want to allow spaces, then

\bc *a *t *s\b

should do it. To also allow tabs, use

\bc[ \t]*a[ \t]*t[ \t]*s\b

Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
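
The effect of the \b anchors can be seen in a short Python sketch (the sample text is an assumption for illustration). With the anchors, the "cats" inside "bobcats" is skipped; without them, it is found as well.

```python
import re

whole_word = re.compile(r"\bc[ \t]*a[ \t]*t[ \t]*s\b")
anywhere = re.compile(r"c[ \t]*a[ \t]*t[ \t]*s")

text = "bobcats like c a\tts"

whole_word_hits = whole_word.findall(text)
anywhere_hits = anywhere.findall(text)

print(whole_word_hits)  # ['c a\tts']
print(anywhere_hits)    # ['cats', 'c a\tts']
```

Because the class `[ \t]` contains only space and tab, a newline between letters prevents a match, unlike with \s*.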

趴在窗边数星星i 2024-10-17 16:08:22

This approach can be automated (the example below is in Python, but it can be ported to any language):

Strip the whitespace beforehand AND save the positions of the non-whitespace characters, so that you can later map the match boundaries back to positions in the original string:

import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []

    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # uppercase \S matches a non-whitespace char
            no_spaces += char
            char_positions.append(pos)

    match = re.search(regex, no_spaces)
    if not match:
        return match
    if match.start() == match.end():
        return ''  # empty match: nothing to map back

    # match.start() and match.end() are indices into the whitespace-free
    # string (the one we searched), so map them back to the original string.
    start = char_positions[match.start()]
    # match.end() is exclusive and may equal len(no_spaces), so take the
    # position of the last matched character and add one.
    end = char_positions[match.end() - 1] + 1

    # The match WITH its inner whitespace is returned.
    return string[start:end]

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'

If you want to go further, you can construct a match object and return it instead, which makes the helper handier to use.

The performance of this function can of course be optimized; this example only shows the path to a solution.
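
One possible way to "go further" is sketched below: instead of a bare string, return a small result object carrying the start and end indices in the original string, which is what highlighting needs. The names `SpacedMatch` and `search_ignoring_whitespace` are hypothetical, and treating an empty match as "no match" is a simplifying assumption of this sketch.

```python
import re
from typing import NamedTuple, Optional

class SpacedMatch(NamedTuple):
    start: int  # index of the first matched character in the original string
    end: int    # exclusive end index in the original string
    text: str   # matched slice, inner whitespace included

def search_ignoring_whitespace(regex: str, string: str) -> Optional[SpacedMatch]:
    # Keep (position, char) pairs for the non-whitespace characters only.
    kept = [(pos, ch) for pos, ch in enumerate(string) if not ch.isspace()]
    compact = "".join(ch for _, ch in kept)

    match = re.search(regex, compact)
    if match is None or match.start() == match.end():
        return None  # no match, or an empty match with nothing to map back

    # Map the compact-string indices back to original positions.
    start = kept[match.start()][0]
    end = kept[match.end() - 1][0] + 1
    return SpacedMatch(start, end, string[start:end])

result = search_ignoring_whitespace("lion", "a li on and a cat")
print(result)  # SpacedMatch(start=2, end=7, text='li on')
```

Collecting the kept characters in a list and joining once also avoids the repeated string concatenation of the original loop, which is one of the easier performance gains.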

旧城空念 2024-10-17 16:08:22

The accepted answer will not work when you pass a dynamic value (such as the "current value" in an array loop) as the regex test value: you cannot insert the optional whitespace without ending up with a really ugly regex.
Konrad Hoffner's solution is therefore better in such cases, as it strips whitespace from both the regex and the test string. The test is then conducted as though neither contained whitespace.
