php preg regex - whitespace group without line breaks in multiline mode

Posted 2024-10-23 13:36:22

Hello, I am trying to split some input by line and trim each line. I would like to do it without calling trim(), using only a regex.

The issue is that whitespace at the end of each line is not trimmed away. I guess my group [^$\s], meant as "whitespace but no line break", does not work.

So the question is how to solve my problem, and how to define a group in a preg regex that explicitly excludes line breaks. At the moment I think my approach is still wrong. The problem is that if I write \s* instead of this weird group, .+ eats everything. If I write .+? I do not get strings that include spaces back complete.

preg_match_all("/^\s*+(.+)[^$\s]*+$/m", $_POST['input'], $matches, PREG_SET_ORDER );

Comments (2)

雅心素梦 2024-10-30 13:36:22

Okay, I'm usually all for using regular expressions. But the trim approach would be simpler here. I assume you avoided it because it usually requires an extra loop, but in this instance you could compact it to:

 $lines = array_map("trim", explode("\n", $_POST["input"]));
 // quite a handy utility function, so just wanted to note that here
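
As a quick check of the one-liner above, here is a minimal sketch. The sample string is a hypothetical stand-in for $_POST["input"], not data from the question. It also shows two side effects worth knowing: trim() strips a stray "\r" from Windows line endings, and whitespace-only lines survive as empty strings.

```php
<?php
// Hypothetical sample input standing in for $_POST["input"].
$input = "  first line  \n\tsecond line\t\r\n   \nthird";

// Split on "\n", then trim() every piece.
// trim() also removes the "\r" left over from a "\r\n" line ending.
$lines = array_map("trim", explode("\n", $input));

// The whitespace-only line is kept, but as an empty string.
var_export($lines);
```

If empty lines are unwanted, wrapping the result in array_filter() with a strlen callback would be one way to drop them.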

But as an alternative to the solution you found, you could have used:

preg_split('/((?!\n)\p{Z})*\n((?!\n)\p{Z})*/u', "...\n...");

A bit hackish now. I swapped out the ^ and $ for \n and used assertions to exclude newlines elsewhere. But \p{Z} is a nice way to catch Unicode space separators, including NBSP and other ninja placeholders.
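
To see what the preg_split() pattern does in practice, here is a small sketch with a made-up input string (an assumption for illustration only). One caveat it brings out: \p{Z} is the Unicode separator category, so it covers ordinary spaces and NBSP, but a tab is a control character and is not matched by it.

```php
<?php
// Hypothetical sample: a space plus NBSP (U+00A0) after "foo",
// two leading spaces and a trailing tab around "bar".
$input = "foo \u{00A0}\n  bar\t\nbaz";

// Same pattern as above: optional separators, a newline, optional separators.
// The /u modifier makes the pattern and subject be treated as UTF-8.
$parts = preg_split('/((?!\n)\p{Z})*\n((?!\n)\p{Z})*/u', $input);

// The space and NBSP around the first newline are consumed by the delimiter;
// the tab after "bar" is not in \p{Z}, so it stays attached to that piece.
var_export($parts);
```

Note also that whitespace at the very start of the first line or the very end of the last line is not next to a "\n", so this pattern leaves it in place.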

如痴如狂 2024-10-30 13:36:22
preg_match_all("/\s*(.*\S)/", $_POST['input'], $matches, PREG_SET_ORDER );

You need something to eat leading whitespace before your capture group, including whole lines. \s* does that. You don't need to force it to start at the beginning of a line, you're not saving it anyway -- its only purpose is to match up to just before a non-whitespace character.

So now you know that you're looking at non-whitespace, and need to capture up to the last non-whitespace on the same line. Since . won't match newline, .*\S does just that.

One difference from your version is that the initial \s* of the next match gets to eat the trailing whitespace on the line you just matched. Since we no longer care about line endings, the /m modifier is no longer necessary.

You could make the first star possessive (\s*+); that won't change what it matches, but it will make it fail marginally faster at the end of the file if there's a long empty tail.
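
A short sketch of the pattern above, run against a hypothetical sample string in place of $_POST['input']. It collects capture group 1 from each match and checks that the possessive variant returns the same lines. Unlike a split-and-trim approach, whitespace-only lines produce no match at all, so they simply disappear from the result.

```php
<?php
// Hypothetical sample input standing in for $_POST['input'].
$input = "  first line  \n\n\tsecond line\t\r\n   \nthird  ";

// \s* skips leading whitespace (including whole blank lines);
// (.*\S) captures up to the last non-whitespace character on the line.
preg_match_all('/\s*(.*\S)/', $input, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each match is [full match, group 1]; keep group 1.
$lines = array_column($matches, 1);

// Possessive variant: same captures, it just gives up without backtracking.
preg_match_all('/\s*+(.*\S)/', $input, $possessive, PREG_SET_ORDER);
$same = array_column($possessive, 1) === $lines;

var_export($lines);
```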
