PHP preg regex: a whitespace group that excludes line breaks in multiline mode

Posted on 2024-10-23 13:36:22

Hello, I am trying to split some input by line and trim each line. I would like to do it without calling trim(), using only a regex.

The issue is that whitespace at the end of each line is not trimmed away. I guess my group [^$\s], which I meant as "whitespace but not a line break", does not work.

So the question is: how do I solve this, and how do I define a group in a preg regex that explicitly excludes line breaks? At the moment I suspect my whole approach is wrong. If I write \s* instead of this weird group, .+ eats everything. If I write .+?, the strings that contain spaces do not come back complete.

preg_match_all("/^\s*+(.+)[^$\s]*+$/m", $_POST['input'], $matches, PREG_SET_ORDER );
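
To make the symptom concrete, here is a minimal sketch that runs the pattern above on a made-up input ($input is a hypothetical stand-in for $_POST['input']). It also shows what [^$\s] matches in practice, and one way to spell "whitespace except newline" as a character class; that last class is an illustration, not something taken from the post.

```php
<?php
// Hypothetical sample input standing in for $_POST['input'].
$input = "  foo bar  \nbaz";

// The pattern from the question, single-quoted so PHP passes it to PCRE unchanged.
preg_match_all('/^\s*+(.+)[^$\s]*+$/m', $input, $matches, PREG_SET_ORDER);

// Greedy .+ runs to the end of the line, so the trailing spaces land in group 1.
var_dump($matches[0][1] === "foo bar  "); // bool(true)

// [^$\s] is a negated class: any character that is neither "$" nor whitespace.
var_dump(preg_match('/[^$\s]/', " "));    // int(0)

// A double negative gives "whitespace except newline".
$horizontal = '/[^\S\n]/';
var_dump(preg_match($horizontal, "\t")); // int(1)
var_dump(preg_match($horizontal, "\n")); // int(0)
```

Because .+ is greedy and nothing after it is required to consume anything, the trailing group is always satisfied by matching zero characters.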

雅心素梦 2024-10-30 13:36:22

Okay, I'm usually all for using regular expressions, but the trim approach would be simpler here. I assume you avoided it because it usually requires an extra loop. In this instance you could compact it to:

 $lines = array_map("trim", explode("\n", $_POST["input"]));
 // quite a handy utility function, so just wanted to note that here

But as an alternative to the solution you found, you could have used:

preg_split('/((?!\n)\p{Z})*\n((?!\n)\p{Z})*/u', "...\n...");

It's a bit hackish. I swapped out the ^ and $ for a literal \n, and used negative lookahead assertions to exclude newlines elsewhere. But \p{Z} is a nice way to catch the Unicode space character variations, including NBSP and other ninja placeholders.
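
A minimal sketch comparing the two approaches on a made-up input ($input is a hypothetical stand-in for $_POST["input"]). One substitution to flag: the split pattern below uses PCRE's \h (horizontal whitespace) in place of ((?!\n)\p{Z}). That is a swap made for this sketch, not the pattern from the answer. \h cannot match a newline, so it needs no lookahead, and it also covers tab, which belongs to the Unicode control category rather than \p{Z}.

```php
<?php
// Hypothetical sample input standing in for $_POST["input"].
// The second line ends in a no-break space (U+00A0).
$input = "  first line \t\n\tsecond line\u{00A0}\n third  ";

// Approach 1: split on newlines, then trim() each piece.
// trim() strips ASCII whitespace only, so the no-break space stays.
$trimmed = array_map("trim", explode("\n", $input));

// Approach 2: let the separator absorb horizontal whitespace around each newline.
// \h never matches \n; the u modifier makes PCRE read the input as UTF-8.
$split = preg_split('/\h*\n\h*/u', $input);

var_dump($trimmed[1] === "second line\u{00A0}"); // bool(true)
var_dump($split[1] === "second line");           // bool(true)
var_dump($split[0] === "  first line");          // bool(true): outer ends are untouched
```

Note that the split only removes whitespace next to a newline, so the start of the first line and the end of the last line still need a separate trim.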

如痴如狂 2024-10-30 13:36:22

preg_match_all("/\s*(.*\S)/", $_POST['input'], $matches, PREG_SET_ORDER );

You need something to eat leading whitespace before your capture group, including whole lines. \s* does that. You don't need to force it to start at the beginning of a line, you're not saving it anyway -- its only purpose is to match up to just before a non-whitespace character.

So now you know that you're looking at non-whitespace, and need to capture up to the last non-whitespace on the same line. Since . won't match newline, .*\S does just that.

One difference from your version is that the initial \s* of the next match gets to eat the trailing whitespace on the line you just matched. Since we no longer care about line endings, the /m modifier is no longer necessary.

You could make the first star possessive (\s*+); that won't change what it matches, but it will make it fail marginally faster at the end of the file if there's a long empty tail.
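
The behaviour described above can be checked with a short sketch. The input is made up ($input is a hypothetical stand-in for $_POST['input']), and the possessive variant of the pattern is used.

```php
<?php
// Hypothetical sample input standing in for $_POST['input'].
// It has padded lines, an empty line, and a whitespace-only tail.
$input = "  alpha beta  \n\n\tgamma\t\n   ";

// \s*+ eats leading whitespace (including whole blank lines);
// .*\S then captures up to the last non-whitespace character on the line.
preg_match_all('/\s*+(.*\S)/', $input, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER each entry is one match; index 1 is the captured line.
$lines = array_column($matches, 1);

var_dump($lines === ["alpha beta", "gamma"]); // bool(true)
```

One consequence worth knowing: empty and whitespace-only lines produce no match at all, so unlike the explode() and trim() route, they do not show up as empty strings in the result.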
