What is the smartest way to replace different newline styles in PHP?

Posted on 2024-12-11 04:13:48

I have a text which might contain different newline styles.
I want to replace all newlines ('\r\n', '\n', '\r') with the same newline (in this case \r\n).

What's the fastest way to do this? My current solution looks like this, which is pretty ugly:

    $sNicetext = str_replace("\r\n",'%%%%somthing%%%%', $sNicetext);
    $sNicetext = str_replace(array("\r","\n"),array("\r\n","\r\n"), $sNicetext);
    $sNicetext = str_replace('%%%%somthing%%%%',"\r\n", $sNicetext);

The problem is that you can't do this with a single replace, because the existing \r\n would be duplicated into \r\n\r\n.
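To make the duplication concrete, here is a minimal sketch of the naive single call (the variable names are only for illustration). `str_replace()` processes an array of search strings one after another, so the `\n` pass re-scans the `\n` characters that the `\r` pass just inserted:

```php
<?php
// One CRLF line break between "a" and "b".
$text = "a\r\nb";

// Naive single call: first every "\r" becomes "\r\n",
// then every "\n" (including the freshly inserted ones) becomes "\r\n".
$naive = str_replace(array("\r", "\n"), "\r\n", $text);

// The single line break has turned into "\r\r\n\r\n".
var_dump($naive === "a\r\r\n\r\nb"); // bool(true)
```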

Thank you for your help!

Comments (5)

太阳男子 2024-12-18 04:13:49
$sNicetext = str_replace(["\r\n", "\r"], "\n", $sNicetext);

This also works (note that it normalizes to \n rather than \r\n).

舞袖。长 2024-12-18 04:13:48
$string = preg_replace('~\R~u', "\r\n", $string);

If you want to replace only CR, LF, and CRLF rather than every Unicode newline, use:

$string = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $string);

\R matches these newline sequences; u is the modifier that treats the pattern and the input string as UTF-8.
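As a rough illustration of the difference between the two patterns, the sketch below feeds both the same mixed input, which also contains U+2028 (LINE SEPARATOR). The sample string and variable names are made up for this example, and the `\u{2028}` escape assumes PHP 7 or later:

```php
<?php
// Mixed input: LF, CRLF, lone CR, and U+2028 encoded as UTF-8.
$mixed = "a\nb\r\nc\rd\u{2028}e";

// UTF-8 mode: \R also matches U+2028, so every break becomes CRLF.
$all = preg_replace('~\R~u', "\r\n", $mixed);

// ANYCRLF: only CR, LF, and CRLF are rewritten; U+2028 is left untouched.
$crlfOnly = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $mixed);

var_dump($all === "a\r\nb\r\nc\r\nd\r\ne");
var_dump($crlfOnly === "a\r\nb\r\nc\r\nd\u{2028}e");
```

Which one is appropriate depends on whether characters such as U+2028 should count as line breaks in your data.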


From the PCRE docs:

What \R matches

By default, the sequence \R in a pattern matches any Unicode newline
sequence, whatever has been selected as the line ending sequence. If
you specify

     --enable-bsr-anycrlf

the default is changed so that \R matches only CR, LF, or CRLF. Whatever is selected when PCRE is built can be overridden when the library
functions are called.

and

Newline sequences

Outside a character class, by default, the escape sequence \R matches
any Unicode newline sequence. In non-UTF-8 mode \R is equivalent to the
following:

    (?>\r\n|\n|\x0b|\f|\r|\x85)

This is an example of an "atomic group", details of which are given
below. This particular group matches either the two-character sequence
CR followed by LF, or one of the single characters LF (linefeed,
U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage
return, U+000D), or NEL (next line, U+0085). The two-character sequence
is treated as a single unit that cannot be split.

In UTF-8 mode, two additional characters whose codepoints are greater
than 255 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). Unicode character property support is not needed for
these characters to be recognized.

It is possible to restrict \R to match only CR, LF, or CRLF (instead of
the complete set of Unicode line endings) by setting the option
PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched.
(BSR is an abbreviation for "backslash R".) This can be made the default
when PCRE is built; if this is the case, the other behaviour can be
requested via the PCRE_BSR_UNICODE option. It is also possible to
specify these settings by starting a pattern string with one of the
following sequences:

    (*BSR_ANYCRLF)   CR, LF, or CRLF only
    (*BSR_UNICODE)   any Unicode newline sequence

These override the default and the options given to pcre_compile() or
pcre_compile2(), but they can be overridden by options given to
pcre_exec() or pcre_dfa_exec(). Note that these special settings, which
are not Perl-compatible, are recognized only at the very start of a
pattern, and that they must be in upper case. If more than one of them
is present, the last one is used. They can be combined with a change of
newline convention; for example, a pattern can start with:

    (*ANY)(*BSR_ANYCRLF)

They can also be combined with the (*UTF8) or (*UCP) special sequences.
Inside a character class, \R is treated as an unrecognized escape
sequence, and so matches the letter "R" by default, but causes an error
if PCRE_EXTRA is set.

我恋#小黄人 2024-12-18 04:13:48

To normalize newlines I always use:

$str = preg_replace('~\r\n?~', "\n", $str);

It replaces the old Mac (\r) and the Windows (\r\n) newlines with the Unix equivalent (\n).

I prefer using \n because it only takes one byte instead of two, but you can easily change it to \r\n.
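One thing to watch when switching the target to \r\n: the pattern `\r\n?` never matches a lone \n, so swapping only the replacement string would leave Unix line endings unconverted. A possible two-step sketch is shown below; `normalize_newlines` is a hypothetical helper name, not something from the answer:

```php
<?php
// Hypothetical helper: normalize every CR, LF, or CRLF to the requested ending.
function normalize_newlines(string $text, string $eol = "\n"): string
{
    // Step 1: collapse CRLF and lone CR to LF (fall back to the input on a PCRE error).
    $text = preg_replace('~\r\n?~', "\n", $text) ?? $text;

    // Step 2: expand LF to the requested ending if it is not LF already.
    return $eol === "\n" ? $text : str_replace("\n", $eol, $text);
}

var_dump(normalize_newlines("a\r\nb\rc\nd", "\r\n") === "a\r\nb\r\nc\r\nd");
```

Doing the conversion in two steps means every break passes through a single-character form first, so nothing gets doubled.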

枕花眠 2024-12-18 04:13:48

How about

$sNicetext = preg_replace('/\r\n|\r|\n/', "\r\n", $sNicetext);

玉环 2024-12-18 04:13:48

I think the smartest/simplest way to convert to CRLF is:

$output = str_replace("\n", "\r\n", str_replace("\r", '', $input));

To convert to LF only:

$output = str_replace("\r", '', $input);

It's much easier than regular expressions.
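A caveat with stripping every \r: a lone \r (old Mac style) is deleted rather than converted, so the two lines it separated get merged. The sketch below shows this on a made-up sample string, together with a str_replace-only variant that keeps those breaks by first mapping \r\n and \r to \n:

```php
<?php
$input = "one\r\ntwo\nthree\rfour";

// Approach from the answer: the lone "\r" between "three" and "four" disappears.
$stripped = str_replace("\n", "\r\n", str_replace("\r", '', $input));
var_dump($stripped === "one\r\ntwo\r\nthreefour");

// Variant: turn "\r\n" and lone "\r" into "\n" first, then expand "\n" to "\r\n".
$safe = str_replace("\n", "\r\n", str_replace(["\r\n", "\r"], "\n", $input));
var_dump($safe === "one\r\ntwo\r\nthree\r\nfour");
```

If the input is known to contain only \n and \r\n, the original one-liner is fine as it stands.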
