Compilation failed: POSIX collating elements are not supported

Posted on 2024-11-30 22:08:35

I've just installed a website and legacy CMS on our server and I'm getting a POSIX compilation error. Luckily it only appears in the backend, but the client is keen to get rid of it.

Warning: preg_match_all() [function.preg-match-all]: Compilation failed: 
POSIX collating elements are not supported at offset 32 in
/home/kwecars/public_html/webEdition/we/include/we_classes/SEEM/we_SEEM.class.php
on line 621

From what I can tell it's the newer version of PHP causing the issue. Here's the code:

function getAllHrefs($code){

$trenner = "[\040|\n|\t|\r]*";

$pattern = "/<(a".$trenner."[^>]+href".$trenner."[=\"|=\'|=\\\\|=]*".$trenner.")
([^\'\">\040? \\\]*)([^\"\' \040\\\\>]*)(".$trenner."[^>]*)>/sie";

preg_match_all($pattern, $code, $allLinks); // ---- line 621
return $allLinks;

}

How can I tweak this to work on the newer version of PHP on this server?

Thanks in advance, my voodoo just isn't strong enough ;)

风流物 2024-12-07 22:08:35

Your error message that “POSIX collating elements are not supported” deserves some explanation. After all, what in the world is a POSIX collating element anyway, and how can I avoid it?

The short answer is that you have an equals sign inside your square brackets in a place where its use is reserved for future use, assuming we ever get around to implementing it, which is anything but certain. You can tickle this in Perl on the command line this way, which gives a much better error message than PHP is providing:

% perl -le 'print "abc" =~ /[=foo=]/ || "Fail"'
POSIX syntax [= =] is reserved for future extensions in regex; marked by <-- HERE in m/[=foo=] <-- HERE / at -e line 1.
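
The same failure can be provoked from PHP itself. The following is a minimal sketch, not code from the CMS: patternCompiles() is a hypothetical helper introduced only for illustration, and the exact warning text depends on the PCRE version bundled with your PHP build.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports whether PCRE accepts a pattern.
function patternCompiles(string $pattern): bool
{
    // preg_match() returns false and raises a warning (silenced here with @)
    // when the pattern cannot be compiled.
    return @preg_match($pattern, '') !== false;
}

// "[=" ... "=]" is read as a POSIX equivalence class, which PCRE rejects.
var_dump(patternCompiles('/[=foo=]/'));

// Here "=" does not directly follow "[", so this is an ordinary character class.
var_dump(patternCompiles('/[foo=]/'));
```

On the PCRE builds I know of, the first call should report false and the second true, which mirrors what the Perl one-liner shows.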

That’s the short answer; the longer answer follows.

Fancy POSIX Character Classes

Inside a square-bracketed character class, POSIX admits three different nested bracketed forms, each marked by a pair of extra symbols just inside the brackets:

Named POSIX character classes, which are basically like Unicode properties, use an extra colon flanking: [:PROPERTY:], as in [:alpha:].
Collating elements intended to be treated as equivalent to each other use an extra equals sign flanking them: [=ELEMENTS=], as in [=eéèëê=] in English or French, and [=vw=] in Swedish.
Polygraphs (digraphs, trigraphs, tetragraphs, etc.), which are multicharacter elements meant to count as a single character, have an extra dot flanking them: [.DIGRAPH.], as in [.ch.] or [.ll.] per the traditional Spanish alphabet. These are sometimes known as contractions because two or more code points count as though that sequence were a single code point.

Perl supports only the first of these, not the second and third.

They are all awkward to use, because they must be nested inside an extra set of brackets, as in [[:punct:]] to mean \pP or \p{punct}. You only need extra brackets with Unicode properties when you are selecting one of many, as in [\pL\pN\pM\p{Pc}].
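
To make the bracket nesting concrete in PHP terms, here is a small sketch. It assumes the script file and the subject strings are UTF-8 and that PCRE was built with Unicode support, which is the usual case for PHP.

```php
<?php
// POSIX named class: only valid when nested inside an outer [...] class.
var_dump(preg_match('/^[[:alpha:]]+$/', 'abc'));

// Unicode properties: brackets are needed only to combine several of them.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
var_dump(preg_match('/^[\pL\pN\pM\p{Pc}]+$/u', 'niño_42'));

// A single property needs no brackets at all.
var_dump(preg_match('/^\pL+$/u', 'niño'));
```

All three calls should report a match; the underscore in the second subject is covered by \p{Pc} (connector punctuation).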

The Intent

The other two were an attempt to support locale-specific linguistic elements in a pre‐Unicode environment under legacy 8‑bit locales. For example, to express the traditional Spanish alphabet, which counts acute accents over vowels and diaereses over u’s as the same letter yet which counts a tilde over an n as a different letter altogether, and which furthermore has two digraphs each counting as a distinct letter, you would have to write this in POSIX:

[[=aá=]bc[.ch.]d[=eé=]fgh[=ií=]jkl[.ll.]mnñ[=oó=]pqrst[=uúü=]vwxyz]

You can, and sometimes must, combine these. For example, German phonebooks allow the three i‑mutated vowels to be spelt without diacritics by inserting a following e:

[a[=ä[.ae.]=]bcdefghijklmno[=ö[.oe.]=]pqrs[=ß[.ss.]=]tu[=ü[.ue.]=]vwxyz]

That way, assuming $ES and $DE are those languages’ respective alphabets, you could say something like

[$ES]{4}

and have it match words like guía, niño, llave, and choco in Spanish; or in German have

[$DE]{6}

and have it match words like tschüß or its uppercase undiacriticked equivalent, TSCHUESS.
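
Since PCRE rejects both bracketed forms, the closest thing available in PHP is to spell the digraphs out as alternatives. The sketch below is only an approximation under stated assumptions: ES_LETTER and hasSpanishLetters() are hypothetical names invented here, input is lowercase UTF-8, and no equivalence between accented and unaccented vowels is implied beyond listing both.

```php
<?php
// One "letter" of the traditional Spanish alphabet. The digraphs ch and ll
// are tried first, and the atomic group (?>...) stops the engine from later
// splitting a matched digraph back into two single letters.
const ES_LETTER = '(?>ch|ll|[a-zñáéíóúü])';

// Hypothetical helper: does $word consist of exactly $n such letters?
function hasSpanishLetters(string $word, int $n): bool
{
    return preg_match('/^' . ES_LETTER . '{' . $n . '}$/u', $word) === 1;
}

foreach (['guía', 'niño', 'llave', 'choco', 'perro'] as $word) {
    printf("%s: %s\n", $word, hasSpanishLetters($word, 4) ? 'match' : 'no match');
}
```

The first four words count as four letters each, while perro has five and is rejected.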

The Unicode Way

This is awkward for various reasons, and not just those that are obvious from the two alphabets listed above. It does not admit the notion of combining characters, so you have to add those explicitly for non-normalized text, as in [=e\xE9[.e\x{301}.]=].

Unicode has taken another path in how to implement linguistic elements like this. Fortunately, Unicode regular expressions per UTS#18 do not need to support language features tailored for specific languages or locales until Level 3. This is something no one has yet implemented.

Note that having SS and ß have the same casefold is not considered a locale tailoring. It is the full casefold for that code point no matter the linguistic context. So those are the same when case is ignored. Strange but true. Given that ß is code point U+00DF, we see that these are the same no matter the locale:

$ perl5.14.0 -E 'say "SS" =~ /^\xDF$/i ? "Pass" : "Fail"'
Pass
$ perl5.14.0 -E 'say "\xDF" =~ /^SS$/i ? "Pass" : "Fail"'
Pass

Although locale tailoring for patterns is still beyond us, collation has been implemented, including with locale support, and you can access it from Perl just fine.

However, PHP does not yet support Unicode collation.

References for Unicode collation include:

桜花祭 2024-12-07 22:08:35

[...] is a character class: it matches any one of the characters between the brackets, so you don't have to put | between them. See character classes.

So [abcd] will match a or b or c or d.

If you want to match alternations of more than one character, for example red or blue or yellow, use a sub pattern:

"(red|blue|yellow)"

As you may have guessed, [abcd] is equivalent to (a|b|c|d).

So here is what you could do for your regex:

For

$trenner = "[\040|\n|\t|\r]*";

Write this instead:

$trenner = "[\040\n\t\r]*";
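
A quick way to see the difference is to test both classes against a lone pipe character. This is a small sketch; the variable names are made up for the illustration.

```php
<?php
$withPipes    = '/^[\040|\n|\t|\r]*$/';  // the | characters are literal class members
$withoutPipes = '/^[\040\n\t\r]*$/';     // space, newline, tab, carriage return only

var_dump(preg_match($withPipes, " \t\n"));    // whitespace only
var_dump(preg_match($withPipes, '|'));        // a literal pipe
var_dump(preg_match($withoutPipes, " \t\n")); // whitespace only
var_dump(preg_match($withoutPipes, '|'));     // a literal pipe
```

Both patterns accept plain whitespace, but only the first one also accepts the pipe, which is almost certainly not what the original author intended.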

And for

"[=\"|=\'|=\\\\|=]"

You could do

"(=\"|=\'|=\\\\|=)"

or

"=[\"'\\\\]?"

BTW you could use \s instead of $trenner (see http://www.php.net/manual/en/regexp.reference.escape.php)
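
Putting these suggestions together, the function from the question might end up looking something like the sketch below. Treat it as one possible rewrite rather than the CMS's official fix. It makes three changes that go beyond the advice above and are my own assumptions: the = after href is now mandatory, the quote class keeps the original * quantifier, and the e modifier is dropped because, as far as I know, it only ever applied to preg_replace() and PHP 7 and later reject it.

```php
<?php
function getAllHrefs(string $code): array
{
    $ws = '\s*'; // replaces $trenner

    $pattern = '/<(a' . $ws . '[^>]+href' . $ws . '=["\'\\\\]*' . $ws . ')' // 1: tag start up to the opening quote
             . '([^\'">\s?\\\\]*)'                                         // 2: URL up to an optional "?"
             . '([^"\'\s\\\\>]*)'                                          // 3: query string, if any
             . '(' . $ws . '[^>]*)>/si';                                   // 4: rest of the tag

    preg_match_all($pattern, $code, $allLinks);
    return $allLinks;
}

$links = getAllHrefs('<a class="x" href="http://example.com/page.php?id=5" target="_blank">link</a>');
print_r($links[2]); // URL parts
print_r($links[3]); // query-string parts
```

The capture groups keep the same order as in the original pattern, so code that reads $allLinks[2] and $allLinks[3] should not need to change.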

About the author

玉环
