How can I detect ISO-8859-8 and UTF-8 Hebrew characters in a string using PHP?
I want to be able to detect (using regular expressions) whether a string contains Hebrew characters, in both UTF-8 and ISO-8859-8, in PHP. Thanks!
According to the ISO-8859-8 character set map, the byte range E0-FA appears to be reserved for Hebrew. You could check for those bytes with a character class: [\xE0-\xFA]
For UTF-8, the Unicode range reserved for Hebrew appears to be U+0591 to U+05F4. In PHP that range is written with \x{...} escapes and needs the /u modifier: [\x{0591}-\x{05F4}]
Either character class can be passed to preg_match() in PHP.
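As a minimal sketch of the two checks described above, assuming PHP 7 or later: the function names are hypothetical, and the test strings are written with escapes so the result does not depend on how the source file itself is encoded.

```php
<?php
// Hypothetical helper names; the ranges are the ones quoted in the answer.

// ISO-8859-8: Hebrew letters are the single bytes 0xE0-0xFA.
// Without the /u modifier the pattern is matched byte by byte.
function hasHebrewIso88598(string $s): bool
{
    return preg_match('/[\xE0-\xFA]/', $s) === 1;
}

// UTF-8: with the /u modifier, \x{...} refers to Unicode code points.
// preg_match() returns false on malformed UTF-8, which also yields false here.
function hasHebrewUtf8(string $s): bool
{
    return preg_match('/[\x{0591}-\x{05F4}]/u', $s) === 1;
}

$utf8 = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}"; // "shalom" encoded as UTF-8
$iso  = "\xF9\xEC\xE5\xED";                 // the same word as ISO-8859-8 bytes

var_dump(hasHebrewUtf8($utf8));     // bool(true)
var_dump(hasHebrewIso88598($iso));  // bool(true)
var_dump(hasHebrewUtf8('hello'));   // bool(false)
```

Note that the byte-range check only makes sense on input already known to be ISO-8859-8: bytes in E0-FA also occur as lead bytes of non-Hebrew UTF-8 sequences.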
If your PHP file is encoded in UTF-8, as it should be when it contains Hebrew text, you should use a UTF-8-aware regex, that is, one with the /u modifier.
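One UTF-8-aware pattern that fits this description is the Unicode script property \p{Hebrew}. This is an assumption chosen for illustration, not necessarily the regex this answer had in mind, and the function name is hypothetical.

```php
<?php
// Assumption: \p{Hebrew} (a Unicode script property) stands in for the
// answer's regex. The /u modifier makes PCRE treat both the pattern and
// the subject as UTF-8.
function containsHebrewScript(string $s): bool
{
    return preg_match('/\p{Hebrew}/u', $s) === 1;
}

var_dump(containsHebrewScript("abc \u{05D0}\u{05D1}\u{05D2}")); // bool(true)
var_dump(containsHebrewScript('abc'));                          // bool(false)
```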
A small function can check whether the first character in a string is Hebrew.
Good luck :)
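A function along those lines might look like the sketch below. It assumes UTF-8 input and reuses the U+0591 to U+05F4 range quoted earlier in the thread; the name startsWithHebrew is hypothetical.

```php
<?php
// Hypothetical helper: true if the first character of a UTF-8 string
// falls in the Hebrew range U+0591-U+05F4.
function startsWithHebrew(string $s): bool
{
    // \A anchors at the very start of the subject; /u enables UTF-8 mode.
    return preg_match('/\A[\x{0591}-\x{05F4}]/u', $s) === 1;
}

var_dump(startsWithHebrew("\u{05D0}bc")); // bool(true)
var_dump(startsWithHebrew("a\u{05D0}"));  // bool(false)
```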
First, such a string would be completely useless: a mix of two different character sets?
Both the Hebrew characters in ISO-8859-8 and each byte of a multibyte sequence in UTF-8 have a value ord($char) > 127. So what I would do is find all bytes with a value greater than 127, and then check whether they make sense as ISO-8859-8, or whether they would make more sense as a UTF-8 sequence...
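The idea above can be sketched as follows. This is one possible heuristic rather than a definitive detector: the function name and its return labels are hypothetical, and it assumes the whole string uses a single encoding.

```php
<?php
// Hypothetical helper: guesses how Hebrew text in $s is encoded.
// Returns 'ascii', 'utf-8', 'iso-8859-8', or 'unknown'.
function guessHebrewEncoding(string $s): string
{
    // No byte above 127: plain ASCII, so no Hebrew in either encoding.
    if (preg_match('/[\x80-\xFF]/', $s) !== 1) {
        return 'ascii';
    }
    // An empty pattern with /u matches any well-formed UTF-8 string;
    // preg_match() returns false instead when the input is malformed.
    if (preg_match('//u', $s) === 1) {
        return preg_match('/[\x{0591}-\x{05F4}]/u', $s) === 1 ? 'utf-8' : 'unknown';
    }
    // Not valid UTF-8: see whether the high bytes fit the ISO-8859-8 range.
    return preg_match('/[\xE0-\xFA]/', $s) === 1 ? 'iso-8859-8' : 'unknown';
}

var_dump(guessHebrewEncoding("\u{05E9}\u{05DC}\u{05D5}\u{05DD}")); // string(5) "utf-8"
var_dump(guessHebrewEncoding("\xF9\xEC\xE5\xED"));                 // string(10) "iso-8859-8"
```

Checking UTF-8 validity first works because the ISO-8859-8 Hebrew bytes E0-FA can only start a UTF-8 sequence and must be followed by continuation bytes in 80-BF, so a run of ISO-8859-8 Hebrew letters is never well-formed UTF-8.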