How can I detect ISO-8859-8 and UTF-8 Hebrew characters in a string using PHP?

Posted on 2024-08-11 00:13:24

I want to detect, using regular expressions in PHP, whether a string contains Hebrew characters encoded as either UTF-8 or ISO-8859-8. Thanks!

Comments (5)

半﹌身腐败 2024-08-18 00:13:24

In the ISO-8859-8 character set, the range E0 to FA is used for the Hebrew letters. You could check for those bytes with a character class:

[\xE0-\xFA]

In Unicode, the range reserved for Hebrew appears to be U+0591 to U+05F4. PCRE does not support the \u escape, so for a UTF-8 string write the code points as \x{...} and add the u modifier:

[\x{0591}-\x{05F4}]

Here's an example of a regex match in PHP:

echo preg_match('/[\x{0591}-\x{05F4}]/u', $string);
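The two character classes above apply to different encodings, so it may help to keep them in separate checks. The following is a minimal sketch, not the answerer's code: the function names `containsHebrewIso88598` and `containsHebrewUtf8` are hypothetical, and the UTF-8 check uses PCRE's `\x{...}` code point syntax with the `u` modifier.

```php
<?php
// Hypothetical helper names; they are illustrative and not from the original answer.

// ISO-8859-8: Hebrew letters are the single bytes 0xE0-0xFA.
// There is no u modifier, so the pattern is matched against raw bytes.
function containsHebrewIso88598(string $bytes): bool
{
    return preg_match('/[\xE0-\xFA]/', $bytes) === 1;
}

// UTF-8: with the u modifier, \x{...} refers to Unicode code points.
// preg_match returns false for invalid UTF-8, which is reported here as "no match".
function containsHebrewUtf8(string $text): bool
{
    return preg_match('/[\x{0591}-\x{05F4}]/u', $text) === 1;
}

// The word "shalom" in each encoding.
$iso  = "\xF9\xEC\xE5\xED";
$utf8 = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}";

var_dump(containsHebrewIso88598($iso)); // bool(true)
var_dump(containsHebrewUtf8($utf8));    // bool(true)
var_dump(containsHebrewUtf8('hello'));  // bool(false)
```

Note that the byte class is only meaningful when the input is already known to be ISO-8859-8: bytes in the E0 to FA range also occur as lead bytes of unrelated UTF-8 sequences.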
感情旳空白 2024-08-18 00:13:24

If your PHP file is encoded in UTF-8, as it should be when it contains Hebrew text, you can use the following regex:

$string = "אבהג";
echo preg_match("/\p{Hebrew}/u", $string);
// output: 1
各空 2024-08-18 00:13:24

Here's a small function that checks whether the first character of a UTF-8 string is a Hebrew letter:

function IsStringStartsWithHebrew($string)
{
    return (strlen($string) > 1 && // a Hebrew letter takes two bytes in UTF-8
        ord($string[0]) == 215 && // first byte is 0xD7 (binary 110-10111)
        ord($string[1]) >= 144 && ord($string[1]) <= 170 // second byte 0x90-0xAA covers U+05D0 to U+05EA
        );
}

Good luck :)
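To see where the constants 215, 144 and 170 come from, one can print the UTF-8 bytes of the first and last Hebrew letters, alef (U+05D0) and tav (U+05EA). This is only a small sketch; the helper name `utf8ByteValues` is hypothetical.

```php
<?php
// Hypothetical helper: returns the byte values of a string as a plain list.
function utf8ByteValues(string $s): array
{
    // unpack() indexes its result from 1, so reindex from 0.
    return array_values(unpack('C*', $s));
}

print_r(utf8ByteValues("\u{05D0}")); // alef: 215, 144
print_r(utf8ByteValues("\u{05EA}")); // tav:  215, 170
```

Both letters share the lead byte 215 (0xD7), and the second byte runs from 144 to 170, which matches the bounds checked in the function above.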

凉宸 2024-08-18 00:13:24

First, such a string would be completely useless. A mix of two different character sets?

Both the Hebrew characters in ISO-8859-8 and every byte of a multibyte UTF-8 sequence have a value ord($char) > 127. So what I would do is find all bytes with a value greater than 127, and then check whether they make sense as ISO-8859-8 or make more sense as a UTF-8 sequence.
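One possible reading of that idea, sketched under assumptions: look for bytes above 127, treat the string as UTF-8 if it validates as UTF-8, and otherwise fall back to the ISO-8859-8 letter range. The function name `detectHebrewEncoding` and its return values are hypothetical, and the result is a heuristic rather than a reliable encoding detector.

```php
<?php
// Hypothetical helper; the name and return values are illustrative assumptions.
// Returns 'utf-8', 'iso-8859-8', or null when no Hebrew is detected.
function detectHebrewEncoding(string $bytes): ?string
{
    // Pure ASCII cannot contain Hebrew in either encoding.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return null;
    }

    // An empty pattern with the u modifier matches only if the subject is valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return preg_match('/\p{Hebrew}/u', $bytes) === 1 ? 'utf-8' : null;
    }

    // Not valid UTF-8: check for ISO-8859-8 Hebrew letter bytes (0xE0-0xFA).
    return preg_match('/[\xE0-\xFA]/', $bytes) === 1 ? 'iso-8859-8' : null;
}

var_dump(detectHebrewEncoding("\u{05E9}\u{05DC}\u{05D5}\u{05DD}")); // string(5) "utf-8"
var_dump(detectHebrewEncoding("\xF9\xEC\xE5\xED"));                 // string(10) "iso-8859-8"
var_dump(detectHebrewEncoding('hello'));                            // NULL
```

The fallback branch cannot tell ISO-8859-8 apart from other single-byte encodings that use the same byte values, so it only makes sense when those two encodings are the only candidates.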

戏蝶舞 2024-08-18 00:13:24
function is_hebrew($string)
{
    return preg_match("/\p{Hebrew}/u", $string);
}