Detecting the Unicode unknown character "�" in PHP

Posted 2024-10-09 14:21:54 · 286 characters · 14 views · 0 comments

Is there any way in PHP of detecting the following character �?

I'm currently fixing a number of UTF-8 encoding issues with a few different algorithms and need to be able to detect if � is present in a string. How do I do so with strpos?

Simply pasting the character into my codebase does not seem to work.

if (strpos($names['decode'], '?') !== false || strpos($names['decode'], '�') !== false)

一梦等七年七年为一梦 2024-10-16 14:21:54

Converting a string from UTF-8 to UTF-8 with iconv() and the //IGNORE flag produces a result in which invalid UTF-8 byte sequences are dropped.

Therefore, you can detect a broken character by comparing the length of the string before and after the iconv operation. If the lengths differ, the original string contained a broken character.

Test case (make sure you save the file as UTF-8):

    <?php

    header("Content-type: text/html; charset=utf-8");

    $teststring = "Düsseldorf";

    // Deliberately create a broken string by re-encoding the original
    // as ISO-8859-1. Note: utf8_decode() is deprecated as of PHP 8.2;
    // mb_convert_encoding($teststring, "ISO-8859-1", "UTF-8") does the same.
    $teststring_broken = utf8_decode($teststring);

    echo "Broken string: " . $teststring_broken;
    echo "<br>";

    // //IGNORE drops byte sequences that are not valid UTF-8
    $teststring_converted = iconv("UTF-8", "UTF-8//IGNORE", $teststring_broken);

    echo $teststring_converted;
    echo "<br>";

    // A shorter result means something was dropped
    if (strlen($teststring_converted) != strlen($teststring_broken)) {
        echo "The string contained an invalid character";
    }

In theory, you could drop //IGNORE and simply test for a failed (empty) iconv operation, but there might be reasons for an iconv call to fail other than invalid characters... I don't know. I would use the comparison method.
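If the concern is that iconv might fail for unrelated reasons, one alternative is to validate the string directly. The following is a minimal sketch, not part of the original answer: it swaps in `mb_check_encoding()` from the mbstring extension (assumed to be installed) in place of the iconv length comparison, and the helper name `hasInvalidUtf8` is hypothetical.

```php
<?php
// Hypothetical helper, for illustration only.
// Returns true when $s contains bytes that are not valid UTF-8.
function hasInvalidUtf8(string $s): bool
{
    // mb_check_encoding() validates the whole string against the named encoding.
    return !mb_check_encoding($s, 'UTF-8');
}

$good = "D\xC3\xBCsseldorf"; // "Düsseldorf" encoded as UTF-8
$bad  = "D\xFCsseldorf";     // the same word encoded as ISO-8859-1

var_dump(hasInvalidUtf8($good)); // bool(false)
var_dump(hasInvalidUtf8($bad));  // bool(true)
```

Writing the test strings as byte escapes keeps the example independent of the encoding the source file happens to be saved in.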

成熟稳重的好男人 2024-10-16 14:21:54

Here is what I do to detect and correct the encoding of strings not encoded in UTF-8 when that is what I am expecting:

    // Strict detection; mb_detect_encoding() returns false if nothing matches
    $encoding = mb_detect_encoding($str, 'utf-8, iso-8859-1, ascii', true);
    if ($encoding !== false && strcasecmp($encoding, 'UTF-8') !== 0) {
      $str = iconv($encoding, 'utf-8', $str);
    }
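As a usage illustration, the detect-then-convert idea can be wrapped in a function. This is only a sketch under assumptions: the name `toUtf8` is hypothetical, the mbstring and iconv extensions are assumed to be available, and the candidate list is reduced to UTF-8 and ISO-8859-1. For input that is valid in more than one candidate encoding, which one `mb_detect_encoding()` picks can depend on the PHP version, so the example only exercises unambiguous input.

```php
<?php
// Hypothetical wrapper, for illustration only.
function toUtf8(string $str): string
{
    // Strict mode rejects candidates for which $str contains invalid bytes.
    $encoding = mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1'], true);

    // Nothing detected, or already UTF-8: return the input unchanged.
    if ($encoding === false || strcasecmp($encoding, 'UTF-8') === 0) {
        return $str;
    }

    $converted = iconv($encoding, 'UTF-8', $str);

    // Fall back to the original bytes if the conversion itself fails.
    return $converted === false ? $str : $converted;
}

$latin1 = "D\xFCsseldorf"; // "Düsseldorf" encoded as ISO-8859-1

var_dump(toUtf8($latin1) === "D\xC3\xBCsseldorf"); // bool(true)
```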

森末i 2024-10-16 14:21:54

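As far as I know, that question mark symbol is not a single character. There are many different character codes in the standard font sets that are not mapped to a symbol, and this is the default symbol used for them. To detect it in PHP, you would first need to know which font you are using. Then you would need to look at the font implementation, see which ranges of codes map to the "?" symbol, and check whether the given character falls within one of those ranges.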
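Separately from how fonts render missing glyphs, the � character shown in the question is itself a single code point, U+FFFD (REPLACEMENT CHARACTER), which is the three bytes EF BF BD in UTF-8. When a string actually contains that code point, it can be found with strpos by spelling out the bytes. This is a minimal sketch assuming the string is UTF-8; the constant and function names are hypothetical. It does not catch raw invalid bytes that a browser merely displays as �.

```php
<?php
// U+FFFD REPLACEMENT CHARACTER encoded as UTF-8.
const REPLACEMENT_CHAR = "\xEF\xBF\xBD";

// Hypothetical helper name, for illustration only.
function containsReplacementChar(string $s): bool
{
    // strpos() compares bytes, so the escape sequence matches regardless
    // of the encoding the PHP source file is saved in.
    return strpos($s, REPLACEMENT_CHAR) !== false;
}

var_dump(containsReplacementChar("D\xEF\xBF\xBDsseldorf")); // bool(true)
var_dump(containsReplacementChar("Dusseldorf"));            // bool(false)
```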

菊凝晚露 2024-10-16 14:21:54

I use a custom method (built on str_replace) to sanitize undefined characters:

    $text = 'a³';

    // Protect line breaks with placeholders: FILTER_FLAG_STRIP_LOW removes
    // bytes below 32, which would include "\n".
    $text = str_replace("\n\n", "sample000", $text);
    $text = str_replace("\n",   "sample111", $text);

    $text = filter_var($text, FILTER_SANITIZE_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);

    // Restore the line breaks as HTML
    $text = str_replace("sample000", "<br/><br/>", $text);
    $text = str_replace("sample111", "<br/>",      $text);

    echo $text; //outputs ------------>   a3
