Identifying and removing invisible characters (%E2%80%8E) from a string in PHP

Posted 2025-01-11 00:33:01 · 572 words · 0 views · 0 comments

I have strings in PHP which I read from a database. The strings are URLs and at first glance they look good, but there seems to be some weird character at the end. In the address bar of the browser, the string '%E2%80%8E' gets appended to the URL, which breaks the URL.

I found this post on stripping the left-to-right-mark from a string in PHP and it seems related to my problem, but the solution does not work for me because my characters seem to be something else.

So how can I determine which character I have so I can remove it from the strings?

(I would post one of the URLs here as an example, but the Stack Overflow form strips the trailing character as soon as I paste it in.)

I know that I could allow only certain characters in the string and discard all others. But I would still like to know which character it is, and how it gets into the database.

EDIT: The question has been answered and the code given in the accepted answer works for me:

$str = preg_replace('/\p{C}+/u', "", $str);

Comments (1)

你列表最软的妹 2025-01-18 00:33:01

If the input is UTF-8 encoded, you can use a Unicode regex to match or strip invisible control characters such as e2808e (the left-to-right mark). Use the u (PCRE_UTF8) modifier together with \p{C}, the Unicode general category "Other".
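To connect the %E2%80%8E seen in the address bar with the bytes e2808e mentioned above, here is a minimal sketch. The URL is a made-up example, and the snippet assumes the stray character is the last thing in the string:

```php
<?php
// Hypothetical URL for illustration, ending in an invisible left-to-right mark.
$url = "http://example.com/page\xE2\x80\x8E";

// The last three bytes are the UTF-8 encoding of the mark.
$tail = substr($url, -3);

// Percent-encoding those bytes reproduces what the browser shows.
echo rawurlencode($tail), "\n";
echo bin2hex($tail), "\n";

// With the u modifier, \p{C} matches the three bytes as one character.
$hasTrailingInvisible = preg_match('/\p{C}$/u', $url) === 1;
echo $hasTrailingInvisible ? "invisible character at end\n" : "clean\n";
```

The browser percent-encodes each byte separately, which is why a single invisible character shows up as three %XX groups.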

Strip out all invisibles:

$str = preg_replace('/\p{C}+/u', "", $str);

Here is a list of \p{Other}
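Note that \p{C} also covers ordinary control characters such as tabs and newlines. For URLs that is usually harmless, but for multi-line text a narrower class may be preferable. The sketch below is a variation, not the accepted answer's code: it removes only format characters (\p{Cf}), and the function name strip_format_chars is a hypothetical helper.

```php
<?php
// Hypothetical helper; not part of PHP or of the original answer.
function strip_format_chars(string $s): string
{
    // \p{Cf} matches format characters only (left-to-right mark,
    // right-to-left mark, zero-width space, byte order mark, ...).
    // preg_replace returns null on invalid UTF-8, so fall back to the input.
    return preg_replace('/\p{Cf}+/u', '', $s) ?? $s;
}

$in = "line one\nhttp://example.com/\xE2\x80\x8E";

// Keeps the newline, drops the left-to-right mark.
$narrow = strip_format_chars($in);

// For comparison: \p{C} removes the newline as well.
$broad = preg_replace('/\p{C}+/u', '', $in);
```

Which of the two fits depends on whether line breaks in the stored strings carry meaning.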


Detect/identify invisibles:

$str = ".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F";

// Find each invisible character together with its byte offset
if (preg_match_all('/\p{C}/u', $str, $out, PREG_OFFSET_CAPTURE)) {
  echo "<pre>\n";
  foreach ($out[0] as $v) {
    // $v[0] is the matched character, $v[1] its byte offset in $str
    echo "detected ".bin2hex($v[0])." @ offset ".$v[1]."\n";
  }
  echo "</pre>";
}

outputs:

detected e2808e @ offset 1
detected e2808b @ offset 5
detected e2808f @ offset 9

Test on eval.in

To identify a character, search Google for its hex bytes, for example restricted to fileformat.info:

@google: site:fileformat.info e2808e
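As an alternative to searching for the raw bytes, the Unicode code point can be computed directly, which is often easier to look up. This is a sketch that assumes the mbstring extension is available (mb_ord requires PHP 7.2 or later); the function name list_invisible_code_points is hypothetical.

```php
<?php
// Hypothetical helper; assumes the mbstring extension (mb_ord, PHP 7.2+).
function list_invisible_code_points(string $s): array
{
    $found = [];
    if (preg_match_all('/\p{C}/u', $s, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as [$char, $offset]) {
            // Format the code point the way Unicode charts write it, e.g. U+200E.
            $found[$offset] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
        }
    }
    return $found; // byte offset => code point label
}

$points = list_invisible_code_points(".\xE2\x80\x8E.\xE2\x80\x8B.\xE2\x80\x8F");
print_r($points);
```

For the sample string this maps offset 1 to U+200E (left-to-right mark), offset 5 to U+200B (zero-width space), and offset 9 to U+200F (right-to-left mark).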
