How to remove %EF%BB%BF from a PHP string

Posted 2024-09-30 01:41:29 · 462 characters · 3 views · 0 comments

I am trying to use the Microsoft Bing API.

$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

The returned data has a ' ' character as the first character of the string. It is not a space, because I trimmed the string before returning the data.

The ' ' character turned out to be %EF%BB%BF.

I wonder why this happened, maybe a bug from Microsoft?

How can I remove this %EF%BB%BF in PHP?
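
One way to confirm what the leading character is would be to dump the first bytes of the string. The sketch below uses a hard-coded sample in place of the real API response (an assumption, since the actual response body is not shown in the question). It illustrates that `trim()` leaves the three BOM bytes in place and that `urlencode()` renders them as `%EF%BB%BF`.

```php
<?php
// Hypothetical sample standing in for the API response:
// a UTF-8 BOM followed by some text.
$data = "\xEF\xBB\xBF" . '"http://example.com/audio.wav"';

// trim() only strips ASCII whitespace and NUL by default, so the BOM survives.
$trimmed = trim($data);

// Inspect the first three bytes as hex and as percent-encoding.
$firstBytesHex = bin2hex(substr($trimmed, 0, 3));   // "efbbbf"
$firstBytesUrl = urlencode(substr($trimmed, 0, 3)); // "%EF%BB%BF"

echo $firstBytesHex, "\n", $firstBytesUrl, "\n";
```

If the hex dump shows `efbbbf`, the string holds the raw BOM bytes rather than the literal text `%EF%BB%BF`, which matters when choosing how to remove it.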


Comments (7)

≈。彩虹 2024-10-07 01:41:29

You should not simply discard the BOM unless you're 100% sure that the stream will: (a) always be UTF-8, and (b) always have a UTF-8 BOM.

The reasons:

  1. In UTF-8, a BOM is optional - so if the service stops sending it at some future point, you'll be throwing away the first three characters of your response instead.
  2. The whole purpose of the BOM is to identify unambiguously which UTF encoding the stream uses (UTF-8, UTF-16, or UTF-32), and also to indicate the 'endianness' (byte order) of the encoded information. If you just throw it away, you're assuming that you're always getting UTF-8; this may not be a very good assumption.
  3. Not all BOMs are three bytes long; only the UTF-8 one is. The UTF-16 BOM is two bytes and the UTF-32 BOM is four bytes. So if the service switches to a wider UTF encoding in the future, your code will break.

I think a more appropriate way to handle this would be something like:

/* Detect the encoding, then convert from detected encoding to ASCII */
$enc = mb_detect_encoding($data);
$data = mb_convert_encoding($data, "ASCII", $enc);
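
The three points above can also be handled by checking for each known BOM signature explicitly and stripping only what is actually present. This is a minimal sketch, not the answerer's code: `stripBom()` is a hypothetical helper name, and converting to UTF-8 (rather than ASCII) is an assumption made here so that non-ASCII text such as Japanese survives.

```php
<?php
/**
 * Strip a leading Unicode BOM, if present, and report which encoding it signalled.
 * stripBom() is a hypothetical helper, not part of PHP.
 *
 * @return array{0: string, 1: ?string} [data without BOM, encoding name or null]
 */
function stripBom(string $data): array
{
    // Longer signatures first: the UTF-32LE BOM begins with the UTF-16LE BOM.
    $signatures = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];

    foreach ($signatures as $bom => $encoding) {
        if (strncmp($data, $bom, strlen($bom)) === 0) {
            return [substr($data, strlen($bom)), $encoding];
        }
    }

    return [$data, null]; // no BOM: leave the data untouched
}

[$body, $encoding] = stripBom("\xEF\xBB\xBFhello");

// Convert to UTF-8 only when the BOM signalled a different encoding.
if ($encoding !== null && $encoding !== 'UTF-8') {
    $body = mb_convert_encoding($body, 'UTF-8', $encoding);
}

echo $encoding, ': ', $body, "\n";
```

Because the helper returns the data unchanged when no BOM is found, it keeps working if the service stops sending one.
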
裂开嘴轻声笑有多痛 2024-10-07 01:41:29

$data = file_get_contents("http://api.microsofttranslator.com/V2/Ajax.svc/Speak?appId=APPID&text={$text}&language=ja&format=audio/wav");
$data = stripslashes(trim($data));

// Strip the UTF-8 BOM only if the string actually starts with it
if (substr($data, 0, 3) === "\xef\xbb\xbf") {
    $data = substr($data, 3);
}

花开浅夏 2024-10-07 01:41:29

It's a byte order mark (BOM), indicating the response is encoded as UTF-8. You can safely remove it, but you should parse the remainder as UTF-8.
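
As a rough illustration of "remove it, then parse the remainder as UTF-8", the sketch below strips the BOM when present and then uses the mbstring functions with an explicit `'UTF-8'` argument. The sample string is hypothetical, and the example assumes the mbstring extension is available and the source file is saved as UTF-8.

```php
<?php
// Hypothetical response: a UTF-8 BOM followed by Japanese text.
$data = "\xEF\xBB\xBF" . "こんにちは";

// Remove the BOM only if it is actually there.
if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
    $data = substr($data, 3);
}

// Treat the remainder as UTF-8: validate it and count characters, not bytes.
$isValidUtf8 = mb_check_encoding($data, 'UTF-8');
$charCount   = mb_strlen($data, 'UTF-8'); // 5 characters
$byteCount   = strlen($data);             // 15 bytes

var_dump($isValidUtf8, $charCount, $byteCount);
```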

oО清风挽发oО 2024-10-07 01:41:29

I had the same problem today, and fixed it by making sure the string was encoded as UTF-8:

http://php.net/manual/en/function.utf8-encode.php

// Note: utf8_encode() assumes ISO-8859-1 input and is deprecated as of PHP 8.2
$content = utf8_encode($content);

忘东忘西忘不掉你 2024-10-07 01:41:29

To remove it from the beginning of the string (only):

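// if the string is percent-encoded
$data = preg_replace('/^%EF%BB%BF/', '', $data);
// if the string holds the raw BOM bytes
$data = preg_replace('/^\xEF\xBB\xBF/', '', $data);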
甜扑 2024-10-07 01:41:29

// removes every occurrence of the percent-encoded BOM, not only a leading one
$data = str_replace('%EF%BB%BF', '', $data);

You probably shouldn't be using stripslashes. Unless the API returns backslash-escaped data (and there is a 99.99% chance it doesn't), take that call out.
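
To make the `stripslashes` concern concrete, here is a small sketch with a hypothetical payload that legitimately contains backslashes. It is not taken from the API in question; it only illustrates how the call can silently alter data that was never escaped.

```php
<?php
// Hypothetical payload that legitimately contains backslashes.
$raw = 'C:\\audio\\clip.wav';   // the string C:\audio\clip.wav

// stripslashes() removes the backslashes, changing the data.
$stripped = stripslashes($raw); // C:audioclip.wav

echo $raw, "\n", $stripped, "\n";
```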

时光病人 2024-10-07 01:41:29

You could use substr to only get the rest without the UTF-8 BOM:

// if it's binary UTF-8, the BOM is 3 bytes
$data = substr($data, 3);
// or, if it's percent-encoded UTF-8, "%EF%BB%BF" is 9 characters
$data = substr($data, 9);