How to detect whether a string has already been decoded

Posted 2024-11-29 00:48:58

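I'm chasing a bug in some Perl code that appears to be essentially a variant of a known problem.

In short, under certain conditions Encode::decode('utf8', $string) is called twice on the same string, and hilarity ensues. The best solution would be to find the conditions that lead to the double decoding and prevent them. Unfortunately, this is mature production code in a feature-rich product; finding those conditions and fixing them without introducing regressions looks challenging.

Is there a quick and reliable way to detect whether a string has already been decoded from UTF-8? Inserting an "if" statement before those calls feels a little hacky, but it should be a very safe fix.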


白首有我共你 2024-12-06 00:48:58

It's impossible to reliably detect whether a scalar contains a decoded string. There's no way to communicate that information to Perl, so there's no way for Perl to communicate it back to you. At best, one can guess. There are some heuristics you could use, listed here from most reliable to least:

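If the string contains characters above 255, it's not encoded, because an encoded string holds only bytes. This is exactly what causes the "wide character" warning/error.
```
# Match against $s itself (a bare regex would test $_ instead).
utf8::encode($s) if $s =~ /[^\x00-\xFF]/;
```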
If the encoding in question is UTF-8 and the scalar contains valid UTF-8, it's probably still encoded.
If the encoding in question is UTF-8 and the scalar does not contain valid UTF-8, it's probably already decoded.
```
utf8::encode($s) if !utf8::decode(my $tmp = $s);
```
If the scalar's UTF8 flag is on, then the string is probably decoded.
If the scalar's UTF8 flag is off, then the string is probably not decoded.
```
utf8::encode($s) if utf8::is_utf8($s);
```
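For the question's actual goal, guarding a decode call rather than an encode call, the first two heuristics can be combined into a small helper. This is only a sketch: `decode_utf8_once` is a hypothetical name, not something from the original code base or from any module, and it deliberately ignores the UTF8 flag because that is the least reliable signal. It can still guess wrong: a decoded string whose characters all fall below 256 and happen to form valid UTF-8 (for example "Ã©") would be decoded a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: return a decoded copy of $s, decoding from UTF-8
# only when the heuristics suggest that $s still holds encoded bytes.
sub decode_utf8_once {
    my ($s) = @_;
    return $s if !defined $s;

    # Heuristic 1: a character above 255 cannot be a byte, so the string
    # must already be decoded. Leave it alone.
    return $s if $s =~ /[^\x00-\xFF]/;

    # Heuristic 2: utf8::decode works in place and returns false when the
    # content is not well-formed UTF-8; in that case $s keeps its value,
    # which we take to mean "probably decoded already".
    utf8::decode($s);
    return $s;
}

my $bytes = "\xC3\xA9";                      # UTF-8 encoding of U+00E9
my $once  = decode_utf8_once($bytes);        # decoded to one character
my $twice = decode_utf8_once($once);         # second call changes nothing
```

Because the helper works on a copy and returns the result, it can be dropped in where the existing decode calls are, at the cost of the misdetection risk described above.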

You should decode all your inputs and encode all your outputs.
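A minimal sketch of that discipline, assuming the data is UTF-8 and using the core Encode module. The variable names and the sample string are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a file, socket, or database driver.
my $input_bytes = "caf\xC3\xA9";

# Decode exactly once, at the point where the data enters the program.
my $text = decode('UTF-8', $input_bytes);

# Inside the program, work only with decoded text (characters, not bytes).
my $char_count = length $text;               # counts characters
my $byte_count = length $input_bytes;        # counts bytes

# Encode exactly once, at the point where the data leaves the program.
my $output_bytes = encode('UTF-8', $text);
```

When every string inside the program is known to be decoded, there is nothing left to guess and the heuristics become unnecessary. For file handles, the same effect is commonly achieved by opening with an `:encoding(UTF-8)` layer, so the decode and encode steps happen at the I/O boundary.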
