Converting text from UTF to readable text

Posted 2024-11-16 22:43:44 · 193 characters · 6 views · 0 comments

I have some UTF text starting with "ef bb bf". How can I turn it into human-readable text? vim, gedit, etc. interpret the file as plain text and show all the "ef ..." text even when I force them to read the file with several UTF encodings. I tried the "recode" tool, but it doesn't work. Even PHP's utf8_decode failed to produce the expected text output.

Please help, how can I convert this file so that I can read it?

Comments (2)

转角预定愛 2024-11-23 22:43:44

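ef bb bf is the UTF-8 BOM (byte order mark). Strip off the first three bytes and try to utf8_decode the remainder. Note that utf8_decode converts UTF-8 to ISO-8859-1, so this only helps if the text fits in Latin-1.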

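// The leading \xef\xbb\xbf is the UTF-8 BOM.
$text = "\xef\xbb\xbf....";
// Skip the 3 BOM bytes, then convert the rest from UTF-8 to ISO-8859-1.
echo utf8_decode(substr($text, 3));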
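
The same idea can be sketched in Python, as an illustration rather than a definitive fix. The function names below are hypothetical. The second helper assumes the file literally contains hex digit pairs such as "ef bb bf" as text, which is one possible reading of the question and not something the question confirms.

```python
def decode_utf8_with_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM (EF BB BF) if present."""
    # The 'utf-8-sig' codec skips the BOM when it is there
    # and behaves like plain 'utf-8' otherwise.
    return data.decode("utf-8-sig")


def decode_hex_dump(dump: str) -> str:
    """Hypothetical helper: turn text like 'ef bb bf 68 69' into a string."""
    # bytes.fromhex skips the spaces between the two-digit hex pairs.
    return decode_utf8_with_bom(bytes.fromhex(dump))


raw = b"\xef\xbb\xbfGr\xc3\xbc\xc3\x9fe"
print(decode_utf8_with_bom(raw))          # Grüße
print(decode_hex_dump("ef bb bf 68 69"))  # hi
```

Unlike utf8_decode, this keeps the result as Unicode text, so characters outside Latin-1 are not lost.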

瞄了个咪的 2024-11-23 22:43:44

Is it UTF-8, UTF-16, or UTF-32? It matters a lot! I assume you want to convert the text into old-fashioned ASCII (all characters are 1 byte long).

UTF8 should already be (at least mostly) readable as it uses 1 byte for standard ASCII characters and only uses multiple bytes for special/multilingual characters (Character codes > 127). It sounds like your file isn't UTF8, or you'd already be able to read it! Online content is generally UTF-8.

Unicode character codes are the same as the old ASCII codes up to 127.
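
A small Python sketch of that ASCII compatibility, for illustration only. The helper name `utf8_length` is made up for this example.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€"]:
    # ord() gives the Unicode code point; for "A" it equals the ASCII code 65.
    print(ch, ord(ch), utf8_length(ch), ch.encode("utf-8").hex())

# For pure ASCII text the ASCII and UTF-8 byte sequences are identical.
print("plain text".encode("ascii") == "plain text".encode("utf-8"))
```

Here "A" takes one byte, while "é" and "€" take two and three bytes respectively.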

UTF-16 and UTF-32 use at least 2 bytes and exactly 4 bytes per character respectively, even for characters that fit in a single byte in UTF-8 (UTF-16 needs 4 bytes for characters outside the Basic Multilingual Plane). That makes the text unreadable if the text editor is expecting UTF-8.
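
To make the size difference concrete, here is a rough Python sketch. The helper `encoded_sizes` is a hypothetical name, and the little-endian codec variants are chosen only so that no BOM is added to the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("hi"))  # {'utf-8': 2, 'utf-16': 4, 'utf-32': 8}

# Reading UTF-16 bytes as if they were UTF-8 leaves a NUL after each letter,
# which is what many editors render as garbage.
misread = "hi".encode("utf-16-le").decode("utf-8")
print(repr(misread))  # 'h\x00i\x00'
```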

Gedit supports UTF-16 and UTF-32, but you need to 'add' those encodings explicitly in the Open dialog box (and possibly select them explicitly instead of relying on auto-detect).
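
If it is unclear which of these encodings a file uses, one option is to look at its first bytes for a BOM. The sketch below is a simplified illustration, not how gedit's auto-detection works. The names `sniff_encoding` and `read_text` are hypothetical, and a file without a BOM simply falls back to the assumed default of UTF-8.

```python
import codecs

# Check the UTF-32 BOMs first: the UTF-32 LE BOM begins with the
# same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess a codec name from a leading BOM, or return the default."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


def read_text(data: bytes) -> str:
    """Decode bytes with the sniffed codec; these codecs strip the BOM."""
    return data.decode(sniff_encoding(data))


print(sniff_encoding(b"\xef\xbb\xbfhi"))  # utf-8-sig
print(read_text("hi".encode("utf-16")))   # hi
```

The bytes would typically come from `open(path, "rb").read()`, with `path` being whatever file is in question.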
