Read an ANSI file and convert it to a UTF-8 string

Posted on 2024-10-10 04:13:03 · 87 words · 4 views · 0 comments

Is there any way to do that with PHP?

The data to be inserted looks fine when I print it out.

But when I insert it in the database the field becomes empty.

Comments (3)

赤濁 2024-10-17 04:13:03

$tmp = iconv('YOUR CURRENT CHARSET', 'UTF-8', $string);

or

$tmp = utf8_encode($string);

The strange thing is that you end up with an empty string in your DB. I could understand ending up with some garbage in your DB, but nothing at all (an empty string) is strange.

I just typed this in my console:

iconv -l | grep -i ansi

It showed me:

ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
MS-ANSI

These are possible values for YOUR CURRENT CHARSET.

As pointed out before, when your input string contains only characters that are already valid UTF-8 (plain ASCII, for example), you don't need to convert anything.

Change UTF-8 to UTF-8//TRANSLIT when you don't want to omit chars but replace them with a look-alike (when they are not in the UTF-8 set).
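
As a rough illustration of the two calls above, here is a minimal sketch that assumes the "ANSI" bytes are Windows-1252 (`CP1252`); that charset and the sample string are assumptions, not something the question states. Since utf8_encode() always treats its input as ISO-8859-1 and is deprecated as of PHP 8.2, the sketch uses the equivalent mb_convert_encoding() call in its place.

```php
<?php
// Assumption: the source bytes are Windows-1252 ("ANSI" on a Western European Windows).
$ansi = "caf\xE9 \x80 5"; // "café € 5" as CP1252 bytes

// Convert with the real source charset.
$viaIconv = iconv('CP1252', 'UTF-8', $ansi);

// What utf8_encode() would do: treat every byte as ISO-8859-1.
$viaLatin1 = mb_convert_encoding($ansi, 'UTF-8', 'ISO-8859-1');

// Print the resulting bytes so the difference is visible.
echo bin2hex($viaIconv), "\n";
echo bin2hex($viaLatin1), "\n";
```

The two results agree on "é" but differ on byte 0x80: CP1252 maps it to the euro sign, while ISO-8859-1 maps it to an invisible control character. That is why naming the right source charset matters.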

清眉祭 2024-10-17 04:13:03

"ANSI" is not really a charset. It's a short way of saying "whatever charset is the default on the computer that created the data". So you have a double task:

  1. Find out what charset the data is using.
  2. Use an appropriate function to convert into UTF-8.

For #2, I'm normally happy with iconv() but utf8_encode() can also do the job if source data happens to use ISO-8859-1.

Update

It looks like you don't know what charset your data is using. In some cases, if you know the country and language of the user (e.g., Spain/Spanish), you can figure it out from the default encoding Microsoft Windows uses in that territory.
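
The language-based guess described above could be sketched like this. The lookup table and the function name `guess_ansi_codepage()` are hypothetical and cover only a handful of languages; the code page assignments are the usual Windows defaults for those languages, but a given file may still use something else.

```php
<?php
// Hypothetical lookup table: language code => usual Windows "ANSI" code page.
const ANSI_CODEPAGE_BY_LANGUAGE = [
    'en' => 'CP1252', 'es' => 'CP1252', 'fr' => 'CP1252', 'de' => 'CP1252',
    'pl' => 'CP1250', 'cs' => 'CP1250', 'hu' => 'CP1250',
    'ru' => 'CP1251', 'bg' => 'CP1251',
    'el' => 'CP1253',
    'tr' => 'CP1254',
];

// Returns the guessed code page, or null when the language is not in the table.
function guess_ansi_codepage(string $language): ?string
{
    return ANSI_CODEPAGE_BY_LANGUAGE[$language] ?? null;
}

// Example: a Spanish user's file, so assume CP1252.
$codepage = guess_ansi_codepage('es');
$utf8 = iconv($codepage, 'UTF-8', "ni\xF1o"); // "niño" as CP1252 bytes
echo $utf8, "\n";
```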

z祗昰~ 2024-10-17 04:13:03

Be careful: iconv() returns false if the conversion fails.
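
A minimal sketch of checking for that failure before the value goes anywhere near the database. The helper name `to_utf8_or_null()` is made up for illustration. Note that `false` turns into an empty string when it is cast to a string, which may be one way an empty field shows up.

```php
<?php
// Hypothetical helper: convert to UTF-8, or return null when iconv() fails.
function to_utf8_or_null(string $bytes, string $from): ?string
{
    // "@" hides the notice iconv() raises on illegal input; the return
    // value is what tells us whether the conversion worked.
    $out = @iconv($from, 'UTF-8', $bytes);
    return $out === false ? null : $out;
}

$ok  = to_utf8_or_null("caf\xE9", 'CP1252');     // valid CP1252 input
$bad = to_utf8_or_null("\xFF\xFE\xFD", 'UTF-8'); // not valid UTF-8

var_dump($ok !== null, $bad === null);
```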

I am having a somewhat similar problem: some Chinese characters are mistaken for \n if the file is encoded in UNICODE, but not if it is UTF-8.

To get back to your problem, make sure the encoding of your file is the same as that of your database. Also, using utf8_encode() on text that is already UTF-8 can have unpleasant results. Try using mb_detect_encoding() to see the encoding of the file, but unfortunately this doesn't always work. There is no easy fix for character encoding from what I can see :(
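
One way to avoid encoding already-UTF-8 text a second time is to validate first and convert only when validation fails. This is only a sketch: the function name `ensure_utf8()` and the Windows-1252 fallback are assumptions, and a byte sequence in another charset can occasionally pass as valid UTF-8 by coincidence.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from a fallback charset.
function ensure_utf8(string $text, string $fallback = 'Windows-1252'): string
{
    // mb_check_encoding() validates the bytes against UTF-8 rather than guessing.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}

$alreadyUtf8 = ensure_utf8("caf\xC3\xA9"); // returned unchanged
$fromAnsi    = ensure_utf8("caf\xE9");     // converted from the fallback charset

var_dump($alreadyUtf8 === $fromAnsi);
```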
