How do I encode Japanese as something like &#26085;&#26412;&#12395;&#34892;&#12387;&#12390;? (UTF-8)

Posted on 2024-10-17 08:19:45

As the question in the title states.
I can't seem to find the answer with any of the following:
php headers, css headers, html headers, mysql charsets (to utf8_general_ci), or

<form acceptcharset="utf-8"... >

Really stumped on this one.

I'm basically going through this process:

1. Type Japanese characters, process through a form
2. Form saves in MySQL DB
3. PHP pulls data out of MySQL DB, and formats it for a webpage

At step 3, I check the code and see that it's literally displaying the Japanese characters.
Because it's doing that, I'm guessing it's causing the PHP errors I'm getting (the functions that work fine for English characters aren't working so fine for the Japanese text).

So I want to encode in UTF-8 format, but I'm not sure how to do this?

Edit: Here's the PHP function I'm using on the Japanese text

function short_text_jap($text, $length=300) {
    if (strlen($text) > $length) {
        $pattern = '/^(.{0,'.$length.'}\\b).*$/s';
        $text = preg_replace($pattern, "$1...", $text);
    }
    return $text;
}

But instead of a shortened amount of text, it returns the whole thing.

疯了 2024-10-24 08:19:45

As you seem to want to convert your UTF-8 encoded string to ASCII and non-ASCII characters to character references, you can use PHP’s multi-byte string functions to do so:

mb_substitute_character('entity');
$str = '日本語';  // UTF-8 encoded string
echo mb_convert_encoding($str, 'US-ASCII', 'UTF-8');

The output is:

&#x65E5;&#x672C;&#x8A9E;
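
If decimal character references such as `&#26085;` are wanted instead, `mb_encode_numericentity` from the same extension may be a closer fit. This is a minimal sketch, assuming the mbstring extension is loaded; the `$convmap` range and the variable names are illustrative choices, not part of the answer above:

```php
<?php
// Convert every non-ASCII character to a decimal numeric character reference.
// Map format: [first code point, last code point, offset, bitmask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$str = '日本語'; // UTF-8 encoded string
$encoded = mb_encode_numericentity($str, $convmap, 'UTF-8');
echo $encoded, "\n"; // &#26085;&#26412;&#35486;

// Going the other way restores the literal characters.
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
echo $decoded, "\n"; // 日本語
```

ASCII characters fall outside the map's range, so they pass through unchanged.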


旧人九事 2024-10-24 08:19:45

There seems to be some confusion about what UTF8 is, since the goal is stated as getting the "UTF8 version" of literal Japanese characters.

Things like &#26085; are ASCII-compatible HTML entities (basically Unicode references) already represented in some encoding, whereas UTF8 is a multibyte encoding scheme that defines how characters are stored on the byte level.
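
To make the distinction concrete, here is a small sketch comparing the UTF-8 bytes of one character with its numeric character reference. It assumes a UTF-8 encoded source file and PHP 7.2 or later (for `mb_ord`); the variable names are only for illustration:

```php
<?php
$char = '日'; // literal character in a UTF-8 encoded source file

// UTF-8 stores this character as three bytes.
echo strlen($char), "\n";   // 3
echo bin2hex($char), "\n";  // e697a5

// The numeric character reference is plain ASCII text naming the code point.
$codePoint = mb_ord($char, 'UTF-8'); // 26085 (U+65E5)
$entity = '&#' . $codePoint . ';';
echo $entity, "\n";          // &#26085;
echo strlen($entity), "\n";  // 8
```

Both forms render as the same character in a browser; only the bytes sent differ.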

I suggest relying on the literal form since it makes the whole mess with international alphabets easier to manage.

Simply migrate to UTF8 everywhere: in the database, in HTML, in PHP and in file types. Then it would be possible to use the PHP Multibyte String extension which is designed to handle multibyte characters:

mb_internal_encoding("UTF-8");

function short_text_jap($text, $length=300) {
    return mb_strlen($text) > $length ? mb_substr($text, 0, $length) : $text;
}

echo short_text_jap('日本語', 2); // outputs 日本
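
The function from the question also appended "..." to truncated text. The sketch below keeps that behaviour and shows why `strlen` is misleading for Japanese text: it counts bytes, not characters. `short_text_mb` and its `$suffix` parameter are hypothetical names introduced here, and the mbstring extension is assumed:

```php
<?php
mb_internal_encoding('UTF-8');

// Hypothetical variant of short_text_jap() that keeps the "..." suffix
// from the question's version. $length is counted in characters, not bytes.
function short_text_mb(string $text, int $length = 300, string $suffix = '...'): string
{
    if (mb_strlen($text) <= $length) {
        return $text;
    }
    return mb_substr($text, 0, $length) . $suffix;
}

$text = '日本に行って';

echo strlen($text), "\n";     // 18 (bytes)
echo mb_strlen($text), "\n";  // 6 (characters)
echo short_text_mb($text, 2), "\n";  // 日本...
echo short_text_mb($text, 10), "\n"; // 日本に行って
```

Cutting at a character count can still split a word or phrase mid-way; whether that matters depends on how the shortened text is displayed.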
