PHP mb_convert_case(): keep words that are in uppercase

Posted on 09-10 09:39 · 227 characters · 8 views · 0 comments

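Assume I have the string "HET1200 text string" and I need to change it to "HET1200 Text String". The encoding is UTF-8.

How can I do that? Currently, I use mb_convert_case($string, MB_CASE_TITLE, "UTF-8"); but this changes "HET1200" to "Het1200".

I could specify an exception, but such a list would never be exhaustive. So I would rather have all uppercase words remain uppercase.

Thanks :)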

尛丟丟 2024-09-17 09:39:18

OK, let's try to recreate mb_convert_case as closely as possible, but changing only the first character of every word.

The relevant part of mb_convert_case implementation is this:

int mode = 0; 

for (i = 0; i < unicode_len; i+=4) {
    int res = php_unicode_is_prop(
        BE_ARY_TO_UINT32(&unicode_ptr[i]),
        UC_MN|UC_ME|UC_CF|UC_LM|UC_SK|UC_LU|UC_LL|UC_LT|UC_PO|UC_OS, 0);
    if (mode) {
        if (res) {
            UINT32_TO_BE_ARY(&unicode_ptr[i],
                php_unicode_tolower(BE_ARY_TO_UINT32(&unicode_ptr[i]),
                    _src_encoding TSRMLS_CC));
        } else {
            mode = 0;
        }   
    } else {
        if (res) {
            mode = 1;
            UINT32_TO_BE_ARY(&unicode_ptr[i],
                php_unicode_totitle(BE_ARY_TO_UINT32(&unicode_ptr[i]),
                    _src_encoding TSRMLS_CC));
        }
    }
}

Basically, this does the following:

Set mode to 0. mode tracks whether we are at the first character of a word: if it's 0, we are; otherwise, we're not.
Iterate through the characters of the string.
- Determine what kind of character it is.
  - Set res to 1 if it's a word character. More specifically, set it to 1 if it has the property "Mark, Non-Spacing", "Mark, Enclosing", "Other, Format", "Letter, Modifier", "Symbol, Modifier", "Letter, Uppercase", "Letter, Lowercase", "Letter, Titlecase", "Punctuation, Other" or "Other, Surrogate". Oddly, "Letter, Other" is not included.
- If we're not at the beginning of a word
  - If we're at a word character, convert it to lowercase. This is the step we don't want.
  - Otherwise, we're not at a word character, and we set mode to 0 to signal that the next word character starts a new word.
- If we're at the beginning of a word and we do have a word character
  - Convert this character to title case.
  - Set mode to 1 to signal we're no longer at the beginning of a word.
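
The lowercasing branch described above is easy to observe from PHP itself. A minimal sketch, assuming the mbstring extension is loaded; the sample string is the one from the question:

```php
<?php
// Stock title-casing: the first letter of each word is title-cased and
// the remaining letters of the word are lowercased.
$input  = "HET1200 text string";
$titled = mb_convert_case($input, MB_CASE_TITLE, "UTF-8");

echo $titled, "\n"; // "HET1200" comes back as "Het1200"
```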

The mbstring extension does not seem to expose the character properties. This leaves us with a problem, because we don't have a good way to determine if a character has any of the 10 properties for which mb_convert_case tests.

Fortunately, Unicode character properties in regular expressions can save us here.
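
As a rough illustration of that idea, a single character can be classified with preg_match and \p{..} escapes. This is only a sketch: the helper name is_word_char_sketch is made up for this example, and the class lists the ten general categories named in the walkthrough above.

```php
<?php
// Hypothetical helper (the name is ours, not part of mbstring or PCRE):
// reports whether a single UTF-8 character belongs to one of the ten
// general categories that the C loop treats as word characters.
function is_word_char_sketch(string $char): bool
{
    // The /u modifier makes PCRE decode pattern and subject as UTF-8,
    // so \p{..} is tested against whole code points, not single bytes.
    return preg_match(
        '/\A[\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}\p{Cs}]\z/u',
        $char
    ) === 1;
}

var_dump(is_word_char_sketch("Á")); // uppercase letter (Lu)
var_dump(is_word_char_sketch("1")); // decimal digit (Nd), not in the list
var_dump(is_word_char_sketch(" ")); // space separator (Zs), not in the list
```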

A faithful reproduction of mb_convert_case without the problematic conversion to lowercase becomes:

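function mb_convert_case_utf8_variation($s) {
    // Split the UTF-8 string into an array of single characters.
    $arr = preg_split("//u", $s, -1, PREG_SPLIT_NO_EMPTY);
    $result = "";
    $mode = false; // true while we are inside a word
    foreach ($arr as $char) {
        // Word character: any of the ten categories tested by the C code
        // (Mn, Me, Cf, Lm, Sk, Lu, Ll, Lt, Po, Cs).
        $res = preg_match(
            '/\\p{Mn}|\\p{Me}|\\p{Cf}|\\p{Lm}|\\p{Sk}|\\p{Lu}|\\p{Ll}|'.
            '\\p{Lt}|\\p{Po}|\\p{Cs}/u', $char) == 1;
        if ($mode) {
            // Inside a word: leave the character untouched; a non-word
            // character ends the word.
            if (!$res) {
                $mode = false;
            }
        }
        elseif ($res) {
            // First character of a word: title-case it.
            $mode = true;
            $char = mb_convert_case($char, MB_CASE_TITLE, "UTF-8");
        }
        $result .= $char;
    }

    return $result;
}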

Test:

echo mb_convert_case_utf8_variation("HETÁ1200 Ááxt ítring uii");

gives:

HETÁ1200 Ááxt Ítring Uii
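
The same rule can probably also be written as a single regex pass instead of a character loop. The sketch below is an alternative under assumptions, not the answer's method: the function name title_case_keep_upper_sketch is invented here, and the character class spells out the ten categories tested by the C loop (UC_MN through UC_OS). It title-cases every word character that is not preceded by another word character and leaves everything else alone.

```php
<?php
// Hypothetical single-pass variant (the function name is ours).
function title_case_keep_upper_sketch(string $s): string
{
    // The ten general categories that the C loop tests.
    $word = '\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}\p{Cs}';

    // Match a word character that is not preceded by another word
    // character, i.e. the first character of each run.
    $pattern = '/(?<![' . $word . '])[' . $word . ']/u';

    $out = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return mb_convert_case($m[0], MB_CASE_TITLE, "UTF-8");
        },
        $s
    );

    // preg_replace_callback returns null on error (for example, invalid
    // UTF-8 input); fall back to the unchanged input in that case.
    return $out ?? $s;
}

// The string from the question:
echo title_case_keep_upper_sketch("HET1200 text string"), "\n"; // HET1200 Text String
```

Whether this is preferable to the loop is a matter of taste: the loop mirrors the C implementation step by step, while the regex version keeps the whole rule in one pattern.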

About the author: 余厌
