php mb_convert_case() keep uppercase words
Assume I have the string "HET1200 text string" and I need to change it to "HET1200 Text String". The encoding is UTF-8.
How can I do that? Currently, I use mb_convert_case($string, MB_CASE_TITLE, "UTF-8"); but that changes "HET1200" to "Het1200".
I could specify an exception, but the list wouldn't be exhaustive. So I'd rather all uppercase words remain uppercase.
Thanks :)
OK, let's try to recreate mb_convert_case as closely as possible, but only changing the first character of every word.

Basically, the relevant part of the mb_convert_case implementation does the following:
- Set mode to 0. mode determines whether we are at the first character of a word. If it's 0, we are; otherwise, we're not.
- For each character, set res to 1 if it's a word character. More specifically, set it to 1 if it has the property "Mark, Non-Spacing", "Mark, Enclosing", "Other, Format", "Letter, Modifier", "Symbol, Modifier", "Letter, Uppercase", "Letter, Lowercase", "Letter, Titlecase", "Punctuation, Other" or "Other, Surrogate". Oddly, "Letter, Other" is not included.
- If we're at the first character of a word and it's a word character, set mode to 1 and convert it to title case.
- If we're not at the first character and it's a word character, convert it to lowercase.
- If we're not at the first character and it's not a word character, set mode to 0 to signal we're moving to the beginning of a word.

The mbstring extension does not seem to expose the character properties. This leaves us with a problem, because we don't have a good way to determine whether a character has any of the 10 properties that mb_convert_case tests for.

Fortunately, Unicode character properties in regex can save us here.
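As a rough illustration of the regex approach, here is a minimal sketch of a property test for a single character. The helper name is_word_char is made up for this example, and "Other, Surrogate" is left out on the assumption that the input is valid UTF-8, where surrogates cannot occur.

```php
<?php
// Hypothetical helper (not part of mbstring): returns true if $char, a single
// UTF-8 character, has one of the properties listed above.
// "Other, Surrogate" (Cs) is omitted: it cannot appear in valid UTF-8 input.
function is_word_char(string $char): bool
{
    // \p{..} matches by Unicode general category; the /u modifier makes
    // PCRE treat both the pattern and the subject as UTF-8.
    $pattern = '/^[\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}]$/u';
    return preg_match($pattern, $char) === 1;
}

var_dump(is_word_char('A')); // uppercase letter
var_dump(is_word_char('1')); // decimal digit, not in the list
```

Note that digits are not in the list, so a digit ends the current word in the same way a space does.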
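A faithful reproduction of mb_convert_case without the problematic conversion to lowercase splits the string into characters, tests each one against those properties with a regex, and converts only the first character of every word to title case, leaving all other characters untouched.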
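The steps described above could be sketched roughly as follows. This is an approximation under stated assumptions rather than the exact code from the answer: the function name title_case_keep_upper is hypothetical, the input is assumed to be valid UTF-8, and "Other, Surrogate" is omitted from the property list for that reason.

```php
<?php
// Sketch only: title_case_keep_upper is a hypothetical name, not an mbstring function.
function title_case_keep_upper(string $s): string
{
    // Properties treated as "word characters", following the list above
    // (Cs is omitted because surrogates cannot occur in valid UTF-8).
    $wordChar = '/^[\p{Mn}\p{Me}\p{Cf}\p{Lm}\p{Sk}\p{Lu}\p{Ll}\p{Lt}\p{Po}]$/u';

    // Split into individual UTF-8 characters.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $s; // invalid UTF-8: return the input unchanged
    }

    $result = '';
    $inWord = false; // plays the role of "mode" in the description
    foreach ($chars as $char) {
        $isWord = preg_match($wordChar, $char) === 1; // plays the role of "res"
        if ($inWord) {
            // Inside a word: leave the character alone (no lowercasing);
            // a non-word character ends the word.
            if (!$isWord) {
                $inWord = false;
            }
        } elseif ($isWord) {
            // First character of a word: convert it to title case.
            $inWord = true;
            $char = mb_convert_case($char, MB_CASE_TITLE, 'UTF-8');
        }
        $result .= $char;
    }
    return $result;
}

echo title_case_keep_upper('HET1200 text string'), "\n"; // HET1200 Text String
```

Because the loop mirrors the description step by step, a digit also ends a word, so a letter that directly follows a digit is capitalized as well.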