Case-insensitive string comparison
I would like to compare two variables to see if they are the same, but I want this comparison to be case-insensitive.
For example, this would be case sensitive:
if($var1 == $var2){
...
}
But I want this to be case-insensitive. How would I approach this?
This is fairly simple; you just need to call strtolower() on both variables.
If you need to deal with Unicode or international character sets, you can use mb_strtolower().
Please note that other answers suggest using strcasecmp(); that function does not handle multibyte characters, so results for UTF-8 strings containing non-ASCII characters will be bogus.

strcasecmp() returns 0 if the strings are the same (apart from case variations), so you can use:
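A minimal sketch of that check, next to the mb_strtolower() alternative from the previous answer. The sample values are made up for illustration, and the second half assumes the mbstring extension is loaded and the strings are UTF-8:

```php
<?php
// Hypothetical sample values, for illustration only.
$var1 = 'Hello World';
$var2 = 'hello world';

// strcasecmp() returns 0 when the strings match apart from ASCII case.
if (strcasecmp($var1, $var2) === 0) {
    echo "equal, ignoring case\n";
}

// For non-ASCII text, lowercase both sides with mb_strtolower() and an
// explicit encoding instead of relying on byte-wise functions.
$upper = "\u{00C4}PFEL"; // "ÄPFEL"
$lower = "\u{00E4}pfel"; // "äpfel"
if (mb_strtolower($upper, 'UTF-8') === mb_strtolower($lower, 'UTF-8')) {
    echo "equal, ignoring case (multibyte)\n";
}
```

Passing the encoding explicitly avoids depending on the mbstring internal encoding setting.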
If your string is in a single byte encoding, it's simple:
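A minimal sketch of the single-byte case, assuming the letters involved are plain ASCII; the helper name is hypothetical:

```php
<?php
// Hypothetical helper: compares two single-byte strings ignoring ASCII case.
function equalsIgnoreCaseSingleByte(string $a, string $b): bool
{
    // Lowercase both sides, then compare strictly.
    return strtolower($a) === strtolower($b);
}

var_dump(equalsIgnoreCaseSingleByte('Something', 'sOMETHING')); // bool(true)
var_dump(equalsIgnoreCaseSingleByte('Something', 'Something else')); // bool(false)
```

How strtolower() treats bytes outside the ASCII range depends on the PHP version and locale, so this sketch is only meant for ASCII letters.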
If your string is UTF-8, you have to consider the complexity of Unicode: to-lower-case and to-upper-case are not bijective functions, i.e. if you have a lower case character, transform it to upper case, and transform it back to lower case, you may not end up with the same code point (and the same holds true if you start with an upper case character).
E.g. "İ" (Latin Capital Letter I with Dot Above, U+0130) is an upper case character, with "i" (Latin Small Letter I, U+0069) as its lower case variant, and "i"'s upper case variant is "I" (Latin Capital Letter I, U+0049).
"ı" (Latin Small Letter Dotless I, U+0131) is a lower case character, with "I" (Latin Capital Letter I, U+0049) as its upper case variant, and "I"'s lower case variant is "i" (Latin Small Letter I, U+0069).
So mb_strtolower('ı') === mb_strtolower('i') returns false, even though they have the same upper case character. If you really want a case-insensitive string comparison function, you have to compare both the upper case and the lower case versions.
I've run a query against the Unicode database from https://codepoints.net (https://dumps.codepoints.net) and found 180 code points for which taking the lower case of a lower case character's upper case yields a different character, and 8 code points for which taking the upper case of an upper case character's lower case yields a different character.
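Comparing both the lower case and the upper case forms, as described above, could be sketched like this. It is one reading of the advice, not necessarily the answer's original code: the helper name is hypothetical, the strings are assumed to be UTF-8, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper: two strings count as equal when either their
// lowercased forms or their uppercased forms match.
function equalsIgnoreCaseUtf8(string $a, string $b): bool
{
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8')
        || mb_strtoupper($a, 'UTF-8') === mb_strtoupper($b, 'UTF-8');
}

// Dotless i (U+0131) and i differ when lowercased,
// but both map to I (U+0049) when uppercased.
var_dump(equalsIgnoreCaseUtf8("\u{0131}", 'i'));
```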
But it gets worse: the same grapheme cluster, as seen by the user, may be encoded in more than one way. "ä" may be represented as Latin Small Letter A with Diaeresis (U+00E4), or as Latin Small Letter A (U+0061) followed by Combining Diaeresis (U+0308), and if you compare the two at byte level, they will not be equal!
But there is a solution for this in Unicode: normalization! There are four different forms: NFC, NFD, NFKC, NFKD. For string comparison, NFC and NFD are equivalent and NFKC and NFKD are equivalent. I'd take NFKC as it is shorter than NFKD, and "ﬀ" (Latin Small Ligature ff, U+FB00) will be transformed to two normal "f" (but 2⁵ will also be expanded to 25…).
The resulting function becomes:
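A minimal sketch of such a function, assuming the intl extension (for the Normalizer class) and the mbstring extension are available and that the inputs are UTF-8. The function name is hypothetical, and treating invalid input as "not equal" is a choice made here, not something the answer specifies:

```php
<?php
// Hypothetical helper: NFKC-normalize both strings, then compare their
// lowercased and uppercased forms.
function equalsIgnoreCaseNormalized(string $a, string $b): bool
{
    $a = Normalizer::normalize($a, Normalizer::FORM_KC);
    $b = Normalizer::normalize($b, Normalizer::FORM_KC);

    // normalize() returns false on failure (for example invalid UTF-8);
    // this sketch treats that as "not equal".
    if ($a === false || $b === false) {
        return false;
    }

    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8')
        || mb_strtoupper($a, 'UTF-8') === mb_strtoupper($b, 'UTF-8');
}

// Precomposed "ä" (U+00E4) versus "a" followed by Combining Diaeresis (U+0308).
var_dump(equalsIgnoreCaseNormalized("\u{00E4}", "a\u{0308}"));
// The ff ligature (U+FB00) versus two plain "f" characters.
var_dump(equalsIgnoreCaseNormalized("\u{FB00}", 'FF'));
```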
Please note:
This version supports multibyte characters but doesn't require the intl package.
Why not: