Case-insensitive string comparison

Posted on 2024-10-27 02:46:01

I would like to compare two variables to see if they are the same, but I want this comparison to be case-insensitive.

For example, this would be case sensitive:

if($var1 == $var2){
   ...
}

But I want this to be case insensitive, how would I approach this?

Comments (6)

我不吻晚风 2024-11-03 02:46:01

This is fairly simple; you just need to call strtolower() on both variables.

If you need to deal with Unicode or international character sets, you can use mb_strtolower().

Please note that other answers suggest using strcasecmp(). That function is not multibyte-aware: it compares byte by byte, so results for UTF-8 strings that contain non-ASCII letters will be bogus.
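As a minimal sketch of the lower-casing approach described above, assuming UTF-8 input and the mbstring extension, the comparison could be wrapped in a small helper. The name ci_equals() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a PHP built-in): fold both strings to lower case,
// then compare them strictly.
function ci_equals(string $a, string $b): bool
{
    // The encoding is passed explicitly so the result does not depend on
    // mbstring's configured internal encoding.
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

var_dump(ci_equals('Hello', 'hELLO'));                    // bool(true)
var_dump(ci_equals("\u{00C4}PFEL", "\u{00E4}pfel"));      // bool(true)  (ÄPFEL vs äpfel)
var_dump(ci_equals('Hello', 'World'));                    // bool(false)
```

Using === instead of == avoids PHP's loose comparison rules for numeric-looking strings.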

难理解 2024-11-03 02:46:01

strcasecmp() returns 0 if the strings are the same (apart from case variations) so you can use:

if (strcasecmp($var1, $var2) == 0) {
}
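
A short sketch of how strcasecmp() behaves, assuming the default "C" locale and UTF-8 source strings. The helper name ascii_ci_equals() is made up for illustration; it only wraps the built-in call.

```php
<?php
// Hypothetical wrapper around the built-in strcasecmp().
function ascii_ci_equals(string $a, string $b): bool
{
    // strcasecmp() returns 0 when the strings match apart from case,
    // and a negative or positive integer otherwise.
    return strcasecmp($a, $b) === 0;
}

var_dump(ascii_ci_equals('Hello', 'hELLO')); // bool(true)

// Non-ASCII letters are compared as raw bytes, so Ä and ä do not match here.
var_dump(ascii_ci_equals("\u{00C4}pfel", "\u{00E4}pfel")); // bool(false)
```

This is fine for identifiers, ASCII keywords and similar input, but not for arbitrary user text.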

筱武穆 2024-11-03 02:46:01

If your string is in a single byte encoding, it's simple:

if(strtolower($var1) === strtolower($var2))

If your string is UTF-8, you have to consider the complexity of Unicode: to-lower-case and to-upper-case are not bijective functions, i.e. if you have a lower case character, transform it to upper case, and transform it back to lower case, you may not end up with the same code point (and the same holds true if you start with an upper case character).

E.g.

  • "İ" (Latin Capital Letter I with Dot Above, U+0130) is an upper case character, with "i" (Latin Small Letter I, U+0069) as its lower case variant – and "i"'s upper case variant is "I" (Latin Capital Letter I, U+0049).
  • "ı" (Latin Small Letter Dotless I, U+0131) is a lower case character, with "I" (Latin Capital Letter I, U+0049) as its upper case variant – and "I"'s lower case variant is "i" (Latin Small Letter I, U+0069).

So mb_strtolower('ı') === mb_strtolower('i') returns false, even though both characters have the same upper case character. If you really want a case-insensitive string comparison function, you have to compare both the upper case AND the lower case versions:

if(mb_strtolower($string1) === mb_strtolower($string2)
  || mb_strtoupper($string1) === mb_strtoupper($string2))
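
The dotless "ı" case can be checked directly. This is only a sketch, assuming the mbstring extension and UTF-8 strings; the variable names are illustrative.

```php
<?php
// ı (U+0131, dotless i) and plain i share the upper case form "I",
// but their lower case forms differ.
$dotless = "\u{0131}";
$plain   = 'i';

$lowerMatch = mb_strtolower($dotless, 'UTF-8') === mb_strtolower($plain, 'UTF-8');
$upperMatch = mb_strtoupper($dotless, 'UTF-8') === mb_strtoupper($plain, 'UTF-8');

var_dump($lowerMatch);                // bool(false)
var_dump($upperMatch);                // bool(true)
var_dump($lowerMatch || $upperMatch); // bool(true): the combined check matches
```

Whether "ı" and "i" should count as equal at all depends on the application; the combined check is the more permissive choice.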

I ran a query against the Unicode database from https://codepoints.net (https://dumps.codepoints.net) and found 180 code points where lower-casing the upper case form of a lower case character gives a different character, and 8 code points where upper-casing the lower case form of an upper case character gives a different character.

But it gets worse: the same grapheme cluster, as seen by the user, may be encoded in several ways: "ä" may be represented as Latin Small Letter a with Diaeresis (U+00E4) or as Latin Small Letter A (U+0061) followed by Combining Diaeresis (U+0308) – and if you compare the two at the byte level, the comparison won't return true!

But there is a solution for this in Unicode: normalization! There are four different forms: NFC, NFD, NFKC, NFKD. For string comparison, NFC and NFD are equivalent, and NFKC and NFKD are equivalent. I'd take NFKC as its output is shorter than NFKD's, and "ﬀ" (Latin Small Ligature ff, U+FB00) will be transformed into two normal "f" characters (but 2⁵ will also be expanded to 25…).
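
To make the effect of the normalization forms concrete, here is a small sketch. It assumes the intl extension is installed (it provides the Normalizer class) and exits early otherwise; the variable names are illustrative.

```php
<?php
// Normalizer comes from the intl extension; stop early if it is missing.
if (!class_exists('Normalizer')) {
    exit("intl extension not available\n");
}

$precomposed = "\u{00E4}";  // ä as a single code point
$decomposed  = "a\u{0308}"; // a followed by a combining diaeresis

// Different byte sequences, so a plain comparison fails.
var_dump($precomposed === $decomposed); // bool(false)

// After NFC normalization both have the same representation.
$nfcA = Normalizer::normalize($precomposed, Normalizer::FORM_C);
$nfcB = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($nfcA === $nfcB); // bool(true)

// NFKC additionally replaces compatibility characters.
$ligature = Normalizer::normalize("\u{FB00}", Normalizer::FORM_KC);  // "ff"
$power    = Normalizer::normalize("2\u{2075}", Normalizer::FORM_KC); // "25"
var_dump($ligature, $power);

// NFC leaves the ligature untouched.
$ligatureNfc = Normalizer::normalize("\u{FB00}", Normalizer::FORM_C);
var_dump($ligatureNfc === "\u{FB00}"); // bool(true)
```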

The resulting function becomes:

function mb_is_string_equal_ci($string1, $string2) {
    $string1_normalized = Normalizer::normalize($string1, Normalizer::FORM_KC);
    $string2_normalized = Normalizer::normalize($string2, Normalizer::FORM_KC);
    return mb_strtolower($string1_normalized) === mb_strtolower($string2_normalized)
            || mb_strtoupper($string1_normalized) === mb_strtoupper($string2_normalized);
}

Please note:

  • You need the intl extension for the Normalizer class.
  • You should optimize this function by first checking whether the strings are simply equal.
  • You may want to use NFC instead of NFKC if NFKC removes too many formatting distinctions for your taste.
  • You have to decide for yourself whether you really need all this complexity or would prefer a simpler variant of this function.
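
The first and third notes could be folded into the function roughly as follows. This is a sketch under assumptions, not a definitive implementation: the function name mb_is_string_equal_ci_fast() and the $form parameter are made up here, the intl and mbstring extensions are assumed, and treating input that fails to normalize as "not equal" is a design choice of this example.

```php
<?php
// Normalizer comes from the intl extension; stop early if it is missing.
if (!class_exists('Normalizer')) {
    exit("intl extension not available\n");
}

// Hypothetical variant of mb_is_string_equal_ci() with a fast path and a
// selectable normalization form (NFKC by default, as in the answer).
function mb_is_string_equal_ci_fast(string $a, string $b, int $form = Normalizer::FORM_KC): bool
{
    // Fast path: identical byte sequences need no further work.
    if ($a === $b) {
        return true;
    }

    $na = Normalizer::normalize($a, $form);
    $nb = Normalizer::normalize($b, $form);

    // normalize() returns false on failure (for example invalid UTF-8).
    if ($na === false || $nb === false) {
        return false;
    }

    return mb_strtolower($na, 'UTF-8') === mb_strtolower($nb, 'UTF-8')
        || mb_strtoupper($na, 'UTF-8') === mb_strtoupper($nb, 'UTF-8');
}

// Ä (precomposed) vs a + combining diaeresis, in different case.
var_dump(mb_is_string_equal_ci_fast("\u{00C4}pfel", "a\u{0308}PFEL")); // bool(true)

// 2⁵ equals 25 under NFKC, but not under NFC.
var_dump(mb_is_string_equal_ci_fast("2\u{2075}", '25'));                     // bool(true)
var_dump(mb_is_string_equal_ci_fast("2\u{2075}", '25', Normalizer::FORM_C)); // bool(false)
```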

奈何桥上唱咆哮 2024-11-03 02:46:01

if(strtolower($var1) == strtolower($var2)){
}

请持续率性 2024-11-03 02:46:01

// to return boolean "are input strings equal?"
function mb_strcasecmp($s1,$s2): bool {
  return 0 === mb_stripos($s1,$s2) && 0 === mb_stripos($s2,$s1);
}

This version supports multibyte characters but doesn't require the intl package.
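
A brief usage sketch of the function above, with parameter types and an explicit 'UTF-8' encoding added as assumptions of this example. On PHP 8 and later mb_stripos() accepts an empty needle; older versions emit a warning for it, so empty strings may need separate handling there.

```php
<?php
// Same idea as the function above: two strings are equal (ignoring case)
// when each one is found at offset 0 of the other.
function mb_strcasecmp(string $s1, string $s2): bool
{
    return 0 === mb_stripos($s1, $s2, 0, 'UTF-8')
        && 0 === mb_stripos($s2, $s1, 0, 'UTF-8');
}

var_dump(mb_strcasecmp("\u{00C4}PFEL", "\u{00E4}pfel")); // bool(true)  (ÄPFEL vs äpfel)
var_dump(mb_strcasecmp('apple', 'Apple pie'));           // bool(false): only a prefix match
var_dump(mb_strcasecmp('apple', 'grape'));               // bool(false)
```

Note that this does not normalize its input, so precomposed and decomposed forms of the same character still compare as different.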

东北女汉子 2024-11-03 02:46:01

Why not:

if(strtolower($var1) == strtolower($var2)){
}