How do I do a case-insensitive string comparison?

Posted on 2024-07-09 13:48:40 · 117 characters · 11 views · 0 comments

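How can I compare strings in a case-insensitive way in Python?

I would like to encapsulate the comparison of a regular string against a repository string in simple Python code. I would also like to be able to look up values in a dict hashed by strings using regular Python strings.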

心舞飞扬 2024-07-16 13:48:40

Assuming ASCII strings:

string1 = 'Hello'
string2 = 'hello'

if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")

As of Python 3.3, casefold() is a better alternative:

string1 = 'Hello'
string2 = 'hello'

if string1.casefold() == string2.casefold():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are NOT the same (case insensitive)")

If you want a more comprehensive solution that handles more complex Unicode comparisons, see the other answers.
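
To see where the two methods differ, here is a small sketch. The sample words are my own and not part of the answer above: lower() leaves the German "ß" alone, while casefold() expands it to "ss".

```python
# Sample strings chosen for illustration; "\u00df" is the German sharp s (ß).
word_a = "stra\u00dfe"   # "straße"
word_b = "STRASSE"

# lower() keeps ß as-is, so the lowercased forms differ.
lower_equal = word_a.lower() == word_b.lower()

# casefold() maps ß to "ss", so the folded forms are identical.
casefold_equal = word_a.casefold() == word_b.casefold()

print(lower_equal, casefold_equal)
```

For pure ASCII input the two methods give the same result, so the choice only matters once non-ASCII text can appear.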

没企图 2024-07-16 13:48:40

Comparing strings in a case-insensitive way seems trivial, but it's not. I will be using Python 3, since Python 2 is underdeveloped here.

The first thing to note is that case-removing conversions in Unicode aren't trivial. There is text for which text.lower() != text.upper().lower(), such as "ß":

>>> "ß".lower()
'ß'
>>> "ß".upper().lower()
'ss'

But let's say you wanted to caselessly compare "BUSSE" and "Buße". Heck, you probably also want to compare "BUSSE" and "BUẞE" equal - that's the newer capital form. The recommended way is to use casefold:

str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used for
caseless matching.
Casefolding is similar to lowercasing but more aggressive because it is
intended to remove all case distinctions in a string. [...]

Do not just use lower. If casefold is not available, doing .upper().lower() helps (but only somewhat).

Then you should consider accents. If your font renderer is good, a precomposed "ê" (U+00EA) and an "e" followed by a combining circumflex (U+0302) look identical, but they do not compare equal:

>>> "\u00ea" == "e\u0302"
False

This is because the accent on the latter is a combining character.

>>> import unicodedata
>>> [unicodedata.name(char) for char in "\u00ea"]
['LATIN SMALL LETTER E WITH CIRCUMFLEX']
>>> [unicodedata.name(char) for char in "e\u0302"]
['LATIN SMALL LETTER E', 'COMBINING CIRCUMFLEX ACCENT']

The simplest way to deal with this is unicodedata.normalize. You probably want to use NFKD normalization, but feel free to check the documentation. Then one does:

>>> unicodedata.normalize("NFKD", "\u00ea") == unicodedata.normalize("NFKD", "e\u0302")
True

To finish up, here it is expressed as functions:

import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)
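
As a quick check, the sketch below repeats the two definitions so that it runs on its own and applies caseless_equal to the examples discussed in this answer. The escape sequences are my addition; they spell out the precomposed and combining forms explicitly.

```python
import unicodedata

def normalize_caseless(text):
    return unicodedata.normalize("NFKD", text.casefold())

def caseless_equal(left, right):
    return normalize_caseless(left) == normalize_caseless(right)

# "Buße" vs "BUSSE": casefold() turns ß into "ss".
print(caseless_equal("Bu\u00dfe", "BUSSE"))
# "BUẞE" (capital sharp s, U+1E9E) also folds to "busse".
print(caseless_equal("BU\u1e9eE", "BUSSE"))
# Precomposed ê (U+00EA) vs e + combining circumflex (U+0302).
print(caseless_equal("\u00ea", "e\u0302"))
# Different words still compare unequal.
print(caseless_equal("Busse", "Buse"))
```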

滿滿的愛 2024-07-16 13:48:40

Using Python 2, calling .lower() on each string or Unicode object...

string1.lower() == string2.lower()

...will work most of the time, but indeed doesn't work in the situations @tchrist has described.

Assume we have a file called unicode.txt containing the two strings Σίσυφος and ΣΊΣΥΦΟΣ. With Python 2:

>>> utf8_bytes = open("unicode.txt", 'r').read()
>>> print repr(utf8_bytes)
'\xce\xa3\xce\xaf\xcf\x83\xcf\x85\xcf\x86\xce\xbf\xcf\x82\n\xce\xa3\xce\x8a\xce\xa3\xce\xa5\xce\xa6\xce\x9f\xce\xa3\n'
>>> u = utf8_bytes.decode('utf8')
>>> print u
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = u.splitlines()
>>> print first.lower()
σίσυφος
>>> print second.lower()
σίσυφοσ
>>> first.lower() == second.lower()
False
>>> first.upper() == second.upper()
True

The Σ character has two lowercase forms, ς and σ, and .lower() won't help compare them case-insensitively.

However, in Python 3 (3.3 is shown below), lower() lowercases a word-final Σ to ς and any other Σ to σ, so calling lower() on both strings will work correctly:

>>> s = open('unicode.txt', encoding='utf8').read()
>>> print(s)
Σίσυφος
ΣΊΣΥΦΟΣ

>>> first, second = s.splitlines()
>>> print(first.lower())
σίσυφος
>>> print(second.lower())
σίσυφος
>>> first.lower() == second.lower()
True
>>> first.upper() == second.upper()
True

So if you care about edge-cases like the three sigmas in Greek, use Python 3.

(For reference, Python 2.7.3 and Python 3.3.0b1 are shown in the interpreter printouts above.)
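
Even on Python 3, lower() only applies the final-sigma rule when it lowercases a capital Σ; it leaves an existing ς untouched. Here is a sketch of that remaining gap and of how casefold() closes it. The strings are built from escape sequences (my addition) so that the code points are unambiguous.

```python
# "Σίσυφος", spelled with escapes: Σ ί σ υ φ ο ς
first = "\u03a3\u03af\u03c3\u03c5\u03c6\u03bf\u03c2"
second = first.upper()  # all-caps form, ends in Σ

# casefold() maps Σ, σ and ς to the same letter, so the words match.
words_match = first.casefold() == second.casefold()

# A bare final sigma and a bare medial sigma differ under lower() ...
sigma_lower_equal = "\u03c2".lower() == "\u03c3".lower()
# ... but not under casefold().
sigma_casefold_equal = "\u03c2".casefold() == "\u03c3".casefold()

print(words_match, sigma_lower_equal, sigma_casefold_equal)
```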

捎一片雪花 2024-07-16 13:48:40

Section 3.13 of the Unicode standard defines algorithms for caseless matching.

X.casefold() == Y.casefold() in Python 3 implements the "default caseless matching" (D144).

Casefolding does not preserve the normalization of strings in all instances, so normalization still needs to be done (for example, precomposed 'å', U+00E5, vs. 'a' followed by the combining ring U+030A). D145 introduces "canonical caseless matching":

import unicodedata

def NFD(text):
    return unicodedata.normalize('NFD', text)

def canonical_caseless(text):
    return NFD(NFD(text).casefold())

NFD() is called twice to handle very infrequent edge cases involving the U+0345 character (COMBINING GREEK YPOGEGRAMMENI).

Example:

>>> '\u00e5'.casefold() == 'a\u030a'.casefold()
False
>>> canonical_caseless('\u00e5') == canonical_caseless('a\u030a')
True

There is also compatibility caseless matching (D146) for cases such as '㎒' (U+3392), as well as "identifier caseless matching", which simplifies and optimizes caseless matching of identifiers.
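
For completeness, compatibility caseless matching (D146) can be sketched the same way. The function name below is my own, and the nesting of NFD, NFKD and casefold follows my reading of the definition in section 3.13, so check it against the standard before relying on it.

```python
import unicodedata

def NFD(text):
    return unicodedata.normalize('NFD', text)

def NFKD(text):
    return unicodedata.normalize('NFKD', text)

def compatibility_caseless(text):
    # Hypothetical helper: NFKD(casefold(NFKD(casefold(NFD(text)))))
    return NFKD(NFKD(NFD(text).casefold()).casefold())

# U+3392 (SQUARE MHZ) decomposes to the letters "MHz" under NFKD.
square_mhz_matches = compatibility_caseless('\u3392') == compatibility_caseless('mhz')
print(square_mhz_matches)
```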

旧人 2024-07-16 13:48:40

You can use the casefold() method, which removes case distinctions so that the comparison ignores case.

firstString = "Hi EVERYONE"
secondString = "Hi everyone"

if firstString.casefold() == secondString.casefold():
    print('The strings are equal.')
else:
    print('The strings are not equal.')

Output:

The strings are equal.

明月夜 2024-07-16 13:48:40

I saw this solution here using regex.

import re
if re.search('mandy', 'Mandy Pande', re.IGNORECASE):
    print("match")  # the search succeeds, so this branch runs

It works well with accents

In [42]: if re.search("ê","ê", re.IGNORECASE):
....:        print(1)
....:
1

However, it does not perform Unicode case folding. Thanks to @Rhymoid for pointing this out: as I understand it, the match only succeeds when the exact symbol is present. The output is as follows:

In [36]: "ß".lower()
Out[36]: 'ß'
In [37]: "ß".upper()
Out[37]: 'SS'
In [38]: "ß".upper().lower()
Out[38]: 'ss'
In [39]: if re.search("ß","ßß", re.IGNORECASE):
....:        print(1)
....:
1
In [40]: if re.search("SS","ßß", re.IGNORECASE):
....:        print(1)
....:
In [41]: if re.search("ß","SS", re.IGNORECASE):
....:        print(1)
....:
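
By contrast, folding both strings before searching does handle the ß examples. This is a minimal sketch: the helper name is hypothetical, and it does a plain substring search rather than regex matching.

```python
def contains_caseless(needle, haystack):
    # Hypothetical helper: substring test on the case-folded forms.
    return needle.casefold() in haystack.casefold()

print(contains_caseless("SS", "\u00df\u00df"))   # "SS" in "ßß"
print(contains_caseless("\u00df", "SS"))         # "ß" in "SS"
print(contains_caseless("mandy", "Mandy Pande"))
```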

两仪 2024-07-16 13:48:40

The usual approach is to uppercase the strings or lower case them for the lookups and comparisons. For example:

>>> "hello".upper() == "HELLO".upper()
True
>>>

羁绊已千年 2024-07-16 13:48:40

How about converting to lowercase first? You can use string.lower().

风为裳 2024-07-16 13:48:40

A clean solution that I found while working with some constant file extensions:

from pathlib import Path


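class CaseInsitiveString(str):
    def __eq__(self, other: object) -> bool:
        # Compare case-folded forms; defer to other types instead of raising
        if not isinstance(other, str):
            return NotImplemented
        return self.casefold() == other.casefold()

    # Defining __eq__ disables the inherited hash, so restore one consistent with it
    def __hash__(self) -> int:
        return hash(self.casefold())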

GZ = CaseInsitiveString(".gz")
ZIP = CaseInsitiveString(".zip")
TAR = CaseInsitiveString(".tar")

path = Path("/tmp/ALL_CAPS.TAR.GZ")

GZ in path.suffixes, ZIP in path.suffixes, TAR in path.suffixes, TAR == ".tAr"

# (True, False, True, True)
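
If subclassing str feels heavy for this, here is a sketch of the same check without a custom class. It is my own variation, not part of the answer above: it folds the suffixes once and compares plain strings.

```python
from pathlib import Path

path = Path("/tmp/ALL_CAPS.TAR.GZ")

# Fold every suffix once, then compare against already-folded constants.
folded_suffixes = {suffix.casefold() for suffix in path.suffixes}

print(".gz" in folded_suffixes, ".zip" in folded_suffixes, ".tar" in folded_suffixes)
```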

情丝乱 2024-07-16 13:48:40

With pandas, you can pass case=False to str.contains():

data['Column_name'].str.contains('abcd', case=False)

￠好甜 2024-07-16 13:48:40

def search_specificword(key, stng):
    key = key.lower()
    stng = stng.lower()
    flag_present = False
    if stng.startswith(key+" "):
        flag_present = True
    symb = [',','.']
    for i in symb:
        if stng.find(" "+key+i) != -1:
            flag_present = True
    if key == stng:
        flag_present = True
    if stng.endswith(" "+key):
        flag_present = True
    if stng.find(" "+key+" ") != -1:
        flag_present = True
    print(flag_present)
    return flag_present

Output:
search_specificword("Affordable housing", "to the core of affordable outHousing in europe")
False

search_specificword("Affordable housing", "to the core of affordable Housing, in europe")
True

故事灯 2024-07-16 13:48:40

from re import search, escape, IGNORECASE

def is_string_match(word1, word2):
    # Case-insensitive check for whether word1 appears as a whole word in word2
    # word1: string
    # word2: string | list

    # escape word1 so regex metacharacters in it are matched literally
    pattern = rf'\b{escape(word1)}\b'

    # if word2 is a list, look for word1 in each of its items
    if isinstance(word2, list):
        for word in word2:
            if search(pattern, word, IGNORECASE):
                return True
        return False

    # otherwise look for word1 in the single string word2
    if search(pattern, word2, IGNORECASE):
        return True
    return False

is_match_word = is_string_match("Hello", "hELLO") 
True

is_match_word = is_string_match("Hello", ["Bye", "hELLO", "@vagavela"])
True

is_match_word = is_string_match("Hello", "Bye")
False

无人问我粥可暖 2024-07-16 13:48:40

Consider using FoldedCase from jaraco.text:

>>> from jaraco.text import FoldedCase
>>> FoldedCase('Hello World') in ['hello world']
True

And if you want a dictionary keyed on text irrespective of case, use FoldedCaseKeyedDict from jaraco.collections:

>>> from jaraco.collections import FoldedCaseKeyedDict
>>> d = FoldedCaseKeyedDict()
>>> d['heLlo'] = 'world'
>>> list(d.keys()) == ['heLlo']
True
>>> d['hello'] == 'world'
True
>>> 'hello' in d
True
>>> 'HELLO' in d
True
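
If you would rather avoid a third-party dependency, a dictionary with case-folded keys can be sketched with the standard library. The class name CaseFoldDict is hypothetical. Unlike FoldedCaseKeyedDict as shown above, this version does not remember the original spelling of the keys.

```python
from collections import UserDict

class CaseFoldDict(UserDict):
    """Hypothetical mapping that stores and looks up keys in case-folded form."""

    def __setitem__(self, key, value):
        super().__setitem__(key.casefold(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.casefold())

    def __delitem__(self, key):
        super().__delitem__(key.casefold())

    def __contains__(self, key):
        # Non-string keys can never be present.
        return isinstance(key, str) and key.casefold() in self.data

d = CaseFoldDict()
d['heLlo'] = 'world'
print(d['HELLO'], 'hello' in d, list(d))
```

Folding on the way in keeps lookups a single hash probe; if the original spelling matters, the value could be stored alongside the unfolded key instead.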

地狱即天堂 2024-07-16 13:48:40

def insenStringCompare(s1, s2):
    """ Method that takes two strings and returns True or False, based
        on if they are equal, regardless of case."""
    try:
        return s1.lower() == s2.lower()
    except AttributeError:
        # Non-string arguments have no .lower(); report them and return None
        print("Please only pass strings into this method.")
        print("You passed a %s and %s" % (s1.__class__, s2.__class__))

睫毛上残留的泪 2024-07-16 13:48:40

Here is another regex approach. I have learned to love/hate regexes over the last week, so I usually import re under a name that reflects how I'm feeling (in this case, yes).
Make a normal function that asks for input, then use something = yes.compile(r'foo|spam', yes.I). re.I (yes.I below) is the same as IGNORECASE, but there is less to mistype.

You then search your message with the regex. Regexes deserve a few pages of their own, but the point is that foo and spam are joined with a pipe (alternation) and case is ignored.
If either is found, lost_n_found holds the match; if neither is found, lost_n_found is None. If it is not None, the function returns the matched text in lower case using "return lost_n_found.group().lower()".

This makes it much easier to match anything whose case may vary. Lastly, (NCS) stands for "no one cares seriously...!" or "not case sensitive", whichever you prefer.

If anyone has any questions, get in touch.

    import re as yes

    def bar_or_spam():

        message = input("\nEnter FoO for BaR or SpaM for EgGs (NCS): ")

        message_in_coconut = yes.compile(r'foo|spam', yes.I)

        # search() returns None when neither word is present
        lost_n_found = message_in_coconut.search(message)

        if lost_n_found is not None:
            return lost_n_found.group().lower()
        else:
            print("Make tea not love")
            return

    whatz_for_breakfast = bar_or_spam()

    if whatz_for_breakfast == "foo":
        print("BaR")

    elif whatz_for_breakfast == "spam":
        print("EgGs")
