Case-insensitive string comparison in C

Posted 2024-11-03 14:13:23 | 155 characters | 5 views | 0 comments

I have two postcodes, each a char*, that I want to compare, ignoring case.
Is there a function to do this?

Or do I have to loop through each string, apply tolower to every character, and then do the comparison?

Any idea how such a function will react to digits in the string?

Thanks

Comments (12)

述情 2024-11-10 14:13:24

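If both strings are null-terminated:

    #include <ctype.h>
    #include <stdbool.h>

    bool striseq(const char* s1,const char* s2){
      for(;*s1;){
        /* cast first: tolower() is undefined for negative char values */
        if(tolower((unsigned char)*s1++)!=tolower((unsigned char)*s2++))
          return false;
      }
      return *s1 == *s2;
    }

Or this version, which uses bitwise operations:

    /* stops at the end of either string, so s2 is never read past its terminator */
    int striseq(const char* s1,const char* s2)
       {for(;*s1&&*s2;) if((*s1++|32)!=(*s2++|32)) return 0; return *s1 == *s2;}

The bitwise version is only reliable for ASCII letters. OR-ing with 32 also makes unrelated pairs such as '@' and '`', or '[' and '{', compare equal, so it is not safe for strings that contain symbols. Digits already have that bit set, so they are unchanged, although they compare equal to some control characters.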

苹果你个爱泡泡 2024-11-10 14:13:24
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd lower-case copy of a (caller frees), or NULL on failure. */
char* lowerCaseWord(const char* a)
{
    size_t n = strlen(a);
    char *b = malloc(n + 1);              /* +1 for the terminating '\0' */
    if (b == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        b[i] = (char)tolower((unsigned char)a[i]);
    b[n] = '\0';
    return b;
}

int strcmpInsensitive(const char* a, const char* b)
{
    char *la = lowerCaseWord(a), *lb = lowerCaseWord(b);
    /* INT_MIN signals allocation failure; otherwise the strcmp() result. */
    int r = (la && lb) ? strcmp(la, lb) : INT_MIN;
    free(la);
    free(lb);
    return r;
}

Good luck.

Edit: lowerCaseWord takes a char* and returns a newly allocated lower-case copy of it. For example, given "AbCdE" it returns "abcde".

strcmpInsensitive converts both char* arguments to lower case and then calls strcmp on the copies.

For example, if we call strcmpInsensitive with "AbCdE" and "ABCDE", it first lower-cases both values (giving "abcde" twice) and then runs strcmp on them, which returns 0.

北座城市 2024-11-10 14:13:24
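/* Returns 0 if the first `length` bytes match ignoring ASCII case, 1 otherwise.
   The caller must ensure both buffers hold at least `length` bytes.
   OR-ing with 32 folds letters only; it also conflates pairs such as '@' and '`'. */
static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {
        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}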

Reference

停滞 2024-11-10 14:13:23

There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own:

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
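
For the postcode case in the question, here is a minimal sketch of how a comparison in the style of strcicmp() treats digits and spaces. The names fold_cmp and postcode_equal are hypothetical, and the sketch assumes plain ASCII postcodes. Digits and spaces are not letters, so tolower() returns them unchanged and they are compared as-is.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Same idea as strcicmp(), under a hypothetical name: compare byte by byte
   after folding to lower case; digits and spaces pass through unchanged. */
int fold_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);   /* precondition: valid C strings */
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* Hypothetical convenience wrapper: 1 if the postcodes match ignoring case. */
int postcode_equal(const char *a, const char *b)
{
    return fold_cmp(a, b) == 0;
}
```

Under these assumptions, postcode_equal("sw1a 1aa", "SW1A 1AA") returns 1, while postcodes that differ in a digit compare as unequal.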

谁对谁错谁最难过 2024-11-10 14:13:23

Additional pitfalls to watch out for when doing case insensitive compares:


Comparing as lower or as upper case? (common enough issue)

Both below will return 0 with strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return different signed results as '_' is often between the upper and lower case letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions like the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.

int strcicmpL(char const *a, char const *b) {
  while (*b) {
    int d = tolower(*a) - tolower(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return tolower(*a);
}

int strcicmpU(char const *a, char const *b) {
  while (*b) {
    int d = toupper(*a) - toupper(*b);
    if (d) {
        return d;
    } 
    a++;
    b++;
  } 
  return toupper(*a);
}
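
To make the ordering difference concrete, here is a small sketch assuming an ASCII execution character set, where '_' (95) sits between 'A' (65) and 'a' (97). All function names are hypothetical. It also shows the wrapper qsort() needs: qsort() hands the comparator pointers to the array elements, so a function taking two char const * cannot be passed directly when sorting an array of strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Fold to lower case before comparing (hypothetical name). */
int cmp_lower(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* Fold to upper case before comparing (hypothetical name). */
int cmp_upper(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (;; a++, b++) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* qsort() comparators: each argument points at an array element,
   that is, at a `const char *`, so it must be dereferenced once. */
int by_lower(const void *pa, const void *pb)
{
    return cmp_lower(*(const char *const *)pa, *(const char *const *)pb);
}

int by_upper(const void *pa, const void *pb)
{
    return cmp_upper(*(const char *const *)pa, *(const char *const *)pb);
}

/* Sort an array of strings with the chosen folding direction. */
void sort_words(const char **words, size_t n, int use_upper)
{
    qsort(words, n, sizeof words[0], use_upper ? by_upper : by_lower);
}
```

With ASCII, sorting {"B", "_"} by lower case puts "_" first, while sorting by upper case puts "B" first, so the two folding directions are not interchangeable for sorting.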

char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct (Almost - see following)

char can have a negative value and not 2's complement. (rare)

The above does not handle -0 nor other negative values properly as the bit pattern should be interpreted as unsigned char. To properly handle all integer encodings, change the pointer type first.

// tolower((unsigned char) *a);
tolower(*(const unsigned char *)a); // Correct
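
As a small illustration of the pointer-cast idiom, the sketch below wraps it in two helpers. The names ctype_arg and lower_at are hypothetical. The point is that the value handed to tolower() always lies in the range 0..UCHAR_MAX, even for bytes such as 0xE4 that are negative when read through a signed char.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Read the byte at p as an unsigned char, which is always a valid
   argument for the <ctype.h> functions (range 0..UCHAR_MAX). */
int ctype_arg(const char *p)
{
    assert(p != NULL);
    return *(const unsigned char *)p;
}

/* Hypothetical wrapper: fold the character at index i of s safely.
   The caller must ensure i is within the string. */
int lower_at(const char *s, size_t i)
{
    return tolower(ctype_arg(s + i));
}
```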

Locale (less common)

Although character sets using ASCII code (0-127) are ubiquitous, the remainder codes tend to have locale specific issues. So strcasecmp("\xE4", "a") might return a 0 on one system and non-zero on another.


Unicode (the way of the future)

If a solution needs to handle more than ASCII consider a unicode_strcicmp(). As C lib does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.


Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map several lower case characters to one upper case character and vice versa. Further, some uppercase characters may lack a lower case equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().

int d = tolower(toupper(*a)) - tolower(toupper(*b));

Again, potential different results for sorting if code did tolower(toupper(*a)) vs. toupper(tolower(*a)).


Portability

@B. Nadolson recommends against rolling your own strcicmp(), which is reasonable, except when code needs highly portable, equivalent behavior across platforms.

Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop iteration rather than two, by using two tables that differ only in their entry for '\0'. Your results may vary.

/* Requires <limits.h> for UCHAR_MAX. The initializers are elided ("..."),
   so the tables are not compilable as shown. */
static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[*(const unsigned char *)a] == low2[*(const unsigned char *)b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[*(const unsigned char *)a] - low1[*(const unsigned char *)b]);
}
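
Because the table initializers are elided, here is one possible runnable sketch of the same two-table idea that fills the tables at run time from tolower() instead of spelling them out. The names init_fold_tables and strcicmp_tables are hypothetical. It assumes tolower() never returns 'A' for any byte value (true in the "C" locale), and the lazy initialization is not thread-safe.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];
static int tables_ready = 0;

/* Fill both tables from tolower(); low2 differs only at index 0. */
void init_fold_tables(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        low1[i] = low2[i] = (unsigned char)tolower(i);
    /* 'A' never appears in low1[] (it folds to 'a'), so a terminator in
       the second string can never compare equal and the loop must stop. */
    low2[0] = 'A';
    tables_ready = 1;
}

int strcicmp_tables(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (!tables_ready)
        init_fold_tables();
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    /* One comparison per iteration: it fails on a mismatch and also
       whenever either string has reached its terminator. */
    while (low1[*ua] == low2[*ub]) {
        ua++;
        ub++;
    }
    /* Subtract using the same table so equal strings yield 0. */
    return low1[*ua] - low1[*ub];
}
```

When the first string ends, low1[0] is 0, which low2[] only holds for byte 0, and that slot was overwritten with 'A'. So the loop stops at the terminator of either string without a separate test for '\0'.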
铁轨上的流浪者 2024-11-10 14:13:23

I found such a built-in function in <strings.h>, which provides string functions in addition to those in the standard header <string.h>.

Here are the relevant signatures:

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);
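
A minimal usage sketch for these two functions, assuming a POSIX system where <strings.h> is available. The helper names same_ignoring_case and same_prefix_ignoring_case are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* POSIX, not ISO C: strcasecmp(), strncasecmp() */

/* Hypothetical helper: 1 if both strings match ignoring case. */
int same_ignoring_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcasecmp(a, b) == 0;
}

/* Hypothetical helper: 1 if the first n bytes match ignoring case. */
int same_prefix_ignoring_case(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return strncasecmp(a, b, n) == 0;
}
```

strncasecmp() is handy when only part of a postcode matters, for example comparing just the first four characters of two postcodes.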

I also found its equivalent in the xnu kernel (osfmk/device/subrs.c), implemented as shown below. Digits are not letters, so they pass through tolower unchanged, and you should not expect any change of behavior for digits compared to the original strcmp function.

int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
 }

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}
那一片橙海, 2024-11-10 14:13:23

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the string to lower case can be faster.
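
One case where converting first may pay off is when the same strings are compared many times: fold each string once, then use plain strcmp() for every comparison. The sketch below assumes ASCII data and writable buffers. The names lower_in_place and count_matches are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fold an ASCII string to lower case in place and return it. */
char *lower_in_place(char *s)
{
    assert(s != NULL);
    for (char *p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
    return s;
}

/* Count how many already-folded keys equal the already-folded needle.
   Every comparison is a plain strcmp(); no per-comparison folding. */
size_t count_matches(char *const *keys, size_t n, const char *needle)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(keys[i], needle) == 0)
            hits++;
    return hits;
}
```

The trade-off is that the original spelling is lost unless a copy is kept, so this fits lookups better than display.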

避讳 2024-11-10 14:13:23

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}
溇涏 2024-11-10 14:13:23

POSIX <strings.h> header file replacement for strcasecmp() and strncasecmp() in C

Update 25 July 2024:

My latest work on this is now here:

  1. stringslib.h
  2. stringslib.c
  3. stringslib_unittest.cpp

The above library contains my implementations of my_strcasecmp() and my_strncasecmp() and uses Gtest to test them directly against the POSIX functions strcasecmp() and strncasecmp() which are contained in the non-C-standard POSIX header file named strings.h.

To test and run:

  1. If in Linux, use your regular Bash terminal. If in Windows, use the MSYS2 terminal. See my instructions here: Installing & setting up MSYS2 from scratch, including adding all 7 profiles to Windows Terminal

  2. First, install Gtest by following my instructions here:
    How do I build and use googletest (gtest) and googlemock (gmock) with gcc/g++ or clang?

  3. Then, clone my repo and run the unit test file as a Bash script:

    # clone it
    git clone https://github.com/ElectricRCAircraftGuy/eRCaGuy_hello_world.git
    
    # cd into the directory
    cd eRCaGuy_hello_world/c
    
    # build and run the unit test
    ./stringslib_unittest.cpp
    

    Example run and output:

    eRCaGuy_hello_world/c$ ./stringslib_unittest.cpp 
    Running main() from /home/gabriel/Downloads/Install_Files/gtest/googletest-1.14.0/googletest/src/gtest_main.cc
    [==========] Running 2 tests from 1 test suite.
    [----------] Global test environment set-up.
    [----------] 2 tests from stringslib
    [ RUN      ] stringslib.strncasecmp
    [       OK ] stringslib.strncasecmp (0 ms)
    [ RUN      ] stringslib.strcasecmp
    [       OK ] stringslib.strcasecmp (0 ms)
    [----------] 2 tests from stringslib (0 ms total)
    
    [----------] Global test environment tear-down
    [==========] 2 tests from 1 test suite ran. (0 ms total)
    [  PASSED  ] 2 tests.
    

strncmpci(), a direct, drop-in case-insensitive string comparison replacement for strncmp() and strcmp()

I'm not really a fan of the most-upvoted answer here (in part because it seems like it isn't correct since it should continue if it reads a null terminator in either string--but not both strings at once--and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

It is identical to strncmp() except:

  1. It is case-insensitive.
  2. The behavior is NOT undefined (it is well-defined) if either string is a null ptr. Regular strncmp() has undefined behavior if either string is a null ptr (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
  3. It returns INT_MIN as a special sentinel error value if either input string is a NULL ptr.

LIMITATIONS: Note that this code works on the original 7-bit ASCII character set only (decimal values 0 to 127, inclusive), NOT on Unicode characters in encodings such as UTF-8 (the most popular), UTF-16, and UTF-32.

Here is the code only (no comments):

int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Fully-commented version:

/// \brief      Perform a case-insensitive string compare (`strncmp()` case-insensitive) to see
///             if two C-strings are equal.
/// \note       1. Identical to `strncmp()` except:
///               1. It is case-insensitive.
///               2. The behavior is NOT undefined (it is well-defined) if either string is a null
///               ptr. Regular `strncmp()` has undefined behavior if either string is a null ptr
///               (see: https://en.cppreference.com/w/cpp/string/byte/strncmp).
///               3. It returns `INT_MIN` as a special sentinel value for certain errors.
///             - Posted as an answer here: https://stackoverflow.com/a/55293507/4561887.
///               - Aided/inspired, in part, by `strcicmp()` here:
///                 https://stackoverflow.com/a/5820991/4561887.
/// \param[in]  str1        C string 1 to be compared.
/// \param[in]  str2        C string 2 to be compared.
/// \param[in]  num         max number of chars to compare
/// \return     A comparison code (identical to `strncmp()`, except with the addition
///             of `INT_MIN` as a special sentinel value):
///
///             INT_MIN (usually -2147483648 for int32_t integers)  Invalid arguments (one or both
///                      of the input strings is a NULL pointer).
///             <0       The first character that does not match has a lower value in str1 than
///                      in str2.
///              0       The contents of both strings are equal.
///             >0       The first character that does not match has a greater value in str1 than
///                      in str2.
int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = 0;
    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        ret_code = INT_MIN;
        return ret_code;
    }

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of `str1` to `str2`, so
    // long as 1st: we have not yet compared the requested number of chars, and 2nd: the next char
    // of at least *one* of the strings is not zero (the null terminator for a C-string), meaning
    // that string still has more characters in it.
    // Note: you MUST check `(chars_compared < num)` FIRST or else dereferencing (reading) `str1` or
    // `str2` via `*str1` and `*str2`, respectively, is undefined behavior if you are reading one or
    // both of these C-strings outside of their array bounds.
    while ((chars_compared < num) && (*str1 || *str2))
    {
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

    return ret_code;
}

Test code:

Download the entire sample code, with unit tests, from my eRCaGuy_hello_world repository here: "strncmpci.c":

(this is just a snippet)

int main()
{
    printf("-----------------------\n"
           "String Comparison Tests\n"
           "-----------------------\n\n");

    int num_failures_expected = 0;

    printf("INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!\n");
    EXPECT_EQUALS(strncmpci("hey", "HEY", 3), 'h' - 'H');
    num_failures_expected++;
    printf("------ beginning ------\n\n");


    const char * str1;
    const char * str2;
    size_t n;

    // NULL ptr checks
    EXPECT_EQUALS(strncmpci(NULL, "", 0), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 0), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, "", 10), INT_MIN);
    EXPECT_EQUALS(strncmpci("", NULL, 10), INT_MIN);
    EXPECT_EQUALS(strncmpci(NULL, NULL, 10), INT_MIN);

    EXPECT_EQUALS(strncmpci("", "", 0), 0);
    EXPECT_EQUALS(strncmp("", "", 0), 0);

    str1 = "";
    str2 = "";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 0;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'y' - 'd');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'h' - 'H');

    str1 = "heY";
    str2 = "hEYd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'e' - 'E');

    str1 = "heY";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'Y' - 'y');

    str1 = "hey";
    str2 = "hey";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'d');
    EXPECT_EQUALS(strncmp(str1, str2, n), -'d');

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 0);

    str1 = "hEY";
    str2 = "heyYOU";
    n = 3;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 0);
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEY";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), -'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    str1 = "hEYHowAre";
    str2 = "heyYOU";
    n = 10;
    EXPECT_EQUALS(strncmpci(str1, str2, n), 'h' - 'y');
    EXPECT_EQUALS(strncmp(str1, str2, n), 'E' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 100), 'n' - 'N');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to meet you.,;", 100), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO UEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to uEET YOU.,;", 100), 'm' - 'u');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice to UEET YOU.,;", 100), 'm' - 'U');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "NICE TO MEET YOU.,;", 5), 'n' - 'N');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 5), 0);
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 5), 0);

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "NICE eo UEET YOU.,;", 100), 't' - 'e');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice eo uEET YOU.,;", 100), 't' - 'e');

    EXPECT_EQUALS(strncmpci("nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');
    EXPECT_EQUALS(strncmp(  "nice to meet you.,;", "nice-eo UEET YOU.,;", 5), ' ' - '-');


    if (globals.error_count == num_failures_expected)
    {
        printf(ANSI_COLOR_GRN "All unit tests passed!" ANSI_COLOR_OFF "\n");
    }
    else
    {
        printf(ANSI_COLOR_RED "FAILED UNIT TESTS! NUMBER OF UNEXPECTED FAILURES = %i"
            ANSI_COLOR_OFF "\n", globals.error_count - num_failures_expected);
    }

    assert(globals.error_count == num_failures_expected);
    return globals.error_count;
}
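A note on the `strncmp` expectations above: the C standard only specifies the sign of the value `strncmp` returns, not its magnitude, so checks against exact differences such as `'h' - 'H'` depend on the particular library implementation. A minimal sketch of a more portable comparison follows. The names `sign_of` and `strncmp_sign` are hypothetical helpers introduced here for illustration; they are not part of the original test file.

```c
#include <string.h>

/* Hypothetical helper: collapse any comparison result to -1, 0, or 1. */
static int sign_of(int value)
{
    return (value > 0) - (value < 0);
}

/* Hypothetical wrapper: strncmp reduced to the part of its result that the
   C standard actually guarantees (negative, zero, or positive). */
int strncmp_sign(const char *str1, const char *str2, size_t n)
{
    return sign_of(strncmp(str1, str2, n));
}
```

With this wrapper, a test such as `EXPECT_EQUALS(strncmp_sign("hey", "HEY", 3), 1)` checks only the ordering, which on an ASCII system holds regardless of how the library computes the difference.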

Sample output:

$ gcc -Wall -Wextra -Werror -ggdb -std=c11 -o ./bin/tmp strncmpci.c && ./bin/tmp
-----------------------
String Comparison Tests
-----------------------

INTENTIONAL UNIT TEST FAILURE to show what a unit test failure looks like!
FAILED at line 250 in function main! strncmpci("hey", "HEY", 3) != 'h' - 'H'
  a: strncmpci("hey", "HEY", 3) is 0
  b: 'h' - 'H' is 32

------ beginning ------

All unit tests passed!
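
The snippet relies on an `EXPECT_EQUALS` macro and a `globals.error_count` counter that are defined in the full file and not shown here. To try the snippet without the full file, something along these lines might serve as a stand-in. This is only a sketch reconstructed from the sample output format above, not the author's actual definition, and it assumes both arguments are `int` expressions. The `ANSI_COLOR_*` macros are not covered.

```c
#include <stdio.h>

/* Hypothetical stand-in for the test state used by the snippet. */
static struct
{
    int error_count; /* number of failed EXPECT_EQUALS checks so far */
} globals = { 0 };

/* Evaluate each expression exactly once, compare the results as ints, and
   on mismatch bump the counter and print a report shaped like the sample
   output. #a and #b turn the argument expressions into printable strings. */
#define EXPECT_EQUALS(a, b)                                                 \
    do                                                                      \
    {                                                                       \
        int a_val_ = (a);                                                   \
        int b_val_ = (b);                                                   \
        if (a_val_ != b_val_)                                               \
        {                                                                   \
            globals.error_count++;                                          \
            printf("FAILED at line %d in function %s! %s != %s\n"           \
                   "  a: %s is %d\n"                                        \
                   "  b: %s is %d\n\n",                                     \
                   __LINE__, __func__, #a, #b, #a, a_val_, #b, b_val_);     \
        }                                                                   \
    } while (0)
```

The `do { ... } while (0)` wrapper lets the macro be used like an ordinary statement, including inside an unbraced `if`.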

References:

  1. This question & other answers here served as inspiration and gave some insight (Case Insensitive String Comparison in C)
  2. http://www.cplusplus.com/reference/cstring/strncmp/
  3. https://en.wikipedia.org/wiki/ASCII
  4. https://en.cppreference.com/w/c/language/operator_precedence
  5. Undefined Behavior research I did to fix part of my code above (see comments below):
    1. Google search for "c undefined behavior reading outside array bounds"
    2. Is accessing a global array outside its bound undefined behavior?
    3. https://en.cppreference.com/w/cpp/language/ub - see also the many really great "External links" at the bottom!
    4. 1/3: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
    5. 2/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
    6. 3/3: https://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
    7. https://blog.regehr.org/archives/213
    8. https://www.geeksforgeeks.org/accessing-array-bounds-ccpp/

Topics to further research

  1. (Note: this is C++, not C) Lowercase of Unicode character
  2. tolower_tests.c on OnlineGDB: https://onlinegdb.com/HyZieXcew

TODO:

  1. Make a version of this code which also works on Unicode's UTF-8 implementation (character encoding)!
难以启齿的温柔 2024-11-10 14:13:23

If your library doesn't provide one, you can get an idea of how to implement an efficient version from here

It uses a lookup table covering all 256 possible char values.

  • For every character other than an uppercase letter, the table entry is the character's own ASCII code.
  • For each uppercase letter, the table entry is the code of the corresponding lowercase letter.

Then we just need to walk both strings and compare the table entries for each pair of characters:

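/* charmap must be a 256-entry table of unsigned char. Reading the strings
   through unsigned char pointers keeps bytes >= 0x80 from producing a
   negative index when plain char is signed. */
const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);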
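
The fragment above leaves out the table itself. As a rough, self-contained sketch of the same idea, the table can be filled at first use instead of being written out as 256 literals. The names `charmap_init`, `charmap_ready` and `table_strcasecmp` are hypothetical, and the fill loop assumes an ASCII-compatible character set in which 'A'..'Z' and 'a'..'z' are contiguous.

```c
#include <limits.h>

/* Lookup table: charmap[c] is the lowercase form of byte c. */
static unsigned char charmap[UCHAR_MAX + 1];
static int charmap_ready = 0;

/* Fill the table once. Only ASCII 'A'..'Z' are folded to lowercase;
   every other byte (digits, punctuation, bytes >= 0x80) maps to itself. */
static void charmap_init(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
    {
        charmap[c] = (unsigned char)((c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c);
    }
    charmap_ready = 1;
}

/* Compare two NUL-terminated strings, ignoring ASCII case.
   Returns 0 if equal, otherwise the difference of the first
   mismatching pair of folded bytes. */
int table_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;

    if (!charmap_ready)
    {
        charmap_init();
    }

    while (charmap[*us1] == charmap[*us2])
    {
        if (*us1 == '\0')
        {
            return 0; /* both strings ended together */
        }
        us1++;
        us2++;
    }
    return charmap[*us1] - charmap[*us2];
}
```

The lazy initialization here is not thread-safe; a real implementation would more likely ship the table as a constant array, which is presumably what the linked code does.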
欢你一世 2024-11-10 14:13:23

Simple solution:

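#include <ctype.h>  /* tolower */

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    /* Cast to unsigned char: passing a negative char value to tolower is undefined. */
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);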
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}
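
On the question of digits: `tolower` returns its argument unchanged for anything that is not an uppercase letter, so digits and spaces in a postcode compare as themselves. A small usage sketch follows, with the loop above repeated so the block compiles on its own. `postcodes_equal` is a hypothetical helper name and the postcodes are made-up examples.

```c
#include <ctype.h>
#include <stdbool.h>

/* Same loop as str_case_ins_cmp above, repeated so this sketch is self-contained. */
static int str_case_ins_cmp(const char *a, const char *b)
{
    int rc;

    while (1)
    {
        rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (rc || !*a)
        {
            break;
        }
        ++a;
        ++b;
    }
    return rc;
}

/* Hypothetical helper: true when two postcodes match, ignoring letter case.
   Digits and spaces must still match exactly. */
bool postcodes_equal(const char *a, const char *b)
{
    return str_case_ins_cmp(a, b) == 0;
}
```

For example, `postcodes_equal("sw1a 1aa", "SW1A 1AA")` is true, while `postcodes_equal("SW1A 1AA", "SW1A 2AA")` is false because the digits differ.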