strcmp() but with 0-9 AFTER A-Z? (C/C++)

Posted on 2024-09-05 10:04:57

For reasons I completely disagree with, but which "The Powers (of Anti-Usability) That Be" continue to decree despite my objections, I have a sorting routine that does basic strcmp() compares to sort entries by name. It works great; it's hard to get that one wrong. However, at the 11th hour, it's been decided that entries which begin with a digit should come AFTER entries which begin with a letter, contrary to the ASCII ordering. They point out that EBCDIC places digits after letters, so the prior assumption isn't a universal truth, and I have no power to win this argument... but I digress.

Therein lies my problem. I've replaced all appropriate references to strcmp with a new function call nonstd_strcmp, and now need to implement the modifications to accomplish the sort change. I've used a FreeBSD source as my base: http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/libkern/strncmp.c.html

if (n == 0)
    return (0);
do {
    if (*s1 != *s2++)
        return (*(const unsigned char *)s1 -
                *(const unsigned char *)(s2 - 1));
    if (*s1++ == 0)
        break;
} while (--n != 0);
return (0);

I guess I might need to take some time away to really think about how it should be done, but I'm sure I'm not the only one who's experienced the brain-deadness of just-before-release spec changes.

落花浅忆 2024-09-12 10:04:57

What you need to do is create an ordering table for each character. This is also the easiest way to do case-insensitive comparisons as well.

if (order_table[*s1] != order_table[*s2++])

Be aware that characters might be signed, in which case the index to your table might go negative. This code is for signed chars only:

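int raw_order_table[256];
int * order_table = raw_order_table + 128;   /* valid indices: -128..127 */
for (int i = -128;  i < 128;  ++i)
    /* digits get +256 so they rank above every letter; toupper() requires
     * an unsigned char value, so cast before calling it */
    order_table[i] = (i >= '0' && i <= '9') ? i + 256 : toupper((unsigned char)i);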
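
To show how the table slots into a full comparison, here is a minimal sketch. It is not the answerer's exact code: it assumes the table is indexed by unsigned char (0..255) instead of using the signed offset above, and the names `init_order_table` and the body of `nonstd_strcmp` are illustrative. As in the table above, `toupper` makes the comparison case-insensitive.

```c
#include <ctype.h>

/* Hypothetical rank table, indexed by unsigned char so the index is never negative. */
static int order_table[256];

/* Fill the table once, before the first comparison. */
void init_order_table(void)
{
    for (int i = 0; i < 256; ++i)
        order_table[i] = isdigit(i) ? i + 256 : toupper(i);
}

/* strcmp() work-alike: digits rank after letters, case is ignored. */
int nonstd_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the ranks match and the end has not been reached. */
    while (order_table[*p1] == order_table[*p2] && *p1 != '\0') {
        ++p1;
        ++p2;
    }
    return order_table[*p1] - order_table[*p2];
}
```

Call `init_order_table()` once at startup; after that `nonstd_strcmp` can be used wherever `strcmp` was, for example inside a `qsort` comparator.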

万劫不复 2024-09-12 10:04:57

If your powers-that-be are like all the other powers-that-be that I've run into, you may want to make it an option (even if it's hidden):

Sort Order:

o Numbers after Letters

o Letters after Numbers

Or, even worse, they might decide that they want numbers to be sorted numerically (e.g. "A123" comes after "A15"), in which case the options become:

o Numbers after Letters

o Letters after Numbers

o Smart Numbers after Letters

o Letters after Smart Numbers

This gets into diagnosing the real problem, not the symptom. I bet there's a slight chance they may change their mind at the 11th hour and 59th minute.
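
If the "smart numbers" request ever does arrive, the comparison could look roughly like the sketch below. The function name `smart_number_cmp` is hypothetical, and the sketch only handles the numeric part: outside digit runs it falls back to plain byte order, so it would still need to be combined with whichever letters-versus-digits rule is in force.

```c
#include <ctype.h>

/* Hypothetical helper: compare two strings, treating runs of digits as numbers. */
int smart_number_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    while (*p != '\0' && *q != '\0') {
        if (isdigit(*p) && isdigit(*q)) {
            /* Skip leading zeros so "007" and "7" count as the same number. */
            while (*p == '0') ++p;
            while (*q == '0') ++q;
            const unsigned char *ps = p, *qs = q;
            while (isdigit(*p)) ++p;
            while (isdigit(*q)) ++q;
            /* More remaining digits means a larger number. */
            if (p - ps != q - qs)
                return (p - ps < q - qs) ? -1 : 1;
            /* Same length: the first differing digit decides. */
            for (; ps < p; ++ps, ++qs)
                if (*ps != *qs)
                    return (*ps < *qs) ? -1 : 1;
        } else {
            if (*p != *q)
                return (*p < *q) ? -1 : 1;
            ++p;
            ++q;
        }
    }
    /* The string that ran out first sorts first. */
    return (*p != '\0') - (*q != '\0');
}
```

With this, "A15" sorts before "A123" because 15 < 123, whereas plain strcmp() puts "A123" first.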

残疾 2024-09-12 10:04:57

You could use a lookup table to translate ASCII to EBCDIC when comparing characters ;-)

望她远 2024-09-12 10:04:57

In this special case with only uppercase letters (as mentioned by the OP in comments) and digits 0-9, you could also omit the order table and instead multiply both differing characters by 4 and compare the results modulo 256. The range of ASCII digits (48 to 57) will not overflow 8 bits (57 × 4 = 228), but the range of uppercase letters (65 to 90) will (65 × 4 = 260). When we compare the multiplied values modulo 256, the value for each letter will be less than that of any digit: the largest letter gives 90 × 4 % 256 = 104, which is below the smallest digit's 48 × 4 = 192.

The code might look something like:

int my_strcmp (const char *s1, const char *s2) {
    for (; *s1 == *s2 && *s1; ++s1, ++s2)
        ;   /* skip the common prefix */
    return (((*(const unsigned char *)s1) * 4) & 0xFF) -
           (((*(const unsigned char *)s2) * 4) & 0xFF);
}

Of course, the order table solution is far more versatile in general as it allows one to define a sort order for every character—this solution is sensible only for this special case with uppercase letters vs digits. (But e.g. on microcontroller platforms, saving even the small amount of memory used by the table can be a real benefit.)
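
The arithmetic behind the trick can be checked mechanically. The sketch below assumes an ASCII execution character set; the helper names `mul4_rank` and `letters_rank_below_digits` are made up for illustration.

```c
/* Hypothetical helper: the rank the multiply-by-4 trick assigns to a byte. */
static unsigned mul4_rank(unsigned char c)
{
    return (c * 4u) & 0xFFu;
}

/* Returns 1 if every uppercase ASCII letter ranks below every ASCII digit. */
int letters_rank_below_digits(void)
{
    for (int letter = 'A'; letter <= 'Z'; ++letter)
        for (int digit = '0'; digit <= '9'; ++digit)
            if (mul4_rank((unsigned char)letter) >= mul4_rank((unsigned char)digit))
                return 0;
    return 1;
}
```

Uppercase letters land in 4..104 and digits in 192..228, so the check passes. Lowercase letters would land in 132..232, which overlaps the digit range, and that is why the trick is limited to uppercase input.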

一世旳自豪 2024-09-12 10:04:57

While in general agreement with the above answers, I think it is silly to do a table lookup on every iteration of the loop, unless you expect most comparisons to differ at the first character. You could instead do

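char c1, c2;   /* order_table must accept a plain char index (possibly negative) */
/* compare raw characters; consult the table only for the first mismatch */
while((c1 = *(s1++)) == (c2 = *(s2++)) && c1 != '\0')
    ;
return order_table[c1] - order_table[c2];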

Also, I would recommend constructing the order_table with a static initializer, which will improve speed (no need to generate it every time, or ever) and perhaps readability as well.
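
One way the static initializer could look, using C99 designated initializers. This is a sketch under the assumption (taken from the uppercase-and-digits case discussed in this thread) that only 'A'..'Z' and '0'..'9' occur in the names; every other byte is left at rank 0, the same as the terminator. The function name `nonstd_strcmp_tail_lookup` is illustrative.

```c
/* Hypothetical rank table: 0 for the terminator, 1..26 for 'A'..'Z',
 * 27..36 for '0'..'9'. Entries that are not listed are implicitly 0. */
static const unsigned char order_table[256] = {
    ['A'] = 1,  ['B'] = 2,  ['C'] = 3,  ['D'] = 4,  ['E'] = 5,  ['F'] = 6,
    ['G'] = 7,  ['H'] = 8,  ['I'] = 9,  ['J'] = 10, ['K'] = 11, ['L'] = 12,
    ['M'] = 13, ['N'] = 14, ['O'] = 15, ['P'] = 16, ['Q'] = 17, ['R'] = 18,
    ['S'] = 19, ['T'] = 20, ['U'] = 21, ['V'] = 22, ['W'] = 23, ['X'] = 24,
    ['Y'] = 25, ['Z'] = 26,
    ['0'] = 27, ['1'] = 28, ['2'] = 29, ['3'] = 30, ['4'] = 31,
    ['5'] = 32, ['6'] = 33, ['7'] = 34, ['8'] = 35, ['9'] = 36,
};

int nonstd_strcmp_tail_lookup(const char *s1, const char *s2)
{
    unsigned char c1, c2;

    /* Walk the raw bytes; the table is consulted only once, at the end. */
    while ((c1 = (unsigned char)*s1++) == (c2 = (unsigned char)*s2++) && c1 != '\0')
        ;
    return order_table[c1] - order_table[c2];
}
```

Reading the characters as unsigned char keeps the table index in 0..255 regardless of whether plain char is signed on the target platform.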

一向肩并 2024-09-12 10:04:57

Here is what should be a pretty good implementation of the string compare similar to the one described by other posts.

static const unsigned char char_remap_table[256] = { 0 /* values */ };

#define char_remap(c) (char_remap_table[(unsigned char)(c)])

int nonstd_strcmp(const char * restrict A, const char * restrict B) {
     while (1) {
          char a = *A++;
          char b = *B++;
          int x = char_remap(a) - char_remap(b);
          if (x) {
               return x;
          }
          /* Still using null termination, so test that from the original char,
           * but if \0 maps to \0 or you want to use a different end of string
           * then you could use the remapped version, which would probably work
           * a little better b/c the compiler wouldn't have to keep the original
           * var a around. */
          if (!a) { /* You already know b == a here, so only one test is needed */
               return x;  /* x is already 0 and returning it allows the compiler to
                           * store it in the register that it would store function
                           * return values in without doing any extra moves. */
          }
     }
}

Beyond that, you could generalize the function to take the remap table as a parameter, which would let you easily use a different mapping later if you needed to.

int nonstd_strcmp(const char * restrict a, const char * restrict b, const unsigned char * restrict map);
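
A possible shape for the table-as-parameter variant, together with a `qsort` adapter. Everything here is a sketch: the names `nonstd_strcmp_map`, `digits_last_map`, `init_digits_last_map` and `cmp_digits_last` are hypothetical, and the map built below simply ranks the ten ASCII digits above every other byte value.

```c
#include <stdlib.h>

/* Generalised compare: `map` is a caller-supplied 256-entry rank table. */
int nonstd_strcmp_map(const char *a, const char *b, const unsigned char *map)
{
    for (;;) {
        unsigned char ca = (unsigned char)*a++;
        unsigned char cb = (unsigned char)*b++;
        int diff = map[ca] - map[cb];
        /* Stop on the first rank difference or when either string ends. */
        if (diff != 0 || ca == '\0' || cb == '\0')
            return diff;
    }
}

/* Hypothetical map: byte order is preserved, except digits rank last. */
static unsigned char digits_last_map[256];

void init_digits_last_map(void)
{
    unsigned rank = 0;
    for (int c = 0; c < 256; ++c)
        if (c < '0' || c > '9')
            digits_last_map[c] = (unsigned char)rank++;
    for (int c = '0'; c <= '9'; ++c)
        digits_last_map[c] = (unsigned char)rank++;
}

/* qsort() adapter for an array of `const char *`. */
int cmp_digits_last(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return nonstd_strcmp_map(a, b, digits_last_map);
}
```

After calling `init_digits_last_map()` once, `qsort(names, count, sizeof names[0], cmp_digits_last)` sorts an array of strings with digit-leading names at the end. Switching back to the old order, should the decision be reversed, only requires passing a different map.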