如何编写更好的 strlen 函数?

发布于 2024-11-18 16:28:07 字数 394 浏览 5 评论 0原文

我正在阅读“Write Great Code Volume 2”,它显示了以下 strlen 实现:

int myStrlen( char *s )
{
    char *start;
    start = s;
    while( *s != 0 )
    {
        ++s;
    }
    return s - start;
}

书上说这种实现对于没有经验的 C 程序员来说是典型的。过去 11 年我一直在用 C 编写代码,但我不知道如何用 C 编写比这更好的函数(我可以考虑用汇编编写更好的东西)。怎么可能用 C 写出比这更好的代码呢?我查看了 glibc 中 strlen 函数的标准库实现,但我无法理解其中的大部分内容。在哪里可以找到有关如何编写高度优化的代码的更好信息?

I am reading "Write Great Code Volume 2" and it shows the following strlen impelementation:

int myStrlen( char *s )
{
    char *start;
    start = s;
    while( *s != 0 )
    {
        ++s;
    }
    return s - start;
}

the book says that this implementation is typical for an inexperienced C programmer. I have been coding in C for the past 11 years and i can't see how to write a function better than this in C(i can think of writing better thing in assembly). How is it possible to write code better than this in C? I looked the standard library implementation of the strlen function in glibc and I couldn't understand most part of it. Where can I find better information on how to write highly optimized code?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

王权女流氓 2024-11-25 16:28:07

来自优化 strlen(),Colm MacCarthaigh 的博文:

不幸的是,在 C 中,我们注定要实现 O(n) 的实现(最好的情况),但我们还没有完成……我们可以对 n 的大小做一些事情。

它给出了一个很好的例子,告诉你可以朝什么方向努力来加快速度。以及其中的另一句话

有时走得非常非常快只会让你变得非常非常疯狂。

From Optimising strlen(), a blogpost by Colm MacCarthaigh:

Unfortunately in C, we’re doomed to an O(n) implementation, best case, but we’re still not done … we can do something about the very size of n.

It gives good example in what direction you can work to speed it up. And another quote from it

Sometimes going really really fast just makes you really really insane.

可爱咩 2024-11-25 16:28:07

维克多,看看这个:
http://en.wikipedia.org/wiki/Strlen#Implementation

PS 你的原因不明白 glibc 版本可能是因为它使用位移位来查找 \0。

Victor, take a look at this:
http://en.wikipedia.org/wiki/Strlen#Implementation

P.S. The reason you don't understand the glibc version is probably because it uses bit shifting to find the \0.

风柔一江水 2024-11-25 16:28:07

对于初学者来说,这对于像 UTF-8 这样的编码来说毫无价值……也就是说,计算 UTF-8 字符串中的字符数更复杂,而字节数当然就像计算一样容易,比方说,一个 ASCII 字符串。

一般来说,您可以通过读取更大的寄存器来在某些平台上进行优化。由于到目前为止发布的其他链接没有相关示例,因此这里有一些低端字节序的伪代码:

int size = 0;
int x;
int *caststring = (int *) yourstring;
while (int x = *caststring++) {
  if (!(x & 0xff)) /* first byte in this int-sized package is 0 */ return size;
  else if (!(x & 0xff00)) /* second byte etc. */ return size+1;
  /* rinse and repeat depending on target architecture, i.e. twice more for 32 bit */
  size += sizeof (int);
}

For starters, this is worthless for encodings like UTF-8... that is, calculating the number of characters in an UTF-8 string is more complicated, whereas the number of bytes is, of course, just as easy to calculate as in, say, an ASCII string.

In general, you can optimize on some platforms by reading into larger registers. Since the other links posted so far don't have an example of that, here's a bit of pseudo-pseudocode for lower endian:

int size = 0;
int x;
int *caststring = (int *) yourstring;
while (int x = *caststring++) {
  if (!(x & 0xff)) /* first byte in this int-sized package is 0 */ return size;
  else if (!(x & 0xff00)) /* second byte etc. */ return size+1;
  /* rinse and repeat depending on target architecture, i.e. twice more for 32 bit */
  size += sizeof (int);
}
梦年海沫深 2024-11-25 16:28:07

正如其他人指出的那样,更快的算法会读取整个单词而不是单个字符,并使用按位操作来找到终止空值。如果您采用这种方法,请注意字对齐指针,因为某些 CPU 架构不允许您从未对齐的地址读取字(即使在不需要对齐的架构上,这也是触发段错误的好方法)。

底线

优秀的代码在除对性能最关键的情况外的所有情况下都强调可读性而不是速度。尽可能清晰地编写代码,并且只优化被证明是瓶颈的部分。

As others have pointed out, a faster algorithm reads entire words instead of individual characters and uses bitwise operations to find the terminating null. Be mindful of word-aligning your pointer if you take this approach, as some CPU architectures won't let you read words from an unaligned address (and it's a great way to trigger a segfault even on architectures that don't require alignment).

Bottom line:

Great code emphasizes readability over speed in all but the most performance-critical cases. Write your code as clearly as you can and only optimize the parts that prove to be bottlenecks.

美男兮 2024-11-25 16:28:07

读取与机器数据总线大小不同的变量的成本很高,因为机器只能读取该大小的变量。因此,每当请求不同大小(假设更小)的东西时,机器必须做一些工作以使其看起来像请求大小的变量(例如移位位)。
因此,您最好以机器大小的字读取数据,然后使用 AND 运算检查是否有 0。另外,在扫描字符串时,请确保从对齐的起始地址开始。

Reading a variable that is not of the same size as the machine data bus size is expensive, because the machine can only read variables of that size. Therefore, whenever something of different size (let's say smaller) is requested, the machine must do work to make it look like a variable of the requested size (like shifting the bits).
So you better read the data in machine sized words, and then use the AND operation to check for 0s. Also, when scanning through the string, make sure you start at an aligned start address.

梦里梦着梦中梦 2024-11-25 16:28:07

回答OP关于在哪里可以找到如何编写性能代码的建议的问题,这里是 链接 麻省理工学院关于编写优化 C 代码的开放课程(查找页面左侧的“材料”链接)。

Answering OP's question about where to find suggestions how to write code for performance, here's link to MIT OpenCourse on writing Optimized C Code (look for "Materials" link on the left of page).

热血少△年 2024-11-25 16:28:07

以下应该比朴素算法更快并且适用于 32/64 位。

union intptr {
    char* c;
    long* l;
#define LSIZE sizeof(long)
};

#define aligned_(x, a) \
    ((unsigned long) (x) % (a) == 0)

#define punpktt_(x, from, to) \
    ((to) (-1)/(from) (-1)*(from) (x))
#define punpkbl_(x) \
    punpktt_(x, unsigned char, unsigned long)

#define plessbl_(x, y) \
    (((x) - punpkbl_(y)) & ~(x) & punpkbl_(0x80))
#define pzerobl_(x) \
    plessbl_(x, 1)

static inline unsigned long maskffs_(unsigned long x)
{
    unsigned long acc = 0x00010203UL;
    if (LSIZE == 8)
       acc = ((acc << 16) << 16) | 0x04050607UL;
    return ((x & -x) >> 7) * acc >> (LSIZE*8-8);
}

size_t strlen(const char* base)
{
    union intptr p = { (char*) base };
    unsigned long mask;

    for ( ; !aligned_(p.c, LSIZE); p.c++ )
        if (*p.c == 0)
            return p.c - base;

    while ( !(mask = pzerobl_(*p.l)) )
        p.l++;
    return p.c - base + maskffs_(mask);
}

The following should be faster than the naive algorithm and work for 32/64 bit.

union intptr {
    char* c;
    long* l;
#define LSIZE sizeof(long)
};

#define aligned_(x, a) \
    ((unsigned long) (x) % (a) == 0)

#define punpktt_(x, from, to) \
    ((to) (-1)/(from) (-1)*(from) (x))
#define punpkbl_(x) \
    punpktt_(x, unsigned char, unsigned long)

#define plessbl_(x, y) \
    (((x) - punpkbl_(y)) & ~(x) & punpkbl_(0x80))
#define pzerobl_(x) \
    plessbl_(x, 1)

static inline unsigned long maskffs_(unsigned long x)
{
    unsigned long acc = 0x00010203UL;
    if (LSIZE == 8)
       acc = ((acc << 16) << 16) | 0x04050607UL;
    return ((x & -x) >> 7) * acc >> (LSIZE*8-8);
}

size_t strlen(const char* base)
{
    union intptr p = { (char*) base };
    unsigned long mask;

    for ( ; !aligned_(p.c, LSIZE); p.c++ )
        if (*p.c == 0)
            return p.c - base;

    while ( !(mask = pzerobl_(*p.l)) )
        p.l++;
    return p.c - base + maskffs_(mask);
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文