
Why use "strlen30()" instead of "strlen()"?

Posted 2024-11-27 01:34:12


I've been reading the SQLite source code and wondered about this function:

static int strlen30(const char *z){
  const char *z2 = z;
  while( *z2 ){ z2++; }
  return 0x3fffffff & (int)(z2 - z);
}

Why use strlen30() instead of strlen() (from string.h)?


灯角 2024-12-04 01:34:12

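The commit message for this change says:

[793aaebd8024896c] Part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)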


最好是你 2024-12-04 01:34:12


(This is my answer from "Why reimplement strlen as loop+subtraction?", but that question was closed.)

I can't tell you why they had to re-implement it, or why they chose int instead of size_t as the return type. But about the function itself:

/*
 ** Compute a string length that is limited to what can be stored in
 ** lower 30 bits of a 32-bit signed integer.
 */
static int strlen30(const char *z){
    const char *z2 = z;
    while( *z2 ){ z2++; }
    return 0x3fffffff & (int)(z2 - z);
}

Standard References

The C++ standard (ISO/IEC 14882:2003(E)) says in 3.9.1 Fundamental types, paragraph 4:

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2ⁿ where n is the number of bits in the value representation of that particular size of integer. 41)

...

⁴¹⁾: This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.

That part of the standard does not define the overflow behaviour of signed integers. Looking at clause 5 Expressions, paragraph 5:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function. ]

So much for overflow.

As for subtracting two pointers to array elements, 5.7 Additive operators, paragraph 6, says:

When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]

Looking at 18.1:

The contents are the same as the Standard C library header stddef.h

So let's look at the C standard (I only have a copy of C99, though), 7.17 Common definitions:

The types used for size_t and ptrdiff_t should not have an integer conversion rank greater than that of signed long int unless the implementation supports objects large enough to make this necessary.

7.17 makes no further guarantee about ptrdiff_t (7.18.3 only requires PTRDIFF_MAX to be at least 65535). Annex E (still ISO/IEC 9899:TC2) then gives the minimum magnitude for signed long int, but no maximum:

#define LONG_MAX +2147483647

Now, what is the maximum for int, the return type of SQLite's strlen30()? Skipping the C++ quotation that once again forwards us to the C standard, C99 Annex E gives the minimum value of INT_MAX:

#define INT_MAX +32767

Summary

Usually, ptrdiff_t is no wider than signed long, which is at least 32 bits.
int is only required to be at least 16 bits wide.
Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
As quoted above, for signed types an arithmetic result that does not fit yields undefined behaviour.
strlen30 applies a bitwise AND to the pointer-subtraction result:

          | 32 bit                         |
ptr_diff  |10111101111110011110111110011111| // could be even larger
&         |00111111111111111111111111111111| // == 0x3FFFFFFF
          ----------------------------------
=         |00111101111110011110111110011111| // truncated

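The mask truncates the pointer-subtraction result to a maximum value of 3FFFFFFF₁₆ = 1073741823₁₀, so the returned length is never negative and always fits in a 32-bit int. Strictly speaking, the cast to int happens before the mask, and converting an out-of-range value to a signed type is implementation-defined rather than undefined (C99 6.3.1.3, C++03 4.7); on the usual two's-complement platforms that conversion keeps the low-order bits, and the mask then clears everything above bit 29.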

I am not sure why they chose exactly that value: on most machines only the most significant bit carries the sign, so 31 bits would have been available. Choosing the standard's minimum INT_MAX might have made sense, but without knowing more details 1073741823 is indeed slightly strange (though it of course does exactly what the comment above the function says: truncate to 30 bits and prevent overflow).