BitSet.size() returns a negative value. Known bug?

Posted 2025-01-16 13:10:03

new BitSet(Integer.MAX_VALUE).size() reports a negative value:

import java.util.BitSet;

public class NegativeBitSetSize {
    public static void main(String[] args) {
        BitSet a;

        a = new BitSet(Integer.MAX_VALUE);
        System.out.println(a.size()); // -2147483648

        a = new BitSet(Integer.MAX_VALUE - 50);
        System.out.println(a.size()); // -2147483648

        a = new BitSet(Integer.MAX_VALUE - 62);
        System.out.println(a.size()); // -2147483648

        a = new BitSet(Integer.MAX_VALUE - 63);
        System.out.println(a.size()); // 2147483584
    }
}

On the test system:

$ java -version
openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment (build 11.0.14+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.14+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

I couldn't find a bug report for this. Is this known or documented?

伤痕我心 2025-01-23 13:10:03

I doubt this would be documented. It certainly won't be 'fixed', as there is no sensible fix available that doesn't break backwards compatibility, and it is nowhere near relevant enough to take such drastic steps.

Digging under the hood - why is this happening?

Whilst the API docs make no such guarantee, the effect of size() is that it simply returns the nBits value you passed when you constructed the BitSet instance... but rounded up to the next value that is evenly divisible by 64:

System.out.println(new BitSet(1).size());   // 64
System.out.println(new BitSet(63).size());  // 64
System.out.println(new BitSet(64).size());  // 64
System.out.println(new BitSet(65).size());  // 128
System.out.println(new BitSet(100).size()); // 128
System.out.println(new BitSet(128).size()); // 128
System.out.println(new BitSet(129).size()); // 192

This is logical: the implementation stores the bits in an array of long values. That is more efficient (by a factor of 8!) than using e.g. a boolean[], because each boolean in an array still takes up a whole byte, and a boolean held as a lone variable typically takes up more than that.

The spec doesn't guarantee this, but it explains why this is happening.

It then also explains why you are witnessing what you are: Integer.MAX_VALUE is 2147483647. Round that up to the nearest multiple of 64 and you get... 2147483648, which overflows int. Integer.MAX_VALUE + 1 and (int) 2147483648L are both the same value: -2147483648. That is the one value that exists in signed int space as a negative number with no matching positive number. That makes sense too: some bit sequence needs to represent 0, which is neither positive nor negative. By the rules of two's complement, which is how Java represents all integer values in bit form, 0 sits in the 'positive' space (given that it's all 0 bits). It thus takes up one slot there, and the positive number that gets pushed out is 2147483648.
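The rounding and the overflow can be reproduced without allocating anything. The following is a minimal sketch, assuming the size is "number of 64-bit words times 64" as described above; the class and method names (BitSetSizeMath, wordsFor, sizeAsInt, sizeAsLong) are made up for illustration and are not part of the JDK.

```java
// Sketch only: mirrors the rounding described above; names are hypothetical.
class BitSetSizeMath {
    static final int BITS_PER_WORD = 64;

    // Number of 64-bit words needed to hold nBits bits (nBits >= 0).
    static int wordsFor(int nBits) {
        return ((nBits - 1) >> 6) + 1;
    }

    // The size computed in int arithmetic: the multiplication can overflow.
    static int sizeAsInt(int nBits) {
        return wordsFor(nBits) * BITS_PER_WORD;
    }

    // The same computation in long arithmetic, which cannot overflow here.
    static long sizeAsLong(int nBits) {
        return (long) wordsFor(nBits) * BITS_PER_WORD;
    }

    public static void main(String[] args) {
        int[] samples = {
            1, 64, 65,
            Integer.MAX_VALUE - 63, Integer.MAX_VALUE - 62, Integer.MAX_VALUE
        };
        for (int nBits : samples) {
            System.out.println(nBits + " -> int " + sizeAsInt(nBits)
                    + ", long " + sizeAsLong(nBits));
        }
    }
}
```

For Integer.MAX_VALUE - 62 and above, the int result wraps to Integer.MIN_VALUE while the long result is 2147483648; for Integer.MAX_VALUE - 63 both agree on 2147483584.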

Let's fix it!

One easy fix is to have the size() method return a long instead, which can trivially represent 2147483648, which is the correct answer. Unfortunately, this is not backwards compatible. Hence, a request for that change is extremely unlikely to succeed.

Another fix is to create a second method that does return long, with some throw-in-the-towel name such as accurateSize() or whatnot, so that size() remains untouched and backwards compatibility is preserved. But this is dirtying up the API forever, for a detail that is irrelevant for all cases except the largest 63 numbers you can ask for. (Integer.MAX_VALUE-62 through Integer.MAX_VALUE are the only values you can pass for nBits which result in size() returning a negative value. The negative value returned will always be Integer.MIN_VALUE.) I doubt they'd do that.

A third fix is to lie and return Integer.MAX_VALUE instead, which isn't quite the right value (as 1 bit more is in fact 'available' in the bit space). The argument for it would be that you can't ask the constructor for 2147483648 bits: you must pass an int, that number is not representable as an int, and if you try you end up with -2147483648, which is negative and causes the constructor to throw, hence not giving you an instance. The catch is that bits are indexed from 0, so the 2147483648th bit is the one at index Integer.MAX_VALUE, and that one can be set; a size() of Integer.MAX_VALUE would under-report by exactly that bit. Only indices beyond that are out of reach without hackery such as using reflection to set private fields, which APIs do not need to address.
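The constructor behaviour mentioned here is cheap to check. A small sketch, relying only on the documented NegativeArraySizeException that BitSet(int) throws for a negative argument; the class name NegativeNBits and the helper constructorRejects are made up for this example.

```java
import java.util.BitSet;

class NegativeNBits {
    // Hypothetical helper: true if new BitSet(nBits) throws
    // NegativeArraySizeException, false if construction succeeds.
    static boolean constructorRejects(int nBits) {
        try {
            new BitSet(nBits);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Narrowing 2147483648L to int keeps the low 32 bits,
        // which is the bit pattern of Integer.MIN_VALUE.
        int wrapped = (int) 2147483648L;
        System.out.println(wrapped == Integer.MIN_VALUE);
        System.out.println(constructorRejects(wrapped));
        System.out.println(constructorRejects(64));
    }
}
```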

This then gets us to what the point of size() is. Is it for telling you how much memory the BitSet object occupies? If that's the point, it's never been a great way to go about it: the JVM doesn't guarantee that a long[]'s memory size is arrSize*8 bytes (though in practice JVM implementations use that, plus some low overhead for the array's header structure).

Instead it is perhaps simply to let you know what you can do with it. Even if you call, say, new BitSet(5), you can still set the 6th bit (because why not - it doesn't "cost" anything, I guess that was the intent). You can set all bits from 0 up to .size() minus 1 without the set having to grow; higher indices work too, but then the backing storage is enlarged.
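A quick way to see this in action is to compare size() before and after a set() call. This is a sketch under the assumption that size() only changes when the backing storage is enlarged; BitSetGrowth and fitsWithoutGrowing are names invented for the example.

```java
import java.util.BitSet;

class BitSetGrowth {
    // Hypothetical helper: sets the bit at index and reports whether
    // size() stayed the same, i.e. the bit fit in the existing storage.
    static boolean fitsWithoutGrowing(BitSet bits, int index) {
        int before = bits.size();
        bits.set(index);
        return bits.size() == before;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(5);
        System.out.println("initial size: " + bits.size());

        // Index 5 is the 6th bit, one more than was asked for.
        System.out.println("bit 5 fits: " + fitsWithoutGrowing(bits, 5));

        // The last index covered by the current size always fits.
        System.out.println("last index fits: "
                + fitsWithoutGrowing(bits, bits.size() - 1));

        // One past the current size forces the set to grow.
        System.out.println("one past fits: "
                + fitsWithoutGrowing(bits, bits.size()));
        System.out.println("size now: " + bits.size());
    }
}
```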

And this gets us to the real answer!

size() is not actually broken. The number returned is entirely correct: That is, in fact, the size. It's merely that when you print it, it 'prints wrong' - because size()'s return value should be interpreted as unsigned. The javadoc of size() explicitly calls out its only point, which is to take that number, and subtract 1: This then tells you the maximum element you can set.
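If you do need the size as a printable or comparable number, the unsigned reading can be made explicit with the standard Integer.toUnsignedLong and Integer.toUnsignedString methods (available since Java 8). The sketch below uses Integer.MIN_VALUE directly as a stand-in for what size() reported in the question, to avoid allocating a 256 MB BitSet; UnsignedBitSetSize and unsignedSize are hypothetical names, not JDK API.

```java
import java.util.BitSet;

class UnsignedBitSetSize {
    // Hypothetical helper: reads size() as an unsigned 32-bit value.
    static long unsignedSize(BitSet bits) {
        return Integer.toUnsignedLong(bits.size());
    }

    public static void main(String[] args) {
        // Stand-in for new BitSet(Integer.MAX_VALUE).size() from the question.
        int raw = Integer.MIN_VALUE;

        System.out.println(Integer.toUnsignedString(raw)); // 2147483648
        System.out.println(Integer.toUnsignedLong(raw));   // 2147483648

        // Subtracting 1 wraps around to the largest valid index.
        System.out.println(raw - 1 == Integer.MAX_VALUE);  // true

        // For ordinary sizes the helper agrees with size().
        BitSet small = new BitSet(100);
        System.out.println(unsignedSize(small) == small.size());
    }
}
```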

And this works just fine:

BitSet x = new BitSet(Integer.MAX_VALUE);
int maxIndex = x.size() - 1;
System.out.println(maxIndex);
x.set(maxIndex);

The above code works fine. That maxIndex value is 2147483647 (which is Integer.MAX_VALUE), as expected.

Hence, there's really nothing to do here: the API is fine as is, and does accurately what it suggests you use it for. Any API you care to come up with that's 'better' would be backwards incompatible. Changing BitSet is not a good idea, and adding more methods, java.util.Vector style, uglies up the API, which is definitely a case of the cure being worse than the disease.

That just leaves adding notes to the docs. If you delve into this level of exotics in docs, you end up with huge documentation that is, again, a cure worse than the disease. The sustainable solution would perhaps be for javadoc to gain a fundamental ability to write esoteric footnotes, which e.g. the javadoc tool can turn into HTML by way of a 'folding' popdown interface element that is folded up by default (i.e. the exotic footnotes are not visible), but can be expanded if you really want to read the details.

Javadoc doesn't have this.

CONCLUSION: One can easily argue the API isn't broken at all; nothing in size() explicitly says that the returned value should be interpreted as a signed int; the only explicit promise is that you can subtract 1 from the result and use that as an index, which works fine. At best, you could file a bug report to get the docs updated, except that's not a good idea because it's not (easily) possible to add such esoterics to the documentation. If you do want to go down that path, there's a lot more of this sort of thing in the JDK libraries that isn't documented either.
