Switch to BigInteger if necessary
I am reading a text file which contains numbers in the range [1, 10^100]. I am then performing a sequence of arithmetic operations on each number. I would like to use a BigInteger only if the number is out of the int/long range. One approach would be to count how many digits there are in the string and switch to BigInteger if there are too many. Otherwise I'd just use primitive arithmetic as it is faster. Is there a better way?
Is there any reason why Java could not do this automatically i.e. switch to BigInteger if an int was too small? This way we would not have to worry about overflows.
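The digit-counting idea from the question could be sketched as follows. This is only an illustration: the class name `DigitCountParser`, the method `parse`, and the 18-digit threshold are assumptions, not part of the original post, and the input is assumed to be a plain positive decimal string.

```java
import java.math.BigInteger;

final class DigitCountParser {
    // Long.MAX_VALUE is 9223372036854775807 (19 digits), so every positive
    // decimal string of at most 18 digits is guaranteed to fit in a long.
    // Some 19-digit values would also fit, but they take the slow path here.
    static final int SAFE_LONG_DIGITS = 18;

    /** Returns a Long for short inputs and a BigInteger for everything else. */
    static Number parse(String text) {
        String digits = text.trim();
        if (digits.length() <= SAFE_LONG_DIGITS) {
            return Long.valueOf(digits);
        }
        return new BigInteger(digits);
    }

    public static void main(String[] args) {
        System.out.println(parse("42").getClass().getSimpleName());
        System.out.println(parse(BigInteger.TEN.pow(100).toString()).getClass().getSimpleName());
    }
}
```

The caller still has to branch on the runtime type of the result, which is the awkward part the answers below discuss.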
Comments (7)
I suspect the decision to use primitive values for integers and reals (made for performance reasons) ruled that option out. Note that Python and Ruby both do what you ask.
In this case it may be more work to handle the smaller special case than it is worth (you would need some custom class to handle the two cases), and you should just use `BigInteger`.

Because that is higher-level programming behavior than what Java currently offers. The language is not even aware of the `BigInteger` class and what it does (i.e. it's not in the JLS). It's only aware of `Integer` (among other things) for boxing and unboxing purposes.
Speaking of boxing/unboxing, an `int` is a primitive type; `BigInteger` is a reference type. You can't have a variable that can hold values of both types.
You could read the values into `BigInteger`s, and then convert them to `long`s if they're small enough.
(You could make the return value a `List<Number>` and have it be a mixed collection of `BigInteger` and `Long` objects, but that wouldn't look very nice and wouldn't improve performance by much.)
The performance may be better if a large share of the numbers in the file are small enough to fit in a `long` (depending on the complexity of the calculation). There's still a risk of overflow depending on what you do in `primitiveCalculation`, and you've now duplicated the code, (at least) doubling the bug potential, so you'll have to decide whether the performance gain really is worth it.
If your code is anything like my example, though, you'd probably have more to gain by parallelizing the code so the calculations and the I/O aren't performed on the same thread. You'd have to do some pretty heavy calculations for an architecture like that to be CPU-bound.
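A minimal sketch of the read-then-narrow approach described in this answer might look like the following. Only the name `primitiveCalculation` comes from the text; the class `NarrowingCalculator`, the methods `bigCalculation` and `process`, and the placeholder arithmetic are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NarrowingCalculator {
    private static final long MODULUS = 1_000_003L;

    /** Placeholder primitive path; it cannot overflow for any long input. */
    static long primitiveCalculation(long n) {
        return n % MODULUS + 1;
    }

    /** The same placeholder arithmetic on the BigInteger path. */
    static BigInteger bigCalculation(BigInteger n) {
        return n.mod(BigInteger.valueOf(MODULUS)).add(BigInteger.ONE);
    }

    /** Parses each line as a BigInteger and takes the fast path when it fits in a long. */
    static List<BigInteger> process(List<String> lines) {
        List<BigInteger> results = new ArrayList<>();
        for (String line : lines) {
            BigInteger value = new BigInteger(line.trim());
            // bitLength() excludes the sign bit, so <= 63 means the value fits in a long.
            if (value.bitLength() <= 63) {
                results.add(BigInteger.valueOf(primitiveCalculation(value.longValue())));
            } else {
                results.add(bigCalculation(value));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("12", BigInteger.TEN.pow(100).toString())));
    }
}
```

Both paths must implement the same arithmetic, which is the duplication the answer warns about. The two agree here only because the inputs are positive; `%` and `mod` differ for negative operands.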
The impact of using BigDecimals when something smaller will suffice is surprisingly, err, big: Running the following code
produces, on my system, this output:
The `MyLong` class is there only to look at the effects of boxing, to compare against what you would get with a custom `BigOrLong` class.
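A comparison of this kind could be sketched as below. This is not the benchmark from the answer: the class `BoxingBenchmark`, the summing workload, and the shape of `MyLong` are assumptions, and it uses `BigInteger` for the arbitrary-precision case. The timing is naive (no warm-up, a single run), so the numbers are only a rough indication; a harness such as JMH would give more trustworthy results.

```java
import java.math.BigInteger;

final class BoxingBenchmark {
    /** Hypothetical immutable wrapper, used only to expose the cost of boxing. */
    static final class MyLong {
        final long value;
        MyLong(long value) { this.value = value; }
        MyLong add(MyLong other) { return new MyLong(value + other.value); }
    }

    static long sumPrimitive(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    static long sumBoxed(int n) {
        MyLong sum = new MyLong(0);
        for (int i = 0; i < n; i++) sum = sum.add(new MyLong(i));
        return sum.value;
    }

    static BigInteger sumBig(int n) {
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < n; i++) sum = sum.add(BigInteger.valueOf(i));
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long t0 = System.nanoTime();
        long a = sumPrimitive(n);
        long t1 = System.nanoTime();
        long b = sumBoxed(n);
        long t2 = System.nanoTime();
        BigInteger c = sumBig(n);
        long t3 = System.nanoTime();
        // Printing the sums keeps the JIT from discarding the loops as dead code.
        System.out.printf("long:       %d ms (sum %d)%n", (t1 - t0) / 1_000_000, a);
        System.out.printf("MyLong:     %d ms (sum %d)%n", (t2 - t1) / 1_000_000, b);
        System.out.printf("BigInteger: %d ms (sum %s)%n", (t3 - t2) / 1_000_000, c);
    }
}
```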
Java is fast, really really fast. It's only 2-4x slower than C and sometimes as fast or a tad faster, where most other languages are 10x (Python) to 100x (Ruby) slower than C/Java. (Fortran is also hella-fast, by the way.)
Part of this is because it doesn't do things like switch number types for you. It could, but currently it can inline an operation like "a*5" in just a few bytes; imagine the hoops it would have to go through if a were an object. It would at least be a dynamic call to a's multiply method, which would be a few hundred or a few thousand times slower than when a is simply an integer value.
Java probably could, these days, use JIT compilation to optimize the call better and inline it at runtime, but even then very few library calls support BigInteger/BigDecimal, so it would need a LOT of native support; it would be a completely new language.
Also imagine how switching from int to BigInteger instead of long would make debugging video games crazy-hard! (Yeah, every time we move to the right side of the screen the game slows down by 50x, the code is all the same! How is this possible?!??)
Would it have been possible? Yes. But there are many problems with it.
Consider, for instance, that Java stores references to BigInteger objects, which are actually allocated on the heap, but stores int literals directly. The difference can be made clear in C:
Now, to automatically go from a literal to a reference, one would necessarily have to annotate the literal somehow. For instance, if the highest bit of the int was set, then the other bits could be used as a table lookup of some sort to retrieve the proper reference. That also means you'd get a `BigInt** bi` whenever it overflowed into that.
Of course, that's the bit usually used for sign, and hardware instructions pretty much depend on it. Worse still, if we do that, then the hardware won't be able to detect overflow and set the flags to indicate it. As a result, each operation would have to be accompanied by some test to see if an overflow has happened or will happen (depending on when it can be detected).
All that would add a lot of overhead to basic integer arithmetic, which would in practice negate any benefits you had to begin with. In other words, it is faster to assume BigInt than it is to try to use int and detect overflow conditions while at the same time juggling the reference/literal problem.
So, to get any real advantage, one would have to use more space to represent ints. Instead of storing 32 bits on the stack, in objects, or anywhere else we use them, we would store 64 bits, for example, and use the additional 32 bits to control whether we want a reference or a literal. That could work, but there's an obvious problem with it: space usage. :-) We might see more of it with 64-bit hardware, though.
Now, you might ask why not just 40 bits (32 bits + 1 byte) instead of 64? Basically, on modern hardware it is preferable to store stuff in 32-bit increments for performance reasons, so we'd be padding 40 bits to 64 bits anyway.
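The per-operation overflow test mentioned above can be sketched in Java. The class `OverflowCheck` and the method `addOverflows` are hypothetical names used for illustration. `Math.addExact` is the standard library's checked addition (available since Java 8), which throws `ArithmeticException` instead of wrapping.

```java
final class OverflowCheck {
    /** Reports whether a + b overflows the int range, using a sign-bit test. */
    static boolean addOverflows(int a, int b) {
        int r = a + b; // wraps silently on overflow
        // Overflow happened exactly when both operands share a sign
        // and the wrapped result has the opposite sign.
        return ((a ^ r) & (b ^ r)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE + 1); // plain int arithmetic wraps around
        System.out.println(addOverflows(Integer.MAX_VALUE, 1));
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("addExact threw: " + e.getMessage());
        }
    }
}
```

Every addition now costs an extra comparison and branch, which is the overhead this answer describes.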
EDIT
Let's consider how one could go about doing this in C#. Now, I have no programming experience with C#, so I can't write the code to do it, but I expect I can give an overview.
The idea is to create a struct for it. It should look roughly like this:
So, if the number is in the integer range we use int, otherwise we use BigInteger. The operations have to ensure the transition from one to the other as required/possible. From the client's point of view, this is transparent. It's just one type, MixedInt, and the class takes care of using whatever fits better.
Note, however, that this kind of optimization may well be part of C#'s BigInteger already, given its implementation as a struct.
If Java had something like C#'s struct, we could do something like this in Java as well.
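Without value types, a class-based approximation of the same idea is still possible in Java, at the cost of an allocation per result. The sketch below is an assumption-laden illustration, not the struct from the answer: the field layout and method names are invented, and it uses `long` rather than `int` for the small representation.

```java
import java.math.BigInteger;

/** Hypothetical MixedInt: a long while the value fits, a BigInteger otherwise. */
final class MixedInt {
    private final long small;     // meaningful only when big == null
    private final BigInteger big; // non-null only when the value does not fit in a long

    private MixedInt(long small, BigInteger big) {
        this.small = small;
        this.big = big;
    }

    static MixedInt of(long value) {
        return new MixedInt(value, null);
    }

    static MixedInt of(BigInteger value) {
        // Narrow back to the primitive representation whenever possible.
        return value.bitLength() <= 63 ? of(value.longValue()) : new MixedInt(0, value);
    }

    boolean isSmall() {
        return big == null;
    }

    BigInteger toBigInteger() {
        return isSmall() ? BigInteger.valueOf(small) : big;
    }

    MixedInt add(MixedInt other) {
        if (isSmall() && other.isSmall()) {
            try {
                return of(Math.addExact(small, other.small));
            } catch (ArithmeticException overflow) {
                // Fall through to the BigInteger path.
            }
        }
        return of(toBigInteger().add(other.toBigInteger()));
    }

    MixedInt multiply(MixedInt other) {
        if (isSmall() && other.isSmall()) {
            try {
                return of(Math.multiplyExact(small, other.small));
            } catch (ArithmeticException overflow) {
                // Fall through to the BigInteger path.
            }
        }
        return of(toBigInteger().multiply(other.toBigInteger()));
    }

    @Override
    public String toString() {
        return isSmall() ? Long.toString(small) : big.toString();
    }
}
```

For example, `MixedInt.of(Long.MAX_VALUE).add(MixedInt.of(1))` switches to the `BigInteger` representation, and adding `MixedInt.of(-1)` to that result narrows it back to a `long`.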
This is one of the advantages of dynamic typing, but Java is statically typed, which prevents this.
In a dynamically typed language, when summing two `Integer`s would produce an overflow, the system is free to return, say, a `Long`. Because dynamically typed languages rely on duck typing, that's fine. The same cannot happen in a statically typed language; it would break the type system.
EDIT
Given that my answer and comment were not clear, here I try to provide more details on why I think static typing is the main issue:
1) The very fact that we speak of primitive types is a static typing issue; we wouldn't care in a dynamically typed language.
2) With primitive types, the result of the overflow cannot be converted to a type other than `int`, because that would not be correct w.r.t. static typing.
3) With reference types, it's the same except that we have autoboxing. Still, the addition could not return, say, a `BigInteger`, because it would not match the static type system (a `BigInteger` cannot be cast to `Integer`).
4) What could be done is to subclass, say, `Number` and implement a type `UnboundedNumeric` that optimizes the representation internally (representation independence).
Still, it's not really the answer to the original question.
5) With dynamic typing, something like
would return a `Long`, which is ok.
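The static-typing constraint in points 2 to 4 can be illustrated with a short Java sketch. The class `StaticTypingDemo` and both method names are hypothetical. The point is that a result can only change representation if the declared return type is a common supertype such as `Number`.

```java
final class StaticTypingDemo {
    /** The declared type is int, so an overflowing sum has no choice but to wrap. */
    static int addInts(int a, int b) {
        return a + b;
    }

    /** Declaring the supertype Number lets the result be an Integer or a Long. */
    static Number addWidening(Integer a, Integer b) {
        long sum = a.longValue() + b.longValue(); // 64-bit sum of two ints cannot overflow
        if (sum >= Integer.MIN_VALUE && sum <= Integer.MAX_VALUE) {
            return Integer.valueOf((int) sum);
        }
        return Long.valueOf(sum);
    }

    public static void main(String[] args) {
        System.out.println(addInts(Integer.MAX_VALUE, 1));
        System.out.println(addWidening(Integer.MAX_VALUE, 1).getClass().getSimpleName());
        // Integer bad = addWidening(1, 2); // does not compile: a Number is not an Integer
    }
}
```

The caller of `addWidening` only knows it has a `Number` and must inspect or convert it before doing further arithmetic, which is the friction a dynamically typed language hides.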