为什么在处理二进制字符串的问题中使用整数比使用字符串慢得多?
我解决了以下leet代码问题https://leetcode。 com/problems/find-kth-bit-in-nth-binary-string/ 使用字符串(类 Solution
)和整数(类解决方案2
)。我认为整数解决方案应该更快并且使用更少的内存。但实际上需要更长的时间。当 n=18 和 k=200 时,需要 10.1 秒,而字符串解决方案需要 0.13 秒。
import time
class Solution:
def findKthBit(self, n: int, k: int) -> str:
i = 0
Sn = "0"
Sn = self.findR(Sn, i, n)
return Sn[k-1]
def findR(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSn(Sn, i)
return self.findR(newSn, i+1, n)
def calcNewSn(self, Sn, i):
inverted = ""
for c in Sn:
inverted += "1" if c == "0" else "0"
newSn = Sn + "1" + (inverted)[::-1]
return newSn
class Solution2:
def findKthBitBin(self, n: int, k: int) -> str:
i = 0
# MSB (S1) has to be 1 for bit operations but is actually always 0
Sn = 1
Sn = self.findRBin(Sn, i, n)
lenSn = (2**(n+1)) - 1
return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)
def findRBin(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSnBin(Sn, i)
return self.findRBin(newSn, i+1, n)
def calcNewSnBin(self, Sn, i):
lenSn = 2**(i+1) - 1
newSn = (Sn << 1) | 1
inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
newSn = self.reverseBits(newSn, inverted, lenSn)
return newSn
def getMask(self, lenSn):
mask = 0
for i in range(lenSn):
mask |= (1 << i)
return mask
def reverseBits(self, newSn, inverted, lenSn):
for i in range(lenSn):
newSn <<= 1
newSn |= inverted & 1
inverted >>= 1
return newSn
sol = Solution()
sol2 = Solution2()
start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")
为什么使用整数的解决方案这么慢?与字符串相比,其他语言中大整数的实现是否更快?
I solved the following leet code problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ by using strings (class Solution
) and integers (class Solution2
). I thought the integer solution should be a lot faster and using much less memory. But actually it takes a lot longer. With n=18 and k=200 it took 10.1 seconds compared to 0.13 seconds of the string solution.
import time
class Solution:
def findKthBit(self, n: int, k: int) -> str:
i = 0
Sn = "0"
Sn = self.findR(Sn, i, n)
return Sn[k-1]
def findR(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSn(Sn, i)
return self.findR(newSn, i+1, n)
def calcNewSn(self, Sn, i):
inverted = ""
for c in Sn:
inverted += "1" if c == "0" else "0"
newSn = Sn + "1" + (inverted)[::-1]
return newSn
class Solution2:
def findKthBitBin(self, n: int, k: int) -> str:
i = 0
# MSB (S1) has to be 1 for bit operations but is actually always 0
Sn = 1
Sn = self.findRBin(Sn, i, n)
lenSn = (2**(n+1)) - 1
return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)
def findRBin(self, Sn, i, n):
if i == n:
return Sn
newSn = self.calcNewSnBin(Sn, i)
return self.findRBin(newSn, i+1, n)
def calcNewSnBin(self, Sn, i):
lenSn = 2**(i+1) - 1
newSn = (Sn << 1) | 1
inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
newSn = self.reverseBits(newSn, inverted, lenSn)
return newSn
def getMask(self, lenSn):
mask = 0
for i in range(lenSn):
mask |= (1 << i)
return mask
def reverseBits(self, newSn, inverted, lenSn):
for i in range(lenSn):
newSn <<= 1
newSn |= inverted & 1
inverted >>= 1
return newSn
sol = Solution()
sol2 = Solution2()
start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")
Why is the solution using integers so slow? Are the implemetations of big ints faster in other languages compared to strings?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
问题不在于整数实现有多快或多慢。问题是您编写了一个需要随机访问可变序列数据类型(如列表)的算法,而整数不是随机访问可变序列。
例如,查看您的
reverseBits
实现:对于列表,您只需调用
list.reverse()
即可。相反,您的代码会一遍又一遍地转换newSn
和inverted
,在每次迭代中重复复制几乎所有涉及的数据。字符串也不是一种随机访问的可变序列数据类型,但它们更接近,并且 Python 的标准实现有一个奇怪的优化,它实际上确实尝试像您编写的那样在循环中改变字符串:
而不是构建一个新字符串并复制
+=
的所有数据,Python 尝试就地调整字符串大小。当它起作用时,它可以避免您在基于整数的实现中进行的大部分不必要的复制。编写算法的正确方法是使用列表,而不是字符串或整数。然而,还有一种更好的方法,采用不同的算法,根本不需要构建位序列。您不需要整个位序列。您只需要计算一位即可。弄清楚如何计算那一点。
The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.
For example, look at your
reverseBits
implementation:With a list, you could just call
list.reverse()
. Instead, your code shiftsnewSn
andinverted
over and over, copying almost all the data involved repeatedly on every iteration.Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:
Instead of building a new string and copying all the data for the
+=
, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.
大部分时间花在
reverseBits()
上。总的来说,您将在其中创建数十万个新的整数对象,长度最多为 262143 位。它本质上是二次时间。字符串形式的相应操作就是(inverted)[::-1]
,它完全以“C 速度”运行,并且是最坏情况的线性时间。低级别、高级别
只需替换
reverseBits()
即可获得 8 倍的加速,即线性时间。它将
inverted
转换为位字符串,反转该字符串,然后将结果转换回int。其他技巧可用于加速其他过慢的 bigint 算法。但是,正如另一个答案(你应该接受!)所说,只见树木不见森林:整个方法偏离了最佳轨道。即使使用更简单的代码,也可以编写一个函数,使所有这些都运行得非常快(每次调用的最坏情况是
O(n)
),并且只需要微不足道的工作内存:世界有足够的内存来构建一个包含
2**500-1
字符的字符串(或具有这么多位的 bigint)。The bulk of the time is spent in
reverseBits()
. Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just(inverted)[::-1]
, which runs entirely at "C speed", and is worst-case linear time.Low Level, High Level
You can get a factor of 8 speedup just by replacing
reverseBits()
That's linear time. It converts
inverted
to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed other overly-slow bigint algorithms.But, as the other answer (which you should accept!) said, it's missing the forest for the trees: the whole approach is off the best track. It's possible, and even with simpler code, to write a function such that all these run blazing fast (
O(n)
worst case per call), and require only trivial working memory:There's not a computer in the world with enough memory to build a string with
2**500-1
characters (or a bigint with that many bits).