为什么在处理二进制字符串的问题中使用整数比使用字符串慢得多？

发布于 2025-01-13 23:55:47 字数 2262 浏览 2 评论 0原文

我解决了以下leet代码问题https://leetcode。 com/problems/find-kth-bit-in-nth-binary-string/ 使用字符串（类 Solution）和整数（类解决方案2）。我认为整数解决方案应该更快并且使用更少的内存。但实际上需要更长的时间。当 n=18 和 k=200 时，需要 10.1 秒，而字符串解决方案需要 0.13 秒。

import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k-1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i+1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted += "1" if c == "0" else "0"
        newSn = Sn + "1" + (inverted)[::-1]
        return newSn


class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n+1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i+1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i+1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn
 
sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")

为什么使用整数的解决方案这么慢？与字符串相比，其他语言中大整数的实现是否更快？

原文

I solved the following leet code problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ by using strings (class Solution) and integers (class Solution2). I thought the integer solution should be a lot faster and using much less memory. But actually it takes a lot longer. With n=18 and k=200 it took 10.1 seconds compared to 0.13 seconds of the string solution.

import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k-1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i+1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted += "1" if c == "0" else "0"
        newSn = Sn + "1" + (inverted)[::-1]
        return newSn


class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n+1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i+1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i+1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn
 
sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")

Why is the solution using integers so slow? Are the implemetations of big ints faster in other languages compared to strings?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

谁的年少不轻狂 2025-01-20 23:55:47

问题不在于整数实现有多快或多慢。问题是您编写了一个需要随机访问可变序列数据类型（如列表）的算法，而整数不是随机访问可变序列。

例如，查看您的 reverseBits 实现：

def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn

对于列表，您只需调用 list.reverse() 即可。相反，您的代码会一遍又一遍地转换 newSn 和 inverted，在每次迭代中重复复制几乎所有涉及的数据。

字符串也不是一种随机访问的可变序列数据类型，但它们更接近，并且 Python 的标准实现有一个奇怪的优化，它实际上确实尝试像您编写的那样在循环中改变字符串：

def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted += "1" if c == "0" else "0"
    newSn = Sn + "1" + (inverted)[::-1]
    return newSn

而不是构建一个新字符串并复制 += 的所有数据，Python 尝试就地调整字符串大小。当它起作用时，它可以避免您在基于整数的实现中进行的大部分不必要的复制。

编写算法的正确方法是使用列表，而不是字符串或整数。然而，还有一种更好的方法，采用不同的算法，根本不需要构建位序列。您不需要整个位序列。您只需要计算一位即可。弄清楚如何计算那一点。

The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.

For example, look at your reverseBits implementation:

def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn

With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.

Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:

def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted += "1" if c == "0" else "0"
    newSn = Sn + "1" + (inverted)[::-1]
    return newSn

Instead of building a new string and copying all the data for the +=, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.

The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.

回复收藏 0 原文

第几種人 2025-01-20 23:55:47

大部分时间花在 reverseBits() 上。总的来说，您将在其中创建数十万个新的整数对象，长度最多为 262143 位。它本质上是二次时间。字符串形式的相应操作就是(inverted)[::-1]，它完全以“C 速度”运行，并且是最坏情况的线性时间。

低级别、高级别

只需替换 reverseBits() 即可获得 8 倍的加速，

    def reverseBits(self, newSn, inverted, lenSn):
        inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
        return (newSn << lenSn) | inverted

即线性时间。它将inverted转换为位字符串，反转该字符串，然后将结果转换回int。其他技巧可用于加速其他过慢的 bigint 算法。

但是，正如另一个答案（你应该接受！）所说，只见树木不见森林：整个方法偏离了最佳轨道。即使使用更简单的代码，也可以编写一个函数，使所有这些都运行得非常快（每次调用的最坏情况是O(n)），并且只需要微不足道的工作内存

>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'

：世界有足够的内存来构建一个包含 2**500-1 字符的字符串（或具有这么多位的 bigint）。

The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.

Low Level, High Level

You can get a factor of 8 speedup just by replacing reverseBits()

    def reverseBits(self, newSn, inverted, lenSn):
        inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
        return (newSn << lenSn) | inverted

That's linear time. It converts inverted to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed other overly-slow bigint algorithms.

But, as the other answer (which you should accept!) said, it's missing the forest for the trees: the whole approach is off the best track. It's possible, and even with simpler code, to write a function such that all these run blazing fast (O(n) worst case per call), and require only trivial working memory:

>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'

There's not a computer in the world with enough memory to build a string with 2**500-1 characters (or a bigint with that many bits).

回复收藏 0 原文

~没有更多了~