使用缓冲区的最长公共前缀?
如果我有一个输入字符串和一个数组:
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
我试图找到引用原始 s
的数组 pos
的连续元素之间的最长公共前缀。我试图获得以下输出:
longest = [3,1]
我获得此结果的方式是通过计算以下对的最长公共前缀:
s[15:]
,即_be
和< code>s[2:] 是_be_or_not_to_be
给出 3 (_be
)s[2:]
是_be_or_not_to_be
和s[8:]
,其中_not_to_be
给出 1 (_
)
但是,如果 s
很大,当我执行诸如 s[x:]
之类的操作时,我不想创建多个副本。经过几个小时的搜索,我发现函数 buffer 只维护输入字符串的一份副本,但我没有不确定在这种情况下使用它的最有效方法是什么。关于如何实现这一目标有什么建议吗?
If I have an input string and an array:
s = "to_be_or_not_to_be"
pos = [15, 2, 8]
I am trying to find the longest common prefix between the consecutive elements of the array pos
referencing the original s
. I am trying to get the following output:
longest = [3,1]
The way I obtained this is by computing the longest common prefix of the following pairs:
s[15:]
which is_be
ands[2:]
which is_be_or_not_to_be
giving 3 (_be
)s[2:]
which is_be_or_not_to_be
ands[8:]
which is_not_to_be
giving 1 (_
)
However, if s
is huge, I don't want to create multiple copies when I do something like s[x:]
. After hours of searching, I found the function buffer that maintains only one copy of the input string but I wasn't sure what is the most efficient way to utilize it here in this context. Any suggestions on how to achieve this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
这是一种没有
buffer
的方法,它不会复制,因为它一次只查看一个字符:我使用
islice
、izip
,和xrange
如果您正在谈论可能非常长的字符串。我也无法抗拒这个“One Liner”,它甚至不需要任何索引:
最后一种方法,使用 os.path.commonprefix:
Here is a method without
buffer
which doesn't copy, as it only looks at one character at a time:I use
islice
,izip
, andxrange
in case you're talking about potentially very long strings.I also couldn't resist this "One Liner" which doesn't even require any indexing:
One final method, using
os.path.commonprefix
:让 Python 为你管理内存。不要过早优化。
要获得确切的输出,您可以执行以下操作(如 @agf 建议) :
Let Python to manage memory for you. Don't optimize prematurely.
To get the exact output you could do (as @agf suggested):
我认为你对副本的担心是没有根据的。请参见下文:
除非您复制切片并留下对副本的引用,否则我认为这不会导致任何问题。
编辑:有一些关于字符串实习的技术细节和我自己不太清楚的东西。但我确信字符串切片并不总是副本:
我想我想要给出的答案是让python自己管理它的内存,首先,你可以看看内存缓冲区和视图稍后如果需要的话。如果这已经是您遇到的真正问题,请用实际问题的详细信息更新您的问题。
I think your worrying about copies is unfounded. See below:
Unless you're copying the slices and leaving references to the copies hanging around, I wouldn't think it could cause any problem.
edit: There are technical details with string interning and stuff that I'm not really clear on myself. But I'm sure that a string slice is not always a copy:
I guess the answer I'm trying to give is to just let python manage its memory itself, to begin with, you can look at memory buffers and views later if needed. And if this is already a real problem occurring for you, update your question with details of what the actual problem is.
下面给出了使用缓冲区的一种方法。然而,可能还有更快的方法。
One way of doing using
buffer
this is give below. However, there could be much faster ways.