是否有等效的 StringPosition[] 用于搜索列表?如果没有,实现此目的最快的方法是什么?
是否有一个函数可以在元素序列中搜索子序列?我正在寻找 List
的 StringPosition
的类似物。在我当前的应用程序中,我正在使用整数列表,但我对通用的 FindSequence[list, pattern, n] 函数感兴趣,该函数将查找第一个 n 次出现list
中的pattern
。
这是一个玩具示例:
生成一些数据:
In[1]:= $HistoryLength = 0
Out[1]= 0
In[2]:= Timing[digits = First[RealDigits[\[Pi], 2, 10000000]];]
Out[2]= {26.5, Null}
让我们将其转换为字符串,以便我们可以与 StringPosition
进行比较。这是非常慢的内存消耗。 (评估完成后,内存将被释放。)
In[3]:= Timing[str = StringJoin[ToString /@ digits];]
Out[3]= {43.813, Null}
我正在寻找这个子序列:
In[4]:= patt = {1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 1, 1};
In[5]:= strpatt = StringJoin[ToString /@ patt];
搜索字符串非常快:
In[6]:= StringPosition[str, strpatt] // Timing
Out[6]= {1.047, {{5737922, 5737943}}}
这是搜索数值数组的简单实现。它比 StringPosition
慢:
In[7]:= Timing[
corr = ListCorrelate[patt, digits];
Select[Flatten@Position[corr, patt.patt],
digits[[# ;; # + Length[patt] - 1]] === patt &]
]
Out[7]= {2.234, {5737922}}
摘要:
- 是否有一个内置函数可以在列表中搜索子序列?
- 如果没有,数字列表的快速而优雅的实现是什么(我的实际问题)?
- 可以包含任何内容的通用列表怎么样? (这里有两种可能性:仅“静态”模式,例如
{1,0,1}
,或一般模式,例如{1,_,1}
,尽管这些后者可能会带来复杂性。)
我希望这会有很多解决方案,一些快速,一些更优雅,一些更通用:-)
相关问题:
- Mathematica 中 Position2D 的快速实现(同一事物的 2D 情况)
- <一个href="https://stackoverflow.com/questions/8176518/what-is-the-best-way-to-find-the-period-of-a-repeating-list-in-mathematica">什么是最好的如何在 Mathematica 中查找(重复)列表的周期?
有趣的阅读:
编辑:
I刚刚找到了未记录的LongestCommonSubsequencePositions
。 LongestCommonSubsequencePositions[a, b]
将找到列表 a
和 b
的最长公共子序列,并返回其 first< /em> 仅在 a
和 b
中出现。 (记录的 LongestCommonSubsequence ,我不知道,只会返回子序列本身,而不是它的位置。)
它比上面的替代方案慢,但它适用于可以包含任何表达式的通用列表。
In[57]:= LongestCommonSubsequencePositions[digits, patt] // Timing
Out[57]= {5.25, {{5737922, 5737943}, {1, 22}}}
Is there a function that searches a sequence of elements for a subsequence? I am looking for an analogue of StringPosition
for List
s. In my current application I am working with integer lists, but I'd be interested in a general FindSequence[list, pattern, n]
function which will find the first n
occurrences of pattern
in list
.
Here's a toy example:
Generate some data:
In[1]:= $HistoryLength = 0
Out[1]= 0
In[2]:= Timing[digits = First[RealDigits[\[Pi], 2, 10000000]];]
Out[2]= {26.5, Null}
Let's convert it to a string so we can compare to StringPosition
. This is very slow an memory hungry. (The memory is freed when the evaluation finishes.)
In[3]:= Timing[str = StringJoin[ToString /@ digits];]
Out[3]= {43.813, Null}
I am looking for this subsequence:
In[4]:= patt = {1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0,
1, 0, 1, 1};
In[5]:= strpatt = StringJoin[ToString /@ patt];
Searching the string is very fast:
In[6]:= StringPosition[str, strpatt] // Timing
Out[6]= {1.047, {{5737922, 5737943}}}
This is a simple implementation of searching for numerical arrays. It's slower than StringPosition
:
In[7]:= Timing[
corr = ListCorrelate[patt, digits];
Select[Flatten@Position[corr, patt.patt],
digits[[# ;; # + Length[patt] - 1]] === patt &]
]
Out[7]= {2.234, {5737922}}
Summary:
- Is there a builtin that searches lists for subsequences?
- If there isn't, what is a fast and elegant implementation for numeric lists (my practical problem)?
- What about generic lists that can contain anything? (There are two possibilities here: "static" patterns only such as
{1,0,1}
, or general ones like{1,_,1}
, though these latter ones may introduce complications.)
I expect this will have many solutions, some fast, some more elegant, some more general :-)
Related questions:
- A fast implementation in Mathematica for Position2D (2D case of the same thing)
- What is the best way to find the period of a (repeating) list in Mathematica?
Interesting reading:
EDIT:
I just found the undocumented LongestCommonSubsequencePositions
. LongestCommonSubsequencePositions[a, b]
will find the longest common subsequence of the lists a
and b
, and return position of its first occurrence only in both a
and b
. (The documented LongestCommonSubsequence
, which I was not aware of, will only return the subsequence itself, not its position.)
It is slower than the alternatives above, but it works on general lists that can contain any expression.
In[57]:= LongestCommonSubsequencePositions[digits, patt] // Timing
Out[57]= {5.25, {{5737922, 5737943}, {1, 22}}}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
您可以将
ReplaceList
与“前缀”和“后缀”___
一起使用并匹配整个列表。这为您提供了所有可以进行的替换(与替换
相反)。然后,模式的位置就是前缀的长度 + 1。它也非常快:编辑:认为使用延迟规则比映射
长度稍微优雅一些代码> 之后。
You can use
ReplaceList
with a "prefix" and "suffix" of___
and match the whole list. This gives you all the replacements that can be made (as opposed toReplace
). The position of your pattern is then simply the length of the prefix + 1. It's pretty fast as well:Edit: figured it's slightly more elegant to use a delayed rule than to map
Length
afterwards.请查看函数
seqPos
(通用列表)和seqposC
(整数列表,已编译),它们完全符合您的要求,而且速度很快。我在这个答案中使用了它们(对于问题你实际上链接到)。以下是各种解决方案的计时结果:
这些表明您使用
ListCorrelate
的方法一点也不差。我的第一个函数 seqPos(实际上是 Norbert Pozar 造成的)有点慢,但它完全通用,而 seqposC 则快得多。Please have a look at functions
seqPos
(general lists) andseqposC
(integer lists, compiled), which do exactly what you ask for, and are fast. I used them in this answer (for the question you actually linked to).Here are the timing results for various solutions:
These indicate that your approach with
ListCorrelate
is not bad at all. My first functionseqPos
(it is actually due to Norbert Pozar) is a bit slower but then it is completely general, whileseqposC
is much faster.这是一个编译版本,它避免了字符串转换,但速度并不快。
Here is a compiled version, that avoids the String conversion but is not faster.