Is there an equivalent of StringPosition[] for searching lists? If not, what is the fastest way to implement one?

Posted 2024-12-25 03:46:45

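Is there a function that searches a sequence of elements for a subsequence? I am looking for an analogue of StringPosition for lists. In my current application I am working with integer lists, but I'd be interested in a general FindSequence[list, pattern, n] function that finds the first n occurrences of pattern in list.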

Here's a toy example:

Generate some data:

In[1]:= $HistoryLength = 0    
Out[1]= 0

In[2]:= Timing[digits = First[RealDigits[\[Pi], 2, 10000000]];]    
Out[2]= {26.5, Null}

Let's convert it to a string so we can compare with StringPosition. This is very slow and memory hungry. (The memory is freed when the evaluation finishes.)

In[3]:= Timing[str = StringJoin[ToString /@ digits];]    
Out[3]= {43.813, Null}
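A possibly faster conversion, sketched under the assumption that every element of the list is an integer from 0 to 9: adding 48 (the character code of "0") to each element gives the character code of the corresponding digit, so FromCharacterCode can build the string in one call instead of mapping ToString over ten million elements. The name str2 is hypothetical, and no timing is claimed here.

```mathematica
(* Assumption: every element is an integer from 0 to 9, so adding 48
   (the character code of "0") gives the code of the matching digit character *)
FromCharacterCode[{1, 0, 1, 1} + 48]    (* "1011" *)

(* Applied to the digits list from this example; str2 is a hypothetical name *)
str2 = FromCharacterCode[digits + 48];
```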

I am looking for this subsequence:

In[4]:= patt = {1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 
   1, 0, 1, 1};

In[5]:= strpatt = StringJoin[ToString /@ patt];

Searching the string is very fast:

In[6]:= StringPosition[str, strpatt] // Timing    
Out[6]= {1.047, {{5737922, 5737943}}}

This is a simple implementation for searching numerical arrays. It is slower than StringPosition:

In[7]:= Timing[
           corr = ListCorrelate[patt, digits];
           Select[Flatten@Position[corr, patt.patt], 
             digits[[# ;; # + Length[patt] - 1]] === patt &]
        ]

Out[7]= {2.234, {5737922}}
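A small worked example may help show why the Select step is needed. A correlation value equal to patt.patt is necessary for a match but not sufficient, so Position only yields candidates that still have to be compared against the pattern. The names small and p are hypothetical stand-ins for digits and patt.

```mathematica
small = {1, 1, 1, 0, 1};
p = {1, 0, 1};

(* Sliding dot product of p with every window of small *)
corr = ListCorrelate[p, small]                  (* {2, 1, 2} *)

(* Windows whose correlation equals p.p = 2 are only candidates *)
candidates = Flatten@Position[corr, p . p]      (* {1, 3} *)

(* Window 1 is {1, 1, 1}, a false positive; window 3 is {1, 0, 1} *)
Select[candidates, small[[# ;; # + Length[p] - 1]] === p &]   (* {3} *)
```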

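Summary:

Is there a built-in function that searches lists for subsequences?
If there isn't, what is a fast and elegant implementation for numeric lists (my practical problem)?
What about generic lists that can contain anything? (There are two possibilities here: "static" patterns only, such as {1,0,1}, or general ones like {1,_,1}, though the latter may introduce complications.)

I expect this will have many solutions, some fast, some more elegant, some more general :-)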

Related questions:

Fast implementation of Position2D in Mathematica (the 2D case of the same thing)
What is the best way to find the period of a (repeating) list in Mathematica? (https://stackoverflow.com/questions/8176518/what-is-the-best-way-to-find-the-period-of-a-repeating-list-in-mathematica)

Interesting reading:

Boyer-Moore string search algorithm

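EDIT:

I just found the undocumented LongestCommonSubsequencePositions. LongestCommonSubsequencePositions[a, b] finds the longest common subsequence of the lists a and b, and returns the position of its first occurrence only in both a and b. (The documented LongestCommonSubsequence, which I was not aware of, returns only the subsequence itself, not its position.)

It is slower than the alternatives above, but it works on general lists that can contain any expression.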

In[57]:= LongestCommonSubsequencePositions[digits, patt] // Timing
Out[57]= {5.25, {{5737922, 5737943}, {1, 22}}}



紫南 2025-01-01 03:46:46

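You can use ReplaceList with a "prefix" and "suffix" of ___ and match the whole list. This gives you all the replacements that can be made (as opposed to Replace, which gives only the first). The position of your pattern is then simply the length of the prefix + 1. It's pretty fast as well: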

In[40]:= Timing[ReplaceList[digits, Join[{pre___}, patt, {___}] :> Length[{pre}]
   + 1]]

Out[40]= {1.3059, {5737922}}

Edit: figured it's slightly more elegant to use a delayed rule than to map Length afterwards.
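The ReplaceList rule above can be wrapped into a function along the lines of the FindSequence[list, pattern, n] asked for in the question. This is only a sketch: findSequence and the argument name sub are hypothetical, and it relies on the optional third argument of ReplaceList, which limits the number of results returned. Because the subsequence is spliced into a pattern, entries such as _ act as wildcards.

```mathematica
(* findSequence is a hypothetical name, not a built-in *)
findSequence[list_List, sub_List] :=
  ReplaceList[list, Join[{pre___}, sub, {___}] :> Length[{pre}] + 1];

(* Three-argument form: return at most n starting positions *)
findSequence[list_List, sub_List, n_Integer?Positive] :=
  ReplaceList[list, Join[{pre___}, sub, {___}] :> Length[{pre}] + 1, n];

findSequence[{1, 0, 1, 0, 1}, {1, 0, 1}]       (* {1, 3} *)
findSequence[{1, 0, 1, 0, 1}, {1, 0, 1}, 1]    (* {1} *)
findSequence[{1, 0, 1, 1, 1}, {1, _, 1}]       (* {1, 3} *)
```

Overlapping matches are reported, as the first call shows. As with the one-liner above, the symbol pre must not have a global value when this is evaluated.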



我早已燃尽 2025-01-01 03:46:46

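Please have a look at the functions seqPos (general lists) and seqposC (integer lists, compiled), which do exactly what you ask for and are fast. I used them in this answer (for the question you actually linked to).

Here are the timing results for the various solutions: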

In[15]:= seqPos[digits, patt] // Timing
Out[15]= {1.297, {5737922}}

In[16]:= seqposC[digits, patt] // Timing
Out[16]= {0.125, {5737922}}

In[17]:= 
Timing[corr = ListCorrelate[patt, digits];
      Select[Flatten@Position[corr, patt.patt], 
         digits[[# ;; # + Length[patt] - 1]] === patt &]]

Out[17]= {0.844, {5737922}}

In[18]:= Timing[
    ReplaceList[digits, Join[{pre___}, patt, {___}] :> Length[{pre}] + 1]]

Out[18]= {0.953, {5737922}}

In[19]:= AbsoluteTiming[cf[digits, patt]]
Out[19]= {3.1914063, 5737922}

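These indicate that your approach with ListCorrelate is not bad at all. My first function, seqPos (which is actually due to Norbert Pozar), is a bit slower, but it is completely general, while seqposC is much faster.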



帥小哥 2025-01-01 03:46:46

Here is a compiled version that avoids the string conversion but is not faster.

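cf = Compile[{{in, _Integer, 1}, {patt, _Integer, 1}},
  Block[{lp, res},
   lp = Length[patt];
   res = 0; (* 0 means "not found" *)
   Do[
    (* the window matches when all element-wise differences are zero *)
    If[Total[Abs[in[[i ;; i + lp - 1]] - patt]] == 0,
     res = i; Break[]],
    (* the last valid start position is Length[in] - lp + 1 *)
    {i, 1, Length[in] - lp + 1}];
   res
   ]
  , CompilationTarget -> "C", RuntimeOptions -> "Speed"]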


AbsoluteTiming[cf[digits, patt]]
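One plausible reason this version is not faster is that it extracts a slice and sums over the full pattern length at every start position, even when the first element already differs. A variant that compares element by element and stops at the first mismatch might look like the sketch below. The name cfEarly is hypothetical, this is not the seqposC function mentioned in another answer, and no timing is claimed for it. CompilationTarget -> "C" is left out so that it runs without a C compiler.

```mathematica
(* cfEarly is a hypothetical name; returns the first start position, or 0 if none *)
cfEarly = Compile[{{in, _Integer, 1}, {pat, _Integer, 1}},
   Module[{lp = Length[pat], res = 0, ok = True},
    Do[
     ok = True;
     (* compare element by element and stop at the first mismatch *)
     Do[
      If[in[[i + j - 1]] != pat[[j]], ok = False; Break[]],
      {j, 1, lp}];
     If[ok, res = i; Break[]],
     {i, 1, Length[in] - lp + 1}];
    res]];

cfEarly[{1, 1, 1, 0, 1}, {1, 0, 1}]   (* 3 *)
cfEarly[{1, 1, 1, 0, 1}, {0, 0}]      (* 0, i.e. not found *)
```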

