为什么 Cases[] 这里这么慢?有什么技巧可以加快速度吗?

发布于 2024-12-24 02:25:19 字数 1335 浏览 3 评论 0原文

在尝试粘贴图像时,我注意到 Cases[] 非常慢。

要重现,首先将大图像复制到剪贴板(只需按 Print Screen),然后评估以下内容:

In[33]:= SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];

In[34]:= AbsoluteTiming[nb = NotebookGet@ClipboardNotebook[];]
Out[34]= {0.4687500, Null}

In[35]:= AbsoluteTiming[d1 = nb[[1, 1, 1, 1, 1, 1, 1]];]
Out[35]= {0., Null}

In[36]:= AbsoluteTiming[d2 = First@Cases[nb, r_RasterBox :> First[r], Infinity, 1];]

During evaluation of In[36]:= Developer`FromPackedArray::unpack: Unpacking array in call to Notebook. >>

Out[36]= {0.9375000, Null}

(我在 Windows 上执行此操作,不确定粘贴代码是否与在 Windows 上相同)其他系统。)

请注意,与直接使用 Part 相比,使用 Cases 提取数据的速度非常慢,即使我明确告诉 Cases > 我只需要一场比赛。

我确实发现(如上所示)Cases 由于某种原因会触发解包,即使搜索应该在到达内部的打包数组之前停止。使用比 Infinity 更浅的级别规范可能会避免解包。

问题:这里使用CasesPart既简单又可靠(如果子表达式可以出现在不同的位置怎么办?)有没有办法或许可以通过使用不同的模式或不同的选项来使 Cases 更快?


可能相关的问题:Mathematica 的模式匹配优化不佳? (这就是为什么我将 Cases 规则从 RasterBox[data_, ___] -> data 更改为 r_RasterBox :> First[r] .)

While trying to paste images, I noticed that Cases[] is very slow.

To reproduce, first copy a large image to the clipboard (just press Print Screen), then evaluate the following:

In[33]:= SetSystemOptions["PackedArrayOptions" -> "UnpackMessage" -> True];

In[34]:= AbsoluteTiming[nb = NotebookGet@ClipboardNotebook[];]
Out[34]= {0.4687500, Null}

In[35]:= AbsoluteTiming[d1 = nb[[1, 1, 1, 1, 1, 1, 1]];]
Out[35]= {0., Null}

In[36]:= AbsoluteTiming[d2 = First@Cases[nb, r_RasterBox :> First[r], Infinity, 1];]

During evaluation of In[36]:= Developer`FromPackedArray::unpack: Unpacking array in call to Notebook. >>

Out[36]= {0.9375000, Null}

(I did this on Windows, not sure if the paste code is the same on other systems.)

Note that extracting the data using Cases is extremely slow compared to using Part directly, even though I explicitly tell Cases that I need only one match.

I did find out (as shown above) that Cases triggers unpacking for some reason, even though the search should stop before it reaches the packed array inside. Using a shallower level specification than Infinity might avoid unpacking.

Question: Using Cases here is both easier and more reliable than Part (what if the subexpression can appear in different positions?) Is there a way to make Cases fast here, perhaps by using a different pattern or different options?


Possibly related question: Mathematica's pattern matching poorly optimized?
(This is why I changed the Cases rule from RasterBox[data_, ___] -> data to r_RasterBox :> First[r].)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

飘逸的'云 2024-12-31 02:25:19

我现在无法访问 Mathematica,因此接下来的内容未经测试。我的猜测是 Cases 在这里解包,因为它是深度优先搜索,因此首先看到打包的数组。如果这是正确的,那么您可以使用规则(ReplaceAll,而不是 Replace),并在第一次匹配时抛出异常:

Module[{tag},
   Catch[
     nb /. r_RasterBox :> Block[{}, Throw[First[r], tag] /; True]; 
     $Failed, 
     tag]
]

正如我所说,这只是一个未经测试的猜测。

编辑 2:一种基于从模式匹配器

序言

中屏蔽部分表达的方法在第一个编辑(如下)中,提出了一种相当繁重的方法。在许多情况下,人们可以采取另一条路线。在这个特定问题(以及许多其他类似问题)中,主要问题是以某种方式屏蔽某些子表达式免受模式匹配器的影响。这也可以通过使用规则来实现,用一些虚拟符号暂时替换感兴趣的部分。

代码

这里是对 Cases 的修改,它就是这样做的:

Clear[casesShielded];
casesShielded[expr_,pt_,shieldPattern_,levspec_,n_,opts:OptionsPattern[]]:=
   Module[{dummy,inverseShieldingRules, shielded, i=0},
      inverseShieldingRules =
        If[#==={},#,Dispatch@First@#]&@
           Reap[shielded= expr/.(p:shieldPattern):>
             With[{eval = With[{ind = ++i},Sow[dummy[ind]:>p];dummy[ind]]},
                eval/;True];
           ][[2]];
      Cases[shielded,pt,levspec,n,opts]/.inverseShieldingRules]; 

此版本的 Cases 有一个附加参数 shieldPattern(第三个),它指示哪个子表达式必须与模式匹配器屏蔽。

优点和适用性

上面的代码相当轻量(与下面 edit1 的建议相比),并且它允许人们完全重用和利用现有的 Cases 功能。这适用于主模式(或规则)对相关部分的屏蔽不敏感的情况,这是一种相当常见的情况(特别是涵盖 _h 类型的模式,包括这种情况就在手边)。这也可能比 myCases 的应用程序更快(如下所述)。

手头的案例

在这里,我们需要这个调用:

In[55]:=    
(d4=First@casesShielded[nb,x_RasterBox:>First@x,
    p_List/;Developer`PackedArrayQ[p],Infinity,1]);//Timing

Out[55]= {0.,Null}

结果当然和以前一样:

In[61]:= d2===d4
Out[61]= True

编辑:一个替代的类似案例的函数

动机和代码

我花了一段时间来生成这个函数,而且我不是 100% 确定它总是正确工作,但这里是 Cases 的一个版本,虽然仍然以深度优先的方式工作,但在子表达式之前分析整个表达式:

ClearAll[myCases];
myCases[expr_, lhs_ :> rhs_, upToLevel_: 1, max : (_Integer | All) : All, 
    opts : OptionsPattern[]] :=
 Module[{tag, result, f, found = 0, aux},
   With[{
    mopts = FilterRules[{opts}, {Heads -> False}],
    frule =
       Apply[
         RuleDelayed,
         Hold[lhs, With[{eval =  aux}, Null /; True]] /.
            {aux :> Sow[rhs, tag] /; max === All, 
             aux :> (found++; Sow[rhs, tag])}
       ]
    },
    SetAttributes[f, HoldAllComplete];
    If[max =!= All,
       _f /; found >= max := Throw[Null, tag]
    ];
    f[x_, n_] /; n > upToLevel := Null;
    f[x_, n_] :=
      Replace[
       HoldComplete[x],
       {
          frule,
          ex : _[___] :> 
            With[{ev = 
              Replace[
                HoldComplete[ex],
                y_ :> With[{eval = f[y, n + 1]}, Null /; True],
                {2},
                Sequence @@ mopts
              ]}, 
              Null /; True
            ]
       },
       {1}
      ]
   ]; (* external With *)
   result = 
     If[# === {}, #, First@#] &@
        Reap[Catch[f[expr, 0], tag], tag, #2 &][[2]];
   (* For proper garbage-collection of f *)
   ClearAll[f]; 
   result
 ]

是如何工作的 这不是最简单的代码片段,所以这里有一些注释。此版本的 Cases 基于我首先建议的相同想法 - 即,使用规则替换语义首先尝试对整个表达式进行模式匹配,并且仅当失败时才转到子表达式。我强调,这仍然是深度优先遍历,但与标准遍历不同(标准遍历用于大多数表达式遍历函数,如 MapScan案例等)。我使用 ReapSow 来收集中间结果(匹配)。这里最棘手的部分是防止子表达式求值,我必须将子表达式包装到 HoldComplete 中。因此,我必须使用 Trott-Strzebonski 技术(的嵌套版本)(也许有更简单的方法,但我看不到它们),以便能够对保留(子)表达式内的规则右侧进行求值,并使用具有适当级别规范的Replace,考虑额外添加的HoldComplete包装器。我在规则中返回Null,因为主要操作是Sow各个部分,所以最后注入到原始表达式中的内容并不重要。代码添加了一些额外的复杂性来支持级别规范(我只支持单个整数级别,指示要搜索的最大级别,而不是可能的 lev.specs 的全部范围),找到的结果的最大数量,以及Heads 选项。当我们想要查找所有元素时,frule 的代码不会引入对找到的元素进行计数的开销。我使用相同的 Module 生成的标签,既作为 Sow 的标签,又作为异常的标签(当找到足够的匹配项时,我用它来停止进程) ,就像我原来的建议一样)。

测试和基准测试

为了对此功能进行重要测试,我们可以在 myCasesDownValues 中查找所有符号,并与 Cases 进行比较code>:

In[185]:= 
And@@Flatten[
    Outer[
       myCases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]  ===
       Cases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]&,
       Range[0,20],
       {True,False}
    ]]

Out[185]= True

myCases 函数比 Cases 慢大约 20-30 倍,不过:

In[186]:= 
Do[myCases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[186]= {3.188,Null}

In[187]:= Do[Cases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[187]= {0.125,Null}

手头的情况

很容易检查 myCases 是否解决了问题原来的问题解包:

In[188]:= AbsoluteTiming[d3=First@myCases[nb,r_RasterBox:>First[r],Infinity,1];]
Out[188]= {0.0009766,Null}

In[189]:= d3===d2
Out[189]= True

希望 myCases 对于这种情况普遍有用,尽管使用它代替 Cases 的性能损失很大并且必须考虑在内。

I don't have access to Mathematica right now, so what follows is untested. My guess is that Cases unpacks here because it searches depth-first, and so sees the packed array first. If this is correct, then you could use rules instead (ReplaceAll, not Replace), and throw an exception upon first match:

Module[{tag},
   Catch[
     nb /. r_RasterBox :> Block[{}, Throw[First[r], tag] /; True]; 
     $Failed, 
     tag]
]

As I said, this is just an untested guess.

Edit 2: an approach based on shielding parts of expression from the pattern-matcher

Preamble

In the first edit (below) a rather heavy approach is presented. In many cases, one can take an alternative route. In this particular problem (and many others like it), the main problem is to somehow shield certain sub-expressions from the pattern-matcher. This can be achieved also by using rules, to temporarily replace the parts of interest by some dummy symbols.

Code

Here is a modification of Cases which does just that:

Clear[casesShielded];
casesShielded[expr_,pt_,shieldPattern_,levspec_,n_,opts:OptionsPattern[]]:=
   Module[{dummy,inverseShieldingRules, shielded, i=0},
      inverseShieldingRules =
        If[#==={},#,Dispatch@First@#]&@
           Reap[shielded= expr/.(p:shieldPattern):>
             With[{eval = With[{ind = ++i},Sow[dummy[ind]:>p];dummy[ind]]},
                eval/;True];
           ][[2]];
      Cases[shielded,pt,levspec,n,opts]/.inverseShieldingRules]; 

This version of Cases has one additional parameter shieldPattern (third one), which indicates which sub-expressions must be shielded from the pattern-matcher.

Advantages and applicability

The code above is pretty light-weight (compared to the suggestion of edit1 below), and it allows one to fully reuse and leverage the existing Cases functionality. This will work for cases when the main pattern (or rule) is insensitive to shielding of the relevant parts, which is a rather common situation (and in particular, covers patterns of the type _h, including the case at hand). This may also be faster than the application of myCases (described below).

The case at hand

Here, we need this call:

In[55]:=    
(d4=First@casesShielded[nb,x_RasterBox:>First@x,
    p_List/;Developer`PackedArrayQ[p],Infinity,1]);//Timing

Out[55]= {0.,Null}

and the result is of course the same as before:

In[61]:= d2===d4
Out[61]= True

Edit: an alternative Cases-like function

Motivation and code

It took me a while to produce this function, and I am not 100 percent sure it always works correctly, but here is a version of Cases which, while still working depth-first, analyzes expression as a whole before sub-expressions:

ClearAll[myCases];
myCases[expr_, lhs_ :> rhs_, upToLevel_: 1, max : (_Integer | All) : All, 
    opts : OptionsPattern[]] :=
 Module[{tag, result, f, found = 0, aux},
   With[{
    mopts = FilterRules[{opts}, {Heads -> False}],
    frule =
       Apply[
         RuleDelayed,
         Hold[lhs, With[{eval =  aux}, Null /; True]] /.
            {aux :> Sow[rhs, tag] /; max === All, 
             aux :> (found++; Sow[rhs, tag])}
       ]
    },
    SetAttributes[f, HoldAllComplete];
    If[max =!= All,
       _f /; found >= max := Throw[Null, tag]
    ];
    f[x_, n_] /; n > upToLevel := Null;
    f[x_, n_] :=
      Replace[
       HoldComplete[x],
       {
          frule,
          ex : _[___] :> 
            With[{ev = 
              Replace[
                HoldComplete[ex],
                y_ :> With[{eval = f[y, n + 1]}, Null /; True],
                {2},
                Sequence @@ mopts
              ]}, 
              Null /; True
            ]
       },
       {1}
      ]
   ]; (* external With *)
   result = 
     If[# === {}, #, First@#] &@
        Reap[Catch[f[expr, 0], tag], tag, #2 &][[2]];
   (* For proper garbage-collection of f *)
   ClearAll[f]; 
   result
 ]

How it works

This is not the most trivial piece of code, so here are some remarks. This version of Cases is based on the same idea I suggested first - namely, use rule-substitution semantics to first attempt the pattern-match on an entire expression and only if that fails, go to sub-expressions. I stress that this is still the depth-first traversal, but different from the standard one (which is used in most expression-traversing functions like Map, Scan, Cases, etc). I use Reap and Sow to collect the intermediate results (matches). The trickiest part here is to prevent sub-expressions from evaluation, and I had to wrap sub-expressions into HoldComplete. Consequently, I had to use (a nested version of the) Trott-Strzebonski technique (perhaps, there are simpler ways, but I wasn't able to see them), to enable evauation of rules' r.h.sides inside held (sub)expressions, and used Replace with proper level spec, accounting for extra added HoldComplete wrappers. I return Null in rules, since the main action is to Sow the parts, so it does not matter what is injected into the original expression at the end. Some extra complexity was added by the code to support the level specification (I only support the single integer level indicating the maximal level up to which to search, not the full range of possible lev.specs), the maximal number of found results, and the Heads option. The code for frule serves to not introduce the overhead of counting found elements in cases when we want to find all of them. I am using the same Module-generated tag both as a tag for Sow, and as a tag for exceptions (which I use to stop the process when enough matches have been found, just like in my original suggestion).

Tests and benchmarks

To have a non-trivial test of this functionality, we can for example find all symbols in the DownValues of myCases, and compare to Cases:

In[185]:= 
And@@Flatten[
    Outer[
       myCases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]  ===
       Cases[DownValues[myCases],s_Symbol:>Hold[s],#1,Heads->#2]&,
       Range[0,20],
       {True,False}
    ]]

Out[185]= True

The myCases function is about 20-30 times slower than Cases though:

In[186]:= 
Do[myCases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[186]= {3.188,Null}

In[187]:= Do[Cases[DownValues[myCases],s_Symbol:>Hold[s],20,Heads->True],{500}];//Timing
Out[187]= {0.125,Null}

The case at hand

It is easy to check that myCases solves the original problem of unpacking:

In[188]:= AbsoluteTiming[d3=First@myCases[nb,r_RasterBox:>First[r],Infinity,1];]
Out[188]= {0.0009766,Null}

In[189]:= d3===d2
Out[189]= True

It is hoped that myCases can be generally useful for situations like this, although the performance penalty of using it in place of Cases is substantial and has to be taken into account.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文