Mathematica 的模式匹配优化不佳？

发布于 2024-12-21 07:41:47 字数 744 浏览 1 评论 0原文

我最近询问为什么 PatternTest 导致大量不必要的评估：PatternTest 未优化？< /a> Leonid 回答说，对于我看来相当值得怀疑的方法来说，这是必要的。我可以接受这一点，尽管我更喜欢更有效的替代方案。

我现在意识到，我相信 Leonid 已经说过一段时间了，这个问题在 Mathematica 中表现得更深，我感到很困扰。我不明白为什么这没有或不能更好地优化。

考虑这个例子：

list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing

{0., Null}

{1.014, False}

检查列表的头部基本上是瞬时的，但检查模式需要一秒钟以上的时间。当然，Mathematica 可以认识到，由于列表的第一个元素不是整数，因此模式无法匹配，并且与 PatternTest 的情况不同，我看不到如何存在任何可变性在模式中。对此有何解释？

关于打包数组似乎存在一些混乱，据我所知，这与这个问题没有关系。相反，我关心的是所有列表（打包或未打包）的 O(n²) 时间复杂度。

原文

I recently inquired about why PatternTest was causing a multitude of needless evaluations: PatternTest not optimized? Leonid replied that it is necessary for what seems to me as a rather questionable method. I can accept that, though I would prefer a more efficient alternative.

I now realize, which I believe Leonid has been saying for some time, that this problem runs much deeper in Mathematica, and I am troubled. I cannot understand why this is not or cannot be better optimized.

Consider this example:

list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing

{0., Null}

{1.014, False}

Checking the heads of the list is essentially instantaneous, yet checking the pattern takes over a second. Surely Mathematica could recognize that since the first element of the list is not an Integer, the pattern cannot match, and unlike the case with PatternTest I cannot see how there is any mutability in the pattern. What is the explanation for this?

There appears to be some confusion regarding packed arrays, which as far as I can tell have no bearing on this question. Rather, I am concerned with the O(n²) time complexity on all lists, packed or unpacked.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

白馒头 2024-12-28 07:41:47

MatchQ 为此类测试解压。原因是没有实施这种特殊情况。原则上它可以包含任何东西。

On["Packing"]
MatchQ[list, {x_Integer, y__}] // Timing

MatchQ[list, {x__Integer, y__}] // Timing

改进这一点非常棘手 - 如果你破坏了模式匹配器，你就会遇到严重的问题。

编辑1：
确实，拆包并不是 O(n^2) 复杂度的原因。然而，它确实表明，对于 MatchQ[list, {x__Integer, y__}] 部分，代码转到算法的另一部分（需要解包列表）。其他需要注意的事项：仅当两个模式均为 __ 时，才会出现这种复杂性，如果其中任何一个为 _，则算法具有更好的复杂性。

然后该算法会遍历所有 n*n 个潜在匹配，并且似乎没有早期救助。大概是因为可以构造需要这种复杂性的其他模式 - 问题是上述模式迫使匹配器采用非常通用的算法。

然后我希望有 MatchQ[list, {Shortest[x__Integer], __}] 和朋友，但没有成功。

所以，我的两分钱：要么使用不同的模式（并使用 On["Packing"] 来查看它是否进入通用匹配器），要么进行预检查 DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer 或类似的值。

MatchQ unpacks for these kinds of tests. The reason is that no special case for this has been implemented. In principle it could contain anything.

On["Packing"]
MatchQ[list, {x_Integer, y__}] // Timing

MatchQ[list, {x__Integer, y__}] // Timing

Improving this is very tricky - if you break the pattern matcher you have a serious problem.

Edit 1:
It is true that the unpacking is not the cause for the O(n^2) complexity. It does, however, show that for the MatchQ[list, {x__Integer, y__}] part the code goes to another part of the algorithm (which needs the lists to be unpacked). Some other things to note: This complexity arises only if both patterns are __ if either one of them is _ the algorithm has a better complexity.

The algorithm then goes through all n*n potential matches and there seems no early bailout. Presumably because other patters could be constructed that would need this complexity - The issue is that the above pattern forces the matcher to a very general algorithm.

I then was hoping for MatchQ[list, {Shortest[x__Integer], __}] and friends but to no avail.

So, my two cents: either use a different pattern (and have On["Packing"] to see if it goes to the general matcher) or do a pre-check DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer or some such.

回复收藏 0 原文

在巴黎塔顶看东京樱花 2024-12-28 07:41:47

@第一个答案的作者。据我从逆向工程和阅读可用信息得知，这可能是由于检查模式的方式不同所致。事实上 - 正如他们所说 - 一个特殊的哈希码用于模式匹配。该散列（基本上是 FNV-1 轮）使得检查与所涉及的表达式类型相关的特定模式（一些异或运算的问题）变得非常容易。哈希算法在表达式内循环，每个子部分都与前一个子部分的输出进行异或。每个原子表达式都使用特殊的异或值 - machineInts、machineReals、bigNums、Rationals 等。因此，例如，_Integer很容易检查，因为任何整数的散列都是由整数的异或值形成的，所以我们需要做的就是进行逆运算并查看是否匹配 - 即我们是否得到一些特定的值或类似的东西（抱歉，如果我对实际的实现细节含糊其辞。它是WIP）。对于一般或不常见的模式，检查可能不会利用此哈希内容，并且需要不同的东西。

@the OP Head[] 只是作用于内部表达式，获取表达式的第一个指针的值（表达式被实现为指针数组）。因此，这样做就像复制和打印字符串一样简单 - 非常非常快。在这种情况下甚至没有调用模式匹配引擎。

回复收藏 0 原文

~没有更多了~