Mathematica 的模式匹配优化不佳?
我最近询问为什么 PatternTest
导致大量不必要的评估:PatternTest 未优化?< /a> Leonid 回答说,对于我看来相当值得怀疑的方法来说,这是必要的。我可以接受这一点,尽管我更喜欢更有效的替代方案。
我现在意识到,我相信 Leonid 已经说过一段时间了,这个问题在 Mathematica 中表现得更深,我感到很困扰。我不明白为什么这没有或不能更好地优化。
考虑这个例子:
list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing
{0., Null}
{1.014, False}
检查列表的头部基本上是瞬时的,但检查模式需要一秒钟以上的时间。当然,Mathematica 可以认识到,由于列表的第一个元素不是整数,因此模式无法匹配,并且与 PatternTest
的情况不同,我看不到如何存在任何可变性在模式中。对此有何解释?
关于打包数组似乎存在一些混乱,据我所知,这与这个问题没有关系。相反,我关心的是所有列表(打包或未打包)的 O(n2) 时间复杂度。
I recently inquired about why PatternTest
was causing a multitude of needless evaluations: PatternTest not optimized? Leonid replied that it is necessary for what seems to me as a rather questionable method. I can accept that, though I would prefer a more efficient alternative.
I now realize, which I believe Leonid has been saying for some time, that this problem runs much deeper in Mathematica, and I am troubled. I cannot understand why this is not or cannot be better optimized.
Consider this example:
list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing
{0., Null}
{1.014, False}
Checking the heads of the list is essentially instantaneous, yet checking the pattern takes over a second. Surely Mathematica could recognize that since the first element of the list is not an Integer, the pattern cannot match, and unlike the case with PatternTest
I cannot see how there is any mutability in the pattern. What is the explanation for this?
There appears to be some confusion regarding packed arrays, which as far as I can tell have no bearing on this question. Rather, I am concerned with the O(n2) time complexity on all lists, packed or unpacked.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
MatchQ
为此类测试解压。原因是没有实施这种特殊情况。原则上它可以包含任何东西。改进这一点非常棘手 - 如果你破坏了模式匹配器,你就会遇到严重的问题。
编辑1:
确实,拆包并不是 O(n^2) 复杂度的原因。然而,它确实表明,对于
MatchQ[list, {x__Integer, y__}]
部分,代码转到算法的另一部分(需要解包列表)。其他需要注意的事项:仅当两个模式均为__
时,才会出现这种复杂性,如果其中任何一个为_
,则算法具有更好的复杂性。然后该算法会遍历所有 n*n 个潜在匹配,并且似乎没有早期救助。大概是因为可以构造需要这种复杂性的其他模式 - 问题是上述模式迫使匹配器采用非常通用的算法。
然后我希望有
MatchQ[list, {Shortest[x__Integer], __}]
和朋友,但没有成功。所以,我的两分钱:要么使用不同的模式(并使用 On["Packing"] 来查看它是否进入通用匹配器),要么进行预检查
DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer
或类似的值。MatchQ
unpacks for these kinds of tests. The reason is that no special case for this has been implemented. In principle it could contain anything.Improving this is very tricky - if you break the pattern matcher you have a serious problem.
Edit 1:
It is true that the unpacking is not the cause for the O(n^2) complexity. It does, however, show that for the
MatchQ[list, {x__Integer, y__}]
part the code goes to another part of the algorithm (which needs the lists to be unpacked). Some other things to note: This complexity arises only if both patterns are__
if either one of them is_
the algorithm has a better complexity.The algorithm then goes through all n*n potential matches and there seems no early bailout. Presumably because other patters could be constructed that would need this complexity - The issue is that the above pattern forces the matcher to a very general algorithm.
I then was hoping for
MatchQ[list, {Shortest[x__Integer], __}]
and friends but to no avail.So, my two cents: either use a different pattern (and have On["Packing"] to see if it goes to the general matcher) or do a pre-check
DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer
or some such.@第一个答案的作者
。据我从逆向工程和阅读可用信息得知,这可能是由于检查模式的方式不同所致。事实上 - 正如他们所说 - 一个特殊的哈希码用于模式匹配。该散列(基本上是 FNV-1 轮)使得检查与所涉及的表达式类型相关的特定模式(一些异或运算的问题)变得非常容易。哈希算法在表达式内循环,每个子部分都与前一个子部分的输出进行异或。每个原子表达式都使用特殊的异或值 - machineInts、machineReals、bigNums、Rationals 等。因此,例如,_Integer
很容易检查,因为任何整数的散列都是由整数的异或值形成的,所以我们需要做的就是进行逆运算并查看是否匹配 - 即我们是否得到一些特定的值或类似的东西(抱歉,如果我对实际的实现细节含糊其辞。它是WIP)。对于一般或不常见的模式,检查可能不会利用此哈希内容,并且需要不同的东西。@the OP
Head[]
只是作用于内部表达式,获取表达式的第一个指针的值(表达式被实现为指针数组)。因此,这样做就像复制和打印字符串一样简单 - 非常非常快。在这种情况下甚至没有调用模式匹配引擎。@the author of the first answer
. As far as I know from reverse-engeneering and reading of available information, it may be due to different ways the patterns are checked. In fact - as they say - a special hash code is used for pattern matching. This hash (basically a FNV-1 round) makes it very easy to check for particular patterns related to the type of expression involved (matter of a few xor operations). The hashing algorithm cycles inside the expression and each subpart is xorred with the output of the previous one. Special xor values are used for each atom expression - machineInts, machineReals, bigNums, Rationals and so on. Hence, for example,_Integer
is easy to check because the hash of any integer is formed with integer's xor value, so all we need to do is doing the inverse op and see if matches - i.e. if we get some particular value or something like that (sorry if I'm vague on actual implementation details. It's WIP). For general or uncommon patterns the check may not take advantage of this hash stuff and require something different.@the OP
Head[]
simply acts on the internal expression, taking the value of the first pointer of the expression (expressions are implemented as arrays of pointers). So doing it is as easy as copying and printing a string - very very fast. The pattern matching engine is not even called in this case.