Mathematica 中的树数据结构
我主要使用mathematica 作为数学工作台和编写相对较小的临时程序。然而,我正在设计一个系统,打算在 Mathematica 中进行编程。我需要将数据存储在树中,并搜索和遍历树。尽管我知道如何实现树,但我更喜欢标准的、经过测试的代码。我在 Mathematica 用户 wiki 上查看了基本数据结构有哪些类型的包。尽管 Mathematica 文档中有一个小例子,但我还没有找到。
现在我的问题:
某处是否有可用的数据结构(开源)包?
您在数据结构方面使用了什么方法?逐步开发自己的 util 包?
(不是一个问题,只是一个评论。也许...缺乏(大量可用的)开源软件包是 Mathematica 没有应有的动力的原因。恐怕是先有鸡还是先有蛋的问题。)
I have used mathematica mostly as a mathematics workbench and for writing relatively small ad-hoc programs. I am however designing a system which I intend to program in Mathematica. I need to store data in a tree, and search and traverse the tree. Although I know how to implement a tree, I prefer standard, tested code. I looked at what sort of packages there are for basic datastructures at the Mathematica users wiki. I have found none, although there is a small example in the Mathematica documentation.
Now to my question(s):
Is there an ( open source ) package for data structures available somewhere?
What approach have you used with regard to data structures? Gradually developing your own util package?
( Not a question, just a remark. Maybe... the lack of ( lots of available ) open source packages is the reason why Mathematica doesn't have the momentum it deserves. A chicken / egg issue, I am afraid. )
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在 Mathematica 中,您所做的大部分工作都是基于表达式。表达式自然具有树结构。对于深度优先遍历(这可能是最常见的),您可以使用诸如
Scan
、Map
、Cases
等函数。更传统的语言是没有简单的方法来保存表达式树中单个节点的身份,因为 Mathematica 中没有指针。此外,当您只需要在几个地方修改表达式时,Mathematica 中惯用的许多表达式操作都会复制整个表达式,因为表达式是不可变的。使用不可变的 Mathematica 表达式作为树仍然有几个优点。一是,因为它们是不可变的,所以只需查看它们就很容易理解它们存储的内容(状态和行为不混合)。另一个是有高效且通用的函数,例如
Map
、MapIndexed
或Scan
来遍历它们。例如,访问者设计模式是不可见 - 它只是Map[f,tree ,Infinity]
,内置于语言中。此外,还有诸如Cases
、Replace
、ReplaceAll
等内置函数,允许您编写非常简洁和声明性的代码解构树,找到具有特定语法或满足某些条件的树片段等。由于树不限于仅从列表构建并且从不同头构建,因此可以有效地使用它来编写非常简洁的树处理代码。最后,本着 探索性和自下而上的编程,这缩短了开发周期并最终带来更好的设计。也就是说,您当然可以实现“有状态”(可变)树数据结构。我怀疑,尚未完成的真正原因通常是与构建、修改和遍历这样的树相关的性能影响,因为它在每一步都会经历完整的符号评估过程(请参阅这篇文章了解更多详细信息)。有关如何在 Mathematica 上下文中使用二叉搜索树来获得相当高效的代码的 2 个示例,请参阅我的帖子 此处(通用符号设置)和此处(在编译代码的上下文中)。对于在 Mathematica 中惯用地构建数据结构的一般方法,我推荐 Roman Maeder 的书籍:“Mathematica 中的编程”、“Mathematica 程序员 I&II”,尤其是“Mathematica 中的计算机科学”。在后者中,他详细讨论了如何在 Mathematica 中实现二叉搜索树。 编辑 正如 @Simon 提到的,@Daniel Lichtblau 的演讲也是一个很好的资源,它展示了如何构建数据结构并使其高效。
关于在 Mathematica 中实现包含某些状态的数据结构的一般方法,以下是从我在 此 Mathgroup 线程 - 它实现“对”数据结构。
以下是如何使用它:
创建新对对象的列表:
设置字段:
检查字段:
在我提到的帖子中有更详细的讨论。以这种方式创建的“对象”的一个大问题是它们没有自动垃圾收集,这可能是顶级 Mathematica 本身实现的 OOP 扩展没有真正起飞的主要原因之一。
Mathematica 有几个 OOP 扩展,例如 Roman Maeder 的
classes.m
包(源代码在他的“Mathematica Programmer”书中)、Objectica
商业包、和其他几个。但是,在 Mathematica 本身提供有效的机制(可能基于某种指针或引用机制)来构建可变数据结构(如果发生这种情况)之前,与此类数据结构的顶级实现相关的性能可能会受到很大影响在MMA。此外,由于 mma 基于不变性作为核心思想之一,因此使可变数据结构与 Mathematica 编程的其他习惯很好地契合并不容易。编辑
这是一个与上面示例类似的基本状态树实现:
一些使用示例:
有关使用此可变树数据结构的一个重要示例,请参阅我的这篇文章。它还将这种方法与更加重用 Mathematica 本地数据结构和函数的方法进行比较,并很好地说明了本文开头讨论的要点。
In Mathematica, most of what you do is based on expressions. Expressions naturally have the tree structure. For depth-first traversals (which are probably most common), you can then use functions like
Scan
,Map
,Cases
etc. The difference w.r.t the more traditional languages is that there is no simple way to preserve the identity of individual node in an expression tree, since there are no pointers in Mathematica. Also, many operations on expressions that are idiomatic in Mathematica would copy the entire expression when you only need to modify it in a few places, because expressions are immutable.Using immutable Mathematica expressions as trees still has several advantages. One is that, because they are immutable, it is easy to understand what they store by just looking at them (state and behavior are not mixed). Another is that there are efficient and generic functions such as
Map
,MapIndexed
orScan
, that traverse them. For example, the visitor design pattern is invisible - it is justMap[f,tree,Infinity]
, built into the langauge. Also, there are built-in functions such asCases
,Replace
,ReplaceAll
, etc, which allow one to write very concise and declarative code to destructure trees, find pieces of trees with certain syntax or satisfying some condition, etc. Since trees are not restricted to only be built from lists and be built from different heads, one can effectively use this to write very concise tree-processing code. Finally, a freedom to very easily build any tree structure you want makes it much easier to perform experiments and prototype things, in the spirit of exploratory and bottom-up programming, which shortens the development cycle and ultimately leads to better designs.That said, you can certainly implement "stateful" (mutable) tree data structure. The real reason it has not been done yet generally is, I suspect, the performance hit associated with building, modifying and traversing such a tree, since it will undergo a full symbolic evaluation process at every step (see this post for more details on that). For 2 examples of how, for example, a binary search tree can be used in Mathematica context for pretty efficient code, see my posts here (generic symbolic setting) and here (in the context of Compiled code). For general ways to construct data structures idiomatically in Mathematica, I recommend books of Roman Maeder: "Programming in Mathematica", "Mathematica programmer I&II", and especially "Computer Science in Mathematica". In the latter he has a detailed discussion of how to implement binary search tree in Mathematica. EDIT As @Simon mentioned, the talk of @Daniel Lichtblau is also a great resource, which shows how to build data structures and make them efficient.
Regarding general ways to implement data structures in Mathematica which would incorporate some state, here is a simple example extracted from my post in this Mathgroup thread - it implements a "pair" data structure.
Here is how you could use it:
Creating a list of new pair objects :
Setting the fields :
Checking the fields :
In the post I mentioned there is a more detailed discussion. One big problem for "objects" created in this way is that there is no automatic garbage collection for them, which may be one of the major reasons why OOP extensions implemented in top-level Mathematica itself did not really take off.
There are several OOP extensions for Mathematica, such as the
classes.m
package by Roman Maeder (the source is in his "Mathematica Programmer" book), theObjectica
commercial package, and several others. But until Mathematica itself would provide efficient mechanisms (perhaps based on some kind of pointer or reference mechanism) for building mutable data structures (if this ever happens), there will probably be a large performance hit associated with top-level implementations of such data structures in mma. Also, since mma is based on immutability as one of the core ideas, it is not very easy to make mutable data structures fit well with other idioms of Mathematica programming.EDIT
Here is a bare-bones stateful tree implementation along the lines of the example above:
Some examples of use:
For one non-trivial example of use of this mutable tree data structure, see this post of mine. It also confronts this method with the one more heavily reusing Mathematica native data structures and functions, and illustrates well the points discussed at the beginning of this post.
Mathematica 在这方面确实很擅长。
我避免在 Mathematica 中创建自己的数据结构,因为它无法有效地处理它们。具体来说,Mathematica 中的一般数据结构往往比其他地方慢 10-1,000 倍,这极大地限制了它们的实际用途。例如, Mathematica 比 F# 慢 100 倍计算红黑树的深度范围。
使用列表进行逻辑编程就是一个例子,其中 Mathematica 通常比其他编译语言慢几个数量级。以下 Mathematica 程序使用链表来解决 n 皇后问题:
这是等效的 F#:
解决 8 皇后问题使用 Mathematica 需要 10.5 秒,使用 F# 需要 0.07 秒。因此,在这种情况下,F# 比 Mathematica 快 150 倍。
Stack Overflow 问题Mathematica“链接列表”和性能给出了一个更极端的例子。将该 Mathematica 代码简单地转换为 F# 可以得到一个等效程序,其运行速度比 Mathematica 快 4,000 到 200,000 倍:
具体来说,Mathematica 执行一次迭代需要 0.156 秒到 16 秒,而 F# 需要 42 微秒到 86 微秒。
如果我真的想留在 Mathematica,那么我会把我所做的一切都塞进 Mathematica 的一些内置数据结构中,例如
Dispatch
。Mathematica really excels at this.
I avoid creating my own data structures in Mathematica because it cannot handle them efficiently. Specifically, general data structures tend to be 10-1,000× slower in Mathematica than elsewhere which greatly limits their practical usefulness. For example, Mathematica is 100× slower than F# at computing the range of depths in a red-black tree.
Logic programming with lists is one example where Mathematica is typically orders of magnitude slower than other compiled languages. The following Mathematica program uses linked lists to solve the n-queens problem:
Here is the equivalent F#:
Solving the 8-queens problem takes 10.5s with Mathematica and 0.07s with F#. So F# is 150× faster than Mathematica in this case.
The Stack Overflow question Mathematica "linked lists" and performance gives a more extreme example. Naive translation of that Mathematica code into F# gives an equivalent program that runs between 4,000 and 200,000× faster than the Mathematica:
Specifically, Mathematica takes 0.156s to 16s to perform a single iteration whereas the F# takes 42µs to 86µs.
If I really want to stay in Mathematica then I shoehorn everything I'm doing into Mathematica's handful of built-in data structures, e.g.
Dispatch
.