Mathematica 中的 Prepend 与 Append 性能

发布于 2024-12-20 06:11:35 字数 581 浏览 1 评论 0 原文

在类似 Lisp 的系统中,cons 是在列表中 PREPEND 元素的正常方法。附加到列表的函数要昂贵得多,因为它们将列表遍历到末尾,然后用对附加项的引用替换最终的空值。 IOW (pseudoLisp)

(prepend list item) = (cons item list) = cheap!
(append list item) = (cond ((null? list) (cons item null))
                           (#t (cons (car list (append (cdr list) item)))))

问题是 Mathemtica 中的情况是否类似?在大多数方面,Mathematica 的列表似乎像 lisp 的列表一样是单链接的,如果是这样,我们可以假设 Append[list,item] 比 Prepend[list,item] 昂贵得多。但是,我无法在 Mathematica 文档中找到任何内容来解决这个问题。如果 Mathematica 的列表是双向链接的或更巧妙地实现的,例如在堆中或仅维护指向最后一个的指针,则插入可能具有完全不同的性能配置文件。

任何建议或经验将不胜感激。

in Lisp-like systems, cons is the normal way to PREPEND an element to a list. Functions that append to a list are much more expensive because they walk the list to the end and then replace the final null with a reference to the appended item. IOW (pseudoLisp)

(prepend list item) = (cons item list) = cheap!
(append list item) = (cond ((null? list) (cons item null))
                           (#t (cons (car list (append (cdr list) item)))))

Question is whether the situation is similar in Mathemtica? In most regards, Mathematica's lists seem to be singly-linked like lisp's lists, and, if so, we may presume that Append[list,item] is much more expensive than Prepend[list,item]. However, I wasn't able to find anything in the Mathematica documentation to address this question. If Mathematica's lists are doubly-linked or implemented more cleverly, say, in a heap or just maintaining a pointer-to-last, then insertion may have a completely different performance profile.

Any advice or experience would be appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

£冰雨忧蓝° 2024-12-27 06:11:35

Mathematica 的列表不像 Common Lisp 中那样是单链表。最好将 mathematica 列表视为数组或类似向量的结构。插入的速度是O(n),但检索的速度是恒定的。

看看这个 href="http://library.wolfram.com/infocenter/Conferences/321/" rel="nofollow noreferrer">Mathematica 中的数据结构和高效算法,其中更详细地介绍了 mathematica 列表。

另外,请查看这个关于链表的Stack Overflow问题及其在mathematica中的表现。

Mathematica's lists are not singly linked lists like in Common Lisp. It is better to think of mathematica lists as array or vector like structures. The speed of insertion is O(n), but the speed of retrieval is constant.

Check out this page of Data structures and Efficient Algorithms in Mathematica which covers mathematica lists in further detail.

Additionally please check out this Stack Overflow question on linked lists and their performance in mathematica.

ぃ双果 2024-12-27 06:11:35

作为一个小补充,这里是 M- 中“AppendTo”的有效替代方案

myBag = Internal`Bag[]
Do[Internal`StuffBag[myBag, i], {i, 10}]
Internal`BagPart[myBag, All]

As a small add on, here is an efficient alternative to "AppendTo" in M-

myBag = Internal`Bag[]
Do[Internal`StuffBag[myBag, i], {i, 10}]
Internal`BagPart[myBag, All]
尾戒 2024-12-27 06:11:35

正如已经提到的,由于 Mathematica 列表是作为数组实现的,因此每次添加元素时,诸如 Append 和 Prepend 之类的操作都会导致列表被复制。更有效的方法是预先分配一个列表并填充它,但是我下面的实验并没有显示出我预期的那么大的差异。显然,更好的是链表方法,我将不得不研究它。

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   appendlist = startlist;
   appendtime = 
    First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
   preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Appending", "Preallocating"}, 
 LegendPosition -> {1, 0}]

比较 AppendTo 与预分配的时序图。 (运行时间:82 秒)

timing Chart

编辑

使用 nixeagle 的建议修改大大改善了预分配时序,即 preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]];

在此处输入图像描述

第二次编辑

形式为 {{{startlist},data1} 的链接列表,data2} 效果更好,并且有一个很大的优点,即不需要像预分配那样提前知道大小。

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   linkedlisttime = 
    First[AbsoluteTiming[
      Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
        Length[datalist]}];
      linkedlist = Flatten[linkinglist];]];
   preallocatedlist = 
    Join[startlist, ConstantArray[0, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Preallocating", "Linked-List"}, 
 LegendPosition -> {1, 0}]

链表与预分配的时序比较。 (运行时间:6 秒)

在此处输入图像描述

Since, as already mentioned, Mathematica lists are implemented as arrays, operations like Append and Prepend cause the list to be copied every time an element is added. A more efficient method is to preallocate a list and fill it, however my experiment below didn't show as great a difference as I expected. Better still, apparently, is the linked-list method, which I shall have to investigate.

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   appendlist = startlist;
   appendtime = 
    First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
   preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Appending", "Preallocating"}, 
 LegendPosition -> {1, 0}]

Timing chart comparing AppendTo against preallocating. (Run time: 82 seconds)

timing chart

Edit

Using nixeagle's suggested modification improved the preallocation timing considerably, i.e. with preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]];

enter image description here

Second Edit

A linked-list of the form {{{startlist},data1},data2} works even better, and has the great advantage that the size does not need to be known in advance, as it does for preallocating.

Needs["PlotLegends`"]
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   linkedlisttime = 
    First[AbsoluteTiming[
      Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
        Length[datalist]}];
      linkedlist = Flatten[linkinglist];]];
   preallocatedlist = 
    Join[startlist, ConstantArray[0, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Preallocating", "Linked-List"}, 
 LegendPosition -> {1, 0}]

Timing comparison of linked-list vs preallocating. (Run time: 6 seconds)

enter image description here

你丑哭了我 2024-12-27 06:11:35

如果您知道结果将有多少个元素并且可以计算元素,那么整个 Append、AppendTo、Linked-List 等都是不必要的。在克里斯的速度测试中,预分配才起作用,因为他提前知道元素的数量。对datelist的访问操作代表对当前元素的虚拟计算。

如果是这样的情况,我绝对不会使用这样的方法。一个简单的表与连接相结合会更快。让我重用 Chris 的代码:我将预分配添加到时间测量中,因为当使用 Append 或链表时,也会测量内存分配。此外,我确实使用结果列表并检查它们是否相等,因为聪明的解释器可能会识别简单、无用的命令并优化它们。

Needs["PlotLegends`"]
test[n_] := Module[{
    startlist = Range[1000],
    datalist, joinResult, linkedResult, linkinglist, linkedlist, 
    preallocatedlist, linkedlisttime, preallocatedtime, count, 
    joinTime, preallocResult},


   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   {linkedlisttime, linkedResult} = 
    AbsoluteTiming[
     Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
       Length[datalist]}];
     linkedlist = Flatten[linkinglist]
     ];

   count = -1;
   preallocatedtime = First@AbsoluteTiming[
      (preallocatedlist = 
        Join[startlist, ConstantArray[0, {Length[datalist]}]];
       Do[preallocatedlist[[count]] = datalist[[count]];
        count--, {Length[datalist]}]
       )
      ];

   {joinTime, joinResult} =
    AbsoluteTiming[
     Join[startlist, 
      Table[datalist[[i]], {i, 1, Length[datalist]}]]];
   PrintTemporary[
    Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
   {preallocatedtime, linkedlisttime, joinTime}];

results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red}, 
 PlotLegend -> {"Prealloc", "Linked", "Joined"}, 
 LegendPosition -> {1, 0}]

在此处输入图像描述

在我看来,有趣的情况是,当您事先不知道元素的数量并且您必须临时决定是否必须附加/前置某些内容。在这些情况下,Reap[] 和 Sow[] 可能值得一看。一般来说,我会说,AppendTo 是邪恶的,在使用它之前,请看看替代方案:

n = 10.^5 - 1;
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
      If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
     ];

{t2, res2} = AbsoluteTiming[With[{r = Release@Table[
        With[{y = Sin[x]},
         If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
    r]];

{t3, res3} = AbsoluteTiming[Flatten@Table[
     With[{y = Sin[x]},
      If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];

{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
        If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];

{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}

Gives {5.151575, 0.250336, 0.128624, 0.148084}。 幸运的是,该结构

Flatten@Table[ With[{y = Sin[x]}, If[y > 0, y, {}]], ...]

确实具有可读性并且速度很快。

备注

在家里尝试最后一个例子时要小心。在这里,在我的 Ubuntu 64 位和 Mma 8.0.4 上,n=10^5 的 AppendTo 需要 10GB 内存。 n=10^6 占用我所有的 RAM(32GB)来创建一个包含 15MB 数据的数组。有趣的。

If you know how many elements your result will have and if you can calculate your elements, then the whole Append, AppendTo, Linked-List, etc is not necessary. In the speed-test of Chris, the preallocation only works, because he knows the number of elements in advance. The access operation to datelist stands for the virtual calculation of the current element.

If the situation is like that, I would never use such an approach. A simple Table combined with a Join is the hell faster. Let me reuse Chris' code: I add the preallocation to the time measurement, because when using Append or the linked list, the memory allocation is measured too. Furthermore, I really use the resulting lists and check wether they are equal, because a clever interpreter maybe would recognize simple, useless commands an optimize these out.

Needs["PlotLegends`"]
test[n_] := Module[{
    startlist = Range[1000],
    datalist, joinResult, linkedResult, linkinglist, linkedlist, 
    preallocatedlist, linkedlisttime, preallocatedtime, count, 
    joinTime, preallocResult},


   datalist = RandomReal[1, n*1000];
   linkinglist = startlist;
   {linkedlisttime, linkedResult} = 
    AbsoluteTiming[
     Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
       Length[datalist]}];
     linkedlist = Flatten[linkinglist]
     ];

   count = -1;
   preallocatedtime = First@AbsoluteTiming[
      (preallocatedlist = 
        Join[startlist, ConstantArray[0, {Length[datalist]}]];
       Do[preallocatedlist[[count]] = datalist[[count]];
        count--, {Length[datalist]}]
       )
      ];

   {joinTime, joinResult} =
    AbsoluteTiming[
     Join[startlist, 
      Table[datalist[[i]], {i, 1, Length[datalist]}]]];
   PrintTemporary[
    Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
   {preallocatedtime, linkedlisttime, joinTime}];

results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red}, 
 PlotLegend -> {"Prealloc", "Linked", "Joined"}, 
 LegendPosition -> {1, 0}]

enter image description here

In my opinion, the interesting situations are, when you don't know the number of elements in advance and you have to decide ad hoc whether or not you have to append/prepend something. In those cases Reap[] and Sow[] maybe worth a look. In general I would say, AppendTo is evil and before using it, have a look at the alternatives:

n = 10.^5 - 1;
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
      If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
     ];

{t2, res2} = AbsoluteTiming[With[{r = Release@Table[
        With[{y = Sin[x]},
         If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
    r]];

{t3, res3} = AbsoluteTiming[Flatten@Table[
     With[{y = Sin[x]},
      If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];

{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
        If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];

{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}

Gives {5.151575, 0.250336, 0.128624, 0.148084}. The construct

Flatten@Table[ With[{y = Sin[x]}, If[y > 0, y, {}]], ...]

is luckily really readable and fast.

Remark

Be careful trying this last example at home. Here, on my Ubuntu 64bit and Mma 8.0.4 the AppendTo with n=10^5 takes 10GB of Memory. n=10^6 takes all of my RAM which is 32GB to create an array containing 15MB of data. Funny.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文