优化零件提取
我有一个特定的函数来提取以下形式的列表部分: Give[list, elem]
返回 list 中与 elem 位置相对应的部分 在全局 $Reference
变量中(如果已定义)。我在我的代码中大量使用这个函数,所以我决定优化它。这是我到目前为止所取得的进展,但坦率地说,我不知道如何前进。
ClearAll[Give, $Reference, set];
Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";
give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];
Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
ref = Reference /. {opts} /. Options@Give;
Which[
Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
True, Fold[give[#1, #2, ref] &, list, {elem}]
]];
In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}
In[109]:= Give[set, "B", "A"] (* return entry at specified row & column *)
Out[109]= 4
In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}
我决定放弃不同的签名函数调用,因为列表版本可能会调用非列表版本,这意味着必须多次执行错误处理(对于列表中的每个元素)。遗憾的是,错误处理不能被丢弃。如果改进的版本更强大(例如可以处理更多维度),那不是问题,但是上面的示例就足够了。
In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)
Out[139]= 0.031
In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)
Out[138]= 0.499
我确信这不是有效的代码,所以请随意改进它。任何帮助都是值得赞赏的,即使它只减少了几纳秒。
I have this specific function to extract parts of a list in the form: Give[list, elem]
returns the part of list that corresponds to the position of elem in a global $Reference
variable (if defined). I use this function heavily throughout my code, so I decided to optimize it. This is where I managed to get so far, but frankly, I have no idea how to advance.
ClearAll[Give, $Reference, set];
Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";
give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];
Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
ref = Reference /. {opts} /. Options@Give;
Which[
Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
True, Fold[give[#1, #2, ref] &, list, {elem}]
]];
In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}
In[109]:= Give[set, "B", "A"] (* return entry at specified row & column *)
Out[109]= 4
In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}
I've decided to drop distinct signature function calls, as the list version might call the non-list version, which means that error handling has to be done multiple times (for each element in the list). Sadly, the error handling cannot be discarded. If the improved version is more robust (can e.g. handle more dimensions), that's not a problem, however the examples above will suffice.
In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)
Out[139]= 0.031
In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)
Out[138]= 0.499
I'm sure this is not efficient code, so feel free to improve it. Any help is appreciated, even if it trims off only a few nanoseconds.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
大型列表的主要效率问题似乎来自映射
Pick
。如果您将give
的相应定义替换为以下定义,则可以避免这种情况:这是我的测试代码:
对于此用例,您可以在此处获得 10 倍的速度提升。我没有测试 2D 的,但我猜它也应该有帮助。
编辑
您可以通过缓存
$Reference
的分派表来进一步提高性能(Dispatch[Thread[ref->Range[Length[$Reference]]]< /code>) 在
Give
主体的开头一次,然后将其传递给give
(显式或通过使give
成为一个内部函数 - 通过Module
变量 - 将引用它),这样当您通过Fold 多次调用
。您还可以有条件地执行此操作,假设您在give
时,就不必重新计算它elem
中有大量元素列表,以证明创建调度表所需的时间是合理的。The main efficiency problem for large lists seems to come from mapping
Pick
. This can be avoided if you replace the corresponding definition forgive
with this one:Here is my test code:
You get 10 - fold speed increase here, for this use case. I did not test the 2D one, but would guess it should help there too.
EDIT
You could further improve performance by caching the dispatched table for
$Reference
(Dispatch[Thread[ref->Range[Length[$Reference]]]
) once at the start in the body ofGive
, and then pass it togive
(either explicitly or by makinggive
an inner function - throughModule
variables - which would refer to it), so that you don't have to recompute it in case when you callgive
several times throughFold
. You can also do that conditionally, say of you have large lists of elements inelem
, to justify the time needed to create the dispatch table.这是该问题的另一个解决方案,基于我对实数进行索引的问题。如果需要,它使用惰性求值来显示错误消息(我在这个网站上学到的技巧!感谢所有人的奉献,在这里学习新东西总是很高兴!)
Here is another solution for this problem based on a problem I had for indexing real numbers. It uses lazy evaluation to display an error message if needed (a trick I learned on this site! Thanks to all for your dedication, it's always a pleasure to learn new stuff here!)
这与 Leonid 的答案类似,但以我自己的风格。
我使用相同的
Dispatch
表,并且我建议将其尽可能设置为外部表。为此,我建议使用一个新符号$Rules
,只要$Reference
发生更改,该符号就会更新。例如:为了方便起见,如果经常这样做(询问),可以将其设置为自动。
除此之外,我的完整代码:
This is similar to Leonid's answer, but in my own style.
I use the same
Dispatch
table, and I recommend making this as external as possible. To this end, I suggest a new symbol$Rules
that is updated whenever$Reference
is changed. For example:This can be made automatic for convenience, if it is done frequently (ask).
Aside from this, my complete code:
这是我把这段代码搁置了两年后得到的结果。它记住给定参考集的调度表,并使用
Part
类型语法。我删除了所有错误消息,还删除了全局$Reference
符号。非常不像Mathematica,而且我从来不喜欢它。记忆化确保给定
ref
的调度表仅计算一次。在内存中维护多个调度表不是问题,因为这些表通常很小。定时:
将其与原始函数的时间进行比较:
This is what I got after letting this piece of code rest for 2 years. It memoizes the dispatch table for the given reference set, and uses the
Part
-type syntax. I got rid of all the error messages and also dropped the global$Reference
symbol. Very un-Mathematica-like and I never liked it.Memoization ensures that the dispatch table for a given
ref
is only calculated once. Maintaining multiple dispatch tables in memory is not a problem as these are usually small.Timing:
Compare this to the timings of the original function: