优化零件提取

发布于 2024-11-04 06:00:50 字数 2125 浏览 0 评论 0原文

我有一个特定的函数来提取以下形式的列表部分： Give[list, elem] 返回 list 中与 elem 位置相对应的部分 在全局 $Reference 变量中（如果已定义）。我在我的代码中大量使用这个函数，所以我决定优化它。这是我到目前为止所取得的进展，但坦率地说，我不知道如何前进。

ClearAll[Give, $Reference, set];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];

Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
   ref = Reference /. {opts} /. Options@Give;
   Which[
      Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
      Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
      Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
      True, Fold[give[#1, #2, ref] &, list, {elem}]
]];



In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}

In[109]:= Give[set, "B", "A"] (* return entry at specified row & column *)
Out[109]= 4

In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}

我决定放弃不同的签名函数调用，因为列表版本可能会调用非列表版本，这意味着必须多次执行错误处理（对于列表中的每个元素）。遗憾的是，错误处理不能被丢弃。如果改进的版本更强大（例如可以处理更多维度），那不是问题，但是上面的示例就足够了。

In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)

Out[139]= 0.031

In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)

Out[138]= 0.499

我确信这不是有效的代码，所以请随意改进它。任何帮助都是值得赞赏的，即使它只减少了几纳秒。

原文

I have this specific function to extract parts of a list in the form: Give[list, elem] returns the part of list that corresponds to the position of elem in a global $Reference variable (if defined). I use this function heavily throughout my code, so I decided to optimize it. This is where I managed to get so far, but frankly, I have no idea how to advance.

ClearAll[Give, $Reference, set];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

give[list_, elem_List, ref_] := Flatten[Pick[list, ref, #] & /@ elem, 1];
give[list_, elem_, ref_] := First@Pick[list, ref, elem];

Options[Give] = {Reference :> $Reference}; (* RuleDelayed is necessary, for it is possible that $Reference changes between two subsequent Give calls, and without delaying its assignment, ref would use previous value of $Reference instead of actual one. *)
Give[list_List, elem___, opts___?OptionQ] := Module[{ref, pos},
   ref = Reference /. {opts} /. Options@Give;
   Which[
      Or[ref === {}, Head@ref =!= List], Message[Give::noref]; {},
      Complement[Union@Flatten@{elem}, ref] =!= {}, Message[Give::noelem, elem, ref]; {},
      Length@{elem} > Depth@list - 1, Message[Give::nodepth, {elem}, Depth@list]; {},
      True, Fold[give[#1, #2, ref] &, list, {elem}]
]];



In[106]:= $Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

Give[set, "B"](* return specified row *)
Out[108]= {4, 5, 6}

In[109]:= Give[set, "B", "A"] (* return entry at specified row & column *)
Out[109]= 4

In[110]:= Give[set, {"B", "A"}] (* return multiple rows *)
Out[110]= {{4, 5, 6}, {1, 2, 3}}

I've decided to drop distinct signature function calls, as the list version might call the non-list version, which means that error handling has to be done multiple times (for each element in the list). Sadly, the error handling cannot be discarded. If the improved version is more robust (can e.g. handle more dimensions), that's not a problem, however the examples above will suffice.

In[139]:= First@Timing[Give[set, RandomChoice[$Reference, 10000]]] (* 1D test *)

Out[139]= 0.031

In[138]:= First@Timing[Table[Give[set, Sequence @@ RandomChoice[$Reference, 2]], {10000}]] (* 2d test *)

Out[138]= 0.499

I'm sure this is not efficient code, so feel free to improve it. Any help is appreciated, even if it trims off only a few nanoseconds.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

谁人与我共长歌 2024-11-11 06:00:50

大型列表的主要效率问题似乎来自映射 Pick。如果您将 give 的相应定义替换为以下定义，则可以避免这种情况：

give[list_, elem_List, ref_] := 
    list[[elem /. Dispatch[Thread[ref -> Range[Length[ref]]]]]];

这是我的测试代码：

In[114]:= 
  Block[{$Reference = Range[100000],set = Range[100000]^2,rnd,ftiming,stiming},
      rnd = RandomSample[$Reference,10000];
      ftiming = First@Timing[res1 = Give[set,rnd]];
      Block[{give},
        give[list_,elem_List,ref_]:=list[[elem/.Dispatch[Thread[ref->Range[Length[ref]]]]]];
        give[list_,elem_,ref_]:=First@Pick[list,ref,elem];
        stiming = First@Timing[res2 = Give[set,rnd]];];
   {ftiming,stiming,res1===res2}
]

Out[114]= {1.703,0.188,True}

对于此用例，您可以在此处获得 10 倍的速度提升。我没有测试 2D 的，但我猜它也应该有帮助。

编辑

您可以通过缓存$Reference的分派表来进一步提高性能（Dispatch[Thread[ref->Range[Length[$Reference]]]< /code>) 在 Give 主体的开头一次，然后将其传递给 give （显式或通过使 give 成为一个内部函数 - 通过 Module 变量 - 将引用它），这样当您通过 Fold 多次调用 give 时，就不必重新计算它。您还可以有条件地执行此操作，假设您在 elem 中有大量元素列表，以证明创建调度表所需的时间是合理的。

The main efficiency problem for large lists seems to come from mapping Pick. This can be avoided if you replace the corresponding definition for give with this one:

give[list_, elem_List, ref_] := 
    list[[elem /. Dispatch[Thread[ref -> Range[Length[ref]]]]]];

Here is my test code:

In[114]:= 
  Block[{$Reference = Range[100000],set = Range[100000]^2,rnd,ftiming,stiming},
      rnd = RandomSample[$Reference,10000];
      ftiming = First@Timing[res1 = Give[set,rnd]];
      Block[{give},
        give[list_,elem_List,ref_]:=list[[elem/.Dispatch[Thread[ref->Range[Length[ref]]]]]];
        give[list_,elem_,ref_]:=First@Pick[list,ref,elem];
        stiming = First@Timing[res2 = Give[set,rnd]];];
   {ftiming,stiming,res1===res2}
]

Out[114]= {1.703,0.188,True}

You get 10 - fold speed increase here, for this use case. I did not test the 2D one, but would guess it should help there too.

EDIT

You could further improve performance by caching the dispatched table for $Reference (Dispatch[Thread[ref->Range[Length[$Reference]]]) once at the start in the body of Give, and then pass it to give (either explicitly or by making give an inner function - through Module variables - which would refer to it), so that you don't have to recompute it in case when you call give several times through Fold. You can also do that conditionally, say of you have large lists of elements in elem, to justify the time needed to create the dispatch table.

回复收藏 0 原文

荒芜了季节 2024-11-11 06:00:50

这是该问题的另一个解决方案，基于我对实数进行索引的问题。如果需要，它使用惰性求值来显示错误消息（我在这个网站上学到的技巧！感谢所有人的奉献，在这里学习新东西总是很高兴！）

ListToIndexFunction[list_List,precision_:0.00001]:=
   Module[{numbersToIndexFunction},

      numbersToIndexFunction::indexNotFound="Index of `1` not found.";

      MapThread[(numbersToIndexFunction[#1]=#2)&,{Round[list,precision],Range[Length@list]}];
      numbersToIndexFunction[x_]/;(Message[numbersToIndexFunction::indexNotFound,x];False):=Null;

      numbersToIndexFunction[Round[#,precision]]&
   ];

Test: 
f=ListToIndexFunction[{1.23,2.45666666666,3}]
f[2.456666]
f[2.456665]

Here is another solution for this problem based on a problem I had for indexing real numbers. It uses lazy evaluation to display an error message if needed (a trick I learned on this site! Thanks to all for your dedication, it's always a pleasure to learn new stuff here!)

ListToIndexFunction[list_List,precision_:0.00001]:=
   Module[{numbersToIndexFunction},

      numbersToIndexFunction::indexNotFound="Index of `1` not found.";

      MapThread[(numbersToIndexFunction[#1]=#2)&,{Round[list,precision],Range[Length@list]}];
      numbersToIndexFunction[x_]/;(Message[numbersToIndexFunction::indexNotFound,x];False):=Null;

      numbersToIndexFunction[Round[#,precision]]&
   ];

Test: 
f=ListToIndexFunction[{1.23,2.45666666666,3}]
f[2.456666]
f[2.456665]

回复收藏 0 原文

我纯我任性 2024-11-11 06:00:50

这与 Leonid 的答案类似，但以我自己的风格。

我使用相同的 Dispatch 表，并且我建议将其尽可能设置为外部表。为此，我建议使用一个新符号 $Rules，只要 $Reference 发生更改，该符号就会更新。例如：

$Reference = RandomSample["A"~CharacterRange~"Z"];

$Rules = Dispatch@Thread[$Reference -> Range@Length@$Reference];

为了方便起见，如果经常这样做（询问），可以将其设置为自动。

除此之外，我的完整代码：

ClearAll[Give, $Reference, Reference, $Rules];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

Options[Give] = {Reference :> $Reference};

Give[list_List, elem___, opts : OptionsPattern[]] := 
  Module[{ref, pos, rls},
   ref = OptionValue[Reference];
   rls = If[{opts} == {}, $Rules, Dispatch@Thread[ref -> Range@Length@ref]];
   Which[
    ref === {} || Head@ref =!= List,
        Message[Give::noref]; {},
    Complement[Union@Flatten@{elem}, ref] =!= {},
        Message[Give::noelem, elem, ref]; {},
    Length@{elem} > Depth@list - 1, 
        Message[Give::nodepth, {elem}, Depth@list]; {},
    True,
        list[[##]] & @@ ({elem} /. rls)
   ]
  ];

This is similar to Leonid's answer, but in my own style.

I use the same Dispatch table, and I recommend making this as external as possible. To this end, I suggest a new symbol $Rules that is updated whenever $Reference is changed. For example:

$Reference = RandomSample["A"~CharacterRange~"Z"];

$Rules = Dispatch@Thread[$Reference -> Range@Length@$Reference];

This can be made automatic for convenience, if it is done frequently (ask).

Aside from this, my complete code:

ClearAll[Give, $Reference, Reference, $Rules];

Give::noref = "No, non-list or empty $Reference was defined to refer to by Give.";
Give::noelem = "Element (or some of the elements in) `1` is is not part of the reference set `2`.";
Give::nodepth = "Give cannot return all the elements corresponding to `1` as the list only has depth `2`.";

Options[Give] = {Reference :> $Reference};

Give[list_List, elem___, opts : OptionsPattern[]] := 
  Module[{ref, pos, rls},
   ref = OptionValue[Reference];
   rls = If[{opts} == {}, $Rules, Dispatch@Thread[ref -> Range@Length@ref]];
   Which[
    ref === {} || Head@ref =!= List,
        Message[Give::noref]; {},
    Complement[Union@Flatten@{elem}, ref] =!= {},
        Message[Give::noelem, elem, ref]; {},
    Length@{elem} > Depth@list - 1, 
        Message[Give::nodepth, {elem}, Depth@list]; {},
    True,
        list[[##]] & @@ ({elem} /. rls)
   ]
  ];

回复收藏 0 原文

清风挽心 2024-11-11 06:00:50

这是我把这段代码搁置了两年后得到的结果。它记住给定参考集的调度表，并使用 Part 类型语法。我删除了所有错误消息，还删除了全局 $Reference 符号。非常不像Mathematica，而且我从来不喜欢它。

dispatch[ref_] := dispatch@ref = (Dispatch@Thread[ref -> Range@Length@ref]);
give[list_, elem__, ref_] := list[[Sequence @@ ({elem} /. dispatch@ref)]];

记忆化确保给定ref的调度表仅计算一次。在内存中维护多个调度表不是问题，因为这些表通常很小。

ref = Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

give[set, "B", ref]          (* ==> {4, 5, 6}              *)
give[set, "B", "A", ref]     (* ==> 4                      *)
give[set, {"B", "A"}, ref]   (* ==> {{4, 5, 6}, {1, 2, 3}} *)

定时：

n = 20000;
{
First@Timing[give[set, #, ref] & /@ RandomChoice[ref, n]],
First@Timing[give[set, RandomChoice[ref, n], ref]],
First@Timing[Table[give[set, Sequence @@ RandomChoice[ref, 2], ref], {n}]]
}

<前><代码>{0.140401, 0., 0.202801}

将其与原始函数的时间进行比较：

{
First@Timing[Give[set, #] & /@ RandomChoice[ref, n]],
First@Timing[Give[set, RandomChoice[ref, n]]],
First@Timing[Table[Give[set, Sequence @@ RandomChoice[ref, 2]], {n}]]
}

<前><代码>{0.780005, 0.015600, 1.029607}

This is what I got after letting this piece of code rest for 2 years. It memoizes the dispatch table for the given reference set, and uses the Part-type syntax. I got rid of all the error messages and also dropped the global $Reference symbol. Very un-Mathematica-like and I never liked it.

dispatch[ref_] := dispatch@ref = (Dispatch@Thread[ref -> Range@Length@ref]);
give[list_, elem__, ref_] := list[[Sequence @@ ({elem} /. dispatch@ref)]];

Memoization ensures that the dispatch table for a given ref is only calculated once. Maintaining multiple dispatch tables in memory is not a problem as these are usually small.

ref = Reference = {"A", "B", "C"};
set = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};

give[set, "B", ref]          (* ==> {4, 5, 6}              *)
give[set, "B", "A", ref]     (* ==> 4                      *)
give[set, {"B", "A"}, ref]   (* ==> {{4, 5, 6}, {1, 2, 3}} *)

Timing:

n = 20000;
{
First@Timing[give[set, #, ref] & /@ RandomChoice[ref, n]],
First@Timing[give[set, RandomChoice[ref, n], ref]],
First@Timing[Table[give[set, Sequence @@ RandomChoice[ref, 2], ref], {n}]]
}

{0.140401, 0., 0.202801}

Compare this to the timings of the original function:

{
First@Timing[Give[set, #] & /@ RandomChoice[ref, n]],
First@Timing[Give[set, RandomChoice[ref, n]]],
First@Timing[Table[Give[set, Sequence @@ RandomChoice[ref, 2]], {n}]]
}