Mathematica: Conditional operations on lists

Posted on 2024-11-09 12:35:48


I would like to average across "rows" in a column, that is, over the rows that have the same value in another column.

For example :

e= {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2}, 
   {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6, 
    24, 65, 76, 100}}


I would like to average all the values in the second column that have the same value in the first one.

So here: the average for Col 1 = 1 and the average for Col 1 = 2.

Then I would like to create a third column with the result of this operation, so the values in that column should be the same for the first 10 rows and for the next 10.

Many thanks for any help you could provide!

Ideal output format: the two original columns plus a third column holding the group average.


沉默的熊 2024-11-16 12:35:48

Interesting problem. This is the first thing that came to mind:

e[[All, {1}]] /. Reap[Sow[#2, #] & @@@ e, _, # -> Mean@#2 &][[2]];

ArrayFlatten[{{e, %}}] // TableForm

To round the result, simply add Round@ before Mean in the code above: Round@Mean@#2

Here is a slightly faster method, but I actually prefer the Sow/Reap one above:

#[[1, 1]] -> Round@Mean@#[[All, 2]] & /@ GatherBy[e, First];

ArrayFlatten[{{e, e[[All, {1}]] /. %}}] // TableForm

If you have many different elements in the first column, either of the solutions above can be made faster by applying Dispatch to the rule list that is produced, before the replacement (/.) is done. This command tells Mathematica to build and use an optimized internal format for the rule list.
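As a minimal sketch of the Dispatch step described above, the rules can be built once and wrapped before the replacement. The data in pairs and the names rules, means and result are made up for illustration; e would take the place of pairs.

```mathematica
(* Made-up sample data: a list of {key, value} pairs *)
pairs = {{1, 10}, {2, 40}, {1, 20}, {2, 60}, {1, 30}};

(* Build the key -> mean rules with Sow/Reap, as in the first snippet *)
rules = Reap[Sow[#2, #] & @@@ pairs, _, # -> Mean@#2 &][[2]];

(* Wrap the rule list in Dispatch before doing the replacement *)
means = pairs[[All, {1}]] /. Dispatch[rules];

(* Append the means as a third column *)
result = ArrayFlatten[{{pairs, means}}]
(* {{1, 10, 20}, {2, 40, 50}, {1, 20, 20}, {2, 60, 50}, {1, 30, 20}} *)
```

On a list this small Dispatch makes no measurable difference; it only pays off when there are many distinct keys.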

Here is a variant that is slower, but I like it enough to share anyway:

Module[{q},
  Reap[{#, Sow[#2,#], q@#} & @@@ e, _, (q@# = Mean@#2) &][[1]]
]

Also, as general tips, you can replace:

Table[RandomInteger[{1, 100}], {20}] with RandomInteger[{1, 100}, 20]

and Join[{c}, {d}] // Transpose with Transpose[{c, d}].
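A quick check of the second tip, using made-up columns c and d (hypothetical values, not the ones from the question):

```mathematica
c = {1, 1, 2, 2};   (* hypothetical key column *)
d = {5, 7, 6, 8};   (* hypothetical value column *)

viaJoin = Join[{c}, {d}] // Transpose;
viaTranspose = Transpose[{c, d}];

viaJoin === viaTranspose   (* True *)
viaTranspose               (* {{1, 5}, {1, 7}, {2, 6}, {2, 8}} *)
```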


怎樣才叫好 2024-11-16 12:35:48

What the heck, I'll join the party. Here is my version:

Flatten/@Flatten[Thread/@Transpose@{#,Mean/@#[[All,All,2]]}&@GatherBy[e,First],1]

It should be fast enough, I guess.

EDIT

In response to the critique of @Mr.Wizard (my first solution was reordering the list), and to explore a bit the high-performance corner of the problem, here are 2 alternative solutions:

getMeans[e_] := 
Module[{temp = ConstantArray[0, Max[#[[All, 1, 1]]]]},
  temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
  List /@ temp[[e[[All, 1]]]]] &[GatherBy[e, First]];

getMeansSparse[e_] := 
Module[{temp = SparseArray[{Max[#[[All, 1, 1]]] -> 0}]},
  temp[[#[[All, 1, 1]]]] = Mean /@ #[[All, All, 2]];
  List /@ Normal@temp[[e[[All, 1]]]]] &[GatherBy[e, First]];

The first one is the fastest, trading memory for speed; it can be applied when the keys are all integers and your maximal "key" value (2 in your example) is not too large. The second solution is free from the latter limitation, but is slower. Here is a large list of pairs:

In[303]:= 
tst = RandomSample[#, Length[#]] &@
   Flatten[Map[Thread[{#, RandomInteger[{1, 100}, 300]}] &, 
      RandomSample[Range[1000], 500]], 1];

In[310]:= Length[tst]

Out[310]= 150000

In[311]:= tst[[;; 10]]

Out[311]= {{947, 52}, {597, 81}, {508, 20}, {891, 81}, {414, 47}, 
{849, 45}, {659, 69}, {841, 29}, {700, 98}, {858, 35}}

The keys can be from 1 to 1000 here, 500 of them, and there are 300 random numbers for each key. Now, some benchmarks:

In[314]:= (res0 = getMeans[tst]); // Timing

Out[314]= {0.109, Null}

In[317]:= (res1 = getMeansSparse[tst]); // Timing

Out[317]= {0.219, Null}

In[318]:= (res2 =  tst[[All, {1}]] /. 
 Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]); // Timing

Out[318]= {5.687, Null}

In[319]:= (res3 = tst[[All, {1}]] /. 
 Dispatch[
  Reap[Sow[#2, #] & @@@ tst, _, # -> Mean@#2 &][[2]]]); // Timing

Out[319]= {0.391, Null}

In[320]:= res0 === res1 === res2 === res3

Out[320]= True

We can see that getMeans is the fastest here and getMeansSparse the second fastest. The solution of @Mr.Wizard is somewhat slower, but only when we use Dispatch; otherwise it is much slower. My solutions and @Mr.Wizard's (with Dispatch) are similar in spirit; the speed difference is due to (sparse) array indexing being more efficient than hash look-up. Of course, all this matters only when your list is really large.
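For comparison, here is a sketch of the hash look-up idea written with an Association. It assumes Mathematica 10 or later (GroupBy and Lookup), is not one of the solutions benchmarked above, and I make no timing claim for it. The data and the names meansByKey, thirdColumn and result are made up for illustration.

```mathematica
(* Made-up sample data: unsorted {key, value} pairs *)
pairs = {{3, 10}, {1, 40}, {3, 20}, {1, 60}, {3, 30}};

(* Group the values by key and reduce each group with Mean *)
meansByKey = GroupBy[pairs, First -> Last, Mean];   (* <|3 -> 20, 1 -> 50|> *)

(* Look up the mean for every row, preserving the original row order *)
thirdColumn = Lookup[meansByKey, pairs[[All, 1]]];  (* {20, 50, 20, 50, 20} *)

result = Transpose[{pairs[[All, 1]], pairs[[All, 2]], thirdColumn}]
(* {{3, 10, 20}, {1, 40, 50}, {3, 20, 20}, {1, 60, 50}, {3, 30, 20}} *)
```

Unlike getMeans, this does not need integer keys or a bounded maximal key, at the cost of a hash look-up per row.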

EDIT 2

Here is a version of getMeans that uses Compile with a C target and returns numerical values (rather than rationals). It is about twice as fast as getMeans, and the fastest of my solutions.

getMeansComp = 
 Compile[{{e, _Integer, 2}},
   Module[{keys = e[[All, 1]], values = e[[All, 2]], sums = {0.} ,
      lengths = {0}, i = 1, means = {0.} , max = 0, key = -1 , 
      len = Length[e]},
    max = Max[keys];
    sums = Table[0., {max}];
    lengths = Table[0, {max}];
    means = sums;
    Do[key = keys[[i]];
      sums[[key]] += values[[i]];
      lengths[[key]]++, {i, len}];
    means = sums/(lengths + (1 - Unitize[lengths]));
    means[[keys]]], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

getMeansC[e_] := List /@ getMeansComp[e];

The code 1 - Unitize[lengths] protects against division by zero for unused keys. We need every number in a separate sublist, so we should call getMeansC, not getMeansComp directly. Here are some measurements:
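To see what the guard does, here is a small worked example with made-up sums and lengths, where key 2 never occurs (the names guard and safeMeans are only for illustration):

```mathematica
sums = {60., 0., 100.};   (* per-key sums; key 2 is unused *)
lengths = {3, 0, 2};      (* per-key counts *)

(* Unitize gives 0 for zero and 1 otherwise, so the guard is 1 exactly at unused keys *)
guard = 1 - Unitize[lengths]        (* {0, 1, 0} *)

(* Unused keys are divided by 1 instead of 0 *)
safeMeans = sums/(lengths + guard)  (* {20., 0., 50.} *)
```

The 0. reported for an unused key is never read, because the final means[[keys]] only picks positions that actually occur as keys.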

In[180]:= (res1 = getMeans[tst]); // Timing

Out[180]= {0.11, Null}

In[181]:= (res2 = getMeansC[tst]); // Timing

Out[181]= {0.062, Null}

In[182]:= N@res1 == res2

Out[182]= True

This can probably be considered a heavily optimized numerical solution. The fact that the fully general, brief and beautiful solution of @Mr.Wizard is only about 6-8 times slower speaks very well for it, so unless you want to squeeze every microsecond out of it, I'd stick with @Mr.Wizard's (with Dispatch). But it's important to know how to optimize code, and also to what degree it can be optimized (what you can expect).


还给你自由 2024-11-16 12:35:48

A naive approach could be:

Table[
  Join[ i, {Select[Mean /@ SplitBy[e, First], First@# == First@i &][[1, 2]]}]
, {i, e}] // TableForm

(*
1   59  297/5
1   72  297/5
1   90  297/5
1   63  297/5
1   77  297/5
1   98  297/5
1   3   297/5
1   99  297/5
1   28  297/5
1   5   297/5
2   87  127/2
2   80  127/2
2   29  127/2
2   70  127/2
2   83  127/2
2   75  127/2
2   68  127/2
2   65  127/2
2   1   127/2
2   77  127/2
*)

You could also create your original list with, for example:

e = Array[{Ceiling[#/10], RandomInteger[{1, 100}]} &, {20}]

Edit

Answering @Mr.'s comments

If the list is not sorted by its first element, you can do:

Table[Join[
  i, {Select[
     Mean /@ SplitBy[SortBy[e, First], First], First@# == First@i &][[1,2]]}],
{i, e}] //TableForm

But this is not necessary in your example.
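The reason sorting matters is that SplitBy only groups consecutive runs, while GatherBy collects equal keys wherever they appear. A tiny illustration with made-up data (the names unsorted, runs, sortedRuns and gathered are only for this sketch):

```mathematica
unsorted = {{1, 10}, {2, 20}, {1, 30}};

(* SplitBy breaks the list at every change of key, so key 1 ends up in two groups *)
runs = SplitBy[unsorted, First]
(* {{{1, 10}}, {{2, 20}}, {{1, 30}}} *)

(* Sorting first brings equal keys together *)
sortedRuns = SplitBy[SortBy[unsorted, First], First]
(* {{{1, 10}, {1, 30}}, {{2, 20}}} *)

(* GatherBy groups equal keys without sorting *)
gathered = GatherBy[unsorted, First]
(* {{{1, 10}, {1, 30}}, {{2, 20}}} *)
```

Without the SortBy, the Select in the naive approach would pick up only the first run for key 1 and report 10 instead of 20 as its mean.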


一袭水袖舞倾城 2024-11-16 12:35:48

Why not pile on?

I thought this was the most straightforward/easy-to-read answer, though not necessarily the fastest. But it's really amazing how many ways you can think of a problem like this in Mathematica.

Mr. Wizard's is obviously very cool as others have pointed out.

@Nasser, your solution doesn't generalize to n classes, although it could easily be modified to do so.

meanbygroup[table_] := Join @@ Table[
   Module[
     {sublistmean},
     sublistmean = Mean[sublist[[All, 2]]];
     Table[Append[item, sublistmean], {item, sublist}]
   ]
   , {sublist, GatherBy[table, #[[1]] &]}
       ]
(* On this dataset: *) 
meanbygroup[e]


无人接听 2024-11-16 12:35:48

Wow, the answers here are so advanced and cool-looking; I need more time to learn them.

Here is my answer. I am still a matrix/vector/Matlab-ish guy in recovery and transition, so my solution is not functional like the experts' solutions here. I look at data as matrices and vectors (easier for me than looking at them as lists of lists, etc.), so here it is:

sizeOfList=10; (*given from the problem, along with e vector*)
m1 = Mean[e[[1;;sizeOfList,2]]];
m2 = Mean[e[[sizeOfList+1;;2 sizeOfList,2]]];
r  = {e[[All,1]], e[[All,2]], Flatten[{Table[m1,{sizeOfList}],Table[m2,{sizeOfList}]}]} //Transpose; (*key and value columns taken from e*)

MatrixForm[r]

Clearly not as good a solution as the functional ones.

Ok, I will go now and hide away from the functional programmers :)

--Nasser
