Mathematica: Conditional operations on lists
I would like to average across "rows" in a column, that is, over rows that have the same value in another column.
For example:
e= {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
{69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
24, 65, 76, 100}}
I would like to average all the values in the second column that have the same value in the first one.
So here: the average for Col 1 = 1 and the average for Col 1 = 2.
Then create a third column with the result of this operation, so the values in that column should be the same for the first 10 rows and for the next 10 rows.
Many thanks for any help you could provide!
LA
Ideal output format:
Interesting problem. This is the first thing that came to my mind:
To get rounding, you may simply add Round@ before Mean in the code above: Round@Mean@#2
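The code this answer refers to is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of what a Sow/Reap solution could look like, assuming the question's list e is transposed into {key, value} rows. The names data, rules and result are illustrative and not taken from the original answer.

```mathematica
(* The question's data: first row holds the keys, second row the values *)
e = {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
   {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
    24, 65, 76, 100}};

(* Assumption: work on {key, value} rows *)
data = Transpose[e];

(* Sow every value tagged with its key. The third argument of Reap is
   applied to each tag and its list of sown values, giving key -> mean. *)
rules = Reap[Sow[#2, #1] & @@@ data, _, #1 -> Mean[#2] &][[2]];

(* Append the mean of the row's group as a third column *)
result = Append[#, #[[1]] /. rules] & /@ data;
```

With this data, rules evaluates to {1 -> 101/2, 2 -> 119/2}. Writing Round@Mean[#2] inside Reap would give rounded means instead, as the answer suggests.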
Here is a slightly faster method, but I actually prefer the Sow/Reap one above:
If you have many different elements in the first column, either of the solutions above can be made faster by applying Dispatch to the rule list that is produced, before the replacement (/.) is done. This command tells Mathematica to build and use an optimized internal format for the rule list.
Here is a variant that is slower, but I like it enough to share anyway:
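The code for these variants is not preserved in this copy. Purely as an illustration, and not necessarily the variant the answer had in mind, here is a sketch of one alternative that builds the rule list with GatherBy and wraps it in Dispatch before the replacement, as described above. The names data, rules and result are assumptions.

```mathematica
e = {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
   {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
    24, 65, 76, 100}};
data = Transpose[e];

(* Group the rows by key, then turn each group into key -> mean of its values.
   Dispatch converts the rule list into an optimized lookup structure. *)
rules = Dispatch[
   #[[1, 1]] -> Mean[#[[All, 2]]] & /@ GatherBy[data, First]];

(* Build {key, value, group mean} rows; /. accepts the dispatch table directly *)
result = {#1, #2, #1 /. rules} & @@@ data;
```

With only two distinct keys Dispatch makes no practical difference; it starts to pay off when the rule list is long.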
Also, as general tips, you can replace Table[RandomInteger[{1, 100}], {20}] with RandomInteger[{1, 100}, 20], and Join[{c}, {d}] // Transpose with Transpose[{c, d}].
What the heck, I'll join the party. Here is my version:
Should be fast enough, I guess.
EDIT
In response to the critique of @Mr.Wizard (my first solution was reordering the list), and to explore a bit the high-performance corner of the problem, here are 2 alternative solutions:
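The definitions themselves are not preserved in this copy. As a rough illustration of the first idea only (an array indexed directly by integer key, trading memory for speed), here is a sketch under the hypothetical name denseGroupMeans. It is not the author's getMeans, and it assumes all keys are positive integers.

```mathematica
(* denseGroupMeans is a hypothetical name; keys are assumed to be positive integers *)
denseGroupMeans[pairs_] :=
 Module[{keys, vals, max, sums, counts, means},
  keys = pairs[[All, 1]];
  vals = pairs[[All, 2]];
  max = Max[keys];
  (* One slot per possible key, whether or not that key occurs *)
  sums = ConstantArray[0, max];
  counts = ConstantArray[0, max];
  Do[
   sums[[keys[[i]]]] += vals[[i]];
   counts[[keys[[i]]]]++,
   {i, Length[keys]}];
  (* 1 - Unitize[counts] is 1 exactly where counts is 0, so unused keys
     are divided by 1 instead of 0 *)
  means = sums/(counts + (1 - Unitize[counts]));
  (* Look up each row's mean by indexing with its key *)
  Transpose[{keys, vals, means[[keys]]}]]

denseGroupMeans[{{1, 2}, {3, 4}, {1, 4}}]
(* {{1, 2, 3}, {3, 4, 4}, {1, 4, 3}} *)
```

The memory cost grows with the largest key rather than with the number of distinct keys, which is why this style only suits keys that are not too large.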
The first one is the fastest, trading memory for speed, and can be applied when keys are all integers, and your maximal "key" value (2 in your example) is not too large. The second solution is free from the latter limitation, but is slower. Here is a large list of pairs:
The keys here can range from 1 to 1000; 500 of them are used, with 300 random numbers for each key. Now, some benchmarks:
We can see that getMeans is the fastest here, getMeansSparse is the second fastest, and the solution of @Mr.Wizard is somewhat slower, but only when we use Dispatch; otherwise it is much slower. My solutions and @Mr.Wizard's (with Dispatch) are similar in spirit; the speed difference is due to (sparse) array indexing being more efficient than hash look-up. Of course, all this matters only when your list is really large.
EDIT 2
Here is a version of getMeans which uses Compile with a C target and returns numerical values (rather than rationals). It is about twice as fast as getMeans, and the fastest of my solutions.
The code 1 - Unitize[lengths] protects against division by zero for unused keys. We need every number in a separate sublist, so we should call getMeansC, not getMeansComp directly. Here are some measurements:
This can probably be considered a heavily optimized numerical solution. The fact that the fully general, brief and beautiful solution of @Mr.Wizard is only about 6-8 times slower speaks very well for it, so unless you want to squeeze every microsecond out of it, I'd stick with @Mr.Wizard's one (with Dispatch). But it's important to know how to optimize code, and also to what degree it can be optimized (what you can expect).
A naive approach could be:
You could also create your original list by using for example:
Edit
Answering @Mr.'s comments
If the list is not sorted by its first element, you can do:
But this is not necessary in your example.
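The code blocks of this answer are not preserved in this copy. One way such a naive approach could be sketched, as an assumption rather than a reconstruction of the original, is to Split the rows into runs of equal keys and append each run's mean. Split only groups adjacent rows, which is why an unsorted list would need sorting first. The names data, groups and result are illustrative.

```mathematica
e = {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
   {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
    24, 65, 76, 100}};
data = Transpose[e];

(* Sort by key so equal keys are adjacent (a no-op for already sorted data),
   then split into runs of rows sharing the same key *)
groups = Split[SortBy[data, {First}], First[#1] == First[#2] &];

(* Append the run's mean to every row in the run, then flatten back to rows *)
result = Flatten[
   Function[g, Append[#, Mean[g[[All, 2]]]] & /@ g] /@ groups, 1];
```

Unlike the rule-based approaches, this returns the rows in key order rather than in their original order whenever sorting actually changes anything.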
Why not pile on?
I thought this was the most straightforward/easy-to-read answer, though not necessarily the fastest. But it's really amazing how many ways you can think of a problem like this in Mathematica.
Mr. Wizard's is obviously very cool as others have pointed out.
@Nasser, your solution doesn't generalize to n-classes, although it easily could be modified to do so.
Wow, the answers here are so advanced and cool-looking. I need more time to learn them.
Here is my answer. I am still a matrix/vector/Matlab'ish guy in recovery and transition, so my solution is not functional like the experts' solutions here. I look at data as matrices and vectors (easier for me than looking at them as lists of lists, etc.), so here it is:
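The original code is not preserved in this copy. To give a flavor of the matrix/vector style described, here is a sketch that is an assumption and not the author's code. It builds 0/1 indicator vectors for the two classes and uses dot products, and like the original it is hard-coded for two classes. All names are illustrative.

```mathematica
e = {{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2},
   {69, 7, 30, 38, 16, 70, 97, 50, 97, 31, 81, 96, 60, 52, 35, 6,
    24, 65, 76, 100}};
m = Transpose[e];      (* 20 x 2 matrix: class, value *)
keys = m[[All, 1]];
vals = m[[All, 2]];

(* Indicator vectors: 1 where the row belongs to the class, 0 elsewhere *)
is1 = 1 - Unitize[keys - 1];
is2 = 1 - Unitize[keys - 2];

(* Class averages as (indicator . values) / (class size) *)
avg1 = (is1 . vals)/Total[is1];
avg2 = (is2 . vals)/Total[is2];

(* Third column: each row receives the average of its own class *)
third = avg1 is1 + avg2 is2;
result = Transpose[{keys, vals, third}];
```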
Clearly not as good a solution as the functional ones.
Ok, I will go now and hide away from the functional programmers :)
--Nasser