Deriving the mean of values within a tensor

Posted on 2024-12-19 11:25:01

I have a 20000 x 185 x 5 tensor, which looks like

{{{a1_1,a2_1,a3_1,a4_1,a5_1},{b1_1,b2_1,b3_1,b4_1,b5_1}... 
(continue for 185 times)}
 {{a1_2,a2_2,a3_2,a4_2,a5_2},{b1_2,b2_2,b3_2,b4_2,b5_2}...

 ...    
 ... 
 ...

{{a1_20000,a2_20000,a3_20000,a4_20000,a5_20000},
{b1_20000,b2_20000,b3_20000,b4_20000,b5_20000}... }}

The 20000 represents iteration number, the 185 represents individuals, and each individual has 5 attributes. I need to construct a 185 x 5 matrix that stores the mean value for each individual's 5 attributes, averaged across the 20000 iterations.

I'm not sure what the best way to do this is. I know Mean[ ] works on matrices, but with a tensor, the derived values might not be what I need. Also, Mathematica runs out of memory when I try Mean[tensor]. Please provide some help or advice. Thank you.

一紙繁鸢 2024-12-26 11:25:01

When in doubt, drop the size of the dimensions. (You can still keep them distinct to easily see where things end up.)

(* In[1]:= *) data = Array[a, {4, 3, 2}]

(* Out[1]= *) {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1], 
   a[1, 2, 2]}, {a[1, 3, 1], a[1, 3, 2]}}, {{a[2, 1, 1], 
   a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}, {a[2, 3, 1], 
   a[2, 3, 2]}}, {{a[3, 1, 1], a[3, 1, 2]}, {a[3, 2, 1], 
   a[3, 2, 2]}, {a[3, 3, 1], a[3, 3, 2]}}, {{a[4, 1, 1], 
   a[4, 1, 2]}, {a[4, 2, 1], a[4, 2, 2]}, {a[4, 3, 1], a[4, 3, 2]}}}

(* In[2]:= *) Dimensions[data]

(* Out[2]= *) {4, 3, 2}

(* In[3]:= *) means = Mean[data]

(* Out[3]= *) {
  {1/4 (a[1, 1, 1] + a[2, 1, 1] + a[3, 1, 1] + a[4, 1, 1]), 
   1/4 (a[1, 1, 2] + a[2, 1, 2] + a[3, 1, 2] + a[4, 1, 2])}, 
  {1/4 (a[1, 2, 1] + a[2, 2, 1] + a[3, 2, 1] + a[4, 2, 1]), 
   1/4 (a[1, 2, 2] + a[2, 2, 2] + a[3, 2, 2] + a[4, 2, 2])}, 
  {1/4 (a[1, 3, 1] + a[2, 3, 1] + a[3, 3, 1] + a[4, 3, 1]), 
   1/4 (a[1, 3, 2] + a[2, 3, 2] + a[3, 3, 2] + a[4, 3, 2])}
  }

(* In[4]:= *) Dimensions[means]

(* Out[4]= *) {3, 2}
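
The same small symbolic array can also confirm which index is being averaged. The sketch below rebuilds the array from the example above and compares Mean[data] with a table of means taken explicitly along the first index. The name explicit is introduced here only for illustration.

```mathematica
ClearAll[a];
data = Array[a, {4, 3, 2}];
means = Mean[data];

(* Mean over the first index, with the other two indices held fixed *)
explicit = Table[Mean[data[[All, j, k]]], {j, 3}, {k, 2}];

(* The difference simplifies to a 3 x 2 array of zeros *)
Simplify[means - explicit]
```

If the difference is all zeros, each entry {j, k} of the result is the average of a[1, j, k] through a[4, j, k], which is the 185 x 5 layout the question asks for.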

绝情姑娘 2024-12-26 11:25:01

Mathematica ran out of memory if I tried to do Mean[tensor]

This is probably because intermediate results are larger than the final result, which is likely if the elements are not of type Real or Integer. Example:

a = Tuples[{x, Sqrt[y], z^x, q/2, Mod[r, 1], Sin[s]}, {2, 4}];
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}

{109125576, 124244808}

{269465456, 376960648}

If they are, and the tensor is a packed array, the elements may be such that the array is unpacked during processing.

Here is an example where the tensor is a packed array of small numbers, and unpacking does not occur.

a = RandomReal[99, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}

True

{163012808, 163016952}

{163018944, 163026688}

Here is a tensor of the same size with very large numbers.

a = RandomReal[$MaxMachineNumber, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}

True

{163010680, 458982088}

{163122608, 786958080}
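
Before calling Mean on the real data, it may help to check which of these cases applies. The following is a minimal sketch, assuming the data is held in a variable named tensor (here a small random stand-in, not the asker's data). It tests whether every element is numeric, whether the array is packed, and what happens when packing is attempted.

```mathematica
(* Stand-in for the real data; the name tensor and the reduced size are assumptions *)
tensor = RandomReal[99, {200, 185, 5}];

(* True only if tensor is a full rank-3 array whose elements are all numeric *)
ArrayQ[tensor, 3, NumericQ]

(* True if the array is stored in packed form *)
Developer`PackedArrayQ[tensor]

(* Try to pack; the input comes back unpacked if packing is not possible *)
packed = Developer`ToPackedArray[tensor];
Developer`PackedArrayQ[packed]

(* Rough memory footprint of the array in bytes *)
ByteCount[packed]
```

If the first test returns False, some entries are symbolic or otherwise non-numeric, which matches the first scenario described above.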

我不会写诗 2024-12-26 11:25:01

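To elaborate on the other answers, there is no reason to expect Mathematica functions to behave fundamentally differently on tensors than on matrices, because Mathematica treats both as nested lists that differ only in nesting depth. How a function behaves on lists depends on whether it is Listable, which you can check with Attributes[f], where f is the function you are interested in.

The dimensions of your data are not actually that large. It is hard to be sure without seeing your actual data, but I suspect you are running out of memory because some of your data is non-numeric.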
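
As a small illustration of that check, the sketch below contrasts Sin, which has the Listable attribute and so threads over every level of a nested list, with Mean, which does not and instead collapses the first level. The array shape and the name t are arbitrary choices.

```mathematica
MemberQ[Attributes[Sin], Listable]   (* True *)
MemberQ[Attributes[Mean], Listable]  (* False *)

t = RandomReal[1, {4, 3, 2}];
Dimensions[Sin[t]]   (* {4, 3, 2}: applied element by element *)
Dimensions[Mean[t]]  (* {3, 2}: the first level is averaged away *)
```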

牵你手 2024-12-26 11:25:01

I don't know what you're doing incorrectly (seeing your code would help), but Mean[] already works the way you want it to.

a = RandomReal[1, {20000, 185, 5}];
b = Mean@a;

Dimensions@b
Out[1]= {185, 5}

You can even check that this is correct:

{Max@b, Min@b}
Out[2]={0.506445, 0.494061}

Both values are close to 0.5, which is the expected mean given that RandomReal[1] samples from a uniform distribution on [0, 1].

情魔剑神 2024-12-26 11:25:01

Assume you have the following data:

a = Table[RandomInteger[100], {i, 20000}, {j, 185}, {k, 5}];

In a straightforward manner, you can build a table that stores the means of a[[1,j,k]], a[[2,j,k]], ..., a[[20000,j,k]]:

c = Table[Sum[a[[i, j, k]], {i, Length[a]}], {j, 185}, {k, 5}]/
 Length[a] // N; // Timing
{37.487, Null}

or simply:

d = Total[a]/Length[a] // N; // Timing
{0.702, Null}

The second way is about 50 times faster.

c == d
True
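
If holding the whole 20000 x 185 x 5 tensor in memory is the obstacle, the same Total-divided-by-count idea can be applied incrementally. The sketch below is only an illustration under assumptions: nextIteration is a hypothetical stand-in for whatever produces one 185 x 5 matrix per iteration, and the iteration count is reduced to keep the demo quick.

```mathematica
(* Hypothetical generator for one iteration's 185 x 5 matrix *)
nextIteration[] := RandomReal[1, {185, 5}];

nIter = 2000;  (* reduced from 20000 for a quick demo *)

(* Accumulate a running total so only one 185 x 5 matrix is stored at a time *)
runningTotal = ConstantArray[0., {185, 5}];
Do[runningTotal += nextIteration[], {nIter}];

means = runningTotal/nIter;
Dimensions[means]  (* {185, 5} *)
```

This trades the single vectorized Total call for a loop, so it is slower per element, but peak memory stays at the size of one iteration.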

撑一把青伞 2024-12-26 11:25:01

To expand on Brett's answer a bit: when you call Mean on an n-dimensional tensor, it averages over the first index and returns an (n-1)-dimensional tensor:

a = RandomReal[1, {a1, a2, a3, ... an}];
Dimensions[a] (* This would have n entries in it *)
b = Mean[a];
Dimensions[b] (* Has n-1 entries, where averaging was done over the first index *)

In the more general case where you wish to average over the i-th index, you have to transpose the data first. For example, say you want to average over the 3rd of 5 dimensions. You need the 3rd index first, followed by the 1st, 2nd, 4th, and 5th.

a = RandomReal[1, {5, 10, 2, 40, 10}];
b = Transpose[a, {2, 3, 1, 4, 5}]; (* level 3 of a becomes level 1 of b *)
c = Mean[b]; (* Now of dimensions {5, 10, 40, 10} *)

In other words, you call Transpose with a permutation that moves the i-th index to the first position and shifts every index before it back by one place. Anything that comes after the i-th index stays where it is.

This tends to come in handy when your data comes in odd formats where the first index may not always represent different realizations of a data sample. I've had this come up, for example, when I had to do time averaging of large wind data sets where the time series came third (!) in terms of the tensor representation that was available.

You could imagine that a generalizedTensorMean would look something like this:

Clear[generalizedTensorMean];
generalizedTensorMean[A_, i_] := 
 Module[{n = Length@Dimensions@A, ordering},
  ordering = 
   Join[Table[x, {x, 2, i}], {1}, Table[x, {x, i + 1, n}]];
  Mean@Transpose[A, ordering]]

This reduces to the plain-old-mean when i == 1. Try it out:

A = RandomReal[1, {2, 4, 6, 8, 10, 12, 14}];
Dimensions@A   (* {2, 4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 1]  (* {4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 7]  (* {2, 4, 6, 8, 10, 12} *)

On a side note, I'm surprised that Mathematica doesn't support this by default. You don't always want to average over the first level of a list.
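
One possible shortcut, offered here as a sketch rather than as part of the answer above, is that Total accepts a level specification, so summing at level i and dividing by the length of that dimension gives the same average without an explicit Transpose. The names m1 and m2 are introduced only for the comparison.

```mathematica
a = RandomReal[1, {5, 10, 2, 40, 10}];

(* Average over the 3rd index: sum at level 3, then divide by that dimension's length *)
m1 = Total[a, {3}]/Dimensions[a][[3]];
Dimensions[m1]  (* {5, 10, 40, 10} *)

(* Same average via Transpose: level 3 of a is moved to level 1 before Mean *)
m2 = Mean[Transpose[a, {2, 3, 1, 4, 5}]];

(* The two results should agree up to floating-point rounding *)
Max[Abs[m1 - m2]] < 10^-12
```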
