Calculating the mean of many matrices in numpy
I have many csv files, each of which contains a roughly identical matrix. Each matrix is 11 columns by either 5 or 6 rows. The columns are variables and the rows are test conditions. Some of the matrices do not contain data for the last test condition, which is why some matrices have 5 rows and others have 6.
My application is written in Python 2.6 using numpy and scipy.
My question is this:
How can I most efficiently create a summary matrix that contains the means of each cell across all of the identical matrices?
The summary matrix would have the same structure as all of the other matrices, except that the value in each of its cells would be the mean of the values stored in the corresponding cell across all of the other matrices. If a matrix does not contain data for the last test condition, I want to make sure that its missing cells are not treated as zeros when the averaging is done. In other words, I want the means of all the non-zero values.
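To make the requirement concrete, here is a minimal plain-NumPy sketch of "the mean of all the non-zero values" (not the method from the answer below). It assumes each matrix has been loaded as a 2-D array with rows = test conditions and columns = variables, and that a matrix missing the last test condition is zero-padded; `nonzero_mean` is a hypothetical helper name.

```python
import numpy as np

def nonzero_mean(matrices, n_rows=6):
    """Cell-wise mean across matrices, counting only non-zero entries.

    Hypothetical helper: each matrix is a 2-D array (rows = test conditions,
    columns = variables). Matrices with fewer than n_rows rows are zero-padded,
    and zeros are then excluded from both the sum and the count.
    """
    n_cols = matrices[0].shape[1]
    stack = np.zeros((len(matrices), n_rows, n_cols))
    for i, m in enumerate(matrices):
        stack[i, :m.shape[0], :] = m      # missing trailing rows stay zero

    totals = stack.sum(axis=0)            # zeros add nothing to the sum
    counts = (stack != 0).sum(axis=0)     # ...and are not counted either
    # Cells where no matrix has a non-zero value come out as nan.
    return np.divide(totals, counts, out=np.full_like(totals, np.nan),
                     where=counts > 0)

a = np.ones((6, 11))          # a file with all 6 test conditions
b = np.full((5, 11), 3.0)     # a file missing the last test condition
summary = nonzero_mean([a, b])
# rows 0-4 average 1.0 and 3.0 -> 2.0; row 5 only has a's 1.0 -> 1.0
```

One caveat: this treats a genuinely measured zero the same as missing data, which is why marking the missing cells explicitly (as masked arrays do) is more precise.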
Can anyone show me a brief, flexible way of organizing this code so that it does everything I want with as little code as possible, while remaining as flexible as possible in case I want to reuse it later with other data structures?
I know how to read in all the csv files and how to write the output. I just don't know the most efficient way to structure the flow of data in the script, including whether to use Python arrays or numpy arrays, how to structure the operations, etc.
I have tried coding this in a number of different ways, but they all seem rather code-intensive and inflexible if I later want to reuse this code for other data structures.
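You could use masked arrays. Say N is the number of csv files. You can store all your data in a masked array A of shape (N,11,6), i.e. each matrix stored as 11 variables by 6 test conditions, with the cells of a missing last test condition masked.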
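A minimal sketch of building A, assuming each CSV contains only comma-separated numbers with no header row. The helper names `load_matrices` and `build_masked_stack`, the `*.csv` glob pattern, and the 11/6 constants (taken from the question) are illustrative assumptions, not part of the answer.

```python
import glob

import numpy as np
import numpy.ma as ma

N_VARS, N_CONDITIONS = 11, 6  # from the question: 11 variables, up to 6 test conditions

def load_matrices(pattern="*.csv"):
    """Read each CSV into a 2-D array (rows = test conditions, columns = variables).

    Assumes numeric, comma-separated files without a header; the glob
    pattern is a placeholder.
    """
    return [np.loadtxt(fname, delimiter=",", ndmin=2)
            for fname in sorted(glob.glob(pattern))]

def build_masked_stack(matrices):
    """Stack matrices into a masked array of shape (N, 11, 6).

    Each matrix is transposed so that axis 1 indexes variables and axis 2
    indexes test conditions. Cells of a missing last test condition stay
    masked instead of being treated as zeros.
    """
    n = len(matrices)
    data = np.zeros((n, N_VARS, N_CONDITIONS))
    mask = np.ones((n, N_VARS, N_CONDITIONS), dtype=bool)  # start fully masked
    for i, m in enumerate(matrices):
        rows = m.shape[0]              # 5 or 6 test conditions
        data[i, :, :rows] = m.T        # m.T has shape (11, rows)
        mask[i, :, :rows] = False      # unmask the cells that hold real data
    return ma.masked_array(data, mask=mask)
```

In numpy.ma a mask value of True means "ignore this cell", so starting fully masked and unmasking only the rows a file actually provides means the padding never enters any calculation.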
Then the mean values along the first axis, taking the masked values into account (masked cells are simply skipped), are given by A.mean(axis=0).
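A minimal sketch of that step plus writing the result, assuming A is laid out as (N, 11, 6) with the missing cells masked. A toy two-file A is built inline so the snippet runs on its own; `summary_matrix`, `write_summary` and the output file name are hypothetical.

```python
import numpy as np
import numpy.ma as ma

def summary_matrix(A):
    """Cell-wise mean over the first axis of a masked (N, 11, 6) array.

    Masked cells are skipped, so a file with only 5 test conditions
    contributes nothing to the 6th condition's mean.
    """
    means = A.mean(axis=0)      # masked array of shape (11, 6)
    counts = A.count(axis=0)    # how many files contributed to each cell
    return means, counts

def write_summary(means, fname="summary.csv"):
    """Write the summary in the files' layout (6 rows x 11 columns).

    The file name is a placeholder; cells with no data at all are written as nan.
    """
    np.savetxt(fname, means.T.filled(np.nan), delimiter=",")

# Toy data: one file with 6 conditions (all 1.0), one with 5 (all 3.0),
# stored as (variables, conditions) with the missing 6th condition masked.
data = np.zeros((2, 11, 6))
data[0] = 1.0
data[1, :, :5] = 3.0
mask = np.zeros(data.shape, dtype=bool)
mask[1, :, 5] = True
A = ma.masked_array(data, mask=mask)

means, counts = summary_matrix(A)
# means: 2.0 for the first five conditions, 1.0 for the sixth (only one file has it)
# counts: 2 for the first five conditions, 1 for the sixth
```

Because nothing here depends on the 11/6 sizes except the layout of A, the same two functions work unchanged for any stack of equally shaped (possibly partially masked) matrices.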