Return value from column indicated in same row
I'm stuck with a simple loop that takes more than an hour to run, and need help to speed it up.
Basically, I have a matrix with 31 columns and 400 000 rows. The first 30 columns have values, and the 31st column has a column-number. I need to, per row, retrieve the value in the column indicated by the 31st column.
Example row: [26,354,72,5987..,461,3] (this means that the value in column 3 is sought after (72))
The too slow loop looks like this:
a <- rep(0,nrow(data)) #To pre-allocate memory
for (i in 1:nrow(data)) {
a[i] <- data[i,data[i,31]]
}
I would think this would work:
a <- data[,data[,31]]
... but it results in "Error: cannot allocate vector of size 2.8 Mb".
I fear that this is a really simple question, so I've spent hours trying to understand apply, lapply, reshape, and more, but somehow I can't get a grip on the vectorization concept in R.
The matrix actually has even more columns that also go into the a-parameter, which is why I don't want to rebuild the matrix, or split it.
Your support is highly appreciated!
Chris
Comments (3)
This works because you can reference matrices both in array format and in vector format (a 400000*31 long vector in this case), counting column-wise first. To count row-wise, you use the transpose.
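The explanation above can be sketched as follows. This is a minimal illustration, not necessarily the exact expression this answer used: the small random `data` matrix and the names `n_row`, `n_val`, `n_col` and `a_loop` are assumptions standing in for the 400000 x 31 matrix in the question.

```r
# Small stand-in for the matrix in the question (assumed layout: value
# columns first, the column number to look up in the last column).
set.seed(1)
n_row <- 6
n_val <- 4
data <- cbind(matrix(sample(100L, n_row * n_val), nrow = n_row),
              sample(n_val, n_row, replace = TRUE))
n_col <- ncol(data)  # 5 here, 31 in the question

# t(data) stores row 1 of data first, then row 2, and so on, so element
# (i, j) of data sits at position (i - 1) * n_col + j of t(data).
a <- t(data)[(seq_len(n_row) - 1L) * n_col + data[, n_col]]

# Reference result from the row-by-row loop in the question.
a_loop <- rep(0, n_row)
for (i in seq_len(n_row)) a_loop[i] <- data[i, data[i, n_col]]

print(all(a == a_loop))
```

Note that `t(data)` makes a full copy of the matrix, which may matter at 400000 rows.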
Single-index notation for the matrix may use less memory. This would involve doing something like:
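A minimal sketch of single-index notation applied to the question's layout. The small `data` matrix and the names `n`, `idx_col` and `a_check` are assumptions for illustration; in the question `idx_col` would be 31.

```r
# Assumed stand-in data: 4 value columns plus a last column of column numbers.
set.seed(2)
data <- cbind(matrix(rnorm(6 * 4), nrow = 6),
              sample(4, 6, replace = TRUE))
n <- nrow(data)
idx_col <- ncol(data)

# R stores matrices column by column, so element (i, j) is element
# (j - 1) * n + i of the underlying vector.
a <- data[(data[, idx_col] - 1) * n + seq_len(n)]

# Same values picked one row at a time, for comparison.
a_check <- sapply(seq_len(n), function(i) data[i, data[i, idx_col]])
print(all(a == a_check))
```

Unlike indexing the transpose, this form does not copy the matrix; it only builds one index vector of length `n`.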
Below is an example of single-index notation for matrices in R. In this example, the index of the per-row maximum is appended as the last column of a random matrix. This last column is then used to select the per-row maxima via single-index notation.
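The example described above could look roughly like this. It is a sketch: the 5 x 4 size, the seed, and the names `m`, `n` and `row_max` are assumptions, and `apply(m, 1, which.max)` is just one way to obtain the index of each row's maximum.

```r
set.seed(42)
m <- matrix(runif(5 * 4), nrow = 5)     # random 5 x 4 matrix
m <- cbind(m, apply(m, 1, which.max))   # append index of each row's maximum
n <- nrow(m)

# Use the appended last column to pick the per-row maximum via a single
# (column-major) index: element (i, j) is element (j - 1) * n + i.
row_max <- m[(m[, ncol(m)] - 1) * n + seq_len(n)]

# Compare with the maxima computed directly from the value columns.
print(all(row_max == apply(m[, 1:4], 1, max)))
```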
Using an index matrix is an alternative that will probably use more memory but is slightly clearer:
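A minimal sketch of the index-matrix form, again on assumed stand-in data (`data`, `idx_col`, `index_matrix` and `a_check` are illustrative names). A two-column numeric matrix used inside `[` is read as (row, column) pairs, one per output element.

```r
# Assumed stand-in data: 4 value columns plus a last column of column numbers.
set.seed(3)
data <- cbind(matrix(rnorm(6 * 4), nrow = 6),
              sample(4, 6, replace = TRUE))
idx_col <- ncol(data)

# Each row of index_matrix is one (row, column) pair to extract.
index_matrix <- cbind(seq_len(nrow(data)), data[, idx_col])
a <- data[index_matrix]

# Same values picked one row at a time, for comparison.
a_check <- sapply(seq_len(nrow(data)), function(i) data[i, data[i, idx_col]])
print(all(a == a_check))
```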
Try to change the code to work a column at a time:
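A minimal sketch of the column-at-a-time idea, on assumed stand-in data (the names `n_val`, `idx_col`, `hit` and `a_check` are illustrative; in the question `n_val` would be 30 and `idx_col` 31). The loop runs once per value column instead of once per row.

```r
# Assumed stand-in data: 4 value columns plus a last column of column numbers.
set.seed(4)
n_val <- 4
data <- cbind(matrix(rnorm(8 * n_val), nrow = 8),
              sample(n_val, 8, replace = TRUE))
idx_col <- ncol(data)

a <- rep(0, nrow(data))              # pre-allocate the result
for (j in seq_len(n_val)) {
  hit <- data[, idx_col] == j        # rows whose last column points at column j
  a[hit] <- data[hit, j]             # fill those rows from column j in one step
}

# Same values picked one row at a time, for comparison.
a_check <- sapply(seq_len(nrow(data)), function(i) data[i, data[i, idx_col]])
print(all(a == a_check))
```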
For each column i, this sets the elements of a whose row has the value i in the last column to that row's value from column i. It took longer to build the matrix than to calculate the vector a.