How can I get R to modify a matrix in place after calling fastLmPure?

Posted on 2025-01-13 02:00:39

I'm trying to run fastLmPure in a loop, avoiding copying a large matrix.

I've pre-allocated the matrix to the size I need, and it's only the last column I need to change values in. i.e. it doesn't grow.

I've boiled it down to the minimal cases below, which illustrate the problem I'm having.

I expect the matrix to be modified in place, but instead I get a copy because the reference count has been incremented by the call out to Rcpp.

Rcpp doesn't modify X, so why does an extra refcount hang around, which causes R to make a copy when I next modify?

How can I get a modify-in-place after execution returns from fastLm?

(I'm using R 4.1.2, and running via console to avoid any RStudio env pane reference issues.)

Example 1:

library(RcppArmadillo)
minimal_fastLm <- function() {
  y <- c(1,2)

  X <- matrix(data = c(1,1,2,3), nrow = 2, ncol = 2)
  .Internal(inspect(X))

  model <- .Call("_RcppArmadillo_fastLm_impl", X, y, PACKAGE = "RcppArmadillo")
  .Internal(inspect(X))

  X[, 2] <- c(3,4)
  .Internal(inspect(X))
}

minimal_fastLm()

Gives output (edited for clarity):

The address is unchanged through the call to fastLm, but the reference count increases from REF(1) to REF(2), and a copy is then made on modification (note the new address in the last line).

@0x0000000017401560 14 REALSXP g0c3 [REF(1),ATT] (len=4, tl=0) 1,1,2,3
...
@0x0000000017401560 14 REALSXP g0c3 [REF(2),ATT] (len=4, tl=0) 1,1,2,3
...
@0x00000000174ad568 14 REALSXP g0c3 [REF(1),ATT] (len=4, tl=0) 1,1,3,4
...

Example 2:
Even more minimal (pure Rcpp)

library(Rcpp)
cppFunction('int rawSEXP(SEXP X) { return(1); }')
cppFunction('int asNumMat(NumericMatrix X) { return(1); }')

pureRcpp <- function() {
  X <- matrix(data = c(1,1,2,3), nrow = 2, ncol = 2)
  .Internal(inspect(X))

  X[, 2] <- c(4,5)    # Initially modifies in place because X has REF(1).
  .Internal(inspect(X))

  rawSEXP(X)          # Call Rcpp function with raw SEXP
  .Internal(inspect(X))

  X[, 2] <- c(6,7)    # Still modifies in place because X has REF(1).
  .Internal(inspect(X))

  asNumMat(X)         # Call Rcpp function that casts to NumericMatrix
  .Internal(inspect(X))

  X[, 2] <- c(8,9)    # Causes a copy because X has REF(3).
  .Internal(inspect(X))

}
pureRcpp()

X was never modified by Rcpp, nor do I want it to be. I just want to read the data.
So why was the reference count not decremented back to 1 when the NumericMatrix went out of scope?
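
As a complement to `.Internal(inspect(X))`, here is a minimal base R sketch of the same copy-on-modify behaviour using `tracemem()`. It assumes R was built with memory profiling enabled (the CRAN binaries are), and `check_copy` is only an illustrative name. It does not involve Rcpp; it just shows that a second reference is enough to force a duplicate on the next write.

```r
# Assumes R was built with memory profiling enabled (true for CRAN binaries).
check_copy <- function() {
  X <- matrix(c(1, 1, 2, 3), nrow = 2, ncol = 2)
  tracemem(X)          # R prints a "tracemem[...]" line whenever X is duplicated

  X[, 2] <- c(4, 5)    # single reference: expected to modify in place

  Y <- X               # a second binding raises the reference count
  X[, 2] <- c(6, 7)    # object is shared, so R duplicates X before writing

  untracemem(X)
  list(X = X, Y = Y)
}

res <- check_copy()
```

Because the second write goes to a copy, `res$Y` still holds the values from before it, which is the same effect the extra reference left behind by the Rcpp call has on `X`.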

Comments (2)

那支青花 2025-01-20 02:00:39

R uses copy-on-write, and that holds whether the write comes from R or from C(++) code -- an updated value is just an updated value.

So if you want mutable, non-copied data, you may have to manage the data yourself in a non-R data structure. A simple Armadillo vector or matrix, where you take care not to inherit from R data, should do.

Allocate that once, modify values as you please, loop to your heart's content. That should suit your requirement, unless I misunderstood something.
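
One way to read this suggestion is sketched below, assuming Rcpp, RcppArmadillo, and a working C++ toolchain are installed. The function name `loopFits` and its arguments are hypothetical and not part of any package. The loop runs entirely in C++ on an Armadillo matrix that is deep-copied from the R data once, so overwriting the last column never touches R's copy-on-write machinery. `arma::solve` stands in for the fit here and returns coefficients only, not the full `fastLmPure` output.

```r
library(Rcpp)

# Hypothetical helper: fit one regression per column of lastCols, reusing a
# single C++-owned design matrix whose last column is overwritten each time.
cppFunction(depends = "RcppArmadillo", code = '
arma::mat loopFits(const arma::mat& X0, const arma::vec& y,
                   const arma::mat& lastCols) {
  arma::mat X = X0;                          // one deep copy, owned by C++
  arma::mat coefs(X.n_cols, lastCols.n_cols);
  for (arma::uword j = 0; j < lastCols.n_cols; ++j) {
    X.col(X.n_cols - 1) = lastCols.col(j);   // overwrite last column in place
    coefs.col(j) = arma::solve(X, y);        // coefficients for this fit
  }
  return coefs;
}')

X0       <- cbind(1, c(1, 2, 3), 0)          # last column is a placeholder
y        <- c(1, 2, 4)
lastCols <- cbind(c(1, 0, 1), c(0, 1, 1))    # candidate last columns
coefs    <- loopFits(X0, y, lastCols)
```

Each column of `coefs` holds the coefficients for the corresponding candidate last column. The trade-off is that the loop body has to live in C++ rather than in R.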

欲拥i 2025-01-20 02:00:39

After some digging, I found that there's some cleaning up that Rcpp could be doing, but isn't.

Raised as a bug, with solution identified.
Will be fixed in the next release of Rcpp.
https://github.com/RcppCore/Rcpp/pull/1205

So, the answer is:

  • This modify-in-place should indeed be possible,
  • And it will be after a future release of Rcpp.