Creating a replicated data frame without using rbind (follow-up to the earlier simulation loop question)

Posted 2024-12-28 22:25:29

Rather than adding in more comments or making my original question longer, I have created another question. I have received excellent advice in the previous question (here) but I am not good enough in R to implement the suggestions in the comments.

The original code, which took ages, was:

Male.MC <-c()
for (j in 1:100)            {
    for (i in 1:nrow(Male.Distrib))  {
        u2        <- Male.Distrib$stddev_u2[i] * rnorm(1, mean = 0, sd = 1)
        mc_bca    <- Male.Distrib$FixedEff[i] + u2
        temp      <- Lambda.Value*mc_bca+1
        ginv_a    <- temp^(1/Lambda.Value)
        d2ginv_a  <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
        mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
        z <- data.frame(
        RespondentID = Male.Distrib$RespondentID[i], 
        Subgroup     = Male.Distrib$Subgroup[i], 
        mc_amount    = mc_amount,
        IndvWeight   = Male.Distrib$INDWTS[i]/100
        )
        Male.MC <- as.data.frame(rbind(Male.MC,z))
    }
}

The replicate() answer worked well when I thought I only needed one output (mc_amount) from the function:

Male.Distrib = read.table('MaleDistrib.txt', check.names=F)

getMC <- function(df, Lambda.Value=0.4, Male.Resid.Var=12.1029420429778) {
      u2        <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
      mc_bca    <- df$FixedEff + u2
      temp      <- Lambda.Value*mc_bca+1
      ginv_a    <- temp^(1/Lambda.Value)
      d2ginv_a  <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
      mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
      mc_amount
}
replicate(10, getMC(Male.Distrib))

However, even with data corrections made, I am getting unexpected results so I need to be able to see the values for all the interim calculations to determine where I have gone wrong in my logic. This is where I'm stuck. I have created a smaller data frame called tempdata for testing, which is just head() from my larger dataset of 7135 observations. The tempdata set is:

    RndmEff RespondentID Subgroup RespondentID Replicates IntakeAmt RACE INDWTS    TOTWTS   GRPWTS NUMSUBJECTS TOTSUBJECTS  FixedEff stddev_u2
1  1.343753         9966        6         9966      41067 33.449808    2  41067 120622201 41657878        1466        7135  6.089918  2.645938
2 -5.856516         9967        5         9967       2322  2.533528    3   2322 120622201 22715139        1100        7135  6.755664  2.645938
3 -3.648339         9970        4         9970      17434  9.575439    2  17434 120622201 10520535        1424        7135  7.079757  2.645938
4  2.697533         9972        6         9972      21723 43.340180    2  21723 120622201 41657878        1466        7135  6.089918  2.645938
5  3.531878         9974        3         9974        375 55.660607    3    375 120622201 10791729        1061        7135  6.176319  2.645938
6  6.627767         9976        6         9976      48889 91.480049    2  48889 120622201 41657878        1466        7135  6.089918  2.645938

The updated commands that I am using are:

getMC <- function(df, Lambda.Value=0.4, Male.Resid.Var=12.1029420429778) {
    RespondentID <- df$RespondentID
    u2        <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
    mc_bca    <- df$FixedEff + u2
    temp      <- max(Lambda.Value*mc_bca+1,Lambda.Value*Min_bca+1)
    ginv_a    <- temp^(1/Lambda.Value)
    d2ginv_a  <- max(0,(1-Lambda.Value)*temp^(1/Lambda.Value-2))
    mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
    return(list(RespondentID, temp, ginv_a, d2ginv_a, mc_amount))
}
Test <- replicate(10, getMC(tempdata))

I get a very nice layout for my calculated variables (temp, ginv_a, d2ginv_a, mc_amount), but there are two problems with the results. These problems could be related; I don't understand enough to work out what is happening.

First, I only get 10 columns relating to the first RespondentID, so the function does not seem to be applied to all 6 respondents in the dataset.

Second, I get 10 columns, but the RespondentID results are concatenated into one cell in each column. If I add u2 or mc_bca to the return list, these are similarly concatenated into one cell. I have read the R help for return, and it contains this line:

"value could be a series of non-empty expressions separated by commas. In that case the value returned is a list of the evaluated expressions, with names set to the expressions where these are the names of R objects."

but I don't understand enough about R function programming to know if that is relevant.
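The "one cell" layout described above can be reproduced with a toy function. This is only a sketch: `f`, `res`, and `res_list` are hypothetical names, and `f` is a stand-in that returns one vector and one scalar in a list, not the real getMC calculation. When the function returns a list, replicate() simplifies the results into a matrix of mode list, with one row per list element and one column per replicate, so a whole vector ends up inside a single cell.

```r
# Toy stand-in for getMC(): returns one vector and one scalar in a list.
f <- function() {
  ids   <- c(9966, 9967, 9970)
  total <- sum(rnorm(3))
  list(ids = ids, total = total)
}

# replicate() simplifies list results into a list-matrix:
# rows are the list elements, columns are the replicates.
res <- replicate(4, f())
dim(res)
res[[1, 1]]   # the whole ID vector sits inside a single cell

# simplify = FALSE keeps one plain list per replicate instead.
res_list <- replicate(4, f(), simplify = FALSE)
length(res_list)
```

Returning a data frame from the function, or using `simplify = FALSE`, avoids the list-matrix layout.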

I'm hoping there is a quick and obvious fix for this. I have been unable to find a similar problem whose solution I could copy; all the examples of multiple returns from functions that I have found used only variables calculated inside the function.

I have tried the alternative of creating an empty data frame and then trying to vectorize the results into that. I'm worse at vectorizing than I am at replication.

Update: I missed the Min_bca value, which is -2.44478269434376


情痴 2025-01-04 22:25:29

After a couple more edits, hopefully here is the final solution to your question.

    library(plyr)  # provides llply()

    getMC <- function(df, Lambda.Value = 0.4, Male.Resid.Var = 12.1029420429778,
                      Min_bca = -2.44478269434376) {
        # One random draw per respondent (vectorised over the rows of df)
        u2        <- df$stddev_u2 * rnorm(nrow(df), mean = 0, sd = 1)
        mc_bca    <- df$FixedEff + u2
        # pmax() is element-wise; max() would collapse the vector to one value
        temp      <- pmax(Lambda.Value * mc_bca + 1, Lambda.Value * Min_bca + 1)
        ginv_a    <- temp^(1 / Lambda.Value)
        # pmax() here as well, so each respondent keeps its own value
        d2ginv_a  <- pmax(0, (1 - Lambda.Value) * temp^(1 / Lambda.Value - 2))
        mc_amount <- ginv_a + d2ginv_a * Male.Resid.Var / 2
        data.frame(RespondentID = df$RespondentID, temp = temp, ginv_a, d2ginv_a, mc_amount)
    }

    data <- rep(list(tempdata), 10)  # change 10 to a higher number of replicates
    result_data <- llply(data, getMC, .progress = "text")
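
The llply call above returns a list with one data frame per replicate. If a single long data frame is wanted, one option is to tag each replicate and bind everything once at the end, rather than calling rbind once per row as in the original loop. The following is a minimal base-R sketch under stated assumptions: `toy`, `simulate_once`, `n_rep`, `runs`, and `combined` are hypothetical names, and `simulate_once` is a simplified stand-in for getMC, not the full calculation.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical miniature of tempdata with just the columns used here.
toy <- data.frame(
  RespondentID = c(9966, 9967, 9970),
  FixedEff     = c(6.089918, 6.755664, 7.079757),
  stddev_u2    = 2.645938
)

# Simplified stand-in for getMC(): one draw per respondent.
simulate_once <- function(df) {
  u2 <- df$stddev_u2 * rnorm(nrow(df))
  data.frame(RespondentID = df$RespondentID, mc_bca = df$FixedEff + u2)
}

n_rep <- 4

# One data frame per replicate, each tagged with its replicate number.
runs <- lapply(seq_len(n_rep), function(k) {
  data.frame(Replicate = k, simulate_once(toy))
})

# A single rbind over the whole list, instead of one rbind per row.
combined <- do.call(rbind, runs)
nrow(combined)
```

The same pattern should carry over to the real function by swapping `simulate_once(toy)` for `getMC(tempdata)`.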

Some notes: I had to troubleshoot your function on a single replicate, line by line, to find out what was wrong (this is something you should do before posting, because the question above is not about this issue). max(vector1, vector2) returns a single value, which makes temp the same for every RespondentID. I replaced it with pmax (see ?max for an explanation).
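
The difference between the two functions is easy to see on a small vector. In this sketch `a` and `floor_value` are made-up illustration values, not taken from the data:

```r
a <- c(0.5, 2.0, -1.0)
floor_value <- 0

# max() pools all of its arguments and returns one number.
overall <- max(a, floor_value)     # 2

# pmax() compares element by element, recycling the shorter argument.
per_row <- pmax(a, floor_value)    # 0.5 2.0 0.0
```

Any line meant to apply a floor separately to each respondent therefore needs pmax rather than max.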
