使用R将数据分组崩溃以获得新的权重，但仍获得无限范围

发布于 2025-01-29 02:02:39 字数 160 浏览 2 评论 0原文

我有一个神经外科患者的数据集，我正在创建生存曲线。我正在尝试调整曲线，以匹配2000年美国人口的年龄分布，该分布包含在R生存包中。此“ USPOP2”数据集是一个带有日历年的数组。首先，我只想查看50岁及以上的年龄，因此我将使用相同的上年龄阈值来创建一个在组中观察到的年龄/性别计数的表“ TAB100”。新权重为

原文

I have a dataset of neurosurgery patients for which I am creating survival curves. I am trying to adjust my curves to match the age-sex distribution of the 2000 US population, which is included in the R survival package. This 'uspop2' dataset is an array with and calendar year. First, I'm only going to look at ages 50 and over, so I'll create a table 'tab100' of observed age/sex counts within group for our own data, using the same upper age threshold. New weights are the values of ????????= pi.us/tab100.

Here is the first code I write (note that I am using R in rpy2 in google collab):

%%R

#Reweighting
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))

refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
pi.us <- refpop/sum(refpop)
age100 <- factor(ifelse(mydata$Age..yrs. >100, 100, mydata$Age..yrs.), levels=50:100)
tab100 <- with(mydata, table(age100, mydata$Sex, mydata$group))/ nrow(mydata)
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
range(us.wt)

This yields a range of 0.006709405 to Infinity! This infinite weight happens because the US population has all age-sex combos represented, but my neurosurgery patient dataset does not. To get rid of these infinite weights, I attempt to collapse the US population into separate age groups...

%%R

mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))

temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)

print(pi.us)

tab2 <- with(mydata, table(mydata$Age..yrs., mydata$Sex, mydata$group))/nrow(mydata)

print(tab2)

us.wt <- rep(pi.us, 3)/tab2

print(range(us.wt))

index <- with(mydata, cbind(mydata$Age..yrs., mydata$Sex,
  as.numeric(mydata$group)))

mydata$uswt <- us.wt[index]
sfit3a <-survfit(Surv(Patient.LOS..days., Events) ~ group, data=mydata, weight=uswt)

Printing pi.us and tab2 show me that I did successfully collapse the ages into 8 groups. Yet when I set us.wt <- rep(pi.us, 3)/tab2, us.wt is still the exact same as before! It doesn't change. You can see below that the range outputted has a different lower bound, but still goes all the way up to infinity. It's no surprise, that I get a subscript out of bounds error for the next line of code. What the heck is going on?

[1] 0.4655699       Inf
R[write to console]: Error in `[.default`(us.wt, index) : subscript out of bounds


Error in `[.default`(us.wt, index) : subscript out of bounds

BTW I am basing my code exactly off of page 7 of this R paper: https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf

What am I doing wrong? :( Thanks for your help!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

疯狂的代价 2025-02-05 02:02:39

这是对您的问题的答案，但不是解决问题的方法。查看index和us.wt对象。显然，us.wt数组的边距的命名不匹配index的第三列中的值，

str(us.wt)
 'table' num [1:48, 1:3, 1:3] Inf Inf Inf Inf Inf ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:48] "2" "3" "4" "5" ...
  ..$ : chr [1:3] "" "F" "M"
  ..$ : chr [1:3] "Plunge <= 2 mm" "Plunge 2 - 4 mm" "Plunge > 4 mm"
> str(index)
 chr [1:240, 1:3] "2" "7" "11" "75" "59" "3" "88" "13" "75" "80" "5" "3" "65" "66" "93" "45" ...
> head(index)
     [,1] [,2] [,3]
[1,] "2"  "M"  "1" 
[2,] "7"  "M"  "3" 
[3,] "11" "M"  "1" 
[4,] "75" "M"  "3" 
[5,] "59" "M"  "1" 
[6,] "3"  "M"  "3"

我也认为我们的数组构造。 wt搞砸了。由于没有关于构建逻辑或目标的描述，因此我不会试图向您阅读并提供建议。这是我认为为什么会搞砸的方法：

> Hmisc::describe(us.wt)
us.wt 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75 
     432        0       32    0.691      Inf      NaN    4.264    7.864   16.032      Inf      Inf 
     .90      .95 
     Inf      Inf 

lowest :  0.5586839  1.1173678  3.1599027  3.4755957  4.2639763
highest: 20.3399270 21.7412450 27.0462314 28.2128223        Inf
Warning message:
In w * sort(x - mean(x)) :
  longer object length is not a multiple of shorter object length
# Notice that more than half of the values are Inf

> head(us.wt)
, ,  = Plunge <= 2 mm

   
                      F         M
  2       Inf  7.053206  7.053206
  3       Inf 10.870622 10.870622
  4       Inf       Inf       Inf
  5       Inf 15.922230 15.922230
  6       Inf       Inf       Inf
  7       Inf 13.581011 13.581011

, ,  = Plunge 2 - 4 mm

   
                      F         M
  2       Inf 14.106411 14.106411
  3       Inf       Inf       Inf
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf       Inf       Inf
  7       Inf       Inf       Inf

, ,  = Plunge > 4 mm

   
                      F         M
  2       Inf       Inf       Inf
  3       Inf 10.870622 10.870622
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf 15.348446 15.348446
  7       Inf 13.581011 13.581011

This is an answer to your question but not a solution to your problem. Look at the index and us.wt objects. It should be apparent that the naming of margins of the us.wt array doesn't match the values in the third column of index

str(us.wt)
 'table' num [1:48, 1:3, 1:3] Inf Inf Inf Inf Inf ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:48] "2" "3" "4" "5" ...
  ..$ : chr [1:3] "" "F" "M"
  ..$ : chr [1:3] "Plunge <= 2 mm" "Plunge 2 - 4 mm" "Plunge > 4 mm"
> str(index)
 chr [1:240, 1:3] "2" "7" "11" "75" "59" "3" "88" "13" "75" "80" "5" "3" "65" "66" "93" "45" ...
> head(index)
     [,1] [,2] [,3]
[1,] "2"  "M"  "1" 
[2,] "7"  "M"  "3" 
[3,] "11" "M"  "1" 
[4,] "75" "M"  "3" 
[5,] "59" "M"  "1" 
[6,] "3"  "M"  "3"

I also think the array construction of us.wt got messed up. Since there is no description of the logic or goals in constructing it, I'm not attempting to read you mind and offer suggestions. Here's how to see why I think it's messed up:

> Hmisc::describe(us.wt)
us.wt 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25      .50      .75 
     432        0       32    0.691      Inf      NaN    4.264    7.864   16.032      Inf      Inf 
     .90      .95 
     Inf      Inf 

lowest :  0.5586839  1.1173678  3.1599027  3.4755957  4.2639763
highest: 20.3399270 21.7412450 27.0462314 28.2128223        Inf
Warning message:
In w * sort(x - mean(x)) :
  longer object length is not a multiple of shorter object length
# Notice that more than half of the values are Inf

> head(us.wt)
, ,  = Plunge <= 2 mm

   
                      F         M
  2       Inf  7.053206  7.053206
  3       Inf 10.870622 10.870622
  4       Inf       Inf       Inf
  5       Inf 15.922230 15.922230
  6       Inf       Inf       Inf
  7       Inf 13.581011 13.581011

, ,  = Plunge 2 - 4 mm

   
                      F         M
  2       Inf 14.106411 14.106411
  3       Inf       Inf       Inf
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf       Inf       Inf
  7       Inf       Inf       Inf

, ,  = Plunge > 4 mm

   
                      F         M
  2       Inf       Inf       Inf
  3       Inf 10.870622 10.870622
  4       Inf 17.682214 17.682214
  5       Inf       Inf       Inf
  6       Inf 15.348446 15.348446
  7       Inf 13.581011 13.581011

回复收藏 0 原文

~没有更多了~