Using R to collapse data into groups to get new weights, but still getting an infinite range
I have a dataset of neurosurgery patients for which I am creating survival curves. I am trying to adjust my curves to match the age-sex distribution of the 2000 US population, which is included in the R survival package. This 'uspop2' dataset is an array with dimensions of age, sex, and calendar year. First, I'm only going to look at ages 50 and over, so I'll create a table 'tab100' of observed age/sex counts within each group for our own data, using the same upper age threshold. The new weights are the ratio pi.us/tab100.
Here is my first attempt (note that I am running R through rpy2 in Google Colab):
%%R
#Reweighting
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
pi.us <- refpop/sum(refpop)
age100 <- factor(ifelse(mydata$Age..yrs. >100, 100, mydata$Age..yrs.), levels=50:100)
tab100 <- with(mydata, table(age100, mydata$Sex, mydata$group))/ nrow(mydata)
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
range(us.wt)
This yields a range of 0.006709405 to Infinity! This infinite weight happens because the US population has all age-sex combos represented, but my neurosurgery patient dataset does not. To get rid of these infinite weights, I attempt to collapse the US population into separate age groups...
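A toy illustration of how an empty cell produces an infinite weight. This is a minimal sketch on made-up data, not the patient dataset; `toy`, `age_f`, and the uniform `pi_ref` are hypothetical names and values chosen only for the example.

```r
# Toy data (made up): three patients, so most age/sex cells are empty.
toy <- data.frame(
  age = c(50, 52, 52),
  sex = factor(c("female", "male", "male"), levels = c("female", "male"))
)

# Keep unobserved ages as factor levels so that table() reports them as 0.
age_f <- factor(toy$age, levels = 50:52)

# Observed proportion in each age/sex cell; four of the six cells are 0.
tab <- table(age_f, toy$sex) / nrow(toy)

# Hypothetical reference distribution: every cell gets the same share.
pi_ref <- matrix(1 / 6, nrow = 3, ncol = 2)

# Dividing by a zero proportion gives Inf for that cell.
wt <- pi_ref / tab
print(tab)
print(range(wt))
```

Any cell that is populated in the reference but empty in the sample ends up with an infinite weight, which is why coarser age bands are the usual remedy.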
%%R
mydata$group <- factor(1 + 1*(mydata$Drill.Plunge..mm. > 2) + 1*(mydata$Drill.Plunge..mm. > 4), levels=1:3,labels=c("Plunge <= 2 mm", "Plunge 2 - 4 mm", "Plunge > 4 mm"))
temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)
print(pi.us)
tab2 <- with(mydata, table(mydata$Age..yrs., mydata$Sex, mydata$group))/nrow(mydata)
print(tab2)
us.wt <- rep(pi.us, 3)/tab2
print(range(us.wt))
index <- with(mydata, cbind(mydata$Age..yrs., mydata$Sex,
as.numeric(mydata$group)))
mydata$uswt <- us.wt[index]
sfit3a <-survfit(Surv(Patient.LOS..days., Events) ~ group, data=mydata, weight=uswt)
Printing pi.us and tab2 shows me that I did successfully collapse the ages into 8 groups. Yet when I set us.wt <- rep(pi.us, 3)/tab2, us.wt is still exactly the same as before! It doesn't change. You can see below that the range in the output has a different lower bound, but still goes all the way up to infinity. It's no surprise that I get a subscript out of bounds error on the next line of code. What the heck is going on?
[1] 0.4655699 Inf
R[write to console]: Error in `[.default`(us.wt, index) : subscript out of bounds
Error in `[.default`(us.wt, index) : subscript out of bounds
BTW I am basing my code exactly off of page 7 of this R paper: https://cran.r-project.org/web/packages/survival/vignettes/adjcurve.pdf
What am I doing wrong? :( Thanks for your help!
Comments (1)
This is an answer to your question but not a solution to your problem. Look at the index and us.wt objects. It should be apparent that the naming of the margins of the us.wt array doesn't match the values in the third column of index. I also think the array construction of us.wt got messed up. Since there is no description of the logic or goals in constructing it, I'm not attempting to read your mind and offer suggestions. Here's how to see why I think it's messed up:
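A minimal sketch of such a check, run on made-up data rather than the patient dataset. The `toy` data frame, the all-ones `refpop`, and the `band` variable are assumptions for illustration only. It compares the shape of the collapsed reference table with the shape of a table built on raw ages, then shows one possible way to make the two agree by mapping each age to its band before tabulating.

```r
# Made-up stand-ins for the real objects (assumptions, not the asker's data).
set.seed(1)
ages <- 50:100
refpop <- matrix(1, nrow = length(ages), ncol = 2,
                 dimnames = list(ages, c("female", "male")))
toy <- data.frame(
  age   = sample(50:95, 200, replace = TRUE),
  sex   = factor(sample(c("female", "male"), 200, replace = TRUE)),
  group = factor(sample(1:3, 200, replace = TRUE))
)

# Collapse the reference population into 8 age bands, as in the question.
temp  <- as.numeric(cut(ages, c(49, 54, 59, 64, 69, 74, 79, 89, 110) + 0.5))
pi.us <- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum) / sum(refpop)

# A table built on raw ages has one row per distinct age, not one per band.
tab_raw <- with(toy, table(age, sex, group)) / nrow(toy)
print(dim(pi.us))    # 8 bands by 2 sexes
print(dim(tab_raw))  # distinct ages by 2 sexes by 3 groups

# Mapping each age to its band first makes the shapes line up.
band     <- factor(temp[toy$age - 49], levels = 1:8)
tab_band <- with(toy, table(band, sex, group)) / nrow(toy)
us.wt    <- rep(pi.us, 3) / tab_band

# Index by band number, sex code, and group code: all within the array bounds.
index    <- cbind(as.numeric(band), as.numeric(toy$sex), as.numeric(toy$group))
toy$uswt <- us.wt[index]
print(range(toy$uswt))
```

With the raw-age table, `rep(pi.us, 3)` is recycled against an array of a different shape, and a raw age used as a row index can exceed the first dimension, which would be consistent with the subscript out of bounds error. In the banded version `us.wt` may still contain `Inf` for cells with no patients, but those cells are never looked up, so the per-patient weights are finite.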