Recoding a complex composite score in R
Assume my research concerns an observational longitudinal cohort study.
Let γ_comp be the composite outcome of interest, and let γ1, ..., γ4 at times t1 and t2 denote its components. In addition, the dataset has three other variables (χ1, χ2, and χ3), which will be used in future analysis but are not needed to code γ_comp. Here is an excerpt of the data.frame:
df <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
Y1_t1 = c(5, 6, 10, 7, 5, 7, 5, 4, 7, 4),
Y2_t1 = c(6, 4, 8, 8, 7, 10, 7, 6, 5, 7),
Y3_t1 = c(5, 6, 10, 4, 8, 5, 10, 5, 4, 6),
Y4_t1 = c(4.5, 8.5, 9.5, 4.5, 5, 8, 4.5, 8.5, 4, 6),
Y1_t2 = c(6, 4, 5, 5, 3, 4, 8, 4, 3, 2),
Y2_t2 = c(5, 4, 3, 6, 5, 5, 5, 2, 2, 8),
Y3_t2 = c(2, 2, 4, 5, 4, 9, 5, 3, 2, 4),
Y4_t2 = c(3.5, 6, 5, 5, 4.5, 4, 2.5, 7, 4.5, 4),
X1 = c(40, 45, 52, 44, 42, 65, 55, 61, 52, 49),
X2 = c("NL", "UK", "NL", "US", "UK", "US", "NL", "NL", "UK", "UK"),
X3 = c(2000, 2005, 2003, 2000, 2001, 2002, 2003, 2004, 2001, 2000)),
class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L))
Structure
spec_tbl_df [10 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ ID : num [1:10] 1 2 3 4 5 6 7 8 9 10
$ Y1_t1: num [1:10] 5 6 10 7 5 7 5 4 7 4
$ Y2_t1: num [1:10] 6 4 8 8 7 10 7 6 5 7
$ Y3_t1: num [1:10] 5 6 10 4 8 5 10 5 4 6
$ Y4_t1: num [1:10] 4.5 8.5 9.5 4.5 5 8 4.5 8.5 4 6
$ Y1_t2: num [1:10] 6 4 5 5 3 4 8 4 3 2
$ Y2_t2: num [1:10] 5 4 3 6 5 5 5 2 2 8
$ Y3_t2: num [1:10] 2 2 4 5 4 9 5 3 2 4
$ Y4_t2: num [1:10] 3.5 6 5 5 4.5 4 2.5 7 4.5 4
$ X1 : num [1:10] 40 45 52 44 42 65 55 61 52 49
$ X2 : chr [1:10] "NL" "UK" "NL" "US" ...
$ X3 : num [1:10] 2000 2005 2003 2000 2001 ...
As mentioned earlier, I am interested in calculating γ_comp. The rules for recoding are as follows:
- 3 out of 4 components (i.e., γ1, ..., γ4) must show more than 20% improvement (i.e., a decrease) on a numeric scale (0-10, higher is worse) at t2 compared to t1.
- In the remaining component, there should be no worsening of more than 20% at t2 compared to t1.
I believe the following steps have to be taken to achieve this aim. First, Y1_diff = Y1_t2/Y1_t1 must be calculated for every component. This is the ratio between the two time points and should be < 0.80 for an improved component. Next, an if_else condition has to be applied that enforces these rules and returns 1 if the rules are met and 0 if not (i.e., "responded" to treatment or not).
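The steps described above can be sketched in base R as follows. This is only one possible reading of the rules, not a definitive implementation: it assumes "3 out of 4 improved, remaining one not worsened" is equivalent to "at least 3 ratios below 0.8 and no ratio above 1.2", and it uses only the ID and component columns of the sample data. The helper names (components, ratios, n_improved, n_worsened) are illustrative.

```r
# Sample data from the question (ID and the eight component columns only)
df <- data.frame(
  ID    = 1:10,
  Y1_t1 = c(5, 6, 10, 7, 5, 7, 5, 4, 7, 4),
  Y2_t1 = c(6, 4, 8, 8, 7, 10, 7, 6, 5, 7),
  Y3_t1 = c(5, 6, 10, 4, 8, 5, 10, 5, 4, 6),
  Y4_t1 = c(4.5, 8.5, 9.5, 4.5, 5, 8, 4.5, 8.5, 4, 6),
  Y1_t2 = c(6, 4, 5, 5, 3, 4, 8, 4, 3, 2),
  Y2_t2 = c(5, 4, 3, 6, 5, 5, 5, 2, 2, 8),
  Y3_t2 = c(2, 2, 4, 5, 4, 9, 5, 3, 2, 4),
  Y4_t2 = c(3.5, 6, 5, 5, 4.5, 4, 2.5, 7, 4.5, 4)
)

components <- paste0("Y", 1:4)

# Ratio t2 / t1 per component:
# < 0.8 means more than 20% improvement, > 1.2 means more than 20% worsening
ratios <- sapply(components, function(p) {
  df[[paste0(p, "_t2")]] / df[[paste0(p, "_t1")]]
})
colnames(ratios) <- paste0(components, "_diff")

n_improved <- rowSums(ratios < 0.8)
n_worsened <- rowSums(ratios > 1.2)

# Assumed responder rule: at least 3 components improved and none worsened
df$Ycomp <- as.integer(n_improved >= 3 & n_worsened == 0)

# Append the rounded ratios for inspection
df <- cbind(df, round(ratios, 2))
```

With the sample data, df$Ycomp matches the Ycomp column of the desired output. Rows where t1 is 0 would produce NaN or Inf ratios and need separate handling.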
For example, this could be a desired output:
   ID Ycomp Y1_t1 Y2_t1 Y3_t1 Y4_t1 Y1_t2 Y2_t2 Y3_t2 Y4_t2 Y1_diff Y2_diff Y3_diff Y4_diff X1 X2   X3
 1  1     0     5     6     5   4.5     6     5     2   3.5     1.2    0.83     0.4    0.78 40 NL 2000
 2  2     1     6     4     6   8.5     4     4     2     6    0.67       1    0.33    0.71 45 UK 2005
 3  3     1    10     8    10   9.5     5     3     4     5     0.5    0.38     0.4    0.53 52 NL 2003
 4  4     0     7     8     4   4.5     5     6     5     5    0.71    0.75    1.25    1.11 44 US 2000
 5  5     1     5     7     8     5     3     5     4   4.5     0.6    0.71     0.5     0.9 42 UK 2001
 6  6     0     7    10     5     8     4     5     9     4    0.57     0.5     1.8     0.5 65 US 2002
 7  7     0     5     7    10   4.5     8     5     5   2.5     1.6    0.71     0.5    0.56 55 NL 2003
 8  8     0     4     6     5   8.5     4     2     3     7       1    0.33     0.6    0.82 61 NL 2004
 9  9     1     7     5     4     4     3     2     2   4.5    0.43     0.4     0.5    1.13 52 UK 2001
10 10     1     4     7     6     6     2     8     4     4     0.5    1.14    0.67    0.67 49 UK 2000
I would appreciate any advice on recoding the composite score γ_comp. Alternative methods are also welcome. The idea is to use γ_comp in logistic regression in future analysis.
Comments (2)
This should do it for you, assuming my understanding is correct.
Explanation:
This is an inner join between df and a new data frame that contains two columns, ID and Y_comp. How is this second frame created? I select ID and the columns starting with "Y", reshape them to long format, and compute the proportional change for each component. By ID (i.e., group_by(ID)), I generate a column impct as the number of times change exceeds 0.2. This is constant over ID, so I define Y_comp as TRUE if all of the rows have impct == 4 (i.e., all are improvements) OR if three are improvements and the minimum in the set is not less than negative 0.2. Finally, relocate() is used to reorder the columns.
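A minimal sketch consistent with this explanation, assuming dplyr and tidyr are installed. It is a reconstruction from the prose, not the answerer's original code: the definition change = (t1 - t2) / t1, the intermediate names (y_comp, result, component), and the final column position are assumptions.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  ID    = 1:10,
  Y1_t1 = c(5, 6, 10, 7, 5, 7, 5, 4, 7, 4),
  Y2_t1 = c(6, 4, 8, 8, 7, 10, 7, 6, 5, 7),
  Y3_t1 = c(5, 6, 10, 4, 8, 5, 10, 5, 4, 6),
  Y4_t1 = c(4.5, 8.5, 9.5, 4.5, 5, 8, 4.5, 8.5, 4, 6),
  Y1_t2 = c(6, 4, 5, 5, 3, 4, 8, 4, 3, 2),
  Y2_t2 = c(5, 4, 3, 6, 5, 5, 5, 2, 2, 8),
  Y3_t2 = c(2, 2, 4, 5, 4, 9, 5, 3, 2, 4),
  Y4_t2 = c(3.5, 6, 5, 5, 4.5, 4, 2.5, 7, 4.5, 4),
  X1    = c(40, 45, 52, 44, 42, 65, 55, 61, 52, 49),
  X2    = c("NL", "UK", "NL", "US", "UK", "US", "NL", "NL", "UK", "UK"),
  X3    = c(2000, 2005, 2003, 2000, 2001, 2002, 2003, 2004, 2001, 2000)
)

y_comp <- df %>%
  select(ID, starts_with("Y")) %>%
  # One row per ID and component, with columns t1 and t2
  pivot_longer(-ID, names_to = c("component", ".value"), names_sep = "_") %>%
  # Assumed definition: proportional improvement (positive = score went down)
  mutate(change = (t1 - t2) / t1) %>%
  group_by(ID) %>%
  summarise(
    impct  = sum(change > 0.2),  # number of components improved by > 20%
    Y_comp = impct == 4 | (impct == 3 & min(change) >= -0.2),
    .groups = "drop"
  ) %>%
  select(ID, Y_comp)

# Join the flag back onto the original data and move it next to ID
result <- inner_join(df, y_comp, by = "ID") %>%
  relocate(Y_comp, .after = ID)
```

Here Y_comp is logical; wrapping it in as.integer() would give the 0/1 coding shown in the question.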
Update:
The OP is getting an error, likely caused by a namespace collision; one solution is to be specific about the packages being used, i.e., to call each function with an explicit package:: prefix.
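As an illustration of being explicit about packages, the same kind of pipeline can be written with package:: prefixes and without attaching anything. The thread does not say which functions collided; functions such as select() and summarise() are also defined by other packages (for example MASS and plyr), so prefixing them is a common remedy. The data below are the first three rows of the question's sample, and the intermediate names are illustrative.

```r
# First three rows of the question's sample data (component columns only)
df <- data.frame(
  ID    = 1:3,
  Y1_t1 = c(5, 6, 10),      Y2_t1 = c(6, 4, 8),
  Y3_t1 = c(5, 6, 10),      Y4_t1 = c(4.5, 8.5, 9.5),
  Y1_t2 = c(6, 4, 5),       Y2_t2 = c(5, 4, 3),
  Y3_t2 = c(2, 2, 4),       Y4_t2 = c(3.5, 6, 5)
)

# Every call names its package, so masking by other attached packages is irrelevant
long <- tidyr::pivot_longer(
  dplyr::select(df, ID, dplyr::starts_with("Y")),
  cols = -ID,
  names_to = c("component", ".value"),
  names_sep = "_"
)
long <- dplyr::mutate(long, change = (t1 - t2) / t1)

y_comp <- dplyr::summarise(
  dplyr::group_by(long, ID),
  impct  = sum(change > 0.2),
  Y_comp = impct == 4 | (impct == 3 & min(change) >= -0.2)
)
y_comp <- dplyr::select(y_comp, ID, Y_comp)

result <- dplyr::relocate(
  dplyr::inner_join(df, y_comp, by = "ID"),
  Y_comp, .after = ID
)
```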
Inspired by the method of langtang, I found one possible solution to the problem.
Explanation
I first create four variables that assess whether the relative difference was less than 0.8 (i.e., improved by more than 20%), between 0.8 and 1.2, or more than 1.2 (i.e., worsened). These variables (Yn_diff) were coded +1 in the case of improvement, 0 if in between, and -1 if worsened. I also checked whether the value at time t2 was zero and gave it a score of 0 because, in my real dataset, there were scenarios where both t1 and t2 were 0, which gives NaN. Finally, I added up all the variables, which gives the correct output in the variable Ycomp.
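The scoring idea can be sketched in base R as below. This is a hedged reconstruction from the description only: the helper score_component() is hypothetical, and the final threshold (a summed score of at least 3 counts as a responder) is an assumption, chosen because it is equivalent to "three improved and the fourth not worsened, or all four improved".

```r
# Sample data from the question (ID and the eight component columns only)
df <- data.frame(
  ID    = 1:10,
  Y1_t1 = c(5, 6, 10, 7, 5, 7, 5, 4, 7, 4),
  Y2_t1 = c(6, 4, 8, 8, 7, 10, 7, 6, 5, 7),
  Y3_t1 = c(5, 6, 10, 4, 8, 5, 10, 5, 4, 6),
  Y4_t1 = c(4.5, 8.5, 9.5, 4.5, 5, 8, 4.5, 8.5, 4, 6),
  Y1_t2 = c(6, 4, 5, 5, 3, 4, 8, 4, 3, 2),
  Y2_t2 = c(5, 4, 3, 6, 5, 5, 5, 2, 2, 8),
  Y3_t2 = c(2, 2, 4, 5, 4, 9, 5, 3, 2, 4),
  Y4_t2 = c(3.5, 6, 5, 5, 4.5, 4, 2.5, 7, 4.5, 4)
)

# Hypothetical helper: +1 if improved (> 20%), -1 if worsened (> 20%), else 0
score_component <- function(t1, t2) {
  ratio <- t2 / t1
  score <- ifelse(ratio < 0.8, 1, ifelse(ratio > 1.2, -1, 0))
  # As in the description: a t2 value of zero scores 0, which also covers 0/0 = NaN
  score[t2 == 0] <- 0
  score
}

# One score column per component
scores <- sapply(1:4, function(i) {
  score_component(df[[paste0("Y", i, "_t1")]], df[[paste0("Y", i, "_t2")]])
})

# Assumed threshold: a total of 3 or 4 means the rules are met
df$Ycomp <- as.integer(rowSums(scores) >= 3)
```

Note that scoring every t2 value of zero as 0 follows the description literally; a drop from a positive t1 to zero would then not count as an improvement, which may or may not be intended.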