R: Replacing rownames of data frame by a sub-string [2]
I have a question about the use of gsub. The row names of my data share partial names. See below:
> rownames(test)
[1] "U2OS.EV.2.7.9" "U2OS.PIM.2.7.9" "U2OS.WDR.2.7.9" "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9" "U2OS.EV.18.6.9" "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX" "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM" "X5.U2OS...EV" "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC" "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV" "EXP2.U2OS.MYC" "EXP2.U2OS.PIM1" "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"
In my previous question, I asked if there is a way to get the same names for the same partial names. See this question: Replacing rownames of data frame by a sub-string
The answer is a very nice solution. The function gsub is used in this way:
transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test))
Now I have another problem: the program I run R with (Galaxy) doesn't recognize the | character. Is there another way to get the same result without using |?
Thanks!
Comments (2)
If you don't want to use the "|" character, you can try something like:
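One possible approach, offered as a minimal sketch rather than the code originally posted with this answer: build the alternation without ever typing the pipe, by generating it from its code point with intToUtf8(124). This assumes the problem is only that Galaxy mangles a literal "|" in the script text, which the question does not confirm. The names x, keys, alternation, and pattern are illustrative, and x is a small sample of the row names from the question.

```r
# Small sample of the row names from the question (illustrative)
x <- c("U2OS.EV.2.7.9", "U2O2.PIM.18.6.9", "X3.U2OS...WDR82",
       "exp1.U2OS.MYC", "EXP1.U20S.OBX")
keys <- c("MYC", "EV", "PIM", "WDR", "OBX")

# Code point 124 is the pipe character, so it never appears literally here
alternation <- paste(keys, collapse = intToUtf8(124))

# Same pattern shape as in the question: capture the key, drop the rest
pattern <- paste0(".*(", alternation, ").*")
transfecties <- gsub(pattern, "\\1", x)
transfecties
# [1] "EV"  "PIM" "WDR" "MYC" "OBX"
```

If Galaxy also rejects the character at run time rather than only in the script source, this will not help, and a per-string loop would be the fallback.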
But I would seriously consider mentioning this to the Galaxy team, as it seems rather awkward not to be able to use the symbol for OR...
I wouldn't recommend doing this in general in R, as it is far less efficient than the solution @csgillespie provided, but an alternative is to loop over the various strings you want to match and do the replacements on each string separately, i.e. search for "MYC" and replace only in those rownames that match "MYC".
Here is an example using the x data from @csgillespie's Answer:
Copy the data so we have something to compare with later (this is just for the example):
Then create a list of strings you want to match on:
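A sketch of these two setup steps. It assumes that x is simply the character vector of row names shown in the question; the exact x from @csgillespie's answer is not shown here, so treat this vector as a stand-in. The names x2 and matches follow the text.

```r
# Stand-in for the x data: the 25 row names listed in the question
x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9", "U2OS.OBX.18.6.9", "X1.U2OS...OBX", "X2.U2OS...MYC",
       "X3.U2OS...WDR82", "X4.U2OS...PIM", "X5.U2OS...EV", "exp1.U2OS.EV",
       "exp1.U2OS.MYC", "EXP1.U20S..PIM1", "EXP1.U2OS.WDR82", "EXP1.U20S.OBX",
       "EXP2.U2OS.EV", "EXP2.U2OS.MYC", "EXP2.U2OS.PIM1", "EXP2.U2OS.WDR82",
       "EXP2.U2OS.OBX")

# Work on a copy so the original can be compared with the result later
x2 <- x

# The strings to match on, one per pass of the loop
matches <- c("MYC", "EV", "PIM", "WDR", "OBX")
```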
Then we loop over the values in matches and do three things (numbered ##X in the code):
1. Build the regular expression by combining the string i with the other bits of the regular expression we want to use,
2. Using grepl() we return a logical indicator for those elements of x2 that contain the string i,
3. Make a gsub() call as you were already shown, but use only the elements of x2 that matched the string, and replace only those elements.
The loop and its result:
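A minimal sketch of the loop described above, reconstructed from the three numbered steps rather than copied from the original answer. To stay self-contained it uses a five-element sample of the row names from the question; rgexp and ind are illustrative names.

```r
# Five of the row names from the question, one per target string
x <- c("U2OS.EV.2.7.9", "U2O2.PIM.18.6.9", "X3.U2OS...WDR82",
       "exp1.U2OS.MYC", "EXP1.U20S.OBX")
x2 <- x
matches <- c("MYC", "EV", "PIM", "WDR", "OBX")

for (i in matches) {
  rgexp <- paste0(".*(", i, ").*")        ## 1: pattern for this string only
  ind <- grepl(rgexp, x2)                 ## 2: which elements contain i
  x2[ind] <- gsub(rgexp, "\\1", x2[ind])  ## 3: replace only those elements
}
x2
# [1] "EV"  "PIM" "WDR" "MYC" "OBX"
```

Each pattern contains a single literal string, so no "|" is needed anywhere. The cost is one grepl() and one gsub() pass per string instead of a single vectorised call.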