R: Replacing rownames of data frame by a sub-string [2]

Posted 2024-11-14 09:28:23

I have a question about the use of gsub. The rownames of my data share the same partial names. See below:

> rownames(test)
[1] "U2OS.EV.2.7.9"   "U2OS.PIM.2.7.9"  "U2OS.WDR.2.7.9"  "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9"  "U2OS.EV.18.6.9"  "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX"   "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM"   "X5.U2OS...EV"    "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC"   "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV"    "EXP2.U2OS.MYC"   "EXP2.U2OS.PIM1"  "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"

In my previous question, I asked if there is a way to get the same names for the same partial names. See this question: Replacing rownames of data frame by a sub-string

The answer is a very nice solution. The function gsub is used in this way:

 transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test))

Now I have another problem: the program I run R in (Galaxy) doesn't recognize the | character. Is there another way to get the same result without using |?

Thanks!

雪若未夕 2024-11-21 09:28:23

If you don't want to use the "|" character, you can try something like:

Rnames <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
            "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9")

Rlevels <- c("MYC", "EV", "PIM", "WDR", "OBX")
# One logical column per level: TRUE where that level occurs in the name
tmp <- sapply(Rlevels, grepl, Rnames)
# For each row, return the name of the column that matched
apply(tmp, 1, function(i) colnames(tmp)[i])
[1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR"

But I would seriously consider mentioning this to the Galaxy team, as it seems rather awkward not to be able to use the symbol for OR...
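The approach above assumes every name contains exactly one of the levels. As a rough sketch of how the same idea could cope with names that match none of them, the hypothetical helper first_level() below (the function name and the "U2OS.CTRL.1" example name are made up for illustration, not taken from the question) returns NA for unmatched names and the first matching level otherwise. It also passes fixed = TRUE so each level is treated as a literal string rather than a regular expression:

```r
# Hypothetical helper, not part of the original answer.
# Returns the first level found in each name, or NA if none is found.
first_level <- function(names, levels) {
  # One logical vector per level; fixed = TRUE means plain substring matching
  hits <- lapply(levels, grepl, names, fixed = TRUE)
  # Bind into a matrix: one row per name, one column per level
  hits <- matrix(unlist(hits), nrow = length(names))
  # Index of the first TRUE in each row (NA when the row has no TRUE)
  idx <- apply(hits, 1, function(row) which(row)[1])
  levels[idx]
}

Rlevels <- c("MYC", "EV", "PIM", "WDR", "OBX")
first_level(c("U2OS.EV.2.7.9", "U2O2.PIM.18.6.9",
              "X3.U2OS...WDR82", "U2OS.CTRL.1"), Rlevels)
```

For these four names the call should give "EV", "PIM", "WDR" and NA.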

初相遇 2024-11-21 09:28:23

I wouldn't generally recommend doing this in R, as it is far less efficient than the solution @csgillespie provided, but an alternative is to loop over the various strings you want to match and do the replacement for each string separately, i.e. search for "MYC" and replace only in those rownames that match "MYC".

Here is an example using the x data from @csgillespie's Answer:

x <-  c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
       "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
       "U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")

Copy the data so we have something to compare with later (this is just for the example):

x2 <- x

Then create a character vector of the strings you want to match on:

matches <- c("MYC","EV","PIM","WDR","OBX")

Then we loop over the values in matches and do three things (numbered ## 1 to ## 3 in the code):

1. Create the regular expression by pasting the current match string i together with the other bits of the regular expression we want to use.
2. Use grepl() to return a logical indicator for those elements of x2 that contain the string i.
3. Use the same style of gsub() call as you were already shown, but apply it only to the elements of x2 that matched the string, and replace only those elements.

The loop is:

for(i in matches) {
    rgexp <- paste(".*(", i, ").*", sep = "") ## 1
    ind <- grepl(rgexp, x2)                   ## 2
    x2[ind] <- gsub(rgexp, "\\1", x2[ind])    ## 3
}
x2

Which gives:

> x2
 [1] "EV"  "PIM" "WDR" "MYC" "OBX" "EV"  "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"
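One thing to keep in mind when applying this to a real data frame: row names in R must be unique, so the shortened labels cannot be assigned back with rownames<- once several rows share a label. A minimal sketch, assuming a small made-up data frame test with a hypothetical value column, stores the labels in a new column instead (the column name transfecties is borrowed from the question):

```r
# Made-up stand-in for the asker's data frame; only the row names matter here
test <- data.frame(value = 1:4,
                   row.names = c("U2OS.EV.2.7.9", "U2OS.MYC.2.7.9",
                                 "X1.U2OS...OBX", "EXP2.U2OS.EV"))

matches <- c("MYC", "EV", "PIM", "WDR", "OBX")
labels <- rownames(test)

# Same loop as above: one pattern per string, so no "|" is needed
for (i in matches) {
  rgexp <- paste(".*(", i, ").*", sep = "")
  ind <- grepl(rgexp, labels)
  labels[ind] <- gsub(rgexp, "\\1", labels[ind])
}

# Duplicated labels are fine in a column, but not as row names
test$transfecties <- labels
test$transfecties
```

Here the new column should hold "EV", "MYC", "OBX" and "EV", while the original row names stay untouched.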
