Paste values from a loop into a data frame in R

Posted on 2025-01-21 19:24:27

I have two dataframes in R, recurrent and L1HS. I am trying to find a way to do this:

If a sequence in recurrent matches a sequence in L1HS, copy the value from a column in recurrent into a new column in L1HS.

The recurrent dataframe looks like this:

> head(recurrent)
    chr     start       end X  Y level                   unique
1: chr4  56707846  56708347 0 38    03   chr4_56707846_56708347
2: chr1  20252181  20252682 0 37    03   chr1_20252181_20252682
3: chr2 224560903 224561404 0 37    03 chr2_224560903_224561404
4: chr5 131849595 131850096 0 36    03 chr5_131849595_131850096
5: chr7  46361610  46362111 0 36    03   chr7_46361610_46362111
6: chr1  20251169  20251670 0 36    03   chr1_20251169_20251670

The L1HS dataset contains many columns containing genetic sequence basepairs and a column "Sequence" that should hopefully have some matches with "unique" in the recurrent data frame, like so:

> head(L1HS$Sequence)
"chr1_35031657_35037706" 
"chr1_67544575_67550598" 
"chr1_81404889_81410942" 
"chr1_84518073_84524089"
"chr1_87144764_87150794"

I know how to search for matches using

test <- recurrent$unique %in% L1HS$Sequence

to get the Booleans:

> head(test)
[1] FALSE FALSE FALSE FALSE FALSE FALSE

But I have a couple of problems from here. If the sequence is found, I want to copy the "level" value from the recurrent dataset into a new column of the L1HS dataset. For example, if the sequence "chr4_56707846_56708347" from the recurrent data was found in the full-length (L1HS) data, I'd like the L1HS data frame to look like:

Sequence                level    other_columns
chr4_56707846_56708347   03     gggtttcatgaccc....

I was thinking of trying something like:

for (i in L1HS){
   if (recurrent$unique %in% L1HS$Sequence{
     L1HS$level <- paste(recurrent$level[i])}
}

but of course this isn't working and I can't figure it out.

I'm wondering what the best approach is here: would merge/intersect/apply be easier or better, and what does best practice look like for a fairly simple question like this? I've found some similar examples for Python/pandas, but I'm stuck here.

Thanks in advance!

痴情 2025-01-28 19:24:27

You can do a simple left_join to add level to L1HS with dplyr.

library(dplyr)

L1HS %>%
  left_join(., recurrent %>% select(unique, level), by = c("Sequence" = "unique"))

Or with merge:

merge(x = L1HS, y = recurrent[, c("unique", "level")], by.x = "Sequence", by.y = "unique", all.x = TRUE)

Output

                Sequence level
1 chr1_35031657_35037706     4
2 chr1_67544575_67550598     2
3 chr1_81404889_81410942    NA
4 chr1_84518073_84524089     3
5 chr1_87144764_87150794    NA

*Note: This will still retain all the columns in L1HS. I just didn't create any additional columns in the example data below.
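For comparison, here is a base R sketch of the same lookup using match(), which is not part of the answer above. match() returns, for each L1HS$Sequence, the position of its first occurrence in recurrent$unique (or NA if it is absent), and that index vector is then used to pull the level values. The data here is a cut-down, made-up version of the example, and the other_columns values are placeholders added only to show that existing columns are left alone.

```r
# Hypothetical miniature versions of the two data frames
recurrent <- data.frame(
  unique = c("chr4_56707846_56708347", "chr1_67544575_67550598",
             "chr1_35031657_35037706"),
  level  = c("03", "02", "04"),   # kept as character so leading zeros survive
  stringsAsFactors = FALSE
)
L1HS <- data.frame(
  Sequence      = c("chr1_35031657_35037706", "chr1_67544575_67550598",
                    "chr1_81404889_81410942"),
  other_columns = c("gggttt", "acgtac", "ttgaca"),  # placeholder sequences
  stringsAsFactors = FALSE
)

# Position of each L1HS sequence in recurrent$unique (NA when not found)
idx <- match(L1HS$Sequence, recurrent$unique)

# Index into level; NA positions produce NA in the new column
L1HS$level <- recurrent$level[idx]

print(L1HS)
```

Unlike merge(), which sorts the result by the key column by default, this keeps the rows of L1HS in their original order and never changes the number of rows.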

Data

recurrent <- structure(list(chr = c("chr4", "chr1", "chr2", "chr5", "chr7", 
"chr1"), start = c(56707846L, 20252181L, 224560903L, 131849595L, 
46361610L, 20251169L), end = c(56708347L, 20252682L, 224561404L, 
131850096L, 46362111L, 20251670L), X = c(0L, 0L, 0L, 0L, 0L, 
0L), Y = c(38L, 37L, 37L, 36L, 36L, 36L), level = c(3L, 2L, 3L, 
3L, 3L, 4L), unique = c("chr4_56707846_56708347", "chr1_67544575_67550598", 
"chr2_224560903_224561404", "chr5_131849595_131850096", "chr1_84518073_84524089", 
"chr1_35031657_35037706")), class = "data.frame", row.names = c(NA, 
-6L))

L1HS <- structure(list(Sequence = c("chr1_35031657_35037706", "chr1_67544575_67550598", 
"chr1_81404889_81410942", "chr1_84518073_84524089", "chr1_87144764_87150794"
)), class = "data.frame", row.names = c(NA, -5L))
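One thing that may be worth checking before joining on real data (an assumption, since the question does not say whether the keys repeat): if recurrent$unique contains the same key more than once, both left_join and merge return one row per match, so the result can have more rows than L1HS. A minimal sketch with made-up data, where lookup and target are hypothetical stand-ins for recurrent and L1HS:

```r
# Hypothetical lookup table in which key "a" appears twice
lookup <- data.frame(unique = c("a", "a", "b"),
                     level  = c("03", "05", "02"),
                     stringsAsFactors = FALSE)
target <- data.frame(Sequence = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

# Detect repeated keys: anyDuplicated() returns 0 when there are none
has_dups <- anyDuplicated(lookup$unique) > 0

# With a repeated key the join returns one row per match
joined <- merge(target, lookup, by.x = "Sequence", by.y = "unique",
                all.x = TRUE)

# One possible fix: keep only the first occurrence of each key
lookup_dedup <- lookup[!duplicated(lookup$unique), ]
joined_dedup <- merge(target, lookup_dedup, by.x = "Sequence",
                      by.y = "unique", all.x = TRUE)

print(nrow(joined))        # more rows than target
print(nrow(joined_dedup))  # same number of rows as target
```

Keeping the first occurrence is only one choice; which duplicate to keep, or whether to aggregate them, depends on what level means in the real data.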
