Extracting matched variables in summary()
I have an example data set:
gene_name | motif_id | matched_sequence |
---|---|---|
A | y1 | CCC |
A | y2 | CCAAA |
A | y3 | AAG |
A | y3 | AT |
B | y1 | AAAA |
B | y4 | AAT |
C | y5 | AAGG |
and I am trying to get a data set like this in R:
gene_name | Node1 | Node2 | sequence | occurence |
---|---|---|---|---|
A | y1 | y2 | CCC, CCAAA | 2 |
A | y1 | y3 | CCC,AAG,AAT | 3 |
A | y2 | y3 | CCAAA,AGG,AAT | 3 |
B | y1 | y4 | AAAA,AAT | 2 |
The motif_id column always has a target. For each gene_name, I am looking for every pairwise combination of motif_id values that share that gene_name, without duplicate pairs, together with the list of their sequences.
I have tried:
data %>%
  group_by(gene_name, motif_id) %>%
  summarize(matched_sequence = paste0(matched_sequence, collapse = ",")) %>%
  mutate(count = n()) %>%
  filter(count >= 2) %>%
  summarize(motif_id = combn(motif_id, 2, function(x) list(setNames(x, c('Node1', 'Node2')))),
            matched_sequence = toString(matched_sequence),
            .groups = 'keep') %>%
  tidyr::unnest_wider(motif_id)
However, this failed to produce the sequence and occurence columns. Can anyone give me some advice?
Comments (1)
We group by 'gene_name' and keep only the groups where the number of distinct (n_distinct) elements in 'motif_id' is greater than 1. We then get the pairwise combinations (combn) of the 'unique' elements, create the 'sequence' by extracting the 'matched_sequence' values that match the 'motif_id' values, get the lengths of the list in 'occurence', use unnest_wider to create columns from the list column, and convert the 'sequence' list to a character column by pasting the elements in the list.
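The code that accompanied this answer is not present on this page. What follows is a minimal sketch of the steps described, not the answerer's original code. It assumes dplyr 1.1.0 or later (for reframe()) and tidyr, and it rebuilds the question's sample data exactly as posted in a data frame named data. The intermediate column name pair is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question as posted.
data <- data.frame(
  gene_name        = c("A", "A", "A", "A", "B", "B", "C"),
  motif_id         = c("y1", "y2", "y3", "y3", "y1", "y4", "y5"),
  matched_sequence = c("CCC", "CCAAA", "AAG", "AT", "AAAA", "AAT", "AAGG"),
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(gene_name) %>%
  # Keep only genes that have at least two distinct motifs.
  filter(n_distinct(motif_id) > 1) %>%
  reframe(
    # One list element per pair of distinct motifs, named Node1/Node2.
    # 'pair' is a hypothetical helper column, not a name from the question.
    pair = lapply(
      combn(unique(motif_id), 2, simplify = FALSE),
      setNames, c("Node1", "Node2")
    ),
    # All sequences in this gene whose motif belongs to the pair.
    sequence = lapply(pair, function(p) matched_sequence[motif_id %in% p])
  ) %>%
  # Spread the named pair into Node1 and Node2 columns.
  unnest_wider(pair) %>%
  mutate(
    # Count the sequences first, then collapse them into one string.
    occurence = lengths(sequence),
    sequence  = vapply(sequence, paste, character(1), collapse = ",")
  )

print(result)
```

With the sample data as posted, gene C is dropped because it has a single motif, and each remaining row lists the sequences of both motifs in the pair. The column is spelled occurence only to match the name used in the question.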