Getting the difference between column strings in an R data frame
I have a basic question about R:
I have a data frame in which each column holds the set of nucleotide mutations for one of two samples, 'major' and 'minor':
library(tibble)
major <- c("T2A,C26T,G652A")
minor <- c("T2A,C26T,G652A,C725T")
df <- data.frame(major, minor)
tibble(df)
# A tibble: 1 x 2
#   major          minor
#   <chr>          <chr>
# 1 T2A,C26T,G652A T2A,C26T,G652A,C725T
I want to identify the mutations present in 'minor' that are not in 'major'.
I know that if the 'major' and 'minor' mutations were stored as vectors, I could use setdiff to get this difference. However, the data I received stores them as one long string per cell, with the mutations separated by commas, and I don't know how to turn this string column into a vector in the data frame so I can compute the difference (my attempts so far have failed).
Using setdiff directly on the columns:
setdiff(df$minor, df$major)
# I got
# [1] "T2A,C26T,G652A,C725T"
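Each column here holds a single string, so setdiff() sees a length-1 vector on each side and compares whole strings, not individual mutations; since the two strings differ, the entire 'minor' string comes back. A minimal base-R check of that, using the same example values, and of how strsplit() breaks a string into one element per mutation:

```r
major <- c("T2A,C26T,G652A")
minor <- c("T2A,C26T,G652A,C725T")

# Each object is a character vector of length 1, not a vector of mutations
length(minor)  # [1] 1

# setdiff() compares whole elements, so the entire 'minor' string is returned
whole <- setdiff(minor, major)
whole  # [1] "T2A,C26T,G652A,C725T"

# strsplit() splits at the commas and returns a list with one character
# vector per input string; [[1]] extracts the vector for the first string
minor_split <- strsplit(minor, ",")[[1]]
length(minor_split)  # [1] 4
```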
The expected result is:
C725T
Could anyone help me?
Best,
This works on a multi-row data frame, doing comparisons by row:
Note that it does modify the major and minor columns, turning them into list columns that hold a character vector in each row. You can use the .names argument of across() if you need to keep the originals.
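A base-R sketch of the same row-wise comparison (not the dplyr/across() code the answer refers to): strsplit() turns both columns into list columns, and mapply() applies setdiff() row by row. The second row and the only_in_minor column name are hypothetical, added only to show more than one row. stringsAsFactors = FALSE is set explicitly so strsplit() also works on R versions before 4.0, where data.frame() converted strings to factors by default.

```r
# Hypothetical two-row example; the second row is invented for illustration
df <- data.frame(
  major = c("T2A,C26T,G652A", "A10G,T55C"),
  minor = c("T2A,C26T,G652A,C725T", "A10G,T55C,G90A,C120T"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector; assigning the
# resulting list back turns major and minor into list columns
df$major <- strsplit(df$major, ",")
df$minor <- strsplit(df$minor, ",")

# Row-wise set difference: mutations in 'minor' that are absent from 'major'.
# SIMPLIFY = FALSE keeps one character vector per row, even if lengths differ
df$only_in_minor <- mapply(setdiff, df$minor, df$major, SIMPLIFY = FALSE)

df$only_in_minor
```

Each element of only_in_minor is the vector of extra mutations for that row: "C725T" for the first row and c("G90A", "C120T") for the made-up second row.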
The easiest way to do this is to define major and minor as character vectors:
major <- c("T2A", "C26T", "G652A")
minor <- c("T2A", "C26T", "G652A", "C725T")
and then use setdiff:
setdiff(minor, major)
# [1] "C725T"
If you cannot define major and minor as character vectors directly, you can use the stringr package to split the strings.
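A minimal sketch of the splitting step for the question's one-row df. It uses base R strsplit(); stringr::str_split(df$minor, ",") returns the same kind of list if you prefer stringr. Pasting the result back into a comma-separated string, and the only_in_minor column name, are assumptions made here so the output matches the input format.

```r
major <- c("T2A,C26T,G652A")
minor <- c("T2A,C26T,G652A,C725T")
df <- data.frame(major, minor, stringsAsFactors = FALSE)

# Split each column at the commas; [[1]] takes the vector for the first row
major_vec <- strsplit(df$major, ",")[[1]]
minor_vec <- strsplit(df$minor, ",")[[1]]

# Mutations present in 'minor' but not in 'major'
extra <- setdiff(minor_vec, major_vec)
extra  # [1] "C725T"

# Optional: store the difference as a comma-separated string column
df$only_in_minor <- paste(extra, collapse = ",")
df
```

With several extra mutations, paste(..., collapse = ",") would join them into one string such as "C725T,A800G", keeping the column in the same format as major and minor.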