Splitting a string at the vertical bar character "|"

Posted on 2025-01-14 11:55:23

I feel like this question is asked a lot, but none of the solutions I found work for me.

I have a data frame with a column (called ID) that holds strings of numbers and letters (e.g. Q8A203). In a few rows there are two of those IDs separated by a vertical bar (e.g. Q8AA66|Q8AAT5). For my analysis it doesn't matter which one I keep, so I want to make a new column named NewColumn that holds only the first one, by splitting the string at |.

I know that the vertical bar must be treated differently and that I have to put \\ in front. I tried strsplit() and unlist():

df$NewColumn <- strsplit(df$ID,split='\\|',fixed=TRUE)
df$NewColumn <- unlist(strsplit(df$ID, " \\| ", fixed=TRUE))

Both options just copy the content of column ID into NewColumn unchanged.

I would very much appreciate the help.

Comments (1)

对风讲故事 2025-01-21 11:55:23

Rather than splitting, you can simply replace everything from the bar onward with an empty string, which keeps the first ID.

df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"))
df$NewColumn <- sub("\\|.*$", "", df$ID)
df  
#              ID NewColumn
# 1        Q8A203    Q8A203
# 2 Q8AA66|Q8AAT5    Q8AA66
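
As a quick sanity check of the pattern, here is a small sketch on made-up IDs (the vector `ids` and its third entry are hypothetical, not taken from the question). It suggests that the same call also copes with rows holding more than two IDs, because `sub()` replaces from the first bar to the end of the string, and that rows without a bar pass through unchanged.

```r
# Hypothetical IDs for illustration only (not from the original data)
ids <- c("Q8A203", "Q8AA66|Q8AAT5", "A1|B2|C3")

# "\\|" matches a literal bar; ".*$" matches everything after it up to the end
first_id <- sub("\\|.*$", "", ids)
first_id
# [1] "Q8A203" "Q8AA66" "A1"
```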

Next time, please add a minimal reproducible example (your df here) to speed up answers ;)

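strsplit can work if you remove the fixed option: with fixed=TRUE the split string '\\|' is taken literally as a backslash followed by a bar, which never occurs in your IDs, so nothing gets split. Without it, '\\|' is read as a regex for a literal bar. Either way strsplit returns a list, so you will need to pull the first element out of each entry afterwards, which is more complex.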

# Working with a list
unlist(lapply(strsplit(df$ID, split='\\|'), "[[", 1))
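
To illustrate the point about the fixed option, here is a minimal sketch using the same two-row df. It assumes ID is a character column (hence the explicit `stringsAsFactors = FALSE`, which is already the default from R 4.0 on). The names `no_split` and `parts` are just for illustration, and `vapply()` is swapped in for `unlist(lapply())` only so that R checks each row yields exactly one string.

```r
df <- data.frame(ID = c("Q8A203", "Q8AA66|Q8AAT5"), stringsAsFactors = FALSE)

# fixed = TRUE with "\\|" searches for a literal backslash followed by a bar,
# which is not present, so every element comes back unsplit
no_split <- strsplit(df$ID, split = "\\|", fixed = TRUE)

# fixed = TRUE with a plain "|" splits at the bar as intended
parts <- strsplit(df$ID, split = "|", fixed = TRUE)

# Keep the first piece of each element; character(1) enforces one string per row
df$NewColumn <- vapply(parts, function(p) p[1], character(1))
df
```

If you prefer to stay with a regex, dropping fixed = TRUE and keeping '\\|' as in the line above gives the same pieces.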