How to split a column in R at the first lower case letter, using something like a look-behind
I have a large data frame in R with a single column of strings that mix lower case and upper case letters.
df1 <- data.frame(a = c('GCCTTGATTTTTTGGCGGGGACCGTcatGGCGTCGC', 'GATTTTTTGGCGGGGACCGTcatGGCGTCGC', 'TCACCACCATCtCATTCTGC', 'ACTGGTTCCAcCAGCGGGTCACGAC'),
stringsAsFactors = FALSE)
I would like the output to keep only the upper case letters to the left of any lower case letters, i.e. something similar to a look-behind feature.
For example:
GCCTTGATTTTTTGGCGGGGACCGTcatGGCGTCGC would become GCCTTGATTTTTTGGCGGGGACCGT
GATTTTTTGGCGGGGACCGTcatGGCGTCGC would become GATTTTTTGGCGGGGACCGT
ACTGGTTCCAcCAGCGGGTCACGAC would become ACTGGTTCCA
I am only interested in the upper case characters to the left of the first lower case character. I would also like the code not to fall over if a string contains no lower case characters.
I have tried looking at: Splitting strings by case
but I cannot seem to adapt it to look behind for upper case.
I really thank you in advance for your help.
Comments (4)
Code:
Output:
Putting NA where the string does not have any lower case characters:
Output:
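The code and output for this answer did not survive on this page. As a minimal sketch of one way to get the behaviour it describes (not necessarily the answerer's method), the snippet below splits each string at lower case letters with strsplit and keeps the first piece. The column names b and b_na are made up for illustration.

```r
df1 <- data.frame(a = c('GCCTTGATTTTTTGGCGGGGACCGTcatGGCGTCGC',
                        'GATTTTTTGGCGGGGACCGTcatGGCGTCGC',
                        'TCACCACCATCtCATTCTGC',
                        'ACTGGTTCCAcCAGCGGGTCACGAC'),
                  stringsAsFactors = FALSE)

# Split at every lower case letter and keep the piece before the first one.
# A string with no lower case letters comes back whole.
pieces <- strsplit(df1$a, "[[:lower:]]")
df1$b <- vapply(pieces, function(p) p[1], character(1), USE.NAMES = FALSE)

# Variant (assumed reading of "putting NA"): NA where the string
# has no lower case letters at all.
df1$b_na <- ifelse(grepl("[[:lower:]]", df1$a), df1$b, NA_character_)

df1$b
```

Every string in the example data contains a lower case letter, so b and b_na are the same here; they only differ for all-upper-case input.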
You can use sub with [a-z].* or [[:lower:]].* to remove the first lower case letter and everything after. Set to NA where there is no lower case:
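The code for this answer is missing from the page, so the following is only a sketch of what the description suggests: sub with the [[:lower:]].* pattern, followed by an NA assignment for strings without lower case letters. The vector x and its all-upper-case entry "ACGT" are made up for illustration.

```r
# Two strings from the question plus a hypothetical all-upper-case one.
x <- c("GCCTTGATTTTTTGGCGGGGACCGTcatGGCGTCGC",
       "TCACCACCATCtCATTCTGC",
       "ACGT")

# Remove the first lower case letter and everything after it.
res <- sub("[[:lower:]].*", "", x)

# Strings with no lower case letter were left unchanged by sub(); mark them NA.
res[!grepl("[[:lower:]]", x)] <- NA

res
```

The POSIX class [[:lower:]] is used rather than [a-z] because character ranges can behave differently across locales.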
You can do it all in one line of code with a positive lookahead regex (capturing everything up to the first lower case letter), so you don't need to deal with NAs: either there is a match or there is not. To add a new column b with the result, as asked in the comments:
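The one-liner itself was lost from this page. A sketch of what such a lookahead could look like in base R, assuming sub with perl = TRUE (the exact pattern the answerer used is unknown):

```r
df1 <- data.frame(a = c('GCCTTGATTTTTTGGCGGGGACCGTcatGGCGTCGC',
                        'GATTTTTTGGCGGGGACCGTcatGGCGTCGC',
                        'TCACCACCATCtCATTCTGC',
                        'ACTGGTTCCAcCAGCGGGTCACGAC'),
                  stringsAsFactors = FALSE)

# Lazily capture everything up to the position where a lower case letter
# follows (the lookahead), then consume the rest so only the capture remains.
df1$b <- sub("^(.*?)(?=[a-z]).*", "\\1", df1$a, perl = TRUE)

# With no lower case letter the pattern does not match and sub()
# returns the input unchanged ("ACGT" is a made-up test string).
no_lower <- sub("^(.*?)(?=[a-z]).*", "\\1", "ACGT", perl = TRUE)

df1$b
```

Lookaheads need perl = TRUE, since R's default regex engine does not support them.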