Break one string into multiple strings on different rows
I have a data frame in which each 'Sample' is associated with one long character string:
Sample Data
1 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
2 000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N
I would like to code an easy way to break this string into 5 pieces in the following format:
Sample X
CCT6 - Characters 1-33
GAT1 - Characters 34-68
IMD3 - Characters 69-99
PDR3 - Characters 100-130
RIM15 - Characters 131-168
Giving an output that looks like this for each sample:
Sample 1
CCT6 - 000000000000000000000000000N01000
GAT1 - 000000000N0N000000000N00N0000NN00N0
IMD3 - N000000100000N00N0N0000000NNNN0
PDR3 - 1111111111111111111111111111111
RIM15 - 0000000000000000000N000000N0000000000N
I've been able to use the substr function to break the long string into individual pieces, but I'd like to be able to automate it so I can get all 5 pieces in one output. Ideally this output would also be a data frame.
This is what ?read.fwf is for. First, some data that looks like your question:
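The snippet that belonged here is not preserved. A minimal sketch of such data, assuming a data frame called dat with the two columns shown in the question (the names dat and x are illustrative, not the answerer's):

```r
# The long string from the question; both samples carry the same value there.
x <- "000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"

# Hypothetical data frame name `dat`; keep Data as character, not factor.
dat <- data.frame(Sample = 1:2, Data = c(x, x), stringsAsFactors = FALSE)
str(dat)
```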
Now use read.fwf, specifying the width of each field and its name, and that all fields should be of mode character. We wrap the text column of the example data in textConnection so that it can be treated as a connection, which the read.* and other functions understand.
Now loop over the rows and print out each one as per your example:
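The original code is missing here, so the following is only a sketch of the two steps described above. It assumes the hypothetical data frame dat (rebuilt so the block runs on its own) and takes the field widths from the character ranges given in the question (1-33, 34-68, 69-99, 100-130, 131-168); the variable names widths, genes, con and out are illustrative.

```r
x <- "000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"
dat <- data.frame(Sample = 1:2, Data = c(x, x), stringsAsFactors = FALSE)

# Widths derived from the ranges in the question: 33 + 35 + 31 + 31 + 38 = 168.
widths <- c(33, 35, 31, 31, 38)
genes  <- c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")

# Treat the character vector as if it were a fixed-width file.
con <- textConnection(dat$Data)
out <- read.fwf(con, widths = widths, col.names = genes,
                colClasses = "character")  # keep strings such as "111..." as text
close(con)

# Print one block per sample: a header line, then one line per field.
for (i in seq_len(nrow(out))) {
  cat("Sample ", dat$Sample[i], "\n", sep = "")
  for (g in genes) {
    cat(g, " - ", out[i, g], "\n", sep = "")
  }
}
```

Here out is itself a data frame with one column per field, so something like cbind(Sample = dat$Sample, out) should give the data frame output the question asks for.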
This gives output in the format shown in the question: a Sample header followed by one line per field.
This code will break the string into segments:
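The code for this step is not preserved. One way it might look, assuming base R's substring with start and end positions taken from the question (starts, ends and genes are illustrative names):

```r
x <- "000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"

# Start and end positions of the five fields, as listed in the question.
starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)
genes  <- c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")

# substring() is vectorised over first/last, so one call yields all five pieces.
segs <- substring(x, starts, ends)
names(segs) <- genes
segs
```

Note that this is applied to a single string; passing a vector of several strings together with five start positions would recycle the arguments rather than split every string.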
This code would deliver the fragments in list format:
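Again the original code is missing; a sketch under the same assumptions, using lapply to apply substring to each sample's string in turn (dat, starts, ends and pieces are hypothetical names):

```r
x <- "000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"
dat <- data.frame(Sample = 1:2, Data = c(x, x), stringsAsFactors = FALSE)

starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)

# One list element per sample, each holding that sample's five fragments.
pieces <- lapply(dat$Data, substring, first = starts, last = ends)
names(pieces) <- paste("Sample", dat$Sample)
pieces
```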
And to get the specified output format:
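The original snippet is not preserved. A sketch of how the printing could be done, assuming cat inside lapply with the whole call wrapped in invisible; the helper print_segments and the other names are hypothetical:

```r
x <- "000000000000000000000000000N01000000000000N0N000000000N00N0000NN00N0N000000100000N00N0N0000000NNNN011111111111111111111111111111110000000000000000000N000000N0000000000N"
dat <- data.frame(Sample = 1:2, Data = c(x, x), stringsAsFactors = FALSE)

starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)
genes  <- c("CCT6", "GAT1", "IMD3", "PDR3", "RIM15")

# Hypothetical helper: prints a header and five labelled lines per sample.
print_segments <- function(d) {
  # cat() returns NULL, so lapply() builds a list of NULLs;
  # invisible() keeps that list from being printed at the console.
  invisible(lapply(seq_len(nrow(d)), function(i) {
    cat("Sample ", d$Sample[i], "\n", sep = "")
    segs <- substring(d$Data[i], starts, ends)
    cat(paste0(genes, " - ", segs, "\n"), sep = "")
  }))
}

print_segments(dat)
```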
The invisible call was needed to suppress printing of the NULL values returned in the list structure.