R: Combining identical identifiers in a data frame
I have a data frame with two columns: an identifier and a column of names. Each identifier appears several times in the ID column (see below).
ID Names
uc001aag.1 DKFZp686C24272
uc001aag.1 DQ786314
uc001aag.1 uc001aag.1
uc001aah.2 AK056232
uc001aah.2 FLJ00038
uc001aah.2 uc001aah.1
uc001aah.2 uc001aah.2
uc001aai.1 AY217347
Now I want to create a dataframe like this:
ID Names
uc001aag.1 DKFZp686C24272 | DQ786314 | uc001aag.1
uc001aah.2 AK056232 | FLJ00038 | uc001aah.1 | uc001aah.2
uc001aai.1 AY217347
Can anyone help me?
Comments (2)
aggregate is quite fast, but you can use an sapply solution to parallelize the code. This can easily be done on Windows using snowfall:
The parallel version gives you an extra speedup, but the plain sapply solution is actually slower than aggregate. tapply is a bit faster, but can't be parallelized using snowfall. On my computer:
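As a minimal sketch of the sapply approach described above (not the original answer's code or timings), assuming the data frame is called `df` and using a hypothetical helper `collapse_one`. The snowfall part only runs if the package is installed, and `cpus = 2` is an arbitrary choice:

```r
# Example data from the question; the name `df` is an assumption.
df <- data.frame(
  ID = c("uc001aag.1", "uc001aag.1", "uc001aag.1",
         "uc001aah.2", "uc001aah.2", "uc001aah.2", "uc001aah.2",
         "uc001aai.1"),
  Names = c("DKFZp686C24272", "DQ786314", "uc001aag.1",
            "AK056232", "FLJ00038", "uc001aah.1", "uc001aah.2",
            "AY217347"),
  stringsAsFactors = FALSE
)

ids <- unique(df$ID)

# Hypothetical helper: look up all names for one ID and join them.
collapse_one <- function(id) paste(df$Names[df$ID == id], collapse = " | ")

# Serial version: one call per unique ID.
serial <- sapply(ids, collapse_one)
print(serial)

# Parallel version with snowfall, skipped if the package is missing.
if (requireNamespace("snowfall", quietly = TRUE)) {
  snowfall::sfInit(parallel = TRUE, cpus = 2)  # start two worker processes
  snowfall::sfExport("df")                     # copy df to every worker
  parallel_res <- snowfall::sfSapply(ids, collapse_one)
  snowfall::sfStop()                           # shut the workers down
  print(parallel_res)
}
```

Only the short vector of IDs is sent to the workers on each call; the full data frame is exported to every worker once, which is where the extra memory use comes from.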
Note:
For the record, the better sapply solution would be:
which is equivalent to tapply. But parallelizing this one is actually slower, as you have to move more data around within sfSapply. The speedup comes from copying the dataset to every CPU. Keep this in mind when your dataset is huge: you pay for the speed with more memory usage.
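One plausible form of an sapply solution that is equivalent to tapply is to split the names by ID first and then collapse each group. This is a sketch under that assumption, not necessarily the code the answer originally showed; `df` is again an assumed name:

```r
# Example data from the question; the name `df` is an assumption.
df <- data.frame(
  ID = c("uc001aag.1", "uc001aag.1", "uc001aag.1",
         "uc001aah.2", "uc001aah.2", "uc001aah.2", "uc001aah.2",
         "uc001aai.1"),
  Names = c("DKFZp686C24272", "DQ786314", "uc001aag.1",
            "AK056232", "FLJ00038", "uc001aah.1", "uc001aah.2",
            "AY217347"),
  stringsAsFactors = FALSE
)

# Split the names into one character vector per ID, then join each vector.
by_split <- sapply(split(df$Names, df$ID), paste, collapse = " | ")

# tapply does the grouping and the joining in a single call.
by_tapply <- tapply(df$Names, df$ID, paste, collapse = " | ")

# Turn the named result back into a two-column data frame.
result <- data.frame(ID = names(by_split), Names = unname(by_split),
                     stringsAsFactors = FALSE)
print(result)
```

In this form the elements handed to sapply are the name vectors themselves rather than bare IDs, which fits the remark that a parallel version would have to move more data around.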
You can use aggregate:
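A minimal sketch of what such an aggregate call might look like, assuming the data frame is called `df` (a name not given in the question) and that the names should be joined with " | " as in the desired output:

```r
# Example data from the question; the name `df` is an assumption.
df <- data.frame(
  ID = c("uc001aag.1", "uc001aag.1", "uc001aag.1",
         "uc001aah.2", "uc001aah.2", "uc001aah.2", "uc001aah.2",
         "uc001aai.1"),
  Names = c("DKFZp686C24272", "DQ786314", "uc001aag.1",
            "AK056232", "FLJ00038", "uc001aah.1", "uc001aah.2",
            "AY217347"),
  stringsAsFactors = FALSE
)

# Group by ID and paste all names of a group into one string.
# The extra argument collapse = " | " is passed through to paste().
result <- aggregate(Names ~ ID, data = df, FUN = paste, collapse = " | ")
print(result)
```

The result has one row per unique ID, with the collapsed names in the Names column.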