Merging multiple data.frames with different column lengths and manipulating columns

Posted on 2024-11-09 18:07:47


I am working with 9 files of protein-per-tissue data. Each file represents a different tissue and holds protein expression values (as numbers). I am trying to merge the data into one data.frame. I read every file with

read.delim("fileName.txt")

After that, I put all the data frames into a list:

l <- list(data.frame1,..etc)

Then I loaded the plyr library and called do.call(rbind.fill, l).
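For concreteness, the stacking step described above might look like the sketch below. It is only an illustration under stated assumptions: the two small data frames and their values are made up, and the padding helper is a base-R stand-in for what plyr's rbind.fill is used for here (columns missing from one table are filled with NA), not plyr's own implementation.

```r
# Toy stand-ins for two of the nine tissue tables (hypothetical values).
lung <- data.frame(protein = c("ZYX", "ZP2"),
                   expression_value = c(3.1, 0.4),
                   tissue = "Lung",
                   stringsAsFactors = FALSE)
blood <- data.frame(protein = c("ZYX", "ZW10"),
                    expression_value = c(2.2, 1.5),
                    tissue = "Blood",
                    localization = c("cytoplasm", "nucleus"),
                    stringsAsFactors = FALSE)

l <- list(lung, blood)

# Every column name that occurs in at least one table.
all_cols <- unique(unlist(lapply(l, names)))

# Pad each table with NA columns it lacks, then put columns in a common order.
padded <- lapply(l, function(df) {
  for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
  df[all_cols]
})

# Stack the padded tables row-wise into one long data.frame.
combined <- do.call(rbind, padded)
print(combined)
```

Because every row carries its own tissue value, the stacked table can later be filtered by protein without losing track of which file a value came from.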

My questions:

1) I want to loop through the list of 9 data.frames, find the unique entries in them, and plot each one in a histogram. If an entry with the same name appears in more than one tissue, each value should be added to the histogram above the correct tissue label. That is, I go to the first data.frame in the list, take its first entry, search for that entry in the other data.frames, and if it is found, add it to the histogram.

The histogram has the 9 tissues on the x axis, and the y axis is the value from my files. I can't figure out how to get the histogram (and the code) to change the name appropriately or how to display each bar in the correct place.

In addition, I do not know how to build the axis so that the tissue names appear under each bar.

I have some basic code that is not doing what I want:

i <- 1

for (val in list2[1:9])
{
    # pseudocode, not yet implemented:
    #   if val appears in one of the other data.frames,
    #   plot a bar over the correct tissue.

    hist(val[i, 8], breaks = 11, col = "blue", density = 13, angle = 45,
         labels = c("Lung", "ErythroleukemicCellLine", "TCells", "Blood", "liver",
                    "BLimpho", "pancreas", "prostate", "Bladder"),
         main = fileName[i, 1])
    dev.new()  # each hist in a new window
    i <- i + 1
}

Thank you,
yigeal

These are the last few lines of the output, after reading the file in with read.delim("nameOfFile.txt"):

 dput(BloodErythroleukemicCellLineFile)
 "Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein", 
    "Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598", 
    "Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609", 
    "Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610", 
    "Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613", 
    "Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614", 
    "Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622", 
    "Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638", 
    "Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646", 
    "Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B", 
    "Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667, isoform CRA_a", 
    "Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687", 
    "Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317, highly similar to Zinc finger protein 691", 
    "Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700", 
    "Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714", 
    "Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)", 
    "Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721", 
    "Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76", 
    "Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782", 
    "Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787", 
    "Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800", 
    "Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein", "Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828", 
    "Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837", 
    "Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878", 
    "Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891", 
    "Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2", 
    "Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6", 
    "Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein", "Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog", 
    "Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor", "Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B", 
    "Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160, highly similar to Zyxin", 
    "Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein", "Tax_Id=9606 Gene_Symbol=ZYX Zyxin"
    ), class = "factor")), .Names = c("proteinIdentifier", "protein", 
"spectra", "unique_peptides", "FDR", "local_FDR", "sequence_coverage", 
"expression_value", "expression_percentile", "organism", "tissue", 
"localization", "condition", "experiment", "annotation"), class = "data.frame", row.names = c(NA, 
-4802L))

It is much longer in the console.


濫情▎り 2024-11-16 18:07:47

It is not easy to find the core of the problem in your question.
To merge data frames on one or more common fields, you can use the merge() function, for example:

merge(dataframe1, dataframe2, by=c('column_name1','column_name2'), suffixes=c('.from_df1','.from_df2'))
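As a small worked illustration of that call, here is a sketch with two made-up tissue tables. The column names follow the dput output in the question, but the values and the choice of "protein" as the join key are assumptions made for the example.

```r
# Hypothetical per-tissue tables sharing a "protein" key column.
lung <- data.frame(protein = c("ZYX", "ZP2", "ZWINT"),
                   expression_value = c(3.1, 0.4, 1.7),
                   stringsAsFactors = FALSE)
blood <- data.frame(protein = c("ZYX", "ZW10", "ZWINT"),
                    expression_value = c(2.2, 1.5, 0.9),
                    stringsAsFactors = FALSE)

# Default merge is an inner join: only proteins present in both tables remain.
# suffixes disambiguates the two expression_value columns.
both <- merge(lung, blood, by = "protein", suffixes = c(".lung", ".blood"))
print(both)

# all = TRUE keeps proteins found in either table, with NA where one is missing.
either <- merge(lung, blood, by = "protein", all = TRUE,
                suffixes = c(".lung", ".blood"))
print(either)
```

Note that merge() produces a wide table with one value column per tissue, whereas stacking the files row-wise keeps a long table with a tissue column; which shape is more convenient depends on how the values are plotted afterwards.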

If you want to select rows or columns, you can do it like this:

dataframe1[dataframe1$column1 == 'some_value', c('col1', 'col2')]
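This kind of selection could also feed the plot asked about in the question. The sketch below is one possible approach rather than a definitive one: it assumes the nine files have already been stacked into a single long table with protein, tissue and expression_value columns, that each protein occurs at most once per tissue, and it uses invented values. It draws with barplot() because hist() bins one numeric vector, while barplot() draws one bar per category and places names.arg under the bars.

```r
# Fixed tissue order for the x axis (a subset of the nine, for brevity).
tissues <- c("Lung", "Blood", "liver", "pancreas")

# Hypothetical long table: one row per protein per tissue.
combined <- data.frame(
  protein = c("ZYX", "ZP2", "ZYX", "ZW10", "ZYX"),
  tissue = c("Lung", "Lung", "Blood", "Blood", "liver"),
  expression_value = c(3.1, 0.4, 2.2, 1.5, 0.8),
  stringsAsFactors = FALSE
)

# Select the rows for one protein and only the columns needed for the plot.
one <- combined[combined$protein == "ZYX", c("tissue", "expression_value")]

# Align the values to the fixed tissue order; tissues lacking the protein get NA.
heights <- one$expression_value[match(tissues, one$tissue)]
names(heights) <- tissues

# One bar per tissue, with the tissue name under each bar.
barplot(heights, names.arg = tissues, las = 2, col = "blue",
        ylab = "expression_value", main = "ZYX")
```

Wrapping the last three steps in a loop over unique(combined$protein) would give one plot per protein, with every bar staying in the same tissue position from plot to plot.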

etc...
Does this help you?
