Row names referred to as numbers in analysis (geiger package)

Posted 2024-12-01 03:19:02

I'm trying to run the tip.disparity function from the geiger package in R.

My data:

Family    Length   Wing    Tail  
Alced    2.21416 1.88129 1.66744 
Brachypt 2.36734 2.02373 2.03335 
Bucco    2.23563 1.91364 1.80675 

When I use the function name.check to check that the names in my data match those on my tree, it returns

$data.not.tree
[1] "1" "10" "11" "12" "2" etc

This shows that it is referring to the names by row number. I've tried converting the names to a character vector, among other things.

I've also tried running it with

data.names=NULL

I simply want to edit my data frame so that the package matches its names to those in my tree (the tree is in Newick format).

I hope this is clearer.
Thanks

Comments (1)

枉心 2024-12-08 03:19:02

I believe the clue is in the documentation (?name.check):

data.names: names of the tips in the order of the data; if this is not
          given, names will be taken from the names or rownames of the
          object data

If you want the program to return the names of the taxa that are included in the data frame but not present in the tree, you either need to assign the corresponding names as row names of your data frame, or specify them separately in the data.names argument. Note that the default row names of a data frame are the character equivalent of the row number, exactly what you're seeing above ...
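To see where the numbers come from, here is a minimal base-R sketch that does not call geiger at all. The data frame mirrors the one in the question; `tree_tips` is a hypothetical stand-in for the tip labels of the tree, and `setdiff` only approximates the kind of comparison reported under $data.not.tree.

```r
# Trait data as in the question; Family is an ordinary column here.
traitdata <- data.frame(
  Family = c("Alced", "Brachypt", "Bucco"),
  Length = c(2.21416, 2.36734, 2.23563),
  Wing   = c(1.88129, 2.02373, 1.91364),
  Tail   = c(1.66744, 2.03335, 1.80675),
  stringsAsFactors = FALSE
)

# Hypothetical tip labels, standing in for tree$tip.label (an assumption).
tree_tips <- c("Alced", "Brachypt", "Bucco")

# With no row names assigned, R uses the row numbers as character strings.
default_names <- rownames(traitdata)

# Names in the data but not in the tree: every "number" fails to match.
missing_default <- setdiff(default_names, tree_tips)

# Comparing the Family column instead leaves nothing unmatched.
missing_family <- setdiff(traitdata$Family, tree_tips)
```

Printing `missing_default` and `missing_family` side by side shows the difference between matching on default row names and matching on the real taxon names.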

Edit, based on the additional information above:

R can't guess (or doesn't want to) that the names are contained in the Family column of your data frame. Try:

name.check(tree, traitdata, data.names=as.character(traitdata$Family))

It is probably better in the long run to do:

rownames(traitdata) <- as.character(traitdata$Family)
traitdata <- subset(traitdata, select=-Family)
name.check(tree, traitdata)

Because you don't want to have Family included in your data set of traits -- it's an identifier, not a trait ...
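As a rough illustration of moving the identifier out of the trait columns, the following base-R sketch rebuilds the question's data and checks the result without geiger. The object name `traits` is made up for this example. Note that `subset` needs the `select` argument to drop a column; its second positional argument is a row filter.

```r
# Trait data as in the question, with Family still stored as a column.
traitdata <- data.frame(
  Family = c("Alced", "Brachypt", "Bucco"),
  Length = c(2.21416, 2.36734, 2.23563),
  Wing   = c(1.88129, 2.02373, 1.91364),
  Tail   = c(1.66744, 2.03335, 1.80675),
  stringsAsFactors = FALSE
)

# Use the taxon names as row names ...
rownames(traitdata) <- as.character(traitdata$Family)

# ... then drop the identifier column so only numeric traits remain.
traits <- subset(traitdata, select = -Family)

# Row names survive the subset; every remaining column is numeric.
all_numeric <- all(vapply(traits, is.numeric, logical(1)))
```

After this, `rownames(traits)` holds the family names and `traits` contains only Length, Wing and Tail.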

If you look at the structure of the example data given in the package:

data(geospiza)
geospiza.data

you can see that the taxon names are included as row names, not as a column in the data frame itself ...
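If the trait table is read from a text file, the same layout can be produced at import time. This is only a sketch: the inline text stands in for whatever file the data actually live in, and it relies on the `row.names` argument of base R's `read.table`, which accepts the name of the column holding the row names.

```r
# Inline copy of the question's table, standing in for a real file (an assumption).
txt <- "Family Length Wing Tail
Alced 2.21416 1.88129 1.66744
Brachypt 2.36734 2.02373 2.03335
Bucco 2.23563 1.91364 1.80675"

# row.names = "Family" turns that column into the row names on import,
# so it never appears among the trait columns.
traits <- read.table(text = txt, header = TRUE, row.names = "Family")
```

With a real file, `text = txt` would be replaced by the file path.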

PS it's not as nice an interface as StackOverflow, but there's a very friendly and active R-for-phylogeny mailing list at [email protected] ...
