Algorithm for reducing/merging nominal and ordinal categories
I receive datasets in which several variables have more than 10 categories (some ordinal, some nominal). I suspect that several of these categories could be merged, both for easier presentation and to obtain enough events for analysis. This could, and perhaps should, be done using a priori knowledge, but an algorithm that simplifies the process would be very welcome. Does such an algorithm exist? Is it implemented in R?
edit:
data("GBSG2", package = "ipred")
GBSG2$size <- cut(GBSG2$tsize, seq(0, 100, 10))
Now I'd like to find out whether any categories in GBSG2$size or GBSG2$tgrade can be merged, and which ones, without a significant loss of information in their ability to predict GBSG2$cens status. I know I could do this manually by merging several categories in the two variables, running a logistic regression, and comparing the results before and after merging, but are there any other methods?
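For reference, the manual approach I describe could be sketched roughly as follows. This is only a minimal sketch using base R: `merge_levels_lrt` is a hypothetical helper name (not from any package), and the toy data frame is simulated purely so the example runs on its own. It fits a logistic regression with the original levels and one with the merged levels, then compares the two nested models with a likelihood-ratio test.

```r
# Hypothetical helper (not part of any package): compare a logistic regression
# on the original factor levels against one on merged levels.
# `groups` is a named list: new level name -> character vector of old levels.
merge_levels_lrt <- function(data, outcome, var, groups) {
  full <- factor(data[[var]])
  # Every original level must be assigned to a group; with `levels<-` and a
  # named list, unlisted levels would silently become NA.
  stopifnot(all(levels(full) %in% unlist(groups)))
  merged <- full
  levels(merged) <- groups  # base R: collapse levels via a named list
  d <- data.frame(y = data[[outcome]], full = full, merged = merged)
  d <- d[complete.cases(d), ]  # same rows for both fits
  fit_full   <- glm(y ~ full,   family = binomial, data = d)
  fit_merged <- glm(y ~ merged, family = binomial, data = d)
  # The merged model is nested in the full one, so a likelihood-ratio
  # (deviance) test is applicable.
  anova(fit_merged, fit_full, test = "Chisq")
}

# Simulated toy data (an assumption for illustration only): levels s1/s2
# share one event probability, s3/s4 another.
set.seed(1)
n <- 600
size <- factor(sample(c("s1", "s2", "s3", "s4"), n, replace = TRUE))
p <- ifelse(size %in% c("s1", "s2"), 0.3, 0.6)
toy <- data.frame(size = size, cens = rbinom(n, 1, p))

res <- merge_levels_lrt(toy, "cens", "size",
                        list(low = c("s1", "s2"), high = c("s3", "s4")))
print(res)
```

With the data above, the analogous call would be something like `merge_levels_lrt(GBSG2, "cens", "tgrade", list(I = "I", II_III = c("II", "III")))`, assuming `tgrade` has the levels I, II and III. A large p-value suggests the merge loses little information, but this still has to be repeated for each candidate merge, which is exactly what I would like to avoid.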