How do I perform decision tree lookups with MapReduce? I'm looking for an optimized version
I have a decision tree with millions of nodes, serialized on HDFS. Can anyone give me some pointers on how to serialize it better, so that I can search it more efficiently on Hadoop using MapReduce?
Thanks.
Comments (1)
To traverse the tree, the model has to be loaded into memory. Once it is loaded, running an instance through it is easy and fast. You can't avoid storing the model on HDFS, so any speedup has to come from how you hold and traverse it in main memory. But as I said, a tree traversal is always very fast. It would help if you gave some more information about your problem: is it that you have millions of new examples and need to predict their labels?
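If the problem is indeed labelling many new examples, one way to picture the approach is a map-only job where each mapper loads the tree once and then traverses it for every input record. The sketch below is only an illustration under several assumptions that are not in the question: the tree is binary with numeric threshold splits, it has been serialized as a JSON file of parallel arrays (the field names `feature`, `threshold`, `left`, `right`, `label` are made up here), each input record is a comma-separated line of feature values, and the script would run as a Hadoop Streaming mapper with the tree file shipped to every task (for example through the distributed cache). The flat-array layout is a suggestion, not something the original poster described.

```python
import json
import sys
from typing import List, NamedTuple, TextIO


class FlatTree(NamedTuple):
    """Hypothetical flat layout: parallel arrays, one entry per node, root at 0."""
    feature: List[int]      # feature index tested at the node, -1 marks a leaf
    threshold: List[float]  # split threshold (ignored for leaves)
    left: List[int]         # child index taken when value <= threshold
    right: List[int]        # child index taken when value > threshold
    label: List[str]        # predicted label at a leaf, "" for internal nodes


def load_tree(path: str) -> FlatTree:
    # Assumed format: a JSON object holding the five arrays above.
    with open(path) as f:
        d = json.load(f)
    return FlatTree(d["feature"], d["threshold"], d["left"], d["right"], d["label"])


def predict(tree: FlatTree, x: List[float]) -> str:
    # Walk from the root until a leaf is reached; cost is the depth of the tree.
    node = 0
    while tree.feature[node] != -1:
        if x[tree.feature[node]] <= tree.threshold[node]:
            node = tree.left[node]
        else:
            node = tree.right[node]
    return tree.label[node]


def run_mapper(tree: FlatTree, lines, out: TextIO) -> None:
    # One record per line; emit "record<TAB>label" as the map output.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        x = [float(v) for v in line.split(",")]
        out.write(f"{line}\t{predict(tree, x)}\n")


# Assumed invocation as a streaming mapper: mapper.py <local tree file>
if __name__ == "__main__" and len(sys.argv) > 1:
    run_mapper(load_tree(sys.argv[1]), sys.stdin, sys.stdout)
```

With this layout the model is read once per mapper rather than once per record, and each prediction only touches the nodes on a single root-to-leaf path, which is why the traversal itself is rarely the bottleneck. No reducer is needed unless the predictions have to be aggregated afterwards.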