How do I add jars to the classpath so they take effect without restarting the Hadoop cluster?
I wrote some MapReduce jobs that reference a few external jars, so I added those jars to the CLASSPATH of the running cluster in order to run the jobs.
Once I tried to run them, I got ClassNotFoundExceptions. I Googled for ways to fix this and found that I needed to restart the cluster to apply the changed CLASSPATH, which actually worked.
Oh, yuck! Do I really need to restart the cluster every time I add a new jar to the CLASSPATH? That doesn't make sense to me.
Does anyone know how to apply the changes without restarting?
I think I should add some detail to my question.
I wrote a custom HBase filter class and packaged it in a jar. I also wrote a MapReduce job that uses the custom filter class and packaged it in another jar. Because the filter-class jar wasn't on the classpath of my running cluster, I added it, but I couldn't run the job successfully until I restarted the cluster.
Of course, I know I could package the filter class and the job together in a single jar, but that isn't what I want. I'm curious: do I have to restart the cluster again every time I need to add a new external jar?
2 Answers
Check the Cloudera article on including third-party libraries required by a job. Options (1) and (2) described there don't require the cluster to be restarted.
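For example, instead of editing the cluster-wide CLASSPATH, you can ship the extra jar with the job submission. A minimal sketch (the jar paths and class name here are hypothetical; `-libjars` is handled by Hadoop's `GenericOptionsParser`, so the job's main class should go through `ToolRunner`):

```
# Make the filter jar visible to the client JVM that submits the job
export HADOOP_CLASSPATH=/path/to/filter.jar

# Ship the filter jar to the task JVMs alongside the job jar
hadoop jar my-job.jar com.example.MyJob -libjars /path/to/filter.jar input output
```

With this approach the dependency travels with each job run, so no daemon on the cluster needs the jar on its static classpath.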
You could build a system that dynamically resolves class names to an interface type to process your data.
Just my 2 cents.
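A minimal sketch of that idea in plain Java: the implementing class is looked up by name at runtime via reflection, so new implementations can be dropped in without recompiling the caller. The `RecordFilter` and `LengthFilter` names are made up for illustration; a real job could point a `URLClassLoader` at a jar shipped with the job instead of relying on `Class.forName` alone.

```java
// Hypothetical interface that custom filter classes implement.
interface RecordFilter {
    boolean accept(String record);
}

// One hypothetical implementation, compiled and deployed separately.
class LengthFilter implements RecordFilter {
    public boolean accept(String record) {
        return record.length() > 3;
    }
}

public class DynamicFilterDemo {
    // Resolve a class name to the RecordFilter interface at runtime.
    // Class.forName uses the current classloader; swapping in a
    // URLClassLoader would let you load the class from an external jar.
    static RecordFilter loadFilter(String className) throws Exception {
        Class<?> cls = Class.forName(className);
        return (RecordFilter) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        RecordFilter f = loadFilter("LengthFilter");
        System.out.println(f.accept("hbase")); // true  (length 5 > 3)
        System.out.println(f.accept("ab"));    // false (length 2 <= 3)
    }
}
```

The caller only depends on the interface, so adding a new filter means deploying a new class file or jar and passing its name in configuration, not rebuilding or restarting anything.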