Hadoop job scheduler: Capacity vs. Fair Share, or something else?
Background
My employer is progressively shifting our resource-intensive ETL and backend processing logic from MySQL to Hadoop (DFS and Hive). At the moment everything is still fairly small and manageable (20 TB over 10 nodes), but we intend to grow the cluster over time.
Now that Hadoop is moving into production use, batch scheduling is becoming a bigger issue: the cluster has to be shared between ad-hoc user Hive queries, hourly M/R processes, and, I believe, eventually some HBase usage. The fear is that a user will submit a naive query that runs for an unreasonable amount of time (say 4 hours), clogging up the task queue and potentially destabilizing infrastructure load.
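To make the concern concrete, here is a toy model of the clogging problem. It is only a sketch under stated assumptions: every job is submitted at time zero, work is measured in slot-hours, and slots can be divided perfectly evenly. The function names, the 40-slot cluster size, and the job sizes are all hypothetical, and neither function reflects how the actual Hadoop schedulers are implemented (no tasks, pools, minimum shares, or preemption).

```python
def fifo_finish_times(jobs, slots):
    """Each job gets the whole cluster in submission order (toy FIFO model)."""
    finish, now = {}, 0.0
    for name, work in jobs:
        now += work / slots          # hours to drain this job on all slots
        finish[name] = now
    return finish


def fair_finish_times(jobs, slots):
    """Slots are split evenly among unfinished jobs (toy fair-share model)."""
    remaining = dict(jobs)
    finish, now = {}, 0.0
    while remaining:
        share = slots / len(remaining)               # slots per active job
        name = min(remaining, key=remaining.get)     # next job to finish
        work = remaining.pop(name)
        now += work / share
        finish[name] = now
        for other in remaining:                      # others progressed equally
            remaining[other] -= work
    return finish


# Hypothetical workload: a naive 4-hour query submitted just ahead of a
# 15-minute hourly job, on an assumed 40-slot cluster.
jobs = [("naive_hive_query", 160.0), ("hourly_etl", 10.0)]

print(fifo_finish_times(jobs, 40))  # {'naive_hive_query': 4.0, 'hourly_etl': 4.25}
print(fair_finish_times(jobs, 40))  # {'hourly_etl': 0.5, 'naive_hive_query': 4.25}
```

In this model the hourly job waits behind the naive query under FIFO and finishes after 4.25 hours, while an even split lets it finish after half an hour at the cost of a slightly later finish for the long query.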
Question
Another section of my company has already been burned by Flume's immaturity, so my question is: how stable are the two known schedulers (Capacity and Fair), and besides their sponsoring companies (Yahoo and Facebook), are they used elsewhere?
Edit: Background info
http://www.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/
http://hadoop.apache.org/mapreduce/docs/r0.21.0/fair_scheduler.html
http://hadoop.apache.org/mapreduce/docs/r0.21.0/capacity_scheduler.html
Comments (1)
We ship CDH with the Fair Share scheduler on by default. It's quite stable.