Cloudera's Flume vs. Facebook's Scribe

Posted 2024-12-06 11:10:48 · 161 words · 1 view · 0 comments

Has anyone had a chance to work with both? I need to set up a framework to move data around. Basically, we have clickstream data coming in as text files. This data needs to be moved from the app servers to HDFS, and then to S3 after archival.

I need help choosing between Flume and Scribe. Which one is better in terms of manageability and setup, and which is easier to customize?

Comments (1)

谎言月老 2024-12-13 11:10:48

View the answer posted here

I'll quote the answer:

  1. Flume allows you to configure your Flume installation from a
    central point, without having to ssh into every machine, update a
    configuration variable and restart a daemon or two. You can start,
    stop, create, delete and reconfigure logical nodes on any machine
    running Flume from any command line in your network with the Flume
    jar available.

  2. Flume also has centralised liveness monitoring. We've heard a
    couple of stories of Scribe processes silently failing, but lying
    undiscovered for days until the rest of the Scribe installation
    starts creaking under the increased load. Flume allows you to see the
    health of all your logical nodes in one place (note that this is
    different from machine liveness monitoring; often the machine stays
    up while the process might fail).

  3. Flume supports three distinct types of reliability guarantees,
    allowing you to make tradeoffs between resource usage and
    reliability. In particular, Flume supports fully ACKed reliability,
    with the guarantee that all events will eventually make their way
    through the event flow.

  4. Flume's also really extensible - it's really easy to write your own
    source or sink and integrate most any system with Flume. If rolling
    your own is impractical, it's often very straightforward to have your
    applications output events in a form that Flume can understand (Flume
    can run Unix processes, for example, so if you can use shell script
    to get at your data, you're golden).
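
As a rough illustration of point 4, here is a minimal Python sketch of a script that turns clickstream text files into one event per line on standard output, which is the kind of output a process-running Flume source could consume. The names `iter_events` and `emit`, and the example log path, are hypothetical and not part of Flume; the sketch also assumes one click event per line of input.

```python
import sys
from pathlib import Path
from typing import Iterable, Iterator, TextIO


def iter_events(paths: Iterable[Path]) -> Iterator[str]:
    """Yield one event per non-empty line from each clickstream text file."""
    for path in paths:
        # errors="replace" keeps the stream flowing past undecodable bytes.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                event = line.rstrip("\r\n")
                if event:
                    yield event


def emit(paths: Iterable[Path], out: TextIO = sys.stdout) -> int:
    """Write each event as a single line to `out`; return how many were written."""
    count = 0
    for event in iter_events(paths):
        out.write(event + "\n")
        count += 1
    out.flush()
    return count


if __name__ == "__main__":
    # Hypothetical usage: python emit_clicks.py /var/log/app/clicks-*.log
    emit(Path(arg) for arg in sys.argv[1:])
```

The script only reads files and writes lines, so how it is wired into a Flume source, and which reliability level the flow uses, stays in the Flume configuration.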

This isn't an exhaustive list of benefits to using Flume - I haven't
touched on using decorators for lightweight transformation or
metadata extraction, the configuration language, the ability to run
several logical nodes in a single Flume process, automatic bucketing
and rolling of log files in HDFS... there's lots more about Flume
that we're looking forward to sharing with everyone.

The key difference to me is that Cloudera is actively supporting
Flume. While I do generally trust Facebook to maintain great open
source projects, Cloudera's business is built around providing support
for tools like this, so I have faith that Flume will be better
supported in the long term. I want to minimize the time I have to think about
this particular problem. That said, so far I've had a lot of annoying
issues where Flume was either a bit convoluted in its abstraction or
buggy in its implementation, as you might expect from a pre-1.0
technology. If Asana weren't still in beta, I'd probably have chosen
Scribe.
