Flume collector example in the Cloudera User Guide does not work as expected

The section of the User Guide that shows how to set up a collector and write to it (http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors) has this configuration:

host : console | agentSink("localhost",35853) ;
collector : collectorSource(35853) | console ;

I changed this to:

dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;

I spawned the nodes as:

flume node_nowatch -n dataSource
flume node_nowatch -n dataCollector
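
For what it's worth, the same two nodes can presumably also be started in one-shot mode with the config passed on the command line (mirroring the -1/-s/-c flags visible in the ps output further down):

$ flume node_nowatch -1 -s -n dataSource -c 'dataSource: console | agentSink("localhost");'
$ flume node_nowatch -1 -s -n dataCollector -c 'dataCollector: collectorSource() | console;'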

I have tried this on two systems:

  1. Cloudera's own demo VM running inside VirtualBox with 2GB RAM.
    It comes with Flume 0.9.4-cdh3u2

  2. Ubuntu LTS (Lucid) with the Debian package and OpenJDK (and no Hadoop packages installed), as a VM running inside VirtualBox with 2GB RAM.
    I followed the steps here: https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages

Here is what I did:

Running flume dump 'collectorSource()' leads to:

$ sudo netstat -anp | grep 35853
tcp6       0      0 :::35853                :::*                    LISTEN      3520/java
$ ps aux | grep java | grep 3520
1000      3520  0.8  2.3 1050508 44676 pts/0   Sl+  15:38   0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;

My assumption is that:

flume dump 'collectorSource()'

is the same as running the config:

dump : collectorSource() | console ;

and starting the node with

flume node -1 -n dump -c "dump: collectorSource() | console;" -s 

Running dataSource : console | agentSink("localhost") leads to:

$ sudo netstat -anp | grep 35853
tcp6       0      0 :::35853                :::*                    LISTEN      3520/java       
tcp6       0      0 127.0.0.1:44878         127.0.0.1:35853         ESTABLISHED 3593/java       
tcp6       0      0 127.0.0.1:35853         127.0.0.1:44878         ESTABLISHED 3520/java 

$ ps aux | grep java | grep 3593
1000      3593  1.2  3.0 1130956 57644 pts/1   Sl+  15:41   0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource

The observed behaviour is exactly the same in both VirtualBox VMs.

An unending flow of this at dataSource:

2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
durability.NaiveFileWALManager: File lives in
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: constructed new seqfile event sink:
file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener began
20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152748172-0500.1116926245855.00000034 is queued to be checked
2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
Retransmitting 20111215-152657736-0500.1066489868855.00000034 after
being stale for 60048ms
2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152657736-0500.1066489868855.00000034
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152657736-0500.1066489868855.00000034 is queued to be checked
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: closed
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener ended
20111215-152758253-0500.1127006668855.00000034

2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
durability.NaiveFileWALManager: File lives in
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: constructed new seqfile event sink:
file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener began
20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152758253-0500.1127006668855.00000034 is queued to be checked
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: closed
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener ended
20111215-152808335-0500.1137089135855.00000034

..

2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager:
Retransmitting 20111215-152707823-0500.1076576334855.00000034 after
being stale for 60277ms
2011-12-15 15:35:24,763 [Heartbeat] INFO
durability.NaiveFileWALManager: Attempt to retry chunk
'20111215-152707823-0500.1076576334855.00000034'  in LOGGED state.
There is no need for state transition.

An unending flow of this at dataCollector:

localhost [INFO Thu Dec 15 15:31:09 EST 2011] {
AckChecksum : (long)1323981059821  (string) ' 4Ck��' (double)6.54133557402E-312 } { AckTag : 20111215-153059819-0500.1308572847855.00000034 } { AckType : end }

How do I get the console <-> console communication via a collector working correctly again?


Comments (2)

小矜持 2024-12-28 18:28:26


I'm not exactly sure what your expected behavior is.

But it looks like you may only be binding to the IPv6 interface. I know in the Hadoop config you have to work around this:

# Ubuntu wants us to use IPv6. Hadoop doesn't support that, but nevertheless binds to :::50010. Let's tell it we don't agree.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

You may need a similar option. To start with, why not set the hostname and port number explicitly, and then back off each in turn?
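
For instance, a first step along those lines (a sketch, simply reusing the port from the User Guide example quoted above) would be:

dataSource : console | agentSink("localhost", 35853) ;
dataCollector : collectorSource(35853) | console ;

If that works, drop the port and then the hostname one at a time to see which default breaks.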

往日情怀 2024-12-28 18:28:26

  • Go to /usr/lib/flume/bin
  • Rename the file flume-env.sh.template to flume-env.sh
  • Add this line at the end of the file:
    export UOPTS=-Djava.net.preferIPv4Stack=true
  • Restart your Flume instances

=> Your nodes will then listen on IPv4 addresses only.
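
As a rough sketch, and assuming the default CDH3 install path /usr/lib/flume, the steps above amount to:

$ cd /usr/lib/flume/bin
$ sudo mv flume-env.sh.template flume-env.sh
$ echo 'export UOPTS=-Djava.net.preferIPv4Stack=true' | sudo tee -a flume-env.sh

After restarting the nodes, the listener from the netstat output above should show up as tcp rather than tcp6:

$ sudo netstat -anp | grep 35853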
