Flume collector example from the Cloudera User Guide does not work as expected
The part of the User Guide that shows you how to set up a collector and write to it (http://archive.cloudera.com/cdh/3/flume/UserGuide/index.html#_tiering_flume_nodes_agents_and_collectors) has this configuration:
host : console | agentSink("localhost",35853) ;
collector : collectorSource(35853) | console ;
I changed this to:
dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;
I spawned the nodes as:
flume node_nowatch -n dataSource
flume node_nowatch -n dataCollector
I have tried this on two systems:
Cloudera's own demo VM running inside VirtualBox with 2 GB RAM. It comes with Flume 0.9.4-cdh3u2.
Ubuntu LTS (Lucid) with the Debian package and OpenJDK (minus any Hadoop packages installed), as a VM running inside VirtualBox with 2 GB RAM, following the steps here: https://ccp.cloudera.com/display/CDHDOC/Flume+Installation#FlumeInstallation-InstallingtheFlumeRPMorDebianPackages
Here is what I did:
flume dump 'collectorSource()'
leads to
$ sudo netstat -anp | grep 35853
tcp6 0 0 :::35853 :::* LISTEN 3520/java
$ ps aux | grep java | grep 3520
1000 3520 0.8 2.3 1050508 44676 pts/0 Sl+ 15:38 0:02 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -1 -s -n dump -c dump: collectorSource() | console;
My assumption is that:
flume dump 'collectorSource()'
is the same as running the config:
dump : collectorSource() | console ;
and starting the node with
flume node -1 -n dump -c "dump: collectorSource() | console;" -s
dataSource : console | agentSink("localhost")
leads to
$ sudo netstat -anp | grep 35853
tcp6 0 0 :::35853 :::* LISTEN 3520/java
tcp6 0 0 127.0.0.1:44878 127.0.0.1:35853 ESTABLISHED 3593/java
tcp6 0 0 127.0.0.1:35853 127.0.0.1:44878 ESTABLISHED 3520/java
$ ps aux | grep java | grep 3593
1000 3593 1.2 3.0 1130956 57644 pts/1 Sl+ 15:41 0:07 java -Dflume.log.dir=/usr/lib/flume/logs -Dflume.log.file=flume.log -Dflume.root.logger=INFO,console -Dzookeeper.root.logger=ERROR,console -Dwatchdog.root.logger=INFO,console -Djava.library.path=/usr/lib/flume/lib::/usr/lib/hadoop/lib/native/Linux-amd64-64 com.cloudera.flume.agent.FlumeNode -n dataSource
The observed behaviour is exactly the same in both the VirtualBox VMs:
An unending flow of this at dataSource:
2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
durability.NaiveFileWALManager: File lives in
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,253 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: constructed new seqfile event sink:
file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,254 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152748172-0500.1116926245855.00000034
2011-12-15 15:27:58,254 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener began
20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:27:58,256 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152748172-0500.1116926245855.00000034 is queued to be checked
2011-12-15 15:27:58,257 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:07,874 [Heartbeat] INFO agent.WALAckManager:
Retransmitting 20111215-152657736-0500.1066489868855.00000034 after
being stale for 60048ms
2011-12-15 15:28:07,875 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152657736-0500.1066489868855.00000034
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152657736-0500.1066489868855.00000034 is queued to be checked
2011-12-15 15:28:07,877 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: closed
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener ended
20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
durability.NaiveFileWALManager: File lives in
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,335 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: constructed new seqfile event sink:
file=/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,336 [naive file wal consumer-35] INFO
durability.NaiveFileWALManager: opening log file
20111215-152758253-0500.1127006668855.00000034
2011-12-15 15:28:08,337 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener began
20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
agent.WALAckManager: Ack for
20111215-152758253-0500.1127006668855.00000034 is queued to be checked
2011-12-15 15:28:08,339 [naive file wal consumer-35] INFO
durability.WALSource: end of file NaiveFileWALManager
(dir=/tmp/flume-cloudera/agent/dataSource )
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
hdfs.SeqfileEventSink: closed
/tmp/flume-cloudera/agent/dataSource/writing/20111215-152808335-0500.1137089135855.00000034
2011-12-15 15:28:18,421 [Roll-TriggerThread-1] INFO
endtoend.AckListener$Empty: Empty Ack Listener ended
20111215-152808335-0500.1137089135855.00000034
..
2011-12-15 15:35:24,763 [Heartbeat] INFO agent.WALAckManager:
Retransmitting 20111215-152707823-0500.1076576334855.00000034 after
being stale for 60277ms
2011-12-15 15:35:24,763 [Heartbeat] INFO
durability.NaiveFileWALManager: Attempt to retry chunk
'20111215-152707823-0500.1076576334855.00000034' in LOGGED state.
There is no need for state transition.
An unending flow of this at dataCollector:
localhost [INFO Thu Dec 15 15:31:09 EST 2011] {
AckChecksum : (long)1323981059821 (string) ' 4Ck��' (double)6.54133557402E-312 } { AckTag : 20111215-153059819-0500.1308572847855.00000034 } { AckType : end }
How do I get the console <-> console communication via collectors working again correctly?
Comments (2)
I'm not exactly sure what your expected behavior is.
But it looks like you may only be binding to the IPv6 interface. I know that in the Hadoop config you have to work around this, and you may need a similar option here. To start with, why not set the hostname and port number explicitly, and then back off each in turn?
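For example, a minimal sketch of that "explicit first, then back off" sequence, reusing the 35853 port from the User Guide example and the node names from the question:

Step 1, fully explicit (mirrors the User Guide example):
dataSource : console | agentSink("localhost",35853) ;
dataCollector : collectorSource(35853) | console ;

Step 2, drop the port on the sink only:
dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource(35853) | console ;

Step 3, drop the port on the source as well (the configuration from the question):
dataSource : console | agentSink("localhost") ;
dataCollector : collectorSource() | console ;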
export UOPTS=-Djava.net.preferIPv4Stack=true
=> you will only listen on IPv4 addresses.
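A sketch of how that could be applied here, assuming the flume wrapper script passes UOPTS through to the JVM (an assumption on my part); run this in each terminal before starting the node:

export UOPTS=-Djava.net.preferIPv4Stack=true
flume node_nowatch -n dataCollector    # and likewise for the dataSource node in its own terminal

Afterwards, sudo netstat -anp | grep 35853 should show the listener bound to an IPv4 address rather than the tcp6 wildcard shown above.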