Order of installing big data modules on Ubuntu
What is the order of installing Hadoop, Sqoop, Zookeeper, Spark, Java, Apache, Pig, Hive, Flume, Kafka, MySQL and other packages on Ubuntu?

3 Answers
Start with this https://www.digitalocean.com/community/tutorials/how-to-install-hadoop-in-stand-alone-mode-on-ubuntu-20-04 or https://phoenixnap.com/kb/install-hadoop-ubuntu
Forget Pig and Flume; they are no longer relevant.
Add Zookeeper if you are running a Hadoop cluster.
Then Spark, then Kafka, then MySQL, though the order among those three is not that important.
Everything you've mentioned, minus MySQL, requires Java, so start there.
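After installing a JDK, a quick sanity check can be scripted from Python; this is just a sketch that assumes `java` should be resolvable on the PATH:

```python
import shutil
import subprocess

# Every component below except MySQL runs on the JVM, so verify the JDK
# first. Note `java -version` prints its report to stderr.
if shutil.which("java") is None:
    raise SystemExit("java not found on PATH - install a JDK first")
subprocess.run(["java", "-version"], check=True)
```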
For high availability of HDFS or Kafka, you need Zookeeper. Zookeeper has no dependencies, so that's next. (3 servers minimum for a production cluster)
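Once the ensemble is up, a minimal liveness check with the kazoo client library might look like this; the three hostnames are placeholders for your production servers:

```python
from kazoo.client import KazooClient

# Connect to the ensemble and list the root znodes; a healthy quorum
# should return at least ['zookeeper'].
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start(timeout=10)
print(zk.get_children("/"))
zk.stop()
```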
Kafka can be set up next since it has no other dependencies. (Another 3 servers for high availability)
Hive requires a metastore, such as MySQL, so you'd then set up MySQL and run the Hive metastore schema queries on it. (at least 2 servers for read-write MySQL replication)
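After running the schema queries (Hive ships a schematool utility for this), you can confirm the metastore database is initialised; in this sketch the `metastore` database name and `hive` credentials are placeholder assumptions:

```python
import pymysql

# The Hive schema scripts create a VERSION table that records which
# metastore schema version was installed.
conn = pymysql.connect(host="mysql-primary", user="hive",
                       password="hive-password", database="metastore")
with conn.cursor() as cur:
    cur.execute("SELECT SCHEMA_VERSION FROM VERSION")
    print(cur.fetchone())
conn.close()
```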
HDFS can be next - multiple namenodes for high availability, datanodes, and YARN. (7 servers for 2 namenodes, 2 resource managers and 3 datanodes + nodemanagers)
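With the namenodes up, WebHDFS gives you an easy smoke test; this sketch uses the `hdfs` Python package and assumes the default Hadoop 3 namenode HTTP port, with a placeholder hostname:

```python
from hdfs import InsecureClient

# List the HDFS root over WebHDFS (9870 is the Hadoop 3 default port
# for the namenode web UI / REST endpoint).
client = InsecureClient("http://namenode1:9870", user="hdfs")
print(client.list("/"))
```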
Hive can optionally use HDFS, so that would be next, assuming you want to use it, and high availability for the HDFS namenodes is configured through Zookeeper. Presto or Spark are options that are faster than Hive and would also use the metastore. (2 HiveServers for high availability)
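Querying HiveServer2 from Python is one way to verify the whole chain (metastore, HDFS, HiveServer2); a sketch using the PyHive library, with placeholder host and user:

```python
from pyhive import hive

# Connect to one of the two HiveServer2 instances and list databases.
conn = hive.connect(host="hiveserver1", port=10000, username="analyst")
cur = conn.cursor()
cur.execute("SHOW DATABASES")
print(cur.fetchall())
conn.close()
```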
With YARN, HDFS, and Hive, you can set up Spark.
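For example, a PySpark session submitted against YARN with Hive support enabled; this assumes it runs on an edge node where HADOOP_CONF_DIR points at the cluster configuration:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() makes Spark read table definitions from the shared
# Hive metastore instead of a local Derby instance.
spark = (SparkSession.builder
         .appName("hive-on-yarn-check")
         .master("yarn")
         .enableHiveSupport()
         .getOrCreate())
spark.sql("SHOW DATABASES").show()
spark.stop()
```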
Flume would be next, but only if you actually needed it. Otherwise, code can be configured to write directly to Kafka.
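Writing directly to Kafka from application code is simple with a client library such as kafka-python; broker addresses and the topic name below are placeholders:

```python
from kafka import KafkaProducer

# Produce one JSON-encoded event; flush() blocks until the brokers
# have acknowledged it.
producer = KafkaProducer(bootstrap_servers=["kafka1:9092", "kafka2:9092"])
producer.send("app-events", b'{"event": "signup", "user": 42}')
producer.flush()
producer.close()
```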
Sqoop is a retired Apache project, and Spark can be used instead. Same for Pig.
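A Sqoop-style import expressed with Spark's JDBC source could look like this; the table, credentials and output path are placeholders, and the MySQL JDBC driver jar must be on Spark's classpath:

```python
from pyspark.sql import SparkSession

# Pull a MySQL table into the cluster and land it on HDFS as Parquet -
# the same job Sqoop used to do.
spark = SparkSession.builder.appName("mysql-import").getOrCreate()
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://mysql-primary:3306/sales")
      .option("dbtable", "orders")
      .option("user", "etl")
      .option("password", "etl-password")
      .load())
df.write.mode("overwrite").parquet("hdfs:///warehouse/orders")
```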
In total, a minimal production-ready Hadoop cluster with Kafka and MySQL would require at least 17 servers. If you add load balancers and LDAP/Active Directory, then add more.
Just install CDH (Cloudera) or Ambari on Ubuntu to get all of the Hadoop ecosystem modules you mentioned, then install MySQL and Kafka separately to work with them.