It is very desirable to know Java. Hadoop is written in Java, and its popular SequenceFile format depends on Java.
Even if you use Hive or Pig, you'll probably need to write your own UDF someday. Some people write them in other languages, but as far as I can tell Java has the most robust, first-class support for them.
Many Hadoop tools (Sqoop, HCatalog, and so on) are still immature, so you'll see plenty of Java stack traces, and you'll probably want to dig into the source code someday.
Answer 2
It is not required for you to know Java.
As the others said, it would be very helpful, depending on how complex your processing is. However, there is an incredible amount you can do with just Pig and, say, Hive.
I agree that you will fairly likely need to write a user-defined function (UDF) eventually. However, I've written those in Python, and it is very easy to do.
Granted, if you have very stringent performance requirements, a Java-based MapReduce program would be the way to go. However, great advances in performance are being made all the time in both Pig and Hive.
So the short answer to your question is no: you do not need to know Java to do Hadoop development.
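To give a feel for the Python UDF route mentioned above, here is a minimal sketch of a Pig UDF written in Python. The function name `email_domain` and the e-mail use case are hypothetical, chosen only for illustration. The sketch assumes Pig supplies an `outputSchema` decorator when it loads the script through its Jython support; the fallback definition (including the attribute it sets) is a stand-in so the function can be tried outside Pig, not Pig's own implementation.

```python
try:
    outputSchema  # assumed to be injected by Pig when the script is registered
except NameError:
    def outputSchema(schema):
        """Stand-in used outside Pig: records the schema and returns the function."""
        def decorate(func):
            func.outputSchema = schema
            return func
        return decorate


@outputSchema("domain:chararray")
def email_domain(address):
    """Return the lower-cased domain of an e-mail address, or None if malformed."""
    if address is None or address.count("@") != 1:
        return None
    local, domain = address.split("@")
    if not local or not domain:
        return None
    return domain.lower()


if __name__ == "__main__":
    # Quick local check with made-up sample values
    for sample in ["Alice@Example.COM", "not-an-address", None]:
        print(email_domain(sample))
```

In a Pig script, a file like this (say, a hypothetical `udfs.py`) would typically be pulled in with a `REGISTER 'udfs.py' USING jython AS udfs;` statement and then called as `udfs.email_domain(...)`. Check the Pig documentation for your version before relying on the details.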
It sounds like you are on the right track. I recommend setting up some virtual machines on your home computer so you can take what you see in the books and implement it in your VMs. As with many things, the only way to get better at something is to practice it. Once you get into it, I am sure you will have enough knowledge to start a small project that uses Hadoop. For examples of things people have built with Hadoop, see the "Powered by Hadoop" page.
Go through the Yahoo Hadoop tutorial before reading Hadoop: The Definitive Guide. The Yahoo tutorial gives you a very clean and easy understanding of the architecture.
I think the concepts are not arranged well in the book, which makes it a little difficult to study from.
So do not study them together. Go through the web tutorial first.
Feel free to visit my blog about Big Data: https://oyermolenko.blog. I've been working with Hadoop for a couple of years, and in this blog I want to share my experience from the very beginning. I came from a .NET environment and faced a couple of challenges in switching from one language to another. The blog is aimed at people who haven't worked with Hadoop but, like you, have some basic technical background. Step by step, I want to cover the whole family of Big Data services and describe the concepts and common problems I ran into while working with them. I hope you enjoy it.
Here are some nice YouTube videos on MapReduce:
http://www.youtube.com/watch?v=yjPBkvYh-ss
http://www.youtube.com/watch?v=-vD6PUdf3Js
http://www.youtube.com/watch?v=5Eib_H_zCEY
http://www.youtube.com/watch?v=1ZDybXl212Q
http://www.youtube.com/watch?v=BT-piFBP4fE
Also, here are some nice tutorials on how to set up Hadoop on Ubuntu:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
You can access Hadoop from many different languages, and a number of services will set up Hadoop for you. You could try Amazon's Elastic MapReduce (EMR), for instance, without going through the hassle of configuring servers, workers, and so on. This is a good way to get your head around MapReduce processing while postponing the work of learning how to use HDFS well, how to manage your scheduler, etc.
It's not hard to search for your favorite language and find Hadoop APIs for it, or at least some tutorials on linking it with Hadoop. For instance, here's a walkthrough of a PHP app run on Hadoop: http://www.lunchpauze.com/2007/10/writing-hadoop-mapreduce-program-in-php.html
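One common way to use a non-Java language is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. Below is a rough word-count sketch of that idea in Python. The function names are my own, and `run_locally` is only a stand-in for the framework's sort-and-shuffle step so the logic can be tried without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def reducer(sorted_records):
    """Sum the counts for each word; assumes records arrive sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


def run_locally(lines):
    """Stand-in for the framework: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Made-up sample input for a local dry run
    for record in run_locally(["Hadoop is simple", "hadoop is fun"]):
        print(record)
```

On a real cluster, the mapper and reducer would live in separate scripts that loop over `sys.stdin`, and they would be passed to the Hadoop Streaming jar through its `-mapper`, `-reducer`, `-input`, and `-output` options. The jar location depends on your installation.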
1) Learn Java. No way around that, sorry.
2) Profit! It'll be very easy after that -- Hadoop is pretty darn simple.
I just put together a paper on this topic. Great resources above, but I think you'll find some additional pointers here: http://images.globalknowledge.com/wwwimages/whitepaperpdf/WP_CL_Learning_Hadoop.pdf