Unfortunately, it seems to me that the reigning language is MATLAB. I say unfortunately because I neither like nor use this language; I'm much more likely to program in C++/Java. But the data miners and machine learning people around me tend to stick to MATLAB.
I have no experience with Java or Hadoop, but I have used both Python and MATLAB for machine learning stuff, and I use MATLAB more often now. The important factors in my case are as follows:
Almost all of my colleagues use MATLAB and C++, and very few of them use Python. Their Python usage is limited to general scripting, not machine learning in particular. So when I use Python, the only way to get help is the web, and we have trouble sharing code within the lab.
MATLAB's IDE and its extensive documentation make it powerful for my case.
You can handle large data sets in MATLAB (link 1, link 2).
There are many machine learning/data mining libraries written in MATLAB, and most of the libraries written in C++/Java have MATLAB wrappers.
Some points are also true for Python. But as I mentioned, the community I work in plays an important role in deciding the language.
R is an excellent candidate for data mining (certainly) and machine learning as well.
(Generalizations, of course.)
Java and Hadoop are really meaningful in the context of seriously big data and/or scaling requirements. Java gives you the libraries and an army of programmers. Hadoop gives you fairly painless distribution and a growing knowledge base on mapping various algorithms to the framework.
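To give a feel for what "mapping an algorithm to the framework" means, here is a minimal sketch in plain Python that computes a per-key mean in map/shuffle/reduce style. It does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `mean_by_key`) are made up for illustration, and a real job would run the phases on separate machines.

```python
from collections import defaultdict


def map_phase(records):
    """Emit (key, (partial_sum, partial_count)) for each input record."""
    for key, value in records:
        yield key, (value, 1)


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the partial sums and counts for each key into a mean."""
    means = {}
    for key, partials in grouped.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        means[key] = total / count
    return means


def mean_by_key(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))
```

The point of emitting (sum, count) pairs instead of raw values is that partial results can be combined in any order, which is roughly what lets this kind of computation be distributed.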
Python seems to have the academics on its side, especially recent graduates who are now active and influential in professional practice. Also, if you just want to try out stuff, an expressive dynamic language like Python will obviously prove to be quite useful.
Then there is R. (There is a lot more, but this is the extent of my knowledge /g/)
I think besides the obvious focus on data that R brings to the table (and thus a community of data geeks to help out with the science part as well), it is a delightfully lightweight system and not too shabby at all in terms of libraries as well.
That said, one would think the (~) functional languages (Scala, Clojure on JVM; Haskell, etc.) would be quite a good fit for manipulating data and working on huge datasets.
I think the most popular combination in this field is Java/Hadoop. When job postings also require Python/Perl/Ruby, it usually means the company is migrating from those scripting languages (usually its main languages until then) to Java as it moves from a startup code base to an enterprise one. Also, in real-world data mining applications, Python is frequently used for prototyping and small data processing tasks.
Python is gaining in popularity, has a lot of libraries, and is very useful for prototyping. I find it difficult to deploy, though, because of the many versions of Python and its dependencies on C libraries.
R is also very popular, has a lot of libraries, and was designed for data science. However, the underlying language design tends to make things overcomplicated.
Personally, I prefer Clojure because it has great data manipulation support and can interoperate with the Java ecosystem. The downside currently is that there aren't too many data science libraries yet!
Edit: I've just read a really interesting line on Wikipedia's page on R: