Java or Ruby: does it make a difference?
I'm writing a large-scale financial application, mostly in Java. To get some data, I need to write a small script (<200 LOC) that downloads CSV files (over 20,000 of them) and stores them to disk. I need this to be fast, but a few minutes either way doesn't matter to me. I was planning to write it in Java, which isn't very hard, but I would be done a lot faster if I wrote it in Ruby, so I was wondering whether there would be a large difference in speed between Ruby (or JRuby) and Java. The 20,000 files are each about half a megabyte, and the server I'm downloading from isn't keen to give away data (it's completely legal, don't worry about that), so my application has to sleep for a random interval between requests and, if the website denies a request, sleep for 3 minutes.
Recommendations for other easier-than-Java languages are welcome.
Comments (2)
Use whatever makes you comfortable. Language implementation speed probably won't be an issue here; network speed and the sleeps you have to put in will be the bottleneck anyway.
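To put rough numbers on that, here is a back-of-the-envelope estimate in Python. The file count and size come from the question; the bandwidth and the average pause are purely assumed values for illustration, so substitute your own.

```python
FILES = 20_000          # from the question
FILE_MB = 0.5           # from the question (about half a megabyte each)
BANDWIDTH_MB_S = 5.0    # ASSUMPTION: sustained download speed
MEAN_PAUSE_S = 3.0      # ASSUMPTION: average random sleep between requests

transfer_s = FILES * FILE_MB / BANDWIDTH_MB_S   # time spent moving bytes
pause_s = FILES * MEAN_PAUSE_S                  # time spent sleeping

print(f"transfer: {transfer_s / 3600:.1f} h, pauses: {pause_s / 3600:.1f} h")
# prints "transfer: 0.6 h, pauses: 16.7 h"
```

Under these assumptions the script spends almost all of its time asleep or waiting on the network, so the per-request overhead of the language runtime barely registers in the total.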
Sounds like your app will be I/O bound, so the speed of the language is not terribly important.
In a language like Ruby or Python, I would expect this to be more like 20 LOC or less. And since your request rate is limited, there is no point in using simultaneous connections to try to speed things up.
If you have a bunch of machines with different IP addresses (or one machine with several external addresses), you could split the job across them to speed things up, since rate limiting is usually done per IP address.
Where do your URLs come from?
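As a rough illustration of how short such a script can be, here is a minimal Python sketch using only the standard library. Several things in it are assumptions rather than anything stated in the question: the URLs arrive as a plain list, every HTTP error is treated as "request denied", files are named by their index, and the pause bounds and retry limit are placeholders. The `fetch` and `sleep` parameters exist only so the loop can be exercised without touching the network.

```python
import random
import time
import urllib.error
import urllib.request
from pathlib import Path

DENIED_PAUSE = 180  # seconds: the 3-minute back-off from the question


def fetch_url(url):
    """Return the response body as bytes; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(urls, dest_dir, fetch=fetch_url, sleep=time.sleep,
                 min_pause=1.0, max_pause=5.0, max_retries=5):
    """Download each URL sequentially, pausing randomly between requests."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        target = dest / f"{i:05d}.csv"  # hypothetical naming scheme
        if target.exists():             # lets an interrupted run resume
            continue
        for _ in range(max_retries):
            try:
                data = fetch(url)
                break
            except urllib.error.HTTPError:
                # ASSUMPTION: any HTTP error means the server denied us
                sleep(DENIED_PAUSE)
        else:
            raise RuntimeError(f"giving up on {url}")
        target.write_bytes(data)
        sleep(random.uniform(min_pause, max_pause))
```

Calling `download_all(urls, "csv_out")` with a list of URLs would run the whole job one request at a time, which matches the point above that simultaneous connections buy nothing under a rate limit.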