Ruby or Python for heavy import scripts?

Posted 2024-10-10 01:18:15

I have an application I wrote in PHP (on symfony) that imports large CSV files (up to 100,000 lines). It has a real memory usage problem. Once it gets through about 15,000 rows, it grinds to a halt.

I know there are measures I could take within PHP but I'm kind of done with PHP, anyway.

If I wanted to write an app that imports CSV files, do you think there would be any significant difference between Ruby and Python? Is either one of them geared to more import-related tasks? I realize I'm asking a question based on very little information. Feel free to ask me to clarify things, or just speak really generally.

If it makes any difference, I really like Lisp and I would prefer the Lispier of the two languages, if possible.

Comments (4)

情话墙 2024-10-17 01:18:15

What are you importing the CSV file into? Couldn't you parse the CSV file in a way that doesn't load the whole thing into memory at once (i.e. work with one line at a time)?

If so, then you can use Python's standard csv library to do something like the following:

import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('csvfile.csv', newline='') as source:
    rdr = csv.reader(source)
    for row in rdr:
        print(row)  # do whatever with row

Now don't take this answer as an immediate reason to switch to Python. I'd be very surprised if PHP didn't have similar functionality in its CSV library.

一笔一画续写前缘 2024-10-17 01:18:15

What are you importing the CSV file into? Couldn't you parse the CSV file in a way that doesn't load the whole thing into memory at once (i.e. work with one line at a time)?

If so, then you can use Ruby's standard CSV library to do something like the following:

require 'csv'

CSV.foreach('csvfile.csv') do |row|
  # executes once for each row
  p row
end

Now don't take this answer as an immediate reason to switch to Ruby. I'd be very surprised if PHP didn't have similar functionality in its CSV library, so you should investigate PHP more thoroughly before deciding that you need to switch languages.

后来的我们 2024-10-17 01:18:15

The equivalent in Python (wait for it):

import csv

# newline="" lets the csv module handle line endings itself (Python 3)
with open("some.csv", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

This code does not load the entire CSV file into memory first; instead, it parses the file row by row with an iterator. I bet your problem is happening "after" each row is read, where you are somehow buffering the data (by storing it in a dictionary or an array of some sort).

When dealing with large data sets, you need to discard the data as fast as you can and buffer as little as possible. In the example above, print does just that: it performs an operation on the row but does not store any of it, so Python can free each row as soon as the loop moves on to the next one.
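
As a rough illustration of "buffer as little as possible", here is a minimal sketch that assumes the rows are headed for a database. The function name `import_csv`, the two-column `items` table, the header row, and the batch size are all hypothetical choices made for the example, not details from the question. Only one batch of rows is held in memory at any time:

```python
import csv
import sqlite3
from itertools import islice


def import_csv(path, conn, batch_size=1000):
    """Stream rows from a CSV file into a table, one batch at a time."""
    # Hypothetical two-column table, used only for illustration.
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, qty INTEGER)")
    total = 0
    with open(path, newline="") as source:
        reader = csv.reader(source)
        next(reader, None)  # skip the header row (assumes the file has one)
        while True:
            # Pull at most batch_size rows from the iterator.
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            conn.executemany("INSERT INTO items (name, qty) VALUES (?, ?)", batch)
            conn.commit()
            total += len(batch)  # the batch itself is dropped on the next pass
    return total
```

Calling `import_csv("csvfile.csv", sqlite3.connect("import.db"))` would return the number of rows inserted. Batching is a trade-off: larger batches mean fewer round trips to the database, smaller ones mean less memory held at once.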

I hope this helps.

半﹌身腐败 2024-10-17 01:18:15

I think the problem is that you are loading the whole CSV into memory at once. If that is the case, then I am sure Python or Ruby will blow up on you as well. I am a big fan of Python, but that is just a personal opinion.
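
One way to check whether loading everything at once is really the culprit is to compare peak memory for the two approaches. The sketch below uses Python's tracemalloc module on synthetic in-memory data; the helper names (`peak_bytes`, `load_all`, `stream`) and the sample data are made up for illustration, and the exact numbers will vary by machine and Python version:

```python
import csv
import io
import tracemalloc


def peak_bytes(func, text):
    """Return the peak memory traced while func(text) runs."""
    tracemalloc.start()
    func(text)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def load_all(text):
    # Buffers every parsed row in a list before touching any of them.
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows)


def stream(text):
    # Handles one row at a time and keeps none of them.
    count = 0
    for _row in csv.reader(io.StringIO(text)):
        count += 1
    return count


# Synthetic data standing in for a real file: 100,000 three-column rows.
sample = "".join(f"{i},name{i},{i * 2}\n" for i in range(100_000))
buffered = peak_bytes(load_all, sample)
streamed = peak_bytes(stream, sample)
print(f"buffered peak: {buffered:,} bytes; streamed peak: {streamed:,} bytes")
```

The buffered version should report a noticeably higher peak, since it keeps every row alive until the end.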
