Importing and processing data from a CSV file in Delphi
I had a pre-interview task, which I completed, and the solution works; however, I was marked down and did not get an interview due to having used a TADODataset. I basically imported a CSV file, which populated the dataset; the data had to be processed in a specific way, so I used filtering and sorting on the dataset to make sure the data was ordered the way I wanted, and then I did the logic processing in a while loop. The feedback I received said this was bad because it would be very slow for large files.
My main question here is: if using an in-memory dataset is slow for processing large files, what would have been a better way to access the information from the CSV file? Should I have used String Lists or something like that?
Comments (2)
It really depends on how "big" the task is and on the available resources (in this case, RAM).
"The feedback that was received said that this was bad as it would be very slow for large files."
CSV files are usually used for moving data around (in most cases the files I've encountered are ~1MB up to ~10MB, but that's not to say others won't dump far more data in CSV format) without worrying too much (if at all) about import/export, since the format is extremely simple.
Suppose you have an 80MB CSV file. Now that's a file you want to process in chunks; otherwise (depending on your processing) you can eat up hundreds of MB of RAM. In this case, what I would do is read and process the file in fixed-size batches, as in the sketch below.
With that approach, you're not loading 80MB of data into RAM, but only a few hundred KB at a time, and the rest of the memory is free for the processing itself, i.e. linked lists, dynamic insert queries (batch inserts), etc.
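A minimal sketch of that batch approach, assuming Delphi 2009 or later (for TStreamReader); the BatchSize constant and the ProcessBatch stub are hypothetical placeholders for whatever the real processing is:

    uses
      System.SysUtils, System.Classes;

    const
      BatchSize = 5000; // lines held in memory at any one time

    procedure ProcessBatch(Batch: TStringList);
    begin
      // hypothetical stub: parse each line, build batch inserts, etc.
    end;

    procedure ProcessCsvInChunks(const FileName: string);
    var
      Reader: TStreamReader;
      Batch: TStringList;
    begin
      Reader := TStreamReader.Create(FileName, TEncoding.UTF8);
      Batch := TStringList.Create;
      try
        while not Reader.EndOfStream do
        begin
          Batch.Add(Reader.ReadLine);
          if Batch.Count = BatchSize then
          begin
            ProcessBatch(Batch); // only this batch is ever in RAM
            Batch.Clear;
          end;
        end;
        if Batch.Count > 0 then
          ProcessBatch(Batch); // final partial batch
      finally
        Batch.Free;
        Reader.Free;
      end;
    end;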
"...however I was marked down and did not get an interview due to having used a TADODataset."
I'm not surprised; they were probably looking to see whether you were capable of creating algorithms and providing simple solutions on the spot, without using "ready-made" solutions.
They were probably hoping to see you use dynamic arrays and write one (or more) sorting algorithms yourself, along the lines of the sketch below.
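For illustration, a minimal sketch of that style: a dynamic array of records plus a hand-written QuickSort, with no ready-made components. The TCsvRow fields and the Amount sort key are made up for the example:

    type
      TCsvRow = record
        Name: string;
        Amount: Double;
      end;
      TCsvRows = array of TCsvRow;

    // classic in-place QuickSort over the (made-up) Amount field
    procedure QuickSortRows(var Rows: TCsvRows; L, R: Integer);
    var
      I, J: Integer;
      Pivot: Double;
      Tmp: TCsvRow;
    begin
      if R <= L then
        Exit;
      I := L;
      J := R;
      Pivot := Rows[(L + R) div 2].Amount;
      repeat
        while Rows[I].Amount < Pivot do Inc(I);
        while Rows[J].Amount > Pivot do Dec(J);
        if I <= J then
        begin
          Tmp := Rows[I]; Rows[I] := Rows[J]; Rows[J] := Tmp;
          Inc(I);
          Dec(J);
        end;
      until I > J;
      QuickSortRows(Rows, L, J);
      QuickSortRows(Rows, I, R);
    end;

It would be called as QuickSortRows(Rows, 0, High(Rows)) once the array has been filled from the parsed CSV lines.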
"Should I have used String Lists or something like that?"
The response might have been the same; again, I think they wanted to see how you "work".
The interviewer was quite right.
The correct, scalable and fastest solution for any medium-sized file and upwards is to use an 'external sort'.
An 'external sort' is a two-stage process: the first stage splits the file into smaller files, each of which is sorted and small enough to handle in memory; the second stage merges these files back into a single sorted file, which can then be processed line by line.
It is extremely efficient on any CSV file with more than, say, 200,000 lines. The amount of memory the process runs in can be controlled, which eliminates the danger of running out of memory.
I have implemented many such sort processes, and in Delphi I would recommend a combination of the TStringList, TList and TQueue classes. A sketch of the idea follows.
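To make the two stages concrete, here is a minimal sketch that assumes, for simplicity, that the entire line is the sort key. The run file names are made up, and a production version would replace the merge's linear scan with a priority queue (which is where TList/TQueue come in):

    uses
      System.SysUtils, System.Classes;

    const
      RunSize = 100000; // lines per run; this bounds memory usage

    // Stage 1: split the big file into sorted "run" files and
    // return the number of runs written.
    function SplitIntoRuns(const FileName: string): Integer;
    var
      Reader: TStreamReader;
      Run: TStringList;
    begin
      Result := 0;
      Reader := TStreamReader.Create(FileName, TEncoding.UTF8);
      Run := TStringList.Create;
      try
        Run.CaseSensitive := True; // sort via AnsiCompareStr
        while not Reader.EndOfStream do
        begin
          Run.Add(Reader.ReadLine);
          if (Run.Count = RunSize) or Reader.EndOfStream then
          begin
            Run.Sort; // in-memory sort of one manageable chunk
            Run.SaveToFile(Format('run%d.txt', [Result]), TEncoding.UTF8);
            Run.Clear;
            Inc(Result);
          end;
        end;
      finally
        Run.Free;
        Reader.Free;
      end;
    end;

    // Stage 2: merge the sorted runs into one sorted output file.
    procedure MergeRuns(RunCount: Integer; const OutName: string);
    var
      Readers: array of TStreamReader;
      Current: array of string;
      Live: array of Boolean;
      Writer: TStreamWriter;
      I, MinIdx: Integer;
    begin
      SetLength(Readers, RunCount);
      SetLength(Current, RunCount);
      SetLength(Live, RunCount);
      Writer := TStreamWriter.Create(OutName, False, TEncoding.UTF8);
      try
        for I := 0 to RunCount - 1 do
        begin
          Readers[I] := TStreamReader.Create(Format('run%d.txt', [I]), TEncoding.UTF8);
          Live[I] := not Readers[I].EndOfStream;
          if Live[I] then
            Current[I] := Readers[I].ReadLine;
        end;
        repeat
          // pick the smallest current line among the live runs
          MinIdx := -1;
          for I := 0 to RunCount - 1 do
            if Live[I] and ((MinIdx = -1) or (AnsiCompareStr(Current[I], Current[MinIdx]) < 0)) then
              MinIdx := I;
          if MinIdx = -1 then
            Break; // all runs exhausted
          Writer.WriteLine(Current[MinIdx]);
          Live[MinIdx] := not Readers[MinIdx].EndOfStream;
          if Live[MinIdx] then
            Current[MinIdx] := Readers[MinIdx].ReadLine;
        until False;
      finally
        Writer.Free;
        for I := 0 to RunCount - 1 do
          Readers[I].Free;
      end;
    end;

After SplitIntoRuns('data.csv') returns N runs, MergeRuns(N, 'sorted.csv') produces a single sorted file that can then be processed line by line.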
Good Luck