Writing a very large dataset out to a text file
I have a very large dataset which I am currently writing out to a text file (IO). It is very slow and causes the system to chew up a lot of resources, as there are tens of thousands of rows.
I'm wondering if anybody can recommend a good way to do this that reduces the load on my system, or at least smooths out the process to avoid big spikes in demand for memory and other resources. I don't mind if it takes longer, as long as it isn't putting too much load on the machine.
Your question hardly makes sense, but assuming you are reading the results from the database in chunks, you could write them to the file in chunks as well, to avoid loading the entire dataset into memory.
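A minimal sketch of that chunked approach, assuming a SQL Server source streamed through SqlDataReader (the connection string, table, and column names are placeholders):

```csharp
using System.Data.SqlClient;
using System.IO;

class ChunkedExport
{
    static void Main()
    {
        const string connString = "Server=.;Database=MyDb;Integrated Security=true;";

        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand("SELECT Id, Name FROM BigTable", conn))
        using (var writer = new StreamWriter("output.txt"))
        {
            conn.Open();

            // SqlDataReader streams rows from the server one at a time,
            // so the full result set is never materialized in memory.
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    writer.WriteLine("{0}\t{1}", reader.GetInt32(0), reader.GetString(1));
            }
        }
    }
}
```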
In terms of memory consumption this will Rock'N'Roll and in terms of performance it would of course depend on the optimization of your SQL query and the capabilities of your SQL server.
If the system does not depend on that, you can spawn a thread to do the actual writing and try to batch/buffer it in order to minimize CPU/memory spikes. It would depend on your particular case, and you don't give much information :)
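For instance, a rough sketch using a bounded producer/consumer queue, assuming .NET 4's BlockingCollection (the queue size, row count, and file name are arbitrary):

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class BackgroundWriter
{
    static void Main()
    {
        // The bounded capacity caps memory use: Add() blocks once the
        // queue is full, throttling the producer instead of spiking RAM.
        var queue = new BlockingCollection<string>(boundedCapacity: 10000);

        // Writer thread drains the queue at its own pace.
        var writerTask = Task.Factory.StartNew(() =>
        {
            using (var writer = new StreamWriter("output.txt"))
            {
                foreach (var line in queue.GetConsumingEnumerable())
                    writer.WriteLine(line);
            }
        }, TaskCreationOptions.LongRunning);

        // Producer: fetch or generate rows and enqueue them.
        for (int i = 0; i < 100000; i++)
            queue.Add("row " + i);

        queue.CompleteAdding();  // tell the writer no more rows are coming
        writerTask.Wait();
    }
}
```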
Use a StreamWriter to write the file. I recently had to write a 3 million line file and it seemed to work very well. Make sure you are also reading the large amount of data in as a stream.
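For illustration, a minimal sketch that streams on both ends, so only one line is held in memory at a time (file names are placeholders):

```csharp
using System.IO;

class StreamedCopy
{
    static void Main()
    {
        using (var reader = new StreamReader("input.txt"))
        using (var writer = new StreamWriter("output.txt"))
        {
            // Read and write line by line rather than loading the
            // whole input into memory first.
            string line;
            while ((line = reader.ReadLine()) != null)
                writer.WriteLine(line);
        }
    }
}
```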
In that case you shouldn't load the whole dataset into memory. Considering I use NHibernate as my ORM, for such cases I would read from the DB in small batches, e.g. 100 rows at a time per transaction. This way, at any given moment, my memory holds only 100 rows of data rather than 100,000: write the 100 rows to the file, then read the next 100 rows from the database and write those, and so on.
Look into paging.
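A sketch of what that paging could look like with NHibernate's criteria API; the Row entity, its properties, and the file name are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using NHibernate;

public class Row
{
    public virtual int Id { get; set; }
    public virtual string Text { get; set; }
}

public static class PagedExport
{
    const int PageSize = 100;

    public static void Export(ISessionFactory factory)
    {
        using (var writer = new StreamWriter("output.txt"))
        {
            for (int page = 0; ; page++)
            {
                // One short transaction per page; memory never holds
                // more than PageSize rows at a time.
                using (var session = factory.OpenSession())
                using (var tx = session.BeginTransaction())
                {
                    IList<Row> rows = session.CreateCriteria<Row>()
                        .SetFirstResult(page * PageSize)
                        .SetMaxResults(PageSize)
                        .List<Row>();

                    foreach (var row in rows)
                        writer.WriteLine(row.Text);

                    tx.Commit();

                    if (rows.Count < PageSize)
                        return;  // last (partial) page written
                }
            }
        }
    }
}
```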
One solution for writing to the file is to use log4Net.
It's effective and doesn't consume too many resources.
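A minimal sketch of that approach, assuming the log4net package and configuring a FileAppender programmatically (the file name and layout pattern are placeholders):

```csharp
using log4net;
using log4net.Appender;
using log4net.Config;
using log4net.Layout;

class Log4NetExport
{
    static void Main()
    {
        // Write bare messages, one per line, to the output file.
        var layout = new PatternLayout("%message%newline");
        layout.ActivateOptions();

        var appender = new FileAppender
        {
            File = "output.txt",
            AppendToFile = true,
            Layout = layout
        };
        appender.ActivateOptions();
        BasicConfigurator.Configure(appender);

        // log4net handles the buffering and file I/O internally.
        ILog log = LogManager.GetLogger(typeof(Log4NetExport));
        for (int i = 0; i < 100000; i++)
            log.Info("row " + i);
    }
}
```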