What is the most efficient way to load data from files into a collection on demand?
I'm working on a Java project that will allow users to parse multiple files, each potentially thousands of lines long. The parsed information will be stored in different objects, which are then added to a collection.
Since the GUI doesn't need to load ALL these objects at once and keep them in memory, I'm looking for an efficient way to load/unload data from the files, so that data is only loaded into the collection when a user requests it.
I'm just evaluating options right now. I've also thought about the case where, after a subset of the data has been loaded into the collection and presented in the GUI, the user goes back to previously viewed data: what is the best way to reload it? Re-run the parser, repopulate the collection, and repopulate the GUI? Or find a way to keep the collection in memory, or serialize/deserialize the collection itself?
I know that loading/unloading subsets of data can get tricky if some sort of data filtering is performed. Let's say that I filter on ID, so my new subset will contain data from two previously analyzed subsets. This would be no problem if I kept a master copy of the whole data in memory.
I've read that google-collections is good and efficient when handling large amounts of data and offers methods that simplify a lot of things, so it might be an alternative that lets me keep the collection in memory. This is just general talk; the question of which collection to use is a separate and complex one.
Do you know what the general recommendation is for this type of task? I'd like to hear what you've done in similar scenarios.
I can provide more specifics if needed.
Comments (2)
You can embed a database into the application, such as HSQLDB. That way you parse the files only once, the first time, and then use SQL to run both simple and complex queries.
If you have tons of data, lots of files, and you are short on memory, you can do an initial scan of each file to index it. If the file is divided into records by line feeds and you know how to read a record, you can index your records by byte location. Later, when you want to read a certain set of indices, you do a fast lookup to find which byte ranges you need and read just those from the file's InputStream. When you don't need those items anymore, they will be garbage collected. You will never hold more items on the heap than you need.
This would be a simple solution. I'm sure you can find a library to provide you with more features.
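To make the byte-offset idea concrete, here is a minimal sketch, assuming one record per line and UTF-8 text. The class name `LineOffsetIndex` and its methods are made up for illustration, and it uses `RandomAccessFile.seek` in place of a plain InputStream so it can jump straight to a record; treat it as one possible shape, not a finished implementation.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps record number to the byte offset where its line starts. */
final class LineOffsetIndex {
    private final Path file;
    private final List<Long> offsets = new ArrayList<>(); // start offset of each record
    private long fileLength;

    private LineOffsetIndex(Path file) {
        this.file = file;
    }

    /** One sequential pass over the file, remembering where each line begins. */
    static LineOffsetIndex build(Path file) throws IOException {
        LineOffsetIndex index = new LineOffsetIndex(file);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    index.offsets.add(pos);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                pos++;
            }
            index.fileLength = pos;
        }
        return index;
    }

    /** Number of records (lines) found during the scan. */
    int size() {
        return offsets.size();
    }

    /** Reads only the bytes of record i (0-based) and decodes them as UTF-8. */
    String read(int i) throws IOException {
        long start = offsets.get(i);
        long end = (i + 1 < offsets.size()) ? offsets.get(i + 1) : fileLength;
        byte[] buf = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(buf);
        }
        // Drop the trailing line terminator (\n or \r\n) before decoding.
        int len = buf.length;
        while (len > 0 && (buf[len - 1] == '\n' || buf[len - 1] == '\r')) {
            len--;
        }
        return new String(buf, 0, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        Files.write(tmp, "id=1;name=a\nid=2;name=b\nid=3;name=c\n".getBytes(StandardCharsets.UTF_8));
        LineOffsetIndex index = LineOffsetIndex.build(tmp);
        System.out.println(index.size() + " records; record 1 = " + index.read(1));
        Files.delete(tmp);
    }
}
```

Only the list of offsets stays in memory (one `Long` per record). The parsed objects are created when `read` is called and become eligible for garbage collection as soon as the GUI drops them. Opening the file on every `read` keeps the sketch short; a real version would probably keep one handle open and read ranges of adjacent records in a single call.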