A suitable Java data structure for parsing a large data file

Posted on 2024-12-18 21:13:19

I have a rather large text file (~4 million lines) that I'd like to parse, and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:

Date        Time    Value
2011-11-30  09:00   10
2011-11-30  09:15   5
2011-12-01  12:42   14
2011-12-01  19:58   19
2011-12-01  02:03   12

I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> to map each date to the rest of its lines. But is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to avoid so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.

I'm using a TreeMap because I want to iterate the keys in date order.

Comments (3)

喜你已久 2024-12-25 21:13:19

There's nothing wrong with using a List as the value for a Map. All of those <> look ugly, but it's perfectly fine to put a generics class inside of a generics class.

Instead of using a String as the key, it would probably be better to use java.util.Date, because the keys are dates. The TreeMap will then order the keys chronologically. If you store the dates as Strings, they are sorted as text, not as "real" dates: the yyyy-MM-dd format in your sample happens to sort correctly as text, but a format such as dd/MM/yyyy would not.

Map<Date, List<String>> map = new TreeMap<Date, List<String>>();
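
To make the grouping concrete, here is a minimal sketch of filling such a map while reading the file. It rests on a few assumptions that are not in the question: it uses java.time.LocalDate (Java 8+) in place of java.util.Date, the class name GroupByDate and the method group are made up for illustration, and the header line is assumed to start with "Date".

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named for this example only.
class GroupByDate {

    // Groups "date time value" lines by date. Keys iterate in date order
    // because TreeMap keeps them sorted by LocalDate's natural ordering.
    static Map<LocalDate, List<String>> group(Iterable<String> lines) {
        Map<LocalDate, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("Date")) {
                continue; // skip blank lines and the assumed column header
            }
            // Split once on the first run of whitespace: date, then the rest.
            String[] parts = trimmed.split("\\s+", 2);
            LocalDate date = LocalDate.parse(parts[0]); // expects yyyy-MM-dd
            String rest = parts.length > 1 ? parts[1] : "";
            // Create the list on first sight of a date, then append to it.
            byDate.computeIfAbsent(date, d -> new ArrayList<>()).add(rest);
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Date        Time    Value",
                "2011-11-30  09:00   10",
                "2011-11-30  09:15   5",
                "2011-12-01  12:42   14",
                "2011-12-01  19:58   19",
                "2011-12-01  02:03   12");
        group(sample).forEach((date, rest) -> System.out.println(date + " -> " + rest));
    }
}
```

For the real file, the same method could be fed line by line from a BufferedReader, so the raw text never has to be held in memory all at once.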

久伴你 2024-12-25 21:13:19

is a TreeMap of Lists a ridiculous thing to do?

Conceptually no, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more, which may or may not be acceptable, depending on how much memory you have to waste.

For a more memory-efficient solution, create a class that has fields for every column (including a Date), put all those in a List and sort it (ideally using quicksort) when you're done reading.
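
A rough sketch of that approach, under some assumptions of mine: the names Reading and SortedReadings are hypothetical, the fields use java.time types (Java 8+), and the header line is assumed to have been dropped before parsing. Note that List.sort on objects uses a stable merge-sort variant rather than quicksort, so the sort here is a substitution for what the answer suggests.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type: one field per column of the file.
final class Reading {
    final LocalDate date;
    final LocalTime time;
    final int value;

    Reading(LocalDate date, LocalTime time, int value) {
        this.date = date;
        this.time = time;
        this.value = value;
    }

    // Parses a data line such as "2011-11-30  09:00   10".
    static Reading parse(String line) {
        String[] cols = line.trim().split("\\s+");
        return new Reading(
                LocalDate.parse(cols[0]),   // yyyy-MM-dd
                LocalTime.parse(cols[1]),   // HH:mm
                Integer.parseInt(cols[2]));
    }
}

class SortedReadings {

    // Reads every data line into one flat list, then sorts once at the end.
    static List<Reading> load(Iterable<String> dataLines) {
        List<Reading> readings = new ArrayList<>();
        for (String line : dataLines) {
            readings.add(Reading.parse(line));
        }
        // Order by date first, then by time within the same date.
        readings.sort(Comparator.comparing((Reading r) -> r.date)
                .thenComparing(r -> r.time));
        return readings;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2011-12-01  19:58   19",
                "2011-11-30  09:15   5",
                "2011-12-01  02:03   12",
                "2011-11-30  09:00   10");
        for (Reading r : load(sample)) {
            System.out.println(r.date + " " + r.time + " " + r.value);
        }
    }
}
```

After sorting, rows with the same date are adjacent, so a single pass over the list is enough to process the data one day at a time without any Map.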

栩栩如生 2024-12-25 21:13:19

There is no objection to using Lists, though in your case a List<Integer> as the values of the Map may be more appropriate.
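
One way this could look, sketched with the streams API: the class name ValuesByDate is hypothetical, LocalDate keys are my assumption, and the header line is assumed to be already removed. Keeping only the integer column means the time of each reading is discarded, which may or may not suit the use case.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper class, named for this example only.
class ValuesByDate {

    // Maps each date to the list of its integer values, keys in date order.
    static TreeMap<LocalDate, List<Integer>> group(List<String> dataLines) {
        return dataLines.stream()
                .map(line -> line.trim().split("\\s+")) // [date, time, value]
                .collect(Collectors.groupingBy(
                        cols -> LocalDate.parse(cols[0]),
                        TreeMap::new,
                        Collectors.mapping(cols -> Integer.parseInt(cols[2]),
                                Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2011-11-30  09:00   10",
                "2011-11-30  09:15   5",
                "2011-12-01  12:42   14");
        System.out.println(group(sample));
    }
}
```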
