I have a rather large text file (~4 million lines) that I'd like to parse, and I'm looking for advice about a suitable data structure in which to store the data. The file contains lines like the following:
Date Time Value
2011-11-30 09:00 10
2011-11-30 09:15 5
2011-12-01 12:42 14
2011-12-01 19:58 19
2011-12-01 02:03 12
I want to group the lines by date, so my initial thought was to use a TreeMap<String, List<String>> to map the date to the rest of the line, but is a TreeMap of Lists a ridiculous thing to do? I suppose I could replace the String key with a date object (to eliminate so many string comparisons), but it's the List as a value that I'm worried might be unsuitable.

I'm using a TreeMap because I want to iterate the keys in date order.
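A minimal sketch of the grouping described in the question. It assumes well-formed lines of the form "date time value" separated by single spaces, with the header line already skipped; the class name GroupLines and the method group are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupLines {
    // Groups "date time value" lines by their date string; the rest of the line is kept as the value.
    // Assumes every line contains at least one space (hypothetical well-formed input).
    static TreeMap<String, List<String>> group(List<String> lines) {
        TreeMap<String, List<String>> byDate = new TreeMap<>();
        for (String line : lines) {
            // Split at the first space: the date, then "time value".
            int space = line.indexOf(' ');
            String date = line.substring(0, space);
            String rest = line.substring(space + 1);
            // computeIfAbsent creates the per-date list the first time a date is seen.
            byDate.computeIfAbsent(date, k -> new ArrayList<>()).add(rest);
        }
        return byDate;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "2011-11-30 09:00 10",
            "2011-11-30 09:15 5",
            "2011-12-01 12:42 14",
            "2011-12-01 19:58 19",
            "2011-12-01 02:03 12");
        // TreeMap iterates its entries in ascending key order.
        for (Map.Entry<String, List<String>> e : group(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Using computeIfAbsent avoids a separate containsKey/put step for each line, and because the map is a TreeMap, iterating it visits the dates in ascending key order.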
There's nothing wrong with using a List as the value for a Map. All of those <> look ugly, but it's perfectly fine to put a generic class inside a generic class.

Instead of using a String as the key, it would probably be better to use java.util.Date, because the keys are dates. This lets the TreeMap sort the dates chronologically. If you store the dates as Strings, the TreeMap sorts them as strings, not as "real" dates; that happens to give the right order for zero-padded yyyy-MM-dd strings like the ones in the question, but not for formats such as M/d/yyyy.
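The following sketch illustrates the ordering point. It swaps in java.time.LocalDate (Java 8+) for java.util.Date, since the keys are dates without a time of day, and it uses a hypothetical M/d/yyyy input format, because with the question's yyyy-MM-dd strings string order and date order coincide. The class and method names are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class DateKeyedGroups {
    // Hypothetical US-style input format, chosen only to show where string ordering breaks down.
    static final DateTimeFormatter US_DATE = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Groups lines by the raw date string; the TreeMap orders keys lexicographically.
    static TreeMap<String, List<String>> byString(List<String> lines) {
        TreeMap<String, List<String>> map = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2); // [date, "time value"]
            map.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(parts[1]);
        }
        return map;
    }

    // Groups lines by a parsed LocalDate; the TreeMap orders keys chronologically.
    static TreeMap<LocalDate, List<String>> byDate(List<String> lines) {
        TreeMap<LocalDate, List<String>> map = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(" ", 2);
            LocalDate date = LocalDate.parse(parts[0], US_DATE);
            map.computeIfAbsent(date, k -> new ArrayList<>()).add(parts[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("9/5/2011 08:00 7", "11/30/2011 09:00 10", "1/5/2012 10:30 3");
        System.out.println(byString(lines).keySet()); // [1/5/2012, 11/30/2011, 9/5/2011]
        System.out.println(byDate(lines).keySet());   // [2011-09-05, 2011-11-30, 2012-01-05]
    }
}
```

With string keys, "1/5/2012" sorts first because '/' compares lower than '1', whereas the LocalDate keys come out in calendar order.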
Conceptually, no, but it is going to be very memory-inefficient (both because of the Map and because of the List). You're looking at an overhead of 200% or more, which may or may not be acceptable depending on how much memory you have to waste.

For a more memory-efficient solution, create a class that has fields for every column (including a Date), put all those in a List, and sort it (ideally using quicksort) when you're done reading.
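A sketch of the flat-list approach, assuming Java 16+ for the record syntax; FlatReadings, Reading, and parse are hypothetical names. It stores the Date and Time columns together as a java.time.LocalDateTime rather than a java.util.Date. Note that List.sort on objects uses a stable, adaptive mergesort (TimSort), not quicksort.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FlatReadings {
    // One object per input line, with a field per column (hypothetical record name).
    record Reading(LocalDateTime timestamp, int value) {}

    // Matches the question's "yyyy-MM-dd HH:mm" date and time columns.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    static List<Reading> parse(List<String> lines) {
        List<Reading> readings = new ArrayList<>(lines.size());
        for (String line : lines) {
            // Everything before the last space is the timestamp; the rest is the value.
            int lastSpace = line.lastIndexOf(' ');
            LocalDateTime ts = LocalDateTime.parse(line.substring(0, lastSpace), FORMAT);
            int value = Integer.parseInt(line.substring(lastSpace + 1));
            readings.add(new Reading(ts, value));
        }
        // Sort once, after all lines have been read.
        readings.sort(Comparator.comparing(Reading::timestamp));
        return readings;
    }

    public static void main(String[] args) {
        List<Reading> readings = parse(List.of(
            "2011-12-01 12:42 14",
            "2011-11-30 09:00 10",
            "2011-12-01 02:03 12"));
        readings.forEach(System.out::println);
    }
}
```

Because the list ends up sorted by timestamp, all readings for one date are contiguous and can still be processed day by day in a single pass; unlike the map-of-lists version, the readings within each day are also in time order.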
There's no objection to using Lists, though in your case a List<Integer> as the values of the Map might be appropriate.
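One reading of this suggestion, sketched under the assumption that only the Value column needs to be kept for each date, so the time of day is discarded; ValuesByDate and group are hypothetical names, and java.time.LocalDate stands in for the date key.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class ValuesByDate {
    // Keeps only the Value column per date; the Time column is dropped (assumes it isn't needed).
    static TreeMap<LocalDate, List<Integer>> group(List<String> lines) {
        TreeMap<LocalDate, List<Integer>> byDate = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+"); // [date, time, value]
            LocalDate date = LocalDate.parse(cols[0]);   // ISO-8601 yyyy-MM-dd
            byDate.computeIfAbsent(date, k -> new ArrayList<>()).add(Integer.parseInt(cols[2]));
        }
        return byDate;
    }

    public static void main(String[] args) {
        TreeMap<LocalDate, List<Integer>> m = group(List.of(
            "2011-11-30 09:00 10", "2011-11-30 09:15 5",
            "2011-12-01 12:42 14", "2011-12-01 19:58 19", "2011-12-01 02:03 12"));
        System.out.println(m); // {2011-11-30=[10, 5], 2011-12-01=[14, 19, 12]}
    }
}
```

Storing Integers instead of the remaining text avoids keeping a String per line, although each Integer is still a boxed object.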