Should I switch from buffered reads to an in-memory/tokenized approach to read a 100,000-line file in an Android app?
Currently I am loading a text file that contains 100,000 lines into a SortedMap using buffered reads. Should I abandon this approach and instead load the entire file into memory and then tokenize by line feeds into the SortedMap? Note, I have to parse each line to extract the key and create a per-key supporting object that I then insert into the SortedMap. The file is less than 4MB in size so that fits in line with Android's in-memory file size limitations. I am wondering if it's worth the effort to switch to the in-memory approach or if the speed-up gained just isn't worth it.
Also, would a HashMap be a lot faster than a SortedMap? I only need lookup-by-key and can live without the sorted keys if necessary, but it would be nice to have around. If there is a better structure than what I am using let me know and if you have any Android speed tips related to this issue please mention those too.
-- roschler
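For concreteness, the two approaches being compared might look like the sketch below. The tab-separated line format, the Record class, and the parsing helper are hypothetical stand-ins for the asker's real per-key supporting object; the point is that the parsing work is identical either way, and only the I/O strategy differs.

```java
import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.SortedMap;
import java.util.TreeMap;

public class FileLoadSketch {

    // Hypothetical per-key supporting object; stands in for whatever
    // the real application builds from each parsed line.
    static class Record {
        final String value;
        Record(String value) { this.value = value; }
    }

    // Approach 1: buffered, line-at-a-time reads (the current approach).
    static SortedMap<String, Record> loadBuffered(String path) throws IOException {
        SortedMap<String, Record> map = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                parseInto(map, line);
            }
        }
        return map;
    }

    // Approach 2: slurp the whole file (under 4MB), then split on line feeds.
    // Assumes '\n' line separators and UTF-8 content.
    static SortedMap<String, Record> loadInMemory(String path) throws IOException {
        File file = new File(path);
        byte[] bytes = new byte[(int) file.length()];
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(bytes);
        }
        SortedMap<String, Record> map = new TreeMap<>();
        for (String line : new String(bytes, "UTF-8").split("\n")) {
            if (!line.isEmpty()) {
                parseInto(map, line);
            }
        }
        return map;
    }

    // Hypothetical line format: "key<TAB>value".
    static void parseInto(SortedMap<String, Record> map, String line) {
        int tab = line.indexOf('\t');
        if (tab >= 0) {
            map.put(line.substring(0, tab), new Record(line.substring(tab + 1)));
        }
    }
}
```

Note that a BufferedReader already batches the underlying disk reads, so the in-memory variant mostly saves per-readLine() overhead rather than I/O; the parsing and the 100,000 map insertions are the same in both cases.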
It's unclear to me why it would be simpler to load the entire file into memory and then tokenize. Reading a line at a time and parsing it that way is pretty simple, isn't it? While I'm all for loading things all at once when it genuinely makes things simpler, I can't see that it would be significantly easier here.
As for SortedMap vs HashMap - typically a HashMap lookup is O(1) if you don't have many hash collisions, but a SortedMap lookup is only O(log n). How expensive are comparisons compared with hash computations in your object model? With 100,000 elements you'll have around 16-17 comparisons per lookup. Ultimately, I wouldn't want to guess which will be faster - you should test it, as for all performance options. Look at the memory usage too... I would expect a SortedMap to use less memory, but I could easily be wrong.
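On the "you should test it" point, a rough micro-benchmark comparing lookup speed of the two map types could be as simple as the sketch below. The key format and lookup count are made up, and on Android the timings should be taken on the target device (after a warm-up pass, as here), since desktop JVM numbers won't transfer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapLookupBenchmark {

    public static void main(String[] args) {
        final int n = 100000;          // matches the file size in the question
        final int lookups = 1000000;   // arbitrary; enough for a stable timing

        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> tree = new TreeMap<>();
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "key-" + i;      // hypothetical keys; use real ones if possible
            hash.put(keys[i], i);
            tree.put(keys[i], i);
        }

        // Warm-up pass so the JIT compiles the lookup paths before timing.
        time(hash, keys, lookups);
        time(tree, keys, lookups);

        System.out.println("HashMap: " + time(hash, keys, lookups) + " ms");
        System.out.println("TreeMap: " + time(tree, keys, lookups) + " ms");
    }

    static long time(Map<String, Integer> map, String[] keys, int lookups) {
        long start = System.nanoTime();
        long sink = 0;                 // consume results so the work isn't optimized away
        for (int i = 0; i < lookups; i++) {
            sink += map.get(keys[i % keys.length]);
        }
        long elapsed = (System.nanoTime() - start) / 1000000;
        if (sink == 42) System.out.println("unlikely");  // keep sink live
        return elapsed;
    }
}
```

The same harness can be pointed at the real keys and supporting objects to compare memory as well, e.g. by checking heap usage before and after each map is populated.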